help:proc [2017/04/06 14:43] – peter | help:proc [2020/04/15 10:34] (current) – removed peter |
---|
====== Proc ====== | |
| |
The **/proc** file system doesn't contain 'real' files. Most of the 'files' within /proc report a file size of 0. | |
| |
/proc simply acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system (such as memory, disks mounted, hardware configuration, etc.) and to change certain kernel parameters at runtime (sysctl). | |
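
As a rough illustration of the runtime-parameter interface: kernel tunables are exposed as files under /proc/sys, and a dotted sysctl name corresponds to a path there. The helper name ''sysctl_to_path'' is made up for this sketch, and ''vm.swappiness'' is only an example of a tunable that is commonly present.

```shell
# Hypothetical helper: map a dotted sysctl name to its file under /proc/sys.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Reading a tunable needs no special privileges, if the file exists here.
path=$(sysctl_to_path vm.swappiness)
if [ -r "$path" ]; then
    cat "$path"
fi

# Changing a value requires root, for example:
#   echo 10 > /proc/sys/vm/swappiness
```

The change takes effect immediately but is not persistent across reboots.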
| |
It not only allows access to process data but also allows you to request the kernel status by reading files in the hierarchy. | |
| |
| |
===== Process-Specific Sub-directories ===== | |
| |
The **/proc** directory contains (among other things) one sub-directory for each process running on the system, which is named after the process ID (PID). | |
| |
Each process sub-directory has the following entries. | |
| |
^File^Content^ | |
|[[Proc:clear_refs file|clear_refs]]|Clears page referenced bits shown in smaps output.| | |
|cmdline|Command line arguments.| | |
|cpu|Current and last CPU on which it was executed (2.4)(smp).| |
|cwd|Link to the current working directory.| | |
|environ|Values of environment variables.| | |
|exe|Link to the executable of this process.| | |
|fd|Directory containing all file descriptors.| |
|[[Proc:maps file|maps]]|Memory maps to executables and library files (2.4).| | |
|mem|Memory held by this process.| | |
|root|Link to the root directory of this process.| | |
|[[Proc:Process Status file|stat]]|Process status.| | |
|[[Proc:statm file|statm]]|Process memory status information.| | |
|[[Proc:status file|status]]|Process status in human readable form.| | |
|wchan|If CONFIG_KALLSYMS is set, a pre-decoded wchan.| | |
|[[Proc:pagemap file|pagemap]]|Page table.| | |
|stack|Report full stack trace, enable via CONFIG_STACKTRACE.| | |
|[[Proc:smaps file|smaps]]|An extension based on maps, showing the memory consumption of each mapping.| | |
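
A minimal sketch of reading a few of the entries listed above. The function names are invented for this example. The only format detail assumed beyond the table is that ''cmdline'' stores its arguments separated by NUL bytes, which is why it is passed through ''tr'' before printing.

```shell
# cmdline stores the arguments separated by NUL bytes; make them printable.
cmdline_to_text() {
    tr '\0' ' ' | sed 's/ $//'
}

# Hypothetical helper: print a few per-process entries for the given PID.
describe_process() {
    pid=$1
    printf 'cmdline: %s\n' "$(cmdline_to_text < "/proc/$pid/cmdline")"
    printf 'cwd:     %s\n' "$(readlink "/proc/$pid/cwd")"
    printf 'exe:     %s\n' "$(readlink "/proc/$pid/exe")"
    printf 'fds:     %s\n' "$(ls "/proc/$pid/fd" | tr '\n' ' ')"
}

# Example: inspect the current shell.
#   describe_process "$$"
```

Entries of processes owned by other users may not be readable without root.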
| |
| |
| |
===== Kernel data ===== | |
| |
Similar to the process entries, the kernel data files give information about the running kernel. The files used to obtain this information are contained in /proc and are listed here. Not all of these will be present on your system; which files exist depends on the kernel configuration and the loaded modules. | |
| |
^File^Content^ | |
|apm|Advanced power management info.| | |
|[[Proc:Buddy Info file|buddyinfo]]|Kernel memory allocator information (2.5).| | |
|bus|Directory containing bus specific information.| | |
|cmdline|Kernel command line.| | |
|cpuinfo|Info about the CPU.| | |
|devices|Available devices (block and character).| | |
|dma|Used DMA channels.| |
|filesystems|Supported filesystems.| | |
|driver|Various drivers grouped here, currently rtc (2.4).| | |
|execdomains|Execdomains, related to security (2.4).| | |
|fb|Frame Buffer devices (2.4).| | |
|[[Proc:File System directory|fs]]|File system parameters, currently nfs/exports (2.4).| | |
|[[Proc:IDE directory|ide]]|Directory containing info about the IDE subsystem.| | |
|[[Proc:Interrupts file|interrupts]]|Interrupt usage.| | |
|iomem|Memory map (2.4).| | |
|ioports|I/O port usage.| | |
|[[Proc:IRQ Vectors|irq]]|Masks for irq to cpu affinity (2.4) (smp?).| | |
|isapnp|ISA PnP (Plug&Play) Info (2.4).| | |
|kcore|Kernel core image (can be ELF or A.OUT (deprecated in 2.4)).| | |
|kmsg|Kernel messages.| | |
|ksyms|Kernel symbol table.| | |
|loadavg|Load average of last 1, 5 & 15 minutes.| | |
|locks|Kernel locks.| | |
|[[Proc:Memory Info file|meminfo]]|Memory info.| | |
|misc|Miscellaneous.| | |
|modules|List of loaded modules.| | |
|mounts|Mounted filesystems.| | |
|[[Proc:Network directory|net]]|Networking info.| | |
|[[Proc:Page Type Info file|pagetypeinfo]]|Additional page allocator information (see text) (2.5).| | |
|[[Proc:Parallel Ports directory|parports]]|Parallel Ports.| |
|partitions|Table of partitions known to the system.| | |
|pci|Deprecated info of PCI bus (new way -> /proc/bus/pci/, decoupled by lspci) (2.4).| |
|rtc|Real time clock.| | |
|[[Proc:SCSI directory|scsi]]|SCSI info.| | |
|[[Proc:Slab pool info file|slabinfo]]|Slab pool info.| | |
|[[Proc:Soft IRQs file|softirqs]]|softirq usage.| | |
|[[Proc:Stat file|stat]]|Overall statistics.| | |
|swaps|Swap space utilization.| | |
|[[Proc:System file|sys]]|System info.| | |
|sysvipc|Info of SysVIPC Resources (msg, sem, shm) (2.4).| | |
|[[Proc:TTY directory|tty]]|Info of tty drivers.| | |
|uptime|System uptime.| | |
|version|Kernel version.| | |
|video|bttv info of video resources (2.4).| | |
|[[Proc:VM Alloc Info file|vmallocinfo]]|Show vmalloced areas.| | |
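
Most of these files are plain text and can be read with ordinary shell tools. As a small sketch, the following picks the three load averages out of a ''loadavg''-formatted line. The function name is hypothetical, and the sketch assumes the usual layout in which the first three fields are the 1, 5 and 15 minute averages.

```shell
# Print the three load averages from a loadavg-formatted line on stdin.
loadavg_summary() {
    read -r one five fifteen _rest
    printf '1min=%s 5min=%s 15min=%s\n' "$one" "$five" "$fifteen"
}

# Use the live file where it is available.
if [ -r /proc/loadavg ]; then
    loadavg_summary < /proc/loadavg
fi
```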
| |
| |
===== Per Process Parameters ===== | |
| |
==== oom-killer score ==== | |
| |
**/proc/<pid>/oom_adj** and **/proc/<pid>/oom_score_adj** adjust the oom-killer score. | |
| |
These files can be used to adjust the badness heuristic used to select which process gets killed in out of memory conditions. | |
| |
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted. The units are roughly a proportion along that range of allowed memory the process may allocate from based on an estimation of its current memory and swap use. For example, if a task is using all allowed memory, its badness score will be 1000. If it is using half of its allowed memory, its score will be 500. | |
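
The proportion described above can be sketched as simple integer arithmetic. This is a simplified model of the text, not the kernel's actual calculation, and ''badness_estimate'' is a name invented for the illustration.

```shell
# Simplified model of the proportion described above; not the kernel's code.
# Arguments: memory used and memory allowed, in the same unit (e.g. KiB).
badness_estimate() {
    used=$1
    allowed=$2
    echo $(( used * 1000 / allowed ))
}

badness_estimate 512 1024     # half of the allowed memory
badness_estimate 1024 1024    # all of the allowed memory
```

The two calls print 500 and 1000, matching the examples in the paragraph above.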
| |
There is an additional factor included in the badness score: root processes are given 3% extra memory over other tasks. | |
| |
The amount of "allowed" memory depends on the context in which the oom killer was called. If it is due to the memory assigned to the allocating task's cpuset being exhausted, the allowed memory represents the set of mems assigned to that cpuset. If it is due to a mempolicy's node(s) being exhausted, the allowed memory represents the set of mempolicy nodes. If it is due to a memory limit (or swap limit) being reached, the allowed memory is that configured limit. Finally, if it is due to the entire system being out of memory, the allowed memory represents all allocatable resources. | |
| |
The value of **/proc/<pid>/oom_score_adj** is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX). This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, -1000, is equivalent to disabling oom killing entirely for that task since it will always report a badness score of 0. | |
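
A minimal sketch of setting the adjustment from a script, checking the documented range first. Both function names are hypothetical. Lowering the value of a process typically requires root privileges, so the write itself is shown only as a commented example.

```shell
# Succeeds only for integers in the documented range -1000..1000.
valid_oom_score_adj() {
    [ "$1" -ge -1000 ] 2>/dev/null && [ "$1" -le 1000 ] 2>/dev/null
}

# Hypothetical helper: validate the value, then write it for the given PID.
set_oom_score_adj() {
    pid=$1
    value=$2
    if ! valid_oom_score_adj "$value"; then
        echo "oom_score_adj must be an integer between -1000 and 1000" >&2
        return 1
    fi
    echo "$value" > "/proc/$pid/oom_score_adj"
}

# Example: make the current shell a preferred victim.
#   set_oom_score_adj "$$" 500
```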
| |
Consequently, it is very simple for userspace to define the amount of memory to consider for each task. Setting a /proc/<pid>/oom_score_adj value of +500, for example, is roughly equivalent to allowing the remainder of tasks sharing the same system, cpuset, mempolicy, or memory controller resources to use at least 50% more memory. A value of -500, on the other hand, would be roughly equivalent to discounting 50% of the task's allowed memory from being considered as scoring against the task. | |
| |
For backwards compatibility with previous kernels, **/proc/<pid>/oom_adj** may also be used to tune the badness score. Its acceptable values range from -16 (OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17 (OOM_DISABLE) to disable oom killing entirely for that task. Its value is scaled linearly with /proc/<pid>/oom_score_adj. | |
| |
Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the other with its scaled value. | |
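
The linear scaling between the two files can be sketched as follows. The divisor 17 and the special case that maps +15 straight to +1000 are assumptions about how kernels of this era perform the conversion, so treat the function as an illustration rather than a reference. The function name is invented.

```shell
# Assumed conversion from the legacy oom_adj scale (-17..15) to the
# oom_score_adj scale (-1000..1000); integer division truncates toward zero.
oom_adj_to_score_adj() {
    adj=$1
    if [ "$adj" -eq 15 ]; then
        echo 1000                       # maximum maps to maximum
    else
        echo $(( adj * 1000 / 17 ))     # -17 (OOM_DISABLE) maps to -1000
    fi
}

oom_adj_to_score_adj -17
oom_adj_to_score_adj 8
```

Under these assumptions the two calls print -1000 and 470.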
| |
<WRAP info> | |
**NOTICE**: /proc/<pid>/oom_adj is deprecated and will be removed; please see Documentation/feature-removal-schedule.txt. | |
</WRAP> | |
| |
**Caveat**: When a parent task is selected, the oom killer will sacrifice any first generation children with separate address spaces instead, if possible. This prevents servers and important system daemons from being killed and loses the minimal amount of work. | |
| |
| |
**/proc/<pid>/oom_score** displays the current oom-killer score. This file can be used to check the current score used by the oom-killer for any given <pid>. Use it together with **/proc/<pid>/oom_score_adj** (or the deprecated **/proc/<pid>/oom_adj**) to tune which process should be killed in an out-of-memory situation. | |
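
As a sketch of how the score might be inspected in practice, the following lists the processes with the highest current scores. The function name is hypothetical. It takes the directory to scan as its first argument (default /proc), which only serves to make it easy to try against a test directory.

```shell
# List "score pid" pairs, highest score first.
# $1: directory to scan (default /proc), $2: number of lines (default 5).
top_oom_scores() {
    root=${1:-/proc}
    limit=${2:-5}
    for f in "$root"/[0-9]*/oom_score; do
        [ -r "$f" ] || continue
        pid=${f%/oom_score}
        pid=${pid##*/}
        # A process may exit between the glob and the read; skip it then.
        score=$(cat "$f" 2>/dev/null) || continue
        printf '%s %s\n' "$score" "$pid"
    done | sort -rn | head -n "$limit"
}

# Example: show the five most likely victims on this system.
#   top_oom_scores
```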
| |
| |
==== IO accounting fields ==== | |
| |
**/proc/<pid>/io** displays the IO accounting fields, which are IO statistics for each running process. | |
| |
<code bash> | |
dd if=/dev/zero of=/tmp/test.dat & | |
[1] 3828 | |
| |
| |
cat /proc/3828/io | |
| |
rchar: 323934931 | |
wchar: 323929600 | |
syscr: 632687 | |
syscw: 632675 | |
read_bytes: 0 | |
write_bytes: 323932160 | |
cancelled_write_bytes: 0 | |
</code> | |
| |
where: | |
| |
  * **rchar** is the count of characters read: the number of bytes which this task has caused to be read from storage. This is simply the sum of bytes which this process passed to **read()** and **pread()**. It includes things like tty IO and is unaffected by whether or not actual physical disk IO was required (the read might have been satisfied from pagecache). | |
  * **wchar** is the count of characters written: the number of bytes which this task has caused, or shall cause, to be written to disk. Similar caveats apply here as with **rchar**. | |
  * **syscr** is the count of read syscalls: an attempt to count the number of read I/O operations, i.e. syscalls like **read()** and **pread()**. | |
  * **syscw** is the count of write syscalls: an attempt to count the number of write I/O operations, i.e. syscalls like **write()** and **pwrite()**. | |
  * **read_bytes** is the count of bytes read: an attempt to count the number of bytes which this process really did cause to be fetched from the storage layer. It is counted at the **submit_bio()** level, so it is accurate for block-backed filesystems. TODO: Add status regarding NFS and CIFS at a later time. | |
  * **write_bytes** is the count of bytes written: an attempt to count the number of bytes which this process caused to be sent to the storage layer. This is counted at **page-dirtying** time. | |
  * **cancelled_write_bytes** is the number of bytes which this process caused to not happen, by truncating pagecache. The big inaccuracy here is truncate. If a process writes 1MB to a file and then deletes the file, it will in fact perform no writeout, but it will have been accounted as having caused 1MB of write. A task can cause "negative" IO too: if this task truncates some dirty pagecache, some IO which another task has been accounted for (in its write_bytes) will not be happening. We _could_ just subtract that from the truncating task's write_bytes, but there is information loss in doing that. | |
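
The fields above can be picked out of the file with standard tools. A minimal sketch, using a helper name invented for this example:

```shell
# Print the value of one accounting field from io-formatted text on stdin.
io_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Example: bytes the current shell has caused to be sent to storage.
if [ -r /proc/self/io ]; then
    io_field write_bytes < /proc/self/io
fi
```

The match is on the exact field name, so asking for ''write_bytes'' does not also return ''cancelled_write_bytes''. Reading the io file of a process owned by another user generally requires root.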
| |
<WRAP warning> | |
**WARNING:** In its current implementation, this is a bit racy on 32-bit machines: if process A reads process B's **/proc/pid/io** while process B is updating one of those 64-bit counters, process A could see an intermediate result. | |
| |
More information about this can be found within the taskstats documentation in Documentation/accounting. | |
</WRAP> | |
| |
===== References ===== | |
| |
http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html | |
| |
https://www.cyberciti.biz/files/linux-kernel/Documentation/filesystems/proc.txt | |