kernel:control_groups
Differences
This shows you the differences between two versions of the page.
kernel:control_groups [2020/07/22 18:05] – old revision restored (2017/04/06 10:22) 207.244.157.10 | kernel:control_groups [2020/07/22 18:05] (current) – old revision restored (2020/07/20 16:09) 207.244.157.10 | ||
---|---|---|---|
Line 15: | Line 15: | ||
User level code may create and destroy cgroups by name in an instance of the cgroup virtual file system, specify and query to which cgroup a task is assigned, and list the task pids assigned to a cgroup. Those creations and assignments only affect the hierarchy associated with that instance of the cgroup file system. | User level code may create and destroy cgroups by name in an instance of the cgroup virtual file system, specify and query to which cgroup a task is assigned, and list the task pids assigned to a cgroup. Those creations and assignments only affect the hierarchy associated with that instance of the cgroup file system. | ||
- | On their own, the only use for cgroups is for simple job tracking. The intention is that other subsystems hook into the generic cgroup support to provide new attributes for cgroups, such as accounting/ | + | On their own, the only use for cgroups is for simple job tracking. The intention is that other subsystems hook into the generic cgroup support to provide new attributes for cgroups, such as accounting/ |
===== Why are cgroups needed? ===== | ===== Why are cgroups needed? ===== | ||
- | There are multiple efforts to provide process aggregations in the Linux kernel, mainly for resource tracking purposes. Such efforts include cpusets, CKRM/ | + | There are multiple efforts to provide process aggregations in the Linux kernel, mainly for resource tracking purposes. Such efforts include |
- | The kernel cgroup patch provides the minimum essential kernel mechanisms required to efficiently implement such groups. | + | The kernel cgroup patch provides the minimum essential kernel mechanisms required to efficiently implement such groups. |
Multiple hierarchy support is provided to allow for situations where the division of tasks into cgroups is distinctly different for different subsystems - having parallel hierarchies allows each hierarchy to be a natural division of tasks, without having to handle complex combinations of tasks that would be present if several unrelated subsystems needed to be forced into the same tree of cgroups. | Multiple hierarchy support is provided to allow for situations where the division of tasks into cgroups is distinctly different for different subsystems - having parallel hierarchies allows each hierarchy to be a natural division of tasks, without having to handle complex combinations of tasks that would be present if several unrelated subsystems needed to be forced into the same tree of cgroups. | ||
Line 60: | Line 60: | ||
</ | </ | ||
- | With only a single hierarchy, | + | With only a single hierarchy, |
- | Also, let's say that the administrator would like to give enhanced network access temporarily to a student' | + | Also, let's say that the administrator would like to give enhanced network access temporarily to a student' | ||
- | + | ||
- | With ability to write pids directly to resource classes, it's just a matter of : | + | |
<code bash> | <code bash> | ||
Line 74: | Line 72: | ||
</ | </ | ||
- | Without this ability, | + | Without this ability, |
===== How are cgroups implemented? | ===== How are cgroups implemented? | ||
Line 81: | Line 78: | ||
Control Groups extends the kernel as follows: | Control Groups extends the kernel as follows: | ||
- | * Each task in the system has a reference-counted pointer to a css_set. | + | * Each task in the system has a reference-counted pointer to a **css_set**. |
- | * A css_set contains a set of reference-counted pointers to cgroup_subsys_state objects, one for each cgroup subsystem registered in the system. | + | * A **css_set** contains a set of reference-counted pointers to **cgroup_subsys_state** objects, one for each cgroup subsystem registered in the system. |
* A cgroup hierarchy filesystem can be mounted | * A cgroup hierarchy filesystem can be mounted | ||
Line 98: | Line 95: | ||
If an active hierarchy with exactly the same set of subsystems already exists, it will be reused for the new mount. | If an active hierarchy with exactly the same set of subsystems already exists, it will be reused for the new mount. | ||
- | matches, and any of the requested subsystems are in use in an existing hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy | + | matches, and any of the requested subsystems are in use in an existing hierarchy, the mount will fail with **-EBUSY**. Otherwise, a new hierarchy is activated, associated with the requested subsystems. |
- | is activated, associated with the requested subsystems. | + | |
- | It's not currently possible to bind a new subsystem to an active cgroup hierarchy, or to unbind a subsystem from an active cgroup hierarchy. This may be possible in future, but is fraught with nasty error-recovery issues. | + | It's not currently possible to bind a new subsystem to an active cgroup hierarchy, or to unbind a subsystem from an active cgroup hierarchy. |
When a cgroup filesystem is unmounted, if there are any child cgroups created below the top-level cgroup, that hierarchy will remain active even though unmounted; if there are no child cgroups then the hierarchy will be deactivated. | When a cgroup filesystem is unmounted, if there are any child cgroups created below the top-level cgroup, that hierarchy will remain active even though unmounted; if there are no child cgroups then the hierarchy will be deactivated. | ||
Line 114: | Line 110: | ||
* **cgroup.procs**: | * **cgroup.procs**: | ||
* **notify_on_release** flag: run the release agent on exit? | * **notify_on_release** flag: run the release agent on exit? | ||
- | * **release_agent**: | + | * **release_agent**: |
- | Other subsystems such as cpusets may add additional files in each cgroup dir. | + | Other subsystems such as [[Kernel: |
New cgroups are created using the mkdir system call or shell command. | New cgroups are created using the mkdir system call or shell command. | ||
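
As a minimal sketch of the mkdir-based interface described above, the helpers below create and remove a child cgroup under a mounted hierarchy. The function names and the idea of passing the mount point as the first argument are illustrative assumptions, not part of the cgroup interface itself.

```shell
#!/bin/sh
# Hypothetical helpers: $1 is the directory where a cgroup hierarchy is
# assumed to be mounted, $2 is the name of the child cgroup.

# create_cgroup ROOT NAME: create the child cgroup NAME directly under ROOT.
create_cgroup() {
    # On a real cgroup filesystem the kernel fills the new directory
    # with the control files for that cgroup.
    mkdir -- "$1/$2"
}

# remove_cgroup ROOT NAME: remove the child cgroup NAME again.
remove_cgroup() {
    # rmdir only succeeds on an empty directory, so this never deletes
    # a cgroup that still contains child cgroups.
    rmdir -- "$1/$2"
}
```

Both helpers are thin wrappers around mkdir and rmdir, so they can be tried against an ordinary temporary directory before pointing them at a real hierarchy.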
Line 124: | Line 120: | ||
The attachment of each task, automatically inherited at fork by any children of that task, to a cgroup allows organizing the workload on a system into related sets of tasks. | The attachment of each task, automatically inherited at fork by any children of that task, to a cgroup allows organizing the workload on a system into related sets of tasks. | ||
- | When a task is moved from one cgroup to another, it gets a new css_set pointer - if there' | + | When a task is moved from one cgroup to another, it gets a new **css_set** pointer - if there' |
- | To allow access from a cgroup to the css_sets (and hence tasks) that comprise it, a set of cg_cgroup_link objects form a lattice; each cg_cgroup_link is linked into a list of cg_cgroup_links for a single cgroup on its cgrp_link_list field, and a list of cg_cgroup_links for a single css_set on its cg_link_list. | + | To allow access from a cgroup to the css_sets (and hence tasks) that comprise it, a set of **cg_cgroup_link** objects form a lattice; each cg_cgroup_link is linked into a list of **cg_cgroup_links** for a single cgroup on its **cgrp_link_list** field, and a list of cg_cgroup_links for a single css_set on its **cg_link_list**. |
Thus the set of tasks in a cgroup can be listed by iterating over each css_set that references the cgroup, and sub-iterating over each css_set' | Thus the set of tasks in a cgroup can be listed by iterating over each css_set that references the cgroup, and sub-iterating over each css_set' | ||
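
The iteration above happens inside the kernel. From user level, the result is visible by reading the tasks file of a cgroup directory. The following is only a sketch of that user-level view: the function name is hypothetical, and sorting the output is a convenience added here, not something the kernel is assumed to do.

```shell
#!/bin/sh
# list_cgroup_tasks DIR: print the pids listed in DIR/tasks, one per line,
# sorted numerically and with duplicate entries removed.
# DIR is assumed to be a cgroup directory inside a mounted hierarchy.
list_cgroup_tasks() {
    sort -n -u -- "$1/tasks"
}
```

For example, `list_cgroup_tasks /dev/cgroup` would print the pids of the root cgroup, assuming a hierarchy is mounted at /dev/cgroup.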
Line 135: | Line 131: | ||
===== What does notify_on_release do? ===== | ===== What does notify_on_release do? ===== | ||
- | If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the " | + | If the **notify_on_release** flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the **" |
===== What does clone_children do? ===== | ===== What does clone_children do? ===== | ||
- | If the clone_children flag is enabled (1) in a cgroup, then all cgroups created beneath will call the post_clone callbacks for each subsystem of the newly created cgroup. Usually when this callback is implemented for a subsystem, it copies the values of the parent subsystem; this is the case for the cpuset subsystem. | + | If the **clone_children** flag is enabled (1) in a cgroup, then all cgroups created beneath will call the post_clone callbacks for each subsystem of the newly created cgroup. Usually when this callback is implemented for a subsystem, it copies the values of the parent subsystem; this is the case for the cpuset subsystem. | ||
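
A flag such as notify_on_release is switched by writing 0 or 1 to the file of the same name in the cgroup directory. The helper below is a small sketch of that step. Its name and the input check are assumptions added for illustration; the kernel interface itself is just the file.

```shell
#!/bin/sh
# set_notify_on_release DIR VALUE: write VALUE (0 or 1) to DIR/notify_on_release.
# DIR is assumed to be a cgroup directory inside a mounted hierarchy.
set_notify_on_release() {
    case $2 in
        0|1)
            printf '%s\n' "$2" > "$1/notify_on_release"
            ;;
        *)
            # Reject anything else before touching the file.
            echo "set_notify_on_release: value must be 0 or 1" >&2
            return 1
            ;;
    esac
}
```

Calling `set_notify_on_release /dev/cgroup/Charlie 1` would enable the flag for a cgroup named Charlie, assuming such a cgroup exists under /dev/cgroup.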
Line 147: | Line 143: | ||
To start a new job that is to be contained within a cgroup, using the " | To start a new job that is to be contained within a cgroup, using the " | ||
- | | + | - mkdir / |
- | 2) mount -t cgroup -ocpuset cpuset / | + | |
- | 3) Create the new cgroup by doing mkdir' | + | |
- | | + | |
- | 4) Start a task that will be the " | + | |
- | 5) Attach that task to the new cgroup by writing its pid to the | + | |
- | | + | |
- | 6) fork, exec or clone the job tasks from this founding father task. | + | |
For example, the following sequence of commands will set up a cgroup named " | For example, the following sequence of commands will set up a cgroup named " | ||
Line 185: | Line 179: | ||
</ | </ | ||
- | The " | + | The " |
To mount a cgroup hierarchy with just the cpuset and memory subsystems, type: | To mount a cgroup hierarchy with just the cpuset and memory subsystems, type: | ||
Line 218: | Line 212: | ||
Note that changing the set of subsystems is currently only supported when the hierarchy consists of a single (root) cgroup. | Note that changing the set of subsystems is currently only supported when the hierarchy consists of a single (root) cgroup. | ||
- | Then under /dev/cgroup you can find a tree that corresponds to the tree of the cgroups in the system. For instance, /dev/cgroup is the cgroup that holds the whole system. | + | Then under **/dev/cgroup** you can find a tree that corresponds to the tree of the cgroups in the system. For instance, /dev/cgroup is the cgroup that holds the whole system. |
If you want to change the value of release_agent: | If you want to change the value of release_agent: | ||
Line 245: | Line 239: | ||
<code bash> | <code bash> | ||
ls | ls | ||
- | < | + | |
cgroup.procs notify_on_release tasks | cgroup.procs notify_on_release tasks | ||
(plus whatever files are added by the attached subsystems) | (plus whatever files are added by the attached subsystems) | ||
Line 279: | Line 273: | ||
</ | </ | ||
- | Note that it is PID, not PIDs. You can only attach ONE task at a time. If you have several tasks to attach, you have to do it one after another: | + | <WRAP info> |
+ | **NOTE: | ||
+ | </ | ||
<code bash> | <code bash> | ||
Line 303: | Line 299: | ||
When passing a name=< | When passing a name=< | ||
- | The name of the subsystem appears as part of the hierarchy description in / | + | The name of the subsystem appears as part of the hierarchy description in **/ |
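
To pick a hierarchy name out of such a description, a sketch like the following may help. It assumes the colon-separated `ID:subsystems:path` line format that the kernel uses in the per-process cgroup file under /proc, with the name shown as a `name=` entry in the subsystem field. The function name is hypothetical.

```shell
#!/bin/sh
# hierarchy_name LINE: print the value of the name=... entry found in the
# second colon-separated field of LINE, or nothing if there is none.
# Assumed input format: ID:subsystem[,subsystem...]:path
hierarchy_name() {
    printf '%s\n' "$1" |
        cut -d: -f2 |            # keep only the subsystem list
        tr ',' '\n' |            # one entry per line
        sed -n 's/^name=//p' |   # print entries that start with name=
        head -n 1                # at most one name per hierarchy
}
```

For a line such as `4:cpuset,name=jobs:/Charlie` the function prints `jobs`; for a hierarchy without a name it prints nothing.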
==== Notification API ==== | ==== Notification API ==== | ||
Line 320: | Line 316: | ||
To unregister a notification handler, just close the eventfd. | To unregister a notification handler, just close the eventfd. | ||
- | NOTE: Support of notifications should be implemented for the control file. See documentation for the subsystem. | + | <WRAP info> |
+ | **NOTE**: Support of notifications should be implemented for the control file. See documentation for the subsystem. | ||
+ | </ | ||
===== Kernel API ===== | ===== Kernel API ===== | ||
Line 327: | Line 324: | ||
==== Overview ==== | ==== Overview ==== | ||
- | Each kernel subsystem that wants to hook into the generic cgroup system needs to create a cgroup_subsys object. | + | Each kernel subsystem that wants to hook into the generic cgroup system needs to create a **cgroup_subsys** object. |
Other fields in the cgroup_subsys object include: | Other fields in the cgroup_subsys object include: | ||
Line 340: | Line 337: | ||
==== Synchronization ==== | ==== Synchronization ==== | ||
- | There is a global mutex, cgroup_mutex, | + | There is a global mutex, |
See kernel/ | See kernel/ | ||
- | Subsystems can take/ | + | Subsystems can take/ |
Accessing a task's cgroup pointer may be done in the following ways: | Accessing a task's cgroup pointer may be done in the following ways: |
kernel/control_groups.1595441124.txt.gz · Last modified: 2020/07/22 18:05 by 207.244.157.10