* RFC: documentation of the autogroup feature
@ 2016-11-22 15:59 Michael Kerrisk (man-pages)
  2016-11-23 10:33 ` [patch] sched/autogroup: Fix 64bit kernel nice adjustment Mike Galbraith
  2016-11-23 11:39 ` RFC: documentation of the autogroup feature Mike Galbraith
  0 siblings, 2 replies; 38+ messages in thread

From: Michael Kerrisk (man-pages) @ 2016-11-22 15:59 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

Hello Mike and others,

The autogroup feature that you added in 2.6.38 remains poorly
documented, so I took a stab at adding some text to the sched(7)
manual page. There are still a few pieces to be fixed, and you may
also see some other pieces that should be added. Could I ask you to
take a look at the text below?

Cheers,

Michael

   The autogroup feature
       Since Linux 2.6.38, the kernel provides a feature known as
       autogrouping to improve interactive desktop performance in the
       face of multiprocess CPU-intensive workloads such as building
       the Linux kernel with large numbers of parallel build processes
       (i.e., the make(1) -j flag).

       This feature operates in conjunction with the CFS scheduler and
       requires a kernel that is configured with CONFIG_SCHED_AUTOGROUP.
       On a running system, this feature is enabled or disabled via the
       file /proc/sys/kernel/sched_autogroup_enabled; a value of 0
       disables the feature, while a value of 1 enables it.  The
       default value in this file is 1, unless the kernel was booted
       with the noautogroup parameter.

       When autogrouping is enabled, processes are automatically placed
       into "task groups" for the purposes of scheduling.  In the
       current implementation, a new task group is created when a new
       session is created via setsid(2), as happens, for example, when
       a new terminal window is created.  A task group is automatically
       destroyed when the last process in the group terminates.
       ┌─────────────────────────────────────────────────────┐
       │FIXME                                                │
       ├─────────────────────────────────────────────────────┤
       │The following is a little vague. Does it need to be  │
       │made more precise?                                   │
       └─────────────────────────────────────────────────────┘
       The CFS scheduler employs an algorithm that distributes the CPU
       across task groups.  As a result of this algorithm, the
       processes in task groups that contain multiple CPU-intensive
       processes are in effect disfavored by the scheduler.

       A process's autogroup (task group) membership can be viewed via
       the file /proc/[pid]/autogroup:

           $ cat /proc/1/autogroup
           /autogroup-1 nice 0

       This file can also be used to modify the CPU bandwidth allocated
       to a task group.  This is done by writing a number in the "nice"
       range to the file to set the task group's nice value.  The
       allowed range is from +19 (low priority) to -20 (high priority).
       Note that all values in this range cause a task group to be
       further disfavored by the scheduler, with -20 resulting in the
       scheduler mildly disfavoring the task group and +19 greatly
       disfavoring it.

       ┌─────────────────────────────────────────────────────┐
       │FIXME                                                │
       ├─────────────────────────────────────────────────────┤
       │Regarding the previous paragraph... My tests indi‐   │
       │cate that writing *any* value to the autogroup file  │
       │causes the task group to get a lower priority. This  │
       │somewhat surprised me, since I assumed (based on the │
       │parallel with the process nice(2) value) that nega‐  │
       │tive values might boost the task group's priority    │
       │above a task group whose autogroup file had not been │
       │touched.                                             │
       │                                                     │
       │Is this the expected behavior? I presume it is...    │
       │                                                     │
       │But then there's a small surprise in the interface.  │
       │Suppose that the value 0 is written to the autogroup │
       │file, then this results in the task group being sig‐ │
       │nificantly disfavored. But, the nice value *shown*   │
       │in the autogroup file will be the same as if the     │
       │file had not been modified. So, the user has no way  │
       │of discovering the difference. That seems odd. Am I  │
       │missing something?                                   │
       └─────────────────────────────────────────────────────┘

       ┌─────────────────────────────────────────────────────┐
       │FIXME                                                │
       ├─────────────────────────────────────────────────────┤
       │Is the following correct? Does the statement need to │
       │be more precise? (E.g., in precisely which circum‐   │
       │stances does the use of cgroups override autogroup?) │
       └─────────────────────────────────────────────────────┘
       The use of the cgroups(7) CPU controller overrides the effect
       of autogrouping.

       ┌─────────────────────────────────────────────────────┐
       │FIXME                                                │
       ├─────────────────────────────────────────────────────┤
       │What needs to be said about autogroup and real-time  │
       │tasks?                                               │
       └─────────────────────────────────────────────────────┘

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 38+ messages in thread
* [patch] sched/autogroup: Fix 64bit kernel nice adjustment
  2016-11-22 15:59 RFC: documentation of the autogroup feature Michael Kerrisk (man-pages)
@ 2016-11-23 10:33 ` Mike Galbraith
  2016-11-23 13:47   ` Michael Kerrisk (man-pages)
  2016-11-24  6:24   ` [tip:sched/urgent] sched/autogroup: Fix 64-bit kernel nice level adjustment tip-bot for Mike Galbraith
  2016-11-23 11:39 ` RFC: documentation of the autogroup feature Mike Galbraith
  1 sibling, 2 replies; 38+ messages in thread

From: Mike Galbraith @ 2016-11-23 10:33 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:

> ┌─────────────────────────────────────────────────────┐
> │FIXME                                                │
> ├─────────────────────────────────────────────────────┤
> │Regarding the previous paragraph... My tests indi‐   │
> │cate that writing *any* value to the autogroup file  │
> │causes the task group to get a lower priority. This  │

Because autogroup didn't call the then meaningless scale_load()...

Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64bit kernels.  Use scale_load() to
scale group weight.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: stable@vger.kernel.org
---
 kernel/sched/auto_group.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -192,6 +192,7 @@ int proc_sched_autogroup_set_nice(struct
 {
 	static unsigned long next = INITIAL_JIFFIES;
 	struct autogroup *ag;
+	unsigned long shares;
 	int err;
 
 	if (nice < MIN_NICE || nice > MAX_NICE)
@@ -210,9 +211,10 @@ int proc_sched_autogroup_set_nice(struct
 
 	next = HZ / 10 + jiffies;
 	ag = autogroup_task_get(p);
+	shares = scale_load(sched_prio_to_weight[nice + 20]);
 
 	down_write(&ag->lock);
-	err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
+	err = sched_group_set_shares(ag->tg, shares);
 	if (!err)
 		ag->nice = nice;
 	up_write(&ag->lock);

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment
  2016-11-23 10:33 ` [patch] sched/autogroup: Fix 64bit kernel nice adjustment Mike Galbraith
@ 2016-11-23 13:47 ` Michael Kerrisk (man-pages)
  2016-11-23 14:12   ` Mike Galbraith
  2016-11-24  6:24   ` [tip:sched/urgent] sched/autogroup: Fix 64-bit kernel nice level adjustment tip-bot for Mike Galbraith
  1 sibling, 2 replies; 38+ messages in thread

From: Michael Kerrisk (man-pages) @ 2016-11-23 13:47 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

Hello Mike,

On 11/23/2016 11:33 AM, Mike Galbraith wrote:
> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:
>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME                                                │
>> ├─────────────────────────────────────────────────────┤
>> │Regarding the previous paragraph... My tests indi‐   │
>> │cate that writing *any* value to the autogroup file  │
>> │causes the task group to get a lower priority. This  │
>
> Because autogroup didn't call the then meaningless scale_load()...

So, does that mean that this buglet kicked in starting (only) in
Linux 4.7 with commit 2159197d66770ec01f75c93fb11dc66df81fd45b?

> Autogroup nice level adjustment has been broken ever since load
> resolution was increased for 64bit kernels.  Use scale_load() to
> scale group weight.

Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>

Applied and tested against 4.9-rc6 on an Intel i7 (4 cores).
Test setup:

Terminal window 1: running 40 CPU burner jobs
Terminal window 2: running 40 CPU burner jobs
Terminal window 3: running 1 CPU burner job

Demonstrated that:

* Writing "0" to the autogroup file for TW1 now causes no change
  to the rate at which the processes on the terminal consume CPU.
* Writing -20 to the autogroup file for TW1 caused those processes
  to get the lion's share of CPU while TW2 and TW3 get a tiny amount.
* Writing -20 to the autogroup files for TW1 and TW3 allowed the
  process on TW3 to get as much CPU as it was getting when the
  autogroup nice values for both terminals were 0.

Thanks,

Michael

> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: stable@vger.kernel.org
> ---
>  kernel/sched/auto_group.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/auto_group.c
> +++ b/kernel/sched/auto_group.c
> @@ -192,6 +192,7 @@ int proc_sched_autogroup_set_nice(struct
>  {
>  	static unsigned long next = INITIAL_JIFFIES;
>  	struct autogroup *ag;
> +	unsigned long shares;
>  	int err;
> 
>  	if (nice < MIN_NICE || nice > MAX_NICE)
> @@ -210,9 +211,10 @@ int proc_sched_autogroup_set_nice(struct
> 
>  	next = HZ / 10 + jiffies;
>  	ag = autogroup_task_get(p);
> +	shares = scale_load(sched_prio_to_weight[nice + 20]);
> 
>  	down_write(&ag->lock);
> -	err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
> +	err = sched_group_set_shares(ag->tg, shares);
>  	if (!err)
>  		ag->nice = nice;
>  	up_write(&ag->lock);

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment 2016-11-23 13:47 ` Michael Kerrisk (man-pages) @ 2016-11-23 14:12 ` Mike Galbraith 2016-11-23 14:20 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Mike Galbraith @ 2016-11-23 14:12 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Wed, 2016-11-23 at 14:47 +0100, Michael Kerrisk (man-pages) wrote: > Hello Mike, > > On 11/23/2016 11:33 AM, Mike Galbraith wrote: > > On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) > > wrote: > > > > > ┌─────────────────────────────────────────────────────┐ > > > │FIXME │ > > > ├─────────────────────────────────────────────────────┤ > > > │Regarding the previous paragraph... My tests indi‐ │ > > > │cate that writing *any* value to the autogroup file │ > > > │causes the task group to get a lower priority. This │ > > > > Because autogroup didn't call the then meaningless scale_load()... > > So, does that mean that this buglet kicked in starting (only) in > Linux 4.7 with commit 2159197d66770ec01f75c93fb11dc66df81fd45b? Yeah, that gave it teeth. -Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment 2016-11-23 14:12 ` Mike Galbraith @ 2016-11-23 14:20 ` Michael Kerrisk (man-pages) 2016-11-23 15:55 ` Mike Galbraith 0 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-23 14:20 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On 11/23/2016 03:12 PM, Mike Galbraith wrote: > On Wed, 2016-11-23 at 14:47 +0100, Michael Kerrisk (man-pages) wrote: >> Hello Mike, >> >> On 11/23/2016 11:33 AM, Mike Galbraith wrote: >>> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) >>> wrote: >>> >>>> ┌─────────────────────────────────────────────────────┐ >>>> │FIXME │ >>>> ├─────────────────────────────────────────────────────┤ >>>> │Regarding the previous paragraph... My tests indi‐ │ >>>> │cate that writing *any* value to the autogroup file │ >>>> │causes the task group to get a lower priority. This │ >>> >>> Because autogroup didn't call the then meaningless scale_load()... >> >> So, does that mean that this buglet kicked in starting (only) in >> Linux 4.7 with commit 2159197d66770ec01f75c93fb11dc66df81fd45b? > > Yeah, that gave it teeth. Thanks for the confirmation. Are you aiming to see the fix merged for 4.9, or will this wait for 4.10? Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [patch] sched/autogroup: Fix 64bit kernel nice adjustment 2016-11-23 14:20 ` Michael Kerrisk (man-pages) @ 2016-11-23 15:55 ` Mike Galbraith 0 siblings, 0 replies; 38+ messages in thread From: Mike Galbraith @ 2016-11-23 15:55 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Wed, 2016-11-23 at 15:20 +0100, Michael Kerrisk (man-pages) wrote: > Thanks for the confirmation. Are you aiming to see the fix > merged for 4.9, or will this wait for 4.10? Dunno, that's up to Peter/Ingo. It's unlikely that anyone other than we two will notice a thing either way :) -Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* [tip:sched/urgent] sched/autogroup: Fix 64-bit kernel nice level adjustment
  2016-11-23 10:33 ` [patch] sched/autogroup: Fix 64bit kernel nice adjustment Mike Galbraith
  2016-11-23 13:47 ` Michael Kerrisk (man-pages)
@ 2016-11-24  6:24 ` tip-bot for Mike Galbraith
  1 sibling, 0 replies; 38+ messages in thread

From: tip-bot for Mike Galbraith @ 2016-11-24 6:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: efault, linux-kernel, peterz, mtk.manpages, umgwanakikbuti,
      torvalds, tglx, a.p.zijlstra, hpa, linux-man, mingo

Commit-ID:  83929cce95251cc77e5659bf493bd424ae0e7a67
Gitweb:     http://git.kernel.org/tip/83929cce95251cc77e5659bf493bd424ae0e7a67
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Wed, 23 Nov 2016 11:33:37 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 24 Nov 2016 05:45:02 +0100

sched/autogroup: Fix 64-bit kernel nice level adjustment

Michael Kerrisk reported:

> Regarding the previous paragraph... My tests indicate
> that writing *any* value to the autogroup [nice priority level]
> file causes the task group to get a lower priority.

Because autogroup didn't call the then meaningless scale_load()...

Autogroup nice level adjustment has been broken ever since load
resolution was increased for 64-bit kernels.  Use scale_load() to
scale group weight.

Michael Kerrisk tested this patch to fix the problem:

> Applied and tested against 4.9-rc6 on an Intel i7 (4 cores).
> Test setup:
>
> Terminal window 1: running 40 CPU burner jobs
> Terminal window 2: running 40 CPU burner jobs
> Terminal window 3: running 1 CPU burner job
>
> Demonstrated that:
> * Writing "0" to the autogroup file for TW1 now causes no change
>   to the rate at which the processes on the terminal consume CPU.
> * Writing -20 to the autogroup file for TW1 caused those processes
>   to get the lion's share of CPU while TW2 and TW3 get a tiny amount.
> * Writing -20 to the autogroup files for TW1 and TW3 allowed the
>   process on TW3 to get as much CPU as it was getting when the
>   autogroup nice values for both terminals were 0.

Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-man <linux-man@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1479897217.4306.6.camel@gmx.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/auto_group.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index f1c8fd5..da39489 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -212,6 +212,7 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
 {
 	static unsigned long next = INITIAL_JIFFIES;
 	struct autogroup *ag;
+	unsigned long shares;
 	int err;
 
 	if (nice < MIN_NICE || nice > MAX_NICE)
@@ -230,9 +231,10 @@ int proc_sched_autogroup_set_nice(struct task_struct *p, int nice)
 
 	next = HZ / 10 + jiffies;
 	ag = autogroup_task_get(p);
+	shares = scale_load(sched_prio_to_weight[nice + 20]);
 
 	down_write(&ag->lock);
-	err = sched_group_set_shares(ag->tg, sched_prio_to_weight[nice + 20]);
+	err = sched_group_set_shares(ag->tg, shares);
 	if (!err)
 		ag->nice = nice;
 	up_write(&ag->lock);

^ permalink raw reply related	[flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature
  2016-11-22 15:59 RFC: documentation of the autogroup feature Michael Kerrisk (man-pages)
  2016-11-23 10:33 ` [patch] sched/autogroup: Fix 64bit kernel nice adjustment Mike Galbraith
@ 2016-11-23 11:39 ` Mike Galbraith
  2016-11-23 13:54   ` Michael Kerrisk (man-pages)
  1 sibling, 1 reply; 38+ messages in thread

From: Mike Galbraith @ 2016-11-23 11:39 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote:

> ┌─────────────────────────────────────────────────────┐
> │FIXME                                                │
> ├─────────────────────────────────────────────────────┤
> │The following is a little vague. Does it need to be  │
> │made more precise?                                   │
> └─────────────────────────────────────────────────────┘
> The CFS scheduler employs an algorithm that distributes the CPU
> across task groups.  As a result of this algorithm, the processes
> in task groups that contain multiple CPU-intensive processes are
> in effect disfavored by the scheduler.

Mmmm, they're actually equalized (modulo smp fairness goop), but I see
what you mean.

> A process's autogroup (task group) membership can be viewed via
> the file /proc/[pid]/autogroup:
>
>     $ cat /proc/1/autogroup
>     /autogroup-1 nice 0
>
> This file can also be used to modify the CPU bandwidth allocated
> to a task group.  This is done by writing a number in the "nice"
> range to the file to set the task group's nice value.  The allowed
> range is from +19 (low priority) to -20 (high priority).  Note
> that all values in this range cause a task group to be further
> disfavored by the scheduler, with -20 resulting in the scheduler
> mildly disfavoring the task group and +19 greatly disfavoring it.

Group nice levels work exactly the same as task nice levels, i.e.,
negative nice increases share, positive nice decreases it relative to
the default nice 0.

> ┌─────────────────────────────────────────────────────┐
> │FIXME                                                │
> ├─────────────────────────────────────────────────────┤
> │Regarding the previous paragraph... My tests indi‐   │
> │cate that writing *any* value to the autogroup file  │
> │causes the task group to get a lower priority.

(patchlet.. I'd prefer to whack the knob, but like the on/off switch,
it may be in use, so I guess we're stuck with it)

> ┌─────────────────────────────────────────────────────┐
> │FIXME                                                │
> ├─────────────────────────────────────────────────────┤
> │Is the following correct? Does the statement need to │
> │be more precise? (E.g., in precisely which circum‐   │
> │stances does the use of cgroups override autogroup?) │
> └─────────────────────────────────────────────────────┘
> The use of the cgroups(7) CPU controller overrides the effect
> of autogrouping.

Correct, autogroup defers to cgroups.  Perhaps mention that moving a
task back to the root task group will result in the autogroup again
taking effect.

> ┌─────────────────────────────────────────────────────┐
> │FIXME                                                │
> ├─────────────────────────────────────────────────────┤
> │What needs to be said about autogroup and real-time  │
> │tasks?                                               │
> └─────────────────────────────────────────────────────┘

That it does not group realtime tasks, they are auto-deflected to the
root task group.

	-Mike

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 11:39 ` RFC: documentation of the autogroup feature Mike Galbraith @ 2016-11-23 13:54 ` Michael Kerrisk (man-pages) 2016-11-23 15:33 ` Mike Galbraith 0 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-23 13:54 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Mike, First off, I better say that I'm not at all intimate with the details of the scheduler, so bear with me... On 11/23/2016 12:39 PM, Mike Galbraith wrote: > On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote: > >> ┌─────────────────────────────────────────────────────┐ >> │FIXME │ >> ├─────────────────────────────────────────────────────┤ >> │The following is a little vague. Does it need to be │ >> │made more precise? │ >> └─────────────────────────────────────────────────────┘ >> The CFS scheduler employs an algorithm that distributes the CPU >> across task groups. As a result of this algorithm, the pro‐ >> cesses in task groups that contain multiple CPU-intensive pro‐ >> cesses are in effect disfavored by the scheduler. > > Mmmm, they're actually equalized (modulo smp fairness goop), but I see > what you mean. I couldn't quite grok that sentence. My problem is resolving "they". Do you mean: "the CPU scheduler equalizes the distribution of CPU cycles across task groups"? > >> A process's autogroup (task group) membership can be viewed via >> the file /proc/[pid]/autogroup: >> >> $ cat /proc/1/autogroup >> /autogroup-1 nice 0 >> >> This file can also be used to modify the CPU bandwidth allo‐ >> cated to a task group. This is done by writing a number in the >> "nice" range to the file to set the task group's nice value. >> The allowed range is from +19 (low priority) to -20 (high pri‐ >> ority). 
Note that all values in this range cause a task group >> to be further disfavored by the scheduler, with -20 resulting >> in the scheduler mildy disfavoring the task group and +19 >> greatly disfavoring it. > > Group nice levels exactly work the same as task nice levels, ie > negative nice increases share, positive nice decreases it relative to > the default nice 0. Yes, got it now. >> ┌─────────────────────────────────────────────────────┐ >> │FIXME │ >> ├─────────────────────────────────────────────────────┤ >> │Regarding the previous paragraph... My tests indi‐ │ >> │cate that writing *any* value to the autogroup file │ >> │causes the task group to get a lower priority. > > (patchlet.. Writing documentation finds bugs. Who knew? ;-) > I'd prefer to whack the knob, but like the on/off switch, > it may be in use, so I guess we're stuck with it) > >> ┌─────────────────────────────────────────────────────┐ >> │FIXME │ >> ├─────────────────────────────────────────────────────┤ >> │Is the following correct? Does the statement need to │ >> │be more precise? (E.g., in precisely which circum‐ │ >> │stances does the use of cgroups override autogroup?) │ >> └─────────────────────────────────────────────────────┘ >> The use of the cgroups(7) CPU controller overrides the effect >> of autogrouping. > > Correct, autogroup defers to cgroups. Perhaps mention that moving a > task back to the root task group will result in the autogroup again > taking effect. In what circumstances does a process get moved back to the root task group? Actually, can you define for me what the root task group is, and why it exists? That may be worth some words in this man page. >> ┌─────────────────────────────────────────────────────┐ >> │FIXME │ >> ├─────────────────────────────────────────────────────┤ >> │What needs to be said about autogroup and real-time │ >> │tasks? 
│ >> └─────────────────────────────────────────────────────┘ > > That it does not group realtime tasks, they are auto-deflected to the > root task group. Okay. Thanks. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 13:54 ` Michael Kerrisk (man-pages) @ 2016-11-23 15:33 ` Mike Galbraith 2016-11-23 16:04 ` Michael Kerrisk (man-pages) ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Mike Galbraith @ 2016-11-23 15:33 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Wed, 2016-11-23 at 14:54 +0100, Michael Kerrisk (man-pages) wrote: > Hi Mike, > > First off, I better say that I'm not at all intimate with the details > of the scheduler, so bear with me... > > On 11/23/2016 12:39 PM, Mike Galbraith wrote: > > On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote: > > > > > ┌─────────────────────────────────────────────────────┐ > > > │FIXME │ > > > ├─────────────────────────────────────────────────────┤ > > > │The following is a little vague. Does it need to be │ > > > │made more precise? │ > > > └─────────────────────────────────────────────────────┘ > > > The CFS scheduler employs an algorithm that distributes the CPU > > > across task groups. As a result of this algorithm, the pro‐ > > > cesses in task groups that contain multiple CPU-intensive pro‐ > > > cesses are in effect disfavored by the scheduler. > > > > Mmmm, they're actually equalized (modulo smp fairness goop), but I see > > what you mean. > > I couldn't quite grok that sentence. My problem is resolving "they". > Do you mean: "the CPU scheduler equalizes the distribution of > CPU cycles across task groups"? Sort of. "They" are scheduler entities, runqueue (group) or task. The scheduler equalizes entity vruntimes. > > > │FIXME │ > > > ├─────────────────────────────────────────────────────┤ > > > │Is the following correct? Does the statement need to │ > > > │be more precise? (E.g., in precisely which circum‐ │ > > > │stances does the use of cgroups override autogroup?) 
│ > > > └─────────────────────────────────────────────────────┘ > > > The use of the cgroups(7) CPU controller overrides the effect > > > of autogrouping. > > > > Correct, autogroup defers to cgroups. Perhaps mention that moving a > > task back to the root task group will result in the autogroup again > > taking effect. > > In what circumstances does a process get moved back to the root > task group? Userspace actions, tool or human fingers. > Actually, can you define for me what the root task group is, and > why it exists? That may be worth some words in this man page. I don't think we need group scheduling details, there's plenty of documentation elsewhere for those who want theory. Autogroup is for those who don't want to have to care (which is also why it should have never grown nice knob). -Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 15:33 ` Mike Galbraith @ 2016-11-23 16:04 ` Michael Kerrisk (man-pages) 2016-11-23 17:11 ` Mike Galbraith 2016-11-23 16:05 ` RFC: documentation of the autogroup feature Michael Kerrisk (man-pages) 2016-11-27 21:13 ` Michael Kerrisk (man-pages) 2 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-23 16:04 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Mike, On 11/23/2016 04:33 PM, Mike Galbraith wrote: > On Wed, 2016-11-23 at 14:54 +0100, Michael Kerrisk (man-pages) wrote: >> Hi Mike, >> >> First off, I better say that I'm not at all intimate with the details >> of the scheduler, so bear with me... >> >> On 11/23/2016 12:39 PM, Mike Galbraith wrote: >>> On Tue, 2016-11-22 at 16:59 +0100, Michael Kerrisk (man-pages) wrote: >>> >>>> ┌─────────────────────────────────────────────────────┐ >>>> │FIXME │ >>>> ├─────────────────────────────────────────────────────┤ >>>> │The following is a little vague. Does it need to be │ >>>> │made more precise? │ >>>> └─────────────────────────────────────────────────────┘ >>>> The CFS scheduler employs an algorithm that distributes the CPU >>>> across task groups. As a result of this algorithm, the pro‐ >>>> cesses in task groups that contain multiple CPU-intensive pro‐ >>>> cesses are in effect disfavored by the scheduler. >>> >>> Mmmm, they're actually equalized (modulo smp fairness goop), but I see >>> what you mean. >> >> I couldn't quite grok that sentence. My problem is resolving "they". >> Do you mean: "the CPU scheduler equalizes the distribution of >> CPU cycles across task groups"? > > Sort of. "They" are scheduler entities, runqueue (group) or task. The > scheduler equalizes entity vruntimes. Okay -- I'll see if I can come up with some wording there. 
> >>>> │FIXME │ >>>> ├─────────────────────────────────────────────────────┤ >>>> │Is the following correct? Does the statement need to │ >>>> │be more precise? (E.g., in precisely which circum‐ │ >>>> │stances does the use of cgroups override autogroup?) │ >>>> └─────────────────────────────────────────────────────┘ >>>> The use of the cgroups(7) CPU controller overrides the effect >>>> of autogrouping. >>> >>> Correct, autogroup defers to cgroups. Perhaps mention that moving a >>> task back to the root task group will result in the autogroup again >>> taking effect. >> >> In what circumstances does a process get moved back to the root >> task group? > > Userspace actions, tool or human fingers. Could you say a little more please. What Kernel-user-space APIs/system calls/etc. cause this to happen? >> Actually, can you define for me what the root task group is, and >> why it exists? That may be worth some words in this man page. > > I don't think we need group scheduling details, there's plenty of > documentation elsewhere for those who want theory. Well, you suggested above Perhaps mention that moving a task back to the root task group will result in the autogroup again taking effect. So, that inevitable would lead me and the reader of the man page to ask: what's the root task group? > Autogroup is for > those who don't want to have to care (which is also why it should have > never grown nice knob). Yes, that I understand that much :-). Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 16:04 ` Michael Kerrisk (man-pages) @ 2016-11-23 17:11 ` Mike Galbraith 2016-11-24 21:41 ` RFC: documentation of the autogroup feature [v2] Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Mike Galbraith @ 2016-11-23 17:11 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Wed, 2016-11-23 at 17:04 +0100, Michael Kerrisk (man-pages) wrote: > > > In what circumstances does a process get moved back to the root > > > task group? > > > > Userspace actions, tool or human fingers. > > Could you say a little more please. What Kernel-user-space > APIs/system calls/etc. cause this to happen? Well, the system call would be write(), scribbling in the cgroups vfs interface.. not all that helpful without ever more technical detail. > > > Actually, can you define for me what the root task group is, and > > > why it exists? That may be worth some words in this man page. > > > > I don't think we need group scheduling details, there's plenty of > > documentation elsewhere for those who want theory. > > Well, you suggested above > > Perhaps mention that moving a task back to the root task > group will result in the autogroup again taking effect. Dang, evolution doesn't have an unsend button :) -Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* RFC: documentation of the autogroup feature [v2] 2016-11-23 17:11 ` Mike Galbraith @ 2016-11-24 21:41 ` Michael Kerrisk (man-pages) 2016-11-25 12:52 ` Afzal Mohammed 2016-11-25 13:02 ` Mike Galbraith 0 siblings, 2 replies; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-24 21:41 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Mike, I reworked the text on autogroups, and in the process learned something/have another question. Could you tell me if anything in the below needs fixing/improving, and also let me know about the FIXME? Thanks, Michael The autogroup feature Since Linux 2.6.38, the kernel provides a feature known as autogrouping to improve interactive desktop performance in the face of multiprocess, CPU-intensive workloads such as building the Linux kernel with large numbers of parallel build processes (i.e., the make(1) -j flag). This feature operates in conjunction with the CFS scheduler and requires a kernel that is configured with CONFIG_SCHED_AUTOGROUP. On a running system, this feature is enabled or disabled via the file /proc/sys/kernel/sched_autogroup_enabled; a value of 0 disables the feature, while a value of 1 enables it. The default value in this file is 1, unless the kernel was booted with the noautogroup parameter. A new autogroup is created when a new session is created via setsid(2); this happens, for example, when a new terminal window is started. A new process created by fork(2) inherits its parent's autogroup membership. Thus, all of the processes in a session are members of the same autogroup. An autogroup is automatically destroyed when the last process in the group terminates. When autogrouping is enabled, all of the members of an autogroup are placed in the same kernel scheduler "task group". The CFS scheduler employs an algorithm that equalizes the distribution of CPU cycles across task groups.
The benefits of this for interactive desktop performance can be described via the following example. Suppose that there are two autogroups competing for the same CPU. The first group contains ten CPU-bound processes from a kernel build started with make -j10. The other contains a single CPU-bound process: a video player. The effect of autogrouping is that the two groups will each receive half of the CPU cycles. That is, the video player will receive 50% of the CPU cycles, rather just 9% of the cycles, which would likely lead to degraded video playback. Or to put things another way: an autogroup that contains a large number of CPU-bound processes does not end up overwhelming the CPU at the expense of the other jobs on the system. A process's autogroup (task group) membership can be viewed via the file /proc/[pid]/autogroup: $ cat /proc/1/autogroup /autogroup-1 nice 0 This file can also be used to modify the CPU bandwidth allocated to an autogroup. This is done by writing a number in the "nice" range to the file to set the autogroup's nice value. The allowed range is from +19 (low priority) to -20 (high priority), and the setting has the same effect as modifying the nice level via setpriority(2). (For a discussion of the nice value, see getpriority(2).) ┌─────────────────────────────────────────────────────┐ │FIXME │ ├─────────────────────────────────────────────────────┤ │How do the nice value of a process and the nice │ │value of an autogroup interact? Which has priority? │ │ │ │It *appears* that the autogroup nice value is used │ │for CPU distribution between task groups, and that │ │the process nice value has no effect there. (I.e., │ │suppose two autogroups each contain a CPU-bound │ │process, with one process having nice==0 and the │ │other having nice==19. It appears that they each │ │get 50% of the CPU.) It appears that the process │ │nice value has effect only with respect to schedul‐ │ │ing relative to other processes in the *same* auto‐ │ │group.
Is this correct? │ └─────────────────────────────────────────────────────┘ The use of the cgroups(7) CPU controller overrides the effect of autogrouping. The autogroup feature does not group processes that are scheduled under the real-time and deadline policies. Those processes are scheduled according to the rules described earlier. -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-24 21:41 ` RFC: documentation of the autogroup feature [v2] Michael Kerrisk (man-pages) @ 2016-11-25 12:52 ` Afzal Mohammed 2016-11-25 13:04 ` Michael Kerrisk (man-pages) 2016-11-25 13:02 ` Mike Galbraith 1 sibling, 1 reply; 38+ messages in thread From: Afzal Mohammed @ 2016-11-25 12:52 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Mike Galbraith, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi, On Thu, Nov 24, 2016 at 10:41:29PM +0100, Michael Kerrisk (man-pages) wrote: > Suppose that there are two autogroups competing for the same > CPU. The first group contains ten CPU-bound processes from a > kernel build started with make -j10. The other contains a sin‐ > gle CPU-bound process: a video player. The effect of auto‐ > grouping is that the two groups will each receive half of the > CPU cycles. That is, the video player will receive 50% of the > CPU cycles, rather just 9% of the cycles, which would likely ^^^^ than ? Regards afzal > lead to degraded video playback. Or to put things another way: > an autogroup that contains a large number of CPU-bound pro‐ > cesses does not end up overwhelming the CPU at the expense of > the other jobs on the system. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 12:52 ` Afzal Mohammed @ 2016-11-25 13:04 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 13:04 UTC (permalink / raw) To: Afzal Mohammed Cc: mtk.manpages, Mike Galbraith, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On 11/25/2016 01:52 PM, Afzal Mohammed wrote: > Hi, > > On Thu, Nov 24, 2016 at 10:41:29PM +0100, Michael Kerrisk (man-pages) wrote: > >> Suppose that there are two autogroups competing for the same >> CPU. The first group contains ten CPU-bound processes from a >> kernel build started with make -j10. The other contains a sin‐ >> gle CPU-bound process: a video player. The effect of auto‐ >> grouping is that the two groups will each receive half of the >> CPU cycles. That is, the video player will receive 50% of the >> CPU cycles, rather just 9% of the cycles, which would likely > ^^^^ > than ? > > Regards > afzal Thanks, Afzal. Fixed! Cheers, Michael > >> lead to degraded video playback. Or to put things another way: >> an autogroup that contains a large number of CPU-bound pro‐ >> cesses does not end up overwhelming the CPU at the expense of >> the other jobs on the system. > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-24 21:41 ` RFC: documentation of the autogroup feature [v2] Michael Kerrisk (man-pages) 2016-11-25 12:52 ` Afzal Mohammed @ 2016-11-25 13:02 ` Mike Galbraith 2016-11-25 15:04 ` Michael Kerrisk (man-pages) 1 sibling, 1 reply; 38+ messages in thread From: Mike Galbraith @ 2016-11-25 13:02 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Thu, 2016-11-24 at 22:41 +0100, Michael Kerrisk (man-pages) wrote: > Suppose that there are two autogroups competing for the same > CPU. The first group contains ten CPU-bound processes from a > kernel build started with make -j10. The other contains a sin‐ > gle CPU-bound process: a video player. The effect of auto‐ > grouping is that the two groups will each receive half of the > CPU cycles. That is, the video player will receive 50% of the > CPU cycles, rather just 9% of the cycles, which would likely > lead to degraded video playback. Or to put things another way: > an autogroup that contains a large number of CPU-bound pro‐ > cesses does not end up overwhelming the CPU at the expense of > the other jobs on the system. I'd say something more wishy-washy here, like cycles are distributed fairly across groups and leave it at that, as your detailed example is incorrect due to SMP fairness (which I don't like much because [very unlikely] worst case scenario renders a box sized group incapable of utilizing more that a single CPU total). For example, if a group of NR_CPUS size competes with a singleton, load balancing will try to give the singleton a full CPU of its very own. If groups intersect for whatever reason on say my quad lappy, distribution is 80/20 in favor of the singleton. > ┌─────────────────────────────────────────────────────┐ > │FIXME │ > ├─────────────────────────────────────────────────────┤ > │How do the nice value of a process and the nice │ > │value of an autogroup interact? 
Which has priority? │ > │ │ > │It *appears* that the autogroup nice value is used │ > │for CPU distribution between task groups, and that │ > │the process nice value has no effect there. (I.e., │ > │suppose two autogroups each contain a CPU-bound │ > │process, with one process having nice==0 and the │ > │other having nice==19. It appears that they each │ > │get 50% of the CPU.) It appears that the process │ > │nice value has effect only with respect to schedul‐ │ > │ing relative to other processes in the *same* auto‐ │ > │group. Is this correct? │ > └─────────────────────────────────────────────────────┘ Yup, entity nice level affects distribution among peer entities. -Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 13:02 ` Mike Galbraith @ 2016-11-25 15:04 ` Michael Kerrisk (man-pages) 2016-11-25 15:48 ` Michael Kerrisk (man-pages) ` (2 more replies) 0 siblings, 3 replies; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 15:04 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Mike, On 11/25/2016 02:02 PM, Mike Galbraith wrote: > On Thu, 2016-11-24 at 22:41 +0100, Michael Kerrisk (man-pages) wrote: > >> Suppose that there are two autogroups competing for the same >> CPU. The first group contains ten CPU-bound processes from a >> kernel build started with make -j10. The other contains a sin‐ >> gle CPU-bound process: a video player. The effect of auto‐ >> grouping is that the two groups will each receive half of the >> CPU cycles. That is, the video player will receive 50% of the >> CPU cycles, rather just 9% of the cycles, which would likely >> lead to degraded video playback. Or to put things another way: >> an autogroup that contains a large number of CPU-bound pro‐ >> cesses does not end up overwhelming the CPU at the expense of >> the other jobs on the system. > > I'd say something more wishy-washy here, like cycles are distributed > fairly across groups and leave it at that, I see where you want to go, but the problem is that the word "fair" will invoke different interpretations for different people. So, I think one does need to be a little more concrete. > as your detailed example is > incorrect due to SMP fairness Well, I was trying to exclude SMP from the discussion by saying "competing for the same CPU". Here I was meaning that we involve taskset(1) to confine everyone to the same CPU. Then, I think my example is correct. (I did some light testing before writing that text.) But I guess my meaning wasn't clear enough, and it is a slightly contrived scenario anyway. 
I'll add some words to clarify my example, and also add something to say that the situation is more complex on an SMP system. Something like the following: Suppose that there are two autogroups competing for the same CPU (i.e., presume either a single CPU system or the use of taskset(1) to confine all the processes to the same CPU on an SMP system). The first group contains ten CPU-bound processes from a kernel build started with make -j10. The other contains a single CPU-bound process: a video player. The effect of autogrouping is that the two groups will each receive half of the CPU cycles. That is, the video player will receive 50% of the CPU cycles, rather than just 9% of the cycles, which would likely lead to degraded video playback. The situation on an SMP system is more complex, but the general effect is the same: the scheduler distributes CPU cycles across task groups such that an autogroup that contains a large number of CPU-bound processes does not end up hogging CPU cycles at the expense of the other jobs on the system. > (which I don't like much because [very > unlikely] worst case scenario renders a box sized group incapable of > utilizing more that a single CPU total). For example, if a group of > NR_CPUS size competes with a singleton, load balancing will try to give > the singleton a full CPU of its very own. If groups intersect for > whatever reason on say my quad lappy, distribution is 80/20 in favor of > the singleton. Thanks for the additional info. Good for educating me, but I think you'll agree it's more than we need for the man page. >> ┌─────────────────────────────────────────────────────┐ >> │FIXME │ >> ├─────────────────────────────────────────────────────┤ >> │How do the nice value of a process and the nice │ >> │value of an autogroup interact? Which has priority? │ >> │ │ >> │It *appears* that the autogroup nice value is used │ >> │for CPU distribution between task groups, and that │ >> │the process nice value has no effect there.
(I.e., │ >> │suppose two autogroups each contain a CPU-bound │ >> │process, with one process having nice==0 and the │ >> │other having nice==19. It appears that they each │ >> │get 50% of the CPU.) It appears that the process │ >> │nice value has effect only with respect to schedul‐ │ >> │ing relative to other processes in the *same* auto‐ │ >> │group. Is this correct? │ >> └─────────────────────────────────────────────────────┘ > > Yup, entity nice level affects distribution among peer entities. Huh! I only just learned about this via my experiments while investigating autogroups. How long have things been like this? Always? (I don't think so.) Since the arrival of CFS? Since the arrival of autogrouping? (I'm guessing not.) Since some other point? (When?) It seems to me that this renders the traditional process nice pretty much useless. (I bet I'm not the only one who'd be surprised by the current behavior.) Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
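The single-CPU experiment sketched in the revised text can be scripted. This is a hedged stand-in, not the exact test Michael ran: the hog counts and the one-second duration are arbitrary choices, and it assumes util-linux setsid(1) and taskset(1) with permission to pin to CPU 0. Two new sessions (hence two autogroups) are confined to CPU 0, one with four busy loops and one with a single busy loop; with autogrouping enabled, top(1) should show each group getting roughly half of CPU 0.

```shell
# Run $1 busy loops for about one second, pinned to CPU 0, inside a
# brand-new session (and therefore a new autogroup).
run_group() {
    N=$1 setsid taskset -c 0 sh -c '
        pids=
        while [ "$N" -gt 0 ]; do
            while :; do :; done &      # one CPU hog
            pids="$pids $!"
            N=$((N - 1))
        done
        sleep 1                        # let the hogs compete
        kill $pids
    '
}

if command -v setsid >/dev/null 2>&1 && command -v taskset >/dev/null 2>&1; then
    run_group 4 &    # stand-in for "make -j4"
    run_group 1 &    # stand-in for the video player
    wait || true
    result=ran
else
    result=skipped
fi
echo "experiment: $result"
```

While the script runs, `top` (press `1` for per-CPU view) should show the singleton's process at ~50% of CPU 0 and the four hogs at ~12.5% each; with autogrouping disabled, all five hogs converge on ~20%.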
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 15:04 ` Michael Kerrisk (man-pages) @ 2016-11-25 15:48 ` Michael Kerrisk (man-pages) 2016-11-25 15:51 ` Mike Galbraith 2016-11-25 16:04 ` Peter Zijlstra 2 siblings, 0 replies; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 15:48 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On 11/25/2016 04:04 PM, Michael Kerrisk (man-pages) wrote: > Hi Mike, > > On 11/25/2016 02:02 PM, Mike Galbraith wrote: >>> ┌─────────────────────────────────────────────────────┐ >>> │FIXME │ >>> ├─────────────────────────────────────────────────────┤ >>> │How do the nice value of a process and the nice │ >>> │value of an autogroup interact? Which has priority? │ >>> │ │ >>> │It *appears* that the autogroup nice value is used │ >>> │for CPU distribution between task groups, and that │ >>> │the process nice value has no effect there. (I.e., │ >>> │suppose two autogroups each contain a CPU-bound │ >>> │process, with one process having nice==0 and the │ >>> │other having nice==19. It appears that they each │ >>> │get 50% of the CPU.) It appears that the process │ >>> │nice value has effect only with respect to schedul‐ │ >>> │ing relative to other processes in the *same* auto‐ │ >>> │group. Is this correct? │ >>> └─────────────────────────────────────────────────────┘ >> >> Yup, entity nice level affects distribution among peer entities. > > Huh! I only just learned about this via my experiments while > investigating autogroups. > > How long have things been like this? Always? (I don't think > so.) Since the arrival of CFS? Since the arrival of > autogrouping? (I'm guessing not.) Since some other point? > (When?) Okay, things changed sometime after 2.6.31, at least. (Just tested on an old box.) So, presumably with the arrival of either CFS or autogrouping? 
Next comment certainly applies: > It seems to me that this renders the traditional process > nice pretty much useless. (I bet I'm not the only one who'd > be surprised by the current behavior.) Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 15:04 ` Michael Kerrisk (man-pages) 2016-11-25 15:48 ` Michael Kerrisk (man-pages) @ 2016-11-25 15:51 ` Mike Galbraith 2016-11-25 16:08 ` Michael Kerrisk (man-pages) 2016-11-25 16:04 ` Peter Zijlstra 2 siblings, 1 reply; 38+ messages in thread From: Mike Galbraith @ 2016-11-25 15:51 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Fri, 2016-11-25 at 16:04 +0100, Michael Kerrisk (man-pages) wrote: > > > ┌─────────────────────────────────────────────────────┐ > > > │FIXME │ > > > ├─────────────────────────────────────────────────────┤ > > > │How do the nice value of a process and the nice │ > > > │value of an autogroup interact? Which has priority? │ > > > │ │ > > > │It *appears* that the autogroup nice value is used │ > > > │for CPU distribution between task groups, and that │ > > > │the process nice value has no effect there. (I.e., │ > > > │suppose two autogroups each contain a CPU-bound │ > > > │process, with one process having nice==0 and the │ > > > │other having nice==19. It appears that they each │ > > > │get 50% of the CPU.) It appears that the process │ > > > │nice value has effect only with respect to schedul‐ │ > > > │ing relative to other processes in the *same* auto‐ │ > > > │group. Is this correct? │ > > > └─────────────────────────────────────────────────────┘ > > > > Yup, entity nice level affects distribution among peer entities. > > Huh! I only just learned about this via my experiments while > investigating autogroups. > > How long have things been like this? Always? (I don't think > so.) Since the arrival of CFS? Since the arrival of > autogrouping? (I'm guessing not.) Since some other point? > (When?) Always. Before CFS there just were no non-peers :) > It seems to me that this renders the traditional process > nice pretty much useless. 
(I bet I'm not the only one who'd > be surprised by the current behavior.) Yup, group scheduling is not a single edged sword, those don't exist. Box wide nice loss is not the only thing that can bite you, fairness, whether group or task oriented cuts both ways. -Mike ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 15:51 ` Mike Galbraith @ 2016-11-25 16:08 ` Michael Kerrisk (man-pages) 2016-11-25 16:18 ` Peter Zijlstra 0 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 16:08 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On 11/25/2016 04:51 PM, Mike Galbraith wrote: > On Fri, 2016-11-25 at 16:04 +0100, Michael Kerrisk (man-pages) wrote: > >>>> ┌─────────────────────────────────────────────────────┐ >>>> │FIXME │ >>>> ├─────────────────────────────────────────────────────┤ >>>> │How do the nice value of a process and the nice │ >>>> │value of an autogroup interact? Which has priority? │ >>>> │ │ >>>> │It *appears* that the autogroup nice value is used │ >>>> │for CPU distribution between task groups, and that │ >>>> │the process nice value has no effect there. (I.e., │ >>>> │suppose two autogroups each contain a CPU-bound │ >>>> │process, with one process having nice==0 and the │ >>>> │other having nice==19. It appears that they each │ >>>> │get 50% of the CPU.) It appears that the process │ >>>> │nice value has effect only with respect to schedul‐ │ >>>> │ing relative to other processes in the *same* auto‐ │ >>>> │group. Is this correct? │ >>>> └─────────────────────────────────────────────────────┘ >>> >>> Yup, entity nice level affects distribution among peer entities. >> >> Huh! I only just learned about this via my experiments while >> investigating autogroups. >> >> How long have things been like this? Always? (I don't think >> so.) Since the arrival of CFS? Since the arrival of >> autogrouping? (I'm guessing not.) Since some other point? >> (When?) > > Always. Before CFS there just were no non-peers :) Well that's one way of looking at it. So, the change that I'm talking about came in 2.6.32 with CFS then? >> It seems to me that this renders the traditional process >> nice pretty much useless. 
(I bet I'm not the only one who'd >> be surprised by the current behavior.) > > Yup, group scheduling is not a single edged sword, those don't exist. > Box wide nice loss is not the only thing that can bite you, fairness, > whether group or task oriented cuts both ways. Understood. But again I'll say, I bet a lot of old-time users (and maybe many newer) would be surprised by the fact that nice(1) / setpriority(2) have effectively been rendered no-ops in many use cases. At the very least, it'd have been nice if someone had sent a man pages patch or at least a note... Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
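Michael's observation is easy to reproduce from a shell. A hedged sketch follows (it assumes CONFIG_SCHED_AUTOGROUP and util-linux setsid(1)): two commands launched in separate sessions fall into separate autogroups, so a per-process nice value set with nice(1) is invisible between them; across groups, only the group's own nice value, shown in the autogroup file, matters.

```shell
# Start one command at nice 19 and one at nice 0, each in its own
# session (hence its own autogroup), and show their group membership.
if [ -r /proc/self/autogroup ] && command -v setsid >/dev/null 2>&1; then
    g1=$(nice -n 19 setsid sh -c 'cat /proc/self/autogroup' 2>/dev/null) \
        || g1="failed"
    g2=$(setsid sh -c 'cat /proc/self/autogroup' 2>/dev/null) \
        || g2="failed"
    # Different autogroup IDs; the "nice" column is the GROUP's value
    # (0 by default), untouched by the per-process nice -n 19.
    echo "nice 19 session: $g1"
    echo "nice  0 session: $g2"
else
    g1="autogroup or setsid(1) not available"
    echo "$g1"
fi
```

If both sessions then run a CPU hog on the same CPU, they split it ~50/50 despite the nice 19, which is exactly the "nice(1) rendered a no-op across sessions" effect under discussion.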
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 16:08 ` Michael Kerrisk (man-pages) @ 2016-11-25 16:18 ` Peter Zijlstra 2016-11-25 16:34 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Peter Zijlstra @ 2016-11-25 16:18 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Fri, Nov 25, 2016 at 05:08:44PM +0100, Michael Kerrisk (man-pages) wrote: > On 11/25/2016 04:51 PM, Mike Galbraith wrote: > Well that's one way of looking at it. So, the change > that I'm talking about came in 2.6.32 with CFS then? cfs-cgroup landed later I think, and it was fairly wobbly in the first few release (as per usual I'd say for major features). ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 16:18 ` Peter Zijlstra @ 2016-11-25 16:34 ` Michael Kerrisk (man-pages) 2016-11-25 20:54 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 16:34 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages, Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner On 11/25/2016 05:18 PM, Peter Zijlstra wrote: > On Fri, Nov 25, 2016 at 05:08:44PM +0100, Michael Kerrisk (man-pages) wrote: >> On 11/25/2016 04:51 PM, Mike Galbraith wrote: >> Well that's one way of looking at it. So, the change >> that I'm talking about came in 2.6.32 with CFS then? > > cfs-cgroup landed later I think, and it was fairly wobbly in the first > few release (as per usual I'd say for major features). So I've been searching git logs and elsewhere, but didn't yet find a likely commit(s). Any clues what I should be looking for. I'd like this info, because while documenting the changes, I'd also like to document when they occurred. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 16:34 ` Michael Kerrisk (man-pages) @ 2016-11-25 20:54 ` Michael Kerrisk (man-pages) 2016-11-25 21:49 ` Peter Zijlstra 0 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 20:54 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages, Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Peter, On 11/25/2016 05:34 PM, Michael Kerrisk (man-pages) wrote: > On 11/25/2016 05:18 PM, Peter Zijlstra wrote: >> On Fri, Nov 25, 2016 at 05:08:44PM +0100, Michael Kerrisk (man-pages) wrote: >>> On 11/25/2016 04:51 PM, Mike Galbraith wrote: >>> Well that's one way of looking at it. So, the change >>> that I'm talking about came in 2.6.32 with CFS then? >> >> cfs-cgroup landed later I think, and it was fairly wobbly in the first >> few release (as per usual I'd say for major features). > > So I've been searching git logs and elsewhere, but didn't yet > find a likely commit(s). Any clues what I should be looking for. > I'd like this info, because while documenting the changes, I'd > also like to document when they occurred. So, part of what I was struggling with was what you meant by cfs-cgroup. Do you mean the CFS bandwidth control features added in Linux 3.2? Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 20:54 ` Michael Kerrisk (man-pages) @ 2016-11-25 21:49 ` Peter Zijlstra 2016-11-29 7:43 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Peter Zijlstra @ 2016-11-25 21:49 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Fri, Nov 25, 2016 at 09:54:05PM +0100, Michael Kerrisk (man-pages) wrote: > So, part of what I was struggling with was what you meant by cfs-cgroup. > Do you mean the CFS bandwidth control features added in Linux 3.2? Nope, /me digs around for a bit... around here I suppose: 68318b8e0b61 ("Hook up group scheduler with control groups") 68318b8e0b61 v2.6.24-rc1~151 But I really have no idea what that looked like. In any case, for the case of autogroup, the behaviour has always been, autogroups came quite late. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 21:49 ` Peter Zijlstra @ 2016-11-29 7:43 ` Michael Kerrisk (man-pages) 2016-11-29 11:46 ` Peter Zijlstra 0 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-29 7:43 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages, Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Peter, On 11/25/2016 10:49 PM, Peter Zijlstra wrote: > On Fri, Nov 25, 2016 at 09:54:05PM +0100, Michael Kerrisk (man-pages) wrote: >> So, part of what I was struggling with was what you meant by cfs-cgroup. >> Do you mean the CFS bandwidth control features added in Linux 3.2? > > Nope, /me digs around for a bit... around here I suppose: > > 68318b8e0b61 ("Hook up group scheduler with control groups") Thanks. The pieces are starting to fall into place now. > 68318b8e0b61 v2.6.24-rc1~151 > > But I really have no idea what that looked like. > > In any case, for the case of autogroup, the behaviour has always been, > autogroups came quite late. This ("the behavior has always been") isn't quite true. Yes, group scheduling has been around since Linux 2.6.24, but in terms of the semantics of the thread nice value, there was no visible change then, *unless* explicit action was taken to create cgroups. The arrival of autogroups in Linux 2.6.38 was different. With this feature enabled (which is the default), task groups were implicitly created *without the user needing to do anything*. Thus, [two terminal windows] == [two task groups] and in those two terminal windows, nice(1) on a CPU-bound command in one terminal did nothing in terms of improving CPU access for CPU-bound tasks running in the other terminal window. Put more succinctly: in Linux 2.6.38, autogrouping broke nice(1) for many use cases.
Once I came to that simple summary it was easy to find multiple reports of problems from users: http://serverfault.com/questions/405092/nice-level-not-working-on-linux http://superuser.com/questions/805599/nice-has-no-effect-in-linux-unless-the-same-shell-is-used https://www.reddit.com/r/linux/comments/1c4jew/nice_has_no_effect/ http://stackoverflow.com/questions/10342470/process-niceness-priority-setting-has-no-effect-on-linux Someone else quickly pointed out to me another such report: https://bbs.archlinux.org/viewtopic.php?id=149553 And when I quickly surveyed a few more or less savvy Linux users in one room, most understood what nice does, but none of them knew about the behavior change wrought by autogroup. I haven't looked at all of the mails in the old threads that discussed the implementation of this feature, but so far none of those that I saw mentioned this behavior change. It's unfortunate that it never even got documented. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-29 7:43 ` Michael Kerrisk (man-pages) @ 2016-11-29 11:46 ` Peter Zijlstra 2016-11-29 13:44 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Peter Zijlstra @ 2016-11-29 11:46 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Tue, Nov 29, 2016 at 08:43:33AM +0100, Michael Kerrisk (man-pages) wrote: > > > > In any case, for the case of autogroup, the behaviour has always been, > > autogroups came quite late. > > This ("the behavior has always been") isn't quite true. Yes, group > scheduling has been around since Linux 2.6.24, but in terms of the > semantics of the thread nice value, there was no visible change > then, *unless* explicit action was taken to create cgroups. > > The arrival of autogroups in Linux 2.6.38 was different. > With this feature enabled (which is the default), task I don't think the SCHED_AUTOGROUP symbol is default y, most distros might have default enabled it, but that's not something I can help. > groups were implicitly created *without the user needing to > do anything*. Thus, [two terminal windows] == [two task groups] > and in those two terminal windows, nice(1) on a CPU-bound > command in one terminal did nothing in terms of improving > CPU access for a CPU-bound tasks running on the other terminal > window. > > Put more succinctly: in Linux 2.6.38, autogrouping broke nice(1) > for many use cases. 
> > Once I came to that simple summary it was easy to find multiple > reports of problems from users: > > http://serverfault.com/questions/405092/nice-level-not-working-on-linux > http://superuser.com/questions/805599/nice-has-no-effect-in-linux-unless-the-same-shell-is-used > https://www.reddit.com/r/linux/comments/1c4jew/nice_has_no_effect/ > http://stackoverflow.com/questions/10342470/process-niceness-priority-setting-has-no-effect-on-linux > > Someone else quickly pointed out to me another such report: > > https://bbs.archlinux.org/viewtopic.php?id=149553 Well, none of that ever got back to me, so again, nothing I could do about that. > And when I quickly surveyed a few more or less savvy Linux users > in one room, most understood what nice does, but none of them knew > about the behavior change wrought by autogroup. > > I haven't looked at all of the mails in the old threads that > discussed the implementation of this feature, but so far none of > those that I saw mentioned this behavior change. It's unfortunate > that it never even got documented. Well, when we added the feature people (most notably Linus) understood what cgroups did. So no surprises for any of us. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-29 11:46 ` Peter Zijlstra @ 2016-11-29 13:44 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-29 13:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Peter, On 29 November 2016 at 12:46, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Nov 29, 2016 at 08:43:33AM +0100, Michael Kerrisk (man-pages) wrote: >> > >> > In any case, for the case of autogroup, the behaviour has always been, >> > autogroups came quite late. >> >> This ("the behavior has always been") isn't quite true. Yes, group >> scheduling has been around since Linux 2.6.24, but in terms of the >> semantics of the thread nice value, there was no visible change >> then, *unless* explicit action was taken to create cgroups. >> >> The arrival of autogroups in Linux 2.6.38 was different. >> With this feature enabled (which is the default), task > > I don't think the SCHED_AUTOGROUP symbol is default y, most distros > might have default enabled it, but that's not something I can help. Actually, it looks to me like it is the default. But that isn't really the point. Even if the default was off, it's the way of things that distros will generally default "on" things, because some users want them. That's a repeated and to be expected pattern. >> groups were implicitly created *without the user needing to >> do anything*. Thus, [two terminal windows] == [two task groups] >> and in those two terminal windows, nice(1) on a CPU-bound >> command in one terminal did nothing in terms of improving >> CPU access for a CPU-bound tasks running on the other terminal >> window. >> >> Put more succinctly: in Linux 2.6.38, autogrouping broke nice(1) >> for many use cases. 
>>
>> Once I came to that simple summary it was easy to find multiple
>> reports of problems from users:
>>
>> http://serverfault.com/questions/405092/nice-level-not-working-on-linux
>> http://superuser.com/questions/805599/nice-has-no-effect-in-linux-unless-the-same-shell-is-used
>> https://www.reddit.com/r/linux/comments/1c4jew/nice_has_no_effect/
>> http://stackoverflow.com/questions/10342470/process-niceness-priority-setting-has-no-effect-on-linux
>>
>> Someone else quickly pointed out to me another such report:
>>
>> https://bbs.archlinux.org/viewtopic.php?id=149553
>
> Well, none of that ever got back to me, so again, nothing I could do
> about that.

I understand. It's just unfortunate that (as far as I can see) the
implications were not fully considered before making the change. Such
consideration often springs out of writing comprehensive documentation,
I find ;-).

>> And when I quickly surveyed a few more or less savvy Linux users
>> in one room, most understood what nice does, but none of them knew
>> about the behavior change wrought by autogroup.
>>
>> I haven't looked at all of the mails in the old threads that
>> discussed the implementation of this feature, but so far none of
>> those that I saw mentioned this behavior change. It's unfortunate
>> that it never even got documented.
>
> Well, when we added the feature people (most notably Linus) understood
> what cgroups did. So no surprises for any of us.

Sure, but cgroups is different. It requires explicit action by the
user (creating cgroups) to see the behavior. With autogroups, the
change kicks in on the desktop without the user needing to do anything,
and changes desktop behavior in a way that was unexpected.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 38+ messages in thread
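[Editorial note: the "autogrouping broke nice(1)" claim above can be made concrete with the CFS nice-to-weight mapping; the weights below are from the kernel's sched_prio_to_weight table (nice 0 maps to 1024, nice 19 to 15), but the sketch itself is illustrative Python, not kernel code.]

```python
# CFS weights for a few nice levels, from the kernel's
# sched_prio_to_weight table (nice 0 -> 1024, each step ~1.25x).
WEIGHT = {-20: 88761, 0: 1024, 19: 15}

def shares(weights):
    """WFQ: each entity's CPU share among peers at the same level."""
    total = sum(weights)
    return [w / total for w in weights]

# Both tasks in the SAME task group: the nice value works as expected,
# the nice-19 task gets only ~1.4% of the CPU.
same_group = shares([WEIGHT[0], WEIGHT[19]])

# Tasks in two DIFFERENT autogroups: only the groups' (equal, default)
# weights compete, so the nice-19 task still gets 50% of the CPU.
different_groups = shares([1024, 1024])
```

This is exactly the surprise reported in the links above: renicing a task changes its share only relative to peers inside its own group.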
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 15:04 ` Michael Kerrisk (man-pages) 2016-11-25 15:48 ` Michael Kerrisk (man-pages) 2016-11-25 15:51 ` Mike Galbraith @ 2016-11-25 16:04 ` Peter Zijlstra 2016-11-25 16:13 ` Peter Zijlstra 2016-11-25 16:33 ` Michael Kerrisk (man-pages) 2 siblings, 2 replies; 38+ messages in thread From: Peter Zijlstra @ 2016-11-25 16:04 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote: > >> ┌─────────────────────────────────────────────────────┐ > >> │FIXME │ > >> ├─────────────────────────────────────────────────────┤ > >> │How do the nice value of a process and the nice │ > >> │value of an autogroup interact? Which has priority? │ > >> │ │ > >> │It *appears* that the autogroup nice value is used │ > >> │for CPU distribution between task groups, and that │ > >> │the process nice value has no effect there. (I.e., │ > >> │suppose two autogroups each contain a CPU-bound │ > >> │process, with one process having nice==0 and the │ > >> │other having nice==19. It appears that they each │ > >> │get 50% of the CPU.) It appears that the process │ > >> │nice value has effect only with respect to schedul‐ │ > >> │ing relative to other processes in the *same* auto‐ │ > >> │group. Is this correct? │ > >> └─────────────────────────────────────────────────────┘ > > > > Yup, entity nice level affects distribution among peer entities. > > Huh! I only just learned about this via my experiments while > investigating autogroups. > > How long have things been like this? Always? (I don't think > so.) Since the arrival of CFS? Since the arrival of > autogrouping? (I'm guessing not.) Since some other point? > (When?) 
Ever since cfs-cgroup, this is a fundamental design point of cgroups,
and has therefore always been the case for autogroups (as that is
nothing more than an application of the cgroup code).

> It seems to me that this renders the traditional process
> nice pretty much useless. (I bet I'm not the only one who'd
> be surprised by the current behavior.)

It's really rather fundamental to how the whole hierarchical thing
works.

CFS is a weighted fair queueing scheduler; this means each entity
receives:

                 w_i
    dt_i = dt --------
              \Sum w_j


                 CPU
           ______/ \______
          /    |     |    \
         A     B     C     D


So if each entity {A,B,C,D} has equal weight, then they will receive
equal time. Explicitly, for C you get:

                      w_C
    dt_C = dt -----------------------
              (w_A + w_B + w_C + w_D)


Extending this to a hierarchy, we get:

                 CPU
           ______/ \______
          /    |     |    \
         A     B     C     D
                    / \
                   E   F

Where C becomes a 'server' for entities {E,F}. The weight of C does not
depend on its child entities. This way the time of {E,F} becomes a
straight product of their ratio with C. That is; the whole thing
becomes, where l denotes the level in the hierarchy and i an
entity on that level:

                  l     w_g,i
    dt_l,i = dt \Prod ----------
                 g=0  \Sum w_g,j


Or more concretely, for E:

                        w_E
    dt_1,E = dt_0,C -----------
                    (w_E + w_F)

                    w_C                 w_E
           = dt ----------------------- -----------
                (w_A + w_B + w_C + w_D) (w_E + w_F)


And this 'trivially' extends to SMP, with the tricky bit being that the
sums over all entities end up being machine wide, instead of per CPU,
which is a real and royal pain for performance.

Note that this property, where the weight of the server entity is
independent from its child entities is a desired feature. Without that
it would be impossible to control the relative weights of groups, and
that is the sole parameter of the WFQ model.

It is also why Linus so likes autogroups, each session competes equally
amongst one another.

^ permalink raw reply	[flat|nested] 38+ messages in thread
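[Editorial note: the per-level product above can be checked numerically; a minimal sketch of the formula, with all entity weights illustratively set to the nice-0 weight of 1024.]

```python
def wfq_share(path):
    """CPU fraction of an entity: the product over hierarchy levels of
    w_g,i / Sum_j w_g,j -- i.e. dt_l,i / dt from the formula above.
    path is a list of (weight, sibling_weights) pairs from root down."""
    share = 1.0
    for w, sibling_weights in path:
        share *= w / sum(sibling_weights)
    return share

w = 1024
# E's share in the example hierarchy: first C among {A,B,C,D},
# then E among {E,F}.
share_E = wfq_share([(w, [w, w, w, w]),
                     (w, [w, w])])
# 1/4 * 1/2 = 1/8 of the CPU, independent of how many tasks
# the other groups contain.
```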
* Re: RFC: documentation of the autogroup feature [v2]
  2016-11-25 16:04 ` Peter Zijlstra
@ 2016-11-25 16:13 ` Peter Zijlstra
  2016-11-25 16:33 ` Michael Kerrisk (man-pages)
  1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2016-11-25 16:13 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner

On Fri, Nov 25, 2016 at 05:04:56PM +0100, Peter Zijlstra wrote:
> That is; the whole thing
> becomes, where l denotes the level in the hierarchy and i an
> entity on that level:
>
>                   l     w_g,i
>     dt_l,i = dt \Prod ----------
>                  g=0  \Sum w_g,j
>
>
> Or more concretely, for E:
>
>                         w_E
>     dt_1,E = dt_0,C -----------
>                     (w_E + w_F)
>
>                     w_C                 w_E
>            = dt ----------------------- -----------
>                 (w_A + w_B + w_C + w_D) (w_E + w_F)
>

And this also immediately shows one of the 'problems' with it. Since we
don't have floating point in kernel, these fractions are evaluated with
fixed-point arithmetic. Traditionally (and on 32bit) we use 10bit fixed
point, recently we switched to 20bit for 64bit machines. That change is
what bit you on the nice testing.

But it also means that once we run out of fractional bits things go
wobbly. The fractions, as per the above, increase the deeper the group
hierarchy goes but are also affected by the number of CPUs in the
system (not immediately represented in that equation).

Not to mention that many scheduler operations become O(depth) in cost,
which also hurts. An obvious example being task selection, we pick a
runnable entity for each level, until the resulting entity has no
further children (iow. is a task).

^ permalink raw reply	[flat|nested] 38+ messages in thread
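[Editorial note: the loss of fractional bits Peter describes can be demonstrated with plain integer arithmetic. The 10-bit/20-bit figures come from the mail; the competing-task scenario is made up for illustration.]

```python
def fixed_share(w, total, frac_bits):
    """w/total as a fixed-point fraction with frac_bits fractional
    bits, the way a kernel without floating point must compute it."""
    return (w << frac_bits) // total

# One nice-19 task (weight 15) competing with 63 nice-0 tasks
# (weight 1024 each) -- e.g. weights summed machine-wide on SMP.
total = 15 + 63 * 1024

share_10bit = fixed_share(15, total, 10)   # underflows to 0
share_20bit = fixed_share(15, total, 20)   # still representable
```

With only 10 fractional bits the nice-19 task's share rounds to nothing, which is the kind of wobbliness that deeper hierarchies and larger machines provoke.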
* Re: RFC: documentation of the autogroup feature [v2] 2016-11-25 16:04 ` Peter Zijlstra 2016-11-25 16:13 ` Peter Zijlstra @ 2016-11-25 16:33 ` Michael Kerrisk (man-pages) 2016-11-25 22:48 ` Peter Zijlstra 1 sibling, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-25 16:33 UTC (permalink / raw) To: Peter Zijlstra Cc: mtk.manpages, Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner Hi Peter, On 11/25/2016 05:04 PM, Peter Zijlstra wrote: > On Fri, Nov 25, 2016 at 04:04:25PM +0100, Michael Kerrisk (man-pages) wrote: >>>> ┌─────────────────────────────────────────────────────┐ >>>> │FIXME │ >>>> ├─────────────────────────────────────────────────────┤ >>>> │How do the nice value of a process and the nice │ >>>> │value of an autogroup interact? Which has priority? │ >>>> │ │ >>>> │It *appears* that the autogroup nice value is used │ >>>> │for CPU distribution between task groups, and that │ >>>> │the process nice value has no effect there. (I.e., │ >>>> │suppose two autogroups each contain a CPU-bound │ >>>> │process, with one process having nice==0 and the │ >>>> │other having nice==19. It appears that they each │ >>>> │get 50% of the CPU.) It appears that the process │ >>>> │nice value has effect only with respect to schedul‐ │ >>>> │ing relative to other processes in the *same* auto‐ │ >>>> │group. Is this correct? │ >>>> └─────────────────────────────────────────────────────┘ >>> >>> Yup, entity nice level affects distribution among peer entities. >> >> Huh! I only just learned about this via my experiments while >> investigating autogroups. >> >> How long have things been like this? Always? (I don't think >> so.) Since the arrival of CFS? Since the arrival of >> autogrouping? (I'm guessing not.) Since some other point? >> (When?) > > Ever since cfs-cgroup, Okay. That begs the question still though. 
> this is a fundamental design point of cgroups,
> and has therefore always been the case for autogroups (as that is
> nothing more than an application of the cgroup code).

Understood.

>> It seems to me that this renders the traditional process
>> nice pretty much useless. (I bet I'm not the only one who'd
>> be surprised by the current behavior.)
>
> It's really rather fundamental to how the whole hierarchical thing
> works.
>
> CFS is a weighted fair queueing scheduler; this means each entity
> receives:
>
>                  w_i
>     dt_i = dt --------
>               \Sum w_j
>
>
>                  CPU
>            ______/ \______
>           /    |     |    \
>          A     B     C     D
>
>
> So if each entity {A,B,C,D} has equal weight, then they will receive
> equal time. Explicitly, for C you get:
>
>                       w_C
>     dt_C = dt -----------------------
>               (w_A + w_B + w_C + w_D)
>
>
> Extending this to a hierarchy, we get:
>
>                  CPU
>            ______/ \______
>           /    |     |    \
>          A     B     C     D
>                     / \
>                    E   F
>
> Where C becomes a 'server' for entities {E,F}. The weight of C does not
> depend on its child entities. This way the time of {E,F} becomes a
> straight product of their ratio with C. That is; the whole thing
> becomes, where l denotes the level in the hierarchy and i an
> entity on that level:
>
>                   l     w_g,i
>     dt_l,i = dt \Prod ----------
>                  g=0  \Sum w_g,j
>
>
> Or more concretely, for E:
>
>                         w_E
>     dt_1,E = dt_0,C -----------
>                     (w_E + w_F)
>
>                     w_C                 w_E
>            = dt ----------------------- -----------
>                 (w_A + w_B + w_C + w_D) (w_E + w_F)
>
>
> And this 'trivially' extends to SMP, with the tricky bit being that the
> sums over all entities end up being machine wide, instead of per CPU,
> which is a real and royal pain for performance.

Okay -- you're really quite the ASCII artist. And somehow,
I think you needed to compose the mail in LaTeX. But thanks
for the detail. It's helpful, for me at least.

> Note that this property, where the weight of the server entity is
> independent from its child entities is a desired feature. Without that
> it would be impossible to control the relative weights of groups, and
> that is the sole parameter of the WFQ model.
>
> It is also why Linus so likes autogroups, each session competes equally
> amongst one another.

I get it. But, the behavior changes for the process nice value are
undocumented, and they should be documented. I understand what the
behavior change was. But not yet when.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature [v2]
  2016-11-25 16:33 ` Michael Kerrisk (man-pages)
@ 2016-11-25 22:48 ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2016-11-25 22:48 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Mike Galbraith, Ingo Molnar, linux-man, lkml, Thomas Gleixner

On Fri, Nov 25, 2016 at 05:33:23PM +0100, Michael Kerrisk (man-pages) wrote:
> Okay -- you're really quite the ASCII artist. And somehow,
> I think you needed to compose the mail in LaTeX. But thanks
> for the detail. It's helpful, for me at least.

Hehe, it's been a while since I did LaTeX, so I'd probably make a mess
of it :-) Glad my ramblings made sense.

> > Note that this property, where the weight of the server entity is
> > independent from its child entities is a desired feature. Without that
> > it would be impossible to control the relative weights of groups, and
> > that is the sole parameter of the WFQ model.
> >
> > It is also why Linus so likes autogroups, each session competes equally
> > amongst one another.
>
> I get it. But, the behavior changes for the process nice value are
> undocumented, and they should be documented. I understand
> what the behavior change was. But not yet when.

Well, it's all undocumented -- I suppose you're about to go fix that :-)

But think of it differently, think of the group as a container, then the
behaviour inside the container is exactly as expected.

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 15:33 ` Mike Galbraith 2016-11-23 16:04 ` Michael Kerrisk (man-pages) @ 2016-11-23 16:05 ` Michael Kerrisk (man-pages) 2016-11-23 17:19 ` Mike Galbraith 2016-11-27 21:13 ` Michael Kerrisk (man-pages) 2 siblings, 1 reply; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-23 16:05 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner > I don't think we need group scheduling details, there's plenty of > documentation elsewhere for those who want theory. Actually, which documentation were you referring to here? Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 16:05 ` RFC: documentation of the autogroup feature Michael Kerrisk (man-pages) @ 2016-11-23 17:19 ` Mike Galbraith 2016-11-23 22:12 ` Michael Kerrisk (man-pages) 0 siblings, 1 reply; 38+ messages in thread From: Mike Galbraith @ 2016-11-23 17:19 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Wed, 2016-11-23 at 17:05 +0100, Michael Kerrisk (man-pages) wrote: > > I don't think we need group scheduling details, there's plenty of > > documentation elsewhere for those who want theory. > > Actually, which documentation were you referring to here? Documentation/scheduler/* ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature 2016-11-23 17:19 ` Mike Galbraith @ 2016-11-23 22:12 ` Michael Kerrisk (man-pages) 0 siblings, 0 replies; 38+ messages in thread From: Michael Kerrisk (man-pages) @ 2016-11-23 22:12 UTC (permalink / raw) To: Mike Galbraith Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On 11/23/2016 06:19 PM, Mike Galbraith wrote: > On Wed, 2016-11-23 at 17:05 +0100, Michael Kerrisk (man-pages) wrote: >>> I don't think we need group scheduling details, there's plenty of >>> documentation elsewhere for those who want theory. >> >> Actually, which documentation were you referring to here? > > Documentation/scheduler/* I think there's a lot less information in there than you think... Certainly, I can't get any big picture from reading those docs. Cheers Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature
  2016-11-23 15:33 ` Mike Galbraith
  2016-11-23 16:04 ` Michael Kerrisk (man-pages)
  2016-11-23 16:05 ` RFC: documentation of the autogroup feature Michael Kerrisk (man-pages)
@ 2016-11-27 21:13 ` Michael Kerrisk (man-pages)
  2016-11-28  1:46 ` Mike Galbraith
  2 siblings, 1 reply; 38+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-11-27 21:13 UTC (permalink / raw)
To: Mike Galbraith
Cc: mtk.manpages, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

Hi Mike,

On 11/23/2016 04:33 PM, Mike Galbraith wrote:
> On Wed, 2016-11-23 at 14:54 +0100, Michael Kerrisk (man-pages) wrote:
>> Hi Mike,

[...]

>> Actually, can you define for me what the root task group is, and
>> why it exists? That may be worth some words in this man page.
>
> I don't think we need group scheduling details, there's plenty of
> documentation elsewhere for those who want theory. Autogroup is for
> those who don't want to have to care (which is also why it should have
> never grown nice knob).

Actually, the more I think about this, the more I think we *do* need
a few details on group scheduling. Otherwise, it's difficult to explain
to the user why nice(1) no longer works as traditionally expected.

Here's my attempt to define the root task group:

* If autogrouping is disabled, then all processes in the root CPU
  cgroup form a scheduling group (sometimes called the "root task
  group").

Can you improve on this?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature
  2016-11-27 21:13 ` Michael Kerrisk (man-pages)
@ 2016-11-28  1:46 ` Mike Galbraith
  [not found] ` <1127218a-dd9b-71a8-845d-3a83969632fc@gmail.com>
  0 siblings, 1 reply; 38+ messages in thread
From: Mike Galbraith @ 2016-11-28 1:46 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

On Sun, 2016-11-27 at 22:13 +0100, Michael Kerrisk (man-pages) wrote:

> Here's my attempt to define the root task group:
>
> * If autogrouping is disabled, then all processes in the root CPU
>   cgroup form a scheduling group (sometimes called the "root task
>   group").
>
> Can you improve on this?

A task group is a set of percpu runqueues. The root task group is the
top level set in a hierarchy of such sets when group scheduling is
enabled, or the only set when group scheduling is not enabled. The
autogroup hierarchy has a depth of one, i.e., all autogroups are peers
whose common parent is the root task group.

	-Mike

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: RFC: documentation of the autogroup feature
  [not found] ` <1127218a-dd9b-71a8-845d-3a83969632fc@gmail.com>
@ 2016-11-29  9:10 ` Michael Kerrisk (man-pages)
  2016-11-29 13:46 ` Mike Galbraith
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-11-29 9:10 UTC (permalink / raw)
To: Mike Galbraith
Cc: Michael Kerrisk, Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner

[Resending because of bounces from the lists. (Somehow my mailer
messed up the MIME labeling)]

Hi Mike,

On 11/28/2016 02:46 AM, Mike Galbraith wrote:
> On Sun, 2016-11-27 at 22:13 +0100, Michael Kerrisk (man-pages) wrote:
>
>> Here's my attempt to define the root task group:
>>
>> * If autogrouping is disabled, then all processes in the root CPU
>>   cgroup form a scheduling group (sometimes called the "root task
>>   group").
>>
>> Can you improve on this?

The below is helpful, but...

> A task group is a set of percpu runqueues.

The explanation needs really to be in terms of what user-space
understands and sees. "Runqueues" are a kernel scheduler
implementation detail.

> The root task group is the
> top level set in a hierarchy of such sets when group scheduling is
> enabled, or the only set when group scheduling is not enabled. The
> autogroup hierarchy has a depth of one, i.e., all autogroups are peers
> whose common parent is the root task group.

Let's try and go further. How's this:

    When scheduling non-real-time processes (i.e., those scheduled
    under the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the
    CFS scheduler employs a technique known as "group scheduling", if
    the kernel was configured with the CONFIG_FAIR_GROUP_SCHED option
    (which is typical).

    Under group scheduling, threads are scheduled in "task groups".
    Task groups have a hierarchical relationship, rooted under the
    initial task group on the system, known as the "root task group".
    Task groups are formed in the following circumstances:

    * All of the threads in a CPU cgroup form a task group. The parent
      of this task group is the task group of the corresponding parent
      cgroup.

    * If autogrouping is enabled, then all of the threads that are
      (implicitly) placed in an autogroup (i.e., the same session, as
      created by setsid(2)) form a task group. Each new autogroup is
      thus a separate task group. The root task group is the parent
      of all such autogroups.

    * If autogrouping is enabled, then the root task group consists
      of all processes in the root CPU cgroup that were not otherwise
      implicitly placed into a new autogroup.

    * If autogrouping is disabled, then the root task group consists
      of all processes in the root CPU cgroup.

    * If group scheduling was disabled (i.e., the kernel was
      configured without CONFIG_FAIR_GROUP_SCHED), then all of the
      processes on the system are notionally placed in a single task
      group.

[To be followed by a discussion of the nice value and task groups]

?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 38+ messages in thread
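[Editorial note: the rules in Michael's draft text can be condensed into a small decision function. This is purely illustrative pseudo-logic -- the names are not kernel APIs, and it sidesteps the "not otherwise placed into an autogroup" corner case by assuming every session has an autogroup.]

```python
def task_group_of(fair_group_sched, autogroup_enabled,
                  cpu_cgroup, session):
    """Which task group a thread is scheduled in, per the draft rules.
    cpu_cgroup is "/" for the root CPU cgroup; session identifies the
    thread's session (each session gets its own autogroup)."""
    if not fair_group_sched:
        return "single task group"        # CONFIG_FAIR_GROUP_SCHED=n
    if cpu_cgroup != "/":
        return "cgroup " + cpu_cgroup     # explicit CPU cgroup wins
    if autogroup_enabled:
        return "autogroup of session " + str(session)
    return "root task group"
```

The ordering reflects the draft: cgroup placement is explicit and takes precedence, autogrouping only sorts the processes left in the root CPU cgroup.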
* Re: RFC: documentation of the autogroup feature 2016-11-29 9:10 ` Michael Kerrisk (man-pages) @ 2016-11-29 13:46 ` Mike Galbraith 0 siblings, 0 replies; 38+ messages in thread From: Mike Galbraith @ 2016-11-29 13:46 UTC (permalink / raw) To: mtk.manpages Cc: Peter Zijlstra, Ingo Molnar, linux-man, lkml, Thomas Gleixner On Tue, 2016-11-29 at 10:10 +0100, Michael Kerrisk (man-pages) wrote: > Let's try and go further. How's this: > > When scheduling non-real-time processes (i.e., those scheduled > under the SCHED_OTHER, SCHED_BATCH, and SCHED_IDLE policies), the > CFS scheduler employs a technique known as "group scheduling", if > the kernel was configured with the CONFIG_FAIR_GROUP_SCHED option > (which is typical). > > Under group scheduling, threads are scheduled in "task groups". > Task groups have a hierarchical relationship, rooted under the > initial task group on the system, known as the "root task group". > Task groups are formed in the following circumstances: > > * All of the threads in a CPU cgroup form a task group. The par‐ > ent of this task group is the task group of the corresponding > parent cgroup. > > * If autogrouping is enabled, then all of the threads that are > (implicitly) placed in an autogroup (i.e., the same session, as > created by setsid(2)) form a task group. Each new autogroup is > thus a separate task group. The root task group is the parent > of all such autogroups. > > * If autogrouping is enabled, then the root task group consists > of all processes in the root CPU cgroup that were not otherwise > implicitly placed into a new autogroup. > > * If autogrouping is disabled, then the root task group consists > of all processes in the root CPU cgroup. > > * If group scheduling was disabled (i.e., the kernel was config‐ > ured without CONFIG_FAIR_GROUP_SCHED), then all of the pro‐ > cesses on the system are notionally placed in a single task > group. Notionally works for me. 
-Mike ^ permalink raw reply [flat|nested] 38+ messages in thread