From: Peter Zijlstra <peterz@infradead.org>
To: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, torvalds@linux-foundation.org,
    akpm@linux-foundation.org, mingo@redhat.com, lizefan@huawei.com,
    pjt@google.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-api@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP
Date: Thu, 7 Apr 2016 22:25:42 +0200
Message-ID: <20160407202542.GD3448@twins.programming.kicks-ass.net>
In-Reply-To: <20160407194555.GI7822@mtj.duckdns.org>

On Thu, Apr 07, 2016 at 03:45:55PM -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Thu, Apr 07, 2016 at 10:08:33AM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 07, 2016 at 03:35:47AM -0400, Johannes Weiner wrote:
> > > So it was a nice cleanup for the memory controller and I believe the
> > > IO controller as well. I'd be curious how it'd be a problem for CPU?
> >
> > The full hierarchy took years to make work and is fully ingrained with
> > how the thing works, changing it isn't going to be nice or easy.
> >
> > So sure, go with a lowest common denominator, instead of fixing shit,
> > yay for progress :/
>
> It's easy to get fixated on what each subsystem can do and develop
> towards different directions siloed in each subsystem. That's what
> we've had for quite a while in cgroup. Expectedly, this sends off
> controllers towards different directions. Direct competition between
> tasks and child cgroups was one of the main sources of balkanization.
>
> The balkanization was no coincidence either. Tasks and cgroups are
> different types of entities and don't have the same control knobs or
> follow the same lifetime rules. For absolute limits, it isn't clear
> how much of the parent's resources should be distributed to internal
> children as opposed to child cgroups. People end up depending on
> specific implementation details and proposing one-off hacks and
> interface additions.

Yes, I'm familiar with the problem; but simply mandating leaf only nodes
is not a solution, for the very simple fact that there are tasks in the
root cgroup that cannot ever be moved out, so we _must_ be able to deal
with !leaf nodes containing tasks.

A consistent interface for absolute controllers to divvy up the
resources between local tasks and child cgroups isn't _that_ hard.

And this leaf only business totally screwed over anything proportional.

This simply cannot work.

> Proportional weights aren't much better either. CPU has internal
> mapping between nice values and shares and treats them equally, which
> can get confusing as the configured weights behave differently
> depending on how many threads are in the parent cgroup which often is
> opaque and can't be controlled from outside.

Huh what? There's nothing confusing there, the nice to weight mapping is
static and can easily be consulted. Alternatively we can make an
interface where you can set weight through nice values, for those people
that are afraid of numbers.

But the configured weights do _not_ behave differently depending on the
number of tasks, they behave exactly as specified in the proportional
weight based rate distribution. We've done the math..

> Widely diverging from
> CPU's behavior, IO grouped all internal tasks into an internal leaf
> node and used to assign a fixed weight to it.

That's just plain broken... That is not how a proportional weight based
hierarchical controller works.

> Now, you might think that none of it matters and each subsystem
> treating cgroup hierarchy as arbitrary and orthogonal collections of
> bean counters is fine; however, that makes it impossible to account
> for and control operations which span different types of resources.
> This prevented us from implementing resource control over frigging
> buffered writes, making the whole IO control thing a joke. While CPU
> currently doesn't directly tie into it, that is only because CPU
> cycles spent during writeback isn't yet properly accounted.

CPU cycles spent in waitqueues aren't properly accounted to whoever
queued the job either, and there's a metric ton of async stuff that's
not properly accounted, so what?

> However, please understand that there are a lot of use cases where
> comprehensive and consistent resource accounting and control over all
> major resources is useful and necessary.

Maybe, but so far I've only heard people complain this v2 thing didn't
work for them, and as far as I can see the whole v2 model is internally
inconsistent and impossible to implement.

The suggestion by Johannes to adjust the leaf node weight depending on
the number of tasks in it is so ludicrous I don't even know where to
start enumerating the fail.
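
Illustration (not part of the original message): the disagreement above
turns on how a proportional-weight hierarchy splits CPU time when tasks
and child cgroups sit side by side in the same parent. The sketch below
is a toy model of that rule, assuming a single CPU and that every task
is always runnable. The function name shares(), the dict layout, and the
example tree are hypothetical; the handful of nice-to-weight entries are
the ones I recall from the scheduler's static table (nice 0 maps to
1024) and are worth checking against the kernel tree in use.

```python
from fractions import Fraction

# A few entries of the static nice-to-weight mapping (nice 0 == 1024).
# Assumption: reproduced for illustration only, not the full table.
NICE_TO_WEIGHT = {-5: 3121, -1: 1277, 0: 1024, 1: 820, 5: 335, 10: 110, 19: 15}


def shares(node, total=Fraction(1)):
    """Split `total` among the entities of one group by weight.

    `node` is a hypothetical dict layout:
        {"tasks": {name: nice}, "groups": {name: (weight, child_node)}}
    Tasks and child groups compete as siblings at the same level, and a
    child group's slice is split recursively among its own entities.
    Returns {task_path: fraction_of_cpu}.
    """
    entities = []
    for name, nice in node.get("tasks", {}).items():
        entities.append((name, NICE_TO_WEIGHT[nice], None))
    for name, (weight, child) in node.get("groups", {}).items():
        entities.append((name, weight, child))

    denom = sum(weight for _, weight, _ in entities)
    result = {}
    for name, weight, child in entities:
        part = total * Fraction(weight, denom)
        if child is None:
            result[name] = part
        else:
            for sub, frac in shares(child, part).items():
                result[f"{name}/{sub}"] = frac
    return result


# Root holds two nice-0 tasks plus child group A (weight 1024),
# and A holds two nice-0 tasks of its own.
root = {
    "tasks": {"t1": 0, "t2": 0},
    "groups": {"A": (1024, {"tasks": {"a1": 0, "a2": 0}})},
}
result = shares(root)
for path, frac in sorted(result.items()):
    print(path, frac)
```

In this model group A is one of three equal-weight entities in the root,
so it receives a third of the CPU and each of its tasks a sixth. Adding
a third task to the root would cut A to a quarter. That is the effect
Tejun describes as weights behaving differently depending on the number
of threads in the parent, and it is also exactly what the proportional
formula specifies, which is Peter's point.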