From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Tejun Heo <tj@kernel.org>, Peter Zijlstra <peterz@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, torvalds@linux-foundation.org,
    akpm@linux-foundation.org, mingo@redhat.com, lizefan@huawei.com,
    pjt@google.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-api@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCHSET RFC cgroup/for-4.6] cgroup, sched: implement resource group and PRIO_RGRP
Date: Wed, 13 Apr 2016 09:43:01 +0200
Message-ID: <1460533381.3780.191.camel@gmail.com>
In-Reply-To: <20160412222915.GT24661@htj.duckdns.org>

On Tue, 2016-04-12 at 18:29 -0400, Tejun Heo wrote:
> Hello, Peter.
>
> On Sat, Apr 09, 2016 at 03:39:17PM +0200, Peter Zijlstra wrote:
> > > While the separate buckets and entities model may not be as elegant as
> > > tree of uniform objects, it is far from uncommon and more robust when
> > > dealing with different types of objects.
> >
> > The graph does not care about the type of objects the nodes represent,
> > and proportional weight distribution only cares about the edges.
> >
> > With cpu-cgroup the nodes are not of uniform type either, they can be a
> > group or a task. You get runtime type identification and make it work.
> >
> > There just isn't an excuse for crazy crap like this. It's wrong, no two
> > ways about it.
>
> Abstracting tasks and groups as equivalent objects works well for the
> scheduler and that's great. This is also because the domain lends
> itself very well to such simple and elegant approach. The only
> entities of interest are tasks, as you and Mike pointed out earlier in
> the thread, and group priority can be easily mapped to task priority.
> However, this isn't necessarily the case for other controllers.
>
> There's also the issue of mapping the model to absolute controllers.
> For the uniform model to work, there must be a way to treat internal
> and leaf entities in the same way. For memory, the leaf entities are
> processes and applying the same model would mean that memory
> controller would have to implement equivalent per-process control
> knobs. We don't have that. In fact, we can't have that - a
> significant part of memory consumption can't be attached to a single
> process. There is a fundamental distinction between internal and leaf
> nodes in the memory resource graph.
>
> We aren't designing a spherical cow in a vacuum, and, I believe,
> should aspire to make pragmatic trade-offs of all involved factors.
> If multiple controllers co-operating on the same resource domains is
> beneficial and required, we should figure out a way to make different
> controllers agree and that way most likely will require some
> trade-offs from various controllers.
>
> Given the currently known requirements and constraints, restricting
> internal competition is a simple and straight-forward way to isolate
> leaf node handling details of different controllers.
>
> The cost is part aesthetical and part practical. While less elegant
> than tree of uniform objects, it seems a stretch to call internal /
> leaf node distinction broken especially given that the model is
> natural to some controllers.

That justifies prohibiting proper usages of three controllers, cpu,
cpuacct and cpuset?

> The practical cost is loss of the ability to let leaf entities compete
> against groups. However, we can't evaluate how important such
> capability is without actual use-cases. If there are important ones,
> please bring them up, so that we can examine the actual requirements
> and try to find a good trade-off to support them.

Hm, I thought Google did that, and I know I mentioned another gigabuck
sized outfit. Whatever, ob trade-off..

Another cpuset example is something I was asked to look into recently.
There are folks out in the real world who want to run RT guests. Now
VIRTUAL REALtime tickles my funny-bone, but I piddled around with it
nonetheless to see what such can deliver (not much). System thing
and/or libvirt created a cpuset home for qemu, but with VPUs sharing
CPU with other qemu threads and the rest of the world, RT performance
in little virtual box was as pathetic as one would expect.

What did I do about it? Among others, the obvious, I created an
exclusive cpuset, and distributed qemu contexts having different
requirements among context containment vessels having the required
properties.

I won't be doing any more of that particular scenario, but certainly
will want to distribute various contexts among various context
containment vessels in future. I soon enough won't care about cgroups,
but others will surely expect cpu, cpuacct and cpuset controllers to
continue to function properly.

> I understand that CPU controller getting constrained due to other
> controllers can feel frustrating; however, the constraint is there to
> solve practical problems which hopefully are being explained in this
> conversation. If there is a better trade-off, we can easily get rid
> of it and move on, but such decision can only be made considering all
> the relevant factors. If you can think of a better solution, let's
> please discuss it.

None here. Any artificial restriction placed on controllers will
render same broken in one way or another that will matter to someone
somewhere. Making something less than it was will do that.

	-Mike
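
[Editor's illustration, not part of the original message.] The message does not show the commands used for the exclusive-cpuset setup it describes. What follows is only a minimal sketch of that kind of per-thread placement, assuming the cgroup v1 cpuset controller. The helper names (make_exclusive_cpuset, move_thread), the mount point, the group name, the CPU range, and the thread ID are all hypothetical. The file names cpuset.cpus, cpuset.mems, cpuset.cpu_exclusive, and tasks are the standard v1 cpuset interface files.

```python
from pathlib import Path


def make_exclusive_cpuset(root, name, cpus, mems="0"):
    """Create a child cpuset under `root` and request exclusive use of `cpus`.

    `root` is assumed to be a cgroup v1 cpuset directory (hypothetically
    /sys/fs/cgroup/cpuset). On a real system this needs root privileges, and
    the kernel may reject the cpu_exclusive write, for example if the parent
    is not exclusive or a sibling cpuset overlaps the requested CPUs.
    """
    cs = Path(root) / name
    cs.mkdir(exist_ok=True)
    # cpus and mems must be populated before any task can be attached.
    (cs / "cpuset.cpus").write_text(cpus)
    (cs / "cpuset.mems").write_text(mems)
    (cs / "cpuset.cpu_exclusive").write_text("1")
    return cs


def move_thread(cpuset, tid):
    """Attach a single thread (not the whole process) to `cpuset`.

    In cgroup v1, writing a thread ID to the "tasks" file moves only that
    thread, which is what allows threads of one process to be spread
    across different groups.
    """
    (Path(cpuset) / "tasks").write_text(str(tid))
```

Under those assumptions, usage might look like `rt = make_exclusive_cpuset("/sys/fs/cgroup/cpuset", "rt-vcpus", "2-3")` followed by `move_thread(rt, vcpu_tid)` for each vCPU thread, with the remaining qemu threads left in their original group.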