From: Vivek Goyal <vgoyal@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Li Zefan <lizefan@huawei.com>,
	Michal Hocko <mhocko@suse.cz>,
	Glauber Costa <glommer@parallels.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Paul Turner <pjt@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Thomas Graf <tgraf@suug.ch>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Neil Horman <nhorman@tuxdriver.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Serge Hallyn <serge.hallyn@ubuntu.com>
Subject: Re: [RFC] cgroup TODOs
Date: Fri, 14 Sep 2012 10:25:39 -0400
Message-ID: <20120914142539.GC6221@redhat.com>
In-Reply-To: <20120913205827.GO7677@google.com>

On Thu, Sep 13, 2012 at 01:58:27PM -0700, Tejun Heo wrote:

[..]
>   * blkio is the most problematic.  It has two sub-controllers - cfq
>     and blk-throttle.  Both are utterly broken in terms of hierarchy
>     support and the former is known to have pretty hairy code base.  I
>     don't see any other way than just biting the bullet and fixing it.

I am still a little concerned about changing the blkio behavior
unexpectedly. Can we have some kind of mount-time flag which retains
the old flat behavior, and warn the user that this mode is deprecated
and will soon be removed, so they should move over to hierarchical
mode? Then after a few releases we can drop the flag and clean up any
extra code which supports flat mode in CFQ. This will at least make
the transition smooth.

> 
>   * cgroup_freezer and others shouldn't be too difficult to fix.
> 
>   Who:
> 
>   memcg can be handled by memcg people and I can handle cgroup_freezer
>   and others with help from the authors.  The problematic one is
>   blkio.  If anyone is interested in working on blkio, please be my
>   guest.  Vivek?  Glauber?

I will try to spend some time on this. Making changes in blk-throttle
should be relatively easy. The painful part is CFQ. It does so much
that it is not clear whether a particular change will bite us badly or
not, so making changes becomes hard. There are heuristics, preemptions,
queue selection logic and service trees, and bringing it all together
for full hierarchy becomes interesting.

I think the first thing which needs to be done is to merge group
scheduling and cfqq scheduling. Because of the flat hierarchy, we
currently use two scheduling algorithms: the old logic for queue
selection and the new logic for group scheduling. If we treat tasks
and groups at the same level, then we have to merge the two and come
up with a single logic.

Glauber, feel free to jump into it if you like. We can sort it out
together.

[..]
>   * Vivek brought up the issue of distributing resources to tasks and
>     groups in the same cgroup.  I don't know.  Need to think more
>     about it.

This one will require some thought. I have heard arguments for both
models. Treating tasks and groups at the same level seems to have one
disadvantage: people can't think of system resources in terms of %.
People often say, give 20% of disk resources to a particular cgroup.
But that is not possible, as there are all the kernel threads running
in the root cgroup and tasks come and go, which means the % share of a
group is variable and not fixed.

To make it fixed, we will need to make sure that the number of entities
fighting for resources is not variable. That means only groups fight
for resources at a level, and tasks fight within groups.
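
To make the arithmetic concrete, here is a rough sketch in Python (not
kernel code). It assumes a simple proportional-weight model where an
entity's share is its weight divided by the sum of sibling weights.
The function names and all weights are made up for illustration.

```python
def share(weights, name):
    """Fraction of a resource that `name` gets among its siblings,
    assuming plain proportional weights (an illustrative model only)."""
    return weights[name] / sum(weights.values())


def group_share_with_root_tasks(group_weight, task_weight, n_tasks):
    """Share of one group when n_tasks tasks compete at the same level."""
    siblings = {"grp": group_weight}
    for i in range(n_tasks):
        siblings[f"task{i}"] = task_weight
    return share(siblings, "grp")


# Tasks and groups at the same level: the group's share moves as
# tasks come and go (hypothetical weights of 500 each).
mixed = [group_share_with_root_tasks(500, 500, n) for n in (1, 3, 9)]

# Only groups compete at this level: the share stays put no matter
# how many tasks live inside each group.
groups_only = share({"grp": 200, "other": 800}, "grp")

print(mixed, groups_only)
```

With one, three and nine competing root tasks the same group weight
yields a different fraction each time, while the groups-only case
depends only on the two group weights.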

Now the question is whether the kernel should enforce it or whether it
should be left to user space. I think doing it in user space is also
messy, as different agents control different parts of the hierarchy.
For example, if somebody says to give a particular virtual machine x%
of system resources, libvirt has no way to do that. At most it can
ensure x% of the parent group, but above that the hierarchy is
controlled by systemd, and libvirtd has no control over that.

The only possible way to do this seems to be that systemd creates the
libvirt group at the top level with a minimum fixed % of quota, and
then libvirt can figure out the % share of each virtual machine. But
it is hard to do.
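
As a rough illustration of why both levels have to cooperate, here is
a small Python sketch (again not kernel code). It assumes the
effective share of a leaf is the product of its fraction at each level
of the hierarchy. The function name and the numbers are hypothetical.

```python
def effective_share(path):
    """Effective fraction of the whole system for a leaf group.

    `path` lists (weight, total_sibling_weight) pairs from the top
    level down; the model assumed here just multiplies the per-level
    fractions together.
    """
    result = 1.0
    for weight, sibling_total in path:
        result *= weight / sibling_total
    return result


# Hypothetical setup: systemd gives the libvirt group 400 out of 1000
# at the top level, and libvirt gives one VM 250 out of 1000 inside
# its own group.
vm_share = effective_share([(400, 1000), (250, 1000)])

# If the top-level fraction changes, the VM's system-wide share
# changes too, even though libvirt's own setting is untouched.
vm_share_after = effective_share([(200, 1000), (250, 1000)])

print(vm_share, vm_share_after)
```

In this model libvirt only controls the second factor, so it cannot
promise a system-wide % unless whoever owns the top level keeps the
first factor fixed.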

So while the % model is more intuitive to users, it is hard to
implement. An easier way is to stick to the model of relative
weights/shares and let the user specify the relative importance of a
virtual machine; the actual quota or % will vary dynamically depending
on other tasks/components in the system.

Thoughts?

Thanks
Vivek
