linux-kernel.vger.kernel.org archive mirror
From: Vivek Goyal <vgoyal@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: Tejun Heo <tj@kernel.org>,
	containers@lists.linux-foundation.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, Neil Horman <nhorman@tuxdriver.com>,
	Michal Hocko <mhocko@suse.cz>, Paul Mackerras <paulus@samba.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Johannes Weiner <hannes@cmpxchg.org>, Thomas Graf <tgraf@suug.ch>,
	"Serge E. Hallyn" <serue@us.ibm.com>,
	Paul Turner <pjt@google.com>, Ingo Molnar <mingo@redhat.com>
Subject: Re: [RFC] cgroup TODOs
Date: Fri, 14 Sep 2012 09:58:30 -0400
Message-ID: <20120914135830.GB6221@redhat.com>
In-Reply-To: <20120914091032.GA6819@redhat.com>

On Fri, Sep 14, 2012 at 10:10:32AM +0100, Daniel P. Berrange wrote:

[..]
> > 6. Multiple hierarchies
> > 
> >   Apart from the apparent wheeeeeeeeness of it (I think I talked about
> >   that enough the last time[1]), there's a basic problem when more
> >   than one controllers interact - it's impossible to define a resource
> >   group when more than two controllers are involved because the
> >   intersection of different controllers is only defined in terms of
> >   tasks.
> > 
> >   IOW, if an entity X is of interest to two controllers, there's no
> >   way to map X to the cgroups of the two controllers.  X may belong to
> >   A and B when viewed by one task but A' and B when viewed by another.
> >   This already is a head scratcher in writeback where blkcg and memcg
> >   have to interact.
> > 
> >   While I am pushing for unified hierarchy, I think it's necessary to
> >   have different levels of granularities depending on controllers
> >   given that nesting involves significant overhead and noticeable
> >   controller-dependent behavior changes.
> > 
> >   Solution:
> > 
> >   I think a unified hierarchy with the ability to ignore subtrees
> >   depending on controllers should work.  For example, let's assume the
> >   following hierarchy.
> > 
> >           R
> > 	/   \
> >        A     B
> >       / \
> >      AA AB
> > 
> >   All controllers are co-mounted.  There is per-cgroup knob which
> >   controls which controllers nest beyond it.  If blkio doesn't want to
> >   distinguish AA and AB, the user can specify that blkio doesn't nest
> >   beyond A and blkio would see the tree as,
> > 
> >           R
> > 	/   \
> >        A     B
> > 
> >   While other controllers keep seeing the original tree.  The exact
> >   form of interface, I don't know yet.  It could be a single file
> >   which the user echoes [-]controller name into it or per-controller
> >   boolean file.
> > 
> >   I think this level of flexibility should be enough for most use
> >   cases.  If someone disagrees, please voice your objections now.
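
To make the quoted proposal concrete, here is a minimal sketch of how a
controller's view of the tree could be derived from a per-cgroup "do not
nest beyond here" setting. Everything in it is an assumption for
illustration: PARENT, NO_NEST and effective_cgroup() are hypothetical
names, not kernel interfaces, and the actual knob is still undecided.

```python
# Hypothetical model of the quoted proposal; none of these names are
# kernel interfaces.

# child -> parent for the example hierarchy: R -> {A, B}, A -> {AA, AB}
PARENT = {"R": None, "A": "R", "B": "R", "AA": "A", "AB": "A"}

# cgroup -> controllers that do not nest beyond that cgroup
NO_NEST = {"A": {"blkio"}}


def effective_cgroup(cgroup, controller):
    """Return the cgroup that `controller` sees for a task placed in `cgroup`."""
    # Collect the path from the cgroup up to the root.
    path = []
    node = cgroup
    while node is not None:
        path.append(node)
        node = PARENT[node]
    # Walk root-first: the topmost ancestor that stops nesting wins.
    for ancestor in reversed(path):
        if controller in NO_NEST.get(ancestor, set()):
            return ancestor
    return cgroup


print(effective_cgroup("AA", "blkio"))   # A
print(effective_cgroup("AA", "memory"))  # AA
```

With this mapping blkio treats tasks in AA and AB as members of A, while
every other controller keeps seeing the original tree.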

Tejun, Daniel,

I am a little concerned about the above and wonder how systemd and libvirt
will interact and behave out of the box.

Currently systemd does not create its own hierarchy under blkio, but
libvirt does. So co-mounting all controllers means there is no way to
avoid the overhead of the systemd-created hierarchy.

\
|
+- system
     |
     +- libvirtd.service
              |
              +- virt-machine1
              +- virt-machine2

So there is no way to avoid the overhead of the two levels of hierarchy
created by systemd. I really wish that systemd would get rid of the "system"
cgroup and put services directly in the top-level group. Creating deeper
hierarchies is expensive.

I just want to state clearly that with the above model, it will not
be possible for libvirt to avoid the hierarchy levels created by systemd.
So the solution would be to keep the depth of the hierarchy as low as
possible and to keep controller overhead as low as possible.
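
As a rough way to see how deep a task ends up, the sketch below parses
text in the /proc/<pid>/cgroup format ("hierarchy-ID:controller-list:path")
and reports the nesting depth per controller. The function name and the
sample line are made up for illustration; the sample path simply mirrors
the tree drawn above.

```python
def cgroup_depths(proc_cgroup_text):
    """Map each controller to the nesting depth of its cgroup path."""
    depths = {}
    for line in proc_cgroup_text.splitlines():
        if not line.strip():
            continue
        # Each line is "hierarchy-ID:controller-list:path".
        _hier_id, controllers, path = line.split(":", 2)
        # Depth is the number of non-empty path components; "/" is depth 0.
        depth = len([part for part in path.split("/") if part])
        for controller in controllers.split(","):
            depths[controller] = depth
    return depths


# Hypothetical sample mirroring the systemd + libvirt layout above.
sample = (
    "4:blkio:/system/libvirtd.service/virt-machine1\n"
    "3:cpu,cpuacct:/\n"
)
print(cgroup_depths(sample))
```

For a live task the same function can be fed the contents of
/proc/self/cgroup.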

Now I know that with blkio, group idling kills performance. So one solution
could be that on anything fast, don't use CFQ. Use deadline, and then the
group idling overhead goes away and tools like systemd and libvirt don't
have to worry about keeping track of disks and which scheduler is running.
They don't want to do that and expect the kernel to get it right.
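
For reference, this is roughly the bookkeeping a tool would have to do
today. The sketch reads the existing sysfs files
/sys/block/<dev>/queue/scheduler and /sys/block/<dev>/queue/rotational.
Treating "non-rotational" as "fast" and preferring deadline there is only
an assumption for illustration, and the function names are hypothetical.

```python
from pathlib import Path


def parse_scheduler(text):
    """Return (active, available) from the contents of queue/scheduler.

    The file lists all schedulers and brackets the active one,
    e.g. "noop deadline [cfq]".
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def suggest_scheduler(text, rotational):
    """Assumed policy: prefer deadline on non-rotational devices."""
    active, available = parse_scheduler(text)
    if not rotational and "deadline" in available:
        return "deadline"
    return active


def report(sys_block="/sys/block"):
    """Print the current and suggested scheduler for each block device."""
    for dev in sorted(Path(sys_block).glob("*")):
        try:
            sched = (dev / "queue" / "scheduler").read_text()
            rotational = (dev / "queue" / "rotational").read_text().strip() == "1"
        except OSError:
            continue  # device without a request queue, or unreadable
        active, _ = parse_scheduler(sched)
        print(dev.name, active, "->", suggest_scheduler(sched, rotational))
```

Calling report() only prints a suggestion; actually switching means
writing the scheduler name to the same queue/scheduler file as root.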

But getting that right out of the box does not happen today, as CFQ
is the default on everything. Distributions can carry their own patches
to do some approximation, but it would be better to have a mechanism
in the kernel that selects a suitable IO scheduler out of the box for a
storage lun. It is more important now than ever, since the blkio controller
has come into the picture.

The above is the scenario I am most worried about: CFQ shows up by default
on all the luns, systemd and libvirt create 4-5 level deep hierarchies
by default, and IO performance sucks out of the box. CFQ already
underperforms on fast storage, and with group creation the problem becomes
worse.

Thanks
Vivek
