From: Vivek Goyal <vgoyal@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@parallels.com>,
	linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
	Li Zefan <lizf@cn.fujitsu.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Paul Turner <pjt@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Thomas Graf <tgraf@suug.ch>,
	"Serge E. Hallyn" <serue@us.ibm.com>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@ghostprotocols.net>,
	Neil Horman <nhorman@tuxdriver.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	rni@google.com, ctalbott@google.com
Subject: Re: Block IO controller hierarchy suppport (Was: Re: [PATCH RFC cgroup/for-3.7] cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them)
Date: Thu, 13 Sep 2012 22:53:17 -0400
Message-ID: <20120914025317.GB4333@redhat.com>
In-Reply-To: <20120913220613.GP7677@google.com>

On Thu, Sep 13, 2012 at 03:06:13PM -0700, Tejun Heo wrote:
> Hey, Vivek.
> 
> (cc'ing Rakesh and Chad who work on iosched in google).
> 
> On Thu, Sep 13, 2012 at 10:53:41AM -0400, Vivek Goyal wrote:
> > The biggest problem with the blkcg CFQ implementation is idling on the
> > cgroup. If we don't idle on the cgroup, then we don't get service
> > differentiation for most of the workloads, and if we do idle then
> > performance starts to suck very soon (the moment a few cgroups are
> > created). And hierarchy will just exacerbate this problem, because then
> > one will try to idle at each group in the hierarchy.
> > 
> > This problem is similar to CFQ's idling on sequential queues for
> > ioprio. Because we never idled on random IO queues, ioprio never worked
> > on random IO queues. And the same is true for buffered write queues.
> > Similarly, if you don't idle on groups, then for most of the workloads
> > service differentiation is not visible. Only for workloads which are
> > highly sequential in nature can one see service differentiation.
> > 
> > That's one fundamental problem for which we need to have a good answer
> > before we try to do more work on blkcg. We can write as much code as we
> > like, but at the end of the day it might still not be useful because of
> > the above mentioned issue I faced.
> 
> I talked with Rakesh about this as the modified cfq-iosched used in
> google supports proper hierarchy and the feature is heavily depended
> upon.  I was told that nesting doesn't really change anything.  The
> only thing which matters is the number of active cgroups; whether
> they're nested or how deep the nesting goes doesn't matter - IIUC
> there's no need to idle for internal nodes if they don't have IOs
> pending.
> 
> He drew me some diagrams which made sense to me and the code
> apparently actually works, so there doesn't seem to be any fundamental
> issue in implementing hierarchy support in cfq.

Hmm..., they probably are right. Idling only on leaf groups can make
sure that none of the groups loses its fair share of the quota. Thinking
out loud...


                root
                /  \
              T1    G1
                   /  \
                 T2    G2
                         \
                          T3

So if task T3 finishes and there is no active IO from T3, we will idle
on group G2 (in the hope that some IO will soon show up from T3 or from
some other task in G2). And that alone should make sure that all the
group nodes in the path to root (G1) get their fair share at their
respective levels, and no additional idling should be needed.
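
To see why idling on the leaf is enough to protect every ancestor's
share, here is a minimal userspace sketch (plain C, not CFQ code; the
node layout follows the diagram above and the equal weights are
arbitrary values picked purely for illustration):

/*
 * Toy model of per-level proportional sharing for the tree above.
 * A node's effective share of the disk is the product of
 * (its weight / sum of sibling weights) along the path to the root.
 * The weights here are illustrative, not the actual blkio defaults.
 */
#include <stdio.h>

struct node {
	const char *name;
	int weight;
	int parent;		/* index into nodes[], -1 for root */
};

static struct node nodes[] = {
	{ "root", 0,   -1 },	/* root's own weight is unused */
	{ "T1",   500,  0 },
	{ "G1",   500,  0 },
	{ "T2",   500,  2 },
	{ "G2",   500,  2 },
	{ "T3",   500,  4 },
};

#define NR_NODES (int)(sizeof(nodes) / sizeof(nodes[0]))

/* Sum of weights of all children of 'parent'. */
static int sibling_weight_sum(int parent)
{
	int i, sum = 0;

	for (i = 0; i < NR_NODES; i++)
		if (nodes[i].parent == parent)
			sum += nodes[i].weight;
	return sum;
}

/* Fraction of the disk node 'i' gets, assuming everyone stays busy. */
static double effective_share(int i)
{
	double share = 1.0;

	while (nodes[i].parent != -1) {
		share *= (double)nodes[i].weight /
			 sibling_weight_sum(nodes[i].parent);
		i = nodes[i].parent;
	}
	return share;
}

int main(void)
{
	int i;

	for (i = 1; i < NR_NODES; i++)
		printf("%-4s gets %.0f%% of the disk\n",
		       nodes[i].name, effective_share(i) * 100);
	return 0;
}

With these weights it prints 50% for T1 and G1, and 25% for T2, G2 and
T3. T3 is the only consumer of G2's 25%, so if we give up G2's slice
the instant T3 goes quiet, that 25% (and with it part of G1's share) is
what leaks away; idling briefly on the leaf group G2 protects the
shares all the way up without any separate idling at G1 or root.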

So it sounds like hierarchy will not cause additional idling. Idling
will depend solely on the active leaf groups.
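
In other words, the idling decision stays local to the leaf group that
just went empty; internal nodes never need an idle window of their own.
A rough sketch of what such a check could look like (hypothetical
structure and helper names, not the actual cfq-iosched code):

#include <stdbool.h>

/*
 * Hypothetical helper: idle only on a leaf group that has just run out
 * of requests but was recently active, in the hope that more IO shows
 * up before we move on.  Internal (non-leaf) groups are never idled on
 * directly; their share is protected by whichever descendant leaf is
 * idling.
 */
struct io_group {
	bool has_pending_requests;	 /* requests queued right now */
	bool recently_active;		 /* saw IO within the idle window */
	unsigned int nr_active_children; /* child groups with activity */
};

static bool should_idle_on_group(const struct io_group *grp)
{
	/* Nothing is likely to show up soon: don't waste the disk. */
	if (!grp->recently_active)
		return false;

	/* Still has IO queued: dispatch that instead of idling. */
	if (grp->has_pending_requests)
		return false;

	/* Internal node: leave idling to its active descendant leaves. */
	if (grp->nr_active_children > 0)
		return false;

	return true;
}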

So how bad is group/no-idle service tree idling proving to be? In my
experience, if we use it on anything other than a single-spindle SATA
disk, the performance hit starts showing up immediately.

Thanks
Vivek
