From: Vivek Goyal
Subject: Re: [RFC] writeback and cgroup
Date: Wed, 4 Apr 2012 16:32:39 -0400
Message-ID: <20120404203239.GM12676__36391.3291869268$1333571584$gmane$org@redhat.com>
References: <20120403183655.GA23106@dhcp-172-17-108-109.mtv.corp.google.com> <20120404145134.GC12676@redhat.com> <20120404184909.GB29686@dhcp-172-17-108-109.mtv.corp.google.com>
In-Reply-To: <20120404184909.GB29686-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: Tejun Heo
Cc: Jens Axboe, ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, Jan Kara, rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, andrea-oIIqvOZpAevzfdHfmsDf5w@public.gmane.org, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, sjayaraman-IBi9RG/b67k@public.gmane.org, lsf-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Fengguang Wu
List-Id: containers.vger.kernel.org

On Wed, Apr 04, 2012 at 11:49:09AM -0700, Tejun Heo wrote:

[..]

> Thirdly, I don't see how writeback can control all the IOs. I mean,
> what about reads or direct IOs? It's not like IO devices have
> separate channels for those different types of IOs. They interact
> heavily.
>
> Let's say we have iops/bps limitation applied on top of proportional IO
> distribution

We already do that.
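A toy userspace model of that stacking (all names here are illustrative; nothing below is an actual kernel API): a bio first has to pass the per-cgroup bps limit, and only the bios that get through compete for proportional service.

```python
class ThrottleGroup:
    """Per-cgroup token bucket enforcing a bytes-per-second ceiling,
    a crude stand-in for the blk-throttle layer."""
    def __init__(self, bps_limit):
        self.bps_limit = bps_limit
        self.tokens = bps_limit  # one second's worth of credit

    def admit(self, nbytes):
        # A bio is passed down to the elevator only if credit remains;
        # otherwise it would wait on the throttle queue (modeled as False).
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

def proportional_dispatch(admitted, weights):
    """Split one dispatch round across cgroups by weight -- a crude
    stand-in for the elevator's proportional time slicing."""
    total = sum(weights[g] for g in admitted)
    return {g: weights[g] / total for g in admitted}

# Two cgroups: A is bps-limited, B is effectively unlimited.
groups = {"A": ThrottleGroup(4096), "B": ThrottleGroup(1 << 30)}
weights = {"A": 500, "B": 500}

# An 8KiB bio from A is held at the throttle layer; a 4KiB one passes.
assert not groups["A"].admit(8192)
assert groups["A"].admit(4096)
assert groups["B"].admit(4096)

# Only admitted IO ever reaches proportional scheduling.
share = proportional_dispatch(["A", "B"], weights)
print(share)
```

The point of the sketch is only the ordering: the bps/iops limit is applied before proportional scheduling ever sees the bio.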
First, IO is subjected to the throttling limit, and only then is it passed to the elevator for proportional IO scheduling. So throttling is already stacked on top of proportional IO; the only question is whether it should be pushed to even higher layers or not.

> or a device holds two partitions and one
> of them is being used for direct IO w/o filesystems. How would that
> work? I think the question goes even deeper, what do the separate
> limits even mean?

Separate limits for buffered writes just fill the gap. Agreed, it is not a very neat solution.

> Does the IO sched have to calculate allocation of
> IO resource to different types of IOs and then give a "number" to
> writeback which in turn enforces that limit? How does the elevator
> know what number to give? Is the number iops or bps or weight?

If we push all the throttling up into some higher layer, say some kind of per-bdi throttling interface, then the elevator only has to worry about doing proportional IO. There is no interaction with higher layers regarding iops/bps etc. (not that the elevator worries about that today).

> If
> the iosched doesn't know how much write workload exists, how does it
> distribute the surplus buffered writeback resource across different
> cgroups? If so, what makes the limit actually enforceable (due to
> inaccuracies in estimation, fluctuation in workload, delay in
> enforcement in different layers and whatnot) except for block layer
> applying the limit *again* on the resulting stream of combined IOs?

Agreed, the split model is definitely confusing. But the block layer will not apply the limits again: flusher IO goes in the root cgroup, which is generally unthrottled. Alternatively, the flusher could mark its bios with a flag saying "do not throttle", as these bios have been throttled already. So throttling twice is probably not an issue.

In summary, agreed that the split is confusing, but it fills a gap that exists today.

Thanks
Vivek
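P.S. A rough sketch of the "do not throttle" marking idea, as a toy userspace model (the flag name and functions are hypothetical, not actual kernel identifiers): the throttle layer would simply pass through any bio the flusher has tagged as already-charged, while unmarked bios are charged as usual.

```python
BIO_ALREADY_THROTTLED = 1 << 0  # hypothetical flag bit, not a real kernel flag

def blk_throttle_bio(bio_flags, nbytes, bucket):
    """Charge the bio against the cgroup's budget unless it is marked
    as having been throttled higher up the stack."""
    if bio_flags & BIO_ALREADY_THROTTLED:
        return True  # pass straight through to the elevator, no charge
    if bucket["tokens"] >= nbytes:
        bucket["tokens"] -= nbytes
        return True
    return False  # would be queued/delayed in a real throttle layer

bucket = {"tokens": 4096}

# A flusher bio marked as pre-throttled is never charged again...
assert blk_throttle_bio(BIO_ALREADY_THROTTLED, 8192, bucket)
assert bucket["tokens"] == 4096

# ...while an unmarked bio (e.g. direct IO) is subject to the limit as usual.
assert blk_throttle_bio(0, 8192, bucket) is False
```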