From: Tejun Heo <tj@kernel.org>
To: Konstantin Khebnikov <khlebnikov@yandex-team.ru>
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	Roman Gushchin <klamm@yandex-team.ru>, Jan Kara <jack@suse.cz>,
	Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	koct9i@gmail.com
Subject: Re: [PATCHSET RFC 0/6] memcg: inode-based dirty-set controller
Date: Wed, 28 Jan 2015 20:21:46 -0500	[thread overview]
Message-ID: <20150129012146.GA20617@htj.dyndns.org> (raw)
In-Reply-To: <20150115180242.10450.92.stgit@buzz>

Hello, Konstantin.

Sorry about the delay.

On Thu, Jan 15, 2015 at 09:49:10PM +0300, Konstantin Khebnikov wrote:
> This is a resurrection of my old RFC patch for a dirty-set accounting
> cgroup [1]. It is now merged into the memory cgroup and has gained a
> bandwidth controller as a bonus.
> 
> It shows an alternative solution: less accurate, but much less
> monstrous than the accurate page-based dirty-set controller from
> Tejun Heo.

I went over the whole patchset, and ISTR having reviewed this a while
ago; the conclusion is the same.  This appears simpler on the surface,
but, to put it nicely, it's a hackjob of a design.  You're working
around the complexity of propagating pressure from the lower layers by
building a separate pressure layer at the topmost layer.  In doing so,
it duplicates, in degenerate forms, what already exists below, at the
cost of fundamentally crippling the whole thing.

This, even in its current simplistic form, is already a dead end.
For example, iops or bandwidth aren't even the proper resources to
distribute for rotating disks; IO time is, and estimating and
distributing IO time is what a large proportion of cfq is devoted to.
What if there are multiple filesystems on a single device?  Or if a
cgroup accesses multiple backing devices?  How would you guarantee
low-latency access to a high-priority cgroup while a huge inode from a
low-priority cgroup is being written out, when none of the lower
layers has any idea what's going on?
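
To put a number on the iops-vs-IO-time point, here's a toy
back-of-the-envelope sketch in plain userspace C (nothing to do with
cfq internals; the per-IO service times are illustrative assumptions,
not measurements).  Two cgroups "limited" to the same 100 iops can
consume wildly different amounts of actual disk time depending on
access pattern, and disk time is the resource a top-level bw/iops knob
never sees:

/*
 * Toy arithmetic, not cfq code: why equal iops (or bandwidth) does
 * not translate into equal disk time on a rotating disk.  The per-IO
 * service times below are illustrative assumptions, not measurements.
 */
#include <stdio.h>

int main(void)
{
	const double seq_ms_per_io  = 0.1;   /* 4k read within a track */
	const double rand_ms_per_io = 8.0;   /* 4k read after a seek */
	const double iops           = 100.0; /* both cgroups do 100 iops */

	printf("sequential cgroup: %6.1f ms of disk time per second\n",
	       iops * seq_ms_per_io);        /* ~10 ms  -> ~1% of the disk */
	printf("random cgroup:     %6.1f ms of disk time per second\n",
	       iops * rand_ms_per_io);       /* ~800 ms -> ~80% of the disk */
	return 0;
}

A controller that only counts iops or bytes treats those two as equal
consumers; the request-level scheduler is the only place that can tell
them apart.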

Sure, these issues can be partially dealt with through various
workarounds and additions, and I'm sure we'll be doing exactly that if
we go down this path, but the only place that leads is duplicating
more of what's already in the lower layers, with an ever-growing list
of behavioral and interface oddities inherent to the design.

Even in the absence of alternatives, I'd be strongly against this
direction.  This sort of ad-hoc "let's solve this one immediate issue
in the easiest way possible" approach is often worse than doing
nothing.  In the longer term, things like this paint us into a corner
we can't easily get out of.  memcg happens to be an area where that
sort of thing took place quite a bit in the past, and people have been
desperately trying to right the course, so, no, I don't think this is
happening.

I agree that propagating backpressure from the lower layers involves
more complexity, but it is a complete solution that is conceptually
and design-wise straightforward and doesn't need to be constantly
papered over.  This is the right thing to do.  It can be argued that
the amount of complexity can be reduced by tracking dirty pages
per-inode, but if we're gonna do that, we should convert memcg itself
to be per address space too.  The arguments would be exactly the same
for memcg, and memcg and writeback must be on the same foundation.
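
To spell out what per-inode attribution means in the shared-inode
case, here's a minimal userspace model, not kernel code -- every
structure and function below is a hypothetical illustration, not an
actual memcg interface.  With per-page accounting the cgroup that
dirties a page is charged; with per-inode accounting everything is
charged to the inode's single owner, no matter who did the writing:

/*
 * Toy userspace model, not kernel code.  It contrasts per-page dirty
 * attribution (charge whoever dirtied the page) with per-inode
 * attribution (charge the inode's single owning cgroup).  All names
 * are hypothetical.
 */
#include <stdio.h>

struct cgroup { const char *name; long nr_dirty; };
struct inode  { struct cgroup *owner; };

/* per-page model: the dirtying cgroup is charged */
static void charge_per_page(struct cgroup *dirtier)
{
	dirtier->nr_dirty++;
}

/* per-inode model: the inode's owner is charged, whoever wrote the page */
static void charge_per_inode(struct inode *ino, struct cgroup *dirtier)
{
	(void)dirtier;
	ino->owner->nr_dirty++;
}

int main(void)
{
	struct cgroup a = { "A", 0 }, b = { "B", 0 };
	struct inode shared = { .owner = &a };  /* file first touched by A */

	/* cgroup B dirties two pages of the shared file */
	charge_per_page(&b);
	charge_per_page(&b);
	printf("per-page:  %s=%ld  %s=%ld\n",
	       a.name, a.nr_dirty, b.name, b.nr_dirty);

	a.nr_dirty = b.nr_dirty = 0;
	charge_per_inode(&shared, &b);
	charge_per_inode(&shared, &b);
	printf("per-inode: %s=%ld  %s=%ld\n",
	       a.name, a.nr_dirty, b.name, b.nr_dirty);
	return 0;
}

That inaccuracy is the price of the simpler scheme; the point above is
that if we accept it for writeback, memcg's own charging should accept
it too, so both stay on the same foundation.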

Thanks.

-- 
tejun


Thread overview: 14+ messages
2015-01-15 18:49 [PATCHSET RFC 0/6] memcg: inode-based dirty-set controller Konstantin Khebnikov
2015-01-15 18:49 ` [PATCH 1/6] memcg: inode-based dirty and writeback pages accounting Konstantin Khebnikov
2015-01-15 18:49 ` [PATCH 2/6] memcg: dirty-set limiting and filtered writeback Konstantin Khebnikov
2015-01-15 18:49 ` [PATCH 3/6] memcg: track shared inodes with dirty pages Konstantin Khebnikov
2015-01-15 18:55   ` Tejun Heo
2015-01-15 19:04     ` Konstantin Khlebnikov
2015-01-15 19:08       ` Tejun Heo
2015-01-15 18:49 ` [PATCH 4/6] percpu_ratelimit: high-performance ratelimiting counter Konstantin Khebnikov
2015-01-15 18:49 ` [PATCH 5/6] delay-injection: resource management via procrastination Konstantin Khebnikov
2015-01-15 18:49 ` [PATCH 6/6] memcg: filesystem bandwidth controller Konstantin Khebnikov
2015-01-16  9:37 ` [PATCHSET RFC 0/6] memcg: inode-based dirty-set controller Jan Kara
2015-01-16 12:33   ` Konstantin Khlebnikov
2015-01-16 14:25     ` Jan Kara
2015-01-29  1:21 ` Tejun Heo [this message]
