linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Dennis Zhou <dennis@kernel.org>, Jens Axboe <axboe@kernel.dk>,
	Roman Gushchin <guro@fb.com>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>,
	Jan Kara <jack@suse.cz>, Song Liu <song@kernel.org>,
	David Sterba <dsterba@suse.com>
Subject: Re: [PATCH] backing_dev_info: introduce min_bw/max_bw limits
Date: Tue, 22 Jun 2021 14:12:05 +0200	[thread overview]
Message-ID: <20210622121205.GG14261@quack2.suse.cz> (raw)
In-Reply-To: <CAH9Oa-YxeZ25Vbto3NyUw=RK5vQWv_v7xp3vHS9667iJJ8XV_A@mail.gmail.com>

On Mon 21-06-21 11:20:10, Michael Stapelberg wrote:
> Hey Miklos
> 
> On Fri, 18 Jun 2021 at 16:42, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 18 Jun 2021 at 10:31, Michael Stapelberg
> > <stapelberg+linux@google.com> wrote:
> >
> > > Maybe, but I don’t have the expertise, motivation or time to
> > > investigate this any further, let alone commit to get it done.
> > > During our previous discussion I got the impression that nobody else
> > > had any cycles for this either:
> > > https://lore.kernel.org/linux-fsdevel/CANnVG6n=ySfe1gOr=0ituQidp56idGARDKHzP0hv=ERedeMrMA@mail.gmail.com/
> > >
> > > Have you had a look at the China LSF report at
> > > http://bardofschool.blogspot.com/2011/?
> > > The author of the heuristic has spent significant effort and time
> > > coming up with what we currently have in the kernel:
> > >
> > > """
> > > Fengguang said he draw more than 10K performance graphs and read even
> > > more in the past year.
> > > """
> > >
> > > This implies that making changes to the heuristic will not be a quick fix.
> >
> > Having a piece of kernel code sitting there that nobody is willing to
> > fix is certainly not a great situation to be in.
> 
> Agreed.
> 
> >
> > And introducing band aids is not going improve the above situation,
> > more likely it will prolong it even further.
> 
> Sounds like “Perfect is the enemy of good” to me: you’re looking for a
> perfect hypothetical solution,
> whereas we have a known-working low risk fix for a real problem.
> 
> Could we find a solution where medium-/long-term, the code in question
> is improved,
> perhaps via a Summer Of Code project or similar community efforts,
> but until then, we apply the patch at hand?
> 
> As I mentioned, I think adding min/max limits can be useful regardless
> of how the heuristic itself changes.
> 
> If that turns out to be incorrect or undesired, we can still turn the
> knobs into a no-op, if removal isn’t an option.

Well, removal of added knobs is more or less out of question as it can
break some userspace. Similarly making them no-op is problematic unless we
are pretty certain it cannot break some existing setup. That's why we have
to think twice (or better three times ;) before adding any knobs. Also
honestly the knobs you suggest will be pretty hard to tune when there are
multiple cgroups with writeback control involved (which can be affected by
the same problems you observe as well). So I agree with Miklos that this is
not the right way to go. Speaking of tunables, did you try tuning
/sys/devices/virtual/bdi/<fuse-bdi>/min_ratio? I suspect that may
workaround your problems...

Looking into your original report and tracing you did (thanks for that,
really useful), it seems that the problem is that writeback bandwidth is
updated at most every 200ms (more frequent calls are just ignored) and are
triggered only from balance_dirty_pages() (happen when pages are dirtied) and
inode writeback code so if the workload tends to have short spikes of activity
and extended periods of quiet time, then writeback bandwidth may indeed be
seriously miscomputed because we just won't update writeback throughput
after most of writeback has happened as you observed.

I think the fix for this can be relatively simple. We just need to make
sure we update writeback bandwidth reasonably quickly after the IO
finishes. I'll write a patch and see if it helps.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2021-06-22 12:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-17  9:53 [PATCH] backing_dev_info: introduce min_bw/max_bw limits Michael Stapelberg
2021-06-18  8:18 ` Miklos Szeredi
2021-06-18  8:31   ` Michael Stapelberg
2021-06-18 14:41     ` Miklos Szeredi
2021-06-21  9:20       ` Michael Stapelberg
2021-06-22 12:12         ` Jan Kara [this message]
2021-06-22 12:29           ` Michael Stapelberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210622121205.GG14261@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=dennis@kernel.org \
    --cc=dsterba@suse.com \
    --cc=guro@fb.com \
    --cc=johannes.thumshirn@wdc.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=song@kernel.org \
    --cc=stapelberg+linux@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).