From: Michael Stapelberg <stapelberg+linux@google.com>
To: Jan Kara <jack@suse.cz>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm <linux-mm@kvack.org>,
linux-fsdevel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Dennis Zhou <dennis@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Roman Gushchin <guro@fb.com>,
Johannes Thumshirn <johannes.thumshirn@wdc.com>,
Song Liu <song@kernel.org>, David Sterba <dsterba@suse.com>
Subject: Re: [PATCH] backing_dev_info: introduce min_bw/max_bw limits
Date: Tue, 22 Jun 2021 14:29:32 +0200
Message-ID: <CAH9Oa-YxL1iu_TVn6bL3Nd4qzYSVDPaO9a96sX4u7dhq+ewasA@mail.gmail.com>
In-Reply-To: <20210622121205.GG14261@quack2.suse.cz>
Thanks for taking a look! Comments inline:
On Tue, 22 Jun 2021 at 14:12, Jan Kara <jack@suse.cz> wrote:
>
> On Mon 21-06-21 11:20:10, Michael Stapelberg wrote:
> > Hey Miklos
> >
> > On Fri, 18 Jun 2021 at 16:42, Miklos Szeredi <miklos@szeredi.hu> wrote:
> > >
> > > On Fri, 18 Jun 2021 at 10:31, Michael Stapelberg
> > > <stapelberg+linux@google.com> wrote:
> > >
> > > > Maybe, but I don’t have the expertise, motivation or time to
> > > > investigate this any further, let alone commit to get it done.
> > > > During our previous discussion I got the impression that nobody else
> > > > had any cycles for this either:
> > > > https://lore.kernel.org/linux-fsdevel/CANnVG6n=ySfe1gOr=0ituQidp56idGARDKHzP0hv=ERedeMrMA@mail.gmail.com/
> > > >
> > > > Have you had a look at the China LSF report at
> > > > http://bardofschool.blogspot.com/2011/?
> > > > The author of the heuristic has spent significant effort and time
> > > > coming up with what we currently have in the kernel:
> > > >
> > > > """
> > > > Fengguang said he draw more than 10K performance graphs and read even
> > > > more in the past year.
> > > > """
> > > >
> > > > This implies that making changes to the heuristic will not be a quick fix.
> > >
> > > Having a piece of kernel code sitting there that nobody is willing to
> > > fix is certainly not a great situation to be in.
> >
> > Agreed.
> >
> > >
> > > And introducing band-aids is not going to improve the above situation;
> > > more likely it will prolong it even further.
> >
> > Sounds like “Perfect is the enemy of good” to me: you’re looking for a
> > perfect hypothetical solution, whereas we have a known-working,
> > low-risk fix for a real problem.
> >
> > Could we find a solution where medium-/long-term, the code in question
> > is improved, perhaps via a Summer Of Code project or similar community
> > efforts, but until then, we apply the patch at hand?
> >
> > As I mentioned, I think adding min/max limits can be useful regardless
> > of how the heuristic itself changes.
> >
> > If that turns out to be incorrect or undesired, we can still turn the
> > knobs into a no-op, if removal isn’t an option.
>
> Well, removal of added knobs is more or less out of the question as it can
> break some userspace. Similarly, making them no-op is problematic unless we
> are pretty certain it cannot break some existing setup. That's why we have
> to think twice (or better three times ;) before adding any knobs. Also
> honestly the knobs you suggest will be pretty hard to tune when there are
> multiple cgroups with writeback control involved (which can be affected by
> the same problems you observe as well). So I agree with Miklos that this is
> not the right way to go. Speaking of tunables, did you try tuning
> /sys/devices/virtual/bdi/<fuse-bdi>/min_ratio? I suspect that may
> work around your problems...
Back then, I did try the various tunables (vm.dirty_ratio and
vm.dirty_background_ratio on the global level,
/sys/class/bdi/<bdi>/{min,max}_ratio on the file system level), and
they had no observable effect on the problem at all in my tests.
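
For anyone who wants to check the per-BDI values on their own system, here
is a minimal sketch that reads the two ratio files named above. The function
name and the sysfs_root parameter are made up for illustration; only the
/sys/class/bdi/<bdi>/{min,max}_ratio paths come from this thread.

```python
from pathlib import Path


def read_bdi_ratios(bdi: str, sysfs_root: str = "/sys/class/bdi") -> dict:
    """Return the current min_ratio and max_ratio (in percent) for one BDI.

    `bdi` is the directory name under /sys/class/bdi; `sysfs_root` is only
    a parameter so the function can be pointed at a test directory.
    """
    bdi_dir = Path(sysfs_root) / bdi
    return {
        name: int((bdi_dir / name).read_text().strip())
        for name in ("min_ratio", "max_ratio")
    }
```

Calling read_bdi_ratios() with the name of the FUSE file system's BDI
directory returns both values as integers, which makes it easy to confirm
that a change actually took effect before re-running a test.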
>
> Looking into your original report and the tracing you did (thanks for that,
> really useful), it seems that the problem is that writeback bandwidth is
> updated at most every 200ms (more frequent calls are just ignored) and
> updates are triggered only from balance_dirty_pages() (which happens when
> pages are dirtied) and inode writeback code. So if the workload tends to
> have short spikes of activity and extended periods of quiet time, then
> writeback bandwidth may indeed be seriously miscomputed because we just
> won't update writeback throughput after most of writeback has happened,
> as you observed.
>
> I think the fix for this can be relatively simple. We just need to make
> sure we update writeback bandwidth reasonably quickly after the IO
> finishes. I'll write a patch and see if it helps.
Thank you! Please keep us posted.
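
To make the failure mode described above concrete, here is a toy model of a
rate-limited bandwidth estimate. It is not kernel code: the class name, the
page counts and the timings are invented for illustration, and it skips the
smoothing the real estimator applies. It only shows how an estimate that is
updated at most every 200ms, and only when something calls it, can miss the
bulk of the writeback that completes after a short spike.

```python
from dataclasses import dataclass

UPDATE_INTERVAL_MS = 200  # the 200ms rate limit mentioned above


@dataclass
class ToyBandwidthEstimator:
    """Toy model: pages per second over the last accepted interval."""
    stamp_ms: int = 0
    written_at_stamp: int = 0
    pages_per_sec: float = 0.0

    def update(self, now_ms: int, written_total: int) -> bool:
        elapsed = now_ms - self.stamp_ms
        if elapsed < UPDATE_INTERVAL_MS:
            return False  # too soon after the last update: call is ignored
        delta = written_total - self.written_at_stamp
        self.pages_per_sec = delta * 1000 / elapsed
        self.stamp_ms = now_ms
        self.written_at_stamp = written_total
        return True


def run(update_on_io_completion: bool) -> float:
    est = ToyBandwidthEstimator()
    # Hypothetical spike: pages are dirtied from t=0 to t=200ms, and only
    # 10 pages have been written back when the dirtier stops.
    est.update(100, 5)   # ignored, less than 200ms since the last stamp
    est.update(200, 10)  # accepted: 10 pages in 200ms
    # The remaining 990 pages complete by t=400ms, during the quiet period,
    # when nothing is dirtying pages and so nothing triggers an update.
    if update_on_io_completion:
        est.update(400, 1000)
    return est.pages_per_sec


print(run(update_on_io_completion=False))  # 50.0
print(run(update_on_io_completion=True))   # 4950.0
```

In this made-up scenario the estimate stays at 50 pages/s unless an update
also happens once the IO has finished, at which point it reflects the 990
pages written in the second 200ms window.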