From: "Darrick J. Wong" <djwong@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Brian Foster <bfoster@redhat.com>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC v3 3/3] iomap: bound ioend size to 4096 pages
Date: Tue, 25 May 2021 20:32:10 -0700
Message-ID: <20210526033210.GG202078@locust>
In-Reply-To: <YK2uorrbm0L76p68@casper.infradead.org>

On Wed, May 26, 2021 at 03:12:50AM +0100, Matthew Wilcox wrote:
> On Mon, May 24, 2021 at 09:20:35PM -0700, Darrick J. Wong wrote:
> > > > This patch establishes a maximum ioend size of 4096 pages so that we
> > > > don't trip the lockup watchdog while clearing pagewriteback and also so
> > > > that we don't pin a large number of pages while constructing a big chain
> > > > of bios.  On gfs2 and zonefs, each ioend completion will now have to
> > > > clear up to 4096 pages from whatever context bio_endio is called.
> > > > 
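To make that concrete, the cap boils down to a check like the one
below when the writeback code decides whether another page may join
the ioend being built.  This is only a sketch; the constant and helper
names are made up here, not taken from the patch:

/* Illustrative only: cap how many pages a single ioend may carry. */
#define IOEND_MAX_PAGES	4096U

/*
 * Returning false means writeback should finish off this ioend and
 * open a new one for the next page, instead of growing it further.
 */
static inline bool ioend_has_room(unsigned int io_pages)
{
	return io_pages < IOEND_MAX_PAGES;
}
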
> > > > For XFS it's more complicated -- XFS already overrode the bio handler
> > > > for ioends that required further metadata updates (e.g. unwritten
> > > > conversion, eof extension, or cow) so that it could combine ioends when
> > > > possible.  XFS wants to combine ioends to amortize the cost of getting
> > > > the ILOCK and running transactions over a larger number of pages.
> > > > 
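The merging side is conceptually simple: when the completion worker
finally runs, walk the ioends that completed in the meantime and fold
together contiguous neighbours that need the same kind of update, so
the ILOCK and the transaction are paid for once per merged batch
rather than once per bio.  Again only a sketch with invented names,
not the actual XFS code:

/* Minimal stand-in for an ioend awaiting metadata updates. */
struct ioend_sketch {
	struct ioend_sketch	*next;		/* completion list */
	int			type;		/* unwritten, cow, append, ... */
	unsigned long long	offset;		/* file offset of first byte */
	unsigned long long	size;		/* bytes covered */
};

/* Fold contiguous, same-type neighbours into the current ioend. */
static void ioend_try_merge(struct ioend_sketch *head)
{
	struct ioend_sketch *cur = head;

	while (cur && cur->next) {
		struct ioend_sketch *n = cur->next;

		if (n->type == cur->type &&
		    n->offset == cur->offset + cur->size) {
			cur->size += n->size;	/* absorb the neighbour */
			cur->next = n->next;	/* the real code also frees it */
		} else {
			cur = n;
		}
	}
}
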
> > > > So I guess I see how the two changes dovetail nicely for XFS -- iomap
> > > > issues smaller write bios, and the xfs ioend worker can recombine
> > > > however many bios complete before the worker runs.  As a bonus, we don't
> > > > have to worry about situations like the device driver completing so many
> > > > bios from a single invocation of a bottom half handler that we run afoul
> > > > of the soft lockup timer.
> > > > 
> > > > Is that a correct understanding of how the two changes intersect with
> > > > each other?  TBH I was expecting the two thresholds to be closer in
> > > > value.
> > > > 
> > > 
> > > I think so. That's interesting because my inclination was to make them
> > > farther apart (or more specifically, to increase the threshold in this
> > > patch and leave the previous one alone). The primary goal of this series
> > > was to address the soft lockup warning problem, hence the thresholds in
> > > earlier versions started at rather conservative values. I think both
> > > values have been reasonably justified in being reduced, though this
> > > patch has a broader impact than the previous one in that it changes
> > > behavior for all iomap-based filesystems. Of course that's something
> > > that could also be addressed with a more dynamic tunable.
> > 
> > <shrug> I think I'm comfortable starting with 256 pages as the point at
> > which xfs bumps an ioend to a workqueue, and 4096 pages as the limit for
> > an iomap ioend.
> > If people demonstrate a need to smart-tune or manual-tune we can always
> > add one later.
> > 
> > Though I guess I did kind of wonder if maybe a better limit for iomap
> > would be max_hw_sectors?  Since that's the maximum size of an IO that
> > the kernel will issue to that device?
> 
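Spelled out, the two thresholds from this series look something like
the following.  The numbers are the ones discussed in this thread; the
macro and function names are invented for illustration:

/*
 * IOMAP_IOEND_MAX_PAGES bounds how large an ioend can grow at
 * submission time (patch 3); XFS_IOEND_PUNT_PAGES decides where a
 * plain ioend gets completed (patch 2).  Ioends that need metadata
 * updates (unwritten conversion, cow, eof extension) already go to
 * the workqueue regardless of size.
 */
#define XFS_IOEND_PUNT_PAGES	256U
#define IOMAP_IOEND_MAX_PAGES	4096U

static bool ioend_punt_to_workqueue(unsigned int io_pages)
{
	return io_pages > XFS_IOEND_PUNT_PAGES;
}
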
> I think you're looking at this wrong.  The question is whether the
> system can tolerate the additional latency of bumping to a workqueue vs
> servicing directly.
> 
> If the I/O is large, then clearly it can.  It already waited for all
> those DMAs to happen which took a certain amount of time on the I/O bus.
> If the I/O is small, then maybe it can and maybe it can't.  So we should
> be conservative and complete it in interrupt context.
> 
> This is why I think "number of pages" is really a red herring.  Sure,
> that's the amount of work to be done, but really the question is "can
> this I/O tolerate the extra delay".  Short of passing that information
> in from the caller, number of bytes really is our best way of knowing.
> And that doesn't scale with anything to do with the device or the
> system bus.  

It doesn't matter whether the process(es) that triggered writeback will
tolerate the extra latency of a workqueue.  The hangcheck timer trips,
which means we've been doing things in softirq context too long.
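
For reference, the reason punting to process context helps at all is
that a worker can yield the CPU between batches of pages, which a
softirq completion handler cannot do.  That's roughly the idea behind
patch 1, sketched here with a made-up function name around the real
end_page_writeback()/cond_resched() APIs:

/* Assumes the usual kernel headers for end_page_writeback() and
 * cond_resched(); everything else here is illustrative. */
static void ioend_complete_pages(struct page **pages, unsigned int nr)
{
	unsigned int i;

	for (i = 0; i < nr; i++) {
		end_page_writeback(pages[i]);
		/* Only legal in process context, not in a softirq. */
		if ((i & 255) == 255)
			cond_resched();
	}
}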

The next thing that happens is that the kind of people who treat **ANY**
stack trace in dmesg as grounds to file a bug and escalate it will file
a bug and escalate it, and now I'm working 10 hour days trying to stomp
down all 6 escalations, run a QA botnet, review patches, and make any
incremental progress on long term goals when I can squeeze out five
minutes of free time.

Yeah, it'd be nice to rebuild writeback with some sort of QOS system so
that it could pick different strategies based on the amount of work to
do and the impatience levels of the processes waiting for it.  But that
is a project of its own.  This is a starter fix to take the heat off.

The reason I've been running at 110% burnout for the last 9 months is
exactly this -- someone submits a patchset to fix or improve something,
but then the reviewers pile on with "No no no, you should consider
building this far more elaborate solution", withhold review tags, but
then seem to be too busy to participate in building the elaborate thing.

At least in this case I can do something about it.  We're nearly to rc4
so barring anything weird showing up in QA runs overnight I plan to
stuff this in for 5.14.

--D

Thread overview: 24+ messages
2021-05-17 17:17 [PATCH v3 0/3] iomap: avoid soft lockup warnings on large ioends Brian Foster
2021-05-17 17:17 ` [PATCH v3 1/3] iomap: resched ioend completion when in non-atomic context Brian Foster
2021-05-17 17:54   ` Matthew Wilcox
2021-05-18 11:38     ` Brian Foster
2021-05-20 21:58       ` Darrick J. Wong
2021-05-24 11:57         ` Brian Foster
2021-05-24 16:53           ` Darrick J. Wong
2021-05-26  1:19             ` Darrick J. Wong
2021-05-22  7:45   ` Ming Lei
2021-05-24 11:57     ` Brian Foster
2021-05-24 14:11       ` Ming Lei
2021-05-17 17:17 ` [PATCH v3 2/3] xfs: kick large ioends to completion workqueue Brian Foster
2021-05-26  1:20   ` Darrick J. Wong
2021-05-17 17:17 ` [PATCH RFC v3 3/3] iomap: bound ioend size to 4096 pages Brian Foster
2021-05-19 13:28   ` Christoph Hellwig
2021-05-19 14:52     ` Brian Foster
2021-05-20 23:27   ` Darrick J. Wong
2021-05-24 12:02     ` Brian Foster
2021-05-25  4:20       ` Darrick J. Wong
2021-05-25  4:29         ` Damien Le Moal
2021-05-25  7:13         ` Dave Chinner
2021-05-25  9:07         ` Andreas Gruenbacher
2021-05-26  2:12         ` Matthew Wilcox
2021-05-26  3:32           ` Darrick J. Wong [this message]
