From: Matthew Wilcox <willy@infradead.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Brian Foster <bfoster@redhat.com>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH RFC v3 3/3] iomap: bound ioend size to 4096 pages
Date: Wed, 26 May 2021 03:12:50 +0100
Message-ID: <YK2uorrbm0L76p68@casper.infradead.org>
In-Reply-To: <20210525042035.GE202121@locust>

On Mon, May 24, 2021 at 09:20:35PM -0700, Darrick J. Wong wrote:
> > > This patch establishes a maximum ioend size of 4096 pages so that we
> > > don't trip the lockup watchdog while clearing pagewriteback and also so
> > > that we don't pin a large number of pages while constructing a big chain
> > > of bios.  On gfs2 and zonefs, each ioend completion will now have to
> > > clear up to 4096 pages from whatever context bio_endio is called.
> > > 
> > > For XFS it's more complicated -- XFS already overrode the bio handler
> > > for ioends that required further metadata updates (e.g. unwritten
> > > conversion, eof extension, or cow) so that it could combine ioends when
> > > possible.  XFS wants to combine ioends to amortize the cost of getting
> > > the ILOCK and running transactions over a larger number of pages.
> > > 
> > > So I guess I see how the two changes dovetail nicely for XFS -- iomap
> > > issues smaller write bios, and the xfs ioend worker can recombine
> > > however many bios complete before the worker runs.  As a bonus, we don't
> > > have to worry about situations like the device driver completing so many
> > > bios from a single invocation of a bottom half handler that we run afoul
> > > of the soft lockup timer.
> > > 
> > > Is that a correct understanding of how the two changes intersect with
> > > each other?  TBH I was expecting the two thresholds to be closer in
> > > value.
> > > 
> > 
> > I think so. That's interesting because my inclination was to make them
> > farther apart (or more specifically, increase the threshold in this
> > patch and leave the previous). The primary goal of this series was to
> > address the soft lockup warning problem, hence the thresholds on earlier
> > versions started at rather conservative values. I think both values have
> > been reasonably justified in being reduced, though this patch has a
> > broader impact than the previous in that it changes behavior for all
> > iomap-based filesystems. Of course that's something that could also be
> > addressed with a more dynamic tunable.
> 
> <shrug> I think I'm comfortable starting with 256 for xfs to bump an
> ioend to a workqueue, and 4096 pages as the limit for an iomap ioend.
> If people demonstrate a need to smart-tune or manual-tune we can always
> add one later.
> 
> Though I guess I did kind of wonder if maybe a better limit for iomap
> would be max_hw_sectors?  Since that's the maximum size of an IO that
> the kernel will issue for that device?

I think you're looking at this wrong.  The question is whether the
system can tolerate the additional latency of bumping to a workqueue vs
servicing directly.

If the I/O is large, then clearly it can.  It already waited for all
those DMAs to happen which took a certain amount of time on the I/O bus.
If the I/O is small, then maybe it can and maybe it can't.  So we should
be conservative and complete it in interrupt context.

This is why I think "number of pages" is really a red herring.  Sure,
that's the amount of work to be done, but really the question is "can
this I/O tolerate the extra delay".  Short of passing that information
in from the caller, number of bytes really is our best way of knowing.
And that doesn't scale with anything to do with the device or the
system bus.  
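
To make the argument above concrete, here is a minimal userspace sketch
(not kernel code, and not taken from the patch series).  The names
io_bytes() and can_defer_completion() and the 1 MiB cutoff are
hypothetical, chosen only for illustration.  It shows that a fixed page
count corresponds to very different I/O sizes depending on page size,
while a byte threshold gives the same answer everywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff for illustration only; not a value from the series. */
#define DEFER_THRESHOLD_BYTES ((uint64_t)1 << 20)	/* 1 MiB */

/* Bytes covered by an I/O spanning nr_pages pages of page_size bytes. */
uint64_t io_bytes(uint64_t nr_pages, uint64_t page_size)
{
	assert(page_size > 0);
	return nr_pages * page_size;
}

/*
 * A large I/O has already spent a long time on the bus, so the extra
 * latency of a workqueue hop is assumed to be tolerable.  A small I/O
 * may be latency sensitive, so it is completed directly.
 */
bool can_defer_completion(uint64_t bytes)
{
	return bytes >= DEFER_THRESHOLD_BYTES;
}
```

With these definitions, 4096 pages is 16 MiB with 4 KiB pages but
256 MiB with 64 KiB pages, whereas the byte-based test does not depend
on the page size at all.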
