From: Dave Chinner <david@fromorbit.com>
To: Chris Mason <clm@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Christoph Hellwig <hch@infradead.org>,
	"Darrick J. Wong" <djwong@kernel.org>,
	xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	"dchinner@redhat.com" <dchinner@redhat.com>
Subject: Re: [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage()
Date: Mon, 6 Jun 2022 09:32:13 +1000	[thread overview]
Message-ID: <20220605233213.GN1098723@dread.disaster.area> (raw)
In-Reply-To: <c3620e1f-91c5-777c-4193-2478c69a033c@fb.com>

On Fri, Jun 03, 2022 at 12:09:06PM -0400, Chris Mason wrote:
> [ From a different message, Dave asks wtf my email client was doing. Thanks
> Dave, apparently exchange is being exchangey with base64 in unpredictable
> ways.  This was better in my test reply, let's see. ]
> 
> On 6/3/22 11:06 AM, Johannes Weiner wrote:
> > Hello Dave,
> > 
> > On Fri, Jun 03, 2022 at 03:20:47PM +1000, Dave Chinner wrote:
> > > On Fri, Jun 03, 2022 at 01:29:40AM +0000, Chris Mason wrote:
> > > > As you describe above, the loops are definitely coming from higher
> > > > in the stack.  wb_writeback() will loop as long as
> > > > __writeback_inodes_wb() returns that it’s making progress and
> > > > we’re still globally over the bg threshold, so write_cache_pages()
> > > > is just being called over and over again.  We’re coming from
> > > > wb_check_background_flush(), so:
> > > > 
> > > >                  struct wb_writeback_work work = {
> > > >                          .nr_pages       = LONG_MAX,
> > > >                          .sync_mode      = WB_SYNC_NONE,
> > > >                          .for_background = 1,
> > > >                          .range_cyclic   = 1,
> > > >                          .reason         = WB_REASON_BACKGROUND,
> > > >                  };
> > > 
> > > Sure, but we end up in writeback_sb_inodes() which does this after
> > > the __writeback_single_inode()->do_writepages() call that iterates
> > > the dirty pages:
> > > 
> > >                 if (need_resched()) {
> > >                         /*
> > >                          * We're trying to balance between building up a nice
> > >                          * long list of IOs to improve our merge rate, and
> > >                          * getting those IOs out quickly for anyone throttling
> > >                          * in balance_dirty_pages().  cond_resched() doesn't
> > >                          * unplug, so get our IOs out the door before we
> > >                          * give up the CPU.
> > >                          */
> > >                         blk_flush_plug(current->plug, false);
> > >                         cond_resched();
> > >                 }
> > > 
> > > So if there is a pending IO completion on this CPU on a work queue
> > > here, we'll reschedule to it because the work queue kworkers are
> > > bound to CPUs and they take priority over user threads.
> > 
> > The flusher thread is also a kworker, though. So it may hit this
> > cond_resched(), but it doesn't yield until the timeslice expires.

17us or 10ms, it doesn't matter. The fact is that the writeback
thread will give up the CPU long before the multi-second latencies
reported upthread could accumulate. Writeback spinning cannot
explain why truncate is not making progress; everything points to
it being a downstream symptom, not the cause.
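
For reference, the background flush loop quoted above reduces to
roughly the following (paraphrased from wb_writeback() in
fs/fs-writeback.c; heavily simplified, not the verbatim code). With
.nr_pages = LONG_MAX and .for_background = 1, the only ways out are
dropping below the background dirty threshold or making no progress
at all:

	for (;;) {
		if (work->nr_pages <= 0)
			break;

		/* Background writeout stops below the bg dirty threshold. */
		if (work->for_background && !wb_over_bg_thresh(wb))
			break;

		progress = __writeback_inodes_wb(wb, work);
		if (progress)
			continue;	/* made progress - go around again */

		/* No progress: wait for an inode to finish its IO, retry. */
	}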

Also important to note, since we are talking about kworker
scheduling hold-offs: the writeback flusher work is unbound (it can
run on any CPU), whilst the IO completion workers in XFS are
per-CPU and bound to individual CPUs. Bound kernel tasks usually
take run-queue priority on a CPU over unbound and/or user tasks
that can be punted to a different CPU. So, again, with this taken
into account, I don't see how the flusher thread consuming CPU
would cause long-duration hold-offs of IO completion work....

> Just to underline this, the long tail latencies aren't softlockups or major
> explosions.  It's just suboptimal enough that different metrics and
> dashboards noticed it.

Sure, but you've brought to my attention a problem whose root
cause we don't understand. I want to know what that root cause is
so I can determine that there are no other unknown underlying
issues contributing to it.

Nothing found so far explains why truncate is not making progress,
and that's what *I* need to understand here. Sure, go ahead and
patch your systems to get rid of the bad writeback looping
behaviour, but that does not explain or solve the underlying long
IO completion tail latencies that originally led to the writeback
spinning being found.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 17+ messages
2022-06-01  1:11 [PATCH RFC] iomap: invalidate pages past eof in iomap_do_writepage() Chris Mason
2022-06-01 12:18 ` Christoph Hellwig
2022-06-01 14:13   ` Chris Mason
2022-06-02  6:52     ` Dave Chinner
2022-06-02 15:32       ` Johannes Weiner
2022-06-02 19:41         ` Chris Mason
2022-06-02 19:59           ` Matthew Wilcox
2022-06-02 22:07             ` Dave Chinner
2022-06-02 22:06         ` Dave Chinner
2022-06-03  1:29           ` Chris Mason
2022-06-03  5:20             ` Dave Chinner
2022-06-03 15:06               ` Johannes Weiner
2022-06-03 16:09                 ` Chris Mason
2022-06-05 23:32                   ` Dave Chinner [this message]
2022-06-06 14:46                     ` Johannes Weiner
2022-06-06 15:13                       ` Chris Mason
2022-06-07 22:52                       ` Dave Chinner
