From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, Eryu Guan <eguan@redhat.com>
Subject: Re: [PATCH 1/5] xfs: bypass dquot reclaim to avoid quotacheck deadlock
Date: Sat, 18 Feb 2017 10:12:15 +1100	[thread overview]
Message-ID: <20170217231215.GD15349@dastard> (raw)
In-Reply-To: <20170217183009.GB20429@bfoster.bfoster>

On Fri, Feb 17, 2017 at 01:30:09PM -0500, Brian Foster wrote:
> On Fri, Feb 17, 2017 at 09:37:05AM +1100, Dave Chinner wrote:
> > On Wed, Feb 15, 2017 at 10:40:43AM -0500, Brian Foster wrote:
> > > Reclaim during quotacheck can lead to deadlocks on the dquot flush lock:
> > > 
> > >  - Quotacheck populates a local delwri queue with the physical dquot
> > >    buffers.
> > >  - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and dirties
> > >    all of the dquots.
> > >  - Reclaim kicks in and attempts to flush a dquot whose buffer is
> > >    already queued on the quotacheck queue. The flush succeeds but
> > >    queueing to the reclaim delwri queue fails as the backing buffer is
> > >    already queued. The flush unlock is now deferred to I/O completion of
> > >    the buffer from the quotacheck queue.
> > >  - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires the
> > >    flush lock to update the backing buffers with the in-core
> > >    recalculated values. This deadlocks as the flush lock was acquired by
> > >    reclaim but the buffer never submitted for I/O as it already resided
> > >    on the quotacheck queue.
> > > 
> > > This is reproduced by running quotacheck on a filesystem with a couple
> > > million inodes in low memory (512MB-1GB) situations.
> > > 
> > > Quotacheck first resets and collects the physical dquot buffers in a
> > > delwri queue. Then, it traverses the filesystem inodes via bulkstat,
> > > updates the in-core dquots, flushes the corrected dquots to the backing
> > > buffers and finally submits the delwri queue for I/O. Since the backing
> > > buffers are queued across the entire quotacheck operation, dquot reclaim
> > > cannot possibly complete a dquot flush before quotacheck completes.
> > > Therefore, dquot reclaim provides no real value during quotacheck.
> > 
> > Which is an OOM vector on systems with lots of dquots and low memory
> > at mount time. Turning off dquot reclaim doesn't change that.
> > 
> 
> It's not intended to. The inefficient design of quotacheck wrt memory
> usage is a separate problem. AFAICT, that problem goes back as far as my
> git tree does (2.6.xx).
> 
> Looking back through the git history, this deadlock appears to be a
> regression as of commit 43ff2122e6 ("xfs: on-stack delayed write buffer
> lists") in 2012, which replaced a trylock/push sequence from the
> quotacheck flush walk with a synchronous flush lock.

Right - that's the commit that changed the code to what it is now.
That's what I implied needs fixing....

> > Address the root cause: the buffer list is never flushed and so
> > pins the memory quotacheck requires, which means we need to flush
> > the buffer list more often.  We also need to submit the buffer list
> > before the flush walk begins, thereby unlocking all the dquots
> > before doing the flush walk and avoiding the deadlock.
> > 
> 
> I acknowledge that this patch doesn't address the root problem, but
> neither does this proposal. The root cause is not that quotacheck uses
> too much memory, it's that the serialization between quotacheck and
> dquot reclaim is bogus.
> 
> For example, if we drop the buffer list being held across the adjust and
> flush walks, we do indeed unpin the buffers and dquots for memory
> reclaim, particularly during the dqadjust bulkstat. The flush walk then
> still has to cycle the flush lock of every dquot to guarantee we wait on
> dquots being flushed by reclaim (otherwise we risk corruption with the
> xfs_qm_flush_one() logic as it is today). The same deadlock vector still
> exists, albeit with a smaller window, if reclaim happens to flush a
> dquot that is backed by a buffer that has already been queued by the
> flush walk. AFAICT, the only requirements for this to occur are a couple
> of dquots on a single backing buffer and some memory pressure from
> somewhere, so the local memory usage is irrelevant.

Except that the window is *tiny* because the dquots are walked in
ascending order. This means all the dquots in a buffer are flushed
sequentially, and each dquot lookup resets the reclaim reference
count on the dquot buffer. Hence it is extremely unlikely that a
dquot buffer will be reclaimed while it is partially flushed during
the flush walk.

> I'm specifically trying to address the deadlock problem here. If we
> don't want to bypass reclaim, the other options I see are to rework the
> logic on the reclaim side to check the buffer state before we flush the
> dquot, or push on the buffer from the quotacheck side like we used to
> before the commit above.
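[The trylock/push sequence referred to above, as it existed before
43ff2122e6, looked roughly like this. This is a sketch with approximate
helper names, not the exact historical code:]

```c
/*
 * Old quotacheck flush-walk pattern: if the flush lock is held
 * (e.g. by reclaim), push the backing buffer so the holder's I/O
 * completes and releases the lock, then retry.
 */
while (!dquot_flush_trylock(dqp)) {
	/* Force the queued buffer out to unlock the dquot. */
	push_backing_buffer(dqp);
}
dquot_flush(dqp, buffer_list);
```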

The latter is effectively what flushing the buffer list before the
flush walk is run does.
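In other words, the proposed ordering is roughly the following (a
pseudocode sketch with illustrative helper names, not the actual patch):

```c
/*
 * Submitting the delwri list before the flush walk means I/O
 * completion releases any flush locks taken by reclaim, so the
 * walk cannot deadlock on them.
 */
collect_dquot_buffers(&buffer_list);	/* reset + queue backing buffers */
adjust_usage_counts();			/* bulkstat walk, dirty the dquots */
submit_and_wait(&buffer_list);		/* unlock everything first */
flush_walk(&buffer_list);		/* re-queue the corrected dquots */
submit_and_wait(&buffer_list);		/* final write-out */
```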

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
