From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: use ordered buffers to initialize dquot buffers during quotacheck
Date: Wed, 13 May 2020 08:26:28 -0700	[thread overview]
Message-ID: <20200513152628.GV6714@magnolia> (raw)
In-Reply-To: <20200513131415.GC44225@bfoster>

On Wed, May 13, 2020 at 09:14:15AM -0400, Brian Foster wrote:
> On Tue, May 12, 2020 at 02:00:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > While QAing the new xfs_repair quotacheck code, I uncovered a quota
> > corruption bug resulting from a bad interaction between dquot buffer
> > initialization and quotacheck.  The bug can be reproduced with the
> > following sequence:
> > 
> > # mkfs.xfs -f /dev/sdf
> > # mount /dev/sdf /opt -o usrquota
> > # su nobody -s /bin/bash -c 'touch /opt/barf'
> > # sync
> > # xfs_quota -x -c 'report -ahi' /opt
> > User quota on /opt (/dev/sdf)
> >                         Inodes
> > User ID      Used   Soft   Hard Warn/Grace
> > ---------- ---------------------------------
> > root            3      0      0  00 [------]
> > nobody          1      0      0  00 [------]
> > 
> > # xfs_io -x -c 'shutdown' /opt
> > # umount /opt
> > # mount /dev/sdf /opt -o usrquota
> > # touch /opt/man2
> > # xfs_quota -x -c 'report -ahi' /opt
> > User quota on /opt (/dev/sdf)
> >                         Inodes
> > User ID      Used   Soft   Hard Warn/Grace
> > ---------- ---------------------------------
> > root            1      0      0  00 [------]
> > nobody          1      0      0  00 [------]
> > 
> > # umount /opt
> > 
> > Notice how the initial quotacheck set the root dquot icount to 3
> > (rootino, rbmino, rsumino), but after shutdown -> remount -> recovery,
> > xfs_quota reports that the root dquot has only 1 icount.  We haven't
> > deleted anything from the filesystem, which means that quota is now
> > under-counting.  This behavior is not limited to icount or the root
> > dquot, but this is the shortest reproducer.
> > 
> > I traced the cause of this discrepancy to the way that we handle ondisk
> > dquot updates during quotacheck vs. regular fs activity.  Normally, when
> > we allocate a disk block for a dquot, we log the buffer as a regular
> > (dquot) buffer.  Subsequent updates to the dquots backed by that block
> > are done via separate dquot log item updates, which means that they
> > depend on the logged buffer update being written to disk before the
> > dquot items.  Because individual dquots have their own LSN fields, that
> > initial dquot buffer must always be recovered.
> > 
> > However, the story changes for quotacheck, which can cause dquot block
> > allocations but persists the final dquot counter values via a delwri
> > list.  Because recovery doesn't gate dquot buffer replay on an LSN, this
> > means that the initial dquot buffer can be replayed over the (newer)
> > contents that were delwritten at the end of quotacheck.  In effect, this
> > re-initializes the dquot counters after they've been updated.  If the
> > log does not contain any other dquot items to recover, the obsolete
> > dquot contents will not be corrected by log recovery.
> > 
> > Because quotacheck uses a transaction to log the setting of the CHKD
> > flags in the superblock, we skip quotacheck during the second mount
> > call, which allows the incorrect icount to remain.
> > 
> > Fix this by changing the ondisk dquot initialization function to use
> > ordered buffers to write out fresh dquot blocks if it detects that we're
> > running quotacheck.  If the system goes down before quotacheck can
> > complete, the CHKD flags will not be set in the superblock and the next
> > mount will run quotacheck again, which can fix uninitialized dquot
> > buffers.  This requires amending the defer code to maintain ordered
> > buffer state across defer rolls for the sake of the dquot allocation
> > code.
> > 
> > For regular operations we preserve the current behavior since the dquot
> > items require properly initialized ondisk dquot records.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c |   10 ++++++++-
> >  fs/xfs/xfs_dquot.c        |   51 ++++++++++++++++++++++++++++++++++-----------
> >  2 files changed, 47 insertions(+), 14 deletions(-)
> > 
> ...  
> > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > index 7f39dd24d475..cf8b2f4de587 100644
> > --- a/fs/xfs/xfs_dquot.c
> > +++ b/fs/xfs/xfs_dquot.c
> ...
> > @@ -277,11 +279,34 @@ xfs_qm_init_dquot_blk(
> >  		}
> >  	}
> >  
> > -	xfs_trans_dquot_buf(tp, bp,
> > -			    (type & XFS_DQ_USER ? XFS_BLF_UDQUOT_BUF :
> > -			    ((type & XFS_DQ_PROJ) ? XFS_BLF_PDQUOT_BUF :
> > -			     XFS_BLF_GDQUOT_BUF)));
> > -	xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
> > +	if (type & XFS_DQ_USER) {
> > +		qflag = XFS_UQUOTA_CHKD;
> > +		blftype = XFS_BLF_UDQUOT_BUF;
> > +	} else if (type & XFS_DQ_PROJ) {
> > +		qflag = XFS_PQUOTA_CHKD;
> > +		blftype = XFS_BLF_PDQUOT_BUF;
> > +	} else {
> > +		qflag = XFS_GQUOTA_CHKD;
> > +		blftype = XFS_BLF_GDQUOT_BUF;
> > +	}
> > +
> > +	xfs_trans_dquot_buf(tp, bp, blftype);
> > +
> > +	/*
> > +	 * If the CHKD flag isn't set, we're running quotacheck and need to use
> > +	 * ordered buffers so that the logged initialization buffer does not
> > +	 * get replayed over the delwritten quotacheck buffer.  If we crash
> > +	 * before the end of quotacheck, the CHKD flags will not be set in the
> > +	 * superblock and we'll re-run quotacheck at next mount.
> > +	 *
> > +	 * Outside of quotacheck, dquot updates are logged via dquot items and
> > +	 * we must use the regular buffer logging mechanisms to ensure that the
> > +	 * initial buffer state is recovered before dquot items.
> > +	 */
> > +	if (mp->m_qflags & qflag)
> > +		xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
> > +	else
> > +		xfs_trans_ordered_buf(tp, bp);
> 
> Ok, I think I follow what's going on here. quotacheck runs and allocates
> a dquot block and inits it in a transaction. Some time later quotacheck
> updates dquots in said block and queues the buffer in its own delwri
> list. Quotacheck completes, the buffer is written back and then the
> filesystem immediately crashes. We replay the content of the alloc/init
> transaction over the updated content in the block on disk. This isn't a
> problem outside of quotacheck because a subsequent dquot update would
> have also been logged instead of directly written back.

Correct.

> Therefore, the purpose of the change above is primarily to avoid logging
> the init of the dquot buffer during quotacheck. That makes sense, but
> what about the integrity of this particular transaction? For example,
> what happens if this transaction commits to the on-disk log and we crash
> before quotacheck completes and the buffer is written back? I'm assuming
> recovery would replay the dquot allocation but not the initialization,
> then quotacheck would run again and find the buffer in an unknown state
> (perhaps failing a read verifier?). Hm?

Yes, the read verifier can fail in quotacheck, but quotacheck reacts to
that by re-trying the buffer read with a NULL buffer ops (in
xfs_qm_reset_dqcounts_all).  Next, it calls xfs_qm_reset_dqcounts, which
calls xfs_dqblk_repair to reinitialize the dquot records.  After that,
xfs_qm_reset_dqcounts resets even more of the dquot state (counters,
timers, flags, warns).

In short, quotacheck will fix anything it doesn't like about the dquot
buffers that are attached to the quota inodes.

--D

> Brian
> 
> >  }
> >  
> >  /*
> > 
> 

Thread overview: 8 messages
2020-05-12 21:00 [PATCH] xfs: use ordered buffers to initialize dquot buffers during quotacheck Darrick J. Wong
2020-05-13  0:06 ` Dave Chinner
2020-05-13  0:25   ` Darrick J. Wong
2020-05-13  0:38     ` Dave Chinner
2020-05-13 13:14 ` Brian Foster
2020-05-13 15:26   ` Darrick J. Wong [this message]
2020-05-13 16:12     ` Brian Foster
2020-05-13 17:06       ` Darrick J. Wong