linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 05/22] xfs: scrub in-memory metadata buffers
Date: Mon, 24 Jul 2017 15:36:54 -0700	[thread overview]
Message-ID: <20170724223654.GK4352@magnolia> (raw)
In-Reply-To: <20170724014327.GC17762@dastard>

On Mon, Jul 24, 2017 at 11:43:27AM +1000, Dave Chinner wrote:
> On Thu, Jul 20, 2017 at 09:39:00PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Call the verifier function for all in-memory metadata buffers, looking
> > for memory corruption either due to bad memory or coding bugs.
> 
> How does this fit into the bigger picture? We can't do an exhaustive
> search of the in memory buffer cache, because access is racy w.r.t.
> the life cycle of in memory buffers.
> 
> Also, if we are doing a full scrub, we're going to hit and then
> check the cached in-memory buffers anyway, so I'm missing the
> context that explains why this code is necessary.

Before we start scanning the filesystem (which could lead to clean
buffers being pushed out of memory and later reread), we want to check
the buffers that have been sitting around in memory to see if they've
mutated since the last time the verifiers ran.

> >  #endif	/* __XFS_REPAIR_COMMON_H__ */
> > diff --git a/fs/xfs/scrub/metabufs.c b/fs/xfs/scrub/metabufs.c
> > new file mode 100644
> > index 0000000..63faaa6
> > --- /dev/null
> > +++ b/fs/xfs/scrub/metabufs.c
> > @@ -0,0 +1,177 @@
> > +/*
> > + * Copyright (C) 2017 Oracle.  All Rights Reserved.
> > + *
> > + * Author: Darrick J. Wong <darrick.wong@oracle.com>
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License
> > + * as published by the Free Software Foundation; either version 2
> > + * of the License, or (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation,
> > + * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_format.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_btree.h"
> > +#include "xfs_bit.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_sb.h"
> > +#include "scrub/common.h"
> > +
> > +/* We only iterate buffers one by one, so we don't need any setup. */
> > +int
> > +xfs_scrub_setup_metabufs(
> > +	struct xfs_scrub_context	*sc,
> > +	struct xfs_inode		*ip)
> > +{
> > +	return 0;
> > +}
> > +
> > +#define XFS_SCRUB_METABUFS_TOO_MANY_RETRIES	10
> > +struct xfs_scrub_metabufs_info {
> > +	struct xfs_scrub_context	*sc;
> > +	unsigned int			retries;
> > +};
> 
> So do we get 10 retries per buffer, or 10 retries across an entire
> scan?

Ten per scan.

> > +/* In-memory buffer corruption. */
> > +
> > +#define XFS_SCRUB_BUF_OP_ERROR_GOTO(label) \
> > +	XFS_SCRUB_OP_ERROR_GOTO(smi->sc, \
> > +			xfs_daddr_to_agno(smi->sc->mp, bp->b_bn), \
> > +			xfs_daddr_to_agbno(smi->sc->mp, bp->b_bn), "buf", \
> > +			&error, label)
> 
> Nested macros - yuck!
> 
> > +STATIC int
> > +xfs_scrub_metabufs_scrub_buf(
> > +	struct xfs_scrub_metabufs_info	*smi,
> > +	struct xfs_buf			*bp)
> > +{
> > +	int				olderror;
> > +	int				error = 0;
> > +
> > +	/*
> > +	 * We hold the rcu lock during the rhashtable walk, so we can't risk
> > +	 * having the log forced due to a stale buffer by xfs_buf_lock.
> > +	 */
> > +	if (bp->b_flags & XBF_STALE)
> > +		return 0;
> > +
> > +	atomic_inc(&bp->b_hold);
> 
> This looks wrong. I think it can race with reclaim because we don't
> hold the pag->pag_buf_lock. i.e.  xfs_buf_rele() does this:
> 
> 	release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
> 
> to prevent lookups - which are done under the pag->pag_buf_lock -
> from finding the buffer while it has a zero hold count and may be
> removed from the cache and freed.

I could be misunderstanding rhashtable here -- as I understand it, the
rhashtable_walk_start function calls rcu_read_lock and doesn't release
it until we call rhashtable_walk_stop.  The rhashtable lookup, insert,
and remove functions each call rcu_read_lock before fidding with the
hashtable internals.  I /think/ this means that even if the scrubber
gets ahold of a buffer with zero b_hold that's being xfs_buf_rele'd,
that concurrent xfs_buf_rele will be waiting for rcu_read_lock, and
therefore the buffer cannot be freed until the walk stops.

> Further, if we are going to iterate the cache, I'd much prefer the
> iteration code to be in fs/xfs/xfs_buf.c - nothing outside that file
> should be touching core buffer cache structures....
> 
> FWIW, the LRU walks already handle this problem by the fact the LRU
> owns a hold count on the buffer. So it may be better to do this via
> a LRU walk rather than a hashtable walk....

Yeah, once we clear up the above I'll move it to xfs_buf.c.

> [snip]
> 
> > +	olderror = bp->b_error;
> > +	if (bp->b_fspriv)
> > +		bp->b_ops->verify_write(bp);
> 
> Should we be recalculating the CRC on buffers we aren't about to 
> be writing to disk?

I don't think it causes any harm to recalculate the crc early.  If the
buffer is dirty and corrupt we can't fix it and write out will flag it
and shut down the fs anyway, so it likely doesn't matter anyway.

> Should we be verifying a buffer that has a non-zero error value on it?

No.

> > +	else
> > +		bp->b_ops->verify_read(bp);
> > +	error = bp->b_error;
> > +	bp->b_error = olderror;
> > +
> > +	/* Mark any corruption errors we might find. */
> > +	XFS_SCRUB_BUF_OP_ERROR_GOTO(out_unlock);
> 
> Ah, what? Why does this need a goto? And why doesn't it report the
> error that was found? (bloody macros!).
> 
> > +out_unlock:
> > +	xfs_buf_unlock(bp);
> > +out_dec:
> > +	atomic_dec(&bp->b_hold);
> > +	return error;
> > +}
> > +#undef XFS_SCRUB_BUF_OP_ERROR_GOTO
> 
> Oh, that's the macro defined above the function. Which I paid little
> attention to other than it called another macro. Now I realise that
> it (ab)uses local variables without them being passed into the
> macro. Yup, another reason we need to get rid of the macros from
> this code....

Ok.

> > +	struct xfs_scrub_metabufs_info	*smi,
> > +	struct rhashtable_iter		*iter)
> > +{
> > +	struct xfs_buf			*bp;
> > +	int				error = 0;
> > +
> > +	do {
> > +		if (xfs_scrub_should_terminate(&error))
> > +			break;
> > +
> > +		bp = rhashtable_walk_next(iter);
> > +		if (IS_ERR(bp))
> > +			return PTR_ERR(bp);
> > +		else if (bp == NULL)
> > +			return 0;
> > +
> > +		error = xfs_scrub_metabufs_scrub_buf(smi, bp);
> > +	} while (error != 0);
> > +
> > +	return error;
> > +}
> > +
> > +/* Try to walk the buffers in this AG in order to scrub them. */
> > +int
> > +xfs_scrub_metabufs(
> 
> Ah, please put an "_ag_" in this so it's clear it's only scrubbing a
> single AG. This is hidden deep inside the scrub context, so it took
> me a little bit of back tracking to understand that this wasn't a
> global scan....

Ok.

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2017-07-24 22:36 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-21  4:38 [PATCH v8 00/22] xfs: online scrub support Darrick J. Wong
2017-07-21  4:38 ` [PATCH 01/22] xfs: query the per-AG reservation counters Darrick J. Wong
2017-07-23 16:16   ` Allison Henderson
2017-07-23 22:25   ` Dave Chinner
2017-07-24 19:07     ` Darrick J. Wong
2017-07-21  4:38 ` [PATCH 02/22] xfs: add scrub tracepoints Darrick J. Wong
2017-07-23 16:23   ` Allison Henderson
2017-07-21  4:38 ` [PATCH 03/22] xfs: create an ioctl to scrub AG metadata Darrick J. Wong
2017-07-23 16:37   ` Allison Henderson
2017-07-23 23:45   ` Dave Chinner
2017-07-24 21:14     ` Darrick J. Wong
2017-07-21  4:38 ` [PATCH 04/22] xfs: generic functions to scrub metadata and btrees Darrick J. Wong
2017-07-23 16:40   ` Allison Henderson
2017-07-24  1:05   ` Dave Chinner
2017-07-24 21:58     ` Darrick J. Wong
2017-07-24 23:15       ` Dave Chinner
2017-07-25  0:39         ` Darrick J. Wong
2017-07-21  4:39 ` [PATCH 05/22] xfs: scrub in-memory metadata buffers Darrick J. Wong
2017-07-23 16:48   ` Allison Henderson
2017-07-24  1:43   ` Dave Chinner
2017-07-24 22:36     ` Darrick J. Wong [this message]
2017-07-24 23:38       ` Dave Chinner
2017-07-25  0:14         ` Darrick J. Wong
2017-07-25  3:32           ` Dave Chinner
2017-07-25  5:27             ` Darrick J. Wong
2017-07-21  4:39 ` [PATCH 06/22] xfs: scrub the backup superblocks Darrick J. Wong
2017-07-23 16:50   ` Allison Henderson
2017-07-25  4:05   ` Dave Chinner
2017-07-25  5:42     ` Darrick J. Wong
2017-07-21  4:39 ` [PATCH 07/22] xfs: scrub AGF and AGFL Darrick J. Wong
2017-07-23 16:59   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 08/22] xfs: scrub the AGI Darrick J. Wong
2017-07-23 17:02   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 09/22] xfs: scrub free space btrees Darrick J. Wong
2017-07-23 17:09   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 10/22] xfs: scrub inode btrees Darrick J. Wong
2017-07-23 17:15   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 11/22] xfs: scrub rmap btrees Darrick J. Wong
2017-07-23 17:21   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 12/22] xfs: scrub refcount btrees Darrick J. Wong
2017-07-23 17:25   ` Allison Henderson
2017-07-21  4:39 ` [PATCH 13/22] xfs: scrub inodes Darrick J. Wong
2017-07-23 17:38   ` Allison Henderson
2017-07-24 20:02     ` Darrick J. Wong
2017-07-21  4:40 ` [PATCH 14/22] xfs: scrub inode block mappings Darrick J. Wong
2017-07-23 17:41   ` Allison Henderson
2017-07-24 20:05     ` Darrick J. Wong
2017-07-21  4:40 ` [PATCH 15/22] xfs: scrub directory/attribute btrees Darrick J. Wong
2017-07-23 17:45   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 16/22] xfs: scrub directory metadata Darrick J. Wong
2017-07-23 17:51   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 17/22] xfs: scrub directory freespace Darrick J. Wong
2017-07-23 17:55   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 18/22] xfs: scrub extended attributes Darrick J. Wong
2017-07-23 17:57   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 19/22] xfs: scrub symbolic links Darrick J. Wong
2017-07-23 17:59   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 20/22] xfs: scrub parent pointers Darrick J. Wong
2017-07-23 18:03   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 21/22] xfs: scrub realtime bitmap/summary Darrick J. Wong
2017-07-23 18:05   ` Allison Henderson
2017-07-21  4:40 ` [PATCH 22/22] xfs: scrub quota information Darrick J. Wong
2017-07-23 18:07   ` Allison Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170724223654.GK4352@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).