Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/11] xfs_scrub: fix IO error detection during media verify
@ 2019-09-06  3:37 Darrick J. Wong
  2019-09-06  3:37 ` [PATCH 01/11] xfs_scrub: fix handling of read-verify pool runtime errors Darrick J. Wong
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:37 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

The existing scrub media scan code (phase 6) has a lot of problems
dealing with errors.  It treats runtime errors (e.g. ENOMEM) as disk
errors, it fails to handle short reads properly, it reports entire
extents as failed when the read returns EIO instead of single-stepping
filesystem blocks to find the bad ones, and there's no good way to test
that it works.

This series fixes each of those problems and adds a couple of debugging
knobs so that we can simulate disk errors and short reads to make sure
that the media scan code actually detects media errors correctly.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-error-detection

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 01/11] xfs_scrub: fix handling of read-verify pool runtime errors
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
@ 2019-09-06  3:37 ` Darrick J. Wong
  2019-09-06  3:37 ` [PATCH 02/11] xfs_scrub: abort all read verification work immediately on error Darrick J. Wong
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:37 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix some bogosity with how we handle runtime errors in the read verify
pool functions.  First of all, memory allocation failures shouldn't be
recorded as disk IO errors, they should just complain and abort the
phase.  Second, we need to collect any other runtime errors in the IO
thread and abort the phase instead of silently ignoring them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/read_verify.c |   23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)


diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index b890c92f..00627307 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -53,6 +53,7 @@ struct read_verify_pool {
 	struct disk		*disk;		/* which disk? */
 	read_verify_ioerr_fn_t	ioerr_fn;	/* io error callback */
 	size_t			miniosz;	/* minimum io size, bytes */
+	int			errors_seen;
 };
 
 /*
@@ -91,6 +92,7 @@ read_verify_pool_init(
 	rvp->ctx = ctx;
 	rvp->disk = disk;
 	rvp->ioerr_fn = ioerr_fn;
+	rvp->errors_seen = false;
 	error = ptvar_alloc(submitter_threads, sizeof(struct read_verify),
 			&rvp->rvstate);
 	if (error)
@@ -149,6 +151,7 @@ read_verify(
 	unsigned long long		verified = 0;
 	ssize_t				sz;
 	ssize_t				len;
+	int				ret;
 
 	rvp = (struct read_verify_pool *)wq->wq_ctx;
 	while (rv->io_length > 0) {
@@ -173,7 +176,12 @@ read_verify(
 	}
 
 	free(rv);
-	ptcounter_add(rvp->verified_bytes, verified);
+	ret = ptcounter_add(rvp->verified_bytes, verified);
+	if (ret) {
+		str_liberror(rvp->ctx, ret,
+				_("updating bytes verified counter"));
+		rvp->errors_seen = true;
+	}
 }
 
 /* Queue a read verify request. */
@@ -188,18 +196,25 @@ read_verify_queue(
 	dbg_printf("verify fd %d start %"PRIu64" len %"PRIu64"\n",
 			rvp->disk->d_fd, rv->io_start, rv->io_length);
 
+	/* Worker thread saw a runtime error, don't queue more. */
+	if (rvp->errors_seen)
+		return false;
+
+	/* Otherwise clone the request and queue the copy. */
 	tmp = malloc(sizeof(struct read_verify));
 	if (!tmp) {
-		rvp->ioerr_fn(rvp->ctx, rvp->disk, rv->io_start,
-				rv->io_length, errno, rv->io_end_arg);
-		return true;
+		str_errno(rvp->ctx, _("allocating read-verify request"));
+		rvp->errors_seen = true;
+		return false;
 	}
+
 	memcpy(tmp, rv, sizeof(*tmp));
 
 	ret = workqueue_add(&rvp->wq, read_verify, 0, tmp);
 	if (ret) {
 		str_liberror(rvp->ctx, ret, _("queueing read-verify work"));
 		free(tmp);
+		rvp->errors_seen = true;
 		return false;
 	}
 	rv->io_length = 0;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 02/11] xfs_scrub: abort all read verification work immediately on error
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
  2019-09-06  3:37 ` [PATCH 01/11] xfs_scrub: fix handling of read-verify pool runtime errors Darrick J. Wong
@ 2019-09-06  3:37 ` Darrick J. Wong
  2019-09-06  3:37 ` [PATCH 03/11] xfs_scrub: fix read-verify pool error communication problems Darrick J. Wong
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:37 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a new abort function to the read verify pool code so that the caller
can immediately abort all pending verification work if things start
going wrong.  There's no point in waiting for queued work to run if
we've already decided to bail.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c      |    6 +++---
 scrub/read_verify.c |   10 ++++++++++
 scrub/read_verify.h |    1 +
 3 files changed, 14 insertions(+), 3 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index d9285fee..4c81ee7b 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -514,16 +514,16 @@ _("Could not create data device media verifier."));
 
 out_rtpool:
 	if (vs.rvp_realtime) {
-		read_verify_pool_flush(vs.rvp_realtime);
+		read_verify_pool_abort(vs.rvp_realtime);
 		read_verify_pool_destroy(vs.rvp_realtime);
 	}
 out_logpool:
 	if (vs.rvp_log) {
-		read_verify_pool_flush(vs.rvp_log);
+		read_verify_pool_abort(vs.rvp_log);
 		read_verify_pool_destroy(vs.rvp_log);
 	}
 out_datapool:
-	read_verify_pool_flush(vs.rvp_data);
+	read_verify_pool_abort(vs.rvp_data);
 	read_verify_pool_destroy(vs.rvp_data);
 out_rbad:
 	bitmap_free(&vs.r_bad);
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 00627307..301e9b48 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -117,6 +117,16 @@ read_verify_pool_init(
 	return NULL;
 }
 
+/* Abort all verification work. */
+void
+read_verify_pool_abort(
+	struct read_verify_pool		*rvp)
+{
+	if (!rvp->errors_seen)
+		rvp->errors_seen = ECANCELED;
+	workqueue_terminate(&rvp->wq);
+}
+
 /* Finish up any read verification work. */
 void
 read_verify_pool_flush(
diff --git a/scrub/read_verify.h b/scrub/read_verify.h
index 5fabe5e0..f0ed8902 100644
--- a/scrub/read_verify.h
+++ b/scrub/read_verify.h
@@ -19,6 +19,7 @@ struct read_verify_pool *read_verify_pool_init(struct scrub_ctx *ctx,
 		struct disk *disk, size_t miniosz,
 		read_verify_ioerr_fn_t ioerr_fn,
 		unsigned int submitter_threads);
+void read_verify_pool_abort(struct read_verify_pool *rvp);
 void read_verify_pool_flush(struct read_verify_pool *rvp);
 void read_verify_pool_destroy(struct read_verify_pool *rvp);
 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 03/11] xfs_scrub: fix read-verify pool error communication problems
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
  2019-09-06  3:37 ` [PATCH 01/11] xfs_scrub: fix handling of read-verify pool runtime errors Darrick J. Wong
  2019-09-06  3:37 ` [PATCH 02/11] xfs_scrub: abort all read verification work immediately on error Darrick J. Wong
@ 2019-09-06  3:37 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 04/11] xfs_scrub: fix queue-and-stash of non-contiguous verify requests Darrick J. Wong
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:37 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix all the places in the read-verify pool functions either we fail to
check for runtime errors or fail to communicate them properly to
callers.  Then fix all the callers to report the error messages instead
of hiding them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c      |   89 ++++++++++++++++++++++++++++++++++++---------------
 scrub/read_verify.c |   87 ++++++++++++++++++++++----------------------------
 scrub/read_verify.h |   16 +++++----
 3 files changed, 109 insertions(+), 83 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index 4c81ee7b..f6274a49 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -387,6 +387,7 @@ xfs_check_rmap(
 {
 	struct media_verify_state	*vs = arg;
 	struct read_verify_pool		*rvp;
+	int				ret;
 
 	rvp = xfs_dev_to_pool(ctx, vs, map->fmr_device);
 
@@ -415,28 +416,48 @@ xfs_check_rmap(
 	/* XXX: Filter out directory data blocks. */
 
 	/* Schedule the read verify command for (eventual) running. */
-	read_verify_schedule_io(rvp, map->fmr_physical, map->fmr_length, vs);
+	ret = read_verify_schedule_io(rvp, map->fmr_physical, map->fmr_length,
+			vs);
+	if (ret) {
+		str_liberror(ctx, ret, descr);
+		return false;
+	}
 
 out:
 	/* Is this the last extent?  Fire off the read. */
-	if (map->fmr_flags & FMR_OF_LAST)
-		read_verify_force_io(rvp);
+	if (map->fmr_flags & FMR_OF_LAST) {
+		ret = read_verify_force_io(rvp);
+		if (ret) {
+			str_liberror(ctx, ret, descr);
+			return false;
+		}
+	}
 
 	return true;
 }
 
 /* Wait for read/verify actions to finish, then return # bytes checked. */
-static uint64_t
+static int
 clean_pool(
-	struct read_verify_pool	*rvp)
+	struct read_verify_pool	*rvp,
+	unsigned long long	*bytes_checked)
 {
-	uint64_t		ret;
+	uint64_t		pool_checked;
+	int			ret;
 
 	if (!rvp)
 		return 0;
 
-	read_verify_pool_flush(rvp);
-	ret = read_verify_bytes(rvp);
+	ret = read_verify_pool_flush(rvp);
+	if (ret)
+		goto out_destroy;
+
+	ret = read_verify_bytes(rvp, &pool_checked);
+	if (ret)
+		goto out_destroy;
+
+	*bytes_checked += pool_checked;
+out_destroy:
 	read_verify_pool_destroy(rvp);
 	return ret;
 }
@@ -469,43 +490,57 @@ xfs_scan_blocks(
 		goto out_dbad;
 	}
 
-	vs.rvp_data = read_verify_pool_init(ctx, ctx->datadev,
+	ret = read_verify_pool_alloc(ctx, ctx->datadev,
 			ctx->mnt.fsgeom.blocksize, xfs_check_rmap_ioerr,
-			scrub_nproc(ctx));
-	if (!vs.rvp_data) {
-		str_info(ctx, ctx->mntpoint,
-_("Could not create data device media verifier."));
+			scrub_nproc(ctx), &vs.rvp_data);
+	if (ret) {
+		str_liberror(ctx, ret, _("creating datadev media verifier"));
 		goto out_rbad;
 	}
 	if (ctx->logdev) {
-		vs.rvp_log = read_verify_pool_init(ctx, ctx->logdev,
+		ret = read_verify_pool_alloc(ctx, ctx->logdev,
 				ctx->mnt.fsgeom.blocksize, xfs_check_rmap_ioerr,
-				scrub_nproc(ctx));
-		if (!vs.rvp_log) {
-			str_info(ctx, ctx->mntpoint,
-	_("Could not create log device media verifier."));
+				scrub_nproc(ctx), &vs.rvp_log);
+		if (ret) {
+			str_liberror(ctx, ret,
+					_("creating logdev media verifier"));
 			goto out_datapool;
 		}
 	}
 	if (ctx->rtdev) {
-		vs.rvp_realtime = read_verify_pool_init(ctx, ctx->rtdev,
+		ret = read_verify_pool_alloc(ctx, ctx->rtdev,
 				ctx->mnt.fsgeom.blocksize, xfs_check_rmap_ioerr,
-				scrub_nproc(ctx));
-		if (!vs.rvp_realtime) {
-			str_info(ctx, ctx->mntpoint,
-	_("Could not create realtime device media verifier."));
+				scrub_nproc(ctx), &vs.rvp_realtime);
+		if (ret) {
+			str_liberror(ctx, ret,
+					_("creating rtdev media verifier"));
 			goto out_logpool;
 		}
 	}
 	moveon = xfs_scan_all_spacemaps(ctx, xfs_check_rmap, &vs);
 	if (!moveon)
 		goto out_rtpool;
-	ctx->bytes_checked += clean_pool(vs.rvp_data);
-	ctx->bytes_checked += clean_pool(vs.rvp_log);
-	ctx->bytes_checked += clean_pool(vs.rvp_realtime);
+
+	ret = clean_pool(vs.rvp_data, &ctx->bytes_checked);
+	if (ret) {
+		str_liberror(ctx, ret, _("flushing datadev verify pool"));
+		moveon = false;
+	}
+
+	ret = clean_pool(vs.rvp_log, &ctx->bytes_checked);
+	if (ret) {
+		str_liberror(ctx, ret, _("flushing logdev verify pool"));
+		moveon = false;
+	}
+
+	ret = clean_pool(vs.rvp_realtime, &ctx->bytes_checked);
+	if (ret) {
+		str_liberror(ctx, ret, _("flushing rtdev verify pool"));
+		moveon = false;
+	}
 
 	/* Scan the whole dir tree to see what matches the bad extents. */
-	if (!bitmap_empty(vs.d_bad) || !bitmap_empty(vs.r_bad))
+	if (moveon && (!bitmap_empty(vs.d_bad) || !bitmap_empty(vs.r_bad)))
 		moveon = xfs_report_verify_errors(ctx, &vs);
 
 	bitmap_free(&vs.r_bad);
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 301e9b48..8f80dcaf 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -65,37 +65,37 @@ struct read_verify_pool {
  * @submitter_threads is the number of threads that may be sending verify
  * requests at any given time.
  */
-struct read_verify_pool *
-read_verify_pool_init(
+int
+read_verify_pool_alloc(
 	struct scrub_ctx		*ctx,
 	struct disk			*disk,
 	size_t				miniosz,
 	read_verify_ioerr_fn_t		ioerr_fn,
-	unsigned int			submitter_threads)
+	unsigned int			submitter_threads,
+	struct read_verify_pool		**prvp)
 {
 	struct read_verify_pool		*rvp;
-	bool				ret;
-	int				error;
+	int				ret;
 
 	rvp = calloc(1, sizeof(struct read_verify_pool));
 	if (!rvp)
-		return NULL;
+		return errno;
 
-	error = posix_memalign((void **)&rvp->readbuf, page_size,
+	ret = posix_memalign((void **)&rvp->readbuf, page_size,
 			RVP_IO_MAX_SIZE);
-	if (error || !rvp->readbuf)
+	if (ret)
 		goto out_free;
-	error = ptcounter_alloc(nproc, &rvp->verified_bytes);
-	if (error)
+	ret = ptcounter_alloc(nproc, &rvp->verified_bytes);
+	if (ret)
 		goto out_buf;
 	rvp->miniosz = miniosz;
 	rvp->ctx = ctx;
 	rvp->disk = disk;
 	rvp->ioerr_fn = ioerr_fn;
-	rvp->errors_seen = false;
-	error = ptvar_alloc(submitter_threads, sizeof(struct read_verify),
+	rvp->errors_seen = 0;
+	ret = ptvar_alloc(submitter_threads, sizeof(struct read_verify),
 			&rvp->rvstate);
-	if (error)
+	if (ret)
 		goto out_counter;
 	/* Run in the main thread if we only want one thread. */
 	if (nproc == 1)
@@ -104,7 +104,8 @@ read_verify_pool_init(
 			disk_heads(disk));
 	if (ret)
 		goto out_rvstate;
-	return rvp;
+	*prvp = rvp;
+	return 0;
 
 out_rvstate:
 	ptvar_free(rvp->rvstate);
@@ -114,7 +115,7 @@ read_verify_pool_init(
 	free(rvp->readbuf);
 out_free:
 	free(rvp);
-	return NULL;
+	return ret;
 }
 
 /* Abort all verification work. */
@@ -128,11 +129,11 @@ read_verify_pool_abort(
 }
 
 /* Finish up any read verification work. */
-void
+int
 read_verify_pool_flush(
 	struct read_verify_pool		*rvp)
 {
-	workqueue_terminate(&rvp->wq);
+	return workqueue_terminate(&rvp->wq);
 }
 
 /* Finish up any read verification work and tear it down. */
@@ -187,15 +188,12 @@ read_verify(
 
 	free(rv);
 	ret = ptcounter_add(rvp->verified_bytes, verified);
-	if (ret) {
-		str_liberror(rvp->ctx, ret,
-				_("updating bytes verified counter"));
-		rvp->errors_seen = true;
-	}
+	if (ret)
+		rvp->errors_seen = ret;
 }
 
 /* Queue a read verify request. */
-static bool
+static int
 read_verify_queue(
 	struct read_verify_pool		*rvp,
 	struct read_verify		*rv)
@@ -208,34 +206,33 @@ read_verify_queue(
 
 	/* Worker thread saw a runtime error, don't queue more. */
 	if (rvp->errors_seen)
-		return false;
+		return rvp->errors_seen;
 
 	/* Otherwise clone the request and queue the copy. */
 	tmp = malloc(sizeof(struct read_verify));
 	if (!tmp) {
-		str_errno(rvp->ctx, _("allocating read-verify request"));
-		rvp->errors_seen = true;
-		return false;
+		rvp->errors_seen = errno;
+		return errno;
 	}
 
 	memcpy(tmp, rv, sizeof(*tmp));
 
 	ret = workqueue_add(&rvp->wq, read_verify, 0, tmp);
 	if (ret) {
-		str_liberror(rvp->ctx, ret, _("queueing read-verify work"));
 		free(tmp);
-		rvp->errors_seen = true;
-		return false;
+		rvp->errors_seen = ret;
+		return ret;
 	}
+
 	rv->io_length = 0;
-	return true;
+	return 0;
 }
 
 /*
  * Issue an IO request.  We'll batch subsequent requests if they're
  * within 64k of each other
  */
-bool
+int
 read_verify_schedule_io(
 	struct read_verify_pool		*rvp,
 	uint64_t			start,
@@ -250,7 +247,7 @@ read_verify_schedule_io(
 	assert(rvp->readbuf);
 	rv = ptvar_get(rvp->rvstate, &ret);
 	if (ret)
-		return false;
+		return ret;
 	req_end = start + length;
 	rv_end = rv->io_start + rv->io_length;
 
@@ -277,38 +274,32 @@ read_verify_schedule_io(
 		rv->io_end_arg = end_arg;
 	}
 
-	return true;
+	return 0;
 }
 
 /* Force any stashed IOs into the verifier. */
-bool
+int
 read_verify_force_io(
 	struct read_verify_pool		*rvp)
 {
 	struct read_verify		*rv;
-	bool				moveon;
 	int				ret;
 
 	assert(rvp->readbuf);
 	rv = ptvar_get(rvp->rvstate, &ret);
 	if (ret)
-		return false;
+		return ret;
 	if (rv->io_length == 0)
-		return true;
+		return 0;
 
-	moveon = read_verify_queue(rvp, rv);
-	if (moveon)
-		rv->io_length = 0;
-	return moveon;
+	return read_verify_queue(rvp, rv);
 }
 
 /* How many bytes has this process verified? */
-uint64_t
+int
 read_verify_bytes(
-	struct read_verify_pool		*rvp)
+	struct read_verify_pool		*rvp,
+	uint64_t			*bytes_checked)
 {
-	uint64_t			ret;
-
-	ptcounter_value(rvp->verified_bytes, &ret);
-	return ret;
+	return ptcounter_value(rvp->verified_bytes, bytes_checked);
 }
diff --git a/scrub/read_verify.h b/scrub/read_verify.h
index f0ed8902..650c46d4 100644
--- a/scrub/read_verify.h
+++ b/scrub/read_verify.h
@@ -15,17 +15,17 @@ typedef void (*read_verify_ioerr_fn_t)(struct scrub_ctx *ctx,
 		struct disk *disk, uint64_t start, uint64_t length,
 		int error, void *arg);
 
-struct read_verify_pool *read_verify_pool_init(struct scrub_ctx *ctx,
-		struct disk *disk, size_t miniosz,
-		read_verify_ioerr_fn_t ioerr_fn,
-		unsigned int submitter_threads);
+int read_verify_pool_alloc(struct scrub_ctx *ctx, struct disk *disk,
+		size_t miniosz, read_verify_ioerr_fn_t ioerr_fn,
+		unsigned int submitter_threads,
+		struct read_verify_pool **prvp);
 void read_verify_pool_abort(struct read_verify_pool *rvp);
-void read_verify_pool_flush(struct read_verify_pool *rvp);
+int read_verify_pool_flush(struct read_verify_pool *rvp);
 void read_verify_pool_destroy(struct read_verify_pool *rvp);
 
-bool read_verify_schedule_io(struct read_verify_pool *rvp, uint64_t start,
+int read_verify_schedule_io(struct read_verify_pool *rvp, uint64_t start,
 		uint64_t length, void *end_arg);
-bool read_verify_force_io(struct read_verify_pool *rvp);
-uint64_t read_verify_bytes(struct read_verify_pool *rvp);
+int read_verify_force_io(struct read_verify_pool *rvp);
+int read_verify_bytes(struct read_verify_pool *rvp, uint64_t *bytes);
 
 #endif /* XFS_SCRUB_READ_VERIFY_H_ */


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 04/11] xfs_scrub: fix queue-and-stash of non-contiguous verify requests
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-09-06  3:37 ` [PATCH 03/11] xfs_scrub: fix read-verify pool error communication problems Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 05/11] xfs_scrub: only call read_verify_force_io once per pool Darrick J. Wong
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

read_verify_schedule_io is supposed to have the ability to decide that a
retained aggregate extent verification request is not sufficiently
contiguous with the request that is being scheduled, and therefore it
needs to queue the retained request and use the new request to start
building a new aggregate request.

Unfortunately, it stupidly returns after queueing the IO, so we lose the
incoming request.  Fix the code so we only do that if there's a run time
error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/read_verify.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)


diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 8f80dcaf..980b92b8 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -265,8 +265,13 @@ read_verify_schedule_io(
 		rv->io_length = max(req_end, rv_end) - rv->io_start;
 	} else  {
 		/* Otherwise, issue the stashed IO (if there is one) */
-		if (rv->io_length > 0)
-			return read_verify_queue(rvp, rv);
+		if (rv->io_length > 0) {
+			int	res;
+
+			res = read_verify_queue(rvp, rv);
+			if (res)
+				return res;
+		}
 
 		/* Stash the new IO. */
 		rv->io_start = start;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 05/11] xfs_scrub: only call read_verify_force_io once per pool
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 04/11] xfs_scrub: fix queue-and-stash of non-contiguous verify requests Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 06/11] xfs_scrub: refactor inode prefix rendering code Darrick J. Wong
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

There's no reason we need to call read_verify_force_io every AG; we can
just let the request aggregation code do its thing and push when we're
totally done browsing the fsmap information.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c      |   16 +++++-----------
 scrub/read_verify.c |   26 +++++++++++++++++---------
 2 files changed, 22 insertions(+), 20 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index f6274a49..8063d6ce 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -411,7 +411,7 @@ xfs_check_rmap(
 	 */
 	if (map->fmr_flags & (FMR_OF_PREALLOC | FMR_OF_ATTR_FORK |
 			      FMR_OF_EXTENT_MAP | FMR_OF_SPECIAL_OWNER))
-		goto out;
+		return true;
 
 	/* XXX: Filter out directory data blocks. */
 
@@ -423,16 +423,6 @@ xfs_check_rmap(
 		return false;
 	}
 
-out:
-	/* Is this the last extent?  Fire off the read. */
-	if (map->fmr_flags & FMR_OF_LAST) {
-		ret = read_verify_force_io(rvp);
-		if (ret) {
-			str_liberror(ctx, ret, descr);
-			return false;
-		}
-	}
-
 	return true;
 }
 
@@ -448,6 +438,10 @@ clean_pool(
 	if (!rvp)
 		return 0;
 
+	ret = read_verify_force_io(rvp);
+	if (ret)
+		return ret;
+
 	ret = read_verify_pool_flush(rvp);
 	if (ret)
 		goto out_destroy;
diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 980b92b8..73d30817 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -282,22 +282,30 @@ read_verify_schedule_io(
 	return 0;
 }
 
+/* Force any per-thread stashed IOs into the verifier. */
+static int
+force_one_io(
+	struct ptvar		*ptv,
+	void			*data,
+	void			*foreach_arg)
+{
+	struct read_verify_pool	*rvp = foreach_arg;
+	struct read_verify	*rv = data;
+
+	if (rv->io_length == 0)
+		return 0;
+
+	return read_verify_queue(rvp, rv);
+}
+
 /* Force any stashed IOs into the verifier. */
 int
 read_verify_force_io(
 	struct read_verify_pool		*rvp)
 {
-	struct read_verify		*rv;
-	int				ret;
-
 	assert(rvp->readbuf);
-	rv = ptvar_get(rvp->rvstate, &ret);
-	if (ret)
-		return ret;
-	if (rv->io_length == 0)
-		return 0;
 
-	return read_verify_queue(rvp, rv);
+	return ptvar_foreach(rvp->rvstate, force_one_io, rvp);
 }
 
 /* How many bytes has this process verified? */


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 06/11] xfs_scrub: refactor inode prefix rendering code
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 05/11] xfs_scrub: only call read_verify_force_io once per pool Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 07/11] xfs_scrub: record disk LBA size Darrick J. Wong
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor all the places in the code where we try to render an inode
number as a prefix for some sort of status message.  This will help make
message prefixes more consistent, which should help users to locate
broken metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 scrub/common.h |    6 ++++++
 scrub/inodes.c |    4 ++--
 scrub/phase3.c |    8 ++------
 scrub/phase5.c |    8 ++------
 scrub/phase6.c |    6 +++---
 scrub/scrub.c  |   17 +++++++++--------
 7 files changed, 71 insertions(+), 25 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index 7db47044..a814f568 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -354,3 +354,50 @@ within_range(
 
 	return true;
 }
+
+/*
+ * Render an inode number as both the raw inode number and as an AG number
+ * and AG inode pair.  This is intended for use with status message reporting.
+ * If @format is not NULL, it should provide any desired leading whitespace.
+ *
+ * For example, "inode 287232352 (13/352) <suffix>: <status message>"
+ */
+int
+scrub_render_ino_suffix(
+	const struct scrub_ctx	*ctx,
+	char			*buf,
+	size_t			buflen,
+	uint64_t		ino,
+	uint32_t		gen,
+	const char		*format,
+	...)
+{
+	va_list			args;
+	uint32_t		agno;
+	uint32_t		agino;
+	int			ret;
+
+	agno = cvt_ino_to_agno(&ctx->mnt, ino);
+	agino = cvt_ino_to_agino(&ctx->mnt, ino);
+	ret = snprintf(buf, buflen, _("inode %"PRIu64" (%"PRIu32"/%"PRIu32")"),
+			ino, agno, agino);
+	if (ret < 0 || ret >= buflen || format == NULL)
+		return ret;
+
+	va_start(args, format);
+	ret += vsnprintf(buf + ret, buflen - ret, format, args);
+	va_end(args);
+	return ret;
+}
+
+/* Render an inode number for message reporting with no suffix. */
+int
+scrub_render_ino(
+	const struct scrub_ctx	*ctx,
+	char			*buf,
+	size_t			buflen,
+	uint64_t		ino,
+	uint32_t		gen)
+{
+	return scrub_render_ino_suffix(ctx, buf, buflen, ino, gen, NULL);
+}
diff --git a/scrub/common.h b/scrub/common.h
index 33555891..1b9ad48f 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -86,4 +86,10 @@ bool within_range(struct scrub_ctx *ctx, unsigned long long value,
 		unsigned long long desired, unsigned long long abs_threshold,
 		unsigned int n, unsigned int d, const char *descr);
 
+int scrub_render_ino_suffix(const struct scrub_ctx *ctx, char *buf,
+		size_t buflen, uint64_t ino, uint32_t gen,
+		const char *format, ...);
+int scrub_render_ino(const struct scrub_ctx *ctx, char *buf,
+		size_t buflen, uint64_t ino, uint32_t gen);
+
 #endif /* XFS_SCRUB_COMMON_H_ */
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 37a35a3f..f436beb8 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -159,8 +159,8 @@ xfs_iterate_inodes_ag(
 					ireq->hdr.ino = inogrp->xi_startino;
 					goto igrp_retry;
 				}
-				snprintf(idescr, DESCR_BUFSZ, "inode %"PRIu64,
-						(uint64_t)bs->bs_ino);
+				scrub_render_ino(ctx, idescr, DESCR_BUFSZ,
+						bs->bs_ino, bs->bs_gen);
 				str_info(ctx, idescr,
 _("Changed too many times during scan; giving up."));
 				break;
diff --git a/scrub/phase3.c b/scrub/phase3.c
index 1e908c2c..48bcc21c 100644
--- a/scrub/phase3.c
+++ b/scrub/phase3.c
@@ -48,14 +48,10 @@ xfs_scrub_inode_vfs_error(
 	struct xfs_bulkstat	*bstat)
 {
 	char			descr[DESCR_BUFSZ];
-	xfs_agnumber_t		agno;
-	xfs_agino_t		agino;
 	int			old_errno = errno;
 
-	agno = cvt_ino_to_agno(&ctx->mnt, bstat->bs_ino);
-	agino = cvt_ino_to_agino(&ctx->mnt, bstat->bs_ino);
-	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (%u/%u)"),
-			(uint64_t)bstat->bs_ino, agno, agino);
+	scrub_render_ino(ctx, descr, DESCR_BUFSZ, bstat->bs_ino,
+			bstat->bs_gen);
 	errno = old_errno;
 	str_errno(ctx, descr);
 }
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 99cd51b2..997c88d9 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -234,15 +234,11 @@ xfs_scrub_connections(
 	bool			*pmoveon = arg;
 	char			descr[DESCR_BUFSZ];
 	bool			moveon = true;
-	xfs_agnumber_t		agno;
-	xfs_agino_t		agino;
 	int			fd = -1;
 	int			error;
 
-	agno = cvt_ino_to_agno(&ctx->mnt, bstat->bs_ino);
-	agino = cvt_ino_to_agino(&ctx->mnt, bstat->bs_ino);
-	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (%u/%u)"),
-			(uint64_t)bstat->bs_ino, agno, agino);
+	scrub_render_ino(ctx, descr, DESCR_BUFSZ, bstat->bs_ino,
+			bstat->bs_gen);
 	background_sleep();
 
 	/* Warn about naming problems in xattrs. */
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 8063d6ce..4554af9a 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -180,15 +180,15 @@ xfs_report_verify_inode(
 	int				fd;
 	int				error;
 
-	snprintf(descr, DESCR_BUFSZ, _("inode %"PRIu64" (unlinked)"),
-			(uint64_t)bstat->bs_ino);
-
 	/* Ignore linked files and things we can't open. */
 	if (bstat->bs_nlink != 0)
 		return 0;
 	if (!S_ISREG(bstat->bs_mode) && !S_ISDIR(bstat->bs_mode))
 		return 0;
 
+	scrub_render_ino_suffix(ctx, descr, DESCR_BUFSZ,
+			bstat->bs_ino, bstat->bs_gen, _(" (unlinked)"));
+
 	/* Try to open the inode. */
 	fd = xfs_open_handle(handle);
 	if (fd < 0) {
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 0e30bb2f..9bd6a682 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -26,11 +26,13 @@
 /* Format a scrub description. */
 static void
 format_scrub_descr(
+	struct scrub_ctx		*ctx,
 	char				*buf,
 	size_t				buflen,
-	struct xfs_scrub_metadata	*meta,
-	const struct xfrog_scrub_descr	*sc)
+	struct xfs_scrub_metadata	*meta)
 {
+	const struct xfrog_scrub_descr	*sc = &xfrog_scrubbers[meta->sm_type];
+
 	switch (sc->type) {
 	case XFROG_SCRUB_TYPE_AGHEADER:
 	case XFROG_SCRUB_TYPE_PERAG:
@@ -38,8 +40,9 @@ format_scrub_descr(
 				_(sc->descr));
 		break;
 	case XFROG_SCRUB_TYPE_INODE:
-		snprintf(buf, buflen, _("Inode %"PRIu64" %s"),
-				(uint64_t)meta->sm_ino, _(sc->descr));
+		scrub_render_ino_suffix(ctx, buf, buflen,
+				meta->sm_ino, meta->sm_gen, " %s",
+				_(sc->descr));
 		break;
 	case XFROG_SCRUB_TYPE_FS:
 		snprintf(buf, buflen, _("%s"), _(sc->descr));
@@ -123,8 +126,7 @@ xfs_check_metadata(
 
 	assert(!debug_tweak_on("XFS_SCRUB_NO_KERNEL"));
 	assert(meta->sm_type < XFS_SCRUB_TYPE_NR);
-	format_scrub_descr(buf, DESCR_BUFSZ, meta,
-			&xfrog_scrubbers[meta->sm_type]);
+	format_scrub_descr(ctx, buf, DESCR_BUFSZ, meta);
 
 	dbg_printf("check %s flags %xh\n", buf, meta->sm_flags);
 retry:
@@ -677,8 +679,7 @@ xfs_repair_metadata(
 		return CHECK_RETRY;
 
 	memcpy(&oldm, &meta, sizeof(oldm));
-	format_scrub_descr(buf, DESCR_BUFSZ, &meta,
-			&xfrog_scrubbers[meta.sm_type]);
+	format_scrub_descr(ctx, buf, DESCR_BUFSZ, &meta);
 
 	if (needs_repair(&meta))
 		str_info(ctx, buf, _("Attempting repair."));


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 07/11] xfs_scrub: record disk LBA size
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (5 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 06/11] xfs_scrub: refactor inode prefix rendering code Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 08/11] xfs_scrub: enforce read verify pool minimum io size Darrick J. Wong
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Remember the size (in bytes) of a logical block on the disk.  We'll use
this in subsequent patches to improve the ability of media scans to
report on which files are corrupt.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/disk.c |    7 +++----
 scrub/disk.h |    3 ++-
 2 files changed, 5 insertions(+), 5 deletions(-)


diff --git a/scrub/disk.c b/scrub/disk.c
index dcdd5ba8..d2101cc6 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -193,7 +193,6 @@ disk_open(
 #endif
 	struct disk		*disk;
 	bool			suspicious_disk = false;
-	int			lba_sz;
 	int			error;
 
 	disk = calloc(1, sizeof(struct disk));
@@ -205,10 +204,10 @@ disk_open(
 		goto out_free;
 
 	/* Try to get LBA size. */
-	error = ioctl(disk->d_fd, BLKSSZGET, &lba_sz);
+	error = ioctl(disk->d_fd, BLKSSZGET, &disk->d_lbasize);
 	if (error)
-		lba_sz = 512;
-	disk->d_lbalog = log2_roundup(lba_sz);
+		disk->d_lbasize = 512;
+	disk->d_lbalog = log2_roundup(disk->d_lbasize);
 
 	/* Obtain disk's stat info. */
 	error = fstat(disk->d_fd, &disk->d_sb);
diff --git a/scrub/disk.h b/scrub/disk.h
index 74a26d98..36bfb826 100644
--- a/scrub/disk.h
+++ b/scrub/disk.h
@@ -10,7 +10,8 @@
 struct disk {
 	struct stat	d_sb;
 	int		d_fd;
-	int		d_lbalog;
+	unsigned int	d_lbalog;
+	unsigned int	d_lbasize;	/* bytes */
 	unsigned int	d_flags;
 	unsigned int	d_blksize;	/* bytes */
 	uint64_t	d_size;		/* bytes */


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 08/11] xfs_scrub: enforce read verify pool minimum io size
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (6 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 07/11] xfs_scrub: record disk LBA size Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 09/11] xfs_scrub: return bytes verified from a SCSI VERIFY command Darrick J. Wong
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure we always issue media verification requests aligned to the
minimum IO size that the caller cares about.  Concretely, this means
that we only care about doing IO in filesystem block-sized chunks.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/read_verify.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 73d30817..9d9be68d 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -77,6 +77,15 @@ read_verify_pool_alloc(
 	struct read_verify_pool		*rvp;
 	int				ret;
 
+	/*
+	 * The minimum IO size must be a multiple of the disk sector size
+	 * and a factor of the max io size.
+	 */
+	if (miniosz % disk->d_lbasize)
+		return EINVAL;
+	if (RVP_IO_MAX_SIZE % miniosz)
+		return EINVAL;
+
 	rvp = calloc(1, sizeof(struct read_verify_pool));
 	if (!rvp)
 		return errno;
@@ -245,6 +254,11 @@ read_verify_schedule_io(
 	int				ret;
 
 	assert(rvp->readbuf);
+
+	/* Round up and down to the start of a miniosz chunk. */
+	start &= ~(rvp->miniosz - 1);
+	length = roundup(length, rvp->miniosz);
+
 	rv = ptvar_get(rvp->rvstate, &ret);
 	if (ret)
 		return ret;


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 09/11] xfs_scrub: return bytes verified from a SCSI VERIFY command
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (7 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 08/11] xfs_scrub: enforce read verify pool minimum io size Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 10/11] xfs_scrub: fix read verify disk error handling strategy Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 11/11] xfs_scrub: simulate errors in the read-verify phase Darrick J. Wong
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Since disk_scsi_verify and pread are interchangeably called from
disk_read_verify(), we must return the number of bytes verified (or -1)
just like what pread returns.  This doesn't matter now due to bugs in
scrub, but we're about to fix those bugs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/disk.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/scrub/disk.c b/scrub/disk.c
index d2101cc6..bf9c795a 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -144,7 +144,7 @@ disk_scsi_verify(
 	iohdr.timeout = 30000; /* 30s */
 
 	error = ioctl(disk->d_fd, SG_IO, &iohdr);
-	if (error)
+	if (error < 0)
 		return error;
 
 	dbg_printf("VERIFY(16) fd %d lba %"PRIu64" len %"PRIu64" info %x "
@@ -163,7 +163,7 @@ disk_scsi_verify(
 		return -1;
 	}
 
-	return error;
+	return blockcount << BBSHIFT;
 }
 #else
 # define disk_scsi_verify(...)		(ENOTTY)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 10/11] xfs_scrub: fix read verify disk error handling strategy
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (8 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 09/11] xfs_scrub: return bytes verified from a SCSI VERIFY command Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  2019-09-06  3:38 ` [PATCH 11/11] xfs_scrub: simulate errors in the read-verify phase Darrick J. Wong
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The error handling strategy for media errors is totally bogus.  First of
all, short reads are entirely unhandled -- when we encounter a short
read, we know the disk was able to feed us the beginning of what we
asked for, so we need to single-step through the remainder to try to
capture the exact error that we hit.

Second, an actual IO error causes the entire region to be marked bad
even though it could be just a few MB of a multi-gigabyte extent that's
bad.  Therefore, single-step each block in the IO request until we stop
getting IO errors to find out if all the blocks are bad or if it's just
that extent.

Third, fix the fact that the loop updates its own counter variables with
the length fed to read(), which doesn't necessarily have anything to do
with the amount of data that the read actually produced.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/read_verify.c |   86 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 74 insertions(+), 12 deletions(-)


diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 9d9be68d..3dac10ce 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -169,30 +169,92 @@ read_verify(
 	struct read_verify		*rv = arg;
 	struct read_verify_pool		*rvp;
 	unsigned long long		verified = 0;
+	ssize_t				io_max_size;
 	ssize_t				sz;
 	ssize_t				len;
+	int				io_error;
 	int				ret;
 
 	rvp = (struct read_verify_pool *)wq->wq_ctx;
+	if (rvp->errors_seen)
+		return;
+
+	io_max_size = RVP_IO_MAX_SIZE;
+
 	while (rv->io_length > 0) {
-		len = min(rv->io_length, RVP_IO_MAX_SIZE);
+		io_error = 0;
+		len = min(rv->io_length, io_max_size);
 		dbg_printf("diskverify %d %"PRIu64" %zu\n", rvp->disk->d_fd,
 				rv->io_start, len);
 		sz = disk_read_verify(rvp->disk, rvp->readbuf, rv->io_start,
 				len);
-		if (sz < 0) {
-			dbg_printf("IOERR %d %"PRIu64" %zu\n",
-					rvp->disk->d_fd, rv->io_start, len);
-			/* IO error, so try the next logical block. */
-			len = rvp->miniosz;
-			rvp->ioerr_fn(rvp->ctx, rvp->disk, rv->io_start, len,
-					errno, rv->io_end_arg);
+		if (sz == len && io_max_size < rvp->miniosz) {
+			/*
+			 * If the verify request was 100% successful and less
+			 * than a single block in length, we were trying to
+			 * read to the end of a block after a short read.  That
+			 * suggests there's something funny with this device,
+			 * so single-step our way through the rest of the @rv
+			 * range.
+			 */
+			io_max_size = rvp->miniosz;
+		} else if (sz < 0) {
+			io_error = errno;
+
+			/* Runtime error, bail out... */
+			if (io_error != EIO && io_error != EILSEQ) {
+				rvp->errors_seen = io_error;
+				return;
+			}
+
+			/*
+			 * A direct read encountered an error while performing
+			 * a multi-block read.  Reduce the transfer size to a
+			 * single block so that we can identify the exact range
+			 * of bad blocks and good blocks.  We single-step all
+			 * the way to the end of the @rv range, (re)starting
+			 * with the block that just failed.
+			 */
+			if (io_max_size > rvp->miniosz) {
+				io_max_size = rvp->miniosz;
+				continue;
+			}
+
+			/*
+			 * A direct read hit an error while we were stepping
+			 * through single blocks.  Mark everything bad from
+			 * io_start to the next miniosz block.
+			 */
+			sz = rvp->miniosz - (rv->io_start % rvp->miniosz);
+			dbg_printf("IOERR %d @ %"PRIu64" %zu err %d\n",
+					rvp->disk->d_fd, rv->io_start, sz,
+					io_error);
+			rvp->ioerr_fn(rvp->ctx, rvp->disk, rv->io_start, sz,
+					io_error, rv->io_end_arg);
+		} else if (sz < len) {
+			/*
+			 * A short direct read suggests that we might have hit
+			 * an IO error midway through the read but still had to
+			 * return the number of bytes that were actually read.
+			 *
+			 * We need to force an EIO, so try reading the rest of
+			 * the block (if it was a partial block read) or the
+			 * next full block.
+			 */
+			io_max_size = rvp->miniosz - (sz % rvp->miniosz);
+			dbg_printf("SHORT %d READ @ %"PRIu64" %zu try for %zd\n",
+					rvp->disk->d_fd, rv->io_start, sz,
+					io_max_size);
+		} else {
+			/* We should never get back more bytes than we asked. */
+			assert(sz == len);
 		}
 
-		progress_add(len);
-		verified += len;
-		rv->io_start += len;
-		rv->io_length -= len;
+		progress_add(sz);
+		if (io_error == 0)
+			verified += sz;
+		rv->io_start += sz;
+		rv->io_length -= sz;
 	}
 
 	free(rv);


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 11/11] xfs_scrub: simulate errors in the read-verify phase
  2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
                   ` (9 preceding siblings ...)
  2019-09-06  3:38 ` [PATCH 10/11] xfs_scrub: fix read verify disk error handling strategy Darrick J. Wong
@ 2019-09-06  3:38 ` Darrick J. Wong
  10 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Add a debugging hook so that we can simulate disk errors during the
media scan to test that the code works.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/disk.c      |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 scrub/xfs_scrub.c |    2 ++
 2 files changed, 69 insertions(+)


diff --git a/scrub/disk.c b/scrub/disk.c
index bf9c795a..214a5346 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -276,6 +276,59 @@ disk_close(
 #define LBASIZE(d)		(1ULL << (d)->d_lbalog)
 #define BTOLBA(d, bytes)	(((uint64_t)(bytes) + LBASIZE(d) - 1) >> (d)->d_lbalog)
 
+/* Simulate disk errors. */
+static int
+disk_simulate_read_error(
+	struct disk		*disk,
+	uint64_t		start,
+	uint64_t		*length)
+{
+	static int64_t		interval;
+	uint64_t		start_interval;
+
+	/* Simulated disk errors are disabled. */
+	if (interval < 0)
+		return 0;
+
+	/* Figure out the disk read error interval. */
+	if (interval == 0) {
+		char		*p;
+
+		/* Pretend there's bad media every so often, in bytes. */
+		p = getenv("XFS_SCRUB_DISK_ERROR_INTERVAL");
+		if (p == NULL) {
+			interval = -1;
+			return 0;
+		}
+		interval = strtoull(p, NULL, 10);
+		interval &= ~((1U << disk->d_lbalog) - 1);
+	}
+
+	/*
+	 * We simulate disk errors by pretending that there are media errors at
+	 * predetermined intervals across the disk.  If a read verify request
+	 * crosses one of those intervals we shorten it so that the next read
+	 * will start on an interval threshold.  If the read verify request
+	 * starts on an interval threshold, we send back EIO as if it had
+	 * failed.
+	 */
+	if ((start % interval) == 0) {
+		dbg_printf("fd %d: simulating disk error at %"PRIu64".\n",
+				disk->d_fd, start);
+		return EIO;
+	}
+
+	start_interval = start / interval;
+	if (start_interval != (start + *length) / interval) {
+		*length = ((start_interval + 1) * interval) - start;
+		dbg_printf(
+"fd %d: simulating short read at %"PRIu64" to length %"PRIu64".\n",
+				disk->d_fd, start, *length);
+	}
+
+	return 0;
+}
+
 /* Read-verify an extent of a disk device. */
 ssize_t
 disk_read_verify(
@@ -284,6 +337,20 @@ disk_read_verify(
 	uint64_t		start,
 	uint64_t		length)
 {
+	if (debug) {
+		int		ret;
+
+		ret = disk_simulate_read_error(disk, start, &length);
+		if (ret) {
+			errno = ret;
+			return -1;
+		}
+
+		/* Don't actually issue the IO */
+		if (getenv("XFS_SCRUB_DISK_VERIFY_SKIP"))
+			return length;
+	}
+
 	/* Convert to logical block size. */
 	if (disk->d_flags & DISK_FLAG_SCSI_VERIFY)
 		return disk_scsi_verify(disk, BTOLBAT(disk, start),
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 05478093..b6a01274 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -111,6 +111,8 @@
  * XFS_SCRUB_NO_SCSI_VERIFY	-- disable SCSI VERIFY (if present)
  * XFS_SCRUB_PHASE		-- run only this scrub phase
  * XFS_SCRUB_THREADS		-- start exactly this number of threads
+ * XFS_SCRUB_DISK_ERROR_INTERVAL-- simulate a disk error every this many bytes
+ * XFS_SCRUB_DISK_VERIFY_SKIP	-- pretend disk verify read calls succeeded
  *
  * Available even in non-debug mode:
  * SERVICE_MODE			-- compress all error codes to 1 for LSB


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 00/11] xfs_scrub: fix IO error detection during media verify
@ 2019-09-25 21:34 Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-09-25 21:34 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

The existing scrub media scan code (phase 6) has a lot of problems
dealing with errors.  It treats runtime errors (e.g. ENOMEM) as disk
errors, it fails to handle short reads properly, it reports entire
extents as failed when the read returns EIO instead of single-stepping
filesystem blocks to find the bad ones, and there's no good way to test
that it works.

This series fixes each of those problems and adds a couple of debugging
knobs so that we can simulate disk errors and short reads to make sure
that the media scan code actually detects media errors correctly.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-error-detection

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 00/11] xfs_scrub: fix IO error detection during media verify
@ 2019-08-26 21:29 Darrick J. Wong
  0 siblings, 0 replies; 14+ messages in thread
From: Darrick J. Wong @ 2019-08-26 21:29 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

The existing scrub media scan code (phase 6) has a lot of problems
dealing with errors.  It treats runtime errors (e.g. ENOMEM) as disk
errors, it fails to handle short reads properly, it reports entire
extents as failed when the read returns EIO instead of single-stepping
filesystem blocks to find the bad ones, and there's no good way to test
that it works.

This series fixes each of those problems and adds a couple of debugging
knobs so that we can simulate disk errors and short reads to make sure
that the media scan code actually detects media errors correctly.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-error-detection

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, back to index

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-06  3:37 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
2019-09-06  3:37 ` [PATCH 01/11] xfs_scrub: fix handling of read-verify pool runtime errors Darrick J. Wong
2019-09-06  3:37 ` [PATCH 02/11] xfs_scrub: abort all read verification work immediately on error Darrick J. Wong
2019-09-06  3:37 ` [PATCH 03/11] xfs_scrub: fix read-verify pool error communication problems Darrick J. Wong
2019-09-06  3:38 ` [PATCH 04/11] xfs_scrub: fix queue-and-stash of non-contiguous verify requests Darrick J. Wong
2019-09-06  3:38 ` [PATCH 05/11] xfs_scrub: only call read_verify_force_io once per pool Darrick J. Wong
2019-09-06  3:38 ` [PATCH 06/11] xfs_scrub: refactor inode prefix rendering code Darrick J. Wong
2019-09-06  3:38 ` [PATCH 07/11] xfs_scrub: record disk LBA size Darrick J. Wong
2019-09-06  3:38 ` [PATCH 08/11] xfs_scrub: enforce read verify pool minimum io size Darrick J. Wong
2019-09-06  3:38 ` [PATCH 09/11] xfs_scrub: return bytes verified from a SCSI VERIFY command Darrick J. Wong
2019-09-06  3:38 ` [PATCH 10/11] xfs_scrub: fix read verify disk error handling strategy Darrick J. Wong
2019-09-06  3:38 ` [PATCH 11/11] xfs_scrub: simulate errors in the read-verify phase Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2019-09-25 21:34 [PATCH 00/11] xfs_scrub: fix IO error detection during media verify Darrick J. Wong
2019-08-26 21:29 Darrick J. Wong

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org linux-xfs@archiver.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox