linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] xfs_scrub: clean up error classifications
@ 2019-10-22 18:48 Darrick J. Wong
  2019-10-22 18:48 ` [PATCH 1/7] xfs_scrub: fix misclassified error reporting Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:48 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

In reviewing how xfs_scrub classifies errors, I noticed that there were
numerous things that could be improved.

First, I fixed some of the error reports that were simply in the wrong
category.

Next, I cleaned up some of the internal logic and error class tables in
preparation for future patches.

Then, I noticed that corruptions and some of the runtime errors were
being munged into a single errors_found counter.  I made this the
explicit corruptions counter, which can be incremented via
str_corrupt().  I then changed str_error() to mean runtime errors.

Having done that, I was then able to reclassify some str_info calls that
are really reporting runtime errors and therefore should use str_error.

Finally, I introduce a new category of errors that are unfixable by
scrub, and assign to this class all the media errors since there's
nothing XFS can do.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-reclassify-errors

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] xfs_scrub: fix misclassified error reporting
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
@ 2019-10-22 18:48 ` Darrick J. Wong
  2019-11-01 20:59   ` Eric Sandeen
  2019-10-22 18:49 ` [PATCH 2/7] xfs_scrub: simplify post-run reporting logic Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:48 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix a few places where we assign error reports to the wrong
classification.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/fscounters.c |    2 +-
 scrub/inodes.c     |    4 ++--
 scrub/phase1.c     |    2 +-
 scrub/xfs_scrub.c  |    2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)


diff --git a/scrub/fscounters.c b/scrub/fscounters.c
index 98aa3826..e064c865 100644
--- a/scrub/fscounters.c
+++ b/scrub/fscounters.c
@@ -48,7 +48,7 @@ xfs_count_inodes_ag(
 
 	ireq = xfrog_inumbers_alloc_req(64, 0);
 	if (!ireq) {
-		str_info(ctx, descr, _("Insufficient memory; giving up."));
+		str_liberror(ctx, ENOMEM, _("allocating inumbers request"));
 		return false;
 	}
 	xfrog_inumbers_set_ag(ireq, agno);
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 7c04d7f6..71e53bb6 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -126,13 +126,13 @@ xfs_iterate_inodes_ag(
 
 	breq = xfrog_bulkstat_alloc_req(XFS_INODES_PER_CHUNK, 0);
 	if (!breq) {
-		str_info(ctx, descr, _("Insufficient memory; giving up."));
+		str_liberror(ctx, ENOMEM, _("allocating bulkstat request"));
 		return false;
 	}
 
 	ireq = xfrog_inumbers_alloc_req(1, 0);
 	if (!ireq) {
-		str_info(ctx, descr, _("Insufficient memory; giving up."));
+		str_liberror(ctx, ENOMEM, _("allocating inumbers request"));
 		free(breq);
 		return false;
 	}
diff --git a/scrub/phase1.c b/scrub/phase1.c
index 3211a488..d040c4a8 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -127,7 +127,7 @@ _("Not an XFS filesystem."));
 
 	if (!xfs_action_lists_alloc(ctx->mnt.fsgeom.agcount,
 				&ctx->action_lists)) {
-		str_error(ctx, ctx->mntpoint, _("Not enough memory."));
+		str_liberror(ctx, ENOMEM, _("allocating action lists"));
 		return false;
 	}
 
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 147c114c..e9fc3650 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -738,7 +738,7 @@ main(
 		str_info(&ctx, ctx.mntpoint, _("Too many errors; aborting."));
 
 	if (debug_tweak_on("XFS_SCRUB_FORCE_ERROR"))
-		str_error(&ctx, ctx.mntpoint, _("Injecting error."));
+		str_info(&ctx, ctx.mntpoint, _("Injecting error."));
 
 	/* Clean up scan data. */
 	moveon = xfs_cleanup_fs(&ctx);


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] xfs_scrub: simplify post-run reporting logic
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
  2019-10-22 18:48 ` [PATCH 1/7] xfs_scrub: fix misclassified error reporting Darrick J. Wong
@ 2019-10-22 18:49 ` Darrick J. Wong
  2019-11-01 21:05   ` Eric Sandeen
  2019-10-22 18:49 ` [PATCH 3/7] xfs_scrub: clean up error level table Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:49 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Simplify the post-run error and warning reporting logic so that in
subsequent patches we can be more specific about what types of things
went wrong.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/xfs_scrub.c |   18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)


diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index e9fc3650..c7305694 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -516,24 +516,20 @@ report_outcome(
 	total_errors = ctx->errors_found + ctx->runtime_errors;
 
 	if (total_errors == 0 && ctx->warnings_found == 0) {
-		log_info(ctx, _("No errors found."));
+		log_info(ctx, _("No problems found."));
 		return;
 	}
 
-	if (total_errors == 0) {
-		fprintf(stderr, _("%s: warnings found: %llu\n"), ctx->mntpoint,
-				ctx->warnings_found);
-		log_warn(ctx, _("warnings found: %llu"), ctx->warnings_found);
-	} else if (ctx->warnings_found == 0) {
+	if (total_errors > 0) {
 		fprintf(stderr, _("%s: errors found: %llu\n"), ctx->mntpoint,
 				total_errors);
 		log_err(ctx, _("errors found: %llu"), total_errors);
-	} else {
-		fprintf(stderr, _("%s: errors found: %llu; warnings found: %llu\n"),
-				ctx->mntpoint, total_errors,
+	}
+
+	if (ctx->warnings_found > 0) {
+		fprintf(stderr, _("%s: warnings found: %llu\n"), ctx->mntpoint,
 				ctx->warnings_found);
-		log_err(ctx, _("errors found: %llu; warnings found: %llu"),
-				total_errors, ctx->warnings_found);
+		log_warn(ctx, _("warnings found: %llu"), ctx->warnings_found);
 	}
 
 	/*


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] xfs_scrub: clean up error level table
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
  2019-10-22 18:48 ` [PATCH 1/7] xfs_scrub: fix misclassified error reporting Darrick J. Wong
  2019-10-22 18:49 ` [PATCH 2/7] xfs_scrub: simplify post-run reporting logic Darrick J. Wong
@ 2019-10-22 18:49 ` Darrick J. Wong
  2019-11-01 21:11   ` Eric Sandeen
  2019-10-22 18:49 ` [PATCH 4/7] xfs_scrub: explicitly track corruptions, not just errors Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:49 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Rework the error levels table in preparation for adding a few more error
categories that won't fit on a single line.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c |   25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index 7632a8d8..90fbad64 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -46,11 +46,26 @@ static struct {
 	const char *string;
 	int loglevel;
 } err_levels[] = {
-	[S_ERROR]  = { .string = "Error",	.loglevel = LOG_ERR },
-	[S_WARN]   = { .string = "Warning",	.loglevel = LOG_WARNING },
-	[S_REPAIR] = { .string = "Repaired",	.loglevel = LOG_INFO },
-	[S_INFO]   = { .string = "Info",	.loglevel = LOG_INFO },
-	[S_PREEN]  = { .string = "Optimized",	.loglevel = LOG_INFO }
+	[S_ERROR]  = {
+		.string = "Error",
+		.loglevel = LOG_ERR,
+	},
+	[S_WARN]   = {
+		.string = "Warning",
+		.loglevel = LOG_WARNING,
+	},
+	[S_REPAIR] = {
+		.string = "Repaired",
+		.loglevel = LOG_INFO,
+	},
+	[S_INFO]   = {
+		.string = "Info",
+		.loglevel = LOG_INFO,
+	},
+	[S_PREEN]  = {
+		.string = "Optimized",
+		.loglevel = LOG_INFO,
+	},
 };
 
 /* If stream is a tty, clear to end of line to clean up progress bar. */


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] xfs_scrub: explicitly track corruptions, not just errors
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-10-22 18:49 ` [PATCH 3/7] xfs_scrub: clean up error level table Darrick J. Wong
@ 2019-10-22 18:49 ` Darrick J. Wong
  2019-11-01 21:15   ` Eric Sandeen
  2019-10-22 18:49 ` [PATCH 5/7] xfs_scrub: promote some of the str_info to str_error calls Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:49 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Rename the @errors_found variable to @corruptions_found to make it
more explicit that we're tracking fs corruption issues.  Add a new
str_corrupt() function to handle communications that fall under this new
corruption classification.  str_error() now exists to log runtime errors
that do not have an associated errno code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |   12 ++++++++----
 scrub/common.h    |    3 +++
 scrub/phase4.c    |    2 +-
 scrub/phase5.c    |    2 +-
 scrub/phase6.c    |    6 +++---
 scrub/scrub.c     |    6 +++---
 scrub/xfs_scrub.c |   20 ++++++++++++++------
 scrub/xfs_scrub.h |    2 +-
 8 files changed, 34 insertions(+), 19 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index 90fbad64..b1c6abd1 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -36,7 +36,7 @@ xfs_scrub_excessive_errors(
 	bool			ret;
 
 	pthread_mutex_lock(&ctx->lock);
-	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
+	ret = ctx->max_errors > 0 && ctx->corruptions_found >= ctx->max_errors;
 	pthread_mutex_unlock(&ctx->lock);
 
 	return ret;
@@ -50,6 +50,10 @@ static struct {
 		.string = "Error",
 		.loglevel = LOG_ERR,
 	},
+	[S_CORRUPT] = {
+		.string = "Corruption",
+		.loglevel = LOG_ERR,
+	},
 	[S_WARN]   = {
 		.string = "Warning",
 		.loglevel = LOG_WARNING,
@@ -121,10 +125,10 @@ __str_out(
 		fflush(stream);
 
 out_record:
-	if (error)      /* A syscall failed */
+	if (error || level == S_ERROR)      /* A syscall failed */
 		ctx->runtime_errors++;
-	else if (level == S_ERROR)
-		ctx->errors_found++;
+	else if (level == S_CORRUPT)
+		ctx->corruptions_found++;
 	else if (level == S_WARN)
 		ctx->warnings_found++;
 	else if (level == S_REPAIR)
diff --git a/scrub/common.h b/scrub/common.h
index ef4cf439..b1f2ea2c 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -17,6 +17,7 @@ bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
 
 enum error_level {
 	S_ERROR	= 0,
+	S_CORRUPT,
 	S_WARN,
 	S_INFO,
 	S_REPAIR,
@@ -30,6 +31,8 @@ void __str_out(struct scrub_ctx *ctx, const char *descr, enum error_level level,
 	__str_out(ctx, str, S_ERROR,	errno,	__FILE__, __LINE__, NULL)
 #define str_liberror(ctx, error, str) \
 	__str_out(ctx, str, S_ERROR,	error,	__FILE__, __LINE__, NULL)
+#define str_corrupt(ctx, str, ...) \
+	__str_out(ctx, str, S_CORRUPT,	0,	__FILE__, __LINE__, __VA_ARGS__)
 #define str_error(ctx, str, ...) \
 	__str_out(ctx, str, S_ERROR,	0,	__FILE__, __LINE__, __VA_ARGS__)
 #define str_warn(ctx, str, ...) \
diff --git a/scrub/phase4.c b/scrub/phase4.c
index eb30c189..1cf3f6b7 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -99,7 +99,7 @@ xfs_process_action_items(
 	workqueue_destroy(&wq);
 
 	pthread_mutex_lock(&ctx->lock);
-	if (moveon && ctx->errors_found == 0 && want_fstrim) {
+	if (moveon && ctx->corruptions_found == 0 && want_fstrim) {
 		fstrim(ctx);
 		progress_add(1);
 	}
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 27941907..dc0ee5e8 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -336,7 +336,7 @@ xfs_scan_connections(
 	bool			moveon = true;
 	bool			ret;
 
-	if (ctx->errors_found) {
+	if (ctx->corruptions_found) {
 		str_info(ctx, ctx->mntpoint,
 _("Filesystem has errors, skipping connectivity checks."));
 		return true;
diff --git a/scrub/phase6.c b/scrub/phase6.c
index fccd18e9..bb159641 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -233,7 +233,7 @@ _("found unexpected realtime attr fork extent."));
 	}
 
 	if (bitmap_test(bmp, bmap->bm_physical, bmap->bm_length))
-		str_error(ctx, descr,
+		str_corrupt(ctx, descr,
 _("media error in extended attribute data."));
 
 	return true;
@@ -389,7 +389,7 @@ report_ioerr_fsmap(
 		snprintf(buf, DESCR_BUFSZ, _("disk offset %"PRIu64),
 				(uint64_t)map->fmr_physical + err_off);
 		type = xfs_decode_special_owner(map->fmr_owner);
-		str_error(ctx, buf, _("media error in %s."), type);
+		str_corrupt(ctx, buf, _("media error in %s."), type);
 	}
 
 	/* Report extent maps */
@@ -400,7 +400,7 @@ report_ioerr_fsmap(
 				map->fmr_owner, 0, " %s",
 				attr ? _("extended attribute") :
 				       _("file data"));
-		str_error(ctx, buf, _("media error in extent map"));
+		str_corrupt(ctx, buf, _("media error in extent map"));
 	}
 
 	/*
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 0293ce30..75a64efa 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -188,7 +188,7 @@ _("Kernel bug!  errno=%d"), code);
 	 */
 	if (is_corrupt(meta) || xref_disagrees(meta)) {
 		if (ctx->mode < SCRUB_MODE_REPAIR) {
-			str_error(ctx, buf,
+			str_corrupt(ctx, buf,
 _("Repairs are required."));
 			return CHECK_DONE;
 		}
@@ -727,7 +727,7 @@ _("Filesystem is shut down, aborting."));
 			/* fall through */
 		case EINVAL:
 			/* Kernel doesn't know how to repair this? */
-			str_error(ctx, buf,
+			str_corrupt(ctx, buf,
 _("Don't know how to fix; offline repair required."));
 			return CHECK_DONE;
 		case EROFS:
@@ -768,7 +768,7 @@ _("Read-only filesystem; cannot make changes."));
 		 */
 		if (!(repair_flags & XRM_COMPLAIN_IF_UNFIXED))
 			return CHECK_RETRY;
-		str_error(ctx, buf,
+		str_corrupt(ctx, buf,
 _("Repair unsuccessful; offline repair required."));
 	} else {
 		/* Clean operation, no corruption detected. */
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index c7305694..222daae1 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -513,17 +513,25 @@ report_outcome(
 {
 	unsigned long long	total_errors;
 
-	total_errors = ctx->errors_found + ctx->runtime_errors;
+	total_errors = ctx->corruptions_found + ctx->runtime_errors;
 
 	if (total_errors == 0 && ctx->warnings_found == 0) {
 		log_info(ctx, _("No problems found."));
 		return;
 	}
 
-	if (total_errors > 0) {
-		fprintf(stderr, _("%s: errors found: %llu\n"), ctx->mntpoint,
-				total_errors);
-		log_err(ctx, _("errors found: %llu"), total_errors);
+	if (ctx->corruptions_found > 0) {
+		fprintf(stderr, _("%s: corruptions found: %llu\n"),
+				ctx->mntpoint, ctx->corruptions_found);
+		log_err(ctx, _("corruptions found: %llu"),
+				ctx->corruptions_found);
+	}
+
+	if (ctx->runtime_errors > 0) {
+		fprintf(stderr, _("%s: operational errors found: %llu\n"),
+				ctx->mntpoint, ctx->runtime_errors);
+		log_err(ctx, _("operational errors found: %llu"),
+				ctx->runtime_errors);
 	}
 
 	if (ctx->warnings_found > 0) {
@@ -745,7 +753,7 @@ main(
 	report_modifications(&ctx);
 	report_outcome(&ctx);
 
-	if (ctx.errors_found) {
+	if (ctx.corruptions_found) {
 		if (ctx.error_action == ERRORS_SHUTDOWN)
 			xfs_shutdown_fs(&ctx);
 		ret |= SCRUB_RET_CORRUPT;
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 37d78f61..5abc41fd 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -73,7 +73,7 @@ struct scrub_ctx {
 	struct xfs_action_list	*action_lists;
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
-	unsigned long long	errors_found;
+	unsigned long long	corruptions_found;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
 	unsigned long long	bytes_checked;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] xfs_scrub: promote some of the str_info to str_error calls
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-10-22 18:49 ` [PATCH 4/7] xfs_scrub: explicitly track corruptions, not just errors Darrick J. Wong
@ 2019-10-22 18:49 ` Darrick J. Wong
  2019-11-01 21:21   ` Eric Sandeen
  2019-10-22 18:49 ` [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors Darrick J. Wong
  2019-10-22 18:49 ` [PATCH 7/7] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
  6 siblings, 1 reply; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:49 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Now that str_error is only for runtime errors, we can promote a few of
the str_info calls that report runtime errors to str_error.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase1.c |   10 +++++-----
 scrub/scrub.c  |   11 +++++------
 2 files changed, 10 insertions(+), 11 deletions(-)


diff --git a/scrub/phase1.c b/scrub/phase1.c
index d040c4a8..0ae368ff 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -88,7 +88,7 @@ xfs_setup_fs(
 			O_RDONLY | O_NOATIME | O_DIRECTORY);
 	if (error) {
 		if (error == EPERM)
-			str_info(ctx, ctx->mntpoint,
+			str_error(ctx, ctx->mntpoint,
 _("Must be root to run scrub."));
 		else if (error == ENOTTY)
 			str_error(ctx, ctx->mntpoint,
@@ -143,26 +143,26 @@ _("Not an XFS filesystem."));
 	    !xfs_can_scrub_bmap(ctx) || !xfs_can_scrub_dir(ctx) ||
 	    !xfs_can_scrub_attr(ctx) || !xfs_can_scrub_symlink(ctx) ||
 	    !xfs_can_scrub_parent(ctx)) {
-		str_info(ctx, ctx->mntpoint,
+		str_error(ctx, ctx->mntpoint,
 _("Kernel metadata scrubbing facility is not available."));
 		return false;
 	}
 
 	/* Do we need kernel-assisted metadata repair? */
 	if (ctx->mode != SCRUB_MODE_DRY_RUN && !xfs_can_repair(ctx)) {
-		str_info(ctx, ctx->mntpoint,
+		str_error(ctx, ctx->mntpoint,
 _("Kernel metadata repair facility is not available.  Use -n to scrub."));
 		return false;
 	}
 
 	/* Did we find the log and rt devices, if they're present? */
 	if (ctx->mnt.fsgeom.logstart == 0 && ctx->fsinfo.fs_log == NULL) {
-		str_info(ctx, ctx->mntpoint,
+		str_error(ctx, ctx->mntpoint,
 _("Unable to find log device path."));
 		return false;
 	}
 	if (ctx->mnt.fsgeom.rtblocks && ctx->fsinfo.fs_rt == NULL) {
-		str_info(ctx, ctx->mntpoint,
+		str_error(ctx, ctx->mntpoint,
 _("Unable to find realtime device path."));
 		return false;
 	}
diff --git a/scrub/scrub.c b/scrub/scrub.c
index 75a64efa..718f09b8 100644
--- a/scrub/scrub.c
+++ b/scrub/scrub.c
@@ -141,7 +141,7 @@ xfs_check_metadata(
 			return CHECK_DONE;
 		case ESHUTDOWN:
 			/* FS already crashed, give up. */
-			str_info(ctx, buf,
+			str_error(ctx, buf,
 _("Filesystem is shut down, aborting."));
 			return CHECK_ABORT;
 		case EIO:
@@ -157,8 +157,7 @@ _("Filesystem is shut down, aborting."));
 			 * The first two should never escape the kernel,
 			 * and the other two should be reported via sm_flags.
 			 */
-			str_info(ctx, buf,
-_("Kernel bug!  errno=%d"), code);
+			str_liberror(ctx, code, _("Kernel bug"));
 			/* fall through */
 		default:
 			/* Operational error. */
@@ -702,7 +701,7 @@ _("Filesystem is busy, deferring repair."));
 			return CHECK_RETRY;
 		case ESHUTDOWN:
 			/* Filesystem is already shut down, abort. */
-			str_info(ctx, buf,
+			str_error(ctx, buf,
 _("Filesystem is shut down, aborting."));
 			return CHECK_ABORT;
 		case ENOTTY:
@@ -733,9 +732,9 @@ _("Don't know how to fix; offline repair required."));
 		case EROFS:
 			/* Read-only filesystem, can't fix. */
 			if (verbose || debug || needs_repair(&oldm))
-				str_info(ctx, buf,
+				str_error(ctx, buf,
 _("Read-only filesystem; cannot make changes."));
-			return CHECK_DONE;
+			return CHECK_ABORT;
 		case ENOENT:
 			/* Metadata not present, just skip it. */
 			return CHECK_DONE;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-10-22 18:49 ` [PATCH 5/7] xfs_scrub: promote some of the str_info to str_error calls Darrick J. Wong
@ 2019-10-22 18:49 ` Darrick J. Wong
  2019-11-01 21:25   ` Eric Sandeen
  2019-11-01 21:50   ` Eric Sandeen
  2019-10-22 18:49 ` [PATCH 7/7] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
  6 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:49 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Refactor this helper to avoid cycling the scrub context lock when the
user hasn't configured a maximum error count threshold.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index b1c6abd1..261c6bb2 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -33,13 +33,20 @@ bool
 xfs_scrub_excessive_errors(
 	struct scrub_ctx	*ctx)
 {
-	bool			ret;
+	unsigned long long	errors_seen;
+
+	/*
+	 * We only set max_errors at the start of the program, so it's safe to
+	 * access it locklessly.
+	 */
+	if (ctx->max_errors <= 0)
+		return false;
 
 	pthread_mutex_lock(&ctx->lock);
-	ret = ctx->max_errors > 0 && ctx->corruptions_found >= ctx->max_errors;
+	errors_seen = ctx->corruptions_found;
 	pthread_mutex_unlock(&ctx->lock);
 
-	return ret;
+	return errors_seen >= ctx->max_errors;
 }
 
 static struct {


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] xfs_scrub: create a new category for unfixable errors
  2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
                   ` (5 preceding siblings ...)
  2019-10-22 18:49 ` [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors Darrick J. Wong
@ 2019-10-22 18:49 ` Darrick J. Wong
  2019-11-01 21:32   ` Eric Sandeen
  2019-11-01 22:15   ` Eric Sandeen
  6 siblings, 2 replies; 19+ messages in thread
From: Darrick J. Wong @ 2019-10-22 18:49 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

There's nothing that xfs_scrub (or XFS) can do about media errors for
data file blocks -- the data are gone.  Create a new category for these
unfixable errors so that we don't advise the user to take further action
that won't fix the problem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |    8 +++++++-
 scrub/common.h    |    3 +++
 scrub/phase4.c    |    5 ++++-
 scrub/phase5.c    |    2 +-
 scrub/phase6.c    |    2 +-
 scrub/xfs_scrub.c |   17 +++++++++++++----
 scrub/xfs_scrub.h |    1 +
 7 files changed, 30 insertions(+), 8 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index 261c6bb2..e72ae540 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -43,7 +43,7 @@ xfs_scrub_excessive_errors(
 		return false;
 
 	pthread_mutex_lock(&ctx->lock);
-	errors_seen = ctx->corruptions_found;
+	errors_seen = ctx->corruptions_found + ctx->unfixable_errors;
 	pthread_mutex_unlock(&ctx->lock);
 
 	return errors_seen >= ctx->max_errors;
@@ -61,6 +61,10 @@ static struct {
 		.string = "Corruption",
 		.loglevel = LOG_ERR,
 	},
+	[S_UNFIXABLE] = {
+		.string = "Unfixable Error",
+		.loglevel = LOG_ERR,
+	},
 	[S_WARN]   = {
 		.string = "Warning",
 		.loglevel = LOG_WARNING,
@@ -136,6 +140,8 @@ __str_out(
 		ctx->runtime_errors++;
 	else if (level == S_CORRUPT)
 		ctx->corruptions_found++;
+	else if (level == S_UNFIXABLE)
+		ctx->unfixable_errors++;
 	else if (level == S_WARN)
 		ctx->warnings_found++;
 	else if (level == S_REPAIR)
diff --git a/scrub/common.h b/scrub/common.h
index b1f2ea2c..cfd9f186 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -18,6 +18,7 @@ bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
 enum error_level {
 	S_ERROR	= 0,
 	S_CORRUPT,
+	S_UNFIXABLE,
 	S_WARN,
 	S_INFO,
 	S_REPAIR,
@@ -43,6 +44,8 @@ void __str_out(struct scrub_ctx *ctx, const char *descr, enum error_level level,
 	__str_out(ctx, str, S_REPAIR,	0,	__FILE__, __LINE__, __VA_ARGS__)
 #define record_preen(ctx, str, ...) \
 	__str_out(ctx, str, S_PREEN,	0,	__FILE__, __LINE__, __VA_ARGS__)
+#define str_unfixable_error(ctx, str, ...) \
+	__str_out(ctx, str, S_UNFIXABLE, 0,	__FILE__, __LINE__, __VA_ARGS__)
 
 #define dbg_printf(fmt, ...) \
 	do {if (debug > 1) {printf(fmt, __VA_ARGS__);}} while (0)
diff --git a/scrub/phase4.c b/scrub/phase4.c
index 1cf3f6b7..a276bc32 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -99,7 +99,10 @@ xfs_process_action_items(
 	workqueue_destroy(&wq);
 
 	pthread_mutex_lock(&ctx->lock);
-	if (moveon && ctx->corruptions_found == 0 && want_fstrim) {
+	if (moveon &&
+	    ctx->corruptions_found == 0 &&
+	    ctx->unfixable_errors == 0 &&
+	    want_fstrim) {
 		fstrim(ctx);
 		progress_add(1);
 	}
diff --git a/scrub/phase5.c b/scrub/phase5.c
index dc0ee5e8..e0c4a3df 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -336,7 +336,7 @@ xfs_scan_connections(
 	bool			moveon = true;
 	bool			ret;
 
-	if (ctx->corruptions_found) {
+	if (ctx->corruptions_found || ctx->unfixable_errors) {
 		str_info(ctx, ctx->mntpoint,
 _("Filesystem has errors, skipping connectivity checks."));
 		return true;
diff --git a/scrub/phase6.c b/scrub/phase6.c
index bb159641..aae6b7d8 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -161,7 +161,7 @@ report_badfile(
 	bad_length = min(start + length,
 			 br->bmap->bm_physical + br->bmap->bm_length) - start;
 
-	str_error(br->ctx, br->descr,
+	str_unfixable_error(br->ctx, br->descr,
 _("media error at data offset %llu length %llu."),
 			br->bmap->bm_offset + bad_offset, bad_length);
 	return 0;
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 222daae1..963d0d70 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -511,15 +511,24 @@ static void
 report_outcome(
 	struct scrub_ctx	*ctx)
 {
-	unsigned long long	total_errors;
+	unsigned long long	actionable_errors;
 
-	total_errors = ctx->corruptions_found + ctx->runtime_errors;
+	actionable_errors = ctx->corruptions_found + ctx->runtime_errors;
 
-	if (total_errors == 0 && ctx->warnings_found == 0) {
+	if (actionable_errors == 0 &&
+	    ctx->unfixable_errors == 0 &&
+	    ctx->warnings_found == 0) {
 		log_info(ctx, _("No problems found."));
 		return;
 	}
 
+	if (ctx->unfixable_errors) {
+		fprintf(stderr, _("%s: unfixable errors found: %llu\n"),
+				ctx->mntpoint, ctx->unfixable_errors);
+		log_err(ctx, _("unfixable errors found: %llu"),
+				ctx->unfixable_errors);
+	}
+
 	if (ctx->corruptions_found > 0) {
 		fprintf(stderr, _("%s: corruptions found: %llu\n"),
 				ctx->mntpoint, ctx->corruptions_found);
@@ -545,7 +554,7 @@ report_outcome(
 	 * setting up the scrub and we actually saw corruptions.  Warnings
 	 * are not corruptions.
 	 */
-	if (ctx->scrub_setup_succeeded && total_errors > 0) {
+	if (ctx->scrub_setup_succeeded && actionable_errors > 0) {
 		char		*msg;
 
 		if (ctx->mode == SCRUB_MODE_DRY_RUN)
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 5abc41fd..61831c92 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -74,6 +74,7 @@ struct scrub_ctx {
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
 	unsigned long long	corruptions_found;
+	unsigned long long	unfixable_errors;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
 	unsigned long long	bytes_checked;


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] xfs_scrub: fix misclassified error reporting
  2019-10-22 18:48 ` [PATCH 1/7] xfs_scrub: fix misclassified error reporting Darrick J. Wong
@ 2019-11-01 20:59   ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 20:59 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:48 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Fix a few places where we assign error reports to the wrong
> classification.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Eric Sandeen <sandeen@redhat.com>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] xfs_scrub: simplify post-run reporting logic
  2019-10-22 18:49 ` [PATCH 2/7] xfs_scrub: simplify post-run reporting logic Darrick J. Wong
@ 2019-11-01 21:05   ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Simplify the post-run error and warning reporting logic so that in
> subsequent patches we can be more specific about what types of things
> went wrong.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/7] xfs_scrub: clean up error level table
  2019-10-22 18:49 ` [PATCH 3/7] xfs_scrub: clean up error level table Darrick J. Wong
@ 2019-11-01 21:11   ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Rework the error levels table in preparation for adding a few more error
> categories that won't fit on a single line.

Ok, I'll massage this to match my prior reordering.

> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/7] xfs_scrub: explicitly track corruptions, not just errors
  2019-10-22 18:49 ` [PATCH 4/7] xfs_scrub: explicitly track corruptions, not just errors Darrick J. Wong
@ 2019-11-01 21:15   ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Rename the @errors_found variable to @corruptions_found to make it
> more explicit that we're tracking fs corruption issues.  Add a new
> str_corrupt() function to handle communications that fall under this new
> corruption classification.  str_error() now exists to log runtime errors
> that do not have an associated errno code.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Seems ok.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] xfs_scrub: promote some of the str_info to str_error calls
  2019-10-22 18:49 ` [PATCH 5/7] xfs_scrub: promote some of the str_info to str_error calls Darrick J. Wong
@ 2019-11-01 21:21   ` Eric Sandeen
  0 siblings, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Now that str_error is only for runtime errors, we can promote a few of
> the str_info calls that report runtime errors to str_error.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors
  2019-10-22 18:49 ` [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors Darrick J. Wong
@ 2019-11-01 21:25   ` Eric Sandeen
  2019-11-01 21:46     ` Darrick J. Wong
  2019-11-01 21:50   ` Eric Sandeen
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor this helper to avoid cycling the scrub context lock when the
> user hasn't configured a maximum error count threshold.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/common.c |   13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> 
> diff --git a/scrub/common.c b/scrub/common.c
> index b1c6abd1..261c6bb2 100644
> --- a/scrub/common.c
> +++ b/scrub/common.c
> @@ -33,13 +33,20 @@ bool
>  xfs_scrub_excessive_errors(
>  	struct scrub_ctx	*ctx)
>  {
> -	bool			ret;
> +	unsigned long long	errors_seen;
> +
> +	/*
> +	 * We only set max_errors at the start of the program, so it's safe to
> +	 * access it locklessly.
> +	 */
> +	if (ctx->max_errors <= 0)

max_errors is an /unsigned/ long long, 'sup w/ the < part?

== maybe?

> +		return false;
>  
>  	pthread_mutex_lock(&ctx->lock);
> -	ret = ctx->max_errors > 0 && ctx->corruptions_found >= ctx->max_errors;
> +	errors_seen = ctx->corruptions_found;
>  	pthread_mutex_unlock(&ctx->lock);
>  
> -	return ret;
> +	return errors_seen >= ctx->max_errors;
>  }
>  
>  static struct {
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] xfs_scrub: create a new category for unfixable errors
  2019-10-22 18:49 ` [PATCH 7/7] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
@ 2019-11-01 21:32   ` Eric Sandeen
  2019-11-01 22:10     ` Darrick J. Wong
  2019-11-01 22:15   ` Eric Sandeen
  1 sibling, 1 reply; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:32 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> There's nothing that xfs_scrub (or XFS) can do about media errors for
> data file blocks -- the data are gone.  Create a new category for these
> unfixable errors so that we don't advise the user to take further action
> that won't fix the problem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

are "unfixable errors" /exclusively/ file data media errors?

> diff --git a/scrub/phase4.c b/scrub/phase4.c
> index 1cf3f6b7..a276bc32 100644
> --- a/scrub/phase4.c
> +++ b/scrub/phase4.c
> @@ -99,7 +99,10 @@ xfs_process_action_items(
>  	workqueue_destroy(&wq);
>  
>  	pthread_mutex_lock(&ctx->lock);
> -	if (moveon && ctx->corruptions_found == 0 && want_fstrim) {
> +	if (moveon &&
> +	    ctx->corruptions_found == 0 &&
> +	    ctx->unfixable_errors == 0 &&
> +	    want_fstrim) {
>  		fstrim(ctx);
>  		progress_add(1);
>  	}


why would a file data media error preclude fstrim?

> diff --git a/scrub/phase5.c b/scrub/phase5.c
> index dc0ee5e8..e0c4a3df 100644
> --- a/scrub/phase5.c
> +++ b/scrub/phase5.c
> @@ -336,7 +336,7 @@ xfs_scan_connections(
>  	bool			moveon = true;
>  	bool			ret;
>  
> -	if (ctx->corruptions_found) {
> +	if (ctx->corruptions_found || ctx->unfixable_errors) {
>  		str_info(ctx, ctx->mntpoint,
>  _("Filesystem has errors, skipping connectivity checks."));

why would a file data media error stop the connectivity check?

unless "unfixable" may be other types, in which case it makes sense.

But the commit log indicates it's just for a file data media error.

What's the intent?

-Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors
  2019-11-01 21:25   ` Eric Sandeen
@ 2019-11-01 21:46     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2019-11-01 21:46 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Fri, Nov 01, 2019 at 04:25:55PM -0500, Eric Sandeen wrote:
> On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Refactor this helper to avoid cycling the scrub context lock when the
> > user hasn't configured a maximum error count threshold.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  scrub/common.c |   13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/scrub/common.c b/scrub/common.c
> > index b1c6abd1..261c6bb2 100644
> > --- a/scrub/common.c
> > +++ b/scrub/common.c
> > @@ -33,13 +33,20 @@ bool
> >  xfs_scrub_excessive_errors(
> >  	struct scrub_ctx	*ctx)
> >  {
> > -	bool			ret;
> > +	unsigned long long	errors_seen;
> > +
> > +	/*
> > +	 * We only set max_errors at the start of the program, so it's safe to
> > +	 * access it locklessly.
> > +	 */
> > +	if (ctx->max_errors <= 0)
> 
> max_errors is an /unsigned/ long long, 'sup w/ the < part?

Being thorough. :)

> == maybe?

Yes, that works.

--D

> > +		return false;
> >  
> >  	pthread_mutex_lock(&ctx->lock);
> > -	ret = ctx->max_errors > 0 && ctx->corruptions_found >= ctx->max_errors;
> > +	errors_seen = ctx->corruptions_found;
> >  	pthread_mutex_unlock(&ctx->lock);
> >  
> > -	return ret;
> > +	return errors_seen >= ctx->max_errors;
> >  }
> >  
> >  static struct {
> > 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors
  2019-10-22 18:49 ` [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors Darrick J. Wong
  2019-11-01 21:25   ` Eric Sandeen
@ 2019-11-01 21:50   ` Eric Sandeen
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 21:50 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Refactor this helper to avoid cycling the scrub context lock when the
> user hasn't configured a maximum error count threshold.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Ok I'll change the ULL <= 0 thing on the way in.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] xfs_scrub: create a new category for unfixable errors
  2019-11-01 21:32   ` Eric Sandeen
@ 2019-11-01 22:10     ` Darrick J. Wong
  0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2019-11-01 22:10 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-xfs

On Fri, Nov 01, 2019 at 04:32:28PM -0500, Eric Sandeen wrote:
> On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > There's nothing that xfs_scrub (or XFS) can do about media errors for
> > data file blocks -- the data are gone.  Create a new category for these
> > unfixable errors so that we don't advise the user to take further action
> > that won't fix the problem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> are "unfixable errors" /exclusively/ file data media errors?

The intent behind this new classification is for things that we can't
ever fix or rebuild in the filesystem.  However, at least for now, file
data loss is the only source of unfixable errors.

> > diff --git a/scrub/phase4.c b/scrub/phase4.c
> > index 1cf3f6b7..a276bc32 100644
> > --- a/scrub/phase4.c
> > +++ b/scrub/phase4.c
> > @@ -99,7 +99,10 @@ xfs_process_action_items(
> >  	workqueue_destroy(&wq);
> >  
> >  	pthread_mutex_lock(&ctx->lock);
> > -	if (moveon && ctx->corruptions_found == 0 && want_fstrim) {
> > +	if (moveon &&
> > +	    ctx->corruptions_found == 0 &&
> > +	    ctx->unfixable_errors == 0 &&
> > +	    want_fstrim) {
> >  		fstrim(ctx);
> >  		progress_add(1);
> >  	}
> 
> 
> why would a file data media error preclude fstrim?

If there's even the slightest hint of things still being wrong with the
filesystem or the underlying storage, we should avoid writing to the
storage (e.g. FITRIM) because that could screw things up even more.

Note that this fstrim happens at the end of phase 4, so that means that
if we have any corruptions or unfixable errors at this point, they're in
the metadata, our attempts to fix them have failed, and so we absolutely
should not go writing more things to the disk.

> > diff --git a/scrub/phase5.c b/scrub/phase5.c
> > index dc0ee5e8..e0c4a3df 100644
> > --- a/scrub/phase5.c
> > +++ b/scrub/phase5.c
> > @@ -336,7 +336,7 @@ xfs_scan_connections(
> >  	bool			moveon = true;
> >  	bool			ret;
> >  
> > -	if (ctx->corruptions_found) {
> > +	if (ctx->corruptions_found || ctx->unfixable_errors) {
> >  		str_info(ctx, ctx->mntpoint,
> >  _("Filesystem has errors, skipping connectivity checks."));
> 
> why would a file data media error stop the connectivity check?

It won't, because the media scan happens during phase 6.

> unless "unfixable" may be other types, in which case it makes sense.

They will, presumably.  For an unfixable error to stop the connectivity
checks (phase 5), the unfixable error would have to be observed during
phases 1-4.  Those first four phases exclusively look for (and repair)
internal fs metadata, so if we reach phase 5 with an unfixable error
that means the fs metadata are trashed and xfs_scrub is going to fail
anyway.

Note that I haven't actually tried to write a directory repair function
yet.  The last time I asked viro about it he wondered what in the hell I
was going to do about the dentry cache and doubted that it could be
done... so either I have to prove him wrong, or maybe directories /will/
become the first source of unfixable errors for phase 1-4.

So yes, the 'unfixable' category is supposed to be broad enough to cover
more than just file data loss, though for now it's only used for that.

--D

> But the commit log indicates it's just for a file data media error.
> 
> What's the intent?
>
> -Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] xfs_scrub: create a new category for unfixable errors
  2019-10-22 18:49 ` [PATCH 7/7] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
  2019-11-01 21:32   ` Eric Sandeen
@ 2019-11-01 22:15   ` Eric Sandeen
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Sandeen @ 2019-11-01 22:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On 10/22/19 1:49 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> There's nothing that xfs_scrub (or XFS) can do about media errors for
> data file blocks -- the data are gone.  Create a new category for these
> unfixable errors so that we don't advise the user to take further action
> that won't fix the problem.

Ok.  I'll just slightly edit this to indicate that it's not exclusively
media errors and the rest makes sense.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-11-01 22:15 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-22 18:48 [PATCH 0/7] xfs_scrub: clean up error classifications Darrick J. Wong
2019-10-22 18:48 ` [PATCH 1/7] xfs_scrub: fix misclassified error reporting Darrick J. Wong
2019-11-01 20:59   ` Eric Sandeen
2019-10-22 18:49 ` [PATCH 2/7] xfs_scrub: simplify post-run reporting logic Darrick J. Wong
2019-11-01 21:05   ` Eric Sandeen
2019-10-22 18:49 ` [PATCH 3/7] xfs_scrub: clean up error level table Darrick J. Wong
2019-11-01 21:11   ` Eric Sandeen
2019-10-22 18:49 ` [PATCH 4/7] xfs_scrub: explicitly track corruptions, not just errors Darrick J. Wong
2019-11-01 21:15   ` Eric Sandeen
2019-10-22 18:49 ` [PATCH 5/7] xfs_scrub: promote some of the str_info to str_error calls Darrick J. Wong
2019-11-01 21:21   ` Eric Sandeen
2019-10-22 18:49 ` [PATCH 6/7] xfs_scrub: refactor xfs_scrub_excessive_errors Darrick J. Wong
2019-11-01 21:25   ` Eric Sandeen
2019-11-01 21:46     ` Darrick J. Wong
2019-11-01 21:50   ` Eric Sandeen
2019-10-22 18:49 ` [PATCH 7/7] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
2019-11-01 21:32   ` Eric Sandeen
2019-11-01 22:10     ` Darrick J. Wong
2019-11-01 22:15   ` Eric Sandeen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).