Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/11] xfs_scrub: fix IO error reporting
@ 2019-09-06  3:38 Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 01/11] xfs_scrub: separate media error reporting for attribute forks Darrick J. Wong
                   ` (10 more replies)
  0 siblings, 11 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:38 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

The scrub media error reporting could use some improvements -- first,
scrub can calculate the exact offset of media errors in file mappings,
so we should report more precise offsets.  Second, we only need to scan
the rmap once after assembling the io error bitmap to look for destroyed
metadata (instead of once per error!).  Third, we can filter out
unwritten and attr/cow fork extents from what we report since sector
remapping takes care of unwritten/cow extents and attr media errors
should be detected by phase 3.  Finally, we introduce a new category of
errors that are unfixable by scrub, and assign to this class all the
media errors since there's nothing XFS can do.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-media-error-reporting

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=scrub-media-error-reporting

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 01/11] xfs_scrub: separate media error reporting for attribute forks
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 02/11] xfs_scrub: improve reporting of file data media errors Darrick J. Wong
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Use different functions to warn about media errors that were detected
underlying xattr data because logical offsets for attribute fork extents
have no meaning to users.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c |   45 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 7 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index 4554af9a..1edd98af 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -113,7 +113,7 @@ xfs_decode_special_owner(
 
 /* Report if this extent overlaps a bad region. */
 static bool
-xfs_report_verify_inode_bmap(
+report_data_loss(
 	struct scrub_ctx		*ctx,
 	const char			*descr,
 	int				fd,
@@ -142,6 +142,40 @@ _("offset %llu failed read verification."), bmap->bm_offset);
 	return true;
 }
 
+/* Report if the extended attribute data overlaps a bad region. */
+static bool
+report_attr_loss(
+	struct scrub_ctx		*ctx,
+	const char			*descr,
+	int				fd,
+	int				whichfork,
+	struct fsxattr			*fsx,
+	struct xfs_bmap			*bmap,
+	void				*arg)
+{
+	struct media_verify_state	*vs = arg;
+	struct bitmap			*bmp = vs->d_bad;
+
+	/* Complain about attr fork extents that don't look right. */
+	if (bmap->bm_flags & (BMV_OF_PREALLOC | BMV_OF_DELALLOC)) {
+		str_info(ctx, descr,
+_("found unexpected unwritten/delalloc attr fork extent."));
+		return true;
+	}
+
+	if (fsx->fsx_xflags & FS_XFLAG_REALTIME) {
+		str_info(ctx, descr,
+_("found unexpected realtime attr fork extent."));
+		return true;
+	}
+
+	if (bitmap_test(bmp, bmap->bm_physical, bmap->bm_length))
+		str_error(ctx, descr,
+_("media error in extended attribute data."));
+
+	return true;
+}
+
 /* Iterate the extent mappings of a file to report errors. */
 static bool
 xfs_report_verify_fd(
@@ -155,16 +189,13 @@ xfs_report_verify_fd(
 
 	/* data fork */
 	moveon = xfs_iterate_filemaps(ctx, descr, fd, XFS_DATA_FORK, &key,
-			xfs_report_verify_inode_bmap, arg);
+			report_data_loss, arg);
 	if (!moveon)
 		return false;
 
 	/* attr fork */
-	moveon = xfs_iterate_filemaps(ctx, descr, fd, XFS_ATTR_FORK, &key,
-			xfs_report_verify_inode_bmap, arg);
-	if (!moveon)
-		return false;
-	return true;
+	return xfs_iterate_filemaps(ctx, descr, fd, XFS_ATTR_FORK, &key,
+			report_attr_loss, arg);
 }
 
 /* Report read verify errors in unlinked (but still open) files. */


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 02/11] xfs_scrub: improve reporting of file data media errors
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 01/11] xfs_scrub: separate media error reporting for attribute forks Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 03/11] xfs_scrub: better reporting of metadata " Darrick J. Wong
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

When we report media errors, we should tell the administrator the file
offset and length of the bad region, not just the offset of the entire
file extent record that overlaps a bad region.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libfrog/bitmap.c |   37 +++++++++++++++++++++++++++++++++++++
 libfrog/bitmap.h |    2 ++
 scrub/phase6.c   |   52 +++++++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 86 insertions(+), 5 deletions(-)


diff --git a/libfrog/bitmap.c b/libfrog/bitmap.c
index a75d085a..6a88ef48 100644
--- a/libfrog/bitmap.c
+++ b/libfrog/bitmap.c
@@ -339,6 +339,43 @@ bitmap_iterate(
 }
 #endif
 
+/* Iterate the set regions of part of this bitmap. */
+int
+bitmap_iterate_range(
+	struct bitmap		*bmap,
+	uint64_t		start,
+	uint64_t		length,
+	int			(*fn)(uint64_t, uint64_t, void *),
+	void			*arg)
+{
+	struct avl64node	*firstn;
+	struct avl64node	*lastn;
+	struct avl64node	*pos;
+	struct avl64node	*n;
+	struct avl64node	*l;
+	struct bitmap_node	*ext;
+	int			ret = 0;
+
+	pthread_mutex_lock(&bmap->bt_lock);
+
+	avl64_findranges(bmap->bt_tree, start, start + length, &firstn,
+			&lastn);
+
+	if (firstn == NULL && lastn == NULL)
+		goto out;
+
+	avl_for_each_range_safe(pos, n, l, firstn, lastn) {
+		ext = container_of(pos, struct bitmap_node, btn_node);
+		ret = fn(ext->btn_start, ext->btn_length, arg);
+		if (ret)
+			break;
+	}
+
+out:
+	pthread_mutex_unlock(&bmap->bt_lock);
+	return ret;
+}
+
 /* Do any bitmap extents overlap the given one?  (locked) */
 static bool
 __bitmap_test(
diff --git a/libfrog/bitmap.h b/libfrog/bitmap.h
index 759386a8..043b77ee 100644
--- a/libfrog/bitmap.h
+++ b/libfrog/bitmap.h
@@ -16,6 +16,8 @@ void bitmap_free(struct bitmap **bmap);
 int bitmap_set(struct bitmap *bmap, uint64_t start, uint64_t length);
 int bitmap_iterate(struct bitmap *bmap, int (*fn)(uint64_t, uint64_t, void *),
 		void *arg);
+int bitmap_iterate_range(struct bitmap *bmap, uint64_t start, uint64_t length,
+		int (*fn)(uint64_t, uint64_t, void *), void *arg);
 bool bitmap_test(struct bitmap *bmap, uint64_t start,
 		uint64_t len);
 bool bitmap_empty(struct bitmap *bmap);
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 1edd98af..a16ad114 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -111,6 +111,41 @@ xfs_decode_special_owner(
 
 /* Routines to translate bad physical extents into file paths and offsets. */
 
+struct badfile_report {
+	struct scrub_ctx	*ctx;
+	const char		*descr;
+	struct xfs_bmap		*bmap;
+};
+
+/* Report on bad extents found during a media scan. */
+static int
+report_badfile(
+	uint64_t		start,
+	uint64_t		length,
+	void			*arg)
+{
+	struct badfile_report	*br = arg;
+	unsigned long long	bad_offset;
+	unsigned long long	bad_length;
+
+	/* Clamp the bad region to the file mapping. */
+	if (start < br->bmap->bm_physical) {
+		length -= br->bmap->bm_physical - start;
+		start = br->bmap->bm_physical;
+	}
+	length = min(length, br->bmap->bm_length);
+
+	/* Figure out how far into the bmap is the bad mapping and report it. */
+	bad_offset = start - br->bmap->bm_physical;
+	bad_length = min(start + length,
+			 br->bmap->bm_physical + br->bmap->bm_length) - start;
+
+	str_error(br->ctx, br->descr,
+_("media error at data offset %llu length %llu."),
+			br->bmap->bm_offset + bad_offset, bad_length);
+	return 0;
+}
+
 /* Report if this extent overlaps a bad region. */
 static bool
 report_data_loss(
@@ -122,8 +157,14 @@ report_data_loss(
 	struct xfs_bmap			*bmap,
 	void				*arg)
 {
+	struct badfile_report		br = {
+		.ctx			= ctx,
+		.descr			= descr,
+		.bmap			= bmap,
+	};
 	struct media_verify_state	*vs = arg;
 	struct bitmap			*bmp;
+	int				ret;
 
 	/* Only report errors for real extents. */
 	if (bmap->bm_flags & (BMV_OF_PREALLOC | BMV_OF_DELALLOC))
@@ -134,11 +175,12 @@ report_data_loss(
 	else
 		bmp = vs->d_bad;
 
-	if (!bitmap_test(bmp, bmap->bm_physical, bmap->bm_length))
-		return true;
-
-	str_error(ctx, descr,
-_("offset %llu failed read verification."), bmap->bm_offset);
+	ret = bitmap_iterate_range(bmp, bmap->bm_physical, bmap->bm_length,
+			report_badfile, &br);
+	if (ret) {
+		str_liberror(ctx, ret, descr);
+		return false;
+	}
 	return true;
 }
 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 03/11] xfs_scrub: better reporting of metadata media errors
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 01/11] xfs_scrub: separate media error reporting for attribute forks Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 02/11] xfs_scrub: improve reporting of file data media errors Darrick J. Wong
@ 2019-09-06  3:39 ` " Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 04/11] xfs_scrub: improve reporting of file " Darrick J. Wong
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

When we report bad metadata, we inexplicably report the physical address
in units of sectors, whereas for file data we report file offsets in
units of bytes.  Fix the metadata reporting units to match the file data
units (i.e. bytes) and skip the printf for all other cases.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index a16ad114..310ab36c 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -368,7 +368,7 @@ xfs_check_rmap_error_report(
 	void			*arg)
 {
 	const char		*type;
-	char			buf[32];
+	char			buf[DESCR_BUFSZ];
 	uint64_t		err_physical = *(uint64_t *)arg;
 	uint64_t		err_off;
 
@@ -377,14 +377,12 @@ xfs_check_rmap_error_report(
 	else
 		err_off = 0;
 
-	snprintf(buf, 32, _("disk offset %"PRIu64),
-			(uint64_t)BTOBB(map->fmr_physical + err_off));
-
+	/* Report special owners */
 	if (map->fmr_flags & FMR_OF_SPECIAL_OWNER) {
+		snprintf(buf, DESCR_BUFSZ, _("disk offset %"PRIu64),
+				(uint64_t)map->fmr_physical + err_off);
 		type = xfs_decode_special_owner(map->fmr_owner);
-		str_error(ctx, buf,
-_("%s failed read verification."),
-				type);
+		str_error(ctx, buf, _("media error in %s."), type);
 	}
 
 	/*


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 04/11] xfs_scrub: improve reporting of file metadata media errors
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (2 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 03/11] xfs_scrub: better reporting of metadata " Darrick J. Wong
@ 2019-09-06  3:39 ` " Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 05/11] xfs_scrub: don't report media errors on unwritten extents Darrick J. Wong
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Report media errors that map to data and attr fork extent maps.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c |   11 +++++++++++
 1 file changed, 11 insertions(+)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index 310ab36c..1013ba6d 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -385,6 +385,17 @@ xfs_check_rmap_error_report(
 		str_error(ctx, buf, _("media error in %s."), type);
 	}
 
+	/* Report extent maps */
+	if (map->fmr_flags & FMR_OF_EXTENT_MAP) {
+		bool		attr = (map->fmr_flags & FMR_OF_ATTR_FORK);
+
+		scrub_render_ino_suffix(ctx, buf, DESCR_BUFSZ,
+				map->fmr_owner, 0, " %s",
+				attr ? _("extended attribute") :
+				       _("file data"));
+		str_error(ctx, buf, _("media error in extent map"));
+	}
+
 	/*
 	 * XXX: If we had a getparent() call we could report IO errors
 	 * efficiently.  Until then, we'll have to scan the dir tree


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 05/11] xfs_scrub: don't report media errors on unwritten extents
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (3 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 04/11] xfs_scrub: improve reporting of file " Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 06/11] xfs_scrub: reduce fsmap activity for media errors Darrick J. Wong
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Don't report media errors for unwritten extents since no data has been
lost.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c |    4 ++++
 1 file changed, 4 insertions(+)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index 1013ba6d..ec821373 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -372,6 +372,10 @@ xfs_check_rmap_error_report(
 	uint64_t		err_physical = *(uint64_t *)arg;
 	uint64_t		err_off;
 
+	/* Don't care about unwritten extents. */
+	if (map->fmr_flags & FMR_OF_PREALLOC)
+		return true;
+
 	if (err_physical > map->fmr_physical)
 		err_off = err_physical - map->fmr_physical;
 	else


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 06/11] xfs_scrub: reduce fsmap activity for media errors
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (4 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 05/11] xfs_scrub: don't report media errors on unwritten extents Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 07/11] xfs_scrub: request fewer bmaps when we can Darrick J. Wong
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Right now we rather foolishly query the fsmap data for every single
media error that we find.  This is a silly waste of time since we
have yet to combine adjacent bad blocks into bad extents, so move the
rmap query until after we've constructed the bad block bitmap data.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libfrog/bitmap.c |   10 +---
 scrub/phase6.c   |  148 ++++++++++++++++++++++++++++++++++++++----------------
 2 files changed, 108 insertions(+), 50 deletions(-)


diff --git a/libfrog/bitmap.c b/libfrog/bitmap.c
index 6a88ef48..5daa1081 100644
--- a/libfrog/bitmap.c
+++ b/libfrog/bitmap.c
@@ -314,7 +314,6 @@ bitmap_clear(
 }
 #endif
 
-#ifdef DEBUG
 /* Iterate the set regions of this bitmap. */
 int
 bitmap_iterate(
@@ -324,20 +323,19 @@ bitmap_iterate(
 {
 	struct avl64node	*node;
 	struct bitmap_node	*ext;
-	int			error = 0;
+	int			ret;
 
 	pthread_mutex_lock(&bmap->bt_lock);
 	avl_for_each(bmap->bt_tree, node) {
 		ext = container_of(node, struct bitmap_node, btn_node);
-		error = fn(ext->btn_start, ext->btn_length, arg);
-		if (error)
+		ret = fn(ext->btn_start, ext->btn_length, arg);
+		if (ret)
 			break;
 	}
 	pthread_mutex_unlock(&bmap->bt_lock);
 
-	return error;
+	return ret;
 }
-#endif
 
 /* Iterate the set regions of part of this bitmap. */
 int
diff --git a/scrub/phase6.c b/scrub/phase6.c
index ec821373..378ea0fb 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -341,27 +341,9 @@ xfs_report_verify_dirent(
 	return moveon;
 }
 
-/* Given bad extent lists for the data & rtdev, find bad files. */
-static bool
-xfs_report_verify_errors(
-	struct scrub_ctx		*ctx,
-	struct media_verify_state	*vs)
-{
-	bool				moveon;
-
-	/* Scan the directory tree to get file paths. */
-	moveon = scan_fs_tree(ctx, xfs_report_verify_dir,
-			xfs_report_verify_dirent, vs);
-	if (!moveon)
-		return false;
-
-	/* Scan for unlinked files. */
-	return xfs_scan_all_inodes(ctx, xfs_report_verify_inode, vs);
-}
-
 /* Report an IO error resulting from read-verify based off getfsmap. */
 static bool
-xfs_check_rmap_error_report(
+ioerr_fsmap_report(
 	struct scrub_ctx	*ctx,
 	const char		*descr,
 	struct fsmap		*map,
@@ -409,12 +391,31 @@ xfs_check_rmap_error_report(
 	return true;
 }
 
+static struct bitmap *
+bitmap_for_disk(
+	struct scrub_ctx		*ctx,
+	struct disk			*disk,
+	struct media_verify_state	*vs)
+{
+	dev_t				dev = xfs_disk_to_dev(ctx, disk);
+
+	/*
+	 * If we don't have parent pointers, save the bad extent for
+	 * later rescanning.
+	 */
+	if (dev == ctx->fsinfo.fs_datadev)
+		return vs->d_bad;
+	else if (dev == ctx->fsinfo.fs_rtdev)
+		return vs->r_bad;
+	return NULL;
+}
+
 /*
  * Remember a read error for later, and see if rmap will tell us about the
  * owner ahead of time.
  */
 static void
-xfs_check_rmap_ioerr(
+remember_ioerr(
 	struct scrub_ctx		*ctx,
 	struct disk			*disk,
 	uint64_t			start,
@@ -422,32 +423,39 @@ xfs_check_rmap_ioerr(
 	int				error,
 	void				*arg)
 {
-	struct fsmap			keys[2];
-	char				descr[DESCR_BUFSZ];
 	struct media_verify_state	*vs = arg;
 	struct bitmap			*tree;
-	dev_t				dev;
 	int				ret;
 
-	dev = xfs_disk_to_dev(ctx, disk);
+	tree = bitmap_for_disk(ctx, disk, vs);
+	if (!tree)
+		return;
 
-	/*
-	 * If we don't have parent pointers, save the bad extent for
-	 * later rescanning.
-	 */
-	if (dev == ctx->fsinfo.fs_datadev)
-		tree = vs->d_bad;
-	else if (dev == ctx->fsinfo.fs_rtdev)
-		tree = vs->r_bad;
-	else
-		tree = NULL;
-	if (tree) {
-		ret = bitmap_set(tree, start, length);
-		if (ret)
-			str_liberror(ctx, ret, _("setting bad block bitmap"));
-	}
+	ret = bitmap_set(tree, start, length);
+	if (ret)
+		str_liberror(ctx, ret, _("setting bad block bitmap"));
+}
+
+struct walk_ioerr {
+	struct scrub_ctx	*ctx;
+	struct disk		*disk;
+};
+
+static int
+walk_ioerr(
+	uint64_t		start,
+	uint64_t		length,
+	void			*arg)
+{
+	struct walk_ioerr	*wioerr = arg;
+	struct fsmap		keys[2];
+	char			descr[DESCR_BUFSZ];
+	dev_t			dev;
+
+	dev = xfs_disk_to_dev(wioerr->ctx, wioerr->disk);
 
-	snprintf(descr, DESCR_BUFSZ, _("dev %d:%d ioerr @ %"PRIu64":%"PRIu64" "),
+	snprintf(descr, DESCR_BUFSZ,
+_("dev %d:%d ioerr @ %"PRIu64":%"PRIu64" "),
 			major(dev), minor(dev), start, length);
 
 	/* Go figure out which blocks are bad from the fsmap. */
@@ -459,8 +467,60 @@ xfs_check_rmap_ioerr(
 	(keys + 1)->fmr_owner = ULLONG_MAX;
 	(keys + 1)->fmr_offset = ULLONG_MAX;
 	(keys + 1)->fmr_flags = UINT_MAX;
-	xfs_iterate_fsmap(ctx, descr, keys, xfs_check_rmap_error_report,
+	xfs_iterate_fsmap(wioerr->ctx, descr, keys, ioerr_fsmap_report,
 			&start);
+	return 0;
+}
+
+static int
+walk_ioerrs(
+	struct scrub_ctx		*ctx,
+	struct disk			*disk,
+	struct media_verify_state	*vs)
+{
+	struct walk_ioerr		wioerr = {
+		.ctx			= ctx,
+		.disk			= disk,
+	};
+	struct bitmap			*tree;
+
+	if (!disk)
+		return 0;
+	tree = bitmap_for_disk(ctx, disk, vs);
+	if (!tree)
+		return 0;
+	return bitmap_iterate(tree, walk_ioerr, &wioerr);
+}
+
+/* Given bad extent lists for the data & rtdev, find bad files. */
+static bool
+xfs_report_verify_errors(
+	struct scrub_ctx		*ctx,
+	struct media_verify_state	*vs)
+{
+	bool				moveon;
+	int				ret;
+
+	ret = walk_ioerrs(ctx, ctx->datadev, vs);
+	if (ret) {
+		str_liberror(ctx, ret, _("walking datadev io errors"));
+		return false;
+	}
+
+	ret = walk_ioerrs(ctx, ctx->rtdev, vs);
+	if (ret) {
+		str_liberror(ctx, ret, _("walking rtdev io errors"));
+		return false;
+	}
+
+	/* Scan the directory tree to get file paths. */
+	moveon = scan_fs_tree(ctx, xfs_report_verify_dir,
+			xfs_report_verify_dirent, vs);
+	if (!moveon)
+		return false;
+
+	/* Scan for unlinked files. */
+	return xfs_scan_all_inodes(ctx, xfs_report_verify_inode, vs);
 }
 
 /* Schedule a read-verify of a (data block) extent. */
@@ -571,7 +631,7 @@ xfs_scan_blocks(
 	}
 
 	ret = read_verify_pool_alloc(ctx, ctx->datadev,
-			ctx->mnt.fsgeom.blocksize, xfs_check_rmap_ioerr,
+			ctx->mnt.fsgeom.blocksize, remember_ioerr,
 			scrub_nproc(ctx), &vs.rvp_data);
 	if (ret) {
 		str_liberror(ctx, ret, _("creating datadev media verifier"));
@@ -579,7 +639,7 @@ xfs_scan_blocks(
 	}
 	if (ctx->logdev) {
 		ret = read_verify_pool_alloc(ctx, ctx->logdev,
-				ctx->mnt.fsgeom.blocksize, xfs_check_rmap_ioerr,
+				ctx->mnt.fsgeom.blocksize, remember_ioerr,
 				scrub_nproc(ctx), &vs.rvp_log);
 		if (ret) {
 			str_liberror(ctx, ret,
@@ -589,7 +649,7 @@ xfs_scan_blocks(
 	}
 	if (ctx->rtdev) {
 		ret = read_verify_pool_alloc(ctx, ctx->rtdev,
-				ctx->mnt.fsgeom.blocksize, xfs_check_rmap_ioerr,
+				ctx->mnt.fsgeom.blocksize, remember_ioerr,
 				scrub_nproc(ctx), &vs.rvp_realtime);
 		if (ret) {
 			str_liberror(ctx, ret,


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 07/11] xfs_scrub: request fewer bmaps when we can
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (5 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 06/11] xfs_scrub: reduce fsmap activity for media errors Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 08/11] xfs_scrub: fix media verification thread pool size calculations Darrick J. Wong
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_iterate_filemaps, we query the number of bmaps for a given file
that we're going to iterate, so feed that information to bmap so that
the kernel won't waste time allocating in-kernel memory unnecessarily.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/filemap.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/scrub/filemap.c b/scrub/filemap.c
index bdc6d8f9..aaaa0521 100644
--- a/scrub/filemap.c
+++ b/scrub/filemap.c
@@ -71,7 +71,6 @@ xfs_iterate_filemaps(
 		map->bmv_length = ULLONG_MAX;
 	else
 		map->bmv_length = BTOBB(key->bm_length);
-	map->bmv_count = BMAP_NR;
 	map->bmv_iflags = BMV_IF_NO_DMAPI_READ | BMV_IF_PREALLOC |
 			  BMV_IF_NO_HOLES;
 	switch (whichfork) {
@@ -96,6 +95,7 @@ xfs_iterate_filemaps(
 		moveon = false;
 		goto out;
 	}
+	map->bmv_count = min(fsx.fsx_nextents + 2, BMAP_NR);
 
 	while ((error = ioctl(fd, XFS_IOC_GETBMAPX, map)) == 0) {
 		for (i = 0, p = &map[i + 1]; i < map->bmv_entries; i++, p++) {


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 08/11] xfs_scrub: fix media verification thread pool size calculations
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (6 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 07/11] xfs_scrub: request fewer bmaps when we can Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 09/11] libfrog: clean up platform_nproc Darrick J. Wong
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The read verifier pool deals with two different thread counts -- there's
the submitter thread count that enables us to perform per-thread verify
request aggregation, and then there's the io thread pool count which is
the maximum number of IO requests we want to send to the disk at any
given time.

The io thread pool count should be derived from disk_heads() but instead
we bungle it by measuring and modifying(!) the nproc global variable.
Fix the derivation to use global variables correctly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/read_verify.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)


diff --git a/scrub/read_verify.c b/scrub/read_verify.c
index 3dac10ce..834571a7 100644
--- a/scrub/read_verify.c
+++ b/scrub/read_verify.c
@@ -75,6 +75,7 @@ read_verify_pool_alloc(
 	struct read_verify_pool		**prvp)
 {
 	struct read_verify_pool		*rvp;
+	unsigned int			verifier_threads = disk_heads(disk);
 	int				ret;
 
 	/*
@@ -94,7 +95,7 @@ read_verify_pool_alloc(
 			RVP_IO_MAX_SIZE);
 	if (ret)
 		goto out_free;
-	ret = ptcounter_alloc(nproc, &rvp->verified_bytes);
+	ret = ptcounter_alloc(verifier_threads, &rvp->verified_bytes);
 	if (ret)
 		goto out_buf;
 	rvp->miniosz = miniosz;
@@ -106,11 +107,8 @@ read_verify_pool_alloc(
 			&rvp->rvstate);
 	if (ret)
 		goto out_counter;
-	/* Run in the main thread if we only want one thread. */
-	if (nproc == 1)
-		nproc = 0;
 	ret = workqueue_create(&rvp->wq, (struct xfs_mount *)rvp,
-			disk_heads(disk));
+			verifier_threads == 1 ? 0 : verifier_threads);
 	if (ret)
 		goto out_rvstate;
 	*prvp = rvp;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 09/11] libfrog: clean up platform_nproc
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (7 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 08/11] xfs_scrub: fix media verification thread pool size calculations Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:39 ` [PATCH 10/11] xfs_scrub: clean out the nproc global variable Darrick J. Wong
  2019-09-06  3:40 ` [PATCH 11/11] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

The platform_nproc function should check for error returns and obviously
garbage values and deal with them appropriately.  Fix the header
declaration since it's part of the libfrog platform support code, not
libxfs.  xfs_scrub will make use of it in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/libxfs.h           |    1 -
 include/platform_defs.h.in |    2 ++
 libfrog/linux.c            |    9 ++++++++-
 3 files changed, 10 insertions(+), 2 deletions(-)


diff --git a/include/libxfs.h b/include/libxfs.h
index 63696df5..227084ae 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -135,7 +135,6 @@ extern void	libxfs_device_close (dev_t);
 extern int	libxfs_device_alignment (void);
 extern void	libxfs_report(FILE *);
 extern void	platform_findsizes(char *path, int fd, long long *sz, int *bsz);
-extern int	platform_nproc(void);
 
 /* check or write log footer: specify device, log size in blocks & uuid */
 typedef char	*(libxfs_get_block_t)(char *, int, void *);
diff --git a/include/platform_defs.h.in b/include/platform_defs.h.in
index d111ec6d..adb00181 100644
--- a/include/platform_defs.h.in
+++ b/include/platform_defs.h.in
@@ -77,4 +77,6 @@ typedef unsigned short umode_t;
 # define ASSERT(EX)	((void) 0)
 #endif
 
+extern int	platform_nproc(void);
+
 #endif	/* __XFS_PLATFORM_DEFS_H__ */
diff --git a/libfrog/linux.c b/libfrog/linux.c
index b6c24879..79bd79eb 100644
--- a/libfrog/linux.c
+++ b/libfrog/linux.c
@@ -242,10 +242,17 @@ platform_align_blockdev(void)
 	return max_block_alignment;
 }
 
+/* How many CPUs are online? */
 int
 platform_nproc(void)
 {
-	return sysconf(_SC_NPROCESSORS_ONLN);
+	long nproc = sysconf(_SC_NPROCESSORS_ONLN);
+
+	if (nproc < 1)
+		return 1;
+	if (nproc >= INT_MAX)
+		return INT_MAX;
+	return nproc;
 }
 
 unsigned long


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 10/11] xfs_scrub: clean out the nproc global variable
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (8 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 09/11] libfrog: clean up platform_nproc Darrick J. Wong
@ 2019-09-06  3:39 ` Darrick J. Wong
  2019-09-06  3:40 ` [PATCH 11/11] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:39 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Get rid of this global variable since we already have a libfrog function
that does exactly what it does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/disk.c      |    2 ++
 scrub/xfs_scrub.c |    8 --------
 scrub/xfs_scrub.h |    1 -
 3 files changed, 2 insertions(+), 9 deletions(-)


diff --git a/scrub/disk.c b/scrub/disk.c
index 214a5346..31dc4192 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -22,6 +22,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "disk.h"
+#include "platform_defs.h"
 
 #ifndef BLKROTATIONAL
 # define BLKROTATIONAL	_IO(0x12, 126)
@@ -42,6 +43,7 @@ __disk_heads(
 {
 	int			iomin;
 	int			ioopt;
+	int			nproc = platform_nproc();
 	unsigned short		rot;
 	int			error;
 
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index b6a01274..147c114c 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -131,9 +131,6 @@ static bool			display_rusage;
 /* Background mode; higher values insert more pauses between scrub calls. */
 unsigned int			bg_mode;
 
-/* Maximum number of processors available to us. */
-int				nproc;
-
 /* Number of threads we're allowed to use. */
 unsigned int			force_nr_threads;
 
@@ -717,11 +714,6 @@ main(
 	}
 	memcpy(&ctx.fsinfo, fsp, sizeof(struct fs_path));
 
-	/* How many CPUs? */
-	nproc = sysconf(_SC_NPROCESSORS_ONLN);
-	if (nproc < 1)
-		nproc = 1;
-
 	/* Set up a page-aligned buffer for read verification. */
 	page_size = sysconf(_SC_PAGESIZE);
 	if (page_size < 0) {
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index f9a72052..37d78f61 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -15,7 +15,6 @@ extern char *progname;
 extern unsigned int		force_nr_threads;
 extern unsigned int		bg_mode;
 extern unsigned int		debug;
-extern int			nproc;
 extern bool			verbose;
 extern long			page_size;
 extern bool			want_fstrim;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 11/11] xfs_scrub: create a new category for unfixable errors
  2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
                   ` (9 preceding siblings ...)
  2019-09-06  3:39 ` [PATCH 10/11] xfs_scrub: clean out the nproc global variable Darrick J. Wong
@ 2019-09-06  3:40 ` Darrick J. Wong
  10 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-09-06  3:40 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

There's nothing that xfs_scrub (or XFS) can do about media errors for
data file blocks -- the data are gone.  Create a new category for these
unfixable errors so that we don't advise the user to take further action
that won't fix the problem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |    9 ++++++++-
 scrub/common.h    |    3 +++
 scrub/phase4.c    |    3 ++-
 scrub/phase5.c    |    2 +-
 scrub/phase6.c    |    2 +-
 scrub/xfs_scrub.c |   15 +++++++++++++--
 scrub/xfs_scrub.h |    1 +
 7 files changed, 29 insertions(+), 6 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index a814f568..79ec9fd6 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -36,7 +36,8 @@ xfs_scrub_excessive_errors(
 	bool			ret;
 
 	pthread_mutex_lock(&ctx->lock);
-	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
+	ret = ctx->max_errors > 0 &&
+	      (ctx->unfixable_errors + ctx->errors_found) >= ctx->max_errors;
 	pthread_mutex_unlock(&ctx->lock);
 
 	return ret;
@@ -47,6 +48,10 @@ static struct {
 	int loglevel;
 } err_levels[] = {
 	[S_ERROR]  = { .string = "Error",	.loglevel = LOG_ERR },
+	[S_UNFIXABLE] = {
+		.string = "Unfixable error",
+		.loglevel = LOG_ERR
+	},
 	[S_WARN]   = { .string = "Warning",	.loglevel = LOG_WARNING },
 	[S_REPAIR] = { .string = "Repaired",	.loglevel = LOG_WARNING },
 	[S_INFO]   = { .string = "Info",	.loglevel = LOG_INFO },
@@ -108,6 +113,8 @@ __str_out(
 out_record:
 	if (error)      /* A syscall failed */
 		ctx->runtime_errors++;
+	else if (level == S_UNFIXABLE)
+		ctx->unfixable_errors++;
 	else if (level == S_ERROR)
 		ctx->errors_found++;
 	else if (level == S_WARN)
diff --git a/scrub/common.h b/scrub/common.h
index 1b9ad48f..e8485b4c 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -17,6 +17,7 @@ bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
 
 enum error_level {
 	S_ERROR	= 0,
+	S_UNFIXABLE,
 	S_WARN,
 	S_REPAIR,
 	S_INFO,
@@ -40,6 +41,8 @@ void __str_out(struct scrub_ctx *ctx, const char *descr, enum error_level level,
 	__str_out(ctx, str, S_REPAIR,	0,	__FILE__, __LINE__, __VA_ARGS__)
 #define record_preen(ctx, str, ...) \
 	__str_out(ctx, str, S_PREEN,	0,	__FILE__, __LINE__, __VA_ARGS__)
+#define str_unfixable_error(ctx, str, ...) \
+	__str_out(ctx, str, S_UNFIXABLE, 0,	__FILE__, __LINE__, __VA_ARGS__)
 
 #define dbg_printf(fmt, ...) \
 	do {if (debug > 1) {printf(fmt, __VA_ARGS__);}} while (0)
diff --git a/scrub/phase4.c b/scrub/phase4.c
index eb30c189..07927036 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -99,7 +99,8 @@ xfs_process_action_items(
 	workqueue_destroy(&wq);
 
 	pthread_mutex_lock(&ctx->lock);
-	if (moveon && ctx->errors_found == 0 && want_fstrim) {
+	if (moveon && ctx->errors_found == 0 && ctx->unfixable_errors == 0 &&
+	    want_fstrim) {
 		fstrim(ctx);
 		progress_add(1);
 	}
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 997c88d9..30346fc1 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -336,7 +336,7 @@ xfs_scan_connections(
 	bool			moveon = true;
 	bool			ret;
 
-	if (ctx->errors_found) {
+	if (ctx->errors_found || ctx->unfixable_errors) {
 		str_info(ctx, ctx->mntpoint,
 _("Filesystem has errors, skipping connectivity checks."));
 		return true;
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 378ea0fb..c50fb8fb 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -140,7 +140,7 @@ report_badfile(
 	bad_length = min(start + length,
 			 br->bmap->bm_physical + br->bmap->bm_length) - start;
 
-	str_error(br->ctx, br->descr,
+	str_unfixable_error(br->ctx, br->descr,
 _("media error at data offset %llu length %llu."),
 			br->bmap->bm_offset + bad_offset, bad_length);
 	return 0;
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 147c114c..aa98caaa 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -515,12 +515,16 @@ report_outcome(
 
 	total_errors = ctx->errors_found + ctx->runtime_errors;
 
-	if (total_errors == 0 && ctx->warnings_found == 0) {
+	if (total_errors == 0 &&
+	    ctx->unfixable_errors == 0 &&
+	    ctx->warnings_found == 0) {
 		log_info(ctx, _("No errors found."));
 		return;
 	}
 
-	if (total_errors == 0) {
+	if (total_errors == 0 && ctx->warnings_found == 0) {
+		/* nothing to report */
+	} else if (total_errors == 0) {
 		fprintf(stderr, _("%s: warnings found: %llu\n"), ctx->mntpoint,
 				ctx->warnings_found);
 		log_warn(ctx, _("warnings found: %llu"), ctx->warnings_found);
@@ -536,6 +540,13 @@ report_outcome(
 				total_errors, ctx->warnings_found);
 	}
 
+	if (ctx->unfixable_errors) {
+		fprintf(stderr, _("%s: unfixable errors found: %llu\n"),
+				ctx->mntpoint, ctx->unfixable_errors);
+		log_err(ctx, _("unfixable errors found: %llu"),
+				ctx->unfixable_errors);
+	}
+
 	/*
 	 * Don't advise the user to run repair unless we were successful in
 	 * setting up the scrub and we actually saw corruptions.  Warnings
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index 37d78f61..54876acb 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -74,6 +74,7 @@ struct scrub_ctx {
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
+	unsigned long long	unfixable_errors;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
 	unsigned long long	bytes_checked;


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 11/11] xfs_scrub: create a new category for unfixable errors
  2019-08-26 21:31 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
@ 2019-08-26 21:32 ` Darrick J. Wong
  0 siblings, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2019-08-26 21:32 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

There's nothing that xfs_scrub (or XFS) can do about media errors for
data file blocks -- the data are gone.  Create a new category for these
unfixable errors so that we don't advise the user to take further action
that won't fix the problem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |    9 ++++++++-
 scrub/common.h    |    3 +++
 scrub/phase4.c    |    3 ++-
 scrub/phase5.c    |    2 +-
 scrub/phase6.c    |    2 +-
 scrub/xfs_scrub.c |   15 +++++++++++++--
 scrub/xfs_scrub.h |    1 +
 7 files changed, 29 insertions(+), 6 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index fdbbf294..1ca550dd 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -36,7 +36,8 @@ xfs_scrub_excessive_errors(
 	bool			ret;
 
 	pthread_mutex_lock(&ctx->lock);
-	ret = ctx->max_errors > 0 && ctx->errors_found >= ctx->max_errors;
+	ret = ctx->max_errors > 0 &&
+	      (ctx->unfixable_errors + ctx->errors_found) >= ctx->max_errors;
 	pthread_mutex_unlock(&ctx->lock);
 
 	return ret;
@@ -47,6 +48,10 @@ static struct {
 	int loglevel;
 } err_levels[] = {
 	[S_ERROR]  = { .string = "Error",	.loglevel = LOG_ERR },
+	[S_UNFIXABLE] = {
+		.string = "Unfixable error",
+		.loglevel = LOG_ERR
+	},
 	[S_WARN]   = { .string = "Warning",	.loglevel = LOG_WARNING },
 	[S_REPAIR] = { .string = "Repaired",	.loglevel = LOG_WARNING },
 	[S_INFO]   = { .string = "Info",	.loglevel = LOG_INFO },
@@ -108,6 +113,8 @@ __str_out(
 out_record:
 	if (error)      /* A syscall failed */
 		ctx->runtime_errors++;
+	else if (level == S_UNFIXABLE)
+		ctx->unfixable_errors++;
 	else if (level == S_ERROR)
 		ctx->errors_found++;
 	else if (level == S_WARN)
diff --git a/scrub/common.h b/scrub/common.h
index b34cb4a6..886e6c2e 100644
--- a/scrub/common.h
+++ b/scrub/common.h
@@ -17,6 +17,7 @@ bool xfs_scrub_excessive_errors(struct scrub_ctx *ctx);
 
 enum error_level {
 	S_ERROR	= 0,
+	S_UNFIXABLE,
 	S_WARN,
 	S_REPAIR,
 	S_INFO,
@@ -40,6 +41,8 @@ void __str_out(struct scrub_ctx *ctx, const char *descr, enum error_level level,
 	__str_out(ctx, str, S_REPAIR,	0,	__FILE__, __LINE__, __VA_ARGS__)
 #define record_preen(ctx, str, ...) \
 	__str_out(ctx, str, S_PREEN,	0,	__FILE__, __LINE__, __VA_ARGS__)
+#define str_unfixable_error(ctx, str, ...) \
+	__str_out(ctx, str, S_UNFIXABLE, 0,	__FILE__, __LINE__, __VA_ARGS__)
 
 #define dbg_printf(fmt, ...) \
 	do {if (debug > 1) {printf(fmt, __VA_ARGS__);}} while (0)
diff --git a/scrub/phase4.c b/scrub/phase4.c
index 79c8a6b8..a030b1de 100644
--- a/scrub/phase4.c
+++ b/scrub/phase4.c
@@ -99,7 +99,8 @@ xfs_process_action_items(
 	workqueue_destroy(&wq);
 
 	pthread_mutex_lock(&ctx->lock);
-	if (moveon && ctx->errors_found == 0 && want_fstrim) {
+	if (moveon && ctx->errors_found == 0 && ctx->unfixable_errors == 0 &&
+	    want_fstrim) {
 		fstrim(ctx);
 		progress_add(1);
 	}
diff --git a/scrub/phase5.c b/scrub/phase5.c
index 224081d5..d90b4028 100644
--- a/scrub/phase5.c
+++ b/scrub/phase5.c
@@ -336,7 +336,7 @@ xfs_scan_connections(
 	bool			moveon = true;
 	bool			ret;
 
-	if (ctx->errors_found) {
+	if (ctx->errors_found || ctx->unfixable_errors) {
 		str_info(ctx, ctx->mntpoint,
 _("Filesystem has errors, skipping connectivity checks."));
 		return true;
diff --git a/scrub/phase6.c b/scrub/phase6.c
index 33ed75cc..1e55fad8 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -140,7 +140,7 @@ report_badfile(
 	bad_length = min(start + length,
 			 br->bmap->bm_physical + br->bmap->bm_length) - start;
 
-	str_error(br->ctx, br->descr,
+	str_unfixable_error(br->ctx, br->descr,
 _("media error at data offset %llu length %llu."),
 			br->bmap->bm_offset + bad_offset, bad_length);
 	return 0;
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index 99bf54f5..645b2c50 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -515,12 +515,16 @@ report_outcome(
 
 	total_errors = ctx->errors_found + ctx->runtime_errors;
 
-	if (total_errors == 0 && ctx->warnings_found == 0) {
+	if (total_errors == 0 &&
+	    ctx->unfixable_errors == 0 &&
+	    ctx->warnings_found == 0) {
 		log_info(ctx, _("No errors found."));
 		return;
 	}
 
-	if (total_errors == 0) {
+	if (total_errors == 0 && ctx->warnings_found == 0) {
+		/* nothing to report */
+	} else if (total_errors == 0) {
 		fprintf(stderr, _("%s: warnings found: %llu\n"), ctx->mntpoint,
 				ctx->warnings_found);
 		log_warn(ctx, _("warnings found: %llu"), ctx->warnings_found);
@@ -536,6 +540,13 @@ report_outcome(
 				total_errors, ctx->warnings_found);
 	}
 
+	if (ctx->unfixable_errors) {
+		fprintf(stderr, _("%s: unfixable errors found: %llu\n"),
+				ctx->mntpoint, ctx->unfixable_errors);
+		log_err(ctx, _("unfixable errors found: %llu"),
+				ctx->unfixable_errors);
+	}
+
 	/*
 	 * Don't advise the user to run repair unless we were successful in
 	 * setting up the scrub and we actually saw corruptions.  Warnings
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index bce56b1d..6984d24c 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -74,6 +74,7 @@ struct scrub_ctx {
 	unsigned long long	max_errors;
 	unsigned long long	runtime_errors;
 	unsigned long long	errors_found;
+	unsigned long long	unfixable_errors;
 	unsigned long long	warnings_found;
 	unsigned long long	inodes_checked;
 	unsigned long long	bytes_checked;


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-06  3:38 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
2019-09-06  3:39 ` [PATCH 01/11] xfs_scrub: separate media error reporting for attribute forks Darrick J. Wong
2019-09-06  3:39 ` [PATCH 02/11] xfs_scrub: improve reporting of file data media errors Darrick J. Wong
2019-09-06  3:39 ` [PATCH 03/11] xfs_scrub: better reporting of metadata " Darrick J. Wong
2019-09-06  3:39 ` [PATCH 04/11] xfs_scrub: improve reporting of file " Darrick J. Wong
2019-09-06  3:39 ` [PATCH 05/11] xfs_scrub: don't report media errors on unwritten extents Darrick J. Wong
2019-09-06  3:39 ` [PATCH 06/11] xfs_scrub: reduce fsmap activity for media errors Darrick J. Wong
2019-09-06  3:39 ` [PATCH 07/11] xfs_scrub: request fewer bmaps when we can Darrick J. Wong
2019-09-06  3:39 ` [PATCH 08/11] xfs_scrub: fix media verification thread pool size calculations Darrick J. Wong
2019-09-06  3:39 ` [PATCH 09/11] libfrog: clean up platform_nproc Darrick J. Wong
2019-09-06  3:39 ` [PATCH 10/11] xfs_scrub: clean out the nproc global variable Darrick J. Wong
2019-09-06  3:40 ` [PATCH 11/11] xfs_scrub: create a new category for unfixable errors Darrick J. Wong
  -- strict thread matches above, loose matches on Subject: below --
2019-08-26 21:31 [PATCH 00/11] xfs_scrub: fix IO error reporting Darrick J. Wong
2019-08-26 21:32 ` [PATCH 11/11] xfs_scrub: create a new category for unfixable errors Darrick J. Wong

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org linux-xfs@archiver.kernel.org
	public-inbox-index linux-xfs


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/ public-inbox