* [PATCH 0/8] xfsprogs: various fixes
@ 2018-12-19 19:29 Darrick J. Wong
  2018-12-19 19:29 ` [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly Darrick J. Wong
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:29 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Here are some fixes for xfsprogs 4.20:

Patch 1 fixes xfs_scrub_all to discover the fs-to-physical-device
mappings correctly when nvme devices are present in the system.

Patch 2 holds off automated xfs_scrub_all triggers until the system is
fully booted so that we don't end up racing metadata scrubbing with
system boot.

Patches 3-5 fix thread count discovery in xfs_scrub so that we stop
shadowing variables and estimate parallelism based at least somewhat on
the storage hardware and not just the number of CPUs.  In other words,
we'll stop pounding a spinning disk with all the CPUs.

Patches 6-7 fix some link count handling in xfs_repair when we're
(re)initializing the root directory and lost+found directories.

Patch 8 fixes some build warnings in xfs_repair.

This series survived overnight testing against current xfsprogs
for-next.  It can be pulled as a git branch[1].

--D

[1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=djwong-devel

* [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
@ 2018-12-19 19:29 ` Darrick J. Wong
  2019-02-04 18:08   ` Eric Sandeen
  2018-12-19 19:29 ` [PATCH 2/8] xfs_scrub_all.timer: activate after most of the system is up Darrick J. Wong
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:29 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Back when I was designing xfs_scrub_all, I naïvely assumed that the
emitted output would always list physical storage before the virtual
devices stacked atop it.  However, this is not actually true when one
omits the "NAME" column, which is crucial to forcing the output (json or
otherwise) to capture the block device hierarchy.  If the assumption is
violated, the program crashes with a python exception.

To fix this, force the hierarchical json output and restructure the
discovery routines to walk the json object that we receive, from the top
(physical devices) downwards to wherever there are live xfs filesystems.
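
(Illustration only: the hierarchical -J output has roughly the shape of the
made-up sample below, which is why the walk has to descend through each
device's 'children' list rather than scan a flat array.  The device names
are invented.)

# Hypothetical lsblk -J shape and a minimal walk over it.
import json

sample = json.loads('''
{
  "blockdevices": [
    {"name": "sda", "kname": "sda", "type": "disk", "fstype": null,
     "mountpoint": null,
     "children": [
        {"name": "sda1", "kname": "sda1", "type": "part", "fstype": "xfs",
         "mountpoint": "/home"}
     ]},
    {"name": "nvme0n1", "kname": "nvme0n1", "type": "disk", "fstype": "xfs",
     "mountpoint": "/"}
  ]
}
''')

# Every xfs mount found underneath a top-level device maps back to that
# device's kernel name.
for bdev in sample['blockdevices']:
    stack = [bdev]
    while stack:
        node = stack.pop()
        if node.get('fstype') == 'xfs' and node.get('mountpoint'):
            print(node['mountpoint'], '->', bdev['kname'])
        stack.extend(node.get('children', []))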

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/xfs_scrub_all.in |   28 +++++++++++++++++-----------
 1 file changed, 17 insertions(+), 11 deletions(-)


diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
index c4e9899d..5b76b49a 100644
--- a/scrub/xfs_scrub_all.in
+++ b/scrub/xfs_scrub_all.in
@@ -28,9 +28,21 @@ def DEVNULL():
 
 def find_mounts():
 	'''Map mountpoints to physical disks.'''
+	def find_xfs_mounts(bdev, fs, lastdisk):
+		'''Attach lastdisk to each fs found under bdev.'''
+		if bdev['fstype'] == 'xfs' and bdev['mountpoint'] is not None:
+			mnt = bdev['mountpoint']
+			if mnt in fs:
+				fs[mnt].add(lastdisk)
+			else:
+				fs[mnt] = set([lastdisk])
+		if 'children' not in bdev:
+			return
+		for child in bdev['children']:
+			find_xfs_mounts(child, fs, lastdisk)
 
 	fs = {}
-	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
+	cmd=['lsblk', '-o', 'NAME,KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
 	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
 	result.wait()
 	if result.returncode != 0:
@@ -38,18 +50,12 @@ def find_mounts():
 	sarray = [x.decode(sys.stdout.encoding) for x in result.stdout.readlines()]
 	output = ' '.join(sarray)
 	bdevdata = json.loads(output)
+
 	# The lsblk output had better be in disks-then-partitions order
 	for bdev in bdevdata['blockdevices']:
-		if bdev['type'] in ('disk', 'loop'):
-			lastdisk = bdev['kname']
-		if bdev['fstype'] == 'xfs':
-			mnt = bdev['mountpoint']
-			if mnt is None:
-				continue
-			if mnt in fs:
-				fs[mnt].add(lastdisk)
-			else:
-				fs[mnt] = set([lastdisk])
+		lastdisk = bdev['kname']
+		find_xfs_mounts(bdev, fs, lastdisk)
+
 	return fs
 
 def kill_systemd(unit, proc):

* [PATCH 2/8] xfs_scrub_all.timer: activate after most of the system is up
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
  2018-12-19 19:29 ` [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly Darrick J. Wong
@ 2018-12-19 19:29 ` Darrick J. Wong
  2019-02-04 18:12   ` Eric Sandeen
  2018-12-19 19:29 ` [PATCH 3/8] xfs_scrub: rename the global nr_threads Darrick J. Wong
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:29 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

We really don't want the xfs_scrub_all timer triggering while the system
is booting up because not all the mounts will have finished, networking
might not be up for reporting, and slowing down bootup annoys people.
Therefore, delay the xfs_scrub_all service's activation until after the
system has started all the big pieces it's going to start.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/xfs_scrub_all.service.in |    1 +
 1 file changed, 1 insertion(+)


diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
index 66f82fc7..b1b80da4 100644
--- a/scrub/xfs_scrub_all.service.in
+++ b/scrub/xfs_scrub_all.service.in
@@ -2,6 +2,7 @@
 Description=Online XFS Metadata Check for All Filesystems
 ConditionACPower=true
 Documentation=man:xfs_scrub_all(8)
+After=paths.target multi-user.target network.target network-online.target systemd-networkd.service NetworkManager.service connman.service
 
 [Service]
 Type=oneshot

* [PATCH 3/8] xfs_scrub: rename the global nr_threads
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
  2018-12-19 19:29 ` [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly Darrick J. Wong
  2018-12-19 19:29 ` [PATCH 2/8] xfs_scrub_all.timer: activate after most of the system is up Darrick J. Wong
@ 2018-12-19 19:29 ` Darrick J. Wong
  2019-02-04 18:20   ` Eric Sandeen
  2018-12-19 19:29 ` [PATCH 4/8] xfs_scrub: use datadev parallelization estimates for thread count Darrick J. Wong
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:29 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Various functions have nr_threads local variables that shadow the global
one.  Since the global one forces the number of threads we use, change
its name to remove this ambiguity and reflect what it really does.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/common.c    |    4 ++--
 scrub/disk.c      |    4 ++--
 scrub/xfs_scrub.c |    6 +++---
 scrub/xfs_scrub.h |    2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)


diff --git a/scrub/common.c b/scrub/common.c
index 78afc4bf..c877c7c8 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -231,8 +231,8 @@ unsigned int
 scrub_nproc(
 	struct scrub_ctx	*ctx)
 {
-	if (nr_threads)
-		return nr_threads;
+	if (force_nr_threads)
+		return force_nr_threads;
 	return ctx->nr_io_threads;
 }
 
diff --git a/scrub/disk.c b/scrub/disk.c
index 7daa508e..dd109533 100644
--- a/scrub/disk.c
+++ b/scrub/disk.c
@@ -75,8 +75,8 @@ unsigned int
 disk_heads(
 	struct disk		*disk)
 {
-	if (nr_threads)
-		return nr_threads;
+	if (force_nr_threads)
+		return force_nr_threads;
 	return __disk_heads(disk);
 }
 
diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
index b8138000..71fc274f 100644
--- a/scrub/xfs_scrub.c
+++ b/scrub/xfs_scrub.c
@@ -133,7 +133,7 @@ unsigned int			bg_mode;
 int				nproc;
 
 /* Number of threads we're allowed to use. */
-unsigned int			nr_threads;
+unsigned int			force_nr_threads;
 
 /* Verbosity; higher values print more information. */
 bool				verbose;
@@ -589,7 +589,7 @@ main(
 			}
 			break;
 		case 'b':
-			nr_threads = 1;
+			force_nr_threads = 1;
 			bg_mode++;
 			break;
 		case 'C':
@@ -659,7 +659,7 @@ main(
 			perror("nr_threads");
 			usage();
 		}
-		nr_threads = x;
+		force_nr_threads = x;
 	}
 
 	if (optind != argc - 1)
diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
index a961d8fd..a459e4b5 100644
--- a/scrub/xfs_scrub.h
+++ b/scrub/xfs_scrub.h
@@ -10,7 +10,7 @@ extern char *progname;
 
 #define _PATH_PROC_MOUNTS	"/proc/mounts"
 
-extern unsigned int		nr_threads;
+extern unsigned int		force_nr_threads;
 extern unsigned int		bg_mode;
 extern unsigned int		debug;
 extern int			nproc;

* [PATCH 4/8] xfs_scrub: use datadev parallelization estimates for thread count
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
                   ` (2 preceding siblings ...)
  2018-12-19 19:29 ` [PATCH 3/8] xfs_scrub: rename the global nr_threads Darrick J. Wong
@ 2018-12-19 19:29 ` Darrick J. Wong
  2019-02-04 18:31   ` Eric Sandeen
  2018-12-19 19:30 ` [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool Darrick J. Wong
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:29 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

During phases 2-5, xfs_scrub should estimate the level of
parallelization possible on the data device to determine the number of
threads spawned to scrub filesystem metadata, not just blindly using the
number of CPUs.  This avoids flooding non-rotational storage with random
reads, which totally destroys performance and makes scrub runtimes
higher.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase1.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)


diff --git a/scrub/phase1.c b/scrub/phase1.c
index 2113014b..6b472147 100644
--- a/scrub/phase1.c
+++ b/scrub/phase1.c
@@ -109,13 +109,6 @@ _("Must be root to run scrub."));
 		return false;
 	}
 
-	ctx->nr_io_threads = nproc;
-	if (verbose) {
-		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
-				ctx->mntpoint, scrub_nproc(ctx));
-		fflush(stdout);
-	}
-
 	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
 		str_info(ctx, ctx->mntpoint,
 _("Does not appear to be an XFS filesystem!"));
@@ -193,6 +186,13 @@ _("Unable to find realtime device path."));
 		return false;
 	}
 
+	ctx->nr_io_threads = disk_heads(ctx->datadev);
+	if (verbose) {
+		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
+				ctx->mntpoint, scrub_nproc(ctx));
+		fflush(stdout);
+	}
+
 	if (ctx->fsinfo.fs_log) {
 		ctx->logdev = disk_open(ctx->fsinfo.fs_log);
 		if (error) {

* [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
                   ` (3 preceding siblings ...)
  2018-12-19 19:29 ` [PATCH 4/8] xfs_scrub: use datadev parallelization estimates for thread count Darrick J. Wong
@ 2018-12-19 19:30 ` Darrick J. Wong
  2019-02-04 18:35   ` Eric Sandeen
  2018-12-19 19:30 ` [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly Darrick J. Wong
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:30 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Since we use the same read-verify pool object to dispatch file data read
requests for both the data device and the realtime device, we should
create enough IO threads to handle the estimated parallelization of both
devices, not just the data device.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 scrub/phase6.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index ead48d77..cbda9b53 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -435,6 +435,22 @@ xfs_check_rmap(
 	return true;
 }
 
+/*
+ * Estimate the amount of parallelization possible for scanning file data on
+ * the data and realtime devices.
+ */
+static unsigned int
+phase6_threads(
+	struct scrub_ctx	*ctx)
+{
+	unsigned int		nr = disk_heads(ctx->datadev);
+
+	if (ctx->rtdev)
+		nr += disk_heads(ctx->rtdev);
+
+	return nr > nproc ? nproc : nr;
+}
+
 /*
  * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
  * do data checksums, we trust that the underlying storage will pass back
@@ -469,7 +485,7 @@ xfs_scan_blocks(
 	}
 
 	ve.readverify = read_verify_pool_init(ctx, ctx->geo.blocksize,
-			xfs_check_rmap_ioerr, disk_heads(ctx->datadev));
+			xfs_check_rmap_ioerr, phase6_threads(ctx));
 	if (!ve.readverify) {
 		moveon = false;
 		str_info(ctx, ctx->mntpoint,
@@ -525,7 +541,7 @@ xfs_estimate_verify_work(
 		return moveon;
 
 	*items = ((d_blocks - d_bfree) + (r_blocks - r_bfree)) << ctx->blocklog;
-	*nr_threads = disk_heads(ctx->datadev);
+	*nr_threads = phase6_threads(ctx);
 	*rshift = 20;
 	return moveon;
 }

* [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
                   ` (4 preceding siblings ...)
  2018-12-19 19:30 ` [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool Darrick J. Wong
@ 2018-12-19 19:30 ` Darrick J. Wong
  2018-12-19 20:24   ` Bill O'Donnell
  2019-02-04 19:19   ` Eric Sandeen
  2018-12-19 19:30 ` [PATCH 7/8] xfs_repair: bump the irec on-disk nlink when adding lost+found Darrick J. Wong
  2018-12-19 19:30 ` [PATCH 8/8] xfs_repair: fix uninitialized variable warnings Darrick J. Wong
  7 siblings, 2 replies; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:30 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In mk_root_dir, we reinitialize the root directory inode with a link
count of 1.  This differs from mkfs parseproto, which initializes the
root to have a link count of 2.  The nlink discrepancy in repair is
caught and corrected during phase 7, but this is unnecessary since we
should set it properly in the first place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase6.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 9477bc25..8a50b350 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -891,7 +891,7 @@ mk_root_dir(xfs_mount_t *mp)
 	ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
 	ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
 
-	set_nlink(VFS_I(ip), 1);	/* account for . */
+	set_nlink(VFS_I(ip), 2);	/* account for . and .. */
 
 	times = XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD;
 	if (ip->i_d.di_version == 3) {

* [PATCH 7/8] xfs_repair: bump the irec on-disk nlink when adding lost+found
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
                   ` (5 preceding siblings ...)
  2018-12-19 19:30 ` [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly Darrick J. Wong
@ 2018-12-19 19:30 ` Darrick J. Wong
  2018-12-19 20:30   ` Bill O'Donnell
  2018-12-19 19:30 ` [PATCH 8/8] xfs_repair: fix uninitialized variable warnings Darrick J. Wong
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:30 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

We increment the nlink of the root directory inode when creating a
"lost+found" directory during phase 6, but we don't update the irec copy
of the root dir nlink.  This normally gets papered over by phase 7, but
this can fail badly if:

1) The root directory had an entry to a busted subdirectory, so
   that root directory will have nlink == 3, but in the ino_tree,
   counted_nlinks == 2 and disk_nlinks == 3.

2) Phase 6 creates lost+found to root the files that were in the busted
   directory, we'll set nlink = 4 and counted_nlinks = 3.  The correct
   nlink is 3 ('.', '..', 'lost+found'), not 4.

3) During phase 7, we see that counted_nlinks == disk_nlinks and so we
   totally fail to correct the on-disk inode.

4) A subsequent run of xfs_repair complains about the nlink being 4
   instead of 3.

To fix this, we have to adjust the irec's disk_nlinks in step 2 so that
phase 7 sees that counted_nlinks < disk_nlinks and resets nlink to
counted_nlinks.  This can be reproduced somewhat frequently by xfs/117.
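
(A toy walk-through of the arithmetic, for illustration only; the variable
names follow the commit message rather than the actual repair code, and we
assume phase 7 only rewrites nlink when its two counters disagree.)

nlink          = 3   # on-disk root nlink: '.', '..', busted subdir
disk_nlinks    = 3   # irec copy of that on-disk value
counted_nlinks = 2   # what repair counted; the busted subdir is going away

# phase 6 creates lost+found under the root:
nlink          += 1  # inc_nlink() -> 4, one too many
counted_nlinks += 1  # add_inode_ref() -> 3, the correct answer
disk_nlinks    += 1  # the fix: keep the irec copy in sync -> 4

# phase 7:
if counted_nlinks != disk_nlinks:
    nlink = counted_nlinks   # 4 -> 3; without the fix this branch is
print(nlink)                 # skipped and nlink stays at 4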

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/phase6.c |   14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)


diff --git a/repair/phase6.c b/repair/phase6.c
index 8a50b350..194cfdbf 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1019,6 +1019,7 @@ mk_orphanage(xfs_mount_t *mp)
 	 */
 	set_inode_used(irec, ino_offset);
 	add_inode_ref(irec, ino_offset);
+	add_inode_reached(irec, ino_offset);
 
 	/*
 	 * now that we know the transaction will stay around,
@@ -1037,14 +1038,14 @@ mk_orphanage(xfs_mount_t *mp)
 
 	/*
 	 * bump up the link count in the root directory to account
-	 * for .. in the new directory
+	 * for .. in the new directory, and update the irec copy of the
+	 * on-disk nlink so we don't fail the link count check later.
 	 */
 	inc_nlink(VFS_I(pip));
-	add_inode_ref(find_inode_rec(mp,
-				XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
-				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino)), 0);
-
-
+	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
+				  XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino));
+	add_inode_ref(irec, 0);
+	set_inode_disk_nlinks(irec, 0, get_inode_disk_nlinks(irec, 0) + 1);
 
 	libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
 	libxfs_dir_init(tp, ip, pip);
@@ -1056,7 +1057,6 @@ mk_orphanage(xfs_mount_t *mp)
 	}
 	libxfs_irele(ip);
 	libxfs_irele(pip);
-	add_inode_reached(irec,ino_offset);
 
 	return(ino);
 }

* [PATCH 8/8] xfs_repair: fix uninitialized variable warnings
  2018-12-19 19:29 [PATCH 0/8] xfsprogs: various fixes Darrick J. Wong
                   ` (6 preceding siblings ...)
  2018-12-19 19:30 ` [PATCH 7/8] xfs_repair: bump the irec on-disk nlink when adding lost+found Darrick J. Wong
@ 2018-12-19 19:30 ` Darrick J. Wong
  2018-12-19 20:25   ` Bill O'Donnell
  7 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2018-12-19 19:30 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Fix some uninitialized variable warnings because ASSERT disappears if
DEBUG is not defined.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/dinode.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/repair/dinode.c b/repair/dinode.c
index f670bf87..c0a56daa 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -1176,8 +1176,8 @@ process_quota_inode(
 	struct xfs_buf		*bp;
 	xfs_filblks_t		dqchunklen;
 	uint			dqperchunk;
-	int			quota_type;
-	char			*quota_string;
+	int			quota_type = 0;
+	char			*quota_string = NULL;
 	xfs_dqid_t		dqid;
 	xfs_fileoff_t		qbno;
 	int			i;

* Re: [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly
  2018-12-19 19:30 ` [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly Darrick J. Wong
@ 2018-12-19 20:24   ` Bill O'Donnell
  2019-02-04 19:19   ` Eric Sandeen
  1 sibling, 0 replies; 23+ messages in thread
From: Bill O'Donnell @ 2018-12-19 20:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Wed, Dec 19, 2018 at 11:30:11AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In mk_root_dir, we reinitialize the root directory inode with a link
> count of 1.  This differs from mkfs parseproto, which initializes the
> root to have a link count of 2.  The nlink discrepancy in repair is
> caught and corrected during phase 7, but this is unnecessary since we
> should set it properly in the first place.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Looks fine.
Reviewed-by: Bill O'Donnell <billodo@redhat.com>

>  repair/phase6.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/repair/phase6.c b/repair/phase6.c
> index 9477bc25..8a50b350 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -891,7 +891,7 @@ mk_root_dir(xfs_mount_t *mp)
>  	ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
>  	ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
>  
> -	set_nlink(VFS_I(ip), 1);	/* account for . */
> +	set_nlink(VFS_I(ip), 2);	/* account for . and .. */
>  
>  	times = XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD;
>  	if (ip->i_d.di_version == 3) {
> 

* Re: [PATCH 8/8] xfs_repair: fix uninitialized variable warnings
  2018-12-19 19:30 ` [PATCH 8/8] xfs_repair: fix uninitialized variable warnings Darrick J. Wong
@ 2018-12-19 20:25   ` Bill O'Donnell
  0 siblings, 0 replies; 23+ messages in thread
From: Bill O'Donnell @ 2018-12-19 20:25 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Wed, Dec 19, 2018 at 11:30:23AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Fix some uninitialized variable warnings because ASSERT disappears if
> DEBUG is not defined.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Looks fine.
Reviewed-by: Bill O'Donnell <billodo@redhat.com>

>  repair/dinode.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/repair/dinode.c b/repair/dinode.c
> index f670bf87..c0a56daa 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -1176,8 +1176,8 @@ process_quota_inode(
>  	struct xfs_buf		*bp;
>  	xfs_filblks_t		dqchunklen;
>  	uint			dqperchunk;
> -	int			quota_type;
> -	char			*quota_string;
> +	int			quota_type = 0;
> +	char			*quota_string = NULL;
>  	xfs_dqid_t		dqid;
>  	xfs_fileoff_t		qbno;
>  	int			i;
> 

* Re: [PATCH 7/8] xfs_repair: bump the irec on-disk nlink when adding lost+found
  2018-12-19 19:30 ` [PATCH 7/8] xfs_repair: bump the irec on-disk nlink when adding lost+found Darrick J. Wong
@ 2018-12-19 20:30   ` Bill O'Donnell
  0 siblings, 0 replies; 23+ messages in thread
From: Bill O'Donnell @ 2018-12-19 20:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Wed, Dec 19, 2018 at 11:30:17AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We increment the nlink of the root directory inode when creating a
> "lost+found" directory during phase 6, but we don't update the irec copy
> of the root dir nlink.  This normally gets papered over by phase 7, but
> this can fail badly if:
> 
> 1) The root directory had an entry to a busted subdirectory, so
>    that root directory will have nlink == 3, but in the ino_tree,
>    counted_nlinks == 2 and disk_nlinks == 3.
> 
> 2) Phase 6 creates lost+found to root the files that were in the busted
>    directory, we'll set nlink = 4 and counted_nlinks = 3.  The correct
>    nlink is 3 ('.', '..', 'lost+found'), not 4.
> 
> 3) During phase 7, we see that counted_nlinks == disk_nlinks and so we
>    totally fail to correct the on-disk inode.
> 
> 4) A subsequent run of xfs_repair complains about the nlink being 4
>    instead of 3.
> 
> To fix this, we have to adjust the irec's disk_nlinks in step 2 so that
> phase 7 sees that counted_nlinks < disk_nlinks and resets nlink to
> counted_nlinks.  This can be reproduced somewhat frequently by xfs/117.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Makes sense.
Reviewed-by: Bill O'Donnell <billodo@redhat.com>

>  repair/phase6.c |   14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/repair/phase6.c b/repair/phase6.c
> index 8a50b350..194cfdbf 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -1019,6 +1019,7 @@ mk_orphanage(xfs_mount_t *mp)
>  	 */
>  	set_inode_used(irec, ino_offset);
>  	add_inode_ref(irec, ino_offset);
> +	add_inode_reached(irec, ino_offset);
>  
>  	/*
>  	 * now that we know the transaction will stay around,
> @@ -1037,14 +1038,14 @@ mk_orphanage(xfs_mount_t *mp)
>  
>  	/*
>  	 * bump up the link count in the root directory to account
> -	 * for .. in the new directory
> +	 * for .. in the new directory, and update the irec copy of the
> +	 * on-disk nlink so we don't fail the link count check later.
>  	 */
>  	inc_nlink(VFS_I(pip));
> -	add_inode_ref(find_inode_rec(mp,
> -				XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
> -				XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino)), 0);
> -
> -
> +	irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino),
> +				  XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino));
> +	add_inode_ref(irec, 0);
> +	set_inode_disk_nlinks(irec, 0, get_inode_disk_nlinks(irec, 0) + 1);
>  
>  	libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
>  	libxfs_dir_init(tp, ip, pip);
> @@ -1056,7 +1057,6 @@ mk_orphanage(xfs_mount_t *mp)
>  	}
>  	libxfs_irele(ip);
>  	libxfs_irele(pip);
> -	add_inode_reached(irec,ino_offset);
>  
>  	return(ino);
>  }
> 

* Re: [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly
  2018-12-19 19:29 ` [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly Darrick J. Wong
@ 2019-02-04 18:08   ` Eric Sandeen
  2019-02-04 18:16     ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 18:08 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 12/19/18 1:29 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Back when I was designing xfs_scrub_all, I naïvely assumed that the
> emitted output would always list physical storage before the virtual
> devices stacked atop it.  However, this is not actually true when one
> omits the "NAME" column, which is crucial to forcing the output (json or
> otherwise) to capture the block device hierarchy.  If the assumption is
> violated, the program crashes with a python exception.

Is this a quirk or a documented feature of lsblk?
 
> To fix this, force the hierarchical json output and restructure the
> discovery routines to walk the json object that we receive, from the top
> (physical devices) downwards to wherever there are live xfs filesystems.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/xfs_scrub_all.in |   28 +++++++++++++++++-----------
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> 
> diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
> index c4e9899d..5b76b49a 100644
> --- a/scrub/xfs_scrub_all.in
> +++ b/scrub/xfs_scrub_all.in
> @@ -28,9 +28,21 @@ def DEVNULL():
>  
>  def find_mounts():
>  	'''Map mountpoints to physical disks.'''
> +	def find_xfs_mounts(bdev, fs, lastdisk):
> +		'''Attach lastdisk to each fs found under bdev.'''
> +		if bdev['fstype'] == 'xfs' and bdev['mountpoint'] is not None:
> +			mnt = bdev['mountpoint']
> +			if mnt in fs:
> +				fs[mnt].add(lastdisk)
> +			else:
> +				fs[mnt] = set([lastdisk])
> +		if 'children' not in bdev:
> +			return
> +		for child in bdev['children']:
> +			find_xfs_mounts(child, fs, lastdisk)
>  
>  	fs = {}
> -	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
> +	cmd=['lsblk', '-o', 'NAME,KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']

sorry for the ridonculously late review, and although '-J' isn't added
new in this patch, FYI at least RHEL7 does not allow it:

# lsblk -o KNAME -J
lsblk: invalid option -- 'J'

... thoughts?  Probably should be handled gracefully at least?

-Eric

>  	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
>  	result.wait()
>  	if result.returncode != 0:
> @@ -38,18 +50,12 @@ def find_mounts():
>  	sarray = [x.decode(sys.stdout.encoding) for x in result.stdout.readlines()]
>  	output = ' '.join(sarray)
>  	bdevdata = json.loads(output)
> +
>  	# The lsblk output had better be in disks-then-partitions order
>  	for bdev in bdevdata['blockdevices']:
> -		if bdev['type'] in ('disk', 'loop'):
> -			lastdisk = bdev['kname']
> -		if bdev['fstype'] == 'xfs':
> -			mnt = bdev['mountpoint']
> -			if mnt is None:
> -				continue
> -			if mnt in fs:
> -				fs[mnt].add(lastdisk)
> -			else:
> -				fs[mnt] = set([lastdisk])
> +		lastdisk = bdev['kname']
> +		find_xfs_mounts(bdev, fs, lastdisk)
> +
>  	return fs
>  
>  def kill_systemd(unit, proc):
> 

* Re: [PATCH 2/8] xfs_scrub_all.timer: activate after most of the system is up
  2018-12-19 19:29 ` [PATCH 2/8] xfs_scrub_all.timer: activate after most of the system is up Darrick J. Wong
@ 2019-02-04 18:12   ` Eric Sandeen
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 18:12 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 12/19/18 1:29 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> We really don't want the xfs_scrub_all timer triggering while the system
> is booting up because not all the mounts will have finished, networking
> might not be up for reporting, and slowing down bootup annoys people.
> Therefore, delay the xfs_scrub_all service's activation until after the
> system has started all the big pieces it's going to start.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/xfs_scrub_all.service.in |    1 +
>  1 file changed, 1 insertion(+)

<lays hands on systemd-fu, attempts to sense aura of goodness>

<has no real basis for judgement but feels that waiting for more $STUFF can't hurt>

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> 
> diff --git a/scrub/xfs_scrub_all.service.in b/scrub/xfs_scrub_all.service.in
> index 66f82fc7..b1b80da4 100644
> --- a/scrub/xfs_scrub_all.service.in
> +++ b/scrub/xfs_scrub_all.service.in
> @@ -2,6 +2,7 @@
>  Description=Online XFS Metadata Check for All Filesystems
>  ConditionACPower=true
>  Documentation=man:xfs_scrub_all(8)
> +After=paths.target multi-user.target network.target network-online.target systemd-networkd.service NetworkManager.service connman.service
>  
>  [Service]
>  Type=oneshot
> 

* Re: [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly
  2019-02-04 18:08   ` Eric Sandeen
@ 2019-02-04 18:16     ` Darrick J. Wong
  2019-02-04 18:27       ` Eric Sandeen
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2019-02-04 18:16 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Mon, Feb 04, 2019 at 12:08:49PM -0600, Eric Sandeen wrote:
> On 12/19/18 1:29 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Back when I was designing xfs_scrub_all, I naïvely assumed that the
> > emitted output would always list physical storage before the virtual
> > devices stacked atop it.  However, this is not actually true when one
> > omits the "NAME" column, which is crucial to forcing the output (json or
> > otherwise) to capture the block device hierarchy.  If the assumption is
> > violated, the program crashes with a python exception.
> 
> Is this a quirk or a documented feature of lsblk?

Not a documented feature, but seems to be a fairly common behavioral
quirk?

> > To fix this, force the hierarchical json output and restructure the
> > discovery routines to walk the json object that we receive, from the top
> > (physical devices) downwards to wherever there are live xfs filesystems.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  scrub/xfs_scrub_all.in |   28 +++++++++++++++++-----------
> >  1 file changed, 17 insertions(+), 11 deletions(-)
> > 
> > 
> > diff --git a/scrub/xfs_scrub_all.in b/scrub/xfs_scrub_all.in
> > index c4e9899d..5b76b49a 100644
> > --- a/scrub/xfs_scrub_all.in
> > +++ b/scrub/xfs_scrub_all.in
> > @@ -28,9 +28,21 @@ def DEVNULL():
> >  
> >  def find_mounts():
> >  	'''Map mountpoints to physical disks.'''
> > +	def find_xfs_mounts(bdev, fs, lastdisk):
> > +		'''Attach lastdisk to each fs found under bdev.'''
> > +		if bdev['fstype'] == 'xfs' and bdev['mountpoint'] is not None:
> > +			mnt = bdev['mountpoint']
> > +			if mnt in fs:
> > +				fs[mnt].add(lastdisk)
> > +			else:
> > +				fs[mnt] = set([lastdisk])
> > +		if 'children' not in bdev:
> > +			return
> > +		for child in bdev['children']:
> > +			find_xfs_mounts(child, fs, lastdisk)
> >  
> >  	fs = {}
> > -	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
> > +	cmd=['lsblk', '-o', 'NAME,KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
> 
> sorry for the ridonculously late review, and although '-J" isn't added
> new in this patch, FYI at least RHEL7 does not allow it:
> 
> # lsblk -o KNAME -J
> lsblk: invalid option -- 'J'
> 
> ... thoughts?  Probably should be handled gracefully at least?

lsblk returns 1 for unrecognized arguments, so xfs_scrub_all will bail
if lsblk barfs.  Not sure if we want to divert lsblk's stderr to
/dev/null or just let it spray out sloppily like we do now?
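
Something like the sketch below (hypothetical, not part of this series)
would silence the noise and degrade to scrubbing nothing instead of
throwing a traceback; find_mounts_or_nothing() is a made-up name, and
subprocess.DEVNULL needs python 3.3+ (the script already carries a
DEVNULL() helper that could serve the same purpose):

import json
import subprocess
import sys

def find_mounts_or_nothing():
        cmd = ['lsblk', '-o', 'NAME,KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
        result = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                  stderr=subprocess.DEVNULL)
        result.wait()
        if result.returncode != 0:
                # old lsblk without -J; skip scrubbing rather than crash
                return {}
        sarray = [x.decode(sys.stdout.encoding) for x in result.stdout.readlines()]
        return json.loads(' '.join(sarray))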

(Practically speaking I suspect that distros will pick up the util-linux
release that has json support before they pick up XFS kernel scrub...
but I should at least make sure that's really true.)

--D

> -Eric
> 
> >  	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
> >  	result.wait()
> >  	if result.returncode != 0:
> > @@ -38,18 +50,12 @@ def find_mounts():
> >  	sarray = [x.decode(sys.stdout.encoding) for x in result.stdout.readlines()]
> >  	output = ' '.join(sarray)
> >  	bdevdata = json.loads(output)
> > +
> >  	# The lsblk output had better be in disks-then-partitions order
> >  	for bdev in bdevdata['blockdevices']:
> > -		if bdev['type'] in ('disk', 'loop'):
> > -			lastdisk = bdev['kname']
> > -		if bdev['fstype'] == 'xfs':
> > -			mnt = bdev['mountpoint']
> > -			if mnt is None:
> > -				continue
> > -			if mnt in fs:
> > -				fs[mnt].add(lastdisk)
> > -			else:
> > -				fs[mnt] = set([lastdisk])
> > +		lastdisk = bdev['kname']
> > +		find_xfs_mounts(bdev, fs, lastdisk)
> > +
> >  	return fs
> >  
> >  def kill_systemd(unit, proc):
> > 

* Re: [PATCH 3/8] xfs_scrub: rename the global nr_threads
  2018-12-19 19:29 ` [PATCH 3/8] xfs_scrub: rename the global nr_threads Darrick J. Wong
@ 2019-02-04 18:20   ` Eric Sandeen
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 18:20 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 12/19/18 1:29 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Various functions have nr_threads local variables that shadow the global
> one.  Since the global one forces the number of threads we use, change
> its name to remove this ambiguity and reflect what it really does.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/common.c    |    4 ++--
>  scrub/disk.c      |    4 ++--
>  scrub/xfs_scrub.c |    6 +++---
>  scrub/xfs_scrub.h |    2 +-
>  4 files changed, 8 insertions(+), 8 deletions(-)


Weird, for some reason I thought I addressed this when I did the "make check"
sparse stuff.  Apparently not.

Yeah, the new name is a bit more self-documenting too I suppose.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> 
> diff --git a/scrub/common.c b/scrub/common.c
> index 78afc4bf..c877c7c8 100644
> --- a/scrub/common.c
> +++ b/scrub/common.c
> @@ -231,8 +231,8 @@ unsigned int
>  scrub_nproc(
>  	struct scrub_ctx	*ctx)
>  {
> -	if (nr_threads)
> -		return nr_threads;
> +	if (force_nr_threads)
> +		return force_nr_threads;
>  	return ctx->nr_io_threads;
>  }
>  
> diff --git a/scrub/disk.c b/scrub/disk.c
> index 7daa508e..dd109533 100644
> --- a/scrub/disk.c
> +++ b/scrub/disk.c
> @@ -75,8 +75,8 @@ unsigned int
>  disk_heads(
>  	struct disk		*disk)
>  {
> -	if (nr_threads)
> -		return nr_threads;
> +	if (force_nr_threads)
> +		return force_nr_threads;
>  	return __disk_heads(disk);
>  }
>  
> diff --git a/scrub/xfs_scrub.c b/scrub/xfs_scrub.c
> index b8138000..71fc274f 100644
> --- a/scrub/xfs_scrub.c
> +++ b/scrub/xfs_scrub.c
> @@ -133,7 +133,7 @@ unsigned int			bg_mode;
>  int				nproc;
>  
>  /* Number of threads we're allowed to use. */
> -unsigned int			nr_threads;
> +unsigned int			force_nr_threads;
>  
>  /* Verbosity; higher values print more information. */
>  bool				verbose;
> @@ -589,7 +589,7 @@ main(
>  			}
>  			break;
>  		case 'b':
> -			nr_threads = 1;
> +			force_nr_threads = 1;
>  			bg_mode++;
>  			break;
>  		case 'C':
> @@ -659,7 +659,7 @@ main(
>  			perror("nr_threads");
>  			usage();
>  		}
> -		nr_threads = x;
> +		force_nr_threads = x;
>  	}
>  
>  	if (optind != argc - 1)
> diff --git a/scrub/xfs_scrub.h b/scrub/xfs_scrub.h
> index a961d8fd..a459e4b5 100644
> --- a/scrub/xfs_scrub.h
> +++ b/scrub/xfs_scrub.h
> @@ -10,7 +10,7 @@ extern char *progname;
>  
>  #define _PATH_PROC_MOUNTS	"/proc/mounts"
>  
> -extern unsigned int		nr_threads;
> +extern unsigned int		force_nr_threads;
>  extern unsigned int		bg_mode;
>  extern unsigned int		debug;
>  extern int			nproc;
> 

* Re: [PATCH 1/8] xfs_scrub_all: walk the lsblk device/fs hierarchy correctly
  2019-02-04 18:16     ` Darrick J. Wong
@ 2019-02-04 18:27       ` Eric Sandeen
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 18:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs



On 2/4/19 12:16 PM, Darrick J. Wong wrote:
> On Mon, Feb 04, 2019 at 12:08:49PM -0600, Eric Sandeen wrote:
>> On 12/19/18 1:29 PM, Darrick J. Wong wrote:
>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>
>>> Back when I was designing xfs_scrub_all, I naïvely assumed that the
>>> emitted output would always list physical storage before the virtual
>>> devices stacked atop it.  However, this is not actually true when one
>>> omits the "NAME" column, which is crucial to forcing the output (json or
>>> otherwise) to capture the block device hierarchy.  If the assumption is
>>> violated, the program crashes with a python exception.
>>
>> Is this a quirk or a documented feature of lsblk?
> 
> Not a documented feature, but seems to be a fairly common behavioral
> quirk?

bleah.  Sounds fragile.  :(

Let me check with kzak on this.  If your patch improves it for now, super,
but I don't want to leave a little time bomb here.

>>> -	cmd=['lsblk', '-o', 'KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
>>> +	cmd=['lsblk', '-o', 'NAME,KNAME,TYPE,FSTYPE,MOUNTPOINT', '-J']
>>
>> sorry for the ridonculously late review, and although '-J" isn't added
>> new in this patch, FYI at least RHEL7 does not allow it:
>>
>> # lsblk -o KNAME -J
>> lsblk: invalid option -- 'J'
>>
>> ... thoughts?  Probably should be handled gracefully at least?
> 
> lsblk returns 1 for unrecognized arguments, so xfs_scrub_all will bail
> if lsblk barfs.  Not sure if we want to divert lsblk's stderr to
> /dev/null or just let it spray out sloppily like we do now?
> 
> (Practically speaking I suspect that distros will pick up the util-linux
> release that has json support before they pick up XFS kernel scrub...
> but I should at least make sure that's really true.)

Yeah I'd expect that as well, just wondered how much of a landmine it
might be, I guess.

-Eric

> --D
> 
>> -Eric
>>
>>>  	result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
>>>  	result.wait()
>>>  	if result.returncode != 0:
>>> @@ -38,18 +50,12 @@ def find_mounts():
>>>  	sarray = [x.decode(sys.stdout.encoding) for x in result.stdout.readlines()]
>>>  	output = ' '.join(sarray)
>>>  	bdevdata = json.loads(output)
>>> +
>>>  	# The lsblk output had better be in disks-then-partitions order
>>>  	for bdev in bdevdata['blockdevices']:
>>> -		if bdev['type'] in ('disk', 'loop'):
>>> -			lastdisk = bdev['kname']
>>> -		if bdev['fstype'] == 'xfs':
>>> -			mnt = bdev['mountpoint']
>>> -			if mnt is None:
>>> -				continue
>>> -			if mnt in fs:
>>> -				fs[mnt].add(lastdisk)
>>> -			else:
>>> -				fs[mnt] = set([lastdisk])
>>> +		lastdisk = bdev['kname']
>>> +		find_xfs_mounts(bdev, fs, lastdisk)
>>> +
>>>  	return fs
>>>  
>>>  def kill_systemd(unit, proc):
>>>
> 

* Re: [PATCH 4/8] xfs_scrub: use datadev parallelization estimates for thread count
  2018-12-19 19:29 ` [PATCH 4/8] xfs_scrub: use datadev parallelization estimates for thread count Darrick J. Wong
@ 2019-02-04 18:31   ` Eric Sandeen
  2019-02-04 18:34     ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 18:31 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 12/19/18 1:29 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> During phases 2-5, xfs_scrub should estimate the level of
> parallelization possible on the data device to determine the number of
> threads spawned to scrub filesystem metadata, not just blindly using the
> number of CPUs.  This avoids flooding non-rotational storage with random
> reads, which totally destroys performance and makes scrub runtimes
> higher.

typo in the changelog?  __disk_heads() /does/ throw all cpus at nonrotational,
should the above presumably say "rotational storage?"

Otherwise,

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  scrub/phase1.c |   14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/scrub/phase1.c b/scrub/phase1.c
> index 2113014b..6b472147 100644
> --- a/scrub/phase1.c
> +++ b/scrub/phase1.c
> @@ -109,13 +109,6 @@ _("Must be root to run scrub."));
>  		return false;
>  	}
>  
> -	ctx->nr_io_threads = nproc;
> -	if (verbose) {
> -		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
> -				ctx->mntpoint, scrub_nproc(ctx));
> -		fflush(stdout);
> -	}
> -
>  	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
>  		str_info(ctx, ctx->mntpoint,
>  _("Does not appear to be an XFS filesystem!"));
> @@ -193,6 +186,13 @@ _("Unable to find realtime device path."));
>  		return false;
>  	}
>  
> +	ctx->nr_io_threads = disk_heads(ctx->datadev);
> +	if (verbose) {
> +		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
> +				ctx->mntpoint, scrub_nproc(ctx));
> +		fflush(stdout);
> +	}
> +
>  	if (ctx->fsinfo.fs_log) {
>  		ctx->logdev = disk_open(ctx->fsinfo.fs_log);
>  		if (error) {
> 

* Re: [PATCH 4/8] xfs_scrub: use datadev parallelization estimates for thread count
  2019-02-04 18:31   ` Eric Sandeen
@ 2019-02-04 18:34     ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2019-02-04 18:34 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Mon, Feb 04, 2019 at 12:31:59PM -0600, Eric Sandeen wrote:
> On 12/19/18 1:29 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > During phases 2-5, xfs_scrub should estimate the level of
> > parallelization possible on the data device to determine the number of
> > threads spawned to scrub filesystem metadata, not just blindly using the
> > number of CPUs.  This avoids flooding non-rotational storage with random
> > reads, which totally destroys performance and makes scrub runtimes
> > higher.
> 
> typo in the changelog?  __disk_heads() /does/ throw all cpus at nonrotational,
> should the above presumably say "rotational storage?"

Er... yes.  Would you mind changing that on its way in, pretty please?

:)

--D

> Otherwise,
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  scrub/phase1.c |   14 +++++++-------
> >  1 file changed, 7 insertions(+), 7 deletions(-)
> > 
> > 
> > diff --git a/scrub/phase1.c b/scrub/phase1.c
> > index 2113014b..6b472147 100644
> > --- a/scrub/phase1.c
> > +++ b/scrub/phase1.c
> > @@ -109,13 +109,6 @@ _("Must be root to run scrub."));
> >  		return false;
> >  	}
> >  
> > -	ctx->nr_io_threads = nproc;
> > -	if (verbose) {
> > -		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
> > -				ctx->mntpoint, scrub_nproc(ctx));
> > -		fflush(stdout);
> > -	}
> > -
> >  	if (!platform_test_xfs_fd(ctx->mnt_fd)) {
> >  		str_info(ctx, ctx->mntpoint,
> >  _("Does not appear to be an XFS filesystem!"));
> > @@ -193,6 +186,13 @@ _("Unable to find realtime device path."));
> >  		return false;
> >  	}
> >  
> > +	ctx->nr_io_threads = disk_heads(ctx->datadev);
> > +	if (verbose) {
> > +		fprintf(stdout, _("%s: using %d threads to scrub.\n"),
> > +				ctx->mntpoint, scrub_nproc(ctx));
> > +		fflush(stdout);
> > +	}
> > +
> >  	if (ctx->fsinfo.fs_log) {
> >  		ctx->logdev = disk_open(ctx->fsinfo.fs_log);
> >  		if (error) {
> > 

* Re: [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool
  2018-12-19 19:30 ` [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool Darrick J. Wong
@ 2019-02-04 18:35   ` Eric Sandeen
  2019-02-04 18:38     ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 18:35 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 12/19/18 1:30 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Since we use the same read-verify pool object to dispatch file data read
> requests for both the data device and the realtime device, we should
> create enough IO threads to handle the estimated parallelization of both
> devices, not just the data device.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

So if rtdev is on SSD and datadev is on a rotational device, won't we
possibly still Launch All Threads at the rotational device for this
case, and won't that still hurt?  I'm not sure it works to lump these
together, does it?  (I also don't know if it can be done another way...)

-Eric


> ---
>  scrub/phase6.c |   20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/scrub/phase6.c b/scrub/phase6.c
> index ead48d77..cbda9b53 100644
> --- a/scrub/phase6.c
> +++ b/scrub/phase6.c
> @@ -435,6 +435,22 @@ xfs_check_rmap(
>  	return true;
>  }
>  
> +/*
> + * Estimate the amount of parallelization possible for scanning file data on
> + * the data and realtime devices.
> + */
> +static unsigned int
> +phase6_threads(
> +	struct scrub_ctx	*ctx)
> +{
> +	unsigned int		nr = disk_heads(ctx->datadev);
> +
> +	if (ctx->rtdev)
> +		nr += disk_heads(ctx->rtdev);
> +
> +	return nr > nproc ? nproc : nr;
> +}
> +
>  /*
>   * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
>   * do data checksums, we trust that the underlying storage will pass back
> @@ -469,7 +485,7 @@ xfs_scan_blocks(
>  	}
>  
>  	ve.readverify = read_verify_pool_init(ctx, ctx->geo.blocksize,
> -			xfs_check_rmap_ioerr, disk_heads(ctx->datadev));
> +			xfs_check_rmap_ioerr, phase6_threads(ctx));
>  	if (!ve.readverify) {
>  		moveon = false;
>  		str_info(ctx, ctx->mntpoint,
> @@ -525,7 +541,7 @@ xfs_estimate_verify_work(
>  		return moveon;
>  
>  	*items = ((d_blocks - d_bfree) + (r_blocks - r_bfree)) << ctx->blocklog;
> -	*nr_threads = disk_heads(ctx->datadev);
> +	*nr_threads = phase6_threads(ctx);
>  	*rshift = 20;
>  	return moveon;
>  }
> 

* Re: [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool
  2019-02-04 18:35   ` Eric Sandeen
@ 2019-02-04 18:38     ` Darrick J. Wong
  2019-02-05  2:23       ` Darrick J. Wong
  0 siblings, 1 reply; 23+ messages in thread
From: Darrick J. Wong @ 2019-02-04 18:38 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Mon, Feb 04, 2019 at 12:35:00PM -0600, Eric Sandeen wrote:
> On 12/19/18 1:30 PM, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Since we use the same read-verify pool object to dispatch file data read
> > requests for both the data device and the realtime device, we should
> > create enough IO threads to handle the estimated parallelization of both
> > devices, not just the data device.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> 
> So if rtdev is on SSD and datadev is on a rotational device, won't we
> possibly still Launch All Threads at the rotational device for this
> case, and won't that still hurt?  I'm not sure it works to lump these
> together, does it?  (I also don't know if it can be done another way...)

Hmm.  I think we /could/ have separate readverify pools for data and rt
devices.  Let me look into that...

--D

> -Eric
> 
> 
> > ---
> >  scrub/phase6.c |   20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> > 
> > diff --git a/scrub/phase6.c b/scrub/phase6.c
> > index ead48d77..cbda9b53 100644
> > --- a/scrub/phase6.c
> > +++ b/scrub/phase6.c
> > @@ -435,6 +435,22 @@ xfs_check_rmap(
> >  	return true;
> >  }
> >  
> > +/*
> > + * Estimate the amount of parallelization possible for scanning file data on
> > + * the data and realtime devices.
> > + */
> > +static unsigned int
> > +phase6_threads(
> > +	struct scrub_ctx	*ctx)
> > +{
> > +	unsigned int		nr = disk_heads(ctx->datadev);
> > +
> > +	if (ctx->rtdev)
> > +		nr += disk_heads(ctx->rtdev);
> > +
> > +	return nr > nproc ? nproc : nr;
> > +}
> > +
> >  /*
> >   * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
> >   * do data checksums, we trust that the underlying storage will pass back
> > @@ -469,7 +485,7 @@ xfs_scan_blocks(
> >  	}
> >  
> >  	ve.readverify = read_verify_pool_init(ctx, ctx->geo.blocksize,
> > -			xfs_check_rmap_ioerr, disk_heads(ctx->datadev));
> > +			xfs_check_rmap_ioerr, phase6_threads(ctx));
> >  	if (!ve.readverify) {
> >  		moveon = false;
> >  		str_info(ctx, ctx->mntpoint,
> > @@ -525,7 +541,7 @@ xfs_estimate_verify_work(
> >  		return moveon;
> >  
> >  	*items = ((d_blocks - d_bfree) + (r_blocks - r_bfree)) << ctx->blocklog;
> > -	*nr_threads = disk_heads(ctx->datadev);
> > +	*nr_threads = phase6_threads(ctx);
> >  	*rshift = 20;
> >  	return moveon;
> >  }
> > 

* Re: [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly
  2018-12-19 19:30 ` [PATCH 6/8] xfs_repair: reinitialize the root directory nlink correctly Darrick J. Wong
  2018-12-19 20:24   ` Bill O'Donnell
@ 2019-02-04 19:19   ` Eric Sandeen
  1 sibling, 0 replies; 23+ messages in thread
From: Eric Sandeen @ 2019-02-04 19:19 UTC (permalink / raw)
  To: Darrick J. Wong, sandeen; +Cc: linux-xfs

On 12/19/18 1:30 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In mk_root_dir, we reinitialize the root directory inode with a link
> count of 1.  This differs from mkfs parseproto, which initializes the
> root to have a link count of 2.  The nlink discrepancy in repair is
> caught and corrected during phase 7, but this is unnecessary since we
> should set it properly in the first place.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

I have this nagging feeling that I played whackamole with root dir counts
but can't find any real evidence of that now.  Seems ok.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> ---
>  repair/phase6.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/repair/phase6.c b/repair/phase6.c
> index 9477bc25..8a50b350 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -891,7 +891,7 @@ mk_root_dir(xfs_mount_t *mp)
>  	ip->i_d.di_format = XFS_DINODE_FMT_EXTENTS;
>  	ip->i_d.di_aformat = XFS_DINODE_FMT_EXTENTS;
>  
> -	set_nlink(VFS_I(ip), 1);	/* account for . */
> +	set_nlink(VFS_I(ip), 2);	/* account for . and .. */
>  
>  	times = XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD;
>  	if (ip->i_d.di_version == 3) {
> 

* Re: [PATCH 5/8] xfs_scrub: use data/rtdev parallelization estimates for the read-verify pool
  2019-02-04 18:38     ` Darrick J. Wong
@ 2019-02-05  2:23       ` Darrick J. Wong
  0 siblings, 0 replies; 23+ messages in thread
From: Darrick J. Wong @ 2019-02-05  2:23 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: sandeen, linux-xfs

On Mon, Feb 04, 2019 at 10:38:55AM -0800, Darrick J. Wong wrote:
> On Mon, Feb 04, 2019 at 12:35:00PM -0600, Eric Sandeen wrote:
> > On 12/19/18 1:30 PM, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Since we use the same read-verify pool object to dispatch file data read
> > > requests for both the data device and the realtime device, we should
> > > create enough IO threads to handle the estimated parallelization of both
> > > devices, not just the data device.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > So if rtdev is on SSD and datadev is on a rotational device, won't we
> > possibly still Launch All Threads at the rotational device for this
> > case, and won't that still hurt?  I'm not sure it works to lump these
> > together, does it?  (I also don't know if it can be done another way...)
> 
> Hmm.  I think we /could/ have separate readverify pools for data and rt
> devices.  Let me look into that...

...yes, the readverify pool code needs some refactoring and thinko
removal; however, that cleanup should wait for xfsprogs 5.0.

--D

> --D
> 
> > -Eric
> > 
> > 
> > > ---
> > >  scrub/phase6.c |   20 ++++++++++++++++++--
> > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > 
> > > 
> > > diff --git a/scrub/phase6.c b/scrub/phase6.c
> > > index ead48d77..cbda9b53 100644
> > > --- a/scrub/phase6.c
> > > +++ b/scrub/phase6.c
> > > @@ -435,6 +435,22 @@ xfs_check_rmap(
> > >  	return true;
> > >  }
> > >  
> > > +/*
> > > + * Estimate the amount of parallelization possible for scanning file data on
> > > + * the data and realtime devices.
> > > + */
> > > +static unsigned int
> > > +phase6_threads(
> > > +	struct scrub_ctx	*ctx)
> > > +{
> > > +	unsigned int		nr = disk_heads(ctx->datadev);
> > > +
> > > +	if (ctx->rtdev)
> > > +		nr += disk_heads(ctx->rtdev);
> > > +
> > > +	return nr > nproc ? nproc : nr;
> > > +}
> > > +
> > >  /*
> > >   * Read verify all the file data blocks in a filesystem.  Since XFS doesn't
> > >   * do data checksums, we trust that the underlying storage will pass back
> > > @@ -469,7 +485,7 @@ xfs_scan_blocks(
> > >  	}
> > >  
> > >  	ve.readverify = read_verify_pool_init(ctx, ctx->geo.blocksize,
> > > -			xfs_check_rmap_ioerr, disk_heads(ctx->datadev));
> > > +			xfs_check_rmap_ioerr, phase6_threads(ctx));
> > >  	if (!ve.readverify) {
> > >  		moveon = false;
> > >  		str_info(ctx, ctx->mntpoint,
> > > @@ -525,7 +541,7 @@ xfs_estimate_verify_work(
> > >  		return moveon;
> > >  
> > >  	*items = ((d_blocks - d_bfree) + (r_blocks - r_bfree)) << ctx->blocklog;
> > > -	*nr_threads = disk_heads(ctx->datadev);
> > > +	*nr_threads = phase6_threads(ctx);
> > >  	*rshift = 20;
> > >  	return moveon;
> > >  }
> > > 
