All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] xfsprogs: actually check that writes succeeded
@ 2020-02-25  0:10 Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 1/7] libxfs: libxfs_buf_delwri_submit should write buffers immediately Darrick J. Wong
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

Hi all,

A code audit demonstrated that many xfsprogs utilities do not check that
the buffers they write actually make it to disk.  While the userspace
buffer cache has a means to check that a buffer was written (which is to
reread the buffer after the write), most utilities mark a buffer dirty
and release it to the MRU and do not re-check the buffer.

Worse yet, the MRU will retain the buffers for all failed writes until
the buffer cache is torn down, but it in turn has no way to communicate
that writes were lost due to IO errors.  libxfs will flush the device
when it is unmounted, but as there is no return value, we again fail to
notice that writes have been lost.  Most likely this leads to a corrupt
filesystem, which makes it all the more surprising that xfs_repair can
lose writes yet still return 0!

Fix all this by making delwri_submit a synchronous write operation like
its kernel counterpart; teaching the buffer cache to mark the buftarg
when it knows it's losing writes; teaching the device flush functions to
return error codes; and adding a new "flush filesystem" API that user
programs can call to check for lost writes or IO errors.  Then teach all
the userspace programs to flush the fs at exit and report errors.

In v2 we split up some of the patches and make sure we always fsync when
flushing a block device.  In v3 we move the buffer and disk flushing
requests into the libxfs unmount function.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=buffer-write-fixes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] libxfs: libxfs_buf_delwri_submit should write buffers immediately
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
@ 2020-02-25  0:10 ` Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 2/7] libxfs: complain when write IOs fail Darrick J. Wong
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, Allison Collins, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

The whole point of libxfs_buf_delwri_submit is to submit a bunch of
buffers for write and wait for the response.  Unfortunately, while it
does mark the buffers dirty, it doesn't actually flush them and lets the
cache mru flusher do it.  This is inconsistent with the kernel API,
which actually writes the buffers and returns any IO errors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/rdwr.c   |    3 ++-
 mkfs/xfs_mkfs.c |   16 ++++++++++------
 2 files changed, 12 insertions(+), 7 deletions(-)


diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 0d9d7202..2e9f66cc 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1491,9 +1491,10 @@ xfs_buf_delwri_submit(
 
 	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
 		list_del_init(&bp->b_list);
-		error2 = libxfs_writebuf(bp, 0);
+		error2 = libxfs_writebufr(bp);
 		if (!error)
 			error = error2;
+		libxfs_putbuf(bp);
 	}
 
 	return error;
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 5a042917..1f5d2105 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3685,6 +3685,7 @@ main(
 	};
 
 	struct list_head	buffer_list;
+	int			error;
 
 	platform_uuid_generate(&cli.uuid);
 	progname = basename(argv[0]);
@@ -3885,16 +3886,19 @@ main(
 		if (agno % 16)
 			continue;
 
-		if (libxfs_buf_delwri_submit(&buffer_list)) {
-			fprintf(stderr, _("%s: writing AG headers failed\n"),
-					progname);
+		error = -libxfs_buf_delwri_submit(&buffer_list);
+		if (error) {
+			fprintf(stderr,
+	_("%s: writing AG headers failed, err=%d\n"),
+					progname, error);
 			exit(1);
 		}
 	}
 
-	if (libxfs_buf_delwri_submit(&buffer_list)) {
-		fprintf(stderr, _("%s: writing AG headers failed\n"),
-				progname);
+	error = -libxfs_buf_delwri_submit(&buffer_list);
+	if (error) {
+		fprintf(stderr, _("%s: writing AG headers failed, err=%d\n"),
+				progname, error);
 		exit(1);
 	}
 


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] libxfs: complain when write IOs fail
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 1/7] libxfs: libxfs_buf_delwri_submit should write buffers immediately Darrick J. Wong
@ 2020-02-25  0:10 ` Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 3/7] libxfs: return flush failures Darrick J. Wong
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, Allison Collins, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

Complain whenever a metadata write fails.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/rdwr.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)


diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 2e9f66cc..8b47d438 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -1149,7 +1149,12 @@ libxfs_writebufr(xfs_buf_t *bp)
 			(long long)LIBXFS_BBTOOFF64(bp->b_bn),
 			(long long)bp->b_bn, bp, bp->b_error);
 #endif
-	if (!bp->b_error) {
+	if (bp->b_error) {
+		fprintf(stderr,
+	_("%s: write failed on %s bno 0x%llx/0x%x, err=%d\n"),
+			__func__, bp->b_ops->name,
+			(long long)bp->b_bn, bp->b_bcount, -bp->b_error);
+	} else {
 		bp->b_flags |= LIBXFS_B_UPTODATE;
 		bp->b_flags &= ~(LIBXFS_B_DIRTY | LIBXFS_B_EXIT |
 				 LIBXFS_B_UNCHECKED);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] libxfs: return flush failures
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 1/7] libxfs: libxfs_buf_delwri_submit should write buffers immediately Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 2/7] libxfs: complain when write IOs fail Darrick J. Wong
@ 2020-02-25  0:10 ` Darrick J. Wong
  2020-02-25  0:10 ` [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem Darrick J. Wong
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, Allison Collins, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

Modify platform_flush_device so that we can return error status when
device flushes fail.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libfrog/linux.c    |   25 +++++++++++++++++--------
 libfrog/platform.h |    2 +-
 2 files changed, 18 insertions(+), 9 deletions(-)


diff --git a/libfrog/linux.c b/libfrog/linux.c
index 41a168b4..60bc1dc4 100644
--- a/libfrog/linux.c
+++ b/libfrog/linux.c
@@ -140,20 +140,29 @@ platform_set_blocksize(int fd, char *path, dev_t device, int blocksize, int fata
 	return error;
 }
 
-void
-platform_flush_device(int fd, dev_t device)
+/*
+ * Flush dirty pagecache and disk write cache to stable media.  Returns 0 for
+ * success or -1 (with errno set) for failure.
+ */
+int
+platform_flush_device(
+	int		fd,
+	dev_t		device)
 {
 	struct stat	st;
+	int		ret;
+
 	if (major(device) == RAMDISK_MAJOR)
-		return;
+		return 0;
 
-	if (fstat(fd, &st) < 0)
-		return;
+	ret = fstat(fd, &st);
+	if (ret)
+		return ret;
 
 	if (S_ISREG(st.st_mode))
-		fsync(fd);
-	else
-		ioctl(fd, BLKFLSBUF, 0);
+		return fsync(fd);
+
+	return ioctl(fd, BLKFLSBUF, 0);
 }
 
 void
diff --git a/libfrog/platform.h b/libfrog/platform.h
index 76887e5e..0aef318a 100644
--- a/libfrog/platform.h
+++ b/libfrog/platform.h
@@ -12,7 +12,7 @@ int platform_check_ismounted(char *path, char *block, struct stat *sptr,
 int platform_check_iswritable(char *path, char *block, struct stat *sptr);
 int platform_set_blocksize(int fd, char *path, dev_t device, int bsz,
 		int fatal);
-void platform_flush_device(int fd, dev_t device);
+int platform_flush_device(int fd, dev_t device);
 char *platform_findrawpath(char *path);
 char *platform_findblockpath(char *path);
 int platform_direct_blockdev(void);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
                   ` (2 preceding siblings ...)
  2020-02-25  0:10 ` [PATCH 3/7] libxfs: return flush failures Darrick J. Wong
@ 2020-02-25  0:10 ` Darrick J. Wong
  2020-02-25 15:07   ` Brian Foster
  2020-02-25 17:37   ` Christoph Hellwig
  2020-02-25  0:10 ` [PATCH 5/7] mkfs: check that metadata updates have been committed Darrick J. Wong
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Teach libxfs_umount to flush all dirty buffers when unmounting the
filesystem, to log write verifier errors and IO errors, and to return an
error code when things go wrong.  Subsequent patches will teach critical
utilities to exit with EXIT_FAILURE when this happens.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 include/xfs_mount.h |    2 +
 libxfs/init.c       |   96 +++++++++++++++++++++++++++++++++++++++++++++++++--
 libxfs/libxfs_io.h  |    7 ++++
 libxfs/rdwr.c       |   38 ++++++++++++++++++--
 4 files changed, 135 insertions(+), 8 deletions(-)


diff --git a/include/xfs_mount.h b/include/xfs_mount.h
index 29b3cc1b..7bd23fbb 100644
--- a/include/xfs_mount.h
+++ b/include/xfs_mount.h
@@ -184,7 +184,7 @@ xfs_perag_resv(
 
 extern xfs_mount_t	*libxfs_mount (xfs_mount_t *, xfs_sb_t *,
 				dev_t, dev_t, dev_t, int);
-extern void	libxfs_umount (xfs_mount_t *);
+int		libxfs_umount(struct xfs_mount *mp);
 extern void	libxfs_rtmount_destroy (xfs_mount_t *);
 
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/libxfs/init.c b/libxfs/init.c
index a0d4b7f4..d4804ead 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -569,6 +569,8 @@ libxfs_buftarg_alloc(
 	}
 	btp->bt_mount = mp;
 	btp->dev = dev;
+	btp->flags = 0;
+
 	return btp;
 }
 
@@ -791,17 +793,104 @@ libxfs_rtmount_destroy(xfs_mount_t *mp)
 	mp->m_rsumip = mp->m_rbmip = NULL;
 }
 
+/* Flush a device and report on writes that didn't make it to stable storage. */
+static inline int
+libxfs_flush_buftarg(
+	struct xfs_buftarg	*btp,
+	const char		*buftarg_descr)
+{
+	int			error = 0;
+	int			err2;
+
+	/*
+	 * Write verifier failures are evidence of a buggy program.  Make sure
+	 * that this state is always reported to the caller.
+	 */
+	if (btp->flags & XFS_BUFTARG_CORRUPT_WRITE) {
+		fprintf(stderr,
+_("%s: Refusing to write a corrupt buffer to the %s!\n"),
+				progname, buftarg_descr);
+		error = -EFSCORRUPTED;
+	}
+
+	if (btp->flags & XFS_BUFTARG_LOST_WRITE) {
+		fprintf(stderr,
+_("%s: Lost a write to the %s!\n"),
+				progname, buftarg_descr);
+		if (!error)
+			error = -EIO;
+	}
+
+	err2 = libxfs_blkdev_issue_flush(btp);
+	if (err2) {
+		fprintf(stderr,
+_("%s: Flushing the %s failed, err=%d!\n"),
+				progname, buftarg_descr, -err2);
+	}
+	if (!error)
+		error = err2;
+
+	return error;
+}
+
+/*
+ * Flush all dirty buffers to stable storage and report on writes that didn't
+ * make it to stable storage.
+ */
+static int
+libxfs_flush_mount(
+	struct xfs_mount	*mp)
+{
+	int			error = 0;
+	int			err2;
+
+	/*
+	 * Purge the buffer cache to write all dirty buffers to disk and free
+	 * all incore buffers.  Buffers that fail write verification will cause
+	 * the CORRUPT_WRITE flag to be set in the buftarg.  Buffers that
+	 * cannot be written will cause the LOST_WRITE flag to be set in the
+	 * buftarg.
+	 */
+	libxfs_bcache_purge();
+
+	/* Flush all kernel and disk write caches, and report failures. */
+	if (mp->m_ddev_targp) {
+		err2 = libxfs_flush_buftarg(mp->m_ddev_targp, _("data device"));
+		if (!error)
+			error = err2;
+	}
+
+	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
+		err2 = libxfs_flush_buftarg(mp->m_logdev_targp,
+				_("log device"));
+		if (!error)
+			error = err2;
+	}
+
+	if (mp->m_rtdev_targp) {
+		err2 = libxfs_flush_buftarg(mp->m_rtdev_targp,
+				_("realtime device"));
+		if (!error)
+			error = err2;
+	}
+
+	return error;
+}
+
 /*
  * Release any resource obtained during a mount.
  */
-void
-libxfs_umount(xfs_mount_t *mp)
+int
+libxfs_umount(
+	struct xfs_mount	*mp)
 {
 	struct xfs_perag	*pag;
 	int			agno;
+	int			error;
 
 	libxfs_rtmount_destroy(mp);
-	libxfs_bcache_purge();
+
+	error = libxfs_flush_mount(mp);
 
 	for (agno = 0; agno < mp->m_maxagi; agno++) {
 		pag = radix_tree_delete(&mp->m_perag_tree, agno);
@@ -816,6 +905,7 @@ libxfs_umount(xfs_mount_t *mp)
 		kmem_free(mp->m_logdev_targp);
 	kmem_free(mp->m_ddev_targp);
 
+	return error;
 }
 
 /*
diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index 579df52b..6bb75a67 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -23,10 +23,17 @@ struct xfs_perag;
 struct xfs_buftarg {
 	struct xfs_mount	*bt_mount;
 	dev_t			dev;
+	unsigned int		flags;
 };
 
+/* We purged a dirty buffer and lost a write. */
+#define XFS_BUFTARG_LOST_WRITE		(1 << 0)
+/* A dirty buffer failed the write verifier. */
+#define XFS_BUFTARG_CORRUPT_WRITE	(1 << 1)
+
 extern void	libxfs_buftarg_init(struct xfs_mount *mp, dev_t ddev,
 				    dev_t logdev, dev_t rtdev);
+int libxfs_blkdev_issue_flush(struct xfs_buftarg *btp);
 
 #define LIBXFS_BBTOOFF64(bbs)	(((xfs_off_t)(bbs)) << BBSHIFT)
 
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 8b47d438..4253b890 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -17,6 +17,7 @@
 #include "xfs_inode_fork.h"
 #include "xfs_inode.h"
 #include "xfs_trans.h"
+#include "libfrog/platform.h"
 
 #include "libxfs.h"		/* for LIBXFS_EXIT_ON_FAILURE */
 
@@ -1219,6 +1220,19 @@ libxfs_iomove(xfs_buf_t *bp, uint boff, int len, void *data, int flags)
 	}
 }
 
+/* Complain about (and remember) dropping dirty buffers. */
+static void
+libxfs_whine_dirty_buf(
+	struct xfs_buf		*bp)
+{
+	fprintf(stderr, _("%s: Releasing dirty buffer to free list!\n"),
+			progname);
+
+	if (bp->b_error == -EFSCORRUPTED)
+		bp->b_target->flags |= XFS_BUFTARG_CORRUPT_WRITE;
+	bp->b_target->flags |= XFS_BUFTARG_LOST_WRITE;
+}
+
 static void
 libxfs_brelse(
 	struct cache_node	*node)
@@ -1228,8 +1242,7 @@ libxfs_brelse(
 	if (!bp)
 		return;
 	if (bp->b_flags & LIBXFS_B_DIRTY)
-		fprintf(stderr,
-			"releasing dirty buffer to free list!\n");
+		libxfs_whine_dirty_buf(bp);
 
 	pthread_mutex_lock(&xfs_buf_freelist.cm_mutex);
 	list_add(&bp->b_node.cn_mru, &xfs_buf_freelist.cm_list);
@@ -1249,8 +1262,7 @@ libxfs_bulkrelse(
 
 	list_for_each_entry(bp, list, b_node.cn_mru) {
 		if (bp->b_flags & LIBXFS_B_DIRTY)
-			fprintf(stderr,
-				"releasing dirty buffer (bulk) to free list!\n");
+			libxfs_whine_dirty_buf(bp);
 		count++;
 	}
 
@@ -1479,6 +1491,24 @@ libxfs_irele(
 	kmem_cache_free(xfs_inode_zone, ip);
 }
 
+/*
+ * Flush everything dirty in the kernel and disk write caches to stable media.
+ * Returns 0 for success or a negative error code.
+ */
+int
+libxfs_blkdev_issue_flush(
+	struct xfs_buftarg	*btp)
+{
+	int			fd, ret;
+
+	if (btp->dev == 0)
+		return 0;
+
+	fd = libxfs_device_to_fd(btp->dev);
+	ret = platform_flush_device(fd, btp->dev);
+	return ret ? -errno : 0;
+}
+
 /*
  * Write out a buffer list synchronously.
  *


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] mkfs: check that metadata updates have been committed
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
                   ` (3 preceding siblings ...)
  2020-02-25  0:10 ` [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem Darrick J. Wong
@ 2020-02-25  0:10 ` Darrick J. Wong
  2020-02-25 15:08   ` Brian Foster
  2020-02-25 17:37   ` Christoph Hellwig
  2020-02-25  0:11 ` [PATCH 6/7] xfs_repair: " Darrick J. Wong
  2020-02-25  0:11 ` [PATCH 7/7] libfrog: always fsync when flushing a device Darrick J. Wong
  6 siblings, 2 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:10 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure that all the metadata we wrote in the process of formatting
the filesystem have been written correctly, or exit with failure.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 mkfs/xfs_mkfs.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 1f5d2105..1038e604 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3940,13 +3940,16 @@ main(
 	(XFS_BUF_TO_SBP(buf))->sb_inprogress = 0;
 	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
 
-	libxfs_umount(mp);
+	/* Report failure if anything failed to get written to our new fs. */
+	error = -libxfs_umount(mp);
+	if (error)
+		exit(1);
+
 	if (xi.rtdev)
 		libxfs_device_close(xi.rtdev);
 	if (xi.logdev && xi.logdev != xi.ddev)
 		libxfs_device_close(xi.logdev);
 	libxfs_device_close(xi.ddev);
 	libxfs_destroy();
-
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] xfs_repair: check that metadata updates have been committed
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
                   ` (4 preceding siblings ...)
  2020-02-25  0:10 ` [PATCH 5/7] mkfs: check that metadata updates have been committed Darrick J. Wong
@ 2020-02-25  0:11 ` Darrick J. Wong
  2020-02-25 15:08   ` Brian Foster
  2020-02-25 19:24   ` [PATCH v2 " Darrick J. Wong
  2020-02-25  0:11 ` [PATCH 7/7] libfrog: always fsync when flushing a device Darrick J. Wong
  6 siblings, 2 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:11 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure that any metadata that we repaired or regenerated has been
written to disk.  If that fails, exit with 1 to signal that there are
still errors in the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 repair/xfs_repair.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)


diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index eb1ce546..ccb13f4a 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -703,6 +703,7 @@ main(int argc, char **argv)
 	struct xfs_sb	psb;
 	int		rval;
 	struct xfs_ino_geometry	*igeo;
+	int		error;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
 	 */
 	libxfs_bcache_flush();
 	format_log_max_lsn(mp);
-	libxfs_umount(mp);
+
+	/* Report failure if anything failed to get written to our fs. */
+	error = -libxfs_umount(mp);
+	if (error)
+		exit(1);
 
 	if (x.rtdev)
 		libxfs_device_close(x.rtdev);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] libfrog: always fsync when flushing a device
  2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
                   ` (5 preceding siblings ...)
  2020-02-25  0:11 ` [PATCH 6/7] xfs_repair: " Darrick J. Wong
@ 2020-02-25  0:11 ` Darrick J. Wong
  2020-02-25 17:38   ` Christoph Hellwig
  6 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25  0:11 UTC (permalink / raw)
  To: sandeen, darrick.wong; +Cc: linux-xfs, Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

Always call fsync() when we're flushing a device, even if it is a block
device.  It's probably redundant to call fsync /and/ BLKFLSBUF, but the
latter has odd behavior so we want to make sure the standard flush
methods have a chance to run first.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
 libfrog/linux.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)


diff --git a/libfrog/linux.c b/libfrog/linux.c
index 60bc1dc4..40a839d1 100644
--- a/libfrog/linux.c
+++ b/libfrog/linux.c
@@ -155,14 +155,18 @@ platform_flush_device(
 	if (major(device) == RAMDISK_MAJOR)
 		return 0;
 
+	ret = fsync(fd);
+	if (ret)
+		return ret;
+
 	ret = fstat(fd, &st);
 	if (ret)
 		return ret;
 
-	if (S_ISREG(st.st_mode))
-		return fsync(fd);
+	if (S_ISBLK(st.st_mode))
+		return ioctl(fd, BLKFLSBUF, 0);
 
-	return ioctl(fd, BLKFLSBUF, 0);
+	return 0;
 }
 
 void


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem
  2020-02-25  0:10 ` [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem Darrick J. Wong
@ 2020-02-25 15:07   ` Brian Foster
  2020-02-25 17:37   ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Brian Foster @ 2020-02-25 15:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Feb 24, 2020 at 04:10:53PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Teach libxfs_umount to flush all dirty buffers when unmounting the
> filesystem, to log write verifier errors and IO errors, and to return an
> error code when things go wrong.  Subsequent patches will teach critical
> utilities to exit with EXIT_FAILURE when this happens.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Looks reasonable to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  include/xfs_mount.h |    2 +
>  libxfs/init.c       |   96 +++++++++++++++++++++++++++++++++++++++++++++++++--
>  libxfs/libxfs_io.h  |    7 ++++
>  libxfs/rdwr.c       |   38 ++++++++++++++++++--
>  4 files changed, 135 insertions(+), 8 deletions(-)
> 
> 
> diff --git a/include/xfs_mount.h b/include/xfs_mount.h
> index 29b3cc1b..7bd23fbb 100644
> --- a/include/xfs_mount.h
> +++ b/include/xfs_mount.h
> @@ -184,7 +184,7 @@ xfs_perag_resv(
>  
>  extern xfs_mount_t	*libxfs_mount (xfs_mount_t *, xfs_sb_t *,
>  				dev_t, dev_t, dev_t, int);
> -extern void	libxfs_umount (xfs_mount_t *);
> +int		libxfs_umount(struct xfs_mount *mp);
>  extern void	libxfs_rtmount_destroy (xfs_mount_t *);
>  
>  #endif	/* __XFS_MOUNT_H__ */
> diff --git a/libxfs/init.c b/libxfs/init.c
> index a0d4b7f4..d4804ead 100644
> --- a/libxfs/init.c
> +++ b/libxfs/init.c
> @@ -569,6 +569,8 @@ libxfs_buftarg_alloc(
>  	}
>  	btp->bt_mount = mp;
>  	btp->dev = dev;
> +	btp->flags = 0;
> +
>  	return btp;
>  }
>  
> @@ -791,17 +793,104 @@ libxfs_rtmount_destroy(xfs_mount_t *mp)
>  	mp->m_rsumip = mp->m_rbmip = NULL;
>  }
>  
> +/* Flush a device and report on writes that didn't make it to stable storage. */
> +static inline int
> +libxfs_flush_buftarg(
> +	struct xfs_buftarg	*btp,
> +	const char		*buftarg_descr)
> +{
> +	int			error = 0;
> +	int			err2;
> +
> +	/*
> +	 * Write verifier failures are evidence of a buggy program.  Make sure
> +	 * that this state is always reported to the caller.
> +	 */
> +	if (btp->flags & XFS_BUFTARG_CORRUPT_WRITE) {
> +		fprintf(stderr,
> +_("%s: Refusing to write a corrupt buffer to the %s!\n"),
> +				progname, buftarg_descr);
> +		error = -EFSCORRUPTED;
> +	}
> +
> +	if (btp->flags & XFS_BUFTARG_LOST_WRITE) {
> +		fprintf(stderr,
> +_("%s: Lost a write to the %s!\n"),
> +				progname, buftarg_descr);
> +		if (!error)
> +			error = -EIO;
> +	}
> +
> +	err2 = libxfs_blkdev_issue_flush(btp);
> +	if (err2) {
> +		fprintf(stderr,
> +_("%s: Flushing the %s failed, err=%d!\n"),
> +				progname, buftarg_descr, -err2);
> +	}
> +	if (!error)
> +		error = err2;
> +
> +	return error;
> +}
> +
> +/*
> + * Flush all dirty buffers to stable storage and report on writes that didn't
> + * make it to stable storage.
> + */
> +static int
> +libxfs_flush_mount(
> +	struct xfs_mount	*mp)
> +{
> +	int			error = 0;
> +	int			err2;
> +
> +	/*
> +	 * Purge the buffer cache to write all dirty buffers to disk and free
> +	 * all incore buffers.  Buffers that fail write verification will cause
> +	 * the CORRUPT_WRITE flag to be set in the buftarg.  Buffers that
> +	 * cannot be written will cause the LOST_WRITE flag to be set in the
> +	 * buftarg.
> +	 */
> +	libxfs_bcache_purge();
> +
> +	/* Flush all kernel and disk write caches, and report failures. */
> +	if (mp->m_ddev_targp) {
> +		err2 = libxfs_flush_buftarg(mp->m_ddev_targp, _("data device"));
> +		if (!error)
> +			error = err2;
> +	}
> +
> +	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
> +		err2 = libxfs_flush_buftarg(mp->m_logdev_targp,
> +				_("log device"));
> +		if (!error)
> +			error = err2;
> +	}
> +
> +	if (mp->m_rtdev_targp) {
> +		err2 = libxfs_flush_buftarg(mp->m_rtdev_targp,
> +				_("realtime device"));
> +		if (!error)
> +			error = err2;
> +	}
> +
> +	return error;
> +}
> +
>  /*
>   * Release any resource obtained during a mount.
>   */
> -void
> -libxfs_umount(xfs_mount_t *mp)
> +int
> +libxfs_umount(
> +	struct xfs_mount	*mp)
>  {
>  	struct xfs_perag	*pag;
>  	int			agno;
> +	int			error;
>  
>  	libxfs_rtmount_destroy(mp);
> -	libxfs_bcache_purge();
> +
> +	error = libxfs_flush_mount(mp);
>  
>  	for (agno = 0; agno < mp->m_maxagi; agno++) {
>  		pag = radix_tree_delete(&mp->m_perag_tree, agno);
> @@ -816,6 +905,7 @@ libxfs_umount(xfs_mount_t *mp)
>  		kmem_free(mp->m_logdev_targp);
>  	kmem_free(mp->m_ddev_targp);
>  
> +	return error;
>  }
>  
>  /*
> diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
> index 579df52b..6bb75a67 100644
> --- a/libxfs/libxfs_io.h
> +++ b/libxfs/libxfs_io.h
> @@ -23,10 +23,17 @@ struct xfs_perag;
>  struct xfs_buftarg {
>  	struct xfs_mount	*bt_mount;
>  	dev_t			dev;
> +	unsigned int		flags;
>  };
>  
> +/* We purged a dirty buffer and lost a write. */
> +#define XFS_BUFTARG_LOST_WRITE		(1 << 0)
> +/* A dirty buffer failed the write verifier. */
> +#define XFS_BUFTARG_CORRUPT_WRITE	(1 << 1)
> +
>  extern void	libxfs_buftarg_init(struct xfs_mount *mp, dev_t ddev,
>  				    dev_t logdev, dev_t rtdev);
> +int libxfs_blkdev_issue_flush(struct xfs_buftarg *btp);
>  
>  #define LIBXFS_BBTOOFF64(bbs)	(((xfs_off_t)(bbs)) << BBSHIFT)
>  
> diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
> index 8b47d438..4253b890 100644
> --- a/libxfs/rdwr.c
> +++ b/libxfs/rdwr.c
> @@ -17,6 +17,7 @@
>  #include "xfs_inode_fork.h"
>  #include "xfs_inode.h"
>  #include "xfs_trans.h"
> +#include "libfrog/platform.h"
>  
>  #include "libxfs.h"		/* for LIBXFS_EXIT_ON_FAILURE */
>  
> @@ -1219,6 +1220,19 @@ libxfs_iomove(xfs_buf_t *bp, uint boff, int len, void *data, int flags)
>  	}
>  }
>  
> +/* Complain about (and remember) dropping dirty buffers. */
> +static void
> +libxfs_whine_dirty_buf(
> +	struct xfs_buf		*bp)
> +{
> +	fprintf(stderr, _("%s: Releasing dirty buffer to free list!\n"),
> +			progname);
> +
> +	if (bp->b_error == -EFSCORRUPTED)
> +		bp->b_target->flags |= XFS_BUFTARG_CORRUPT_WRITE;
> +	bp->b_target->flags |= XFS_BUFTARG_LOST_WRITE;
> +}
> +
>  static void
>  libxfs_brelse(
>  	struct cache_node	*node)
> @@ -1228,8 +1242,7 @@ libxfs_brelse(
>  	if (!bp)
>  		return;
>  	if (bp->b_flags & LIBXFS_B_DIRTY)
> -		fprintf(stderr,
> -			"releasing dirty buffer to free list!\n");
> +		libxfs_whine_dirty_buf(bp);
>  
>  	pthread_mutex_lock(&xfs_buf_freelist.cm_mutex);
>  	list_add(&bp->b_node.cn_mru, &xfs_buf_freelist.cm_list);
> @@ -1249,8 +1262,7 @@ libxfs_bulkrelse(
>  
>  	list_for_each_entry(bp, list, b_node.cn_mru) {
>  		if (bp->b_flags & LIBXFS_B_DIRTY)
> -			fprintf(stderr,
> -				"releasing dirty buffer (bulk) to free list!\n");
> +			libxfs_whine_dirty_buf(bp);
>  		count++;
>  	}
>  
> @@ -1479,6 +1491,24 @@ libxfs_irele(
>  	kmem_cache_free(xfs_inode_zone, ip);
>  }
>  
> +/*
> + * Flush everything dirty in the kernel and disk write caches to stable media.
> + * Returns 0 for success or a negative error code.
> + */
> +int
> +libxfs_blkdev_issue_flush(
> +	struct xfs_buftarg	*btp)
> +{
> +	int			fd, ret;
> +
> +	if (btp->dev == 0)
> +		return 0;
> +
> +	fd = libxfs_device_to_fd(btp->dev);
> +	ret = platform_flush_device(fd, btp->dev);
> +	return ret ? -errno : 0;
> +}
> +
>  /*
>   * Write out a buffer list synchronously.
>   *
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/7] mkfs: check that metadata updates have been committed
  2020-02-25  0:10 ` [PATCH 5/7] mkfs: check that metadata updates have been committed Darrick J. Wong
@ 2020-02-25 15:08   ` Brian Foster
  2020-02-25 17:37   ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Brian Foster @ 2020-02-25 15:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Feb 24, 2020 at 04:10:59PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Make sure that all the metadata we wrote in the process of formatting
> the filesystem have been written correctly, or exit with failure.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  mkfs/xfs_mkfs.c |    7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
> index 1f5d2105..1038e604 100644
> --- a/mkfs/xfs_mkfs.c
> +++ b/mkfs/xfs_mkfs.c
> @@ -3940,13 +3940,16 @@ main(
>  	(XFS_BUF_TO_SBP(buf))->sb_inprogress = 0;
>  	libxfs_writebuf(buf, LIBXFS_EXIT_ON_FAILURE);
>  
> -	libxfs_umount(mp);
> +	/* Report failure if anything failed to get written to our new fs. */
> +	error = -libxfs_umount(mp);
> +	if (error)
> +		exit(1);
> +
>  	if (xi.rtdev)
>  		libxfs_device_close(xi.rtdev);
>  	if (xi.logdev && xi.logdev != xi.ddev)
>  		libxfs_device_close(xi.logdev);
>  	libxfs_device_close(xi.ddev);
>  	libxfs_destroy();
> -
>  	return 0;
>  }
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] xfs_repair: check that metadata updates have been committed
  2020-02-25  0:11 ` [PATCH 6/7] xfs_repair: " Darrick J. Wong
@ 2020-02-25 15:08   ` Brian Foster
  2020-02-25 15:14     ` Darrick J. Wong
  2020-02-25 19:24   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 18+ messages in thread
From: Brian Foster @ 2020-02-25 15:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Make sure that any metadata that we repaired or regenerated has been
> written to disk.  If that fails, exit with 1 to signal that there are
> still errors in the filesystem.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  repair/xfs_repair.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> index eb1ce546..ccb13f4a 100644
> --- a/repair/xfs_repair.c
> +++ b/repair/xfs_repair.c
> @@ -703,6 +703,7 @@ main(int argc, char **argv)
>  	struct xfs_sb	psb;
>  	int		rval;
>  	struct xfs_ino_geometry	*igeo;
> +	int		error;
>  
>  	progname = basename(argv[0]);
>  	setlocale(LC_ALL, "");
> @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
>  	 */
>  	libxfs_bcache_flush();
>  	format_log_max_lsn(mp);
> -	libxfs_umount(mp);
> +
> +	/* Report failure if anything failed to get written to our fs. */
> +	error = -libxfs_umount(mp);
> +	if (error)
> +		exit(1);

I wonder a bit whether repair should really exit like this vs. report
the error as it does for most others, but I could go either way. I'll
defer to Eric:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  
>  	if (x.rtdev)
>  		libxfs_device_close(x.rtdev);
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] xfs_repair: check that metadata updates have been committed
  2020-02-25 15:08   ` Brian Foster
@ 2020-02-25 15:14     ` Darrick J. Wong
  2020-02-25 17:38       ` Christoph Hellwig
  0 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25 15:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: sandeen, linux-xfs

On Tue, Feb 25, 2020 at 10:08:17AM -0500, Brian Foster wrote:
> On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Make sure that any metadata that we repaired or regenerated has been
> > written to disk.  If that fails, exit with 1 to signal that there are
> > still errors in the filesystem.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  repair/xfs_repair.c |    7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > 
> > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > index eb1ce546..ccb13f4a 100644
> > --- a/repair/xfs_repair.c
> > +++ b/repair/xfs_repair.c
> > @@ -703,6 +703,7 @@ main(int argc, char **argv)
> >  	struct xfs_sb	psb;
> >  	int		rval;
> >  	struct xfs_ino_geometry	*igeo;
> > +	int		error;
> >  
> >  	progname = basename(argv[0]);
> >  	setlocale(LC_ALL, "");
> > @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
> >  	 */
> >  	libxfs_bcache_flush();
> >  	format_log_max_lsn(mp);
> > -	libxfs_umount(mp);
> > +
> > +	/* Report failure if anything failed to get written to our fs. */
> > +	error = -libxfs_umount(mp);
> > +	if (error)
> > +		exit(1);
> 
> I wonder a bit whether repair should really exit like this vs. report
> the error as it does for most others, but I could go either way. I'll
> defer to Eric:

I suppose I could do:

	error = -libxfs_umount();
	if (error)
		do_error(_("fs unmount failed (err=%d), re-run repair!\n"),
				error);

Though then you'd end up with:

	# xfs_repair /dev/fd0
	...
	Refusing to write corrupted metadata to the data device!
	fs unmount failed (err=117), re-run repair!
	# echo $?
	1

Which seems a little redundant.  But let's see what Eric thinks.

> Reviewed-by: Brian Foster <bfoster@redhat.com>

Anyway, thanks for reviewing!

--D

> >  
> >  	if (x.rtdev)
> >  		libxfs_device_close(x.rtdev);
> > 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem
  2020-02-25  0:10 ` [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem Darrick J. Wong
  2020-02-25 15:07   ` Brian Foster
@ 2020-02-25 17:37   ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-25 17:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Feb 24, 2020 at 04:10:53PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Teach libxfs_umount to flush all dirty buffers when unmounting the
> filesystem, to log write verifier errors and IO errors, and to return an
> error code when things go wrong.  Subsequent patches will teach critical
> utilities to exit with EXIT_FAILURE when this happens.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/7] mkfs: check that metadata updates have been committed
  2020-02-25  0:10 ` [PATCH 5/7] mkfs: check that metadata updates have been committed Darrick J. Wong
  2020-02-25 15:08   ` Brian Foster
@ 2020-02-25 17:37   ` Christoph Hellwig
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-25 17:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs

On Mon, Feb 24, 2020 at 04:10:59PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Make sure that all the metadata we wrote in the process of formatting
> the filesystem have been written correctly, or exit with failure.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] xfs_repair: check that metadata updates have been committed
  2020-02-25 15:14     ` Darrick J. Wong
@ 2020-02-25 17:38       ` Christoph Hellwig
  2020-02-29 22:37         ` Eric Sandeen
  0 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-25 17:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, sandeen, linux-xfs

On Tue, Feb 25, 2020 at 07:14:27AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 25, 2020 at 10:08:17AM -0500, Brian Foster wrote:
> > On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > Make sure that any metadata that we repaired or regenerated has been
> > > written to disk.  If that fails, exit with 1 to signal that there are
> > > still errors in the filesystem.
> > > 
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  repair/xfs_repair.c |    7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > 
> > > diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
> > > index eb1ce546..ccb13f4a 100644
> > > --- a/repair/xfs_repair.c
> > > +++ b/repair/xfs_repair.c
> > > @@ -703,6 +703,7 @@ main(int argc, char **argv)
> > >  	struct xfs_sb	psb;
> > >  	int		rval;
> > >  	struct xfs_ino_geometry	*igeo;
> > > +	int		error;
> > >  
> > >  	progname = basename(argv[0]);
> > >  	setlocale(LC_ALL, "");
> > > @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
> > >  	 */
> > >  	libxfs_bcache_flush();
> > >  	format_log_max_lsn(mp);
> > > -	libxfs_umount(mp);
> > > +
> > > +	/* Report failure if anything failed to get written to our fs. */
> > > +	error = -libxfs_umount(mp);
> > > +	if (error)
> > > +		exit(1);
> > 
> > I wonder a bit whether repair should really exit like this vs. report
> > the error as it does for most others, but I could go either way. I'll
> > defer to Eric:
> 
> I suppose I could do:
> 
> 	error = -libxfs_umount();
> 	if (error)
> 		do_error(_("fs unmount failed (err=%d), re-run repair!\n"),
> 				error);
> 
> Though then you'd end up with:
> 
> 	# xfs_repair /dev/fd0
> 	...
> 	Refusing to write corrupted metadata to the data device!
> 	fs unmount failed (err=117), re-run repair!
> 	# echo $?
> 	1
> 
> Which seems a little redundant.  But let's see what Eric thinks.

I think this message would be at last somewhat useful.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 7/7] libfrog: always fsync when flushing a device
  2020-02-25  0:11 ` [PATCH 7/7] libfrog: always fsync when flushing a device Darrick J. Wong
@ 2020-02-25 17:38   ` Christoph Hellwig
  0 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-02-25 17:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: sandeen, linux-xfs, Brian Foster

On Mon, Feb 24, 2020 at 04:11:11PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> Always call fsync() when we're flushing a device, even if it is a block
> device.  It's probably redundant to call fsync /and/ BLKFLSBUF, but the
> latter has odd behavior so we want to make sure the standard flush
> methods have a chance to run first.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 6/7] xfs_repair: check that metadata updates have been committed
  2020-02-25  0:11 ` [PATCH 6/7] xfs_repair: " Darrick J. Wong
  2020-02-25 15:08   ` Brian Foster
@ 2020-02-25 19:24   ` Darrick J. Wong
  1 sibling, 0 replies; 18+ messages in thread
From: Darrick J. Wong @ 2020-02-25 19:24 UTC (permalink / raw)
  To: sandeen; +Cc: linux-xfs, Brian Foster, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

Make sure that any metadata that we repaired or regenerated has been
written to disk.  If that fails, exit with 1 to signal that there are
still errors in the filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
v2: use the usual error reporting to log and exit
---
 repair/xfs_repair.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index eb1ce546..2ebf6a5b 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -703,6 +703,7 @@ main(int argc, char **argv)
 	struct xfs_sb	psb;
 	int		rval;
 	struct xfs_ino_geometry	*igeo;
+	int		error;
 
 	progname = basename(argv[0]);
 	setlocale(LC_ALL, "");
@@ -1104,7 +1105,13 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
 	 */
 	libxfs_bcache_flush();
 	format_log_max_lsn(mp);
-	libxfs_umount(mp);
+
+	/* Report failure if anything failed to get written to our fs. */
+	error = -libxfs_umount(mp);
+	if (error)
+		do_error(
+	_("File system metadata writeout failed, err=%d.  Re-run xfs_repair."),
+				error);
 
 	if (x.rtdev)
 		libxfs_device_close(x.rtdev);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] xfs_repair: check that metadata updates have been committed
  2020-02-25 17:38       ` Christoph Hellwig
@ 2020-02-29 22:37         ` Eric Sandeen
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Sandeen @ 2020-02-29 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On 2/25/20 9:38 AM, Christoph Hellwig wrote:
> On Tue, Feb 25, 2020 at 07:14:27AM -0800, Darrick J. Wong wrote:
>> On Tue, Feb 25, 2020 at 10:08:17AM -0500, Brian Foster wrote:
>>> On Mon, Feb 24, 2020 at 04:11:05PM -0800, Darrick J. Wong wrote:
>>>> From: Darrick J. Wong <darrick.wong@oracle.com>
>>>>
>>>> Make sure that any metadata that we repaired or regenerated has been
>>>> written to disk.  If that fails, exit with 1 to signal that there are
>>>> still errors in the filesystem.
>>>>
>>>> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
>>>> ---
>>>>  repair/xfs_repair.c |    7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>>
>>>> diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
>>>> index eb1ce546..ccb13f4a 100644
>>>> --- a/repair/xfs_repair.c
>>>> +++ b/repair/xfs_repair.c
>>>> @@ -703,6 +703,7 @@ main(int argc, char **argv)
>>>>  	struct xfs_sb	psb;
>>>>  	int		rval;
>>>>  	struct xfs_ino_geometry	*igeo;
>>>> +	int		error;
>>>>  
>>>>  	progname = basename(argv[0]);
>>>>  	setlocale(LC_ALL, "");
>>>> @@ -1104,7 +1105,11 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\
>>>>  	 */
>>>>  	libxfs_bcache_flush();
>>>>  	format_log_max_lsn(mp);
>>>> -	libxfs_umount(mp);
>>>> +
>>>> +	/* Report failure if anything failed to get written to our fs. */
>>>> +	error = -libxfs_umount(mp);
>>>> +	if (error)
>>>> +		exit(1);
>>>
>>> I wonder a bit whether repair should really exit like this vs. report
>>> the error as it does for most others, but I could go either way. I'll
>>> defer to Eric:
>>
>> I suppose I could do:
>>
>> 	error = -libxfs_umount();
>> 	if (error)
>> 		do_error(_("fs unmount failed (err=%d), re-run repair!\n"),
>> 				error);
>>
>> Though then you'd end up with:
>>
>> 	# xfs_repair /dev/fd0
>> 	...
>> 	Refusing to write corrupted metadata to the data device!
>> 	fs unmount failed (err=117), re-run repair!
>> 	# echo $?
>> 	1
>>
>> Which seems a little redundant.  But let's see what Eric thinks.
> 
> I think this message would be at last somewhat useful.

I also think this is good, thank you guys for working it out while I'm
half-traveling/half-working (as was hch but he's keeping up!)  :)

-Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-02-29 22:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-25  0:10 [PATCH v3 0/7] xfsprogs: actually check that writes succeeded Darrick J. Wong
2020-02-25  0:10 ` [PATCH 1/7] libxfs: libxfs_buf_delwri_submit should write buffers immediately Darrick J. Wong
2020-02-25  0:10 ` [PATCH 2/7] libxfs: complain when write IOs fail Darrick J. Wong
2020-02-25  0:10 ` [PATCH 3/7] libxfs: return flush failures Darrick J. Wong
2020-02-25  0:10 ` [PATCH 4/7] libxfs: flush all dirty buffers and report errors when unmounting filesystem Darrick J. Wong
2020-02-25 15:07   ` Brian Foster
2020-02-25 17:37   ` Christoph Hellwig
2020-02-25  0:10 ` [PATCH 5/7] mkfs: check that metadata updates have been committed Darrick J. Wong
2020-02-25 15:08   ` Brian Foster
2020-02-25 17:37   ` Christoph Hellwig
2020-02-25  0:11 ` [PATCH 6/7] xfs_repair: " Darrick J. Wong
2020-02-25 15:08   ` Brian Foster
2020-02-25 15:14     ` Darrick J. Wong
2020-02-25 17:38       ` Christoph Hellwig
2020-02-29 22:37         ` Eric Sandeen
2020-02-25 19:24   ` [PATCH v2 " Darrick J. Wong
2020-02-25  0:11 ` [PATCH 7/7] libfrog: always fsync when flushing a device Darrick J. Wong
2020-02-25 17:38   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.