All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
@ 2017-05-31 12:45 Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 01/17] lib: add errseq_t type and infrastructure for handling it Jeff Layton
                   ` (18 more replies)
  0 siblings, 19 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

v5: don't retrofit old API over the new infrastructure
    add fstype flag to indicate how wb errors are tracked within that fs
    add more function variants that take a errseq_t "since" value
    add second errseq_t to struct file to track metadata wb errors
    convert ext4 and ext2 to use the new APIs

v4: several more cleanup patches
    documentation and kerneldoc comment updates
    fix bugs in gfs2 patches
    make sync_file_range use same error reporting semantics
    bugfixes in buffer.c
    convert nfs to new scheme (maybe bogus, can be dropped)

v3: wb_err_t -> errseq_t conversion
    clean up places that re-set errors after calling filemap_* functions

v2: introduce wb_err_t, use atomics

This is v5 of the patchset to improve how we're tracking and reporting
errors that occur during pagecache writeback. The main difference in
this set from the last one is that I've stopped trying to retrofit the
old error tracking API on top of the new one. This is more work since
we'll have to touch each fs individually, but should be safer as the
"since" values used for checking errors will be more deliberate.

There are several situations where the kernel can "lose" errors that
occur during writeback, such that fsync will return success even
though it failed to write back some data previously. The basic idea
here is to have the kernel be more deliberate about the point from
which errors are checked to ensure that that doesn't happen.

An additional aim of this set is to change the behavior of fsync in
Linux to report writeback errors on all fds instead of just the first
one. This allows writers to reliably tell whether their data made it to
the backing device without having to coordinate fsync calls with other
writers.

To do this, we add a new typedef: errseq_t. This is a 32-bit value
that can store an error code, and a sequence number so we can tell
whether it has changed since we last sampled it. This allows us to
record errors in the address_space and then report those errors only
once per file description.

This set just alters block device files, ext4 and the legacy ext2
driver. If this general approach seems acceptable, then I'll start
converting other filesystems in follow-on patchsets. I'd also like
to get this into linux-next as soon as possible to ensure that we're
banging out any bugs that might be lurking here.

I also have a couple of xfstests for this as well that I'll re-post
soon.

Jeff Layton (17):
  lib: add errseq_t type and infrastructure for handling it
  fs: new infrastructure for writeback error handling and reporting
  mm: tracepoints for writeback error events
  fs: add a new fstype flag to indicate how writeback errors are tracked
  Documentation: flesh out the section in vfs.txt on storing and
    reporting writeback errors
  fs: adapt sync_file_range to new reporting infrastructure
  mm: add filemap_fdatawait_range_since and
    filemap_write_and_wait_range_since
  dax: set errors in mapping when writeback fails
  block: convert to errseq_t based writeback error tracking
  block: add sync_blockdev_since and sync_filesystem_since
  fs: add f_md_wb_err field to struct file for tracking metadata errors
  fs: allow __generic_file_fsync to support both flavors of error
    reporting
  jbd2: conditionally handle errors using errseq_t based on FS_WB_ERRSEQ
    flag
  ext4: convert to errseq_t based error tracking
  fs: add a write_one_page_since
  ext2: convert to errseq_t based writeback error tracking
  fs: convert ext2 to use write_one_page_since

 Documentation/filesystems/vfs.txt |  50 ++++++++-
 drivers/dax/device.c              |   1 +
 fs/block_dev.c                    |  29 +++++-
 fs/dax.c                          |  18 +++-
 fs/ext2/dir.c                     |  25 +++--
 fs/ext2/file.c                    |  29 ++++--
 fs/ext2/super.c                   |   2 +-
 fs/ext4/dir.c                     |   8 +-
 fs/ext4/ext4.h                    |   8 +-
 fs/ext4/extents.c                 |  24 +++--
 fs/ext4/file.c                    |   5 +-
 fs/ext4/fsync.c                   |  23 ++++-
 fs/ext4/inode.c                   |  19 ++--
 fs/ext4/ioctl.c                   |   9 +-
 fs/ext4/super.c                   |   9 +-
 fs/file_table.c                   |   1 +
 fs/internal.h                     |   8 ++
 fs/jbd2/commit.c                  |  29 ++++--
 fs/jbd2/recovery.c                |   5 +-
 fs/jbd2/transaction.c             |   1 +
 fs/libfs.c                        |  26 +++--
 fs/open.c                         |   3 +
 fs/sync.c                         |  62 +++++++++++-
 include/linux/errseq.h            |  19 ++++
 include/linux/fs.h                |  82 ++++++++++++++-
 include/linux/jbd2.h              |   3 +
 include/linux/mm.h                |   2 +
 include/linux/pagemap.h           |  32 ++++--
 include/trace/events/filemap.h    |  52 ++++++++++
 lib/Makefile                      |   2 +-
 lib/errseq.c                      | 208 ++++++++++++++++++++++++++++++++++++++
 mm/filemap.c                      | 145 ++++++++++++++++++++++++++
 mm/page-writeback.c               |  53 +++++++---
 33 files changed, 892 insertions(+), 100 deletions(-)
 create mode 100644 include/linux/errseq.h
 create mode 100644 lib/errseq.c

-- 
2.9.4

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v5 01/17] lib: add errseq_t type and infrastructure for handling it
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 02/17] fs: new infrastructure for writeback error handling and reporting Jeff Layton
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether an error has been set again
since a previous time.

It's implemented as an unsigned 32-bit value that is managed with atomic
operations. The low order bits are designated to hold an error code
(max size of MAX_ERRNO). The upper bits are used as a counter.

The API works with consumers sampling an errseq_t value at a particular
point in time. Later, that value can be used to tell whether new errors
have been set since that time.

Note that there is a 1 in 512k risk of collisions here if new errors
are being recorded frequently, since we have so few bits to use as a
counter. To mitigate this, one bit is used as a flag to tell whether the
value has been sampled since a new value was recorded. That allows
us to avoid bumping the counter if no one has sampled it since it
was last bumped.

Later patches will build on this infrastructure to change how writeback
errors are tracked in the kernel.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.com>
---
 include/linux/errseq.h |  19 +++++
 lib/Makefile           |   2 +-
 lib/errseq.c           | 200 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/errseq.h
 create mode 100644 lib/errseq.c

diff --git a/include/linux/errseq.h b/include/linux/errseq.h
new file mode 100644
index 000000000000..0d2555f310cd
--- /dev/null
+++ b/include/linux/errseq.h
@@ -0,0 +1,19 @@
+#ifndef _LINUX_ERRSEQ_H
+#define _LINUX_ERRSEQ_H
+
+/* See lib/errseq.c for more info */
+
+typedef u32	errseq_t;
+
+void __errseq_set(errseq_t *eseq, int err);
+static inline void errseq_set(errseq_t *eseq, int err)
+{
+	/* Optimize for the common case of no error */
+	if (unlikely(err))
+		__errseq_set(eseq, err);
+}
+
+errseq_t errseq_sample(errseq_t *eseq);
+int errseq_check(errseq_t *eseq, errseq_t since);
+int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 0166fbc0fa81..519782d9ca3f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,7 +41,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-	 once.o refcount.o usercopy.o
+	 once.o refcount.o usercopy.o errseq.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
diff --git a/lib/errseq.c b/lib/errseq.c
new file mode 100644
index 000000000000..d129c0611c1f
--- /dev/null
+++ b/lib/errseq.c
@@ -0,0 +1,200 @@
+#include <linux/err.h>
+#include <linux/bug.h>
+#include <linux/atomic.h>
+#include <linux/errseq.h>
+
+/*
+ * An errseq_t is a way of recording errors in one place, and allowing any
+ * number of "subscribers" to tell whether it has changed since a previous
+ * point where it was sampled.
+ *
+ * It's implemented as an unsigned 32-bit value. The low order bits are
+ * designated to hold an error code (between 0 and -MAX_ERRNO). The upper bits
+ * are used as a counter. This is done with atomics instead of locking so that
+ * these functions can be called from any context.
+ *
+ * The general idea is for consumers to sample an errseq_t value. That value
+ * can later be used to tell whether any new errors have occurred since that
+ * sampling was done.
+ *
+ * Note that there is a risk of collisions if new errors are being recorded
+ * frequently, since we have so few bits to use as a counter.
+ *
+ * To mitigate this, one bit is used as a flag to tell whether the value has
+ * been sampled since a new value was recorded. That allows us to avoid bumping
+ * the counter if no one has sampled it since the last time an error was
+ * recorded.
+ *
+ * A new errseq_t should always be zeroed out.  A errseq_t value of all zeroes
+ * is the special (but common) case where there has never been an error. An all
+ * zero value thus serves as the "epoch" if one wishes to know whether there
+ * has ever been an error set since it was first initialized.
+ */
+
+/* The low bits are designated for error code (max of MAX_ERRNO) */
+#define ERRSEQ_SHIFT		ilog2(MAX_ERRNO + 1)
+
+/* This bit is used as a flag to indicate whether the value has been seen */
+#define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)
+
+/* The lowest bit of the counter */
+#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
+
+/**
+ * __errseq_set - set a errseq_t for later reporting
+ * @eseq: errseq_t field that should be set
+ * @err: error to set
+ *
+ * This function sets the error in *eseq, and increments the sequence counter
+ * if the last sequence was sampled at some point in the past.
+ *
+ * Any error set will always overwrite an existing error.
+ *
+ * Most callers will want to use the errseq_set inline wrapper to efficiently
+ * handle the common case where err is 0.
+ */
+void __errseq_set(errseq_t *eseq, int err)
+{
+	errseq_t old;
+
+	/* MAX_ERRNO must be able to serve as a mask */
+	BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);
+
+	/*
+	 * Ensure the error code actually fits where we want it to go. If it
+	 * doesn't then just throw a warning and don't record anything. We
+	 * also don't accept zero here as that would effectively clear a
+	 * previous error.
+	 */
+	if (WARN(unlikely(err == 0 || (unsigned int)-err > MAX_ERRNO),
+				"err = %d\n", err))
+		return;
+
+	old = READ_ONCE(*eseq);
+	for (;;) {
+		errseq_t new, cur;
+
+		/* Clear out error bits and set new error */
+		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
+
+		/* Only increment if someone has looked at it */
+		if (old & ERRSEQ_SEEN)
+			new += ERRSEQ_CTR_INC;
+
+		/* If there would be no change, then call it done */
+		if (new == old)
+			break;
+
+		/* Try to swap the new value into place */
+		cur = cmpxchg(eseq, old, new);
+
+		/*
+		 * Call it success if we did the swap or someone else beat us
+		 * to it for the same value.
+		 */
+		if (likely(cur == old || cur == new))
+			break;
+
+		/* Raced with an update, try again */
+		old = cur;
+	}
+}
+EXPORT_SYMBOL(__errseq_set);
+
+/**
+ * errseq_sample - grab current errseq_t value
+ * @eseq: pointer to errseq_t to be sampled
+ *
+ * This function allows callers to sample an errseq_t value, marking it as
+ * "seen" if required.
+ */
+errseq_t errseq_sample(errseq_t *eseq)
+{
+	errseq_t old = READ_ONCE(*eseq);
+	errseq_t new = old;
+
+	/*
+	 * For the common case of no errors ever having been set, we can skip
+	 * marking the SEEN bit. Once an error has been set, the value will
+	 * never go back to zero.
+	 */
+	if (old != 0) {
+		new |= ERRSEQ_SEEN;
+		if (old != new)
+			cmpxchg(eseq, old, new);
+	}
+	return new;
+}
+EXPORT_SYMBOL(errseq_sample);
+
+/**
+ * errseq_check - has an error occurred since a particular sample point?
+ * @eseq: pointer to errseq_t value to be checked
+ * @since: previously-sampled errseq_t from which to check
+ *
+ * Grab the value that eseq points to, and see if it has changed "since"
+ * the given value was sampled. The "since" value is not advanced, so there
+ * is no need to mark the value as seen.
+ *
+ * Returns the latest error set in the errseq_t or 0 if it hasn't changed.
+ */
+int errseq_check(errseq_t *eseq, errseq_t since)
+{
+	errseq_t cur = READ_ONCE(*eseq);
+
+	if (likely(cur == since))
+		return 0;
+	return -(cur & MAX_ERRNO);
+}
+EXPORT_SYMBOL(errseq_check);
+
+/**
+ * errseq_check_and_advance - check an errseq_t and advance it to the current value
+ * @eseq: pointer to value being checked and reported
+ * @since: pointer to previously-sampled errseq_t to check against and advance
+ *
+ * Grab the eseq value, and see whether it matches the value that "since"
+ * points to. If it does, then just return 0.
+ *
+ * If it doesn't, then the value has changed. Set the "seen" flag, and try to
+ * swap it into place as the new eseq value. Then, set that value as the new
+ * "since" value, and return whatever the error portion is set to.
+ *
+ * Note that no locking is provided here for concurrent updates to the "since"
+ * value. The caller must provide that if necessary. Because of this, callers
+ * may want to do a lockless errseq_check before taking the lock and calling
+ * this.
+ */
+int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
+{
+	int err = 0;
+	errseq_t old, new;
+
+	/*
+	 * Most callers will want to use the inline wrapper to check this,
+	 * so that the common case of no error is handled without needing
+	 * to take the lock that protects the "since" value.
+	 */
+	old = READ_ONCE(*eseq);
+	if (old != *since) {
+		/*
+		 * Set the flag and try to swap it into place if it has
+		 * changed.
+		 *
+		 * We don't care about the outcome of the swap here. If the
+		 * swap doesn't occur, then it has either been updated by a
+		 * writer who is altering the value in some way (updating
+		 * counter or resetting the error), or another reader who is
+		 * just setting the "seen" flag. Either outcome is OK, and we
+		 * can advance "since" and return an error based on what we
+		 * have.
+		 */
+		new = old | ERRSEQ_SEEN;
+		if (new != old)
+			cmpxchg(eseq, old, new);
+		*since = new;
+		err = -(new & MAX_ERRNO);
+	}
+	return err;
+}
+EXPORT_SYMBOL(errseq_check_and_advance);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 02/17] fs: new infrastructure for writeback error handling and reporting
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 01/17] lib: add errseq_t type and infrastructure for handling it Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 03/17] mm: tracepoints for writeback error events Jeff Layton
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Most filesystems currently use mapping_set_error and
filemap_check_errors for setting and reporting/clearing writeback errors
at the mapping level. filemap_check_errors is indirectly called from
most of the filemap_fdatawait_* functions and from
filemap_write_and_wait*. These functions are called from all sorts of
contexts to wait on writeback to finish -- e.g. mostly in fsync, but
also in truncate calls, getattr, etc.

The non-fsync callers are problematic. We should be reporting writeback
errors during fsync, but many places spread over the tree clear out
errors before they can be properly reported, or report errors at
nonsensical times.

If I get -EIO on a stat() call, there is no reason for me to assume that
it is because some previous writeback failed. The fact that it also
clears out the error such that a subsequent fsync returns 0 is a bug,
and a nasty one since that's potentially silent data corruption.

This patch adds a small bit of new infrastructure for setting and
reporting errors during address_space writeback. While the above was my
original impetus for adding this, I think it's also the case that
current fsync semantics are just problematic for userland. Most
applications that call fsync do so to ensure that the data they wrote
has hit the backing store.

In the case where there are multiple writers to the file at the same
time, this is really hard to determine. The first one to call fsync will
see any stored error, and the rest get back 0. The processes with open
fds may not be associated with one another in any way. They could even
be in different containers, so ensuring coordination between all fsync
callers is not really an option.

One way to remedy this would be to track what file descriptor was used
to dirty the file, but that's rather cumbersome and would likely be
slow. However, there is a simpler way to improve the semantics here
without incurring too much overhead.

This set adds an errseq_t to struct address_space, and a corresponding
one is added to struct file. Writeback errors are recorded in the
mapping's errseq_t, and the one in struct file is used as the "since"
value.

This changes the semantics of the Linux fsync implementation such that
applications can now use it to determine whether there were any
writeback errors since fsync(fd) was last called (or since the file was
opened in the case of fsync having never been called).

Note that those writeback errors may have occurred when writing data
that was dirtied via an entirely different fd, but that's the case now
with the current mapping_set_error/filemap_check_error infrastructure.
This will at least prevent you from getting a false report of success.

The new behavior is still consistent with the POSIX spec, and is more
reliable for application developers. This patch just adds some basic
infrastructure for doing this, and ensures that the f_wb_err "cursor"
is properly set when a file is opened. Later patches will change the
existing code to use this new infrastructure.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 drivers/dax/device.c |  1 +
 fs/block_dev.c       |  1 +
 fs/file_table.c      |  1 +
 fs/open.c            |  3 +++
 include/linux/fs.h   | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/filemap.c         | 38 +++++++++++++++++++++++++++++++++++++
 6 files changed, 97 insertions(+)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 006e657dfcb9..12943d19bfc4 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -499,6 +499,7 @@ static int dax_open(struct inode *inode, struct file *filp)
 	inode->i_mapping = __dax_inode->i_mapping;
 	inode->i_mapping->host = __dax_inode;
 	filp->f_mapping = inode->i_mapping;
+	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
 	filp->private_data = dev_dax;
 	inode->i_flags = S_DAX;
 
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 519599dddd36..4d62fe771587 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1743,6 +1743,7 @@ static int blkdev_open(struct inode * inode, struct file * filp)
 		return -ENOMEM;
 
 	filp->f_mapping = bdev->bd_inode->i_mapping;
+	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
 
 	return blkdev_get(bdev, filp->f_mode, filp);
 }
diff --git a/fs/file_table.c b/fs/file_table.c
index 954d510b765a..72e861a35a7f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -168,6 +168,7 @@ struct file *alloc_file(const struct path *path, fmode_t mode,
 	file->f_path = *path;
 	file->f_inode = path->dentry->d_inode;
 	file->f_mapping = path->dentry->d_inode->i_mapping;
+	file->f_wb_err = filemap_sample_wb_err(file->f_mapping);
 	if ((mode & FMODE_READ) &&
 	     likely(fop->read || fop->read_iter))
 		mode |= FMODE_CAN_READ;
diff --git a/fs/open.c b/fs/open.c
index cd0c5be8d012..280d4a963791 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -707,6 +707,9 @@ static int do_dentry_open(struct file *f,
 	f->f_inode = inode;
 	f->f_mapping = inode->i_mapping;
 
+	/* Ensure that we skip any errors that predate opening of the file */
+	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
+
 	if (unlikely(f->f_flags & O_PATH)) {
 		f->f_mode = FMODE_PATH;
 		f->f_op = &empty_fops;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 26f0c64cd4fb..24178107379d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -30,6 +30,7 @@
 #include <linux/percpu-rwsem.h>
 #include <linux/workqueue.h>
 #include <linux/delayed_call.h>
+#include <linux/errseq.h>
 
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
@@ -392,6 +393,7 @@ struct address_space {
 	gfp_t			gfp_mask;	/* implicit gfp mask for allocations */
 	struct list_head	private_list;	/* ditto */
 	void			*private_data;	/* ditto */
+	errseq_t		wb_err;
 } __attribute__((aligned(sizeof(long))));
 	/*
 	 * On most architectures that alignment is already the case; but
@@ -846,6 +848,7 @@ struct file {
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
+	errseq_t		f_wb_err;
 	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
@@ -2524,6 +2527,56 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
 
+extern int __must_check filemap_report_wb_err(struct file *file);
+
+/**
+ * filemap_set_wb_err - set a writeback error on an address_space
+ * @mapping: mapping in which to set writeback error
+ * @err: error to be set in mapping
+ *
+ * When writeback fails in some way, we must record that error so that
+ * userspace can be informed when fsync and the like are called.  We endeavor
+ * to report errors on any file that was open at the time of the error.  Some
+ * internal callers also need to know when writeback errors have occurred.
+ *
+ * When a writeback error occurs, most filesystems will want to call
+ * filemap_set_wb_err to record the error in the mapping so that it will be
+ * automatically reported whenever fsync is called on the file.
+ *
+ * FIXME: mention FS_* flag here?
+ */
+static inline void filemap_set_wb_err(struct address_space *mapping, int err)
+{
+	errseq_set(&mapping->wb_err, err);
+}
+
+/**
+ * filemap_check_wb_error - has an error occurred since the mark was sampled?
+ * @mapping: mapping to check for writeback errors
+ * @since: previously-sampled errseq_t
+ *
+ * Grab the errseq_t value from the mapping, and see if it has changed "since"
+ * the given value was sampled.
+ *
+ * If it has then report the latest error set, otherwise return 0.
+ */
+static inline int filemap_check_wb_err(struct address_space *mapping, errseq_t since)
+{
+	return errseq_check(&mapping->wb_err, since);
+}
+
+/**
+ * filemap_sample_wb_err - sample the current errseq_t to test for later errors
+ * @mapping: mapping to be sampled
+ *
+ * Writeback errors are always reported relative to a particular sample point
+ * in the past. This function provides those sample points.
+ */
+static inline errseq_t filemap_sample_wb_err(struct address_space *mapping)
+{
+	return errseq_sample(&mapping->wb_err);
+}
+
 extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end,
 			   int datasync);
 extern int vfs_fsync(struct file *file, int datasync);
diff --git a/mm/filemap.c b/mm/filemap.c
index 1de71753de28..aeb58db10688 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -536,6 +536,44 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 EXPORT_SYMBOL(filemap_write_and_wait_range);
 
 /**
+ * filemap_report_wb_err - report wb error (if any) that was previously set
+ * @file: struct file on which the error is being reported
+ *
+ * When userland calls fsync (or something like nfsd does the equivalent), we
+ * want to report any writeback errors that occurred since the last fsync (or
+ * since the file was opened if there haven't been any).
+ *
+ * Grab the wb_err from the mapping. If it matches what we have in the file,
+ * then just quickly return 0. The file is all caught up.
+ *
+ * If it doesn't match, then take the mapping value, set the "seen" flag in
+ * it and try to swap it into place. If it works, or another task beat us
+ * to it with the new value, then update the f_wb_err and return the error
+ * portion. The error at this point must be reported via proper channels
+ * (a'la fsync, or NFS COMMIT operation, etc.).
+ *
+ * While we handle mapping->wb_err with atomic operations, the f_wb_err
+ * value is protected by the f_lock since we must ensure that it reflects
+ * the latest value swapped in for this file descriptor.
+ */
+int filemap_report_wb_err(struct file *file)
+{
+	int err = 0;
+	struct address_space *mapping = file->f_mapping;
+
+	/* Locklessly handle the common case where nothing has changed */
+	if (errseq_check(&mapping->wb_err, READ_ONCE(file->f_wb_err))) {
+		/* Something changed, must use slow path */
+		spin_lock(&file->f_lock);
+		err = errseq_check_and_advance(&mapping->wb_err,
+						&file->f_wb_err);
+		spin_unlock(&file->f_lock);
+	}
+	return err;
+}
+EXPORT_SYMBOL(filemap_report_wb_err);
+
+/**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced
  * @new:	page to replace with
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 03/17] mm: tracepoints for writeback error events
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 01/17] lib: add errseq_t type and infrastructure for handling it Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 02/17] fs: new infrastructure for writeback error handling and reporting Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 04/17] fs: add a new fstype flag to indicate how writeback errors are tracked Jeff Layton
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

To enable that, make __errseq_set return the value that it was set to
we exit the loop. Take heed that that value is not suitable as a later
"since" value, as it will not have been marked seen.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/errseq.h         |  2 +-
 include/linux/fs.h             |  5 +++-
 include/trace/events/filemap.h | 55 ++++++++++++++++++++++++++++++++++++++++++
 lib/errseq.c                   | 20 ++++++++++-----
 mm/filemap.c                   | 13 +++++++++-
 5 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index 0d2555f310cd..9e0d444ac88d 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -5,7 +5,7 @@
 
 typedef u32	errseq_t;
 
-void __errseq_set(errseq_t *eseq, int err);
+errseq_t __errseq_set(errseq_t *eseq, int err);
 static inline void errseq_set(errseq_t *eseq, int err)
 {
 	/* Optimize for the common case of no error */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 24178107379d..293cbc7f3520 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2528,6 +2528,7 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_check_errors(struct address_space *mapping);
 
 extern int __must_check filemap_report_wb_err(struct file *file);
+extern void __filemap_set_wb_err(struct address_space *mapping, int err);
 
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
@@ -2547,7 +2548,9 @@ extern int __must_check filemap_report_wb_err(struct file *file);
  */
 static inline void filemap_set_wb_err(struct address_space *mapping, int err)
 {
-	errseq_set(&mapping->wb_err, err);
+	/* Fastpath for common case of no error */
+	if (unlikely(err))
+		__filemap_set_wb_err(mapping, err);
 }
 
 /**
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 42febb6bc1d5..2af66920f267 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -10,6 +10,7 @@
 #include <linux/memcontrol.h>
 #include <linux/device.h>
 #include <linux/kdev_t.h>
+#include <linux/errseq.h>
 
 DECLARE_EVENT_CLASS(mm_filemap_op_page_cache,
 
@@ -52,6 +53,60 @@ DEFINE_EVENT(mm_filemap_op_page_cache, mm_filemap_add_to_page_cache,
 	TP_ARGS(page)
 	);
 
+TRACE_EVENT(filemap_set_wb_err,
+		TP_PROTO(struct address_space *mapping, errseq_t eseq),
+
+		TP_ARGS(mapping, eseq),
+
+		TP_STRUCT__entry(
+			__field(unsigned long, i_ino)
+			__field(dev_t, s_dev)
+			__field(errseq_t, errseq)
+		),
+
+		TP_fast_assign(
+			__entry->i_ino = mapping->host->i_ino;
+			__entry->errseq = eseq;
+			if (mapping->host->i_sb)
+				__entry->s_dev = mapping->host->i_sb->s_dev;
+			else
+				__entry->s_dev = mapping->host->i_rdev;
+		),
+
+		TP_printk("dev=%d:%d ino=0x%lx errseq=0x%x",
+			MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
+			__entry->i_ino, __entry->errseq)
+);
+
+TRACE_EVENT(filemap_report_wb_err,
+		TP_PROTO(struct file *file, errseq_t old),
+
+		TP_ARGS(file, old),
+
+		TP_STRUCT__entry(
+			__field(struct file *, file);
+			__field(unsigned long, i_ino)
+			__field(dev_t, s_dev)
+			__field(errseq_t, old)
+			__field(errseq_t, new)
+		),
+
+		TP_fast_assign(
+			__entry->file = file;
+			__entry->i_ino = file->f_mapping->host->i_ino;
+			if (file->f_mapping->host->i_sb)
+				__entry->s_dev = file->f_mapping->host->i_sb->s_dev;
+			else
+				__entry->s_dev = file->f_mapping->host->i_rdev;
+			__entry->old = old;
+			__entry->new = file->f_wb_err;
+		),
+
+		TP_printk("file=%p dev=%d:%d ino=0x%lx old=0x%x new=0x%x",
+			__entry->file, MAJOR(__entry->s_dev),
+			MINOR(__entry->s_dev), __entry->i_ino, __entry->old,
+			__entry->new)
+);
 #endif /* _TRACE_FILEMAP_H */
 
 /* This part must be outside protection */
diff --git a/lib/errseq.c b/lib/errseq.c
index d129c0611c1f..009972d3000c 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -52,10 +52,14 @@
  *
  * Most callers will want to use the errseq_set inline wrapper to efficiently
  * handle the common case where err is 0.
+ *
+ * We do return an errseq_t here, primarily for debugging purposes. The return
+ * value should not be used as a previously sampled value in later calls as it
+ * will not have the SEEN flag set.
  */
-void __errseq_set(errseq_t *eseq, int err)
+errseq_t __errseq_set(errseq_t *eseq, int err)
 {
-	errseq_t old;
+	errseq_t cur, old;
 
 	/* MAX_ERRNO must be able to serve as a mask */
 	BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);
@@ -66,13 +70,14 @@ void __errseq_set(errseq_t *eseq, int err)
 	 * also don't accept zero here as that would effectively clear a
 	 * previous error.
 	 */
+	old = READ_ONCE(*eseq);
+
 	if (WARN(unlikely(err == 0 || (unsigned int)-err > MAX_ERRNO),
 				"err = %d\n", err))
-		return;
+		return old;
 
-	old = READ_ONCE(*eseq);
 	for (;;) {
-		errseq_t new, cur;
+		errseq_t new;
 
 		/* Clear out error bits and set new error */
 		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
@@ -82,8 +87,10 @@ void __errseq_set(errseq_t *eseq, int err)
 			new += ERRSEQ_CTR_INC;
 
 		/* If there would be no change, then call it done */
-		if (new == old)
+		if (new == old) {
+			cur = new;
 			break;
+		}
 
 		/* Try to swap the new value into place */
 		cur = cmpxchg(eseq, old, new);
@@ -98,6 +105,7 @@ void __errseq_set(errseq_t *eseq, int err)
 		/* Raced with an update, try again */
 		old = cur;
 	}
+	return cur;
 }
 EXPORT_SYMBOL(__errseq_set);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index aeb58db10688..c5e19ea0bf12 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -535,6 +535,14 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(filemap_write_and_wait_range);
 
+void __filemap_set_wb_err(struct address_space *mapping, int err)
+{
+	errseq_t eseq = __errseq_set(&mapping->wb_err, err);
+	trace_filemap_set_wb_err(mapping, eseq);
+
+}
+EXPORT_SYMBOL(__filemap_set_wb_err);
+
 /**
  * filemap_report_wb_err - report wb error (if any) that was previously set
  * @file: struct file on which the error is being reported
@@ -559,14 +567,17 @@ EXPORT_SYMBOL(filemap_write_and_wait_range);
 int filemap_report_wb_err(struct file *file)
 {
 	int err = 0;
+	errseq_t old = READ_ONCE(file->f_wb_err);
 	struct address_space *mapping = file->f_mapping;
 
 	/* Locklessly handle the common case where nothing has changed */
-	if (errseq_check(&mapping->wb_err, READ_ONCE(file->f_wb_err))) {
+	if (errseq_check(&mapping->wb_err, old)) {
 		/* Something changed, must use slow path */
 		spin_lock(&file->f_lock);
+		old = file->f_wb_err;
 		err = errseq_check_and_advance(&mapping->wb_err,
 						&file->f_wb_err);
+		trace_filemap_report_wb_err(file, old);
 		spin_unlock(&file->f_lock);
 	}
 	return err;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 04/17] fs: add a new fstype flag to indicate how writeback errors are tracked
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (2 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 03/17] mm: tracepoints for writeback error events Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 05/17] Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors Jeff Layton
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Now that we have new infrastructure for handling writeback errors using
errseq_t, we need to convert the existing code to use it. We could
attempt to retrofit the old interfaces on top of the new, but there is
a conceptual disconnect here in the case of internal callers that
invoke filemap_fdatawait and the like.

When reporting writeback errors, we will always report errors that have
occurred since a particular point in time. With the old writeback error
reporting, the time we used was "since it was last tested/cleared" which
is entirely arbitrary and potentially racy. Now, we can report the
latest error that has occurred since an arbitrary point in time
(represented as a sampled errseq_t value).

This means that we need to touch each filesystem that calls
filemap_check_errors in some fashion and ensure that we establish sane
"since" values for those callers. But...some code is shared between
filesystems and needs to be able to handle both error tracking schemes.

Add a new FS_WB_ERRSEQ flag to the fstype. When mapping_set_error is
called, set mapping->wb_err if it's set, along with setting the
"legacy" AS_EIO/AS_ENOSPC flags. When calling filemap_report_wb_err,
always clear the legacy flags out as well.

This should allow subsystems to use the new errseq_t based error
reporting while simultaneously allowing the traditional semantics of
AS_EIO/AS_ENOSPC flags.

Eventually, this flag should be removed once everything is converted
to errseq_t based error tracking.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h      |  1 +
 include/linux/pagemap.h | 32 ++++++++++++++++++++++++++------
 mm/filemap.c            |  7 +++++++
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 293cbc7f3520..2f3bcf4eb73b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2021,6 +2021,7 @@ struct file_system_type {
 #define FS_BINARY_MOUNTDATA	2
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
+#define FS_WB_ERRSEQ		16	/* errseq_t writeback err tracking */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	struct dentry *(*mount) (struct file_system_type *, int,
 		       const char *, void *);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 316a19f6b635..1dbc2dd6fdd2 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -28,14 +28,34 @@ enum mapping_flags {
 	AS_NO_WRITEBACK_TAGS = 5,
 };
 
+/**
+ * mapping_set_error - record a writeback error in the address_space
+ * @mapping - the mapping in which an error should be set
+ * @error - the error to set in the mapping
+ *
+ * When writeback fails in some way, we must record that error so that
+ * userspace can be informed when fsync and the like are called.  We endeavor
+ * to report errors on any file that was open at the time of the error.  Some
+ * internal callers also need to know when writeback errors have occurred.
+ *
+ * When a writeback error occurs, most filesystems will want to call
+ * mapping_set_error to record the error in the mapping so that it can be
+ * reported when the application calls fsync(2).
+ */
 static inline void mapping_set_error(struct address_space *mapping, int error)
 {
-	if (unlikely(error)) {
-		if (error == -ENOSPC)
-			set_bit(AS_ENOSPC, &mapping->flags);
-		else
-			set_bit(AS_EIO, &mapping->flags);
-	}
+	if (likely(!error))
+		return;
+
+	/* Record it in wb_err if fs is using errseq_t based error tracking */
+	if (mapping->host->i_sb->s_type->fs_flags & FS_WB_ERRSEQ)
+		filemap_set_wb_err(mapping, error);
+
+	/* Unconditionally record it in flags for now, for legacy callers */
+	if (error == -ENOSPC)
+		set_bit(AS_ENOSPC, &mapping->flags);
+	else
+		set_bit(AS_EIO, &mapping->flags);
 }
 
 static inline void mapping_set_unevictable(struct address_space *mapping)
diff --git a/mm/filemap.c b/mm/filemap.c
index c5e19ea0bf12..97dc28f853fc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -580,6 +580,13 @@ int filemap_report_wb_err(struct file *file)
 		trace_filemap_report_wb_err(file, old);
 		spin_unlock(&file->f_lock);
 	}
+
+	/* Now clear the AS_* flags if any are set */
+	if (test_bit(AS_ENOSPC, &mapping->flags))
+	    clear_bit(AS_ENOSPC, &mapping->flags);
+	if (test_bit(AS_EIO, &mapping->flags))
+	    clear_bit(AS_EIO, &mapping->flags);
+
 	return err;
 }
 EXPORT_SYMBOL(filemap_report_wb_err);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 05/17] Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (3 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 04/17] fs: add a new fstype flag to indicate how writeback errors are tracked Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 06/17] fs: adapt sync_file_range to new reporting infrastructure Jeff Layton
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

I waxed a little loquacious here, but I figured that more detail was
better, and writeback error handling is so hard to get right.

Although I think we'll eventually remove it once the transition is
complete, I've gone ahead and documented the FS_WB_ERRSEQ flag as well.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 Documentation/filesystems/vfs.txt | 50 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index f42b90687d40..c3efdd833a3d 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -576,7 +576,49 @@ should clear PG_Dirty and set PG_Writeback.  It can be actually
 written at any point after PG_Dirty is clear.  Once it is known to be
 safe, PG_Writeback is cleared.
 
-Writeback makes use of a writeback_control structure...
+Writeback makes use of a writeback_control structure to direct the
+operations.  This gives the the writepage and writepages operations some
+information about the nature of and reason for the writeback request,
+and the constraints under which it is being done.  It is also used to
+return information back to the caller about the result of a writepage or
+writepages request.
+
+Handling errors during writeback
+--------------------------------
+Most applications that utilize the pagecache will periodically call
+fsync to ensure that data written has made it to the backing store.
+When there is an error during writeback, expect that error to be
+reported when fsync is called.  After an error has been reported to
+fsync, subsequent fsync calls on the same file descriptor should return
+0, unless further writeback errors have occurred since the previous
+fsync.
+
+Ideally, the kernel would report an error only on file descriptions on
+which writes were done that subsequently failed to be written back.  The
+generic pagecache infrastructure does not track the file descriptions
+that have dirtied each individual page however, so determining which
+file descriptors should get back an error is not possible.
+
+Instead, the generic writeback error tracking infrastructure in the
+kernel settles for reporting errors to fsync on all file descriptions
+that were open at the time that the error occurred.  In a situation with
+multiple writers, all of them will get back an error on a subsequent fsync,
+even if all of the writes done through that particular file descriptor
+succeeded (or even if there were no writes on that file descriptor at all).
+
+Filesystems that wish to use this infrastructure should call
+filemap_set_wb_err to record the error in the address_space when it
+occurs.  Then, at the end of their fsync operation, they should call
+filemap_report_wb_err to ensure that the struct file's error cursor
+has advanced to the correct point in the stream of errors emitted by
+the backing device(s).
+
+Older kernels used a different method for tracking errors, based on flags
+in the address_space. We're currently switching everything over to use
+the infrastructure based on errseq_t values. During the transition,
+filesystem authors will want to also ensure their file_system_type has
+FS_WB_ERRSEQ set in fs_flags to ensure that shared infrastructure is
+aware of the model in use.
 
 struct address_space_operations
 -------------------------------
@@ -804,7 +846,8 @@ struct address_space_operations {
 The File Object
 ===============
 
-A file object represents a file opened by a process.
+A file object represents a file opened by a process. This is also known
+as an "open file description" in POSIX parlance.
 
 
 struct file_operations
@@ -887,7 +930,8 @@ otherwise noted.
 
   release: called when the last reference to an open file is closed
 
-  fsync: called by the fsync(2) system call
+  fsync: called by the fsync(2) system call. Also see the section above
+	 entitled "Handling errors during writeback".
 
   fasync: called by the fcntl(2) system call when asynchronous
 	(non-blocking) mode is enabled for a file
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 06/17] fs: adapt sync_file_range to new reporting infrastructure
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (4 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 05/17] Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 07/17] mm: add filemap_fdatawait_range_since and filemap_write_and_wait_range_since Jeff Layton
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Since it returns errors in a way similar to fsync, have it use the same
method for returning previously-reported writeback errors.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/sync.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/sync.c b/fs/sync.c
index ec93aac4feb9..819a81526714 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -275,8 +275,11 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
  *
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
- * I/O errors or ENOSPC conditions and will return those to the caller, after
- * clearing the EIO and ENOSPC flags in the address_space.
+ * error condition that occurred prior to or after writeback, and will return
+ * that to the caller, while advancing the file's errseq_t cursor. Note that
+ * any errors returned here may have occurred in an area of the file that is
+ * not covered by the given range as most filesystems track writeback errors
+ * on a per-address_space basis
  *
  * It should be noted that none of these operations write out the file's
  * metadata.  So unless the application is strictly performing overwrites of
@@ -343,19 +346,25 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
 		ret = filemap_fdatawait_range(mapping, offset, endbyte);
 		if (ret < 0)
-			goto out_put;
+			goto out_report;
 	}
 
 	if (flags & SYNC_FILE_RANGE_WRITE) {
 		ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
 						 WB_SYNC_NONE);
 		if (ret < 0)
-			goto out_put;
+			goto out_report;
 	}
 
 	if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
 		ret = filemap_fdatawait_range(mapping, offset, endbyte);
 
+out_report:
+	if (mapping->host->i_sb->s_type->fs_flags & FS_WB_ERRSEQ) {
+		int ret2 = filemap_report_wb_err(f.file);
+		if (!ret)
+			ret = ret2;
+	}
 out_put:
 	fdput(f);
 out:
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 07/17] mm: add filemap_fdatawait_range_since and filemap_write_and_wait_range_since
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (5 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 06/17] fs: adapt sync_file_range to new reporting infrastructure Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 08/17] dax: set errors in mapping when writeback fails Jeff Layton
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Add new filemap_*wait* variants that take a "since" value and return an
error if one occurred since that sample point.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h |  9 ++++++++
 mm/filemap.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2f3bcf4eb73b..7d1bd3163d99 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2516,12 +2516,21 @@ extern int write_inode_now(struct inode *, int);
 extern int filemap_fdatawrite(struct address_space *);
 extern int filemap_flush(struct address_space *);
 extern int filemap_fdatawait(struct address_space *);
+extern int filemap_fdatawait_since(struct address_space *, errseq_t);
 extern void filemap_fdatawait_keep_errors(struct address_space *);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+extern int filemap_fdatawait_range_since(struct address_space *mapping,
+				   loff_t start_byte, loff_t end_byte,
+				   errseq_t since);
 extern int filemap_write_and_wait(struct address_space *mapping);
+extern int filemap_write_and_wait_since(struct address_space *mapping,
+					errseq_t since);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
+extern int filemap_write_and_wait_range_since(struct address_space *mapping,
+				   loff_t start_byte, loff_t end_byte,
+				   errseq_t since);
 extern int __filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end, int sync_mode);
 extern int filemap_fdatawrite_range(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index 97dc28f853fc..38a14dc825ad 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -431,6 +431,14 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 }
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
+int filemap_fdatawait_range_since(struct address_space *mapping, loff_t start_byte,
+				  loff_t end_byte, errseq_t since)
+{
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return filemap_check_wb_err(mapping, since);
+}
+EXPORT_SYMBOL(filemap_fdatawait_range_since);
+
 /**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
@@ -476,6 +484,17 @@ int filemap_fdatawait(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_fdatawait);
 
+int filemap_fdatawait_since(struct address_space *mapping, errseq_t since)
+{
+	loff_t i_size = i_size_read(mapping->host);
+
+	if (i_size == 0)
+		return 0;
+
+	return filemap_fdatawait_range_since(mapping, 0, i_size - 1, since);
+}
+EXPORT_SYMBOL(filemap_fdatawait_since);
+
 int filemap_write_and_wait(struct address_space *mapping)
 {
 	int err = 0;
@@ -501,6 +520,31 @@ int filemap_write_and_wait(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_write_and_wait);
 
+int filemap_write_and_wait_since(struct address_space *mapping, errseq_t since)
+{
+	int err = 0;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = filemap_fdatawrite(mapping);
+		/*
+		 * Even if the above returned error, the pages may be
+		 * written partially (e.g. -ENOSPC), so we wait for it.
+		 * But the -EIO is special case, it may indicate the worst
+		 * thing (e.g. bug) happened, so we avoid waiting for it.
+		 */
+		if (err != -EIO) {
+			int err2 = filemap_fdatawait_since(mapping, since);
+			if (!err)
+				err = err2;
+		}
+	} else {
+		err = filemap_check_wb_err(mapping, since);
+	}
+	return err;
+}
+EXPORT_SYMBOL(filemap_write_and_wait_since);
+
 /**
  * filemap_write_and_wait_range - write out & wait on a file range
  * @mapping:	the address_space for the pages
@@ -535,6 +579,29 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(filemap_write_and_wait_range);
 
+int filemap_write_and_wait_range_since(struct address_space *mapping,
+				 loff_t lstart, loff_t lend, errseq_t since)
+{
+	int err = 0;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = __filemap_fdatawrite_range(mapping, lstart, lend,
+						 WB_SYNC_ALL);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO) {
+			int err2 = filemap_fdatawait_range_since(mapping,
+						lstart, lend, since);
+			if (!err)
+				err = err2;
+		}
+	} else {
+		err = filemap_check_wb_err(mapping, since);
+	}
+	return err;
+}
+EXPORT_SYMBOL(filemap_write_and_wait_range_since);
+
 void __filemap_set_wb_err(struct address_space *mapping, int err)
 {
 	errseq_t eseq = __errseq_set(&mapping->wb_err, err);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 08/17] dax: set errors in mapping when writeback fails
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (6 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 07/17] mm: add filemap_fdatawait_range_since and filemap_write_and_wait_range_since Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-06-06  1:01   ` Ross Zwisler
  2017-05-31 12:45 ` [PATCH v5 09/17] block: convert to errseq_t based writeback error tracking Jeff Layton
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Jan's description for this patch is much better than mine, so I'm
quoting it verbatim here:

-----------------8<-----------------
DAX currently doesn't set errors in the mapping when cache flushing
fails in dax_writeback_mapping_range(). Since this function can get
called only from fsync(2) or sync(2), this is actually as good as it can
currently get since we correctly propagate the error up from
dax_writeback_mapping_range() to filemap_fdatawrite()

However, in the future better writeback error handling will enable us to
properly report these errors on fsync(2) even if there are multiple file
descriptors open against the file or if sync(2) gets called before
fsync(2). So convert DAX to using standard error reporting through the
mapping.
-----------------8<-----------------

For now, only do this when the FS_WB_ERRSEQ flag is set. The
AS_EIO/AS_ENOSPC flags are not currently cleared in the older code when
writeback initiation fails, only when we discover an error after waiting
on writeback to complete, so we only want to do this with errseq_t based
error handling to prevent seeing duplicate errors on fsync.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-and-Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
---
 fs/dax.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index c22eaf162f95..42788d8505c7 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -856,8 +856,24 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 
 			ret = dax_writeback_one(bdev, dax_dev, mapping,
 					indices[i], pvec.pages[i]);
-			if (ret < 0)
+			if (ret < 0) {
+				/*
+				 * For fs' that use errseq_t based error
+				 * tracking, we must call mapping_set_error
+				 * here to ensure that fsync on all open fds
+				 * get back an error. Doing this with the old
+				 * wb error tracking infrastructure is
+				 * problematic though, as DAX writeback is
+				 * synchronous, and the error flags are not
+				 * cleared when initiation fails, only when
+				 * it fails after the write has been submitted
+				 * to the backing store.
+				 */
+				if (mapping->host->i_sb->s_type->fs_flags &
+						FS_WB_ERRSEQ)
+					mapping_set_error(mapping, ret);
 				goto out;
+			}
 		}
 	}
 out:
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 09/17] block: convert to errseq_t based writeback error tracking
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (7 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 08/17] dax: set errors in mapping when writeback fails Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 10/17] block: add sync_blockdev_since and sync_filesystem_since Jeff Layton
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Fairly straightforward conversion. In fsync, just use the file->f_wb_err
value as a "since" value. At the end, call filemap_report_wb_err to
advance the cursor in it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/block_dev.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4d62fe771587..0d5f849e2a18 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -622,11 +622,13 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 {
 	struct inode *bd_inode = bdev_file_inode(filp);
 	struct block_device *bdev = I_BDEV(bd_inode);
-	int error;
+	int error, wberr;
+	errseq_t since = READ_ONCE(filp->f_wb_err);
 	
-	error = filemap_write_and_wait_range(filp->f_mapping, start, end);
+	error = filemap_write_and_wait_range_since(filp->f_mapping, start,
+							end, since);
 	if (error)
-		return error;
+		goto out;
 
 	/*
 	 * There is no need to serialise calls to blkdev_issue_flush with
@@ -637,6 +639,10 @@ int blkdev_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	if (error == -EOPNOTSUPP)
 		error = 0;
 
+out:
+	wberr = filemap_report_wb_err(filp);
+	if (!error)
+		error = wberr;
 	return error;
 }
 EXPORT_SYMBOL(blkdev_fsync);
@@ -801,6 +807,7 @@ static struct file_system_type bd_type = {
 	.name		= "bdev",
 	.mount		= bd_mount,
 	.kill_sb	= kill_anon_super,
+	.fs_flags	= FS_WB_ERRSEQ,
 };
 
 struct super_block *blockdev_superblock __read_mostly;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 10/17] block: add sync_blockdev_since and sync_filesystem_since
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (8 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 09/17] block: convert to errseq_t based writeback error tracking Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 11/17] fs: add f_md_wb_err field to struct file for tracking metadata errors Jeff Layton
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

New variants of sync_filesystem and sync_blockdev.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/block_dev.c     | 15 +++++++++++++++
 fs/internal.h      |  8 ++++++++
 fs/sync.c          | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h | 13 ++++++++++++-
 4 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0d5f849e2a18..9da613ec1665 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -452,6 +452,15 @@ int __sync_blockdev(struct block_device *bdev, int wait)
 	return filemap_write_and_wait(bdev->bd_inode->i_mapping);
 }
 
+int __sync_blockdev_since(struct block_device *bdev, int wait, errseq_t since)
+{
+	if (!bdev)
+		return 0;
+	if (!wait)
+		return filemap_flush(bdev->bd_inode->i_mapping);
+	return filemap_write_and_wait_since(bdev->bd_inode->i_mapping, since);
+}
+
 /*
  * Write out and wait upon all the dirty data associated with a block
  * device via its mapping.  Does not take the superblock lock.
@@ -462,6 +471,12 @@ int sync_blockdev(struct block_device *bdev)
 }
 EXPORT_SYMBOL(sync_blockdev);
 
+int sync_blockdev_since(struct block_device *bdev, errseq_t since)
+{
+	return __sync_blockdev_since(bdev, 1, since);
+}
+EXPORT_SYMBOL(sync_blockdev_since);
+
 /*
  * Write out and wait upon all dirty data associated with this
  * device.   Filesystem data as well as the underlying block
diff --git a/fs/internal.h b/fs/internal.h
index 9676fe11c093..234343ba8af7 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -25,6 +25,8 @@ struct shrink_control;
 extern void __init bdev_cache_init(void);
 
 extern int __sync_blockdev(struct block_device *bdev, int wait);
+extern int __sync_blockdev_since(struct block_device *bdev, int wait,
+					errseq_t since);
 
 #else
 static inline void bdev_cache_init(void)
@@ -35,6 +37,12 @@ static inline int __sync_blockdev(struct block_device *bdev, int wait)
 {
 	return 0;
 }
+
+static inline int __sync_blockdev_since(struct block_device *bdev, int wait,
+					errseq_t since)
+{
+	return 0;
+}
 #endif
 
 /*
diff --git a/fs/sync.c b/fs/sync.c
index 819a81526714..2a8202f9eb21 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -71,6 +71,51 @@ int sync_filesystem(struct super_block *sb)
 }
 EXPORT_SYMBOL(sync_filesystem);
 
+static int __sync_filesystem_since(struct super_block *sb, int wait,
+					errseq_t since)
+{
+	int fs_ret = 0, bd_ret;
+
+	if (wait)
+		sync_inodes_sb(sb);
+	else
+		writeback_inodes_sb(sb, WB_REASON_SYNC);
+
+	if (sb->s_op->sync_fs)
+		fs_ret = sb->s_op->sync_fs(sb, wait);
+	bd_ret = __sync_blockdev_since(sb->s_bdev, wait, since);
+
+	return fs_ret ? fs_ret : bd_ret;
+}
+
+/*
+ * Write out and wait upon all dirty data associated with this
+ * superblock.  Filesystem data as well as the underlying block
+ * device.  Takes the superblock lock.
+ */
+int sync_filesystem_since(struct super_block *sb, errseq_t since)
+{
+	int ret;
+
+	/*
+	 * We need to be protected against the filesystem going from
+	 * r/o to r/w or vice versa.
+	 */
+	WARN_ON(!rwsem_is_locked(&sb->s_umount));
+
+	/*
+	 * No point in syncing out anything if the filesystem is read-only.
+	 */
+	if (sb->s_flags & MS_RDONLY)
+		return 0;
+
+	ret = __sync_filesystem_since(sb, 0, since);
+	if (ret < 0)
+		return ret;
+	return __sync_filesystem_since(sb, 1, since);
+}
+EXPORT_SYMBOL(sync_filesystem_since);
+
 static void sync_inodes_one_sb(struct super_block *sb, void *arg)
 {
 	if (!(sb->s_flags & MS_RDONLY))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7d1bd3163d99..f483c23866c4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2376,6 +2376,7 @@ extern void bdput(struct block_device *);
 extern void invalidate_bdev(struct block_device *);
 extern void iterate_bdevs(void (*)(struct block_device *, void *), void *);
 extern int sync_blockdev(struct block_device *bdev);
+extern int sync_blockdev_since(struct block_device *bdev, errseq_t since);
 extern void kill_bdev(struct block_device *);
 extern struct super_block *freeze_bdev(struct block_device *);
 extern void emergency_thaw_all(void);
@@ -2390,7 +2391,16 @@ static inline bool sb_is_blkdev_sb(struct super_block *sb)
 }
 #else
 static inline void bd_forget(struct inode *inode) {}
-static inline int sync_blockdev(struct block_device *bdev) { return 0; }
+static inline int sync_blockdev(struct block_device *bdev)
+{
+	return 0;
+}
+
+static inline int sync_blockdev_since(struct block_device *bdev,
+							errseq_t since)
+{
+	return 0;
+}
 static inline void kill_bdev(struct block_device *bdev) {}
 static inline void invalidate_bdev(struct block_device *bdev) {}
 
@@ -2414,6 +2424,7 @@ static inline bool sb_is_blkdev_sb(struct super_block *sb)
 }
 #endif
 extern int sync_filesystem(struct super_block *);
+extern int sync_filesystem_since(struct super_block *, errseq_t);
 extern const struct file_operations def_blk_fops;
 extern const struct file_operations def_chr_fops;
 #ifdef CONFIG_BLOCK
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 11/17] fs: add f_md_wb_err field to struct file for tracking metadata errors
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (9 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 10/17] block: add sync_blockdev_since and sync_filesystem_since Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 12/17] fs: allow __generic_file_fsync to support both flavors of error reporting Jeff Layton
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Some filesystems (particularly local ones) keep a different mapping for
metadata writeback. Add a second errseq_t to struct file for tracking
metadata writeback errors. Also add a new function for checking a
mapping of the caller's choosing vs. the f_md_wb_err value.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h             |  3 +++
 include/trace/events/filemap.h | 23 ++++++++++-------------
 mm/filemap.c                   | 40 +++++++++++++++++++++++++++++++---------
 3 files changed, 44 insertions(+), 22 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index f483c23866c4..df1d68e3605a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -871,6 +871,7 @@ struct file {
 	struct list_head	f_tfile_llink;
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
+	errseq_t		f_md_wb_err; /* optional metadata wb error tracking */
 } __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
 
 struct file_handle {
@@ -2549,6 +2550,8 @@ extern int filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_check_errors(struct address_space *mapping);
 
 extern int __must_check filemap_report_wb_err(struct file *file);
+extern int __must_check filemap_report_md_wb_err(struct file *file,
+					struct address_space *mapping);
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
 
 /**
diff --git a/include/trace/events/filemap.h b/include/trace/events/filemap.h
index 2af66920f267..6e0d78c01a2e 100644
--- a/include/trace/events/filemap.h
+++ b/include/trace/events/filemap.h
@@ -79,12 +79,11 @@ TRACE_EVENT(filemap_set_wb_err,
 );
 
 TRACE_EVENT(filemap_report_wb_err,
-		TP_PROTO(struct file *file, errseq_t old),
+		TP_PROTO(struct address_space *mapping, errseq_t old, errseq_t new),
 
-		TP_ARGS(file, old),
+		TP_ARGS(mapping, old, new),
 
 		TP_STRUCT__entry(
-			__field(struct file *, file);
 			__field(unsigned long, i_ino)
 			__field(dev_t, s_dev)
 			__field(errseq_t, old)
@@ -92,20 +91,18 @@ TRACE_EVENT(filemap_report_wb_err,
 		),
 
 		TP_fast_assign(
-			__entry->file = file;
-			__entry->i_ino = file->f_mapping->host->i_ino;
-			if (file->f_mapping->host->i_sb)
-				__entry->s_dev = file->f_mapping->host->i_sb->s_dev;
+			__entry->i_ino = mapping->host->i_ino;
+			if (mapping->host->i_sb)
+				__entry->s_dev = mapping->host->i_sb->s_dev;
 			else
-				__entry->s_dev = file->f_mapping->host->i_rdev;
+				__entry->s_dev = mapping->host->i_rdev;
 			__entry->old = old;
-			__entry->new = file->f_wb_err;
+			__entry->new = new;
 		),
 
-		TP_printk("file=%p dev=%d:%d ino=0x%lx old=0x%x new=0x%x",
-			__entry->file, MAJOR(__entry->s_dev),
-			MINOR(__entry->s_dev), __entry->i_ino, __entry->old,
-			__entry->new)
+		TP_printk("dev=%d:%d ino=0x%lx old=0x%x new=0x%x",
+			MAJOR(__entry->s_dev), MINOR(__entry->s_dev),
+			__entry->i_ino, __entry->old, __entry->new)
 );
 #endif /* _TRACE_FILEMAP_H */
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 38a14dc825ad..0edf0234973e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -631,21 +631,20 @@ EXPORT_SYMBOL(__filemap_set_wb_err);
  * value is protected by the f_lock since we must ensure that it reflects
  * the latest value swapped in for this file descriptor.
  */
-int filemap_report_wb_err(struct file *file)
+static int __filemap_report_wb_err(errseq_t *cursor, spinlock_t *lock,
+				struct address_space *mapping)
 {
 	int err = 0;
-	errseq_t old = READ_ONCE(file->f_wb_err);
-	struct address_space *mapping = file->f_mapping;
+	errseq_t old = READ_ONCE(*cursor);
 
 	/* Locklessly handle the common case where nothing has changed */
 	if (errseq_check(&mapping->wb_err, old)) {
 		/* Something changed, must use slow path */
-		spin_lock(&file->f_lock);
-		old = file->f_wb_err;
-		err = errseq_check_and_advance(&mapping->wb_err,
-						&file->f_wb_err);
-		trace_filemap_report_wb_err(file, old);
-		spin_unlock(&file->f_lock);
+		spin_lock(lock);
+		old = *cursor;
+		err = errseq_check_and_advance(&mapping->wb_err, cursor);
+		trace_filemap_report_wb_err(mapping, old, *cursor);
+		spin_unlock(lock);
 	}
 
 	/* Now clear the AS_* flags if any are set */
@@ -656,9 +655,32 @@ int filemap_report_wb_err(struct file *file)
 
 	return err;
 }
+EXPORT_SYMBOL(__filemap_report_wb_err);
+
+int filemap_report_wb_err(struct file *file)
+{
+	return __filemap_report_wb_err(&file->f_wb_err, &file->f_lock,
+					file->f_mapping);
+}
 EXPORT_SYMBOL(filemap_report_wb_err);
 
 /**
+ * filemap_report_md_wb_err - report wb error (if any) that was previously set
+ * @file: struct file on which the error is being reported
+ * @mapping: pointer to metadata mapping to check
+ *
+ * Many filesystems keep inode metadata in the pagecache, and will use the
+ * cache to write it back to the backing store. This function is for these
+ * callers to track metadata writeback.
+ */
+int filemap_report_md_wb_err(struct file *file, struct address_space *mapping)
+{
+	return __filemap_report_wb_err(&file->f_md_wb_err, &file->f_lock,
+					mapping);
+}
+EXPORT_SYMBOL(filemap_report_md_wb_err);
+
+/**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced
  * @new:	page to replace with
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 12/17] fs: allow __generic_file_fsync to support both flavors of error reporting
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (10 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 11/17] fs: add f_md_wb_err field to struct file for tracking metadata errors Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 13/17] jbd2: conditionally handle errors using errseq_t based on FS_WB_ERRSEQ flag Jeff Layton
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

For now, we add a FS_WB_ERRSEQ check to know how to handle it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/libfs.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 1dec90819366..2ae58a252718 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -971,10 +971,18 @@ int __generic_file_fsync(struct file *file, loff_t start, loff_t end,
 				 int datasync)
 {
 	struct inode *inode = file->f_mapping->host;
-	int err;
-	int ret;
-
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	int err, ret;
+	bool use_errseq = inode->i_sb->s_type->fs_flags & FS_WB_ERRSEQ;
+	errseq_t since;
+
+	if (use_errseq) {
+		since = READ_ONCE(file->f_wb_err);
+		err = filemap_write_and_wait_range_since(inode->i_mapping,
+				start, end, since);
+	} else {
+		err = filemap_write_and_wait_range(inode->i_mapping,
+							start, end);
+	}
 	if (err)
 		return err;
 
@@ -988,11 +996,15 @@ int __generic_file_fsync(struct file *file, loff_t start, loff_t end,
 	err = sync_inode_metadata(inode, 1);
 	if (ret == 0)
 		ret = err;
-
 out:
 	inode_unlock(inode);
-	err = filemap_check_errors(inode->i_mapping);
-	return ret ? ret : err;
+	if (ret == 0) {
+		if (use_errseq)
+			err = filemap_check_wb_err(inode->i_mapping, since);
+		else
+			err = filemap_check_errors(inode->i_mapping);
+	}
+	return ret;
 }
 EXPORT_SYMBOL(__generic_file_fsync);
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 13/17] jbd2: conditionally handle errors using errseq_t based on FS_WB_ERRSEQ flag
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (11 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 12/17] fs: allow __generic_file_fsync to support both flavors of error reporting Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 14/17] ext4: convert to errseq_t based error tracking Jeff Layton
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Grab the current mapping->wb_err when linking a transaction to the list
and stash it in the journal inode. Then we can use that as a "since"
value when committing it to ensure that there were no writeback errors
since the transaction was started.

We do still need to perform old-style error handling too for now in
journal_finish_inode_data_buffers. jbd2 is shared infrastructure between
several filesystems. Eventually we should be able to remove the flag check
and simplify this function again.

For journal recovery, sample the wb_err early on and then pass that as
the since value to sync_blockdev_since.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/jbd2/commit.c      | 29 +++++++++++++++++++----------
 fs/jbd2/recovery.c    |  5 +++--
 fs/jbd2/transaction.c |  1 +
 include/linux/jbd2.h  |  3 +++
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index b6b194ec1b4f..aea71e4bc9be 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -259,21 +259,30 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
 	/* For locking, see the comment in journal_submit_data_buffers() */
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
+		struct inode *inode = jinode->i_vfs_inode;
+
 		if (!(jinode->i_flags & JI_WAIT_DATA))
 			continue;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
-		err = filemap_fdatawait(jinode->i_vfs_inode->i_mapping);
-		if (err) {
-			/*
-			 * Because AS_EIO is cleared by
-			 * filemap_fdatawait_range(), set it again so
-			 * that user process can get -EIO from fsync().
-			 */
-			mapping_set_error(jinode->i_vfs_inode->i_mapping, -EIO);
-
-			if (!ret)
+		if (inode->i_sb->s_type->fs_flags & FS_WB_ERRSEQ) {
+			err = filemap_fdatawait_since(inode->i_mapping,
+					jinode->i_since);
+			if (err && !ret)
 				ret = err;
+		} else {
+			err = filemap_fdatawait(inode->i_mapping);
+			if (err) {
+				/*
+				 * Because AS_EIO is cleared by
+				 * filemap_fdatawait_range(), we must set it again so
+				 * that user process can get -EIO from fsync() if
+				 * non-errseq_t based error tracking is in play.
+				 */
+				mapping_set_error(inode->i_mapping, -EIO);
+				if (!ret)
+					ret = err;
+			}
 		}
 		spin_lock(&journal->j_list_lock);
 		jinode->i_flags &= ~JI_COMMIT_RUNNING;
diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c
index 02dd3360cb20..06a8ee71848c 100644
--- a/fs/jbd2/recovery.c
+++ b/fs/jbd2/recovery.c
@@ -248,11 +248,12 @@ int jbd2_journal_recover(journal_t *journal)
 {
 	int			err, err2;
 	journal_superblock_t *	sb;
-
 	struct recovery_info	info;
+	errseq_t		since;
 
 	memset(&info, 0, sizeof(info));
 	sb = journal->j_superblock;
+	since = filemap_sample_wb_err(journal->j_fs_dev->bd_inode->i_mapping);
 
 	/*
 	 * The journal superblock's s_start field (the current log head)
@@ -284,7 +285,7 @@ int jbd2_journal_recover(journal_t *journal)
 	journal->j_transaction_sequence = ++info.end_transaction;
 
 	jbd2_journal_clear_revoke(journal);
-	err2 = sync_blockdev(journal->j_fs_dev);
+	err2 = sync_blockdev_since(journal->j_fs_dev, since);
 	if (!err)
 		err = err2;
 	/* Make sure all replayed data is on permanent storage */
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 9ee4832b6f8b..e9e6af20a087 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2535,6 +2535,7 @@ static int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode,
 	/* Not on any transaction list... */
 	J_ASSERT(!jinode->i_next_transaction);
 	jinode->i_transaction = transaction;
+	jinode->i_since = filemap_sample_wb_err(jinode->i_vfs_inode->i_mapping);
 	list_add(&jinode->i_list, &transaction->t_inode_list);
 done:
 	spin_unlock(&journal->j_list_lock);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 606b6bce3a5b..b6901eac2d8e 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -439,6 +439,9 @@ struct jbd2_inode {
 
 	/* Flags of inode [j_list_lock] */
 	unsigned long i_flags;
+
+	/* Sampled writeback error at the time of transaction start */
+	errseq_t i_since;
 };
 
 struct jbd2_revoke_table_s;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 14/17] ext4: convert to errseq_t based error tracking
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (12 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 13/17] jbd2: conditionally handle errors using errseq_t based on FS_WB_ERRSEQ flag Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 15/17] fs: add a write_one_page_since Jeff Layton
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Sample the block device inode's errseq_t when opening a file, so we can
catch metadata writeback errors at fsync time. Change ext4_sync_file to
check for data errors first, and then check the blockdev for metadata
errors afterward.

There are also several internal callers of filemap_write_and_wait_* that
check the error code afterward. Convert them to the "_since" variants,
using the file->f_wb_err value as the "since" value. This means passing
file pointers to several functions instead of inode pointers.

Note that because metadata writeback errors are only tracked on a
per-device level, this does mean that we'll end up reporting an error on
all open file descriptors when there is a metadata writeback failure.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/ext4/dir.c     |  8 ++++++--
 fs/ext4/ext4.h    |  8 ++++----
 fs/ext4/extents.c | 24 ++++++++++++++----------
 fs/ext4/file.c    |  5 ++++-
 fs/ext4/fsync.c   | 23 ++++++++++++++++++-----
 fs/ext4/inode.c   | 19 ++++++++++++-------
 fs/ext4/ioctl.c   |  9 +++++----
 fs/ext4/super.c   |  9 +++++----
 8 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index e8b365000d73..6bbb19510f74 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -611,9 +611,13 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
 
 static int ext4_dir_open(struct inode * inode, struct file * filp)
 {
+	int ret = 0;
+
 	if (ext4_encrypted_inode(inode))
-		return fscrypt_get_encryption_info(inode) ? -EACCES : 0;
-	return 0;
+		ret = fscrypt_get_encryption_info(inode) ? -EACCES : 0;
+	if (!ret)
+		filp->f_md_wb_err = filemap_sample_wb_err(inode->i_sb->s_bdev->bd_inode->i_mapping);
+	return ret;
 }
 
 static int ext4_release_dir(struct inode *inode, struct file *filp)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8e8046104f4d..e3ab27db43d0 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2468,12 +2468,12 @@ extern void ext4_clear_inode(struct inode *);
 extern int  ext4_file_getattr(const struct path *, struct kstat *, u32, unsigned int);
 extern int  ext4_sync_inode(handle_t *, struct inode *);
 extern void ext4_dirty_inode(struct inode *, int);
-extern int ext4_change_inode_journal_flag(struct inode *, int);
+extern int ext4_change_inode_journal_flag(struct file *, int);
 extern int ext4_get_inode_loc(struct inode *, struct ext4_iloc *);
 extern int ext4_inode_attach_jinode(struct inode *inode);
 extern int ext4_can_truncate(struct inode *inode);
 extern int ext4_truncate(struct inode *);
-extern int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length);
+extern int ext4_punch_hole(struct file *file, loff_t offset, loff_t length);
 extern int ext4_truncate_restart_trans(handle_t *, struct inode *, int nblocks);
 extern void ext4_set_inode_flags(struct inode *);
 extern int ext4_alloc_da_blocks(struct inode *inode);
@@ -3143,8 +3143,8 @@ extern ext4_lblk_t ext4_ext_next_allocated_block(struct ext4_ext_path *path);
 extern int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			__u64 start, __u64 len);
 extern int ext4_ext_precache(struct inode *inode);
-extern int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len);
-extern int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len);
+extern int ext4_collapse_range(struct file *file, loff_t offset, loff_t len);
+extern int ext4_insert_range(struct file *file, loff_t offset, loff_t len);
 extern int ext4_swap_extents(handle_t *handle, struct inode *inode1,
 				struct inode *inode2, ext4_lblk_t lblk1,
 			     ext4_lblk_t lblk2,  ext4_lblk_t count,
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 2a97dff87b96..7e108fda9ae9 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4934,17 +4934,17 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		return -EOPNOTSUPP;
 
 	if (mode & FALLOC_FL_PUNCH_HOLE)
-		return ext4_punch_hole(inode, offset, len);
+		return ext4_punch_hole(file, offset, len);
 
 	ret = ext4_convert_inline_data(inode);
 	if (ret)
 		return ret;
 
 	if (mode & FALLOC_FL_COLLAPSE_RANGE)
-		return ext4_collapse_range(inode, offset, len);
+		return ext4_collapse_range(file, offset, len);
 
 	if (mode & FALLOC_FL_INSERT_RANGE)
-		return ext4_insert_range(inode, offset, len);
+		return ext4_insert_range(file, offset, len);
 
 	if (mode & FALLOC_FL_ZERO_RANGE)
 		return ext4_zero_range(file, offset, len, mode);
@@ -5444,14 +5444,16 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
  * This implements the fallocate's collapse range functionality for ext4
  * Returns: 0 and non-zero on error.
  */
-int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
+int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 {
+	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	ext4_lblk_t punch_start, punch_stop;
 	handle_t *handle;
 	unsigned int credits;
 	loff_t new_size, ioffset;
 	int ret;
+	errseq_t since = READ_ONCE(file->f_wb_err);
 
 	/*
 	 * We need to test this early because xfstests assumes that a
@@ -5515,7 +5517,7 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 	 * Write tail of the last page before removed range since it will get
 	 * removed from the page cache below.
 	 */
-	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset, offset);
+	ret = filemap_write_and_wait_range_since(inode->i_mapping, ioffset, offset, since);
 	if (ret)
 		goto out_mmap;
 	/*
@@ -5523,8 +5525,8 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
 	 * page cache below. We are also protected from pages becoming dirty
 	 * by i_mmap_sem.
 	 */
-	ret = filemap_write_and_wait_range(inode->i_mapping, offset + len,
-					   LLONG_MAX);
+	ret = filemap_write_and_wait_range_since(inode->i_mapping, offset + len,
+					   LLONG_MAX, since);
 	if (ret)
 		goto out_mmap;
 	truncate_pagecache(inode, ioffset);
@@ -5588,8 +5590,9 @@ int ext4_collapse_range(struct inode *inode, loff_t offset, loff_t len)
  * by len bytes.
  * Returns 0 on success, error otherwise.
  */
-int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
+int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 {
+	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	handle_t *handle;
 	struct ext4_ext_path *path;
@@ -5598,6 +5601,7 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 	unsigned int credits, ee_len;
 	int ret = 0, depth, split_flag = 0;
 	loff_t ioffset;
+	errseq_t since = READ_ONCE(file->f_wb_err);
 
 	/*
 	 * We need to test this early because xfstests assumes that an
@@ -5661,8 +5665,8 @@ int ext4_insert_range(struct inode *inode, loff_t offset, loff_t len)
 	 */
 	ioffset = round_down(offset, PAGE_SIZE);
 	/* Write out all dirty pages */
-	ret = filemap_write_and_wait_range(inode->i_mapping, ioffset,
-			LLONG_MAX);
+	ret = filemap_write_and_wait_range_since(inode->i_mapping, ioffset,
+			LLONG_MAX, since);
 	if (ret)
 		goto out_mmap;
 	truncate_pagecache(inode, ioffset);
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 831fd6beebf0..fe0d6e01c4b7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -435,7 +435,10 @@ static int ext4_file_open(struct inode * inode, struct file * filp)
 		if (ret < 0)
 			return ret;
 	}
-	return dquot_file_open(inode, filp);
+	ret = dquot_file_open(inode, filp);
+	if (!ret)
+		filp->f_md_wb_err = filemap_sample_wb_err(sb->s_bdev->bd_inode->i_mapping);
+	return ret;
 }
 
 /*
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 9d549608fd30..ba474de2dadb 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -99,9 +99,12 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	int ret = 0, err;
 	tid_t commit_tid;
 	bool needs_barrier = false;
+	errseq_t since = READ_ONCE(file->f_wb_err);
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
-		return -EIO;
+	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb)))) {
+		ret = -EIO;
+		goto out;
+	}
 
 	J_ASSERT(ext4_journal_current_handle() == NULL);
 
@@ -124,9 +127,11 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = filemap_write_and_wait_range_since(inode->i_mapping, start,
+						 end, since);
 	if (ret)
-		return ret;
+		goto out;
+
 	/*
 	 * data=writeback,ordered:
 	 *  The caller's filemap_fdatawrite()/wait will sync the data.
@@ -152,12 +157,20 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		needs_barrier = true;
 	ret = jbd2_complete_transaction(journal, commit_tid);
 	if (needs_barrier) {
-	issue_flush:
+issue_flush:
 		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 		if (!ret)
 			ret = err;
 	}
 out:
+	err = filemap_report_wb_err(file);
+	if (!ret)
+		ret = err;
+
+	err = filemap_report_md_wb_err(file,
+				inode->i_sb->s_bdev->bd_inode->i_mapping);
+	if (!ret)
+		ret = err;
 	trace_ext4_sync_file_exit(inode, ret);
 	return ret;
 }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1bd0bfa547f6..df3b6f62dcbb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3705,6 +3705,7 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
 	struct inode *inode = mapping->host;
 	size_t count = iov_iter_count(iter);
 	ssize_t ret;
+	errseq_t since = READ_ONCE(iocb->ki_filp->f_wb_err);
 
 	/*
 	 * Shared inode_lock is enough for us - it protects against concurrent
@@ -3712,8 +3713,8 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
 	 * we are protected against page writeback as well.
 	 */
 	inode_lock_shared(inode);
-	ret = filemap_write_and_wait_range(mapping, iocb->ki_pos,
-					   iocb->ki_pos + count);
+	ret = filemap_write_and_wait_range_since(mapping, iocb->ki_pos,
+					   iocb->ki_pos + count, since);
 	if (ret)
 		goto out_unlock;
 	ret = __blockdev_direct_IO(iocb, inode, inode->i_sb->s_bdev,
@@ -4085,8 +4086,9 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
  * Returns: 0 on success or negative on failure
  */
 
-int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
+int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 {
+	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	ext4_lblk_t first_block, stop_block;
 	struct address_space *mapping = inode->i_mapping;
@@ -4094,6 +4096,7 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	handle_t *handle;
 	unsigned int credits;
 	int ret = 0;
+	errseq_t since = READ_ONCE(file->f_wb_err);
 
 	if (!S_ISREG(inode->i_mode))
 		return -EOPNOTSUPP;
@@ -4105,8 +4108,8 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	 * Then release them.
 	 */
 	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY)) {
-		ret = filemap_write_and_wait_range(mapping, offset,
-						   offset + length - 1);
+		ret = filemap_write_and_wait_range_since(mapping, offset,
+						   offset + length - 1, since);
 		if (ret)
 			return ret;
 	}
@@ -5771,12 +5774,14 @@ static int ext4_pin_inode(handle_t *handle, struct inode *inode)
 }
 #endif
 
-int ext4_change_inode_journal_flag(struct inode *inode, int val)
+int ext4_change_inode_journal_flag(struct file *file, int val)
 {
+	struct inode *inode = file_inode(file);
 	journal_t *journal;
 	handle_t *handle;
 	int err;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	errseq_t since = READ_ONCE(file->f_wb_err);
 
 	/*
 	 * We have to be very careful here: changing a data block's
@@ -5808,7 +5813,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
 	 */
 	if (val) {
 		down_write(&EXT4_I(inode)->i_mmap_sem);
-		err = filemap_write_and_wait(inode->i_mapping);
+		err = filemap_write_and_wait_since(inode->i_mapping, since);
 		if (err < 0) {
 			up_write(&EXT4_I(inode)->i_mmap_sem);
 			ext4_inode_resume_unlocked_dio(inode);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 0c21e22acd74..888a4533d078 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -207,9 +207,10 @@ static int uuid_is_zero(__u8 u[16])
 }
 #endif
 
-static int ext4_ioctl_setflags(struct inode *inode,
+static int ext4_ioctl_setflags(struct file *file,
 			       unsigned int flags)
 {
+	struct inode *inode = file_inode(file);
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	handle_t *handle = NULL;
 	int err = -EPERM, migrate = 0;
@@ -293,7 +294,7 @@ static int ext4_ioctl_setflags(struct inode *inode,
 		goto flags_out;
 
 	if ((jflag ^ oldflags) & (EXT4_JOURNAL_DATA_FL))
-		err = ext4_change_inode_journal_flag(inode, jflag);
+		err = ext4_change_inode_journal_flag(file, jflag);
 	if (err)
 		goto flags_out;
 	if (migrate) {
@@ -617,7 +618,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 			return err;
 
 		inode_lock(inode);
-		err = ext4_ioctl_setflags(inode, flags);
+		err = ext4_ioctl_setflags(filp, flags);
 		inode_unlock(inode);
 		mnt_drop_write_file(filp);
 		return err;
@@ -1015,7 +1016,7 @@ long ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		inode_lock(inode);
 		flags = (ei->i_flags & ~EXT4_FL_XFLAG_VISIBLE) |
 			 (flags & EXT4_FL_XFLAG_VISIBLE);
-		err = ext4_ioctl_setflags(inode, flags);
+		err = ext4_ioctl_setflags(filp, flags);
 		inode_unlock(inode);
 		mnt_drop_write_file(filp);
 		if (err)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0b177da9ea82..9ce0b6e63abb 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -119,7 +119,7 @@ static struct file_system_type ext2_fs_type = {
 	.name		= "ext2",
 	.mount		= ext4_mount,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV|FS_WB_ERRSEQ,
 };
 MODULE_ALIAS_FS("ext2");
 MODULE_ALIAS("ext2");
@@ -134,7 +134,7 @@ static struct file_system_type ext3_fs_type = {
 	.name		= "ext3",
 	.mount		= ext4_mount,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV|FS_WB_ERRSEQ,
 };
 MODULE_ALIAS_FS("ext3");
 MODULE_ALIAS("ext3");
@@ -4887,6 +4887,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 	ext4_group_t g;
 	unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
 	int err = 0;
+	errseq_t since = filemap_sample_wb_err(sb->s_bdev->bd_inode->i_mapping);
 #ifdef CONFIG_QUOTA
 	int i, j;
 #endif
@@ -4988,7 +4989,7 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
 		}
 
 		if (*flags & MS_RDONLY) {
-			err = sync_filesystem(sb);
+			err = sync_filesystem_since(sb, since);
 			if (err < 0)
 				goto restore_opts;
 			err = dquot_suspend(sb, -1);
@@ -5690,7 +5691,7 @@ static struct file_system_type ext4_fs_type = {
 	.name		= "ext4",
 	.mount		= ext4_mount,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV|FS_WB_ERRSEQ,
 };
 MODULE_ALIAS_FS("ext4");
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 15/17] fs: add a write_one_page_since
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (13 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 14/17] ext4: convert to errseq_t based error tracking Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 16/17] ext2: convert to errseq_t based writeback error tracking Jeff Layton
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Allow filesystems to pass in an errseq_t for a since value.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/mm.h  |  2 ++
 mm/page-writeback.c | 53 +++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ca9c8b27cecb..c901d7313374 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -23,6 +23,7 @@
 #include <linux/page_ext.h>
 #include <linux/err.h>
 #include <linux/page_ref.h>
+#include <linux/errseq.h>
 
 struct mempolicy;
 struct anon_vma;
@@ -2200,6 +2201,7 @@ extern int filemap_page_mkwrite(struct vm_fault *vmf);
 
 /* mm/page-writeback.c */
 int __must_check write_one_page(struct page *page);
+int __must_check write_one_page_since(struct page *page, errseq_t since);
 void task_dirty_inc(struct task_struct *tsk);
 
 /* readahead.c */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index e369e8ea2a29..63058e35c60d 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2365,19 +2365,10 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	return ret;
 }
 
-/**
- * write_one_page - write out a single page and wait on I/O
- * @page: the page to write
- *
- * The page must be locked by the caller and will be unlocked upon return.
- *
- * Note that the mapping's AS_EIO/AS_ENOSPC flags will be cleared when this
- * function returns.
- */
-int write_one_page(struct page *page)
+static int __write_one_page(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
-	int ret = 0, ret2;
+	int ret;
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_ALL,
 		.nr_to_write = 1,
@@ -2394,16 +2385,54 @@ int write_one_page(struct page *page)
 			wait_on_page_writeback(page);
 		put_page(page);
 	} else {
+		ret = 0;
 		unlock_page(page);
 	}
+	return ret;
+}
 
+/**
+ * write_one_page - write out a single page and wait on I/O
+ * @page: the page to write
+ *
+ * The page must be locked by the caller and will be unlocked upon return.
+ *
+ * Note that the mapping's AS_EIO/AS_ENOSPC flags will be cleared when this
+ * function returns.
+ */
+int write_one_page(struct page *page)
+{
+	int ret;
+
+	ret = __write_one_page(page);
 	if (!ret)
-		ret = filemap_check_errors(mapping);
+		ret = filemap_check_errors(page->mapping);
 	return ret;
 }
 EXPORT_SYMBOL(write_one_page);
 
 /*
+ * write_one_page_since - write out a single page and wait on I/O
+ * @page: the page to write
+ * @since: previously sampled errseq_t
+ *
+ * The page must be locked by the caller and will be unlocked upon return.
+ *
+ * The caller should pass in a previously-sampled errseq_t. The mapping will
+ * be checked for errors since that point.
+ */
+int write_one_page_since(struct page *page, errseq_t since)
+{
+	int ret;
+
+	ret = __write_one_page(page);
+	if (!ret)
+		ret = filemap_check_wb_err(page->mapping, since);
+	return ret;
+}
+EXPORT_SYMBOL(write_one_page_since);
+
+/*
  * For address_spaces which do not use buffers nor write back.
  */
 int __set_page_dirty_no_writeback(struct page *page)
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 16/17] ext2: convert to errseq_t based writeback error tracking
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (14 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 15/17] fs: add a write_one_page_since Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 12:45 ` [PATCH v5 17/17] fs: convert ext2 to use write_one_page_since Jeff Layton
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Set the flag to indicate that we want new-style data writeback error
handling.

This means that we need to override the open routines for files and
directories so that we can sample the bdev wb_err at open.

XXX: doesn't quite pass the xfstest for this currently, as ext2_error
     resets the error on the device inode on every call.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/ext2/dir.c   |  8 ++++++++
 fs/ext2/file.c  | 29 +++++++++++++++++++++++------
 fs/ext2/super.c |  2 +-
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index e2709695b177..6e476c9929f8 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -713,6 +713,13 @@ int ext2_empty_dir (struct inode * inode)
 	return 0;
 }
 
+static int ext2_dir_open(struct inode *inode, struct file *file)
+{
+	/* Sample blockdev mapping errseq_t for metadata writeback */
+	file->f_md_wb_err = filemap_sample_wb_err(inode->i_sb->s_bdev->bd_inode->i_mapping);
+	return 0;
+}
+
 const struct file_operations ext2_dir_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= generic_read_dir,
@@ -721,5 +728,6 @@ const struct file_operations ext2_dir_operations = {
 #ifdef CONFIG_COMPAT
 	.compat_ioctl	= ext2_compat_ioctl,
 #endif
+	.open		= ext2_dir_open,
 	.fsync		= ext2_fsync,
 };
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index ed00e7ae0ef3..6f3cd7bc3fb3 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -172,16 +172,23 @@ static int ext2_release_file (struct inode * inode, struct file * filp)
 
 int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	int ret;
+	int ret, ret2;
 	struct super_block *sb = file->f_mapping->host->i_sb;
 	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
 
 	ret = generic_file_fsync(file, start, end, datasync);
-	if (ret == -EIO) {
-		/* We don't really know where the IO error happened... */
-		ext2_error(sb, __func__,
+
+	ret2 = filemap_report_wb_err(file);
+	if (ret == 0)
+		ret = ret2;
+
+	ret2 = filemap_report_md_wb_err(file, mapping);
+	if (ret2) {
+		if (ret == 0)
+			ret = ret2;
+		if (ret == -EIO)
+			ext2_error(sb, __func__,
 			   "detected IO error when writing metadata buffers");
-		ret = -EIO;
 	}
 	return ret;
 }
@@ -204,6 +211,16 @@ static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	return generic_file_write_iter(iocb, from);
 }
 
+static int ext2_file_open(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	ret = dquot_file_open(inode, file);
+	if (likely(ret == 0))
+		file->f_md_wb_err = filemap_sample_wb_err(inode->i_sb->s_bdev->bd_inode->i_mapping);
+	return ret;
+}
+
 const struct file_operations ext2_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read_iter	= ext2_file_read_iter,
@@ -213,7 +230,7 @@ const struct file_operations ext2_file_operations = {
 	.compat_ioctl	= ext2_compat_ioctl,
 #endif
 	.mmap		= ext2_file_mmap,
-	.open		= dquot_file_open,
+	.open		= ext2_file_open,
 	.release	= ext2_release_file,
 	.fsync		= ext2_fsync,
 	.get_unmapped_area = thp_get_unmapped_area,
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 9c2028b50e5c..dd37d7f955bf 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1629,7 +1629,7 @@ static struct file_system_type ext2_fs_type = {
 	.name		= "ext2",
 	.mount		= ext2_mount,
 	.kill_sb	= kill_block_super,
-	.fs_flags	= FS_REQUIRES_DEV,
+	.fs_flags	= FS_REQUIRES_DEV|FS_WB_ERRSEQ,
 };
 MODULE_ALIAS_FS("ext2");
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v5 17/17] fs: convert ext2 to use write_one_page_since
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (15 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 16/17] ext2: convert to errseq_t based writeback error tracking Jeff Layton
@ 2017-05-31 12:45 ` Jeff Layton
  2017-05-31 20:27 ` [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Andrew Morton
  2017-06-02  5:25 ` Ross Zwisler
  18 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 12:45 UTC (permalink / raw)
  To: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet
  Cc: linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

Sample the wb_err before changing the directory, so that we can catch
errors that occur since that point.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/ext2/dir.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 6e476c9929f8..073f096ac5e6 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -85,7 +85,8 @@ ext2_last_byte(struct inode *inode, unsigned long page_nr)
 	return last_byte;
 }
 
-static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len)
+static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len,
+				errseq_t since)
 {
 	struct address_space *mapping = page->mapping;
 	struct inode *dir = mapping->host;
@@ -100,7 +101,7 @@ static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len)
 	}
 
 	if (IS_DIRSYNC(dir)) {
-		err = write_one_page(page);
+		err = write_one_page_since(page, since);
 		if (!err)
 			err = sync_inode_metadata(dir, 1);
 	} else {
@@ -462,13 +463,14 @@ void ext2_set_link(struct inode *dir, struct ext2_dir_entry_2 *de,
 			(char *) de - (char *) page_address(page);
 	unsigned len = ext2_rec_len_from_disk(de->rec_len);
 	int err;
+	errseq_t since = filemap_sample_wb_err(dir->i_mapping);
 
 	lock_page(page);
 	err = ext2_prepare_chunk(page, pos, len);
 	BUG_ON(err);
 	de->inode = cpu_to_le32(inode->i_ino);
 	ext2_set_de_type(de, inode);
-	err = ext2_commit_chunk(page, pos, len);
+	err = ext2_commit_chunk(page, pos, len, since);
 	ext2_put_page(page);
 	if (update_times)
 		dir->i_mtime = dir->i_ctime = current_time(dir);
@@ -494,6 +496,7 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 	char *kaddr;
 	loff_t pos;
 	int err;
+	errseq_t since = filemap_sample_wb_err(dir->i_mapping);
 
 	/*
 	 * We take care of directory expansion in the same loop.
@@ -560,7 +563,7 @@ int ext2_add_link (struct dentry *dentry, struct inode *inode)
 	memcpy(de->name, name, namelen);
 	de->inode = cpu_to_le32(inode->i_ino);
 	ext2_set_de_type (de, inode);
-	err = ext2_commit_chunk(page, pos, rec_len);
+	err = ext2_commit_chunk(page, pos, rec_len, since);
 	dir->i_mtime = dir->i_ctime = current_time(dir);
 	EXT2_I(dir)->i_flags &= ~EXT2_BTREE_FL;
 	mark_inode_dirty(dir);
@@ -589,6 +592,7 @@ int ext2_delete_entry (struct ext2_dir_entry_2 * dir, struct page * page )
 	ext2_dirent * pde = NULL;
 	ext2_dirent * de = (ext2_dirent *) (kaddr + from);
 	int err;
+	errseq_t since = filemap_sample_wb_err(inode->i_mapping);
 
 	while ((char*)de < (char*)dir) {
 		if (de->rec_len == 0) {
@@ -609,7 +613,7 @@ int ext2_delete_entry (struct ext2_dir_entry_2 * dir, struct page * page )
 	if (pde)
 		pde->rec_len = ext2_rec_len_to_disk(to - from);
 	dir->inode = 0;
-	err = ext2_commit_chunk(page, pos, to - from);
+	err = ext2_commit_chunk(page, pos, to - from, since);
 	inode->i_ctime = inode->i_mtime = current_time(inode);
 	EXT2_I(inode)->i_flags &= ~EXT2_BTREE_FL;
 	mark_inode_dirty(inode);
@@ -628,6 +632,7 @@ int ext2_make_empty(struct inode *inode, struct inode *parent)
 	struct ext2_dir_entry_2 * de;
 	int err;
 	void *kaddr;
+	errseq_t since = filemap_sample_wb_err(inode->i_mapping);
 
 	if (!page)
 		return -ENOMEM;
@@ -653,7 +658,7 @@ int ext2_make_empty(struct inode *inode, struct inode *parent)
 	memcpy (de->name, "..\0", 4);
 	ext2_set_de_type (de, inode);
 	kunmap_atomic(kaddr);
-	err = ext2_commit_chunk(page, 0, chunk_size);
+	err = ext2_commit_chunk(page, 0, chunk_size, since);
 fail:
 	put_page(page);
 	return err;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (16 preceding siblings ...)
  2017-05-31 12:45 ` [PATCH v5 17/17] fs: convert ext2 to use write_one_page_since Jeff Layton
@ 2017-05-31 20:27 ` Andrew Morton
  2017-05-31 21:31     ` Jeff Layton
  2017-06-02  5:25 ` Ross Zwisler
  18 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2017-05-31 20:27 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, Jan Kara, tytso, axboe, mawilcox, ross.zwisler, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@redhat.com> wrote:

> This is v5 of the patchset to improve how we're tracking and reporting
> errors that occur during pagecache writeback.

I'm curious to know how you've been testing this?  Is that testing
strong enough for us to be confident that all nature of I/O errors
will be reported to userspace?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
  2017-05-31 20:27 ` [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Andrew Morton
@ 2017-05-31 21:31     ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 21:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kara, tytso, axboe, mawilcox, ross.zwisler, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > This is v5 of the patchset to improve how we're tracking and reporting
> > errors that occur during pagecache writeback.
> 
> I'm curious to know how you've been testing this?

>  Is that testing
> strong enough for us to be confident that all nature of I/O errors
> will be reported to userspace?
> 

That's a tall order. This is a difficult thing to test as these sorts of
errors are pretty rare by nature.

I have an xfstest that I posted just after this set that demonstrates
that it works correctly, at least on ext2/3/4 when run by the ext4
driver (ext2 legacy driver reports too many errors currently). I had
btrfs and xfs working on that test too in an earlier incarnation of this
set, so I think we can fix this in them as well without too much
difficulty.

I'm happy to run other tests if someone wants to suggest them.

Now, all that said, I don't think this will make things any worse than
they are today as far as reporting errors properly to userland goes.
It's rather easy for an incidental synchronous writeback request from an
internal caller to clear the AS_* flags today. This will at least ensure
that we're reporting errors since a well-defined point in time when you
call fsync.
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
@ 2017-05-31 21:31     ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 21:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kara, tytso, axboe, mawilcox, ross.zwisler, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > This is v5 of the patchset to improve how we're tracking and reporting
> > errors that occur during pagecache writeback.
> 
> I'm curious to know how you've been testing this?

>  Is that testing
> strong enough for us to be confident that all nature of I/O errors
> will be reported to userspace?
> 

That's a tall order. This is a difficult thing to test as these sorts of
errors are pretty rare by nature.

I have an xfstest that I posted just after this set that demonstrates
that it works correctly, at least on ext2/3/4 when run by the ext4
driver (ext2 legacy driver reports too many errors currently). I had
btrfs and xfs working on that test too in an earlier incarnation of this
set, so I think we can fix this in them as well without too much
difficulty.

I'm happy to run other tests if someone wants to suggest them.

Now, all that said, I don't think this will make things any worse than
they are today as far as reporting errors properly to userland goes.
It's rather easy for an incidental synchronous writeback request from an
internal caller to clear the AS_* flags today. This will at least ensure
that we're reporting errors since a well-defined point in time when you
call fsync.
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
  2017-05-31 21:31     ` Jeff Layton
  (?)
@ 2017-05-31 21:37     ` Andrew Morton
  2017-05-31 22:01         ` Jeff Layton
  -1 siblings, 1 reply; 30+ messages in thread
From: Andrew Morton @ 2017-05-31 21:37 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Al Viro, Jan Kara, tytso, axboe, mawilcox, ross.zwisler, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton <jlayton@redhat.com> wrote:

> On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> > 
> > > This is v5 of the patchset to improve how we're tracking and reporting
> > > errors that occur during pagecache writeback.
> > 
> > I'm curious to know how you've been testing this?
> 
> >  Is that testing
> > strong enough for us to be confident that all nature of I/O errors
> > will be reported to userspace?
> > 
> 
> That's a tall order. This is a difficult thing to test as these sorts of
> errors are pretty rare by nature.
> 
> I have an xfstest that I posted just after this set that demonstrates
> that it works correctly, at least on ext2/3/4 when run by the ext4
> driver (ext2 legacy driver reports too many errors currently). I had
> btrfs and xfs working on that test too in an earlier incarnation of this
> set, so I think we can fix this in them as well without too much
> difficulty.
> 
> I'm happy to run other tests if someone wants to suggest them.
> 
> Now, all that said, I don't think this will make things any worse than
> they are today as far as reporting errors properly to userland goes.
> It's rather easy for an incidental synchronous writeback request from an
> internal caller to clear the AS_* flags today. This will at least ensure
> that we're reporting errors since a well-defined point in time when you
> call fsync.

Were you using error injection of some form?  If so, how was that all
set up?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
  2017-05-31 21:37     ` Andrew Morton
@ 2017-05-31 22:01         ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 22:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kara, tytso, axboe, mawilcox, ross.zwisler, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Wed, 2017-05-31 at 14:37 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> > > 
> > > > This is v5 of the patchset to improve how we're tracking and reporting
> > > > errors that occur during pagecache writeback.
> > > 
> > > I'm curious to know how you've been testing this?
> > >  Is that testing
> > > strong enough for us to be confident that all nature of I/O errors
> > > will be reported to userspace?
> > > 
> > 
> > That's a tall order. This is a difficult thing to test as these sorts of
> > errors are pretty rare by nature.
> > 
> > I have an xfstest that I posted just after this set that demonstrates
> > that it works correctly, at least on ext2/3/4 when run by the ext4
> > driver (ext2 legacy driver reports too many errors currently). I had
> > btrfs and xfs working on that test too in an earlier incarnation of this
> > set, so I think we can fix this in them as well without too much
> > difficulty.
> > 
> > I'm happy to run other tests if someone wants to suggest them.
> > 
> > Now, all that said, I don't think this will make things any worse than
> > they are today as far as reporting errors properly to userland goes.
> > It's rather easy for an incidental synchronous writeback request from an
> > internal caller to clear the AS_* flags today. This will at least ensure
> > that we're reporting errors since a well-defined point in time when you
> > call fsync.
> 
> Were you using error injection of some form?  If so, how was that all
> set up?
> 

Yes, it uses dm-error for fault injection.

The test basically does:

1) set up a dm-error device in a working configuration

2) build a scratch filesystem on it, with the log on a different device
in some fashion so metadata writeback will still succeed.

3) open the same file several times

4) flip dm-error device to non-working mode

5) write to each fd

6) fsync each fd

...do you get back an error on each fsync?

It then does a bit more to make sure they're cleared afterward as you'd
expect. That works for most block device based filesystems. I also have
a second xfstest that opens a block device and does the same basic
thing. That also works correctly with this patch series.

I still need to come up with a way to simulate errors on other fs'
though. We may need to plumb in some kernel-level fault injection on
some fs' to do that correctly. Suggestions welcome there.

With this series though, the idea is to convert one filesystem at a
time, so I think that should help mitigate some of the risk.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
@ 2017-05-31 22:01         ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-05-31 22:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Jan Kara, tytso, axboe, mawilcox, ross.zwisler, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Wed, 2017-05-31 at 14:37 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> 
> > On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@redhat.com> wrote:
> > > 
> > > > This is v5 of the patchset to improve how we're tracking and reporting
> > > > errors that occur during pagecache writeback.
> > > 
> > > I'm curious to know how you've been testing this?
> > >  Is that testing
> > > strong enough for us to be confident that all nature of I/O errors
> > > will be reported to userspace?
> > > 
> > 
> > That's a tall order. This is a difficult thing to test as these sorts of
> > errors are pretty rare by nature.
> > 
> > I have an xfstest that I posted just after this set that demonstrates
> > that it works correctly, at least on ext2/3/4 when run by the ext4
> > driver (ext2 legacy driver reports too many errors currently). I had
> > btrfs and xfs working on that test too in an earlier incarnation of this
> > set, so I think we can fix this in them as well without too much
> > difficulty.
> > 
> > I'm happy to run other tests if someone wants to suggest them.
> > 
> > Now, all that said, I don't think this will make things any worse than
> > they are today as far as reporting errors properly to userland goes.
> > It's rather easy for an incidental synchronous writeback request from an
> > internal caller to clear the AS_* flags today. This will at least ensure
> > that we're reporting errors since a well-defined point in time when you
> > call fsync.
> 
> Were you using error injection of some form?  If so, how was that all
> set up?
> 

Yes, it uses dm-error for fault injection.

The test basically does:

1) set up a dm-error device in a working configuration

2) build a scratch filesystem on it, with the log on a different device
in some fashion so metadata writeback will still succeed.

3) open the same file several times

4) flip dm-error device to non-working mode

5) write to each fd

6) fsync each fd

...do you get back an error on each fsync?

It then does a bit more to make sure they're cleared afterward as you'd
expect. That works for most block device based filesystems. I also have
a second xfstest that opens a block device and does the same basic
thing. That also works correctly with this patch series.

I still need to come up with a way to simulate errors on other fs'
though. We may need to plumb in some kernel-level fault injection on
some fs' to do that correctly. Suggestions welcome there.

With this series though, the idea is to convert one filesystem at a
time, so I think that should help mitigate some of the risk.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
  2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
                   ` (17 preceding siblings ...)
  2017-05-31 20:27 ` [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Andrew Morton
@ 2017-06-02  5:25 ` Ross Zwisler
  2017-06-02 10:07     ` Jeff Layton
  18 siblings, 1 reply; 30+ messages in thread
From: Ross Zwisler @ 2017-06-02  5:25 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet, linux-ext4, linux-fsdevel, linux-kernel,
	linux-block, linux-doc

On Wed, May 31, 2017 at 08:45:23AM -0400, Jeff Layton wrote:
> v5: don't retrofit old API over the new infrastructure
>     add fstype flag to indicate how wb errors are tracked within that fs
>     add more function variants that take a errseq_t "since" value
>     add second errseq_t to struct file to track metadata wb errors
>     convert ext4 and ext2 to use the new APIs
> 
> v4: several more cleanup patches
>     documentation and kerneldoc comment updates
>     fix bugs in gfs2 patches
>     make sync_file_range use same error reporting semantics
>     bugfixes in buffer.c
>     convert nfs to new scheme (maybe bogus, can be dropped)
> 
> v3: wb_err_t -> errseq_t conversion
>     clean up places that re-set errors after calling filemap_* functions
> 
> v2: introduce wb_err_t, use atomics
> 
> This is v5 of the patchset to improve how we're tracking and reporting
> errors that occur during pagecache writeback. The main difference in
> this set from the last one is that I've stopped trying to retrofit the
> old error tracking API on top of the new one. This is more work since
> we'll have to touch each fs individually, but should be safer as the
> "since" values used for checking errors will be more deliberate.
> 
> There are several situations where the kernel can "lose" errors that
> occur during writeback, such that fsync will return success even
> though it failed to write back some data previously. The basic idea
> here is to have the kernel be more deliberate about the point from
> which errors are checked to ensure that that doesn't happen.
> 
> An additional aim of this set is to change the behavior of fsync in
> Linux to report writeback errors on all fds instead of just the first
> one. This allows writers to reliably tell whether their data made it to
> the backing device without having to coordinate fsync calls with other
> writers.
> 
> To do this, we add a new typedef: errseq_t. This is a 32-bit value
> that can store an error code, and a sequence number so we can tell
> whether it has changed since we last sampled it. This allows us to
> record errors in the address_space and then report those errors only
> once per file description.
> 
> This set just alters block device files, ext4 and the legacy ext2
> driver. If this general approach seems acceptable, then I'll start
> converting other filesystems in follow-on patchsets. I'd also like
> to get this into linux-next as soon as possible to ensure that we're
> banging out any bugs that might be lurking here.
> 
> I also have a couple of xfstests for this as well that I'll re-post
> soon.

Can you tell me a baseline that this applies cleanly to, or give me a link to
a tree with these patches already applied?  I've tried applying it to v4.11,
linux/master and mmots/master, and so far nothing has worked.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
  2017-06-02  5:25 ` Ross Zwisler
@ 2017-06-02 10:07     ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-06-02 10:07 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Thu, 2017-06-01 at 23:25 -0600, Ross Zwisler wrote:
> On Wed, May 31, 2017 at 08:45:23AM -0400, Jeff Layton wrote:
> > v5: don't retrofit old API over the new infrastructure
> >     add fstype flag to indicate how wb errors are tracked within that fs
> >     add more function variants that take a errseq_t "since" value
> >     add second errseq_t to struct file to track metadata wb errors
> >     convert ext4 and ext2 to use the new APIs
> > 
> > v4: several more cleanup patches
> >     documentation and kerneldoc comment updates
> >     fix bugs in gfs2 patches
> >     make sync_file_range use same error reporting semantics
> >     bugfixes in buffer.c
> >     convert nfs to new scheme (maybe bogus, can be dropped)
> > 
> > v3: wb_err_t -> errseq_t conversion
> >     clean up places that re-set errors after calling filemap_* functions
> > 
> > v2: introduce wb_err_t, use atomics
> > 
> > This is v5 of the patchset to improve how we're tracking and reporting
> > errors that occur during pagecache writeback. The main difference in
> > this set from the last one is that I've stopped trying to retrofit the
> > old error tracking API on top of the new one. This is more work since
> > we'll have to touch each fs individually, but should be safer as the
> > "since" values used for checking errors will be more deliberate.
> > 
> > There are several situations where the kernel can "lose" errors that
> > occur during writeback, such that fsync will return success even
> > though it failed to write back some data previously. The basic idea
> > here is to have the kernel be more deliberate about the point from
> > which errors are checked to ensure that that doesn't happen.
> > 
> > An additional aim of this set is to change the behavior of fsync in
> > Linux to report writeback errors on all fds instead of just the first
> > one. This allows writers to reliably tell whether their data made it to
> > the backing device without having to coordinate fsync calls with other
> > writers.
> > 
> > To do this, we add a new typedef: errseq_t. This is a 32-bit value
> > that can store an error code, and a sequence number so we can tell
> > whether it has changed since we last sampled it. This allows us to
> > record errors in the address_space and then report those errors only
> > once per file description.
> > 
> > This set just alters block device files, ext4 and the legacy ext2
> > driver. If this general approach seems acceptable, then I'll start
> > converting other filesystems in follow-on patchsets. I'd also like
> > to get this into linux-next as soon as possible to ensure that we're
> > banging out any bugs that might be lurking here.
> > 
> > I also have a couple of xfstests for this as well that I'll re-post
> > soon.
> 
> Can you tell me a baseline that this applies cleanly to, or give me a link to
> a tree with these patches already applied?  I've tried applying it to v4.11,
> linux/master and mmots/master, and so far nothing has worked.

It's basically on top of v4.12-rc3, but it may not apply cleanly
without the pile of individual patches that I sent recently.

It may be best to just pull down the "wberr" branch from my tree here:

    git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git

I was originally sending the prep patches as part of this series, but
maintainers weren't picking them up, so I moved to sending them
individually and then sending this pile as its own set.

Many thanks for giving this a look and testing it!
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it
@ 2017-06-02 10:07     ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-06-02 10:07 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Thu, 2017-06-01 at 23:25 -0600, Ross Zwisler wrote:
> On Wed, May 31, 2017 at 08:45:23AM -0400, Jeff Layton wrote:
> > v5: don't retrofit old API over the new infrastructure
> >     add fstype flag to indicate how wb errors are tracked within that fs
> >     add more function variants that take a errseq_t "since" value
> >     add second errseq_t to struct file to track metadata wb errors
> >     convert ext4 and ext2 to use the new APIs
> > 
> > v4: several more cleanup patches
> >     documentation and kerneldoc comment updates
> >     fix bugs in gfs2 patches
> >     make sync_file_range use same error reporting semantics
> >     bugfixes in buffer.c
> >     convert nfs to new scheme (maybe bogus, can be dropped)
> > 
> > v3: wb_err_t -> errseq_t conversion
> >     clean up places that re-set errors after calling filemap_* functions
> > 
> > v2: introduce wb_err_t, use atomics
> > 
> > This is v5 of the patchset to improve how we're tracking and reporting
> > errors that occur during pagecache writeback. The main difference in
> > this set from the last one is that I've stopped trying to retrofit the
> > old error tracking API on top of the new one. This is more work since
> > we'll have to touch each fs individually, but should be safer as the
> > "since" values used for checking errors will be more deliberate.
> > 
> > There are several situations where the kernel can "lose" errors that
> > occur during writeback, such that fsync will return success even
> > though it failed to write back some data previously. The basic idea
> > here is to have the kernel be more deliberate about the point from
> > which errors are checked to ensure that that doesn't happen.
> > 
> > An additional aim of this set is to change the behavior of fsync in
> > Linux to report writeback errors on all fds instead of just the first
> > one. This allows writers to reliably tell whether their data made it to
> > the backing device without having to coordinate fsync calls with other
> > writers.
> > 
> > To do this, we add a new typedef: errseq_t. This is a 32-bit value
> > that can store an error code, and a sequence number so we can tell
> > whether it has changed since we last sampled it. This allows us to
> > record errors in the address_space and then report those errors only
> > once per file description.
> > 
> > This set just alters block device files, ext4 and the legacy ext2
> > driver. If this general approach seems acceptable, then I'll start
> > converting other filesystems in follow-on patchsets. I'd also like
> > to get this into linux-next as soon as possible to ensure that we're
> > banging out any bugs that might be lurking here.
> > 
> > I also have a couple of xfstests for this as well that I'll re-post
> > soon.
> 
> Can you tell me a baseline that this applies cleanly to, or give me a link to
> a tree with these patches already applied?  I've tried applying it to v4.11,
> linux/master and mmots/master, and so far nothing has worked.

It's basically on top of v4.12-rc3, but it may not apply cleanly
without the pile of individual patches that I sent recently.

It may be best to just pull down the "wberr" branch from my tree here:

    git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git

I was originally sending the prep patches as part of this series, but
maintainers weren't picking them up, so I moved to sending them
individually and then sending this pile as its own set.

Many thanks for giving this a look and testing it!
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 08/17] dax: set errors in mapping when writeback fails
  2017-05-31 12:45 ` [PATCH v5 08/17] dax: set errors in mapping when writeback fails Jeff Layton
@ 2017-06-06  1:01   ` Ross Zwisler
  2017-06-06  1:08       ` Jeff Layton
  0 siblings, 1 reply; 30+ messages in thread
From: Ross Zwisler @ 2017-06-06  1:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox,
	ross.zwisler, corbet, linux-ext4, linux-fsdevel, linux-kernel,
	linux-block, linux-doc

On Wed, May 31, 2017 at 08:45:31AM -0400, Jeff Layton wrote:
> Jan's description for this patch is much better than mine, so I'm
> quoting it verbatim here:
> 
> -----------------8<-----------------
> DAX currently doesn't set errors in the mapping when cache flushing
> fails in dax_writeback_mapping_range(). Since this function can get
> called only from fsync(2) or sync(2), this is actually as good as it can
> currently get since we correctly propagate the error up from
> dax_writeback_mapping_range() to filemap_fdatawrite()
> 
> However, in the future better writeback error handling will enable us to
> properly report these errors on fsync(2) even if there are multiple file
> descriptors open against the file or if sync(2) gets called before
> fsync(2). So convert DAX to using standard error reporting through the
> mapping.
> -----------------8<-----------------
> 
> For now, only do this when the FS_WB_ERRSEQ flag is set. The
> AS_EIO/AS_ENOSPC flags are not currently cleared in the older code when
> writeback initiation fails, only when we discover an error after waiting
> on writeback to complete, so we only want to do this with errseq_t based
> error handling to prevent seeing duplicate errors on fsync.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-and-Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>

Re-tested this version of the series with some injected DAX errors, and it
looks good.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 08/17] dax: set errors in mapping when writeback fails
  2017-06-06  1:01   ` Ross Zwisler
@ 2017-06-06  1:08       ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-06-06  1:08 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Mon, 2017-06-05 at 19:01 -0600, Ross Zwisler wrote:
> On Wed, May 31, 2017 at 08:45:31AM -0400, Jeff Layton wrote:
> > Jan's description for this patch is much better than mine, so I'm
> > quoting it verbatim here:
> > 
> > -----------------8<-----------------
> > DAX currently doesn't set errors in the mapping when cache flushing
> > fails in dax_writeback_mapping_range(). Since this function can get
> > called only from fsync(2) or sync(2), this is actually as good as it can
> > currently get since we correctly propagate the error up from
> > dax_writeback_mapping_range() to filemap_fdatawrite()
> > 
> > However, in the future better writeback error handling will enable us to
> > properly report these errors on fsync(2) even if there are multiple file
> > descriptors open against the file or if sync(2) gets called before
> > fsync(2). So convert DAX to using standard error reporting through the
> > mapping.
> > -----------------8<-----------------
> > 
> > For now, only do this when the FS_WB_ERRSEQ flag is set. The
> > AS_EIO/AS_ENOSPC flags are not currently cleared in the older code when
> > writeback initiation fails, only when we discover an error after waiting
> > on writeback to complete, so we only want to do this with errseq_t based
> > error handling to prevent seeing duplicate errors on fsync.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-and-Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Re-tested this version of the series with some injected DAX errors, and it
> looks good.

Excellent! Thanks very much for helping test it.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v5 08/17] dax: set errors in mapping when writeback fails
@ 2017-06-06  1:08       ` Jeff Layton
  0 siblings, 0 replies; 30+ messages in thread
From: Jeff Layton @ 2017-06-06  1:08 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Andrew Morton, Al Viro, Jan Kara, tytso, axboe, mawilcox, corbet,
	linux-ext4, linux-fsdevel, linux-kernel, linux-block, linux-doc

On Mon, 2017-06-05 at 19:01 -0600, Ross Zwisler wrote:
> On Wed, May 31, 2017 at 08:45:31AM -0400, Jeff Layton wrote:
> > Jan's description for this patch is much better than mine, so I'm
> > quoting it verbatim here:
> > 
> > -----------------8<-----------------
> > DAX currently doesn't set errors in the mapping when cache flushing
> > fails in dax_writeback_mapping_range(). Since this function can get
> > called only from fsync(2) or sync(2), this is actually as good as it can
> > currently get since we correctly propagate the error up from
> > dax_writeback_mapping_range() to filemap_fdatawrite()
> > 
> > However, in the future better writeback error handling will enable us to
> > properly report these errors on fsync(2) even if there are multiple file
> > descriptors open against the file or if sync(2) gets called before
> > fsync(2). So convert DAX to using standard error reporting through the
> > mapping.
> > -----------------8<-----------------
> > 
> > For now, only do this when the FS_WB_ERRSEQ flag is set. The
> > AS_EIO/AS_ENOSPC flags are not currently cleared in the older code when
> > writeback initiation fails, only when we discover an error after waiting
> > on writeback to complete, so we only want to do this with errseq_t based
> > error handling to prevent seeing duplicate errors on fsync.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-and-Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Re-tested this version of the series with some injected DAX errors, and it
> looks good.

Excellent! Thanks very much for helping test it.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-06-06  1:08 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31 12:45 [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Jeff Layton
2017-05-31 12:45 ` [PATCH v5 01/17] lib: add errseq_t type and infrastructure for handling it Jeff Layton
2017-05-31 12:45 ` [PATCH v5 02/17] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-05-31 12:45 ` [PATCH v5 03/17] mm: tracepoints for writeback error events Jeff Layton
2017-05-31 12:45 ` [PATCH v5 04/17] fs: add a new fstype flag to indicate how writeback errors are tracked Jeff Layton
2017-05-31 12:45 ` [PATCH v5 05/17] Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors Jeff Layton
2017-05-31 12:45 ` [PATCH v5 06/17] fs: adapt sync_file_range to new reporting infrastructure Jeff Layton
2017-05-31 12:45 ` [PATCH v5 07/17] mm: add filemap_fdatawait_range_since and filemap_write_and_wait_range_since Jeff Layton
2017-05-31 12:45 ` [PATCH v5 08/17] dax: set errors in mapping when writeback fails Jeff Layton
2017-06-06  1:01   ` Ross Zwisler
2017-06-06  1:08     ` Jeff Layton
2017-06-06  1:08       ` Jeff Layton
2017-05-31 12:45 ` [PATCH v5 09/17] block: convert to errseq_t based writeback error tracking Jeff Layton
2017-05-31 12:45 ` [PATCH v5 10/17] block: add sync_blockdev_since and sync_filesystem_since Jeff Layton
2017-05-31 12:45 ` [PATCH v5 11/17] fs: add f_md_wb_err field to struct file for tracking metadata errors Jeff Layton
2017-05-31 12:45 ` [PATCH v5 12/17] fs: allow __generic_file_fsync to support both flavors of error reporting Jeff Layton
2017-05-31 12:45 ` [PATCH v5 13/17] jbd2: conditionally handle errors using errseq_t based on FS_WB_ERRSEQ flag Jeff Layton
2017-05-31 12:45 ` [PATCH v5 14/17] ext4: convert to errseq_t based error tracking Jeff Layton
2017-05-31 12:45 ` [PATCH v5 15/17] fs: add a write_one_page_since Jeff Layton
2017-05-31 12:45 ` [PATCH v5 16/17] ext2: convert to errseq_t based writeback error tracking Jeff Layton
2017-05-31 12:45 ` [PATCH v5 17/17] fs: convert ext2 to use write_one_page_since Jeff Layton
2017-05-31 20:27 ` [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it Andrew Morton
2017-05-31 21:31   ` Jeff Layton
2017-05-31 21:31     ` Jeff Layton
2017-05-31 21:37     ` Andrew Morton
2017-05-31 22:01       ` Jeff Layton
2017-05-31 22:01         ` Jeff Layton
2017-06-02  5:25 ` Ross Zwisler
2017-06-02 10:07   ` Jeff Layton
2017-06-02 10:07     ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.