linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo()
@ 2018-06-04 18:02 Jeff Layton
  2018-06-04 18:03 ` [PATCH 1/5] vfs: track per-sb writeback errors Jeff Layton
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Jeff Layton @ 2018-06-04 18:02 UTC (permalink / raw)
  To: viro, dhowells; +Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel

From: Jeff Layton <jlayton@redhat.com>

This is a reworked (and more modest) version of the syncfs rework that I
posted recently. Given that we have a new fsinfo syscall being
introduced, we may as well use it to report writeback errors on a per
superblock basis. This allows us to provide the info that the PostgreSQL
developers wanted, without needing to change an existing interface.

We may still want to change syncfs to report writeback errors at some
point, but for now, I think this is a safer path to presenting this
info.

This seems to do the right thing when tested by hand, but I don't yet
have an xfstest for it, since the syscall is still quite new. Once that
goes in and we get fsinfo support in xfs_io, it should be rather
trivial to modify the test I wrote for the syncfs rework.

Jeff Layton (5):
  vfs: track per-sb writeback errors
  buffer: record blockdev write errors in super_block that backs them
  errseq: add a new errseq_scrape function
  vfs: allow fsinfo to fetch the current state of s_wb_err
  samples: extend test-fsinfo to access error_state

 fs/buffer.c                 |  2 ++
 fs/statfs.c                 |  9 +++++++++
 include/linux/errseq.h      |  1 +
 include/linux/fs.h          |  3 +++
 include/linux/pagemap.h     |  5 ++++-
 include/uapi/linux/fsinfo.h | 11 +++++++++++
 lib/errseq.c                | 33 +++++++++++++++++++++++++++++++--
 samples/statx/test-fsinfo.c | 13 +++++++++++++
 8 files changed, 74 insertions(+), 3 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/5] vfs: track per-sb writeback errors
  2018-06-04 18:02 [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo() Jeff Layton
@ 2018-06-04 18:03 ` Jeff Layton
  2018-06-04 18:03 ` [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them Jeff Layton
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2018-06-04 18:03 UTC (permalink / raw)
  To: viro, dhowells; +Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel

From: Jeff Layton <jlayton@redhat.com>

Keep a per-sb errseq_t that tracks whether there have been any writeback
errors on this superblock. When an error is recorded for an inode via
mapping_set_error, record it in s_wb_err as well.

Cc: Andres Freund <andres@anarazel.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h      | 3 +++
 include/linux/pagemap.h | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 79f98ed39a18..ccdcde9e185b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1429,6 +1429,9 @@ struct super_block {
 	/* Being remounted read-only */
 	int s_readonly_remount;
 
+	/* per-sb errseq_t for reporting writeback errors via syncfs */
+	errseq_t s_wb_err;
+
 	/* AIO completions deferred from interrupt context */
 	struct workqueue_struct *s_dio_done_wq;
 	struct hlist_head s_pins;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index b1bd2186e6d2..2de87c5a2718 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -51,7 +51,10 @@ static inline void mapping_set_error(struct address_space *mapping, int error)
 		return;
 
 	/* Record in wb_err for checkers using errseq_t based tracking */
-	filemap_set_wb_err(mapping, error);
+	__filemap_set_wb_err(mapping, error);
+
+	/* Record it in superblock */
+	errseq_set(&mapping->host->i_sb->s_wb_err, error);
 
 	/* Record it in flags for now, for legacy callers */
 	if (error == -ENOSPC)
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them
  2018-06-04 18:02 [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo() Jeff Layton
  2018-06-04 18:03 ` [PATCH 1/5] vfs: track per-sb writeback errors Jeff Layton
@ 2018-06-04 18:03 ` Jeff Layton
  2018-06-06 15:56   ` Jeff Layton
  2018-06-04 18:03 ` [PATCH 3/5] errseq: add a new errseq_scrape function Jeff Layton
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Jeff Layton @ 2018-06-04 18:03 UTC (permalink / raw)
  To: viro, dhowells; +Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel

From: Jeff Layton <jlayton@redhat.com>

When syncing out a block device (a'la __sync_blockdev), any error
encountered will only be recorded in the bd_inode's mapping. When the
blockdev contains a filesystem however, we'd like to also record the
error in the super_block that's stored there.

Make mark_buffer_write_io_error also record the error in the
corresponding super_block when a writeback error occurs and the block
device contains a mounted superblock.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/buffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 249b83fafe48..dae2a857d5bc 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
 		mapping_set_error(bh->b_page->mapping, -EIO);
 	if (bh->b_assoc_map)
 		mapping_set_error(bh->b_assoc_map, -EIO);
+	if (bh->b_bdev->bd_super)
+		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
 }
 EXPORT_SYMBOL(mark_buffer_write_io_error);
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/5] errseq: add a new errseq_scrape function
  2018-06-04 18:02 [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo() Jeff Layton
  2018-06-04 18:03 ` [PATCH 1/5] vfs: track per-sb writeback errors Jeff Layton
  2018-06-04 18:03 ` [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them Jeff Layton
@ 2018-06-04 18:03 ` Jeff Layton
  2018-06-04 18:03 ` [PATCH 4/5] vfs: allow fsinfo to fetch the current state of s_wb_err Jeff Layton
  2018-06-04 18:03 ` [PATCH 5/5] samples: extend test-fsinfo to access error_state Jeff Layton
  4 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2018-06-04 18:03 UTC (permalink / raw)
  To: viro, dhowells; +Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel

From: Jeff Layton <jlayton@redhat.com>

To grab the current value of an errseq_t, mark it as seen and then
return the value with the seen bit masked off.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/errseq.h |  1 +
 lib/errseq.c           | 33 +++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index fc2777770768..de165623fa86 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -9,6 +9,7 @@ typedef u32	errseq_t;
 
 errseq_t errseq_set(errseq_t *eseq, int err);
 errseq_t errseq_sample(errseq_t *eseq);
+errseq_t errseq_scrape(errseq_t *eseq);
 int errseq_check(errseq_t *eseq, errseq_t since);
 int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
 #endif
diff --git a/lib/errseq.c b/lib/errseq.c
index 81f9e33aa7e7..8ded0920eed3 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -108,7 +108,7 @@ errseq_t errseq_set(errseq_t *eseq, int err)
 EXPORT_SYMBOL(errseq_set);
 
 /**
- * errseq_sample() - Grab current errseq_t value.
+ * errseq_sample() - Grab current errseq_t value (or 0 if it hasn't been seen)
  * @eseq: Pointer to errseq_t to be sampled.
  *
  * This function allows callers to initialise their errseq_t variable.
@@ -117,7 +117,7 @@ EXPORT_SYMBOL(errseq_set);
  * see it the next time it checks for an error.
  *
  * Context: Any context.
- * Return: The current errseq value.
+ * Return: The current errseq value or 0 if it wasn't previously seen
  */
 errseq_t errseq_sample(errseq_t *eseq)
 {
@@ -130,6 +130,35 @@ errseq_t errseq_sample(errseq_t *eseq)
 }
 EXPORT_SYMBOL(errseq_sample);
 
+/**
+ * errseq_scrape() - Grab current errseq_t value
+ * @eseq: Pointer to errseq_t to be sampled.
+ *
+ * This function allows callers to scrape the current value of an errseq_t.
+ * Unlike errseq_sample, this will always return the current value with
+ * the SEEN flag unset, even when the value has not yet been seen.
+ *
+ * Context: Any context.
+ * Return: The current errseq value with ERRSEQ_SEEN masked off
+ */
+errseq_t errseq_scrape(errseq_t *eseq)
+{
+	errseq_t old = READ_ONCE(*eseq);
+
+	/*
+	 * For the common case of no errors ever having been set, we can skip
+	 * marking the SEEN bit. Once an error has been set, the value will
+	 * never go back to zero.
+	 */
+	if (old != 0) {
+		errseq_t new = old | ERRSEQ_SEEN;
+		if (old != new)
+			cmpxchg(eseq, old, new);
+	}
+	return old & ~ERRSEQ_SEEN;
+}
+EXPORT_SYMBOL(errseq_scrape);
+
 /**
  * errseq_check() - Has an error occurred since a particular sample point?
  * @eseq: Pointer to errseq_t value to be checked.
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/5] vfs: allow fsinfo to fetch the current state of s_wb_err
  2018-06-04 18:02 [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo() Jeff Layton
                   ` (2 preceding siblings ...)
  2018-06-04 18:03 ` [PATCH 3/5] errseq: add a new errseq_scrape function Jeff Layton
@ 2018-06-04 18:03 ` Jeff Layton
  2018-06-04 18:03 ` [PATCH 5/5] samples: extend test-fsinfo to access error_state Jeff Layton
  4 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2018-06-04 18:03 UTC (permalink / raw)
  To: viro, dhowells; +Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel

From: Jeff Layton <jlayton@redhat.com>

Add a new "error_state" struct to fsinfo, and teach the kernel to fill
that out from sb->s_wb_info. There are two fields:

wb_error_last: the most recently recorded errno for the filesystem

wb_error_cookie: this value will change vs. the previously fetched
                 value if a new error was recorded since it was last
		 checked

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/statfs.c                 |  9 +++++++++
 include/uapi/linux/fsinfo.h | 11 +++++++++++
 2 files changed, 20 insertions(+)

diff --git a/fs/statfs.c b/fs/statfs.c
index f996ab6af44f..df8f1a1166c0 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -576,6 +576,14 @@ static int fsinfo_generic_io_size(struct dentry *dentry,
 	return sizeof(*c);
 }
 
+static int fsinfo_generic_error_state(struct dentry *dentry,
+				  struct fsinfo_error_state *c)
+{
+	c->wb_error_cookie = errseq_scrape(&dentry->d_sb->s_wb_err);
+	c->wb_error_last = c->wb_error_cookie & MAX_ERRNO;
+	return sizeof(*c);
+}
+
 /*
  * Implement some queries generically from stuff in the superblock.
  */
@@ -594,6 +602,7 @@ int generic_fsinfo(struct dentry *dentry, struct fsinfo_kparams *params)
 	case _gen(volume_id);
 	case _gen(name_encoding);
 	case _gen(io_size);
+	case _gen(error_state);
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index a6758e71f0c7..a0c6e0e18e8a 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -35,6 +35,7 @@ enum fsinfo_attribute {
 	fsinfo_attr_name_encoding	= 17,	/* Filename encoding (string) */
 	fsinfo_attr_name_codepage	= 18,	/* Filename codepage (string) */
 	fsinfo_attr_io_size		= 19,	/* Optimal I/O sizes */
+	fsinfo_attr_error_state		= 20,	/* Error state */
 	fsinfo_attr__nr
 };
 
@@ -209,6 +210,16 @@ struct fsinfo_server_address {
 	struct __kernel_sockaddr_storage address;
 };
 
+/*
+ * Information struct for fsinfo(fsinfo_attr_error_state).
+ *
+ * Retrieve the error state for a filesystem.
+ */
+struct fsinfo_error_state {
+	__u32		wb_error_cookie;	/* writeback error cookie */
+	__u32		wb_error_last;		/* latest writeback error */
+};
+
 /*
  * Information struct for fsinfo(fsinfo_attr_io_size).
  *
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 5/5] samples: extend test-fsinfo to access error_state
  2018-06-04 18:02 [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo() Jeff Layton
                   ` (3 preceding siblings ...)
  2018-06-04 18:03 ` [PATCH 4/5] vfs: allow fsinfo to fetch the current state of s_wb_err Jeff Layton
@ 2018-06-04 18:03 ` Jeff Layton
  4 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2018-06-04 18:03 UTC (permalink / raw)
  To: viro, dhowells; +Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel

From: Jeff Layton <jlayton@redhat.com>

Add support for error_state struct to test-fsinfo sample program.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 samples/statx/test-fsinfo.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/samples/statx/test-fsinfo.c b/samples/statx/test-fsinfo.c
index 9d70c422da11..b383a6ec81a0 100644
--- a/samples/statx/test-fsinfo.c
+++ b/samples/statx/test-fsinfo.c
@@ -60,6 +60,7 @@ static const __u8 fsinfo_buffer_sizes[fsinfo_attr__nr] = {
 	FSINFO_STRING		(name_encoding),
 	FSINFO_STRING		(name_codepage),
 	FSINFO_STRUCT		(io_size),
+	FSINFO_STRUCT		(error_state),
 };
 
 #define FSINFO_NAME(N) [fsinfo_attr_##N] = #N
@@ -84,6 +85,7 @@ static const char *fsinfo_attr_names[fsinfo_attr__nr] = {
 	FSINFO_NAME(name_encoding),
 	FSINFO_NAME(name_codepage),
 	FSINFO_NAME(io_size),
+	FSINFO_NAME(error_state),
 };
 
 union reply {
@@ -98,6 +100,7 @@ union reply {
 	struct fsinfo_volume_uuid uuid;
 	struct fsinfo_server_address srv_addr;
 	struct fsinfo_io_size io_size;
+	struct fsinfo_error_state error_state;
 };
 
 static void dump_hex(unsigned int *data, int from, int to)
@@ -303,6 +306,15 @@ static void dump_attr_io_size(union reply *r, int size)
 	printf("bs=%u\n", f->block_size);
 }
 
+static void dump_attr_error_state(union reply *r, int size)
+{
+	struct fsinfo_error_state *f = &r->error_state;
+
+	printf("err_cookie=0x%x err_last=%u\n",
+			f->wb_error_cookie,
+			f->wb_error_last);
+}
+
 /*
  *
  */
@@ -320,6 +332,7 @@ static const dumper_t fsinfo_attr_dumper[fsinfo_attr__nr] = {
 	FSINFO_DUMPER(volume_uuid),
 	FSINFO_DUMPER(server_address),
 	FSINFO_DUMPER(io_size),
+	FSINFO_DUMPER(error_state),
 };
 
 static void dump_fsinfo(enum fsinfo_attribute attr, __u8 about,
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them
  2018-06-04 18:03 ` [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them Jeff Layton
@ 2018-06-06 15:56   ` Jeff Layton
  2018-06-19 10:40     ` Jeff Layton
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Layton @ 2018-06-06 15:56 UTC (permalink / raw)
  To: viro, dhowells, Jens Axboe
  Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel, linux-block

On Mon, 2018-06-04 at 14:03 -0400, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> When syncing out a block device (a'la __sync_blockdev), any error
> encountered will only be recorded in the bd_inode's mapping. When the
> blockdev contains a filesystem however, we'd like to also record the
> error in the super_block that's stored there.
> 
> Make mark_buffer_write_io_error also record the error in the
> corresponding super_block when a writeback error occurs and the block
> device contains a mounted superblock.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  fs/buffer.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 249b83fafe48..dae2a857d5bc 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
>  		mapping_set_error(bh->b_page->mapping, -EIO);
>  	if (bh->b_assoc_map)
>  		mapping_set_error(bh->b_assoc_map, -EIO);
> +	if (bh->b_bdev->bd_super)
> +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
>  }
>  EXPORT_SYMBOL(mark_buffer_write_io_error);
>  

(cc'ing linux-block and Jens)

I'm wondering whether this patch might turn out to be racy. For
instance, could a call to __sync_blockdev race with an unmount in such
a way that bd_super goes NULL after we check it but before errseq_set
is called?

If so, what can we do to ensure that that doesn't happen? Any insight
here would be appreciated.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them
  2018-06-06 15:56   ` Jeff Layton
@ 2018-06-19 10:40     ` Jeff Layton
  2018-06-19 13:03       ` Jeff Layton
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Layton @ 2018-06-19 10:40 UTC (permalink / raw)
  To: viro, dhowells, Jens Axboe, David Howells
  Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel, linux-block

On Wed, 2018-06-06 at 11:56 -0400, Jeff Layton wrote:
> On Mon, 2018-06-04 at 14:03 -0400, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > When syncing out a block device (a'la __sync_blockdev), any error
> > encountered will only be recorded in the bd_inode's mapping. When the
> > blockdev contains a filesystem however, we'd like to also record the
> > error in the super_block that's stored there.
> > 
> > Make mark_buffer_write_io_error also record the error in the
> > corresponding super_block when a writeback error occurs and the block
> > device contains a mounted superblock.
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  fs/buffer.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 249b83fafe48..dae2a857d5bc 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
> >  		mapping_set_error(bh->b_page->mapping, -EIO);
> >  	if (bh->b_assoc_map)
> >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > +	if (bh->b_bdev->bd_super)
> > +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
> >  }
> >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> >  
> 
> (cc'ing linux-block and Jens)
> 
> I'm wondering whether this patch might turn out to be racy. For
> instance, could a call to __sync_blockdev race with an unmount in such
> a way that bd_super goes NULL after we check it but before errseq_set
> is called?
> 
> If so, what can we do to ensure that that doesn't happen? Any insight
> here would be appreciated.
> 
> Thanks,

Jens, ping? I never got a response on the above.

After looking over it some more, I suspect that this may be racy with
some filesystems. Some of them seem to just flush out data to the
bd_inode on unmount, and trust the system to take care of the rest.

One possible fix there might be to turn bd_super into an RCU managed
pointer. We already free super_blocks under RCU, so we could do
something there like:

rcu_read_lock();
sb = rcu_dereference(bh->b_bdev->bd_super);
if (sb)
	errseq_set(&sb->s_wb_err, -EIO);
rcu_read_unlock();

There aren't that many accessors of bd_super, so that seems like it'd be
fairly simple to do.

Still, I'd like someone to sanity check me here. Is there something that
would prevent the above race that I'm not seeing?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them
  2018-06-19 10:40     ` Jeff Layton
@ 2018-06-19 13:03       ` Jeff Layton
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2018-06-19 13:03 UTC (permalink / raw)
  To: viro, dhowells, Jens Axboe, Theodore Ts'o
  Cc: willy, andres, cmaiolino, linux-kernel, linux-fsdevel, linux-block

On Tue, 2018-06-19 at 06:40 -0400, Jeff Layton wrote:
> On Wed, 2018-06-06 at 11:56 -0400, Jeff Layton wrote:
> > On Mon, 2018-06-04 at 14:03 -0400, Jeff Layton wrote:
> > > From: Jeff Layton <jlayton@redhat.com>
> > > 
> > > When syncing out a block device (a'la __sync_blockdev), any error
> > > encountered will only be recorded in the bd_inode's mapping. When the
> > > blockdev contains a filesystem however, we'd like to also record the
> > > error in the super_block that's stored there.
> > > 
> > > Make mark_buffer_write_io_error also record the error in the
> > > corresponding super_block when a writeback error occurs and the block
> > > device contains a mounted superblock.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > > ---
> > >  fs/buffer.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/fs/buffer.c b/fs/buffer.c
> > > index 249b83fafe48..dae2a857d5bc 100644
> > > --- a/fs/buffer.c
> > > +++ b/fs/buffer.c
> > > @@ -1117,6 +1117,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
> > >  		mapping_set_error(bh->b_page->mapping, -EIO);
> > >  	if (bh->b_assoc_map)
> > >  		mapping_set_error(bh->b_assoc_map, -EIO);
> > > +	if (bh->b_bdev->bd_super)
> > > +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
> > >  }
> > >  EXPORT_SYMBOL(mark_buffer_write_io_error);
> > >  
> > 
> > (cc'ing linux-block and Jens)
> > 
> > I'm wondering whether this patch might turn out to be racy. For
> > instance, could a call to __sync_blockdev race with an unmount in such
> > a way that bd_super goes NULL after we check it but before errseq_set
> > is called?
> > 
> > If so, what can we do to ensure that that doesn't happen? Any insight
> > here would be appreciated.
> > 
> > Thanks,
> 
> Jens, ping? I never got a response on the above.
> 
> After looking over it some more, I suspect that this may be racy with
> some filesystems. Some of them seem to just flush out data to the
> bd_inode on unmount, and trust the system to take care of the rest.
> 
> One possible fix there might be to turn bd_super into an RCU managed
> pointer. We already free super_blocks under RCU, so we could do
> something there like:
> 
> rcu_read_lock();
> sb = rcu_dereference(bh->b_bdev->bd_super);
> if (sb)
> 	errseq_set(&sb->s_wb_err, -EIO);
> rcu_read_unlock();
> 
> There aren't that many accessors of bd_super, so that seems like it'd be
> fairly simple to do.
> 
> Still, I'd like someone to sanity check me here. Is there something that
> would prevent the above race that I'm not seeing?
> 

(cc'ing Ted since he added blkdev_releasepage in 2009)

Corollary question:

What makes it safe to dereference bd_super in blkdev_releasepage?

bd_super can go NULL in kill_sb and eventually the super_block will be
freed. Is there a ToC/ToU race in that function?

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-06-19 13:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-04 18:02 [PATCH 0/5] vfs: track per-sb writeback errors and report them via fsinfo() Jeff Layton
2018-06-04 18:03 ` [PATCH 1/5] vfs: track per-sb writeback errors Jeff Layton
2018-06-04 18:03 ` [PATCH 2/5] buffer: record blockdev write errors in super_block that backs them Jeff Layton
2018-06-06 15:56   ` Jeff Layton
2018-06-19 10:40     ` Jeff Layton
2018-06-19 13:03       ` Jeff Layton
2018-06-04 18:03 ` [PATCH 3/5] errseq: add a new errseq_scrape function Jeff Layton
2018-06-04 18:03 ` [PATCH 4/5] vfs: allow fsinfo to fetch the current state of s_wb_err Jeff Layton
2018-06-04 18:03 ` [PATCH 5/5] samples: extend test-fsinfo to access error_state Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).