All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 RESEND 0/2] vfs: have syncfs() return error when there are writeback errors
@ 2020-04-14 12:04 Jeff Layton
  2020-04-14 12:04 ` [PATCH v4 RESEND 1/2] vfs: track per-sb writeback errors and report them to syncfs Jeff Layton
  2020-04-14 12:04 ` [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs Jeff Layton
  0 siblings, 2 replies; 7+ messages in thread
From: Jeff Layton @ 2020-04-14 12:04 UTC (permalink / raw)
  To: viro
  Cc: linux-fsdevel, linux-kernel, linux-api, andres, willy, dhowells,
	hch, jack, akpm, david

I sent the original v4 set on February 13th. There have been no changes
since then (other than a clean rebase onto master). I'd like to see this
go into v5.8 if this looks reasonable. Original v4 cover letter follows:

-----------------8<---------------

v4:
- switch to dedicated errseq_t cursor in struct file for syncfs
- drop ioctl for fetching the errseq_t without syncing

This is the fourth posting of this patchset. After thinking about it
more, I think multiplexing file->f_wb_err based on O_PATH open is just
too weird. I think it'd be better if syncfs() "just worked" as expected
no matter what sort of fd you use, or how you multiplex it with fsync.

Also (at least on x86_64) there is currently a 4 byte pad at the end of
the struct so this doesn't end up growing the memory utilization anyway.
Does anyone object to doing this?

I've also dropped the ioctl patch. I have a draft patch to expose that
via fsinfo, but that functionality is really separate from returning an
error to syncfs. We can look at that after the syncfs piece is settled.

Jeff Layton (2):
  vfs: track per-sb writeback errors and report them to syncfs
  buffer: record blockdev write errors in super_block that it backs

 drivers/dax/device.c    |  1 +
 fs/buffer.c             |  2 ++
 fs/file_table.c         |  1 +
 fs/open.c               |  3 +--
 fs/sync.c               |  6 ++++--
 include/linux/fs.h      | 16 ++++++++++++++++
 include/linux/pagemap.h |  5 ++++-
 7 files changed, 29 insertions(+), 5 deletions(-)

-- 
2.25.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 RESEND 1/2] vfs: track per-sb writeback errors and report them to syncfs
  2020-04-14 12:04 [PATCH v4 RESEND 0/2] vfs: have syncfs() return error when there are writeback errors Jeff Layton
@ 2020-04-14 12:04 ` Jeff Layton
  2020-04-14 16:23   ` Jan Kara
  2020-04-14 12:04 ` [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs Jeff Layton
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2020-04-14 12:04 UTC (permalink / raw)
  To: viro
  Cc: linux-fsdevel, linux-kernel, linux-api, andres, willy, dhowells,
	hch, jack, akpm, david

From: Jeff Layton <jlayton@redhat.com>

Usually we suggest that applications call fsync when they want to
ensure that all data written to the file has made it to the backing
store, but that can be inefficient when there are a lot of open
files.

Calling syncfs on the filesystem can be more efficient in some
situations, but the error reporting doesn't currently work the way most
people expect. If a single inode on a filesystem reports a writeback
error, syncfs won't necessarily return an error. syncfs only returns an
error if __sync_blockdev fails, and on some filesystems that's a no-op.

It would be better if syncfs reported an error if there were any writeback
failures. Then applications could call syncfs to see if there are any
errors on any open files, and could then call fsync on all of the other
descriptors to figure out which one failed.

This patch adds a new errseq_t to struct super_block, and has
mapping_set_error also record writeback errors there.

To report those errors, we also need to keep an errseq_t in struct
file to act as a cursor. This patch adds a dedicated field for that
purpose, which slots nicely into 4 bytes of padding at the end of
struct file on x86_64.

An earlier version of this patch used an O_PATH file descriptor to cue
the kernel that the open file should track the superblock error and not
the inode's writeback error.

I think that API is just too weird though. This is simpler and should
make syncfs error reporting "just work" even if someone is multiplexing
fsync and syncfs on the same fds.

Cc: Andres Freund <andres@anarazel.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 drivers/dax/device.c    |  1 +
 fs/file_table.c         |  1 +
 fs/open.c               |  3 +--
 fs/sync.c               |  6 ++++--
 include/linux/fs.h      | 16 ++++++++++++++++
 include/linux/pagemap.h |  5 ++++-
 6 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 1af823b2fe6b..4c0af2eb7e19 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -377,6 +377,7 @@ static int dax_open(struct inode *inode, struct file *filp)
 	inode->i_mapping->a_ops = &dev_dax_aops;
 	filp->f_mapping = inode->i_mapping;
 	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
+	filp->f_sb_err = file_sample_sb_err(filp);
 	filp->private_data = dev_dax;
 	inode->i_flags = S_DAX;
 
diff --git a/fs/file_table.c b/fs/file_table.c
index 30d55c9a1744..676e620948d2 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -198,6 +198,7 @@ static struct file *alloc_file(const struct path *path, int flags,
 	file->f_inode = path->dentry->d_inode;
 	file->f_mapping = path->dentry->d_inode->i_mapping;
 	file->f_wb_err = filemap_sample_wb_err(file->f_mapping);
+	file->f_sb_err = file_sample_sb_err(file);
 	if ((file->f_mode & FMODE_READ) &&
 	     likely(fop->read || fop->read_iter))
 		file->f_mode |= FMODE_CAN_READ;
diff --git a/fs/open.c b/fs/open.c
index 719b320ede52..d9467a8a7f6a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -743,9 +743,8 @@ static int do_dentry_open(struct file *f,
 	path_get(&f->f_path);
 	f->f_inode = inode;
 	f->f_mapping = inode->i_mapping;
-
-	/* Ensure that we skip any errors that predate opening of the file */
 	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
+	f->f_sb_err = file_sample_sb_err(f);
 
 	if (unlikely(f->f_flags & O_PATH)) {
 		f->f_mode = FMODE_PATH | FMODE_OPENED;
diff --git a/fs/sync.c b/fs/sync.c
index 4d1ff010bc5a..c6f6f5be5682 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -161,7 +161,7 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 {
 	struct fd f = fdget(fd);
 	struct super_block *sb;
-	int ret;
+	int ret, ret2;
 
 	if (!f.file)
 		return -EBADF;
@@ -171,8 +171,10 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 	ret = sync_filesystem(sb);
 	up_read(&sb->s_umount);
 
+	ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
+
 	fdput(f);
-	return ret;
+	return ret ? ret : ret2;
 }
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 4f6f59b4f22a..5ad13cd6441c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -976,6 +976,7 @@ struct file {
 #endif /* #ifdef CONFIG_EPOLL */
 	struct address_space	*f_mapping;
 	errseq_t		f_wb_err;
+	errseq_t		f_sb_err; /* for syncfs */
 } __randomize_layout
   __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
 
@@ -1520,6 +1521,9 @@ struct super_block {
 	/* Being remounted read-only */
 	int s_readonly_remount;
 
+	/* per-sb errseq_t for reporting writeback errors via syncfs */
+	errseq_t s_wb_err;
+
 	/* AIO completions deferred from interrupt context */
 	struct workqueue_struct *s_dio_done_wq;
 	struct hlist_head s_pins;
@@ -2827,6 +2831,18 @@ static inline errseq_t filemap_sample_wb_err(struct address_space *mapping)
 	return errseq_sample(&mapping->wb_err);
 }
 
+/**
+ * file_sample_sb_err - sample the current errseq_t to test for later errors
+ * @mapping: mapping to be sampled
+ *
+ * Grab the most current superblock-level errseq_t value for the given
+ * struct file.
+ */
+static inline errseq_t file_sample_sb_err(struct file *file)
+{
+	return errseq_sample(&file->f_path.dentry->d_sb->s_wb_err);
+}
+
 static inline int filemap_nr_thps(struct address_space *mapping)
 {
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a8f7bd8ea1c6..d4409b13747e 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -51,7 +51,10 @@ static inline void mapping_set_error(struct address_space *mapping, int error)
 		return;
 
 	/* Record in wb_err for checkers using errseq_t based tracking */
-	filemap_set_wb_err(mapping, error);
+	__filemap_set_wb_err(mapping, error);
+
+	/* Record it in superblock */
+	errseq_set(&mapping->host->i_sb->s_wb_err, error);
 
 	/* Record it in flags for now, for legacy callers */
 	if (error == -ENOSPC)
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs
  2020-04-14 12:04 [PATCH v4 RESEND 0/2] vfs: have syncfs() return error when there are writeback errors Jeff Layton
  2020-04-14 12:04 ` [PATCH v4 RESEND 1/2] vfs: track per-sb writeback errors and report them to syncfs Jeff Layton
@ 2020-04-14 12:04 ` Jeff Layton
  2020-04-14 16:26   ` Jan Kara
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2020-04-14 12:04 UTC (permalink / raw)
  To: viro
  Cc: linux-fsdevel, linux-kernel, linux-api, andres, willy, dhowells,
	hch, jack, akpm, david

From: Jeff Layton <jlayton@redhat.com>

When syncing out a block device (a'la __sync_blockdev), any error
encountered will only be recorded in the bd_inode's mapping. When the
blockdev contains a filesystem however, we'd like to also record the
error in the super_block that's stored there.

Make mark_buffer_write_io_error also record the error in the
corresponding super_block when a writeback error occurs and the block
device contains a mounted superblock.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/buffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index f73276d746bb..a9d986d27fa1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1160,6 +1160,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
 		mapping_set_error(bh->b_page->mapping, -EIO);
 	if (bh->b_assoc_map)
 		mapping_set_error(bh->b_assoc_map, -EIO);
+	if (bh->b_bdev->bd_super)
+		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
 }
 EXPORT_SYMBOL(mark_buffer_write_io_error);
 
-- 
2.25.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 RESEND 1/2] vfs: track per-sb writeback errors and report them to syncfs
  2020-04-14 12:04 ` [PATCH v4 RESEND 1/2] vfs: track per-sb writeback errors and report them to syncfs Jeff Layton
@ 2020-04-14 16:23   ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2020-04-14 16:23 UTC (permalink / raw)
  To: Jeff Layton
  Cc: viro, linux-fsdevel, linux-kernel, linux-api, andres, willy,
	dhowells, hch, jack, akpm, david

On Tue 14-04-20 08:04:08, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> Usually we suggest that applications call fsync when they want to
> ensure that all data written to the file has made it to the backing
> store, but that can be inefficient when there are a lot of open
> files.
> 
> Calling syncfs on the filesystem can be more efficient in some
> situations, but the error reporting doesn't currently work the way most
> people expect. If a single inode on a filesystem reports a writeback
> error, syncfs won't necessarily return an error. syncfs only returns an
> error if __sync_blockdev fails, and on some filesystems that's a no-op.
> 
> It would be better if syncfs reported an error if there were any writeback
> failures. Then applications could call syncfs to see if there are any
> errors on any open files, and could then call fsync on all of the other
> descriptors to figure out which one failed.
> 
> This patch adds a new errseq_t to struct super_block, and has
> mapping_set_error also record writeback errors there.
> 
> To report those errors, we also need to keep an errseq_t in struct
> file to act as a cursor. This patch adds a dedicated field for that
> purpose, which slots nicely into 4 bytes of padding at the end of
> struct file on x86_64.
> 
> An earlier version of this patch used an O_PATH file descriptor to cue
> the kernel that the open file should track the superblock error and not
> the inode's writeback error.
> 
> I think that API is just too weird though. This is simpler and should
> make syncfs error reporting "just work" even if someone is multiplexing
> fsync and syncfs on the same fds.
> 
> Cc: Andres Freund <andres@anarazel.de>
> Cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  drivers/dax/device.c    |  1 +
>  fs/file_table.c         |  1 +
>  fs/open.c               |  3 +--
>  fs/sync.c               |  6 ++++--
>  include/linux/fs.h      | 16 ++++++++++++++++
>  include/linux/pagemap.h |  5 ++++-
>  6 files changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 1af823b2fe6b..4c0af2eb7e19 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -377,6 +377,7 @@ static int dax_open(struct inode *inode, struct file *filp)
>  	inode->i_mapping->a_ops = &dev_dax_aops;
>  	filp->f_mapping = inode->i_mapping;
>  	filp->f_wb_err = filemap_sample_wb_err(filp->f_mapping);
> +	filp->f_sb_err = file_sample_sb_err(filp);
>  	filp->private_data = dev_dax;
>  	inode->i_flags = S_DAX;
>  
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 30d55c9a1744..676e620948d2 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -198,6 +198,7 @@ static struct file *alloc_file(const struct path *path, int flags,
>  	file->f_inode = path->dentry->d_inode;
>  	file->f_mapping = path->dentry->d_inode->i_mapping;
>  	file->f_wb_err = filemap_sample_wb_err(file->f_mapping);
> +	file->f_sb_err = file_sample_sb_err(file);
>  	if ((file->f_mode & FMODE_READ) &&
>  	     likely(fop->read || fop->read_iter))
>  		file->f_mode |= FMODE_CAN_READ;
> diff --git a/fs/open.c b/fs/open.c
> index 719b320ede52..d9467a8a7f6a 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -743,9 +743,8 @@ static int do_dentry_open(struct file *f,
>  	path_get(&f->f_path);
>  	f->f_inode = inode;
>  	f->f_mapping = inode->i_mapping;
> -
> -	/* Ensure that we skip any errors that predate opening of the file */
>  	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
> +	f->f_sb_err = file_sample_sb_err(f);
>  
>  	if (unlikely(f->f_flags & O_PATH)) {
>  		f->f_mode = FMODE_PATH | FMODE_OPENED;
> diff --git a/fs/sync.c b/fs/sync.c
> index 4d1ff010bc5a..c6f6f5be5682 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -161,7 +161,7 @@ SYSCALL_DEFINE1(syncfs, int, fd)
>  {
>  	struct fd f = fdget(fd);
>  	struct super_block *sb;
> -	int ret;
> +	int ret, ret2;
>  
>  	if (!f.file)
>  		return -EBADF;
> @@ -171,8 +171,10 @@ SYSCALL_DEFINE1(syncfs, int, fd)
>  	ret = sync_filesystem(sb);
>  	up_read(&sb->s_umount);
>  
> +	ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> +
>  	fdput(f);
> -	return ret;
> +	return ret ? ret : ret2;
>  }
>  
>  /**
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 4f6f59b4f22a..5ad13cd6441c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -976,6 +976,7 @@ struct file {
>  #endif /* #ifdef CONFIG_EPOLL */
>  	struct address_space	*f_mapping;
>  	errseq_t		f_wb_err;
> +	errseq_t		f_sb_err; /* for syncfs */
>  } __randomize_layout
>    __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
>  
> @@ -1520,6 +1521,9 @@ struct super_block {
>  	/* Being remounted read-only */
>  	int s_readonly_remount;
>  
> +	/* per-sb errseq_t for reporting writeback errors via syncfs */
> +	errseq_t s_wb_err;
> +
>  	/* AIO completions deferred from interrupt context */
>  	struct workqueue_struct *s_dio_done_wq;
>  	struct hlist_head s_pins;
> @@ -2827,6 +2831,18 @@ static inline errseq_t filemap_sample_wb_err(struct address_space *mapping)
>  	return errseq_sample(&mapping->wb_err);
>  }
>  
> +/**
> + * file_sample_sb_err - sample the current errseq_t to test for later errors
> + * @mapping: mapping to be sampled
> + *
> + * Grab the most current superblock-level errseq_t value for the given
> + * struct file.
> + */
> +static inline errseq_t file_sample_sb_err(struct file *file)
> +{
> +	return errseq_sample(&file->f_path.dentry->d_sb->s_wb_err);
> +}
> +
>  static inline int filemap_nr_thps(struct address_space *mapping)
>  {
>  #ifdef CONFIG_READ_ONLY_THP_FOR_FS
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index a8f7bd8ea1c6..d4409b13747e 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -51,7 +51,10 @@ static inline void mapping_set_error(struct address_space *mapping, int error)
>  		return;
>  
>  	/* Record in wb_err for checkers using errseq_t based tracking */
> -	filemap_set_wb_err(mapping, error);
> +	__filemap_set_wb_err(mapping, error);
> +
> +	/* Record it in superblock */
> +	errseq_set(&mapping->host->i_sb->s_wb_err, error);
>  
>  	/* Record it in flags for now, for legacy callers */
>  	if (error == -ENOSPC)
> -- 
> 2.25.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs
  2020-04-14 12:04 ` [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs Jeff Layton
@ 2020-04-14 16:26   ` Jan Kara
  2020-04-14 18:37     ` Jeff Layton
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2020-04-14 16:26 UTC (permalink / raw)
  To: Jeff Layton
  Cc: viro, linux-fsdevel, linux-kernel, linux-api, andres, willy,
	dhowells, hch, jack, akpm, david

On Tue 14-04-20 08:04:09, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> When syncing out a block device (a'la __sync_blockdev), any error
> encountered will only be recorded in the bd_inode's mapping. When the
> blockdev contains a filesystem however, we'd like to also record the
> error in the super_block that's stored there.
> 
> Make mark_buffer_write_io_error also record the error in the
> corresponding super_block when a writeback error occurs and the block
> device contains a mounted superblock.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>

The patch looks good to me. I'd just note that bh->b_bdev->bd_super
dereference is safe only because we will flush all dirty data when
unmounting a filesystem which is somewhat tricky. Maybe that warrants a
comment? Otherwise feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/buffer.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index f73276d746bb..a9d986d27fa1 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1160,6 +1160,8 @@ void mark_buffer_write_io_error(struct buffer_head *bh)
>  		mapping_set_error(bh->b_page->mapping, -EIO);
>  	if (bh->b_assoc_map)
>  		mapping_set_error(bh->b_assoc_map, -EIO);
> +	if (bh->b_bdev->bd_super)
> +		errseq_set(&bh->b_bdev->bd_super->s_wb_err, -EIO);
>  }
>  EXPORT_SYMBOL(mark_buffer_write_io_error);
>  
> -- 
> 2.25.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs
  2020-04-14 16:26   ` Jan Kara
@ 2020-04-14 18:37     ` Jeff Layton
  2020-04-15  9:17       ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2020-04-14 18:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: viro, linux-fsdevel, linux-kernel, linux-api, andres, willy,
	dhowells, hch, akpm, david

On Tue, 2020-04-14 at 18:26 +0200, Jan Kara wrote:
> On Tue 14-04-20 08:04:09, Jeff Layton wrote:
> > From: Jeff Layton <jlayton@redhat.com>
> > 
> > When syncing out a block device (a'la __sync_blockdev), any error
> > encountered will only be recorded in the bd_inode's mapping. When the
> > blockdev contains a filesystem however, we'd like to also record the
> > error in the super_block that's stored there.
> > 
> > Make mark_buffer_write_io_error also record the error in the
> > corresponding super_block when a writeback error occurs and the block
> > device contains a mounted superblock.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> 
> The patch looks good to me. I'd just note that bh->b_bdev->bd_super
> dereference is safe only because we will flush all dirty data when
> unmounting a filesystem which is somewhat tricky. Maybe that warrants a
> comment? Otherwise feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Oh, hmm...now that I look again, I'm not sure this is actually safe.

bh->b_bdev gets cleared out as we discard the buffer, so I don't think
that could end up getting zeroed while we're still using it.

The bd_super pointer gets zeroed out in kill_block_super, and after that
point it calls sync_blockdev(). Could writeback error processing race
with kill_block_super such that bd_inode gets set to NULL after we test
it but before we dereference it?

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs
  2020-04-14 18:37     ` Jeff Layton
@ 2020-04-15  9:17       ` Jan Kara
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Kara @ 2020-04-15  9:17 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Jan Kara, viro, linux-fsdevel, linux-kernel, linux-api, andres,
	willy, dhowells, hch, akpm, david

On Tue 14-04-20 14:37:21, Jeff Layton wrote:
> On Tue, 2020-04-14 at 18:26 +0200, Jan Kara wrote:
> > On Tue 14-04-20 08:04:09, Jeff Layton wrote:
> > > From: Jeff Layton <jlayton@redhat.com>
> > > 
> > > When syncing out a block device (a'la __sync_blockdev), any error
> > > encountered will only be recorded in the bd_inode's mapping. When the
> > > blockdev contains a filesystem however, we'd like to also record the
> > > error in the super_block that's stored there.
> > > 
> > > Make mark_buffer_write_io_error also record the error in the
> > > corresponding super_block when a writeback error occurs and the block
> > > device contains a mounted superblock.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > 
> > The patch looks good to me. I'd just note that bh->b_bdev->bd_super
> > dereference is safe only because we will flush all dirty data when
> > unmounting a filesystem which is somewhat tricky. Maybe that warrants a
> > comment? Otherwise feel free to add:
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> Oh, hmm...now that I look again, I'm not sure this is actually safe.
> 
> bh->b_bdev gets cleared out as we discard the buffer, so I don't think
> that could end up getting zeroed while we're still using it.

Correct.

> The bd_super pointer gets zeroed out in kill_block_super, and after that
> point it calls sync_blockdev(). Could writeback error processing race
> with kill_block_super such that bd_inode gets set to NULL after we test
> it but before we dereference it?

Yeah, you're right. But you can avoid the race with
READ_ONCE(bh->b_bdev->bd_super) and a big fat comment explaining why it is
safe... :)

Or you could be less daring and put rcu protection there because
superblocks are RCU freed...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-04-15  9:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-14 12:04 [PATCH v4 RESEND 0/2] vfs: have syncfs() return error when there are writeback errors Jeff Layton
2020-04-14 12:04 ` [PATCH v4 RESEND 1/2] vfs: track per-sb writeback errors and report them to syncfs Jeff Layton
2020-04-14 16:23   ` Jan Kara
2020-04-14 12:04 ` [PATCH v4 RESEND 2/2] buffer: record blockdev write errors in super_block that it backs Jeff Layton
2020-04-14 16:26   ` Jan Kara
2020-04-14 18:37     ` Jeff Layton
2020-04-15  9:17       ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.