Linux-Fsdevel Archive on lore.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, andres@anarazel.de,
	willy@infradead.org, dhowells@redhat.com, hch@infradead.org,
	jack@suse.cz, akpm@linux-foundation.org
Subject: [PATCH v3 1/3] vfs: track per-sb writeback errors and report them to syncfs
Date: Fri,  7 Feb 2020 12:04:21 -0500
Message-ID: <20200207170423.377931-2-jlayton@kernel.org> (raw)
In-Reply-To: <20200207170423.377931-1-jlayton@kernel.org>

From: Jeff Layton <jlayton@redhat.com>

Usually we suggest that applications call fsync when they want to
ensure that all data written to a file has made it to the backing
store, but that can be inefficient when an application has many open
files.

Calling syncfs on the filesystem can be more efficient in some
situations, but the error reporting doesn't currently work the way most
people expect. If a single inode on a filesystem reports a writeback
error, syncfs won't necessarily return an error. syncfs only returns an
error if __sync_blockdev fails, and on some filesystems that's a no-op.

It would be better if syncfs reported an error whenever there were any
writeback failures on the filesystem. Applications could then call
syncfs to check for errors on any of their open files and, only if it
fails, call fsync on each of their other descriptors to find out which
file failed.
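
A minimal userspace sketch of that workflow, assuming a kernel with
this patch applied. check_wb_errors() is a hypothetical helper, not an
existing API, and sb_fd is assumed to be an O_PATH descriptor on the
same filesystem as fds[]. As a design choice, if syncfs rejects the
O_PATH descriptor (EBADF on kernels without this scheme), the helper
simply falls back to calling fsync on every descriptor.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper. sb_fd is an O_PATH descriptor on the filesystem
 * holding fds[]. On return, *sync_err is 0 if syncfs() succeeded, else
 * its errno, and failed[i] holds the errno from fsync(fds[i]) (0 if it
 * succeeded or was not needed). Returns how many fsync() calls failed.
 */
static size_t check_wb_errors(int sb_fd, const int *fds, size_t nfds,
                              int *failed, int *sync_err)
{
    size_t nbad = 0;

    for (size_t i = 0; i < nfds; i++)
        failed[i] = 0;

    /* Fast path: one syscall covers every file on the superblock. */
    if (syncfs(sb_fd) == 0) {
        *sync_err = 0;
        return 0;
    }
    *sync_err = errno;

    /*
     * Either some file on this filesystem hit a writeback error, or the
     * kernel rejected the O_PATH descriptor (EBADF). In both cases,
     * fsync() each descriptor to find out which of our files failed.
     */
    for (size_t i = 0; i < nfds; i++) {
        if (fsync(fds[i]) != 0) {
            failed[i] = errno;
            nbad++;
        }
    }
    return nbad;
}
```

A non-zero *sync_err with a zero return means the error belonged to a
file this process does not hold open (or that the kernel lacks support
for the O_PATH scheme).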

This patch adds a new errseq_t to struct super_block, and has
mapping_set_error also record writeback errors there.

To report those errors, we also need to keep an errseq_t in struct
file to act as a cursor, but growing struct file for this purpose is
undesirable. We could just reuse f_wb_err, but someone could mix calls
to fsync and syncfs on the same file, and a single cursor cannot
correctly track two different errseq_t values.
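
To illustrate why the per-mapping and per-superblock errseq_t each need
their own cursor, here is a simplified, single-threaded userspace model
of errseq_t. It is intended to mirror the bit layout and the
set/sample/check_and_advance logic of lib/errseq.c, but it is not the
kernel code: there is no atomic cmpxchg, and all model_* names are
hypothetical.

```c
#include <stdint.h>

/*
 * Simplified model of errseq_t (hypothetical names, not the kernel
 * implementation). Bit layout, assumed to match lib/errseq.c:
 *   bits 0-11  : positive errno of the most recent error
 *   bit  12    : "seen" flag, set once some cursor has reported it
 *   bits 13-31 : counter, bumped when a new error follows a seen one
 */
typedef uint32_t errseq_model_t;

#define MODEL_ERRNO_MASK 0xfffu
#define MODEL_SEEN       (1u << 12)
#define MODEL_CTR_INC    (1u << 13)

/* Record a new error (err is negative, e.g. -EIO). */
static void model_set(errseq_model_t *eseq, int err)
{
    if (err >= 0 || err < -(int)MODEL_ERRNO_MASK)
        return;                         /* ignore invalid values */

    errseq_model_t old = *eseq;
    errseq_model_t next = (old & ~(MODEL_ERRNO_MASK | MODEL_SEEN)) |
                          (uint32_t)(-err);

    /* Only bump the counter if the previous error was already reported;
     * otherwise cursors that have not seen it will still notice. */
    if (old & MODEL_SEEN)
        next += MODEL_CTR_INC;
    *eseq = next;
}

/* Sample a starting cursor, e.g. at open() time. An error nobody has
 * seen yet stays visible because 0 is returned for it. */
static errseq_model_t model_sample(const errseq_model_t *eseq)
{
    errseq_model_t cur = *eseq;
    return (cur & MODEL_SEEN) ? cur : 0;
}

/* Report any error recorded since *since, then advance the cursor. */
static int model_check_and_advance(errseq_model_t *eseq,
                                   errseq_model_t *since)
{
    errseq_model_t cur = *eseq;

    if (cur == *since)
        return 0;                       /* nothing new for this cursor */
    cur |= MODEL_SEEN;                  /* mark the error as reported */
    *eseq = cur;
    *since = cur;
    return -(int)(cur & MODEL_ERRNO_MASK);
}
```

With a separate cursor per errseq_t, each reader sees a given error
exactly once. If one cursor were shared, a syncfs check would store the
superblock's value in it and a later fsync check would compare the
mapping's value against that unrelated value; in this model, the next
syncfs then reports an error that had already been reported.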

This patch implements an alternative suggested by Willy. When a file
is opened with O_PATH, we repurpose its f_wb_err cursor to track
s_wb_err instead. A file opened with O_PATH has no fsync
file_operation, and attempts to fsync such an fd return -EBADF, so
the cursor is never used for both purposes.

Note that calling syncfs on an O_PATH descriptor today will also return
-EBADF, so this scheme gives userland a way to tell whether this
mechanism will work at runtime.
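
A possible userspace probe for this behaviour is sketched below.
syncfs_opath_supported() is a hypothetical helper, not an existing API:
it opens a path with O_PATH and calls syncfs on it, treating EBADF as
"not supported". When the call succeeds it flushes the whole
filesystem, and any other failure (including a writeback error that
was reported through the new descriptor) yields -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe. Returns 1 if syncfs() accepts an O_PATH
 * descriptor for the filesystem containing `path` (the behaviour this
 * patch proposes), 0 if it fails with EBADF (no support), or -1 on any
 * other error.
 */
static int syncfs_opath_supported(const char *path)
{
    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int ret = syncfs(fd);
    int saved = errno;                  /* close() may clobber errno */
    close(fd);

    if (ret == 0)
        return 1;
    return saved == EBADF ? 0 : -1;
}
```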

Cc: Andres Freund <andres@anarazel.de>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/open.c               | 6 +++---
 fs/sync.c               | 9 ++++++++-
 include/linux/fs.h      | 3 +++
 include/linux/pagemap.h | 5 ++++-
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index 0788b3715731..de10a0bf7697 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -744,12 +744,10 @@ static int do_dentry_open(struct file *f,
 	f->f_inode = inode;
 	f->f_mapping = inode->i_mapping;
 
-	/* Ensure that we skip any errors that predate opening of the file */
-	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
-
 	if (unlikely(f->f_flags & O_PATH)) {
 		f->f_mode = FMODE_PATH | FMODE_OPENED;
 		f->f_op = &empty_fops;
+		f->f_wb_err = errseq_sample(&f->f_path.dentry->d_sb->s_wb_err);
 		return 0;
 	}
 
@@ -759,6 +757,8 @@ static int do_dentry_open(struct file *f,
 		goto cleanup_file;
 	}
 
+	f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
+
 	if (f->f_mode & FMODE_WRITE && !special_file(inode->i_mode)) {
 		error = get_write_access(inode);
 		if (unlikely(error))
diff --git a/fs/sync.c b/fs/sync.c
index 4d1ff010bc5a..8373d0372767 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -159,7 +159,7 @@ void emergency_sync(void)
  */
 SYSCALL_DEFINE1(syncfs, int, fd)
 {
-	struct fd f = fdget(fd);
+	struct fd f = fdget_raw(fd);
 	struct super_block *sb;
 	int ret;
 
@@ -171,6 +171,13 @@ SYSCALL_DEFINE1(syncfs, int, fd)
 	ret = sync_filesystem(sb);
 	up_read(&sb->s_umount);
 
+	if (f.file->f_flags & O_PATH) {
+		int ret2 = errseq_check_and_advance(&sb->s_wb_err,
+						    &f.file->f_wb_err);
+		if (ret == 0)
+			ret = ret2;
+	}
+
 	fdput(f);
 	return ret;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6eae91c0668f..bdbb0cbad03a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1514,6 +1514,9 @@ struct super_block {
 	/* Being remounted read-only */
 	int s_readonly_remount;
 
+	/* per-sb errseq_t for reporting writeback errors via syncfs */
+	errseq_t s_wb_err;
+
 	/* AIO completions deferred from interrupt context */
 	struct workqueue_struct *s_dio_done_wq;
 	struct hlist_head s_pins;
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ccb14b6a16b5..897439475315 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -51,7 +51,10 @@ static inline void mapping_set_error(struct address_space *mapping, int error)
 		return;
 
 	/* Record in wb_err for checkers using errseq_t based tracking */
-	filemap_set_wb_err(mapping, error);
+	__filemap_set_wb_err(mapping, error);
+
+	/* Record it in superblock */
+	errseq_set(&mapping->host->i_sb->s_wb_err, error);
 
 	/* Record it in flags for now, for legacy callers */
 	if (error == -ENOSPC)
-- 
2.24.1

Thread overview: 15+ messages
2020-02-07 17:04 [PATCH v3 0/3] vfs: have syncfs() return error when there are writeback errors Jeff Layton
2020-02-07 17:04 ` Jeff Layton [this message]
2020-02-07 17:04 ` [PATCH v3 2/3] buffer: record blockdev write errors in super_block that it backs Jeff Layton
2020-02-07 17:04 ` [PATCH v3 3/3] vfs: add a new ioctl for fetching the superblock's errseq_t Jeff Layton
2020-02-07 20:52 ` [PATCH v3 0/3] vfs: have syncfs() return error when there are writeback errors Dave Chinner
2020-02-07 21:20   ` Andres Freund
2020-02-07 22:05     ` Jeff Layton
2020-02-07 22:21       ` Andres Freund
2020-02-10 21:46     ` Dave Chinner
2020-02-10 23:59       ` Andres Freund, David Howells
2020-02-11  0:04       ` Andres Freund
2020-02-11  0:48         ` Dave Chinner
2020-02-11  1:31           ` Andres Freund
2020-02-11 12:57       ` Jeff Layton
2020-02-12 12:21 ` Jeff Layton