[PATCH] fs: allow calling kiocb_end_write() unmatched with kiocb_start_write()

From: Amir Goldstein <amir73il@gmail.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	David Howells <dhowells@redhat.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH] fs: allow calling kiocb_end_write() unmatched with kiocb_start_write()
Date: Tue, 21 Nov 2023 15:25:51 +0200	[thread overview]
Message-ID: <20231121132551.2337431-1-amir73il@gmail.com> (raw)

We want to move kiocb_start_write() into vfs_iocb_iter_write(), after
the permission hook, but leave kiocb_end_write() in the write completion
handler of the callers of vfs_iocb_iter_write().

After this change, there will be no way of knowing in completion handler,
if write has failed before or after calling kiocb_start_write().

Add a flag IOCB_WRITE_STARTED, which is set and cleared internally by
kiocb_{start,end}_write(), so that kiocb_end_write() could be called for
cleanup of async write, whether it was successful or whether it failed
before or after calling kiocb_start_write().

This flag must not be copied by stacked filesystems (e.g. overlayfs)
that clone the iocb to another iocb for io request on a backing file.

Link: https://lore.kernel.org/r/CAOQ4uxihfJJRxxUhAmOwtD97Lg8PL8RgXw88rH1UfEeP8AtP+w@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Jens,

The kiocb_{start,end}_write() helpers were originally taken out of
an early version of the permission hooks cleanup patches [1].

This early version of the helpers had the IOCB_WRITE_STARTED flag, but
when I posted the helpers independently from the cleanup series, you had
correctly pointed out [2] that IOCB_WRITE_STARTED is not needed for any
of the existing callers of kiocb_{start,end}_write(), so I removed it for
the final version of the helpers that got merged.

When coming back to the permission hook cleanup, I see that the
IOCB_WRITE_STARTED flag is needed to allow the new semantics of calling
kiocb_start_write() inside vfs_iocb_iter_write() and kiocb_end_write()
in completion callback.

I realize these semantics are not great, but the alternative of moving
the permission hook from vfs_iocb_iter_write() into all the callers is
worse IMO.

Can you accept the solution with IOCB_WRITE_STARTED state flag?
Have a better idea for cleaner iocb issue semantics that will allow
calling the permission hook with freeze protection held?

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20231114153321.1716028-1-amir73il@gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/e6948836-1d9d-4219-9a21-a2ab442a9a34@kernel.dk/

 fs/overlayfs/file.c |  2 +-
 include/linux/fs.h  | 18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 131621daeb13..e4baa4ea5c95 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -455,7 +455,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 		aio_req->orig_iocb = iocb;
 		kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
-		aio_req->iocb.ki_flags = ifl;
+		aio_req->iocb.ki_flags &= ifl;
 		aio_req->iocb.ki_complete = ovl_aio_queue_completion;
 		refcount_set(&aio_req->ref, 2);
 		kiocb_start_write(&aio_req->iocb);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..64414e146e1e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -352,6 +352,8 @@ enum rw_hint {
  * unrelated IO (like cache flushing, new IO generation, etc).
  */
 #define IOCB_DIO_CALLER_COMP	(1 << 22)
+/* file_start_write() was called */
+#define IOCB_WRITE_STARTED	(1 << 23)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -366,7 +368,8 @@ enum rw_hint {
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
 	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
-	{ IOCB_DIO_CALLER_COMP,	"CALLER_COMP" }
+	{ IOCB_DIO_CALLER_COMP,	"CALLER_COMP" }, \
+	{ IOCB_WRITE_STARTED,	"WRITE_STARTED" }
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -2183,12 +2186,15 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 	};
 }
 
+/* IOCB flags associated with ki_filp state must not be cloned */
+#define IOCB_CLONE_FLAGS_MASK	(~IOCB_WRITE_STARTED)
+
 static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
 			       struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
-		.ki_flags = kiocb_src->ki_flags,
+		.ki_flags = kiocb_src->ki_flags & IOCB_CLONE_FLAGS_MASK,
 		.ki_ioprio = kiocb_src->ki_ioprio,
 		.ki_pos = kiocb_src->ki_pos,
 	};
@@ -2744,12 +2750,16 @@ static inline void kiocb_start_write(struct kiocb *iocb)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 
+	if (WARN_ON_ONCE(iocb->ki_flags & IOCB_WRITE_STARTED))
+		return;
+
 	sb_start_write(inode->i_sb);
 	/*
 	 * Fool lockdep by telling it the lock got released so that it
 	 * doesn't complain about the held lock when we return to userspace.
 	 */
 	__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+	iocb->ki_flags |= IOCB_WRITE_STARTED;
 }
 
 /**
@@ -2762,11 +2772,15 @@ static inline void kiocb_end_write(struct kiocb *iocb)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 
+	if (!(iocb->ki_flags & IOCB_WRITE_STARTED))
+		return;
+
 	/*
 	 * Tell lockdep we inherited freeze protection from submission thread.
 	 */
 	__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
 	sb_end_write(inode->i_sb);
+	iocb->ki_flags &= ~IOCB_WRITE_STARTED;
 }
 
 /*
-- 
2.34.1