* [RFC][PATCHES] iov_iter stuff
@ 2022-06-07  4:08 Al Viro
  2022-06-07  4:09 ` [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size() Al Viro
                   ` (10 more replies)
  0 siblings, 11 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:08 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

	Rebased to -rc1 and reordered.  Sits in vfs.git #work.iov_iter,
individual patches in followups

1/9: No need of likely/unlikely on calls of check_copy_size()
	not just in uio.h; the thing is inlined and has unlikely() on
all paths leading to return false

2/9: btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
	new flag for iomap_dio_rw(), telling it to suppress generic_write_sync()

3/9: struct file: use anonymous union member for rcuhead and llist
	"f_u" might have been an amusing name, but... we expect anon unions to
work.

4/9: iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
	makes iocb_flags() much cheaper, and it's easier to keep track of
the places where it can change.

5/9: keep iocb_flags() result cached in struct file
	that, along with the previous commit, reduces the overhead of
new_sync_{read,write}().  struct file doesn't grow - we can keep that
thing in the same anon union where rcuhead and llist live; that field
gets used only before ->f_count reaches zero while the other two are
used only after ->f_count has reached zero.

6/9: copy_page_{to,from}_iter(): switch iovec variants to generic
	kmap_local_page() allows that.  And it kills quite a bit of
code.

7/9: new iov_iter flavour - ITER_UBUF
	iovec analogue, with single segment.  That case is fairly common and it
can be handled with less overhead than full-blown iovec.

8/9: switch new_sync_{read,write}() to ITER_UBUF
	... and this is why it is so common.  Further reduction of overhead
for new_sync_{read,write}().

9/9: iov_iter_bvec_advance(): don't bother with bvec_iter
	AFAICS, a variant similar to what we do for iovec/kvec generates
better code.  Needs profiling, obviously.

Diffstat:
 arch/powerpc/include/asm/uaccess.h |   2 +-
 arch/s390/include/asm/uaccess.h    |   4 +-
 block/fops.c                       |   8 +-
 drivers/nvme/target/io-cmd-file.c  |   2 +-
 fs/aio.c                           |   2 +-
 fs/btrfs/file.c                    |  19 +--
 fs/btrfs/inode.c                   |   2 +-
 fs/ceph/file.c                     |   2 +-
 fs/cifs/file.c                     |   2 +-
 fs/direct-io.c                     |   4 +-
 fs/fcntl.c                         |   1 +
 fs/file_table.c                    |  17 +-
 fs/fuse/dev.c                      |   4 +-
 fs/fuse/file.c                     |   4 +-
 fs/gfs2/file.c                     |   2 +-
 fs/io_uring.c                      |   2 +-
 fs/iomap/direct-io.c               |  24 +--
 fs/nfs/direct.c                    |   2 +-
 fs/open.c                          |   1 +
 fs/read_write.c                    |   6 +-
 fs/zonefs/super.c                  |   2 +-
 include/linux/fs.h                 |  21 ++-
 include/linux/iomap.h              |   2 +
 include/linux/uaccess.h            |   4 +-
 include/linux/uio.h                |  41 +++--
 lib/iov_iter.c                     | 308 +++++++++++--------------------------
 mm/shmem.c                         |   2 +-
 27 files changed, 191 insertions(+), 299 deletions(-)

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size()
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
@ 2022-06-07  4:09 ` Al Viro
  2022-06-07  4:41   ` Christoph Hellwig
  2022-06-07 11:49   ` Christian Brauner
  2022-06-07  4:09 ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Al Viro
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

it's inline, and the unlikely() annotations inside of it (including
the implicit one in WARN_ON_ONCE()) suffice to convince the compiler
that getting false from check_copy_size() is unlikely.

Spotted-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/include/asm/uaccess.h |  2 +-
 arch/s390/include/asm/uaccess.h    |  4 ++--
 include/linux/uaccess.h            |  4 ++--
 include/linux/uio.h                | 15 ++++++---------
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 9b82b38ff867..105f200b1e31 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -348,7 +348,7 @@ copy_mc_to_kernel(void *to, const void *from, unsigned long size)
 static inline unsigned long __must_check
 copy_mc_to_user(void __user *to, const void *from, unsigned long n)
 {
-	if (likely(check_copy_size(from, n, true))) {
+	if (check_copy_size(from, n, true)) {
 		if (access_ok(to, n)) {
 			allow_write_to_user(to, n);
 			n = copy_mc_generic((void *)to, from, n);
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index f4511e21d646..c2c9995466e0 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -39,7 +39,7 @@ _copy_from_user_key(void *to, const void __user *from, unsigned long n, unsigned
 static __always_inline unsigned long __must_check
 copy_from_user_key(void *to, const void __user *from, unsigned long n, unsigned long key)
 {
-	if (likely(check_copy_size(to, n, false)))
+	if (check_copy_size(to, n, false))
 		n = _copy_from_user_key(to, from, n, key);
 	return n;
 }
@@ -50,7 +50,7 @@ _copy_to_user_key(void __user *to, const void *from, unsigned long n, unsigned l
 static __always_inline unsigned long __must_check
 copy_to_user_key(void __user *to, const void *from, unsigned long n, unsigned long key)
 {
-	if (likely(check_copy_size(from, n, true)))
+	if (check_copy_size(from, n, true))
 		n = _copy_to_user_key(to, from, n, key);
 	return n;
 }
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a328cf02b75..47e5d374c7eb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -148,7 +148,7 @@ _copy_to_user(void __user *, const void *, unsigned long);
 static __always_inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	if (likely(check_copy_size(to, n, false)))
+	if (check_copy_size(to, n, false))
 		n = _copy_from_user(to, from, n);
 	return n;
 }
@@ -156,7 +156,7 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 static __always_inline unsigned long __must_check
 copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	if (likely(check_copy_size(from, n, true)))
+	if (check_copy_size(from, n, true))
 		n = _copy_to_user(to, from, n);
 	return n;
 }
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..76d305f3d4c2 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -156,19 +156,17 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 static __always_inline __must_check
 size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(!check_copy_size(addr, bytes, true)))
-		return 0;
-	else
+	if (check_copy_size(addr, bytes, true))
 		return _copy_to_iter(addr, bytes, i);
+	return 0;
 }
 
 static __always_inline __must_check
 size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(!check_copy_size(addr, bytes, false)))
-		return 0;
-	else
+	if (check_copy_size(addr, bytes, false))
 		return _copy_from_iter(addr, bytes, i);
+	return 0;
 }
 
 static __always_inline __must_check
@@ -184,10 +182,9 @@ bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
 static __always_inline __must_check
 size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(!check_copy_size(addr, bytes, false)))
-		return 0;
-	else
+	if (check_copy_size(addr, bytes, false))
 		return _copy_from_iter_nocache(addr, bytes, i);
+	return 0;
 }
 
 static __always_inline __must_check
-- 
2.30.2



* [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
  2022-06-07  4:09 ` [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size() Al Viro
@ 2022-06-07  4:09 ` Al Viro
  2022-06-07  4:42   ` Christoph Hellwig
  2022-06-07 14:49   ` Matthew Wilcox
  2022-06-07  4:10 ` [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist Al Viro
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

explicitly tell iomap to suppress generic_write_sync(), rather than
messing with IOCB_DSYNC
[folded a fix for a braino spotted by willy]

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/btrfs/file.c       | 17 -----------------
 fs/btrfs/inode.c      |  2 +-
 fs/iomap/direct-io.c  |  2 +-
 include/linux/iomap.h |  2 ++
 4 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1fd827b99c1b..98f81e304eb1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1848,7 +1848,6 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 
 static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
-	const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC);
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1901,15 +1900,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		goto buffered;
 	}
 
-	/*
-	 * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw()
-	 * calls generic_write_sync() (through iomap_dio_complete()), because
-	 * that results in calling fsync (btrfs_sync_file()) which will try to
-	 * lock the inode in exclusive/write mode.
-	 */
-	if (is_sync_write)
-		iocb->ki_flags &= ~IOCB_DSYNC;
-
 	/*
 	 * The iov_iter can be mapped to the same file range we are writing to.
 	 * If that's the case, then we will deadlock in the iomap code, because
@@ -1964,13 +1954,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 
 	btrfs_inode_unlock(inode, ilock_flags);
 
-	/*
-	 * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do
-	 * the fsync (call generic_write_sync()).
-	 */
-	if (is_sync_write)
-		iocb->ki_flags |= IOCB_DSYNC;
-
 	/* If 'err' is -ENOTBLK then it means we must fallback to buffered IO. */
 	if ((err < 0 && err != -ENOTBLK) || !iov_iter_count(from))
 		goto out;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 81737eff92f3..c9c8f49568d1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8152,7 +8152,7 @@ ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter, size_t done_befo
 	struct btrfs_dio_data data;
 
 	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
-			    IOMAP_DIO_PARTIAL, &data, done_before);
+			    IOMAP_DIO_PARTIAL | IOMAP_DIO_NOSYNC, &data, done_before);
 }
 
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 370c3241618a..0f16479b13d6 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -548,7 +548,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		}
 
 		/* for data sync or sync, we need sync completion processing */
-		if (iocb->ki_flags & IOCB_DSYNC)
+		if (iocb->ki_flags & IOCB_DSYNC && !(dio_flags & IOMAP_DIO_NOSYNC))
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
 
 		/*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e552097c67e0..95de0c771d37 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -353,6 +353,8 @@ struct iomap_dio_ops {
  */
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
+#define IOMAP_DIO_NOSYNC		(1 << 3)
+
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);
-- 
2.30.2



* [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
  2022-06-07  4:09 ` [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size() Al Viro
  2022-06-07  4:09 ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Al Viro
@ 2022-06-07  4:10 ` Al Viro
  2022-06-07 10:18   ` Jan Kara
  2022-06-07 11:46   ` Christian Brauner
  2022-06-07  4:11 ` [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:10 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

Once upon a time we couldn't afford anon unions; these days the minimal
gcc version has been raised enough to take care of that.
---
 fs/file_table.c    | 16 ++++++++--------
 include/linux/fs.h |  6 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 5424e3a8df5f..b989e33aacda 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -45,7 +45,7 @@ static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
 static void file_free_rcu(struct rcu_head *head)
 {
-	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
+	struct file *f = container_of(head, struct file, f_rcuhead);
 
 	put_cred(f->f_cred);
 	kmem_cache_free(filp_cachep, f);
@@ -56,7 +56,7 @@ static inline void file_free(struct file *f)
 	security_file_free(f);
 	if (!(f->f_mode & FMODE_NOACCOUNT))
 		percpu_counter_dec(&nr_files);
-	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
+	call_rcu(&f->f_rcuhead, file_free_rcu);
 }
 
 /*
@@ -142,7 +142,7 @@ static struct file *__alloc_file(int flags, const struct cred *cred)
 	f->f_cred = get_cred(cred);
 	error = security_file_alloc(f);
 	if (unlikely(error)) {
-		file_free_rcu(&f->f_u.fu_rcuhead);
+		file_free_rcu(&f->f_rcuhead);
 		return ERR_PTR(error);
 	}
 
@@ -341,13 +341,13 @@ static void delayed_fput(struct work_struct *unused)
 	struct llist_node *node = llist_del_all(&delayed_fput_list);
 	struct file *f, *t;
 
-	llist_for_each_entry_safe(f, t, node, f_u.fu_llist)
+	llist_for_each_entry_safe(f, t, node, f_llist)
 		__fput(f);
 }
 
 static void ____fput(struct callback_head *work)
 {
-	__fput(container_of(work, struct file, f_u.fu_rcuhead));
+	__fput(container_of(work, struct file, f_rcuhead));
 }
 
 /*
@@ -374,8 +374,8 @@ void fput(struct file *file)
 		struct task_struct *task = current;
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
-			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
+			init_task_work(&file->f_rcuhead, ____fput);
+			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
 				return;
 			/*
 			 * After this task has run exit_task_work(),
@@ -384,7 +384,7 @@ void fput(struct file *file)
 			 */
 		}
 
-		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+		if (llist_add(&file->f_llist, &delayed_fput_list))
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..6a2a4906041f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -924,9 +924,9 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index)
 
 struct file {
 	union {
-		struct llist_node	fu_llist;
-		struct rcu_head 	fu_rcuhead;
-	} f_u;
+		struct llist_node	f_llist;
+		struct rcu_head 	f_rcuhead;
+	};
 	struct path		f_path;
 	struct inode		*f_inode;	/* cached value */
 	const struct file_operations	*f_op;
-- 
2.30.2



* [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (2 preceding siblings ...)
  2022-06-07  4:10 ` [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist Al Viro
@ 2022-06-07  4:11 ` Al Viro
  2022-06-07 10:34   ` Jan Kara
  2022-06-07  4:11 ` [PATCH 5/9] keep iocb_flags() result cached in struct file Al Viro
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

New helper to be used instead of direct checks for IOCB_DSYNC:
iocb_is_dsync(iocb).  Checks converted, which allows us to avoid
evaluating the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache
lines) in iocb_flags() - it's checked in iocb_is_dsync() instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/fops.c         |  2 +-
 fs/btrfs/file.c      |  2 +-
 fs/direct-io.c       |  2 +-
 fs/fuse/file.c       |  2 +-
 fs/iomap/direct-io.c | 22 ++++++++++++----------
 fs/zonefs/super.c    |  2 +-
 include/linux/fs.h   | 10 ++++++++--
 7 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d6b3276a6c68..6e86931ab847 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -37,7 +37,7 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 	unsigned int op = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
 
 	/* avoid the need for a I/O completion work item */
-	if (iocb->ki_flags & IOCB_DSYNC)
+	if (iocb_is_dsync(iocb))
 		op |= REQ_FUA;
 	return op;
 }
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 98f81e304eb1..54358a5c9d56 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2021,7 +2021,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct file *file = iocb->ki_filp;
 	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
 	ssize_t num_written, num_sync;
-	const bool sync = iocb->ki_flags & IOCB_DSYNC;
+	const bool sync = iocb_is_dsync(iocb);
 
 	/*
 	 * If the fs flips readonly due to some impossible error, although we
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 840752006f60..39647eb56904 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1210,7 +1210,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 */
 	if (dio->is_async && iov_iter_rw(iter) == WRITE) {
 		retval = 0;
-		if (iocb->ki_flags & IOCB_DSYNC)
+		if (iocb_is_dsync(iocb))
 			retval = dio_set_defer_completion(dio);
 		else if (!dio->inode->i_sb->s_dio_done_wq) {
 			/*
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 05caa2b9272e..00fa861aeead 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1042,7 +1042,7 @@ static unsigned int fuse_write_flags(struct kiocb *iocb)
 {
 	unsigned int flags = iocb->ki_filp->f_flags;
 
-	if (iocb->ki_flags & IOCB_DSYNC)
+	if (iocb_is_dsync(iocb))
 		flags |= O_DSYNC;
 	if (iocb->ki_flags & IOCB_SYNC)
 		flags |= O_SYNC;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0f16479b13d6..2be8d9e98fbc 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -548,17 +548,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		}
 
 		/* for data sync or sync, we need sync completion processing */
-		if (iocb->ki_flags & IOCB_DSYNC && !(dio_flags & IOMAP_DIO_NOSYNC))
-			dio->flags |= IOMAP_DIO_NEED_SYNC;
+		if (iocb_is_dsync(iocb)) {
+			if (!(dio_flags & IOMAP_DIO_NOSYNC))
+				dio->flags |= IOMAP_DIO_NEED_SYNC;
 
-		/*
-		 * For datasync only writes, we optimistically try using FUA for
-		 * this IO.  Any non-FUA write that occurs will clear this flag,
-		 * hence we know before completion whether a cache flush is
-		 * necessary.
-		 */
-		if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
-			dio->flags |= IOMAP_DIO_WRITE_FUA;
+			/*
+			 * For datasync only writes, we optimistically try
+			 * using FUA for this IO.  Any non-FUA write that
+			 * occurs will clear this flag, hence we know before
+			 * completion whether a cache flush is necessary.
+			 */
+			if (!(iocb->ki_flags & IOCB_SYNC))
+				dio->flags |= IOMAP_DIO_WRITE_FUA;
+		}
 	}
 
 	if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index bcb21aea990a..04a98b4cd7ee 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -746,7 +746,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
 			REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE, GFP_NOFS);
 	bio->bi_iter.bi_sector = zi->i_zsector;
 	bio->bi_ioprio = iocb->ki_ioprio;
-	if (iocb->ki_flags & IOCB_DSYNC)
+	if (iocb_is_dsync(iocb))
 		bio->bi_opf |= REQ_FUA;
 
 	ret = bio_iov_iter_get_pages(bio, from);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a2a4906041f..380a1292f4f9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2720,6 +2720,12 @@ extern int vfs_fsync(struct file *file, int datasync);
 extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
 				unsigned int flags);
 
+static inline bool iocb_is_dsync(const struct kiocb *iocb)
+{
+	return (iocb->ki_flags & IOCB_DSYNC) ||
+		IS_SYNC(iocb->ki_filp->f_mapping->host);
+}
+
 /*
  * Sync the bytes written if this was a synchronous write.  Expect ki_pos
  * to already be updated for the write, and will return either the amount
@@ -2727,7 +2733,7 @@ extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
  */
 static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
 {
-	if (iocb->ki_flags & IOCB_DSYNC) {
+	if (iocb_is_dsync(iocb)) {
 		int ret = vfs_fsync_range(iocb->ki_filp,
 				iocb->ki_pos - count, iocb->ki_pos - 1,
 				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
@@ -3262,7 +3268,7 @@ static inline int iocb_flags(struct file *file)
 		res |= IOCB_APPEND;
 	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
-	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
+	if (file->f_flags & O_DSYNC)
 		res |= IOCB_DSYNC;
 	if (file->f_flags & __O_SYNC)
 		res |= IOCB_SYNC;
-- 
2.30.2



* [PATCH 5/9] keep iocb_flags() result cached in struct file
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (3 preceding siblings ...)
  2022-06-07  4:11 ` [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
@ 2022-06-07  4:11 ` Al Viro
  2022-06-07  4:12 ` [PATCH 6/9] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:11 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

* calculate at the time we set FMODE_OPENED (do_dentry_open() for normal
opens, alloc_file() for pipe()/socket()/etc.)
* update when handling F_SETFL
* keep in a new field - file->f_i_flags; since that thing is needed only
before the refcount reaches zero, we can put it into the same anon union
where ->f_rcuhead and ->f_llist live - those are used only after refcount
reaches zero.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/nvme/target/io-cmd-file.c | 2 +-
 fs/aio.c                          | 2 +-
 fs/fcntl.c                        | 1 +
 fs/file_table.c                   | 1 +
 fs/io_uring.c                     | 2 +-
 fs/open.c                         | 1 +
 include/linux/fs.h                | 5 ++---
 7 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index f3d58abf11e0..2be306fe9c13 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -112,7 +112,7 @@ static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos,
 
 	iocb->ki_pos = pos;
 	iocb->ki_filp = req->ns->file;
-	iocb->ki_flags = ki_flags | iocb_flags(req->ns->file);
+	iocb->ki_flags = ki_flags | iocb->ki_filp->f_i_flags;
 
 	return call_iter(iocb, &iter);
 }
diff --git a/fs/aio.c b/fs/aio.c
index 3c249b938632..fb84adb6dc00 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1475,7 +1475,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 	req->ki_complete = aio_complete_rw;
 	req->private = NULL;
 	req->ki_pos = iocb->aio_offset;
-	req->ki_flags = iocb_flags(req->ki_filp);
+	req->ki_flags = req->ki_filp->f_i_flags;
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
 	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 34a3faa4886d..696faccb1726 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -78,6 +78,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 	}
 	spin_lock(&filp->f_lock);
 	filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
+	filp->f_i_flags = iocb_flags(filp);
 	spin_unlock(&filp->f_lock);
 
  out:
diff --git a/fs/file_table.c b/fs/file_table.c
index b989e33aacda..3d1800ad3857 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -241,6 +241,7 @@ static struct file *alloc_file(const struct path *path, int flags,
 	if ((file->f_mode & FMODE_WRITE) &&
 	     likely(fop->write || fop->write_iter))
 		file->f_mode |= FMODE_CAN_WRITE;
+	file->f_i_flags = iocb_flags(file);
 	file->f_mode |= FMODE_OPENED;
 	file->f_op = fop;
 	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3aab4182fd89..79d475bebf30 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4330,7 +4330,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	if (!io_req_ffs_set(req))
 		req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT;
 
-	kiocb->ki_flags = iocb_flags(file);
+	kiocb->ki_flags = file->f_i_flags;
 	ret = kiocb_set_rw_flags(kiocb, req->rw.flags);
 	if (unlikely(ret))
 		return ret;
diff --git a/fs/open.c b/fs/open.c
index 1d57fbde2feb..1f45c63716ee 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -862,6 +862,7 @@ static int do_dentry_open(struct file *f,
 		f->f_mode |= FMODE_CAN_ODIRECT;
 
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
+	f->f_i_flags = iocb_flags(f);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 380a1292f4f9..7f4530a219b6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -926,6 +926,7 @@ struct file {
 	union {
 		struct llist_node	f_llist;
 		struct rcu_head 	f_rcuhead;
+		unsigned int 		f_i_flags;
 	};
 	struct path		f_path;
 	struct inode		*f_inode;	/* cached value */
@@ -2199,13 +2200,11 @@ static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns,
 	       !gid_valid(i_gid_into_mnt(mnt_userns, inode));
 }
 
-static inline int iocb_flags(struct file *file);
-
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
-		.ki_flags = iocb_flags(filp),
+		.ki_flags = filp->f_i_flags,
 		.ki_ioprio = get_current_ioprio(),
 	};
 }
-- 
2.30.2



* [PATCH 6/9] copy_page_{to,from}_iter(): switch iovec variants to generic
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (4 preceding siblings ...)
  2022-06-07  4:11 ` [PATCH 5/9] keep iocb_flags() result cached in struct file Al Viro
@ 2022-06-07  4:12 ` Al Viro
  2022-06-07  4:12 ` [PATCH 7/9] new iov_iter flavour - ITER_UBUF Al Viro
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

we can do copyin/copyout under kmap_local_page(); it shouldn't overflow
the kmap stack - the maximal footprint increases only by one here.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 191 ++-----------------------------------------------
 1 file changed, 4 insertions(+), 187 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..4c658a25e29c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -168,174 +168,6 @@ static int copyin(void *to, const void __user *from, size_t n)
 	return n;
 }
 
-static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
-{
-	size_t skip, copy, left, wanted;
-	const struct iovec *iov;
-	char __user *buf;
-	void *kaddr, *from;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-
-	if (unlikely(!bytes))
-		return 0;
-
-	might_fault();
-	wanted = bytes;
-	iov = i->iov;
-	skip = i->iov_offset;
-	buf = iov->iov_base + skip;
-	copy = min(bytes, iov->iov_len - skip);
-
-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_writeable(buf, copy)) {
-		kaddr = kmap_atomic(page);
-		from = kaddr + offset;
-
-		/* first chunk, usually the only one */
-		left = copyout(buf, from, copy);
-		copy -= left;
-		skip += copy;
-		from += copy;
-		bytes -= copy;
-
-		while (unlikely(!left && bytes)) {
-			iov++;
-			buf = iov->iov_base;
-			copy = min(bytes, iov->iov_len);
-			left = copyout(buf, from, copy);
-			copy -= left;
-			skip = copy;
-			from += copy;
-			bytes -= copy;
-		}
-		if (likely(!bytes)) {
-			kunmap_atomic(kaddr);
-			goto done;
-		}
-		offset = from - kaddr;
-		buf += copy;
-		kunmap_atomic(kaddr);
-		copy = min(bytes, iov->iov_len - skip);
-	}
-	/* Too bad - revert to non-atomic kmap */
-
-	kaddr = kmap(page);
-	from = kaddr + offset;
-	left = copyout(buf, from, copy);
-	copy -= left;
-	skip += copy;
-	from += copy;
-	bytes -= copy;
-	while (unlikely(!left && bytes)) {
-		iov++;
-		buf = iov->iov_base;
-		copy = min(bytes, iov->iov_len);
-		left = copyout(buf, from, copy);
-		copy -= left;
-		skip = copy;
-		from += copy;
-		bytes -= copy;
-	}
-	kunmap(page);
-
-done:
-	if (skip == iov->iov_len) {
-		iov++;
-		skip = 0;
-	}
-	i->count -= wanted - bytes;
-	i->nr_segs -= iov - i->iov;
-	i->iov = iov;
-	i->iov_offset = skip;
-	return wanted - bytes;
-}
-
-static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
-{
-	size_t skip, copy, left, wanted;
-	const struct iovec *iov;
-	char __user *buf;
-	void *kaddr, *to;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-
-	if (unlikely(!bytes))
-		return 0;
-
-	might_fault();
-	wanted = bytes;
-	iov = i->iov;
-	skip = i->iov_offset;
-	buf = iov->iov_base + skip;
-	copy = min(bytes, iov->iov_len - skip);
-
-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_readable(buf, copy)) {
-		kaddr = kmap_atomic(page);
-		to = kaddr + offset;
-
-		/* first chunk, usually the only one */
-		left = copyin(to, buf, copy);
-		copy -= left;
-		skip += copy;
-		to += copy;
-		bytes -= copy;
-
-		while (unlikely(!left && bytes)) {
-			iov++;
-			buf = iov->iov_base;
-			copy = min(bytes, iov->iov_len);
-			left = copyin(to, buf, copy);
-			copy -= left;
-			skip = copy;
-			to += copy;
-			bytes -= copy;
-		}
-		if (likely(!bytes)) {
-			kunmap_atomic(kaddr);
-			goto done;
-		}
-		offset = to - kaddr;
-		buf += copy;
-		kunmap_atomic(kaddr);
-		copy = min(bytes, iov->iov_len - skip);
-	}
-	/* Too bad - revert to non-atomic kmap */
-
-	kaddr = kmap(page);
-	to = kaddr + offset;
-	left = copyin(to, buf, copy);
-	copy -= left;
-	skip += copy;
-	to += copy;
-	bytes -= copy;
-	while (unlikely(!left && bytes)) {
-		iov++;
-		buf = iov->iov_base;
-		copy = min(bytes, iov->iov_len);
-		left = copyin(to, buf, copy);
-		copy -= left;
-		skip = copy;
-		to += copy;
-		bytes -= copy;
-	}
-	kunmap(page);
-
-done:
-	if (skip == iov->iov_len) {
-		iov++;
-		skip = 0;
-	}
-	i->count -= wanted - bytes;
-	i->nr_segs -= iov - i->iov;
-	i->iov = iov;
-	i->iov_offset = skip;
-	return wanted - bytes;
-}
-
 #ifdef PIPE_PARANOIA
 static bool sanity(const struct iov_iter *i)
 {
@@ -848,24 +680,14 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
 static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
-	if (likely(iter_is_iovec(i)))
-		return copy_page_to_iter_iovec(page, offset, bytes, i);
-	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
+	if (unlikely(iov_iter_is_pipe(i))) {
+		return copy_page_to_iter_pipe(page, offset, bytes, i);
+	} else {
 		void *kaddr = kmap_local_page(page);
 		size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
 		kunmap_local(kaddr);
 		return wanted;
 	}
-	if (iov_iter_is_pipe(i))
-		return copy_page_to_iter_pipe(page, offset, bytes, i);
-	if (unlikely(iov_iter_is_discard(i))) {
-		if (unlikely(i->count < bytes))
-			bytes = i->count;
-		i->count -= bytes;
-		return bytes;
-	}
-	WARN_ON(1);
-	return 0;
 }
 
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
@@ -896,17 +718,12 @@ EXPORT_SYMBOL(copy_page_to_iter);
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
-	if (unlikely(!page_copy_sane(page, offset, bytes)))
-		return 0;
-	if (likely(iter_is_iovec(i)))
-		return copy_page_from_iter_iovec(page, offset, bytes, i);
-	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
+	if (page_copy_sane(page, offset, bytes)) {
 		void *kaddr = kmap_local_page(page);
 		size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
 		kunmap_local(kaddr);
 		return wanted;
 	}
-	WARN_ON(1);
 	return 0;
 }
 EXPORT_SYMBOL(copy_page_from_iter);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 7/9] new iov_iter flavour - ITER_UBUF
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (5 preceding siblings ...)
  2022-06-07  4:12 ` [PATCH 6/9] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
@ 2022-06-07  4:12 ` Al Viro
  2022-06-07  4:13 ` [PATCH 8/9] switch new_sync_{read,write}() to ITER_UBUF Al Viro
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:12 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

Equivalent of a single-segment iovec.  Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(); otherwise behaves like an ITER_IOVEC
one.

We are going to expose things like ->write_iter() et al. to those
in subsequent commits.

New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use it to check
whether pages we modify after getting them from iov_iter_get_pages()
need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/fops.c         |  6 +--
 fs/ceph/file.c       |  2 +-
 fs/cifs/file.c       |  2 +-
 fs/direct-io.c       |  2 +-
 fs/fuse/dev.c        |  4 +-
 fs/fuse/file.c       |  2 +-
 fs/gfs2/file.c       |  2 +-
 fs/iomap/direct-io.c |  2 +-
 fs/nfs/direct.c      |  2 +-
 include/linux/uio.h  | 26 ++++++++++++
 lib/iov_iter.c       | 94 ++++++++++++++++++++++++++++++++++----------
 mm/shmem.c           |  2 +-
 12 files changed, 113 insertions(+), 33 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 6e86931ab847..3e68d69e0ee3 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -69,7 +69,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 
 	if (iov_iter_rw(iter) == READ) {
 		bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
-		if (iter_is_iovec(iter))
+		if (user_backed_iter(iter))
 			should_dirty = true;
 	} else {
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
@@ -199,7 +199,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	}
 
 	dio->size = 0;
-	if (is_read && iter_is_iovec(iter))
+	if (is_read && user_backed_iter(iter))
 		dio->flags |= DIO_SHOULD_DIRTY;
 
 	blk_start_plug(&plug);
@@ -331,7 +331,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->size = bio->bi_iter.bi_size;
 
 	if (is_read) {
-		if (iter_is_iovec(iter)) {
+		if (user_backed_iter(iter)) {
 			dio->flags |= DIO_SHOULD_DIRTY;
 			bio_set_pages_dirty(bio);
 		}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 8c8226c0feac..e132adeeaf16 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1262,7 +1262,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
 	bool write = iov_iter_rw(iter) == WRITE;
-	bool should_dirty = !write && iter_is_iovec(iter);
+	bool should_dirty = !write && user_backed_iter(iter);
 
 	if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP)
 		return -EROFS;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1618e0537d58..4b4129d9a90c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4004,7 +4004,7 @@ static ssize_t __cifs_readv(
 	if (!is_sync_kiocb(iocb))
 		ctx->iocb = iocb;
 
-	if (iter_is_iovec(to))
+	if (user_backed_iter(to))
 		ctx->should_dirty = true;
 
 	if (direct) {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 39647eb56904..72237f49ad94 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1245,7 +1245,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;
 
-	dio->should_dirty = iter_is_iovec(iter) && iov_iter_rw(iter) == READ;
+	dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ;
 	sdio.iter = iter;
 	sdio.final_block_in_request = end >> blkbits;
 
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 0e537e580dc1..8d657c2cd6f7 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1356,7 +1356,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, struct iov_iter *to)
 	if (!fud)
 		return -EPERM;
 
-	if (!iter_is_iovec(to))
+	if (!user_backed_iter(to))
 		return -EINVAL;
 
 	fuse_copy_init(&cs, 1, to);
@@ -1949,7 +1949,7 @@ static ssize_t fuse_dev_write(struct kiocb *iocb, struct iov_iter *from)
 	if (!fud)
 		return -EPERM;
 
-	if (!iter_is_iovec(from))
+	if (!user_backed_iter(from))
 		return -EINVAL;
 
 	fuse_copy_init(&cs, 0, from);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 00fa861aeead..c982e3afe3b4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1465,7 +1465,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 			inode_unlock(inode);
 	}
 
-	io->should_dirty = !write && iter_is_iovec(iter);
+	io->should_dirty = !write && user_backed_iter(iter);
 	while (count) {
 		ssize_t nres;
 		fl_owner_t owner = current->files;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 2cceb193dcd8..48e6cc74fdc1 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -780,7 +780,7 @@ static inline bool should_fault_in_pages(struct iov_iter *i,
 
 	if (!count)
 		return false;
-	if (!iter_is_iovec(i))
+	if (!user_backed_iter(i))
 		return false;
 
 	size = PAGE_SIZE;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 2be8d9e98fbc..322ac9ace23d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -533,7 +533,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			iomi.flags |= IOMAP_NOWAIT;
 		}
 
-		if (iter_is_iovec(iter))
+		if (user_backed_iter(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
 	} else {
 		iomi.flags |= IOMAP_WRITE;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 4eb2a8380a28..022e1ce63e62 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -478,7 +478,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	if (iter_is_iovec(iter))
+	if (user_backed_iter(iter))
 		dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
 
 	if (!swap)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 76d305f3d4c2..6ab4260c3d6c 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -26,6 +26,7 @@ enum iter_type {
 	ITER_PIPE,
 	ITER_XARRAY,
 	ITER_DISCARD,
+	ITER_UBUF,
 };
 
 struct iov_iter_state {
@@ -38,6 +39,7 @@ struct iov_iter {
 	u8 iter_type;
 	bool nofault;
 	bool data_source;
+	bool user_backed;
 	size_t iov_offset;
 	size_t count;
 	union {
@@ -46,6 +48,7 @@ struct iov_iter {
 		const struct bio_vec *bvec;
 		struct xarray *xarray;
 		struct pipe_inode_info *pipe;
+		void __user *ubuf;
 	};
 	union {
 		unsigned long nr_segs;
@@ -70,6 +73,11 @@ static inline void iov_iter_save_state(struct iov_iter *iter,
 	state->nr_segs = iter->nr_segs;
 }
 
+static inline bool iter_is_ubuf(const struct iov_iter *i)
+{
+	return iov_iter_type(i) == ITER_UBUF;
+}
+
 static inline bool iter_is_iovec(const struct iov_iter *i)
 {
 	return iov_iter_type(i) == ITER_IOVEC;
@@ -105,6 +113,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->data_source ? WRITE : READ;
 }
 
+static inline bool user_backed_iter(const struct iov_iter *i)
+{
+	return i->user_backed;
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *
@@ -320,4 +333,17 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
 
+static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
+			void __user *buf, size_t count)
+{
+	WARN_ON(direction & ~(READ | WRITE));
+	*i = (struct iov_iter) {
+		.iter_type = ITER_UBUF,
+		.user_backed = true,
+		.data_source = direction,
+		.ubuf = buf,
+		.count = count
+	};
+}
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 4c658a25e29c..8275b28e886b 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -16,6 +16,16 @@
 
 #define PIPE_PARANOIA /* for now */
 
+/* covers ubuf and kbuf alike */
+#define iterate_buf(i, n, base, len, off, __p, STEP) {		\
+	size_t __maybe_unused off = 0;				\
+	len = n;						\
+	base = __p + i->iov_offset;				\
+	len -= (STEP);						\
+	i->iov_offset += len;					\
+	n = len;						\
+}
+
 /* covers iovec and kvec alike */
 #define iterate_iovec(i, n, base, len, off, __p, STEP) {	\
 	size_t off = 0;						\
@@ -110,7 +120,12 @@ __out:								\
 	if (unlikely(i->count < n))				\
 		n = i->count;					\
 	if (likely(n)) {					\
-		if (likely(iter_is_iovec(i))) {			\
+		if (likely(iter_is_ubuf(i))) {			\
+			void __user *base;			\
+			size_t len;				\
+			iterate_buf(i, n, base, len, off,	\
+						i->ubuf, (I)) 	\
+		} else if (likely(iter_is_iovec(i))) {		\
 			const struct iovec *iov = i->iov;	\
 			void __user *base;			\
 			size_t len;				\
@@ -275,7 +290,11 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
  */
 size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t size)
 {
-	if (iter_is_iovec(i)) {
+	if (iter_is_ubuf(i)) {
+		size_t n = min(size, iov_iter_count(i));
+		n -= fault_in_readable(i->ubuf + i->iov_offset, n);
+		return size - n;
+	} else if (iter_is_iovec(i)) {
 		size_t count = min(size, iov_iter_count(i));
 		const struct iovec *p;
 		size_t skip;
@@ -314,7 +333,11 @@ EXPORT_SYMBOL(fault_in_iov_iter_readable);
  */
 size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size)
 {
-	if (iter_is_iovec(i)) {
+	if (iter_is_ubuf(i)) {
+		size_t n = min(size, iov_iter_count(i));
+		n -= fault_in_safe_writeable(i->ubuf + i->iov_offset, n);
+		return size - n;
+	} else if (iter_is_iovec(i)) {
 		size_t count = min(size, iov_iter_count(i));
 		const struct iovec *p;
 		size_t skip;
@@ -345,6 +368,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
 		.nofault = false,
+		.user_backed = true,
 		.data_source = direction,
 		.iov = iov,
 		.nr_segs = nr_segs,
@@ -494,7 +518,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return copy_pipe_to_iter(addr, bytes, i);
-	if (iter_is_iovec(i))
+	if (user_backed_iter(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
 		copyout(base, addr + off, len),
@@ -576,7 +600,7 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return copy_mc_pipe_to_iter(addr, bytes, i);
-	if (iter_is_iovec(i))
+	if (user_backed_iter(i))
 		might_fault();
 	__iterate_and_advance(i, bytes, base, len, off,
 		copyout_mc(base, addr + off, len),
@@ -594,7 +618,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		WARN_ON(1);
 		return 0;
 	}
-	if (iter_is_iovec(i))
+	if (user_backed_iter(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
 		copyin(addr + off, base, len),
@@ -882,16 +906,16 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 {
 	if (unlikely(i->count < size))
 		size = i->count;
-	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i))) {
+	if (likely(iter_is_ubuf(i)) || unlikely(iov_iter_is_xarray(i))) {
+		i->iov_offset += size;
+		i->count -= size;
+	} else if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i))) {
 		/* iovec and kvec have identical layouts */
 		iov_iter_iovec_advance(i, size);
 	} else if (iov_iter_is_bvec(i)) {
 		iov_iter_bvec_advance(i, size);
 	} else if (iov_iter_is_pipe(i)) {
 		pipe_advance(i, size);
-	} else if (unlikely(iov_iter_is_xarray(i))) {
-		i->iov_offset += size;
-		i->count -= size;
 	} else if (iov_iter_is_discard(i)) {
 		i->count -= size;
 	}
@@ -938,7 +962,7 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 		return;
 	}
 	unroll -= i->iov_offset;
-	if (iov_iter_is_xarray(i)) {
+	if (iov_iter_is_xarray(i) || iter_is_ubuf(i)) {
 		BUG(); /* We should never go beyond the start of the specified
 			* range since we might then be straying into pages that
 			* aren't pinned.
@@ -1129,6 +1153,13 @@ static unsigned long iov_iter_alignment_bvec(const struct iov_iter *i)
 
 unsigned long iov_iter_alignment(const struct iov_iter *i)
 {
+	if (likely(iter_is_ubuf(i))) {
+		size_t size = i->count;
+		if (size)
+			return ((unsigned long)i->ubuf + i->iov_offset) | size;
+		return 0;
+	}
+
 	/* iovec and kvec have identical layouts */
 	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
 		return iov_iter_alignment_iovec(i);
@@ -1159,6 +1190,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 	size_t size = i->count;
 	unsigned k;
 
+	if (iter_is_ubuf(i))
+		return 0;
+
 	if (WARN_ON(!iter_is_iovec(i)))
 		return ~0U;
 
@@ -1287,7 +1321,19 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	return actual;
 }
 
-/* must be done on non-empty ITER_IOVEC one */
+static unsigned long found_ubuf_segment(unsigned long addr,
+					size_t len,
+					size_t *size, size_t *start,
+					unsigned maxpages)
+{
+	len += (*start = addr % PAGE_SIZE);
+	if (len > maxpages * PAGE_SIZE)
+		len = maxpages * PAGE_SIZE;
+	*size = len;
+	return addr & PAGE_MASK;
+}
+
+/* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
 static unsigned long first_iovec_segment(const struct iov_iter *i,
 					 size_t *size, size_t *start,
 					 size_t maxsize, unsigned maxpages)
@@ -1295,6 +1341,11 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 	size_t skip;
 	long k;
 
+	if (iter_is_ubuf(i)) {
+		unsigned long addr = (unsigned long)i->ubuf + i->iov_offset;
+		return found_ubuf_segment(addr, maxsize, size, start, maxpages);
+	}
+
 	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
 		unsigned long addr = (unsigned long)i->iov[k].iov_base + skip;
 		size_t len = i->iov[k].iov_len - skip;
@@ -1303,11 +1354,7 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 			continue;
 		if (len > maxsize)
 			len = maxsize;
-		len += (*start = addr % PAGE_SIZE);
-		if (len > maxpages * PAGE_SIZE)
-			len = maxpages * PAGE_SIZE;
-		*size = len;
-		return addr & PAGE_MASK;
+		return found_ubuf_segment(addr, len, size, start, maxpages);
 	}
 	BUG(); // if it had been empty, we wouldn't get called
 }
@@ -1344,7 +1391,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 	if (!maxsize)
 		return 0;
 
-	if (likely(iter_is_iovec(i))) {
+	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
 		unsigned long addr;
 
@@ -1470,7 +1517,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 	if (!maxsize)
 		return 0;
 
-	if (likely(iter_is_iovec(i))) {
+	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
 		unsigned long addr;
 
@@ -1624,6 +1671,11 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 {
 	if (unlikely(!i->count))
 		return 0;
+	if (likely(iter_is_ubuf(i))) {
+		unsigned offs = offset_in_page(i->ubuf + i->iov_offset);
+		int npages = DIV_ROUND_UP(offs + i->count, PAGE_SIZE);
+		return min(npages, maxpages);
+	}
 	/* iovec and kvec have identical layouts */
 	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
 		return iov_npages(i, maxpages);
@@ -1862,10 +1914,12 @@ EXPORT_SYMBOL(import_single_range);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 {
 	if (WARN_ON_ONCE(!iov_iter_is_bvec(i) && !iter_is_iovec(i)) &&
-			 !iov_iter_is_kvec(i))
+			 !iov_iter_is_kvec(i) && !iter_is_ubuf(i))
 		return;
 	i->iov_offset = state->iov_offset;
 	i->count = state->count;
+	if (iter_is_ubuf(i))
+		return;
 	/*
 	 * For the *vec iters, nr_segs + iov is constant - if we increment
 	 * the vec, then we also decrement the nr_segs count. Hence we don't
diff --git a/mm/shmem.c b/mm/shmem.c
index a6f565308133..6b83f3971795 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2603,7 +2603,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			ret = copy_page_to_iter(page, offset, nr, to);
 			put_page(page);
 
-		} else if (iter_is_iovec(to)) {
+		} else if (!user_backed_iter(to)) {
 			/*
 			 * Copy to user tends to be so well optimized, but
 			 * clear_user() not so much, that it is noticeably
-- 
2.30.2



* [PATCH 8/9] switch new_sync_{read,write}() to ITER_UBUF
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (6 preceding siblings ...)
  2022-06-07  4:12 ` [PATCH 7/9] new iov_iter flavour - ITER_UBUF Al Viro
@ 2022-06-07  4:13 ` Al Viro
  2022-06-07  4:13 ` [PATCH 9/9] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/read_write.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index b1b1cdfee9d3..e82e4301cadd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -389,14 +389,13 @@ EXPORT_SYMBOL(rw_verify_area);
 
 static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
 {
-	struct iovec iov = { .iov_base = buf, .iov_len = len };
 	struct kiocb kiocb;
 	struct iov_iter iter;
 	ssize_t ret;
 
 	init_sync_kiocb(&kiocb, filp);
 	kiocb.ki_pos = (ppos ? *ppos : 0);
-	iov_iter_init(&iter, READ, &iov, 1, len);
+	iov_iter_ubuf(&iter, READ, buf, len);
 
 	ret = call_read_iter(filp, &kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
@@ -492,14 +491,13 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 
 static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
 {
-	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
 	struct kiocb kiocb;
 	struct iov_iter iter;
 	ssize_t ret;
 
 	init_sync_kiocb(&kiocb, filp);
 	kiocb.ki_pos = (ppos ? *ppos : 0);
-	iov_iter_init(&iter, WRITE, &iov, 1, len);
+	iov_iter_ubuf(&iter, WRITE, (void __user *)buf, len);
 
 	ret = call_write_iter(filp, &kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
-- 
2.30.2



* [PATCH 9/9] iov_iter_bvec_advance(): don't bother with bvec_iter
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (7 preceding siblings ...)
  2022-06-07  4:13 ` [PATCH 8/9] switch new_sync_{read,write}() to ITER_UBUF Al Viro
@ 2022-06-07  4:13 ` Al Viro
  2022-06-08 19:28 ` [RFC][PATCHES] iov_iter stuff Sedat Dilek
  2022-06-17 22:30 ` Jens Axboe
  10 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07  4:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Jens Axboe, Christoph Hellwig, Matthew Wilcox

do what we do for iovec/kvec; that ends up generating better code,
AFAICS.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8275b28e886b..93ceb13ec7b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -870,17 +870,22 @@ static void pipe_advance(struct iov_iter *i, size_t size)
 
 static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 {
-	struct bvec_iter bi;
+	const struct bio_vec *bvec, *end;
 
-	bi.bi_size = i->count;
-	bi.bi_bvec_done = i->iov_offset;
-	bi.bi_idx = 0;
-	bvec_iter_advance(i->bvec, &bi, size);
+	if (!i->count)
+		return;
+	i->count -= size;
+
+	size += i->iov_offset;
 
-	i->bvec += bi.bi_idx;
-	i->nr_segs -= bi.bi_idx;
-	i->count = bi.bi_size;
-	i->iov_offset = bi.bi_bvec_done;
+	for (bvec = i->bvec, end = bvec + i->nr_segs; bvec < end; bvec++) {
+		if (likely(size < bvec->bv_len))
+			break;
+		size -= bvec->bv_len;
+	}
+	i->iov_offset = size;
+	i->nr_segs -= bvec - i->bvec;
+	i->bvec = bvec;
 }
 
 static void iov_iter_iovec_advance(struct iov_iter *i, size_t size)
-- 
2.30.2



* Re: [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size()
  2022-06-07  4:09 ` [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size() Al Viro
@ 2022-06-07  4:41   ` Christoph Hellwig
  2022-06-07 11:49   ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2022-06-07  4:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue, Jun 07, 2022 at 04:09:23AM +0000, Al Viro wrote:
> it's inline and unlikely() inside of it (including the implicit one
> in WARN_ON_ONCE()) suffice to convince the compiler that getting
> false from check_copy_size() is unlikely.

Looks good, I also really like getting rid of the totally pointless
elses.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07  4:09 ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Al Viro
@ 2022-06-07  4:42   ` Christoph Hellwig
  2022-06-07 16:06     ` Al Viro
  2022-06-07 14:49   ` Matthew Wilcox
  1 sibling, 1 reply; 93+ messages in thread
From: Christoph Hellwig @ 2022-06-07  4:42 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue, Jun 07, 2022 at 04:09:57AM +0000, Al Viro wrote:
> explicitly tell iomap to do it, rather than messing with IOCB_DSYNC
> [folded a fix for braino spotted by willy]

Please split the iomap and btrfs side into separate patches.

> +++ b/fs/btrfs/inode.c
> @@ -8152,7 +8152,7 @@ ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter, size_t done_befo
>  	struct btrfs_dio_data data;
>  
>  	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
> -			    IOMAP_DIO_PARTIAL, &data, done_before);
> +			    IOMAP_DIO_PARTIAL | IOMAP_DIO_NOSYNC, &data, done_before);

Please avoid the overly long line.

> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 370c3241618a..0f16479b13d6 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -548,7 +548,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		}
>  
>  		/* for data sync or sync, we need sync completion processing */
> -		if (iocb->ki_flags & IOCB_DSYNC)
> +		if (iocb->ki_flags & IOCB_DSYNC && !(dio_flags & IOMAP_DIO_NOSYNC))

Same here.  Also the FUA check below needs to check IOMAP_DIO_NOSYNC as
well.


* Re: [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist
  2022-06-07  4:10 ` [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist Al Viro
@ 2022-06-07 10:18   ` Jan Kara
  2022-06-07 11:46   ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Jan Kara @ 2022-06-07 10:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue 07-06-22 04:10:28, Al Viro wrote:
> Once upon a time we couldn't afford anon unions; these days minimal
> gcc version had been raised enough to take care of that.

This patch is missing your Signed-off-by, but otherwise it looks good to
me.  Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/file_table.c    | 16 ++++++++--------
>  include/linux/fs.h |  6 +++---
>  2 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 5424e3a8df5f..b989e33aacda 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -45,7 +45,7 @@ static struct percpu_counter nr_files __cacheline_aligned_in_smp;
>  
>  static void file_free_rcu(struct rcu_head *head)
>  {
> -	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
> +	struct file *f = container_of(head, struct file, f_rcuhead);
>  
>  	put_cred(f->f_cred);
>  	kmem_cache_free(filp_cachep, f);
> @@ -56,7 +56,7 @@ static inline void file_free(struct file *f)
>  	security_file_free(f);
>  	if (!(f->f_mode & FMODE_NOACCOUNT))
>  		percpu_counter_dec(&nr_files);
> -	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
> +	call_rcu(&f->f_rcuhead, file_free_rcu);
>  }
>  
>  /*
> @@ -142,7 +142,7 @@ static struct file *__alloc_file(int flags, const struct cred *cred)
>  	f->f_cred = get_cred(cred);
>  	error = security_file_alloc(f);
>  	if (unlikely(error)) {
> -		file_free_rcu(&f->f_u.fu_rcuhead);
> +		file_free_rcu(&f->f_rcuhead);
>  		return ERR_PTR(error);
>  	}
>  
> @@ -341,13 +341,13 @@ static void delayed_fput(struct work_struct *unused)
>  	struct llist_node *node = llist_del_all(&delayed_fput_list);
>  	struct file *f, *t;
>  
> -	llist_for_each_entry_safe(f, t, node, f_u.fu_llist)
> +	llist_for_each_entry_safe(f, t, node, f_llist)
>  		__fput(f);
>  }
>  
>  static void ____fput(struct callback_head *work)
>  {
> -	__fput(container_of(work, struct file, f_u.fu_rcuhead));
> +	__fput(container_of(work, struct file, f_rcuhead));
>  }
>  
>  /*
> @@ -374,8 +374,8 @@ void fput(struct file *file)
>  		struct task_struct *task = current;
>  
>  		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
> -			init_task_work(&file->f_u.fu_rcuhead, ____fput);
> -			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
> +			init_task_work(&file->f_rcuhead, ____fput);
> +			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
>  				return;
>  			/*
>  			 * After this task has run exit_task_work(),
> @@ -384,7 +384,7 @@ void fput(struct file *file)
>  			 */
>  		}
>  
> -		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
> +		if (llist_add(&file->f_llist, &delayed_fput_list))
>  			schedule_delayed_work(&delayed_fput_work, 1);
>  	}
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ad5e3520fae..6a2a4906041f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -924,9 +924,9 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index)
>  
>  struct file {
>  	union {
> -		struct llist_node	fu_llist;
> -		struct rcu_head 	fu_rcuhead;
> -	} f_u;
> +		struct llist_node	f_llist;
> +		struct rcu_head 	f_rcuhead;
> +	};
>  	struct path		f_path;
>  	struct inode		*f_inode;	/* cached value */
>  	const struct file_operations	*f_op;
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  2022-06-07  4:11 ` [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
@ 2022-06-07 10:34   ` Jan Kara
  2022-06-07 15:34     ` Al Viro
  0 siblings, 1 reply; 93+ messages in thread
From: Jan Kara @ 2022-06-07 10:34 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue 07-06-22 04:11:06, Al Viro wrote:
> New helper to be used instead of direct checks for IOCB_DSYNC:
> iocb_is_dsync(iocb).  Checks converted, which allows to avoid
> the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines)
> from iocb_flags() - it's checked in iocb_is_dsync() instead
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Does it really matter that much when we have io_is_direct() just two lines
above which does: IS_DAX(filp->f_mapping->host)?

Also presumably even if we got rid of the IS_DAX check, we'll need to do
file->f_mapping->host traversal sooner rather than later anyway so it is
not clear to me how much it helps performance to defer this traversal to a
bit later.

Finally it seems a bit error prone to be checking some IOCB_ flags directly
while IOCB_DSYNC needs to be checked with iocb_is_dsync() instead. I think
we'll grow some place mistakenly checking IOCB_DSYNC sooner rather than
later. So maybe at least rename IOCB_DSYNC to __IOCB_DSYNC to make it more
obvious in the name that something unusual is going on?

								Honza

> ---
>  block/fops.c         |  2 +-
>  fs/btrfs/file.c      |  2 +-
>  fs/direct-io.c       |  2 +-
>  fs/fuse/file.c       |  2 +-
>  fs/iomap/direct-io.c | 22 ++++++++++++----------
>  fs/zonefs/super.c    |  2 +-
>  include/linux/fs.h   | 10 ++++++++--
>  7 files changed, 25 insertions(+), 17 deletions(-)
> 
> diff --git a/block/fops.c b/block/fops.c
> index d6b3276a6c68..6e86931ab847 100644
> --- a/block/fops.c
> +++ b/block/fops.c
> @@ -37,7 +37,7 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
>  	unsigned int op = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
>  
>  	/* avoid the need for a I/O completion work item */
> -	if (iocb->ki_flags & IOCB_DSYNC)
> +	if (iocb_is_dsync(iocb))
>  		op |= REQ_FUA;
>  	return op;
>  }
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 98f81e304eb1..54358a5c9d56 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2021,7 +2021,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  	struct file *file = iocb->ki_filp;
>  	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
>  	ssize_t num_written, num_sync;
> -	const bool sync = iocb->ki_flags & IOCB_DSYNC;
> +	const bool sync = iocb_is_dsync(iocb);
>  
>  	/*
>  	 * If the fs flips readonly due to some impossible error, although we
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 840752006f60..39647eb56904 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1210,7 +1210,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	 */
>  	if (dio->is_async && iov_iter_rw(iter) == WRITE) {
>  		retval = 0;
> -		if (iocb->ki_flags & IOCB_DSYNC)
> +		if (iocb_is_dsync(iocb))
>  			retval = dio_set_defer_completion(dio);
>  		else if (!dio->inode->i_sb->s_dio_done_wq) {
>  			/*
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 05caa2b9272e..00fa861aeead 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1042,7 +1042,7 @@ static unsigned int fuse_write_flags(struct kiocb *iocb)
>  {
>  	unsigned int flags = iocb->ki_filp->f_flags;
>  
> -	if (iocb->ki_flags & IOCB_DSYNC)
> +	if (iocb_is_dsync(iocb))
>  		flags |= O_DSYNC;
>  	if (iocb->ki_flags & IOCB_SYNC)
>  		flags |= O_SYNC;
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 0f16479b13d6..2be8d9e98fbc 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -548,17 +548,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		}
>  
>  		/* for data sync or sync, we need sync completion processing */
> -		if (iocb->ki_flags & IOCB_DSYNC && !(dio_flags & IOMAP_DIO_NOSYNC))
> -			dio->flags |= IOMAP_DIO_NEED_SYNC;
> +		if (iocb_is_dsync(iocb)) {
> +			if (!(dio_flags & IOMAP_DIO_NOSYNC))
> +				dio->flags |= IOMAP_DIO_NEED_SYNC;
>  
> -		/*
> -		 * For datasync only writes, we optimistically try using FUA for
> -		 * this IO.  Any non-FUA write that occurs will clear this flag,
> -		 * hence we know before completion whether a cache flush is
> -		 * necessary.
> -		 */
> -		if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
> -			dio->flags |= IOMAP_DIO_WRITE_FUA;
> +			/*
> +			 * For datasync only writes, we optimistically try
> +			 * using FUA for this IO.  Any non-FUA write that
> +			 * occurs will clear this flag, hence we know before
> +			 * completion whether a cache flush is necessary.
> +			 */
> +			if (!(iocb->ki_flags & IOCB_SYNC))
> +				dio->flags |= IOMAP_DIO_WRITE_FUA;
> +		}
>  	}
>  
>  	if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index bcb21aea990a..04a98b4cd7ee 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -746,7 +746,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
>  			REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE, GFP_NOFS);
>  	bio->bi_iter.bi_sector = zi->i_zsector;
>  	bio->bi_ioprio = iocb->ki_ioprio;
> -	if (iocb->ki_flags & IOCB_DSYNC)
> +	if (iocb_is_dsync(iocb))
>  		bio->bi_opf |= REQ_FUA;
>  
>  	ret = bio_iov_iter_get_pages(bio, from);
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6a2a4906041f..380a1292f4f9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2720,6 +2720,12 @@ extern int vfs_fsync(struct file *file, int datasync);
>  extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
>  				unsigned int flags);
>  
> +static inline bool iocb_is_dsync(const struct kiocb *iocb)
> +{
> +	return (iocb->ki_flags & IOCB_DSYNC) ||
> +		IS_SYNC(iocb->ki_filp->f_mapping->host);
> +}
> +
>  /*
>   * Sync the bytes written if this was a synchronous write.  Expect ki_pos
>   * to already be updated for the write, and will return either the amount
> @@ -2727,7 +2733,7 @@ extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
>   */
>  static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
>  {
> -	if (iocb->ki_flags & IOCB_DSYNC) {
> +	if (iocb_is_dsync(iocb)) {
>  		int ret = vfs_fsync_range(iocb->ki_filp,
>  				iocb->ki_pos - count, iocb->ki_pos - 1,
>  				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
> @@ -3262,7 +3268,7 @@ static inline int iocb_flags(struct file *file)
>  		res |= IOCB_APPEND;
>  	if (file->f_flags & O_DIRECT)
>  		res |= IOCB_DIRECT;
> -	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
> +	if (file->f_flags & O_DSYNC)
>  		res |= IOCB_DSYNC;
>  	if (file->f_flags & __O_SYNC)
>  		res |= IOCB_SYNC;
> -- 
> 2.30.2
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist
  2022-06-07  4:10 ` [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist Al Viro
  2022-06-07 10:18   ` Jan Kara
@ 2022-06-07 11:46   ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-07 11:46 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue, Jun 07, 2022 at 04:10:28AM +0000, Al Viro wrote:
> Once upon a time we couldn't afford anon unions; these days minimal
> gcc version had been raised enough to take care of that.
> ---

Neat,
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size()
  2022-06-07  4:09 ` [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size() Al Viro
  2022-06-07  4:41   ` Christoph Hellwig
@ 2022-06-07 11:49   ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-07 11:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue, Jun 07, 2022 at 04:09:23AM +0000, Al Viro wrote:
> it's inline and unlikely() inside of it (including the implicit one
> in WARN_ON_ONCE()) suffice to convince the compiler that getting
> false from check_copy_size() is unlikely.
> 
> Spotted-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07  4:09 ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Al Viro
  2022-06-07  4:42   ` Christoph Hellwig
@ 2022-06-07 14:49   ` Matthew Wilcox
  2022-06-07 20:17     ` Al Viro
  1 sibling, 1 reply; 93+ messages in thread
From: Matthew Wilcox @ 2022-06-07 14:49 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig

On Tue, Jun 07, 2022 at 04:09:57AM +0000, Al Viro wrote:
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e552097c67e0..95de0c771d37 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -353,6 +353,8 @@ struct iomap_dio_ops {
>   */
>  #define IOMAP_DIO_PARTIAL		(1 << 2)
>  
> +#define IOMAP_DIO_NOSYNC		(1 << 3)
> +
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before);

You didn't like the little comment I added?

+/*
+ * The caller will sync the write if needed; do not sync it within
+ * iomap_dio_rw.  Overrides IOMAP_DIO_FORCE_WAIT.
+ */



* Re: [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  2022-06-07 10:34   ` Jan Kara
@ 2022-06-07 15:34     ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 15:34 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Tue, Jun 07, 2022 at 12:34:50PM +0200, Jan Kara wrote:
> On Tue 07-06-22 04:11:06, Al Viro wrote:
> > New helper to be used instead of direct checks for IOCB_DSYNC:
> > iocb_is_dsync(iocb).  Checks converted, which allows to avoid
> > the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines)
> > from iocb_flags() - it's checked in iocb_is_dsync() instead
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> Does it really matter that much when we have io_is_direct() just two lines
> above which does: IS_DAX(filp->f_mapping->host)?
> 
> Also presumably even if we got rid of the IS_DAX check, we'll need to do
> file->f_mapping->host traversal sooner rather than later anyway so it is
> not clear to me how much it helps performance to defer this traversal to a
> bit later.

... which would take it out of the part of the codepath that is shared with,
e.g., reads and writes on pipes.

> Finally it seems a bit error prone to be checking some IOCB_ flags directly
> while IOCB_DSYNC needs to be checked with iocb_is_dsync() instead. I think
> we'll grow some place mistakenly checking IOCB_DSYNC sooner rather than
> later. So maybe at least rename IOCB_DSYNC to __IOCB_DSYNC to make it more
> obvious in the name that something unusual is going on?

That might make sense...


* Re: [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07  4:42   ` Christoph Hellwig
@ 2022-06-07 16:06     ` Al Viro
  2022-06-07 23:27       ` Al Viro
  2022-06-08  6:16       ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Christoph Hellwig
  0 siblings, 2 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 16:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 06:42:17AM +0200, Christoph Hellwig wrote:

> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index 370c3241618a..0f16479b13d6 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -548,7 +548,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >  		}
> >  
> >  		/* for data sync or sync, we need sync completion processing */
> > -		if (iocb->ki_flags & IOCB_DSYNC)
> > +		if (iocb->ki_flags & IOCB_DSYNC && !(dio_flags & IOMAP_DIO_NOSYNC))
> 
> Same here.

Dealt with in the next commit, actually.

> Also the FUA check below needs to check IOMAP_DIO_NOSYNC as
> well.

Does it?  AFAICS, we don't really care about REQ_FUA on any requests - what
the btrfs hack tries to avoid is stepping into
        if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
		ret = generic_write_sync(iocb, ret);
with generic_write_sync() called by btrfs_do_write_iter() after it has
dropped the lock held through btrfs_direct_write().  Do we want to
suppress REQ_FUA on the requests generated by __iomap_dio_rw() in
that case (DSYNC, !SYNC)?  Confused...


* Re: [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07 14:49   ` Matthew Wilcox
@ 2022-06-07 20:17     ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 20:17 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig

On Tue, Jun 07, 2022 at 03:49:25PM +0100, Matthew Wilcox wrote:
> On Tue, Jun 07, 2022 at 04:09:57AM +0000, Al Viro wrote:
> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> > index e552097c67e0..95de0c771d37 100644
> > --- a/include/linux/iomap.h
> > +++ b/include/linux/iomap.h
> > @@ -353,6 +353,8 @@ struct iomap_dio_ops {
> >   */
> >  #define IOMAP_DIO_PARTIAL		(1 << 2)
> >  
> > +#define IOMAP_DIO_NOSYNC		(1 << 3)
> > +
> >  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
> >  		unsigned int dio_flags, void *private, size_t done_before);
> 
> You didn't like the little comment I added?

I didn't have your patch in front of my eyes at the time...
Comment folded in now.


* Re: [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07 16:06     ` Al Viro
@ 2022-06-07 23:27       ` Al Viro
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
  2022-06-08  6:16       ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Christoph Hellwig
  1 sibling, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-fsdevel, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 04:06:53PM +0000, Al Viro wrote:
> On Tue, Jun 07, 2022 at 06:42:17AM +0200, Christoph Hellwig wrote:
> 
> > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > > index 370c3241618a..0f16479b13d6 100644
> > > --- a/fs/iomap/direct-io.c
> > > +++ b/fs/iomap/direct-io.c
> > > @@ -548,7 +548,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> > >  		}
> > >  
> > >  		/* for data sync or sync, we need sync completion processing */
> > > -		if (iocb->ki_flags & IOCB_DSYNC)
> > > +		if (iocb->ki_flags & IOCB_DSYNC && !(dio_flags & IOMAP_DIO_NOSYNC))
> > 
> > Same here.
> 
> Dealt with in the next commit, actually.
> 
> > Also the FUA check below needs to check IOMAP_DIO_NOSYNC as
> > well.
> 
> Does it?  AFAICS, we don't really care about REQ_FUA on any requests - what
> btrfs hack tries to avoid is stepping into
>         if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
> 		ret = generic_write_sync(iocb, ret);
> with generic_write_sync() called by btrfs_do_write_iter() after it has
> dropped the lock held through btrfs_direct_write().  Do we want to
> suppress REQ_FUA on the requests generated by __iomap_dio_rw() in
> that case (DSYNC, !SYNC)?  Confused...

Anyway, updated branch force-pushed...


* [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size()
  2022-06-07 23:27       ` Al Viro
@ 2022-06-07 23:31         ` Al Viro
  2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
                             ` (8 more replies)
  0 siblings, 9 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

it's inline, and the unlikely() uses inside of it (including the implicit
one in WARN_ON_ONCE()) suffice to convince the compiler that getting
false from check_copy_size() is unlikely.

Spotted-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 arch/powerpc/include/asm/uaccess.h |  2 +-
 arch/s390/include/asm/uaccess.h    |  4 ++--
 include/linux/uaccess.h            |  4 ++--
 include/linux/uio.h                | 15 ++++++---------
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 9b82b38ff867..105f200b1e31 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -348,7 +348,7 @@ copy_mc_to_kernel(void *to, const void *from, unsigned long size)
 static inline unsigned long __must_check
 copy_mc_to_user(void __user *to, const void *from, unsigned long n)
 {
-	if (likely(check_copy_size(from, n, true))) {
+	if (check_copy_size(from, n, true)) {
 		if (access_ok(to, n)) {
 			allow_write_to_user(to, n);
 			n = copy_mc_generic((void *)to, from, n);
diff --git a/arch/s390/include/asm/uaccess.h b/arch/s390/include/asm/uaccess.h
index f4511e21d646..c2c9995466e0 100644
--- a/arch/s390/include/asm/uaccess.h
+++ b/arch/s390/include/asm/uaccess.h
@@ -39,7 +39,7 @@ _copy_from_user_key(void *to, const void __user *from, unsigned long n, unsigned
 static __always_inline unsigned long __must_check
 copy_from_user_key(void *to, const void __user *from, unsigned long n, unsigned long key)
 {
-	if (likely(check_copy_size(to, n, false)))
+	if (check_copy_size(to, n, false))
 		n = _copy_from_user_key(to, from, n, key);
 	return n;
 }
@@ -50,7 +50,7 @@ _copy_to_user_key(void __user *to, const void *from, unsigned long n, unsigned l
 static __always_inline unsigned long __must_check
 copy_to_user_key(void __user *to, const void *from, unsigned long n, unsigned long key)
 {
-	if (likely(check_copy_size(from, n, true)))
+	if (check_copy_size(from, n, true))
 		n = _copy_to_user_key(to, from, n, key);
 	return n;
 }
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 5a328cf02b75..47e5d374c7eb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -148,7 +148,7 @@ _copy_to_user(void __user *, const void *, unsigned long);
 static __always_inline unsigned long __must_check
 copy_from_user(void *to, const void __user *from, unsigned long n)
 {
-	if (likely(check_copy_size(to, n, false)))
+	if (check_copy_size(to, n, false))
 		n = _copy_from_user(to, from, n);
 	return n;
 }
@@ -156,7 +156,7 @@ copy_from_user(void *to, const void __user *from, unsigned long n)
 static __always_inline unsigned long __must_check
 copy_to_user(void __user *to, const void *from, unsigned long n)
 {
-	if (likely(check_copy_size(from, n, true)))
+	if (check_copy_size(from, n, true))
 		n = _copy_to_user(to, from, n);
 	return n;
 }
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 739285fe5a2f..76d305f3d4c2 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -156,19 +156,17 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset,
 static __always_inline __must_check
 size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(!check_copy_size(addr, bytes, true)))
-		return 0;
-	else
+	if (check_copy_size(addr, bytes, true))
 		return _copy_to_iter(addr, bytes, i);
+	return 0;
 }
 
 static __always_inline __must_check
 size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(!check_copy_size(addr, bytes, false)))
-		return 0;
-	else
+	if (check_copy_size(addr, bytes, false))
 		return _copy_from_iter(addr, bytes, i);
+	return 0;
 }
 
 static __always_inline __must_check
@@ -184,10 +182,9 @@ bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
 static __always_inline __must_check
 size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 {
-	if (unlikely(!check_copy_size(addr, bytes, false)))
-		return 0;
-	else
+	if (check_copy_size(addr, bytes, false))
 		return _copy_from_iter_nocache(addr, bytes, i);
+	return 0;
 }
 
 static __always_inline __must_check
-- 
2.30.2



* [PATCH 02/10] teach iomap_dio_rw() to suppress dsync
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-08  6:18             ` Christoph Hellwig
                               ` (2 more replies)
  2022-06-07 23:31           ` [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC Al Viro
                             ` (7 subsequent siblings)
  8 siblings, 3 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

New flag, equivalent to removal of IOCB_DSYNC from iocb flags.
This mimics what btrfs is doing (and that's what btrfs will
switch to).  However, I'm not at all sure that we want to
suppress REQ_FUA for those - all the btrfs hack really cares about
is suppression of generic_write_sync().  For now let's keep
the existing behaviour, but I really want to hear more detailed
arguments pro or contra.

[folded brain fix from willy]

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/iomap/direct-io.c  | 20 +++++++++++---------
 include/linux/iomap.h |  6 ++++++
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 370c3241618a..c10c69e2de24 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -548,17 +548,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		}
 
 		/* for data sync or sync, we need sync completion processing */
-		if (iocb->ki_flags & IOCB_DSYNC)
+		if (iocb->ki_flags & IOCB_DSYNC &&
+		    !(dio_flags & IOMAP_DIO_NOSYNC)) {
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
 
-		/*
-		 * For datasync only writes, we optimistically try using FUA for
-		 * this IO.  Any non-FUA write that occurs will clear this flag,
-		 * hence we know before completion whether a cache flush is
-		 * necessary.
-		 */
-		if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
-			dio->flags |= IOMAP_DIO_WRITE_FUA;
+		       /*
+			* For datasync only writes, we optimistically try
+			* using FUA for this IO.  Any non-FUA write that
+			* occurs will clear this flag, hence we know before
+			* completion whether a cache flush is necessary.
+			*/
+			if (!(iocb->ki_flags & IOCB_SYNC))
+				dio->flags |= IOMAP_DIO_WRITE_FUA;
+		}
 	}
 
 	if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index e552097c67e0..c8622d8f064e 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -353,6 +353,12 @@ struct iomap_dio_ops {
  */
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
+/*
+ * The caller will sync the write if needed; do not sync it within
+ * iomap_dio_rw.  Overrides IOMAP_DIO_FORCE_WAIT.
+ */
+#define IOMAP_DIO_NOSYNC		(1 << 3)
+
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);
-- 
2.30.2



* [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
  2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-08  6:18             ` Christoph Hellwig
  2022-06-10 11:09             ` Christian Brauner
  2022-06-07 23:31           ` [PATCH 04/10] struct file: use anonymous union member for rcuhead and llist Al Viro
                             ` (6 subsequent siblings)
  8 siblings, 2 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

... instead of messing with iocb flags

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/btrfs/file.c  | 17 -----------------
 fs/btrfs/inode.c |  3 ++-
 2 files changed, 2 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1fd827b99c1b..98f81e304eb1 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1848,7 +1848,6 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info,
 
 static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 {
-	const bool is_sync_write = (iocb->ki_flags & IOCB_DSYNC);
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
@@ -1901,15 +1900,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 		goto buffered;
 	}
 
-	/*
-	 * We remove IOCB_DSYNC so that we don't deadlock when iomap_dio_rw()
-	 * calls generic_write_sync() (through iomap_dio_complete()), because
-	 * that results in calling fsync (btrfs_sync_file()) which will try to
-	 * lock the inode in exclusive/write mode.
-	 */
-	if (is_sync_write)
-		iocb->ki_flags &= ~IOCB_DSYNC;
-
 	/*
 	 * The iov_iter can be mapped to the same file range we are writing to.
 	 * If that's the case, then we will deadlock in the iomap code, because
@@ -1964,13 +1954,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 
 	btrfs_inode_unlock(inode, ilock_flags);
 
-	/*
-	 * Add back IOCB_DSYNC. Our caller, btrfs_file_write_iter(), will do
-	 * the fsync (call generic_write_sync()).
-	 */
-	if (is_sync_write)
-		iocb->ki_flags |= IOCB_DSYNC;
-
 	/* If 'err' is -ENOTBLK then it means we must fallback to buffered IO. */
 	if ((err < 0 && err != -ENOTBLK) || !iov_iter_count(from))
 		goto out;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 81737eff92f3..fbf0aee7d66a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8152,7 +8152,8 @@ ssize_t btrfs_dio_rw(struct kiocb *iocb, struct iov_iter *iter, size_t done_befo
 	struct btrfs_dio_data data;
 
 	return iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dio_ops,
-			    IOMAP_DIO_PARTIAL, &data, done_before);
+			    IOMAP_DIO_PARTIAL | IOMAP_DIO_NOSYNC,
+			    &data, done_before);
 }
 
 static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-- 
2.30.2



* [PATCH 04/10] struct file: use anonymous union member for rcuhead and llist
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
  2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
  2022-06-07 23:31           ` [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-07 23:31           ` [PATCH 05/10] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
                             ` (5 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Once upon a time we couldn't afford anon unions; these days the minimal
gcc version has been raised enough to take care of that.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/file_table.c    | 16 ++++++++--------
 include/linux/fs.h |  6 +++---
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 5424e3a8df5f..b989e33aacda 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -45,7 +45,7 @@ static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
 static void file_free_rcu(struct rcu_head *head)
 {
-	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
+	struct file *f = container_of(head, struct file, f_rcuhead);
 
 	put_cred(f->f_cred);
 	kmem_cache_free(filp_cachep, f);
@@ -56,7 +56,7 @@ static inline void file_free(struct file *f)
 	security_file_free(f);
 	if (!(f->f_mode & FMODE_NOACCOUNT))
 		percpu_counter_dec(&nr_files);
-	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
+	call_rcu(&f->f_rcuhead, file_free_rcu);
 }
 
 /*
@@ -142,7 +142,7 @@ static struct file *__alloc_file(int flags, const struct cred *cred)
 	f->f_cred = get_cred(cred);
 	error = security_file_alloc(f);
 	if (unlikely(error)) {
-		file_free_rcu(&f->f_u.fu_rcuhead);
+		file_free_rcu(&f->f_rcuhead);
 		return ERR_PTR(error);
 	}
 
@@ -341,13 +341,13 @@ static void delayed_fput(struct work_struct *unused)
 	struct llist_node *node = llist_del_all(&delayed_fput_list);
 	struct file *f, *t;
 
-	llist_for_each_entry_safe(f, t, node, f_u.fu_llist)
+	llist_for_each_entry_safe(f, t, node, f_llist)
 		__fput(f);
 }
 
 static void ____fput(struct callback_head *work)
 {
-	__fput(container_of(work, struct file, f_u.fu_rcuhead));
+	__fput(container_of(work, struct file, f_rcuhead));
 }
 
 /*
@@ -374,8 +374,8 @@ void fput(struct file *file)
 		struct task_struct *task = current;
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
-			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, TWA_RESUME))
+			init_task_work(&file->f_rcuhead, ____fput);
+			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
 				return;
 			/*
 			 * After this task has run exit_task_work(),
@@ -384,7 +384,7 @@ void fput(struct file *file)
 			 */
 		}
 
-		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+		if (llist_add(&file->f_llist, &delayed_fput_list))
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..6a2a4906041f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -924,9 +924,9 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index)
 
 struct file {
 	union {
-		struct llist_node	fu_llist;
-		struct rcu_head 	fu_rcuhead;
-	} f_u;
+		struct llist_node	f_llist;
+		struct rcu_head 	f_rcuhead;
+	};
 	struct path		f_path;
 	struct inode		*f_inode;	/* cached value */
 	const struct file_operations	*f_op;
-- 
2.30.2



* [PATCH 05/10] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
                             ` (2 preceding siblings ...)
  2022-06-07 23:31           ` [PATCH 04/10] struct file: use anonymous union member for rcuhead and llist Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-10 11:41             ` Christian Brauner
  2022-06-07 23:31           ` [PATCH 06/10] keep iocb_flags() result cached in struct file Al Viro
                             ` (4 subsequent siblings)
  8 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

New helper to be used instead of direct checks for IOCB_DSYNC:
iocb_is_dsync(iocb).  Checks converted, which allows us to drop
the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines)
from iocb_flags() - it's checked in iocb_is_dsync() instead.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/fops.c         |  2 +-
 fs/btrfs/file.c      |  2 +-
 fs/direct-io.c       |  2 +-
 fs/fuse/file.c       |  2 +-
 fs/iomap/direct-io.c |  3 +--
 fs/zonefs/super.c    |  2 +-
 include/linux/fs.h   | 10 ++++++++--
 7 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index d6b3276a6c68..6e86931ab847 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -37,7 +37,7 @@ static unsigned int dio_bio_write_op(struct kiocb *iocb)
 	unsigned int op = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
 
 	/* avoid the need for a I/O completion work item */
-	if (iocb->ki_flags & IOCB_DSYNC)
+	if (iocb_is_dsync(iocb))
 		op |= REQ_FUA;
 	return op;
 }
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 98f81e304eb1..54358a5c9d56 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2021,7 +2021,7 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct file *file = iocb->ki_filp;
 	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
 	ssize_t num_written, num_sync;
-	const bool sync = iocb->ki_flags & IOCB_DSYNC;
+	const bool sync = iocb_is_dsync(iocb);
 
 	/*
 	 * If the fs flips readonly due to some impossible error, although we
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 840752006f60..39647eb56904 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1210,7 +1210,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 */
 	if (dio->is_async && iov_iter_rw(iter) == WRITE) {
 		retval = 0;
-		if (iocb->ki_flags & IOCB_DSYNC)
+		if (iocb_is_dsync(iocb))
 			retval = dio_set_defer_completion(dio);
 		else if (!dio->inode->i_sb->s_dio_done_wq) {
 			/*
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 05caa2b9272e..00fa861aeead 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1042,7 +1042,7 @@ static unsigned int fuse_write_flags(struct kiocb *iocb)
 {
 	unsigned int flags = iocb->ki_filp->f_flags;
 
-	if (iocb->ki_flags & IOCB_DSYNC)
+	if (iocb_is_dsync(iocb))
 		flags |= O_DSYNC;
 	if (iocb->ki_flags & IOCB_SYNC)
 		flags |= O_SYNC;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c10c69e2de24..31c7f1035b20 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -548,8 +548,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		}
 
 		/* for data sync or sync, we need sync completion processing */
-		if (iocb->ki_flags & IOCB_DSYNC &&
-		    !(dio_flags & IOMAP_DIO_NOSYNC)) {
+		if (iocb_is_dsync(iocb) && !(dio_flags & IOMAP_DIO_NOSYNC)) {
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
 
 		       /*
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index bcb21aea990a..04a98b4cd7ee 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -746,7 +746,7 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
 			REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE, GFP_NOFS);
 	bio->bi_iter.bi_sector = zi->i_zsector;
 	bio->bi_ioprio = iocb->ki_ioprio;
-	if (iocb->ki_flags & IOCB_DSYNC)
+	if (iocb_is_dsync(iocb))
 		bio->bi_opf |= REQ_FUA;
 
 	ret = bio_iov_iter_get_pages(bio, from);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6a2a4906041f..380a1292f4f9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2720,6 +2720,12 @@ extern int vfs_fsync(struct file *file, int datasync);
 extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
 				unsigned int flags);
 
+static inline bool iocb_is_dsync(const struct kiocb *iocb)
+{
+	return (iocb->ki_flags & IOCB_DSYNC) ||
+		IS_SYNC(iocb->ki_filp->f_mapping->host);
+}
+
 /*
  * Sync the bytes written if this was a synchronous write.  Expect ki_pos
  * to already be updated for the write, and will return either the amount
@@ -2727,7 +2733,7 @@ extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
  */
 static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
 {
-	if (iocb->ki_flags & IOCB_DSYNC) {
+	if (iocb_is_dsync(iocb)) {
 		int ret = vfs_fsync_range(iocb->ki_filp,
 				iocb->ki_pos - count, iocb->ki_pos - 1,
 				(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);
@@ -3262,7 +3268,7 @@ static inline int iocb_flags(struct file *file)
 		res |= IOCB_APPEND;
 	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
-	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
+	if (file->f_flags & O_DSYNC)
 		res |= IOCB_DSYNC;
 	if (file->f_flags & __O_SYNC)
 		res |= IOCB_SYNC;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread
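[Editorial note, not part of the archived thread] The deferred IS_SYNC() evaluation in the patch above can be modelled in userspace C. This is a sketch only - the struct layouts and helper names below merely mirror the kernel ones; the point is that the per-inode sync state is re-read at decision time instead of being folded into cached flags at open time:

```c
/* Userspace model (hypothetical types) of iocb_is_dsync(): the inode's
 * sync state may change after open, so it must not be cached in flags. */
#include <assert.h>
#include <stdbool.h>

#define IOCB_DSYNC 0x1

struct inode { bool s_synchronous; };
struct kiocb { int ki_flags; struct inode *inode; };

/* analogue of IS_SYNC(): evaluated lazily, never cached */
static bool is_sync(const struct inode *inode)
{
	return inode->s_synchronous;
}

static bool iocb_is_dsync(const struct kiocb *iocb)
{
	return (iocb->ki_flags & IOCB_DSYNC) || is_sync(iocb->inode);
}
```

With this split, flipping the inode to synchronous after the file was opened is still observed by the next write, which is what caching IS_SYNC() into IOCB_DSYNC at open time would have missed.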

* [PATCH 06/10] keep iocb_flags() result cached in struct file
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
                             ` (3 preceding siblings ...)
  2022-06-07 23:31           ` [PATCH 05/10] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-09  0:35             ` Dave Chinner
  2022-06-10 11:43             ` Christian Brauner
  2022-06-07 23:31           ` [PATCH 07/10] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
                             ` (3 subsequent siblings)
  8 siblings, 2 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

* calculate at the time we set FMODE_OPENED (do_dentry_open() for normal
opens, alloc_file() for pipe()/socket()/etc.)
* update when handling F_SETFL
* keep in a new field - file->f_i_flags; since that thing is needed only
before the refcount reaches zero, we can put it into the same anon union
where ->f_rcuhead and ->f_llist live - those are used only after refcount
reaches zero.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/nvme/target/io-cmd-file.c | 2 +-
 fs/aio.c                          | 2 +-
 fs/fcntl.c                        | 1 +
 fs/file_table.c                   | 1 +
 fs/io_uring.c                     | 2 +-
 fs/open.c                         | 1 +
 include/linux/fs.h                | 5 ++---
 7 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index f3d58abf11e0..2be306fe9c13 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -112,7 +112,7 @@ static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos,
 
 	iocb->ki_pos = pos;
 	iocb->ki_filp = req->ns->file;
-	iocb->ki_flags = ki_flags | iocb_flags(req->ns->file);
+	iocb->ki_flags = ki_flags | iocb->ki_filp->f_i_flags;
 
 	return call_iter(iocb, &iter);
 }
diff --git a/fs/aio.c b/fs/aio.c
index 3c249b938632..fb84adb6dc00 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1475,7 +1475,7 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
 	req->ki_complete = aio_complete_rw;
 	req->private = NULL;
 	req->ki_pos = iocb->aio_offset;
-	req->ki_flags = iocb_flags(req->ki_filp);
+	req->ki_flags = req->ki_filp->f_i_flags;
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
 	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 34a3faa4886d..696faccb1726 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -78,6 +78,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 	}
 	spin_lock(&filp->f_lock);
 	filp->f_flags = (arg & SETFL_MASK) | (filp->f_flags & ~SETFL_MASK);
+	filp->f_i_flags = iocb_flags(filp);
 	spin_unlock(&filp->f_lock);
 
  out:
diff --git a/fs/file_table.c b/fs/file_table.c
index b989e33aacda..3d1800ad3857 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -241,6 +241,7 @@ static struct file *alloc_file(const struct path *path, int flags,
 	if ((file->f_mode & FMODE_WRITE) &&
 	     likely(fop->write || fop->write_iter))
 		file->f_mode |= FMODE_CAN_WRITE;
+	file->f_i_flags = iocb_flags(file);
 	file->f_mode |= FMODE_OPENED;
 	file->f_op = fop;
 	if ((file->f_mode & (FMODE_READ | FMODE_WRITE)) == FMODE_READ)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3aab4182fd89..79d475bebf30 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4330,7 +4330,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	if (!io_req_ffs_set(req))
 		req->flags |= io_file_get_flags(file) << REQ_F_SUPPORT_NOWAIT_BIT;
 
-	kiocb->ki_flags = iocb_flags(file);
+	kiocb->ki_flags = file->f_i_flags;
 	ret = kiocb_set_rw_flags(kiocb, req->rw.flags);
 	if (unlikely(ret))
 		return ret;
diff --git a/fs/open.c b/fs/open.c
index 1d57fbde2feb..1f45c63716ee 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -862,6 +862,7 @@ static int do_dentry_open(struct file *f,
 		f->f_mode |= FMODE_CAN_ODIRECT;
 
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
+	f->f_i_flags = iocb_flags(f);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 380a1292f4f9..7f4530a219b6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -926,6 +926,7 @@ struct file {
 	union {
 		struct llist_node	f_llist;
 		struct rcu_head 	f_rcuhead;
+		unsigned int 		f_i_flags;
 	};
 	struct path		f_path;
 	struct inode		*f_inode;	/* cached value */
@@ -2199,13 +2200,11 @@ static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns,
 	       !gid_valid(i_gid_into_mnt(mnt_userns, inode));
 }
 
-static inline int iocb_flags(struct file *file);
-
 static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
-		.ki_flags = iocb_flags(filp),
+		.ki_flags = filp->f_i_flags,
 		.ki_ioprio = get_current_ioprio(),
 	};
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread
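[Editorial note, not part of the archived thread] The anon-union trick in the patch above relies on disjoint lifetimes: f_i_flags is only read while the file is still referenced, while the rcu/llist linkage is only used after the last reference is dropped. A userspace sketch (hypothetical simplified types, not the real struct file):

```c
/* Two fields may share storage when their live ranges cannot overlap:
 * f_i_flags is valid while f_count > 0, the freeing linkage afterwards. */
#include <assert.h>
#include <stddef.h>

struct llist_node { struct llist_node *next; };
struct rcu_head { struct rcu_head *next; void (*func)(struct rcu_head *); };

struct file {
	union {
		struct llist_node f_llist;	/* used once refcount == 0 */
		struct rcu_head   f_rcuhead;	/* used once refcount == 0 */
		unsigned int      f_i_flags;	/* used while refcount > 0 */
	};
	int f_count;
};

static unsigned int file_iocb_flags(const struct file *f)
{
	assert(f->f_count > 0);		/* union member only valid here */
	return f->f_i_flags;
}
```

Because the new field lands in an existing union, struct file does not grow; all three members start at the same offset.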

* [PATCH 07/10] copy_page_{to,from}_iter(): switch iovec variants to generic
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
                             ` (4 preceding siblings ...)
  2022-06-07 23:31           ` [PATCH 06/10] keep iocb_flags() result cached in struct file Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-07 23:31           ` [PATCH 08/10] new iov_iter flavour - ITER_UBUF Al Viro
                             ` (2 subsequent siblings)
  8 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

we can do copyin/copyout under kmap_local_page(); it shouldn't overflow
the kmap stack - the maximal footprint increases only by one here.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 191 ++-----------------------------------------------
 1 file changed, 4 insertions(+), 187 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a99..4c658a25e29c 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -168,174 +168,6 @@ static int copyin(void *to, const void __user *from, size_t n)
 	return n;
 }
 
-static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
-{
-	size_t skip, copy, left, wanted;
-	const struct iovec *iov;
-	char __user *buf;
-	void *kaddr, *from;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-
-	if (unlikely(!bytes))
-		return 0;
-
-	might_fault();
-	wanted = bytes;
-	iov = i->iov;
-	skip = i->iov_offset;
-	buf = iov->iov_base + skip;
-	copy = min(bytes, iov->iov_len - skip);
-
-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_writeable(buf, copy)) {
-		kaddr = kmap_atomic(page);
-		from = kaddr + offset;
-
-		/* first chunk, usually the only one */
-		left = copyout(buf, from, copy);
-		copy -= left;
-		skip += copy;
-		from += copy;
-		bytes -= copy;
-
-		while (unlikely(!left && bytes)) {
-			iov++;
-			buf = iov->iov_base;
-			copy = min(bytes, iov->iov_len);
-			left = copyout(buf, from, copy);
-			copy -= left;
-			skip = copy;
-			from += copy;
-			bytes -= copy;
-		}
-		if (likely(!bytes)) {
-			kunmap_atomic(kaddr);
-			goto done;
-		}
-		offset = from - kaddr;
-		buf += copy;
-		kunmap_atomic(kaddr);
-		copy = min(bytes, iov->iov_len - skip);
-	}
-	/* Too bad - revert to non-atomic kmap */
-
-	kaddr = kmap(page);
-	from = kaddr + offset;
-	left = copyout(buf, from, copy);
-	copy -= left;
-	skip += copy;
-	from += copy;
-	bytes -= copy;
-	while (unlikely(!left && bytes)) {
-		iov++;
-		buf = iov->iov_base;
-		copy = min(bytes, iov->iov_len);
-		left = copyout(buf, from, copy);
-		copy -= left;
-		skip = copy;
-		from += copy;
-		bytes -= copy;
-	}
-	kunmap(page);
-
-done:
-	if (skip == iov->iov_len) {
-		iov++;
-		skip = 0;
-	}
-	i->count -= wanted - bytes;
-	i->nr_segs -= iov - i->iov;
-	i->iov = iov;
-	i->iov_offset = skip;
-	return wanted - bytes;
-}
-
-static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t bytes,
-			 struct iov_iter *i)
-{
-	size_t skip, copy, left, wanted;
-	const struct iovec *iov;
-	char __user *buf;
-	void *kaddr, *to;
-
-	if (unlikely(bytes > i->count))
-		bytes = i->count;
-
-	if (unlikely(!bytes))
-		return 0;
-
-	might_fault();
-	wanted = bytes;
-	iov = i->iov;
-	skip = i->iov_offset;
-	buf = iov->iov_base + skip;
-	copy = min(bytes, iov->iov_len - skip);
-
-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_readable(buf, copy)) {
-		kaddr = kmap_atomic(page);
-		to = kaddr + offset;
-
-		/* first chunk, usually the only one */
-		left = copyin(to, buf, copy);
-		copy -= left;
-		skip += copy;
-		to += copy;
-		bytes -= copy;
-
-		while (unlikely(!left && bytes)) {
-			iov++;
-			buf = iov->iov_base;
-			copy = min(bytes, iov->iov_len);
-			left = copyin(to, buf, copy);
-			copy -= left;
-			skip = copy;
-			to += copy;
-			bytes -= copy;
-		}
-		if (likely(!bytes)) {
-			kunmap_atomic(kaddr);
-			goto done;
-		}
-		offset = to - kaddr;
-		buf += copy;
-		kunmap_atomic(kaddr);
-		copy = min(bytes, iov->iov_len - skip);
-	}
-	/* Too bad - revert to non-atomic kmap */
-
-	kaddr = kmap(page);
-	to = kaddr + offset;
-	left = copyin(to, buf, copy);
-	copy -= left;
-	skip += copy;
-	to += copy;
-	bytes -= copy;
-	while (unlikely(!left && bytes)) {
-		iov++;
-		buf = iov->iov_base;
-		copy = min(bytes, iov->iov_len);
-		left = copyin(to, buf, copy);
-		copy -= left;
-		skip = copy;
-		to += copy;
-		bytes -= copy;
-	}
-	kunmap(page);
-
-done:
-	if (skip == iov->iov_len) {
-		iov++;
-		skip = 0;
-	}
-	i->count -= wanted - bytes;
-	i->nr_segs -= iov - i->iov;
-	i->iov = iov;
-	i->iov_offset = skip;
-	return wanted - bytes;
-}
-
 #ifdef PIPE_PARANOIA
 static bool sanity(const struct iov_iter *i)
 {
@@ -848,24 +680,14 @@ static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
 static size_t __copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
-	if (likely(iter_is_iovec(i)))
-		return copy_page_to_iter_iovec(page, offset, bytes, i);
-	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
+	if (unlikely(iov_iter_is_pipe(i))) {
+		return copy_page_to_iter_pipe(page, offset, bytes, i);
+	} else {
 		void *kaddr = kmap_local_page(page);
 		size_t wanted = _copy_to_iter(kaddr + offset, bytes, i);
 		kunmap_local(kaddr);
 		return wanted;
 	}
-	if (iov_iter_is_pipe(i))
-		return copy_page_to_iter_pipe(page, offset, bytes, i);
-	if (unlikely(iov_iter_is_discard(i))) {
-		if (unlikely(i->count < bytes))
-			bytes = i->count;
-		i->count -= bytes;
-		return bytes;
-	}
-	WARN_ON(1);
-	return 0;
 }
 
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
@@ -896,17 +718,12 @@ EXPORT_SYMBOL(copy_page_to_iter);
 size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
-	if (unlikely(!page_copy_sane(page, offset, bytes)))
-		return 0;
-	if (likely(iter_is_iovec(i)))
-		return copy_page_from_iter_iovec(page, offset, bytes, i);
-	if (iov_iter_is_bvec(i) || iov_iter_is_kvec(i) || iov_iter_is_xarray(i)) {
+	if (page_copy_sane(page, offset, bytes)) {
 		void *kaddr = kmap_local_page(page);
 		size_t wanted = _copy_from_iter(kaddr + offset, bytes, i);
 		kunmap_local(kaddr);
 		return wanted;
 	}
-	WARN_ON(1);
 	return 0;
 }
 EXPORT_SYMBOL(copy_page_from_iter);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread
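[Editorial note, not part of the archived thread] The control flow that survives the patch above can be modelled in userspace; here "mapping" a page is a no-op, but the shape mirrors __copy_page_to_iter() after the change - one generic map/copy/unmap branch instead of per-flavour copy loops (all names below are stand-ins, not the kernel API):

```c
/* Userspace analogue of the generic path: map, bulk-copy, unmap. */
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
struct page { char data[PAGE_SIZE]; };
struct iter { char *buf; size_t count; };

static void *kmap_local_page(struct page *p) { return p->data; }
static void kunmap_local(void *addr) { (void)addr; }

static size_t _copy_to_iter(const void *from, size_t bytes, struct iter *i)
{
	if (bytes > i->count)
		bytes = i->count;
	memcpy(i->buf, from, bytes);
	i->buf += bytes;
	i->count -= bytes;
	return bytes;
}

/* the single generic branch left after the patch */
static size_t copy_page_to_iter(struct page *page, size_t offset,
				size_t bytes, struct iter *i)
{
	void *kaddr = kmap_local_page(page);
	size_t wanted = _copy_to_iter((char *)kaddr + offset, bytes, i);
	kunmap_local(kaddr);
	return wanted;
}
```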

* [PATCH 08/10] new iov_iter flavour - ITER_UBUF
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
                             ` (5 preceding siblings ...)
  2022-06-07 23:31           ` [PATCH 07/10] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-07 23:31           ` [PATCH 09/10] switch new_sync_{read,write}() to ITER_UBUF Al Viro
  2022-06-07 23:31           ` [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
  8 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Equivalent of a single-segment iovec.  Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(); otherwise it behaves like an ITER_IOVEC
one.

We are going to expose things like ->write_iter() et al. to it in
subsequent commits.

A new predicate (user_backed_iter()) is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use it to check
whether pages we modify after getting them from iov_iter_get_pages()
need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/fops.c         |  6 +--
 fs/ceph/file.c       |  2 +-
 fs/cifs/file.c       |  2 +-
 fs/direct-io.c       |  2 +-
 fs/fuse/dev.c        |  4 +-
 fs/fuse/file.c       |  2 +-
 fs/gfs2/file.c       |  2 +-
 fs/iomap/direct-io.c |  2 +-
 fs/nfs/direct.c      |  2 +-
 include/linux/uio.h  | 26 ++++++++++++
 lib/iov_iter.c       | 94 ++++++++++++++++++++++++++++++++++----------
 mm/shmem.c           |  2 +-
 12 files changed, 113 insertions(+), 33 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index 6e86931ab847..3e68d69e0ee3 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -69,7 +69,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 
 	if (iov_iter_rw(iter) == READ) {
 		bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
-		if (iter_is_iovec(iter))
+		if (user_backed_iter(iter))
 			should_dirty = true;
 	} else {
 		bio_init(&bio, bdev, vecs, nr_pages, dio_bio_write_op(iocb));
@@ -199,7 +199,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	}
 
 	dio->size = 0;
-	if (is_read && iter_is_iovec(iter))
+	if (is_read && user_backed_iter(iter))
 		dio->flags |= DIO_SHOULD_DIRTY;
 
 	blk_start_plug(&plug);
@@ -331,7 +331,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->size = bio->bi_iter.bi_size;
 
 	if (is_read) {
-		if (iter_is_iovec(iter)) {
+		if (user_backed_iter(iter)) {
 			dio->flags |= DIO_SHOULD_DIRTY;
 			bio_set_pages_dirty(bio);
 		}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 8c8226c0feac..e132adeeaf16 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1262,7 +1262,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
 	bool write = iov_iter_rw(iter) == WRITE;
-	bool should_dirty = !write && iter_is_iovec(iter);
+	bool should_dirty = !write && user_backed_iter(iter);
 
 	if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP)
 		return -EROFS;
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 1618e0537d58..4b4129d9a90c 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4004,7 +4004,7 @@ static ssize_t __cifs_readv(
 	if (!is_sync_kiocb(iocb))
 		ctx->iocb = iocb;
 
-	if (iter_is_iovec(to))
+	if (user_backed_iter(to))
 		ctx->should_dirty = true;
 
 	if (direct) {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 39647eb56904..72237f49ad94 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1245,7 +1245,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;
 
-	dio->should_dirty = iter_is_iovec(iter) && iov_iter_rw(iter) == READ;
+	dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ;
 	sdio.iter = iter;
 	sdio.final_block_in_request = end >> blkbits;
 
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 0e537e580dc1..8d657c2cd6f7 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1356,7 +1356,7 @@ static ssize_t fuse_dev_read(struct kiocb *iocb, struct iov_iter *to)
 	if (!fud)
 		return -EPERM;
 
-	if (!iter_is_iovec(to))
+	if (!user_backed_iter(to))
 		return -EINVAL;
 
 	fuse_copy_init(&cs, 1, to);
@@ -1949,7 +1949,7 @@ static ssize_t fuse_dev_write(struct kiocb *iocb, struct iov_iter *from)
 	if (!fud)
 		return -EPERM;
 
-	if (!iter_is_iovec(from))
+	if (!user_backed_iter(from))
 		return -EINVAL;
 
 	fuse_copy_init(&cs, 0, from);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 00fa861aeead..c982e3afe3b4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1465,7 +1465,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 			inode_unlock(inode);
 	}
 
-	io->should_dirty = !write && iter_is_iovec(iter);
+	io->should_dirty = !write && user_backed_iter(iter);
 	while (count) {
 		ssize_t nres;
 		fl_owner_t owner = current->files;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 2cceb193dcd8..48e6cc74fdc1 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -780,7 +780,7 @@ static inline bool should_fault_in_pages(struct iov_iter *i,
 
 	if (!count)
 		return false;
-	if (!iter_is_iovec(i))
+	if (!user_backed_iter(i))
 		return false;
 
 	size = PAGE_SIZE;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 31c7f1035b20..d5c7d019653b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -533,7 +533,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			iomi.flags |= IOMAP_NOWAIT;
 		}
 
-		if (iter_is_iovec(iter))
+		if (user_backed_iter(iter))
 			dio->flags |= IOMAP_DIO_DIRTY;
 	} else {
 		iomi.flags |= IOMAP_WRITE;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 4eb2a8380a28..022e1ce63e62 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -478,7 +478,7 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
 	if (!is_sync_kiocb(iocb))
 		dreq->iocb = iocb;
 
-	if (iter_is_iovec(iter))
+	if (user_backed_iter(iter))
 		dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
 
 	if (!swap)
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 76d305f3d4c2..6ab4260c3d6c 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -26,6 +26,7 @@ enum iter_type {
 	ITER_PIPE,
 	ITER_XARRAY,
 	ITER_DISCARD,
+	ITER_UBUF,
 };
 
 struct iov_iter_state {
@@ -38,6 +39,7 @@ struct iov_iter {
 	u8 iter_type;
 	bool nofault;
 	bool data_source;
+	bool user_backed;
 	size_t iov_offset;
 	size_t count;
 	union {
@@ -46,6 +48,7 @@ struct iov_iter {
 		const struct bio_vec *bvec;
 		struct xarray *xarray;
 		struct pipe_inode_info *pipe;
+		void __user *ubuf;
 	};
 	union {
 		unsigned long nr_segs;
@@ -70,6 +73,11 @@ static inline void iov_iter_save_state(struct iov_iter *iter,
 	state->nr_segs = iter->nr_segs;
 }
 
+static inline bool iter_is_ubuf(const struct iov_iter *i)
+{
+	return iov_iter_type(i) == ITER_UBUF;
+}
+
 static inline bool iter_is_iovec(const struct iov_iter *i)
 {
 	return iov_iter_type(i) == ITER_IOVEC;
@@ -105,6 +113,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->data_source ? WRITE : READ;
 }
 
+static inline bool user_backed_iter(const struct iov_iter *i)
+{
+	return i->user_backed;
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *
@@ -320,4 +333,17 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec,
 int import_single_range(int type, void __user *buf, size_t len,
 		 struct iovec *iov, struct iov_iter *i);
 
+static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
+			void __user *buf, size_t count)
+{
+	WARN_ON(direction & ~(READ | WRITE));
+	*i = (struct iov_iter) {
+		.iter_type = ITER_UBUF,
+		.user_backed = true,
+		.data_source = direction,
+		.ubuf = buf,
+		.count = count
+	};
+}
+
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 4c658a25e29c..8275b28e886b 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -16,6 +16,16 @@
 
 #define PIPE_PARANOIA /* for now */
 
+/* covers ubuf and kbuf alike */
+#define iterate_buf(i, n, base, len, off, __p, STEP) {		\
+	size_t __maybe_unused off = 0;				\
+	len = n;						\
+	base = __p + i->iov_offset;				\
+	len -= (STEP);						\
+	i->iov_offset += len;					\
+	n = len;						\
+}
+
 /* covers iovec and kvec alike */
 #define iterate_iovec(i, n, base, len, off, __p, STEP) {	\
 	size_t off = 0;						\
@@ -110,7 +120,12 @@ __out:								\
 	if (unlikely(i->count < n))				\
 		n = i->count;					\
 	if (likely(n)) {					\
-		if (likely(iter_is_iovec(i))) {			\
+		if (likely(iter_is_ubuf(i))) {			\
+			void __user *base;			\
+			size_t len;				\
+			iterate_buf(i, n, base, len, off,	\
+						i->ubuf, (I)) 	\
+		} else if (likely(iter_is_iovec(i))) {		\
 			const struct iovec *iov = i->iov;	\
 			void __user *base;			\
 			size_t len;				\
@@ -275,7 +290,11 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
  */
 size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t size)
 {
-	if (iter_is_iovec(i)) {
+	if (iter_is_ubuf(i)) {
+		size_t n = min(size, iov_iter_count(i));
+		n -= fault_in_readable(i->ubuf + i->iov_offset, n);
+		return size - n;
+	} else if (iter_is_iovec(i)) {
 		size_t count = min(size, iov_iter_count(i));
 		const struct iovec *p;
 		size_t skip;
@@ -314,7 +333,11 @@ EXPORT_SYMBOL(fault_in_iov_iter_readable);
  */
 size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size)
 {
-	if (iter_is_iovec(i)) {
+	if (iter_is_ubuf(i)) {
+		size_t n = min(size, iov_iter_count(i));
+		n -= fault_in_safe_writeable(i->ubuf + i->iov_offset, n);
+		return size - n;
+	} else if (iter_is_iovec(i)) {
 		size_t count = min(size, iov_iter_count(i));
 		const struct iovec *p;
 		size_t skip;
@@ -345,6 +368,7 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 	*i = (struct iov_iter) {
 		.iter_type = ITER_IOVEC,
 		.nofault = false,
+		.user_backed = true,
 		.data_source = direction,
 		.iov = iov,
 		.nr_segs = nr_segs,
@@ -494,7 +518,7 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return copy_pipe_to_iter(addr, bytes, i);
-	if (iter_is_iovec(i))
+	if (user_backed_iter(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
 		copyout(base, addr + off, len),
@@ -576,7 +600,7 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (unlikely(iov_iter_is_pipe(i)))
 		return copy_mc_pipe_to_iter(addr, bytes, i);
-	if (iter_is_iovec(i))
+	if (user_backed_iter(i))
 		might_fault();
 	__iterate_and_advance(i, bytes, base, len, off,
 		copyout_mc(base, addr + off, len),
@@ -594,7 +618,7 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 		WARN_ON(1);
 		return 0;
 	}
-	if (iter_is_iovec(i))
+	if (user_backed_iter(i))
 		might_fault();
 	iterate_and_advance(i, bytes, base, len, off,
 		copyin(addr + off, base, len),
@@ -882,16 +906,16 @@ void iov_iter_advance(struct iov_iter *i, size_t size)
 {
 	if (unlikely(i->count < size))
 		size = i->count;
-	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i))) {
+	if (likely(iter_is_ubuf(i)) || unlikely(iov_iter_is_xarray(i))) {
+		i->iov_offset += size;
+		i->count -= size;
+	} else if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i))) {
 		/* iovec and kvec have identical layouts */
 		iov_iter_iovec_advance(i, size);
 	} else if (iov_iter_is_bvec(i)) {
 		iov_iter_bvec_advance(i, size);
 	} else if (iov_iter_is_pipe(i)) {
 		pipe_advance(i, size);
-	} else if (unlikely(iov_iter_is_xarray(i))) {
-		i->iov_offset += size;
-		i->count -= size;
 	} else if (iov_iter_is_discard(i)) {
 		i->count -= size;
 	}
@@ -938,7 +962,7 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 		return;
 	}
 	unroll -= i->iov_offset;
-	if (iov_iter_is_xarray(i)) {
+	if (iov_iter_is_xarray(i) || iter_is_ubuf(i)) {
 		BUG(); /* We should never go beyond the start of the specified
 			* range since we might then be straying into pages that
 			* aren't pinned.
@@ -1129,6 +1153,13 @@ static unsigned long iov_iter_alignment_bvec(const struct iov_iter *i)
 
 unsigned long iov_iter_alignment(const struct iov_iter *i)
 {
+	if (likely(iter_is_ubuf(i))) {
+		size_t size = i->count;
+		if (size)
+			return ((unsigned long)i->ubuf + i->iov_offset) | size;
+		return 0;
+	}
+
 	/* iovec and kvec have identical layouts */
 	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
 		return iov_iter_alignment_iovec(i);
@@ -1159,6 +1190,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 	size_t size = i->count;
 	unsigned k;
 
+	if (iter_is_ubuf(i))
+		return 0;
+
 	if (WARN_ON(!iter_is_iovec(i)))
 		return ~0U;
 
@@ -1287,7 +1321,19 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	return actual;
 }
 
-/* must be done on non-empty ITER_IOVEC one */
+static unsigned long found_ubuf_segment(unsigned long addr,
+					size_t len,
+					size_t *size, size_t *start,
+					unsigned maxpages)
+{
+	len += (*start = addr % PAGE_SIZE);
+	if (len > maxpages * PAGE_SIZE)
+		len = maxpages * PAGE_SIZE;
+	*size = len;
+	return addr & PAGE_MASK;
+}
+
+/* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
 static unsigned long first_iovec_segment(const struct iov_iter *i,
 					 size_t *size, size_t *start,
 					 size_t maxsize, unsigned maxpages)
@@ -1295,6 +1341,11 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 	size_t skip;
 	long k;
 
+	if (iter_is_ubuf(i)) {
+		unsigned long addr = (unsigned long)i->ubuf + i->iov_offset;
+		return found_ubuf_segment(addr, maxsize, size, start, maxpages);
+	}
+
 	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
 		unsigned long addr = (unsigned long)i->iov[k].iov_base + skip;
 		size_t len = i->iov[k].iov_len - skip;
@@ -1303,11 +1354,7 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 			continue;
 		if (len > maxsize)
 			len = maxsize;
-		len += (*start = addr % PAGE_SIZE);
-		if (len > maxpages * PAGE_SIZE)
-			len = maxpages * PAGE_SIZE;
-		*size = len;
-		return addr & PAGE_MASK;
+		return found_ubuf_segment(addr, len, size, start, maxpages);
 	}
 	BUG(); // if it had been empty, we wouldn't get called
 }
@@ -1344,7 +1391,7 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 	if (!maxsize)
 		return 0;
 
-	if (likely(iter_is_iovec(i))) {
+	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
 		unsigned long addr;
 
@@ -1470,7 +1517,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 	if (!maxsize)
 		return 0;
 
-	if (likely(iter_is_iovec(i))) {
+	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
 		unsigned long addr;
 
@@ -1624,6 +1671,11 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 {
 	if (unlikely(!i->count))
 		return 0;
+	if (likely(iter_is_ubuf(i))) {
+		unsigned offs = offset_in_page(i->ubuf + i->iov_offset);
+		int npages = DIV_ROUND_UP(offs + i->count, PAGE_SIZE);
+		return min(npages, maxpages);
+	}
 	/* iovec and kvec have identical layouts */
 	if (likely(iter_is_iovec(i) || iov_iter_is_kvec(i)))
 		return iov_npages(i, maxpages);
@@ -1862,10 +1914,12 @@ EXPORT_SYMBOL(import_single_range);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state)
 {
 	if (WARN_ON_ONCE(!iov_iter_is_bvec(i) && !iter_is_iovec(i)) &&
-			 !iov_iter_is_kvec(i))
+			 !iov_iter_is_kvec(i) && !iter_is_ubuf(i))
 		return;
 	i->iov_offset = state->iov_offset;
 	i->count = state->count;
+	if (iter_is_ubuf(i))
+		return;
 	/*
 	 * For the *vec iters, nr_segs + iov is constant - if we increment
 	 * the vec, then we also decrement the nr_segs count. Hence we don't
diff --git a/mm/shmem.c b/mm/shmem.c
index a6f565308133..6b83f3971795 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2603,7 +2603,7 @@ static ssize_t shmem_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			ret = copy_page_to_iter(page, offset, nr, to);
 			put_page(page);
 
-		} else if (iter_is_iovec(to)) {
+		} else if (!user_backed_iter(to)) {
 			/*
 			 * Copy to user tends to be so well optimized, but
 			 * clear_user() not so much, that it is noticeably
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread
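[Editorial note, not part of the archived thread] Why ITER_UBUF is cheaper than a one-element iovec can be seen in a minimal userspace sketch: with a single implicit segment, advancing is pure offset arithmetic - there is no segment array to walk or shrink. Names mirror the patch; the surrounding kernel machinery is elided and the kernel's __user annotation is dropped:

```c
/* Minimal model of the new flavour and its advance fast path. */
#include <stdbool.h>
#include <stddef.h>

enum iter_type { ITER_IOVEC, ITER_UBUF };

struct iov_iter {
	enum iter_type iter_type;
	bool user_backed;
	size_t iov_offset;
	size_t count;
	void *ubuf;		/* __user in the kernel */
};

static void iov_iter_ubuf(struct iov_iter *i, void *buf, size_t count)
{
	*i = (struct iov_iter) {
		.iter_type = ITER_UBUF,
		.user_backed = true,
		.ubuf = buf,
		.count = count,
	};
}

static bool iter_is_ubuf(const struct iov_iter *i)
{
	return i->iter_type == ITER_UBUF;
}

static bool user_backed_iter(const struct iov_iter *i)
{
	return i->user_backed;
}

/* the ITER_UBUF branch of iov_iter_advance(): pure arithmetic */
static void iov_iter_advance(struct iov_iter *i, size_t size)
{
	if (size > i->count)
		size = i->count;
	if (iter_is_ubuf(i)) {
		i->iov_offset += size;
		i->count -= size;
	}
	/* other flavours walk their segment arrays here */
}
```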

* [PATCH 09/10] switch new_sync_{read,write}() to ITER_UBUF
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
                             ` (6 preceding siblings ...)
  2022-06-07 23:31           ` [PATCH 08/10] new iov_iter flavour - ITER_UBUF Al Viro
@ 2022-06-07 23:31           ` Al Viro
  2022-06-10 11:11             ` Christian Brauner
  2022-06-07 23:31           ` [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
  8 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/read_write.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index b1b1cdfee9d3..e82e4301cadd 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -389,14 +389,13 @@ EXPORT_SYMBOL(rw_verify_area);
 
 static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
 {
-	struct iovec iov = { .iov_base = buf, .iov_len = len };
 	struct kiocb kiocb;
 	struct iov_iter iter;
 	ssize_t ret;
 
 	init_sync_kiocb(&kiocb, filp);
 	kiocb.ki_pos = (ppos ? *ppos : 0);
-	iov_iter_init(&iter, READ, &iov, 1, len);
+	iov_iter_ubuf(&iter, READ, buf, len);
 
 	ret = call_read_iter(filp, &kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
@@ -492,14 +491,13 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 
 static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
 {
-	struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
 	struct kiocb kiocb;
 	struct iov_iter iter;
 	ssize_t ret;
 
 	init_sync_kiocb(&kiocb, filp);
 	kiocb.ki_pos = (ppos ? *ppos : 0);
-	iov_iter_init(&iter, WRITE, &iov, 1, len);
+	iov_iter_ubuf(&iter, WRITE, (void __user *)buf, len);
 
 	ret = call_write_iter(filp, &kiocb, &iter);
 	BUG_ON(ret == -EIOCBQUEUED);
-- 
2.30.2



* [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter
  2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
                             ` (7 preceding siblings ...)
  2022-06-07 23:31           ` [PATCH 09/10] switch new_sync_{read,write}() to ITER_UBUF Al Viro
@ 2022-06-07 23:31           ` Al Viro
  8 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-07 23:31 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

do what we do for iovec/kvec; that ends up generating better code,
AFAICS.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8275b28e886b..93ceb13ec7b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -870,17 +870,22 @@ static void pipe_advance(struct iov_iter *i, size_t size)
 
 static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 {
-	struct bvec_iter bi;
+	const struct bio_vec *bvec, *end;
 
-	bi.bi_size = i->count;
-	bi.bi_bvec_done = i->iov_offset;
-	bi.bi_idx = 0;
-	bvec_iter_advance(i->bvec, &bi, size);
+	if (!i->count)
+		return;
+	i->count -= size;
+
+	size += i->iov_offset;
 
-	i->bvec += bi.bi_idx;
-	i->nr_segs -= bi.bi_idx;
-	i->count = bi.bi_size;
-	i->iov_offset = bi.bi_bvec_done;
+	for (bvec = i->bvec, end = bvec + i->nr_segs; bvec < end; bvec++) {
+		if (likely(size < bvec->bv_len))
+			break;
+		size -= bvec->bv_len;
+	}
+	i->iov_offset = size;
+	i->nr_segs -= bvec - i->bvec;
+	i->bvec = bvec;
 }
 
 static void iov_iter_iovec_advance(struct iov_iter *i, size_t size)
-- 
2.30.2



* Re: [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
  2022-06-07 16:06     ` Al Viro
  2022-06-07 23:27       ` Al Viro
@ 2022-06-08  6:16       ` Christoph Hellwig
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2022-06-08  6:16 UTC (permalink / raw)
  To: Al Viro; +Cc: Christoph Hellwig, linux-fsdevel, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 04:06:53PM +0000, Al Viro wrote:
> > Also the FUA check below needs to check IOMAP_DIO_NOSYNC as
> > well.
> 
> Does it?  AFAICS, we don't really care about REQ_FUA on any requests - what
> btrfs hack tries to avoid is stepping into
>         if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC))
> 		ret = generic_write_sync(iocb, ret);
> with generic_write_sync() called by btrfs_do_write_iter() after it has
> dropped the lock held through btrfs_direct_write().  Do we want to
> suppress REQ_FUA on the requests generated by __iomap_dio_rw() in
> that case (DSYNC, !SYNC)?  Confused...

Yes.  FUA is used as an optimization so that we can avoid the
generic_write_sync() or similar call if an O_DSYNC write doesn't need
any metadata update.  If the caller already does the generic_write_sync()
equivalent we don't also need FUA.  It is not actually harmful in the
sense that it would give wrong results, but it will kill performance.


* Re: [PATCH 02/10] teach iomap_dio_rw() to suppress dsync
  2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
@ 2022-06-08  6:18             ` Christoph Hellwig
  2022-06-08 15:17             ` Darrick J. Wong
  2022-06-10 11:38             ` Christian Brauner
  2 siblings, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2022-06-08  6:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC
  2022-06-07 23:31           ` [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC Al Viro
@ 2022-06-08  6:18             ` Christoph Hellwig
  2022-06-10 11:09             ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2022-06-08  6:18 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 02/10] teach iomap_dio_rw() to suppress dsync
  2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
  2022-06-08  6:18             ` Christoph Hellwig
@ 2022-06-08 15:17             ` Darrick J. Wong
  2022-06-10 11:38             ` Christian Brauner
  2 siblings, 0 replies; 93+ messages in thread
From: Darrick J. Wong @ 2022-06-08 15:17 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:35PM +0000, Al Viro wrote:
> New flag, equivalent to removal of IOCB_DSYNC from iocb flags.
> This mimics what btrfs is doing (and that's what btrfs will
> switch to).  However, I'm not at all sure that we want to
> suppress REQ_FUA for those - all btrfs hack really cares about
> is suppression of generic_write_sync().  For now let's keep
> the existing behaviour, but I really want to hear more detailed
> arguments pro or contra.
> 
> [folded brain fix from willy]
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/direct-io.c  | 20 +++++++++++---------
>  include/linux/iomap.h |  6 ++++++
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 370c3241618a..c10c69e2de24 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -548,17 +548,19 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		}
>  
>  		/* for data sync or sync, we need sync completion processing */
> -		if (iocb->ki_flags & IOCB_DSYNC)
> +		if (iocb->ki_flags & IOCB_DSYNC &&
> +		    !(dio_flags & IOMAP_DIO_NOSYNC)) {
>  			dio->flags |= IOMAP_DIO_NEED_SYNC;
>  
> -		/*
> -		 * For datasync only writes, we optimistically try using FUA for
> -		 * this IO.  Any non-FUA write that occurs will clear this flag,
> -		 * hence we know before completion whether a cache flush is
> -		 * necessary.
> -		 */
> -		if ((iocb->ki_flags & (IOCB_DSYNC | IOCB_SYNC)) == IOCB_DSYNC)
> -			dio->flags |= IOMAP_DIO_WRITE_FUA;
> +		       /*
> +			* For datasync only writes, we optimistically try
> +			* using FUA for this IO.  Any non-FUA write that
> +			* occurs will clear this flag, hence we know before
> +			* completion whether a cache flush is necessary.
> +			*/
> +			if (!(iocb->ki_flags & IOCB_SYNC))
> +				dio->flags |= IOMAP_DIO_WRITE_FUA;
> +		}
>  	}
>  
>  	if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e552097c67e0..c8622d8f064e 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -353,6 +353,12 @@ struct iomap_dio_ops {
>   */
>  #define IOMAP_DIO_PARTIAL		(1 << 2)
>  
> +/*
> + * The caller will sync the write if needed; do not sync it within
> + * iomap_dio_rw.  Overrides IOMAP_DIO_FORCE_WAIT.
> + */
> +#define IOMAP_DIO_NOSYNC		(1 << 3)
> +
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before);
> -- 
> 2.30.2
> 



* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (8 preceding siblings ...)
  2022-06-07  4:13 ` [PATCH 9/9] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
@ 2022-06-08 19:28 ` Sedat Dilek
  2022-06-08 20:39   ` Al Viro
  2022-06-17 22:30 ` Jens Axboe
  10 siblings, 1 reply; 93+ messages in thread
From: Sedat Dilek @ 2022-06-08 19:28 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Wed, Jun 8, 2022 at 6:48 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
>         Rebased to -rc1 and reordered.  Sits in vfs.git #work.iov_iter,
> individual patches in followups
>
> 1/9: No need of likely/unlikely on calls of check_copy_size()
>         not just in uio.h; the thing is inlined and it has unlikely on
> all paths leading to return false
>
> 2/9: btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
>         new flag for iomap_dio_rw(), telling it to suppress generic_write_sync()
>
> 3/9: struct file: use anonymous union member for rcuhead and llist
>         "f_u" might have been an amusing name, but... we expect anon unions to
> work.
>
> 4/9: iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
>         makes iocb_flags() much cheaper, and it's easier to keep track of
> the places where it can change.
>
> 5/9: keep iocb_flags() result cached in struct file
>         that, along with the previous commit, reduces the overhead of
> new_sync_{read,write}().  struct file doesn't grow - we can keep that
> thing in the same anon union where rcuhead and llist live; that field
> gets used only before ->f_count reaches zero while the other two are
> used only after ->f_count has reached zero.
>
> 6/9: copy_page_{to,from}_iter(): switch iovec variants to generic
>         kmap_local_page() allows that.  And it kills quite a bit of
> code.
>
> 7/9: new iov_iter flavour - ITER_UBUF
>         iovec analogue, with single segment.  That case is fairly common and it
> can be handled with less overhead than full-blown iovec.
>
> 8/9: switch new_sync_{read,write}() to ITER_UBUF
>         ... and this is why it is so common.  Further reduction of overhead
> for new_sync_{read,write}().
>
> 9/9: iov_iter_bvec_advance(): don't bother with bvec_iter
>         AFAICS, variant similar to what we do for iovec/kvec generates better
> code.  Needs profiling, obviously.
>

I have pulled this on top of Linux v5.19-rc1... plus assorted patches
to fix issues with LLVM/Clang version 14.
No (new) warnings in my build-log.
Boots fine on bare metal on my Debian/unstable AMD64 system.

Any hints for testing - to see improvements?

-Sedat-

> Diffstat:
>  arch/powerpc/include/asm/uaccess.h |   2 +-
>  arch/s390/include/asm/uaccess.h    |   4 +-
>  block/fops.c                       |   8 +-
>  drivers/nvme/target/io-cmd-file.c  |   2 +-
>  fs/aio.c                           |   2 +-
>  fs/btrfs/file.c                    |  19 +--
>  fs/btrfs/inode.c                   |   2 +-
>  fs/ceph/file.c                     |   2 +-
>  fs/cifs/file.c                     |   2 +-
>  fs/direct-io.c                     |   4 +-
>  fs/fcntl.c                         |   1 +
>  fs/file_table.c                    |  17 +-
>  fs/fuse/dev.c                      |   4 +-
>  fs/fuse/file.c                     |   4 +-
>  fs/gfs2/file.c                     |   2 +-
>  fs/io_uring.c                      |   2 +-
>  fs/iomap/direct-io.c               |  24 +--
>  fs/nfs/direct.c                    |   2 +-
>  fs/open.c                          |   1 +
>  fs/read_write.c                    |   6 +-
>  fs/zonefs/super.c                  |   2 +-
>  include/linux/fs.h                 |  21 ++-
>  include/linux/iomap.h              |   2 +
>  include/linux/uaccess.h            |   4 +-
>  include/linux/uio.h                |  41 +++--
>  lib/iov_iter.c                     | 308 +++++++++++--------------------------
>  mm/shmem.c                         |   2 +-
>  27 files changed, 191 insertions(+), 299 deletions(-)


* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-08 19:28 ` [RFC][PATCHES] iov_iter stuff Sedat Dilek
@ 2022-06-08 20:39   ` Al Viro
  2022-06-09 19:10     ` Sedat Dilek
  0 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-08 20:39 UTC (permalink / raw)
  To: Sedat Dilek; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Wed, Jun 08, 2022 at 09:28:18PM +0200, Sedat Dilek wrote:

> I have pulled this on top of Linux v5.19-rc1... plus assorted patches
> to fix issues with LLVM/Clang version 14.
> No (new) warnings in my build-log.
> Boots fine on bare metal on my Debian/unstable AMD64 system.
> 
> Any hints for testing - to see improvements?

Profiling, basically...  A somewhat artificial microbenchmark would be
to remove read_null()/write_null()/read_zero()/write_zero(), along with
the corresponding .read and .write initializers in drivers/char/mem.c
and see how dd to/from /dev/zero and friends behaves.  On the mainline
it gives a noticeable regression, due to overhead in new_sync_{read,write}().
With this series it should get better; pipe reads/writes also should see
reduction of overhead.

	There'd been a thread regarding /dev/random stuff; look for
"random: convert to using iters" and things nearby...


* Re: [PATCH 06/10] keep iocb_flags() result cached in struct file
  2022-06-07 23:31           ` [PATCH 06/10] keep iocb_flags() result cached in struct file Al Viro
@ 2022-06-09  0:35             ` Dave Chinner
  2022-06-10 11:43             ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Dave Chinner @ 2022-06-09  0:35 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:39PM +0000, Al Viro wrote:
> * calculate at the time we set FMODE_OPENED (do_dentry_open() for normal
> opens, alloc_file() for pipe()/socket()/etc.)
> * update when handling F_SETFL
> * keep in a new field - file->f_i_flags; since that thing is needed only

Can you name this f_iocb_flags, because I keep reading it the "f_i_"
shorthand as "file_inode_"....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-08 20:39   ` Al Viro
@ 2022-06-09 19:10     ` Sedat Dilek
  2022-06-09 19:22       ` Matthew Wilcox
  2022-06-09 19:45       ` Al Viro
  0 siblings, 2 replies; 93+ messages in thread
From: Sedat Dilek @ 2022-06-09 19:10 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Wed, Jun 8, 2022 at 10:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Jun 08, 2022 at 09:28:18PM +0200, Sedat Dilek wrote:
>
> > I have pulled this on top of Linux v5.19-rc1... plus assorted patches
> > to fix issues with LLVM/Clang version 14.
> > No (new) warnings in my build-log.
> > Boots fine on bare metal on my Debian/unstable AMD64 system.
> >
> > Any hints for testing - to see improvements?
>
> Profiling, basically...  A somewhat artificial microbenchmark would be
> to remove read_null()/write_null()/read_zero()/write_zero(), along with
> the corresponding .read and .write initializers in drivers/char/mem.c
> and see how dd to/from /dev/zero and friends behaves.  On the mainline
> it gives a noticeable regression, due to overhead in new_sync_{read,write}().
> With this series it should get better; pipe reads/writes also should see
> reduction of overhead.
>
>         There'd been a thread regarding /dev/random stuff; look for
> "random: convert to using iters" and things nearby...

Hmm, I did not find it...

I bookmarked Ingo's reply to Boris' x86-usercopy patch.
It has a vague description without (for me at least) concrete instructions.

> So Mel gave me the idea to simply measure how fast the function becomes.
> ...

My SandyBridge-CPU has no FSRM feature, so I'm unsure if I really
benefit from your changes.

My test-cases:

1. LC_ALL=C dd if=/dev/zero of=/dev/null bs=1M count=1M status=progress

2. perf bench mem memcpy (with Debian's perf v5.18 and a selfmade v5.19-rc1)

First test-case shows no measurable/noticeable differences.
The 2nd one I ran for the first time with your changes and did not
compare with a kernel without them.
Link to the 2nd test-case and comments see [1].

In a later version you may add some notes/comments about benchmarking.
"Numbers talk - bullshit walks." Linus T.

-Sedat-

[1] https://lore.kernel.org/all/YpCxt31TKxV5zS3l@gmail.com/


* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-09 19:10     ` Sedat Dilek
@ 2022-06-09 19:22       ` Matthew Wilcox
  2022-06-09 19:58         ` Matthew Wilcox
  2022-06-09 19:45       ` Al Viro
  1 sibling, 1 reply; 93+ messages in thread
From: Matthew Wilcox @ 2022-06-09 19:22 UTC (permalink / raw)
  To: Sedat Dilek; +Cc: Al Viro, linux-fsdevel, Jens Axboe, Christoph Hellwig

On Thu, Jun 09, 2022 at 09:10:04PM +0200, Sedat Dilek wrote:
> On Wed, Jun 8, 2022 at 10:39 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Wed, Jun 08, 2022 at 09:28:18PM +0200, Sedat Dilek wrote:
> >
> > > I have pulled this on top of Linux v5.19-rc1... plus assorted patches
> > > to fix issues with LLVM/Clang version 14.
> > > No (new) warnings in my build-log.
> > > Boots fine on bare metal on my Debian/unstable AMD64 system.
> > >
> > > Any hints for testing - to see improvements?
> >
> > Profiling, basically...  A somewhat artificial microbenchmark would be
> > to remove read_null()/write_null()/read_zero()/write_zero(), along with
> > the corresponding .read and .write initializers in drivers/char/mem.c
> > and see how dd to/from /dev/zero and friends behaves.  On the mainline
> > it gives a noticeable regression, due to overhead in new_sync_{read,write}().
> > With this series it should get better; pipe reads/writes also should see
> > reduction of overhead.
> >
> >         There'd been a thread regarding /dev/random stuff; look for
> > "random: convert to using iters" and things nearby...
> 
> Hmm, I did not find it...
> 
> I bookmarked Ingo's reply on Boris x86-usercopy patch.
> There is a vague description without (for me at least) concrete instructions.

It's not really that.  This is more about per-IO overhead, so you'd want
to do a lot of 1-byte writes to maximise your chance of seeing a
difference.



* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-09 19:10     ` Sedat Dilek
  2022-06-09 19:22       ` Matthew Wilcox
@ 2022-06-09 19:45       ` Al Viro
  1 sibling, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-09 19:45 UTC (permalink / raw)
  To: Sedat Dilek; +Cc: linux-fsdevel, Jens Axboe, Christoph Hellwig, Matthew Wilcox

On Thu, Jun 09, 2022 at 09:10:04PM +0200, Sedat Dilek wrote:

> > So Mel gave me the idea to simply measure how fast the function becomes.
> > ...
> 
> My SandyBridge-CPU has no FSRM feature, so I'm unsure if I really
> benefit from your changes.

What does it have to do with FSRM?

> My test-cases:
> 
> 1. LC_ALL=C dd if=/dev/zero of=/dev/null bs=1M count=1M status=progress
> 
> 2. perf bench mem memcpy (with Debian's perf v5.18 and a selfmade v5.19-rc1)
> 
> First test-case shows no measurable/noticable differences.

No surprise - you hit read() once and write() once per 1Mb worth of clear_user().
If overhead in new_sync_{read,write}() had been _that_ large, the things would've
really sucked.

> The 2nd one I ran for the first time with your changes and did not
> compare with a kernel without them.

????

How could _any_ changes in that series have any impact whatsoever on memcpy()
performance?  Hell, just look at diffstat - nothing in there goes anywhere
near the stuff involved in that test.  Nothing whatsoever in arch/x86; no
changes in lib/ outside of lib/iov_iter.c, etc.

What it does deal with is the overhead of the glue that leads to ->read_iter()
and ->write_iter(), as well as overhead of copy_to_iter()/copy_from_iter()
that becomes noticable on fairly short reads and writes.  It doesn't (and cannot)
do anything for the stuff dominated by the time spent in raw_copy_to_user() or
raw_copy_from_user() - the code responsible for actual copying data between
the kernel and userland memory is completely unaffected by any of that.


* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-09 19:22       ` Matthew Wilcox
@ 2022-06-09 19:58         ` Matthew Wilcox
  0 siblings, 0 replies; 93+ messages in thread
From: Matthew Wilcox @ 2022-06-09 19:58 UTC (permalink / raw)
  To: Sedat Dilek; +Cc: Al Viro, linux-fsdevel, Jens Axboe, Christoph Hellwig

On Thu, Jun 09, 2022 at 08:22:09PM +0100, Matthew Wilcox wrote:
> It's not really that.  This is more about per-IO overhead, so you'd want
> to do a lot of 1-byte writes to maximise your chance of seeing a
> difference.

Here's an earlier thread on ->read_iter vs ->read performance costs:

https://lore.kernel.org/linux-fsdevel/20210107151125.GB5270@casper.infradead.org/


* Re: [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC
  2022-06-07 23:31           ` [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC Al Viro
  2022-06-08  6:18             ` Christoph Hellwig
@ 2022-06-10 11:09             ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-10 11:09 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:36PM +0000, Al Viro wrote:
> ... instead of messing with iocb flags
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Good cleanup,
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH 09/10] switch new_sync_{read,write}() to ITER_UBUF
  2022-06-07 23:31           ` [PATCH 09/10] switch new_sync_{read,write}() to ITER_UBUF Al Viro
@ 2022-06-10 11:11             ` Christian Brauner
  0 siblings, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-10 11:11 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:42PM +0000, Al Viro wrote:
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH 02/10] teach iomap_dio_rw() to suppress dsync
  2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
  2022-06-08  6:18             ` Christoph Hellwig
  2022-06-08 15:17             ` Darrick J. Wong
@ 2022-06-10 11:38             ` Christian Brauner
  2 siblings, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-10 11:38 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:35PM +0000, Al Viro wrote:
> New flag, equivalent to removal of IOCB_DSYNC from iocb flags.
> This mimics what btrfs is doing (and that's what btrfs will
> switch to).  However, I'm not at all sure that we want to
> suppress REQ_FUA for those - all btrfs hack really cares about
> is suppression of generic_write_sync().  For now let's keep
> the existing behaviour, but I really want to hear more detailed
> arguments pro or contra.
> 
> [folded brain fix from willy]
> 
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH 05/10] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
  2022-06-07 23:31           ` [PATCH 05/10] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
@ 2022-06-10 11:41             ` Christian Brauner
  0 siblings, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-10 11:41 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:38PM +0000, Al Viro wrote:
> New helper to be used instead of direct checks for IOCB_DSYNC:
> iocb_is_dsync(iocb).  Checks converted, which allows to avoid
> the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines)
> from iocb_flags() - it's checked in iocb_is_dsync() instead
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me,
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [PATCH 06/10] keep iocb_flags() result cached in struct file
  2022-06-07 23:31           ` [PATCH 06/10] keep iocb_flags() result cached in struct file Al Viro
  2022-06-09  0:35             ` Dave Chinner
@ 2022-06-10 11:43             ` Christian Brauner
  1 sibling, 0 replies; 93+ messages in thread
From: Christian Brauner @ 2022-06-10 11:43 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Tue, Jun 07, 2022 at 11:31:39PM +0000, Al Viro wrote:
> * calculate at the time we set FMODE_OPENED (do_dentry_open() for normal
> opens, alloc_file() for pipe()/socket()/etc.)
> * update when handling F_SETFL
> * keep in a new field - file->f_i_flags; since that thing is needed only
> before the refcount reaches zero, we can put it into the same anon union
> where ->f_rcuhead and ->f_llist live - those are used only after refcount
> reaches zero.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---

Looks good to me (independent of whether that'll be called f_i_flag or f_iocb_flags),
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>


* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
                   ` (9 preceding siblings ...)
  2022-06-08 19:28 ` [RFC][PATCHES] iov_iter stuff Sedat Dilek
@ 2022-06-17 22:30 ` Jens Axboe
  2022-06-17 22:48   ` Al Viro
  10 siblings, 1 reply; 93+ messages in thread
From: Jens Axboe @ 2022-06-17 22:30 UTC (permalink / raw)
  To: Al Viro, linux-fsdevel; +Cc: Christoph Hellwig, Matthew Wilcox

On 6/6/22 10:08 PM, Al Viro wrote:
> 	Rebased to -rc1 and reordered.  Sits in vfs.git #work.iov_iter,
> individual patches in followups
> 
> 1/9: No need of likely/unlikely on calls of check_copy_size()
> 	not just in uio.h; the thing is inlined and it has unlikely on
> all paths leading to return false
> 
> 2/9: btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression
> 	new flag for iomap_dio_rw(), telling it to suppress generic_write_sync()
> 
> 3/9: struct file: use anonymous union member for rcuhead and llist
> 	"f_u" might have been an amusing name, but... we expect anon unions to
> work.
> 
> 4/9: iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
> 	makes iocb_flags() much cheaper, and it's easier to keep track of
> the places where it can change.
> 
> 5/9: keep iocb_flags() result cached in struct file
> 	that, along with the previous commit, reduces the overhead of
> new_sync_{read,write}().  struct file doesn't grow - we can keep that
> thing in the same anon union where rcuhead and llist live; that field
> gets used only before ->f_count reaches zero while the other two are
> used only after ->f_count has reached zero.
> 
> 6/9: copy_page_{to,from}_iter(): switch iovec variants to generic
> 	kmap_local_page() allows that.  And it kills quite a bit of
> code.
> 
> 7/9: new iov_iter flavour - ITER_UBUF
> 	iovec analogue, with single segment.  That case is fairly common and it
> can be handled with less overhead than full-blown iovec.
> 
> 8/9: switch new_sync_{read,write}() to ITER_UBUF
> 	... and this is why it is so common.  Further reduction of overhead
> for new_sync_{read,write}().
> 
> 9/9: iov_iter_bvec_advance(): don't bother with bvec_iter
> 	AFAICS, variant similar to what we do for iovec/kvec generates better
> code.  Needs profiling, obviously.

Al, looks good to me from inspection, and I stuffed this on top
of -git and my 5.20 branch, and did my send/recv/recvmsg io_uring change
on top and see a noticeable reduction there too in some benchmarking.
Feel free to add:

Reviewed-by: Jens Axboe <axboe@kernel.dk>

to the series.

Side note - from the initial series I played with, I still have this one
leftover that I utilize for io_uring:

https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.20/io_uring-iter&id=a59f5c21a6eeb9506163c20aff4846dbec159f47

Doesn't make sense standalone, but I have it as a prep patch.

Can I consider your work.iov_iter stable at this point, or are you still
planning to rebase?

-- 
Jens Axboe



* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-17 22:30 ` Jens Axboe
@ 2022-06-17 22:48   ` Al Viro
  2022-06-18  5:27     ` Al Viro
  0 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-17 22:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, Christoph Hellwig, Matthew Wilcox

On Fri, Jun 17, 2022 at 04:30:49PM -0600, Jens Axboe wrote:

> Al, looks good to me from inspection, and I ported stuffed this on top
> of -git and my 5.20 branch, and did my send/recv/recvmsg io_uring change
> on top and see a noticeable reduction there too for some benchmarking.
> Feel free to add:
> 
> Reviewed-by: Jens Axboe <axboe@kernel.dk>
> 
> to the series.
> 
> Side note - of my initial series I played with, I still have this one
> leftover that I do utilize for io_uring:
> 
> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.20/io_uring-iter&id=a59f5c21a6eeb9506163c20aff4846dbec159f47
> 
> Doesn't make sense standalone, but I have it as a prep patch.
> 
> Can I consider your work.iov_iter stable at this point, or are you still
> planning rebasing?

Umm...  Rebasing this part - probably no; there's a fun followup to it, though,
I'm finishing the carve up & reorder at the moment.  Will post for review
tonight...

Current state:

Al Viro (43):
      No need of likely/unlikely on calls of check_copy_size()
      9p: handling Rerror without copy_from_iter_full()
      teach iomap_dio_rw() to suppress dsync
      btrfs: use IOMAP_DIO_NOSYNC
      struct file: use anonymous union member for rcuhead and llist
      iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC
      keep iocb_flags() result cached in struct file
      copy_page_{to,from}_iter(): switch iovec variants to generic
      new iov_iter flavour - ITER_UBUF
      switch new_sync_{read,write}() to ITER_UBUF
      iov_iter_bvec_advance(): don't bother with bvec_iter
      fix short copy handling in copy_mc_pipe_to_iter()
      splice: stop abusing iov_iter_advance() to flush a pipe
      ITER_PIPE: helper for getting pipe buffer by index
      ITER_PIPE: helpers for adding pipe buffers
      ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives
      ITER_PIPE: fold push_pipe() into __pipe_get_pages()
      ITER_PIPE: lose iter_head argument of __pipe_get_pages()
      ITER_PIPE: clean pipe_advance() up
      ITER_PIPE: clean iov_iter_revert()
      ITER_PIPE: cache the type of last buffer
      fold data_start() and pipe_space_for_user() together
      iov_iter_get_pages{,_alloc}(): cap the maxsize with LONG_MAX
      iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper
      iov_iter_get_pages(): sanity-check arguments
      unify pipe_get_pages() and pipe_get_pages_alloc()
      unify xarray_get_pages() and xarray_get_pages_alloc()
      unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts
      ITER_XARRAY: don't open-code DIV_ROUND_UP()
      iov_iter: lift dealing with maxpages into iov_iter_get_pages()
      iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
      found_iovec_segment(): just return address
      fold __pipe_get_pages() into pipe_get_pages()
      iov_iter: saner helper for page array allocation
      iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
      block: convert to advancing variants of iov_iter_get_pages{,_alloc}()
      iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()
      af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages()
      9p: convert to advancing variant of iov_iter_get_pages_alloc()
      ceph: switch the last caller of iov_iter_get_pages_alloc()
      get rid of non-advancing variants
      pipe_get_pages(): switch to append_pipe()
      expand those iov_iter_advance()...

 arch/powerpc/include/asm/uaccess.h |   2 +-
 arch/s390/include/asm/uaccess.h    |   4 +-
 block/bio.c                        |  15 +-
 block/blk-map.c                    |   7 +-
 block/fops.c                       |   8 +-
 crypto/af_alg.c                    |   3 +-
 crypto/algif_hash.c                |   5 +-
 drivers/nvme/target/io-cmd-file.c  |   2 +-
 drivers/vhost/scsi.c               |   4 +-
 fs/aio.c                           |   2 +-
 fs/btrfs/file.c                    |  19 +-
 fs/btrfs/inode.c                   |   3 +-
 fs/ceph/addr.c                     |   2 +-
 fs/ceph/file.c                     |   5 +-
 fs/cifs/file.c                     |   8 +-
 fs/cifs/misc.c                     |   3 +-
 fs/direct-io.c                     |   7 +-
 fs/fcntl.c                         |   1 +
 fs/file_table.c                    |  17 +-
 fs/fuse/dev.c                      |   7 +-
 fs/fuse/file.c                     |   7 +-
 fs/gfs2/file.c                     |   2 +-
 fs/io_uring.c                      |   2 +-
 fs/iomap/direct-io.c               |  21 +-
 fs/nfs/direct.c                    |   8 +-
 fs/open.c                          |   1 +
 fs/read_write.c                    |   6 +-
 fs/splice.c                        |  54 +-
 fs/zonefs/super.c                  |   2 +-
 include/linux/fs.h                 |  21 +-
 include/linux/iomap.h              |   6 +
 include/linux/pipe_fs_i.h          |  29 +-
 include/linux/uaccess.h            |   4 +-
 include/linux/uio.h                |  50 +-
 lib/iov_iter.c                     | 978 ++++++++++++++-----------------------
 mm/shmem.c                         |   2 +-
 net/9p/client.c                    | 125 +----
 net/9p/protocol.c                  |   3 +-
 net/9p/trans_virtio.c              |  37 +-
 net/core/datagram.c                |   3 +-
 net/core/skmsg.c                   |   3 +-
 net/rds/message.c                  |   3 +-
 net/tls/tls_sw.c                   |   4 +-
 43 files changed, 589 insertions(+), 906 deletions(-)


^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [RFC][PATCHES] iov_iter stuff
  2022-06-17 22:48   ` Al Viro
@ 2022-06-18  5:27     ` Al Viro
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
  0 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-fsdevel, Christoph Hellwig, Matthew Wilcox

On Fri, Jun 17, 2022 at 11:48:01PM +0100, Al Viro wrote:
> On Fri, Jun 17, 2022 at 04:30:49PM -0600, Jens Axboe wrote:
> 
> > Al, looks good to me from inspection, and I stuffed this on top
> > of -git and my 5.20 branch, and did my send/recv/recvmsg io_uring change
> > on top and see a noticeable reduction there too for some benchmarking.
> > Feel free to add:
> > 
> > Reviewed-by: Jens Axboe <axboe@kernel.dk>
> > 
> > to the series.
> > 
> > Side note - of my initial series I played with, I still have this one
> > leftover that I do utilize for io_uring:
> > 
> > https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.20/io_uring-iter&id=a59f5c21a6eeb9506163c20aff4846dbec159f47
> > 
> > Doesn't make sense standalone, but I have it as a prep patch.
> > 
> > Can I consider your work.iov_iter stable at this point, or are you still
> > planning rebasing?
> 
> Umm...  Rebasing this part - probably no; there's a fun followup to it, though,
> I'm finishing the carve up & reorder at the moment.  Will post for review
> tonight...

	This stuff sits on top of #work.iov_iter (as posted a week ago) +
#fixes (one commit, handling of failures halfway through copy_mc_to_iter()
into ITER_PIPE, posted several days ago, backportable minimal fix) +
#work.9p (handling of RERROR on zerocopy 9P read/readdir, posted about
a week ago).  The branch is #work.iov_iter_get_pages; individual patches
in followups.

	NOTE: the older branches are unchanged, but this series on top of
them has been repeatedly carved up, reordered, etc. - there has been a lot
of recent massage, so at this point it should be treated as absolutely
untested.  It can shit over memory and/or chew your filesystems; DON'T
TRY IT OUTSIDE OF A SCRATCH KVM IMAGE.  That said, review and (cautious)
testing would be very welcome.

	Part 1: ITER_PIPE cleanups

ITER_PIPE handling had never been pretty, but by now it has become
really obfuscated and hard to read.  Untangle it a bit.

1) splice: stop abusing iov_iter_advance() to flush a pipe
	A really odd (ab)use of iov_iter_advance() - in case of error
generic_file_splice_read() wants to free all the pipe buffers that
->read_iter() has produced.  Yes, forcibly resetting ->head and
->iov_offset to their original values and calling iov_iter_advance(i, 0)
will trigger pipe_advance(), which will trigger pipe_truncate(), which
will free the buffers.  Or we could just go ahead and free those same
buffers directly; pipe_discard_from() does exactly that, and no iov_iter
machinery needs to be involved.
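
As a userspace sketch (not the kernel code - struct pipe_model, discard_from()
and the fixed-size ring here are invented for illustration), freeing
everything produced past a saved head index amounts to walking the ring
backwards from the current head:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a pipe ring; "freeing" a buffer just marks it unused.
 * head/tail are monotonically increasing, reduced mod ring size (8)
 * only on access - same convention as the kernel pipe ring. */
struct pipe_model {
	bool in_use[8];
	unsigned int head, tail;
};

/* Release every buffer from new_head up to the current head, rolling
 * head back as we go - the moral equivalent of pipe_discard_from(). */
static void discard_from(struct pipe_model *p, unsigned int new_head)
{
	while (p->head > new_head)
		p->in_use[--p->head & 7] = false;
}
```

With the head saved before ->read_iter() runs (to.start_head in the patch),
an error path can hand that saved value straight to the discard helper.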

2) ITER_PIPE: helper for getting pipe buffer by index
	In a lot of places we want to find a pipe_buffer by index;
the expression for that is convoluted and hard to read.  Provide an
inline helper, and convert the trivial open-coded cases.  Eventually
*all* open-coded instances in iov_iter.c will get converted.
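
The indexing convention is easy to model in plain C (a stand-alone
illustrative sketch, not the kernel helper itself; struct ring and
ring_buf() are invented names):

```c
#include <assert.h>

/* Model of the kernel's ring indexing: slot numbers grow monotonically
 * and are reduced modulo the (power-of-two) ring size only at the
 * point of access. */
struct buf { int id; };

struct ring {
	struct buf bufs[8];
	unsigned int ring_size;	/* must be a power of two; 8 here */
};

static struct buf *ring_buf(struct ring *r, unsigned int slot)
{
	/* same shape as pipe->bufs[slot & (pipe->ring_size - 1)] */
	return &r->bufs[slot & (r->ring_size - 1)];
}
```

Because the size is a power of two, `slot & (ring_size - 1)` is the cheap
equivalent of `slot % ring_size`, and wrapped indices like 8 and 9 land on
slots 0 and 1.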

3) ITER_PIPE: helpers for adding pipe buffers
	There are only two kinds of pipe_buffer in the area used by ITER_PIPE.
* anonymous - copy_to_iter() et al. end up creating those and copying data
  there.  They have zero ->offset, and their ->ops points to
  default_pipe_buf_ops.
* zero-copy ones - those come from copy_page_to_iter(), and the page comes
  from the caller.  ->offset is also caller-supplied - it might be non-zero.
  ->ops points to page_cache_pipe_buf_ops.
	Move creation and insertion of those into helpers -
push_anon(pipe, size) and push_page(pipe, page, offset, size) resp., separating
them from the "could we avoid creating a new buffer by merging with the current
head?" logic.

4) ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives
	New helper: append_pipe().  Extends the last buffer if possible,
allocates a new one otherwise.  Returns page and offset in it on success,
NULL on failure.  iov_iter is advanced past the data we've got.
	Use that instead of push_pipe() in copy-to-pipe primitives;
they get simpler that way.  Handling of short copy (in "mc" one)
is done simply by iov_iter_revert() - iov_iter is in consistent
state after that one, so we can use that.
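
The extend-or-allocate logic can be sketched in userspace roughly like
this (a toy model - struct tube, append() and the plain byte counters are
invented for illustration; the real append_pipe() deals in struct page
and pipe_buffer, and can also fail when the pipe is full):

```c
#include <assert.h>

#define PAGE_SIZE 4096u

/* Toy pipe: nbufs buffers allocated so far, last_fill bytes used
 * in the most recent one (0 means no buffer yet). */
struct tube {
	unsigned int nbufs;
	unsigned int last_fill;
};

/* Reserve up to size bytes: extend the last buffer if it has room,
 * otherwise start a fresh one.  Returns the chunk granted and the
 * offset within the (possibly new) last buffer where it begins. */
static unsigned int append(struct tube *t, unsigned int size,
			   unsigned int *off)
{
	if (t->last_fill && t->last_fill < PAGE_SIZE) {
		/* some space in the last buffer - extend it */
		unsigned int chunk = PAGE_SIZE - t->last_fill;
		if (chunk > size)
			chunk = size;
		*off = t->last_fill;
		t->last_fill += chunk;
		return chunk;
	}
	/* need a new buffer */
	*off = 0;
	t->nbufs++;
	t->last_fill = size < PAGE_SIZE ? size : PAGE_SIZE;
	return t->last_fill;
}
```

A copy loop then just keeps calling append() until the request is
satisfied or the helper can't produce more space.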

5) ITER_PIPE: fold push_pipe() into __pipe_get_pages()
	Expand the only remaining call of push_pipe() (in
__pipe_get_pages()), combine it with the page-collecting loop there.
We don't need to bother with i->count checks or calculation of offset
in the first page - the caller already has done that.
	Note that the only reason it's not a loop doing append_pipe()
is that append_pipe() is advancing, while iov_iter_get_pages() is not.
As soon as it switches to saner semantics, this thing will switch
to using append_pipe().

6) ITER_PIPE: lose iter_head argument of __pipe_get_pages()
	Always equal to pipe->head - 1.

7) ITER_PIPE: clean pipe_advance() up
	Don't bother with pipe_truncate(); adjust the buffer
length just as we decide it'll be the last one, then use
pipe_discard_from() to release buffers past that one.

8) ITER_PIPE: clean iov_iter_revert()
	Fold pipe_truncate() in there, clean the things up.

9) ITER_PIPE: cache the type of last buffer
	We often need to find whether the last buffer is anon or not, and
currently it's rather clumsy:
	check if ->iov_offset is non-zero (i.e. that pipe is not empty)
	if so, get the corresponding pipe_buffer and check its ->ops
	if it's &default_pipe_buf_ops, we have an anon buffer.
Let's replace the use of ->iov_offset (which is nowhere near similar to
its role for other flavours) with a signed field (->last_offset), with
the following rules:
	empty, no buffers occupied:		0
	anon, with bytes up to N-1 filled:	N
	zero-copy, with bytes up to N-1 filled:	-N
That way abs(i->last_offset) is equal to what used to be in i->iov_offset
and empty vs. anon vs. zero-copy can be distinguished by the sign of
i->last_offset.
	Checks for "should we extend the last buffer or should we start
a new one?" become easier to follow that way.
	Note that most of the operations can only be done in a sane
state - i.e. when the pipe has nothing past the current position of
iterator.  About the only thing that could be done outside of that
state is iov_iter_advance(), which transitions to the sane state by
truncating the pipe.  There are only two cases where we leave the
sane state:
	1) iov_iter_get_pages()/iov_iter_get_pages_alloc().  Will be
dealt with later, when we make get_pages advancing - the callers are
actually happier that way.
	2) iov_iter copied, then something is put into the copy.  Since
they share the underlying pipe, the original gets behind.  When we
decide that we are done with the copy (original is not usable until then)
we advance the original.  direct_io used to be done that way; nowadays
it operates on the original and we do iov_iter_revert() to discard
the excessive data.  At the moment there's nothing in the kernel that
could do that to ITER_PIPE iterators, so this reason for insane state
is theoretical right now.
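
The sign-encoding rules above can be captured in a small stand-alone
sketch (encode(), kind_of() and fill_of() are invented names for
illustration; the kernel simply reads i->last_offset directly):

```c
#include <assert.h>
#include <stdlib.h>

/* Model of the ->last_offset encoding:
 *    0  - pipe empty, no buffers occupied
 *    N  - anonymous last buffer, bytes 0..N-1 filled
 *   -N  - zero-copy last buffer, bytes 0..N-1 filled */
enum last_kind { EMPTY, ANON, ZEROCOPY };

static int encode(enum last_kind k, int fill)
{
	switch (k) {
	case ANON:     return fill;
	case ZEROCOPY: return -fill;
	default:       return 0;
	}
}

static enum last_kind kind_of(int last_offset)
{
	if (!last_offset)
		return EMPTY;
	return last_offset > 0 ? ANON : ZEROCOPY;
}

static int fill_of(int last_offset)
{
	return abs(last_offset);	/* what used to live in ->iov_offset */
}
```

One signed comparison answers "may I extend the last buffer?", where the
old code had to chase ->ops pointers through the pipe_buffer.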

10) ITER_PIPE: fold data_start() and pipe_space_for_user() together
	All their callers are next to each other; all of them
want the total number of pages and, possibly, the offset in the
partial final buffer.
	Combine them into a new helper (pipe_npages()) and fix the
bogosity in pipe_space_for_user() while we are at it.

	Part 2: iov_iter_get_pages()/iov_iter_get_pages_alloc() unification

	There's a lot of duplication between iov_iter_get_pages() and
iov_iter_get_pages_alloc().  With some massage it can be eliminated,
along with some of the cruft accumulated there.

	Flavour-independent argument validation and, for ..._alloc(),
cleanup handling on failure:
11) iov_iter_get_pages{,_alloc}(): cap the maxsize with LONG_MAX
12) iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper
13) iov_iter_get_pages(): sanity-check arguments

	Mechanically merge parallel ..._get_pages() and ..._get_pages_alloc().
14) unify pipe_get_pages() and pipe_get_pages_alloc()
15) unify xarray_get_pages() and xarray_get_pages_alloc()
16) unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts

	Decrufting for XARRAY:
17) ITER_XARRAY: don't open-code DIV_ROUND_UP()
	Decrufting for UBUF/IOVEC/BVEC:
18) iov_iter: lift dealing with maxpages into iov_iter_get_pages()
19) iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
20) found_iovec_segment(): just return address
	Decrufting for PIPE:
21) fold __pipe_get_pages() into pipe_get_pages()

	Collapsing the bits that differ for get_pages and get_pages_alloc
cases into a common helper:
22) iov_iter: saner helper for page array allocation

	Part 3: making iov_iter_get_pages{,_alloc}() advancing

	Most of the callers follow a successful ...get_pages... call with an
advance by the amount it had reported.  For some it's unconditional; for
some it might end up being less in certain cases.  All of them would be
fine with advancing variants of those primitives - those that might want
to advance by less than reported can simply revert by the difference of
the two amounts.
	Rather than doing a flagday change (they are exported and signatures
remain unchanged), replacement variants are added (iov_iter_get_pages2()
and iov_iter_get_pages_alloc2(), initially as wrappers).  By the end of
the series everything is converted to those and the old ones are removed.
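
The advance-then-revert calling pattern everything gets converted to can
be modelled with a toy iterator (struct it, get_and_advance() and
revert() are invented stand-ins for iov_iter_get_pages2() and
iov_iter_revert(); the real primitives also fill a page array):

```c
#include <assert.h>

/* Toy iterator: just a position and a remaining byte count. */
struct it {
	unsigned long pos, count;
};

/* Model of an advancing get_pages: grants up to want bytes and
 * advances the iterator past them, like iov_iter_get_pages2(). */
static unsigned long get_and_advance(struct it *i, unsigned long want)
{
	unsigned long got = want < i->count ? want : i->count;

	i->pos += got;
	i->count -= got;
	return got;
}

/* Model of iov_iter_revert(): give back n bytes. */
static void revert(struct it *i, unsigned long n)
{
	i->pos -= n;
	i->count += n;
}
```

A caller that ends up consuming less than was reported reverts by the
difference, leaving the iterator exactly past the data it actually used.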

23) iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
24) block: convert to advancing variants of iov_iter_get_pages{,_alloc}()
25) iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()
26) af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages()
27) 9p: convert to advancing variant of iov_iter_get_pages_alloc()
28) ceph: switch the last caller of iov_iter_get_pages_alloc()
29) get rid of non-advancing variants

	Part 4: cleanups
30) pipe_get_pages(): switch to append_pipe()
31) expand those iov_iter_advance()...

Overall diffstat:

 arch/powerpc/include/asm/uaccess.h |   2 +-
 arch/s390/include/asm/uaccess.h    |   4 +-
 block/bio.c                        |  15 +-
 block/blk-map.c                    |   7 +-
 block/fops.c                       |   8 +-
 crypto/af_alg.c                    |   3 +-
 crypto/algif_hash.c                |   5 +-
 drivers/nvme/target/io-cmd-file.c  |   2 +-
 drivers/vhost/scsi.c               |   4 +-
 fs/aio.c                           |   2 +-
 fs/btrfs/file.c                    |  19 +-
 fs/btrfs/inode.c                   |   3 +-
 fs/ceph/addr.c                     |   2 +-
 fs/ceph/file.c                     |   5 +-
 fs/cifs/file.c                     |   8 +-
 fs/cifs/misc.c                     |   3 +-
 fs/direct-io.c                     |   7 +-
 fs/fcntl.c                         |   1 +
 fs/file_table.c                    |  17 +-
 fs/fuse/dev.c                      |   7 +-
 fs/fuse/file.c                     |   7 +-
 fs/gfs2/file.c                     |   2 +-
 fs/io_uring.c                      |   2 +-
 fs/iomap/direct-io.c               |  21 +-
 fs/nfs/direct.c                    |   8 +-
 fs/open.c                          |   1 +
 fs/read_write.c                    |   6 +-
 fs/splice.c                        |  54 +-
 fs/zonefs/super.c                  |   2 +-
 include/linux/fs.h                 |  21 +-
 include/linux/iomap.h              |   6 +
 include/linux/pipe_fs_i.h          |  29 +-
 include/linux/uaccess.h            |   4 +-
 include/linux/uio.h                |  50 +-
 lib/iov_iter.c                     | 978 ++++++++++++++-----------------------
 mm/shmem.c                         |   2 +-
 net/9p/client.c                    | 125 +----
 net/9p/protocol.c                  |   3 +-
 net/9p/trans_virtio.c              |  37 +-
 net/core/datagram.c                |   3 +-
 net/core/skmsg.c                   |   3 +-
 net/rds/message.c                  |   3 +-
 net/tls/tls_sw.c                   |   4 +-
 43 files changed, 589 insertions(+), 906 deletions(-)


* [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe
  2022-06-18  5:27     ` Al Viro
@ 2022-06-18  5:35       ` Al Viro
  2022-06-18  5:35         ` [PATCH 02/31] ITER_PIPE: helper for getting pipe buffer by index Al Viro
                           ` (30 more replies)
  0 siblings, 31 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Use pipe_discard_from() explicitly in generic_file_splice_read(); don't
bother with the rather non-obvious use of iov_iter_advance() in there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 047b79db8eb5..6645b30ec990 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -301,11 +301,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 {
 	struct iov_iter to;
 	struct kiocb kiocb;
-	unsigned int i_head;
 	int ret;
 
 	iov_iter_pipe(&to, READ, pipe, len);
-	i_head = to.head;
 	init_sync_kiocb(&kiocb, in);
 	kiocb.ki_pos = *ppos;
 	ret = call_read_iter(in, &kiocb, &to);
@@ -313,9 +311,8 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 		*ppos = kiocb.ki_pos;
 		file_accessed(in);
 	} else if (ret < 0) {
-		to.head = i_head;
-		to.iov_offset = 0;
-		iov_iter_advance(&to, 0); /* to free what was emitted */
+		/* free what was emitted */
+		pipe_discard_from(pipe, to.start_head);
 		/*
 		 * callers of ->splice_read() expect -EAGAIN on
 		 * "can't put anything in there", rather than -EFAULT.
-- 
2.30.2



* [PATCH 02/31] ITER_PIPE: helper for getting pipe buffer by index
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 03/31] ITER_PIPE: helpers for adding pipe buffers Al Viro
                           ` (29 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

pipe_buffer instances of a pipe are organized as a ring buffer,
with power-of-2 size.  Indices are kept *not* reduced modulo ring
size, so the buffer referred to by index N is
	pipe->bufs[N & (pipe->ring_size - 1)].

Ring size can change over the lifetime of a pipe, but not while
the pipe is locked.  So for any iov_iter primitives it's a constant.
The original conversion of pipes to this layout went overboard trying
to microoptimize that - calculating pipe->ring_size - 1, storing
it in a local variable and using it throughout the function.  In some
cases that might be warranted, but most of the time it only
obfuscates what's going on in there.

Introduce a helper (pipe_buf(pipe, N)) that would encapsulate
that and use it in the obvious cases.  More will follow...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d00cc8971b5b..08bb393da677 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -183,13 +183,18 @@ static int copyin(void *to, const void __user *from, size_t n)
 	return n;
 }
 
+static inline struct pipe_buffer *pipe_buf(const struct pipe_inode_info *pipe,
+					   unsigned int slot)
+{
+	return &pipe->bufs[slot & (pipe->ring_size - 1)];
+}
+
 #ifdef PIPE_PARANOIA
 static bool sanity(const struct iov_iter *i)
 {
 	struct pipe_inode_info *pipe = i->pipe;
 	unsigned int p_head = pipe->head;
 	unsigned int p_tail = pipe->tail;
-	unsigned int p_mask = pipe->ring_size - 1;
 	unsigned int p_occupancy = pipe_occupancy(p_head, p_tail);
 	unsigned int i_head = i->head;
 	unsigned int idx;
@@ -201,7 +206,7 @@ static bool sanity(const struct iov_iter *i)
 		if (unlikely(i_head != p_head - 1))
 			goto Bad;	// must be at the last buffer...
 
-		p = &pipe->bufs[i_head & p_mask];
+		p = pipe_buf(pipe, i_head);
 		if (unlikely(p->offset + p->len != i->iov_offset))
 			goto Bad;	// ... at the end of segment
 	} else {
@@ -386,11 +391,10 @@ static inline bool allocated(struct pipe_buffer *buf)
 static inline void data_start(const struct iov_iter *i,
 			      unsigned int *iter_headp, size_t *offp)
 {
-	unsigned int p_mask = i->pipe->ring_size - 1;
 	unsigned int iter_head = i->head;
 	size_t off = i->iov_offset;
 
-	if (off && (!allocated(&i->pipe->bufs[iter_head & p_mask]) ||
+	if (off && (!allocated(pipe_buf(i->pipe, iter_head)) ||
 		    off == PAGE_SIZE)) {
 		iter_head++;
 		off = 0;
@@ -1180,10 +1184,9 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 		return iov_iter_alignment_bvec(i);
 
 	if (iov_iter_is_pipe(i)) {
-		unsigned int p_mask = i->pipe->ring_size - 1;
 		size_t size = i->count;
 
-		if (size && i->iov_offset && allocated(&i->pipe->bufs[i->head & p_mask]))
+		if (size && i->iov_offset && allocated(pipe_buf(i->pipe, i->head)))
 			return size | i->iov_offset;
 		return size;
 	}
-- 
2.30.2



* [PATCH 03/31] ITER_PIPE: helpers for adding pipe buffers
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
  2022-06-18  5:35         ` [PATCH 02/31] ITER_PIPE: helper for getting pipe buffer by index Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 04/31] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives Al Viro
                           ` (28 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

There are only two kinds of pipe_buffer in the area used by ITER_PIPE.

1) anonymous - copy_to_iter() et al. end up creating those and copying
data there.  They have zero ->offset, and their ->ops points to
default_pipe_buf_ops.

2) zero-copy ones - those come from copy_page_to_iter(), and the page
comes from the caller.  ->offset is also caller-supplied - it might be
non-zero.  ->ops points to page_cache_pipe_buf_ops.

Move creation and insertion of those into helpers - push_anon(pipe, size)
and push_page(pipe, page, offset, size) resp., separating them from
the "could we avoid creating a new buffer by merging with the current
head?" logic.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 88 ++++++++++++++++++++++++++------------------------
 1 file changed, 46 insertions(+), 42 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 08bb393da677..924854c2a7ce 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -231,15 +231,39 @@ static bool sanity(const struct iov_iter *i)
 #define sanity(i) true
 #endif
 
+static struct page *push_anon(struct pipe_inode_info *pipe, unsigned size)
+{
+	struct page *page = alloc_page(GFP_USER);
+	if (page) {
+		struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++);
+		*buf = (struct pipe_buffer) {
+			.ops = &default_pipe_buf_ops,
+			.page = page,
+			.offset = 0,
+			.len = size
+		};
+	}
+	return page;
+}
+
+static void push_page(struct pipe_inode_info *pipe, struct page *page,
+			unsigned int offset, unsigned int size)
+{
+	struct pipe_buffer *buf = pipe_buf(pipe, pipe->head++);
+	*buf = (struct pipe_buffer) {
+		.ops = &page_cache_pipe_buf_ops,
+		.page = page,
+		.offset = offset,
+		.len = size
+	};
+	get_page(page);
+}
+
 static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
 	struct pipe_inode_info *pipe = i->pipe;
-	struct pipe_buffer *buf;
-	unsigned int p_tail = pipe->tail;
-	unsigned int p_mask = pipe->ring_size - 1;
-	unsigned int i_head = i->head;
-	size_t off;
+	unsigned int head = pipe->head;
 
 	if (unlikely(bytes > i->count))
 		bytes = i->count;
@@ -250,32 +274,21 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
 	if (!sanity(i))
 		return 0;
 
-	off = i->iov_offset;
-	buf = &pipe->bufs[i_head & p_mask];
-	if (off) {
-		if (offset == off && buf->page == page) {
-			/* merge with the last one */
+	if (offset && i->iov_offset == offset) { // could we merge it?
+		struct pipe_buffer *buf = pipe_buf(pipe, head - 1);
+		if (buf->page == page) {
 			buf->len += bytes;
 			i->iov_offset += bytes;
-			goto out;
+			i->count -= bytes;
+			return bytes;
 		}
-		i_head++;
-		buf = &pipe->bufs[i_head & p_mask];
 	}
-	if (pipe_full(i_head, p_tail, pipe->max_usage))
+	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
 		return 0;
 
-	buf->ops = &page_cache_pipe_buf_ops;
-	buf->flags = 0;
-	get_page(page);
-	buf->page = page;
-	buf->offset = offset;
-	buf->len = bytes;
-
-	pipe->head = i_head + 1;
+	push_page(pipe, page, offset, bytes);
 	i->iov_offset = offset + bytes;
-	i->head = i_head;
-out:
+	i->head = head;
 	i->count -= bytes;
 	return bytes;
 }
@@ -407,8 +420,6 @@ static size_t push_pipe(struct iov_iter *i, size_t size,
 			int *iter_headp, size_t *offp)
 {
 	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_tail = pipe->tail;
-	unsigned int p_mask = pipe->ring_size - 1;
 	unsigned int iter_head;
 	size_t off;
 	ssize_t left;
@@ -423,30 +434,23 @@ static size_t push_pipe(struct iov_iter *i, size_t size,
 	*iter_headp = iter_head;
 	*offp = off;
 	if (off) {
+		struct pipe_buffer *buf = pipe_buf(pipe, iter_head);
+
 		left -= PAGE_SIZE - off;
 		if (left <= 0) {
-			pipe->bufs[iter_head & p_mask].len += size;
+			buf->len += size;
 			return size;
 		}
-		pipe->bufs[iter_head & p_mask].len = PAGE_SIZE;
-		iter_head++;
+		buf->len = PAGE_SIZE;
 	}
-	while (!pipe_full(iter_head, p_tail, pipe->max_usage)) {
-		struct pipe_buffer *buf = &pipe->bufs[iter_head & p_mask];
-		struct page *page = alloc_page(GFP_USER);
+	while (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct page *page = push_anon(pipe,
+					      min_t(ssize_t, left, PAGE_SIZE));
 		if (!page)
 			break;
 
-		buf->ops = &default_pipe_buf_ops;
-		buf->flags = 0;
-		buf->page = page;
-		buf->offset = 0;
-		buf->len = min_t(ssize_t, left, PAGE_SIZE);
-		left -= buf->len;
-		iter_head++;
-		pipe->head = iter_head;
-
-		if (left == 0)
+		left -= PAGE_SIZE;
+		if (left <= 0)
 			return size;
 	}
 	return size - left;
-- 
2.30.2



* [PATCH 04/31] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
  2022-06-18  5:35         ` [PATCH 02/31] ITER_PIPE: helper for getting pipe buffer by index Al Viro
  2022-06-18  5:35         ` [PATCH 03/31] ITER_PIPE: helpers for adding pipe buffers Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-19  1:34           ` Al Viro
  2022-06-18  5:35         ` [PATCH 05/31] ITER_PIPE: fold push_pipe() into __pipe_get_pages() Al Viro
                           ` (27 subsequent siblings)
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

New helper: append_pipe().  Extends the last buffer if possible,
allocates a new one otherwise.  Returns page and offset in it
on success, NULL on failure.  iov_iter is advanced past the
data we've got.

Use that instead of push_pipe() in copy-to-pipe primitives;
they get simpler that way.  Handling of short copy (in "mc" one)
is done simply by iov_iter_revert() - iov_iter is in consistent
state after that one, so we can use that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 159 +++++++++++++++++++++++++++++--------------------
 1 file changed, 93 insertions(+), 66 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 924854c2a7ce..d23e4ccd0564 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -259,6 +259,44 @@ static void push_page(struct pipe_inode_info *pipe, struct page *page,
 	get_page(page);
 }
 
+static inline bool allocated(struct pipe_buffer *buf)
+{
+	return buf->ops == &default_pipe_buf_ops;
+}
+
+static struct page *append_pipe(struct iov_iter *i, size_t size, size_t *off)
+{
+	struct pipe_inode_info *pipe = i->pipe;
+	size_t offset = i->iov_offset;
+	struct pipe_buffer *buf;
+	struct page *page;
+
+	if (offset && offset < PAGE_SIZE) {
+		// some space in the last buffer; can we add to it?
+		buf = pipe_buf(pipe, pipe->head - 1);
+		if (allocated(buf)) {
+			size = min_t(size_t, size, PAGE_SIZE - offset);
+			buf->len += size;
+			i->iov_offset += size;
+			i->count -= size;
+			*off = offset;
+			return buf->page;
+		}
+	}
+	// OK, we need a new buffer
+	*off = 0;
+	size = min_t(size_t, size, PAGE_SIZE);
+	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+		return NULL;
+	page = push_anon(pipe, size);
+	if (!page)
+		return NULL;
+	i->head = pipe->head - 1;
+	i->iov_offset = size;
+	i->count -= size;
+	return page;
+}
+
 static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i)
 {
@@ -396,11 +434,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_init);
 
-static inline bool allocated(struct pipe_buffer *buf)
-{
-	return buf->ops == &default_pipe_buf_ops;
-}
-
 static inline void data_start(const struct iov_iter *i,
 			      unsigned int *iter_headp, size_t *offp)
 {
@@ -459,28 +492,26 @@ static size_t push_pipe(struct iov_iter *i, size_t size,
 static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
 				struct iov_iter *i)
 {
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_mask = pipe->ring_size - 1;
-	unsigned int i_head;
 	size_t n, off;
 
-	if (!sanity(i))
+	if (unlikely(bytes > i->count))
+		bytes = i->count;
+	if (unlikely(!bytes))
 		return 0;
 
-	bytes = n = push_pipe(i, bytes, &i_head, &off);
-	if (unlikely(!n))
+	if (!sanity(i))
 		return 0;
-	do {
+
+	n = bytes;
+	while (n) {
+		struct page *page = append_pipe(i, n, &off);
 		size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
-		memcpy_to_page(pipe->bufs[i_head & p_mask].page, off, addr, chunk);
-		i->head = i_head;
-		i->iov_offset = off + chunk;
-		n -= chunk;
+		if (!page)
+			break;
+		memcpy_to_page(page, off, addr, chunk);
 		addr += chunk;
-		off = 0;
-		i_head++;
-	} while (n);
-	i->count -= bytes;
+		n -= chunk;
+	}
 	return bytes;
 }
 
@@ -494,31 +525,32 @@ static __wsum csum_and_memcpy(void *to, const void *from, size_t len,
 static size_t csum_and_copy_to_pipe_iter(const void *addr, size_t bytes,
 					 struct iov_iter *i, __wsum *sump)
 {
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_mask = pipe->ring_size - 1;
 	__wsum sum = *sump;
 	size_t off = 0;
-	unsigned int i_head;
 	size_t r;
 
+	if (unlikely(bytes > i->count))
+		bytes = i->count;
+	if (unlikely(!bytes))
+		return 0;
+
 	if (!sanity(i))
 		return 0;
 
-	bytes = push_pipe(i, bytes, &i_head, &r);
 	while (bytes) {
+		struct page *page = append_pipe(i, bytes, &r);
 		size_t chunk = min_t(size_t, bytes, PAGE_SIZE - r);
-		char *p = kmap_local_page(pipe->bufs[i_head & p_mask].page);
+		char *p;
+
+		if (!page)
+			break;
+		p = kmap_local_page(page);
 		sum = csum_and_memcpy(p + r, addr + off, chunk, sum, off);
 		kunmap_local(p);
-		i->head = i_head;
-		i->iov_offset = r + chunk;
-		bytes -= chunk;
 		off += chunk;
-		r = 0;
-		i_head++;
+		bytes -= chunk;
 	}
 	*sump = sum;
-	i->count -= off;
 	return off;
 }
 
@@ -550,39 +582,35 @@ static int copyout_mc(void __user *to, const void *from, size_t n)
 static size_t copy_mc_pipe_to_iter(const void *addr, size_t bytes,
 				struct iov_iter *i)
 {
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_mask = pipe->ring_size - 1;
-	unsigned int i_head;
-	unsigned int valid = pipe->head;
-	size_t n, off, xfer = 0;
+	size_t off, xfer = 0;
+
+	if (unlikely(bytes > i->count))
+		bytes = i->count;
+	if (unlikely(!bytes))
+		return 0;
 
 	if (!sanity(i))
 		return 0;
 
-	n = push_pipe(i, bytes, &i_head, &off);
-	while (n) {
-		size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
-		char *p = kmap_local_page(pipe->bufs[i_head & p_mask].page);
+	while (bytes) {
+		struct page *page = append_pipe(i, bytes, &off);
+		size_t chunk = min_t(size_t, bytes, PAGE_SIZE - off);
 		unsigned long rem;
+		char *p;
+
+		if (!page)
+			break;
+		p = kmap_local_page(page);
 		rem = copy_mc_to_kernel(p + off, addr + xfer, chunk);
 		chunk -= rem;
 		kunmap_local(p);
-		if (chunk) {
-			i->head = i_head;
-			i->iov_offset = off + chunk;
-			xfer += chunk;
-			valid = i_head + 1;
-		}
+		xfer += chunk;
+		bytes -= chunk;
 		if (rem) {
-			pipe->bufs[i_head & p_mask].len -= rem;
-			pipe_discard_from(pipe, valid);
+			iov_iter_revert(i, rem);
 			break;
 		}
-		n -= chunk;
-		off = 0;
-		i_head++;
 	}
-	i->count -= xfer;
 	return xfer;
 }
 
@@ -769,30 +797,29 @@ EXPORT_SYMBOL(copy_page_from_iter);
 
 static size_t pipe_zero(size_t bytes, struct iov_iter *i)
 {
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_mask = pipe->ring_size - 1;
-	unsigned int i_head;
 	size_t n, off;
 
-	if (!sanity(i))
+	if (unlikely(bytes > i->count))
+		bytes = i->count;
+	if (unlikely(!bytes))
 		return 0;
 
-	bytes = n = push_pipe(i, bytes, &i_head, &off);
-	if (unlikely(!n))
+	if (!sanity(i))
 		return 0;
 
-	do {
+	n = bytes;
+	while (n) {
+		struct page *page = append_pipe(i, n, &off);
 		size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
-		char *p = kmap_local_page(pipe->bufs[i_head & p_mask].page);
+		char *p;
+
+		if (!page)
+			break;
+		p = kmap_local_page(page);
 		memset(p + off, 0, chunk);
 		kunmap_local(p);
-		i->head = i_head;
-		i->iov_offset = off + chunk;
 		n -= chunk;
-		off = 0;
-		i_head++;
-	} while (n);
-	i->count -= bytes;
+	}
-	return bytes;
+	return bytes - n;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 05/31] ITER_PIPE: fold push_pipe() into __pipe_get_pages()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (2 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 04/31] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 06/31] ITER_PIPE: lose iter_head argument of __pipe_get_pages() Al Viro
                           ` (26 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

	Expand the only remaining call of push_pipe() (in
__pipe_get_pages()), combine it with the page-collecting loop there.

Note that the only reason it's not a loop doing append_pipe() is
that append_pipe() is advancing, while iov_iter_get_pages() is not.
As soon as it switches to saner semantics, this thing will switch
to using append_pipe().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 80 ++++++++++++++++----------------------------------
 1 file changed, 25 insertions(+), 55 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index d23e4ccd0564..603e5a55fe4e 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -449,46 +449,6 @@ static inline void data_start(const struct iov_iter *i,
 	*offp = off;
 }
 
-static size_t push_pipe(struct iov_iter *i, size_t size,
-			int *iter_headp, size_t *offp)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int iter_head;
-	size_t off;
-	ssize_t left;
-
-	if (unlikely(size > i->count))
-		size = i->count;
-	if (unlikely(!size))
-		return 0;
-
-	left = size;
-	data_start(i, &iter_head, &off);
-	*iter_headp = iter_head;
-	*offp = off;
-	if (off) {
-		struct pipe_buffer *buf = pipe_buf(pipe, iter_head);
-
-		left -= PAGE_SIZE - off;
-		if (left <= 0) {
-			buf->len += size;
-			return size;
-		}
-		buf->len = PAGE_SIZE;
-	}
-	while (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
-		struct page *page = push_anon(pipe,
-					      min_t(ssize_t, left, PAGE_SIZE));
-		if (!page)
-			break;
-
-		left -= PAGE_SIZE;
-		if (left <= 0)
-			return size;
-	}
-	return size - left;
-}
-
 static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
 				struct iov_iter *i)
 {
@@ -1261,23 +1221,33 @@ static inline ssize_t __pipe_get_pages(struct iov_iter *i,
 				size_t maxsize,
 				struct page **pages,
 				int iter_head,
-				size_t *start)
+				size_t off)
 {
 	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_mask = pipe->ring_size - 1;
-	ssize_t n = push_pipe(i, maxsize, &iter_head, start);
-	if (!n)
-		return -EFAULT;
+	ssize_t left = maxsize;
 
-	maxsize = n;
-	n += *start;
-	while (n > 0) {
-		get_page(*pages++ = pipe->bufs[iter_head & p_mask].page);
-		iter_head++;
-		n -= PAGE_SIZE;
-	}
+	if (off) {
+		struct pipe_buffer *buf = pipe_buf(pipe, iter_head);
 
-	return maxsize;
+		get_page(*pages++ = buf->page);
+		left -= PAGE_SIZE - off;
+		if (left <= 0) {
+			buf->len += maxsize;
+			return maxsize;
+		}
+		buf->len = PAGE_SIZE;
+	}
+	while (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
+		struct page *page = push_anon(pipe,
+					      min_t(ssize_t, left, PAGE_SIZE));
+		if (!page)
+			break;
+		get_page(*pages++ = page);
+		left -= PAGE_SIZE;
+		if (left <= 0)
+			return maxsize;
+	}
+	return maxsize - left ? : -EFAULT;
 }
 
 static ssize_t pipe_get_pages(struct iov_iter *i,
@@ -1295,7 +1265,7 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe);
 	capacity = min(npages, maxpages) * PAGE_SIZE - *start;
 
-	return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start);
+	return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, *start);
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@ -1491,7 +1461,7 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 	p = get_pages_array(npages);
 	if (!p)
 		return -ENOMEM;
-	n = __pipe_get_pages(i, maxsize, p, iter_head, start);
+	n = __pipe_get_pages(i, maxsize, p, iter_head, *start);
 	if (n > 0)
 		*pages = p;
 	else
-- 
2.30.2



* [PATCH 06/31] ITER_PIPE: lose iter_head argument of __pipe_get_pages()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (3 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 05/31] ITER_PIPE: fold push_pipe() into __pipe_get_pages() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 07/31] ITER_PIPE: clean pipe_advance() up Al Viro
                           ` (25 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

it's only used to get to the partial buffer we can add to,
and that's always the last one, i.e. pipe->head - 1.
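[Editor's note: a minimal userspace sketch of the ring indexing assumed above, not kernel code; RING_SIZE and the struct names are illustrative.]

```c
#include <assert.h>

/* Sketch of the indexing convention: pipe slots live in a power-of-two
 * array, pipe_buf(pipe, n) maps to bufs[n & (ring_size - 1)], and the head
 * counter grows without wrapping - so the most recently pushed buffer (the
 * only candidate for a partial append) is always slot head - 1. */
#define RING_SIZE 8u			/* must be a power of two */

struct buf { int id; };

struct ring {
	unsigned int head;		/* next slot to fill; never wraps */
	unsigned int tail;		/* oldest occupied slot */
	struct buf bufs[RING_SIZE];
};

static struct buf *ring_buf(struct ring *r, unsigned int n)	/* ~pipe_buf() */
{
	return &r->bufs[n & (RING_SIZE - 1)];
}

static struct buf *push(struct ring *r, int id)
{
	struct buf *b = ring_buf(r, r->head++);
	b->id = id;
	return b;
}

static struct buf *last_buf(struct ring *r)	/* "always pipe->head - 1" */
{
	return ring_buf(r, r->head - 1);
}
```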

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 603e5a55fe4e..892810c6ec61 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1220,14 +1220,13 @@ EXPORT_SYMBOL(iov_iter_gap_alignment);
 static inline ssize_t __pipe_get_pages(struct iov_iter *i,
 				size_t maxsize,
 				struct page **pages,
-				int iter_head,
 				size_t off)
 {
 	struct pipe_inode_info *pipe = i->pipe;
 	ssize_t left = maxsize;
 
 	if (off) {
-		struct pipe_buffer *buf = pipe_buf(pipe, iter_head);
+		struct pipe_buffer *buf = pipe_buf(pipe, pipe->head - 1);
 
 		get_page(*pages++ = buf->page);
 		left -= PAGE_SIZE - off;
@@ -1265,7 +1264,7 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe);
 	capacity = min(npages, maxpages) * PAGE_SIZE - *start;
 
-	return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, *start);
+	return __pipe_get_pages(i, min(maxsize, capacity), pages, *start);
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@ -1461,7 +1460,7 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 	p = get_pages_array(npages);
 	if (!p)
 		return -ENOMEM;
-	n = __pipe_get_pages(i, maxsize, p, iter_head, *start);
+	n = __pipe_get_pages(i, maxsize, p, *start);
 	if (n > 0)
 		*pages = p;
 	else
-- 
2.30.2



* [PATCH 07/31] ITER_PIPE: clean pipe_advance() up
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (4 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 06/31] ITER_PIPE: lose iter_head argument of __pipe_get_pages() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 08/31] ITER_PIPE: clean iov_iter_revert() Al Viro
                           ` (24 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

instead of setting ->iov_offset for new position and calling
pipe_truncate() to adjust ->len of the last buffer and discard
everything after it, adjust ->len at the same time we set ->iov_offset
and use pipe_discard_from() to deal with buffers past that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 892810c6ec61..ce2ce5b0c600 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -847,27 +847,27 @@ static inline void pipe_truncate(struct iov_iter *i)
 static void pipe_advance(struct iov_iter *i, size_t size)
 {
 	struct pipe_inode_info *pipe = i->pipe;
-	if (size) {
-		struct pipe_buffer *buf;
-		unsigned int p_mask = pipe->ring_size - 1;
-		unsigned int i_head = i->head;
-		size_t off = i->iov_offset, left = size;
+	unsigned int off = i->iov_offset;
 
+	if (!off && !size) {
+		pipe_discard_from(pipe, i->start_head); // discard everything
+		return;
+	}
+	i->count -= size;
+	while (1) {
+		struct pipe_buffer *buf = pipe_buf(pipe, i->head);
 		if (off) /* make it relative to the beginning of buffer */
-			left += off - pipe->bufs[i_head & p_mask].offset;
-		while (1) {
-			buf = &pipe->bufs[i_head & p_mask];
-			if (left <= buf->len)
-				break;
-			left -= buf->len;
-			i_head++;
+			size += off - buf->offset;
+		if (size <= buf->len) {
+			buf->len = size;
+			i->iov_offset = buf->offset + size;
+			break;
 		}
-		i->head = i_head;
-		i->iov_offset = buf->offset + left;
+		size -= buf->len;
+		i->head++;
+		off = 0;
 	}
-	i->count -= size;
-	/* ... and discard everything past that point */
-	pipe_truncate(i);
+	pipe_discard_from(pipe, i->head + 1); // discard everything past this one
 }
 
 static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
-- 
2.30.2



* [PATCH 08/31] ITER_PIPE: clean iov_iter_revert()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (5 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 07/31] ITER_PIPE: clean pipe_advance() up Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 09/31] ITER_PIPE: cache the type of last buffer Al Viro
                           ` (23 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Fold pipe_truncate() into it, clean up.  We can release buffers
in the same loop where we walk backwards to the iterator beginning
looking for the place where the new position will be.
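[Editor's note: a standalone sketch of the backward walk described above, not the kernel implementation; plain arrays stand in for the pipe ring and `released[]` stands in for pipe_buf_release().]

```c
#include <assert.h>

/* Walk backwards from the newest buffer, consuming `unroll` bytes: buffers
 * that are entirely unrolled are released on the way, and the first one
 * only partially unrolled is merely shrunk. */
#define N 8				/* ring size, power of two */

static unsigned int revert_sketch(unsigned int head, unsigned int start_head,
				  unsigned int len[], int released[],
				  unsigned int unroll)
{
	while (head > start_head) {
		unsigned int idx = (head - 1) & (N - 1);

		if (unroll < len[idx]) {
			len[idx] -= unroll;	/* keep this buffer, shrunk */
			return head;
		}
		unroll -= len[idx];
		released[idx] = 1;		/* ~pipe_buf_release() */
		head--;				/* ~pipe->head-- */
	}
	return head;				/* everything got unrolled */
}
```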

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 60 ++++++++++++--------------------------------------
 1 file changed, 14 insertions(+), 46 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index ce2ce5b0c600..62afba79e600 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -818,32 +818,6 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t byt
 }
 EXPORT_SYMBOL(copy_page_from_iter_atomic);
 
-static inline void pipe_truncate(struct iov_iter *i)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int p_tail = pipe->tail;
-	unsigned int p_head = pipe->head;
-	unsigned int p_mask = pipe->ring_size - 1;
-
-	if (!pipe_empty(p_head, p_tail)) {
-		struct pipe_buffer *buf;
-		unsigned int i_head = i->head;
-		size_t off = i->iov_offset;
-
-		if (off) {
-			buf = &pipe->bufs[i_head & p_mask];
-			buf->len = off - buf->offset;
-			i_head++;
-		}
-		while (p_head != i_head) {
-			p_head--;
-			pipe_buf_release(pipe, &pipe->bufs[p_head & p_mask]);
-		}
-
-		pipe->head = p_head;
-	}
-}
-
 static void pipe_advance(struct iov_iter *i, size_t size)
 {
 	struct pipe_inode_info *pipe = i->pipe;
@@ -938,28 +912,22 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 	i->count += unroll;
 	if (unlikely(iov_iter_is_pipe(i))) {
 		struct pipe_inode_info *pipe = i->pipe;
-		unsigned int p_mask = pipe->ring_size - 1;
-		unsigned int i_head = i->head;
-		size_t off = i->iov_offset;
-		while (1) {
-			struct pipe_buffer *b = &pipe->bufs[i_head & p_mask];
-			size_t n = off - b->offset;
-			if (unroll < n) {
-				off -= unroll;
-				break;
-			}
-			unroll -= n;
-			if (!unroll && i_head == i->start_head) {
-				off = 0;
-				break;
+		unsigned int head = pipe->head;
+
+		while (head > i->start_head) {
+			struct pipe_buffer *b = pipe_buf(pipe, --head);
+			if (unroll < b->len) {
+				b->len -= unroll;
+				i->iov_offset = b->offset + b->len;
+				i->head = head;
+				return;
 			}
-			i_head--;
-			b = &pipe->bufs[i_head & p_mask];
-			off = b->offset + b->len;
+			unroll -= b->len;
+			pipe_buf_release(pipe, b);
+			pipe->head--;
 		}
-		i->iov_offset = off;
-		i->head = i_head;
-		pipe_truncate(i);
+		i->iov_offset = 0;
+		i->head = head;
 		return;
 	}
 	if (unlikely(iov_iter_is_discard(i)))
-- 
2.30.2



* [PATCH 09/31] ITER_PIPE: cache the type of last buffer
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (6 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 08/31] ITER_PIPE: clean iov_iter_revert() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
                           ` (22 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

We often need to find whether the last buffer is anon or not, and
currently it's rather clumsy:
	check if ->iov_offset is non-zero (i.e. that pipe is not empty)
	if so, get the corresponding pipe_buffer and check its ->ops
	if it's &default_pipe_buf_ops, we have an anon buffer.

Let's replace the use of ->iov_offset (which is nowhere near similar to
its role for other flavours) with signed field (->last_offset), with
the following rules:
	empty, no buffers occupied:		0
	anon, with bytes up to N-1 filled:	N
	zero-copy, with bytes up to N-1 filled:	-N

That way abs(i->last_offset) is equal to what used to be in i->iov_offset
and empty vs. anon vs. zero-copy can be distinguished by the sign of
i->last_offset.
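[Editor's note: a standalone sketch of the ->last_offset convention spelled out above; these helper names are illustrative, not the kernel implementation.]

```c
#include <assert.h>
#include <stdlib.h>	/* abs() */

/* Encoding: 0 - empty at this position; N - anon buffer with bytes 0..N-1
 * filled; -N - zero-copy buffer with bytes 0..N-1 filled. */
static int encode_last(int filled, int anon)
{
	if (!filled)
		return 0;
	return anon ? filled : -filled;
}

static int bytes_in_last(int last_offset)	/* what ->iov_offset held */
{
	return abs(last_offset);
}

static int last_is_anon(int last_offset)	/* sign distinguishes flavours */
{
	return last_offset > 0;
}
```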

	Checks for "should we extend the last buffer or should we start
a new one?" become easier to follow that way.

	Note that most of the operations can only be done in a sane
state - i.e. when the pipe has nothing past the current position of
iterator.  About the only thing that could be done outside of that
state is iov_iter_advance(), which transitions to the sane state by
truncating the pipe.  There are only two cases where we leave the
sane state:
	1) iov_iter_get_pages()/iov_iter_get_pages_alloc().  Will be
dealt with later, when we make get_pages advancing - the callers are
actually happier that way.
	2) iov_iter copied, then something is put into the copy.  Since
they share the underlying pipe, the original gets behind.  When we
decide that we are done with the copy (original is not usable until then)
we advance the original.  direct_io used to be done that way; nowadays
it operates on the original and we do iov_iter_revert() to discard
the excessive data.  At the moment there's nothing in the kernel that
could do that to ITER_PIPE iterators, so this reason for insane state
is theoretical right now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/uio.h |  5 +++-
 lib/iov_iter.c      | 72 ++++++++++++++++++++++-----------------------
 2 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6ab4260c3d6c..d3e13b37ea72 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -40,7 +40,10 @@ struct iov_iter {
 	bool nofault;
 	bool data_source;
 	bool user_backed;
-	size_t iov_offset;
+	union {
+		size_t iov_offset;
+		int last_offset;
+	};
 	size_t count;
 	union {
 		const struct iovec *iov;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 62afba79e600..f6e5c20ed1c8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -199,7 +199,7 @@ static bool sanity(const struct iov_iter *i)
 	unsigned int i_head = i->head;
 	unsigned int idx;
 
-	if (i->iov_offset) {
+	if (i->last_offset) {
 		struct pipe_buffer *p;
 		if (unlikely(p_occupancy == 0))
 			goto Bad;	// pipe must be non-empty
@@ -207,7 +207,7 @@ static bool sanity(const struct iov_iter *i)
 			goto Bad;	// must be at the last buffer...
 
 		p = pipe_buf(pipe, i_head);
-		if (unlikely(p->offset + p->len != i->iov_offset))
+		if (unlikely(p->offset + p->len != abs(i->last_offset)))
 			goto Bad;	// ... at the end of segment
 	} else {
 		if (i_head != p_head)
@@ -215,7 +215,7 @@ static bool sanity(const struct iov_iter *i)
 	}
 	return true;
 Bad:
-	printk(KERN_ERR "idx = %d, offset = %zd\n", i_head, i->iov_offset);
+	printk(KERN_ERR "idx = %d, offset = %d\n", i_head, i->last_offset);
 	printk(KERN_ERR "head = %d, tail = %d, buffers = %d\n",
 			p_head, p_tail, pipe->ring_size);
 	for (idx = 0; idx < pipe->ring_size; idx++)
@@ -259,29 +259,30 @@ static void push_page(struct pipe_inode_info *pipe, struct page *page,
 	get_page(page);
 }
 
-static inline bool allocated(struct pipe_buffer *buf)
+static inline int last_offset(const struct pipe_buffer *buf)
 {
-	return buf->ops == &default_pipe_buf_ops;
+	if (buf->ops == &default_pipe_buf_ops)
+		return buf->len;	// buf->offset is 0 for those
+	else
+		return -(buf->offset + buf->len);
 }
 
 static struct page *append_pipe(struct iov_iter *i, size_t size, size_t *off)
 {
 	struct pipe_inode_info *pipe = i->pipe;
-	size_t offset = i->iov_offset;
+	int offset = i->last_offset;
 	struct pipe_buffer *buf;
 	struct page *page;
 
-	if (offset && offset < PAGE_SIZE) {
-		// some space in the last buffer; can we add to it?
+	if (offset > 0 && offset < PAGE_SIZE) {
+		// some space in the last buffer; add to it
 		buf = pipe_buf(pipe, pipe->head - 1);
-		if (allocated(buf)) {
-			size = min_t(size_t, size, PAGE_SIZE - offset);
-			buf->len += size;
-			i->iov_offset += size;
-			i->count -= size;
-			*off = offset;
-			return buf->page;
-		}
+		size = min_t(size_t, size, PAGE_SIZE - offset);
+		buf->len += size;
+		i->last_offset += size;
+		i->count -= size;
+		*off = offset;
+		return buf->page;
 	}
 	// OK, we need a new buffer
 	*off = 0;
@@ -292,7 +293,7 @@ static struct page *append_pipe(struct iov_iter *i, size_t size, size_t *off)
 	if (!page)
 		return NULL;
 	i->head = pipe->head - 1;
-	i->iov_offset = size;
+	i->last_offset = size;
 	i->count -= size;
 	return page;
 }
@@ -312,11 +313,11 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
 	if (!sanity(i))
 		return 0;
 
-	if (offset && i->iov_offset == offset) { // could we merge it?
+	if (offset && i->last_offset == -offset) { // could we merge it?
 		struct pipe_buffer *buf = pipe_buf(pipe, head - 1);
 		if (buf->page == page) {
 			buf->len += bytes;
-			i->iov_offset += bytes;
+			i->last_offset -= bytes;
 			i->count -= bytes;
 			return bytes;
 		}
@@ -325,7 +326,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t by
 		return 0;
 
 	push_page(pipe, page, offset, bytes);
-	i->iov_offset = offset + bytes;
+	i->last_offset = -(offset + bytes);
 	i->head = head;
 	i->count -= bytes;
 	return bytes;
@@ -437,16 +438,15 @@ EXPORT_SYMBOL(iov_iter_init);
 static inline void data_start(const struct iov_iter *i,
 			      unsigned int *iter_headp, size_t *offp)
 {
-	unsigned int iter_head = i->head;
-	size_t off = i->iov_offset;
+	int off = i->last_offset;
 
-	if (off && (!allocated(pipe_buf(i->pipe, iter_head)) ||
-		    off == PAGE_SIZE)) {
-		iter_head++;
-		off = 0;
+	if (off > 0 && off < PAGE_SIZE) { // anon and not full
+		*iter_headp = i->pipe->head - 1;
+		*offp = off;
+	} else {
+		*iter_headp = i->pipe->head;
+		*offp = 0;
 	}
-	*iter_headp = iter_head;
-	*offp = off;
 }
 
 static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
@@ -821,7 +821,7 @@ EXPORT_SYMBOL(copy_page_from_iter_atomic);
 static void pipe_advance(struct iov_iter *i, size_t size)
 {
 	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int off = i->iov_offset;
+	int off = i->last_offset;
 
 	if (!off && !size) {
 		pipe_discard_from(pipe, i->start_head); // discard everything
@@ -831,10 +831,10 @@ static void pipe_advance(struct iov_iter *i, size_t size)
 	while (1) {
 		struct pipe_buffer *buf = pipe_buf(pipe, i->head);
 		if (off) /* make it relative to the beginning of buffer */
-			size += off - buf->offset;
+			size += abs(off) - buf->offset;
 		if (size <= buf->len) {
 			buf->len = size;
-			i->iov_offset = buf->offset + size;
+			i->last_offset = last_offset(buf);
 			break;
 		}
 		size -= buf->len;
@@ -918,7 +918,7 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 			struct pipe_buffer *b = pipe_buf(pipe, --head);
 			if (unroll < b->len) {
 				b->len -= unroll;
-				i->iov_offset = b->offset + b->len;
+				i->last_offset = last_offset(b);
 				i->head = head;
 				return;
 			}
@@ -926,7 +926,7 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll)
 			pipe_buf_release(pipe, b);
 			pipe->head--;
 		}
-		i->iov_offset = 0;
+		i->last_offset = 0;
 		i->head = head;
 		return;
 	}
@@ -1029,7 +1029,7 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction,
 		.pipe = pipe,
 		.head = pipe->head,
 		.start_head = pipe->head,
-		.iov_offset = 0,
+		.last_offset = 0,
 		.count = count
 	};
 }
@@ -1145,8 +1145,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i)
 	if (iov_iter_is_pipe(i)) {
 		size_t size = i->count;
 
-		if (size && i->iov_offset && allocated(pipe_buf(i->pipe, i->head)))
-			return size | i->iov_offset;
+		if (size && i->last_offset > 0)
+			return size | i->last_offset;
 		return size;
 	}
 
-- 
2.30.2



* [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (7 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 09/31] ITER_PIPE: cache the type of last buffer Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 10/31] ITER_PIPE: fold data_start() and pipe_space_for_user() together Al Viro
                           ` (21 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

do what we do for iovec/kvec; that ends up generating better code,
AFAICS.
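[Editor's note: a userspace sketch of the iovec/kvec-style advance this patch switches bvec to; `struct seg` stands in for struct bio_vec.]

```c
#include <assert.h>
#include <stddef.h>

/* Walk the segment array until the remaining offset fits inside one
 * segment, then record the new first segment and offset into it. */
struct seg { size_t len; };

struct iter {
	const struct seg *segs;
	size_t nr_segs;
	size_t offset;			/* offset into segs[0] */
	size_t count;			/* total bytes left */
};

static void advance(struct iter *i, size_t size)
{
	const struct seg *s = i->segs, *end = s + i->nr_segs;

	if (!i->count)
		return;
	i->count -= size;
	size += i->offset;		/* relative to start of segs[0] */
	for (; s < end; s++) {
		if (size < s->len)
			break;
		size -= s->len;
	}
	i->offset = size;
	i->nr_segs -= s - i->segs;
	i->segs = s;
}
```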

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8275b28e886b..93ceb13ec7b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -870,17 +870,22 @@ static void pipe_advance(struct iov_iter *i, size_t size)
 
 static void iov_iter_bvec_advance(struct iov_iter *i, size_t size)
 {
-	struct bvec_iter bi;
+	const struct bio_vec *bvec, *end;
 
-	bi.bi_size = i->count;
-	bi.bi_bvec_done = i->iov_offset;
-	bi.bi_idx = 0;
-	bvec_iter_advance(i->bvec, &bi, size);
+	if (!i->count)
+		return;
+	i->count -= size;
+
+	size += i->iov_offset;
 
-	i->bvec += bi.bi_idx;
-	i->nr_segs -= bi.bi_idx;
-	i->count = bi.bi_size;
-	i->iov_offset = bi.bi_bvec_done;
+	for (bvec = i->bvec, end = bvec + i->nr_segs; bvec < end; bvec++) {
+		if (likely(size < bvec->bv_len))
+			break;
+		size -= bvec->bv_len;
+	}
+	i->iov_offset = size;
+	i->nr_segs -= bvec - i->bvec;
+	i->bvec = bvec;
 }
 
 static void iov_iter_iovec_advance(struct iov_iter *i, size_t size)
-- 
2.30.2



* [PATCH 10/31] ITER_PIPE: fold data_start() and pipe_space_for_user() together
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (8 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-19  2:25           ` Al Viro
  2022-06-18  5:35         ` [PATCH 11/31] iov_iter_get_pages{,_alloc}(): cap the maxsize with LONG_MAX Al Viro
                           ` (20 subsequent siblings)
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

All their callers are next to each other; all of them
want the total amount of pages and, possibly, the
offset in the partial final buffer.

Combine into a new helper (pipe_npages()), fix the
bogosity in pipe_space_for_user(), while we are at it.
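[Editor's note: a userspace sketch of the combined helper's arithmetic, not kernel code; PAGE_SZ is an assumed stand-in for PAGE_SIZE.]

```c
#include <assert.h>

/* Usable page slots = max_usage minus current occupancy, plus one more
 * when the last anon buffer is only partially filled; in that case writing
 * resumes at the returned offset inside it. */
#define PAGE_SZ 4096

static int npages_sketch(unsigned int head, unsigned int tail,
			 int max_usage, int last_offset, int *npages)
{
	int used = head - tail;

	*npages = used < max_usage ? max_usage - used : 0;
	if (last_offset > 0 && last_offset < PAGE_SZ) {	/* anon, not full */
		(*npages)++;
		return last_offset;
	}
	return 0;
}
```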

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/pipe_fs_i.h | 20 ------------------
 lib/iov_iter.c            | 44 +++++++++++++++++----------------------
 2 files changed, 19 insertions(+), 45 deletions(-)

diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 4ea496924106..6cb65df3e3ba 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -156,26 +156,6 @@ static inline bool pipe_full(unsigned int head, unsigned int tail,
 	return pipe_occupancy(head, tail) >= limit;
 }
 
-/**
- * pipe_space_for_user - Return number of slots available to userspace
- * @head: The pipe ring head pointer
- * @tail: The pipe ring tail pointer
- * @pipe: The pipe info structure
- */
-static inline unsigned int pipe_space_for_user(unsigned int head, unsigned int tail,
-					       struct pipe_inode_info *pipe)
-{
-	unsigned int p_occupancy, p_space;
-
-	p_occupancy = pipe_occupancy(head, tail);
-	if (p_occupancy >= pipe->max_usage)
-		return 0;
-	p_space = pipe->ring_size - p_occupancy;
-	if (p_space > pipe->max_usage)
-		p_space = pipe->max_usage;
-	return p_space;
-}
-
 /**
  * pipe_buf_get - get a reference to a pipe_buffer
  * @pipe:	the pipe that the buffer belongs to
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f6e5c20ed1c8..3abd1c596520 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -435,18 +435,20 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
 }
 EXPORT_SYMBOL(iov_iter_init);
 
-static inline void data_start(const struct iov_iter *i,
-			      unsigned int *iter_headp, size_t *offp)
+// returns the offset in partial buffer (if any)
+static inline unsigned int pipe_npages(const struct iov_iter *i, int *npages)
 {
+	struct pipe_inode_info *pipe = i->pipe;
+	int used = pipe->head - pipe->tail;
 	int off = i->last_offset;
 
+	*npages = max((int)pipe->max_usage - used, 0);
+
 	if (off > 0 && off < PAGE_SIZE) { // anon and not full
-		*iter_headp = i->pipe->head - 1;
-		*offp = off;
-	} else {
-		*iter_headp = i->pipe->head;
-		*offp = 0;
+		(*npages)++;
+		return off;
 	}
+	return 0;
 }
 
 static size_t copy_pipe_to_iter(const void *addr, size_t bytes,
@@ -1221,18 +1223,16 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
-	unsigned int iter_head, npages;
+	unsigned int npages, off;
 	size_t capacity;
 
 	if (!sanity(i))
 		return -EFAULT;
 
-	data_start(i, &iter_head, start);
-	/* Amount of free space: some of this one + all after this one */
-	npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe);
-	capacity = min(npages, maxpages) * PAGE_SIZE - *start;
+	*start = off = pipe_npages(i, &npages);
+	capacity = min(npages, maxpages) * PAGE_SIZE - off;
 
-	return __pipe_get_pages(i, min(maxsize, capacity), pages, *start);
+	return __pipe_get_pages(i, min(maxsize, capacity), pages, off);
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@ -1411,24 +1411,22 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 		   size_t *start)
 {
 	struct page **p;
-	unsigned int iter_head, npages;
+	unsigned int npages, off;
 	ssize_t n;
 
 	if (!sanity(i))
 		return -EFAULT;
 
-	data_start(i, &iter_head, start);
-	/* Amount of free space: some of this one + all after this one */
-	npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe);
-	n = npages * PAGE_SIZE - *start;
+	*start = off = pipe_npages(i, &npages);
+	n = npages * PAGE_SIZE - off;
 	if (maxsize > n)
 		maxsize = n;
 	else
-		npages = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
+		npages = DIV_ROUND_UP(maxsize + off, PAGE_SIZE);
 	p = get_pages_array(npages);
 	if (!p)
 		return -ENOMEM;
-	n = __pipe_get_pages(i, maxsize, p, *start);
+	n = __pipe_get_pages(i, maxsize, p, off);
 	if (n > 0)
 		*pages = p;
 	else
@@ -1653,16 +1651,12 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages)
 	if (iov_iter_is_bvec(i))
 		return bvec_npages(i, maxpages);
 	if (iov_iter_is_pipe(i)) {
-		unsigned int iter_head;
 		int npages;
-		size_t off;
 
 		if (!sanity(i))
 			return 0;
 
-		data_start(i, &iter_head, &off);
-		/* some of this one + all after this one */
-		npages = pipe_space_for_user(iter_head, i->pipe->tail, i->pipe);
+		pipe_npages(i, &npages);
 		return min(npages, maxpages);
 	}
 	if (iov_iter_is_xarray(i)) {
-- 
2.30.2



* [PATCH 11/31] iov_iter_get_pages{,_alloc}(): cap the maxsize with LONG_MAX
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (9 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 10/31] ITER_PIPE: fold data_start() and pipe_space_for_user() together Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 12/31] iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper Al Viro
                           ` (19 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

All callers can and should handle iov_iter_get_pages() returning
fewer pages than requested.  All in-kernel ones do.  And it makes
the arithmetical overflow analysis much simpler...
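[Editor's note: a sketch of why the cap is safe, not the kernel code; the byte count travels back through a signed ssize_t return, so a size_t above LONG_MAX could alias a negative error code or overflow intermediate arithmetic, and a shortened result is fine since callers handle partial returns anyway.]

```c
#include <assert.h>
#include <limits.h>

/* Clamp maxsize to both the iterator's remaining count and LONG_MAX so
 * that all later arithmetic and the final return stay in signed range. */
static long clamp_maxsize(unsigned long maxsize, unsigned long count)
{
	if (maxsize > count)
		maxsize = count;
	if (maxsize > LONG_MAX)
		maxsize = LONG_MAX;
	return (long)maxsize;	/* always representable and non-negative */
}
```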

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c    | 2 +-
 lib/iov_iter.c | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index 6645b30ec990..493878bd9bb9 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1168,7 +1168,7 @@ static int iter_to_pipe(struct iov_iter *from,
 		size_t start;
 		int n;
 
-		copied = iov_iter_get_pages(from, pages, ~0UL, 16, &start);
+		copied = iov_iter_get_pages(from, pages, LONG_MAX, 16, &start);
 		if (copied <= 0) {
 			ret = copied;
 			break;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 3abd1c596520..2d4176a2a1b5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1367,6 +1367,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		maxsize = i->count;
 	if (!maxsize)
 		return 0;
+	if (maxsize > LONG_MAX)
+		maxsize = LONG_MAX;
 
 	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
@@ -1485,6 +1487,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		maxsize = i->count;
 	if (!maxsize)
 		return 0;
+	if (maxsize > LONG_MAX)
+		maxsize = LONG_MAX;
 
 	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 12/31] iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (10 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 11/31] iov_iter_get_pages{,_alloc}(): cap the maxsize with LONG_MAX Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 13/31] iov_iter_get_pages(): sanity-check arguments Al Viro
                           ` (18 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Incidentally, ITER_XARRAY did *not* free the sucker in the case where
iter_xarray_populate_pages() returned zero...
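	The ownership rule this patch establishes can be sketched in
userspace C; worker() and get_pages_alloc() below are made-up stand-ins
for __iov_iter_get_pages_alloc() and its wrapper.  The worker may
allocate the array and still fail without freeing it; the wrapper alone
cleans up on any non-positive return, which is exactly the ITER_XARRAY
leak being closed here.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* worker() models __iov_iter_get_pages_alloc(): it may allocate the
 * array and then fail, deliberately leaving the array allocated. */
static ssize_t worker(void ***pages, int fail)
{
	*pages = malloc(16 * sizeof(void *));
	if (!*pages)
		return -1;
	if (fail)
		return 0;	/* e.g. the populate step found nothing */
	return 16;
}

/* The wrapper owns cleanup on every failure exit of the worker. */
static ssize_t get_pages_alloc(void ***pages, int fail)
{
	ssize_t len;

	*pages = NULL;
	len = worker(pages, fail);
	if (len <= 0) {
		free(*pages);
		*pages = NULL;
	}
	return len;
}

/* Returns 1 iff the caller-visible invariant holds: a positive
 * return means the array is set, anything else means it is NULL. */
static int demo(int fail)
{
	void **p;
	ssize_t len = get_pages_alloc(&p, fail);
	int ok = (len > 0) ? (p != NULL) : (p == NULL);

	free(p);
	return ok;
}
```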

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 51 +++++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 2d4176a2a1b5..f5e14535f6bb 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1425,15 +1425,10 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
 		maxsize = n;
 	else
 		npages = DIV_ROUND_UP(maxsize + off, PAGE_SIZE);
-	p = get_pages_array(npages);
+	*pages = p = get_pages_array(npages);
 	if (!p)
 		return -ENOMEM;
-	n = __pipe_get_pages(i, maxsize, p, off);
-	if (n > 0)
-		*pages = p;
-	else
-		kvfree(p);
-	return n;
+	return __pipe_get_pages(i, maxsize, p, off);
 }
 
 static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
@@ -1463,10 +1458,9 @@ static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
 			count++;
 	}
 
-	p = get_pages_array(count);
+	*pages = p = get_pages_array(count);
 	if (!p)
 		return -ENOMEM;
-	*pages = p;
 
 	nr = iter_xarray_populate_pages(p, i->xarray, index, count);
 	if (nr == 0)
@@ -1475,7 +1469,7 @@ static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
 	return min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
 }
 
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
 {
@@ -1483,10 +1477,6 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 	size_t len;
 	int n, res;
 
-	if (maxsize > i->count)
-		maxsize = i->count;
-	if (!maxsize)
-		return 0;
 	if (maxsize > LONG_MAX)
 		maxsize = LONG_MAX;
 
@@ -1501,17 +1491,15 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 
 		addr = first_iovec_segment(i, &len, start, maxsize, ~0U);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		p = get_pages_array(n);
+		*pages = p = get_pages_array(n);
 		if (!p)
 			return -ENOMEM;
 		res = get_user_pages_fast(addr, n, gup_flags, p);
-		if (unlikely(res <= 0)) {
-			kvfree(p);
-			*pages = NULL;
+		if (unlikely(res <= 0))
 			return res;
-		}
-		*pages = p;
-		return (res == n ? len : res * PAGE_SIZE) - *start;
+		if (res < n)
+			len = res * PAGE_SIZE;
+		return len - *start;
 	}
 	if (iov_iter_is_bvec(i)) {
 		struct page *page;
@@ -1531,6 +1519,27 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		return iter_xarray_get_pages_alloc(i, pages, maxsize, start);
 	return -EFAULT;
 }
+
+ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+		   struct page ***pages, size_t maxsize,
+		   size_t *start)
+{
+	size_t len;
+
+	*pages = NULL;
+
+	if (maxsize > i->count)
+		maxsize = i->count;
+	if (!maxsize)
+		return 0;
+
+	len = __iov_iter_get_pages_alloc(i, pages, maxsize, start);
+	if (len <= 0) {
+		kvfree(*pages);
+		*pages = NULL;
+	}
+	return len;
+}
 EXPORT_SYMBOL(iov_iter_get_pages_alloc);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 13/31] iov_iter_get_pages(): sanity-check arguments
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (11 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 12/31] iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-19  3:07           ` Al Viro
  2022-06-18  5:35         ` [PATCH 14/31] unify pipe_get_pages() and pipe_get_pages_alloc() Al Viro
                           ` (17 subsequent siblings)
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

A zero maxpages is bogus, but best treated as "just return 0";
NULL pages, OTOH, should be treated as a hard bug.

Get rid of the now-useless checks in xarray_get_pages{,_alloc}().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f5e14535f6bb..369fbb10b16f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1271,9 +1271,6 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	size_t size = maxsize;
 	loff_t pos;
 
-	if (!size || !maxpages)
-		return 0;
-
 	pos = i->xarray_start + i->iov_offset;
 	index = pos >> PAGE_SHIFT;
 	offset = pos & ~PAGE_MASK;
@@ -1365,10 +1362,11 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 	if (maxsize > i->count)
 		maxsize = i->count;
-	if (!maxsize)
+	if (!maxsize || !maxpages)
 		return 0;
 	if (maxsize > LONG_MAX)
 		maxsize = LONG_MAX;
+	BUG_ON(!pages);
 
 	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
@@ -1441,9 +1439,6 @@ static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
 	size_t size = maxsize;
 	loff_t pos;
 
-	if (!size)
-		return 0;
-
 	pos = i->xarray_start + i->iov_offset;
 	index = pos >> PAGE_SHIFT;
 	offset = pos & ~PAGE_MASK;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 14/31] unify pipe_get_pages() and pipe_get_pages_alloc()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (12 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 13/31] iov_iter_get_pages(): sanity-check arguments Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 15/31] unify xarray_get_pages() and xarray_get_pages_alloc() Al Viro
                           ` (16 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

	The differences between the two are:
* pipe_get_pages() gets a non-NULL struct page ** pointing to a
preallocated array, plus the array size.
* pipe_get_pages_alloc() gets the address of a struct page ** variable
that contains NULL; it allocates the array and (on success) stores its
address in that variable.

	Not hard to combine: always pass struct page ***, and have
the former pipe_get_pages_alloc() caller pass ~0U as the cap on
array size.
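	The combined calling convention can be sketched in userspace C
(get_pages() below is a hypothetical stand-in, not the kernel helper):
*pages == NULL means "allocate the array for me", non-NULL means "use
the one I preallocated".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of the unified convention: the helper only
 * allocates when the caller passed the address of a NULL pointer. */
static int get_pages(void ***pages, unsigned npages)
{
	void **p = *pages;

	if (!p) {
		*pages = p = calloc(npages, sizeof(void *));
		if (!p)
			return -1;
	}
	/* ... p[0..npages-1] would be filled in here ... */
	return (int)npages;
}

static void *prealloc[4];

/* Caller with a preallocated array: the helper must use it as-is. */
static int demo_prealloc(void)
{
	void **p = prealloc;

	return get_pages(&p, 4) == 4 && p == prealloc;
}

/* Caller wanting allocation: the helper fills the pointer in. */
static int demo_alloc(void)
{
	void **p = NULL;
	int ok = get_pages(&p, 4) == 4 && p != NULL;

	free(p);
	return ok;
}
```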

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 49 +++++++++++++++++--------------------------------
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 369fbb10b16f..fb8a44f6c5a2 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1187,6 +1187,11 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 }
 EXPORT_SYMBOL(iov_iter_gap_alignment);
 
+static struct page **get_pages_array(size_t n)
+{
+	return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
+}
+
 static inline ssize_t __pipe_get_pages(struct iov_iter *i,
 				size_t maxsize,
 				struct page **pages,
@@ -1220,10 +1225,11 @@ static inline ssize_t __pipe_get_pages(struct iov_iter *i,
 }
 
 static ssize_t pipe_get_pages(struct iov_iter *i,
-		   struct page **pages, size_t maxsize, unsigned maxpages,
+		   struct page ***pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
 	unsigned int npages, off;
+	struct page **p;
 	size_t capacity;
 
 	if (!sanity(i))
@@ -1231,8 +1237,15 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 
 	*start = off = pipe_npages(i, &npages);
 	capacity = min(npages, maxpages) * PAGE_SIZE - off;
+	maxsize = min(maxsize, capacity);
+	p = *pages;
+	if (!p) {
+		*pages = p = get_pages_array(DIV_ROUND_UP(maxsize + off, PAGE_SIZE));
+		if (!p)
+			return -ENOMEM;
+	}
 
-	return __pipe_get_pages(i, min(maxsize, capacity), pages, off);
+	return __pipe_get_pages(i, maxsize, p, off);
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@ -1394,41 +1407,13 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 		return len - *start;
 	}
 	if (iov_iter_is_pipe(i))
-		return pipe_get_pages(i, pages, maxsize, maxpages, start);
+		return pipe_get_pages(i, &pages, maxsize, maxpages, start);
 	if (iov_iter_is_xarray(i))
 		return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
 	return -EFAULT;
 }
 EXPORT_SYMBOL(iov_iter_get_pages);
 
-static struct page **get_pages_array(size_t n)
-{
-	return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
-}
-
-static ssize_t pipe_get_pages_alloc(struct iov_iter *i,
-		   struct page ***pages, size_t maxsize,
-		   size_t *start)
-{
-	struct page **p;
-	unsigned int npages, off;
-	ssize_t n;
-
-	if (!sanity(i))
-		return -EFAULT;
-
-	*start = off = pipe_npages(i, &npages);
-	n = npages * PAGE_SIZE - off;
-	if (maxsize > n)
-		maxsize = n;
-	else
-		npages = DIV_ROUND_UP(maxsize + off, PAGE_SIZE);
-	*pages = p = get_pages_array(npages);
-	if (!p)
-		return -ENOMEM;
-	return __pipe_get_pages(i, maxsize, p, off);
-}
-
 static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
 					   struct page ***pages, size_t maxsize,
 					   size_t *_start_offset)
@@ -1509,7 +1494,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		return len - *start;
 	}
 	if (iov_iter_is_pipe(i))
-		return pipe_get_pages_alloc(i, pages, maxsize, start);
+		return pipe_get_pages(i, pages, maxsize, ~0U, start);
 	if (iov_iter_is_xarray(i))
 		return iter_xarray_get_pages_alloc(i, pages, maxsize, start);
 	return -EFAULT;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 15/31] unify xarray_get_pages() and xarray_get_pages_alloc()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (13 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 14/31] unify pipe_get_pages() and pipe_get_pages_alloc() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 16/31] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts Al Viro
                           ` (15 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

same as for pipes

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 49 ++++++++++---------------------------------------
 1 file changed, 10 insertions(+), 39 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index fb8a44f6c5a2..2240daf2280d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1276,7 +1276,7 @@ static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa
 }
 
 static ssize_t iter_xarray_get_pages(struct iov_iter *i,
-				     struct page **pages, size_t maxsize,
+				     struct page ***pages, size_t maxsize,
 				     unsigned maxpages, size_t *_start_offset)
 {
 	unsigned nr, offset;
@@ -1301,7 +1301,13 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	if (count > maxpages)
 		count = maxpages;
 
-	nr = iter_xarray_populate_pages(pages, i->xarray, index, count);
+	if (!*pages) {
+		*pages = get_pages_array(count);
+		if (!*pages)
+			return -ENOMEM;
+	}
+
+	nr = iter_xarray_populate_pages(*pages, i->xarray, index, count);
 	if (nr == 0)
 		return 0;
 
@@ -1409,46 +1415,11 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 	if (iov_iter_is_pipe(i))
 		return pipe_get_pages(i, &pages, maxsize, maxpages, start);
 	if (iov_iter_is_xarray(i))
-		return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
+		return iter_xarray_get_pages(i, &pages, maxsize, maxpages, start);
 	return -EFAULT;
 }
 EXPORT_SYMBOL(iov_iter_get_pages);
 
-static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i,
-					   struct page ***pages, size_t maxsize,
-					   size_t *_start_offset)
-{
-	struct page **p;
-	unsigned nr, offset;
-	pgoff_t index, count;
-	size_t size = maxsize;
-	loff_t pos;
-
-	pos = i->xarray_start + i->iov_offset;
-	index = pos >> PAGE_SHIFT;
-	offset = pos & ~PAGE_MASK;
-	*_start_offset = offset;
-
-	count = 1;
-	if (size > PAGE_SIZE - offset) {
-		size -= PAGE_SIZE - offset;
-		count += size >> PAGE_SHIFT;
-		size &= ~PAGE_MASK;
-		if (size)
-			count++;
-	}
-
-	*pages = p = get_pages_array(count);
-	if (!p)
-		return -ENOMEM;
-
-	nr = iter_xarray_populate_pages(p, i->xarray, index, count);
-	if (nr == 0)
-		return 0;
-
-	return min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
-}
-
 static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
@@ -1496,7 +1467,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 	if (iov_iter_is_pipe(i))
 		return pipe_get_pages(i, pages, maxsize, ~0U, start);
 	if (iov_iter_is_xarray(i))
-		return iter_xarray_get_pages_alloc(i, pages, maxsize, start);
+		return iter_xarray_get_pages(i, pages, maxsize, ~0U, start);
 	return -EFAULT;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 16/31] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (14 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 15/31] unify xarray_get_pages() and xarray_get_pages_alloc() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-19  3:56           ` Al Viro
  2022-06-18  5:35         ` [PATCH 17/31] ITER_XARRAY: don't open-code DIV_ROUND_UP() Al Viro
                           ` (14 subsequent siblings)
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

same as for pipes and xarrays; after that iov_iter_get_pages() becomes
a wrapper for __iov_iter_get_pages_alloc().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 97 +++++++++++++++++---------------------------------
 1 file changed, 33 insertions(+), 64 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 2240daf2280d..379a9a5fa60b 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1372,20 +1372,19 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 	return page;
 }
 
-ssize_t iov_iter_get_pages(struct iov_iter *i,
-		   struct page **pages, size_t maxsize, unsigned maxpages,
-		   size_t *start)
+static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
+		   struct page ***pages, size_t maxsize,
+		   unsigned int maxpages, size_t *start)
 {
 	size_t len;
 	int n, res;
 
 	if (maxsize > i->count)
 		maxsize = i->count;
-	if (!maxsize || !maxpages)
+	if (!maxsize)
 		return 0;
 	if (maxsize > LONG_MAX)
 		maxsize = LONG_MAX;
-	BUG_ON(!pages);
 
 	if (likely(user_backed_iter(i))) {
 		unsigned int gup_flags = 0;
@@ -1398,78 +1397,53 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 		addr = first_iovec_segment(i, &len, start, maxsize, maxpages);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		res = get_user_pages_fast(addr, n, gup_flags, pages);
+		if (!*pages) {
+			*pages = get_pages_array(n);
+			if (!*pages)
+				return -ENOMEM;
+		}
+		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
-		return (res == n ? len : res * PAGE_SIZE) - *start;
+		if (res < n)
+			len = res * PAGE_SIZE;
+		return len - *start;
 	}
 	if (iov_iter_is_bvec(i)) {
+		struct page **p;
 		struct page *page;
 
 		page = first_bvec_segment(i, &len, start, maxsize, maxpages);
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		while (n--)
-			get_page(*pages++ = page++);
+		p = *pages;
+		if (!p) {
+			*pages = p = get_pages_array(n);
+			if (!p)
+				return -ENOMEM;
+		}
+		for (int k = 0; k < n; k++)
+			get_page(*p++ = page++);
 		return len - *start;
 	}
 	if (iov_iter_is_pipe(i))
-		return pipe_get_pages(i, &pages, maxsize, maxpages, start);
+		return pipe_get_pages(i, pages, maxsize, maxpages, start);
 	if (iov_iter_is_xarray(i))
-		return iter_xarray_get_pages(i, &pages, maxsize, maxpages, start);
+		return iter_xarray_get_pages(i, pages, maxsize, maxpages,
+					     start);
 	return -EFAULT;
 }
-EXPORT_SYMBOL(iov_iter_get_pages);
 
-static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
-		   struct page ***pages, size_t maxsize,
+ssize_t iov_iter_get_pages(struct iov_iter *i,
+		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
-	struct page **p;
-	size_t len;
-	int n, res;
-
-	if (maxsize > LONG_MAX)
-		maxsize = LONG_MAX;
-
-	if (likely(user_backed_iter(i))) {
-		unsigned int gup_flags = 0;
-		unsigned long addr;
-
-		if (iov_iter_rw(i) != WRITE)
-			gup_flags |= FOLL_WRITE;
-		if (i->nofault)
-			gup_flags |= FOLL_NOFAULT;
-
-		addr = first_iovec_segment(i, &len, start, maxsize, ~0U);
-		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		*pages = p = get_pages_array(n);
-		if (!p)
-			return -ENOMEM;
-		res = get_user_pages_fast(addr, n, gup_flags, p);
-		if (unlikely(res <= 0))
-			return res;
-		if (res < n)
-			len = res * PAGE_SIZE;
-		return len - *start;
-	}
-	if (iov_iter_is_bvec(i)) {
-		struct page *page;
+	if (!maxpages)
+		return 0;
+	BUG_ON(!pages);
 
-		page = first_bvec_segment(i, &len, start, maxsize, ~0U);
-		n = DIV_ROUND_UP(len, PAGE_SIZE);
-		*pages = p = get_pages_array(n);
-		if (!p)
-			return -ENOMEM;
-		while (n--)
-			get_page(*p++ = page++);
-		return len - *start;
-	}
-	if (iov_iter_is_pipe(i))
-		return pipe_get_pages(i, pages, maxsize, ~0U, start);
-	if (iov_iter_is_xarray(i))
-		return iter_xarray_get_pages(i, pages, maxsize, ~0U, start);
-	return -EFAULT;
+	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
 }
+EXPORT_SYMBOL(iov_iter_get_pages);
 
 ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
@@ -1479,12 +1453,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 
 	*pages = NULL;
 
-	if (maxsize > i->count)
-		maxsize = i->count;
-	if (!maxsize)
-		return 0;
-
-	len = __iov_iter_get_pages_alloc(i, pages, maxsize, start);
+	len = __iov_iter_get_pages_alloc(i, pages, maxsize, ~0U, start);
 	if (len <= 0) {
 		kvfree(*pages);
 		*pages = NULL;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 17/31] ITER_XARRAY: don't open-code DIV_ROUND_UP()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (15 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 16/31] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 18/31] iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() Al Viro
                           ` (13 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 379a9a5fa60b..04c3a62679f8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1289,15 +1289,7 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	offset = pos & ~PAGE_MASK;
 	*_start_offset = offset;
 
-	count = 1;
-	if (size > PAGE_SIZE - offset) {
-		size -= PAGE_SIZE - offset;
-		count += size >> PAGE_SHIFT;
-		size &= ~PAGE_MASK;
-		if (size)
-			count++;
-	}
-
+	count = DIV_ROUND_UP(size + offset, PAGE_SIZE);
 	if (count > maxpages)
 		count = maxpages;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread
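	The equivalence behind the one-liner in the patch above can be
checked in isolation.  In this userspace sketch the constants are
renamed (PG_SIZE, PG_SHIFT) to avoid clashing with kernel headers, and
size is assumed non-zero, as the callers guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define PG_SIZE 4096u
#define PG_SHIFT 12
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* The open-coded page count that the patch above removes. */
static size_t count_open_coded(size_t size, size_t offset)
{
	size_t count = 1;

	if (size > PG_SIZE - offset) {
		size -= PG_SIZE - offset;
		count += size >> PG_SHIFT;
		size &= PG_SIZE - 1;	/* i.e. size &= ~PAGE_MASK */
		if (size)
			count++;
	}
	return count;
}

/* Its replacement: one division, same result for size > 0. */
static size_t count_div_round_up(size_t size, size_t offset)
{
	return DIV_ROUND_UP(size + offset, PG_SIZE);
}
```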

* [PATCH 18/31] iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (16 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 17/31] ITER_XARRAY: don't open-code DIV_ROUND_UP() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment() Al Viro
                           ` (12 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 04c3a62679f8..8f1d63295f37 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1308,12 +1308,9 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 
 static unsigned long found_ubuf_segment(unsigned long addr,
 					size_t len,
-					size_t *size, size_t *start,
-					unsigned maxpages)
+					size_t *size, size_t *start)
 {
 	len += (*start = addr % PAGE_SIZE);
-	if (len > maxpages * PAGE_SIZE)
-		len = maxpages * PAGE_SIZE;
 	*size = len;
 	return addr & PAGE_MASK;
 }
@@ -1321,14 +1318,14 @@ static unsigned long found_ubuf_segment(unsigned long addr,
 /* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
 static unsigned long first_iovec_segment(const struct iov_iter *i,
 					 size_t *size, size_t *start,
-					 size_t maxsize, unsigned maxpages)
+					 size_t maxsize)
 {
 	size_t skip;
 	long k;
 
 	if (iter_is_ubuf(i)) {
 		unsigned long addr = (unsigned long)i->ubuf + i->iov_offset;
-		return found_ubuf_segment(addr, maxsize, size, start, maxpages);
+		return found_ubuf_segment(addr, maxsize, size, start);
 	}
 
 	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
@@ -1339,7 +1336,7 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 			continue;
 		if (len > maxsize)
 			len = maxsize;
-		return found_ubuf_segment(addr, len, size, start, maxpages);
+		return found_ubuf_segment(addr, len, size, start);
 	}
 	BUG(); // if it had been empty, we wouldn't get called
 }
@@ -1347,7 +1344,7 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 /* must be done on non-empty ITER_BVEC one */
 static struct page *first_bvec_segment(const struct iov_iter *i,
 				       size_t *size, size_t *start,
-				       size_t maxsize, unsigned maxpages)
+				       size_t maxsize)
 {
 	struct page *page;
 	size_t skip = i->iov_offset, len;
@@ -1358,8 +1355,6 @@ static struct page *first_bvec_segment(const struct iov_iter *i,
 	skip += i->bvec->bv_offset;
 	page = i->bvec->bv_page + skip / PAGE_SIZE;
 	len += (*start = skip % PAGE_SIZE);
-	if (len > maxpages * PAGE_SIZE)
-		len = maxpages * PAGE_SIZE;
 	*size = len;
 	return page;
 }
@@ -1387,7 +1382,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		if (i->nofault)
 			gup_flags |= FOLL_NOFAULT;
 
-		addr = first_iovec_segment(i, &len, start, maxsize, maxpages);
+		addr = first_iovec_segment(i, &len, start, maxsize);
+		if (len > maxpages * PAGE_SIZE)
+			len = maxpages * PAGE_SIZE;
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
 		if (!*pages) {
 			*pages = get_pages_array(n);
@@ -1405,7 +1402,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page **p;
 		struct page *page;
 
-		page = first_bvec_segment(i, &len, start, maxsize, maxpages);
+		page = first_bvec_segment(i, &len, start, maxsize);
+		if (len > maxpages * PAGE_SIZE)
+			len = maxpages * PAGE_SIZE;
 		n = DIV_ROUND_UP(len, PAGE_SIZE);
 		p = *pages;
 		if (!p) {
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (17 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 18/31] iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18 11:13           ` Al Viro
  2022-06-18  5:35         ` [PATCH 20/31] found_iovec_segment(): just return address Al Viro
                           ` (11 subsequent siblings)
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Pass maxsize by reference and return the length via the same pointer.
And do not add the offset to the returned length.  Callers adjusted...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 49 +++++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8f1d63295f37..1a30783e2b60 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1307,25 +1307,22 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 }
 
 static unsigned long found_ubuf_segment(unsigned long addr,
-					size_t len,
-					size_t *size, size_t *start)
+					size_t *start)
 {
-	len += (*start = addr % PAGE_SIZE);
-	*size = len;
+	*start = addr % PAGE_SIZE;
 	return addr & PAGE_MASK;
 }
 
 /* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
 static unsigned long first_iovec_segment(const struct iov_iter *i,
-					 size_t *size, size_t *start,
-					 size_t maxsize)
+					 size_t *size, size_t *start)
 {
 	size_t skip;
 	long k;
 
 	if (iter_is_ubuf(i)) {
 		unsigned long addr = (unsigned long)i->ubuf + i->iov_offset;
-		return found_ubuf_segment(addr, maxsize, size, start);
+		return found_ubuf_segment(addr, start);
 	}
 
 	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
@@ -1334,28 +1331,26 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 
 		if (unlikely(!len))
 			continue;
-		if (len > maxsize)
-			len = maxsize;
-		return found_ubuf_segment(addr, len, size, start);
+		if (*size > len)
+			*size = len;
+		return found_ubuf_segment(addr, start);
 	}
 	BUG(); // if it had been empty, we wouldn't get called
 }
 
 /* must be done on non-empty ITER_BVEC one */
 static struct page *first_bvec_segment(const struct iov_iter *i,
-				       size_t *size, size_t *start,
-				       size_t maxsize)
+				       size_t *size, size_t *start)
 {
 	struct page *page;
 	size_t skip = i->iov_offset, len;
 
 	len = i->bvec->bv_len - skip;
-	if (len > maxsize)
-		len = maxsize;
+	if (*size > len)
+		*size = len;
 	skip += i->bvec->bv_offset;
 	page = i->bvec->bv_page + skip / PAGE_SIZE;
-	len += (*start = skip % PAGE_SIZE);
-	*size = len;
+	*start = skip % PAGE_SIZE;
 	return page;
 }
 
@@ -1382,10 +1377,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		if (i->nofault)
 			gup_flags |= FOLL_NOFAULT;
 
-		addr = first_iovec_segment(i, &len, start, maxsize);
-		if (len > maxpages * PAGE_SIZE)
-			len = maxpages * PAGE_SIZE;
-		n = DIV_ROUND_UP(len, PAGE_SIZE);
+		addr = first_iovec_segment(i, &maxsize, start);
+		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
+		if (n > maxpages)
+			n = maxpages;
 		if (!*pages) {
 			*pages = get_pages_array(n);
 			if (!*pages)
@@ -1394,18 +1389,16 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
-		if (res < n)
-			len = res * PAGE_SIZE;
-		return len - *start;
+		return min_t(size_t, maxsize, res * PAGE_SIZE - *start);
 	}
 	if (iov_iter_is_bvec(i)) {
 		struct page **p;
 		struct page *page;
 
-		page = first_bvec_segment(i, &len, start, maxsize);
-		if (len > maxpages * PAGE_SIZE)
-			len = maxpages * PAGE_SIZE;
-		n = DIV_ROUND_UP(len, PAGE_SIZE);
+		page = first_bvec_segment(i, &maxsize, start);
+		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
+		if (n > maxpages)
+			n = maxpages;
 		p = *pages;
 		if (!p) {
 			*pages = p = get_pages_array(n);
@@ -1414,7 +1407,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		}
 		for (int k = 0; k < n; k++)
 			get_page(*p++ = page++);
-		return len - *start;
+		return min_t(size_t, maxsize, n * PAGE_SIZE - *start);
 	}
 	if (iov_iter_is_pipe(i))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 20/31] found_iovec_segment(): just return address
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (18 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 21/31] fold __pipe_get_pages() into pipe_get_pages() Al Viro
                           ` (10 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

... and calculate the offset in the caller
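	The split being moved into the caller is just the page-aligned
decomposition of an address.  As a sanity check in userspace C
(PG_SIZE and PG_MASK are local stand-ins for the kernel constants):

```c
#include <assert.h>

#define PG_SIZE 4096ul
#define PG_MASK (~(PG_SIZE - 1))

/* What the caller now does inline: page-aligned base... */
static unsigned long page_base(unsigned long addr)
{
	return addr & PG_MASK;
}

/* ... and the offset within the page. */
static unsigned long page_offset(unsigned long addr)
{
	return addr % PG_SIZE;
}
```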

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1a30783e2b60..96cf7a05946d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1306,34 +1306,23 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	return min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
 }
 
-static unsigned long found_ubuf_segment(unsigned long addr,
-					size_t *start)
-{
-	*start = addr % PAGE_SIZE;
-	return addr & PAGE_MASK;
-}
-
 /* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
 static unsigned long first_iovec_segment(const struct iov_iter *i,
-					 size_t *size, size_t *start)
+					 size_t *size)
 {
 	size_t skip;
 	long k;
 
-	if (iter_is_ubuf(i)) {
-		unsigned long addr = (unsigned long)i->ubuf + i->iov_offset;
-		return found_ubuf_segment(addr, start);
-	}
+	if (iter_is_ubuf(i))
+		return (unsigned long)i->ubuf + i->iov_offset;
 
 	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
-		unsigned long addr = (unsigned long)i->iov[k].iov_base + skip;
 		size_t len = i->iov[k].iov_len - skip;
-
 		if (unlikely(!len))
 			continue;
 		if (*size > len)
 			*size = len;
-		return found_ubuf_segment(addr, start);
+		return (unsigned long)i->iov[k].iov_base + skip;
 	}
 	BUG(); // if it had been empty, we wouldn't get called
 }
@@ -1377,7 +1366,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		if (i->nofault)
 			gup_flags |= FOLL_NOFAULT;
 
-		addr = first_iovec_segment(i, &maxsize, start);
+		addr = first_iovec_segment(i, &maxsize);
+		*start = addr % PAGE_SIZE;
+		addr &= PAGE_MASK;
 		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
 		if (n > maxpages)
 			n = maxpages;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 21/31] fold __pipe_get_pages() into pipe_get_pages()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (19 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 20/31] found_iovec_segment(): just return address Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 22/31] iov_iter: saner helper for page array allocation Al Viro
                           ` (9 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

... and don't mangle maxsize there - turn the loop into a counting
one instead.  Easier to see that we won't run out of the array that
way.  Note that the special treatment of the partial buffer in that
thing is an artifact of the non-advancing semantics of
iov_iter_get_pages() - if not for that, it would be append_pipe(),
same as the body of the loop that follows it.  IOW, once we make
iov_iter_get_pages() advancing, the whole thing will turn into
	calculate how many pages we want
	allocate an array (if needed)
	call append_pipe() that many times.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 75 +++++++++++++++++++++++++-------------------------
 1 file changed, 38 insertions(+), 37 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 96cf7a05946d..f20ba33f48da 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1192,60 +1192,61 @@ static struct page **get_pages_array(size_t n)
 	return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
 }
 
-static inline ssize_t __pipe_get_pages(struct iov_iter *i,
-				size_t maxsize,
-				struct page **pages,
-				size_t off)
-{
-	struct pipe_inode_info *pipe = i->pipe;
-	ssize_t left = maxsize;
-
-	if (off) {
-		struct pipe_buffer *buf = pipe_buf(pipe, pipe->head - 1);
-
-		get_page(*pages++ = buf->page);
-		left -= PAGE_SIZE - off;
-		if (left <= 0) {
-			buf->len += maxsize;
-			return maxsize;
-		}
-		buf->len = PAGE_SIZE;
-	}
-	while (!pipe_full(pipe->head, pipe->tail, pipe->max_usage)) {
-		struct page *page = push_anon(pipe,
-					      min_t(ssize_t, left, PAGE_SIZE));
-		if (!page)
-			break;
-		get_page(*pages++ = page);
-		left -= PAGE_SIZE;
-		if (left <= 0)
-			return maxsize;
-	}
-	return maxsize - left ? : -EFAULT;
-}
-
 static ssize_t pipe_get_pages(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
+	struct pipe_inode_info *pipe = i->pipe;
 	unsigned int npages, off;
 	struct page **p;
-	size_t capacity;
+	ssize_t left;
+	int count;
 
 	if (!sanity(i))
 		return -EFAULT;
 
 	*start = off = pipe_npages(i, &npages);
-	capacity = min(npages, maxpages) * PAGE_SIZE - off;
-	maxsize = min(maxsize, capacity);
+	count = DIV_ROUND_UP(maxsize + off, PAGE_SIZE);
+	if (count > npages)
+		count = npages;
+	if (count > maxpages)
+		count = maxpages;
 	p = *pages;
 	if (!p) {
-		*pages = p = get_pages_array(DIV_ROUND_UP(maxsize + off, PAGE_SIZE));
+		*pages = p = get_pages_array(count);
 		if (!p)
 			return -ENOMEM;
 	}
 
-	return __pipe_get_pages(i, maxsize, p, off);
+	left = maxsize;
+	npages = 0;
+	if (off) {
+		struct pipe_buffer *buf = pipe_buf(pipe, pipe->head - 1);
+
+		get_page(*p++ = buf->page);
+		left -= PAGE_SIZE - off;
+		if (left <= 0) {
+			buf->len += maxsize;
+			return maxsize;
+		}
+		buf->len = PAGE_SIZE;
+		npages = 1;
+	}
+	for ( ; npages < count; npages++) {
+		struct page *page;
+		unsigned int size = min_t(ssize_t, left, PAGE_SIZE);
+
+		if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
+			break;
+		page = push_anon(pipe, size);
+		if (!page)
+			break;
+		get_page(*p++ = page);
+		left -= size;
+	}
+	if (!npages)
+		return -EFAULT;
+	return maxsize - left;
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
-- 
2.30.2



* [PATCH 22/31] iov_iter: saner helper for page array allocation
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (20 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 21/31] fold __pipe_get_pages() into pipe_get_pages() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18 11:14           ` Al Viro
  2022-06-18  5:35         ` [PATCH 23/31] iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Al Viro
                           ` (8 subsequent siblings)
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

All call sites of get_pages_array() are essentially identical now.
Replace them with a common helper...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 64 +++++++++++++++++++-------------------------------
 1 file changed, 24 insertions(+), 40 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index f20ba33f48da..a137bfaaaa77 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1187,9 +1187,19 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 }
 EXPORT_SYMBOL(iov_iter_gap_alignment);
 
-static struct page **get_pages_array(size_t n)
+static int want_pages_array(struct page ***res, size_t size,
+			    size_t start, unsigned int maxpages)
 {
-	return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
+	unsigned count = DIV_ROUND_UP(size + start, PAGE_SIZE);
+
+	if (count > maxpages)
+		count = maxpages;
+	if (!*res) {
+		*res = kvmalloc_array(count, sizeof(struct page *), GFP_KERNEL);
+		if (!*res)
+			return -ENOMEM;
+	}
+	return count;
 }
 
 static ssize_t pipe_get_pages(struct iov_iter *i,
@@ -1206,18 +1216,10 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 		return -EFAULT;
 
 	*start = off = pipe_npages(i, &npages);
-	count = DIV_ROUND_UP(maxsize + off, PAGE_SIZE);
-	if (count > npages)
-		count = npages;
-	if (count > maxpages)
-		count = maxpages;
+	count = want_pages_array(pages, maxsize, off, min(npages, maxpages));
+	if (count < 0)
+		return count;
 	p = *pages;
-	if (!p) {
-		*pages = p = get_pages_array(count);
-		if (!p)
-			return -ENOMEM;
-	}
-
 	left = maxsize;
 	npages = 0;
 	if (off) {
@@ -1282,7 +1284,6 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 {
 	unsigned nr, offset;
 	pgoff_t index, count;
-	size_t size = maxsize;
 	loff_t pos;
 
 	pos = i->xarray_start + i->iov_offset;
@@ -1290,16 +1291,9 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	offset = pos & ~PAGE_MASK;
 	*_start_offset = offset;
 
-	count = DIV_ROUND_UP(size + offset, PAGE_SIZE);
-	if (count > maxpages)
-		count = maxpages;
-
-	if (!*pages) {
-		*pages = get_pages_array(count);
-		if (!*pages)
-			return -ENOMEM;
-	}
-
+	count = want_pages_array(pages, maxsize, offset, maxpages);
+	if (count < 0)
+		return count;
 	nr = iter_xarray_populate_pages(*pages, i->xarray, index, count);
 	if (nr == 0)
 		return 0;
@@ -1370,14 +1364,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		addr = first_iovec_segment(i, &maxsize);
 		*start = addr % PAGE_SIZE;
 		addr &= PAGE_MASK;
-		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
-		if (n > maxpages)
-			n = maxpages;
-		if (*pages) {
-			*pages = get_pages_array(n);
-			if (!*pages)
-				return -ENOMEM;
-		}
+		n = want_pages_array(pages, len, *start, maxpages);
+		if (n < 0)
+			return n;
 		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
@@ -1388,15 +1377,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page *page;
 
 		page = first_bvec_segment(i, &maxsize, start);
-		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
-		if (n > maxpages)
-			n = maxpages;
+		n = want_pages_array(pages, len, *start, maxpages);
+		if (n < 0)
+			return n;
 		p = *pages;
-		if (!p) {
-			*pages = p = get_pages_array(n);
-			if (!p)
-				return -ENOMEM;
-		}
 		for (int k = 0; k < n; k++)
 			get_page(*p++ = page++);
 		return min_t(size_t, maxsize, n * PAGE_SIZE - *start);
-- 
2.30.2



* [PATCH 23/31] iov_iter: advancing variants of iov_iter_get_pages{,_alloc}()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (21 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 22/31] iov_iter: saner helper for page array allocation Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 24/31] block: convert to " Al Viro
                           ` (7 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 drivers/vhost/scsi.c |  4 +---
 fs/ceph/file.c       |  3 +--
 fs/cifs/file.c       |  6 ++----
 fs/cifs/misc.c       |  3 +--
 fs/direct-io.c       |  3 +--
 fs/fuse/dev.c        |  3 +--
 fs/fuse/file.c       |  3 +--
 fs/nfs/direct.c      |  6 ++----
 include/linux/uio.h  | 20 ++++++++++++++++++++
 net/core/datagram.c  |  3 +--
 net/core/skmsg.c     |  3 +--
 net/rds/message.c    |  3 +--
 net/tls/tls_sw.c     |  4 +---
 13 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index ffd9e6c2ffc1..9b65509424dc 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -643,14 +643,12 @@ vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd,
 	size_t offset;
 	unsigned int npages = 0;
 
-	bytes = iov_iter_get_pages(iter, pages, LONG_MAX,
+	bytes = iov_iter_get_pages2(iter, pages, LONG_MAX,
 				VHOST_SCSI_PREALLOC_UPAGES, &offset);
 	/* No pages were pinned */
 	if (bytes <= 0)
 		return bytes < 0 ? bytes : -EFAULT;
 
-	iov_iter_advance(iter, bytes);
-
 	while (bytes) {
 		unsigned n = min_t(unsigned, PAGE_SIZE - offset, bytes);
 		sg_set_page(sg++, pages[npages++], n, offset);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index c535de5852bf..8fab5db16c73 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -95,12 +95,11 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
 		size_t start;
 		int idx = 0;
 
-		bytes = iov_iter_get_pages(iter, pages, maxsize - size,
+		bytes = iov_iter_get_pages2(iter, pages, maxsize - size,
 					   ITER_GET_BVECS_PAGES, &start);
 		if (bytes < 0)
 			return size ?: bytes;
 
-		iov_iter_advance(iter, bytes);
 		size += bytes;
 
 		for ( ; bytes; idx++, bvec_idx++) {
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e1e05b253daa..3ba013e2987f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3022,7 +3022,7 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 		if (ctx->direct_io) {
 			ssize_t result;
 
-			result = iov_iter_get_pages_alloc(
+			result = iov_iter_get_pages_alloc2(
 				from, &pagevec, cur_len, &start);
 			if (result < 0) {
 				cifs_dbg(VFS,
@@ -3036,7 +3036,6 @@ cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
 				break;
 			}
 			cur_len = (size_t)result;
-			iov_iter_advance(from, cur_len);
 
 			nr_pages =
 				(cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE;
@@ -3758,7 +3757,7 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 		if (ctx->direct_io) {
 			ssize_t result;
 
-			result = iov_iter_get_pages_alloc(
+			result = iov_iter_get_pages_alloc2(
 					&direct_iov, &pagevec,
 					cur_len, &start);
 			if (result < 0) {
@@ -3774,7 +3773,6 @@ cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file,
 				break;
 			}
 			cur_len = (size_t)result;
-			iov_iter_advance(&direct_iov, cur_len);
 
 			rdata = cifs_readdata_direct_alloc(
 					pagevec, cifs_uncached_readv_complete);
diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c
index c69e1240d730..37493118fb72 100644
--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -1022,7 +1022,7 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
 	saved_len = count;
 
 	while (count && npages < max_pages) {
-		rc = iov_iter_get_pages(iter, pages, count, max_pages, &start);
+		rc = iov_iter_get_pages2(iter, pages, count, max_pages, &start);
 		if (rc < 0) {
 			cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc);
 			break;
@@ -1034,7 +1034,6 @@ setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
 			break;
 		}
 
-		iov_iter_advance(iter, rc);
 		count -= rc;
 		rc += start;
 		cur_npages = DIV_ROUND_UP(rc, PAGE_SIZE);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 72237f49ad94..9724244f12ce 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -169,7 +169,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
 	ssize_t ret;
 
-	ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
+	ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
 				&sdio->from);
 
 	if (ret < 0 && sdio->blocks_available && (dio->op == REQ_OP_WRITE)) {
@@ -191,7 +191,6 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 	}
 
 	if (ret >= 0) {
-		iov_iter_advance(sdio->iter, ret);
 		ret += sdio->from;
 		sdio->head = 0;
 		sdio->tail = (ret + PAGE_SIZE - 1) / PAGE_SIZE;
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8d657c2cd6f7..51897427a534 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -730,14 +730,13 @@ static int fuse_copy_fill(struct fuse_copy_state *cs)
 		}
 	} else {
 		size_t off;
-		err = iov_iter_get_pages(cs->iter, &page, PAGE_SIZE, 1, &off);
+		err = iov_iter_get_pages2(cs->iter, &page, PAGE_SIZE, 1, &off);
 		if (err < 0)
 			return err;
 		BUG_ON(!err);
 		cs->len = err;
 		cs->offset = off;
 		cs->pg = page;
-		iov_iter_advance(cs->iter, err);
 	}
 
 	return lock_request(cs->req);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index c982e3afe3b4..69e19fc0afc1 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1401,14 +1401,13 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
 	while (nbytes < *nbytesp && ap->num_pages < max_pages) {
 		unsigned npages;
 		size_t start;
-		ret = iov_iter_get_pages(ii, &ap->pages[ap->num_pages],
+		ret = iov_iter_get_pages2(ii, &ap->pages[ap->num_pages],
 					*nbytesp - nbytes,
 					max_pages - ap->num_pages,
 					&start);
 		if (ret < 0)
 			break;
 
-		iov_iter_advance(ii, ret);
 		nbytes += ret;
 
 		ret += start;
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 022e1ce63e62..c275c83f0aef 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -364,13 +364,12 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 		size_t pgbase;
 		unsigned npages, i;
 
-		result = iov_iter_get_pages_alloc(iter, &pagevec, 
+		result = iov_iter_get_pages_alloc2(iter, &pagevec,
 						  rsize, &pgbase);
 		if (result < 0)
 			break;
 	
 		bytes = result;
-		iov_iter_advance(iter, bytes);
 		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
 		for (i = 0; i < npages; i++) {
 			struct nfs_page *req;
@@ -812,13 +811,12 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 		size_t pgbase;
 		unsigned npages, i;
 
-		result = iov_iter_get_pages_alloc(iter, &pagevec, 
+		result = iov_iter_get_pages_alloc2(iter, &pagevec,
 						  wsize, &pgbase);
 		if (result < 0)
 			break;
 
 		bytes = result;
-		iov_iter_advance(iter, bytes);
 		npages = (result + pgbase + PAGE_SIZE - 1) / PAGE_SIZE;
 		for (i = 0; i < npages; i++) {
 			struct nfs_page *req;
diff --git a/include/linux/uio.h b/include/linux/uio.h
index d3e13b37ea72..ab1cc218b9de 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -349,4 +349,24 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 
+static inline ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
+			size_t maxsize, unsigned maxpages, size_t *start)
+{
+	ssize_t res = iov_iter_get_pages(i, pages, maxsize, maxpages, start);
+
+	if (res >= 0)
+		iov_iter_advance(i, res);
+	return res;
+}
+
+static inline ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
+			size_t maxsize, size_t *start)
+{
+	ssize_t res = iov_iter_get_pages_alloc(i, pages, maxsize, start);
+
+	if (res >= 0)
+		iov_iter_advance(i, res);
+	return res;
+}
+
 #endif
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 50f4faeea76c..344b4c5791ac 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -629,12 +629,11 @@ int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb,
 		if (frag == MAX_SKB_FRAGS)
 			return -EMSGSIZE;
 
-		copied = iov_iter_get_pages(from, pages, length,
+		copied = iov_iter_get_pages2(from, pages, length,
 					    MAX_SKB_FRAGS - frag, &start);
 		if (copied < 0)
 			return -EFAULT;
 
-		iov_iter_advance(from, copied);
 		length -= copied;
 
 		truesize = PAGE_ALIGN(copied + start);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 22b983ade0e7..662151678f20 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -324,14 +324,13 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
 			goto out;
 		}
 
-		copied = iov_iter_get_pages(from, pages, bytes, maxpages,
+		copied = iov_iter_get_pages2(from, pages, bytes, maxpages,
 					    &offset);
 		if (copied <= 0) {
 			ret = -EFAULT;
 			goto out;
 		}
 
-		iov_iter_advance(from, copied);
 		bytes -= copied;
 		msg->sg.size += copied;
 
diff --git a/net/rds/message.c b/net/rds/message.c
index 799034e0f513..d74be4e3f3fa 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -391,7 +391,7 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter *
 		size_t start;
 		ssize_t copied;
 
-		copied = iov_iter_get_pages(from, &pages, PAGE_SIZE,
+		copied = iov_iter_get_pages2(from, &pages, PAGE_SIZE,
 					    1, &start);
 		if (copied < 0) {
 			struct mmpin *mmp;
@@ -405,7 +405,6 @@ static int rds_message_zcopy_from_user(struct rds_message *rm, struct iov_iter *
 			goto err;
 		}
 		total_copied += copied;
-		iov_iter_advance(from, copied);
 		length -= copied;
 		sg_set_page(sg, pages, copied, start);
 		rm->data.op_nents++;
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0513f82b8537..b1406c60f8df 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1361,7 +1361,7 @@ static int tls_setup_from_iter(struct iov_iter *from,
 			rc = -EFAULT;
 			goto out;
 		}
-		copied = iov_iter_get_pages(from, pages,
+		copied = iov_iter_get_pages2(from, pages,
 					    length,
 					    maxpages, &offset);
 		if (copied <= 0) {
@@ -1369,8 +1369,6 @@ static int tls_setup_from_iter(struct iov_iter *from,
 			goto out;
 		}
 
-		iov_iter_advance(from, copied);
-
 		length -= copied;
 		size += copied;
 		while (copied) {
-- 
2.30.2



* [PATCH 24/31] block: convert to advancing variants of iov_iter_get_pages{,_alloc}()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (22 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 23/31] iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 25/31] iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() Al Viro
                           ` (6 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

... doing a revert if we end up not using some pages

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 block/bio.c     | 15 ++++++---------
 block/blk-map.c |  7 ++++---
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 51c99f2c5c90..01ab683e67be 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1190,7 +1190,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	size = iov_iter_get_pages2(iter, pages, LONG_MAX, nr_pages, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
@@ -1205,6 +1205,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		} else {
 			if (WARN_ON_ONCE(bio_full(bio, len))) {
 				bio_put_pages(pages + i, left, offset);
+				iov_iter_revert(iter, left);
 				return -EINVAL;
 			}
 			__bio_add_page(bio, page, len, offset);
@@ -1212,7 +1213,6 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		offset = 0;
 	}
 
-	iov_iter_advance(iter, size);
 	return 0;
 }
 
@@ -1227,7 +1227,6 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	ssize_t size, left;
 	unsigned len, i;
 	size_t offset;
-	int ret = 0;
 
 	if (WARN_ON_ONCE(!max_append_sectors))
 		return 0;
@@ -1240,7 +1239,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
-	size = iov_iter_get_pages(iter, pages, LONG_MAX, nr_pages, &offset);
+	size = iov_iter_get_pages2(iter, pages, LONG_MAX, nr_pages, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
 
@@ -1252,16 +1251,14 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
 		if (bio_add_hw_page(q, bio, page, len, offset,
 				max_append_sectors, &same_page) != len) {
 			bio_put_pages(pages + i, left, offset);
-			ret = -EINVAL;
-			break;
+			iov_iter_revert(iter, left);
+			return -EINVAL;
 		}
 		if (same_page)
 			put_page(page);
 		offset = 0;
 	}
-
-	iov_iter_advance(iter, size - left);
-	return ret;
+	return 0;
 }
 
 /**
diff --git a/block/blk-map.c b/block/blk-map.c
index df8b066cd548..7196a6b64c80 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -254,7 +254,7 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 		size_t offs, added = 0;
 		int npages;
 
-		bytes = iov_iter_get_pages_alloc(iter, &pages, LONG_MAX, &offs);
+		bytes = iov_iter_get_pages_alloc2(iter, &pages, LONG_MAX, &offs);
 		if (unlikely(bytes <= 0)) {
 			ret = bytes ? bytes : -EFAULT;
 			goto out_unmap;
@@ -284,7 +284,6 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 				bytes -= n;
 				offs = 0;
 			}
-			iov_iter_advance(iter, added);
 		}
 		/*
 		 * release the pages we didn't map into the bio, if any
@@ -293,8 +292,10 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 			put_page(pages[j++]);
 		kvfree(pages);
 		/* couldn't stuff something into bio? */
-		if (bytes)
+		if (bytes) {
+			iov_iter_revert(iter, bytes);
 			break;
+		}
 	}
 
 	ret = blk_rq_append_bio(rq, bio);
-- 
2.30.2



* [PATCH 25/31] iter_to_pipe(): switch to advancing variant of iov_iter_get_pages()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (23 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 24/31] block: convert to " Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 26/31] af_alg_make_sg(): " Al Viro
                           ` (5 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

... and untangle the cleanup on failure to add into the pipe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 47 ++++++++++++++++++++++++-----------------------
 1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index 493878bd9bb9..1b810fbbabcf 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1160,39 +1160,40 @@ static int iter_to_pipe(struct iov_iter *from,
 	};
 	size_t total = 0;
 	int ret = 0;
-	bool failed = false;
 
-	while (iov_iter_count(from) && !failed) {
+	while (iov_iter_count(from)) {
 		struct page *pages[16];
-		ssize_t copied;
+		ssize_t left;
 		size_t start;
-		int n;
+		int i, n;
 
-		copied = iov_iter_get_pages(from, pages, LONG_MAX, 16, &start);
-		if (copied <= 0) {
-			ret = copied;
+		left = iov_iter_get_pages2(from, pages, LONG_MAX, 16, &start);
+		if (left <= 0) {
+			ret = left;
 			break;
 		}
 
-		for (n = 0; copied; n++, start = 0) {
-			int size = min_t(int, copied, PAGE_SIZE - start);
-			if (!failed) {
-				buf.page = pages[n];
-				buf.offset = start;
-				buf.len = size;
-				ret = add_to_pipe(pipe, &buf);
-				if (unlikely(ret < 0)) {
-					failed = true;
-				} else {
-					iov_iter_advance(from, ret);
-					total += ret;
-				}
-			} else {
-				put_page(pages[n]);
+		n = DIV_ROUND_UP(left + start, PAGE_SIZE);
+		for (i = 0; i < n; i++) {
+			int size = min_t(int, left, PAGE_SIZE - start);
+
+			buf.page = pages[i];
+			buf.offset = start;
+			buf.len = size;
+			ret = add_to_pipe(pipe, &buf);
+			if (unlikely(ret < 0)) {
+				iov_iter_revert(from, left);
+				// this one got dropped by add_to_pipe()
+				while (++i < n)
+					put_page(pages[i]);
+				goto out;
 			}
-			copied -= size;
+			total += ret;
+			left -= size;
+			start = 0;
 		}
 	}
+out:
 	return total ? total : ret;
 }
 
-- 
2.30.2



* [PATCH 26/31] af_alg_make_sg(): switch to advancing variant of iov_iter_get_pages()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (24 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 25/31] iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 27/31] 9p: convert to advancing variant of iov_iter_get_pages_alloc() Al Viro
                           ` (4 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

... and adjust the callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 crypto/af_alg.c     | 3 +--
 crypto/algif_hash.c | 5 +++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index c8289b7a85ba..e893c0f6c879 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -404,7 +404,7 @@ int af_alg_make_sg(struct af_alg_sgl *sgl, struct iov_iter *iter, int len)
 	ssize_t n;
 	int npages, i;
 
-	n = iov_iter_get_pages(iter, sgl->pages, len, ALG_MAX_PAGES, &off);
+	n = iov_iter_get_pages2(iter, sgl->pages, len, ALG_MAX_PAGES, &off);
 	if (n < 0)
 		return n;
 
@@ -1191,7 +1191,6 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
 		len += err;
 		atomic_add(err, &ctx->rcvused);
 		rsgl->sg_num_bytes = err;
-		iov_iter_advance(&msg->msg_iter, err);
 	}
 
 	*outlen = len;
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 50f7b22f1b48..1d017ec5c63c 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -102,11 +102,12 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
 		err = crypto_wait_req(crypto_ahash_update(&ctx->req),
 				      &ctx->wait);
 		af_alg_free_sg(&ctx->sgl);
-		if (err)
+		if (err) {
+			iov_iter_revert(&msg->msg_iter, len);
 			goto unlock;
+		}
 
 		copied += len;
-		iov_iter_advance(&msg->msg_iter, len);
 	}
 
 	err = 0;
-- 
2.30.2



* [PATCH 27/31] 9p: convert to advancing variant of iov_iter_get_pages_alloc()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (25 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 26/31] af_alg_make_sg(): " Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 28/31] ceph: switch the last caller " Al Viro
                           ` (3 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

That one is somewhat clumsier than usual and needs serious testing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 net/9p/client.c       | 39 +++++++++++++++++++++++----------------
 net/9p/protocol.c     |  3 +--
 net/9p/trans_virtio.c |  3 ++-
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index d403085b9ef5..cb4324211561 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -1491,7 +1491,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
 	struct p9_client *clnt = fid->clnt;
 	struct p9_req_t *req;
 	int count = iov_iter_count(to);
-	int rsize, non_zc = 0;
+	int rsize, received, non_zc = 0;
 	char *dataptr;
 
 	*err = 0;
@@ -1520,36 +1520,40 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
 	}
 	if (IS_ERR(req)) {
 		*err = PTR_ERR(req);
+		if (!non_zc)
+			iov_iter_revert(to, count - iov_iter_count(to));
 		return 0;
 	}
 
 	*err = p9pdu_readf(&req->rc, clnt->proto_version,
-			   "D", &count, &dataptr);
+			   "D", &received, &dataptr);
 	if (*err) {
+		if (!non_zc)
+			iov_iter_revert(to, count - iov_iter_count(to));
 		trace_9p_protocol_dump(clnt, &req->rc);
 		p9_tag_remove(clnt, req);
 		return 0;
 	}
-	if (rsize < count) {
-		pr_err("bogus RREAD count (%d > %d)\n", count, rsize);
-		count = rsize;
+	if (rsize < received) {
+		pr_err("bogus RREAD count (%d > %d)\n", received, rsize);
+		received = rsize;
 	}
 
 	p9_debug(P9_DEBUG_9P, "<<< RREAD count %d\n", count);
 
 	if (non_zc) {
-		int n = copy_to_iter(dataptr, count, to);
+		int n = copy_to_iter(dataptr, received, to);
 
-		if (n != count) {
+		if (n != received) {
 			*err = -EFAULT;
 			p9_tag_remove(clnt, req);
 			return n;
 		}
 	} else {
-		iov_iter_advance(to, count);
+		iov_iter_revert(to, count - received - iov_iter_count(to));
 	}
 	p9_tag_remove(clnt, req);
-	return count;
+	return received;
 }
 EXPORT_SYMBOL(p9_client_read_once);
 
@@ -1567,6 +1571,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 	while (iov_iter_count(from)) {
 		int count = iov_iter_count(from);
 		int rsize = fid->iounit;
+		int written;
 
 		if (!rsize || rsize > clnt->msize - P9_IOHDRSZ)
 			rsize = clnt->msize - P9_IOHDRSZ;
@@ -1584,27 +1589,29 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 					    offset, rsize, from);
 		}
 		if (IS_ERR(req)) {
+			iov_iter_revert(from, count - iov_iter_count(from));
 			*err = PTR_ERR(req);
 			break;
 		}
 
-		*err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
+		*err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &written);
 		if (*err) {
+			iov_iter_revert(from, count - iov_iter_count(from));
 			trace_9p_protocol_dump(clnt, &req->rc);
 			p9_tag_remove(clnt, req);
 			break;
 		}
-		if (rsize < count) {
-			pr_err("bogus RWRITE count (%d > %d)\n", count, rsize);
-			count = rsize;
+		if (rsize < written) {
+			pr_err("bogus RWRITE count (%d > %d)\n", written, rsize);
+			written = rsize;
 		}
 
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", count);
 
 		p9_tag_remove(clnt, req);
-		iov_iter_advance(from, count);
-		total += count;
-		offset += count;
+		iov_iter_revert(from, count - written - iov_iter_count(from));
+		total += written;
+		offset += written;
 	}
 	return total;
 }
diff --git a/net/9p/protocol.c b/net/9p/protocol.c
index 3754c33e2974..83694c631989 100644
--- a/net/9p/protocol.c
+++ b/net/9p/protocol.c
@@ -63,9 +63,8 @@ static size_t
 pdu_write_u(struct p9_fcall *pdu, struct iov_iter *from, size_t size)
 {
 	size_t len = min(pdu->capacity - pdu->size, size);
-	struct iov_iter i = *from;
 
-	if (!copy_from_iter_full(&pdu->sdata[pdu->size], len, &i))
+	if (!copy_from_iter_full(&pdu->sdata[pdu->size], len, from))
 		len = 0;
 
 	pdu->size += len;
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 2a210c2f8e40..1977d33475fe 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -331,7 +331,7 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
 			if (err == -ERESTARTSYS)
 				return err;
 		}
-		n = iov_iter_get_pages_alloc(data, pages, count, offs);
+		n = iov_iter_get_pages_alloc2(data, pages, count, offs);
 		if (n < 0)
 			return n;
 		*need_drop = 1;
@@ -373,6 +373,7 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
 				(*pages)[index] = kmap_to_page(p);
 			p += PAGE_SIZE;
 		}
+		iov_iter_advance(data, len);
 		return len;
 	}
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 28/31] ceph: switch the last caller of iov_iter_get_pages_alloc()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (26 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 27/31] 9p: convert to advancing variant of iov_iter_get_pages_alloc() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18  5:35         ` [PATCH 29/31] get rid of non-advancing variants Al Viro
                           ` (2 subsequent siblings)
  30 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

here nothing even looks at the iov_iter after the call, so we couldn't
care less whether it advances or not.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/ceph/addr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6dee88815491..3c8a7cf19e5d 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -329,7 +329,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq)
 
 	dout("%s: pos=%llu orig_len=%zu len=%llu\n", __func__, subreq->start, subreq->len, len);
 	iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, subreq->start, len);
-	err = iov_iter_get_pages_alloc(&iter, &pages, len, &page_off);
+	err = iov_iter_get_pages_alloc2(&iter, &pages, len, &page_off);
 	if (err < 0) {
 		dout("%s: iov_ter_get_pages_alloc returned %d\n", __func__, err);
 		goto out;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 29/31] get rid of non-advancing variants
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (27 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 28/31] ceph: switch the last caller " Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18 11:14           ` Al Viro
  2022-06-18  5:35         ` [PATCH 30/31] pipe_get_pages(): switch to append_pipe() Al Viro
  2022-06-18  5:35         ` [PATCH 31/31] expand those iov_iter_advance() Al Viro
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

mechanical change; will be further massaged in subsequent commits
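The change is mechanical, but the contract is easy to state: callers previously got a non-advancing primitive plus an inline wrapper that called iov_iter_advance() afterwards; the renamed *2 primitives now advance internally, with identical observable effect on the iterator. A minimal userspace model of that equivalence (hypothetical `miter` type and names; not the kernel API):

```c
/* Simplified model: both schemes must leave the iterator in the
 * same state and return the same byte count. */
#include <assert.h>
#include <stddef.h>

struct miter {          /* stand-in for struct iov_iter */
	size_t count;       /* bytes remaining */
	size_t iov_offset;  /* offset into current segment */
};

/* non-advancing core: how many bytes the grabbed pages cover */
static size_t get_pages_core(const struct miter *i, size_t maxsize)
{
	return maxsize < i->count ? maxsize : i->count;
}

/* old scheme: inline wrapper advances after the fact */
static size_t get_pages2_old(struct miter *i, size_t maxsize)
{
	size_t res = get_pages_core(i, maxsize);
	i->iov_offset += res;          /* the iov_iter_advance() call */
	i->count -= res;
	return res;
}

/* new scheme: the primitive itself advances before returning */
static size_t get_pages2_new(struct miter *i, size_t maxsize)
{
	size_t res = maxsize < i->count ? maxsize : i->count;
	i->iov_offset += res;
	i->count -= res;
	return res;
}
```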

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/uio.h | 24 ++----------------------
 lib/iov_iter.c      | 27 ++++++++++++++++++---------
 2 files changed, 20 insertions(+), 31 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ab1cc218b9de..f2fc55f88e45 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -245,9 +245,9 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
 		     loff_t start, size_t count);
-ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
+ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
+ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
@@ -349,24 +349,4 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 
-static inline ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
-			size_t maxsize, unsigned maxpages, size_t *start)
-{
-	ssize_t res = iov_iter_get_pages(i, pages, maxsize, maxpages, start);
-
-	if (res >= 0)
-		iov_iter_advance(i, res);
-	return res;
-}
-
-static inline ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
-			size_t maxsize, size_t *start)
-{
-	ssize_t res = iov_iter_get_pages_alloc(i, pages, maxsize, start);
-
-	if (res >= 0)
-		iov_iter_advance(i, res);
-	return res;
-}
-
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a137bfaaaa77..c1e5de842fe3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1229,6 +1229,7 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 		left -= PAGE_SIZE - off;
 		if (left <= 0) {
 			buf->len += maxsize;
+			iov_iter_advance(i, maxsize);
 			return maxsize;
 		}
 		buf->len = PAGE_SIZE;
@@ -1248,7 +1249,9 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	}
 	if (!npages)
 		return -EFAULT;
-	return maxsize - left;
+	maxsize -= left;
+	iov_iter_advance(i, maxsize);
+	return maxsize;
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@ -1298,7 +1301,9 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	if (nr == 0)
 		return 0;
 
-	return min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
 }
 
 /* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
@@ -1370,7 +1375,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
-		return min_t(size_t, len, res * PAGE_SIZE - *start);
+		len = min_t(size_t, len, res * PAGE_SIZE - *start);
+		iov_iter_advance(i, len);
+		return len;
 	}
 	if (iov_iter_is_bvec(i)) {
 		struct page **p;
@@ -1382,8 +1389,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 			return n;
 		p = *pages;
 		for (int k = 0; k < n; k++)
-			get_page(*p++ = page++);
-		return min_t(size_t, maxsize, n * PAGE_SIZE - *start);
+			get_page(p[k] = page + k);
+		len = min_t(size_t, len, n * PAGE_SIZE - *start);
+		iov_iter_advance(i, len);
+		return len;
 	}
 	if (iov_iter_is_pipe(i))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
@@ -1393,7 +1402,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 	return -EFAULT;
 }
 
-ssize_t iov_iter_get_pages(struct iov_iter *i,
+ssize_t iov_iter_get_pages2(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
@@ -1403,9 +1412,9 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
 }
-EXPORT_SYMBOL(iov_iter_get_pages);
+EXPORT_SYMBOL(iov_iter_get_pages2);
 
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
 {
@@ -1420,7 +1429,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 	}
 	return len;
 }
-EXPORT_SYMBOL(iov_iter_get_pages_alloc);
+EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 30/31] pipe_get_pages(): switch to append_pipe()
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (28 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 29/31] get rid of non-advancing variants Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-19  4:01           ` Al Viro
  2022-06-18  5:35         ` [PATCH 31/31] expand those iov_iter_advance() Al Viro
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

now that we are advancing the iterator, there's no need to
treat the first page separately - just call append_pipe()
in a loop.
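The resulting accounting can be sketched in userspace, folding in the guard posted in the follow-up further down the thread (`if (left <= PAGE_SIZE - off) return maxsize;`): only the first buffer may start at a nonzero offset, and every later one contributes a full page of room. All names here are hypothetical; this models the byte accounting only, not the kernel code:

```c
/* Model of the append_pipe() loop accounting: each iteration hands
 * back a page with PAGE_SIZE - off usable bytes, where off is nonzero
 * only for the first, partially filled buffer. */
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096

/* returns the number of bytes covered by the grabbed pages */
static size_t pipe_pages_model(size_t maxsize, size_t first_off,
			       unsigned capacity /* free pipe slots */)
{
	size_t left = maxsize;
	unsigned npages;

	for (npages = 0; npages < capacity; npages++) {
		size_t off = npages ? 0 : first_off;  /* append_pipe()'s *off */

		if (left <= PAGE_SIZE - off)  /* request fits entirely */
			return maxsize;
		left -= PAGE_SIZE - off;
	}
	return maxsize - left;  /* ran out of pipe slots */
}
```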

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 35 ++++++-----------------------------
 1 file changed, 6 insertions(+), 29 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index c1e5de842fe3..3306072c7b73 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1206,10 +1206,9 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
-	struct pipe_inode_info *pipe = i->pipe;
-	unsigned int npages, off;
+	unsigned int npages;
+	size_t left, off;
 	struct page **p;
-	ssize_t left;
 	int count;
 
 	if (!sanity(i))
@@ -1220,38 +1219,16 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	if (count < 0)
 		return count;
 	p = *pages;
-	left = maxsize;
-	npages = 0;
-	if (off) {
-		struct pipe_buffer *buf = pipe_buf(pipe, pipe->head - 1);
-
-		get_page(*p++ = buf->page);
-		left -= PAGE_SIZE - off;
-		if (left <= 0) {
-			buf->len += maxsize;
-			iov_iter_advance(i, maxsize);
-			return maxsize;
-		}
-		buf->len = PAGE_SIZE;
-		npages = 1;
-	}
-	for ( ; npages < count; npages++) {
-		struct page *page;
-		unsigned int size = min_t(ssize_t, left, PAGE_SIZE);
-
-		if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
-			break;
-		page = push_anon(pipe, size);
+	for (npages = 0, left = maxsize ; npages < count; npages++) {
+		struct page *page = append_pipe(i, left, &off);
 		if (!page)
 			break;
 		get_page(*p++ = page);
-		left -= size;
+		left -= PAGE_SIZE - off;
 	}
 	if (!npages)
 		return -EFAULT;
-	maxsize -= left;
-	iov_iter_advance(i, maxsize);
-	return maxsize;
+	return maxsize - left;
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 31/31] expand those iov_iter_advance()...
  2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
                           ` (29 preceding siblings ...)
  2022-06-18  5:35         ` [PATCH 30/31] pipe_get_pages(): switch to append_pipe() Al Viro
@ 2022-06-18  5:35         ` Al Viro
  2022-06-18 11:14           ` Al Viro
  30 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18  5:35 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 3306072c7b73..b50e264a14bf 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1279,7 +1279,8 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 		return 0;
 
 	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
-	iov_iter_advance(i, maxsize);
+	i->iov_offset += maxsize;
+	i->count -= maxsize;
 	return maxsize;
 }
 
@@ -1368,7 +1369,13 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		for (int k = 0; k < n; k++)
 			get_page(p[k] = page + k);
 		len = min_t(size_t, len, n * PAGE_SIZE - *start);
-		iov_iter_advance(i, len);
+		i->count -= len;
+		i->iov_offset += len;
+		if (i->iov_offset == i->bvec->bv_len) {
+			i->iov_offset = 0;
+			i->bvec++;
+			i->nr_segs--;
+		}
 		return len;
 	}
 	if (iov_iter_is_pipe(i))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
  2022-06-18  5:35         ` [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment() Al Viro
@ 2022-06-18 11:13           ` Al Viro
  2022-06-18 11:18             ` Al Viro
  0 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-18 11:13 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

[with braino fixed]
From 3893e9565ad55a8514f5f819545bf9df5a339c76 Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Fri, 10 Jun 2022 22:19:25 -0400
Subject: [PATCH 19/31] iov_iter: massage calling conventions for
 first_{iovec,bvec}_segment()

Pass maxsize by reference, return length via the same.  And do not
add offset to returned length.  Callers adjusted...
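A sketch of the reworked convention under the same arithmetic: the segment lookup clamps the requested size in place via the `size_t *`, reports only the in-page offset through `*start`, and the caller derives the page count as `DIV_ROUND_UP(*size + *start, PAGE_SIZE)` rather than having the offset pre-added to the returned length. Names are hypothetical stand-ins, not the kernel functions:

```c
/* Model of the by-reference sizing convention. */
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* first segment starts at addr and is seg_len bytes long */
static unsigned long first_segment(unsigned long addr, size_t seg_len,
				   size_t *size, size_t *start)
{
	if (*size > seg_len)      /* clamp the request to the segment */
		*size = seg_len;
	*start = addr % PAGE_SIZE;
	return addr & ~(unsigned long)(PAGE_SIZE - 1);
}

/* caller-side page count, offset added here and only here */
static unsigned npages_for(size_t size, size_t start, unsigned maxpages)
{
	unsigned n = DIV_ROUND_UP(size + start, PAGE_SIZE);

	return n > maxpages ? maxpages : n;
}
```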

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 50 +++++++++++++++++++++-----------------------------
 1 file changed, 21 insertions(+), 29 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 8f1d63295f37..b789728678d2 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1307,25 +1307,22 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 }
 
 static unsigned long found_ubuf_segment(unsigned long addr,
-					size_t len,
-					size_t *size, size_t *start)
+					size_t *start)
 {
-	len += (*start = addr % PAGE_SIZE);
-	*size = len;
+	*start = addr % PAGE_SIZE;
 	return addr & PAGE_MASK;
 }
 
 /* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
 static unsigned long first_iovec_segment(const struct iov_iter *i,
-					 size_t *size, size_t *start,
-					 size_t maxsize)
+					 size_t *size, size_t *start)
 {
 	size_t skip;
 	long k;
 
 	if (iter_is_ubuf(i)) {
 		unsigned long addr = (unsigned long)i->ubuf + i->iov_offset;
-		return found_ubuf_segment(addr, maxsize, size, start);
+		return found_ubuf_segment(addr, start);
 	}
 
 	for (k = 0, skip = i->iov_offset; k < i->nr_segs; k++, skip = 0) {
@@ -1334,28 +1331,26 @@ static unsigned long first_iovec_segment(const struct iov_iter *i,
 
 		if (unlikely(!len))
 			continue;
-		if (len > maxsize)
-			len = maxsize;
-		return found_ubuf_segment(addr, len, size, start);
+		if (*size > len)
+			*size = len;
+		return found_ubuf_segment(addr, start);
 	}
 	BUG(); // if it had been empty, we wouldn't get called
 }
 
 /* must be done on non-empty ITER_BVEC one */
 static struct page *first_bvec_segment(const struct iov_iter *i,
-				       size_t *size, size_t *start,
-				       size_t maxsize)
+				       size_t *size, size_t *start)
 {
 	struct page *page;
 	size_t skip = i->iov_offset, len;
 
 	len = i->bvec->bv_len - skip;
-	if (len > maxsize)
-		len = maxsize;
+	if (*size > len)
+		*size = len;
 	skip += i->bvec->bv_offset;
 	page = i->bvec->bv_page + skip / PAGE_SIZE;
-	len += (*start = skip % PAGE_SIZE);
-	*size = len;
+	*start = skip % PAGE_SIZE;
 	return page;
 }
 
@@ -1363,7 +1358,6 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   unsigned int maxpages, size_t *start)
 {
-	size_t len;
 	int n, res;
 
 	if (maxsize > i->count)
@@ -1382,10 +1376,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		if (i->nofault)
 			gup_flags |= FOLL_NOFAULT;
 
-		addr = first_iovec_segment(i, &len, start, maxsize);
-		if (len > maxpages * PAGE_SIZE)
-			len = maxpages * PAGE_SIZE;
-		n = DIV_ROUND_UP(len, PAGE_SIZE);
+		addr = first_iovec_segment(i, &maxsize, start);
+		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
+		if (n > maxpages)
+			n = maxpages;
 		if (*pages) {
 			*pages = get_pages_array(n);
 			if (!*pages)
@@ -1394,18 +1388,16 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
-		if (res < n)
-			len = res * PAGE_SIZE;
-		return len - *start;
+		return min_t(size_t, maxsize, res * PAGE_SIZE - *start);
 	}
 	if (iov_iter_is_bvec(i)) {
 		struct page **p;
 		struct page *page;
 
-		page = first_bvec_segment(i, &len, start, maxsize);
-		if (len > maxpages * PAGE_SIZE)
-			len = maxpages * PAGE_SIZE;
-		n = DIV_ROUND_UP(len, PAGE_SIZE);
+		page = first_bvec_segment(i, &maxsize, start);
+		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
+		if (n > maxpages)
+			n = maxpages;
 		p = *pages;
 		if (!p) {
 			*pages = p = get_pages_array(n);
@@ -1414,7 +1406,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		}
 		for (int k = 0; k < n; k++)
 			get_page(*p++ = page++);
-		return len - *start;
+		return min_t(size_t, maxsize, n * PAGE_SIZE - *start);
 	}
 	if (iov_iter_is_pipe(i))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH 22/31] iov_iter: saner helper for page array allocation
  2022-06-18  5:35         ` [PATCH 22/31] iov_iter: saner helper for page array allocation Al Viro
@ 2022-06-18 11:14           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18 11:14 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

[braino fix]
From e64d637d648390e4ac0643747ae174c3be15f243 Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Fri, 17 Jun 2022 14:45:41 -0400
Subject: [PATCH 22/31] iov_iter: saner helper for page array allocation

All call sites of get_pages_array() are essentially identical now.
Replace with a common helper...
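The helper's contract can be modelled in userspace (with `malloc()` standing in for kvmalloc_array() and a plain negative return for -ENOMEM; this is a sketch of the behaviour, not the kernel code): compute how many page-array slots a (size, start) request needs, cap that at maxpages, and allocate the array only when the caller has not supplied one.

```c
/* Model of want_pages_array(): returns the slot count, allocating
 * *res only if it is NULL on entry. */
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

static int want_pages_array(void ***res, size_t size,
			    size_t start, unsigned maxpages)
{
	unsigned count = DIV_ROUND_UP(size + start, PAGE_SIZE);

	if (count > maxpages)
		count = maxpages;
	if (!*res) {
		*res = calloc(count, sizeof(void *));
		if (!*res)
			return -12;  /* stand-in for -ENOMEM */
	}
	return (int)count;
}
```

Callers either pass a preallocated array (which is left untouched) or NULL (in which case the helper allocates), matching the two shapes unified by the patch.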

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 64 +++++++++++++++++++-------------------------------
 1 file changed, 24 insertions(+), 40 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index a65c766936cc..2ce062f1817d 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1187,9 +1187,19 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i)
 }
 EXPORT_SYMBOL(iov_iter_gap_alignment);
 
-static struct page **get_pages_array(size_t n)
+static int want_pages_array(struct page ***res, size_t size,
+			    size_t start, unsigned int maxpages)
 {
-	return kvmalloc_array(n, sizeof(struct page *), GFP_KERNEL);
+	unsigned count = DIV_ROUND_UP(size + start, PAGE_SIZE);
+
+	if (count > maxpages)
+		count = maxpages;
+	if (!*res) {
+		*res = kvmalloc_array(count, sizeof(struct page *), GFP_KERNEL);
+		if (!*res)
+			return -ENOMEM;
+	}
+	return count;
 }
 
 static ssize_t pipe_get_pages(struct iov_iter *i,
@@ -1206,18 +1216,10 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 		return -EFAULT;
 
 	*start = off = pipe_npages(i, &npages);
-	count = DIV_ROUND_UP(maxsize + off, PAGE_SIZE);
-	if (count > npages)
-		count = npages;
-	if (count > maxpages)
-		count = maxpages;
+	count = want_pages_array(pages, maxsize, off, min(npages, maxpages));
+	if (count < 0)
+		return count;
 	p = *pages;
-	if (!p) {
-		*pages = p = get_pages_array(count);
-		if (!p)
-			return -ENOMEM;
-	}
-
 	left = maxsize;
 	npages = 0;
 	if (off) {
@@ -1282,7 +1284,6 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 {
 	unsigned nr, offset;
 	pgoff_t index, count;
-	size_t size = maxsize;
 	loff_t pos;
 
 	pos = i->xarray_start + i->iov_offset;
@@ -1290,16 +1291,9 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	offset = pos & ~PAGE_MASK;
 	*_start_offset = offset;
 
-	count = DIV_ROUND_UP(size + offset, PAGE_SIZE);
-	if (count > maxpages)
-		count = maxpages;
-
-	if (!*pages) {
-		*pages = get_pages_array(count);
-		if (!*pages)
-			return -ENOMEM;
-	}
-
+	count = want_pages_array(pages, maxsize, offset, maxpages);
+	if (count < 0)
+		return count;
 	nr = iter_xarray_populate_pages(*pages, i->xarray, index, count);
 	if (nr == 0)
 		return 0;
@@ -1369,14 +1363,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		addr = first_iovec_segment(i, &maxsize);
 		*start = addr % PAGE_SIZE;
 		addr &= PAGE_MASK;
-		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
-		if (n > maxpages)
-			n = maxpages;
-		if (*pages) {
-			*pages = get_pages_array(n);
-			if (!*pages)
-				return -ENOMEM;
-		}
+		n = want_pages_array(pages, maxsize, *start, maxpages);
+		if (n < 0)
+			return n;
 		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
@@ -1387,15 +1376,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		struct page *page;
 
 		page = first_bvec_segment(i, &maxsize, start);
-		n = DIV_ROUND_UP(maxsize + *start, PAGE_SIZE);
-		if (n > maxpages)
-			n = maxpages;
+		n = want_pages_array(pages, maxsize, *start, maxpages);
+		if (n < 0)
+			return n;
 		p = *pages;
-		if (!p) {
-			*pages = p = get_pages_array(n);
-			if (!p)
-				return -ENOMEM;
-		}
 		for (int k = 0; k < n; k++)
 			get_page(*p++ = page++);
 		return min_t(size_t, maxsize, n * PAGE_SIZE - *start);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH 29/31] get rid of non-advancing variants
  2022-06-18  5:35         ` [PATCH 29/31] get rid of non-advancing variants Al Viro
@ 2022-06-18 11:14           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18 11:14 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

[braino fixed]
From 808b5178c97c4e6789ff727988c571d77b42acf7 Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Fri, 10 Jun 2022 13:05:12 -0400
Subject: [PATCH 29/31] get rid of non-advancing variants

mechanical change; will be further massaged in subsequent commits

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 include/linux/uio.h | 24 ++----------------------
 lib/iov_iter.c      | 27 ++++++++++++++++++---------
 2 files changed, 20 insertions(+), 31 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ab1cc218b9de..f2fc55f88e45 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -245,9 +245,9 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
 		     loff_t start, size_t count);
-ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages,
+ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
 			size_t maxsize, unsigned maxpages, size_t *start);
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages,
+ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
 			size_t maxsize, size_t *start);
 int iov_iter_npages(const struct iov_iter *i, int maxpages);
 void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state);
@@ -349,24 +349,4 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 
-static inline ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
-			size_t maxsize, unsigned maxpages, size_t *start)
-{
-	ssize_t res = iov_iter_get_pages(i, pages, maxsize, maxpages, start);
-
-	if (res >= 0)
-		iov_iter_advance(i, res);
-	return res;
-}
-
-static inline ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages,
-			size_t maxsize, size_t *start)
-{
-	ssize_t res = iov_iter_get_pages_alloc(i, pages, maxsize, start);
-
-	if (res >= 0)
-		iov_iter_advance(i, res);
-	return res;
-}
-
 #endif
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 2ce062f1817d..d581a204c256 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1229,6 +1229,7 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 		left -= PAGE_SIZE - off;
 		if (left <= 0) {
 			buf->len += maxsize;
+			iov_iter_advance(i, maxsize);
 			return maxsize;
 		}
 		buf->len = PAGE_SIZE;
@@ -1248,7 +1249,9 @@ static ssize_t pipe_get_pages(struct iov_iter *i,
 	}
 	if (!npages)
 		return -EFAULT;
-	return maxsize - left;
+	maxsize -= left;
+	iov_iter_advance(i, maxsize);
+	return maxsize;
 }
 
 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@ -1298,7 +1301,9 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 	if (nr == 0)
 		return 0;
 
-	return min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
+	iov_iter_advance(i, maxsize);
+	return maxsize;
 }
 
 /* must be done on non-empty ITER_UBUF or ITER_IOVEC one */
@@ -1369,7 +1374,9 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		res = get_user_pages_fast(addr, n, gup_flags, *pages);
 		if (unlikely(res <= 0))
 			return res;
-		return min_t(size_t, maxsize, res * PAGE_SIZE - *start);
+		maxsize = min_t(size_t, maxsize, res * PAGE_SIZE - *start);
+		iov_iter_advance(i, maxsize);
+		return maxsize;
 	}
 	if (iov_iter_is_bvec(i)) {
 		struct page **p;
@@ -1381,8 +1388,10 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 			return n;
 		p = *pages;
 		for (int k = 0; k < n; k++)
-			get_page(*p++ = page++);
-		return min_t(size_t, maxsize, n * PAGE_SIZE - *start);
+			get_page(p[k] = page + k);
+		maxsize = min_t(size_t, maxsize, n * PAGE_SIZE - *start);
+		iov_iter_advance(i, maxsize);
+		return maxsize;
 	}
 	if (iov_iter_is_pipe(i))
 		return pipe_get_pages(i, pages, maxsize, maxpages, start);
@@ -1392,7 +1401,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 	return -EFAULT;
 }
 
-ssize_t iov_iter_get_pages(struct iov_iter *i,
+ssize_t iov_iter_get_pages2(struct iov_iter *i,
 		   struct page **pages, size_t maxsize, unsigned maxpages,
 		   size_t *start)
 {
@@ -1402,9 +1411,9 @@ ssize_t iov_iter_get_pages(struct iov_iter *i,
 
 	return __iov_iter_get_pages_alloc(i, &pages, maxsize, maxpages, start);
 }
-EXPORT_SYMBOL(iov_iter_get_pages);
+EXPORT_SYMBOL(iov_iter_get_pages2);
 
-ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
+ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
 		   struct page ***pages, size_t maxsize,
 		   size_t *start)
 {
@@ -1419,7 +1428,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i,
 	}
 	return len;
 }
-EXPORT_SYMBOL(iov_iter_get_pages_alloc);
+EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH 31/31] expand those iov_iter_advance()...
  2022-06-18  5:35         ` [PATCH 31/31] expand those iov_iter_advance() Al Viro
@ 2022-06-18 11:14           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18 11:14 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

[braino fixed]
From fe8e2809c7db0ec65403b31b50906f3f481a9b10 Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat, 11 Jun 2022 04:04:33 -0400
Subject: [PATCH 31/31] expand those iov_iter_advance()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 lib/iov_iter.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 68a2e2a68aa1..38e7605a6794 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1279,7 +1279,8 @@ static ssize_t iter_xarray_get_pages(struct iov_iter *i,
 		return 0;
 
 	maxsize = min_t(size_t, nr * PAGE_SIZE - offset, maxsize);
-	iov_iter_advance(i, maxsize);
+	i->iov_offset += maxsize;
+	i->count -= maxsize;
 	return maxsize;
 }
 
@@ -1367,7 +1368,13 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		for (int k = 0; k < n; k++)
 			get_page(p[k] = page + k);
 		maxsize = min_t(size_t, maxsize, n * PAGE_SIZE - *start);
-		iov_iter_advance(i, maxsize);
+		i->count -= maxsize;
+		i->iov_offset += maxsize;
+		if (i->iov_offset == i->bvec->bv_len) {
+			i->iov_offset = 0;
+			i->bvec++;
+			i->nr_segs--;
+		}
 		return maxsize;
 	}
 	if (iov_iter_is_pipe(i))
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 93+ messages in thread

* Re: [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment()
  2022-06-18 11:13           ` Al Viro
@ 2022-06-18 11:18             ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-18 11:18 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sat, Jun 18, 2022 at 12:13:26PM +0100, Al Viro wrote:
> [with braino fixed]

	FWIW, a branch with this reordering damage fixed had been
force-pushed to the same place.  Since the knock-on effects are
only in 4 commits out of 31, reposting the entire series would be
excessive, IMO.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 04/31] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives
  2022-06-18  5:35         ` [PATCH 04/31] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives Al Viro
@ 2022-06-19  1:34           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-19  1:34 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sat, Jun 18, 2022 at 06:35:10AM +0100, Al Viro wrote:
> +
> +		if (page)

		if (!page)
that is...

> +			break;
> +		p = kmap_local_page(page);
>  		memset(p + off, 0, chunk);
>  		kunmap_local(p);
> -		i->head = i_head;
> -		i->iov_offset = off + chunk;
>  		n -= chunk;
> -		off = 0;
> -		i_head++;
> -	} while (n);
> -	i->count -= bytes;
> +	}
>  	return bytes;
>  }
>  
> -- 
> 2.30.2
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 10/31] ITER_PIPE: fold data_start() and pipe_space_for_user() together
  2022-06-18  5:35         ` [PATCH 10/31] ITER_PIPE: fold data_start() and pipe_space_for_user() together Al Viro
@ 2022-06-19  2:25           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-19  2:25 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sat, Jun 18, 2022 at 06:35:17AM +0100, Al Viro wrote:
> +	*npages = max(used - (int)pipe->max_usage, 0);

	*npages = max((int)pipe->max_usage - used, 0);

>  	if (off > 0 && off < PAGE_SIZE) { // anon and not full

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 13/31] iov_iter_get_pages(): sanity-check arguments
  2022-06-18  5:35         ` [PATCH 13/31] iov_iter_get_pages(): sanity-check arguments Al Viro
@ 2022-06-19  3:07           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-19  3:07 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sat, Jun 18, 2022 at 06:35:20AM +0100, Al Viro wrote:

> +	if (!maxsize || maxpages)

	if (!maxsize || !maxpages)

obviously...  That's only a bisect hazard, since 16/31 fixes it.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 16/31] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts
  2022-06-18  5:35         ` [PATCH 16/31] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts Al Viro
@ 2022-06-19  3:56           ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-19  3:56 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sat, Jun 18, 2022 at 06:35:23AM +0100, Al Viro wrote:
> -		res = get_user_pages_fast(addr, n, gup_flags, pages);
> +		if (*pages) {

		if (!*pages) {

obviously; transient - it goes away in 22/31.


* Re: [PATCH 30/31] pipe_get_pages(): switch to append_pipe()
  2022-06-18  5:35         ` [PATCH 30/31] pipe_get_pages(): switch to append_pipe() Al Viro
@ 2022-06-19  4:01           ` Al Viro
  2022-06-19  4:09             ` Al Viro
  0 siblings, 1 reply; 93+ messages in thread
From: Al Viro @ 2022-06-19  4:01 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sat, Jun 18, 2022 at 06:35:37AM +0100, Al Viro wrote:

>  		get_page(*p++ = page);
> -		left -= size;

Argh...
		if (left <= PAGE_SIZE - off)
			return maxsize;
> +		left -= PAGE_SIZE - off;
>  	}
>  	if (!npages)
>  		return -EFAULT;
> -	maxsize -= left;
> -	iov_iter_advance(i, maxsize);
> -	return maxsize;
> +	return maxsize - left;
>  }
>  
>  static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
> -- 
> 2.30.2
> 


* Re: [PATCH 30/31] pipe_get_pages(): switch to append_pipe()
  2022-06-19  4:01           ` Al Viro
@ 2022-06-19  4:09             ` Al Viro
  0 siblings, 0 replies; 93+ messages in thread
From: Al Viro @ 2022-06-19  4:09 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Christoph Hellwig, Jens Axboe, Matthew Wilcox

On Sun, Jun 19, 2022 at 05:01:29AM +0100, Al Viro wrote:
> On Sat, Jun 18, 2022 at 06:35:37AM +0100, Al Viro wrote:
> 
> >  		get_page(*p++ = page);
> > -		left -= size;
> 
> Argh...
> 		if (left <= PAGE_SIZE - off)
> 			return maxsize;
> > +		left -= PAGE_SIZE - off;
> >  	}
> >  	if (!npages)
> >  		return -EFAULT;
> > -	maxsize -= left;
> > -	iov_iter_advance(i, maxsize);
> > -	return maxsize;
> > +	return maxsize - left;
> >  }
> >  
> >  static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
> > -- 
> > 2.30.2
> > 

Might be better to have it return npages and let the caller do the usual
min(maxsize, npages * PAGE_SIZE - offset) song and dance...  Not sure.

Anyway, with these fixes it seems to survive xfstests and ltp without regressions
compared to mainline.  Updated branch force-pushed...


end of thread, other threads:[~2022-06-19  4:12 UTC | newest]

Thread overview: 93+ messages
2022-06-07  4:08 [RFC][PATCHES] iov_iter stuff Al Viro
2022-06-07  4:09 ` [PATCH 1/9] No need of likely/unlikely on calls of check_copy_size() Al Viro
2022-06-07  4:41   ` Christoph Hellwig
2022-06-07 11:49   ` Christian Brauner
2022-06-07  4:09 ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Al Viro
2022-06-07  4:42   ` Christoph Hellwig
2022-06-07 16:06     ` Al Viro
2022-06-07 23:27       ` Al Viro
2022-06-07 23:31         ` [PATCH 01/10] No need of likely/unlikely on calls of check_copy_size() Al Viro
2022-06-07 23:31           ` [PATCH 02/10] teach iomap_dio_rw() to suppress dsync Al Viro
2022-06-08  6:18             ` Christoph Hellwig
2022-06-08 15:17             ` Darrick J. Wong
2022-06-10 11:38             ` Christian Brauner
2022-06-07 23:31           ` [PATCH 03/10] btrfs: use IOMAP_DIO_NOSYNC Al Viro
2022-06-08  6:18             ` Christoph Hellwig
2022-06-10 11:09             ` Christian Brauner
2022-06-07 23:31           ` [PATCH 04/10] struct file: use anonymous union member for rcuhead and llist Al Viro
2022-06-07 23:31           ` [PATCH 05/10] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
2022-06-10 11:41             ` Christian Brauner
2022-06-07 23:31           ` [PATCH 06/10] keep iocb_flags() result cached in struct file Al Viro
2022-06-09  0:35             ` Dave Chinner
2022-06-10 11:43             ` Christian Brauner
2022-06-07 23:31           ` [PATCH 07/10] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
2022-06-07 23:31           ` [PATCH 08/10] new iov_iter flavour - ITER_UBUF Al Viro
2022-06-07 23:31           ` [PATCH 09/10] switch new_sync_{read,write}() to ITER_UBUF Al Viro
2022-06-10 11:11             ` Christian Brauner
2022-06-07 23:31           ` [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
2022-06-08  6:16       ` [PATCH 2/9] btrfs_direct_write(): cleaner way to handle generic_write_sync() suppression Christoph Hellwig
2022-06-07 14:49   ` Matthew Wilcox
2022-06-07 20:17     ` Al Viro
2022-06-07  4:10 ` [PATCH 3/9] struct file: use anonymous union member for rcuhead and llist Al Viro
2022-06-07 10:18   ` Jan Kara
2022-06-07 11:46   ` Christian Brauner
2022-06-07  4:11 ` [PATCH 4/9] iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC Al Viro
2022-06-07 10:34   ` Jan Kara
2022-06-07 15:34     ` Al Viro
2022-06-07  4:11 ` [PATCH 5/9] keep iocb_flags() result cached in struct file Al Viro
2022-06-07  4:12 ` [PATCH 6/9] copy_page_{to,from}_iter(): switch iovec variants to generic Al Viro
2022-06-07  4:12 ` [PATCH 7/9] new iov_iter flavour - ITER_UBUF Al Viro
2022-06-07  4:13 ` [PATCH 8/9] switch new_sync_{read,write}() to ITER_UBUF Al Viro
2022-06-07  4:13 ` [PATCH 9/9] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
2022-06-08 19:28 ` [RFC][PATCHES] iov_iter stuff Sedat Dilek
2022-06-08 20:39   ` Al Viro
2022-06-09 19:10     ` Sedat Dilek
2022-06-09 19:22       ` Matthew Wilcox
2022-06-09 19:58         ` Matthew Wilcox
2022-06-09 19:45       ` Al Viro
2022-06-17 22:30 ` Jens Axboe
2022-06-17 22:48   ` Al Viro
2022-06-18  5:27     ` Al Viro
2022-06-18  5:35       ` [PATCH 01/31] splice: stop abusing iov_iter_advance() to flush a pipe Al Viro
2022-06-18  5:35         ` [PATCH 02/31] ITER_PIPE: helper for getting pipe buffer by index Al Viro
2022-06-18  5:35         ` [PATCH 03/31] ITER_PIPE: helpers for adding pipe buffers Al Viro
2022-06-18  5:35         ` [PATCH 04/31] ITER_PIPE: allocate buffers as we go in copy-to-pipe primitives Al Viro
2022-06-19  1:34           ` Al Viro
2022-06-18  5:35         ` [PATCH 05/31] ITER_PIPE: fold push_pipe() into __pipe_get_pages() Al Viro
2022-06-18  5:35         ` [PATCH 06/31] ITER_PIPE: lose iter_head argument of __pipe_get_pages() Al Viro
2022-06-18  5:35         ` [PATCH 07/31] ITER_PIPE: clean pipe_advance() up Al Viro
2022-06-18  5:35         ` [PATCH 08/31] ITER_PIPE: clean iov_iter_revert() Al Viro
2022-06-18  5:35         ` [PATCH 09/31] ITER_PIPE: cache the type of last buffer Al Viro
2022-06-18  5:35         ` [PATCH 10/10] iov_iter_bvec_advance(): don't bother with bvec_iter Al Viro
2022-06-18  5:35         ` [PATCH 10/31] ITER_PIPE: fold data_start() and pipe_space_for_user() together Al Viro
2022-06-19  2:25           ` Al Viro
2022-06-18  5:35         ` [PATCH 11/31] iov_iter_get_pages{,_alloc}(): cap the maxsize with LONG_MAX Al Viro
2022-06-18  5:35         ` [PATCH 12/31] iov_iter_get_pages_alloc(): lift freeing pages array on failure exits into wrapper Al Viro
2022-06-18  5:35         ` [PATCH 13/31] iov_iter_get_pages(): sanity-check arguments Al Viro
2022-06-19  3:07           ` Al Viro
2022-06-18  5:35         ` [PATCH 14/31] unify pipe_get_pages() and pipe_get_pages_alloc() Al Viro
2022-06-18  5:35         ` [PATCH 15/31] unify xarray_get_pages() and xarray_get_pages_alloc() Al Viro
2022-06-18  5:35         ` [PATCH 16/31] unify the rest of iov_iter_get_pages()/iov_iter_get_pages_alloc() guts Al Viro
2022-06-19  3:56           ` Al Viro
2022-06-18  5:35         ` [PATCH 17/31] ITER_XARRAY: don't open-code DIV_ROUND_UP() Al Viro
2022-06-18  5:35         ` [PATCH 18/31] iov_iter: lift dealing with maxpages out of first_{iovec,bvec}_segment() Al Viro
2022-06-18  5:35         ` [PATCH 19/31] iov_iter: massage calling conventions for first_{iovec,bvec}_segment() Al Viro
2022-06-18 11:13           ` Al Viro
2022-06-18 11:18             ` Al Viro
2022-06-18  5:35         ` [PATCH 20/31] found_iovec_segment(): just return address Al Viro
2022-06-18  5:35         ` [PATCH 21/31] fold __pipe_get_pages() into pipe_get_pages() Al Viro
2022-06-18  5:35         ` [PATCH 22/31] iov_iter: saner helper for page array allocation Al Viro
2022-06-18 11:14           ` Al Viro
2022-06-18  5:35         ` [PATCH 23/31] iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Al Viro
2022-06-18  5:35         ` [PATCH 24/31] block: convert to " Al Viro
2022-06-18  5:35         ` [PATCH 25/31] iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() Al Viro
2022-06-18  5:35         ` [PATCH 26/31] af_alg_make_sg(): " Al Viro
2022-06-18  5:35         ` [PATCH 27/31] 9p: convert to advancing variant of iov_iter_get_pages_alloc() Al Viro
2022-06-18  5:35         ` [PATCH 28/31] ceph: switch the last caller " Al Viro
2022-06-18  5:35         ` [PATCH 29/31] get rid of non-advancing variants Al Viro
2022-06-18 11:14           ` Al Viro
2022-06-18  5:35         ` [PATCH 30/31] pipe_get_pages(): switch to append_pipe() Al Viro
2022-06-19  4:01           ` Al Viro
2022-06-19  4:09             ` Al Viro
2022-06-18  5:35         ` [PATCH 31/31] expand those iov_iter_advance() Al Viro
2022-06-18 11:14           ` Al Viro
