linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter
@ 2023-06-30 15:25 David Howells
  2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm

Hi Jens, Christoph,

Here are some patches to switch from using the I/O direction indication in the
iov_iter struct to using the I/O direction flags to be found in the kiocb
struct, the iomap_iter struct and the request struct.  The iterator's I/O
direction is then only used in some internal checks.

The patches also add direction flags into iov_iter_extract_pages() so that it
can perform some checks.  New constants are defined rather than using READ and
WRITE so that a check can be made that one of them is specified.  The problem
with the READ constant is that it is zero and is thus the same as no direction
being specified - but if we're modifying the buffer contents (ie. reading into
it), we need to know to set FOLL_WRITE.  Granted this would be the default if
unspecified, but it seems better that this case should be explicit.

There are also patches to make 9P and SCSI use iov_iter_extract_pages().

I've pushed the patches here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=iov-extract

David

David Howells (11):
  iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc()
  vfs: Set IOCB_WRITE in iocbs that we're going to write from
  vfs: Use init_kiocb() to initialise new IOCBs
  iov_iter: Use IOCB_WRITE rather than iterator direction
  iov_iter: Use IOMAP_WRITE rather than iterator direction
  iov_iter: Use op_is_write() rather than iterator direction
  cifs: Drop the check using iov_iter_rw()
  iov_iter: Drop iov_iter_rw() and fold in last user
  iov_iter: Use I/O dir flags with iov_iter_extract_pages()
  9p: Pin pages rather than ref'ing if appropriate
  scsi: Use extract_iter_to_sg()

 block/bio.c                       |  6 ++
 block/blk-map.c                   |  5 +-
 block/fops.c                      |  8 +--
 crypto/af_alg.c                   |  5 +-
 crypto/algif_hash.c               |  3 +-
 drivers/block/loop.c              | 11 ++--
 drivers/nvme/target/io-cmd-file.c |  5 +-
 drivers/target/target_core_file.c |  2 +-
 drivers/vhost/scsi.c              | 79 ++++++++------------------
 fs/9p/vfs_addr.c                  |  2 +-
 fs/affs/file.c                    |  4 +-
 fs/aio.c                          |  9 ++-
 fs/btrfs/ioctl.c                  |  4 +-
 fs/cachefiles/io.c                | 10 ++--
 fs/ceph/file.c                    |  6 +-
 fs/dax.c                          |  6 +-
 fs/direct-io.c                    | 28 ++++++----
 fs/exfat/inode.c                  |  6 +-
 fs/ext2/inode.c                   |  2 +-
 fs/f2fs/file.c                    | 10 ++--
 fs/fat/inode.c                    |  4 +-
 fs/fuse/dax.c                     |  2 +-
 fs/fuse/file.c                    |  8 +--
 fs/hfs/inode.c                    |  2 +-
 fs/hfsplus/inode.c                |  2 +-
 fs/iomap/direct-io.c              |  4 +-
 fs/jfs/inode.c                    |  2 +-
 fs/nfs/direct.c                   |  2 +-
 fs/nilfs2/inode.c                 |  2 +-
 fs/ntfs3/inode.c                  |  2 +-
 fs/ocfs2/aops.c                   |  2 +-
 fs/orangefs/inode.c               |  2 +-
 fs/read_write.c                   | 10 ++--
 fs/reiserfs/inode.c               |  2 +-
 fs/seq_file.c                     |  2 +-
 fs/smb/client/smbdirect.c         |  9 ---
 fs/splice.c                       |  2 +-
 fs/udf/inode.c                    |  2 +-
 include/linux/bio.h               | 18 +++++-
 include/linux/fs.h                | 16 +++++-
 include/linux/mm_types.h          |  2 +-
 include/linux/uio.h               | 10 ++--
 io_uring/rw.c                     | 10 ++--
 lib/iov_iter.c                    | 14 ++++-
 lib/scatterlist.c                 | 12 +++-
 mm/filemap.c                      |  2 +-
 mm/page_io.c                      |  4 +-
 net/9p/trans_common.c             |  8 +--
 net/9p/trans_common.h             |  2 +-
 net/9p/trans_virtio.c             | 92 ++++++++++---------------------
 50 files changed, 221 insertions(+), 241 deletions(-)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc()
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-07-06 15:21   ` Christoph Hellwig
  2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

Fix references to iov_iter_get_pages/pages_alloc() in comments to refer to
the *2 interfaces instead.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/ceph/file.c           | 4 ++--
 include/linux/mm_types.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index b1925232dc08..3bb27b9ce751 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -75,7 +75,7 @@ static __le32 ceph_flags_sys2wire(u32 flags)
  */
 
 /*
- * How many pages to get in one call to iov_iter_get_pages().  This
+ * How many pages to get in one call to iov_iter_get_pages2().  This
  * determines the size of the on-stack array used as a buffer.
  */
 #define ITER_GET_BVECS_PAGES	64
@@ -115,7 +115,7 @@ static ssize_t __iter_get_bvecs(struct iov_iter *iter, size_t maxsize,
 }
 
 /*
- * iov_iter_get_pages() only considers one iov_iter segment, no matter
+ * iov_iter_get_pages2() only considers one iov_iter segment, no matter
  * what maxsize or maxpages are given.  For ITER_BVEC that is a single
  * page.
  *
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index de10fc797c8e..f49029c943b0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1249,7 +1249,7 @@ enum {
 	/*
 	 * FOLL_LONGTERM indicates that the page will be held for an indefinite
 	 * time period _often_ under userspace control.  This is in contrast to
-	 * iov_iter_get_pages(), whose usages are transient.
+	 * iov_iter_get_pages2(), whose usages are transient.
 	 */
 	FOLL_LONGTERM = 1 << 8,
 	/* split huge pmd before returning */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
  2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-07-06 15:22   ` Christoph Hellwig
  2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

IOCB_WRITE is set by aio, io_uring and cachefiles before submitting a write
operation to the VFS, but it isn't set by, say, the write() system call.

Fix this by adding an extra argument to init_sync_kiocb() to indicate the
direction and setting that to READ or WRITE, which will cause IOCB_WRITE to
be set as appropriate.

Whilst we're at it, rename init_sync_kiocb() to init_kiocb().

This will allow drivers to use IOCB_WRITE instead of the iterator data
source to determine the I/O direction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/btrfs/ioctl.c   |  4 ++--
 fs/read_write.c    | 10 +++++-----
 fs/seq_file.c      |  2 +-
 fs/splice.c        |  2 +-
 include/linux/fs.h |  6 +++++-
 mm/filemap.c       |  2 +-
 mm/page_io.c       |  4 ++--
 7 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..15870337dd26 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -4422,7 +4422,7 @@ static int btrfs_ioctl_encoded_read(struct file *file, void __user *argp,
 	if (ret < 0)
 		goto out_iov;
 
-	init_sync_kiocb(&kiocb, file);
+	init_kiocb(&kiocb, file, READ);
 	kiocb.ki_pos = pos;
 
 	ret = btrfs_encoded_read(&kiocb, &iter, &args);
@@ -4523,7 +4523,7 @@ static int btrfs_ioctl_encoded_write(struct file *file, void __user *argp, bool
 	if (ret < 0)
 		goto out_end_write;
 
-	init_sync_kiocb(&kiocb, file);
+	init_kiocb(&kiocb, file, WRITE);
 	ret = kiocb_set_rw_flags(&kiocb, 0);
 	if (ret)
 		goto out_end_write;
diff --git a/fs/read_write.c b/fs/read_write.c
index b07de77ef126..6fe517047095 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -382,7 +382,7 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 	struct iov_iter iter;
 	ssize_t ret;
 
-	init_sync_kiocb(&kiocb, filp);
+	init_kiocb(&kiocb, filp, READ);
 	kiocb.ki_pos = (ppos ? *ppos : 0);
 	iov_iter_ubuf(&iter, ITER_DEST, buf, len);
 
@@ -422,7 +422,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 	if (unlikely(!file->f_op->read_iter || file->f_op->read))
 		return warn_unsupported(file, "read");
 
-	init_sync_kiocb(&kiocb, file);
+	init_kiocb(&kiocb, file, READ);
 	kiocb.ki_pos = pos ? *pos : 0;
 	iov_iter_kvec(&iter, ITER_DEST, &iov, 1, iov.iov_len);
 	ret = file->f_op->read_iter(&kiocb, &iter);
@@ -484,7 +484,7 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 	struct iov_iter iter;
 	ssize_t ret;
 
-	init_sync_kiocb(&kiocb, filp);
+	init_kiocb(&kiocb, filp, WRITE);
 	kiocb.ki_pos = (ppos ? *ppos : 0);
 	iov_iter_ubuf(&iter, ITER_SOURCE, (void __user *)buf, len);
 
@@ -512,7 +512,7 @@ ssize_t __kernel_write_iter(struct file *file, struct iov_iter *from, loff_t *po
 	if (unlikely(!file->f_op->write_iter || file->f_op->write))
 		return warn_unsupported(file, "write");
 
-	init_sync_kiocb(&kiocb, file);
+	init_kiocb(&kiocb, file, WRITE);
 	kiocb.ki_pos = pos ? *pos : 0;
 	ret = file->f_op->write_iter(&kiocb, from);
 	if (ret > 0) {
@@ -723,7 +723,7 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 	struct kiocb kiocb;
 	ssize_t ret;
 
-	init_sync_kiocb(&kiocb, filp);
+	init_kiocb(&kiocb, filp, type);
 	ret = kiocb_set_rw_flags(&kiocb, flags);
 	if (ret)
 		return ret;
diff --git a/fs/seq_file.c b/fs/seq_file.c
index f5fdaf3b1572..1ee6ffc630da 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -155,7 +155,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
 	struct iov_iter iter;
 	ssize_t ret;
 
-	init_sync_kiocb(&kiocb, file);
+	init_kiocb(&kiocb, file, READ);
 	iov_iter_init(&iter, ITER_DEST, &iov, 1, size);
 
 	kiocb.ki_pos = *ppos;
diff --git a/fs/splice.c b/fs/splice.c
index 004eb1c4ce31..867357ebb2c3 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -362,7 +362,7 @@ ssize_t copy_splice_read(struct file *in, loff_t *ppos,
 
 	/* Do the I/O */
 	iov_iter_bvec(&to, ITER_DEST, bv, npages, len);
-	init_sync_kiocb(&kiocb, in);
+	init_kiocb(&kiocb, in, READ);
 	kiocb.ki_pos = *ppos;
 	ret = call_read_iter(in, &kiocb, &to);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d4b67bdeb53e..466eba253502 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2017,13 +2017,17 @@ static inline bool HAS_UNMAPPED_ID(struct mnt_idmap *idmap,
 	       !vfsgid_valid(i_gid_into_vfsgid(idmap, inode));
 }
 
-static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
+static inline void init_kiocb(struct kiocb *kiocb, struct file *filp,
+			      unsigned int rw)
 {
 	*kiocb = (struct kiocb) {
 		.ki_filp = filp,
 		.ki_flags = filp->f_iocb_flags,
 		.ki_ioprio = get_current_ioprio(),
 	};
+
+	if (rw == WRITE)
+		kiocb->ki_flags |= IOCB_WRITE;
 }
 
 static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
diff --git a/mm/filemap.c b/mm/filemap.c
index 9e44a49bbd74..cd763122d2a2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2905,7 +2905,7 @@ ssize_t filemap_splice_read(struct file *in, loff_t *ppos,
 	if (unlikely(*ppos >= in->f_mapping->host->i_sb->s_maxbytes))
 		return 0;
 
-	init_sync_kiocb(&iocb, in);
+	init_kiocb(&iocb, in, READ);
 	iocb.ki_pos = *ppos;
 
 	/* Work out how much data we can actually add into the pipe */
diff --git a/mm/page_io.c b/mm/page_io.c
index 684cd3c7b59b..85cbadaf7395 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -312,7 +312,7 @@ static void swap_writepage_fs(struct page *page, struct writeback_control *wbc)
 	}
 	if (!sio) {
 		sio = mempool_alloc(sio_pool, GFP_NOIO);
-		init_sync_kiocb(&sio->iocb, swap_file);
+		init_kiocb(&sio->iocb, swap_file, WRITE);
 		sio->iocb.ki_complete = sio_write_complete;
 		sio->iocb.ki_pos = pos;
 		sio->pages = 0;
@@ -443,7 +443,7 @@ static void swap_readpage_fs(struct page *page,
 	}
 	if (!sio) {
 		sio = mempool_alloc(sio_pool, GFP_KERNEL);
-		init_sync_kiocb(&sio->iocb, sis->swap_file);
+		init_kiocb(&sio->iocb, sis->swap_file, READ);
 		sio->iocb.ki_pos = pos;
 		sio->iocb.ki_complete = sio_read_complete;
 		sio->pages = 0;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
  2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
  2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-06-30 15:39   ` Jens Axboe
                     ` (2 more replies)
  2023-06-30 15:25 ` [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction David Howells
                   ` (7 subsequent siblings)
  10 siblings, 3 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

A number of places that generate kiocbs didn't use init_sync_kiocb() to
initialise the new kiocb.  Fix these to always use init_kiocb().

Note that aio and io_uring pass information in through ki_filp through an
overlaid union before I can call init_kiocb(), so that gets reinitialised.
I don't think it clobbers anything else.

After this point, IOCB_WRITE is only set by init_kiocb().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 drivers/block/loop.c              | 11 ++++++-----
 drivers/nvme/target/io-cmd-file.c |  5 +++--
 drivers/target/target_core_file.c |  2 +-
 fs/aio.c                          |  9 ++++-----
 fs/cachefiles/io.c                | 10 ++++------
 io_uring/rw.c                     | 10 +++++-----
 6 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 37511d2b2caf..ea92235c5ba2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -439,16 +439,17 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
 	}
 	atomic_set(&cmd->ref, 2);
 
-	iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
+	iov_iter_bvec(&iter, rw == WRITE ? ITER_SOURCE : ITER_DEST,
+		      bvec, nr_bvec, blk_rq_bytes(rq));
 	iter.iov_offset = offset;
 
+	init_kiocb(&cmd->iocb, file, rw);
 	cmd->iocb.ki_pos = pos;
-	cmd->iocb.ki_filp = file;
 	cmd->iocb.ki_complete = lo_rw_aio_complete;
 	cmd->iocb.ki_flags = IOCB_DIRECT;
 	cmd->iocb.ki_ioprio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0);
 
-	if (rw == ITER_SOURCE)
+	if (rw == WRITE)
 		ret = call_write_iter(file, &cmd->iocb, &iter);
 	else
 		ret = call_read_iter(file, &cmd->iocb, &iter);
@@ -490,12 +491,12 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
 		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
 	case REQ_OP_WRITE:
 		if (cmd->use_aio)
-			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
+			return lo_rw_aio(lo, cmd, pos, WRITE);
 		else
 			return lo_write_simple(lo, rq, pos);
 	case REQ_OP_READ:
 		if (cmd->use_aio)
-			return lo_rw_aio(lo, cmd, pos, ITER_DEST);
+			return lo_rw_aio(lo, cmd, pos, READ);
 		else
 			return lo_read_simple(lo, rq, pos);
 	default:
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 2d068439b129..0b6577d51b69 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -85,17 +85,18 @@ static ssize_t nvmet_file_submit_bvec(struct nvmet_req *req, loff_t pos,
 		if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
 			ki_flags |= IOCB_DSYNC;
 		call_iter = req->ns->file->f_op->write_iter;
+		init_kiocb(iocb, req->ns->file, WRITE);
 		rw = ITER_SOURCE;
 	} else {
 		call_iter = req->ns->file->f_op->read_iter;
+		init_kiocb(iocb, req->ns->file, READ);
 		rw = ITER_DEST;
 	}
 
 	iov_iter_bvec(&iter, rw, req->f.bvec, nr_segs, count);
 
+	iocb->ki_flags |= ki_flags;
 	iocb->ki_pos = pos;
-	iocb->ki_filp = req->ns->file;
-	iocb->ki_flags = ki_flags | iocb->ki_filp->f_iocb_flags;
 
 	return call_iter(iocb, &iter);
 }
diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c
index ce0e000b74fc..d70cf89959dc 100644
--- a/drivers/target/target_core_file.c
+++ b/drivers/target/target_core_file.c
@@ -287,11 +287,11 @@ fd_execute_rw_aio(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents,
 	}
 
 	iov_iter_bvec(&iter, is_write, aio_cmd->bvecs, sgl_nents, len);
+	init_kiocb(&aio_cmd->iocb, file, is_write);
 
 	aio_cmd->cmd = cmd;
 	aio_cmd->len = len;
 	aio_cmd->iocb.ki_pos = cmd->t_task_lba * dev->dev_attrib.block_size;
-	aio_cmd->iocb.ki_filp = file;
 	aio_cmd->iocb.ki_complete = cmd_rw_aio_complete;
 	aio_cmd->iocb.ki_flags = IOCB_DIRECT;
 
diff --git a/fs/aio.c b/fs/aio.c
index 77e33619de40..26e173be9448 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1461,14 +1461,14 @@ static void aio_complete_rw(struct kiocb *kiocb, long res)
 	iocb_put(iocb);
 }
 
-static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
+static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb, int rw)
 {
 	int ret;
 
+	init_kiocb(req, req->ki_filp, rw);
 	req->ki_complete = aio_complete_rw;
 	req->private = NULL;
 	req->ki_pos = iocb->aio_offset;
-	req->ki_flags = req->ki_filp->f_iocb_flags;
 	if (iocb->aio_flags & IOCB_FLAG_RESFD)
 		req->ki_flags |= IOCB_EVENTFD;
 	if (iocb->aio_flags & IOCB_FLAG_IOPRIO) {
@@ -1539,7 +1539,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb,
 	struct file *file;
 	int ret;
 
-	ret = aio_prep_rw(req, iocb);
+	ret = aio_prep_rw(req, iocb, READ);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -1566,7 +1566,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
 	struct file *file;
 	int ret;
 
-	ret = aio_prep_rw(req, iocb);
+	ret = aio_prep_rw(req, iocb, WRITE);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -1592,7 +1592,6 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb,
 			sb_start_write(file_inode(file)->i_sb);
 			__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 		}
-		req->ki_flags |= IOCB_WRITE;
 		aio_rw_done(req, call_write_iter(file, req, &iter));
 	}
 	kfree(iovec);
diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c
index 175a25fcade8..2c47788f38d2 100644
--- a/fs/cachefiles/io.c
+++ b/fs/cachefiles/io.c
@@ -134,11 +134,10 @@ static int cachefiles_read(struct netfs_cache_resources *cres,
 	if (!ki)
 		goto presubmission_error;
 
+	init_kiocb(&ki->iocb, file, READ);
 	refcount_set(&ki->ki_refcnt, 2);
-	ki->iocb.ki_filp	= file;
+	ki->iocb.ki_flags	|= IOCB_DIRECT;
 	ki->iocb.ki_pos		= start_pos + skipped;
-	ki->iocb.ki_flags	= IOCB_DIRECT;
-	ki->iocb.ki_ioprio	= get_current_ioprio();
 	ki->skipped		= skipped;
 	ki->object		= object;
 	ki->inval_counter	= cres->inval_counter;
@@ -306,10 +305,9 @@ int __cachefiles_write(struct cachefiles_object *object,
 	}
 
 	refcount_set(&ki->ki_refcnt, 2);
-	ki->iocb.ki_filp	= file;
+	init_kiocb(&ki->iocb, file, WRITE);
 	ki->iocb.ki_pos		= start_pos;
-	ki->iocb.ki_flags	= IOCB_DIRECT | IOCB_WRITE;
-	ki->iocb.ki_ioprio	= get_current_ioprio();
+	ki->iocb.ki_flags	|= IOCB_DIRECT;
 	ki->object		= object;
 	ki->start		= start_pos;
 	ki->len			= len;
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..1cade1567162 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -655,12 +655,13 @@ static bool need_complete_io(struct io_kiocb *req)
 		S_ISBLK(file_inode(req->file)->i_mode);
 }
 
-static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
+static int io_rw_init_file(struct io_kiocb *req, unsigned int io_direction)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 	struct kiocb *kiocb = &rw->kiocb;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct file *file = req->file;
+	fmode_t mode = (io_direction == WRITE) ? FMODE_WRITE : FMODE_READ;
 	int ret;
 
 	if (unlikely(!file || !(file->f_mode & mode)))
@@ -669,7 +670,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	if (!(req->flags & REQ_F_FIXED_FILE))
 		req->flags |= io_file_get_flags(file);
 
-	kiocb->ki_flags = file->f_iocb_flags;
+	init_kiocb(kiocb, file, io_direction);
 	ret = kiocb_set_rw_flags(kiocb, rw->flags);
 	if (unlikely(ret))
 		return ret;
@@ -738,7 +739,7 @@ int io_read(struct io_kiocb *req, unsigned int issue_flags)
 		iov_iter_restore(&s->iter, &s->iter_state);
 		iovec = NULL;
 	}
-	ret = io_rw_init_file(req, FMODE_READ);
+	ret = io_rw_init_file(req, READ);
 	if (unlikely(ret)) {
 		kfree(iovec);
 		return ret;
@@ -870,7 +871,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		iov_iter_restore(&s->iter, &s->iter_state);
 		iovec = NULL;
 	}
-	ret = io_rw_init_file(req, FMODE_WRITE);
+	ret = io_rw_init_file(req, WRITE);
 	if (unlikely(ret)) {
 		kfree(iovec);
 		return ret;
@@ -914,7 +915,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		__sb_writers_release(file_inode(req->file)->i_sb,
 					SB_FREEZE_WRITE);
 	}
-	kiocb->ki_flags |= IOCB_WRITE;
 
 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (2 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

If a kiocb is available, use the IOCB_WRITE flag instead of the iterator
direction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 block/fops.c         |  8 ++++----
 fs/9p/vfs_addr.c     |  2 +-
 fs/affs/file.c       |  4 ++--
 fs/ceph/file.c       |  2 +-
 fs/dax.c             |  2 +-
 fs/direct-io.c       | 22 +++++++++++-----------
 fs/exfat/inode.c     |  6 +++---
 fs/ext2/inode.c      |  2 +-
 fs/f2fs/file.c       | 10 +++++-----
 fs/fat/inode.c       |  4 ++--
 fs/fuse/dax.c        |  2 +-
 fs/fuse/file.c       |  8 ++++----
 fs/hfs/inode.c       |  2 +-
 fs/hfsplus/inode.c   |  2 +-
 fs/iomap/direct-io.c |  2 +-
 fs/jfs/inode.c       |  2 +-
 fs/nfs/direct.c      |  2 +-
 fs/nilfs2/inode.c    |  2 +-
 fs/ntfs3/inode.c     |  2 +-
 fs/ocfs2/aops.c      |  2 +-
 fs/orangefs/inode.c  |  2 +-
 fs/reiserfs/inode.c  |  2 +-
 fs/udf/inode.c       |  2 +-
 include/linux/fs.h   | 10 ++++++++++
 24 files changed, 57 insertions(+), 47 deletions(-)

diff --git a/block/fops.c b/block/fops.c
index a286bf3325c5..e70a5ef4d447 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,7 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 			return -ENOMEM;
 	}
 
-	if (iov_iter_rw(iter) == READ) {
+	if (iocb_is_read(iocb)) {
 		bio_init(&bio, bdev, vecs, nr_pages, REQ_OP_READ);
 		if (user_backed_iter(iter))
 			should_dirty = true;
@@ -88,7 +88,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 		goto out;
 	ret = bio.bi_iter.bi_size;
 
-	if (iov_iter_rw(iter) == WRITE)
+	if (iocb_is_write(iocb))
 		task_io_account_write(ret);
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
@@ -174,7 +174,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	struct blk_plug plug;
 	struct blkdev_dio *dio;
 	struct bio *bio;
-	bool is_read = (iov_iter_rw(iter) == READ), is_sync;
+	bool is_read = iocb_is_read(iocb), is_sync;
 	blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
 	loff_t pos = iocb->ki_pos;
 	int ret = 0;
@@ -311,7 +311,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 					unsigned int nr_pages)
 {
 	struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
-	bool is_read = iov_iter_rw(iter) == READ;
+	bool is_read = iocb_is_read(iocb);
 	blk_opf_t opf = is_read ? REQ_OP_READ : dio_bio_write_op(iocb);
 	struct blkdev_dio *dio;
 	struct bio *bio;
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 8a635999a7d6..18bce4f8a158 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -257,7 +257,7 @@ v9fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	ssize_t n;
 	int err = 0;
 
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		n = p9_client_write(file->private_data, pos, iter, &err);
 		if (n) {
 			struct inode *inode = file_inode(file);
diff --git a/fs/affs/file.c b/fs/affs/file.c
index e43f2f007ac1..ad70ff27e5f2 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -400,7 +400,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	loff_t offset = iocb->ki_pos;
 	ssize_t ret;
 
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		loff_t size = offset + count;
 
 		if (AFFS_I(inode)->mmu_private < size)
@@ -408,7 +408,7 @@ affs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	}
 
 	ret = blockdev_direct_IO(iocb, inode, iter, affs_get_block);
-	if (ret < 0 && iov_iter_rw(iter) == WRITE)
+	if (ret < 0 && iocb_is_write(iocb))
 		affs_write_failed(mapping, offset + count);
 	return ret;
 }
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 3bb27b9ce751..453429e9b9e6 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1280,7 +1280,7 @@ ceph_direct_read_write(struct kiocb *iocb, struct iov_iter *iter,
 	struct timespec64 mtime = current_time(inode);
 	size_t count = iov_iter_count(iter);
 	loff_t pos = iocb->ki_pos;
-	bool write = iov_iter_rw(iter) == WRITE;
+	bool write = iocb_is_write(iocb);
 	bool should_dirty = !write && user_backed_iter(iter);
 
 	if (write && ceph_snap(file_inode(file)) != CEPH_NOSNAP)
diff --git a/fs/dax.c b/fs/dax.c
index 2ababb89918d..5c6ebe64a3fd 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1573,7 +1573,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (!iomi.len)
 		return 0;
 
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		lockdep_assert_held_write(&iomi.inode->i_rwsem);
 		iomi.flags |= IOMAP_WRITE;
 	} else {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 7bc494ee56b9..2b517cc5b32d 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1125,7 +1125,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 */
 
 	/* watch out for a 0 len io from a tricksy fs */
-	if (iov_iter_rw(iter) == READ && !count)
+	if (iocb_is_read(iocb) && !count)
 		return 0;
 
 	dio = kmem_cache_alloc(dio_cache, GFP_KERNEL);
@@ -1139,7 +1139,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	memset(dio, 0, offsetof(struct dio, pages));
 
 	dio->flags = flags;
-	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+	if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) {
 		/* will be released by direct_io_worker */
 		inode_lock(inode);
 	}
@@ -1147,7 +1147,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 
 	/* Once we sampled i_size check for reads beyond EOF */
 	dio->i_size = i_size_read(inode);
-	if (iov_iter_rw(iter) == READ && offset >= dio->i_size) {
+	if (iocb_is_read(iocb) && offset >= dio->i_size) {
 		retval = 0;
 		goto fail_dio;
 	}
@@ -1160,7 +1160,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 			goto fail_dio;
 	}
 
-	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
+	if (dio->flags & DIO_LOCKING && iocb_is_read(iocb)) {
 		struct address_space *mapping = iocb->ki_filp->f_mapping;
 
 		retval = filemap_write_and_wait_range(mapping, offset, end - 1);
@@ -1176,13 +1176,13 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 */
 	if (is_sync_kiocb(iocb))
 		dio->is_async = false;
-	else if (iov_iter_rw(iter) == WRITE && end > i_size_read(inode))
+	else if (iocb_is_write(iocb) && end > i_size_read(inode))
 		dio->is_async = false;
 	else
 		dio->is_async = true;
 
 	dio->inode = inode;
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		dio->opf = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
 		if (iocb->ki_flags & IOCB_NOWAIT)
 			dio->opf |= REQ_NOWAIT;
@@ -1194,7 +1194,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
 	 * so that we can call ->fsync.
 	 */
-	if (dio->is_async && iov_iter_rw(iter) == WRITE) {
+	if (dio->is_async && iocb_is_write(iocb)) {
 		retval = 0;
 		if (iocb_is_dsync(iocb))
 			retval = dio_set_defer_completion(dio);
@@ -1230,7 +1230,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	spin_lock_init(&dio->bio_lock);
 	dio->refcount = 1;
 
-	dio->should_dirty = user_backed_iter(iter) && iov_iter_rw(iter) == READ;
+	dio->should_dirty = user_backed_iter(iter) && iocb_is_read(iocb);
 	sdio.iter = iter;
 	sdio.final_block_in_request = end >> blkbits;
 
@@ -1287,7 +1287,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 * we can let i_mutex go now that its achieved its purpose
 	 * of protecting us from looking up uninitialized blocks.
 	 */
-	if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
+	if (iocb_is_read(iocb) && (dio->flags & DIO_LOCKING))
 		inode_unlock(dio->inode);
 
 	/*
@@ -1299,7 +1299,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 */
 	BUG_ON(retval == -EIOCBQUEUED);
 	if (dio->is_async && retval == 0 && dio->result &&
-	    (iov_iter_rw(iter) == READ || dio->result == count))
+	    (iocb_is_read(iocb) || dio->result == count))
 		retval = -EIOCBQUEUED;
 	else
 		dio_await_completion(dio);
@@ -1312,7 +1312,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	return retval;
 
 fail_dio:
-	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
+	if (dio->flags & DIO_LOCKING && iocb_is_read(iocb))
 		inode_unlock(inode);
 
 	kmem_cache_free(dio_cache, dio);
diff --git a/fs/exfat/inode.c b/fs/exfat/inode.c
index 481dd338f2b8..706db9cd1b4e 100644
--- a/fs/exfat/inode.c
+++ b/fs/exfat/inode.c
@@ -411,10 +411,10 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct address_space *mapping = iocb->ki_filp->f_mapping;
 	struct inode *inode = mapping->host;
 	loff_t size = iocb->ki_pos + iov_iter_count(iter);
-	int rw = iov_iter_rw(iter);
+	bool writing = iocb_is_write(iocb);
 	ssize_t ret;
 
-	if (rw == WRITE) {
+	if (writing) {
 		/*
 		 * FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
 		 * so we need to update the ->i_size_aligned to block boundary.
@@ -433,7 +433,7 @@ static ssize_t exfat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * condition of exfat_get_block() and ->truncate().
 	 */
 	ret = blockdev_direct_IO(iocb, inode, iter, exfat_get_block);
-	if (ret < 0 && (rw & WRITE))
+	if (ret < 0 && writing)
 		exfat_write_failed(mapping, size);
 	return ret;
 }
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 26f135e7ffce..b239c16ab7ee 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -919,7 +919,7 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	ssize_t ret;
 
 	ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
-	if (ret < 0 && iov_iter_rw(iter) == WRITE)
+	if (ret < 0 && iocb_is_write(iocb))
 		ext2_write_failed(mapping, offset + count);
 	return ret;
 }
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2435111a8532..7ef596b901d9 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -805,7 +805,7 @@ int f2fs_truncate(struct inode *inode)
 	return 0;
 }
 
-static bool f2fs_force_buffered_io(struct inode *inode, int rw)
+static bool f2fs_force_buffered_io(struct inode *inode, bool writing)
 {
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 
@@ -823,9 +823,9 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
 	 * for blkzoned device, fallback direct IO to buffered IO, so
 	 * all IOs can be serialized by log-structured write.
 	 */
-	if (f2fs_sb_has_blkzoned(sbi) && (rw == WRITE))
+	if (f2fs_sb_has_blkzoned(sbi) && writing)
 		return true;
-	if (f2fs_lfs_mode(sbi) && rw == WRITE && F2FS_IO_ALIGNED(sbi))
+	if (f2fs_lfs_mode(sbi) && writing && F2FS_IO_ALIGNED(sbi))
 		return true;
 	if (is_sbi_flag_set(sbi, SBI_CP_DISABLED))
 		return true;
@@ -861,7 +861,7 @@ int f2fs_getattr(struct mnt_idmap *idmap, const struct path *path,
 		unsigned int bsize = i_blocksize(inode);
 
 		stat->result_mask |= STATX_DIOALIGN;
-		if (!f2fs_force_buffered_io(inode, WRITE)) {
+		if (!f2fs_force_buffered_io(inode, true)) {
 			stat->dio_mem_align = bsize;
 			stat->dio_offset_align = bsize;
 		}
@@ -4280,7 +4280,7 @@ static bool f2fs_should_use_dio(struct inode *inode, struct kiocb *iocb,
 	if (!(iocb->ki_flags & IOCB_DIRECT))
 		return false;
 
-	if (f2fs_force_buffered_io(inode, iov_iter_rw(iter)))
+	if (f2fs_force_buffered_io(inode, iocb_is_write(iocb)))
 		return false;
 
 	/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index d99b8549ec8f..237e20891df2 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -261,7 +261,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	loff_t offset = iocb->ki_pos;
 	ssize_t ret;
 
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		/*
 		 * FIXME: blockdev_direct_IO() doesn't use ->write_begin(),
 		 * so we need to update the ->mmu_private to block boundary.
@@ -281,7 +281,7 @@ static ssize_t fat_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * condition of fat_get_block() and ->truncate().
 	 */
 	ret = blockdev_direct_IO(iocb, inode, iter, fat_get_block);
-	if (ret < 0 && iov_iter_rw(iter) == WRITE)
+	if (ret < 0 && iocb_is_write(iocb))
 		fat_write_failed(mapping, offset + count);
 
 	return ret;
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 8e74f278a3f6..7bfbe1783462 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -720,7 +720,7 @@ static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 
-	return (iov_iter_rw(from) == WRITE &&
+	return (iocb_is_write(iocb) &&
 		((iocb->ki_pos) >= i_size_read(inode) ||
 		  (iocb->ki_pos + iov_iter_count(from) > i_size_read(inode))));
 }
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index bc4115288eec..f43a39faa8e4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2911,7 +2911,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	inode = file->f_mapping->host;
 	i_size = i_size_read(inode);
 
-	if ((iov_iter_rw(iter) == READ) && (offset >= i_size))
+	if (iocb_is_read(iocb) && (offset >= i_size))
 		return 0;
 
 	io = kmalloc(sizeof(struct fuse_io_priv), GFP_KERNEL);
@@ -2923,7 +2923,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	io->bytes = -1;
 	io->size = 0;
 	io->offset = offset;
-	io->write = (iov_iter_rw(iter) == WRITE);
+	io->write = iocb_is_write(iocb);
 	io->err = 0;
 	/*
 	 * By default, we want to optimize all I/Os with async request
@@ -2956,7 +2956,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 		io->done = &wait;
 	}
 
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		ret = fuse_direct_io(io, iter, &pos, FUSE_DIO_WRITE);
 		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
 	} else {
@@ -2979,7 +2979,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 	kref_put(&io->refcnt, fuse_io_release);
 
-	if (iov_iter_rw(iter) == WRITE) {
+	if (iocb_is_write(iocb)) {
 		fuse_write_update_attr(inode, pos, ret);
 		/* For extending writes we already hold exclusive lock */
 		if (ret < 0 && offset + count > i_size)
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index 441d7fc952e3..d95ac1f559b5 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -141,7 +141,7 @@ static ssize_t hfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * In case of error extending write may have instantiated a few
 	 * blocks outside i_size. Trim these off again.
 	 */
-	if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+	if (unlikely(iocb_is_write(iocb) && ret < 0)) {
 		loff_t isize = i_size_read(inode);
 		loff_t end = iocb->ki_pos + count;
 
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index 7d1a675e037d..9851a6cb9e3b 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -138,7 +138,7 @@ static ssize_t hfsplus_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * In case of error extending write may have instantiated a few
 	 * blocks outside i_size. Trim these off again.
 	 */
-	if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+	if (unlikely(iocb_is_write(iocb) && ret < 0)) {
 		loff_t isize = i_size_read(inode);
 		loff_t end = iocb->ki_pos + count;
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..a56099470185 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -510,7 +510,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		iomi.flags |= IOMAP_NOWAIT;
 
-	if (iov_iter_rw(iter) == READ) {
+	if (iocb_is_read(iocb)) {
 		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;
 
diff --git a/fs/jfs/inode.c b/fs/jfs/inode.c
index 8ac10e396050..0d1f94ac9488 100644
--- a/fs/jfs/inode.c
+++ b/fs/jfs/inode.c
@@ -334,7 +334,7 @@ static ssize_t jfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * In case of error extending write may have instantiated a few
 	 * blocks outside i_size. Trim these off again.
 	 */
-	if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+	if (unlikely(iocb_is_write(iocb) && ret < 0)) {
 		loff_t isize = i_size_read(inode);
 		loff_t end = iocb->ki_pos + count;
 
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 9a18c5a69ace..5f4522dd05b5 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -133,7 +133,7 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 
 	VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
 
-	if (iov_iter_rw(iter) == READ)
+	if (iocb_is_read(iocb))
 		ret = nfs_file_direct_read(iocb, iter, true);
 	else
 		ret = nfs_file_direct_write(iocb, iter, true);
diff --git a/fs/nilfs2/inode.c b/fs/nilfs2/inode.c
index a8ce522ac747..93c61fcb5e91 100644
--- a/fs/nilfs2/inode.c
+++ b/fs/nilfs2/inode.c
@@ -289,7 +289,7 @@ nilfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 
-	if (iov_iter_rw(iter) == WRITE)
+	if (iocb_is_write(iocb))
 		return 0;
 
 	/* Needs synchronization with the cleaner */
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index 6c560245eef4..5b50c4653ff8 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -774,7 +774,7 @@ static ssize_t ntfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	struct ntfs_inode *ni = ntfs_i(inode);
 	loff_t vbo = iocb->ki_pos;
 	loff_t end;
-	int wr = iov_iter_rw(iter) & WRITE;
+	bool wr = iocb_is_write(iocb);
 	size_t iter_count = iov_iter_count(iter);
 	loff_t valid;
 	ssize_t ret;
diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 8dfc284e85f0..e8afc0c13f71 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -2456,7 +2456,7 @@ static ssize_t ocfs2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	    !ocfs2_supports_append_dio(osb))
 		return 0;
 
-	if (iov_iter_rw(iter) == READ)
+	if (iocb_is_read(iocb))
 		get_block = ocfs2_lock_get_block;
 	else
 		get_block = ocfs2_dio_wr_get_block;
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 9014bbcc8031..120a9a6c0f3b 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -513,7 +513,7 @@ static ssize_t orangefs_direct_IO(struct kiocb *iocb,
 	 */
 	struct file *file = iocb->ki_filp;
 	loff_t pos = iocb->ki_pos;
-	enum ORANGEFS_io_type type = iov_iter_rw(iter) == WRITE ?
+	enum ORANGEFS_io_type type = iocb_is_write(iocb) ?
             ORANGEFS_IO_WRITE : ORANGEFS_IO_READ;
 	loff_t *offset = &pos;
 	struct inode *inode = file->f_mapping->host;
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 77bd3b27059f..ca57930178f4 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -3248,7 +3248,7 @@ static ssize_t reiserfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	 * In case of error extending write may have instantiated a few
 	 * blocks outside i_size. Trim these off again.
 	 */
-	if (unlikely(iov_iter_rw(iter) == WRITE && ret < 0)) {
+	if (unlikely(iocb_is_write(iocb) && ret < 0)) {
 		loff_t isize = i_size_read(inode);
 		loff_t end = iocb->ki_pos + count;
 
diff --git a/fs/udf/inode.c b/fs/udf/inode.c
index 1e71e04ae8f6..d49c12c1f4d9 100644
--- a/fs/udf/inode.c
+++ b/fs/udf/inode.c
@@ -311,7 +311,7 @@ static ssize_t udf_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (UDF_I(inode)->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB)
 		return 0;
 	ret = blockdev_direct_IO(iocb, inode, iter, udf_get_block);
-	if (unlikely(ret < 0 && iov_iter_rw(iter) == WRITE))
+	if (unlikely(ret < 0 && iocb_is_write(iocb)))
 		udf_write_failed(mapping, iocb->ki_pos + count);
 	return ret;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 466eba253502..e344db819498 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -368,6 +368,16 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb)
 	return kiocb->ki_complete == NULL;
 }
 
+static inline bool iocb_is_write(const struct kiocb *kiocb)
+{
+	return kiocb->ki_flags & IOCB_WRITE;
+}
+
+static inline bool iocb_is_read(const struct kiocb *kiocb)
+{
+	return !iocb_is_write(kiocb);
+}
+
 struct address_space_operations {
 	int (*writepage)(struct page *page, struct writeback_control *wbc);
 	int (*read_folio)(struct file *, struct folio *);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE rather than iterator direction
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (3 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-07-06 15:30   ` Christoph Hellwig
  2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

If an iomap_iter is available, use the IOMAP_WRITE flag instead of the
iterator direction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/dax.c             | 4 ++--
 fs/iomap/direct-io.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5c6ebe64a3fd..d49480675fc0 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1438,7 +1438,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 	loff_t pos = iomi->pos;
 	struct dax_device *dax_dev = iomap->dax_dev;
 	loff_t end = pos + length, done = 0;
-	bool write = iov_iter_rw(iter) == WRITE;
+	bool write = iomi->flags & IOMAP_WRITE;
 	bool cow = write && iomap->flags & IOMAP_F_SHARED;
 	ssize_t ret = 0;
 	size_t xfer;
@@ -1498,7 +1498,7 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
 
 		map_len = dax_direct_access(dax_dev, pgoff, PHYS_PFN(size),
 				DAX_ACCESS, &kaddr, NULL);
-		if (map_len == -EIO && iov_iter_rw(iter) == WRITE) {
+		if (map_len == -EIO && write) {
 			map_len = dax_direct_access(dax_dev, pgoff,
 					PHYS_PFN(size), DAX_RECOVERY_WRITE,
 					&kaddr, NULL);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index a56099470185..3095091af680 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -587,7 +587,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	 * Revert iter to a state corresponding to that as some callers (such
 	 * as the splice code) rely on it.
 	 */
-	if (iov_iter_rw(iter) == READ && iomi.pos >= dio->i_size)
+	if (!(iomi.flags & IOMAP_WRITE) && iomi.pos >= dio->i_size)
 		iov_iter_revert(iter, iomi.pos - dio->i_size);
 
 	if (ret == -EFAULT && dio->size && (dio_flags & IOMAP_DIO_PARTIAL)) {


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 06/11] iov_iter: Use op_is_write() rather than iterator direction
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (4 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-07-06 15:30   ` Christoph Hellwig
  2023-06-30 15:25 ` [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw() David Howells
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

If a request struct is available, use op_is_write() instead of the iterator
direction.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 block/blk-map.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-map.c b/block/blk-map.c
index 44d74a30ddac..f8fe114ae433 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -205,7 +205,7 @@ static int bio_copy_user_iov(struct request *rq, struct rq_map_data *map_data,
 	/*
 	 * success
 	 */
-	if ((iov_iter_rw(iter) == WRITE &&
+	if ((op_is_write(rq->cmd_flags) &&
 	     (!map_data || !map_data->null_mapped)) ||
 	    (map_data && map_data->from_user)) {
 		ret = bio_copy_from_iter(bio, iter);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw()
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (5 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Steve French, Paulo Alcantara,
	Ronnie Sahlberg, Shyam Prasad N, Tom Talpey, Christoph Hellwig,
	Christian Brauner, linux-cifs

smbd_recv() has a check that the iterator is the correct direction.  Drop
this check as we're getting rid of iov_iter_rw().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Paulo Alcantara <pc@cjr.nz>
cc: Ronnie Sahlberg <lsahlber@redhat.com>
cc: Shyam Prasad N <sprasad@microsoft.com>
cc: Tom Talpey <tom@talpey.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-cifs@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 fs/smb/client/smbdirect.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/fs/smb/client/smbdirect.c b/fs/smb/client/smbdirect.c
index 223e17c16b60..672078d00207 100644
--- a/fs/smb/client/smbdirect.c
+++ b/fs/smb/client/smbdirect.c
@@ -1906,14 +1906,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 	unsigned int to_read, page_offset;
 	int rc;
 
-	if (iov_iter_rw(&msg->msg_iter) == WRITE) {
-		/* It's a bug in upper layer to get there */
-		cifs_dbg(VFS, "Invalid msg iter dir %u\n",
-			 iov_iter_rw(&msg->msg_iter));
-		rc = -EINVAL;
-		goto out;
-	}
-
 	switch (iov_iter_type(&msg->msg_iter)) {
 	case ITER_KVEC:
 		buf = msg->msg_iter.kvec->iov_base;
@@ -1935,7 +1927,6 @@ int smbd_recv(struct smbd_connection *info, struct msghdr *msg)
 		rc = -EINVAL;
 	}
 
-out:
 	/* SMBDirect will read it all or nothing */
 	if (rc > 0)
 		msg->msg_iter.count = 0;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (6 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw() David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-07-06 15:31   ` Christoph Hellwig
  2023-06-30 15:25 ` [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages() David Howells
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

Drop the last usage of iov_iter_rw() into __iov_iter_get_pages_alloc().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
---
 include/linux/uio.h | 5 -----
 lib/iov_iter.c      | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index ff81e5ccaef2..70e12f536f8f 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -136,11 +136,6 @@ static inline bool iov_iter_is_xarray(const struct iov_iter *i)
 	return iov_iter_type(i) == ITER_XARRAY;
 }
 
-static inline unsigned char iov_iter_rw(const struct iov_iter *i)
-{
-	return i->data_source ? WRITE : READ;
-}
-
 static inline bool user_backed_iter(const struct iov_iter *i)
 {
 	return i->user_backed;
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b667b1e2f688..b8c52231a6ff 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1097,7 +1097,7 @@ static ssize_t __iov_iter_get_pages_alloc(struct iov_iter *i,
 		unsigned long addr;
 		int res;
 
-		if (iov_iter_rw(i) != WRITE)
+		if (i->data_source == ITER_SOURCE)
 			gup_flags |= FOLL_WRITE;
 		if (i->nofault)
 			gup_flags |= FOLL_NOFAULT;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages()
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (7 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-06-30 15:25 ` [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate David Howells
  2023-06-30 15:25 ` [RFC PATCH 11/11] scsi: Use extract_iter_to_sg() David Howells
  10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner,
	Herbert Xu, linux-crypto

Define flags to pass into iov_iter_extract_pages() to indicate I/O
direction.  A warning is issued and the function fails if neither or both
flags are set.  The flag is also checked against the iterator's data_source
flag.

Also make extract_iter_to_sg() check the flags and propagate them to
iov_iter_extract_pages().

Finally, make the callers pass the flags into iov_iter_extract_pages() and
extract_iter_to_sg().

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Christoph Hellwig <hch@lst.de>
cc: Jens Axboe <axboe@kernel.dk>
cc: Christian Brauner <christian@brauner.io>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Herbert Xu <herbert@gondor.apana.org.au>
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-crypto@vger.kernel.org
---
 block/bio.c         |  6 ++++++
 block/blk-map.c     |  3 +++
 crypto/af_alg.c     |  5 +++--
 crypto/algif_hash.c |  3 ++-
 fs/direct-io.c      |  6 +++++-
 include/linux/bio.h | 18 ++++++++++++++++--
 include/linux/uio.h |  5 ++++-
 lib/iov_iter.c      | 12 ++++++++++--
 lib/scatterlist.c   | 12 ++++++++++--
 9 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 8672179213b9..440d4889ba13 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1242,6 +1242,8 @@ static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
  * will have to be cleaned up in the way indicated by the BIO_PAGE_PINNED flag.
  * For a multi-segment *iter, this function only adds pages from the next
  * non-empty segment of the iov iterator.
+ *
+ * The I/O direction is determined from the bio operation type.
  */
 static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -1263,6 +1265,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	BUILD_BUG_ON(PAGE_PTRS_PER_BVEC < 2);
 	pages += entries_left * (PAGE_PTRS_PER_BVEC - 1);
 
+	extraction_flags |= bio_is_write(bio) ? WRITE_FROM_ITER : READ_INTO_ITER;
+
 	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
@@ -1332,6 +1336,8 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
  * fit into the bio, or are requested in @iter, whatever is smaller. If
  * MM encounters an error pinning the requested pages, it stops. Error
  * is returned only if 0 pages could be pinned.
+ *
+ * The bio operation indicates the data direction.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
diff --git a/block/blk-map.c b/block/blk-map.c
index f8fe114ae433..71de2cf9bf4e 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -279,6 +279,9 @@ static int bio_map_user_iov(struct request *rq, struct iov_iter *iter,
 	if (bio == NULL)
 		return -ENOMEM;
 
+	extraction_flags |=
+		bio_is_write(bio) ? WRITE_FROM_ITER : READ_INTO_ITER;
+
 	if (blk_queue_pci_p2pdma(rq->q))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 	if (iov_iter_extract_will_pin(iter))
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 6218c773d71c..c62ac5e32aeb 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -1042,7 +1042,8 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 			};
 
 			plen = extract_iter_to_sg(&msg->msg_iter, len, &sgtable,
-						  MAX_SGL_ENTS - sgl->cur, 0);
+						  MAX_SGL_ENTS - sgl->cur,
+						  WRITE_FROM_ITER);
 			if (plen < 0) {
 				err = plen;
 				goto unlock;
@@ -1247,7 +1248,7 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags,
 
 		sg_init_table(rsgl->sgl.sgt.sgl, ALG_MAX_PAGES);
 		err = extract_iter_to_sg(&msg->msg_iter, seglen, &rsgl->sgl.sgt,
-					 ALG_MAX_PAGES, 0);
+					 ALG_MAX_PAGES, READ_INTO_ITER);
 		if (err < 0) {
 			rsgl->sg_num_bytes = 0;
 			return err;
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 0ab43e149f0e..b343c4973dbf 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -115,7 +115,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr *msg,
 		ctx->sgl.need_unpin = iov_iter_extract_will_pin(&msg->msg_iter);
 
 		err = extract_iter_to_sg(&msg->msg_iter, LONG_MAX,
-					 &ctx->sgl.sgt, npages, 0);
+					 &ctx->sgl.sgt, npages,
+					 WRITE_FROM_ITER);
 		if (err < 0)
 			goto unlock_free;
 		len = err;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 2b517cc5b32d..3316d2f8f21c 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -168,10 +168,14 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
 	struct page **pages = dio->pages;
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
+	unsigned int extraction_flags;
 	ssize_t ret;
 
+	extraction_flags =
+		op_is_write(dio_op) ? WRITE_FROM_ITER : READ_INTO_ITER;
+
 	ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX,
-				     DIO_PAGES, 0, &sdio->from);
+				     DIO_PAGES, extraction_flags, &sdio->from);
 
 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
 		/*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c4f5b5228105..d4b4c477e69b 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -42,11 +42,25 @@ static inline unsigned int bio_max_segs(unsigned int nr_segs)
 #define bio_sectors(bio)	bvec_iter_sectors((bio)->bi_iter)
 #define bio_end_sector(bio)	bvec_iter_end_sector((bio)->bi_iter)
 
+/**
+ * bio_is_write - Query if the I/O direction is towards the disk
+ * @bio: The bio to query
+ *
+ * Return true if this is some sort of write operation - ie. the data is going
+ * towards the disk.
+ */
+static inline bool bio_is_write(const struct bio *bio)
+{
+	return op_is_write(bio_op(bio));
+}
+
 /*
  * Return the data direction, READ or WRITE.
  */
-#define bio_data_dir(bio) \
-	(op_is_write(bio_op(bio)) ? WRITE : READ)
+static inline int bio_data_dir(const struct bio *bio)
+{
+	return bio_is_write(bio) ? WRITE : READ;
+}
 
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 70e12f536f8f..6ea7c1b589c1 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -379,8 +379,11 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
 	};
 }
 /* Flags for iov_iter_get/extract_pages*() */
+/* Indicate if we are going to be writing from the buffer or reading into it. */
+#define WRITE_FROM_ITER		((__force iov_iter_extraction_t)0x01) // == WRITE
+#define READ_INTO_ITER		((__force iov_iter_extraction_t)0x02)
 /* Allow P2PDMA on the extracted pages */
-#define ITER_ALLOW_P2PDMA	((__force iov_iter_extraction_t)0x01)
+#define ITER_ALLOW_P2PDMA	((__force iov_iter_extraction_t)0x04)
 
 ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
 			       size_t maxsize, unsigned int maxpages,
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b8c52231a6ff..d31f40b69da5 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1791,8 +1791,10 @@ static ssize_t iov_iter_extract_user_pages(struct iov_iter *i,
  * that the caller allocated a page list at least @maxpages in size and this
  * will be filled in.
  *
- * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
- * be allowed on the pages extracted.
+ * @extraction_flags should have either WRITE_FROM_ITER or READ_INTO_ITER to
+ * indicate the direction the data is intended to flow to/from the buffer and
+ * can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be allowed on the
+ * pages extracted.
  *
  * The iov_iter_extract_will_pin() function can be used to query how cleanup
  * should be performed.
@@ -1823,6 +1825,12 @@ ssize_t iov_iter_extract_pages(struct iov_iter *i,
 			       iov_iter_extraction_t extraction_flags,
 			       size_t *offset0)
 {
+	unsigned int dir = extraction_flags & (READ_INTO_ITER | WRITE_FROM_ITER);
+
+	if (WARN_ON_ONCE(dir != READ_INTO_ITER && dir != WRITE_FROM_ITER) ||
+	    WARN_ON_ONCE((dir & WRITE) != i->data_source))
+		return -EFAULT;
+
 	maxsize = min_t(size_t, min_t(size_t, maxsize, i->count), MAX_RW_COUNT);
 	if (!maxsize)
 		return 0;
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index e97d7060329e..116ce320eb36 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -1326,8 +1326,10 @@ static ssize_t extract_xarray_to_sg(struct iov_iter *iter,
  *
  * No end mark is placed on the scatterlist; that's left to the caller.
  *
- * @extraction_flags can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA
- * be allowed on the pages extracted.
+ * @extraction_flags should have either WRITE_FROM_ITER or READ_INTO_ITER to
+ * indicate the direction the data is intended to flow to/from the buffer and
+ * can have ITER_ALLOW_P2PDMA set to request peer-to-peer DMA be allowed on the
+ * pages extracted.
  *
  * If successful, @sgtable->nents is updated to include the number of elements
  * added and the number of bytes added is returned.  @sgtable->orig_nents is
@@ -1340,6 +1342,12 @@ ssize_t extract_iter_to_sg(struct iov_iter *iter, size_t maxsize,
 			   struct sg_table *sgtable, unsigned int sg_max,
 			   iov_iter_extraction_t extraction_flags)
 {
+	unsigned int dir = extraction_flags & (READ_INTO_ITER | WRITE_FROM_ITER);
+
+	if (WARN_ON_ONCE(dir != READ_INTO_ITER && dir != WRITE_FROM_ITER) ||
+	    WARN_ON_ONCE((dir & WRITE) != iter->data_source))
+		return -EFAULT;
+
 	if (maxsize == 0)
 		return 0;
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (8 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages() David Howells
@ 2023-06-30 15:25 ` David Howells
  2023-06-30 15:25 ` [RFC PATCH 11/11] scsi: Use extract_iter_to_sg() David Howells
  10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, v9fs-developer

Convert the 9p filesystem to use iov_iter_extract_pages() instead of
iov_iter_get_pages().  This will pin pages or leave them unaltered rather
than getting a ref on them as appropriate to the iterator.

The pages need to be pinned for DIO-read rather than having refs taken on
them to prevent VM copy-on-write from malfunctioning during a concurrent
fork() (the result of the I/O would otherwise end up only visible to the
child process and not the parent).

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: v9fs-developer@lists.sourceforge.net
---
 net/9p/trans_common.c |  8 ++--
 net/9p/trans_common.h |  2 +-
 net/9p/trans_virtio.c | 92 ++++++++++++++-----------------------------
 3 files changed, 34 insertions(+), 68 deletions(-)

diff --git a/net/9p/trans_common.c b/net/9p/trans_common.c
index c827f694551c..4342de18f08b 100644
--- a/net/9p/trans_common.c
+++ b/net/9p/trans_common.c
@@ -9,16 +9,16 @@
 #include "trans_common.h"
 
 /**
- * p9_release_pages - Release pages after the transaction.
+ * p9_unpin_pages - Unpin pages after the transaction.
  * @pages: array of pages to be put
  * @nr_pages: size of array
  */
-void p9_release_pages(struct page **pages, int nr_pages)
+void p9_unpin_pages(struct page **pages, int nr_pages)
 {
 	int i;
 
 	for (i = 0; i < nr_pages; i++)
 		if (pages[i])
-			put_page(pages[i]);
+			unpin_user_page(pages[i]);
 }
-EXPORT_SYMBOL(p9_release_pages);
+EXPORT_SYMBOL(p9_unpin_pages);
diff --git a/net/9p/trans_common.h b/net/9p/trans_common.h
index 32134db6abf3..fd94c48aba5b 100644
--- a/net/9p/trans_common.h
+++ b/net/9p/trans_common.h
@@ -4,4 +4,4 @@
  * Author Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
  */
 
-void p9_release_pages(struct page **pages, int nr_pages);
+void p9_unpin_pages(struct page **pages, int nr_pages);
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 3c27ffb781e3..93569de2bdba 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -310,71 +310,35 @@ static int p9_get_mapped_pages(struct virtio_chan *chan,
 			       struct iov_iter *data,
 			       int count,
 			       size_t *offs,
-			       int *need_drop)
+			       bool *need_unpin,
+			       iov_iter_extraction_t extraction_flags)
 {
 	int nr_pages;
 	int err;
+	int n;
 
 	if (!iov_iter_count(data))
 		return 0;
 
-	if (!iov_iter_is_kvec(data)) {
-		int n;
-		/*
-		 * We allow only p9_max_pages pinned. We wait for the
-		 * Other zc request to finish here
-		 */
-		if (atomic_read(&vp_pinned) >= chan->p9_max_pages) {
-			err = wait_event_killable(vp_wq,
-			      (atomic_read(&vp_pinned) < chan->p9_max_pages));
-			if (err == -ERESTARTSYS)
-				return err;
-		}
-		n = iov_iter_get_pages_alloc2(data, pages, count, offs);
-		if (n < 0)
-			return n;
-		*need_drop = 1;
-		nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
-		atomic_add(nr_pages, &vp_pinned);
-		return n;
-	} else {
-		/* kernel buffer, no need to pin pages */
-		int index;
-		size_t len;
-		void *p;
-
-		/* we'd already checked that it's non-empty */
-		while (1) {
-			len = iov_iter_single_seg_count(data);
-			if (likely(len)) {
-				p = data->kvec->iov_base + data->iov_offset;
-				break;
-			}
-			iov_iter_advance(data, 0);
-		}
-		if (len > count)
-			len = count;
-
-		nr_pages = DIV_ROUND_UP((unsigned long)p + len, PAGE_SIZE) -
-			   (unsigned long)p / PAGE_SIZE;
-
-		*pages = kmalloc_array(nr_pages, sizeof(struct page *),
-				       GFP_NOFS);
-		if (!*pages)
-			return -ENOMEM;
-
-		*need_drop = 0;
-		p -= (*offs = offset_in_page(p));
-		for (index = 0; index < nr_pages; index++) {
-			if (is_vmalloc_addr(p))
-				(*pages)[index] = vmalloc_to_page(p);
-			else
-				(*pages)[index] = kmap_to_page(p);
-			p += PAGE_SIZE;
-		}
-		iov_iter_advance(data, len);
-		return len;
+	/*
+	 * We allow only p9_max_pages pinned. We wait for the
+	 * Other zc request to finish here
+	 */
+	if (atomic_read(&vp_pinned) >= chan->p9_max_pages) {
+		err = wait_event_killable(vp_wq,
+					  (atomic_read(&vp_pinned) < chan->p9_max_pages));
+		if (err == -ERESTARTSYS)
+			return err;
 	}
+
+	n = iov_iter_extract_pages(data, pages, count, INT_MAX,
+				   extraction_flags, offs);
+	if (n < 0)
+		return n;
+	*need_unpin = iov_iter_extract_will_pin(data);
+	nr_pages = DIV_ROUND_UP(n + *offs, PAGE_SIZE);
+	atomic_add(nr_pages, &vp_pinned);
+	return n;
 }
 
 static void handle_rerror(struct p9_req_t *req, int in_hdr_len,
@@ -429,7 +393,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	struct virtio_chan *chan = client->trans;
 	struct scatterlist *sgs[4];
 	size_t offs;
-	int need_drop = 0;
+	bool need_unpin;
 	int kicked = 0;
 
 	p9_debug(P9_DEBUG_TRANS, "virtio request\n");
@@ -437,7 +401,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	if (uodata) {
 		__le32 sz;
 		int n = p9_get_mapped_pages(chan, &out_pages, uodata,
-					    outlen, &offs, &need_drop);
+					    outlen, &offs, &need_unpin,
+					    WRITE_FROM_ITER);
 		if (n < 0) {
 			err = n;
 			goto err_out;
@@ -456,7 +421,8 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 		memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
 	} else if (uidata) {
 		int n = p9_get_mapped_pages(chan, &in_pages, uidata,
-					    inlen, &offs, &need_drop);
+					    inlen, &offs, &need_unpin,
+					    READ_INTO_ITER);
 		if (n < 0) {
 			err = n;
 			goto err_out;
@@ -542,13 +508,13 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
 	 * Non kernel buffers are pinned, unpin them
 	 */
 err_out:
-	if (need_drop) {
+	if (need_unpin) {
 		if (in_pages) {
-			p9_release_pages(in_pages, in_nr_pages);
+			p9_unpin_pages(in_pages, in_nr_pages);
 			atomic_sub(in_nr_pages, &vp_pinned);
 		}
 		if (out_pages) {
-			p9_release_pages(out_pages, out_nr_pages);
+			p9_unpin_pages(out_pages, out_nr_pages);
 			atomic_sub(out_nr_pages, &vp_pinned);
 		}
 		/* wakeup anybody waiting for slots to pin pages */


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 11/11] scsi: Use extract_iter_to_sg()
  2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
                   ` (9 preceding siblings ...)
  2023-06-30 15:25 ` [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate David Howells
@ 2023-06-30 15:25 ` David Howells
  10 siblings, 0 replies; 21+ messages in thread
From: David Howells @ 2023-06-30 15:25 UTC (permalink / raw)
  To: Jens Axboe, Al Viro, Christoph Hellwig
  Cc: David Howells, Matthew Wilcox, Jan Kara, Jeff Layton,
	David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, James E . J . Bottomley,
	Martin K . Petersen, Christoph Hellwig, linux-scsi

Use extract_iter_to_sg() to build a scatterlist from an iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: James E.J. Bottomley <jejb@linux.ibm.com>
cc: Martin K. Petersen <martin.petersen@oracle.com>
cc: Christoph Hellwig <hch@lst.de>
cc: linux-scsi@vger.kernel.org
---
 drivers/vhost/scsi.c | 79 +++++++++++++-------------------------------
 1 file changed, 23 insertions(+), 56 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index bb10fa4bb4f6..7bb41e2a0d64 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -75,6 +75,9 @@ struct vhost_scsi_cmd {
 	u32 tvc_prot_sgl_count;
 	/* Saved unpacked SCSI LUN for vhost_scsi_target_queue_cmd() */
 	u32 tvc_lun;
+	/* Cleanup modes for scatterlists */
+	unsigned int tvc_need_unpin;
+	unsigned int tvc_prot_need_unpin;
 	/* Pointer to the SGL formatted memory from virtio-scsi */
 	struct scatterlist *tvc_sgl;
 	struct scatterlist *tvc_prot_sgl;
@@ -327,14 +330,13 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd)
 	struct vhost_scsi_inflight *inflight = tv_cmd->inflight;
 	int i;
 
-	if (tv_cmd->tvc_sgl_count) {
+	if (tv_cmd->tvc_need_unpin && tv_cmd->tvc_sgl_count)
 		for (i = 0; i < tv_cmd->tvc_sgl_count; i++)
-			put_page(sg_page(&tv_cmd->tvc_sgl[i]));
-	}
-	if (tv_cmd->tvc_prot_sgl_count) {
+			unpin_user_page(sg_page(&tv_cmd->tvc_sgl[i]));
+
+	if (tv_cmd->tvc_prot_need_unpin && tv_cmd->tvc_prot_sgl_count)
 		for (i = 0; i < tv_cmd->tvc_prot_sgl_count; i++)
-			put_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
-	}
+			unpin_user_page(sg_page(&tv_cmd->tvc_prot_sgl[i]));
 
 	sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag);
 	vhost_scsi_put_inflight(inflight);
@@ -606,38 +608,6 @@ vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg,
 	return cmd;
 }
 
-/*
- * Map a user memory range into a scatterlist
- *
- * Returns the number of scatterlist entries used or -errno on error.
- */
-static int
-vhost_scsi_map_to_sgl(struct vhost_scsi_cmd *cmd,
-		      struct iov_iter *iter,
-		      struct scatterlist *sgl,
-		      bool write)
-{
-	struct page **pages = cmd->tvc_upages;
-	struct scatterlist *sg = sgl;
-	ssize_t bytes;
-	size_t offset;
-	unsigned int npages = 0;
-
-	bytes = iov_iter_get_pages2(iter, pages, LONG_MAX,
-				VHOST_SCSI_PREALLOC_UPAGES, &offset);
-	/* No pages were pinned */
-	if (bytes <= 0)
-		return bytes < 0 ? bytes : -EFAULT;
-
-	while (bytes) {
-		unsigned n = min_t(unsigned, PAGE_SIZE - offset, bytes);
-		sg_set_page(sg++, pages[npages++], n, offset);
-		bytes -= n;
-		offset = 0;
-	}
-	return npages;
-}
-
 static int
 vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls)
 {
@@ -661,24 +631,19 @@ vhost_scsi_calc_sgls(struct iov_iter *iter, size_t bytes, int max_sgls)
 static int
 vhost_scsi_iov_to_sgl(struct vhost_scsi_cmd *cmd, bool write,
 		      struct iov_iter *iter,
-		      struct scatterlist *sg, int sg_count)
+		      struct scatterlist *sg, int sg_count,
+		      unsigned int *need_unpin)
 {
-	struct scatterlist *p = sg;
-	int ret;
+	struct sg_table sgt = { .sgl = sg };
+	ssize_t ret;
 
-	while (iov_iter_count(iter)) {
-		ret = vhost_scsi_map_to_sgl(cmd, iter, sg, write);
-		if (ret < 0) {
-			while (p < sg) {
-				struct page *page = sg_page(p++);
-				if (page)
-					put_page(page);
-			}
-			return ret;
-		}
-		sg += ret;
-	}
-	return 0;
+	ret = extract_iter_to_sg(iter, LONG_MAX, &sgt, sg_count,
+				 write ? WRITE_FROM_ITER : READ_INTO_ITER);
+	if (ret > 0)
+		sg_mark_end(sg + sgt.nents - 1);
+
+	*need_unpin = iov_iter_extract_will_pin(iter);
+	return ret;
 }
 
 static int
@@ -702,7 +667,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd,
 
 		ret = vhost_scsi_iov_to_sgl(cmd, write, prot_iter,
 					    cmd->tvc_prot_sgl,
-					    cmd->tvc_prot_sgl_count);
+					    cmd->tvc_prot_sgl_count,
+					    &cmd->tvc_prot_need_unpin);
 		if (ret < 0) {
 			cmd->tvc_prot_sgl_count = 0;
 			return ret;
@@ -719,7 +685,8 @@ vhost_scsi_mapal(struct vhost_scsi_cmd *cmd,
 		  cmd->tvc_sgl, cmd->tvc_sgl_count);
 
 	ret = vhost_scsi_iov_to_sgl(cmd, write, data_iter,
-				    cmd->tvc_sgl, cmd->tvc_sgl_count);
+				    cmd->tvc_sgl, cmd->tvc_sgl_count,
+				    &cmd->tvc_need_unpin);
 	if (ret < 0) {
 		cmd->tvc_sgl_count = 0;
 		return ret;


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
  2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
@ 2023-06-30 15:39   ` Jens Axboe
  2023-06-30 16:00   ` David Howells
  2023-07-06 15:29   ` Christoph Hellwig
  2 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2023-06-30 15:39 UTC (permalink / raw)
  To: David Howells, Al Viro, Christoph Hellwig
  Cc: Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand,
	Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
	Christian Brauner, linux-fsdevel, linux-block, linux-kernel,
	linux-mm, Christoph Hellwig, Christian Brauner

On 6/30/23 9:25?AM, David Howells wrote:
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 1bce2208b65c..1cade1567162 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -655,12 +655,13 @@ static bool need_complete_io(struct io_kiocb *req)
>  		S_ISBLK(file_inode(req->file)->i_mode);
>  }
>  
> -static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
> +static int io_rw_init_file(struct io_kiocb *req, unsigned int io_direction)
>  {
>  	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>  	struct kiocb *kiocb = &rw->kiocb;
>  	struct io_ring_ctx *ctx = req->ctx;
>  	struct file *file = req->file;
> +	fmode_t mode = (io_direction == WRITE) ? FMODE_WRITE : FMODE_READ;
>  	int ret;
>  
>  	if (unlikely(!file || !(file->f_mode & mode)))

Not ideal to add a branch here, probably better to just pass in both?

> @@ -870,7 +871,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>  		iov_iter_restore(&s->iter, &s->iter_state);
>  		iovec = NULL;
>  	}
> -	ret = io_rw_init_file(req, FMODE_WRITE);
> +	ret = io_rw_init_file(req, WRITE);
>  	if (unlikely(ret)) {
>  		kfree(iovec);
>  		return ret;
> @@ -914,7 +915,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>  		__sb_writers_release(file_inode(req->file)->i_sb,
>  					SB_FREEZE_WRITE);
>  	}
> -	kiocb->ki_flags |= IOCB_WRITE;
>  
>  	if (likely(req->file->f_op->write_iter))
>  		ret2 = call_write_iter(req->file, kiocb, &s->iter);
> 

One concern here is that we're using IOCB_WRITE here to tell if
sb_start_write() has been done or not, and hence whether
kiocb_end_write() needs to be called. You know set it earlier, which
means if we get a failure if we need to setup async data, then we know
have IOCB_WRITE set at that point even though we did not call
sb_start_write().

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
  2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
  2023-06-30 15:39   ` Jens Axboe
@ 2023-06-30 16:00   ` David Howells
  2023-06-30 16:05     ` Jens Axboe
  2023-07-06 15:29   ` Christoph Hellwig
  2 siblings, 1 reply; 21+ messages in thread
From: David Howells @ 2023-06-30 16:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: dhowells, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

Jens Axboe <axboe@kernel.dk> wrote:

> One concern here is that we're using IOCB_WRITE here to tell if
> sb_start_write() has been done or not, and hence whether
> kiocb_end_write() needs to be called. You know set it earlier, which
> means if we get a failure if we need to setup async data, then we know
> have IOCB_WRITE set at that point even though we did not call
> sb_start_write().

Hmmm...  It's set earlier in a number of places anyway - __cachefiles_write()
for example.

Btw, can you please put some comments on the IOCB_* constants?  I have to
guess at what they mean and how they're meant to be used.  Or better still,
get Christoph to write Documentation/core-api/iocb.rst describing the API? ;-)

David


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
  2023-06-30 16:00   ` David Howells
@ 2023-06-30 16:05     ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2023-06-30 16:05 UTC (permalink / raw)
  To: David Howells
  Cc: Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

On 6/30/23 10:00?AM, David Howells wrote:
> Jens Axboe <axboe@kernel.dk> wrote:
> 
>> One concern here is that we're using IOCB_WRITE here to tell if
>> sb_start_write() has been done or not, and hence whether
>> kiocb_end_write() needs to be called. You know set it earlier, which
>> means if we get a failure if we need to setup async data, then we know
>> have IOCB_WRITE set at that point even though we did not call
>> sb_start_write().
> 
> Hmmm...  It's set earlier in a number of places anyway -
> __cachefiles_write() for example.

Not sure how that's relevant, that's a private kiocb and not related to
the private one that io_uring uses?

> Btw, can you please put some comments on the IOCB_* constants?  I have
> to guess at what they mean and how they're meant to be used.  Or
> better still, get Christoph to write Documentation/core-api/iocb.rst
> describing the API? ;-)

The ones I have added do have comments, mostly, though it's not a lot of
commentary for sure... Which ones are confusing and need better
comments? Would be happy to do that. I do think the comments belong in
there rather than have a separate doc for the kiocb. Though one thing
that's confusing is the ki_private ownership. You'd think it belongs to
the owner of the kiocb, but nope, it has random uses in iomap and ocfs2
at least.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc()
  2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
@ 2023-07-06 15:21   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:21 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christian Brauner

On Fri, Jun 30, 2023 at 04:25:14PM +0100, David Howells wrote:
>  	/*
>  	 * FOLL_LONGTERM indicates that the page will be held for an indefinite
>  	 * time period _often_ under userspace control.  This is in contrast to
> -	 * iov_iter_get_pages(), whose usages are transient.
> +	 * iov_iter_get_pages2(), whose usages are transient.
>  	 */

I don't think this should refer to iov_iter_get_pages* at all.  The
flag should document that actual get/pin_user interfaces and not refer
to a (deprecated) interface built on top of it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from
  2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
@ 2023-07-06 15:22   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:22 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs
  2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
  2023-06-30 15:39   ` Jens Axboe
  2023-06-30 16:00   ` David Howells
@ 2023-07-06 15:29   ` Christoph Hellwig
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:29 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

On Fri, Jun 30, 2023 at 04:25:16PM +0100, David Howells wrote:
> A number of places that generate kiocbs didn't use init_sync_kiocb() to
> initialise the new kiocb.  Fix these to always use init_kiocb().
> 
> Note that aio and io_uring pass information in through ki_filp through an
> overlaid union before I can call init_kiocb(), so that gets reinitialised.
> I don't think it clobbers anything else.
> 
> After this point, IOCB_WRITE is only set by init_kiocb().

Nothing in this patch touches the VFS, so the subject line is
wrong.  And I think we're better off splitting it into one per
subsystem, which also allows documenting the exact changes.

Which includes now setting the flags from f_iocb_flags and setting
and I/O priority.  Please explain why this is harmless or even useful.

> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 37511d2b2caf..ea92235c5ba2 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -439,16 +439,17 @@ static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
>  	}
>  	atomic_set(&cmd->ref, 2);
>  
> -	iov_iter_bvec(&iter, rw, bvec, nr_bvec, blk_rq_bytes(rq));
> +	iov_iter_bvec(&iter, rw == WRITE ? ITER_SOURCE : ITER_DEST,
> +		      bvec, nr_bvec, blk_rq_bytes(rq));

Given the cover letter I expect this is going to go away, but the
changes would probably a lot more readable if you had a helper
to convert from READ/WRITE to the iter flags inbetween.

Or maybe do it the other way - add a helper to init the 

> @@ -490,12 +491,12 @@ static int do_req_filebacked(struct loop_device *lo, struct request *rq)
>  		return lo_fallocate(lo, rq, pos, FALLOC_FL_PUNCH_HOLE);
>  	case REQ_OP_WRITE:
>  		if (cmd->use_aio)
> -			return lo_rw_aio(lo, cmd, pos, ITER_SOURCE);
> +			return lo_rw_aio(lo, cmd, pos, WRITE);
>  		else
>  			return lo_write_simple(lo, rq, pos);
>  	case REQ_OP_READ:
>  		if (cmd->use_aio)
> -			return lo_rw_aio(lo, cmd, pos, ITER_DEST);
> +			return lo_rw_aio(lo, cmd, pos, READ);

I don't think there is any need to pass the rw argument at all,
lo_rw_aio can just do an op_is_write(req_op(rq))

> -static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
> +static int io_rw_init_file(struct io_kiocb *req, unsigned int io_direction)
>  {
>  	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>  	struct kiocb *kiocb = &rw->kiocb;
>  	struct io_ring_ctx *ctx = req->ctx;
>  	struct file *file = req->file;
> +	fmode_t mode = (io_direction == WRITE) ? FMODE_WRITE : FMODE_READ;
>  	int ret;
>  
>  	if (unlikely(!file || !(file->f_mode & mode)))

I'd just move this check into the two callers, that way you can hard
code the mode instead of adding a conversion.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE rather than iterator direction
  2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
@ 2023-07-06 15:30   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:30 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

On Fri, Jun 30, 2023 at 04:25:18PM +0100, David Howells wrote:
> If an iomap_iter is available, use the IOMAP_WRITE flag instead of the
> iterator direction.

This is really two patches, one for dax and one for iomap, and not one
for the iov_iter code.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 06/11] iov_iter: Use op_is_write() rather than iterator direction
  2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
@ 2023-07-06 15:30   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:30 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

On Fri, Jun 30, 2023 at 04:25:19PM +0100, David Howells wrote:
> If a request struct is available, use op_is_write() instead of the iterator
> direction.

s/iov_iter/block/ in the subject.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user
  2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
@ 2023-07-06 15:31   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2023-07-06 15:31 UTC (permalink / raw)
  To: David Howells
  Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara,
	Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe,
	Hillf Danton, Christian Brauner, linux-fsdevel, linux-block,
	linux-kernel, linux-mm, Christoph Hellwig, Christian Brauner

On Fri, Jun 30, 2023 at 04:25:21PM +0100, David Howells wrote:
> Drop the last usage of iov_iter_rw() into __iov_iter_get_pages_alloc().

Well, if we can't just drop this check entirely, the rest of the prep
work is just churn and not actually very useful.  Shouldn't we go
all the way and kill ITER_DEST/SOURCE?


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-06 15:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-30 15:25 [RFC PATCH 00/11] iov_iter: Use I/O direction from kiocb, iomap & request rather than iov_iter David Howells
2023-06-30 15:25 ` [RFC PATCH 01/11] iov_iter: Fix comment refs to iov_iter_get_pages/pages_alloc() David Howells
2023-07-06 15:21   ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 02/11] vfs: Set IOCB_WRITE in iocbs that we're going to write from David Howells
2023-07-06 15:22   ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 03/11] vfs: Use init_kiocb() to initialise new IOCBs David Howells
2023-06-30 15:39   ` Jens Axboe
2023-06-30 16:00   ` David Howells
2023-06-30 16:05     ` Jens Axboe
2023-07-06 15:29   ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 04/11] iov_iter: Use IOCB_WRITE rather than iterator direction David Howells
2023-06-30 15:25 ` [RFC PATCH 05/11] iov_iter: Use IOMAP_WRITE " David Howells
2023-07-06 15:30   ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 06/11] iov_iter: Use op_is_write() " David Howells
2023-07-06 15:30   ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 07/11] cifs: Drop the check using iov_iter_rw() David Howells
2023-06-30 15:25 ` [RFC PATCH 08/11] iov_iter: Drop iov_iter_rw() and fold in last user David Howells
2023-07-06 15:31   ` Christoph Hellwig
2023-06-30 15:25 ` [RFC PATCH 09/11] iov_iter: Use I/O dir flags with iov_iter_extract_pages() David Howells
2023-06-30 15:25 ` [RFC PATCH 10/11] 9p: Pin pages rather than ref'ing if appropriate David Howells
2023-06-30 15:25 ` [RFC PATCH 11/11] scsi: Use extract_iter_to_sg() David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).