* selective block polling and preadv2/pwritev2 revisited V2
@ 2016-02-22 17:07 Christoph Hellwig
From: Christoph Hellwig @ 2016-02-22 17:07 UTC
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

This series allows polling for completions in the block layer to be
selectively enabled or disabled on a per-I/O basis.  For this it
resurrects the preadv2/pwritev2 syscalls that Milosz prepared a while ago
(and which are much simpler now due to VFS changes that happened in the
meantime).  That approach also had a man page update prepared, which I
will resubmit with the current flags once this series makes it in.

Polling for block I/O is important to reduce the latency on flash and
post-flash storage technologies.  On the fastest NVMe controller I have
access to it almost halves latencies from over 7 microseconds to about 4
microseconds.  But it is only useful if we actually care about the latency
of this particular I/O, and generally is a waste if enabled for all I/O
to a given device.  This series uses the per-I/O flags in preadv2/pwritev2
to control this behavior.  The alternative would be a new O_* flag set
at open time or using fcntl, but this is still too coarse-grained for some
applications and we're starting to run out of open flags.
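
For illustration only, here is a minimal userspace sketch of what a per-I/O
flags argument looks like from the caller's side.  It assumes a kernel that
provides preadv2 and a C library that exposes the preadv2() wrapper (glibc
2.26 or later); the helper names read_at_with_flags() and demo_roundtrip()
are hypothetical and not part of this series.  The example passes flags == 0,
since the patches shown here do not define any specific flag yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: read len bytes at offset off into buf, passing
 * per-call flags straight through to preadv2().  flags == 0 requests the
 * default behaviour, equivalent to preadv().
 */
static ssize_t read_at_with_flags(int fd, void *buf, size_t len,
				  off_t off, int flags)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	assert(buf != NULL);
	return preadv2(fd, &iov, 1, off, flags);
}

/*
 * Hypothetical self-check: write a short string to a temporary file and
 * read part of it back with flags == 0.  Returns 0 on success, -1 on error.
 */
static int demo_roundtrip(void)
{
	char path[] = "/tmp/preadv2_demo_XXXXXX";
	char buf[6] = { 0 };
	int fd = mkstemp(path);
	int ok;

	if (fd < 0)
		return -1;

	ok = write(fd, "hello world", 11) == 11 &&
	     read_at_with_flags(fd, buf, 5, 6, 0) == 5 &&
	     memcmp(buf, "world", 5) == 0;

	close(fd);
	unlink(path);
	return ok ? 0 : -1;
}
```

Because the flags travel with each call, one file descriptor can serve both
latency-sensitive and ordinary reads, which an O_* flag set at open time
could not express.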

Note that there are plenty of other use cases for preadv2/pwritev2 as well,
but I'd like to concentrate on this one for now.  Examples are: non-blocking
reads (the original purpose), per-I/O O_SYNC, user space support for T10
DIF/DIX application tags and probably some more.

Changes since V1:
  - rebased on top of Linux 4.5-rc5



* [PATCH 1/7] vfs: pass a flags argument to vfs_readv/vfs_writev
From: Christoph Hellwig @ 2016-02-22 17:07 UTC
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

This way we can set kiocb flags also from the sync read/write path.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/vfs.c      |  4 ++--
 fs/read_write.c    | 44 ++++++++++++++++++++++++++------------------
 fs/splice.c        |  2 +-
 include/linux/fs.h |  4 ++--
 4 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5d2a57e..d40010e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -870,7 +870,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
 
 	oldfs = get_fs();
 	set_fs(KERNEL_DS);
-	host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+	host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
 	set_fs(oldfs);
 	return nfsd_finish_read(file, count, host_err);
 }
@@ -957,7 +957,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
-	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
 	set_fs(oldfs);
 	if (host_err < 0)
 		goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index 324ec27..7d453c3 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -692,11 +692,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
 EXPORT_SYMBOL(iov_shorten);
 
 static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
-		loff_t *ppos, iter_fn_t fn)
+		loff_t *ppos, iter_fn_t fn, int flags)
 {
 	struct kiocb kiocb;
 	ssize_t ret;
 
+	if (flags)
+		return -EOPNOTSUPP;
+
 	init_sync_kiocb(&kiocb, filp);
 	kiocb.ki_pos = *ppos;
 
@@ -708,10 +711,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 
 /* Do it by hand, with file-ops */
 static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
-		loff_t *ppos, io_fn_t fn)
+		loff_t *ppos, io_fn_t fn, int flags)
 {
 	ssize_t ret = 0;
 
+	if (flags)
+		return -EOPNOTSUPP;
+
 	while (iov_iter_count(iter)) {
 		struct iovec iovec = iov_iter_iovec(iter);
 		ssize_t nr;
@@ -812,7 +818,8 @@ out:
 
 static ssize_t do_readv_writev(int type, struct file *file,
 			       const struct iovec __user * uvector,
-			       unsigned long nr_segs, loff_t *pos)
+			       unsigned long nr_segs, loff_t *pos,
+			       int flags)
 {
 	size_t tot_len;
 	struct iovec iovstack[UIO_FASTIOV];
@@ -844,9 +851,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
 	}
 
 	if (iter_fn)
-		ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
-		ret = do_loop_readv_writev(file, &iter, pos, fn);
+		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
 	if (type != READ)
 		file_end_write(file);
@@ -863,27 +870,27 @@ out:
 }
 
 ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
-		  unsigned long vlen, loff_t *pos)
+		  unsigned long vlen, loff_t *pos, int flags)
 {
 	if (!(file->f_mode & FMODE_READ))
 		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_READ))
 		return -EINVAL;
 
-	return do_readv_writev(READ, file, vec, vlen, pos);
+	return do_readv_writev(READ, file, vec, vlen, pos, flags);
 }
 
 EXPORT_SYMBOL(vfs_readv);
 
 ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
-		   unsigned long vlen, loff_t *pos)
+		   unsigned long vlen, loff_t *pos, int flags)
 {
 	if (!(file->f_mode & FMODE_WRITE))
 		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
 
-	return do_readv_writev(WRITE, file, vec, vlen, pos);
+	return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
 }
 
 EXPORT_SYMBOL(vfs_writev);
@@ -896,7 +903,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos);
+		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -916,7 +923,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos);
+		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -948,7 +955,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos);
+			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
 		fdput(f);
 	}
 
@@ -972,7 +979,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos);
+			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
 		fdput(f);
 	}
 
@@ -986,7 +993,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 
 static ssize_t compat_do_readv_writev(int type, struct file *file,
 			       const struct compat_iovec __user *uvector,
-			       unsigned long nr_segs, loff_t *pos)
+			       unsigned long nr_segs, loff_t *pos,
+			       int flags)
 {
 	compat_ssize_t tot_len;
 	struct iovec iovstack[UIO_FASTIOV];
@@ -1018,9 +1026,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
 	}
 
 	if (iter_fn)
-		ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
-		ret = do_loop_readv_writev(file, &iter, pos, fn);
+		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
 	if (type != READ)
 		file_end_write(file);
@@ -1049,7 +1057,7 @@ static size_t compat_readv(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_READ))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
 
 out:
 	if (ret > 0)
@@ -1126,7 +1134,7 @@ static size_t compat_writev(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		goto out;
 
-	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
 
 out:
 	if (ret > 0)
diff --git a/fs/splice.c b/fs/splice.c
index 82bc0d6..3dc1426 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -577,7 +577,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
 	old_fs = get_fs();
 	set_fs(get_ds());
 	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
 	set_fs(old_fs);
 
 	return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ae68100..875277a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1709,9 +1709,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
-		unsigned long, loff_t *);
+		unsigned long, loff_t *, int);
 extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
-		unsigned long, loff_t *);
+		unsigned long, loff_t *, int);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
 extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
-- 
2.1.4
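
As a reading aid, the pattern in the patch above (thread a flags argument
down the call chain and reject any non-zero value with -EOPNOTSUPP until
specific flags are defined) can be sketched in plain userspace C.  This is a
simplified model, not kernel code: model_io_fn, model_do_io() and
model_fill_zero() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the per-file I/O method being dispatched to. */
typedef ssize_t (*model_io_fn)(void *buf, size_t len);

/*
 * Model of the low-level helpers after the patch: the flags argument is
 * accepted, but every flag is refused until a later change defines one.
 */
static ssize_t model_do_io(model_io_fn fn, void *buf, size_t len, int flags)
{
	assert(fn != NULL);

	if (flags)
		return -EOPNOTSUPP;

	return fn(buf, len);
}

/* Trivial I/O method for the model: zero the buffer, report its length. */
static ssize_t model_fill_zero(void *buf, size_t len)
{
	memset(buf, 0, len);
	return (ssize_t)len;
}
```

Existing callers keep their behaviour by passing 0, which is why every call
site touched by the patch simply gains a trailing 0 argument.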



* [PATCH 2/7] vfs: vfs: Define new syscalls preadv2,pwritev2
From: Christoph Hellwig @ 2016-02-22 17:07 UTC
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

From: Milosz Tanski <milosz@adfin.com>

New syscalls that take a flags argument. This change does not add any
specific flags.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c          | 162 +++++++++++++++++++++++++++++++++++++----------
 include/linux/compat.h   |   6 ++
 include/linux/syscalls.h |   6 ++
 3 files changed, 139 insertions(+), 35 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 7d453c3..38b9afa 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -895,15 +895,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 
 EXPORT_SYMBOL(vfs_writev);
 
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+			unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+		ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -915,15 +915,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+		ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -941,10 +941,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
 	return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
 }
 
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -955,7 +954,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+			ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -965,10 +964,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+			  unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -979,7 +977,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+			ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -989,6 +987,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_readv(fd, vec, vlen, flags);
+
+	return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_writev(fd, vec, vlen, flags);
+
+	return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
 #ifdef CONFIG_COMPAT
 
 static ssize_t compat_do_readv_writev(int type, struct file *file,
@@ -1046,7 +1096,7 @@ out:
 
 static size_t compat_readv(struct file *file,
 			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos)
+			   unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1057,7 +1107,7 @@ static size_t compat_readv(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_READ))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
 
 out:
 	if (ret > 0)
@@ -1066,9 +1116,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_readv(compat_ulong_t fd,
+				 const struct compat_iovec __user *vec,
+				 compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1077,16 +1127,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos);
+	ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
+
+}
+
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_readv(fd, vec, vlen, 0);
 }
 
-static long __compat_sys_preadv64(unsigned long fd,
+static long do_compat_preadv64(unsigned long fd,
 				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos)
+				  unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
@@ -1098,7 +1156,7 @@ static long __compat_sys_preadv64(unsigned long fd,
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos);
+		ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1108,7 +1166,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1118,12 +1176,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+		int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_readv(fd, vec, vlen, flags);
+
+	return do_compat_preadv64(fd, vec, vlen, pos, flags);
 }
 
 static size_t compat_writev(struct file *file,
 			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos)
+			    unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1143,9 +1214,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_writev(compat_ulong_t fd,
+				  const struct compat_iovec __user* vec,
+				  compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1154,28 +1225,36 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos);
+	ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
 }
 
-static long __compat_sys_pwritev64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+		const struct compat_iovec __user *, vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_writev(fd, vec, vlen, 0);
+}
+
+static long do_compat_pwritev64(unsigned long fd,
 				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos)
+				   unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
 
 	if (pos < 0)
 		return -EINVAL;
+
 	f = fdget(fd);
 	if (!f.file)
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos);
+		ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1185,7 +1264,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1195,8 +1274,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_writev(fd, vec, vlen, flags);
+
+	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
 }
+
 #endif
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c917..fe4ccd0 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 185815c..d795472 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
 asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
 			   unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
 			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
 asmlinkage long sys_chdir(const char __user *filename);
-- 
2.1.4
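
The offset handling in the new entry points above can be modelled in
userspace as follows.  This is a sketch for illustration: it mirrors
pos_from_hilo() and the compat offset assembly from the patch, but uses
fixed-width unsigned arithmetic instead of loff_t, and the names
compat_pos(), choose_path() and pos_selfcheck() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Half the width of unsigned long, as HALF_LONG_BITS is in the kernel. */
#define HALF_LONG_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/*
 * Model of pos_from_hilo(): shifting twice by half a long keeps the shift
 * count valid when long is 64 bits wide, in which case the high word is
 * shifted out entirely and the offset comes from the low word alone.
 */
static int64_t pos_from_hilo(unsigned long high, unsigned long low)
{
	uint64_t pos = (((uint64_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;

	return (int64_t)pos;
}

/* Model of the compat entry points: two 32-bit halves, always joined. */
static int64_t compat_pos(uint32_t pos_high, uint32_t pos_low)
{
	return (int64_t)(((uint64_t)pos_high << 32) | pos_low);
}

enum io_path { USE_FILE_POSITION, USE_EXPLICIT_OFFSET };

/*
 * Model of the dispatch in preadv2/pwritev2: an offset of -1 selects the
 * readv/writev path, which uses and updates the current file position.
 */
static enum io_path choose_path(int64_t pos)
{
	return pos == -1 ? USE_FILE_POSITION : USE_EXPLICIT_OFFSET;
}

/* Small self-check exercising the helpers above. */
static void pos_selfcheck(void)
{
	assert(pos_from_hilo(0, 4096) == 4096);
	assert(choose_path(pos_from_hilo(0, 4096)) == USE_EXPLICIT_OFFSET);
	assert(choose_path(compat_pos(UINT32_MAX, UINT32_MAX)) == USE_FILE_POSITION);
}
```

Treating -1 as "use the file position" lets a single syscall cover both the
positional and the non-positional variants, so no separate readv2/writev2
pair is needed.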



 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 185815c..d795472 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
 asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
 			   unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
 			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
 asmlinkage long sys_chdir(const char __user *filename);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread
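
The compat preadv2/pwritev2 entry points in this patch receive the file offset as two 32-bit halves and treat a combined offset of -1 as a request to use the current file position. The following is a minimal user-space sketch of that arithmetic only. The names combine_pos, uses_file_position and combine_pos_selftest are illustrative and do not appear in the patch, and the sketch combines the halves in unsigned arithmetic, which is an assumption made here to keep the shift well defined in portable C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: rebuild the 64-bit offset from its two 32-bit
 * halves, as ((loff_t)pos_high << 32) | pos_low does in the patch.
 * The OR is done on an unsigned value; converting the result to int64_t
 * wraps on the usual two's-complement targets (GCC, Clang).
 */
int64_t combine_pos(uint32_t pos_low, uint32_t pos_high)
{
	return (int64_t)(((uint64_t)pos_high << 32) | pos_low);
}

/* Hypothetical helper: true when the call should use the file position. */
bool uses_file_position(uint32_t pos_low, uint32_t pos_high)
{
	return combine_pos(pos_low, pos_high) == -1;
}

/* Small self-check of the two cases the patch distinguishes. */
void combine_pos_selftest(void)
{
	assert(uses_file_position(UINT32_MAX, UINT32_MAX));
	assert(!uses_file_position(4096, 0));
	assert(combine_pos(0, 1) == INT64_C(4294967296));
}
```

Only the pair with both halves set to 0xffffffff selects the readv/writev path; every other pair goes to the positional variant, which then rejects negative offsets with -EINVAL.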

* [PATCH 3/7] x86: wire up preadv2 and pwritev2
@ 2016-02-22 17:07   ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-22 17:07 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased due to newly added syscalls]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
 arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index cb713df..b30dd81 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -384,3 +384,5 @@
 375	i386	membarrier		sys_membarrier
 376	i386	mlock2			sys_mlock2
 377	i386	copy_file_range		sys_copy_file_range
+378	i386	preadv2			sys_preadv2
+379	i386	pwritev2		sys_pwritev2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index dc1040a..31cec92 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -333,6 +333,8 @@
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
 326	common	copy_file_range		sys_copy_file_range
+327	64	preadv2			sys_preadv2
+328	64	pwritev2		sys_pwritev2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/7] vfs: add the RWF_HIPRI flag for preadv2/pwritev2
@ 2016-02-22 17:07   ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-22 17:07 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

This adds a flag that tells the file system that this is a high-priority
request for which it is worth polling the hardware.  The flag is purely
advisory and can be ignored if not supported.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c         | 6 ++++--
 include/linux/fs.h      | 1 +
 include/uapi/linux/fs.h | 3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 38b9afa..3b3fb22 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -697,10 +697,12 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 	struct kiocb kiocb;
 	ssize_t ret;
 
-	if (flags)
+	if (flags & ~RWF_HIPRI)
 		return -EOPNOTSUPP;
 
 	init_sync_kiocb(&kiocb, filp);
+	if (flags & RWF_HIPRI)
+		kiocb.ki_flags |= IOCB_HIPRI;
 	kiocb.ki_pos = *ppos;
 
 	ret = fn(&kiocb, iter);
@@ -715,7 +717,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 {
 	ssize_t ret = 0;
 
-	if (flags)
+	if (flags & ~RWF_HIPRI)
 		return -EOPNOTSUPP;
 
 	while (iov_iter_count(iter)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 875277a..a1f731c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -320,6 +320,7 @@ struct writeback_control;
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
+#define IOCB_HIPRI		(1 << 3)
 
 struct kiocb {
 	struct file		*ki_filp;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 149bec8..d246339 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -304,4 +304,7 @@ struct fsxattr {
 #define SYNC_FILE_RANGE_WRITE		2
 #define SYNC_FILE_RANGE_WAIT_AFTER	4
 
+/* flags for preadv2/pwritev2: */
+#define RWF_HIPRI			0x00000001 /* high priority request, poll if possible */
+
 #endif /* _UAPI_LINUX_FS_H */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread
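
The flag handling in this patch can be read as two steps: reject any bit other than RWF_HIPRI with -EOPNOTSUPP, then translate RWF_HIPRI into IOCB_HIPRI on the kiocb. Below is a small user-space model of that logic, assuming nothing beyond the two constants shown in the diff. The function rwf_to_iocb_flags is a hypothetical name used for illustration; in the patch the logic is inline in do_iter_readv_writev().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RWF_HIPRI	0x00000001	/* per-call flag, value from the patch */
#define IOCB_HIPRI	(1 << 3)	/* kiocb flag, value from the patch */

/*
 * Hypothetical model of the check added to do_iter_readv_writev():
 * returns 0 and stores the kiocb flags, or -EOPNOTSUPP when the caller
 * passed a flag bit this kernel does not know about.
 */
int rwf_to_iocb_flags(int flags, int *ki_flags)
{
	assert(ki_flags != NULL);

	if (flags & ~RWF_HIPRI)
		return -EOPNOTSUPP;

	*ki_flags = 0;
	if (flags & RWF_HIPRI)
		*ki_flags |= IOCB_HIPRI;
	return 0;
}
```

Rejecting unknown bits up front is what lets later kernels add flags safely: an application can probe for a flag and fall back when it sees -EOPNOTSUPP.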

* [PATCH 5/7] direct-io: only use block polling if explicitly requested
  2016-02-22 17:07 selective block polling and preadv2/pwritev2 revisited V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2016-02-22 17:07   ` Christoph Hellwig
@ 2016-02-22 17:07 ` Christoph Hellwig
  2016-02-26 21:58     ` Jeff Moyer
  2016-02-22 17:07   ` Christoph Hellwig
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-22 17:07 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/direct-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index d6a9012..0a8d937 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -445,7 +445,8 @@ static struct bio *dio_await_one(struct dio *dio)
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		dio->waiter = current;
 		spin_unlock_irqrestore(&dio->bio_lock, flags);
-		if (!blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
+		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
+		    !blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
 			io_schedule();
 		/* wake up sets us TASK_RUNNING */
 		spin_lock_irqsave(&dio->bio_lock, flags);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread
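
The one-line change in dio_await_one() relies on short-circuit evaluation: without IOCB_HIPRI the waiter goes straight to io_schedule() and blk_poll() is never called. The following user-space model illustrates that ordering under stated assumptions. The struct poll_stub and the functions stub_poll and must_sleep are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOCB_HIPRI	(1 << 3)	/* value from patch 4/7 */

/* Hypothetical stand-in for blk_poll(): counts how often it is called. */
struct poll_stub {
	int calls;
	bool completes;	/* what the stub reports: did polling find the I/O? */
};

bool stub_poll(struct poll_stub *p)
{
	assert(p != NULL);
	p->calls++;
	return p->completes;
}

/*
 * Hypothetical model of the new condition: returns true when the waiter
 * has to sleep (io_schedule() in the patch), false when polling already
 * found the completion.
 */
bool must_sleep(int ki_flags, struct poll_stub *p)
{
	return !(ki_flags & IOCB_HIPRI) || !stub_poll(p);
}
```

In this model a request without IOCB_HIPRI leaves the call counter untouched, which mirrors the intent of the patch: ordinary direct I/O no longer spends CPU time polling.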

* [PATCH 6/7] blk-mq: enable polling support by default
@ 2016-02-22 17:07   ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-22 17:07 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

Now that applications need to explicitly ask for polling we can enable it
by default in blk-mq drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4571ef1..458f6ef 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -499,7 +499,8 @@ struct request_queue {
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
-				 (1 << QUEUE_FLAG_SAME_COMP))
+				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
+				 (1 << QUEUE_FLAG_POLL))
 
 static inline void queue_lockdep_assert_held(struct request_queue *q)
 {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 7/7] block, directio: set a REQ_POLL flag when submitting polled bios
@ 2016-02-22 17:07   ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-22 17:07 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c          | 9 +++++++--
 fs/direct-io.c            | 4 ++++
 include/linux/blk_types.h | 2 ++
 include/linux/blkdev.h    | 1 +
 4 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b83d297..81b4b8b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3335,13 +3335,18 @@ void blk_finish_plug(struct blk_plug *plug)
 }
 EXPORT_SYMBOL(blk_finish_plug);
 
+inline bool blk_queue_can_poll(struct request_queue *q)
+{
+	return q->mq_ops && q->mq_ops->poll &&
+		test_bit(QUEUE_FLAG_POLL, &q->queue_flags);
+}
+
 bool blk_poll(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_plug *plug;
 	long state;
 
-	if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
-	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+	if (!blk_queue_can_poll(q) || !blk_qc_t_valid(cookie))
 		return false;
 
 	plug = current->plug;
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 0a8d937..ba5ba7e 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1197,6 +1197,10 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	dio->inode = inode;
 	dio->rw = iov_iter_rw(iter) == WRITE ? WRITE_ODIRECT : READ;
 
+	if ((iocb->ki_flags & IOCB_HIPRI) &&
+	    blk_queue_can_poll(bdev_get_queue(bdev)))
+		dio->rw |= REQ_POLL;
+
 	/*
 	 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
 	 * so that we can call ->fsync.
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 86a38ea..d667bb4 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -161,6 +161,7 @@ enum rq_flag_bits {
 	__REQ_INTEGRITY,	/* I/O includes block integrity payload */
 	__REQ_FUA,		/* forced unit access */
 	__REQ_FLUSH,		/* request for cache flush */
+	__REQ_POLL,		/* request polling for completion */
 
 	/* bio only flags */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
@@ -202,6 +203,7 @@ enum rq_flag_bits {
 #define REQ_WRITE_SAME		(1ULL << __REQ_WRITE_SAME)
 #define REQ_NOIDLE		(1ULL << __REQ_NOIDLE)
 #define REQ_INTEGRITY		(1ULL << __REQ_INTEGRITY)
+#define REQ_POLL		(1ULL << __REQ_POLL)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 458f6ef..d79353f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -824,6 +824,7 @@ extern int blk_execute_rq(struct request_queue *, struct gendisk *,
 extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
 				  struct request *, int, rq_end_io_fn *);
 
+bool blk_queue_can_poll(struct request_queue *q);
 bool blk_poll(struct request_queue *q, blk_qc_t cookie);
 
 static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 39+ messages in thread
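
This patch factors the "can this queue poll" test into blk_queue_can_poll() and uses it at submission time to decide whether to set REQ_POLL. A compact user-space model of both decisions is sketched below. All model_* names, MODEL_QUEUE_FLAG_POLL and MODEL_REQ_POLL are hypothetical; in particular the bit positions are placeholders and not the kernel's values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IOCB_HIPRI		(1 << 3)	/* value from patch 4/7 */
#define MODEL_QUEUE_FLAG_POLL	0		/* hypothetical bit number */
#define MODEL_REQ_POLL		(1ULL << 0)	/* hypothetical bit value */

struct model_mq_ops {
	bool (*poll)(void);	/* stands in for mq_ops->poll */
};

struct model_queue {
	const struct model_mq_ops *mq_ops;
	unsigned long queue_flags;
};

/* Trivial poll callback so a queue can advertise poll support. */
bool model_poll(void)
{
	return true;
}

/* Model of blk_queue_can_poll(): blk-mq queue, poll hook, flag set. */
bool model_queue_can_poll(const struct model_queue *q)
{
	assert(q != NULL);
	return q->mq_ops && q->mq_ops->poll &&
		(q->queue_flags & (1UL << MODEL_QUEUE_FLAG_POLL));
}

/* Model of the do_blockdev_direct_IO() hunk: mark the I/O as polled. */
unsigned long long model_rw_bits(int ki_flags, const struct model_queue *q)
{
	unsigned long long rw = 0;

	if ((ki_flags & IOCB_HIPRI) && model_queue_can_poll(q))
		rw |= MODEL_REQ_POLL;
	return rw;
}
```

The polled bit is set only when both the caller asked for high priority and the queue is able to poll, so a driver that later inspects the bit would not see it on queues without a poll hook.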

* RE: selective block polling and preadv2/pwritev2 revisited V2
  2016-02-22 17:07 selective block polling and preadv2/pwritev2 revisited V2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2016-02-22 17:07   ` Christoph Hellwig
@ 2016-02-26 15:06 ` Stephen Bates
  2016-02-26 21:18   ` Jeff Moyer
  8 siblings, 0 replies; 39+ messages in thread
From: Stephen Bates @ 2016-02-26 15:06 UTC (permalink / raw)
  To: Christoph Hellwig, viro, axboe
  Cc: milosz, linux-fsdevel, linux-block, linux-api

> 
> This series allows to selectively enable/disable polling for completions in the
> block layer on a per-I/O basis.  For this it resurrects the
> preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which are
> much simpler now due to VFS changes that happened in the meantime).
> That approach also had a man page update prepared, which I will resubmit
> with the current flags once this series makes it in.
> 
> Polling for block I/O is important to reduce the latency on flash and post-flash
> storage technologies.  On the fastest NVMe controller I have access to it
> almost halves latencies from over 7 microseconds to about 4 microseconds.
> But it is only useful if we actually care about the latency of this particular I/O,
> and generally is a waste if enabled for all I/O to a given device.  This series
> uses the per-I/O flags in preadv2/pwritev2 to control this behavior.  The
> alternative would be a new O_* flag set at open time or using fcntl, but this is
> still too coarse-grained for some applications and we're starting to run out
> of open flags.

Thanks Christoph for re-submitting this. I for one am very supportive of being able to set priority (and other) flags on a per-I/O basis. I did some testing of this on an NVMe SSD that uses DRAM rather than NAND as its backing store. My absolute numbers are a bit worse than yours, but the improvement of HIPRI over a normal I/O was about the same with one thread (3-4us) and a little more (6-7us) at a higher thread count.

I used a fork of fio with a (rather ugly) hack to enable the new syscalls [1]. I then tested this on a per-thread basis using the following simple shell script.

#!/bin/bash

FIO=/home/mtr/batesste/fio-git/fio
FILE=/dev/nvme0n1
THREADS=5
TIME=30

$FIO --name lowpri --filename=$FILE --size=1G --direct=1 \
    --ioengine=pvsync --rw=randread --bs=4k --runtime=$TIME \
    --numjobs=$THREADS --iodepth=1 --randrepeat=0 \
    --refill_buffers \
    --name hipri --filename=$FILE --size=1G --direct=1 \
    --ioengine=pv2sync --rw=randread --bs=4k --runtime=$TIME \
    --numjobs=$THREADS --iodepth=1 --randrepeat=0 \
    --refill_buffers

I also reviewed the code and it all looks good to me!

For the series:

Reviewed-by: Stephen Bates <stephen.bates@pmcs.com>
Tested-by: Stephen Bates <stephen.bates@pmcs.com> 

> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
> but I'd like to concentrate on this one for now.  Examples are: non-blocking
> reads (the original purpose), per-I/O O_SYNC, user space support for T10
> DIF/DIX application tags and probably some more.
> 

 Totally agree!

[1] https://github.com/sbates130272/fio/tree/hipri

^ permalink raw reply	[flat|nested] 39+ messages in thread
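
For readers who want to exercise the new syscall without the fio fork, the following is a minimal user-space sketch of a single high-priority read. It is written under several assumptions: a 64-bit Linux target (the offset is passed in pos_l and pos_h is left at 0), libc headers that may or may not define SYS_preadv2, and the RWF_HIPRI value from patch 4/7. The names read_prefer_hipri and demo_roundtrip are hypothetical, and the fallback to preadv() is a design choice of this sketch, not something the series prescribes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001	/* value from patch 4/7 */
#endif

/*
 * Hypothetical helper: read len bytes at offset and ask for polling.
 * Falls back to preadv() when the kernel lacks preadv2 (ENOSYS) or
 * rejects the flag (EOPNOTSUPP). Assumes a 64-bit kernel ABI.
 */
ssize_t read_prefer_hipri(int fd, void *buf, size_t len, off_t offset)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	assert(buf != NULL);
#ifdef SYS_preadv2
	long ret = syscall(SYS_preadv2, (unsigned long)fd, &iov, 1UL,
			   (unsigned long)offset, 0UL, RWF_HIPRI);

	if (ret >= 0 || (errno != ENOSYS && errno != EOPNOTSUPP))
		return ret;
#endif
	return preadv(fd, &iov, 1, offset);
}

/* Round trip on a temporary file; returns 0 on success, -1 otherwise. */
int demo_roundtrip(void)
{
	char buf[5];
	ssize_t n;
	int ok;
	FILE *f = tmpfile();

	if (!f)
		return -1;
	if (fwrite("hello", 1, 5, f) != 5 || fflush(f) != 0) {
		fclose(f);
		return -1;
	}
	n = read_prefer_hipri(fileno(f), buf, sizeof(buf), 0);
	ok = (n == 5 && memcmp(buf, "hello", 5) == 0);
	fclose(f);
	return ok ? 0 : -1;
}
```

On a buffered temporary file the flag has no effect, since it is advisory. Measuring the latency difference reported above needs an O_DIRECT file descriptor on a device whose queue supports polling.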

* Re: [PATCH 6/7] blk-mq: enable polling support by default
@ 2016-02-26 20:44     ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 20:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Hi, Christoph,

Christoph Hellwig <hch@lst.de> writes:

> Now that applications need to explicitly ask for polling we can enable it
> by default in blk-mq drivers.

I don't think this is a good idea.  I'd just enable it in nvme and the
micron driver for now.

Cheers,
Jeff

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/blkdev.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4571ef1..458f6ef 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -499,7 +499,8 @@ struct request_queue {
>  
>  #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_STACKABLE)	|	\
> -				 (1 << QUEUE_FLAG_SAME_COMP))
> +				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
> +				 (1 << QUEUE_FLAG_POLL))
>  
>  static inline void queue_lockdep_assert_held(struct request_queue *q)
>  {

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] block, directio: set a REQ_POLL flag when submitting polled bios
@ 2016-02-26 21:10     ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 21:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Hi, Christoph,

REQ_POLL is set but never checked.  Is part of the patch missing, or was
that intentional?

-Jeff

Christoph Hellwig <hch@lst.de> writes:

> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-core.c          | 9 +++++++--
>  fs/direct-io.c            | 4 ++++
>  include/linux/blk_types.h | 2 ++
>  include/linux/blkdev.h    | 1 +
>  4 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b83d297..81b4b8b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3335,13 +3335,18 @@ void blk_finish_plug(struct blk_plug *plug)
>  }
>  EXPORT_SYMBOL(blk_finish_plug);
>  
> +inline bool blk_queue_can_poll(struct request_queue *q)
> +{
> +	return q->mq_ops && q->mq_ops->poll &&
> +		test_bit(QUEUE_FLAG_POLL, &q->queue_flags);
> +}
> +
>  bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>  {
>  	struct blk_plug *plug;
>  	long state;
>  
> -	if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
> -	    !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
> +	if (!blk_queue_can_poll(q) || !blk_qc_t_valid(cookie))
>  		return false;
>  
>  	plug = current->plug;
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 0a8d937..ba5ba7e 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1197,6 +1197,10 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	dio->inode = inode;
>  	dio->rw = iov_iter_rw(iter) == WRITE ? WRITE_ODIRECT : READ;
>  
> +	if ((iocb->ki_flags & IOCB_HIPRI) &&
> +	    blk_queue_can_poll(bdev_get_queue(bdev)))
> +		dio->rw |= REQ_POLL;
> +
>  	/*
>  	 * For AIO O_(D)SYNC writes we need to defer completions to a workqueue
>  	 * so that we can call ->fsync.
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 86a38ea..d667bb4 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -161,6 +161,7 @@ enum rq_flag_bits {
>  	__REQ_INTEGRITY,	/* I/O includes block integrity payload */
>  	__REQ_FUA,		/* forced unit access */
>  	__REQ_FLUSH,		/* request for cache flush */
> +	__REQ_POLL,		/* request polling for completion */
>  
>  	/* bio only flags */
>  	__REQ_RAHEAD,		/* read ahead, can fail anytime */
> @@ -202,6 +203,7 @@ enum rq_flag_bits {
>  #define REQ_WRITE_SAME		(1ULL << __REQ_WRITE_SAME)
>  #define REQ_NOIDLE		(1ULL << __REQ_NOIDLE)
>  #define REQ_INTEGRITY		(1ULL << __REQ_INTEGRITY)
> +#define REQ_POLL		(1ULL << __REQ_POLL)
>  
>  #define REQ_FAILFAST_MASK \
>  	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 458f6ef..d79353f 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -824,6 +824,7 @@ extern int blk_execute_rq(struct request_queue *, struct gendisk *,
>  extern void blk_execute_rq_nowait(struct request_queue *, struct gendisk *,
>  				  struct request *, int, rq_end_io_fn *);
>  
> +bool blk_queue_can_poll(struct request_queue *q);
>  bool blk_poll(struct request_queue *q, blk_qc_t cookie);
>  
>  static inline struct request_queue *bdev_get_queue(struct block_device *bdev)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited V2
@ 2016-02-26 21:18   ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 21:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> This series allows selectively enabling/disabling polling for completions
> in the block layer on a per-I/O basis.  For this it resurrects the
> preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
> are much simpler now due to VFS changes that happened in the meantime).
> That approach also had a man page update prepared, which I will resubmit
> with the current flags once this series makes it in.

It would be helpful for reviewers if you submitted the man page at the
same time, in my opinion.

Do you have any plans on adding polling support to the buffered path?

> Polling for block I/O is important to reduce the latency on flash and
> post-flash storage technologies.  On the fastest NVMe controller I have
> access to it almost halves latencies from over 7 microseconds to about 4
> microseconds.  But it is only useful if we actually care about the latency
> of this particular I/O, and generally is a waste if enabled for all I/O
> to a given device.  This series uses the per-I/O flags in preadv2/pwritev2
> to control this behavior.  The alternative would be a new O_* flag set
> at open time or using fcntl, but this is still too coarse-grained for some
> applications and we're starting to run out of open flags.

I agree that a per-I/O mechanism is better suited to this application.
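
For illustration only, a per-I/O polling request from user space might look like the sketch below.  It assumes a C library that ships the preadv2() wrapper (glibc 2.26 or later) and uses the RWF_HIPRI value from patch 4/7.  read_hipri() is a hypothetical helper name, and the fallback to preadv() is a choice made for this sketch, not something the series prescribes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* open() for callers of the helper */
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>	/* close() for callers of the helper */

/* Fallback for older headers; value taken from patch 4/7 of this series. */
#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001
#endif

/*
 * Hypothetical helper: read len bytes at offset off and ask the kernel
 * to poll for completion of this one I/O.  If the kernel or C library
 * rejects the call, retry with plain preadv() (assumed policy).
 */
static ssize_t read_hipri(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret;

	assert(buf != NULL);

	ret = preadv2(fd, &iov, 1, off, RWF_HIPRI);
	if (ret < 0 &&
	    (errno == ENOSYS || errno == EOPNOTSUPP || errno == EINVAL))
		ret = preadv(fd, &iov, 1, off);
	return ret;
}
```

Other reads on the same file descriptor can keep using pread() or preadv() and are unaffected, which is the point of making this a per-I/O flag rather than an open flag.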

> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
> but I'd like to concentrate on this one for now.  Examples are: non-blocking
> reads (the original purpose), per-I/O O_SYNC, user space support for T10
> DIF/DIX application tags and probably some more.

And my favorite, RWF_ATOMIC.  :)  I agree, it's time we got this
interface in.

See responses to individual patches for my review.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2016-02-26 21:51     ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 21:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> From: Milosz Tanski <milosz@adfin.com>
>
> New syscalls that take a flag argument. This change does not add any
> specific flags.

So, it looks like file systems that don't implement read_iter/write_iter
won't get the flags argument passed along.  I don't think that's a big
deal, as such file systems seem to be in-memory file systems, but I
think it warrants mention in the changelog.
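
A rough user-space model of that dispatch, simplified from what the patch shows: the iter path can carry per-I/O flags in the kiocb, while the loop over the legacy ->read/->write ops has nowhere to put them.  All names ending in _sketch are hypothetical.

```c
#include <assert.h>

/* Hypothetical, reduced file_operations: only the two read entry points. */
struct fops_sketch {
	long (*read_iter)(int ki_flags);	/* iter op: receives per-I/O flags */
	long (*read)(void);			/* legacy op: no flags parameter */
};

/* Sample callbacks that return the flags they were able to see. */
static long iter_read_sketch(int ki_flags) { return ki_flags; }
static long legacy_read_sketch(void) { return 0; }

static const struct fops_sketch iter_fs_sketch = {
	.read_iter = iter_read_sketch,
};
static const struct fops_sketch legacy_fs_sketch = {
	.read = legacy_read_sketch,
};

/* Models the iter_fn / fn choice made in do_readv_writev(). */
static long do_readv_sketch(const struct fops_sketch *f, int flags)
{
	if (f->read_iter)
		return f->read_iter(flags);	/* flags reach the file system */
	return f->read();			/* flags are dropped here */
}
```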

Also, I think you added a stray newline below:

> +static long do_compat_pwritev64(unsigned long fd,
>  				   const struct compat_iovec __user *vec,
> -				   unsigned long vlen, loff_t pos)
> +				   unsigned long vlen, loff_t pos, int flags)
>  {
>  	struct fd f;
>  	ssize_t ret;
>  
>  	if (pos < 0)
>  		return -EINVAL;
> +
>  	f = fdget(fd);
>  	if (!f.file)
>  		return -EBADF;
>  	ret = -ESPIPE;
>  	if (f.file->f_mode & FMODE_PWRITE)
> -		ret = compat_writev(f.file, vec, vlen, &pos);
> +		ret = compat_writev(f.file, vec, vlen, &pos, flags);
>  	fdput(f);
>  	return ret;
>  }

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/7] vfs: pass a flags argument to vfs_readv/vfs_writev
@ 2016-02-26 21:52     ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 21:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> This way we can set kiocb flags also from the sync read/write path.
>
> Signed-off-by: Milosz Tanski <milosz@adfin.com>
> [hch: rebased on top of my kiocb changes]
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks sane to me.

Acked-by: Jeff Moyer <jmoyer@redhat.com>

> ---
>  fs/nfsd/vfs.c      |  4 ++--
>  fs/read_write.c    | 44 ++++++++++++++++++++++++++------------------
>  fs/splice.c        |  2 +-
>  include/linux/fs.h |  4 ++--
>  4 files changed, 31 insertions(+), 23 deletions(-)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 5d2a57e..d40010e 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -870,7 +870,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
>  
>  	oldfs = get_fs();
>  	set_fs(KERNEL_DS);
> -	host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
> +	host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
>  	set_fs(oldfs);
>  	return nfsd_finish_read(file, count, host_err);
>  }
> @@ -957,7 +957,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
>  
>  	/* Write the data. */
>  	oldfs = get_fs(); set_fs(KERNEL_DS);
> -	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
> +	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
>  	set_fs(oldfs);
>  	if (host_err < 0)
>  		goto out_nfserr;
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 324ec27..7d453c3 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -692,11 +692,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
>  EXPORT_SYMBOL(iov_shorten);
>  
>  static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
> -		loff_t *ppos, iter_fn_t fn)
> +		loff_t *ppos, iter_fn_t fn, int flags)
>  {
>  	struct kiocb kiocb;
>  	ssize_t ret;
>  
> +	if (flags)
> +		return -EOPNOTSUPP;
> +
>  	init_sync_kiocb(&kiocb, filp);
>  	kiocb.ki_pos = *ppos;
>  
> @@ -708,10 +711,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
>  
>  /* Do it by hand, with file-ops */
>  static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
> -		loff_t *ppos, io_fn_t fn)
> +		loff_t *ppos, io_fn_t fn, int flags)
>  {
>  	ssize_t ret = 0;
>  
> +	if (flags)
> +		return -EOPNOTSUPP;
> +
>  	while (iov_iter_count(iter)) {
>  		struct iovec iovec = iov_iter_iovec(iter);
>  		ssize_t nr;
> @@ -812,7 +818,8 @@ out:
>  
>  static ssize_t do_readv_writev(int type, struct file *file,
>  			       const struct iovec __user * uvector,
> -			       unsigned long nr_segs, loff_t *pos)
> +			       unsigned long nr_segs, loff_t *pos,
> +			       int flags)
>  {
>  	size_t tot_len;
>  	struct iovec iovstack[UIO_FASTIOV];
> @@ -844,9 +851,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
>  	}
>  
>  	if (iter_fn)
> -		ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
> +		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
>  	else
> -		ret = do_loop_readv_writev(file, &iter, pos, fn);
> +		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
>  
>  	if (type != READ)
>  		file_end_write(file);
> @@ -863,27 +870,27 @@ out:
>  }
>  
>  ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
> -		  unsigned long vlen, loff_t *pos)
> +		  unsigned long vlen, loff_t *pos, int flags)
>  {
>  	if (!(file->f_mode & FMODE_READ))
>  		return -EBADF;
>  	if (!(file->f_mode & FMODE_CAN_READ))
>  		return -EINVAL;
>  
> -	return do_readv_writev(READ, file, vec, vlen, pos);
> +	return do_readv_writev(READ, file, vec, vlen, pos, flags);
>  }
>  
>  EXPORT_SYMBOL(vfs_readv);
>  
>  ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
> -		   unsigned long vlen, loff_t *pos)
> +		   unsigned long vlen, loff_t *pos, int flags)
>  {
>  	if (!(file->f_mode & FMODE_WRITE))
>  		return -EBADF;
>  	if (!(file->f_mode & FMODE_CAN_WRITE))
>  		return -EINVAL;
>  
> -	return do_readv_writev(WRITE, file, vec, vlen, pos);
> +	return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
>  }
>  
>  EXPORT_SYMBOL(vfs_writev);
> @@ -896,7 +903,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
>  
>  	if (f.file) {
>  		loff_t pos = file_pos_read(f.file);
> -		ret = vfs_readv(f.file, vec, vlen, &pos);
> +		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
>  		if (ret >= 0)
>  			file_pos_write(f.file, pos);
>  		fdput_pos(f);
> @@ -916,7 +923,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
>  
>  	if (f.file) {
>  		loff_t pos = file_pos_read(f.file);
> -		ret = vfs_writev(f.file, vec, vlen, &pos);
> +		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
>  		if (ret >= 0)
>  			file_pos_write(f.file, pos);
>  		fdput_pos(f);
> @@ -948,7 +955,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
>  	if (f.file) {
>  		ret = -ESPIPE;
>  		if (f.file->f_mode & FMODE_PREAD)
> -			ret = vfs_readv(f.file, vec, vlen, &pos);
> +			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
>  		fdput(f);
>  	}
>  
> @@ -972,7 +979,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
>  	if (f.file) {
>  		ret = -ESPIPE;
>  		if (f.file->f_mode & FMODE_PWRITE)
> -			ret = vfs_writev(f.file, vec, vlen, &pos);
> +			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
>  		fdput(f);
>  	}
>  
> @@ -986,7 +993,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
>  
>  static ssize_t compat_do_readv_writev(int type, struct file *file,
>  			       const struct compat_iovec __user *uvector,
> -			       unsigned long nr_segs, loff_t *pos)
> +			       unsigned long nr_segs, loff_t *pos,
> +			       int flags)
>  {
>  	compat_ssize_t tot_len;
>  	struct iovec iovstack[UIO_FASTIOV];
> @@ -1018,9 +1026,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
>  	}
>  
>  	if (iter_fn)
> -		ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
> +		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
>  	else
> -		ret = do_loop_readv_writev(file, &iter, pos, fn);
> +		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
>  
>  	if (type != READ)
>  		file_end_write(file);
> @@ -1049,7 +1057,7 @@ static size_t compat_readv(struct file *file,
>  	if (!(file->f_mode & FMODE_CAN_READ))
>  		goto out;
>  
> -	ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
> +	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
>  
>  out:
>  	if (ret > 0)
> @@ -1126,7 +1134,7 @@ static size_t compat_writev(struct file *file,
>  	if (!(file->f_mode & FMODE_CAN_WRITE))
>  		goto out;
>  
> -	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
> +	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
>  
>  out:
>  	if (ret > 0)
> diff --git a/fs/splice.c b/fs/splice.c
> index 82bc0d6..3dc1426 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -577,7 +577,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
>  	old_fs = get_fs();
>  	set_fs(get_ds());
>  	/* The cast to a user pointer is valid due to the set_fs() */
> -	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
> +	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
>  	set_fs(old_fs);
>  
>  	return res;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ae68100..875277a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1709,9 +1709,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
>  extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
>  extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
>  extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
> -		unsigned long, loff_t *);
> +		unsigned long, loff_t *, int);
>  extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
> -		unsigned long, loff_t *);
> +		unsigned long, loff_t *, int);
>  extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
>  				   loff_t, size_t, unsigned int);
>  extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/7] vfs: add the RWF_HIPRI flag for preadv2/pwritev2
@ 2016-02-26 21:56     ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 21:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> This adds a flag that tells the file system that this is a high priority
> request for which it's worth polling the hardware.  The flag is purely
> advisory and can be ignored if not supported.

I'm not in love with the HIPRI name, but I don't have a better one in
mind.  So...

Acked-by: Jeff Moyer <jmoyer@redhat.com>

>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/read_write.c         | 6 ++++--
>  include/linux/fs.h      | 1 +
>  include/uapi/linux/fs.h | 3 +++
>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 38b9afa..3b3fb22 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -697,10 +697,12 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
>  	struct kiocb kiocb;
>  	ssize_t ret;
>  
> -	if (flags)
> +	if (flags & ~RWF_HIPRI)
>  		return -EOPNOTSUPP;
>  
>  	init_sync_kiocb(&kiocb, filp);
> +	if (flags & RWF_HIPRI)
> +		kiocb.ki_flags |= IOCB_HIPRI;
>  	kiocb.ki_pos = *ppos;
>  
>  	ret = fn(&kiocb, iter);
> @@ -715,7 +717,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
>  {
>  	ssize_t ret = 0;
>  
> -	if (flags)
> +	if (flags & ~RWF_HIPRI)
>  		return -EOPNOTSUPP;
>  
>  	while (iov_iter_count(iter)) {
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 875277a..a1f731c 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -320,6 +320,7 @@ struct writeback_control;
>  #define IOCB_EVENTFD		(1 << 0)
>  #define IOCB_APPEND		(1 << 1)
>  #define IOCB_DIRECT		(1 << 2)
> +#define IOCB_HIPRI		(1 << 3)
>  
>  struct kiocb {
>  	struct file		*ki_filp;
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 149bec8..d246339 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -304,4 +304,7 @@ struct fsxattr {
>  #define SYNC_FILE_RANGE_WRITE		2
>  #define SYNC_FILE_RANGE_WAIT_AFTER	4
>  
> +/* flags for preadv2/pwritev2: */
> +#define RWF_HIPRI			0x00000001 /* high priority request, poll if possible */
> +
>  #endif /* _UAPI_LINUX_FS_H */
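
The flag check and translation in the fs/read_write.c hunk above can be modeled outside the kernel as follows.  This is a sketch under stated assumptions: the two constants use the values from the patch, and rwf_to_iocb_flags() is a hypothetical name for logic that the patch keeps inline in do_iter_readv_writev().

```c
#include <assert.h>
#include <errno.h>

/* Values as defined by the patch; guarded in case headers provide them. */
#ifndef RWF_HIPRI
#define RWF_HIPRI	0x00000001
#endif
#ifndef IOCB_HIPRI
#define IOCB_HIPRI	(1 << 3)
#endif

/*
 * Hypothetical helper: reject any flag other than RWF_HIPRI, otherwise
 * translate the user-visible flag into the kiocb flag.
 * Returns 0 on success or -EOPNOTSUPP for unknown flags.
 */
static int rwf_to_iocb_flags(int rwf_flags, int *ki_flags)
{
	if (rwf_flags & ~RWF_HIPRI)
		return -EOPNOTSUPP;
	if (rwf_flags & RWF_HIPRI)
		*ki_flags |= IOCB_HIPRI;
	return 0;
}
```

Masking with ~RWF_HIPRI rather than testing for non-zero flags keeps unknown bits rejected, so later flags can be added without old kernels silently ignoring them.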

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/7] direct-io: only use block polling if explicitly requested
@ 2016-02-26 21:58     ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-26 21:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> Signed-off-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

> ---
>  fs/direct-io.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index d6a9012..0a8d937 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -445,7 +445,8 @@ static struct bio *dio_await_one(struct dio *dio)
>  		__set_current_state(TASK_UNINTERRUPTIBLE);
>  		dio->waiter = current;
>  		spin_unlock_irqrestore(&dio->bio_lock, flags);
> -		if (!blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
> +		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
> +		    !blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
>  			io_schedule();
>  		/* wake up sets us TASK_RUNNING */
>  		spin_lock_irqsave(&dio->bio_lock, flags);
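
For readers: the condition changed in the hunk above can be modelled in user
space as a small truth table.  This is only a sketch; dio_must_sleep() and
the two stub poll functions are hypothetical stand-ins for the logic around
blk_poll() and io_schedule(), and IOCB_HIPRI uses the value from patch 4/7.

```c
#include <stdbool.h>

#define IOCB_HIPRI (1 << 3)	/* value from patch 4/7 */

static int poll_calls;		/* counts how often a stub poll ran */

/* Stub standing in for a poll that finds the completion. */
static bool poll_completes(void) { poll_calls++; return true; }

/* Stub standing in for a poll that gives up without a completion. */
static bool poll_gives_up(void) { poll_calls++; return false; }

/*
 * Returns true when the waiter has to sleep (io_schedule() in the real
 * code).  Without IOCB_HIPRI the poll function is never called, thanks
 * to the short-circuit evaluation of ||.
 */
static bool dio_must_sleep(int ki_flags, bool (*poll)(void))
{
	return !(ki_flags & IOCB_HIPRI) || !poll();
}
```

The point of the change is the first operand: I/O that did not ask for
polling goes straight to sleep and never spends CPU time in the poll loop.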

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/7] blk-mq: enable polling support by default
@ 2016-02-27  8:56       ` Christoph Hellwig
  0 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-27  8:56 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Fri, Feb 26, 2016 at 03:44:30PM -0500, Jeff Moyer wrote:
> Hi, Christoph,
> 
> Christoph Hellwig <hch@lst.de> writes:
> 
> > Now that applications need to explicitly ask for polling we can enable it
> > by default in blk-mq drivers.
> 
> I don't think this is a good idea.  I'd just enable it in nvme and the
> micron driver for now.

I think my wording was a bit unclear - the flag to enable polling is
enabled everywhere.  You still need to implement the poll blk_mq operation,
which currently only NVMe implements.
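
For readers: the distinction made above (a queue flag that is set
everywhere versus a poll operation that a driver may or may not provide)
can be sketched with a simplified user-space model.  All names below
(model_queue, model_mq_ops, model_blk_poll, nvme_like_poll) are
hypothetical and do not match the kernel structures field for field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-driver operations; poll is optional and may be NULL. */
struct model_mq_ops {
	bool (*poll)(void);
};

/* Per-queue state: the "polling enabled" flag plus the driver ops. */
struct model_queue {
	bool poll_flag;
	const struct model_mq_ops *ops;
};

/*
 * Polling is attempted only if the queue flag is set AND the driver
 * supplies a poll operation.  Otherwise report "nothing completed" so
 * the caller falls back to sleeping.
 */
static bool model_blk_poll(const struct model_queue *q)
{
	if (!q->poll_flag || !q->ops || !q->ops->poll)
		return false;
	return q->ops->poll();
}

/* A driver that implements poll, and one that does not. */
static bool nvme_like_poll(void) { return true; }
static const struct model_mq_ops ops_with_poll = { .poll = nvme_like_poll };
static const struct model_mq_ops ops_without_poll = { .poll = NULL };
```

In this model, setting poll_flag on every queue is harmless: a driver
without a poll operation simply never polls.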

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] block, directio: set a REQ_POLL flag when submitting polled bios
  2016-02-26 21:10     ` Jeff Moyer
  (?)
@ 2016-02-27  8:57     ` Christoph Hellwig
  2016-02-29 14:28       ` Jeff Moyer
  -1 siblings, 1 reply; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-27  8:57 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Fri, Feb 26, 2016 at 04:10:51PM -0500, Jeff Moyer wrote:
> Hi, Christoph,
> 
> REQ_POLL is set but never checked.  Is part of the patch missing, or was
> that intentional?

Jens insisted that I add it, even if there aren't users.  I disagree,
but in the end he'll have to apply the series..

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited V2
  2016-02-26 21:18   ` Jeff Moyer
  (?)
@ 2016-02-27  8:57   ` Christoph Hellwig
  2016-02-29  1:30       ` Damien Le Moal
  2016-02-29 14:59     ` Jeff Moyer
  -1 siblings, 2 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-27  8:57 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Fri, Feb 26, 2016 at 04:18:55PM -0500, Jeff Moyer wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > This series allows to selectively enable/disable polling for completions
> > in the block layer on a per-I/O basis.  For this it resurrects the
> > preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
> > are much simpler now due to VFS changes that happened in the meantime).
> > That approach also had a man page update prepared, which I will resubmit
> > with the current flags once this series makes it in.
> 
> It would be helpful for reviewers if you submitted the man page at the
> same time, in my opinion.

Ok.

> Do you have any plans on adding polling support to the buffered path?

How would that even make sense?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/7] vfs: vfs: Define new syscalls preadv2,pwritev2
  2016-02-26 21:51     ` Jeff Moyer
  (?)
@ 2016-02-27  8:58     ` Christoph Hellwig
  -1 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-27  8:58 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Fri, Feb 26, 2016 at 04:51:24PM -0500, Jeff Moyer wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > From: Milosz Tanski <milosz@adfin.com>
> >
> > New syscalls that take an flag argument. This change does not add any
> > specific flags.
> 
> So, it looks like file systems that don't implement read_iter/write_iter
> won't get the flags argument passed along.  I don't think that's a big
> deal, as such file systems seem to be in-memory file systems, but I
> think it warrants mention in the changelog.

Ok.

> Also, I think you added a stray newline below:
> 
> > +static long do_compat_pwritev64(unsigned long fd,
> >  				   const struct compat_iovec __user *vec,
> > -				   unsigned long vlen, loff_t pos)
> > +				   unsigned long vlen, loff_t pos, int flags)
> >  {
> >  	struct fd f;
> >  	ssize_t ret;
> >  
> >  	if (pos < 0)
> >  		return -EINVAL;
> > +
> >  	f = fdget(fd);
> >  	if (!f.file)
> >  		return -EBADF;
> >  	ret = -ESPIPE;
> >  	if (f.file->f_mode & FMODE_PWRITE)
> > -		ret = compat_writev(f.file, vec, vlen, &pos);
> > +		ret = compat_writev(f.file, vec, vlen, &pos, flags);
> >  	fdput(f);
> >  	return ret;
> >  }

Yeah, no real need to add it, although the new version is definitely
more readable.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/7] vfs: add the RWF_HIPRI flag for preadv2/pwritev2
  2016-02-26 21:56     ` Jeff Moyer
  (?)
@ 2016-02-27  8:58     ` Christoph Hellwig
  -1 siblings, 0 replies; 39+ messages in thread
From: Christoph Hellwig @ 2016-02-27  8:58 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Fri, Feb 26, 2016 at 04:56:03PM -0500, Jeff Moyer wrote:
> Christoph Hellwig <hch@lst.de> writes:
> 
> > This adds a flag that tells the file system that this is a high priority
> > request for which it's worth to poll the hardware.  The flag is purely
> > advisory and can be ignored if not supported.
> 
> I'm not in love with the HIGHPRI name, but I don't have a better one in
> mind.  So...

I don't really like it too much either.  So if someone else has a better
suggestion: you're welcome!
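
For readers: whatever the flag ends up being called, the check that patch
4/7 adds to do_iter_readv_writev() can be modelled in user space as below.
This is a sketch only; rwf_to_iocb_flags() is a hypothetical name, and the
two constants use the values from the patch.

```c
#include <errno.h>

#define RWF_HIPRI	0x00000001	/* user-visible flag, from the patch */
#define IOCB_HIPRI	(1 << 3)	/* in-kernel kiocb flag, from the patch */

/*
 * Translate preadv2/pwritev2 flags into kiocb flags.  Any bit other
 * than RWF_HIPRI is rejected with -EOPNOTSUPP and leaves *ki_flags
 * untouched; RWF_HIPRI is translated into IOCB_HIPRI.
 */
static int rwf_to_iocb_flags(int rwf_flags, int *ki_flags)
{
	if (rwf_flags & ~RWF_HIPRI)
		return -EOPNOTSUPP;
	if (rwf_flags & RWF_HIPRI)
		*ki_flags |= IOCB_HIPRI;
	return 0;
}
```

Rejecting unknown bits up front is what lets later patches add new flags
without old kernels silently ignoring them.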

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited V2
@ 2016-02-29  1:30       ` Damien Le Moal
  0 siblings, 0 replies; 39+ messages in thread
From: Damien Le Moal @ 2016-02-29  1:30 UTC (permalink / raw)
  To: Christoph Hellwig, Jeff Moyer
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api


>On Fri, Feb 26, 2016 at 04:18:55PM -0500, Jeff Moyer wrote:
>> Christoph Hellwig <hch@lst.de> writes:
>> 
>> > This series allows to selectively enable/disable polling for completions
>> > in the block layer on a per-I/O basis.  For this it resurrects the
>> > preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
>> > are much simpler now due to VFS changes that happened in the meantime).
>> > That approach also had a man page update prepared, which I will resubmit
>> > with the current flags once this series makes it in.
>> 
>> It would be helpful for reviewers if you submitted the man page at the
>> same time, in my opinion.
>
>Ok.
>
>> Do you have any plans on adding polling support to the buffered path?
>
>How would that even make sense?

Christoph,

On a page cache miss, poll for the BIO serving the page read instead of
waiting for the page lock to be released (from the device IRQ handler)?
As long as the application is lucky enough to get a fast page allocation,
performance improvements similar to those for direct I/O should be
observed, shouldn't they?  The benefit may be restricted to very random
accesses.  For reasonably sequential accesses, read-ahead would cover,
and obviously there would be no polling there.

So maybe polling could be enabled only if read-ahead is disabled.
Interestingly, if the device+bus is really, really fast, that could result
in a minimal performance difference between a page cache hit and a miss...

Best.

------------------------
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital company
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa, 
Kanagawa, 252-0888 Japan
www.hgst.com 



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/7] blk-mq: enable polling support by default
  2016-02-27  8:56       ` Christoph Hellwig
  (?)
@ 2016-02-29 14:27       ` Jeff Moyer
  -1 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-29 14:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> On Fri, Feb 26, 2016 at 03:44:30PM -0500, Jeff Moyer wrote:
>> Hi, Christoph,
>> 
>> Christoph Hellwig <hch@lst.de> writes:
>> 
>> > Now that applications need to explicitly ask for polling we can enable it
>> > by default in blk-mq drivers.
>> 
>> I don't think this is a good idea.  I'd just enable it in nvme and the
>> micron driver for now.
>
> I think my wording was a bit unclear - the flag to enable polling is
> enabled everywhere.  You still need to implement the poll blk_mq operation,
> which currently only NVMe implements.

Oh, duh.  Yeah, this is fine, then.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/7] block, directio: set a REQ_POLL flag when submitting polled bios
  2016-02-27  8:57     ` Christoph Hellwig
@ 2016-02-29 14:28       ` Jeff Moyer
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-29 14:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> On Fri, Feb 26, 2016 at 04:10:51PM -0500, Jeff Moyer wrote:
>> Hi, Christoph,
>> 
>> REQ_POLL is set but never checked.  Is part of the patch missing, or was
>> that intentional?
>
> Jens insisted that I add it, even if there aren't users.  I disagree,
> but in the end he'll have to apply the series..

OK.  I think it just adds confusion.

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited V2
  2016-02-27  8:57   ` Christoph Hellwig
  2016-02-29  1:30       ` Damien Le Moal
@ 2016-02-29 14:59     ` Jeff Moyer
  1 sibling, 0 replies; 39+ messages in thread
From: Jeff Moyer @ 2016-02-29 14:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Christoph Hellwig <hch@lst.de> writes:

> On Fri, Feb 26, 2016 at 04:18:55PM -0500, Jeff Moyer wrote:
>> Christoph Hellwig <hch@lst.de> writes:
>> 
>> > This series allows to selectively enable/disable polling for completions
>> > in the block layer on a per-I/O basis.  For this it resurrects the
>> > preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
>> > are much simpler now due to VFS changes that happened in the meantime).
>> > That approach also had a man page update prepared, which I will resubmit
>> > with the current flags once this series makes it in.
>> 
>> It would be helpful for reviewers if you submitted the man page at the
>> same time, in my opinion.
>
> Ok.
>
>> Do you have any plans on adding polling support to the buffered path?
>
> How would that even make sense?

I admit I haven't looked into the implementation details, but I was
thinking about O_SYNC writes or reads that missed the cache.

Thinking about this some more, it occurs to me that a file opened with
O_SYNC will still incur context switching.  Do you think we should add
that to the NOTES section of the man page?

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread (last message: 2016-02-29 14:59 UTC)

Thread overview: 39+ messages
2016-02-22 17:07 selective block polling and preadv2/pwritev2 revisited V2 Christoph Hellwig
2016-02-22 17:07 ` [PATCH 1/7] vfs: pass a flags argument to vfs_readv/vfs_writev Christoph Hellwig
2016-02-26 21:52   ` Jeff Moyer
2016-02-26 21:52     ` Jeff Moyer
2016-02-22 17:07 ` [PATCH 2/7] vfs: vfs: Define new syscalls preadv2,pwritev2 Christoph Hellwig
2016-02-22 17:07   ` Christoph Hellwig
2016-02-26 21:51   ` Jeff Moyer
2016-02-26 21:51     ` Jeff Moyer
2016-02-27  8:58     ` Christoph Hellwig
2016-02-22 17:07 ` [PATCH 3/7] x86: wire up preadv2 and pwritev2 Christoph Hellwig
2016-02-22 17:07   ` Christoph Hellwig
2016-02-22 17:07 ` [PATCH 4/7] vfs: add the RWF_HIPRI flag for preadv2/pwritev2 Christoph Hellwig
2016-02-22 17:07   ` Christoph Hellwig
2016-02-26 21:56   ` Jeff Moyer
2016-02-26 21:56     ` Jeff Moyer
2016-02-27  8:58     ` Christoph Hellwig
2016-02-22 17:07 ` [PATCH 5/7] direct-io: only use block polling if explicitly requested Christoph Hellwig
2016-02-26 21:58   ` Jeff Moyer
2016-02-26 21:58     ` Jeff Moyer
2016-02-22 17:07 ` [PATCH 6/7] blk-mq: enable polling support by default Christoph Hellwig
2016-02-22 17:07   ` Christoph Hellwig
2016-02-26 20:44   ` Jeff Moyer
2016-02-26 20:44     ` Jeff Moyer
2016-02-27  8:56     ` Christoph Hellwig
2016-02-27  8:56       ` Christoph Hellwig
2016-02-29 14:27       ` Jeff Moyer
2016-02-22 17:07 ` [PATCH 7/7] block, directio: set a REQ_POLL flag when submitting polled bios Christoph Hellwig
2016-02-22 17:07   ` Christoph Hellwig
2016-02-26 21:10   ` Jeff Moyer
2016-02-26 21:10     ` Jeff Moyer
2016-02-27  8:57     ` Christoph Hellwig
2016-02-29 14:28       ` Jeff Moyer
2016-02-26 15:06 ` selective block polling and preadv2/pwritev2 revisited V2 Stephen Bates
2016-02-26 21:18 ` Jeff Moyer
2016-02-26 21:18   ` Jeff Moyer
2016-02-27  8:57   ` Christoph Hellwig
2016-02-29  1:30     ` Damien Le Moal
2016-02-29  1:30       ` Damien Le Moal
2016-02-29 14:59     ` Jeff Moyer
