* selective block polling and preadv2/pwritev2 revisited
From: Christoph Hellwig @ 2015-12-24 14:14 UTC
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

This series allows polling for completions in the block layer [1] to be
selectively enabled or disabled on a per-I/O basis.  For this it resurrects the
preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
are much simpler now due to VFS changes that happened in the meantime).
That approach also had a man page update prepared, which I will resubmit
with the current flags once this series makes it in.

Polling for block I/O is important to reduce the latency on flash and
post-flash storage technologies.  On the fastest NVMe controller I have
access to, it almost halves latencies from over 7 microseconds to about 4
microseconds.  But it is only useful if we actually care about the latency
of this particular I/O, and it is generally a waste if enabled for all I/O
to a given device.  This series uses the per-I/O flags in preadv2/pwritev2
to control this behavior.  The alternative would be a new O_* flag set
at open time or using fcntl, but this is still too coarse-grained for some
applications and we're starting to run out of open flags.

Note that there are plenty of other use cases for preadv2/pwritev2 as well,
but I'd like to concentrate on this one for now.  Examples are: non-blocking
reads (the original purpose), per-I/O O_SYNC, user space support for T10
DIF/DIX application tags and probably some more.

[1] only supported for NVMe at the moment.
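
To illustrate the per-I/O flags idea from the user-space side, here is a
minimal sketch.  It assumes a Linux system whose headers provide the
preadv2 syscall number (SYS_preadv2).  The helper name read_with_flags is
made up for this example, and the cover letter itself does not name any
flag value.  Mainline Linux later exposed the polling flag as RWF_HIPRI;
the sketch simply forwards whatever flags the caller passes, with 0 giving
plain preadv behavior.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of this series): read into a single buffer
 * through the raw preadv2 syscall, passing per-call flags.  The offset is
 * split into low/high words, which the kernel reassembles with
 * pos_from_hilo().  An offset of -1 asks for the current file position.
 */
static ssize_t read_with_flags(int fd, void *buf, size_t len, off_t offset,
			       int flags)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	unsigned long long pos = (unsigned long long)offset;
	unsigned long pos_l = (unsigned long)pos;
	/* On 64-bit the low word already carries the whole offset. */
	unsigned long pos_h = sizeof(long) >= 8 ? 0 : (unsigned long)(pos >> 32);

	/* Returns the byte count, or -1 with errno set on failure. */
	return syscall(SYS_preadv2, (long)fd, &iov, 1L, pos_l, pos_h,
		       (long)flags);
}
```

A latency-sensitive caller would pass the polling flag only on the reads it
cares about and 0 everywhere else, which is the granularity that an O_*
flag set at open time cannot offer.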



* [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev
From: Christoph Hellwig @ 2015-12-24 14:14 UTC
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

From: Milosz Tanski <milosz@adfin.com>

This way we can set kiocb flags also from the sync read/write path.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/nfsd/vfs.c      |  4 ++--
 fs/read_write.c    | 44 ++++++++++++++++++++++++++------------------
 fs/splice.c        |  2 +-
 include/linux/fs.h |  4 ++--
 4 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 994d66f..3a9f7bf 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -855,7 +855,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
 
 	oldfs = get_fs();
 	set_fs(KERNEL_DS);
-	host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+	host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
 	set_fs(oldfs);
 	return nfsd_finish_read(file, count, host_err);
 }
@@ -942,7 +942,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 
 	/* Write the data. */
 	oldfs = get_fs(); set_fs(KERNEL_DS);
-	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
 	set_fs(oldfs);
 	if (host_err < 0)
 		goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..34a2920 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -653,11 +653,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
 EXPORT_SYMBOL(iov_shorten);
 
 static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
-		loff_t *ppos, iter_fn_t fn)
+		loff_t *ppos, iter_fn_t fn, int flags)
 {
 	struct kiocb kiocb;
 	ssize_t ret;
 
+	if (flags)
+		return -EOPNOTSUPP;
+
 	init_sync_kiocb(&kiocb, filp);
 	kiocb.ki_pos = *ppos;
 
@@ -669,10 +672,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 
 /* Do it by hand, with file-ops */
 static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
-		loff_t *ppos, io_fn_t fn)
+		loff_t *ppos, io_fn_t fn, int flags)
 {
 	ssize_t ret = 0;
 
+	if (flags)
+		return -EOPNOTSUPP;
+
 	while (iov_iter_count(iter)) {
 		struct iovec iovec = iov_iter_iovec(iter);
 		ssize_t nr;
@@ -773,7 +779,8 @@ out:
 
 static ssize_t do_readv_writev(int type, struct file *file,
 			       const struct iovec __user * uvector,
-			       unsigned long nr_segs, loff_t *pos)
+			       unsigned long nr_segs, loff_t *pos,
+			       int flags)
 {
 	size_t tot_len;
 	struct iovec iovstack[UIO_FASTIOV];
@@ -805,9 +812,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
 	}
 
 	if (iter_fn)
-		ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
-		ret = do_loop_readv_writev(file, &iter, pos, fn);
+		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
 	if (type != READ)
 		file_end_write(file);
@@ -824,27 +831,27 @@ out:
 }
 
 ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
-		  unsigned long vlen, loff_t *pos)
+		  unsigned long vlen, loff_t *pos, int flags)
 {
 	if (!(file->f_mode & FMODE_READ))
 		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_READ))
 		return -EINVAL;
 
-	return do_readv_writev(READ, file, vec, vlen, pos);
+	return do_readv_writev(READ, file, vec, vlen, pos, flags);
 }
 
 EXPORT_SYMBOL(vfs_readv);
 
 ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
-		   unsigned long vlen, loff_t *pos)
+		   unsigned long vlen, loff_t *pos, int flags)
 {
 	if (!(file->f_mode & FMODE_WRITE))
 		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
 
-	return do_readv_writev(WRITE, file, vec, vlen, pos);
+	return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
 }
 
 EXPORT_SYMBOL(vfs_writev);
@@ -857,7 +864,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos);
+		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -877,7 +884,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos);
+		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -909,7 +916,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos);
+			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
 		fdput(f);
 	}
 
@@ -933,7 +940,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos);
+			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
 		fdput(f);
 	}
 
@@ -947,7 +954,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 
 static ssize_t compat_do_readv_writev(int type, struct file *file,
 			       const struct compat_iovec __user *uvector,
-			       unsigned long nr_segs, loff_t *pos)
+			       unsigned long nr_segs, loff_t *pos,
+			       int flags)
 {
 	compat_ssize_t tot_len;
 	struct iovec iovstack[UIO_FASTIOV];
@@ -979,9 +987,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
 	}
 
 	if (iter_fn)
-		ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
-		ret = do_loop_readv_writev(file, &iter, pos, fn);
+		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
 	if (type != READ)
 		file_end_write(file);
@@ -1010,7 +1018,7 @@ static size_t compat_readv(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_READ))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
 
 out:
 	if (ret > 0)
@@ -1087,7 +1095,7 @@ static size_t compat_writev(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		goto out;
 
-	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+	ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
 
 out:
 	if (ret > 0)
diff --git a/fs/splice.c b/fs/splice.c
index 801c21c..f357bc0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -579,7 +579,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
 	old_fs = get_fs();
 	set_fs(get_ds());
 	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
 	set_fs(old_fs);
 
 	return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa5142..2b0e078 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1677,9 +1677,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
-		unsigned long, loff_t *);
+		unsigned long, loff_t *, int);
 extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
-		unsigned long, loff_t *);
+		unsigned long, loff_t *, int);
 
 struct super_operations {
    	struct inode *(*alloc_inode)(struct super_block *sb);
-- 
1.9.1
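
The plumbing in this patch can be modeled outside the kernel.  The sketch
below is a user-space stand-in under stated assumptions, not kernel code:
all model_* names are hypothetical, and the "I/O" just reports the
requested length.  It shows the two properties the diff establishes:
existing entry points pass 0, and any nonzero flag is refused with
-EOPNOTSUPP until a later patch defines one.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for do_iter_readv_writev()/do_loop_readv_writev():
 * no flag is defined at this point in the series, so any set bit is refused.
 */
static ssize_t model_do_readv_writev(size_t len, int flags)
{
	if (flags)
		return -EOPNOTSUPP;
	return (ssize_t)len;	/* pretend the whole request completed */
}

/* Stand-in for vfs_readv(): forward the caller's flags unchanged. */
static ssize_t model_vfs_readv(size_t len, int flags)
{
	return model_do_readv_writev(len, flags);
}

/* Stand-in for the existing readv/preadv entry points, which pass 0. */
static ssize_t model_sys_readv(size_t len)
{
	return model_vfs_readv(len, 0);
}
```

Rejecting unknown bits at the lowest level means callers that pass a flag
get an error rather than having it silently ignored.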



* [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
From: Christoph Hellwig @ 2015-12-24 14:14 UTC
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

From: Milosz Tanski <milosz@adfin.com>

New syscalls that take a flags argument. This change does not add any
specific flags.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c          | 162 +++++++++++++++++++++++++++++++++++++----------
 include/linux/compat.h   |   6 ++
 include/linux/syscalls.h |   6 ++
 3 files changed, 139 insertions(+), 35 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 34a2920..caa30ac 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -856,15 +856,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 
 EXPORT_SYMBOL(vfs_writev);
 
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+			unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+		ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -876,15 +876,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+		ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -902,10 +902,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
 	return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
 }
 
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -916,7 +915,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+			ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -926,10 +925,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+			  unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -940,7 +938,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+			ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -950,6 +948,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_readv(fd, vec, vlen, flags);
+
+	return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_writev(fd, vec, vlen, flags);
+
+	return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
 #ifdef CONFIG_COMPAT
 
 static ssize_t compat_do_readv_writev(int type, struct file *file,
@@ -1007,7 +1057,7 @@ out:
 
 static size_t compat_readv(struct file *file,
 			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos)
+			   unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1018,7 +1068,7 @@ static size_t compat_readv(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_READ))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
 
 out:
 	if (ret > 0)
@@ -1027,9 +1077,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_readv(compat_ulong_t fd,
+				 const struct compat_iovec __user *vec,
+				 compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1038,16 +1088,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos);
+	ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
+
+}
+
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_readv(fd, vec, vlen, 0);
 }
 
-static long __compat_sys_preadv64(unsigned long fd,
+static long do_compat_preadv64(unsigned long fd,
 				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos)
+				  unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
@@ -1059,7 +1117,7 @@ static long __compat_sys_preadv64(unsigned long fd,
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos);
+		ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1069,7 +1127,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1079,12 +1137,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+		int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_readv(fd, vec, vlen, flags);
+
+	return do_compat_preadv64(fd, vec, vlen, pos, flags);
 }
 
 static size_t compat_writev(struct file *file,
 			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos)
+			    unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1104,9 +1175,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_writev(compat_ulong_t fd,
+				  const struct compat_iovec __user* vec,
+				  compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1115,28 +1186,36 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos);
+	ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
 }
 
-static long __compat_sys_pwritev64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+		const struct compat_iovec __user *, vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_writev(fd, vec, vlen, 0);
+}
+
+static long do_compat_pwritev64(unsigned long fd,
 				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos)
+				   unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
 
 	if (pos < 0)
 		return -EINVAL;
+
 	f = fdget(fd);
 	if (!f.file)
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos);
+		ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1146,7 +1225,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1156,8 +1235,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_writev(fd, vec, vlen, flags);
+
+	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
 }
+
 #endif
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c917..fe4ccd0 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a156b82..c4fac0d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
 asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
 			   unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
 			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
 asmlinkage long sys_chdir(const char __user *filename);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2015-12-24 14:14   ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
  To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg
  Cc: milosz-B5zB6C1i6pkAvxtiuMwx3w,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

From: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>

New syscalls that take a flags argument. This change does not add any
specific flags.

Signed-off-by: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 fs/read_write.c          | 162 +++++++++++++++++++++++++++++++++++++----------
 include/linux/compat.h   |   6 ++
 include/linux/syscalls.h |   6 ++
 3 files changed, 139 insertions(+), 35 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 34a2920..caa30ac 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -856,15 +856,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 
 EXPORT_SYMBOL(vfs_writev);
 
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+			unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+		ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -876,15 +876,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+		ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -902,10 +902,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
 	return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
 }
 
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -916,7 +915,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+			ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -926,10 +925,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+			  unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -940,7 +938,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+			ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -950,6 +948,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_readv(fd, vec, vlen, flags);
+
+	return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_writev(fd, vec, vlen, flags);
+
+	return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
 #ifdef CONFIG_COMPAT
 
 static ssize_t compat_do_readv_writev(int type, struct file *file,
@@ -1007,7 +1057,7 @@ out:
 
 static size_t compat_readv(struct file *file,
 			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos)
+			   unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1018,7 +1068,7 @@ static size_t compat_readv(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_READ))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
 
 out:
 	if (ret > 0)
@@ -1027,9 +1077,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_readv(compat_ulong_t fd,
+				 const struct compat_iovec __user *vec,
+				 compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1038,16 +1088,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos);
+	ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
+
+}
+
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_readv(fd, vec, vlen, 0);
 }
 
-static long __compat_sys_preadv64(unsigned long fd,
+static long do_compat_preadv64(unsigned long fd,
 				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos)
+				  unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
@@ -1059,7 +1117,7 @@ static long __compat_sys_preadv64(unsigned long fd,
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos);
+		ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1069,7 +1127,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1079,12 +1137,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+		int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_readv(fd, vec, vlen, flags);
+
+	return do_compat_preadv64(fd, vec, vlen, pos, flags);
 }
 
 static size_t compat_writev(struct file *file,
 			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos)
+			    unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1104,9 +1175,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_writev(compat_ulong_t fd,
+				  const struct compat_iovec __user* vec,
+				  compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1115,28 +1186,36 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos);
+	ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
 }
 
-static long __compat_sys_pwritev64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+		const struct compat_iovec __user *, vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_writev(fd, vec, vlen, 0);
+}
+
+static long do_compat_pwritev64(unsigned long fd,
 				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos)
+				   unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
 
 	if (pos < 0)
 		return -EINVAL;
+
 	f = fdget(fd);
 	if (!f.file)
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos);
+		ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1146,7 +1225,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1156,8 +1235,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_writev(fd, vec, vlen, flags);
+
+	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
 }
+
 #endif
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c917..fe4ccd0 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a156b82..c4fac0d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
 asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
 			   unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
 			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
 asmlinkage long sys_chdir(const char __user *filename);
-- 
1.9.1
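
The pos == -1 convention in preadv2/pwritev2 above can be modelled in user
space.  The following is only a sketch, not the kernel code: int64_t stands
in for loff_t, and pick_path() and enum rw_path are hypothetical names used
for illustration.  The two half-width shifts presumably exist so that the
shift count never reaches the full width of the type on 64-bit.

```c
#include <limits.h>
#include <stdint.h>

/* Half the width of unsigned long, mirroring the kernel's HALF_LONG_BITS. */
#define HALF_LONG_BITS (sizeof(unsigned long) * CHAR_BIT / 2)

/*
 * User-space model of pos_from_hilo().  On 64-bit the two shifts push
 * `high` out entirely, so `low` alone carries the offset; on 32-bit the
 * result is (high << 32) | low.
 */
static int64_t pos_from_hilo(unsigned long high, unsigned long low)
{
	uint64_t pos = (uint64_t)high;

	pos = (pos << HALF_LONG_BITS) << HALF_LONG_BITS;
	return (int64_t)(pos | low);
}

enum rw_path { PATH_FILE_POS, PATH_EXPLICIT_POS };

/*
 * Hypothetical helper: report which helper preadv2/pwritev2 would call.
 * An offset of -1 selects the do_readv()/do_writev() path that uses and
 * updates the file position; anything else goes to do_preadv()/do_pwritev().
 */
static enum rw_path pick_path(unsigned long pos_h, unsigned long pos_l)
{
	return pos_from_hilo(pos_h, pos_l) == -1 ? PATH_FILE_POS
						 : PATH_EXPLICIT_POS;
}
```

Passing all-ones in both halves yields -1 on 32-bit and 64-bit alike, which
is why a caller that wants the readv/writev behaviour should set both.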

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 3/6] x86: wire up preadv2 and pwritev2
@ 2015-12-24 14:14   ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

From: Milosz Tanski <milosz@adfin.com>

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased due to newly added syscalls]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
 arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f17705e..13e33ced 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -383,3 +383,5 @@
 374	i386	userfaultfd		sys_userfaultfd
 375	i386	membarrier		sys_membarrier
 376	i386	mlock2			sys_mlock2
+377	i386	preadv2			sys_preadv2
+378	i386	pwritev2		sys_pwritev2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 314a90b..2108dae 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -332,6 +332,8 @@
 323	common	userfaultfd		sys_userfaultfd
 324	common	membarrier		sys_membarrier
 325	common	mlock2			sys_mlock2
+326	64	preadv2			sys_preadv2
+327	64	pwritev2		sys_pwritev2
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 4/6] vfs: add the RWF_HIPRI flag for preadv2/pwritev2
  2015-12-24 14:14 ` Christoph Hellwig
                   ` (3 preceding siblings ...)
  (?)
@ 2015-12-24 14:14 ` Christoph Hellwig
  -1 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

This adds a flag that tells the file system that this is a high priority
request for which it is worth polling the hardware.  The flag is purely
advisory and can be ignored if not supported.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c         | 6 ++++--
 include/linux/fs.h      | 1 +
 include/uapi/linux/fs.h | 3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index caa30ac..4dc377e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -658,10 +658,12 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
 	struct kiocb kiocb;
 	ssize_t ret;
 
-	if (flags)
+	if (flags & ~RWF_HIPRI)
 		return -EOPNOTSUPP;
 
 	init_sync_kiocb(&kiocb, filp);
+	if (flags & RWF_HIPRI)
+		kiocb.ki_flags |= IOCB_HIPRI;
 	kiocb.ki_pos = *ppos;
 
 	ret = fn(&kiocb, iter);
@@ -676,7 +678,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 {
 	ssize_t ret = 0;
 
-	if (flags)
+	if (flags & ~RWF_HIPRI)
 		return -EOPNOTSUPP;
 
 	while (iov_iter_count(iter)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2b0e078..0247620 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -319,6 +319,7 @@ struct writeback_control;
 #define IOCB_EVENTFD		(1 << 0)
 #define IOCB_APPEND		(1 << 1)
 #define IOCB_DIRECT		(1 << 2)
+#define IOCB_HIPRI		(1 << 3)
 
 struct kiocb {
 	struct file		*ki_filp;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index f15d980..42f7627 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -208,4 +208,7 @@ struct inodes_stat_t {
 #define SYNC_FILE_RANGE_WRITE		2
 #define SYNC_FILE_RANGE_WAIT_AFTER	4
 
+/* flags for preadv2/pwritev2: */
+#define RWF_HIPRI			0x00000001 /* high priority request, poll if possible */
+
 #endif /* _UAPI_LINUX_FS_H */
-- 
1.9.1
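
The flag handling in do_iter_readv_writev() above boils down to a mask
check followed by a translation into the kiocb flag.  A minimal user-space
sketch, assuming the two constants as defined in this patch;
rwf_to_kiocb_flags() is a hypothetical name, not a kernel function:

```c
#include <errno.h>

#define RWF_HIPRI	0x00000001	/* value from include/uapi/linux/fs.h in this patch */
#define IOCB_HIPRI	(1 << 3)	/* value from include/linux/fs.h in this patch */

/*
 * Hypothetical model of the check: any bit other than RWF_HIPRI is
 * rejected with -EOPNOTSUPP, otherwise RWF_HIPRI is mapped onto
 * IOCB_HIPRI in *ki_flags.  Returns 0 on success.
 */
static int rwf_to_kiocb_flags(int flags, int *ki_flags)
{
	if (flags & ~RWF_HIPRI)
		return -EOPNOTSUPP;
	if (flags & RWF_HIPRI)
		*ki_flags |= IOCB_HIPRI;
	return 0;
}
```

Masking with ~RWF_HIPRI rather than testing for a non-zero value means each
future flag only needs to be added to the mask to become accepted.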


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 5/6] direct-io: only use block polling if explicitly requested
  2015-12-24 14:14 ` Christoph Hellwig
                   ` (4 preceding siblings ...)
  (?)
@ 2015-12-24 14:14 ` Christoph Hellwig
  -1 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/direct-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index cb5337d..904ff7f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -445,7 +445,8 @@ static struct bio *dio_await_one(struct dio *dio)
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		dio->waiter = current;
 		spin_unlock_irqrestore(&dio->bio_lock, flags);
-		if (!blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
+		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
+		    !blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
 			io_schedule();
 		/* wake up sets us TASK_RUNNING */
 		spin_lock_irqsave(&dio->bio_lock, flags);
-- 
1.9.1
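
The new condition in dio_await_one() relies on short-circuit evaluation:
without IOCB_HIPRI the poll function is never called and the waiter goes
straight to io_schedule().  A small sketch of that decision, with stub
poll functions standing in for blk_poll(); should_sleep(), poll_hit() and
poll_miss() are hypothetical names:

```c
#include <stdbool.h>

#define IOCB_HIPRI	(1 << 3)	/* value from patch 4/6 */

static int poll_calls;	/* counts how often a poll stub was invoked */

/* Stub: polling found the completion. */
static bool poll_hit(void)
{
	poll_calls++;
	return true;
}

/* Stub: polling was not possible or found nothing. */
static bool poll_miss(void)
{
	poll_calls++;
	return false;
}

/*
 * Returns true when the waiter should sleep via io_schedule().
 * poll() is only evaluated if the request asked for high priority.
 */
static bool should_sleep(int ki_flags, bool (*poll)(void))
{
	return !(ki_flags & IOCB_HIPRI) || !poll();
}
```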


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 6/6] blk-mq: enable polling support by default
  2015-12-24 14:14 ` Christoph Hellwig
                   ` (5 preceding siblings ...)
  (?)
@ 2015-12-24 14:14 ` Christoph Hellwig
  -1 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

Now that applications need to explicitly ask for polling we can enable it
by default in blk-mq drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e711f29..1b73222 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -496,7 +496,8 @@ struct request_queue {
 
 #define QUEUE_FLAG_MQ_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
-				 (1 << QUEUE_FLAG_SAME_COMP))
+				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
+				 (1 << QUEUE_FLAG_POLL))
 
 static inline void queue_lockdep_assert_held(struct request_queue *q)
 {
-- 
1.9.1
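
With the whole series applied, an application opts in to polling per call.
The sketch below assumes a C library that provides a preadv2() wrapper with
the signature (fd, iov, iovcnt, offset, flags), which is not part of this
series; read_hipri() is a hypothetical helper.  Since the flag is advisory,
the helper falls back to preadv() when the call is rejected.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001	/* value from the uapi header in patch 4/6 */
#endif

/*
 * Hypothetical helper: read `len` bytes at offset `off`, asking the
 * kernel to poll for completion.  If the kernel or libc does not know
 * the syscall or the flag, retry without it via preadv().
 */
static ssize_t read_hipri(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret = preadv2(fd, &iov, 1, off, RWF_HIPRI);

	if (ret < 0 && (errno == ENOSYS || errno == EOPNOTSUPP ||
			errno == EINVAL))
		ret = preadv(fd, &iov, 1, off);
	return ret;
}
```

Polling only takes effect on the direct I/O path changed in patch 5/6, so a
latency-sensitive caller would normally open the file with O_DIRECT and use
suitably aligned buffers.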


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-04 14:58   ` Sagi Grimberg
  0 siblings, 0 replies; 31+ messages in thread
From: Sagi Grimberg @ 2016-01-04 14:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Hi Christoph,

> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
> but I'd like to concentrate on this one for now.  Example are: non-blocking
> reads (the original purpose), per-I/O O_SYNC, user space support for T10
> DIF/DIX applications tags and probably some more.

So I'm trying to understand how can integrity metadata be used here.
Will the user-app append the meta-data to the data iovec (given there
is no metadata iovec)? If so, how will we separate data from metadata?

Cheers,
Sagi.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-04 16:39     ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-01-04 16:39 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Mon, Jan 04, 2016 at 04:58:38PM +0200, Sagi Grimberg wrote:
> Hi Christoph,
>
>> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
>> but I'd like to concentrate on this one for now.  Example are: non-blocking
>> reads (the original purpose), per-I/O O_SYNC, user space support for T10
>> DIF/DIX applications tags and probably some more.
>
> So I'm trying to understand how can integrity metadata be used here.
> Will the user-app append the meta-data to the data iovec (given there
> is no metadata iovec)? If so, how will we separate data from metadata?

The idea that was floated around a few times was to have a flag where
the first half of the vectors would be the data, and the second half
the metadata.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-06 17:01       ` Sagi Grimberg
  0 siblings, 0 replies; 31+ messages in thread
From: Sagi Grimberg @ 2016-01-06 17:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api


>> Hi Christoph,
>>
>>> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
>>> but I'd like to concentrate on this one for now.  Example are: non-blocking
>>> reads (the original purpose), per-I/O O_SYNC, user space support for T10
>>> DIF/DIX applications tags and probably some more.
>>
>> So I'm trying to understand how can integrity metadata be used here.
>> Will the user-app append the meta-data to the data iovec (given there
>> is no metadata iovec)? If so, how will we separate data from metadata?
>
> The idea that was floated around a few times was to have a flag where
> the first half of the vectors would be the data, and the second half
> the metadata.

This means that the user would need to pass iovec entries of 8 bytes
correct? Seems like a waste for large IOs (sorry for diverging from the
subject)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited
  2016-01-06 17:01       ` Sagi Grimberg
  (?)
@ 2016-01-06 22:49       ` Martin K. Petersen
  2016-01-07 14:41           ` Sagi Grimberg
  -1 siblings, 1 reply; 31+ messages in thread
From: Martin K. Petersen @ 2016-01-06 22:49 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

>>>>> "Sagi" == Sagi Grimberg <sagig@dev.mellanox.co.il> writes:

>> The idea that was floated around a few times was to have a flag where
>> the first half of the vectors would be the data, and the second half
>> the metadata.

Sagi> This means that the user would need to pass iovec entries of 8
Sagi> bytes correct? Seems like a waste for large IOs (sorry for
Sagi> diverging from the subject)

The assumption was that there would be a 1:1 mapping between the number
of data buffers and the metadata ditto. But nothing says that a data
iovec entry is limited in size to a single sector.

The other option is to have a single iovec at the end representing the
metadata for all data buffers. I think there are valid use cases for
either approach and we may end up having to support both via a flag.
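
To make the first layout concrete: with a 1:1 mapping, a request with n data
buffers would carry 2 * n iovec entries, data first and metadata second.
This is purely a sketch of the idea discussed here; no such flag exists in
this series, and build_split_iov() is a hypothetical helper.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical layout only.  Fills iov[0..n-1] with the data buffers
 * and iov[n..2n-1] with the matching metadata buffers, so that entry i
 * pairs with entry n + i.  The caller supplies all buffer sizes.
 * Returns the vector count to pass to the syscall (2 * n).
 */
static size_t build_split_iov(struct iovec *iov, size_t n,
			      void *const *data, const size_t *data_len,
			      void *const *meta, const size_t *meta_len)
{
	for (size_t i = 0; i < n; i++) {
		iov[i].iov_base = data[i];
		iov[i].iov_len = data_len[i];
		iov[n + i].iov_base = meta[i];
		iov[n + i].iov_len = meta_len[i];
	}
	return 2 * n;
}
```

The single-trailing-iovec variant would instead use n + 1 entries, with the
last one covering the metadata for all data buffers.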

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-07 14:41           ` Sagi Grimberg
  0 siblings, 0 replies; 31+ messages in thread
From: Sagi Grimberg @ 2016-01-07 14:41 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

Hi Martin,

> Sagi> This means that the user would need to pass iovec entries of 8
> Sagi> bytes correct? Seems like a waste for large IOs (sorry for
> Sagi> diverging from the subject)
>
> The assumption was that there would be a 1:1 mapping between the number
> of data buffers and the metadata ditto. But nothing says that a data
> iovec entry is limited in size to a single sector.

Yea... I meant 1:1, I got confused on the 8 bytes comment...

> The other option is to have a single iovec at the end representing the
> metadata for all data buffers. I think there are valid use cases for
> either approach and we may end up having to support both via a flag.

Either approach presents limitations, but I guess user-space can deal
with it...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
  2016-04-25 17:35             ` Michael Kerrisk (man-pages)
  (?)
@ 2016-05-08  9:29             ` Christoph Hellwig
  -1 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-05-08  9:29 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christoph Hellwig, Christoph Hellwig, Alexander Viro, Jens Axboe,
	Milosz Tanski, linux-fsdevel, linux-block, Linux API

On Mon, Apr 25, 2016 at 07:35:36PM +0200, Michael Kerrisk (man-pages) wrote:
> > I'd rather update the man page - EOPNOTSUPP is a much more descriptive
> > error code for this case.  I'll send you a patch.
> 
> Unless I'm misunderstanding something here, you're proposing something
> very inconsistent. The standard error for unknown flag bits is EINVAL.
> This is so for dozens of system calls (check the man pages; you might
> find a rare exception, but that's the point, they are exceptions). It
> seems to me here that it's really the implementation that needs
> fixing, not the man page!

For new filesystem calls we try to use EOPNOTSUPP as much as possible,
e.g. fallocate.
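
Whichever code wins out, a caller that probes for flag support can treat
both the same way.  A minimal sketch, assuming a glibc new enough to ship
the preadv2() wrapper (2.26 or later); flag_rejected() and
read_with_optional_flags() are hypothetical names, not part of this series:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: does this errno mean "the flags were not accepted"? */
static bool flag_rejected(int err)
{
	return err == EINVAL || err == EOPNOTSUPP || err == ENOSYS;
}

/*
 * Hypothetical helper: try the read with the requested flags first and
 * retry without them if the kernel or filesystem rejects the call.
 * An offset of -1 means "use the current file offset", so the retry
 * has to go through readv() in that case.
 */
static ssize_t read_with_optional_flags(int fd, const struct iovec *iov,
					int iovcnt, off_t offset, int flags)
{
	ssize_t ret = preadv2(fd, iov, iovcnt, offset, flags);

	if (ret < 0 && flags != 0 && flag_rejected(errno))
		ret = (offset == -1) ? readv(fd, iov, iovcnt)
				     : preadv(fd, iov, iovcnt, offset);
	return ret;
}
```

Retrying on EINVAL is harmless when the real problem is a bad iovcnt or
offset, since the plain call then fails with the same error.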

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2016-04-25 17:35             ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-04-25 17:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Alexander Viro, Jens Axboe, Milosz Tanski,
	linux-fsdevel, linux-block, Linux API

Hi Christoph,

On 25 April 2016 at 10:47, Christoph Hellwig <hch@infradead.org> wrote:
> On Mon, Apr 18, 2016 at 02:51:50PM +0100, Michael Kerrisk (man-pages) wrote:
>> Thanks. I applied the patch, but I see one point where the doc
>> and code differ, and I suspect that the code needs to be fixed.
>> See below.
>
>> >  .TP
>> >  .B EINVAL
>> >  The vector count \fIiovcnt\fP is less than zero or greater than the
>> > -permitted maximum.
>> > +permitted maximum. Or, an unknown flag is specified in \fIflags\fP.
>>
>> In the case described in the last sentence, the code currently appears
>> to be returning EOPNOTSUPP. EINVAL is more usual here, so I think the
>> code needs adjusting. Your thoughts?
>
> I'd rather update the man page - EOPNOTSUPP is a much more descriptive
> error code for this case.  I'll send you a patch.

Unless I'm misunderstanding something here, you're proposing something
very inconsistent. The standard error for unknown flag bits is EINVAL.
This is so for dozens of system calls (check the man pages; you might
find a rare exception, but that's the point, they are exceptions). It
seems to me here that it's really the implementation that needs
fixing, not the man page!

Cheers,

Michael




-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2016-04-25  8:47           ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-04-25  8:47 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Mon, Apr 18, 2016 at 02:51:50PM +0100, Michael Kerrisk (man-pages) wrote:
> Thanks. I applied the patch, but I see one point where the doc
> and code differ, and I suspect that the code needs to be fixed.
> See below.

> >  .TP
> >  .B EINVAL
> >  The vector count \fIiovcnt\fP is less than zero or greater than the
> > -permitted maximum.
> > +permitted maximum. Or, an unknown flag is specified in \fIflags\fP.
> 
> In the case described in the last sentence, the code currently appears
> to be returning EOPNOTSUPP. EINVAL is more usual here, so I think the
> code needs adjusting. Your thoughts?

I'd rather update the man page - EOPNOTSUPP is a much more descriptive
error code for this case.  I'll send you a patch.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2016-04-18 13:51         ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-04-18 13:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mtk.manpages, viro, axboe, milosz, linux-fsdevel, linux-block, linux-api

Hello Christoph,

On 03/11/2016 09:53 AM, Christoph Hellwig wrote:
> On Thu, Mar 10, 2016 at 07:15:04PM +0100, Michael Kerrisk (man-pages) wrote:
>> Hi Christoph,
>>
>> On 03/03/2016 04:03 PM, Christoph Hellwig wrote:
>>> From: Milosz Tanski <milosz@adfin.com>
>>>
>>> New syscalls that take a flag argument.   No flags are added yet in this
>>> patch.
>>
>> Are there some man pages patches for these proposed system calls?
> 
> This is what I have:

Thanks. I applied the patch, but I see one point where the doc
and code differ, and I suspect that the code needs to be fixed.
See below.

> ---
> From d33a02d56f447a6cb223b3964e1dd894f2921d5c Mon Sep 17 00:00:00 2001
> From: Milosz Tanski <milosz@adfin.com>
> Date: Fri, 11 Mar 2016 10:52:31 +0100
> Subject: add preadv2/pwritev2 documentation
> 
> New syscalls that are a variation on the preadv/pwritev but support an extra
> flag argument.
> 
> Signed-off-by: Milosz Tanski <milosz@adfin.com>
> [hch: added RWF_HIPRI documentation]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  man2/readv.2 | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 54 insertions(+), 9 deletions(-)
> 
> diff --git a/man2/readv.2 b/man2/readv.2
> index 93f2b6f..5cba5e2 100644
> --- a/man2/readv.2
> +++ b/man2/readv.2
> @@ -45,6 +45,12 @@ readv, writev, preadv, pwritev \- read or write data into multiple buffers
>  .sp
>  .BI "ssize_t pwritev(int " fd ", const struct iovec *" iov ", int " iovcnt ,
>  .BI "                off_t " offset );
> +.sp
> +.BI "ssize_t preadv2(int " fd ", const struct iovec *" iov ", int " iovcnt ,
> +.BI "                off_t " offset ", int " flags );
> +.sp
> +.BI "ssize_t pwritev2(int " fd ", const struct iovec *" iov ", int " iovcnt ,
> +.BI "                 off_t " offset ", int " flags );
>  .fi
>  .sp
>  .in -4n
> @@ -166,9 +172,9 @@ The
>  system call combines the functionality of
>  .BR writev ()
>  and
> -.BR pwrite (2).
> +.BR pwrite (2) "."
>  It performs the same task as
> -.BR writev (),
> +.BR writev () ","
>  but adds a fourth argument,
>  .IR offset ,
>  which specifies the file offset at which the output operation
> @@ -178,15 +184,43 @@ The file offset is not changed by these system calls.
>  The file referred to by
>  .I fd
>  must be capable of seeking.
> +.SS preadv2() and pwritev2()
> +
> +This pair of system calls has similar functionality to the
> +.BR preadv ()
> +and
> +.BR pwritev ()
> +calls, but adds a fifth argument, \fIflags\fP, which modifies the behavior on a per call basis.
> +
> +Like the
> +.BR preadv ()
> +and
> +.BR pwritev ()
> +calls, they accept an \fIoffset\fP argument. Unlike those calls, if the \fIoffset\fP argument is set to -1 then the current file offset is used and updated.
> +
> +The \fIflags\fP argument to
> +.BR preadv2 ()
> +and
> +.BR pwritev2 ()
> +contains a bitwise OR of zero or more of the following flags:
> +.TP
> +.BR RWF_HIPRI " (since Linux 4.6)"
> +High priority read/write.  Allows block-based filesystems to use polling of the
> +device, which provides lower latency, but may use additional resources.  (Currently
> +only usable on a file descriptor opened using the
> +.BR O_DIRECT " flag)."
> +
>  .SH RETURN VALUE
>  On success,
> -.BR readv ()
> -and
> +.BR readv () ","
>  .BR preadv ()
> -return the number of bytes read;
> -.BR writev ()
>  and
> +.BR preadv2 ()
> +return the number of bytes read;
> +.BR writev () ","
>  .BR pwritev ()
> +and
> +.BR pwritev2 ()
>  return the number of bytes written.
>  
>  Note that is not an error for a successful call to transfer fewer bytes
> @@ -202,9 +236,11 @@ The errors are as given for
>  and
>  .BR write (2).
>  Furthermore,
> -.BR preadv ()
> -and
> +.BR preadv () ","
> +.BR preadv2 () ","
>  .BR pwritev ()
> +and
> +.BR pwritev2 ()
>  can also fail for the same reasons as
>  .BR lseek (2).
>  Additionally, the following error is defined:
> @@ -218,12 +254,17 @@ value.
>  .TP
>  .B EINVAL
>  The vector count \fIiovcnt\fP is less than zero or greater than the
> -permitted maximum.
> +permitted maximum. Or, an unknown flag is specified in \fIflags\fP.

In the case described in the last sentence, the code currently appears
to be returning EOPNOTSUPP. EINVAL is more usual here, so I think the
code needs adjusting. Your thoughts?

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2016-03-11  9:53       ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2016-03-11  9:53 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages)
  Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
	linux-block, linux-api

On Thu, Mar 10, 2016 at 07:15:04PM +0100, Michael Kerrisk (man-pages) wrote:
> Hi Christoph,
> 
> On 03/03/2016 04:03 PM, Christoph Hellwig wrote:
> > From: Milosz Tanski <milosz@adfin.com>
> > 
> > New syscalls that take a flag argument.   No flags are added yet in this
> > patch.
> 
> Are there some man pages patches for these proposed system calls?

This is what I have:

---
From d33a02d56f447a6cb223b3964e1dd894f2921d5c Mon Sep 17 00:00:00 2001
From: Milosz Tanski <milosz@adfin.com>
Date: Fri, 11 Mar 2016 10:52:31 +0100
Subject: add preadv2/pwritev2 documentation

New syscalls that are a variation on the preadv/pwritev but support an extra
flag argument.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: added RWF_HIPRI documentation]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 man2/readv.2 | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 54 insertions(+), 9 deletions(-)

diff --git a/man2/readv.2 b/man2/readv.2
index 93f2b6f..5cba5e2 100644
--- a/man2/readv.2
+++ b/man2/readv.2
@@ -45,6 +45,12 @@ readv, writev, preadv, pwritev \- read or write data into multiple buffers
 .sp
 .BI "ssize_t pwritev(int " fd ", const struct iovec *" iov ", int " iovcnt ,
 .BI "                off_t " offset );
+.sp
+.BI "ssize_t preadv2(int " fd ", const struct iovec *" iov ", int " iovcnt ,
+.BI "                off_t " offset ", int " flags );
+.sp
+.BI "ssize_t pwritev2(int " fd ", const struct iovec *" iov ", int " iovcnt ,
+.BI "                 off_t " offset ", int " flags );
 .fi
 .sp
 .in -4n
@@ -166,9 +172,9 @@ The
 system call combines the functionality of
 .BR writev ()
 and
-.BR pwrite (2).
+.BR pwrite (2) "."
 It performs the same task as
-.BR writev (),
+.BR writev () ","
 but adds a fourth argument,
 .IR offset ,
 which specifies the file offset at which the output operation
@@ -178,15 +184,43 @@ The file offset is not changed by these system calls.
 The file referred to by
 .I fd
 must be capable of seeking.
+.SS preadv2() and pwritev2()
+
+This pair of system calls has similar functionality to the
+.BR preadv ()
+and
+.BR pwritev ()
+calls, but adds a fifth argument, \fIflags\fP, which modifies the behavior on a per call basis.
+
+Like the
+.BR preadv ()
+and
+.BR pwritev ()
+calls, they accept an \fIoffset\fP argument. Unlike those calls, if the \fIoffset\fP argument is set to -1 then the current file offset is used and updated.
+
+The \fIflags\fP argument to
+.BR preadv2 ()
+and
+.BR pwritev2 ()
+contains a bitwise OR of zero or more of the following flags:
+.TP
+.BR RWF_HIPRI " (since Linux 4.6)"
+High priority read/write.  Allows block-based filesystems to use polling of the
+device, which provides lower latency, but may use additional resources.  (Currently
+only usable on a file descriptor opened using the
+.BR O_DIRECT " flag)."
+
 .SH RETURN VALUE
 On success,
-.BR readv ()
-and
+.BR readv () ","
 .BR preadv ()
-return the number of bytes read;
-.BR writev ()
 and
+.BR preadv2 ()
+return the number of bytes read;
+.BR writev () ","
 .BR pwritev ()
+and
+.BR pwritev2 ()
 return the number of bytes written.
 
 Note that is not an error for a successful call to transfer fewer bytes
@@ -202,9 +236,11 @@ The errors are as given for
 and
 .BR write (2).
 Furthermore,
-.BR preadv ()
-and
+.BR preadv () ","
+.BR preadv2 () ","
 .BR pwritev ()
+and
+.BR pwritev2 ()
 can also fail for the same reasons as
 .BR lseek (2).
 Additionally, the following error is defined:
@@ -218,12 +254,17 @@ value.
 .TP
 .B EINVAL
 The vector count \fIiovcnt\fP is less than zero or greater than the
-permitted maximum.
+permitted maximum. Or, an unknown flag is specified in \fIflags\fP.
 .SH VERSIONS
 .BR preadv ()
 and
 .BR pwritev ()
 first appeared in Linux 2.6.30; library support was added in glibc 2.10.
+.sp
+.BR preadv2 ()
+and
+.BR pwritev2 ()
+first appeared in Linux 4.6.
 .SH CONFORMING TO
 .BR readv (),
 .BR writev ():
@@ -237,6 +278,10 @@ POSIX.1-2001, POSIX.1-2008,
 .BR preadv (),
 .BR pwritev ():
 nonstandard, but present also on the modern BSDs.
+.sp
+.BR preadv2 (),
+.BR pwritev2 ():
+nonstandard, Linux extension.
 .SH NOTES
 POSIX.1 allows an implementation to place a limit on
 the number of items that can be passed in
-- 
2.1.4
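
For readers who want to try the calls described in the page above, a
minimal user-space sketch follows.  It assumes a glibc that provides the
preadv2() wrapper (2.26 or later); the helper names are hypothetical, and
the fallback definition of RWF_HIPRI is only there for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_HIPRI
#define RWF_HIPRI 0x00000001	/* value from the kernel uapi headers */
#endif

/*
 * Hypothetical helper: read one block with a high-priority (polled)
 * request.  RWF_HIPRI currently only takes effect on O_DIRECT file
 * descriptors, so buf, len and offset must meet the usual O_DIRECT
 * alignment requirements of the device.
 */
static ssize_t read_block_hipri(const char *path, void *buf, size_t len,
				off_t offset)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	int fd = open(path, O_RDONLY | O_DIRECT);
	ssize_t ret;
	int saved_errno;

	if (fd < 0)
		return -1;
	ret = preadv2(fd, &iov, 1, offset, RWF_HIPRI);
	saved_errno = errno;	/* keep the error from preadv2(), not close() */
	close(fd);
	errno = saved_errno;
	return ret;
}

/*
 * Hypothetical helper: an offset of -1 makes preadv2() read at the
 * current file offset and advance it, like readv().
 */
static ssize_t read_at_current_pos(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };

	return preadv2(fd, &iov, 1, -1, 0);
}
```

Passing RWF_HIPRI per call keeps polling limited to the reads that are
actually latency sensitive, which is the point of doing this with a flag
rather than at open time.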


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
  2016-03-03 15:03 ` [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2 Christoph Hellwig
@ 2016-03-10 18:15   ` Michael Kerrisk (man-pages)
  2016-03-11  9:53       ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Michael Kerrisk (man-pages) @ 2016-03-10 18:15 UTC (permalink / raw)
  To: Christoph Hellwig, viro, axboe
  Cc: mtk.manpages, milosz, linux-fsdevel, linux-block, linux-api

Hi Christoph,

On 03/03/2016 04:03 PM, Christoph Hellwig wrote:
> From: Milosz Tanski <milosz@adfin.com>
> 
> New syscalls that take a flag argument.   No flags are added yet in this
> patch.

Are there some man pages patches for these proposed system calls?

Thanks,

Michael


> Signed-off-by: Milosz Tanski <milosz@adfin.com>
> [hch: rebased on top of my kiocb changes]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com>
> Tested-by: Stephen Bates <stephen.bates@pmcs.com>
> Acked-by: Jeff Moyer <jmoyer@redhat.com>
> ---
>  fs/read_write.c          | 161 ++++++++++++++++++++++++++++++++++++-----------
>  include/linux/compat.h   |   6 ++
>  include/linux/syscalls.h |   6 ++
>  3 files changed, 138 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 3b7577d..799d25f 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -896,15 +896,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
>  
>  EXPORT_SYMBOL(vfs_writev);
>  
> -SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> -		unsigned long, vlen)
> +static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
> +			unsigned long vlen, int flags)
>  {
>  	struct fd f = fdget_pos(fd);
>  	ssize_t ret = -EBADF;
>  
>  	if (f.file) {
>  		loff_t pos = file_pos_read(f.file);
> -		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
> +		ret = vfs_readv(f.file, vec, vlen, &pos, flags);
>  		if (ret >= 0)
>  			file_pos_write(f.file, pos);
>  		fdput_pos(f);
> @@ -916,15 +916,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
>  	return ret;
>  }
>  
> -SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
> -		unsigned long, vlen)
> +static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
> +			 unsigned long vlen, int flags)
>  {
>  	struct fd f = fdget_pos(fd);
>  	ssize_t ret = -EBADF;
>  
>  	if (f.file) {
>  		loff_t pos = file_pos_read(f.file);
> -		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
> +		ret = vfs_writev(f.file, vec, vlen, &pos, flags);
>  		if (ret >= 0)
>  			file_pos_write(f.file, pos);
>  		fdput_pos(f);
> @@ -942,10 +942,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
>  	return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
>  }
>  
> -SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
> -		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
> +static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
> +			 unsigned long vlen, loff_t pos, int flags)
>  {
> -	loff_t pos = pos_from_hilo(pos_h, pos_l);
>  	struct fd f;
>  	ssize_t ret = -EBADF;
>  
> @@ -956,7 +955,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
>  	if (f.file) {
>  		ret = -ESPIPE;
>  		if (f.file->f_mode & FMODE_PREAD)
> -			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
> +			ret = vfs_readv(f.file, vec, vlen, &pos, flags);
>  		fdput(f);
>  	}
>  
> @@ -966,10 +965,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
>  	return ret;
>  }
>  
> -SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
> -		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
> +static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
> +			  unsigned long vlen, loff_t pos, int flags)
>  {
> -	loff_t pos = pos_from_hilo(pos_h, pos_l);
>  	struct fd f;
>  	ssize_t ret = -EBADF;
>  
> @@ -980,7 +978,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
>  	if (f.file) {
>  		ret = -ESPIPE;
>  		if (f.file->f_mode & FMODE_PWRITE)
> -			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
> +			ret = vfs_writev(f.file, vec, vlen, &pos, flags);
>  		fdput(f);
>  	}
>  
> @@ -990,6 +988,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
>  	return ret;
>  }
>  
> +SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> +		unsigned long, vlen)
> +{
> +	return do_readv(fd, vec, vlen, 0);
> +}
> +
> +SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
> +		unsigned long, vlen)
> +{
> +	return do_writev(fd, vec, vlen, 0);
> +}
> +
> +SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
> +		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
> +{
> +	loff_t pos = pos_from_hilo(pos_h, pos_l);
> +
> +	return do_preadv(fd, vec, vlen, pos, 0);
> +}
> +
> +SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
> +		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
> +		int, flags)
> +{
> +	loff_t pos = pos_from_hilo(pos_h, pos_l);
> +
> +	if (pos == -1)
> +		return do_readv(fd, vec, vlen, flags);
> +
> +	return do_preadv(fd, vec, vlen, pos, flags);
> +}
> +
> +SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
> +		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
> +{
> +	loff_t pos = pos_from_hilo(pos_h, pos_l);
> +
> +	return do_pwritev(fd, vec, vlen, pos, 0);
> +}
> +
> +SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
> +		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
> +		int, flags)
> +{
> +	loff_t pos = pos_from_hilo(pos_h, pos_l);
> +
> +	if (pos == -1)
> +		return do_writev(fd, vec, vlen, flags);
> +
> +	return do_pwritev(fd, vec, vlen, pos, flags);
> +}
> +
>  #ifdef CONFIG_COMPAT
>  
>  static ssize_t compat_do_readv_writev(int type, struct file *file,
> @@ -1047,7 +1097,7 @@ out:
>  
>  static size_t compat_readv(struct file *file,
>  			   const struct compat_iovec __user *vec,
> -			   unsigned long vlen, loff_t *pos)
> +			   unsigned long vlen, loff_t *pos, int flags)
>  {
>  	ssize_t ret = -EBADF;
>  
> @@ -1058,7 +1108,7 @@ static size_t compat_readv(struct file *file,
>  	if (!(file->f_mode & FMODE_CAN_READ))
>  		goto out;
>  
> -	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
> +	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
>  
>  out:
>  	if (ret > 0)
> @@ -1067,9 +1117,9 @@ out:
>  	return ret;
>  }
>  
> -COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
> -		const struct compat_iovec __user *,vec,
> -		compat_ulong_t, vlen)
> +static size_t do_compat_readv(compat_ulong_t fd,
> +				 const struct compat_iovec __user *vec,
> +				 compat_ulong_t vlen, int flags)
>  {
>  	struct fd f = fdget_pos(fd);
>  	ssize_t ret;
> @@ -1078,16 +1128,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
>  	if (!f.file)
>  		return -EBADF;
>  	pos = f.file->f_pos;
> -	ret = compat_readv(f.file, vec, vlen, &pos);
> +	ret = compat_readv(f.file, vec, vlen, &pos, flags);
>  	if (ret >= 0)
>  		f.file->f_pos = pos;
>  	fdput_pos(f);
>  	return ret;
> +
>  }
>  
> -static long __compat_sys_preadv64(unsigned long fd,
> +COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
> +		const struct compat_iovec __user *,vec,
> +		compat_ulong_t, vlen)
> +{
> +	return do_compat_readv(fd, vec, vlen, 0);
> +}
> +
> +static long do_compat_preadv64(unsigned long fd,
>  				  const struct compat_iovec __user *vec,
> -				  unsigned long vlen, loff_t pos)
> +				  unsigned long vlen, loff_t pos, int flags)
>  {
>  	struct fd f;
>  	ssize_t ret;
> @@ -1099,7 +1157,7 @@ static long __compat_sys_preadv64(unsigned long fd,
>  		return -EBADF;
>  	ret = -ESPIPE;
>  	if (f.file->f_mode & FMODE_PREAD)
> -		ret = compat_readv(f.file, vec, vlen, &pos);
> +		ret = compat_readv(f.file, vec, vlen, &pos, flags);
>  	fdput(f);
>  	return ret;
>  }
> @@ -1109,7 +1167,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
>  		const struct compat_iovec __user *,vec,
>  		unsigned long, vlen, loff_t, pos)
>  {
> -	return __compat_sys_preadv64(fd, vec, vlen, pos);
> +	return do_compat_preadv64(fd, vec, vlen, pos, 0);
>  }
>  #endif
>  
> @@ -1119,12 +1177,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
>  {
>  	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
>  
> -	return __compat_sys_preadv64(fd, vec, vlen, pos);
> +	return do_compat_preadv64(fd, vec, vlen, pos, 0);
> +}
> +
> +COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
> +		const struct compat_iovec __user *,vec,
> +		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
> +		int, flags)
> +{
> +	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
> +
> +	if (pos == -1)
> +		return do_compat_readv(fd, vec, vlen, flags);
> +
> +	return do_compat_preadv64(fd, vec, vlen, pos, flags);
>  }
>  
>  static size_t compat_writev(struct file *file,
>  			    const struct compat_iovec __user *vec,
> -			    unsigned long vlen, loff_t *pos)
> +			    unsigned long vlen, loff_t *pos, int flags)
>  {
>  	ssize_t ret = -EBADF;
>  
> @@ -1144,9 +1215,9 @@ out:
>  	return ret;
>  }
>  
> -COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
> -		const struct compat_iovec __user *, vec,
> -		compat_ulong_t, vlen)
> +static size_t do_compat_writev(compat_ulong_t fd,
> +				  const struct compat_iovec __user* vec,
> +				  compat_ulong_t vlen, int flags)
>  {
>  	struct fd f = fdget_pos(fd);
>  	ssize_t ret;
> @@ -1155,16 +1226,23 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
>  	if (!f.file)
>  		return -EBADF;
>  	pos = f.file->f_pos;
> -	ret = compat_writev(f.file, vec, vlen, &pos);
> +	ret = compat_writev(f.file, vec, vlen, &pos, flags);
>  	if (ret >= 0)
>  		f.file->f_pos = pos;
>  	fdput_pos(f);
>  	return ret;
>  }
>  
> -static long __compat_sys_pwritev64(unsigned long fd,
> +COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
> +		const struct compat_iovec __user *, vec,
> +		compat_ulong_t, vlen)
> +{
> +	return do_compat_writev(fd, vec, vlen, 0);
> +}
> +
> +static long do_compat_pwritev64(unsigned long fd,
>  				   const struct compat_iovec __user *vec,
> -				   unsigned long vlen, loff_t pos)
> +				   unsigned long vlen, loff_t pos, int flags)
>  {
>  	struct fd f;
>  	ssize_t ret;
> @@ -1176,7 +1254,7 @@ static long __compat_sys_pwritev64(unsigned long fd,
>  		return -EBADF;
>  	ret = -ESPIPE;
>  	if (f.file->f_mode & FMODE_PWRITE)
> -		ret = compat_writev(f.file, vec, vlen, &pos);
> +		ret = compat_writev(f.file, vec, vlen, &pos, flags);
>  	fdput(f);
>  	return ret;
>  }
> @@ -1186,7 +1264,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
>  		const struct compat_iovec __user *,vec,
>  		unsigned long, vlen, loff_t, pos)
>  {
> -	return __compat_sys_pwritev64(fd, vec, vlen, pos);
> +	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
>  }
>  #endif
>  
> @@ -1196,8 +1274,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
>  {
>  	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
>  
> -	return __compat_sys_pwritev64(fd, vec, vlen, pos);
> +	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
> +}
> +
> +COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
> +		const struct compat_iovec __user *,vec,
> +		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
> +{
> +	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
> +
> +	if (pos == -1)
> +		return do_compat_writev(fd, vec, vlen, flags);
> +
> +	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
>  }
> +
>  #endif
>  
>  static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
> diff --git a/include/linux/compat.h b/include/linux/compat.h
> index a76c917..fe4ccd0 100644
> --- a/include/linux/compat.h
> +++ b/include/linux/compat.h
> @@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
>  asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
>  		const struct compat_iovec __user *vec,
>  		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
> +asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
> +		const struct compat_iovec __user *vec,
> +		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
> +asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
> +		const struct compat_iovec __user *vec,
> +		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
>  
>  #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
>  asmlinkage long compat_sys_preadv64(unsigned long fd,
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 185815c..d795472 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
>  			     size_t count, loff_t pos);
>  asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
>  			   unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
> +asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
> +			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
> +			    int flags);
>  asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
>  			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
> +asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
> +			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
> +			    int flags);
>  asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
>  asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
>  asmlinkage long sys_chdir(const char __user *filename);
> 


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/


* [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
  2016-03-03 15:03 generic RDMA READ/WRITE API V2 Christoph Hellwig
@ 2016-03-03 15:03 ` Christoph Hellwig
  2016-03-10 18:15   ` Michael Kerrisk (man-pages)
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2016-03-03 15:03 UTC (permalink / raw)
  To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api

From: Milosz Tanski <milosz@adfin.com>

New syscalls that take a flags argument.  No flags are added yet in this
patch.

Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stephen Bates <stephen.bates@pmcs.com>
Tested-by: Stephen Bates <stephen.bates@pmcs.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
 fs/read_write.c          | 161 ++++++++++++++++++++++++++++++++++++-----------
 include/linux/compat.h   |   6 ++
 include/linux/syscalls.h |   6 ++
 3 files changed, 138 insertions(+), 35 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 3b7577d..799d25f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -896,15 +896,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 
 EXPORT_SYMBOL(vfs_writev);
 
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+			unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+		ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -916,15 +916,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret = -EBADF;
 
 	if (f.file) {
 		loff_t pos = file_pos_read(f.file);
-		ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+		ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		if (ret >= 0)
 			file_pos_write(f.file, pos);
 		fdput_pos(f);
@@ -942,10 +942,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
 	return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
 }
 
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+			 unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -956,7 +955,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PREAD)
-			ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+			ret = vfs_readv(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -966,10 +965,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
-		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+			  unsigned long vlen, loff_t pos, int flags)
 {
-	loff_t pos = pos_from_hilo(pos_h, pos_l);
 	struct fd f;
 	ssize_t ret = -EBADF;
 
@@ -980,7 +978,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	if (f.file) {
 		ret = -ESPIPE;
 		if (f.file->f_mode & FMODE_PWRITE)
-			ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+			ret = vfs_writev(f.file, vec, vlen, &pos, flags);
 		fdput(f);
 	}
 
@@ -990,6 +988,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
 	return ret;
 }
 
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen)
+{
+	return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_readv(fd, vec, vlen, flags);
+
+	return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+		unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+		int, flags)
+{
+	loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+	if (pos == -1)
+		return do_writev(fd, vec, vlen, flags);
+
+	return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
 #ifdef CONFIG_COMPAT
 
 static ssize_t compat_do_readv_writev(int type, struct file *file,
@@ -1047,7 +1097,7 @@ out:
 
 static size_t compat_readv(struct file *file,
 			   const struct compat_iovec __user *vec,
-			   unsigned long vlen, loff_t *pos)
+			   unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1058,7 +1108,7 @@ static size_t compat_readv(struct file *file,
 	if (!(file->f_mode & FMODE_CAN_READ))
 		goto out;
 
-	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
+	ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
 
 out:
 	if (ret > 0)
@@ -1067,9 +1117,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
-		const struct compat_iovec __user *,vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_readv(compat_ulong_t fd,
+				 const struct compat_iovec __user *vec,
+				 compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1078,16 +1128,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_readv(f.file, vec, vlen, &pos);
+	ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
+
 }
 
-static long __compat_sys_preadv64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_readv(fd, vec, vlen, 0);
+}
+
+static long do_compat_preadv64(unsigned long fd,
 				  const struct compat_iovec __user *vec,
-				  unsigned long vlen, loff_t pos)
+				  unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
@@ -1099,7 +1157,7 @@ static long __compat_sys_preadv64(unsigned long fd,
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PREAD)
-		ret = compat_readv(f.file, vec, vlen, &pos);
+		ret = compat_readv(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1109,7 +1167,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1119,12 +1177,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_preadv64(fd, vec, vlen, pos);
+	return do_compat_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+		int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_readv(fd, vec, vlen, flags);
+
+	return do_compat_preadv64(fd, vec, vlen, pos, flags);
 }
 
 static size_t compat_writev(struct file *file,
 			    const struct compat_iovec __user *vec,
-			    unsigned long vlen, loff_t *pos)
+			    unsigned long vlen, loff_t *pos, int flags)
 {
 	ssize_t ret = -EBADF;
 
@@ -1144,9 +1215,9 @@ out:
 	return ret;
 }
 
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
-		const struct compat_iovec __user *, vec,
-		compat_ulong_t, vlen)
+static size_t do_compat_writev(compat_ulong_t fd,
+				  const struct compat_iovec __user* vec,
+				  compat_ulong_t vlen, int flags)
 {
 	struct fd f = fdget_pos(fd);
 	ssize_t ret;
@@ -1155,16 +1226,23 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
 	if (!f.file)
 		return -EBADF;
 	pos = f.file->f_pos;
-	ret = compat_writev(f.file, vec, vlen, &pos);
+	ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	if (ret >= 0)
 		f.file->f_pos = pos;
 	fdput_pos(f);
 	return ret;
 }
 
-static long __compat_sys_pwritev64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+		const struct compat_iovec __user *, vec,
+		compat_ulong_t, vlen)
+{
+	return do_compat_writev(fd, vec, vlen, 0);
+}
+
+static long do_compat_pwritev64(unsigned long fd,
 				   const struct compat_iovec __user *vec,
-				   unsigned long vlen, loff_t pos)
+				   unsigned long vlen, loff_t pos, int flags)
 {
 	struct fd f;
 	ssize_t ret;
@@ -1176,7 +1254,7 @@ static long __compat_sys_pwritev64(unsigned long fd,
 		return -EBADF;
 	ret = -ESPIPE;
 	if (f.file->f_mode & FMODE_PWRITE)
-		ret = compat_writev(f.file, vec, vlen, &pos);
+		ret = compat_writev(f.file, vec, vlen, &pos, flags);
 	fdput(f);
 	return ret;
 }
@@ -1186,7 +1264,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
 		const struct compat_iovec __user *,vec,
 		unsigned long, vlen, loff_t, pos)
 {
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
 }
 #endif
 
@@ -1196,8 +1274,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
 {
 	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
 
-	return __compat_sys_pwritev64(fd, vec, vlen, pos);
+	return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+		const struct compat_iovec __user *,vec,
+		compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+	loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+	if (pos == -1)
+		return do_compat_writev(fd, vec, vlen, flags);
+
+	return do_compat_pwritev64(fd, vec, vlen, pos, flags);
 }
+
 #endif
 
 static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c917..fe4ccd0 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
 asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
 		const struct compat_iovec __user *vec,
 		compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+		const struct compat_iovec __user *vec,
+		compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
 
 #ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
 asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 185815c..d795472 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
 			     size_t count, loff_t pos);
 asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
 			   unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
 			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+			    unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+			    int flags);
 asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
 asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
 asmlinkage long sys_chdir(const char __user *filename);
-- 
2.1.4
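
For readers following the offset handling in the patch above, here is a
small user-space model (not kernel code) of how the two position halves
are combined and how preadv2/pwritev2 choose between the current-position
helpers and the explicit-offset helpers.  All model_* names and the enum
are made up for illustration; only pos_from_hilo(), HALF_LONG_BITS, the
compat shift by 32 and the pos == -1 check come from the diff.  Doing the
arithmetic in unsigned 64-bit integers is an assumption made here to keep
the sketch well-defined in portable C.

```c
#include <limits.h>
#include <stdint.h>

/* Illustrative user-space model; none of this is kernel code. */

/* Half the width of unsigned long in bits, mirroring HALF_LONG_BITS. */
#define MODEL_HALF_LONG_BITS ((int)(sizeof(unsigned long) * CHAR_BIT / 2))

enum model_path {
	MODEL_PATH_CURRENT_POS,		/* do_readv()/do_writev() in the patch */
	MODEL_PATH_EXPLICIT_POS,	/* do_preadv()/do_pwritev() in the patch */
};

/*
 * Native entry points: combine two unsigned long halves.  Shifting twice
 * by half the width of long keeps each shift count below 64.
 */
static int64_t model_pos_from_hilo(unsigned long high, unsigned long low)
{
	uint64_t pos = (((uint64_t)high << MODEL_HALF_LONG_BITS)
			<< MODEL_HALF_LONG_BITS) | low;

	return (int64_t)pos;
}

/* Compat entry points: the halves are always 32 bits wide. */
static int64_t model_pos_from_compat(uint32_t pos_high, uint32_t pos_low)
{
	uint64_t pos = ((uint64_t)pos_high << 32) | pos_low;

	return (int64_t)pos;
}

/* An offset of -1 selects the helpers that read and update the file position. */
static enum model_path model_pick_path(int64_t pos)
{
	return pos == -1 ? MODEL_PATH_CURRENT_POS : MODEL_PATH_EXPLICIT_POS;
}
```

In this model, a build with 64-bit long shifts the high half out
entirely, so an all-ones low half alone yields -1.  In the compat
variant both 32-bit halves have to be all ones to reach the
current-position path.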



end of thread, newest message: 2016-05-08  9:29 UTC

Thread overview: 31+ messages
2015-12-24 14:14 selective block polling and preadv2/pwritev2 revisited Christoph Hellwig
2015-12-24 14:14 ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev Christoph Hellwig
2015-12-24 14:14   ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2 Christoph Hellwig
2015-12-24 14:14   ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 3/6] x86: wire up preadv2 and pwritev2 Christoph Hellwig
2015-12-24 14:14   ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 4/6] vfs: add the RWF_HIPRI flag for preadv2/pwritev2 Christoph Hellwig
2015-12-24 14:14 ` [PATCH 5/6] direct-io: only use block polling if explicitly requested Christoph Hellwig
2015-12-24 14:14 ` [PATCH 6/6] blk-mq: enable polling support by default Christoph Hellwig
2016-01-04 14:58 ` selective block polling and preadv2/pwritev2 revisited Sagi Grimberg
2016-01-04 14:58   ` Sagi Grimberg
2016-01-04 16:39   ` Christoph Hellwig
2016-01-04 16:39     ` Christoph Hellwig
2016-01-06 17:01     ` Sagi Grimberg
2016-01-06 17:01       ` Sagi Grimberg
2016-01-06 22:49       ` Martin K. Petersen
2016-01-07 14:41         ` Sagi Grimberg
2016-01-07 14:41           ` Sagi Grimberg
2016-03-03 15:03 generic RDMA READ/WRITE API V2 Christoph Hellwig
2016-03-03 15:03 ` [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2 Christoph Hellwig
2016-03-10 18:15   ` Michael Kerrisk (man-pages)
2016-03-11  9:53     ` Christoph Hellwig
2016-03-11  9:53       ` Christoph Hellwig
2016-04-18 13:51       ` Michael Kerrisk (man-pages)
2016-04-18 13:51         ` Michael Kerrisk (man-pages)
2016-04-25  8:47         ` Christoph Hellwig
2016-04-25  8:47           ` Christoph Hellwig
2016-04-25 17:35           ` Michael Kerrisk (man-pages)
2016-04-25 17:35             ` Michael Kerrisk (man-pages)
2016-05-08  9:29             ` Christoph Hellwig
