* selective block polling and preadv2/pwritev2 revisited
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
This series allows to selectively enable/disable polling for completions
in the block layer [1] on a per-I/O basis. For this it resurrects the
preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
are much simpler now due to VFS changes that happened in the meantime).
That approach also had a man page update prepared, which I will resubmit
with the current flags once this series makes it in.
Polling for block I/O is important to reduce the latency on flash and
post-flash storage technologies. On the fastest NVMe controller I have
access to it almost halves latencies from over 7 microseconds to about 4
microseonds. But it only is usesful if we actually care for the latency
of this particular I/O, and generally is a waste if enabled for all I/O
to a given device. This series uses the per-I/O flags in preadv2/pwritev2
to control this behavior. The alternative would be a new O_* flag set
at open time or using fcntl, but this is still to corse-grained for some
applications and we're starting to run out out of open flags.
Note that there are plenty of other use cases for preadv2/pwritev2 as well,
but I'd like to concentrate on this one for now. Example are: non-blocking
reads (the original purpose), per-I/O O_SYNC, user space support for T10
DIF/DIX applications tags and probably some more.
[1] only supported for NVMe at the moment.
^ permalink raw reply [flat|nested] 22+ messages in thread
* selective block polling and preadv2/pwritev2 revisited
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg
Cc: milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
This series allows to selectively enable/disable polling for completions
in the block layer [1] on a per-I/O basis. For this it resurrects the
preadv2/pwritev2 syscalls that Milosz prepared a while ago (and which
are much simpler now due to VFS changes that happened in the meantime).
That approach also had a man page update prepared, which I will resubmit
with the current flags once this series makes it in.
Polling for block I/O is important to reduce the latency on flash and
post-flash storage technologies. On the fastest NVMe controller I have
access to it almost halves latencies from over 7 microseconds to about 4
microseonds. But it only is usesful if we actually care for the latency
of this particular I/O, and generally is a waste if enabled for all I/O
to a given device. This series uses the per-I/O flags in preadv2/pwritev2
to control this behavior. The alternative would be a new O_* flag set
at open time or using fcntl, but this is still to corse-grained for some
applications and we're starting to run out out of open flags.
Note that there are plenty of other use cases for preadv2/pwritev2 as well,
but I'd like to concentrate on this one for now. Example are: non-blocking
reads (the original purpose), per-I/O O_SYNC, user space support for T10
DIF/DIX applications tags and probably some more.
[1] only supported for NVMe at the moment.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
From: Milosz Tanski <milosz@adfin.com>
This way we can set kiocb flags also from the sync read/write path.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/nfsd/vfs.c | 4 ++--
fs/read_write.c | 44 ++++++++++++++++++++++++++------------------
fs/splice.c | 2 +-
include/linux/fs.h | 4 ++--
4 files changed, 31 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 994d66f..3a9f7bf 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -855,7 +855,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
oldfs = get_fs();
set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
set_fs(oldfs);
return nfsd_finish_read(file, count, host_err);
}
@@ -942,7 +942,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
- host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+ host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
set_fs(oldfs);
if (host_err < 0)
goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..34a2920 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -653,11 +653,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
EXPORT_SYMBOL(iov_shorten);
static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, iter_fn_t fn)
+ loff_t *ppos, iter_fn_t fn, int flags)
{
struct kiocb kiocb;
ssize_t ret;
+ if (flags)
+ return -EOPNOTSUPP;
+
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = *ppos;
@@ -669,10 +672,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
/* Do it by hand, with file-ops */
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, io_fn_t fn)
+ loff_t *ppos, io_fn_t fn, int flags)
{
ssize_t ret = 0;
+ if (flags)
+ return -EOPNOTSUPP;
+
while (iov_iter_count(iter)) {
struct iovec iovec = iov_iter_iovec(iter);
ssize_t nr;
@@ -773,7 +779,8 @@ out:
static ssize_t do_readv_writev(int type, struct file *file,
const struct iovec __user * uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
size_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -805,9 +812,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -824,27 +831,27 @@ out:
}
ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
- return do_readv_writev(READ, file, vec, vlen, pos);
+ return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_readv);
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
- return do_readv_writev(WRITE, file, vec, vlen, pos);
+ return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_writev);
@@ -857,7 +864,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -877,7 +884,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -909,7 +916,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -933,7 +940,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -947,7 +954,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
static ssize_t compat_do_readv_writev(int type, struct file *file,
const struct compat_iovec __user *uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -979,9 +987,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -1010,7 +1018,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
out:
if (ret > 0)
@@ -1087,7 +1095,7 @@ static size_t compat_writev(struct file *file,
if (!(file->f_mode & FMODE_CAN_WRITE))
goto out;
- ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
out:
if (ret > 0)
diff --git a/fs/splice.c b/fs/splice.c
index 801c21c..f357bc0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -579,7 +579,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
old_fs = get_fs();
set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+ res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
set_fs(old_fs);
return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa5142..2b0e078 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1677,9 +1677,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb);
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg
Cc: milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
From: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
This way we can set kiocb flags also from the sync read/write path.
Signed-off-by: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
fs/nfsd/vfs.c | 4 ++--
fs/read_write.c | 44 ++++++++++++++++++++++++++------------------
fs/splice.c | 2 +-
include/linux/fs.h | 4 ++--
4 files changed, 31 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 994d66f..3a9f7bf 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -855,7 +855,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
oldfs = get_fs();
set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
set_fs(oldfs);
return nfsd_finish_read(file, count, host_err);
}
@@ -942,7 +942,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
- host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+ host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
set_fs(oldfs);
if (host_err < 0)
goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index 819ef3f..34a2920 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -653,11 +653,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
EXPORT_SYMBOL(iov_shorten);
static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, iter_fn_t fn)
+ loff_t *ppos, iter_fn_t fn, int flags)
{
struct kiocb kiocb;
ssize_t ret;
+ if (flags)
+ return -EOPNOTSUPP;
+
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = *ppos;
@@ -669,10 +672,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
/* Do it by hand, with file-ops */
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, io_fn_t fn)
+ loff_t *ppos, io_fn_t fn, int flags)
{
ssize_t ret = 0;
+ if (flags)
+ return -EOPNOTSUPP;
+
while (iov_iter_count(iter)) {
struct iovec iovec = iov_iter_iovec(iter);
ssize_t nr;
@@ -773,7 +779,8 @@ out:
static ssize_t do_readv_writev(int type, struct file *file,
const struct iovec __user * uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
size_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -805,9 +812,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -824,27 +831,27 @@ out:
}
ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
- return do_readv_writev(READ, file, vec, vlen, pos);
+ return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_readv);
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
- return do_readv_writev(WRITE, file, vec, vlen, pos);
+ return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_writev);
@@ -857,7 +864,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -877,7 +884,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -909,7 +916,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -933,7 +940,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -947,7 +954,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
static ssize_t compat_do_readv_writev(int type, struct file *file,
const struct compat_iovec __user *uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -979,9 +987,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -1010,7 +1018,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
out:
if (ret > 0)
@@ -1087,7 +1095,7 @@ static size_t compat_writev(struct file *file,
if (!(file->f_mode & FMODE_CAN_WRITE))
goto out;
- ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
out:
if (ret > 0)
diff --git a/fs/splice.c b/fs/splice.c
index 801c21c..f357bc0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -579,7 +579,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
old_fs = get_fs();
set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+ res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
set_fs(old_fs);
return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3aa5142..2b0e078 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1677,9 +1677,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb);
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
From: Milosz Tanski <milosz@adfin.com>
New syscalls that take an flag argument. This change does not add any
specific flags.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/read_write.c | 162 +++++++++++++++++++++++++++++++++++++----------
include/linux/compat.h | 6 ++
include/linux/syscalls.h | 6 ++
3 files changed, 139 insertions(+), 35 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 34a2920..caa30ac 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -856,15 +856,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
EXPORT_SYMBOL(vfs_writev);
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+ ret = vfs_readv(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -876,15 +876,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+ ret = vfs_writev(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -902,10 +902,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
}
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, loff_t pos, int flags)
{
- loff_t pos = pos_from_hilo(pos_h, pos_l);
struct fd f;
ssize_t ret = -EBADF;
@@ -916,7 +915,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+ ret = vfs_readv(f.file, vec, vlen, &pos, flags);
fdput(f);
}
@@ -926,10 +925,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, loff_t pos, int flags)
{
- loff_t pos = pos_from_hilo(pos_h, pos_l);
struct fd f;
ssize_t ret = -EBADF;
@@ -940,7 +938,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+ ret = vfs_writev(f.file, vec, vlen, &pos, flags);
fdput(f);
}
@@ -950,6 +948,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen)
+{
+ return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen)
+{
+ return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+ int, flags)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ if (pos == -1)
+ return do_readv(fd, vec, vlen, flags);
+
+ return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+ int, flags)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ if (pos == -1)
+ return do_writev(fd, vec, vlen, flags);
+
+ return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
#ifdef CONFIG_COMPAT
static ssize_t compat_do_readv_writev(int type, struct file *file,
@@ -1007,7 +1057,7 @@ out:
static size_t compat_readv(struct file *file,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
ssize_t ret = -EBADF;
@@ -1018,7 +1068,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
out:
if (ret > 0)
@@ -1027,9 +1077,9 @@ out:
return ret;
}
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
- const struct compat_iovec __user *,vec,
- compat_ulong_t, vlen)
+static size_t do_compat_readv(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret;
@@ -1038,16 +1088,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
if (!f.file)
return -EBADF;
pos = f.file->f_pos;
- ret = compat_readv(f.file, vec, vlen, &pos);
+ ret = compat_readv(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
f.file->f_pos = pos;
fdput_pos(f);
return ret;
+
+}
+
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen)
+{
+ return do_compat_readv(fd, vec, vlen, 0);
}
-static long __compat_sys_preadv64(unsigned long fd,
+static long do_compat_preadv64(unsigned long fd,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos)
+ unsigned long vlen, loff_t pos, int flags)
{
struct fd f;
ssize_t ret;
@@ -1059,7 +1117,7 @@ static long __compat_sys_preadv64(unsigned long fd,
return -EBADF;
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = compat_readv(f.file, vec, vlen, &pos);
+ ret = compat_readv(f.file, vec, vlen, &pos, flags);
fdput(f);
return ret;
}
@@ -1069,7 +1127,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
const struct compat_iovec __user *,vec,
unsigned long, vlen, loff_t, pos)
{
- return __compat_sys_preadv64(fd, vec, vlen, pos);
+ return do_compat_preadv64(fd, vec, vlen, pos, 0);
}
#endif
@@ -1079,12 +1137,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
{
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
- return __compat_sys_preadv64(fd, vec, vlen, pos);
+ return do_compat_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+ int, flags)
+{
+ loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+ if (pos == -1)
+ return do_compat_readv(fd, vec, vlen, flags);
+
+ return do_compat_preadv64(fd, vec, vlen, pos, flags);
}
static size_t compat_writev(struct file *file,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
ssize_t ret = -EBADF;
@@ -1104,9 +1175,9 @@ out:
return ret;
}
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
- const struct compat_iovec __user *, vec,
- compat_ulong_t, vlen)
+static size_t do_compat_writev(compat_ulong_t fd,
+ const struct compat_iovec __user* vec,
+ compat_ulong_t vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret;
@@ -1115,28 +1186,36 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
if (!f.file)
return -EBADF;
pos = f.file->f_pos;
- ret = compat_writev(f.file, vec, vlen, &pos);
+ ret = compat_writev(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
f.file->f_pos = pos;
fdput_pos(f);
return ret;
}
-static long __compat_sys_pwritev64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+ const struct compat_iovec __user *, vec,
+ compat_ulong_t, vlen)
+{
+ return do_compat_writev(fd, vec, vlen, 0);
+}
+
+static long do_compat_pwritev64(unsigned long fd,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos)
+ unsigned long vlen, loff_t pos, int flags)
{
struct fd f;
ssize_t ret;
if (pos < 0)
return -EINVAL;
+
f = fdget(fd);
if (!f.file)
return -EBADF;
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = compat_writev(f.file, vec, vlen, &pos);
+ ret = compat_writev(f.file, vec, vlen, &pos, flags);
fdput(f);
return ret;
}
@@ -1146,7 +1225,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
const struct compat_iovec __user *,vec,
unsigned long, vlen, loff_t, pos)
{
- return __compat_sys_pwritev64(fd, vec, vlen, pos);
+ return do_compat_pwritev64(fd, vec, vlen, pos, 0);
}
#endif
@@ -1156,8 +1235,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
{
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
- return __compat_sys_pwritev64(fd, vec, vlen, pos);
+ return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+ loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+ if (pos == -1)
+ return do_compat_writev(fd, vec, vlen, flags);
+
+ return do_compat_pwritev64(fd, vec, vlen, pos, flags);
}
+
#endif
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c917..fe4ccd0 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
#ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a156b82..c4fac0d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
size_t count, loff_t pos);
asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+ int flags);
asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+ int flags);
asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
asmlinkage long sys_chdir(const char __user *filename);
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg
Cc: milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
From: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
New syscalls that take an flag argument. This change does not add any
specific flags.
Signed-off-by: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
fs/read_write.c | 162 +++++++++++++++++++++++++++++++++++++----------
include/linux/compat.h | 6 ++
include/linux/syscalls.h | 6 ++
3 files changed, 139 insertions(+), 35 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 34a2920..caa30ac 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -856,15 +856,15 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
EXPORT_SYMBOL(vfs_writev);
-SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen)
+static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+ ret = vfs_readv(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -876,15 +876,15 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
-SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen)
+static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+ ret = vfs_writev(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -902,10 +902,9 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
return (((loff_t)high << HALF_LONG_BITS) << HALF_LONG_BITS) | low;
}
-SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, loff_t pos, int flags)
{
- loff_t pos = pos_from_hilo(pos_h, pos_l);
struct fd f;
ssize_t ret = -EBADF;
@@ -916,7 +915,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos, 0);
+ ret = vfs_readv(f.file, vec, vlen, &pos, flags);
fdput(f);
}
@@ -926,10 +925,9 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
-SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
- unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, loff_t pos, int flags)
{
- loff_t pos = pos_from_hilo(pos_h, pos_l);
struct fd f;
ssize_t ret = -EBADF;
@@ -940,7 +938,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos, 0);
+ ret = vfs_writev(f.file, vec, vlen, &pos, flags);
fdput(f);
}
@@ -950,6 +948,58 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
return ret;
}
+SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen)
+{
+ return do_readv(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen)
+{
+ return do_writev(fd, vec, vlen, 0);
+}
+
+SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ return do_preadv(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(preadv2, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+ int, flags)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ if (pos == -1)
+ return do_readv(fd, vec, vlen, flags);
+
+ return do_preadv(fd, vec, vlen, pos, flags);
+}
+
+SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ return do_pwritev(fd, vec, vlen, pos, 0);
+}
+
+SYSCALL_DEFINE6(pwritev2, unsigned long, fd, const struct iovec __user *, vec,
+ unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h,
+ int, flags)
+{
+ loff_t pos = pos_from_hilo(pos_h, pos_l);
+
+ if (pos == -1)
+ return do_writev(fd, vec, vlen, flags);
+
+ return do_pwritev(fd, vec, vlen, pos, flags);
+}
+
#ifdef CONFIG_COMPAT
static ssize_t compat_do_readv_writev(int type, struct file *file,
@@ -1007,7 +1057,7 @@ out:
static size_t compat_readv(struct file *file,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
ssize_t ret = -EBADF;
@@ -1018,7 +1068,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, flags);
out:
if (ret > 0)
@@ -1027,9 +1077,9 @@ out:
return ret;
}
-COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
- const struct compat_iovec __user *,vec,
- compat_ulong_t, vlen)
+static size_t do_compat_readv(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret;
@@ -1038,16 +1088,24 @@ COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
if (!f.file)
return -EBADF;
pos = f.file->f_pos;
- ret = compat_readv(f.file, vec, vlen, &pos);
+ ret = compat_readv(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
f.file->f_pos = pos;
fdput_pos(f);
return ret;
+
+}
+
+COMPAT_SYSCALL_DEFINE3(readv, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen)
+{
+ return do_compat_readv(fd, vec, vlen, 0);
}
-static long __compat_sys_preadv64(unsigned long fd,
+static long do_compat_preadv64(unsigned long fd,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos)
+ unsigned long vlen, loff_t pos, int flags)
{
struct fd f;
ssize_t ret;
@@ -1059,7 +1117,7 @@ static long __compat_sys_preadv64(unsigned long fd,
return -EBADF;
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = compat_readv(f.file, vec, vlen, &pos);
+ ret = compat_readv(f.file, vec, vlen, &pos, flags);
fdput(f);
return ret;
}
@@ -1069,7 +1127,7 @@ COMPAT_SYSCALL_DEFINE4(preadv64, unsigned long, fd,
const struct compat_iovec __user *,vec,
unsigned long, vlen, loff_t, pos)
{
- return __compat_sys_preadv64(fd, vec, vlen, pos);
+ return do_compat_preadv64(fd, vec, vlen, pos, 0);
}
#endif
@@ -1079,12 +1137,25 @@ COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,
{
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
- return __compat_sys_preadv64(fd, vec, vlen, pos);
+ return do_compat_preadv64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(preadv2, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen, u32, pos_low, u32, pos_high,
+ int, flags)
+{
+ loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+ if (pos == -1)
+ return do_compat_readv(fd, vec, vlen, flags);
+
+ return do_compat_preadv64(fd, vec, vlen, pos, flags);
}
static size_t compat_writev(struct file *file,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
ssize_t ret = -EBADF;
@@ -1104,9 +1175,9 @@ out:
return ret;
}
-COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
- const struct compat_iovec __user *, vec,
- compat_ulong_t, vlen)
+static size_t do_compat_writev(compat_ulong_t fd,
+ const struct compat_iovec __user* vec,
+ compat_ulong_t vlen, int flags)
{
struct fd f = fdget_pos(fd);
ssize_t ret;
@@ -1115,28 +1186,36 @@ COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
if (!f.file)
return -EBADF;
pos = f.file->f_pos;
- ret = compat_writev(f.file, vec, vlen, &pos);
+ ret = compat_writev(f.file, vec, vlen, &pos, flags);
if (ret >= 0)
f.file->f_pos = pos;
fdput_pos(f);
return ret;
}
-static long __compat_sys_pwritev64(unsigned long fd,
+COMPAT_SYSCALL_DEFINE3(writev, compat_ulong_t, fd,
+ const struct compat_iovec __user *, vec,
+ compat_ulong_t, vlen)
+{
+ return do_compat_writev(fd, vec, vlen, 0);
+}
+
+static long do_compat_pwritev64(unsigned long fd,
const struct compat_iovec __user *vec,
- unsigned long vlen, loff_t pos)
+ unsigned long vlen, loff_t pos, int flags)
{
struct fd f;
ssize_t ret;
if (pos < 0)
return -EINVAL;
+
f = fdget(fd);
if (!f.file)
return -EBADF;
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = compat_writev(f.file, vec, vlen, &pos);
+ ret = compat_writev(f.file, vec, vlen, &pos, flags);
fdput(f);
return ret;
}
@@ -1146,7 +1225,7 @@ COMPAT_SYSCALL_DEFINE4(pwritev64, unsigned long, fd,
const struct compat_iovec __user *,vec,
unsigned long, vlen, loff_t, pos)
{
- return __compat_sys_pwritev64(fd, vec, vlen, pos);
+ return do_compat_pwritev64(fd, vec, vlen, pos, 0);
}
#endif
@@ -1156,8 +1235,21 @@ COMPAT_SYSCALL_DEFINE5(pwritev, compat_ulong_t, fd,
{
loff_t pos = ((loff_t)pos_high << 32) | pos_low;
- return __compat_sys_pwritev64(fd, vec, vlen, pos);
+ return do_compat_pwritev64(fd, vec, vlen, pos, 0);
+}
+
+COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
+ const struct compat_iovec __user *,vec,
+ compat_ulong_t, vlen, u32, pos_low, u32, pos_high, int, flags)
+{
+ loff_t pos = ((loff_t)pos_high << 32) | pos_low;
+
+ if (pos == -1)
+ return do_compat_writev(fd, vec, vlen, flags);
+
+ return do_compat_pwritev64(fd, vec, vlen, pos, flags);
}
+
#endif
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
diff --git a/include/linux/compat.h b/include/linux/compat.h
index a76c917..fe4ccd0 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -340,6 +340,12 @@ asmlinkage ssize_t compat_sys_preadv(compat_ulong_t fd,
asmlinkage ssize_t compat_sys_pwritev(compat_ulong_t fd,
const struct compat_iovec __user *vec,
compat_ulong_t vlen, u32 pos_low, u32 pos_high);
+asmlinkage ssize_t compat_sys_preadv2(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
+asmlinkage ssize_t compat_sys_pwritev2(compat_ulong_t fd,
+ const struct compat_iovec __user *vec,
+ compat_ulong_t vlen, u32 pos_low, u32 pos_high, int flags);
#ifdef __ARCH_WANT_COMPAT_SYS_PREADV64
asmlinkage long compat_sys_preadv64(unsigned long fd,
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index a156b82..c4fac0d 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -575,8 +575,14 @@ asmlinkage long sys_pwrite64(unsigned int fd, const char __user *buf,
size_t count, loff_t pos);
asmlinkage long sys_preadv(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_preadv2(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+ int flags);
asmlinkage long sys_pwritev(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, unsigned long pos_l, unsigned long pos_h);
+asmlinkage long sys_pwritev2(unsigned long fd, const struct iovec __user *vec,
+ unsigned long vlen, unsigned long pos_l, unsigned long pos_h,
+ int flags);
asmlinkage long sys_getcwd(char __user *buf, unsigned long size);
asmlinkage long sys_mkdir(const char __user *pathname, umode_t mode);
asmlinkage long sys_chdir(const char __user *filename);
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/6] x86: wire up preadv2 and pwritev2
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
From: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased due to newly added syscalls]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
2 files changed, 4 insertions(+)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f17705e..13e33ced 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -383,3 +383,5 @@
374 i386 userfaultfd sys_userfaultfd
375 i386 membarrier sys_membarrier
376 i386 mlock2 sys_mlock2
+377 i386 preadv2 sys_preadv2
+378 i386 pwritev2 sys_pwritev2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 314a90b..2108dae 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -332,6 +332,8 @@
323 common userfaultfd sys_userfaultfd
324 common membarrier sys_membarrier
325 common mlock2 sys_mlock2
+326 64 preadv2 sys_preadv2
+327 64 pwritev2 sys_pwritev2
#
# x32-specific system call numbers start at 512 to avoid cache impact
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/6] x86: wire up preadv2 and pwritev2
@ 2015-12-24 14:14 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg
Cc: milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
From: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
[hch: rebased due to newly added syscalls]
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
arch/x86/entry/syscalls/syscall_32.tbl | 2 ++
arch/x86/entry/syscalls/syscall_64.tbl | 2 ++
2 files changed, 4 insertions(+)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index f17705e..13e33ced 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -383,3 +383,5 @@
374 i386 userfaultfd sys_userfaultfd
375 i386 membarrier sys_membarrier
376 i386 mlock2 sys_mlock2
+377 i386 preadv2 sys_preadv2
+378 i386 pwritev2 sys_pwritev2
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 314a90b..2108dae 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -332,6 +332,8 @@
323 common userfaultfd sys_userfaultfd
324 common membarrier sys_membarrier
325 common mlock2 sys_mlock2
+326 64 preadv2 sys_preadv2
+327 64 pwritev2 sys_pwritev2
#
# x32-specific system call numbers start at 512 to avoid cache impact
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/6] vfs: add the RWF_HIPRI flag for preadv2/pwritev2
2015-12-24 14:14 ` Christoph Hellwig
` (3 preceding siblings ...)
(?)
@ 2015-12-24 14:14 ` Christoph Hellwig
-1 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
This adds a flag that tells the file system that this is a high priority
request for which it's worth to poll the hardware. The flag is purely
advisory and can be ignored if not supported.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/read_write.c | 6 ++++--
include/linux/fs.h | 1 +
include/uapi/linux/fs.h | 3 +++
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index caa30ac..4dc377e 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -658,10 +658,12 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
struct kiocb kiocb;
ssize_t ret;
- if (flags)
+ if (flags & ~RWF_HIPRI)
return -EOPNOTSUPP;
init_sync_kiocb(&kiocb, filp);
+ if (flags & RWF_HIPRI)
+ kiocb.ki_flags |= IOCB_HIPRI;
kiocb.ki_pos = *ppos;
ret = fn(&kiocb, iter);
@@ -676,7 +678,7 @@ static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
{
ssize_t ret = 0;
- if (flags)
+ if (flags & ~RWF_HIPRI)
return -EOPNOTSUPP;
while (iov_iter_count(iter)) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2b0e078..0247620 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -319,6 +319,7 @@ struct writeback_control;
#define IOCB_EVENTFD (1 << 0)
#define IOCB_APPEND (1 << 1)
#define IOCB_DIRECT (1 << 2)
+#define IOCB_HIPRI (1 << 3)
struct kiocb {
struct file *ki_filp;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index f15d980..42f7627 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -208,4 +208,7 @@ struct inodes_stat_t {
#define SYNC_FILE_RANGE_WRITE 2
#define SYNC_FILE_RANGE_WAIT_AFTER 4
+/* flags for preadv2/pwritev2: */
+#define RWF_HIPRI 0x00000001 /* high priority request, poll if possible */
+
#endif /* _UAPI_LINUX_FS_H */
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 5/6] direct-io: only use block polling if explicitly requested
2015-12-24 14:14 ` Christoph Hellwig
` (4 preceding siblings ...)
(?)
@ 2015-12-24 14:14 ` Christoph Hellwig
-1 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/direct-io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/direct-io.c b/fs/direct-io.c
index cb5337d..904ff7f 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -445,7 +445,8 @@ static struct bio *dio_await_one(struct dio *dio)
__set_current_state(TASK_UNINTERRUPTIBLE);
dio->waiter = current;
spin_unlock_irqrestore(&dio->bio_lock, flags);
- if (!blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
+ if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
+ !blk_poll(bdev_get_queue(dio->bio_bdev), dio->bio_cookie))
io_schedule();
/* wake up sets us TASK_RUNNING */
spin_lock_irqsave(&dio->bio_lock, flags);
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 6/6] blk-mq: enable polling support by default
2015-12-24 14:14 ` Christoph Hellwig
` (5 preceding siblings ...)
(?)
@ 2015-12-24 14:14 ` Christoph Hellwig
-1 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-12-24 14:14 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
Now that applications need to explicitly ask for polling we can enable it
by default in blk-mq drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e711f29..1b73222 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -496,7 +496,8 @@ struct request_queue {
#define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
- (1 << QUEUE_FLAG_SAME_COMP))
+ (1 << QUEUE_FLAG_SAME_COMP) | \
+ (1 << QUEUE_FLAG_POLL))
static inline void queue_lockdep_assert_held(struct request_queue *q)
{
--
1.9.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-04 14:58 ` Sagi Grimberg
0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2016-01-04 14:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api
Hi Christoph,
> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
> but I'd like to concentrate on this one for now. Example are: non-blocking
> reads (the original purpose), per-I/O O_SYNC, user space support for T10
> DIF/DIX applications tags and probably some more.
So I'm trying to understand how can integrity metadata be used here.
Will the user-app append the meta-data to the data iovec (given there
is no metadata iovec)? If so, how will we separate data from metadata?
Cheers,
Sagi.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-04 14:58 ` Sagi Grimberg
0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2016-01-04 14:58 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg,
milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
Hi Christoph,
> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
> but I'd like to concentrate on this one for now. Example are: non-blocking
> reads (the original purpose), per-I/O O_SYNC, user space support for T10
> DIF/DIX applications tags and probably some more.
So I'm trying to understand how can integrity metadata be used here.
Will the user-app append the meta-data to the data iovec (given there
is no metadata iovec)? If so, how will we separate data from metadata?
Cheers,
Sagi.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-04 16:39 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2016-01-04 16:39 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
linux-block, linux-api
On Mon, Jan 04, 2016 at 04:58:38PM +0200, Sagi Grimberg wrote:
> Hi Christoph,
>
>> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
>> but I'd like to concentrate on this one for now. Example are: non-blocking
>> reads (the original purpose), per-I/O O_SYNC, user space support for T10
>> DIF/DIX applications tags and probably some more.
>
> So I'm trying to understand how can integrity metadata be used here.
> Will the user-app append the meta-data to the data iovec (given there
> is no metadata iovec)? If so, how will we separate data from metadata?
The idea that was floated aroud a few times was to have a flag where
the first half of the vectors would be the data, and the second half
the metadata.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-04 16:39 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2016-01-04 16:39 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
axboe-b10kYP2dOMg, milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
On Mon, Jan 04, 2016 at 04:58:38PM +0200, Sagi Grimberg wrote:
> Hi Christoph,
>
>> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
>> but I'd like to concentrate on this one for now. Example are: non-blocking
>> reads (the original purpose), per-I/O O_SYNC, user space support for T10
>> DIF/DIX applications tags and probably some more.
>
> So I'm trying to understand how can integrity metadata be used here.
> Will the user-app append the meta-data to the data iovec (given there
> is no metadata iovec)? If so, how will we separate data from metadata?
The idea that was floated aroud a few times was to have a flag where
the first half of the vectors would be the data, and the second half
the metadata.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-06 17:01 ` Sagi Grimberg
0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2016-01-06 17:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro, axboe, milosz, linux-fsdevel, linux-block, linux-api
>> Hi Christoph,
>>
>>> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
>>> but I'd like to concentrate on this one for now. Example are: non-blocking
>>> reads (the original purpose), per-I/O O_SYNC, user space support for T10
>>> DIF/DIX applications tags and probably some more.
>>
>> So I'm trying to understand how can integrity metadata be used here.
>> Will the user-app append the meta-data to the data iovec (given there
>> is no metadata iovec)? If so, how will we separate data from metadata?
>
> The idea that was floated aroud a few times was to have a flag where
> the first half of the vectors would be the data, and the second half
> the metadata.
This means that the user would need to pass iovec entries of 8 bytes
correct? Seems like a waste for large IOs (sorry for diverging from the
subject)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-06 17:01 ` Sagi Grimberg
0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2016-01-06 17:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg,
milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
>> Hi Christoph,
>>
>>> Note that there are plenty of other use cases for preadv2/pwritev2 as well,
>>> but I'd like to concentrate on this one for now. Example are: non-blocking
>>> reads (the original purpose), per-I/O O_SYNC, user space support for T10
>>> DIF/DIX applications tags and probably some more.
>>
>> So I'm trying to understand how can integrity metadata be used here.
>> Will the user-app append the meta-data to the data iovec (given there
>> is no metadata iovec)? If so, how will we separate data from metadata?
>
> The idea that was floated aroud a few times was to have a flag where
> the first half of the vectors would be the data, and the second half
> the metadata.
This means that the user would need to pass iovec entries of 8 bytes
correct? Seems like a waste for large IOs (sorry for diverging from the
subject)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
2016-01-06 17:01 ` Sagi Grimberg
(?)
@ 2016-01-06 22:49 ` Martin K. Petersen
2016-01-07 14:41 ` Sagi Grimberg
-1 siblings, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2016-01-06 22:49 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
linux-block, linux-api
>>>>> "Sagi" == Sagi Grimberg <sagig@dev.mellanox.co.il> writes:
>> The idea that was floated aroud a few times was to have a flag where
>> the first half of the vectors would be the data, and the second half
>> the metadata.
Sagi> This means that the user would need to pass iovec entries of 8
Sagi> bytes correct? Seems like a waste for large IOs (sorry for
Sagi> diverging from the subject)
The assumption was that there would be a 1:1 mapping between the number
of data buffers and the metadata ditto. But nothing says that a data
iovec entry is limited in size to a single sector.
The other option to have a single iovec at the end representing the
metadata for all data buffers. I think there are valid use cases for
either approach and we may end up having to support both via a flag.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-07 14:41 ` Sagi Grimberg
0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2016-01-07 14:41 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, viro, axboe, milosz, linux-fsdevel,
linux-block, linux-api
Hi Martin,
> Sagi> This means that the user would need to pass iovec entries of 8
> Sagi> bytes correct? Seems like a waste for large IOs (sorry for
> Sagi> diverging from the subject)
>
> The assumption was that there would be a 1:1 mapping between the number
> of data buffers and the metadata ditto. But nothing says that a data
> iovec entry is limited in size to a single sector.
Yea... I meant 1:1, I got confused on the 8 bytes comment...
> The other option to have a single iovec at the end representing the
> metadata for all data buffers. I think there are valid use cases for
> either approach and we may end up having to support both via a flag.
Either approach presents limitation, but I guess user-space can deal
with it...
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: selective block polling and preadv2/pwritev2 revisited
@ 2016-01-07 14:41 ` Sagi Grimberg
0 siblings, 0 replies; 22+ messages in thread
From: Sagi Grimberg @ 2016-01-07 14:41 UTC (permalink / raw)
To: Martin K. Petersen
Cc: Christoph Hellwig, viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn,
axboe-b10kYP2dOMg, milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
Hi Martin,
> Sagi> This means that the user would need to pass iovec entries of 8
> Sagi> bytes correct? Seems like a waste for large IOs (sorry for
> Sagi> diverging from the subject)
>
> The assumption was that there would be a 1:1 mapping between the number
> of data buffers and the metadata ditto. But nothing says that a data
> iovec entry is limited in size to a single sector.
Yea... I meant 1:1, I got confused on the 8 bytes comment...
> The other option to have a single iovec at the end representing the
> metadata for all data buffers. I think there are valid use cases for
> either approach and we may end up having to support both via a flag.
Either approach presents limitation, but I guess user-space can deal
with it...
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev
@ 2016-03-03 15:03 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2016-03-03 15:03 UTC (permalink / raw)
To: viro, axboe; +Cc: milosz, linux-fsdevel, linux-block, linux-api
This way we can set kiocb flags also from the sync read/write path for
the read_iter/write_iter operations. For now there is no way to pass
flags to plain read/write operations as there is no real need for that,
and all flags passed are explicitly rejected for these files.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stephen Bates <stephen.bates@pmcs.com>
Tested-by: Stephen Bates <stephen.bates@pmcs.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
---
fs/nfsd/vfs.c | 4 ++--
fs/read_write.c | 44 ++++++++++++++++++++++++++------------------
fs/splice.c | 2 +-
include/linux/fs.h | 4 ++--
4 files changed, 31 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5d2a57e..d40010e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -870,7 +870,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
oldfs = get_fs();
set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
set_fs(oldfs);
return nfsd_finish_read(file, count, host_err);
}
@@ -957,7 +957,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
- host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+ host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
set_fs(oldfs);
if (host_err < 0)
goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index dadf24e..3b7577d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -693,11 +693,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
EXPORT_SYMBOL(iov_shorten);
static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, iter_fn_t fn)
+ loff_t *ppos, iter_fn_t fn, int flags)
{
struct kiocb kiocb;
ssize_t ret;
+ if (flags)
+ return -EOPNOTSUPP;
+
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = *ppos;
@@ -709,10 +712,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
/* Do it by hand, with file-ops */
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, io_fn_t fn)
+ loff_t *ppos, io_fn_t fn, int flags)
{
ssize_t ret = 0;
+ if (flags)
+ return -EOPNOTSUPP;
+
while (iov_iter_count(iter)) {
struct iovec iovec = iov_iter_iovec(iter);
ssize_t nr;
@@ -813,7 +819,8 @@ out:
static ssize_t do_readv_writev(int type, struct file *file,
const struct iovec __user * uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
size_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -845,9 +852,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -864,27 +871,27 @@ out:
}
ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
- return do_readv_writev(READ, file, vec, vlen, pos);
+ return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_readv);
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
- return do_readv_writev(WRITE, file, vec, vlen, pos);
+ return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_writev);
@@ -897,7 +904,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -917,7 +924,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -949,7 +956,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -973,7 +980,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -987,7 +994,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
static ssize_t compat_do_readv_writev(int type, struct file *file,
const struct compat_iovec __user *uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -1019,9 +1027,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -1050,7 +1058,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
out:
if (ret > 0)
@@ -1127,7 +1135,7 @@ static size_t compat_writev(struct file *file,
if (!(file->f_mode & FMODE_CAN_WRITE))
goto out;
- ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
out:
if (ret > 0)
diff --git a/fs/splice.c b/fs/splice.c
index 82bc0d6..3dc1426 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -577,7 +577,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
old_fs = get_fs();
set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+ res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
set_fs(old_fs);
return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ae68100..875277a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1709,9 +1709,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
--
2.1.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev
@ 2016-03-03 15:03 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2016-03-03 15:03 UTC (permalink / raw)
To: viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn, axboe-b10kYP2dOMg
Cc: milosz-B5zB6C1i6pkAvxtiuMwx3w,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
linux-block-u79uwXL29TY76Z2rM5mHXA,
linux-api-u79uwXL29TY76Z2rM5mHXA
This way we can set kiocb flags also from the sync read/write path for
the read_iter/write_iter operations. For now there is no way to pass
flags to plain read/write operations as there is no real need for that,
and all flags passed are explicitly rejected for these files.
Signed-off-by: Milosz Tanski <milosz-B5zB6C1i6pkAvxtiuMwx3w@public.gmane.org>
[hch: rebased on top of my kiocb changes]
Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Reviewed-by: Stephen Bates <stephen.bates-PwyqCcigF0Q@public.gmane.org>
Tested-by: Stephen Bates <stephen.bates-PwyqCcigF0Q@public.gmane.org>
Acked-by: Jeff Moyer <jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
fs/nfsd/vfs.c | 4 ++--
fs/read_write.c | 44 ++++++++++++++++++++++++++------------------
fs/splice.c | 2 +-
include/linux/fs.h | 4 ++--
4 files changed, 31 insertions(+), 23 deletions(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5d2a57e..d40010e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -870,7 +870,7 @@ __be32 nfsd_readv(struct file *file, loff_t offset, struct kvec *vec, int vlen,
oldfs = get_fs();
set_fs(KERNEL_DS);
- host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset);
+ host_err = vfs_readv(file, (struct iovec __user *)vec, vlen, &offset, 0);
set_fs(oldfs);
return nfsd_finish_read(file, count, host_err);
}
@@ -957,7 +957,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
/* Write the data. */
oldfs = get_fs(); set_fs(KERNEL_DS);
- host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos);
+ host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &pos, 0);
set_fs(oldfs);
if (host_err < 0)
goto out_nfserr;
diff --git a/fs/read_write.c b/fs/read_write.c
index dadf24e..3b7577d 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -693,11 +693,14 @@ unsigned long iov_shorten(struct iovec *iov, unsigned long nr_segs, size_t to)
EXPORT_SYMBOL(iov_shorten);
static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, iter_fn_t fn)
+ loff_t *ppos, iter_fn_t fn, int flags)
{
struct kiocb kiocb;
ssize_t ret;
+ if (flags)
+ return -EOPNOTSUPP;
+
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = *ppos;
@@ -709,10 +712,13 @@ static ssize_t do_iter_readv_writev(struct file *filp, struct iov_iter *iter,
/* Do it by hand, with file-ops */
static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
- loff_t *ppos, io_fn_t fn)
+ loff_t *ppos, io_fn_t fn, int flags)
{
ssize_t ret = 0;
+ if (flags)
+ return -EOPNOTSUPP;
+
while (iov_iter_count(iter)) {
struct iovec iovec = iov_iter_iovec(iter);
ssize_t nr;
@@ -813,7 +819,8 @@ out:
static ssize_t do_readv_writev(int type, struct file *file,
const struct iovec __user * uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
size_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -845,9 +852,9 @@ static ssize_t do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -864,27 +871,27 @@ out:
}
ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_READ))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_READ))
return -EINVAL;
- return do_readv_writev(READ, file, vec, vlen, pos);
+ return do_readv_writev(READ, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_readv);
ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
- unsigned long vlen, loff_t *pos)
+ unsigned long vlen, loff_t *pos, int flags)
{
if (!(file->f_mode & FMODE_WRITE))
return -EBADF;
if (!(file->f_mode & FMODE_CAN_WRITE))
return -EINVAL;
- return do_readv_writev(WRITE, file, vec, vlen, pos);
+ return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
}
EXPORT_SYMBOL(vfs_writev);
@@ -897,7 +904,7 @@ SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -917,7 +924,7 @@ SYSCALL_DEFINE3(writev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
loff_t pos = file_pos_read(f.file);
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
if (ret >= 0)
file_pos_write(f.file, pos);
fdput_pos(f);
@@ -949,7 +956,7 @@ SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos);
+ ret = vfs_readv(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -973,7 +980,7 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
if (f.file) {
ret = -ESPIPE;
if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos);
+ ret = vfs_writev(f.file, vec, vlen, &pos, 0);
fdput(f);
}
@@ -987,7 +994,8 @@ SYSCALL_DEFINE5(pwritev, unsigned long, fd, const struct iovec __user *, vec,
static ssize_t compat_do_readv_writev(int type, struct file *file,
const struct compat_iovec __user *uvector,
- unsigned long nr_segs, loff_t *pos)
+ unsigned long nr_segs, loff_t *pos,
+ int flags)
{
compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
@@ -1019,9 +1027,9 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
}
if (iter_fn)
- ret = do_iter_readv_writev(file, &iter, pos, iter_fn);
+ ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
else
- ret = do_loop_readv_writev(file, &iter, pos, fn);
+ ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
if (type != READ)
file_end_write(file);
@@ -1050,7 +1058,7 @@ static size_t compat_readv(struct file *file,
if (!(file->f_mode & FMODE_CAN_READ))
goto out;
- ret = compat_do_readv_writev(READ, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(READ, file, vec, vlen, pos, 0);
out:
if (ret > 0)
@@ -1127,7 +1135,7 @@ static size_t compat_writev(struct file *file,
if (!(file->f_mode & FMODE_CAN_WRITE))
goto out;
- ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos);
+ ret = compat_do_readv_writev(WRITE, file, vec, vlen, pos, 0);
out:
if (ret > 0)
diff --git a/fs/splice.c b/fs/splice.c
index 82bc0d6..3dc1426 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -577,7 +577,7 @@ static ssize_t kernel_readv(struct file *file, const struct iovec *vec,
old_fs = get_fs();
set_fs(get_ds());
/* The cast to a user pointer is valid due to the set_fs() */
- res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos);
+ res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
set_fs(old_fs);
return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ae68100..875277a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1709,9 +1709,9 @@ extern ssize_t __vfs_write(struct file *, const char __user *, size_t, loff_t *)
extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
- unsigned long, loff_t *);
+ unsigned long, loff_t *, int);
extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
loff_t, size_t, unsigned int);
extern int vfs_clone_file_range(struct file *file_in, loff_t pos_in,
--
2.1.4
^ permalink raw reply related [flat|nested] 22+ messages in thread
end of thread, other threads:[~2016-03-03 15:04 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-24 14:14 selective block polling and preadv2/pwritev2 revisited Christoph Hellwig
2015-12-24 14:14 ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev Christoph Hellwig
2015-12-24 14:14 ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 2/6] vfs: vfs: Define new syscalls preadv2,pwritev2 Christoph Hellwig
2015-12-24 14:14 ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 3/6] x86: wire up preadv2 and pwritev2 Christoph Hellwig
2015-12-24 14:14 ` Christoph Hellwig
2015-12-24 14:14 ` [PATCH 4/6] vfs: add the RWF_HIPRI flag for preadv2/pwritev2 Christoph Hellwig
2015-12-24 14:14 ` [PATCH 5/6] direct-io: only use block polling if explicitly requested Christoph Hellwig
2015-12-24 14:14 ` [PATCH 6/6] blk-mq: enable polling support by default Christoph Hellwig
2016-01-04 14:58 ` selective block polling and preadv2/pwritev2 revisited Sagi Grimberg
2016-01-04 14:58 ` Sagi Grimberg
2016-01-04 16:39 ` Christoph Hellwig
2016-01-04 16:39 ` Christoph Hellwig
2016-01-06 17:01 ` Sagi Grimberg
2016-01-06 17:01 ` Sagi Grimberg
2016-01-06 22:49 ` Martin K. Petersen
2016-01-07 14:41 ` Sagi Grimberg
2016-01-07 14:41 ` Sagi Grimberg
2016-03-03 15:03 generic RDMA READ/WRITE API V2 Christoph Hellwig
2016-03-03 15:03 ` [PATCH 1/6] vfs: pass a flags argument to vfs_readv/vfs_writev Christoph Hellwig
2016-03-03 15:03 ` Christoph Hellwig
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.