* [PATCH 1/7] aio: don't print the page size at boot time
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
2018-04-15 15:01 ` [PATCH 2/7] aio: remove an outdated BUG_ON and comment in aio_complete Christoph Hellwig
` (5 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
The page size is in no way related to the aio code, and printing it in
the (debug) dmesg at every boot serves no purpose.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/aio.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 88d7927ffbc6..add46b06be86 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -264,9 +264,6 @@ static int __init aio_setup(void)
kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC);
kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC);
-
- pr_debug("sizeof(struct page) = %zu\n", sizeof(struct page));
-
return 0;
}
__initcall(aio_setup);
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/7] aio: remove an outdated BUG_ON and comment in aio_complete
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
2018-04-15 15:01 ` [PATCH 1/7] aio: don't print the page size at boot time Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
2018-04-24 21:27 ` Darrick J. Wong
2018-04-15 15:01 ` [PATCH 3/7] aio: sanitize ki_list handling Christoph Hellwig
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
These days we don't treat sync iocbs special in the aio completion code as
they never use it. Remove the old comment and BUG_ON given that the
current definition of is_sync_kiocb makes it impossible to hit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/aio.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index add46b06be86..7c1855afd723 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1107,15 +1107,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
file_end_write(file);
}
- /*
- * Special case handling for sync iocbs:
- * - events go directly into the iocb for fast handling
- * - the sync task with the iocb in its stack holds the single iocb
- * ref, no other paths have a way to get another ref
- * - the sync task helpfully left a reference to itself in the iocb
- */
- BUG_ON(is_sync_kiocb(kiocb));
-
if (iocb->ki_list.next) {
unsigned long flags;
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 2/7] aio: remove an outdated BUG_ON and comment in aio_complete
2018-04-15 15:01 ` [PATCH 2/7] aio: remove an outdated BUG_ON and comment in aio_complete Christoph Hellwig
@ 2018-04-24 21:27 ` Darrick J. Wong
0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-04-24 21:27 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
On Sun, Apr 15, 2018 at 05:01:03PM +0200, Christoph Hellwig wrote:
> These days we don't treat sync iocbs special in the aio completion code as
> they never use it. Remove the old comment and BUG_ON given that the
> current definition of is_sync_kiocb makes it impossible to hit.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
> fs/aio.c | 9 ---------
> 1 file changed, 9 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index add46b06be86..7c1855afd723 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1107,15 +1107,6 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
> file_end_write(file);
> }
>
> - /*
> - * Special case handling for sync iocbs:
> - * - events go directly into the iocb for fast handling
> - * - the sync task with the iocb in its stack holds the single iocb
> - * ref, no other paths have a way to get another ref
> - * - the sync task helpfully left a reference to itself in the iocb
> - */
> - BUG_ON(is_sync_kiocb(kiocb));
> -
> if (iocb->ki_list.next) {
> unsigned long flags;
>
> --
> 2.17.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 3/7] aio: sanitize ki_list handling
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
2018-04-15 15:01 ` [PATCH 1/7] aio: don't print the page size at boot time Christoph Hellwig
2018-04-15 15:01 ` [PATCH 2/7] aio: remove an outdated BUG_ON and comment in aio_complete Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
2018-04-15 15:01 ` [PATCH 4/7] aio: remove the extra get_file/fput pair in io_submit_one Christoph Hellwig
` (3 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
Instead of handcoded non-null checks always initialize ki_list to an
empty list and use list_empty / list_empty_careful on it. While we're
at it also error out on a double call to kiocb_set_cancel_fn instead
of ignoring it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/aio.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 7c1855afd723..18507743757a 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -553,13 +553,12 @@ void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
- spin_lock_irqsave(&ctx->ctx_lock, flags);
-
- if (!req->ki_list.next)
- list_add(&req->ki_list, &ctx->active_reqs);
+ if (WARN_ON_ONCE(!list_empty(&req->ki_list)))
+ return;
+ spin_lock_irqsave(&ctx->ctx_lock, flags);
+ list_add_tail(&req->ki_list, &ctx->active_reqs);
req->ki_cancel = cancel;
-
spin_unlock_irqrestore(&ctx->ctx_lock, flags);
}
EXPORT_SYMBOL(kiocb_set_cancel_fn);
@@ -1039,7 +1038,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
goto out_put;
percpu_ref_get(&ctx->reqs);
-
+ INIT_LIST_HEAD(&req->ki_list);
req->ki_ctx = ctx;
return req;
out_put:
@@ -1107,7 +1106,7 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
file_end_write(file);
}
- if (iocb->ki_list.next) {
+ if (!list_empty_careful(&iocb->ki_list)) {
unsigned long flags;
spin_lock_irqsave(&ctx->ctx_lock, flags);
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/7] aio: remove the extra get_file/fput pair in io_submit_one
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
` (2 preceding siblings ...)
2018-04-15 15:01 ` [PATCH 3/7] aio: sanitize ki_list handling Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
2018-04-26 15:41 ` Darrick J. Wong
2018-04-15 15:01 ` [PATCH 5/7] aio: refactor read/write iocb setup Christoph Hellwig
` (2 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
If we release the lockdep write protection token before calling into
->write_iter and thus never access the file pointer after an -EIOCBQUEUED
return from ->write_iter or ->read_iter we don't need this extra
reference.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/aio.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index 18507743757a..d7be32cdd1db 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1515,16 +1515,17 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
return ret;
ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
if (!ret) {
- req->ki_flags |= IOCB_WRITE;
- file_start_write(file);
- ret = aio_ret(req, call_write_iter(file, req, &iter));
/*
* We release freeze protection in aio_complete(). Fool lockdep
* by telling it the lock got released so that it doesn't
* complain about held lock when we return to userspace.
*/
- if (S_ISREG(file_inode(file)->i_mode))
+ if (S_ISREG(file_inode(file)->i_mode)) {
+ __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
+ }
+ req->ki_flags |= IOCB_WRITE;
+ ret = aio_ret(req, call_write_iter(file, req, &iter));
}
kfree(iovec);
return ret;
@@ -1599,7 +1600,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
req->ki_user_iocb = user_iocb;
req->ki_user_data = iocb->aio_data;
- get_file(file);
switch (iocb->aio_lio_opcode) {
case IOCB_CMD_PREAD:
ret = aio_read(&req->common, iocb, false, compat);
@@ -1618,7 +1618,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
ret = -EINVAL;
break;
}
- fput(file);
if (ret && ret != -EIOCBQUEUED)
goto out_put_req;
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 4/7] aio: remove the extra get_file/fput pair in io_submit_one
2018-04-15 15:01 ` [PATCH 4/7] aio: remove the extra get_file/fput pair in io_submit_one Christoph Hellwig
@ 2018-04-26 15:41 ` Darrick J. Wong
0 siblings, 0 replies; 19+ messages in thread
From: Darrick J. Wong @ 2018-04-26 15:41 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
On Sun, Apr 15, 2018 at 05:01:05PM +0200, Christoph Hellwig wrote:
> If we release the lockdep write protection token before calling into
> ->write_iter and thus never access the file pointer after an -EIOCBQUEUED
> return from ->write_iter or ->read_iter we don't need this extra
> reference.
Hmm, subtleties lurk to this unfamiliar reader...
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> fs/aio.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 18507743757a..d7be32cdd1db 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1515,16 +1515,17 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
> return ret;
> ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
> if (!ret) {
> - req->ki_flags |= IOCB_WRITE;
> - file_start_write(file);
> - ret = aio_ret(req, call_write_iter(file, req, &iter));
> /*
> * We release freeze protection in aio_complete(). Fool lockdep
> * by telling it the lock got released so that it doesn't
> * complain about held lock when we return to userspace.
> */
> - if (S_ISREG(file_inode(file)->i_mode))
> + if (S_ISREG(file_inode(file)->i_mode)) {
> + __sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
It took me a while to figure out that this ^^^ is the same as the
file_start_write call that you remove above, can you please update the
comment to note that we take freeze protection for the file before
screwing with lockdep? e.g.,
/*
* Open-code file_start_write here to grab freeze protection, which will
* be released by another thread in aio_complete(). Fool lockdep by
* telling it the lock got released so that it doesn't complain about
* held lock when we return to userspace.
*/
> __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
> + }
> + req->ki_flags |= IOCB_WRITE;
> + ret = aio_ret(req, call_write_iter(file, req, &iter));
> }
> kfree(iovec);
> return ret;
> @@ -1599,7 +1600,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> req->ki_user_iocb = user_iocb;
> req->ki_user_data = iocb->aio_data;
>
> - get_file(file);
Here we have a reference to *file, but...
> switch (iocb->aio_lio_opcode) {
> case IOCB_CMD_PREAD:
> ret = aio_read(&req->common, iocb, false, compat);
> @@ -1618,7 +1618,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> ret = -EINVAL;
> break;
> }
> - fput(file);
...by the time we get to here the reference may have gone away, but
you'd have to dig through aio_{read,write} -> call_{r,w}_iter ->
{r,w}_iter in order to figure out that the reference isn't valid
anymore on a EIOCBQUEUED return.
That's a little subtle, can you add a comment about that?
/*
* If ret is EIOCBQUEUED here, the ->read_iter/->write_iter dropped the
* reference on *file. We don't ourselves ensure a reference to the
* file, so we must be careful about that here and in the subfunctions.
*/
--D
>
> if (ret && ret != -EIOCBQUEUED)
> goto out_put_req;
> --
> 2.17.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 5/7] aio: refactor read/write iocb setup
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
` (3 preceding siblings ...)
2018-04-15 15:01 ` [PATCH 4/7] aio: remove the extra get_file/fput pair in io_submit_one Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
2018-04-15 15:01 ` [PATCH 6/7] aio: implement IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC Christoph Hellwig
2018-04-15 15:01 ` [PATCH 7/7] aio: implement io_pgetevents Christoph Hellwig
6 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
Don't reference the kiocb structure from the common aio code, and move
any use of it into helper specific to the read/write path. This is in
preparation for aio_poll support that wants to use the space for different
fields.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/aio.c | 159 +++++++++++++++++++++++++++++++------------------------
1 file changed, 91 insertions(+), 68 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index d7be32cdd1db..d4dd002712c8 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -170,7 +170,9 @@ struct kioctx {
#define KIOCB_CANCELLED ((void *) (~0ULL))
struct aio_kiocb {
- struct kiocb common;
+ union {
+ struct kiocb rw;
+ };
struct kioctx *ki_ctx;
kiocb_cancel_fn *ki_cancel;
@@ -549,7 +551,7 @@ static int aio_setup_ring(struct kioctx *ctx, unsigned int nr_events)
void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
{
- struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, common);
+ struct aio_kiocb *req = container_of(iocb, struct aio_kiocb, rw);
struct kioctx *ctx = req->ki_ctx;
unsigned long flags;
@@ -581,7 +583,7 @@ static int kiocb_cancel(struct aio_kiocb *kiocb)
cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
} while (cancel != old);
- return cancel(&kiocb->common);
+ return cancel(&kiocb->rw);
}
/*
@@ -1046,15 +1048,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
return NULL;
}
-static void kiocb_free(struct aio_kiocb *req)
-{
- if (req->common.ki_filp)
- fput(req->common.ki_filp);
- if (req->ki_eventfd != NULL)
- eventfd_ctx_put(req->ki_eventfd);
- kmem_cache_free(kiocb_cachep, req);
-}
-
static struct kioctx *lookup_ioctx(unsigned long ctx_id)
{
struct aio_ring __user *ring = (void __user *)ctx_id;
@@ -1085,27 +1078,14 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
/* aio_complete
* Called when the io request on the given iocb is complete.
*/
-static void aio_complete(struct kiocb *kiocb, long res, long res2)
+static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
{
- struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, common);
struct kioctx *ctx = iocb->ki_ctx;
struct aio_ring *ring;
struct io_event *ev_page, *event;
unsigned tail, pos, head;
unsigned long flags;
- if (kiocb->ki_flags & IOCB_WRITE) {
- struct file *file = kiocb->ki_filp;
-
- /*
- * Tell lockdep we inherited freeze protection from submission
- * thread.
- */
- if (S_ISREG(file_inode(file)->i_mode))
- __sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE);
- file_end_write(file);
- }
-
if (!list_empty_careful(&iocb->ki_list)) {
unsigned long flags;
@@ -1167,11 +1147,12 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
* eventfd. The eventfd_signal() function is safe to be called
* from IRQ context.
*/
- if (iocb->ki_eventfd != NULL)
+ if (iocb->ki_eventfd) {
eventfd_signal(iocb->ki_eventfd, 1);
+ eventfd_ctx_put(iocb->ki_eventfd);
+ }
- /* everything turned out well, dispose of the aiocb. */
- kiocb_free(iocb);
+ kmem_cache_free(kiocb_cachep, iocb);
/*
* We have to order our ring_info tail store above and test
@@ -1434,6 +1415,45 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
return -EINVAL;
}
+static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
+{
+ struct aio_kiocb *iocb = container_of(kiocb, struct aio_kiocb, rw);
+
+ if (kiocb->ki_flags & IOCB_WRITE) {
+ struct inode *inode = file_inode(kiocb->ki_filp);
+
+ /*
+ * Tell lockdep we inherited freeze protection from submission
+ * thread.
+ */
+ if (S_ISREG(inode->i_mode))
+ __sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
+ file_end_write(kiocb->ki_filp);
+ }
+
+ fput(kiocb->ki_filp);
+ aio_complete(iocb, res, res2);
+}
+
+static int aio_prep_rw(struct kiocb *req, struct iocb *iocb)
+{
+ int ret;
+
+ req->ki_filp = fget(iocb->aio_fildes);
+ if (unlikely(!req->ki_filp))
+ return -EBADF;
+ req->ki_complete = aio_complete_rw;
+ req->ki_pos = iocb->aio_offset;
+ req->ki_flags = iocb_flags(req->ki_filp);
+ if (iocb->aio_flags & IOCB_FLAG_RESFD)
+ req->ki_flags |= IOCB_EVENTFD;
+ req->ki_hint = file_write_hint(req->ki_filp);
+ ret = kiocb_set_rw_flags(req, iocb->aio_rw_flags);
+ if (unlikely(ret))
+ fput(req->ki_filp);
+ return ret;
+}
+
static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
bool vectored, bool compat, struct iov_iter *iter)
{
@@ -1453,7 +1473,7 @@ static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
}
-static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
+static inline ssize_t aio_rw_ret(struct kiocb *req, ssize_t ret)
{
switch (ret) {
case -EIOCBQUEUED:
@@ -1469,7 +1489,7 @@ static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
ret = -EINTR;
/*FALLTHRU*/
default:
- aio_complete(req, ret, 0);
+ aio_complete_rw(req, ret, 0);
return 0;
}
}
@@ -1477,57 +1497,77 @@ static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
bool compat)
{
- struct file *file = req->ki_filp;
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct iov_iter iter;
+ struct file *file;
ssize_t ret;
+ ret = aio_prep_rw(req, iocb);
+ if (ret)
+ return ret;
+ file = req->ki_filp;
+
+ ret = -EBADF;
if (unlikely(!(file->f_mode & FMODE_READ)))
- return -EBADF;
+ goto out_fput;
+ ret = -EINVAL;
if (unlikely(!file->f_op->read_iter))
- return -EINVAL;
+ goto out_fput;
ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
if (ret)
- return ret;
+ goto out_fput;
ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
if (!ret)
- ret = aio_ret(req, call_read_iter(file, req, &iter));
+ ret = aio_rw_ret(req, call_read_iter(file, req, &iter));
kfree(iovec);
+out_fput:
+ if (unlikely(ret && ret != -EIOCBQUEUED))
+ fput(file);
return ret;
}
static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
bool compat)
{
- struct file *file = req->ki_filp;
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
struct iov_iter iter;
+ struct file *file;
ssize_t ret;
+ ret = aio_prep_rw(req, iocb);
+ if (ret)
+ return ret;
+ file = req->ki_filp;
+
+ ret = -EBADF;
if (unlikely(!(file->f_mode & FMODE_WRITE)))
- return -EBADF;
+ goto out_fput;
+ ret = -EINVAL;
if (unlikely(!file->f_op->write_iter))
- return -EINVAL;
+ goto out_fput;
ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
if (ret)
- return ret;
+ goto out_fput;
ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
if (!ret) {
/*
- * We release freeze protection in aio_complete(). Fool lockdep
- * by telling it the lock got released so that it doesn't
- * complain about held lock when we return to userspace.
+ * We release freeze protection in aio_complete_rw(). Fool
+ * lockdep by telling it the lock got released so that it
+ * doesn't complain about held lock when we return to userspace.
*/
if (S_ISREG(file_inode(file)->i_mode)) {
__sb_start_write(file_inode(file)->i_sb, SB_FREEZE_WRITE, true);
__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
}
req->ki_flags |= IOCB_WRITE;
- ret = aio_ret(req, call_write_iter(file, req, &iter));
+ ret = aio_rw_ret(req, call_write_iter(file, req, &iter));
}
kfree(iovec);
+out_fput:
+ if (unlikely(ret && ret != -EIOCBQUEUED))
+ fput(file);
return ret;
}
@@ -1535,7 +1575,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
struct iocb *iocb, bool compat)
{
struct aio_kiocb *req;
- struct file *file;
ssize_t ret;
/* enforce forwards compatibility on users */
@@ -1558,16 +1597,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
if (unlikely(!req))
return -EAGAIN;
- req->common.ki_filp = file = fget(iocb->aio_fildes);
- if (unlikely(!req->common.ki_filp)) {
- ret = -EBADF;
- goto out_put_req;
- }
- req->common.ki_pos = iocb->aio_offset;
- req->common.ki_complete = aio_complete;
- req->common.ki_flags = iocb_flags(req->common.ki_filp);
- req->common.ki_hint = file_write_hint(file);
-
if (iocb->aio_flags & IOCB_FLAG_RESFD) {
/*
* If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
@@ -1581,14 +1610,6 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
req->ki_eventfd = NULL;
goto out_put_req;
}
-
- req->common.ki_flags |= IOCB_EVENTFD;
- }
-
- ret = kiocb_set_rw_flags(&req->common, iocb->aio_rw_flags);
- if (unlikely(ret)) {
- pr_debug("EINVAL: aio_rw_flags\n");
- goto out_put_req;
}
ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
@@ -1602,16 +1623,16 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
switch (iocb->aio_lio_opcode) {
case IOCB_CMD_PREAD:
- ret = aio_read(&req->common, iocb, false, compat);
+ ret = aio_read(&req->rw, iocb, false, compat);
break;
case IOCB_CMD_PWRITE:
- ret = aio_write(&req->common, iocb, false, compat);
+ ret = aio_write(&req->rw, iocb, false, compat);
break;
case IOCB_CMD_PREADV:
- ret = aio_read(&req->common, iocb, true, compat);
+ ret = aio_read(&req->rw, iocb, true, compat);
break;
case IOCB_CMD_PWRITEV:
- ret = aio_write(&req->common, iocb, true, compat);
+ ret = aio_write(&req->rw, iocb, true, compat);
break;
default:
pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
@@ -1625,7 +1646,9 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
out_put_req:
put_reqs_available(ctx, 1);
percpu_ref_put(&ctx->reqs);
- kiocb_free(req);
+ if (req->ki_eventfd)
+ eventfd_ctx_put(req->ki_eventfd);
+ kmem_cache_free(kiocb_cachep, req);
return ret;
}
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 6/7] aio: implement IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
` (4 preceding siblings ...)
2018-04-15 15:01 ` [PATCH 5/7] aio: refactor read/write iocb setup Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
2018-04-15 15:01 ` [PATCH 7/7] aio: implement io_pgetevents Christoph Hellwig
6 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
Simple workqueue offload for now, but prepared for adding a real aio_fsync
method if the need arises. Based on an earlier patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/aio.c | 43 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 43 insertions(+)
diff --git a/fs/aio.c b/fs/aio.c
index d4dd002712c8..de56496ba86c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -156,6 +156,12 @@ struct kioctx {
unsigned id;
};
+struct fsync_iocb {
+ struct work_struct work;
+ struct file *file;
+ bool datasync;
+};
+
/*
* We use ki_cancel == KIOCB_CANCELLED to indicate that a kiocb has been either
* cancelled or completed (this makes a certain amount of sense because
@@ -172,6 +178,7 @@ struct kioctx {
struct aio_kiocb {
union {
struct kiocb rw;
+ struct fsync_iocb fsync;
};
struct kioctx *ki_ctx;
@@ -1571,6 +1578,36 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
return ret;
}
+static void aio_fsync_work(struct work_struct *work)
+{
+ struct fsync_iocb *req = container_of(work, struct fsync_iocb, work);
+ int ret;
+
+ ret = vfs_fsync(req->file, req->datasync);
+ fput(req->file);
+ aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
+}
+
+static int aio_fsync(struct fsync_iocb *req, struct iocb *iocb, bool datasync)
+{
+ if (unlikely(iocb->aio_buf || iocb->aio_offset || iocb->aio_nbytes ||
+ iocb->aio_rw_flags))
+ return -EINVAL;
+
+ req->file = fget(iocb->aio_fildes);
+ if (unlikely(!req->file))
+ return -EBADF;
+ if (unlikely(!req->file->f_op->fsync)) {
+ fput(req->file);
+ return -EINVAL;
+ }
+
+ req->datasync = datasync;
+ INIT_WORK(&req->work, aio_fsync_work);
+ schedule_work(&req->work);
+ return -EIOCBQUEUED;
+}
+
static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
struct iocb *iocb, bool compat)
{
@@ -1634,6 +1671,12 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
case IOCB_CMD_PWRITEV:
ret = aio_write(&req->rw, iocb, true, compat);
break;
+ case IOCB_CMD_FSYNC:
+ ret = aio_fsync(&req->fsync, iocb, false);
+ break;
+ case IOCB_CMD_FDSYNC:
+ ret = aio_fsync(&req->fsync, iocb, true);
+ break;
default:
pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
ret = -EINVAL;
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 7/7] aio: implement io_pgetevents
2018-04-15 15:01 io_pgetevents & aio fsync V3 Christoph Hellwig
` (5 preceding siblings ...)
2018-04-15 15:01 ` [PATCH 6/7] aio: implement IOCB_CMD_FSYNC and IOCB_CMD_FDSYNC Christoph Hellwig
@ 2018-04-15 15:01 ` Christoph Hellwig
6 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-04-15 15:01 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
This is the io_getevents equivalent of ppoll/pselect and allows to
properly mix signals and aio completions (especially with IOCB_CMD_POLL)
and atomically executes the following sequence:
sigset_t origmask;
pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ret = io_getevents(ctx, min_nr, nr, events, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);
Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure. It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/aio.c | 114 ++++++++++++++++++++++---
include/linux/compat.h | 7 ++
include/linux/syscalls.h | 6 ++
include/uapi/asm-generic/unistd.h | 4 +-
include/uapi/linux/aio_abi.h | 6 ++
kernel/sys_ni.c | 2 +
8 files changed, 130 insertions(+), 11 deletions(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index c58f75b088c5..99b6f7a36178 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -391,3 +391,4 @@
382 i386 pkey_free sys_pkey_free
383 i386 statx sys_statx
384 i386 arch_prctl sys_arch_prctl compat_sys_arch_prctl
+385 i386 io_pgetevents sys_io_pgetevents compat_sys_io_pgetevents
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 5aef183e2f85..e995cd2b4e65 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -339,6 +339,7 @@
330 common pkey_alloc sys_pkey_alloc
331 common pkey_free sys_pkey_free
332 common statx sys_statx
+333 common io_pgetevents sys_io_pgetevents
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index de56496ba86c..5cfeb5bc3a7d 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1303,10 +1303,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
wait_event_interruptible_hrtimeout(ctx->wait,
aio_read_events(ctx, min_nr, nr, event, &ret),
until);
-
- if (!ret && signal_pending(current))
- ret = -EINTR;
-
return ret;
}
@@ -1913,13 +1909,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
struct timespec __user *, timeout)
{
struct timespec64 ts;
+ int ret;
+
+ if (timeout && unlikely(get_timespec64(&ts, timeout)))
+ return -EFAULT;
+
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+ if (!ret && signal_pending(current))
+ ret = -EINTR;
+ return ret;
+}
- if (timeout) {
- if (unlikely(get_timespec64(&ts, timeout)))
+SYSCALL_DEFINE6(io_pgetevents,
+ aio_context_t, ctx_id,
+ long, min_nr,
+ long, nr,
+ struct io_event __user *, events,
+ struct timespec __user *, timeout,
+ const struct __aio_sigset __user *, usig)
+{
+ struct __aio_sigset ksig = { NULL, };
+ sigset_t ksigmask, sigsaved;
+ struct timespec64 ts;
+ int ret;
+
+ if (timeout && unlikely(get_timespec64(&ts, timeout)))
+ return -EFAULT;
+
+ if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+ return -EFAULT;
+
+ if (ksig.sigmask) {
+ if (ksig.sigsetsize != sizeof(sigset_t))
+ return -EINVAL;
+ if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
return -EFAULT;
+ sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+ sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+ }
+
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+ if (signal_pending(current)) {
+ if (ksig.sigmask) {
+ current->saved_sigmask = sigsaved;
+ set_restore_sigmask();
+ }
+
+ if (!ret)
+ ret = -ERESTARTNOHAND;
+ } else {
+ if (ksig.sigmask)
+ sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
- return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+ return ret;
}
#ifdef CONFIG_COMPAT
@@ -1930,13 +1973,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
struct compat_timespec __user *, timeout)
{
struct timespec64 t;
+ int ret;
+
+ if (timeout && compat_get_timespec64(&t, timeout))
+ return -EFAULT;
+
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+ if (!ret && signal_pending(current))
+ ret = -EINTR;
+ return ret;
+}
+
- if (timeout) {
- if (compat_get_timespec64(&t, timeout))
+struct __compat_aio_sigset {
+ compat_sigset_t __user *sigmask;
+ compat_size_t sigsetsize;
+};
+
+COMPAT_SYSCALL_DEFINE6(io_pgetevents,
+ compat_aio_context_t, ctx_id,
+ compat_long_t, min_nr,
+ compat_long_t, nr,
+ struct io_event __user *, events,
+ struct compat_timespec __user *, timeout,
+ const struct __compat_aio_sigset __user *, usig)
+{
+ struct __compat_aio_sigset ksig = { NULL, };
+ sigset_t ksigmask, sigsaved;
+ struct timespec64 t;
+ int ret;
+
+ if (timeout && compat_get_timespec64(&t, timeout))
+ return -EFAULT;
+
+ if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+ return -EFAULT;
+
+ if (ksig.sigmask) {
+ if (ksig.sigsetsize != sizeof(compat_sigset_t))
+ return -EINVAL;
+ if (get_compat_sigset(&ksigmask, ksig.sigmask))
return -EFAULT;
+ sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+ sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+ }
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+ if (signal_pending(current)) {
+ if (ksig.sigmask) {
+ current->saved_sigmask = sigsaved;
+ set_restore_sigmask();
+ }
+ if (!ret)
+ ret = -ERESTARTNOHAND;
+ } else {
+ if (ksig.sigmask)
+ sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
- return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+ return ret;
}
#endif
diff --git a/include/linux/compat.h b/include/linux/compat.h
index f188eab10570..fdd068f192c1 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -310,6 +310,7 @@ extern int put_compat_rusage(const struct rusage *,
struct compat_rusage __user *);
struct compat_siginfo;
+struct __compat_aio_sigset;
struct compat_dirent {
u32 d_ino;
@@ -528,6 +529,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
compat_long_t nr,
struct io_event __user *events,
struct compat_timespec __user *timeout);
+asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
+ compat_long_t min_nr,
+ compat_long_t nr,
+ struct io_event __user *events,
+ struct compat_timespec __user *timeout,
+ const struct __compat_aio_sigset __user *usig);
/* fs/cookies.c */
asmlinkage long compat_sys_lookup_dcookie(u32, u32, char __user *, compat_size_t);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index b961184f597a..c40cf5ec24c9 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -264,6 +264,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
long nr,
struct io_event __user *events,
struct timespec __user *timeout);
+asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
+ long min_nr,
+ long nr,
+ struct io_event __user *events,
+ struct timespec __user *timeout,
+ const struct __aio_sigset *sig);
/* fs/xattr.c */
asmlinkage long sys_setxattr(const char __user *path, const char __user *name,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8bcb186c6f67..42990676a55e 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc, sys_pkey_alloc)
__SYSCALL(__NR_pkey_free, sys_pkey_free)
#define __NR_statx 291
__SYSCALL(__NR_statx, sys_statx)
+#define __NR_io_pgetevents 292
+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
#undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..2c0a3415beee 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -29,6 +29,7 @@
#include <linux/types.h>
#include <linux/fs.h>
+#include <linux/signal.h>
#include <asm/byteorder.h>
typedef __kernel_ulong_t aio_context_t;
@@ -108,5 +109,10 @@ struct iocb {
#undef IFBIG
#undef IFLITTLE
+struct __aio_sigset {
+ sigset_t __user *sigmask;
+ size_t sigsetsize;
+};
+
#endif /* __LINUX__AIO_ABI_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 6cafc008f6db..7571cf9d7424 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -33,7 +33,9 @@ COND_SYSCALL(io_submit);
COND_SYSCALL_COMPAT(io_submit);
COND_SYSCALL(io_cancel);
COND_SYSCALL(io_getevents);
+COND_SYSCALL(io_pgetevents);
COND_SYSCALL_COMPAT(io_getevents);
+COND_SYSCALL_COMPAT(io_pgetevents);
/* fs/xattr.c */
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 7/7] aio: implement io_pgetevents
2018-05-02 21:14 io_pgetevents & aio fsync V4 Christoph Hellwig
@ 2018-05-02 21:14 ` Christoph Hellwig
2018-05-18 8:28 ` James Hogan
2018-07-04 14:21 ` Adrian Reber
0 siblings, 2 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-05-02 21:14 UTC (permalink / raw)
To: viro; +Cc: Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
This is the io_getevents equivalent of ppoll/pselect and allows to
properly mix signals and aio completions (especially with IOCB_CMD_POLL)
and atomically executes the following sequence:
sigset_t origmask;
pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
ret = io_getevents(ctx, min_nr, nr, events, timeout);
pthread_sigmask(SIG_SETMASK, &origmask, NULL);
Note that unlike many other signal related calls we do not pass a sigmask
size, as that would get us to 7 arguments, which aren't easily supported
by the syscall infrastructure. It seems a lot less painful to just add a
new syscall variant in the unlikely case we're going to increase the
sigset size.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
fs/aio.c | 114 ++++++++++++++++++++++---
include/linux/compat.h | 7 ++
include/linux/syscalls.h | 6 ++
include/uapi/asm-generic/unistd.h | 4 +-
include/uapi/linux/aio_abi.h | 6 ++
kernel/sys_ni.c | 2 +
8 files changed, 130 insertions(+), 11 deletions(-)
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index d6b27dab1b30..14a2f996e543 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -396,3 +396,4 @@
382 i386 pkey_free sys_pkey_free __ia32_sys_pkey_free
383 i386 statx sys_statx __ia32_sys_statx
384 i386 arch_prctl sys_arch_prctl __ia32_compat_sys_arch_prctl
+385 i386 io_pgetevents sys_io_pgetevents __ia32_compat_sys_io_pgetevents
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 4dfe42666d0c..cd36232ab62f 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -341,6 +341,7 @@
330 common pkey_alloc __x64_sys_pkey_alloc
331 common pkey_free __x64_sys_pkey_free
332 common statx __x64_sys_statx
+333 common io_pgetevents __x64_sys_io_pgetevents
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/aio.c b/fs/aio.c
index 61d2e6942951..f3eae5d5771b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1303,10 +1303,6 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
wait_event_interruptible_hrtimeout(ctx->wait,
aio_read_events(ctx, min_nr, nr, event, &ret),
until);
-
- if (!ret && signal_pending(current))
- ret = -EINTR;
-
return ret;
}
@@ -1921,13 +1917,60 @@ SYSCALL_DEFINE5(io_getevents, aio_context_t, ctx_id,
struct timespec __user *, timeout)
{
struct timespec64 ts;
+ int ret;
+
+ if (timeout && unlikely(get_timespec64(&ts, timeout)))
+ return -EFAULT;
+
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+ if (!ret && signal_pending(current))
+ ret = -EINTR;
+ return ret;
+}
- if (timeout) {
- if (unlikely(get_timespec64(&ts, timeout)))
+SYSCALL_DEFINE6(io_pgetevents,
+ aio_context_t, ctx_id,
+ long, min_nr,
+ long, nr,
+ struct io_event __user *, events,
+ struct timespec __user *, timeout,
+ const struct __aio_sigset __user *, usig)
+{
+ struct __aio_sigset ksig = { NULL, };
+ sigset_t ksigmask, sigsaved;
+ struct timespec64 ts;
+ int ret;
+
+ if (timeout && unlikely(get_timespec64(&ts, timeout)))
+ return -EFAULT;
+
+ if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+ return -EFAULT;
+
+ if (ksig.sigmask) {
+ if (ksig.sigsetsize != sizeof(sigset_t))
+ return -EINVAL;
+ if (copy_from_user(&ksigmask, ksig.sigmask, sizeof(ksigmask)))
return -EFAULT;
+ sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+ sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+ }
+
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+ if (signal_pending(current)) {
+ if (ksig.sigmask) {
+ current->saved_sigmask = sigsaved;
+ set_restore_sigmask();
+ }
+
+ if (!ret)
+ ret = -ERESTARTNOHAND;
+ } else {
+ if (ksig.sigmask)
+ sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
- return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
+ return ret;
}
#ifdef CONFIG_COMPAT
@@ -1938,13 +1981,64 @@ COMPAT_SYSCALL_DEFINE5(io_getevents, compat_aio_context_t, ctx_id,
struct compat_timespec __user *, timeout)
{
struct timespec64 t;
+ int ret;
+
+ if (timeout && compat_get_timespec64(&t, timeout))
+ return -EFAULT;
+
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+ if (!ret && signal_pending(current))
+ ret = -EINTR;
+ return ret;
+}
+
- if (timeout) {
- if (compat_get_timespec64(&t, timeout))
+struct __compat_aio_sigset {
+ compat_sigset_t __user *sigmask;
+ compat_size_t sigsetsize;
+};
+
+COMPAT_SYSCALL_DEFINE6(io_pgetevents,
+ compat_aio_context_t, ctx_id,
+ compat_long_t, min_nr,
+ compat_long_t, nr,
+ struct io_event __user *, events,
+ struct compat_timespec __user *, timeout,
+ const struct __compat_aio_sigset __user *, usig)
+{
+ struct __compat_aio_sigset ksig = { NULL, };
+ sigset_t ksigmask, sigsaved;
+ struct timespec64 t;
+ int ret;
+
+ if (timeout && compat_get_timespec64(&t, timeout))
+ return -EFAULT;
+
+ if (usig && copy_from_user(&ksig, usig, sizeof(ksig)))
+ return -EFAULT;
+
+ if (ksig.sigmask) {
+ if (ksig.sigsetsize != sizeof(compat_sigset_t))
+ return -EINVAL;
+ if (get_compat_sigset(&ksigmask, ksig.sigmask))
return -EFAULT;
+ sigdelsetmask(&ksigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
+ sigprocmask(SIG_SETMASK, &ksigmask, &sigsaved);
+ }
+ ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+ if (signal_pending(current)) {
+ if (ksig.sigmask) {
+ current->saved_sigmask = sigsaved;
+ set_restore_sigmask();
+ }
+ if (!ret)
+ ret = -ERESTARTNOHAND;
+ } else {
+ if (ksig.sigmask)
+ sigprocmask(SIG_SETMASK, &sigsaved, NULL);
}
- return do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
+ return ret;
}
#endif
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 081281ad5772..ad192057b887 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -330,6 +330,7 @@ extern int put_compat_rusage(const struct rusage *,
struct compat_rusage __user *);
struct compat_siginfo;
+struct __compat_aio_sigset;
struct compat_dirent {
u32 d_ino;
@@ -553,6 +554,12 @@ asmlinkage long compat_sys_io_getevents(compat_aio_context_t ctx_id,
compat_long_t nr,
struct io_event __user *events,
struct compat_timespec __user *timeout);
+asmlinkage long compat_sys_io_pgetevents(compat_aio_context_t ctx_id,
+ compat_long_t min_nr,
+ compat_long_t nr,
+ struct io_event __user *events,
+ struct compat_timespec __user *timeout,
+ const struct __compat_aio_sigset __user *usig);
/* fs/cookies.c */
asmlinkage long compat_sys_lookup_dcookie(u32, u32, char __user *, compat_size_t);
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 70fcda1a9049..811172fcb916 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -290,6 +290,12 @@ asmlinkage long sys_io_getevents(aio_context_t ctx_id,
long nr,
struct io_event __user *events,
struct timespec __user *timeout);
+asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
+ long min_nr,
+ long nr,
+ struct io_event __user *events,
+ struct timespec __user *timeout,
+ const struct __aio_sigset *sig);
/* fs/xattr.c */
asmlinkage long sys_setxattr(const char __user *path, const char __user *name,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 8bcb186c6f67..42990676a55e 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,9 +732,11 @@ __SYSCALL(__NR_pkey_alloc, sys_pkey_alloc)
__SYSCALL(__NR_pkey_free, sys_pkey_free)
#define __NR_statx 291
__SYSCALL(__NR_statx, sys_statx)
+#define __NR_io_pgetevents 292
+__SC_COMP(__NR_io_pgetevents, sys_io_pgetevents, compat_sys_io_pgetevents)
#undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
/*
* 32 bit systems traditionally used different
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index a04adbc70ddf..2c0a3415beee 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -29,6 +29,7 @@
#include <linux/types.h>
#include <linux/fs.h>
+#include <linux/signal.h>
#include <asm/byteorder.h>
typedef __kernel_ulong_t aio_context_t;
@@ -108,5 +109,10 @@ struct iocb {
#undef IFBIG
#undef IFLITTLE
+struct __aio_sigset {
+ sigset_t __user *sigmask;
+ size_t sigsetsize;
+};
+
#endif /* __LINUX__AIO_ABI_H */
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 9791364925dc..183169c2a75b 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -43,7 +43,9 @@ COND_SYSCALL(io_submit);
COND_SYSCALL_COMPAT(io_submit);
COND_SYSCALL(io_cancel);
COND_SYSCALL(io_getevents);
+COND_SYSCALL(io_pgetevents);
COND_SYSCALL_COMPAT(io_getevents);
+COND_SYSCALL_COMPAT(io_pgetevents);
/* fs/xattr.c */
--
2.17.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-05-02 21:14 ` [PATCH 7/7] aio: implement io_pgetevents Christoph Hellwig
@ 2018-05-18 8:28 ` James Hogan
2018-05-18 8:57 ` Christoph Hellwig
2018-07-04 14:21 ` Adrian Reber
1 sibling, 1 reply; 19+ messages in thread
From: James Hogan @ 2018-05-18 8:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 882 bytes --]
Given this:
On Wed, May 02, 2018 at 11:14:48PM +0200, Christoph Hellwig wrote:
> +struct __aio_sigset {
> + sigset_t __user *sigmask;
> + size_t sigsetsize;
> +};
and:
> +asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
> + long min_nr,
> + long nr,
> + struct io_event __user *events,
> + struct timespec __user *timeout,
> + const struct __aio_sigset *sig);
The following paragraph in the commit message would appear to be
misleading since __aio_sigset contains a size:
> Note that unlike many other signal related calls we do not pass a sigmask
> size, as that would get us to 7 arguments, which aren't easily supported
> by the syscall infrastructure. It seems a lot less painful to just add a
> new syscall variant in the unlikely case we're going to increase the
> sigset size.
Is it possible to correct it before this gets merged?
Thanks
James
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-05-18 8:28 ` James Hogan
@ 2018-05-18 8:57 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-05-18 8:57 UTC (permalink / raw)
To: James Hogan
Cc: Christoph Hellwig, viro, Avi Kivity, linux-aio, linux-fsdevel,
linux-api, linux-kernel
On Fri, May 18, 2018 at 09:28:38AM +0100, James Hogan wrote:
> Given this:
>
> On Wed, May 02, 2018 at 11:14:48PM +0200, Christoph Hellwig wrote:
> > +struct __aio_sigset {
> > + sigset_t __user *sigmask;
> > + size_t sigsetsize;
> > +};
>
> and:
>
> > +asmlinkage long sys_io_pgetevents(aio_context_t ctx_id,
> > + long min_nr,
> > + long nr,
> > + struct io_event __user *events,
> > + struct timespec __user *timeout,
> > + const struct __aio_sigset *sig);
>
> The following paragraph in the commit message would appear to be
> misleading since __aio_sigset contains a size:
>
> > Note that unlike many other signal related calls we do not pass a sigmask
> > size, as that would get us to 7 arguments, which aren't easily supported
> > by the syscall infrastructure. It seems a lot less painful to just add a
> > new syscall variant in the unlikely case we're going to increase the
> > sigset size.
>
> Is it possible to correct it before this gets merged?
True, the calling convention change based on feedback. Al, is the
branch stable or you can edit this out?
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-05-02 21:14 ` [PATCH 7/7] aio: implement io_pgetevents Christoph Hellwig
2018-05-18 8:28 ` James Hogan
@ 2018-07-04 14:21 ` Adrian Reber
2018-07-08 20:44 ` Christoph Hellwig
1 sibling, 1 reply; 19+ messages in thread
From: Adrian Reber @ 2018-07-04 14:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: viro, Avi Kivity, linux-aio, linux-fsdevel, linux-api, linux-kernel
On Wed, May 02, 2018 at 11:14:48PM +0200, Christoph Hellwig wrote:
> This is the io_getevents equivalent of ppoll/pselect and allows to
> properly mix signals and aio completions (especially with IOCB_CMD_POLL)
> and atomically executes the following sequence:
>
> sigset_t origmask;
>
> pthread_sigmask(SIG_SETMASK, &sigmask, &origmask);
> ret = io_getevents(ctx, min_nr, nr, events, timeout);
> pthread_sigmask(SIG_SETMASK, &origmask, NULL);
>
> Note that unlike many other signal related calls we do not pass a sigmask
> size, as that would get us to 7 arguments, which aren't easily supported
> by the syscall infrastructure. It seems a lot less painful to just add a
> new syscall variant in the unlikely case we're going to increase the
> sigset size.
Starting with this commit following code does not compile for me
anymore:
#include <signal.h>
#include <linux/aio_abi.h>
int main()
{
return 0;
}
In file included from /usr/include/linux/signal.h:5,
from /usr/include/linux/aio_abi.h:32,
from include.c:2:
/usr/include/asm/signal.h:16:23: error: conflicting types for ‘sigset_t’
typedef unsigned long sigset_t;
^~~~~~~~
In file included from /usr/include/signal.h:35,
from include.c:1:
/usr/include/bits/types/sigset_t.h:7:20: note: previous declaration of ‘sigset_t’ was here
typedef __sigset_t sigset_t;
^~~~~~~~
In file included from /usr/include/linux/signal.h:5,
from /usr/include/linux/aio_abi.h:32,
from include.c:2:
/usr/include/asm/signal.h:115:8: error: redefinition of ‘struct sigaction’
struct sigaction {
^~~~~~~~~
In file included from /usr/include/signal.h:226,
from include.c:1:
/usr/include/bits/sigaction.h:27:8: note: originally defined here
struct sigaction
^~~~~~~~~
[and much more]
Before this commit it compiles without errors.
Adrian
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-07-04 14:21 ` Adrian Reber
@ 2018-07-08 20:44 ` Christoph Hellwig
2018-07-09 17:20 ` Stephan Müller
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-07-08 20:44 UTC (permalink / raw)
To: Adrian Reber
Cc: Christoph Hellwig, viro, Avi Kivity, linux-aio, linux-fsdevel,
linux-api, linux-kernel
On Wed, Jul 04, 2018 at 04:21:16PM +0200, Adrian Reber wrote:
> In file included from /usr/include/linux/signal.h:5,
> from /usr/include/linux/aio_abi.h:32,
> from include.c:2:
> /usr/include/asm/signal.h:16:23: error: conflicting types for ‘sigset_t’
> typedef unsigned long sigset_t;
> ^~~~~~~~
> In file included from /usr/include/signal.h:35,
> from include.c:1:
> /usr/include/bits/types/sigset_t.h:7:20: note: previous declaration of ‘sigset_t’ was here
> typedef __sigset_t sigset_t;
I guess we could do something like the patch below, although it is
rather ugly:
diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
index 75846164290e..b7705ad66d78 100644
--- a/include/uapi/linux/aio_abi.h
+++ b/include/uapi/linux/aio_abi.h
@@ -29,7 +29,11 @@
#include <linux/types.h>
#include <linux/fs.h>
+#ifdef __KERNEL__
#include <linux/signal.h>
+#else
+#include <signal.h>
+#endif
#include <asm/byteorder.h>
typedef __kernel_ulong_t aio_context_t;
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-07-08 20:44 ` Christoph Hellwig
@ 2018-07-09 17:20 ` Stephan Müller
2018-07-09 19:21 ` Stephan Müller
2018-07-10 5:11 ` Andrei Vagin
2 siblings, 0 replies; 19+ messages in thread
From: Stephan Müller @ 2018-07-09 17:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Adrian Reber, viro, Avi Kivity, linux-aio, linux-fsdevel,
linux-api, linux-kernel, Ondrej Mosnacek
Am Sonntag, 8. Juli 2018, 22:44:00 CEST schrieb Christoph Hellwig:
Hi Christoph,
>
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index 75846164290e..b7705ad66d78 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -29,7 +29,11 @@
>
> #include <linux/types.h>
> #include <linux/fs.h>
> +#ifdef __KERNEL__
> #include <linux/signal.h>
> +#else
> +#include <signal.h>
> +#endif
> #include <asm/byteorder.h>
Without such a patch, libkcapi fails to compile as well. See [1].
Apart from your suggested patch above, do you have another suggestion how make
the user space code compile?
[1] https://github.com/smuellerDD/libkcapi/issues/59
Thanks
Stephan
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-07-08 20:44 ` Christoph Hellwig
2018-07-09 17:20 ` Stephan Müller
@ 2018-07-09 19:21 ` Stephan Müller
2018-07-10 12:51 ` Christoph Hellwig
2018-07-10 5:11 ` Andrei Vagin
2 siblings, 1 reply; 19+ messages in thread
From: Stephan Müller @ 2018-07-09 19:21 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Adrian Reber, viro, Avi Kivity, linux-aio, linux-fsdevel,
linux-api, linux-kernel, Ondrej Mosnacek
Am Sonntag, 8. Juli 2018, 22:44:00 CEST schrieb Christoph Hellwig:
Hi Christoph,
>
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index 75846164290e..b7705ad66d78 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -29,7 +29,11 @@
>
> #include <linux/types.h>
> #include <linux/fs.h>
> +#ifdef __KERNEL__
> #include <linux/signal.h>
> +#else
> +#include <signal.h>
> +#endif
> #include <asm/byteorder.h>
Without such a patch, libkcapi fails to compile as well. See [1].
Apart from your suggested patch above, do you have another suggestion how make
the user space code compile?
[1] https://github.com/smuellerDD/libkcapi/issues/59
Thanks
Stephan
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-07-09 19:21 ` Stephan Müller
@ 2018-07-10 12:51 ` Christoph Hellwig
0 siblings, 0 replies; 19+ messages in thread
From: Christoph Hellwig @ 2018-07-10 12:51 UTC (permalink / raw)
To: Stephan Müller
Cc: Christoph Hellwig, Adrian Reber, viro, Avi Kivity, linux-aio,
linux-fsdevel, linux-api, linux-kernel, Ondrej Mosnacek
On Mon, Jul 09, 2018 at 09:21:53PM +0200, Stephan Müller wrote:
> Without such a patch, libkcapi fails to compile as well. See [1].
>
> Apart from your suggested patch above, do you have another suggestion how make
> the user space code compile?
The only other option would be to move the __aio_sigset delcaration
out of the uapi header, similar to what we do for pselect. This is
also rather ugly, but I gess that's what we'll have to do. I will
prepare a patch for it.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/7] aio: implement io_pgetevents
2018-07-08 20:44 ` Christoph Hellwig
2018-07-09 17:20 ` Stephan Müller
2018-07-09 19:21 ` Stephan Müller
@ 2018-07-10 5:11 ` Andrei Vagin
2 siblings, 0 replies; 19+ messages in thread
From: Andrei Vagin @ 2018-07-10 5:11 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Adrian Reber, viro, Avi Kivity, linux-aio, linux-fsdevel,
linux-api, linux-kernel
On Sun, Jul 08, 2018 at 10:44:00PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 04, 2018 at 04:21:16PM +0200, Adrian Reber wrote:
> > In file included from /usr/include/linux/signal.h:5,
> > from /usr/include/linux/aio_abi.h:32,
> > from include.c:2:
> > /usr/include/asm/signal.h:16:23: error: conflicting types for ‘sigset_t’
> > typedef unsigned long sigset_t;
> > ^~~~~~~~
> > In file included from /usr/include/signal.h:35,
> > from include.c:1:
> > /usr/include/bits/types/sigset_t.h:7:20: note: previous declaration of ‘sigset_t’ was here
> > typedef __sigset_t sigset_t;
>
> I guess we could do something like the patch below, although it is
> rather ugly:
>
> diff --git a/include/uapi/linux/aio_abi.h b/include/uapi/linux/aio_abi.h
> index 75846164290e..b7705ad66d78 100644
> --- a/include/uapi/linux/aio_abi.h
> +++ b/include/uapi/linux/aio_abi.h
> @@ -29,7 +29,11 @@
>
> #include <linux/types.h>
> #include <linux/fs.h>
> +#ifdef __KERNEL__
> #include <linux/signal.h>
> +#else
> +#include <signal.h>
> +#endif
I think we can not do this because this header specifies the kernel
API, but signal.h is provided by libc and sigset_t can be defined
differently there:
[avagin@laptop ~]$ cat test.c
#ifdef TEST_LINUX_SIGNAL
# include <glob.h>
# include <linux/signal.h>
#else
# include <signal.h>
#endif
#include <stdio.h>
int main()
{
printf("sizeof(sigset_t) = %d\n", sizeof(sigset_t));
return 0;
}
[avagin@laptop ~]$ gcc -DTEST_LINUX_SIGNAL test.c && ./a.out
sizeof(sigset_t) = 8
[avagin@laptop ~]$ gcc test.c && ./a.out
sizeof(sigset_t) = 128
[avagin@laptop include]$ rpm -qf /usr/include/signal.h
glibc-headers-2.27-8.fc28.i686
glibc-headers-2.27-8.fc28.x86_64
[avagin@laptop include]$ rpm -qf /usr/include/linux/signal.h
kernel-headers-4.16.5-300.fc28.x86_64
> #include <asm/byteorder.h>
>
> typedef __kernel_ulong_t aio_context_t;
^ permalink raw reply [flat|nested] 19+ messages in thread