linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/8] aio: make sure file is pinned
       [not found] <CAHk-=wghRmdYKRAwakeHmjcpbLt9fJAHyU2x8s_NZONhz6RTOw@mail.gmail.com>
@ 2019-03-07  0:03 ` Al Viro
  2019-03-07  0:03   ` [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup Al Viro
                     ` (7 more replies)
  0 siblings, 8 replies; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

"aio: remove the extra get_file/fput pair in io_submit_one" was
too optimistic - not dereferencing file pointer after e.g.
->write_iter() returns is not enough; that reference might've been
the only thing that kept alive objects that are referenced
*before* the method returns.  Such as inode, for example...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/aio.c b/fs/aio.c
index 3d9669d011b9..ea30b78187ed 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1790,6 +1790,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 			   struct iocb __user *user_iocb, bool compat)
 {
 	struct aio_kiocb *req;
+	struct file *file;
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
@@ -1844,6 +1845,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	req->ki_user_iocb = user_iocb;
 	req->ki_user_data = iocb->aio_data;
+	file = get_file(req->ki_filp);	/* req can die too early */
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
@@ -1872,6 +1874,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		ret = -EINVAL;
 		break;
 	}
+	fput(file);
 
 	/*
 	 * If ret is 0, we'd either done aio_complete() ourselves or have
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  2:18     ` Al Viro
  2019-03-07  0:03   ` [PATCH 3/8] aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error Al Viro
                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

In case of early wakeups, aio_poll() assumes that aio_poll_complete()
has either already happened or is imminent.  In that case we do not
want to put iocb on the list of cancellables.  However, ignored
wakeups need to be treated as if wakeup has not happened at all.
Trivially fixed by having aio_poll_wake() set ->woken only after
it's committed to taking iocb out of the waitqueue.

Spotted-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ea30b78187ed..3a8b894378e0 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1668,13 +1668,13 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	__poll_t mask = key_to_poll(key);
 	unsigned long flags;
 
+	/* for instances that support it check for an event match first: */
+	if (mask && !(mask & req->events))
+		return 0;
+
 	req->woken = true;
 
-	/* for instances that support it check for an event match first: */
 	if (mask) {
-		if (!(mask & req->events))
-			return 0;
-
 		/*
 		 * Try to complete the iocb inline if we can. Use
 		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/8] aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
  2019-03-07  0:03   ` [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  2:11     ` zhengbin (A)
  2019-03-07  0:03   ` [PATCH 4/8] aio_poll(): get rid of weird refcounting Al Viro
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

We want iocb_put() happening on errors, to balance the extra reference
we'd taken.  As it is, we end up with a leak.  The rules should be
	* error: iocb_put() to deal with the extra ref, return error,
let the caller do another iocb_put().
	* async: iocb_put() to deal with the extra ref, return 0.
	* no error, event present immediately: aio_poll_complete() to
report it, iocb_put() to deal with the extra ref, return 0.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3a8b894378e0..22b288997441 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1724,6 +1724,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
 	struct aio_poll_table apt;
+	bool async = false;
 	__poll_t mask;
 
 	/* reject any unknown events outside the normal event mask. */
@@ -1760,30 +1761,26 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 
 	spin_lock_irq(&ctx->ctx_lock);
 	spin_lock(&req->head->lock);
-	if (req->woken) {
-		/* wake_up context handles the rest */
-		mask = 0;
+	if (req->woken) { /* already taken up by aio_poll_wake() */
+		async = true;
 		apt.error = 0;
-	} else if (mask || apt.error) {
-		/* if we get an error or a mask we are done */
-		WARN_ON_ONCE(list_empty(&req->wait.entry));
-		list_del_init(&req->wait.entry);
-	} else {
-		/* actually waiting for an event */
+	} else if (!mask && !apt.error) { /* actually waiting for an event */
 		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
 		aiocb->ki_cancel = aio_poll_cancel;
+		async = true;
+	} else { /* if we get an error or a mask we are done */
+		WARN_ON_ONCE(list_empty(&req->wait.entry));
+		list_del_init(&req->wait.entry);
+		/* no wakeup in the future either; aiocb is ours to dispose of */
 	}
 	spin_unlock(&req->head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
 
 out:
-	if (unlikely(apt.error))
-		return apt.error;
-
-	if (mask)
+	if (async && !apt.error)
 		aio_poll_complete(aiocb, mask);
 	iocb_put(aiocb);
-	return 0;
+	return apt.error;
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/8] aio_poll(): get rid of weird refcounting
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
  2019-03-07  0:03   ` [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup Al Viro
  2019-03-07  0:03   ` [PATCH 3/8] aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  0:03   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

The only reason for taking the extra ref to iocb is that we want
to access it after vfs_poll() and an early wakeup could have it
already completed by the time vfs_poll() returns.

That's very easy to avoid, though - we need to know which lock
to grab and, having grabbed it, we need to check if an early
wakeup has already happened.  So let's just copy the reference
to waitqueue into aio_poll_table and instead of having the
"woken" flag in iocb, move it to aio_poll() stack frame and
put its address into iocb (and make sure it's cleared, so
aio_poll_wake() won't step onto it after aio_poll() exits).

That's enough to get rid of the refcount.  In async case freeing
is done by aio_poll_complete() (and aio_poll() will only touch
the iocb past the vfs_poll() if it's guaranteed that aio_poll_complete()
has not happened yet), in all other cases we make sure that wakeups
hadn't and won't happen and deal with disposal of iocb ourselves.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 55 +++++++++++++++++++++++++++----------------------------
 1 file changed, 27 insertions(+), 28 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 22b288997441..ee062253e303 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -180,8 +180,8 @@ struct fsync_iocb {
 struct poll_iocb {
 	struct file		*file;
 	struct wait_queue_head	*head;
+	bool			*taken;
 	__poll_t		events;
-	bool			woken;
 	bool			cancelled;
 	struct wait_queue_entry	wait;
 	struct work_struct	work;
@@ -209,8 +209,6 @@ struct aio_kiocb {
 
 	struct list_head	ki_list;	/* the aio core uses this
 						 * for cancellation */
-	refcount_t		ki_refcnt;
-
 	/*
 	 * If the aio_resfd field of the userspace iocb is not zero,
 	 * this is the underlying eventfd context to deliver events to.
@@ -1034,7 +1032,6 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	percpu_ref_get(&ctx->reqs);
 	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
-	refcount_set(&req->ki_refcnt, 0);
 	req->ki_eventfd = NULL;
 	return req;
 }
@@ -1069,13 +1066,10 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 
 static inline void iocb_put(struct aio_kiocb *iocb)
 {
-	if (refcount_read(&iocb->ki_refcnt) == 0 ||
-	    refcount_dec_and_test(&iocb->ki_refcnt)) {
-		if (iocb->ki_filp)
-			fput(iocb->ki_filp);
-		percpu_ref_put(&iocb->ki_ctx->reqs);
-		kmem_cache_free(kiocb_cachep, iocb);
-	}
+	if (iocb->ki_filp)
+		fput(iocb->ki_filp);
+	percpu_ref_put(&iocb->ki_ctx->reqs);
+	kmem_cache_free(kiocb_cachep, iocb);
 }
 
 static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
@@ -1672,8 +1666,10 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	if (mask && !(mask & req->events))
 		return 0;
 
-	req->woken = true;
-
+	if (unlikely(req->taken)) {
+		*req->taken = true;
+		req->taken = NULL;
+	}
 	if (mask) {
 		/*
 		 * Try to complete the iocb inline if we can. Use
@@ -1698,6 +1694,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 
 struct aio_poll_table {
 	struct poll_table_struct	pt;
+	struct wait_queue_head		*head;
 	struct aio_kiocb		*iocb;
 	int				error;
 };
@@ -1715,7 +1712,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
 	}
 
 	pt->error = 0;
-	pt->iocb->poll.head = head;
+	pt->head = pt->iocb->poll.head = head;
 	add_wait_queue(head, &pt->iocb->poll.wait);
 }
 
@@ -1738,7 +1735,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
 
 	req->head = NULL;
-	req->woken = false;
+	req->taken = &async;
 	req->cancelled = false;
 
 	apt.pt._qproc = aio_poll_queue_proc;
@@ -1750,36 +1747,38 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	INIT_LIST_HEAD(&req->wait.entry);
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 
-	/* one for removal from waitqueue, one for this function */
-	refcount_set(&aiocb->ki_refcnt, 2);
-
-	mask = vfs_poll(req->file, &apt.pt) & req->events;
-	if (unlikely(!req->head)) {
+	mask = req->events;
+	mask &= vfs_poll(req->file, &apt.pt);
+	/*
+	 * Careful: we might've been put into waitqueue *and* already
+	 * woken up before vfs_poll() returns.  The caller is holding
+	 * a reference to file, so it couldn't have been killed under
+	 * us, no matter what.  However, in case of early wakeup, @req
+	 * might be already gone by now.
+	 */
+	if (unlikely(!apt.head)) {
 		/* we did not manage to set up a waitqueue, done */
 		goto out;
 	}
-
 	spin_lock_irq(&ctx->ctx_lock);
-	spin_lock(&req->head->lock);
-	if (req->woken) { /* already taken up by aio_poll_wake() */
-		async = true;
+	spin_lock(&apt.head->lock);
+	if (async) { /* already taken up by aio_poll_wake() */
 		apt.error = 0;
 	} else if (!mask && !apt.error) { /* actually waiting for an event */
 		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
 		aiocb->ki_cancel = aio_poll_cancel;
+		req->taken = NULL;
 		async = true;
 	} else { /* if we get an error or a mask we are done */
 		WARN_ON_ONCE(list_empty(&req->wait.entry));
 		list_del_init(&req->wait.entry);
 		/* no wakeup in the future either; aiocb is ours to dispose of */
 	}
-	spin_unlock(&req->head->lock);
+	spin_unlock(&apt.head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
-
 out:
-	if (async && !apt.error)
+	if (!async && !apt.error)
 		aio_poll_complete(aiocb, mask);
-	iocb_put(aiocb);
 	return apt.error;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 5/8] make aio_read()/aio_write() return int
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
                     ` (2 preceding siblings ...)
  2019-03-07  0:03   ` [PATCH 4/8] aio_poll(): get rid of weird refcounting Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  0:03   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_put() Al Viro
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

that ssize_t is a rudiment of earlier calling conventions; it's been
used only to pass 0 and -E... since last autumn.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index ee062253e303..5dd5f35d054c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1508,13 +1508,13 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 	}
 }
 
-static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb,
+static int aio_read(struct kiocb *req, const struct iocb *iocb,
 			bool vectored, bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
 	struct file *file;
-	ssize_t ret;
+	int ret;
 
 	ret = aio_prep_rw(req, iocb);
 	if (ret)
@@ -1536,13 +1536,13 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb,
 	return ret;
 }
 
-static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb,
+static int aio_write(struct kiocb *req, const struct iocb *iocb,
 			 bool vectored, bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
 	struct file *file;
-	ssize_t ret;
+	int ret;
 
 	ret = aio_prep_rw(req, iocb);
 	if (ret)
@@ -1716,7 +1716,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
 	add_wait_queue(head, &pt->iocb->poll.wait);
 }
 
-static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
+static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 {
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
@@ -1787,7 +1787,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 {
 	struct aio_kiocb *req;
 	struct file *file;
-	ssize_t ret;
+	int ret;
 
 	/* enforce forwards compatibility on users */
 	if (unlikely(iocb->aio_reserved2)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 6/8] move dropping ->ki_eventfd into iocb_put()
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
                     ` (3 preceding siblings ...)
  2019-03-07  0:03   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  0:03   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 5dd5f35d054c..5837a29e63fe 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1066,6 +1066,8 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 
 static inline void iocb_put(struct aio_kiocb *iocb)
 {
+	if (iocb->ki_eventfd)
+		eventfd_ctx_put(iocb->ki_eventfd);
 	if (iocb->ki_filp)
 		fput(iocb->ki_filp);
 	percpu_ref_put(&iocb->ki_ctx->reqs);
@@ -1142,10 +1144,8 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	 * eventfd. The eventfd_signal() function is safe to be called
 	 * from IRQ context.
 	 */
-	if (iocb->ki_eventfd) {
+	if (iocb->ki_eventfd)
 		eventfd_signal(iocb->ki_eventfd, 1);
-		eventfd_ctx_put(iocb->ki_eventfd);
-	}
 
 	/*
 	 * We have to order our ring_info tail store above and test
@@ -1819,18 +1819,19 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		goto out_put_req;
 
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
+		struct eventfd_ctx *eventfd;
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
 		 * instance of the file* now. The file descriptor must be
 		 * an eventfd() fd, and will be signaled for each completed
 		 * event using the eventfd_signal() function.
 		 */
-		req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
-		if (IS_ERR(req->ki_eventfd)) {
-			ret = PTR_ERR(req->ki_eventfd);
-			req->ki_eventfd = NULL;
+		eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
+		if (IS_ERR(eventfd)) {
+			ret = PTR_ERR(eventfd);
 			goto out_put_req;
 		}
+		req->ki_eventfd = eventfd;
 	}
 
 	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
@@ -1881,8 +1882,6 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		goto out_put_req;
 	return 0;
 out_put_req:
-	if (req->ki_eventfd)
-		eventfd_ctx_put(req->ki_eventfd);
 	iocb_put(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
                     ` (4 preceding siblings ...)
  2019-03-07  0:03   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_put() Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  0:03   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
  2019-03-07  0:23   ` [PATCH 1/8] aio: make sure file is pinned Linus Torvalds
  7 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

simplifies the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 5837a29e63fe..af51b1360305 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1029,6 +1029,11 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	if (unlikely(!req))
 		return NULL;
 
+	if (unlikely(!get_reqs_available(ctx))) {
+		kfree(req);
+		return NULL;
+	}
+
 	percpu_ref_get(&ctx->reqs);
 	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
@@ -1805,13 +1810,9 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		return -EINVAL;
 	}
 
-	if (!get_reqs_available(ctx))
-		return -EAGAIN;
-
-	ret = -EAGAIN;
 	req = aio_get_req(ctx);
 	if (unlikely(!req))
-		goto out_put_reqs_available;
+		return -EAGAIN;
 
 	req->ki_filp = fget(iocb->aio_fildes);
 	ret = -EBADF;
@@ -1883,7 +1884,6 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 	return 0;
 out_put_req:
 	iocb_put(req);
-out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one()
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
                     ` (5 preceding siblings ...)
  2019-03-07  0:03   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
@ 2019-03-07  0:03   ` Al Viro
  2019-03-07  0:23   ` [PATCH 1/8] aio: make sure file is pinned Linus Torvalds
  7 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

makes for somewhat cleaner control flow in __io_submit_one()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 86 +++++++++++++++++++++++++++++++---------------------------------
 1 file changed, 42 insertions(+), 44 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index af51b1360305..6993581b77b2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1788,36 +1788,15 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-			   struct iocb __user *user_iocb, bool compat)
+			   struct iocb __user *user_iocb, struct aio_kiocb *req,
+			   bool compat)
 {
-	struct aio_kiocb *req;
 	struct file *file;
 	int ret;
 
-	/* enforce forwards compatibility on users */
-	if (unlikely(iocb->aio_reserved2)) {
-		pr_debug("EINVAL: reserve field set\n");
-		return -EINVAL;
-	}
-
-	/* prevent overflows */
-	if (unlikely(
-	    (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
-	    (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
-	    ((ssize_t)iocb->aio_nbytes < 0)
-	   )) {
-		pr_debug("EINVAL: overflow check\n");
-		return -EINVAL;
-	}
-
-	req = aio_get_req(ctx);
-	if (unlikely(!req))
-		return -EAGAIN;
-
 	req->ki_filp = fget(iocb->aio_fildes);
-	ret = -EBADF;
 	if (unlikely(!req->ki_filp))
-		goto out_put_req;
+		return -EBADF;
 
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		struct eventfd_ctx *eventfd;
@@ -1828,17 +1807,15 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		 * event using the eventfd_signal() function.
 		 */
 		eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
-		if (IS_ERR(eventfd)) {
-			ret = PTR_ERR(eventfd);
-			goto out_put_req;
-		}
+		if (IS_ERR(eventfd))
+			return PTR_ERR(req->ki_eventfd);
+
 		req->ki_eventfd = eventfd;
 	}
 
-	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
-	if (unlikely(ret)) {
+	if (unlikely(put_user(KIOCB_KEY, &user_iocb->aio_key))) {
 		pr_debug("EFAULT: aio_key\n");
-		goto out_put_req;
+		return -EFAULT;
 	}
 
 	req->ki_user_iocb = user_iocb;
@@ -1873,30 +1850,51 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		break;
 	}
 	fput(file);
-
-	/*
-	 * If ret is 0, we'd either done aio_complete() ourselves or have
-	 * arranged for that to be done asynchronously.  Anything non-zero
-	 * means that we need to destroy req ourselves.
-	 */
-	if (ret)
-		goto out_put_req;
-	return 0;
-out_put_req:
-	iocb_put(req);
-	put_reqs_available(ctx, 1);
 	return ret;
 }
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 bool compat)
 {
+	struct aio_kiocb *req;
 	struct iocb iocb;
+	int err;
 
 	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
 		return -EFAULT;
 
-	return __io_submit_one(ctx, &iocb, user_iocb, compat);
+	/* enforce forwards compatibility on users */
+	if (unlikely(iocb.aio_reserved2)) {
+		pr_debug("EINVAL: reserve field set\n");
+		return -EINVAL;
+	}
+
+	/* prevent overflows */
+	if (unlikely(
+	    (iocb.aio_buf != (unsigned long)iocb.aio_buf) ||
+	    (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) ||
+	    ((ssize_t)iocb.aio_nbytes < 0)
+	   )) {
+		pr_debug("EINVAL: overflow check\n");
+		return -EINVAL;
+	}
+
+	req = aio_get_req(ctx);
+	if (unlikely(!req))
+		return -EAGAIN;
+
+	err = __io_submit_one(ctx, &iocb, user_iocb, req, compat);
+
+	/*
+	 * If err is 0, we'd either done aio_complete() ourselves or have
+	 * arranged for that to be done asynchronously.  Anything non-zero
+	 * means that we need to destroy req ourselves.
+	 */
+	if (unlikely(err)) {
+		iocb_put(req);
+		put_reqs_available(ctx, 1);
+	}
+	return err;
 }
 
 /* sys_io_submit:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
                     ` (6 preceding siblings ...)
  2019-03-07  0:03   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
@ 2019-03-07  0:23   ` Linus Torvalds
  2019-03-07  0:41     ` Al Viro
  7 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2019-03-07  0:23 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> "aio: remove the extra get_file/fput pair in io_submit_one" was
> too optimistic - not dereferencing file pointer after e.g.
> ->write_iter() returns is not enough; that reference might've been
> the only thing that kept alive objects that are referenced
> *before* the method returns.  Such as inode, for example...

I still; think that this is actually _worse_ than just having the
refcount on the req instead.

As it is, we have that completely insane "ref can go away from under
us", because nothing keeps that around, which then causes all those
other crazy issues with "woken" etc garbage.

I think we should be able to get rid of those entirely. Make the
poll() case just return zero if it has added the entry successfully to
poll queue.  No need for "woken", no need for all that odd "oh, but
now the req might no longer exist".

The refcount wasn't the problem. Everything *else* was the problem,
including only using the refcount for the poll case etc.

                       Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-07  0:23   ` [PATCH 1/8] aio: make sure file is pinned Linus Torvalds
@ 2019-03-07  0:41     ` Al Viro
  2019-03-07  0:48       ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > "aio: remove the extra get_file/fput pair in io_submit_one" was
> > too optimistic - not dereferencing file pointer after e.g.
> > ->write_iter() returns is not enough; that reference might've been
> > the only thing that kept alive objects that are referenced
> > *before* the method returns.  Such as inode, for example...
> 
> I still; think that this is actually _worse_ than just having the
> refcount on the req instead.
> 
> As it is, we have that completely insane "ref can go away from under
> us", because nothing keeps that around, which then causes all those
> other crazy issues with "woken" etc garbage.
>
> I think we should be able to get rid of those entirely. Make the
> poll() case just return zero if it has added the entry successfully to
> poll queue.  No need for "woken", no need for all that odd "oh, but
> now the req might no longer exist".

Not really.  Sure, you can get rid of "might no longer exist"
considerations, but you still need to decide which way do we want to
handle it.  There are 3 cases:
	* it's already taken up; don't put on the list for possible
cancel, don't call aio_complete().
	* will eventually be woken up; put on the list for possible
cancle, don't call aio_complete().
	* wanted to be on several queues, fortunately not woken up
yet.  Make sure it's gone from queue, return an error.
	* none of the above, and ->poll() has reported what we wanted
from the very beginning.  Remove from queue, call aio_complete().

You'll need some logics to handle that.  I can buy the "if we know
the req is still alive, we can check if it's still queued instead of
separate woken flag", but but it won't win you much ;-/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-07  0:41     ` Al Viro
@ 2019-03-07  0:48       ` Al Viro
  2019-03-07  1:20         ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-07  0:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Thu, Mar 07, 2019 at 12:41:59AM +0000, Al Viro wrote:
> On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> > On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > "aio: remove the extra get_file/fput pair in io_submit_one" was
> > > too optimistic - not dereferencing file pointer after e.g.
> > > ->write_iter() returns is not enough; that reference might've been
> > > the only thing that kept alive objects that are referenced
> > > *before* the method returns.  Such as inode, for example...
> > 
> > I still; think that this is actually _worse_ than just having the
> > refcount on the req instead.
> > 
> > As it is, we have that completely insane "ref can go away from under
> > us", because nothing keeps that around, which then causes all those
> > other crazy issues with "woken" etc garbage.
> >
> > I think we should be able to get rid of those entirely. Make the
> > poll() case just return zero if it has added the entry successfully to
> > poll queue.  No need for "woken", no need for all that odd "oh, but
> > now the req might no longer exist".
> 
> Not really.  Sure, you can get rid of "might no longer exist"
> considerations, but you still need to decide which way do we want to
> handle it.  There are 3 cases:
> 	* it's already taken up; don't put on the list for possible
> cancel, don't call aio_complete().
> 	* will eventually be woken up; put on the list for possible
> cancle, don't call aio_complete().
> 	* wanted to be on several queues, fortunately not woken up
> yet.  Make sure it's gone from queue, return an error.
> 	* none of the above, and ->poll() has reported what we wanted
> from the very beginning.  Remove from queue, call aio_complete().
> 
> You'll need some logics to handle that.  I can buy the "if we know
> the req is still alive, we can check if it's still queued instead of
> separate woken flag", but but it won't win you much ;-/

If anything, the one good reason for refcount would be the risk that
some ->read_iter() or ->write_iter() will try to dereference iocb
after having decided to return -EIOCBQUEUED and submitted all bios.
I think that doesn't happen, but making sure it doesn't would be
a good argument in favour of that refcount.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-07  0:48       ` Al Viro
@ 2019-03-07  1:20         ` Al Viro
  2019-03-07  1:30           ` Linus Torvalds
  0 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-07  1:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Thu, Mar 07, 2019 at 12:48:28AM +0000, Al Viro wrote:
> On Thu, Mar 07, 2019 at 12:41:59AM +0000, Al Viro wrote:
> > On Wed, Mar 06, 2019 at 04:23:04PM -0800, Linus Torvalds wrote:
> > > On Wed, Mar 6, 2019 at 4:03 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > >
> > > > "aio: remove the extra get_file/fput pair in io_submit_one" was
> > > > too optimistic - not dereferencing file pointer after e.g.
> > > > ->write_iter() returns is not enough; that reference might've been
> > > > the only thing that kept alive objects that are referenced
> > > > *before* the method returns.  Such as inode, for example...
> > > 
> > > I still; think that this is actually _worse_ than just having the
> > > refcount on the req instead.
> > > 
> > > As it is, we have that completely insane "ref can go away from under
> > > us", because nothing keeps that around, which then causes all those
> > > other crazy issues with "woken" etc garbage.
> > >
> > > I think we should be able to get rid of those entirely. Make the
> > > poll() case just return zero if it has added the entry successfully to
> > > poll queue.  No need for "woken", no need for all that odd "oh, but
> > > now the req might no longer exist".
> > 
> > Not really.  Sure, you can get rid of "might no longer exist"
> > considerations, but you still need to decide which way do we want to
> > handle it.  There are 3 cases:
> > 	* it's already taken up; don't put on the list for possible
> > cancel, don't call aio_complete().
> > 	* will eventually be woken up; put on the list for possible
> > cancle, don't call aio_complete().
> > 	* wanted to be on several queues, fortunately not woken up
> > yet.  Make sure it's gone from queue, return an error.
> > 	* none of the above, and ->poll() has reported what we wanted
> > from the very beginning.  Remove from queue, call aio_complete().
> > 
> > You'll need some logics to handle that.  I can buy the "if we know
> > the req is still alive, we can check if it's still queued instead of
> > separate woken flag", but but it won't win you much ;-/
> 
> If anything, the one good reason for refcount would be the risk that
> some ->read_iter() or ->write_iter() will try to dereference iocb
> after having decided to return -EIOCBQUEUED and submitted all bios.
> I think that doesn't happen, but making sure it doesn't would be
> a good argument in favour of that refcount.

*grumble*

It is a good argument, unfortunately ;-/  Proof that instances do not
step into that is rather subtle and won't take much to break.  OK...

I'll try to massage that series on top of your patch; I still hate the
post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
and I'll have something to post.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-07  1:20         ` Al Viro
@ 2019-03-07  1:30           ` Linus Torvalds
  2019-03-08  3:36             ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Linus Torvalds @ 2019-03-07  1:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Wed, Mar 6, 2019 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> I'll try to massage that series on top of your patch; I still hate the
> post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
> and I'll have something to post.

No inherent hurry, I sent the ping just to make sure it hadn't gotten lost.

And yeah, I think the post-vfs_poll() logic cannot possibly be
necessary. My gut feel is that *if* we have the refcounting right,
then we should be able to just let the wakeup come in at any later
point, and ordering shouldn't matter all that much, and we shouldn't
even need any locking.

I'd like to think that it can be done with something like "just 'or'
in the mask atomically" (so that we don't care about ordering between
the synchronous vfs_poll() and the async poll wakeup), together with
"when refcount goes to zero, finish the thing off and complete it" (so
that we don't care who finishes first).

No "woken" logic, no "who fired first" logic, no BS. Just make the
operations work regardless of ordering.

And maybe it can't be done. But the current model seems just so hacky
that it can't be the right model.

               Linus

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/8] aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error
  2019-03-07  0:03   ` [PATCH 3/8] aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error Al Viro
@ 2019-03-07  2:11     ` zhengbin (A)
  0 siblings, 0 replies; 42+ messages in thread
From: zhengbin (A) @ 2019-03-07  2:11 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

+	if (async && !apt.error)  --->may be this should be if (!async && !apt.error) ?

On 2019/3/7 8:03, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> We want iocb_put() happening on errors, to balance the extra reference
> we'd taken.  As it is, we end up with a leak.  The rules should be
> 	* error: iocb_put() to deal with the extra ref, return error,
> let the caller do another iocb_put().
> 	* async: iocb_put() to deal with the extra ref, return 0.
> 	* no error, event present immediately: aio_poll_complete() to
> report it, iocb_put() to deal with the extra ref, return 0.
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/aio.c | 25 +++++++++++--------------
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 3a8b894378e0..22b288997441 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1724,6 +1724,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
>  	struct kioctx *ctx = aiocb->ki_ctx;
>  	struct poll_iocb *req = &aiocb->poll;
>  	struct aio_poll_table apt;
> +	bool async = false;
>  	__poll_t mask;
>  
>  	/* reject any unknown events outside the normal event mask. */
> @@ -1760,30 +1761,26 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
>  
>  	spin_lock_irq(&ctx->ctx_lock);
>  	spin_lock(&req->head->lock);
> -	if (req->woken) {
> -		/* wake_up context handles the rest */
> -		mask = 0;
> +	if (req->woken) { /* already taken up by aio_poll_wake() */
> +		async = true;
>  		apt.error = 0;
> -	} else if (mask || apt.error) {
> -		/* if we get an error or a mask we are done */
> -		WARN_ON_ONCE(list_empty(&req->wait.entry));
> -		list_del_init(&req->wait.entry);
> -	} else {
> -		/* actually waiting for an event */
> +	} else if (!mask && !apt.error) { /* actually waiting for an event */
>  		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
>  		aiocb->ki_cancel = aio_poll_cancel;
> +		async = true;
> +	} else { /* if we get an error or a mask we are done */
> +		WARN_ON_ONCE(list_empty(&req->wait.entry));
> +		list_del_init(&req->wait.entry);
> +		/* no wakeup in the future either; aiocb is ours to dispose of */
>  	}
>  	spin_unlock(&req->head->lock);
>  	spin_unlock_irq(&ctx->ctx_lock);
>  
>  out:
> -	if (unlikely(apt.error))
> -		return apt.error;
> -
> -	if (mask)
> +	if (async && !apt.error)
>  		aio_poll_complete(aiocb, mask);
>  	iocb_put(aiocb);
> -	return 0;
> +	return apt.error;
>  }
>  
>  static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup
  2019-03-07  0:03   ` [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup Al Viro
@ 2019-03-07  2:18     ` Al Viro
  2019-03-08 11:16       ` zhengbin (A)
  0 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-07  2:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Thu, Mar 07, 2019 at 12:03:10AM +0000, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> In case of early wakeups, aio_poll() assumes that aio_poll_complete()
> has either already happened or is imminent.  In that case we do not
> want to put iocb on the list of cancellables.  However, ignored
> wakeups need to be treated as if wakeup has not happened at all.
> Trivially fixed by having aio_poll_wake() set ->woken only after
> it's committed to taking iocb out of the waitqueue.
> 
> Spotted-by: zhengbin <zhengbin13@huawei.com>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

... and unfortunately it's worse than just that - what both of us
have missed is that one could have non-specific wakep + schedule_work +
aio_poll_complete_work() rechecking ->poll(), seeing nothing of
interest and reinserting into queue.  All before vfs_poll() manages
to return into aio_poll().  The window is harder to hit, but it's
still there, with exact same "failed to add to cancel list" kind of bug
if we do hit it ;-/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-07  1:30           ` Linus Torvalds
@ 2019-03-08  3:36             ` Al Viro
  2019-03-08 15:50               ` Christoph Hellwig
  2019-03-10  7:06               ` Al Viro
  0 siblings, 2 replies; 42+ messages in thread
From: Al Viro @ 2019-03-08  3:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Wed, Mar 06, 2019 at 05:30:21PM -0800, Linus Torvalds wrote:
> On Wed, Mar 6, 2019 at 5:20 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > I'll try to massage that series on top of your patch; I still hate the
> > post-vfs_poll() logics in aio_poll() ;-/  Give me about half an hour
> > and I'll have something to post.
> 
> No inherent hurry, I sent the ping just to make sure it hadn't gotten lost.
> 
> And yeah, I think the post-vfs_poll() logic cannot possibly be
> necessary. My gut feel is that *if* we have the refcounting right,
> then we should be able to just let the wakeup come in at any later
> point, and ordering shouldn't matter all that much, and we shouldn't
> even need any locking.
> 
> I'd like to think that it can be done with something like "just 'or'
> in the mask atomically" (so that we don't care about ordering between
> the synchronous vfs_poll() and the async poll wakeup), together with
> "when refcount goes to zero, finish the thing off and complete it" (so
> that we don't care who finishes first).
> 
> No "woken" logic, no "who fired first" logic, no BS. Just make the
> operations work regardless of ordering.
> 
> And maybe it can't be done. But the current model seems just so hacky
> that it can't be the right model.

Umm...  It is kinda-sorta doable; we do need something vaguely similar
to ->woken ("should we add it to the list of cancellables, or is the
async reference already gone?"), but other than that it seems to be
feasible.

See vfs.git#work.aio; the crucial bits are in these commits:
      keep io_event in aio_kiocb
      get rid of aio_complete() res/res2 arguments
      move aio_complete() to final iocb_put(), try to fix aio_poll() logics
The first two are preparations, the last is where the fixes (hopefully)
happen.

The logics in aio_poll() after vfs_poll():
	* we might want to steal the async reference (e.g. due to event
returned from the very beginning, or due to attempt to put on more than
one waitqueue, which makes results unreliable).  That's _NOT_ possible
if the thing had been put on a waitqueue, but currently isn't there.
It might be either due to early wakeup having done everything or the
same having scheduled aio_poll_complete_work().  In either case, the
best we can do is to ignore the return value of vfs_poll() and, in
case of error, mark the sucker cancelled.  We *can't* return an error
in that case.

	* if we want and can steal the async reference, rip it from
waitqueue; otherwise, put it on the "cancellable" list, unless it's
already gone or unless we are simulating the cancel ourselves.

	* if vfs_poll() has reported something we want and we have
successufully stolen the iocb, put it there, have the reference
we'd taken over dropped and return 0

Comments?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup
  2019-03-07  2:18     ` Al Viro
@ 2019-03-08 11:16       ` zhengbin (A)
  0 siblings, 0 replies; 42+ messages in thread
From: zhengbin (A) @ 2019-03-08 11:16 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

So, what should we do?

On 2019/3/7 10:18, Al Viro wrote:
> On Thu, Mar 07, 2019 at 12:03:10AM +0000, Al Viro wrote:
>> From: Al Viro <viro@zeniv.linux.org.uk>
>>
>> In case of early wakeups, aio_poll() assumes that aio_poll_complete()
>> has either already happened or is imminent.  In that case we do not
>> want to put iocb on the list of cancellables.  However, ignored
>> wakeups need to be treated as if wakeup has not happened at all.
>> Trivially fixed by having aio_poll_wake() set ->woken only after
>> it's committed to taking iocb out of the waitqueue.
>>
>> Spotted-by: zhengbin <zhengbin13@huawei.com>
>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> ... and unfortunately it's worse than just that - what both of us
> have missed is that one could have non-specific wakep + schedule_work +
> aio_poll_complete_work() rechecking ->poll(), seeing nothing of
> interest and reinserting into queue.  All before vfs_poll() manages
> to return into aio_poll().  The window is harder to hit, but it's
> still there, with exact same "failed to add to cancel list" kind of bug
> if we do hit it ;-/
> 
> .
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-08  3:36             ` Al Viro
@ 2019-03-08 15:50               ` Christoph Hellwig
  2019-03-10  7:06               ` Al Viro
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-08 15:50 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang, avi

On Fri, Mar 08, 2019 at 03:36:50AM +0000, Al Viro wrote:
> See vfs.git#work.aio; the crucial bits are in these commits:
>       keep io_event in aio_kiocb
>       get rid of aio_complete() res/res2 arguments
>       move aio_complete() to final iocb_put(), try to fix aio_poll() logics
> The first two are preparations, the last is where the fixes (hopefully)
> happen.

Looks sensible.  I'll try to run the tests over it, and I've added
Avi so that maybe he can make sure that scylladb is also happy with it,
that was usually the best way to find aio poll bugs..

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-08  3:36             ` Al Viro
  2019-03-08 15:50               ` Christoph Hellwig
@ 2019-03-10  7:06               ` Al Viro
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
  2019-03-11 19:41                 ` [PATCH 1/8] aio: make sure file is pinned Christoph Hellwig
  1 sibling, 2 replies; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Fri, Mar 08, 2019 at 03:36:50AM +0000, Al Viro wrote:

> See vfs.git#work.aio; the crucial bits are in these commits:
>       keep io_event in aio_kiocb
>       get rid of aio_complete() res/res2 arguments
>       move aio_complete() to final iocb_put(), try to fix aio_poll() logics
> The first two are preparations, the last is where the fixes (hopefully)
> happen.

OK, refactored, cleaned up and force-pushed.  Current state:
Al Viro (7):
      keep io_event in aio_kiocb
      aio: store event at final iocb_put()
      Fix aio_poll() races
      make aio_read()/aio_write() return int
      move dropping ->ki_eventfd into iocb_destroy()
      deal with get_reqs_available() in aio_get_req() itself
      aio: move sanity checks and request allocation to io_submit_one()

Linus Torvalds (1):
      pin iocb through aio.

 fs/aio.c | 327 ++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 146 insertions(+), 181 deletions(-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/8] pin iocb through aio.
  2019-03-10  7:06               ` Al Viro
@ 2019-03-10  7:08                 ` Al Viro
  2019-03-10  7:08                   ` [PATCH 2/8] keep io_event in aio_kiocb Al Viro
                                     ` (7 more replies)
  2019-03-11 19:41                 ` [PATCH 1/8] aio: make sure file is pinned Christoph Hellwig
  1 sibling, 8 replies; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Linus Torvalds <torvalds@linux-foundation.org>

aio_poll() is not the only case that needs file pinned; worse, while
aio_read()/aio_write() can live without pinning iocb itself, the
proof is rather brittle and can easily break on later changes.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3d9669d011b9..363d7d7c8bff 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1022,6 +1022,9 @@ static bool get_reqs_available(struct kioctx *ctx)
 /* aio_get_req
  *	Allocate a slot for an aio request.
  * Returns NULL if no requests are free.
+ *
+ * The refcount is initialized to 2 - one for the async op completion,
+ * one for the synchronous code that does this.
  */
 static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
@@ -1034,7 +1037,7 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	percpu_ref_get(&ctx->reqs);
 	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
-	refcount_set(&req->ki_refcnt, 0);
+	refcount_set(&req->ki_refcnt, 2);
 	req->ki_eventfd = NULL;
 	return req;
 }
@@ -1067,15 +1070,18 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 	return ret;
 }
 
+static inline void iocb_destroy(struct aio_kiocb *iocb)
+{
+	if (iocb->ki_filp)
+		fput(iocb->ki_filp);
+	percpu_ref_put(&iocb->ki_ctx->reqs);
+	kmem_cache_free(kiocb_cachep, iocb);
+}
+
 static inline void iocb_put(struct aio_kiocb *iocb)
 {
-	if (refcount_read(&iocb->ki_refcnt) == 0 ||
-	    refcount_dec_and_test(&iocb->ki_refcnt)) {
-		if (iocb->ki_filp)
-			fput(iocb->ki_filp);
-		percpu_ref_put(&iocb->ki_ctx->reqs);
-		kmem_cache_free(kiocb_cachep, iocb);
-	}
+	if (refcount_dec_and_test(&iocb->ki_refcnt))
+		iocb_destroy(iocb);
 }
 
 static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
@@ -1749,9 +1755,6 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	INIT_LIST_HEAD(&req->wait.entry);
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 
-	/* one for removal from waitqueue, one for this function */
-	refcount_set(&aiocb->ki_refcnt, 2);
-
 	mask = vfs_poll(req->file, &apt.pt) & req->events;
 	if (unlikely(!req->head)) {
 		/* we did not manage to set up a waitqueue, done */
@@ -1782,7 +1785,6 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 
 	if (mask)
 		aio_poll_complete(aiocb, mask);
-	iocb_put(aiocb);
 	return 0;
 }
 
@@ -1873,18 +1875,21 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		break;
 	}
 
+	/* Done with the synchronous reference */
+	iocb_put(req);
+
 	/*
 	 * If ret is 0, we'd either done aio_complete() ourselves or have
 	 * arranged for that to be done asynchronously.  Anything non-zero
 	 * means that we need to destroy req ourselves.
 	 */
-	if (ret)
-		goto out_put_req;
-	return 0;
+	if (!ret)
+		return 0;
+
 out_put_req:
 	if (req->ki_eventfd)
 		eventfd_ctx_put(req->ki_eventfd);
-	iocb_put(req);
+	iocb_destroy(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/8] keep io_event in aio_kiocb
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:43                     ` Christoph Hellwig
  2019-03-10  7:08                   ` [PATCH 3/8] aio: store event at final iocb_put() Al Viro
                                     ` (6 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 56 +++++++++++++++++++++-----------------------------------
 1 file changed, 21 insertions(+), 35 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 363d7d7c8bff..2249a7a1d6b3 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -204,8 +204,7 @@ struct aio_kiocb {
 	struct kioctx		*ki_ctx;
 	kiocb_cancel_fn		*ki_cancel;
 
-	struct iocb __user	*ki_user_iocb;	/* user's aiocb */
-	__u64			ki_user_data;	/* user's data for completion */
+	struct io_event		ki_res;
 
 	struct list_head	ki_list;	/* the aio core uses this
 						 * for cancellation */
@@ -1087,10 +1086,9 @@ static inline void iocb_put(struct aio_kiocb *iocb)
 static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
 			   long res, long res2)
 {
-	ev->obj = (u64)(unsigned long)iocb->ki_user_iocb;
-	ev->data = iocb->ki_user_data;
-	ev->res = res;
-	ev->res2 = res2;
+	iocb->ki_res.res = res;
+	iocb->ki_res.res2 = res2;
+	*ev = iocb->ki_res;
 }
 
 /* aio_complete
@@ -1126,7 +1124,7 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 
 	pr_debug("%p[%u]: %p: %p %Lx %lx %lx\n",
-		 ctx, tail, iocb, iocb->ki_user_iocb, iocb->ki_user_data,
+		 ctx, tail, iocb, (void __user *)iocb->ki_res.obj, iocb->ki_res.data,
 		 res, res2);
 
 	/* after flagging the request as done, we
@@ -1674,13 +1672,13 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	__poll_t mask = key_to_poll(key);
 	unsigned long flags;
 
+	/* for instances that support it check for an event match first: */
+	if (mask && !(mask & req->events))
+		return 0;
+
 	req->woken = true;
 
-	/* for instances that support it check for an event match first: */
 	if (mask) {
-		if (!(mask & req->events))
-			return 0;
-
 		/*
 		 * Try to complete the iocb inline if we can. Use
 		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
@@ -1844,8 +1842,10 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		goto out_put_req;
 	}
 
-	req->ki_user_iocb = user_iocb;
-	req->ki_user_data = iocb->aio_data;
+	req->ki_res.obj = (u64)(unsigned long)user_iocb;
+	req->ki_res.data = iocb->aio_data;
+	req->ki_res.res = 0;
+	req->ki_res.res2 = 0;
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
@@ -2002,24 +2002,6 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
 }
 #endif
 
-/* lookup_kiocb
- *	Finds a given iocb for cancellation.
- */
-static struct aio_kiocb *
-lookup_kiocb(struct kioctx *ctx, struct iocb __user *iocb)
-{
-	struct aio_kiocb *kiocb;
-
-	assert_spin_locked(&ctx->ctx_lock);
-
-	/* TODO: use a hash or array, this sucks. */
-	list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
-		if (kiocb->ki_user_iocb == iocb)
-			return kiocb;
-	}
-	return NULL;
-}
-
 /* sys_io_cancel:
  *	Attempts to cancel an iocb previously passed to io_submit.  If
  *	the operation is successfully cancelled, the resulting event is
@@ -2037,6 +2019,7 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 	struct aio_kiocb *kiocb;
 	int ret = -EINVAL;
 	u32 key;
+	u64 obj = (u64)(unsigned long)iocb;
 
 	if (unlikely(get_user(key, &iocb->aio_key)))
 		return -EFAULT;
@@ -2048,10 +2031,13 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
 		return -EINVAL;
 
 	spin_lock_irq(&ctx->ctx_lock);
-	kiocb = lookup_kiocb(ctx, iocb);
-	if (kiocb) {
-		ret = kiocb->ki_cancel(&kiocb->rw);
-		list_del_init(&kiocb->ki_list);
+	/* TODO: use a hash or array, this sucks. */
+	list_for_each_entry(kiocb, &ctx->active_reqs, ki_list) {
+		if (kiocb->ki_res.obj == obj) {
+			ret = kiocb->ki_cancel(&kiocb->rw);
+			list_del_init(&kiocb->ki_list);
+			break;
+		}
 	}
 	spin_unlock_irq(&ctx->ctx_lock);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 3/8] aio: store event at final iocb_put()
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
  2019-03-10  7:08                   ` [PATCH 2/8] keep io_event in aio_kiocb Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:44                     ` Christoph Hellwig
  2019-03-10  7:08                   ` [PATCH 4/8] Fix aio_poll() races Al Viro
                                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

Instead of having aio_complete() set ->ki_res.{res,res2}, do that
explicitly in its callers, drop the reference (as aio_complete()
used to do) and delay the rest until the final iocb_put().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 45 ++++++++++++++++++++-------------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 2249a7a1d6b3..b9c4c1894020 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1077,24 +1077,10 @@ static inline void iocb_destroy(struct aio_kiocb *iocb)
 	kmem_cache_free(kiocb_cachep, iocb);
 }
 
-static inline void iocb_put(struct aio_kiocb *iocb)
-{
-	if (refcount_dec_and_test(&iocb->ki_refcnt))
-		iocb_destroy(iocb);
-}
-
-static void aio_fill_event(struct io_event *ev, struct aio_kiocb *iocb,
-			   long res, long res2)
-{
-	iocb->ki_res.res = res;
-	iocb->ki_res.res2 = res2;
-	*ev = iocb->ki_res;
-}
-
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  */
-static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
+static void aio_complete(struct aio_kiocb *iocb)
 {
 	struct kioctx	*ctx = iocb->ki_ctx;
 	struct aio_ring	*ring;
@@ -1118,14 +1104,14 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 	ev_page = kmap_atomic(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 	event = ev_page + pos % AIO_EVENTS_PER_PAGE;
 
-	aio_fill_event(event, iocb, res, res2);
+	*event = iocb->ki_res;
 
 	kunmap_atomic(ev_page);
 	flush_dcache_page(ctx->ring_pages[pos / AIO_EVENTS_PER_PAGE]);
 
-	pr_debug("%p[%u]: %p: %p %Lx %lx %lx\n",
+	pr_debug("%p[%u]: %p: %p %Lx %Lx %Lx\n",
 		 ctx, tail, iocb, (void __user *)iocb->ki_res.obj, iocb->ki_res.data,
-		 res, res2);
+		 iocb->ki_res.res, iocb->ki_res.res2);
 
 	/* after flagging the request as done, we
 	 * must never even look at it again
@@ -1167,7 +1153,14 @@ static void aio_complete(struct aio_kiocb *iocb, long res, long res2)
 
 	if (waitqueue_active(&ctx->wait))
 		wake_up(&ctx->wait);
-	iocb_put(iocb);
+}
+
+static inline void iocb_put(struct aio_kiocb *iocb)
+{
+	if (refcount_dec_and_test(&iocb->ki_refcnt)) {
+		aio_complete(iocb);
+		iocb_destroy(iocb);
+	}
 }
 
 /* aio_read_events_ring
@@ -1441,7 +1434,9 @@ static void aio_complete_rw(struct kiocb *kiocb, long res, long res2)
 		file_end_write(kiocb->ki_filp);
 	}
 
-	aio_complete(iocb, res, res2);
+	iocb->ki_res.res = res;
+	iocb->ki_res.res2 = res2;
+	iocb_put(iocb);
 }
 
 static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb)
@@ -1589,11 +1584,10 @@ static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb,
 
 static void aio_fsync_work(struct work_struct *work)
 {
-	struct fsync_iocb *req = container_of(work, struct fsync_iocb, work);
-	int ret;
+	struct aio_kiocb *iocb = container_of(work, struct aio_kiocb, fsync.work);
 
-	ret = vfs_fsync(req->file, req->datasync);
-	aio_complete(container_of(req, struct aio_kiocb, fsync), ret, 0);
+	iocb->ki_res.res = vfs_fsync(iocb->fsync.file, iocb->fsync.datasync);
+	iocb_put(iocb);
 }
 
 static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
@@ -1614,7 +1608,8 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
 
 static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
 {
-	aio_complete(iocb, mangle_poll(mask), 0);
+	iocb->ki_res.res = mangle_poll(mask);
+	iocb_put(iocb);
 }
 
 static void aio_poll_complete_work(struct work_struct *work)
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 4/8] Fix aio_poll() races
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
  2019-03-10  7:08                   ` [PATCH 2/8] keep io_event in aio_kiocb Al Viro
  2019-03-10  7:08                   ` [PATCH 3/8] aio: store event at final iocb_put() Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:58                     ` Christoph Hellwig
  2019-03-10  7:08                   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
                                     ` (4 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

aio_poll() has to cope with several unpleasant problems:
	* requests that might stay around indefinitely need to
be made visible for io_cancel(2); that must not be done to
a request already completed, though.
	* in cases when ->poll() has placed us on a waitqueue,
wakeup might have happened (and request completed) before ->poll()
returns.
	* worse, in some early wakeup cases request might end
up re-added into the queue later - we can't treat "woken up and
currently not in the queue" as "it's not going to stick around
indefinitely"
	* ... moreover, ->poll() might have decided not to
put it on any queues to start with, and that needs to be distinguished
from the previous case
	* ->poll() might have tried to put us on more than one queue.
Only the first will succeed for aio poll, so we might end up missing
wakeups.  OTOH, we might very well notice that only after the
wakeup hits and request gets completed (all before ->poll() gets
around to the second poll_wait()).  In that case it's too late to
decide that we have an error.

req->woken was an attempt to deal with that.  Unfortunately, it was
broken.  What we need to keep track of is not that wakeup has happened -
the thing might come back after that.  It's that async reference is
already gone and won't come back, so we can't (and needn't) put the
request on the list of cancellables.

The easiest case is "request hadn't been put on any waitqueues"; we
can tell by seeing NULL apt.head, and in that case there won't be
anything async.  We should either complete the request ourselves
(if vfs_poll() reports anything of interest) or return an error.

In all other cases we get exclusion with wakeups by grabbing the
queue lock.

If request is currently on queue and we have something interesting
from vfs_poll(), we can steal it and complete the request ourselves.

If it's on queue and vfs_poll() has not reported anything interesting,
we either put it on the cancellable list, or, if we know that it
hadn't been put on all queues ->poll() wanted it on, we steal it and
return an error.

If it's _not_ on queue, it's either been already dealt with (in which
case we do nothing), or there's aio_poll_complete_work() about to be
executed.  In that case we either put it on the cancellable list,
or, if we know it hadn't been put on all queues ->poll() wanted it on,
simulate what cancel would've done.

It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
suck, and unfortunately aio is not an exception...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 71 +++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 32 insertions(+), 39 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index b9c4c1894020..f47a29f7f201 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -181,7 +181,7 @@ struct poll_iocb {
 	struct file		*file;
 	struct wait_queue_head	*head;
 	__poll_t		events;
-	bool			woken;
+	bool			done;
 	bool			cancelled;
 	struct wait_queue_entry	wait;
 	struct work_struct	work;
@@ -1606,12 +1606,6 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
 	return 0;
 }
 
-static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
-{
-	iocb->ki_res.res = mangle_poll(mask);
-	iocb_put(iocb);
-}
-
 static void aio_poll_complete_work(struct work_struct *work)
 {
 	struct poll_iocb *req = container_of(work, struct poll_iocb, work);
@@ -1637,9 +1631,11 @@ static void aio_poll_complete_work(struct work_struct *work)
 		return;
 	}
 	list_del_init(&iocb->ki_list);
+	iocb->ki_res.res = mangle_poll(mask);
+	req->done = true;
 	spin_unlock_irq(&ctx->ctx_lock);
 
-	aio_poll_complete(iocb, mask);
+	iocb_put(iocb);
 }
 
 /* assumes we are called with irqs disabled */
@@ -1671,7 +1667,7 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	if (mask && !(mask & req->events))
 		return 0;
 
-	req->woken = true;
+	list_del_init(&req->wait.entry);
 
 	if (mask) {
 		/*
@@ -1682,15 +1678,14 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 		 */
 		if (spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
 			list_del(&iocb->ki_list);
+			iocb->ki_res.res = mangle_poll(mask);
+			req->done = true;
 			spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
-
-			list_del_init(&req->wait.entry);
-			aio_poll_complete(iocb, mask);
+			iocb_put(iocb);
 			return 1;
 		}
 	}
 
-	list_del_init(&req->wait.entry);
 	schedule_work(&req->work);
 	return 1;
 }
@@ -1723,6 +1718,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
 	struct aio_poll_table apt;
+	bool cancel = false;
 	__poll_t mask;
 
 	/* reject any unknown events outside the normal event mask. */
@@ -1736,7 +1732,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
 
 	req->head = NULL;
-	req->woken = false;
+	req->done = false;
 	req->cancelled = false;
 
 	apt.pt._qproc = aio_poll_queue_proc;
@@ -1749,36 +1745,33 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 
 	mask = vfs_poll(req->file, &apt.pt) & req->events;
-	if (unlikely(!req->head)) {
-		/* we did not manage to set up a waitqueue, done */
-		goto out;
-	}
-
 	spin_lock_irq(&ctx->ctx_lock);
-	spin_lock(&req->head->lock);
-	if (req->woken) {
-		/* wake_up context handles the rest */
-		mask = 0;
+	if (likely(req->head)) {
+		spin_lock(&req->head->lock);
+		if (unlikely(list_empty(&req->wait.entry))) {
+			if (apt.error)
+				cancel = true;
+			apt.error = 0;
+			mask = 0;
+		}
+		if (mask || apt.error) {
+			list_del_init(&req->wait.entry);
+		} else if (cancel) {
+			WRITE_ONCE(req->cancelled, true);
+		} else if (!req->done) { /* actually waiting for an event */
+			list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
+			aiocb->ki_cancel = aio_poll_cancel;
+		}
+		spin_unlock(&req->head->lock);
+	}
+	if (mask) { /* no async, we'd stolen it */
+		aiocb->ki_res.res = mangle_poll(mask);
 		apt.error = 0;
-	} else if (mask || apt.error) {
-		/* if we get an error or a mask we are done */
-		WARN_ON_ONCE(list_empty(&req->wait.entry));
-		list_del_init(&req->wait.entry);
-	} else {
-		/* actually waiting for an event */
-		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
-		aiocb->ki_cancel = aio_poll_cancel;
 	}
-	spin_unlock(&req->head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
-
-out:
-	if (unlikely(apt.error))
-		return apt.error;
-
 	if (mask)
-		aio_poll_complete(aiocb, mask);
-	return 0;
+		iocb_put(aiocb);
+	return apt.error;
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 5/8] make aio_read()/aio_write() return int
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
                                     ` (2 preceding siblings ...)
  2019-03-10  7:08                   ` [PATCH 4/8] Fix aio_poll() races Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:44                     ` Christoph Hellwig
  2019-03-10  7:08                   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_destroy() Al Viro
                                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

that ssize_t is a rudiment of earlier calling conventions; it's been
used only to pass 0 and -E... since last autumn.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f47a29f7f201..f9e8f1edfe36 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1513,13 +1513,13 @@ static inline void aio_rw_done(struct kiocb *req, ssize_t ret)
 	}
 }
 
-static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb,
+static int aio_read(struct kiocb *req, const struct iocb *iocb,
 			bool vectored, bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
 	struct file *file;
-	ssize_t ret;
+	int ret;
 
 	ret = aio_prep_rw(req, iocb);
 	if (ret)
@@ -1541,13 +1541,13 @@ static ssize_t aio_read(struct kiocb *req, const struct iocb *iocb,
 	return ret;
 }
 
-static ssize_t aio_write(struct kiocb *req, const struct iocb *iocb,
+static int aio_write(struct kiocb *req, const struct iocb *iocb,
 			 bool vectored, bool compat)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
 	struct file *file;
-	ssize_t ret;
+	int ret;
 
 	ret = aio_prep_rw(req, iocb);
 	if (ret)
@@ -1713,7 +1713,7 @@ aio_poll_queue_proc(struct file *file, struct wait_queue_head *head,
 	add_wait_queue(head, &pt->iocb->poll.wait);
 }
 
-static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
+static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 {
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
@@ -1778,7 +1778,7 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 			   struct iocb __user *user_iocb, bool compat)
 {
 	struct aio_kiocb *req;
-	ssize_t ret;
+	int ret;
 
 	/* enforce forwards compatibility on users */
 	if (unlikely(iocb->aio_reserved2)) {
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 6/8] move dropping ->ki_eventfd into iocb_destroy()
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
                                     ` (3 preceding siblings ...)
  2019-03-10  7:08                   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:46                     ` Christoph Hellwig
  2019-03-10  7:08                   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
                                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index f9e8f1edfe36..595c19965a8b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1071,6 +1071,8 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
 
 static inline void iocb_destroy(struct aio_kiocb *iocb)
 {
+	if (iocb->ki_eventfd)
+		eventfd_ctx_put(iocb->ki_eventfd);
 	if (iocb->ki_filp)
 		fput(iocb->ki_filp);
 	percpu_ref_put(&iocb->ki_ctx->reqs);
@@ -1138,10 +1140,8 @@ static void aio_complete(struct aio_kiocb *iocb)
 	 * eventfd. The eventfd_signal() function is safe to be called
 	 * from IRQ context.
 	 */
-	if (iocb->ki_eventfd) {
+	if (iocb->ki_eventfd)
 		eventfd_signal(iocb->ki_eventfd, 1);
-		eventfd_ctx_put(iocb->ki_eventfd);
-	}
 
 	/*
 	 * We have to order our ring_info tail store above and test
@@ -1810,18 +1810,19 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		goto out_put_req;
 
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
+		struct eventfd_ctx *eventfd;
 		/*
 		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
 		 * instance of the file* now. The file descriptor must be
 		 * an eventfd() fd, and will be signaled for each completed
 		 * event using the eventfd_signal() function.
 		 */
-		req->ki_eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
-		if (IS_ERR(req->ki_eventfd)) {
-			ret = PTR_ERR(req->ki_eventfd);
-			req->ki_eventfd = NULL;
+		eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
+		if (IS_ERR(eventfd)) {
+			ret = PTR_ERR(eventfd);
 			goto out_put_req;
 		}
+		req->ki_eventfd = eventfd;
 	}
 
 	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
@@ -1875,8 +1876,6 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		return 0;
 
 out_put_req:
-	if (req->ki_eventfd)
-		eventfd_ctx_put(req->ki_eventfd);
 	iocb_destroy(req);
 out_put_reqs_available:
 	put_reqs_available(ctx, 1);
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
                                     ` (4 preceding siblings ...)
  2019-03-10  7:08                   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_destroy() Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:46                     ` Christoph Hellwig
  2019-03-10  7:08                   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
  2019-03-11 19:41                   ` [PATCH 1/8] pin iocb through aio Christoph Hellwig
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

simplifies the caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 595c19965a8b..66ca52c57cca 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1033,6 +1033,11 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 	if (unlikely(!req))
 		return NULL;
 
+	if (unlikely(!get_reqs_available(ctx))) {
+		kfree(req);
+		return NULL;
+	}
+
 	percpu_ref_get(&ctx->reqs);
 	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
@@ -1796,13 +1801,9 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		return -EINVAL;
 	}
 
-	if (!get_reqs_available(ctx))
-		return -EAGAIN;
-
-	ret = -EAGAIN;
 	req = aio_get_req(ctx);
 	if (unlikely(!req))
-		goto out_put_reqs_available;
+		return -EAGAIN;
 
 	req->ki_filp = fget(iocb->aio_fildes);
 	ret = -EBADF;
@@ -1877,7 +1878,6 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 out_put_req:
 	iocb_destroy(req);
-out_put_reqs_available:
 	put_reqs_available(ctx, 1);
 	return ret;
 }
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one()
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
                                     ` (5 preceding siblings ...)
  2019-03-10  7:08                   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
@ 2019-03-10  7:08                   ` Al Viro
  2019-03-11 19:48                     ` Christoph Hellwig
  2019-03-11 19:41                   ` [PATCH 1/8] pin iocb through aio Christoph Hellwig
  7 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-10  7:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric Dumazet, David Miller, Jason Baron, kgraul, ktkhai,
	kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

From: Al Viro <viro@zeniv.linux.org.uk>

makes for somewhat cleaner control flow in __io_submit_one()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 119 ++++++++++++++++++++++++++++-----------------------------------
 1 file changed, 53 insertions(+), 66 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 66ca52c57cca..2d1db52b1268 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1780,35 +1780,12 @@ static int aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-			   struct iocb __user *user_iocb, bool compat)
+			   struct iocb __user *user_iocb, struct aio_kiocb *req,
+			   bool compat)
 {
-	struct aio_kiocb *req;
-	int ret;
-
-	/* enforce forwards compatibility on users */
-	if (unlikely(iocb->aio_reserved2)) {
-		pr_debug("EINVAL: reserve field set\n");
-		return -EINVAL;
-	}
-
-	/* prevent overflows */
-	if (unlikely(
-	    (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
-	    (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
-	    ((ssize_t)iocb->aio_nbytes < 0)
-	   )) {
-		pr_debug("EINVAL: overflow check\n");
-		return -EINVAL;
-	}
-
-	req = aio_get_req(ctx);
-	if (unlikely(!req))
-		return -EAGAIN;
-
 	req->ki_filp = fget(iocb->aio_fildes);
-	ret = -EBADF;
 	if (unlikely(!req->ki_filp))
-		goto out_put_req;
+		return -EBADF;
 
 	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
 		struct eventfd_ctx *eventfd;
@@ -1819,17 +1796,15 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 		 * event using the eventfd_signal() function.
 		 */
 		eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);
-		if (IS_ERR(eventfd)) {
-			ret = PTR_ERR(eventfd);
-			goto out_put_req;
-		}
+		if (IS_ERR(eventfd))
+			return PTR_ERR(req->ki_eventfd);
+
 		req->ki_eventfd = eventfd;
 	}
 
-	ret = put_user(KIOCB_KEY, &user_iocb->aio_key);
-	if (unlikely(ret)) {
+	if (unlikely(put_user(KIOCB_KEY, &user_iocb->aio_key))) {
 		pr_debug("EFAULT: aio_key\n");
-		goto out_put_req;
+		return -EFAULT;
 	}
 
 	req->ki_res.obj = (u64)(unsigned long)user_iocb;
@@ -1839,58 +1814,70 @@ static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
 
 	switch (iocb->aio_lio_opcode) {
 	case IOCB_CMD_PREAD:
-		ret = aio_read(&req->rw, iocb, false, compat);
-		break;
+		return aio_read(&req->rw, iocb, false, compat);
 	case IOCB_CMD_PWRITE:
-		ret = aio_write(&req->rw, iocb, false, compat);
-		break;
+		return aio_write(&req->rw, iocb, false, compat);
 	case IOCB_CMD_PREADV:
-		ret = aio_read(&req->rw, iocb, true, compat);
-		break;
+		return aio_read(&req->rw, iocb, true, compat);
 	case IOCB_CMD_PWRITEV:
-		ret = aio_write(&req->rw, iocb, true, compat);
-		break;
+		return aio_write(&req->rw, iocb, true, compat);
 	case IOCB_CMD_FSYNC:
-		ret = aio_fsync(&req->fsync, iocb, false);
-		break;
+		return aio_fsync(&req->fsync, iocb, false);
 	case IOCB_CMD_FDSYNC:
-		ret = aio_fsync(&req->fsync, iocb, true);
-		break;
+		return aio_fsync(&req->fsync, iocb, true);
 	case IOCB_CMD_POLL:
-		ret = aio_poll(req, iocb);
-		break;
+		return aio_poll(req, iocb);
 	default:
 		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
-		ret = -EINVAL;
-		break;
+		return -EINVAL;
 	}
-
-	/* Done with the synchronous reference */
-	iocb_put(req);
-
-	/*
-	 * If ret is 0, we'd either done aio_complete() ourselves or have
-	 * arranged for that to be done asynchronously.  Anything non-zero
-	 * means that we need to destroy req ourselves.
-	 */
-	if (!ret)
-		return 0;
-
-out_put_req:
-	iocb_destroy(req);
-	put_reqs_available(ctx, 1);
-	return ret;
 }
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 bool compat)
 {
+	struct aio_kiocb *req;
 	struct iocb iocb;
+	int err;
 
 	if (unlikely(copy_from_user(&iocb, user_iocb, sizeof(iocb))))
 		return -EFAULT;
 
-	return __io_submit_one(ctx, &iocb, user_iocb, compat);
+	/* enforce forwards compatibility on users */
+	if (unlikely(iocb.aio_reserved2)) {
+		pr_debug("EINVAL: reserve field set\n");
+		return -EINVAL;
+	}
+
+	/* prevent overflows */
+	if (unlikely(
+	    (iocb.aio_buf != (unsigned long)iocb.aio_buf) ||
+	    (iocb.aio_nbytes != (size_t)iocb.aio_nbytes) ||
+	    ((ssize_t)iocb.aio_nbytes < 0)
+	   )) {
+		pr_debug("EINVAL: overflow check\n");
+		return -EINVAL;
+	}
+
+	req = aio_get_req(ctx);
+	if (unlikely(!req))
+		return -EAGAIN;
+
+	err = __io_submit_one(ctx, &iocb, user_iocb, req, compat);
+
+	/* Done with the synchronous reference */
+	iocb_put(req);
+
+	/*
+	 * If err is 0, we'd either done aio_complete() ourselves or have
+	 * arranged for that to be done asynchronously.  Anything non-zero
+	 * means that we need to destroy req ourselves.
+	 */
+	if (unlikely(err)) {
+		iocb_destroy(req);
+		put_reqs_available(ctx, 1);
+	}
+	return err;
 }
 
 /* sys_io_submit:
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] aio: make sure file is pinned
  2019-03-10  7:06               ` Al Viro
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
@ 2019-03-11 19:41                 ` Christoph Hellwig
  1 sibling, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, Cong Wang, Christoph Hellwig, zhengbin, bcrl,
	linux-fsdevel, linux-aio, houtao1, yi.zhang

On Sun, Mar 10, 2019 at 07:06:18AM +0000, Al Viro wrote:
> OK, refactored, cleaned up and force-pushed.  Current state:

This survives the libaio test suite at least.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/8] pin iocb through aio.
  2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
                                     ` (6 preceding siblings ...)
  2019-03-10  7:08                   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
@ 2019-03-11 19:41                   ` Christoph Hellwig
  7 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/8] keep io_event in aio_kiocb
  2019-03-10  7:08                   ` [PATCH 2/8] keep io_event in aio_kiocb Al Viro
@ 2019-03-11 19:43                     ` Christoph Hellwig
  2019-03-11 21:17                       ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Sun, Mar 10, 2019 at 07:08:16AM +0000, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

This could use a little changelog at least that explains the why.

It also removes lookup_kiocb and folds that into the only caller,
which is at least worth mentioning (and probably should be a separate
patch).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/8] aio: store event at final iocb_put()
  2019-03-10  7:08                   ` [PATCH 3/8] aio: store event at final iocb_put() Al Viro
@ 2019-03-11 19:44                     ` Christoph Hellwig
  2019-03-11 21:13                       ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Sun, Mar 10, 2019 at 07:08:17AM +0000, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Instead of having aio_complete() set ->ki_res.{res,res2}, do that
> explicitly in its callers, drop the reference (as aio_complete()
> used to do) and delay the rest until the final iocb_put().
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
>  fs/aio.c | 45 ++++++++++++++++++++-------------------------
>  1 file changed, 20 insertions(+), 25 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index 2249a7a1d6b3..b9c4c1894020 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1077,24 +1077,10 @@ static inline void iocb_destroy(struct aio_kiocb *iocb)
>  	kmem_cache_free(kiocb_cachep, iocb);
>  }
>  
> -static inline void iocb_put(struct aio_kiocb *iocb)
> -{
> -	if (refcount_dec_and_test(&iocb->ki_refcnt))
> -		iocb_destroy(iocb);
> -}

Maybe iocb_put should just have been added in the place you move
it to in patch 1?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/8] make aio_read()/aio_write() return int
  2019-03-10  7:08                   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
@ 2019-03-11 19:44                     ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/8] move dropping ->ki_eventfd into iocb_destroy()
  2019-03-10  7:08                   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_destroy() Al Viro
@ 2019-03-11 19:46                     ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Sun, Mar 10, 2019 at 07:08:20AM +0000, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks sensible, but a changelog would be nice.

>  	if (iocb->aio_flags & IOCB_FLAG_RESFD) {
> +		struct eventfd_ctx *eventfd;
>  		/*
>  		 * If the IOCB_FLAG_RESFD flag of aio_flags is set, get an
>  		 * instance of the file* now. The file descriptor must be
>  		 * an eventfd() fd, and will be signaled for each completed
>  		 * event using the eventfd_signal() function.
>  		 */
> +		eventfd = eventfd_ctx_fdget((int) iocb->aio_resfd);

I don't think there is any point in the cast in either the old or new
code.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself
  2019-03-10  7:08                   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
@ 2019-03-11 19:46                     ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:46 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Sun, Mar 10, 2019 at 07:08:21AM +0000, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> simplifies the caller
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one()
  2019-03-10  7:08                   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
@ 2019-03-11 19:48                     ` Christoph Hellwig
  2019-03-11 21:12                       ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:48 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

On Sun, Mar 10, 2019 at 07:08:22AM +0000, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> makes for somewhat cleaner control flow in __io_submit_one()
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

I wonder if we should even bother keeping __io_submit_one.  Splitting
that out was prep work from Jens for something that eventually turned
into io_uring.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/8] Fix aio_poll() races
  2019-03-10  7:08                   ` [PATCH 4/8] Fix aio_poll() races Al Viro
@ 2019-03-11 19:58                     ` Christoph Hellwig
  2019-03-11 21:06                       ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-11 19:58 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, Christoph Hellwig, zhengbin,
	bcrl, linux-fsdevel, linux-aio, houtao1, yi.zhang

Where do we put the second iocb reference in case we return from
vfs_poll without ever being woken?

Also it seems like the complete code would still benefit from a little
helper, something like:

diff --git a/fs/aio.c b/fs/aio.c
index b2a5c7b3a1fe..8415e5e484ce 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1611,6 +1611,13 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
 	return 0;
 }
 
+static void aio_poll_finish(struct aio_kiocb *iocb, __poll_t mask)
+{
+	list_del_init(&iocb->ki_list);
+	iocb->ki_res.res = mangle_poll(mask);
+	iocb->poll.done = true;
+}
+
 static void aio_poll_complete_work(struct work_struct *work)
 {
 	struct poll_iocb *req = container_of(work, struct poll_iocb, work);
@@ -1635,9 +1642,7 @@ static void aio_poll_complete_work(struct work_struct *work)
 		spin_unlock_irq(&ctx->ctx_lock);
 		return;
 	}
-	list_del_init(&iocb->ki_list);
-	iocb->ki_res.res = mangle_poll(mask);
-	req->done = true;
+	aio_poll_finish(iocb, mask);
 	spin_unlock_irq(&ctx->ctx_lock);
 
 	iocb_put(iocb);
@@ -1674,24 +1679,20 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 
 	list_del_init(&req->wait.entry);
 
-	if (mask) {
+	if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
 		/*
 		 * Try to complete the iocb inline if we can. Use
 		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
 		 * call this function with IRQs disabled and because IRQs
 		 * have to be disabled before ctx_lock is obtained.
 		 */
-		if (spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
-			list_del(&iocb->ki_list);
-			iocb->ki_res.res = mangle_poll(mask);
-			req->done = true;
-			spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
-			iocb_put(iocb);
-			return 1;
-		}
+		aio_poll_finish(iocb, mask);
+		spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
+		iocb_put(iocb);
+	} else {
+		schedule_work(&req->work);
 	}
 
-	schedule_work(&req->work);
 	return 1;
 }
 

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/8] Fix aio_poll() races
  2019-03-11 19:58                     ` Christoph Hellwig
@ 2019-03-11 21:06                       ` Al Viro
  2019-03-12 19:18                         ` Christoph Hellwig
  0 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-11 21:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, zhengbin, bcrl, linux-fsdevel,
	linux-aio, houtao1, yi.zhang

On Mon, Mar 11, 2019 at 08:58:31PM +0100, Christoph Hellwig wrote:
> Where do we put the second iocb reference in case we return from
> vfs_poll without ever being woken?

Depends.  If mask is non-zero (i.e. vfs_poll() has returned something
we care about) and it has never been woken, we steal it and drop the
reference ourselves.  If it is zero and we see that ->poll() has tried
to put it on two queues, we steal it (again, assuming it's not on
waitqueue and _can_ be stolen) and return -EINVAL.  In that case
__io_submit_one() (or, by the end of the series, io_submit_one())
will call iocb_destroy().  And in the normal waiting case (nothing
interesting reported and no errors) it will end up on the list of
cancellables.  Then it either will get completed by later wakeup, which
will drop the reference, or it will get eventually cancelled, which will
hit the same aio_poll_complete_work() and drop the reference...

> Also it seems like the complete code would still benefit from a little
> helper, something like:

Umm...  Not sure I like the name (something like aio_poll_done() seems
to be better), but other than that - no problem.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one()
  2019-03-11 19:48                     ` Christoph Hellwig
@ 2019-03-11 21:12                       ` Al Viro
  0 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-11 21:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, zhengbin, bcrl, linux-fsdevel,
	linux-aio, houtao1, yi.zhang

On Mon, Mar 11, 2019 at 08:48:30PM +0100, Christoph Hellwig wrote:
> On Sun, Mar 10, 2019 at 07:08:22AM +0000, Al Viro wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > makes for somewhat cleaner control flow in __io_submit_one()
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> I wonder if we should even bother keeping __io_submit_one.  Splitting
> that out was prep work from Jens for something that eventually turned
> into io_uring.

*shrug*

I think it's a bit easier to keep track of control flow - failure exits
are simpler that way.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/8] aio: store event at final iocb_put()
  2019-03-11 19:44                     ` Christoph Hellwig
@ 2019-03-11 21:13                       ` Al Viro
  2019-03-11 22:52                         ` Al Viro
  0 siblings, 1 reply; 42+ messages in thread
From: Al Viro @ 2019-03-11 21:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, zhengbin, bcrl, linux-fsdevel,
	linux-aio, houtao1, yi.zhang

On Mon, Mar 11, 2019 at 08:44:31PM +0100, Christoph Hellwig wrote:
> On Sun, Mar 10, 2019 at 07:08:17AM +0000, Al Viro wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > Instead of having aio_complete() set ->ki_res.{res,res2}, do that
> > explicitly in its callers, drop the reference (as aio_complete()
> > used to do) and delay the rest until the final iocb_put().
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> >  fs/aio.c | 45 ++++++++++++++++++++-------------------------
> >  1 file changed, 20 insertions(+), 25 deletions(-)
> > 
> > diff --git a/fs/aio.c b/fs/aio.c
> > index 2249a7a1d6b3..b9c4c1894020 100644
> > --- a/fs/aio.c
> > +++ b/fs/aio.c
> > @@ -1077,24 +1077,10 @@ static inline void iocb_destroy(struct aio_kiocb *iocb)
> >  	kmem_cache_free(kiocb_cachep, iocb);
> >  }
> >  
> > -static inline void iocb_put(struct aio_kiocb *iocb)
> > -{
> > -	if (refcount_dec_and_test(&iocb->ki_refcnt))
> > -		iocb_destroy(iocb);
> > -}
> 
> Maybe iocb_put should just have been added in the place you move
> it to in patch 1?

Might as well...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/8] keep io_event in aio_kiocb
  2019-03-11 19:43                     ` Christoph Hellwig
@ 2019-03-11 21:17                       ` Al Viro
  0 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-11 21:17 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, zhengbin, bcrl, linux-fsdevel,
	linux-aio, houtao1, yi.zhang

On Mon, Mar 11, 2019 at 08:43:06PM +0100, Christoph Hellwig wrote:
> On Sun, Mar 10, 2019 at 07:08:16AM +0000, Al Viro wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> > 
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> 
> This could use a little changelog at least that explains the why.
> 
> It also removes lookup_kiocb and folds that into the only caller,
> which is at least worth mentioning (and probably should be a separate
> patch).

More confusingly, it contains a pointless rudiment in aio_poll_wake();
that chunk doesn't solve the problems with ->woken, and the whole
thing gets killed off shortly...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/8] aio: store event at final iocb_put()
  2019-03-11 21:13                       ` Al Viro
@ 2019-03-11 22:52                         ` Al Viro
  0 siblings, 0 replies; 42+ messages in thread
From: Al Viro @ 2019-03-11 22:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Linus Torvalds, Eric Dumazet, David Miller, Jason Baron, kgraul,
	ktkhai, kyeongdon.kim, Linux List Kernel Mailing, Netdev, pabeni,
	syzkaller-bugs, xiyou.wangcong, zhengbin, bcrl, linux-fsdevel,
	linux-aio, houtao1, yi.zhang

On Mon, Mar 11, 2019 at 09:13:28PM +0000, Al Viro wrote:
> On Mon, Mar 11, 2019 at 08:44:31PM +0100, Christoph Hellwig wrote:
> > On Sun, Mar 10, 2019 at 07:08:17AM +0000, Al Viro wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > 
> > > Instead of having aio_complete() set ->ki_res.{res,res2}, do that
> > > explicitly in its callers, drop the reference (as aio_complete()
> > > used to do) and delay the rest until the final iocb_put().
> > > 
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> > >  fs/aio.c | 45 ++++++++++++++++++++-------------------------
> > >  1 file changed, 20 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/fs/aio.c b/fs/aio.c
> > > index 2249a7a1d6b3..b9c4c1894020 100644
> > > --- a/fs/aio.c
> > > +++ b/fs/aio.c
> > > @@ -1077,24 +1077,10 @@ static inline void iocb_destroy(struct aio_kiocb *iocb)
> > >  	kmem_cache_free(kiocb_cachep, iocb);
> > >  }
> > >  
> > > -static inline void iocb_put(struct aio_kiocb *iocb)
> > > -{
> > > -	if (refcount_dec_and_test(&iocb->ki_refcnt))
> > > -		iocb_destroy(iocb);
> > > -}
> > 
> > Maybe iocb_put should just have been added in the place you move
> > it to in patch 1?
> 
> Might as well...

Actually, that wouldn't be any better - it would need at least a declaration
before aio_complete(), since in the original patch aio_complete() calls it.
So that wouldn't be less noisy...

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/8] Fix aio_poll() races
  2019-03-11 21:06                       ` Al Viro
@ 2019-03-12 19:18                         ` Christoph Hellwig
  0 siblings, 0 replies; 42+ messages in thread
From: Christoph Hellwig @ 2019-03-12 19:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Christoph Hellwig, Linus Torvalds, Eric Dumazet, David Miller,
	Jason Baron, kgraul, ktkhai, kyeongdon.kim,
	Linux List Kernel Mailing, Netdev, pabeni, syzkaller-bugs,
	xiyou.wangcong, zhengbin, bcrl, linux-fsdevel, linux-aio,
	houtao1, yi.zhang

On Mon, Mar 11, 2019 at 09:06:18PM +0000, Al Viro wrote:
> On Mon, Mar 11, 2019 at 08:58:31PM +0100, Christoph Hellwig wrote:
> > Where do we put the second iocb reference in case we return from
> > vfs_poll without ever being woken?
> 
> Depends.  If mask is non-zero (i.e. vfs_poll() has returned something
> we care about) and it has never been woken, we steal it and drop the
> reference ourselves.  If it is zero and we see that ->poll() has tried
> to put it on two queues, we steal it (again, assuming it's not on
> waitqueue and _can_ be stolen) and return -EINVAL.  In that case
> __io_submit_one() (or, by the end of the series, io_submit_one())
> will call iocb_destroy().  And in the normal waiting case (nothing
> interesting reported and no errors) it will end up on the list of
> cancellables.  Then it either will get completed by later wakeup, which
> will drop the reference, or it will get eventually cancelled, which will
> hit the same aio_poll_complete_work() and drop the reference...

Ok, seems like the logic is sane.  I was missing how the actual
mask logic worked in aio_poll().

> > Also it seems like the complete code would still benefit from a little
> > helper, something like:
> 
> Umm...  Not sure I like the name (something like aio_poll_done() seems
> to be better), but other than that - no problem.

I don't care about the name.  Feel free to change it to whatever suits
you.

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2019-03-12 19:18 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAHk-=wghRmdYKRAwakeHmjcpbLt9fJAHyU2x8s_NZONhz6RTOw@mail.gmail.com>
2019-03-07  0:03 ` [PATCH 1/8] aio: make sure file is pinned Al Viro
2019-03-07  0:03   ` [PATCH 2/8] aio_poll_wake(): don't set ->woken if we ignore the wakeup Al Viro
2019-03-07  2:18     ` Al Viro
2019-03-08 11:16       ` zhengbin (A)
2019-03-07  0:03   ` [PATCH 3/8] aio_poll(): sanitize the logics after vfs_poll(), get rid of leak on error Al Viro
2019-03-07  2:11     ` zhengbin (A)
2019-03-07  0:03   ` [PATCH 4/8] aio_poll(): get rid of weird refcounting Al Viro
2019-03-07  0:03   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
2019-03-07  0:03   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_put() Al Viro
2019-03-07  0:03   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
2019-03-07  0:03   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
2019-03-07  0:23   ` [PATCH 1/8] aio: make sure file is pinned Linus Torvalds
2019-03-07  0:41     ` Al Viro
2019-03-07  0:48       ` Al Viro
2019-03-07  1:20         ` Al Viro
2019-03-07  1:30           ` Linus Torvalds
2019-03-08  3:36             ` Al Viro
2019-03-08 15:50               ` Christoph Hellwig
2019-03-10  7:06               ` Al Viro
2019-03-10  7:08                 ` [PATCH 1/8] pin iocb through aio Al Viro
2019-03-10  7:08                   ` [PATCH 2/8] keep io_event in aio_kiocb Al Viro
2019-03-11 19:43                     ` Christoph Hellwig
2019-03-11 21:17                       ` Al Viro
2019-03-10  7:08                   ` [PATCH 3/8] aio: store event at final iocb_put() Al Viro
2019-03-11 19:44                     ` Christoph Hellwig
2019-03-11 21:13                       ` Al Viro
2019-03-11 22:52                         ` Al Viro
2019-03-10  7:08                   ` [PATCH 4/8] Fix aio_poll() races Al Viro
2019-03-11 19:58                     ` Christoph Hellwig
2019-03-11 21:06                       ` Al Viro
2019-03-12 19:18                         ` Christoph Hellwig
2019-03-10  7:08                   ` [PATCH 5/8] make aio_read()/aio_write() return int Al Viro
2019-03-11 19:44                     ` Christoph Hellwig
2019-03-10  7:08                   ` [PATCH 6/8] move dropping ->ki_eventfd into iocb_destroy() Al Viro
2019-03-11 19:46                     ` Christoph Hellwig
2019-03-10  7:08                   ` [PATCH 7/8] deal with get_reqs_available() in aio_get_req() itself Al Viro
2019-03-11 19:46                     ` Christoph Hellwig
2019-03-10  7:08                   ` [PATCH 8/8] aio: move sanity checks and request allocation to io_submit_one() Al Viro
2019-03-11 19:48                     ` Christoph Hellwig
2019-03-11 21:12                       ` Al Viro
2019-03-11 19:41                   ` [PATCH 1/8] pin iocb through aio Christoph Hellwig
2019-03-11 19:41                 ` [PATCH 1/8] aio: make sure file is pinned Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).