io-uring.vger.kernel.org archive mirror
* [PATCH 0/4] bunch of random fixes
@ 2021-01-02 16:06 Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 1/4] io_uring: dont kill fasync under completion_lock Pavel Begunkov
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-02 16:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring

The patches are more or less clear, but for 3/4 I don't know whether it
was done this way intentionally.

Pavel Begunkov (4):
  io_uring: dont kill fasync under completion_lock
  io_uring: patch up IOPOLL overflow_flush sync
  io_uring: drop file refs after task cancel 
  io_uring: cancel more aggressively in exit_work

 fs/io_uring.c | 70 +++++++++++++++++++++++++++++++++------------------
 1 file changed, 46 insertions(+), 24 deletions(-)

-- 
2.24.0



* [PATCH 1/4] io_uring: dont kill fasync under completion_lock
  2021-01-02 16:06 [PATCH 0/4] bunch of random fixes Pavel Begunkov
@ 2021-01-02 16:06 ` Pavel Begunkov
  2021-01-03 11:58   ` Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync Pavel Begunkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-02 16:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: syzbot+91ca3f25bd7f795f019c

      CPU0                    CPU1
       ----                    ----
  lock(&new->fa_lock);
                               local_irq_disable();
                               lock(&ctx->completion_lock);
                               lock(&new->fa_lock);
  <Interrupt>
    lock(&ctx->completion_lock);

 *** DEADLOCK ***

Move kill_fasync() out of io_commit_cqring() and into io_cqring_ev_posted(),
so it is no longer called while holding completion_lock. That avoids the
reported deadlock, and it's also nice to shorten the locking time and
untangle the nested locks (compl_lock -> wq_head::lock).
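
For illustration, the completion path after this change ends up looking
roughly like the sketch below (example_fill_and_post() is a made-up
wrapper, not part of the patch; the helpers and locks are the ones named
above):

	static void example_fill_and_post(struct io_kiocb *req, long res)
	{
		struct io_ring_ctx *ctx = req->ctx;
		unsigned long flags;

		spin_lock_irqsave(&ctx->completion_lock, flags);
		__io_cqring_fill_event(req, res, 0);
		io_commit_cqring(ctx);	/* no cq_wait wakeup / kill_fasync() in here anymore */
		spin_unlock_irqrestore(&ctx->completion_lock, flags);

		/* wakeups, eventfd signalling and kill_fasync() now run unlocked */
		io_cqring_ev_posted(ctx);
	}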

Reported-by: syzbot+91ca3f25bd7f795f019c@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca46f314640b..00dd85acd039 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1342,11 +1342,6 @@ static void __io_commit_cqring(struct io_ring_ctx *ctx)
 
 	/* order cqe stores with ring update */
 	smp_store_release(&rings->cq.tail, ctx->cached_cq_tail);
-
-	if (wq_has_sleeper(&ctx->cq_wait)) {
-		wake_up_interruptible(&ctx->cq_wait);
-		kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
-	}
 }
 
 static void io_put_identity(struct io_uring_task *tctx, struct io_kiocb *req)
@@ -1710,6 +1705,10 @@ static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 		wake_up(&ctx->sq_data->wait);
 	if (io_should_trigger_evfd(ctx))
 		eventfd_signal(ctx->cq_ev_fd, 1);
+	if (wq_has_sleeper(&ctx->cq_wait)) {
+		wake_up_interruptible(&ctx->cq_wait);
+		kill_fasync(&ctx->cq_fasync, SIGIO, POLL_IN);
+	}
 }
 
 /* Returns true if there are no backlogged entries after the flush */
-- 
2.24.0



* [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync
  2021-01-02 16:06 [PATCH 0/4] bunch of random fixes Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 1/4] io_uring: dont kill fasync under completion_lock Pavel Begunkov
@ 2021-01-02 16:06 ` Pavel Begunkov
  2021-01-03 15:12   ` Jens Axboe
  2021-01-02 16:06 ` [PATCH 3/4] io_uring: drop file refs after task cancel Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 4/4] io_uring: cancel more aggressively in exit_work Pavel Begunkov
  3 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-02 16:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring

IOPOLL skips completion locking but keeps it under uring_lock, thus
io_cqring_overflow_flush() and so io_cqring_events() need extra care.
Add extra conditional locking around them.
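
For illustration, a path that may not already hold ctx->uring_lock would
flush overflowed CQEs on an IOPOLL ring roughly as in the sketch below
(example_flush_overflow() is a made-up wrapper mirroring the hunks in
this patch):

	static void example_flush_overflow(struct io_ring_ctx *ctx)
	{
		/* IOPOLL serialises CQE posting with uring_lock, not completion_lock */
		bool iopoll_lock = ctx->flags & IORING_SETUP_IOPOLL;

		if (iopoll_lock)
			mutex_lock(&ctx->uring_lock);
		io_cqring_overflow_flush(ctx, false, NULL, NULL);
		if (iopoll_lock)
			mutex_unlock(&ctx->uring_lock);
	}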

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 00dd85acd039..a4deef746bc3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2312,7 +2312,8 @@ static void io_double_put_req(struct io_kiocb *req)
 		io_free_req(req);
 }
 
-static unsigned io_cqring_events(struct io_ring_ctx *ctx, bool noflush)
+static unsigned io_cqring_events(struct io_ring_ctx *ctx, bool noflush,
+				 bool iopoll_lock)
 {
 	if (test_bit(0, &ctx->cq_check_overflow)) {
 		/*
@@ -2323,7 +2324,15 @@ static unsigned io_cqring_events(struct io_ring_ctx *ctx, bool noflush)
 		if (noflush)
 			return -1U;
 
+		/*
+		 * iopoll doesn't care about ctx->completion_lock but uses
+		 * ctx->uring_lock
+		 */
+		if (iopoll_lock)
+			mutex_lock(&ctx->uring_lock);
 		io_cqring_overflow_flush(ctx, false, NULL, NULL);
+		if (iopoll_lock)
+			mutex_unlock(&ctx->uring_lock);
 	}
 
 	/* See comment at the top of this file */
@@ -2550,7 +2559,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 		 * If we do, we can potentially be spinning for commands that
 		 * already triggered a CQE (eg in error).
 		 */
-		if (io_cqring_events(ctx, false))
+		if (io_cqring_events(ctx, false, false))
 			break;
 
 		/*
@@ -7097,7 +7106,7 @@ static inline bool io_should_wake(struct io_wait_queue *iowq, bool noflush)
 	 * started waiting. For timeouts, we always want to return to userspace,
 	 * regardless of event count.
 	 */
-	return io_cqring_events(ctx, noflush) >= iowq->to_wait ||
+	return io_cqring_events(ctx, noflush, false) >= iowq->to_wait ||
 			atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
 }
 
@@ -7142,13 +7151,14 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 		.ctx		= ctx,
 		.to_wait	= min_events,
 	};
+	bool iopoll = ctx->flags & IORING_SETUP_IOPOLL;
 	struct io_rings *rings = ctx->rings;
 	struct timespec64 ts;
 	signed long timeout = 0;
 	int ret = 0;
 
 	do {
-		if (io_cqring_events(ctx, false) >= min_events)
+		if (io_cqring_events(ctx, false, iopoll) >= min_events)
 			return 0;
 		if (!io_run_task_work())
 			break;
@@ -7184,7 +7194,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			continue;
 		else if (ret < 0)
 			break;
-		if (io_should_wake(&iowq, false))
+		/* iopoll ignores completion_lock, so not safe to flush */
+		if (io_should_wake(&iowq, iopoll))
 			break;
 		if (uts) {
 			timeout = schedule_timeout(timeout);
@@ -8623,7 +8634,7 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
 	smp_rmb();
 	if (!io_sqring_full(ctx))
 		mask |= EPOLLOUT | EPOLLWRNORM;
-	if (io_cqring_events(ctx, false))
+	if (io_cqring_events(ctx, false, ctx->flags & IORING_SETUP_IOPOLL))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	return mask;
-- 
2.24.0



* [PATCH 3/4] io_uring: drop file refs after task cancel
  2021-01-02 16:06 [PATCH 0/4] bunch of random fixes Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 1/4] io_uring: dont kill fasync under completion_lock Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync Pavel Begunkov
@ 2021-01-02 16:06 ` Pavel Begunkov
  2021-01-02 16:06 ` [PATCH 4/4] io_uring: cancel more aggressively in exit_work Pavel Begunkov
  3 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-02 16:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring

io_uring fds are marked O_CLOEXEC and we explicitly cancel all requests
before going through exec, so we don't want to leave the task's file
references to io_uring instances that no longer belong to it.
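
For illustration, the resulting ordering is roughly the sketch below:
cancel the requests first, then drop the task's file references
(example_files_cancel() is a made-up name mirroring the new
__io_uring_files_cancel() body in the diff):

	static void example_files_cancel(struct io_uring_task *tctx,
					 struct files_struct *files)
	{
		struct file *file;
		unsigned long index;

		/* make sure overflow events are dropped while cancelling */
		atomic_inc(&tctx->in_idle);
		xa_for_each(&tctx->xa, index, file)
			io_uring_cancel_task_requests(file->private_data, files);
		atomic_dec(&tctx->in_idle);

		/* only after cancellation, drop the file references */
		if (files)
			io_uring_remove_task_files(tctx);
	}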

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a4deef746bc3..3f38c252860b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8958,6 +8958,15 @@ static void io_uring_attempt_task_drop(struct file *file)
 		io_uring_del_task_file(file);
 }
 
+static void io_uring_remove_task_files(struct io_uring_task *tctx)
+{
+	struct file *file;
+	unsigned long index;
+
+	xa_for_each(&tctx->xa, index, file)
+		io_uring_del_task_file(file);
+}
+
 void __io_uring_files_cancel(struct files_struct *files)
 {
 	struct io_uring_task *tctx = current->io_uring;
@@ -8966,16 +8975,12 @@ void __io_uring_files_cancel(struct files_struct *files)
 
 	/* make sure overflow events are dropped */
 	atomic_inc(&tctx->in_idle);
-
-	xa_for_each(&tctx->xa, index, file) {
-		struct io_ring_ctx *ctx = file->private_data;
-
-		io_uring_cancel_task_requests(ctx, files);
-		if (files)
-			io_uring_del_task_file(file);
-	}
-
+	xa_for_each(&tctx->xa, index, file)
+		io_uring_cancel_task_requests(file->private_data, files);
 	atomic_dec(&tctx->in_idle);
+
+	if (files)
+		io_uring_remove_task_files(tctx);
 }
 
 static s64 tctx_inflight(struct io_uring_task *tctx)
@@ -9038,6 +9043,8 @@ void __io_uring_task_cancel(void)
 	} while (1);
 
 	atomic_dec(&tctx->in_idle);
+
+	io_uring_remove_task_files(tctx);
 }
 
 static int io_uring_flush(struct file *file, void *data)
-- 
2.24.0



* [PATCH 4/4] io_uring: cancel more aggressively in exit_work
  2021-01-02 16:06 [PATCH 0/4] bunch of random fixes Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-01-02 16:06 ` [PATCH 3/4] io_uring: drop file refs after task cancel Pavel Begunkov
@ 2021-01-02 16:06 ` Pavel Begunkov
  3 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-02 16:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring

While io_ring_exit_work() is running, new requests of all sorts may be
issued, so it should do a bit more to cancel them, otherwise they may
just get stuck, e.g. in io-wq, in poll lists, etc.
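
Assembled from the hunk below, the exit_work loop then reads roughly as
follows (sketch only): it keeps running a full cancellation pass instead
of only reaping iopoll events, until the ring's references are dropped:

	do {
		/* full cancellation pass, not just io_iopoll_try_reap_events() */
		__io_uring_cancel_task_requests(ctx, NULL);
	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
	io_ring_ctx_free(ctx);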

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3f38c252860b..389e6d359c3b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -992,6 +992,9 @@ enum io_mem_account {
 	ACCT_PINNED,
 };
 
+static void __io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
+					    struct task_struct *task);
+
 static void destroy_fixed_file_ref_node(struct fixed_file_ref_node *ref_node);
 static struct fixed_file_ref_node *alloc_fixed_file_ref_node(
 			struct io_ring_ctx *ctx);
@@ -8673,7 +8676,7 @@ static void io_ring_exit_work(struct work_struct *work)
 	 * as nobody else will be looking for them.
 	 */
 	do {
-		io_iopoll_try_reap_events(ctx);
+		__io_uring_cancel_task_requests(ctx, NULL);
 	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
 	io_ring_ctx_free(ctx);
 }
@@ -8828,9 +8831,11 @@ static void __io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 		enum io_wq_cancel cret;
 		bool ret = false;
 
-		cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb, &cancel, true);
-		if (cret != IO_WQ_CANCEL_NOTFOUND)
-			ret = true;
+		if (ctx->io_wq) {
+			cret = io_wq_cancel_cb(ctx->io_wq, io_cancel_task_cb,
+					       &cancel, true);
+			ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
+		}
 
 		/* SQPOLL thread does its own polling */
 		if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
-- 
2.24.0



* Re: [PATCH 1/4] io_uring: dont kill fasync under completion_lock
  2021-01-02 16:06 ` [PATCH 1/4] io_uring: dont kill fasync under completion_lock Pavel Begunkov
@ 2021-01-03 11:58   ` Pavel Begunkov
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-03 11:58 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: syzbot+91ca3f25bd7f795f019c

On 02/01/2021 16:06, Pavel Begunkov wrote:
>       CPU0                    CPU1
>        ----                    ----
>   lock(&new->fa_lock);
>                                local_irq_disable();
>                                lock(&ctx->completion_lock);
>                                lock(&new->fa_lock);
>   <Interrupt>
>     lock(&ctx->completion_lock);
> 
>  *** DEADLOCK ***
> 
> Move kill_fasync() out of io_commit_cqring() and into io_cqring_ev_posted(),
> so it is no longer called while holding completion_lock. That avoids the
> reported deadlock, and it's also nice to shorten the locking time and
> untangle the nested locks (compl_lock -> wq_head::lock).

Need to resend this one

-- 
Pavel Begunkov


* Re: [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync
  2021-01-02 16:06 ` [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync Pavel Begunkov
@ 2021-01-03 15:12   ` Jens Axboe
  2021-01-03 16:29     ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2021-01-03 15:12 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 1/2/21 9:06 AM, Pavel Begunkov wrote:
> IOPOLL skips completion locking but keeps it under uring_lock, thus
> io_cqring_overflow_flush() and so io_cqring_events() need extra care.
> Add extra conditional locking around them.

This one is pretty ugly. Would be greatly preferable to grab the lock
higher up instead of passing down the need to do so, imho.

-- 
Jens Axboe



* Re: [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync
  2021-01-03 15:12   ` Jens Axboe
@ 2021-01-03 16:29     ` Pavel Begunkov
  2021-01-03 22:05       ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-01-03 16:29 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 03/01/2021 15:12, Jens Axboe wrote:
> On 1/2/21 9:06 AM, Pavel Begunkov wrote:
>> IOPOLL skips completion locking but keeps it under uring_lock, thus
>> io_cqring_overflow_flush() and so io_cqring_events() need extra care.
>> Add extra conditional locking around them.
> 
> This one is pretty ugly. Would be greatly preferable to grab the lock
> higher up instead of passing down the need to do so, imho.
I can't disagree with that, the whole iopoll locking is a mess, but
still don't want to penalise SQPOLL|IOPOLL.

Splitting flushing from cqring_events might be a good idea. How
about the one below (not tested)? Killing this noflush looks even
cleaner than before.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 00dd85acd039..099f50de4aa5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1712,9 +1712,9 @@ static void io_cqring_ev_posted(struct io_ring_ctx *ctx)
 }
 
 /* Returns true if there are no backlogged entries after the flush */
-static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
-				     struct task_struct *tsk,
-				     struct files_struct *files)
+static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
+				       struct task_struct *tsk,
+				       struct files_struct *files)
 {
 	struct io_rings *rings = ctx->rings;
 	struct io_kiocb *req, *tmp;
@@ -1767,6 +1767,27 @@ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
 	return all_flushed;
 }
 
+static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force,
+				     struct task_struct *tsk,
+				     struct files_struct *files)
+{
+	if (test_bit(0, &ctx->cq_check_overflow)) {
+		bool ret, iopoll = ctx->flags & IORING_SETUP_IOPOLL;
+
+		/*
+		 * iopoll doesn't care about ctx->completion_lock but uses
+		 * ctx->uring_lock
+		 */
+		if (iopoll)
+			mutex_lock(&ctx->uring_lock);
+		ret = __io_cqring_overflow_flush(ctx, force, tsk, files);
+		if (iopoll)
+			mutex_unlock(&ctx->uring_lock);
+		return ret;
+	}
+	return true;
+}
+
 static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -2312,20 +2333,8 @@ static void io_double_put_req(struct io_kiocb *req)
 		io_free_req(req);
 }
 
-static unsigned io_cqring_events(struct io_ring_ctx *ctx, bool noflush)
+static unsigned io_cqring_events(struct io_ring_ctx *ctx)
 {
-	if (test_bit(0, &ctx->cq_check_overflow)) {
-		/*
-		 * noflush == true is from the waitqueue handler, just ensure
-		 * we wake up the task, and the next invocation will flush the
-		 * entries. We cannot safely to it from here.
-		 */
-		if (noflush)
-			return -1U;
-
-		io_cqring_overflow_flush(ctx, false, NULL, NULL);
-	}
-
 	/* See comment at the top of this file */
 	smp_rmb();
 	return __io_cqring_events(ctx);
@@ -2550,7 +2559,8 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 		 * If we do, we can potentially be spinning for commands that
 		 * already triggered a CQE (eg in error).
 		 */
-		if (io_cqring_events(ctx, false))
+		__io_cqring_overflow_flush(ctx, false, NULL, NULL);
+		if (io_cqring_events(ctx))
 			break;
 
 		/*
@@ -6825,7 +6835,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 
 	/* if we have a backlog and couldn't flush it all, return BUSY */
 	if (test_bit(0, &ctx->sq_check_overflow)) {
-		if (!io_cqring_overflow_flush(ctx, false, NULL, NULL))
+		if (!__io_cqring_overflow_flush(ctx, false, NULL, NULL))
 			return -EBUSY;
 	}
 
@@ -7088,7 +7098,7 @@ struct io_wait_queue {
 	unsigned nr_timeouts;
 };
 
-static inline bool io_should_wake(struct io_wait_queue *iowq, bool noflush)
+static inline bool io_should_wake(struct io_wait_queue *iowq)
 {
 	struct io_ring_ctx *ctx = iowq->ctx;
 
@@ -7097,7 +7107,7 @@ static inline bool io_should_wake(struct io_wait_queue *iowq, bool noflush)
 	 * started waiting. For timeouts, we always want to return to userspace,
 	 * regardless of event count.
 	 */
-	return io_cqring_events(ctx, noflush) >= iowq->to_wait ||
+	return io_cqring_events(ctx) >= iowq->to_wait ||
 			atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
 }
 
@@ -7107,10 +7117,14 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
 	struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
 							wq);
 
-	/* use noflush == true, as we can't safely rely on locking context */
-	if (!io_should_wake(iowq, true))
+	/*
+	 * just ensure we wake up the task, and the next invocation will
+	 * flush the entries. We cannot safely do it from here.
+	 */
+	if (test_bit(0, &ctx->cq_check_overflow))
+		return -1;
+	if (!io_should_wake(iowq))
 		return -1;
-
 	return autoremove_wake_function(curr, mode, wake_flags, key);
 }
 
@@ -7148,7 +7162,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	int ret = 0;
 
 	do {
-		if (io_cqring_events(ctx, false) >= min_events)
+		io_cqring_overflow_flush(ctx, false, NULL, NULL);
+		if (io_cqring_events(ctx) >= min_events)
 			return 0;
 		if (!io_run_task_work())
 			break;
@@ -7184,8 +7199,16 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 			continue;
 		else if (ret < 0)
 			break;
-		if (io_should_wake(&iowq, false))
+
+		/* iopoll ignores completion_lock, so not safe to flush */
+		if (io_should_wake(&iowq))
 			break;
+		if (test_bit(0, &ctx->cq_check_overflow)) {
+			finish_wait(&ctx->wait, &iowq.wq);
+			io_cqring_overflow_flush(ctx, false, NULL, NULL);
+			continue;
+		}
+
 		if (uts) {
 			timeout = schedule_timeout(timeout);
 			if (timeout == 0) {
@@ -8623,7 +8646,8 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
 	smp_rmb();
 	if (!io_sqring_full(ctx))
 		mask |= EPOLLOUT | EPOLLWRNORM;
-	if (io_cqring_events(ctx, false))
+	io_cqring_overflow_flush(ctx, false, NULL, NULL);
+	if (io_cqring_events(ctx))
 		mask |= EPOLLIN | EPOLLRDNORM;
 
 	return mask;
@@ -8681,7 +8705,7 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 	/* if force is set, the ring is going away. always drop after that */
 	ctx->cq_overflow_flushed = 1;
 	if (ctx->rings)
-		io_cqring_overflow_flush(ctx, true, NULL, NULL);
+		__io_cqring_overflow_flush(ctx, true, NULL, NULL);
 	mutex_unlock(&ctx->uring_lock);
 
 	io_kill_timeouts(ctx, NULL, NULL);
@@ -8855,9 +8879,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 	}
 
 	io_cancel_defer_files(ctx, task, files);
-	io_ring_submit_lock(ctx, (ctx->flags & IORING_SETUP_IOPOLL));
 	io_cqring_overflow_flush(ctx, true, task, files);
-	io_ring_submit_unlock(ctx, (ctx->flags & IORING_SETUP_IOPOLL));
 
 	if (!files)
 		__io_uring_cancel_task_requests(ctx, task);
@@ -9193,13 +9215,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	 */
 	ret = 0;
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
-		if (!list_empty_careful(&ctx->cq_overflow_list)) {
-			bool needs_lock = ctx->flags & IORING_SETUP_IOPOLL;
+		io_cqring_overflow_flush(ctx, false, NULL, NULL);
 
-			io_ring_submit_lock(ctx, needs_lock);
-			io_cqring_overflow_flush(ctx, false, NULL, NULL);
-			io_ring_submit_unlock(ctx, needs_lock);
-		}
 		if (flags & IORING_ENTER_SQ_WAKEUP)
 			wake_up(&ctx->sq_data->wait);
 		if (flags & IORING_ENTER_SQ_WAIT)
-- 
2.24.0


* Re: [PATCH 2/4] io_uring: patch up IOPOLL overflow_flush sync
  2021-01-03 16:29     ` Pavel Begunkov
@ 2021-01-03 22:05       ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2021-01-03 22:05 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 1/3/21 9:29 AM, Pavel Begunkov wrote:
> On 03/01/2021 15:12, Jens Axboe wrote:
>> On 1/2/21 9:06 AM, Pavel Begunkov wrote:
>>> IOPOLL skips completion locking but keeps it under uring_lock, thus
>>> io_cqring_overflow_flush() and so io_cqring_events() need extra care.
>>> Add extra conditional locking around them.
>>
>> This one is pretty ugly. Would be greatly preferable to grab the lock
>> higher up instead of passing down the need to do so, imho.
> I can't disagree with that, the whole iopoll locking is a mess, but
> still don't want to penalise SQPOLL|IOPOLL.
> 
> Splitting flushing from cqring_events might be a good idea. How
> about the one below (not tested)? Killing this noflush looks even
> cleaner than before.

From a quick look, that's much better.

-- 
Jens Axboe



