IO-Uring Archive on lore.kernel.org
 help / color / Atom feed
* [PATCHSET 0/2] io_uring support for linked timeouts
@ 2019-11-05 21:11 Jens Axboe
  2019-11-05 21:11 ` [PATCH 1/2] io_uring: abstract out io_async_cancel_one() helper Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-05 21:11 UTC (permalink / raw)
  To: io-uring, linux-block; +Cc: zeba.hrvoje, asml.silence, liuyun01

First of all, if you haven't already noticed, there's a new and shiny
io_uring mailing list. It's:

io-uring@vger.kernel.org

Subscribe if you are at all interested in io_uring development. It's
meant to cover both kernel and user side, and everything from questions,
bugs, development, etc.

Anyway, this is support for IORING_OP_LINK_TIMEOUT. Unlike the timeouts
we have now that work on purely the CQ ring, these timeouts are
specifically tied to a specific command. They are meant to be used to
auto-cancel a request, if it hasn't finished in X amount of time. The
way to use then is to setup your command as you usually would, but then
mark is IOSQE_IO_LINK and add an IORING_OP_LINK_TIMEOUT right after it.
That's how linked commands work to begin with. The main difference here
is that links are normally only started once the dependent request
completes, but for IORING_OP_LINK_TIMEOUT they are armed as soon as we
start the dependent request.

If the dependent request finishes before the linked timeout, the timeout
is canceled. If the timeout finishes before the dependent request, the
dependent request is attempted canceled.

IORING_OP_LINK_TIMEOUT is setup just like IORING_OP_TIMEOUT in terms
of passing in the timespec associated with it.

I added a bunch of test cases to liburing, currently residing in a
link-timeout branch. View them here:

https://git.kernel.dk/cgit/liburing/commit/?h=link-timeout&id=bc1bd5e97e2c758d6fd975bd35843b9b2c770c5a

Patches are against for-5.5/io_uring, and can currently also be found in
my for-5.5/io_uring-test branch.

 fs/io_uring.c                 | 203 +++++++++++++++++++++++++++++++---
 include/uapi/linux/io_uring.h |   1 +
 2 files changed, 187 insertions(+), 17 deletions(-)

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] io_uring: abstract out io_async_cancel_one() helper
  2019-11-05 21:11 [PATCHSET 0/2] io_uring support for linked timeouts Jens Axboe
@ 2019-11-05 21:11 ` Jens Axboe
  2019-11-05 21:11 ` [PATCH 2/2] io_uring: add support for linked SQE timeouts Jens Axboe
  2019-11-14 21:24 ` [PATCHSET 0/2] io_uring support for linked timeouts Pavel Begunkov
  2 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-05 21:11 UTC (permalink / raw)
  To: io-uring, linux-block; +Cc: zeba.hrvoje, asml.silence, liuyun01, Jens Axboe

We're going to need this helper in a future patch, so move it out
of io_async_cancel() and into its own separate function.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c | 31 ++++++++++++++++++++-----------
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7813bc7d5b61..a71c84808dd0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2142,21 +2142,11 @@ static bool io_cancel_cb(struct io_wq_work *work, void *data)
 	return req->user_data == (unsigned long) data;
 }
 
-static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-			   struct io_kiocb **nxt)
+static int io_async_cancel_one(struct io_ring_ctx *ctx, void *sqe_addr)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	enum io_wq_cancel cancel_ret;
-	void *sqe_addr;
 	int ret = 0;
 
-	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
-		return -EINVAL;
-	if (sqe->flags || sqe->ioprio || sqe->off || sqe->len ||
-	    sqe->cancel_flags)
-		return -EINVAL;
-
-	sqe_addr = (void *) (unsigned long) READ_ONCE(sqe->addr);
 	cancel_ret = io_wq_cancel_cb(ctx->io_wq, io_cancel_cb, sqe_addr);
 	switch (cancel_ret) {
 	case IO_WQ_CANCEL_OK:
@@ -2170,6 +2160,25 @@ static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		break;
 	}
 
+	return ret;
+}
+
+static int io_async_cancel(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+			   struct io_kiocb **nxt)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	void *sqe_addr;
+	int ret;
+
+	if (unlikely(ctx->flags & IORING_SETUP_IOPOLL))
+		return -EINVAL;
+	if (sqe->flags || sqe->ioprio || sqe->off || sqe->len ||
+	    sqe->cancel_flags)
+		return -EINVAL;
+
+	sqe_addr = (void *) (unsigned long) READ_ONCE(sqe->addr);
+	ret = io_async_cancel_one(ctx, sqe_addr);
+
 	if (ret < 0 && (req->flags & REQ_F_LINK))
 		req->flags |= REQ_F_FAIL_LINK;
 	io_cqring_add_event(req->ctx, sqe->user_data, ret);
-- 
2.24.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/2] io_uring: add support for linked SQE timeouts
  2019-11-05 21:11 [PATCHSET 0/2] io_uring support for linked timeouts Jens Axboe
  2019-11-05 21:11 ` [PATCH 1/2] io_uring: abstract out io_async_cancel_one() helper Jens Axboe
@ 2019-11-05 21:11 ` Jens Axboe
  2019-11-14 21:24 ` [PATCHSET 0/2] io_uring support for linked timeouts Pavel Begunkov
  2 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-05 21:11 UTC (permalink / raw)
  To: io-uring, linux-block; +Cc: zeba.hrvoje, asml.silence, liuyun01, Jens Axboe

While we have support for generic timeouts, we don't have a way to tie
a timeout to a specific SQE. The generic timeouts simply trigger wakeups
on the CQ ring.

This adds support for IORING_OP_LINK_TIMEOUT. This command is only valid
as a link to a previous command. The timeout specific can be either
relative or absolute, following the same rules as IORING_OP_TIMEOUT. If
the timeout triggers before the dependent command completes, it will
attempt to cancel that command. Likewise, if the dependent command
completes before the timeout triggers, it will cancel the timeout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/io_uring.c                 | 172 ++++++++++++++++++++++++++++++++--
 include/uapi/linux/io_uring.h |   1 +
 2 files changed, 167 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a71c84808dd0..d4ff3e49a78c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -336,6 +336,7 @@ struct io_kiocb {
 #define REQ_F_ISREG		2048	/* regular file */
 #define REQ_F_MUST_PUNT		4096	/* must be punted even for NONBLOCK */
 #define REQ_F_INFLIGHT		8192	/* on inflight list */
+#define REQ_F_LINK_TIMEOUT	16384	/* has linked timeout */
 	u64			user_data;
 	u32			result;
 	u32			sequence;
@@ -372,6 +373,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr);
 static void io_cqring_fill_event(struct io_ring_ctx *ctx, u64 ki_user_data,
 				 long res);
 static void __io_free_req(struct io_kiocb *req);
+static void io_put_req(struct io_kiocb *req, struct io_kiocb **nxtptr);
 
 static struct kmem_cache *req_cachep;
 
@@ -713,9 +715,35 @@ static void __io_free_req(struct io_kiocb *req)
 	kmem_cache_free(req_cachep, req);
 }
 
+static void io_link_cancel_timeout(struct io_ring_ctx *ctx,
+				   struct io_kiocb *req)
+{
+	int ret;
+
+	ret = hrtimer_try_to_cancel(&req->timeout.timer);
+	if (ret != -1) {
+		io_cqring_fill_event(ctx, req->user_data, -ECANCELED);
+		io_commit_cqring(ctx);
+		req->flags &= ~REQ_F_LINK;
+		__io_free_req(req);
+	}
+}
+
 static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
 {
+	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *nxt;
+	bool is_timeout_link;
+	unsigned long flags;
+
+	/*
+	 * If this is a timeout link, we could be racing with the timeout
+	 * timer. Grab the completion lock for this case to protect against
+	 * that.
+	 */
+	is_timeout_link = (req->flags & REQ_F_LINK_TIMEOUT);
+	if (is_timeout_link)
+		spin_lock_irqsave(&ctx->completion_lock, flags);
 
 	/*
 	 * The list should never be empty when we are called here. But could
@@ -723,7 +751,7 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
 	 * safe side.
 	 */
 	nxt = list_first_entry_or_null(&req->link_list, struct io_kiocb, list);
-	if (nxt) {
+	while (nxt) {
 		list_del(&nxt->list);
 		if (!list_empty(&req->link_list)) {
 			INIT_LIST_HEAD(&nxt->link_list);
@@ -736,10 +764,24 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
 		 * If we're in async work, we can continue processing the chain
 		 * in this context instead of having to queue up new async work.
 		 */
-		if (nxtptr && current_work())
+		if (is_timeout_link) {
+			io_link_cancel_timeout(ctx, nxt);
+
+			/* we dropped this link, get next */
+			nxt = list_first_entry_or_null(&req->link_list,
+							struct io_kiocb, list);
+		} else if (nxtptr && current_work()) {
 			*nxtptr = nxt;
-		else
+			nxt = NULL;
+		} else {
 			io_queue_async_work(req->ctx, nxt);
+			nxt = NULL;
+		}
+	}
+
+	if (is_timeout_link) {
+		spin_unlock_irqrestore(&ctx->completion_lock, flags);
+		io_cqring_ev_posted(ctx);
 	}
 }
 
@@ -748,16 +790,30 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
  */
 static void io_fail_links(struct io_kiocb *req)
 {
+	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *link;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->completion_lock, flags);
 
 	while (!list_empty(&req->link_list)) {
 		link = list_first_entry(&req->link_list, struct io_kiocb, list);
-		list_del(&link->list);
+		list_del_init(&link->list);
 
 		trace_io_uring_fail_link(req, link);
-		io_cqring_add_event(req->ctx, link->user_data, -ECANCELED);
-		__io_free_req(link);
+
+		if ((req->flags & REQ_F_LINK_TIMEOUT) &&
+		    link->submit.sqe->opcode == IORING_OP_LINK_TIMEOUT) {
+			io_link_cancel_timeout(ctx, link);
+		} else {
+			io_cqring_fill_event(ctx, link->user_data, -ECANCELED);
+			__io_free_req(link);
+		}
 	}
+
+	io_commit_cqring(ctx);
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+	io_cqring_ev_posted(ctx);
 }
 
 static void io_free_req(struct io_kiocb *req, struct io_kiocb **nxt)
@@ -2434,11 +2490,111 @@ static int io_grab_files(struct io_ring_ctx *ctx, struct io_kiocb *req)
 	return ret;
 }
 
+static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
+{
+	struct io_kiocb *req = container_of(timer, struct io_kiocb,
+						timeout.timer);
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_kiocb *prev = NULL;
+	unsigned long flags;
+	int ret = -ETIME;
+
+	spin_lock_irqsave(&ctx->completion_lock, flags);
+
+	/*
+	 * We don't expect the list to be empty, that will only happen if we
+	 * race with the completion of the linked work.
+	 */
+	if (!list_empty(&req->list)) {
+		prev = list_entry(req->list.prev, struct io_kiocb, link_list);
+		list_del_init(&req->list);
+	}
+
+	spin_unlock_irqrestore(&ctx->completion_lock, flags);
+
+	if (prev)
+		ret = io_async_cancel_one(ctx, (void *) prev->user_data);
+
+	io_cqring_add_event(ctx, req->user_data, ret);
+	io_put_req(req, NULL);
+	return HRTIMER_NORESTART;
+}
+
+static int io_queue_linked_timeout(struct io_kiocb *req, struct io_kiocb *nxt)
+{
+	const struct io_uring_sqe *sqe = nxt->submit.sqe;
+	enum hrtimer_mode mode;
+	struct timespec64 ts;
+	int ret = -EINVAL;
+
+	if (sqe->flags || sqe->ioprio || sqe->buf_index || sqe->len != 1)
+		goto err;
+	if (sqe->timeout_flags & ~IORING_TIMEOUT_ABS)
+		goto err;
+	if (get_timespec64(&ts, u64_to_user_ptr(sqe->addr))) {
+		ret = -EFAULT;
+		goto err;
+	}
+
+	req->flags |= REQ_F_LINK_TIMEOUT;
+
+	if (sqe->flags & IORING_TIMEOUT_ABS)
+		mode = HRTIMER_MODE_ABS;
+	else
+		mode = HRTIMER_MODE_REL;
+	hrtimer_init(&nxt->timeout.timer, CLOCK_MONOTONIC, mode);
+	nxt->timeout.timer.function = io_link_timeout_fn;
+	hrtimer_start(&nxt->timeout.timer, timespec64_to_ktime(ts), mode);
+	ret = 0;
+err:
+	/* drop submission reference */
+	io_put_req(nxt, NULL);
+
+	if (ret) {
+		struct io_ring_ctx *ctx = req->ctx;
+
+		/*
+		 * Break the link and fail linked timeout, parent will get
+		 * failed by the regular submission path.
+		 */
+		list_del(&nxt->list);
+		io_cqring_fill_event(ctx, nxt->user_data, ret);
+		trace_io_uring_fail_link(req, nxt);
+		io_commit_cqring(ctx);
+		io_put_req(nxt, NULL);
+		ret = -ECANCELED;
+	}
+
+	return ret;
+}
+
+static inline struct io_kiocb *io_get_linked_timeout(struct io_kiocb *req)
+{
+	struct io_kiocb *nxt;
+
+	if (!(req->flags & REQ_F_LINK))
+		return NULL;
+
+	nxt = list_first_entry_or_null(&req->link_list, struct io_kiocb, list);
+	if (nxt && nxt->submit.sqe->opcode == IORING_OP_LINK_TIMEOUT)
+		return nxt;
+
+	return NULL;
+}
+
 static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 			struct sqe_submit *s)
 {
+	struct io_kiocb *nxt;
 	int ret;
 
+	nxt = io_get_linked_timeout(req);
+	if (unlikely(nxt)) {
+		ret = io_queue_linked_timeout(req, nxt);
+		if (ret)
+			goto err;
+	}
+
 	ret = __io_submit_sqe(ctx, req, s, NULL, true);
 
 	/*
@@ -2603,6 +2759,10 @@ static void io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
 		memcpy(&req->submit, s, sizeof(*s));
 		INIT_LIST_HEAD(&req->link_list);
 		*link = req;
+	} else if (READ_ONCE(s->sqe->opcode) == IORING_OP_LINK_TIMEOUT) {
+		/* Only valid as a linked SQE */
+		ret = -EINVAL;
+		goto err_req;
 	} else {
 		io_queue_sqe(ctx, req, s);
 	}
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 6877cf8894db..f1a118b01d18 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -72,6 +72,7 @@ struct io_uring_sqe {
 #define IORING_OP_TIMEOUT_REMOVE	12
 #define IORING_OP_ACCEPT	13
 #define IORING_OP_ASYNC_CANCEL	14
+#define IORING_OP_LINK_TIMEOUT	15
 
 /*
  * sqe->fsync_flags
-- 
2.24.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-05 21:11 [PATCHSET 0/2] io_uring support for linked timeouts Jens Axboe
  2019-11-05 21:11 ` [PATCH 1/2] io_uring: abstract out io_async_cancel_one() helper Jens Axboe
  2019-11-05 21:11 ` [PATCH 2/2] io_uring: add support for linked SQE timeouts Jens Axboe
@ 2019-11-14 21:24 ` Pavel Begunkov
  2019-11-14 22:37   ` Jens Axboe
  2 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-14 21:24 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 2868 bytes --]

On 06/11/2019 00:11, Jens Axboe wrote:
> Anyway, this is support for IORING_OP_LINK_TIMEOUT. Unlike the timeouts
> we have now that work on purely the CQ ring, these timeouts are
> specifically tied to a specific command. They are meant to be used to
> auto-cancel a request, if it hasn't finished in X amount of time. The
> way to use then is to setup your command as you usually would, but then
> mark is IOSQE_IO_LINK and add an IORING_OP_LINK_TIMEOUT right after it.
> That's how linked commands work to begin with. The main difference here
> is that links are normally only started once the dependent request
> completes, but for IORING_OP_LINK_TIMEOUT they are armed as soon as we
> start the dependent request.
> 
> If the dependent request finishes before the linked timeout, the timeout
> is canceled. If the timeout finishes before the dependent request, the
> dependent request is attempted canceled.
> 
> IORING_OP_LINK_TIMEOUT is setup just like IORING_OP_TIMEOUT in terms
> of passing in the timespec associated with it.
> 
> I added a bunch of test cases to liburing, currently residing in a
> link-timeout branch. View them here:
> 

Finally got to this patch. I think, find it adding too many edge cases
and it isn't integrated consistently into what we have now. I would love
to hear your vision, but I'd try to implement them in such a way, that it
doesn't need to modify the framework, at least for some particular case.
In other words, as opcodes could have been added from the outside with a
function table.

Also, it's not so consistent with the userspace API as well.

1. If we specified drain for the timeout, should its start be delayed
until then? I would prefer so.

E.g. send_msg + drained linked_timeout, which would set a timeout from the
start of the send.

2. Why it could be only the second one in a link? May we want to cancel
from a certain point?
e.g. "op1 -> op2 -> timeout -> op3" cancels op2 and op3

3. It's a bit strange, that the timeout affects a request from the left,
and after as an consequence cancels everything on the right (i.e. chain).
Could we place it in the head? So it would affect all requests on the right
from it.

4. I'd prefer to handle it as a new generic command and setting a timer
in __io_submit_sqe().

I believe we can do it more gracefully, and at the same moment giving
more freedom to the user. What do you think?


> https://git.kernel.dk/cgit/liburing/commit/?h=link-timeout&id=bc1bd5e97e2c758d6fd975bd35843b9b2c770c5a
> 
> Patches are against for-5.5/io_uring, and can currently also be found in
> my for-5.5/io_uring-test branch.
> 
>  fs/io_uring.c                 | 203 +++++++++++++++++++++++++++++++---
>  include/uapi/linux/io_uring.h |   1 +
>  2 files changed, 187 insertions(+), 17 deletions(-)
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-14 21:24 ` [PATCHSET 0/2] io_uring support for linked timeouts Pavel Begunkov
@ 2019-11-14 22:37   ` Jens Axboe
  2019-11-15  9:40     ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-14 22:37 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

On 11/14/19 2:24 PM, Pavel Begunkov wrote:
> On 06/11/2019 00:11, Jens Axboe wrote:
>> Anyway, this is support for IORING_OP_LINK_TIMEOUT. Unlike the timeouts
>> we have now that work on purely the CQ ring, these timeouts are
>> specifically tied to a specific command. They are meant to be used to
>> auto-cancel a request, if it hasn't finished in X amount of time. The
>> way to use then is to setup your command as you usually would, but then
>> mark is IOSQE_IO_LINK and add an IORING_OP_LINK_TIMEOUT right after it.
>> That's how linked commands work to begin with. The main difference here
>> is that links are normally only started once the dependent request
>> completes, but for IORING_OP_LINK_TIMEOUT they are armed as soon as we
>> start the dependent request.
>>
>> If the dependent request finishes before the linked timeout, the timeout
>> is canceled. If the timeout finishes before the dependent request, the
>> dependent request is attempted canceled.
>>
>> IORING_OP_LINK_TIMEOUT is setup just like IORING_OP_TIMEOUT in terms
>> of passing in the timespec associated with it.
>>
>> I added a bunch of test cases to liburing, currently residing in a
>> link-timeout branch. View them here:
>>
> 
> Finally got to this patch. I think, find it adding too many edge cases
> and it isn't integrated consistently into what we have now. I would love
> to hear your vision, but I'd try to implement them in such a way, that it
> doesn't need to modify the framework, at least for some particular case.
> In other words, as opcodes could have been added from the outside with a
> function table.

I agree, it could do with a bit of cleanup. Incrementals would be
appreciated!

> Also, it's not so consistent with the userspace API as well.
> 
> 1. If we specified drain for the timeout, should its start be delayed
> until then? I would prefer so.
>
> E.g. send_msg + drained linked_timeout, which would set a timeout from the
> start of the send.

What cases would that apply to, what would the timeout even do in this
case? The point of the linked timeout is to abort the previous command.
Maybe I'm not following what you mean here.

> 2. Why it could be only the second one in a link? May we want to cancel
> from a certain point?
> e.g. "op1 -> op2 -> timeout -> op3" cancels op2 and op3

Logically it need not be the second, it just has to follow another
request. Is there a bug there?

> 3. It's a bit strange, that the timeout affects a request from the left,
> and after as an consequence cancels everything on the right (i.e. chain).
> Could we place it in the head? So it would affect all requests on the right
> from it.

But that's how links work, though. If you keep linking, then everything
that depends on X will fail, if X itself isn't succesful.

> 4. I'd prefer to handle it as a new generic command and setting a timer
> in __io_submit_sqe().
> 
> I believe we can do it more gracefully, and at the same moment giving
> more freedom to the user. What do you think?

I just think we need to make sure the ground rules are sane. I'm going
to write a few test cases to make sure we do the right thing.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-14 22:37   ` Jens Axboe
@ 2019-11-15  9:40     ` Pavel Begunkov
  2019-11-15 14:21       ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15  9:40 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

>> Finally got to this patch. I think, find it adding too many edge cases
>> and it isn't integrated consistently into what we have now. I would love
>> to hear your vision, but I'd try to implement them in such a way, that it
>> doesn't need to modify the framework, at least for some particular case.
>> In other words, as opcodes could have been added from the outside with a
>> function table.
> 
> I agree, it could do with a bit of cleanup. Incrementals would be
> appreciated!
> 
>> Also, it's not so consistent with the userspace API as well.
>>
>> 1. If we specified drain for the timeout, should its start be delayed
>> until then? I would prefer so.
>>
>> E.g. send_msg + drained linked_timeout, which would set a timeout from the
>> start of the send.
> 
> What cases would that apply to, what would the timeout even do in this
> case? The point of the linked timeout is to abort the previous command.
> Maybe I'm not following what you mean here.
> 
Hmm, got it a bit wrong with defer. io_queue_link_head() can defer it
without setting timeout. However, it seems that io_wq_submit_work()
won't set a timer, as it uses __io_submit_sqe(), but not
__io_queue_sqe(), which handles all this with linked timeouts.

Indeed, maybe it be, that you wanted to place it in __io_submit_sqe?

>> 2. Why it could be only the second one in a link? May we want to cancel
>> from a certain point?
>> e.g. "op1 -> op2 -> timeout -> op3" cancels op2 and op3
> 
> Logically it need not be the second, it just has to follow another
> request. Is there a bug there?
> 
__io_queue_sqe looks only for the second one in a link. Other linked
timeouts will be ignored, if I get the code right.

Also linking may (or __may not__) be an issue. As you remember, the head
is linked through link_list, and all following with list.
i.e. req_head.link_list <-> req.list <-> req.list <-> req.list

free_req() (last time I saw it), expects that timeout's previous request
is linked with link_list. If a timeout can fire in the middle of a link
(during execution), this could be not the case. But it depends on when
we set an timeout.

BTW, personally I'd link them all through link_list. E.g. may get rid of
splicing in free_req(). I'll try to make it later.

>> 3. It's a bit strange, that the timeout affects a request from the left,
>> and after as an consequence cancels everything on the right (i.e. chain).
>> Could we place it in the head? So it would affect all requests on the right
>> from it.
> 
> But that's how links work, though. If you keep linking, then everything
> that depends on X will fail, if X itself isn't succesful.
> 
Right. That's about what userspace API would be saner. To place timeout
on the left of a request, or on the right, with the same resulting effect.

Let put this question away until the others are clear.

>> 4. I'd prefer to handle it as a new generic command and setting a timer
>> in __io_submit_sqe().
>>
>> I believe we can do it more gracefully, and at the same moment giving
>> more freedom to the user. What do you think?
> 
> I just think we need to make sure the ground rules are sane. I'm going
> to write a few test cases to make sure we do the right thing.
> 

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15  9:40     ` Pavel Begunkov
@ 2019-11-15 14:21       ` Pavel Begunkov
  2019-11-15 15:13         ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15 14:21 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 4336 bytes --]

On 15/11/2019 12:40, Pavel Begunkov wrote:
>>> Finally got to this patch. I think, find it adding too many edge cases
>>> and it isn't integrated consistently into what we have now. I would love
>>> to hear your vision, but I'd try to implement them in such a way, that it
>>> doesn't need to modify the framework, at least for some particular case.
>>> In other words, as opcodes could have been added from the outside with a
>>> function table.
>>
>> I agree, it could do with a bit of cleanup. Incrementals would be
>> appreciated!
>>
>>> Also, it's not so consistent with the userspace API as well.
>>>
>>> 1. If we specified drain for the timeout, should its start be delayed
>>> until then? I would prefer so.
>>>
>>> E.g. send_msg + drained linked_timeout, which would set a timeout from the
>>> start of the send.
>>
>> What cases would that apply to, what would the timeout even do in this
>> case? The point of the linked timeout is to abort the previous command.
>> Maybe I'm not following what you mean here.
>>
> Hmm, got it a bit wrong with defer. io_queue_link_head() can defer it
> without setting timeout. However, it seems that io_wq_submit_work()
> won't set a timer, as it uses __io_submit_sqe(), but not
> __io_queue_sqe(), which handles all this with linked timeouts.
> 
> Indeed, maybe it be, that you wanted to place it in __io_submit_sqe?
> 
>>> 2. Why it could be only the second one in a link? May we want to cancel
>>> from a certain point?
>>> e.g. "op1 -> op2 -> timeout -> op3" cancels op2 and op3
>>
>> Logically it need not be the second, it just has to follow another
>> request. Is there a bug there?
>>
> __io_queue_sqe looks only for the second one in a link. Other linked
> timeouts will be ignored, if I get the code right.
> 
> Also linking may (or __may not__) be an issue. As you remember, the head
> is linked through link_list, and all following with list.
> i.e. req_head.link_list <-> req.list <-> req.list <-> req.list
> 
> free_req() (last time I saw it), expects that timeout's previous request
> is linked with link_list. If a timeout can fire in the middle of a link
> (during execution), this could be not the case. But it depends on when
> we set an timeout.
> 
> BTW, personally I'd link them all through link_list. E.g. may get rid of
> splicing in free_req(). I'll try to make it later.
> 
>>> 3. It's a bit strange, that the timeout affects a request from the left,
>>> and after as an consequence cancels everything on the right (i.e. chain).
>>> Could we place it in the head? So it would affect all requests on the right
>>> from it.
>>
>> But that's how links work, though. If you keep linking, then everything
>> that depends on X will fail, if X itself isn't succesful.
>>
> Right. That's about what userspace API would be saner. To place timeout
> on the left of a request, or on the right, with the same resulting effect.
> 
> Let put this question away until the others are clear.
> 
>>> 4. I'd prefer to handle it as a new generic command and setting a timer
>>> in __io_submit_sqe().
>>>
>>> I believe we can do it more gracefully, and at the same moment giving
>>> more freedom to the user. What do you think?
>>
>> I just think we need to make sure the ground rules are sane. I'm going
>> to write a few test cases to make sure we do the right thing.
>>
> 
Ok, let me try to state some rules to discuss:

1. REQ -> LINK_TIMEOUT
is a valid use case

2. timeout is set at the moment of starting execution of operation.
e.g. REQ1, REQ2|DRAIN -> LINK_TIMEOUT

Timer is set at the moment, when everything is drained and we
sending REQ. i.e. after completion of REQ1

3. REQ1 -> LINK_TIMEOUT1 -> REQ2 -> LINK_TIMEOUT2

is valid, and LINK_TIMEOUT2 will be set, at the moment of
start of REQ2's execution. It also mean, that if
LINK_TIMEOUT1 fires, it will cancel REQ1, and REQ2
with LINK_TIMEOUT2 (with proper return values)

4. REQ1, LINK_TIMEOUT
is invalid, fail it

5. LINK_TIMEOUT1 -> LINK_TIMEOUT2
Fail first, link-fail (aka cancelled) for the second one

6. REQ1 -> LINK_TIMEOUT1 -> LINK_TIMEOUT2
execute REQ1+LINK_TIMEOUT1, and then fail LINK_TIMEOUT2 as
invalid. Also, LINK_TIMEOUT2 could be just cancelled
(e.g. if fail_links for REQ1)

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 14:21       ` Pavel Begunkov
@ 2019-11-15 15:13         ` Jens Axboe
  2019-11-15 17:11           ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-15 15:13 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

On 11/15/19 7:21 AM, Pavel Begunkov wrote:
> On 15/11/2019 12:40, Pavel Begunkov wrote:
>>>> Finally got to this patch. I think, find it adding too many edge cases
>>>> and it isn't integrated consistently into what we have now. I would love
>>>> to hear your vision, but I'd try to implement them in such a way, that it
>>>> doesn't need to modify the framework, at least for some particular case.
>>>> In other words, as opcodes could have been added from the outside with a
>>>> function table.
>>>
>>> I agree, it could do with a bit of cleanup. Incrementals would be
>>> appreciated!
>>>
>>>> Also, it's not so consistent with the userspace API as well.
>>>>
>>>> 1. If we specified drain for the timeout, should its start be delayed
>>>> until then? I would prefer so.
>>>>
>>>> E.g. send_msg + drained linked_timeout, which would set a timeout from the
>>>> start of the send.
>>>
>>> What cases would that apply to, what would the timeout even do in this
>>> case? The point of the linked timeout is to abort the previous command.
>>> Maybe I'm not following what you mean here.
>>>
>> Hmm, got it a bit wrong with defer. io_queue_link_head() can defer it
>> without setting timeout. However, it seems that io_wq_submit_work()
>> won't set a timer, as it uses __io_submit_sqe(), but not
>> __io_queue_sqe(), which handles all this with linked timeouts.
>>
>> Indeed, maybe it be, that you wanted to place it in __io_submit_sqe?
>>
>>>> 2. Why it could be only the second one in a link? May we want to cancel
>>>> from a certain point?
>>>> e.g. "op1 -> op2 -> timeout -> op3" cancels op2 and op3
>>>
>>> Logically it need not be the second, it just has to follow another
>>> request. Is there a bug there?
>>>
>> __io_queue_sqe looks only for the second one in a link. Other linked
>> timeouts will be ignored, if I get the code right.
>>
>> Also linking may (or __may not__) be an issue. As you remember, the head
>> is linked through link_list, and all following with list.
>> i.e. req_head.link_list <-> req.list <-> req.list <-> req.list
>>
>> free_req() (last time I saw it), expects that timeout's previous request
>> is linked with link_list. If a timeout can fire in the middle of a link
>> (during execution), this could be not the case. But it depends on when
>> we set an timeout.
>>
>> BTW, personally I'd link them all through link_list. E.g. may get rid of
>> splicing in free_req(). I'll try to make it later.
>>
>>>> 3. It's a bit strange, that the timeout affects a request from the left,
>>>> and after as an consequence cancels everything on the right (i.e. chain).
>>>> Could we place it in the head? So it would affect all requests on the right
>>>> from it.
>>>
>>> But that's how links work, though. If you keep linking, then everything
>>> that depends on X will fail, if X itself isn't succesful.
>>>
>> Right. That's about what userspace API would be saner. To place timeout
>> on the left of a request, or on the right, with the same resulting effect.
>>
>> Let put this question away until the others are clear.
>>
>>>> 4. I'd prefer to handle it as a new generic command and setting a timer
>>>> in __io_submit_sqe().
>>>>
>>>> I believe we can do it more gracefully, and at the same moment giving
>>>> more freedom to the user. What do you think?
>>>
>>> I just think we need to make sure the ground rules are sane. I'm going
>>> to write a few test cases to make sure we do the right thing.
>>>
>>
> Ok, let me try to state some rules to discuss:

> 1. REQ -> LINK_TIMEOUT
> is a valid use case

Yes

> 2. timeout is set at the moment of starting execution of operation.
> e.g. REQ1, REQ2|DRAIN -> LINK_TIMEOUT
>
> Timer is set at the moment, when everything is drained and we
> sending REQ. i.e. after completion of REQ1

Right, the timeout is prepped before REQ2 is started, armed when it is
started (if not already done). The prep + arm split is important to
ensure that a short timeout doesn't even find REQ2.

> 3. REQ1 -> LINK_TIMEOUT1 -> REQ2 -> LINK_TIMEOUT2
> 
> is valid, and LINK_TIMEOUT2 will be set, at the moment of
> start of REQ2's execution. It also mean, that if
> LINK_TIMEOUT1 fires, it will cancel REQ1, and REQ2
> with LINK_TIMEOUT2 (with proper return values)

That's not valid with the patches I sent. It could be, but we'd need to
fix that bit.

> 4. REQ1, LINK_TIMEOUT
> is invalid, fail it

Correct

> 5. LINK_TIMEOUT1 -> LINK_TIMEOUT2
> Fail first, link-fail (aka cancelled) for the second one

Correct

> 6. REQ1 -> LINK_TIMEOUT1 -> LINK_TIMEOUT2
> execute REQ1+LINK_TIMEOUT1, and then fail LINK_TIMEOUT2 as
> invalid. Also, LINK_TIMEOUT2 could be just cancelled
> (e.g. if fail_links for REQ1)

Given case 5, why would this one be legal?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 15:13         ` Jens Axboe
@ 2019-11-15 17:11           ` Pavel Begunkov
  2019-11-15 19:34             ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15 17:11 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 2904 bytes --]

>>>> I just think we need to make sure the ground rules are sane. I'm going
>>>> to write a few test cases to make sure we do the right thing.
>>>>
>>>
>> Ok, let me try to state some rules to discuss:
> 
>> 1. REQ -> LINK_TIMEOUT
>> is a valid use case
> 
> Yes
> 
>> 2. timeout is set at the moment of starting execution of operation.
>> e.g. REQ1, REQ2|DRAIN -> LINK_TIMEOUT
>>
>> Timer is set at the moment, when everything is drained and we
>> sending REQ. i.e. after completion of REQ1
> 
> Right, the timeout is prepped before REQ2 is started, armed when it is
> started (if not already done). The prep + arm split is important to
> ensure that a short timeout doesn't even find REQ2.
I've got (and seen the patch) for prep + arm split. Could a
submission block/take a long time? If so, it's probably not what
user would want.

e.g. WRITE -> LINK_TIMEOUT (1s)
- submit write (blocking, takes 2s)
- and only after this 2s have a chance to arm the timeout.

> 
>> 3. REQ1 -> LINK_TIMEOUT1 -> REQ2 -> LINK_TIMEOUT2
>>
>> is valid, and LINK_TIMEOUT2 will be set, at the moment of
>> start of REQ2's execution. It also mean, that if
>> LINK_TIMEOUT1 fires, it will cancel REQ1, and REQ2
>> with LINK_TIMEOUT2 (with proper return values)
> 
> That's not valid with the patches I sent. It could be, but we'd need to
> fix that bit.
> 
It should almost work, if we move linked timeout init/arm code 
into __io_submit_sqe(). There is also a problem, which it'll solve:

If a request is deferred, it will skip timeout initialisation,
because io_req_defer() happens before __io_queue_sqe().
io_wq_submit_work() won't initialise/arm the timeout as well,
as it use __io_submit_sqe() directly. So
- rule 2. doesn't work
- free_req() calls io_link_cancel_timeout() for an non-inititialised
timeout


The case I keep in mind is:
read file -> SEND (+LINK_TIMEOUT) 
	-> RECV (+LINK_TIMEOUT) -> write file ...

We don't care how long file read/write would take,
but would want to limit execution time for network operations.


>> 4. REQ1, LINK_TIMEOUT
>> is invalid, fail it
> 
> Correct
> 
>> 5. LINK_TIMEOUT1 -> LINK_TIMEOUT2
>> Fail first, link-fail (aka cancelled) for the second one
> 
> Correct
> 
>> 6. REQ1 -> LINK_TIMEOUT1 -> LINK_TIMEOUT2
>> execute REQ1+LINK_TIMEOUT1, and then fail LINK_TIMEOUT2 as
>> invalid. Also, LINK_TIMEOUT2 could be just cancelled
>> (e.g. if fail_links for REQ1)
> 
> Given case 5, why would this one be legal?
> 
This one is different if we conceptually consider
REQ + following LINK_TIMEOUT as a single operation with timeout.
If so, it can be said that (REQ1 -> LINK_TIMEOUT1) is valid
pair, but LINK_TIMEOUT2 is a following single link timeout,
that's more like in 4.


7. If we decide to not implement 3., what's about the case below?
REQ1 -> REQ2 -> LINK_TIMEOUT

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 17:11           ` Pavel Begunkov
@ 2019-11-15 19:34             ` Jens Axboe
  2019-11-15 21:16               ` Jens Axboe
  2019-11-15 21:22               ` Pavel Begunkov
  0 siblings, 2 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-15 19:34 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

How about something like this? Should work (and be valid) to have any
sequence of timeout links, as long as there's something in front of it.
Commit message has more details.


commit 437fd0500d08f32444747b861c72cd58a4b833d5
Author: Jens Axboe <axboe@kernel.dk>
Date:   Thu Nov 14 19:39:52 2019 -0700

    io_uring: fix sequencing issues with linked timeouts
    
    We have an issue with timeout links that are deeper in the submit chain,
    because we only handle it upfront, not from later submissions. Move the
    prep + issue of the timeout link to the async work prep handler, and do
    it normally for non-async queue. If we validate and prepare the timeout
    links upfront when we first see them, there's nothing stopping us from
    supporting any sort of nesting.
    
    Ensure that we handle the sequencing of linked timeouts correctly as
    well. The loop in io_req_link_next() isn't necessary, and it will never
    do anything, because we can't have both REQ_F_LINK_TIMEOUT set and have
    dependent links.
    
    Reported-by: Pavel Begunkov <asml.silence@gmail.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a0204b48866f..47d61899b8ba 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -352,6 +352,7 @@ struct io_kiocb {
 #define REQ_F_MUST_PUNT		4096	/* must be punted even for NONBLOCK */
 #define REQ_F_INFLIGHT		8192	/* on inflight list */
 #define REQ_F_COMP_LOCKED	16384	/* completion under lock */
+#define REQ_F_FREE_SQE		32768
 	u64			user_data;
 	u32			result;
 	u32			sequence;
@@ -390,6 +391,8 @@ static void __io_free_req(struct io_kiocb *req);
 static void io_put_req(struct io_kiocb *req);
 static void io_double_put_req(struct io_kiocb *req);
 static void __io_double_put_req(struct io_kiocb *req);
+static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req);
+static void io_queue_linked_timeout(struct io_kiocb *req);
 
 static struct kmem_cache *req_cachep;
 
@@ -524,7 +527,8 @@ static inline bool io_sqe_needs_user(const struct io_uring_sqe *sqe)
 		 opcode == IORING_OP_WRITE_FIXED);
 }
 
-static inline bool io_prep_async_work(struct io_kiocb *req)
+static inline bool io_prep_async_work(struct io_kiocb *req,
+				      struct io_kiocb **link)
 {
 	bool do_hashed = false;
 
@@ -553,13 +557,17 @@ static inline bool io_prep_async_work(struct io_kiocb *req)
 			req->work.flags |= IO_WQ_WORK_NEEDS_USER;
 	}
 
+	*link = io_prep_linked_timeout(req);
 	return do_hashed;
 }
 
 static inline void io_queue_async_work(struct io_kiocb *req)
 {
-	bool do_hashed = io_prep_async_work(req);
 	struct io_ring_ctx *ctx = req->ctx;
+	struct io_kiocb *link;
+	bool do_hashed;
+
+	do_hashed = io_prep_async_work(req, &link);
 
 	trace_io_uring_queue_async_work(ctx, do_hashed, req, &req->work,
 					req->flags);
@@ -569,6 +577,9 @@ static inline void io_queue_async_work(struct io_kiocb *req)
 		io_wq_enqueue_hashed(ctx->io_wq, &req->work,
 					file_inode(req->file));
 	}
+
+	if (link)
+		io_queue_linked_timeout(link);
 }
 
 static void io_kill_timeout(struct io_kiocb *req)
@@ -868,30 +879,26 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
 	 * safe side.
 	 */
 	nxt = list_first_entry_or_null(&req->link_list, struct io_kiocb, list);
-	while (nxt) {
+	if (nxt) {
 		list_del_init(&nxt->list);
 		if (!list_empty(&req->link_list)) {
 			INIT_LIST_HEAD(&nxt->link_list);
 			list_splice(&req->link_list, &nxt->link_list);
 			nxt->flags |= REQ_F_LINK;
+		} else if (req->flags & REQ_F_LINK_TIMEOUT) {
+			wake_ev = io_link_cancel_timeout(nxt);
+			nxt = NULL;
 		}
 
 		/*
 		 * If we're in async work, we can continue processing the chain
 		 * in this context instead of having to queue up new async work.
 		 */
-		if (req->flags & REQ_F_LINK_TIMEOUT) {
-			wake_ev = io_link_cancel_timeout(nxt);
-
-			/* we dropped this link, get next */
-			nxt = list_first_entry_or_null(&req->link_list,
-							struct io_kiocb, list);
-		} else if (nxtptr && io_wq_current_is_worker()) {
-			*nxtptr = nxt;
-			break;
-		} else {
-			io_queue_async_work(nxt);
-			break;
+		if (nxt) {
+			if (nxtptr && io_wq_current_is_worker())
+				*nxtptr = nxt;
+			else
+				io_queue_async_work(nxt);
 		}
 	}
 
@@ -916,6 +923,9 @@ static void io_fail_links(struct io_kiocb *req)
 
 		trace_io_uring_fail_link(req, link);
 
+		if (req->flags & REQ_F_FREE_SQE)
+			kfree(link->submit.sqe);
+
 		if ((req->flags & REQ_F_LINK_TIMEOUT) &&
 		    link->submit.sqe->opcode == IORING_OP_LINK_TIMEOUT) {
 			io_link_cancel_timeout(link);
@@ -2651,8 +2661,12 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
 
 	/* if a dependent link is ready, pass it back */
 	if (!ret && nxt) {
-		io_prep_async_work(nxt);
+		struct io_kiocb *link;
+
+		io_prep_async_work(nxt, &link);
 		*workptr = &nxt->work;
+		if (link)
+			io_queue_linked_timeout(link);
 	}
 }
 
@@ -2832,7 +2846,6 @@ static int io_validate_link_timeout(struct io_kiocb *req)
 static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
 {
 	struct io_kiocb *nxt;
-	int ret;
 
 	if (!(req->flags & REQ_F_LINK))
 		return NULL;
@@ -2841,14 +2854,6 @@ static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
 	if (!nxt || nxt->submit.sqe->opcode != IORING_OP_LINK_TIMEOUT)
 		return NULL;
 
-	ret = io_validate_link_timeout(nxt);
-	if (ret) {
-		list_del_init(&nxt->list);
-		io_cqring_add_event(nxt, ret);
-		io_double_put_req(nxt);
-		return ERR_PTR(-ECANCELED);
-	}
-
 	if (nxt->submit.sqe->timeout_flags & IORING_TIMEOUT_ABS)
 		nxt->timeout.data->mode = HRTIMER_MODE_ABS;
 	else
@@ -2862,16 +2867,9 @@ static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
 
 static void __io_queue_sqe(struct io_kiocb *req)
 {
-	struct io_kiocb *nxt;
+	struct io_kiocb *nxt = io_prep_linked_timeout(req);
 	int ret;
 
-	nxt = io_prep_linked_timeout(req);
-	if (IS_ERR(nxt)) {
-		ret = PTR_ERR(nxt);
-		nxt = NULL;
-		goto err;
-	}
-
 	ret = __io_submit_sqe(req, NULL, true);
 
 	/*
@@ -2899,10 +2897,6 @@ static void __io_queue_sqe(struct io_kiocb *req)
 			 * submit reference when the iocb is actually submitted.
 			 */
 			io_queue_async_work(req);
-
-			if (nxt)
-				io_queue_linked_timeout(nxt);
-
 			return;
 		}
 	}
@@ -2947,6 +2941,10 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 	int need_submit = false;
 	struct io_ring_ctx *ctx = req->ctx;
 
+	if (unlikely(req->flags & REQ_F_FAIL_LINK)) {
+		ret = -ECANCELED;
+		goto err;
+	}
 	if (!shadow) {
 		io_queue_sqe(req);
 		return;
@@ -2961,9 +2959,11 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
 	ret = io_req_defer(req);
 	if (ret) {
 		if (ret != -EIOCBQUEUED) {
+err:
 			io_cqring_add_event(req, ret);
 			io_double_put_req(req);
-			__io_free_req(shadow);
+			if (shadow)
+				__io_free_req(shadow);
 			return;
 		}
 	} else {
@@ -3020,6 +3020,14 @@ static void io_submit_sqe(struct io_kiocb *req, struct io_submit_state *state,
 	if (*link) {
 		struct io_kiocb *prev = *link;
 
+		if (READ_ONCE(s->sqe->opcode) == IORING_OP_LINK_TIMEOUT) {
+			ret = io_validate_link_timeout(req);
+			if (ret) {
+				prev->flags |= REQ_F_FAIL_LINK;
+				goto err_req;
+			}
+		}
+
 		sqe_copy = kmemdup(s->sqe, sizeof(*sqe_copy), GFP_KERNEL);
 		if (!sqe_copy) {
 			ret = -EAGAIN;

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 19:34             ` Jens Axboe
@ 2019-11-15 21:16               ` Jens Axboe
  2019-11-15 21:38                 ` Pavel Begunkov
  2019-11-15 21:22               ` Pavel Begunkov
  1 sibling, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-15 21:16 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

On 11/15/19 12:34 PM, Jens Axboe wrote:
> How about something like this? Should work (and be valid) to have any
> sequence of timeout links, as long as there's something in front of it.
> Commit message has more details.

Updated below (missed the sqe free), easiest to check out the repo
here:

https://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-post

as that will show the couple of prep patches, too. Let me know what
you think.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 19:34             ` Jens Axboe
  2019-11-15 21:16               ` Jens Axboe
@ 2019-11-15 21:22               ` Pavel Begunkov
  2019-11-15 21:26                 ` Jens Axboe
  1 sibling, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15 21:22 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 8505 bytes --]

On 15/11/2019 22:34, Jens Axboe wrote:
> How about something like this? Should work (and be valid) to have any
> sequence of timeout links, as long as there's something in front of it.
> Commit message has more details.

If you don't mind, I'll give a try rewriting this. A bit tight
on time, so hopefully by this Sunday.

In any case, there are enough of edge cases, I need to spend some
time to review and check it.

> 
> 
> commit 437fd0500d08f32444747b861c72cd58a4b833d5
> Author: Jens Axboe <axboe@kernel.dk>
> Date:   Thu Nov 14 19:39:52 2019 -0700
> 
>     io_uring: fix sequencing issues with linked timeouts
>     
>     We have an issue with timeout links that are deeper in the submit chain,
>     because we only handle it upfront, not from later submissions. Move the
>     prep + issue of the timeout link to the async work prep handler, and do
>     it normally for non-async queue. If we validate and prepare the timeout
>     links upfront when we first see them, there's nothing stopping us from
>     supporting any sort of nesting.
>     
>     Ensure that we handle the sequencing of linked timeouts correctly as
>     well. The loop in io_req_link_next() isn't necessary, and it will never
>     do anything, because we can't have both REQ_F_LINK_TIMEOUT set and have
>     dependent links.

REQ1 -> LINKED_TIMEOUT -> REQ2 -> REQ3
Is this a valid case? Just to check that I got this "can't have both" right.
If no, why so? I think there are a lot of use cases for this.


>     
>     Reported-by: Pavel Begunkov <asml.silence@gmail.com>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a0204b48866f..47d61899b8ba 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -352,6 +352,7 @@ struct io_kiocb {
>  #define REQ_F_MUST_PUNT		4096	/* must be punted even for NONBLOCK */
>  #define REQ_F_INFLIGHT		8192	/* on inflight list */
>  #define REQ_F_COMP_LOCKED	16384	/* completion under lock */
> +#define REQ_F_FREE_SQE		32768
>  	u64			user_data;
>  	u32			result;
>  	u32			sequence;
> @@ -390,6 +391,8 @@ static void __io_free_req(struct io_kiocb *req);
>  static void io_put_req(struct io_kiocb *req);
>  static void io_double_put_req(struct io_kiocb *req);
>  static void __io_double_put_req(struct io_kiocb *req);
> +static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req);
> +static void io_queue_linked_timeout(struct io_kiocb *req);
>  
>  static struct kmem_cache *req_cachep;
>  
> @@ -524,7 +527,8 @@ static inline bool io_sqe_needs_user(const struct io_uring_sqe *sqe)
>  		 opcode == IORING_OP_WRITE_FIXED);
>  }
>  
> -static inline bool io_prep_async_work(struct io_kiocb *req)
> +static inline bool io_prep_async_work(struct io_kiocb *req,
> +				      struct io_kiocb **link)
>  {
>  	bool do_hashed = false;
>  
> @@ -553,13 +557,17 @@ static inline bool io_prep_async_work(struct io_kiocb *req)
>  			req->work.flags |= IO_WQ_WORK_NEEDS_USER;
>  	}
>  
> +	*link = io_prep_linked_timeout(req);
>  	return do_hashed;
>  }
>  
>  static inline void io_queue_async_work(struct io_kiocb *req)
>  {
> -	bool do_hashed = io_prep_async_work(req);
>  	struct io_ring_ctx *ctx = req->ctx;
> +	struct io_kiocb *link;
> +	bool do_hashed;
> +
> +	do_hashed = io_prep_async_work(req, &link);
>  
>  	trace_io_uring_queue_async_work(ctx, do_hashed, req, &req->work,
>  					req->flags);
> @@ -569,6 +577,9 @@ static inline void io_queue_async_work(struct io_kiocb *req)
>  		io_wq_enqueue_hashed(ctx->io_wq, &req->work,
>  					file_inode(req->file));
>  	}
> +
> +	if (link)
> +		io_queue_linked_timeout(link);
>  }
>  
>  static void io_kill_timeout(struct io_kiocb *req)
> @@ -868,30 +879,26 @@ static void io_req_link_next(struct io_kiocb *req, struct io_kiocb **nxtptr)
>  	 * safe side.
>  	 */
>  	nxt = list_first_entry_or_null(&req->link_list, struct io_kiocb, list);
> -	while (nxt) {
> +	if (nxt) {
>  		list_del_init(&nxt->list);
>  		if (!list_empty(&req->link_list)) {
>  			INIT_LIST_HEAD(&nxt->link_list);
>  			list_splice(&req->link_list, &nxt->link_list);
>  			nxt->flags |= REQ_F_LINK;
> +		} else if (req->flags & REQ_F_LINK_TIMEOUT) {
> +			wake_ev = io_link_cancel_timeout(nxt);
> +			nxt = NULL;
>  		}
>  
>  		/*
>  		 * If we're in async work, we can continue processing the chain
>  		 * in this context instead of having to queue up new async work.
>  		 */
> -		if (req->flags & REQ_F_LINK_TIMEOUT) {
> -			wake_ev = io_link_cancel_timeout(nxt);
> -
> -			/* we dropped this link, get next */
> -			nxt = list_first_entry_or_null(&req->link_list,
> -							struct io_kiocb, list);
> -		} else if (nxtptr && io_wq_current_is_worker()) {
> -			*nxtptr = nxt;
> -			break;
> -		} else {
> -			io_queue_async_work(nxt);
> -			break;
> +		if (nxt) {
> +			if (nxtptr && io_wq_current_is_worker())
> +				*nxtptr = nxt;
> +			else
> +				io_queue_async_work(nxt);
>  		}
>  	}
>  
> @@ -916,6 +923,9 @@ static void io_fail_links(struct io_kiocb *req)
>  
>  		trace_io_uring_fail_link(req, link);
>  
> +		if (req->flags & REQ_F_FREE_SQE)
> +			kfree(link->submit.sqe);
> +
>  		if ((req->flags & REQ_F_LINK_TIMEOUT) &&
>  		    link->submit.sqe->opcode == IORING_OP_LINK_TIMEOUT) {
>  			io_link_cancel_timeout(link);
> @@ -2651,8 +2661,12 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
>  
>  	/* if a dependent link is ready, pass it back */
>  	if (!ret && nxt) {
> -		io_prep_async_work(nxt);
> +		struct io_kiocb *link;
> +
> +		io_prep_async_work(nxt, &link);
>  		*workptr = &nxt->work;
> +		if (link)
> +			io_queue_linked_timeout(link);
>  	}
>  }
>  
> @@ -2832,7 +2846,6 @@ static int io_validate_link_timeout(struct io_kiocb *req)
>  static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
>  {
>  	struct io_kiocb *nxt;
> -	int ret;
>  
>  	if (!(req->flags & REQ_F_LINK))
>  		return NULL;
> @@ -2841,14 +2854,6 @@ static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
>  	if (!nxt || nxt->submit.sqe->opcode != IORING_OP_LINK_TIMEOUT)
>  		return NULL;
>  
> -	ret = io_validate_link_timeout(nxt);
> -	if (ret) {
> -		list_del_init(&nxt->list);
> -		io_cqring_add_event(nxt, ret);
> -		io_double_put_req(nxt);
> -		return ERR_PTR(-ECANCELED);
> -	}
> -
>  	if (nxt->submit.sqe->timeout_flags & IORING_TIMEOUT_ABS)
>  		nxt->timeout.data->mode = HRTIMER_MODE_ABS;
>  	else
> @@ -2862,16 +2867,9 @@ static struct io_kiocb *io_prep_linked_timeout(struct io_kiocb *req)
>  
>  static void __io_queue_sqe(struct io_kiocb *req)
>  {
> -	struct io_kiocb *nxt;
> +	struct io_kiocb *nxt = io_prep_linked_timeout(req);
>  	int ret;
>  
> -	nxt = io_prep_linked_timeout(req);
> -	if (IS_ERR(nxt)) {
> -		ret = PTR_ERR(nxt);
> -		nxt = NULL;
> -		goto err;
> -	}
> -
>  	ret = __io_submit_sqe(req, NULL, true);
>  
>  	/*
> @@ -2899,10 +2897,6 @@ static void __io_queue_sqe(struct io_kiocb *req)
>  			 * submit reference when the iocb is actually submitted.
>  			 */
>  			io_queue_async_work(req);
> -
> -			if (nxt)
> -				io_queue_linked_timeout(nxt);
> -
>  			return;
>  		}
>  	}
> @@ -2947,6 +2941,10 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>  	int need_submit = false;
>  	struct io_ring_ctx *ctx = req->ctx;
>  
> +	if (unlikely(req->flags & REQ_F_FAIL_LINK)) {
> +		ret = -ECANCELED;
> +		goto err;
> +	}
>  	if (!shadow) {
>  		io_queue_sqe(req);
>  		return;
> @@ -2961,9 +2959,11 @@ static void io_queue_link_head(struct io_kiocb *req, struct io_kiocb *shadow)
>  	ret = io_req_defer(req);
>  	if (ret) {
>  		if (ret != -EIOCBQUEUED) {
> +err:
>  			io_cqring_add_event(req, ret);
>  			io_double_put_req(req);
> -			__io_free_req(shadow);
> +			if (shadow)
> +				__io_free_req(shadow);
>  			return;
>  		}
>  	} else {
> @@ -3020,6 +3020,14 @@ static void io_submit_sqe(struct io_kiocb *req, struct io_submit_state *state,
>  	if (*link) {
>  		struct io_kiocb *prev = *link;
>  
> +		if (READ_ONCE(s->sqe->opcode) == IORING_OP_LINK_TIMEOUT) {
> +			ret = io_validate_link_timeout(req);
> +			if (ret) {
> +				prev->flags |= REQ_F_FAIL_LINK;
> +				goto err_req;
> +			}
> +		}
> +
>  		sqe_copy = kmemdup(s->sqe, sizeof(*sqe_copy), GFP_KERNEL);
>  		if (!sqe_copy) {
>  			ret = -EAGAIN;
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 21:22               ` Pavel Begunkov
@ 2019-11-15 21:26                 ` Jens Axboe
  2019-11-19 21:11                   ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-15 21:26 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

On 11/15/19 2:22 PM, Pavel Begunkov wrote:
> On 15/11/2019 22:34, Jens Axboe wrote:
>> How about something like this? Should work (and be valid) to have any
>> sequence of timeout links, as long as there's something in front of it.
>> Commit message has more details.
> 
> If you don't mind, I'll give a try rewriting this. A bit tight
> on time, so hopefully by this Sunday.
> 
> In any case, there are enough of edge cases, I need to spend some
> time to review and check it.

Of course, appreciate more eyes on this for sure. We'll see what happens
with 5.4 release, I suspect it won't happen until 11/24. In any case,
this is not really staged yet, just sitting in for-5.5/io_uring-post
as part of a series that'll likely go in after the initial merge.

> REQ1 -> LINKED_TIMEOUT -> REQ2 -> REQ3
> Is this a valid case? Just to check that I got this "can't have both" right.
> If no, why so? I think there are a lot of use cases for this.

Yes, it's valid. With the recently posted stuff, the only invalid case
is having a linked timeout as the first entry since that's nonsensical.
It has to be linked from a previous request. We no longer need to
restrict where the linked timeout appears otherwise.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 21:16               ` Jens Axboe
@ 2019-11-15 21:38                 ` Pavel Begunkov
  2019-11-15 22:15                   ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15 21:38 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 978 bytes --]

On 16/11/2019 00:16, Jens Axboe wrote:
> On 11/15/19 12:34 PM, Jens Axboe wrote:
>> How about something like this? Should work (and be valid) to have any
>> sequence of timeout links, as long as there's something in front of it.
>> Commit message has more details.
> 
> Updated below (missed the sqe free), easiest to check out the repo
> here:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-post
> 
> as that will show the couple of prep patches, too. Let me know what
> you think.
> 

Sure,

BTW, found "io_uring: make io_double_put_req() use normal completion
path" in the tree. And it do exactly the same, what my patch was doing,
the one which "blowed" the link test :)

I'd add there "req->flags | REQ_F_FAIL_LINK" in-between failed
io_req_defer() and calling io_double_put_req(). (in 2 places)
Otherwise, even though a request failed, it will enqueue the rest
of its link with io_queue_async_work().

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 21:38                 ` Pavel Begunkov
@ 2019-11-15 22:15                   ` Jens Axboe
  2019-11-15 22:19                     ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2019-11-15 22:15 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

On 11/15/19 2:38 PM, Pavel Begunkov wrote:
> On 16/11/2019 00:16, Jens Axboe wrote:
>> On 11/15/19 12:34 PM, Jens Axboe wrote:
>>> How about something like this? Should work (and be valid) to have any
>>> sequence of timeout links, as long as there's something in front of it.
>>> Commit message has more details.
>>
>> Updated below (missed the sqe free), easiest to check out the repo
>> here:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-post
>>
>> as that will show the couple of prep patches, too. Let me know what
>> you think.
>>
> 
> Sure,
> 
> BTW, found "io_uring: make io_double_put_req() use normal completion
> path" in the tree. And it do exactly the same, what my patch was doing,
> the one which "blowed" the link test :)

Hah yes, you are right, you never did resend it though. I'll get
rid of the one I have, and replace with your original (but with
the arguments fixed).

> I'd add there "req->flags | REQ_F_FAIL_LINK" in-between failed
> io_req_defer() and calling io_double_put_req(). (in 2 places)
> Otherwise, even though a request failed, it will enqueue the rest
> of its link with io_queue_async_work().

Good point, updating now.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 22:15                   ` Jens Axboe
@ 2019-11-15 22:19                     ` Pavel Begunkov
  2019-11-15 22:23                       ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15 22:19 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 1355 bytes --]

On 16/11/2019 01:15, Jens Axboe wrote:
> On 11/15/19 2:38 PM, Pavel Begunkov wrote:
>> On 16/11/2019 00:16, Jens Axboe wrote:
>>> On 11/15/19 12:34 PM, Jens Axboe wrote:
>>>> How about something like this? Should work (and be valid) to have any
>>>> sequence of timeout links, as long as there's something in front of it.
>>>> Commit message has more details.
>>>
>>> Updated below (missed the sqe free), easiest to check out the repo
>>> here:
>>>
>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-post
>>>
>>> as that will show the couple of prep patches, too. Let me know what
>>> you think.
>>>
>>
>> Sure,
>>
>> BTW, found "io_uring: make io_double_put_req() use normal completion
>> path" in the tree. And it do exactly the same, what my patch was doing,
>> the one which "blowed" the link test :)
> 
> Hah yes, you are right, you never did resend it though. I'll get
> rid of the one I have, and replace with your original (but with
> the arguments fixed).
> 
Just keep yours, it's better :)

>> I'd add there "req->flags | REQ_F_FAIL_LINK" in-between failed
>> io_req_defer() and calling io_double_put_req(). (in 2 places)
>> Otherwise, even though a request failed, it will enqueue the rest
>> of its link with io_queue_async_work().
> 
> Good point, updating now.
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 22:19                     ` Pavel Begunkov
@ 2019-11-15 22:23                       ` Pavel Begunkov
  2019-11-15 22:25                         ` Jens Axboe
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-15 22:23 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 1525 bytes --]

On 16/11/2019 01:19, Pavel Begunkov wrote:
> On 16/11/2019 01:15, Jens Axboe wrote:
>> On 11/15/19 2:38 PM, Pavel Begunkov wrote:
>>> On 16/11/2019 00:16, Jens Axboe wrote:
>>>> On 11/15/19 12:34 PM, Jens Axboe wrote:
>>>>> How about something like this? Should work (and be valid) to have any
>>>>> sequence of timeout links, as long as there's something in front of it.
>>>>> Commit message has more details.
>>>>
>>>> Updated below (missed the sqe free), easiest to check out the repo
>>>> here:
>>>>
>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-post
>>>>
>>>> as that will show the couple of prep patches, too. Let me know what
>>>> you think.
>>>>
>>>
>>> Sure,
>>>
>>> BTW, found "io_uring: make io_double_put_req() use normal completion
>>> path" in the tree. And it do exactly the same, what my patch was doing,
>>> the one which "blowed" the link test :)
>>
>> Hah yes, you are right, you never did resend it though. I'll get
>> rid of the one I have, and replace with your original (but with
>> the arguments fixed).
>>
> Just keep yours, it's better :)

Moreover, mine have one extra REQ_F_FAIL_LINK, which really
should not be there.

> 
>>> I'd add there "req->flags | REQ_F_FAIL_LINK" in-between failed
>>> io_req_defer() and calling io_double_put_req(). (in 2 places)
>>> Otherwise, even though a request failed, it will enqueue the rest
>>> of its link with io_queue_async_work().
>>
>> Good point, updating now.
>>
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 22:23                       ` Pavel Begunkov
@ 2019-11-15 22:25                         ` Jens Axboe
  0 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2019-11-15 22:25 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

On 11/15/19 3:23 PM, Pavel Begunkov wrote:
> On 16/11/2019 01:19, Pavel Begunkov wrote:
>> On 16/11/2019 01:15, Jens Axboe wrote:
>>> On 11/15/19 2:38 PM, Pavel Begunkov wrote:
>>>> On 16/11/2019 00:16, Jens Axboe wrote:
>>>>> On 11/15/19 12:34 PM, Jens Axboe wrote:
>>>>>> How about something like this? Should work (and be valid) to have any
>>>>>> sequence of timeout links, as long as there's something in front of it.
>>>>>> Commit message has more details.
>>>>>
>>>>> Updated below (missed the sqe free), easiest to check out the repo
>>>>> here:
>>>>>
>>>>> https://git.kernel.dk/cgit/linux-block/log/?h=for-5.5/io_uring-post
>>>>>
>>>>> as that will show the couple of prep patches, too. Let me know what
>>>>> you think.
>>>>>
>>>>
>>>> Sure,
>>>>
>>>> BTW, found "io_uring: make io_double_put_req() use normal completion
>>>> path" in the tree. And it do exactly the same, what my patch was doing,
>>>> the one which "blowed" the link test :)
>>>
>>> Hah yes, you are right, you never did resend it though. I'll get
>>> rid of the one I have, and replace with your original (but with
>>> the arguments fixed).
>>>
>> Just keep yours, it's better :)
> 
> Moreover, mine have one extra REQ_F_FAIL_LINK, which really
> should not be there.

Gotcha, ok I'll just keep it as-is.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHSET 0/2] io_uring support for linked timeouts
  2019-11-15 21:26                 ` Jens Axboe
@ 2019-11-19 21:11                   ` Pavel Begunkov
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2019-11-19 21:11 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block; +Cc: zeba.hrvoje, liuyun01

[-- Attachment #1.1: Type: text/plain, Size: 2292 bytes --]

On 16/11/2019 00:26, Jens Axboe wrote:
> On 11/15/19 2:22 PM, Pavel Begunkov wrote:
>> On 15/11/2019 22:34, Jens Axboe wrote:
>>> How about something like this? Should work (and be valid) to have any
>>> sequence of timeout links, as long as there's something in front of it.
>>> Commit message has more details.
>>
>> If you don't mind, I'll give a try rewriting this. A bit tight
>> on time, so hopefully by this Sunday.
>>
>> In any case, there are enough of edge cases, I need to spend some
>> time to review and check it.
> 
> Of course, appreciate more eyes on this for sure. We'll see what happens
> with 5.4 release, I suspect it won't happen until 11/24. In any case,
> this is not really staged yet, just sitting in for-5.5/io_uring-post
> as part of a series that'll likely go in after the initial merge.
> 
Tangled up in cancellation and io-wq, so no reworking here until I
understand it well enough. I've sent a separate series for what spotted

I've wanted to return back a version with queuing linked timeout
before issuing request, but make the timeout callback mark the
master request as cancelled. e.g. 

io_link_timeout_fn(req) {
	req->work.flags |= IO_WQ_WORK_CANCEL;
	try_to_cancel();
}
issue_req(req) {
	next = get_next(req);
	queue_linked_timeout(next);
	__issue_req(req);
}



There is another point to discuss until de-staged it. Did you consider
reverse syntax? e.g. LINKED_TIMEOUT -> REQ instead.

I think it's a bit more consistent, as any request affects only those
on right of it (from the user perspective).

Also, I'm tempted to implement the scheme above, and after remove all
linked timeout handling from submission path and move it into
io_issue_sqe() (last __io_submit_sqe) as yet another case in the switch.


>> REQ1 -> LINKED_TIMEOUT -> REQ2 -> REQ3
>> Is this a valid case? Just to check that I got this "can't have both" right.
>> If no, why so? I think there are a lot of use cases for this.
> 
> Yes, it's valid. With the recently posted stuff, the only invalid case
> is having a linked timeout as the first entry since that's nonsensical.
> It has to be linked from a previous request. We no longer need to
> restrict where the linked timeout appears otherwise.
> 

-- 
Pavel Begunkov


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-05 21:11 [PATCHSET 0/2] io_uring support for linked timeouts Jens Axboe
2019-11-05 21:11 ` [PATCH 1/2] io_uring: abstract out io_async_cancel_one() helper Jens Axboe
2019-11-05 21:11 ` [PATCH 2/2] io_uring: add support for linked SQE timeouts Jens Axboe
2019-11-14 21:24 ` [PATCHSET 0/2] io_uring support for linked timeouts Pavel Begunkov
2019-11-14 22:37   ` Jens Axboe
2019-11-15  9:40     ` Pavel Begunkov
2019-11-15 14:21       ` Pavel Begunkov
2019-11-15 15:13         ` Jens Axboe
2019-11-15 17:11           ` Pavel Begunkov
2019-11-15 19:34             ` Jens Axboe
2019-11-15 21:16               ` Jens Axboe
2019-11-15 21:38                 ` Pavel Begunkov
2019-11-15 22:15                   ` Jens Axboe
2019-11-15 22:19                     ` Pavel Begunkov
2019-11-15 22:23                       ` Pavel Begunkov
2019-11-15 22:25                         ` Jens Axboe
2019-11-15 21:22               ` Pavel Begunkov
2019-11-15 21:26                 ` Jens Axboe
2019-11-19 21:11                   ` Pavel Begunkov

IO-Uring Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/io-uring/0 io-uring/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 io-uring io-uring/ https://lore.kernel.org/io-uring \
		io-uring@vger.kernel.org
	public-inbox-index io-uring

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.io-uring


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git