* [PATCH 0/2][for-next] cleanup submission path
@ 2019-10-27 15:35 Pavel Begunkov
  2019-10-27 15:35 ` [PATCH 1/2] io_uring: handle mm_fault outside of submission Pavel Begunkov
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 15:35 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel

A small cleanup of very similar but diverged io_submit_sqes() and
io_ring_submit().

Pavel Begunkov (2):
  io_uring: handle mm_fault outside of submission
  io_uring: merge io_submit_sqes and io_ring_submit

 fs/io_uring.c | 116 ++++++++++++++------------------------------------
 1 file changed, 33 insertions(+), 83 deletions(-)

-- 
2.23.0



* [PATCH 1/2] io_uring: handle mm_fault outside of submission
  2019-10-27 15:35 [PATCH 0/2][for-next] cleanup submission path Pavel Begunkov
@ 2019-10-27 15:35 ` Pavel Begunkov
  2019-10-27 15:35 ` [PATCH 2/2] io_uring: merge io_submit_sqes and io_ring_submit Pavel Begunkov
  2019-10-27 16:32 ` [PATCH 0/2][for-next] cleanup submission path Jens Axboe
  2 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 15:35 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel

Preparation for the following patch: let callers of io_submit_sqes()
handle the mm_fault case themselves.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 76cc8add9e77..f65727f2ba95 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2640,7 +2640,7 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
 }
 
 static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
-			  bool has_user, bool mm_fault)
+			  bool has_user)
 {
 	struct io_submit_state state, *statep = NULL;
 	struct io_kiocb *link = NULL;
@@ -2682,17 +2682,12 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 		}
 
 out:
-		if (unlikely(mm_fault)) {
-			io_cqring_add_event(ctx, s.sqe->user_data,
-						-EFAULT);
-		} else {
-			s.has_user = has_user;
-			s.in_async = true;
-			s.needs_fixed_file = true;
-			trace_io_uring_submit_sqe(ctx, true, true);
-			io_submit_sqe(ctx, &s, statep, &link);
-			submitted++;
-		}
+		s.has_user = has_user;
+		s.in_async = true;
+		s.needs_fixed_file = true;
+		trace_io_uring_submit_sqe(ctx, true, true);
+		io_submit_sqe(ctx, &s, statep, &link);
+		submitted++;
 	}
 
 	if (link)
@@ -2703,6 +2698,16 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 	return submitted;
 }
 
+static void io_fail_all_sqes(struct io_ring_ctx *ctx)
+{
+	struct sqe_submit s;
+
+	while (io_get_sqring(ctx, &s))
+		io_cqring_add_event(ctx, s.sqe->user_data, -EFAULT);
+
+	io_commit_sqring(ctx);
+}
+
 static int io_sq_thread(void *data)
 {
 	struct io_ring_ctx *ctx = data;
@@ -2813,12 +2818,15 @@ static int io_sq_thread(void *data)
 			}
 		}
 
-		to_submit = min(to_submit, ctx->sq_entries);
-		inflight += io_submit_sqes(ctx, to_submit, cur_mm != NULL,
-					   mm_fault);
-
-		/* Commit SQ ring head once we've consumed all SQEs */
-		io_commit_sqring(ctx);
+		if (unlikely(mm_fault)) {
+			io_fail_all_sqes(ctx);
+		} else {
+			to_submit = min(to_submit, ctx->sq_entries);
+			inflight += io_submit_sqes(ctx, to_submit,
+						   cur_mm != NULL);
+			/* Commit SQ ring head once we've consumed all SQEs */
+			io_commit_sqring(ctx);
+		}
 	}
 
 	set_fs(old_fs);
-- 
2.23.0



* [PATCH 2/2] io_uring: merge io_submit_sqes and io_ring_submit
  2019-10-27 15:35 [PATCH 0/2][for-next] cleanup submission path Pavel Begunkov
  2019-10-27 15:35 ` [PATCH 1/2] io_uring: handle mm_fault outside of submission Pavel Begunkov
@ 2019-10-27 15:35 ` Pavel Begunkov
  2019-10-27 16:32 ` [PATCH 0/2][for-next] cleanup submission path Jens Axboe
  2 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 15:35 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel

io_submit_sqes() and io_ring_submit() are mostly identical now, except
for several flags. That is error-prone, as a change usually requires
synchronously updating both of them.
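
After the merge, the two call sites differ only in the arguments they
pass; a summary of both, as they appear in the hunks below:

	/* sqthread-driven submission */
	io_submit_sqes(ctx, to_submit, NULL, -1, cur_mm != NULL, true);
	/* io_uring_enter()-driven submission */
	io_submit_sqes(ctx, to_submit, f.file, fd, true, false);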

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/io_uring.c | 88 +++++++++------------------------------------------
 1 file changed, 15 insertions(+), 73 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f65727f2ba95..949faf14345e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2640,7 +2640,8 @@ static bool io_get_sqring(struct io_ring_ctx *ctx, struct sqe_submit *s)
 }
 
 static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
-			  bool has_user)
+			  struct file *ring_file, int ring_fd,
+			  bool has_user, bool in_async)
 {
 	struct io_submit_state state, *statep = NULL;
 	struct io_kiocb *link = NULL;
@@ -2682,10 +2683,12 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 		}
 
 out:
+		s.ring_file = ring_file;
 		s.has_user = has_user;
-		s.in_async = true;
-		s.needs_fixed_file = true;
-		trace_io_uring_submit_sqe(ctx, true, true);
+		s.in_async = in_async;
+		s.needs_fixed_file = in_async;
+		s.ring_fd = ring_fd;
+		trace_io_uring_submit_sqe(ctx, true, in_async);
 		io_submit_sqe(ctx, &s, statep, &link);
 		submitted++;
 	}
@@ -2693,7 +2696,10 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
 	if (link)
 		io_queue_link_head(ctx, link, &link->submit, shadow_req);
 	if (statep)
-		io_submit_state_end(&state);
+		io_submit_state_end(statep);
+
+	/* Commit SQ ring head once we've consumed all SQEs */
+	io_commit_sqring(ctx);
 
 	return submitted;
 }
@@ -2822,10 +2828,8 @@ static int io_sq_thread(void *data)
 			io_fail_all_sqes(ctx);
 		} else {
 			to_submit = min(to_submit, ctx->sq_entries);
-			inflight += io_submit_sqes(ctx, to_submit,
-						   cur_mm != NULL);
-			/* Commit SQ ring head once we've consumed all SQEs */
-			io_commit_sqring(ctx);
+			inflight += io_submit_sqes(ctx, to_submit, NULL, -1,
+						   cur_mm != NULL, true);
 		}
 	}
 
@@ -2840,69 +2844,6 @@ static int io_sq_thread(void *data)
 	return 0;
 }
 
-static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit,
-			  struct file *ring_file, int ring_fd)
-{
-	struct io_submit_state state, *statep = NULL;
-	struct io_kiocb *link = NULL;
-	struct io_kiocb *shadow_req = NULL;
-	bool prev_was_link = false;
-	int i, submit = 0;
-
-	if (to_submit > IO_PLUG_THRESHOLD) {
-		io_submit_state_start(&state, ctx, to_submit);
-		statep = &state;
-	}
-
-	for (i = 0; i < to_submit; i++) {
-		struct sqe_submit s;
-
-		if (!io_get_sqring(ctx, &s))
-			break;
-
-		/*
-		 * If previous wasn't linked and we have a linked command,
-		 * that's the end of the chain. Submit the previous link.
-		 */
-		if (!prev_was_link && link) {
-			io_queue_link_head(ctx, link, &link->submit, shadow_req);
-			link = NULL;
-			shadow_req = NULL;
-		}
-		prev_was_link = (s.sqe->flags & IOSQE_IO_LINK) != 0;
-
-		if (link && (s.sqe->flags & IOSQE_IO_DRAIN)) {
-			if (!shadow_req) {
-				shadow_req = io_get_req(ctx, NULL);
-				if (unlikely(!shadow_req))
-					goto out;
-				shadow_req->flags |= (REQ_F_IO_DRAIN | REQ_F_SHADOW_DRAIN);
-				refcount_dec(&shadow_req->refs);
-			}
-			shadow_req->sequence = s.sequence;
-		}
-
-out:
-		s.ring_file = ring_file;
-		s.has_user = true;
-		s.in_async = false;
-		s.needs_fixed_file = false;
-		s.ring_fd = ring_fd;
-		submit++;
-		trace_io_uring_submit_sqe(ctx, true, false);
-		io_submit_sqe(ctx, &s, statep, &link);
-	}
-
-	if (link)
-		io_queue_link_head(ctx, link, &link->submit, shadow_req);
-	if (statep)
-		io_submit_state_end(statep);
-
-	io_commit_sqring(ctx);
-
-	return submit;
-}
-
 struct io_wait_queue {
 	struct wait_queue_entry wq;
 	struct io_ring_ctx *ctx;
@@ -4027,7 +3968,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		to_submit = min(to_submit, ctx->sq_entries);
 
 		mutex_lock(&ctx->uring_lock);
-		submitted = io_ring_submit(ctx, to_submit, f.file, fd);
+		submitted = io_submit_sqes(ctx, to_submit, f.file, fd,
+					   true, false);
 		mutex_unlock(&ctx->uring_lock);
 	}
 	if (flags & IORING_ENTER_GETEVENTS) {
-- 
2.23.0



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 15:35 [PATCH 0/2][for-next] cleanup submission path Pavel Begunkov
  2019-10-27 15:35 ` [PATCH 1/2] io_uring: handle mm_fault outside of submission Pavel Begunkov
  2019-10-27 15:35 ` [PATCH 2/2] io_uring: merge io_submit_sqes and io_ring_submit Pavel Begunkov
@ 2019-10-27 16:32 ` Jens Axboe
  2019-10-27 16:44   ` Pavel Begunkov
  2 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2019-10-27 16:32 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 9:35 AM, Pavel Begunkov wrote:
> A small cleanup of very similar but diverged io_submit_sqes() and
> io_ring_submit()
> 
> Pavel Begunkov (2):
>    io_uring: handle mm_fault outside of submission
>    io_uring: merge io_submit_sqes and io_ring_submit
> 
>   fs/io_uring.c | 116 ++++++++++++++------------------------------------
>   1 file changed, 33 insertions(+), 83 deletions(-)

I like the cleanups here, but one thing that seems off is the
assumption that io_sq_thread() always needs to grab the mm. If
the sqes processed are just READ/WRITE_FIXED, then it never needs
to grab the mm.
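
The check that gates this is io_sqe_needs_user(); roughly, a sketch of
how it reads in fs/io_uring.c around this time:

	static inline bool io_sqe_needs_user(const struct io_uring_sqe *sqe)
	{
		return !(sqe->opcode == IORING_OP_READ_FIXED ||
			 sqe->opcode == IORING_OP_WRITE_FIXED);
	}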

-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 16:32 ` [PATCH 0/2][for-next] cleanup submission path Jens Axboe
@ 2019-10-27 16:44   ` Pavel Begunkov
  2019-10-27 16:49     ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 16:44 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 27/10/2019 19:32, Jens Axboe wrote:
> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>> A small cleanup of very similar but diverged io_submit_sqes() and
>> io_ring_submit()
>>
>> Pavel Begunkov (2):
>>    io_uring: handle mm_fault outside of submission
>>    io_uring: merge io_submit_sqes and io_ring_submit
>>
>>   fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>   1 file changed, 33 insertions(+), 83 deletions(-)
> 
> I like the cleanups here, but one thing that seems off is the
> assumption that io_sq_thread() always needs to grab the mm. If
> the sqes processed are just READ/WRITE_FIXED, then it never needs
> to grab the mm.

Yeah, we removed it to fix bugs. Personally, I think it would be
clearer to do lazy grabbing conditionally, rather than have two
functions. And in this case it's easier to do after merging.

Do you prefer to return it back first?
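
For illustration, lazy grabbing would mean something like the following
inside the single submit loop -- a hypothetical sketch, assuming the
usual mmget_not_zero()/use_mm() dance on ctx->sqo_mm, with the failure
case left aside for now:

	if (io_sqe_needs_user(s.sqe) && !cur_mm) {
		/* grab the mm on demand, only for sqes that need it */
		if (mmget_not_zero(ctx->sqo_mm)) {
			use_mm(ctx->sqo_mm);
			cur_mm = ctx->sqo_mm;
		}
	}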

-- 
Yours sincerely,
Pavel Begunkov




* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 16:44   ` Pavel Begunkov
@ 2019-10-27 16:49     ` Jens Axboe
  2019-10-27 16:56       ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2019-10-27 16:49 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 10:44 AM, Pavel Begunkov wrote:
> On 27/10/2019 19:32, Jens Axboe wrote:
>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>> io_ring_submit()
>>>
>>> Pavel Begunkov (2):
>>>     io_uring: handle mm_fault outside of submission
>>>     io_uring: merge io_submit_sqes and io_ring_submit
>>>
>>>    fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>    1 file changed, 33 insertions(+), 83 deletions(-)
>>
>> I like the cleanups here, but one thing that seems off is the
>> assumption that io_sq_thread() always needs to grab the mm. If
>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>> to grab the mm.
> Yeah, we removed it to fix bugs. Personally, I think it would be
> clearer to do lazy grabbing conditionally, rather than have two
> functions. And in this case it's easier to do after merging.
> 
> Do you prefer to return it back first?

Ah I see, no I don't care about that.


-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 16:49     ` Jens Axboe
@ 2019-10-27 16:56       ` Jens Axboe
  2019-10-27 17:19         ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2019-10-27 16:56 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 10:49 AM, Jens Axboe wrote:
> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>> On 27/10/2019 19:32, Jens Axboe wrote:
>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>> io_ring_submit()
>>>>
>>>> Pavel Begunkov (2):
>>>>      io_uring: handle mm_fault outside of submission
>>>>      io_uring: merge io_submit_sqes and io_ring_submit
>>>>
>>>>     fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>     1 file changed, 33 insertions(+), 83 deletions(-)
>>>
>>> I like the cleanups here, but one thing that seems off is the
>>> assumption that io_sq_thread() always needs to grab the mm. If
>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>> to grab the mm.
>> Yeah, we removed it to fix bugs. Personally, I think it would be
>> clearer to do lazy grabbing conditionally, rather than have two
>> functions. And in this case it's easier to do after merging.
>>
>> Do you prefer to return it back first?
> 
> Ah I see, no I don't care about that.

OK, looked at the post-patches state. It's still not correct. You are
grabbing the mm from io_sq_thread() unconditionally. We should not do
that, only if the sqes we need to submit need mm context.

-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 16:56       ` Jens Axboe
@ 2019-10-27 17:19         ` Pavel Begunkov
  2019-10-27 17:26           ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 17:19 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 27/10/2019 19:56, Jens Axboe wrote:
> On 10/27/19 10:49 AM, Jens Axboe wrote:
>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>> io_ring_submit()
>>>>>
>>>>> Pavel Begunkov (2):
>>>>>      io_uring: handle mm_fault outside of submission
>>>>>      io_uring: merge io_submit_sqes and io_ring_submit
>>>>>
>>>>>     fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>     1 file changed, 33 insertions(+), 83 deletions(-)
>>>>
>>>> I like the cleanups here, but one thing that seems off is the
>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>> to grab the mm.
>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>> clearer to do lazy grabbing conditionally, rather than have two
>>> functions. And in this case it's easier to do after merging.
>>>
>>> Do you prefer to return it back first?
>>
>> Ah I see, no I don't care about that.
> 
> OK, looked at the post-patches state. It's still not correct. You are
> grabbing the mm from io_sq_thread() unconditionally. We should not do
> that, only if the sqes we need to submit need mm context.
> 
That's what my question about the fix was getting at :)
1. Then, in what case could it fail?
2. Is it OK to hold it while polling? It could keep it for quite
a long time if the host is swift, e.g. submit->poll->submit->poll-> ...

Anyway, I will add it back and resend the patchset.
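
(On question 2: the sq thread can drop the borrowed mm before it goes
to sleep, along these lines -- a sketch of the pattern io_sq_thread()
already uses before napping:)

	if (cur_mm) {
		unuse_mm(cur_mm);
		mmput(cur_mm);
		cur_mm = NULL;
	}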

-- 
Yours sincerely,
Pavel Begunkov




* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 17:19         ` Pavel Begunkov
@ 2019-10-27 17:26           ` Jens Axboe
  2019-10-27 17:37             ` Pavel Begunkov
  2019-10-27 18:56             ` Pavel Begunkov
  0 siblings, 2 replies; 17+ messages in thread
From: Jens Axboe @ 2019-10-27 17:26 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 11:19 AM, Pavel Begunkov wrote:
> On 27/10/2019 19:56, Jens Axboe wrote:
>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>> io_ring_submit()
>>>>>>
>>>>>> Pavel Begunkov (2):
>>>>>>       io_uring: handle mm_fault outside of submission
>>>>>>       io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>
>>>>>>      fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>      1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>
>>>>> I like the cleanups here, but one thing that seems off is the
>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>> to grab the mm.
>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>> functions. And in this case it's easier to do after merging.
>>>>
>>>> Do you prefer to return it back first?
>>>
>>> Ah I see, no I don't care about that.
>>
>> OK, looked at the post-patches state. It's still not correct. You are
>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>> that, only if the sqes we need to submit need mm context.
>>
> That's what my question to the fix was about :)
> 1. Then, what the case it could fail?
> 2. Is it ok to hold it while polling? It could keep it for quite
> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
> 
> Anyway, I will add it back and resend the patchset.

If possible in a simple way, I'd prefer if we do it as a prep patch and
then queue that up for 5.4 since we now lost that optimization.  Then
layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
on top of that.

If not trivially possible for 5.4, then we'll just have to live with it
in that release. For that case, you can fold the change in with these
two patches.

-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 17:26           ` Jens Axboe
@ 2019-10-27 17:37             ` Pavel Begunkov
  2019-10-27 18:56             ` Pavel Begunkov
  1 sibling, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 17:37 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 27/10/2019 20:26, Jens Axboe wrote:
> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>> On 27/10/2019 19:56, Jens Axboe wrote:
>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>> io_ring_submit()
>>>>>>>
>>>>>>> Pavel Begunkov (2):
>>>>>>>       io_uring: handle mm_fault outside of submission
>>>>>>>       io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>
>>>>>>>      fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>      1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>
>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>> to grab the mm.
>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>> functions. And in this case it's easier to do after merging.
>>>>>
>>>>> Do you prefer to return it back first?
>>>>
>>>> Ah I see, no I don't care about that.
>>>
>>> OK, looked at the post-patches state. It's still not correct. You are
>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>> that, only if the sqes we need to submit need mm context.
>>>
>> That's what my question to the fix was about :)
>> 1. Then, what the case it could fail?
>> 2. Is it ok to hold it while polling? It could keep it for quite
>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>
>> Anyway, I will add it back and resend the patchset.
> 
> If possible in a simple way, I'd prefer if we do it as a prep patch and
> then queue that up for 5.4 since we now lost that optimization.  Then
> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
> on top of that.

Sure, will do this way. There won't be much difference.

> 
> If not trivially possible for 5.4, then we'll just have to leave with it
> in that release. For that case, you can fold the change in with these
> two patches.
> 

-- 
Yours sincerely,
Pavel Begunkov




* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 17:26           ` Jens Axboe
  2019-10-27 17:37             ` Pavel Begunkov
@ 2019-10-27 18:56             ` Pavel Begunkov
  2019-10-27 19:02               ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 18:56 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 27/10/2019 20:26, Jens Axboe wrote:
> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>> On 27/10/2019 19:56, Jens Axboe wrote:
>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>> io_ring_submit()
>>>>>>>
>>>>>>> Pavel Begunkov (2):
>>>>>>>       io_uring: handle mm_fault outside of submission
>>>>>>>       io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>
>>>>>>>      fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>      1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>
>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>> to grab the mm.
>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>> functions. And in this case it's easier to do after merging.
>>>>>
>>>>> Do you prefer to return it back first?
>>>>
>>>> Ah I see, no I don't care about that.
>>>
>>> OK, looked at the post-patches state. It's still not correct. You are
>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>> that, only if the sqes we need to submit need mm context.
>>>
>> That's what my question to the fix was about :)
>> 1. Then, what the case it could fail?
>> 2. Is it ok to hold it while polling? It could keep it for quite
>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>
>> Anyway, I will add it back and resend the patchset.
> 
> If possible in a simple way, I'd prefer if we do it as a prep patch and
> then queue that up for 5.4 since we now lost that optimization.  Then
> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
> on top of that.
> 
> If not trivially possible for 5.4, then we'll just have to leave with it
> in that release. For that case, you can fold the change in with these
> two patches.
> 
Hmm, what are the semantics? I think we should fail only those requests
that need the mm but can't get it. The alternative is to fail all
subsequent ones after the first mm_fault.

-- 
Yours sincerely,
Pavel Begunkov




* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 18:56             ` Pavel Begunkov
@ 2019-10-27 19:02               ` Jens Axboe
  2019-10-27 19:17                 ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2019-10-27 19:02 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 12:56 PM, Pavel Begunkov wrote:
> On 27/10/2019 20:26, Jens Axboe wrote:
>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>> io_ring_submit()
>>>>>>>>
>>>>>>>> Pavel Begunkov (2):
>>>>>>>>        io_uring: handle mm_fault outside of submission
>>>>>>>>        io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>
>>>>>>>>       fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>       1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>
>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>> to grab the mm.
>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>
>>>>>> Do you prefer to return it back first?
>>>>>
>>>>> Ah I see, no I don't care about that.
>>>>
>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>> that, only if the sqes we need to submit need mm context.
>>>>
>>> That's what my question to the fix was about :)
>>> 1. Then, what the case it could fail?
>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>
>>> Anyway, I will add it back and resend the patchset.
>>
>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>> then queue that up for 5.4 since we now lost that optimization.  Then
>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>> on top of that.
>>
>> If not trivially possible for 5.4, then we'll just have to leave with it
>> in that release. For that case, you can fold the change in with these
>> two patches.
>>
> Hmm, what's the semantics? I think we should fail only those who need
> mm, but can't get it. The alternative is to fail all subsequent after
> the first mm_fault.

For the sqthread setup, there's no notion of "do this many". It just
grabs whatever it can and issues it. This means that the mm assign
is really per-sqe. What we did before, with the batching, just optimized
it so we'd only grab it for one batch IFF at least one sqe in that batch
needed the mm.

Since you've killed the batching, I think the logic should be something
ala:

if (io_sqe_needs_user(sqe) && !cur_mm) {
	if (already_attempted_mmget_and_failed) {
		-EFAULT end sqe
	} else {
		do mm_get and mmuse dance
	}
}

Hence if the sqe doesn't need the mm, doesn't matter if we previously
failed. If we need the mm and previously failed, -EFAULT.
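
Concretely, that could render as the following -- a hypothetical
sketch, where mm_fault memoizes a failed attempt and the bare continue
still skips the link/drain bookkeeping, which needs care:

	if (io_sqe_needs_user(s.sqe) && !cur_mm) {
		if (!mm_fault)
			mm_fault = !mmget_not_zero(ctx->sqo_mm);
		if (mm_fault) {
			/* previously failed (or just failed): -EFAULT this sqe */
			io_cqring_add_event(ctx, s.sqe->user_data, -EFAULT);
			continue;
		}
		use_mm(ctx->sqo_mm);
		cur_mm = ctx->sqo_mm;
	}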

-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 19:02               ` Jens Axboe
@ 2019-10-27 19:17                 ` Pavel Begunkov
  2019-10-27 19:51                   ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 19:17 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 27/10/2019 22:02, Jens Axboe wrote:
> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>> On 27/10/2019 20:26, Jens Axboe wrote:
>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>> io_ring_submit()
>>>>>>>>>
>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>        io_uring: handle mm_fault outside of submission
>>>>>>>>>        io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>
>>>>>>>>>       fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>       1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>
>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>> to grab the mm.
>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>
>>>>>>> Do you prefer to return it back first?
>>>>>>
>>>>>> Ah I see, no I don't care about that.
>>>>>
>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>> that, only if the sqes we need to submit need mm context.
>>>>>
>>>> That's what my question to the fix was about :)
>>>> 1. Then, what the case it could fail?
>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>
>>>> Anyway, I will add it back and resend the patchset.
>>>
>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>> then queue that up for 5.4 since we now lost that optimization.  Then
>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>> on top of that.
>>>
>>> If not trivially possible for 5.4, then we'll just have to leave with it
>>> in that release. For that case, you can fold the change in with these
>>> two patches.
>>>
>> Hmm, what's the semantics? I think we should fail only those who need
>> mm, but can't get it. The alternative is to fail all subsequent after
>> the first mm_fault.
> 
> For the sqthread setup, there's no notion of "do this many". It just
> grabs whatever it can and issues it. This means that the mm assign
> is really per-sqe. What we did before, with the batching, just optimized
> it so we'd only grab it for one batch IFF at least one sqe in that batch
> needed the mm.
> 
> Since you've killed the batching, I think the logic should be something
> ala:
> 
> if (io_sqe_needs_user(sqe) && !cur_mm)) {
> 	if (already_attempted_mmget_and_failed_ {
> 		-EFAULT end sqe
> 	} else {
> 		do mm_get and mmuse dance
> 	}
> }
> 
> Hence if the sqe doesn't need the mm, doesn't matter if we previously
> failed. If we need the mm and previously failed, -EFAULT.
> 
That makes sense, but it's a bit hard to implement while honoring links
and drains.

-- 
Yours sincerely,
Pavel Begunkov




* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 19:17                 ` Pavel Begunkov
@ 2019-10-27 19:51                   ` Jens Axboe
  2019-10-27 19:59                     ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2019-10-27 19:51 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 1:17 PM, Pavel Begunkov wrote:
> On 27/10/2019 22:02, Jens Axboe wrote:
>> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>>> On 27/10/2019 20:26, Jens Axboe wrote:
>>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>>> io_ring_submit()
>>>>>>>>>>
>>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>>         io_uring: handle mm_fault outside of submission
>>>>>>>>>>         io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>>
>>>>>>>>>>        fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>>        1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>>
>>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>>> to grab the mm.
>>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>>
>>>>>>>> Do you prefer to return it back first?
>>>>>>>
>>>>>>> Ah I see, no I don't care about that.
>>>>>>
>>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>>> that, only if the sqes we need to submit need mm context.
>>>>>>
>>>>> That's what my question to the fix was about :)
>>>>> 1. Then, what the case it could fail?
>>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>>
>>>>> Anyway, I will add it back and resend the patchset.
>>>>
>>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>>> then queue that up for 5.4 since we now lost that optimization.  Then
>>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>>> on top of that.
>>>>
>>>> If not trivially possible for 5.4, then we'll just have to leave with it
>>>> in that release. For that case, you can fold the change in with these
>>>> two patches.
>>>>
>>> Hmm, what's the semantics? I think we should fail only those who need
>>> mm, but can't get it. The alternative is to fail all subsequent after
>>> the first mm_fault.
>>
>> For the sqthread setup, there's no notion of "do this many". It just
>> grabs whatever it can and issues it. This means that the mm assign
>> is really per-sqe. What we did before, with the batching, just optimized
>> it so we'd only grab it for one batch IFF at least one sqe in that batch
>> needed the mm.
>>
>> Since you've killed the batching, I think the logic should be something
>> ala:
>>
>> if (io_sqe_needs_user(sqe) && !cur_mm)) {
>> 	if (already_attempted_mmget_and_failed_ {
>> 		-EFAULT end sqe
>> 	} else {
>> 		do mm_get and mmuse dance
>> 	}
>> }
>>
>> Hence if the sqe doesn't need the mm, doesn't matter if we previously
>> failed. If we need the mm and previously failed, -EFAULT.
>>
> That makes sense, but a bit hard to implement honoring links and drains

If it becomes too complicated or convoluted, just drop it. It's not
worth spending that much time on.

-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 19:51                   ` Jens Axboe
@ 2019-10-27 19:59                     ` Pavel Begunkov
  2019-10-28  3:38                       ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-27 19:59 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 27/10/2019 22:51, Jens Axboe wrote:
> On 10/27/19 1:17 PM, Pavel Begunkov wrote:
>> On 27/10/2019 22:02, Jens Axboe wrote:
>>> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>>>> On 27/10/2019 20:26, Jens Axboe wrote:
>>>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>>>> io_ring_submit()
>>>>>>>>>>>
>>>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>>>         io_uring: handle mm_fault outside of submission
>>>>>>>>>>>         io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>>>
>>>>>>>>>>>        fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>>>        1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>>>
>>>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>>>> to grab the mm.
>>>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>>>
>>>>>>>>> Do you prefer to return it back first?
>>>>>>>>
>>>>>>>> Ah I see, no I don't care about that.
>>>>>>>
>>>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>>>> that, only if the sqes we need to submit need mm context.
>>>>>>>
>>>>>> That's what my question to the fix was about :)
>>>>>> 1. Then, what the case it could fail?
>>>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>>>
>>>>>> Anyway, I will add it back and resend the patchset.
>>>>>
>>>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>>>> then queue that up for 5.4 since we now lost that optimization.  Then
>>>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>>>> on top of that.
>>>>>
>>>>> If not trivially possible for 5.4, then we'll just have to leave with it
>>>>> in that release. For that case, you can fold the change in with these
>>>>> two patches.
>>>>>
>>>> Hmm, what's the semantics? I think we should fail only those who need
>>>> mm, but can't get it. The alternative is to fail all subsequent after
>>>> the first mm_fault.
>>>
>>> For the sqthread setup, there's no notion of "do this many". It just
>>> grabs whatever it can and issues it. This means that the mm assign
>>> is really per-sqe. What we did before, with the batching, just optimized
>>> it so we'd only grab it for one batch IFF at least one sqe in that batch
>>> needed the mm.
>>>
>>> Since you've killed the batching, I think the logic should be something
>>> ala:
>>>
>>> if (io_sqe_needs_user(sqe) && !cur_mm)) {
>>> 	if (already_attempted_mmget_and_failed_ {
>>> 		-EFAULT end sqe
>>> 	} else {
>>> 		do mm_get and mmuse dance
>>> 	}
>>> }
>>>
>>> Hence if the sqe doesn't need the mm, doesn't matter if we previously
>>> failed. If we need the mm and previously failed, -EFAULT.
>>>
>> That makes sense, but a bit hard to implement honoring links and drains
> 
> If it becomes too complicated or convoluted, just drop it. It's not
> worth spending that much time on.
> 
I've already done it more or less elegantly; I just prefer to test
commits before sending.

-- 
Yours sincerely,
Pavel Begunkov




* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-27 19:59                     ` Pavel Begunkov
@ 2019-10-28  3:38                       ` Jens Axboe
  2019-10-28 11:12                         ` Pavel Begunkov
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2019-10-28  3:38 UTC (permalink / raw)
  To: Pavel Begunkov, linux-block, linux-kernel

On 10/27/19 1:59 PM, Pavel Begunkov wrote:
> On 27/10/2019 22:51, Jens Axboe wrote:
>> On 10/27/19 1:17 PM, Pavel Begunkov wrote:
>>> On 27/10/2019 22:02, Jens Axboe wrote:
>>>> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>>>>> On 27/10/2019 20:26, Jens Axboe wrote:
>>>>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>>>>> io_ring_submit()
>>>>>>>>>>>>
>>>>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>>>>          io_uring: handle mm_fault outside of submission
>>>>>>>>>>>>          io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>>>>
>>>>>>>>>>>>         fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>>>>         1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>>>>> to grab the mm.
>>>>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>>>>
>>>>>>>>>> Do you prefer to return it back first?
>>>>>>>>>
>>>>>>>>> Ah I see, no I don't care about that.
>>>>>>>>
>>>>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>>>>> that, only if the sqes we need to submit need mm context.
>>>>>>>>
>>>>>>> That's what my question to the fix was about :)
>>>>>>> 1. Then, what the case it could fail?
>>>>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>>>>
>>>>>>> Anyway, I will add it back and resend the patchset.
>>>>>>
>>>>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>>>>> then queue that up for 5.4 since we now lost that optimization.  Then
>>>>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>>>>> on top of that.
>>>>>>
>>>>>> If not trivially possible for 5.4, then we'll just have to leave with it
>>>>>> in that release. For that case, you can fold the change in with these
>>>>>> two patches.
>>>>>>
>>>>> Hmm, what's the semantics? I think we should fail only those who need
>>>>> mm, but can't get it. The alternative is to fail all subsequent after
>>>>> the first mm_fault.
>>>>
>>>> For the sqthread setup, there's no notion of "do this many". It just
>>>> grabs whatever it can and issues it. This means that the mm assign
>>>> is really per-sqe. What we did before, with the batching, just optimized
>>>> it so we'd only grab it for one batch IFF at least one sqe in that batch
>>>> needed the mm.
>>>>
>>>> Since you've killed the batching, I think the logic should be something
>>>> ala:
>>>>
>>>> if (io_sqe_needs_user(sqe) && !cur_mm)) {
>>>> 	if (already_attempted_mmget_and_failed_ {
>>>> 		-EFAULT end sqe
>>>> 	} else {
>>>> 		do mm_get and mmuse dance
>>>> 	}
>>>> }
>>>>
>>>> Hence if the sqe doesn't need the mm, doesn't matter if we previously
>>>> failed. If we need the mm and previously failed, -EFAULT.
>>>>
>>> That makes sense, but a bit hard to implement honoring links and drains
>>
>> If it becomes too complicated or convoluted, just drop it. It's not
>> worth spending that much time on.
>>
> I've already done it more or less elegantly, just prefer to test commits
> before sending.

That's always appreciated!

It struck me that while I've added quite a few regression tests, we don't
have any that just do basic read/write using the variety of settings we
have for that. So I added that to liburing.
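
For reference, such a test boils down to the basic liburing pattern --
a minimal sketch against liburing's public API (error handling trimmed;
the file name is hypothetical):

	#include <fcntl.h>
	#include <sys/uio.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		char buf[4096];
		struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
		int fd = open("testfile", O_RDONLY);	/* hypothetical input file */

		io_uring_queue_init(8, &ring, 0);
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_readv(sqe, fd, &iov, 1, 0);
		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);
		/* cqe->res holds bytes read, or -errno on failure */
		io_uring_cqe_seen(&ring, cqe);
		io_uring_queue_exit(&ring);
		return 0;
	}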

-- 
Jens Axboe



* Re: [PATCH 0/2][for-next] cleanup submission path
  2019-10-28  3:38                       ` Jens Axboe
@ 2019-10-28 11:12                         ` Pavel Begunkov
  0 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2019-10-28 11:12 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel



On 28/10/2019 06:38, Jens Axboe wrote:
> On 10/27/19 1:59 PM, Pavel Begunkov wrote:
>> On 27/10/2019 22:51, Jens Axboe wrote:
>>> On 10/27/19 1:17 PM, Pavel Begunkov wrote:
>>>> On 27/10/2019 22:02, Jens Axboe wrote:
>>>>> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>>>>>> On 27/10/2019 20:26, Jens Axboe wrote:
>>>>>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>>>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>>>>>> io_ring_submit()
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>>>>>          io_uring: handle mm_fault outside of submission
>>>>>>>>>>>>>          io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>>>>>
>>>>>>>>>>>>>         fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>>>>>         1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>>>>>> to grab the mm.
>>>>>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>>>>>
>>>>>>>>>>> Do you prefer to return it back first?
>>>>>>>>>>
>>>>>>>>>> Ah I see, no I don't care about that.
>>>>>>>>>
>>>>>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>>>>>> that, only if the sqes we need to submit need mm context.
>>>>>>>>>
>>>>>>>> That's what my question to the fix was about :)
>>>>>>>> 1. Then, what the case it could fail?
>>>>>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>>>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>>>>>
>>>>>>>> Anyway, I will add it back and resend the patchset.
>>>>>>>
>>>>>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>>>>>> then queue that up for 5.4 since we now lost that optimization.  Then
>>>>>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>>>>>> on top of that.
>>>>>>>
>>>>>>> If not trivially possible for 5.4, then we'll just have to leave with it
>>>>>>> in that release. For that case, you can fold the change in with these
>>>>>>> two patches.
>>>>>>>
>>>>>> Hmm, what's the semantics? I think we should fail only those who need
>>>>>> mm, but can't get it. The alternative is to fail all subsequent after
>>>>>> the first mm_fault.
>>>>>
>>>>> For the sqthread setup, there's no notion of "do this many". It just
>>>>> grabs whatever it can and issues it. This means that the mm assign
>>>>> is really per-sqe. What we did before, with the batching, just optimized
>>>>> it so we'd only grab it for one batch IFF at least one sqe in that batch
>>>>> needed the mm.
>>>>>
>>>>> Since you've killed the batching, I think the logic should be something
>>>>> ala:
>>>>>
>>>>> if (io_sqe_needs_user(sqe) && !cur_mm)) {
>>>>> 	if (already_attempted_mmget_and_failed_ {
>>>>> 		-EFAULT end sqe
>>>>> 	} else {
>>>>> 		do mm_get and mmuse dance
>>>>> 	}
>>>>> }
>>>>>
>>>>> Hence if the sqe doesn't need the mm, doesn't matter if we previously
>>>>> failed. If we need the mm and previously failed, -EFAULT.
>>>>>
>>>> That makes sense, but a bit hard to implement honoring links and drains
>>>
>>> If it becomes too complicated or convoluted, just drop it. It's not
>>> worth spending that much time on.
>>>
>> I've already done it more or less elegantly, just prefer to test commits
>> before sending.
> 
> That's always appreciated!
> 
> It struck me that while I've added quite a few regression tests, we don't
> have any that just do basic read/write using the variety of settings we
> have for that. So I added that to liburing.
> 
Great, thanks!
I think I'll postpone the patches, including these, until the start
of 5.5.

-- 
Yours sincerely,
Pavel Begunkov



