Re: [PATCH 0/2][for-next] cleanup submission path

From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/2][for-next] cleanup submission path
Date: Sun, 27 Oct 2019 13:51:33 -0600
Message-ID: <3957148b-0dac-a621-8f12-5d2d45557e24@kernel.dk>
In-Reply-To: <3b8b84d0-cde2-6bb0-c903-a1d71f9b83e2@gmail.com>

On 10/27/19 1:17 PM, Pavel Begunkov wrote:
> On 27/10/2019 22:02, Jens Axboe wrote:
>> On 10/27/19 12:56 PM, Pavel Begunkov wrote:
>>> On 27/10/2019 20:26, Jens Axboe wrote:
>>>> On 10/27/19 11:19 AM, Pavel Begunkov wrote:
>>>>> On 27/10/2019 19:56, Jens Axboe wrote:
>>>>>> On 10/27/19 10:49 AM, Jens Axboe wrote:
>>>>>>> On 10/27/19 10:44 AM, Pavel Begunkov wrote:
>>>>>>>> On 27/10/2019 19:32, Jens Axboe wrote:
>>>>>>>>> On 10/27/19 9:35 AM, Pavel Begunkov wrote:
>>>>>>>>>> A small cleanup of very similar but diverged io_submit_sqes() and
>>>>>>>>>> io_ring_submit()
>>>>>>>>>>
>>>>>>>>>> Pavel Begunkov (2):
>>>>>>>>>>         io_uring: handle mm_fault outside of submission
>>>>>>>>>>         io_uring: merge io_submit_sqes and io_ring_submit
>>>>>>>>>>
>>>>>>>>>>        fs/io_uring.c | 116 ++++++++++++++------------------------------------
>>>>>>>>>>        1 file changed, 33 insertions(+), 83 deletions(-)
>>>>>>>>>
>>>>>>>>> I like the cleanups here, but one thing that seems off is the
>>>>>>>>> assumption that io_sq_thread() always needs to grab the mm. If
>>>>>>>>> the sqes processed are just READ/WRITE_FIXED, then it never needs
>>>>>>>>> to grab the mm.
>>>>>>>>>
>>>>>>>> Yeah, we removed it to fix bugs. Personally, I think it would be
>>>>>>>> clearer to do lazy grabbing conditionally, rather than have two
>>>>>>>> functions. And in this case it's easier to do after merging.
>>>>>>>>
>>>>>>>> Do you prefer to return it back first?
>>>>>>>
>>>>>>> Ah I see, no I don't care about that.
>>>>>>
>>>>>> OK, looked at the post-patches state. It's still not correct. You are
>>>>>> grabbing the mm from io_sq_thread() unconditionally. We should not do
>>>>>> that, only if the sqes we need to submit need mm context.
>>>>>>
>>>>> That's what my question to the fix was about :)
>>>>> 1. Then, in what case could it fail?
>>>>> 2. Is it ok to hold it while polling? It could keep it for quite
>>>>> a long time if host is swift, e.g. submit->poll->submit->poll-> ...
>>>>>
>>>>> Anyway, I will add it back and resend the patchset.
>>>>
>>>> If possible in a simple way, I'd prefer if we do it as a prep patch and
>>>> then queue that up for 5.4 since we now lost that optimization.  Then
>>>> layer the other 2 on top of that, since I'll just rebase the 5.5 stuff
>>>> on top of that.
>>>>
>>>> If not trivially possible for 5.4, then we'll just have to live with it
>>>> in that release. For that case, you can fold the change in with these
>>>> two patches.
>>>>
>>> Hmm, what are the semantics? I think we should fail only those sqes that
>>> need the mm but can't get it. The alternative is to fail all subsequent
>>> sqes after the first mm_fault.
>>
>> For the sqthread setup, there's no notion of "do this many". It just
>> grabs whatever it can and issues it. This means that the mm assign
>> is really per-sqe. What we did before, with the batching, just optimized
>> it so we'd only grab it for one batch IFF at least one sqe in that batch
>> needed the mm.
>>
>> Since you've killed the batching, I think the logic should be something
>> ala:
>>
>> if (io_sqe_needs_user(sqe) && !cur_mm) {
>> 	if (already_attempted_mmget_and_failed) {
>> 		-EFAULT end sqe
>> 	} else {
>> 		do mm_get and mmuse dance
>> 	}
>> }
>>
>> Hence if the sqe doesn't need the mm, doesn't matter if we previously
>> failed. If we need the mm and previously failed, -EFAULT.
>>
> That makes sense, but it's a bit hard to implement while honoring links and drains.

If it becomes too complicated or convoluted, just drop it. It's not
worth spending that much time on.
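
For illustration only, the per-sqe logic quoted above could be sketched in
plain userspace C roughly as below. This is a minimal sketch under stated
assumptions, not the fs/io_uring.c code: struct fake_sqe, struct sq_state,
try_grab_mm() and prep_sqe() are hypothetical names, the mm grab is simulated
with a flag, and it assumes a failed first grab also ends that sqe with
-EFAULT. Links and drains are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an sqe: only records whether it needs the mm. */
struct fake_sqe {
	bool needs_user;
};

/* Hypothetical per-thread submission state (not a kernel structure). */
struct sq_state {
	bool have_mm;       /* plays the role of cur_mm != NULL */
	bool mmget_failed;  /* an earlier grab attempt already failed */
	bool mm_available;  /* simulation knob: would a grab succeed? */
	int grab_attempts;  /* how many times we tried to grab the mm */
};

/* Simulated "mm_get and mmuse dance": succeeds iff mm_available is set. */
bool try_grab_mm(struct sq_state *st)
{
	st->grab_attempts++;
	return st->mm_available;
}

/*
 * Decide whether one sqe may be issued.
 * Returns 0 if it may, -EFAULT if it needs the mm and we cannot get it.
 */
int prep_sqe(struct sq_state *st, const struct fake_sqe *sqe)
{
	assert(st != NULL && sqe != NULL);

	if (sqe->needs_user && !st->have_mm) {
		/* Already tried and failed: do not retry, just end the sqe. */
		if (st->mmget_failed)
			return -EFAULT;

		/* First attempt: try to grab the mm (assumed -EFAULT on failure). */
		if (!try_grab_mm(st)) {
			st->mmget_failed = true;
			return -EFAULT;
		}
		st->have_mm = true;
	}

	/* Either the sqe does not need the mm, or we now hold it. */
	return 0;
}
```

With this shape, an sqe that does not need the mm is unaffected by an earlier
failed grab, and the grab is attempted at most once per submission pass.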

-- 
Jens Axboe