Re: [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme

From: Jens Axboe <axboe@kernel.dk>
To: Jerome Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Benjamin LaHaise <bcrl@kvack.org>
Subject: Re: [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme
Date: Wed, 22 Jan 2020 10:04:44 -0700
Message-ID: <66027259-81c3-0bc4-a70b-74069e746058@kernel.dk>
In-Reply-To: <20200122165427.GA6009@redhat.com>

On 1/22/20 9:54 AM, Jerome Glisse wrote:
> On Wed, Jan 22, 2020 at 08:12:51AM -0700, Jens Axboe wrote:
>> On 1/22/20 4:59 AM, Michal Hocko wrote:
>>> On Tue 21-01-20 20:57:23, Jerome Glisse wrote:
>>>> We can also discuss what kind of knobs we want to expose so that
>>>> people can choose the tradeoff themselves (i.e. from "I want low
>>>> latency io_uring and I don't care whether mm cannot do its business" to
>>>> "I want mm to never be impeded in its business and I accept the extra
>>>> latency bursts I might face in IO operations").
>>>
>>> I do not think it is a good idea to make this configurable. How can
>>> people sensibly choose between the two without deep understanding of
>>> internals?
>>
>> Fully agree, we can't just punt this to a knob and call it good, that's
>> a typical fallacy of core changes. And there is only one mode for
>> io_uring, and that's consistent low latency. If this change introduces
>> weird reclaim, compaction or migration latencies, then that's a
>> non-starter as far as I'm concerned.
>>
>> And what do those two settings even mean? I don't even know, and a user
>> sure as hell doesn't either.
>>
>> io_uring pins two types of pages - registered buffers, these are used
>> for actual IO, and the rings themselves. The rings are not used for IO,
>> just used to communicate between the application and the kernel.
> 
> So, do we still want to solve writeback of file-backed pages if the
> pages in the user buffer are from a file?

That's not currently a concern for io_uring, as it disallows file backed
pages for the IO buffers that are being registered.

> Also, we can introduce a flag for buffer registration that allows
> registering a buffer without pinning, and thus avoids RLIMIT_MEMLOCK at
> the cost of possible latency spikes. Then the user registering the buffer
> knows what they get.

That may be fine for other users, but I don't think it'll apply
to io_uring. I can't see anyone selecting that flag, unless you're
doing something funky where you're registering a substantial amount
of the system memory for IO buffers. And I don't think that's going
to be a super valid use case...

> Maybe it would be good to test; it might stay in the noise, and then it
> might be a good thing to do. Also, there are strategies to avoid latency
> spikes. For instance, we can block or force-skip mm invalidation if a
> buffer has pending/running IO in the ring, i.e. only let buffer
> invalidation happen when there is no pending/running submission entry.

Would that really work? The buffer could very well be idle right when
you check, and then the application wants to do IO on it the instant you
decide you can do background work on it. Additionally, that would require
accounting for when the buffers are in flight, which is exactly the kind
of overhead we're trying to avoid to begin with.

> We can also pick what kind of invalidation we allow (compaction,
> migration, ...) and thus limit the scope and likelihood of
> invalidation.

I think it'd be useful to try and understand the use case first.
If we're pinning a small percentage of the system memory, do we
really care at all? Isn't it completely fine to just ignore?
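
For a rough sense of scale, here is a minimal userspace sketch of that
sanity check. It is only an illustration and not taken from any patch in
this thread: the helper names (pinned_percent, phys_mem_bytes,
fits_memlock) are hypothetical, and it assumes Linux/glibc, where
sysconf(_SC_PHYS_PAGES) is available. It answers two questions for a
given registered-buffer size: what percentage of physical memory would be
pinned, and whether that size fits under the soft RLIMIT_MEMLOCK.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Percentage of total_bytes that pinned_bytes represents; -1.0 if total is 0. */
double pinned_percent(unsigned long long pinned_bytes,
		      unsigned long long total_bytes)
{
	if (total_bytes == 0)
		return -1.0;
	return 100.0 * (double)pinned_bytes / (double)total_bytes;
}

/* Total physical memory in bytes, or 0 if sysconf() cannot report it. */
unsigned long long phys_mem_bytes(void)
{
	long pages = sysconf(_SC_PHYS_PAGES);	/* glibc extension, assumed present */
	long page_size = sysconf(_SC_PAGESIZE);

	if (pages <= 0 || page_size <= 0)
		return 0;
	return (unsigned long long)pages * (unsigned long long)page_size;
}

/* 1 if pinned_bytes fits under the soft RLIMIT_MEMLOCK, 0 if not, -1 on error. */
int fits_memlock(unsigned long long pinned_bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	return pinned_bytes <= (unsigned long long)rl.rlim_cur;
}

/* Print both answers for one candidate buffer size. */
void report(unsigned long long pinned_bytes)
{
	printf("%llu bytes pinned: %.4f%% of RAM, fits RLIMIT_MEMLOCK: %d\n",
	       pinned_bytes,
	       pinned_percent(pinned_bytes, phys_mem_bytes()),
	       fits_memlock(pinned_bytes));
}
```

Calling report(64ULL << 20) from a small main() would print both answers
for 64MB of registered buffers on the machine at hand.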

-- 
Jens Axboe