Re: [PATCH v2 2/4] mm: introduce cma_alloc_bulk API

From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	hyesoo.yu@samsung.com, willy@infradead.org,
	iamjoonsoo.kim@lge.com, vbabka@suse.cz, surenb@google.com,
	pullip.cho@samsung.com, joaodias@google.com, hridya@google.com,
	sumit.semwal@linaro.org, john.stultz@linaro.org,
	Brian.Starkey@arm.com, linux-media@vger.kernel.org,
	devicetree@vger.kernel.org, robh@kernel.org,
	christian.koenig@amd.com, linaro-mm-sig@lists.linaro.org
Subject: Re: [PATCH v2 2/4] mm: introduce cma_alloc_bulk API
Date: Thu, 3 Dec 2020 10:47:02 +0100	[thread overview]
Message-ID: <c5209dce-dc30-6d8d-e8f8-c5412b072310@redhat.com> (raw)
In-Reply-To: <20201203082810.GX17338@dhcp22.suse.cz>

On 03.12.20 09:28, Michal Hocko wrote:
> On Wed 02-12-20 21:22:36, David Hildenbrand wrote:
>> On 02.12.20 20:26, Minchan Kim wrote:
>>> On Wed, Dec 02, 2020 at 07:51:07PM +0100, Michal Hocko wrote:
> [...]
>>>> I am still not sure a specific flag is a good interface. Really can this
>>>> be gfp_mask instead?
>>>
>>> I am not strong(even, I did it with GFP_NORETRY) but David wanted to
>>> have special mode and I agreed when he mentioned ALLOC_CONTIG_HARD as
>>> one of options in future(it would be hard to indicate that mode with
>>> gfp flags).
>>
>> I can't tell regarding the CMA interface, but for the alloc_contig()
>> interface I think modes make sense. Yes, it's different to other
>> allocaters, but the contig range allocater is different already. E.g.,
>> the CMA allocater mostly hides "which exact PFNs you try to allocate".
> 
> Yes, alloc_contig_range is a low level API but it already has a gfp_mask
> parameter. Adding yet another allocation mode sounds like API
> convolution to me.

Well, if another parameter is a concern, we can introduce

alloc_contig_range_fast()

instead.

> 
>> In the contig range allocater, gfp flags are currently used to express
>> how to allocate pages used as migration targets. I don't think mangling
>> in other gfp flags (or even overloading them) makes things a lot
>> clearer. E.g., GFP_NORETRY: don't retry to allocate migration targets?
>> don't retry to migrate pages? both?
>>
>> As I said, other aspects might be harder to model (e.g., don't drain
>> LRU) and hiding them behind generic gfp flags (e.g., GFP_NORETRY) feels
>> wrong.
>>
>> With the mode, we're expressing details for the necessary page
>> migration. Suggestions on how to model that are welcome.
> 
> The question is whether the caller should really have such an intimate
> knowledge and control of the alloc_contig_range implementation. This all
> are implementation details. Should really anybody think about how many
> times migration retries or control LRU draining? Those can change in the

The question is not "how many times", rather "if at all". I can
understand the possible performance improvements by letting the caller
handle things (lru draining, pcp draining) like that when issuing
gazillions of alloc_contig_range() calls.

> future and I do not think we really want to go over all users grown over
> that time and try to deduce what was the intention behind.

That's why I think we need a clear mechanism to express the expected
behavior - something we can properly document and users can actually
understand to optimize for - and we can fix them up when the documented
behavior changes. Mangling this into somewhat-fitting gfp flags makes
the interface harder to use and more error-prone IMHO.

> 
> I think we should aim at easy and very highlevel behavior:
> - GFP_NOWAIT - unsupported currently IIRC but something that something
>   that should be possible to implement. Isolation is non blocking,
>   migration could be skipped
> - GFP_KERNEL - default behavior whatever that means
> - GFP_NORETRY - opportunistic allocation as lightweight as we can get.
>   Failures to be expected also for transient reasons.
> - GFP_RETRY_MAYFAIL - try hard but not as hard as to trigger disruption
>   (e.g. via oom killer).

I think we currently see demand for 3 modes for alloc_contig_range()

a) normal

As is. Try, but don't try too hard. E.g., drain LRU, drain PCP, retry a
couple of times. Failures in some cases (short-term pinning, PCP races)
are still possible and acceptable.

GFP_RETRY_MAYFAIL ?

E.g., "Allocations with this flag may fail, but only when there is
genuinely little unused memory." - current description does not match at
all. When allocating ranges things behave completely different.

b) fast

Try, but fail fast. Leave optimizations that can improve the result to
the caller. E.g., don't drain LRU, don't drain PCP, don't retry.
Frequent failures are expected and acceptable.

__GFP_NORETRY ?

E.g., "The VM implementation will try only very lightweight memory
direct reclaim to get some memory under memory pressure" - again, I
think current description does not really match.

c) hard

Try hard, E.g., temporarily disabling the PCP. Certainly not
__GFP_NOFAIL, that would be highly dangerous. So no flags / GFP_KERNEL?

> 
> - __GFP_THIS_NODE - stick to a node without fallback
> - we can support zone modifiers although there is no existing user.
> - __GFP_NOWARN - obvious
> 
> And that is it. Or maybe I am seeing that oversimplified.
> 

Again, I think most flags make sense for the migration target allocation
 path and mainly deal with OOM situations and reclaim. For the migration
path - which is specific to the alloc_contig_range() allocater - they
don't really apply and create more confusion than they actually help - IMHO.

-- 
Thanks,

David / dhildenb