Re: [RFC v2 PATCH] mm, sl[au]b: Introduce lockless cache

From: Jens Axboe <axboe@kernel.dk>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: linux-mm@kvack.org, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	John Garry <john.garry@huawei.com>,
	linux-block@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [RFC v2 PATCH] mm, sl[au]b: Introduce lockless cache
Date: Wed, 22 Sep 2021 06:58:00 -0600	[thread overview]
Message-ID: <688e6750-87e9-fb44-ce40-943bad072e48@kernel.dk> (raw)
In-Reply-To: <20210922081906.GA78305@kvm.asia-northeast3-a.c.our-ratio-313919.internal>

On 9/22/21 2:19 AM, Hyeonggon Yoo wrote:
> On Tue, Sep 21, 2021 at 09:37:40AM -0600, Jens Axboe wrote:
>>> @@ -424,6 +431,57 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
>>>  }
>>>  EXPORT_SYMBOL(kmem_cache_create);
>>>  
>>> +/**
>>> + * kmem_cache_alloc_cached - try to allocate from cache without lock
>>> + * @s: slab cache
>>> + * @flags: SLAB flags
>>> + *
>>> + * Try to allocate from cache without lock. If fails, fill the lockless cache
>>> + * using bulk alloc API
>>> + *
>>> + * Be sure that there's no race condition.
>>> + * Must create slab cache with SLAB_LOCKLESS_CACHE flag to use this function.
>>> + *
>>> + * Return: a pointer to free object on allocation success, NULL on failure.
>>> + */
>>> +void *kmem_cache_alloc_cached(struct kmem_cache *s, gfp_t gfpflags)
>>> +{
>>> +	struct kmem_lockless_cache *cache = this_cpu_ptr(s->cache);
>>> +
>>> +	BUG_ON(!(s->flags & SLAB_LOCKLESS_CACHE));
>>> +
>>> +	if (cache->size) /* fastpath without lock */
>>> +		return cache->queue[--cache->size];
>>> +
>>> +	/* slowpath */
>>> +	cache->size = kmem_cache_alloc_bulk(s, gfpflags,
>>> +			KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
>>> +	if (cache->size)
>>> +		return cache->queue[--cache->size];
>>> +	else
>>> +		return NULL;
>>> +}
>>> +EXPORT_SYMBOL(kmem_cache_alloc_cached);
> 
> Hello Jens, I'm so happy that you gave comment.
> 
>> What I implemented for IOPOLL doesn't need to care about interrupts,
>> hence preemption disable is enough. But we do need that, at least.
> 
> To be honest, that was my mistake. I was mistakenly using percpu API.
> it's a shame :> Thank you for pointing that.
> 
> Fixed it in v3 (work in progress now)

Another thing to fix from there, just make it:

if (cache->size)
	return cached item
return NULL;

No need for an if/else with the return.

> 
>> There are basically two types of use cases for this:
>>
>> 1) Freeing can happen from interrupts
>> 2) Freeing cannot happen from interrupts
>>
> 
> I considered only case 2) when writing code. Well, To support 1),
> I think there are two ways:
> 
>  a) internally call kmem_cache_free when in_interrupt() is true
>  b) caller must disable interrupt when freeing
> 
> I think a) is okay, how do you think?

If the API doesn't support freeing from interrupts, then I'd make that
the rule. Caller should know better if that can happen, and then just
use kmem_cache_free() if in a problematic context. That avoids polluting
the fast path with that check. I'd still make it a WARN_ON_ONCE() as
described and it can get removed later, hopefully.

But note it's not just the freeing side that would be problematic. If
you do support from-irq freeing, then the alloc side would need
local_irq_save/restore() instead of just basic preempt protection. That
would make it more expensive all around.

> note that b) can be problematic with kmem_cache_free_bulk
> as it says interrupts must be enabled.

Not sure that's actually true, apart from being bitrot.

>> How does this work for preempt? You seem to assume that the function is
>> invoked with preempt disabled, but then it could only be used with
>> GFP_ATOMIC.
> 
> I wrote it just same prototype with kmem_cache_alloc, and the gfpflags
> parameter is unnecessary as you said. Okay, let's remove it in v3.

Please also run some actual comparitative benchmarks on this, with a
real workload. You also need an internal user of this, a stand-alone
feature isn't really useful. It needs someone using it, and the
benchmarks would be based on that (or those) use cases.

Another consideration - is it going to be OK to ignore slab pruning for
this? Probably. But it needs to be thought about.

In terms of API, not sure why these are separate functions. Creating the
cache with SLAB_FOO should be enough, and then kmem_cache_alloc() tries
the cache first. If nothing there, fallback to the regular path.
Something like this would only be used for cases that have a high
alloc+free rate, would seem like a shame to need two function calls for
the case where you just want a piece of memory and the cache is empty.

-- 
Jens Axboe