linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	John Garry <john.garry@huawei.com>,
	linux-block@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [RFC v2 PATCH] mm, sl[au]b: Introduce lockless cache
Date: Tue, 21 Sep 2021 15:42:39 +0000	[thread overview]
Message-ID: <20210921154239.GA5092@kvm.asia-northeast3-a.c.our-ratio-313919.internal> (raw)
In-Reply-To: <YUkErK1vVZMht4s8@casper.infradead.org>

On Mon, Sep 20, 2021 at 11:01:16PM +0100, Matthew Wilcox wrote:
> On Mon, Sep 20, 2021 at 03:48:16PM +0000, Hyeonggon Yoo wrote:
> > +#define KMEM_LOCKLESS_CACHE_QUEUE_SIZE 64
> 
> I would suggest that, to be nice to the percpu allocator, this be
> one less than 2^n.

I think first we need to discuss if KMEM_LOCKLESS_CACHE_QUEUE_SIZE will be
okay with constant, or per kmem_cache size, or global boot parameter.

But even before that, we need to discuss how to manage magazine.
because that affect on what KMEM_LOCKLESS_CACHE_QUEUE_SIZE will be.

> > +struct kmem_lockless_cache {
> > +	void *queue[KMEM_LOCKLESS_CACHE_QUEUE_SIZE];
> > +	unsigned int size;
> > +};
> 
> I would also suggest that 'size' be first as it is going to be accessed
> every time, and then there's a reasonable chance that queue[size - 1] will
> be in the same cacheline.  CPUs will tend to handle that better.

That looks good to me, as you said there's chance that 'size' and some of
queue's elements (hopefully queue[size - 1]) are in same cacheline.

Plus, in v2 I didn't consider cache line when determining position of
kmem_lockless_cache in kmem_cache, I think we need to reconsider this
too.

> > +/**
> > + * kmem_cache_alloc_cached - try to allocate from cache without lock
> > + * @s: slab cache
> > + * @flags: SLAB flags
> > + *
> > + * Try to allocate from cache without lock. If fails, fill the lockless cache
> > + * using bulk alloc API
> > + *
> > + * Be sure that there's no race condition.
> > + * Must create slab cache with SLAB_LOCKLESS_CACHE flag to use this function.
> > + *
> > + * Return: a pointer to free object on allocation success, NULL on failure.
> > + */
> > +void *kmem_cache_alloc_cached(struct kmem_cache *s, gfp_t gfpflags)
> > +{
> > +	struct kmem_lockless_cache *cache = this_cpu_ptr(s->cache);
> > +
> > +	BUG_ON(!(s->flags & SLAB_LOCKLESS_CACHE));
> > +
> > +	if (cache->size) /* fastpath without lock */
> > +		return cache->queue[--cache->size];
> > +
> > +	/* slowpath */
> > +	cache->size = kmem_cache_alloc_bulk(s, gfpflags,
> > +			KMEM_LOCKLESS_CACHE_QUEUE_SIZE, cache->queue);
> 
> Go back to the Bonwick paper and look at the magazine section again.
> You have to allocate _half_ the size of the queue, otherwise you get
> into pathological situations where you start to free and allocate
> every time.

I want to ask you where idea of allocating 'half' the size of queue came from.
the paper you sent does not work with single queue(magazine). Instead,
it manages pool of magazines.

And after reading the paper, I see managing pool of magazine (where M is
an boot parameter) is valid approach to reduce hitting slowpath.

Thanks,
Hyeonggon Yoo

> > +void kmem_cache_free_cached(struct kmem_cache *s, void *p)
> > +{
> > +	struct kmem_lockless_cache *cache = this_cpu_ptr(s->cache);
> > +
> > +	BUG_ON(!(s->flags & SLAB_LOCKLESS_CACHE));
> > +
> > +	/* Is there better way to do this? */
> > +	if (cache->size == KMEM_LOCKLESS_CACHE_QUEUE_SIZE)
> > +		kmem_cache_free(s, cache->queue[--cache->size]);
> 
> Yes.
> 
> 	if (cache->size == KMEM_LOCKLESS_CACHE_QUEUE_SIZE) {
> 		kmem_cache_free_bulk(s, KMEM_LOCKLESS_CACHE_QUEUE_SIZE / 2,
> 			&cache->queue[KMEM_LOCKLESS_CACHE_QUEUE_SIZE / 2));
> 		cache->size = KMEM_LOCKLESS_CACHE_QUEUE_SIZE / 2;
> 
> (check the maths on that; it might have some off-by-one)
 

  parent reply	other threads:[~2021-09-21 15:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-20 15:48 Hyeonggon Yoo
2021-09-20 22:01 ` Matthew Wilcox
2021-09-21 10:56   ` Hyeonggon Yoo
2021-09-21 15:42   ` Hyeonggon Yoo [this message]
2021-09-21 16:17     ` Matthew Wilcox
2021-09-22  8:32       ` Hyeonggon Yoo
2021-09-22  9:11         ` Hyeonggon Yoo
2021-09-21 15:37 ` Jens Axboe
2021-09-22  8:19   ` Hyeonggon Yoo
2021-09-22 12:58     ` Jens Axboe
2021-09-23  3:34       ` Hyeonggon Yoo
2021-09-23 13:28       ` Jakub Kicinski
2021-09-23  3:55 ` Github link here Hyeonggon Yoo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210921154239.GA5092@kvm.asia-northeast3-a.c.our-ratio-313919.internal \
    --to=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=cl@linux.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=john.garry@huawei.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=netdev@vger.kernel.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --subject='Re: [RFC v2 PATCH] mm, sl[au]b: Introduce lockless cache' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).