Re: [PATCH 1/1] rcu/tree: support kfree_bulk() interface in kfree_rcu()

From: Uladzislau Rezki <urezki@gmail.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [PATCH 1/1] rcu/tree: support kfree_bulk() interface in kfree_rcu()
Date: Fri, 17 Jan 2020 18:52:17 +0100	[thread overview]
Message-ID: <20200117175217.GA23622@pc636> (raw)
In-Reply-To: <20200115225350.GA246464@google.com>

> > > > But rcuperf uses a single block size, which turns into kfree_bulk() using
> > > > a single slab, which results in good locality of reference.  So I have to
> > > 
> > > You meant a "single cache" category when you say "single slab"? Just to
> > > mention, the number of slabs (in a single cache) when a large number of
> > > objects are allocated is more than 1 (not single). With current rcuperf, I
> > > see 100s of slabs (each slab being one page) in the kmalloc-32 cache. Each
> > > slab contains around 128 objects of type kfree_rcu (24 byte object aligned to
> > > 32-byte slab object).
> > > 
> > I think that is about using different slab caches to break locality. It
> > makes sense, IMHO, because usually the system make use of different slabs,
> > because of different object sizes. From the other hand i guess there are
> > test cases when only one slab gets used.
> 
> I was wondering about "locality". A cache can be split into many slabs. Only
> the data on a page is local (contiguous). If there are a large number of
> objects, then it goes to a new slab (on the same cache). At least on the
> kmalloc slabs, there is only 1 slab per page. So for example, if on
> kmalloc-32 slab, there are more than 128 objects, then it goes to a different
> slab / page. So how is there still locality?
> 
Hmm.. On a high level:

one slab cache manages a specific object size, i.e. the slab memory consists of
contiguous pages(when increased probably not) of memory(4096 bytes or so) divided
into equal object size. For example when kmalloc() gets called, the appropriate
cache size(slab that serves only specific size) is selected and an object assigned
from it is returned.

But that is theory and i have not deeply analyzed how the SLAB works internally,
so i can be wrong :)

You mentioned 128 objects per one slab in the kmalloc-32 slab-cache. But all of
them follows each other, i mean it is sequential and is like regular array. In
that sense freeing can be beneficial because when an access is done to any object
whole CPU cache-line is fetched(if it was not before), usually it is 64K.

That is what i meant "locality". In order to "break it" i meant to allocate from
different slabs to see how kfree_slub() behaves in that sense, what is more real
scenario and workload, i think.

--
Vlad Rezki