* [RFC] Some questions and an idea on SLUB/SLAB
@ 2021-10-09  0:19 Hyeonggon Yoo
  2021-10-09  0:33 ` Matthew Wilcox
  2021-10-11  7:13 ` [RFC] Some questions and an idea on SLUB/SLAB Christoph Lameter
  0 siblings, 2 replies; 7+ messages in thread
From: Hyeonggon Yoo @ 2021-10-09  0:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Christoph Lameter, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka

Questions:

 - Is there a reason that SLUB does not implement cache coloring?
   It would help utilize the hardware cache, especially in the block
   layer, where they are literally *squeezing* out performance now.
 
 - In SLAB, do we really need to flush the queues every few seconds
   (the per-cpu queue and the shared queue)? Flushing alien caches makes
   sense, but flushing the queues seems to reduce how often the fastpath
   is hit. But yeah, we need to reclaim memory. Can we just defer this?

Idea:

  - I don't like SLAB's per-node cache coloring, because the L1 cache
    isn't shared between cpus. For now, cpus in the same node share
    its colour_next, but we can do better.

    What about moving some variables into a per-cpu structure, like
    SLUB's kmem_cache_cpu? I think cpu_cache, colour (and colour_next),
    alloc{hit,miss}, and free{hit,miss} can be per-cpu variables.
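
To make the colouring idea above concrete, here is a minimal user-space
sketch of a colour cursor. It is a simplified model, not the code in
mm/slab.c: the struct name, the colour_off field, and the helper are
assumptions made for illustration. Each new slab starts its first object
at a different offset, and whoever owns the cursor (a node today, a cpu in
the proposal) decides which cpus share the same cycle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a colour cursor; not the kernel's data layout. */
struct colour_state {
	unsigned int colour;      /* number of distinct colours available */
	unsigned int colour_off;  /* byte step between colours (assumed: one cache line) */
	unsigned int colour_next; /* colour to hand to the next new slab */
};

/* Return the byte offset for the next slab and advance the cursor. */
static size_t next_colour_offset(struct colour_state *s)
{
	size_t offset;

	assert(s->colour > 0);
	offset = (size_t)s->colour_next * s->colour_off;
	s->colour_next = (s->colour_next + 1) % s->colour; /* wrap around */
	return offset;
}
```

With one colour_state per node, every cpu in the node advances the same
cursor; with one per cpu, each cpu walks its own cycle independently.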


* Re: [RFC] Some questions and an idea on SLUB/SLAB
  2021-10-09  0:19 [RFC] Some questions and an idea on SLUB/SLAB Hyeonggon Yoo
@ 2021-10-09  0:33 ` Matthew Wilcox
  2021-10-09  0:40   ` Hyeonggon Yoo
                     ` (2 more replies)
  2021-10-11  7:13 ` [RFC] Some questions and an idea on SLUB/SLAB Christoph Lameter
  1 sibling, 3 replies; 7+ messages in thread
From: Matthew Wilcox @ 2021-10-09  0:33 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: linux-mm, linux-kernel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Sat, Oct 09, 2021 at 12:19:03AM +0000, Hyeonggon Yoo wrote:
>  - Is there a reason that SLUB does not implement cache coloring?
>    it will help utilizing hardware cache. Especially in block layer,
>    they are literally *squeezing* its performance now.

Have you tried turning off cache colouring in SLAB and seeing if
performance changes?  My impression is that it's useful for caches
with low associativity (direct mapped / 2-way / 4-way), but loses
its effectiveness for caches with higher associativity.  For example,
my laptop:

 L1 Data Cache: 48KB, 12-way associative, 64 byte line size
 L1 Instruction Cache: 32KB, 8-way associative, 64 byte line size
 L2 Unified Cache: 1280KB, 20-way associative, 64 byte line size
 L3 Unified Cache: 12288KB, 12-way associative, 64 byte line size

I very much doubt that cache colouring is still useful for this machine.
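
The associativity argument can be checked with a little arithmetic. The
sketch below assumes plain modulo set indexing (real last-level caches
often hash addresses, so this is only a model); the helper names are
hypothetical. The number of sets is size / (ways * line size), so the
48KB 12-way L1 above has 64 sets, and addresses 4096 bytes apart land in
the same set.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets in a set-associative cache. */
static size_t cache_sets(size_t size_bytes, size_t ways, size_t line_size)
{
	assert(ways > 0 && line_size > 0);
	return size_bytes / (ways * line_size);
}

/* Set that an address maps to, assuming simple modulo indexing. */
static size_t set_index(size_t addr, size_t sets, size_t line_size)
{
	assert(sets > 0 && line_size > 0);
	return (addr / line_size) % sets;
}
```

Under this model, uncoloured slabs that all start at page-aligned addresses
put their first objects in the same set, but a 12-way cache can hold 12
such lines at once before any of them evicts another.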


* Re: [RFC] Some questions and an idea on SLUB/SLAB
  2021-10-09  0:33 ` Matthew Wilcox
@ 2021-10-09  0:40   ` Hyeonggon Yoo
  2021-10-09  2:02   ` Hyeonggon Yoo
  2021-10-09 11:45   ` Almost no difference Hyeonggon Yoo
  2 siblings, 0 replies; 7+ messages in thread
From: Hyeonggon Yoo @ 2021-10-09  0:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Sat, Oct 09, 2021 at 01:33:43AM +0100, Matthew Wilcox wrote:
> On Sat, Oct 09, 2021 at 12:19:03AM +0000, Hyeonggon Yoo wrote:
> >  - Is there a reason that SLUB does not implement cache coloring?
> >    it will help utilizing hardware cache. Especially in block layer,
> >    they are literally *squeezing* its performance now.
> 
> Have you tried turning off cache colouring in SLAB and seeing if
> performance changes?  My impression is that it's useful for caches
> with low associativity (direct mapped / 2-way / 4-way), but loses
> its effectiveness for caches with higher associativity.  For example,
> my laptop:
> 
>  L1 Data Cache: 48KB, 12-way associative, 64 byte line size
>  L1 Instruction Cache: 32KB, 8-way associative, 64 byte line size
>  L2 Unified Cache: 1280KB, 20-way associative, 64 byte line size
>  L3 Unified Cache: 12288KB, 12-way associative, 64 byte line size
> 
> I very much doubt that cache colouring is still useful for this machine.

Hello Matthew,
What benchmark did you use for the test?

-
Hyeonggon


* Re: [RFC] Some questions and an idea on SLUB/SLAB
  2021-10-09  0:33 ` Matthew Wilcox
  2021-10-09  0:40   ` Hyeonggon Yoo
@ 2021-10-09  2:02   ` Hyeonggon Yoo
  2021-10-09 11:45   ` Almost no difference Hyeonggon Yoo
  2 siblings, 0 replies; 7+ messages in thread
From: Hyeonggon Yoo @ 2021-10-09  2:02 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Linux Memory Management List, LKML, Christoph Lameter,
	Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Vlastimil Babka


On Sat, Oct 9, 2021, 9:34 AM Matthew Wilcox <willy@infradead.org> wrote:

> On Sat, Oct 09, 2021 at 12:19:03AM +0000, Hyeonggon Yoo wrote:
> >  - Is there a reason that SLUB does not implement cache coloring?
> >    it will help utilizing hardware cache. Especially in block layer,
> >    they are literally *squeezing* its performance now.
>
> Have you tried turning off cache colouring in SLAB and seeing if
> performance changes?  My impression is that it's useful for caches
> with low associativity (direct mapped / 2-way / 4-way), but loses
> its effectiveness for caches with higher associativity.  For example,
> my laptop:
>
>  L1 Data Cache: 48KB, 12-way associative, 64 byte line size
>  L1 Instruction Cache: 32KB, 8-way associative, 64 byte line size
>  L2 Unified Cache: 1280KB, 20-way associative, 64 byte line size
>  L3 Unified Cache: 12288KB, 12-way associative, 64 byte line size
>
> I very much doubt that cache colouring is still useful for this machine.
>

And what was the result on that benchmark?

How many cores does your processor have?
And is it NUMA or UMA?

As I mentioned, the color scheme is shared between cpus in the same node.

I think we need to measure performance again after per-cpu coloring.



* Almost no difference
  2021-10-09  0:33 ` Matthew Wilcox
  2021-10-09  0:40   ` Hyeonggon Yoo
  2021-10-09  2:02   ` Hyeonggon Yoo
@ 2021-10-09 11:45   ` Hyeonggon Yoo
  2 siblings, 0 replies; 7+ messages in thread
From: Hyeonggon Yoo @ 2021-10-09 11:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Sat, Oct 09, 2021 at 01:33:43AM +0100, Matthew Wilcox wrote:
> On Sat, Oct 09, 2021 at 12:19:03AM +0000, Hyeonggon Yoo wrote:
> >  - Is there a reason that SLUB does not implement cache coloring?
> >    it will help utilizing hardware cache. Especially in block layer,
> >    they are literally *squeezing* its performance now.
> 
> Have you tried turning off cache colouring in SLAB and seeing if
> performance changes?  My impression is that it's useful for caches
> with low associativity (direct mapped / 2-way / 4-way), but loses
> its effectiveness for caches with higher associativity.  For example,
> my laptop:
> 
>  L1 Data Cache: 48KB, 12-way associative, 64 byte line size
>  L1 Instruction Cache: 32KB, 8-way associative, 64 byte line size
>  L2 Unified Cache: 1280KB, 20-way associative, 64 byte line size
>  L3 Unified Cache: 12288KB, 12-way associative, 64 byte line size
> 
> I very much doubt that cache colouring is still useful for this machine.

On my machine:
L1 Data Cache: 32KB, 8-way associative, 64 byte line size
L1 Instruction Cache: 32KB, 8-way associative, 64 byte line size
L2 Unified Cache: 1MB, 16-way associative, 64 byte line size
L3 Unified Cache: 33MB, 11-way associative, 64 byte line size


I ran hackbench with per-node coloring, per-cpu coloring, and without
coloring.

hackbench -g 100 -l 200000
without coloring: 2196.787
with per-node coloring: 2193.607
with per-cpu coloring: 2198.076

It seems there is almost no difference.
How much difference did you see on low-associativity processors?

Hmm... I'm going to search for a related paper.
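
To put the three hackbench numbers above on a common scale, the relative
difference against the uncoloured run can be computed as below. The helper
name is hypothetical, and the numbers are taken as reported, in whatever
unit hackbench printed.

```c
#include <assert.h>

/* Relative difference of value against baseline, in percent. */
static double relative_diff_percent(double baseline, double value)
{
	assert(baseline != 0.0);
	return (value - baseline) / baseline * 100.0;
}
```

For example, relative_diff_percent(2196.787, 2193.607) gives the per-node
result and relative_diff_percent(2196.787, 2198.076) the per-cpu result.
Both are well under 1%, and with a single reported run per configuration
that is hard to tell apart from run-to-run noise.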



* Re: [RFC] Some questions and an idea on SLUB/SLAB
  2021-10-09  0:19 [RFC] Some questions and an idea on SLUB/SLAB Hyeonggon Yoo
  2021-10-09  0:33 ` Matthew Wilcox
@ 2021-10-11  7:13 ` Christoph Lameter
  2021-10-13  3:44   ` Hyeonggon Yoo
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Lameter @ 2021-10-11  7:13 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: linux-mm, linux-kernel, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka

On Sat, 9 Oct 2021, Hyeonggon Yoo wrote:

>  - Is there a reason that SLUB does not implement cache coloring?
>    it will help utilizing hardware cache. Especially in block layer,
>    they are literally *squeezing* its performance now.

Well, as Matthew says: the high associativity of caches and the execution
of other code paths seem to make this not useful anymore.

I am sure you can find a benchmark that shows some benefit. But please
realize that in real life the OS must perform work. This means that
multiple other code paths are executed that affect cache use and placement
of data in cache lines.


>  - In SLAB, do we really need to flush queues every few seconds?
>    (per cpu queue and shared queue). Flushing alien caches makes
>    sense, but flushing queues seems reducing it's fastpath.
>    But yeah, we need to reclaim memory. can we just defer this?

The queues are designed to track cache hot objects (See the Bonwick
paper). After a while the cachelines will be used for other purposes and
no longer reflect what is in the caches. That is why they need to be
expired.
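
The expiry idea can be sketched as a periodic pass over a per-cpu queue.
This is a simplified user-space model, not the code in mm/slab.c: the
struct, the fixed QUEUE_LIMIT, and the "drop the oldest half" policy are
all assumptions chosen for illustration. A queue that was used since the
last pass is left alone; an idle one gives up its oldest entries, which
are the least likely to still be in the cpu cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_LIMIT 8 /* hypothetical capacity */

/* Hypothetical LIFO queue: entry[0] is oldest, entry[avail - 1] is newest. */
struct obj_queue {
	size_t avail;
	int touched; /* set by the alloc/free fastpath when the queue is used */
	void *entry[QUEUE_LIMIT];
};

/* One periodic pass; returns how many entries were expired. */
static size_t expire_queue(struct obj_queue *q)
{
	size_t tofree;

	assert(q->avail <= QUEUE_LIMIT);
	if (q->touched) {
		/* Used recently: entries are probably still hot, keep them. */
		q->touched = 0;
		return 0;
	}
	tofree = (q->avail + 1) / 2; /* oldest half, rounded up */
	/* A real allocator would return entry[0..tofree) to the slab lists here. */
	memmove(q->entry, q->entry + tofree,
		(q->avail - tofree) * sizeof(void *));
	q->avail -= tofree;
	return tofree;
}
```

In this model the pass also reclaims memory as a side effect, but what
triggers it is the queue going idle, not memory pressure.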


>   - I don't like SLAB's per-node cache coloring, because L1 cache
>     isn't shared between cpus. For now, cpus in same node are sharing
>     its colour_next - but we can do better.

This differs based on the cpu architecture in use. SLAB has an ideal model
of how caches work and keeps objects cache hot based on that. In real life
the cpu architecture differs from how SLAB thinks caches operate.

>     what about splitting some per-cpu variables into kmem_cache_cpu
>     like SLUB? I think cpu_cache, colour (and colour_next),
>     alloc{hit,miss}, and free{hit,miss} can be per-cpu variables.

That would in turn increase memory use and potentially the cache footprint
of the hot paths.
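
The memory cost can be estimated with a small sketch. The struct below is
hypothetical: it only gathers the fields named in the proposal (leaving
out cpu_cache) and does not match any kernel structure. The point is that
the footprint scales with the number of cpus times the number of slab
caches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cpu block with the fields named in the proposal. */
struct percpu_colour_stats {
	unsigned int colour_next;
	unsigned long allochit, allocmiss, freehit, freemiss;
};

/* Total bytes if every cache keeps one such block per cpu. */
static size_t percpu_footprint(size_t nr_cpus, size_t nr_caches)
{
	/* Guard against overflow in the multiplication. */
	assert(nr_caches == 0 ||
	       nr_cpus <= SIZE_MAX / nr_caches / sizeof(struct percpu_colour_stats));
	return nr_cpus * nr_caches * sizeof(struct percpu_colour_stats);
}
```

Calling percpu_footprint() with the cpu count and the number of caches on
a given system gives a rough lower bound; per-cpu allocations may add
alignment padding on top of this.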



* Re: [RFC] Some questions and an idea on SLUB/SLAB
  2021-10-11  7:13 ` [RFC] Some questions and an idea on SLUB/SLAB Christoph Lameter
@ 2021-10-13  3:44   ` Hyeonggon Yoo
  0 siblings, 0 replies; 7+ messages in thread
From: Hyeonggon Yoo @ 2021-10-13  3:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-mm, linux-kernel, Pekka Enberg, David Rientjes,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka


Hello Christoph, thank you for answering.

On Mon, Oct 11, 2021 at 09:13:52AM +0200, Christoph Lameter wrote:
> On Sat, 9 Oct 2021, Hyeonggon Yoo wrote:
> 
> >  - Is there a reason that SLUB does not implement cache coloring?
> >    it will help utilizing hardware cache. Especially in block layer,
> >    they are literally *squeezing* its performance now.
> 
> Well as Matthew says: The high associativity of caches 

It does not seem useful on either of my machines (4-way / 8-way set
associative).

> and the execution
> of other code path seems to make this not useful anymore.
> 
> I am sure you can find a benchmark that shows some benefit. But please
> realize that in real-life the OS must perform work. This means that
> multiple other code paths are executed that affect cache use and placement
> of data in cache lines.
> 

Cache coloring can make benchmark results better, but as slab uses more
cache lines, fewer cache lines are left for other code paths. Did I get
that right?

> 
> >  - In SLAB, do we really need to flush queues every few seconds?
> >    (per cpu queue and shared queue). Flushing alien caches makes
> >    sense, but flushing queues seems reducing it's fastpath.
> >    But yeah, we need to reclaim memory. can we just defer this?
> 
> The queues are designed to track cache hot objects (See the Bonwick
> paper). After a while the cachelines will be used for other purposes and
> no longer reflect what is in the caches. That is why they need to be
> expired.

I've read the Bonwick paper, but I thought expiring was needed for
reclaiming memory. Maybe I got it wrong; I should read it again.

> 
> 
> >   - I don't like SLAB's per-node cache coloring, because L1 cache
> >     isn't shared between cpus. For now, cpus in same node are sharing
> >     its colour_next - but we can do better.
> 
> This differs based on the cpu architecture in use. SLAB has an ideal model
> of how caches work and keeps objects cache hot based on that. In real life
> the cpu architecture differs from how SLAB thinks caches operate.
> 

So the point is that, as the cache hierarchy differs between architectures,
assuming that cpus have both a private per-cpu cache and a cache shared
among cpus can be a misfit on some architectures.

> >     what about splitting some per-cpu variables into kmem_cache_cpu
> >     like SLUB? I think cpu_cache, colour (and colour_next),
> >     alloc{hit,miss}, and free{hit,miss} can be per-cpu variables.
> 
> That would in turn increase memory use and potentially the cache footprint
> of the hot paths.
>

I thought splitting the per-cpu data was needed for coloring, but coloring
isn't useful, so that would be an unnecessary cost.

Thanks,
Hyeonggon.

