All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	netdev@vger.kernel.org, linux-mm@kvack.org,
	Christoph Lameter <cl@linux.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	penberg@kernel.org, Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	edumazet@google.com, pabeni@redhat.com,
	Matthew Wilcox <willy@infradead.org>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	David Sterba <dsterba@suse.com>
Subject: Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for SLAB_NEVER_MERGE
Date: Wed, 31 May 2023 13:08:17 -0700	[thread overview]
Message-ID: <ZHepMQybpBcKLgVg@P9FQF9L96D.corp.robot.car> (raw)
In-Reply-To: <81597717-0fed-5fd0-37d0-857d976b9d40@suse.cz>

On Wed, May 31, 2023 at 02:03:05PM +0200, Vlastimil Babka wrote:
> On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
> > Allow API users of kmem_cache_create to specify that they don't want
> > any slab merge or aliasing (with similar sized objects). Use this in
> > network stack and kfence_test.
> > 
> > The SKB (sk_buff) kmem_cache slab is critical for network performance.
> > Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> > performance by amortising the alloc/free cost.
> > 
> > For the bulk API to perform efficiently the slub fragmentation need to
> > be low. Especially for the SLUB allocator, the efficiency of bulk free
> > API depend on objects belonging to the same slab (page).
> > 
> > When running different network performance microbenchmarks, I started
> > to notice that performance was reduced (slightly) when machines had
> > longer uptimes. I believe the cause was 'skbuff_head_cache' got
> > aliased/merged into the general slub for 256 bytes sized objects (with
> > my kernel config, without CONFIG_HARDENED_USERCOPY).
> > 
> > For SKB kmem_cache network stack have reasons for not merging, but it
> > varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> > We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> Since this idea was revived by David [1], and neither patch worked as is,
> but yours was more complete and first, I have fixed it up as below. The
> skbuff part itself will be best submitted separately afterwards so we don't
> get conflicts between trees etc. Comments?
> 
> ----8<----
> From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Tue, 17 Jan 2023 14:40:00 +0100
> Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
> 
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merge or aliasing (with similar sized objects). Use this in
> kfence_test.
> 
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
> 
> For the bulk API to perform efficiently the slub fragmentation need to
> be low. Especially for the SLUB allocator, the efficiency of bulk free
> API depend on objects belonging to the same slab (page).
> 
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was 'skbuff_head_cache' got
> aliased/merged into the general slub for 256 bytes sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
> 
> For SKB kmem_cache network stack have reasons for not merging, but it
> varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NO_MERGE for this kmem_cache.

I believe it's also good for the visibility: having a separate entity in
/proc/slabinfo for skbuff_head_cache can be really useful.

> 
> Another use case for the flag has been described by David Sterba [1]:
> 
> > This can be used for more fine grained control over the caches or for
> > debugging builds where separate slabs can verify that no objects leak.
> 
> > The slab_nomerge boot option is too coarse and would need to be
> > enabled on all testing hosts. There are some other ways how to disable
> > merging, e.g. a slab constructor but this disables poisoning besides
> > that it adds additional overhead. Other flags are internal and may
> > have other semantics.
> 
> > A concrete example what motivates the flag. During 'btrfs balance'
> > slab top reported huge increase in caches like
> 
> >  1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
> >  1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
> >  8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
> 
> > which was confusing and that it's because of slab merging was not the
> > first idea.  After rebooting with slab_nomerge all the caches were
> > from btrfs_ namespace as expected.
> 
> [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
> 
> [ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the
>   one proposed by David so it does not collide with internal SLAB/SLUB
>   flags, write a comment for the flag, expand changelog, drop the skbuff
>   part to be handled spearately ]
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
both for this patch and the corresponding change on the networking side.

Thanks!

      parent reply	other threads:[~2023-05-31 20:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 13:40 [PATCH RFC] mm+net: allow to set kmem_cache create flag for SLAB_NEVER_MERGE Jesper Dangaard Brouer
2023-01-17 14:54 ` Christoph Lameter
2023-01-18  5:17   ` Matthew Wilcox
2023-01-19 18:08     ` Jesper Dangaard Brouer
2023-01-24 16:06   ` Hyeonggon Yoo
2023-01-18  7:36 ` Vlastimil Babka
2023-01-23 16:14   ` Jesper Dangaard Brouer
2023-05-31 12:03 ` Vlastimil Babka
2023-05-31 13:59   ` Jesper Dangaard Brouer
2023-05-31 20:08   ` Roman Gushchin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZHepMQybpBcKLgVg@P9FQF9L96D.corp.robot.car \
    --to=roman.gushchin@linux.dev \
    --cc=42.hyeyoo@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=brouer@redhat.com \
    --cc=cl@linux.com \
    --cc=davem@davemloft.net \
    --cc=dsterba@suse.com \
    --cc=edumazet@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kuba@kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=penberg@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.