LKML Archive on lore.kernel.org
From: Roman Gushchin <guro@fb.com>
To: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Shakeel Butt <shakeelb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, Linux MM <linux-mm@kvack.org>,
	Kernel Team <kernel-team@fb.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH v6 00/19] The new cgroup slab memory controller
Date: Thu, 18 Jun 2020 18:27:12 -0700
Message-ID: <20200619012712.GD135965@carbon.dhcp.thefacebook.com>
In-Reply-To: <20200618104344.6a96ac04@carbon>

On Thu, Jun 18, 2020 at 10:43:44AM +0200, Jesper Dangaard Brouer wrote:
> On Wed, 17 Jun 2020 18:29:28 -0700
> Roman Gushchin <guro@fb.com> wrote:
> 
> > On Wed, Jun 17, 2020 at 01:24:21PM +0200, Vlastimil Babka wrote:
> > > On 6/17/20 5:32 AM, Roman Gushchin wrote:  
> > > > On Tue, Jun 16, 2020 at 08:05:39PM -0700, Shakeel Butt wrote:  
> > > >> On Tue, Jun 16, 2020 at 7:41 PM Roman Gushchin <guro@fb.com> wrote:  
> > > >> >
> > > >> > On Tue, Jun 16, 2020 at 06:46:56PM -0700, Shakeel Butt wrote:  
> > > >> > > On Mon, Jun 8, 2020 at 4:07 PM Roman Gushchin <guro@fb.com> wrote:  
> > > >> > > >  
> > > >> [...]  
> > > >> > >
> > > >> > > Have you performed any [perf] testing on SLAB with this patchset?  
> > > >> >
> > > >> > The accounting part is the same for SLAB and SLUB, so there should be no
> > > >> > significant difference. I've checked that it compiles, boots and passes
> > > >> > kselftests. And that memory savings are there.
> > > >> >  
> > > >> 
> > > >> What about performance? Also you mentioned that sharing kmem-cache
> > > >> between accounted and non-accounted can have additional overhead. Any
> > > >> difference between SLAB and SLUB for such a case?  
> > > > 
> > > > Not really.
> > > > 
> > > > > Sharing a single set of caches adds some overhead to root- and non-accounted
> > > > > allocations, which is something I've tried hard to avoid in my original version.
> > > > > But I have to admit, it allows us to simplify and remove a lot of code, and here
> > > > > it's hard to argue with Johannes, who pushed for this design.
> > > > 
> > > > > With performance testing it's not that easy, because it's not obvious what
> > > > > we wanna test. Obviously, per-object accounting is more expensive, and
> > > > > measuring something like 1000000 allocations and deallocations in a row from
> > > > > a single kmem_cache will show a regression. But in the real world the relative
> > > > > cost of allocations is usually low, and we can get some benefits from a smaller
> > > > > working set and from having shared kmem_cache objects cache hot.
> > > > > Not to mention some extra memory and the fragmentation reduction.
> > > > 
> > > > > We've done extensive testing of the original version in Facebook production,
> > > > > and we haven't noticed any regressions so far. But I have to admit, we were
> > > > > using the original version with two sets of kmem_caches.
> > > > 
> > > > If you have any specific tests in mind, I can definitely run them. Or if you
> > > > can help with the performance evaluation, I'll appreciate it a lot.  
> > > 
> > > Jesper provided some pointers here [1], it would be really great if you could
> > > run at least those microbenchmarks. With mmtests it's the major question of
> > > which subset/profiles to run, maybe the referenced commits provide some hints,
> > > or maybe Mel could suggest what he used to evaluate SLAB vs SLUB not so long ago.
> > > 
> > > [1] https://lore.kernel.org/linux-mm/20200527103545.4348ac10@carbon/  
> > 
> > Oh, Jesper, I'm really sorry, somehow I missed your mail.
> > Thank you, Vlastimil, for pointing at it.
> > 
> > I've got some results (slab_bulk_test01), but honestly I fail to interpret them.
> > 
> > > I ran original vs patched with SLUB and SLAB, each test several times and picked
> > > 3 which looked most consistent. But it still looks very noisy.
> > 
> > I ran them on my desktop (8-core Ryzen 1700, 16 GB RAM, Fedora 32),
> > it's 5.8-rc1 + slab controller v6 vs 5.8-rc1 (default config from Fedora 32).
> 
> What about running these tests on the server-level hardware that you
> intend to run this on?

I'm going to backport this version to the kernel version we're using internally
and will come up with more numbers soon.

> 
> > 
> > How should I interpret this data?
> 
> First of all, these SLUB+SLAB microbenchmarks use an object size of 256 bytes,
> because the network stack allocates objects of this size for SKBs/sk_buff (due
> to cache alignment, as the used size is 224 bytes). Checked SLUB: each slab uses
> 2 pages (8192 bytes) and contains 32 objects of size 256 (256*32=8192).
> 
> The SLUB allocator has a per-CPU slab which speeds up fast reuse, in this
> case of up to 32 objects. For SLUB, the "fastpath reuse" test exercises this
> behaviour, and it serves as a baseline for optimal 1-object performance (which
> my bulk API tries to beat; that is possible even for 1 object due to knowing
> that the bulk API cannot be used from IRQ context).
> 
> SLUB fastpath: 3 measurements reporting cycles(tsc)
>  - SLUB-patched : fastpath reuse: 184 - 177 - 176  cycles(tsc)
>  - SLUB-original: fastpath reuse: 178 - 153 - 156  cycles(tsc)
> 
> There are some stability concerns, as you mention, but it seems pretty
> consistent that the patched version is slower. If you compile with
> no-PREEMPT you can likely get more stable results (and remove a slight
> overhead for the SLUB fastpath).
> 
> The microbenchmark also measures the bulk API, which is AFAIK only used
> by the network stack (and io_uring). I guess you shouldn't focus too much
> on these bulk measurements. When the bulk API crosses the objects-per-slab
> threshold, or is unlucky and uses two per-CPU slabs, the
> measurements can fluctuate a bit.
> 
> Your numbers for SLUB bulk-API:
> 
> SLUB-patched - bulk-API
>  - SLUB-patched : bulk_quick_reuse objects=1 : 187 -  90 - 224  cycles(tsc)
>  - SLUB-patched : bulk_quick_reuse objects=2 : 110 -  53 - 133  cycles(tsc)
>  - SLUB-patched : bulk_quick_reuse objects=3 :  88 -  95 -  42  cycles(tsc)
>  - SLUB-patched : bulk_quick_reuse objects=4 :  91 -  85 -  36  cycles(tsc)
>  - SLUB-patched : bulk_quick_reuse objects=8 :  32 -  66 -  32  cycles(tsc)
> 
> SLUB-original -  bulk-API
>  - SLUB-original: bulk_quick_reuse objects=1 :  87 -  87 - 142  cycles(tsc)
>  - SLUB-original: bulk_quick_reuse objects=2 :  52 -  53 -  53  cycles(tsc)
>  - SLUB-original: bulk_quick_reuse objects=3 :  42 -  42 -  91  cycles(tsc)
>  - SLUB-original: bulk_quick_reuse objects=4 :  91 -  37 -  37  cycles(tsc)
>  - SLUB-original: bulk_quick_reuse objects=8 :  31 -  79 -  76  cycles(tsc)
> 
> Maybe it is just noise or instability in the measurements, but it seems that the
> 1-object case is consistently slower in your patched version.
> 
> Mail is too long now... I'll take a look at your SLAB results and follow up.
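
The slab geometry quoted above (224 bytes used, 256-byte objects, 2-page slabs, 32 objects per slab) can be checked with a few lines of arithmetic. This is only a sketch: the 4 KiB page size and 64-byte cache line are assumptions not stated in the thread, the helper names are made up for illustration, and per-object metadata or debug padding is ignored.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB pages
CACHE_LINE = 64    # assumption: 64-byte cache lines


def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def objects_per_slab(used_size, pages_per_slab,
                     alignment=CACHE_LINE, page_size=PAGE_SIZE):
    """Return (aligned object size, objects that fit in one slab).

    Simplification: no per-object metadata, red zones or debug padding.
    """
    object_size = align_up(used_size, alignment)
    return object_size, (pages_per_slab * page_size) // object_size


if __name__ == "__main__":
    size, count = objects_per_slab(used_size=224, pages_per_slab=2)
    print(f"object size: {size} bytes, objects per slab: {count}")
```

Under these assumptions the helper reproduces the figures in the quoted mail: 224 bytes round up to 256, and 8192 / 256 gives 32 objects per slab.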


Thank you very much for helping with the analysis!

So does it mean you're looking at the smallest number in each series?
If so, the difference is not that big?
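
To make the "smallest number in each series" reading concrete, here is a minimal sketch that takes the minimum of the three runs for each case and reports the relative change. The cycle counts are copied from the quoted results; picking the minimum as the summary statistic is an assumption about how the data is being read, and the function names are hypothetical.

```python
# Cycle counts (tsc) copied from the quoted SLUB results, three runs per case.
RESULTS = {
    "fastpath reuse":             {"patched": (184, 177, 176), "original": (178, 153, 156)},
    "bulk_quick_reuse objects=1": {"patched": (187, 90, 224),  "original": (87, 87, 142)},
    "bulk_quick_reuse objects=2": {"patched": (110, 53, 133),  "original": (52, 53, 53)},
    "bulk_quick_reuse objects=3": {"patched": (88, 95, 42),    "original": (42, 42, 91)},
    "bulk_quick_reuse objects=4": {"patched": (91, 85, 36),    "original": (91, 37, 37)},
    "bulk_quick_reuse objects=8": {"patched": (32, 66, 32),    "original": (31, 79, 76)},
}


def best_case_delta(runs):
    """Return (min patched, min original, relative change in percent)."""
    patched = min(runs["patched"])
    original = min(runs["original"])
    return patched, original, 100.0 * (patched - original) / original


def summarize(results=RESULTS):
    """Apply best_case_delta() to every benchmark case."""
    return {name: best_case_delta(runs) for name, runs in results.items()}


if __name__ == "__main__":
    for name, (p, o, pct) in summarize().items():
        print(f"{name:28s} patched={p:4d} original={o:4d} delta={pct:+6.1f}%")
```

Read this way, the fastpath minimum is 176 vs 153 cycles (about 15% slower with the patch), while the bulk minima differ by at most 3 cycles in either direction. With only three noisy runs per case this is an illustration, not a statistically sound comparison.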

Theoretically speaking it should get worse (especially for non-root allocations),
but if the difference is not big, the overall result should still be better, because
there is a big expected win from memory savings, a smaller working set, less
fragmentation, etc.

The only thing I'm slightly worried about is the effect on root allocations
if we're sharing slab caches between root- and non-root allocations. Because if
someone depends that much on the allocation speed, memcg-based accounting can be
ignored anyway. For most users the cost of allocation is negligible.
That's why the patch which merges root- and memcg slab caches is put on top
and can be reverted if somebody complains.

Thank you!

