linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Xunlei Pang <xlpang@linux.alibaba.com>,
	Christoph Lameter <cl@linux.com>,
	Pekka Enberg <penberg@kernel.org>, Roman Gushchin <guro@fb.com>,
	Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Shu Ming <sming56@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Wen Yang <wenyang@linux.alibaba.com>,
	James Wang <jnwang@linux.alibaba.com>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH v3 0/4] mm/slub: Fix count_partial() problem
Date: Mon, 15 Mar 2021 19:49:57 +0100	[thread overview]
Message-ID: <793c884a-9d60-baaf-fab8-3e5f4a024124@suse.cz> (raw)
In-Reply-To: <1615303512-35058-1-git-send-email-xlpang@linux.alibaba.com>

On 3/9/21 4:25 PM, Xunlei Pang wrote:
> count_partial() can hold n->list_lock spinlock for quite long, which
> makes much trouble to the system. This series eliminate this problem.

Before I check the details, I have two high-level comments:

- patch 1 introduces some counting scheme that patch 4 then changes, could we do
this in one step to avoid the churn?

- the series addresses the concern that spinlock is being held, but doesn't
address the fact that counting partial per-node slabs is not nearly enough if we
want accurate <active_objs> in /proc/slabinfo because there are also percpu
slabs and per-cpu partial slabs, where we don't track the free objects at all.
So after this series while the readers of /proc/slabinfo won't block the
spinlock, they will get the same garbage data as before. So Christoph is not
wrong to say that we can just report active_objs == num_objs and it won't
actually break any ABI.
At the same time somebody might actually want accurate object statistics at the
expense of peak performance, and it would be nice to give them such option in
SLUB. Right now we don't provide this accuracy even with CONFIG_SLUB_STATS,
although that option provides many additional tuning stats, with additional
overhead.
So my proposal would be a new config for "accurate active objects" (or just tie
it to CONFIG_SLUB_DEBUG?) that would extend the approach of percpu counters in
patch 4 to all alloc/free, so that it includes percpu slabs. Without this config
enabled, let's just report active_objs == num_objs.

Vlastimil

> v1->v2:
> - Improved changelog and variable naming for PATCH 1~2.
> - PATCH3 adds per-cpu counter to avoid performance regression
>   in concurrent __slab_free().
> 
> v2->v3:
> - Changed "page->inuse" to the safe "new.inuse", etc.
> - Used CONFIG_SLUB_DEBUG and CONFIG_SYSFS condition for new counters.
> - atomic_long_t -> unsigned long
> 
> [Testing]
> There seems might be a little performance impact under extreme
> __slab_free() concurrent calls according to my tests.
> 
> On my 32-cpu 2-socket physical machine:
> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
> 
> 1) perf stat --null --repeat 10 -- hackbench 20 thread 20000
> 
> == original, no patched
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> 
>       24.536050899 seconds time elapsed                                          ( +-  0.24% )
> 
> 
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> 
>       24.588049142 seconds time elapsed                                          ( +-  0.35% )
> 
> 
> == patched with patch1~4
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> 
>       24.670892273 seconds time elapsed                                          ( +-  0.29% )
> 
> 
> Performance counter stats for 'hackbench 20 thread 20000' (10 runs):
> 
>       24.746755689 seconds time elapsed                                          ( +-  0.21% )
> 
> 
> 2) perf stat --null --repeat 10 -- hackbench 32 thread 20000
> 
> == original, no patched
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> 
>       39.784911855 seconds time elapsed                                          ( +-  0.14% )
> 
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> 
>       39.868687608 seconds time elapsed                                          ( +-  0.19% )
> 
> == patched with patch1~4
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> 
>       39.681273015 seconds time elapsed                                          ( +-  0.21% )
> 
>  Performance counter stats for 'hackbench 32 thread 20000' (10 runs):
> 
>       39.681238459 seconds time elapsed                                          ( +-  0.09% )
> 
> 
> Xunlei Pang (4):
>   mm/slub: Introduce two counters for partial objects
>   mm/slub: Get rid of count_partial()
>   percpu: Export per_cpu_sum()
>   mm/slub: Use percpu partial free counter
> 
>  include/linux/percpu-defs.h   |  10 ++++
>  kernel/locking/percpu-rwsem.c |  10 ----
>  mm/slab.h                     |   4 ++
>  mm/slub.c                     | 120 +++++++++++++++++++++++++++++-------------
>  4 files changed, 97 insertions(+), 47 deletions(-)
> 


  parent reply	other threads:[~2021-03-15 18:50 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-09 15:25 [PATCH v3 0/4] mm/slub: Fix count_partial() problem Xunlei Pang
2021-03-09 15:25 ` [PATCH v3 1/4] mm/slub: Introduce two counters for partial objects Xunlei Pang
2021-03-09 15:25 ` [PATCH v3 2/4] mm/slub: Get rid of count_partial() Xunlei Pang
2021-03-09 15:25 ` [PATCH v3 3/4] percpu: Export per_cpu_sum() Xunlei Pang
2021-03-09 15:25 ` [PATCH v3 4/4] mm/slub: Use percpu partial free counter Xunlei Pang
2021-03-15 18:49 ` Vlastimil Babka [this message]
2021-03-15 19:05   ` [PATCH v3 0/4] mm/slub: Fix count_partial() problem Roman Gushchin
2021-03-15 19:22     ` Yang Shi
2021-03-16 10:07       ` Christoph Lameter
2021-03-16 10:32         ` Vlastimil Babka
2021-03-16 10:42   ` Xunlei Pang
2021-03-16 11:02     ` Vlastimil Babka
2021-03-16 11:49       ` Xunlei Pang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=793c884a-9d60-baaf-fab8-3e5f4a024124@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=guro@fb.com \
    --cc=jnwang@linux.alibaba.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=sming56@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=wenyang@linux.alibaba.com \
    --cc=willy@infradead.org \
    --cc=xlpang@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).