* [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
@ 2020-02-22  9:24 Wen Yang
  2020-02-24  1:29   ` Christopher Lameter
  0 siblings, 1 reply; 10+ messages in thread
From: Wen Yang @ 2020-02-22  9:24 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton
  Cc: Wen Yang, Roman Gushchin, Xunlei Pang, linux-mm, linux-kernel

In the cloud server scenario, reading "/proc/slabinfo" can possibly
block slab allocation on another CPU for a while, up to 200ms in extreme
cases. If the slab objects carry network packets targeting a far-end
disk array, this causes block IO jitter issues.

This is because the list_lock, which protects the node partial list,
is taken when counting the free objects resident in that list. This
introduces lock contention when page(s) are moved between the CPU and
node partial lists in the allocation path on another CPU.

We also observed that in this scenario, CONFIG_SLUB_CPU_PARTIAL is turned
on by default, and count_partial() is useless because the returned number
is far from reality.

Therefore, we can simply return 0, then nr_free is also 0, and eventually
active_objects == total_objects. We do not introduce any regression, and
it's preferable to show the unrealistic uniform 100% slab utilization
rather than some very high but incorrect value.

Co-developed-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Xunlei Pang <xlpang@linux.alibaba.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/slub.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 17dc00e..d5b7230 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2411,14 +2411,16 @@ static inline unsigned long node_nr_objs(struct kmem_cache_node *n)
 static unsigned long count_partial(struct kmem_cache_node *n,
 					int (*get_count)(struct page *))
 {
-	unsigned long flags;
 	unsigned long x = 0;
+#ifndef CONFIG_SLUB_CPU_PARTIAL
+	unsigned long flags;
 	struct page *page;
 
 	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry(page, &n->partial, slab_list)
 		x += get_count(page);
 	spin_unlock_irqrestore(&n->list_lock, flags);
+#endif
 	return x;
 }
 #endif /* CONFIG_SLUB_DEBUG || CONFIG_SYSFS */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
  2020-02-22  9:24 [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL Wen Yang
@ 2020-02-24  1:29   ` Christopher Lameter
  0 siblings, 0 replies; 10+ messages in thread
From: Christopher Lameter @ 2020-02-24  1:29 UTC (permalink / raw)
  To: Wen Yang
  Cc: Pekka Enberg, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Xunlei Pang, linux-mm, linux-kernel

On Sat, 22 Feb 2020, Wen Yang wrote:

> We also observed that in this scenario, CONFIG_SLUB_CPU_PARTIAL is turned
> on by default, and count_partial() is useless because the returned number
> is far from the reality.

Well, it's not useless. It's just not counting the partial objects in
per-cpu partial slabs. Those are counted by a different counter.

> Therefore, we can simply return 0, then nr_free is also 0, and eventually
> active_objects == total_objects. We do not introduce any regression, and
> it's preferable to show the unrealistic uniform 100% slab utilization
> rather than some very high but incorrect value.

I suggest that you simply use the number of partial slabs and multiply
them by the number of objects in a slab and use that as a value. Both
values are readily available via /sys/kernel/slab/<...>/

You don't need a patch to do that.



* Re: [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
  2020-02-24  1:29   ` Christopher Lameter
@ 2020-02-24 16:57   ` Roman Gushchin
  2020-02-25 15:49     ` Vlastimil Babka
  2020-02-26 18:31       ` Christopher Lameter
  -1 siblings, 2 replies; 10+ messages in thread
From: Roman Gushchin @ 2020-02-24 16:57 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Wen Yang, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Xunlei Pang, linux-mm, linux-kernel

On Mon, Feb 24, 2020 at 01:29:09AM +0000, Christoph Lameter wrote:
> On Sat, 22 Feb 2020, Wen Yang wrote:

Hello, Christopher!

> 
> > We also observed that in this scenario, CONFIG_SLUB_CPU_PARTIAL is turned
> > on by default, and count_partial() is useless because the returned number
> > is far from the reality.
> 
> Well, it's not useless. It's just not counting the partial objects in
> per-cpu partial slabs. Those are counted by a different counter.

Do you mean CPU_PARTIAL_ALLOC or something else?

"useless" isn't the most accurate wording, sorry for that.

The point is that the number of active objects displayed in /proc/slabinfo
is misleading if percpu partial lists are used. So it's strange to pay
for it by potentially slowing down concurrent allocations.

> 
> > Therefore, we can simply return 0, then nr_free is also 0, and eventually
> > active_objects == total_objects. We do not introduce any regression, and
> > it's preferable to show the unrealistic uniform 100% slab utilization
> > rather than some very high but incorrect value.
> 
> I suggest that you simply use the number of partial slabs and multiply
> them by the number of objects in a slab and use that as a value. Both
> values are readily available via /sys/kernel/slab/<...>/

So maybe something like this?

@@ -5907,7 +5907,9 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
 	for_each_kmem_cache_node(s, node, n) {
 		nr_slabs += node_nr_slabs(n);
 		nr_objs += node_nr_objs(n);
+#ifndef CONFIG_SLUB_CPU_PARTIAL
 		nr_free += count_partial(n, count_free);
+#endif
 	}
 
 	sinfo->active_objs = nr_objs - nr_free;


Thank you!


* Re: [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
  2020-02-24 16:57   ` Roman Gushchin
@ 2020-02-25 15:49     ` Vlastimil Babka
  2020-02-26 18:31       ` Christopher Lameter
  1 sibling, 0 replies; 10+ messages in thread
From: Vlastimil Babka @ 2020-02-25 15:49 UTC (permalink / raw)
  To: Roman Gushchin, Christopher Lameter
  Cc: Wen Yang, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Xunlei Pang, linux-mm, linux-kernel

On 2/24/20 5:57 PM, Roman Gushchin wrote:
> On Mon, Feb 24, 2020 at 01:29:09AM +0000, Christoph Lameter wrote:
>> On Sat, 22 Feb 2020, Wen Yang wrote:
> 
> Hello, Christopher!
> 
>>
>>> We also observed that in this scenario, CONFIG_SLUB_CPU_PARTIAL is turned
>>> on by default, and count_partial() is useless because the returned number
>>> is far from the reality.
>>
>> Well, it's not useless. It's just not counting the partial objects in
>> per-cpu partial slabs. Those are counted by a different counter.
> 
> Do you mean CPU_PARTIAL_ALLOC or something else?
> 
> "useless" isn't the most accurate wording, sorry for that.
> 
> The point is that the number of active objects displayed in /proc/slabinfo
> is misleading if percpu partial lists are used. So it's strange to pay
> for it by potentially slowing down concurrent allocations.

Hmm, I wonder... kmem_cache_cpu has those quite detailed stats with
CONFIG_SLUB_STATS. Could perhaps the number of free object be
reconstructed from them by adding up / subtracting the relevant items
across all CPUs? Expensive, but the cost would be taken by the
/proc/slabinfo reader, without blocking anyone.

Then again, CONFIG_SLUB_STATS is disabled by default. But the same
percpu mechanism could be used to create some "stats light" variant that
doesn't count everything, just what's needed to track number of free
objects. Percpu should mean the atomic inc/decs wouldn't cause much
contention...

It's certainly useful to have an idea of slab fragmentation (low inuse
vs total object) from /proc/slabinfo. But if that remains available via
/sys/kernel/slab/ then I guess it's fine... until all continuous
monitoring tools that now read /proc/slabinfo periodically start reading
all those /sys/kernel/slab/ files periodically...


* Re: [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
  2020-02-24 16:57   ` Roman Gushchin
@ 2020-02-26 18:31       ` Christopher Lameter
  2020-02-26 18:31       ` Christopher Lameter
  1 sibling, 0 replies; 10+ messages in thread
From: Christopher Lameter @ 2020-02-26 18:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Wen Yang, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Xunlei Pang, linux-mm, linux-kernel

On Mon, 24 Feb 2020, Roman Gushchin wrote:

> > I suggest that you simply use the number of partial slabs and multiply
> > them by the number of objects in a slab and use that as a value. Both
> > values are readily available via /sys/kernel/slab/<...>/
>
> So maybe something like this?
>
> @@ -5907,7 +5907,9 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
>  	for_each_kmem_cache_node(s, node, n) {
>  		nr_slabs += node_nr_slabs(n);
>  		nr_objs += node_nr_objs(n);
> +#ifndef CONFIG_SLUB_CPU_PARTIAL
>  		nr_free += count_partial(n, count_free);
> +#endif
>  	}

Why would not having cpu partials screw up the counting of objects in
partial slabs?


You don't need kernel mods for this. The numbers are exposed already in
/sys/kernel/slab/xxx.




* Re: [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
  2020-02-26 18:31       ` Christopher Lameter
@ 2020-02-27 18:35       ` Roman Gushchin
  2020-03-03 16:05           ` Christopher Lameter
  -1 siblings, 1 reply; 10+ messages in thread
From: Roman Gushchin @ 2020-02-27 18:35 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Wen Yang, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Xunlei Pang, linux-mm, linux-kernel

On Wed, Feb 26, 2020 at 06:31:28PM +0000, Christoph Lameter wrote:
> On Mon, 24 Feb 2020, Roman Gushchin wrote:
> 
> > > I suggest that you simply use the number of partial slabs and multiply
> > > them by the number of objects in a slab and use that as a value. Both
> > > values are readily available via /sys/kernel/slab/<...>/
> >
> > So maybe something like this?
> >
> > @@ -5907,7 +5907,9 @@ void get_slabinfo(struct kmem_cache *s, struct slabinfo *sinfo)
> >  	for_each_kmem_cache_node(s, node, n) {
> >  		nr_slabs += node_nr_slabs(n);
> >  		nr_objs += node_nr_objs(n);
> > +#ifndef CONFIG_SLUB_CPU_PARTIAL
> >  		nr_free += count_partial(n, count_free);
> > +#endif
> >  	}
> 
> Why would not having cpu partials screws up the counting of objects in
> partial slabs?
> 
> 
> You dont need kernel mods for this. The numbers are exposed already in
> /sys/kernel/slab/xxx.

Stepping a little bit back, there are two independent problems:

1) Reading /proc/slabinfo can cause latency spikes in concurrent memory allocations.
   This is the problem which Wen raised initially.

2) The number of active objects as displayed in /proc/slabinfo is misleadingly
   big if CONFIG_SLUB_CPU_PARTIAL is set.

Maybe mixing them in a single workaround wasn't the best idea, but what do you
suggest instead?

Thank you!

Roman


* Re: [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL
  2020-02-27 18:35       ` Roman Gushchin
@ 2020-03-03 16:05           ` Christopher Lameter
  0 siblings, 0 replies; 10+ messages in thread
From: Christopher Lameter @ 2020-03-03 16:05 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Wen Yang, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Xunlei Pang, linux-mm, linux-kernel

On Thu, 27 Feb 2020, Roman Gushchin wrote:

> 1) Reading /proc/slabinfo can cause latency spikes in concurrent memory allocations.
>    This is the problem which Wen raised initially.

OK. Maybe cache the values, or do not read /proc/slabinfo? Use
/sys/kernel/slab/... instead?

> 2) The number of active objects as displayed in /proc/slabinfo is misleadingly
>    big if CONFIG_SLUB_CPU_PARTIAL is set.

OK, that could be fixed by counting the partial slabs when computing the
/proc/slabinfo output, but that would increase the times even more.

> Maybe mixing them in a single workaround wasn't the best idea, but what do you
> suggest instead?

Read select values from /sys/kernel/slab/.... to determine the values you
need, and avoid reading those that cause latency spikes. For example, use
the number of slabs to estimate the number of objects, instead of the
exact object count, which requires scanning through all the individual
slab pages.
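[Editor's note: a rough sketch of that advice. The kmalloc-256 cache name
and the estimate_objs helper are illustrative; slabs and objs_per_slab are
SLUB sysfs attribute names, but verify them on your kernel.]

```shell
#!/bin/sh
# Estimate the object count of a SLUB cache from cheap sysfs attributes,
# avoiding the expensive per-page scan behind /proc/slabinfo.
estimate_objs() {
	dir="$1"
	slabs=$(cat "$dir/slabs")
	objs_per_slab=$(cat "$dir/objs_per_slab")
	echo $((slabs * objs_per_slab))
}

# Example cache; pick any directory under /sys/kernel/slab.
if [ -d /sys/kernel/slab/kmalloc-256 ]; then
	estimate_objs /sys/kernel/slab/kmalloc-256
fi
```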



end of thread, other threads:[~2020-03-03 16:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-22  9:24 [PATCH] mm/slub: improve count_partial() for CONFIG_SLUB_CPU_PARTIAL Wen Yang
2020-02-24  1:29 ` Christopher Lameter
2020-02-24 16:57   ` Roman Gushchin
2020-02-25 15:49     ` Vlastimil Babka
2020-02-26 18:31     ` Christopher Lameter
2020-02-27 18:35       ` Roman Gushchin
2020-03-03 16:05         ` Christopher Lameter
