* [PATCH v2] mm/slub: disable slab merging in the default configuration
@ 2023-06-29 22:19 Julian Pidancet
  2023-07-03  0:09 ` David Rientjes
From: Julian Pidancet @ 2023-06-29 22:19 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
	Andrew Morton, Vlastimil Babka
  Cc: Roman Gushchin, Hyeonggon Yoo, linux-mm, Jonathan Corbet,
	linux-doc, linux-kernel, Matthew Wilcox, Kees Cook,
	Rafael Aquini, Julian Pidancet

Make CONFIG_SLAB_MERGE_DEFAULT default to n unless CONFIG_SLUB_TINY is
enabled. The benefits of slab merging are limited on systems that are not
memory constrained: the memory overhead of keeping caches separate is
low, and evidence of merging's effect on cache hotness is hard to come by.

On the other hand, keeping allocations of different types in distinct
slabs makes attacks that rely on "heap spraying" harder to carry out
successfully.

Side with security in the default kernel configuration over questionable
performance and memory-efficiency benefits.
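
Which caches end up merged can be inspected at runtime. A minimal way to
look, assuming SLUB's sysfs interface is available (merged caches appear
as symlinked aliases under /sys/kernel/slab):

  slabinfo -a                            # report which caches were merged
  ls -l /sys/kernel/slab | grep -- '->'  # aliases show up as symlinks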

A timed kernel compilation test on x86 with 4K pages, run 10 times with
slab_merge and then 10 times with slab_nomerge on the same hardware in a
similar state, shows no sign of a performance hit either way (times in
seconds):

      | slab_merge       | slab_nomerge     |
------+------------------+------------------|
Time  |  588.080 ± 0.799 |  587.308 ± 1.411 |
Min   |          586.267 |          584.640 |
Max   |          589.248 |          590.091 |

Peaks in slab usage during the test workload reveal a memory overhead
of 2.2 MiB when using slab_nomerge. Slab usage overhead after a fresh boot
amounts to 2.3 MiB:

Slab Usage         | slab_merge | slab_nomerge |
-------------------+------------+--------------|
After fresh boot   |   79908 kB |     82284 kB |
During test (peak) |  127940 kB |    130204 kB |
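
A figure of this form can be read directly from /proc/meminfo (whether
that is the exact source of the numbers above is an assumption):

  grep '^Slab:' /proc/meminfo    # total slab usage in kB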

Signed-off-by: Julian Pidancet <julian.pidancet@oracle.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---

v2:
  - Re-run benchmark to minimize variance in results due to CPU
    frequency scaling.
  - Record slab usage after boot and peaks during the test workload.
  - Include benchmark results in commit message.
  - Fix typo: s/MEGE/MERGE/.
  - Specify that "overhead" refers to memory overhead in SLUB doc.

v1:
  - Link: https://lore.kernel.org/linux-mm/20230627132131.214475-1-julian.pidancet@oracle.com/

 .../admin-guide/kernel-parameters.txt         | 29 ++++++++++---------
 Documentation/mm/slub.rst                     |  7 +++--
 mm/Kconfig                                    |  6 ++--
 3 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c5e7bb4babf0..7e78471a96b7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5652,21 +5652,22 @@
 
 	slram=		[HW,MTD]
 
-	slab_merge	[MM]
-			Enable merging of slabs with similar size when the
-			kernel is built without CONFIG_SLAB_MERGE_DEFAULT.
-
 	slab_nomerge	[MM]
-			Disable merging of slabs with similar size. May be
-			necessary if there is some reason to distinguish
-			allocs to different slabs, especially in hardened
-			environments where the risk of heap overflows and
-			layout control by attackers can usually be
-			frustrated by disabling merging. This will reduce
-			most of the exposure of a heap attack to a single
-			cache (risks via metadata attacks are mostly
-			unchanged). Debug options disable merging on their
-			own.
+			Disable merging of slabs with similar size when
+			the kernel is built with CONFIG_SLAB_MERGE_DEFAULT.
+			Allocations of the same size made in distinct
+			caches will be placed in separate slabs. In
+			hardened environments, the risk of heap overflows
+			and layout control by attackers can usually be
+			frustrated by disabling merging.
+
+	slab_merge	[MM]
+			Enable merging of slabs with similar size. May be
+			useful to reduce memory overhead or increase the
+			cache hotness of objects, at the cost of no longer
+			confining the exposure of a heap attack to a
+			single cache (risks via metadata attacks are
+			mostly unchanged).
 			For more information see Documentation/mm/slub.rst.
 
 	slab_max_order=	[MM, SLAB]
diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index be75971532f5..0e2ce82177c0 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -122,9 +122,10 @@ used on the wrong slab.
 Slab merging
 ============
 
-If no debug options are specified then SLUB may merge similar slabs together
-in order to reduce overhead and increase cache hotness of objects.
-``slabinfo -a`` displays which slabs were merged together.
+If the kernel is built with ``CONFIG_SLAB_MERGE_DEFAULT`` or if ``slab_merge``
+is specified on the kernel command line, then SLUB may merge similar slabs
+together in order to reduce memory overhead and increase cache hotness of
+objects.  ``slabinfo -a`` displays which slabs were merged together.
 
 Slab validation
 ===============
diff --git a/mm/Kconfig b/mm/Kconfig
index 7672a22647b4..05b0304302d4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -255,7 +255,7 @@ config SLUB_TINY
 
 config SLAB_MERGE_DEFAULT
 	bool "Allow slab caches to be merged"
-	default y
+	default n
 	depends on SLAB || SLUB
 	help
 	  For reduced kernel memory fragmentation, slab caches can be
@@ -264,8 +264,8 @@ config SLAB_MERGE_DEFAULT
 	  overwrite objects from merged caches (and more easily control
 	  cache layout), which makes such heap attacks easier to exploit
 	  by attackers. By keeping caches unmerged, these kinds of exploits
-	  can usually only damage objects in the same cache. To disable
-	  merging at runtime, "slab_nomerge" can be passed on the kernel
+	  can usually only damage objects in the same cache. To enable
+	  merging at runtime, "slab_merge" can be passed on the kernel
 	  command line.
 
 config SLAB_FREELIST_RANDOM
-- 
2.40.1



* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-06-29 22:19 [PATCH v2] mm/slub: disable slab merging in the default configuration Julian Pidancet
@ 2023-07-03  0:09 ` David Rientjes
  2023-07-03 10:33   ` Julian Pidancet
From: David Rientjes @ 2023-07-03  0:09 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini


On Fri, 30 Jun 2023, Julian Pidancet wrote:

> Make CONFIG_SLAB_MERGE_DEFAULT default to n unless CONFIG_SLUB_TINY is
> enabled. The benefits of slab merging are limited on systems that are
> not memory constrained: the memory overhead of keeping caches separate
> is low, and evidence of merging's effect on cache hotness is hard to
> come by.
> 
> On the other hand, keeping allocations of different types in distinct
> slabs makes attacks that rely on "heap spraying" harder to carry out
> successfully.
> 
> Side with security in the default kernel configuration over
> questionable performance and memory-efficiency benefits.
> 
> A timed kernel compilation test on x86 with 4K pages, run 10 times with
> slab_merge and then 10 times with slab_nomerge on the same hardware in
> a similar state, shows no sign of a performance hit either way (times
> in seconds):
> 
>       | slab_merge       | slab_nomerge     |
> ------+------------------+------------------|
> Time  |  588.080 ± 0.799 |  587.308 ± 1.411 |
> Min   |          586.267 |          584.640 |
> Max   |          589.248 |          590.091 |
> 
> Peaks in slab usage during the test workload reveal a memory overhead
> of 2.2 MiB when using slab_nomerge. Slab usage overhead after a fresh boot
> amounts to 2.3 MiB:
> 
> Slab Usage         | slab_merge | slab_nomerge |
> -------------------+------------+--------------|
> After fresh boot   |   79908 kB |     82284 kB |
> During test (peak) |  127940 kB |    130204 kB |
> 
> Signed-off-by: Julian Pidancet <julian.pidancet@oracle.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>

Thanks for continuing to work on this.

I think we need more data beyond just kernbench.  Christoph's point about 
different page sizes is interesting.  In the above results, I don't know 
the page orders for the various slab caches that this workload will 
stress.  I think the memory overhead data may be different depending on 
how slab_max_order is being used, if at all.

We should be able to run this through a variety of different benchmarks 
and measure peak slab usage at the same time for due diligence.  I support 
the change in the default, I would just prefer to know what the 
implications of it are.

Is it possible to collect data for other microbenchmarks and real-world 
workloads?  And perhaps also with different page sizes where this will 
impact memory overhead more?  I can help running more workloads once we 
have the next set of data.
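
Capturing peak usage can be as simple as sampling /proc/meminfo in the
background while a benchmark runs. A rough sketch, where ./run-benchmark
stands in for whatever workload is being measured:

  # Log the slab counters once a second for the duration of the run
  ( while sleep 1; do
        grep -E '^(SReclaimable|SUnreclaim):' /proc/meminfo
    done > slab.log ) &
  sampler=$!
  ./run-benchmark        # hypothetical driver for the workload under test
  kill "$sampler"
  # Report the peak of each counter, in kB
  awk '{ if ($2 > max[$1]) max[$1] = $2 }
       END { for (c in max) print c, max[c], "kB" }' slab.log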

> [...]
> diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
> index be75971532f5..0e2ce82177c0 100644
> --- a/Documentation/mm/slub.rst
> +++ b/Documentation/mm/slub.rst
> @@ -122,9 +122,10 @@ used on the wrong slab.
>  Slab merging
>  ============
>  
> -If no debug options are specified then SLUB may merge similar slabs together
> -in order to reduce overhead and increase cache hotness of objects.
> -``slabinfo -a`` displays which slabs were merged together.
> +If the kernel is built with ``CONFIG_SLAB_MERGE_DEFAULT`` or if ``slab_merge``
> +is specified on the kernel command line, then SLUB may merge similar slabs
> +together in order to reduce memory overhead and increase cache hotness of
> +objects.  ``slabinfo -a`` displays which slabs were merged together.
>  

Suggest mentioning that one of the primary goals of slab cache merging is 
to reduce cache footprint.


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-03  0:09 ` David Rientjes
@ 2023-07-03 10:33   ` Julian Pidancet
  2023-07-03 18:38     ` Kees Cook
  2023-07-03 20:17     ` David Rientjes
From: Julian Pidancet @ 2023-07-03 10:33 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini


On Mon Jul 3, 2023 at 02:09, David Rientjes wrote:
> I think we need more data beyond just kernbench.  Christoph's point about 
> different page sizes is interesting.  In the above results, I don't know 
> the page orders for the various slab caches that this workload will 
> stress.  I think the memory overhead data may be different depending on 
> how slab_max_order is being used, if at all.
>
> We should be able to run this through a variety of different benchmarks 
> and measure peak slab usage at the same time for due diligence.  I support 
> the change in the default, I would just prefer to know what the 
> implications of it are.
>
> Is it possible to collect data for other microbenchmarks and real-world 
> workloads?  And perhaps also with different page sizes where this will 
> impact memory overhead more?  I can help running more workloads once we 
> have the next set of data.
>

David,

I agree about the need to perform those tests on hardware using larger
pages. I will collect data if I have the chance to get my hands on one
of these systems.

Do you have specific tests or workloads in mind? Compiling the kernel
with files sitting on an XFS partition is not exhaustive, but it is the
only test I could think of that is both easy to set up and reproducible
while keeping external interference to a minimum.
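
Roughly, the harness boils down to something like this (a sketch rather
than the exact script; the tree, config and job count are assumptions,
and each iteration rebuilds from a clean tree):

  cd linux && make defconfig
  for i in $(seq 1 10); do
      make clean
      /usr/bin/time -f "%e" -a -o build-times.txt make -j"$(nproc)" vmlinux
  done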

-- 
Julian


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-03 10:33   ` Julian Pidancet
@ 2023-07-03 18:38     ` Kees Cook
  2023-07-03 20:17     ` David Rientjes
From: Kees Cook @ 2023-07-03 18:38 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: David Rientjes, Christoph Lameter, Lameter, Christopher,
	Pekka Enberg, Joonsoo Kim, Andrew Morton, Vlastimil Babka,
	Roman Gushchin, Hyeonggon Yoo, linux-mm, Jonathan Corbet,
	linux-doc, linux-kernel, Matthew Wilcox, Rafael Aquini

On Mon, Jul 03, 2023 at 12:33:25PM +0200, Julian Pidancet wrote:
> On Mon Jul 3, 2023 at 02:09, David Rientjes wrote:
> > [...]
> 
> David,
> 
> I agree about the need to perform those tests on hardware using larger
> pages. I will collect data if I have the chance to get my hands on one
> of these systems.
> 
> Do you have specific tests or workloads in mind? Compiling the kernel
> with files sitting on an XFS partition is not exhaustive, but it is the
> only test I could think of that is both easy to set up and reproducible
> while keeping external interference to a minimum.

I think it is a sufficiently complicated heap allocation workload (and
real-world). I'd prefer we get this change landed in -next after -rc1 so
we can see if there are any regressions reported by the 0day and other
CI performance tests.

-Kees

-- 
Kees Cook


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-03 10:33   ` Julian Pidancet
  2023-07-03 18:38     ` Kees Cook
@ 2023-07-03 20:17     ` David Rientjes
  2023-07-06  7:38       ` David Rientjes
  2023-07-10 14:56       ` Vlastimil Babka
From: David Rientjes @ 2023-07-03 20:17 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini

On Mon, 3 Jul 2023, Julian Pidancet wrote:

> On Mon Jul 3, 2023 at 02:09, David Rientjes wrote:
> > [...]
> 
> David,
> 
> I agree about the need to perform those tests on hardware using larger
> pages. I will collect data if I have the chance to get my hands on one
> of these systems.
> 

Thanks.  I think arm64 should suffice for things like 64KB pages that 
Christoph was referring to.

We may also want to play around with slub_min_order on the kernel command 
line since that will inflate the size of slab pages and we may see some 
different results because of the increased page size.
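
For example, something along these lines (slub_min_order and
slab_nomerge are existing, documented parameters; the value 3 is only an
illustration, forcing at least order-3, i.e. 32KB, slab pages with a 4KB
base page):

  # Appended to the kernel command line
  slub_min_order=3 slab_nomerge

  # After boot: verify, then inspect the resulting page order of a cache
  cat /proc/cmdline
  cat /sys/kernel/slab/kmalloc-256/order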

> Do you have specific tests or workloads in mind? Compiling the kernel
> with files sitting on an XFS partition is not exhaustive, but it is the
> only test I could think of that is both easy to set up and reproducible
> while keeping external interference to a minimum.
> 

The ones that Binder, cc'd, used to evaluate SLAB vs SLUB memory overhead:

hackbench
netperf
redis
specjbb2015
unixbench
will-it-scale

And Vlastimil had also suggested a few XFS specific benchmarks.
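
For the hackbench runs, something like the following via perf's bundled
copy of it (a sketch; the group and loop counts are illustrative):

  perf bench sched messaging -p -g 234 -l 10000   # process/pipe mode
  perf bench sched messaging -t -g 234 -l 10000   # thread mode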

I can try to help run benchmarks that you're not able to run or if you 
can't get your hands on an arm64 system.

Additionally, I wouldn't consider this to be super urgent: slab cache 
merging has been this way for several years, we have some time to do an 
assessment of the implications of changing an important aspect of kernel 
memory allocation that will affect everybody.  I agree with the patch if 
we can make it work, I'd just like to study the effect of it more fully 
beyond some kernbench runs.


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-03 20:17     ` David Rientjes
@ 2023-07-06  7:38       ` David Rientjes
  2023-07-09  8:55         ` David Rientjes
  2023-07-10 14:56       ` Vlastimil Babka
From: David Rientjes @ 2023-07-06  7:38 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini

On Mon, 3 Jul 2023, David Rientjes wrote:

> hackbench

Running hackbench on Skylake with v6.1.30 (A) and v6.1.30 + your patch 
(B), for example:

              LABEL             | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV   |   DIRECTION    
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
  SReclaimable                  |       |            |            |            |            |           |                
  (A) v6.1.30                   | 11    | 129480.000 | 233208.000 | 189936.364 | 204316.000 | 31465.625 |                
  (B) <same sha>                | 11    | 139084.000 | 236772.000 | 198931.273 | 213672.000 | 30013.204 |                
                                |       | +7.42%     | +1.53%     | +4.74%     | +4.58%     | -4.62%    | <not defined>  
  SUnreclaim                    |       |            |            |            |            |           |                
  (A) v6.1.30                   | 11    | 305400.000 | 538744.000 | 422148.000 | 449344.000 | 65005.045 |                
  (B) <same sha>                | 11    | 305780.000 | 518300.000 | 422219.636 | 450252.000 | 61245.137 |                
                                |       | +0.12%     | -3.79%     | +0.02%     | +0.20%     | -5.78%    | <not defined>  

The amount of reclaimable slab significantly increases, which is likely
not a problem because, well, it's reclaimable.  But I suspect we'll find
other interesting data points with the other suggested benchmarks.

And benchmark results:

              LABEL             | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV   |   DIRECTION    
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
  hackbench_process_pipes_234   |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 1.735      | 1.979      | 1.831      | 1.835      | 0.086291  |                
  (B) <same sha>                | 7     | 1.687      | 2.023      | 1.886      | 1.911      | 0.10276   |                
                                |       | -2.77%     | +2.22%     | +3.00%     | +4.14%     | +19.09%   | <not defined>  
  hackbench_process_pipes_max   |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 1.735      | 1.979      | 1.831      | 1.835      | 0.086291  |                
  (B) <same sha>                | 7     | 1.687      | 2.023      | 1.886      | 1.911      | 0.10276   |                
                                |       | -2.77%     | +2.22%     | +3.00%     | +4.14%     | +19.09%   | - is good      
  hackbench_process_sockets_234 |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 7.883      | 7.909      | 7.899      | 7.899      | 0.0087808 |                
  (B) <same sha>                | 7     | 7.872      | 7.961      | 7.907      | 7.904      | 0.028019  |                
                                |       | -0.14%     | +0.66%     | +0.10%     | +0.06%     | +219.09%  | <not defined>  
  hackbench_process_sockets_max |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 7.883      | 7.909      | 7.899      | 7.899      | 0.0087808 |                
  (B) <same sha>                | 7     | 7.872      | 7.961      | 7.907      | 7.904      | 0.028019  |                
                                |       | -0.14%     | +0.66%     | +0.10%     | +0.06%     | +219.09%  | - is good      
  hackbench_thread_pipes_234    |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 2.146      | 2.677      | 2.410      | 2.418      | 0.18143   |                
  (B) <same sha>                | 7     | 2.016      | 2.514      | 2.268      | 2.241      | 0.17474   |                
                                |       | -6.06%     | -6.09%     | -5.88%     | -7.32%     | -3.69%    | <not defined>  
  hackbench_thread_pipes_max    |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 2.146      | 2.677      | 2.410      | 2.418      | 0.18143   |               
  (B) <same sha>                | 7     | 2.016      | 2.514      | 2.268      | 2.241      | 0.17474   |                
                                |       | -6.06%     | -6.09%     | -5.88%     | -7.32%     | -3.69%    | - is good      
  hackbench_thread_sockets_234  |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 8.025      | 8.127      | 8.084      | 8.085      | 0.029755  |                
  (B) <same sha>                | 7     | 7.990      | 8.093      | 8.042      | 8.035      | 0.035152  |                
                                |       | -0.44%     | -0.42%     | -0.53%     | -0.62%     | +18.14%   | <not defined>  
  hackbench_thread_sockets_max  |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 8.025      | 8.127      | 8.084      | 8.085      | 0.029755  |                
  (B) <same sha>                | 7     | 7.990      | 8.093      | 8.042      | 8.035      | 0.035152  |                
                                |       | -0.44%     | -0.42%     | -0.53%     | -0.62%     | +18.14%   | - is good    


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-06  7:38       ` David Rientjes
@ 2023-07-09  8:55         ` David Rientjes
  2023-07-10  2:40           ` David Rientjes
From: David Rientjes @ 2023-07-09  8:55 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini

On Thu, 6 Jul 2023, David Rientjes wrote:

> On Mon, 3 Jul 2023, David Rientjes wrote:
> 
> > hackbench
> 
> Running hackbench on Skylake with v6.1.30 (A) and v6.1.30 + your patch 
> (B), for example:
> 
> [...]

My takeaway from running half a dozen benchmarks on Intel is that
performance is more impacted than slab memory usage.  There are slight
regressions in memory usage, but only measurable for SReclaimable, which
is the better form to regress (as opposed to SUnreclaim).

There are some substantial performance degradations, most notably 
context_switch1_per_thread_ops which regressed ~21%.  I'll need to repeat
that test to confirm it and can also try on cascadelake if it reproduces.

There are some more negligible redis, specjbb, and will-it-scale
regressions which don't look terribly concerning.

I'll try running performance tests on AMD Zen3 and also ARM with
PAGE_SIZE == 4KB and 64KB.

Unixbench memory usage and performance are within +/- 1% for every
metric, so they are not presented here.

Full results for Skylake, removing results where the mean is within
+/- 1% of baseline:

============================== MEMORY USAGE ==============================

hackbench
              LABEL             | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV   |   DIRECTION    
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
  SReclaimable                  |       |            |            |            |            |           |                
  (A) v6.1.30                   | 11    | 129480.000 | 233208.000 | 189936.364 | 204316.000 | 31465.625 |                
  (B) v6.1.30 slab_nomerge      | 11    | 139084.000 | 236772.000 | 198931.273 | 213672.000 | 30013.204 |                
                                |       | +7.42%     | +1.53%     | +4.74%     | +4.58%     | -4.62%    | - is good

redis
             LABEL             | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV   |   DIRECTION    
-------------------------------+-------+------------+------------+------------+------------+-----------+----------------
  SReclaimable                 |       |            |            |            |            |           |                
  (A) v6.1.30                  | 298   | 137056.000 | 238664.000 | 226005.477 | 226940.000 | 8109.328  |                
  (B) v6.1.30 slab_nomerge     | 302   | 139664.000 | 242664.000 | 229096.689 | 230098.000 | 8215.134  |                
                               |       | +1.90%     | +1.68%     | +1.37%     | +1.39%     | +1.30%    | - is good 

specjbb2015
               LABEL               | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV  |   DIRECTION    
-----------------------------------+-------+------------+------------+------------+------------+----------+----------------
  SReclaimable                     |       |            |            |            |            |          |                
  (A) v6.1.30                      | 1602  | 118344.000 | 217932.000 | 203559.618 | 205372.000 | 5314.410 |                
  (B) v6.1.30 slab_nomerge         | 1655  | 128000.000 | 222536.000 | 208099.973 | 209396.000 | 4608.582 |                
                                   |       | +8.16%     | +2.11%     | +2.23%     | +1.96%     | -13.28%  | - is good 

============================== PERFORMANCE ==============================

hackbench
              LABEL             | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV   |   DIRECTION    
--------------------------------+-------+------------+------------+------------+------------+-----------+----------------
  hackbench_process_pipes_234   |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 1.735      | 1.979      | 1.831      | 1.835      | 0.086291  |                
  (B) v6.1.30 slab_nomerge      | 7     | 1.687      | 2.023      | 1.886      | 1.911      | 0.10276   |                
                                |       | -2.77%     | +2.22%     | +3.00%     | +4.14%     | +19.09%   | - is good 
  hackbench_thread_pipes_234    |       |            |            |            |            |           |                
  (A) v6.1.30                   | 7     | 2.146      | 2.677      | 2.410      | 2.418      | 0.18143   |                
  (B) v6.1.30 slab_nomerge      | 7     | 2.016      | 2.514      | 2.268      | 2.241      | 0.17474   |                
                                |       | -6.06%     | -6.09%     | -5.88%     | -7.32%     | -3.69%    | - is good

redis
             LABEL             | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV   |   DIRECTION    
-------------------------------+-------+------------+------------+------------+------------+-----------+----------------
  redis_medium_max_INCR        |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 108695.660 | 112637.980 | 110639.626 | 109757.440 | 1668.190  |                
  (B) v6.1.30 slab_nomerge     | 5     | 101853.740 | 106564.370 | 104166.478 | 104942.800 | 1833.377  |                
                               |       | -6.29%     | -5.39%     | -5.85%     | -4.39%     | +9.90%    | + is good      
  redis_medium_max_LPOP        |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 102944.200 | 108471.630 | 105572.750 | 106303.820 | 2016.986  |                
  (B) v6.1.30 slab_nomerge     | 5     | 101471.340 | 104231.810 | 103361.688 | 104090.770 | 1064.277  |                
                               |       | -1.43%     | -3.91%     | -2.09%     | -2.08%     | -47.23%   | + is good      
  redis_medium_max_LPUSH       |       |            |            |            |            |           |                
  (A) v6.1.30                  | 10    | 99255.590  | 108295.430 | 105960.440 | 106338.120 | 2553.802  |                
  (B) v6.1.30 slab_nomerge     | 10    | 100130.160 | 107032.000 | 104335.070 | 105091.705 | 2169.708  |                
                               |       | +0.88%     | -1.17%     | -1.53%     | -1.17%     | -15.04%   | + is good      
  redis_medium_max_LRANGE_100  |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 72427.030  | 73046.020  | 72671.814  | 72626.910  | 202.812   |                
  (B) v6.1.30 slab_nomerge     | 5     | 70811.500  | 72030.540  | 71519.286  | 71761.750  | 450.918   |                
                               |       | -2.23%     | -1.39%     | -1.59%     | -1.19%     | +122.33%  | + is good      
  redis_medium_max_MSET_10     |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 87642.420  | 89798.850  | 89044.390  | 89102.740  | 769.933   |                
  (B) v6.1.30 slab_nomerge     | 5     | 85287.840  | 89758.550  | 87876.598  | 88386.070  | 1641.608  |                
                               |       | -2.69%     | -0.04%     | -1.31%     | -0.80%     | +113.21%  | + is good      
  redis_medium_max_PING_BULK   |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 101729.400 | 108189.980 | 105003.228 | 105307.490 | 2171.756  |                
  (B) v6.1.30 slab_nomerge     | 5     | 100553.050 | 105340.770 | 102561.464 | 101947.190 | 1789.953  |                
                               |       | -1.16%     | -2.63%     | -2.33%     | -3.19%     | -17.58%   | + is good      
  redis_medium_max_PING_INLINE |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 102522.050 | 107503.770 | 105209.902 | 106033.300 | 1981.499  |                
  (B) v6.1.30 slab_nomerge     | 5     | 97541.950  | 107319.170 | 103729.414 | 104854.780 | 3304.256  |                
                               |       | -4.86%     | -0.17%     | -1.41%     | -1.11%     | +66.76%   | + is good      
  redis_medium_max_SET         |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 105663.570 | 112283.850 | 108917.118 | 109469.070 | 2663.234  |                
  (B) v6.1.30 slab_nomerge     | 5     | 103071.540 | 106723.590 | 105128.226 | 106179.660 | 1666.892  |                
                               |       | -2.45%     | -4.95%     | -3.48%     | -3.00%     | -37.41%   | + is good      
  redis_medium_max_SPOP        |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 104079.940 | 107238.610 | 105140.616 | 104964.840 | 1150.370  |                
  (B) v6.1.30 slab_nomerge     | 5     | 102637.790 | 103885.300 | 103343.934 | 103412.620 | 437.159   |                
                               |       | -1.39%     | -3.13%     | -1.71%     | -1.48%     | -62.00%   | + is good      
   redis_small_max_INCR        |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 98814.230  | 114942.530 | 107744.856 | 108813.920 | 6150.540  |                
  (B) v6.1.30 slab_nomerge     | 5     | 99800.400  | 109529.020 | 104451.708 | 104058.270 | 3732.461  |                
                               |       | +1.00%     | -4.71%     | -3.06%     | -4.37%     | -39.31%   | + is good      
  redis_small_max_LPOP         |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 104275.290 | 118764.840 | 108648.192 | 106951.880 | 5208.918  |                
  (B) v6.1.30 slab_nomerge     | 5     | 97560.980  | 115074.800 | 103120.496 | 99800.400  | 6353.203  |                
                               |       | -6.44%     | -3.11%     | -5.09%     | -6.69%     | +21.97%   | + is good      
  redis_small_max_LRANGE_100   |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 67980.970  | 72992.700  | 71589.644  | 72150.070  | 1832.810  |                
  (B) v6.1.30 slab_nomerge     | 5     | 64977.260  | 72046.110  | 70273.716  | 71684.590  | 2680.854  |                
                               |       | -4.42%     | -1.30%     | -1.84%     | -0.65%     | +46.27%   | + is good       
  redis_small_max_MSET_10      |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 90497.730  | 106044.540 | 100756.422 | 102880.660 | 5455.768  |                
  (B) v6.1.30 slab_nomerge     | 5     | 97276.270  | 106951.880 | 102818.856 | 102880.660 | 3293.135  |                
                               |       | +7.49%     | +0.86%     | +2.05%     | +0.00%     | -39.64%   | + is good        
  redis_small_max_PING_INLINE  |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 96153.850  | 108459.870 | 102493.414 | 102459.020 | 4995.757  |                
  (B) v6.1.30 slab_nomerge     | 5     | 84317.030  | 116144.020 | 99995.920  | 98039.220  | 11045.861 |                
                               |       | -12.31%    | +7.08%     | -2.44%     | -4.31%     | +121.10%  | + is good      
  redis_small_max_SADD         |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 106044.540 | 115606.940 | 109804.052 | 110375.270 | 3451.251  |                
  (B) v6.1.30 slab_nomerge     | 5     | 95693.780  | 109769.480 | 102329.518 | 102249.490 | 4602.161  |                
                               |       | -9.76%     | -5.05%     | -6.81%     | -7.36%     | +33.35%   | + is good      
  redis_small_max_SET          |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 91911.760  | 116686.120 | 104509.200 | 102354.150 | 8993.532  |                
  (B) v6.1.30 slab_nomerge     | 5     | 100502.520 | 113636.370 | 108815.700 | 109649.120 | 4750.002  |                
                               |       | +9.35%     | -2.61%     | +4.12%     | +7.13%     | -47.18%   | + is good      
  redis_small_max_SPOP         |       |            |            |            |            |           |                
  (A) v6.1.30                  | 5     | 96899.230  | 108695.650 | 103648.652 | 104931.800 | 3901.567  |                
  (B) v6.1.30 slab_nomerge     | 5     | 93457.940  | 108108.110 | 101680.560 | 101626.020 | 5096.944  |                
                               |       | -3.55%     | -0.54%     | -1.90%     | -3.15%     | +30.64%   | + is good 

specjbb2015
               LABEL               | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   |  STDDEV  |   DIRECTION    
-----------------------------------+-------+------------+------------+------------+------------+----------+----------------
  specjbb2015_single_Critical_JOPS |       |            |            |            |            |          |                
  (A) v6.1.30                      | 1     | 46294.000  | 46294.000  | 46294.000  | 46294.000  | 0        |                
  (B) v6.1.30 slab_nomerge         | 1     | 46167.000  | 46167.000  | 46167.000  | 46167.000  | 0        |                
                                   |       | -0.27%     | -0.27%     | -0.27%     | -0.27%     | ---      | + is good      
  specjbb2015_single_Max_JOPS      |       |            |            |            |            |          |                
  (A) v6.1.30                      | 1     | 68842.000  | 68842.000  | 68842.000  | 68842.000  | 0        |                
  (B) v6.1.30 slab_nomerge         | 1     | 67801.000  | 67801.000  | 67801.000  | 67801.000  | 0        |                
                                   |       | -1.51%     | -1.51%     | -1.51%     | -1.51%     | ---      | + is good   
                                   
vm-scalability
                 LABEL                 | COUNT |       MIN       |       MAX       |      MEAN       |     MEDIAN      |    STDDEV     | DIRECTION  
---------------------------------------+-------+-----------------+-----------------+-----------------+-----------------+---------------+------------
  300s_128G_truncate_throughput        |       |                 |                 |                 |                 |               |            
  (A) v6.1.30                          | 15    | 16398714804.000 | 17010339870.000 | 16772025703.867 | 16834675132.000 | 232697088.501 |            
  (B) v6.1.30 slab_nomerge             | 15    | 16704416343.000 | 17271437122.000 | 16948419991.200 | 16821799877.000 | 233146680.475 |            
                                       |       | +1.86%          | +1.53%          | +1.05%          | -0.08%          | +0.19%        | + is good  
  300s_512G_anon_wx_rand_mt_throughput |       |                 |                 |                 |                 |               |            
  (A) v6.1.30                          | 15    | 7198561.000     | 7359712.000     | 7263944.200     | 7259418.000     | 50394.115     |            
  (B) v6.1.30 slab_nomerge             | 15    | 7191842.000     | 7628158.000     | 7390629.000     | 7407204.000     | 171602.612    |            
                                       |       | -0.09%          | +3.65%          | +1.74%          | +2.04%          | +240.52%      | + is good  

will-it-scale
               LABEL               | COUNT |     MIN      |     MAX      |     MEAN     |    MEDIAN    |  STDDEV   |   DIRECTION    
-----------------------------------+-------+--------------+--------------+--------------+--------------+-----------+----------------
  context_switch1_per_thread_ops   |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 324721.000   | 324721.000   | 324721.000   | 324721.000   | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 255999.000   | 255999.000   | 255999.000   | 255999.000   | 0         |                
    !! REGRESSED !!                |       | -21.16%      | -21.16%      | -21.16%      | -21.16%      | ---       | + is good      
  getppid1_scalability             |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 0.71943      | 0.71943      | 0.71943      | 0.71943      | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 0.70923      | 0.70923      | 0.70923      | 0.70923      | 0         |                
                                   |       | -1.42%       | -1.42%       | -1.42%       | -1.42%       | ---       | + is good      
  mmap1_scalability                |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 0.18831      | 0.18831      | 0.18831      | 0.18831      | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 0.18413      | 0.18413      | 0.18413      | 0.18413      | 0         |                
                                   |       | -2.22%       | -2.22%       | -2.22%       | -2.22%       | ---       | + is good      
  poll2_scalability                |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 0.45608      | 0.45608      | 0.45608      | 0.45608      | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 0.44207      | 0.44207      | 0.44207      | 0.44207      | 0         |                
                                   |       | -3.07%       | -3.07%       | -3.07%       | -3.07%       | ---       | + is good      
  pthread_mutex1_scalability       |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 0.45207      | 0.45207      | 0.45207      | 0.45207      | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 0.44194      | 0.44194      | 0.44194      | 0.44194      | 0         |                
                                   |       | -2.24%       | -2.24%       | -2.24%       | -2.24%       | ---       | + is good      
  pthread_mutex2_per_process_ops   |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 36292960.000 | 36292960.000 | 36292960.000 | 36292960.000 | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 35446930.000 | 35446930.000 | 35446930.000 | 35446930.000 | 0         |                
                                   |       | -2.33%       | -2.33%       | -2.33%       | -2.33%       | ---       | + is good      
  signal1_scalability              |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 0.55541      | 0.55541      | 0.55541      | 0.55541      | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 0.54773      | 0.54773      | 0.54773      | 0.54773      | 0         |                
                                   |       | -1.38%       | -1.38%       | -1.38%       | -1.38%       | ---       | + is good      
  unix1_scalability                |       |              |              |              |              |           |                
  (A) v6.1.30                      | 1     | 0.55085      | 0.55085      | 0.55085      | 0.55085      | 0         |                
  (B) v6.1.30 slab_nomerge         | 1     | 0.53957      | 0.53957      | 0.53957      | 0.53957      | 0         |                
                                   |       | -2.05%       | -2.05%       | -2.05%       | -2.05%       | ---       | + is good   


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-09  8:55         ` David Rientjes
@ 2023-07-10  2:40           ` David Rientjes
  2023-07-18 12:08             ` Julian Pidancet
From: David Rientjes @ 2023-07-10  2:40 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini

On Sun, 9 Jul 2023, David Rientjes wrote:

> There are some substantial performance degradations, most notably 
> context_switch1_per_thread_ops which regressed ~21%.  I'll need to repeat
> that test to confirm it and can also try on cascadelake if it reproduces.
> 

So the regression on skylake for will-it-scale appears to be real:

               LABEL              | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   | STDDEV | DIRECTION  
----------------------------------+-------+------------+------------+------------+------------+--------+------------
  context_switch1_per_thread_ops  |       |            |            |            |            |        |            
  (A) v6.1.30                     | 1     | 314507.000 | 314507.000 | 314507.000 | 314507.000 | 0      |            
  (B) v6.1.30 slab_nomerge        | 1     | 257403.000 | 257403.000 | 257403.000 | 257403.000 | 0      |            
    !! REGRESSED !!               |       | -18.16%    | -18.16%    | -18.16%    | -18.16%    | ---    | + is good  

but I can't reproduce this on cascadelake:

               LABEL              | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   | STDDEV | DIRECTION  
----------------------------------+-------+------------+------------+------------+------------+--------+------------
  context_switch1_per_thread_ops  |       |            |            |            |            |        |            
  (A) v6.1.30                     | 1     | 301128.000 | 301128.000 | 301128.000 | 301128.000 | 0      |            
  (B) v6.1.30 slab_nomerge        | 1     | 301282.000 | 301282.000 | 301282.000 | 301282.000 | 0      |            
                                  |       | +0.05%     | +0.05%     | +0.05%     | +0.05%     | ---    | + is good  

So I'm a bit baffled at the moment.

I'll try to dig deeper and see what slab caches this benchmark exercises
that apparently no other benchmarks do.  (I'm really hoping that the only
way to recover this performance is by something like
kmem_cache_create(SLAB_MERGE).)
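
In other words, something like the sketch below (entirely hypothetical:
no SLAB_MERGE flag exists today, and the cache and struct names are made
up; only the kmem_cache_create() signature is the real one):

  /* Hypothetical per-cache opt-in to merging for a hot cache. */
  struct kmem_cache *ctx_cache;

  ctx_cache = kmem_cache_create("ctx_switch_cache",     /* name */
                                sizeof(struct ctx_obj), /* object size */
                                0,                      /* default align */
                                SLAB_MERGE,             /* hypothetical flag */
                                NULL);                  /* no constructor */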


* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-03 20:17     ` David Rientjes
  2023-07-06  7:38       ` David Rientjes
@ 2023-07-10 14:56       ` Vlastimil Babka
From: Vlastimil Babka @ 2023-07-10 14:56 UTC (permalink / raw)
  To: David Rientjes, Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo,
	linux-mm, Jonathan Corbet, linux-doc, linux-kernel,
	Matthew Wilcox, Kees Cook, Rafael Aquini

On 7/3/23 22:17, David Rientjes wrote:
> Additionally, I wouldn't consider this to be super urgent: slab cache 
> merging has been this way for several years, we have some time to do an 
> assessment of the implications of changing an important aspect of kernel 
> memory allocation that will affect everybody.

Agreed, although I wouldn't say "affect everybody" because the changed
upstream default may not automatically translate to what distros will use,
and I'd expect most people rely on distro kernels.

> I agree with the patch if 
> we can make it work, I'd just like to study the effect of it more fully 
> beyond some kernbench runs.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-10  2:40           ` David Rientjes
@ 2023-07-18 12:08             ` Julian Pidancet
  2023-07-25 23:25               ` David Rientjes
  0 siblings, 1 reply; 12+ messages in thread
From: Julian Pidancet @ 2023-07-18 12:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini

On Mon Jul 10, 2023 at 04:40, David Rientjes wrote:
> On Sun, 9 Jul 2023, David Rientjes wrote:
>
> > There are some substantial performance degradations, most notably 
> > context_switch1_per_thread_ops which regressed ~21%.  I'll need to repeat
> > that test to confirm it and can also try on cascadelake if it reproduces.
> > 
>
> So the regression on skylake for will-it-scale appears to be real:
>
>                LABEL              | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   | STDDEV | DIRECTION  
> ----------------------------------+-------+------------+------------+------------+------------+--------+------------
>   context_switch1_per_thread_ops  |       |            |            |            |            |        |            
>   (A) v6.1.30                     | 1     | 314507.000 | 314507.000 | 314507.000 | 314507.000 | 0      |            
>   (B) v6.1.30 slab_nomerge        | 1     | 257403.000 | 257403.000 | 257403.000 | 257403.000 | 0      |            
>     !! REGRESSED !!               |       | -18.16%    | -18.16%    | -18.16%    | -18.16%    | ---    | + is good  
>
> but I can't reproduce this on cascadelake:
>
>                LABEL              | COUNT |    MIN     |    MAX     |    MEAN    |   MEDIAN   | STDDEV | DIRECTION  
> ----------------------------------+-------+------------+------------+------------+------------+--------+------------
>   context_switch1_per_thread_ops  |       |            |            |            |            |        |            
>   (A) v6.1.30                     | 1     | 301128.000 | 301128.000 | 301128.000 | 301128.000 | 0      |            
>   (B) v6.1.30 slab_nomerge        | 1     | 301282.000 | 301282.000 | 301282.000 | 301282.000 | 0      |            
>                                   |       | +0.05%     | +0.05%     | +0.05%     | +0.05%     | ---    | + is good  
>
> So I'm a bit baffled at the moment.
>
> I'll try to dig deeper and see what slab caches this benchmark exercises
> that apparently no other benchmarks do.  (I'm really hoping that the only
> way to recover this performance is by something like
> kmem_cache_create(SLAB_MERGE).)

Hi David,

Many thanks for running all these tests. The amount of attention you've
given this change is simply amazing. I wish I had been able to assist
you by doing more tests, but I've been lacking the necessary resources
to do so.

I'm as surprised as you are regarding the skylake regression. 20% is
quite a large number, but perhaps it's less worrying than it looks given
that benchmarks are usually very different from real-world workloads?

As Kees Cook suggested in his reply, have you given any thought to
including this change in -next to see if regressions show up in the CI
performance test results?

Regards,

-- 
Julian


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-18 12:08             ` Julian Pidancet
@ 2023-07-25 23:25               ` David Rientjes
  2023-07-26  8:34                 ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: David Rientjes @ 2023-07-25 23:25 UTC (permalink / raw)
  To: Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Vlastimil Babka, Roman Gushchin,
	Hyeonggon Yoo, linux-mm, Jonathan Corbet, linux-doc,
	linux-kernel, Matthew Wilcox, Kees Cook, Rafael Aquini

On Tue, 18 Jul 2023, Julian Pidancet wrote:

> Hi David,
> 
> Many thanks for running all these tests. The amount of attention you've
> given this change is simply amazing. I wish I had been able to assist
> you by doing more tests, but I've been lacking the necessary resources
> to do so.
> 
> I'm as surprised as you are regarding the skylake regression. 20% is
> quite a large number, but perhaps it's less worrying than it looks given
> that benchmarks are usually very different from real-world workloads?
> 

I'm not an expert on context_switch1_per_thread_ops, so I can't infer
which workloads would be most affected by such a regression, other than
to point out that -18% is quite substantial.

I'm still hoping to run some benchmarks with 64KB page sizes as Christoph
suggested; I should be able to do this with arm64.

It's certainly good news that the overall memory footprint doesn't change
much with this change.

> As Kees Cook suggested in his reply, have you given any thought to
> including this change in -next to see if regressions show up in the CI
> performance test results?
> 

I assume that anything we can run with CI performance tests can also be 
run without merging into -next?

The performance degradation is substantial for a microbenchmark; I'd like
to complete the picture with other benchmarks and do a full analysis
with 64KB page sizes, since I think the concern Christoph mentions could
be quite real.  We just don't have the data yet to make an informed
assessment of it.  I'd certainly welcome any help that others would like
to provide in running benchmarks with this change as well :P

Once we have a complete picture, we might also want to discuss what we
are hoping to achieve with such a change.  I was very supportive of it
prior to the -18% benchmark result.  But if most users simply use
whatever their distro defaults to, and other users may already be opting
into this via the kernel command line or .config, it's hard to determine
exactly which set of users would be affected by this change.  Suddenly
causing a -18% regression overnight would be surprising for them.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] mm/slub: disable slab merging in the default configuration
  2023-07-25 23:25               ` David Rientjes
@ 2023-07-26  8:34                 ` Vlastimil Babka
  0 siblings, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2023-07-26  8:34 UTC (permalink / raw)
  To: David Rientjes, Julian Pidancet
  Cc: Christoph Lameter, Lameter, Christopher, Pekka Enberg,
	Joonsoo Kim, Andrew Morton, Roman Gushchin, Hyeonggon Yoo,
	linux-mm, Jonathan Corbet, linux-doc, linux-kernel,
	Matthew Wilcox, Kees Cook, Rafael Aquini

On 7/26/23 01:25, David Rientjes wrote:
> On Tue, 18 Jul 2023, Julian Pidancet wrote:
> 
>> Hi David,
>> 
>> Many thanks for running all these tests. The amount of attention you've
>> given this change is simply amazing. I wish I had been able to assist
>> you by doing more tests, but I've been lacking the necessary resources
>> to do so.
>> 
>> I'm as surprised as you are regarding the skylake regression. 20% is
>> quite a large number, but perhaps it's less worrying than it looks given
>> that benchmarks are usually very different from real-world workloads?
>> 
> 
> I'm not an expert on context_switch1_per_thread_ops, so I can't infer
> which workloads would be most affected by such a regression, other than
> to point out that -18% is quite substantial.

It might turn out that this regression is accidental, in that merging
happens to result in better caching that benefits the particular skylake
cache hierarchy (but not others), because the workload happens to use
two different classes of objects that are compatible for merging, and
uses them with identical lifetimes.
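
As an illustrative sketch of that kind of situation (made-up cache names,
and only an approximation of the criteria find_mergeable() applies): two
caches with the same object size and alignment and no merge-incompatible
flags are candidates for merging, so their objects end up sharing slab
pages and cache lines, whereas with slab_nomerge they live in separate
slabs.

	/* Sketch only: both caches have identical layout, so with
	 * merging enabled they would be backed by a single slab. */
	struct kmem_cache *a = kmem_cache_create("ctx_a", 192, 0, 0, NULL);
	struct kmem_cache *b = kmem_cache_create("ctx_b", 192, 0, 0, NULL);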

But that would arguably still be a corner case and not something that
should result in a hard go/no-go for the change, as similar corner cases
that benefit from not merging would likely exist.

But it's possible that the reason for the regression is something less
expected than the above hypothesis, so indeed we should investigate
first.

> I'm still hoping to run some benchmarks with 64KB page sizes as Christoph
> suggested; I should be able to do this with arm64.
> 
> It's certainly good news that the overall memory footprint doesn't change
> much with this change.
> 
>> As Kees Cook suggested in his reply, have you given any thought to
>> including this change in -next to see if regressions show up in the CI
>> performance test results?
>> 
> 
> I assume that anything we can run with CI performance tests can also be 
> run without merging into -next?
> 
> The performance degradation is substantial for a microbenchmark; I'd like
> to complete the picture with other benchmarks and do a full analysis
> with 64KB page sizes, since I think the concern Christoph mentions could
> be quite real.  We just don't have the data yet to make an informed
> assessment of it.  I'd certainly welcome any help that others would like
> to provide in running benchmarks with this change as well :P
> 
> Once we have a complete picture, we might also want to discuss what we
> are hoping to achieve with such a change.  I was very supportive of it
> prior to the -18% benchmark result.  But if most users simply use
> whatever their distro defaults to, and other users may already be opting
> into this via the kernel command line or .config, it's hard to determine
> exactly which set of users would be affected by this change.  Suddenly
> causing a -18% regression overnight would be surprising for them.

What I'd hope to achieve is that if we find out that the differences
between merging and not merging are negligible (modulo corner cases) for
both performance and memory, we'd not only change the default, but even
make merging more exceptional. It should still be done under SLUB_TINY,
and maybe we can keep the slab_merge boot option, but that's it?
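
For reference, the boot options are wired up in mm/slab_common.c along
these lines (quoted from memory, so the details may differ slightly);
keeping slab_merge around would mean keeping little more than this:

	static bool slab_nomerge = !IS_ENABLED(CONFIG_SLAB_MERGE_DEFAULT);

	static int __init setup_slab_nomerge(char *str)
	{
		slab_nomerge = true;
		return 1;
	}

	static int __init setup_slab_merge(char *str)
	{
		slab_nomerge = false;
		return 1;
	}

	__setup_param("slab_nomerge", slab_nomerge, setup_slab_nomerge, 0);
	__setup_param("slab_merge", slab_merge, setup_slab_merge, 0);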

Because if they are comparable, not merging does have benefits:
/proc/slabinfo accounting is not misleading, so if a bug is reported it's
not necessary to reboot with slab_nomerge to get the real picture, and
then there are the security benefits mentioned, etc.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-26  8:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-29 22:19 [PATCH v2] mm/slub: disable slab merging in the default configuration Julian Pidancet
2023-07-03  0:09 ` David Rientjes
2023-07-03 10:33   ` Julian Pidancet
2023-07-03 18:38     ` Kees Cook
2023-07-03 20:17     ` David Rientjes
2023-07-06  7:38       ` David Rientjes
2023-07-09  8:55         ` David Rientjes
2023-07-10  2:40           ` David Rientjes
2023-07-18 12:08             ` Julian Pidancet
2023-07-25 23:25               ` David Rientjes
2023-07-26  8:34                 ` Vlastimil Babka
2023-07-10 14:56       ` Vlastimil Babka
