linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
@ 2020-11-18  8:27 Bharata B Rao
  2020-11-18 11:25 ` Vlastimil Babka
  2021-01-20 17:36 ` Vincent Guittot
  0 siblings, 2 replies; 37+ messages in thread
From: Bharata B Rao @ 2020-11-18  8:27 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, cl, rientjes, iamjoonsoo.kim, akpm, guro, vbabka,
	shakeelb, hannes, aneesh.kumar, Bharata B Rao

The page order of the slab that gets chosen for a given slab
cache depends on the number of objects that can fit in the
slab while meeting other requirements. We start with a value
of minimum objects based on nr_cpu_ids, which is driven by the
possible number of CPUs and hence could be higher than the
actual number of CPUs present in the system. This leads to
calculate_order() choosing a page order that is on the higher
side, leading to increased slab memory consumption on systems
that have bigger page sizes.

Hence rely on the number of online CPUs when determining the
minimum objects, thereby increasing the chances of choosing
a lower, more conservative page order for the slab.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
This is a generic change and I am unsure how it would affect
other archs, but as a start, here are some numbers from
PowerPC pseries KVM guest with and without this patch:

This table shows how this change has affected some of the slab
caches.
===================================================================
		Current				Patched
Cache	<objperslab> <pagesperslab>	<objperslab> <pagesperslab>
===================================================================
TCPv6		53    2			26    1
net_namespace	53    4			26    2
dtl		32    2			16    1
names_cache	32    2			16    1
task_struct	53    8			13    2
thread_stack	32    8			8     2
pgtable-2^11	16    8			8     4
pgtable-2^8	32    2			16    1
kmalloc-32k	16    8			8     4
kmalloc-16k	32    8			8     2
kmalloc-8k	32    4			8     1
kmalloc-4k	32    2			16    1
===================================================================
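
To illustrate the heuristic itself: assuming, for the sake of example,
a guest where nr_cpu_ids is 1024 while 64 CPUs are actually online, the
minimum-objects target changes as follows:

	4 * (fls(1024) + 1) = 4 * 12 = 48	based on possible CPUs
	4 * (fls(64) + 1)   = 4 * 8  = 32	based on online CPUs

The smaller target lets calculate_order() settle on a lower page order
for the larger-object caches above.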

Slab memory (kB) consumption comparison
==================================================================
			Current		Patched
==================================================================
After-boot		205760		156096
During-hackbench	629145		506752 (Avg of 5 runs)
After-hackbench		474176		331840 (after drop_caches)
==================================================================

Hackbench Time (Avg of 5 runs)
(hackbench -s 1024 -l 200 -g 200 -f 25 -P)
==========================================
Current		Patched
==========================================
10.990		11.010
==========================================

Measuring the effect due to CPU hotplug
----------------------------------------
Since the patch doesn't consider all the possible CPUs for the page
order calculation, let's see how it affects the case when CPUs are
hotplugged. Here I compare a system that is booted with 64 CPUs
against a system that is booted with 16 CPUs and then hotplugged with
48 CPUs after boot. These numbers are with the patch applied.

Slab memory (kB) consumption comparison
===================================================================
			64bootCPUs	16bootCPUs+48HotPluggedCPUs
===================================================================
After-boot		390272		159744
After-hotplug		-		251328
During-hackbench	1001267		941926 (Avg of 5 runs)
After-hackbench		913600		827200 (after drop_caches)
===================================================================

Hackbench Time (Avg of 5 runs)
(hackbench -s 1024 -l 200 -g 200 -f 25 -P)
===========================================
64bootCPUs	16bootCPUs+48HotPluggedCPUs
===========================================
12.554		12.589
===========================================
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 34dcc09e2ec9..8342c0a167b2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3433,7 +3433,7 @@ static inline int calculate_order(unsigned int size)
 	 */
 	min_objects = slub_min_objects;
 	if (!min_objects)
-		min_objects = 4 * (fls(nr_cpu_ids) + 1);
+		min_objects = 4 * (fls(num_online_cpus()) + 1);
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
-- 
2.26.2



* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2020-11-18  8:27 [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Bharata B Rao
@ 2020-11-18 11:25 ` Vlastimil Babka
  2020-11-18 19:34   ` Roman Gushchin
  2021-01-20 17:36 ` Vincent Guittot
  1 sibling, 1 reply; 37+ messages in thread
From: Vlastimil Babka @ 2020-11-18 11:25 UTC (permalink / raw)
  To: Bharata B Rao, linux-kernel
  Cc: linux-mm, cl, rientjes, iamjoonsoo.kim, akpm, guro, shakeelb,
	hannes, aneesh.kumar

On 11/18/20 9:27 AM, Bharata B Rao wrote:
> The page order of the slab that gets chosen for a given slab
> cache depends on the number of objects that can fit in the
> slab while meeting other requirements. We start with a value
> of minimum objects based on nr_cpu_ids, which is driven by the
> possible number of CPUs and hence could be higher than the
> actual number of CPUs present in the system. This leads to
> calculate_order() choosing a page order that is on the higher
> side, leading to increased slab memory consumption on systems
> that have bigger page sizes.
> 
> Hence rely on the number of online CPUs when determining the
> minimum objects, thereby increasing the chances of choosing
> a lower, more conservative page order for the slab.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Ideally, we would react to hotplug events and update existing caches 
accordingly. But for that, recalculation of order for existing caches 
would have to be made safe, while not affecting hot paths. We have 
removed the sysfs interface with 32a6f409b693 ("mm, slub: remove runtime 
allocation order changes") as it didn't seem easy and worth the trouble.

In case somebody wants to start with a large order right from boot
because they know they will hotplug lots of cpus later, they can use the
slub_min_objects= boot param to override this heuristic. So in case this
change regresses somebody's performance, there's a way around it and
thus the risk is low IMHO.
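
As a concrete example (CPU count chosen arbitrarily): a machine that is
expected to eventually run 224 CPUs could boot with

	slub_min_objects=36

on the kernel command line, since 4 * (fls(224) + 1) = 36 is what the
heuristic would compute once all 224 CPUs are online.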

> [...]



* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2020-11-18 11:25 ` Vlastimil Babka
@ 2020-11-18 19:34   ` Roman Gushchin
  2020-11-18 19:53     ` David Rientjes
  0 siblings, 1 reply; 37+ messages in thread
From: Roman Gushchin @ 2020-11-18 19:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Bharata B Rao, linux-kernel, linux-mm, cl, rientjes,
	iamjoonsoo.kim, akpm, shakeelb, hannes, aneesh.kumar

On Wed, Nov 18, 2020 at 12:25:38PM +0100, Vlastimil Babka wrote:
> On 11/18/20 9:27 AM, Bharata B Rao wrote:
> > [...]
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Ideally, we would react to hotplug events and update existing caches
> accordingly. But for that, recalculation of order for existing caches would
> have to be made safe, while not affecting hot paths. We have removed the
> sysfs interface with 32a6f409b693 ("mm, slub: remove runtime allocation
> order changes") as it didn't seem easy and worth the trouble.
> 
> In case somebody wants to start with a large order right from boot
> because they know they will hotplug lots of cpus later, they can use the
> slub_min_objects= boot param to override this heuristic. So in case this
> change regresses somebody's performance, there's a way around it and thus
> the risk is low IMHO.

I agree. For the absolute majority of users there will be no difference.
And there is a good workaround for the rest.

Acked-by: Roman Gushchin <guro@fb.com>


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2020-11-18 19:34   ` Roman Gushchin
@ 2020-11-18 19:53     ` David Rientjes
  0 siblings, 0 replies; 37+ messages in thread
From: David Rientjes @ 2020-11-18 19:53 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Vlastimil Babka, Bharata B Rao, linux-kernel, linux-mm, cl,
	iamjoonsoo.kim, akpm, shakeelb, hannes, aneesh.kumar

On Wed, 18 Nov 2020, Roman Gushchin wrote:

> On Wed, Nov 18, 2020 at 12:25:38PM +0100, Vlastimil Babka wrote:
> > On 11/18/20 9:27 AM, Bharata B Rao wrote:
> > > [...]
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
> > 
> > Acked-by: Vlastimil Babka <vbabka@suse.cz>
> > 
> > [...]
> 
> I agree. For the absolute majority of users there will be no difference.
> And there is a good workaround for the rest.
> 
> Acked-by: Roman Gushchin <guro@fb.com>
> 

Acked-by: David Rientjes <rientjes@google.com>


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2020-11-18  8:27 [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Bharata B Rao
  2020-11-18 11:25 ` Vlastimil Babka
@ 2021-01-20 17:36 ` Vincent Guittot
  2021-01-21  5:30   ` Bharata B Rao
  1 sibling, 1 reply; 37+ messages in thread
From: Vincent Guittot @ 2021-01-20 17:36 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linux-kernel, linux-mm, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, vbabka, shakeelb,
	Johannes Weiner, aneesh.kumar

Hi,

On Wed, 18 Nov 2020 at 09:28, Bharata B Rao <bharata@linux.ibm.com> wrote:
>
> [...]

I'm facing a significant performance regression on a large arm64 server
system (224 CPUs). The regression is also present on a small arm64
system (8 CPUs), but at a far smaller order of magnitude.

On the 224-CPU system: 9 iterations of hackbench -l 16000 -g 16
v5.11-rc4 : 9.135sec (+/- 0.45%)
v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
v5.10: 3.136sec (+/- 0.40%)

This is a 191% regression compared to v5.10.

The problem is that calculate_order() is called a number of times
before secondary CPUs are booted, and num_online_cpus() returns 1
instead of 224 (so the heuristic yields only 4 * (fls(1) + 1) = 8
minimum objects instead of 36). This makes the use of num_online_cpus()
irrelevant for those cases.

After adding "slub_min_objects=36" to my command line, which equals
4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
the regression disappears:

9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)



> diff --git a/mm/slub.c b/mm/slub.c
> index 34dcc09e2ec9..8342c0a167b2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3433,7 +3433,7 @@ static inline int calculate_order(unsigned int size)
>          */
>         min_objects = slub_min_objects;
>         if (!min_objects)
> -               min_objects = 4 * (fls(nr_cpu_ids) + 1);
> +               min_objects = 4 * (fls(num_online_cpus()) + 1);
>         max_objects = order_objects(slub_max_order, size);
>         min_objects = min(min_objects, max_objects);
>
> --
> 2.26.2
>


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-20 17:36 ` Vincent Guittot
@ 2021-01-21  5:30   ` Bharata B Rao
  2021-01-21  9:09     ` Vincent Guittot
  2021-01-21 10:01     ` Christoph Lameter
  0 siblings, 2 replies; 37+ messages in thread
From: Bharata B Rao @ 2021-01-21  5:30 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linux-mm, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, vbabka, shakeelb,
	Johannes Weiner, aneesh.kumar

On Wed, Jan 20, 2021 at 06:36:31PM +0100, Vincent Guittot wrote:
> Hi,
> 
> On Wed, 18 Nov 2020 at 09:28, Bharata B Rao <bharata@linux.ibm.com> wrote:
> >
> > [...]
> 
> > I'm facing a significant performance regression on a large arm64 server
> > system (224 CPUs). The regression is also present on a small arm64
> > system (8 CPUs), but at a far smaller order of magnitude.
> 
> > On the 224-CPU system: 9 iterations of hackbench -l 16000 -g 16
> v5.11-rc4 : 9.135sec (+/- 0.45%)
> v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> v5.10: 3.136sec (+/- 0.40%)
> 
> This is a 191% regression compared to v5.10.
> 
> > The problem is that calculate_order() is called a number of times
> > before secondary CPUs are booted, and num_online_cpus() returns 1
> > instead of 224. This makes the use of num_online_cpus() irrelevant
> > for those cases.
> > 
> > After adding "slub_min_objects=36" to my command line, which equals
> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> > the regression disappears:
> 
> 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)

Should we have switched to num_present_cpus() rather than
num_online_cpus()? If so, the below patch should address the
above problem.

From 252b332ccbee7152da1e18f1fff5b83f8e01b8df Mon Sep 17 00:00:00 2001
From: Bharata B Rao <bharata@linux.ibm.com>
Date: Thu, 21 Jan 2021 10:35:08 +0530
Subject: [PATCH] mm/slub: let number of present CPUs determine the slub
 page order

Commit 045ab8c9487b ("mm/slub: let number of online CPUs determine
the slub page order") changed the slub page order to depend on
num_online_cpus() instead of nr_cpu_ids. However, we find that certain
caches (kmalloc) are initialized even before the secondary CPUs are
onlined, resulting in a lower slub page order and a subsequent
performance regression.

Switch to num_present_cpus() instead.

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
---
 mm/slub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index d9e4e10683cc..2f3e412c849d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3433,7 +3433,7 @@ static inline int calculate_order(unsigned int size)
 	 */
 	min_objects = slub_min_objects;
 	if (!min_objects)
-		min_objects = 4 * (fls(num_online_cpus()) + 1);
+		min_objects = 4 * (fls(num_present_cpus()) + 1);
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
-- 
2.26.2





* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21  5:30   ` Bharata B Rao
@ 2021-01-21  9:09     ` Vincent Guittot
  2021-01-21 10:01     ` Christoph Lameter
  1 sibling, 0 replies; 37+ messages in thread
From: Vincent Guittot @ 2021-01-21  9:09 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: linux-kernel, linux-mm, Christoph Lameter, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, vbabka, Shakeel Butt,
	Johannes Weiner, aneesh.kumar

On Thu, 21 Jan 2021 at 06:31, Bharata B Rao <bharata@linux.ibm.com> wrote:
>
> On Wed, Jan 20, 2021 at 06:36:31PM +0100, Vincent Guittot wrote:
> > [...]
>
> Should we have switched to num_present_cpus() rather than
> num_online_cpus()? If so, the below patch should address the
> above problem.

The problem is the same with num_present_cpus(), which is initialized
at the same time as num_online_cpus(). Only num_possible_cpus() returns
a correct value from the start, just like nr_cpu_ids.

Both num_possible_cpus() and nr_cpu_ids reflect the number of CPUs of
the platform, not NR_CPUS:
num_possible_cpus() = nr_cpu_ids = 224 from the beginning, whereas
NR_CPUS=256 on my large system
num_possible_cpus() = nr_cpu_ids = 8 from the beginning, whereas
NR_CPUS=256 on my small system


>
> [...]


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21  5:30   ` Bharata B Rao
  2021-01-21  9:09     ` Vincent Guittot
@ 2021-01-21 10:01     ` Christoph Lameter
  2021-01-21 10:48       ` Vincent Guittot
  2021-01-21 18:19       ` Vlastimil Babka
  1 sibling, 2 replies; 37+ messages in thread
From: Christoph Lameter @ 2021-01-21 10:01 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Vincent Guittot, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, vbabka, shakeelb,
	Johannes Weiner, aneesh.kumar

On Thu, 21 Jan 2021, Bharata B Rao wrote:

> > The problem is that calculate_order() is called a number of times
> > before secondary CPUs are booted, and num_online_cpus() returns 1
> > instead of 224. This makes the use of num_online_cpus() irrelevant
> > for those cases.
> >
> > After adding "slub_min_objects=36" to my command line, which equals
> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> > the regression disappears:
> >
> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
>
> Should we have switched to num_present_cpus() rather than
> num_online_cpus()? If so, the below patch should address the
> above problem.

There is certainly an initcall after secondaries are booted where we could
redo the calculate_order?

Or the num_online_cpus needs to be up to date earlier. Why does this issue
not occur on x86? Does x86 have an up to date num_online_cpus earlier?
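
A rough sketch of that initcall idea (untested; recalculate_order() is a
hypothetical helper, which would also have to cope with the lifetime
issues discussed later in the thread):

	static int __init slub_recalc_orders(void)
	{
		struct kmem_cache *s;

		/* all secondary CPUs are online once late initcalls run */
		mutex_lock(&slab_mutex);
		list_for_each_entry(s, &slab_caches, list)
			recalculate_order(s);	/* hypothetical helper */
		mutex_unlock(&slab_mutex);
		return 0;
	}
	late_initcall(slub_recalc_orders);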




* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21 10:01     ` Christoph Lameter
@ 2021-01-21 10:48       ` Vincent Guittot
  2021-01-21 18:19       ` Vlastimil Babka
  1 sibling, 0 replies; 37+ messages in thread
From: Vincent Guittot @ 2021-01-21 10:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Bharata B Rao, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, vbabka, Shakeel Butt,
	Johannes Weiner, aneesh.kumar

On Thu, 21 Jan 2021 at 11:01, Christoph Lameter <cl@linux.com> wrote:
>
> [...]
>
> There is certainly an initcall after secondaries are booted where we could
> redo the calculate_order?
>
> Or the num_online_cpus needs to be up to date earlier. Why does this issue
> not occur on x86? Does x86 have an up to date num_online_cpus earlier?

I have added a printk in calculate_order():

        pr_info("SLUB calculate_order cmd %d min %d online %d present %d possible %d cpus %d",
                slub_min_objects, min_objects, num_online_cpus(),
                num_present_cpus(), num_possible_cpus(), nr_cpu_ids);

And checked with
qemu-system-x86_64 -kernel bzImage -nographic -smp 4 -append "console=ttyS0"

[    0.064927] SLUB calculate_order cmd 0 min 8 online 1 present 4 possible 4 cpus 4
[    0.064970] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1

num_online_cpus() has the same behavior as on my platform:
num_online_cpus == 1 when kmem_cache_init() is called.

Only num_present_cpus() is 4 from the beginning, but that's probably
just because it runs in a VM.

Also, it's interesting to notice that num_possible_cpus and nr_cpu_ids
are set to the correct value from the start.


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21 10:01     ` Christoph Lameter
  2021-01-21 10:48       ` Vincent Guittot
@ 2021-01-21 18:19       ` Vlastimil Babka
  2021-01-22  8:03         ` Vincent Guittot
                           ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Vlastimil Babka @ 2021-01-21 18:19 UTC (permalink / raw)
  To: Christoph Lameter, Bharata B Rao
  Cc: Vincent Guittot, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, shakeelb, Johannes Weiner,
	aneesh.kumar, Jann Horn, Michal Hocko

On 1/21/21 11:01 AM, Christoph Lameter wrote:
> On Thu, 21 Jan 2021, Bharata B Rao wrote:
> 
>> > The problem is that calculate_order() is called a number of times
>> > before secondary CPUs are booted, and num_online_cpus() returns 1
>> > instead of 224. This makes the use of num_online_cpus() irrelevant
>> > for those cases.
>> >
>> > After adding "slub_min_objects=36" to my command line, which equals
>> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
>> > the regression disappears:
>> >
>> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)

I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
supposed to be a scheduler benchmark? What exactly is going on?

>> Should we have switched to num_present_cpus() rather than
>> num_online_cpus()? If so, the below patch should address the
>> above problem.
> 
> There is certainly an initcall after secondaries are booted where we could
> redo the calculate_order?

We could do it even in a hotplug handler. But in practice that means making sure
it's safe, i.e. all users of oo_order/oo_objects must handle the value changing.

Consider e.g. init_cache_random_seq() which uses oo_objects(s->oo) to allocate
s->random_seq when cache s is created. Then shuffle_freelist() will use the
current value of oo_objects(s->oo) to index s->random_seq, for a newly allocated
slab - what if the page order has increased meanwhile due to secondary booting
or hotplug? Array overflow. That's why I just made the former sysfs handler for
changing order read-only.
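
A simplified sketch of that hazard, paraphrasing mm/slub.c with error
handling and surrounding code omitted:

	/* at cache creation: random_seq sized for the order at that time */
	cache_random_seq_create(s, oo_objects(s->oo), GFP_KERNEL);

	/* later, shuffle_freelist() re-reads the current value */
	freelist_count = oo_objects(s->oo);	/* may have grown meanwhile */
	pos = get_random_int() % freelist_count;
	idx = s->random_seq[pos];		/* can read past the old array */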

Things would be easier if we could trust *on all arches* either

- num_present_cpus() to count what the hardware really physically has during
boot, even if not yet onlined, at the time we init slab. This would still not
handle later hotplug (probably mostly in a VM scenario, not that somebody would
bring a bunch of actual new cpu boards to a running bare metal system?).

- num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
where it's not really possible to plug more CPUs. In a VM scenario we could
still have an opposite problem, where theoretically "anything is possible" but
the virtual cpus are never added later.

We could also start questioning the very assumption that number of cpus should
affect slab page size in the first place. Should it? After all, each CPU will
have one or more slab pages privately cached, as we discuss in the other
thread... So why make the slab pages also larger?




* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21 18:19       ` Vlastimil Babka
@ 2021-01-22  8:03         ` Vincent Guittot
  2021-01-22 12:03           ` Vlastimil Babka
  2021-01-22 13:05         ` [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Jann Horn
  2021-01-26  8:52         ` Michal Hocko
  2 siblings, 1 reply; 37+ messages in thread
From: Vincent Guittot @ 2021-01-22  8:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Bharata B Rao, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn, Michal Hocko

On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> [...]
>
> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> supposed to be a scheduler benchmark? What exactly is going on?
>

From the hackbench description:
Hackbench is both a benchmark and a stress test for the Linux kernel
scheduler. Its main job is to create a specified number of pairs of
schedulable entities (either threads or traditional processes) which
communicate via either sockets or pipes and time how long it takes
for each pair to send data back and forth.

> >> Should we have switched to num_present_cpus() rather than
> >> num_online_cpus()? If so, the below patch should address the
> >> above problem.
> >
> > There is certainly an initcall after secondaries are booted where we could
> > redo the calculate_order?
>
> We could do it even in a hotplug handler. But in practice that means making sure
> it's safe, i.e. all users of oo_order/oo_objects must handle the value changing.
>
> Consider e.g. init_cache_random_seq() which uses oo_objects(s->oo) to allocate
> s->random_seq when cache s is created. Then shuffle_freelist() will use the
> current value of oo_objects(s->oo) to index s->random_seq, for a newly allocated
> slab - what if the page order has increased meanwhile due to secondary booting
> or hotplug? Array overflow. That's why I just made the former sysfs handler for
> changing order read-only.
>
> Things would be easier if we could trust *on all arches* either
>
> - num_present_cpus() to count what the hardware really physically has during
> boot, even if not yet onlined, at the time we init slab. This would still not
> handle later hotplug (probably mostly in a VM scenario, not that somebody would
> bring a bunch of actual new cpu boards to a running bare metal system?).
>
> - num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
> where it's not really possible to plug more CPUs. In a VM scenario we could
> still have an opposite problem, where theoretically "anything is possible" but
> the virtual cpus are never added later.

On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
were correctly initialized:

large arm64 acpi system
small arm64 DT based system
VM on x86 system



* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-22  8:03         ` Vincent Guittot
@ 2021-01-22 12:03           ` Vlastimil Babka
  2021-01-22 13:16             ` Vincent Guittot
  2021-01-23  5:16             ` Bharata B Rao
  0 siblings, 2 replies; 37+ messages in thread
From: Vlastimil Babka @ 2021-01-22 12:03 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Christoph Lameter, Bharata B Rao, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn, Michal Hocko

On 1/22/21 9:03 AM, Vincent Guittot wrote:
> On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> [...]
>>
>> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
>> supposed to be a scheduler benchmark? What exactly is going on?
>>
> 
> From the hackbench description:
> Hackbench is both a benchmark and a stress test for the Linux kernel
> scheduler. Its main job is to create a specified number of pairs of
> schedulable entities (either threads or traditional processes) which
> communicate via either sockets or pipes and time how long it takes
> for each pair to send data back and forth.

Yep, so I wonder which slab entities this is stressing that much.

>> Things would be easier if we could trust *on all arches* either
>>
>> - num_present_cpus() to count what the hardware really physically has during
>> boot, even if not yet onlined, at the time we init slab. This would still not
>> handle later hotplug (probably mostly in a VM scenario, not that somebody would
>> bring a bunch of actual new cpu boards to a running bare metal system?).
>> - num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
>> where it's not really possible to plug more CPUs. In a VM scenario we could
>> still have an opposite problem, where theoretically "anything is possible" but
>> the virtual cpus are never added later.
>> the virtual cpus are never added later.
> 
> > On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
> > were correctly initialized:
> 
> large arm64 acpi system
> small arm64 DT based system
> VM on x86 system

So it's just powerpc that has this issue with too large nr_cpu_ids? Is it caused
by bios or the hypervisor? How does num_present_cpus() look there?

What about a heuristic (a sketch follows below):
- num_online_cpus() > 1: we trust that and use it
- otherwise: nr_cpu_ids
Would that work? Too arbitrary?
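
A minimal sketch of that heuristic, as it could look in
calculate_order() (illustration only, not a tested patch):

	unsigned int nr_cpus = num_online_cpus();

	/*
	 * Trust num_online_cpus() once any secondary CPU is up;
	 * otherwise we are too early in boot, fall back to nr_cpu_ids.
	 */
	if (nr_cpus <= 1)
		nr_cpus = nr_cpu_ids;
	min_objects = 4 * (fls(nr_cpus) + 1);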





* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21 18:19       ` Vlastimil Babka
  2021-01-22  8:03         ` Vincent Guittot
@ 2021-01-22 13:05         ` Jann Horn
  2021-01-22 13:09           ` Jann Horn
                             ` (2 more replies)
  2021-01-26  8:52         ` Michal Hocko
  2 siblings, 3 replies; 37+ messages in thread
From: Jann Horn @ 2021-01-22 13:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Bharata B Rao, Vincent Guittot, linux-kernel,
	Linux-MM, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Shakeel Butt, Johannes Weiner, aneesh.kumar,
	Michal Hocko

On Thu, Jan 21, 2021 at 7:19 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> [...]
>
> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> supposed to be a scheduler benchmark? What exactly is going on?

Uuuh, I think powerpc doesn't have cmpxchg_double?

"vgrep cmpxchg_double arch/" just spits out arm64, s390 and x86? And
<https://liblfds.org/mediawiki/index.php?title=Article:CAS_and_LL/SC_Implementation_Details_by_Processor_family>
says under "POWERPC": "no DW LL/SC"

So powerpc is probably hitting the page-bitlock-based implementation
all the time for stuff like __slab_free()? Do you have detailed
profiling results from "perf top" or something like that?
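
For reference, a simplified sketch of that fallback path in mm/slub.c's
__cmpxchg_double_slab(), taken when the architecture offers no
double-word cmpxchg (the update happens under the page's bit-spinlock):

	slab_lock(page);	/* bit_spin_lock() on the struct page */
	if (page->freelist == freelist_old &&
	    page->counters == counters_old) {
		page->freelist = freelist_new;
		page->counters = counters_new;
		slab_unlock(page);
		return true;
	}
	slab_unlock(page);
	return false;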

(I actually have some WIP patches and a design document for getting
rid of cmpxchg_double in struct page that I hacked together in the
last couple days; I'm currently in the process of sending them over to
some other folks in the company who hopefully have cycles to
review/polish/benchmark them so that they can be upstreamed, assuming
that those folks think they're important enough. I don't have the
cycles for it...)


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-22 13:05         ` [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Jann Horn
@ 2021-01-22 13:09           ` Jann Horn
  2021-01-22 15:27           ` Vlastimil Babka
  2021-01-25  4:28           ` Bharata B Rao
  2 siblings, 0 replies; 37+ messages in thread
From: Jann Horn @ 2021-01-22 13:09 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Bharata B Rao, Vincent Guittot, linux-kernel,
	Linux-MM, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Shakeel Butt, Johannes Weiner, aneesh.kumar,
	Michal Hocko

On Fri, Jan 22, 2021 at 2:05 PM Jann Horn <jannh@google.com> wrote:
> [...]
>
> (I actually have some WIP patches and a design document for getting
> rid of cmpxchg_double in struct page that I hacked together in the
> last couple days; I'm currently in the process of sending them over to
> some other folks in the company who hopefully have cycles to
> review/polish/benchmark them so that they can be upstreamed, assuming
> that those folks think they're important enough. I don't have the
> cycles for it...)

(The stuff I have in mind will only work on 64-bit though. We are
talking about PPC64 here, right?)


* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-22 12:03           ` Vlastimil Babka
@ 2021-01-22 13:16             ` Vincent Guittot
  2021-01-23  5:16             ` Bharata B Rao
  1 sibling, 0 replies; 37+ messages in thread
From: Vincent Guittot @ 2021-01-22 13:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Bharata B Rao, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn, Michal Hocko

On Fri, 22 Jan 2021 at 13:03, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/22/21 9:03 AM, Vincent Guittot wrote:
> > On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 1/21/21 11:01 AM, Christoph Lameter wrote:
> >> > On Thu, 21 Jan 2021, Bharata B Rao wrote:
> >> >
> > >> >> > The problem is that calculate_order() is called a number of times
> > >> >> > before secondary CPUs are booted and it returns 1 instead of 224.
> > >> >> > This makes the use of num_online_cpus() irrelevant for those cases.
> > >> >> >
> > >> >> > After adding "slub_min_objects=36" to my command line, which equals
> > >> >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> > >> >> > the regression disappears:
> > >> >> >
> > >> >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
> >>
> >> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> >> supposed to be a scheduler benchmark? What exactly is going on?
> >>
> >
> > From the hackbench description:
> > Hackbench is both a benchmark and a stress test for the Linux kernel
> > scheduler. Its main job is to create a specified number of pairs of
> > schedulable entities (either threads or traditional processes) which
> > communicate via either sockets or pipes, and time how long it takes
> > for each pair to send data back and forth.
>
> Yep, so I wonder which slab entities this is stressing that much.
>
> >> Things would be easier if we could trust *on all arches* either
> >>
> >> - num_present_cpus() to count what the hardware really physically has during
> >> boot, even if not yet onlined, at the time we init slab. This would still not
> >> handle later hotplug (probably mostly in a VM scenario, not that somebody would
> >> bring a bunch of actual new cpu boards to a running bare metal system?).
> >>
> >> - num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
> >> where it's not really possible to plug more CPUs. In a VM scenario we could
> >> still have an opposite problem, where theoretically "anything is possible" but
> >> the virtual cpus are never added later.
> >
> > On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
> > were correctly initialized:
> >
> > large arm64 acpi system
> > small arm64 DT based system
> > VM on x86 system
>
> So it's just powerpc that has this issue with too large nr_cpu_ids? Is it caused
> by the BIOS or the hypervisor? How does num_present_cpus() look there?

num_present_cpus() stays at 1 until secondary CPUs boot in the arm64 case

>
> What about heuristic:
> - num_online_cpus() > 1 - we trust that and use it
> - otherwise nr_cpu_ids
> Would that work? Too arbitrary?
>
>
> >> We could also start questioning the very assumption that number of cpus should
> >> affect slab page size in the first place. Should it? After all, each CPU will
> >> have one or more slab pages privately cached, as we discuss in the other
> >> thread... So why make the slab pages also larger?
> >>
> >> > Or the num_online_cpus needs to be up to date earlier. Why does this issue
> >> > not occur on x86? Does x86 have an up to date num_online_cpus earlier?
> >> >
> >> >
> >>
> >
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-22 13:05         ` [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Jann Horn
  2021-01-22 13:09           ` Jann Horn
@ 2021-01-22 15:27           ` Vlastimil Babka
  2021-01-25  4:28           ` Bharata B Rao
  2 siblings, 0 replies; 37+ messages in thread
From: Vlastimil Babka @ 2021-01-22 15:27 UTC (permalink / raw)
  To: Jann Horn
  Cc: Christoph Lameter, Bharata B Rao, Vincent Guittot, linux-kernel,
	Linux-MM, David Rientjes, Joonsoo Kim, Andrew Morton,
	Roman Gushchin, Shakeel Butt, Johannes Weiner, aneesh.kumar,
	Michal Hocko

On 1/22/21 2:05 PM, Jann Horn wrote:
> On Thu, Jan 21, 2021 at 7:19 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>> On 1/21/21 11:01 AM, Christoph Lameter wrote:
>> > On Thu, 21 Jan 2021, Bharata B Rao wrote:
>> >
>> >> > The problem is that calculate_order() is called a number of times
>> >> > before secondary CPUs are booted and it returns 1 instead of 224.
>> >> > This makes the use of num_online_cpus() irrelevant for those cases.
>> >> >
>> >> > After adding "slub_min_objects=36" to my command line, which equals
>> >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
>> >> > the regression disappears:
>> >> >
>> >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
>>
>> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
>> supposed to be a scheduler benchmark? What exactly is going on?
> 
> Uuuh, I think powerpc doesn't have cmpxchg_double?

The benchmark was done by Vincent on arm64, AFAICS. PowerPC (ppc64) was what
Bharata had used to demonstrate the order calculation change in his patch.

There seems to be some implementation dependency on CONFIG_ARM64_LSE_ATOMICS but
AFAICS that doesn't determine if cmpxchg_double is provided.

> "vgrep cmpxchg_double arch/" just spits out arm64, s390 and x86? And
> <https://liblfds.org/mediawiki/index.php?title=Article:CAS_and_LL/SC_Implementation_Details_by_Processor_family>
> says under "POWERPC": "no DW LL/SC"

Interesting find in any case.

> So powerpc is probably hitting the page-bitlock-based implementation
> all the time for stuff like __slab_free()? Do you have detailed
> profiling results from "perf top" or something like that?
> 
> (I actually have some WIP patches and a design document for getting
> rid of cmpxchg_double in struct page that I hacked together in the
> last couple days; I'm currently in the process of sending them over to
> some other folks in the company who hopefully have cycles to
> review/polish/benchmark them so that they can be upstreamed, assuming
> that those folks think they're important enough. I don't have the
> cycles for it...)

I'm curious, so I hope this works out :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-22 12:03           ` Vlastimil Babka
  2021-01-22 13:16             ` Vincent Guittot
@ 2021-01-23  5:16             ` Bharata B Rao
  2021-01-23 12:32               ` Vincent Guittot
  1 sibling, 1 reply; 37+ messages in thread
From: Bharata B Rao @ 2021-01-23  5:16 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Vincent Guittot, Christoph Lameter, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn, Michal Hocko

On Fri, Jan 22, 2021 at 01:03:57PM +0100, Vlastimil Babka wrote:
> On 1/22/21 9:03 AM, Vincent Guittot wrote:
> > On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 1/21/21 11:01 AM, Christoph Lameter wrote:
> >> > On Thu, 21 Jan 2021, Bharata B Rao wrote:
> >> >
> >> >> > The problem is that calculate_order() is called a number of times
> >> >> > before secondary CPUs are booted and it returns 1 instead of 224.
> >> >> > This makes the use of num_online_cpus() irrelevant for those cases.
> >> >> >
> >> >> > After adding "slub_min_objects=36" to my command line, which equals
> >> >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> >> >> > the regression disappears:
> >> >> >
> >> >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
> >>
> >> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> >> supposed to be a scheduler benchmark? What exactly is going on?
> >>
> > 
> > From the hackbench description:
> > Hackbench is both a benchmark and a stress test for the Linux kernel
> > scheduler. Its main job is to create a specified number of pairs of
> > schedulable entities (either threads or traditional processes) which
> > communicate via either sockets or pipes, and time how long it takes
> > for each pair to send data back and forth.
> 
> Yep, so I wonder which slab entities this is stressing that much.
> 
> >> Things would be easier if we could trust *on all arches* either
> >>
> >> - num_present_cpus() to count what the hardware really physically has during
> >> boot, even if not yet onlined, at the time we init slab. This would still not
> >> handle later hotplug (probably mostly in a VM scenario, not that somebody would
> >> bring a bunch of actual new cpu boards to a running bare metal system?).
> >>
> >> - num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
> >> where it's not really possible to plug more CPUs. In a VM scenario we could
> >> still have an opposite problem, where theoretically "anything is possible" but
> >> the virtual cpus are never added later.
> > 
> > On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
> > were correctly initialized:
> > 
> > large arm64 acpi system
> > small arm64 DT based system
> > VM on x86 system
> 
> So it's just powerpc that has this issue with too large nr_cpu_ids? Is it caused
> by the BIOS or the hypervisor? How does num_present_cpus() look there?

PowerPC PowerNV Host: (160 cpus)
num_online_cpus 1 num_present_cpus 160 num_possible_cpus 160 nr_cpu_ids 160 

PowerPC pseries KVM guest: (-smp 16,maxcpus=160)
num_online_cpus 1 num_present_cpus 16 num_possible_cpus 160 nr_cpu_ids 160 

That's what I see on powerpc, hence I thought num_present_cpus() could
be the correct one to use in slub page order calculation.

> 
> What about heuristic:
> - num_online_cpus() > 1 - we trust that and use it
> - otherwise nr_cpu_ids
> Would that work? Too arbitrary?

Looking at the following snippet from include/linux/cpumask.h, it
appears that num_present_cpus() should be a reasonable compromise
between online and possible/nr_cpu_ids to use here.

/*
 * The following particular system cpumasks and operations manage
 * possible, present, active and online cpus.
 *
 *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
 *     cpu_present_mask - has bit 'cpu' set iff cpu is populated
 *     cpu_online_mask  - has bit 'cpu' set iff cpu available to scheduler
 *     cpu_active_mask  - has bit 'cpu' set iff cpu available to migration
 *
 *  If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
 *
 *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
 *  that it is possible might ever be plugged in at anytime during the
 *  life of that system boot.  The cpu_present_mask is dynamic(*),
 *  representing which CPUs are currently plugged in.  And
 *  cpu_online_mask is the dynamic subset of cpu_present_mask,
 *  indicating those CPUs available for scheduling.
 *
 *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
 *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
 *  ACPI reports present at boot.
 *
 *  If HOTPLUG is enabled, then cpu_present_mask varies dynamically,
 *  depending on what ACPI reports as currently plugged in, otherwise
 *  cpu_present_mask is just a copy of cpu_possible_mask.
 *
 *  (*) Well, cpu_present_mask is dynamic in the hotplug case.  If not
 *      hotplug, it's a copy of cpu_possible_mask, hence fixed at boot.
 */

So for host systems, present is (usually) equal to possible, and for
guest systems present should indicate the CPUs found to be present
at boot time. The intention of my original patch was to use this
metric in the slub page order calculation rather than nr_cpu_ids
or num_possible_cpus(), which could be high on guest systems that
typically support CPU hotplug.
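
For the record, fls(224) = 8, so 4 * (fls(224) + 1) = 36, which matches
the slub_min_objects=36 workaround quoted earlier in the thread. A
minimal sketch of the kind of change I mean in calculate_order()'s
min_objects computation (illustrative only, not an exact diff against
mm/slub.c):

	min_objects = slub_min_objects;
	if (!min_objects)
		/*
		 * Scale with the CPUs populated at boot rather than all
		 * hot-pluggable ones; on the KVM guest above that means
		 * 16 instead of 160.
		 */
		min_objects = 4 * (fls(num_present_cpus()) + 1);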

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-23  5:16             ` Bharata B Rao
@ 2021-01-23 12:32               ` Vincent Guittot
  2021-01-25 11:20                 ` Vlastimil Babka
  0 siblings, 1 reply; 37+ messages in thread
From: Vincent Guittot @ 2021-01-23 12:32 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Vlastimil Babka, Christoph Lameter, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn, Michal Hocko,
	Catalin Marinas, Will Deacon

+Adding arch arm64 Maintainers

On Sat, 23 Jan 2021 at 06:16, Bharata B Rao <bharata@linux.ibm.com> wrote:
>
> On Fri, Jan 22, 2021 at 01:03:57PM +0100, Vlastimil Babka wrote:
> > On 1/22/21 9:03 AM, Vincent Guittot wrote:
> > > On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka <vbabka@suse.cz> wrote:
> > >>
> > >> On 1/21/21 11:01 AM, Christoph Lameter wrote:
> > >> > On Thu, 21 Jan 2021, Bharata B Rao wrote:
> > >> >
> > >> >> > The problem is that calculate_order() is called a number of times
> > >> >> > before secondary CPUs are booted and it returns 1 instead of 224.
> > >> >> > This makes the use of num_online_cpus() irrelevant for those cases.
> > >> >> >
> > >> >> > After adding "slub_min_objects=36" to my command line, which equals
> > >> >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> > >> >> > the regression disappears:
> > >> >> >
> > >> >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
> > >>
> > >> I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> > >> supposed to be a scheduler benchmark? What exactly is going on?
> > >>
> > >
> > > From the hackbench description:
> > > Hackbench is both a benchmark and a stress test for the Linux kernel
> > > scheduler. Its main job is to create a specified number of pairs of
> > > schedulable entities (either threads or traditional processes) which
> > > communicate via either sockets or pipes, and time how long it takes
> > > for each pair to send data back and forth.
> >
> > Yep, so I wonder which slab entities this is stressing that much.
> >
> > >> Things would be easier if we could trust *on all arches* either
> > >>
> > >> - num_present_cpus() to count what the hardware really physically has during
> > >> boot, even if not yet onlined, at the time we init slab. This would still not
> > >> handle later hotplug (probably mostly in a VM scenario, not that somebody would
> > >> bring a bunch of actual new cpu boards to a running bare metal system?).
> > >>
> > >> - num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
> > >> where it's not really possible to plug more CPUs. In a VM scenario we could
> > >> still have an opposite problem, where theoretically "anything is possible" but
> > >> the virtual cpus are never added later.
> > >
> > > On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
> > > were correctly initialized:
> > >
> > > large arm64 acpi system
> > > small arm64 DT based system
> > > VM on x86 system
> >
> > So it's just powerpc that has this issue with too large nr_cpu_ids? Is it caused
> > by the BIOS or the hypervisor? How does num_present_cpus() look there?
>
> PowerPC PowerNV Host: (160 cpus)
> num_online_cpus 1 num_present_cpus 160 num_possible_cpus 160 nr_cpu_ids 160
>
> PowerPC pseries KVM guest: (-smp 16,maxcpus=160)
> num_online_cpus 1 num_present_cpus 16 num_possible_cpus 160 nr_cpu_ids 160
>
> That's what I see on powerpc, hence I thought num_present_cpus() could
> be the correct one to use in slub page order calculation.

num_present_cpus() is set to 1 on arm64 until secondary CPUs boot

arm64 224cpus acpi host:
num_online_cpus 1 num_present_cpus 1 num_possible_cpus 224 nr_cpu_ids 224
arm64 8cpus DT host:
num_online_cpus 1 num_present_cpus 1 num_possible_cpus 8 nr_cpu_ids 8
arm64 8cpus qemu-system-aarch64 (-smp 8,maxcpus=256)
num_online_cpus 1 num_present_cpus 1 num_possible_cpus 8 nr_cpu_ids 8

Then present and online increase to num_possible_cpus() once all CPUs are booted

>
> >
> > What about heuristic:
> > - num_online_cpus() > 1 - we trust that and use it
> > - otherwise nr_cpu_ids
> > Would that work? Too arbitrary?
>
> Looking at the following snippet from include/linux/cpumask.h, it
> appears that num_present_cpus() should be a reasonable compromise
> between online and possible/nr_cpu_ids to use here.
>
> /*
>  * The following particular system cpumasks and operations manage
>  * possible, present, active and online cpus.
>  *
>  *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
>  *     cpu_present_mask - has bit 'cpu' set iff cpu is populated
>  *     cpu_online_mask  - has bit 'cpu' set iff cpu available to scheduler
>  *     cpu_active_mask  - has bit 'cpu' set iff cpu available to migration
>  *
>  *  If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
>  *
>  *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
>  *  that it is possible might ever be plugged in at anytime during the
>  *  life of that system boot.  The cpu_present_mask is dynamic(*),
>  *  representing which CPUs are currently plugged in.  And
>  *  cpu_online_mask is the dynamic subset of cpu_present_mask,
>  *  indicating those CPUs available for scheduling.
>  *
>  *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
>  *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
>  *  ACPI reports present at boot.
>  *
>  *  If HOTPLUG is enabled, then cpu_present_mask varies dynamically,
>  *  depending on what ACPI reports as currently plugged in, otherwise
>  *  cpu_present_mask is just a copy of cpu_possible_mask.
>  *
>  *  (*) Well, cpu_present_mask is dynamic in the hotplug case.  If not
>  *      hotplug, it's a copy of cpu_possible_mask, hence fixed at boot.
>  */
>
> So for host systems, present is (usually) equal to possible and for

But "cpu_present_mask varies dynamically,  depending on what ACPI
reports as currently plugged in"

So it should vary when secondary CPUs are booted

> guest systems present should indicate the CPUs found to be present
> at boot time. The intention of my original patch was to use this
> metric in the slub page order calculation rather than nr_cpu_ids
> or num_possible_cpus(), which could be high on guest systems that
> typically support CPU hotplug.
>
> Regards,
> Bharata.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-22 13:05         ` [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Jann Horn
  2021-01-22 13:09           ` Jann Horn
  2021-01-22 15:27           ` Vlastimil Babka
@ 2021-01-25  4:28           ` Bharata B Rao
  2 siblings, 0 replies; 37+ messages in thread
From: Bharata B Rao @ 2021-01-25  4:28 UTC (permalink / raw)
  To: Jann Horn
  Cc: Vlastimil Babka, Christoph Lameter, Vincent Guittot,
	linux-kernel, Linux-MM, David Rientjes, Joonsoo Kim,
	Andrew Morton, Roman Gushchin, Shakeel Butt, Johannes Weiner,
	aneesh.kumar, Michal Hocko

On Fri, Jan 22, 2021 at 02:05:47PM +0100, Jann Horn wrote:
> On Thu, Jan 21, 2021 at 7:19 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> > On 1/21/21 11:01 AM, Christoph Lameter wrote:
> > > On Thu, 21 Jan 2021, Bharata B Rao wrote:
> > >
> > >> > The problem is that calculate_order() is called a number of times
> > >> > before secondary CPUs are booted and it returns 1 instead of 224.
> > >> > This makes the use of num_online_cpus() irrelevant for those cases.
> > >> >
> > >> > After adding "slub_min_objects=36" to my command line, which equals
> > >> > 4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus == 224,
> > >> > the regression disappears:
> > >> >
> > >> > 9 iterations of hackbench -l 16000 -g 16: 3.201sec (+/- 0.90%)
> >
> > I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
> > supposed to be a scheduler benchmark? What exactly is going on?
> 
> Uuuh, I think powerpc doesn't have cmpxchg_double?
> 
> "vgrep cmpxchg_double arch/" just spits out arm64, s390 and x86? And
> <https://liblfds.org/mediawiki/index.php?title=Article:CAS_and_LL/SC_Implementation_Details_by_Processor_family>
> says under "POWERPC": "no DW LL/SC"
> 
> So powerpc is probably hitting the page-bitlock-based implementation
> all the time for stuff like __slab_free()? Do you have detailed
> profiling results from "perf top" or something like that?

I can check that, but the current patch was aimed at reducing
the page order of the slub caches so that they don't end up
consuming more memory on 64K systems.
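
(For scale: with a 64K base page size, an order-3 slab is already
64K * 8 = 512K, so an order even one or two steps too high adds up
quickly across all the caches.)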

> 
> (I actually have some WIP patches and a design document for getting
> rid of cmpxchg_double in struct page that I hacked together in the
> last couple days; I'm currently in the process of sending them over to
> some other folks in the company who hopefully have cycles to
> review/polish/benchmark them so that they can be upstreamed, assuming
> that those folks think they're important enough. I don't have the
> cycles for it...)

Sounds interesting, I will keep an eye on it to see its effect on powerpc.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-23 12:32               ` Vincent Guittot
@ 2021-01-25 11:20                 ` Vlastimil Babka
  2021-01-26 23:03                   ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: Vlastimil Babka @ 2021-01-25 11:20 UTC (permalink / raw)
  To: Vincent Guittot, Bharata B Rao
  Cc: Christoph Lameter, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, Shakeel Butt, Johannes Weiner,
	aneesh.kumar, Jann Horn, Michal Hocko, Catalin Marinas,
	Will Deacon

On 1/23/21 1:32 PM, Vincent Guittot wrote:
>> PowerPC PowerNV Host: (160 cpus)
>> num_online_cpus 1 num_present_cpus 160 num_possible_cpus 160 nr_cpu_ids 160
>>
>> PowerPC pseries KVM guest: (-smp 16,maxcpus=160)
>> num_online_cpus 1 num_present_cpus 16 num_possible_cpus 160 nr_cpu_ids 160
>>
>> That's what I see on powerpc, hence I thought num_present_cpus() could
>> be the correct one to use in slub page order calculation.
> 
> num_present_cpus() is set to 1 on arm64 until secondary CPUs boot
> 
> arm64 224cpus acpi host:
> num_online_cpus 1 num_present_cpus 1 num_possible_cpus 224 nr_cpu_ids 224
> arm64 8cpus DT host:
> num_online_cpus 1 num_present_cpus 1 num_possible_cpus 8 nr_cpu_ids 8
> arm64 8cpus qemu-system-aarch64 (-smp 8,maxcpus=256)
> num_online_cpus 1 num_present_cpus 1 num_possible_cpus 8 nr_cpu_ids 8

I would have expected num_present_cpus to be 224, 8, 8, respectively.

> Then present and online increase to num_possible_cpus() once all CPUs are booted
> 
>>
>> >
>> > What about heuristic:
>> > - num_online_cpus() > 1 - we trust that and use it
>> > - otherwise nr_cpu_ids
>> > Would that work? Too arbitrary?
>>
>> Looking at the following snippet from include/linux/cpumask.h, it
>> appears that num_present_cpus() should be a reasonable compromise
>> between online and possible/nr_cpu_ids to use here.
>>
>> /*
>>  * The following particular system cpumasks and operations manage
>>  * possible, present, active and online cpus.
>>  *
>>  *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
>>  *     cpu_present_mask - has bit 'cpu' set iff cpu is populated
>>  *     cpu_online_mask  - has bit 'cpu' set iff cpu available to scheduler
>>  *     cpu_active_mask  - has bit 'cpu' set iff cpu available to migration
>>  *
>>  *  If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
>>  *
>>  *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
>>  *  that it is possible might ever be plugged in at anytime during the
>>  *  life of that system boot.  The cpu_present_mask is dynamic(*),
>>  *  representing which CPUs are currently plugged in.  And
>>  *  cpu_online_mask is the dynamic subset of cpu_present_mask,
>>  *  indicating those CPUs available for scheduling.
>>  *
>>  *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
>>  *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
>>  *  ACPI reports present at boot.
>>  *
>>  *  If HOTPLUG is enabled, then cpu_present_mask varies dynamically,
>>  *  depending on what ACPI reports as currently plugged in, otherwise
>>  *  cpu_present_mask is just a copy of cpu_possible_mask.
>>  *
>>  *  (*) Well, cpu_present_mask is dynamic in the hotplug case.  If not
>>  *      hotplug, it's a copy of cpu_possible_mask, hence fixed at boot.
>>  */
>>
>> So for host systems, present is (usually) equal to possible and for
> 
> But "cpu_present_mask varies dynamically,  depending on what ACPI
> reports as currently plugged in"
> 
> So it should vary when secondary CPUs are booted

Hm, but booting the secondaries is just a software (kernel) action? They are
already physically there, so it seems to me as if the cpu_present_mask is not
populated correctly on arm64, and it's just a mirror of cpu_online_mask?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-21 18:19       ` Vlastimil Babka
  2021-01-22  8:03         ` Vincent Guittot
  2021-01-22 13:05         ` [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Jann Horn
@ 2021-01-26  8:52         ` Michal Hocko
  2021-01-26 13:38           ` Vincent Guittot
  2 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2021-01-26  8:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Bharata B Rao, Vincent Guittot, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	shakeelb, Johannes Weiner, aneesh.kumar, Jann Horn

On Thu 21-01-21 19:19:21, Vlastimil Babka wrote:
[...]
> We could also start questioning the very assumption that number of cpus should
> affect slab page size in the first place. Should it? After all, each CPU will
> have one or more slab pages privately cached, as we discuss in the other
> thread... So why make the slab pages also larger?

I do agree. What is the actual justification for this scaling?
        /*
         * Attempt to find best configuration for a slab. This
         * works by first attempting to generate a layout with
         * the best configuration and backing off gradually.
         *
         * First we increase the acceptable waste in a slab. Then
         * we reduce the minimum objects required in a slab.
         */

doesn't speak about CPUs.  9b2cd506e5f2 ("slub: Calculate min_objects
based on number of processors.") does talk about hackbench ("This has
been shown to address the performance issues in hackbench on 16p etc.")
but it doesn't give any more details on _why_ that actually works.

This thread shows that this is still somehow related to performance, but
the real reason is not clear. I believe we should be focusing on the
actual reasons for the performance impact rather than playing with some
fancy math and tuning for a benchmark on a particular machine which
doesn't work for others due to subtle initialization timing issues.

Fundamentally, why should a higher number of CPUs imply a larger slab
size in the first place?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-26  8:52         ` Michal Hocko
@ 2021-01-26 13:38           ` Vincent Guittot
  2021-01-26 13:59             ` Michal Hocko
  0 siblings, 1 reply; 37+ messages in thread
From: Vincent Guittot @ 2021-01-26 13:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vlastimil Babka, Christoph Lameter, Bharata B Rao, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn

On Tue, 26 Jan 2021 at 09:52, Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 21-01-21 19:19:21, Vlastimil Babka wrote:
> [...]
> > We could also start questioning the very assumption that number of cpus should
> > affect slab page size in the first place. Should it? After all, each CPU will
> > have one or more slab pages privately cached, as we discuss in the other
> > thread... So why make the slab pages also larger?
>
> I do agree. What is the actual justification for this scaling?
>         /*
>          * Attempt to find best configuration for a slab. This
>          * works by first attempting to generate a layout with
>          * the best configuration and backing off gradually.
>          *
>          * First we increase the acceptable waste in a slab. Then
>          * we reduce the minimum objects required in a slab.
>          */
>
> doesn't speak about CPUs.  9b2cd506e5f2 ("slub: Calculate min_objects
> based on number of processors.") does talk about hackbench ("This has
> been shown to address the performance issues in hackbench on 16p etc.")
> but it doesn't give any more details on _why_ that actually works.
>
> This thread shows that this is still somehow related to performance, but
> the real reason is not clear. I believe we should be focusing on the
> actual reasons for the performance impact rather than playing with some
> fancy math and tuning for a benchmark on a particular machine which
> doesn't work for others due to subtle initialization timing issues.
>
> Fundamentally, why should a higher number of CPUs imply a larger slab
> size in the first place?

A first answer is that the activity and the number of threads involved
scale with the number of CPUs. Taking the hackbench benchmark as an
example, the number of groups/threads rises to a higher level on the
server than on the small system, which doesn't seem unreasonable.

On 8 CPUs, I run hackbench with up to 16 groups, which means 16*40
threads. But I raise up to 256 groups, which means 256*40 threads, on
the 224-CPU system. In fact, hackbench -g 1 (with 1 group) doesn't
regress on the 224-CPU system. The next test with 4 groups starts to
regress by -7%. But the next one, hackbench -g 16, regresses by 187%
(duration is almost 3 times longer). It seems reasonable to assume that
the number of running threads and resources scales with the number of
CPUs because we want to run more stuff.


> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-26 13:38           ` Vincent Guittot
@ 2021-01-26 13:59             ` Michal Hocko
  2021-01-27 13:38               ` Vlastimil Babka
  2021-01-28 13:45               ` Mel Gorman
  0 siblings, 2 replies; 37+ messages in thread
From: Michal Hocko @ 2021-01-26 13:59 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Vlastimil Babka, Christoph Lameter, Bharata B Rao, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn

On Tue 26-01-21 14:38:14, Vincent Guittot wrote:
> On Tue, 26 Jan 2021 at 09:52, Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Thu 21-01-21 19:19:21, Vlastimil Babka wrote:
> > [...]
> > > We could also start questioning the very assumption that number of cpus should
> > > affect slab page size in the first place. Should it? After all, each CPU will
> > > have one or more slab pages privately cached, as we discuss in the other
> > > thread... So why make the slab pages also larger?
> >
> > I do agree. What is the actual justification for this scaling?
> >         /*
> >          * Attempt to find best configuration for a slab. This
> >          * works by first attempting to generate a layout with
> >          * the best configuration and backing off gradually.
> >          *
> >          * First we increase the acceptable waste in a slab. Then
> >          * we reduce the minimum objects required in a slab.
> >          */
> >
> > doesn't speak about CPUs.  9b2cd506e5f2 ("slub: Calculate min_objects
> > based on number of processors.") does talk about hackbench ("This has
> > been shown to address the performance issues in hackbench on 16p etc.")
> > but it doesn't give any more details on _why_ that actually works.
> >
> > This thread shows that this is still somehow related to performance, but
> > the real reason is not clear. I believe we should be focusing on the
> > actual reasons for the performance impact rather than playing with some
> > fancy math and tuning for a benchmark on a particular machine which
> > doesn't work for others due to subtle initialization timing issues.
> >
> > Fundamentally, why should a higher number of CPUs imply a larger slab
> > size in the first place?
> 
> A first answer is that the activity and the number of threads involved
> scale with the number of CPUs. Taking the hackbench benchmark as an
> example, the number of groups/threads rises to a higher level on the
> server than on the small system, which doesn't seem unreasonable.
>
> On 8 CPUs, I run hackbench with up to 16 groups, which means 16*40
> threads. But I raise up to 256 groups, which means 256*40 threads, on
> the 224-CPU system. In fact, hackbench -g 1 (with 1 group) doesn't
> regress on the 224-CPU system. The next test with 4 groups starts to
> regress by -7%. But the next one, hackbench -g 16, regresses by 187%
> (duration is almost 3 times longer). It seems reasonable to assume that
> the number of running threads and resources scales with the number of
> CPUs because we want to run more stuff.

OK, I do understand that more jobs scale with the number of CPUs, but I
would also expect that higher order pages are generally more expensive
to get, so this is not really clear cut, especially under some more
demand on the memory where allocations are smooth. So the question
really is whether this is not just optimizing for artificial conditions.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-25 11:20                 ` Vlastimil Babka
@ 2021-01-26 23:03                   ` Will Deacon
  2021-01-27  9:10                     ` Christoph Lameter
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2021-01-26 23:03 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Vincent Guittot, Bharata B Rao, Christoph Lameter, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn,
	Michal Hocko, Catalin Marinas

On Mon, Jan 25, 2021 at 12:20:14PM +0100, Vlastimil Babka wrote:
> On 1/23/21 1:32 PM, Vincent Guittot wrote:
> >> PowerPC PowerNV Host: (160 cpus)
> >> num_online_cpus 1 num_present_cpus 160 num_possible_cpus 160 nr_cpu_ids 160
> >>
> >> PowerPC pseries KVM guest: (-smp 16,maxcpus=160)
> >> num_online_cpus 1 num_present_cpus 16 num_possible_cpus 160 nr_cpu_ids 160
> >>
> >> That's what I see on powerpc, hence I thought num_present_cpus() could
> >> be the correct one to use in slub page order calculation.
> > 
> > num_present_cpus() is set to 1 on arm64 until secondary CPUs boot
> > 
> > arm64 224cpus acpi host:
> > num_online_cpus 1 num_present_cpus 1 num_possible_cpus 224 nr_cpu_ids 224
> > arm64 8cpus DT host:
> > num_online_cpus 1 num_present_cpus 1 num_possible_cpus 8 nr_cpu_ids 8
> > arm64 8cpus qemu-system-aarch64 (-smp 8,maxcpus=256)
> > num_online_cpus 1 num_present_cpus 1 num_possible_cpus 8 nr_cpu_ids 8
> 
> I would have expected num_present_cpus to be 224, 8, 8, respectively.
> 
> > Then present and online increase to num_possible_cpus() once all CPUs are booted
> > 
> >>
> >> >
> >> > What about heuristic:
> >> > - num_online_cpus() > 1 - we trust that and use it
> >> > - otherwise nr_cpu_ids
> >> > Would that work? Too arbitrary?
> >>
> >> Looking at the following snippet from include/linux/cpumask.h, it
> >> appears that num_present_cpus() should be a reasonable compromise
> >> between online and possible/nr_cpu_ids to use here.
> >>
> >> /*
> >>  * The following particular system cpumasks and operations manage
> >>  * possible, present, active and online cpus.
> >>  *
> >>  *     cpu_possible_mask- has bit 'cpu' set iff cpu is populatable
> >>  *     cpu_present_mask - has bit 'cpu' set iff cpu is populated
> >>  *     cpu_online_mask  - has bit 'cpu' set iff cpu available to scheduler
> >>  *     cpu_active_mask  - has bit 'cpu' set iff cpu available to migration
> >>  *
> >>  *  If !CONFIG_HOTPLUG_CPU, present == possible, and active == online.
> >>  *
> >>  *  The cpu_possible_mask is fixed at boot time, as the set of CPU id's
> >>  *  that it is possible might ever be plugged in at anytime during the
> >>  *  life of that system boot.  The cpu_present_mask is dynamic(*),
> >>  *  representing which CPUs are currently plugged in.  And
> >>  *  cpu_online_mask is the dynamic subset of cpu_present_mask,
> >>  *  indicating those CPUs available for scheduling.
> >>  *
> >>  *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
> >>  *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
> >>  *  ACPI reports present at boot.
> >>  *
> >>  *  If HOTPLUG is enabled, then cpu_present_mask varies dynamically,
> >>  *  depending on what ACPI reports as currently plugged in, otherwise
> >>  *  cpu_present_mask is just a copy of cpu_possible_mask.
> >>  *
> >>  *  (*) Well, cpu_present_mask is dynamic in the hotplug case.  If not
> >>  *      hotplug, it's a copy of cpu_possible_mask, hence fixed at boot.
> >>  */
> >>
> >> So for host systems, present is (usually) equal to possible and for
> > 
> > But "cpu_present_mask varies dynamically,  depending on what ACPI
> > reports as currently plugged in"
> > 
> > So it should vary when secondary CPUs are booted
> 
> Hm, but booting the secondaries is just a software (kernel) action? They are
> already physically there, so it seems to me as if the cpu_present_mask is not
> populated correctly on arm64, and it's just a mirror of cpu_online_mask?

I think the present_mask retains CPUs if they are hotplugged off, whereas
the online mask does not. We can't really do any better on arm64, as there's
no way of telling that a CPU is present until we've seen it.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-26 23:03                   ` Will Deacon
@ 2021-01-27  9:10                     ` Christoph Lameter
  2021-01-27 11:04                       ` Vlastimil Babka
  0 siblings, 1 reply; 37+ messages in thread
From: Christoph Lameter @ 2021-01-27  9:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Vlastimil Babka, Vincent Guittot, Bharata B Rao, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn,
	Michal Hocko, Catalin Marinas

On Tue, 26 Jan 2021, Will Deacon wrote:

> > Hm, but booting the secondaries is just a software (kernel) action? They are
> > already physically there, so it seems to me as if the cpu_present_mask is not
> > populated correctly on arm64, and it's just a mirror of cpu_online_mask?
>
> I think the present_mask retains CPUs if they are hotplugged off, whereas
> the online mask does not. We can't really do any better on arm64, as there's
> no way of telling that a CPU is present until we've seen it.

The order of each page in a kmem cache --and therefore also the number
of objects in a slab page-- can be different because that information is
stored in the page struct.

Therefore it is possible to retune the order while the cache is in operation.

This means you can run an initcall after all cpus have been brought up to
set the order and number of objects in a slab page differently.

The older slab pages will continue to exist with the old orders until they
are freed.
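
Roughly like this (recalc_slab_order() is a hypothetical helper that
would redo the calculate_order() decision for a cache; locking against
concurrent allocations and freelist randomization are glossed over):

static int __init slab_reorder_after_smp(void)
{
	struct kmem_cache *s;

	mutex_lock(&slab_mutex);
	list_for_each_entry(s, &slab_caches, list)
		recalc_slab_order(s);	/* hypothetical */
	mutex_unlock(&slab_mutex);
	return 0;
}
late_initcall(slab_reorder_after_smp);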


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-27  9:10                     ` Christoph Lameter
@ 2021-01-27 11:04                       ` Vlastimil Babka
  2021-02-03 11:10                         ` Bharata B Rao
  0 siblings, 1 reply; 37+ messages in thread
From: Vlastimil Babka @ 2021-01-27 11:04 UTC (permalink / raw)
  To: Christoph Lameter, Will Deacon
  Cc: Vincent Guittot, Bharata B Rao, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn, Michal Hocko,
	Catalin Marinas

On 1/27/21 10:10 AM, Christoph Lameter wrote:
> On Tue, 26 Jan 2021, Will Deacon wrote:
> 
>> > Hm, but booting the secondaries is just a software (kernel) action? They are
>> > already physically there, so it seems to me as if the cpu_present_mask is not
>> > populated correctly on arm64, and it's just a mirror of cpu_online_mask?
>>
>> I think the present_mask retains CPUs if they are hotplugged off, whereas
>> the online mask does not. We can't really do any better on arm64, as there's
>> no way of telling that a CPU is present until we've seen it.
> 
> The order of each page in a kmem cache --and therefore also the number
> of objects in a slab page-- can be different because that information is
> stored in the page struct.
> 
> Therefore it is possible to retune the order while the cache is in operation.

Yes, but it's tricky to do the retuning safely, e.g. if freelist randomization
is enabled, see [1].

But as a quick fix for the regression, the heuristic idea could work reasonably
on all architectures (minimal sketch below)?
- if num_present_cpus() is > 1, trust that it doesn't have the arm64-style
issue, and use it
- otherwise use nr_cpu_ids

Long-term we can attempt to make the retuning safe, or decide that the number
of cpus shouldn't determine the order...

[1] https://lore.kernel.org/linux-mm/d7fb9425-9a62-c7b8-604d-5828d7e6b1da@suse.cz/
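
The sketch, spelled out (slub_nr_cpus() is a made-up name; this is just
the heuristic above, not a tested patch):

static unsigned int slub_nr_cpus(void)
{
	/*
	 * num_present_cpus() == 1 may just mean the secondaries have
	 * not been accounted yet (as on arm64), so don't trust it then.
	 */
	if (num_present_cpus() > 1)
		return num_present_cpus();
	return nr_cpu_ids;
}

calculate_order() would then use 4 * (fls(slub_nr_cpus()) + 1) as its
min_objects default.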

> This means you can run an initcall after all cpus have been brought up to
> set the order and number of objects in a slab page differently.
> 
> The older slab pages will continue to exist with the old orders until they
> are freed.
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-26 13:59             ` Michal Hocko
@ 2021-01-27 13:38               ` Vlastimil Babka
  2021-01-28 13:45               ` Mel Gorman
  1 sibling, 0 replies; 37+ messages in thread
From: Vlastimil Babka @ 2021-01-27 13:38 UTC (permalink / raw)
  To: Michal Hocko, Vincent Guittot
  Cc: Christoph Lameter, Bharata B Rao, linux-kernel, linux-mm,
	David Rientjes, Joonsoo Kim, Andrew Morton, guro, Shakeel Butt,
	Johannes Weiner, aneesh.kumar, Jann Horn

On 1/26/21 2:59 PM, Michal Hocko wrote:
>> 
>> On 8 CPUs, I run hackbench with up to 16 groups, which means 16*40
>> threads. But I raise up to 256 groups, which means 256*40 threads, on
>> the 224-CPU system. In fact, hackbench -g 1 (with 1 group) doesn't
>> regress on the 224-CPU system. The next test with 4 groups starts to
>> regress by -7%. But the next one, hackbench -g 16, regresses by 187%
>> (duration is almost 3 times longer). It seems reasonable to assume that
>> the number of running threads and resources scales with the number of
>> CPUs because we want to run more stuff.
> 
> OK, I do understand that more jobs scale with the number of CPUs, but I
> would also expect that higher order pages are generally more expensive
> to get, so this is not really clear cut, especially under some more
> demand on the memory where allocations are smooth. So the question
> really is whether this is not just optimizing for artificial conditions.

FWIW, I enabled CONFIG_SLUB_STATS and ran "hackbench -l 16000 -g 16" in a
(small) VM, and checked tools/vm/slabinfo -DA as per the config option's help,
and it seems to be these 2 caches that are stressed:

Name                   Objects      Alloc       Free   %Fast Fallb O CmpX   UL
kmalloc-512                812   25655535   25654908  71   1     0 0 20082    0
skbuff_head_cache          304   25602632   25602632  84   1     0 0 11241    0

I guess larger pages mean more batched per-cpu allocations without going to the
shared structures or even the page allocator. But a 3x longer duration is still
surprising to me. I'll dig more.
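
For anyone wanting to reproduce, the steps were roughly (assuming a
kernel built with CONFIG_SLUB_STATS=y; "-DA" is what the config option's
help text suggests):

  hackbench -l 16000 -g 16
  make -C tools/vm slabinfo
  tools/vm/slabinfo -DA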
 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-26 13:59             ` Michal Hocko
  2021-01-27 13:38               ` Vlastimil Babka
@ 2021-01-28 13:45               ` Mel Gorman
  2021-01-28 13:57                 ` Michal Hocko
  1 sibling, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2021-01-28 13:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vincent Guittot, Vlastimil Babka, Christoph Lameter,
	Bharata B Rao, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, Shakeel Butt, Johannes Weiner,
	aneesh.kumar, Jann Horn

On Tue, Jan 26, 2021 at 02:59:18PM +0100, Michal Hocko wrote:
> > > This thread shows that this is still somehow related to performance, but
> > > the real reason is not clear. I believe we should be focusing on the
> > > actual reasons for the performance impact rather than playing with some
> > > fancy math and tuning for a benchmark on a particular machine which
> > > doesn't work for others due to subtle initialization timing issues.
> > >
> > > Fundamentally, why should a higher number of CPUs imply a larger slab
> > > size in the first place?
> > 
> > A first answer is that the activity and the number of threads involved
> > scale with the number of CPUs. Taking the hackbench benchmark as an
> > example, the number of groups/threads rises to a higher level on the
> > server than on the small system, which doesn't seem unreasonable.
> >
> > On 8 CPUs, I run hackbench with up to 16 groups, which means 16*40
> > threads. But I raise up to 256 groups, which means 256*40 threads, on
> > the 224-CPU system. In fact, hackbench -g 1 (with 1 group) doesn't
> > regress on the 224-CPU system. The next test with 4 groups starts to
> > regress by -7%. But the next one, hackbench -g 16, regresses by 187%
> > (duration is almost 3 times longer). It seems reasonable to assume that
> > the number of running threads and resources scales with the number of
> > CPUs because we want to run more stuff.
> 
> OK, I do understand that more jobs scale with the number of CPUs, but I
> would also expect that higher order pages are generally more expensive
> to get, so this is not really clear cut, especially under some more
> demand on the memory where allocations are smooth. So the question
> really is whether this is not just optimizing for artificial conditions.

The flip side is that smaller orders increase zone lock contention, and
contention can scale with the number of CPUs, so it's partially related.
hackbench-sockets is an extreme case (pipetest is not affected) but it's
the messenger in this case.

On an x86-64 2-socket 40-core (80-thread) machine, comparing a revert
of the patch with vanilla 5.11-rc5 gives:

hackbench-process-sockets
                            5.11-rc5               5.11-rc5
                     revert-lockstat       vanilla-lockstat
Amean     1        1.1560 (   0.00%)      1.0633 *   8.02%*
Amean     4        2.0797 (   0.00%)      2.5470 * -22.47%*
Amean     7        3.2693 (   0.00%)      4.3433 * -32.85%*
Amean     12       5.2043 (   0.00%)      6.5600 * -26.05%*
Amean     21      10.5817 (   0.00%)     11.3320 *  -7.09%*
Amean     30      13.3923 (   0.00%)     15.5817 * -16.35%*
Amean     48      20.3893 (   0.00%)     23.6733 * -16.11%*
Amean     79      31.4210 (   0.00%)     38.2787 * -21.83%*
Amean     110     43.6177 (   0.00%)     53.8847 * -23.54%*
Amean     141     56.3840 (   0.00%)     68.4257 * -21.36%*
Amean     172     70.0577 (   0.00%)     85.0077 * -21.34%*
Amean     203     81.9717 (   0.00%)    100.7137 * -22.86%*
Amean     234     95.1900 (   0.00%)    116.0280 * -21.89%*
Amean     265    108.9097 (   0.00%)    130.4307 * -19.76%*
Amean     296    119.7470 (   0.00%)    142.3637 * -18.89%*

i.e. the patch incurs a 7% to 32% performance penalty. This bisected
cleanly yesterday when I was looking for the regression and then found
the thread.

Numerous caches change size. For example, kmalloc-512 goes from order-0
(vanilla) to order-2 with the revert.
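
(That is 4096/512 = 8 objects per order-0 slab versus 16384/512 = 32 per
order-2 slab, so vanilla takes roughly 4x as many trips to the page
allocator for the same allocation stream.)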

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

VANILLA
                             &zone->lock:       1202731        1203433           0.07         120.55     1555485.48           1.29        8920825       12537091           0.06          84.10     9855085.12           0.79
                             -----------
                             &zone->lock          61903          [<00000000b47dc96a>] free_one_page+0x3f/0x530
                             &zone->lock           7655          [<00000000099f6e05>] get_page_from_freelist+0x475/0x1370
                             &zone->lock          36529          [<0000000075b9b918>] free_pcppages_bulk+0x1ac/0x7d0
                             &zone->lock        1097346          [<00000000b8e4950a>] get_page_from_freelist+0xaf0/0x1370
                             -----------
                             &zone->lock          44716          [<00000000099f6e05>] get_page_from_freelist+0x475/0x1370
                             &zone->lock          69813          [<0000000075b9b918>] free_pcppages_bulk+0x1ac/0x7d0
                             &zone->lock          31596          [<00000000b47dc96a>] free_one_page+0x3f/0x530
                             &zone->lock        1057308          [<00000000b8e4950a>] get_page_from_freelist+0xaf0/0x1370

REVERT
                             &zone->lock:        735827         739037           0.06          66.12      699661.56           0.95        4095299        7757942           0.05          54.35     5670083.68           0.73
                             -----------
                             &zone->lock         101927          [<00000000a60d5f86>] free_one_page+0x3f/0x530
                             &zone->lock         626426          [<00000000122cecf3>] get_page_from_freelist+0xaf0/0x1370
                             &zone->lock           9207          [<0000000068b9c9a1>] free_pcppages_bulk+0x1ac/0x7d0
                             &zone->lock           1477          [<00000000f856e720>] get_page_from_freelist+0x475/0x1370
                             -----------
                             &zone->lock           6249          [<00000000f856e720>] get_page_from_freelist+0x475/0x1370
                             &zone->lock          92224          [<00000000a60d5f86>] free_one_page+0x3f/0x530
                             &zone->lock          19690          [<0000000068b9c9a1>] free_pcppages_bulk+0x1ac/0x7d0
                             &zone->lock         620874          [<00000000122cecf3>] get_page_from_freelist+0xaf0/0x1370

Each individual wait time is small but the maximum waittime-max is roughly
double (120us vanilla vs 66us reverting the patch). Total wait time is
roughly doubled also due to the patch. Acquisitions are almost doubled.

So mostly this is down to the number of times SLUB calls into the page
allocator which only caches order-0 pages on a per-cpu basis. I do have
a prototype for a high-order per-cpu allocator but it is very rough --
high watermarks stop making sense, code is rough, memory needed for the
pcpu structures quadruples etc.


-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-28 13:45               ` Mel Gorman
@ 2021-01-28 13:57                 ` Michal Hocko
  2021-01-28 14:42                   ` Mel Gorman
  0 siblings, 1 reply; 37+ messages in thread
From: Michal Hocko @ 2021-01-28 13:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Vincent Guittot, Vlastimil Babka, Christoph Lameter,
	Bharata B Rao, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, Shakeel Butt, Johannes Weiner,
	aneesh.kumar, Jann Horn

On Thu 28-01-21 13:45:12, Mel Gorman wrote:
[...]
> So mostly this is down to the number of times SLUB calls into the page
> allocator which only caches order-0 pages on a per-cpu basis. I do have
> a prototype for a high-order per-cpu allocator but it is very rough --
> high watermarks stop making sense, code is rough, memory needed for the
> pcpu structures quadruples etc.

Thanks, this is really useful. But it really begs the question whether this
is a general case or more of an exception. And as such maybe we want to
define high-throughput caches which would gain higher order pages to
keep pace with allocation and reduce the churn, or deploy some other
techniques to reduce the direct page allocator involvement.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-28 13:57                 ` Michal Hocko
@ 2021-01-28 14:42                   ` Mel Gorman
  0 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2021-01-28 14:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vincent Guittot, Vlastimil Babka, Christoph Lameter,
	Bharata B Rao, linux-kernel, linux-mm, David Rientjes,
	Joonsoo Kim, Andrew Morton, guro, Shakeel Butt, Johannes Weiner,
	aneesh.kumar, Jann Horn

On Thu, Jan 28, 2021 at 02:57:10PM +0100, Michal Hocko wrote:
> On Thu 28-01-21 13:45:12, Mel Gorman wrote:
> [...]
> > So mostly this is down to the number of times SLUB calls into the page
> > allocator which only caches order-0 pages on a per-cpu basis. I do have
> > a prototype for a high-order per-cpu allocator but it is very rough --
> > high watermarks stop making sense, code is rough, memory needed for the
> > pcpu structures quadruples etc.
> 
> Thanks, this is really useful. But it really begs the question whether this
> is a general case or more of an exception. And as such maybe we want to
> define high-throughput caches which would gain higher order pages to
> keep pace with allocation and reduce the churn, or deploy some other
> techniques to reduce the direct page allocator involvement.

I don't think we want to define "high throughput caches" because it'll
be workload dependent and a game of whack-a-mole. If the "high throughput
cache" is a kmalloc cache for some set of workloads and one of the inode
caches or dcaches for another one, there will be no setting that is
universally good.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-01-27 11:04                       ` Vlastimil Babka
@ 2021-02-03 11:10                         ` Bharata B Rao
  2021-02-04  7:32                           ` Vincent Guittot
  2021-02-04  9:33                           ` Vlastimil Babka
  0 siblings, 2 replies; 37+ messages in thread
From: Bharata B Rao @ 2021-02-03 11:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Christoph Lameter, Will Deacon, Vincent Guittot, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn,
	Michal Hocko, Catalin Marinas

On Wed, Jan 27, 2021 at 12:04:01PM +0100, Vlastimil Babka wrote:
> On 1/27/21 10:10 AM, Christoph Lameter wrote:
> > On Tue, 26 Jan 2021, Will Deacon wrote:
> > 
> >> > Hm, but booting the secondaries is just a software (kernel) action? They are
> >> > already physically there, so it seems to me as if the cpu_present_mask is not
> >> > populated correctly on arm64, and it's just a mirror of cpu_online_mask?
> >>
> >> I think the present_mask retains CPUs if they are hotplugged off, whereas
> >> the online mask does not. We can't really do any better on arm64, as there's
> >> no way of telling that a CPU is present until we've seen it.
> > 
> > The order of each page in a kmem cache --and therefore also the number
> > of objects in a slab page-- can be different because that information is
> > stored in the page struct.
> > 
> > Therefore it is possible to retune the order while the cache is in operation.
> 
> Yes, but it's tricky to do the retuning safely, e.g. if freelist randomization
> is enabled, see [1].
> 
> But as a quick fix for the regression, the heuristic idea could work reasonably
> on all architectures?
> - if num_present_cpus() is > 1, trust that it doesn't have the issue such as
> arm64, and use it
> - otherwise use nr_cpu_ids
> 
> Long-term we can attempt to make the retuning safe, or decide that the number
> of cpus shouldn't determine the order...
> 
> [1] https://lore.kernel.org/linux-mm/d7fb9425-9a62-c7b8-604d-5828d7e6b1da@suse.cz/

So what is preferable here now? The above, some other quick fix, or reverting
the original commit?

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 37+ messages in thread
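
The quick-fix heuristic quoted above amounts to a two-step choice of the CPU
count. A minimal sketch in kernel-style C (num_present_cpus() and nr_cpu_ids
are real kernel symbols; the exact placement inside calculate_order() is an
assumption at this point in the thread):

	unsigned int nr_cpus;

	/*
	 * Trust num_present_cpus() only when it is greater than 1: some
	 * architectures (e.g. arm64) mark CPUs present only as they are
	 * onlined, so a value of 1 this early in boot tells us nothing.
	 */
	nr_cpus = num_present_cpus();
	if (nr_cpus <= 1)
		nr_cpus = nr_cpu_ids;	/* fall back to possible CPUs */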

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-02-03 11:10                         ` Bharata B Rao
@ 2021-02-04  7:32                           ` Vincent Guittot
  2021-02-04  9:07                             ` Christoph Lameter
  2021-02-04  9:33                           ` Vlastimil Babka
  1 sibling, 1 reply; 37+ messages in thread
From: Vincent Guittot @ 2021-02-04  7:32 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Vlastimil Babka, Christoph Lameter, Will Deacon, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn,
	Michal Hocko, Catalin Marinas

On Wed, 3 Feb 2021 at 12:10, Bharata B Rao <bharata@linux.ibm.com> wrote:
>
> On Wed, Jan 27, 2021 at 12:04:01PM +0100, Vlastimil Babka wrote:
> > On 1/27/21 10:10 AM, Christoph Lameter wrote:
> > > On Tue, 26 Jan 2021, Will Deacon wrote:
> > >
> > >> > Hm, but booting the secondaries is just a software (kernel) action? They are
> > >> > already physically there, so it seems to me as if the cpu_present_mask is not
> > >> > populated correctly on arm64, and it's just a mirror of cpu_online_mask?
> > >>
> > >> I think the present_mask retains CPUs if they are hotplugged off, whereas
> > >> the online mask does not. We can't really do any better on arm64, as there's
> > >> no way of telling that a CPU is present until we've seen it.
> > >
> > > The order of each page in a kmem cache --and therefore also the number
> > > of objects in a slab page-- can be different because that information is
> > > stored in the page struct.
> > >
> > > Therefore it is possible to retune the order while the cache is in operation.
> >
> > Yes, but it's tricky to do the retuning safely, e.g. if freelist randomization
> > is enabled, see [1].
> >
> > But as a quick fix for the regression, the heuristic idea could work reasonably
> > on all architectures?
> > - if num_present_cpus() is > 1, trust that it doesn't have the issue such as
> > arm64, and use it
> > - otherwise use nr_cpu_ids
> >
> > Long-term we can attempt to make the retuning safe, or decide that the
> > number of cpus shouldn't determine the order...
> >
> > [1] https://lore.kernel.org/linux-mm/d7fb9425-9a62-c7b8-604d-5828d7e6b1da@suse.cz/
>
> So what is preferable here now? The above, some other quick fix, or reverting
> the original commit?

I'm fine with whatever solution as long as we can keep using nr_cpu_ids
when other values like num_present_cpus() don't correctly reflect the
system.

Regards,
Vincent

>
> Regards,
> Bharata.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-02-04  7:32                           ` Vincent Guittot
@ 2021-02-04  9:07                             ` Christoph Lameter
  0 siblings, 0 replies; 37+ messages in thread
From: Christoph Lameter @ 2021-02-04  9:07 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Bharata B Rao, Vlastimil Babka, Will Deacon, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn,
	Michal Hocko, Catalin Marinas

On Thu, 4 Feb 2021, Vincent Guittot wrote:

> > So what is preferable here now? The above, some other quick fix, or
> > reverting the original commit?
>
> I'm fine with whatever solution as long as we can keep using nr_cpu_ids
> when other values like num_present_cpus() don't correctly reflect the
> system.

AFAICT they are correctly reflecting the current state of the system.

The problem here is the bring-up of the system and the tuning done during it.

One additional thing that may help: the slab caches can work in a degraded
mode where no fastpath allocations can occur. That mode is used primarily
for debugging, but maybe it can also help during bootstrap to avoid
having to deal with the per-cpu data and so on.

In degraded mode SLUB will take a lock for each operation on an object.

In this mode the following is true:

  kmem_cache_cpu->page == NULL
  kmem_cache_cpu->freelist == NULL
  kmem_cache_debug(s) == true

So if you define a new debug mode and include it in SLAB_DEBUG_FLAGS, then
you can force SLUB to fall back to operations where a lock is taken and
where slab allocation can be stopped. This may be OK for bring-up.

The debug flags are also tied to some wizardry that can patch the code at
runtime to optimize for debugging or for fast operations. You would tie into
that as well: start in debug mode by default and switch to fast operations
after all processors are up.



^ permalink raw reply	[flat|nested] 37+ messages in thread
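
Christoph's degraded-mode suggestion is only an outline. As a rough
illustration of the invariants he lists (the helper below is hypothetical and
not in mm/slub.c; kmem_cache_debug() and struct kmem_cache_cpu are real):

	/*
	 * Sketch only: what the degraded mode described above looks like.
	 * No per-cpu fastpath state is ever populated, because the cache
	 * carries a debug flag, so every object operation takes the slow
	 * path under a lock.
	 */
	static bool slub_degraded(struct kmem_cache *s, struct kmem_cache_cpu *c)
	{
		return !c->page && !c->freelist && kmem_cache_debug(s);
	}

A new bootstrap-style flag added to SLAB_DEBUG_FLAGS, cleared once all
processors are up, would be one way to enter and leave this mode; such a flag
does not exist today and its name would be an invention.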

* Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order
  2021-02-03 11:10                         ` Bharata B Rao
  2021-02-04  7:32                           ` Vincent Guittot
@ 2021-02-04  9:33                           ` Vlastimil Babka
  2021-02-08 13:41                             ` [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order Vlastimil Babka
  1 sibling, 1 reply; 37+ messages in thread
From: Vlastimil Babka @ 2021-02-04  9:33 UTC (permalink / raw)
  To: bharata
  Cc: Christoph Lameter, Will Deacon, Vincent Guittot, linux-kernel,
	linux-mm, David Rientjes, Joonsoo Kim, Andrew Morton, guro,
	Shakeel Butt, Johannes Weiner, aneesh.kumar, Jann Horn,
	Michal Hocko, Catalin Marinas

On 2/3/21 12:10 PM, Bharata B Rao wrote:
> On Wed, Jan 27, 2021 at 12:04:01PM +0100, Vlastimil Babka wrote:
>> Yes, but it's tricky to do the retuning safely, e.g. if freelist randomization
>> is enabled, see [1].
>> 
>> But as a quick fix for the regression, the heuristic idea could work reasonably
>> on all architectures?
>> - if num_present_cpus() is > 1, trust that it doesn't have the issue such as
>> arm64, and use it
>> - otherwise use nr_cpu_ids
>> 
>> Long-term we can attempt to make the retuning safe, or decide that the
>> number of cpus shouldn't determine the order...
>> 
>> [1] https://lore.kernel.org/linux-mm/d7fb9425-9a62-c7b8-604d-5828d7e6b1da@suse.cz/
> 
> So what is preferable here now? The above, some other quick fix, or reverting
> the original commit?

I would try the above first; in case it doesn't work, revert. That is the
immediate fix for the regression that people can safely backport.
Anything more complex will take more time and would be riskier to backport.

> Regards,
> Bharata.
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order
  2021-02-04  9:33                           ` Vlastimil Babka
@ 2021-02-08 13:41                             ` Vlastimil Babka
  2021-02-08 14:54                               ` Vincent Guittot
  2021-02-10 14:07                               ` Mel Gorman
  0 siblings, 2 replies; 37+ messages in thread
From: Vlastimil Babka @ 2021-02-08 13:41 UTC (permalink / raw)
  To: vbabka
  Cc: Catalin.Marinas, akpm, aneesh.kumar, bharata, cl, guro, hannes,
	iamjoonsoo.kim, jannh, linux-kernel, linux-mm, mhocko, rientjes,
	shakeelb, vincent.guittot, will, Mel Gorman, stable

When creating a new kmem cache, SLUB determines how large the slab pages will
be based on a number of inputs, including the number of CPUs in the system.
Larger slab pages mean that more objects can be allocated/freed from per-cpu
slabs before accessing shared structures, but also that potentially more
memory can be wasted due to low slab usage and fragmentation.
The rough idea of using the number of CPUs is that larger systems will be more
likely to benefit from reduced contention, and also should have enough memory
to spare.

The number of CPUs used to be determined as nr_cpu_ids, which is the number of
possible cpus, but on some systems many will never be onlined, thus commit
045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page
order") changed it to num_online_cpus(). However, for kmem caches created
early, before CPUs are onlined, this may lead to permanently low slab page
sizes.

Vincent reports a regression [1] of hackbench on arm64 systems:

> I'm facing significant performances regression on a large arm64 server
> system (224 CPUs). Regressions is also present on small arm64 system
> (8 CPUs) but in a far smaller order of magnitude

> On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
> v5.11-rc4 : 9.135sec (+/- 0.45%)
> v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> v5.10: 3.136sec (+/- 0.40%)

Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:

> i.e. the patch incurs a 7% to 32% performance penalty. This bisected
> cleanly yesterday when I was looking for the regression and then found
> the thread.

> Numerous caches change size. For example, kmalloc-512 goes from order-0
> (vanilla) to order-2 with the revert.

> So mostly this is down to the number of times SLUB calls into the page
> allocator which only caches order-0 pages on a per-cpu basis.

Clearly num_online_cpus() doesn't work this early in boot. We could change
the order dynamically in a memory hotplug callback, but runtime order changing
for existing kmem caches has already been shown to be dangerous, and was
removed in 32a6f409b693 ("mm, slub: remove runtime allocation order changes").
It could be resurrected in a safe manner with some effort, but to fix the
regression we need something simpler.

We could use num_present_cpus() that should be the number of physically present
CPUs even before they are onlined. That would for for PowerPC [3], which
triggered the original commit,  but that still doesn't work on arm64 [4] as
explained in [5].

So this patch tries to determine the best available value without specific arch
knowledge.
- num_present_cpus() if the number is larger than 1, as that means the arch is
likely setting it properly
- nr_cpu_ids otherwise

This should fix the reported regressions while also keeping the effect of
045ab8c9487b for PowerPC systems. It's possible there are configurations where
num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time
bloated, so these (if they exist) would keep the large orders based on
nr_cpu_ids as was before 045ab8c9487b.

[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/

Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---

OK, this is a 5.11 regression, so we should try to fix it by 5.12. I've also
Cc'd stable for that reason although it's not a crash fix.
We can still try later to replace this with a safe order update in hotplug
callbacks, but that's infeasible for 5.12.

 mm/slub.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 176b1cb0d006..8fc9190e6cb3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3454,6 +3454,7 @@ static inline int calculate_order(unsigned int size)
 	unsigned int order;
 	unsigned int min_objects;
 	unsigned int max_objects;
+	unsigned int nr_cpus;
 
 	/*
 	 * Attempt to find best configuration for a slab. This
@@ -3464,8 +3465,21 @@ static inline int calculate_order(unsigned int size)
 	 * we reduce the minimum objects required in a slab.
 	 */
 	min_objects = slub_min_objects;
-	if (!min_objects)
-		min_objects = 4 * (fls(num_online_cpus()) + 1);
+	if (!min_objects) {
+		/*
+		 * Some architectures will only update present cpus when
+		 * onlining them, so don't trust the number if it's just 1. But
+		 * we also don't want to use nr_cpu_ids always, as on some other
+		 * architectures, there can be many possible cpus, but never
+		 * onlined. Here we compromise between trying to avoid too high
+		 * order on systems that appear larger than they are, and too
+		 * low order on systems that appear smaller than they are.
+		 */
+		nr_cpus = num_present_cpus();
+		if (nr_cpus <= 1)
+			nr_cpus = nr_cpu_ids;
+		min_objects = 4 * (fls(nr_cpus) + 1);
+	}
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 37+ messages in thread
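
To see what the patch's fls()-based formula yields in practice, here is a
small self-contained userspace C program (fls_() is a stand-in for the
kernel's fls(); the CPU counts are just examples):

	#include <stdio.h>

	/* Userspace stand-in for the kernel's fls(): 1-based position of
	 * the most significant set bit, 0 for an input of 0. */
	static int fls_(unsigned int x)
	{
		return x ? 32 - __builtin_clz(x) : 0;
	}

	int main(void)
	{
		unsigned int cpus[] = { 1, 8, 16, 64, 224 };
		unsigned int i;

		for (i = 0; i < sizeof(cpus) / sizeof(cpus[0]); i++)
			printf("nr_cpus=%3u -> min_objects=%2d\n",
			       cpus[i], 4 * (fls_(cpus[i]) + 1));
		return 0;
	}

This prints 8, 20, 24, 32 and 36 respectively: the target object count grows
only logarithmically with the CPU count, so even the 224-CPU machine from
Vincent's report asks for at most 36 objects per slab.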

* Re: [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order
  2021-02-08 13:41                             ` [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order Vlastimil Babka
@ 2021-02-08 14:54                               ` Vincent Guittot
  2021-02-10 14:07                               ` Mel Gorman
  1 sibling, 0 replies; 37+ messages in thread
From: Vincent Guittot @ 2021-02-08 14:54 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Catalin Marinas, Andrew Morton, aneesh.kumar, Bharata B Rao,
	Christoph Lameter, guro, Johannes Weiner, Joonsoo Kim, Jann Horn,
	linux-kernel, linux-mm, Michal Hocko, David Rientjes,
	Shakeel Butt, Will Deacon, Mel Gorman, # v4 . 16+

On Mon, 8 Feb 2021 at 14:41, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> When creating a new kmem cache, SLUB determines how large the slab pages will
> be based on a number of inputs, including the number of CPUs in the system.
> Larger slab pages mean that more objects can be allocated/freed from per-cpu
> slabs before accessing shared structures, but also that potentially more
> memory can be wasted due to low slab usage and fragmentation.
> The rough idea of using the number of CPUs is that larger systems will be
> more likely to benefit from reduced contention, and also should have enough
> memory to spare.
>
> The number of CPUs used to be determined as nr_cpu_ids, which is the number
> of possible cpus, but on some systems many will never be onlined, thus commit
> 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page
> order") changed it to num_online_cpus(). However, for kmem caches created
> early, before CPUs are onlined, this may lead to permanently low slab page
> sizes.
>
> Vincent reports a regression [1] of hackbench on arm64 systems:
>
> > I'm facing significant performances regression on a large arm64 server
> > system (224 CPUs). Regressions is also present on small arm64 system
> > (8 CPUs) but in a far smaller order of magnitude
>
> > On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
> > v5.11-rc4 : 9.135sec (+/- 0.45%)
> > v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> > v5.10: 3.136sec (+/- 0.40%)
>
> Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
> page allocator contention:
>
> > i.e. the patch incurs a 7% to 32% performance penalty. This bisected
> > cleanly yesterday when I was looking for the regression and then found
> > the thread.
>
> > Numerous caches change size. For example, kmalloc-512 goes from order-0
> > (vanilla) to order-2 with the revert.
>
> > So mostly this is down to the number of times SLUB calls into the page
> > allocator which only caches order-0 pages on a per-cpu basis.
>
> Clearly num_online_cpus() doesn't work this early in boot. We could change
> the order dynamically in a memory hotplug callback, but runtime order changing
> for existing kmem caches has already been shown to be dangerous, and was
> removed in 32a6f409b693 ("mm, slub: remove runtime allocation order changes").
> It could be resurrected in a safe manner with some effort, but to fix the
> regression we need something simpler.
>
> We could use num_present_cpus() that should be the number of physically present
> CPUs even before they are onlined. That would for for PowerPC [3], which

minor typo : "That would for for PowerPC" should be "That would work
for PowerPC" ?

> triggered the original commit,  but that still doesn't work on arm64 [4] as
> explained in [5].
>
> So this patch tries to determine the best available value without specific arch
> knowledge.
> - num_present_cpus() if the number is larger than 1, as that means the arch is
> likely setting it properly
> - nr_cpu_ids otherwise
>
> This should fix the reported regressions while also keeping the effect of
> 045ab8c9487b for PowerPC systems. It's possible there are configurations where
> num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time
> bloated, so these (if they exist) would keep the large orders based on
> nr_cpu_ids as was before 045ab8c9487b.
>
> [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
> [2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
> [3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
> [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
> [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
>
> Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> Reported-by: Mel Gorman <mgorman@techsingularity.net>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Tested on both large and small arm64 systems. There is no regression
with this patch applied.

Tested-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
>
> OK, this is a 5.11 regression, so we should try to fix it by 5.12. I've also
> Cc'd stable for that reason although it's not a crash fix.
> We can still try later to replace this with a safe order update in hotplug
> callbacks, but that's infeasible for 5.12.
>
>  mm/slub.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 176b1cb0d006..8fc9190e6cb3 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3454,6 +3454,7 @@ static inline int calculate_order(unsigned int size)
>         unsigned int order;
>         unsigned int min_objects;
>         unsigned int max_objects;
> +       unsigned int nr_cpus;
>
>         /*
>          * Attempt to find best configuration for a slab. This
> @@ -3464,8 +3465,21 @@ static inline int calculate_order(unsigned int size)
>          * we reduce the minimum objects required in a slab.
>          */
>         min_objects = slub_min_objects;
> -       if (!min_objects)
> -               min_objects = 4 * (fls(num_online_cpus()) + 1);
> +       if (!min_objects) {
> +               /*
> +                * Some architectures will only update present cpus when
> +                * onlining them, so don't trust the number if it's just 1. But
> +                * we also don't want to use nr_cpu_ids always, as on some other
> +                * architectures, there can be many possible cpus, but never
> +                * onlined. Here we compromise between trying to avoid too high
> +                * order on systems that appear larger than they are, and too
> +                * low order on systems that appear smaller than they are.
> +                */
> +               nr_cpus = num_present_cpus();
> +               if (nr_cpus <= 1)
> +                       nr_cpus = nr_cpu_ids;
> +               min_objects = 4 * (fls(nr_cpus) + 1);
> +       }
>         max_objects = order_objects(slub_max_order, size);
>         min_objects = min(min_objects, max_objects);
>
> --
> 2.30.0
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order
  2021-02-08 13:41                             ` [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order Vlastimil Babka
  2021-02-08 14:54                               ` Vincent Guittot
@ 2021-02-10 14:07                               ` Mel Gorman
  1 sibling, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2021-02-10 14:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Catalin.Marinas, akpm, aneesh.kumar, bharata, cl, guro, hannes,
	iamjoonsoo.kim, jannh, linux-kernel, linux-mm, mhocko, rientjes,
	shakeelb, vincent.guittot, will, stable

On Mon, Feb 08, 2021 at 02:41:08PM +0100, Vlastimil Babka wrote:
> When creating a new kmem cache, SLUB determines how large the slab pages will
> be based on a number of inputs, including the number of CPUs in the system.
> Larger slab pages mean that more objects can be allocated/freed from per-cpu
> slabs before accessing shared structures, but also that potentially more
> memory can be wasted due to low slab usage and fragmentation.
> The rough idea of using the number of CPUs is that larger systems will be
> more likely to benefit from reduced contention, and also should have enough
> memory to spare.
> 
> <SNIP>
>
> So this patch tries to determine the best available value without specific arch
> knowledge.
> - num_present_cpus() if the number is larger than 1, as that means the arch is
> likely setting it properly
> - nr_cpu_ids otherwise
> 
> This should fix the reported regressions while also keeping the effect of
> 045ab8c9487b for PowerPC systems. It's possible there are configurations where
> num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time
> bloated, so these (if they exist) would keep the large orders based on
> nr_cpu_ids as was before 045ab8c9487b.
> 

Tested-by: Mel Gorman <mgorman@techsingularity.net>

Only x86-64 tested, three machines, all showing similar results as would
be expected. One example:

hackbench-process-sockets
                          5.11.0-rc7             5.11.0-rc7             5.11.0-rc7
                             vanilla            revert-v1r1        vbabka-fix-v1r1
Amean     1        0.3873 (   0.00%)      0.4060 (  -4.82%)      0.3747 (   3.27%)
Amean     4        1.3767 (   0.00%)      0.7700 *  44.07%*      0.7790 *  43.41%*
Amean     7        2.4710 (   0.00%)      1.2753 *  48.39%*      1.2680 *  48.68%*
Amean     12       3.7103 (   0.00%)      1.9570 *  47.26%*      1.9470 *  47.52%*
Amean     21       5.9790 (   0.00%)      2.9760 *  50.23%*      2.9830 *  50.11%*
Amean     30       8.0467 (   0.00%)      4.0590 *  49.56%*      4.0410 *  49.78%*
Amean     48      12.8180 (   0.00%)      6.5167 *  49.16%*      6.4070 *  50.02%*
Amean     79      20.5150 (   0.00%)     10.3580 *  49.51%*     10.3740 *  49.43%*
Amean     110     25.5320 (   0.00%)     14.0453 *  44.99%*     14.0577 *  44.94%*
Amean     141     32.4170 (   0.00%)     17.3267 *  46.55%*     17.4977 *  46.02%*
Amean     172     40.0883 (   0.00%)     21.0360 *  47.53%*     21.1480 *  47.25%*
Amean     203     47.2923 (   0.00%)     25.2367 *  46.64%*     25.4923 *  46.10%*
Amean     234     55.2623 (   0.00%)     29.0720 *  47.39%*     29.3273 *  46.93%*
Amean     265     61.4513 (   0.00%)     33.0260 *  46.26%*     33.0617 *  46.20%*
Amean     296     73.2960 (   0.00%)     36.6920 *  49.94%*     37.2520 *  49.18%*

Comparing just a revert and the patch

                          5.11.0-rc7             5.11.0-rc7
                         revert-v1r1        vbabka-fix-v1r1
Amean     1        0.4060 (   0.00%)      0.3747 (   7.72%)
Amean     4        0.7700 (   0.00%)      0.7790 (  -1.17%)
Amean     7        1.2753 (   0.00%)      1.2680 (   0.58%)
Amean     12       1.9570 (   0.00%)      1.9470 (   0.51%)
Amean     21       2.9760 (   0.00%)      2.9830 (  -0.24%)
Amean     30       4.0590 (   0.00%)      4.0410 (   0.44%)
Amean     48       6.5167 (   0.00%)      6.4070 (   1.68%)
Amean     79      10.3580 (   0.00%)     10.3740 (  -0.15%)
Amean     110     14.0453 (   0.00%)     14.0577 (  -0.09%)
Amean     141     17.3267 (   0.00%)     17.4977 *  -0.99%*
Amean     172     21.0360 (   0.00%)     21.1480 (  -0.53%)
Amean     203     25.2367 (   0.00%)     25.4923 (  -1.01%)
Amean     234     29.0720 (   0.00%)     29.3273 (  -0.88%)
Amean     265     33.0260 (   0.00%)     33.0617 (  -0.11%)
Amean     296     36.6920 (   0.00%)     37.2520 (  -1.53%)

That's a negligible difference and all but one group (141) was within the
noise. Even for 141 it's very marginal and, with the degree of overload at
that group count, it can be ignored.

Thanks!

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-02-10 14:08 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18  8:27 [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Bharata B Rao
2020-11-18 11:25 ` Vlastimil Babka
2020-11-18 19:34   ` Roman Gushchin
2020-11-18 19:53     ` David Rientjes
2021-01-20 17:36 ` Vincent Guittot
2021-01-21  5:30   ` Bharata B Rao
2021-01-21  9:09     ` Vincent Guittot
2021-01-21 10:01     ` Christoph Lameter
2021-01-21 10:48       ` Vincent Guittot
2021-01-21 18:19       ` Vlastimil Babka
2021-01-22  8:03         ` Vincent Guittot
2021-01-22 12:03           ` Vlastimil Babka
2021-01-22 13:16             ` Vincent Guittot
2021-01-23  5:16             ` Bharata B Rao
2021-01-23 12:32               ` Vincent Guittot
2021-01-25 11:20                 ` Vlastimil Babka
2021-01-26 23:03                   ` Will Deacon
2021-01-27  9:10                     ` Christoph Lameter
2021-01-27 11:04                       ` Vlastimil Babka
2021-02-03 11:10                         ` Bharata B Rao
2021-02-04  7:32                           ` Vincent Guittot
2021-02-04  9:07                             ` Christoph Lameter
2021-02-04  9:33                           ` Vlastimil Babka
2021-02-08 13:41                             ` [PATCH] mm, slub: better heuristic for number of cpus when calculating slab order Vlastimil Babka
2021-02-08 14:54                               ` Vincent Guittot
2021-02-10 14:07                               ` Mel Gorman
2021-01-22 13:05         ` [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order Jann Horn
2021-01-22 13:09           ` Jann Horn
2021-01-22 15:27           ` Vlastimil Babka
2021-01-25  4:28           ` Bharata B Rao
2021-01-26  8:52         ` Michal Hocko
2021-01-26 13:38           ` Vincent Guittot
2021-01-26 13:59             ` Michal Hocko
2021-01-27 13:38               ` Vlastimil Babka
2021-01-28 13:45               ` Mel Gorman
2021-01-28 13:57                 ` Michal Hocko
2021-01-28 14:42                   ` Mel Gorman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).