* [patch 0/3] slub partial list thrashing performance degradation
@ 2009-03-30  5:43 David Rientjes
  2009-03-30  5:43 ` [patch 1/3] slub: add per-cache slab thrash ratio David Rientjes
  2009-03-30  6:38 ` [patch 0/3] slub partial list thrashing performance degradation Pekka Enberg
  0 siblings, 2 replies; 28+ messages in thread
From: David Rientjes @ 2009-03-30  5:43 UTC
  To: Pekka Enberg; +Cc: Christoph Lameter, Nick Piggin, Martin Bligh, linux-kernel

SLUB causes a performance degradation in comparison to SLAB when a
workload has an object allocation and freeing pattern such that it spends
more time in partial list handling than utilizing the fastpaths.

This usually occurs when freeing to a non-cpu slab either due to remote
cpu freeing or freeing to a full or partial slab.  When the cpu slab is
later replaced with the freeing slab, it can only satisfy a limited
number of allocations before becoming full and requiring additional
partial list handling.

When the slowpath to fastpath ratio becomes high, this partial list
handling causes the entire allocator to become very slow for the specific
workload.

The bash script at the end of this email (inline) illustrates the
performance degradation well.  It uses the netperf TCP_RR benchmark to
measure transfer rates with various thread counts, each being multiples
of the number of cores.  The transfer rates are reported as an aggregate
of the individual thread results.

CONFIG_SLUB_STATS demonstrates that the kmalloc-256 and kmalloc-2048
caches are performing quite poorly:

	cache		ALLOC_FASTPATH	ALLOC_SLOWPATH
	kmalloc-256	98125871	31585955
	kmalloc-2048	77243698	52347453

	cache		FREE_FASTPATH	FREE_SLOWPATH
	kmalloc-256	173624		129538000
	kmalloc-2048	90520		129500630

The majority of slowpath allocations were from the partial list
(30786261, or 97.5%, for kmalloc-256 and 51688159, or 98.7%, for
kmalloc-2048).

A large percentage of frees required the slab to be added back to the
partial list.  For kmalloc-256, 30786630 (23.8%) of slowpath frees
required partial list handling.  For kmalloc-2048, 51688697 (39.9%) of
slowpath frees required partial list handling.
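
These counters come from sysfs; a minimal sketch of one way to collect
them on a CONFIG_SLUB_STATS kernel (each statistic is exposed as a file
under /sys/kernel/slab/<cache>/):

	for cache in kmalloc-256 kmalloc-2048; do
		for stat in alloc_fastpath alloc_slowpath alloc_from_partial \
			    free_fastpath free_slowpath free_add_partial; do
			# First field is the cumulative count for the cache.
			echo "$cache $stat $(cut -d' ' -f1 /sys/kernel/slab/$cache/$stat)"
		done
	done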

On my 16-core machines with 64G of RAM, these are the results:

	# threads	SLAB		SLUB		SLUB+patchset
	16		69892		71592		69505
	32		126490		95373		119731
	48		138050		113072		125014
	64		169240		149043		158919
	80		192294		172035		179679
	96		197779		187849		192154
	112		217283		204962		209988
	128		229848		217547		223507
	144		238550		232369		234565
	160		250333		239871		244789
	176		256878		242712		248971
	192		261611		243182		255596

 [ The SLUB+patchset results were obtained with the latest git plus this
   patchset and slab_thrash_ratio set to 20 for both the kmalloc-256 and
   kmalloc-2048 caches. ]
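
A sketch of the corresponding setup step, assuming this patchset is
applied (the slab_thrash_ratio file is added by patch 1/3):

	for cache in kmalloc-256 kmalloc-2048; do
		echo 20 > /sys/kernel/slab/$cache/slab_thrash_ratio
		# Read back the effective (whole-object rounded) ratio.
		echo "$cache: $(cat /sys/kernel/slab/$cache/slab_thrash_ratio)%"
	done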

Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/slub_def.h |    4 +
 mm/slub.c                |  138 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 122 insertions(+), 20 deletions(-)


#!/bin/bash

TIME=60				# seconds
HOSTNAME=<hostname>		# netserver

NR_CPUS=$(grep ^processor /proc/cpuinfo | wc -l)
echo NR_CPUS=$NR_CPUS

run_netperf() {
	# Launch $1 concurrent netperf TCP_RR clients against $HOSTNAME.
	for i in $(seq 1 $1); do
		netperf -H $HOSTNAME -t TCP_RR -l $TIME &
	done
}

ITERATIONS=0
while [ $ITERATIONS -lt 12 ]; do
	RATE=0
	ITERATIONS=$[$ITERATIONS + 1]
	THREADS=$[$NR_CPUS * $ITERATIONS]
	# Drop netperf's header lines and keep the transfer rate (6th field).
	RESULTS=$(run_netperf $THREADS | grep -v '[a-zA-Z]' | awk '{ print $6 }')

	# Sum the per-thread rates, truncating the fractional part.
	for j in $RESULTS; do
		RATE=$[$RATE + ${j/.*}]
	done
	echo threads=$THREADS rate=$RATE
done

* [patch 1/3] slub: add per-cache slab thrash ratio
@ 2009-03-26  9:42 David Rientjes
  2009-03-26  9:42 ` [patch 2/3] slub: scan partial list for free slabs when thrashing David Rientjes
  0 siblings, 1 reply; 28+ messages in thread
From: David Rientjes @ 2009-03-26  9:42 UTC
  To: Pekka Enberg; +Cc: Christoph Lameter, Nick Piggin, Martin Bligh, linux-kernel

Adds /sys/kernel/slab/cache/slab_thrash_ratio, which represents the
percentage of a slab's objects that the fastpath must be able to
fulfill, on a per-cpu basis, for the cache not to be considered
thrashing[*].

"Thrashing" here is defined as the constant swapping of the cpu slab such
that the slowpath is followed the majority of the time because the
refilled cpu slab can only accommodate a small number of allocations.
This occurs when the object allocation and freeing pattern for a cache is
such that it spends more time swapping the cpu slab than fulfilling
fastpath allocations.

 [*] A single instance of the thrash ratio not being reached in the
     fastpath does not indicate the cpu cache is thrashing.  A
     pre-defined value will later be added to determine how many times
     the ratio must not be reached before a cache is actually thrashing.

The ratio is based on the number of objects in a cache's slab.  When
/sys/kernel/slab/cache/order is changed, the stored value is
recalculated automatically so that the same ratio is preserved.
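
As an illustration of how the two files interact with this patch applied
(a sketch; the order value here is only an example):

	cache=/sys/kernel/slab/kmalloc-256
	echo 20 > $cache/slab_thrash_ratio
	cat $cache/objs_per_slab $cache/slab_thrash_ratio
	# Changing the order recalculates the stored watermark so that the
	# configured ratio tracks the new objects-per-slab count.
	echo 2 > $cache/order
	cat $cache/objs_per_slab $cache/slab_thrash_ratio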

The netperf TCP_RR benchmark illustrates slab thrashing very well with a
large number of threads.  With a test length of 60 seconds, the following
thread counts were used to show the effect of the allocation and freeing
pattern of such a workload.

Before this patchset:

	threads			Transfer Rate (per sec)
	10			66636.39
	20			96311.02
	40			103948.16
	60			140977.62
	80			166714.37
	100			190431.35
	200			244092.36

To identify the thrashing caches, the same workload was run with
CONFIG_SLUB_STATS enabled.  The following caches are obviously performing
very poorly:

	cache		ALLOC_FASTPATH	ALLOC_SLOWPATH	FREE_FASTPATH	FREE_SLOWPATH
	kmalloc-256	45186169	15930724	88289		61028526
	kmalloc-2048	33507239	27541884	46525		61002601

After this patchset (both caches with slab_thrash_ratios of 20):

	threads			Transfer Rate (per sec)
	10			68857.31
	20			98335.04
	40			124376.77
	60			146014.14
	80			177352.16
	100			195467.61
	200			245555.99

Although a slab may hold fewer objects than others when contiguous
memory cannot be allocated for the cache's order, the ratio is still
based on the configured `order', since slabs that can satisfy the
watermark will still exist on the partial list.

The value is stored in terms of the number of objects that the ratio
represents, not the ratio itself.  This avoids costly arithmetic in the
slowpath for a calculation that only needs to be done when
`slab_thrash_ratio' or `order' is changed.

This also adjusts the configured ratio to one that can be represented
in terms of whole objects: for example, if slab_thrash_ratio is set to
20 for a cache with 64 objects, the effective ratio is actually 3:16
(or 18.75%).  The effective ratio is what is reported when reading the
file, since it is better to show the ratio actually in use than a
value that is not.
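
The round trip through the store and show handlers below reduces to
simple integer arithmetic; a worked sketch of the 64-object example:

	objects=64
	ratio=20
	watermark=$((objects * ratio / 100))	# 64 * 20 / 100 = 12 objects
	echo "min_free_watermark: $watermark"
	# 12 * 100 / 64 = 18 (the true effective ratio is 18.75%)
	echo "reported ratio: $((watermark * 100 / objects))%"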

The slab_thrash_ratio for each cache does not have a non-zero default
(yet?).

Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/slub_def.h |    1 +
 mm/slub.c                |   29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -94,6 +94,7 @@ struct kmem_cache {
 #ifdef CONFIG_SLUB_DEBUG
 	struct kobject kobj;	/* For sysfs */
 #endif
+	u16 min_free_watermark;	/* Calculated from slab thrash ratio */
 
 #ifdef CONFIG_NUMA
 	/*
diff --git a/mm/slub.c b/mm/slub.c
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2190,6 +2190,7 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	unsigned long flags = s->flags;
 	unsigned long size = s->objsize;
 	unsigned long align = s->align;
+	u16 thrash_ratio = 0;
 	int order;
 
 	/*
@@ -2295,10 +2296,13 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
 	/*
 	 * Determine the number of objects per slab
 	 */
+	if (oo_objects(s->oo))
+		thrash_ratio = s->min_free_watermark * 100 / oo_objects(s->oo);
 	s->oo = oo_make(order, size);
 	s->min = oo_make(get_order(size), size);
 	if (oo_objects(s->oo) > oo_objects(s->max))
 		s->max = s->oo;
+	s->min_free_watermark = oo_objects(s->oo) * thrash_ratio / 100;
 
 	return !!oo_objects(s->oo);
 
@@ -2320,6 +2324,7 @@ static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags,
 		goto error;
 
 	s->refcount = 1;
+	s->min_free_watermark = 0;
 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 1000;
 #endif
@@ -4089,6 +4094,29 @@ static ssize_t remote_node_defrag_ratio_store(struct kmem_cache *s,
 SLAB_ATTR(remote_node_defrag_ratio);
 #endif
 
+static ssize_t slab_thrash_ratio_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n",
+		       s->min_free_watermark * 100 / oo_objects(s->oo));
+}
+
+static ssize_t slab_thrash_ratio_store(struct kmem_cache *s, const char *buf,
+				       size_t length)
+{
+	unsigned long ratio;
+	int err;
+
+	err = strict_strtoul(buf, 10, &ratio);
+	if (err)
+		return err;
+
+	if (ratio <= 100)
+		s->min_free_watermark = oo_objects(s->oo) * ratio / 100;
+
+	return length;
+}
+SLAB_ATTR(slab_thrash_ratio);
+
 #ifdef CONFIG_SLUB_STATS
 static int show_stat(struct kmem_cache *s, char *buf, enum stat_item si)
 {
@@ -4172,6 +4200,7 @@ static struct attribute *slab_attrs[] = {
 	&shrink_attr.attr,
 	&alloc_calls_attr.attr,
 	&free_calls_attr.attr,
+	&slab_thrash_ratio_attr.attr,
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
 #endif

Thread overview: 28+ messages
2009-03-30  5:43 [patch 0/3] slub partial list thrashing performance degradation David Rientjes
2009-03-30  5:43 ` [patch 1/3] slub: add per-cache slab thrash ratio David Rientjes
2009-03-30  5:43   ` [patch 2/3] slub: scan partial list for free slabs when thrashing David Rientjes
2009-03-30  5:43     ` [patch 3/3] slub: sort parital list " David Rientjes
2009-03-30 14:41       ` Christoph Lameter
2009-03-30 20:29         ` David Rientjes
2009-03-30 14:37     ` [patch 2/3] slub: scan partial list for free slabs " Christoph Lameter
2009-03-30 20:22       ` David Rientjes
2009-03-30 21:20         ` Christoph Lameter
2009-03-31  7:13       ` Pekka Enberg
2009-03-31  8:23         ` David Rientjes
2009-03-31  8:49           ` Pekka Enberg
2009-03-31 13:23         ` Christoph Lameter
2009-03-30  7:11   ` [patch 1/3] slub: add per-cache slab thrash ratio Pekka Enberg
2009-03-30  8:41     ` David Rientjes
2009-03-30 15:54     ` Mel Gorman
2009-03-30 20:38       ` David Rientjes
2009-03-30 14:30   ` Christoph Lameter
2009-03-30 20:12     ` David Rientjes
2009-03-30 21:19       ` Christoph Lameter
2009-03-30 22:48         ` David Rientjes
2009-03-31  4:44           ` David Rientjes
2009-03-31 13:26             ` Christoph Lameter
2009-03-31 17:21               ` David Rientjes
2009-03-31 17:24                 ` Christoph Lameter
2009-03-31 17:35                   ` David Rientjes
2009-03-30  6:38 ` [patch 0/3] slub partial list thrashing performance degradation Pekka Enberg
