* [PATCH v2 0/6] Reduce cross CPU IPI interference
@ 2011-10-23 15:48 Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

We have lots of infrastructure in place to partition a multi-core system
such that a group of CPUs can be dedicated to a specific task: cgroups,
scheduler and interrupt affinity, and the isolcpus boot parameter. Still,
kernel code will sometimes interrupt all CPUs in the system via IPIs for
various needs. These IPIs are useful and cannot be avoided altogether, but
in certain cases it is possible to interrupt only the specific CPUs that
have useful work to do rather than the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker while testing the nohz task patch set, is a first stab at
exploring this by locating the places where such global IPI calls are made
and turning a global IPI into an IPI for a specific group of CPUs. The
purpose of the patch set is to get feedback on whether this is the right
way to go for dealing with this issue and, indeed, whether the issue is
worth dealing with at all.

The patch set creates an on_each_cpu_mask infrastructure API (derived from
the existing arch-specific versions in tile and ARM) and uses it to turn
two global IPI invocations into per-CPU-group invocations.
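
As a sketch of the transformation applied at each call site (the function,
info pointer, and mask names here are illustrative, not taken from the
patches):

/* Before: interrupts every online CPU */
on_each_cpu(some_func, some_info, 1);

/* After: interrupts only the CPUs set in the caller-supplied mask */
on_each_cpu_mask(cpus_of_interest, some_func, some_info, 1);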

This second version incorporates changes based on reviewer feedback and
additional testing. The major changes from the previous version of the
patch set are:

- Better descriptions for some of the patches, with examples of what I am
  trying to solve.
- Better coding style for on_each_cpu_mask, based on review comments by
  Peter Zijlstra and Sasha Levin.
- Fixed pcp_count handling to take into account which CPU the accounting
  is done for. Sadly, AFAIK this negates using this_cpu_add/sub as
  suggested by Peter Z.
- Removed kmalloc from the flush_all() path, as per a review comment by
  Pekka Enberg.
- Moved cpumask allocations for CONFIG_CPUMASK_OFFSTACK=y to a point prior
  to their first use during boot, as testing revealed we no longer boot
  under CONFIG_CPUMASK_OFFSTACK=y with the original code.

The patch set was compiled for ARM, and boot tested on x86 in UP and SMP,
with and without CONFIG_CPUMASK_OFFSTACK. It was further tested by running
hackbench on x86 in SMP mode in a 4-CPU VM for several hours with no
obvious regressions.

I also artificially exercised SLUB's flush_all via the debug interface and
observed the difference in IPI count across processors with and without the
patch: from an IPI on all processors but one without the patch, to a subset
(and often no IPI at all) with the patch.


Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>

Gilad Ben-Yossef (6):
  smp: Introduce a generic on_each_cpu_mask function
  arm: Move arm over to generic on_each_cpu_mask
  tile: Move tile to use generic on_each_cpu_mask
  mm: Only IPI CPUs to drain local pages if they exist
  slub: Only IPI CPUs that have per cpu obj to flush
  slub: only preallocate cpus_with_slabs if offstack

 arch/arm/kernel/smp_tlb.c   |   20 +++----------
 arch/tile/include/asm/smp.h |    7 -----
 arch/tile/kernel/smp.c      |   19 -------------
 include/linux/slub_def.h    |    9 ++++++
 include/linux/smp.h         |   16 +++++++++++
 kernel/smp.c                |   20 +++++++++++++
 mm/page_alloc.c             |   64 +++++++++++++++++++++++++++++++++++++------
 mm/slub.c                   |   61 +++++++++++++++++++++++++++++++++++++++-
 8 files changed, 164 insertions(+), 52 deletions(-)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function
  2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
@ 2011-10-23 15:48 ` Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 2/6] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

on_each_cpu_mask calls a function on the processors specified by the
cpumask argument, which may include the local processor.

All the limitations specified for smp_call_function_many apply.
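
A minimal usage sketch (the per-cpu counter and both helpers below are
hypothetical, assuming only the API introduced by this patch):

#include <linux/smp.h>
#include <linux/cpumask.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, demo_counter);

/* Runs on each targeted CPU, with interrupts disabled */
static void bump_demo_counter(void *info)
{
	__this_cpu_inc(demo_counter);
}

static void bump_counters_on(const struct cpumask *mask)
{
	/* wait == 1: return only after all targeted CPUs ran the function */
	on_each_cpu_mask(mask, bump_demo_counter, NULL, 1);
}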

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/smp.h |   16 ++++++++++++++++
 kernel/smp.c        |   20 ++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 8cc38d3..60628d7 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
 int on_each_cpu(smp_call_func_t func, void *info, int wait);
 
 /*
+ * Call a function on processors specified by mask, which might include
+ * the local one.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
+		void *info, bool wait);
+
+/*
  * Mark the boot cpu "online" so that it can call console drivers in
  * printk() and can access its per-cpu storage.
  */
@@ -132,6 +139,15 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
 		local_irq_enable();		\
 		0;				\
 	})
+#define on_each_cpu_mask(mask, func, info, wait) \
+	do {						\
+		if (cpumask_test_cpu(0, (mask))) {	\
+			local_irq_disable();		\
+			(func)(info);			\
+			local_irq_enable();		\
+		}					\
+	} while (0)
+
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
 #define smp_prepare_boot_cpu()			do {} while (0)
diff --git a/kernel/smp.c b/kernel/smp.c
index fb67dfa..df37c08 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -701,3 +701,23 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
+
+/*
+ * Call a function on processors specified by cpumask, which may include
+ * the local processor. All the limitations specified for
+ * smp_call_function_many apply.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
+			void *info, bool wait)
+{
+	int cpu = get_cpu();
+
+	smp_call_function_many(mask, func, info, wait);
+	if (cpumask_test_cpu(cpu, mask)) {
+		local_irq_disable();
+		func(info);
+		local_irq_enable();
+	}
+	put_cpu();
+}
+EXPORT_SYMBOL(on_each_cpu_mask);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/6] arm: Move arm over to generic on_each_cpu_mask
  2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
@ 2011-10-23 15:48 ` Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 3/6] tile: Move tile to use " Gilad Ben-Yossef
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

Note that the generic version takes the mask as the first parameter.
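
For clarity, the argument-order change looks like this (distilled from the
hunks below):

/* ARM-private helper (removed): mask comes last */
on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, mm_cpumask(mm));

/* Generic helper (used from now on): mask comes first */
on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);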

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 arch/arm/kernel/smp_tlb.c |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 7dcb352..02c5d2c 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -13,18 +13,6 @@
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
-static void on_each_cpu_mask(void (*func)(void *), void *info, int wait,
-	const struct cpumask *mask)
-{
-	preempt_disable();
-
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(smp_processor_id(), mask))
-		func(info);
-
-	preempt_enable();
-}
-
 /**********************************************************************/
 
 /*
@@ -87,7 +75,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, mm_cpumask(mm));
+		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
 	else
 		local_flush_tlb_mm(mm);
 }
@@ -98,7 +86,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(ipi_flush_tlb_page, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+					&ta, 1);
 	} else
 		local_flush_tlb_page(vma, uaddr);
 }
@@ -121,7 +110,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(ipi_flush_tlb_range, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+					&ta, 1);
 	} else
 		local_flush_tlb_range(vma, start, end);
 }
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/6] tile: Move tile to use generic on_each_cpu_mask
  2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 2/6] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
@ 2011-10-23 15:48 ` Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

The API is the same as the tile-private one, so just remove the private
version of the functions.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 arch/tile/include/asm/smp.h |    7 -------
 arch/tile/kernel/smp.c      |   19 -------------------
 2 files changed, 0 insertions(+), 26 deletions(-)

diff --git a/arch/tile/include/asm/smp.h b/arch/tile/include/asm/smp.h
index 532124a..1aa759a 100644
--- a/arch/tile/include/asm/smp.h
+++ b/arch/tile/include/asm/smp.h
@@ -43,10 +43,6 @@ void evaluate_message(int tag);
 /* Boot a secondary cpu */
 void online_secondary(void);
 
-/* Call a function on a specified set of CPUs (may include this one). */
-extern void on_each_cpu_mask(const struct cpumask *mask,
-			     void (*func)(void *), void *info, bool wait);
-
 /* Topology of the supervisor tile grid, and coordinates of boot processor */
 extern HV_Topology smp_topology;
 
@@ -91,9 +87,6 @@ void print_disabled_cpus(void);
 
 #else /* !CONFIG_SMP */
 
-#define on_each_cpu_mask(mask, func, info, wait)		\
-  do { if (cpumask_test_cpu(0, (mask))) func(info); } while (0)
-
 #define smp_master_cpu		0
 #define smp_height		1
 #define smp_width		1
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index c52224d..a44e103 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -87,25 +87,6 @@ void send_IPI_allbutself(int tag)
 	send_IPI_many(&mask, tag);
 }
 
-
-/*
- * Provide smp_call_function_mask, but also run function locally
- * if specified in the mask.
- */
-void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
-		      void *info, bool wait)
-{
-	int cpu = get_cpu();
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(cpu, mask)) {
-		local_irq_disable();
-		func(info);
-		local_irq_enable();
-	}
-	put_cpu();
-}
-
-
 /*
  * Functions related to starting/stopping cpus.
  */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
                   ` (2 preceding siblings ...)
  2011-10-23 15:48 ` [PATCH v2 3/6] tile: Move tile to use " Gilad Ben-Yossef
@ 2011-10-23 15:48 ` Gilad Ben-Yossef
  2011-10-28  4:10   ` Christoph Lameter
  2011-10-23 15:48 ` [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack Gilad Ben-Yossef
  5 siblings, 1 reply; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

Use a cpumask to track CPUs with per-cpu pages in any zone
and only send an IPI requesting CPUs to drain these pages
to the buddy allocator if they actually have pages.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 mm/page_alloc.c |   64 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e8ecb6..9551b90 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,11 +57,17 @@
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
+#include <linux/percpu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
+#include <asm/local.h>
 #include "internal.h"
 
+/* Which CPUs have per cpu pages  */
+cpumask_var_t cpus_with_pcp;
+static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
+
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -224,6 +230,36 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+/*
+ * The following two functions track page counts both per zone/per CPU
+ * and globally per CPU.
+ *
+ * They must be called with interrupts disabled and either pinned to a
+ * specific CPU, or for offline CPUs, or under stop_machine.
+ */
+
+static inline void inc_pcp_count(int cpu, struct per_cpu_pages *pcp, int count)
+{
+	unsigned long *tot = &per_cpu(total_cpu_pcp_count, cpu);
+
+	if (unlikely(!*tot))
+		cpumask_set_cpu(cpu, cpus_with_pcp);
+
+	*tot += count;
+	pcp->count += count;
+}
+
+static inline void dec_pcp_count(int cpu, struct per_cpu_pages *pcp, int count)
+{
+	unsigned long *tot = &per_cpu(total_cpu_pcp_count, cpu);
+
+	pcp->count -= count;
+	*tot -= count;
+
+	if (unlikely(!*tot))
+		cpumask_clear_cpu(cpu, cpus_with_pcp);
+}
+
 static void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
@@ -1072,7 +1108,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 	else
 		to_drain = pcp->count;
 	free_pcppages_bulk(zone, to_drain, pcp);
-	pcp->count -= to_drain;
+	dec_pcp_count(smp_processor_id(), pcp, to_drain);
 	local_irq_restore(flags);
 }
 #endif
@@ -1099,7 +1135,7 @@ static void drain_pages(unsigned int cpu)
 		pcp = &pset->pcp;
 		if (pcp->count) {
 			free_pcppages_bulk(zone, pcp->count, pcp);
-			pcp->count = 0;
+			dec_pcp_count(cpu, pcp, pcp->count);
 		}
 		local_irq_restore(flags);
 	}
@@ -1118,7 +1154,7 @@ void drain_local_pages(void *arg)
  */
 void drain_all_pages(void)
 {
-	on_each_cpu(drain_local_pages, NULL, 1);
+	on_each_cpu_mask(cpus_with_pcp, drain_local_pages, NULL, 1);
 }
 
 #ifdef CONFIG_HIBERNATION
@@ -1166,7 +1202,7 @@ void free_hot_cold_page(struct page *page, int cold)
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
-	int migratetype;
+	int migratetype, cpu;
 	int wasMlocked = __TestClearPageMlocked(page);
 
 	if (!free_pages_prepare(page, 0))
@@ -1194,15 +1230,16 @@ void free_hot_cold_page(struct page *page, int cold)
 		migratetype = MIGRATE_MOVABLE;
 	}
 
+	cpu = smp_processor_id();
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	if (cold)
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add(&page->lru, &pcp->lists[migratetype]);
-	pcp->count++;
+	inc_pcp_count(cpu, pcp, 1);
 	if (pcp->count >= pcp->high) {
 		free_pcppages_bulk(zone, pcp->batch, pcp);
-		pcp->count -= pcp->batch;
+		dec_pcp_count(cpu, pcp, pcp->batch);
 	}
 
 out:
@@ -1305,9 +1342,10 @@ again:
 		pcp = &this_cpu_ptr(zone->pageset)->pcp;
 		list = &pcp->lists[migratetype];
 		if (list_empty(list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
+			inc_pcp_count(smp_processor_id(), pcp,
+					rmqueue_bulk(zone, 0,
 					pcp->batch, list,
-					migratetype, cold);
+					migratetype, cold));
 			if (unlikely(list_empty(list)))
 				goto failed;
 		}
@@ -1318,7 +1356,7 @@ again:
 			page = list_entry(list->next, struct page, lru);
 
 		list_del(&page->lru);
-		pcp->count--;
+		dec_pcp_count(smp_processor_id(), pcp, 1);
 	} else {
 		if (unlikely(gfp_flags & __GFP_NOFAIL)) {
 			/*
@@ -3553,6 +3591,10 @@ static int zone_batchsize(struct zone *zone)
 #endif
 }
 
+/*
+ * NOTE: If you call this function on a pcp of a populated zone you
+ * need to worry about syncing cpus_with_pcp state as well.
+ */
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
@@ -3673,6 +3715,7 @@ static int __zone_pcp_update(void *data)
 
 		local_irq_save(flags);
 		free_pcppages_bulk(zone, pcp->count, pcp);
+		dec_pcp_count(cpu, pcp, pcp->count);
 		setup_pageset(pset, batch);
 		local_irq_restore(flags);
 	}
@@ -5040,6 +5083,9 @@ static int page_alloc_cpu_notify(struct notifier_block *self,
 void __init page_alloc_init(void)
 {
 	hotcpu_notifier(page_alloc_cpu_notify, 0);
+
+	/* Allocate the cpus_with_pcp var if CONFIG_CPUMASK_OFFSTACK */
+	alloc_bootmem_cpumask_var(&cpus_with_pcp);
 }
 
 /*
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush
  2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
                   ` (3 preceding siblings ...)
  2011-10-23 15:48 ` [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
@ 2011-10-23 15:48 ` Gilad Ben-Yossef
  2011-10-23 15:48 ` [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack Gilad Ben-Yossef
  5 siblings, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

flush_all() is called for each kmem_cache_destroy(). So every cache
being destroyed dynamically ends up sending an IPI to each CPU in the
system, regardless of whether the cache has ever been used there.

For example, if you close the Infiniband ipath driver char device file,
the close file op calls kmem_cache_destroy(). So running some
infiniband config tool on a single CPU dedicated to system tasks
might interrupt the other 127 CPUs I dedicated to some CPU-intensive
task.

I suspect there is a good chance that every line in the output of "git
grep kmem_cache_destroy linux/ | grep '\->'" has a similar scenario.

This patch attempts to rectify this issue by sending an IPI to flush
the per-cpu objects back to the free lists only to CPUs that seem to
have such objects.

The check of which CPUs to IPI is racy, but we don't care, since
asking a CPU without per-cpu objects to flush does no damage, and
as far as I can tell flush_all is itself racy against allocs on
remote CPUs anyway, so if you meant flush_all to be deterministic,
you had to arrange for locking regardless.

Also note that concurrent uses of the cpumask var on different
CPUs are fine, since they end up tracking the same thing. The only
downside to a race is asking a CPU with no per-cpu cache to flush,
which before this patch happened all the time anyway.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/slub_def.h |    3 +++
 mm/slub.c                |   37 +++++++++++++++++++++++++++++++++++--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index f58d641..b130f61 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -102,6 +102,9 @@ struct kmem_cache {
 	 */
 	int remote_node_defrag_ratio;
 #endif
+
+	/* Which CPUs hold local slabs for this cache. */
+	cpumask_var_t cpus_with_slabs;
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/mm/slub.c b/mm/slub.c
index 7c54fe8..f8cbf2d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1948,7 +1948,18 @@ static void flush_cpu_slab(void *d)
 
 static void flush_all(struct kmem_cache *s)
 {
-	on_each_cpu(flush_cpu_slab, s, 1);
+	struct kmem_cache_cpu *c;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		c = per_cpu_ptr(s->cpu_slab, cpu);
+		if (c && c->page)
+			cpumask_set_cpu(cpu, s->cpus_with_slabs);
+		else
+			cpumask_clear_cpu(cpu, s->cpus_with_slabs);
+	}
+
+	on_each_cpu_mask(s->cpus_with_slabs, flush_cpu_slab, s, 1);
 }
 
 /*
@@ -3028,6 +3039,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		if (s->flags & SLAB_DESTROY_BY_RCU)
 			rcu_barrier();
 		sysfs_slab_remove(s);
+		free_cpumask_var(s->cpus_with_slabs);
 	}
 	up_write(&slub_lock);
 }
@@ -3528,6 +3540,7 @@ void __init kmem_cache_init(void)
 	int order;
 	struct kmem_cache *temp_kmem_cache_node;
 	unsigned long kmalloc_size;
+	int ret;
 
 	kmem_size = offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *);
@@ -3635,15 +3648,24 @@ void __init kmem_cache_init(void)
 
 	slab_state = UP;
 
-	/* Provide the correct kmalloc names now that the caches are up */
+	/*
+	 * Provide the correct kmalloc names and the cpus_with_slabs cpumasks
+	 * for CONFIG_CPUMASK_OFFSTACK=y case now that the caches are up.
+	 */
 	if (KMALLOC_MIN_SIZE <= 32) {
 		kmalloc_caches[1]->name = kstrdup(kmalloc_caches[1]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[1]->name);
+		ret = alloc_cpumask_var(&kmalloc_caches[1]->cpus_with_slabs,
+			GFP_NOWAIT);
+		BUG_ON(!ret);
 	}
 
 	if (KMALLOC_MIN_SIZE <= 64) {
 		kmalloc_caches[2]->name = kstrdup(kmalloc_caches[2]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[2]->name);
+		ret = alloc_cpumask_var(&kmalloc_caches[2]->cpus_with_slabs,
+				GFP_NOWAIT);
+		BUG_ON(!ret);
 	}
 
 	for (i = KMALLOC_SHIFT_LOW; i < SLUB_PAGE_SHIFT; i++) {
@@ -3651,6 +3673,9 @@ void __init kmem_cache_init(void)
 
 		BUG_ON(!s);
 		kmalloc_caches[i]->name = s;
+		ret = alloc_cpumask_var(&kmalloc_caches[i]->cpus_with_slabs,
+				GFP_NOWAIT);
+		BUG_ON(!ret);
 	}
 
 #ifdef CONFIG_SMP
@@ -3668,6 +3693,10 @@ void __init kmem_cache_init(void)
 			BUG_ON(!name);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(name,
 				s->objsize, SLAB_CACHE_DMA);
+			ret = alloc_cpumask_var(
+				&kmalloc_dma_caches[i]->cpus_with_slabs,
+				GFP_NOWAIT);
+			BUG_ON(!ret);
 		}
 	}
 #endif
@@ -3778,15 +3807,19 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 
 	s = kmalloc(kmem_size, GFP_KERNEL);
 	if (s) {
+
 		if (kmem_cache_open(s, n,
 				size, align, flags, ctor)) {
+			alloc_cpumask_var(&s->cpus_with_slabs, GFP_KERNEL);
 			list_add(&s->list, &slab_caches);
 			if (sysfs_slab_add(s)) {
 				list_del(&s->list);
+				free_cpumask_var(s->cpus_with_slabs);
 				kfree(n);
 				kfree(s);
 				goto err;
 			}
+
 			up_write(&slub_lock);
 			return s;
 		}
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
                   ` (4 preceding siblings ...)
  2011-10-23 15:48 ` [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
@ 2011-10-23 15:48 ` Gilad Ben-Yossef
  2011-10-24  5:19   ` Andi Kleen
  5 siblings, 1 reply; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

We need a cpumask to track CPUs with per-cpu cache pages
to know which CPUs to whack during flush_all. For
CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on the stack.
For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
on the flush_all path, so we preallocate one per kmem_cache
at cache creation time and use it in flush_all.

The result is that for the common CONFIG_CPUMASK_OFFSTACK=n
case there is no memory overhead for the mask var.

Since systems with CONFIG_CPUMASK_OFFSTACK=y are the ones
most likely to benefit from fewer IPIs by tracking which
CPU actually has a per-cpu cache, we end up paying the
overhead only in cases where we enjoy the upside.
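
For context, the trade-off hinges on how cpumask_var_t is defined in
<linux/cpumask.h>; the following is a paraphrased sketch of the existing
kernel definitions, shown only to make the two cases concrete:

#ifdef CONFIG_CPUMASK_OFFSTACK
/* A pointer that must be set up with alloc_cpumask_var() */
typedef struct cpumask *cpumask_var_t;
#else
/*
 * A one-element array living in the enclosing storage;
 * alloc_cpumask_var() degenerates to a no-op that returns true.
 */
typedef struct cpumask cpumask_var_t[1];
#endif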

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/slub_def.h |    8 ++++++-
 mm/slub.c                |   52 +++++++++++++++++++++++++++++++++------------
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index b130f61..c07f7aa 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -103,8 +103,14 @@ struct kmem_cache {
 	int remote_node_defrag_ratio;
 #endif
 
-	/* Which CPUs hold local slabs for this cache. */
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	/*
+	 * Which CPUs hold local slabs for this cache.
+	 * Only updated on calling flush_all().
+	 * Defined on stack for CONFIG_CPUMASK_OFFSTACK=n.
+	 */
 	cpumask_var_t cpus_with_slabs;
+#endif
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/mm/slub.c b/mm/slub.c
index f8cbf2d..765be95 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1946,20 +1946,48 @@ static void flush_cpu_slab(void *d)
 	__flush_cpu_slab(s, smp_processor_id());
 }
 
+/*
+ * We need a cpumask struct to track which cpus have
+ * per cpu caches. For CONFIG_CPUMASK_OFFSTACK=n we
+ * allocate on stack. For CONFIG_CPUMASK_OFFSTACK=y
+ * we don't want to allocate in the flush_all code path
+ * so we allocate a struct for each cache structure
+ * on kmem cache creation and use it here.
+ */
 static void flush_all(struct kmem_cache *s)
 {
 	struct kmem_cache_cpu *c;
 	int cpu;
+	cpumask_var_t cpus;
 
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	cpus = s->cpus_with_slabs;
+#endif
 	for_each_online_cpu(cpu) {
 		c = per_cpu_ptr(s->cpu_slab, cpu);
 		if (c && c->page)
-			cpumask_set_cpu(cpu, s->cpus_with_slabs);
+			cpumask_set_cpu(cpu, cpus);
 		else
-			cpumask_clear_cpu(cpu, s->cpus_with_slabs);
+			cpumask_clear_cpu(cpu, cpus);
 	}
 
-	on_each_cpu_mask(s->cpus_with_slabs, flush_cpu_slab, s, 1);
+	on_each_cpu_mask(cpus, flush_cpu_slab, s, 1);
+}
+
+static inline int alloc_cpus_mask(struct kmem_cache *s, int flags)
+{
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	return alloc_cpumask_var(&s->cpus_with_slabs, flags);
+#else
+	return 1;
+#endif
+}
+
+static inline void free_cpus_mask(struct kmem_cache *s)
+{
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	free_cpumask_var(s->cpus_with_slabs);
+#endif
 }
 
 /*
@@ -3039,7 +3067,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		if (s->flags & SLAB_DESTROY_BY_RCU)
 			rcu_barrier();
 		sysfs_slab_remove(s);
-		free_cpumask_var(s->cpus_with_slabs);
+		free_cpus_mask(s);
 	}
 	up_write(&slub_lock);
 }
@@ -3655,16 +3683,14 @@ void __init kmem_cache_init(void)
 	if (KMALLOC_MIN_SIZE <= 32) {
 		kmalloc_caches[1]->name = kstrdup(kmalloc_caches[1]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[1]->name);
-		ret = alloc_cpumask_var(&kmalloc_caches[1]->cpus_with_slabs,
-			GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[1], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
 	if (KMALLOC_MIN_SIZE <= 64) {
 		kmalloc_caches[2]->name = kstrdup(kmalloc_caches[2]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[2]->name);
-		ret = alloc_cpumask_var(&kmalloc_caches[2]->cpus_with_slabs,
-				GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[2], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
@@ -3673,8 +3699,7 @@ void __init kmem_cache_init(void)
 
 		BUG_ON(!s);
 		kmalloc_caches[i]->name = s;
-		ret = alloc_cpumask_var(&kmalloc_caches[i]->cpus_with_slabs,
-				GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[i], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
@@ -3693,8 +3718,7 @@ void __init kmem_cache_init(void)
 			BUG_ON(!name);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(name,
 				s->objsize, SLAB_CACHE_DMA);
-			ret = alloc_cpumask_var(
-				&kmalloc_dma_caches[i]->cpus_with_slabs,
+			ret = alloc_cpus_mask(kmalloc_dma_caches[i],
 				GFP_NOWAIT);
 			BUG_ON(!ret);
 		}
@@ -3810,11 +3834,11 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 
 		if (kmem_cache_open(s, n,
 				size, align, flags, ctor)) {
-			alloc_cpumask_var(&s->cpus_with_slabs, GFP_KERNEL);
+			alloc_cpus_mask(s, GFP_KERNEL);
 			list_add(&s->list, &slab_caches);
 			if (sysfs_slab_add(s)) {
 				list_del(&s->list);
-				free_cpumask_var(s->cpus_with_slabs);
+				free_cpus_mask(s);
 				kfree(n);
 				kfree(s);
 				goto err;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-23 15:48 ` [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack Gilad Ben-Yossef
@ 2011-10-24  5:19   ` Andi Kleen
  2011-10-24  6:16     ` Sasha Levin
  2011-10-24  8:02     ` Gilad Ben-Yossef
  0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2011-10-24  5:19 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: lkml, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Sasha Levin

Gilad Ben-Yossef <gilad@benyossef.com> writes:

> We need a cpumask to track cpus with per cpu cache pages
> to know which cpu to whack during flush_all. For
> CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on stack.
> For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
> on the flush_all path, so we preallocate per kmem_cache
> on cache creation and use it in flush_all.

What's the problem with calling kmalloc in flush_all? 
That's a slow path anyways, isn't it?

I believe the IPI functions usually allocate anyways.

So maybe you can do that much simpler.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-24  5:19   ` Andi Kleen
@ 2011-10-24  6:16     ` Sasha Levin
  2011-10-24  8:02     ` Gilad Ben-Yossef
  1 sibling, 0 replies; 17+ messages in thread
From: Sasha Levin @ 2011-10-24  6:16 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Gilad Ben-Yossef, lkml, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall

On Sun, 2011-10-23 at 22:19 -0700, Andi Kleen wrote:
> Gilad Ben-Yossef <gilad@benyossef.com> writes:
> 
> > We need a cpumask to track cpus with per cpu cache pages
> > to know which cpu to whack during flush_all. For
> > CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on stack.
> > For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
> > on the flush_all path, so we preallocate per kmem_cache
> > on cache creation and use it in flush_all.
> 
> What's the problem with calling kmalloc in flush_all? 
> That's a slow path anyways, isn't it?
> 
> I believe the IPI functions usually allocate anyways.
> 
> So maybe you can do that much simpler.

You'd be trying to allocate memory in a memory shrinking path.

-- 

Sasha.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-24  5:19   ` Andi Kleen
  2011-10-24  6:16     ` Sasha Levin
@ 2011-10-24  8:02     ` Gilad Ben-Yossef
  2011-10-24  8:05       ` Fwd: " Gilad Ben-Yossef
  1 sibling, 1 reply; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-24  8:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: lkml, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Sasha Levin

On Mon, Oct 24, 2011 at 7:19 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Gilad Ben-Yossef <gilad@benyossef.com> writes:
>
> > We need a cpumask to track cpus with per cpu cache pages
> > to know which cpu to whack during flush_all. For
> > CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on stack.
> > For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
> > on the flush_all path, so we preallocate per kmem_cache
> > on cache creation and use it in flush_all.
>
> What's the problem with calling kmalloc in flush_all?
> That's a slow path anyways, isn't it?
>
> I believe the IPI functions usually allocate anyways.
>
> So maybe you can do that much simpler.

That was what the first version of the patch did (use
alloc_cpumask_var in flush_all).

Pekka Enberg pointed out that calling kmalloc on the kmem_cache
shrinking code path is not a good idea, and it does sound like a
deadlock waiting to happen.

Gilad

--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Fwd: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-24  8:02     ` Gilad Ben-Yossef
@ 2011-10-24  8:05       ` Gilad Ben-Yossef
  0 siblings, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-24  8:05 UTC (permalink / raw)
  To: linux-kernel

[Re-sending to the list due to a foul-up with the LKML address on my part]


On Mon, Oct 24, 2011 at 7:19 AM, Andi Kleen <andi@firstfloor.org> wrote:
>
> Gilad Ben-Yossef <gilad@benyossef.com> writes:
>
> > We need a cpumask to track cpus with per cpu cache pages
> > to know which cpu to whack during flush_all. For
> > CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on stack.
> > For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
> > on the flush_all path, so we preallocate per kmem_cache
> > on cache creation and use it in flush_all.
>
> What's the problem with calling kmalloc in flush_all?
> That's a slow path anyways, isn't it?
>
> I believe the IPI functions usually allocate anyways.
>
> So maybe you can do that much simpler.

That was what the first version of the patch did (use
alloc_cpumask_var in flush_all).

Pekka Enberg pointed out that calling kmalloc on the kmem_cache
shrinking code path is not a good idea, and it does sound like a
deadlock waiting to happen.

Gilad

--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "



-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-23 15:48 ` [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
@ 2011-10-28  4:10   ` Christoph Lameter
  2011-10-28  8:42     ` Gilad Ben-Yossef
  2011-10-28  8:43       ` Gilad Ben-Yossef
  0 siblings, 2 replies; 17+ messages in thread
From: Christoph Lameter @ 2011-10-28  4:10 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: lkml, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:

> +/* Which CPUs have per cpu pages  */
> +cpumask_var_t cpus_with_pcp;
> +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);

This increases the cache footprint of a hot vm path. Is it possible to do
the same as what you did for slub? Run a loop over all zones when
draining to check for remaining pcp pages and build the set of cpus
needing IPIs temporarily while draining?
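
For illustration, a rough sketch of the suggested shape: build the mask
only when draining, rather than maintaining it in the allocator hot path.
The loop below is hypothetical, not code from this series; it reuses the
cpus_with_pcp mask and drain_local_pages() from the patch above:

void drain_all_pages(void)
{
	int cpu;
	struct zone *zone;

	for_each_online_cpu(cpu) {
		bool has_pcps = false;

		for_each_populated_zone(zone) {
			struct per_cpu_pageset *pset =
				per_cpu_ptr(zone->pageset, cpu);

			if (pset->pcp.count) {
				has_pcps = true;
				break;
			}
		}
		if (has_pcps)
			cpumask_set_cpu(cpu, cpus_with_pcp);
		else
			cpumask_clear_cpu(cpu, cpus_with_pcp);
	}
	on_each_cpu_mask(cpus_with_pcp, drain_local_pages, NULL, 1);
}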


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-28  4:10   ` Christoph Lameter
@ 2011-10-28  8:42     ` Gilad Ben-Yossef
  2011-10-28  8:43       ` Gilad Ben-Yossef
  1 sibling, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-28  8:42 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Fri, Oct 28, 2011 at 6:10 AM, Christoph Lameter <cl@gentwo.org> wrote:

> On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:
>
> > +/* Which CPUs have per cpu pages  */
> > +cpumask_var_t cpus_with_pcp;
> > +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
>
> This increases the cache footprint of a hot vm path. Is it possible to do
> the same as what you did for slub? Run a loop over all zones when
> draining to check for remaining pcp pages and build the set of cpus
> needing IPIs temporarily while draining?
>
>
Sounds like a good idea. I will give it a shot.

Thanks,
Gilad



-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in registers. All
those moments will be lost in time... like tears in rain... Time to die. "

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-28  4:10   ` Christoph Lameter
@ 2011-10-28  8:43       ` Gilad Ben-Yossef
  2011-10-28  8:43       ` Gilad Ben-Yossef
  1 sibling, 0 replies; 17+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-28  8:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Fri, Oct 28, 2011 at 6:10 AM, Christoph Lameter <cl@gentwo.org> wrote:
> On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:
>
>> +/* Which CPUs have per cpu pages  */
>> +cpumask_var_t cpus_with_pcp;
>> +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
>
> This increases the cache footprint of a hot vm path. Is it possible to do
> the same as what you did for slub? Run a loop over all zones when
> draining to check for remaining pcp pages and build the set of cpus
> needing IPIs temporarily while draining?
>

Sounds like a good idea. I will give it a shot.

Thanks,
Gilad




-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread

Thread overview: 17+ messages
2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
2011-10-23 15:48 ` [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
2011-10-23 15:48 ` [PATCH v2 2/6] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
2011-10-23 15:48 ` [PATCH v2 3/6] tile: Move tile to use " Gilad Ben-Yossef
2011-10-23 15:48 ` [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2011-10-28  4:10   ` Christoph Lameter
2011-10-28  8:42     ` Gilad Ben-Yossef
2011-10-28  8:43     ` Gilad Ben-Yossef
2011-10-28  8:43       ` Gilad Ben-Yossef
2011-10-23 15:48 ` [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
2011-10-23 15:48 ` [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack Gilad Ben-Yossef
2011-10-24  5:19   ` Andi Kleen
2011-10-24  6:16     ` Sasha Levin
2011-10-24  8:02     ` Gilad Ben-Yossef
2011-10-24  8:05       ` Fwd: " Gilad Ben-Yossef
2011-10-23 15:56 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
2011-10-23 15:56 ` Gilad Ben-Yossef
