* [PATCH v2 0/6] Reduce cross CPU IPI interference
@ 2011-10-23 15:56 ` Gilad Ben-Yossef
  0 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

We have lots of infrastructure in place to partition a multi-core system such
that a group of CPUs is dedicated to a specific task: cgroups, scheduler and
interrupt affinity, and the isolcpus boot parameter. Still, kernel code will
sometimes interrupt all CPUs in the system via IPIs for various needs. These
IPIs are useful and cannot be avoided altogether, but in certain cases it is
possible to interrupt only the specific CPUs that have useful work to do
rather than the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker while testing the nohz task patch set, is a first stab at exploring
this: it locates the places where such global IPI calls are made and turns a
global IPI into an IPI for a specific group of CPUs. The purpose of the patch
set is to get feedback on whether this is the right way to go for dealing with
this issue and, indeed, whether the issue is worth dealing with at all.

The patch set creates an on_each_cpu_mask infrastructure API (derived from the
existing arch-specific versions in Tile and ARM) and uses it to turn two global
IPI invocations into per-CPU-group invocations.
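
For illustration, here is a minimal sketch of how a caller might use the new
API; the callback and mask variable are made up for the example and are not
part of the patches:

	#include <linux/cpumask.h>
	#include <linux/smp.h>

	static void do_per_cpu_work(void *info)
	{
		/* Runs on every CPU in the mask, in interrupt context. */
	}

	static void kick_tracked_cpus(const struct cpumask *tracked_cpus)
	{
		/* wait == 1: return only after the callback ran on all CPUs. */
		on_each_cpu_mask(tracked_cpus, do_per_cpu_work, NULL, 1);
	}

Unlike on_each_cpu(), only the CPUs set in tracked_cpus take the IPI (the
local CPU, if it is in the mask, runs the callback directly with interrupts
disabled).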

This second version incorporates changes based on reviewers' feedback and
additional testing. The major changes from the previous version of the patch
set are:

- Better descriptions for some of the patches, with examples of what I am
  trying to solve.
- Better coding style for on_each_cpu_mask, based on review comments by Peter
  Zijlstra and Sasha Levin.
- Fixed pcp_count handling to take into account which CPU the accounting
  is done for. Sadly, AFAIK this negates using this_cpu_add/sub as
  suggested by Peter Z.
- Removed kmalloc from the flush_all() path as per a review comment by
  Pekka Enberg.
- Moved cpumask allocations for CONFIG_CPUMASK_OFFSTACK=y to a point prior to
  their first use during boot, as testing revealed we no longer boot under
  CONFIG_CPUMASK_OFFSTACK=y with the original code.

The patch set was compiled for ARM and boot tested on x86 in UP and SMP
configurations, with and without CONFIG_CPUMASK_OFFSTACK, and was further
tested by running hackbench on x86 in SMP mode in a 4-CPU VM for several hours
with no obvious regressions.

I also artificially exercised SLUB's flush_all via the debug interface and
observed the difference in IPI count across processors with and without the
patch: from an IPI on all processors but one without the patch, to a subset
(and often no IPI at all) with the patch.


Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>

Gilad Ben-Yossef (6):
  smp: Introduce a generic on_each_cpu_mask function
  arm: Move arm over to generic on_each_cpu_mask
  tile: Move tile to use generic on_each_cpu_mask
  mm: Only IPI CPUs to drain local pages if they exist
  slub: Only IPI CPUs that have per cpu obj to flush
  slub: only preallocate cpus_with_slabs if offstack

 arch/arm/kernel/smp_tlb.c   |   20 +++----------
 arch/tile/include/asm/smp.h |    7 -----
 arch/tile/kernel/smp.c      |   19 -------------
 include/linux/slub_def.h    |    9 ++++++
 include/linux/smp.h         |   16 +++++++++++
 kernel/smp.c                |   20 +++++++++++++
 mm/page_alloc.c             |   64 +++++++++++++++++++++++++++++++++++++------
 mm/slub.c                   |   61 +++++++++++++++++++++++++++++++++++++++-
 8 files changed, 164 insertions(+), 52 deletions(-)



* [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function
  2011-10-23 15:56 ` Gilad Ben-Yossef
@ 2011-10-23 15:56   ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

on_each_cpu_mask calls a function on the processors specified by a cpumask,
which may include the local processor.

All the limitations specified for smp_call_function_many apply.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/smp.h |   16 ++++++++++++++++
 kernel/smp.c        |   20 ++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 8cc38d3..60628d7 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
 int on_each_cpu(smp_call_func_t func, void *info, int wait);
 
 /*
+ * Call a function on processors specified by mask, which might include
+ * the local one.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
+		void *info, bool wait);
+
+/*
  * Mark the boot cpu "online" so that it can call console drivers in
  * printk() and can access its per-cpu storage.
  */
@@ -132,6 +139,15 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
 		local_irq_enable();		\
 		0;				\
 	})
+#define on_each_cpu_mask(mask, func, info, wait) \
+	do {						\
+		if (cpumask_test_cpu(0, (mask))) {	\
+			local_irq_disable();		\
+			(func)(info);			\
+			local_irq_enable();		\
+		}					\
+	} while (0)
+
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
 #define smp_prepare_boot_cpu()			do {} while (0)
diff --git a/kernel/smp.c b/kernel/smp.c
index fb67dfa..df37c08 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -701,3 +701,23 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
+
+/*
+ * Call a function on processors specified by cpumask, which may include
+ * the local processor. All the limitations specified in smp_call_function_many
+ * apply.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
+			void *info, bool wait)
+{
+	int cpu = get_cpu();
+
+	smp_call_function_many(mask, func, info, wait);
+	if (cpumask_test_cpu(cpu, mask)) {
+		local_irq_disable();
+		func(info);
+		local_irq_enable();
+	}
+	put_cpu();
+}
+EXPORT_SYMBOL(on_each_cpu_mask);
-- 
1.7.0.4



* [PATCH v2 2/6] arm: Move arm over to generic on_each_cpu_mask
  2011-10-23 15:56 ` Gilad Ben-Yossef
@ 2011-10-23 15:56   ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

Note that the generic version takes the mask as its first parameter.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 arch/arm/kernel/smp_tlb.c |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 7dcb352..02c5d2c 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -13,18 +13,6 @@
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
-static void on_each_cpu_mask(void (*func)(void *), void *info, int wait,
-	const struct cpumask *mask)
-{
-	preempt_disable();
-
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(smp_processor_id(), mask))
-		func(info);
-
-	preempt_enable();
-}
-
 /**********************************************************************/
 
 /*
@@ -87,7 +75,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, mm_cpumask(mm));
+		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
 	else
 		local_flush_tlb_mm(mm);
 }
@@ -98,7 +86,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(ipi_flush_tlb_page, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+					&ta, 1);
 	} else
 		local_flush_tlb_page(vma, uaddr);
 }
@@ -121,7 +110,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(ipi_flush_tlb_range, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+					&ta, 1);
 	} else
 		local_flush_tlb_range(vma, start, end);
 }
-- 
1.7.0.4



* [PATCH v2 3/6] tile: Move tile to use generic on_each_cpu_mask
  2011-10-23 15:56 ` Gilad Ben-Yossef
@ 2011-10-23 15:56   ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

The API is the same as the tile-private one, so just remove
the private version of the functions.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 arch/tile/include/asm/smp.h |    7 -------
 arch/tile/kernel/smp.c      |   19 -------------------
 2 files changed, 0 insertions(+), 26 deletions(-)

diff --git a/arch/tile/include/asm/smp.h b/arch/tile/include/asm/smp.h
index 532124a..1aa759a 100644
--- a/arch/tile/include/asm/smp.h
+++ b/arch/tile/include/asm/smp.h
@@ -43,10 +43,6 @@ void evaluate_message(int tag);
 /* Boot a secondary cpu */
 void online_secondary(void);
 
-/* Call a function on a specified set of CPUs (may include this one). */
-extern void on_each_cpu_mask(const struct cpumask *mask,
-			     void (*func)(void *), void *info, bool wait);
-
 /* Topology of the supervisor tile grid, and coordinates of boot processor */
 extern HV_Topology smp_topology;
 
@@ -91,9 +87,6 @@ void print_disabled_cpus(void);
 
 #else /* !CONFIG_SMP */
 
-#define on_each_cpu_mask(mask, func, info, wait)		\
-  do { if (cpumask_test_cpu(0, (mask))) func(info); } while (0)
-
 #define smp_master_cpu		0
 #define smp_height		1
 #define smp_width		1
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index c52224d..a44e103 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -87,25 +87,6 @@ void send_IPI_allbutself(int tag)
 	send_IPI_many(&mask, tag);
 }
 
-
-/*
- * Provide smp_call_function_mask, but also run function locally
- * if specified in the mask.
- */
-void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
-		      void *info, bool wait)
-{
-	int cpu = get_cpu();
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(cpu, mask)) {
-		local_irq_disable();
-		func(info);
-		local_irq_enable();
-	}
-	put_cpu();
-}
-
-
 /*
  * Functions related to starting/stopping cpus.
  */
-- 
1.7.0.4



* [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-23 15:56 ` Gilad Ben-Yossef
@ 2011-10-23 15:56   ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

Use a cpumask to track CPUs with per-cpu pages in any zone
and only send an IPI requesting CPUs to drain these pages
to the buddy allocator if they actually have pages.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 mm/page_alloc.c |   64 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e8ecb6..9551b90 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,11 +57,17 @@
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
+#include <linux/percpu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
+#include <asm/local.h>
 #include "internal.h"
 
+/* Which CPUs have per cpu pages  */
+cpumask_var_t cpus_with_pcp;
+static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
+
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
@@ -224,6 +230,36 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
+/*
+ * The following two functions track page counts both per zone/per CPU
+ * and globally per CPU.
+ *
+ * They must be called with interrupts disabled and either pinned to a specific
+ * CPU, or for offline CPUs, or under stop_machine.
+ */
+
+static inline void inc_pcp_count(int cpu, struct per_cpu_pages *pcp, int count)
+{
+	unsigned long *tot = &per_cpu(total_cpu_pcp_count, cpu);
+
+	if (unlikely(!*tot))
+		cpumask_set_cpu(cpu, cpus_with_pcp);
+
+	*tot += count;
+	pcp->count += count;
+}
+
+static inline void dec_pcp_count(int cpu, struct per_cpu_pages *pcp, int count)
+{
+	unsigned long *tot = &per_cpu(total_cpu_pcp_count, cpu);
+
+	pcp->count -= count;
+	*tot -= count;
+
+	if (unlikely(!*tot))
+		cpumask_clear_cpu(cpu, cpus_with_pcp);
+}
+
 static void set_pageblock_migratetype(struct page *page, int migratetype)
 {
 
@@ -1072,7 +1108,7 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 	else
 		to_drain = pcp->count;
 	free_pcppages_bulk(zone, to_drain, pcp);
-	pcp->count -= to_drain;
+	dec_pcp_count(smp_processor_id(), pcp, to_drain);
 	local_irq_restore(flags);
 }
 #endif
@@ -1099,7 +1135,7 @@ static void drain_pages(unsigned int cpu)
 		pcp = &pset->pcp;
 		if (pcp->count) {
 			free_pcppages_bulk(zone, pcp->count, pcp);
-			pcp->count = 0;
+			dec_pcp_count(cpu, pcp, pcp->count);
 		}
 		local_irq_restore(flags);
 	}
@@ -1118,7 +1154,7 @@ void drain_local_pages(void *arg)
  */
 void drain_all_pages(void)
 {
-	on_each_cpu(drain_local_pages, NULL, 1);
+	on_each_cpu_mask(cpus_with_pcp, drain_local_pages, NULL, 1);
 }
 
 #ifdef CONFIG_HIBERNATION
@@ -1166,7 +1202,7 @@ void free_hot_cold_page(struct page *page, int cold)
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
 	unsigned long flags;
-	int migratetype;
+	int migratetype, cpu;
 	int wasMlocked = __TestClearPageMlocked(page);
 
 	if (!free_pages_prepare(page, 0))
@@ -1194,15 +1230,16 @@ void free_hot_cold_page(struct page *page, int cold)
 		migratetype = MIGRATE_MOVABLE;
 	}
 
+	cpu = smp_processor_id();
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	if (cold)
 		list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	else
 		list_add(&page->lru, &pcp->lists[migratetype]);
-	pcp->count++;
+	inc_pcp_count(cpu, pcp, 1);
 	if (pcp->count >= pcp->high) {
 		free_pcppages_bulk(zone, pcp->batch, pcp);
-		pcp->count -= pcp->batch;
+		dec_pcp_count(cpu, pcp, pcp->batch);
 	}
 
 out:
@@ -1305,9 +1342,10 @@ again:
 		pcp = &this_cpu_ptr(zone->pageset)->pcp;
 		list = &pcp->lists[migratetype];
 		if (list_empty(list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
+			inc_pcp_count(smp_processor_id(), pcp,
+					rmqueue_bulk(zone, 0,
 					pcp->batch, list,
-					migratetype, cold);
+					migratetype, cold));
 			if (unlikely(list_empty(list)))
 				goto failed;
 		}
@@ -1318,7 +1356,7 @@ again:
 			page = list_entry(list->next, struct page, lru);
 
 		list_del(&page->lru);
-		pcp->count--;
+		dec_pcp_count(smp_processor_id(), pcp, 1);
 	} else {
 		if (unlikely(gfp_flags & __GFP_NOFAIL)) {
 			/*
@@ -3553,6 +3591,10 @@ static int zone_batchsize(struct zone *zone)
 #endif
 }
 
+/*
+ * NOTE: If you call this function on a pcp of a populated zone you
+ * need to worry about syncing cpus_with_pcp state as well.
+ */
 static void setup_pageset(struct per_cpu_pageset *p, unsigned long batch)
 {
 	struct per_cpu_pages *pcp;
@@ -3673,6 +3715,7 @@ static int __zone_pcp_update(void *data)
 
 		local_irq_save(flags);
 		free_pcppages_bulk(zone, pcp->count, pcp);
+		dec_pcp_count(cpu, pcp, pcp->count);
 		setup_pageset(pset, batch);
 		local_irq_restore(flags);
 	}
@@ -5040,6 +5083,9 @@ static int page_alloc_cpu_notify(struct notifier_block *self,
 void __init page_alloc_init(void)
 {
 	hotcpu_notifier(page_alloc_cpu_notify, 0);
+
+	/* Allocate the cpus_with_pcp var if CONFIG_CPUMASK_OFFSTACK */
+	alloc_bootmem_cpumask_var(&cpus_with_pcp);
 }
 
 /*
-- 
1.7.0.4



* [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush
  2011-10-23 15:56 ` Gilad Ben-Yossef
@ 2011-10-23 15:56   ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

flush_all() is called for each kmem_cache_destroy(). So every cache
being destroyed dynamically ends up sending an IPI to each CPU in the
system, regardless of whether the cache has ever been used there.

For example, if you close the InfiniBand ipath driver char device file,
the close file op calls kmem_cache_destroy(). So running some InfiniBand
config tool on a single CPU dedicated to system tasks might interrupt
the other 127 CPUs dedicated to some CPU-intensive task.

I suspect there is a good chance that every line in the output of "git
grep kmem_cache_destroy linux/ | grep '\->'" has a similar scenario.

This patch attempts to rectify this by sending the IPI that flushes
the per-cpu objects back to the free lists only to CPUs that seem to
have such objects.

The check of which CPUs to IPI is racy, but we don't care, since
asking a CPU without per-cpu objects to flush does no damage, and as
far as I can tell flush_all by itself is racy against allocs on
remote CPUs anyway; so if you meant flush_all to be deterministic,
you had to arrange for locking regardless.

Also note that concurrent use of the cpumask var on different CPUs is
fine, since they end up tracking the same thing. The only downside of
a race is asking a CPU with no per-cpu cache to flush, which before
this patch happened all the time anyway.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/slub_def.h |    3 +++
 mm/slub.c                |   37 +++++++++++++++++++++++++++++++++++--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index f58d641..b130f61 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -102,6 +102,9 @@ struct kmem_cache {
 	 */
 	int remote_node_defrag_ratio;
 #endif
+
+	/* Which CPUs hold local slabs for this cache. */
+	cpumask_var_t cpus_with_slabs;
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/mm/slub.c b/mm/slub.c
index 7c54fe8..f8cbf2d 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1948,7 +1948,18 @@ static void flush_cpu_slab(void *d)
 
 static void flush_all(struct kmem_cache *s)
 {
-	on_each_cpu(flush_cpu_slab, s, 1);
+	struct kmem_cache_cpu *c;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		c = per_cpu_ptr(s->cpu_slab, cpu);
+		if (c && c->page)
+			cpumask_set_cpu(cpu, s->cpus_with_slabs);
+		else
+			cpumask_clear_cpu(cpu, s->cpus_with_slabs);
+	}
+
+	on_each_cpu_mask(s->cpus_with_slabs, flush_cpu_slab, s, 1);
 }
 
 /*
@@ -3028,6 +3039,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		if (s->flags & SLAB_DESTROY_BY_RCU)
 			rcu_barrier();
 		sysfs_slab_remove(s);
+		free_cpumask_var(s->cpus_with_slabs);
 	}
 	up_write(&slub_lock);
 }
@@ -3528,6 +3540,7 @@ void __init kmem_cache_init(void)
 	int order;
 	struct kmem_cache *temp_kmem_cache_node;
 	unsigned long kmalloc_size;
+	int ret;
 
 	kmem_size = offsetof(struct kmem_cache, node) +
 				nr_node_ids * sizeof(struct kmem_cache_node *);
@@ -3635,15 +3648,24 @@ void __init kmem_cache_init(void)
 
 	slab_state = UP;
 
-	/* Provide the correct kmalloc names now that the caches are up */
+	/*
+	 * Provide the correct kmalloc names and the cpus_with_slabs cpumasks
+	 * for CONFIG_CPUMASK_OFFSTACK=y case now that the caches are up.
+	 */
 	if (KMALLOC_MIN_SIZE <= 32) {
 		kmalloc_caches[1]->name = kstrdup(kmalloc_caches[1]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[1]->name);
+		ret = alloc_cpumask_var(&kmalloc_caches[1]->cpus_with_slabs,
+			GFP_NOWAIT);
+		BUG_ON(!ret);
 	}
 
 	if (KMALLOC_MIN_SIZE <= 64) {
 		kmalloc_caches[2]->name = kstrdup(kmalloc_caches[2]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[2]->name);
+		ret = alloc_cpumask_var(&kmalloc_caches[2]->cpus_with_slabs,
+				GFP_NOWAIT);
+		BUG_ON(!ret);
 	}
 
 	for (i = KMALLOC_SHIFT_LOW; i < SLUB_PAGE_SHIFT; i++) {
@@ -3651,6 +3673,9 @@ void __init kmem_cache_init(void)
 
 		BUG_ON(!s);
 		kmalloc_caches[i]->name = s;
+		ret = alloc_cpumask_var(&kmalloc_caches[i]->cpus_with_slabs,
+				GFP_NOWAIT);
+		BUG_ON(!ret);
 	}
 
 #ifdef CONFIG_SMP
@@ -3668,6 +3693,10 @@ void __init kmem_cache_init(void)
 			BUG_ON(!name);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(name,
 				s->objsize, SLAB_CACHE_DMA);
+			ret = alloc_cpumask_var(
+				&kmalloc_dma_caches[i]->cpus_with_slabs,
+				GFP_NOWAIT);
+			BUG_ON(!ret);
 		}
 	}
 #endif
@@ -3778,15 +3807,19 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 
 	s = kmalloc(kmem_size, GFP_KERNEL);
 	if (s) {
+
 		if (kmem_cache_open(s, n,
 				size, align, flags, ctor)) {
+			alloc_cpumask_var(&s->cpus_with_slabs, GFP_KERNEL);
 			list_add(&s->list, &slab_caches);
 			if (sysfs_slab_add(s)) {
 				list_del(&s->list);
+				free_cpumask_var(s->cpus_with_slabs);
 				kfree(n);
 				kfree(s);
 				goto err;
 			}
+
 			up_write(&slub_lock);
 			return s;
 		}
-- 
1.7.0.4



* [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-23 15:56 ` Gilad Ben-Yossef
@ 2011-10-23 15:56   ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

We need a cpumask to track CPUs with per-cpu cache pages in order to
know which CPUs to whack during flush_all. For
CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on the stack. For
CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc on the
flush_all path, so we preallocate the mask per kmem_cache on cache
creation and use it in flush_all.

The result is that for the common CONFIG_CPUMASK_OFFSTACK=n case there
is no memory overhead for the mask var.

Since systems with CONFIG_CPUMASK_OFFSTACK=y are the systems most
likely to benefit from fewer IPIs by tracking which CPUs actually have
a per-cpu cache, we end up paying the overhead only in cases where we
enjoy the upside.
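
As background for why this works, here is a simplified sketch of the existing
cpumask definitions (not code added by this patch):

	#ifdef CONFIG_CPUMASK_OFFSTACK
	typedef struct cpumask *cpumask_var_t;     /* needs alloc_cpumask_var() */
	#else
	typedef struct cpumask cpumask_var_t[1];   /* storage comes with the variable */
	#endif

So for CONFIG_CPUMASK_OFFSTACK=n the local cpumask_var_t in flush_all already
carries the storage, while for =y it is just a pointer that flush_all aims at
the mask preallocated in struct kmem_cache at cache creation time.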

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/slub_def.h |    8 ++++++-
 mm/slub.c                |   52 +++++++++++++++++++++++++++++++++------------
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index b130f61..c07f7aa 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -103,8 +103,14 @@ struct kmem_cache {
 	int remote_node_defrag_ratio;
 #endif
 
-	/* Which CPUs hold local slabs for this cache. */
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	/*
+	 * Which CPUs hold local slabs for this cache.
+	 * Only updated on calling flush_all().
+	 * Defined on stack for CONFIG_CPUMASK_OFFSTACK=n.
+	 */
 	cpumask_var_t cpus_with_slabs;
+#endif
 	struct kmem_cache_node *node[MAX_NUMNODES];
 };
 
diff --git a/mm/slub.c b/mm/slub.c
index f8cbf2d..765be95 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1946,20 +1946,48 @@ static void flush_cpu_slab(void *d)
 	__flush_cpu_slab(s, smp_processor_id());
 }
 
+/*
+ * We need a cpumask struct to track which cpus have
+ * per cpu caches. For CONFIG_CPUMASK_OFFSTACK=n we
+ * allocate on stack. For CONFIG_CPUMASK_OFFSTACK=y
+ * we don't want to allocate in the flush_all code path
+ * so we allocate a struct for each cache structure
+ * on kmem cache creation and use it here.
+ */
 static void flush_all(struct kmem_cache *s)
 {
 	struct kmem_cache_cpu *c;
 	int cpu;
+	cpumask_var_t cpus;
 
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	cpus = s->cpus_with_slabs;
+#endif
 	for_each_online_cpu(cpu) {
 		c = per_cpu_ptr(s->cpu_slab, cpu);
 		if (c && c->page)
-			cpumask_set_cpu(cpu, s->cpus_with_slabs);
+			cpumask_set_cpu(cpu, cpus);
 		else
-			cpumask_clear_cpu(cpu, s->cpus_with_slabs);
+			cpumask_clear_cpu(cpu, cpus);
 	}
 
-	on_each_cpu_mask(s->cpus_with_slabs, flush_cpu_slab, s, 1);
+	on_each_cpu_mask(cpus, flush_cpu_slab, s, 1);
+}
+
+static inline int alloc_cpus_mask(struct kmem_cache *s, int flags)
+{
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	return alloc_cpumask_var(&s->cpus_with_slabs, flags);
+#else
+	return 1;
+#endif
+}
+
+static inline void free_cpus_mask(struct kmem_cache *s)
+{
+#ifdef CONFIG_CPUMASK_OFFSTACK
+	free_cpumask_var(s->cpus_with_slabs);
+#endif
 }
 
 /*
@@ -3039,7 +3067,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		if (s->flags & SLAB_DESTROY_BY_RCU)
 			rcu_barrier();
 		sysfs_slab_remove(s);
-		free_cpumask_var(s->cpus_with_slabs);
+		free_cpus_mask(s);
 	}
 	up_write(&slub_lock);
 }
@@ -3655,16 +3683,14 @@ void __init kmem_cache_init(void)
 	if (KMALLOC_MIN_SIZE <= 32) {
 		kmalloc_caches[1]->name = kstrdup(kmalloc_caches[1]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[1]->name);
-		ret = alloc_cpumask_var(&kmalloc_caches[1]->cpus_with_slabs,
-			GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[1], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
 	if (KMALLOC_MIN_SIZE <= 64) {
 		kmalloc_caches[2]->name = kstrdup(kmalloc_caches[2]->name, GFP_NOWAIT);
 		BUG_ON(!kmalloc_caches[2]->name);
-		ret = alloc_cpumask_var(&kmalloc_caches[2]->cpus_with_slabs,
-				GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[2], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
@@ -3673,8 +3699,7 @@ void __init kmem_cache_init(void)
 
 		BUG_ON(!s);
 		kmalloc_caches[i]->name = s;
-		ret = alloc_cpumask_var(&kmalloc_caches[i]->cpus_with_slabs,
-				GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[i], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
@@ -3693,8 +3718,7 @@ void __init kmem_cache_init(void)
 			BUG_ON(!name);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(name,
 				s->objsize, SLAB_CACHE_DMA);
-			ret = alloc_cpumask_var(
-				&kmalloc_dma_caches[i]->cpus_with_slabs,
+			ret = alloc_cpus_mask(kmalloc_dma_caches[i],
 				GFP_NOWAIT);
 			BUG_ON(!ret);
 		}
@@ -3810,11 +3834,11 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 
 		if (kmem_cache_open(s, n,
 				size, align, flags, ctor)) {
-			alloc_cpumask_var(&s->cpus_with_slabs, GFP_KERNEL);
+			alloc_cpus_mask(s, GFP_KERNEL);
 			list_add(&s->list, &slab_caches);
 			if (sysfs_slab_add(s)) {
 				list_del(&s->list);
-				free_cpumask_var(s->cpus_with_slabs);
+				free_cpus_mask(s);
 				kfree(n);
 				kfree(s);
 				goto err;
-- 
1.7.0.4


-				GFP_NOWAIT);
+		ret = alloc_cpus_mask(kmalloc_caches[i], GFP_NOWAIT);
 		BUG_ON(!ret);
 	}
 
@@ -3693,8 +3718,7 @@ void __init kmem_cache_init(void)
 			BUG_ON(!name);
 			kmalloc_dma_caches[i] = create_kmalloc_cache(name,
 				s->objsize, SLAB_CACHE_DMA);
-			ret = alloc_cpumask_var(
-				&kmalloc_dma_caches[i]->cpus_with_slabs,
+			ret = alloc_cpus_mask(kmalloc_dma_caches[i],
 				GFP_NOWAIT);
 			BUG_ON(!ret);
 		}
@@ -3810,11 +3834,11 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 
 		if (kmem_cache_open(s, n,
 				size, align, flags, ctor)) {
-			alloc_cpumask_var(&s->cpus_with_slabs, GFP_KERNEL);
+			alloc_cpus_mask(s, GFP_KERNEL);
 			list_add(&s->list, &slab_caches);
 			if (sysfs_slab_add(s)) {
 				list_del(&s->list);
-				free_cpumask_var(s->cpus_with_slabs);
+				free_cpus_mask(s);
 				kfree(n);
 				kfree(s);
 				goto err;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-23 15:56   ` Gilad Ben-Yossef
@ 2011-10-28  4:06     ` Christoph Lameter
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2011-10-28  4:06 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:

> We need a cpumask to track cpus with per cpu cache pages
> to know which cpu to whack during flush_all. For
> CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on stack.
> For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
> on the flush_all path, so we preallocate per kmem_cache
> on cache creation and use it in flush_all.

I think the on stack alloc should be the default because we can then avoid
the field in kmem_cache and the associated logic with managing the field.
Can we do a GFP_ATOMIC allocation in flush_all()? If the alloc
fails then you can still fallback to send an IPI to all cpus.
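
Roughly, such a GFP_ATOMIC-plus-fallback flush_all() might look like the
sketch below (illustrative only; flush_cpu_slab() and on_each_cpu_mask()
come from this series, everything else is assumed):

	static void flush_all(struct kmem_cache *s)
	{
		struct kmem_cache_cpu *c;
		cpumask_var_t cpus;
		int cpu;

		if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
			for_each_online_cpu(cpu) {
				c = per_cpu_ptr(s->cpu_slab, cpu);
				/* Only mark CPUs that hold a per cpu slab. */
				if (c && c->page)
					cpumask_set_cpu(cpu, cpus);
			}
			on_each_cpu_mask(cpus, flush_cpu_slab, s, 1);
			free_cpumask_var(cpus);
		} else {
			/* Allocation failed: interrupt all CPUs as before. */
			on_each_cpu(flush_cpu_slab, s, 1);
		}
	}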


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush
  2011-10-23 15:56   ` Gilad Ben-Yossef
@ 2011-10-28  4:06     ` Christoph Lameter
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2011-10-28  4:06 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:

> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index f58d641..b130f61 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -102,6 +102,9 @@ struct kmem_cache {
>  	 */
>  	int remote_node_defrag_ratio;
>  #endif
> +
> +	/* Which CPUs hold local slabs for this cache. */
> +	cpumask_var_t cpus_with_slabs;
>  	struct kmem_cache_node *node[MAX_NUMNODES];
>  };

Please do not add fields to structures for passing parameters to
functions. This just increases the complexity of the patch and extends a
structures needlessly.

> diff --git a/mm/slub.c b/mm/slub.c
> index 7c54fe8..f8cbf2d 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1948,7 +1948,18 @@ static void flush_cpu_slab(void *d)
>
>  static void flush_all(struct kmem_cache *s)
>  {
> -	on_each_cpu(flush_cpu_slab, s, 1);
> +	struct kmem_cache_cpu *c;
> +	int cpu;
> +
> +	for_each_online_cpu(cpu) {
> +		c = per_cpu_ptr(s->cpu_slab, cpu);
> +		if (c && c->page)
> +			cpumask_set_cpu(cpu, s->cpus_with_slabs);
> +		else
> +			cpumask_clear_cpu(cpu, s->cpus_with_slabs);
> +	}
> +
> +	on_each_cpu_mask(s->cpus_with_slabs, flush_cpu_slab, s, 1);
>  }


You do not need s->cpus_with_slabs to be in kmem_cache. Make it a local
variable instead.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush
  2011-10-28  4:06     ` Christoph Lameter
@ 2011-10-28  8:50       ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-28  8:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Fri, Oct 28, 2011 at 6:06 AM, Christoph Lameter <cl@gentwo.org> wrote:
> On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:
>
>> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
>> index f58d641..b130f61 100644
>> --- a/include/linux/slub_def.h
>> +++ b/include/linux/slub_def.h
>> @@ -102,6 +102,9 @@ struct kmem_cache {
>>        */
>>       int remote_node_defrag_ratio;
>>  #endif
>> +
>> +     /* Which CPUs hold local slabs for this cache. */
>> +     cpumask_var_t cpus_with_slabs;
>>       struct kmem_cache_node *node[MAX_NUMNODES];
>>  };
>
> Please do not add fields to structures for passing parameters to
> functions. This just increases the complexity of the patch and extends a
> structures needlessly.

The field was added to provide storage for cpus_with_slabs during
flush_all, since otherwise cpus_with_slabs, being a cpumask, would
require a kmem_cache allocation in the middle of flush_all for
the CONFIG_CPUMASK_OFFSTACK=y case, which Pekka E. objected to.

The next patch in the series adds the field (and its overhead) only
for the CONFIG_CPUMASK_OFFSTACK=y case, but I wanted to separate the
core feature of the patch from the optimization of only adding the
field for CONFIG_CPUMASK_OFFSTACK=y, so this patch as is, without
the next one, is mainly good for bisect value.

I probably should have commented about it in the description of
this patch and not only in the next one. Sorry about that. I will fix
it for the next round.

>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 7c54fe8..f8cbf2d 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1948,7 +1948,18 @@ static void flush_cpu_slab(void *d)
>>
>>  static void flush_all(struct kmem_cache *s)
>>  {
>> -     on_each_cpu(flush_cpu_slab, s, 1);
>> +     struct kmem_cache_cpu *c;
>> +     int cpu;
>> +
>> +     for_each_online_cpu(cpu) {
>> +             c = per_cpu_ptr(s->cpu_slab, cpu);
>> +             if (c && c->page)
>> +                     cpumask_set_cpu(cpu, s->cpus_with_slabs);
>> +             else
>> +                     cpumask_clear_cpu(cpu, s->cpus_with_slabs);
>> +     }
>> +
>> +     on_each_cpu_mask(s->cpus_with_slabs, flush_cpu_slab, s, 1);
>>  }
>
>
> You do not need s->cpus_with_slabs to be in kmem_cache. Make it a local
> variable instead.

That is what the next patch does - for CONFIG_CPUMASK_OFFSTACK=n, at least.

Thanks,
Gilad


-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-28  4:06     ` Christoph Lameter
@ 2011-10-28  9:09       ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-28  9:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Fri, Oct 28, 2011 at 6:06 AM, Christoph Lameter <cl@gentwo.org> wrote:
> On Sun, 23 Oct 2011, Gilad Ben-Yossef wrote:
>
>> We need a cpumask to track cpus with per cpu cache pages
>> to know which cpu to whack during flush_all. For
>> CONFIG_CPUMASK_OFFSTACK=n we allocate the mask on stack.
>> For CONFIG_CPUMASK_OFFSTACK=y we don't want to call kmalloc
>> on the flush_all path, so we preallocate per kmem_cache
>> on cache creation and use it in flush_all.
>
> I think the on stack alloc should be the default because we can then avoid
> the field in kmem_cache and the associated logic with managing the field.
> Can we do a GFP_ATOMIC allocation in flush_all()? If the alloc
> fails then you can still fallback to send an IPI to all cpus.


Yes, that is exactly what the first version of this patch did.
See: https://lkml.org/lkml/2011/9/25/32

Pekka E. did not like it because of the allocation out of a kmem_cache
in the CONFIG_CPUMASK_OFFSTACK=y case in a code path that is supposed to
shrink kmem_caches. I have to say I certainly see his point, so I tried
to work around that. On the other hand, the code complexity cost of
avoiding that allocation is non trivial.

I tried to give it some more thought -

Since flush_all is called on a kmem_cache basis, to allocate off of
the cpumask kmem_cache while shrinking *another cache* is fine. A
little weird maybe, but fine.

Trouble might lurk if some code path tries to shrink the cpumask
kmem_cache. This can happen if a code path ever tries to either close
the cpumask kmem_cache, which I find very unlikely, or to shrink it.
Right now the only in-tree user I found of kmem_cache_shrink is the
acpi code, and even that happens only for a few specific caches and
only during boot. I don't see that changing.

I think if it is up to me, I recommend going the simpler  route that
does the allocation in flush_all using GFP_ATOMIC for
CPUMASK_OFFSTACK=y and sends an IPI to all CPUs if it fails, because
it is simpler code and in the end I believe it is also correct.

What do you guys think?

Thanks!
Gilad
-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-23 15:56   ` Gilad Ben-Yossef
@ 2011-10-28 16:07     ` Rik van Riel
  -1 siblings, 0 replies; 36+ messages in thread
From: Rik van Riel @ 2011-10-28 16:07 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Sasha Levin

On 10/23/2011 11:56 AM, Gilad Ben-Yossef wrote:
> Use a cpumask to track CPUs with per-cpu pages in any zone
> and only send an IPI requesting CPUs to drain these pages
> to the buddy allocator if they actually have pages.

> +/* Which CPUs have per cpu pages  */
> +cpumask_var_t cpus_with_pcp;
> +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);

Does the flushing happen so frequently that it is worth keeping this
state on a per-cpu basis, or would it be better to check each CPU's
pcp info and assemble a cpumask at flush time like done in patch 5?



-- 
All rights reversed

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-28 16:07     ` Rik van Riel
@ 2011-10-29 15:29       ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-29 15:29 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Sasha Levin

On Fri, Oct 28, 2011 at 6:07 PM, Rik van Riel <riel@redhat.com> wrote:
> On 10/23/2011 11:56 AM, Gilad Ben-Yossef wrote:
>>
>> Use a cpumask to track CPUs with per-cpu pages in any zone
>> and only send an IPI requesting CPUs to drain these pages
>> to the buddy allocator if they actually have pages.
>
>> +/* Which CPUs have per cpu pages  */
>> +cpumask_var_t cpus_with_pcp;
>> +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
>
> Does the flushing happen so frequently that it is worth keeping this
> state on a per-cpu basis, or would it be better to check each CPU's
> pcp info and assemble a cpumask at flush time like done in patch 5?
>

No, I don't  believe it is frequent at all. I will try to re-work the
patch as suggested.
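
A rough sketch of what assembling the mask at drain time could look
like for drain_all_pages() (illustrative only; the helper and field
names are assumed from memory, not taken from the posted patch):

	void drain_all_pages(void)
	{
		int cpu;
		struct zone *zone;
		cpumask_var_t cpus_with_pcp;

		if (!zalloc_cpumask_var(&cpus_with_pcp, GFP_ATOMIC)) {
			/* No mask available: IPI everyone, as the old code did. */
			on_each_cpu(drain_local_pages, NULL, 1);
			return;
		}

		for_each_online_cpu(cpu) {
			for_each_populated_zone(zone) {
				/* Mark CPUs that hold pages on their pcp lists. */
				if (per_cpu_ptr(zone->pageset, cpu)->pcp.count)
					cpumask_set_cpu(cpu, cpus_with_pcp);
			}
		}

		on_each_cpu_mask(cpus_with_pcp, drain_local_pages, NULL, 1);
		free_cpumask_var(cpus_with_pcp);
	}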

Thanks,
Gilad

>

>
> --
> All rights reversed
>



-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"I've seen things you people wouldn't believe. Goto statements used to
implement co-routines. I watched C structures being stored in
registers. All those moments will be lost in time... like tears in
rain... Time to die. "

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-10-28  9:09       ` Gilad Ben-Yossef
@ 2011-11-02  8:52         ` Christoph Lameter
  -1 siblings, 0 replies; 36+ messages in thread
From: Christoph Lameter @ 2011-11-02  8:52 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Fri, 28 Oct 2011, Gilad Ben-Yossef wrote:

> I think if it is up to me, I recommend going the simpler  route that
> does the allocation in flush_all using GFP_ATOMIC for
> CPUMASK_OFFSTACK=y and sends an IPI to all CPUs if it fails, because
> it is simpler code and in the end I believe it is also correct.

I support that. Pekka?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-10-29 15:29       ` Gilad Ben-Yossef
  (?)
@ 2011-11-02  8:53       ` Christoph Lameter
  2011-11-10  8:03           ` Gilad Ben-Yossef
  -1 siblings, 1 reply; 36+ messages in thread
From: Christoph Lameter @ 2011-11-02  8:53 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: Rik van Riel, linux-kernel, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Sat, 29 Oct 2011, Gilad Ben-Yossef wrote:

> >> +/* Which CPUs have per cpu pages  */
> >> +cpumask_var_t cpus_with_pcp;
> >> +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
> >
> > Does the flushing happen so frequently that it is worth keeping this
> > state on a per-cpu basis, or would it be better to check each CPU's
> > pcp info and assemble a cpumask at flush time like done in patch 5?
> >
>
> No, I don't  believe it is frequent at all. I will try to re-work the
> patch as suggested.

The draining of the pcp pages is done from the vmstat callback which
occurs every second. Only if there is something to clean in the caches
will the flush happen.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-11-02  8:52         ` Christoph Lameter
@ 2011-11-06 19:10           ` Pekka Enberg
  -1 siblings, 0 replies; 36+ messages in thread
From: Pekka Enberg @ 2011-11-06 19:10 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Gilad Ben-Yossef, linux-kernel, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Matt Mackall,
	Sasha Levin

On Fri, 28 Oct 2011, Gilad Ben-Yossef wrote:
> > I think if it is up to me, I recommend going the simpler  route that
> > does the allocation in flush_all using GFP_ATOMIC for
> > CPUMASK_OFFSTACK=y and sends an IPI to all CPUs if it fails, because
> > it is simpler code and in the end I believe it is also correct.

On Wed, 2011-11-02 at 03:52 -0500, Christoph Lameter wrote:
> I support that. Pekka?

Sure. I'm OK with that. Someone needs to run some tests to make sure
it's working with low memory conditions when GFP_ATOMIC allocations
fail, though.

			Pekka


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist
  2011-11-02  8:53       ` Christoph Lameter
@ 2011-11-10  8:03           ` Gilad Ben-Yossef
  0 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-11-10  8:03 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, linux-kernel, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin

On Wed, Nov 2, 2011 at 10:53 AM, Christoph Lameter <cl@gentwo.org> wrote:
> On Sat, 29 Oct 2011, Gilad Ben-Yossef wrote:
>
>> >> +/* Which CPUs have per cpu pages  */
>> >> +cpumask_var_t cpus_with_pcp;
>> >> +static DEFINE_PER_CPU(unsigned long, total_cpu_pcp_count);
>> >
>> > Does the flushing happen so frequently that it is worth keeping this
>> > state on a per-cpu basis, or would it be better to check each CPU's
>> > pcp info and assemble a cpumask at flush time like done in patch 5?
>> >
>>
>> No, I don't  believe it is frequent at all. I will try to re-work the
>> patch as suggested.
>
> The draining of the pcp pages is done from the vmstat callback which
> occurs every second. Only if there is something to clean in the caches
> will the flush happen.
>

Right, I wasn't accurate with my answer - I meant to say that the code
to IPI all CPUs asking to flush their pcp pages is infrequent, so doing
more work in that code path is not unthinkable. Thanks for pointing it
out.

As Christoph pointed out, flushing on each CPU is also done by the vmstat
workqueue every second, in addition to the IPI path.

Since the changes I wish to make involve the code that sends the IPI and
not the flush code itself, I believe it is correct to say it is not a
frequent activity.

Having said that, trying to come up with a way of avoiding waking up each
CPU once per second to do the vmstat work is also on my todo list, but
that is another patch and another story altogether... :-)

Thanks!
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack
  2011-11-06 19:10           ` Pekka Enberg
@ 2011-11-13  8:39             ` Gilad Ben-Yossef
  -1 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-11-13  8:39 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, linux-kernel, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Matt Mackall,
	Sasha Levin

On Sun, Nov 6, 2011 at 9:10 PM, Pekka Enberg <penberg@kernel.org> wrote:
> On Fri, 28 Oct 2011, Gilad Ben-Yossef wrote:
>> > I think if it is up to me, I recommend going the simpler  route that
>> > does the allocation in flush_all using GFP_ATOMIC for
>> > CPUMASK_OFFSTACK=y and sends an IPI to all CPUs if it fails, because
>> > it is simpler code and in the end I believe it is also correct.
>
> On Wed, 2011-11-02 at 03:52 -0500, Christoph Lameter wrote:
>> I support that. Pekka?
>
> Sure. I'm OK with that. Someone needs to run some tests to make sure
> it's working with low memory conditions when GFP_ATOMIC allocations
> fail, though.

I've just used the fault injection framework (which is really cool by
the way) to inject an allocation failure for every cpumask alloc in
slub.c flush_all in a CONFIG_CPUMASK_OFFSTACK=y kernel and then forced
each kmem cache to flush by reading /sys/kernel/slab/*/alloc_calls, and
everything seems to be in order. The dmesg log shows the fault injection
failing the allocation, I get an extra debug trace from the cpumask
allocation code, and the system keeps chugging along.

While at it I did a similar thing for drain_all_pages in
mm/page_alloc.c (running a new version of the code from the previous
patch) and forced the drain to be called by running ./hackbench 1000.
Here again I saw log reports of the code path being called (from the
fault injection framework), the allocation failed, and save for the
debug trace the system continued to work fine (the OOM killer killed
my shell, but that is to be expected).

Both of the above were done with the latest spin of the patch, which
I'll send out soon after some more tests, so it looks good.

Thanks!
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 0/6] Reduce cross CPU IPI interference
@ 2011-10-23 15:48 Gilad Ben-Yossef
  0 siblings, 0 replies; 36+ messages in thread
From: Gilad Ben-Yossef @ 2011-10-23 15:48 UTC (permalink / raw)
  To: lkml
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Sasha Levin

We have lots of infrastructure in place to partition a multi-core system such
that we have a group of CPUs that are dedicated to specific task: cgroups,
scheduler and interrupt affinity and cpuisol boot parameter. Still, kernel
code will some time interrupt all CPUs in the system via IPIs for various
needs. These IPIs are useful and cannot be avoided altogether, but in certain
cases it is possible to interrupt only specific CPUs that have useful work to
do and not the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker when testing the nohz task patch set, is a first stab at trying to
explore doing this by locating the places where such global IPI calls are
being made and turning a global IPI into an IPI for a specific group of CPUs.
The purpose of the patch set is to get  feedback if this is the right way to
go for dealing with this issue and indeed, if the issue is even worth dealing
with at all.

The patch creates an on_each_cpu_mask infrastructure API (derived from
existing arch specific versions in Tile and Arm) and uses it to turn two global
IPI invocation to per CPU group invocations.

This second version incorporates changes due to reviewers feedback and 
additional testing. The major changes from the previous version of the patch 
are:

- Better description for some of the patches with examples of what I am
  trying to solve.
- Better coding style for on_each_cpu based on review comments by Peter 
  Zijlstra and Sasha Levin.
- Fixed pcp_count handling to take into account which cpu the accounting 
  is done for. Sadly, AFAIK this negates using this_cpu_add/sub as 
  suggested by Peter Z.
- Removed kmalloc from the flush_all() path as per review comment by 
  Pekka Enberg.
- Moved cpumask allocations for CONFIG_CPUMASK_OFFSTACK=y to a point previous
  to first use during boot as testing revealed we no longer boot under 
  CONFIG_CPUMASK_OFFSTACK=y with original code.

The patch was compiled for arm and boot tested on x86 in UP, SMP, with and without 
CONFIG_CPUMASK_OFFSTACK and was further tested by running hackbench on x86 in 
SMP mode in a 4 CPUs VM for several hours with no obvious regressions.

I also artificially exercised SLUB flush_all via the debug interface and observed 
the difference in IPI count across processors  with and without the patch - from 
an IPI on all processors but one without the patch to a subset (and often no IPI 
at all) with the patch.


Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>

Gilad Ben-Yossef (6):
  smp: Introduce a generic on_each_cpu_mask function
  arm: Move arm over to generic on_each_cpu_mask
  tile: Move tile to use generic on_each_cpu_mask
  mm: Only IPI CPUs to drain local pages if they exist
  slub: Only IPI CPUs that have per cpu obj to flush
  slub: only preallocate cpus_with_slabs if offstack

 arch/arm/kernel/smp_tlb.c   |   20 +++----------
 arch/tile/include/asm/smp.h |    7 -----
 arch/tile/kernel/smp.c      |   19 -------------
 include/linux/slub_def.h    |    9 ++++++
 include/linux/smp.h         |   16 +++++++++++
 kernel/smp.c                |   20 +++++++++++++
 mm/page_alloc.c             |   64 +++++++++++++++++++++++++++++++++++++------
 mm/slub.c                   |   61 +++++++++++++++++++++++++++++++++++++++-
 8 files changed, 164 insertions(+), 52 deletions(-)


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-11-13  8:39 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-23 15:56 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
2011-10-23 15:56 ` Gilad Ben-Yossef
2011-10-23 15:56 ` [PATCH v2 1/6] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
2011-10-23 15:56   ` Gilad Ben-Yossef
2011-10-23 15:56 ` [PATCH v2 2/6] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
2011-10-23 15:56   ` Gilad Ben-Yossef
2011-10-23 15:56 ` [PATCH v2 3/6] tile: Move tile to use " Gilad Ben-Yossef
2011-10-23 15:56   ` Gilad Ben-Yossef
2011-10-23 15:56 ` [PATCH v2 4/6] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2011-10-23 15:56   ` Gilad Ben-Yossef
2011-10-28 16:07   ` Rik van Riel
2011-10-28 16:07     ` Rik van Riel
2011-10-29 15:29     ` Gilad Ben-Yossef
2011-10-29 15:29       ` Gilad Ben-Yossef
2011-11-02  8:53       ` Christoph Lameter
2011-11-10  8:03         ` Gilad Ben-Yossef
2011-11-10  8:03           ` Gilad Ben-Yossef
2011-10-23 15:56 ` [PATCH v2 5/6] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
2011-10-23 15:56   ` Gilad Ben-Yossef
2011-10-28  4:06   ` Christoph Lameter
2011-10-28  4:06     ` Christoph Lameter
2011-10-28  8:50     ` Gilad Ben-Yossef
2011-10-28  8:50       ` Gilad Ben-Yossef
2011-10-23 15:56 ` [PATCH v2 6/6] slub: only preallocate cpus_with_slabs if offstack Gilad Ben-Yossef
2011-10-23 15:56   ` Gilad Ben-Yossef
2011-10-28  4:06   ` Christoph Lameter
2011-10-28  4:06     ` Christoph Lameter
2011-10-28  9:09     ` Gilad Ben-Yossef
2011-10-28  9:09       ` Gilad Ben-Yossef
2011-11-02  8:52       ` Christoph Lameter
2011-11-02  8:52         ` Christoph Lameter
2011-11-06 19:10         ` Pekka Enberg
2011-11-06 19:10           ` Pekka Enberg
2011-11-13  8:39           ` Gilad Ben-Yossef
2011-11-13  8:39             ` Gilad Ben-Yossef
  -- strict thread matches above, loose matches on Subject: below --
2011-10-23 15:48 [PATCH v2 0/6] Reduce cross CPU IPI interference Gilad Ben-Yossef
