linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-03  7:51   ` Michal Nazarewicz
  2012-01-03 22:26   ` Andrew Morton
  2012-01-02 10:24 ` [PATCH v5 2/8] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
	Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Mel Gorman,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

on_each_cpu_mask calls a function on processors specified by cpumask,
which may include the local processor.

All the limitations specified in smp_call_function_many apply.
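
As a rough illustration (hypothetical caller and callback, not part of
this patch), a subsystem that tracks in its own cpumask which CPUs hold
some per-cpu state could flush just those CPUs as below; func runs with
interrupts disabled on every CPU set in the mask, including the local one:

#include <linux/smp.h>
#include <linux/cpumask.h>
#include <linux/atomic.h>

/* hypothetical callback: runs on each selected CPU with IRQs disabled */
static void count_invocation(void *info)
{
	atomic_t *nr_ran = info;

	atomic_inc(nr_ran);
}

static void count_on(const struct cpumask *mask, atomic_t *nr_ran)
{
	/* IPI only the CPUs in @mask and wait for them to finish */
	on_each_cpu_mask(mask, count_invocation, nr_ran, true);
}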

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---
 include/linux/smp.h |   16 ++++++++++++++++
 kernel/smp.c        |   20 ++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 8cc38d3..60628d7 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
 int on_each_cpu(smp_call_func_t func, void *info, int wait);
 
 /*
+ * Call a function on processors specified by mask, which might include
+ * the local one.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
+		void *info, bool wait);
+
+/*
  * Mark the boot cpu "online" so that it can call console drivers in
  * printk() and can access its per-cpu storage.
  */
@@ -132,6 +139,15 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
 		local_irq_enable();		\
 		0;				\
 	})
+#define on_each_cpu_mask(mask, func, info, wait) \
+	do {						\
+		if (cpumask_test_cpu(0, (mask))) {	\
+			local_irq_disable();		\
+			(func)(info);			\
+			local_irq_enable();		\
+		}					\
+	} while (0)
+
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
 #define smp_prepare_boot_cpu()			do {} while (0)
diff --git a/kernel/smp.c b/kernel/smp.c
index db197d6..7c0cbd7 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -701,3 +701,23 @@ int on_each_cpu(void (*func) (void *info), void *info, int wait)
 	return ret;
 }
 EXPORT_SYMBOL(on_each_cpu);
+
+/*
+ * Call a function on processors specified by cpumask, which may include
+ * the local processor. All the limitations specified in smp_call_function_many
+ * apply.
+ */
+void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
+			void *info, bool wait)
+{
+	int cpu = get_cpu();
+
+	smp_call_function_many(mask, func, info, wait);
+	if (cpumask_test_cpu(cpu, mask)) {
+		local_irq_disable();
+		func(info);
+		local_irq_enable();
+	}
+	put_cpu();
+}
+EXPORT_SYMBOL(on_each_cpu_mask);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 2/8] arm: Move arm over to generic on_each_cpu_mask
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
  2012-01-02 10:24 ` [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-02 10:24 ` [PATCH v5 3/8] tile: Move tile to use " Gilad Ben-Yossef
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, Christoph Lameter, Chris Metcalf, linux-mm,
	Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen,
	Sasha Levin, Mel Gorman, Andrew Morton, Alexander Viro,
	linux-fsdevel, Avi Kivity

Note that the generic version is a little different from the ARM one:

1. It takes the mask as its first parameter
2. It calls the function on the calling CPU with interrupts disabled,
   but this should be OK since the function is called on the other CPUs
   with interrupts disabled anyway.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---
 arch/arm/kernel/smp_tlb.c |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/smp_tlb.c b/arch/arm/kernel/smp_tlb.c
index 7dcb352..02c5d2c 100644
--- a/arch/arm/kernel/smp_tlb.c
+++ b/arch/arm/kernel/smp_tlb.c
@@ -13,18 +13,6 @@
 #include <asm/smp_plat.h>
 #include <asm/tlbflush.h>
 
-static void on_each_cpu_mask(void (*func)(void *), void *info, int wait,
-	const struct cpumask *mask)
-{
-	preempt_disable();
-
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(smp_processor_id(), mask))
-		func(info);
-
-	preempt_enable();
-}
-
 /**********************************************************************/
 
 /*
@@ -87,7 +75,7 @@ void flush_tlb_all(void)
 void flush_tlb_mm(struct mm_struct *mm)
 {
 	if (tlb_ops_need_broadcast())
-		on_each_cpu_mask(ipi_flush_tlb_mm, mm, 1, mm_cpumask(mm));
+		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
 	else
 		local_flush_tlb_mm(mm);
 }
@@ -98,7 +86,8 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
 		struct tlb_args ta;
 		ta.ta_vma = vma;
 		ta.ta_start = uaddr;
-		on_each_cpu_mask(ipi_flush_tlb_page, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page,
+					&ta, 1);
 	} else
 		local_flush_tlb_page(vma, uaddr);
 }
@@ -121,7 +110,8 @@ void flush_tlb_range(struct vm_area_struct *vma,
 		ta.ta_vma = vma;
 		ta.ta_start = start;
 		ta.ta_end = end;
-		on_each_cpu_mask(ipi_flush_tlb_range, &ta, 1, mm_cpumask(vma->vm_mm));
+		on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_range,
+					&ta, 1);
 	} else
 		local_flush_tlb_range(vma, start, end);
 }
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 3/8] tile: Move tile to use generic on_each_cpu_mask
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
  2012-01-02 10:24 ` [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
  2012-01-02 10:24 ` [PATCH v5 2/8] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-02 10:24 ` [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func Gilad Ben-Yossef
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Christoph Lameter, Pekka Enberg,
	Matt Mackall, Rik van Riel, Andi Kleen, Sasha Levin, Mel Gorman,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

The API is the same as the tile-private one, so just remove
the private version of the functions.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---
 arch/tile/include/asm/smp.h |    7 -------
 arch/tile/kernel/smp.c      |   19 -------------------
 2 files changed, 0 insertions(+), 26 deletions(-)

diff --git a/arch/tile/include/asm/smp.h b/arch/tile/include/asm/smp.h
index 532124a..1aa759a 100644
--- a/arch/tile/include/asm/smp.h
+++ b/arch/tile/include/asm/smp.h
@@ -43,10 +43,6 @@ void evaluate_message(int tag);
 /* Boot a secondary cpu */
 void online_secondary(void);
 
-/* Call a function on a specified set of CPUs (may include this one). */
-extern void on_each_cpu_mask(const struct cpumask *mask,
-			     void (*func)(void *), void *info, bool wait);
-
 /* Topology of the supervisor tile grid, and coordinates of boot processor */
 extern HV_Topology smp_topology;
 
@@ -91,9 +87,6 @@ void print_disabled_cpus(void);
 
 #else /* !CONFIG_SMP */
 
-#define on_each_cpu_mask(mask, func, info, wait)		\
-  do { if (cpumask_test_cpu(0, (mask))) func(info); } while (0)
-
 #define smp_master_cpu		0
 #define smp_height		1
 #define smp_width		1
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index c52224d..a44e103 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -87,25 +87,6 @@ void send_IPI_allbutself(int tag)
 	send_IPI_many(&mask, tag);
 }
 
-
-/*
- * Provide smp_call_function_mask, but also run function locally
- * if specified in the mask.
- */
-void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
-		      void *info, bool wait)
-{
-	int cpu = get_cpu();
-	smp_call_function_many(mask, func, info, wait);
-	if (cpumask_test_cpu(cpu, mask)) {
-		local_irq_disable();
-		func(info);
-		local_irq_enable();
-	}
-	put_cpu();
-}
-
-
 /*
  * Functions related to starting/stopping cpus.
  */
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
                   ` (2 preceding siblings ...)
  2012-01-02 10:24 ` [PATCH v5 3/8] tile: Move tile to use " Gilad Ben-Yossef
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-03 22:34   ` Andrew Morton
  2012-01-02 10:24 ` [PATCH v5 5/8] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Chris Metcalf, Christoph Lameter,
	Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
	Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity

Add the on_each_cpu_required() function that wraps on_each_cpu_mask()
and calculates the cpumask of cpus to IPI by calling a function supplied
as a parameter in order to determine whether to IPI each specific cpu.

The function deals with allocation failure of the cpumask variable in
the CONFIG_CPUMASK_OFFSTACK=y case by sending an IPI to all cpus via
on_each_cpu() instead.

The function is useful since it allows separating the case-specific
code that decides whether to IPI a specific cpu for a specific request
from the common boilerplate code of creating the mask, handling
failures, etc.
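
A minimal usage sketch (hypothetical per-cpu state and function names,
not part of this series):

#include <linux/smp.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned int, pending_work);

/* cond_func: decide, per cpu, whether an IPI is worth sending */
static int cpu_has_pending_work(int cpu, void *info)
{
	return per_cpu(pending_work, cpu) != 0;
}

/* func: runs on each selected CPU with interrupts disabled */
static void flush_pending_work(void *info)
{
	this_cpu_write(pending_work, 0);
}

static void flush_all_pending_work(void)
{
	on_each_cpu_cond(cpu_has_pending_work, flush_pending_work, NULL, true);
}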

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---
 include/linux/smp.h |   16 ++++++++++++++++
 kernel/smp.c        |   27 +++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 0 deletions(-)

diff --git a/include/linux/smp.h b/include/linux/smp.h
index 60628d7..ef85a3d 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -109,6 +109,14 @@ void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
 		void *info, bool wait);
 
 /*
+ * Call a function on each processor for which the supplied function
+ * cond_func returns a positive value. This may include the local
+ * processor.
+ */
+void on_each_cpu_cond(int (*cond_func) (int cpu, void *info),
+		void (*func)(void *), void *info, bool wait);
+
+/*
  * Mark the boot cpu "online" so that it can call console drivers in
  * printk() and can access its per-cpu storage.
  */
@@ -147,6 +155,14 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
 			local_irq_enable();		\
 		}					\
 	} while (0)
+#define on_each_cpu_cond(cond_func, func, info, wait) \
+	do {						\
+		if (cond_func(0, info)) {		\
+			local_irq_disable();		\
+			(func)(info);			\
+			local_irq_enable();		\
+		}					\
+	} while (0)
 
 static inline void smp_send_reschedule(int cpu) { }
 #define num_booting_cpus()			1
diff --git a/kernel/smp.c b/kernel/smp.c
index 7c0cbd7..5f7b24e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -721,3 +721,30 @@ void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
 	put_cpu();
 }
 EXPORT_SYMBOL(on_each_cpu_mask);
+
+/*
+ * Call a function on each processor for which the supplied function
+ * cond_func returns a positive value. This may include the local
+ * processor, optionally waiting for all the required CPUs to finish.
+ * The function may be called on all online CPUs without running the
+ * cond_func function in extreme circumstances (a memory allocation
+ * failure when CONFIG_CPUMASK_OFFSTACK=y).
+ * All the limitations specified in smp_call_function_many apply.
+ */
+void on_each_cpu_cond(int (*cond_func) (int cpu, void *info),
+			void (*func)(void *), void *info, bool wait)
+{
+	cpumask_var_t cpus;
+	int cpu;
+
+	if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
+		for_each_online_cpu(cpu)
+			if (cond_func(cpu, info))
+				cpumask_set_cpu(cpu, cpus);
+		on_each_cpu_mask(cpus, func, info, wait);
+		free_cpumask_var(cpus);
+	} else
+		on_each_cpu(func, info, wait);
+}
+EXPORT_SYMBOL(on_each_cpu_cond);
+
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 5/8] slub: Only IPI CPUs that have per cpu obj to flush
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
                   ` (3 preceding siblings ...)
  2012-01-02 10:24 ` [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func Gilad Ben-Yossef
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-02 10:24 ` [PATCH v5 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
  2012-01-02 10:24 ` [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
  6 siblings, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Christoph Lameter, Chris Metcalf,
	Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen, Mel Gorman,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

flush_all() is called for each kmem_cache_destroy(). So every cache
being destroyed dynamically ends up sending an IPI to each CPU in the
system, regardless of whether the cache has ever been used there.

For example, if you close the Infiniband ipath driver char device file,
the close file op calls kmem_cache_destroy(). So running some
Infiniband config tool on a single CPU dedicated to system tasks
might interrupt the other 127 CPUs I dedicated to some CPU-intensive
task.

I suspect there is a good chance that every line in the output of "git
grep kmem_cache_destroy linux/ | grep '\->'" has a similar scenario.

This patch attempts to rectify this issue by sending an IPI to flush
the per cpu objects back to the free lists only to CPUs that seems to
have such objects.

The check of which CPU to IPI is racy, but we don't care since asking a
CPU without per cpu objects to flush does no damage, and as far as I
can tell flush_all by itself is racy against allocs on remote
CPUs anyway, so if you meant flush_all to be deterministic, you
would have to arrange for locking regardless.

Without this patch the following artificial test case:

$ cd /sys/kernel/slab
$ for DIR in *; do cat $DIR/alloc_calls > /dev/null; done

produces 166 IPIs on a cpuset-isolated CPU. With it, it produces none.

The memory allocation failure code path for the CPUMASK_OFFSTACK=y
config was tested using the fault injection framework.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---

 The Acks were for a previous version that had the code
 of on_each_cpu_cond() inlined at the call site.

 mm/slub.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index ed3334d..c53aa2c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2013,9 +2013,17 @@ static void flush_cpu_slab(void *d)
 	__flush_cpu_slab(s, smp_processor_id());
 }
 
+static int has_cpu_slab(int cpu, void *info)
+{
+	struct kmem_cache *s = info;
+	struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
+
+	return !!(c->page);
+}
+
 static void flush_all(struct kmem_cache *s)
 {
-	on_each_cpu(flush_cpu_slab, s, 1);
+	on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1);
 }
 
 /*
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 6/8] fs: only send IPI to invalidate LRU BH when needed
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
                   ` (4 preceding siblings ...)
  2012-01-02 10:24 ` [PATCH v5 5/8] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-02 10:24 ` [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
  6 siblings, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Christoph Lameter, Chris Metcalf,
	Peter Zijlstra, Frederic Weisbecker, Russell King, linux-mm,
	Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Mel Gorman, Andrew Morton, Alexander Viro,
	linux-fsdevel, Avi Kivity

In several code paths, such as when unmounting a file system (but
not only there), we send an IPI to ask each cpu to invalidate its local
LRU BHs.

For multi-core systems with many cpus that may not have
any LRU BH, because they are idle or because they have not performed
any file system access since the last invalidation (e.g. CPUs crunching
numbers on high-performance computing nodes that write results to shared
memory), this can lead to a loss of performance each time someone
switches the KVM (the virtual keyboard and screen type, not the
hypervisor) that has a USB storage device plugged in.

This patch attempts to only send the IPI to cpus that have LRU BH.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---
 fs/buffer.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 19d8eb7..b2378d4 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1434,10 +1434,23 @@ static void invalidate_bh_lru(void *arg)
 	}
 	put_cpu_var(bh_lrus);
 }
+
+static int local_bh_lru_avail(int cpu, void *dummy)
+{
+	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
+	int i;
 	
+	for (i = 0; i < BH_LRU_SIZE; i++) {
+		if (b->bhs[i])
+			return 1;
+	}
+
+	return 0;
+}
+
 void invalidate_bh_lrus(void)
 {
-	on_each_cpu(invalidate_bh_lru, NULL, 1);
+	on_each_cpu_cond(local_bh_lru_avail, invalidate_bh_lru, NULL, 1);
 }
 EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
       [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
                   ` (5 preceding siblings ...)
  2012-01-02 10:24 ` [PATCH v5 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
@ 2012-01-02 10:24 ` Gilad Ben-Yossef
  2012-01-03 17:45   ` KOSAKI Motohiro
  2012-01-05 15:54   ` Mel Gorman
  6 siblings, 2 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-02 10:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Gilad Ben-Yossef, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen, Mel Gorman,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

Calculate a cpumask of CPUs with per-cpu pages in any zone
and only send an IPI requesting CPUs to drain these pages
to the buddy allocator if they actually have pages when
asked to flush.

This patch saves 99% of the IPIs asking to drain per-cpu
pages in case of severe memory pressure that leads
to OOM, since in these cases multiple, possibly concurrent,
allocation requests end up in the direct reclaim code
path. The per-cpu pages are reclaimed on the first
allocation failure, so for most of the subsequent allocation
attempts, until the memory pressure is off (possibly via
the OOM killer), there are no per-cpu pages on most CPUs
(and there can easily be hundreds of them).

This also has the side effect of shortening the average
latency of direct reclaim by one or more orders of magnitude,
since waiting for all the CPUs to ACK the IPI takes a
long time.

Tested by running "hackbench 400" on an otherwise idle 4-CPU
x86 VM and observing the difference between the number
of direct reclaim attempts that end up in drain_all_pages()
and those where more than 1/2 of the online CPUs had any
per-cpu pages, using the vmstat counters introduced
in the next patch in the series and /proc/interrupts.

In the test scenario, this saved around 500 global IPIs.
After triggering an OOM:

$ cat /proc/vmstat
...
pcp_global_drain 627
pcp_global_ipi_saved 578

I've also seen the number of drains reach 15k calls
with the saved percentage reaching 99% when there
are more tasks running during an OOM kill.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
---
 Christoph's Ack was for a previous version that allocated
 the cpumask in drain_all_pages().

 mm/page_alloc.c |   26 +++++++++++++++++++++++++-
 1 files changed, 25 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2b8ba3a..092c331 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,14 @@ DEFINE_PER_CPU(int, numa_node);
 EXPORT_PER_CPU_SYMBOL(numa_node);
 #endif
 
+/*
+ * A global cpumask of CPUs with per-cpu pages that gets
+ * recomputed on each drain. We use a global cpumask
+ * to avoid allocation on the direct reclaim code path
+ * for CONFIG_CPUMASK_OFFSTACK=y.
+ */
+static cpumask_var_t cpus_with_pcps;
+
 #ifdef CONFIG_HAVE_MEMORYLESS_NODES
 /*
  * N.B., Do NOT reference the '_numa_mem_' per cpu variable directly.
@@ -1119,7 +1127,19 @@ void drain_local_pages(void *arg)
  */
 void drain_all_pages(void)
 {
-	on_each_cpu(drain_local_pages, NULL, 1);
+	int cpu;
+	struct per_cpu_pageset *pcp;
+	struct zone *zone;
+
+	for_each_online_cpu(cpu)
+		for_each_populated_zone(zone) {
+			pcp = per_cpu_ptr(zone->pageset, cpu);
+			if (pcp->pcp.count)
+				cpumask_set_cpu(cpu, cpus_with_pcps);
+			else
+				cpumask_clear_cpu(cpu, cpus_with_pcps);
+		}
+	on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
 }
 
 #ifdef CONFIG_HIBERNATION
@@ -3623,6 +3643,10 @@ static void setup_zone_pageset(struct zone *zone)
 void __init setup_per_cpu_pageset(void)
 {
 	struct zone *zone;
+	int ret;
+
+	ret = zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL);
+	BUG_ON(!ret);
 
 	for_each_populated_zone(zone)
 		setup_zone_pageset(zone);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
  2012-01-02 10:24 ` [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
@ 2012-01-03  7:51   ` Michal Nazarewicz
  2012-01-03  8:12     ` Gilad Ben-Yossef
  2012-01-03 22:26   ` Andrew Morton
  1 sibling, 1 reply; 37+ messages in thread
From: Michal Nazarewicz @ 2012-01-03  7:51 UTC (permalink / raw)
  To: linux-kernel, Gilad Ben-Yossef
  Cc: Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, Russell King,
	linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel, Andi Kleen,
	Sasha Levin, Mel Gorman, Andrew Morton, Alexander Viro,
	linux-fsdevel, Avi Kivity

On Mon, 02 Jan 2012 11:24:12 +0100, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> @@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
>  int on_each_cpu(smp_call_func_t func, void *info, int wait);
> /*
> + * Call a function on processors specified by mask, which might include
> + * the local one.
> + */
> +void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
> +		void *info, bool wait);
> +

on_each_cpu() returns an int.  For consistency reasons, would it make sense to
make on_each_cpu_mask() return an int?  I know that the difference is that
smp_call_function() returns an int and smp_call_function_many() returns void,
but to me it actually seems strange, and either I'm missing something important
(which is likely) or this needs to get cleaned up at some point as well.

> +/*
>   * Mark the boot cpu "online" so that it can call console drivers in
>   * printk() and can access its per-cpu storage.
>   */

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
  2012-01-03  7:51   ` Michal Nazarewicz
@ 2012-01-03  8:12     ` Gilad Ben-Yossef
  2012-01-03  8:57       ` Michal Nazarewicz
  0 siblings, 1 reply; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-03  8:12 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel,
	Andi Kleen, Sasha Levin, Mel Gorman, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

2012/1/3 Michal Nazarewicz <mina86@mina86.com>:
> On Mon, 02 Jan 2012 11:24:12 +0100, Gilad Ben-Yossef <gilad@benyossef.com>
> wrote:
>>
>> @@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
>>  int on_each_cpu(smp_call_func_t func, void *info, int wait);
>> /*
>> + * Call a function on processors specified by mask, which might include
>> + * the local one.
>> + */
>> +void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
>> +               void *info, bool wait);
>> +
>
>
> on_each_cpu() returns an int.  For consistency reasons, would it make sense
> to
> make on_each_cpu_maks() to return and int?  I know that the difference is
> that
> smp_call_function() returns and int and smp_call_function_many() returns
> void,
> but to me it actually seems strange and either I'm missing something
> important
> (which is likely) or this needs to get cleaned up at one point as well.
>

I'd say we should go the other way around - kill the return value on
on_each_cpu()

The return value is always a hard coded zero and we have some code that tests
for that return value. Silly...

It looks like it's there for hysterical reasons to me :-)

Gilad



-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
  2012-01-03  8:12     ` Gilad Ben-Yossef
@ 2012-01-03  8:57       ` Michal Nazarewicz
  0 siblings, 0 replies; 37+ messages in thread
From: Michal Nazarewicz @ 2012-01-03  8:57 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel,
	Andi Kleen, Sasha Levin, Mel Gorman, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

> 2012/1/3 Michal Nazarewicz <mina86@mina86.com>:
>> on_each_cpu() returns an int.  For consistency reasons, would it make sense
>> to make on_each_cpu_maks() to return and int?  I know that the difference
>> is that smp_call_function() returns and int and smp_call_function_many()
>> returns void, but to me it actually seems strange and either I'm missing
>> something important (which is likely) or this needs to get cleaned up at
>> one point as well.

On Tue, 03 Jan 2012 09:12:21 +0100, Gilad Ben-Yossef <gilad@benyossef.com> wrote:
> I'd say we should go the other way around - kill the return value on
> on_each_cpu()
>
> The return value is always a hard coded zero and we have some code that tests
> for that return value. Silly...
>
> It looks like it's there for hysterical reasons to me :-)

That might be right.  Of course, this goes deeper than on_each_cpu(), since
some of the smp_call_function functions have an int return value, but I
couldn't find an instance where they return non-zero.

I'd offer to volunteer to do the clean-up but I have too little experience
in IPI to say with confidence that we in fact can and want to drop the “int”
return value from all of those functions.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-02 10:24 ` [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
@ 2012-01-03 17:45   ` KOSAKI Motohiro
  2012-01-03 18:58     ` Gilad Ben-Yossef
  2012-01-05 14:20     ` Mel Gorman
  2012-01-05 15:54   ` Mel Gorman
  1 sibling, 2 replies; 37+ messages in thread
From: KOSAKI Motohiro @ 2012-01-03 17:45 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin,
	Rik van Riel, Andi Kleen, Mel Gorman, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

(1/2/12 5:24 AM), Gilad Ben-Yossef wrote:
> Calculate a cpumask of CPUs with per-cpu pages in any zone
> and only send an IPI requesting CPUs to drain these pages
> to the buddy allocator if they actually have pages when
> asked to flush.
> 
> This patch saves 99% of IPIs asking to drain per-cpu
> pages in case of severe memory preassure that leads
> to OOM since in these cases multiple, possibly concurrent,
> allocation requests end up in the direct reclaim code
> path so when the per-cpu pages end up reclaimed on first
> allocation failure for most of the proceeding allocation
> attempts until the memory pressure is off (possibly via
> the OOM killer) there are no per-cpu pages on most CPUs
> (and there can easily be hundreds of them).
> 
> This also has the side effect of shortening the average
> latency of direct reclaim by 1 or more order of magnitude
> since waiting for all the CPUs to ACK the IPI takes a
> long time.
> 
> Tested by running "hackbench 400" on a 4 CPU x86 otherwise
> idle VM and observing the difference between the number
> of direct reclaim attempts that end up in drain_all_pages()
> and those were more then 1/2 of the online CPU had any
> per-cpu page in them, using the vmstat counters introduced
> in the next patch in the series and using proc/interrupts.
> 
> In the test sceanrio, this saved around 500 global IPIs.
> After trigerring an OOM:
> 
> $ cat /proc/vmstat
> ...
> pcp_global_drain 627
> pcp_global_ipi_saved 578
> 
> I've also seen the number of drains reach 15k calls
> with the saved percentage reaching 99% when there
> are more tasks running during an OOM kill.
> 
> Signed-off-by: Gilad Ben-Yossef<gilad@benyossef.com>
> Acked-by: Christoph Lameter<cl@linux.com>
> CC: Chris Metcalf<cmetcalf@tilera.com>
> CC: Peter Zijlstra<a.p.zijlstra@chello.nl>
> CC: Frederic Weisbecker<fweisbec@gmail.com>
> CC: Russell King<linux@arm.linux.org.uk>
> CC: linux-mm@kvack.org
> CC: Pekka Enberg<penberg@kernel.org>
> CC: Matt Mackall<mpm@selenic.com>
> CC: Sasha Levin<levinsasha928@gmail.com>
> CC: Rik van Riel<riel@redhat.com>
> CC: Andi Kleen<andi@firstfloor.org>
> CC: Mel Gorman<mel@csn.ul.ie>
> CC: Andrew Morton<akpm@linux-foundation.org>
> CC: Alexander Viro<viro@zeniv.linux.org.uk>
> CC: linux-fsdevel@vger.kernel.org
> CC: Avi Kivity<avi@redhat.com>
> ---
>   Christopth Ack was for a previous version that allocated
>   the cpumask in drain_all_pages().

When you change a patch's design and implementation, ACKs
should be dropped; otherwise you miss the chance to get a good
review.



>   mm/page_alloc.c |   26 +++++++++++++++++++++++++-
>   1 files changed, 25 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2b8ba3a..092c331 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -67,6 +67,14 @@ DEFINE_PER_CPU(int, numa_node);
>   EXPORT_PER_CPU_SYMBOL(numa_node);
>   #endif
> 
> +/*
> + * A global cpumask of CPUs with per-cpu pages that gets
> + * recomputed on each drain. We use a global cpumask
> + * for to avoid allocation on direct reclaim code path
> + * for CONFIG_CPUMASK_OFFSTACK=y
> + */
> +static cpumask_var_t cpus_with_pcps;
> +
>   #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>   /*
>    * N.B., Do NOT reference the '_numa_mem_' per cpu variable directly.
> @@ -1119,7 +1127,19 @@ void drain_local_pages(void *arg)
>    */
>   void drain_all_pages(void)
>   {
> -	on_each_cpu(drain_local_pages, NULL, 1);
> +	int cpu;
> +	struct per_cpu_pageset *pcp;
> +	struct zone *zone;
> +

get_online_cpu() ?

> +	for_each_online_cpu(cpu)
> +		for_each_populated_zone(zone) {
> +			pcp = per_cpu_ptr(zone->pageset, cpu);
> +			if (pcp->pcp.count)
> +				cpumask_set_cpu(cpu, cpus_with_pcps);
> +			else
> +				cpumask_clear_cpu(cpu, cpus_with_pcps);

cpumask* functions can't be used locklessly?

> +		}
> +	on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
>   }
> 
>   #ifdef CONFIG_HIBERNATION
> @@ -3623,6 +3643,10 @@ static void setup_zone_pageset(struct zone *zone)
>   void __init setup_per_cpu_pageset(void)
>   {
>   	struct zone *zone;
> +	int ret;
> +
> +	ret = zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL);
> +	BUG_ON(!ret);
> 
>   	for_each_populated_zone(zone)
>   		setup_zone_pageset(zone);


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-03 17:45   ` KOSAKI Motohiro
@ 2012-01-03 18:58     ` Gilad Ben-Yossef
  2012-01-03 22:02       ` KOSAKI Motohiro
  2012-01-05 14:20     ` Mel Gorman
  1 sibling, 1 reply; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-03 18:58 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin,
	Rik van Riel, Andi Kleen, Mel Gorman, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

2012/1/3 KOSAKI Motohiro <kosaki.motohiro@gmail.com>:
> (1/2/12 5:24 AM), Gilad Ben-Yossef wrote:
>> Calculate a cpumask of CPUs with per-cpu pages in any zone
>> and only send an IPI requesting CPUs to drain these pages
>> to the buddy allocator if they actually have pages when
>> asked to flush.
>>
>> This patch saves 99% of IPIs asking to drain per-cpu
>> pages in case of severe memory preassure that leads
>> to OOM since in these cases multiple, possibly concurrent,
>> allocation requests end up in the direct reclaim code
>> path so when the per-cpu pages end up reclaimed on first
>> allocation failure for most of the proceeding allocation
>> attempts until the memory pressure is off (possibly via
>> the OOM killer) there are no per-cpu pages on most CPUs
>> (and there can easily be hundreds of them).
>>
>> This also has the side effect of shortening the average
>> latency of direct reclaim by 1 or more order of magnitude
>> since waiting for all the CPUs to ACK the IPI takes a
>> long time.
>>
>> Tested by running "hackbench 400" on a 4 CPU x86 otherwise
>> idle VM and observing the difference between the number
>> of direct reclaim attempts that end up in drain_all_pages()
>> and those were more then 1/2 of the online CPU had any
>> per-cpu page in them, using the vmstat counters introduced
>> in the next patch in the series and using proc/interrupts.
>>
>> In the test sceanrio, this saved around 500 global IPIs.
>> After trigerring an OOM:
>>
>> $ cat /proc/vmstat
>> ...
>> pcp_global_drain 627
>> pcp_global_ipi_saved 578
>>
>> I've also seen the number of drains reach 15k calls
>> with the saved percentage reaching 99% when there
>> are more tasks running during an OOM kill.
>>
>> Signed-off-by: Gilad Ben-Yossef<gilad@benyossef.com>
>> Acked-by: Christoph Lameter<cl@linux.com>
>> CC: Chris Metcalf<cmetcalf@tilera.com>
>> CC: Peter Zijlstra<a.p.zijlstra@chello.nl>
>> CC: Frederic Weisbecker<fweisbec@gmail.com>
>> CC: Russell King<linux@arm.linux.org.uk>
>> CC: linux-mm@kvack.org
>> CC: Pekka Enberg<penberg@kernel.org>
>> CC: Matt Mackall<mpm@selenic.com>
>> CC: Sasha Levin<levinsasha928@gmail.com>
>> CC: Rik van Riel<riel@redhat.com>
>> CC: Andi Kleen<andi@firstfloor.org>
>> CC: Mel Gorman<mel@csn.ul.ie>
>> CC: Andrew Morton<akpm@linux-foundation.org>
>> CC: Alexander Viro<viro@zeniv.linux.org.uk>
>> CC: linux-fsdevel@vger.kernel.org
>> CC: Avi Kivity<avi@redhat.com>
>> ---
>>   Christopth Ack was for a previous version that allocated
>>   the cpumask in drain_all_pages().
>
> When you changed a patch design and implementation, ACKs are
> should be dropped. otherwise you miss to chance to get a good
> review.
>

Got you. Thanks for the review :-)
>
>
>>   mm/page_alloc.c |   26 +++++++++++++++++++++++++-
>>   1 files changed, 25 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2b8ba3a..092c331 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -67,6 +67,14 @@ DEFINE_PER_CPU(int, numa_node);
>>   EXPORT_PER_CPU_SYMBOL(numa_node);
>>   #endif
>>
>> +/*
>> + * A global cpumask of CPUs with per-cpu pages that gets
>> + * recomputed on each drain. We use a global cpumask
>> + * for to avoid allocation on direct reclaim code path
>> + * for CONFIG_CPUMASK_OFFSTACK=y
>> + */
>> +static cpumask_var_t cpus_with_pcps;
>> +
>>   #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>>   /*
>>    * N.B., Do NOT reference the '_numa_mem_' per cpu variable directly.
>> @@ -1119,7 +1127,19 @@ void drain_local_pages(void *arg)
>>    */
>>   void drain_all_pages(void)
>>   {
>> -     on_each_cpu(drain_local_pages, NULL, 1);
>> +     int cpu;
>> +     struct per_cpu_pageset *pcp;
>> +     struct zone *zone;
>> +
>
> get_online_cpu() ?

I believe this is not needed here as on_each_cpu_mask() (smp_call_function_many
really) later masks the cpumask with the online cpus, so at worst we
are turning on or off
a meaningless bit.

Anyway, If I'm wrong someone should fix show_free_areas() as well :-)

>
>> +     for_each_online_cpu(cpu)
>> +             for_each_populated_zone(zone) {
>> +                     pcp = per_cpu_ptr(zone->pageset, cpu);
>> +                     if (pcp->pcp.count)
>> +                             cpumask_set_cpu(cpu, cpus_with_pcps);
>> +                     else
>> +                             cpumask_clear_cpu(cpu, cpus_with_pcps);
>
> cpumask* functions can't be used locklessly?

I'm not sure I understand your question correctly. As far as I
understand, cpumask_set_cpu and cpumask_clear_cpu
are atomic operations that do not require a lock (they might be
implemented using one though).
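
For reference, a rough sketch of how those two look in
include/linux/cpumask.h at this point (abridged); both reduce to atomic
bitops:

static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
{
	set_bit(cpumask_check(cpu), cpumask_bits(dstp));
}

static inline void cpumask_clear_cpu(int cpu, struct cpumask *dstp)
{
	clear_bit(cpumask_check(cpu), cpumask_bits(dstp));
}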

Thanks!
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-03 18:58     ` Gilad Ben-Yossef
@ 2012-01-03 22:02       ` KOSAKI Motohiro
  0 siblings, 0 replies; 37+ messages in thread
From: KOSAKI Motohiro @ 2012-01-03 22:02 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin,
	Rik van Riel, Andi Kleen, Mel Gorman, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

(1/3/12 1:58 PM), Gilad Ben-Yossef wrote:
> 2012/1/3 KOSAKI Motohiro<kosaki.motohiro@gmail.com>:
>> (1/2/12 5:24 AM), Gilad Ben-Yossef wrote:
>>> Calculate a cpumask of CPUs with per-cpu pages in any zone
>>> and only send an IPI requesting CPUs to drain these pages
>>> to the buddy allocator if they actually have pages when
>>> asked to flush.
>>>
>>> This patch saves 99% of IPIs asking to drain per-cpu
>>> pages in case of severe memory preassure that leads
>>> to OOM since in these cases multiple, possibly concurrent,
>>> allocation requests end up in the direct reclaim code
>>> path so when the per-cpu pages end up reclaimed on first
>>> allocation failure for most of the proceeding allocation
>>> attempts until the memory pressure is off (possibly via
>>> the OOM killer) there are no per-cpu pages on most CPUs
>>> (and there can easily be hundreds of them).
>>>
>>> This also has the side effect of shortening the average
>>> latency of direct reclaim by 1 or more order of magnitude
>>> since waiting for all the CPUs to ACK the IPI takes a
>>> long time.
>>>
>>> Tested by running "hackbench 400" on a 4 CPU x86 otherwise
>>> idle VM and observing the difference between the number
>>> of direct reclaim attempts that end up in drain_all_pages()
>>> and those were more then 1/2 of the online CPU had any
>>> per-cpu page in them, using the vmstat counters introduced
>>> in the next patch in the series and using proc/interrupts.
>>>
>>> In the test sceanrio, this saved around 500 global IPIs.
>>> After trigerring an OOM:
>>>
>>> $ cat /proc/vmstat
>>> ...
>>> pcp_global_drain 627
>>> pcp_global_ipi_saved 578
>>>
>>> I've also seen the number of drains reach 15k calls
>>> with the saved percentage reaching 99% when there
>>> are more tasks running during an OOM kill.
>>>
>>> Signed-off-by: Gilad Ben-Yossef<gilad@benyossef.com>
>>> Acked-by: Christoph Lameter<cl@linux.com>
>>> CC: Chris Metcalf<cmetcalf@tilera.com>
>>> CC: Peter Zijlstra<a.p.zijlstra@chello.nl>
>>> CC: Frederic Weisbecker<fweisbec@gmail.com>
>>> CC: Russell King<linux@arm.linux.org.uk>
>>> CC: linux-mm@kvack.org
>>> CC: Pekka Enberg<penberg@kernel.org>
>>> CC: Matt Mackall<mpm@selenic.com>
>>> CC: Sasha Levin<levinsasha928@gmail.com>
>>> CC: Rik van Riel<riel@redhat.com>
>>> CC: Andi Kleen<andi@firstfloor.org>
>>> CC: Mel Gorman<mel@csn.ul.ie>
>>> CC: Andrew Morton<akpm@linux-foundation.org>
>>> CC: Alexander Viro<viro@zeniv.linux.org.uk>
>>> CC: linux-fsdevel@vger.kernel.org
>>> CC: Avi Kivity<avi@redhat.com>
>>> ---
>>>    Christopth Ack was for a previous version that allocated
>>>    the cpumask in drain_all_pages().
>>
>> When you changed a patch design and implementation, ACKs are
>> should be dropped. otherwise you miss to chance to get a good
>> review.
>>
>
> Got you. Thanks for the review :-)
>>
>>
>>>    mm/page_alloc.c |   26 +++++++++++++++++++++++++-
>>>    1 files changed, 25 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 2b8ba3a..092c331 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -67,6 +67,14 @@ DEFINE_PER_CPU(int, numa_node);
>>>    EXPORT_PER_CPU_SYMBOL(numa_node);
>>>    #endif
>>>
>>> +/*
>>> + * A global cpumask of CPUs with per-cpu pages that gets
>>> + * recomputed on each drain. We use a global cpumask
>>> + * for to avoid allocation on direct reclaim code path
>>> + * for CONFIG_CPUMASK_OFFSTACK=y
>>> + */
>>> +static cpumask_var_t cpus_with_pcps;
>>> +
>>>    #ifdef CONFIG_HAVE_MEMORYLESS_NODES
>>>    /*
>>>     * N.B., Do NOT reference the '_numa_mem_' per cpu variable directly.
>>> @@ -1119,7 +1127,19 @@ void drain_local_pages(void *arg)
>>>     */
>>>    void drain_all_pages(void)
>>>    {
>>> -     on_each_cpu(drain_local_pages, NULL, 1);
>>> +     int cpu;
>>> +     struct per_cpu_pageset *pcp;
>>> +     struct zone *zone;
>>> +
>>
>> get_online_cpu() ?
>
> I believe this is not needed here as on_each_cpu_mask() (smp_call_function_many
> really) later masks the cpumask with the online cpus, so at worst we
> are turning on or off
> a meaningless bit.

You are right. This function can't call get_online_cpus(), and a cpu unplug
event automatically drops the pcps, so no worry.


>
> Anyway, If I'm wrong someone should fix show_free_areas() as well :-)
 >
>>> +     for_each_online_cpu(cpu)
>>> +             for_each_populated_zone(zone) {
>>> +                     pcp = per_cpu_ptr(zone->pageset, cpu);
>>> +                     if (pcp->pcp.count)
>>> +                             cpumask_set_cpu(cpu, cpus_with_pcps);
>>> +                     else
>>> +                             cpumask_clear_cpu(cpu, cpus_with_pcps);
>>
>> cpumask* functions can't be used locklessly?
>
> I'm not sure I understand your question ocrrectly. As far as I
> understand cpumask_set_cpu and cpumask_set_cpu
> are atomic operations that do not require a lock (they might be
> implemented using one though).

Ahh, yup. right you are.


Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
  2012-01-02 10:24 ` [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
  2012-01-03  7:51   ` Michal Nazarewicz
@ 2012-01-03 22:26   ` Andrew Morton
  2012-01-05 13:17     ` Michal Nazarewicz
  2012-01-08 16:04     ` Gilad Ben-Yossef
  1 sibling, 2 replies; 37+ messages in thread
From: Andrew Morton @ 2012-01-03 22:26 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel,
	Andi Kleen, Sasha Levin, Mel Gorman, Alexander Viro,
	linux-fsdevel, Avi Kivity

On Mon,  2 Jan 2012 12:24:12 +0200
Gilad Ben-Yossef <gilad@benyossef.com> wrote:

> on_each_cpu_mask calls a function on processors specified my cpumask,
> which may include the local processor.
> 
> All the limitation specified in smp_call_function_many apply.
> 
> ...
>
> --- a/include/linux/smp.h
> +++ b/include/linux/smp.h
> @@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
>  int on_each_cpu(smp_call_func_t func, void *info, int wait);
>  
>  /*
> + * Call a function on processors specified by mask, which might include
> + * the local one.
> + */
> +void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
> +		void *info, bool wait);
> +
> +/*
>   * Mark the boot cpu "online" so that it can call console drivers in
>   * printk() and can access its per-cpu storage.
>   */
> @@ -132,6 +139,15 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
>  		local_irq_enable();		\
>  		0;				\
>  	})
> +#define on_each_cpu_mask(mask, func, info, wait) \
> +	do {						\
> +		if (cpumask_test_cpu(0, (mask))) {	\
> +			local_irq_disable();		\
> +			(func)(info);			\
> +			local_irq_enable();		\
> +		}					\
> +	} while (0)

Why is the cpumask_test_cpu() call there?  It's hard to think of a
reason why "mask" would specify any CPU other than "0" in a
uniprocessor kernel.

If this code remains as-is, please add a comment here explaining this,
so others don't wonder the same thing.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func
  2012-01-02 10:24 ` [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func Gilad Ben-Yossef
@ 2012-01-03 22:34   ` Andrew Morton
  2012-01-08 16:09     ` Gilad Ben-Yossef
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2012-01-03 22:34 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Chris Metcalf, Christoph Lameter, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Alexander Viro, linux-fsdevel, Avi Kivity

On Mon,  2 Jan 2012 12:24:15 +0200
Gilad Ben-Yossef <gilad@benyossef.com> wrote:

> Add the on_each_cpu_required() function that wraps on_each_cpu_mask()
> and calculates the cpumask of cpus to IPI by calling a function supplied
> as a parameter in order to determine whether to IPI each specific cpu.

The name is actually "on_each_cpu_cond".

> The function deals with allocation failure of cpumask variable in
> CONFIG_CPUMASK_OFFSTACK=y by sending IPI to all cpus via on_each_cpu()
> instead.

This seems rather dangerous.  People will test and ship code which has
always called only the targeted CPUs.  Later, real live users will get
the occasional memory exhaustion and will end up calling the callback
function on CPUs which aren't supposed to be used.  So users end up
running untested code.  And it's code which could quite easily explode,
because inattentive programmers could fall into assuming that the
function is not called on incorrect CPUs.  I think this is easy to fix
(see below).

> The function is useful since it allows to seperate the specific
> code that decided in each case whether to IPI a specific cpu for
> a specific request from the common boilerplate code of handling
> creating the mask, handling failures etc.
> 
>
> ...
>
> @@ -147,6 +155,14 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
>  			local_irq_enable();		\
>  		}					\
>  	} while (0)
> +#define on_each_cpu_cond(cond_func, func, info, wait) \
> +	do {						\
> +		if (cond_func(0, info)) {		\

I suppose this is reasonable.  It's likely that on UP, cond_func() will
always return true but perhaps for some reason it won't.  hmmm...

> +			local_irq_disable();		\
> +			(func)(info);			\
> +			local_irq_enable();		\
> +		}					\
> +	} while (0)
>  
>  static inline void smp_send_reschedule(int cpu) { }
>  #define num_booting_cpus()			1
> diff --git a/kernel/smp.c b/kernel/smp.c
> index 7c0cbd7..5f7b24e 100644
> --- a/kernel/smp.c
> +++ b/kernel/smp.c
> @@ -721,3 +721,30 @@ void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
>  	put_cpu();
>  }
>  EXPORT_SYMBOL(on_each_cpu_mask);
> +
> +/*
> + * Call a function on each processor for which the supplied function
> + * cond_func returns a positive value. This may include the local
> + * processor, optionally waiting for all the required CPUs to finish.
> + * The function may be called on all online CPUs without running the
> + * cond_func function in extreme circumstance (memory allocation
> + * failure condition when CONFIG_CPUMASK_OFFSTACK=y)
> + * All the limitations specified in smp_call_function_many apply.
> + */
> +void on_each_cpu_cond(int (*cond_func) (int cpu, void *info),
> +			void (*func)(void *), void *info, bool wait)
> +{
> +	cpumask_var_t cpus;
> +	int cpu;
> +
> +	if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
> +		for_each_online_cpu(cpu)
> +			if (cond_func(cpu, info))
> +				cpumask_set_cpu(cpu, cpus);
> +		on_each_cpu_mask(cpus, func, info, wait);
> +		free_cpumask_var(cpus);
> +	} else
> +		on_each_cpu(func, info, wait);
> +}
> +EXPORT_SYMBOL(on_each_cpu_cond);

If zalloc_cpumask_var() fails, can we not fall back to

		for_each_online_cpu(cpu)
			if (cond_func(cpu, info))
				smp_call_function_single(...);
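
Fleshed out, the fallback Andrew is suggesting might look roughly like
this (a sketch only; the helper name is invented, and the
(cpu, func, info, wait) argument order is taken from the
smp_call_function_single() prototype):

	/*
	 * Hypothetical allocation-failure fallback for on_each_cpu_cond():
	 * rather than IPIing every online CPU via on_each_cpu(), test each
	 * CPU with cond_func and IPI only the ones that need it, one at a
	 * time.  smp_call_function_single() already runs func locally (with
	 * IRQs disabled) when cpu happens to be the current CPU.
	 */
	static void on_each_cpu_cond_fallback(int (*cond_func)(int cpu, void *info),
					      void (*func)(void *), void *info,
					      bool wait)
	{
		int cpu;

		for_each_online_cpu(cpu)
			if (cond_func(cpu, info))
				smp_call_function_single(cpu, func, info, wait);
	}

Compared with the cpumask-based path this trades one multicast IPI for
per-CPU ones, but it preserves the "only IPI CPUs that need it" behaviour
even when the cpumask allocation fails.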


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
  2012-01-03 22:26   ` Andrew Morton
@ 2012-01-05 13:17     ` Michal Nazarewicz
  2012-01-08 16:04     ` Gilad Ben-Yossef
  1 sibling, 0 replies; 37+ messages in thread
From: Michal Nazarewicz @ 2012-01-05 13:17 UTC (permalink / raw)
  To: Gilad Ben-Yossef, Andrew Morton
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel,
	Andi Kleen, Sasha Levin, Mel Gorman, Alexander Viro,
	linux-fsdevel, Avi Kivity

On Tue, 03 Jan 2012 23:26:24 +0100, Andrew Morton <akpm@linux-foundation.org> wrote:

> On Mon,  2 Jan 2012 12:24:12 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>> on_each_cpu_mask calls a function on processors specified by cpumask,
>> which may include the local processor.

>> @@ -132,6 +139,15 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
>>  		local_irq_enable();		\
>>  		0;				\
>>  	})
>> +#define on_each_cpu_mask(mask, func, info, wait) \
>> +	do {						\
>> +		if (cpumask_test_cpu(0, (mask))) {	\
>> +			local_irq_disable();		\
>> +			(func)(info);			\
>> +			local_irq_enable();		\
>> +		}					\
>> +	} while (0)
>
> Why is the cpumask_test_cpu() call there?  It's hard to think of a
> reason why "mask" would specify any CPU other than "0" in a
> uniprocessor kernel.

It may specify none.  For instance, in the drain_all_pages() case, if the
CPU has no pages on its PCP lists, the mask will be empty and so
cpumask_test_cpu() will return zero.
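
As a concrete (and entirely hypothetical) illustration of that point, a
caller on a UP kernel can build a mask that ends up empty, in which case
the stub must not invoke the function at all:

	#include <linux/cpumask.h>
	#include <linux/gfp.h>
	#include <linux/smp.h>

	static void example_drain(void *info)
	{
		/* per-cpu work would go here */
	}

	/*
	 * Sketch only: cpu0_has_work stands in for a real predicate such
	 * as "this CPU has pages on its PCP lists".
	 */
	static void example_conditional_ipi(bool cpu0_has_work)
	{
		cpumask_var_t mask;

		if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
			return;

		if (cpu0_has_work)
			cpumask_set_cpu(0, mask);

		/*
		 * With an empty mask, the UP stub's cpumask_test_cpu(0, mask)
		 * check is what keeps example_drain() from being called.
		 */
		on_each_cpu_mask(mask, example_drain, NULL, 1);

		free_cpumask_var(mask);
	}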

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-03 17:45   ` KOSAKI Motohiro
  2012-01-03 18:58     ` Gilad Ben-Yossef
@ 2012-01-05 14:20     ` Mel Gorman
  2012-01-05 14:40       ` Russell King - ARM Linux
  1 sibling, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2012-01-05 14:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Gilad Ben-Yossef, linux-kernel, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

On Tue, Jan 03, 2012 at 12:45:45PM -0500, KOSAKI Motohiro wrote:
> >   void drain_all_pages(void)
> >   {
> > -	on_each_cpu(drain_local_pages, NULL, 1);
> > +	int cpu;
> > +	struct per_cpu_pageset *pcp;
> > +	struct zone *zone;
> > +
> 
> get_online_cpu() ?
> 

Just a separate note;

I'm looking at some mysterious CPU hotplug problems that only happen
under heavy load. My strongest suspicion at the moment is that the problem
is related to on_each_cpu() being used without get_online_cpus(), but you
cannot simply call get_online_cpus() in this path without causing
deadlock.

If/when I get a patch that can complete a CPU hotplug stress test
successfully, I'll post it. It'll collide with this series but it should
be manageable.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 14:20     ` Mel Gorman
@ 2012-01-05 14:40       ` Russell King - ARM Linux
  2012-01-05 15:24         ` Peter Zijlstra
  2012-01-05 16:17         ` Mel Gorman
  0 siblings, 2 replies; 37+ messages in thread
From: Russell King - ARM Linux @ 2012-01-05 14:40 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Gilad Ben-Yossef, linux-kernel, Chris Metcalf,
	Peter Zijlstra, Frederic Weisbecker, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, Jan 05, 2012 at 02:20:17PM +0000, Mel Gorman wrote:
> On Tue, Jan 03, 2012 at 12:45:45PM -0500, KOSAKI Motohiro wrote:
> > >   void drain_all_pages(void)
> > >   {
> > > -	on_each_cpu(drain_local_pages, NULL, 1);
> > > +	int cpu;
> > > +	struct per_cpu_pageset *pcp;
> > > +	struct zone *zone;
> > > +
> > 
> > get_online_cpu() ?
> > 
> 
> Just a separate note;
> 
> I'm looking at some mysterious CPU hotplug problems that only happen
> under heavy load. My strongest suspicion at the moment that the problem
> is related to on_each_cpu() being used without get_online_cpu() but you
> cannot simply call get_online_cpu() in this path without causing
> deadlock.

Mel,

That's a known hotplug problem.  PeterZ has a patch which (probably)
solves it, but there seems to be very little traction of any kind to
merge it.  I've been chasing that patch and getting no replies
whatsoever from folk like Peter, Thomas and Ingo.

The problem affects all IPI-raising functions, which mask with
cpu_online_mask directly.

I'm not sure that smp_call_function() can use get_online_cpus() as it
looks like it's not permitted to sleep (it spins in csd_lock_wait if
it is to wait for the called function to complete on all CPUs,
rather than using a sleepable completion).  get_online_cpus() solves
the online mask problem by sleeping until it's safe to access it.
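
For reference, the spin referred to above looks roughly like this in
kernel/smp.c of that era (paraphrased, so treat the exact form as an
approximation rather than a quote):

	/*
	 * The caller of smp_call_function_single(..., wait) busy-waits for
	 * the target CPU to clear CSD_FLAG_LOCK; there is no point at which
	 * it could sleep, e.g. inside get_online_cpus().
	 */
	static void csd_lock_wait(struct call_single_data *data)
	{
		while (data->flags & CSD_FLAG_LOCK)
			cpu_relax();
	}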

So, I think this whole CPU bringup mess needs to be re-thought, and
the seemingly constant need to pile more and more restrictions onto the
bringup path needs resolving.  It's got to the point where there are
so many restrictions that it's actually impossible for arch code to
simultaneously satisfy them all.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 14:40       ` Russell King - ARM Linux
@ 2012-01-05 15:24         ` Peter Zijlstra
  2012-01-05 16:17         ` Mel Gorman
  1 sibling, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2012-01-05 15:24 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mel Gorman, KOSAKI Motohiro, Gilad Ben-Yossef, linux-kernel,
	Chris Metcalf, Frederic Weisbecker, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, 2012-01-05 at 14:40 +0000, Russell King - ARM Linux wrote:
> I've been chasing that patch and getting no replies what so
> ever from folk like Peter, Thomas and Ingo. 

Holidays etc.  I _think_ the patch is good, but would really like
someone else to verify; it's too simple to be right :-)

Thomas said he'd bend his brain around it, but he's been having holidays
as well and isn't back from them afaik.

As for completely reworking the whole hotplug crap, I'd fully support
that; there's a lot of duplication in the arch code that should be
generic code. Also lots of different ways to solve the same problem
etc., and lots of different bugs too, I bet.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-02 10:24 ` [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
  2012-01-03 17:45   ` KOSAKI Motohiro
@ 2012-01-05 15:54   ` Mel Gorman
  2012-01-08 16:01     ` Gilad Ben-Yossef
  1 sibling, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2012-01-05 15:54 UTC (permalink / raw)
  To: Gilad Ben-Yossef
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin,
	Rik van Riel, Andi Kleen, Andrew Morton, Alexander Viro,
	linux-fsdevel, Avi Kivity

On Mon, Jan 02, 2012 at 12:24:18PM +0200, Gilad Ben-Yossef wrote:
> <SNIP>
> This patch saves 99% of the IPIs asking to drain per-cpu
> pages in cases of severe memory pressure that lead
> to OOM, since in these cases multiple, possibly concurrent,
> allocation requests end up in the direct reclaim code
> path. The per-cpu pages end up reclaimed on the first
> allocation failure, so for most of the subsequent allocation
> attempts, until the memory pressure is off (possibly via
> the OOM killer), there are no per-cpu pages on most CPUs
> (and there can easily be hundreds of them).
> 

Ok. I also noticed this independently within the last day while
investigating a CPU hotplug problem. Specifically, in low memory situations
(not necessarily OOM) a number of processes hit direct reclaim at
the same time and drain at the same time, so there were multiple IPIs
draining the lists, of which only the first one had useful work to do.
The workload in this case was a large number of kernel compiles but
I suspect any fork-heavy workload doing order-1 allocations under
memory pressure encounters this.

> <SNIP>
> Tested by running "hackbench 400" on a 4 CPU x86 otherwise
> idle VM and observing the difference between the number
> of direct reclaim attempts that end up in drain_all_pages()
> and those where more than 1/2 of the online CPUs had any
> per-cpu page in them, using the vmstat counters introduced
> in the next patch in the series and using /proc/interrupts.
>
> In the test scenario, this saved around 500 global IPIs.
> After triggering an OOM:
> 
> $ cat /proc/vmstat
> ...
> pcp_global_drain 627
> pcp_global_ipi_saved 578
> 

This isn't the 99% saving you claim earlier (578 of 627 global drains
avoiding the IPI is roughly 92%) but it is still a great improvement.

Thanks for doing the stats. Just to be clear, I didn't expect these
stats to be merged, nor do I want them to. I wanted to be sure the patch
was really behaving as advertised.

Acked-by: Mel Gorman <mgorman@suse.de>


> +	for_each_online_cpu(cpu)
> +		for_each_populated_zone(zone) {
> +			pcp = per_cpu_ptr(zone->pageset, cpu);
> +			if (pcp->pcp.count)
> +				cpumask_set_cpu(cpu, cpus_with_pcps);
> +			else
> +				cpumask_clear_cpu(cpu, cpus_with_pcps);
> +		}
> +	on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);

As a heads-up, I'm looking at a candidate CPU hotplug patch that almost
certainly will collide with this patch. If/when I get it fixed, I'll be
sure to CC you so we can figure out what order the patches need to go
in. Ordinarily it wouldn't matter but if this really is a CPU hotplug
fix, it might also be a -stable candidate so it would need to go in
before your patches.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 14:40       ` Russell King - ARM Linux
  2012-01-05 15:24         ` Peter Zijlstra
@ 2012-01-05 16:17         ` Mel Gorman
  2012-01-05 16:35           ` Russell King - ARM Linux
                             ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Mel Gorman @ 2012-01-05 16:17 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: KOSAKI Motohiro, Gilad Ben-Yossef, linux-kernel, Chris Metcalf,
	Peter Zijlstra, Frederic Weisbecker, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, Jan 05, 2012 at 02:40:11PM +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 05, 2012 at 02:20:17PM +0000, Mel Gorman wrote:
> > On Tue, Jan 03, 2012 at 12:45:45PM -0500, KOSAKI Motohiro wrote:
> > > >   void drain_all_pages(void)
> > > >   {
> > > > -	on_each_cpu(drain_local_pages, NULL, 1);
> > > > +	int cpu;
> > > > +	struct per_cpu_pageset *pcp;
> > > > +	struct zone *zone;
> > > > +
> > > 
> > > get_online_cpu() ?
> > > 
> > 
> > Just a separate note;
> > 
> > I'm looking at some mysterious CPU hotplug problems that only happen
> > under heavy load. My strongest suspicion at the moment that the problem
> > is related to on_each_cpu() being used without get_online_cpu() but you
> > cannot simply call get_online_cpu() in this path without causing
> > deadlock.
> 
> Mel,
> 
> That's a known hotplug problems.  PeterZ has a patch which (probably)
> solves it, but there seems to be very little traction of any kind to
> merge it. 

Link please? I'm including a patch below under development that is
intended to only cope with the page allocator case under heavy memory
pressure. Currently it does not pass testing because eventually RCU
gets stalled with the following trace

[ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
[ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
[ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
[ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
[ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
[ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
[ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
[ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
[ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
[ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
[ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
[ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
[ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
[ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
[ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
[ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
[ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b

It might be a separate bug, don't know for sure.

> I've been chasing that patch and getting no replies what so
> ever from folk like Peter, Thomas and Ingo.
> 
> The problem affects all IPI-raising functions, which mask with
> cpu_online_mask directly.
> 

Actually, in one sense I'm glad to hear it because from my brief
poking around, I was having trouble understanding why we were always
safe from sending IPIs to CPUs in the process of being offlined.

> I'm not sure that smp_call_function() can use get_online_cpu() as it
> looks like it's not permitted to sleep (it spins in csd_lock_wait if
> it is to wait for the called function to complete on all CPUs,
> rather than using a sleepable completion.)  get_online_cpu() solves
> the online mask problem by sleeping until it's safe to access it.
> 

Yeah, although from the context of the page allocator, calling
get_online_cpus() is not safe because it can deadlock kthreadd.

In the interest of comparing with PeterZ's patch, here is the patch I'm
currently looking at. It has not passed testing yet. I suspect it'll be
met with hatred but it will at least highlight some of the problems I've
seen recently (which apparently are not new)

Gilad, I expect this patch to collide with yours but I also expect
yours could be based on top of it if necessary. There is also the
side-effect that this patch should reduce the number of IPIs sent by the
page allocator under memory pressure.

==== CUT HERE ====
mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists

While running a CPU hotplug stress test under memory pressure, I
saw cases where under enough stress the machine would halt although
it required a machine with 8 cores and plenty of memory. I think the
problems may be related.

Part of the problem is the page allocator is sending IPIs using
on_each_cpu() without calling get_online_cpus() to prevent changes
to the online cpumask. This allows IPIs to be sent to CPUs that
are going offline or are already offline.

Adding just a call to get_online_cpus() is not enough, as kthreadd
could block on the cpu_hotplug mutex while another process is blocked
with the mutex held, waiting for kthreadd to make forward progress,
leading to deadlock. Additionally, it is important that the cpu_hotplug
mutex does not become a new hot lock while under pressure.  There is
also the consideration that CPU hotplug expects get_online_cpus()
not to be called frequently, as it can lead to livelock in exceptional
circumstances (see the comment above cpu_hotplug_begin()).

Hence, this patch adds a try_get_online_cpus() function used
by the page allocator to only acquire the mutex and elevate the
hotplug reference count when uncontended. This ensures the CPU mask
is valid when sending an IPI to drain all pages while avoiding
hammering cpu_hotplug mutex or potentially deadlocking kthreadd.
As a side-effect the number of IPIs sent while under memory pressure
is reduced.

Not-signed-off
--- 
 include/linux/cpu.h |    2 ++
 kernel/cpu.c        |   26 ++++++++++++++++++++++++++
 mm/page_alloc.c     |   22 ++++++++++++++++++----
 3 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 5f09323..9ac5c27 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -133,6 +133,7 @@ extern struct sysdev_class cpu_sysdev_class;
 /* Stop CPUs going up and down. */
 
 extern void get_online_cpus(void);
+extern bool try_get_online_cpus(void);
 extern void put_online_cpus(void);
 #define hotcpu_notifier(fn, pri)	cpu_notifier(fn, pri)
 #define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
@@ -156,6 +157,7 @@ static inline void cpu_hotplug_driver_unlock(void)
 
 #define get_online_cpus()	do { } while (0)
 #define put_online_cpus()	do { } while (0)
+#define try_get_online_cpus()	true
 #define hotcpu_notifier(fn, pri)	do { (void)(fn); } while (0)
 /* These aren't inline functions due to a GCC bug. */
 #define register_hotcpu_notifier(nb)	({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa39dd7..a90422f 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -70,6 +70,32 @@ void get_online_cpus(void)
 }
 EXPORT_SYMBOL_GPL(get_online_cpus);
 
+/*
+ * This differs from get_online_cpus() in that it tries to get uncontended
+ * access to the online CPU mask. Principally this is used by the page
+ * allocator to avoid hammering on the cpu_hotplug mutex and to limit the
+ * number of IPIs it is sending.
+ */
+bool try_get_online_cpus(void)
+{
+	bool contention_free = false;
+	might_sleep();
+	if (cpu_hotplug.refcount)
+		return false;
+
+	if (cpu_hotplug.active_writer == current)
+		return true;
+
+	mutex_lock(&cpu_hotplug.lock);
+	if (!cpu_hotplug.refcount) {
+		contention_free = true;
+		cpu_hotplug.refcount++;
+	}
+	mutex_unlock(&cpu_hotplug.lock);
+
+	return contention_free;
+}
+
 void put_online_cpus(void)
 {
 	if (cpu_hotplug.active_writer == current)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e684e6b..7f75cab 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,6 +57,7 @@
 #include <linux/ftrace_event.h>
 #include <linux/memcontrol.h>
 #include <linux/prefetch.h>
+#include <linux/kthread.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -1129,7 +1130,18 @@ void drain_local_pages(void *arg)
  */
 void drain_all_pages(void)
 {
+	get_online_cpus();
 	on_each_cpu(drain_local_pages, NULL, 1);
+	put_online_cpus();
+}
+
+static bool try_drain_all_pages(void)
+{
+	if (!try_get_online_cpus())
+		return false;
+	on_each_cpu(drain_local_pages, NULL, 1);
+	put_online_cpus();
+	return true;
 }
 
 #ifdef CONFIG_HIBERNATION
@@ -2026,11 +2038,13 @@ retry:
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists. Drain them and try again
+	 * pages are pinned on the per-cpu lists. Drain them and try again.
+	 * kthreadd cannot drain all pages as the current holder of the
+	 * cpu_hotplug mutex could be waiting for kthreadd to make forward
+	 * progress.
 	 */
-	if (!page && !drained) {
-		drain_all_pages();
-		drained = true;
+	if (!page && !drained && current != kthreadd_task) {
+		drained = try_drain_all_pages();
 		goto retry;
 	}
 


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 16:17         ` Mel Gorman
@ 2012-01-05 16:35           ` Russell King - ARM Linux
  2012-01-05 18:35             ` Paul E. McKenney
  2012-01-05 22:06           ` Andrew Morton
  2012-01-07 16:52           ` Paul E. McKenney
  2 siblings, 1 reply; 37+ messages in thread
From: Russell King - ARM Linux @ 2012-01-05 16:35 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Gilad Ben-Yossef, linux-kernel, Chris Metcalf,
	Peter Zijlstra, Frederic Weisbecker, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Andrew Morton, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> Link please?

Forwarded, as its still in my mailbox.

> I'm including a patch below under development that is
> intended to only cope with the page allocator case under heavy memory
> pressure. Currently it does not pass testing because eventually RCU
> gets stalled with the following trace
> 
> [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
> [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> 
> It might be a separate bug, don't know for sure.

I'm not going to even pretend to understand what the above backtrace
means: it doesn't look like what I'd expect from the problem which
PeterZ's patch is supposed to address.  It certainly doesn't do anything
to address the cpu-going-offline problem you seem to have found.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 16:35           ` Russell King - ARM Linux
@ 2012-01-05 18:35             ` Paul E. McKenney
  2012-01-05 22:21               ` Mel Gorman
  0 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2012-01-05 18:35 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Mel Gorman, KOSAKI Motohiro, Gilad Ben-Yossef, linux-kernel,
	Chris Metcalf, Peter Zijlstra, Frederic Weisbecker, linux-mm,
	Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Andrew Morton, Alexander Viro, linux-fsdevel,
	Avi Kivity

On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> > Link please?
> 
> Forwarded, as its still in my mailbox.
> 
> > I'm including a patch below under development that is
> > intended to only cope with the page allocator case under heavy memory
> > pressure. Currently it does not pass testing because eventually RCU
> > gets stalled with the following trace
> > 
> > [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> > [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> > [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> > [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> > [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> > [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> > [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> > [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> > [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> > [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> > [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
> > [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> > [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> > [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> > [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> > [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> > [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> > 
> > It might be a separate bug, don't know for sure.

Do you get multiple RCU CPU stall-warning messages?  If so, it can
be helpful to look at how the stack frame changes over time.  These
stalls are normally caused by a loop in the kernel with preemption
disabled, though other scenarios can also cause them.

I am assuming that the CPU is reporting a stall on itself in this case.
If not, then it is necessary to look at the stack of the CPU that the
stall is being reported for.

							Thanx, Paul

> I'm not going to even pretend to understand what the above backtrace
> means: it doesn't look like what I'd expect from the problem which
> PeterZ's patch is supposed to address.  It certainly doesn't do anything
> to address the cpu-going-offline problem you seem to have found.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 16:17         ` Mel Gorman
  2012-01-05 16:35           ` Russell King - ARM Linux
@ 2012-01-05 22:06           ` Andrew Morton
  2012-01-05 22:31             ` Mel Gorman
  2012-01-07 16:52           ` Paul E. McKenney
  2 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2012-01-05 22:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, 5 Jan 2012 16:17:39 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> 
> While running a CPU hotplug stress test under memory pressure, I
> saw cases where under enough stress the machine would halt although
> it required a machine with 8 cores and plenty memory. I think the
> problems may be related.

When we first implemented them, the percpu pages in the page allocator
were of really really marginal benefit.  I didn't merge the patches at
all for several cycles, and it was eventually a 49/51 decision.

So I suggest that our approach to solving this particular problem
should be to nuke the whole thing, then see if that caused any
observable problems.  If it did, can we solve those problems by means
other than bringing the dang things back?


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 18:35             ` Paul E. McKenney
@ 2012-01-05 22:21               ` Mel Gorman
  2012-01-06  6:06                 ` Srivatsa S. Bhat
  2012-01-06 13:28                 ` Greg KH
  0 siblings, 2 replies; 37+ messages in thread
From: Mel Gorman @ 2012-01-05 22:21 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Andrew Morton, Alexander Viro, Greg KH,
	linux-fsdevel, Avi Kivity

(Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
suffering from recursive locking recently)

On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> > > Link please?
> > 
> > Forwarded, as its still in my mailbox.
> > 
> > > I'm including a patch below under development that is
> > > intended to only cope with the page allocator case under heavy memory
> > > pressure. Currently it does not pass testing because eventually RCU
> > > gets stalled with the following trace
> > > 
> > > [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> > > [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> > > [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> > > [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> > > [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> > > [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> > > [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> > > [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> > > [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> > > [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> > > [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
> > > [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> > > [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> > > [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> > > [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> > > [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> > > [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> > > 
> > > It might be a separate bug, don't know for sure.
> 

I rebased the patch on top of 3.2 and tested again with a bunch of
debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
hotplug is a lot more reliable and less likely to hang but eventually
gets into trouble.

Taking a closer look though, I don't think this is an RCU problem. It's
just the messenger.

> Do you get multiple RCU CPU stall-warning messages? 

Yes, one roughly every 50000 jiffies or so (HZ=250).

[  878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
[  878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
[ 1072.878669] INFO: rcu_sched detected stall on CPU 3 (t=65030 jiffies)
[ 1072.878672] INFO: rcu_sched detected stall on CPU 6 (t=65030 jiffies)
[ 1267.442308] INFO: rcu_sched detected stall on CPU 3 (t=113810 jiffies)
[ 1267.442312] INFO: rcu_sched detected stall on CPU 6 (t=113810 jiffies)
[ 1462.005948] INFO: rcu_sched detected stall on CPU 3 (t=162590 jiffies)
[ 1462.005952] INFO: rcu_sched detected stall on CPU 6 (t=162590 jiffies)
[ 1656.569588] INFO: rcu_sched detected stall on CPU 3 (t=211370 jiffies)
[ 1656.569592] INFO: rcu_sched detected stall on CPU 6 (t=211370 jiffies)
[ 1851.133229] INFO: rcu_sched detected stall on CPU 6 (t=260150 jiffies)
[ 1851.133233] INFO: rcu_sched detected stall on CPU 3 (t=260150 jiffies)
[ 2045.696868] INFO: rcu_sched detected stall on CPU 3 (t=308930 jiffies)
[ 2045.696872] INFO: rcu_sched detected stall on CPU 6 (t=308930 jiffies)
[ 2240.260508] INFO: rcu_sched detected stall on CPU 6 (t=357710 jiffies)
[ 2240.260511] INFO: rcu_sched detected stall on CPU 3 (t=357710 jiffies)

> If so, it can
> be helpful to look at how the stack frame changes over time.  These
> stalls are normally caused by a loop in the kernel with preemption
> disabled, though other scenarios can also cause them.
> 

The stacks are not changing much over time and start with this:

[  878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
[  878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
[  878.315036] Pid: 4422, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[  878.315037] Call Trace:
[  878.315038]  <IRQ>  [<ffffffff810a8b20>] __rcu_pending+0x8e/0x36c
[  878.315052]  [<ffffffff81071b9a>] ? tick_nohz_handler+0xdc/0xdc
[  878.315054]  [<ffffffff810a8f04>] rcu_check_callbacks+0x106/0x172
[  878.315056]  [<ffffffff810528e0>] update_process_times+0x3f/0x76
[  878.315058]  [<ffffffff81071c0a>] tick_sched_timer+0x70/0x9a
[  878.315060]  [<ffffffff8106654e>] __run_hrtimer+0xc7/0x157
[  878.315062]  [<ffffffff810667ec>] hrtimer_interrupt+0xba/0x18a
[  878.315065]  [<ffffffff8134fbad>] smp_apic_timer_interrupt+0x86/0x99
[  878.315067]  [<ffffffff8134dbf3>] apic_timer_interrupt+0x73/0x80
[  878.315068]  <EOI>  [<ffffffff81345f34>] ? retint_restore_args+0x13/0x13
[  878.315072]  [<ffffffff81139591>] ? __shrink_dcache_sb+0x7d/0x19f
[  878.315075]  [<ffffffff81008c6e>] ? native_read_tsc+0x1/0x16
[  878.315077]  [<ffffffff811df434>] ? delay_tsc+0x3a/0x82
[  878.315079]  [<ffffffff811df4a1>] __delay+0xf/0x11
[  878.315081]  [<ffffffff811e51e5>] do_raw_spin_lock+0xb5/0xf9
[  878.315083]  [<ffffffff81345561>] _raw_spin_lock+0x39/0x3d
[  878.315085]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[  878.315087]  [<ffffffff8113972a>] shrink_dcache_parent+0x77/0x28c
[  878.315089]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
[  878.315092]  [<ffffffff81185970>] sysfs_dentry_revalidate+0xaa/0xbe
[  878.315093]  [<ffffffff8112e731>] do_lookup+0x263/0x2fc
[  878.315096]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
[  878.315098]  [<ffffffff8112f33d>] link_path_walk+0x1e2/0x763
[  878.315099]  [<ffffffff8112fd66>] path_lookupat+0x5c/0x61a
[  878.315102]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
[  878.315104]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
[  878.315105]  [<ffffffff8113034e>] do_path_lookup+0x2a/0xa8
[  878.315107]  [<ffffffff81132a51>] user_path_at_empty+0x5d/0x97
[  878.315109]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[  878.315111]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[  878.315112]  [<ffffffff81132a9c>] user_path_at+0x11/0x13
[  878.315115]  [<ffffffff81128b64>] vfs_fstatat+0x44/0x71
[  878.315117]  [<ffffffff81128bef>] vfs_lstat+0x1e/0x20
[  878.315118]  [<ffffffff81128c10>] sys_newlstat+0x1f/0x40
[  878.315120]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
[  878.315122]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  878.315124]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[  878.315126]  [<ffffffff8134d082>] system_call_fastpath+0x16/0x1b
[  878.557790] Pid: 5704, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[  878.564226] Call Trace:
[  878.566677]  <IRQ>  [<ffffffff810a8b20>] __rcu_pending+0x8e/0x36c
[  878.572783]  [<ffffffff81071b9a>] ? tick_nohz_handler+0xdc/0xdc
[  878.578702]  [<ffffffff810a8f04>] rcu_check_callbacks+0x106/0x172
[  878.584794]  [<ffffffff810528e0>] update_process_times+0x3f/0x76
[  878.590798]  [<ffffffff81071c0a>] tick_sched_timer+0x70/0x9a
[  878.596459]  [<ffffffff8106654e>] __run_hrtimer+0xc7/0x157
[  878.601944]  [<ffffffff810667ec>] hrtimer_interrupt+0xba/0x18a
[  878.607778]  [<ffffffff8134fbad>] smp_apic_timer_interrupt+0x86/0x99
[  878.614129]  [<ffffffff8134dbf3>] apic_timer_interrupt+0x73/0x80
[  878.620134]  <EOI>  [<ffffffff81051e66>] ? run_timer_softirq+0x49/0x32a
[  878.626759]  [<ffffffff81139591>] ? __shrink_dcache_sb+0x7d/0x19f
[  878.632851]  [<ffffffff811df402>] ? delay_tsc+0x8/0x82
[  878.637988]  [<ffffffff811df4a1>] __delay+0xf/0x11
[  878.642778]  [<ffffffff811e51e5>] do_raw_spin_lock+0xb5/0xf9
[  878.648437]  [<ffffffff81345561>] _raw_spin_lock+0x39/0x3d
[  878.653920]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[  878.660186]  [<ffffffff8113972a>] shrink_dcache_parent+0x77/0x28c
[  878.666277]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
[  878.672107]  [<ffffffff81185970>] sysfs_dentry_revalidate+0xaa/0xbe
[  878.678372]  [<ffffffff8112e731>] do_lookup+0x263/0x2fc
[  878.683596]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
[  878.690207]  [<ffffffff8112f33d>] link_path_walk+0x1e2/0x763
[  878.695866]  [<ffffffff8112fd66>] path_lookupat+0x5c/0x61a
[  878.701350]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
[  878.706747]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
[  878.712145]  [<ffffffff8113034e>] do_path_lookup+0x2a/0xa8
[  878.717630]  [<ffffffff81132a51>] user_path_at_empty+0x5d/0x97
[  878.723463]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[  878.729295]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[  878.736080]  [<ffffffff81132a9c>] user_path_at+0x11/0x13
[  878.741391]  [<ffffffff81128b64>] vfs_fstatat+0x44/0x71
[  878.746616]  [<ffffffff81128bef>] vfs_lstat+0x1e/0x20
[  878.751668]  [<ffffffff81128c10>] sys_newlstat+0x1f/0x40
[  878.756981]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
[  878.763678]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  878.770116]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[  878.775949]  [<ffffffff8134d082>] system_call_fastpath+0x16/0x1b
[  908.769486] BUG: spinlock lockup on CPU#6, udevd/4422
[  908.774547]  lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4

Seeing that the owner was CPU 4, I found earlier in the log:

[  815.244051] BUG: spinlock lockup on CPU#4, udevd/5709
[  815.249103]  lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4
[  815.258430] Pid: 5709, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
[  815.264866] Call Trace:
[  815.267329]  [<ffffffff811e507d>] spin_dump+0x88/0x8d
[  815.272388]  [<ffffffff811e5206>] do_raw_spin_lock+0xd6/0xf9
[  815.278062]  [<ffffffff81345561>] ? _raw_spin_lock+0x39/0x3d
[  815.283720]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[  815.289986]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
[  815.296249]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
[  815.302080]  [<ffffffff81185970>] ? sysfs_dentry_revalidate+0xaa/0xbe
[  815.308515]  [<ffffffff8112e731>] ? do_lookup+0x263/0x2fc
[  815.313915]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
[  815.320524]  [<ffffffff8112f33d>] ? link_path_walk+0x1e2/0x763
[  815.326357]  [<ffffffff8112fd66>] ? path_lookupat+0x5c/0x61a
[  815.332014]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
[  815.337410]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
[  815.342807]  [<ffffffff8113034e>] ? do_path_lookup+0x2a/0xa8
[  815.348465]  [<ffffffff81132a51>] ? user_path_at_empty+0x5d/0x97
[  815.354474]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[  815.360303]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[  815.367085]  [<ffffffff81132a9c>] ? user_path_at+0x11/0x13
[  815.372569]  [<ffffffff81128b64>] ? vfs_fstatat+0x44/0x71
[  815.377965]  [<ffffffff81128bef>] ? vfs_lstat+0x1e/0x20
[  815.383192]  [<ffffffff81128c10>] ? sys_newlstat+0x1f/0x40
[  815.388676]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
[  815.395373]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[  815.401811]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
[  815.407642]  [<ffffffff8134d082>] ? system_call_fastpath+0x16/0x1b

The trace is not particularly useful, but it looks like the lock was
taken recursively even though the message doesn't say that.  If the
shrink_dcache_parent() entry is accurate, that corresponds to this:

static int select_parent(struct dentry * parent)
{
        struct dentry *this_parent;
        struct list_head *next;
        unsigned seq;
        int found = 0;
        int locked = 0;

        seq = read_seqbegin(&rename_lock);
again: 
        this_parent = parent;
        spin_lock(&this_parent->d_lock); <----- HERE

I'm not overly clear on how VFS locking is meant to work but it almost
looks as if the last reference to an inode is being dropped during a
sysfs path lookup. Is that meant to happen?

Judging by sysfs_dentry_revalidate() - possibly not. It looks like
we must have reached out_bad: and called shrink_dcache_parent() on a
dentry that was already locked by the running process. Not sure how
this could have happened - Greg, does this look familiar?

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 22:06           ` Andrew Morton
@ 2012-01-05 22:31             ` Mel Gorman
  2012-01-05 23:19               ` Andrew Morton
  0 siblings, 1 reply; 37+ messages in thread
From: Mel Gorman @ 2012-01-05 22:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, Jan 05, 2012 at 02:06:45PM -0800, Andrew Morton wrote:
> On Thu, 5 Jan 2012 16:17:39 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> > 
> > While running a CPU hotplug stress test under memory pressure, I
> > saw cases where under enough stress the machine would halt although
> > it required a machine with 8 cores and plenty memory. I think the
> > problems may be related.
> 
> When we first implemented them, the percpu pages in the page allocator
> were of really really marginal benefit.  I didn't merge the patches at
> all for several cycles, and it was eventually a 49/51 decision.
> 
> So I suggest that our approach to solving this particular problem
> should be to nuke the whole thing, then see if that caused any
> observeable problems.  If it did, can we solve those problems by means
> other than bringing the dang things back?
> 

Sounds drastic. It would be less controversial to replace this patch
with a version that calls get_online_cpus() in drain_all_pages() but
removes the call to drain_all_pages() from the page allocator on
the grounds that it is not safe against CPU hotplug, and to hell with the
slightly elevated allocation failure rates and stalls. That would avoid
the try_get_online_cpus() crappiness and be less complex.

If you really want to consider deleting the per-cpu allocator, maybe
it could be an LSF/MM topic? Personally I would be wary of deleting
it, but mostly because I lack regular access to the type of hardware
to evaluate whether it was safe to remove or not. Minimally, removing
the per-cpu allocator could make the zone lock very hot even though slub
probably makes it very hot already.

-- 
Mel Gorman
SUSE Labs


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 22:31             ` Mel Gorman
@ 2012-01-05 23:19               ` Andrew Morton
  2012-01-09 17:25                 ` Mel Gorman
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Morton @ 2012-01-05 23:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, 5 Jan 2012 22:31:06 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> On Thu, Jan 05, 2012 at 02:06:45PM -0800, Andrew Morton wrote:
> > On Thu, 5 Jan 2012 16:17:39 +0000
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> > > mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> > > 
> > > While running a CPU hotplug stress test under memory pressure, I
> > > saw cases where under enough stress the machine would halt although
> > > it required a machine with 8 cores and plenty memory. I think the
> > > problems may be related.
> > 
> > When we first implemented them, the percpu pages in the page allocator
> > were of really really marginal benefit.  I didn't merge the patches at
> > all for several cycles, and it was eventually a 49/51 decision.
> > 
> > So I suggest that our approach to solving this particular problem
> > should be to nuke the whole thing, then see if that caused any
> > observeable problems.  If it did, can we solve those problems by means
> > other than bringing the dang things back?
> > 
> 
> Sounds drastic.

Wrong thinking ;)

Simplifying the code should always be the initial proposal.  Adding
more complexity on top is the worst-case when-all-else-failed option. 
Yet we so often reach for that option first :(

> It would be less controversial to replace this patch
> with a version that calls get_online_cpu() in drain_all_pages() but
> remove the call to drain_all_pages() call from the page allocator on
> the grounds it is not safe against CPU hotplug and to hell with the
> slightly elevated allocation failure rates and stalls. That would avoid
> the try_get_online_cpus() crappiness and be less complex.

If we can come up with a reasonably simple patch which improves or even
fixes the problem then I suppose there is some value in that, as it
provides users of earlier kernels with something to backport if they
hit problems.

But the social downside of that is that everyone would shuffle off
towards other bright and shiny things and we'd be stuck with more
complexity piled on top of dubiously beneficial code.

> If you really want to consider deleting the per-cpu allocator, maybe
> it could be a LSF/MM topic?

eek, spare me.

Anyway, we couldn't discuss such a topic without data.  Such data would
be obtained by deleting the code and measuring the results.  Which is
what I just said ;)

> Personally I would be wary of deleting
> it but mostly because I lack regular access to the type of hardware
> to evaulate whether it was safe to remove or not. Minimally, removing
> the per-cpu allocator could make the zone lock very hot even though slub
> probably makes it very hot already.

Much of the testing of the initial code was done on mbligh's weirdass
NUMAq box: 32-way 386 NUMA which suffered really badly if there were
contention issues.  And even on that box, the code was marginal.  So
I'm hopeful that things will be similar on current machines.  Of
course, it's possible that calling patterns have changed in ways which
make the code more beneficial than it used to be.

But this all ties into my proposal yesterday to remove
mm/swap.c:lru_*_pvecs.  Most or all of the heavy one-page-at-a-time
code can pretty easily be converted to operate on batches of pages. 
Following on from that, it should be pretty simple to extend the
batching down into the page freeing.  Look at put_pages_list() and
weep.  And stuff like free_hot_cold_page_list() could easily free
the pages directly while batching the locking.

Page freeing should be relatively straightforward.  Batching page
allocation is hard in some cases (anonymous pagefaults).

Please do note that the above suggestions are only needed if removing
the pcp lists causes a problem!  It may not.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 22:21               ` Mel Gorman
@ 2012-01-06  6:06                 ` Srivatsa S. Bhat
  2012-01-06 10:46                   ` Mel Gorman
  2012-01-06 13:28                 ` Greg KH
  1 sibling, 1 reply; 37+ messages in thread
From: Srivatsa S. Bhat @ 2012-01-06  6:06 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Paul E. McKenney, Russell King - ARM Linux, KOSAKI Motohiro,
	Gilad Ben-Yossef, linux-kernel, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, linux-mm, Pekka Enberg, Matt Mackall,
	Sasha Levin, Rik van Riel, Andi Kleen, Andrew Morton,
	Alexander Viro, Greg KH, linux-fsdevel, Avi Kivity

On 01/06/2012 03:51 AM, Mel Gorman wrote:

> (Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
> suffering from recursive locking recently)
> 
> On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
>>> On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
>>>> Link please?
>>>
>>> Forwarded, as its still in my mailbox.
>>>
>>>> I'm including a patch below under development that is
>>>> intended to only cope with the page allocator case under heavy memory
>>>> pressure. Currently it does not pass testing because eventually RCU
>>>> gets stalled with the following trace
>>>>
>>>> [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
>>>> [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
>>>> [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
>>>> [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
>>>> [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
>>>> [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
>>>> [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
>>>> [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
>>>> [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
>>>> [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
>>>> [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
>>>> [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
>>>> [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
>>>> [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
>>>> [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
>>>> [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
>>>> [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
>>>>
>>>> It might be a separate bug, don't know for sure.
>>
> 
> I rebased the patch on top of 3.2 and tested again with a bunch of
> debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
> hotplug is a lot more reliable and less likely to hang but eventually
> gets into trouble.
> 


Hi everyone,

I was running some CPU hotplug stress tests recently and found it to be
problematic too. Mel, I have some logs from those tests which appear very
relevant to the "IPI to offline CPU" issue that has been discussed in this
thread.

Kernel: 3.2-rc7
Here is the log: 
(Unfortunately I couldn't capture the log intact, due to some annoying
serial console issues, but I hope this log is good enough to analyze.)
  
[  907.825267] Booting Node 1 Processor 15 APIC 0x17
[  907.830117] smpboot cpu 15: start_ip = 97000
[  906.104006] Calibrating delay loop (skipped) already calibrated this CPU
[  907.860875] NMI watchdog enabled, takes one hw-pmu counter.
[  907.898899] Broke affinity for irq 81
[  907.904539] CPU 1 is now offline
[  907.912891] CPU 9 MCA banks CMCI:2 CMCI:3 CMCI:5
[  907.929462] CPU 2 is now offline
[  907.939573] CPU 10 MCA banks CMCI:2 CMCI:3 CMCI:5
[  907.969514] CPU 3 is now offline
[  907.978644] CPU 11 MCA banks CMCI:2 CMCI:3 CMCI:5
[  908.021903] Broke affinity for irq 74
[  908.024021] ------------[ cut here ]------------
[  908.024021] WARNING: at kernel/smp.c:258 generic_smp_call_function_single_interrupt+0x109/0x120()
[  908.024021] Hardware name: IBM System x -[7870C4Q]-
[  908.024021] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[  908.024021] Pid: 22076, comm: migration/4 Not tainted 3.2.0-rc7-0.0.0.28.36b5ec9-default #1
[  908.024021] Call Trace:
[  908.024021]  <IRQ>  [<ffffffff81099309>] ? generic_smp_call_function_single_interrupt+0x109/0x120
[  908.024021]  [<ffffffff8105441a>] warn_slowpath_common+0x7a/0xb0
[  908.024021]  [<ffffffff81054465>] warn_slowpath_null+0x15/0x20
[  908.024021]  [<ffffffff81099309>] generic_smp_call_function_single_interrupt+0x109/0x120
[  908.024021]  [<ffffffff8101ffa2>] smp_call_function_single_interrupt+0x22/0x40
[  908.024021]  [<ffffffff8146afb3>] call_function_single_interrupt+0x73/0x80
[  908.024021]  <EOI>  [<ffffffff810ba86a>] ? stop_machine_cpu_stop+0xda/0x130
[  908.024021]  [<ffffffff810ba790>] ? stop_one_cpu_nowait+0x50/0x50
[  908.024021]  [<ffffffff810ba4ea>] cpu_stopper_thread+0xba/0x180
[  908.024021]  [<ffffffff8146077f>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
[  908.024021]  [<ffffffff810ba430>] ? res_counter_init+0x50/0x50
[  908.024021]  [<ffffffff8109141d>] ? trace_hardirqs_on_caller+0x12d/0x1b0
[  908.024021]  [<ffffffff810914ad>] ? trace_hardirqs_on+0xd/0x10
[  908.024021]  [<ffffffff810ba430>] ? res_counter_init+0x50/0x50
[  908.024021]  [<ffffffff81078cf6>] kthread+0x96/0xa0
[  908.024021]  [<ffffffff8146b444>] kernel_thread_helper+0x4/0x10
[  908.024021]  [<ffffffff81460ab4>] ? retint_restore_args+0x13/0x13
[  908.024021]  [<ffffffff81078c60>] ? __init_kthread_worker+0x70/0x70
[  908.024021]  [<ffffffff8146b440>] ? gs_change+0x13/0x13
[  908.024021] ---[ end trace f4c7a25be63a672a ]---
[  908.328208] CPU 4 is now offline
[  908.332730] CPU 5 MCA banks CMCI:6 CMCI:8
[  908.337074] CPU 12 MCA banks CMCI:2 CMCI:3 CMCI:5
[  908.349270] CPU 5 is now offline
[  908.353888] CPU 6 MCA banks CMCI:6 CMCI:8
[  908.376131] CPU 13 MCA banks CMCI:2 CMCI:3 CMCI:5
[  908.391939] CPU 6 is now offline
[  908.413193] CPU 7 MCA banks CMCI:6 CMCI:8
[  908.443245] CPU 14 MCA banks CMCI:2 CMCI:3 CMCI:5
[  908.475871] CPU 7 is now offline
[  908.481601] CPU 12 MCA banks CMCI:6 CMCI:8
[  908.485923] CPU 15 MCA banks CMCI:2 CMCI:3 CMCI:5
[  908.519889] CPU 8 is now offline
[  908.565926] CPU 9 is now offline
[  908.602874] CPU 10 is now offline
[  908.634696] CPU 11 is now offline
[  908.674735] CPU 12 is now offline
[  908.680343] CPU 13 MCA banks CMCI:6 CMCI:8
[  908.721887] CPU 13 is now offline
[  908.728086] CPU 14 MCA banks CMCI:6 CMCI:8
[  908.789105] CPU 14 is now offline
[  908.794969] CPU 15 MCA banks CMCI:6 CMCI:8
[  908.881878] CPU 15 is now offline
[  908.885301] lockdep: fixing up alternatives.
[  908.889663] SMP alternatives: switching to UP code
[  909.140900] lockdep: fixing up alternatives.
[  909.145281] SMP alternatives: switching to SMP code
[  909.153536] Booting Node 0 Processor 1 APIC 0x2
[  909.158157] smpboot cpu 1: start_ip = 97000
[  907.900022] Calibrating delay loop (skipped) already calibrated this CPU
[  909.181323] NMI watchdog enabled, takes one hw-pmu counter.
[  909.275696] lockdep: fixing up alternatives.
[  909.280106] Booting Node 0 Processor 2 APIC 0x4
[  909.280107] smpboot cpu 2: start_ip = 97000
[  907.928015] Calibrating delay loop (skipped) already calibrated this CPU
[  909.308538] NMI watchdog enabled, takes one hw-pmu counter.
[  909.376170] lockdep: fixing up alternatives.
[  909.380589] Booting Node 0 Processor 3 APIC 0x6
[ 1319.109486] Booting Node 1 Processor 14 APIC 0x15
[ 1319.114320] smpboot cpu 14: start_ip = 97000
[ 1318.456153] Calibrating delay loop (skipped) already calibrated this CPU
[ 1319.139762] NMI watchdog enabled, takes one hw-pmu counter.
[ 1319.150412] lockdep: fixing up alternatives.
[ 1319.155062] Booting Node 1 Processor 15 APIC 0x17
[ 1319.160165] smpboot cpu 15: start_ip = 97000
[ 1318.472003] Calibrating delay loop (skipped) already calibrated this CPU
[ 1319.188592] NMI watchdog enabled, takes one hw-pmu counter.
[ 1319.216529] CPU 1 is now offline
[ 1319.224915] CPU 9 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.240750] CPU 2 is now offline
[ 1319.256419] CPU 10 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.269161] CPU 3 is now offline
[ 1319.280258] CPU 11 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.293433] CPU 4 is now offline
[ 1319.298109] CPU 5 MCA banks CMCI:6 CMCI:8
[ 1319.312516] CPU 12 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.325377] CPU 5 is now offline
[ 1319.331679] CPU 6 MCA banks CMCI:6 CMCI:8
[ 1319.340437] CPU 13 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.352367] CPU 6 is now offline
[ 1319.357553] CPU 7 MCA banks CMCI:6 CMCI:8
[ 1319.372577] CPU 14 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.385997] CPU 7 is now offline
[ 1319.393018] CPU 12 MCA banks CMCI:6 CMCI:8
[ 1319.397604] CPU 15 MCA banks CMCI:2 CMCI:3 CMCI:5
[ 1319.409149] CPU 8 is now offline
[ 1319.428255] CPU 9 is now offline
[ 1319.450764] CPU 10 is now offline
[ 1319.474489] CPU 11 is now offline
[ 1319.496806] CPU 12 is now offline
[ 1319.502966] CPU 13 MCA banks CMCI:6 CMCI:8
[ 1319.511746] CPU 13 is now offline
[ 1347.146085] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1347.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1347.148005] irq event stamp: 151225746
[ 1347.148005] hardirqs last  enabled at (151225745): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1347.148005] hardirqs last disabled at (151225746): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1347.148005] softirqs last  enabled at (151225744): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1347.148005] softirqs last disabled at (151225739): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1347.148005] CPU 14 
[ 1347.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1347.148005] 
[ 1347.148005] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1347.148005] RIP: 0010:[<ffffffff81033de8>]  [<ffffffff81033de8>] flush_tlb_others_ipi+0x108/0x140
[ 1347.148005] RSP: 0000:ffff881147bfdc48  EFLAGS: 00000246
[ 1347.148005] RAX: 0000000000000000 RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1347.148005] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1347.148005] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1347.148005] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdbb8
[ 1347.148005] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1347.148005] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1347.148005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1347.148005] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1347.148005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1347.148005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1347.148005] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1347.148005] Stack:
[ 1347.148005]  ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8 ffff881146a20de0
[ 1347.148005]  ffff88114739f818 ffff881147776ab0 ffff881147bfdc88 ffffffff81033e29
[ 1347.148005]  ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8 0000000000000001
[ 1347.148005] Call Trace:
[ 1347.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1347.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1347.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1347.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1347.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1347.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1347.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1347.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1347.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1347.148005] Code: 00 00 00 48 8b 05 91 14 9b 00 41 8d b7 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 35 00 4f 9b 00 <4c> 89 e7 e8 80 7a 22 00 85 c0 74 ec eb 84 66 2e 0f 1f 84 00 00 
[ 1347.148005] Call Trace:
[ 1347.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1347.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1347.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1347.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1347.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1347.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1347.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1347.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1347.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1384.508005] INFO: rcu_sched detected stall on CPU 14 (t=16250 jiffies)
[ 1384.508007] sending NMI to all CPUs:
[ 1384.516014] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 15, t=16252 jiffies)
[ 1384.527400] NMI backtrace for cpu 0
[ 1384.528012] CPU 0 
[ 1384.528012] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1384.528012] 
[ 1384.528012] Pid: 24575, comm: cc1 Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1384.528012] RIP: 0033:[<00002abf6f4f7c48>]  [<00002abf6f4f7c48>] 0x2abf6f4f7c47
[ 1384.528012] RSP: 002b:00007fffa24587e8  EFLAGS: 00000206
[ 1384.528012] RAX: 000000000000002b RBX: 00007fffa24587f0 RCX: 0000000000000004
[ 1384.528012] RDX: 0000000000000000 RSI: 00002abf7029b714 RDI: 00007fffa24587f0
[ 1384.528012] RBP: 00007fffa2458860 R08: fffffffffffffffc R09: 00007fffa245892e
[ 1384.528012] R10: 00002abf6ee686c0 R11: 00002abf6f4fa1c6 R12: 0000000000000000
[ 1384.528012] R13: 00002abf7029b714 R14: 0000000000000002 R15: 00007fffa245892f
[ 1384.528012] FS:  00002abf6f7d87e0(0000) GS:ffff8808ffc00000(0000) knlGS:0000000000000000
[ 1384.528012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1384.528012] CR2: 00002abf7029e000 CR3: 00000007fb3e4000 CR4: 00000000000006f0
[ 1384.528012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1384.528012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1384.528012] Process cc1 (pid: 24575, threadinfo ffff8807fb3ea000, task ffff8807fb3ed640)
[ 1384.528012] 
[ 1384.528012] Call Trace:
[ 1384.508007] NMI backtrace for cpu 14
[ 1384.508007] CPU 14 
[ 1384.508007] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1384.508007] 
[ 1384.508007] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1384.508007] RIP: 0010:[<ffffffff81258d62>]  [<ffffffff81258d62>] delay_tsc+0x42/0xa0
[ 1384.508007] RSP: 0000:ffff88117fd83d30  EFLAGS: 00000093
[ 1384.508007] RAX: 00000000000000d4 RBX: 00000000000470af RCX: ffffffff8ac50e81
[ 1384.508007] RDX: 000000008ac50e81 RSI: 000000000000000f RDI: 00000000000470af
[ 1384.508007] RBP: ffff88117fd83d68 R08: 0000000000000010 R09: ffffffff819e7660
[ 1384.508007] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000001000
[ 1384.508007] R13: 0000000000000002 R14: 000000000000000e R15: 000000000000000e
[ 1384.508007] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1384.508007] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1384.508007] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1384.508007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1384.508007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1384.508007] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1384.508007] Stack:
[ 1384.508007]  ffffffff8ac50dad 0000000e00000046 0000000000000000 0000000000001000
[ 1384.508007]  0000000000000002 000000000000cbc0 0000000000000002 ffff88117fd83d78
[ 1384.508007]  ffffffff81258e1f ffff88117fd83d98 ffffffff81021aaa 000000000000000f
[ 1384.508007] Call Trace:
[ 1384.508007]  <IRQ> 
[ 1384.508007]  [<ffffffff81258e1f>] __const_udelay+0x2f/0x40
[ 1384.508007]  [<ffffffff81021aaa>] native_safe_apic_wait_icr_idle+0x1a/0x50
[ 1384.508007]  [<ffffffff81021fbd>] default_send_IPI_mask_sequence_phys+0xdd/0x130
[ 1384.508007]  [<ffffffff81025094>] physflat_send_IPI_all+0x14/0x20
[ 1384.508007]  [<ffffffff81022097>] arch_trigger_all_cpu_backtrace+0x67/0xb0
[ 1384.508007]  [<ffffffff810cfca9>] __rcu_pending+0x119/0x280
[ 1384.508007]  [<ffffffff810cfeba>] rcu_check_callbacks+0xaa/0x1b0
[ 1384.508007]  [<ffffffff810643c1>] update_process_times+0x41/0x80
[ 1384.508007]  [<ffffffff8108b05f>] tick_sched_timer+0x5f/0xc0
[ 1384.508007]  [<ffffffff8108b000>] ? tick_nohz_handler+0x100/0x100
[ 1384.508007]  [<ffffffff8107d951>] __run_hrtimer+0xd1/0x1d0
[ 1384.508007]  [<ffffffff8107dc97>] hrtimer_interrupt+0xc7/0x1f0
[ 1384.508007]  [<ffffffff81021a34>] smp_apic_timer_interrupt+0x64/0xa0
[ 1384.508007]  [<ffffffff81469db3>] apic_timer_interrupt+0x73/0x80
[ 1384.508007]  <EOI> 
[ 1384.508007]  [<ffffffff81033de2>] ? flush_tlb_others_ipi+0x102/0x140
[ 1384.508007]  [<ffffffff81033df0>] ? flush_tlb_others_ipi+0x110/0x140
[ 1384.508007]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1384.508007]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1384.508007]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1384.508007]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1384.508007]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1384.508007]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1384.508007]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1384.508007]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1384.508007]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1384.508007] Code: 34 25 b0 cb 00 00 66 66 90 0f ae e8 e8 e8 15 db ff 66 90 48 98 44 89 75 d4 48 89 45 c8 eb 1b 66 2e 0f 1f 84 00 00 00 00 00 f3 90 <65> 44 8b 3c 25 b0 cb 00 00 44 3b 7d d4 75 2b 66 66 90 0f ae e8 
[ 1384.508007] Call Trace:
[ 1384.508007]  <IRQ>  [<ffffffff81258e1f>] __const_udelay+0x2f/0x40
[ 1384.508007]  [<ffffffff81021aaa>] native_safe_apic_wait_icr_idle+0x1a/0x50
[ 1384.508007]  [<ffffffff81021fbd>] default_send_IPI_mask_sequence_phys+0xdd/0x130
[ 1384.508007]  [<ffffffff81025094>] physflat_send_IPI_all+0x14/0x20
[ 1384.508007]  [<ffffffff81022097>] arch_trigger_all_cpu_backtrace+0x67/0xb0
[ 1384.508007]  [<ffffffff810cfca9>] __rcu_pending+0x119/0x280
[ 1384.508007]  [<ffffffff810cfeba>] rcu_check_callbacks+0xaa/0x1b0
[ 1384.508007]  [<ffffffff810643c1>] update_process_times+0x41/0x80
[ 1384.508007]  [<ffffffff8108b05f>] tick_sched_timer+0x5f/0xc0
[ 1384.508007]  [<ffffffff8108b000>] ? tick_nohz_handler+0x100/0x100
[ 1384.508007]  [<ffffffff8107d951>] __run_hrtimer+0xd1/0x1d0
[ 1384.508007]  [<ffffffff8107dc97>] hrtimer_interrupt+0xc7/0x1f0
[ 1384.508007]  [<ffffffff81021a34>] smp_apic_timer_interrupt+0x64/0xa0
[ 1384.508007]  [<ffffffff81469db3>] apic_timer_interrupt+0x73/0x80
[ 1384.508007]  <EOI>  [<ffffffff81033de2>] ? flush_tlb_others_ipi+0x102/0x140
[ 1384.508007]  [<ffffffff81033df0>] ? flush_tlb_others_ipi+0x110/0x140
[ 1384.508007]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1384.508007]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1384.508007]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1384.508007]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1384.508007]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1384.508007]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1384.508007]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1384.508007]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1384.508007]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1385.089722] NMI backtrace for cpu 15
[ 1385.089722] CPU 15 
[ 1385.089722] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1385.089722] 
[ 1385.089722] Pid: 24569, comm: sh Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1385.089722] RIP: 0033:[<00002b292b250c59>]  [<00002b292b250c59>] 0x2b292b250c58
[ 1385.089722] RSP: 002b:00007fff6bb94b70  EFLAGS: 00000202
[ 1385.089722] RAX: 00002b292b520580 RBX: 00000000006b0907 RCX: 00007fff6bb94c20
[ 1385.089722] RDX: 00007fff6bb94bd0 RSI: 00000000006b0907 RDI: 00007fff6bb94bd0
[ 1385.089722] RBP: 000000000000001b R08: 00000000006b0907 R09: 0000000000000001
[ 1385.089722] R10: 0000000000000000 R11: 00007fff6bb94c20 R12: 0000000000000000
[ 1385.089722] R13: 00000000006b0907 R14: 00007fff6bb94bd0 R15: 00002b292b76eba0
[ 1385.089722] FS:  00002b292b76eba0(0000) GS:ffff88117fdc0000(0000) knlGS:0000000000000000
[ 1385.089722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1385.089722] CR2: 00000000006b1110 CR3: 00000010a023d000 CR4: 00000000000006e0
[ 1385.089722] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1385.089722] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1385.089722] Process sh (pid: 24569, threadinfo ffff8810a0224000, task ffff8810a0220d40)
[ 1385.089722] 
[ 1385.089722] Call Trace:
[ 1411.146085] BUG: soft lockup - CPU#14 stuck for 23s! [udevd:1068]
[ 1411.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1411.148005] irq event stamp: 151367944
[ 1411.148005] hardirqs last  enabled at (151367943): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1411.148005] hardirqs last disabled at (151367944): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1411.148005] softirqs last  enabled at (151367942): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1411.148005] softirqs last disabled at (151367937): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1411.148005] CPU 14 
[ 1411.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1411.148005] 
[ 1411.148005] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1411.148005] RIP: 0010:[<ffffffff81033de2>]  [<ffffffff81033de2>] flush_tlb_others_ipi+0x102/0x140
[ 1411.148005] RSP: 0000:ffff881147bfdc48  EFLAGS: 00000246
[ 1411.148005] RAX: 0000000000000000 RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1411.148005] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1411.148005] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1411.148005] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdbb8
[ 1411.148005] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1411.148005] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1411.148005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1411.148005] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1411.148005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1411.148005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1411.148005] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1411.148005] Stack:
[ 1411.148005]  ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8 ffff881146a20de0
[ 1411.148005]  ffff88114739f818 ffff881147776ab0 ffff881147bfdc88 ffffffff81033e29
[ 1411.148005]  ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8 0000000000000001
[ 1411.148005] Call Trace:
[ 1411.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1411.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1411.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1411.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1411.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1411.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1411.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1411.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1411.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1411.148005] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 91 14 9b 00 41 8d b7 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 00 4f 9b 00 4c 89 e7 e8 80 7a 22 00 85 c0 74 ec eb 84 66 
[ 1411.148005] Call Trace:
[ 1411.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1411.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1411.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1411.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1411.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1411.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1411.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1411.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1411.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1439.146086] BUG: soft lockup - CPU#14 stuck for 23s! [udevd:1068]
[ 1439.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1439.148005] irq event stamp: 151430582
[ 1439.148005] hardirqs last  enabled at (151430581): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1439.148005] hardirqs last disabled at (151430582): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1439.148005] softirqs last  enabled at (151430580): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1439.148005] softirqs last disabled at (151430575): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1439.148005] CPU 14 
[ 1439.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1439.148005] 
[ 1439.148005] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1439.148005] RIP: 0010:[<ffffffff8125b883>]  [<ffffffff8125b883>] __bitmap_empty+0x13/0x90
[ 1439.148005] RSP: 0000:ffff881147bfdc38  EFLAGS: 00000246
[ 1439.148005] RAX: 000000000000004f RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1439.148005] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1439.148005] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1439.148005] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdba8
[ 1439.148005] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1439.148005] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1439.148005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1439.148005] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1439.148005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1439.148005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1439.148005] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1439.148005] Stack:
[ 1439.148005]  ffff881147bfdc78 ffffffff81033df0 ffff881147bfdcb8 ffff881146a20d80
[ 1439.148005]  00007fff56156db8 ffff881146a20de0 ffff88114739f818 ffff881147776ab0
[ 1439.148005]  ffff881147bfdc88 ffffffff81033e29 ffff881147bfdcb8 ffffffff81033f2a
[ 1439.148005] Call Trace:
[ 1439.148005]  [<ffffffff81033df0>] ? flush_tlb_others_ipi+0x110/0x140
[ 1439.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1439.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1439.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1439.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1439.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1439.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1439.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1439.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1439.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1439.148005] Code: c7 45 b0 10 00 00 00 48 89 45 c0 e8 38 ff ff ff c9 c3 90 90 90 90 90 90 8d 46 3f 85 f6 41 89 f0 55 44 0f 48 c0 31 d2 41 c1 f8 06 <48> 89 e5 45 85 c0 7e 2a 31 d2 48 83 3f 00 48 89 f9 74 17 eb 60 
[ 1439.148005] Call Trace:
[ 1439.148005]  [<ffffffff81033df0>] ? flush_tlb_others_ipi+0x110/0x140
[ 1439.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1439.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1439.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1439.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1439.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1439.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1439.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1439.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1439.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1467.146085] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1467.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1467.148005] irq event stamp: 151493442
[ 1467.148005] hardirqs last  enabled at (151493441): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1467.148005] hardirqs last disabled at (151493442): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1467.148005] softirqs last  enabled at (151493440): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1467.148005] softirqs last disabled at (151493435): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1467.148005] CPU 14 
[ 1467.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1467.148005] 
[ 1467.148005] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1467.148005] RIP: 0010:[<ffffffff81033de2>]  [<ffffffff81033de2>] flush_tlb_others_ipi+0x102/0x140
[ 1467.148005] RSP: 0000:ffff881147bfdc48  EFLAGS: 00000246
[ 1467.148005] RAX: 0000000000000000 RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1467.148005] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1467.148005] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1467.148005] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdbb8
[ 1467.148005] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1467.148005] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1467.148005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1467.148005] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1467.148005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1467.148005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1467.148005] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1467.148005] Stack:
[ 1467.148005]  ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8 ffff881146a20de0
[ 1467.148005]  ffff88114739f818 ffff881147776ab0 ffff881147bfdc88 ffffffff81033e29
[ 1467.148005]  ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8 0000000000000001
[ 1467.148005] Call Trace:
[ 1467.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1467.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1467.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1467.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1467.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1467.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1467.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1467.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1467.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1467.148005] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 91 14 9b 00 41 8d b7 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 00 4f 9b 00 4c 89 e7 e8 80 7a 22 00 85 c0 74 ec eb 84 66 
[ 1467.148005] Call Trace:
[ 1467.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1467.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1467.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1467.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1467.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1467.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1467.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1467.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1467.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1495.146085] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1495.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1495.148005] irq event stamp: 151556262
[ 1495.148005] hardirqs last  enabled at (151556261): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1495.148005] hardirqs last disabled at (151556262): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1495.148005] softirqs last  enabled at (151556260): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1495.148005] softirqs last disabled at (151556255): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1495.148005] CPU 14 
[ 1495.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1495.148005] 
[ 1495.148005] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1495.148005] RIP: 0010:[<ffffffff81033de2>]  [<ffffffff81033de2>] flush_tlb_others_ipi+0x102/0x140
[ 1495.148005] RSP: 0000:ffff881147bfdc48  EFLAGS: 00000246
[ 1495.148005] RAX: 0000000000000000 RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1495.148005] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1495.148005] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1495.148005] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdbb8
[ 1495.148005] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1495.148005] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1495.148005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1495.148005] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1495.148005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1495.148005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1495.148005] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1495.148005] Stack:
[ 1495.148005]  ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8 ffff881146a20de0
[ 1495.148005]  ffff88114739f818 ffff881147776ab0 ffff881147bfdc88 ffffffff81033e29
[ 1495.148005]  ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8 0000000000000001
[ 1495.148005] Call Trace:
[ 1495.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1495.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1495.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1495.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1495.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1495.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1495.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1495.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1495.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1495.148005] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 91 14 9b 00 41 8d b7 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 00 4f 9b 00 4c 89 e7 e8 80 7a 22 00 85 c0 74 ec eb 84 66 
[ 1495.148005] Call Trace:
[ 1495.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1495.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1495.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1495.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1495.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1495.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1495.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1495.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1495.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1523.146085] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1523.148004] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1523.148004] irq event stamp: 151619118
[ 1523.148004] hardirqs last  enabled at (151619117): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1523.148004] hardirqs last disabled at (151619118): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1523.148004] softirqs last  enabled at (151619116): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1523.148004] softirqs last disabled at (151619111): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1523.148004] CPU 14 
[ 1523.148004] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1523.148004] 
[ 1523.148004] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1523.148004] RIP: 0010:[<ffffffff8125b8e7>]  [<ffffffff8125b8e7>] __bitmap_empty+0x77/0x90
[ 1523.148004] RSP: 0000:ffff881147bfdc38  EFLAGS: 00000216
[ 1523.148004] RAX: 000000000000ffff RBX: ffff881147bfdbb8 RCX: 0000000000000010
[ 1523.148004] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1523.148004] RBP: ffff881147bfdc38 R08: 0000000000000000 R09: 0000000000000000
[ 1523.148004] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfc000
[ 1523.148004] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[ 1523.148004] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1523.148004] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1523.148004] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1523.148004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1523.148004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1523.148004] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1523.148004] Stack:
[ 1523.148004]  ffff881147bfdc78 ffffffff81033df0 ffff881147bfdcb8 ffff881146a20d80
[ 1523.148004]  00007fff56156db8 ffff881146a20de0 ffff88114739f818 ffff881147776ab0
[ 1523.148004]  ffff881147bfdc88 ffffffff81033e29 ffff881147bfdcb8 ffffffff81033f2a
[ 1523.148004] Call Trace:
[ 1523.148004]  [<ffffffff81033df0>] flush_tlb_others_ipi+0x110/0x140
[ 1523.148004]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1523.148004]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1523.148004]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1523.148004]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1523.148004]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1523.148004]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1523.148004]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1523.148004]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1523.148004]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1523.148004] Code: 00 00 75 08 c9 c3 66 0f 1f 44 00 00 89 f0 48 63 d2 c1 f8 1f c1 e8 1a 8d 0c 06 83 e1 3f 29 c1 b8 01 00 00 00 48 d3 e0 48 83 e8 01 <48> 85 04 d7 c9 0f 94 c0 0f b6 c0 c3 0f 1f 44 00 00 31 c0 c9 c3 
[ 1523.148004] Call Trace:
[ 1523.148004]  [<ffffffff81033df0>] flush_tlb_others_ipi+0x110/0x140
[ 1523.148004]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1523.148004]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1523.148004]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1523.148004]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1523.148004]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1523.148004]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1523.148004]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1523.148004]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1523.148004]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1551.146087] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1551.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1551.148005] irq event stamp: 151681846
[ 1551.148005] hardirqs last  enabled at (151681845): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1551.148005] hardirqs last disabled at (151681846): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1551.148005] softirqs last  enabled at (151681844): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1551.148005] softirqs last disabled at (151681839): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1551.148005] CPU 14 
[ 1551.148005] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1551.148005] 
[ 1551.148005] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1551.148005] RIP: 0010:[<ffffffff81033de8>]  [<ffffffff81033de8>] flush_tlb_others_ipi+0x108/0x140
[ 1551.148005] RSP: 0000:ffff881147bfdc48  EFLAGS: 00000246
[ 1551.148005] RAX: 0000000000000000 RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1551.148005] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1551.148005] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1551.148005] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdbb8
[ 1551.148005] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1551.148005] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1551.148005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1551.148005] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1551.148005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1551.148005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1551.148005] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1551.148005] Stack:
[ 1551.148005]  ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8 ffff881146a20de0
[ 1551.148005]  ffff88114739f818 ffff881147776ab0 ffff881147bfdc88 ffffffff81033e29
[ 1551.148005]  ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8 0000000000000001
[ 1551.148005] Call Trace:
[ 1551.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1551.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1551.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1551.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1551.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1551.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1551.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1551.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1551.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1551.148005] Code: 00 00 00 48 8b 05 91 14 9b 00 41 8d b7 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 35 00 4f 9b 00 <4c> 89 e7 e8 80 7a 22 00 85 c0 74 ec eb 84 66 2e 0f 1f 84 00 00 
[ 1551.148005] Call Trace:
[ 1551.148005]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1551.148005]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1551.148005]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1551.148005]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1551.148005]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1551.148005]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1551.148005]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1551.148005]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1551.148005]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1579.146085] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1579.148003] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1579.148003] irq event stamp: 151744636
[ 1579.148003] hardirqs last  enabled at (151744635): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1579.148003] hardirqs last disabled at (151744636): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1579.148003] softirqs last  enabled at (151744634): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1579.148003] softirqs last disabled at (151744629): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1579.148003] CPU 14 
[ 1579.148003] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1579.148003] 
[ 1579.148003] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1579.148003] RIP: 0010:[<ffffffff8125b8f2>]  [<ffffffff8125b8f2>] __bitmap_empty+0x82/0x90
[ 1579.148003] RSP: 0000:ffff881147bfdc40  EFLAGS: 00000206
[ 1579.148003] RAX: 0000000000000000 RBX: ffffffff81460ab4 RCX: 0000000000000010
[ 1579.148003] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1579.148003] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1579.148003] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffff881147bfdbb8
[ 1579.148003] R13: ffff8808ca7e0c80 R14: ffff881147bfc000 R15: 0000000000000000
[ 1579.148003] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1579.148003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1579.148003] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1579.148003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1579.148003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1579.148003] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1579.148003] Stack:
[ 1579.148003]  ffffffff81033df0 ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8
[ 1579.148003]  ffff881146a20de0 ffff88114739f818 ffff881147776ab0 ffff881147bfdc88
[ 1579.148003]  ffffffff81033e29 ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8
[ 1579.148003] Call Trace:
[ 1579.148003]  [<ffffffff81033df0>] ? flush_tlb_others_ipi+0x110/0x140
[ 1579.148003]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1579.148003]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1579.148003]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1579.148003]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1579.148003]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1579.148003]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1579.148003]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1579.148003]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1579.148003]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1579.148003] Code: 00 89 f0 48 63 d2 c1 f8 1f c1 e8 1a 8d 0c 06 83 e1 3f 29 c1 b8 01 00 00 00 48 d3 e0 48 83 e8 01 48 85 04 d7 c9 0f 94 c0 0f b6 c0 <c3> 0f 1f 44 00 00 31 c0 c9 c3 0f 1f 40 00 8d 46 3f 85 f6 41 89 
[ 1579.148003] Call Trace:
[ 1579.148003]  [<ffffffff81033df0>] ? flush_tlb_others_ipi+0x110/0x140
[ 1579.148003]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1579.148003]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1579.148003]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1579.148003]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1579.148003]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1579.148003]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1579.148003]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1579.148003]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1579.148003]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1579.636006] INFO: rcu_sched detected stall on CPU 14 (t=65032 jiffies)
[ 1579.636006] sending NMI to all CPUs:
[ 1579.644010] INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 0, t=65034 jiffies)
[ 1579.655327] NMI backtrace for cpu 0
[ 1579.656012] CPU 0 
[ 1579.656012] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1579.656012] 
[ 1579.656012] Pid: 9736, comm: make Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1579.656012] RIP: 0010:[<ffffffff810ffe52>]  [<ffffffff810ffe52>] generic_file_aio_read+0x62/0x280
[ 1579.656012] RSP: 0018:ffff8807ce195b10  EFLAGS: 00000292
[ 1579.656012] RAX: ffff8807ce190800 RBX: ffff8807ce195c38 RCX: ffff8807ce195ab0
[ 1579.656012] RDX: ffff8807ce195ac0 RSI: ffff8807ce195ab0 RDI: ffff8807ce195aa8
[ 1579.656012] RBP: ffff8807ce195b38 R08: 0000000000000040 R09: ffff8807ce1910b8
[ 1579.656012] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000001
[ 1579.656012] R13: 00000000000001f8 R14: ffff8807ce18e6c0 R15: ffff8807ce195ad8
[ 1579.656012] FS:  00002b9180bbe700(0000) GS:ffff8808ffc00000(0000) knlGS:0000000000000000
[ 1579.656012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1579.656012] CR2: 00002ae261975280 CR3: 0000001146873000 CR4: 00000000000006f0
[ 1579.656012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1579.656012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1579.656012] Process make (pid: 9736, threadinfo ffff8807ce194000, task ffff8807ce190800)
[ 1579.656012] Stack:
[ 1579.656012]  ffff8807ce195b48 ffff8807ce195c28 ffff8807ce18e6c0 ffff8807ce195ca0
[ 1579.656012]  ffff8807ce195f58 ffff8807ce195c58 ffffffff81160f79 ffff8807ce1910b8
[ 1579.656012]  ffff8807ce190800 0000000000000000 ffffffff00000001 ffff8807ce18e6c0
[ 1579.656012] Call Trace:
[ 1579.656012]  [<ffffffff81160f79>] do_sync_read+0xd9/0x120
[ 1579.656012]  [<ffffffff81221b1d>] ? common_file_perm+0x8d/0x110
[ 1579.656012]  [<ffffffff811fc373>] ? security_file_permission+0x93/0xa0
[ 1579.656012]  [<ffffffff81161658>] vfs_read+0xc8/0x130
[ 1579.656012]  [<ffffffff81168534>] kernel_read+0x44/0x60
[ 1579.656012]  [<ffffffff811b5aa3>] load_elf_binary+0x173/0x1060
[ 1579.656012]  [<ffffffff81092531>] ? __lock_acquire+0x301/0x520
[ 1579.656012]  [<ffffffff811695cc>] ? search_binary_handler+0xfc/0x360
[ 1579.656012]  [<ffffffff811b5930>] ? load_elf_interp+0x5e0/0x5e0
[ 1579.656012]  [<ffffffff811b5930>] ? load_elf_interp+0x5e0/0x5e0
[ 1579.656012]  [<ffffffff811695d6>] search_binary_handler+0x106/0x360
[ 1579.656012]  [<ffffffff8116951e>] ? search_binary_handler+0x4e/0x360
[ 1579.656012]  [<ffffffff81169d2d>] do_execve_common+0x27d/0x320
[ 1579.656012]  [<ffffffff81169e5a>] do_execve+0x3a/0x40
[ 1579.656012]  [<ffffffff8100aac9>] sys_execve+0x49/0x70
[ 1579.656012]  [<ffffffff8146972c>] stub_execve+0x6c/0xc0
[ 1579.656012] Code: c8 4c 8b 77 20 4c 89 e7 48 89 85 58 ff ff ff 48 c7 45 c8 00 00 00 00 e8 8d d8 ff ff 4c 63 e8 4d 85 ed 74 15 48 81 c4 88 00 00 00 <4c> 89 e8 5b 41 5c 41 5d 41 5e 41 5f c9 c3 48 8d bd 70 ff ff ff 
[ 1579.656012] Call Trace:
[ 1579.656012]  [<ffffffff81160f79>] do_sync_read+0xd9/0x120
[ 1579.656012]  [<ffffffff81221b1d>] ? common_file_perm+0x8d/0x110
[ 1579.656012]  [<ffffffff811fc373>] ? security_file_permission+0x93/0xa0
[ 1579.656012]  [<ffffffff81161658>] vfs_read+0xc8/0x130
[ 1579.656012]  [<ffffffff81168534>] kernel_read+0x44/0x60
[ 1579.656012]  [<ffffffff811b5aa3>] load_elf_binary+0x173/0x1060
[ 1579.656012]  [<ffffffff81092531>] ? __lock_acquire+0x301/0x520
[ 1579.656012]  [<ffffffff811695cc>] ? search_binary_handler+0xfc/0x360
[ 1579.656012]  [<ffffffff811b5930>] ? load_elf_interp+0x5e0/0x5e0
[ 1579.656012]  [<ffffffff811b5930>] ? load_elf_interp+0x5e0/0x5e0
[ 1579.656012]  [<ffffffff811695d6>] search_binary_handler+0x106/0x360
[ 1579.656012]  [<ffffffff8116951e>] ? search_binary_handler+0x4e/0x360
[ 1579.656012]  [<ffffffff81169d2d>] do_execve_common+0x27d/0x320
[ 1579.656012]  [<ffffffff81169e5a>] do_execve+0x3a/0x40
[ 1579.656012]  [<ffffffff8100aac9>] sys_execve+0x49/0x70
[ 1579.656012]  [<ffffffff8146972c>] stub_execve+0x6c/0xc0
[ 1579.636006] NMI backtrace for cpu 14
[ 1579.636006] CPU 14 
[ 1579.636006] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1579.636006] 
[ 1579.636006] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1579.636006] RIP: 0010:[<ffffffff81258d71>]  [<ffffffff81258d71>] delay_tsc+0x51/0xa0
[ 1579.636006] RSP: 0000:ffff88117fd83d30  EFLAGS: 00000046
[ 1579.636006] RAX: 0000000000000042 RBX: 00000000000470af RCX: ffffffffd032bab4
[ 1579.636006] RDX: 00000000d032bab4 RSI: 000000000000000f RDI: 00000000000470af
[ 1579.636006] RBP: ffff88117fd83d68 R08: 0000000000000010 R09: ffffffff819e7660
[ 1579.636006] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000001000
[ 1579.636006] R13: 0000000000000002 R14: 000000000000000e R15: 000000000000000e
[ 1579.636006] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1579.636006] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1579.636006] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1579.636006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1579.636006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1579.636006] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1579.636006] Stack:
[ 1579.636006]  ffffffffd032ba72 0000000e00000046 0000000000000000 0000000000001000
[ 1579.636006]  0000000000000002 000000000000cbc0 0000000000000002 ffff88117fd83d78
[ 1579.636006]  ffffffff81258e1f ffff88117fd83d98 ffffffff81021aaa 000000000000000f
[ 1579.636006] Call Trace:
[ 1579.636006]  <IRQ> 
[ 1579.636006]  [<ffffffff81258e1f>] __const_udelay+0x2f/0x40
[ 1579.636006]  [<ffffffff81021aaa>] native_safe_apic_wait_icr_idle+0x1a/0x50
[ 1579.636006]  [<ffffffff81021fbd>] default_send_IPI_mask_sequence_phys+0xdd/0x130
[ 1579.636006]  [<ffffffff81025094>] physflat_send_IPI_all+0x14/0x20
[ 1579.636006]  [<ffffffff81022097>] arch_trigger_all_cpu_backtrace+0x67/0xb0
[ 1579.636006]  [<ffffffff810cfca9>] __rcu_pending+0x119/0x280
[ 1579.636006]  [<ffffffff810cfeba>] rcu_check_callbacks+0xaa/0x1b0
[ 1579.636006]  [<ffffffff810643c1>] update_process_times+0x41/0x80
[ 1579.636006]  [<ffffffff8108b05f>] tick_sched_timer+0x5f/0xc0
[ 1579.636006]  [<ffffffff8108b000>] ? tick_nohz_handler+0x100/0x100
[ 1579.636006]  [<ffffffff8107d951>] __run_hrtimer+0xd1/0x1d0
[ 1579.636006]  [<ffffffff8107dc97>] hrtimer_interrupt+0xc7/0x1f0
[ 1579.636006]  [<ffffffff81021a34>] smp_apic_timer_interrupt+0x64/0xa0
[ 1579.636006]  [<ffffffff81469db3>] apic_timer_interrupt+0x73/0x80
[ 1579.636006]  <EOI> 
[ 1579.636006]  [<ffffffff8125b8eb>] ? __bitmap_empty+0x7b/0x90
[ 1579.636006]  [<ffffffff81033df0>] flush_tlb_others_ipi+0x110/0x140
[ 1579.636006]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1579.636006]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1579.636006]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1579.636006]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1579.636006]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1579.636006]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1579.636006]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1579.636006]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1579.636006]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1579.636006] Code: db ff 66 90 48 98 44 89 75 d4 48 89 45 c8 eb 1b 66 2e 0f 1f 84 00 00 00 00 00 f3 90 65 44 8b 3c 25 b0 cb 00 00 44 3b 7d d4 75 2b <66> 66 90 0f ae e8 e8 b4 15 db ff 66 90 48 63 c8 48 89 c8 48 2b 
[ 1579.636006] Call Trace:
[ 1579.636006]  <IRQ>  [<ffffffff81258e1f>] __const_udelay+0x2f/0x40
[ 1579.636006]  [<ffffffff81021aaa>] native_safe_apic_wait_icr_idle+0x1a/0x50
[ 1579.636006]  [<ffffffff81021fbd>] default_send_IPI_mask_sequence_phys+0xdd/0x130
[ 1579.636006]  [<ffffffff81025094>] physflat_send_IPI_all+0x14/0x20
[ 1579.636006]  [<ffffffff81022097>] arch_trigger_all_cpu_backtrace+0x67/0xb0
[ 1579.636006]  [<ffffffff810cfca9>] __rcu_pending+0x119/0x280
[ 1579.636006]  [<ffffffff810cfeba>] rcu_check_callbacks+0xaa/0x1b0
[ 1579.636006]  [<ffffffff810643c1>] update_process_times+0x41/0x80
[ 1579.636006]  [<ffffffff8108b05f>] tick_sched_timer+0x5f/0xc0
[ 1579.636006]  [<ffffffff8108b000>] ? tick_nohz_handler+0x100/0x100
[ 1579.636006]  [<ffffffff8107d951>] __run_hrtimer+0xd1/0x1d0
[ 1579.636006]  [<ffffffff8107dc97>] hrtimer_interrupt+0xc7/0x1f0
[ 1579.636006]  [<ffffffff81021a34>] smp_apic_timer_interrupt+0x64/0xa0
[ 1579.636006]  [<ffffffff81469db3>] apic_timer_interrupt+0x73/0x80
[ 1579.636006]  <EOI>  [<ffffffff8125b8eb>] ? __bitmap_empty+0x7b/0x90
[ 1579.636006]  [<ffffffff81033df0>] flush_tlb_others_ipi+0x110/0x140
[ 1579.636006]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1579.636006]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1579.636006]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1579.636006]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1579.636006]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1579.636006]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1579.636006]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1579.636006]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1579.636006]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1580.574143] NMI backtrace for cpu 15
[ 1580.576011] CPU 15 
[ 1580.576011] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1580.576011] 
[ 1580.576011] Pid: 9869, comm: sh Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1580.576011] RIP: 0010:[<ffffffff8125a1f4>]  [<ffffffff8125a1f4>] restore+0xe/0x3a
[ 1580.576011] RSP: 0000:ffff8810760aff00  EFLAGS: 00000006
[ 1580.576011] RAX: 0000000000000dbe RBX: 0000000000000000 RCX: 0000000000000000
[ 1580.576011] RDX: ffff8810760dcbc0 RSI: 000000000068cd50 RDI: ffffffff81460f06
[ 1580.576011] RBP: 00007fff2d19ee70 R08: 0000000000000000 R09: 0000000000000005
[ 1580.576011] R10: 0000000000478df0 R11: ffffffffffffffff R12: 000000000068cd50
[ 1580.576011] R13: 000000000068cd50 R14: 0000000000000030 R15: 000000000069ce00
[ 1580.576011] FS:  00002b28f9fc1ba0(0000) GS:ffff88117fdc0000(0000) knlGS:0000000000000000
[ 1580.576011] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1580.576011] CR2: 000000000068c1e0 CR3: 000000107602d000 CR4: 00000000000006e0
[ 1580.576011] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1580.576011] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1580.576011] Process sh (pid: 9869, threadinfo ffff8810760ae000, task ffff8810760dcbc0)
[ 1580.576011] Stack:
[ 1580.576011]  ffffffffffffffff 0000000000478df0 0000000000000005 0000000000000000
[ 1580.576011]  0000000000477790 0000000000000000 0000000000000030 000000000068cd50
[ 1580.576011]  000000000068c1e0 ffffffff81460f06 ffffffff81460caf 000000000069ce00
[ 1580.576011] Call Trace:
[ 1580.576011]  [<ffffffff81460f06>] ? error_sti+0x5/0x6
[ 1580.576011]  [<ffffffff81460caf>] ? page_fault+0xf/0x30
[ 1580.576011] Code: 48 89 4c 24 28 48 89 44 24 20 4c 89 44 24 18 4c 89 4c 24 10 4c 89 54 24 08 4c 89 1c 24 4c 8b 1c 24 4c 8b 54 24 08 4c 8b 4c 24 10 <4c> 8b 44 24 18 48 8b 44 24 20 48 8b 4c 24 28 48 8b 54 24 30 48 
[ 1580.576011] Call Trace:
[ 1580.576011]  [<ffffffff81460f06>] ? error_sti+0x5/0x6
[ 1580.576011]  [<ffffffff81460caf>] ? page_fault+0xf/0x30
[ 1607.146085] BUG: soft lockup - CPU#14 stuck for 22s! [udevd:1068]
[ 1607.148003] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1607.148003] irq event stamp: 151806026
[ 1607.148003] hardirqs last  enabled at (151806025): [<ffffffff81460ab4>] restore_args+0x0/0x30
[ 1607.148003] hardirqs last disabled at (151806026): [<ffffffff81469dae>] apic_timer_interrupt+0x6e/0x80
[ 1607.148003] softirqs last  enabled at (151806024): [<ffffffff8105bc31>] __do_softirq+0x1a1/0x200
[ 1607.148003] softirqs last disabled at (151806019): [<ffffffff8146b53c>] call_softirq+0x1c/0x30
[ 1607.148003] CPU 14 
[ 1607.148003] Modules linked in: ipv6 cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf microcode fuse loop dm_mod usbhid i2c_i801 ioatdma i2c_core hid cdc_ether usbnet bnx2 serio_raw mii i7core_edac sg iTCO_wdt dca shpchp iTCO_vendor_support pcspkr mptctl edac_core rtc_cmos tpm_tis tpm tpm_bios button pci_hotplug uhci_hcd ehci_hcd usbcore usb_common sd_mod crc_t10dif edd ext3 mbcache jbd fan processor mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal thermal_sys hwmon
[ 1607.148003] 
[ 1607.148003] Pid: 1068, comm: udevd Tainted: G        W    3.2.0-rc7-0.0.0.28.36b5ec9-default #1 IBM IBM System x -[7870C4Q]-/68Y8033     
[ 1607.148003] RIP: 0010:[<ffffffff81033de2>]  [<ffffffff81033de2>] flush_tlb_others_ipi+0x102/0x140
[ 1607.148003] RSP: 0000:ffff881147bfdc48  EFLAGS: 00000246
[ 1607.148003] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000010
[ 1607.148003] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ffffffff81bbcfc8
[ 1607.148003] RBP: ffff881147bfdc78 R08: 0000000000000000 R09: 0000000000000000
[ 1607.148003] R10: 0000000000000002 R11: ffff8811475a8580 R12: ffffffff81460ab4
[ 1607.148003] R13: 000000000000000e R14: ffff881147bfdba8 R15: ffff8808ca7e0c80
[ 1607.148003] FS:  00007fbaa8d43780(0000) GS:ffff88117fd80000(0000) knlGS:0000000000000000
[ 1607.148003] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1607.148003] CR2: 00007fff56156db8 CR3: 00000011473a0000 CR4: 00000000000006e0
[ 1607.148003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1607.148003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1607.148003] Process udevd (pid: 1068, threadinfo ffff881147bfc000, task ffff881146fe96c0)
[ 1607.148003] Stack:
[ 1607.148003]  ffff881147bfdcb8 ffff881146a20d80 00007fff56156db8 ffff881146a20de0
[ 1607.148003]  ffff88114739f818 ffff881147776ab0 ffff881147bfdc88 ffffffff81033e29
[ 1607.148003]  ffff881147bfdcb8 ffffffff81033f2a ffff881146a20df8 0000000000000001
[ 1607.148003] Call Trace:
[ 1607.148003]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1607.148003]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1607.148003]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1607.148003]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1607.148003]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1607.148003]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1607.148003]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1607.148003]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1607.148003]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1607.148003] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 91 14 9b 00 41 8d b7 cf 00 00 00 4c 89 e7 ff 90 d0 00 00 00 eb 09 0f 1f 80 00 00 00 00 f3 90 <8b> 35 00 4f 9b 00 4c 89 e7 e8 80 7a 22 00 85 c0 74 ec eb 84 66 
[ 1607.148003] Call Trace:
[ 1607.148003]  [<ffffffff81033e29>] native_flush_tlb_others+0x9/0x10
[ 1607.148003]  [<ffffffff81033f2a>] flush_tlb_page+0x5a/0xa0
[ 1607.148003]  [<ffffffff81032a4d>] ptep_set_access_flags+0x4d/0x70
[ 1607.148003]  [<ffffffff811268a9>] do_wp_page+0x469/0x7e0
[ 1607.148003]  [<ffffffff81127acd>] handle_pte_fault+0x19d/0x1e0
[ 1607.148003]  [<ffffffff81127c88>] handle_mm_fault+0x178/0x2e0
[ 1607.148003]  [<ffffffff81464315>] do_page_fault+0x1e5/0x490
[ 1607.148003]  [<ffffffff8125a17d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 1607.148003]  [<ffffffff81460cc5>] page_fault+0x25/0x30
[ 1607.553462] CPU 14 MCA banks CMCI:6 CMCI:8
[ 1607.562277] Broke affinity for irq 74
[ 1607.564339] Broke affinity for irq 80
[ 1607.570964] CPU 14 is now offline
[ 1607.576191] CPU 15 MCA banks CMCI:6 CMCI:8
[ 1607.582947] Broke affinity for irq 76
[ 1607.587820] CPU 15 is now offline
[ 1607.591275] lockdep: fixing up alternatives.
[ 1607.595716] SMP alternatives: switching to UP code
[ 1607.656141] lockdep: fixing up alternatives.
[ 1607.660614] SMP alternatives: switching to SMP code
[ 1607.669045] Booting Node 0 Processor 1 APIC 0x2
[ 1607.673721] smpboot cpu 1: start_ip = 97000
[ 1319.218635] Calibrating delay loop (skipped) already calibrated this CPU
[ 1607.697178] NMI watchdog enabled, takes one hw-pmu counter.
[ 1607.715459] lockdep: fixing up alternatives.
[ 1607.719911] Booting Node 0 Processor 2 APIC 0x4
[ 1607.724552] smpboot cpu 2: start_ip = 97000
[ 1319.242949] Calibrating delay loop (skipped) already calibrated this CPU
[ 1607.747636] NMI watchdog enabled, takes one hw-pmu counter.
[ 1607.760177] lockdep: fixing up alternatives.
[ 1607.764552] Booting Node 0 Processor 3 APIC 0x6
[ 1607.769178] smpboot cpu 3: start_ip = 97000
[ 1319.271602] Calibrating delay loop (skipped) already calibrated this CPU
[ 1607.792496] NMI watchdog enabled, takes one hw-pmu counter.
[ 1607.803598] lockdep: fixing up alternatives.


Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-06  6:06                 ` Srivatsa S. Bhat
@ 2012-01-06 10:46                   ` Mel Gorman
  0 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2012-01-06 10:46 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Paul E. McKenney, Russell King - ARM Linux, KOSAKI Motohiro,
	Gilad Ben-Yossef, linux-kernel, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, linux-mm, Pekka Enberg, Matt Mackall,
	Sasha Levin, Rik van Riel, Andi Kleen, Andrew Morton,
	Alexander Viro, Greg KH, linux-fsdevel, Avi Kivity

On Fri, Jan 06, 2012 at 11:36:11AM +0530, Srivatsa S. Bhat wrote:
> On 01/06/2012 03:51 AM, Mel Gorman wrote:
> 
> > (Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
> > suffering from recursive locking recently)
> > 
> > On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> >> On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> >>> On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> >>>> Link please?
> >>>
> >>> Forwarded, as its still in my mailbox.
> >>>
> >>>> I'm including a patch below under development that is
> >>>> intended to only cope with the page allocator case under heavy memory
> >>>> pressure. Currently it does not pass testing because eventually RCU
> >>>> gets stalled with the following trace
> >>>>
> >>>> [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> >>>> [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> >>>> [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> >>>> [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> >>>> [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> >>>> [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> >>>> [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> >>>> [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> >>>> [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> >>>> [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> >>>> [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
> >>>> [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> >>>> [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> >>>> [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> >>>> [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> >>>> [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> >>>> [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> >>>>
> >>>> It might be a separate bug, don't know for sure.
> >>
> > 
> > I rebased the patch on top of 3.2 and tested again with a bunch of
> > debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
> > hotplug is a lot more reliable and less likely to hang but eventually
> > gets into trouble.
> > 
> 
> I was running some CPU hotplug stress tests recently and found it to be
> problematic too. Mel, I have some logs from those tests which appear very
> relevant to the "IPI to offline CPU" issue that has been discussed in this
> thread.
> 
> Kernel: 3.2-rc7
> Here is the log: 
> (Unfortunately I couldn't capture the log intact, due to some annoying
> serial console issues, but I hope this log is good enough to analyze.)
>   

Ok, it looks vaguely similar to what I'm seeing. I think I spotted
the sysfs problem as well and am testing a series. I'll add you to
the cc if it passes tests locally.

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 22:21               ` Mel Gorman
  2012-01-06  6:06                 ` Srivatsa S. Bhat
@ 2012-01-06 13:28                 ` Greg KH
  2012-01-06 14:09                   ` Mel Gorman
  1 sibling, 1 reply; 37+ messages in thread
From: Greg KH @ 2012-01-06 13:28 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Paul E. McKenney, Russell King - ARM Linux, KOSAKI Motohiro,
	Gilad Ben-Yossef, linux-kernel, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, linux-mm, Pekka Enberg, Matt Mackall,
	Sasha Levin, Rik van Riel, Andi Kleen, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, Jan 05, 2012 at 10:21:16PM +0000, Mel Gorman wrote:
> (Adding Greg to cc to see if he recalls seeing issues with sysfs dentry
> suffering from recursive locking recently)
> 
> On Thu, Jan 05, 2012 at 10:35:04AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 05, 2012 at 04:35:29PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> > > > Link please?
> > > 
> > > Forwarded, as its still in my mailbox.
> > > 
> > > > I'm including a patch below under development that is
> > > > intended to only cope with the page allocator case under heavy memory
> > > > pressure. Currently it does not pass testing because eventually RCU
> > > > gets stalled with the following trace
> > > > 
> > > > [ 1817.176001]  [<ffffffff810214d7>] arch_trigger_all_cpu_backtrace+0x87/0xa0
> > > > [ 1817.176001]  [<ffffffff810c4779>] __rcu_pending+0x149/0x260
> > > > [ 1817.176001]  [<ffffffff810c48ef>] rcu_check_callbacks+0x5f/0x110
> > > > [ 1817.176001]  [<ffffffff81068d7f>] update_process_times+0x3f/0x80
> > > > [ 1817.176001]  [<ffffffff8108c4eb>] tick_sched_timer+0x5b/0xc0
> > > > [ 1817.176001]  [<ffffffff8107f28e>] __run_hrtimer+0xbe/0x1a0
> > > > [ 1817.176001]  [<ffffffff8107f581>] hrtimer_interrupt+0xc1/0x1e0
> > > > [ 1817.176001]  [<ffffffff81020ef3>] smp_apic_timer_interrupt+0x63/0xa0
> > > > [ 1817.176001]  [<ffffffff81449073>] apic_timer_interrupt+0x13/0x20
> > > > [ 1817.176001]  [<ffffffff8116c135>] vfsmount_lock_local_lock+0x25/0x30
> > > > [ 1817.176001]  [<ffffffff8115c855>] path_init+0x2d5/0x370
> > > > [ 1817.176001]  [<ffffffff8115eecd>] path_lookupat+0x2d/0x620
> > > > [ 1817.176001]  [<ffffffff8115f4ef>] do_path_lookup+0x2f/0xd0
> > > > [ 1817.176001]  [<ffffffff811602af>] user_path_at_empty+0x9f/0xd0
> > > > [ 1817.176001]  [<ffffffff81154e7b>] vfs_fstatat+0x4b/0x90
> > > > [ 1817.176001]  [<ffffffff81154f4f>] sys_newlstat+0x1f/0x50
> > > > [ 1817.176001]  [<ffffffff81448692>] system_call_fastpath+0x16/0x1b
> > > > 
> > > > It might be a separate bug, don't know for sure.
> > 
> 
> I rebased the patch on top of 3.2 and tested again with a bunch of
> debugging options set (PROVE_RCU, PROVE_LOCKING etc). Same results. CPU
> hotplug is a lot more reliable and less likely to hang but eventually
> gets into trouble.
> 
> Taking a closer look though, I don't think this is an RCU problem. It's
> just the messenger.
> 
> > Do you get multiple RCU CPU stall-warning messages? 
> 
> Yes, one roughly every 50000 jiffies or so (HZ=250).
> 
> [  878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
> [  878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
> [ 1072.878669] INFO: rcu_sched detected stall on CPU 3 (t=65030 jiffies)
> [ 1072.878672] INFO: rcu_sched detected stall on CPU 6 (t=65030 jiffies)
> [ 1267.442308] INFO: rcu_sched detected stall on CPU 3 (t=113810 jiffies)
> [ 1267.442312] INFO: rcu_sched detected stall on CPU 6 (t=113810 jiffies)
> [ 1462.005948] INFO: rcu_sched detected stall on CPU 3 (t=162590 jiffies)
> [ 1462.005952] INFO: rcu_sched detected stall on CPU 6 (t=162590 jiffies)
> [ 1656.569588] INFO: rcu_sched detected stall on CPU 3 (t=211370 jiffies)
> [ 1656.569592] INFO: rcu_sched detected stall on CPU 6 (t=211370 jiffies)
> [ 1851.133229] INFO: rcu_sched detected stall on CPU 6 (t=260150 jiffies)
> [ 1851.133233] INFO: rcu_sched detected stall on CPU 3 (t=260150 jiffies)
> [ 2045.696868] INFO: rcu_sched detected stall on CPU 3 (t=308930 jiffies)
> [ 2045.696872] INFO: rcu_sched detected stall on CPU 6 (t=308930 jiffies)
> [ 2240.260508] INFO: rcu_sched detected stall on CPU 6 (t=357710 jiffies)
> [ 2240.260511] INFO: rcu_sched detected stall on CPU 3 (t=357710 jiffies)
> 
> > If so, it can
> > be helpful to look at how the stack frame changes over time.  These
> > stalls are normally caused by a loop in the kernel with preemption
> > disabled, though other scenarios can also cause them.
> > 
> 
> The stacks are not changing much over time and start with this;
> 
> [  878.315029] INFO: rcu_sched detected stall on CPU 3 (t=16250 jiffies)
> [  878.315032] INFO: rcu_sched detected stall on CPU 6 (t=16250 jiffies)
> [  878.315036] Pid: 4422, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
> [  878.315037] Call Trace:
> [  878.315038]  <IRQ>  [<ffffffff810a8b20>] __rcu_pending+0x8e/0x36c
> [  878.315052]  [<ffffffff81071b9a>] ? tick_nohz_handler+0xdc/0xdc
> [  878.315054]  [<ffffffff810a8f04>] rcu_check_callbacks+0x106/0x172
> [  878.315056]  [<ffffffff810528e0>] update_process_times+0x3f/0x76
> [  878.315058]  [<ffffffff81071c0a>] tick_sched_timer+0x70/0x9a
> [  878.315060]  [<ffffffff8106654e>] __run_hrtimer+0xc7/0x157
> [  878.315062]  [<ffffffff810667ec>] hrtimer_interrupt+0xba/0x18a
> [  878.315065]  [<ffffffff8134fbad>] smp_apic_timer_interrupt+0x86/0x99
> [  878.315067]  [<ffffffff8134dbf3>] apic_timer_interrupt+0x73/0x80
> [  878.315068]  <EOI>  [<ffffffff81345f34>] ? retint_restore_args+0x13/0x13
> [  878.315072]  [<ffffffff81139591>] ? __shrink_dcache_sb+0x7d/0x19f
> [  878.315075]  [<ffffffff81008c6e>] ? native_read_tsc+0x1/0x16
> [  878.315077]  [<ffffffff811df434>] ? delay_tsc+0x3a/0x82
> [  878.315079]  [<ffffffff811df4a1>] __delay+0xf/0x11
> [  878.315081]  [<ffffffff811e51e5>] do_raw_spin_lock+0xb5/0xf9
> [  878.315083]  [<ffffffff81345561>] _raw_spin_lock+0x39/0x3d
> [  878.315085]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
> [  878.315087]  [<ffffffff8113972a>] shrink_dcache_parent+0x77/0x28c
> [  878.315089]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
> [  878.315092]  [<ffffffff81185970>] sysfs_dentry_revalidate+0xaa/0xbe
> [  878.315093]  [<ffffffff8112e731>] do_lookup+0x263/0x2fc
> [  878.315096]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
> [  878.315098]  [<ffffffff8112f33d>] link_path_walk+0x1e2/0x763
> [  878.315099]  [<ffffffff8112fd66>] path_lookupat+0x5c/0x61a
> [  878.315102]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
> [  878.315104]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
> [  878.315105]  [<ffffffff8113034e>] do_path_lookup+0x2a/0xa8
> [  878.315107]  [<ffffffff81132a51>] user_path_at_empty+0x5d/0x97
> [  878.315109]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
> [  878.315111]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
> [  878.315112]  [<ffffffff81132a9c>] user_path_at+0x11/0x13
> [  878.315115]  [<ffffffff81128b64>] vfs_fstatat+0x44/0x71
> [  878.315117]  [<ffffffff81128bef>] vfs_lstat+0x1e/0x20
> [  878.315118]  [<ffffffff81128c10>] sys_newlstat+0x1f/0x40
> [  878.315120]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
> [  878.315122]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  878.315124]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
> [  878.315126]  [<ffffffff8134d082>] system_call_fastpath+0x16/0x1b
> [  878.557790] Pid: 5704, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
> [  878.564226] Call Trace:
> [  878.566677]  <IRQ>  [<ffffffff810a8b20>] __rcu_pending+0x8e/0x36c
> [  878.572783]  [<ffffffff81071b9a>] ? tick_nohz_handler+0xdc/0xdc
> [  878.578702]  [<ffffffff810a8f04>] rcu_check_callbacks+0x106/0x172
> [  878.584794]  [<ffffffff810528e0>] update_process_times+0x3f/0x76
> [  878.590798]  [<ffffffff81071c0a>] tick_sched_timer+0x70/0x9a
> [  878.596459]  [<ffffffff8106654e>] __run_hrtimer+0xc7/0x157
> [  878.601944]  [<ffffffff810667ec>] hrtimer_interrupt+0xba/0x18a
> [  878.607778]  [<ffffffff8134fbad>] smp_apic_timer_interrupt+0x86/0x99
> [  878.614129]  [<ffffffff8134dbf3>] apic_timer_interrupt+0x73/0x80
> [  878.620134]  <EOI>  [<ffffffff81051e66>] ? run_timer_softirq+0x49/0x32a
> [  878.626759]  [<ffffffff81139591>] ? __shrink_dcache_sb+0x7d/0x19f
> [  878.632851]  [<ffffffff811df402>] ? delay_tsc+0x8/0x82
> [  878.637988]  [<ffffffff811df4a1>] __delay+0xf/0x11
> [  878.642778]  [<ffffffff811e51e5>] do_raw_spin_lock+0xb5/0xf9
> [  878.648437]  [<ffffffff81345561>] _raw_spin_lock+0x39/0x3d
> [  878.653920]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
> [  878.660186]  [<ffffffff8113972a>] shrink_dcache_parent+0x77/0x28c
> [  878.666277]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
> [  878.672107]  [<ffffffff81185970>] sysfs_dentry_revalidate+0xaa/0xbe
> [  878.678372]  [<ffffffff8112e731>] do_lookup+0x263/0x2fc
> [  878.683596]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
> [  878.690207]  [<ffffffff8112f33d>] link_path_walk+0x1e2/0x763
> [  878.695866]  [<ffffffff8112fd66>] path_lookupat+0x5c/0x61a
> [  878.701350]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
> [  878.706747]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
> [  878.712145]  [<ffffffff8113034e>] do_path_lookup+0x2a/0xa8
> [  878.717630]  [<ffffffff81132a51>] user_path_at_empty+0x5d/0x97
> [  878.723463]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
> [  878.729295]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
> [  878.736080]  [<ffffffff81132a9c>] user_path_at+0x11/0x13
> [  878.741391]  [<ffffffff81128b64>] vfs_fstatat+0x44/0x71
> [  878.746616]  [<ffffffff81128bef>] vfs_lstat+0x1e/0x20
> [  878.751668]  [<ffffffff81128c10>] sys_newlstat+0x1f/0x40
> [  878.756981]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
> [  878.763678]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  878.770116]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
> [  878.775949]  [<ffffffff8134d082>] system_call_fastpath+0x16/0x1b
> [  908.769486] BUG: spinlock lockup on CPU#6, udevd/4422
> [  908.774547]  lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4
> 
> Seeing that the owner was CPU 4, I found earlier in the log
> 
> [  815.244051] BUG: spinlock lockup on CPU#4, udevd/5709
> [  815.249103]  lock: ffff8803b4c701c8, .magic: dead4ead, .owner: udevd/5709, .owner_cpu: 4
> [  815.258430] Pid: 5709, comm: udevd Not tainted 3.2.0-guardipi-v1r6 #2
> [  815.264866] Call Trace:
> [  815.267329]  [<ffffffff811e507d>] spin_dump+0x88/0x8d
> [  815.272388]  [<ffffffff811e5206>] do_raw_spin_lock+0xd6/0xf9
> [  815.278062]  [<ffffffff81345561>] ? _raw_spin_lock+0x39/0x3d
> [  815.283720]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
> [  815.289986]  [<ffffffff8113972a>] ? shrink_dcache_parent+0x77/0x28c
> [  815.296249]  [<ffffffff8113741d>] ? have_submounts+0x13e/0x1bd
> [  815.302080]  [<ffffffff81185970>] ? sysfs_dentry_revalidate+0xaa/0xbe
> [  815.308515]  [<ffffffff8112e731>] ? do_lookup+0x263/0x2fc
> [  815.313915]  [<ffffffff8119ca13>] ? security_inode_permission+0x1e/0x20
> [  815.320524]  [<ffffffff8112f33d>] ? link_path_walk+0x1e2/0x763
> [  815.326357]  [<ffffffff8112fd66>] ? path_lookupat+0x5c/0x61a
> [  815.332014]  [<ffffffff810f4810>] ? might_fault+0x89/0x8d
> [  815.337410]  [<ffffffff810f47c7>] ? might_fault+0x40/0x8d
> [  815.342807]  [<ffffffff8113034e>] ? do_path_lookup+0x2a/0xa8
> [  815.348465]  [<ffffffff81132a51>] ? user_path_at_empty+0x5d/0x97
> [  815.354474]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
> [  815.360303]  [<ffffffff81345c4f>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
> [  815.367085]  [<ffffffff81132a9c>] ? user_path_at+0x11/0x13
> [  815.372569]  [<ffffffff81128b64>] ? vfs_fstatat+0x44/0x71
> [  815.377965]  [<ffffffff81128bef>] ? vfs_lstat+0x1e/0x20
> [  815.383192]  [<ffffffff81128c10>] ? sys_newlstat+0x1f/0x40
> [  815.388676]  [<ffffffff810759a8>] ? trace_hardirqs_on_caller+0x12d/0x164
> [  815.395373]  [<ffffffff811e057e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [  815.401811]  [<ffffffff8107447f>] ? trace_hardirqs_off+0xd/0xf
> [  815.407642]  [<ffffffff8134d082>] ? system_call_fastpath+0x16/0x1b
> 
> The trace is not particularly useful but it looks like it
> recursively locked even though the message doesn't say that.  If the
> shrink_dcache_parent() entry is accurate, that corresponds to this
> 
> static int select_parent(struct dentry * parent)
> {
>         struct dentry *this_parent;
>         struct list_head *next;
>         unsigned seq;
>         int found = 0;
>         int locked = 0;
> 
>         seq = read_seqbegin(&rename_lock);
> again: 
>         this_parent = parent;
>         spin_lock(&this_parent->d_lock); <----- HERE
> 
> I'm not overly clear on how VFS locking is meant to work but it almost
> looks as if the last reference to an inode is being dropped during a
> sysfs path lookup. Is that meant to happen?
> 
> Judging by sysfs_dentry_revalidate() - possibly not. It looks like
> we must have reached out_bad: and called shrink_dcache_parent() on a
> dentry that was already locked by the running process. Not sure how
> this could have happened - Greg, does this look familiar?

I don't know.  I'm working with some others who are trying to trace down
a sysfs lockup bug when files go away and are created very quickly and
userspace tries to stat them, but I'm not quite sure whether this is
the same issue or not.

Are the sysfs files that you are having problems with being removed?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-06 13:28                 ` Greg KH
@ 2012-01-06 14:09                   ` Mel Gorman
  0 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2012-01-06 14:09 UTC (permalink / raw)
  To: Greg KH
  Cc: Paul E. McKenney, Russell King - ARM Linux, KOSAKI Motohiro,
	Gilad Ben-Yossef, linux-kernel, Chris Metcalf, Peter Zijlstra,
	Frederic Weisbecker, linux-mm, Pekka Enberg, Matt Mackall,
	Sasha Levin, Rik van Riel, Andi Kleen, Andrew Morton,
	Alexander Viro, linux-fsdevel, Avi Kivity

On Fri, Jan 06, 2012 at 05:28:47AM -0800, Greg KH wrote:
> > <SNIP>
> > 
> > I'm not overly clear on how VFS locking is meant to work but it almost
> > looks as if the last reference to an inode is being dropped during a
> > sysfs path lookup. Is that meant to happen?
> > 
> > Judging by sysfs_dentry_revalidate() - possibly not. It looks like
> > we must have reached out_bad: and called shrink_dcache_parent() on a
> > dentry that was already locked by the running process. Not sure how
> > this could have happened - Greg, does this look familiar?
> 
> I don't know.  I'm working with some others who are trying to trace down
> a sysfs lockup bug when files go away and are created very quickly and
> userspace tries to stat them, but I'm not quite sure this is the same
> issue or not.
> 

It seems similar.

> Are these sysfs files being removed that you are having problems with?
> 

Yes, considering that cpu hot-remove is happening around the same time
which results in sysfs files and directories being removed. I'm
currently testing the following patch in conjunction with a page
allocator fix. It's still running after 5 hours which is good but will
take some time to complete.

This patch is part of a short series I planned to post on Monday if
tests complete successfully. The changelog has an ample amount of
guesswork in there.

---8<---
From: Mel Gorman <mgorman@suse.de>
Subject: [PATCH] fs: sysfs: Do dcache-related updates to sysfs dentries under sysfs_mutex

While running a CPU hotplug stress test under memory pressure, a
spinlock lockup was detected due to what looks like sysfs recursively
taking a lock on a dentry. When this happens varies considerably
and is difficult to trigger.

[  482.345588] BUG: spinlock lockup on CPU#2, udevd/4400
[  482.345590]  lock: ffff8803075be0d0, .magic: dead4ead, .owner: udevd/5689, .owner_cpu: 0
[  482.345592] Pid: 4400, comm: udevd Not tainted 3.2.0-vanilla #1
[  482.345592] Call Trace:
[  482.345595]  [<ffffffff811e4ffd>] spin_dump+0x88/0x8d
[  482.345597]  [<ffffffff811e5186>] do_raw_spin_lock+0xd6/0xf9
[  482.345599]  [<ffffffff813454e1>] _raw_spin_lock+0x39/0x3d
[  482.345601]  [<ffffffff811396b6>] ? shrink_dcache_parent+0x77/0x28c
[  482.345603]  [<ffffffff811396b6>] shrink_dcache_parent+0x77/0x28c
[  482.345605]  [<ffffffff811373a9>] ? have_submounts+0x13e/0x1bd
[  482.345607]  [<ffffffff811858f8>] sysfs_dentry_revalidate+0xaa/0xbe
[  482.345608]  [<ffffffff8112e6bd>] do_lookup+0x263/0x2fc
[  482.345610]  [<ffffffff8119c99b>] ? security_inode_permission+0x1e/0x20
[  482.345612]  [<ffffffff8112f2c9>] link_path_walk+0x1e2/0x763
[  482.345614]  [<ffffffff8112fcf2>] path_lookupat+0x5c/0x61a
[  482.345616]  [<ffffffff810f479c>] ? might_fault+0x89/0x8d
[  482.345618]  [<ffffffff810f4753>] ? might_fault+0x40/0x8d
[  482.345619]  [<ffffffff811302da>] do_path_lookup+0x2a/0xa8
[  482.345621]  [<ffffffff811329dd>] user_path_at_empty+0x5d/0x97
[  482.345623]  [<ffffffff8107441b>] ? trace_hardirqs_off+0xd/0xf
[  482.345625]  [<ffffffff81345bcf>] ? _raw_spin_unlock_irqrestore+0x44/0x5a
[  482.345627]  [<ffffffff81132a28>] user_path_at+0x11/0x13
[  482.345629]  [<ffffffff81128af0>] vfs_fstatat+0x44/0x71
[  482.345631]  [<ffffffff81128b7b>] vfs_lstat+0x1e/0x20
[  482.345632]  [<ffffffff81128b9c>] sys_newlstat+0x1f/0x40
[  482.345634]  [<ffffffff81075944>] ? trace_hardirqs_on_caller+0x12d/0x164
[  482.345636]  [<ffffffff811e04fe>] ?  trace_hardirqs_on_thunk+0x3a/0x3f
[  482.345638]  [<ffffffff8107441b>] ? trace_hardirqs_off+0xd/0xf
[  482.345640]  [<ffffffff8134d002>] system_call_fastpath+0x16/0x1b
[  482.515004]  [<ffffffff8107441b>] ? trace_hardirqs_off+0xd/0xf
[  482.520870]  [<ffffffff8134d002>] system_call_fastpath+0x16/0x1b

At this point, CPU hotplug stops and other processes get stuck in a
similar deadlock waiting for 5689 to unlock. RCU reports stalls but
it is collateral damage.

Most of the deadlocked processes have sysfs_dentry_revalidate()
in common and while the cause of the deadlock is unclear to me, it
feels like a race between udev receiving an fsnotify for cpuonline
versus udev receiving another fsnotify for cpuoffline.

During online or offline, a number of dentries are being created and
deleted. udev is receiving fsnotifies of the activity. I suspect that,
due to insufficient locking, one of the fsnotifies operates on a dentry
that is in the process of being dropped from the dcache. Looking at
sysfs, it looks like there is a global sysfs_mutex that protects the
sysfs directory tree from concurrent reclaims. Almost all operations
involving directory inodes and dentries take place under the
sysfs_mutex - linking, unlinking, path lookup, renames and readdir.

d_invalidate is slightly different. It is mostly under the mutex but
if the dentry has to be removed from the dcache, the mutex is dropped.
This patch holds the mutex for the dcache operation to protect the
dentry from concurrent operations while it is being dropped. Once
applied, this particular bug no longer occurs.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 fs/sysfs/dir.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 7fdf6a7..acaf21d 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -279,8 +279,8 @@ static int sysfs_dentry_revalidate(struct dentry *dentry, struct nameidata *nd)
 	if (strcmp(dentry->d_name.name, sd->s_name) != 0)
 		goto out_bad;
 
-	mutex_unlock(&sysfs_mutex);
 out_valid:
+	mutex_unlock(&sysfs_mutex);
 	return 1;
 out_bad:
 	/* Remove the dentry from the dcache hashes.
@@ -294,7 +294,6 @@ out_bad:
 	 * to the dcache hashes.
 	 */
 	is_dir = (sysfs_type(sd) == SYSFS_DIR);
-	mutex_unlock(&sysfs_mutex);
 	if (is_dir) {
 		/* If we have submounts we must allow the vfs caches
 		 * to lie about the state of the filesystem to prevent
@@ -305,6 +304,7 @@ out_bad:
 		shrink_dcache_parent(dentry);
 	}
 	d_drop(dentry);
+	mutex_unlock(&sysfs_mutex);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 16:17         ` Mel Gorman
  2012-01-05 16:35           ` Russell King - ARM Linux
  2012-01-05 22:06           ` Andrew Morton
@ 2012-01-07 16:52           ` Paul E. McKenney
  2012-01-07 17:05             ` Paul E. McKenney
  2 siblings, 1 reply; 37+ messages in thread
From: Paul E. McKenney @ 2012-01-07 16:52 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Andrew Morton, Alexander Viro, linux-fsdevel,
	Avi Kivity

On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> On Thu, Jan 05, 2012 at 02:40:11PM +0000, Russell King - ARM Linux wrote:
> > On Thu, Jan 05, 2012 at 02:20:17PM +0000, Mel Gorman wrote:

[ . . . ]

> > I've been chasing that patch and getting no replies what so
> > ever from folk like Peter, Thomas and Ingo.
> > 
> > The problem affects all IPI-raising functions, which mask with
> > cpu_online_mask directly.
> 
> Actually, in one sense I'm glad to hear it because from my brief
> poking around, I was having trouble understanding why we were always
> safe from sending IPIs to CPUs in the process of being offlined.

The trick is to disable preemption (not interrupts!) across the IPI, which
prevents CPU-hotplug's stop_machine() from running.  You also have to
have checked that the CPU is online within this same preemption-disabled
section of code.  This means that the outgoing CPU has to accept IPIs
even after its CPU_DOWN_PREPARE notifier has been called -- right up
to the stop_machine() call to take_cpu_down().
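
As a rough sketch of that pattern (not code from this series; "target",
"ipi_func" and "info" are just placeholders), the sending side looks
something like:

        preempt_disable();              /* keeps stop_machine() from running */
        if (cpu_online(target))         /* re-check while preemption is off */
                smp_call_function_single(target, ipi_func, info, 1);
        preempt_enable();

The online check and the IPI have to live in the same
preemption-disabled region; otherwise the target could be taken down
between the check and the call.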

							Thanx, Paul


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-07 16:52           ` Paul E. McKenney
@ 2012-01-07 17:05             ` Paul E. McKenney
  0 siblings, 0 replies; 37+ messages in thread
From: Paul E. McKenney @ 2012-01-07 17:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Andrew Morton, Alexander Viro, linux-fsdevel,
	Avi Kivity

On Sat, Jan 07, 2012 at 08:52:01AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 05, 2012 at 04:17:39PM +0000, Mel Gorman wrote:
> > On Thu, Jan 05, 2012 at 02:40:11PM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Jan 05, 2012 at 02:20:17PM +0000, Mel Gorman wrote:
> 
> [ . . . ]
> 
> > > I've been chasing that patch and getting no replies what so
> > > ever from folk like Peter, Thomas and Ingo.
> > > 
> > > The problem affects all IPI-raising functions, which mask with
> > > cpu_online_mask directly.
> > 
> > Actually, in one sense I'm glad to hear it because from my brief
> > poking around, I was having trouble understanding why we were always
> > safe from sending IPIs to CPUs in the process of being offlined.
> 
> The trick is to disable preemption (not interrupts!) across the IPI, which
> prevents CPU-hotplug's stop_machine() from running.  You also have to
> have checked that the CPU is online within this same preemption-disabled
> section of code.  This means that the outgoing CPU has to accept IPIs
> even after its CPU_DOWN_PREPARE notifier has been called -- right up
> to the stop_machine() call to take_cpu_down().

Of course, another trick is to hold the CPU-hotplug lock across the IPI,
but this is quite a bit more heavy-weight than disabling preemption.
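
Roughly (again only a sketch, with the same placeholder names as the
earlier one):

        get_online_cpus();              /* blocks CPU hotplug entirely */
        if (cpu_online(target))
                smp_call_function_single(target, ipi_func, info, 1);
        put_online_cpus();

get_online_cpus() can sleep and serializes against the whole hotplug
path, which is what makes it heavier than a preempt_disable() section.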

							Thanx, Paul

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 15:54   ` Mel Gorman
@ 2012-01-08 16:01     ` Gilad Ben-Yossef
  0 siblings, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-08 16:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin,
	Rik van Riel, Andi Kleen, Andrew Morton, Alexander Viro,
	linux-fsdevel, Avi Kivity

On Thu, Jan 5, 2012 at 5:54 PM, Mel Gorman <mel@csn.ul.ie> wrote:
>
> On Mon, Jan 02, 2012 at 12:24:18PM +0200, Gilad Ben-Yossef wrote:


>
> > Tested by running "hackbench 400" on a 4 CPU x86 otherwise
> > idle VM and observing the difference between the number
> > of direct reclaim attempts that end up in drain_all_pages()
> > and those where more than 1/2 of the online CPUs had any
> > per-cpu page in them, using the vmstat counters introduced
> > in the next patch in the series and using proc/interrupts.
> >
> > In the test scenario, this saved around 500 global IPIs.
> > After triggering an OOM:
> >
> > $ cat /proc/vmstat
> > ...
> > pcp_global_drain 627
> > pcp_global_ipi_saved 578
> >
>
> This isn't 99% savings as you claim earlier but they are still great.
>

You are right of course, more like 92%. I did see test runs where the
percentage was 99% (which is where the 99% number came from). I never
saw it drop below 90% for the specified test load.

I modified the description to read 90%+. I guess that is good enough.

> Thanks for doing the stats. Just to be clear, I didn't expect these
> stats to be merged, nor do I want them to. I wanted to be sure the patch
> was really behaving as advertised.
>
> Acked-by: Mel Gorman <mgorman@suse.de>
>
Of course, my pleasure and thanks for the review.
>
>
> > +     for_each_online_cpu(cpu)
> > +             for_each_populated_zone(zone) {
> > +                     pcp = per_cpu_ptr(zone->pageset, cpu);
> > +                     if (pcp->pcp.count)
> > +                             cpumask_set_cpu(cpu, cpus_with_pcps);
> > +                     else
> > +                             cpumask_clear_cpu(cpu, cpus_with_pcps);
> > +             }
> > +     on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1);
>
> As a heads-up, I'm looking at a candidate CPU hotplug patch that almost
> certainly will collide with this patch. If/when I get it fixed, I'll be
> sure to CC you so we can figure out what order the patches need to go
> in. Ordinarily it wouldn't matter but if this really is a CPU hotplug
> fix, it might also be a -stable candidate so it would need to go in
> before your patches.


No problem. I'm sending v6 right now because of unrelated changes Andrew M.
asked for. I'll be happy to re-base on top of CPU hotplug fixes later.

Thanks,
Gilad


--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function
  2012-01-03 22:26   ` Andrew Morton
  2012-01-05 13:17     ` Michal Nazarewicz
@ 2012-01-08 16:04     ` Gilad Ben-Yossef
  1 sibling, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-08 16:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	Russell King, linux-mm, Pekka Enberg, Matt Mackall, Rik van Riel,
	Andi Kleen, Sasha Levin, Mel Gorman, Alexander Viro,
	linux-fsdevel, Avi Kivity

On Wed, Jan 4, 2012 at 12:26 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon,  2 Jan 2012 12:24:12 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>> on_each_cpu_mask calls a function on processors specified my cpumask,
>> which may include the local processor.
>>
>> All the limitation specified in smp_call_function_many apply.
>>
>> ...
>>
>> --- a/include/linux/smp.h
>> +++ b/include/linux/smp.h
>> @@ -102,6 +102,13 @@ static inline void call_function_init(void) { }
>>  int on_each_cpu(smp_call_func_t func, void *info, int wait);
>>
>>  /*
>> + * Call a function on processors specified by mask, which might include
>> + * the local one.
>> + */
>> +void on_each_cpu_mask(const struct cpumask *mask, void (*func)(void *),
>> +             void *info, bool wait);
>> +
>> +/*
>>   * Mark the boot cpu "online" so that it can call console drivers in
>>   * printk() and can access its per-cpu storage.
>>   */
>> @@ -132,6 +139,15 @@ static inline int up_smp_call_function(smp_call_func_t func, void *info)
>>               local_irq_enable();             \
>>               0;                              \
>>       })
>> +#define on_each_cpu_mask(mask, func, info, wait) \
>> +     do {                                            \
>> +             if (cpumask_test_cpu(0, (mask))) {      \
>> +                     local_irq_disable();            \
>> +                     (func)(info);                   \
>> +                     local_irq_enable();             \
>> +             }                                       \
>> +     } while (0)
>
> Why is the cpumask_test_cpu() call there?  It's hard to think of a
> reason why "mask" would specify any CPU other than "0" in a
> uniprocessor kernel.

As Michal already answered, because the current CPU might not be
specified in the mask, even on UP.
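
To make that concrete, a hypothetical caller might build the mask like
this (flush_pending_cpus(), cpu_has_pending_work() and do_flush() are
made-up names, not existing kernel functions):

        static void flush_pending_cpus(void)
        {
                cpumask_var_t mask;
                int cpu;

                if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
                        return;
                for_each_online_cpu(cpu)
                        if (cpu_has_pending_work(cpu))
                                cpumask_set_cpu(cpu, mask);
                /* mask may well be empty, so even the UP stub must honour it */
                on_each_cpu_mask(mask, do_flush, NULL, 1);
                free_cpumask_var(mask);
        }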

> If this code remains as-is, please add a comment here explaining this,
> so others don't wonder the same thing.

Comment added and will be included in V6.

Thanks for the review.

Gilad



-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func
  2012-01-03 22:34   ` Andrew Morton
@ 2012-01-08 16:09     ` Gilad Ben-Yossef
  0 siblings, 0 replies; 37+ messages in thread
From: Gilad Ben-Yossef @ 2012-01-08 16:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Chris Metcalf, Christoph Lameter, Peter Zijlstra,
	Frederic Weisbecker, Russell King, linux-mm, Pekka Enberg,
	Matt Mackall, Sasha Levin, Rik van Riel, Andi Kleen,
	Alexander Viro, linux-fsdevel, Avi Kivity

On Wed, Jan 4, 2012 at 12:34 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Mon,  2 Jan 2012 12:24:15 +0200
> Gilad Ben-Yossef <gilad@benyossef.com> wrote:
>
>> Add the on_each_cpu_required() function that wraps on_each_cpu_mask()
>> and calculates the cpumask of cpus to IPI by calling a function supplied
>> as a parameter in order to determine whether to IPI each specific cpu.
>
> The name is actually "on_each_cpu_cond".

Oops... I started out with on_each_cpu_required as a name and switched,
but missed updating the description. Thanks for pointing it out.

<SNIP>

>> + * Call a function on each processor for which the supplied function
>> + * cond_func returns a positive value. This may include the local
>> + * processor, optionally waiting for all the required CPUs to finish.
>> + * The function may be called on all online CPUs without running the
>> + * cond_func function in extreme circumstance (memory allocation
>> + * failure condition when CONFIG_CPUMASK_OFFSTACK=y)
>> + * All the limitations specified in smp_call_function_many apply.
>> + */
>> +void on_each_cpu_cond(int (*cond_func) (int cpu, void *info),
>> +                     void (*func)(void *), void *info, bool wait)
>> +{
>> +     cpumask_var_t cpus;
>> +     int cpu;
>> +
>> +     if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
>> +             for_each_online_cpu(cpu)
>> +                     if (cond_func(cpu, info))
>> +                             cpumask_set_cpu(cpu, cpus);
>> +             on_each_cpu_mask(cpus, func, info, wait);
>> +             free_cpumask_var(cpus);
>> +     } else
>> +             on_each_cpu(func, info, wait);
>> +}
>> +EXPORT_SYMBOL(on_each_cpu_cond);
>
> If zalloc_cpumask_var() fails, can we not fall back to
>
>                for_each_online_cpu(cpu)
>                        if (cond_func(cpu, info))
>                                smp_call_function_single(...);
>

Indeed we can and probably should :-)

I'll send out v6 with this and other fixes momentarily.
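
For reference, a minimal sketch of the shape that fallback could take
(this is not the actual v6 code):

        void on_each_cpu_cond(int (*cond_func)(int cpu, void *info),
                        void (*func)(void *), void *info, bool wait)
        {
                cpumask_var_t cpus;
                int cpu;

                if (likely(zalloc_cpumask_var(&cpus, GFP_ATOMIC))) {
                        for_each_online_cpu(cpu)
                                if (cond_func(cpu, info))
                                        cpumask_set_cpu(cpu, cpus);
                        on_each_cpu_mask(cpus, func, info, wait);
                        free_cpumask_var(cpus);
                } else {
                        /* allocation failed: IPI matching CPUs one by one */
                        for_each_online_cpu(cpu)
                                if (cond_func(cpu, info))
                                        smp_call_function_single(cpu, func,
                                                                 info, wait);
                }
        }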

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker
gilad@benyossef.com
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
  2012-01-05 23:19               ` Andrew Morton
@ 2012-01-09 17:25                 ` Mel Gorman
  0 siblings, 0 replies; 37+ messages in thread
From: Mel Gorman @ 2012-01-09 17:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Russell King - ARM Linux, KOSAKI Motohiro, Gilad Ben-Yossef,
	linux-kernel, Chris Metcalf, Peter Zijlstra, Frederic Weisbecker,
	linux-mm, Pekka Enberg, Matt Mackall, Sasha Levin, Rik van Riel,
	Andi Kleen, Alexander Viro, linux-fsdevel, Avi Kivity

On Thu, Jan 05, 2012 at 03:19:19PM -0800, Andrew Morton wrote:
> On Thu, 5 Jan 2012 22:31:06 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Thu, Jan 05, 2012 at 02:06:45PM -0800, Andrew Morton wrote:
> > > On Thu, 5 Jan 2012 16:17:39 +0000
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > 
> > > > mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> > > > 
> > > > While running a CPU hotplug stress test under memory pressure, I
> > > > saw cases where under enough stress the machine would halt although
> > > > it required a machine with 8 cores and plenty memory. I think the
> > > > problems may be related.
> > > 
> > > When we first implemented them, the percpu pages in the page allocator
> > > were of really really marginal benefit.  I didn't merge the patches at
> > > all for several cycles, and it was eventually a 49/51 decision.
> > > 
> > > So I suggest that our approach to solving this particular problem
> > > should be to nuke the whole thing, then see if that caused any
> > > observeable problems.  If it did, can we solve those problems by means
> > > other than bringing the dang things back?
> > > 
> > 
> > Sounds drastic.
> 
> Wrong thinking ;)
> 

:)

> Simplifying the code should always be the initial proposal.  Adding
> more complexity on top is the worst-case when-all-else-failed option. 
> Yet we so often reach for that option first :(
> 

Enngghh, I really want to agree with you, but reducing lock contention
has been such an important goal for a long time that I am really loath
to just rip it out and hope for the best.

> > It would be less controversial to replace this patch
> > with a version that calls get_online_cpu() in drain_all_pages() but
> > remove the call to drain_all_pages() call from the page allocator on
> > the grounds it is not safe against CPU hotplug and to hell with the
> > slightly elevated allocation failure rates and stalls. That would avoid
> > the try_get_online_cpus() crappiness and be less complex.
> 
> If we can come up with a reasonably simple patch which improves or even
> fixes the problem then I suppose there is some value in that, as it
> provides users of earlier kernels with something to backport if they
> hit problems.
> 

I'm preparing a patch that is a simpler fix and does not send an IPI at
all. There is also a sysfs fix that is necessary for tests to complete
successfully. The details will be in the series.

> But the social downside of that is that everyone would shuffle off
> towards other bright and shiny things and we'd be stuck with more
> complexity piled on top of dubiously beneficial code.
> 
> > If you really want to consider deleting the per-cpu allocator, maybe
> > it could be a LSF/MM topic?
> 
> eek, spare me.
> 

It was worth a shot.

> Anyway, we couldn't discuss such a topic without data.  Such data would
> be obtained by deleting the code and measuring the results.  Which is
> what I just said ;)
> 

Crap. OK. I've added a TODO item to implement a patch that removes it.
It is at a lower priority than removing lumpy reclaim though -
eventually this TODO list will start shrinking. I'll need to put
some thought into how it can be tested but even then I probably am
not the best person to test it. I don't have regular access to a 2+
socket machine to test NUMA effects for example.

> > Personally I would be wary of deleting
> > it but mostly because I lack regular access to the type of hardware
> to evaluate whether it was safe to remove or not. Minimally, removing
> > the per-cpu allocator could make the zone lock very hot even though slub
> > probably makes it very hot already.
> 
> Much of the testing of the initial code was done on mbligh's weirdass
> NUMAq box: 32-way 386 NUMA which suffered really badly if there were
> contention issues.  And even on that box, the code was marginal.  So
> I'm hopeful that things will be similar on current machines.  Of
> course, it's possible that calling patterns have changed in ways which
> make the code more beneficial than it used to be.
> 

Core counts are also higher and some workloads might be more
allocator intensive than they used to be - netperf and network-related
allocations for socket receive might be a problem for example.

> But this all ties into my proposal yesterday to remove
> mm/swap.c:lru_*_pvecs.  Most or all of the heavy one-page-at-a-time
> code can pretty easily be converted to operate on batches of pages. 
>
> Following on from that, it should be pretty simple to extend the
> batching down into the page freeing.  Look at put_pages_list() and
> weep.  And stuff like free_hot_cold_page_list() which could easily free
> the pages directly while batching the locking.
> 
> Page freeing should be relatively straightforward.  Batching page
> allocation is hard in some cases (anonymous pagefaults).
> 

Page faulting would certainly be hard to batch, but it would only be
a big problem if the faults are intensive enough and on enough CPUs to
make zone lock contention a real problem.

> Please do note that the above suggestions are only needed if removing
> the pcp lists causes a problem!  It may not.
> 

True.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2012-01-09 17:25 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
2012-01-02 10:24 ` [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
2012-01-03  7:51   ` Michal Nazarewicz
2012-01-03  8:12     ` Gilad Ben-Yossef
2012-01-03  8:57       ` Michal Nazarewicz
2012-01-03 22:26   ` Andrew Morton
2012-01-05 13:17     ` Michal Nazarewicz
2012-01-08 16:04     ` Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 2/8] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 3/8] tile: Move tile to use " Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func Gilad Ben-Yossef
2012-01-03 22:34   ` Andrew Morton
2012-01-08 16:09     ` Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 5/8] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2012-01-03 17:45   ` KOSAKI Motohiro
2012-01-03 18:58     ` Gilad Ben-Yossef
2012-01-03 22:02       ` KOSAKI Motohiro
2012-01-05 14:20     ` Mel Gorman
2012-01-05 14:40       ` Russell King - ARM Linux
2012-01-05 15:24         ` Peter Zijlstra
2012-01-05 16:17         ` Mel Gorman
2012-01-05 16:35           ` Russell King - ARM Linux
2012-01-05 18:35             ` Paul E. McKenney
2012-01-05 22:21               ` Mel Gorman
2012-01-06  6:06                 ` Srivatsa S. Bhat
2012-01-06 10:46                   ` Mel Gorman
2012-01-06 13:28                 ` Greg KH
2012-01-06 14:09                   ` Mel Gorman
2012-01-05 22:06           ` Andrew Morton
2012-01-05 22:31             ` Mel Gorman
2012-01-05 23:19               ` Andrew Morton
2012-01-09 17:25                 ` Mel Gorman
2012-01-07 16:52           ` Paul E. McKenney
2012-01-07 17:05             ` Paul E. McKenney
2012-01-05 15:54   ` Mel Gorman
2012-01-08 16:01     ` Gilad Ben-Yossef
