* slub/debugobjects: lockup when freeing memory
@ 2014-06-19 14:30 Sasha Levin
  2014-06-19 15:03 ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Sasha Levin @ 2014-06-19 14:30 UTC (permalink / raw)
  To: Pekka Enberg, Christoph Lameter, Thomas Gleixner, Matt Mackall
  Cc: Andrew Morton, Dave Jones, linux-mm, Paul E. McKenney, LKML

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel, I've stumbled on the following spew. It seems to cause an actual
lockup, since hung-task messages followed soon after.

[  690.762537] =============================================
[  690.764196] [ INFO: possible recursive locking detected ]
[  690.765247] 3.16.0-rc1-next-20140618-sasha-00029-g9e4acf8-dirty #664 Tainted: G        W
[  690.766457] ---------------------------------------------
[  690.767237] kworker/u95:0/256 is trying to acquire lock:
[  690.767886] (&(&n->list_lock)->rlock){-.-.-.}, at: get_partial_node.isra.35 (mm/slub.c:1630)
[  690.769162]
[  690.769162] but task is already holding lock:
[  690.769851] (&(&n->list_lock)->rlock){-.-.-.}, at: kmem_cache_close (mm/slub.c:3209 mm/slub.c:3233)
[  690.770137]
[  690.770137] other info that might help us debug this:
[  690.770137]  Possible unsafe locking scenario:
[  690.770137]
[  690.770137]        CPU0
[  690.770137]        ----
[  690.770137]   lock(&(&n->list_lock)->rlock);
[  690.770137]   lock(&(&n->list_lock)->rlock);
[  690.770137]
[  690.770137]  *** DEADLOCK ***
[  690.770137]
[  690.770137]  May be due to missing lock nesting notation
[  690.770137]
[  690.770137] 7 locks held by kworker/u95:0/256:
[  690.770137] #0: ("%s"("netns")){.+.+.+}, at: process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:599 kernel/workqueue.c:626 kernel/workqueue.c:2074)
[  690.770137] #1: (net_cleanup_work){+.+.+.}, at: process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:599 kernel/workqueue.c:626 kernel/workqueue.c:2074)
[  690.770137] #2: (net_mutex){+.+.+.}, at: cleanup_net (net/core/net_namespace.c:287)
[  690.770137] #3: (cpu_hotplug.lock){++++++}, at: get_online_cpus (kernel/cpu.c:90)
[  690.770137] #4: (mem_hotplug.lock){.+.+.+}, at: get_online_mems (mm/memory_hotplug.c:83)
[  690.770137] #5: (slab_mutex){+.+.+.}, at: kmem_cache_destroy (mm/slab_common.c:343)
[  690.770137] #6: (&(&n->list_lock)->rlock){-.-.-.}, at: kmem_cache_close (mm/slub.c:3209 mm/slub.c:3233)
[  690.770137]
[  690.770137] stack backtrace:
[  690.770137] CPU: 18 PID: 256 Comm: kworker/u95:0 Tainted: G        W     3.16.0-rc1-next-20140618-sasha-00029-g9e4acf8-dirty #664
[  690.770137] Workqueue: netns cleanup_net
[  690.770137]  ffff8808a172b000 ffff8808a1737628 ffffffff9d5179a0 0000000000000003
[  690.770137]  ffffffffa0b499c0 ffff8808a1737728 ffffffff9a1cac52 ffff8808a1737668
[  690.770137]  ffffffff9a1a74f8 23e00d8075e32f12 ffff8808a172b000 23e00d8000000001
[  690.770137] Call Trace:
[  690.770137] dump_stack (lib/dump_stack.c:52)
[  690.770137] __lock_acquire (kernel/locking/lockdep.c:3034 kernel/locking/lockdep.c:3180)
[  690.770137] ? sched_clock_cpu (kernel/sched/clock.c:311)
[  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[  690.770137] lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  690.770137] ? get_partial_node.isra.35 (mm/slub.c:1630)
[  690.770137] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[  690.770137] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[  690.770137] ? get_partial_node.isra.35 (mm/slub.c:1630)
[  690.770137] get_partial_node.isra.35 (mm/slub.c:1630)
[  690.770137] ? __slab_alloc (mm/slub.c:2304)
[  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
[  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
[  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
[  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[  690.770137] debug_object_init (lib/debugobjects.c:365)
[  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
[  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
[  690.770137] ? discard_slab (mm/slub.c:1486)
[  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
[  690.770137] call_rcu (kernel/rcu/tree_plugin.h:679)
[  690.770137] discard_slab (mm/slub.c:1515 mm/slub.c:1523)
[  690.770137] kmem_cache_close (mm/slub.c:3212 mm/slub.c:3233)
[  690.770137] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[  690.770137] __kmem_cache_shutdown (mm/slub.c:3245)
[  690.770137] kmem_cache_destroy (mm/slab_common.c:349)
[  690.770137] nf_conntrack_cleanup_net_list (net/netfilter/nf_conntrack_core.c:1569 (discriminator 2))
[  690.770137] nf_conntrack_pernet_exit (net/netfilter/nf_conntrack_standalone.c:558)
[  690.770137] ops_exit_list.isra.1 (net/core/net_namespace.c:135)
[  690.770137] cleanup_net (net/core/net_namespace.c:302 (discriminator 2))
[  690.770137] process_one_work (kernel/workqueue.c:2081 include/linux/jump_label.h:115 include/trace/events/workqueue.h:111 kernel/workqueue.c:2086)
[  690.770137] ? process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:599 kernel/workqueue.c:626 kernel/workqueue.c:2074)
[  690.770137] worker_thread (kernel/workqueue.c:2213)
[  690.770137] ? rescuer_thread (kernel/workqueue.c:2157)
[  690.770137] kthread (kernel/kthread.c:210)
[  690.770137] ? kthread_create_on_node (kernel/kthread.c:176)
[  690.770137] ret_from_fork (arch/x86/kernel/entry_64.S:349)


Thanks,
Sasha


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 14:30 slub/debugobjects: lockup when freeing memory Sasha Levin
@ 2014-06-19 15:03 ` Christoph Lameter
  2014-06-19 16:52   ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2014-06-19 15:03 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Pekka Enberg, Thomas Gleixner, Matt Mackall, Andrew Morton,
	Dave Jones, linux-mm, Paul E. McKenney, LKML

On Thu, 19 Jun 2014, Sasha Levin wrote:

> [  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> [  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> [  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [  690.770137] debug_object_init (lib/debugobjects.c:365)
> [  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> [  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> [  690.770137] ? discard_slab (mm/slub.c:1486)
> [  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))

__call_rcu does a slab allocation? This means __call_rcu can no longer be
used in slab allocators? What happened?


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 15:03 ` Christoph Lameter
@ 2014-06-19 16:52   ` Paul E. McKenney
  2014-06-19 19:29     ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 16:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Sasha Levin, Pekka Enberg, Thomas Gleixner, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Sasha Levin wrote:
> 
> > [  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > [  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > [  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > [  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > [  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > [  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > [  690.770137] debug_object_init (lib/debugobjects.c:365)
> > [  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > [  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > [  690.770137] ? discard_slab (mm/slub.c:1486)
> > [  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> 
> __call_rcu does a slab allocation? This means __call_rcu can no longer be
> used in slab allocators? What happened?

My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
call_rcu_bh(), or call_srcu().

Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
That would be unfortunate...

							Thanx, Paul



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 16:52   ` Paul E. McKenney
@ 2014-06-19 19:29     ` Thomas Gleixner
  2014-06-19 20:19       ` Christoph Lameter
  2014-06-19 20:29       ` Paul E. McKenney
  0 siblings, 2 replies; 28+ messages in thread
From: Thomas Gleixner @ 2014-06-19 19:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, 19 Jun 2014, Paul E. McKenney wrote:

> On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> > On Thu, 19 Jun 2014, Sasha Levin wrote:
> > 
> > > [  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > > [  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > > [  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > [  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > [  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > > [  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > [  690.770137] debug_object_init (lib/debugobjects.c:365)
> > > [  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > > [  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > > [  690.770137] ? discard_slab (mm/slub.c:1486)
> > > [  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> > 
> > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> > used in slab allocators? What happened?
> 
> My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> call_rcu_bh(), or call_srcu().
> 
> Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> That would be unfortunate...

Well, no. Look at the callchain:

__call_rcu
    debug_object_activate
       rcuhead_fixup_activate
          debug_object_init
              kmem_cache_alloc

So call rcu activates the object, but the object has no reference in
the debug objects code so the fixup code is called which inits the
object and allocates a reference ....
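
To make the loop concrete, here is a minimal standalone userspace model
of that callchain (a sketch only: every name in it is illustrative and
none of this is kernel code; build with "cc -pthread recursion_model.c"):

/* recursion_model.c - models the splat above: freeing a slab with the
 * node's list_lock held ends up allocating, and the allocation wants
 * the same lock again.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER; /* n->list_lock */
static __thread int list_lock_held;	/* roughly what lockdep tracks per task */

/* models kmem_cache_alloc(): refilling from a partial slab takes list_lock */
static void *cache_alloc(void)
{
	void *p;

	if (list_lock_held) {
		/* the recursive acquisition lockdep screams about */
		fprintf(stderr, "BUG: recursive list_lock acquisition\n");
		exit(1);
	}
	pthread_mutex_lock(&list_lock);
	p = malloc(32);
	pthread_mutex_unlock(&list_lock);
	return p;
}

/*
 * models __call_rcu() -> debug_object_activate(): the head is untracked,
 * so the fixup path inits it, which allocates the tracking state.
 */
static void fake_call_rcu(void *head, void **tracking)
{
	(void)head;
	if (!*tracking)				/* no reference in debug objects */
		*tracking = cache_alloc();	/* fixup -> init -> alloc */
}

int main(void)
{
	int head;			/* stands in for page->rcu_head */
	void *tracking = NULL;		/* no debug-objects state yet */

	/* models kmem_cache_close() -> discard_slab() under list_lock */
	pthread_mutex_lock(&list_lock);
	list_lock_held = 1;
	fake_call_rcu(&head, &tracking);	/* ends up in cache_alloc() */
	list_lock_held = 0;
	pthread_mutex_unlock(&list_lock);
	free(tracking);
	return 0;
}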

Thanks,

	tglx


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 19:29     ` Thomas Gleixner
@ 2014-06-19 20:19       ` Christoph Lameter
  2014-06-19 20:28         ` Thomas Gleixner
                           ` (2 more replies)
  2014-06-19 20:29       ` Paul E. McKenney
  1 sibling, 3 replies; 28+ messages in thread
From: Christoph Lameter @ 2014-06-19 20:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paul E. McKenney, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, 19 Jun 2014, Thomas Gleixner wrote:

> Well, no. Look at the callchain:
>
> __call_rcu
>     debug_object_activate
>        rcuhead_fixup_activate
>           debug_object_init
>               kmem_cache_alloc
>
> So call rcu activates the object, but the object has no reference in
> the debug objects code so the fixup code is called which inits the
> object and allocates a reference ....

So we need to init the object in the page struct before the __call_rcu?



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:19       ` Christoph Lameter
@ 2014-06-19 20:28         ` Thomas Gleixner
  2014-06-19 20:36         ` Paul E. McKenney
  2014-08-18 16:37         ` Paul E. McKenney
  2 siblings, 0 replies; 28+ messages in thread
From: Thomas Gleixner @ 2014-06-19 20:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Paul E. McKenney, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, 19 Jun 2014, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Thomas Gleixner wrote:
> 
> > Well, no. Look at the callchain:
> >
> > __call_rcu
> >     debug_object_activate
> >        rcuhead_fixup_activate
> >           debug_object_init
> >               kmem_cache_alloc
> >
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
> 
> So we need to init the object in the page struct before the __call_rcu?
 
Looks like RCU is lazily relying on the fixup callback to initialize
the objects.

There is an unused debug_init_rcu_head() inline in kernel/rcu/update.c

Paul????




* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 19:29     ` Thomas Gleixner
  2014-06-19 20:19       ` Christoph Lameter
@ 2014-06-19 20:29       ` Paul E. McKenney
  2014-06-19 20:32         ` Sasha Levin
                           ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 20:29 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> 
> > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> > > On Thu, 19 Jun 2014, Sasha Levin wrote:
> > > 
> > > > [  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > > > [  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > > > [  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > > > [  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [  690.770137] debug_object_init (lib/debugobjects.c:365)
> > > > [  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > > > [  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > > > [  690.770137] ? discard_slab (mm/slub.c:1486)
> > > > [  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> > > 
> > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> > > used in slab allocators? What happened?
> > 
> > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> > call_rcu_bh(), or call_srcu().
> > 
> > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> > That would be unfortunate...
> 
> Well, no. Look at the callchain:
> 
> __call_rcu
>     debug_object_activate
>        rcuhead_fixup_activate
>           debug_object_init
>               kmem_cache_alloc
> 
> So call rcu activates the object, but the object has no reference in
> the debug objects code so the fixup code is called which inits the
> object and allocates a reference ....

OK, got it.  And you are right, call_rcu() has done this for a very
long time, so not sure what changed.  But it seems like the right
approach is to provide a debug-object-free call_rcu_alloc() for use
by the memory allocators.

Seem reasonable?  If so, please see the following patch.

						Thanx, Paul

------------------------------------------------------------------------

rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion

The sl*b allocators use call_rcu() to manage object lifetimes, but
call_rcu() can use debug-objects, which in turn invokes the sl*b
allocators.  These allocators are not prepared for this sort of
recursion, which can result in failures.

This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
which act as their call_rcu() and call_rcu_sched() counterparts, but
which avoid invoking debug-objects.  These new API members are intended
only for use by the sl*b allocators, and this commit makes the sl*b
allocators use call_rcu_alloc().  Why call_rcu_sched_alloc()?  Because
in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(), so
therefore call_rcu_alloc() must map to call_rcu_sched_alloc().

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Set-straight-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d5e40a42cc43..1f708a7f9e7d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -140,13 +140,24 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
  * if CPU A and CPU B are the same CPU (but again only if the system has
  * more than one CPU).
  */
-void call_rcu(struct rcu_head *head,
-	      void (*func)(struct rcu_head *head));
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head));
+
+/**
+ * call_rcu_alloc() - Queue an RCU callback for invocation after grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * Similar to call_rcu(), but avoids invoking debug-objects.  This permits
+ * it to be called from allocators without needing to worry about
+ * recursive calls into those allocators for debug-objects allocations.
+ */
+void call_rcu_alloc(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
 
 #else /* #ifdef CONFIG_PREEMPT_RCU */
 
 /* In classic RCU, call_rcu() is just call_rcu_sched(). */
 #define	call_rcu	call_rcu_sched
+#define	call_rcu_alloc	call_rcu_sched_alloc
 
 #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
 
@@ -196,6 +207,19 @@ void call_rcu_bh(struct rcu_head *head,
 void call_rcu_sched(struct rcu_head *head,
 		    void (*func)(struct rcu_head *rcu));
 
+/**
+ * call_rcu_sched_alloc() - Queue RCU callback for invocation after sched grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * Similar to call_rcu_sched(), but avoids invoking debug-objects.
+ * This permits it to be called from allocators without needing to
+ * worry about recursive calls into those allocators for debug-objects
+ * allocations.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+			  void (*func)(struct rcu_head *rcu));
+
 void synchronize_sched(void);
 
 #ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index d9efcc13008c..515e60067c53 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -338,15 +338,14 @@ void synchronize_sched(void)
 EXPORT_SYMBOL_GPL(synchronize_sched);
 
 /*
- * Helper function for call_rcu() and call_rcu_bh().
+ * Provide call_rcu() functionality, but without invoking debug-objects.
  */
-static void __call_rcu(struct rcu_head *head,
-		       void (*func)(struct rcu_head *rcu),
-		       struct rcu_ctrlblk *rcp)
+static void __call_rcu_nodo(struct rcu_head *head,
+			    void (*func)(struct rcu_head *rcu),
+			    struct rcu_ctrlblk *rcp)
 {
 	unsigned long flags;
 
-	debug_rcu_head_queue(head);
 	head->func = func;
 	head->next = NULL;
 
@@ -358,6 +357,17 @@ static void __call_rcu(struct rcu_head *head,
 }
 
 /*
+ * Helper function for call_rcu() and call_rcu_bh().
+ */
+static void __call_rcu(struct rcu_head *head,
+		       void (*func)(struct rcu_head *rcu),
+		       struct rcu_ctrlblk *rcp)
+{
+	debug_rcu_head_queue(head);
+	__call_rcu_nodo(head, func, rcp);
+}
+
+/*
  * Post an RCU callback to be invoked after the end of an RCU-sched grace
  * period.  But since we have but one CPU, that would be after any
  * quiescent state.
@@ -369,6 +379,18 @@ void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu_sched);
 
 /*
+ * Similar to call_rcu_sched(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that sort
+ * of recursion.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+			  void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu_nodo(head, func, &rcu_sched_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu_sched_alloc);
+
+/*
  * Post an RCU bottom-half callback to be invoked after any subsequent
  * quiescent state.
  */
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8c47d04ecdea..593195d38850 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2640,25 +2640,16 @@ static void rcu_leak_callback(struct rcu_head *rhp)
 }
 
 /*
- * Helper function for call_rcu() and friends.  The cpu argument will
- * normally be -1, indicating "currently running CPU".  It may specify
- * a CPU only if that CPU is a no-CBs CPU.  Currently, only _rcu_barrier()
- * is expected to specify a CPU.
+ * Provide call_rcu() functionality, but without invoking debug-objects.
  */
 static void
-__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
-	   struct rcu_state *rsp, int cpu, bool lazy)
+__call_rcu_nodo(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+		struct rcu_state *rsp, int cpu, bool lazy)
 {
 	unsigned long flags;
 	struct rcu_data *rdp;
 
 	WARN_ON_ONCE((unsigned long)head & 0x1); /* Misaligned rcu_head! */
-	if (debug_rcu_head_queue(head)) {
-		/* Probable double call_rcu(), so leak the callback. */
-		ACCESS_ONCE(head->func) = rcu_leak_callback;
-		WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
-		return;
-	}
 	head->func = func;
 	head->next = NULL;
 
@@ -2704,6 +2695,25 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
 }
 
 /*
+ * Helper function for call_rcu() and friends.  The cpu argument will
+ * normally be -1, indicating "currently running CPU".  It may specify
+ * a CPU only if that CPU is a no-CBs CPU.  Currently, only _rcu_barrier()
+ * is expected to specify a CPU.
+ */
+static void
+__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+	   struct rcu_state *rsp, int cpu, bool lazy)
+{
+	if (debug_rcu_head_queue(head)) {
+		/* Probable double call_rcu(), so leak the callback. */
+		ACCESS_ONCE(head->func) = rcu_leak_callback;
+		WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
+		return;
+	}
+	__call_rcu_nodo(head, func, rsp, cpu, lazy);
+}
+
+/*
  * Queue an RCU-sched callback for invocation after a grace period.
  */
 void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
@@ -2713,6 +2723,18 @@ void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 EXPORT_SYMBOL_GPL(call_rcu_sched);
 
 /*
+ * Similar to call_rcu_sched(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that sort
+ * of recursion.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+			  void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu_nodo(head, func, &rcu_sched_state, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu_sched_alloc);
+
+/*
  * Queue an RCU callback for invocation after a quicker grace period.
  */
 void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 569b390daa15..e9362d7f8328 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -679,6 +679,17 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
 }
 EXPORT_SYMBOL_GPL(call_rcu);
 
+/*
+ * Similar to call_rcu(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that sort
+ * of recursion.
+ */
+void call_rcu_alloc(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+	__call_rcu_nodo(head, func, &rcu_preempt_state, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu_alloc);
+
 /**
  * synchronize_rcu - wait until a grace period has elapsed.
  *
diff --git a/mm/slab.c b/mm/slab.c
index 9ca3b87edabc..1e5de0d39701 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1994,7 +1994,7 @@ static void slab_destroy(struct kmem_cache *cachep, struct page *page)
 		 * we can use it safely.
 		 */
 		head = (void *)&page->rcu_head;
-		call_rcu(head, kmem_rcu_free);
+		call_rcu_alloc(head, kmem_rcu_free);
 
 	} else {
 		kmem_freepages(cachep, page);
diff --git a/mm/slob.c b/mm/slob.c
index 21980e0f39a8..47ad4a43521a 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -605,7 +605,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
 		struct slob_rcu *slob_rcu;
 		slob_rcu = b + (c->size - sizeof(struct slob_rcu));
 		slob_rcu->size = c->size;
-		call_rcu(&slob_rcu->head, kmem_rcu_free);
+		call_rcu_alloc(&slob_rcu->head, kmem_rcu_free);
 	} else {
 		__kmem_cache_free(b, c->size);
 	}
diff --git a/mm/slub.c b/mm/slub.c
index b2b047327d76..7f01e57fd99f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1512,7 +1512,7 @@ static void free_slab(struct kmem_cache *s, struct page *page)
 			head = (void *)&page->lru;
 		}
 
-		call_rcu(head, rcu_free_slab);
+		call_rcu_alloc(head, rcu_free_slab);
 	} else
 		__free_slab(s, page);
 }



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:29       ` Paul E. McKenney
@ 2014-06-19 20:32         ` Sasha Levin
  2014-06-19 20:39           ` Paul E. McKenney
  2014-06-19 20:37         ` Thomas Gleixner
  2014-06-19 20:42         ` Sasha Levin
  2 siblings, 1 reply; 28+ messages in thread
From: Sasha Levin @ 2014-06-19 20:32 UTC (permalink / raw)
  To: paulmck, Thomas Gleixner
  Cc: Christoph Lameter, Pekka Enberg, Matt Mackall, Andrew Morton,
	Dave Jones, linux-mm, LKML

On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
>> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
>> > 
>>> > > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
>>>> > > > On Thu, 19 Jun 2014, Sasha Levin wrote:
>>>> > > > 
>>>>> > > > > [  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
>>>>> > > > > [  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
>>>>> > > > > [  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
>>>>> > > > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
>>>>> > > > > [  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
>>>>> > > > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
>>>>> > > > > [  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
>>>>> > > > > [  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
>>>>> > > > > [  690.770137] debug_object_init (lib/debugobjects.c:365)
>>>>> > > > > [  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
>>>>> > > > > [  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
>>>>> > > > > [  690.770137] ? discard_slab (mm/slub.c:1486)
>>>>> > > > > [  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
>>>> > > > 
>>>> > > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
>>>> > > > used in slab allocators? What happened?
>>> > > 
>>> > > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
>>> > > call_rcu_bh(), or call_srcu().
>>> > > 
>>> > > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
>>> > > That would be unfortunate...
>> > 
>> > Well, no. Look at the callchain:
>> > 
>> > __call_rcu
>> >     debug_object_activate
>> >        rcuhead_fixup_activate
>> >           debug_object_init
>> >               kmem_cache_alloc
>> > 
>> > So call rcu activates the object, but the object has no reference in
>> > the debug objects code so the fixup code is called which inits the
>> > object and allocates a reference ....
> OK, got it.  And you are right, call_rcu() has done this for a very
> long time, so not sure what changed.

It's probably my fault. I've introduced clone() and unshare() fuzzing.

Those two are full of issues, and I've been holding off on enabling them
until the rest of the kernel could survive trinity for more than an hour.


Thanks,
Sasha



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:19       ` Christoph Lameter
  2014-06-19 20:28         ` Thomas Gleixner
@ 2014-06-19 20:36         ` Paul E. McKenney
  2014-08-18 16:37         ` Paul E. McKenney
  2 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 20:36 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 03:19:39PM -0500, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Thomas Gleixner wrote:
> 
> > Well, no. Look at the callchain:
> >
> > __call_rcu
> >     debug_object_activate
> >        rcuhead_fixup_activate
> >           debug_object_init
> >               kmem_cache_alloc
> >
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
> 
> So we need to init the object in the page struct before the __call_rcu?

Good point.  The patch I just sent will complain at callback-invocation
time because the debug-object information won't be present.

One way to handle this would be for rcu_do_batch() to avoid complaining
if it gets a callback that has not been through call_rcu()'s
debug_rcu_head_queue().  One way to do that would be to have an
alternative to debug_object_deactivate() that does not complain
if it is handed an unactivated object.
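
On the invocation side that would look roughly like this (a sketch
only: the loop is simplified from rcu_do_batch(), and
debug_rcu_head_unqueue_tolerant() is a hypothetical helper that acts
like debug_rcu_head_unqueue() but stays silent for heads that
debug-objects has never seen):

	static void invoke_callbacks(struct rcu_head *list)
	{
		while (list) {
			struct rcu_head *next = list->next;

			prefetch(next);
			/* hypothetical: no splat for untracked heads */
			debug_rcu_head_unqueue_tolerant(list);
			list->func(list);
			list = next;
		}
	}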

Another way to handle this would be for me to put the definition of
debug_rcu_head_queue() somewhere where the sl*b allocator could get
at it, and have the sl*b allocators invoke it at initialization
and within the RCU callback.

Other thoughts?

							Thanx, Paul



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:29       ` Paul E. McKenney
  2014-06-19 20:32         ` Sasha Levin
@ 2014-06-19 20:37         ` Thomas Gleixner
  2014-06-19 20:53           ` Paul E. McKenney
  2014-06-19 20:42         ` Sasha Levin
  2 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2014-06-19 20:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > Well, no. Look at the callchain:
> > 
> > __call_rcu
> >     debug_object_activate
> >        rcuhead_fixup_activate
> >           debug_object_init
> >               kmem_cache_alloc
> > 
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
> 
> OK, got it.  And you are right, call_rcu() has done this for a very
> long time, so not sure what changed.  But it seems like the right
> approach is to provide a debug-object-free call_rcu_alloc() for use
> by the memory allocators.
> 
> Seem reasonable?  If so, please see the following patch.

Not really, you're torpedoing the whole purpose of debugobjects :)

So, why can't we just init the rcu head when the stuff is created?

If that's impossible due to other memory allocator constraints, then
instead of inventing a whole new API we can simply flag the relevant
data in the memory allocator as we do with the debug objects mem cache
itself (SLAB_DEBUG_OBJECTS).
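
For SLUB, "init the rcu head when the stuff is created" would be
something like the sketch below (hook placement is hypothetical and the
allocation path is elided; the point is just that debug_init_rcu_head()
runs where no list_lock is held, so the later call_rcu() in free_slab()
finds the head already tracked and never enters the allocating fixup
path):

	static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags,
					  int node)
	{
		struct page *page = NULL;

		/* ... existing page allocation sets up page ... */

		if (page && unlikely(s->flags & SLAB_DESTROY_BY_RCU))
			debug_init_rcu_head(&page->rcu_head);
		return page;
	}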

Thanks,

	tglx



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:32         ` Sasha Levin
@ 2014-06-19 20:39           ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 20:39 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 04:32:38PM -0400, Sasha Levin wrote:
> On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> >> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> >> > 
> >>> > > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> >>>> > > > On Thu, 19 Jun 2014, Sasha Levin wrote:
> >>>> > > > 
> >>>>> > > > > [  690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> >>>>> > > > > [  690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> >>>>> > > > > [  690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> >>>>> > > > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> >>>>> > > > > [  690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> >>>>> > > > > [  690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> >>>>> > > > > [  690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> >>>>> > > > > [  690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> >>>>> > > > > [  690.770137] debug_object_init (lib/debugobjects.c:365)
> >>>>> > > > > [  690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> >>>>> > > > > [  690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> >>>>> > > > > [  690.770137] ? discard_slab (mm/slub.c:1486)
> >>>>> > > > > [  690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> >>>> > > > 
> >>>> > > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> >>>> > > > used in slab allocators? What happened?
> >>> > > 
> >>> > > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> >>> > > call_rcu_bh(), or call_srcu().
> >>> > > 
> >>> > > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> >>> > > That would be unfortunate...
> >> > 
> >> > Well, no. Look at the callchain:
> >> > 
> >> > __call_rcu
> >> >     debug_object_activate
> >> >        rcuhead_fixup_activate
> >> >           debug_object_init
> >> >               kmem_cache_alloc
> >> > 
> >> > So call rcu activates the object, but the object has no reference in
> >> > the debug objects code so the fixup code is called which inits the
> >> > object and allocates a reference ....
> > OK, got it.  And you are right, call_rcu() has done this for a very
> > long time, so not sure what changed.
> 
> It's probably my fault. I've introduced clone() and unshare() fuzzing.
> 
> Those two are full of issues, and I've been holding off on enabling them
> until the rest of the kernel could survive trinity for more than an hour.

Well, that might explain why I haven't seen it in my testing.  ;-)

							Thanx, Paul



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:29       ` Paul E. McKenney
  2014-06-19 20:32         ` Sasha Levin
  2014-06-19 20:37         ` Thomas Gleixner
@ 2014-06-19 20:42         ` Sasha Levin
  2014-06-19 20:53           ` Paul E. McKenney
  2 siblings, 1 reply; 28+ messages in thread
From: Sasha Levin @ 2014-06-19 20:42 UTC (permalink / raw)
  To: paulmck, Thomas Gleixner
  Cc: Christoph Lameter, Pekka Enberg, Matt Mackall, Andrew Morton,
	Dave Jones, linux-mm, LKML

On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion
> 
> The sl*b allocators use call_rcu() to manage object lifetimes, but
> call_rcu() can use debug-objects, which in turn invokes the sl*b
> allocators.  These allocators are not prepared for this sort of
> recursion, which can result in failures.
> 
> This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
> which act as their call_rcu() and call_rcu_sched() counterparts, but
> which avoid invoking debug-objects.  These new API members are intended
> only for use by the sl*b allocators, and this commit makes the sl*b
> allocators use call_rcu_alloc().  Why call_rcu_sched_alloc()?  Because
> in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(), so
> therefore call_rcu_alloc() must map to call_rcu_sched_alloc().
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Set-straight-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Paul, what is this patch based on? It won't apply cleanly on -next
or Linus's tree.


Thanks,
Sasha


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:37         ` Thomas Gleixner
@ 2014-06-19 20:53           ` Paul E. McKenney
  2014-06-19 21:32             ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 20:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > Well, no. Look at the callchain:
> > > 
> > > __call_rcu
> > >     debug_object_activate
> > >        rcuhead_fixup_activate
> > >           debug_object_init
> > >               kmem_cache_alloc
> > > 
> > > So call rcu activates the object, but the object has no reference in
> > > the debug objects code so the fixup code is called which inits the
> > > object and allocates a reference ....
> > 
> > OK, got it.  And you are right, call_rcu() has done this for a very
> > long time, so not sure what changed.  But it seems like the right
> > approach is to provide a debug-object-free call_rcu_alloc() for use
> > by the memory allocators.
> > 
> > Seem reasonable?  If so, please see the following patch.
> 
> Not really, you're torpedoing the whole purpose of debugobjects :)
> 
> So, why can't we just init the rcu head when the stuff is created?

That would allow me to keep my code unchanged, so I am in favor.  ;-)

							Thanx, Paul

> If that's impossible due to other memory allocator constraints, then
> instead of inventing a whole new API we can simply flag the relevant
> data in the memory allocator as we do with the debug objects mem cache
> itself (SLAB_DEBUG_OBJECTS).
> 
> Thanks,
> 
> 	tglx
> 



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:42         ` Sasha Levin
@ 2014-06-19 20:53           ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 20:53 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 04:42:11PM -0400, Sasha Levin wrote:
> On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> > rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion
> > 
> > The sl*b allocators use call_rcu() to manage object lifetimes, but
> > call_rcu() can use debug-objects, which in turn invokes the sl*b
> > allocators.  These allocators are not prepared for this sort of
> > recursion, which can result in failures.
> > 
> > This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
> > which act as their call_rcu() and call_rcu_sched() counterparts, but
> > which avoid invoking debug-objects.  These new API members are intended
> > only for use by the sl*b allocators, and this commit makes the sl*b
> > allocators use call_rcu_alloc().  Why call_rcu_sched_alloc()?  Because
> > in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(), so
> > therefore call_rcu_alloc() must map to call_rcu_sched_alloc().
> > 
> > Reported-by: Sasha Levin <sasha.levin@oracle.com>
> > Set-straight-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> Paul, what is this patch based on? It won't apply cleanly on -next
> or Linus's tree.

On my -rcu tree, but I think that Thomas's approach is better.

							Thanx, Paul



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:53           ` Paul E. McKenney
@ 2014-06-19 21:32             ` Thomas Gleixner
  2014-06-19 22:04               ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2014-06-19 21:32 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML



On Thu, 19 Jun 2014, Paul E. McKenney wrote:

> On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > Well, no. Look at the callchain:
> > > > 
> > > > __call_rcu
> > > >     debug_object_activate
> > > >        rcuhead_fixup_activate
> > > >           debug_object_init
> > > >               kmem_cache_alloc
> > > > 
> > > > So call rcu activates the object, but the object has no reference in
> > > > the debug objects code so the fixup code is called which inits the
> > > > object and allocates a reference ....
> > > 
> > > OK, got it.  And you are right, call_rcu() has done this for a very
> > > long time, so not sure what changed.  But it seems like the right
> > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > by the memory allocators.
> > > 
> > > Seem reasonable?  If so, please see the following patch.
> > 
> > Not really, you're torpedoing the whole purpose of debugobjects :)
> > 
> > So, why can't we just init the rcu head when the stuff is created?
> 
> That would allow me to keep my code unchanged, so I am in favor.  ;-)

Almost unchanged. You need to provide a function to do so, i.e. make
use of

    debug_init_rcu_head()

Thanks,

	tglx


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 21:32             ` Thomas Gleixner
@ 2014-06-19 22:04               ` Paul E. McKenney
  2014-06-20  8:17                 ` Thomas Gleixner
  2014-06-20 14:30                 ` Christoph Lameter
  0 siblings, 2 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-19 22:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 11:32:41PM +0200, Thomas Gleixner wrote:
> 
> 
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> 
> > On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > Well, no. Look at the callchain:
> > > > > 
> > > > > __call_rcu
> > > > >     debug_object_activate
> > > > >        rcuhead_fixup_activate
> > > > >           debug_object_init
> > > > >               kmem_cache_alloc
> > > > > 
> > > > > So call rcu activates the object, but the object has no reference in
> > > > > the debug objects code so the fixup code is called which inits the
> > > > > object and allocates a reference ....
> > > > 
> > > > OK, got it.  And you are right, call_rcu() has done this for a very
> > > > long time, so not sure what changed.  But it seems like the right
> > > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > > by the memory allocators.
> > > > 
> > > > Seem reasonable?  If so, please see the following patch.
> > > 
> > > Not really, you're torpedoing the whole purpose of debugobjects :)
> > > 
> > > So, why can't we just init the rcu head when the stuff is created?
> > 
> > That would allow me to keep my code unchanged, so I am in favor.  ;-)
> 
> Almost unchanged. You need to provide a function to do so, i.e. make
> use of
> 
>     debug_init_rcu_head()

You mean like this?

							Thanx, Paul

------------------------------------------------------------------------

rcu: Export debug_init_rcu_head() and debug_rcu_head_free()

Currently, call_rcu() relies on implicit allocation and initialization
for the debug-objects handling of RCU callbacks.  If you hammer the
kernel hard enough with Sasha's modified version of trinity, you can end
up with the sl*b allocators recursing into themselves via this implicit
call_rcu() allocation.

This commit therefore exports the debug_init_rcu_head() and
debug_rcu_head_free() functions, which permits the allocators to allocate
and pre-initialize the debug-objects information, so that there is no
longer any need for call_rcu() to do that initialization, which in turn
prevents the recursion into the memory allocators.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 063a6bf1a2b6..34ae5c376e35 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -358,9 +358,19 @@ void wait_rcu_gp(call_rcu_func_t crf);
  * initialization.
  */
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+void debug_init_rcu_head(struct rcu_head *head);
+void debug_rcu_head_free(struct rcu_head *head);
 void init_rcu_head_on_stack(struct rcu_head *head);
 void destroy_rcu_head_on_stack(struct rcu_head *head);
 #else /* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+static inline void debug_init_rcu_head(struct rcu_head *head)
+{
+}
+
+static inline void debug_rcu_head_free(struct rcu_head *head)
+{
+}
+
 static inline void init_rcu_head_on_stack(struct rcu_head *head)
 {
 }
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a2aeb4df0f60..a41c81a26506 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -200,12 +200,12 @@ void wait_rcu_gp(call_rcu_func_t crf)
 EXPORT_SYMBOL_GPL(wait_rcu_gp);
 
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-static inline void debug_init_rcu_head(struct rcu_head *head)
+void debug_init_rcu_head(struct rcu_head *head)
 {
 	debug_object_init(head, &rcuhead_debug_descr);
 }
 
-static inline void debug_rcu_head_free(struct rcu_head *head)
+void debug_rcu_head_free(struct rcu_head *head)
 {
 	debug_object_free(head, &rcuhead_debug_descr);
 }



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 22:04               ` Paul E. McKenney
@ 2014-06-20  8:17                 ` Thomas Gleixner
  2014-06-20 15:40                   ` Paul E. McKenney
  2014-06-20 14:30                 ` Christoph Lameter
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2014-06-20  8:17 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 11:32:41PM +0200, Thomas Gleixner wrote:
> > 
> > 
> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > 
> > > On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > > Well, no. Look at the callchain:
> > > > > > 
> > > > > > __call_rcu
> > > > > >     debug_object_activate
> > > > > >        rcuhead_fixup_activate
> > > > > >           debug_object_init
> > > > > >               kmem_cache_alloc
> > > > > > 
> > > > > > So call rcu activates the object, but the object has no reference in
> > > > > > the debug objects code so the fixup code is called which inits the
> > > > > > object and allocates a reference ....
> > > > > 
> > > > > OK, got it.  And you are right, call_rcu() has done this for a very
> > > > > long time, so not sure what changed.  But it seems like the right
> > > > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > > > by the memory allocators.
> > > > > 
> > > > > Seem reasonable?  If so, please see the following patch.
> > > > 
> > > > Not really, you're torpedoing the whole purpose of debugobjects :)
> > > > 
> > > > So, why can't we just init the rcu head when the stuff is created?
> > > 
> > > That would allow me to keep my code unchanged, so I am in favor.  ;-)
> > 
> > Almost unchanged. You need to provide a function to do so, i.e. make
> > use of
> > 
> >     debug_init_rcu_head()
> 
> You mean like this?

I'd rather name it init_rcu_head() and free_rcu_head() w/o the debug_
prefix, so it's consistent with init_rcu_head_on_stack /
destroy_rcu_head_on_stack. But either way works for me.

Acked-by: Thomas Gleixner <tglx@linutronix.de>

 


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 22:04               ` Paul E. McKenney
  2014-06-20  8:17                 ` Thomas Gleixner
@ 2014-06-20 14:30                 ` Christoph Lameter
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Lameter @ 2014-06-20 14:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, 19 Jun 2014, Paul E. McKenney wrote:

> This commit therefore exports the debug_init_rcu_head() and
> debug_rcu_head_free() functions, which permits the allocators to allocate
> and pre-initialize the debug-objects information, so that there is no
> longer any need for call_rcu() to do that initialization, which in turn
> prevents the recursion into the memory allocators.

Looks-good-to: Christoph Lameter <cl@linux.com>


* Re: slub/debugobjects: lockup when freeing memory
  2014-06-20  8:17                 ` Thomas Gleixner
@ 2014-06-20 15:40                   ` Paul E. McKenney
  2014-07-12 18:03                     ` Sasha Levin
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2014-06-20 15:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christoph Lameter, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Fri, Jun 20, 2014 at 10:17:32AM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > On Thu, Jun 19, 2014 at 11:32:41PM +0200, Thomas Gleixner wrote:
> > > 
> > > 
> > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > 
> > > > On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > > > Well, no. Look at the callchain:
> > > > > > > 
> > > > > > > __call_rcu
> > > > > > >     debug_object_activate
> > > > > > >        rcuhead_fixup_activate
> > > > > > >           debug_object_init
> > > > > > >               kmem_cache_alloc
> > > > > > > 
> > > > > > > So call rcu activates the object, but the object has no reference in
> > > > > > > the debug objects code so the fixup code is called which inits the
> > > > > > > object and allocates a reference ....
> > > > > > 
> > > > > > OK, got it.  And you are right, call_rcu() has done this for a very
> > > > > > long time, so not sure what changed.  But it seems like the right
> > > > > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > > > > by the memory allocators.
> > > > > > 
> > > > > > Seem reasonable?  If so, please see the following patch.
> > > > > 
> > > > > Not really, you're torpedoing the whole purpose of debugobjects :)
> > > > > 
> > > > > So, why can't we just init the rcu head when the stuff is created?
> > > > 
> > > > That would allow me to keep my code unchanged, so I am in favor.  ;-)
> > > 
> > > Almost unchanged. You need to provide a function to do so, i.e. make
> > > use of
> > > 
> > >     debug_init_rcu_head()
> > 
> > You mean like this?
> 
> I'd rather name it init_rcu_head() and free_rcu_head() w/o the debug_
> prefix, so it's consistent with init_rcu_head_on_stack /
> destroy_rcu_head_on_stack. But either way works for me.
> 
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

So just drop the _on_stack() from the other names, then.  Please see
below.

							Thanx, Paul

------------------------------------------------------------------------

rcu: Export init_rcu_head() and destroy_rcu_head()

Currently, call_rcu() relies on implicit allocation and initialization
for the debug-objects handling of RCU callbacks.  If you hammer the
kernel hard enough with Sasha's modified version of trinity, you can end
up with the sl*b allocators recursing into themselves via this implicit
call_rcu() allocation.

This commit therefore exports the init_rcu_head() and destroy_rcu_head()
functions, which permits the allocators to allocate and pre-initialize
the debug-objects information, so that there is no longer any need for
call_rcu() to do that initialization, which in turn prevents the
recursion into the memory allocators.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 063a6bf1a2b6..37c92cfef9ec 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -358,9 +358,19 @@ void wait_rcu_gp(call_rcu_func_t crf);
  * initialization.
  */
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+void init_rcu_head(struct rcu_head *head);
+void destroy_rcu_head(struct rcu_head *head);
 void init_rcu_head_on_stack(struct rcu_head *head);
 void destroy_rcu_head_on_stack(struct rcu_head *head);
 #else /* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+static inline void init_rcu_head(struct rcu_head *head)
+{
+}
+
+static inline void destroy_rcu_head(struct rcu_head *head)
+{
+}
+
 static inline void init_rcu_head_on_stack(struct rcu_head *head)
 {
 }
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a2aeb4df0f60..0fb691e63ce6 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -200,12 +200,12 @@ void wait_rcu_gp(call_rcu_func_t crf)
 EXPORT_SYMBOL_GPL(wait_rcu_gp);
 
 #ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-static inline void debug_init_rcu_head(struct rcu_head *head)
+void init_rcu_head(struct rcu_head *head)
 {
 	debug_object_init(head, &rcuhead_debug_descr);
 }
 
-static inline void debug_rcu_head_free(struct rcu_head *head)
+void destroy_rcu_head(struct rcu_head *head)
 {
 	debug_object_free(head, &rcuhead_debug_descr);
 }
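
For illustration, the pairing these exports enable might look like the
sketch below for an object with an embedded rcu_head (the struct and
function names are hypothetical, not from this patch):

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct foo {
		int data;
		struct rcu_head rcu;
	};

	static void foo_rcu_free(struct rcu_head *h)
	{
		struct foo *f = container_of(h, struct foo, rcu);

		destroy_rcu_head(&f->rcu);	/* release the tracking entry */
		kfree(f);
	}

	static struct foo *foo_alloc(void)
	{
		struct foo *f = kmalloc(sizeof(*f), GFP_KERNEL);

		if (f)
			/*
			 * Pre-allocate the debug-objects tracking entry
			 * from a safe context, so call_rcu() never has to
			 * allocate it implicitly via the fixup path.
			 */
			init_rcu_head(&f->rcu);
		return f;
	}

	static void foo_free(struct foo *f)
	{
		call_rcu(&f->rcu, foo_rcu_free);
	}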



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-20 15:40                   ` Paul E. McKenney
@ 2014-07-12 18:03                     ` Sasha Levin
  2014-07-12 19:33                       ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Sasha Levin @ 2014-07-12 18:03 UTC (permalink / raw)
  To: paulmck, Thomas Gleixner
  Cc: Christoph Lameter, Pekka Enberg, Matt Mackall, Andrew Morton,
	Dave Jones, linux-mm, LKML

On 06/20/2014 11:40 AM, Paul E. McKenney wrote:
> rcu: Export debug_init_rcu_head() and debug_rcu_head_free()
> 
> Currently, call_rcu() relies on implicit allocation and initialization
> for the debug-objects handling of RCU callbacks.  If you hammer the
> kernel hard enough with Sasha's modified version of trinity, you can end
> up with the sl*b allocators recursing into themselves via this implicit
> call_rcu() allocation.
> 
> This commit therefore exports the debug_init_rcu_head() and
> debug_rcu_head_free() functions, which permits the allocators to allocate
> and pre-initialize the debug-objects information, so that there is no
> longer any need for call_rcu() to do that initialization, which in turn
> prevents the recursion into the memory allocators.
> 
> Reported-by: Sasha Levin <sasha.levin@oracle.com>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Acked-by: Thomas Gleixner <tglx@linutronix.de>

Hi Paul,

Oddly enough, I still see the issue in -next (I made sure that this patch
was in the tree):

[  393.810123] =============================================
[  393.810123] [ INFO: possible recursive locking detected ]
[  393.810123] 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 Not tainted
[  393.810123] ---------------------------------------------
[  393.810123] trinity-c32/9762 is trying to acquire lock:
[  393.810123] (&(&n->list_lock)->rlock){-.-...}, at: get_partial_node.isra.39 (mm/slub.c:1628)
[  393.810123]
[  393.810123] but task is already holding lock:
[  393.810123] (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[  393.810123]
[  393.810123] other info that might help us debug this:
[  393.810123]  Possible unsafe locking scenario:
[  393.810123]
[  393.810123]        CPU0
[  393.810123]        ----
[  393.810123]   lock(&(&n->list_lock)->rlock);
[  393.810123]   lock(&(&n->list_lock)->rlock);
[  393.810123]
[  393.810123]  *** DEADLOCK ***
[  393.810123]
[  393.810123]  May be due to missing lock nesting notation
[  393.810123]
[  393.810123] 5 locks held by trinity-c32/9762:
[  393.810123] #0: (net_mutex){+.+.+.}, at: copy_net_ns (net/core/net_namespace.c:254)
[  393.810123] #1: (cpu_hotplug.lock){++++++}, at: get_online_cpus (kernel/cpu.c:90)
[  393.810123] #2: (mem_hotplug.lock){.+.+.+}, at: get_online_mems (mm/memory_hotplug.c:83)
[  393.810123] #3: (slab_mutex){+.+.+.}, at: kmem_cache_destroy (mm/slab_common.c:344)
[  393.810123] #4: (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[  393.810123]
[  393.810123] stack backtrace:
[  393.810123] CPU: 32 PID: 9762 Comm: trinity-c32 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
[  393.843284]  ffff880bc26730e0 0000000000000000 ffffffffb4ae7ff0 ffff880bc26a3848
[  393.843284]  ffffffffb0e47068 ffffffffb4ae7ff0 ffff880bc26a38f0 ffffffffac258586
[  393.843284]  ffff880bc2673e30 000000050000000a ffffffffb444dee0 ffff880bc2673e48
[  393.843284] Call Trace:
[  393.843284] dump_stack (lib/dump_stack.c:52)
[  393.843284] __lock_acquire (kernel/locking/lockdep.c:1739 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
[  393.843284] lock_acquire (kernel/locking/lockdep.c:3602)
[  393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
[  393.843284] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[  393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
[  393.843284] get_partial_node.isra.39 (mm/slub.c:1628)
[  393.843284] ? check_irq_usage (kernel/locking/lockdep.c:1638)
[  393.843284] ? __slab_alloc (mm/slub.c:2307)
[  393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  393.843284] __slab_alloc (mm/slub.c:1730 mm/slub.c:2208 mm/slub.c:2372)
[  393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[  393.843284] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
[  393.843284] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:304)
[  393.843284] kmem_cache_alloc (mm/slub.c:2445 mm/slub.c:2487 mm/slub.c:2492)
[  393.843284] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[  393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[  393.843284] ? check_chain_key (kernel/locking/lockdep.c:2188)
[  393.843284] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[  393.843284] ? _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[  393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[  393.843284] debug_object_init (lib/debugobjects.c:365)
[  393.843284] rcuhead_fixup_activate (kernel/rcu/update.c:260)
[  393.843284] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
[  393.843284] ? preempt_count_sub (kernel/sched/core.c:2600)
[  393.843284] ? slab_cpuup_callback (mm/slub.c:1484)
[  393.843284] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 8) kernel/rcu/tree.c:2665 (discriminator 8))
[  393.843284] ? __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[  393.843284] call_rcu (kernel/rcu/tree_plugin.h:679)
[  393.843284] discard_slab (mm/slub.c:1522)
[  393.843284] __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[  393.843284] kmem_cache_destroy (mm/slab_common.c:350)
[  393.843284] nf_conntrack_cleanup_net_list (net/netfilter/nf_conntrack_core.c:1569 (discriminator 3))
[  393.843284] nf_conntrack_pernet_exit (net/netfilter/nf_conntrack_standalone.c:558)
[  393.843284] ops_exit_list.isra.1 (net/core/net_namespace.c:135)
[  393.843284] setup_net (net/core/net_namespace.c:180 (discriminator 3))
[  393.843284] copy_net_ns (net/core/net_namespace.c:255)
[  393.843284] create_new_namespaces (kernel/nsproxy.c:95)
[  393.843284] unshare_nsproxy_namespaces (kernel/nsproxy.c:190 (discriminator 4))
[  393.843284] SyS_unshare (kernel/fork.c:1865 kernel/fork.c:1814)
[  393.843284] tracesys (arch/x86/kernel/entry_64.S:542)


Thanks,
Sasha


* Re: slub/debugobjects: lockup when freeing memory
  2014-07-12 18:03                     ` Sasha Levin
@ 2014-07-12 19:33                       ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-07-12 19:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Christoph Lameter, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Sat, Jul 12, 2014 at 02:03:57PM -0400, Sasha Levin wrote:
> On 06/20/2014 11:40 AM, Paul E. McKenney wrote:
> > rcu: Export debug_init_rcu_head() and debug_rcu_head_free()
> > 
> > Currently, call_rcu() relies on implicit allocation and initialization
> > for the debug-objects handling of RCU callbacks.  If you hammer the
> > kernel hard enough with Sasha's modified version of trinity, you can end
> > up with the sl*b allocators recursing into themselves via this implicit
> > call_rcu() allocation.
> > 
> > This commit therefore exports the debug_init_rcu_head() and
> > debug_rcu_head_free() functions, which permits the allocators to allocate
> > and pre-initialize the debug-objects information, so that there is no
> > longer any need for call_rcu() to do that initialization, which in turn
> > prevents the recursion into the memory allocators.
> > 
> > Reported-by: Sasha Levin <sasha.levin@oracle.com>
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Acked-by: Thomas Gleixner <tglx@linutronix.de>
> 
> Hi Paul,
> 
> Oddly enough, I still see the issue in -next (I made sure that this patch
> was in the tree):

Hello, Sasha,

This commit is only part of the solution.  The allocators need to change
to make use of it.
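
Concretely: the allocator has to call init_rcu_head() on the rcu_head
location when the slab page backing it is set up, and destroy_rcu_head()
in its RCU callback, so that call_rcu() itself never has to allocate a
tracking entry.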

							Thanx, Paul

> [  393.810123] =============================================
> [  393.810123] [ INFO: possible recursive locking detected ]
> [  393.810123] 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 Not tainted
> [  393.810123] ---------------------------------------------
> [  393.810123] trinity-c32/9762 is trying to acquire lock:
> [  393.810123] (&(&n->list_lock)->rlock){-.-...}, at: get_partial_node.isra.39 (mm/slub.c:1628)
> [  393.810123]
> [  393.810123] but task is already holding lock:
> [  393.810123] (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [  393.810123]
> [  393.810123] other info that might help us debug this:
> [  393.810123]  Possible unsafe locking scenario:
> [  393.810123]
> [  393.810123]        CPU0
> [  393.810123]        ----
> [  393.810123]   lock(&(&n->list_lock)->rlock);
> [  393.810123]   lock(&(&n->list_lock)->rlock);
> [  393.810123]
> [  393.810123]  *** DEADLOCK ***
> [  393.810123]
> [  393.810123]  May be due to missing lock nesting notation
> [  393.810123]
> [  393.810123] 5 locks held by trinity-c32/9762:
> [  393.810123] #0: (net_mutex){+.+.+.}, at: copy_net_ns (net/core/net_namespace.c:254)
> [  393.810123] #1: (cpu_hotplug.lock){++++++}, at: get_online_cpus (kernel/cpu.c:90)
> [  393.810123] #2: (mem_hotplug.lock){.+.+.+}, at: get_online_mems (mm/memory_hotplug.c:83)
> [  393.810123] #3: (slab_mutex){+.+.+.}, at: kmem_cache_destroy (mm/slab_common.c:344)
> [  393.810123] #4: (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [  393.810123]
> [  393.810123] stack backtrace:
> [  393.810123] CPU: 32 PID: 9762 Comm: trinity-c32 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [  393.843284]  ffff880bc26730e0 0000000000000000 ffffffffb4ae7ff0 ffff880bc26a3848
> [  393.843284]  ffffffffb0e47068 ffffffffb4ae7ff0 ffff880bc26a38f0 ffffffffac258586
> [  393.843284]  ffff880bc2673e30 000000050000000a ffffffffb444dee0 ffff880bc2673e48
> [  393.843284] Call Trace:
> [  393.843284] dump_stack (lib/dump_stack.c:52)
> [  393.843284] __lock_acquire (kernel/locking/lockdep.c:1739 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
> [  393.843284] lock_acquire (kernel/locking/lockdep.c:3602)
> [  393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
> [  393.843284] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
> [  393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
> [  393.843284] get_partial_node.isra.39 (mm/slub.c:1628)
> [  393.843284] ? check_irq_usage (kernel/locking/lockdep.c:1638)
> [  393.843284] ? __slab_alloc (mm/slub.c:2307)
> [  393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [  393.843284] __slab_alloc (mm/slub.c:1730 mm/slub.c:2208 mm/slub.c:2372)
> [  393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [  393.843284] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
> [  393.843284] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:304)
> [  393.843284] kmem_cache_alloc (mm/slub.c:2445 mm/slub.c:2487 mm/slub.c:2492)
> [  393.843284] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
> [  393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [  393.843284] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [  393.843284] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [  393.843284] ? _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
> [  393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [  393.843284] debug_object_init (lib/debugobjects.c:365)
> [  393.843284] rcuhead_fixup_activate (kernel/rcu/update.c:260)
> [  393.843284] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> [  393.843284] ? preempt_count_sub (kernel/sched/core.c:2600)
> [  393.843284] ? slab_cpuup_callback (mm/slub.c:1484)
> [  393.843284] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 8) kernel/rcu/tree.c:2665 (discriminator 8))
> [  393.843284] ? __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [  393.843284] call_rcu (kernel/rcu/tree_plugin.h:679)
> [  393.843284] discard_slab (mm/slub.c:1522)
> [  393.843284] __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [  393.843284] kmem_cache_destroy (mm/slab_common.c:350)
> [  393.843284] nf_conntrack_cleanup_net_list (net/netfilter/nf_conntrack_core.c:1569 (discriminator 3))
> [  393.843284] nf_conntrack_pernet_exit (net/netfilter/nf_conntrack_standalone.c:558)
> [  393.843284] ops_exit_list.isra.1 (net/core/net_namespace.c:135)
> [  393.843284] setup_net (net/core/net_namespace.c:180 (discriminator 3))
> [  393.843284] copy_net_ns (net/core/net_namespace.c:255)
> [  393.843284] create_new_namespaces (kernel/nsproxy.c:95)
> [  393.843284] unshare_nsproxy_namespaces (kernel/nsproxy.c:190 (discriminator 4))
> [  393.843284] SyS_unshare (kernel/fork.c:1865 kernel/fork.c:1814)
> [  393.843284] tracesys (arch/x86/kernel/entry_64.S:542)
> 
> 
> Thanks,
> Sasha
> 



* Re: slub/debugobjects: lockup when freeing memory
  2014-06-19 20:19       ` Christoph Lameter
  2014-06-19 20:28         ` Thomas Gleixner
  2014-06-19 20:36         ` Paul E. McKenney
@ 2014-08-18 16:37         ` Paul E. McKenney
  2014-08-19  3:44           ` Christoph Lameter
  2 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2014-08-18 16:37 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Thu, Jun 19, 2014 at 03:19:39PM -0500, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Thomas Gleixner wrote:
> 
> > Well, no. Look at the callchain:
> >
> > __call_rcu
> >     debug_object_activate
> >        rcuhead_fixup_activate
> >           debug_object_init
> >               kmem_cache_alloc
> >
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
> 
> So we need to init the object in the page struct before the __call_rcu?

And the needed APIs are now in mainline:

	void init_rcu_head(struct rcu_head *head);
	void destroy_rcu_head(struct rcu_head *head);

Over to you, Christoph!  ;-)

							Thanx, Paul



* Re: slub/debugobjects: lockup when freeing memory
  2014-08-18 16:37         ` Paul E. McKenney
@ 2014-08-19  3:44           ` Christoph Lameter
  2014-08-19  3:58             ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2014-08-19  3:44 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Mon, 18 Aug 2014, Paul E. McKenney wrote:

> > > So call rcu activates the object, but the object has no reference in
> > > the debug objects code so the fixup code is called which inits the
> > > object and allocates a reference ....
> >
> > So we need to init the object in the page struct before the __call_rcu?
>
> And the needed APIs are now in mainline:
>
> 	void init_rcu_head(struct rcu_head *head);
> 	void destroy_rcu_head(struct rcu_head *head);
>
> Over to you, Christoph!  ;-)

The field we are using for the rcu head serves other purposes before
the free action. We cannot init the field at slab creation as we
thought, since it is used for the queueing of slabs on the partial,
free, and full lists. The kmem_cache information is not available when
doing the freeing, so we must force the allocation of reserve fields
and the use of the reserved areas for rcu on all kmem_caches.

I made this conditional on CONFIG_RCU_DEBUG_XYZ. This needs to be the
actual debug option that will require allocations when initializing rcu
heads.

Also note that the allocations in the rcu head initialization must be
restricted to non-RCU slabs, otherwise the recursion may not terminate.
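
Spelled out, the recursion being guarded against would look like this
(a sketch of the failure mode, in the style of the call chain quoted
above):

	allocate_slab(s)                /* s is SLAB_DESTROY_BY_RCU */
	   init_rcu_head()
	      kmem_cache_alloc(c)       /* c: cache used by debugobjects */
	         allocate_slab(c)       /* only if c is rcu-freed too */
	            init_rcu_head()
	               kmem_cache_alloc(c)
	                  ...            /* never terminates */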


Subject: RFC: Allow allocations when initializing rcu fields in slub.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -1308,6 +1308,41 @@ static inline struct page *alloc_slab_pa
 	return page;
 }

+#ifdef CONFIG_RCU_DEBUG_XYZ
+/*
+ * We may have to do allocations during the initialization of the
+ * debug portion of the rcu structure for a slab. It must therefore
+ * be separately allocated and initialized on allocation.
+ * We cannot overload the lru field in the page struct at all.
+ */
+#define need_reserve_slab_rcu 1
+#else
+/*
+ * Overload the lru field in struct page if it fits.
+ * Should struct rcu_head grow due to debugging fields etc then
+ * additional space will be allocated from the end of the slab to
+ * store the rcu_head.
+ */
+#define need_reserve_slab_rcu						\
+	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
+#endif
+
+static struct rcu_head *get_rcu_head(struct kmem_cache *s, struct page *page)
+{
+	if (need_reserve_slab_rcu) {
+		int order = compound_order(page);
+		int offset = (PAGE_SIZE << order) - s->reserved;
+
+		VM_BUG_ON(s->reserved != sizeof(struct rcu_head));
+		return page_address(page) + offset;
+	} else {
+		/*
+		 * RCU free overloads the RCU head over the LRU
+		 */
+		return (void *)&page->lru;
+	}
+}
+
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
@@ -1357,6 +1392,21 @@ static struct page *allocate_slab(struct
 			kmemcheck_mark_unallocated_pages(page, pages);
 	}

+#ifdef CONFIG_RCU_DEBUG_XYZ
+	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU) && page)
+		/*
+		 * Initialize rcu_head and potentially do other
+		 * allocations. Note that this is still a recursive
+		 * call into the allocator which may recurse endlessly
+		 * if the same kmem_cache is used for allocation here.
+		 *
+		 * So in order to be safe the slab caches used
+		 * in init_rcu_head must be restricted to be of the
+		 * non rcu kind only.
+		 */
+		init_rcu_head(get_rcu_head(s, page));
+#endif
+
 	if (flags & __GFP_WAIT)
 		local_irq_disable();
 	if (!page)
@@ -1452,13 +1502,13 @@ static void __free_slab(struct kmem_cach
 	memcg_uncharge_slab(s, order);
 }

-#define need_reserve_slab_rcu						\
-	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
 static void rcu_free_slab(struct rcu_head *h)
 {
 	struct page *page;

+#ifdef CONFIG_RCU_DEBUG_XYZ
+	destroy_rcu_head(h);
+#endif
 	if (need_reserve_slab_rcu)
 		page = virt_to_head_page(h);
 	else
@@ -1469,24 +1519,9 @@ static void rcu_free_slab(struct rcu_hea

 static void free_slab(struct kmem_cache *s, struct page *page)
 {
-	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU)) {
-		struct rcu_head *head;
-
-		if (need_reserve_slab_rcu) {
-			int order = compound_order(page);
-			int offset = (PAGE_SIZE << order) - s->reserved;
-
-			VM_BUG_ON(s->reserved != sizeof(*head));
-			head = page_address(page) + offset;
-		} else {
-			/*
-			 * RCU free overloads the RCU head over the LRU
-			 */
-			head = (void *)&page->lru;
-		}
-
-		call_rcu(head, rcu_free_slab);
-	} else
+	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
+		call_rcu(get_rcu_head(s, page), rcu_free_slab);
+	else
 		__free_slab(s, page);
 }



* Re: slub/debugobjects: lockup when freeing memory
  2014-08-19  3:44           ` Christoph Lameter
@ 2014-08-19  3:58             ` Paul E. McKenney
  2014-08-20  2:00               ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2014-08-19  3:58 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Mon, Aug 18, 2014 at 10:44:34PM -0500, Christoph Lameter wrote:
> On Mon, 18 Aug 2014, Paul E. McKenney wrote:
> 
> > > > So call rcu activates the object, but the object has no reference in
> > > > the debug objects code so the fixup code is called which inits the
> > > > object and allocates a reference ....
> > >
> > > So we need to init the object in the page struct before the __call_rcu?
> >
> > And the needed APIs are now in mainline:
> >
> > 	void init_rcu_head(struct rcu_head *head);
> > 	void destroy_rcu_head(struct rcu_head *head);
> >
> > Over to you, Christoph!  ;-)
> 
> The field we are using for the rcu head serves other purposes before
> the free action. We cannot init the field at slab creation as we
> thought, since it is used for the queueing of slabs on the partial,
> free, and full lists. The kmem_cache information is not available when
> doing the freeing, so we must force the allocation of reserve fields
> and the use of the reserved areas for rcu on all kmem_caches.

Yow!  I am glad I didn't try doing this myself!

> I made this conditional on CONFIG_RCU_DEBUG_XYZ. This needs to be the
> actual debug option that will require allocations when initializing
> rcu heads.
> 
> Also note that the allocations in the rcu head initialization must be
> restricted to non-RCU slabs, otherwise the recursion may not terminate.
> 
> 
> Subject: RFC: Allow allocations when initializing rcu fields in slub.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -1308,6 +1308,41 @@ static inline struct page *alloc_slab_pa
>  	return page;
>  }
> 
> +#ifdef CONFIG_RCU_DEBUG_XYZ

If you make CONFIG_RCU_DEBUG_XYZ instead be CONFIG_DEBUG_OBJECTS_RCU_HEAD,
then it will automatically show up when it needs to.

The rest looks plausible, for whatever that is worth.

							Thanx, Paul

> +/*
> + * We may have to do allocations during the initialization of the
> + * debug portion of the rcu structure for a slab. It must therefore
> + * be separately allocated and initialized on allocation.
> + * We cannot overload the lru field in the page struct at all.
> + */
> +#define need_reserve_slab_rcu 1
> +#else
> +/*
> + * Overload the lru field in struct page if it fits.
> + * Should struct rcu_head grow due to debugging fields etc then
> + * additional space will be allocated from the end of the slab to
> + * store the rcu_head.
> + */
> +#define need_reserve_slab_rcu						\
> +	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
> +#endif
> +
> +static struct rcu_head *get_rcu_head(struct kmem_cache *s, struct page *page)
> +{
> +	if (need_reserve_slab_rcu) {
> +		int order = compound_order(page);
> +		int offset = (PAGE_SIZE << order) - s->reserved;
> +
> +		VM_BUG_ON(s->reserved != sizeof(struct rcu_head));
> +		return page_address(page) + offset;
> +	} else {
> +		/*
> +		 * RCU free overloads the RCU head over the LRU
> +		 */
> +		return (void *)&page->lru;
> +	}
> +}
> +
>  static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  {
>  	struct page *page;
> @@ -1357,6 +1392,21 @@ static struct page *allocate_slab(struct
>  			kmemcheck_mark_unallocated_pages(page, pages);
>  	}
> 
> +#ifdef CONFIG_RCU_DEBUG_XYZ
> +	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU) && page)
> +		/*
> +		 * Initialize rcu_head and potentially do other
> +		 * allocations. Note that this is still a recursive
> +		 * call into the allocator which may recurse endlessly
> +		 * if the same kmem_cache is used for allocation here.
> +		 *
> +		 * So in order to be safe the slab caches used
> +		 * in init_rcu_head must be restricted to be of the
> +		 * non rcu kind only.
> +		 */
> +		init_rcu_head(get_rcu_head(s, page));
> +#endif
> +
>  	if (flags & __GFP_WAIT)
>  		local_irq_disable();
>  	if (!page)
> @@ -1452,13 +1502,13 @@ static void __free_slab(struct kmem_cach
>  	memcg_uncharge_slab(s, order);
>  }
> 
> -#define need_reserve_slab_rcu						\
> -	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
> -
>  static void rcu_free_slab(struct rcu_head *h)
>  {
>  	struct page *page;
> 
> +#ifdef CONFIG_RCU_DEBUG_XYZ
> +	destroy_rcu_head(h);
> +#endif
>  	if (need_reserve_slab_rcu)
>  		page = virt_to_head_page(h);
>  	else
> @@ -1469,24 +1519,9 @@ static void rcu_free_slab(struct rcu_hea
> 
>  static void free_slab(struct kmem_cache *s, struct page *page)
>  {
> -	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU)) {
> -		struct rcu_head *head;
> -
> -		if (need_reserve_slab_rcu) {
> -			int order = compound_order(page);
> -			int offset = (PAGE_SIZE << order) - s->reserved;
> -
> -			VM_BUG_ON(s->reserved != sizeof(*head));
> -			head = page_address(page) + offset;
> -		} else {
> -			/*
> -			 * RCU free overloads the RCU head over the LRU
> -			 */
> -			head = (void *)&page->lru;
> -		}
> -
> -		call_rcu(head, rcu_free_slab);
> -	} else
> +	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
> +		call_rcu(get_rcu_head(s, page), rcu_free_slab);
> +	else
>  		__free_slab(s, page);
>  }
> 



* Re: slub/debugobjects: lockup when freeing memory
  2014-08-19  3:58             ` Paul E. McKenney
@ 2014-08-20  2:00               ` Christoph Lameter
  2014-08-20  2:31                 ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2014-08-20  2:00 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Mon, 18 Aug 2014, Paul E. McKenney wrote:

> > +#ifdef CONFIG_RCU_DEBUG_XYZ
>
> If you make CONFIG_RCU_DEBUG_XYZ instead be CONFIG_DEBUG_OBJECTS_RCU_HEAD,
> then it will automatically show up when it needs to.

Ok.

> The rest looks plausible, for whatever that is worth.

We talked in the hallway about init_rcu_head not touching
the contents of the rcu_head. If that is the case then we can simplify
the patch.

We could also remove the #ifdefs if init_rcu_head and destroy_rcu_head
are no-ops when CONFIG_DEBUG_OBJECTS_RCU_HEAD is not defined.



Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -1308,6 +1308,25 @@ static inline struct page *alloc_slab_pa
 	return page;
 }

+#define need_reserve_slab_rcu						\
+	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
+
+static struct rcu_head *get_rcu_head(struct kmem_cache *s, struct page *page)
+{
+	if (need_reserve_slab_rcu) {
+		int order = compound_order(page);
+		int offset = (PAGE_SIZE << order) - s->reserved;
+
+		VM_BUG_ON(s->reserved != sizeof(struct rcu_head));
+		return page_address(page) + offset;
+	} else {
+		/*
+		 * RCU free overloads the RCU head over the LRU
+		 */
+		return (void *)&page->lru;
+	}
+}
+
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
@@ -1357,6 +1376,22 @@ static struct page *allocate_slab(struct
 			kmemcheck_mark_unallocated_pages(page, pages);
 	}

+#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU) && page)
+		/*
+		 * Initialize various things. However, this init is
+	 	 * not allowed to modify the contents of the rcu head.
+		 * Allocations are permitted. However, the use of
+		 * the same cache or another cache with SLAB_DESTROY_BY_RCU
+		 * set may cause additional recursions.
+		 *
+		 * So in order to be safe the slab caches used
+		 * in init_rcu_head should be restricted to be of the
+		 * non rcu kind only.
+		 */
+		init_rcu_head(get_rcu_head(s, page));
+#endif
+
 	if (flags & __GFP_WAIT)
 		local_irq_disable();
 	if (!page)
@@ -1452,13 +1487,13 @@ static void __free_slab(struct kmem_cach
 	memcg_uncharge_slab(s, order);
 }

-#define need_reserve_slab_rcu						\
-	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
 static void rcu_free_slab(struct rcu_head *h)
 {
 	struct page *page;

+#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+	destroy_rcu_head(h);
+#endif
 	if (need_reserve_slab_rcu)
 		page = virt_to_head_page(h);
 	else
@@ -1469,24 +1504,9 @@ static void rcu_free_slab(struct rcu_hea

 static void free_slab(struct kmem_cache *s, struct page *page)
 {
-	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU)) {
-		struct rcu_head *head;
-
-		if (need_reserve_slab_rcu) {
-			int order = compound_order(page);
-			int offset = (PAGE_SIZE << order) - s->reserved;
-
-			VM_BUG_ON(s->reserved != sizeof(*head));
-			head = page_address(page) + offset;
-		} else {
-			/*
-			 * RCU free overloads the RCU head over the LRU
-			 */
-			head = (void *)&page->lru;
-		}
-
-		call_rcu(head, rcu_free_slab);
-	} else
+	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
+		call_rcu(get_rcu_head(s, page), rcu_free_slab);
+	else
 		__free_slab(s, page);
 }



* Re: slub/debugobjects: lockup when freeing memory
  2014-08-20  2:00               ` Christoph Lameter
@ 2014-08-20  2:31                 ` Paul E. McKenney
  2014-08-20  6:01                   ` Christoph Lameter
  0 siblings, 1 reply; 28+ messages in thread
From: Paul E. McKenney @ 2014-08-20  2:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Tue, Aug 19, 2014 at 09:00:05PM -0500, Christoph Lameter wrote:
> On Mon, 18 Aug 2014, Paul E. McKenney wrote:
> 
> > > +#ifdef CONFIG_RCU_DEBUG_XYZ
> >
> > If you make CONFIG_RCU_DEBUG_XYZ instead be CONFIG_DEBUG_OBJECTS_RCU_HEAD,
> > then it will automatically show up when it needs to.
> 
> Ok.
> 
> > The rest looks plausible, for whatever that is worth.
> 
> We talked in the hallway about init_rcu_head not touching
> the contents of the rcu_head. If that is the case then we can simplify
> the patch.

That is correct -- the debug-objects code uses separate storage to track
states, and does not touch the memory to which the state applies.
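
(For reference, the tracker keeps that separate state in its own
struct debug_obj entries, hashed by the tracked address; simplified
from lib/debugobjects.c, it is roughly:)

	struct debug_obj {
		struct hlist_node	node;	/* hash-bucket linkage */
		enum debug_obj_state	state;
		unsigned int		astate;
		void			*object;	/* the tracked address */
		struct debug_obj_descr	*descr;
	};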

> We could also remove the #ifdefs if init_rcu_head and destroy_rcu_head
> are no ops if CONFIG_DEBUG_RCU_HEAD is not defined.

And indeed they are, good point!  It appears to me that both sets of
#ifdefs can go away.

							Thanx, Paul

> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -1308,6 +1308,25 @@ static inline struct page *alloc_slab_pa
>  	return page;
>  }
> 
> +#define need_reserve_slab_rcu						\
> +	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
> +
> +static struct rcu_head *get_rcu_head(struct kmem_cache *s, struct page *page)
> +{
> +	if (need_reserve_slab_rcu) {
> +		int order = compound_order(page);
> +		int offset = (PAGE_SIZE << order) - s->reserved;
> +
> +		VM_BUG_ON(s->reserved != sizeof(struct rcu_head));
> +		return page_address(page) + offset;
> +	} else {
> +		/*
> +		 * RCU free overloads the RCU head over the LRU
> +		 */
> +		return (void *)&page->lru;
> +	}
> +}
> +
>  static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  {
>  	struct page *page;
> @@ -1357,6 +1376,22 @@ static struct page *allocate_slab(struct
>  			kmemcheck_mark_unallocated_pages(page, pages);
>  	}
> 
> +#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
> +	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU) && page)
> +		/*
> +		 * Initialize various things. However, this init is
> +	 	 * not allowed to modify the contents of the rcu head.
> +		 * Allocations are permitted. However, the use of
> +		 * the same cache or another cache with SLAB_DESTROY_BY_RCU
> +		 * set may cause additional recursions.
> +		 *
> +		 * So in order to be safe the slab caches used
> +		 * in init_rcu_head should be restricted to be of the
> +		 * non rcu kind only.
> +		 */
> +		init_rcu_head(get_rcu_head(s, page));
> +#endif
> +
>  	if (flags & __GFP_WAIT)
>  		local_irq_disable();
>  	if (!page)
> @@ -1452,13 +1487,13 @@ static void __free_slab(struct kmem_cach
>  	memcg_uncharge_slab(s, order);
>  }
> 
> -#define need_reserve_slab_rcu						\
> -	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
> -
>  static void rcu_free_slab(struct rcu_head *h)
>  {
>  	struct page *page;
> 
> +#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
> +	destroy_rcu_head(h);
> +#endif
>  	if (need_reserve_slab_rcu)
>  		page = virt_to_head_page(h);
>  	else
> @@ -1469,24 +1504,9 @@ static void rcu_free_slab(struct rcu_hea
> 
>  static void free_slab(struct kmem_cache *s, struct page *page)
>  {
> -	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU)) {
> -		struct rcu_head *head;
> -
> -		if (need_reserve_slab_rcu) {
> -			int order = compound_order(page);
> -			int offset = (PAGE_SIZE << order) - s->reserved;
> -
> -			VM_BUG_ON(s->reserved != sizeof(*head));
> -			head = page_address(page) + offset;
> -		} else {
> -			/*
> -			 * RCU free overloads the RCU head over the LRU
> -			 */
> -			head = (void *)&page->lru;
> -		}
> -
> -		call_rcu(head, rcu_free_slab);
> -	} else
> +	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
> +		call_rcu(get_rcu_head(s, page), rcu_free_slab);
> +	else
>  		__free_slab(s, page);
>  }
> 



* Re: slub/debugobjects: lockup when freeing memory
  2014-08-20  2:31                 ` Paul E. McKenney
@ 2014-08-20  6:01                   ` Christoph Lameter
  2014-08-20 12:19                     ` Paul E. McKenney
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Lameter @ 2014-08-20  6:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Tue, 19 Aug 2014, Paul E. McKenney wrote:

> > We could also remove the #ifdefs if init_rcu_head and destroy_rcu_head
> > are no-ops when CONFIG_DEBUG_OBJECTS_RCU_HEAD is not defined.
>
> And indeed they are, good point!  It appears to me that both sets of
> #ifdefs can go away.

Ok then this is a first workable version I think. How do we test this?

From: Christoph Lameter <cl@linux.com>
Subject: slub: Add init/destroy function calls for rcu_heads

In order to do proper debugging for rcu_head use we need some
additional structures allocated when an object potentially
using a rcu_head is allocated in the slub allocator.

This adds the proper calls to init_rcu_head()
and destroy_rcu_head().

init_rcu_head() is a bit of an unusual function since:
1. It does not touch the contents of the rcu_head. This is
   required since the rcu_head is only used during
   slab page freeing. Outside of that, the same memory location
   is used for slab page list management. However, the
   initialization occurs when the slab page is initially allocated.
   So in the time between init_rcu_head() and destroy_rcu_head()
   there may be multiple uses of the indicated address as a
   list_head.

2. It is called without gfp flags and could potentially
   be called from atomic contexts. Allocations from init_rcu_head()
   context need to deal with this.

3. init_rcu_head() is called from within the slab allocation
   functions. Since init_rcu_head() calls the allocator again
   for more allocations, it must avoid using slabs that use
   rcu freeing, otherwise endless recursion may occur
   (we may have to convince lockdep that what we do here is sane).

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c
+++ linux/mm/slub.c
@@ -1308,6 +1308,25 @@ static inline struct page *alloc_slab_pa
 	return page;
 }

+#define need_reserve_slab_rcu						\
+	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
+
+static struct rcu_head *get_rcu_head(struct kmem_cache *s, struct page *page)
+{
+	if (need_reserve_slab_rcu) {
+		int order = compound_order(page);
+		int offset = (PAGE_SIZE << order) - s->reserved;
+
+		VM_BUG_ON(s->reserved != sizeof(struct rcu_head));
+		return page_address(page) + offset;
+	} else {
+		/*
+		 * RCU free overloads the RCU head over the LRU
+		 */
+		return (void *)&page->lru;
+	}
+}
+
 static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
@@ -1357,6 +1376,29 @@ static struct page *allocate_slab(struct
 			kmemcheck_mark_unallocated_pages(page, pages);
 	}

+	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU) && page)
+		/*
+		 * Initialize various things. However, this init is
+	 	 * not allowed to modify the contents of the rcu head.
+		 * The allocator typically overloads the rcu head over
+		 * page->lru which is also used to manage lists of
+		 * slab pages.
+		 *
+		 * Allocations are permitted in init_rcu_head().
+		 * However, the use of the same cache or another
+		 * cache with SLAB_DESTROY_BY_RCU set will cause
+		 * additional recursions.
+		 *
+		 * So in order to be safe the slab caches used
+		 * in init_rcu_head() should be restricted to be of the
+		 * non rcu kind only.
+		 *
+		 * Note also that no GFP flags are passed. The function
+		 * may therefore be called from atomic contexts
+		 * and somehow(?) needs to do the right thing.
+		 */
+		init_rcu_head(get_rcu_head(s, page));
+
 	if (flags & __GFP_WAIT)
 		local_irq_disable();
 	if (!page)
@@ -1452,13 +1494,11 @@ static void __free_slab(struct kmem_cach
 	memcg_uncharge_slab(s, order);
 }

-#define need_reserve_slab_rcu						\
-	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
-
 static void rcu_free_slab(struct rcu_head *h)
 {
 	struct page *page;

+	destroy_rcu_head(h);
 	if (need_reserve_slab_rcu)
 		page = virt_to_head_page(h);
 	else
@@ -1469,24 +1509,9 @@ static void rcu_free_slab(struct rcu_hea

 static void free_slab(struct kmem_cache *s, struct page *page)
 {
-	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU)) {
-		struct rcu_head *head;
-
-		if (need_reserve_slab_rcu) {
-			int order = compound_order(page);
-			int offset = (PAGE_SIZE << order) - s->reserved;
-
-			VM_BUG_ON(s->reserved != sizeof(*head));
-			head = page_address(page) + offset;
-		} else {
-			/*
-			 * RCU free overloads the RCU head over the LRU
-			 */
-			head = (void *)&page->lru;
-		}
-
-		call_rcu(head, rcu_free_slab);
-	} else
+	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
+		call_rcu(get_rcu_head(s, page), rcu_free_slab);
+	else
 		__free_slab(s, page);
 }



* Re: slub/debugobjects: lockup when freeing memory
  2014-08-20  6:01                   ` Christoph Lameter
@ 2014-08-20 12:19                     ` Paul E. McKenney
  0 siblings, 0 replies; 28+ messages in thread
From: Paul E. McKenney @ 2014-08-20 12:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Thomas Gleixner, Sasha Levin, Pekka Enberg, Matt Mackall,
	Andrew Morton, Dave Jones, linux-mm, LKML

On Wed, Aug 20, 2014 at 01:01:19AM -0500, Christoph Lameter wrote:
> On Tue, 19 Aug 2014, Paul E. McKenney wrote:
> 
> > > We could also remove the #ifdefs if init_rcu_head and destroy_rcu_head
> > > are no-ops when CONFIG_DEBUG_OBJECTS_RCU_HEAD is not defined.
> >
> > And indeed they are, good point!  It appears to me that both sets of
> > #ifdefs can go away.
> 
> Ok then this is a first workable version I think. How do we test this?

It looks good to me.

Sasha, could you please try this out?  This should fix the problem
you reported here:  https://lkml.org/lkml/2014/6/19/306
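
Beyond the trinity reproducer, a minimal synthetic exerciser could look
like the module below (purely illustrative; the cache name, object size,
and counts are made up):

	#include <linux/module.h>
	#include <linux/slab.h>

	#define NOBJ 1024

	static int __init rcu_slab_test_init(void)
	{
		static void *objs[NOBJ];
		struct kmem_cache *c;
		int i;

		c = kmem_cache_create("rcu_head_test", 256, 0,
				      SLAB_DESTROY_BY_RCU, NULL);
		if (!c)
			return -ENOMEM;

		/* Back several slab pages with live objects. */
		for (i = 0; i < NOBJ; i++)
			objs[i] = kmem_cache_alloc(c, GFP_KERNEL);
		for (i = 0; i < NOBJ; i++)
			if (objs[i])
				kmem_cache_free(c, objs[i]);

		/*
		 * Destroying the cache frees its slab pages via
		 * call_rcu(), which is exactly the path the
		 * debug-objects hooks must survive with
		 * CONFIG_DEBUG_OBJECTS_RCU_HEAD=y.
		 */
		kmem_cache_destroy(c);
		return -EAGAIN;	/* nothing to keep loaded */
	}
	module_init(rcu_slab_test_init);

	MODULE_LICENSE("GPL");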

							Thanx, Paul

> From: Christoph Lameter <cl@linux.com>
> Subject: slub: Add init/destroy function calls for rcu_heads
> 
> In order to do proper debugging for rcu_head use we need some
> additional structures allocated when an object potentially
> using a rcu_head is allocated in the slub allocator.
> 
> This adds the proper calls to init_rcu_head()
> and destroy_rcu_head().
> 
> init_rcu_head() is a bit of an unusual function since:
> 1. It does not touch the contents of the rcu_head. This is
>    required since the rcu_head is only used during
>    slab page freeing. Outside of that, the same memory location
>    is used for slab page list management. However, the
>    initialization occurs when the slab page is initially allocated.
>    So in the time between init_rcu_head() and destroy_rcu_head()
>    there may be multiple uses of the indicated address as a
>    list_head.
> 
> 2. It is called without gfp flags and could potentially
>    be called from atomic contexts. Allocations from init_rcu_head()
>    context need to deal with this.
> 
> 3. init_rcu_head() is called from within the slab allocation
>    functions. Since init_rcu_head() calls the allocator again
>    for more allocations, it must avoid using slabs that use
>    rcu freeing, otherwise endless recursion may occur
>    (we may have to convince lockdep that what we do here is sane).
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -1308,6 +1308,25 @@ static inline struct page *alloc_slab_pa
>  	return page;
>  }
> 
> +#define need_reserve_slab_rcu						\
> +	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
> +
> +static struct rcu_head *get_rcu_head(struct kmem_cache *s, struct page *page)
> +{
> +	if (need_reserve_slab_rcu) {
> +		int order = compound_order(page);
> +		int offset = (PAGE_SIZE << order) - s->reserved;
> +
> +		VM_BUG_ON(s->reserved != sizeof(struct rcu_head));
> +		return page_address(page) + offset;
> +	} else {
> +		/*
> +		 * RCU free overloads the RCU head over the LRU
> +		 */
> +		return (void *)&page->lru;
> +	}
> +}
> +
>  static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
>  {
>  	struct page *page;
> @@ -1357,6 +1376,29 @@ static struct page *allocate_slab(struct
>  			kmemcheck_mark_unallocated_pages(page, pages);
>  	}
> 
> +	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU) && page)
> +		/*
> +		 * Initialize various things. However, this init is
> +	 	 * not allowed to modify the contents of the rcu head.
> +		 * The allocator typically overloads the rcu head over
> +		 * page->lru which is also used to manage lists of
> +		 * slab pages.
> +		 *
> +		 * Allocations are permitted in init_rcu_head().
> +		 * However, the use of the same cache or another
> +		 * cache with SLAB_DESTROY_BY_RCU set will cause
> +		 * additional recursions.
> +		 *
> +		 * So in order to be safe the slab caches used
> +		 * in init_rcu_head() should be restricted to be of the
> +		 * non rcu kind only.
> +		 *
> +		 * Note also that no GFP flags are passed. The function
> +		 * may therefore be called from atomic contexts
> +		 * and somehow(?) needs to do the right thing.
> +		 */
> +		init_rcu_head(get_rcu_head(s, page));
> +
>  	if (flags & __GFP_WAIT)
>  		local_irq_disable();
>  	if (!page)
> @@ -1452,13 +1494,11 @@ static void __free_slab(struct kmem_cach
>  	memcg_uncharge_slab(s, order);
>  }
> 
> -#define need_reserve_slab_rcu						\
> -	(sizeof(((struct page *)NULL)->lru) < sizeof(struct rcu_head))
> -
>  static void rcu_free_slab(struct rcu_head *h)
>  {
>  	struct page *page;
> 
> +	destroy_rcu_head(h);
>  	if (need_reserve_slab_rcu)
>  		page = virt_to_head_page(h);
>  	else
> @@ -1469,24 +1509,9 @@ static void rcu_free_slab(struct rcu_hea
> 
>  static void free_slab(struct kmem_cache *s, struct page *page)
>  {
> -	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU)) {
> -		struct rcu_head *head;
> -
> -		if (need_reserve_slab_rcu) {
> -			int order = compound_order(page);
> -			int offset = (PAGE_SIZE << order) - s->reserved;
> -
> -			VM_BUG_ON(s->reserved != sizeof(*head));
> -			head = page_address(page) + offset;
> -		} else {
> -			/*
> -			 * RCU free overloads the RCU head over the LRU
> -			 */
> -			head = (void *)&page->lru;
> -		}
> -
> -		call_rcu(head, rcu_free_slab);
> -	} else
> +	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
> +		call_rcu(get_rcu_head(s, page), rcu_free_slab);
> +	else
>  		__free_slab(s, page);
>  }
> 



end of thread

Thread overview: 28+ messages
2014-06-19 14:30 slub/debugobjects: lockup when freeing memory Sasha Levin
2014-06-19 15:03 ` Christoph Lameter
2014-06-19 16:52   ` Paul E. McKenney
2014-06-19 19:29     ` Thomas Gleixner
2014-06-19 20:19       ` Christoph Lameter
2014-06-19 20:28         ` Thomas Gleixner
2014-06-19 20:36         ` Paul E. McKenney
2014-08-18 16:37         ` Paul E. McKenney
2014-08-19  3:44           ` Christoph Lameter
2014-08-19  3:58             ` Paul E. McKenney
2014-08-20  2:00               ` Christoph Lameter
2014-08-20  2:31                 ` Paul E. McKenney
2014-08-20  6:01                   ` Christoph Lameter
2014-08-20 12:19                     ` Paul E. McKenney
2014-06-19 20:29       ` Paul E. McKenney
2014-06-19 20:32         ` Sasha Levin
2014-06-19 20:39           ` Paul E. McKenney
2014-06-19 20:37         ` Thomas Gleixner
2014-06-19 20:53           ` Paul E. McKenney
2014-06-19 21:32             ` Thomas Gleixner
2014-06-19 22:04               ` Paul E. McKenney
2014-06-20  8:17                 ` Thomas Gleixner
2014-06-20 15:40                   ` Paul E. McKenney
2014-07-12 18:03                     ` Sasha Levin
2014-07-12 19:33                       ` Paul E. McKenney
2014-06-20 14:30                 ` Christoph Lameter
2014-06-19 20:42         ` Sasha Levin
2014-06-19 20:53           ` Paul E. McKenney
