linux-kernel.vger.kernel.org archive mirror
* Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
@ 2012-10-02 16:14 Jiri Kosina
  2012-10-02 17:01 ` Paul E. McKenney
  2012-10-02 20:39 ` Srivatsa S. Bhat
  0 siblings, 2 replies; 46+ messages in thread
From: Jiri Kosina @ 2012-10-02 16:14 UTC (permalink / raw)
  To: Paul E. McKenney, Josh Triplett; +Cc: linux-kernel

Hi,

this commit:

==
1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
Author: Paul E. McKenney <paul.mckenney@linaro.org>
Date:   Thu Aug 2 17:43:50 2012 -0700

    rcu: Remove _rcu_barrier() dependency on __stop_machine()
    
    Currently, _rcu_barrier() relies on preempt_disable() to prevent
    any CPU from going offline, which in turn depends on CPU hotplug's
    use of __stop_machine().
    
    This patch therefore makes _rcu_barrier() use get_online_cpus() to
    block CPU-hotplug operations.  This has the added benefit of removing
    the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
    operations are excluded, there can be no callbacks to adopt.  This
    commit simplifies the code accordingly.
    
    Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
==

is causing lockdep to complain (see the full trace below). I haven't yet 
had time to analyze what exactly is happening, and probably will not have 
time to do so until tomorrow, so just sending this as a heads-up in case 
anyone sees the culprit immediately.

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
 -------------------------------------------------------
 kworker/u:2/40 is trying to acquire lock:
  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
 
 but task is already holding lock:
  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #2 (slab_mutex){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
        [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
        [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff8155719d>] _cpu_up+0xba/0x14e
        [<ffffffff815572ed>] cpu_up+0xbc/0x117
        [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
        [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
 
 -> #1 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81049197>] get_online_cpus+0x37/0x50
        [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
        [<ffffffff8118cc01>] deactivate_super+0x61/0x70
        [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
        [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
        [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
 
 -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
        [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
        [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
        [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
        [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
        [<ffffffff81454b79>] ops_exit_list+0x39/0x60
        [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
        [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
        [<ffffffff81069f3e>] worker_thread+0x12e/0x320
        [<ffffffff8106f73e>] kthread+0x9e/0xb0
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
 
 other info that might help us debug this:
 
 Chain exists of:
   rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
 
  Possible unsafe locking scenario:
 
        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(cpu_hotplug.lock);
                                lock(slab_mutex);
   lock(rcu_sched_state.barrier_mutex);
 
  *** DEADLOCK ***
 
 4 locks held by kworker/u:2/40:
  #0:  (netns){.+.+.+}, at: [<ffffffff810690b2>] process_one_work+0x1a2/0x4c0
  #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810690b2>] process_one_work+0x1a2/0x4c0
  #2:  (net_mutex){+.+.+.}, at: [<ffffffff81455130>] cleanup_net+0x80/0x1b0
  #3:  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
 
 stack backtrace:
 Pid: 40, comm: kworker/u:2 Not tainted 3.6.0-rc5-00004-g0d8ee37 #143
 Call Trace:
  [<ffffffff810ac85f>] print_circular_bug+0x10f/0x120
  [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
  [<ffffffff810ad85a>] ? check_prev_add+0xea/0x440
  [<ffffffff8102c72f>] ? flat_send_IPI_mask+0x7f/0xc0
  [<ffffffff810ae1e2>] validate_chain+0x632/0x720
  [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
  [<ffffffff810ae921>] lock_acquire+0x121/0x190
  [<ffffffff810f2126>] ? _rcu_barrier+0x26/0x1e0
  [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
  [<ffffffff810f2126>] ? _rcu_barrier+0x26/0x1e0
  [<ffffffff810b5e45>] ? on_each_cpu+0x65/0xc0
  [<ffffffff810f2126>] ? _rcu_barrier+0x26/0x1e0
  [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
  [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
  [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
  [<ffffffff810f2309>] rcu_barrier+0x9/0x10
  [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
  [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
  [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
  [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
  [<ffffffff81454b79>] ops_exit_list+0x39/0x60
  [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
  [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
  [<ffffffff810690b2>] ? process_one_work+0x1a2/0x4c0
  [<ffffffff81069e69>] ? worker_thread+0x59/0x320
  [<ffffffff814550b0>] ? net_drop_ns+0x40/0x40
  [<ffffffff81069f3e>] worker_thread+0x12e/0x320
  [<ffffffff81069e10>] ? manage_workers+0x110/0x110
  [<ffffffff8106f73e>] kthread+0x9e/0xb0
  [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
  [<ffffffff81560b70>] ? retint_restore_args+0x13/0x13
  [<ffffffff8106f6a0>] ? __init_kthread_worker+0x70/0x70
  [<ffffffff8156ab40>] ? gs_change+0x13/0x13


-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 16:14 Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Jiri Kosina
@ 2012-10-02 17:01 ` Paul E. McKenney
  2012-10-02 21:27   ` Jiri Kosina
  2012-10-02 20:39 ` Srivatsa S. Bhat
  1 sibling, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-02 17:01 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Paul E. McKenney, Josh Triplett, linux-kernel

On Tue, Oct 02, 2012 at 06:14:08PM +0200, Jiri Kosina wrote:
> Hi,
> 
> this commit:
> 
> ==
> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Thu Aug 2 17:43:50 2012 -0700
> 
>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
>     
>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
>     any CPU from going offline, which in turn depends on CPU hotplug's
>     use of __stop_machine().
>     
>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
>     block CPU-hotplug operations.  This has the added benefit of removing
>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>     operations are excluded, there can be no callbacks to adopt.  This
>     commit simplifies the code accordingly.
>     
>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> ==
> 
> is causing lockdep to complain (see the full trace below). I haven't yet 
> had time to analyze what exactly is happening, and probably will not have 
> time to do so until tomorrow, so just sending this as a heads-up in case 
> anyone sees the culprit immediately.

Hmmm...  Does the following patch help?  It swaps the order in which
rcu_barrier() acquires the hotplug and rcu_barrier locks.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 2ad9e81..2c71d61 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2560,6 +2560,9 @@ static void _rcu_barrier(struct rcu_state *rsp)
 
 	_rcu_barrier_trace(rsp, "Begin", -1, snap);
 
+	/* Exclude CPU-hotplug, ensuring that no offline CPU has callbacks. */
+	get_online_cpus();
+
 	/* Take mutex to serialize concurrent rcu_barrier() requests. */
 	mutex_lock(&rsp->barrier_mutex);
 
@@ -2581,6 +2584,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 		_rcu_barrier_trace(rsp, "EarlyExit", -1, snap_done);
 		smp_mb(); /* caller's subsequent code after above check. */
 		mutex_unlock(&rsp->barrier_mutex);
+		put_online_cpus();
 		return;
 	}
 
@@ -2597,8 +2601,7 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	/*
 	 * Initialize the count to one rather than to zero in order to
 	 * avoid a too-soon return to zero in case of a short grace period
-	 * (or preemption of this task).  Exclude CPU-hotplug operations
-	 * to ensure that no offline CPU has callbacks queued.
+	 * (or preemption of this task).
 	 */
 	init_completion(&rsp->barrier_completion);
 	atomic_set(&rsp->barrier_cpu_count, 1);
@@ -2620,7 +2623,6 @@ static void _rcu_barrier(struct rcu_state *rsp)
 					   rsp->n_barrier_done);
 		}
 	}
-	put_online_cpus();
 
 	/*
 	 * Now that we have an rcu_barrier_callback() callback on each
@@ -2639,8 +2641,9 @@ static void _rcu_barrier(struct rcu_state *rsp)
 	/* Wait for all rcu_barrier_callback() callbacks to be invoked. */
 	wait_for_completion(&rsp->barrier_completion);
 
-	/* Other rcu_barrier() invocations can now safely proceed. */
+	/* Hotplug and rcu_barrier() invocations can now safely proceed. */
 	mutex_unlock(&rsp->barrier_mutex);
+	put_online_cpus();
 }
 
 /**



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 16:14 Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Jiri Kosina
  2012-10-02 17:01 ` Paul E. McKenney
@ 2012-10-02 20:39 ` Srivatsa S. Bhat
  2012-10-02 22:17   ` Jiri Kosina
  1 sibling, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-02 20:39 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Paul E. McKenney

On 10/02/2012 09:44 PM, Jiri Kosina wrote:
> Hi,
> 
> this commit:
> 
> ==
> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> Date:   Thu Aug 2 17:43:50 2012 -0700
> 
>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
>     
>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
>     any CPU from going offline, which in turn depends on CPU hotplug's
>     use of __stop_machine().
>     
>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
>     block CPU-hotplug operations.  This has the added benefit of removing
>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>     operations are excluded, there can be no callbacks to adopt.  This
>     commit simplifies the code accordingly.
>     
>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> ==
> 
> is causing lockdep to complain (see the full trace below). I haven't yet 
> had time to analyze what exactly is happening, and probably will not have 
> time to do so until tomorrow, so just sending this as a heads-up in case 
> anyone sees the culprit immediately.
> 
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
>  -------------------------------------------------------
>  kworker/u:2/40 is trying to acquire lock:
>   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> 
>  but task is already holding lock:
>   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> 
>  which lock already depends on the new lock.
> 
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #2 (slab_mutex){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
>         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
>         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
>         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
>         [<ffffffff815572ed>] cpu_up+0xbc/0x117
>         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
>         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> 
>  -> #1 (cpu_hotplug.lock){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81049197>] get_online_cpus+0x37/0x50
>         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
>         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
>         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
>         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
>         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> 
>  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
>         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>         [<ffffffff8106f73e>] kthread+0x9e/0xb0
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> 
>  other info that might help us debug this:
> 
>  Chain exists of:
>    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(slab_mutex);
>                                 lock(cpu_hotplug.lock);
>                                 lock(slab_mutex);
>    lock(rcu_sched_state.barrier_mutex);
> 
>   *** DEADLOCK ***
> 
>  4 locks held by kworker/u:2/40:
>   #0:  (netns){.+.+.+}, at: [<ffffffff810690b2>] process_one_work+0x1a2/0x4c0
>   #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff810690b2>] process_one_work+0x1a2/0x4c0
>   #2:  (net_mutex){+.+.+.}, at: [<ffffffff81455130>] cleanup_net+0x80/0x1b0
>   #3:  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>

I don't see how this circular locking dependency can occur. If you are using SLUB,
kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
which means it cannot run concurrently with a hotplug operation such as cpu_up(). So I'm
rather puzzled at this lockdep splat.
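
To spell out the two orderings I have in mind (a condensed paraphrase of
the 3.6-era mm/slub.c and mm/slab.c, not the literal code):

	/* mm/slub.c, condensed: */
	mutex_lock(&slab_mutex);
	list_del(&s->list);
	mutex_unlock(&slab_mutex);	/* dropped before the barrier... */
	if (s->flags & SLAB_DESTROY_BY_RCU)
		rcu_barrier();		/* ...so no lock is held here */

	/* mm/slab.c, condensed: */
	get_online_cpus();		/* hotplug excluded throughout */
	mutex_lock(&slab_mutex);
	/* unlink cachep, shrink it */
	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
		rcu_barrier();		/* slab_mutex still held */
	__kmem_cache_destroy(cachep);
	mutex_unlock(&slab_mutex);
	put_online_cpus();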

Regards,
Srivatsa S. Bhat
 
>  stack backtrace:
>  Pid: 40, comm: kworker/u:2 Not tainted 3.6.0-rc5-00004-g0d8ee37 #143
>  Call Trace:
>   [<ffffffff810ac85f>] print_circular_bug+0x10f/0x120
>   [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>   [<ffffffff810ad85a>] ? check_prev_add+0xea/0x440
>   [<ffffffff8102c72f>] ? flat_send_IPI_mask+0x7f/0xc0
>   [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>   [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>   [<ffffffff810ae921>] lock_acquire+0x121/0x190
>   [<ffffffff810f2126>] ? _rcu_barrier+0x26/0x1e0
>   [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>   [<ffffffff810f2126>] ? _rcu_barrier+0x26/0x1e0
>   [<ffffffff810b5e45>] ? on_each_cpu+0x65/0xc0
>   [<ffffffff810f2126>] ? _rcu_barrier+0x26/0x1e0
>   [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>   [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>   [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>   [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>   [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>   [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>   [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>   [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>   [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>   [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>   [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>   [<ffffffff810690b2>] ? process_one_work+0x1a2/0x4c0
>   [<ffffffff81069e69>] ? worker_thread+0x59/0x320
>   [<ffffffff814550b0>] ? net_drop_ns+0x40/0x40
>   [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>   [<ffffffff81069e10>] ? manage_workers+0x110/0x110
>   [<ffffffff8106f73e>] kthread+0x9e/0xb0
>   [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>   [<ffffffff81560b70>] ? retint_restore_args+0x13/0x13
>   [<ffffffff8106f6a0>] ? __init_kthread_worker+0x70/0x70
>   [<ffffffff8156ab40>] ? gs_change+0x13/0x13
> 
> 



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 17:01 ` Paul E. McKenney
@ 2012-10-02 21:27   ` Jiri Kosina
  2012-10-02 21:49     ` Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-02 21:27 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Paul E. McKenney, Josh Triplett, linux-kernel

On Tue, 2 Oct 2012, Paul E. McKenney wrote:

> > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> > commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > Date:   Thu Aug 2 17:43:50 2012 -0700
> > 
> >     rcu: Remove _rcu_barrier() dependency on __stop_machine()
> >     
> >     Currently, _rcu_barrier() relies on preempt_disable() to prevent
> >     any CPU from going offline, which in turn depends on CPU hotplug's
> >     use of __stop_machine().
> >     
> >     This patch therefore makes _rcu_barrier() use get_online_cpus() to
> >     block CPU-hotplug operations.  This has the added benefit of removing
> >     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> >     operations are excluded, there can be no callbacks to adopt.  This
> >     commit simplifies the code accordingly.
> >     
> >     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > ==
> > 
> > is causing lockdep to complain (see the full trace below). I haven't yet 
> > had time to analyze what exactly is happening, and probably will not have 
> > time to do so until tomorrow, so just sending this as a heads-up in case 
> > anyone sees the culprit immediately.
> 
> Hmmm...  Does the following patch help?  It swaps the order in which
> rcu_barrier() acquires the hotplug and rcu_barrier locks.

It changed the report slightly (see for example the change in the possible
unsafe locking scenario: rcu_sched_state.barrier_mutex vanished and it's
now directly about cpu_hotplug.lock). With the patch applied I get



======================================================
[ INFO: possible circular locking dependency detected ]
3.6.0-03888-g3f99f3b #145 Not tainted
-------------------------------------------------------
kworker/u:3/43 is trying to acquire lock:
 (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81049287>] get_online_cpus+0x37/0x50

but task is already holding lock:
 (slab_mutex){+.+.+.}, at: [<ffffffff81178175>] kmem_cache_destroy+0x45/0xe0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (slab_mutex){+.+.+.}:
       [<ffffffff810aeb22>] validate_chain+0x632/0x720
       [<ffffffff810aef69>] __lock_acquire+0x359/0x580
       [<ffffffff810af2b1>] lock_acquire+0x121/0x190
       [<ffffffff8156130c>] __mutex_lock_common+0x5c/0x450
       [<ffffffff8156182e>] mutex_lock_nested+0x3e/0x50
       [<ffffffff8155cafa>] cpuup_callback+0x2f/0xbe
       [<ffffffff81568bc3>] notifier_call_chain+0x93/0x140
       [<ffffffff81077289>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff8155b1ac>] _cpu_up+0xc9/0x162
       [<ffffffff8155b301>] cpu_up+0xbc/0x11b
       [<ffffffff81ae1793>] smp_init+0x6b/0x9f
       [<ffffffff81ac57d6>] kernel_init+0x147/0x1dc
       [<ffffffff8156eca4>] kernel_thread_helper+0x4/0x10

-> #0 (cpu_hotplug.lock){+.+.+.}:
       [<ffffffff810ae48e>] check_prev_add+0x3de/0x440
       [<ffffffff810aeb22>] validate_chain+0x632/0x720
       [<ffffffff810aef69>] __lock_acquire+0x359/0x580
       [<ffffffff810af2b1>] lock_acquire+0x121/0x190
       [<ffffffff8156130c>] __mutex_lock_common+0x5c/0x450
       [<ffffffff8156182e>] mutex_lock_nested+0x3e/0x50
       [<ffffffff81049287>] get_online_cpus+0x37/0x50
       [<ffffffff810f3a92>] _rcu_barrier+0x22/0x1f0
       [<ffffffff810f3c70>] rcu_barrier_sched+0x10/0x20
       [<ffffffff810f3c89>] rcu_barrier+0x9/0x10
       [<ffffffff81178201>] kmem_cache_destroy+0xd1/0xe0
       [<ffffffffa0488154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
       [<ffffffffa04881aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
       [<ffffffffa04892ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
       [<ffffffff81458629>] ops_exit_list+0x39/0x60
       [<ffffffff81458c5b>] cleanup_net+0xfb/0x1b0
       [<ffffffff810691eb>] process_one_work+0x26b/0x4c0
       [<ffffffff8106a03e>] worker_thread+0x12e/0x320
       [<ffffffff8106f86e>] kthread+0xde/0xf0
       [<ffffffff8156eca4>] kernel_thread_helper+0x4/0x10

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(slab_mutex);
                               lock(cpu_hotplug.lock);
                               lock(slab_mutex);
  lock(cpu_hotplug.lock);

 *** DEADLOCK ***

4 locks held by kworker/u:3/43:
 #0:  (netns){.+.+.+}, at: [<ffffffff81069122>] process_one_work+0x1a2/0x4c0
 #1:  (net_cleanup_work){+.+.+.}, at: [<ffffffff81069122>] process_one_work+0x1a2/0x4c0
 #2:  (net_mutex){+.+.+.}, at: [<ffffffff81458be0>] cleanup_net+0x80/0x1b0
 #3:  (slab_mutex){+.+.+.}, at: [<ffffffff81178175>] kmem_cache_destroy+0x45/0xe0

stack backtrace:
Pid: 43, comm: kworker/u:3 Not tainted 3.6.0-03888-g3f99f3b #145
Call Trace:
 [<ffffffff810ac5cf>] print_circular_bug+0x10f/0x120
 [<ffffffff810ae48e>] check_prev_add+0x3de/0x440
 [<ffffffff810aeb22>] validate_chain+0x632/0x720
 [<ffffffff810aef69>] __lock_acquire+0x359/0x580
 [<ffffffff810af2b1>] lock_acquire+0x121/0x190
 [<ffffffff81049287>] ? get_online_cpus+0x37/0x50
 [<ffffffff8156130c>] __mutex_lock_common+0x5c/0x450
 [<ffffffff81049287>] ? get_online_cpus+0x37/0x50
 [<ffffffff810ada40>] ? mark_held_locks+0x80/0x120
 [<ffffffff81049287>] ? get_online_cpus+0x37/0x50
 [<ffffffff8156182e>] mutex_lock_nested+0x3e/0x50
 [<ffffffff81049287>] get_online_cpus+0x37/0x50
 [<ffffffff810f3a92>] _rcu_barrier+0x22/0x1f0
 [<ffffffff810f3c70>] rcu_barrier_sched+0x10/0x20
 [<ffffffff810f3c89>] rcu_barrier+0x9/0x10
 [<ffffffff81178201>] kmem_cache_destroy+0xd1/0xe0
 [<ffffffffa0488154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
 [<ffffffffa04881aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
 [<ffffffffa04892ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
 [<ffffffff81458629>] ops_exit_list+0x39/0x60
 [<ffffffff81458c5b>] cleanup_net+0xfb/0x1b0
 [<ffffffff810691eb>] process_one_work+0x26b/0x4c0
 [<ffffffff81069122>] ? process_one_work+0x1a2/0x4c0
 [<ffffffff81069f69>] ? worker_thread+0x59/0x320
 [<ffffffff81458b60>] ? net_drop_ns+0x40/0x40
 [<ffffffff8106a03e>] worker_thread+0x12e/0x320
 [<ffffffff81069f10>] ? manage_workers+0x1a0/0x1a0
 [<ffffffff8106f86e>] kthread+0xde/0xf0
 [<ffffffff8156eca4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81564b33>] ? retint_restore_args+0x13/0x13
 [<ffffffff8106f790>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8156eca0>] ? gs_change+0x13/0x13

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 21:27   ` Jiri Kosina
@ 2012-10-02 21:49     ` Jiri Kosina
  2012-10-02 21:58       ` Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-02 21:49 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat

On Tue, 2 Oct 2012, Jiri Kosina wrote:

> > > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> > > commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > Date:   Thu Aug 2 17:43:50 2012 -0700
> > > 
> > >     rcu: Remove _rcu_barrier() dependency on __stop_machine()
> > >     
> > >     Currently, _rcu_barrier() relies on preempt_disable() to prevent
> > >     any CPU from going offline, which in turn depends on CPU hotplug's
> > >     use of __stop_machine().
> > >     
> > >     This patch therefore makes _rcu_barrier() use get_online_cpus() to
> > >     block CPU-hotplug operations.  This has the added benefit of removing
> > >     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> > >     operations are excluded, there can be no callbacks to adopt.  This
> > >     commit simplifies the code accordingly.
> > >     
> > >     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > >     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > > ==
> > > 
> > > is causing lockdep to complain (see the full trace below). I haven't yet 
> > > had time to analyze what exactly is happening, and probably will not have 
> > > time to do so until tomorrow, so just sending this as a heads-up in case 
> > > anyone sees the culprit immediately.
> > 
> > Hmmm...  Does the following patch help?  It swaps the order in which
> > rcu_barrier() acquires the hotplug and rcu_barrier locks.
> 
> It changed the report slightly (see for example the change in the possible
> unsafe locking scenario: rcu_sched_state.barrier_mutex vanished and it's
> now directly about cpu_hotplug.lock). With the patch applied I get
> 
> 
> 
> ======================================================
> [ INFO: possible circular locking dependency detected ]
> 3.6.0-03888-g3f99f3b #145 Not tainted

And it really seems valid. 

kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
introduces a slab_mutex -> cpu_hotplug.lock dependency (through 
rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).

On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
cpu_hotplug_begin(), and with this lock held the cpuup_callback() notifier 
gets called, which acquires slab_mutex. This gives the reverse dependency, 
i.e. the deadlock scenario is a valid one.

1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
before that, there was no slab_mutex -> cpu_hotplug.lock dependency.

Simply put, the commit causes get_online_cpus() to be called with 
slab_mutex held, which is invalid.
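
For illustration, the inversion boils down to this userspace analogue
(just a sketch of the AB-BA pattern; thread A stands in for
kmem_cache_destroy(), thread B for _cpu_up()):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t slab_mutex   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* A: kmem_cache_destroy() -> rcu_barrier() -> get_online_cpus() */
static void *cache_destroy(void *unused)
{
	pthread_mutex_lock(&slab_mutex);	/* mutex_lock(slab_mutex) */
	pthread_mutex_lock(&hotplug_lock);	/* get_online_cpus() */
	pthread_mutex_unlock(&hotplug_lock);
	pthread_mutex_unlock(&slab_mutex);
	return NULL;
}

/* B: _cpu_up() -> cpu_hotplug_begin() -> __cpu_notify() -> cpuup_callback() */
static void *cpu_up_path(void *unused)
{
	pthread_mutex_lock(&hotplug_lock);	/* cpu_hotplug_begin() */
	pthread_mutex_lock(&slab_mutex);	/* cpuup_callback() */
	pthread_mutex_unlock(&slab_mutex);
	pthread_mutex_unlock(&hotplug_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, cache_destroy, NULL);
	pthread_create(&b, NULL, cpu_up_path, NULL);
	pthread_join(a, NULL);	/* hangs if each thread got its first lock */
	pthread_join(b, NULL);
	puts("lucky interleaving, no deadlock this time");
	return 0;
}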

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 21:49     ` Jiri Kosina
@ 2012-10-02 21:58       ` Jiri Kosina
  2012-10-02 23:31         ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-02 21:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat

On Tue, 2 Oct 2012, Jiri Kosina wrote:

> > > > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> > > > commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > Date:   Thu Aug 2 17:43:50 2012 -0700
> > > > 
> > > >     rcu: Remove _rcu_barrier() dependency on __stop_machine()
> > > >     
> > > >     Currently, _rcu_barrier() relies on preempt_disable() to prevent
> > > >     any CPU from going offline, which in turn depends on CPU hotplug's
> > > >     use of __stop_machine().
> > > >     
> > > >     This patch therefore makes _rcu_barrier() use get_online_cpus() to
> > > >     block CPU-hotplug operations.  This has the added benefit of removing
> > > >     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> > > >     operations are excluded, there can be no callbacks to adopt.  This
> > > >     commit simplifies the code accordingly.
> > > >     
> > > >     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > >     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > > > ==
> > > > 
> > > > is causing lockdep to complain (see the full trace below). I haven't yet 
> > > > had time to analyze what exactly is happening, and probably will not have 
> > > > time to do so until tomorrow, so just sending this as a heads-up in case 
> > > > anyone sees the culprit immediately.
> > > 
> > > Hmmm...  Does the following patch help?  It swaps the order in which
> > > rcu_barrier() acquires the hotplug and rcu_barrier locks.
> > 
> > It changed the report slightly (see for example the change in the possible
> > unsafe locking scenario: rcu_sched_state.barrier_mutex vanished and it's
> > now directly about cpu_hotplug.lock). With the patch applied I get
> > 
> > 
> > 
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.6.0-03888-g3f99f3b #145 Not tainted
> 
> And it really seems valid. 
> 
> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
> introduces a slab_mutex -> cpu_hotplug.lock dependency (through 
> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
> 
> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
> cpu_hotplug_begin(), and with this lock held the cpuup_callback() notifier 
> gets called, which acquires slab_mutex. This gives the reverse dependency, 
> i.e. the deadlock scenario is a valid one.
> 
> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
> 
> Simply put, the commit causes get_online_cpus() to be called with 
> slab_mutex held, which is invalid.

Oh, and it seems to be actually triggering for real.

With HEAD being 974a847e00c, the machine suspends nicely. With 974a847e00c 
plus your patch, which changes the order in which rcu_barrier() acquires 
the hotplug and rcu_barrier locks, the machine hangs 100% reliably during 
suspend, which very likely is the deadlock described above.

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 20:39 ` Srivatsa S. Bhat
@ 2012-10-02 22:17   ` Jiri Kosina
  2012-10-03  3:35     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-02 22:17 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Paul E. McKenney

On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:

> I don't see how this circular locking dependency can occur. If you are using SLUB,
> kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
> using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
> which means it cannot run concurrently with a hotplug operation such as cpu_up(). So I'm
> rather puzzled at this lockdep splat.

I am using SLAB here.

I think the following scenario is entirely possible:


	CPU 0				CPU 1
	kmem_cache_destroy()
	mutex_lock(slab_mutex)
	 				_cpu_up()
					cpu_hotplug_begin()
					mutex_lock(cpu_hotplug.lock)
	rcu_barrier()
	_rcu_barrier()
	get_online_cpus()
	mutex_lock(cpu_hotplug.lock)
	 (blocks, CPU 1 has the mutex)
					__cpu_notify()
					mutex_lock(slab_mutex)

Deadlock.

Right?

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 21:58       ` Jiri Kosina
@ 2012-10-02 23:31         ` Paul E. McKenney
  2012-10-02 23:48           ` Jiri Kosina
  2012-10-03  3:59           ` Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Srivatsa S. Bhat
  0 siblings, 2 replies; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-02 23:31 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat

On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Jiri Kosina wrote:
> 
> > > > > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> > > > > commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> > > > > Author: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > > Date:   Thu Aug 2 17:43:50 2012 -0700
> > > > > 
> > > > >     rcu: Remove _rcu_barrier() dependency on __stop_machine()
> > > > >     
> > > > >     Currently, _rcu_barrier() relies on preempt_disable() to prevent
> > > > >     any CPU from going offline, which in turn depends on CPU hotplug's
> > > > >     use of __stop_machine().
> > > > >     
> > > > >     This patch therefore makes _rcu_barrier() use get_online_cpus() to
> > > > >     block CPU-hotplug operations.  This has the added benefit of removing
> > > > >     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> > > > >     operations are excluded, there can be no callbacks to adopt.  This
> > > > >     commit simplifies the code accordingly.
> > > > >     
> > > > >     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> > > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > >     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> > > > > ==
> > > > > 
> > > > > is causing lockdep to complain (see the full trace below). I haven't yet 
> > > > > had time to analyze what exactly is happening, and probably will not have 
> > > > > time to do so until tomorrow, so just sending this as a heads-up in case 
> > > > > anyone sees the culprit immediately.
> > > > 
> > > > Hmmm...  Does the following patch help?  It swaps the order in which
> > > > rcu_barrier() acquires the hotplug and rcu_barrier locks.
> > > 
> > > It changed the report slightly (see for example the change in the possible
> > > unsafe locking scenario: rcu_sched_state.barrier_mutex vanished and it's
> > > now directly about cpu_hotplug.lock). With the patch applied I get
> > > 
> > > 
> > > 
> > > ======================================================
> > > [ INFO: possible circular locking dependency detected ]
> > > 3.6.0-03888-g3f99f3b #145 Not tainted
> > 
> > And it really seems valid. 

Yep, it sure is.  I wasn't getting the full picture earlier, so please
accept my apologies for the bogus patch.

> > kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
> > introduces a slab_mutex -> cpu_hotplug.lock dependency (through 
> > rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
> > 
> > On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
> > cpu_hotplug_begin(), and with this lock held the cpuup_callback() notifier 
> > gets called, which acquires slab_mutex. This gives the reverse dependency, 
> > i.e. the deadlock scenario is a valid one.
> > 
> > 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
> > before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
> > 
> > Simply put, the commit causes get_online_cpus() to be called with 
> > slab_mutex held, which is invalid.
> 
> Oh, and it seems to be actually triggering for real.
> 
> With HEAD being 974a847e00c, the machine suspends nicely. With 974a847e00c 
> plus your patch, which changes the order in which rcu_barrier() acquires 
> the hotplug and rcu_barrier locks, the machine hangs 100% reliably during 
> suspend, which very likely is the deadlock described above.

Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
notifier, which doesn't sit so well with rcu_barrier() trying to exclude
CPU hotplug events.  I could go back to the old approach, but it is
significantly more complex.  I cannot say that I am all that happy
about anyone calling rcu_barrier() from a CPU hotplug notifier because
it doesn't help CPU hotplug latency, but that is a separate issue.

But the thing is that rcu_barrier()'s assumptions work just fine if either
(1) it excludes hotplug operations or (2) it is called from a hotplug
notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
is executing.  So the right way to resolve this seems to be to do the
get_online_cpus() only if rcu_barrier() is -not- executing in the context
of a hotplug notifier.  Should be fixable without too much hassle...

							Thanx, Paul



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 23:31         ` Paul E. McKenney
@ 2012-10-02 23:48           ` Jiri Kosina
  2012-10-03  0:15             ` Paul E. McKenney
  2012-10-03  3:59           ` Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Srivatsa S. Bhat
  1 sibling, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-02 23:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat

On Tue, 2 Oct 2012, Paul E. McKenney wrote:

> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
> notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
> CPU hotplug events.  I could go back to the old approach, but it is 
> significantly more complex.  I cannot say that I am all that happy about 
> anyone calling rcu_barrier() from a CPU hotplug notifier because it 
> doesn't help CPU hotplug latency, but that is a separate issue.
> 
> But the thing is that rcu_barrier()'s assumptions work just fine if either
> (1) it excludes hotplug operations or (2) it is called from a hotplug
> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> is executing.  So the right way to resolve this seems to be to do the
> get_online_cpus() only if rcu_barrier() is -not- executing in the context
> of a hotplug notifier.  Should be fixable without too much hassle...

Sorry, I don't think I understand what you are proposing just yet.

If I understand it correctly, you are proposing to introduce some magic 
into _rcu_barrier() such as (pseudocode of course):

	if (!being_called_from_hotplug_notifier_callback)
		get_online_cpus()

How does that protect from the scenario I've outlined before though?

	CPU 0                           CPU 1
	kmem_cache_destroy()
	mutex_lock(slab_mutex)
					_cpu_up()
					cpu_hotplug_begin()
					mutex_lock(cpu_hotplug.lock)
	rcu_barrier()
	_rcu_barrier()
	get_online_cpus()
	mutex_lock(cpu_hotplug.lock)
	 (blocks, CPU 1 has the mutex)
					__cpu_notify()
					mutex_lock(slab_mutex)	

CPU 0 grabs both locks anyway (it's not running from notifier callback). 
CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
from notifier callback either.

What did I miss?

Thanks,

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 23:48           ` Jiri Kosina
@ 2012-10-03  0:15             ` Paul E. McKenney
  2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-03  0:15 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat

On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> 
> > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
> > notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
> > CPU hotplug events.  I could go back to the old approach, but it is 
> > significantly more complex.  I cannot say that I am all that happy about 
> > anyone calling rcu_barrier() from a CPU hotplug notifier because it 
> > doesn't help CPU hotplug latency, but that is a separate issue.
> > 
> > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > (1) it excludes hotplug operations or (2) it is called from a hotplug
> > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > is executing.  So the right way to resolve this seems to be to do the
> > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > of a hotplug notifier.  Should be fixable without too much hassle...
> 
> Sorry, I don't think I understand what you are proposing just yet.
> 
> If I understand it correctly, you are proposing to introduce some magic 
> into _rcu_barrier() such as (pseudocode of course):
> 
> 	if (!being_called_from_hotplug_notifier_callback)
> 		get_online_cpus()
> 
> How does that protect from the scenario I've outlined before though?
> 
> 	CPU 0                           CPU 1
> 	kmem_cache_destroy()
> 	mutex_lock(slab_mutex)
> 					_cpu_up()
> 					cpu_hotplug_begin()
> 					mutex_lock(cpu_hotplug.lock)
> 	rcu_barrier()
> 	_rcu_barrier()
> 	get_online_cpus()
> 	mutex_lock(cpu_hotplug.lock)
> 	 (blocks, CPU 1 has the mutex)
> 					__cpu_notify()
> 					mutex_lock(slab_mutex)	
> 
> CPU 0 grabs both locks anyway (it's not running from notifier callback). 
> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
> from notifier callback either.
> 
> What did I miss?

You didn't miss anything, I was suffering a failure to read carefully.

So my next stupid question is "Why can't kmem_cache_destroy drop
slab_mutex early?" like the following:

	void kmem_cache_destroy(struct kmem_cache *cachep)
	{
		BUG_ON(!cachep || in_interrupt());

		/* Find the cache in the chain of caches. */
		get_online_cpus();
		mutex_lock(&slab_mutex);
		/*
		 * the chain is never empty, cache_cache is never destroyed
		 */
		list_del(&cachep->list);
		if (__cache_shrink(cachep)) {
			slab_error(cachep, "Can't free all objects");
			list_add(&cachep->list, &slab_caches);
			mutex_unlock(&slab_mutex);
			put_online_cpus();
			return;
		}
		mutex_unlock(&slab_mutex);

		if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
			rcu_barrier();

		__kmem_cache_destroy(cachep);
		put_online_cpus();
	}

Or did I miss some reason why __kmem_cache_destroy() needs that lock?
Looks to me like it is just freeing now-disconnected memory.

							Thanx, Paul



* [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
  2012-10-03  0:15             ` Paul E. McKenney
@ 2012-10-03  0:45               ` Jiri Kosina
  2012-10-03  3:41                 ` Paul E. McKenney
                                   ` (3 more replies)
  0 siblings, 4 replies; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03  0:45 UTC (permalink / raw)
  To: Paul E. McKenney, Christoph Lameter, Pekka Enberg
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat,
	linux-mm

On Tue, 2 Oct 2012, Paul E. McKenney wrote:

> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> > On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> > 
> > > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
> > > notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
> > > CPU hotplug events.  I could go back to the old approach, but it is 
> > > significantly more complex.  I cannot say that I am all that happy about 
> > > anyone calling rcu_barrier() from a CPU hotplug notifier because it 
> > > doesn't help CPU hotplug latency, but that is a separate issue.
> > > 
> > > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > > (1) it excludes hotplug operations or (2) it is called from a hotplug
> > > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > > is executing.  So the right way to resolve this seems to be to do the
> > > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > > of a hotplug notifier.  Should be fixable without too much hassle...
> > 
> > Sorry, I don't think I understand what you are proposing just yet.
> > 
> > If I understand it correctly, you are proposing to introduce some magic 
> > into _rcu_barrier() such as (pseudocode of course):
> > 
> > 	if (!being_called_from_hotplug_notifier_callback)
> > 		get_online_cpus()
> > 
> > How does that protect from the scenario I've outlined before though?
> > 
> > 	CPU 0                           CPU 1
> > 	kmem_cache_destroy()
> > 	mutex_lock(slab_mutex)
> > 					_cpu_up()
> > 					cpu_hotplug_begin()
> > 					mutex_lock(cpu_hotplug.lock)
> > 	rcu_barrier()
> > 	_rcu_barrier()
> > 	get_online_cpus()
> > 	mutex_lock(cpu_hotplug.lock)
> > 	 (blocks, CPU 1 has the mutex)
> > 					__cpu_notify()
> > 					mutex_lock(slab_mutex)	
> > 
> > CPU 0 grabs both locks anyway (it's not running from notifier callback). 
> > CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
> > from notifier callback either.
> > 
> > What did I miss?
> 
> You didn't miss anything, I was suffering a failure to read carefully.
> 
> So my next stupid question is "Why can't kmem_cache_destroy drop
> slab_mutex early?" like the following:
> 
> 	void kmem_cache_destroy(struct kmem_cache *cachep)
> 	{
> 		BUG_ON(!cachep || in_interrupt());
> 
> 		/* Find the cache in the chain of caches. */
> 		get_online_cpus();
> 		mutex_lock(&slab_mutex);
> 		/*
> 		 * the chain is never empty, cache_cache is never destroyed
> 		 */
> 		list_del(&cachep->list);
> 		if (__cache_shrink(cachep)) {
> 			slab_error(cachep, "Can't free all objects");
> 			list_add(&cachep->list, &slab_caches);
> 			mutex_unlock(&slab_mutex);
> 			put_online_cpus();
> 			return;
> 		}
> 		mutex_unlock(&slab_mutex);
> 
> 		if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> 			rcu_barrier();
> 
> 		__kmem_cache_destroy(cachep);
> 		put_online_cpus();
> 	}
> 
> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
> Looks to me like it is just freeing now-disconnected memory.

Good question. I believe it should be safe to drop slab_mutex earlier, as 
cachep has already been unlinked. I am adding slab people and linux-mm to 
CC (the whole thread on LKML can be found at 
https://lkml.org/lkml/2012/10/2/296 for reference).

How about the patch below? Pekka, Christoph, please?

It makes lockdep happy again, and obviously removes the deadlock (I 
tested it).



From: Jiri Kosina <jkosina@suse.cz>
Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()

Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().

This opens a possibility for deadlock:

	CPU 0				CPU 1
	kmem_cache_destroy()
	mutex_lock(slab_mutex)
					_cpu_up()
					cpu_hotplug_begin()
					mutex_lock(cpu_hotplug.lock)
	rcu_barrier()
	_rcu_barrier()
	get_online_cpus()
	mutex_lock(cpu_hotplug.lock)
	 (blocks, CPU 1 has the mutex)
					__cpu_notify()
					mutex_lock(slab_mutex)

It turns out that slab's kmem_cache_destroy() can safely release
slab_mutex before calling out to rcu_barrier(), as cachep has already
been unlinked.

This patch removes the AB-BA dependency by calling rcu_barrier() with 
slab_mutex already unlocked.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 mm/slab.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 1133911..693c7cb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
 		put_online_cpus();
 		return;
 	}
+	mutex_unlock(&slab_mutex);
 
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
 		rcu_barrier();
 
 	__kmem_cache_destroy(cachep);
-	mutex_unlock(&slab_mutex);
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 22:17   ` Jiri Kosina
@ 2012-10-03  3:35     ` Srivatsa S. Bhat
  2012-10-03  3:44       ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  3:35 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Paul E. McKenney

On 10/03/2012 03:47 AM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>> I don't see how this circular locking dependency can occur. If you are using SLUB,
>> kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
>> using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
>> which means it cannot run concurrently with a hotplug operation such as cpu_up(). So I'm
>> rather puzzled at this lockdep splat.
> 
> I am using SLAB here.
> 
> I think the following scenario is entirely possible:
> 
> 
> 	CPU 0				CPU 1
> 	kmem_cache_destroy()

What about the get_online_cpus() right here at CPU0 before
calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
on CPU1?? I still don't get it... :(

(kmem_cache_destroy() uses get/put_online_cpus() around acquiring
and releasing slab_mutex).
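
For reference, this is roughly what the two sides of the hotplug
exclusion look like in kernel/cpu.c (paraphrased from memory of the
3.6-era code, so details may differ):

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;
	mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;		/* readers only pin the count */
	mutex_unlock(&cpu_hotplug.lock);
}

static void cpu_hotplug_begin(void)
{
	cpu_hotplug.active_writer = current;
	for (;;) {
		mutex_lock(&cpu_hotplug.lock);
		if (likely(!cpu_hotplug.refcount))
			break;		/* exits with the mutex held */
		__set_current_state(TASK_UNINTERRUPTIBLE);
		mutex_unlock(&cpu_hotplug.lock);
		schedule();
	}
}

So if CPU0 really had done its get_online_cpus() before CPU1 entered
cpu_hotplug_begin(), CPU1 would be stuck waiting for the refcount to
drop, and the scenario could not play out as drawn.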

Regards,
Srivatsa S. Bhat

> 	mutex_lock(slab_mutex)
> 	 				_cpu_up()
> 					cpu_hotplug_begin()
> 					mutex_lock(cpu_hotplug.lock)
> 	rcu_barrier()
> 	_rcu_barrier()
> 	get_online_cpus()
> 	mutex_lock(cpu_hotplug.lock)
> 	 (blocks, CPU 1 has the mutex)
> 					__cpu_notify()
> 					mutex_lock(slab_mutex)
> 
> Deadlock.
> 
> Right?
> 




* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
  2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
@ 2012-10-03  3:41                 ` Paul E. McKenney
  2012-10-03  3:50                 ` Srivatsa S. Bhat
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-03  3:41 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Christoph Lameter, Pekka Enberg, Paul E. McKenney, Josh Triplett,
	linux-kernel, Srivatsa S. Bhat, linux-mm

On Wed, Oct 03, 2012 at 02:45:30AM +0200, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> 
> > On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
> > > On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> > > 
> > > > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
> > > > notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
> > > > CPU hotplug events.  I could go back to the old approach, but it is 
> > > > significantly more complex.  I cannot say that I am all that happy about 
> > > > anyone calling rcu_barrier() from a CPU hotplug notifier because it 
> > > > doesn't help CPU hotplug latency, but that is a separate issue.
> > > > 
> > > > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > > > (1) it excludes hotplug operations or (2) it is called from a hotplug
> > > > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > > > is executing.  So the right way to resolve this seems to be to do the
> > > > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > > > of a hotplug notifier.  Should be fixable without too much hassle...
> > > 
> > > Sorry, I don't think I understand what you are proposing just yet.
> > > 
> > > If I understand it correctly, you are proposing to introduce some magic 
> > > into _rcu_barrier() such as (pseudocode of course):
> > > 
> > > 	if (!being_called_from_hotplug_notifier_callback)
> > > 		get_online_cpus()
> > > 
> > > How does that protect from the scenario I've outlined before though?
> > > 
> > > 	CPU 0                           CPU 1
> > > 	kmem_cache_destroy()
> > > 	mutex_lock(slab_mutex)
> > > 					_cpu_up()
> > > 					cpu_hotplug_begin()
> > > 					mutex_lock(cpu_hotplug.lock)
> > > 	rcu_barrier()
> > > 	_rcu_barrier()
> > > 	get_online_cpus()
> > > 	mutex_lock(cpu_hotplug.lock)
> > > 	 (blocks, CPU 1 has the mutex)
> > > 					__cpu_notify()
> > > 					mutex_lock(slab_mutex)	
> > > 
> > > CPU 0 grabs both locks anyway (it's not running from notifier callback). 
> > > CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
> > > from notifier callback either.
> > > 
> > > What did I miss?
> > 
> > You didn't miss anything, I was suffering a failure to read carefully.
> > 
> > So my next stupid question is "Why can't kmem_cache_destroy drop
> > slab_mutex early?" like the following:
> > 
> > 	void kmem_cache_destroy(struct kmem_cache *cachep)
> > 	{
> > 		BUG_ON(!cachep || in_interrupt());
> > 
> > 		/* Find the cache in the chain of caches. */
> > 		get_online_cpus();
> > 		mutex_lock(&slab_mutex);
> > 		/*
> > 		 * the chain is never empty, cache_cache is never destroyed
> > 		 */
> > 		list_del(&cachep->list);
> > 		if (__cache_shrink(cachep)) {
> > 			slab_error(cachep, "Can't free all objects");
> > 			list_add(&cachep->list, &slab_caches);
> > 			mutex_unlock(&slab_mutex);
> > 			put_online_cpus();
> > 			return;
> > 		}
> > 		mutex_unlock(&slab_mutex);
> > 
> > 		if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> > 			rcu_barrier();
> > 
> > 		__kmem_cache_destroy(cachep);
> > 		put_online_cpus();
> > 	}
> > 
> > Or did I miss some reason why __kmem_cache_destroy() needs that lock?
> > Looks to me like it is just freeing now-disconnected memory.
> 
> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> cachep has already been unlinked. I am adding slab people and linux-mm to 
> CC (the whole thread on LKML can be found at 
> https://lkml.org/lkml/2012/10/2/296 for reference).
> 
> How about the patch below? Pekka, Christoph, please?
> 
> It makes the lockdep happy again, and obviously removes the deadlock (I 
> tested it).

You can certainly add my Reviewed-by, for whatever that is worth.  ;-)

BTW, with this patch, are you able to dispense with my earlier patch,
or is it still needed?

							Thanx, Paul

> From: Jiri Kosina <jkosina@suse.cz>
> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> This opens a possibility for deadlock:
> 
>         CPU 0                           CPU 1
> 	        kmem_cache_destroy()
> 	        mutex_lock(slab_mutex)
> 	                                        _cpu_up()
> 	                                        cpu_hotplug_begin()
> 	                                        mutex_lock(cpu_hotplug.lock)
> 	        rcu_barrier()
> 	        _rcu_barrier()
> 	        get_online_cpus()
> 	        mutex_lock(cpu_hotplug.lock)
> 	         (blocks, CPU 1 has the mutex)
> 	                                        __cpu_notify()
> 	                                        mutex_lock(slab_mutex)
> 
> It turns out that slab's kmem_cache_destroy() can safely release slab_mutex
> before calling out to rcu_barrier(), as cachep has already been
> unlinked.
> 
> This patch removes the AB-BA dependency by calling rcu_barrier() with 
> slab_mutex already unlocked.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
>  mm/slab.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 1133911..693c7cb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
>  		put_online_cpus();
>  		return;
>  	}
> +	mutex_unlock(&slab_mutex);
> 
>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>  		rcu_barrier();
> 
>  	__kmem_cache_destroy(cachep);
> -	mutex_unlock(&slab_mutex);
>  	put_online_cpus();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
> 
> -- 
> Jiri Kosina
> SUSE Labs
> 



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  3:35     ` Srivatsa S. Bhat
@ 2012-10-03  3:44       ` Paul E. McKenney
  2012-10-03  4:04         ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-03  3:44 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Kosina, Paul E. McKenney, Josh Triplett, linux-kernel

On Wed, Oct 03, 2012 at 09:05:31AM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:47 AM, Jiri Kosina wrote:
> > On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> > 
> >> I don't see how this circular locking dependency can occur.. If you are using SLUB,
> >> kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
> >> using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
> >> which means, it cannot run concurrently with a hotplug operation such as cpu_up(). So, I'm
> >> rather puzzled at this lockdep splat..
> > 
> > I am using SLAB here.
> > 
> > The scenario I think is very well possible:
> > 
> > 
> > 	CPU 0				CPU 1
> > 	kmem_cache_destroy()
> 
> What about the get_online_cpus() right here at CPU0 before
> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
> on CPU1?? I still don't get it... :(
> 
> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
> and releasing slab_mutex).

The problem is that there is a CPU-hotplug notifier for slab, which
establishes hotplug->slab.  Then having kmem_cache_destroy() call
rcu_barrier() under the lock establishes slab->hotplug, which results
in deadlock.  Jiri really did explain this in an earlier email
message, but both of us managed to miss it.  ;-)
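
A minimal userspace sketch of that inversion may help (illustrative only,
not kernel code; pthread mutexes stand in for slab_mutex and
cpu_hotplug.lock, and the function names are made up for the example):

#include <pthread.h>

static pthread_mutex_t slab = PTHREAD_MUTEX_INITIALIZER;    /* slab_mutex */
static pthread_mutex_t hotplug = PTHREAD_MUTEX_INITIALIZER; /* cpu_hotplug.lock */

/* CPU-hotplug path: cpu_hotplug_begin() -> __cpu_notify() -> slab notifier */
static void *hotplug_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&hotplug);	/* establishes hotplug -> slab */
	pthread_mutex_lock(&slab);
	pthread_mutex_unlock(&slab);
	pthread_mutex_unlock(&hotplug);
	return NULL;
}

/* kmem_cache_destroy() path: slab_mutex -> rcu_barrier() -> get_online_cpus() */
static void *destroy_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&slab);	/* establishes slab -> hotplug */
	pthread_mutex_lock(&hotplug);
	pthread_mutex_unlock(&hotplug);
	pthread_mutex_unlock(&slab);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* With unlucky timing each thread ends up holding one mutex and
	 * waiting forever for the other: the classic AB-BA deadlock. */
	pthread_create(&a, NULL, hotplug_path, NULL);
	pthread_create(&b, NULL, destroy_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}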

							Thanx, Paul

> Regards,
> Srivatsa S. Bhat
> 
> > 	mutex_lock(slab_mutex)
> > 	 				_cpu_up()
> > 					cpu_hotplug_begin()
> > 					mutex_lock(cpu_hotplug.lock)
> > 	rcu_barrier()
> > 	_rcu_barrier()
> > 	get_online_cpus()
> > 	mutex_lock(cpu_hotplug.lock)
> > 	 (blocks, CPU 1 has the mutex)
> > 					__cpu_notify()
> > 					mutex_lock(slab_mutex)
> > 
> > Deadlock.
> > 
> > Right?
> > 
> 
> 



* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
  2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
  2012-10-03  3:41                 ` Paul E. McKenney
@ 2012-10-03  3:50                 ` Srivatsa S. Bhat
  2012-10-03  6:08                   ` Srivatsa S. Bhat
  2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
  2012-10-03 14:15                 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
  3 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  3:50 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 06:15 AM, Jiri Kosina wrote:
> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
> 
>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>
>>>> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
>>>> CPU hotplug events.  I could go back to the old approach, but it is 
>>>> significantly more complex.  I cannot say that I am all that happy about 
>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it 
>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>
>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
>>>> is executing.  So the right way to resolve this seems to be to do the
>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>> of a hotplug notifier.  Should be fixable without too much hassle...
>>>
>>> Sorry, I don't think I understand what you are proposing just yet.
>>>
>>> If I understand it correctly, you are proposing to introduce some magic 
>>> into _rcu_barrier() such as (pseudocode of course):
>>>
>>> 	if (!being_called_from_hotplug_notifier_callback)
>>> 		get_online_cpus()
>>>
>>> How does that protect from the scenario I've outlined before though?
>>>
>>> 	CPU 0                           CPU 1
>>> 	kmem_cache_destroy()
>>> 	mutex_lock(slab_mutex)
>>> 					_cpu_up()
>>> 					cpu_hotplug_begin()
>>> 					mutex_lock(cpu_hotplug.lock)
>>> 	rcu_barrier()
>>> 	_rcu_barrier()
>>> 	get_online_cpus()
>>> 	mutex_lock(cpu_hotplug.lock)
>>> 	 (blocks, CPU 1 has the mutex)
>>> 					__cpu_notify()
>>> 					mutex_lock(slab_mutex)	
>>>
>>> CPU 0 grabs both locks anyway (it's not running from notifier callback). 
>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
>>> from notifier callback either.
>>>
>>> What did I miss?
>>
>> You didn't miss anything, I was suffering a failure to read carefully.
>>
>> So my next stupid question is "Why can't kmem_cache_destroy drop
>> slab_mutex early?" like the following:
>>
>> 	void kmem_cache_destroy(struct kmem_cache *cachep)
>> 	{
>> 		BUG_ON(!cachep || in_interrupt());
>>
>> 		/* Find the cache in the chain of caches. */
>> 		get_online_cpus();
>> 		mutex_lock(&slab_mutex);
>> 		/*
>> 		 * the chain is never empty, cache_cache is never destroyed
>> 		 */
>> 		list_del(&cachep->list);
>> 		if (__cache_shrink(cachep)) {
>> 			slab_error(cachep, "Can't free all objects");
>> 			list_add(&cachep->list, &slab_caches);
>> 			mutex_unlock(&slab_mutex);
>> 			put_online_cpus();
>> 			return;
>> 		}
>> 		mutex_unlock(&slab_mutex);
>>
>> 		if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>> 			rcu_barrier();
>>
>> 		__kmem_cache_destroy(cachep);
>> 		put_online_cpus();
>> 	}
>>
>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>> Looks to me like it is just freeing now-disconnected memory.
> 
> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> cachep has already been unlinked. I am adding slab people and linux-mm to 
> CC (the whole thread on LKML can be found at 
> https://lkml.org/lkml/2012/10/2/296 for reference).
> 
> How about the patch below? Pekka, Christoph, please?
> 
> It makes the lockdep happy again, and obviously removes the deadlock (I 
> tested it).
> 
> 
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> This opens a possibility for deadlock:
> 
>         CPU 0                           CPU 1
> 	        kmem_cache_destroy()
> 	        mutex_lock(slab_mutex)
> 	                                        _cpu_up()
> 	                                        cpu_hotplug_begin()
> 	                                        mutex_lock(cpu_hotplug.lock)
> 	        rcu_barrier()
> 	        _rcu_barrier()
> 	        get_online_cpus()
> 	        mutex_lock(cpu_hotplug.lock)
> 	         (blocks, CPU 1 has the mutex)
> 	                                        __cpu_notify()
> 	                                        mutex_lock(slab_mutex)

Hmm.. no, this should *never* happen IMHO!

If I am seeing the code right, kmem_cache_destroy() wraps its entire content
inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
or cpu_down(). Are we really hitting a corner case where the refcounting logic
in get/put_online_cpus() is failing and allowing a hotplug writer to run in
parallel with a hotplug reader? If yes, *that* is the problem we have to fix..

Regards,
Srivatsa S. Bhat

> 
> It turns out that slab's kmem_cache_destroy() can safely release slab_mutex
> before calling out to rcu_barrier(), as cachep has already been
> unlinked.
> 
> This patch removes the AB-BA dependency by calling rcu_barrier() with 
> slab_mutex already unlocked.
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
>  mm/slab.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 1133911..693c7cb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
>  		put_online_cpus();
>  		return;
>  	}
> +	mutex_unlock(&slab_mutex);
> 
>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>  		rcu_barrier();
> 
>  	__kmem_cache_destroy(cachep);
> -	mutex_unlock(&slab_mutex);
>  	put_online_cpus();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
> 


-- 
Regards,
Srivatsa S. Bhat



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-02 23:31         ` Paul E. McKenney
  2012-10-02 23:48           ` Jiri Kosina
@ 2012-10-03  3:59           ` Srivatsa S. Bhat
  2012-10-03  4:07             ` Paul E. McKenney
  1 sibling, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  3:59 UTC (permalink / raw)
  To: paulmck; +Cc: Jiri Kosina, Paul E. McKenney, Josh Triplett, linux-kernel

On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
> On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
>> On Tue, 2 Oct 2012, Jiri Kosina wrote:
>>
>>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
>>>>>> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
>>>>>> Author: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>> Date:   Thu Aug 2 17:43:50 2012 -0700
>>>>>>
>>>>>>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
>>>>>>     
>>>>>>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
>>>>>>     any CPU from going offline, which in turn depends on CPU hotplug's
>>>>>>     use of __stop_machine().
>>>>>>     
>>>>>>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
>>>>>>     block CPU-hotplug operations.  This has the added benefit of removing
>>>>>>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>>>>>>     operations are excluded, there can be no callbacks to adopt.  This
>>>>>>     commit simplifies the code accordingly.
>>>>>>     
>>>>>>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>>>>>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
>>>>>> ==
>>>>>>
>>>>>> is causing lockdep to complain (see the full trace below). I haven't yet 
>>>>>> had time to analyze what exactly is happening, and probably will not have 
>>>>>> time to do so until tomorrow, so just sending this as a heads-up in case 
>>>>>> anyone sees the culprit immediately.
>>>>>
>>>>> Hmmm...  Does the following patch help?  It swaps the order in which
>>>>> rcu_barrier() acquires the hotplug and rcu_barrier locks.
>>>>
>>>> It changed the report slightly (see for example the change in possible 
>>>> unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
>>>> now directly about cpu_hotplug.lock). With the patch applied I get
>>>>
>>>>
>>>>
>>>> ======================================================
>>>> [ INFO: possible circular locking dependency detected ]
>>>> 3.6.0-03888-g3f99f3b #145 Not tainted
>>>
>>> And it really seems valid. 
> 
> Yep, it sure is.  I wasn't getting the full picture earlier, so please
> accept my apologies for the bogus patch.
> 
>>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
>>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
>>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
>>>
>>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
>>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
>>> gets called, which acquires slab_mutex. This gives the reverse dependency, 
>>> i.e. deadlock scenario is valid one.
>>>
>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
>>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
>>>
>>> Simply put, the commit causes get_online_cpus() to be called with 
>>> slab_mutex held, which is invalid.
>>
>> Oh, and it seems to be actually triggering in real.
>>
>> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
>> your patch, changing the order in which rcu_barrier() acquires hotplug and 
>> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
>> very likely actually is the deadlock described above.
> 
> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> CPU hotplug events.

Why not? IMHO it should have been perfectly fine! See below...

>  I could go back to the old approach, but it is
> significantly more complex.  I cannot say that I am all that happy
> about anyone calling rcu_barrier() from a CPU hotplug notifier because
> it doesn't help CPU hotplug latency, but that is a separate issue.
> 
> But the thing is that rcu_barrier()'s assumptions work just fine if either
> (1) it excludes hotplug operations or (2) if it is called from a hotplug
> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> is executing.  So the right way to resolve this seems to be to do the
> get_online_cpus() only if rcu_barrier() is -not- executing in the context
> of a hotplug notifier.  Should be fixable without too much hassle...
> 

The thing is, get_online_cpus() is smart: it *knows* when you are calling
it in a hotplug-writer, IOW, when you are in a hotplug notifier.

The relevant code is:

void get_online_cpus(void)
{
        might_sleep();
        if (cpu_hotplug.active_writer == current)
                return;
	....
}

So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
notifier should pose no problem at all!

Regards,
Srivatsa S. Bhat



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  3:44       ` Paul E. McKenney
@ 2012-10-03  4:04         ` Srivatsa S. Bhat
  2012-10-03  7:43           ` Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  4:04 UTC (permalink / raw)
  To: paulmck; +Cc: Jiri Kosina, Paul E. McKenney, Josh Triplett, linux-kernel

On 10/03/2012 09:14 AM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 09:05:31AM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 03:47 AM, Jiri Kosina wrote:
>>> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
>>>
>>>> I don't see how this circular locking dependency can occur.. If you are using SLUB,
>>>> kmem_cache_destroy() releases slab_mutex before it calls rcu_barrier(). If you are
>>>> using SLAB, kmem_cache_destroy() wraps its whole operation inside get/put_online_cpus(),
>>>> which means, it cannot run concurrently with a hotplug operation such as cpu_up(). So, I'm
>>>> rather puzzled at this lockdep splat..
>>>
>>> I am using SLAB here.
>>>
>>> The scenario I think is very well possible:
>>>
>>>
>>> 	CPU 0				CPU 1
>>> 	kmem_cache_destroy()
>>
>> What about the get_online_cpus() right here at CPU0 before
>> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
>> on CPU1?? I still don't get it... :(
>>
>> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
>> and releasing slab_mutex).
> 
> The problem is that there is a CPU-hotplug notifier for slab, which
> establishes hotplug->slab.

Agreed.

>  Then having kmem_cache_destroy() call
> rcu_barrier() under the lock

Ah, that's where I disagree. kmem_cache_destroy() *cannot* proceed at
this point in time, because it has invoked get_online_cpus()! It simply
cannot be running past that point in the presence of a running hotplug
notifier! So, kmem_cache_destroy() should have been sleeping on the
hotplug lock, waiting for the notifier to release it, no?

> establishes slab->hotplug, which results
> in deadlock.  Jiri really did explain this in an earlier email
> message, but both of us managed to miss it.  ;-)
> 

Maybe I'm just being blind, sorry! ;-)

Regards,
Srivatsa S. Bhat

> 							Thanx, Paul
> 
>> Regards,
>> Srivatsa S. Bhat
>>
>>> 	mutex_lock(slab_mutex)
>>> 	 				_cpu_up()
>>> 					cpu_hotplug_begin()
>>> 					mutex_lock(cpu_hotplug.lock)
>>> 	rcu_barrier()
>>> 	_rcu_barrier()
>>> 	get_online_cpus()
>>> 	mutex_lock(cpu_hotplug.lock)
>>> 	 (blocks, CPU 1 has the mutex)
>>> 					__cpu_notify()
>>> 					mutex_lock(slab_mutex)
>>>
>>> Deadlock.
>>>
>>> Right?
>>>
>>
>>



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  3:59           ` Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Srivatsa S. Bhat
@ 2012-10-03  4:07             ` Paul E. McKenney
  2012-10-03  4:15               ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-03  4:07 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Kosina, Paul E. McKenney, Josh Triplett, linux-kernel

On Wed, Oct 03, 2012 at 09:29:01AM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
> > On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
> >> On Tue, 2 Oct 2012, Jiri Kosina wrote:
> >>
> >>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
> >>>>>> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
> >>>>>> Author: Paul E. McKenney <paul.mckenney@linaro.org>
> >>>>>> Date:   Thu Aug 2 17:43:50 2012 -0700
> >>>>>>
> >>>>>>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
> >>>>>>     
> >>>>>>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
> >>>>>>     any CPU from going offline, which in turn depends on CPU hotplug's
> >>>>>>     use of __stop_machine().
> >>>>>>     
> >>>>>>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
> >>>>>>     block CPU-hotplug operations.  This has the added benefit of removing
> >>>>>>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
> >>>>>>     operations are excluded, there can be no callbacks to adopt.  This
> >>>>>>     commit simplifies the code accordingly.
> >>>>>>     
> >>>>>>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
> >>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >>>>>>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
> >>>>>> ==
> >>>>>>
> >>>>>> is causing lockdep to complain (see the full trace below). I haven't yet 
> >>>>>> had time to analyze what exactly is happening, and probably will not have 
> >>>>>> time to do so until tomorrow, so just sending this as a heads-up in case 
> >>>>>> anyone sees the culprit immediately.
> >>>>>
> >>>>> Hmmm...  Does the following patch help?  It swaps the order in which
> >>>>> rcu_barrier() acquires the hotplug and rcu_barrier locks.
> >>>>
> >>>> It changed the report slightly (see for example the change in possible 
> >>>> unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
> >>>> now directly about cpu_hotplug.lock). With the patch applied I get
> >>>>
> >>>>
> >>>>
> >>>> ======================================================
> >>>> [ INFO: possible circular locking dependency detected ]
> >>>> 3.6.0-03888-g3f99f3b #145 Not tainted
> >>>
> >>> And it really seems valid. 
> > 
> > Yep, it sure is.  I wasn't getting the full picture earlier, so please
> > accept my apologies for the bogus patch.
> > 
> >>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
> >>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
> >>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
> >>>
> >>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
> >>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
> >>> gets called, which acquires slab_mutex. This gives the reverse dependency, 
> >>> i.e. deadlock scenario is valid one.
> >>>
> >>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
> >>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
> >>>
> >>> Simply put, the commit causes get_online_cpus() to be called with 
> >>> slab_mutex held, which is invalid.
> >>
> >> Oh, and it seems to be actually triggering in real.
> >>
> >> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
> >> your patch, changing the order in which rcu_barrier() acquires hotplug and 
> >> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
> >> very likely actually is the deadlock described above.
> > 
> > Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
> > notifier, which doesn't sit so well with rcu_barrier() trying to exclude
> > CPU hotplug events.
> 
> Why not? IMHO it should have been perfectly fine! See below...
> 
> >  I could go back to the old approach, but it is
> > significantly more complex.  I cannot say that I am all that happy
> > about anyone calling rcu_barrier() from a CPU hotplug notifier because
> > it doesn't help CPU hotplug latency, but that is a separate issue.
> > 
> > But the thing is that rcu_barrier()'s assumptions work just fine if either
> > (1) it excludes hotplug operations or (2) if it is called from a hotplug
> > notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
> > is executing.  So the right way to resolve this seems to be to do the
> > get_online_cpus() only if rcu_barrier() is -not- executing in the context
> > of a hotplug notifier.  Should be fixable without too much hassle...
> > 
> 
> The thing is, get_online_cpus() is smart: it *knows* when you are calling
> it in a hotplug-writer, IOW, when you are in a hotplug notifier.
> 
> The relevant code is:
> 
> void get_online_cpus(void)
> {
>         might_sleep();
>         if (cpu_hotplug.active_writer == current)
>                 return;
> 	....
> }
> 
> So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
> notifier should pose no problem at all!

Indeed, that was my confusion.  The deadlock can happen with
the slab CPU-hotplug notifier (without calling rcu_barrier()!), which
establishes hotplug->slab.  Then some other unrelated thread calls
kmem_cache_destroy(), which acquires slab and then calls rcu_barrier(),
which acquires hotplug.  So the deadlock can happen independently of
rcu_barrier() being called from a CPU-hotplug notifier.

Making kmem_cache_destroy() release slab before calling rcu_barrier()
seems to clear things up for Jiri, but we need Pekka's or Christoph
Lameter's view on whether this is really safe.  (It looks safe to
both Jiri and me, but...)

							Thanx, Paul



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  4:07             ` Paul E. McKenney
@ 2012-10-03  4:15               ` Srivatsa S. Bhat
  0 siblings, 0 replies; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  4:15 UTC (permalink / raw)
  To: paulmck; +Cc: Jiri Kosina, Paul E. McKenney, Josh Triplett, linux-kernel

On 10/03/2012 09:37 AM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 09:29:01AM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 05:01 AM, Paul E. McKenney wrote:
>>> On Tue, Oct 02, 2012 at 11:58:36PM +0200, Jiri Kosina wrote:
>>>> On Tue, 2 Oct 2012, Jiri Kosina wrote:
>>>>
>>>>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is the first bad commit
>>>>>>>> commit 1331e7a1bbe1f11b19c4327ba0853bee2a606543
>>>>>>>> Author: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>>>> Date:   Thu Aug 2 17:43:50 2012 -0700
>>>>>>>>
>>>>>>>>     rcu: Remove _rcu_barrier() dependency on __stop_machine()
>>>>>>>>     
>>>>>>>>     Currently, _rcu_barrier() relies on preempt_disable() to prevent
>>>>>>>>     any CPU from going offline, which in turn depends on CPU hotplug's
>>>>>>>>     use of __stop_machine().
>>>>>>>>     
>>>>>>>>     This patch therefore makes _rcu_barrier() use get_online_cpus() to
>>>>>>>>     block CPU-hotplug operations.  This has the added benefit of removing
>>>>>>>>     the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
>>>>>>>>     operations are excluded, there can be no callbacks to adopt.  This
>>>>>>>>     commit simplifies the code accordingly.
>>>>>>>>     
>>>>>>>>     Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
>>>>>>>>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>>>>>>>     Reviewed-by: Josh Triplett <josh@joshtriplett.org>
>>>>>>>> ==
>>>>>>>>
>>>>>>>> is causing lockdep to complain (see the full trace below). I haven't yet 
>>>>>>>> had time to analyze what exactly is happening, and probably will not have 
>>>>>>>> time to do so until tomorrow, so just sending this as a heads-up in case 
>>>>>>>> anyone sees the culprit immediately.
>>>>>>>
>>>>>>> Hmmm...  Does the following patch help?  It swaps the order in which
>>>>>>> rcu_barrier() acquires the hotplug and rcu_barrier locks.
>>>>>>
>>>>>> It changed the report slightly (see for example the change in possible 
>>>>>> unsafe locking scenario, rcu_sched_state.barrier_mutex vanished and it's 
>>>>>> now directly about cpu_hotplug.lock). With the patch applied I get
>>>>>>
>>>>>>
>>>>>>
>>>>>> ======================================================
>>>>>> [ INFO: possible circular locking dependency detected ]
>>>>>> 3.6.0-03888-g3f99f3b #145 Not tainted
>>>>>
>>>>> And it really seems valid. 
>>>
>>> Yep, it sure is.  I wasn't getting the full picture earlier, so please
>>> accept my apologies for the bogus patch.
>>>
>>>>> kmem_cache_destroy() calls rcu_barrier() with slab_mutex locked, which 
>>>>> introduces slab_mutex -> cpu_hotplug.lock dependency (through 
>>>>> rcu_barrier() -> _rcu_barrier() -> get_online_cpus()).
>>>>>
>>>>> On the other hand, _cpu_up() acquires cpu_hotplug.lock through 
>>>>> cpu_hotplug_begin(), and with this lock held cpuup_callback() notifier 
>>>>> gets called, which acquires slab_mutex. This gives the reverse dependency, 
>>>>> i.e. deadlock scenario is valid one.
>>>>>
>>>>> 1331e7a1bbe1f11b19c4327ba0853bee2a606543 is triggering this, because 
>>>>> before that, there was no slab_mutex -> cpu_hotplug.lock dependency.
>>>>>
>>>>> Simply put, the commit causes get_online_cpus() to be called with 
>>>>> slab_mutex held, which is invalid.
>>>>
>>>> Oh, and it seems to be actually triggering in real.
>>>>
>>>> With HEAD being 974a847e00c, machine suspends nicely. With 974a847e00c + 
>>>> your patch, changing the order in which rcu_barrier() acquires hotplug and 
>>>> rcu_barrier locks, the machine hangs 100% reliably during suspend, which 
>>>> very likely actually is the deadlock described above.
>>>
>>> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug
>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude
>>> CPU hotplug events.
>>
>> Why not? IMHO it should have been perfectly fine! See below...
>>
>>>  I could go back to the old approach, but it is
>>> significantly more complex.  I cannot say that I am all that happy
>>> about anyone calling rcu_barrier() from a CPU hotplug notifier because
>>> it doesn't help CPU hotplug latency, but that is a separate issue.
>>>
>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
>>> is executing.  So the right way to resolve this seems to be to do the
>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>> of a hotplug notifier.  Should be fixable without too much hassle...
>>>
>>
>> The thing is, get_online_cpus() is smart: it *knows* when you are calling
>> it in a hotplug-writer, IOW, when you are in a hotplug notifier.
>>
>> The relevant code is:
>>
>> void get_online_cpus(void)
>> {
>>         might_sleep();
>>         if (cpu_hotplug.active_writer == current)
>>                 return;
>> 	....
>> }
>>
>> So calling rcu_barrier() (and hence get_online_cpus()) from within a hotplug
>> notifier should pose no problem at all!
> 
> Indeed, that was my confusion.  The deadlock can happen with
> the slab CPU-hotplug notifier (without calling rcu_barrier()!), which
> establishes hotplug->slab.  Then some other unrelated thread calls
> kmem_cache_destroy(), which acquires slab and then calls rcu_barrier(),
> which acquires hotplug.  So the deadlock can happen independently of
> rcu_barrier() being called from a CPU-hotplug notifier.
> 

Right, this is exactly what I thought yesterday. I had drafted a mail explaining
why the length of the circular locking dependency is really 2 but not 3 and
why the rcu_barrier() (barrier_mutex) is only aggravating a problem that is
there even without using rcu_barrier() at all. But then I stopped short of posting
it when I noticed the get/put_online_cpus() in kmem_cache_destroy() which really
looked puzzling to me. I (still) can't get myself to believe that kmem_cache_destroy()
could go beyond its get_online_cpus() and call rcu_barrier() at all, in the
presence of a concurrent CPU hotplug notifier!

Regards,
Srivatsa S. Bhat

> Making kmem_cache_destroy() release slab before calling rcu_barrier()
> seems to clear things up for Jiri, but we need Pekka's or Christoph
> Lameter's view on whether this is really safe.  (It looks safe to
> both Jiri and me, but...)
> 
> 							Thanx, Paul
> 



* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
  2012-10-03  3:50                 ` Srivatsa S. Bhat
@ 2012-10-03  6:08                   ` Srivatsa S. Bhat
  2012-10-03  8:21                     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  6:08 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 09:20 AM, Srivatsa S. Bhat wrote:
> On 10/03/2012 06:15 AM, Jiri Kosina wrote:
>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>
>>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>>
>>>>> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
>>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
>>>>> CPU hotplug events.  I could go back to the old approach, but it is 
>>>>> significantly more complex.  I cannot say that I am all that happy about 
>>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it 
>>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>>
>>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>>> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
>>>>> is executing.  So the right way to resolve this seems to be to do the
>>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>>> of a hotplug notifier.  Should be fixable without too much hassle...
>>>>
>>>> Sorry, I don't think I understand what you are proposing just yet.
>>>>
>>>> If I understand it correctly, you are proposing to introduce some magic 
>>>> into _rcu_barrier() such as (pseudocode of course):
>>>>
>>>> 	if (!being_called_from_hotplug_notifier_callback)
>>>> 		get_online_cpus()
>>>>
>>>> How does that protect from the scenario I've outlined before though?
>>>>
>>>> 	CPU 0                           CPU 1
>>>> 	kmem_cache_destroy()
>>>> 	mutex_lock(slab_mutex)
>>>> 					_cpu_up()
>>>> 					cpu_hotplug_begin()
>>>> 					mutex_lock(cpu_hotplug.lock)
>>>> 	rcu_barrier()
>>>> 	_rcu_barrier()
>>>> 	get_online_cpus()
>>>> 	mutex_lock(cpu_hotplug.lock)
>>>> 	 (blocks, CPU 1 has the mutex)
>>>> 					__cpu_notify()
>>>> 					mutex_lock(slab_mutex)	
>>>>
>>>> CPU 0 grabs both locks anyway (it's not running from notifier callback). 
>>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
>>>> from notifier callback either.
>>>>
>>>> What did I miss?
>>>
>>> You didn't miss anything, I was suffering a failure to read carefully.
>>>
>>> So my next stupid question is "Why can't kmem_cache_destroy drop
>>> slab_mutex early?" like the following:
>>>
>>> 	void kmem_cache_destroy(struct kmem_cache *cachep)
>>> 	{
>>> 		BUG_ON(!cachep || in_interrupt());
>>>
>>> 		/* Find the cache in the chain of caches. */
>>> 		get_online_cpus();
>>> 		mutex_lock(&slab_mutex);
>>> 		/*
>>> 		 * the chain is never empty, cache_cache is never destroyed
>>> 		 */
>>> 		list_del(&cachep->list);
>>> 		if (__cache_shrink(cachep)) {
>>> 			slab_error(cachep, "Can't free all objects");
>>> 			list_add(&cachep->list, &slab_caches);
>>> 			mutex_unlock(&slab_mutex);
>>> 			put_online_cpus();
>>> 			return;
>>> 		}
>>> 		mutex_unlock(&slab_mutex);
>>>
>>> 		if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>> 			rcu_barrier();
>>>
>>> 		__kmem_cache_destroy(cachep);
>>> 		put_online_cpus();
>>> 	}
>>>
>>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>>> Looks to me like it is just freeing now-disconnected memory.
>>
>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
>> cachep has already been unlinked. I am adding slab people and linux-mm to 
>> CC (the whole thread on LKML can be found at 
>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>
>> How about the patch below? Pekka, Christoph, please?
>>
>> It makes the lockdep happy again, and obviously removes the deadlock (I 
>> tested it).
>>
>>
>>
>> From: Jiri Kosina <jkosina@suse.cz>
>> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>
>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>> _rcu_barrier() -> get_online_cpus().
>>
>> This opens a possibility for deadlock:
>>
>>         CPU 0                           CPU 1
>> 	        kmem_cache_destroy()
>> 	        mutex_lock(slab_mutex)
>> 	                                        _cpu_up()
>> 	                                        cpu_hotplug_begin()
>> 	                                        mutex_lock(cpu_hotplug.lock)
>> 	        rcu_barrier()
>> 	        _rcu_barrier()
>> 	        get_online_cpus()
>> 	        mutex_lock(cpu_hotplug.lock)
>> 	         (blocks, CPU 1 has the mutex)
>> 	                                        __cpu_notify()
>> 	                                        mutex_lock(slab_mutex)
> 
> Hmm.. no, this should *never* happen IMHO!
> 
> If I am seeing the code right, kmem_cache_destroy() wraps its entire content
> inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
> or cpu_down(). Are we really hitting a corner case where the refcounting logic
> in get/put_online_cpus() is failing and allowing a hotplug writer to run in
> parallel with a hotplug reader? If yes, *that* is the problem we have to fix..
>

One scenario where we can see this happen is if we had a put_online_cpus() "leak"
somewhere.. that is, perhaps the task that was about to invoke kmem_cache_destroy()
previously called put_online_cpus() once too many. In that case, the get_online_cpus()
in kmem_cache_destroy() might prove useless in excluding it from concurrent hotplug
operations.
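
To illustrate what such a leak would do, here is a toy userspace model
(hypothetical names, nothing kernel-specific): one extra put drives the
refcount below its true value, so a writer that waits for refcount == 0
can proceed while a reader is still inside its critical section:

#include <assert.h>
#include <stdio.h>

/* Toy model of the refcount half of the hotplug reader/writer exclusion. */
static int refcount;

static void get_online(void)		{ refcount++; }
static void put_online(void)		{ refcount--; }
static int writer_may_proceed(void)	{ return refcount == 0; }

int main(void)
{
	put_online();		/* the hypothetical "leaked" put: refcount = -1 */

	get_online();		/* a reader enters; refcount is back to 0...    */
	assert(refcount == 0);	/* ...so the writer's refcount check passes     */
	printf("writer may proceed while reader is active: %d\n",
	       writer_may_proceed());
	put_online();
	return 0;
}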

Jiri, can you please try the debug patch below? It would be good to see if the
refcount ever went negative...

Regards,
Srivatsa S. Bhat

----------------------------------------------

 kernel/cpu.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..cf82da9 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -69,6 +69,7 @@ void get_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
+	BUG_ON(cpu_hotplug.refcount < 0);
 	cpu_hotplug.refcount++;
 	mutex_unlock(&cpu_hotplug.lock);
 



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  4:04         ` Srivatsa S. Bhat
@ 2012-10-03  7:43           ` Jiri Kosina
  2012-10-03  8:11             ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03  7:43 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: paulmck, Paul E. McKenney, Josh Triplett, linux-kernel

On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:

> >>> 	CPU 0				CPU 1
> >>> 	kmem_cache_destroy()
> >>
> >> What about the get_online_cpus() right here at CPU0 before
> >> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
> >> on CPU1?? I still don't get it... :(
> >>
> >> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
> >> and releasing slab_mutex).
> > 
> > The problem is that there is a CPU-hotplug notifier for slab, which
> > establishes hotplug->slab.
> 
> Agreed.
> 
> >  Then having kmem_cache_destroy() call
> > rcu_barrier() under the lock
> 
> Ah, that's where I disagree. kmem_cache_destroy() *cannot* proceed at
> this point in time, because it has invoked get_online_cpus()! It simply
> cannot be running past that point in the presence of a running hotplug
> notifier! So, kmem_cache_destroy() should have been sleeping on the
> hotplug lock, waiting for the notifier to release it, no?

Please look carefully at the scenario again. kmem_cache_destroy() calls 
get_online_cpus() before the hotplug notifier even starts. Hence it has no 
reason to block there (no one is holding the hotplug lock).

*Then* hotplug notifier fires up, succeeds obtaining hotplug lock, 
kmem_cache_destroy() calls rcu_barrier in the meantime, and blocks itself 
on the hotplug lock there.

Please note that the get_online_cpus() call in kmem_cache_destroy() 
doesn't play *any* role in this scenario.

-- 
Jiri Kosina
SUSE Labs


* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  7:43           ` Jiri Kosina
@ 2012-10-03  8:11             ` Srivatsa S. Bhat
  2012-10-03  8:19               ` Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  8:11 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: paulmck, Paul E. McKenney, Josh Triplett, linux-kernel

On 10/03/2012 01:13 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>>>>> 	CPU 0				CPU 1
>>>>> 	kmem_cache_destroy()
>>>>
>>>> What about the get_online_cpus() right here at CPU0 before
>>>> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
>>>> on CPU1?? I still don't get it... :(
>>>>
>>>> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
>>>> and releasing slab_mutex).
>>>
>>> The problem is that there is a CPU-hotplug notifier for slab, which
>>> establishes hotplug->slab.
>>
>> Agreed.
>>
>>>  Then having kmem_cache_destroy() call
>>> rcu_barrier() under the lock
>>
>> Ah, that's where I disagree. kmem_cache_destroy() *cannot* proceed at
>> this point in time, because it has invoked get_online_cpus()! It simply
>> cannot be running past that point in the presence of a running hotplug
>> notifier! So, kmem_cache_destroy() should have been sleeping on the
>> hotplug lock, waiting for the notifier to release it, no?
> 
> Please look carefully at the scenario again. kmem_cache_destroy() calls 
> get_online_cpus() before the hotplug notifier even starts. Hence it has no 
> reason to block there (no one is holding the hotplug lock).
> 

Agreed.

> *Then* hotplug notifier fires up, succeeds obtaining hotplug lock, 

Ah, that's the problem! The hotplug reader-writer synchronization is not just
via a simple mutex. It's a refcount underneath. If kmem_cache_destroy() incremented
the refcount, the hotplug-writer (cpu_up) will release the hotplug lock immediately
and try again. IOW, a hotplug-reader (kmem_cache_destroy()) and a hotplug-writer
(cpu_up) can *NEVER* run concurrently. If they do, we are totally screwed!


Take a look at the hotplug lock acquire function at the writer side:

static void cpu_hotplug_begin(void)
{
        cpu_hotplug.active_writer = current;

        for (;;) {
                mutex_lock(&cpu_hotplug.lock);
                if (likely(!cpu_hotplug.refcount))   <================ This one!
                        break;
                __set_current_state(TASK_UNINTERRUPTIBLE);
                mutex_unlock(&cpu_hotplug.lock);
                schedule();
        }   
}
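
And the reader side it pairs with looks like this (reassembled from the
snippets quoted in this thread; the refcount lines are visible as context
in the debug diffs). It bumps that same refcount under the same mutex:

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;
	mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;		/* seen by the writer's check above */
	mutex_unlock(&cpu_hotplug.lock);
}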

> kmem_cache_destroy() calls rcu_barrier in the meantime, and blocks itself 
> on the hotplug lock there.
> 
> Please note that the get_online_cpus() call in kmem_cache_destroy() 
> doesn't play *any* role in this scenario.
> 

Please consider my thoughts above. You'll see why I'm not convinced.


Regards,
Srivatsa S. Bhat



* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  8:11             ` Srivatsa S. Bhat
@ 2012-10-03  8:19               ` Jiri Kosina
  2012-10-03  8:30                 ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03  8:19 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: paulmck, Paul E. McKenney, Josh Triplett, linux-kernel

On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:

> On 10/03/2012 01:13 PM, Jiri Kosina wrote:
> > On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> > 
> >>>>> 	CPU 0				CPU 1
> >>>>> 	kmem_cache_destroy()
> >>>>
> >>>> What about the get_online_cpus() right here at CPU0 before
> >>>> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
> >>>> on CPU1?? I still don't get it... :(
> >>>>
> >>>> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
> >>>> and releasing slab_mutex).
> >>>
> >>> The problem is that there is a CPU-hotplug notifier for slab, which
> >>> establishes hotplug->slab.
> >>
> >> Agreed.
> >>
> >>>  Then having kmem_cache_destroy() call
> >>> rcu_barrier() under the lock
> >>
> >> Ah, that's where I disagree. kmem_cache_destroy() *cannot* proceed at
> >> this point in time, because it has invoked get_online_cpus()! It simply
> >> cannot be running past that point in the presence of a running hotplug
> >> notifier! So, kmem_cache_destroy() should have been sleeping on the
> >> hotplug lock, waiting for the notifier to release it, no?
> > 
> > Please look carefully at the scenario again. kmem_cache_destroy() calls 
> > get_online_cpus() before the hotplug notifier even starts. Hence it has no 
> > reason to block there (no one is holding the hotplug lock).
> > 
> 
> Agreed.
> 
> > *Then* hotplug notifier fires up, succeeds obtaining hotplug lock, 
> 
> Ah, that's the problem! The hotplug reader-writer synchronization is not just
>> via a simple mutex. It's a refcount underneath. If kmem_cache_destroy() incremented
> the refcount, the hotplug-writer (cpu_up) will release the hotplug lock immediately
> and try again. IOW, a hotplug-reader (kmem_cache_destroy()) and a hotplug-writer
> (cpu_up) can *NEVER* run concurrently. If they do, we are totally screwed!
> 
> 
> Take a look at the hotplug lock acquire function at the writer side:
> 
> static void cpu_hotplug_begin(void)
> {
>         cpu_hotplug.active_writer = current;
> 
>         for (;;) {
>                 mutex_lock(&cpu_hotplug.lock);
>                 if (likely(!cpu_hotplug.refcount))   <================ This one!
>                         break;
>                 __set_current_state(TASK_UNINTERRUPTIBLE);
>                 mutex_unlock(&cpu_hotplug.lock);
>                 schedule();
>         }   
> }

I actually just came to the same conclusion (7 hours of sleep later, the
mind indeed seems to be brighter ... what a poet I am).

Lockdep doesn't know about this semantics of cpu_hotplug_begin(), and 
therefore gets confused by the fact that mutual exclusion is actually 
achieved through the refcount instead of the mutex (and the same apparently
happened to me).

So right, now I agree that the deadlock scenario I have come up with is 
indeed bogus (*), and we just have to annotate this fact to lockdep 
somehow.
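
For the record, this is easy to model in userspace (illustrative names; a
"gate" mutex plays the role the refcount plays in cpu_hotplug_begin()):
both lock orders really do occur, so a pure lock-order checker must report
a possible deadlock, yet the gate guarantees the two paths never hold the
mutexes concurrently, so the inversion can never actually fire:

#include <pthread.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;	/* cpu_hotplug.lock */
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;	/* slab_mutex */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

/* "Hotplug reader": takes b then a, but only while holding the gate. */
static void *reader(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gate);
	pthread_mutex_lock(&b);
	pthread_mutex_lock(&a);		/* order: b -> a */
	pthread_mutex_unlock(&a);
	pthread_mutex_unlock(&b);
	pthread_mutex_unlock(&gate);
	return NULL;
}

/* "Hotplug writer": takes a then b, also only while holding the gate. */
static void *writer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gate);
	pthread_mutex_lock(&a);
	pthread_mutex_lock(&b);		/* order: a -> b */
	pthread_mutex_unlock(&b);
	pthread_mutex_unlock(&a);
	pthread_mutex_unlock(&gate);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	/* Both orders a->b and b->a exist, but the gate serializes the
	 * two paths, so this program never deadlocks at runtime. */
	pthread_create(&t1, NULL, reader, NULL);
	pthread_create(&t2, NULL, writer, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}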

And I actually believe that moving the slab_mutex around in 
kmem_cache_destroy() is a good annotation (maybe worth a separate comment
in the code), as it not only makes the lockdep false positive go away, but 
it also reduces the mutex hold time.

(*) I have seen the machine locking up hard reproducibly, but that was only
with Paul's additional patch, so I guess the lock order there actually was
wrong.

Thanks!

-- 
Jiri Kosina
SUSE Labs


* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
  2012-10-03  6:08                   ` Srivatsa S. Bhat
@ 2012-10-03  8:21                     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  8:21 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 11:38 AM, Srivatsa S. Bhat wrote:
> On 10/03/2012 09:20 AM, Srivatsa S. Bhat wrote:
>> On 10/03/2012 06:15 AM, Jiri Kosina wrote:
>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>
>>>> On Wed, Oct 03, 2012 at 01:48:21AM +0200, Jiri Kosina wrote:
>>>>> On Tue, 2 Oct 2012, Paul E. McKenney wrote:
>>>>>
>>>>>> Indeed.  Slab seems to be doing an rcu_barrier() in a CPU hotplug 
>>>>>> notifier, which doesn't sit so well with rcu_barrier() trying to exclude 
>>>>>> CPU hotplug events.  I could go back to the old approach, but it is 
>>>>>> significantly more complex.  I cannot say that I am all that happy about 
>>>>>> anyone calling rcu_barrier() from a CPU hotplug notifier because it 
>>>>>> doesn't help CPU hotplug latency, but that is a separate issue.
>>>>>>
>>>>>> But the thing is that rcu_barrier()'s assumptions work just fine if either
>>>>>> (1) it excludes hotplug operations or (2) if it is called from a hotplug
>>>>>> notifier.  You see, either way, the CPU cannot go away while rcu_barrier()
>>>>>> is executing.  So the right way to resolve this seems to be to do the
>>>>>> get_online_cpus() only if rcu_barrier() is -not- executing in the context
>>>>>> of a hotplug notifier.  Should be fixable without too much hassle...
>>>>>
>>>>> Sorry, I don't think I understand what you are proposing just yet.
>>>>>
>>>>> If I understand it correctly, you are proposing to introduce some magic 
>>>>> into _rcu_barrier() such as (pseudocode of course):
>>>>>
>>>>> 	if (!being_called_from_hotplug_notifier_callback)
>>>>> 		get_online_cpus()
>>>>>
>>>>> How does that protect from the scenario I've outlined before though?
>>>>>
>>>>> 	CPU 0                           CPU 1
>>>>> 	kmem_cache_destroy()
>>>>> 	mutex_lock(slab_mutex)
>>>>> 					_cpu_up()
>>>>> 					cpu_hotplug_begin()
>>>>> 					mutex_lock(cpu_hotplug.lock)
>>>>> 	rcu_barrier()
>>>>> 	_rcu_barrier()
>>>>> 	get_online_cpus()
>>>>> 	mutex_lock(cpu_hotplug.lock)
>>>>> 	 (blocks, CPU 1 has the mutex)
>>>>> 					__cpu_notify()
>>>>> 					mutex_lock(slab_mutex)	
>>>>>
>>>>> CPU 0 grabs both locks anyway (it's not running from notifier callback). 
>>>>> CPU 1 grabs both locks as well, as there is no _rcu_barrier() being called 
>>>>> from notifier callback either.
>>>>>
>>>>> What did I miss?
>>>>
>>>> You didn't miss anything, I was suffering a failure to read carefully.
>>>>
>>>> So my next stupid question is "Why can't kmem_cache_destroy drop
>>>> slab_mutex early?" like the following:
>>>>
>>>> 	void kmem_cache_destroy(struct kmem_cache *cachep)
>>>> 	{
>>>> 		BUG_ON(!cachep || in_interrupt());
>>>>
>>>> 		/* Find the cache in the chain of caches. */
>>>> 		get_online_cpus();
>>>> 		mutex_lock(&slab_mutex);
>>>> 		/*
>>>> 		 * the chain is never empty, cache_cache is never destroyed
>>>> 		 */
>>>> 		list_del(&cachep->list);
>>>> 		if (__cache_shrink(cachep)) {
>>>> 			slab_error(cachep, "Can't free all objects");
>>>> 			list_add(&cachep->list, &slab_caches);
>>>> 			mutex_unlock(&slab_mutex);
>>>> 			put_online_cpus();
>>>> 			return;
>>>> 		}
>>>> 		mutex_unlock(&slab_mutex);
>>>>
>>>> 		if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>>> 			rcu_barrier();
>>>>
>>>> 		__kmem_cache_destroy(cachep);
>>>> 		put_online_cpus();
>>>> 	}
>>>>
>>>> Or did I miss some reason why __kmem_cache_destroy() needs that lock?
>>>> Looks to me like it is just freeing now-disconnected memory.
>>>
>>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
>>> cachep has already been unlinked. I am adding slab people and linux-mm to 
>>> CC (the whole thread on LKML can be found at 
>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>
>>> How about the patch below? Pekka, Christoph, please?
>>>
>>> It makes the lockdep happy again, and obviously removes the deadlock (I 
>>> tested it).
>>>
>>>
>>>
>>> From: Jiri Kosina <jkosina@suse.cz>
>>> Subject: mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>>
>>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
>>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>>> _rcu_barrier() -> get_online_cpus().
>>>
>>> This opens a possibility for deadlock:
>>>
>>>         CPU 0                           CPU 1
>>> 	        kmem_cache_destroy()
>>> 	        mutex_lock(slab_mutex)
>>> 	                                        _cpu_up()
>>> 	                                        cpu_hotplug_begin()
>>> 	                                        mutex_lock(cpu_hotplug.lock)
>>> 	        rcu_barrier()
>>> 	        _rcu_barrier()
>>> 	        get_online_cpus()
>>> 	        mutex_lock(cpu_hotplug.lock)
>>> 	         (blocks, CPU 1 has the mutex)
>>> 	                                        __cpu_notify()
>>> 	                                        mutex_lock(slab_mutex)
>>
>> Hmm.. no, this should *never* happen IMHO!
>>
>> If I am seeing the code right, kmem_cache_destroy() wraps its entire content
>> inside get/put_online_cpus(), which means it cannot run concurrently with cpu_up()
>> or cpu_down(). Are we really hitting a corner case where the refcounting logic
>> in get/put_online_cpus() is failing and allowing a hotplug writer to run in
>> parallel with a hotplug reader? If yes, *that* is the problem we have to fix..
>>
> 
> One scenario where we can see this happen is if we had a put_online_cpus() "leak"
> somewhere.. that is, perhaps the task that was about to invoke kmem_cache_destroy()
> previously called put_online_cpus() once too many. In that case, the get_online_cpus()
> in kmem_cache_destroy() might prove useless in excluding it from concurrent hotplug
> operations.
> 
> Jiri, can you please try the debug patch below? It would be good to see if the
> refcount ever went negative...
> 

Better to catch the bug even earlier, at the right moment, in put_online_cpus()
itself. Could you try this one instead?

Regards,
Srivatsa S. Bhat

----------------------------------------------------

 kernel/cpu.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
+	BUG_ON(cpu_hotplug.refcount == 0);
 	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
 		wake_up_process(cpu_hotplug.active_writer);
 	mutex_unlock(&cpu_hotplug.lock);




* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  8:19               ` Jiri Kosina
@ 2012-10-03  8:30                 ` Srivatsa S. Bhat
  2012-10-03  9:24                   ` Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  8:30 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: paulmck, Paul E. McKenney, Josh Triplett, linux-kernel

On 10/03/2012 01:49 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>> On 10/03/2012 01:13 PM, Jiri Kosina wrote:
>>> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
>>>
>>>>>>> 	CPU 0				CPU 1
>>>>>>> 	kmem_cache_destroy()
>>>>>>
>>>>>> What about the get_online_cpus() right here at CPU0 before
>>>>>> calling mutex_lock(slab_mutex)? How can the cpu_up() proceed
>>>>>> on CPU1?? I still don't get it... :(
>>>>>>
>>>>>> (kmem_cache_destroy() uses get/put_online_cpus() around acquiring
>>>>>> and releasing slab_mutex).
>>>>>
>>>>> The problem is that there is a CPU-hotplug notifier for slab, which
>>>>> establishes hotplug->slab.
>>>>
>>>> Agreed.
>>>>
>>>>>  Then having kmem_cache_destroy() call
>>>>> rcu_barrier() under the lock
>>>>
>>>> Ah, that's where I disagree. kmem_cache_destroy() *cannot* proceed at
>>>> this point in time, because it has invoked get_online_cpus()! It simply
>>>> cannot be running past that point in the presence of a running hotplug
>>>> notifier! So, kmem_cache_destroy() should have been sleeping on the
>>>> hotplug lock, waiting for the notifier to release it, no?
>>>
>>> Please look carefully at the scenario again. kmem_cache_destroy() calls 
>>> get_online_cpus() before the hotplug notifier even starts. Hence it has no 
>>> reason to block there (no one is holding the hotplug lock).
>>>
>>
>> Agreed.
>>
>>> *Then* hotplug notifier fires up, succeeds obtaining hotplug lock, 
>>
>> Ah, that's the problem! The hotplug reader-writer synchronization is not just
>> via a simple mutex. It's a refcount underneath. If kmem_cache_destroy() incremented
>> the refcount, the hotplug-writer (cpu_up) will release the hotplug lock immediately
>> and try again. IOW, a hotplug-reader (kmem_cache_destroy()) and a hotplug-writer
>> (cpu_up) can *NEVER* run concurrently. If they do, we are totally screwed!
>>
>>
>> Take a look at the hotplug lock acquire function at the writer side:
>>
>> static void cpu_hotplug_begin(void)
>> {
>>         cpu_hotplug.active_writer = current;
>>
>>         for (;;) {
>>                 mutex_lock(&cpu_hotplug.lock);
>>                 if (likely(!cpu_hotplug.refcount))   <================ This one!
>>                         break;
>>                 __set_current_state(TASK_UNINTERRUPTIBLE);
>>                 mutex_unlock(&cpu_hotplug.lock);
>>                 schedule();
>>         }   
>> }
> 
> I actually just came to the same conclusion (7 hours of sleep later, the 
> mind indeed seems to be brighter ... what a poet I am).
> 
> Lockdep doesn't know about this semantics of cpu_hotplug_begin(), and 
> therefore gets confused by the fact that mutual exclusion is actually 
> achieved through the refcount instead of the mutex (and the same apparently 
> happened to me).

No, that's not the problem. Lockdep is fine. The calltrace clearly shows that
our refcounting has gone wrong somewhere. As a result, we really *are* running
a hotplug-reader and a hotplug-writer at the same time! We really need to fix
*that*! So please try the second debug patch I sent just now (with the BUG_ON()
in put_online_cpus()). We need to know who is calling put_online_cpus() twice
and fix that culprit!

> 
> So right, now I agree that the deadlock scenario I have come up with is 
> indeed bogus (*), and we just have to annotate this fact to lockdep 
> somehow.

Yes, the deadlock scenario is bogus, but the refcounting leak is for real
and needs fixing.

> 
> And I actually believe that moving the slab_mutex around in 
> kmem_cache_destroy() is a good annotation (maybe worth a separate comment 
> in the code), as it not only makes the lockdep false positive go away, but 
> it also reduces the mutex hold time.
>

I'm fine with this, but the real problem is elsewhere, like I mentioned above.
This one is only a good-to-have, not a fix.
 
> (*) I have seen the machine locking up hard, reproducibly, but that was only 
> with Paul's additional patch, so I guess the lock order there actually was 
> wrong

If refcounting were working fine, Paul's patch wouldn't have caused *any* issues.
With that patch in place, the 2 places where rcu_barrier() gets invoked (i.e.,
kmem_cache_destroy() and deactivate_locked_super()) both start waiting on
get_online_cpus() until the slab cpu hotplug notifier as well as the entire
cpu_up operation completes. Absolutely no problem in that! So the fact that
you are seeing lock-ups here is another indication that the problem is really
elsewhere!
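
(For reference, the reader side that pairs with the cpu_hotplug_begin()
quoted above looks roughly like this -- a from-memory sketch of the 3.6-era
kernel/cpu.c, shown only to illustrate the refcount protocol:

void get_online_cpus(void)
{
	might_sleep();
	/* The active hotplug writer is allowed to nest reader sections. */
	if (cpu_hotplug.active_writer == current)
		return;
	/* Blocks while cpu_hotplug_begin() is holding cpu_hotplug.lock. */
	mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;
	mutex_unlock(&cpu_hotplug.lock);
}

A reader thus either bumps the refcount before the writer samples it, making
the writer sleep in its for (;;) loop, or blocks on cpu_hotplug.lock until
the writer is done.)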

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  8:30                 ` Srivatsa S. Bhat
@ 2012-10-03  9:24                   ` Jiri Kosina
  2012-10-03  9:58                     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03  9:24 UTC (permalink / raw)
  To: Srivatsa S. Bhat; +Cc: paulmck, Paul E. McKenney, Josh Triplett, linux-kernel

On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:

> >> static void cpu_hotplug_begin(void)
> >> {
> >>         cpu_hotplug.active_writer = current;
> >>
> >>         for (;;) {
> >>                 mutex_lock(&cpu_hotplug.lock);
> >>                 if (likely(!cpu_hotplug.refcount))   <================ This one!
> >>                         break;
> >>                 __set_current_state(TASK_UNINTERRUPTIBLE);
> >>                 mutex_unlock(&cpu_hotplug.lock);
> >>                 schedule();
> >>         }   
> >> }
> > 
> > I actually just came to the same conclusion (7 hours of sleep later, the 
> > mind indeed seems to be brighter ... what a poet I am).
> > 
> > Lockdep doesn't know about this semantics of cpu_hotplug_begin(), and 
> > therefore gets confused by the fact that mutual exclusion is actually 
> > achieved through the refcount instead of the mutex (and the same apparently 
> > happened to me).
> 
> No, that's not the problem. Lockdep is fine. The calltrace clearly shows that
> our refcounting has gone wrong somewhere. As a result, we really *are* running
> a hotplug-reader and a hotplug-writer at the same time! We really need to fix
> *that*! So please try the second debug patch I sent just now (with the BUG_ON()
> in put_online_cpus()). We need to know who is calling put_online_cpus() twice
> and fix that culprit!

I don't think so.

Lockdep is complaining, because

(a) during system bootup, the smp_init() -> cpu_up() -> cpuup_callback() 
    teaches it about the hotplug.lock -> slab_mutex dependency

(b) many many jiffies later, nf_conntrack_cleanup_net() calls 
    kmem_cache_destroy(), which introduces the slab_mutex -> hotplug.lock 
    dependency

Lockdep rightfully (from its POV) sees this as a potential ABBA, and reports 
it; it's as simple as that.
It has no way of knowing the fact that the ABBA can actually never happen, 
because of the special semantics of cpu_hotplug.refcount and its handling in 
cpu_hotplug_begin().

The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin() 
until everyone who called get_online_cpus() has called put_online_cpus()" 
guarantee is totally invisible to lockdep.
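
(To illustrate what lockdep actually records, here is a minimal hypothetical
sketch -- lock_a/lock_b and the two functions are stand-ins, not the real
kernel code:

static DEFINE_MUTEX(lock_a);	/* stands in for cpu_hotplug.lock */
static DEFINE_MUTEX(lock_b);	/* stands in for slab_mutex */

void boot_path(void)		/* like cpu_up() -> cpuup_callback() */
{
	mutex_lock(&lock_a);
	mutex_lock(&lock_b);	/* lockdep learns lock_a -> lock_b */
	mutex_unlock(&lock_b);
	mutex_unlock(&lock_a);
}

void cleanup_path(void)		/* like kmem_cache_destroy() -> rcu_barrier() */
{
	mutex_lock(&lock_b);
	mutex_lock(&lock_a);	/* lockdep learns lock_b -> lock_a and reports ABBA */
	mutex_unlock(&lock_a);
	mutex_unlock(&lock_b);
}

Lockdep only ever sees the two acquisition orders; any refcount-based scheme
that keeps the two paths from overlapping in time is invisible to it.)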

> > So right, now I agree that the deadlock scenario I have come up with is 
> > indeed bogus (*), and we just have to annotate this fact to lockdep 
> > somehow.
> 
> Yes, the deadlock scenario is bogus, but the refcounting leak is for real
> and needs fixing.

With your patch applied, the BUG_ON() in put_online_cpus() didn't trigger 
for me at all. Which is what I expected.

> I'm fine with this, but the real problem is elsewhere, like I mentioned above.
> This one is only a good-to-have, not a fix.
>  
> > (*) I have seen the machine locking up hard, reproducibly, but that was only 
> > with Paul's additional patch, so I guess the lock order there actually was 
> > wrong
> 
> If refcounting were working fine, Paul's patch wouldn't have caused *any* issues.
> With that patch in place, the 2 places where rcu_barrier() gets invoked (i.e.,
> kmem_cache_destroy() and deactivate_locked_super()) both start waiting on
> get_online_cpus() until the slab cpu hotplug notifier as well as the entire
> cpu_up operation completes. Absolutely no problem in that! So the fact that
> you are seeing lock-ups here is another indication that the problem is really
> elsewhere!

I don't agree. The reason why Paul's patch (1331e7a1bb) started to trigger 
> this is that (b) above doesn't exist in pre-1331e7a1bb kernels.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
  2012-10-03  3:41                 ` Paul E. McKenney
  2012-10-03  3:50                 ` Srivatsa S. Bhat
@ 2012-10-03  9:46                 ` Jiri Kosina
  2012-10-03 12:22                   ` Srivatsa S. Bhat
  2012-10-03 14:17                   ` Christoph Lameter
  2012-10-03 14:15                 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
  3 siblings, 2 replies; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03  9:46 UTC (permalink / raw)
  To: Paul E. McKenney, Christoph Lameter, Pekka Enberg
  Cc: Paul E. McKenney, Josh Triplett, linux-kernel, Srivatsa S. Bhat,
	linux-mm

On Wed, 3 Oct 2012, Jiri Kosina wrote:

> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> cachep has already been unlinked. I am adding slab people and linux-mm to 
> CC (the whole thread on LKML can be found at 
> https://lkml.org/lkml/2012/10/2/296 for reference).
> 
> How about the patch below? Pekka, Christoph, please?

It turns out that lockdep is actually getting this wrong, so the changelog 
in the previous version wasn't accurate.

Please find patch with updated changelog below. Pekka, Christoph, could 
you please check whether it makes sense to you as well? Thanks.




From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()

Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().

Lockdep thinks that this might actually result in an ABBA deadlock,
and reports it as below:

=== [ cut here ] ===
 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
 -------------------------------------------------------
 kworker/u:2/40 is trying to acquire lock:
  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0

 but task is already holding lock:
  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (slab_mutex){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
        [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
        [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff8155719d>] _cpu_up+0xba/0x14e
        [<ffffffff815572ed>] cpu_up+0xbc/0x117
        [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
        [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81049197>] get_online_cpus+0x37/0x50
        [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
        [<ffffffff8118cc01>] deactivate_super+0x61/0x70
        [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
        [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
        [<ffffffff81569979>] system_call_fastpath+0x16/0x1b

 -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
        [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
        [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
        [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
        [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
        [<ffffffff81454b79>] ops_exit_list+0x39/0x60
        [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
        [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
        [<ffffffff81069f3e>] worker_thread+0x12e/0x320
        [<ffffffff8106f73e>] kthread+0x9e/0xb0
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 other info that might help us debug this:

 Chain exists of:
   rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(cpu_hotplug.lock);
                                lock(slab_mutex);
   lock(rcu_sched_state.barrier_mutex);

  *** DEADLOCK ***
=== [ cut here ] ===

This is actually a false positive. Lockdep has no way of knowing the fact
that the ABBA can actually never happen, because of special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through the mutex, but through
cpu_hotplug.refcount.

The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() has called put_online_cpus()"
semantics is totally invisible to lockdep.

This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is being called with it unlocked. It has two advantages:

- it slightly reduces the hold time of slab_mutex; as it's used to protect
  the cachep list, it's not necessary to hold it over the __kmem_cache_destroy()
  call any more
- it silences the lockdep false positive warning, as it avoids lockdep ever
  learning about slab_mutex -> cpu_hotplug.lock dependency

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 mm/slab.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 1133911..693c7cb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
 		put_online_cpus();
 		return;
 	}
+	mutex_unlock(&slab_mutex);
 
 	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
 		rcu_barrier();
 
 	__kmem_cache_destroy(cachep);
-	mutex_unlock(&slab_mutex);
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")
  2012-10-03  9:24                   ` Jiri Kosina
@ 2012-10-03  9:58                     ` Srivatsa S. Bhat
  0 siblings, 0 replies; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03  9:58 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: paulmck, Paul E. McKenney, Josh Triplett, linux-kernel

On 10/03/2012 02:54 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>>>> static void cpu_hotplug_begin(void)
>>>> {
>>>>         cpu_hotplug.active_writer = current;
>>>>
>>>>         for (;;) {
>>>>                 mutex_lock(&cpu_hotplug.lock);
>>>>                 if (likely(!cpu_hotplug.refcount))   <================ This one!
>>>>                         break;
>>>>                 __set_current_state(TASK_UNINTERRUPTIBLE);
>>>>                 mutex_unlock(&cpu_hotplug.lock);
>>>>                 schedule();
>>>>         }   
>>>> }
>>>
>>> I actually just came to the same conclusion (7 hours of sleep later, the 
>>> mind indeed seems to be brighter ... what a poet I am).
>>>
>>> Lockdep doesn't know about this semantics of cpu_hotplug_begin(), and 
>>> therefore gets confused by the fact that mutual exclusion is actually 
>>> achieved through the refcount instead of the mutex (and the same apparently 
>>> happened to me).
>>
>> No, that's not the problem. Lockdep is fine. The calltrace clearly shows that
>> our refcounting has gone wrong somewhere. As a result, we really *are* running
>> a hotplug-reader and a hotplug-writer at the same time! We really need to fix
>> *that*! So please try the second debug patch I sent just now (with the BUG_ON()
>> in put_online_cpus()). We need to know who is calling put_online_cpus() twice
>> and fix that culprit!
> 
> I don't think so.
> 
> Lockdep is complaining, because
> 
> (a) during system bootup, the smp_init() -> cpu_up() -> cpuup_callback() 
>     teaches it about the hotplug.lock -> slab_mutex dependency
> 
> (b) many many jiffies later, nf_conntrack_cleanup_net() calls 
>     kmem_cache_destroy(), which introduces the slab_mutex -> hotplug.lock 
>     dependency
> 
> Lockdep rightfully (from its POV) sees this as a potential ABBA, and reports 
> it; it's as simple as that.
> It has no way of knowing the fact that the ABBA can actually never happen, 
> because of the special semantics of cpu_hotplug.refcount and its handling in 
> cpu_hotplug_begin().
>

Hmm, you are right.
 
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin() 
> until everyone who called get_online_cpus() has called put_online_cpus()" 
> guarantee is totally invisible to lockdep.

I see your point..

> 
>>> So right, now I agree that the deadlock scenario I have come up with is 
>>> indeed bogus (*), and we just have to annotate this fact to lockdep 
>>> somehow.
>>
>> Yes, the deadlock scenario is bogus, but the refcounting leak is for real
>> and needs fixing.
> 
> With your patch applied, the BUG_ON() in put_online_cpus() didn't trigger 
> for me at all. Which is what I expected.

Oh, ok..

> 
>> I'm fine with this, but the real problem is elsewhere, like I mentioned above.
>> This one is only a good-to-have, not a fix.
>>  
>>> (*) I have seen the machine locking up hard, reproducibly, but that was only 
>>> with Paul's additional patch, so I guess the lock order there actually was 
>>> wrong
>>
>> If refcounting were working fine, Paul's patch wouldn't have caused *any* issues.
>> With that patch in place, the 2 places where rcu_barrier() gets invoked (i.e.,
>> kmem_cache_destroy() and deactivate_locked_super()) both start waiting on
>> get_online_cpus() until the slab cpu hotplug notifier as well as the entire
>> cpu_up operation completes. Absolutely no problem in that! So the fact that
>> you are seeing lock-ups here is another indication that the problem is really
>> elsewhere!
> 
> I don't agree. The reason why Paul's patch (1331e7a1bb) started to trigger 
> this is that (b) above doesn't exist in pre-1331e7a1bb kernels.
>

So basically what you are saying is that the calltraces in the lockdep splat are
from different points in time, right? Then I see why it's just a false positive
and not a real bug.
 
Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
@ 2012-10-03 12:22                   ` Srivatsa S. Bhat
  2012-10-03 12:53                     ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
  2012-10-03 14:50                     ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
  2012-10-03 14:17                   ` Christoph Lameter
  1 sibling, 2 replies; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 12:22 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 03:16 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Jiri Kosina wrote:
> 
>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
>> cachep has already been unlinked. I am adding slab people and linux-mm to 
>> CC (the whole thread on LKML can be found at 
>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>
>> How about the patch below? Pekka, Christoph, please?
> 
> It turns out that lockdep is actually getting this wrong, so the changelog 
> in the previous version wasn't accurate.
> 
> Please find patch with updated changelog below. Pekka, Christoph, could 
> you please check whether it makes sense to you as well? Thanks.
> 
> 
> 
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> Lockdep thinks that this might actually result in an ABBA deadlock,
> and reports it as below:
> 
> === [ cut here ] ===
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
>  -------------------------------------------------------
>  kworker/u:2/40 is trying to acquire lock:
>   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> 
>  but task is already holding lock:
>   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> 
>  which lock already depends on the new lock.
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #2 (slab_mutex){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
>         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
>         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
>         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
>         [<ffffffff815572ed>] cpu_up+0xbc/0x117
>         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
>         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> 
>  -> #1 (cpu_hotplug.lock){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81049197>] get_online_cpus+0x37/0x50
>         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
>         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
>         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
>         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
>         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> 
>  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
>         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>         [<ffffffff8106f73e>] kthread+0x9e/0xb0
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> 
>  other info that might help us debug this:
> 
>  Chain exists of:
>    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(slab_mutex);
>                                 lock(cpu_hotplug.lock);
>                                 lock(slab_mutex);
>    lock(rcu_sched_state.barrier_mutex);
> 
>   *** DEADLOCK ***
> === [ cut here ] ===
> 
> This is actually a false positive. Lockdep has no way of knowing the fact
> that the ABBA can actually never happen, because of special semantics of
> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> exclusion there is not achieved through the mutex, but through
> cpu_hotplug.refcount.
> 
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() has called put_online_cpus()"
> semantics is totally invisible to lockdep.
> 
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is being called with it unlocked. It has two advantages:
> 
> - it slightly reduces the hold time of slab_mutex; as it's used to protect
>   the cachep list, it's not necessary to hold it over the __kmem_cache_destroy()
>   call any more
> - it silences the lockdep false positive warning, as it avoids lockdep ever
>   learning about slab_mutex -> cpu_hotplug.lock dependency
> 
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Earlier I was under the wrong impression that all the calltraces that lockdep
spewed were reflecting the same point in time. So, sorry about that noise! :-)
It is indeed a false positive and there is no real bug here, and the fix looks
good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.

But, I'm also quite surprised that the put_online_cpus() code as it stands today
doesn't have any checks for the refcount going negative. I believe that such a
check would be valuable to help catch cases where we might end up inadvertently
causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
that as a separate patch.

Regards,
Srivatsa S. Bhat

> ---
>  mm/slab.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index 1133911..693c7cb 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
>  		put_online_cpus();
>  		return;
>  	}
> +	mutex_unlock(&slab_mutex);
> 
>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>  		rcu_barrier();
> 
>  	__kmem_cache_destroy(cachep);
> -	mutex_unlock(&slab_mutex);
>  	put_online_cpus();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
  2012-10-03 12:22                   ` Srivatsa S. Bhat
@ 2012-10-03 12:53                     ` Srivatsa S. Bhat
  2012-10-03 21:13                       ` Andrew Morton
  2012-10-03 14:50                     ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
  1 sibling, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 12:53 UTC (permalink / raw)
  To: Jiri Kosina, Thomas Gleixner, akpm, Ingo Molnar, Peter Zijlstra
  Cc: Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 05:52 PM, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
>>
>>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
>>> cachep has already been unlinked. I am adding slab people and linux-mm to 
>>> CC (the whole thread on LKML can be found at 
>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>
[...]
> 
> But, I'm also quite surprised that the put_online_cpus() code as it stands today
> doesn't have any checks for the refcount going negative. I believe that such a
> check would be valuable to help catch cases where we might end up inadvertently
> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
> that as a separate patch.
> 


-----------------------------------


From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safeguarded by cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
where the refcount can go negative.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..00d29bc 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,7 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
+	BUG_ON(cpu_hotplug.refcount == 0);
 	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
 		wake_up_process(cpu_hotplug.active_writer);
 	mutex_unlock(&cpu_hotplug.lock);



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()"))
  2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
                                   ` (2 preceding siblings ...)
  2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
@ 2012-10-03 14:15                 ` Christoph Lameter
  2012-10-03 14:34                   ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
  3 siblings, 1 reply; 46+ messages in thread
From: Christoph Lameter @ 2012-10-03 14:15 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Pekka Enberg, Paul E. McKenney, Josh Triplett,
	linux-kernel, Srivatsa S. Bhat, linux-mm

On Wed, 3 Oct 2012, Jiri Kosina wrote:

> How about the patch below? Pekka, Christoph, please?

Looks fine for -stable. For upstream there is going to be a move to
slab_common coming in this merge period. We would need a fix against -next
or Pekka's tree too.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
  2012-10-03 12:22                   ` Srivatsa S. Bhat
@ 2012-10-03 14:17                   ` Christoph Lameter
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Lameter @ 2012-10-03 14:17 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Paul E. McKenney, Pekka Enberg, Paul E. McKenney, Josh Triplett,
	linux-kernel, Srivatsa S. Bhat, linux-mm


Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 14:15                 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
@ 2012-10-03 14:34                   ` Jiri Kosina
  2012-10-03 15:00                     ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03 14:34 UTC (permalink / raw)
  To: Christoph Lameter, Pekka Enberg
  Cc: Paul E. McKenney, Paul E. McKenney, Josh Triplett, linux-kernel,
	Srivatsa S. Bhat, linux-mm

On Wed, 3 Oct 2012, Christoph Lameter wrote:

> > How about the patch below? Pekka, Christoph, please?
> 
> Looks fine for -stable. For upstream there is going to be a move to
> slab_common coming in this merge period. We would need a fix against -next
> or Pekka's tree too.

Thanks Christoph. Patch against Pekka's slab/for-linus branch below.

I have kept the Acked-by/Reviewed-by from the version of the patch against 
current Linus' tree; if anyone objects, please shout loudly. Ideally it should 
go in during this merge window to keep lockdep happy.





From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()

Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().

Lockdep thinks that this might actually result in an ABBA deadlock,
and reports it as below:

=== [ cut here ] ===
 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
 -------------------------------------------------------
 kworker/u:2/40 is trying to acquire lock:
  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0

 but task is already holding lock:
  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (slab_mutex){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
        [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
        [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff8155719d>] _cpu_up+0xba/0x14e
        [<ffffffff815572ed>] cpu_up+0xbc/0x117
        [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
        [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81049197>] get_online_cpus+0x37/0x50
        [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
        [<ffffffff8118cc01>] deactivate_super+0x61/0x70
        [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
        [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
        [<ffffffff81569979>] system_call_fastpath+0x16/0x1b

 -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
        [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
        [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
        [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
        [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
        [<ffffffff81454b79>] ops_exit_list+0x39/0x60
        [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
        [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
        [<ffffffff81069f3e>] worker_thread+0x12e/0x320
        [<ffffffff8106f73e>] kthread+0x9e/0xb0
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 other info that might help us debug this:

 Chain exists of:
   rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(cpu_hotplug.lock);
                                lock(slab_mutex);
   lock(rcu_sched_state.barrier_mutex);

  *** DEADLOCK ***
=== [ cut here ] ===

This is actually a false positive. Lockdep has no way of knowing the fact
that the ABBA can actually never happen, because of special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through the mutex, but through
cpu_hotplug.refcount.

The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() has called put_online_cpus()"
semantics is totally invisible to lockdep.

This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is being called with it unlocked. It has two advantages:

- it slightly reduces the hold time of slab_mutex; as it's used to protect
  the cachep list, it's not necessary to hold it over the kmem_cache_free()
  call any more
- it silences the lockdep false positive warning, as it avoids lockdep ever
  learning about slab_mutex -> cpu_hotplug.lock dependency

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 mm/slab_common.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c21725..90c3053 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	s->refcount--;
 	if (!s->refcount) {
 		list_del(&s->list);
+		mutex_unlock(&slab_mutex);
 
 		if (!__kmem_cache_shutdown(s)) {
 			if (s->flags & SLAB_DESTROY_BY_RCU)
@@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
 				s->name);
 			dump_stack();
 		}
+	} else {
+		mutex_unlock(&slab_mutex);
 	}
-	mutex_unlock(&slab_mutex);
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 12:22                   ` Srivatsa S. Bhat
  2012-10-03 12:53                     ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
@ 2012-10-03 14:50                     ` Paul E. McKenney
  2012-10-03 14:55                       ` Srivatsa S. Bhat
  1 sibling, 1 reply; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-03 14:50 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Kosina, Christoph Lameter, Pekka Enberg, Paul E. McKenney,
	Josh Triplett, linux-kernel, linux-mm

On Wed, Oct 03, 2012 at 05:52:26PM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
> > On Wed, 3 Oct 2012, Jiri Kosina wrote:
> > 
> >> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> >> cachep has already been unlinked. I am adding slab people and linux-mm to 
> >> CC (the whole thread on LKML can be found at 
> >> https://lkml.org/lkml/2012/10/2/296 for reference).
> >>
> >> How about the patch below? Pekka, Christoph, please?
> > 
> > It turns out that lockdep is actually getting this wrong, so the changelog 
> > in the previous version wasn't accurate.
> > 
> > Please find patch with updated changelog below. Pekka, Christoph, could 
> > you please check whether it makes sense to you as well? Thanks.
> > 
> > 
> > 
> > 
> > From: Jiri Kosina <jkosina@suse.cz>
> > Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> > 
> > Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> > __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> > dependency through kmem_cache_destroy() -> rcu_barrier() ->
> > _rcu_barrier() -> get_online_cpus().
> > 
> > Lockdep thinks that this might actually result in an ABBA deadlock,
> > and reports it as below:
> > 
> > === [ cut here ] ===
> >  ======================================================
> >  [ INFO: possible circular locking dependency detected ]
> >  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> >  -------------------------------------------------------
> >  kworker/u:2/40 is trying to acquire lock:
> >   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> > 
> >  but task is already holding lock:
> >   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> > 
> >  which lock already depends on the new lock.
> > 
> >  the existing dependency chain (in reverse order) is:
> > 
> >  -> #2 (slab_mutex){+.+.+.}:
> >         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >         [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> >         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> >         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> >         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> >         [<ffffffff815572ed>] cpu_up+0xbc/0x117
> >         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> >         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> >         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> > 
> >  -> #1 (cpu_hotplug.lock){+.+.+.}:
> >         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >         [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >         [<ffffffff81049197>] get_online_cpus+0x37/0x50
> >         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> >         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> >         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> >         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> >         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> >         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> >         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> >         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> > 
> >  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> >         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> >         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >         [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> >         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> >         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> >         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> >         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> >         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> >         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> >         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> >         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> >         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> >         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> >         [<ffffffff8106f73e>] kthread+0x9e/0xb0
> >         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> > 
> >  other info that might help us debug this:
> > 
> >  Chain exists of:
> >    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> > 
> >   Possible unsafe locking scenario:
> > 
> >         CPU0                    CPU1
> >         ----                    ----
> >    lock(slab_mutex);
> >                                 lock(cpu_hotplug.lock);
> >                                 lock(slab_mutex);
> >    lock(rcu_sched_state.barrier_mutex);
> > 
> >   *** DEADLOCK ***
> > === [ cut here ] ===
> > 
> > This is actually a false positive. Lockdep has no way of knowing the fact
> > that the ABBA can actually never happen, because of special semantics of
> > cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> > exclusion there is not achieved through the mutex, but through
> > cpu_hotplug.refcount.
> > 
> > The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> > until everyone who called get_online_cpus() has called put_online_cpus()"
> > semantics is totally invisible to lockdep.
> > 
> > This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> > is being called with it unlocked. It has two advantages:
> > 
> > - it slightly reduces the hold time of slab_mutex; as it's used to protect
> >   the cachep list, it's not necessary to hold it over the __kmem_cache_destroy()
> >   call any more
> > - it silences the lockdep false positive warning, as it avoids lockdep ever
> >   learning about slab_mutex -> cpu_hotplug.lock dependency
> > 
> > Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> 
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> 
> Earlier I was under the wrong impression that all the calltraces that lockdep
> spewed were reflecting the same point in time. So, sorry about that noise! :-)
> It is indeed a false positive and there is no real bug here, and the fix looks
> good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.

I am not so sure about it being a false positive.  Consider the following
sequence of events:

o	Thread A starts a CPU-hotplug operation, acquiring the
	hotplug mutex.

o	Thread B does a kmem_cache_destroy(), acquiring the slab mutex.

o	Thread A reaches the slab CPU-hotplug notifier, but cannot acquire
	the slab mutex because Thread B holds it.

o	Thread B enters rcu_barrier(), but cannot acquire the hotplug
	mutex because Thread A holds it.

So I would argue that lockdep's output was a bit confusing, but that
the deadlock it flagged is real.  Or am I still missing something?

							Thanx, Paul

> But, I'm also quite surprised that the put_online_cpus() code as it stands today
> doesn't have any checks for the refcount going negative. I believe that such a
> check would be valuable to help catch cases where we might end up inadvertently
> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
> that as a separate patch.
> 
> Regards,
> Srivatsa S. Bhat
> 
> > ---
> >  mm/slab.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 1133911..693c7cb 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> >  		put_online_cpus();
> >  		return;
> >  	}
> > +	mutex_unlock(&slab_mutex);
> > 
> >  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> >  		rcu_barrier();
> > 
> >  	__kmem_cache_destroy(cachep);
> > -	mutex_unlock(&slab_mutex);
> >  	put_online_cpus();
> >  }
> >  EXPORT_SYMBOL(kmem_cache_destroy);
> > 
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 14:50                     ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
@ 2012-10-03 14:55                       ` Srivatsa S. Bhat
  2012-10-03 16:00                         ` Paul E. McKenney
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 14:55 UTC (permalink / raw)
  To: paulmck
  Cc: Jiri Kosina, Christoph Lameter, Pekka Enberg, Paul E. McKenney,
	Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 08:20 PM, Paul E. McKenney wrote:
> On Wed, Oct 03, 2012 at 05:52:26PM +0530, Srivatsa S. Bhat wrote:
>> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
>>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
>>>
>>>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
>>>> cachep has already been unlinked. I am adding slab people and linux-mm to 
>>>> CC (the whole thread on LKML can be found at 
>>>> https://lkml.org/lkml/2012/10/2/296 for reference).
>>>>
>>>> How about the patch below? Pekka, Christoph, please?
>>>
>>> It turns out that lockdep is actually getting this wrong, so the changelog 
>>> in the previous version wasn't accurate.
>>>
>>> Please find patch with updated changelog below. Pekka, Christoph, could 
>>> you please check whether it makes sense to you as well? Thanks.
>>>
>>>
>>>
>>>
>>> From: Jiri Kosina <jkosina@suse.cz>
>>> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
>>>
>>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
>>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
>>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
>>> _rcu_barrier() -> get_online_cpus().
>>>
>>> Lockdep thinks that this might actually result in an ABBA deadlock,
>>> and reports it as below:
>>>
>>> === [ cut here ] ===
>>>  ======================================================
>>>  [ INFO: possible circular locking dependency detected ]
>>>  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
>>>  -------------------------------------------------------
>>>  kworker/u:2/40 is trying to acquire lock:
>>>   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>>>
>>>  but task is already holding lock:
>>>   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>>>
>>>  which lock already depends on the new lock.
>>>
>>>  the existing dependency chain (in reverse order) is:
>>>
>>>  -> #2 (slab_mutex){+.+.+.}:
>>>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>>>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>>>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>>>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>>>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>>>         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
>>>         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
>>>         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
>>>         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
>>>         [<ffffffff815572ed>] cpu_up+0xbc/0x117
>>>         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
>>>         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
>>>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>>>
>>>  -> #1 (cpu_hotplug.lock){+.+.+.}:
>>>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>>>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>>>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>>>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>>>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>>>         [<ffffffff81049197>] get_online_cpus+0x37/0x50
>>>         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
>>>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>>>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>>>         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
>>>         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
>>>         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
>>>         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
>>>         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
>>>
>>>  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
>>>         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>>>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>>>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>>>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>>>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>>>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>>>         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>>>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>>>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>>>         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>>>         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>>>         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>>>         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>>>         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>>>         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>>>         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>>>         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>>>         [<ffffffff8106f73e>] kthread+0x9e/0xb0
>>>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>>>
>>>  other info that might help us debug this:
>>>
>>>  Chain exists of:
>>>    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
>>>
>>>   Possible unsafe locking scenario:
>>>
>>>         CPU0                    CPU1
>>>         ----                    ----
>>>    lock(slab_mutex);
>>>                                 lock(cpu_hotplug.lock);
>>>                                 lock(slab_mutex);
>>>    lock(rcu_sched_state.barrier_mutex);
>>>
>>>   *** DEADLOCK ***
>>> === [ cut here ] ===
>>>
>>> This is actually a false positive. Lockdep has no way of knowing the fact
>>> that the ABBA can actually never happen, because of special semantics of
>>> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
>>> exclusion there is not achieved through the mutex, but through
>>> cpu_hotplug.refcount.
>>>
>>> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
>>> until everyone who called get_online_cpus() will call put_online_cpus()"
>>> semantics is totally invisible to lockdep.
>>>
>>> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
>>> is being called with it unlocked. It has two advantages:
>>>
>>> - it slightly reduces the hold time of slab_mutex; as it's used to protect
>>>   the cachep list, it's not necessary to hold it over the __kmem_cache_destroy()
>>>   call any more
>>> - it silences the lockdep false positive warning, as it avoids lockdep ever
>>>   learning about slab_mutex -> cpu_hotplug.lock dependency
>>>
>>> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>>> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
>>
>> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>
>> Earlier I was under the wrong impression that all the calltraces that lockdep
>> spewed were reflecting the same point in time. So, sorry about that noise! :-)
>> It is indeed a false-positive and there is no real bug here, and the fix looks
>> good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.
> 
> I am not so sure about it being a false positive.  Consider the following
> sequence of events:
> 
> o	Thread A starts a CPU-hotplug operation, acquiring the
> 	hotplug mutex.
> 
> o	Thread B does a kmem_cache_destroy(), acquiring the slab mutex.

This can't happen, because kmem_cache_destroy() calls get_online_cpus() before
trying to acquire the slab mutex. It sleeps at get_online_cpus(), since the
hotplug lock has already been acquired by Thread A.
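
For reference, the ordering at the top of kmem_cache_destroy() in mm/slab.c
is roughly the following (a simplified sketch, not the exact source):

void kmem_cache_destroy(struct kmem_cache *cachep)
{
	BUG_ON(!cachep || in_interrupt());

	/* become a hotplug reader first ... */
	get_online_cpus();

	/* ... and only then take the slab mutex */
	mutex_lock(&slab_mutex);

	/* ... actual teardown ... */
}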

> 
> o	Thread A reaches the slab CPU-hotplug notifier, but cannot acquire
> 	the slab mutex because Thread B hold it.
> 
> o	Thread B enters rcu_barrier(), but cannot acquire the hotplug
> 	mutex because Thread A holds it.
> 
> So I would argue that lockdep's output was a bit confusing, but that
> the deadlock it flagged is real.  Or am I still missing something?
>

So the key point is that Thread A is a hotplug writer, while Thread B becomes a
hotplug reader the moment it calls get_online_cpus(). They can't run concurrently;
they get serialized. That is, Thread A runs to completion and releases the hotplug
lock, and only then does Thread B get past get_online_cpus().
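
To make the serialization concrete, the relevant kernel/cpu.c code is roughly
the following (a simplified sketch of the 3.6-era implementation; lockdep
annotations and comments trimmed):

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;
	mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;		/* register as a hotplug reader */
	mutex_unlock(&cpu_hotplug.lock);
}

static void cpu_hotplug_begin(void)
{
	cpu_hotplug.active_writer = current;

	for (;;) {
		mutex_lock(&cpu_hotplug.lock);
		if (likely(!cpu_hotplug.refcount))
			break;	/* no readers left; writer proceeds, mutex held */
		__set_current_state(TASK_UNINTERRUPTIBLE);
		mutex_unlock(&cpu_hotplug.lock);
		schedule();	/* drop the mutex and wait for readers to drain */
	}
}

Note that the writer ends up holding cpu_hotplug.lock only once refcount has
dropped to zero; the refcount-based exclusion between readers and the writer
is exactly the part lockdep cannot see.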

Regards,
Srivatsa S. Bhat
  
>> But, I'm also quite surprised that the put_online_cpus() code as it stands today
>> doesn't have any checks for the refcount going negative. I believe that such a
>> check would be valuable to help catch cases where we might end up inadvertently
>> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
>> that as a separate patch.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>> ---
>>>  mm/slab.c |    2 +-
>>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/mm/slab.c b/mm/slab.c
>>> index 1133911..693c7cb 100644
>>> --- a/mm/slab.c
>>> +++ b/mm/slab.c
>>> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
>>>  		put_online_cpus();
>>>  		return;
>>>  	}
>>> +	mutex_unlock(&slab_mutex);
>>>
>>>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
>>>  		rcu_barrier();
>>>
>>>  	__kmem_cache_destroy(cachep);
>>> -	mutex_unlock(&slab_mutex);
>>>  	put_online_cpus();
>>>  }
>>>  EXPORT_SYMBOL(kmem_cache_destroy);
>>>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 14:34                   ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
@ 2012-10-03 15:00                     ` Srivatsa S. Bhat
  2012-10-03 15:05                       ` [PATCH v4] " Jiri Kosina
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 15:00 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Christoph Lameter, Pekka Enberg, Paul E. McKenney,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 08:04 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Christoph Lameter wrote:
> 
>>> How about the patch below? Pekka, Christoph, please?
>>
>> Looks fine for -stable. For upstream there is going to be a move to
>> slab_common coming in this merge period. We would need a fix against -next
>> or Pekka's tree too.
> 
> Thanks Christoph. Patch against Pekka's slab/for-linus branch below.
> 
> I have kept the Acked-by/Reviewed-by from the version of the patch against
> current Linus' tree; if anyone objects, please shout loudly. Ideally this
> should go in during this merge window to keep lockdep happy.
> 
> 
> 
> 
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
> 
[...] 
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
>  mm/slab_common.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c21725..90c3053 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  	s->refcount--;
>  	if (!s->refcount) {
>  		list_del(&s->list);
> +		mutex_unlock(&slab_mutex);
> 
>  		if (!__kmem_cache_shutdown(s)) {

__kmem_cache_shutdown() calls __cache_shrink(). And __cache_shrink() has this
comment over it:
/* Called with slab_mutex held to protect against cpu hotplug */

So, I guess the question is whether to modify your patch to hold the slab_mutex
while calling this function, or to update the comment on top of this function
saying that we are OK to call this function (even without slab_mutex) when we
are inside a get/put_online_cpus() section.

>  			if (s->flags & SLAB_DESTROY_BY_RCU)
> @@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  				s->name);
>  			dump_stack();

There is a list_add() before this dump_stack(). I assume we need to hold the
slab_mutex while calling it.

>  		}
> +	} else {
> +		mutex_unlock(&slab_mutex);
>  	}
> -	mutex_unlock(&slab_mutex);
>  	put_online_cpus();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
> 

Regards,
Srivatsa S. Bhat


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 15:00                     ` Srivatsa S. Bhat
@ 2012-10-03 15:05                       ` Jiri Kosina
  2012-10-03 15:49                         ` Srivatsa S. Bhat
  2012-10-03 18:49                         ` David Rientjes
  0 siblings, 2 replies; 46+ messages in thread
From: Jiri Kosina @ 2012-10-03 15:05 UTC (permalink / raw)
  To: Srivatsa S. Bhat, Christoph Lameter, Pekka Enberg
  Cc: Paul E. McKenney, Paul E. McKenney, Josh Triplett, linux-kernel,
	linux-mm

On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:

> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 9c21725..90c3053 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >  	s->refcount--;
> >  	if (!s->refcount) {
> >  		list_del(&s->list);
> > +		mutex_unlock(&slab_mutex);
> > 
> >  		if (!__kmem_cache_shutdown(s)) {
> 
> __kmem_cache_shutdown() calls __cache_shrink(). And __cache_shrink() has this
> comment over it:
> /* Called with slab_mutex held to protect against cpu hotplug */
> 
> So, I guess the question is whether to modify your patch to hold the slab_mutex
> while calling this function, or to update the comment on top of this function
> saying that we are OK to call this function (even without slab_mutex) when we
> are inside a get/put_online_cpus() section.
> 
> >  			if (s->flags & SLAB_DESTROY_BY_RCU)
> > @@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >  				s->name);
> >  			dump_stack();
> 
> There is a list_add() before this dump_stack(). I assume we need to hold the
> slab_mutex while calling it.

Gah, of course it is, thanks for spotting this.





From: Jiri Kosina <jkosina@suse.cz>
Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()

Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
dependency through kmem_cache_destroy() -> rcu_barrier() ->
_rcu_barrier() -> get_online_cpus().

Lockdep thinks that this might actually result in ABBA deadlock,
and reports it as below:

=== [ cut here ] ===
 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
 -------------------------------------------------------
 kworker/u:2/40 is trying to acquire lock:
  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0

 but task is already holding lock:
  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (slab_mutex){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
        [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
        [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff8155719d>] _cpu_up+0xba/0x14e
        [<ffffffff815572ed>] cpu_up+0xbc/0x117
        [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
        [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81049197>] get_online_cpus+0x37/0x50
        [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
        [<ffffffff8118cc01>] deactivate_super+0x61/0x70
        [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
        [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
        [<ffffffff81569979>] system_call_fastpath+0x16/0x1b

 -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
        [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
        [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
        [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
        [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
        [<ffffffff81454b79>] ops_exit_list+0x39/0x60
        [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
        [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
        [<ffffffff81069f3e>] worker_thread+0x12e/0x320
        [<ffffffff8106f73e>] kthread+0x9e/0xb0
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 other info that might help us debug this:

 Chain exists of:
   rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(cpu_hotplug.lock);
                                lock(slab_mutex);
   lock(rcu_sched_state.barrier_mutex);

  *** DEADLOCK ***
=== [ cut here ] ===

This is actually a false positive. Lockdep has no way of knowing the fact
that the ABBA can actually never happen, because of special semantics of
cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
exclusion there is not achieved through mutex, but through
cpu_hotplug.refcount.

The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() will call put_online_cpus()"
semantics is totally invisible to lockdep.

This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is being called with it unlocked. It has two advantages:

- it slightly reduces the hold time of slab_mutex; as it's used to protect
  the cachep list, it's not necessary to hold it over the kmem_cache_free()
  call any more
- it silences the lockdep false positive warning, as it avoids lockdep ever
  learning about slab_mutex -> cpu_hotplug.lock dependency

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 mm/slab_common.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c21725..069a24e6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		list_del(&s->list);
 
 		if (!__kmem_cache_shutdown(s)) {
+			mutex_unlock(&slab_mutex);
 			if (s->flags & SLAB_DESTROY_BY_RCU)
 				rcu_barrier();
 
@@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
 			kmem_cache_free(kmem_cache, s);
 		} else {
 			list_add(&s->list, &slab_caches);
+			mutex_unlock(&slab_mutex);
 			printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
 				s->name);
 			dump_stack();
 		}
+	} else {
+		mutex_unlock(&slab_mutex);
 	}
-	mutex_unlock(&slab_mutex);
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 15:05                       ` [PATCH v4] " Jiri Kosina
@ 2012-10-03 15:49                         ` Srivatsa S. Bhat
  2012-10-03 18:49                         ` David Rientjes
  1 sibling, 0 replies; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-03 15:49 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Christoph Lameter, Pekka Enberg, Paul E. McKenney,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/03/2012 08:35 PM, Jiri Kosina wrote:
> On Wed, 3 Oct 2012, Srivatsa S. Bhat wrote:
> 
>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>> index 9c21725..90c3053 100644
>>> --- a/mm/slab_common.c
>>> +++ b/mm/slab_common.c
>>> @@ -166,6 +166,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>>  	s->refcount--;
>>>  	if (!s->refcount) {
>>>  		list_del(&s->list);
>>> +		mutex_unlock(&slab_mutex);
>>>
>>>  		if (!__kmem_cache_shutdown(s)) {
>>
>> __kmem_cache_shutdown() calls __cache_shrink(). And __cache_shrink() has this
>> comment over it:
>> /* Called with slab_mutex held to protect against cpu hotplug */
>>
>> So, I guess the question is whether to modify your patch to hold the slab_mutex
>> while calling this function, or to update the comment on top of this function
>> saying that we are OK to call this function (even without slab_mutex) when we
>> are inside a get/put_online_cpus() section.
>>
>>>  			if (s->flags & SLAB_DESTROY_BY_RCU)
>>> @@ -179,8 +180,9 @@ void kmem_cache_destroy(struct kmem_cache *s)
>>>  				s->name);
>>>  			dump_stack();
>>
>> There is a list_add() before this dump_stack(). I assume we need to hold the
>> slab_mutex while calling it.
> 
> Gah, of course it is, thanks for spotting this.
> 
> 
> From: Jiri Kosina <jkosina@suse.cz>
> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> 
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
> 
[...]
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is being called with it unlocked. It has two advantages:
> 
> - it slightly reduces the hold time of slab_mutex; as it's used to protect
>   the cachep list, it's not necessary to hold it over the kmem_cache_free()
>   call any more
> - it silences the lockdep false positive warning, as it avoids lockdep ever
>   learning about slab_mutex -> cpu_hotplug.lock dependency
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Hmm.. We can't do much about readability I guess... :(
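
One conceivable restructuring (an untested sketch, not something I'm proposing
for this merge window) would decide the outcome under the mutex and keep a
single unlock site; the error printk then simply runs under the mutex, which
should be harmless:

void kmem_cache_destroy(struct kmem_cache *s)
{
	bool do_free = false;

	get_online_cpus();
	mutex_lock(&slab_mutex);
	s->refcount--;
	if (!s->refcount) {
		list_del(&s->list);
		if (!__kmem_cache_shutdown(s)) {
			do_free = true;	/* finish outside the mutex */
		} else {
			list_add(&s->list, &slab_caches);
			printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
				s->name);
			dump_stack();
		}
	}
	mutex_unlock(&slab_mutex);

	if (do_free) {
		/* rcu_barrier() may sleep; slab_mutex is no longer held */
		if (s->flags & SLAB_DESTROY_BY_RCU)
			rcu_barrier();
		kmem_cache_free(kmem_cache, s);
	}
	put_online_cpus();
}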

Regards,
Srivatsa S. Bhat

> ---
>  mm/slab_common.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c21725..069a24e6 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  		list_del(&s->list);
> 
>  		if (!__kmem_cache_shutdown(s)) {
> +			mutex_unlock(&slab_mutex);
>  			if (s->flags & SLAB_DESTROY_BY_RCU)
>  				rcu_barrier();
> 
> @@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
>  			kmem_cache_free(kmem_cache, s);
>  		} else {
>  			list_add(&s->list, &slab_caches);
> +			mutex_unlock(&slab_mutex);
>  			printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
>  				s->name);
>  			dump_stack();
>  		}
> +	} else {
> +		mutex_unlock(&slab_mutex);
>  	}
> -	mutex_unlock(&slab_mutex);
>  	put_online_cpus();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 14:55                       ` Srivatsa S. Bhat
@ 2012-10-03 16:00                         ` Paul E. McKenney
  0 siblings, 0 replies; 46+ messages in thread
From: Paul E. McKenney @ 2012-10-03 16:00 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Kosina, Christoph Lameter, Pekka Enberg, Paul E. McKenney,
	Josh Triplett, linux-kernel, linux-mm

On Wed, Oct 03, 2012 at 08:25:28PM +0530, Srivatsa S. Bhat wrote:
> On 10/03/2012 08:20 PM, Paul E. McKenney wrote:
> > On Wed, Oct 03, 2012 at 05:52:26PM +0530, Srivatsa S. Bhat wrote:
> >> On 10/03/2012 03:16 PM, Jiri Kosina wrote:
> >>> On Wed, 3 Oct 2012, Jiri Kosina wrote:
> >>>
> >>>> Good question. I believe it should be safe to drop slab_mutex earlier, as 
> >>>> cachep has already been unlinked. I am adding slab people and linux-mm to 
> >>>> CC (the whole thread on LKML can be found at 
> >>>> https://lkml.org/lkml/2012/10/2/296 for reference).
> >>>>
> >>>> How about the patch below? Pekka, Christoph, please?
> >>>
> >>> It turns out that lockdep is actually getting this wrong, so the changelog 
> >>> in the previous version wasn't accurate.
> >>>
> >>> Please find patch with updated changelog below. Pekka, Christoph, could 
> >>> you please check whether it makes sense to you as well? Thanks.
> >>>
> >>>
> >>>
> >>>
> >>> From: Jiri Kosina <jkosina@suse.cz>
> >>> Subject: [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
> >>>
> >>> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> >>> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> >>> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> >>> _rcu_barrier() -> get_online_cpus().
> >>>
> >>> Lockdep thinks that this might actually result in ABBA deadlock,
> >>> and reports it as below:
> >>>
> >>> === [ cut here ] ===
> >>>  ======================================================
> >>>  [ INFO: possible circular locking dependency detected ]
> >>>  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
> >>>  -------------------------------------------------------
> >>>  kworker/u:2/40 is trying to acquire lock:
> >>>   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> >>>
> >>>  but task is already holding lock:
> >>>   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> >>>
> >>>  which lock already depends on the new lock.
> >>>
> >>>  the existing dependency chain (in reverse order) is:
> >>>
> >>>  -> #2 (slab_mutex){+.+.+.}:
> >>>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >>>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >>>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >>>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >>>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >>>         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
> >>>         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
> >>>         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
> >>>         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
> >>>         [<ffffffff815572ed>] cpu_up+0xbc/0x117
> >>>         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
> >>>         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
> >>>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> >>>
> >>>  -> #1 (cpu_hotplug.lock){+.+.+.}:
> >>>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >>>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >>>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >>>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >>>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >>>         [<ffffffff81049197>] get_online_cpus+0x37/0x50
> >>>         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
> >>>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> >>>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> >>>         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
> >>>         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
> >>>         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
> >>>         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
> >>>         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> >>>
> >>>  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
> >>>         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
> >>>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
> >>>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
> >>>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
> >>>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
> >>>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
> >>>         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> >>>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
> >>>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
> >>>         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
> >>>         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
> >>>         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
> >>>         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
> >>>         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
> >>>         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
> >>>         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
> >>>         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
> >>>         [<ffffffff8106f73e>] kthread+0x9e/0xb0
> >>>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> >>>
> >>>  other info that might help us debug this:
> >>>
> >>>  Chain exists of:
> >>>    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> >>>
> >>>   Possible unsafe locking scenario:
> >>>
> >>>         CPU0                    CPU1
> >>>         ----                    ----
> >>>    lock(slab_mutex);
> >>>                                 lock(cpu_hotplug.lock);
> >>>                                 lock(slab_mutex);
> >>>    lock(rcu_sched_state.barrier_mutex);
> >>>
> >>>   *** DEADLOCK ***
> >>> === [ cut here ] ===
> >>>
> >>> This is actually a false positive. Lockdep has no way of knowing the fact
> >>> that the ABBA can actually never happen, because of special semantics of
> >>> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> >>> exclusion there is not achieved through mutex, but through
> >>> cpu_hotplug.refcount.
> >>>
> >>> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> >>> until everyone who called get_online_cpus() will call put_online_cpus()"
> >>> semantics is totally invisible to lockdep.
> >>>
> >>> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> >>> is being called with it unlocked. It has two advantages:
> >>>
> >>> - it slightly reduces the hold time of slab_mutex; as it's used to protect
> >>>   the cachep list, it's not necessary to hold it over the __kmem_cache_destroy()
> >>>   call any more
> >>> - it silences the lockdep false positive warning, as it avoids lockdep ever
> >>>   learning about slab_mutex -> cpu_hotplug.lock dependency
> >>>
> >>> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >>> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> >>
> >> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >>
> >> Earlier I was under the wrong impression that all the calltraces that lockdep
> >> spewed were reflecting the same point in time. So, sorry about that noise! :-)
> >> It is indeed a false-positive and there is no real bug here, and the fix looks
> >> good, provided __kmem_cache_destroy() doesn't expect slab_mutex to be held.
> > 
> > I am not so sure about it being a false positive.  Consider the following
> > sequence of events:
> > 
> > o	Thread A starts a CPU-hotplug operation, acquiring the
> > 	hotplug mutex.
> > 
> > o	Thread B does a kmem_cache_destroy(), acquiring the slab mutex.
> 
> This can't happen, because kmem_cache_destroy() calls get_online_cpus() before
> trying to acquire the slab mutex. It sleeps at get_online_cpus(), since the
> hotplug lock has already been acquired by Thread A.

Good point!!!  False positive it is!

							Thanx, Paul

> > o	Thread A reaches the slab CPU-hotplug notifier, but cannot acquire
> > 	the slab mutex because Thread B hold it.
> > 
> > o	Thread B enters rcu_barrier(), but cannot acquire the hotplug
> > 	mutex because Thread A holds it.
> > 
> > So I would argue that lockdep's output was a bit confusing, but that
> > the deadlock it flagged is real.  Or am I still missing something?
> >
> 
> So the key point is that Thread A is a hotplug writer, while Thread B becomes a
> hotplug reader the moment it calls get_online_cpus(). They can't run concurrently;
> they get serialized. That is, Thread A runs to completion and releases the hotplug
> lock, and only then does Thread B get past get_online_cpus().
> 
> Regards,
> Srivatsa S. Bhat
>   
> >> But, I'm also quite surprised that the put_online_cpus() code as it stands today
> >> doesn't have any checks for the refcount going negative. I believe that such a
> >> check would be valuable to help catch cases where we might end up inadvertently
> >> causing an imbalance between get_online_cpus() and put_online_cpus(). I'll post
> >> that as a separate patch.
> >>
> >> Regards,
> >> Srivatsa S. Bhat
> >>
> >>> ---
> >>>  mm/slab.c |    2 +-
> >>>  1 files changed, 1 insertions(+), 1 deletions(-)
> >>>
> >>> diff --git a/mm/slab.c b/mm/slab.c
> >>> index 1133911..693c7cb 100644
> >>> --- a/mm/slab.c
> >>> +++ b/mm/slab.c
> >>> @@ -2801,12 +2801,12 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
> >>>  		put_online_cpus();
> >>>  		return;
> >>>  	}
> >>> +	mutex_unlock(&slab_mutex);
> >>>
> >>>  	if (unlikely(cachep->flags & SLAB_DESTROY_BY_RCU))
> >>>  		rcu_barrier();
> >>>
> >>>  	__kmem_cache_destroy(cachep);
> >>> -	mutex_unlock(&slab_mutex);
> >>>  	put_online_cpus();
> >>>  }
> >>>  EXPORT_SYMBOL(kmem_cache_destroy);
> >>>
> >>
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 15:05                       ` [PATCH v4] " Jiri Kosina
  2012-10-03 15:49                         ` Srivatsa S. Bhat
@ 2012-10-03 18:49                         ` David Rientjes
  2012-10-08  7:26                           ` [PATCH] [RESEND] " Jiri Kosina
  1 sibling, 1 reply; 46+ messages in thread
From: David Rientjes @ 2012-10-03 18:49 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Srivatsa S. Bhat, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Paul E. McKenney, Josh Triplett, linux-kernel,
	linux-mm

On Wed, 3 Oct 2012, Jiri Kosina wrote:

> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock
> dependency through kmem_cache_destroy() -> rcu_barrier() ->
> _rcu_barrier() -> get_online_cpus().
> 
> Lockdep thinks that this might actually result in ABBA deadlock,
> and reports it as below:
> 
> === [ cut here ] ===
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
>  -------------------------------------------------------
>  kworker/u:2/40 is trying to acquire lock:
>   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
> 
>  but task is already holding lock:
>   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
> 
>  which lock already depends on the new lock.
> 
>  the existing dependency chain (in reverse order) is:
> 
>  -> #2 (slab_mutex){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
>         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
>         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
>         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
>         [<ffffffff815572ed>] cpu_up+0xbc/0x117
>         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
>         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> 
>  -> #1 (cpu_hotplug.lock){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81049197>] get_online_cpus+0x37/0x50
>         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
>         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
>         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
>         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
>         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
> 
>  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
>         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>         [<ffffffff8106f73e>] kthread+0x9e/0xb0
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
> 
>  other info that might help us debug this:
> 
>  Chain exists of:
>    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(slab_mutex);
>                                 lock(cpu_hotplug.lock);
>                                 lock(slab_mutex);
>    lock(rcu_sched_state.barrier_mutex);
> 
>   *** DEADLOCK ***
> === [ cut here ] ===
> 
> This is actually a false positive. Lockdep has no way of knowing the fact
> that the ABBA can actually never happen, because of special semantics of
> cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
> exclusion there is not achieved through mutex, but through
> cpu_hotplug.refcount.
> 
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() will call put_online_cpus()"
> semantics is totally invisible to lockdep.
> 
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is being called with it unlocked. It has two advantages:
> 
> - it slightly reduces the hold time of slab_mutex; as it's used to protect
>   the cachep list, it's not necessary to hold it over the kmem_cache_free()
>   call any more
> - it silences the lockdep false positive warning, as it avoids lockdep ever
>   learning about slab_mutex -> cpu_hotplug.lock dependency
> 
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Acked-by: David Rientjes <rientjes@google.com>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
  2012-10-03 12:53                     ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
@ 2012-10-03 21:13                       ` Andrew Morton
  2012-10-04  6:16                         ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Andrew Morton @ 2012-10-03 21:13 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On Wed, 03 Oct 2012 18:23:09 +0530
"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:

> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
> 
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative.
> 
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  kernel/cpu.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..00d29bc 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>  	if (cpu_hotplug.active_writer == current)
>  		return;
>  	mutex_lock(&cpu_hotplug.lock);
> +	BUG_ON(cpu_hotplug.refcount == 0);
>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>  		wake_up_process(cpu_hotplug.active_writer);
>  	mutex_unlock(&cpu_hotplug.lock);

I think calling BUG() here is a bit harsh.  We should only do that if
there's a risk to proceeding: a risk of data loss, a reduced ability to
analyse the underlying bug, etc.

But a cpu-hotplug locking imbalance is a really really really minor
problem!  So how about we emit a warning then try to fix things up? 
This should increase the chance that the machine will keep running and
so will increase the chance that a user will be able to report the bug
to us.


--- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
+++ a/kernel/cpu.c
@@ -80,9 +80,12 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
-	BUG_ON(cpu_hotplug.refcount == 0);
-	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
-		wake_up_process(cpu_hotplug.active_writer);
+	if (!--cpu_hotplug.refcount) {
+		if (WARN_ON(cpu_hotplug.refcount == -1))
+			cpu_hotplug.refcount++;	/* try to fix things up */
+		if (unlikely(cpu_hotplug.active_writer))
+			wake_up_process(cpu_hotplug.active_writer);
+	}
 	mutex_unlock(&cpu_hotplug.lock);
 
 }
_


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
  2012-10-03 21:13                       ` Andrew Morton
@ 2012-10-04  6:16                         ` Srivatsa S. Bhat
  2012-10-05  3:24                           ` Yasuaki Ishimatsu
  0 siblings, 1 reply; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-04  6:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Paul E. McKenney, Christoph Lameter, Pekka Enberg,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm

On 10/04/2012 02:43 AM, Andrew Morton wrote:
> On Wed, 03 Oct 2012 18:23:09 +0530
> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
> 
>> The synchronization between CPU hotplug readers and writers is achieved by
>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
>> it. If we ever hit an imbalance between the two, we end up compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra call to
>> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
>> where the refcount can go negative.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>>  kernel/cpu.c |    1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..00d29bc 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>  	if (cpu_hotplug.active_writer == current)
>>  		return;
>>  	mutex_lock(&cpu_hotplug.lock);
>> +	BUG_ON(cpu_hotplug.refcount == 0);
>>  	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>  		wake_up_process(cpu_hotplug.active_writer);
>>  	mutex_unlock(&cpu_hotplug.lock);
> 
> I think calling BUG() here is a bit harsh.  We should only do that if
> there's a risk to proceeding: a risk of data loss, a reduced ability to
> analyse the underlying bug, etc.
> 
> But a cpu-hotplug locking imbalance is a really really really minor
> problem!  So how about we emit a warning then try to fix things up? 

That would be better indeed, thanks!

> This should increase the chance that the machine will keep running and
> so will increase the chance that a user will be able to report the bug
> to us.
>

Yep, sounds good.
 
> 
> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
> +++ a/kernel/cpu.c
> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>  	if (cpu_hotplug.active_writer == current)
>  		return;
>  	mutex_lock(&cpu_hotplug.lock);
> -	BUG_ON(cpu_hotplug.refcount == 0);
> -	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
> -		wake_up_process(cpu_hotplug.active_writer);
> +	if (!--cpu_hotplug.refcount) {

This won't catch it. We'll enter this 'if' condition only when cpu_hotplug.refcount
was decremented to exactly zero. We'll miss the case when it went negative (which is
what we intended to detect).

> +		if (WARN_ON(cpu_hotplug.refcount == -1))
> +			cpu_hotplug.refcount++;	/* try to fix things up */
> +		if (unlikely(cpu_hotplug.active_writer))
> +			wake_up_process(cpu_hotplug.active_writer);
> +	}
>  	mutex_unlock(&cpu_hotplug.lock);
> 
>  }
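
A userspace toy shows the miss directly (a hypothetical illustration only, not
kernel code):

#include <stdio.h>

int main(void)
{
	int refcount = 0;	/* imbalanced: put without a matching get */

	/* mirrors "if (!--cpu_hotplug.refcount)": 0 pre-decrements to -1,
	 * so the branch is not taken and the inner WARN_ON is unreachable */
	if (!--refcount)
		printf("branch taken, WARN_ON would be reachable\n");
	else
		printf("branch skipped, refcount = %d, warning missed\n",
			refcount);
	return 0;
}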

So how about something like below:

------------------------------------------------------>

From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()

The synchronization between CPU hotplug readers and writers is achieved by
means of refcounting, safe-guarded by the cpu_hotplug.lock.

get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
it. If we ever hit an imbalance between the two, we end up compromising the
guarantees of the hotplug synchronization i.e, for example, an extra call to
put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
where the refcount can go negative, and also attempt to fix it up, so that we can
continue to run.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
---

 kernel/cpu.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index f560598..42bd331 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -80,6 +80,10 @@ void put_online_cpus(void)
 	if (cpu_hotplug.active_writer == current)
 		return;
 	mutex_lock(&cpu_hotplug.lock);
+
+	if (WARN_ON(!cpu_hotplug.refcount))
+		cpu_hotplug.refcount++; /* try to fix things up */
+
 	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
 		wake_up_process(cpu_hotplug.active_writer);
 	mutex_unlock(&cpu_hotplug.lock);



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
  2012-10-04  6:16                         ` Srivatsa S. Bhat
@ 2012-10-05  3:24                           ` Yasuaki Ishimatsu
  2012-10-05  5:35                             ` Srivatsa S. Bhat
  0 siblings, 1 reply; 46+ messages in thread
From: Yasuaki Ishimatsu @ 2012-10-05  3:24 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Andrew Morton, Jiri Kosina, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Christoph Lameter,
	Pekka Enberg, Paul E. McKenney, Josh Triplett, linux-kernel,
	linux-mm

2012/10/04 15:16, Srivatsa S. Bhat wrote:
> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>> On Wed, 03 Oct 2012 18:23:09 +0530
>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>
>>> The synchronization between CPU hotplug readers and writers is achieved by
>>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>>
>>> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
>>> it. If we ever hit an imbalance between the two, we end up compromising the
>>> guarantees of the hotplug synchronization i.e, for example, an extra call to
>>> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
>>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect such cases
>>> where the refcount can go negative.
>>>
>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>> ---
>>>
>>>   kernel/cpu.c |    1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>> index f560598..00d29bc 100644
>>> --- a/kernel/cpu.c
>>> +++ b/kernel/cpu.c
>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>   	if (cpu_hotplug.active_writer == current)
>>>   		return;
>>>   	mutex_lock(&cpu_hotplug.lock);
>>> +	BUG_ON(cpu_hotplug.refcount == 0);
>>>   	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>>   		wake_up_process(cpu_hotplug.active_writer);
>>>   	mutex_unlock(&cpu_hotplug.lock);
>>
>> I think calling BUG() here is a bit harsh.  We should only do that if
>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>> analyse the underlying bug, etc.
>>
>> But a cpu-hotplug locking imbalance is a really really really minor
>> problem!  So how about we emit a warning then try to fix things up?
>
> That would be better indeed, thanks!
>
>> This should increase the chance that the machine will keep running and
>> so will increase the chance that a user will be able to report the bug
>> to us.
>>
>
> Yep, sounds good.
>
>>
>> --- a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>> +++ a/kernel/cpu.c
>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>>   	if (cpu_hotplug.active_writer == current)
>>   		return;
>>   	mutex_lock(&cpu_hotplug.lock);
>> -	BUG_ON(cpu_hotplug.refcount == 0);
>> -	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>> -		wake_up_process(cpu_hotplug.active_writer);
>> +	if (!--cpu_hotplug.refcount) {
>
> This won't catch it. We'll enter this 'if' condition only when cpu_hotplug.refcount
> was decremented to exactly zero. We'll miss the case when it went negative (which is
> what we intended to detect).
>
>> +		if (WARN_ON(cpu_hotplug.refcount == -1))
>> +			cpu_hotplug.refcount++;	/* try to fix things up */
>> +		if (unlikely(cpu_hotplug.active_writer))
>> +			wake_up_process(cpu_hotplug.active_writer);
>> +	}
>>   	mutex_unlock(&cpu_hotplug.lock);
>>
>>   }
>
> So how about something like below:
>
> ------------------------------------------------------>
>
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
>
> The synchronization between CPU hotplug readers and writers is achieved by
> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>
> get_online_cpus() increments the refcount, whereas put_online_cpus() decrements
> it. If we ever hit an imbalance between the two, we end up compromising the
> guarantees of the hotplug synchronization i.e, for example, an extra call to
> put_online_cpus() can end up allowing a hotplug reader to execute concurrently with
> a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect such cases
> where the refcount can go negative, and also attempt to fix it up, so that we can
> continue to run.
>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---

Looks good to me.
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

>
>   kernel/cpu.c |    4 ++++
>   1 file changed, 4 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f560598..42bd331 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -80,6 +80,10 @@ void put_online_cpus(void)
>   	if (cpu_hotplug.active_writer == current)
>   		return;
>   	mutex_lock(&cpu_hotplug.lock);
> +
> +	if (WARN_ON(!cpu_hotplug.refcount))
> +		cpu_hotplug.refcount++; /* try to fix things up */
> +
>   	if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>   		wake_up_process(cpu_hotplug.active_writer);
>   	mutex_unlock(&cpu_hotplug.lock);
>
>



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus()
  2012-10-05  3:24                           ` Yasuaki Ishimatsu
@ 2012-10-05  5:35                             ` Srivatsa S. Bhat
  0 siblings, 0 replies; 46+ messages in thread
From: Srivatsa S. Bhat @ 2012-10-05  5:35 UTC (permalink / raw)
  To: Yasuaki Ishimatsu
  Cc: Andrew Morton, Jiri Kosina, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Paul E. McKenney, Christoph Lameter,
	Pekka Enberg, Paul E. McKenney, Josh Triplett, linux-kernel,
	linux-mm

On 10/05/2012 08:54 AM, Yasuaki Ishimatsu wrote:
> 2012/10/04 15:16, Srivatsa S. Bhat wrote:
>> On 10/04/2012 02:43 AM, Andrew Morton wrote:
>>> On Wed, 03 Oct 2012 18:23:09 +0530
>>> "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> wrote:
>>>
>>>> The synchronization between CPU hotplug readers and writers is
>>>> achieved by
>>>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>>>
>>>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>>>> decrements
>>>> it. If we ever hit an imbalance between the two, we end up
>>>> compromising the
>>>> guarantees of the hotplug synchronization i.e, for example, an extra
>>>> call to
>>>> put_online_cpus() can end up allowing a hotplug reader to execute
>>>> concurrently with
>>>> a hotplug writer. So, add a BUG_ON() in put_online_cpus() to detect
>>>> such cases
>>>> where the refcount can go negative.
>>>>
>>>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>>>> ---
>>>>
>>>>   kernel/cpu.c |    1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>>>> index f560598..00d29bc 100644
>>>> --- a/kernel/cpu.c
>>>> +++ b/kernel/cpu.c
>>>> @@ -80,6 +80,7 @@ void put_online_cpus(void)
>>>>       if (cpu_hotplug.active_writer == current)
>>>>           return;
>>>>       mutex_lock(&cpu_hotplug.lock);
>>>> +    BUG_ON(cpu_hotplug.refcount == 0);
>>>>       if (!--cpu_hotplug.refcount &&
>>>> unlikely(cpu_hotplug.active_writer))
>>>>           wake_up_process(cpu_hotplug.active_writer);
>>>>       mutex_unlock(&cpu_hotplug.lock);
>>>
>>> I think calling BUG() here is a bit harsh.  We should only do that if
>>> there's a risk to proceeding: a risk of data loss, a reduced ability to
>>> analyse the underlying bug, etc.
>>>
>>> But a cpu-hotplug locking imbalance is a really really really minor
>>> problem!  So how about we emit a warning then try to fix things up?
>>
>> That would be better indeed, thanks!
>>
>>> This should increase the chance that the machine will keep running and
>>> so will increase the chance that a user will be able to report the bug
>>> to us.
>>>
>>
>> Yep, sounds good.
>>
>>>
>>> ---
>>> a/kernel/cpu.c~cpu-hotplug-debug-detect-imbalance-between-get_online_cpus-and-put_online_cpus-fix
>>>
>>> +++ a/kernel/cpu.c
>>> @@ -80,9 +80,12 @@ void put_online_cpus(void)
>>>       if (cpu_hotplug.active_writer == current)
>>>           return;
>>>       mutex_lock(&cpu_hotplug.lock);
>>> -    BUG_ON(cpu_hotplug.refcount == 0);
>>> -    if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>> -        wake_up_process(cpu_hotplug.active_writer);
>>> +    if (!--cpu_hotplug.refcount) {
>>
>> This won't catch it. We'll enter this 'if' condition only when
>> cpu_hotplug.refcount was decremented to exactly zero. We'll miss the case
>> when it went negative (which is what we intended to detect).
>>
>>> +        if (WARN_ON(cpu_hotplug.refcount == -1))
>>> +            cpu_hotplug.refcount++;    /* try to fix things up */
>>> +        if (unlikely(cpu_hotplug.active_writer))
>>> +            wake_up_process(cpu_hotplug.active_writer);
>>> +    }
>>>       mutex_unlock(&cpu_hotplug.lock);
>>>
>>>   }
>>
>> So how about something like below:
>>
>> ------------------------------------------------------>
>>
>> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> Subject: [PATCH] CPU hotplug, debug: Detect imbalance between
>> get_online_cpus() and put_online_cpus()
>>
>> The synchronization between CPU hotplug readers and writers is
>> achieved by
>> means of refcounting, safe-guarded by the cpu_hotplug.lock.
>>
>> get_online_cpus() increments the refcount, whereas put_online_cpus()
>> decrements
>> it. If we ever hit an imbalance between the two, we end up
>> compromising the
>> guarantees of the hotplug synchronization i.e, for example, an extra
>> call to
>> put_online_cpus() can end up allowing a hotplug reader to execute
>> concurrently with
>> a hotplug writer. So, add a WARN_ON() in put_online_cpus() to detect
>> such cases
>> where the refcount can go negative, and also attempt to fix it up, so
>> that we can
>> continue to run.
>>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
> 
> Looks good to me.
> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
> 

Thanks for your review Yasuaki!

Regards,
Srivatsa S. Bhat

>>
>>   kernel/cpu.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/cpu.c b/kernel/cpu.c
>> index f560598..42bd331 100644
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -80,6 +80,10 @@ void put_online_cpus(void)
>>       if (cpu_hotplug.active_writer == current)
>>           return;
>>       mutex_lock(&cpu_hotplug.lock);
>> +
>> +    if (WARN_ON(!cpu_hotplug.refcount))
>> +        cpu_hotplug.refcount++; /* try to fix things up */
>> +
>>       if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
>>           wake_up_process(cpu_hotplug.active_writer);
>>       mutex_unlock(&cpu_hotplug.lock);
>>
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] [RESEND] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-03 18:49                         ` David Rientjes
@ 2012-10-08  7:26                           ` Jiri Kosina
  2012-10-10  6:27                             ` Pekka Enberg
  0 siblings, 1 reply; 46+ messages in thread
From: Jiri Kosina @ 2012-10-08  7:26 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Srivatsa S. Bhat, Christoph Lameter, Paul E. McKenney,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm,
	David Rientjes

Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on 
__stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency 
through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() -> 
get_online_cpus().

Lockdep thinks that this might actually result in ABBA deadlock,
and reports it as below:

=== [ cut here ] ===
 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
 -------------------------------------------------------
 kworker/u:2/40 is trying to acquire lock:
  (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0

 but task is already holding lock:
  (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (slab_mutex){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
        [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
        [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
        [<ffffffff8155719d>] _cpu_up+0xba/0x14e
        [<ffffffff815572ed>] cpu_up+0xbc/0x117
        [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
        [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff81049197>] get_online_cpus+0x37/0x50
        [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
        [<ffffffff8118cc01>] deactivate_super+0x61/0x70
        [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
        [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
        [<ffffffff81569979>] system_call_fastpath+0x16/0x1b

 -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
        [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
        [<ffffffff810ae1e2>] validate_chain+0x632/0x720
        [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
        [<ffffffff810ae921>] lock_acquire+0x121/0x190
        [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
        [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
        [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
        [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
        [<ffffffff810f2309>] rcu_barrier+0x9/0x10
        [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
        [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
        [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
        [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
        [<ffffffff81454b79>] ops_exit_list+0x39/0x60
        [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
        [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
        [<ffffffff81069f3e>] worker_thread+0x12e/0x320
        [<ffffffff8106f73e>] kthread+0x9e/0xb0
        [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10

 other info that might help us debug this:

 Chain exists of:
   rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(slab_mutex);
                                lock(cpu_hotplug.lock);
                                lock(slab_mutex);
   lock(rcu_sched_state.barrier_mutex);

  *** DEADLOCK ***
=== [ cut here ] ===

This is actually a false positive. Lockdep has no way of knowing that the
ABBA deadlock can never actually happen, because of the special semantics
of cpu_hotplug.refcount and its handling in cpu_hotplug_begin(): the mutual
exclusion there is achieved not through the mutex, but through
cpu_hotplug.refcount.

The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
until everyone who called get_online_cpus() will call put_online_cpus()"
semantics is totally invisible to lockdep.
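
For reference, a simplified paraphrase of the relevant kernel/cpu.c code
(abridged; error handling and details omitted): the writer never blocks on
a lock that the readers hold, it just sleeps until the reader refcount
drains to zero, and that form of exclusion is exactly what lockdep cannot
model.

    void get_online_cpus(void)
    {
        might_sleep();
        if (cpu_hotplug.active_writer == current)
            return;
        mutex_lock(&cpu_hotplug.lock);
        cpu_hotplug.refcount++;           /* one more reader */
        mutex_unlock(&cpu_hotplug.lock);  /* not held while "reading" */
    }

    static void cpu_hotplug_begin(void)
    {
        cpu_hotplug.active_writer = current;

        for (;;) {
            mutex_lock(&cpu_hotplug.lock);
            if (likely(!cpu_hotplug.refcount))
                break;        /* no readers left; proceed, lock held */
            __set_current_state(TASK_UNINTERRUPTIBLE);
            mutex_unlock(&cpu_hotplug.lock);
            schedule();       /* woken by put_online_cpus() */
        }
    }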

This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
is called with it unlocked. This has two advantages:

- it slightly reduces the hold time of slab_mutex; since the mutex only
  protects the list of caches, there is no need to keep holding it across
  the kmem_cache_free() call
- it silences the lockdep false positive, because it keeps lockdep from
  ever learning about the slab_mutex -> cpu_hotplug.lock dependency

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 mm/slab_common.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 9c21725..069a24e6 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
 		list_del(&s->list);
 
 		if (!__kmem_cache_shutdown(s)) {
+			mutex_unlock(&slab_mutex);
 			if (s->flags & SLAB_DESTROY_BY_RCU)
 				rcu_barrier();
 
@@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
 			kmem_cache_free(kmem_cache, s);
 		} else {
 			list_add(&s->list, &slab_caches);
+			mutex_unlock(&slab_mutex);
 			printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
 				s->name);
 			dump_stack();
 		}
+	} else {
+		mutex_unlock(&slab_mutex);
 	}
-	mutex_unlock(&slab_mutex);
 	put_online_cpus();
 }
 EXPORT_SYMBOL(kmem_cache_destroy);

-- 
Jiri Kosina
SUSE Labs


* Re: [PATCH] [RESEND] mm, slab: release slab_mutex earlier in kmem_cache_destroy()
  2012-10-08  7:26                           ` [PATCH] [RESEND] " Jiri Kosina
@ 2012-10-10  6:27                             ` Pekka Enberg
  0 siblings, 0 replies; 46+ messages in thread
From: Pekka Enberg @ 2012-10-10  6:27 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Srivatsa S. Bhat, Christoph Lameter, Paul E. McKenney,
	Paul E. McKenney, Josh Triplett, linux-kernel, linux-mm,
	David Rientjes

On Mon, Oct 8, 2012 at 10:26 AM, Jiri Kosina <jkosina@suse.cz> wrote:
> Commit 1331e7a1bbe1 ("rcu: Remove _rcu_barrier() dependency on
> __stop_machine()") introduced a slab_mutex -> cpu_hotplug.lock dependency
> through the kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() ->
> get_online_cpus() call chain.
>
> Lockdep thinks that this might actually result in an ABBA deadlock,
> and reports it as follows:
>
> === [ cut here ] ===
>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.6.0-rc5-00004-g0d8ee37 #143 Not tainted
>  -------------------------------------------------------
>  kworker/u:2/40 is trying to acquire lock:
>   (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>
>  but task is already holding lock:
>   (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
>
>  which lock already depends on the new lock.
>
>  the existing dependency chain (in reverse order) is:
>
>  -> #2 (slab_mutex){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
>         [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
>         [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
>         [<ffffffff8155719d>] _cpu_up+0xba/0x14e
>         [<ffffffff815572ed>] cpu_up+0xbc/0x117
>         [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
>         [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
>  -> #1 (cpu_hotplug.lock){+.+.+.}:
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff81049197>] get_online_cpus+0x37/0x50
>         [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
>         [<ffffffff8118cc01>] deactivate_super+0x61/0x70
>         [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
>         [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
>         [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
>
>  -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
>         [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
>         [<ffffffff810ae1e2>] validate_chain+0x632/0x720
>         [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
>         [<ffffffff810ae921>] lock_acquire+0x121/0x190
>         [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
>         [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
>         [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
>         [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
>         [<ffffffff810f2309>] rcu_barrier+0x9/0x10
>         [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
>         [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
>         [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
>         [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
>         [<ffffffff81454b79>] ops_exit_list+0x39/0x60
>         [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
>         [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
>         [<ffffffff81069f3e>] worker_thread+0x12e/0x320
>         [<ffffffff8106f73e>] kthread+0x9e/0xb0
>         [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
>
>  other info that might help us debug this:
>
>  Chain exists of:
>    rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(slab_mutex);
>                                 lock(cpu_hotplug.lock);
>                                 lock(slab_mutex);
>    lock(rcu_sched_state.barrier_mutex);
>
>   *** DEADLOCK ***
> === [ cut here ] ===
>
> This is actually a false positive. Lockdep has no way of knowing that the
> ABBA deadlock can never actually happen, because of the special semantics
> of cpu_hotplug.refcount and its handling in cpu_hotplug_begin(): the mutual
> exclusion there is achieved not through the mutex, but through
> cpu_hotplug.refcount.
>
> The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
> until everyone who called get_online_cpus() will call put_online_cpus()"
> semantics is totally invisible to lockdep.
>
> This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
> is called with it unlocked. This has two advantages:
>
> - it slightly reduces the hold time of slab_mutex; since the mutex only
>   protects the list of caches, there is no need to keep holding it across
>   the kmem_cache_free() call
> - it silences the lockdep false positive, because it keeps lockdep from
>   ever learning about the slab_mutex -> cpu_hotplug.lock dependency
>
> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Acked-by: David Rientjes <rientjes@google.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
>  mm/slab_common.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 9c21725..069a24e6 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -168,6 +168,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>                 list_del(&s->list);
>
>                 if (!__kmem_cache_shutdown(s)) {
> +                       mutex_unlock(&slab_mutex);
>                         if (s->flags & SLAB_DESTROY_BY_RCU)
>                                 rcu_barrier();
>
> @@ -175,12 +176,14 @@ void kmem_cache_destroy(struct kmem_cache *s)
>                         kmem_cache_free(kmem_cache, s);
>                 } else {
>                         list_add(&s->list, &slab_caches);
> +                       mutex_unlock(&slab_mutex);
>                         printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
>                                 s->name);
>                         dump_stack();
>                 }
> +       } else {
> +               mutex_unlock(&slab_mutex);
>         }
> -       mutex_unlock(&slab_mutex);
>         put_online_cpus();
>  }
>  EXPORT_SYMBOL(kmem_cache_destroy);

Applied, thanks!


end of thread

Thread overview: 46+ messages
2012-10-02 16:14 Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Jiri Kosina
2012-10-02 17:01 ` Paul E. McKenney
2012-10-02 21:27   ` Jiri Kosina
2012-10-02 21:49     ` Jiri Kosina
2012-10-02 21:58       ` Jiri Kosina
2012-10-02 23:31         ` Paul E. McKenney
2012-10-02 23:48           ` Jiri Kosina
2012-10-03  0:15             ` Paul E. McKenney
2012-10-03  0:45               ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Jiri Kosina
2012-10-03  3:41                 ` Paul E. McKenney
2012-10-03  3:50                 ` Srivatsa S. Bhat
2012-10-03  6:08                   ` Srivatsa S. Bhat
2012-10-03  8:21                     ` Srivatsa S. Bhat
2012-10-03  9:46                 ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 12:22                   ` Srivatsa S. Bhat
2012-10-03 12:53                     ` [PATCH] CPU hotplug, debug: Detect imbalance between get_online_cpus() and put_online_cpus() Srivatsa S. Bhat
2012-10-03 21:13                       ` Andrew Morton
2012-10-04  6:16                         ` Srivatsa S. Bhat
2012-10-05  3:24                           ` Yasuaki Ishimatsu
2012-10-05  5:35                             ` Srivatsa S. Bhat
2012-10-03 14:50                     ` [PATCH v2] [RFC] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Paul E. McKenney
2012-10-03 14:55                       ` Srivatsa S. Bhat
2012-10-03 16:00                         ` Paul E. McKenney
2012-10-03 14:17                   ` Christoph Lameter
2012-10-03 14:15                 ` [PATCH] mm, slab: release slab_mutex earlier in kmem_cache_destroy() (was Re: Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()")) Christoph Lameter
2012-10-03 14:34                   ` [PATCH v3] mm, slab: release slab_mutex earlier in kmem_cache_destroy() Jiri Kosina
2012-10-03 15:00                     ` Srivatsa S. Bhat
2012-10-03 15:05                       ` [PATCH v4] " Jiri Kosina
2012-10-03 15:49                         ` Srivatsa S. Bhat
2012-10-03 18:49                         ` David Rientjes
2012-10-08  7:26                           ` [PATCH] [RESEND] " Jiri Kosina
2012-10-10  6:27                             ` Pekka Enberg
2012-10-03  3:59           ` Lockdep complains about commit 1331e7a1bb ("rcu: Remove _rcu_barrier() dependency on __stop_machine()") Srivatsa S. Bhat
2012-10-03  4:07             ` Paul E. McKenney
2012-10-03  4:15               ` Srivatsa S. Bhat
2012-10-02 20:39 ` Srivatsa S. Bhat
2012-10-02 22:17   ` Jiri Kosina
2012-10-03  3:35     ` Srivatsa S. Bhat
2012-10-03  3:44       ` Paul E. McKenney
2012-10-03  4:04         ` Srivatsa S. Bhat
2012-10-03  7:43           ` Jiri Kosina
2012-10-03  8:11             ` Srivatsa S. Bhat
2012-10-03  8:19               ` Jiri Kosina
2012-10-03  8:30                 ` Srivatsa S. Bhat
2012-10-03  9:24                   ` Jiri Kosina
2012-10-03  9:58                     ` Srivatsa S. Bhat
