* Query regarding synchronize_sched_expedited and resched_cpu
@ 2017-09-15 11:14 Neeraj Upadhyay
  2017-09-17  1:00 ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Neeraj Upadhyay @ 2017-09-15 11:14 UTC (permalink / raw)
  To: paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai
  Cc: linux-kernel, sramana, prsood

Hi,

We have a query regarding the behavior of the RCU expedited grace
period in the scenario where resched_cpu(), called from
sync_sched_exp_handler(), fails to acquire the rq lock and returns
without setting need_resched. In this case, how do we ensure that the
CPU notifies RCU about the end of the sched grace period (schedule() ->
__schedule() -> rcu_note_context_switch(cpu) -> rcu_sched_qs()) when
the tick is stopped on that CPU?  Is it implied by the rq lock
acquisition failure that the owner of the rq lock will enforce a
context switch?  For which scenarios in the RCU paths (the function is
used only from RCU code) do we need the trylock check in resched_cpu()?

void resched_cpu(int cpu)
{
         struct rq *rq = cpu_rq(cpu);
         unsigned long flags;

         if (!raw_spin_trylock_irqsave(&rq->lock, flags))
                 return;
         resched_curr(rq);
         raw_spin_unlock_irqrestore(&rq->lock, flags);
}


This issue was observed in the scenario below, where one of the CPUs
(CPU1) started synchronize_sched_expedited() and sent an IPI to CPU5,
which was in the idle path but handled the sync_sched_exp_handler()
IPI before rcu_idle_enter().
As resched_cpu() failed to acquire the rq lock, need_resched was not
set and the CPU went idle, resulting in an expedited stall being
reported by CPU1.

Below is the scenario:

•    CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
     IPI sent to CPU5

synchronize_sched_expedited_wait
    ret = swait_event_timeout(rsp->expedited_wq,
                              sync_rcu_preempt_exp_done(rnp_root),
                              jiffies_stall);

            expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())



•    CPU5 handles IPI and fails to acquire rq lock.

Handles IPI
     sync_sched_exp_handler
         resched_cpu
              returns after failing to trylock rq->lock
         need_resched is not set

•    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
     idle (schedule() is not called).

•    CPU 1 reports RCU stall.

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-15 11:14 Query regarding synchronize_sched_expedited and resched_cpu Neeraj Upadhyay
@ 2017-09-17  1:00 ` Paul E. McKenney
  2017-09-17  6:07   ` Neeraj Upadhyay
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-17  1:00 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, linux-kernel,
	sramana, prsood

On Fri, Sep 15, 2017 at 04:44:38PM +0530, Neeraj Upadhyay wrote:
> Hi,
> 
> We have a query regarding the behavior of the RCU expedited grace
> period in the scenario where resched_cpu(), called from
> sync_sched_exp_handler(), fails to acquire the rq lock and returns
> without setting need_resched. In this case, how do we ensure that the
> CPU notifies RCU about the end of the sched grace period (schedule() ->
> __schedule() -> rcu_note_context_switch(cpu) -> rcu_sched_qs()) when
> the tick is stopped on that CPU?  Is it implied by the rq lock
> acquisition failure that the owner of the rq lock will enforce a
> context switch?  For which scenarios in the RCU paths (the function is
> used only from RCU code) do we need the trylock check in resched_cpu()?
> 
> void resched_cpu(int cpu)
> {
>         struct rq *rq = cpu_rq(cpu);
>         unsigned long flags;
> 
>         if (!raw_spin_trylock_irqsave(&rq->lock, flags))
>                 return;
>         resched_curr(rq);
>         raw_spin_unlock_irqrestore(&rq->lock, flags);
> }
> 
> 
> This issue was observed in the scenario below, where one of the CPUs
> (CPU1) started synchronize_sched_expedited() and sent an IPI to CPU5,
> which was in the idle path but handled the sync_sched_exp_handler()
> IPI before rcu_idle_enter().
> As resched_cpu() failed to acquire the rq lock, need_resched was not
> set and the CPU went idle, resulting in an expedited stall being
> reported by CPU1.
> 
> Below is the scenario:
> 
> •    CPU1 is waiting for expedited wait to complete:
> sync_rcu_exp_select_cpus
>     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
>     IPI sent to CPU5
> 
> synchronize_sched_expedited_wait
>         ret = swait_event_timeout(
>                                     rsp->expedited_wq,
>  sync_rcu_preempt_exp_done(rnp_root),
>                                     jiffies_stall);
> 
>            expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
> 
> 
> 
> •    CPU5 handles IPI and fails to acquire rq lock.
> 
> Handles IPI
>     sync_sched_exp_handler
>         resched_cpu
>             returns after failing to trylock rq->lock
>         need_resched is not set
> 
> •    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
>     idle (schedule() is not called).
> 
> •    CPU 1 reports RCU stall.

Good catch and good detective work!!!

I will be working on a fix this week, hopefully involving resched_cpu()
getting a return value so that I can track who needs a later retry.
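
A rough sketch of that return-value approach (hypothetical; per the rest
of the thread, this is not the fix that was ultimately merged) might
look like:

```c
/* Hypothetical sketch only: resched_cpu() reports whether the kick
 * was delivered, so the caller can record the CPU for a later retry. */
bool resched_cpu(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	unsigned long flags;

	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
		return false;		/* rq lock contended: retry later */
	resched_curr(rq);
	raw_spin_unlock_irqrestore(&rq->lock, flags);
	return true;
}
```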

							Thanx, Paul


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-17  1:00 ` Paul E. McKenney
@ 2017-09-17  6:07   ` Neeraj Upadhyay
  2017-09-18 15:11     ` Steven Rostedt
  0 siblings, 1 reply; 35+ messages in thread
From: Neeraj Upadhyay @ 2017-09-17  6:07 UTC (permalink / raw)
  To: paulmck
  Cc: josh, rostedt, mathieu.desnoyers, jiangshanlai, linux-kernel,
	sramana, prsood, pkondeti, markivx, peterz



On 09/17/2017 06:30 AM, Paul E. McKenney wrote:
> On Fri, Sep 15, 2017 at 04:44:38PM +0530, Neeraj Upadhyay wrote:
>> Hi,
>>
>> We have a query regarding the behavior of the RCU expedited grace
>> period in the scenario where resched_cpu(), called from
>> sync_sched_exp_handler(), fails to acquire the rq lock and returns
>> without setting need_resched. In this case, how do we ensure that the
>> CPU notifies RCU about the end of the sched grace period (schedule() ->
>> __schedule() -> rcu_note_context_switch(cpu) -> rcu_sched_qs()) when
>> the tick is stopped on that CPU?  Is it implied by the rq lock
>> acquisition failure that the owner of the rq lock will enforce a
>> context switch?  For which scenarios in the RCU paths (the function is
>> used only from RCU code) do we need the trylock check in resched_cpu()?
>>
>> void resched_cpu(int cpu)
>> {
>>          struct rq *rq = cpu_rq(cpu);
>>          unsigned long flags;
>>
>>          if (!raw_spin_trylock_irqsave(&rq->lock, flags))
>>                  return;
>>          resched_curr(rq);
>>          raw_spin_unlock_irqrestore(&rq->lock, flags);
>> }
>>
>>
>> This issue was observed in the scenario below, where one of the CPUs
>> (CPU1) started synchronize_sched_expedited() and sent an IPI to CPU5,
>> which was in the idle path but handled the sync_sched_exp_handler()
>> IPI before rcu_idle_enter().
>> As resched_cpu() failed to acquire the rq lock, need_resched was not
>> set and the CPU went idle, resulting in an expedited stall being
>> reported by CPU1.
>>
>> Below is the scenario:
>>
>> •    CPU1 is waiting for expedited wait to complete:
>> sync_rcu_exp_select_cpus
>>      rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
>>      IPI sent to CPU5
>>
>> synchronize_sched_expedited_wait
>>          ret = swait_event_timeout(
>>                                      rsp->expedited_wq,
>>   sync_rcu_preempt_exp_done(rnp_root),
>>                                      jiffies_stall);
>>
>>             expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
>>
>>
>>
>> •    CPU5 handles IPI and fails to acquire rq lock.
>>
>> Handles IPI
>>      sync_sched_exp_handler
>>          resched_cpu
>>              returns after failing to trylock rq->lock
>>          need_resched is not set
>>
>> •    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
>>      idle (schedule() is not called).
>>
>> •    CPU 1 reports RCU stall.
> Good catch and good detective work!!!
>
> I will be working on a fix this week, hopefully involving resched_cpu()
> getting a return value so that I can track who needs a later retry.
>
> 							Thanx, Paul
>
Hi Paul, how about replacing raw_spin_trylock_irqsave() with
raw_spin_lock_irqsave() in resched_cpu()? Are there any paths in the
RCU code that depend on the trylock check or on spinlock recursion?

Thanks
Neeraj

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-17  6:07   ` Neeraj Upadhyay
@ 2017-09-18 15:11     ` Steven Rostedt
  2017-09-18 16:01       ` Paul E. McKenney
  2017-09-21 13:55       ` Peter Zijlstra
  0 siblings, 2 replies; 35+ messages in thread
From: Steven Rostedt @ 2017-09-18 15:11 UTC (permalink / raw)
  To: Neeraj Upadhyay
  Cc: paulmck, josh, mathieu.desnoyers, jiangshanlai, linux-kernel,
	sramana, prsood, pkondeti, markivx, peterz

On Sun, 17 Sep 2017 11:37:06 +0530
Neeraj Upadhyay <neeraju@codeaurora.org> wrote:

> Hi Paul, how about replacing raw_spin_trylock_irqsave() with
> raw_spin_lock_irqsave() in resched_cpu()? Are there any paths in the
> RCU code that depend on the trylock check or on spinlock recursion?

It looks to me that resched_cpu() was added for nohz-full sched
balancing, but it is no longer used there. The only current user is
RCU. Perhaps we should change it from a trylock to a lock.

-- Steve


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 15:11     ` Steven Rostedt
@ 2017-09-18 16:01       ` Paul E. McKenney
  2017-09-18 16:12         ` Steven Rostedt
  2017-09-21 13:55       ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-18 16:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Mon, Sep 18, 2017 at 11:11:05AM -0400, Steven Rostedt wrote:
> On Sun, 17 Sep 2017 11:37:06 +0530
> Neeraj Upadhyay <neeraju@codeaurora.org> wrote:
> 
> > Hi Paul, how about replacing raw_spin_trylock_irqsave() with
> > raw_spin_lock_irqsave() in resched_cpu()? Are there any paths in the
> > RCU code that depend on the trylock check or on spinlock recursion?
> 
> It looks to me that resched_cpu() was added for nohz-full sched
> balancing, but it is no longer used there. The only current user is
> RCU. Perhaps we should change it from a trylock to a lock.

That certainly is a much simpler fix than the one I was thinking of!

So how about the following patch?

							Thanx, Paul

------------------------------------------------------------------------

commit bc43e2e7e08134e6f403ac845edcf4f85668d803
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Sep 18 08:54:40 2017 -0700

    sched: Make resched_cpu() unconditional
    
    The current implementation of synchronize_sched_expedited() incorrectly
    assumes that resched_cpu() is unconditional, which it is not.  This means
    that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
    fails as follows (analysis by Neeraj Upadhyay):
    
    o    CPU1 is waiting for expedited wait to complete:
    sync_rcu_exp_select_cpus
         rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
         IPI sent to CPU5
    
    synchronize_sched_expedited_wait
             ret = swait_event_timeout(
                                         rsp->expedited_wq,
      sync_rcu_preempt_exp_done(rnp_root),
                                         jiffies_stall);
    
                expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
    
    o    CPU5 handles IPI and fails to acquire rq lock.
    
    Handles IPI
         sync_sched_exp_handler
             resched_cpu
                 returns while failing to try lock acquire rq->lock
             need_resched is not set
    
    o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
         idle (schedule() is not called).
    
    o    CPU 1 reports RCU stall.
    
    Given that resched_cpu() is used only by RCU, this commit fixes the
    assumption by making resched_cpu() unconditional.
    
    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-		return;
+	raw_spin_lock_irqsave(&rq->lock, flags);
 	resched_curr(rq);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:01       ` Paul E. McKenney
@ 2017-09-18 16:12         ` Steven Rostedt
  2017-09-18 16:24           ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-09-18 16:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Mon, 18 Sep 2017 09:01:25 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:


>     sched: Make resched_cpu() unconditional
>     
>     The current implementation of synchronize_sched_expedited() incorrectly
>     assumes that resched_cpu() is unconditional, which it is not.  This means
>     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
>     fails as follows (analysis by Neeraj Upadhyay):
>     
>     o    CPU1 is waiting for expedited wait to complete:
>     sync_rcu_exp_select_cpus
>          rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
>          IPI sent to CPU5
>     
>     synchronize_sched_expedited_wait
>              ret = swait_event_timeout(
>                                          rsp->expedited_wq,
>       sync_rcu_preempt_exp_done(rnp_root),
>                                          jiffies_stall);
>     
>                 expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
>     
>     o    CPU5 handles IPI and fails to acquire rq lock.
>     
>     Handles IPI
>          sync_sched_exp_handler
>              resched_cpu
>                  returns while failing to try lock acquire rq->lock
>              need_resched is not set
>     
>     o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
>          idle (schedule() is not called).
>     
>     o    CPU 1 reports RCU stall.
>     
>     Given that resched_cpu() is used only by RCU, this commit fixes the
>     assumption by making resched_cpu() unconditional.

Probably want to run this with several workloads with lockdep enabled
first.

-- Steve

>     
>     Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cab8c5ec128e..b2281971894c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -505,8 +505,7 @@ void resched_cpu(int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>  
> -	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> -		return;
> +	raw_spin_lock_irqsave(&rq->lock, flags);
>  	resched_curr(rq);
>  	raw_spin_unlock_irqrestore(&rq->lock, flags);
>  }


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:12         ` Steven Rostedt
@ 2017-09-18 16:24           ` Paul E. McKenney
  2017-09-18 16:29             ` Steven Rostedt
  2017-09-19 15:31             ` Paul E. McKenney
  0 siblings, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-18 16:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Mon, Sep 18, 2017 at 12:12:13PM -0400, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 09:01:25 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> 
> >     sched: Make resched_cpu() unconditional
> >     
> >     The current implementation of synchronize_sched_expedited() incorrectly
> >     assumes that resched_cpu() is unconditional, which it is not.  This means
> >     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
> >     fails as follows (analysis by Neeraj Upadhyay):
> >     
> >     o    CPU1 is waiting for expedited wait to complete:
> >     sync_rcu_exp_select_cpus
> >          rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
> >          IPI sent to CPU5
> >     
> >     synchronize_sched_expedited_wait
> >              ret = swait_event_timeout(
> >                                          rsp->expedited_wq,
> >       sync_rcu_preempt_exp_done(rnp_root),
> >                                          jiffies_stall);
> >     
> >                 expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
> >     
> >     o    CPU5 handles IPI and fails to acquire rq lock.
> >     
> >     Handles IPI
> >          sync_sched_exp_handler
> >              resched_cpu
> >                  returns while failing to try lock acquire rq->lock
> >              need_resched is not set
> >     
> >     o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
> >          idle (schedule() is not called).
> >     
> >     o    CPU 1 reports RCU stall.
> >     
> >     Given that resched_cpu() is used only by RCU, this commit fixes the
> >     assumption by making resched_cpu() unconditional.
> 
> Probably want to run this with several workloads with lockdep enabled
> first.

As soon as I work through the backlog of lockdep complaints that
appeared in the last merge window...  :-(

sparse_irq_lock, I am looking at you!!!  ;-)

							Thanx, Paul


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:24           ` Paul E. McKenney
@ 2017-09-18 16:29             ` Steven Rostedt
  2017-09-18 16:55               ` Paul E. McKenney
  2017-09-19  1:55               ` Byungchul Park
  2017-09-19 15:31             ` Paul E. McKenney
  1 sibling, 2 replies; 35+ messages in thread
From: Steven Rostedt @ 2017-09-18 16:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Mon, 18 Sep 2017 09:24:12 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:


> As soon as I work through the backlog of lockdep complaints that
> appeared in the last merge window...  :-(
> 
> sparse_irq_lock, I am looking at you!!!  ;-)

I just hit one too, and decided to write a patch to show a chain of 3
when applicable.

For example:

 Chain exists of:
   cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
 
  Possible unsafe locking scenario by crosslock:
 
        CPU0                    CPU1                    CPU2
        ----                    ----                    ----
   lock(smpboot_threads_lock);
   lock((complete)&self->parked);
                                lock(cpu_hotplug_lock.rw_sem);
                                lock(smpboot_threads_lock);
                                                       lock(cpu_hotplug_lock.rw_sem);
                                                       unlock((complete)&self->parked);
 
  *** DEADLOCK ***

:-)

-- Steve


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:29             ` Steven Rostedt
@ 2017-09-18 16:55               ` Paul E. McKenney
  2017-09-18 23:53                 ` Paul E. McKenney
  2017-09-21 13:57                 ` Peter Zijlstra
  2017-09-19  1:55               ` Byungchul Park
  1 sibling, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-18 16:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 09:24:12 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> 
> > As soon as I work through the backlog of lockdep complaints that
> > appeared in the last merge window...  :-(
> > 
> > sparse_irq_lock, I am looking at you!!!  ;-)
> 
> I just hit one too, and decided to write a patch to show a chain of 3
> when applicable.
> 
> For example:
> 
>  Chain exists of:
>    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> 
>   Possible unsafe locking scenario by crosslock:
> 
>         CPU0                    CPU1                    CPU2
>         ----                    ----                    ----
>    lock(smpboot_threads_lock);
>    lock((complete)&self->parked);
>                                 lock(cpu_hotplug_lock.rw_sem);
>                                 lock(smpboot_threads_lock);
>                                                        lock(cpu_hotplug_lock.rw_sem);
>                                                        unlock((complete)&self->parked);
> 
>   *** DEADLOCK ***
> 
> :-)

Nice!!!

My next step is reverting 12ac1d0f6c3e ("genirq: Make sparse_irq_lock
protect what it should protect") to see if that helps.

							Thanx, Paul


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:55               ` Paul E. McKenney
@ 2017-09-18 23:53                 ` Paul E. McKenney
  2017-09-19  1:23                   ` Steven Rostedt
  2017-09-19  1:50                   ` Byungchul Park
  2017-09-21 13:57                 ` Peter Zijlstra
  1 sibling, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-18 23:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz,
	byungchul.park

On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:
> > On Mon, 18 Sep 2017 09:24:12 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > 
> > > As soon as I work through the backlog of lockdep complaints that
> > > appeared in the last merge window...  :-(
> > > 
> > > sparse_irq_lock, I am looking at you!!!  ;-)
> > 
> > I just hit one too, and decided to write a patch to show a chain of 3
> > when applicable.
> > 
> > For example:
> > 
> >  Chain exists of:
> >    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> > 
> >   Possible unsafe locking scenario by crosslock:
> > 
> >         CPU0                    CPU1                    CPU2
> >         ----                    ----                    ----
> >    lock(smpboot_threads_lock);
> >    lock((complete)&self->parked);
> >                                 lock(cpu_hotplug_lock.rw_sem);
> >                                 lock(smpboot_threads_lock);
> >                                                        lock(cpu_hotplug_lock.rw_sem);
> >                                                        unlock((complete)&self->parked);
> > 
> >   *** DEADLOCK ***
> > 
> > :-)
> 
> Nice!!!
> 
> My next step is reverting 12ac1d0f6c3e ("genirq: Make sparse_irq_lock
> protect what it should protect") to see if that helps.

No joy, but it is amazing how much nicer "git bisect" is when your
failure happens deterministically within 35 seconds.  ;-)

The bisection converged to the range starting with 7a46ec0e2f48
("locking/refcounts, x86/asm: Implement fast refcount overflow
protection") and ending with 0c2364791343 ("Merge branch 'x86/asm'
into locking/core").  All of these failed with an unrelated build
error, but there was a fix that could be merged.  This flagged
d0541b0fa64b ("locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part
of CONFIG_PROVE_LOCKING"), which unfortunately does not revert cleanly.
However, the effect of a reversion can be obtained by removing the
selects of LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE from
PROVE_LOCKING, which allows recent commits to complete a short
rcutorture test successfully.

So, Byungchul, any enlightenment?  Please see lockdep splat below.

							Thanx, Paul

------------------------------------------------------------------------

[   35.310179] ======================================================
[   35.310749] WARNING: possible circular locking dependency detected
[   35.310749] 4.13.0-rc4+ #1 Not tainted
[   35.310749] ------------------------------------------------------
[   35.310749] torture_onoff/766 is trying to acquire lock:
[   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
[   35.313943] 
[   35.313943] but task is already holding lock:
[   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
[   35.313943] 
[   35.313943] which lock already depends on the new lock.
[   35.313943] 
[   35.313943] 
[   35.313943] the existing dependency chain (in reverse order) is:
[   35.313943] 
[   35.313943] -> #1 (sparse_irq_lock){+.+.}:
[   35.313943]        __mutex_lock+0x65/0x960
[   35.313943]        mutex_lock_nested+0x16/0x20
[   35.313943]        irq_lock_sparse+0x12/0x20
[   35.313943]        irq_affinity_online_cpu+0x13/0xd0
[   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
[   35.313943] 
[   35.313943] -> #0 ((complete)&st->done){+.+.}:
[   35.313943]        check_prev_add+0x401/0x800
[   35.313943]        __lock_acquire+0x1100/0x11a0
[   35.313943]        lock_acquire+0x9e/0x1e0
[   35.313943]        wait_for_completion+0x36/0x130
[   35.313943]        takedown_cpu+0x86/0xf0
[   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
[   35.313943]        cpuhp_down_callbacks+0x3d/0x80
[   35.313943]        _cpu_down+0xbb/0xf0
[   35.313943]        do_cpu_down+0x39/0x50
[   35.313943]        cpu_down+0xb/0x10
[   35.313943]        torture_offline+0x75/0x140
[   35.313943]        torture_onoff+0x102/0x1e0
[   35.313943]        kthread+0x142/0x180
[   35.313943]        ret_from_fork+0x27/0x40
[   35.313943] 
[   35.313943] other info that might help us debug this:
[   35.313943] 
[   35.313943]  Possible unsafe locking scenario:
[   35.313943] 
[   35.313943]        CPU0                    CPU1
[   35.313943]        ----                    ----
[   35.313943]   lock(sparse_irq_lock);
[   35.313943]                                lock((complete)&st->done);
[   35.313943]                                lock(sparse_irq_lock);
[   35.313943]   lock((complete)&st->done);
[   35.313943] 
[   35.313943]  *** DEADLOCK ***
[   35.313943] 
[   35.313943] 3 locks held by torture_onoff/766:
[   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
[   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
[   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
[   35.313943] 
[   35.313943] stack backtrace:
[   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
[   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   35.313943] Call Trace:
[   35.313943]  dump_stack+0x67/0x97
[   35.313943]  print_circular_bug+0x21d/0x330
[   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
[   35.313943]  check_prev_add+0x401/0x800
[   35.313943]  ? wake_up_q+0x70/0x70
[   35.313943]  __lock_acquire+0x1100/0x11a0
[   35.313943]  ? __lock_acquire+0x1100/0x11a0
[   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
[   35.313943]  lock_acquire+0x9e/0x1e0
[   35.313943]  ? takedown_cpu+0x86/0xf0
[   35.313943]  wait_for_completion+0x36/0x130
[   35.313943]  ? takedown_cpu+0x86/0xf0
[   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
[   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
[   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
[   35.313943]  takedown_cpu+0x86/0xf0
[   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
[   35.313943]  cpuhp_down_callbacks+0x3d/0x80
[   35.313943]  _cpu_down+0xbb/0xf0
[   35.313943]  do_cpu_down+0x39/0x50
[   35.313943]  cpu_down+0xb/0x10
[   35.313943]  torture_offline+0x75/0x140
[   35.313943]  torture_onoff+0x102/0x1e0
[   35.313943]  kthread+0x142/0x180
[   35.313943]  ? torture_kthread_stopping+0x70/0x70
[   35.313943]  ? kthread_create_on_node+0x40/0x40
[   35.313943]  ret_from_fork+0x27/0x40


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 23:53                 ` Paul E. McKenney
@ 2017-09-19  1:23                   ` Steven Rostedt
  2017-09-19  2:26                     ` Paul E. McKenney
  2017-09-19  1:50                   ` Byungchul Park
  1 sibling, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-09-19  1:23 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz,
	byungchul.park

On Mon, 18 Sep 2017 16:53:11 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:  
> > > On Mon, 18 Sep 2017 09:24:12 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > >   
> > > > As soon as I work through the backlog of lockdep complaints that
> > > > appeared in the last merge window...  :-(
> > > > 
> > > > sparse_irq_lock, I am looking at you!!!  ;-)  
> > > 
> > > I just hit one too, and decided to write a patch to show a chain of 3
> > > when applicable.
> > > 
> > > For example:
> > > 
> > >  Chain exists of:
> > >    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> > > 
> > >   Possible unsafe locking scenario by crosslock:
> > > 
> > >         CPU0                    CPU1                    CPU2
> > >         ----                    ----                    ----
> > >    lock(smpboot_threads_lock);
> > >    lock((complete)&self->parked);
> > >                                 lock(cpu_hotplug_lock.rw_sem);
> > >                                 lock(smpboot_threads_lock);
> > >                                                        lock(cpu_hotplug_lock.rw_sem);
> > >                                                        unlock((complete)&self->parked);
> > > 
> > >   *** DEADLOCK ***
> > > 
> > > :-)  
> > 
> > Nice!!!
> > 

Note, the above lockdep splat does discover a bug.

> > My next step is reverting 12ac1d0f6c3e ("genirq: Make sparse_irq_lock
> > protect what it should protect") to see if that helps.  
> 
> No joy, but it is amazing how much nicer "git bisect" is when your
> failure happens deterministically within 35 seconds.  ;-)
> 
> The bisection converged to the range starting with 7a46ec0e2f48
> ("locking/refcounts, x86/asm: Implement fast refcount overflow
> protection") and ending with 0c2364791343 ("Merge branch 'x86/asm'
> into locking/core").  All of these failed with an unrelated build
> error, but there was a fix that could be merged.  This flagged
> d0541b0fa64b ("locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part
> of CONFIG_PROVE_LOCKING"), which unfortunately does not revert cleanly.
> However, the effect of a reversion can be obtained by removing the
> selects of LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE from
> PROVE_LOCKING, which allows recent commits to complete a short
> rcutorture test successfully.

I don't think you want to remove those. It appears that lockdep now
covers completions, and it is uncovering a lot of bugs.

> 
> So, Byungchul, any enlightenment?  Please see lockdep splat below.

Did you discover the below by reverting lockdep patches? It doesn't
really make sense. It looks to me like it is about completions but
does not fully cover them.

-- Steve
 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> [   35.310179] ======================================================
> [   35.310749] WARNING: possible circular locking dependency detected
> [   35.310749] 4.13.0-rc4+ #1 Not tainted
> [   35.310749] ------------------------------------------------------
> [   35.310749] torture_onoff/766 is trying to acquire lock:
> [   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
> [   35.313943] 
> [   35.313943] but task is already holding lock:
> [   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> [   35.313943] 
> [   35.313943] which lock already depends on the new lock.
> [   35.313943] 
> [   35.313943] 
> [   35.313943] the existing dependency chain (in reverse order) is:
> [   35.313943] 
> [   35.313943] -> #1 (sparse_irq_lock){+.+.}:
> [   35.313943]        __mutex_lock+0x65/0x960
> [   35.313943]        mutex_lock_nested+0x16/0x20
> [   35.313943]        irq_lock_sparse+0x12/0x20
> [   35.313943]        irq_affinity_online_cpu+0x13/0xd0
> [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> [   35.313943] 
> [   35.313943] -> #0 ((complete)&st->done){+.+.}:
> [   35.313943]        check_prev_add+0x401/0x800
> [   35.313943]        __lock_acquire+0x1100/0x11a0
> [   35.313943]        lock_acquire+0x9e/0x1e0
> [   35.313943]        wait_for_completion+0x36/0x130
> [   35.313943]        takedown_cpu+0x86/0xf0
> [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> [   35.313943]        cpuhp_down_callbacks+0x3d/0x80
> [   35.313943]        _cpu_down+0xbb/0xf0
> [   35.313943]        do_cpu_down+0x39/0x50
> [   35.313943]        cpu_down+0xb/0x10
> [   35.313943]        torture_offline+0x75/0x140
> [   35.313943]        torture_onoff+0x102/0x1e0
> [   35.313943]        kthread+0x142/0x180
> [   35.313943]        ret_from_fork+0x27/0x40
> [   35.313943] 
> [   35.313943] other info that might help us debug this:
> [   35.313943] 
> [   35.313943]  Possible unsafe locking scenario:
> [   35.313943] 
> [   35.313943]        CPU0                    CPU1
> [   35.313943]        ----                    ----
> [   35.313943]   lock(sparse_irq_lock);
> [   35.313943]                                lock((complete)&st->done);
> [   35.313943]                                lock(sparse_irq_lock);
> [   35.313943]   lock((complete)&st->done);
> [   35.313943] 
> [   35.313943]  *** DEADLOCK ***
> [   35.313943] 
> [   35.313943] 3 locks held by torture_onoff/766:
> [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> [   35.313943] 
> [   35.313943] stack backtrace:
> [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   35.313943] Call Trace:
> [   35.313943]  dump_stack+0x67/0x97
> [   35.313943]  print_circular_bug+0x21d/0x330
> [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> [   35.313943]  check_prev_add+0x401/0x800
> [   35.313943]  ? wake_up_q+0x70/0x70
> [   35.313943]  __lock_acquire+0x1100/0x11a0
> [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> [   35.313943]  lock_acquire+0x9e/0x1e0
> [   35.313943]  ? takedown_cpu+0x86/0xf0
> [   35.313943]  wait_for_completion+0x36/0x130
> [   35.313943]  ? takedown_cpu+0x86/0xf0
> [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> [   35.313943]  takedown_cpu+0x86/0xf0
> [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> [   35.313943]  _cpu_down+0xbb/0xf0
> [   35.313943]  do_cpu_down+0x39/0x50
> [   35.313943]  cpu_down+0xb/0x10
> [   35.313943]  torture_offline+0x75/0x140
> [   35.313943]  torture_onoff+0x102/0x1e0
> [   35.313943]  kthread+0x142/0x180
> [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> [   35.313943]  ? kthread_create_on_node+0x40/0x40
> [   35.313943]  ret_from_fork+0x27/0x40


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 23:53                 ` Paul E. McKenney
  2017-09-19  1:23                   ` Steven Rostedt
@ 2017-09-19  1:50                   ` Byungchul Park
  2017-09-19  2:06                     ` Byungchul Park
  1 sibling, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2017-09-19  1:50 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx,
	peterz, kernel-team

On Mon, Sep 18, 2017 at 04:53:11PM -0700, Paul E. McKenney wrote:
> So, Byungchul, any enlightenment?  Please see lockdep splat below.
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> [   35.310179] ======================================================
> [   35.310749] WARNING: possible circular locking dependency detected
> [   35.310749] 4.13.0-rc4+ #1 Not tainted
> [   35.310749] ------------------------------------------------------
> [   35.310749] torture_onoff/766 is trying to acquire lock:
> [   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
> [   35.313943] 
> [   35.313943] but task is already holding lock:
> [   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> [   35.313943] 
> [   35.313943] which lock already depends on the new lock.
> [   35.313943] 
> [   35.313943] 
> [   35.313943] the existing dependency chain (in reverse order) is:
> [   35.313943] 
> [   35.313943] -> #1 (sparse_irq_lock){+.+.}:
> [   35.313943]        __mutex_lock+0x65/0x960
> [   35.313943]        mutex_lock_nested+0x16/0x20
> [   35.313943]        irq_lock_sparse+0x12/0x20
> [   35.313943]        irq_affinity_online_cpu+0x13/0xd0
> [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> [   35.313943] 
> [   35.313943] -> #0 ((complete)&st->done){+.+.}:
> [   35.313943]        check_prev_add+0x401/0x800
> [   35.313943]        __lock_acquire+0x1100/0x11a0
> [   35.313943]        lock_acquire+0x9e/0x1e0
> [   35.313943]        wait_for_completion+0x36/0x130
> [   35.313943]        takedown_cpu+0x86/0xf0
> [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> [   35.313943]        cpuhp_down_callbacks+0x3d/0x80
> [   35.313943]        _cpu_down+0xbb/0xf0
> [   35.313943]        do_cpu_down+0x39/0x50
> [   35.313943]        cpu_down+0xb/0x10
> [   35.313943]        torture_offline+0x75/0x140
> [   35.313943]        torture_onoff+0x102/0x1e0
> [   35.313943]        kthread+0x142/0x180
> [   35.313943]        ret_from_fork+0x27/0x40
> [   35.313943] 
> [   35.313943] other info that might help us debug this:
> [   35.313943] 
> [   35.313943]  Possible unsafe locking scenario:
> [   35.313943] 
> [   35.313943]        CPU0                    CPU1
> [   35.313943]        ----                    ----
> [   35.313943]   lock(sparse_irq_lock);
> [   35.313943]                                lock((complete)&st->done);
> [   35.313943]                                lock(sparse_irq_lock);
> [   35.313943]   lock((complete)&st->done);
> [   35.313943] 
> [   35.313943]  *** DEADLOCK ***

Hello Paul and Steven,

This is saying:

Thread A
--------
takedown_cpu()
   irq_lock_sparse()
   wait_for_completion(&st->done) // Wait for completion of B
   irq_unlock_sparse()

Thread B
--------
cpuhp_invoke_callback()
   irq_lock_sparse() // Wait for A to irq_unlock_sparse()
   (on the way going to complete(&st->done))

So, lockdep considers this a deadlock.
Is it actually possible for this to happen?

Thanks,
Byungchul

> [   35.313943] 
> [   35.313943] 3 locks held by torture_onoff/766:
> [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> [   35.313943] 
> [   35.313943] stack backtrace:
> [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [   35.313943] Call Trace:
> [   35.313943]  dump_stack+0x67/0x97
> [   35.313943]  print_circular_bug+0x21d/0x330
> [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> [   35.313943]  check_prev_add+0x401/0x800
> [   35.313943]  ? wake_up_q+0x70/0x70
> [   35.313943]  __lock_acquire+0x1100/0x11a0
> [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> [   35.313943]  lock_acquire+0x9e/0x1e0
> [   35.313943]  ? takedown_cpu+0x86/0xf0
> [   35.313943]  wait_for_completion+0x36/0x130
> [   35.313943]  ? takedown_cpu+0x86/0xf0
> [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> [   35.313943]  takedown_cpu+0x86/0xf0
> [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> [   35.313943]  _cpu_down+0xbb/0xf0
> [   35.313943]  do_cpu_down+0x39/0x50
> [   35.313943]  cpu_down+0xb/0x10
> [   35.313943]  torture_offline+0x75/0x140
> [   35.313943]  torture_onoff+0x102/0x1e0
> [   35.313943]  kthread+0x142/0x180
> [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> [   35.313943]  ? kthread_create_on_node+0x40/0x40
> [   35.313943]  ret_from_fork+0x27/0x40


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:29             ` Steven Rostedt
  2017-09-18 16:55               ` Paul E. McKenney
@ 2017-09-19  1:55               ` Byungchul Park
  1 sibling, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2017-09-19  1:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Paul E. McKenney, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx,
	peterz, kernel-team

On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 09:24:12 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> 
> > As soon as I work through the backlog of lockdep complaints that
> > appeared in the last merge window...  :-(
> > 
> > sparse_irq_lock, I am looking at you!!!  ;-)
> 
> I just hit one too, and decided to write a patch to show a chain of 3
> when applicable.

Hello Steven,

I really agree with this. Currently, when more than two locks
participate in a deadlock, the report is insufficiently informative.

Thanks,
Byungchul

> 
> For example:
> 
>  Chain exists of:
>    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
>  
>   Possible unsafe locking scenario by crosslock:
>  
>         CPU0                    CPU1                    CPU2
>         ----                    ----                    ----
>    lock(smpboot_threads_lock);
>    lock((complete)&self->parked);
>                                 lock(cpu_hotplug_lock.rw_sem);
>                                 lock(smpboot_threads_lock);
>                                                        lock(cpu_hotplug_lock.rw_sem);
>                                                        unlock((complete)&self->parked);
>  
>   *** DEADLOCK ***
> 
> :-)
> 
> -- Steve


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  1:50                   ` Byungchul Park
@ 2017-09-19  2:06                     ` Byungchul Park
  2017-09-19  2:33                       ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2017-09-19  2:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx,
	peterz, kernel-team

On Tue, Sep 19, 2017 at 10:50:27AM +0900, Byungchul Park wrote:
> On Mon, Sep 18, 2017 at 04:53:11PM -0700, Paul E. McKenney wrote:
> > So, Byungchul, any enlightenment?  Please see lockdep splat below.
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > [   35.310179] ======================================================
> > [   35.310749] WARNING: possible circular locking dependency detected
> > [   35.310749] 4.13.0-rc4+ #1 Not tainted
> > [   35.310749] ------------------------------------------------------
> > [   35.310749] torture_onoff/766 is trying to acquire lock:
> > [   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
> > [   35.313943] 
> > [   35.313943] but task is already holding lock:
> > [   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > [   35.313943] 
> > [   35.313943] which lock already depends on the new lock.
> > [   35.313943] 
> > [   35.313943] 
> > [   35.313943] the existing dependency chain (in reverse order) is:
> > [   35.313943] 
> > [   35.313943] -> #1 (sparse_irq_lock){+.+.}:
> > [   35.313943]        __mutex_lock+0x65/0x960
> > [   35.313943]        mutex_lock_nested+0x16/0x20
> > [   35.313943]        irq_lock_sparse+0x12/0x20
> > [   35.313943]        irq_affinity_online_cpu+0x13/0xd0
> > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943] 
> > [   35.313943] -> #0 ((complete)&st->done){+.+.}:
> > [   35.313943]        check_prev_add+0x401/0x800
> > [   35.313943]        __lock_acquire+0x1100/0x11a0
> > [   35.313943]        lock_acquire+0x9e/0x1e0
> > [   35.313943]        wait_for_completion+0x36/0x130
> > [   35.313943]        takedown_cpu+0x86/0xf0
> > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943]        cpuhp_down_callbacks+0x3d/0x80
> > [   35.313943]        _cpu_down+0xbb/0xf0
> > [   35.313943]        do_cpu_down+0x39/0x50
> > [   35.313943]        cpu_down+0xb/0x10
> > [   35.313943]        torture_offline+0x75/0x140
> > [   35.313943]        torture_onoff+0x102/0x1e0
> > [   35.313943]        kthread+0x142/0x180
> > [   35.313943]        ret_from_fork+0x27/0x40
> > [   35.313943] 
> > [   35.313943] other info that might help us debug this:
> > [   35.313943] 
> > [   35.313943]  Possible unsafe locking scenario:
> > [   35.313943] 
> > [   35.313943]        CPU0                    CPU1
> > [   35.313943]        ----                    ----
> > [   35.313943]   lock(sparse_irq_lock);
> > [   35.313943]                                lock((complete)&st->done);
> > [   35.313943]                                lock(sparse_irq_lock);
> > [   35.313943]   lock((complete)&st->done);
> > [   35.313943] 
> > [   35.313943]  *** DEADLOCK ***
> 
> Hello Paul and Steven,
> 
> This is saying:
> 
> Thread A
> --------
> takedown_cpu()
>    irq_lock_sparse()
>    wait_for_completion(&st->done) // Wait for completion of B
>    irq_unlock_sparse()
> 
> Thread B
> --------
> cpuhp_invoke_callback()
>    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
>    (on the way going to complete(&st->done))
> 
> So, lockdep considers this a deadlock.
> Is it actually possible for this to happen?

In addition, if it's impossible, then we should fix lock class
assignments so that the locks actually have different classes.

> Thanks,
> Byungchul
> 
> > [   35.313943] 
> > [   35.313943] 3 locks held by torture_onoff/766:
> > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > [   35.313943] 
> > [   35.313943] stack backtrace:
> > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [   35.313943] Call Trace:
> > [   35.313943]  dump_stack+0x67/0x97
> > [   35.313943]  print_circular_bug+0x21d/0x330
> > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > [   35.313943]  check_prev_add+0x401/0x800
> > [   35.313943]  ? wake_up_q+0x70/0x70
> > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > [   35.313943]  lock_acquire+0x9e/0x1e0
> > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > [   35.313943]  wait_for_completion+0x36/0x130
> > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > [   35.313943]  takedown_cpu+0x86/0xf0
> > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > [   35.313943]  _cpu_down+0xbb/0xf0
> > [   35.313943]  do_cpu_down+0x39/0x50
> > [   35.313943]  cpu_down+0xb/0x10
> > [   35.313943]  torture_offline+0x75/0x140
> > [   35.313943]  torture_onoff+0x102/0x1e0
> > [   35.313943]  kthread+0x142/0x180
> > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > [   35.313943]  ret_from_fork+0x27/0x40


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  1:23                   ` Steven Rostedt
@ 2017-09-19  2:26                     ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-19  2:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz,
	byungchul.park

On Mon, Sep 18, 2017 at 09:23:01PM -0400, Steven Rostedt wrote:
> On Mon, 18 Sep 2017 16:53:11 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> > > On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:  
> > > > On Mon, 18 Sep 2017 09:24:12 -0700
> > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > 
> > > >   
> > > > > As soon as I work through the backlog of lockdep complaints that
> > > > > appeared in the last merge window...  :-(
> > > > > 
> > > > > sparse_irq_lock, I am looking at you!!!  ;-)  
> > > > 
> > > > I just hit one too, and decided to write a patch to show a chain of 3
> > > > when applicable.
> > > > 
> > > > For example:
> > > > 
> > > >  Chain exists of:
> > > >    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> > > > 
> > > >   Possible unsafe locking scenario by crosslock:
> > > > 
> > > >         CPU0                    CPU1                    CPU2
> > > >         ----                    ----                    ----
> > > >    lock(smpboot_threads_lock);
> > > >    lock((complete)&self->parked);
> > > >                                 lock(cpu_hotplug_lock.rw_sem);
> > > >                                 lock(smpboot_threads_lock);
> > > >                                                        lock(cpu_hotplug_lock.rw_sem);
> > > >                                                        unlock((complete)&self->parked);
> > > > 
> > > >   *** DEADLOCK ***
> > > > 
> > > > :-)  
> > > 
> > > Nice!!!
> 
> Note, the above lockdep splat does discover a bug.

Fair enough, but I unfortunately have several other much more bizarre
bugs stacked up and so I am not volunteering to fix this one.

> > > My next step is reverting 12ac1d0f6c3e ("genirq: Make sparse_irq_lock
> > > protect what it should protect") to see if that helps.  
> > 
> > No joy, but it is amazing how much nicer "git bisect" is when your
> > failure happens deterministically within 35 seconds.  ;-)
> > 
> > The bisection converged to the range starting with 7a46ec0e2f48
> > ("locking/refcounts, x86/asm: Implement fast refcount overflow
> > protection") and ending with 0c2364791343 ("Merge branch 'x86/asm'
> > into locking/core").  All of these failed with an unrelated build
> > error, but there was a fix that could be merged.  This flagged
> > d0541b0fa64b ("locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part
> > of CONFIG_PROVE_LOCKING"), which unfortunately does not revert cleanly.
> > However, the effect of a reversion can be obtained by removing the
> > selects of LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE from
> > PROVE_LOCKING, which allows recent commits to complete a short
> > rcutorture test successfully.
> 
> I don't think you want to remove those. It appears that lockdep now
> covers completions, and it is uncovering a lot of bugs.

Actually, I do, at least in the short term.  This splat is getting in the
way of my diagnostics for the other bugs.  Please note that I am -not-
arguing that mainline should change, at least not yet.

> > So, Byungchul, any enlightenment?  Please see lockdep splat below.
> 
> Did you discover the below by reverting lockdep patches? It doesn't
> really make sense. It looks to me like it is about completions but
> does not fully cover them.

No, the splat below is what I get from stock v4.14-rc1 on these
rcutorture scenarios:  SRCU-P, TASKS01, TREE03, and TREE05.  If you
would like to try it yourself, TASKS01 requires only two CPUs and
the others require eight.

When I suppress LOCKDEP_CROSSRELEASE and LOCKDEP_COMPLETE, I don't
see anything that looks like that deadlock, but it is of course quite
possible that the deadlock is very low probability -- and I did short
30-minute runs.
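
For reference, the suppression described here amounts to a Kconfig change
along these lines (a sketch only: the two removed select lines use the
option names quoted earlier in the thread, while the surrounding context
lines are illustrative and may not match the exact source):

```diff
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ config PROVE_LOCKING
 	select DEBUG_SPINLOCK
 	select DEBUG_MUTEXES
 	select DEBUG_LOCK_ALLOC
-	select LOCKDEP_CROSSRELEASE
-	select LOCKDEP_COMPLETE
 	select TRACE_IRQFLAGS
```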

							Thanx, Paul

> -- Steve
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > [   35.310179] ======================================================
> > [   35.310749] WARNING: possible circular locking dependency detected
> > [   35.310749] 4.13.0-rc4+ #1 Not tainted
> > [   35.310749] ------------------------------------------------------
> > [   35.310749] torture_onoff/766 is trying to acquire lock:
> > [   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
> > [   35.313943] 
> > [   35.313943] but task is already holding lock:
> > [   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > [   35.313943] 
> > [   35.313943] which lock already depends on the new lock.
> > [   35.313943] 
> > [   35.313943] 
> > [   35.313943] the existing dependency chain (in reverse order) is:
> > [   35.313943] 
> > [   35.313943] -> #1 (sparse_irq_lock){+.+.}:
> > [   35.313943]        __mutex_lock+0x65/0x960
> > [   35.313943]        mutex_lock_nested+0x16/0x20
> > [   35.313943]        irq_lock_sparse+0x12/0x20
> > [   35.313943]        irq_affinity_online_cpu+0x13/0xd0
> > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943] 
> > [   35.313943] -> #0 ((complete)&st->done){+.+.}:
> > [   35.313943]        check_prev_add+0x401/0x800
> > [   35.313943]        __lock_acquire+0x1100/0x11a0
> > [   35.313943]        lock_acquire+0x9e/0x1e0
> > [   35.313943]        wait_for_completion+0x36/0x130
> > [   35.313943]        takedown_cpu+0x86/0xf0
> > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943]        cpuhp_down_callbacks+0x3d/0x80
> > [   35.313943]        _cpu_down+0xbb/0xf0
> > [   35.313943]        do_cpu_down+0x39/0x50
> > [   35.313943]        cpu_down+0xb/0x10
> > [   35.313943]        torture_offline+0x75/0x140
> > [   35.313943]        torture_onoff+0x102/0x1e0
> > [   35.313943]        kthread+0x142/0x180
> > [   35.313943]        ret_from_fork+0x27/0x40
> > [   35.313943] 
> > [   35.313943] other info that might help us debug this:
> > [   35.313943] 
> > [   35.313943]  Possible unsafe locking scenario:
> > [   35.313943] 
> > [   35.313943]        CPU0                    CPU1
> > [   35.313943]        ----                    ----
> > [   35.313943]   lock(sparse_irq_lock);
> > [   35.313943]                                lock((complete)&st->done);
> > [   35.313943]                                lock(sparse_irq_lock);
> > [   35.313943]   lock((complete)&st->done);
> > [   35.313943] 
> > [   35.313943]  *** DEADLOCK ***
> > [   35.313943] 
> > [   35.313943] 3 locks held by torture_onoff/766:
> > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > [   35.313943] 
> > [   35.313943] stack backtrace:
> > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [   35.313943] Call Trace:
> > [   35.313943]  dump_stack+0x67/0x97
> > [   35.313943]  print_circular_bug+0x21d/0x330
> > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > [   35.313943]  check_prev_add+0x401/0x800
> > [   35.313943]  ? wake_up_q+0x70/0x70
> > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > [   35.313943]  lock_acquire+0x9e/0x1e0
> > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > [   35.313943]  wait_for_completion+0x36/0x130
> > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > [   35.313943]  takedown_cpu+0x86/0xf0
> > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > [   35.313943]  _cpu_down+0xbb/0xf0
> > [   35.313943]  do_cpu_down+0x39/0x50
> > [   35.313943]  cpu_down+0xb/0x10
> > [   35.313943]  torture_offline+0x75/0x140
> > [   35.313943]  torture_onoff+0x102/0x1e0
> > [   35.313943]  kthread+0x142/0x180
> > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > [   35.313943]  ret_from_fork+0x27/0x40
> 


* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  2:06                     ` Byungchul Park
@ 2017-09-19  2:33                       ` Paul E. McKenney
  2017-09-19  2:48                         ` Byungchul Park
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-19  2:33 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx,
	peterz, kernel-team

On Tue, Sep 19, 2017 at 11:06:10AM +0900, Byungchul Park wrote:
> On Tue, Sep 19, 2017 at 10:50:27AM +0900, Byungchul Park wrote:
> > On Mon, Sep 18, 2017 at 04:53:11PM -0700, Paul E. McKenney wrote:
> > > So, Byungchul, any enlightenment?  Please see lockdep splat below.
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > [   35.310179] ======================================================
> > > [   35.310749] WARNING: possible circular locking dependency detected
> > > [   35.310749] 4.13.0-rc4+ #1 Not tainted
> > > [   35.310749] ------------------------------------------------------
> > > [   35.310749] torture_onoff/766 is trying to acquire lock:
> > > [   35.313943]  ((complete)&st->done){+.+.}, at: [<ffffffffb905f5a6>] takedown_cpu+0x86/0xf0
> > > [   35.313943] 
> > > [   35.313943] but task is already holding lock:
> > > [   35.313943]  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > > [   35.313943] 
> > > [   35.313943] which lock already depends on the new lock.
> > > [   35.313943] 
> > > [   35.313943] 
> > > [   35.313943] the existing dependency chain (in reverse order) is:
> > > [   35.313943] 
> > > [   35.313943] -> #1 (sparse_irq_lock){+.+.}:
> > > [   35.313943]        __mutex_lock+0x65/0x960
> > > [   35.313943]        mutex_lock_nested+0x16/0x20
> > > [   35.313943]        irq_lock_sparse+0x12/0x20
> > > [   35.313943]        irq_affinity_online_cpu+0x13/0xd0
> > > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > > [   35.313943] 
> > > [   35.313943] -> #0 ((complete)&st->done){+.+.}:
> > > [   35.313943]        check_prev_add+0x401/0x800
> > > [   35.313943]        __lock_acquire+0x1100/0x11a0
> > > [   35.313943]        lock_acquire+0x9e/0x1e0
> > > [   35.313943]        wait_for_completion+0x36/0x130
> > > [   35.313943]        takedown_cpu+0x86/0xf0
> > > [   35.313943]        cpuhp_invoke_callback+0xa7/0x8b0
> > > [   35.313943]        cpuhp_down_callbacks+0x3d/0x80
> > > [   35.313943]        _cpu_down+0xbb/0xf0
> > > [   35.313943]        do_cpu_down+0x39/0x50
> > > [   35.313943]        cpu_down+0xb/0x10
> > > [   35.313943]        torture_offline+0x75/0x140
> > > [   35.313943]        torture_onoff+0x102/0x1e0
> > > [   35.313943]        kthread+0x142/0x180
> > > [   35.313943]        ret_from_fork+0x27/0x40
> > > [   35.313943] 
> > > [   35.313943] other info that might help us debug this:
> > > [   35.313943] 
> > > [   35.313943]  Possible unsafe locking scenario:
> > > [   35.313943] 
> > > [   35.313943]        CPU0                    CPU1
> > > [   35.313943]        ----                    ----
> > > [   35.313943]   lock(sparse_irq_lock);
> > > [   35.313943]                                lock((complete)&st->done);
> > > [   35.313943]                                lock(sparse_irq_lock);
> > > [   35.313943]   lock((complete)&st->done);
> > > [   35.313943] 
> > > [   35.313943]  *** DEADLOCK ***
> > 
> > Hello Paul and Steven,
> > 
> > This is saying:
> > 
> > Thread A
> > --------
> > takedown_cpu()
> >    irq_lock_sparse()
> >    wait_for_completion(&st->done) // Wait for completion of B
> >    irq_unlock_sparse()
> > 
> > Thread B
> > --------
> > cpuhp_invoke_callback()
> >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
> >    (on the way going to complete(&st->done))
> > 
> > So, lockdep considers this a deadlock.
> > Is it possible for this to happen?
> 
> In addition, if it's impossible, then we should fix lock class
> assignments so that the locks actually have different classes.

Interesting, and thank you for the analysis!

The strange thing is that the way you describe it, this would be a
deterministic deadlock.  Yet CPU hotplug operations complete just fine
in my tests.  What am I missing here?

							Thanx, Paul

> > Thanks,
> > Byungchul
> > 
> > > [   35.313943] 
> > > [   35.313943] 3 locks held by torture_onoff/766:
> > > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > > [   35.313943] 
> > > [   35.313943] stack backtrace:
> > > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > [   35.313943] Call Trace:
> > > [   35.313943]  dump_stack+0x67/0x97
> > > [   35.313943]  print_circular_bug+0x21d/0x330
> > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > [   35.313943]  check_prev_add+0x401/0x800
> > > [   35.313943]  ? wake_up_q+0x70/0x70
> > > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > [   35.313943]  lock_acquire+0x9e/0x1e0
> > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > [   35.313943]  wait_for_completion+0x36/0x130
> > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > > [   35.313943]  takedown_cpu+0x86/0xf0
> > > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > > [   35.313943]  _cpu_down+0xbb/0xf0
> > > [   35.313943]  do_cpu_down+0x39/0x50
> > > [   35.313943]  cpu_down+0xb/0x10
> > > [   35.313943]  torture_offline+0x75/0x140
> > > [   35.313943]  torture_onoff+0x102/0x1e0
> > > [   35.313943]  kthread+0x142/0x180
> > > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > > [   35.313943]  ret_from_fork+0x27/0x40
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  2:33                       ` Paul E. McKenney
@ 2017-09-19  2:48                         ` Byungchul Park
  2017-09-19  4:04                           ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Byungchul Park @ 2017-09-19  2:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx,
	peterz, kernel-team

On Mon, Sep 18, 2017 at 07:33:29PM -0700, Paul E. McKenney wrote:
> > > Hello Paul and Steven,
> > > 
> > > This is saying:
> > > 
> > > Thread A
> > > --------
> > > takedown_cpu()
> > >    irq_lock_sparse()
> > >    wait_for_completion(&st->done) // Wait for completion of B
> > >    irq_unlock_sparse()
> > > 
> > > Thread B
> > > --------
> > > cpuhp_invoke_callback()
> > >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
> > >    (on the way going to complete(&st->done))
> > > 
> > > So, lockdep considers this a deadlock.
> > > Is it possible for this to happen?
> > 
> > In addition, if it's impossible, then we should fix lock class
> > assignments so that the locks actually have different classes.
> 
> Interesting, and thank you for the analysis!
> 
> The strange thing is that the way you describe it, this would be a
> deterministic deadlock.  Yet CPU hotplug operations complete just fine
> in my tests.  What am I missing here?

Hi, :)

Lockdep basically reports either (1) an actual deadlock that happened at the
time or (2) a deadlock possibility, even w/o LOCKDEP_CROSSRELEASE.

Both are useful, but LOCKDEP_CROSSRELEASE can only do the latter. IOW,
the deadlock would actually happen _only_ when the two threads (A and B)
run simultaneously.

In your case, those two threads might run at different times. So it's
not an actual deadlock, but there is still a possibility for the problem
to happen later.

> 							Thanx, Paul
> 
> > > Thanks,
> > > Byungchul
> > > 
> > > > [   35.313943] 
> > > > [   35.313943] 3 locks held by torture_onoff/766:
> > > > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > > > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > > > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > > > [   35.313943] 
> > > > [   35.313943] stack backtrace:
> > > > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > > > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > [   35.313943] Call Trace:
> > > > [   35.313943]  dump_stack+0x67/0x97
> > > > [   35.313943]  print_circular_bug+0x21d/0x330
> > > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > > [   35.313943]  check_prev_add+0x401/0x800
> > > > [   35.313943]  ? wake_up_q+0x70/0x70
> > > > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > > > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > > [   35.313943]  lock_acquire+0x9e/0x1e0
> > > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > > [   35.313943]  wait_for_completion+0x36/0x130
> > > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > > > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > > > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > > > [   35.313943]  takedown_cpu+0x86/0xf0
> > > > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > > > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > > > [   35.313943]  _cpu_down+0xbb/0xf0
> > > > [   35.313943]  do_cpu_down+0x39/0x50
> > > > [   35.313943]  cpu_down+0xb/0x10
> > > > [   35.313943]  torture_offline+0x75/0x140
> > > > [   35.313943]  torture_onoff+0x102/0x1e0
> > > > [   35.313943]  kthread+0x142/0x180
> > > > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > > > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > > > [   35.313943]  ret_from_fork+0x27/0x40
> > 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  2:48                         ` Byungchul Park
@ 2017-09-19  4:04                           ` Paul E. McKenney
  2017-09-19  5:37                             ` Boqun Feng
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-19  4:04 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx,
	peterz, kernel-team

On Tue, Sep 19, 2017 at 11:48:22AM +0900, Byungchul Park wrote:
> On Mon, Sep 18, 2017 at 07:33:29PM -0700, Paul E. McKenney wrote:
> > > > Hello Paul and Steven,
> > > > 
> > > > This is saying:
> > > > 
> > > > Thread A
> > > > --------
> > > > takedown_cpu()
> > > >    irq_lock_sparse()
> > > >    wait_for_completion(&st->done) // Wait for completion of B
> > > >    irq_unlock_sparse()
> > > > 
> > > > Thread B
> > > > --------
> > > > cpuhp_invoke_callback()
> > > >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
> > > >    (on the way going to complete(&st->done))
> > > > 
> > > > So, lockdep considers this a deadlock.
> > > > Is it possible for this to happen?
> > > 
> > > In addition, if it's impossible, then we should fix lock class
> > > assignments so that the locks actually have different classes.
> > 
> > Interesting, and thank you for the analysis!
> > 
> > The strange thing is that the way you describe it, this would be a
> > deterministic deadlock.  Yet CPU hotplug operations complete just fine
> > in my tests.  What am I missing here?
> 
> Hi, :)
> 
> Lockdep basically reports either (1) an actual deadlock that happened at the
> time or (2) a deadlock possibility, even w/o LOCKDEP_CROSSRELEASE.
> 
> Both are useful, but LOCKDEP_CROSSRELEASE can only do the latter. IOW,
> the deadlock would actually happen _only_ when the two threads (A and B)
> run simultaneously.
> 
> In your case, those two threads might run at different times. So it's
> not an actual deadlock, but there is still a possibility for the problem
> to happen later.

Fair enough, if the wakeup always happened first, deadlock might well
be avoided.  If the sleep happened first, I suspect deadlock would
be deterministic in this case.

							Thanx, Paul

> > > > Thanks,
> > > > Byungchul
> > > > 
> > > > > [   35.313943] 
> > > > > [   35.313943] 3 locks held by torture_onoff/766:
> > > > > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > > > > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > > > > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > > > > [   35.313943] 
> > > > > [   35.313943] stack backtrace:
> > > > > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > > > > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > [   35.313943] Call Trace:
> > > > > [   35.313943]  dump_stack+0x67/0x97
> > > > > [   35.313943]  print_circular_bug+0x21d/0x330
> > > > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > > > [   35.313943]  check_prev_add+0x401/0x800
> > > > > [   35.313943]  ? wake_up_q+0x70/0x70
> > > > > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > > > > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > > > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > > > [   35.313943]  lock_acquire+0x9e/0x1e0
> > > > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > > > [   35.313943]  wait_for_completion+0x36/0x130
> > > > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > > > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > > > > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > > > > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > > > > [   35.313943]  takedown_cpu+0x86/0xf0
> > > > > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > > > > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > > > > [   35.313943]  _cpu_down+0xbb/0xf0
> > > > > [   35.313943]  do_cpu_down+0x39/0x50
> > > > > [   35.313943]  cpu_down+0xb/0x10
> > > > > [   35.313943]  torture_offline+0x75/0x140
> > > > > [   35.313943]  torture_onoff+0x102/0x1e0
> > > > > [   35.313943]  kthread+0x142/0x180
> > > > > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > > > > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > > > > [   35.313943]  ret_from_fork+0x27/0x40
> > > 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  4:04                           ` Paul E. McKenney
@ 2017-09-19  5:37                             ` Boqun Feng
  2017-09-19  6:11                               ` Mike Galbraith
  0 siblings, 1 reply; 35+ messages in thread
From: Boqun Feng @ 2017-09-19  5:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Byungchul Park, Steven Rostedt, Neeraj Upadhyay, josh,
	mathieu.desnoyers, jiangshanlai, linux-kernel, sramana, prsood,
	pkondeti, markivx, peterz, kernel-team

On Mon, Sep 18, 2017 at 09:04:56PM -0700, Paul E. McKenney wrote:
> On Tue, Sep 19, 2017 at 11:48:22AM +0900, Byungchul Park wrote:
> > On Mon, Sep 18, 2017 at 07:33:29PM -0700, Paul E. McKenney wrote:
> > > > > Hello Paul and Steven,
> > > > > 

So I think this is another false positive, and the reason is that we use
st->done for multiple purposes.

> > > > > This is saying:
> > > > > 
> > > > > Thread A
> > > > > --------
> > > > > takedown_cpu()
> > > > >    irq_lock_sparse()
> > > > >    wait_for_completion(&st->done) // Wait for completion of B

Thread A waits for the idle task on the outgoing CPU to set st->state to
CPUHP_AP_IDLE_DEAD (i.e. the corresponding complete() is the one in
cpuhp_complete_idle_dead()), and it happens when we try to _offline_ a
CPU.

> > > > >    irq_unlock_sparse()
> > > > > 
> > > > > Thread B
> > > > > --------
> > > > > cpuhp_invoke_callback()
> > > > >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()

irq_affinity_online_cpu() is called here, so it happens when we try to
_online_ a CPU.

> > > > >    (on the way going to complete(&st->done))

and we are going to complete(&st->done) in a hotplug thread context to
indicate that the hotplug thread has finished its job (i.e. this
complete() is the one in cpuhp_thread_fun()).

So even though the &st->done instances are the same, the deadlock could
not happen, I think, as we cannot up/down the same CPU at the same
time?

That is, unless I'm missing something subtle. To fix this, we can either

1)	have dedicated completion instances for different wait purposes
	in cpuhp_cpu_state.

or

2)	extend crossrelease to have a "subclass" concept, so that
	callsites of complete() and wait_for_completion() for the same
	completion instance but with different purposes could be
	distinguished by lockdep.
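A minimal userspace sketch of option (1), purely for illustration: "struct completion" is stubbed out, and the field names done_up/done_down are hypothetical, not taken from any actual patch. The point is simply that the teardown wait in takedown_cpu() and the hotplug-thread handshake would target distinct objects, and so distinct lockdep classes.

```c
#include <assert.h>

/* Stub standing in for the kernel's struct completion. */
struct completion { unsigned int done; };

/*
 * Hypothetical layout: one completion per wait purpose, instead of a
 * single st->done reused for both online and offline handshakes.
 */
struct cpuhp_cpu_state_sketch {
	int state;
	struct completion done_up;	/* signalled when bringup work finishes */
	struct completion done_down;	/* signalled when teardown work finishes */
};

/*
 * With separate instances, the two wait_for_completion() callers can no
 * longer be chained into one dependency cycle by lockdep.
 */
static int waits_are_distinct(const struct cpuhp_cpu_state_sketch *st)
{
	return &st->done_up != &st->done_down;
}
```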

Thoughts?

Regards,
Boqun

> > > > > 
> > > > > So, lockdep considers this a deadlock.
> > > > > Is it possible for this to happen?
> > > > 
> > > > In addition, if it's impossible, then we should fix lock class
> > > > assignments so that the locks actually have different classes.
> > > 
> > > Interesting, and thank you for the analysis!
> > > 
> > > The strange thing is that the way you describe it, this would be a
> > > deterministic deadlock.  Yet CPU hotplug operations complete just fine
> > > in my tests.  What am I missing here?
> > 
> > Hi, :)
> > 
> > Lockdep basically reports either (1) an actual deadlock that happened at the
> > time or (2) a deadlock possibility, even w/o LOCKDEP_CROSSRELEASE.
> > 
> > Both are useful, but LOCKDEP_CROSSRELEASE can only do the latter. IOW,
> > the deadlock would actually happen _only_ when the two threads (A and B)
> > run simultaneously.
> > 
> > In your case, those two threads might run at different times. So it's
> > not an actual deadlock, but there is still a possibility for the problem
> > to happen later.
> 
> Fair enough, if the wakeup always happened first, deadlock might well
> be avoided.  If the sleep happened first, I suspect deadlock would
> be deterministic in this case.
> 
> 							Thanx, Paul
> 
> > > > > Thanks,
> > > > > Byungchul
> > > > > 
> > > > > > [   35.313943] 
> > > > > > [   35.313943] 3 locks held by torture_onoff/766:
> > > > > > [   35.313943]  #0:  (cpu_add_remove_lock){+.+.}, at: [<ffffffffb9060be2>] do_cpu_down+0x22/0x50
> > > > > > [   35.313943]  #1:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffffb90acc41>] percpu_down_write+0x21/0xf0
> > > > > > [   35.313943]  #2:  (sparse_irq_lock){+.+.}, at: [<ffffffffb90c5e42>] irq_lock_sparse+0x12/0x20
> > > > > > [   35.313943] 
> > > > > > [   35.313943] stack backtrace:
> > > > > > [   35.313943] CPU: 7 PID: 766 Comm: torture_onoff Not tainted 4.13.0-rc4+ #1
> > > > > > [   35.313943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > > [   35.313943] Call Trace:
> > > > > > [   35.313943]  dump_stack+0x67/0x97
> > > > > > [   35.313943]  print_circular_bug+0x21d/0x330
> > > > > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > > > > [   35.313943]  check_prev_add+0x401/0x800
> > > > > > [   35.313943]  ? wake_up_q+0x70/0x70
> > > > > > [   35.313943]  __lock_acquire+0x1100/0x11a0
> > > > > > [   35.313943]  ? __lock_acquire+0x1100/0x11a0
> > > > > > [   35.313943]  ? add_lock_to_list.isra.31+0xc0/0xc0
> > > > > > [   35.313943]  lock_acquire+0x9e/0x1e0
> > > > > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > > > > [   35.313943]  wait_for_completion+0x36/0x130
> > > > > > [   35.313943]  ? takedown_cpu+0x86/0xf0
> > > > > > [   35.313943]  ? stop_machine_cpuslocked+0xb9/0xd0
> > > > > > [   35.313943]  ? cpuhp_invoke_callback+0x8b0/0x8b0
> > > > > > [   35.313943]  ? cpuhp_complete_idle_dead+0x10/0x10
> > > > > > [   35.313943]  takedown_cpu+0x86/0xf0
> > > > > > [   35.313943]  cpuhp_invoke_callback+0xa7/0x8b0
> > > > > > [   35.313943]  cpuhp_down_callbacks+0x3d/0x80
> > > > > > [   35.313943]  _cpu_down+0xbb/0xf0
> > > > > > [   35.313943]  do_cpu_down+0x39/0x50
> > > > > > [   35.313943]  cpu_down+0xb/0x10
> > > > > > [   35.313943]  torture_offline+0x75/0x140
> > > > > > [   35.313943]  torture_onoff+0x102/0x1e0
> > > > > > [   35.313943]  kthread+0x142/0x180
> > > > > > [   35.313943]  ? torture_kthread_stopping+0x70/0x70
> > > > > > [   35.313943]  ? kthread_create_on_node+0x40/0x40
> > > > > > [   35.313943]  ret_from_fork+0x27/0x40
> > > > 
> > 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  5:37                             ` Boqun Feng
@ 2017-09-19  6:11                               ` Mike Galbraith
  2017-09-19  6:53                                 ` Byungchul Park
  2017-09-19 13:40                                 ` Paul E. McKenney
  0 siblings, 2 replies; 35+ messages in thread
From: Mike Galbraith @ 2017-09-19  6:11 UTC (permalink / raw)
  To: Boqun Feng, Paul E. McKenney
  Cc: Byungchul Park, Steven Rostedt, Neeraj Upadhyay, josh,
	mathieu.desnoyers, jiangshanlai, linux-kernel, sramana, prsood,
	pkondeti, markivx, peterz, kernel-team

On Tue, 2017-09-19 at 13:37 +0800, Boqun Feng wrote:
> On Mon, Sep 18, 2017 at 09:04:56PM -0700, Paul E. McKenney wrote:
> > On Tue, Sep 19, 2017 at 11:48:22AM +0900, Byungchul Park wrote:
> > > On Mon, Sep 18, 2017 at 07:33:29PM -0700, Paul E. McKenney wrote:
> > > > > > Hello Paul and Steven,
> > > > > > 
> 
> So I think this is another false positive, and the reason is that we use
> st->done for multiple purposes.
> 
> > > > > > This is saying:
> > > > > > 
> > > > > > Thread A
> > > > > > --------
> > > > > > takedown_cpu()
> > > > > >    irq_lock_sparse()
> > > > > >    wait_for_completion(&st->done) // Wait for completion of B
> 
> Thread A waits for the idle task on the outgoing CPU to set st->state to
> CPUHP_AP_IDLE_DEAD (i.e. the corresponding complete() is the one in
> cpuhp_complete_idle_dead()), and it happens when we try to _offline_ a
> CPU.
> 
> > > > > >    irq_unlock_sparse()
> > > > > > 
> > > > > > Thread B
> > > > > > --------
> > > > > > cpuhp_invoke_callback()
> > > > > >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
> 
> irq_affinity_online_cpu() is called here, so it happens when we try to
> _online_ a CPU.
> 
> > > > > >    (on the way going to complete(&st->done))
> 
> and we are going to complete(&st->done) in a hotplug thread context to
> indicate that the hotplug thread has finished its job (i.e. this
> complete() is the one in cpuhp_thread_fun()).
> 
> So even though the &st->done instances are the same, the deadlock could
> not happen, I think, as we cannot up/down the same CPU at the same
> time?
> 
> That is, unless I'm missing something subtle. To fix this, we can either
> 
> 1)	have dedicated completion instances for different wait purposes
> 	in cpuhp_cpu_state.
> 
> or
> 
> 2)	extend crossrelease to have a "subclass" concept, so that
> 	callsites of complete() and wait_for_completion() for the same
> 	completion instance but with different purposes could be
> 	distinguished by lockdep.
> 
> Thoughts?

https://lkml.org/lkml/2017/9/5/184

Peter's patches worked for me, but per tglx, additional (non-
grasshopper level) hotplug-fu is required.

	-Mike

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  6:11                               ` Mike Galbraith
@ 2017-09-19  6:53                                 ` Byungchul Park
  2017-09-19 13:40                                 ` Paul E. McKenney
  1 sibling, 0 replies; 35+ messages in thread
From: Byungchul Park @ 2017-09-19  6:53 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Boqun Feng, Paul E. McKenney, Steven Rostedt, Neeraj Upadhyay,
	josh, mathieu.desnoyers, jiangshanlai, linux-kernel, sramana,
	prsood, pkondeti, markivx, peterz, kernel-team

On Tue, Sep 19, 2017 at 08:11:53AM +0200, Mike Galbraith wrote:
> https://lkml.org/lkml/2017/9/5/184

I have now checked the patches above. They look like a good approach to me.

Thanks,
Byungchul

> Peter's patches worked for me, but per tglx, additional (non-
> grasshopper level) hotplug-fu is required.
> 
> 	-Mike

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19  6:11                               ` Mike Galbraith
  2017-09-19  6:53                                 ` Byungchul Park
@ 2017-09-19 13:40                                 ` Paul E. McKenney
  1 sibling, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-19 13:40 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Boqun Feng, Byungchul Park, Steven Rostedt, Neeraj Upadhyay,
	josh, mathieu.desnoyers, jiangshanlai, linux-kernel, sramana,
	prsood, pkondeti, markivx, peterz, kernel-team

On Tue, Sep 19, 2017 at 08:11:53AM +0200, Mike Galbraith wrote:
> On Tue, 2017-09-19 at 13:37 +0800, Boqun Feng wrote:
> > On Mon, Sep 18, 2017 at 09:04:56PM -0700, Paul E. McKenney wrote:
> > > On Tue, Sep 19, 2017 at 11:48:22AM +0900, Byungchul Park wrote:
> > > > On Mon, Sep 18, 2017 at 07:33:29PM -0700, Paul E. McKenney wrote:
> > > > > > > Hello Paul and Steven,
> > > > > > > 
> > 
> > So I think this is another false positive, and the reason is that we use
> > st->done for multiple purposes.
> > 
> > > > > > > This is saying:
> > > > > > > 
> > > > > > > Thread A
> > > > > > > --------
> > > > > > > takedown_cpu()
> > > > > > >    irq_lock_sparse()
> > > > > > >    wait_for_completion(&st->done) // Wait for completion of B
> > 
> > Thread A waits for the idle task on the outgoing CPU to set st->state to
> > CPUHP_AP_IDLE_DEAD (i.e. the corresponding complete() is the one in
> > cpuhp_complete_idle_dead()), and it happens when we try to _offline_ a
> > CPU.
> > 
> > > > > > >    irq_unlock_sparse()
> > > > > > > 
> > > > > > > Thread B
> > > > > > > --------
> > > > > > > cpuhp_invoke_callback()
> > > > > > >    irq_lock_sparse() // Wait for A to irq_unlock_sparse()
> > 
> > irq_affinity_online_cpu() is called here, so it happens when we try to
> > _online_ a CPU.
> > 
> > > > > > >    (on the way going to complete(&st->done))
> > 
> > and we are going to complete(&st->done) in a hotplug thread context to
> > indicate that the hotplug thread has finished its job (i.e. this
> > complete() is the one in cpuhp_thread_fun()).
> > 
> > So even though the &st->done instances are the same, the deadlock could
> > not happen, I think, as we cannot up/down the same CPU at the same
> > time?
> > 
> > That is, unless I'm missing something subtle. To fix this, we can either
> > 
> > 1)	have dedicated completion instances for different wait purposes
> > 	in cpuhp_cpu_state.
> > 
> > or
> > 
> > 2)	extend crossrelease to have a "subclass" concept, so that
> > 	callsites of complete() and wait_for_completion() for the same
> > 	completion instance but with different purposes could be
> > 	distinguished by lockdep.
> > 
> > Thoughts?
> 
> https://lkml.org/lkml/2017/9/5/184
> 
> Peter's patches worked for me, but per tglx, additional (non-
> grasshopper level) hotplug-fu is required.

Thank you, I will give these a go!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:24           ` Paul E. McKenney
  2017-09-18 16:29             ` Steven Rostedt
@ 2017-09-19 15:31             ` Paul E. McKenney
  2017-09-19 15:58               ` Steven Rostedt
  2017-09-21 13:59               ` Peter Zijlstra
  1 sibling, 2 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-19 15:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Mon, Sep 18, 2017 at 09:24:12AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 18, 2017 at 12:12:13PM -0400, Steven Rostedt wrote:
> > On Mon, 18 Sep 2017 09:01:25 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > 
> > >     sched: Make resched_cpu() unconditional
> > >     
> > >     The current implementation of synchronize_sched_expedited() incorrectly
> > >     assumes that resched_cpu() is unconditional, which it is not.  This means
> > >     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
> > >     fails as follows (analysis by Neeraj Upadhyay):
> > >     
> > >     o    CPU1 is waiting for expedited wait to complete:
> > >     sync_rcu_exp_select_cpus
> > >          rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
> > >          IPI sent to CPU5
> > >     
> > >     synchronize_sched_expedited_wait
> > >              ret = swait_event_timeout(
> > >                                          rsp->expedited_wq,
> > >       sync_rcu_preempt_exp_done(rnp_root),
> > >                                          jiffies_stall);
> > >     
> > >                 expmask = 0x20, and CPU5 is in the idle path (in cpuidle_enter())
> > >     
> > >     o    CPU5 handles IPI and fails to acquire rq lock.
> > >     
> > >     Handles IPI
> > >          sync_sched_exp_handler
> > >              resched_cpu
> > >                  returns after failing to acquire rq->lock via trylock
> > >              need_resched is not set
> > >     
> > >     o    CPU5 calls rcu_idle_enter() and, as need_resched is not set, goes to
> > >          idle (schedule() is not called).
> > >     
> > >     o    CPU 1 reports RCU stall.
> > >     
> > >     Given that resched_cpu() is used only by RCU, this commit fixes the
> > >     assumption by making resched_cpu() unconditional.
> > 
> > Probably want to run this with several workloads with lockdep enabled
> > first.
> 
> As soon as I work through the backlog of lockdep complaints that
> appeared in the last merge window...  :-(

And this patch survived all rcutorture scenarios, including those with
lockdep enabled.  There were failures, but these are pre-existing issues
I am chasing:  Lost timeouts on TREE01 and rt_mutex trying to awaken
an offline CPU in TREE03.

So I have this one queued.  Objections?

							Thanx, Paul

------------------------------------------------------------------------

commit bc43e2e7e08134e6f403ac845edcf4f85668d803
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Sep 18 08:54:40 2017 -0700

    sched: Make resched_cpu() unconditional
    
    The current implementation of synchronize_sched_expedited() incorrectly
    assumes that resched_cpu() is unconditional, which it is not.  This means
    that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
    fails as follows (analysis by Neeraj Upadhyay):
    
    o    CPU1 is waiting for expedited wait to complete:
    sync_rcu_exp_select_cpus
         rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
         IPI sent to CPU5
    
    synchronize_sched_expedited_wait
             ret = swait_event_timeout(
                                         rsp->expedited_wq,
      sync_rcu_preempt_exp_done(rnp_root),
                                         jiffies_stall);
    
                expmask = 0x20, and CPU5 is in the idle path (in cpuidle_enter())
    
    o    CPU5 handles IPI and fails to acquire rq lock.
    
    Handles IPI
         sync_sched_exp_handler
             resched_cpu
                 returns after failing to acquire rq->lock via trylock
             need_resched is not set
    
    o    CPU5 calls rcu_idle_enter() and, as need_resched is not set, goes to
         idle (schedule() is not called).
    
    o    CPU 1 reports RCU stall.
    
    Given that resched_cpu() is used only by RCU, this commit fixes the
    assumption by making resched_cpu() unconditional.
    
    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-		return;
+	raw_spin_lock_irqsave(&rq->lock, flags);
 	resched_curr(rq);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
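The failure mode the patch removes can be illustrated with a small userspace sketch (not kernel code: a pthread mutex stands in for the raw rq->lock spinlock, and the function and variable names are hypothetical). With the old trylock-based version, a contended lock means the need_resched request is silently dropped, which is exactly how CPU5 reached idle without noticing the expedited grace period.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;	/* stand-in for rq->lock */
static int need_resched;

/* Old behaviour: give up if the lock is held -- the request is lost. */
static void resched_cpu_trylock(void)
{
	if (pthread_mutex_trylock(&rq_lock) != 0)
		return;			/* contended: silently drop the request */
	need_resched = 1;
	pthread_mutex_unlock(&rq_lock);
}

/* Patched behaviour: wait for the lock, so the request always lands. */
static void resched_cpu_unconditional(void)
{
	pthread_mutex_lock(&rq_lock);
	need_resched = 1;
	pthread_mutex_unlock(&rq_lock);
}

int lost_wakeup_demo(void)
{
	need_resched = 0;
	pthread_mutex_lock(&rq_lock);	/* simulate contention on rq->lock */
	resched_cpu_trylock();		/* trylock fails; request dropped */
	pthread_mutex_unlock(&rq_lock);
	int dropped = (need_resched == 0);

	resched_cpu_unconditional();	/* patched version sets the flag */
	return dropped && need_resched;	/* 1: old version drops, new one lands */
}
```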

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19 15:31             ` Paul E. McKenney
@ 2017-09-19 15:58               ` Steven Rostedt
  2017-09-19 16:12                 ` Paul E. McKenney
  2017-09-21 13:59               ` Peter Zijlstra
  1 sibling, 1 reply; 35+ messages in thread
From: Steven Rostedt @ 2017-09-19 15:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Tue, 19 Sep 2017 08:31:26 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> commit bc43e2e7e08134e6f403ac845edcf4f85668d803
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Sep 18 08:54:40 2017 -0700
> 
>     sched: Make resched_cpu() unconditional
>     
>     The current implementation of synchronize_sched_expedited() incorrectly
>     assumes that resched_cpu() is unconditional, which it is not.  This means
>     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
>     fails as follows (analysis by Neeraj Upadhyay):
>     
>     o    CPU1 is waiting for expedited wait to complete:
>     sync_rcu_exp_select_cpus
>          rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
>          IPI sent to CPU5
>     
>     synchronize_sched_expedited_wait
>              ret = swait_event_timeout(
>                                          rsp->expedited_wq,
>       sync_rcu_preempt_exp_done(rnp_root),
>                                          jiffies_stall);
>     
>                 expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
>     
>     o    CPU5 handles IPI and fails to acquire rq lock.
>     
>     Handles IPI
>          sync_sched_exp_handler
>              resched_cpu
>                  returns while failing to try lock acquire rq->lock
>              need_resched is not set
>     
>     o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
>          idle (schedule() is not called).
>     
>     o    CPU 1 reports RCU stall.
>     
>     Given that resched_cpu() is used only by RCU, this commit fixes the

"is now only used by RCU", as it was created for another purpose.

>     assumption by making resched_cpu() unconditional.
>     
>     Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Steven Rostedt <rostedt@goodmis.org>

Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

-- Steve

> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cab8c5ec128e..b2281971894c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -505,8 +505,7 @@ void resched_cpu(int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>  
> -	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> -		return;
> +	raw_spin_lock_irqsave(&rq->lock, flags);
>  	resched_curr(rq);
>  	raw_spin_unlock_irqrestore(&rq->lock, flags);
>  }

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19 15:58               ` Steven Rostedt
@ 2017-09-19 16:12                 ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-19 16:12 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx, peterz

On Tue, Sep 19, 2017 at 11:58:59AM -0400, Steven Rostedt wrote:
> On Tue, 19 Sep 2017 08:31:26 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > commit bc43e2e7e08134e6f403ac845edcf4f85668d803
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Mon Sep 18 08:54:40 2017 -0700
> > 
> >     sched: Make resched_cpu() unconditional
> >     
> >     The current implementation of synchronize_sched_expedited() incorrectly
> >     assumes that resched_cpu() is unconditional, which it is not.  This means
> >     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
> >     fails as follows (analysis by Neeraj Upadhyay):
> >     
> >     o    CPU1 is waiting for expedited wait to complete:
> >     sync_rcu_exp_select_cpus
> >          rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
> >          IPI sent to CPU5
> >     
> >     synchronize_sched_expedited_wait
> >              ret = swait_event_timeout(
> >                                          rsp->expedited_wq,
> >       sync_rcu_preempt_exp_done(rnp_root),
> >                                          jiffies_stall);
> >     
> >                 expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
> >     
> >     o    CPU5 handles IPI and fails to acquire rq lock.
> >     
> >     Handles IPI
> >          sync_sched_exp_handler
> >              resched_cpu
> >                  returns while failing to try lock acquire rq->lock
> >              need_resched is not set
> >     
> >     o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
> >          idle (schedule() is not called).
> >     
> >     o    CPU 1 reports RCU stall.
> >     
> >     Given that resched_cpu() is used only by RCU, this commit fixes the
> 
> "is now only used by RCU", as it was created for another purpose.

Good catch, fixed.

> >     assumption by making resched_cpu() unconditional.
> >     
> >     Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> >     Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> >     Cc: Peter Zijlstra <peterz@infradead.org>
> >     Cc: Steven Rostedt <rostedt@goodmis.org>
> 
> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

Applied, thank you!

							Thanx, Paul

> -- Steve
> 
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index cab8c5ec128e..b2281971894c 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -505,8 +505,7 @@ void resched_cpu(int cpu)
> >  	struct rq *rq = cpu_rq(cpu);
> >  	unsigned long flags;
> >  
> > -	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> > -		return;
> > +	raw_spin_lock_irqsave(&rq->lock, flags);
> >  	resched_curr(rq);
> >  	raw_spin_unlock_irqrestore(&rq->lock, flags);
> >  }
> 

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 15:11     ` Steven Rostedt
  2017-09-18 16:01       ` Paul E. McKenney
@ 2017-09-21 13:55       ` Peter Zijlstra
  2017-09-21 15:31         ` Paul E. McKenney
  2017-09-21 15:46         ` Steven Rostedt
  1 sibling, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2017-09-21 13:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Neeraj Upadhyay, paulmck, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx

On Mon, Sep 18, 2017 at 11:11:05AM -0400, Steven Rostedt wrote:
> On Sun, 17 Sep 2017 11:37:06 +0530
> Neeraj Upadhyay <neeraju@codeaurora.org> wrote:
> 
> > Hi Paul, how about replacing raw_spin_trylock_irqsave with
> > raw_spin_lock_irqsave in resched_cpu()? Are there any paths
> > in RCU code, which depend on trylock check/spinlock recursion?
> 
> It looks to me that resched_cpu() was added for nohz full sched
> balancing, but is no longer used by that. The only user is currently
> RCU. Perhaps we should change that from a trylock to a lock.

No, regular NOHZ balancing. NOHZ FULL wasn't conceived back then.

  46cb4b7c88fa ("sched: dynticks idle load balancing")

And yeah, it's no longer used for that.

And given RCU is the only user of that thing, I suppose we can indeed
change it to a full lock.

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-18 16:55               ` Paul E. McKenney
  2017-09-18 23:53                 ` Paul E. McKenney
@ 2017-09-21 13:57                 ` Peter Zijlstra
  2017-09-21 15:33                   ` Paul E. McKenney
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2017-09-21 13:57 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:
> > On Mon, 18 Sep 2017 09:24:12 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > 
> > > As soon as I work through the backlog of lockdep complaints that
> > > appeared in the last merge window...  :-(
> > > 
> > > sparse_irq_lock, I am looking at you!!!  ;-)

That one is a false positive and I have sent patches to address it.

> > I just hit one too, and decided to write a patch to show a chain of 3
> > when applicable.
> > 
> > For example:
> > 
> >  Chain exists of:
> >    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> > 
> >   Possible unsafe locking scenario by crosslock:
> > 
> >         CPU0                    CPU1                    CPU2
> >         ----                    ----                    ----
> >    lock(smpboot_threads_lock);
> >    lock((complete)&self->parked);
> >                                 lock(cpu_hotplug_lock.rw_sem);
> >                                 lock(smpboot_threads_lock);
> >                                                        lock(cpu_hotplug_lock.rw_sem);
> >                                                        unlock((complete)&self->parked);
> > 
> >   *** DEADLOCK ***
> > 
> > :-)
> 
> Nice!!!

That one looks like the watchdog thing, and Thomas was poking at that.

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-19 15:31             ` Paul E. McKenney
  2017-09-19 15:58               ` Steven Rostedt
@ 2017-09-21 13:59               ` Peter Zijlstra
  2017-09-21 16:00                 ` Paul E. McKenney
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2017-09-21 13:59 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Tue, Sep 19, 2017 at 08:31:26AM -0700, Paul E. McKenney wrote:
> So I have this one queued.  Objections?

Changelog reads like it's whitespace damaged.

> commit bc43e2e7e08134e6f403ac845edcf4f85668d803
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Sep 18 08:54:40 2017 -0700
> 
>     sched: Make resched_cpu() unconditional
>     
>     The current implementation of synchronize_sched_expedited() incorrectly
>     assumes that resched_cpu() is unconditional, which it is not.  This means
>     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
>     fails as follows (analysis by Neeraj Upadhyay):
>     
>     o    CPU1 is waiting for expedited wait to complete:
>     sync_rcu_exp_select_cpus
>          rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
>          IPI sent to CPU5
>     
>     synchronize_sched_expedited_wait
>              ret = swait_event_timeout(
>                                          rsp->expedited_wq,
>       sync_rcu_preempt_exp_done(rnp_root),
>                                          jiffies_stall);
>     
>                 expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())
>     
>     o    CPU5 handles IPI and fails to acquire rq lock.
>     
>     Handles IPI
>          sync_sched_exp_handler
>              resched_cpu
>                  returns while failing to try lock acquire rq->lock
>              need_resched is not set
>     
>     o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
>          idle (schedule() is not called).
>     
>     o    CPU 1 reports RCU stall.
>     
>     Given that resched_cpu() is used only by RCU, this commit fixes the
>     assumption by making resched_cpu() unconditional.
>     
>     Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Steven Rostedt <rostedt@goodmis.org>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cab8c5ec128e..b2281971894c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -505,8 +505,7 @@ void resched_cpu(int cpu)
>  	struct rq *rq = cpu_rq(cpu);
>  	unsigned long flags;
>  
> -	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
> -		return;
> +	raw_spin_lock_irqsave(&rq->lock, flags);
>  	resched_curr(rq);
>  	raw_spin_unlock_irqrestore(&rq->lock, flags);
>  }
> 

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 13:55       ` Peter Zijlstra
@ 2017-09-21 15:31         ` Paul E. McKenney
  2017-09-21 16:18           ` Peter Zijlstra
  2017-09-21 15:46         ` Steven Rostedt
  1 sibling, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-21 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, Sep 21, 2017 at 03:55:46PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 18, 2017 at 11:11:05AM -0400, Steven Rostedt wrote:
> > On Sun, 17 Sep 2017 11:37:06 +0530
> > Neeraj Upadhyay <neeraju@codeaurora.org> wrote:
> > 
> > > Hi Paul, how about replacing raw_spin_trylock_irqsave with
> > > raw_spin_lock_irqsave in resched_cpu()? Are there any paths
> > > in RCU code, which depend on trylock check/spinlock recursion?
> > 
> > It looks to me that resched_cpu() was added for nohz full sched
> > balancing, but is no longer used by that. The only user is currently
> > RCU. Perhaps we should change that from a trylock to a lock.
> 
> No, regular NOHZ balancing. NOHZ FULL wasn't conceived back then.
> 
>   46cb4b7c88fa ("sched: dynticks idle load balancing")
> 
> And yeah, it's no longer used for that.
> 
> And given RCU is the only user of that thing, I suppose we can indeed
> change it to a full lock.

Thank you!  May I have your ack?

							Thanx, Paul

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 13:57                 ` Peter Zijlstra
@ 2017-09-21 15:33                   ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-21 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, Sep 21, 2017 at 03:57:49PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 18, 2017 at 09:55:27AM -0700, Paul E. McKenney wrote:
> > On Mon, Sep 18, 2017 at 12:29:31PM -0400, Steven Rostedt wrote:
> > > On Mon, 18 Sep 2017 09:24:12 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > 
> > > > As soon as I work through the backlog of lockdep complaints that
> > > > appeared in the last merge window...  :-(
> > > > 
> > > > sparse_irq_lock, I am looking at you!!!  ;-)
> 
> That one is a false positive and I have send patches to address.

I did try them out, and they work fine for me when lockdep is enabled.
I get build failures if lockdep is not enabled.  Things are a bit crazy
here, so I have not had a chance to try to fix them, though it should
not be a big deal (easy for me to say!).

> > > I just hit one too, and decided to write a patch to show a chain of 3
> > > when applicable.
> > > 
> > > For example:
> > > 
> > >  Chain exists of:
> > >    cpu_hotplug_lock.rw_sem --> smpboot_threads_lock --> (complete)&self->parked
> > > 
> > >   Possible unsafe locking scenario by crosslock:
> > > 
> > >         CPU0                    CPU1                    CPU2
> > >         ----                    ----                    ----
> > >    lock(smpboot_threads_lock);
> > >    lock((complete)&self->parked);
> > >                                 lock(cpu_hotplug_lock.rw_sem);
> > >                                 lock(smpboot_threads_lock);
> > >                                                        lock(cpu_hotplug_lock.rw_sem);
> > >                                                        unlock((complete)&self->parked);
> > > 
> > >   *** DEADLOCK ***
> > > 
> > > :-)
> > 
> > Nice!!!
> 
> That one looks like the watchdog thing, and Thomas was poking at that.

For whatever it is worth, I am still chasing lost-timer bugs.  I now know
of a large number of things that are not the cause of the problem.  :-/

							Thanx, Paul

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 13:55       ` Peter Zijlstra
  2017-09-21 15:31         ` Paul E. McKenney
@ 2017-09-21 15:46         ` Steven Rostedt
  1 sibling, 0 replies; 35+ messages in thread
From: Steven Rostedt @ 2017-09-21 15:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Neeraj Upadhyay, paulmck, josh, mathieu.desnoyers, jiangshanlai,
	linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, 21 Sep 2017 15:55:46 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Sep 18, 2017 at 11:11:05AM -0400, Steven Rostedt wrote:
> > On Sun, 17 Sep 2017 11:37:06 +0530
> > Neeraj Upadhyay <neeraju@codeaurora.org> wrote:
> >   
> > > Hi Paul, how about replacing raw_spin_trylock_irqsave with
> > > raw_spin_lock_irqsave in resched_cpu()? Are there any paths
> > > in RCU code, which depend on trylock check/spinlock recursion?  
> > 
> > It looks to me that resched_cpu() was added for nohz full sched
> > balancing, but is no longer used by that. The only user is currently
> > RCU. Perhaps we should change that from a trylock to a lock.  
> 
> No, regular NOHZ balancing. NOHZ FULL wasn't conceived back then.
> 
>   46cb4b7c88fa ("sched: dynticks idle load balancing")

Bah, every time I type NOHZ today I add FULL after it. I did mean NOHZ
without FULL. :-p

-- Steve

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 13:59               ` Peter Zijlstra
@ 2017-09-21 16:00                 ` Paul E. McKenney
  2017-09-21 16:30                   ` Peter Zijlstra
  0 siblings, 1 reply; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-21 16:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, Sep 21, 2017 at 03:59:46PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 19, 2017 at 08:31:26AM -0700, Paul E. McKenney wrote:
> > So I have this one queued.  Objections?
> 
> Changelog reads like it's whitespace damaged.

It does, now that you mention it.  How about the updated version below?

							Thanx, Paul

------------------------------------------------------------------------

commit c21c9b78182e35eb0e72ef4e3bba3054f26eaaea
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Sep 18 08:54:40 2017 -0700

    sched: Make resched_cpu() unconditional
    
    The current implementation of synchronize_sched_expedited() incorrectly
    assumes that resched_cpu() is unconditional, which it is not.  This means
    that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
    fails as follows (analysis by Neeraj Upadhyay):
    
    o	CPU1 is waiting for expedited wait to complete:
    
    	sync_rcu_exp_select_cpus
    	     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
    	     IPI sent to CPU5
    
    	synchronize_sched_expedited_wait
    		 ret = swait_event_timeout(rsp->expedited_wq,
    					   sync_rcu_preempt_exp_done(rnp_root),
    					   jiffies_stall);
    
    	expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
    
    o	CPU5 handles IPI and fails to acquire rq lock.
    
    	Handles IPI
    	     sync_sched_exp_handler
    		 resched_cpu
    		     returns while failing to try lock acquire rq->lock
    		 need_resched is not set
    
    o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
         idle (schedule() is not called).
    
    o    CPU 1 reports RCU stall.
    
    Given that resched_cpu() is now used only by RCU, this commit fixes the
    assumption by making resched_cpu() unconditional.
    
    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-		return;
+	raw_spin_lock_irqsave(&rq->lock, flags);
 	resched_curr(rq);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 15:31         ` Paul E. McKenney
@ 2017-09-21 16:18           ` Peter Zijlstra
  0 siblings, 0 replies; 35+ messages in thread
From: Peter Zijlstra @ 2017-09-21 16:18 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, Sep 21, 2017 at 08:31:34AM -0700, Paul E. McKenney wrote:
> > And given RCU is the only user of that thing, I suppose we can indeed
> > change it to a full lock.
> 
> Thank you!  May I have your ack?

Last version I saw still had a Changelog that looked like whitespace
damage.

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 16:00                 ` Paul E. McKenney
@ 2017-09-21 16:30                   ` Peter Zijlstra
  2017-09-21 16:47                     ` Paul E. McKenney
  0 siblings, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2017-09-21 16:30 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, Sep 21, 2017 at 09:00:48AM -0700, Paul E. McKenney wrote:
> commit c21c9b78182e35eb0e72ef4e3bba3054f26eaaea
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Mon Sep 18 08:54:40 2017 -0700
> 
>     sched: Make resched_cpu() unconditional
>     
>     The current implementation of synchronize_sched_expedited() incorrectly
>     assumes that resched_cpu() is unconditional, which it is not.  This means
>     that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
>     fails as follows (analysis by Neeraj Upadhyay):
>     
>     o	CPU1 is waiting for expedited wait to complete:
>     
>     	sync_rcu_exp_select_cpus
>     	     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
>     	     IPI sent to CPU5
>     
>     	synchronize_sched_expedited_wait
>     		 ret = swait_event_timeout(rsp->expedited_wq,
>     					   sync_rcu_preempt_exp_done(rnp_root),
>     					   jiffies_stall);
>     
>     	expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
>     
>     o	CPU5 handles IPI and fails to acquire rq lock.
>     
>     	Handles IPI
>     	     sync_sched_exp_handler
>     		 resched_cpu
>     		     returns while failing to try lock acquire rq->lock
>     		 need_resched is not set
>     
>     o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
>          idle (schedule() is not called).
>     
>     o    CPU 1 reports RCU stall.


Inconsistent spacing after your bullet 'o': the first two points have a
space, the last two a tab or so.

>     Given that resched_cpu() is now used only by RCU, this commit fixes the
>     assumption by making resched_cpu() unconditional.

Other than that, yes looks _much_ better, thanks!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Also, you might want to tag it for stable.

* Re: Query regarding synchronize_sched_expedited and resched_cpu
  2017-09-21 16:30                   ` Peter Zijlstra
@ 2017-09-21 16:47                     ` Paul E. McKenney
  0 siblings, 0 replies; 35+ messages in thread
From: Paul E. McKenney @ 2017-09-21 16:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, Neeraj Upadhyay, josh, mathieu.desnoyers,
	jiangshanlai, linux-kernel, sramana, prsood, pkondeti, markivx

On Thu, Sep 21, 2017 at 06:30:12PM +0200, Peter Zijlstra wrote:
> On Thu, Sep 21, 2017 at 09:00:48AM -0700, Paul E. McKenney wrote:
> > commit c21c9b78182e35eb0e72ef4e3bba3054f26eaaea

[ . . . ]

> Inconsistent spacing after your bullet 'o', first two points have a
> space the last two a tab or so.
> 
> >     Given that resched_cpu() is now used only by RCU, this commit fixes the
> >     assumption by making resched_cpu() unconditional.
> 
> Other than that, yes looks _much_ better, thanks!
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> Also, you might want to tag it for stable.

Like this, then?

							Thanx, Paul

------------------------------------------------------------------------

commit 62d94c97e96d5aa6a977a53dd007029df7a65586
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Mon Sep 18 08:54:40 2017 -0700

    sched: Make resched_cpu() unconditional
    
    The current implementation of synchronize_sched_expedited() incorrectly
    assumes that resched_cpu() is unconditional, which it is not.  This means
    that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
    fails as follows (analysis by Neeraj Upadhyay):
    
    o	CPU1 is waiting for expedited wait to complete:
    
    	sync_rcu_exp_select_cpus
    	     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
    	     IPI sent to CPU5
    
    	synchronize_sched_expedited_wait
    		 ret = swait_event_timeout(rsp->expedited_wq,
    					   sync_rcu_preempt_exp_done(rnp_root),
    					   jiffies_stall);
    
    	expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
    
    o	CPU5 handles IPI and fails to acquire rq lock.
    
    	Handles IPI
    	     sync_sched_exp_handler
    		 resched_cpu
    		     returns while failing to try lock acquire rq->lock
    		 need_resched is not set
    
    o	CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
    	idle (schedule() is not called).
    
    o	CPU 1 reports RCU stall.
    
    Given that resched_cpu() is now used only by RCU, this commit fixes the
    assumption by making resched_cpu() unconditional.
    
    Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: stable@vger.kernel.org

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cab8c5ec128e..b2281971894c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -505,8 +505,7 @@ void resched_cpu(int cpu)
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
-	if (!raw_spin_trylock_irqsave(&rq->lock, flags))
-		return;
+	raw_spin_lock_irqsave(&rq->lock, flags);
 	resched_curr(rq);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }

Thread overview: 35+ messages
2017-09-15 11:14 Query regarding synchronize_sched_expedited and resched_cpu Neeraj Upadhyay
2017-09-17  1:00 ` Paul E. McKenney
2017-09-17  6:07   ` Neeraj Upadhyay
2017-09-18 15:11     ` Steven Rostedt
2017-09-18 16:01       ` Paul E. McKenney
2017-09-18 16:12         ` Steven Rostedt
2017-09-18 16:24           ` Paul E. McKenney
2017-09-18 16:29             ` Steven Rostedt
2017-09-18 16:55               ` Paul E. McKenney
2017-09-18 23:53                 ` Paul E. McKenney
2017-09-19  1:23                   ` Steven Rostedt
2017-09-19  2:26                     ` Paul E. McKenney
2017-09-19  1:50                   ` Byungchul Park
2017-09-19  2:06                     ` Byungchul Park
2017-09-19  2:33                       ` Paul E. McKenney
2017-09-19  2:48                         ` Byungchul Park
2017-09-19  4:04                           ` Paul E. McKenney
2017-09-19  5:37                             ` Boqun Feng
2017-09-19  6:11                               ` Mike Galbraith
2017-09-19  6:53                                 ` Byungchul Park
2017-09-19 13:40                                 ` Paul E. McKenney
2017-09-21 13:57                 ` Peter Zijlstra
2017-09-21 15:33                   ` Paul E. McKenney
2017-09-19  1:55               ` Byungchul Park
2017-09-19 15:31             ` Paul E. McKenney
2017-09-19 15:58               ` Steven Rostedt
2017-09-19 16:12                 ` Paul E. McKenney
2017-09-21 13:59               ` Peter Zijlstra
2017-09-21 16:00                 ` Paul E. McKenney
2017-09-21 16:30                   ` Peter Zijlstra
2017-09-21 16:47                     ` Paul E. McKenney
2017-09-21 13:55       ` Peter Zijlstra
2017-09-21 15:31         ` Paul E. McKenney
2017-09-21 16:18           ` Peter Zijlstra
2017-09-21 15:46         ` Steven Rostedt
