All of lore.kernel.org
 help / color / mirror / Atom feed
* rcu_prempt stalls / lockup
@ 2014-03-31 23:02 Dave Jones
  2014-03-31 23:22 ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-03-31 23:02 UTC (permalink / raw)
  To: paulmck; +Cc: Linux Kernel

You can tell the merge window is open, because I'm back to breaking RCU.

[ 1023.934532] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1023.935501]  (detected by 0, t=6502 jiffies, g=28666, c=28665, q=0)
[ 1023.936299] INFO: Stall ended before state dump start
[ 1218.871404] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1218.872399]  (detected by 0, t=26007 jiffies, g=28666, c=28665, q=0)
[ 1218.873256] INFO: Stall ended before state dump start
[ 1413.808406] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1413.809387]  (detected by 0, t=45512 jiffies, g=28666, c=28665, q=0)
[ 1413.810113] INFO: Stall ended before state dump start
[ 1608.745637] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1608.746614]  (detected by 0, t=65017 jiffies, g=28666, c=28665, q=0)
[ 1608.747327] INFO: Stall ended before state dump start
[ 1803.682892] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1803.683876]  (detected by 1, t=84522 jiffies, g=28666, c=28665, q=0)
[ 1803.684741] INFO: Stall ended before state dump start
[ 1998.620169] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1998.621123]  (detected by 0, t=104027 jiffies, g=28666, c=28665, q=0)
[ 1998.621916] INFO: Stall ended before state dump start
[ 2193.557459] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2193.558425]  (detected by 0, t=123532 jiffies, g=28666, c=28665, q=0)
[ 2193.559248] INFO: Stall ended before state dump start
[ 2388.494719] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2388.495688]  (detected by 0, t=143037 jiffies, g=28666, c=28665, q=0)
[ 2388.496417] INFO: Stall ended before state dump start
[ 2583.431983] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2583.432967]  (detected by 0, t=162542 jiffies, g=28666, c=28665, q=0)
[ 2583.433852] INFO: Stall ended before state dump start
[ 2778.369242] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2778.370235]  (detected by 0, t=182047 jiffies, g=28666, c=28665, q=0)
[ 2778.371037] INFO: Stall ended before state dump start
[ 2973.306537] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2973.307540]  (detected by 0, t=201552 jiffies, g=28666, c=28665, q=0)
[ 2973.308347] INFO: Stall ended before state dump start
[ 3168.244036] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 3168.245056]  (detected by 0, t=221057 jiffies, g=28666, c=28665, q=0)
[ 3168.245805] INFO: Stall ended before state dump start
[ 3363.181458] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 3363.182496]  (detected by 0, t=240562 jiffies, g=28666, c=28665, q=0)
[ 3363.183289] INFO: Stall ended before state dump start
[ 3558.118835] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 3558.119879]  (detected by 0, t=260067 jiffies, g=28666, c=28665, q=0)
[ 3558.120739] INFO: Stall ended before state dump start

at that point, userspace stopped responding. cursor on console was blinking,
but I couldn't even switch tty's, or sysrq dump.

rc8 was fine, so this is todays rcu changes.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-03-31 23:02 rcu_prempt stalls / lockup Dave Jones
@ 2014-03-31 23:22 ` Paul E. McKenney
  2014-03-31 23:35   ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-03-31 23:22 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Mon, Mar 31, 2014 at 07:02:41PM -0400, Dave Jones wrote:
> You can tell the merge window is open, because I'm back to breaking RCU.
> 
> [ 1023.934532] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1023.935501]  (detected by 0, t=6502 jiffies, g=28666, c=28665, q=0)
> [ 1023.936299] INFO: Stall ended before state dump start
> [ 1218.871404] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1218.872399]  (detected by 0, t=26007 jiffies, g=28666, c=28665, q=0)
> [ 1218.873256] INFO: Stall ended before state dump start
> [ 1413.808406] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1413.809387]  (detected by 0, t=45512 jiffies, g=28666, c=28665, q=0)
> [ 1413.810113] INFO: Stall ended before state dump start
> [ 1608.745637] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1608.746614]  (detected by 0, t=65017 jiffies, g=28666, c=28665, q=0)
> [ 1608.747327] INFO: Stall ended before state dump start
> [ 1803.682892] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1803.683876]  (detected by 1, t=84522 jiffies, g=28666, c=28665, q=0)
> [ 1803.684741] INFO: Stall ended before state dump start
> [ 1998.620169] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1998.621123]  (detected by 0, t=104027 jiffies, g=28666, c=28665, q=0)
> [ 1998.621916] INFO: Stall ended before state dump start
> [ 2193.557459] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2193.558425]  (detected by 0, t=123532 jiffies, g=28666, c=28665, q=0)
> [ 2193.559248] INFO: Stall ended before state dump start
> [ 2388.494719] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2388.495688]  (detected by 0, t=143037 jiffies, g=28666, c=28665, q=0)
> [ 2388.496417] INFO: Stall ended before state dump start
> [ 2583.431983] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2583.432967]  (detected by 0, t=162542 jiffies, g=28666, c=28665, q=0)
> [ 2583.433852] INFO: Stall ended before state dump start
> [ 2778.369242] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2778.370235]  (detected by 0, t=182047 jiffies, g=28666, c=28665, q=0)
> [ 2778.371037] INFO: Stall ended before state dump start
> [ 2973.306537] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2973.307540]  (detected by 0, t=201552 jiffies, g=28666, c=28665, q=0)
> [ 2973.308347] INFO: Stall ended before state dump start
> [ 3168.244036] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 3168.245056]  (detected by 0, t=221057 jiffies, g=28666, c=28665, q=0)
> [ 3168.245805] INFO: Stall ended before state dump start
> [ 3363.181458] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 3363.182496]  (detected by 0, t=240562 jiffies, g=28666, c=28665, q=0)
> [ 3363.183289] INFO: Stall ended before state dump start
> [ 3558.118835] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 3558.119879]  (detected by 0, t=260067 jiffies, g=28666, c=28665, q=0)
> [ 3558.120739] INFO: Stall ended before state dump start
> 
> at that point, userspace stopped responding. cursor on console was blinking,
> but I couldn't even switch tty's, or sysrq dump.
> 
> rc8 was fine, so this is todays rcu changes.

New one on me!  Any chance of a .config file?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-03-31 23:22 ` Paul E. McKenney
@ 2014-03-31 23:35   ` Dave Jones
  2014-04-01  0:48     ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-03-31 23:35 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Mon, Mar 31, 2014 at 04:22:21PM -0700, Paul E. McKenney wrote:
 > On Mon, Mar 31, 2014 at 07:02:41PM -0400, Dave Jones wrote:
 > > You can tell the merge window is open, because I'm back to breaking RCU.
 > > 
 > > ... 
 > > [ 3558.120739] INFO: Stall ended before state dump start
 > > 
 > > at that point, userspace stopped responding. cursor on console was blinking,
 > > but I couldn't even switch tty's, or sysrq dump.
 > > 
 > > rc8 was fine, so this is todays rcu changes.
 > 
 > New one on me!  Any chance of a .config file?

http://paste.fedoraproject.org/90449/30888213/raw/

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-03-31 23:35   ` Dave Jones
@ 2014-04-01  0:48     ` Paul E. McKenney
  2014-04-01 15:08       ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-01  0:48 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Mon, Mar 31, 2014 at 07:35:52PM -0400, Dave Jones wrote:
> On Mon, Mar 31, 2014 at 04:22:21PM -0700, Paul E. McKenney wrote:
>  > On Mon, Mar 31, 2014 at 07:02:41PM -0400, Dave Jones wrote:
>  > > You can tell the merge window is open, because I'm back to breaking RCU.
>  > > 
>  > > ... 
>  > > [ 3558.120739] INFO: Stall ended before state dump start
>  > > 
>  > > at that point, userspace stopped responding. cursor on console was blinking,
>  > > but I couldn't even switch tty's, or sysrq dump.

Hmmm...  I am having a very hard time imagining any of this merge
window's RCU changes preventing a sysrq dump.  On the other hand,
having a single grace period persist without anything blocking it
is pretty strange as well.

I would hope that the sysrq path does not allocate memory, but who knows?
After all, one possible reason for the eventual hang is memory exhaustion.
So one thing to try is to do sysrq earlier in the process.  (Yeah,
I know, tough to do if you have lots of scripted systems.)

>  > > rc8 was fine, so this is todays rcu changes.
>  > 
>  > New one on me!  Any chance of a .config file?
> 
> http://paste.fedoraproject.org/90449/30888213/raw/

Given that you have CONFIG_RCU_NOCB_CPU_ALL=y, all the grace-period
activity is being driven by the grace-period kthreads ("rcu_preempt"
in this case).  This leads me to wonder if your workload if preventing
RCU's grace-period kthreads from running.  These kthreads are SCHED_OTHER,
so could potentially be preempted for a long time.  But I would expect
a softlockup message in that case.

Alternatively, I suppose a wakeup could be getting lost.  The main change
related to that this merge window was ffa83fb565fb, which eliminated
idle wakeups from RCU in the CONFIG_RCU_NOCB_CPU_ALL=y case.

So, could you please try reverting ffa83fb565fb?

If that doesn't work, I will need to put together some diagnostic patches.
Starting with the one below.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0c47e300210a..c5a163378710 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -936,7 +936,7 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
 	       smp_processor_id(), (long)(jiffies - rsp->gp_start),
 	       rsp->gpnum, rsp->completed, totqlen);
 	if (ndetected == 0)
-		pr_err("INFO: Stall ended before state dump start\n");
+		pr_err("INFO: Stall ended before state dump start, gp_kthread state: %#lx\n", rsp->gp_kthread->state);
 	else if (!trigger_all_cpu_backtrace())
 		rcu_dump_cpu_stacks(rsp);
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01  0:48     ` Paul E. McKenney
@ 2014-04-01 15:08       ` Dave Jones
  2014-04-01 15:30         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-01 15:08 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Mon, Mar 31, 2014 at 05:48:01PM -0700, Paul E. McKenney wrote:
 
 > Alternatively, I suppose a wakeup could be getting lost.  The main change
 > related to that this merge window was ffa83fb565fb, which eliminated
 > idle wakeups from RCU in the CONFIG_RCU_NOCB_CPU_ALL=y case.
 > 
 > So, could you please try reverting ffa83fb565fb?

Did a run overnight with that reverted, no luck..

[11869.580161] INFO: rcu_preempt detected stalls on CPUs/tasks:
[11869.581245]  (detected by 0, t=6502 jiffies, g=449892, c=449891, q=0)
[11869.582047] INFO: Stall ended before state dump start
[12064.517461] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12064.518539]  (detected by 0, t=26007 jiffies, g=449892, c=449891, q=0)
[12064.519347] INFO: Stall ended before state dump start
[12259.454767] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12259.455818]  (detected by 0, t=45512 jiffies, g=449892, c=449891, q=0)
[12259.456602] INFO: Stall ended before state dump start
[12454.392112] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12454.393131]  (detected by 0, t=65017 jiffies, g=449892, c=449891, q=0)
[12454.393942] INFO: Stall ended before state dump start
[12649.329446] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12649.330446]  (detected by 0, t=84522 jiffies, g=449892, c=449891, q=0)
[12649.331267] INFO: Stall ended before state dump start
[12844.266776] INFO: rcu_preempt detected stalls on CPUs/tasks:
[12844.267695]  (detected by 0, t=104027 jiffies, g=449892, c=449891, q=0)
[12844.268571] INFO: Stall ended before state dump start
[13039.204102] INFO: rcu_preempt detected stalls on CPUs/tasks:
[13039.205119]  (detected by 0, t=123532 jiffies, g=449892, c=449891, q=0)
[13039.205871] INFO: Stall ended before state dump start
[13234.141424] INFO: rcu_preempt detected stalls on CPUs/tasks:
[13234.141424] INFO: rcu_preempt detected stalls on CPUs/tasks:
[13234.142448]  (detected by 0, t=143037 jiffies, g=449892, c=449891, q=0)
[13234.143339] INFO: Stall ended before state dump start
[13429.078730] INFO: rcu_preempt detected stalls on CPUs/tasks:
[13429.079771]  (detected by 0, t=162542 jiffies, g=449892, c=449891, q=0)
[13429.080591] INFO: Stall ended before state dump start
[13624.016058] INFO: rcu_preempt detected stalls on CPUs/tasks:
[13624.017116]  (detected by 0, t=182047 jiffies, g=449892, c=449891, q=0)
[13624.018033] INFO: Stall ended before state dump start
[13818.953370] INFO: rcu_preempt detected stalls on CPUs/tasks:
[13818.954437]  (detected by 0, t=201552 jiffies, g=449892, c=449891, q=0)
[13818.955334] INFO: Stall ended before state dump start
[14013.890682] INFO: rcu_preempt detected stalls on CPUs/tasks:
[14013.891761]  (detected by 0, t=221057 jiffies, g=449892, c=449891, q=0)
[14013.892691] INFO: Stall ended before state dump start
[14208.827993] INFO: rcu_preempt detected stalls on CPUs/tasks:
[14208.829088]  (detected by 0, t=240562 jiffies, g=449892, c=449891, q=0)
[14208.829977] INFO: Stall ended before state dump start
[14403.765300] INFO: rcu_preempt detected stalls on CPUs/tasks:
[14403.766405]  (detected by 0, t=260067 jiffies, g=449892, c=449891, q=0)
and on and on..

so kernel space still works like before, but userspace is locked up.

 > If that doesn't work, I will need to put together some diagnostic patches.
 > Starting with the one below.

bugger, I forgot to apply this last night. I'll do it on the next run.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 15:08       ` Dave Jones
@ 2014-04-01 15:30         ` Paul E. McKenney
  2014-04-01 17:22           ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-01 15:30 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Tue, Apr 01, 2014 at 11:08:49AM -0400, Dave Jones wrote:
> On Mon, Mar 31, 2014 at 05:48:01PM -0700, Paul E. McKenney wrote:
> 
>  > Alternatively, I suppose a wakeup could be getting lost.  The main change
>  > related to that this merge window was ffa83fb565fb, which eliminated
>  > idle wakeups from RCU in the CONFIG_RCU_NOCB_CPU_ALL=y case.
>  > 
>  > So, could you please try reverting ffa83fb565fb?
> 
> Did a run overnight with that reverted, no luck..
> 
> [11869.580161] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [11869.581245]  (detected by 0, t=6502 jiffies, g=449892, c=449891, q=0)
> [11869.582047] INFO: Stall ended before state dump start
> [12064.517461] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [12064.518539]  (detected by 0, t=26007 jiffies, g=449892, c=449891, q=0)
> [12064.519347] INFO: Stall ended before state dump start
> [12259.454767] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [12259.455818]  (detected by 0, t=45512 jiffies, g=449892, c=449891, q=0)
> [12259.456602] INFO: Stall ended before state dump start
> [12454.392112] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [12454.393131]  (detected by 0, t=65017 jiffies, g=449892, c=449891, q=0)
> [12454.393942] INFO: Stall ended before state dump start
> [12649.329446] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [12649.330446]  (detected by 0, t=84522 jiffies, g=449892, c=449891, q=0)
> [12649.331267] INFO: Stall ended before state dump start
> [12844.266776] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [12844.267695]  (detected by 0, t=104027 jiffies, g=449892, c=449891, q=0)
> [12844.268571] INFO: Stall ended before state dump start
> [13039.204102] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [13039.205119]  (detected by 0, t=123532 jiffies, g=449892, c=449891, q=0)
> [13039.205871] INFO: Stall ended before state dump start
> [13234.141424] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [13234.141424] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [13234.142448]  (detected by 0, t=143037 jiffies, g=449892, c=449891, q=0)
> [13234.143339] INFO: Stall ended before state dump start
> [13429.078730] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [13429.079771]  (detected by 0, t=162542 jiffies, g=449892, c=449891, q=0)
> [13429.080591] INFO: Stall ended before state dump start
> [13624.016058] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [13624.017116]  (detected by 0, t=182047 jiffies, g=449892, c=449891, q=0)
> [13624.018033] INFO: Stall ended before state dump start
> [13818.953370] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [13818.954437]  (detected by 0, t=201552 jiffies, g=449892, c=449891, q=0)
> [13818.955334] INFO: Stall ended before state dump start
> [14013.890682] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [14013.891761]  (detected by 0, t=221057 jiffies, g=449892, c=449891, q=0)
> [14013.892691] INFO: Stall ended before state dump start
> [14208.827993] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [14208.829088]  (detected by 0, t=240562 jiffies, g=449892, c=449891, q=0)
> [14208.829977] INFO: Stall ended before state dump start
> [14403.765300] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [14403.766405]  (detected by 0, t=260067 jiffies, g=449892, c=449891, q=0)
> and on and on..
> 
> so kernel space still works like before, but userspace is locked up.

Interesting.  I suspect that if you reverted the rest of this merge
window's RCU patches, you would get the same result.

>  > If that doesn't work, I will need to put together some diagnostic patches.
>  > Starting with the one below.
> 
> bugger, I forgot to apply this last night. I'll do it on the next run.

But let's see what this shows first.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 15:30         ` Paul E. McKenney
@ 2014-04-01 17:22           ` Dave Jones
  2014-04-01 17:55             ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-01 17:22 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Tue, Apr 01, 2014 at 08:30:32AM -0700, Paul E. McKenney wrote:
 > > [14403.765300] INFO: rcu_preempt detected stalls on CPUs/tasks:
 > > [14403.766405]  (detected by 0, t=260067 jiffies, g=449892, c=449891, q=0)
 > > and on and on..
 > > 
 > > so kernel space still works like before, but userspace is locked up.
 > 
 > Interesting.  I suspect that if you reverted the rest of this merge
 > window's RCU patches, you would get the same result.
 > 
 > >  > If that doesn't work, I will need to put together some diagnostic patches.
 > >  > Starting with the one below.
 > > 
 > > bugger, I forgot to apply this last night. I'll do it on the next run.
 > 
 > But let's see what this shows first.

[ 1173.923463] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1173.924457] 	(detected by 0, t=6502 jiffies, g=47728, c=47727, q=0)
[ 1173.925199] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1368.859744] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1368.860754] 	(detected by 0, t=26007 jiffies, g=47728, c=47727, q=0)
[ 1368.861559] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1563.796416] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1563.797426] 	(detected by 0, t=45512 jiffies, g=47728, c=47727, q=0)
[ 1563.798293] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1758.733747] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1758.734725] 	(detected by 0, t=65017 jiffies, g=47728, c=47727, q=0)
[ 1758.735526] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1953.670944] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1953.671932] 	(detected by 0, t=84522 jiffies, g=47728, c=47727, q=0)
[ 1953.672735] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 2148.608132] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2148.609140] 	(detected by 0, t=104027 jiffies, g=47728, c=47727, q=0)
etc etc.

	Dave
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 17:22           ` Dave Jones
@ 2014-04-01 17:55             ` Paul E. McKenney
  2014-04-01 18:04               ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-01 17:55 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Tue, Apr 01, 2014 at 01:22:44PM -0400, Dave Jones wrote:
> On Tue, Apr 01, 2014 at 08:30:32AM -0700, Paul E. McKenney wrote:
>  > > [14403.765300] INFO: rcu_preempt detected stalls on CPUs/tasks:
>  > > [14403.766405]  (detected by 0, t=260067 jiffies, g=449892, c=449891, q=0)
>  > > and on and on..
>  > > 
>  > > so kernel space still works like before, but userspace is locked up.
>  > 
>  > Interesting.  I suspect that if you reverted the rest of this merge
>  > window's RCU patches, you would get the same result.
>  > 
>  > >  > If that doesn't work, I will need to put together some diagnostic patches.
>  > >  > Starting with the one below.
>  > > 
>  > > bugger, I forgot to apply this last night. I'll do it on the next run.
>  > 
>  > But let's see what this shows first.
> 
> [ 1173.923463] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1173.924457] 	(detected by 0, t=6502 jiffies, g=47728, c=47727, q=0)
> [ 1173.925199] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1368.859744] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1368.860754] 	(detected by 0, t=26007 jiffies, g=47728, c=47727, q=0)
> [ 1368.861559] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1563.796416] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1563.797426] 	(detected by 0, t=45512 jiffies, g=47728, c=47727, q=0)
> [ 1563.798293] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1758.733747] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1758.734725] 	(detected by 0, t=65017 jiffies, g=47728, c=47727, q=0)
> [ 1758.735526] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1953.670944] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1953.671932] 	(detected by 0, t=84522 jiffies, g=47728, c=47727, q=0)
> [ 1953.672735] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 2148.608132] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2148.609140] 	(detected by 0, t=104027 jiffies, g=47728, c=47727, q=0)
> etc etc.

Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
reported.

Given that you have CONFIG_RCU_TRACE=y, could you please enable the
following trace events and dump the trace before things hang?

	trace_event=rcu:rcu_grace_period,rcu:rcu_grace_period_init

If it is not feasible to dump the trace before things hang, let me
know, and I will work out some other diagnostic regime.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 17:55             ` Paul E. McKenney
@ 2014-04-01 18:04               ` Dave Jones
  2014-04-01 18:32                 ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-01 18:04 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Tue, Apr 01, 2014 at 10:55:45AM -0700, Paul E. McKenney wrote:
 > >  > > so kernel space still works like before, but userspace is locked up.
 > >  > 
 > >  > Interesting.  I suspect that if you reverted the rest of this merge
 > >  > window's RCU patches, you would get the same result.
 
Something that occurred to me is that this might be something in the x86 merge
that's just changing timings enough to expose this problem.
At some point this evening, I'll try bisecting it if we don't get any closer.

 > > [ 1953.672735] INFO: Stall ended before state dump start, gp_kthread state: 0x2
 > > [ 2148.608132] INFO: rcu_preempt detected stalls on CPUs/tasks:
 > > [ 2148.609140] 	(detected by 0, t=104027 jiffies, g=47728, c=47727, q=0)
 > > etc etc.
 > 
 > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
 > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
 > reported.

Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
but that's a known false-positive in xfs.

 > Given that you have CONFIG_RCU_TRACE=y, could you please enable the
 > following trace events and dump the trace before things hang?
 > 
 > 	trace_event=rcu:rcu_grace_period,rcu:rcu_grace_period_init
 > 
 > If it is not feasible to dump the trace before things hang, let me
 > know, and I will work out some other diagnostic regime.

I'll give that a shot when I get back in a few hours.

	Dave

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 18:04               ` Dave Jones
@ 2014-04-01 18:32                 ` Paul E. McKenney
  2014-04-01 22:16                   ` Dave Jones
  2014-04-02 16:20                   ` Paul E. McKenney
  0 siblings, 2 replies; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-01 18:32 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Tue, Apr 01, 2014 at 02:04:14PM -0400, Dave Jones wrote:
> On Tue, Apr 01, 2014 at 10:55:45AM -0700, Paul E. McKenney wrote:
>  > >  > > so kernel space still works like before, but userspace is locked up.
>  > >  > 
>  > >  > Interesting.  I suspect that if you reverted the rest of this merge
>  > >  > window's RCU patches, you would get the same result.
> 
> Something that occurred to me is that this might be something in the x86 merge
> that's just changing timings enough to expose this problem.
> At some point this evening, I'll try bisecting it if we don't get any closer.

OK.  ;-)

>  > > [ 1953.672735] INFO: Stall ended before state dump start, gp_kthread state: 0x2
>  > > [ 2148.608132] INFO: rcu_preempt detected stalls on CPUs/tasks:
>  > > [ 2148.609140] 	(detected by 0, t=104027 jiffies, g=47728, c=47727, q=0)
>  > > etc etc.
>  > 
>  > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
>  > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
>  > reported.
> 
> Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
> but that's a known false-positive in xfs.

Yep, I would be very surprised if that was related to the grace-period hang.

>  > Given that you have CONFIG_RCU_TRACE=y, could you please enable the
>  > following trace events and dump the trace before things hang?
>  > 
>  > 	trace_event=rcu:rcu_grace_period,rcu:rcu_grace_period_init
>  > 
>  > If it is not feasible to dump the trace before things hang, let me
>  > know, and I will work out some other diagnostic regime.
> 
> I'll give that a shot when I get back in a few hours.

Cool!

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 18:32                 ` Paul E. McKenney
@ 2014-04-01 22:16                   ` Dave Jones
  2014-04-01 23:18                     ` Paul E. McKenney
  2014-04-02 16:20                   ` Paul E. McKenney
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-01 22:16 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Tue, Apr 01, 2014 at 11:32:45AM -0700, Paul E. McKenney wrote:
 
 > >  > Given that you have CONFIG_RCU_TRACE=y, could you please enable the
 > >  > following trace events and dump the trace before things hang?
 > >  > 
 > >  > 	trace_event=rcu:rcu_grace_period,rcu:rcu_grace_period_init
 > >  > 
 > >  > If it is not feasible to dump the trace before things hang, let me
 > >  > know, and I will work out some other diagnostic regime.
 > > 
 > > I'll give that a shot when I get back in a few hours.

ok, this is tricky, and I'm not sure how helpful the below is.
because I don't know when the hang is going to happen, in parallel to my usual workload
I did this..
while [ 1 ];
do
  cat trace
  echo > trace
done

and got this...



# entries-in-buffer/entries-written: 7/7   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [000] dNs3   851.748475: rcu_grace_period: rcu_preempt 21342 cpuend
          <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_preempt 21343 cpustart
          <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_sched 20080 cpuend
          <idle>-0     [000] dN.2   851.748480: rcu_grace_period: rcu_preempt 21343 cpuqs
     rcu_preempt-9     [000] ...1   851.748485: rcu_grace_period: rcu_preempt 21343 fqsstart
     rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqsend
     rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqswait
# tracer: nop
#
# entries-in-buffer/entries-written: 13/13   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
     rcu_preempt-9     [003] d..2   851.878457: rcu_grace_period: rcu_preempt 21349 end
     rcu_preempt-9     [003] d..2   851.878459: rcu_grace_period: rcu_preempt 21349 newreq
     rcu_preempt-9     [003] ...1   851.878459: rcu_grace_period: rcu_preempt 21349 reqwait
     rcu_preempt-9     [003] d..2   851.878460: rcu_grace_period: rcu_preempt 21350 start
     rcu_preempt-9     [003] d..2   851.878461: rcu_grace_period: rcu_preempt 21350 cpustart
     rcu_preempt-9     [003] d..2   851.878462: rcu_grace_period_init: rcu_preempt 21350 0 0 7 f
     rcu_preempt-9     [003] ...1   851.878463: rcu_grace_period: rcu_preempt 21350 fqswait
     rcu_preempt-9     [003] d..2   851.878464: rcu_grace_period: rcu_preempt 21350 cpuqs
          <idle>-0     [000] dNs3   851.878482: rcu_grace_period: rcu_preempt 21349 cpuend
          <idle>-0     [000] dNs3   851.878483: rcu_grace_period: rcu_preempt 21350 cpustart
          <idle>-0     [000] dNs3   851.878484: rcu_grace_period: rcu_sched 20086 cpuend
          <idle>-0     [000] .N.2   851.878486: rcu_grace_period: rcu_sched 20086 cpuqs
          <idle>-0     [000] dN.2   851.878487: rcu_grace_period: rcu_preempt 21350 cpuqs


followed by a half dozen 'empty' traces before it totally locked up.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 22:16                   ` Dave Jones
@ 2014-04-01 23:18                     ` Paul E. McKenney
  2014-04-01 23:31                       ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-01 23:18 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Tue, Apr 01, 2014 at 06:16:16PM -0400, Dave Jones wrote:
> On Tue, Apr 01, 2014 at 11:32:45AM -0700, Paul E. McKenney wrote:
> 
>  > >  > Given that you have CONFIG_RCU_TRACE=y, could you please enable the
>  > >  > following trace events and dump the trace before things hang?
>  > >  > 
>  > >  > 	trace_event=rcu:rcu_grace_period,rcu:rcu_grace_period_init
>  > >  > 
>  > >  > If it is not feasible to dump the trace before things hang, let me
>  > >  > know, and I will work out some other diagnostic regime.
>  > > 
>  > > I'll give that a shot when I get back in a few hours.
> 
> ok, this is tricky, and I'm not sure how helpful the below is.
> because I don't know when the hang is going to happen, in parallel to my usual workload
> I did this..
> while [ 1 ];
> do
>   cat trace
>   echo > trace
> done
> 
> and got this...
> 
> 
> 
> # entries-in-buffer/entries-written: 7/7   #P:4
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>           <idle>-0     [000] dNs3   851.748475: rcu_grace_period: rcu_preempt 21342 cpuend
>           <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_preempt 21343 cpustart
>           <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_sched 20080 cpuend
>           <idle>-0     [000] dN.2   851.748480: rcu_grace_period: rcu_preempt 21343 cpuqs
>      rcu_preempt-9     [000] ...1   851.748485: rcu_grace_period: rcu_preempt 21343 fqsstart
>      rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqsend
>      rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqswait
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 13/13   #P:4
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>      rcu_preempt-9     [003] d..2   851.878457: rcu_grace_period: rcu_preempt 21349 end
>      rcu_preempt-9     [003] d..2   851.878459: rcu_grace_period: rcu_preempt 21349 newreq
>      rcu_preempt-9     [003] ...1   851.878459: rcu_grace_period: rcu_preempt 21349 reqwait
>      rcu_preempt-9     [003] d..2   851.878460: rcu_grace_period: rcu_preempt 21350 start
>      rcu_preempt-9     [003] d..2   851.878461: rcu_grace_period: rcu_preempt 21350 cpustart
>      rcu_preempt-9     [003] d..2   851.878462: rcu_grace_period_init: rcu_preempt 21350 0 0 7 f
>      rcu_preempt-9     [003] ...1   851.878463: rcu_grace_period: rcu_preempt 21350 fqswait
>      rcu_preempt-9     [003] d..2   851.878464: rcu_grace_period: rcu_preempt 21350 cpuqs
>           <idle>-0     [000] dNs3   851.878482: rcu_grace_period: rcu_preempt 21349 cpuend
>           <idle>-0     [000] dNs3   851.878483: rcu_grace_period: rcu_preempt 21350 cpustart
>           <idle>-0     [000] dNs3   851.878484: rcu_grace_period: rcu_sched 20086 cpuend
>           <idle>-0     [000] .N.2   851.878486: rcu_grace_period: rcu_sched 20086 cpuqs
>           <idle>-0     [000] dN.2   851.878487: rcu_grace_period: rcu_preempt 21350 cpuqs
> 
> 
> followed by a half dozen 'empty' traces before it totally locked up.

Cool, thank you!

Could you please also send the RCU CPU stall warning messages?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 23:18                     ` Paul E. McKenney
@ 2014-04-01 23:31                       ` Dave Jones
  2014-04-01 23:57                         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-01 23:31 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Tue, Apr 01, 2014 at 04:18:22PM -0700, Paul E. McKenney wrote:
 
 > > # entries-in-buffer/entries-written: 7/7   #P:4
 > > #
 > > #                              _-----=> irqs-off
 > > #                             / _----=> need-resched
 > > #                            | / _---=> hardirq/softirq
 > > #                            || / _--=> preempt-depth
 > > #                            ||| /     delay
 > > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 > > #              | |       |   ||||       |         |
 > >           <idle>-0     [000] dNs3   851.748475: rcu_grace_period: rcu_preempt 21342 cpuend
 > >           <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_preempt 21343 cpustart
 > >           <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_sched 20080 cpuend
 > >           <idle>-0     [000] dN.2   851.748480: rcu_grace_period: rcu_preempt 21343 cpuqs
 > >      rcu_preempt-9     [000] ...1   851.748485: rcu_grace_period: rcu_preempt 21343 fqsstart
 > >      rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqsend
 > >      rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqswait
 > > # tracer: nop
 > > #
 > > # entries-in-buffer/entries-written: 13/13   #P:4
 > > #
 > > #                              _-----=> irqs-off
 > > #                             / _----=> need-resched
 > > #                            | / _---=> hardirq/softirq
 > > #                            || / _--=> preempt-depth
 > > #                            ||| /     delay
 > > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 > > #              | |       |   ||||       |         |
 > >      rcu_preempt-9     [003] d..2   851.878457: rcu_grace_period: rcu_preempt 21349 end
 > >      rcu_preempt-9     [003] d..2   851.878459: rcu_grace_period: rcu_preempt 21349 newreq
 > >      rcu_preempt-9     [003] ...1   851.878459: rcu_grace_period: rcu_preempt 21349 reqwait
 > >      rcu_preempt-9     [003] d..2   851.878460: rcu_grace_period: rcu_preempt 21350 start
 > >      rcu_preempt-9     [003] d..2   851.878461: rcu_grace_period: rcu_preempt 21350 cpustart
 > >      rcu_preempt-9     [003] d..2   851.878462: rcu_grace_period_init: rcu_preempt 21350 0 0 7 f
 > >      rcu_preempt-9     [003] ...1   851.878463: rcu_grace_period: rcu_preempt 21350 fqswait
 > >      rcu_preempt-9     [003] d..2   851.878464: rcu_grace_period: rcu_preempt 21350 cpuqs
 > >           <idle>-0     [000] dNs3   851.878482: rcu_grace_period: rcu_preempt 21349 cpuend
 > >           <idle>-0     [000] dNs3   851.878483: rcu_grace_period: rcu_preempt 21350 cpustart
 > >           <idle>-0     [000] dNs3   851.878484: rcu_grace_period: rcu_sched 20086 cpuend
 > >           <idle>-0     [000] .N.2   851.878486: rcu_grace_period: rcu_sched 20086 cpuqs
 > >           <idle>-0     [000] dN.2   851.878487: rcu_grace_period: rcu_preempt 21350 cpuqs
 > > 
 > > 
 > > followed by a half dozen 'empty' traces before it totally locked up.
 > 
 > Cool, thank you!
 > 
 > Could you please also send the RCU CPU stall warning messages?

>From that run they were..

[  917.790265] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  917.791189] 	(detected by 0, t=6502 jiffies, g=21390, c=21389, q=0)
[  917.792058] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[  917.792799] INFO: rcu_sched detected stalls on CPUs/tasks:
[  917.793534] 	(detected by 0, t=6502 jiffies, g=20128, c=20127, q=0)
[  917.794265] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1112.727480] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1112.728425] 	(detected by 0, t=26007 jiffies, g=21390, c=21389, q=0)
[ 1112.729300] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1112.730035] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1112.730744] 	(detected by 0, t=26007 jiffies, g=20128, c=20127, q=0)
[ 1112.731459] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1307.664710] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1307.665623] 	(detected by 0, t=45512 jiffies, g=21390, c=21389, q=0)
[ 1307.666386] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1307.667113] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1307.667839] 	(detected by 0, t=45512 jiffies, g=20128, c=20127, q=0)
[ 1307.668572] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1502.601961] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1502.602905] 	(detected by 0, t=65017 jiffies, g=21390, c=21389, q=0)
[ 1502.603765] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1502.604550] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1502.605305] 	(detected by 0, t=65017 jiffies, g=20128, c=20127, q=0)
[ 1502.606061] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1697.539217] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1697.540189] 	(detected by 0, t=84522 jiffies, g=21390, c=21389, q=0)
[ 1697.541052] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1697.541826] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1697.542600] 	(detected by 0, t=84522 jiffies, g=20128, c=20127, q=0)
[ 1697.543378] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1892.476433] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 1892.477545] 	(detected by 0, t=104027 jiffies, g=21390, c=21389, q=0)
[ 1892.478402] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 1892.479208] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1892.480010] 	(detected by 0, t=104027 jiffies, g=20128, c=20127, q=0)
[ 1892.480831] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 2087.413694] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 2087.414748] 	(detected by 0, t=123532 jiffies, g=21390, c=21389, q=0)
[ 2087.415722] INFO: Stall ended before state dump start, gp_kthread state: 0x2
[ 2087.416558] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2087.417392] 	(detected by 0, t=123532 jiffies, g=20128, c=20127, q=0)
[ 2087.418231] INFO: Stall ended before state dump start, gp_kthread state: 0x2




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 23:31                       ` Dave Jones
@ 2014-04-01 23:57                         ` Paul E. McKenney
  2014-04-02  0:07                           ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-01 23:57 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Tue, Apr 01, 2014 at 07:31:30PM -0400, Dave Jones wrote:
> On Tue, Apr 01, 2014 at 04:18:22PM -0700, Paul E. McKenney wrote:
> 
>  > > # entries-in-buffer/entries-written: 7/7   #P:4
>  > > #
>  > > #                              _-----=> irqs-off
>  > > #                             / _----=> need-resched
>  > > #                            | / _---=> hardirq/softirq
>  > > #                            || / _--=> preempt-depth
>  > > #                            ||| /     delay
>  > > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>  > > #              | |       |   ||||       |         |
>  > >           <idle>-0     [000] dNs3   851.748475: rcu_grace_period: rcu_preempt 21342 cpuend
>  > >           <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_preempt 21343 cpustart
>  > >           <idle>-0     [000] dNs3   851.748477: rcu_grace_period: rcu_sched 20080 cpuend
>  > >           <idle>-0     [000] dN.2   851.748480: rcu_grace_period: rcu_preempt 21343 cpuqs
>  > >      rcu_preempt-9     [000] ...1   851.748485: rcu_grace_period: rcu_preempt 21343 fqsstart
>  > >      rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqsend
>  > >      rcu_preempt-9     [000] ...1   851.748487: rcu_grace_period: rcu_preempt 21343 fqswait
>  > > # tracer: nop
>  > > #
>  > > # entries-in-buffer/entries-written: 13/13   #P:4
>  > > #
>  > > #                              _-----=> irqs-off
>  > > #                             / _----=> need-resched
>  > > #                            | / _---=> hardirq/softirq
>  > > #                            || / _--=> preempt-depth
>  > > #                            ||| /     delay
>  > > #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>  > > #              | |       |   ||||       |         |
>  > >      rcu_preempt-9     [003] d..2   851.878457: rcu_grace_period: rcu_preempt 21349 end
>  > >      rcu_preempt-9     [003] d..2   851.878459: rcu_grace_period: rcu_preempt 21349 newreq
>  > >      rcu_preempt-9     [003] ...1   851.878459: rcu_grace_period: rcu_preempt 21349 reqwait
>  > >      rcu_preempt-9     [003] d..2   851.878460: rcu_grace_period: rcu_preempt 21350 start
>  > >      rcu_preempt-9     [003] d..2   851.878461: rcu_grace_period: rcu_preempt 21350 cpustart
>  > >      rcu_preempt-9     [003] d..2   851.878462: rcu_grace_period_init: rcu_preempt 21350 0 0 7 f
>  > >      rcu_preempt-9     [003] ...1   851.878463: rcu_grace_period: rcu_preempt 21350 fqswait
>  > >      rcu_preempt-9     [003] d..2   851.878464: rcu_grace_period: rcu_preempt 21350 cpuqs
>  > >           <idle>-0     [000] dNs3   851.878482: rcu_grace_period: rcu_preempt 21349 cpuend
>  > >           <idle>-0     [000] dNs3   851.878483: rcu_grace_period: rcu_preempt 21350 cpustart
>  > >           <idle>-0     [000] dNs3   851.878484: rcu_grace_period: rcu_sched 20086 cpuend
>  > >           <idle>-0     [000] .N.2   851.878486: rcu_grace_period: rcu_sched 20086 cpuqs
>  > >           <idle>-0     [000] dN.2   851.878487: rcu_grace_period: rcu_preempt 21350 cpuqs

OK, so the trace ended on grace period #21350...

>  > > 
>  > > 
>  > > followed by a half dozen 'empty' traces before it totally locked up.
>  > 
>  > Cool, thank you!
>  > 
>  > Could you please also send the RCU CPU stall warning messages?
> 
> >From that run they were..
> 
> [  917.790265] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  917.791189] 	(detected by 0, t=6502 jiffies, g=21390, c=21389, q=0)

But the hang didn't happen until grace period #21390.  Sigh!  That means
that the trace doesn't tell me about the current grace period.

Thank you for getting this info, but it is beginning to look like
bisection is necessary here.  :-(

							Thanx, Paul

> [  917.792058] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [  917.792799] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  917.793534] 	(detected by 0, t=6502 jiffies, g=20128, c=20127, q=0)
> [  917.794265] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1112.727480] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1112.728425] 	(detected by 0, t=26007 jiffies, g=21390, c=21389, q=0)
> [ 1112.729300] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1112.730035] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 1112.730744] 	(detected by 0, t=26007 jiffies, g=20128, c=20127, q=0)
> [ 1112.731459] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1307.664710] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1307.665623] 	(detected by 0, t=45512 jiffies, g=21390, c=21389, q=0)
> [ 1307.666386] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1307.667113] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 1307.667839] 	(detected by 0, t=45512 jiffies, g=20128, c=20127, q=0)
> [ 1307.668572] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1502.601961] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1502.602905] 	(detected by 0, t=65017 jiffies, g=21390, c=21389, q=0)
> [ 1502.603765] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1502.604550] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 1502.605305] 	(detected by 0, t=65017 jiffies, g=20128, c=20127, q=0)
> [ 1502.606061] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1697.539217] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1697.540189] 	(detected by 0, t=84522 jiffies, g=21390, c=21389, q=0)
> [ 1697.541052] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1697.541826] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 1697.542600] 	(detected by 0, t=84522 jiffies, g=20128, c=20127, q=0)
> [ 1697.543378] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1892.476433] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 1892.477545] 	(detected by 0, t=104027 jiffies, g=21390, c=21389, q=0)
> [ 1892.478402] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 1892.479208] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 1892.480010] 	(detected by 0, t=104027 jiffies, g=20128, c=20127, q=0)
> [ 1892.480831] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 2087.413694] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2087.414748] 	(detected by 0, t=123532 jiffies, g=21390, c=21389, q=0)
> [ 2087.415722] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> [ 2087.416558] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 2087.417392] 	(detected by 0, t=123532 jiffies, g=20128, c=20127, q=0)
> [ 2087.418231] INFO: Stall ended before state dump start, gp_kthread state: 0x2
> 
> 
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 23:57                         ` Paul E. McKenney
@ 2014-04-02  0:07                           ` Dave Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Dave Jones @ 2014-04-02  0:07 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Tue, Apr 01, 2014 at 04:57:24PM -0700, Paul E. McKenney wrote:
 
 > > [  917.790265] INFO: rcu_preempt detected stalls on CPUs/tasks:
 > > [  917.791189] 	(detected by 0, t=6502 jiffies, g=21390, c=21389, q=0)
 > 
 > But the hang didn't happen until grace period #21390.  Sigh!  That means
 > that the trace doesn't tell me about the current grace period.
 > 
 > Thank you for getting this info, but it is beginning to look like
 > bisection is necessary here.  :-(

ok, I was hoping that wouldn't be the case, but let's see what it turns up.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-01 18:32                 ` Paul E. McKenney
  2014-04-01 22:16                   ` Dave Jones
@ 2014-04-02 16:20                   ` Paul E. McKenney
  2014-04-02 16:23                     ` Dave Jones
  2014-04-02 22:48                     ` Dave Jones
  1 sibling, 2 replies; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-02 16:20 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Tue, Apr 01, 2014 at 11:32:45AM -0700, Paul E. McKenney wrote:
> On Tue, Apr 01, 2014 at 02:04:14PM -0400, Dave Jones wrote:
> > On Tue, Apr 01, 2014 at 10:55:45AM -0700, Paul E. McKenney wrote:

[ . . . ]

> >  > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
> >  > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
> >  > reported.
> > 
> > Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
> > but that's a known false-positive in xfs.
> 
> Yep, I would be very surprised if that was related to the grace-period hang.

Ah, but it could be suppressing later lockdep splats.  So if this can be
reproduced without xfs, we might get additional information from lockdep.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-02 16:20                   ` Paul E. McKenney
@ 2014-04-02 16:23                     ` Dave Jones
  2014-04-02 22:48                     ` Dave Jones
  1 sibling, 0 replies; 23+ messages in thread
From: Dave Jones @ 2014-04-02 16:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Wed, Apr 02, 2014 at 09:20:43AM -0700, Paul E. McKenney wrote:
 > On Tue, Apr 01, 2014 at 11:32:45AM -0700, Paul E. McKenney wrote:
 > > On Tue, Apr 01, 2014 at 02:04:14PM -0400, Dave Jones wrote:
 > > > On Tue, Apr 01, 2014 at 10:55:45AM -0700, Paul E. McKenney wrote:
 > 
 > [ . . . ]
 > 
 > > >  > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
 > > >  > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
 > > >  > reported.
 > > > 
 > > > Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
 > > > but that's a known false-positive in xfs.
 > > 
 > > Yep, I would be very surprised if that was related to the grace-period hang.
 > 
 > Ah, but it could be suppressing later lockdep splats.  So if this can be
 > reproduced without xfs, we might get additional information from lockdep.

Hmm, I lack another test machine, and don't really feel like reinstalling it right now.
I'm halfway through a bisect, hopefully that's fruitful.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-02 16:20                   ` Paul E. McKenney
  2014-04-02 16:23                     ` Dave Jones
@ 2014-04-02 22:48                     ` Dave Jones
  2014-04-03 20:01                       ` Dave Jones
  1 sibling, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-02 22:48 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Wed, Apr 02, 2014 at 09:20:43AM -0700, Paul E. McKenney wrote:
 > On Tue, Apr 01, 2014 at 11:32:45AM -0700, Paul E. McKenney wrote:
 > > On Tue, Apr 01, 2014 at 02:04:14PM -0400, Dave Jones wrote:
 > > > On Tue, Apr 01, 2014 at 10:55:45AM -0700, Paul E. McKenney wrote:
 > 
 > [ . . . ]
 > 
 > > >  > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
 > > >  > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
 > > >  > reported.
 > > > 
 > > > Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
 > > > but that's a known false-positive in xfs.
 > > 
 > > Yep, I would be very surprised if that was related to the grace-period hang.
 > 
 > Ah, but it could be suppressing later lockdep splats.  So if this can be
 > reproduced without xfs, we might get additional information from lockdep.

Hrmph.

$ git bisect bad
The merge base 5cb480f6b488128140c940abff3c36f524a334a8 is bad.
This means the bug has been fixed between 5cb480f6b488128140c940abff3c36f524a334a8 and [455c6fdbd219161bd09b1165f11699d6d73de11c 62c206bd514600d4d73751ade00dca8e488390a3 e086481baf9d0436bdd6e9b739bfa4a83fb89ef5].

Not sure where to go from here..

The 'good' news is I can reproduce it pretty reliably now.
I start my fuzz tester, and immediately do a git diff in my working tree,
and then boom..

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-02 22:48                     ` Dave Jones
@ 2014-04-03 20:01                       ` Dave Jones
  2014-04-03 20:46                         ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-03 20:01 UTC (permalink / raw)
  To: Paul E. McKenney, Linux Kernel

On Wed, Apr 02, 2014 at 06:48:40PM -0400, Dave Jones wrote:
 
 >  > > >  > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
 >  > > >  > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
 >  > > >  > reported.
 >  > > > 
 >  > > > Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
 >  > > > but that's a known false-positive in xfs.
 >  > > 
 >  > > Yep, I would be very surprised if that was related to the grace-period hang.
 >  > 
 >  > Ah, but it could be suppressing later lockdep splats.  So if this can be
 >  > reproduced without xfs, we might get additional information from lockdep.
 > 
 > Hrmph.
 > 
 > $ git bisect bad
 > The merge base 5cb480f6b488128140c940abff3c36f524a334a8 is bad.
 > This means the bug has been fixed between 5cb480f6b488128140c940abff3c36f524a334a8 and [455c6fdbd219161bd09b1165f11699d6d73de11c 62c206bd514600d4d73751ade00dca8e488390a3 e086481baf9d0436bdd6e9b739bfa4a83fb89ef5].
 > 
 > Not sure where to go from here..
 > 
 > The 'good' news is I can reproduce it pretty reliably now.
 > I start my fuzz tester, and immediately do a git diff in my working tree,
 > and then boom..

Even better, now I realise I don't even need my fuzzer in the mix. Just doing
a fair amount of disk io (like a git diff on a dirty tree) will trigger it.

I've tried adding a show_state() call when the stall happens, but another stall
seems to occur before it gets a chance to even dump everything over the usb-serial console.
And of course nothing ever makes it to disk, even though I can sysrq-sync, on the
next reboot systemd has stuffed a bunch of ^@ in the log where the interesting
stuff should be.

Any other ideas ?

	Dave



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-03 20:01                       ` Dave Jones
@ 2014-04-03 20:46                         ` Paul E. McKenney
  2014-04-03 21:44                           ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-03 20:46 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Thu, Apr 03, 2014 at 04:01:43PM -0400, Dave Jones wrote:
> On Wed, Apr 02, 2014 at 06:48:40PM -0400, Dave Jones wrote:
> 
>  >  > > >  > Waiting uninterruptibly.  Presumably blocked on mutex_lock().  But
>  >  > > >  > you have CONFIG_PROVE_LOCKING(), so any deadlocks should have been
>  >  > > >  > reported.
>  >  > > > 
>  >  > > > Lockdep had reported something a little earlier (timestamped at 1108.xxxxxx)
>  >  > > > but that's a known false-positive in xfs.
>  >  > > 
>  >  > > Yep, I would be very surprised if that was related to the grace-period hang.
>  >  > 
>  >  > Ah, but it could be suppressing later lockdep splats.  So if this can be
>  >  > reproduced without xfs, we might get additional information from lockdep.
>  > 
>  > Hrmph.
>  > 
>  > $ git bisect bad
>  > The merge base 5cb480f6b488128140c940abff3c36f524a334a8 is bad.
>  > This means the bug has been fixed between 5cb480f6b488128140c940abff3c36f524a334a8 and [455c6fdbd219161bd09b1165f11699d6d73de11c 62c206bd514600d4d73751ade00dca8e488390a3 e086481baf9d0436bdd6e9b739bfa4a83fb89ef5].
>  > 
>  > Not sure where to go from here..
>  > 
>  > The 'good' news is I can reproduce it pretty reliably now.
>  > I start my fuzz tester, and immediately do a git diff in my working tree,
>  > and then boom..
> 
> Even better, now I realise I don't even need my fuzzer in the mix. Just doing
> a fair amount of disk io (like a git diff on a dirty tree) will trigger it.
> 
> I've tried adding a show_state() call when the stall happens, but another stall
> seems to occur before it gets a chance to even dump everything over the usb-serial console.
> And of course nothing ever makes it to disk, even though I can sysrq-sync, on the
> next reboot systemd has stuffed a bunch of ^@ in the log where the interesting
> stuff should be.
> 
> Any other ideas ?

This bug does seem to be doing an effective job of defending itself,
doesn't it?  I guess producing additional debug output just isn't going
to cut it in this case.

So, how about reverting each commit in the RCU series, and then bisecting
through the reverts?  Something like the following:

	wherever=linus/master # or substitute whatever point you wish.
	# Create a revert branch for the rcu.2014.02.26a branch
	git checkout -b anti-fixes $wherever
	git revert 5cb5c6e18f822b19bd41a2c0f9930c82b3ec0bc9
	git revert 7a754743185a4b05818e10058fa2fbe4e6969085
	git revert 8857563b819b140aa8c9be920cfe44d5d3f808b7
	git revert add1f0995454374d90c9d6b2c420d2fba3d0a4e3
	git revert ae1670339c95c3ff96ab10582506cf827c5fecc8
	git revert 52e2bb958ac4f9b3c4bdd78606d279852fd72922
	git revert 88c1863066ccfa456797e12c5d8b4631aa1ad0d0
	git revert 0adab9b9aa18d7e90337d43567f1eec3d5401b81
	git revert 41f4abd92a34f9c5110bbb870c04f8854604e28d
	git revert cb1e78cfa267453bb19e7edafd214c03834b664c
	git revert 87de1cfdc55b16b794e245b07322340725149d62
	git revert 3660c2813fb6d0ba48ee44bcbf9feddf7218c11d
	# Create a revert branch for the rt.2014.02.17b branch
	git checkout -b anti-rt $wherever
	git revert f1f399d1281ea339a08469f7e58193624992f620
	git revert ffa83fb565fbc397cbafb4b71fd1cce276d4c3b6
	git revert 2f33b512a5460578f6cf11d7b7867bed53157c7c
	git merge anti-fixes

Then bisect through these reverts.

I am assuming, perhaps naively, that changes under Documentation and
to torture testing should not be affecting you.

Does this make sense?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-03 20:46                         ` Paul E. McKenney
@ 2014-04-03 21:44                           ` Dave Jones
  2014-04-03 22:37                             ` Dave Jones
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-03 21:44 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Thu, Apr 03, 2014 at 01:46:56PM -0700, Paul E. McKenney wrote:
 > So, how about reverting each commit in the RCU series, and then bisecting
 > through the reverts?  Something like the following:
 > 
 > 	wherever=linus/master # or substitute whatever point you wish.
 > 	# Create a revert branch for the rcu.2014.02.26a branch
 > 	git checkout -b anti-fixes $wherever
 > 	git revert 5cb5c6e18f822b19bd41a2c0f9930c82b3ec0bc9
 > 	git revert 7a754743185a4b05818e10058fa2fbe4e6969085
 > 	git revert 8857563b819b140aa8c9be920cfe44d5d3f808b7
 > 	git revert add1f0995454374d90c9d6b2c420d2fba3d0a4e3
 > 	git revert ae1670339c95c3ff96ab10582506cf827c5fecc8
 > 	git revert 52e2bb958ac4f9b3c4bdd78606d279852fd72922
 > 	git revert 88c1863066ccfa456797e12c5d8b4631aa1ad0d0
 > 	git revert 0adab9b9aa18d7e90337d43567f1eec3d5401b81
 > 	git revert 41f4abd92a34f9c5110bbb870c04f8854604e28d
 > 	git revert cb1e78cfa267453bb19e7edafd214c03834b664c
 > 	git revert 87de1cfdc55b16b794e245b07322340725149d62
 > 	git revert 3660c2813fb6d0ba48ee44bcbf9feddf7218c11d
 > 	# Create a revert branch for the rt.2014.02.17b branch
 > 	git checkout -b anti-rt $wherever
 > 	git revert f1f399d1281ea339a08469f7e58193624992f620
 > 	git revert ffa83fb565fbc397cbafb4b71fd1cce276d4c3b6
 > 	git revert 2f33b512a5460578f6cf11d7b7867bed53157c7c
 > 	git merge anti-fixes
 > 
 > Then bisect through these reverts.
 > 
 > I am assuming, perhaps naively, that changes under Documentation and
 > to torture testing should not be affecting you.
 > 
 > Does this make sense?

I tried reverting every one of those and could still hit it, which
seems to indicate the bug is elsewhere, and rcu is just a victim of it.
I'm going to start trying to revert the lockdep changes, given this seems
to happen after that has warned..

	Dave 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-03 21:44                           ` Dave Jones
@ 2014-04-03 22:37                             ` Dave Jones
  2014-04-04 17:06                               ` Paul E. McKenney
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Jones @ 2014-04-03 22:37 UTC (permalink / raw)
  To: Paul E. McKenney, Linux Kernel

On Thu, Apr 03, 2014 at 05:44:30PM -0400, Dave Jones wrote:
 > On Thu, Apr 03, 2014 at 01:46:56PM -0700, Paul E. McKenney wrote:
 >  > 	git merge anti-fixes
 >  > 
 >  > Then bisect through these reverts.
 >  > 
 >  > I am assuming, perhaps naively, that changes under Documentation and
 >  > to torture testing should not be affecting you.
 >  > 
 >  > Does this make sense?
 > 
 > I tried reverting every one of those and could still hit it, which
 > seems to indicate the bug is elsewhere, and rcu is just a victim of it.
 > I'm going to start trying to revert the lockdep changes, given this seems
 > to happen after that has warned..

And that was also without any success.
Tomorrow I'll try a bisect on just arch/x86.

	Dave


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: rcu_prempt stalls / lockup
  2014-04-03 22:37                             ` Dave Jones
@ 2014-04-04 17:06                               ` Paul E. McKenney
  0 siblings, 0 replies; 23+ messages in thread
From: Paul E. McKenney @ 2014-04-04 17:06 UTC (permalink / raw)
  To: Dave Jones, Linux Kernel

On Thu, Apr 03, 2014 at 06:37:00PM -0400, Dave Jones wrote:
> On Thu, Apr 03, 2014 at 05:44:30PM -0400, Dave Jones wrote:
>  > On Thu, Apr 03, 2014 at 01:46:56PM -0700, Paul E. McKenney wrote:
>  >  > 	git merge anti-fixes
>  >  > 
>  >  > Then bisect through these reverts.
>  >  > 
>  >  > I am assuming, perhaps naively, that changes under Documentation and
>  >  > to torture testing should not be affecting you.
>  >  > 
>  >  > Does this make sense?
>  > 
>  > I tried reverting every one of those and could still hit it, which
>  > seems to indicate the bug is elsewhere, and rcu is just a victim of it.
>  > I'm going to start trying to revert the lockdep changes, given this seems
>  > to happen after that has warned..
> 
> And that was also without any success.
> Tomorrow I'll try a bisect on just arch/x86.

I cannot think of any better debugging strategy at the moment.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-04-04 17:06 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-31 23:02 rcu_prempt stalls / lockup Dave Jones
2014-03-31 23:22 ` Paul E. McKenney
2014-03-31 23:35   ` Dave Jones
2014-04-01  0:48     ` Paul E. McKenney
2014-04-01 15:08       ` Dave Jones
2014-04-01 15:30         ` Paul E. McKenney
2014-04-01 17:22           ` Dave Jones
2014-04-01 17:55             ` Paul E. McKenney
2014-04-01 18:04               ` Dave Jones
2014-04-01 18:32                 ` Paul E. McKenney
2014-04-01 22:16                   ` Dave Jones
2014-04-01 23:18                     ` Paul E. McKenney
2014-04-01 23:31                       ` Dave Jones
2014-04-01 23:57                         ` Paul E. McKenney
2014-04-02  0:07                           ` Dave Jones
2014-04-02 16:20                   ` Paul E. McKenney
2014-04-02 16:23                     ` Dave Jones
2014-04-02 22:48                     ` Dave Jones
2014-04-03 20:01                       ` Dave Jones
2014-04-03 20:46                         ` Paul E. McKenney
2014-04-03 21:44                           ` Dave Jones
2014-04-03 22:37                             ` Dave Jones
2014-04-04 17:06                               ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.