Re: Incomplete RCU stall splats ... Why?

* Re: Incomplete RCU stall splats ... Why?
       [not found] <1d469564-2ea0-d450-7d07-8d48a2686a06@arm.com>
@ 2022-08-04 16:30 ` Paul E. McKenney
  2022-08-12  8:47   ` Dietmar Eggemann
  0 siblings, 1 reply; 2+ messages in thread
From: Paul E. McKenney @ 2022-08-04 16:30 UTC (permalink / raw)
  To: Dietmar Eggemann; +Cc: rcu

On Thu, Aug 04, 2022 at 04:54:14PM +0200, Dietmar Eggemann wrote:
> Hi Paul,

Adding the rcu list on CC in case someone with more ARM experience
than I have has additional insights.

> one of my colleagues here in Arm approached me with an RCU stall issue
> on `5.4.0-66 Low Latency (Ubuntu)` (on Arm64 Ampere Altra server) he
> gets when he tries to bring-up a network card for which he only has the
> binary module for this kernel version. I tried to help him understanding
> it but after checking all the kernel config switches and studying the
> RCU code we still haven't found the culprit here. I was hoping you can
> give us some advice on this matter.

Do other kernel versions work better?  If so, I suggest manually forcing
an RCU CPU stall warning and bisecting.  (But you knew that already!)

> (1) What he gets when he launches the card is:
> 
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [15766.032270] rcu: 0-...0: (2 GPs behind) idle=866/0/0x1 softirq=109892/109892 fqs=7186
> [15766.040260] rcu: 7-...0: (51 ticks this GP) idle=bc6/1/0x4000000000000000 softirq=7973/7975 fqs=7187
> [15766.049552] rcu: 19-...0: (18 ticks this GP) idle=842/1/0x4000000000000000 softirq=1289/1289 fqs=7187
> [15766.058931] rcu: 33-...0: (1 GPs behind) idle=36e/1/0x4000000000000000 softirq=132936/132936 fqs=7187
> [15766.068309] rcu: 37-...0: (2 GPs behind) idle=c86/0/0x1 softirq=117951/117951 fqs=7187
> 
> w/o any task or stack information (1a). Even the line:
> 
> [    X.XXXXXX] (detected by X, t=XXX jiffies, g=XXX, q=XX)
> 
> is missing (1b).

I never have seen this, aside from the usual printk() messages being
lost due to overrunning the console device.  But then you will at least
sometimes get messages saying that output was lost.

> (2) He got only one time:
> 
> [ 1631.488289] rcu: rcu_sched kthread starved for 26711 jiffies! g26057 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=77

The above line is important.  If you don't allow the rcu_sched kthread to
run, RCU is very limited in what it can do for you.  In this particular
case, the rcu_sched kthread is trying to sleep for a few jiffies, but
was never awoken.  This is often due to timer issues.

And if it is timer issues, perhaps that is behind the lost printk()
messages.

> [ 1631.498709] rcu: RCU grace-period kthread stack dump: (0)
> 
> But also w/o the stack dump.

If this was a CPU rather that a kthread being dumped, I would suggest
that the target CPU was idle.  Of course, that usually results in a
line stating that the stack dump was omitted because the CPU was idle.

Your point about "try_get_task_stack()" below is a good one, though.

> ---
> 
> (1) print_other_cpu_stall()
>       print_cpu_stall_info(cpu) <-- works fine but
> 
> (1b) print_other_cpu_stall()
>        pr_cont("\t(detected by %d, t=%ld jiffies, g=%ld, q=%lu)\n", )
>        -- is not there? and
> 
> (1a) print_other_cpu_stall()
>        if (ndetected)
>          rcu_dump_cpu_stacks() 
>            ...
>            dump_cpu_task(cpu)
>            -- doesn't print anything even though ndetected must be > 0

Keying off of the ndetected, if you were running a preemptible kernel
(which you are not, see the "rcu_sched" above), I would suggest
backporting this commit:

e6a901a44f76 ("rcu: Fix to include first blocked task in stall warning")

> (2) I can see that no stack dump happens if we can't get the task stack.
> 
> rcu_check_gp_kthread_starvation()
> 
>   pr_err("RCU grace-period kthread stack dump:\n");
>   sched_show_task(gpk);
> 
>     if (!try_get_task_stack(p))
>       return;
> 
> So I can see why under (2) he doesn't get the stack dump but I can't
> see why he ends up w/o any task or stack information and w/o the
> `detected by X` line under (1)?

That surprises me as well.

> In case I play with RCU and try to provoke a stall I always get the
> information from (1) and (2).

Which is what I see.

> My question now is how can this happen and is there a way to convince
> the system to hand out the missing information?
> Other folks were suggesting to use kdump kernel w/ panic_on_rcu_stall=1
> and then use the crash tool. I haven't use this so far.

That makes a lot of sense to me!

You could then look through the rcu_state data structure, including
the rcu_state.node[] array to see what is stalling the grace period.

> I was hoping you can shed some light on this issue.

Another possibility would be to set hbreak breakpoints in gdb.

> Many thanks in advance!

Too much fun!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 2+ messages in thread