* Question on threaded handlers for managed interrupts
@ 2021-04-22 16:10 John Garry
  2021-04-23 10:50 ` Thomas Gleixner
  0 siblings, 1 reply; 6+ messages in thread
From: John Garry @ 2021-04-22 16:10 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel

Hi Thomas,

I am finding that I can pretty easily trigger a system hang for certain 
scenarios with my storage controller.

So I'm getting something like this when running moderately heavy data 
throughput:

Starting 6 processes
[70.656622] sched: RT throttling activatedB/s][r=356k,w=0 IOPS][eta
01h:14m:43s]
[  207.632161] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:ta
01h:12m:26s]
[  207.638261] rcu:  0-...!: (1 GPs behind)
idle=312/1/0x4000000000000000 softirq=508/512 fqs=0
[  207.646777] rcu:  1-...!: (1 GPs behind) idle=694/0/0x0

It ends pretty badly - see [0].

The multi-queue storage controller (see [1] for a memory refresh, but note 
that I can trigger this on a PCI host controller as well) is using 
managed interrupts and threaded handlers. Since the threaded handler 
uses SCHED_FIFO, aren't we always vulnerable to this situation with the 
managed interrupt and threaded handler combo? Would the advice be to 
just use irq polling here?
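
(Editor's note: the kernel runs irq threads at SCHED_FIFO, priority MAX_RT_PRIO/2 == 50, see setup_irq_thread() in kernel/irq/manage.c. The following userspace Python sketch is purely illustrative of requesting that policy; it is not driver code, and actually succeeding needs CAP_SYS_NICE.)

```python
import os

def request_fifo(prio=50):
    """Ask for SCHED_FIFO at the priority the kernel gives irq threads
    (MAX_RT_PRIO/2 == 50).  A FIFO task is never preempted by SCHED_OTHER
    tasks, so if it always has work it runs until the RT throttler kicks
    in -- the "sched: RT throttling activated" message in the log above."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio))
        return True
    except OSError:
        return False  # unprivileged callers get EPERM (needs CAP_SYS_NICE)

policy_before = os.sched_getscheduler(0)
ok = request_fifo()
print("switched to SCHED_FIFO:", ok)
```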

I tried and failed to trigger the same on NVMe PCI; however, I have 
only one card, so I'm hardly overloading the system.

Thanks,
John

[0] 
https://lore.kernel.org/rcu/412926e8-d3e1-3071-8cb9-098a7f49b64c@huawei.com/T/#mbd60463c543e04f87090d89301e1a5f10de958dd

[1] 
https://lore.kernel.org/linux-scsi/1606905417-183214-1-git-send-email-john.garry@huawei.com/#t

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Question on threaded handlers for managed interrupts
  2021-04-22 16:10 Question on threaded handlers for managed interrupts John Garry
@ 2021-04-23 10:50 ` Thomas Gleixner
  2021-04-23 12:02   ` John Garry
  2021-04-23 13:01   ` Thomas Gleixner
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2021-04-23 10:50 UTC (permalink / raw)
  To: John Garry
  Cc: linux-kernel, Marc Zyngier, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Ingo Molnar

John,

On Thu, Apr 22 2021 at 17:10, John Garry wrote:
> I am finding that I can pretty easily trigger a system hang for certain 
> scenarios with my storage controller.
>
> So I'm getting something like this when running moderately heavy data 
> throughput:
>
> Starting 6 processes
> [70.656622] sched: RT throttling activatedB/s][r=356k,w=0 IOPS][eta
> 01h:14m:43s]
> [  207.632161] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:ta
> 01h:12m:26s]
> [  207.638261] rcu:  0-...!: (1 GPs behind)
> idle=312/1/0x4000000000000000 softirq=508/512 fqs=0
> [  207.646777] rcu:  1-...!: (1 GPs behind) idle=694/0/0x0
>
> It ends pretty badly - see [0].

Obviously.

> The multi-queue storage controller (see [1] for a memory refresh, but
> note that I can trigger this on a PCI host controller as well) is
> using managed interrupts and threaded handlers. Since the threaded
> handler uses SCHED_FIFO, aren't we always vulnerable to this situation
> with the managed interrupt and threaded handler combo? Would the
> advice be to just use irq polling here?

This is a really good question. Most interrupt handlers neither run
exceedingly long nor arrive at high frequency, but of course this
problem exists.

The network people have solved it with NAPI, which disables the interrupt
in the device and polls it from softirq context (which might then be
delegated to ksoftirqd) until it's drained.
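
(Editor's note: the NAPI scheme described here can be sketched as a toy model. This is not the real NAPI API; `Device`, `napi_poll`, and the default budget of 64 are illustrative names and values only.)

```python
from collections import deque

class Device:
    """Toy device model: completions queue up; the irq line can be masked."""
    def __init__(self):
        self.queue = deque()
        self.irq_enabled = True

    def raise_work(self, n):
        self.queue.extend(range(n))

def napi_poll(dev, budget=64):
    """Drain the device NAPI-style: mask the interrupt, then poll in
    budget-sized passes (each pass standing in for one softirq/ksoftirqd
    invocation) until nothing is left, and only then re-enable the irq."""
    dev.irq_enabled = False
    passes = 0
    while dev.queue:
        for _ in range(min(budget, len(dev.queue))):
            dev.queue.popleft()      # "handle" one completion
        passes += 1
    dev.irq_enabled = True           # drained: interrupts allowed again
    return passes

dev = Device()
dev.raise_work(200)
print(napi_poll(dev))                # 200 completions / budget of 64 -> 4 passes
```

The point of the budget is that a busy device is serviced by repeated bounded polls rather than an unbounded interrupt storm.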

I'm not familiar enough with the block/multiqueue layer to tell
whether such a concept exists there as well.

OTOH, the way you split the handling into hard/thread context
already provides the base for this.

The missing piece is infrastructure at the irq/scheduler core level to
handle this transparently.

I have some horrible ideas how to solve that, but I'm sure the scheduler
wizards can come up with a reasonable and generic solution.

Thanks,

        tglx


* Re: Question on threaded handlers for managed interrupts
  2021-04-23 10:50 ` Thomas Gleixner
@ 2021-04-23 12:02   ` John Garry
  2021-04-23 13:01   ` Thomas Gleixner
  1 sibling, 0 replies; 6+ messages in thread
From: John Garry @ 2021-04-23 12:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Marc Zyngier, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Ingo Molnar

On 23/04/2021 11:50, Thomas Gleixner wrote:

Hi Thomas,

>> The multi-queue storage controller (see [1] for a memory refresh, but
>> note that I can trigger this on a PCI host controller as well) is
>> using managed interrupts and threaded handlers. Since the threaded
>> handler uses SCHED_FIFO, aren't we always vulnerable to this situation
>> with the managed interrupt and threaded handler combo? Would the
>> advice be to just use irq polling here?
> This is a really good question. Most interrupt handlers neither run
> exceedingly long nor arrive at high frequency, but of course this
> problem exists.
> 
> The network people have solved it with NAPI, which disables the interrupt
> in the device and polls it from softirq context (which might then be
> delegated to ksoftirqd) until it's drained.
> 
> I'm not familiar enough with the block/multiqueue layer to tell
> whether such a concept exists there as well.
> 

Other MQ SCSI drivers have had similar problems: they were handling all 
completion interrupts in hard irq context, the handlers would not exit 
under high-throughput scenarios, and they were then getting lockups.

Their solution was to switch over to irq_poll once the per-queue 
completion rate got above a certain threshold.
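
(Editor's note: that switch-over policy could be sketched roughly as below. The threshold value and names are invented for illustration; real drivers use the kernel's irq_poll API and tune the cut-over differently.)

```python
class QueueCtx:
    """Per-hw-queue completion handling mode.  POLL_THRESHOLD is an
    illustrative guess, not a value taken from any real driver."""
    POLL_THRESHOLD = 100   # completions seen in one interrupt

    def __init__(self):
        self.mode = "hardirq"

    def handle_irq(self, n_completions):
        # A lightly loaded queue handles completions inline in hardirq
        # context for minimum latency; once one interrupt brings more
        # than the threshold, draining is deferred to budgeted polling
        # so the CPU is not monopolized in interrupt context.
        if n_completions > self.POLL_THRESHOLD:
            self.mode = "irq_poll"
        else:
            self.mode = "hardirq"
        return self.mode

q = QueueCtx()
print(q.handle_irq(10))    # light load  -> "hardirq"
print(q.handle_irq(500))   # heavy load  -> "irq_poll"
```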

> OTOH, the way you split the handling into hard/thread context
> already provides the base for this.
>

Right, so I could switch to a similar scheme as above, but I think that 
what I already have with a threaded handler should suffice.

> The missing piece is infrastructure at the irq/scheduler core level to
> handle this transparently.
> 
> I have some horrible ideas how to solve that, but I'm sure the scheduler
> wizards can come up with a reasonable and generic solution.

That would be great.

Thanks,
John


* Re: Question on threaded handlers for managed interrupts
  2021-04-23 10:50 ` Thomas Gleixner
  2021-04-23 12:02   ` John Garry
@ 2021-04-23 13:01   ` Thomas Gleixner
  2021-04-23 15:00     ` John Garry
  2021-05-21 12:46     ` John Garry
  1 sibling, 2 replies; 6+ messages in thread
From: Thomas Gleixner @ 2021-04-23 13:01 UTC (permalink / raw)
  To: John Garry
  Cc: linux-kernel, Marc Zyngier, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Ingo Molnar

John,

On Fri, Apr 23 2021 at 12:50, Thomas Gleixner wrote:
> On Thu, Apr 22 2021 at 17:10, John Garry wrote:
> OTOH, the way you split the handling into hard/thread context
> already provides the base for this.
>
> The missing piece is infrastructure at the irq/scheduler core level to
> handle this transparently.
>
> I have some horrible ideas how to solve that, but I'm sure the scheduler
> wizards can come up with a reasonable and generic solution.

So one thing I forgot to ask is:

Is the thread simply stuck in the while() loop forever or is
this just an endless hardirq/thread/hardirq/thread stream?

Thanks,

        tglx


* Re: Question on threaded handlers for managed interrupts
  2021-04-23 13:01   ` Thomas Gleixner
@ 2021-04-23 15:00     ` John Garry
  2021-05-21 12:46     ` John Garry
  1 sibling, 0 replies; 6+ messages in thread
From: John Garry @ 2021-04-23 15:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Marc Zyngier, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Ingo Molnar

Hi Thomas,

On 23/04/2021 14:01, Thomas Gleixner wrote:
> On Fri, Apr 23 2021 at 12:50, Thomas Gleixner wrote:
>> On Thu, Apr 22 2021 at 17:10, John Garry wrote:
>> OTOH, the way you split the handling into hard/thread context
>> already provides the base for this.
>>
>> The missing piece is infrastructure at the irq/scheduler core level to
>> handle this transparently.
>>
>> I have some horrible ideas how to solve that, but I'm sure the scheduler
>> wizards can come up with a reasonable and generic solution.
> So one thing I forgot to ask is:
> 
> Is the thread simply stuck in the while() loop forever or is
> this just an endless hardirq/thread/hardirq/thread stream?

The thread will process all available completions and then return. I 
added some debug there, and we handle at most 150-300 completions per 
thread run.

So I figure that we have the endless hardirq/thread/hardirq/thread stream.
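
(Editor's note: the endless hardirq/thread/hardirq/thread stream can be modelled with a toy loop. The numbers are made up, apart from the ~300 completions per thread run reported above.)

```python
def irq_thread_stream(completion_rate, per_run, max_runs):
    """Model the stream: every thread run drains up to `per_run`
    completions and returns, but `completion_rate` new completions arrive
    while it runs, so the hardirq fires again immediately and the
    SCHED_FIFO thread is woken right back up."""
    backlog = completion_rate        # work pending when the first irq fires
    wakeups = 0
    for _ in range(max_runs):
        if backlog == 0:
            break                    # device idle: the stream ends
        backlog -= min(per_run, backlog)
        backlog += completion_rate   # completions that arrived meanwhile
        wakeups += 1
    return wakeups

# A busy device never lets the backlog reach zero, so the thread is woken
# on every one of the 1000 simulated iterations:
print(irq_thread_stream(completion_rate=300, per_run=300, max_runs=1000))
```

With `completion_rate=0` the stream ends immediately, which is the drained case; any nonzero arrival rate keeps the FIFO thread perpetually runnable.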

Thanks,
John


* Re: Question on threaded handlers for managed interrupts
  2021-04-23 13:01   ` Thomas Gleixner
  2021-04-23 15:00     ` John Garry
@ 2021-05-21 12:46     ` John Garry
  1 sibling, 0 replies; 6+ messages in thread
From: John Garry @ 2021-05-21 12:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Marc Zyngier, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Ingo Molnar, Ming Lei

+

Hi Thomas,

On 23/04/2021 14:01, Thomas Gleixner wrote:
> On Fri, Apr 23 2021 at 12:50, Thomas Gleixner wrote:
>> On Thu, Apr 22 2021 at 17:10, John Garry wrote:
>> OTOH, the way you split the handling into hard/thread context
>> already provides the base for this.
>>
>> The missing piece is infrastructure at the irq/scheduler core level to
>> handle this transparently.
>>
>> I have some horrible ideas how to solve that, but I'm sure the scheduler
>> wizards can come up with a reasonable and generic solution.

JFYI, a performance-related fix was recently added in the block layer, 
and I can no longer reproduce this hang.

But I have no reason to believe the underlying issue no longer exists.

Thanks,
John


end of thread, other threads:[~2021-05-21 12:47 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-22 16:10 Question on threaded handlers for managed interrupts John Garry
2021-04-23 10:50 ` Thomas Gleixner
2021-04-23 12:02   ` John Garry
2021-04-23 13:01   ` Thomas Gleixner
2021-04-23 15:00     ` John Garry
2021-05-21 12:46     ` John Garry
