Lock recursion seen on qla2xxx client when rebooting the target server
From: Laurence Oberman @ 2019-04-01 0:44 UTC
To: linux-scsi, linux-block, Himanshu Madhani, Hannes Reinecke,
Ewan Milne
Cc: Marco Patalano, Don Dutile, Bart Van Assche
Those who have been following my trials and tribulations with SRP and
block-mq panics (see "Re: Panic when rebooting target server testing
srp on 5.0.0-rc2") know I was going to run the same test with qla2xxx
and F/C.
Anyway, rebooting the target server (LIO), the same test that
triggers the still-undiagnosed block-mq race when SRP is the client,
causes issues with 5.1-rc2 as well.
The issue is different. I was seeing a total lockup and no console
messages; to get the lockup message below I had to enable lock
debugging.
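For reference, the options I mean are roughly the following (a sketch,
not necessarily my exact .config; the recursion check itself comes from
CONFIG_DEBUG_SPINLOCK, the others pull in lockdep):

  # .config fragment for lock debugging (minimal set; exact set may vary)
  CONFIG_DEBUG_SPINLOCK=y
  CONFIG_DEBUG_LOCK_ALLOC=y
  CONFIG_PROVE_LOCKING=y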
Anyway, Hannes, how have you folks not seen these issues at SUSE with
5.1+ testing? Here I caught two different problems that are now latent
in 5.1-x (maybe earlier too). This is a generic array reboot test;
sadly it is a common scenario for our customers when they have fabric
or array issues.
Kernel 5.1.0-rc2+ on an x86_64
localhost login: [ 301.752492] BUG: spinlock cpu recursion on CPU#38,
kworker/38:0/204
[ 301.782364] lock: 0xffff90ddb2e43430, .magic: dead4ead, .owner:
kworker/38:1/271, .owner_cpu: 38
[ 301.825496] CPU: 38 PID: 204 Comm: kworker/38:0 Kdump: loaded Not
tainted 5.1.0-rc2+ #1
[ 301.863052] Hardware name: HP ProLiant ML150 Gen9/ProLiant ML150
Gen9, BIOS P95 05/21/2018
[ 301.903614] Workqueue: qla2xxx_wq qla24xx_delete_sess_fn [qla2xxx]
[ 301.933561] Call Trace:
[ 301.945950] dump_stack+0x5a/0x73
[ 301.962080] do_raw_spin_lock+0x83/0xa0
[ 301.980287] _raw_spin_lock_irqsave+0x66/0x80
[ 302.001726] ? qla24xx_delete_sess_fn+0x34/0x90 [qla2xxx]
[ 302.028111] qla24xx_delete_sess_fn+0x34/0x90 [qla2xxx]
[ 302.052864] process_one_work+0x215/0x4c0
[ 302.071940] ? process_one_work+0x18c/0x4c0
[ 302.092228] worker_thread+0x46/0x3e0
[ 302.110313] kthread+0xfb/0x130
[ 302.125274] ? process_one_work+0x4c0/0x4c0
[ 302.146054] ? kthread_bind+0x10/0x10
[ 302.163789] ret_from_fork+0x35/0x40
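For anyone not familiar with this splat: "spinlock cpu recursion" is
what the CONFIG_DEBUG_SPINLOCK check prints when a CPU tries to take a
spinlock whose recorded .owner_cpu is that same CPU. A minimal sketch
of the pattern the check catches follows; this is illustrative only,
NOT the actual qla2xxx code, and all names in it are made up:

/*
 * Illustrative sketch only -- not the qla2xxx driver code.
 * The same CPU acquires a spinlock it already holds; with
 * CONFIG_DEBUG_SPINLOCK=y the second acquisition fires
 * "BUG: spinlock cpu recursion" instead of spinning forever.
 */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(sess_lock);

static void helper_that_also_locks(void)
{
	unsigned long flags;

	/* Second acquisition on the same CPU: the debug check trips here. */
	spin_lock_irqsave(&sess_lock, flags);
	/* ... */
	spin_unlock_irqrestore(&sess_lock, flags);
}

static void delete_sess_work(void)
{
	unsigned long flags;

	spin_lock_irqsave(&sess_lock, flags);
	/* ... tear down session state ... */
	helper_that_also_locks();	/* re-enters sess_lock: deadlock */
	spin_unlock_irqrestore(&sess_lock, flags);
}

Note the check only compares .owner_cpu with the current CPU, and in
the trace above the recorded owner is a different kworker (38:1/271)
on the same CPU 38, so this could equally be a lock left held, or
session memory freed and reused while locked, rather than literal
same-task recursion.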
Just an FYI: with only 100 LUNs and 4 paths I cannot boot the host
without adding watchdog_thresh=60 to the kernel command line. I hit a
hard lockup during LUN discovery, so that issue is also out there.
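For anyone trying to reproduce, the workaround looks like this on a
stock RHEL-style GRUB setup (a sketch; adjust the command and paths
for your distro):

  # /etc/default/grub: raise the hard-lockup watchdog threshold to 60s
  # (the soft-lockup threshold becomes 2x this value)
  GRUB_CMDLINE_LINUX="... watchdog_thresh=60"

  # then regenerate the config and reboot
  grub2-mkconfig -o /boot/grub2/grub.cfg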
So far 5.x has been problematic in regression testing.
Regards
Laurence
Re: Lock recursion seen on qla2xxx client when rebooting the target server
From: Laurence Oberman @ 2019-04-02 21:30 UTC
To: linux-scsi, linux-block, Himanshu Madhani, Hannes Reinecke,
Ewan Milne
Cc: Marco Patalano, Don Dutile, Bart Van Assche
On Sun, 2019-03-31 at 20:44 -0400, Laurence Oberman wrote:
> Those who have been following my trials and tribulations with SRP and
> block-mq panics (see "Re: Panic when rebooting target server testing
> srp on 5.0.0-rc2") know I was going to run the same test with qla2xxx
> and F/C.
>
> Anyway, rebooting the target server (LIO), the same test that
> triggers the still-undiagnosed block-mq race when SRP is the client,
> causes issues with 5.1-rc2 as well.
>
> The issue is different. I was seeing a total lockup and no console
> messages; to get the lockup message below I had to enable lock
> debugging.
>
> Anyway, Hannes, how have you folks not seen these issues at SUSE with
> 5.1+ testing? Here I caught two different problems that are now
> latent in 5.1-x (maybe earlier too). This is a generic array reboot
> test; sadly it is a common scenario for our customers when they have
> fabric or array issues.
>
> Kernel 5.1.0-rc2+ on an x86_64
>
> localhost login: [ 301.752492] BUG: spinlock cpu recursion on CPU#38,
> kworker/38:0/204
> [ 301.782364] lock: 0xffff90ddb2e43430, .magic: dead4ead, .owner:
> kworker/38:1/271, .owner_cpu: 38
> [ 301.825496] CPU: 38 PID: 204 Comm: kworker/38:0 Kdump: loaded Not
> tainted 5.1.0-rc2+ #1
> [ 301.863052] Hardware name: HP ProLiant ML150 Gen9/ProLiant ML150
> Gen9, BIOS P95 05/21/2018
> [ 301.903614] Workqueue: qla2xxx_wq qla24xx_delete_sess_fn [qla2xxx]
> [ 301.933561] Call Trace:
> [ 301.945950] dump_stack+0x5a/0x73
> [ 301.962080] do_raw_spin_lock+0x83/0xa0
> [ 301.980287] _raw_spin_lock_irqsave+0x66/0x80
> [ 302.001726] ? qla24xx_delete_sess_fn+0x34/0x90 [qla2xxx]
> [ 302.028111] qla24xx_delete_sess_fn+0x34/0x90 [qla2xxx]
> [ 302.052864] process_one_work+0x215/0x4c0
> [ 302.071940] ? process_one_work+0x18c/0x4c0
> [ 302.092228] worker_thread+0x46/0x3e0
> [ 302.110313] kthread+0xfb/0x130
> [ 302.125274] ? process_one_work+0x4c0/0x4c0
> [ 302.146054] ? kthread_bind+0x10/0x10
> [ 302.163789] ret_from_fork+0x35/0x40
>
> Just an FYI: with only 100 LUNs and 4 paths I cannot boot the host
> without adding watchdog_thresh=60 to the kernel command line. I hit a
> hard lockup during LUN discovery, so that issue is also out there.
>
> So far 5.x has been problematic in regression testing.
>
> Regards
> Laurence
I chatted with Himanshu about this and he will be sending me a test
patch; he thinks he knows what is going on here.
I will report back when I have tested it.
Note, to reiterate: this is not the block-mq issue I uncovered with
SRP testing; the investigation for that is still ongoing.
Thanks
Laurence