* [Bug 207855] New: arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-22 8:49 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Bug ID: 207855
Summary: arcconf host reset causes kernel panic -> driver crash?
Product: IO/Storage
Version: 2.5
Kernel Version: 5.6.13
Hardware: x86-64
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: SCSI
Assignee: linux-scsi@vger.kernel.org
Reporter: janpieter.sollie@edpnet.be
Regression: No
Created attachment 289227
--> https://bugzilla.kernel.org/attachment.cgi?id=289227&action=edit
last dmesg captured
When performing an arcconf operation (assign hot-spare) on an Adaptec 72405 SAS
controller, the program crashes with the error "segmentation fault", but
apparently the driver is not too happy with it either: it becomes
unresponsive and makes it impossible to access SCSI devices on the SAS
controller.
Additional tricks to perform a PCI level reset ultimately lead to a kernel
panic:
linuxserver# echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/reset
(wait a minute)
linuxserver# echo 1 > /sys/bus/pci/rescan
(wait a minute)
linuxserver# umount /data/* (where all SAS devices are mounted)
(hangs indefinitely)
linuxserver# echo auto > /sys/bus/pci/devices/0000\:04\:00.0/power/control
linuxserver# echo "0000:04:00.0" > /sys/bus/pci/drivers/aacraid/unbind
--PANIC--
I haven't been able to copy and paste the panic output yet; I'm working on a
kexec kernel or crash dump.
The root directory is NOT on one of the SAS devices; it is on a generic SATA
controller.
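For anyone repeating the first two steps of the sequence above, a minimal shell sketch follows. It assumes a root shell. The function name pci_reset_and_rescan and the SYSFS_ROOT variable are hypothetical helpers introduced here (SYSFS_ROOT exists only so the logic can be dry-run against a scratch directory); 0000:04:00.0 is the address from this report. The only thing it adds over the raw echo commands is a check that the device exposes a reset attribute at all, since the kernel creates that file only when it knows a reset method for the device.

```shell
# Sketch only: SYSFS_ROOT and pci_reset_and_rescan are hypothetical names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

pci_reset_and_rescan() {
    bdf="$1"                                   # e.g. 0000:04:00.0
    dev="$SYSFS_ROOT/bus/pci/devices/$bdf"

    # The reset attribute is absent when the kernel has no reset method
    # for this device, so bail out instead of writing to a missing file.
    if [ ! -e "$dev/reset" ]; then
        echo "no reset attribute for $bdf" >&2
        return 1
    fi

    # Ask the kernel to reset the function, then rescan the PCI bus.
    echo 1 > "$dev/reset" || return 1
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

Usage would be "pci_reset_and_rescan 0000:04:00.0". The sketch deliberately stops before the unbind step, which is where the hang described above was hit.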
--
You are receiving this mail because:
You are the assignee for the bug.
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-22 9:19 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #1 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
UPDATE: the host does not panic, but the whole IO system does not work any
longer:
- network IO fails
- logon fails (hangs indefinitely)
- dmesg fails (hangs indefinitely)
- keyboard still works
I'd say a general IO error occurs (but why is there still USB keyboard input?),
making the system unresponsive. Next time, I'll see whether I can still run
cat /dev/kmsg, but any use of kexec is off the table, I guess.
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-22 9:40 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #2 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Created attachment 289229
--> https://bugzilla.kernel.org/attachment.cgi?id=289229&action=edit
kernel .config file
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 16:29 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Bart Van Assche (bvanassche@acm.org) changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bvanassche@acm.org
--- Comment #3 from Bart Van Assche (bvanassche@acm.org) ---
Is this perhaps a recently introduced bug? If so, would it be possible to
bisect this? See also
https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html.
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 17:53 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #4 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Good idea ... however, I have since cleared and rebuilt the storage array, and
everything is working again. Any idea what this segfault means, so I can
reproduce the state (host adapter reset) and cause the same error condition?
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 18:38 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #5 from Bart Van Assche (bvanassche@acm.org) ---
Is it possible to reproduce the kernel warning by running sg_reset -h
/dev/sd... where /dev/sd... is a SCSI device controlled by an aacraid adapter?
sg_reset is available in the sg3_utils package.
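To pick a suitable /dev/sd... for that test, a rough shell sketch like the following could list the disks that sit behind an aacraid host. It assumes the usual sysfs layout, where /sys/block/sdX/device resolves to a path containing the owning hostN, and where /sys/class/scsi_host/hostN/proc_name holds the driver name. The function name disks_on_driver and the SYSFS_ROOT variable are hypothetical and exist only for this illustration.

```shell
# Sketch only: SYSFS_ROOT and disks_on_driver are hypothetical names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

disks_on_driver() {
    driver="$1"                                # e.g. aacraid
    for blk in "$SYSFS_ROOT"/block/sd*; do
        [ -e "$blk/device" ] || continue

        # The resolved device path names the owning SCSI host,
        # e.g. .../host0/target0:0:0/0:0:0:0
        host=$(readlink -f "$blk/device" |
               sed -n 's|.*/\(host[0-9][0-9]*\)/.*|\1|p')
        [ -n "$host" ] || continue

        # proc_name is the name the low-level driver registered with.
        name=$(cat "$SYSFS_ROOT/class/scsi_host/$host/proc_name" 2>/dev/null)
        if [ "$name" = "$driver" ]; then
            echo "/dev/${blk##*/}"
        fi
    done
    return 0
}
```

Each path printed by "disks_on_driver aacraid" can then be passed to sg_reset -h as described above. The exact option spelling may differ between sg3_utils versions, so it is worth checking the sg_reset man page first.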
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 19:09 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #6 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Sorry, I tried that, but the IOP reset succeeded. I even tried it while the
array was doing an expansion operation, but no luck. It came back with no issues.
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-06-06 9:01 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Janpieter Sollie (janpieter.sollie@edpnet.be) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |OBSOLETE
--- Comment #7 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
I figured out it was due to an insufficient +5V line, which made devices
misbehave. I added some extra +5V juice and it worked without any
problem.
Nevertheless, is it an option to "isolate" the storage driver somewhat so the
other PCI devices are kept up and running?
There are still some points of investigation:
- If it's PCI related, why do the dedicated VGA + onboard USB still work?
- If it's storage subsystem related, why does network IO fail?
- If it's driver related, why is AHCI going down as well?
I guess this is not supposed to happen, so I'll see whether I can make it crash
again, and eventually try to reset the whole PCI bus (and see whether that
would help).
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-13 11:46 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Janpieter Sollie (janpieter.sollie@edpnet.be) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|OBSOLETE |---
--- Comment #8 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
To update a bit:
The problem reoccurred this morning. I can't access the PC right now, but
I checked the remote syslog, and it displays something like:
... I know, the aacraid adapter panics, but why does it not reset the adapter
and move on? Why does the telnet daemon segfault in libc?
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-13 11:47 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #9 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Created attachment 290249
--> https://bugzilla.kernel.org/attachment.cgi?id=290249&action=edit
attachment of remote syslog
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-13 19:10 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #10 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
I just verified: the device was mostly dead: no F12 to enter the kernel log, no
Num Lock response from the keyboard LED, but it was still replying to ping...
I have now locked the screen on tty12, so next time I *should* be able to see
something.
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-15 8:45 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
--- Comment #11 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
The issue seems to be related to:
> [59502.794967] Call Trace:
> [59502.794967] _raw_spin_lock_irqsave+0x20/0x30
> [59502.794968] __scsi_iterate_devices+0x22/0x80
> [59502.794968] scsi_eh_ready_devs+0x129/0x7c0
> [59502.794968] ? __pm_runtime_resume+0x54/0x70
> [59502.794968] scsi_error_handler+0x394/0x3a0
> [59502.794969] kthread+0xf3/0x130
> [59502.794969] ? scsi_eh_get_sense+0x120/0x120
> [59502.794969] ? kthread_park+0x80/0x80
> [59502.794970] ret_from_fork+0x1f/0x30
As far as I can see, this stack blocks the entire SCSI subsystem.
I do not see why: scsi_error_handler runs in a separate kthread, so it
*should* not block the IO subsystem ... but it definitely does: all storage
devices on all SAS/SATA controllers (even USB) become inaccessible. I managed
to get a dmesg out of it, but
"echo 1 > /sys/class/pci_bus/0000\:04/device/reset"
never completed. This command was issued over a running SSH session; a new
session could no longer be established. But it proves the PCI subsystem
is partially intact.
Is it possible that _raw_spin_lock_irqsave hurts when the adapter is not ready
yet, and as such locks a device but never completes?
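To make the reliable part of the quoted trace easier to read, a small shell filter along these lines can reduce it to bare function names. This is a sketch, and the function name trace_symbols is a hypothetical helper, not an existing tool. It drops the entries that the kernel prefixes with "?", since those are addresses found on the stack that were not confirmed as part of the call chain.

```shell
# Sketch only: trace_symbols is a hypothetical helper name.
# Reads a (possibly "> "-quoted) call trace on stdin and prints one
# symbol per line, innermost frame first, as in the trace itself.
trace_symbols() {
    # Match "...] symbol+0xOFF/0xLEN" and keep only the symbol.
    # Lines such as "Call Trace:" or "? symbol+..." do not match
    # and are therefore not printed.
    sed -n 's/^.*\] *\([A-Za-z_][A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]*.*$/\1/p'
}
```

Fed the trace above, this leaves the chain from ret_from_fork and kthread through scsi_error_handler, scsi_eh_ready_devs and __scsi_iterate_devices down to _raw_spin_lock_irqsave, i.e. the error handler thread waiting on a spinlock while iterating over the host's devices.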
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-15 8:50 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Janpieter Sollie (janpieter.sollie@edpnet.be) changed:
What |Removed |Added
----------------------------------------------------------------------------
Kernel Version|5.6.13 |5.6.13 - 5.7.8
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-19 6:36 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Janpieter Sollie (janpieter.sollie@edpnet.be) changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugzilla.kernel.org/show_bug.cgi?id=208605
* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-11-07 17:11 UTC (permalink / raw)
To: linux-scsi
https://bugzilla.kernel.org/show_bug.cgi?id=207855
Janpieter Sollie (janpieter.sollie@edpnet.be) changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution|--- |INVALID
--- Comment #12 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
The SAS adapter malfunction was due to a bad power supply; this was not a Linux
issue.