linux-scsi.vger.kernel.org archive mirror
* [Bug 207855] New: arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-22  8:49 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

            Bug ID: 207855
           Summary: arcconf host reset causes kernel panic -> driver
                    crash?
           Product: IO/Storage
           Version: 2.5
    Kernel Version: 5.6.13
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
          Assignee: linux-scsi@vger.kernel.org
          Reporter: janpieter.sollie@edpnet.be
        Regression: No

Created attachment 289227
  --> https://bugzilla.kernel.org/attachment.cgi?id=289227&action=edit
last dmesg captured

When performing an arcconf operation (assign hot-spare) on an Adaptec 72405 SAS
controller, the program crashes with the error "segmentation fault", but
apparently the driver is not too happy with it either: it becomes
unresponsive and makes it impossible to access SCSI devices on the SAS
controller.
Additional tricks to perform a PCI-level reset ultimately lead to a kernel
panic:
linuxserver# echo 1 > /sys/bus/pci/devices/0000\:04\:00.0/reset
(wait a minute)
linuxserver# echo 1 > /sys/bus/pci/rescan
(wait a minute)
linuxserver# umount /data/* (where all SAS devices are mounted)
(hangs indefinitely) 
linuxserver# echo auto > /sys/bus/pci/devices/0000\:04\:00.0/power/control
linuxserver# echo "0000:04:00.0" > /sys/bus/pci/drivers/aacraid/unbind
--PANIC--

I haven't been able to copy the panic output yet; I'm working on a kexec
kernel or crash dump.
The root filesystem is NOT on one of the SAS devices; it is on a generic SATA
controller.
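
For anyone trying to reproduce this, the sysfs writes above can be sketched as
a small dry-run script. This is only an illustration, not a tested recovery
procedure: the helper names (reset_steps, run_steps) are hypothetical, the PCI
address and driver name are the ones from this report, and the waits and the
umount step are left out. It only prints the writes unless dry_run=False.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def reset_steps(pci_addr, driver="aacraid"):
    """Return the (sysfs path, value) writes from the report, in order."""
    dev = SYSFS_PCI / "devices" / pci_addr
    return [
        (dev / "reset", "1"),                 # ask the kernel to reset the device
        (SYSFS_PCI / "rescan", "1"),          # rescan the PCI buses
        (dev / "power" / "control", "auto"),  # allow runtime power management
        (SYSFS_PCI / "drivers" / driver / "unbind", pci_addr),  # detach the driver
    ]


def run_steps(steps, dry_run=True):
    """Print each write; only touch sysfs when dry_run is False (needs root)."""
    for path, value in steps:
        print(f"echo {value} > {path}")
        if not dry_run:
            path.write_text(value)


if __name__ == "__main__":
    # Dry run only: shows the order of the writes without performing them.
    run_steps(reset_steps("0000:04:00.0"))
```

Keeping the steps as data makes it easy to log exactly which write was the
last one issued before the machine stopped responding.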

-- 
You are receiving this mail because:
You are the assignee for the bug.


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-22  9:19 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #1 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
UPDATE: the host does not panic, but the whole IO system does not work any
longer:
- network IO fails
- logon fails (hangs indefinitely)
- dmesg fails (hangs indefinitely)
- keyboard still works
I'd say a general IO error occurs (but why is there still USB keyboard input?),
making the system unresponsive.  Next time, I'll see whether I can still try a
cat /dev/kmsg, but any use of kexec is off the table, I guess


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-22  9:40 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #2 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Created attachment 289229
  --> https://bugzilla.kernel.org/attachment.cgi?id=289229&action=edit
kernel .config file


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 16:29 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

Bart Van Assche (bvanassche@acm.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bvanassche@acm.org

--- Comment #3 from Bart Van Assche (bvanassche@acm.org) ---
Is this perhaps a recently introduced bug? If so, would it be possible to
bisect this? See also
https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html.


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 17:53 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #4 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Good idea ... however, I have since cleared and rebuilt the storage array, and
everything is working again.  Any idea what this segfault means, so I can
reproduce the state (host adapter reset) and cause the same error condition?


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 18:38 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #5 from Bart Van Assche (bvanassche@acm.org) ---
Is it possible to reproduce the kernel warning by running sg_reset -h
/dev/sd... where /dev/sd... is a SCSI device controlled by an aacraid adapter?
sg_reset is available in the sg3_utils package.
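
A minimal sketch of how such a test could be scripted, assuming sg_reset from
sg3_utils is installed. The helper names (sg_reset_cmd, try_reset) are
hypothetical, and the device path is left as a parameter because it depends on
which disks sit behind the aacraid adapter. The long options used here
(--device, --target, --bus, --host) are the ones documented in the sg_reset
man page.

```python
import shutil
import subprocess

# Reset levels offered by sg_reset, from narrowest to widest scope.
LEVELS = {
    "device": "--device",
    "target": "--target",
    "bus": "--bus",
    "host": "--host",
}


def sg_reset_cmd(dev, level="host"):
    """Build the sg_reset command line for one reset level."""
    return ["sg_reset", LEVELS[level], dev]


def try_reset(dev, level="host"):
    """Run sg_reset if it is installed; return its exit status, else None."""
    if shutil.which("sg_reset") is None:
        return None
    return subprocess.run(sg_reset_cmd(dev, level)).returncode
```

Calling try_reset(dev) with a disk on the adapter requests a host reset, which
is the equivalent of the sg_reset -h invocation suggested above.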


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-05-23 19:09 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #6 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Sorry, I tried that, but the IOP reset succeeded. I even tried it while the
array was doing an expansion operation, but no luck: it came back with no
issues.


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-06-06  9:01 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

Janpieter Sollie (janpieter.sollie@edpnet.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |OBSOLETE

--- Comment #7 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
I figured out it was due to an insufficient +5V line, which made the devices
misbehave. I added some extra +5V juice and it worked without any problem.
Nevertheless, is it an option to "isolate" the storage driver somewhat so the
other PCI devices are kept up and running?
There are still some points of investigation:
- If it's PCI related, why do the dedicated VGA + onboard USB still work?
- If it's storage subsystem related, why does network IO fail?
- If it's driver related, why is AHCI going down as well?
I guess this is not supposed to happen, so I'll see whether I can make it crash
again, and eventually try to reset the whole PCI bus (and see whether that
would help).


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-13 11:46 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

Janpieter Sollie (janpieter.sollie@edpnet.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|OBSOLETE                    |---

--- Comment #8 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
A brief update:
The problem reoccurred this morning.  I can't access the PC right now, but
I checked the remote syslog, and it displays something like:
... I know, the aacraid adapter panics, but why does it not reset the adapter
and move on? Why does the telnet daemon segfault in libc?


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-13 11:47 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #9 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
Created attachment 290249
  --> https://bugzilla.kernel.org/attachment.cgi?id=290249&action=edit
attachment of remote syslog


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-13 19:10 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #10 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
I just verified: the device was mostly dead: no F12 to enter the kernel log,
no Num Lock response from the keyboard LED, but still replying to ping...
I have now locked the screen on tty12, so next time I *should* be able to see
something.


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-15  8:45 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

--- Comment #11 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
the issue seems to be related to:

> [59502.794967] Call Trace:
> [59502.794967]  _raw_spin_lock_irqsave+0x20/0x30
> [59502.794968]  __scsi_iterate_devices+0x22/0x80
> [59502.794968]  scsi_eh_ready_devs+0x129/0x7c0
> [59502.794968]  ? __pm_runtime_resume+0x54/0x70
> [59502.794968]  scsi_error_handler+0x394/0x3a0
> [59502.794969]  kthread+0xf3/0x130
> [59502.794969]  ? scsi_eh_get_sense+0x120/0x120
> [59502.794969]  ? kthread_park+0x80/0x80
> [59502.794970]  ret_from_fork+0x1f/0x30

As far as I can see, this stack blocks the entire SCSI subsystem.
I do not see why: the scsi_error_handler runs in a separate kthread, so it
*should* not block the IO subsystem ... but it definitely does: all storage
devices on all SAS/SATA controllers (even USB) become inaccessible.  I managed
to get a dmesg out of it, but
"echo 1 > /sys/class/pci_bus/0000\:04/device/reset"
never completed.  This command was issued over a running SSH session.  A new
session could no longer be established.  But it proves the PCI subsystem
is partially intact.

Is it possible that the raw_spin_lock_irqsave hurts when the adapter is not
ready yet, and as such locks a device but never completes?
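
To make traces like the one above easier to compare between crashes, here is a
small sketch that splits each frame into symbol, offset and function size. The
parser and its names (FRAME, parse_frames) are hypothetical helpers, not part
of any kernel tooling. Entries prefixed with "?" are addresses the unwinder
found on the stack but could not confirm as part of the call chain, so they
are flagged as unreliable.

```python
import re

# Matches e.g. "[59502.794968]  ? __pm_runtime_resume+0x54/0x70"
FRAME = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per stack frame found in a dmesg call trace."""
    frames = []
    for line in text.splitlines():
        m = FRAME.search(line)
        if m:
            frames.append({
                "symbol": m["sym"],
                "offset": int(m["off"], 16),   # offset into the function
                "size": int(m["size"], 16),    # total size of the function
                "reliable": m["guess"] is None,
            })
    return frames


TRACE = """\
[59502.794967] Call Trace:
[59502.794967]  _raw_spin_lock_irqsave+0x20/0x30
[59502.794968]  __scsi_iterate_devices+0x22/0x80
[59502.794968]  scsi_eh_ready_devs+0x129/0x7c0
[59502.794968]  ? __pm_runtime_resume+0x54/0x70
[59502.794968]  scsi_error_handler+0x394/0x3a0
[59502.794969]  kthread+0xf3/0x130
[59502.794969]  ? scsi_eh_get_sense+0x120/0x120
[59502.794969]  ? kthread_park+0x80/0x80
[59502.794970]  ret_from_fork+0x1f/0x30
"""

# Print only the confirmed call chain, innermost frame first.
reliable = [f["symbol"] for f in parse_frames(TRACE) if f["reliable"]]
print(" <- ".join(reliable))
```

Read this way, the confirmed chain runs from the SCSI error handler thread
down to the spinlock acquisition inside __scsi_iterate_devices.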


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-15  8:50 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

Janpieter Sollie (janpieter.sollie@edpnet.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.6.13                      |5.6.13 - 5.7.8


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-07-19  6:36 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

Janpieter Sollie (janpieter.sollie@edpnet.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://bugzilla.kernel.org/show_bug.cgi?id=208605


* [Bug 207855] arcconf host reset causes kernel panic -> driver crash?
From: bugzilla-daemon @ 2020-11-07 17:11 UTC (permalink / raw)
  To: linux-scsi

https://bugzilla.kernel.org/show_bug.cgi?id=207855

Janpieter Sollie (janpieter.sollie@edpnet.be) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |INVALID

--- Comment #12 from Janpieter Sollie (janpieter.sollie@edpnet.be) ---
The SAS adapter malfunction was due to a bad power supply; this was not a
Linux issue.
