linux-pci.vger.kernel.org archive mirror
* Fwd: [Bug 216059] New: Scsi host number of Adaptec RAID controller changes upon a PCIe hotplug and re-insert
From: Bjorn Helgaas @ 2022-06-02 16:46 UTC (permalink / raw)
  To: James E.J. Bottomley, Martin K. Petersen
  Cc: linux-scsi, Linux PCI, sagar.biradar

From bugzilla.  Reported against PCI, but I think the SCSI host number
is determined by SCSI, not by PCI, so I don't see a PCI issue here.

---------- Forwarded message ---------
From: <bugzilla-daemon@kernel.org>
Date: Thu, Jun 2, 2022 at 1:53 AM
Subject: [Bug 216059] New: Scsi host number of Adaptec RAID controller
changes upon a PCIe hotplug and re-insert
To: <bjorn@helgaas.com>


https://bugzilla.kernel.org/show_bug.cgi?id=216059

            Bug ID: 216059
           Summary: Scsi host number of Adaptec RAID controller changes
                    upon a PCIe hotplug and re-insert
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.18.11
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PCI
          Assignee: drivers_pci@kernel-bugs.osdl.org
          Reporter: sagar.biradar@microchip.com
        Regression: No

Created attachment 301088
  --> https://bugzilla.kernel.org/attachment.cgi?id=301088&action=edit
The attachments contain the log files which capture before and after cases for
a hotplug host number change

Summary:
This issue concerns the smartpqi driver for Adaptec controllers, PCIe hotplug,
and the resulting SCSI host number.


The Linux message log shows the host number (e.g. [14:2:0:0] storage -
/dev/sg27) unexpectedly changing when a PCIe hot remove is rapidly followed by
a PCIe hot add. The problem appears when the two PCIe events occur in quick
succession (i.e. less than 2 minutes apart). Because of the timing factor, the
issue can appear to be intermittent. The problem has been root-caused to the
kernel.
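
For readers unfamiliar with the bracketed notation: [14:2:0:0] is a SCSI
address in host:channel:target:lun form, so the leading 14 is the host number
under discussion. A minimal sketch of pulling the tuple out of such a line
follows; the names parse_hctl and ScsiAddress are made up for illustration and
are not part of any kernel or lsscsi interface.

```python
import re
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    """Hypothetical container for a host:channel:target:lun tuple."""
    host: int
    channel: int
    target: int
    lun: int


def parse_hctl(text: str) -> ScsiAddress:
    """Extract the first [H:C:T:L] tuple found in a line of text."""
    match = re.search(r"\[(\d+):(\d+):(\d+):(\d+)\]", text)
    if match is None:
        raise ValueError(f"no [H:C:T:L] address in {text!r}")
    return ScsiAddress(*(int(part) for part in match.groups()))


if __name__ == "__main__":
    # The example string is taken from the report above.
    print(parse_hctl("[14:2:0:0] storage - /dev/sg27"))
```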



Investigation:
Kernel (4.18.11-hotplug-patch) debug prints were added in the scsi_add_host()
and scsi_remove_host() routines. Per the debug prints in the log, the SCSI
host number is not released until after the PCIe hot add event, which forces
the kernel to use a different host number.

(debug prints)
Line 48: [ 1811.461055] smartpqi 0000:b3:00.0: Debuggg . . .
pqi_unregister_scsi function, before scsi_remove_host, shost->host_num=14
//smartpqi requests host num 14 to be removed
Line 83: [ 2012.125750]  (null): Debuggg . . shost->host_no before dev_set_name
= host15
Line 84: [ 2012.126709] smartpqi 0000:b3:00.0: Debuggg . . . before
scsi_add_host, shost->host_num=15 //upon hot add, kernel allocates host number
15, it should be 14
Line 132: [ 2014.181784] scsi host14: Debuggg . . in scsi_host_dev_release
function shost_host_no to be removed = 14 //kernel finally frees host number
14, but it’s too late



Conclusion:
The kernel is not releasing the host number immediately when the smartpqi
driver calls the scsi_remove_host() routine. If the PCIe cable is reconnected
within 2 minutes, the kernel can unexpectedly return a different host number.
This can lead to applications accessing the wrong device.
This is a Linux kernel issue, and we will be filing a bug in the Linux kernel
Bugzilla.
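
The ordering described in the debug prints can be reproduced with a toy
model. The sketch below assumes only that host numbers come from a
lowest-free-ID allocator and that the old number is returned to the pool
either before or after the new host is allocated. HostIndexAllocator and
simulate are illustrative names, not kernel code, and the count of 14
pre-existing hosts is chosen only to match the numbers in the log.

```python
class HostIndexAllocator:
    """Toy lowest-free-ID allocator standing in for the kernel's ID pool."""

    def __init__(self):
        self._in_use = set()

    def alloc(self) -> int:
        # Hand out the smallest number that is not currently in use.
        number = 0
        while number in self._in_use:
            number += 1
        self._in_use.add(number)
        return number

    def free(self, number: int) -> None:
        self._in_use.discard(number)


def simulate(release_before_add: bool, existing_hosts: int = 14):
    """Return (old_host_no, new_host_no) for one hot remove plus hot add."""
    pool = HostIndexAllocator()
    for _ in range(existing_hosts):
        pool.alloc()              # hosts 0..13 belong to other adapters
    old = pool.alloc()            # the controller under test gets host 14
    if release_before_add:
        pool.free(old)            # number released before the hot add
        new = pool.alloc()
    else:
        new = pool.alloc()        # hot add happens while 14 is still held
        pool.free(old)            # late release, as in the log at 2014.18
    return old, new


if __name__ == "__main__":
    print("prompt release:", simulate(release_before_add=True))
    print("late release:  ", simulate(release_before_add=False))
```

With a prompt release the controller gets host 14 back; with the late release
it is given host 15, which matches the sequence in the debug prints.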



Questions:
Will this be a problem for Amazon? (Wouldn't they take several minutes to do
this, since they have to be very careful when hot plugging?)
Do we need to consider other customers that might use PCIe hot plug in the
future?
The problem is observed in kernel V4.18.11, but would V5.04/V5.10 make a
difference (should we test it ourselves)?



Consequence:
The application accesses the wrong device. Rebooting the system may still
result in the wrong host number.

--
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* Re: Fwd: [Bug 216059] New: Scsi host number of Adaptec RAID controller changes upon a PCIe hotplug and re-insert
From: James Bottomley @ 2022-06-02 16:55 UTC (permalink / raw)
  To: bjorn, Martin K. Petersen; +Cc: linux-scsi, Linux PCI, sagar.biradar

On Thu, 2022-06-02 at 11:46 -0500, Bjorn Helgaas wrote:
> From bugzilla.  Reported against PCI, but I think the SCSI host
> number is determined by SCSI, not by PCI, so I don't see a PCI issue
> here.

Agree this is SCSI.  However, can we be clear about what the
expectation is?  Host Number looks like it should be expected to change
on hot plug/hot unplug, so what is the actual problem?

I get that the driver not releasing the host is causing this, but even
if it did do instant release, when you hot plug two SCSI devices, you
stand a good chance of getting a different host number anyway.
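
If host numbers are not stable across hotplug, one option for an application
is to look up the current host number from the PCI address instead of caching
it. The sketch below is a rough illustration only: it assumes the hostN
directory sits directly under the PCI device's sysfs directory (some drivers
place an intermediate node in between), and the function name
scsi_hosts_for_pci and its sysfs_root parameter are made up for this example.

```python
import re
from pathlib import Path


def scsi_hosts_for_pci(pci_addr: str, sysfs_root: str = "/sys") -> list:
    """Return the SCSI host numbers found directly under a PCI device.

    Assumes a layout like <sysfs_root>/bus/pci/devices/<pci_addr>/hostN.
    """
    dev_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr
    if not dev_dir.is_dir():
        return []
    hosts = []
    for entry in dev_dir.iterdir():
        match = re.fullmatch(r"host(\d+)", entry.name)
        if match and entry.is_dir():
            hosts.append(int(match.group(1)))
    return sorted(hosts)


if __name__ == "__main__":
    # 0000:b3:00.0 is the PCI address shown in the debug prints above.
    print(scsi_hosts_for_pci("0000:b3:00.0"))
```

Looking the number up at the time of use means the application follows the
device rather than a host number that may have been reassigned.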

James


