Stable Archive on lore.kernel.org
From: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
To: martin.petersen@oracle.com
Cc: linux-scsi@vger.kernel.org, sathya.prakash@broadcom.com,
	suganath-prabu.subramani@broadcom.com, stable@vger.kernel.org,
	amit@kernel.org, Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Subject: [PATCH v2] mpt3sas: Fix kernel panic observed on soft HBA unplug
Date: Fri, 27 Mar 2020 05:52:43 -0400
Message-ID: <1585302763-23007-1-git-send-email-sreekanth.reddy@broadcom.com> (raw)

A general protection fault kernel panic is observed when the user
performs a soft (ordered) HBA unplug operation while IOs are running
on drives connected to the HBA.

When the user performs an ordered HBA removal, the kernel calls the
PCI device's .remove() callback, where the driver flushes all
outstanding SCSI IO commands with the DID_NO_CONNECT host byte and
also unmaps the sg buffers allocated for these IO commands.
But in the ordered HBA removal case (unlike a real HBA hot unplug)
the HBA device is still alive, so the HBA hardware keeps performing
DMA operations to those buffers in system memory after the flush has
already unmapped them, and this leads to a kernel panic.

This bug was introduced by the commit below:
commit c666d3be99c000bb889a33353e9be0fa5808d3de
("scsi: mpt3sas: wait for and flush running commands on shutdown/unload")

Fix:
Don't flush the outstanding IOs from the .remove() path in case of
ordered HBA removal, since the HBA is still alive in this case and
can complete the outstanding IOs. Flush the outstanding IOs only in
case of 'physical HBA hot unplug', where there won't be any
communication with the HBA.

During shutdown it is also possible that the HBA hardware performs
DMA operations on outstanding IO buffers whose commands the driver
has already completed with DID_NO_CONNECT from .shutdown(). So the
same fix is applied in the shutdown path as well.

In summary, it is always safe to drop the outstanding commands when
the HBA is inaccessible, such as when a permanent PCI failure happens,
when the HBA is in a non-operational state, or when someone does a
real HBA hot unplug. Since the driver knows that the HBA is
inaccessible in these cases, it can drop the outstanding commands
instead of waiting for SCSI error recovery to kick in and clear them.
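
The rule above can be sketched as a small user-space model. This is
only an illustration under stated assumptions, not driver code:
struct hba_model and teardown_model() are hypothetical names, and the
'present' flag merely stands in for the result of
pci_device_is_present(pdev).

```c
#include <stdbool.h>

/* Hypothetical stand-in for adapter state; not the driver's ioc struct. */
struct hba_model {
	bool present;		/* models pci_device_is_present(pdev) */
	int outstanding;	/* commands still owned by the HBA */
	int flushed;		/* commands the driver completed itself */
};

/* Models the decision shared by the .remove() and .shutdown() paths. */
static void teardown_model(struct hba_model *hba)
{
	if (!hba->present) {
		/* HBA is gone, no DMA can happen: safe to drop commands. */
		hba->flushed += hba->outstanding;
		hba->outstanding = 0;
	}
	/*
	 * Otherwise leave the commands alone: a live HBA may still DMA
	 * into their buffers, so they must not be completed and unmapped.
	 */
}
```

In this model an absent HBA gets every outstanding command flushed,
while a present HBA keeps its commands so the hardware can complete
them.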

Cc: stable@vger.kernel.org #v4.14.174+
Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
---
v1:
    Update the patch description.
v2:
    Update the patch description by adding summary paragraph.

 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 778d5e6..04a40af 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
 
 	ioc->remove_host = 1;
 
-	mpt3sas_wait_for_commands_to_complete(ioc);
-	_scsih_flush_running_cmds(ioc);
+	if (!pci_device_is_present(pdev))
+		_scsih_flush_running_cmds(ioc);
 
 	_scsih_fw_event_cleanup_queue(ioc);
 
@@ -9992,8 +9992,8 @@ scsih_shutdown(struct pci_dev *pdev)
 
 	ioc->remove_host = 1;
 
-	mpt3sas_wait_for_commands_to_complete(ioc);
-	_scsih_flush_running_cmds(ioc);
+	if (!pci_device_is_present(pdev))
+		_scsih_flush_running_cmds(ioc);
 
 	_scsih_fw_event_cleanup_queue(ioc);
 
-- 
1.8.3.1



Thread overview: 2+ messages
2020-03-27  9:52 Sreekanth Reddy [this message]
2020-04-01  2:04 ` Martin K. Petersen
