linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix
@ 2022-05-12 11:15 John Garry
  2022-05-12 11:15 ` [PATCH 1/3] scsi: libsas: Refactor sas_ata_hard_reset() John Garry
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: John Garry @ 2022-05-12 11:15 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linux-kernel, linuxarm, John Garry

Hi,

This series includes 2x fixes for the hisi_sas driver:
- In issuing an ATA softreset in the IT nexus handler the disk may become 
  lost as we may issue the softreset before the phy is back up.

  So export functionality from sas_ata_hard_reset() to wait for phy up to
  synchronize.

- For host rescan we issue a nexus reset to the disk which is unnecessary,
  so drop it. In addition, usage of the device status flag needs to be
  fixed.

Please consider for 5.19, thanks!

John Garry (3):
  scsi: libsas: Refactor sas_ata_hard_reset()
  scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset
  scsi: hisi_sas: Fix rescan after deleting a disk

 drivers/scsi/hisi_sas/hisi_sas_main.c | 66 ++++++++++++---------------
 drivers/scsi/libsas/sas_ata.c         | 41 +++++++++++------
 include/scsi/sas_ata.h                |  7 +++
 3 files changed, 64 insertions(+), 50 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/3] scsi: libsas: Refactor sas_ata_hard_reset()
  2022-05-12 11:15 [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix John Garry
@ 2022-05-12 11:15 ` John Garry
  2022-05-12 11:15 ` [PATCH 2/3] scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset John Garry
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: John Garry @ 2022-05-12 11:15 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linux-kernel, linuxarm, John Garry

Create function sas_ata_wait_after_reset() from sas_ata_hard_reset() as
some LLDDs may want to check for a remote ATA phy is up after reset.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
---
 drivers/scsi/libsas/sas_ata.c | 41 +++++++++++++++++++++++------------
 include/scsi/sas_ata.h        |  7 ++++++
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/libsas/sas_ata.c b/drivers/scsi/libsas/sas_ata.c
index d34c82e24d9a..d35c9296f738 100644
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -358,22 +358,14 @@ static int sas_ata_printk(const char *level, const struct domain_device *ddev,
 	return r;
 }
 
-static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
-			      unsigned long deadline)
+int sas_ata_wait_after_reset(struct domain_device *dev, unsigned long deadline)
 {
-	int ret = 0, res;
-	struct sas_phy *phy;
-	struct ata_port *ap = link->ap;
+	struct sata_device *sata_dev = &dev->sata_dev;
 	int (*check_ready)(struct ata_link *link);
-	struct domain_device *dev = ap->private_data;
-	struct sas_internal *i = dev_to_sas_internal(dev);
-
-	res = i->dft->lldd_I_T_nexus_reset(dev);
-	if (res == -ENODEV)
-		return res;
-
-	if (res != TMF_RESP_FUNC_COMPLETE)
-		sas_ata_printk(KERN_DEBUG, dev, "Unable to reset ata device?\n");
+	struct ata_port *ap = sata_dev->ap;
+	struct ata_link *link = &ap->link;
+	struct sas_phy *phy;
+	int ret;
 
 	phy = sas_get_local_phy(dev);
 	if (scsi_is_sas_phy_local(phy))
@@ -386,6 +378,27 @@ static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
 	if (ret && ret != -EAGAIN)
 		sas_ata_printk(KERN_ERR, dev, "reset failed (errno=%d)\n", ret);
 
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sas_ata_wait_after_reset);
+
+static int sas_ata_hard_reset(struct ata_link *link, unsigned int *class,
+			      unsigned long deadline)
+{
+	struct ata_port *ap = link->ap;
+	struct domain_device *dev = ap->private_data;
+	struct sas_internal *i = dev_to_sas_internal(dev);
+	int ret;
+
+	ret = i->dft->lldd_I_T_nexus_reset(dev);
+	if (ret == -ENODEV)
+		return ret;
+
+	if (ret != TMF_RESP_FUNC_COMPLETE)
+		sas_ata_printk(KERN_DEBUG, dev, "Unable to reset ata device?\n");
+
+	ret = sas_ata_wait_after_reset(dev, deadline);
+
 	*class = dev->sata_dev.class;
 
 	ap->cbl = ATA_CBL_SATA;
diff --git a/include/scsi/sas_ata.h b/include/scsi/sas_ata.h
index d47dea70855d..a1df4f9d57a3 100644
--- a/include/scsi/sas_ata.h
+++ b/include/scsi/sas_ata.h
@@ -34,6 +34,7 @@ void sas_resume_sata(struct asd_sas_port *port);
 void sas_ata_end_eh(struct ata_port *ap);
 int sas_execute_ata_cmd(struct domain_device *device, u8 *fis,
 			int force_phy_id);
+int sas_ata_wait_after_reset(struct domain_device *dev, unsigned long deadline);
 #else
 
 
@@ -91,6 +92,12 @@ static inline int sas_execute_ata_cmd(struct domain_device *device, u8 *fis,
 {
 	return 0;
 }
+
+static inline int sas_ata_wait_after_reset(struct domain_device *dev,
+					   unsigned long deadline)
+{
+	return -ETIMEDOUT;
+}
 #endif
 
 #endif /* _SAS_ATA_H_ */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/3] scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset
  2022-05-12 11:15 [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix John Garry
  2022-05-12 11:15 ` [PATCH 1/3] scsi: libsas: Refactor sas_ata_hard_reset() John Garry
@ 2022-05-12 11:15 ` John Garry
  2022-05-12 11:15 ` [PATCH 3/3] scsi: hisi_sas: Fix rescan after deleting a disk John Garry
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: John Garry @ 2022-05-12 11:15 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linux-kernel, linuxarm, John Garry

We have seen errors like this when a SATA device is probed:

[524.566298] hisi_sas_v3_hw 0000L74:02.0: erroneous completion iptt=4096 ...
[524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00

Since commit 21c7e972475e ("scsi: hisi_sas: Disable SATA disk phy for
severe I_T nexus reset failure"), we issue an ATA softreset to disks after
a phy reset to ensure that they are in sound working order. If the
softreset is issued before the remote phy has come back up then the
softreset will fail (errors as above). Remedy this by waiting for the
phy to come back up after the reset.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 4bda2f6cb352..997f27e2f1e5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1710,13 +1710,18 @@ static int hisi_sas_debug_I_T_nexus_reset(struct domain_device *device)
 		/* report PHY down if timed out */
 		if (rc == -ETIMEDOUT)
 			hisi_sas_phy_down(hisi_hba, sas_phy->id, 0, GFP_KERNEL);
-	} else if (sas_dev->dev_status != HISI_SAS_DEV_INIT) {
-		/*
-		 * If in init state, we rely on caller to wait for link to be
-		 * ready; otherwise, except phy reset is fail, delay.
-		 */
-		if (!rc)
-			msleep(2000);
+		return rc;
+	}
+
+	if (rc)
+		return rc;
+
+	/* Remote phy */
+	if (dev_is_sata(device)) {
+		rc = sas_ata_wait_after_reset(device,
+					HISI_SAS_WAIT_PHYUP_TIMEOUT);
+	} else {
+		msleep(2000);
 	}
 
 	return rc;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 3/3] scsi: hisi_sas: Fix rescan after deleting a disk
  2022-05-12 11:15 [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix John Garry
  2022-05-12 11:15 ` [PATCH 1/3] scsi: libsas: Refactor sas_ata_hard_reset() John Garry
  2022-05-12 11:15 ` [PATCH 2/3] scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset John Garry
@ 2022-05-12 11:15 ` John Garry
  2022-05-17  1:51 ` [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix Martin K. Petersen
  2022-05-20  1:09 ` Martin K. Petersen
  4 siblings, 0 replies; 6+ messages in thread
From: John Garry @ 2022-05-12 11:15 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linux-scsi, linux-kernel, linuxarm, John Garry

Removing an ATA device via sysfs means that the device may not be found
through re-scanning:

root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete
root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan
root@ubuntu:/home/john# lsscsi
[0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda
[0:0:8:0] enclosu 12G SAS Expander RevB -
root@ubuntu:/home/john#

The problem is that the rescan of the device may conflict with the device
in being re-initialized, as follows:

- In the rescan we call hisi_sas_slave_alloc() in store_scan() ->
  sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lunc() ->
  scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device()
  In hisi_sas_init_device() we issue an IT nexus reset for ATA devices

- That IT nexus causes the remote PHY to go down and this triggers a bcast
  event

- In parallel libsas processes the bcast event, finds that the phy is down
  and marks the device as gone

The hard reset issued in hisi_sas_init_device() is unncessary - as
described in the code comment - so remove it. Also set dev status as
HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call.

Fixes: 36c6b7613ef1 ("scsi: hisi_sas: Initialise devices in .slave_alloc callback")
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Yihang Li <liyihang6@hisilicon.com>
Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 47 ++++++++++-----------------
 1 file changed, 18 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 997f27e2f1e5..6803751dc4b1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -709,8 +709,6 @@ static int hisi_sas_init_device(struct domain_device *device)
 	struct scsi_lun lun;
 	int retry = HISI_SAS_DISK_RECOVER_CNT;
 	struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
-	struct device *dev = hisi_hba->dev;
-	struct sas_phy *local_phy;
 
 	switch (device->dev_type) {
 	case SAS_END_DEVICE:
@@ -729,30 +727,18 @@ static int hisi_sas_init_device(struct domain_device *device)
 	case SAS_SATA_PM_PORT:
 	case SAS_SATA_PENDING:
 		/*
-		 * send HARD RESET to clear previous affiliation of
-		 * STP target port
+		 * If an expander is swapped when a SATA disk is attached then
+		 * we should issue a hard reset to clear previous affiliation
+		 * of STP target port, see SPL (chapter 6.19.4).
+		 *
+		 * However we don't need to issue a hard reset here for these
+		 * reasons:
+		 * a. When probing the device, libsas/libata already issues a
+		 * hard reset in sas_probe_sata() -> ata_sas_async_probe().
+		 * Note that in hisi_sas_debug_I_T_nexus_reset() we take care
+		 * to issue a hard reset by checking the dev status (== INIT).
+		 * b. When resetting the controller, this is simply unnecessary.
 		 */
-		local_phy = sas_get_local_phy(device);
-		if (!scsi_is_sas_phy_local(local_phy) &&
-		    !test_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags)) {
-			unsigned long deadline = ata_deadline(jiffies, 20000);
-			struct sata_device *sata_dev = &device->sata_dev;
-			struct ata_host *ata_host = sata_dev->ata_host;
-			struct ata_port_operations *ops = ata_host->ops;
-			struct ata_port *ap = sata_dev->ap;
-			struct ata_link *link;
-			unsigned int classes;
-
-			ata_for_each_link(link, ap, EDGE)
-				rc = ops->hardreset(link, &classes,
-						    deadline);
-		}
-		sas_put_local_phy(local_phy);
-		if (rc) {
-			dev_warn(dev, "SATA disk hardreset fail: %d\n", rc);
-			return rc;
-		}
-
 		while (retry-- > 0) {
 			rc = hisi_sas_softreset_ata_disk(device);
 			if (!rc)
@@ -768,15 +754,19 @@ static int hisi_sas_init_device(struct domain_device *device)
 
 int hisi_sas_slave_alloc(struct scsi_device *sdev)
 {
-	struct domain_device *ddev;
+	struct domain_device *ddev = sdev_to_domain_dev(sdev);
+	struct hisi_sas_device *sas_dev = ddev->lldd_dev;
 	int rc;
 
 	rc = sas_slave_alloc(sdev);
 	if (rc)
 		return rc;
-	ddev = sdev_to_domain_dev(sdev);
 
-	return hisi_sas_init_device(ddev);
+	rc = hisi_sas_init_device(ddev);
+	if (rc)
+		return rc;
+	sas_dev->dev_status = HISI_SAS_DEV_NORMAL;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(hisi_sas_slave_alloc);
 
@@ -826,7 +816,6 @@ static int hisi_sas_dev_found(struct domain_device *device)
 	dev_info(dev, "dev[%d:%x] found\n",
 		sas_dev->device_id, sas_dev->dev_type);
 
-	sas_dev->dev_status = HISI_SAS_DEV_NORMAL;
 	return 0;
 
 err_out:
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix
  2022-05-12 11:15 [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix John Garry
                   ` (2 preceding siblings ...)
  2022-05-12 11:15 ` [PATCH 3/3] scsi: hisi_sas: Fix rescan after deleting a disk John Garry
@ 2022-05-17  1:51 ` Martin K. Petersen
  2022-05-20  1:09 ` Martin K. Petersen
  4 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2022-05-17  1:51 UTC (permalink / raw)
  To: John Garry; +Cc: jejb, martin.petersen, linux-scsi, linux-kernel, linuxarm


John,

> This series includes 2x fixes for the hisi_sas driver:
> - In issuing an ATA softreset in the IT nexus handler the disk may become 
>   lost as we may issue the softreset before the phy is back up.
>
>   So export functionality from sas_ata_hard_reset() to wait for phy up to
>   synchronize.
>
> - For host rescan we issue a nexus reset to the disk which is unnecessary,
>   so drop it. In addition, usage of the device status flag needs to be
>   fixed.

Applied to 5.19/scsi-staging, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix
  2022-05-12 11:15 [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix John Garry
                   ` (3 preceding siblings ...)
  2022-05-17  1:51 ` [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix Martin K. Petersen
@ 2022-05-20  1:09 ` Martin K. Petersen
  4 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2022-05-20  1:09 UTC (permalink / raw)
  To: John Garry, jejb; +Cc: Martin K . Petersen, linux-scsi, linuxarm, linux-kernel

On Thu, 12 May 2022 19:15:31 +0800, John Garry wrote:

> This series includes 2x fixes for the hisi_sas driver:
> - In issuing an ATA softreset in the IT nexus handler the disk may become
>   lost as we may issue the softreset before the phy is back up.
> 
>   So export functionality from sas_ata_hard_reset() to wait for phy up to
>   synchronize.
> 
> [...]

Applied to 5.19/scsi-queue, thanks!

[1/3] scsi: libsas: Refactor sas_ata_hard_reset()
      https://git.kernel.org/mkp/scsi/c/057e5fc03369
[2/3] scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset
      https://git.kernel.org/mkp/scsi/c/71453bd9d1bf
[3/3] scsi: hisi_sas: Fix rescan after deleting a disk
      https://git.kernel.org/mkp/scsi/c/e9dedc13bb11

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-20  1:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 11:15 [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix John Garry
2022-05-12 11:15 ` [PATCH 1/3] scsi: libsas: Refactor sas_ata_hard_reset() John Garry
2022-05-12 11:15 ` [PATCH 2/3] scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset John Garry
2022-05-12 11:15 ` [PATCH 3/3] scsi: hisi_sas: Fix rescan after deleting a disk John Garry
2022-05-17  1:51 ` [PATCH 0/3] hisi_sas: A device rescan and IT nexus reset fix Martin K. Petersen
2022-05-20  1:09 ` Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).