linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: chenxiang <chenxiang66@hisilicon.com>
To: <jejb@linux.ibm.com>, <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>, <linuxarm@huawei.com>,
	<john.garry@huawei.com>
Subject: [PATCH 01/15] libsas: Don't always drain event workqueue for HA resume
Date: Wed, 17 Nov 2021 10:44:54 +0800	[thread overview]
Message-ID: <1637117108-230103-2-git-send-email-chenxiang66@hisilicon.com> (raw)
In-Reply-To: <1637117108-230103-1-git-send-email-chenxiang66@hisilicon.com>

From: John Garry <john.garry@huawei.com>

For the hisi_sas driver, if a directly attached disk is removed during
suspend, a hang will occur in the resume process:

The background is that in commit 16fd4a7c5917 ("scsi: hisi_sas: Add device
link between SCSI devices and hisi_hba"), it is ensured that the HBA
device cannot be runtime suspended when any SCSI device associated is
active.

Other drivers which use libsas don't worry about this as none support
runtime suspend.

The mentioned hang occurs when an disk is removed during suspend. In the
removal process - from PHYE_RESUME_TIMEOUT event processing - we call into
scsi_remove_device(), which is being processed in the HA event workqueue.
Here we wait for all suppliers of the SCSI device to resume, which
includes the HBA device (from the above commit). However the HBA device
cannot resume, as it is waiting for the PHYE_RESUME_TIMEOUT to be processed
(from calling sas_resume_ha() -> sas_drain_work()). This is the deadlock.

There does not appear to be any need for the sas_drain_work() to be called
at all in sas_resume_ha() as it is not syncing against anything, so allow
LLDDs to avoid this by providing a variant of sas_resume_ha() which does
"sync", i.e. doesn't drain the event workqueue.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 10 +++++++++-
 drivers/scsi/libsas/sas_init.c         | 17 +++++++++++++++--
 include/scsi/libsas.h                  |  1 +
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 9ae47d1d9897..5a83373ac0b1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -4942,7 +4942,15 @@ static int _resume_v3_hw(struct device *device)
 		return rc;
 	}
 	phys_init_v3_hw(hisi_hba);
-	sas_resume_ha(sha);
+
+	/*
+	 * If a directly-attached disk is removed during suspend, a deadlock
+	 * may occur, as the PHYE_RESUME_TIMEOUT processing will require the
+	 * hisi_hba->device to be active, which can only happen when resume
+	 * completes. So don't wait for the HA event workqueue to drain upon
+	 * resume.
+	 */
+	sas_resume_ha_no_sync(sha);
 	clear_bit(HISI_SAS_RESETTING_BIT, &hisi_hba->flags);
 
 	return 0;
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index b640e09af6a4..43509d139241 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -387,7 +387,7 @@ static int phys_suspended(struct sas_ha_struct *ha)
 	return rc;
 }
 
-void sas_resume_ha(struct sas_ha_struct *ha)
+static void _sas_resume_ha(struct sas_ha_struct *ha, bool drain)
 {
 	const unsigned long tmo = msecs_to_jiffies(25000);
 	int i;
@@ -417,10 +417,23 @@ void sas_resume_ha(struct sas_ha_struct *ha)
 	 * flush out disks that did not return
 	 */
 	scsi_unblock_requests(ha->core.shost);
-	sas_drain_work(ha);
+	if (drain)
+		sas_drain_work(ha);
+}
+
+void sas_resume_ha(struct sas_ha_struct *ha)
+{
+	_sas_resume_ha(ha, true);
 }
 EXPORT_SYMBOL(sas_resume_ha);
 
+/* A no-sync variant, which does not call sas_drain_ha(). */
+void sas_resume_ha_no_sync(struct sas_ha_struct *ha)
+{
+	_sas_resume_ha(ha, false);
+}
+EXPORT_SYMBOL(sas_resume_ha_no_sync);
+
 void sas_suspend_ha(struct sas_ha_struct *ha)
 {
 	int i;
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 79e4903bd414..a795a2d9e5b1 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -660,6 +660,7 @@ extern int sas_register_ha(struct sas_ha_struct *);
 extern int sas_unregister_ha(struct sas_ha_struct *);
 extern void sas_prep_resume_ha(struct sas_ha_struct *sas_ha);
 extern void sas_resume_ha(struct sas_ha_struct *sas_ha);
+extern void sas_resume_ha_no_sync(struct sas_ha_struct *sas_ha);
 extern void sas_suspend_ha(struct sas_ha_struct *sas_ha);
 
 int sas_set_phy_speed(struct sas_phy *phy, struct sas_phy_linkrates *rates);
-- 
2.33.0


  reply	other threads:[~2021-11-17  2:50 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-17  2:44 [PATCH 00/15] Add runtime PM support for libsas chenxiang
2021-11-17  2:44 ` chenxiang [this message]
2021-11-17  4:14   ` [PATCH 01/15] libsas: Don't always drain event workqueue for HA resume Bart Van Assche
2021-11-17  5:06     ` Bart Van Assche
2021-11-17  2:44 ` [PATCH 02/15] Revert "scsi: hisi_sas: Filter out new PHY up events during suspend" chenxiang
2021-12-13 10:31   ` John Garry
2021-12-15  7:20     ` chenxiang (M)
2021-11-17  2:44 ` [PATCH 03/15] scsi/block PM: Always set request queue runtime active in blk_post_runtime_resume() chenxiang
2021-11-17  4:58   ` Bart Van Assche
2021-11-17  2:44 ` [PATCH 04/15] scsi: libsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list chenxiang
2021-11-17  2:44 ` [PATCH 05/15] scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list chenxiang
2021-11-17  2:44 ` [PATCH 06/15] scsi: mvsas: Add spin_lock/unlock() to protect asd_sas_port->phy_list chenxiang
2021-11-17  2:45 ` [PATCH 07/15] scsi: libsas: Send event PORTE_BROADCAST_RCVD for valid ports chenxiang
2021-12-13 11:02   ` John Garry
2021-12-15  7:25     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 08/15] scsi: hisi_sas: Add more prink for runtime suspend/resume chenxiang
2021-12-13 10:34   ` John Garry
2021-12-15  7:26     ` chenxiang (M)
2021-12-13 10:35   ` John Garry
2021-12-15  7:26     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 09/15] scsi: libsas: Resume sas host before sending SMP IOs chenxiang
2021-12-13 11:12   ` John Garry
2021-12-15  8:00     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 10/15] scsi: libsas: Add a flag SAS_HA_RESUMING of sas_ha chenxiang
2021-12-13 11:17   ` John Garry
2021-12-15  7:34     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 11/15] scsi: libsas: Refactor out sas_queue_deferred_work() chenxiang
2021-12-13 11:20   ` John Garry
2021-12-15  7:59     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 12/15] scsi: libsas: Defer works of new phys during suspend chenxiang
2021-11-17  2:45 ` [PATCH 13/15] scsi: hisi_sas: Keep controller active between ISR of phyup and the event being processed chenxiang
2021-12-13 10:44   ` John Garry
2021-12-15  7:57     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 14/15] scsi: libsas: Keep sas host active until finished some work chenxiang
2021-12-14 12:34   ` John Garry
2021-12-15  7:46     ` chenxiang (M)
2021-11-17  2:45 ` [PATCH 15/15] scsi: hisi_sas: Use autosuspend for SAS controller chenxiang
2021-12-13 10:50   ` John Garry
2021-12-15  7:52     ` chenxiang (M)
2021-11-19  4:04 ` [PATCH 00/15] Add runtime PM support for libsas Martin K. Petersen
2021-11-23  6:01   ` chenxiang (M)
2021-12-14 11:52 ` John Garry
2021-12-15  7:56   ` chenxiang (M)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1637117108-230103-2-git-send-email-chenxiang66@hisilicon.com \
    --to=chenxiang66@hisilicon.com \
    --cc=jejb@linux.ibm.com \
    --cc=john.garry@huawei.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).