* [PATCH 0/2] Enhance libsas hotplug feature
@ 2017-05-20  6:39 ` Yijing Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2017-05-20  6:39 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, john.garry, fangwei1, yanaijie, hch, dan.j.williams,
	Yijing Wang

The libsas hotplug code currently has some issues. Dan Williams reported
a similar bug earlier:
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg39187.html

The issues we have found:
1. If the LLDD reports a burst of phy-up/phy-down sas events, some events
   may be lost because the same sas event is already pending. As a result,
   the libsas topology may end up differing from the hardware.
2. On receiving a phy-down sas event, libsas calls sas_deform_port to remove
   devices. It first deletes the sas port, then puts a destruction
   discovery event in a new work and queues it at the tail of the workqueue.
   Once the sas port is deleted, its child devices are deleted too, so
   when the destruction work starts, it finds that the target device has
   already been removed and reports a sysfs warning.
3. Since a hotplug process is divided into several works, a phy-up
   sas event may be inserted between the phy-down works, like
   destruction work  ---> PORTE_BYTES_DMAED (sas_form_port) ---->PHYE_LOSS_OF_SIGNAL
   The hot-remove flow is then broken by the PORTE_BYTES_DMAED event, which is
   not what we expect, and issues can occur.

The first patch fixes the lost sas events, and the second one introduces wait-complete
to fix the hotplug ordering issues.

Yijing Wang (2):
  libsas: Don't process sas events in static works
  libsas: Enhance libsas hotplug

 drivers/scsi/libsas/sas_discover.c | 58 +++++++++++++++++-------
 drivers/scsi/libsas/sas_event.c    | 90 ++++++++++++++++++++++++++------------
 drivers/scsi/libsas/sas_expander.c |  9 +++-
 drivers/scsi/libsas/sas_init.c     | 37 +++++++++++++---
 drivers/scsi/libsas/sas_internal.h | 53 ++++++++++++++++++++++
 drivers/scsi/libsas/sas_phy.c      | 45 ++++---------------
 drivers/scsi/libsas/sas_port.c     | 22 +++++-----
 include/scsi/libsas.h              | 21 +++++----
 8 files changed, 230 insertions(+), 105 deletions(-)

-- 
2.5.0


* [PATCH 1/2] libsas: Don't process sas events in static works
  2017-05-20  6:39 ` Yijing Wang
@ 2017-05-20  6:39   ` Yijing Wang
  -1 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2017-05-20  6:39 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, john.garry, fangwei1, yanaijie, hch, dan.j.williams,
	Yijing Wang, Yousong He

Currently the libsas hotplug works are static, and the LLDD driver queues
the hotplug work on shost->work_q. If the LLDD driver
posts a burst of hotplug events to libsas, the hotplug
events may be pending in the workqueue like

shost->work_q
new work[PORTE_BYTES_DMAED] --> |[PHYE_LOSS_OF_SIGNAL][PORTE_BYTES_DMAED] -> processing
                                |<-------wait worker to process-------->|
In this case, when a new PORTE_BYTES_DMAED event arrives, libsas tries to queue it
on shost->work_q, but this work is already pending, so the event is lost.
In the end, libsas deletes the related sas port and sas devices, but the LLDD driver
expects libsas to add the sas port and devices (the last sas event).

This patch removes the statically defined hotplug works and uses dynamically
allocated works to avoid missing hotplug events.
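
The lost-event case described above can be reproduced with a small userspace
model. This is only a sketch under stated assumptions: the names model_queue,
notify_static, notify_dynamic, drain and replay_burst are hypothetical and are
not libsas symbols, and the "workqueue" is just a FIFO array. It replays the
burst phy-up, phy-down, phy-up before the worker runs and reports whether the
port is formed once the queue has drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in libsas. */
enum model_event { EV_PHY_DOWN, EV_PHY_UP };

#define QUEUE_MAX 16

struct model_queue {
	enum model_event slots[QUEUE_MAX];
	int count;
	unsigned long pending;	/* one bit per event type, as with static works */
};

/* Static-work model: a notification is dropped while its type is pending. */
static bool notify_static(struct model_queue *q, enum model_event ev)
{
	if (q->pending & (1UL << ev))
		return false;	/* same event already queued: this one is lost */
	assert(q->count < QUEUE_MAX);
	q->pending |= 1UL << ev;
	q->slots[q->count++] = ev;
	return true;
}

/* Dynamic-work model: every notification gets its own queue entry. */
static bool notify_dynamic(struct model_queue *q, enum model_event ev)
{
	assert(q->count < QUEUE_MAX);
	q->slots[q->count++] = ev;
	return true;
}

/* Run the queued works in FIFO order; return true if the port ends up formed. */
static bool drain(struct model_queue *q)
{
	bool port_formed = false;

	for (int i = 0; i < q->count; i++) {
		q->pending &= ~(1UL << q->slots[i]);	/* handler clears its bit */
		port_formed = (q->slots[i] == EV_PHY_UP);
	}
	q->count = 0;
	return port_formed;
}

/* Post phy-up, phy-down, phy-up before the worker runs, then drain. */
static bool replay_burst(bool use_dynamic)
{
	const enum model_event burst[] = { EV_PHY_UP, EV_PHY_DOWN, EV_PHY_UP };
	struct model_queue q = { .count = 0, .pending = 0 };

	for (size_t i = 0; i < sizeof(burst) / sizeof(burst[0]); i++) {
		if (use_dynamic)
			notify_dynamic(&q, burst[i]);
		else
			notify_static(&q, burst[i]);
	}
	return drain(&q);
}
```

In this model replay_burst(false) leaves the port removed because the second
phy-up is dropped, while replay_burst(true) leaves it formed, which matches
the last event the hardware reported.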

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Yousong He <heyousong@huawei.com>
Signed-off-by: Qilin Chen <chenqilin2@huawei.com>
---
 drivers/scsi/libsas/sas_event.c    | 88 +++++++++++++++++++++++++++-----------
 drivers/scsi/libsas/sas_init.c     |  6 ---
 drivers/scsi/libsas/sas_internal.h |  3 ++
 drivers/scsi/libsas/sas_phy.c      | 45 ++++---------------
 drivers/scsi/libsas/sas_port.c     | 18 ++++----
 include/scsi/libsas.h              | 10 +----
 6 files changed, 84 insertions(+), 86 deletions(-)

diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index aadbd53..06c5c4b 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -27,6 +27,10 @@
 #include "sas_internal.h"
 #include "sas_dump.h"
 
+static const work_func_t sas_ha_event_fns[HA_NUM_EVENTS] = {
+	[HAE_RESET] = sas_hae_reset,
+};
+
 void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
 {
 	if (!test_bit(SAS_HA_REGISTERED, &ha->state))
@@ -40,17 +44,14 @@ void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
 		scsi_queue_work(ha->core.shost, &sw->work);
 }
 
-static void sas_queue_event(int event, unsigned long *pending,
-			    struct sas_work *work,
+static void sas_queue_event(int event, struct sas_work *work,
 			    struct sas_ha_struct *ha)
 {
-	if (!test_and_set_bit(event, pending)) {
-		unsigned long flags;
+	unsigned long flags;
 
-		spin_lock_irqsave(&ha->lock, flags);
-		sas_queue_work(ha, work);
-		spin_unlock_irqrestore(&ha->lock, flags);
-	}
+	spin_lock_irqsave(&ha->lock, flags);
+	sas_queue_work(ha, work);
+	spin_unlock_irqrestore(&ha->lock, flags);
 }
 
 
@@ -111,52 +112,87 @@ void sas_enable_revalidation(struct sas_ha_struct *ha)
 		if (!test_and_clear_bit(ev, &d->pending))
 			continue;
 
-		sas_queue_event(ev, &d->pending, &d->disc_work[ev].work, ha);
+		sas_queue_event(ev, &d->disc_work[ev].work, ha);
 	}
 	mutex_unlock(&ha->disco_mutex);
 }
 
+static void sas_ha_event_worker(struct work_struct *work)
+{
+	struct sas_ha_event *ev = to_sas_ha_event(work);
+
+	sas_ha_event_fns[ev->type](work);
+	kfree(ev);
+}
+
+static void sas_port_event_worker(struct work_struct *work)
+{
+	struct asd_sas_event *ev = to_asd_sas_event(work);
+
+	sas_port_event_fns[ev->type](work);
+	kfree(ev);
+}
+
+static void sas_phy_event_worker(struct work_struct *work)
+{
+	struct asd_sas_event *ev = to_asd_sas_event(work);
+
+	sas_phy_event_fns[ev->type](work);
+	kfree(ev);
+}
+
 static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event event)
 {
+	struct sas_ha_event *ev;
+
 	BUG_ON(event >= HA_NUM_EVENTS);
 
-	sas_queue_event(event, &sas_ha->pending,
-			&sas_ha->ha_events[event].work, sas_ha);
+	ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
+	if (!ev)
+		return;
+
+	INIT_SAS_WORK(&ev->work, sas_ha_event_worker);
+	ev->ha = sas_ha;
+	ev->type = event;
+	sas_queue_event(event, &ev->work, sas_ha);
 }
 
 static void notify_port_event(struct asd_sas_phy *phy, enum port_event event)
 {
+	struct asd_sas_event *ev;
 	struct sas_ha_struct *ha = phy->ha;
 
 	BUG_ON(event >= PORT_NUM_EVENTS);
 
-	sas_queue_event(event, &phy->port_events_pending,
-			&phy->port_events[event].work, ha);
+	ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
+	if (!ev)
+		return;
+
+	INIT_SAS_WORK(&ev->work, sas_port_event_worker);
+	ev->phy = phy;
+	ev->type = event;
+	sas_queue_event(event, &ev->work, ha);
 }
 
 void sas_notify_phy_event(struct asd_sas_phy *phy, enum phy_event event)
 {
+	struct asd_sas_event *ev;
 	struct sas_ha_struct *ha = phy->ha;
 
 	BUG_ON(event >= PHY_NUM_EVENTS);
 
-	sas_queue_event(event, &phy->phy_events_pending,
-			&phy->phy_events[event].work, ha);
+	ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
+	if (!ev)
+		return;
+
+	INIT_SAS_WORK(&ev->work, sas_phy_event_worker);
+	ev->phy = phy;
+	ev->type = event;
+	sas_queue_event(event, &ev->work, ha);
 }
 
 int sas_init_events(struct sas_ha_struct *sas_ha)
 {
-	static const work_func_t sas_ha_event_fns[HA_NUM_EVENTS] = {
-		[HAE_RESET] = sas_hae_reset,
-	};
-
-	int i;
-
-	for (i = 0; i < HA_NUM_EVENTS; i++) {
-		INIT_SAS_WORK(&sas_ha->ha_events[i].work, sas_ha_event_fns[i]);
-		sas_ha->ha_events[i].ha = sas_ha;
-	}
-
 	sas_ha->notify_ha_event = notify_ha_event;
 	sas_ha->notify_port_event = notify_port_event;
 	sas_ha->notify_phy_event = sas_notify_phy_event;
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 15ef8e2..79f95d0 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -111,10 +111,6 @@ void sas_hash_addr(u8 *hashed, const u8 *sas_addr)
 
 void sas_hae_reset(struct work_struct *work)
 {
-	struct sas_ha_event *ev = to_sas_ha_event(work);
-	struct sas_ha_struct *ha = ev->ha;
-
-	clear_bit(HAE_RESET, &ha->pending);
 }
 
 int sas_register_ha(struct sas_ha_struct *sas_ha)
@@ -375,8 +371,6 @@ void sas_prep_resume_ha(struct sas_ha_struct *ha)
 		struct asd_sas_phy *phy = ha->sas_phy[i];
 
 		memset(phy->attached_sas_addr, 0, SAS_ADDR_SIZE);
-		phy->port_events_pending = 0;
-		phy->phy_events_pending = 0;
 		phy->frame_rcvd_size = 0;
 	}
 }
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index b306b78..33ce7e5 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -97,6 +97,9 @@ void sas_hae_reset(struct work_struct *work);
 
 void sas_free_device(struct kref *kref);
 
+extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
+extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
+
 #ifdef CONFIG_SCSI_SAS_HOST_SMP
 extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
 				struct request *rsp);
diff --git a/drivers/scsi/libsas/sas_phy.c b/drivers/scsi/libsas/sas_phy.c
index cdee446c..7c4576d 100644
--- a/drivers/scsi/libsas/sas_phy.c
+++ b/drivers/scsi/libsas/sas_phy.c
@@ -35,7 +35,6 @@ static void sas_phye_loss_of_signal(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PHYE_LOSS_OF_SIGNAL, &phy->phy_events_pending);
 	phy->error = 0;
 	sas_deform_port(phy, 1);
 }
@@ -45,7 +44,6 @@ static void sas_phye_oob_done(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PHYE_OOB_DONE, &phy->phy_events_pending);
 	phy->error = 0;
 }
 
@@ -58,8 +56,6 @@ static void sas_phye_oob_error(struct work_struct *work)
 	struct sas_internal *i =
 		to_sas_internal(sas_ha->core.shost->transportt);
 
-	clear_bit(PHYE_OOB_ERROR, &phy->phy_events_pending);
-
 	sas_deform_port(phy, 1);
 
 	if (!port && phy->enabled && i->dft->lldd_control_phy) {
@@ -88,8 +84,6 @@ static void sas_phye_spinup_hold(struct work_struct *work)
 	struct sas_internal *i =
 		to_sas_internal(sas_ha->core.shost->transportt);
 
-	clear_bit(PHYE_SPINUP_HOLD, &phy->phy_events_pending);
-
 	phy->error = 0;
 	i->dft->lldd_control_phy(phy, PHY_FUNC_RELEASE_SPINUP_HOLD, NULL);
 }
@@ -99,8 +93,6 @@ static void sas_phye_resume_timeout(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PHYE_RESUME_TIMEOUT, &phy->phy_events_pending);
-
 	/* phew, lldd got the phy back in the nick of time */
 	if (!phy->suspended) {
 		dev_info(&phy->phy->dev, "resume timeout cancelled\n");
@@ -112,46 +104,18 @@ static void sas_phye_resume_timeout(struct work_struct *work)
 	sas_deform_port(phy, 1);
 }
 
-
 /* ---------- Phy class registration ---------- */
 
 int sas_register_phys(struct sas_ha_struct *sas_ha)
 {
 	int i;
 
-	static const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS] = {
-		[PHYE_LOSS_OF_SIGNAL] = sas_phye_loss_of_signal,
-		[PHYE_OOB_DONE] = sas_phye_oob_done,
-		[PHYE_OOB_ERROR] = sas_phye_oob_error,
-		[PHYE_SPINUP_HOLD] = sas_phye_spinup_hold,
-		[PHYE_RESUME_TIMEOUT] = sas_phye_resume_timeout,
-
-	};
-
-	static const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
-		[PORTE_BYTES_DMAED] = sas_porte_bytes_dmaed,
-		[PORTE_BROADCAST_RCVD] = sas_porte_broadcast_rcvd,
-		[PORTE_LINK_RESET_ERR] = sas_porte_link_reset_err,
-		[PORTE_TIMER_EVENT] = sas_porte_timer_event,
-		[PORTE_HARD_RESET] = sas_porte_hard_reset,
-	};
-
 	/* Now register the phys. */
 	for (i = 0; i < sas_ha->num_phys; i++) {
-		int k;
 		struct asd_sas_phy *phy = sas_ha->sas_phy[i];
 
 		phy->error = 0;
 		INIT_LIST_HEAD(&phy->port_phy_el);
-		for (k = 0; k < PORT_NUM_EVENTS; k++) {
-			INIT_SAS_WORK(&phy->port_events[k].work, sas_port_event_fns[k]);
-			phy->port_events[k].phy = phy;
-		}
-
-		for (k = 0; k < PHY_NUM_EVENTS; k++) {
-			INIT_SAS_WORK(&phy->phy_events[k].work, sas_phy_event_fns[k]);
-			phy->phy_events[k].phy = phy;
-		}
 
 		phy->port = NULL;
 		phy->ha = sas_ha;
@@ -179,3 +143,12 @@ int sas_register_phys(struct sas_ha_struct *sas_ha)
 
 	return 0;
 }
+
+const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS] = {
+	[PHYE_LOSS_OF_SIGNAL] = sas_phye_loss_of_signal,
+	[PHYE_OOB_DONE] = sas_phye_oob_done,
+	[PHYE_OOB_ERROR] = sas_phye_oob_error,
+	[PHYE_SPINUP_HOLD] = sas_phye_spinup_hold,
+	[PHYE_RESUME_TIMEOUT] = sas_phye_resume_timeout,
+
+};
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index d3c5297..9326628 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -261,8 +261,6 @@ void sas_porte_bytes_dmaed(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PORTE_BYTES_DMAED, &phy->port_events_pending);
-
 	sas_form_port(phy);
 }
 
@@ -273,8 +271,6 @@ void sas_porte_broadcast_rcvd(struct work_struct *work)
 	unsigned long flags;
 	u32 prim;
 
-	clear_bit(PORTE_BROADCAST_RCVD, &phy->port_events_pending);
-
 	spin_lock_irqsave(&phy->sas_prim_lock, flags);
 	prim = phy->sas_prim;
 	spin_unlock_irqrestore(&phy->sas_prim_lock, flags);
@@ -288,8 +284,6 @@ void sas_porte_link_reset_err(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PORTE_LINK_RESET_ERR, &phy->port_events_pending);
-
 	sas_deform_port(phy, 1);
 }
 
@@ -298,8 +292,6 @@ void sas_porte_timer_event(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PORTE_TIMER_EVENT, &phy->port_events_pending);
-
 	sas_deform_port(phy, 1);
 }
 
@@ -308,8 +300,6 @@ void sas_porte_hard_reset(struct work_struct *work)
 	struct asd_sas_event *ev = to_asd_sas_event(work);
 	struct asd_sas_phy *phy = ev->phy;
 
-	clear_bit(PORTE_HARD_RESET, &phy->port_events_pending);
-
 	sas_deform_port(phy, 1);
 }
 
@@ -353,3 +343,11 @@ void sas_unregister_ports(struct sas_ha_struct *sas_ha)
 			sas_deform_port(sas_ha->sas_phy[i], 0);
 
 }
+
+const work_func_t sas_port_event_fns[PORT_NUM_EVENTS] = {
+	[PORTE_BYTES_DMAED] = sas_porte_bytes_dmaed,
+	[PORTE_BROADCAST_RCVD] = sas_porte_broadcast_rcvd,
+	[PORTE_LINK_RESET_ERR] = sas_porte_link_reset_err,
+	[PORTE_TIMER_EVENT] = sas_porte_timer_event,
+	[PORTE_HARD_RESET] = sas_porte_hard_reset,
+};
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index dae99d7..c4444ad 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -300,6 +300,7 @@ struct asd_sas_port {
 struct asd_sas_event {
 	struct sas_work work;
 	struct asd_sas_phy *phy;
+	int type;
 };
 
 static inline struct asd_sas_event *to_asd_sas_event(struct work_struct *work)
@@ -314,11 +315,6 @@ static inline struct asd_sas_event *to_asd_sas_event(struct work_struct *work)
  */
 struct asd_sas_phy {
 /* private: */
-	struct asd_sas_event   port_events[PORT_NUM_EVENTS];
-	struct asd_sas_event   phy_events[PHY_NUM_EVENTS];
-
-	unsigned long port_events_pending;
-	unsigned long phy_events_pending;
 
 	int error;
 	int suspended;
@@ -365,6 +361,7 @@ struct scsi_core {
 struct sas_ha_event {
 	struct sas_work work;
 	struct sas_ha_struct *ha;
+	int type;
 };
 
 static inline struct sas_ha_event *to_sas_ha_event(struct work_struct *work)
@@ -383,9 +380,6 @@ enum sas_ha_state {
 
 struct sas_ha_struct {
 /* private: */
-	struct sas_ha_event ha_events[HA_NUM_EVENTS];
-	unsigned long	 pending;
-
 	struct list_head  defer_q; /* work queued while draining */
 	struct mutex	  drain_mutex;
 	unsigned long	  state;
-- 
2.5.0


* [PATCH 2/2] libsas: Enhance libsas hotplug
  2017-05-20  6:39 ` Yijing Wang
@ 2017-05-20  6:39   ` Yijing Wang
  -1 siblings, 0 replies; 17+ messages in thread
From: Yijing Wang @ 2017-05-20  6:39 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, john.garry, fangwei1, yanaijie, hch, dan.j.williams,
	Yijing Wang

Libsas completes a hotplug event notified by the LLDD in several works.
For example, when libsas receives a PHYE_LOSS_OF_SIGNAL event, it is
processed in the following steps:

notify_phy_event	[interrupt context]
	sas_queue_event		[queue work on shost->work_q]
		sas_phye_loss_of_signal		[running in shost->work_q]
			sas_deform_port		[remove sas port]
				sas_unregister_dev
					sas_discover_event	[queue destruct work on shost->work_q tail]

In the above case the whole hotplug is completed in two works: the sas port is
removed first, then the destruction of the device is put in another work and
queued at the tail of the workqueue. Since the sas port is the parent of the
child rphy devices, removing the sas port first also deletes the child rphy
devices. When the destruction work finally runs, it finds that the target has
already been removed and reports a sysfs warning calltrace.

queue tail                                             queue head
DISCE_DESTRUCT----> PORTE_BYTES_DMAED event ----->PHYE_LOSS_OF_SIGNAL[running]

There are other hotplug issues in the current framework. In the above case, if a
hot-add sas event is queued between the hot-remove works, the hotplug order is
broken and unexpected issues can happen.
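
The ordering problem described above can be replayed with a small userspace
model. This is only a sketch under stated assumptions: struct fifo, fifo_push(),
fifo_pop() and works_between_remove_and_destruct() are hypothetical names, and
the queue is a plain array standing in for the single-threaded shost->work_q.
The event names are the ones used in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a single-threaded FIFO workqueue. */
enum { MAX_WORKS = 8 };

struct fifo {
	const char *name[MAX_WORKS];
	size_t head, tail;
};

void fifo_push(struct fifo *q, const char *name)
{
	if (q->tail < MAX_WORKS)
		q->name[q->tail++] = name;
}

const char *fifo_pop(struct fifo *q)
{
	return q->head < q->tail ? q->name[q->head++] : NULL;
}

/*
 * Replay the scenario and return how many unrelated works ran between the
 * phy-down handler and the destruct work that the handler queued.
 */
int works_between_remove_and_destruct(void)
{
	struct fifo q = { .head = 0, .tail = 0 };
	const char *w;
	int between = 0, seen_remove = 0;

	fifo_push(&q, "PHYE_LOSS_OF_SIGNAL");
	/* A hot-add event arrives before the phy-down handler has run. */
	fifo_push(&q, "PORTE_BYTES_DMAED");

	while ((w = fifo_pop(&q)) != NULL) {
		if (strcmp(w, "PHYE_LOSS_OF_SIGNAL") == 0) {
			/* Handler removes the port, then defers the destruction. */
			fifo_push(&q, "DISCE_DESTRUCT");
			seen_remove = 1;
		} else if (strcmp(w, "DISCE_DESTRUCT") == 0) {
			break;
		} else if (seen_remove) {
			between++;
		}
	}
	return between;
}
```

In this model the hot-add work is counted as running between the port removal
and the deferred destruction, which is the interleaving shown in the diagram.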

This patch tries to solve these issues in the following steps:
1. Create a new workqueue to run sas event work instead of the scsi host
   workqueue. Sas event work may now block, and it must not block the normal
   scsi works.
2. Create a new workqueue to run sas discovery event work instead of the scsi
   host workqueue. In some cases, e.g. in the revalidate domain event, we may
   unregister a sas device and discover a new one, so the execution must be
   synchronized: wait for the remove process to finish, then start the new
   discovery. To avoid deadlock, the probe and destruct discovery events
   therefore run on a different workqueue from the discovery events that wait
   for them.
3. Introduce an asd_sas_port level wait-complete and a sas_discovery level
   wait-complete. The former makes the processing of a sas event atomic, and
   the latter makes a sas discovery synchronous.
4. Remove disco_mutex from sas_revalidate_domain. Since sas_revalidate_domain
   now waits for the destruct discovery event to finish, there is no need to
   take disco_mutex there.
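
The port level wait-complete in step 3 can be pictured with the following
single-threaded model. It is a sketch, not the patch itself: struct port_model
and the model_*() helpers are hypothetical stand-ins for the busy counter and
completion added to asd_sas_port, and a boolean replaces the real
struct completion.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the busy/completion pair on asd_sas_port. */
struct port_model {
	int busy;        /* chained works still outstanding */
	bool completed;  /* models complete(&port->completion) */
};

/* Models wait_sas_event_init(): reset state before chaining work. */
void model_wait_init(struct port_model *p)
{
	p->busy = 0;
	p->completed = false;
}

/* Models sas_busy_port(): one more chained work is outstanding. */
void model_busy(struct port_model *p)
{
	p->busy++;
}

/* Models the busy half of sas_unbusy_port(); true if the waiter is released. */
bool model_unbusy(struct port_model *p)
{
	if (p->busy > 0 && --p->busy == 0) {
		p->completed = true;
		return true;
	}
	return false;
}

/* The waiter proceeds if nothing was queued or everything has finished. */
bool model_waiter_may_proceed(const struct port_model *p)
{
	return p->busy == 0;
}
```

The point of the counter is that the waiter is released only by the last
chained work, so a handler such as sas_deform_port() does not continue to
sas_port_delete() while destruct work is still outstanding.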

Signed-off-by: Yijing Wang <wangyijing@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c | 58 ++++++++++++++++++++++++++++----------
 drivers/scsi/libsas/sas_event.c    |  2 +-
 drivers/scsi/libsas/sas_expander.c |  9 +++++-
 drivers/scsi/libsas/sas_init.c     | 31 +++++++++++++++++++-
 drivers/scsi/libsas/sas_internal.h | 50 ++++++++++++++++++++++++++++++++
 drivers/scsi/libsas/sas_port.c     |  4 +++
 include/scsi/libsas.h              | 11 +++++++-
 7 files changed, 146 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 60de662..43e8a1e 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct *work)
 	struct domain_device *ddev = port->port_dev;
 
 	/* prevent revalidation from finding sata links in recovery */
-	mutex_lock(&ha->disco_mutex);
 	if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
 		SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
 			    port->id, task_pid_nr(current));
-		goto out;
+		return;
 	}
 
 	clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
@@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct *work)
 
 	SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
 		    port->id, task_pid_nr(current), res);
- out:
-	mutex_unlock(&ha->disco_mutex);
+}
+
+static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
+	[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
+	[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
+	[DISCE_PROBE] = sas_probe_devices,
+	[DISCE_SUSPEND] = sas_suspend_devices,
+	[DISCE_RESUME] = sas_resume_devices,
+	[DISCE_DESTRUCT] = sas_destruct_devices,
+};
+
+/* a simple wrapper for sas discover event functions */
+static void sas_discover_common_fn(struct work_struct *work)
+{
+	struct sas_discovery_event *ev = to_sas_discovery_event(work);
+	struct asd_sas_port *port = ev->port;
+
+	sas_event_fns[ev->type](work);
+	sas_unbusy_port(port);
 }
 
 /* ---------- Events ---------- */
 
 static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
 {
+	int ret;
+	struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
+	struct asd_sas_port *port = ev->port;
+
 	/* chained work is not subject to SA_HA_DRAINING or
 	 * SAS_HA_REGISTERED, because it is either submitted in the
 	 * workqueue, or known to be submitted from a context that is
 	 * not racing against draining
 	 */
-	scsi_queue_work(ha->core.shost, &sw->work);
+	sas_busy_port(port);
+
+	/*
+	 * discovery event probe and destruct would be called in other
+	 * discovery event like discover domain and revalidate domain
+	 * events, in some cases, we need to sync execute probe and destruct
+	 * events, so run discover events except probe/destruct in a new
+	 * workqueue.
+	 */
+	if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
+		ret = scsi_queue_work(ha->core.shost, &sw->work);
+	else
+		ret = queue_work(ha->disc_q, &sw->work);
+
+	if (ret != 1)
+		/* queueing the work failed, unbusy the port before returning */
+		sas_unbusy_port(port);
 }
 
 static void sas_chain_event(int event, unsigned long *pending,
@@ -575,18 +611,10 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 {
 	int i;
 
-	static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
-		[DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
-		[DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
-		[DISCE_PROBE] = sas_probe_devices,
-		[DISCE_SUSPEND] = sas_suspend_devices,
-		[DISCE_RESUME] = sas_resume_devices,
-		[DISCE_DESTRUCT] = sas_destruct_devices,
-	};
-
 	disc->pending = 0;
 	for (i = 0; i < DISC_NUM_EVENTS; i++) {
-		INIT_SAS_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
+		INIT_SAS_WORK(&disc->disc_work[i].work, sas_discover_common_fn);
 		disc->disc_work[i].port = port;
+		disc->disc_work[i].type = i;
 	}
 }
diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
index 06c5c4b..c0fc07d 100644
--- a/drivers/scsi/libsas/sas_event.c
+++ b/drivers/scsi/libsas/sas_event.c
@@ -41,7 +41,7 @@ void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
 		if (list_empty(&sw->drain_node))
 			list_add(&sw->drain_node, &ha->defer_q);
 	} else
-		scsi_queue_work(ha->core.shost, &sw->work);
+		queue_work(ha->event_q, &sw->work);
 }
 
 static void sas_queue_event(int event, struct sas_work *work,
diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index 570b2cb..a8c8ae1 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -822,7 +822,9 @@ static struct domain_device *sas_ex_discover_end_dev(
 
 		list_add_tail(&child->disco_list_node, &parent->port->disco_list);
 
+		wait_discover_event_init(child->port);
 		res = sas_discover_sata(child);
+		wait_for_discover_event_finish(child->port);
 		if (res) {
 			SAS_DPRINTK("sas_discover_sata() for device %16llx at "
 				    "%016llx:0x%x returned 0x%x\n",
@@ -847,7 +849,9 @@ static struct domain_device *sas_ex_discover_end_dev(
 
 		list_add_tail(&child->disco_list_node, &parent->port->disco_list);
 
+		wait_discover_event_init(child->port);
 		res = sas_discover_end_dev(child);
+		wait_for_discover_event_finish(child->port);
 		if (res) {
 			SAS_DPRINTK("sas_discover_end_dev() for device %16llx "
 				    "at %016llx:0x%x returned 0x%x\n",
@@ -1890,8 +1894,11 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 				if (child->dev_type == SAS_EDGE_EXPANDER_DEVICE ||
 				    child->dev_type == SAS_FANOUT_EXPANDER_DEVICE)
 					sas_unregister_ex_tree(parent->port, child);
-				else
+				else {
+					wait_discover_event_init(parent->port);
 					sas_unregister_dev(parent->port, child);
+					wait_for_discover_event_finish(parent->port);
+				}
 				found = child;
 				break;
 			}
diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
index 79f95d0..1c49483 100644
--- a/drivers/scsi/libsas/sas_init.c
+++ b/drivers/scsi/libsas/sas_init.c
@@ -38,6 +38,8 @@
 
 #include "../scsi_sas_internal.h"
 
+static DEFINE_IDA(sas_ida);
+
 static struct kmem_cache *sas_task_cache;
 
 struct sas_task *sas_alloc_task(gfp_t flags)
@@ -116,6 +118,7 @@ void sas_hae_reset(struct work_struct *work)
 int sas_register_ha(struct sas_ha_struct *sas_ha)
 {
 	int error = 0;
+	char name[64];
 
 	mutex_init(&sas_ha->disco_mutex);
 	spin_lock_init(&sas_ha->phy_port_lock);
@@ -146,6 +149,30 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 		goto Undo_ports;
 	}
 
+	sas_ha->id = ida_simple_get(&sas_ida, 0, 0, GFP_KERNEL);
+	if(sas_ha->id < 0)
+		goto Undo_ports;
+
+	memset(name, 0, 64);
+	snprintf(name, 64, "sas-event-%d", sas_ha->id);
+	sas_ha->event_q = create_singlethread_workqueue(name);
+
+	/*
+	 * sas-disc-xx workqueue run the discover work except
+	 * probe and destruct.
+	 */
+	snprintf(name, 64, "sas-disc-%d", sas_ha->id);
+	sas_ha->disc_q = create_singlethread_workqueue(name);
+	if(!sas_ha->event_q || !sas_ha->disc_q) {
+		ida_simple_remove(&sas_ida, sas_ha->id);
+		if (sas_ha->event_q)
+			destroy_workqueue(sas_ha->event_q);
+		if (sas_ha->disc_q)
+			destroy_workqueue(sas_ha->disc_q);
+		goto Undo_ports;
+	}
+
+
 	INIT_LIST_HEAD(&sas_ha->eh_done_q);
 	INIT_LIST_HEAD(&sas_ha->eh_ata_q);
 
@@ -181,6 +208,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 	__sas_drain_work(sas_ha);
 	mutex_unlock(&sas_ha->drain_mutex);
 
+	destroy_workqueue(sas_ha->event_q);
+	destroy_workqueue(sas_ha->disc_q);
+	ida_simple_remove(&sas_ida, sas_ha->id);
 	return 0;
 }
 
@@ -568,7 +598,6 @@ void sas_domain_release_transport(struct scsi_transport_template *stt)
 EXPORT_SYMBOL_GPL(sas_domain_release_transport);
 
 /* ---------- SAS Class register/unregister ---------- */
-
 static int __init sas_class_init(void)
 {
 	sas_task_cache = KMEM_CACHE(sas_task, SLAB_HWCACHE_ALIGN);
diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
index 33ce7e5..276df8e 100644
--- a/drivers/scsi/libsas/sas_internal.h
+++ b/drivers/scsi/libsas/sas_internal.h
@@ -100,6 +100,56 @@ void sas_free_device(struct kref *kref);
 extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
 extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
 
+static inline void wait_discover_event_init(struct asd_sas_port *port)
+{
+	if (port) {
+		init_completion(&port->disc.completion);
+		port->disc.wait = 1;
+	}
+}
+
+static inline void wait_for_discover_event_finish(
+		struct asd_sas_port *port)
+{
+	if (port && port->disc.wait == 1)
+		wait_for_completion(&port->disc.completion);
+}
+
+static inline void wait_sas_event_init(struct asd_sas_port *port)
+{
+	if (port) {
+		init_completion(&port->completion);
+		port->busy = 0;
+	}
+}
+
+static inline void wait_for_sas_event_finish(
+		struct asd_sas_port *port)
+{
+	if (port && port->busy)
+		wait_for_completion(&port->completion);
+}
+
+static inline void sas_busy_port(struct asd_sas_port *port)
+{
+	if (port)
+		port->busy++;
+}
+
+static inline void sas_unbusy_port(struct asd_sas_port *port)
+{
+	if (port && (port->busy > 0)) {
+		port->busy--;
+		if (!port->busy)
+			complete(&port->completion);
+	}
+
+	if (port && (port->disc.wait == 1)) {
+		complete(&port->disc.completion);
+		port->disc.wait = 0;
+	}
+}
+
 #ifdef CONFIG_SCSI_SAS_HOST_SMP
 extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
 				struct request *rsp);
diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
index 9326628..8d8b38c 100644
--- a/drivers/scsi/libsas/sas_port.c
+++ b/drivers/scsi/libsas/sas_port.c
@@ -191,7 +191,9 @@ static void sas_form_port(struct asd_sas_phy *phy)
 	if (si->dft->lldd_port_formed)
 		si->dft->lldd_port_formed(phy);
 
+	wait_sas_event_init(port);
 	sas_discover_event(phy->port, DISCE_DISCOVER_DOMAIN);
+	wait_for_sas_event_finish(port);
 }
 
 /**
@@ -218,7 +220,9 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 		dev->pathways--;
 
 	if (port->num_phys == 1) {
+		wait_sas_event_init(port);
 		sas_unregister_domain_devices(port, gone);
+		wait_for_sas_event_finish(port);
 		sas_port_delete(port->port);
 		port->port = NULL;
 	} else {
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index c4444ad..4b931d4 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -240,6 +240,9 @@ static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct work_str
 struct sas_discovery_event {
 	struct sas_work work;
 	struct asd_sas_port *port;
+	enum discover_event	type;
+	int wait;
+	struct completion completion;
 };
 
 static inline struct sas_discovery_event *to_sas_discovery_event(struct work_struct *work)
@@ -256,6 +259,8 @@ struct sas_discovery {
 	u8     eeds_a[8];
 	u8     eeds_b[8];
 	int    max_level;
+	int    wait;
+	struct completion completion;
 };
 
 /* The port struct is Class:RW, driver:RO */
@@ -276,7 +281,8 @@ struct asd_sas_port {
 
 /* public: */
 	int id;
-
+	int busy;
+	struct completion completion;
 	enum sas_class   class;
 	u8               sas_addr[SAS_ADDR_SIZE];
 	u8               attached_sas_addr[SAS_ADDR_SIZE];
@@ -387,6 +393,7 @@ struct sas_ha_struct {
 	int		  eh_active;
 	wait_queue_head_t eh_wait_q;
 	struct list_head  eh_dev_q;
+	int       id; /* for create workqueue */
 
 	struct mutex disco_mutex;
 
@@ -396,6 +403,8 @@ struct sas_ha_struct {
 	char *sas_ha_name;
 	struct device *dev;	  /* should be set */
 	struct module *lldd_module; /* should be set */
+	struct workqueue_struct	*event_q;
+	struct workqueue_struct	*disc_q;
 
 	u8 *sas_addr;		  /* must be set */
 	u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] libsas: Don't process sas events in static works
  2017-05-20  6:39   ` Yijing Wang
  (?)
@ 2017-05-21  3:44   ` Dan Williams
  2017-05-22  5:54       ` wangyijing
  -1 siblings, 1 reply; 17+ messages in thread
From: Dan Williams @ 2017-05-21  3:44 UTC (permalink / raw)
  To: Yijing Wang
  Cc: James E.J. Bottomley, Martin K. Petersen, chenqilin2, hare,
	linux-scsi, linux-kernel, chenxiang66, huangdaode,
	wangkefeng.wang, zhaohongjiang, dingtianhong, guohanjun,
	John Garry, Wei Fang, yanaijie, Christoph Hellwig, Yousong He

On Fri, May 19, 2017 at 11:39 PM, Yijing Wang <wangyijing@huawei.com> wrote:
> Now libsas hotplug work is static, LLDD driver queue
> the hotplug work into shost->work_q. If LLDD driver
> burst post lots hotplug events to libsas, the hotplug
> events may pending in the workqueue like
>
> shost->work_q
> new work[PORTE_BYTES_DMAED] --> |[PHYE_LOSS_OF_SIGNAL][PORTE_BYTES_DMAED] -> processing
>                                 |<-------wait worker to process-------->|
> In this case, a new PORTE_BYTES_DMAED event coming, libsas try to queue it
> to shost->work_q, but this work is already pending, so it would be lost.
> Finally, libsas delete the related sas port and sas devices, but LLDD driver
> expect libsas add the sas port and devices(last sas event).
>
> This patch remove the static defined hotplug work, and use dynamic work to
> avoid missing hotplug events.

If we go this route we don't even need:

sas_port_event_fns
sas_phy_event_fns
sas_ha_event_fns

...just specify the target routine directly to INIT_WORK() and remove
the indirection.

I also think for safety this should use a mempool that guarantees that
events can continue to be processed under system memory pressure.
Also, have you considered the case when a broken phy starts throwing a
constant stream of events? Is there a point at which libsas should
stop queuing events and disable the phy?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] libsas: Don't process sas events in static works
  2017-05-21  3:44   ` Dan Williams
@ 2017-05-22  5:54       ` wangyijing
  0 siblings, 0 replies; 17+ messages in thread
From: wangyijing @ 2017-05-22  5:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: James E.J. Bottomley, Martin K. Petersen, chenqilin2, hare,
	linux-scsi, linux-kernel, chenxiang66, huangdaode,
	wangkefeng.wang, zhaohongjiang, dingtianhong, guohanjun,
	John Garry, Wei Fang, yanaijie, Christoph Hellwig, Yousong He

Hi Dan, thanks for your review and comments!

On 2017/5/21 11:44, Dan Williams wrote:
> On Fri, May 19, 2017 at 11:39 PM, Yijing Wang <wangyijing@huawei.com> wrote:
>> Currently the libsas hotplug work items are static, and the LLDD
>> queues them on shost->work_q. If the LLDD posts a burst of hotplug
>> events to libsas, the events may be pending in the workqueue like
>>
>> shost->work_q
>> new work[PORTE_BYTES_DMAED] --> |[PHYE_LOSS_OF_SIGNAL][PORTE_BYTES_DMAED] -> processing
>>                                 |<-------wait worker to process-------->|
>> In this case, when a new PORTE_BYTES_DMAED event arrives, libsas tries to queue it
>> on shost->work_q, but this work is already pending, so the event is lost.
>> In the end, libsas deletes the related sas port and sas devices, but the LLDD
>> expects libsas to add the sas port and devices (the last sas event).
>>
>> This patch removes the statically defined hotplug work and uses dynamically
>> allocated work to avoid missing hotplug events.
> 
> If we go this route we don't even need:
> 
> sas_port_event_fns
> sas_phy_event_fns
> sas_ha_event_fns

Yes, these three fns arrays are not strictly necessary; they are only there to avoid a lot of kfree() calls in the phy/port/ha event functions.

> 
> ...just specify the target routine directly to INIT_WORK() and remove
> the indirection.
> 
> I also think for safety this should use a mempool that guarantees that
> events can continue to be processed under system memory pressure.

What I am worried about is that it would still fail if the mempool is exhausted under memory pressure.
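
To make both points concrete, here is a minimal user-space sketch of a reserve pool. It is not the kernel mempool API: `struct event_pool`, `RESERVE_NR` and the `simulate_oom` hook are all hypothetical. Allocation tries the normal allocator first and falls back to a preallocated reserve, which keeps events flowing under memory pressure but still returns NULL once the reserve itself is drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_NR 4	/* hypothetical reserve size */

struct event_pool {
	void *reserve[RESERVE_NR];
	int nr_free;		/* elements currently held in reserve */
	size_t elem_size;
	bool simulate_oom;	/* test hook: pretend malloc() fails */
};

/* Preallocate the reserve up front, while memory is still available. */
static int event_pool_init(struct event_pool *p, size_t elem_size)
{
	p->nr_free = 0;
	p->elem_size = elem_size;
	p->simulate_oom = false;
	while (p->nr_free < RESERVE_NR) {
		void *e = malloc(elem_size);

		if (!e) {
			while (p->nr_free > 0)
				free(p->reserve[--p->nr_free]);
			return -1;
		}
		p->reserve[p->nr_free++] = e;
	}
	return 0;
}

/* Normal allocator first, reserve second, NULL when both are empty. */
static void *event_pool_alloc(struct event_pool *p)
{
	void *e = p->simulate_oom ? NULL : malloc(p->elem_size);

	if (e)
		return e;
	if (p->nr_free > 0)
		return p->reserve[--p->nr_free];
	return NULL;
}

/* Refill the reserve before handing memory back to the allocator. */
static void event_pool_free(struct event_pool *p, void *e)
{
	assert(e != NULL);
	if (p->nr_free < RESERVE_NR)
		p->reserve[p->nr_free++] = e;
	else
		free(e);
}
```

In this model the pool only helps if events are freed back at least as fast as new ones arrive, which ties into the question about a phy that generates events without bound.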

> Also, have you considered the case when a broken phy starts throwing a
> constant stream of events? Is there a point at which libsas should
> stop queuing events and disable the phy?

Not yet. I have not seen this issue in a real case, but I agree it is a real problem with some broken
hardware. I don't think it is an easy problem; we could improve it step by step.

Thanks!
Yijing.

> 
> .
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] libsas: Don't process sas events in static works
  2017-05-22  5:54       ` wangyijing
@ 2017-05-22  9:28         ` John Garry
  -1 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2017-05-22  9:28 UTC (permalink / raw)
  To: wangyijing, Dan Williams
  Cc: James E.J. Bottomley, Martin K. Petersen, chenqilin2, hare,
	linux-scsi, linux-kernel, chenxiang66, huangdaode,
	wangkefeng.wang, zhaohongjiang, dingtianhong, guohanjun,
	Wei Fang, yanaijie, Christoph Hellwig, Yousong He, Linuxarm

On 22/05/2017 06:54, wangyijing wrote:
>> I also think for safety this should use a mempool that guarantees that
>> events can continue to be processed under system memory pressure.
> What I am worried about is that it would still fail if the mempool is exhausted under memory pressure.
>
>> Also, have you considered the case when a broken phy starts throwing a
>> constant stream of events? Is there a point at which libsas should
>> stop queuing events and disable the phy?
> Not yet. I have not seen this issue in a real case, but I agree it is a real problem with some broken
> hardware. I don't think it is an easy problem; we could improve it step by step.
>
> Thanks!
> Yijing.
>

I have seen this scenario on our development board when we have a bad 
physical cable connection - the PHY continually goes up and down in a loop.

So, in this regard, it is worth safeguarding against this scenario.

John

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] libsas: Don't process sas events in static works
  2017-05-22  9:28         ` John Garry
@ 2017-05-23  6:39           ` wangyijing
  -1 siblings, 0 replies; 17+ messages in thread
From: wangyijing @ 2017-05-23  6:39 UTC (permalink / raw)
  To: John Garry, Dan Williams
  Cc: James E.J. Bottomley, Martin K. Petersen, chenqilin2, hare,
	linux-scsi, linux-kernel, chenxiang66, huangdaode,
	wangkefeng.wang, zhaohongjiang, dingtianhong, guohanjun,
	Wei Fang, yanaijie, Christoph Hellwig, Yousong He, Linuxarm

>>
> 
> I have seen this scenario on our development board when we have a bad physical cable connection - the PHY continually goes up and down in a loop.
> 
> So, in this regard, it is worth safeguarding against this scenario.

OK, I will reconsider this case.

Thanks!
Yijing.


> 
> John
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] libsas: Enhance libsas hotplug
  2017-05-20  6:39   ` Yijing Wang
@ 2017-05-25  9:04     ` John Garry
  -1 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2017-05-25  9:04 UTC (permalink / raw)
  To: Yijing Wang, jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, fangwei1, yanaijie, hch, dan.j.williams,
	Johannes Thumshirn

Hi,

There are some comments, inline.

In general, if it works, it looks ok.

Other reviews would be greatly appreciated - Hannes, Christoph, 
Johannes, Dan - please.

 > libsas completes a hotplug event notified by the LLDD in several works.
 > For example, if libsas receives a PHYE_LOSS_OF_SIGNAL, it is processed
 > in the following steps:
 >
 > notify_phy_event    [interrupt context]
 >     sas_queue_event        [queue work on shost->work_q]
 >         sas_phye_loss_of_signal        [running in shost->work_q]
 >             sas_deform_port        [remove sas port]
 >                 sas_unregister_dev
 >                     sas_discover_event    [queue destruct work on shost->work_q tail]
 >
 > In the above case, the whole hotplug is completed in two works: the sas port
 > is removed first, then the destruction of the device is put in another work
 > and queued at the tail of the workqueue. Since the sas port is the parent of
 > the child rphy devices, removing the sas port first also deletes the child
 > rphy devices. When the destruction work runs, it finds that the target has
 > already been removed and reports a sysfs warning calltrace.
 >
 > queue tail                                             queue head
 > DISCE_DESTRUCT----> PORTE_BYTES_DMAED event ----->PHYE_LOSS_OF_SIGNAL[running]
 >
 > There are other hotplug issues in the current framework. In the above case,
 > if a hot-add sas event is queued between the hot-remove works, the hotplug
 > order is broken and unexpected issues can happen.
 >
 > In this patch, we try to solve these issues in the following steps:
 > 1. Create a new workqueue to run sas event work, instead of the scsi host
 >    workqueue, because we may block sas event work and we cannot block the
 >    normal scsi works.

What do we block the event work for?

 > 2. Create a new workqueue to run sas discovery event work, instead of the
 >    scsi host workqueue, because in some cases, e.g. in a revalidate domain
 >    event, we may unregister a sas device and discover a new one. We must
 >    sync the execution: wait for the remove process to finish, then start a
 >    new discovery. So we must put the probe and destruct discovery events
 >    in a new workqueue to avoid deadlock.
 > 3. Introduce an asd_sas_port level wait-complete and a sas_discovery level
 >    wait-complete. We use the former to process a sas event atomically and
 >    the latter to make a sas discovery synchronous.
 > 4. Remove disco_mutex in sas_revalidate_domain. Since sas_revalidate_domain
 >    now syncs the destruct discovery event execution, there is no need to
 >    lock disco_mutex there.
 >
 > Signed-off-by: Yijing Wang <wangyijing@huawei.com>
 > ---
 >  drivers/scsi/libsas/sas_discover.c | 58 ++++++++++++++++++++++++++++----------
 >  drivers/scsi/libsas/sas_event.c    |  2 +-
 >  drivers/scsi/libsas/sas_expander.c |  9 +++++-
 >  drivers/scsi/libsas/sas_init.c     | 31 +++++++++++++++++++-
 >  drivers/scsi/libsas/sas_internal.h | 50 ++++++++++++++++++++++++++++++++
 >  drivers/scsi/libsas/sas_port.c     |  4 +++
 >  include/scsi/libsas.h              | 11 +++++++-
 >  7 files changed, 146 insertions(+), 19 deletions(-)
 >
 > diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
 > index 60de662..43e8a1e 100644
 > --- a/drivers/scsi/libsas/sas_discover.c
 > +++ b/drivers/scsi/libsas/sas_discover.c
 > @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct *work)
 >      struct domain_device *ddev = port->port_dev;
 >
 >      /* prevent revalidation from finding sata links in recovery */
 > -    mutex_lock(&ha->disco_mutex);
 >      if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
 >          SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
 >                  port->id, task_pid_nr(current));
 > -        goto out;
 > +        return;
 >      }
 >
 >      clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
 > @@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct *work)
 >
 >      SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
 >              port->id, task_pid_nr(current), res);
 > - out:
 > -    mutex_unlock(&ha->disco_mutex);
 > +}
 > +
 > +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
 > +    [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 > +    [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 > +    [DISCE_PROBE] = sas_probe_devices,
 > +    [DISCE_SUSPEND] = sas_suspend_devices,
 > +    [DISCE_RESUME] = sas_resume_devices,
 > +    [DISCE_DESTRUCT] = sas_destruct_devices,
 > +};
 > +
 > +/* a simple wrapper for sas discover event funtions */
 > +static void sas_discover_common_fn(struct work_struct *work)
 > +{
 > +    struct sas_discovery_event *ev = to_sas_discovery_event(work);
 > +    struct asd_sas_port *port = ev->port;
 > +
 > +    sas_event_fns[ev->type](work);
 > +    sas_unbusy_port(port);
 >  }
 >
 >  /* ---------- Events ---------- */
 >
 >  static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
 >  {
 > +    int ret;
 > +    struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
 > +    struct asd_sas_port *port = ev->port;
 > +
 >      /* chained work is not subject to SA_HA_DRAINING or
 >       * SAS_HA_REGISTERED, because it is either submitted in the
 >       * workqueue, or known to be submitted from a context that is
 >       * not racing against draining
 >       */

Is this comment still valid (even if you have not touched the drain 
logic work)?

 > -    scsi_queue_work(ha->core.shost, &sw->work);
 > +    sas_busy_port(port);
 > +
 > +    /*
 > +     * discovery event probe and destruct would be called in other
 > +     * discovery event like discover domain and revalidate domain
 > +     * events, in some cases, we need to sync execute probe and destruct
 > +     * events, so run discover events except probe/destruct in a new
 > +     * workqueue.
 > +     */
 > +    if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
 > +        ret = scsi_queue_work(ha->core.shost, &sw->work);
 > +    else
 > +        ret = queue_work(ha->disc_q, &sw->work);
 > +
 > +    if (ret != 1)
 > +        /* queue a work fail, unbusy the ha before return */
 > +        sas_unbusy_port(port);

Do we really need to check for this error case, since we have dynamic 
work structs (I think queue_work only fails if we try requeuing a work 
item)?

 >  }
 >
 >  static void sas_chain_event(int event, unsigned long *pending,
 > @@ -575,18 +611,10 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 >  {
 >      int i;
 >
 > -    static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
 > -        [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 > -        [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 > -        [DISCE_PROBE] = sas_probe_devices,
 > -        [DISCE_SUSPEND] = sas_suspend_devices,
 > -        [DISCE_RESUME] = sas_resume_devices,
 > -        [DISCE_DESTRUCT] = sas_destruct_devices,
 > -    };
 > -
 >      disc->pending = 0;
 >      for (i = 0; i < DISC_NUM_EVENTS; i++) {
 > -        INIT_SAS_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
 > +        INIT_SAS_WORK(&disc->disc_work[i].work, sas_discover_common_fn);
 >          disc->disc_work[i].port = port;
 > +        disc->disc_work[i].type = i;
 >      }
 >  }
 > diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
 > index 06c5c4b..c0fc07d 100644
 > --- a/drivers/scsi/libsas/sas_event.c
 > +++ b/drivers/scsi/libsas/sas_event.c
 > @@ -41,7 +41,7 @@ void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
 >          if (list_empty(&sw->drain_node))
 >              list_add(&sw->drain_node, &ha->defer_q);
 >      } else
 > -        scsi_queue_work(ha->core.shost, &sw->work);
 > +        queue_work(ha->event_q, &sw->work);
 >  }
 >
 >  static void sas_queue_event(int event, struct sas_work *work,
 > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
 > index 570b2cb..a8c8ae1 100644
 > --- a/drivers/scsi/libsas/sas_expander.c
 > +++ b/drivers/scsi/libsas/sas_expander.c
 > @@ -822,7 +822,9 @@ static struct domain_device *sas_ex_discover_end_dev(
 >
 >          list_add_tail(&child->disco_list_node, &parent->port->disco_list);
 >
 > +        wait_discover_event_init(child->port);
 >          res = sas_discover_sata(child);
 > +        wait_for_discover_event_finish(child->port);
 >          if (res) {
 >              SAS_DPRINTK("sas_discover_sata() for device %16llx at "
 >                      "%016llx:0x%x returned 0x%x\n",
 > @@ -847,7 +849,9 @@ static struct domain_device *sas_ex_discover_end_dev(
 >
 >          list_add_tail(&child->disco_list_node, &parent->port->disco_list);
 >
 > +        wait_discover_event_init(child->port);
 >          res = sas_discover_end_dev(child);

In sas_discover_end_dev(), we may return before queuing the event (if the
LLDD's notify-dev-found callback returns an error); please take care of this.

 > +        wait_for_discover_event_finish(child->port);
 >          if (res) {
 >              SAS_DPRINTK("sas_discover_end_dev() for device %16llx "
 >                      "at %016llx:0x%x returned 0x%x\n",
 > @@ -1890,8 +1894,11 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 >                  if (child->dev_type == SAS_EDGE_EXPANDER_DEVICE ||
 >                      child->dev_type == SAS_FANOUT_EXPANDER_DEVICE)
 >                      sas_unregister_ex_tree(parent->port, child);
 > -                else
 > +                else {
 > +                    wait_discover_event_init(parent->port);
 >                      sas_unregister_dev(parent->port, child);
 > +                    wait_for_discover_event_finish(parent->port);
 > +                }
 >                  found = child;
 >                  break;
 >              }
 > diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
 > index 79f95d0..1c49483 100644
 > --- a/drivers/scsi/libsas/sas_init.c
 > +++ b/drivers/scsi/libsas/sas_init.c
 > @@ -38,6 +38,8 @@
 >
 >  #include "../scsi_sas_internal.h"
 >
 > +static DEFINE_IDA(sas_ida);
 > +
 >  static struct kmem_cache *sas_task_cache;
 >
 >  struct sas_task *sas_alloc_task(gfp_t flags)
 > @@ -116,6 +118,7 @@ void sas_hae_reset(struct work_struct *work)
 >  int sas_register_ha(struct sas_ha_struct *sas_ha)
 >  {
 >      int error = 0;
 > +    char name[64];
 >
 >      mutex_init(&sas_ha->disco_mutex);
 >      spin_lock_init(&sas_ha->phy_port_lock);
 > @@ -146,6 +149,30 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 >          goto Undo_ports;
 >      }
 >
 > +    sas_ha->id = ida_simple_get(&sas_ida, 0, 0, GFP_KERNEL);
 > +    if(sas_ha->id < 0)
 > +        goto Undo_ports;
 > +
 > +    memset(name, 0, 64);

Why memset and then snprintf?

 > +    snprintf(name, 64, "sas-event-%d", sas_ha->id);

Can you just use the unique dev_name(sas_ha->dev) to help form this name, so
that you don't have to introduce an IDA?

 > +    sas_ha->event_q = create_singlethread_workqueue(name);
 > +
 > +    /*
 > +     * sas-disc-xx workqueue run the discover work except
 > +     * probe and destruct.
 > +     */
 > +    snprintf(name, 64, "sas-disc-%d", sas_ha->id);
 > +    sas_ha->disc_q = create_singlethread_workqueue(name);
 > +    if(!sas_ha->event_q || !sas_ha->disc_q) {
 > +        ida_simple_remove(&sas_ida, sas_ha->id);
 > +        if (sas_ha->event_q)
 > +            destroy_workqueue(sas_ha->event_q);
 > +        if (sas_ha->disc_q)
 > +            destroy_workqueue(sas_ha->disc_q);

Can this error handling be a bit more concise?
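
One common way to make this kind of cleanup shorter is goto-based unwinding in reverse order of acquisition. The sketch below is user-space only: `acquire()`, `release()` and `struct ha_sketch` are hypothetical stand-ins for the id allocation and the two workqueue creations, and `release()` is assumed, like the calls it stands in for, not to accept NULL.

```c
#include <assert.h>
#include <stdlib.h>

static int live_resources;	/* bookkeeping so leaks are observable */
static int fail_on_call;	/* test hook: make the Nth acquire fail, 0 = never */
static int acquire_calls;

static void *acquire(void)
{
	void *r;

	if (++acquire_calls == fail_on_call)
		return NULL;
	r = malloc(1);
	if (r)
		live_resources++;
	return r;
}

static void release(void *r)
{
	assert(r != NULL);	/* assumed not NULL-safe, so callers must not pass NULL */
	free(r);
	live_resources--;
}

struct ha_sketch {
	void *id;
	void *event_q;
	void *disc_q;
};

/* Each failure jumps to the label that undoes exactly what succeeded so far. */
static int ha_setup(struct ha_sketch *ha)
{
	ha->id = acquire();
	if (!ha->id)
		return -1;

	ha->event_q = acquire();
	if (!ha->event_q)
		goto err_id;

	ha->disc_q = acquire();
	if (!ha->disc_q)
		goto err_event_q;

	return 0;

err_event_q:
	release(ha->event_q);
err_id:
	release(ha->id);
	return -1;
}
```

Checking each creation immediately removes the need for the NULL tests before each destroy call.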

 > +        goto Undo_ports;
 > +    }
 > +
 > +
 >      INIT_LIST_HEAD(&sas_ha->eh_done_q);
 >      INIT_LIST_HEAD(&sas_ha->eh_ata_q);
 >
 > @@ -181,6 +208,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 >      __sas_drain_work(sas_ha);
 >      mutex_unlock(&sas_ha->drain_mutex);
 >
 > +    destroy_workqueue(sas_ha->event_q);
 > +    destroy_workqueue(sas_ha->disc_q);
 > +    ida_simple_remove(&sas_ida, sas_ha->id);
 >      return 0;
 >  }
 >
 > @@ -568,7 +598,6 @@ void sas_domain_release_transport(struct scsi_transport_template *stt)
 >  EXPORT_SYMBOL_GPL(sas_domain_release_transport);
 >
 >  /* ---------- SAS Class register/unregister ---------- */
 > -
 >  static int __init sas_class_init(void)
 >  {
 >      sas_task_cache = KMEM_CACHE(sas_task, SLAB_HWCACHE_ALIGN);
 > diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
 > index 33ce7e5..276df8e 100644
 > --- a/drivers/scsi/libsas/sas_internal.h
 > +++ b/drivers/scsi/libsas/sas_internal.h
 > @@ -100,6 +100,56 @@ void sas_free_device(struct kref *kref);
 >  extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
 >  extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
 >
 > +static inline void wait_discover_event_init(struct asd_sas_port *port)

You need to change function names to have "sas" prefix. Actually these 
functions are all a bit messy.

 > +{
 > +    if (port) {

These init and wait functions currently act as bookend wrappers. I
think it may be better to put them in the wrapped function (if
possible), as:
a. we probably then don't need the port NULL check
b. it handles situations where the event is possibly not queued, like the
suspected sas_discover_end_dev() case

 > +        init_completion(&port->disc.completion);
 > +        port->disc.wait = 1;
 > +    }
 > +}
 > +
 > +static inline void wait_for_discover_event_finish(
 > +        struct asd_sas_port *port)
 > +{
 > +    if (port && port->disc.wait == 1)

Can you just use completion_done() instead of introducing another 
variable in discovery_event.wait?

 > +        wait_for_completion(&port->disc.completion);
 > +}
 > +
 > +static inline void wait_sas_event_init(struct asd_sas_port *port)
 > +{
 > +    if (port) {
 > +        init_completion(&port->completion);
 > +        port->busy = 0;
 > +    }
 > +}
 > +
 > +static inline void wait_for_sas_event_finish(
 > +        struct asd_sas_port *port)
 > +{
 > +    if (port && port->busy)
 > +        wait_for_completion(&port->completion);
 > +}
 > +
 > +static inline void sas_busy_port(struct asd_sas_port *port)
 > +{
 > +    if (port)
 > +        port->busy++;

Why not use kref?
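
A kref starts at one and is aimed at object lifetime, so the sketch below uses a bare atomic counter instead; that substitution is an assumption about what this comment is after. It is a user-space model with hypothetical names (`struct port_busy`, `port_busy_get()`, `port_busy_put()`), where the last put records an "idle" signal in place of complete(&port->completion).

```c
#include <assert.h>
#include <stdatomic.h>

struct port_busy {
	atomic_int refs;		/* outstanding discovery works */
	atomic_int idle_signals;	/* stands in for complete() calls */
};

static void port_busy_init(struct port_busy *b)
{
	atomic_init(&b->refs, 0);
	atomic_init(&b->idle_signals, 0);
}

/* Taken when a discovery work is queued. */
static void port_busy_get(struct port_busy *b)
{
	atomic_fetch_add(&b->refs, 1);
}

/* Dropped when the work finishes; the last put signals the waiter. */
static void port_busy_put(struct port_busy *b)
{
	int old = atomic_fetch_sub(&b->refs, 1);

	assert(old > 0);	/* catches an unbalanced put */
	if (old == 1)
		atomic_fetch_add(&b->idle_signals, 1);
}
```

Compared with a plain int, the atomic read-modify-write avoids lost updates if get and put can run concurrently on different CPUs.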

 > +}
 > +
 > +static inline void sas_unbusy_port(struct asd_sas_port *port)
 > +{
 > +    if (port && (port->busy > 0)) {
 > +        port->busy--;
 > +        if (!port->busy)
 > +            complete(&port->completion);
 > +    }
 > +
 > +    if (port && (port->disc.wait == 1)) {

Why check port twice?

 > +        complete(&port->disc.completion);
 > +        port->disc.wait = 0;
 > +    }
 > +}
 > +
 >  #ifdef CONFIG_SCSI_SAS_HOST_SMP
 >  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
 >                  struct request *rsp);
 > diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
 > index 9326628..8d8b38c 100644
 > --- a/drivers/scsi/libsas/sas_port.c
 > +++ b/drivers/scsi/libsas/sas_port.c
 > @@ -191,7 +191,9 @@ static void sas_form_port(struct asd_sas_phy *phy)
 >      if (si->dft->lldd_port_formed)
 >          si->dft->lldd_port_formed(phy);
 >
 > +    wait_sas_event_init(port);
 >      sas_discover_event(phy->port, DISCE_DISCOVER_DOMAIN);
 > +    wait_for_sas_event_finish(port);

Is it neater to put these calls inside sas_discover_event()?

 >  }
 >
 >  /**
 > @@ -218,7 +220,9 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 >          dev->pathways--;
 >
 >      if (port->num_phys == 1) {
 > +        wait_sas_event_init(port);
 >          sas_unregister_domain_devices(port, gone);
 > +        wait_for_sas_event_finish(port);
 >          sas_port_delete(port->port);
 >          port->port = NULL;
 >      } else {
 > diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
 > index c4444ad..4b931d4 100644
 > --- a/include/scsi/libsas.h
 > +++ b/include/scsi/libsas.h
 > @@ -240,6 +240,9 @@ static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct work_str
 >  struct sas_discovery_event {
 >      struct sas_work work;
 >      struct asd_sas_port *port;
 > +    enum discover_event    type;
 > +    int wait;
 > +    struct completion completion;
 >  };
 >
 >  static inline struct sas_discovery_event *to_sas_discovery_event(struct work_struct *work)
 > @@ -256,6 +259,8 @@ struct sas_discovery {
 >      u8     eeds_a[8];
 >      u8     eeds_b[8];
 >      int    max_level;
 > +    int    wait;
 > +    struct completion completion;

Again, does completion_done() do the same job as wait element?

 >  };
 >
 >  /* The port struct is Class:RW, driver:RO */
 > @@ -276,7 +281,8 @@ struct asd_sas_port {
 >
 >  /* public: */
 >      int id;
 > -
 > +    int busy;
 > +    struct completion completion;

I think public means the LLDD can access these fields, which is not the case here.

 >      enum sas_class   class;
 >      u8               sas_addr[SAS_ADDR_SIZE];
 >      u8               attached_sas_addr[SAS_ADDR_SIZE];
 > @@ -387,6 +393,7 @@ struct sas_ha_struct {
 >      int          eh_active;
 >      wait_queue_head_t eh_wait_q;
 >      struct list_head  eh_dev_q;
 > +    int       id; /* for create workqueue */
 >
 >      struct mutex disco_mutex;
 >
 > @@ -396,6 +403,8 @@ struct sas_ha_struct {
 >      char *sas_ha_name;
 >      struct device *dev;      /* should be set */
 >      struct module *lldd_module; /* should be set */
 > +    struct workqueue_struct    *event_q;
 > +    struct workqueue_struct    *disc_q;
 >
 >      u8 *sas_addr;          /* must be set */
 >      u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
 >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] libsas: Enhance libsas hotplug
@ 2017-05-25  9:04     ` John Garry
  0 siblings, 0 replies; 17+ messages in thread
From: John Garry @ 2017-05-25  9:04 UTC (permalink / raw)
  To: Yijing Wang, jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, fangwei1, yanaijie, hch, dan.j.williams,
	Johannes Thumshirn

Hi,

There are some comments, inline.

In general, if it works, it looks ok.

Other reviews would be greatly appreciated - Hannes, Christoph, 
Johannes, Dan - please.

 > Libsas complete a hotplug event notified by LLDD in several works,
 > for example, if libsas receive a PHYE_LOSS_OF_SIGNAL, we process it
 > in following steps:
 >
 > notify_phy_event    [interrupt context]
 >     sas_queue_event        [queue work on shost->work_q]
 >         sas_phye_loss_of_signal        [running in shost->work_q]
 >             sas_deform_port        [remove sas port]
 >                 sas_unregister_dev
 >                     sas_discover_event    [queue destruct work on 
shost->work_q tail]
 >
 > In above case, complete whole hotplug in two works, remove sas port 
first, then
 > put the destruction of device in another work and queue it on in the 
tail of
 > workqueue, since sas port is the parent of the children rphy device, 
so if remove
 > sas port first, the children rphy device would also be deleted, when 
the destruction
 > work coming, it would find the target has been removed already, and 
report a
 > sysfs warning calltrace.
 >
 > queue tail                                             queue head
 > DISCE_DESTRUCT----> PORTE_BYTES_DMAED event 
----->PHYE_LOSS_OF_SIGNAL[running]
 >
 > There are other hotplug issues in current framework, in above case, 
if there is
 > hotadd sas event queued between hotremove works, the hotplug order 
would be broken
 > and unexpected issues would happen.
 >
 > In this patch, we try to solve these issues in following steps:
 > 1. create a new workqueue used to run sas event work, instead of scsi 
host workqueue,
 >    because we may block sas event work, we cannot block the normal 
scsi works.

What do we block the event work for?

 > 2. create a new workqueue used to run sas discovery events work, 
instead of scsi host
 >    workqueue, because in some cases, eg. in revalidate domain event, 
we may unregister
 >    a sas device and discover new one, we must sync the execution, 
wait the remove process
 >    finish, then start a new discovery. So we must put the probe and 
destruct discovery
 >    events in a new workqueue to avoid deadlock.
 > 3. introudce a asd_sas_port level wait-complete and a sas_discovery 
level wait-complete
 >    we use former wait-complete to achieve a sas event atomic process 
and use latter to
 >    make a sas discovery sync.
 > 4. remove disco_mutex in sas_revalidate_domain, since now 
sas_revalidate_domain sync
 >    the destruct discovery event execution, it's no need to lock disco 
mutex there.
 >
 > Signed-off-by: Yijing Wang <wangyijing@huawei.com>
 > ---
 >  drivers/scsi/libsas/sas_discover.c | 58 
++++++++++++++++++++++++++++----------
 >  drivers/scsi/libsas/sas_event.c    |  2 +-
 >  drivers/scsi/libsas/sas_expander.c |  9 +++++-
 >  drivers/scsi/libsas/sas_init.c     | 31 +++++++++++++++++++-
 >  drivers/scsi/libsas/sas_internal.h | 50 ++++++++++++++++++++++++++++++++
 >  drivers/scsi/libsas/sas_port.c     |  4 +++
 >  include/scsi/libsas.h              | 11 +++++++-
 >  7 files changed, 146 insertions(+), 19 deletions(-)
 >
 > diff --git a/drivers/scsi/libsas/sas_discover.c 
b/drivers/scsi/libsas/sas_discover.c
 > index 60de662..43e8a1e 100644
 > --- a/drivers/scsi/libsas/sas_discover.c
 > +++ b/drivers/scsi/libsas/sas_discover.c
 > @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct 
work_struct *work)
 >      struct domain_device *ddev = port->port_dev;
 >
 >      /* prevent revalidation from finding sata links in recovery */
 > -    mutex_lock(&ha->disco_mutex);
 >      if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
 >          SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
 >                  port->id, task_pid_nr(current));
 > -        goto out;
 > +        return;
 >      }
 >
 >      clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
 > @@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct 
work_struct *work)
 >
 >      SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 
0x%x\n",
 >              port->id, task_pid_nr(current), res);
 > - out:
 > -    mutex_unlock(&ha->disco_mutex);
 > +}
 > +
 > +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
 > +    [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 > +    [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 > +    [DISCE_PROBE] = sas_probe_devices,
 > +    [DISCE_SUSPEND] = sas_suspend_devices,
 > +    [DISCE_RESUME] = sas_resume_devices,
 > +    [DISCE_DESTRUCT] = sas_destruct_devices,
 > +};
 > +
 > +/* a simple wrapper for sas discover event funtions */
 > +static void sas_discover_common_fn(struct work_struct *work)
 > +{
 > +    struct sas_discovery_event *ev = to_sas_discovery_event(work);
 > +    struct asd_sas_port *port = ev->port;
 > +
 > +    sas_event_fns[ev->type](work);
 > +    sas_unbusy_port(port);
 >  }
 >
 >  /* ---------- Events ---------- */
 >
 >  static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work 
*sw)
 >  {
 > +    int ret;
 > +    struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
 > +    struct asd_sas_port *port = ev->port;
 > +
 >      /* chained work is not subject to SA_HA_DRAINING or
 >       * SAS_HA_REGISTERED, because it is either submitted in the
 >       * workqueue, or known to be submitted from a context that is
 >       * not racing against draining
 >       */

Is this comment still valid (even if you have not touched the drain 
logic work)?

 > -    scsi_queue_work(ha->core.shost, &sw->work);
 > +    sas_busy_port(port);
 > +
 > +    /*
 > +     * discovery event probe and destruct would be called in other
 > +     * discovery event like discover domain and revalidate domain
 > +     * events, in some cases, we need to sync execute probe and destruct
 > +     * events, so run discover events except probe/destruct in a new
 > +     * workqueue.
 > +     */
 > +    if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
 > +        ret = scsi_queue_work(ha->core.shost, &sw->work);
 > +    else
 > +        ret = queue_work(ha->disc_q, &sw->work);
 > +
 > +    if (ret != 1)
 > +        /* queue a work fail, unbusy the ha before return */
 > +        sas_unbusy_port(port);

Do we really need to check for this error case, since we have dynamic 
work structs (I think queue_work only fails if we try requeuing a work 
item)?

 >  }
 >
 >  static void sas_chain_event(int event, unsigned long *pending,
 > @@ -575,18 +611,10 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
 >  {
 >      int i;
 >
 > -    static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
 > -        [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
 > -        [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
 > -        [DISCE_PROBE] = sas_probe_devices,
 > -        [DISCE_SUSPEND] = sas_suspend_devices,
 > -        [DISCE_RESUME] = sas_resume_devices,
 > -        [DISCE_DESTRUCT] = sas_destruct_devices,
 > -    };
 > -
 >      disc->pending = 0;
 >      for (i = 0; i < DISC_NUM_EVENTS; i++) {
 > -        INIT_SAS_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
 > +        INIT_SAS_WORK(&disc->disc_work[i].work, sas_discover_common_fn);
 >          disc->disc_work[i].port = port;
 > +        disc->disc_work[i].type = i;
 >      }
 >  }
 > diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
 > index 06c5c4b..c0fc07d 100644
 > --- a/drivers/scsi/libsas/sas_event.c
 > +++ b/drivers/scsi/libsas/sas_event.c
 > @@ -41,7 +41,7 @@ void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
 >          if (list_empty(&sw->drain_node))
 >              list_add(&sw->drain_node, &ha->defer_q);
 >      } else
 > -        scsi_queue_work(ha->core.shost, &sw->work);
 > +        queue_work(ha->event_q, &sw->work);
 >  }
 >
 >  static void sas_queue_event(int event, struct sas_work *work,
 > diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
 > index 570b2cb..a8c8ae1 100644
 > --- a/drivers/scsi/libsas/sas_expander.c
 > +++ b/drivers/scsi/libsas/sas_expander.c
 > @@ -822,7 +822,9 @@ static struct domain_device *sas_ex_discover_end_dev(
 >
 >          list_add_tail(&child->disco_list_node, 
&parent->port->disco_list);
 >
 > +        wait_discover_event_init(child->port);
 >          res = sas_discover_sata(child);
 > +        wait_for_discover_event_finish(child->port);
 >          if (res) {
 >              SAS_DPRINTK("sas_discover_sata() for device %16llx at "
 >                      "%016llx:0x%x returned 0x%x\n",
 > @@ -847,7 +849,9 @@ static struct domain_device *sas_ex_discover_end_dev(
 >
 >          list_add_tail(&child->disco_list_node, 
&parent->port->disco_list);
 >
 > +        wait_discover_event_init(child->port);
 >          res = sas_discover_end_dev(child);

In sas_discover_end_dev(), we may return before queuing the event 
(if LLDD notify dev found returns an error); please take care of this.

 > +        wait_for_discover_event_finish(child->port);
 >          if (res) {
 >              SAS_DPRINTK("sas_discover_end_dev() for device %16llx "
 >                      "at %016llx:0x%x returned 0x%x\n",
 > @@ -1890,8 +1894,11 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
 >                  if (child->dev_type == SAS_EDGE_EXPANDER_DEVICE ||
 >                      child->dev_type == SAS_FANOUT_EXPANDER_DEVICE)
 >                      sas_unregister_ex_tree(parent->port, child);
 > -                else
 > +                else {
 > +                    wait_discover_event_init(parent->port);
 >                      sas_unregister_dev(parent->port, child);
 > +                    wait_for_discover_event_finish(parent->port);
 > +                }
 >                  found = child;
 >                  break;
 >              }
 > diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
 > index 79f95d0..1c49483 100644
 > --- a/drivers/scsi/libsas/sas_init.c
 > +++ b/drivers/scsi/libsas/sas_init.c
 > @@ -38,6 +38,8 @@
 >
 >  #include "../scsi_sas_internal.h"
 >
 > +static DEFINE_IDA(sas_ida);
 > +
 >  static struct kmem_cache *sas_task_cache;
 >
 >  struct sas_task *sas_alloc_task(gfp_t flags)
 > @@ -116,6 +118,7 @@ void sas_hae_reset(struct work_struct *work)
 >  int sas_register_ha(struct sas_ha_struct *sas_ha)
 >  {
 >      int error = 0;
 > +    char name[64];
 >
 >      mutex_init(&sas_ha->disco_mutex);
 >      spin_lock_init(&sas_ha->phy_port_lock);
 > @@ -146,6 +149,30 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
 >          goto Undo_ports;
 >      }
 >
 > +    sas_ha->id = ida_simple_get(&sas_ida, 0, 0, GFP_KERNEL);
 > +    if(sas_ha->id < 0)
 > +        goto Undo_ports;
 > +
 > +    memset(name, 0, 64);

Why memset and then sprintf?
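
For illustration, a small userspace sketch of the point. format_wq_name() is a hypothetical helper, not part of the patch. It relies only on the standard guarantee that snprintf() NUL-terminates the buffer whenever the size is non-zero, so the preceding memset() should not be needed:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not in libsas): build a workqueue name such as
 * "sas-event-3" into buf. snprintf() writes at most len - 1 characters
 * plus a terminating NUL, so buf does not need to be zeroed beforehand.
 * Returns the length the full name would have had, as snprintf() does.
 */
static int format_wq_name(char *buf, size_t len, const char *prefix, int id)
{
	return snprintf(buf, len, "%s-%d", prefix, id);
}
```

Comparing the return value against the buffer size is also the usual way to notice a truncated name.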

 > +    snprintf(name, 64, "sas-event-%d", sas_ha->id);

Can you just use unique dev_name(sas_ha->dev) to help form this name, so 
that you don't have to introduce IDR?

 > +    sas_ha->event_q = create_singlethread_workqueue(name);
 > +
 > +    /*
 > +     * sas-disc-xx workqueue run the discover work except
 > +     * probe and destruct.
 > +     */
 > +    snprintf(name, 64, "sas-disc-%d", sas_ha->id);
 > +    sas_ha->disc_q = create_singlethread_workqueue(name);
 > +    if(!sas_ha->event_q || !sas_ha->disc_q) {
 > +        ida_simple_remove(&sas_ida, sas_ha->id);
 > +        if (sas_ha->event_q)
 > +            destroy_workqueue(sas_ha->event_q);
 > +        if (sas_ha->disc_q)
 > +            destroy_workqueue(sas_ha->disc_q);

Can this error handling be a bit more concise?

 > +        goto Undo_ports;
 > +    }
 > +
 > +
 >      INIT_LIST_HEAD(&sas_ha->eh_done_q);
 >      INIT_LIST_HEAD(&sas_ha->eh_ata_q);
 >
 > @@ -181,6 +208,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
 >      __sas_drain_work(sas_ha);
 >      mutex_unlock(&sas_ha->drain_mutex);
 >
 > +    destroy_workqueue(sas_ha->event_q);
 > +    destroy_workqueue(sas_ha->disc_q);
 > +    ida_simple_remove(&sas_ida, sas_ha->id);
 >      return 0;
 >  }
 >
 > @@ -568,7 +598,6 @@ void sas_domain_release_transport(struct scsi_transport_template *stt)
 >  EXPORT_SYMBOL_GPL(sas_domain_release_transport);
 >
 >  /* ---------- SAS Class register/unregister ---------- */
 > -
 >  static int __init sas_class_init(void)
 >  {
 >      sas_task_cache = KMEM_CACHE(sas_task, SLAB_HWCACHE_ALIGN);
 > diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
 > index 33ce7e5..276df8e 100644
 > --- a/drivers/scsi/libsas/sas_internal.h
 > +++ b/drivers/scsi/libsas/sas_internal.h
 > @@ -100,6 +100,56 @@ void sas_free_device(struct kref *kref);
 >  extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
 >  extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
 >
 > +static inline void wait_discover_event_init(struct asd_sas_port *port)

You need to change function names to have "sas" prefix. Actually these 
functions are all a bit messy.

 > +{
 > +    if (port) {

These init and wait functions currently act as bookend wrappers. I 
think it may be better to put them in the wrapped function (if 
possible), as:
a. probably then we don't need the port NULL check
b. it handles situations where the event is possibly not queued, like the 
suspected sas_discover_end_dev()

 > +        init_completion(&port->disc.completion);
 > +        port->disc.wait = 1;
 > +    }
 > +}
 > +
 > +static inline void wait_for_discover_event_finish(
 > +        struct asd_sas_port *port)
 > +{
 > +    if (port && port->disc.wait == 1)

Can you just use completion_done() instead of introducing another 
variable in discovery_event.wait?

 > +        wait_for_completion(&port->disc.completion);
 > +}
 > +
 > +static inline void wait_sas_event_init(struct asd_sas_port *port)
 > +{
 > +    if (port) {
 > +        init_completion(&port->completion);
 > +        port->busy = 0;
 > +    }
 > +}
 > +
 > +static inline void wait_for_sas_event_finish(
 > +        struct asd_sas_port *port)
 > +{
 > +    if (port && port->busy)
 > +        wait_for_completion(&port->completion);
 > +}
 > +
 > +static inline void sas_busy_port(struct asd_sas_port *port)
 > +{
 > +    if (port)
 > +        port->busy++;

Why not use kref?

 > +}
 > +
 > +static inline void sas_unbusy_port(struct asd_sas_port *port)
 > +{
 > +    if (port && (port->busy > 0)) {
 > +        port->busy--;
 > +        if (!port->busy)
 > +            complete(&port->completion);
 > +    }
 > +
 > +    if (port && (port->disc.wait == 1)) {

Why check port twice?

 > +        complete(&port->disc.completion);
 > +        port->disc.wait = 0;
 > +    }
 > +}
 > +
 >  #ifdef CONFIG_SCSI_SAS_HOST_SMP
 >  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
 >                  struct request *rsp);
 > diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
 > index 9326628..8d8b38c 100644
 > --- a/drivers/scsi/libsas/sas_port.c
 > +++ b/drivers/scsi/libsas/sas_port.c
 > @@ -191,7 +191,9 @@ static void sas_form_port(struct asd_sas_phy *phy)
 >      if (si->dft->lldd_port_formed)
 >          si->dft->lldd_port_formed(phy);
 >
 > +    wait_sas_event_init(port);
 >      sas_discover_event(phy->port, DISCE_DISCOVER_DOMAIN);
 > +    wait_for_sas_event_finish(port);

Is it neater to put these calls inside sas_discover_event()?

 >  }
 >
 >  /**
 > @@ -218,7 +220,9 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
 >          dev->pathways--;
 >
 >      if (port->num_phys == 1) {
 > +        wait_sas_event_init(port);
 >          sas_unregister_domain_devices(port, gone);
 > +        wait_for_sas_event_finish(port);
 >          sas_port_delete(port->port);
 >          port->port = NULL;
 >      } else {
 > diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
 > index c4444ad..4b931d4 100644
 > --- a/include/scsi/libsas.h
 > +++ b/include/scsi/libsas.h
 > @@ -240,6 +240,9 @@ static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct work_str
 >  struct sas_discovery_event {
 >      struct sas_work work;
 >      struct asd_sas_port *port;
 > +    enum discover_event    type;
 > +    int wait;
 > +    struct completion completion;
 >  };
 >
 >  static inline struct sas_discovery_event *to_sas_discovery_event(struct work_struct *work)
 > @@ -256,6 +259,8 @@ struct sas_discovery {
 >      u8     eeds_a[8];
 >      u8     eeds_b[8];
 >      int    max_level;
 > +    int    wait;
 > +    struct completion completion;

Again, does completion_done() do the same job as wait element?

 >  };
 >
 >  /* The port struct is Class:RW, driver:RO */
 > @@ -276,7 +281,8 @@ struct asd_sas_port {
 >
 >  /* public: */
 >      int id;
 > -
 > +    int busy;
 > +    struct completion completion;

I think public means LLDD can access, which is not the case

 >      enum sas_class   class;
 >      u8               sas_addr[SAS_ADDR_SIZE];
 >      u8               attached_sas_addr[SAS_ADDR_SIZE];
 > @@ -387,6 +393,7 @@ struct sas_ha_struct {
 >      int          eh_active;
 >      wait_queue_head_t eh_wait_q;
 >      struct list_head  eh_dev_q;
 > +    int       id; /* for create workqueue */
 >
 >      struct mutex disco_mutex;
 >
 > @@ -396,6 +403,8 @@ struct sas_ha_struct {
 >      char *sas_ha_name;
 >      struct device *dev;      /* should be set */
 >      struct module *lldd_module; /* should be set */
 > +    struct workqueue_struct    *event_q;
 > +    struct workqueue_struct    *disc_q;
 >
 >      u8 *sas_addr;          /* must be set */
 >      u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
 >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] libsas: Enhance libsas hotplug
  2017-05-25  9:04     ` John Garry
@ 2017-05-25 12:31       ` wangyijing
  -1 siblings, 0 replies; 17+ messages in thread
From: wangyijing @ 2017-05-25 12:31 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, fangwei1, yanaijie, hch, dan.j.williams,
	Johannes Thumshirn

Hi John, thanks for your review and comments!

On 2017/5/25 17:04, John Garry wrote:
> Hi,
> 
> There are some comments, inline.
> 
> In general, if it works, it looks ok.
> 
> Other reviews would be greatly appreciated - Hannes, Christoph, Johannes, Dan - please.
> 
>> Libsas completes a hotplug event notified by the LLDD in several works.
>> For example, if libsas receives a PHYE_LOSS_OF_SIGNAL, we process it
>> in the following steps:
>>
>> notify_phy_event    [interrupt context]
>>     sas_queue_event        [queue work on shost->work_q]
>>         sas_phye_loss_of_signal        [running in shost->work_q]
>>             sas_deform_port        [remove sas port]
>>                 sas_unregister_dev
>>                     sas_discover_event    [queue destruct work on shost->work_q tail]
>>
>> In the above case, the whole hotplug is completed in two works: the sas port is
>> removed first, then the destruction of the device is put in another work and queued
>> at the tail of the workqueue. Since the sas port is the parent of the child rphy
>> devices, removing the sas port first also deletes the child rphy devices. When the
>> destruction work runs, it finds that the target has already been removed and
>> reports a sysfs warning calltrace.
>>
>> queue tail                                             queue head
>> DISCE_DESTRUCT----> PORTE_BYTES_DMAED event ----->PHYE_LOSS_OF_SIGNAL[running]
>>
>> There are other hotplug issues in the current framework. In the above case, if a
>> hot-add sas event is queued between the hot-remove works, the hotplug order is broken
>> and unexpected issues can happen.
>>
>> In this patch, we try to solve these issues in the following steps:
>> 1. Create a new workqueue to run sas event work, instead of the scsi host workqueue:
>>    we may block sas event work, and we must not block normal scsi work.
> 
> What do we block the event work for?

When libsas receives a phy-down event, sas_deform_port() is called. With this patch, sas_deform_port()
blocks until the destruction work finishes, and sas_destruct_devices() may in turn wait for the ATA error
handler, which can take a long time. If all of this ran on the scsi host workqueue, libsas could block
other scsi work for too long.
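
To make the ordering problem from the commit message concrete, here is a minimal single-threaded model of a FIFO workqueue. All model_* and EV_* names are hypothetical and exist only for this sketch; it is not libsas code. Handling the loss-of-signal event queues the destruct work at the tail, so an event that was already queued behind it runs in between:

```c
#include <stddef.h>

/* Hypothetical event kinds, loosely named after the libsas events. */
enum model_event { EV_LOSS_OF_SIGNAL, EV_BYTES_DMAED, EV_DESTRUCT };

#define MODEL_QUEUE_CAP 8

/* A trivial FIFO standing in for a single-threaded workqueue. */
struct model_queue {
	enum model_event items[MODEL_QUEUE_CAP];
	size_t head;
	size_t tail;
};

/* Append an event at the tail; silently drops it if the model queue is full. */
static void model_enqueue(struct model_queue *q, enum model_event ev)
{
	if (q->tail < MODEL_QUEUE_CAP)
		q->items[q->tail++] = ev;
}

/*
 * Run every queued event in FIFO order and record the execution order in
 * out[] (which must hold MODEL_QUEUE_CAP entries). Handling a
 * loss-of-signal event queues the destruct work at the tail, as
 * sas_unregister_dev() does via sas_discover_event() in the trace above.
 * Returns the number of events executed.
 */
static size_t model_run(struct model_queue *q, enum model_event *out)
{
	size_t n = 0;

	while (q->head < q->tail) {
		enum model_event ev = q->items[q->head++];

		out[n++] = ev;
		if (ev == EV_LOSS_OF_SIGNAL)
			model_enqueue(q, EV_DESTRUCT);
	}
	return n;
}
```

With a loss-of-signal event followed by a bytes-DMAed event in the queue, the model executes the destruct work last, which is the interleaving the patch tries to prevent by waiting for the destruct work inside sas_deform_port().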

> 
>> 2. Create a new workqueue to run sas discovery event work, instead of the scsi host
>>    workqueue. In some cases, e.g. in the revalidate domain event, we may unregister
>>    a sas device and discover a new one; we must sync the execution and wait for the
>>    remove process to finish before starting a new discovery. So we must put the probe
>>    and destruct discovery events in a separate workqueue from the other discovery
>>    events to avoid deadlock.
>> 3. Introduce an asd_sas_port-level wait-complete and a sas_discovery-level wait-complete.
>>    We use the former to process a sas event atomically and the latter to
>>    make sas discovery synchronous.
>> 4. Remove disco_mutex from sas_revalidate_domain: since sas_revalidate_domain now syncs
>>    the execution of the destruct discovery event, there is no need to take disco_mutex there.
>>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 58 ++++++++++++++++++++++++++++----------
>>  drivers/scsi/libsas/sas_event.c    |  2 +-
>>  drivers/scsi/libsas/sas_expander.c |  9 +++++-
>>  drivers/scsi/libsas/sas_init.c     | 31 +++++++++++++++++++-
>>  drivers/scsi/libsas/sas_internal.h | 50 ++++++++++++++++++++++++++++++++
>>  drivers/scsi/libsas/sas_port.c     |  4 +++
>>  include/scsi/libsas.h              | 11 +++++++-
>>  7 files changed, 146 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> index 60de662..43e8a1e 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct *work)
>>      struct domain_device *ddev = port->port_dev;
>>
>>      /* prevent revalidation from finding sata links in recovery */
>> -    mutex_lock(&ha->disco_mutex);
>>      if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
>>          SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
>>                  port->id, task_pid_nr(current));
>> -        goto out;
>> +        return;
>>      }
>>
>>      clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
>> @@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct *work)
>>
>>      SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
>>              port->id, task_pid_nr(current), res);
>> - out:
>> -    mutex_unlock(&ha->disco_mutex);
>> +}
>> +
>> +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> +    [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> +    [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> +    [DISCE_PROBE] = sas_probe_devices,
>> +    [DISCE_SUSPEND] = sas_suspend_devices,
>> +    [DISCE_RESUME] = sas_resume_devices,
>> +    [DISCE_DESTRUCT] = sas_destruct_devices,
>> +};
>> +
>> +/* a simple wrapper for sas discover event funtions */
>> +static void sas_discover_common_fn(struct work_struct *work)
>> +{
>> +    struct sas_discovery_event *ev = to_sas_discovery_event(work);
>> +    struct asd_sas_port *port = ev->port;
>> +
>> +    sas_event_fns[ev->type](work);
>> +    sas_unbusy_port(port);
>>  }
>>
>>  /* ---------- Events ---------- */
>>
>>  static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
>>  {
>> +    int ret;
>> +    struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
>> +    struct asd_sas_port *port = ev->port;
>> +
>>      /* chained work is not subject to SA_HA_DRAINING or
>>       * SAS_HA_REGISTERED, because it is either submitted in the
>>       * workqueue, or known to be submitted from a context that is
>>       * not racing against draining
>>       */
> 
> Is this comment still valid (even if you have not touched the drain logic work)?

Yes, I think so.

> 
>> -    scsi_queue_work(ha->core.shost, &sw->work);
>> +    sas_busy_port(port);
>> +
>> +    /*
>> +     * discovery event probe and destruct would be called in other
>> +     * discovery event like discover domain and revalidate domain
>> +     * events, in some cases, we need to sync execute probe and destruct
>> +     * events, so run discover events except probe/destruct in a new
>> +     * workqueue.
>> +     */
>> +    if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
>> +        ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +    else
>> +        ret = queue_work(ha->disc_q, &sw->work);
>> +
>> +    if (ret != 1)
>> +        /* queue a work fail, unbusy the ha before return */
>> +        sas_unbusy_port(port);
> 
> Do we really need to check for this error case, since we have dynamic work structs (I think queue_work only fails if we try requeuing a work item)?

We only changed the sas event work to be dynamic; the sas discovery event work is still static.
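
A minimal userspace model of why the return value matters for a statically allocated work item. The model_* names are hypothetical, and only the pending bit is modelled, following the documented queue_work() contract (false if the work was already on a queue, true otherwise):

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct work_struct: only the pending bit. */
struct model_work {
	bool pending;
};

/*
 * Models the queue_work() return contract: returns false if the item is
 * already pending (nothing new is queued), true if it was queued now.
 */
static bool model_queue_work(struct model_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models the worker picking the item up: pending is cleared before it runs. */
static void model_run_work(struct model_work *w)
{
	w->pending = false;
}
```

Because a static item can be submitted again while it is still pending, a caller that bumped a busy counter before queuing has to undo that when the second submission is rejected, which is what the `ret != 1` branch in the patch appears to be for.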

> 
>>  }
>>
>>  static void sas_chain_event(int event, unsigned long *pending,
>> @@ -575,18 +611,10 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
>>  {
>>      int i;
>>
>> -    static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> -        [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> -        [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> -        [DISCE_PROBE] = sas_probe_devices,
>> -        [DISCE_SUSPEND] = sas_suspend_devices,
>> -        [DISCE_RESUME] = sas_resume_devices,
>> -        [DISCE_DESTRUCT] = sas_destruct_devices,
>> -    };
>> -
>>      disc->pending = 0;
>>      for (i = 0; i < DISC_NUM_EVENTS; i++) {
>> -        INIT_SAS_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
>> +        INIT_SAS_WORK(&disc->disc_work[i].work, sas_discover_common_fn);
>>          disc->disc_work[i].port = port;
>> +        disc->disc_work[i].type = i;
>>      }
>>  }
>> diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
>> index 06c5c4b..c0fc07d 100644
>> --- a/drivers/scsi/libsas/sas_event.c
>> +++ b/drivers/scsi/libsas/sas_event.c
>> @@ -41,7 +41,7 @@ void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
>>          if (list_empty(&sw->drain_node))
>>              list_add(&sw->drain_node, &ha->defer_q);
>>      } else
>> -        scsi_queue_work(ha->core.shost, &sw->work);
>> +        queue_work(ha->event_q, &sw->work);
>>  }
>>
>>  static void sas_queue_event(int event, struct sas_work *work,
>> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
>> index 570b2cb..a8c8ae1 100644
>> --- a/drivers/scsi/libsas/sas_expander.c
>> +++ b/drivers/scsi/libsas/sas_expander.c
>> @@ -822,7 +822,9 @@ static struct domain_device *sas_ex_discover_end_dev(
>>
>>          list_add_tail(&child->disco_list_node, &parent->port->disco_list);
>>
>> +        wait_discover_event_init(child->port);
>>          res = sas_discover_sata(child);
>> +        wait_for_discover_event_finish(child->port);
>>          if (res) {
>>              SAS_DPRINTK("sas_discover_sata() for device %16llx at "
>>                      "%016llx:0x%x returned 0x%x\n",
>> @@ -847,7 +849,9 @@ static struct domain_device *sas_ex_discover_end_dev(
>>
>>          list_add_tail(&child->disco_list_node, &parent->port->disco_list);
>>
>> +        wait_discover_event_init(child->port);
>>          res = sas_discover_end_dev(child);
> 
> In sas_discover_end_dev(), we may return before queuing the event (if LLDD notify dev found returns an error); please take care of this.

Good catch, I will fix this case, thanks!

> 
>> +        wait_for_discover_event_finish(child->port);
>>          if (res) {
>>              SAS_DPRINTK("sas_discover_end_dev() for device %16llx "
>>                      "at %016llx:0x%x returned 0x%x\n",
>> @@ -1890,8 +1894,11 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>>                  if (child->dev_type == SAS_EDGE_EXPANDER_DEVICE ||
>>                      child->dev_type == SAS_FANOUT_EXPANDER_DEVICE)
>>                      sas_unregister_ex_tree(parent->port, child);
>> -                else
>> +                else {
>> +                    wait_discover_event_init(parent->port);
>>                      sas_unregister_dev(parent->port, child);
>> +                    wait_for_discover_event_finish(parent->port);
>> +                }
>>                  found = child;
>>                  break;
>>              }
>> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
>> index 79f95d0..1c49483 100644
>> --- a/drivers/scsi/libsas/sas_init.c
>> +++ b/drivers/scsi/libsas/sas_init.c
>> @@ -38,6 +38,8 @@
>>
>>  #include "../scsi_sas_internal.h"
>>
>> +static DEFINE_IDA(sas_ida);
>> +
>>  static struct kmem_cache *sas_task_cache;
>>
>>  struct sas_task *sas_alloc_task(gfp_t flags)
>> @@ -116,6 +118,7 @@ void sas_hae_reset(struct work_struct *work)
>>  int sas_register_ha(struct sas_ha_struct *sas_ha)
>>  {
>>      int error = 0;
>> +    char name[64];
>>
>>      mutex_init(&sas_ha->disco_mutex);
>>      spin_lock_init(&sas_ha->phy_port_lock);
>> @@ -146,6 +149,30 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
>>          goto Undo_ports;
>>      }
>>
>> +    sas_ha->id = ida_simple_get(&sas_ida, 0, 0, GFP_KERNEL);
>> +    if(sas_ha->id < 0)
>> +        goto Undo_ports;
>> +
>> +    memset(name, 0, 64);
> 
> Why memset and then sprintf?
> 
>> +    snprintf(name, 64, "sas-event-%d", sas_ha->id);
> 
> Can you just use unique dev_name(sas_ha->dev) to help form this name, so that you don't have to introduce IDR?

I checked sas_ha->dev and found that it points to a platform device, so it should be safe to use dev_name(sas_ha->dev), thanks!

> 
>> +    sas_ha->event_q = create_singlethread_workqueue(name);
>> +
>> +    /*
>> +     * sas-disc-xx workqueue run the discover work except
>> +     * probe and destruct.
>> +     */
>> +    snprintf(name, 64, "sas-disc-%d", sas_ha->id);
>> +    sas_ha->disc_q = create_singlethread_workqueue(name);
>> +    if(!sas_ha->event_q || !sas_ha->disc_q) {
>> +        ida_simple_remove(&sas_ida, sas_ha->id);
>> +        if (sas_ha->event_q)
>> +            destroy_workqueue(sas_ha->event_q);
>> +        if (sas_ha->disc_q)
>> +            destroy_workqueue(sas_ha->disc_q);
> 
> Can this error handling be a bit more concise?

OK, will refresh.
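
One possible shape for the more concise error path, sketched as a userspace model. model_create_wq(), model_destroy_wq() and model_register_ha() are hypothetical stand-ins (the real code would create workqueues and return a kernel error code); the point is only the goto-unwind structure, where each label undoes exactly what has succeeded so far:

```c
#include <stdlib.h>

/* Hypothetical stand-in for the two workqueue pointers in sas_ha_struct. */
struct model_ha {
	void *event_q;
	void *disc_q;
};

/* Stand-in for workqueue creation; 'fail' forces an allocation failure. */
static void *model_create_wq(int fail)
{
	return fail ? NULL : malloc(1);
}

/* Stand-in for destroy_workqueue(). */
static void model_destroy_wq(void *wq)
{
	free(wq);
}

/*
 * Create both queues, unwinding in reverse order on failure.
 * Returns 0 on success, -1 on failure (a real driver would return -ENOMEM).
 */
static int model_register_ha(struct model_ha *ha, int fail_event, int fail_disc)
{
	ha->event_q = model_create_wq(fail_event);
	if (!ha->event_q)
		goto err;

	ha->disc_q = model_create_wq(fail_disc);
	if (!ha->disc_q)
		goto err_event_q;

	return 0;

err_event_q:
	model_destroy_wq(ha->event_q);
	ha->event_q = NULL;
err:
	return -1;
}
```

If dev_name(sas_ha->dev) replaces the IDA as discussed above, there is no id to release, so the unwind needs only these two steps.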

> 
>> +        goto Undo_ports;
>> +    }
>> +
>> +
>>      INIT_LIST_HEAD(&sas_ha->eh_done_q);
>>      INIT_LIST_HEAD(&sas_ha->eh_ata_q);
>>
>> @@ -181,6 +208,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
>>      __sas_drain_work(sas_ha);
>>      mutex_unlock(&sas_ha->drain_mutex);
>>
>> +    destroy_workqueue(sas_ha->event_q);
>> +    destroy_workqueue(sas_ha->disc_q);
>> +    ida_simple_remove(&sas_ida, sas_ha->id);
>>      return 0;
>>  }
>>
>> @@ -568,7 +598,6 @@ void sas_domain_release_transport(struct scsi_transport_template *stt)
>>  EXPORT_SYMBOL_GPL(sas_domain_release_transport);
>>
>>  /* ---------- SAS Class register/unregister ---------- */
>> -
>>  static int __init sas_class_init(void)
>>  {
>>      sas_task_cache = KMEM_CACHE(sas_task, SLAB_HWCACHE_ALIGN);
>> diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
>> index 33ce7e5..276df8e 100644
>> --- a/drivers/scsi/libsas/sas_internal.h
>> +++ b/drivers/scsi/libsas/sas_internal.h
>> @@ -100,6 +100,56 @@ void sas_free_device(struct kref *kref);
>>  extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
>>  extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
>>
>> +static inline void wait_discover_event_init(struct asd_sas_port *port)
> 
> You need to change function names to have "sas" prefix. Actually these functions are all a bit messy.

OK.

> 
>> +{
>> +    if (port) {
> 
> These init and wait functions currently act as bookend wrappers. I think it may be better to put them in the wrapped function (if possible), as:
> a. probably then we don't need the port NULL check
> b. it handles situations where the event is possibly not queued, like the suspected sas_discover_end_dev()
> 
>> +        init_completion(&port->disc.completion);
>> +        port->disc.wait = 1;
>> +    }
>> +}
>> +
>> +static inline void wait_for_discover_event_finish(
>> +        struct asd_sas_port *port)
>> +{
>> +    if (port && port->disc.wait == 1)
> 
> Can you just use completion_done() instead of introducing another variable in discovery_event.wait?

My worry is that completion_done() may be called before we call wait_for_completion(); in that case,
the process would hang.
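
For reference while discussing this, a minimal single-threaded sketch of how I understand the completion counter to behave. The model_* names are hypothetical; the model mirrors only the 'done' counter of struct completion and does not block. In this model a complete() that arrives before the waiter is remembered in the counter rather than lost:

```c
#include <stdbool.h>

/* Hypothetical model of struct completion: only the 'done' counter. */
struct model_completion {
	unsigned int done;
};

/* Like init_completion(): start with nothing completed. */
static void model_init_completion(struct model_completion *x)
{
	x->done = 0;
}

/* Like complete(): record one completion for a current or future waiter. */
static void model_complete(struct model_completion *x)
{
	x->done++;
}

/* Like completion_done(): false while a waiter would still have to block. */
static bool model_completion_done(const struct model_completion *x)
{
	return x->done != 0;
}

/*
 * Like try_wait_for_completion(): consume one completion without blocking.
 * Returns false where wait_for_completion() would have had to sleep.
 */
static bool model_try_wait(struct model_completion *x)
{
	if (!x->done)
		return false;
	x->done--;
	return true;
}
```

Whether this is enough to drop the separate wait flag depends on how the init, queue and wait calls race in the real code, which this model does not capture.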


> 
>> +        wait_for_completion(&port->disc.completion);
>> +}
>> +
>> +static inline void wait_sas_event_init(struct asd_sas_port *port)
>> +{
>> +    if (port) {
>> +        init_completion(&port->completion);
>> +        port->busy = 0;
>> +    }
>> +}
>> +
>> +static inline void wait_for_sas_event_finish(
>> +        struct asd_sas_port *port)
>> +{
>> +    if (port && port->busy)
>> +        wait_for_completion(&port->completion);
>> +}
>> +
>> +static inline void sas_busy_port(struct asd_sas_port *port)
>> +{
>> +    if (port)
>> +        port->busy++;
> 
> Why not use kref?

Good idea, will replace.
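
A rough userspace sketch of what the kref-style version could look like. Everything named model_* is hypothetical; a plain int stands in for struct kref and a flag stands in for complete(&port->completion). The model is not atomic, unlike a real kref:

```c
#include <stdbool.h>

/* Hypothetical model of the port's busy tracking. */
struct model_port {
	int refcount;	/* stands in for struct kref; starts at 1 like kref_init() */
	bool completed;	/* stands in for complete(&port->completion) */
};

/* Release callback: runs once, when the last reference is dropped. */
static void model_port_release(struct model_port *port)
{
	port->completed = true;
}

/* Like kref_get(): take one more reference for a queued work item. */
static void model_port_get(struct model_port *port)
{
	port->refcount++;
}

/*
 * Like kref_put(): drop a reference and call release on the last one.
 * Returns true if this call dropped the last reference.
 */
static bool model_port_put(struct model_port *port,
			   void (*release)(struct model_port *))
{
	if (--port->refcount == 0) {
		release(port);
		return true;
	}
	return false;
}
```

Each queued discovery work would take a reference and drop it when it finishes, so the waiter is completed exactly once, by whichever put happens last.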

> 
>> +}
>> +
>> +static inline void sas_unbusy_port(struct asd_sas_port *port)
>> +{
>> +    if (port && (port->busy > 0)) {
>> +        port->busy--;
>> +        if (!port->busy)
>> +            complete(&port->completion);
>> +    }
>> +
>> +    if (port && (port->disc.wait == 1)) {
> 
> Why check port twice?

Will remove the second check, thanks!
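
As a sketch of the restructured helper, here is a userspace model with a single early return for the NULL port. The model_* types are hypothetical, and counters stand in for the two complete() calls so the behaviour can be checked without the kernel:

```c
#include <stddef.h>

/* Hypothetical model of the discovery-level wait state. */
struct model_disc {
	int wait;
	int completions;	/* counts complete(&port->disc.completion) calls */
};

/* Hypothetical model of the port-level busy state. */
struct model_sas_port {
	int busy;
	int completions;	/* counts complete(&port->completion) calls */
	struct model_disc disc;
};

/* Same logic as the quoted sas_unbusy_port(), with one NULL check up front. */
static void model_unbusy_port(struct model_sas_port *port)
{
	if (!port)
		return;

	/* Signal the port-level waiter when the last busy count is dropped. */
	if (port->busy > 0 && --port->busy == 0)
		port->completions++;

	/* Signal the discovery-level waiter if one was armed. */
	if (port->disc.wait == 1) {
		port->disc.completions++;
		port->disc.wait = 0;
	}
}
```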

> 
>> +        complete(&port->disc.completion);
>> +        port->disc.wait = 0;
>> +    }
>> +}
>> +
>>  #ifdef CONFIG_SCSI_SAS_HOST_SMP
>>  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
>>                  struct request *rsp);
>> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> index 9326628..8d8b38c 100644
>> --- a/drivers/scsi/libsas/sas_port.c
>> +++ b/drivers/scsi/libsas/sas_port.c
>> @@ -191,7 +191,9 @@ static void sas_form_port(struct asd_sas_phy *phy)
>>      if (si->dft->lldd_port_formed)
>>          si->dft->lldd_port_formed(phy);
>>
>> +    wait_sas_event_init(port);
>>      sas_discover_event(phy->port, DISCE_DISCOVER_DOMAIN);
>> +    wait_for_sas_event_finish(port);
> 
> Is it neater to put these calls inside sas_discover_event()?

We now have two wait-complete mechanisms: the first is for sas events, as in sas_form_port()/sas_deform_port();
the second is for sas discovery events, as in sas_revalidate_domain(). Also, sas_discover_event()
may be called recursively, so it is difficult to put these calls inside it.


> 
>>  }
>>
>>  /**
>> @@ -218,7 +220,9 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>>          dev->pathways--;
>>
>>      if (port->num_phys == 1) {
>> +        wait_sas_event_init(port);
>>          sas_unregister_domain_devices(port, gone);
>> +        wait_for_sas_event_finish(port);
>>          sas_port_delete(port->port);
>>          port->port = NULL;
>>      } else {
>> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
>> index c4444ad..4b931d4 100644
>> --- a/include/scsi/libsas.h
>> +++ b/include/scsi/libsas.h
>> @@ -240,6 +240,9 @@ static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct work_str
>>  struct sas_discovery_event {
>>      struct sas_work work;
>>      struct asd_sas_port *port;
>> +    enum discover_event    type;
>> +    int wait;
>> +    struct completion completion;
>>  };
>>
>>  static inline struct sas_discovery_event *to_sas_discovery_event(struct work_struct *work)
>> @@ -256,6 +259,8 @@ struct sas_discovery {
>>      u8     eeds_a[8];
>>      u8     eeds_b[8];
>>      int    max_level;
>> +    int    wait;
>> +    struct completion completion;
> 
> Again, does completion_done() do the same job as wait element?

same as above.

> 
>>  };
>>
>>  /* The port struct is Class:RW, driver:RO */
>> @@ -276,7 +281,8 @@ struct asd_sas_port {
>>
>>  /* public: */
>>      int id;
>> -
>> +    int busy;
>> +    struct completion completion;
> 
> I think public means LLDD can access, which is not the case

OK. will move them up.

> 
>>      enum sas_class   class;
>>      u8               sas_addr[SAS_ADDR_SIZE];
>>      u8               attached_sas_addr[SAS_ADDR_SIZE];
>> @@ -387,6 +393,7 @@ struct sas_ha_struct {
>>      int          eh_active;
>>      wait_queue_head_t eh_wait_q;
>>      struct list_head  eh_dev_q;
>> +    int       id; /* for create workqueue */
>>
>>      struct mutex disco_mutex;
>>
>> @@ -396,6 +403,8 @@ struct sas_ha_struct {
>>      char *sas_ha_name;
>>      struct device *dev;      /* should be set */
>>      struct module *lldd_module; /* should be set */
>> +    struct workqueue_struct    *event_q;
>> +    struct workqueue_struct    *disc_q;
>>
>>      u8 *sas_addr;          /* must be set */
>>      u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] libsas: Enhance libsas hotplug
@ 2017-05-25 12:31       ` wangyijing
  0 siblings, 0 replies; 17+ messages in thread
From: wangyijing @ 2017-05-25 12:31 UTC (permalink / raw)
  To: John Garry, jejb, martin.petersen
  Cc: chenqilin2, hare, linux-scsi, linux-kernel, chenxiang66,
	huangdaode, wangkefeng.wang, zhaohongjiang, dingtianhong,
	guohanjun, fangwei1, yanaijie, hch, dan.j.williams,
	Johannes Thumshirn

Hi John, thanks for your review and comments!

On 2017/5/25 17:04, John Garry wrote:
> Hi,
> 
> There are some comments, inline.
> 
> In general, if it works, it looks ok.
> 
> Other reviews would be greatly appreciated - Hannes, Christoph, Johannes, Dan - please.
> 
>> Libsas completes a hotplug event notified by the LLDD in several works.
>> For example, if libsas receives a PHYE_LOSS_OF_SIGNAL, we process it
>> in the following steps:
>>
>> notify_phy_event    [interrupt context]
>>     sas_queue_event        [queue work on shost->work_q]
>>         sas_phye_loss_of_signal        [running in shost->work_q]
>>             sas_deform_port        [remove sas port]
>>                 sas_unregister_dev
>>                     sas_discover_event    [queue destruct work on shost->work_q tail]
>>
>> In the above case, the whole hotplug is completed in two works: the sas port is removed
>> first, then the destruction of the device is put in another work and queued at the tail
>> of the workqueue. Since the sas port is the parent of the children rphy devices, removing
>> the sas port first also deletes the children rphy devices. When the destruction work
>> runs, it finds that the target has already been removed and reports a
>> sysfs warning calltrace.
>>
>> queue tail                                             queue head
>> DISCE_DESTRUCT----> PORTE_BYTES_DMAED event ----->PHYE_LOSS_OF_SIGNAL[running]
>>
>> There are other hotplug issues in the current framework. In the above case, if a
>> hot-add sas event is queued between the hot-remove works, the hotplug order is broken
>> and unexpected issues can happen.
>>
>> In this patch, we try to solve these issues in the following steps:
>> 1. Create a new workqueue to run sas event work, instead of using the scsi host workqueue,
>>    because we may block sas event work and we must not block the normal scsi works.
> 
> What do we block the event work for?

When libsas receives a phy down event, sas_deform_port is called, and with this patch sas_deform_port
blocks until the destruction work finishes. In sas_destruct_devices we may wait for the ATA error handler,
which can take a long time, so if we did everything in the scsi host workqueue, libsas could block other
scsi works for too long.

> 
>> 2. Create a new workqueue to run sas discovery event work, instead of using the scsi host
>>    workqueue, because in some cases, e.g. in the revalidate domain event, we may unregister
>>    a sas device and discover a new one. We must synchronize the execution and wait for the
>>    removal to finish before starting a new discovery, so we must put the probe and destruct
>>    discovery events in a new workqueue to avoid deadlock.
>> 3. Introduce an asd_sas_port level wait-complete and a sas_discovery level wait-complete.
>>    We use the former to process a sas event atomically and the latter to
>>    make sas discovery synchronous.
>> 4. Remove disco_mutex from sas_revalidate_domain: since sas_revalidate_domain now synchronizes
>>    the destruct discovery event execution, there is no need to lock disco_mutex there.
>>
>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> ---
>>  drivers/scsi/libsas/sas_discover.c | 58 ++++++++++++++++++++++++++++----------
>>  drivers/scsi/libsas/sas_event.c    |  2 +-
>>  drivers/scsi/libsas/sas_expander.c |  9 +++++-
>>  drivers/scsi/libsas/sas_init.c     | 31 +++++++++++++++++++-
>>  drivers/scsi/libsas/sas_internal.h | 50 ++++++++++++++++++++++++++++++++
>>  drivers/scsi/libsas/sas_port.c     |  4 +++
>>  include/scsi/libsas.h              | 11 +++++++-
>>  7 files changed, 146 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
>> index 60de662..43e8a1e 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -503,11 +503,10 @@ static void sas_revalidate_domain(struct work_struct *work)
>>      struct domain_device *ddev = port->port_dev;
>>
>>      /* prevent revalidation from finding sata links in recovery */
>> -    mutex_lock(&ha->disco_mutex);
>>      if (test_bit(SAS_HA_ATA_EH_ACTIVE, &ha->state)) {
>>          SAS_DPRINTK("REVALIDATION DEFERRED on port %d, pid:%d\n",
>>                  port->id, task_pid_nr(current));
>> -        goto out;
>> +        return;
>>      }
>>
>>      clear_bit(DISCE_REVALIDATE_DOMAIN, &port->disc.pending);
>> @@ -521,20 +520,57 @@ static void sas_revalidate_domain(struct work_struct *work)
>>
>>      SAS_DPRINTK("done REVALIDATING DOMAIN on port %d, pid:%d, res 0x%x\n",
>>              port->id, task_pid_nr(current), res);
>> - out:
>> -    mutex_unlock(&ha->disco_mutex);
>> +}
>> +
>> +static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> +    [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> +    [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> +    [DISCE_PROBE] = sas_probe_devices,
>> +    [DISCE_SUSPEND] = sas_suspend_devices,
>> +    [DISCE_RESUME] = sas_resume_devices,
>> +    [DISCE_DESTRUCT] = sas_destruct_devices,
>> +};
>> +
>> +/* a simple wrapper for sas discover event funtions */
>> +static void sas_discover_common_fn(struct work_struct *work)
>> +{
>> +    struct sas_discovery_event *ev = to_sas_discovery_event(work);
>> +    struct asd_sas_port *port = ev->port;
>> +
>> +    sas_event_fns[ev->type](work);
>> +    sas_unbusy_port(port);
>>  }
>>
>>  /* ---------- Events ---------- */
>>
>>  static void sas_chain_work(struct sas_ha_struct *ha, struct sas_work *sw)
>>  {
>> +    int ret;
>> +    struct sas_discovery_event *ev = to_sas_discovery_event(&sw->work);
>> +    struct asd_sas_port *port = ev->port;
>> +
>>      /* chained work is not subject to SA_HA_DRAINING or
>>       * SAS_HA_REGISTERED, because it is either submitted in the
>>       * workqueue, or known to be submitted from a context that is
>>       * not racing against draining
>>       */
> 
> Is this comment still valid (even if you have not touched the drain logic work)?

Yes, I think so.

> 
>> -    scsi_queue_work(ha->core.shost, &sw->work);
>> +    sas_busy_port(port);
>> +
>> +    /*
>> +     * discovery event probe and destruct would be called in other
>> +     * discovery event like discover domain and revalidate domain
>> +     * events, in some cases, we need to sync execute probe and destruct
>> +     * events, so run discover events except probe/destruct in a new
>> +     * workqueue.
>> +     */
>> +    if (ev->type == DISCE_PROBE || ev->type == DISCE_DESTRUCT)
>> +        ret = scsi_queue_work(ha->core.shost, &sw->work);
>> +    else
>> +        ret = queue_work(ha->disc_q, &sw->work);
>> +
>> +    if (ret != 1)
>> +        /* queue a work fail, unbusy the ha before return */
>> +        sas_unbusy_port(port);
> 
> Do we really need to check for this error case, since we have dynamic work structs (I think queue_work only fails if we try requeuing a work item)?

We only changed the sas event work to dynamic; the sas discovery event work is still static.
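
To illustrate why a static work item can fail to queue: in the kernel, queue_work()
returns false when the work item is already pending, which can only happen when the same
struct is submitted again before a worker has picked it up. Below is a minimal userspace
sketch of that behaviour. The model_* names are hypothetical and are not kernel API; the
pending flag only stands in for the kernel's pending bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a statically allocated work item; not kernel code. */
struct model_work {
	bool pending;	/* stands in for the kernel's "work is pending" bit */
};

/* Returns true if the item was queued, false if it was already pending. */
bool model_queue_work(struct model_work *w)
{
	assert(w != NULL);
	if (w->pending)
		return false;	/* same struct still queued: request is dropped */
	w->pending = true;
	return true;
}

/* Called when a worker picks the item up; it may then be queued again. */
void model_work_start(struct model_work *w)
{
	assert(w != NULL);
	w->pending = false;
}
```

With one static struct per discovery event type, a second submission of the same type
while the first is still pending is the case that the "ret != 1" check guards against.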

> 
>>  }
>>
>>  static void sas_chain_event(int event, unsigned long *pending,
>> @@ -575,18 +611,10 @@ void sas_init_disc(struct sas_discovery *disc, struct asd_sas_port *port)
>>  {
>>      int i;
>>
>> -    static const work_func_t sas_event_fns[DISC_NUM_EVENTS] = {
>> -        [DISCE_DISCOVER_DOMAIN] = sas_discover_domain,
>> -        [DISCE_REVALIDATE_DOMAIN] = sas_revalidate_domain,
>> -        [DISCE_PROBE] = sas_probe_devices,
>> -        [DISCE_SUSPEND] = sas_suspend_devices,
>> -        [DISCE_RESUME] = sas_resume_devices,
>> -        [DISCE_DESTRUCT] = sas_destruct_devices,
>> -    };
>> -
>>      disc->pending = 0;
>>      for (i = 0; i < DISC_NUM_EVENTS; i++) {
>> -        INIT_SAS_WORK(&disc->disc_work[i].work, sas_event_fns[i]);
>> +        INIT_SAS_WORK(&disc->disc_work[i].work, sas_discover_common_fn);
>>          disc->disc_work[i].port = port;
>> +        disc->disc_work[i].type = i;
>>      }
>>  }
>> diff --git a/drivers/scsi/libsas/sas_event.c b/drivers/scsi/libsas/sas_event.c
>> index 06c5c4b..c0fc07d 100644
>> --- a/drivers/scsi/libsas/sas_event.c
>> +++ b/drivers/scsi/libsas/sas_event.c
>> @@ -41,7 +41,7 @@ void sas_queue_work(struct sas_ha_struct *ha, struct sas_work *sw)
>>          if (list_empty(&sw->drain_node))
>>              list_add(&sw->drain_node, &ha->defer_q);
>>      } else
>> -        scsi_queue_work(ha->core.shost, &sw->work);
>> +        queue_work(ha->event_q, &sw->work);
>>  }
>>
>>  static void sas_queue_event(int event, struct sas_work *work,
>> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
>> index 570b2cb..a8c8ae1 100644
>> --- a/drivers/scsi/libsas/sas_expander.c
>> +++ b/drivers/scsi/libsas/sas_expander.c
>> @@ -822,7 +822,9 @@ static struct domain_device *sas_ex_discover_end_dev(
>>
>>          list_add_tail(&child->disco_list_node, &parent->port->disco_list);
>>
>> +        wait_discover_event_init(child->port);
>>          res = sas_discover_sata(child);
>> +        wait_for_discover_event_finish(child->port);
>>          if (res) {
>>              SAS_DPRINTK("sas_discover_sata() for device %16llx at "
>>                      "%016llx:0x%x returned 0x%x\n",
>> @@ -847,7 +849,9 @@ static struct domain_device *sas_ex_discover_end_dev(
>>
>>          list_add_tail(&child->disco_list_node, &parent->port->disco_list);
>>
>> +        wait_discover_event_init(child->port);
>>          res = sas_discover_end_dev(child);
> 
> In sas_discover_end_dev(), we may return before the event is queued (if LLDD notify dev found returns error), so please take care of this.

Good catch, I will fix this case, thanks!

> 
>> +        wait_for_discover_event_finish(child->port);
>>          if (res) {
>>              SAS_DPRINTK("sas_discover_end_dev() for device %16llx "
>>                      "at %016llx:0x%x returned 0x%x\n",
>> @@ -1890,8 +1894,11 @@ static void sas_unregister_devs_sas_addr(struct domain_device *parent,
>>                  if (child->dev_type == SAS_EDGE_EXPANDER_DEVICE ||
>>                      child->dev_type == SAS_FANOUT_EXPANDER_DEVICE)
>>                      sas_unregister_ex_tree(parent->port, child);
>> -                else
>> +                else {
>> +                    wait_discover_event_init(parent->port);
>>                      sas_unregister_dev(parent->port, child);
>> +                    wait_for_discover_event_finish(parent->port);
>> +                }
>>                  found = child;
>>                  break;
>>              }
>> diff --git a/drivers/scsi/libsas/sas_init.c b/drivers/scsi/libsas/sas_init.c
>> index 79f95d0..1c49483 100644
>> --- a/drivers/scsi/libsas/sas_init.c
>> +++ b/drivers/scsi/libsas/sas_init.c
>> @@ -38,6 +38,8 @@
>>
>>  #include "../scsi_sas_internal.h"
>>
>> +static DEFINE_IDA(sas_ida);
>> +
>>  static struct kmem_cache *sas_task_cache;
>>
>>  struct sas_task *sas_alloc_task(gfp_t flags)
>> @@ -116,6 +118,7 @@ void sas_hae_reset(struct work_struct *work)
>>  int sas_register_ha(struct sas_ha_struct *sas_ha)
>>  {
>>      int error = 0;
>> +    char name[64];
>>
>>      mutex_init(&sas_ha->disco_mutex);
>>      spin_lock_init(&sas_ha->phy_port_lock);
>> @@ -146,6 +149,30 @@ int sas_register_ha(struct sas_ha_struct *sas_ha)
>>          goto Undo_ports;
>>      }
>>
>> +    sas_ha->id = ida_simple_get(&sas_ida, 0, 0, GFP_KERNEL);
>> +    if(sas_ha->id < 0)
>> +        goto Undo_ports;
>> +
>> +    memset(name, 0, 64);
> 
> Why memset and then sprintf?
> 
>> +    snprintf(name, 64, "sas-event-%d", sas_ha->id);
> 
> Can you just use unique dev_name(sas_ha->dev) to help form this name, so that you don't have to introduce IDR?

I checked sas_ha->dev and found that it points to a platform device, so it should be safe to use dev_name(sas_ha->dev), thanks!
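
A rough userspace sketch of forming the queue name from a device name follows. The helper
format_wq_name() and the sample device name used with it are assumptions for illustration
only; in the driver the second string would come from dev_name(sas_ha->dev). It also shows
why the memset() is not needed: snprintf() always NUL-terminates when the size is non-zero.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a workqueue name from a prefix and a
 * device name. snprintf() NUL-terminates the buffer when size > 0,
 * so the buffer does not have to be cleared beforehand.
 * Returns 0 on success, -1 if the name was truncated or on error.
 */
int format_wq_name(char *buf, size_t size, const char *prefix,
		   const char *dev_name)
{
	int n;

	assert(buf != NULL && size > 0);
	n = snprintf(buf, size, "%s-%s", prefix, dev_name);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

For example, format_wq_name(name, sizeof(name), "sas-event", "0000:03:00.0") fills name
with "sas-event-0000:03:00.0", with no separate per-host id required.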

> 
>> +    sas_ha->event_q = create_singlethread_workqueue(name);
>> +
>> +    /*
>> +     * sas-disc-xx workqueue run the discover work except
>> +     * probe and destruct.
>> +     */
>> +    snprintf(name, 64, "sas-disc-%d", sas_ha->id);
>> +    sas_ha->disc_q = create_singlethread_workqueue(name);
>> +    if(!sas_ha->event_q || !sas_ha->disc_q) {
>> +        ida_simple_remove(&sas_ida, sas_ha->id);
>> +        if (sas_ha->event_q)
>> +            destroy_workqueue(sas_ha->event_q);
>> +        if (sas_ha->disc_q)
>> +            destroy_workqueue(sas_ha->disc_q);
> 
> Can this error handling be a bit more concise?

OK, will refresh.
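
One possible shape for more concise error handling is the usual goto-based unwind, where
each failure jumps to a label that releases only what was already acquired. The sketch
below is a userspace model under that assumption: struct model_ha, model_register() and
model_unregister() are hypothetical names, and malloc()/free() stand in for creating and
destroying the two workqueues.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a host adapter owning two queues. */
struct model_ha {
	void *event_q;
	void *disc_q;
};

/*
 * Acquire both queues; on failure release what was acquired, in
 * reverse order, through a single unwind path. fail_second lets a
 * caller simulate a failure of the second allocation.
 */
int model_register(struct model_ha *ha, int fail_second)
{
	assert(ha != NULL);

	ha->event_q = malloc(1);
	if (!ha->event_q)
		goto err;

	ha->disc_q = fail_second ? NULL : malloc(1);
	if (!ha->disc_q)
		goto err_free_event_q;

	return 0;

err_free_event_q:
	free(ha->event_q);
	ha->event_q = NULL;
err:
	return -1;
}

/* Release both queues in reverse order of acquisition. */
void model_unregister(struct model_ha *ha)
{
	assert(ha != NULL);
	free(ha->disc_q);
	free(ha->event_q);
	ha->disc_q = NULL;
	ha->event_q = NULL;
}
```

The point of the layout is that no branch has to test which of the queues exists; the
label reached already encodes that.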

> 
>> +        goto Undo_ports;
>> +    }
>> +
>> +
>>      INIT_LIST_HEAD(&sas_ha->eh_done_q);
>>      INIT_LIST_HEAD(&sas_ha->eh_ata_q);
>>
>> @@ -181,6 +208,9 @@ int sas_unregister_ha(struct sas_ha_struct *sas_ha)
>>      __sas_drain_work(sas_ha);
>>      mutex_unlock(&sas_ha->drain_mutex);
>>
>> +    destroy_workqueue(sas_ha->event_q);
>> +    destroy_workqueue(sas_ha->disc_q);
>> +    ida_simple_remove(&sas_ida, sas_ha->id);
>>      return 0;
>>  }
>>
>> @@ -568,7 +598,6 @@ void sas_domain_release_transport(struct scsi_transport_template *stt)
>>  EXPORT_SYMBOL_GPL(sas_domain_release_transport);
>>
>>  /* ---------- SAS Class register/unregister ---------- */
>> -
>>  static int __init sas_class_init(void)
>>  {
>>      sas_task_cache = KMEM_CACHE(sas_task, SLAB_HWCACHE_ALIGN);
>> diff --git a/drivers/scsi/libsas/sas_internal.h b/drivers/scsi/libsas/sas_internal.h
>> index 33ce7e5..276df8e 100644
>> --- a/drivers/scsi/libsas/sas_internal.h
>> +++ b/drivers/scsi/libsas/sas_internal.h
>> @@ -100,6 +100,56 @@ void sas_free_device(struct kref *kref);
>>  extern const work_func_t sas_phy_event_fns[PHY_NUM_EVENTS];
>>  extern const work_func_t sas_port_event_fns[PORT_NUM_EVENTS];
>>
>> +static inline void wait_discover_event_init(struct asd_sas_port *port)
> 
> You need to change function names to have "sas" prefix. Actually these functions are all a bit messy.

OK.

> 
>> +{
>> +    if (port) {
> 
> These init and wait functions currently act as bookend wrappers. I think it may be better to put them in the wrapped function (if possible), as:
> a. probably then we don't need port NULL check
> b. handles situations where event is possibly not queued, like the suspected sas_discover_end_dev()
> 
>> +        init_completion(&port->disc.completion);
>> +        port->disc.wait = 1;
>> +    }
>> +}
>> +
>> +static inline void wait_for_discover_event_finish(
>> +        struct asd_sas_port *port)
>> +{
>> +    if (port && port->disc.wait == 1)
> 
> Can you just use completion_done() instead of introducing another variable in discovery_event.wait?

What I am worried about is that completion_done() may be called before we call wait_for_completion();
in this case, the process will hang.
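
For reference when weighing the ordering concern, here is a userspace sketch of how a
counting completion behaves, assuming the kernel semantics that complete() increments a
counter and wait_for_completion() consumes it. The model_* names are hypothetical and the
pthread primitives only stand in for the kernel wait queue; this is a model, not the
kernel implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace model of a counting completion; names are hypothetical. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;	/* completions signalled but not yet consumed */
};

void model_init_completion(struct model_completion *c)
{
	assert(c != NULL);
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Record one completion and wake a waiter, if there is one. */
void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until a completion is available, then consume it. */
void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}
```

In this model a model_complete() that runs before model_wait_for_completion() is not
lost: the waiter finds done already non-zero and returns without sleeping. The case that
still hangs is the one where no completion is ever signalled, for example when the event
was never queued.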


> 
>> +        wait_for_completion(&port->disc.completion);
>> +}
>> +
>> +static inline void wait_sas_event_init(struct asd_sas_port *port)
>> +{
>> +    if (port) {
>> +        init_completion(&port->completion);
>> +        port->busy = 0;
>> +    }
>> +}
>> +
>> +static inline void wait_for_sas_event_finish(
>> +        struct asd_sas_port *port)
>> +{
>> +    if (port && port->busy)
>> +        wait_for_completion(&port->completion);
>> +}
>> +
>> +static inline void sas_busy_port(struct asd_sas_port *port)
>> +{
>> +    if (port)
>> +        port->busy++;
> 
> Why not use kref?

Good idea, will replace.
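
As a sketch of what a kref-style counter could look like in place of the plain "busy"
integer: the counter starts at 1, each queued work takes a reference, and the release
callback runs only when the last reference is dropped. This is a userspace model with
hypothetical model_* names built on C11 atomics, not the kernel kref API; in the patch
the release callback would presumably be where complete() is called.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model of a reference counter with a release callback. */
struct model_ref {
	atomic_int count;
};

/* Number of times the example release callback has run. */
int model_release_calls;

/* Example release callback: just counts how often it was invoked. */
void model_count_release(struct model_ref *r)
{
	(void)r;
	model_release_calls++;
}

/* A new counter starts with one reference held by its creator. */
void model_ref_init(struct model_ref *r)
{
	atomic_init(&r->count, 1);
}

void model_ref_get(struct model_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Drop one reference; run release() and return 1 if it was the last. */
int model_ref_put(struct model_ref *r, void (*release)(struct model_ref *))
{
	assert(release != NULL);
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}
```

Compared with the open-coded "busy++" and "busy--", the atomic counter avoids lost updates
when get and put run on different CPUs without a lock.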

> 
>> +}
>> +
>> +static inline void sas_unbusy_port(struct asd_sas_port *port)
>> +{
>> +    if (port && (port->busy > 0)) {
>> +        port->busy--;
>> +        if (!port->busy)
>> +            complete(&port->completion);
>> +    }
>> +
>> +    if (port && (port->disc.wait == 1)) {
> 
> Why check port twice?

Will remove the second check, thanks!

> 
>> +        complete(&port->disc.completion);
>> +        port->disc.wait = 0;
>> +    }
>> +}
>> +
>>  #ifdef CONFIG_SCSI_SAS_HOST_SMP
>>  extern int sas_smp_host_handler(struct Scsi_Host *shost, struct request *req,
>>                  struct request *rsp);
>> diff --git a/drivers/scsi/libsas/sas_port.c b/drivers/scsi/libsas/sas_port.c
>> index 9326628..8d8b38c 100644
>> --- a/drivers/scsi/libsas/sas_port.c
>> +++ b/drivers/scsi/libsas/sas_port.c
>> @@ -191,7 +191,9 @@ static void sas_form_port(struct asd_sas_phy *phy)
>>      if (si->dft->lldd_port_formed)
>>          si->dft->lldd_port_formed(phy);
>>
>> +    wait_sas_event_init(port);
>>      sas_discover_event(phy->port, DISCE_DISCOVER_DOMAIN);
>> +    wait_for_sas_event_finish(port);
> 
> Is it neater to put these calls inside sas_discover_event()?

We now have two wait-completes: the first is for sas events, as in sas_form_port/sas_deform_port,
and the second is for sas_discover events, as in sas_revalidate_domain. Also, sas_discover_event()
may be called recursively, so it is difficult to put these calls inside it.


> 
>>  }
>>
>>  /**
>> @@ -218,7 +220,9 @@ void sas_deform_port(struct asd_sas_phy *phy, int gone)
>>          dev->pathways--;
>>
>>      if (port->num_phys == 1) {
>> +        wait_sas_event_init(port);
>>          sas_unregister_domain_devices(port, gone);
>> +        wait_for_sas_event_finish(port);
>>          sas_port_delete(port->port);
>>          port->port = NULL;
>>      } else {
>> diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
>> index c4444ad..4b931d4 100644
>> --- a/include/scsi/libsas.h
>> +++ b/include/scsi/libsas.h
>> @@ -240,6 +240,9 @@ static inline void INIT_SAS_WORK(struct sas_work *sw, void (*fn)(struct work_str
>>  struct sas_discovery_event {
>>      struct sas_work work;
>>      struct asd_sas_port *port;
>> +    enum discover_event    type;
>> +    int wait;
>> +    struct completion completion;
>>  };
>>
>>  static inline struct sas_discovery_event *to_sas_discovery_event(struct work_struct *work)
>> @@ -256,6 +259,8 @@ struct sas_discovery {
>>      u8     eeds_a[8];
>>      u8     eeds_b[8];
>>      int    max_level;
>> +    int    wait;
>> +    struct completion completion;
> 
> Again, does completion_done() do the same job as wait element?

same as above.

> 
>>  };
>>
>>  /* The port struct is Class:RW, driver:RO */
>> @@ -276,7 +281,8 @@ struct asd_sas_port {
>>
>>  /* public: */
>>      int id;
>> -
>> +    int busy;
>> +    struct completion completion;
> 
> I think public means LLDD can access, which is not the case

OK. will move them up.

> 
>>      enum sas_class   class;
>>      u8               sas_addr[SAS_ADDR_SIZE];
>>      u8               attached_sas_addr[SAS_ADDR_SIZE];
>> @@ -387,6 +393,7 @@ struct sas_ha_struct {
>>      int          eh_active;
>>      wait_queue_head_t eh_wait_q;
>>      struct list_head  eh_dev_q;
>> +    int       id; /* for create workqueue */
>>
>>      struct mutex disco_mutex;
>>
>> @@ -396,6 +403,8 @@ struct sas_ha_struct {
>>      char *sas_ha_name;
>>      struct device *dev;      /* should be set */
>>      struct module *lldd_module; /* should be set */
>> +    struct workqueue_struct    *event_q;
>> +    struct workqueue_struct    *disc_q;
>>
>>      u8 *sas_addr;          /* must be set */
>>      u8 hashed_sas_addr[HASHED_SAS_ADDR_SIZE];
>>
> 
> 
> .
> 

Thread overview: 17+ messages
2017-05-20  6:39 [PATCH 0/2] Enhance libsas hotplug feature Yijing Wang
2017-05-20  6:39 ` Yijing Wang
2017-05-20  6:39 ` [PATCH 1/2] libsas: Don't process sas events in static works Yijing Wang
2017-05-20  6:39   ` Yijing Wang
2017-05-21  3:44   ` Dan Williams
2017-05-22  5:54     ` wangyijing
2017-05-22  5:54       ` wangyijing
2017-05-22  9:28       ` John Garry
2017-05-22  9:28         ` John Garry
2017-05-23  6:39         ` wangyijing
2017-05-23  6:39           ` wangyijing
2017-05-20  6:39 ` [PATCH 2/2] libsas: Enhance libsas hotplug Yijing Wang
2017-05-20  6:39   ` Yijing Wang
2017-05-25  9:04   ` John Garry
2017-05-25  9:04     ` John Garry
2017-05-25 12:31     ` wangyijing
2017-05-25 12:31       ` wangyijing
