All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] hisi_sas: add abort and retry feature
@ 2016-02-16 12:22 ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

This patchset introduces support to abort
certain commands which have failed and retry.
Certain errors require that the command be
retried, like TRANS_TX_CREDIT_TIMEOUT_ERR in
v1 hw.
However, when these errors occur the IO may
still be active, so the IO must first be
aborted, and then retried. The HiSilicon
SAS controller has no FW to do this work, so
it needs to be done manually.

John Garry (6):
  hisi_sas: add TMF_RESP_FUNC_SUCC check
  hisi_sas: add hisi_sas_slot_abort()
  hisi_sas: use slot abort in v1 hw
  hisi_sas: use slot abort in v2 hw
  hisi_sas: add hisi_sas_slave_configure()
  hisi_sas: update driver version to 1.3

 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 66 ++++++++++++++++++++++++++++++++--
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 +++++++++---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 ++++++++--
 4 files changed, 101 insertions(+), 9 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 0/6] hisi_sas: add abort and retry feature
@ 2016-02-16 12:22 ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

This patchset introduces support to abort
certain commands which have failed and retry.
Certain errors require that the command be
retried, like TRANS_TX_CREDIT_TIMEOUT_ERR in
v1 hw.
However, when these errors occur the IO may
still be active, so the IO must first be
aborted, and then retried. The HiSilicon
SAS controller has no FW to do this work, so
it needs to be done manually.

John Garry (6):
  hisi_sas: add TMF_RESP_FUNC_SUCC check
  hisi_sas: add hisi_sas_slot_abort()
  hisi_sas: use slot abort in v1 hw
  hisi_sas: use slot abort in v2 hw
  hisi_sas: add hisi_sas_slave_configure()
  hisi_sas: update driver version to 1.3

 drivers/scsi/hisi_sas/hisi_sas.h       |  3 +-
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 66 ++++++++++++++++++++++++++++++++--
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 +++++++++---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 ++++++++--
 4 files changed, 101 insertions(+), 9 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/6] hisi_sas: add TMF_RESP_FUNC_SUCC check
  2016-02-16 12:22 ` John Garry
@ 2016-02-16 12:22   ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

When a tmf is issued, various response codes can be
returned from the target. For a query tmf the
response may be TMF_RESP_FUNC_COMPLETE or
TMF_RESP_FUNC_SUCC.
Add a condition for TMF_RESP_FUNC_SUCC.
Also, check for SAM_STAT_GOOD is replaced with
TMF_RESP_FUNC_COMPLETE, which is a genuine tmf
response code. SAM_STAT_GOOD and
TMF_RESP_FUNC_COMPLETE have the same value, so
this is why it worked before.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2194917..c600f5e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -661,12 +661,18 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
 		}
 
 		if (task->task_status.resp == SAS_TASK_COMPLETE &&
-		    task->task_status.stat == SAM_STAT_GOOD) {
+		     task->task_status.stat == TMF_RESP_FUNC_COMPLETE) {
 			res = TMF_RESP_FUNC_COMPLETE;
 			break;
 		}
 
 		if (task->task_status.resp == SAS_TASK_COMPLETE &&
+		    task->task_status.stat == TMF_RESP_FUNC_SUCC) {
+			res = TMF_RESP_FUNC_SUCC;
+			break;
+		}
+
+		if (task->task_status.resp == SAS_TASK_COMPLETE &&
 		      task->task_status.stat == SAS_DATA_UNDERRUN) {
 			/* no error, but return the number of bytes of
 			 * underrun
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 1/6] hisi_sas: add TMF_RESP_FUNC_SUCC check
@ 2016-02-16 12:22   ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

When a tmf is issued, various response codes can be
returned from the target. For a query tmf the
response may be TMF_RESP_FUNC_COMPLETE or
TMF_RESP_FUNC_SUCC.
Add a condition for TMF_RESP_FUNC_SUCC.
Also, check for SAM_STAT_GOOD is replaced with
TMF_RESP_FUNC_COMPLETE, which is a genuine tmf
response code. SAM_STAT_GOOD and
TMF_RESP_FUNC_COMPLETE have the same value, so
this is why it worked before.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2194917..c600f5e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -661,12 +661,18 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
 		}
 
 		if (task->task_status.resp == SAS_TASK_COMPLETE &&
-		    task->task_status.stat == SAM_STAT_GOOD) {
+		     task->task_status.stat == TMF_RESP_FUNC_COMPLETE) {
 			res = TMF_RESP_FUNC_COMPLETE;
 			break;
 		}
 
 		if (task->task_status.resp == SAS_TASK_COMPLETE &&
+		    task->task_status.stat == TMF_RESP_FUNC_SUCC) {
+			res = TMF_RESP_FUNC_SUCC;
+			break;
+		}
+
+		if (task->task_status.resp == SAS_TASK_COMPLETE &&
 		      task->task_status.stat == SAS_DATA_UNDERRUN) {
 			/* no error, but return the number of bytes of
 			 * underrun
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
  2016-02-16 12:22 ` John Garry
@ 2016-02-16 12:22   ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

Add a function to abort a slot (task) in the device
(if it is in the task set) and then cleanup and
complete the task.
The function is called from work queue context as
it cannot be called from the context where it is
triggered (interrupt).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 02da7e4..a05ce71 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -120,6 +120,7 @@ struct hisi_sas_slot {
 	dma_addr_t command_table_dma;
 	struct hisi_sas_sge_page *sge_page;
 	dma_addr_t sge_page_dma;
+	struct work_struct abort_slot;
 };
 
 struct hisi_sas_tmf_task {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index c600f5e..65509eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -15,6 +15,9 @@
 #define DEV_IS_GONE(dev) \
 	((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
 
+static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
+				u8 *lun, struct hisi_sas_tmf_task *tmf);
+
 static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
 {
 	return device->port->ha->lldd_ha;
@@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba *hisi_hba,
 	return hisi_hba->hw->prep_stp(hisi_hba, slot);
 }
 
+/*
+ * This function will issue an abort TMF if a task is still in
+ * the target. Then it will do the task complete cleanup and
+ * callbacks.
+ */
+static void hisi_sas_slot_abort(struct work_struct *work)
+{
+	struct hisi_sas_slot *abort_slot =
+		container_of(work, struct hisi_sas_slot, abort_slot);
+	struct sas_task *task = abort_slot->task;
+	struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
+	struct scsi_cmnd *cmnd = task->uldd_task;
+	struct hisi_sas_tmf_task tmf_task;
+	struct domain_device *device = task->dev;
+	struct hisi_sas_device *sas_dev = device->lldd_dev;
+	struct scsi_lun lun;
+	int tag = abort_slot->idx, rc;
+
+	int_to_scsilun(cmnd->device->lun, &lun);
+	tmf_task.tmf = TMF_QUERY_TASK;
+	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
+
+	/* TMF_RESP_FUNC_SUCC means task is present in the task set */
+	if (rc != TMF_RESP_FUNC_SUCC)
+		goto out;
+	tmf_task.tmf = TMF_ABORT_TASK;
+	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
+out:
+	hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
+	if (task->task_done)
+		task->task_done(task);
+	if (sas_dev && sas_dev->running_req)
+		sas_dev->running_req--;
+}
+
 static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
 			      int is_tmf, struct hisi_sas_tmf_task *tmf,
 			      int *pass)
@@ -206,6 +248,7 @@ static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
 	slot->task = task;
 	slot->port = port;
 	task->lldd_task = slot;
+	INIT_WORK(&slot->abort_slot, hisi_sas_slot_abort);
 
 	slot->status_buffer = dma_pool_alloc(hisi_hba->status_buffer_pool,
 					     GFP_ATOMIC,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
@ 2016-02-16 12:22   ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

Add a function to abort a slot (task) in the device
(if it is in the task set) and then cleanup and
complete the task.
The function is called from work queue context as
it cannot be called from the context where it is
triggered (interrupt).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
 drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 02da7e4..a05ce71 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -120,6 +120,7 @@ struct hisi_sas_slot {
 	dma_addr_t command_table_dma;
 	struct hisi_sas_sge_page *sge_page;
 	dma_addr_t sge_page_dma;
+	struct work_struct abort_slot;
 };
 
 struct hisi_sas_tmf_task {
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index c600f5e..65509eb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -15,6 +15,9 @@
 #define DEV_IS_GONE(dev) \
 	((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
 
+static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
+				u8 *lun, struct hisi_sas_tmf_task *tmf);
+
 static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
 {
 	return device->port->ha->lldd_ha;
@@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba *hisi_hba,
 	return hisi_hba->hw->prep_stp(hisi_hba, slot);
 }
 
+/*
+ * This function will issue an abort TMF if a task is still in
+ * the target. Then it will do the task complete cleanup and
+ * callbacks.
+ */
+static void hisi_sas_slot_abort(struct work_struct *work)
+{
+	struct hisi_sas_slot *abort_slot =
+		container_of(work, struct hisi_sas_slot, abort_slot);
+	struct sas_task *task = abort_slot->task;
+	struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
+	struct scsi_cmnd *cmnd = task->uldd_task;
+	struct hisi_sas_tmf_task tmf_task;
+	struct domain_device *device = task->dev;
+	struct hisi_sas_device *sas_dev = device->lldd_dev;
+	struct scsi_lun lun;
+	int tag = abort_slot->idx, rc;
+
+	int_to_scsilun(cmnd->device->lun, &lun);
+	tmf_task.tmf = TMF_QUERY_TASK;
+	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
+
+	/* TMF_RESP_FUNC_SUCC means task is present in the task set */
+	if (rc != TMF_RESP_FUNC_SUCC)
+		goto out;
+	tmf_task.tmf = TMF_ABORT_TASK;
+	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
+
+	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
+out:
+	hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
+	if (task->task_done)
+		task->task_done(task);
+	if (sas_dev && sas_dev->running_req)
+		sas_dev->running_req--;
+}
+
 static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
 			      int is_tmf, struct hisi_sas_tmf_task *tmf,
 			      int *pass)
@@ -206,6 +248,7 @@ static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
 	slot->task = task;
 	slot->port = port;
 	task->lldd_task = slot;
+	INIT_WORK(&slot->abort_slot, hisi_sas_slot_abort);
 
 	slot->status_buffer = dma_pool_alloc(hisi_hba->status_buffer_pool,
 					     GFP_ATOMIC,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/6] hisi_sas: use slot abort in v1 hw
  2016-02-16 12:22 ` John Garry
@ 2016-02-16 12:22   ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

When TRANS_TX_CREDIT_TIMEOUT_ERR or
TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
command, the command should be re-attempted.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index ce5f65d..34f71a1c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
 }
 
 /* by default, task resp is complete */
-static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
-			   struct sas_task *task,
-			   struct hisi_sas_slot *slot)
+static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
+			   struct hisi_sas_slot *slot, int *abort_slot)
 {
 	struct task_status_struct *ts = &task->task_status;
 	struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
@@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
 			ts->stat = SAS_NAK_R_ERR;
 			break;
 		}
+		case TRANS_TX_CREDIT_TIMEOUT_ERR:
+		case TRANS_TX_CLOSE_NORMAL_ERR:
+		{
+			/* This will request a retry */
+			ts->stat = SAS_QUEUE_FULL;
+			++(*abort_slot);
+			break;
+		}
 		default:
 		{
 			ts->stat = SAM_STAT_CHECK_CONDITION;
@@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
 
 	if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
 		!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
+		int abort_slot = 0;
 
-		slot_err_v1_hw(hisi_hba, task, slot);
+		slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
+		if (unlikely(abort_slot)) {
+			queue_work(hisi_hba->wq, &slot->abort_slot);
+			sts = ts->stat;
+			goto out_1;
+		}
 		goto out;
 	}
 
@@ -1375,6 +1388,7 @@ out:
 
 	if (task->task_done)
 		task->task_done(task);
+out_1:
 
 	return sts;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/6] hisi_sas: use slot abort in v1 hw
@ 2016-02-16 12:22   ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

When TRANS_TX_CREDIT_TIMEOUT_ERR or
TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
command, the command should be re-attempted.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index ce5f65d..34f71a1c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
 }
 
 /* by default, task resp is complete */
-static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
-			   struct sas_task *task,
-			   struct hisi_sas_slot *slot)
+static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
+			   struct hisi_sas_slot *slot, int *abort_slot)
 {
 	struct task_status_struct *ts = &task->task_status;
 	struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
@@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
 			ts->stat = SAS_NAK_R_ERR;
 			break;
 		}
+		case TRANS_TX_CREDIT_TIMEOUT_ERR:
+		case TRANS_TX_CLOSE_NORMAL_ERR:
+		{
+			/* This will request a retry */
+			ts->stat = SAS_QUEUE_FULL;
+			++(*abort_slot);
+			break;
+		}
 		default:
 		{
 			ts->stat = SAM_STAT_CHECK_CONDITION;
@@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
 
 	if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
 		!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
+		int abort_slot = 0;
 
-		slot_err_v1_hw(hisi_hba, task, slot);
+		slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
+		if (unlikely(abort_slot)) {
+			queue_work(hisi_hba->wq, &slot->abort_slot);
+			sts = ts->stat;
+			goto out_1;
+		}
 		goto out;
 	}
 
@@ -1375,6 +1388,7 @@ out:
 
 	if (task->task_done)
 		task->task_done(task);
+out_1:
 
 	return sts;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 4/6] hisi_sas: use slot abort in v2 hw
  2016-02-16 12:22 ` John Garry
@ 2016-02-16 12:22   ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

When TRANS_TX_ERR_FRAME_TXED error occurs for
a command, the command should be re-attempted.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 58e1956..2bf93079b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
 /* by default, task resp is complete */
 static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
 			   struct sas_task *task,
-			   struct hisi_sas_slot *slot)
+			   struct hisi_sas_slot *slot,
+			   int *abort_slot)
 {
 	struct task_status_struct *ts = &task->task_status;
 	struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
@@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
 			ts->stat = SAS_DATA_UNDERRUN;
 			break;
 		}
+		case TRANS_TX_ERR_FRAME_TXED:
+		{
+			/* This will request a retry */
+			ts->stat = SAS_QUEUE_FULL;
+			++(*abort_slot);
+			break;
+		}
 		case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
 		case TRANS_TX_ERR_PHY_NOT_ENABLE:
 		case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
@@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
 
 	if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
 		(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
+		int abort_slot = 0;
 		dev_dbg(dev, "%s slot %d has error info 0x%x\n",
 			__func__, slot->cmplt_queue_slot,
 			complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
 
-		slot_err_v2_hw(hisi_hba, task, slot);
+		slot_err_v2_hw(hisi_hba, task, slot, &abort_slot);
+		if (unlikely(abort_slot)) {
+			queue_work(hisi_hba->wq, &slot->abort_slot);
+			sts = ts->stat;
+			goto out_1;
+		}
 		goto out;
 	}
 
@@ -1555,6 +1569,7 @@ out:
 
 	if (task->task_done)
 		task->task_done(task);
+out_1:
 
 	return sts;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 4/6] hisi_sas: use slot abort in v2 hw
@ 2016-02-16 12:22   ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

When TRANS_TX_ERR_FRAME_TXED error occurs for
a command, the command should be re-attempted.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 58e1956..2bf93079b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
 /* by default, task resp is complete */
 static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
 			   struct sas_task *task,
-			   struct hisi_sas_slot *slot)
+			   struct hisi_sas_slot *slot,
+			   int *abort_slot)
 {
 	struct task_status_struct *ts = &task->task_status;
 	struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
@@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
 			ts->stat = SAS_DATA_UNDERRUN;
 			break;
 		}
+		case TRANS_TX_ERR_FRAME_TXED:
+		{
+			/* This will request a retry */
+			ts->stat = SAS_QUEUE_FULL;
+			++(*abort_slot);
+			break;
+		}
 		case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
 		case TRANS_TX_ERR_PHY_NOT_ENABLE:
 		case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
@@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
 
 	if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
 		(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
+		int abort_slot = 0;
 		dev_dbg(dev, "%s slot %d has error info 0x%x\n",
 			__func__, slot->cmplt_queue_slot,
 			complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
 
-		slot_err_v2_hw(hisi_hba, task, slot);
+		slot_err_v2_hw(hisi_hba, task, slot, &abort_slot);
+		if (unlikely(abort_slot)) {
+			queue_work(hisi_hba->wq, &slot->abort_slot);
+			sts = ts->stat;
+			goto out_1;
+		}
 		goto out;
 	}
 
@@ -1555,6 +1569,7 @@ out:
 
 	if (task->task_done)
 		task->task_done(task);
+out_1:
 
 	return sts;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-16 12:22 ` John Garry
@ 2016-02-16 12:22   ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

In high-datarate aging tests, it is found that
the SCSI framework can periodically
issue lu resets to the device. This is because scsi
commands begin to timeout. It is found that TASK SET
FULL may be returned many times for the same command,
causing the timeouts.
To overcome this, the queue depth for the device needs
to be reduced to 64 (from 256, set in
sas_slave_configure()).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 65509eb..dde7c5a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -454,6 +454,19 @@ static int hisi_sas_dev_found(struct domain_device *device)
 	return 0;
 }
 
+static int hisi_sas_slave_configure(struct scsi_device *sdev)
+{
+	struct domain_device *dev = sdev_to_domain_dev(sdev);
+	int ret = sas_slave_configure(sdev);
+
+	if (ret)
+		return ret;
+	if (!dev_is_sata(dev))
+		sas_change_queue_depth(sdev, 64);
+
+	return 0;
+}
+
 static void hisi_sas_scan_start(struct Scsi_Host *shost)
 {
 	struct hisi_hba *hisi_hba = shost_priv(shost);
@@ -997,7 +1010,7 @@ static struct scsi_host_template hisi_sas_sht = {
 	.name			= DRV_NAME,
 	.queuecommand		= sas_queuecommand,
 	.target_alloc		= sas_target_alloc,
-	.slave_configure	= sas_slave_configure,
+	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
 	.change_queue_depth	= sas_change_queue_depth,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-16 12:22   ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

In high-datarate aging tests, it is found that
the SCSI framework can periodically
issue lu resets to the device. This is because scsi
commands begin to timeout. It is found that TASK SET
FULL may be returned many times for the same command,
causing the timeouts.
To overcome this, the queue depth for the device needs
to be reduced to 64 (from 256, set in
sas_slave_configure()).

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 65509eb..dde7c5a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -454,6 +454,19 @@ static int hisi_sas_dev_found(struct domain_device *device)
 	return 0;
 }
 
+static int hisi_sas_slave_configure(struct scsi_device *sdev)
+{
+	struct domain_device *dev = sdev_to_domain_dev(sdev);
+	int ret = sas_slave_configure(sdev);
+
+	if (ret)
+		return ret;
+	if (!dev_is_sata(dev))
+		sas_change_queue_depth(sdev, 64);
+
+	return 0;
+}
+
 static void hisi_sas_scan_start(struct Scsi_Host *shost)
 {
 	struct hisi_hba *hisi_hba = shost_priv(shost);
@@ -997,7 +1010,7 @@ static struct scsi_host_template hisi_sas_sht = {
 	.name			= DRV_NAME,
 	.queuecommand		= sas_queuecommand,
 	.target_alloc		= sas_target_alloc,
-	.slave_configure	= sas_slave_configure,
+	.slave_configure	= hisi_sas_slave_configure,
 	.scan_finished		= hisi_sas_scan_finished,
 	.scan_start		= hisi_sas_scan_start,
 	.change_queue_depth	= sas_change_queue_depth,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6/6] hisi_sas: update driver version to 1.3
  2016-02-16 12:22 ` John Garry
@ 2016-02-16 12:22   ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index a05ce71..beb7318 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -23,7 +23,7 @@
 #include <scsi/sas_ata.h>
 #include <scsi/libsas.h>
 
-#define DRV_VERSION "v1.2"
+#define DRV_VERSION "v1.3"
 
 #define HISI_SAS_MAX_PHYS	9
 #define HISI_SAS_MAX_QUEUES	32
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6/6] hisi_sas: update driver version to 1.3
@ 2016-02-16 12:22   ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 12:22 UTC (permalink / raw)
  To: JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi,
	linux-kernel, John Garry

Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index a05ce71..beb7318 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -23,7 +23,7 @@
 #include <scsi/sas_ata.h>
 #include <scsi/libsas.h>
 
-#define DRV_VERSION "v1.2"
+#define DRV_VERSION "v1.3"
 
 #define HISI_SAS_MAX_PHYS	9
 #define HISI_SAS_MAX_QUEUES	32
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/6] hisi_sas: add TMF_RESP_FUNC_SUCC check
  2016-02-16 12:22   ` John Garry
  (?)
@ 2016-02-16 15:20   ` Hannes Reinecke
  -1 siblings, 0 replies; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:20 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 01:22 PM, John Garry wrote:
> When a tmf is issued, various response codes can be
> returned from the target. For a query tmf the
> response may be TMF_RESP_FUNC_COMPLETE or
> TMF_RESP_FUNC_SUCC.
> Add a condition for TMF_RESP_FUNC_SUCC.
> Also, check for SAM_STAT_GOOD is replaced with
> TMF_RESP_FUNC_COMPLETE, which is a genuine tmf
> response code. SAM_STAT_GOOD and
> TMF_RESP_FUNC_COMPLETE have the same value, so
> this is why it worked before.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_main.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 2194917..c600f5e 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -661,12 +661,18 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
>  		}
>  
>  		if (task->task_status.resp == SAS_TASK_COMPLETE &&
> -		    task->task_status.stat == SAM_STAT_GOOD) {
> +		     task->task_status.stat == TMF_RESP_FUNC_COMPLETE) {
>  			res = TMF_RESP_FUNC_COMPLETE;
>  			break;
>  		}
>  
>  		if (task->task_status.resp == SAS_TASK_COMPLETE &&
> +		    task->task_status.stat == TMF_RESP_FUNC_SUCC) {
> +			res = TMF_RESP_FUNC_SUCC;
> +			break;
> +		}
> +
> +		if (task->task_status.resp == SAS_TASK_COMPLETE &&
>  		      task->task_status.stat == SAS_DATA_UNDERRUN) {
>  			/* no error, but return the number of bytes of
>  			 * underrun
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
  2016-02-16 12:22   ` John Garry
  (?)
@ 2016-02-16 15:22   ` Hannes Reinecke
  2016-02-16 15:41       ` John Garry
  -1 siblings, 1 reply; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:22 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 01:22 PM, John Garry wrote:
> Add a function to abort a slot (task) in the device
> (if it is in the task set) and then cleanup and
> complete the task.
> The function is called from work queue context as
> it cannot be called from the context where it is
> triggered (interrupt).
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
>  drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
> index 02da7e4..a05ce71 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas.h
> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
> @@ -120,6 +120,7 @@ struct hisi_sas_slot {
>  	dma_addr_t command_table_dma;
>  	struct hisi_sas_sge_page *sge_page;
>  	dma_addr_t sge_page_dma;
> +	struct work_struct abort_slot;
>  };
>  
>  struct hisi_sas_tmf_task {
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index c600f5e..65509eb 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -15,6 +15,9 @@
>  #define DEV_IS_GONE(dev) \
>  	((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
>  
> +static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
> +				u8 *lun, struct hisi_sas_tmf_task *tmf);
> +
>  static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
>  {
>  	return device->port->ha->lldd_ha;
> @@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba *hisi_hba,
>  	return hisi_hba->hw->prep_stp(hisi_hba, slot);
>  }
>  
> +/*
> + * This function will issue an abort TMF if a task is still in
> + * the target. Then it will do the task complete cleanup and
> + * callbacks.
> + */
> +static void hisi_sas_slot_abort(struct work_struct *work)
> +{
> +	struct hisi_sas_slot *abort_slot =
> +		container_of(work, struct hisi_sas_slot, abort_slot);
> +	struct sas_task *task = abort_slot->task;
> +	struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
> +	struct scsi_cmnd *cmnd = task->uldd_task;
> +	struct hisi_sas_tmf_task tmf_task;
> +	struct domain_device *device = task->dev;
> +	struct hisi_sas_device *sas_dev = device->lldd_dev;
> +	struct scsi_lun lun;
> +	int tag = abort_slot->idx, rc;
> +
> +	int_to_scsilun(cmnd->device->lun, &lun);
> +	tmf_task.tmf = TMF_QUERY_TASK;
> +	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
> +
> +	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
> +
> +	/* TMF_RESP_FUNC_SUCC means task is present in the task set */
> +	if (rc != TMF_RESP_FUNC_SUCC)
> +		goto out;
> +	tmf_task.tmf = TMF_ABORT_TASK;
> +	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
> +
> +	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
> +out:
> +	hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
> +	if (task->task_done)
> +		task->task_done(task);
> +	if (sas_dev && sas_dev->running_req)
> +		sas_dev->running_req--;
> +}
> +
>  static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
>  			      int is_tmf, struct hisi_sas_tmf_task *tmf,
>  			      int *pass)
Do you really need to query the task first?
As per SAM a successful return from an ABORT TASK TMF has this meaning:

A response of FUNCTION COMPLETE shall indicate that the task was
aborted or was not in the task set.

Ie it doesn't matter if the task was present or not.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
  2016-02-16 12:22   ` John Garry
@ 2016-02-16 15:31     ` Hannes Reinecke
  -1 siblings, 0 replies; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:31 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 01:22 PM, John Garry wrote:
> When TRANS_TX_CREDIT_TIMEOUT_ERR or
> TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
> command, the command should be re-attempted.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> index ce5f65d..34f71a1c 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> @@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
>  }
>  
>  /* by default, task resp is complete */
> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
> -			   struct sas_task *task,
> -			   struct hisi_sas_slot *slot)
> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
> +			   struct hisi_sas_slot *slot, int *abort_slot)
>  {
>  	struct task_status_struct *ts = &task->task_status;
>  	struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>  			ts->stat = SAS_NAK_R_ERR;
>  			break;
>  		}
> +		case TRANS_TX_CREDIT_TIMEOUT_ERR:
> +		case TRANS_TX_CLOSE_NORMAL_ERR:
> +		{
> +			/* This will request a retry */
> +			ts->stat = SAS_QUEUE_FULL;
> +			++(*abort_slot);
> +			break;
> +		}
>  		default:
>  		{
>  			ts->stat = SAM_STAT_CHECK_CONDITION;
> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>  
>  	if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>  		!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
> +		int abort_slot = 0;
>  
> -		slot_err_v1_hw(hisi_hba, task, slot);
> +		slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
> +		if (unlikely(abort_slot)) {
> +			queue_work(hisi_hba->wq, &slot->abort_slot);
> +			sts = ts->stat;
> +			goto out_1;
> +		}
>  		goto out;
>  	}
>  
What is the 'abort_slot' variable for?
Currently it's just a counter, no?
So why the weird pointer passing?

And it does feel weird. Apparently the driver does get a message,
but still has to abort the command. Why?
Isn't the message an indicator that the command has been aborted?

Cheers,

Hannes

-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
@ 2016-02-16 15:31     ` Hannes Reinecke
  0 siblings, 0 replies; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:31 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 01:22 PM, John Garry wrote:
> When TRANS_TX_CREDIT_TIMEOUT_ERR or
> TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
> command, the command should be re-attempted.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> index ce5f65d..34f71a1c 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
> @@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
>  }
>  
>  /* by default, task resp is complete */
> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
> -			   struct sas_task *task,
> -			   struct hisi_sas_slot *slot)
> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
> +			   struct hisi_sas_slot *slot, int *abort_slot)
>  {
>  	struct task_status_struct *ts = &task->task_status;
>  	struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>  			ts->stat = SAS_NAK_R_ERR;
>  			break;
>  		}
> +		case TRANS_TX_CREDIT_TIMEOUT_ERR:
> +		case TRANS_TX_CLOSE_NORMAL_ERR:
> +		{
> +			/* This will request a retry */
> +			ts->stat = SAS_QUEUE_FULL;
> +			++(*abort_slot);
> +			break;
> +		}
>  		default:
>  		{
>  			ts->stat = SAM_STAT_CHECK_CONDITION;
> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>  
>  	if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>  		!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
> +		int abort_slot = 0;
>  
> -		slot_err_v1_hw(hisi_hba, task, slot);
> +		slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
> +		if (unlikely(abort_slot)) {
> +			queue_work(hisi_hba->wq, &slot->abort_slot);
> +			sts = ts->stat;
> +			goto out_1;
> +		}
>  		goto out;
>  	}
>  
What is the 'abort_slot' variable for?
Currently it's just a counter, no?
So why the weird pointer passing?

And it does feel weird. Apparently the driver does get a message,
but still has to abort the command. Why?
Isn't the message an indicator that the command has been aborted?

Cheers,

Hannes

-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/6] hisi_sas: use slot abort in v2 hw
  2016-02-16 12:22   ` John Garry
  (?)
@ 2016-02-16 15:32   ` Hannes Reinecke
  2016-02-16 16:58       ` John Garry
  -1 siblings, 1 reply; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:32 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 01:22 PM, John Garry wrote:
> When TRANS_TX_ERR_FRAME_TXED error occurs for
> a command, the command should be re-attempted.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
> index 58e1956..2bf93079b 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
> @@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
>  /* by default, task resp is complete */
>  static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>  			   struct sas_task *task,
> -			   struct hisi_sas_slot *slot)
> +			   struct hisi_sas_slot *slot,
> +			   int *abort_slot)
>  {
>  	struct task_status_struct *ts = &task->task_status;
>  	struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
> @@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>  			ts->stat = SAS_DATA_UNDERRUN;
>  			break;
>  		}
> +		case TRANS_TX_ERR_FRAME_TXED:
> +		{
> +			/* This will request a retry */
> +			ts->stat = SAS_QUEUE_FULL;
> +			++(*abort_slot);
> +			break;
> +		}
>  		case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
>  		case TRANS_TX_ERR_PHY_NOT_ENABLE:
>  		case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
> @@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
>  
>  	if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
>  		(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
> +		int abort_slot = 0;
>  		dev_dbg(dev, "%s slot %d has error info 0x%x\n",
>  			__func__, slot->cmplt_queue_slot,
>  			complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
>  
> -		slot_err_v2_hw(hisi_hba, task, slot);
> +		slot_err_v2_hw(hisi_hba, task, slot, &abort_slot);
> +		if (unlikely(abort_slot)) {
> +			queue_work(hisi_hba->wq, &slot->abort_slot);
> +			sts = ts->stat;
> +			goto out_1;
> +		}
>  		goto out;
>  	}
>  
Again this weird 'abort_slot' pointer reshuffling.
Care to clarify?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-16 12:22   ` John Garry
  (?)
@ 2016-02-16 15:33   ` Hannes Reinecke
  2016-02-16 16:56       ` John Garry
  -1 siblings, 1 reply; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-16 15:33 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 01:22 PM, John Garry wrote:
> In high-datarate aging tests, it is found that
> the SCSI framework can periodically
> issue lu resets to the device. This is because scsi
> commands begin to timeout. It is found that TASK SET
> FULL may be returned many times for the same command,
> causing the timeouts.
> To overcome this, the queue depth for the device needs
> to be reduced to 64 (from 256, set in
> sas_slave_configure()).
> 
Hmm. TASK SET FULL should cause the queue depth to be reduced
automatically, no?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
  2016-02-16 15:22   ` Hannes Reinecke
@ 2016-02-16 15:41       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 15:41 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:22, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> Add a function to abort a slot (task) in the device
>> (if it is in the task set) and then cleanup and
>> complete the task.
>> The function is called from work queue context as
>> it cannot be called from the context where it is
>> triggered (interrupt).
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
>>   drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 44 insertions(+)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
>> index 02da7e4..a05ce71 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>> @@ -120,6 +120,7 @@ struct hisi_sas_slot {
>>   	dma_addr_t command_table_dma;
>>   	struct hisi_sas_sge_page *sge_page;
>>   	dma_addr_t sge_page_dma;
>> +	struct work_struct abort_slot;
>>   };
>>
>>   struct hisi_sas_tmf_task {
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
>> index c600f5e..65509eb 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>> @@ -15,6 +15,9 @@
>>   #define DEV_IS_GONE(dev) \
>>   	((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
>>
>> +static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
>> +				u8 *lun, struct hisi_sas_tmf_task *tmf);
>> +
>>   static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
>>   {
>>   	return device->port->ha->lldd_ha;
>> @@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba *hisi_hba,
>>   	return hisi_hba->hw->prep_stp(hisi_hba, slot);
>>   }
>>
>> +/*
>> + * This function will issue an abort TMF if a task is still in
>> + * the target. Then it will do the task complete cleanup and
>> + * callbacks.
>> + */
>> +static void hisi_sas_slot_abort(struct work_struct *work)
>> +{
>> +	struct hisi_sas_slot *abort_slot =
>> +		container_of(work, struct hisi_sas_slot, abort_slot);
>> +	struct sas_task *task = abort_slot->task;
>> +	struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
>> +	struct scsi_cmnd *cmnd = task->uldd_task;
>> +	struct hisi_sas_tmf_task tmf_task;
>> +	struct domain_device *device = task->dev;
>> +	struct hisi_sas_device *sas_dev = device->lldd_dev;
>> +	struct scsi_lun lun;
>> +	int tag = abort_slot->idx, rc;
>> +
>> +	int_to_scsilun(cmnd->device->lun, &lun);
>> +	tmf_task.tmf = TMF_QUERY_TASK;
>> +	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>> +
>> +	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
>> +
>> +	/* TMF_RESP_FUNC_SUCC means task is present in the task set */
>> +	if (rc != TMF_RESP_FUNC_SUCC)
>> +		goto out;
>> +	tmf_task.tmf = TMF_ABORT_TASK;
>> +	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>> +
>> +	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
>> +out:
>> +	hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
>> +	if (task->task_done)
>> +		task->task_done(task);
>> +	if (sas_dev && sas_dev->running_req)
>> +		sas_dev->running_req--;
>> +}
>> +
>>   static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
>>   			      int is_tmf, struct hisi_sas_tmf_task *tmf,
>>   			      int *pass)
> Do you really need to query the task first?
> As per SAM a successful return from an ABORT TASK TMF has this meaning:
>
> A response of FUNCTION COMPLETE shall indicate that the task was
> aborted or was not in the task set.
>
> Ie it doesn't matter if the task was present or not.
>
> Cheers,
>
> Hannes
>

I think that it should be ok to the abort without first querying. I'll 
just test this to double-check. (This would make the first patch in the 
series superflous)

Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
@ 2016-02-16 15:41       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 15:41 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:22, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> Add a function to abort a slot (task) in the device
>> (if it is in the task set) and then cleanup and
>> complete the task.
>> The function is called from work queue context as
>> it cannot be called from the context where it is
>> triggered (interrupt).
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
>>   drivers/scsi/hisi_sas/hisi_sas_main.c | 43 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 44 insertions(+)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
>> index 02da7e4..a05ce71 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>> @@ -120,6 +120,7 @@ struct hisi_sas_slot {
>>   	dma_addr_t command_table_dma;
>>   	struct hisi_sas_sge_page *sge_page;
>>   	dma_addr_t sge_page_dma;
>> +	struct work_struct abort_slot;
>>   };
>>
>>   struct hisi_sas_tmf_task {
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
>> index c600f5e..65509eb 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>> @@ -15,6 +15,9 @@
>>   #define DEV_IS_GONE(dev) \
>>   	((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
>>
>> +static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
>> +				u8 *lun, struct hisi_sas_tmf_task *tmf);
>> +
>>   static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
>>   {
>>   	return device->port->ha->lldd_ha;
>> @@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct hisi_hba *hisi_hba,
>>   	return hisi_hba->hw->prep_stp(hisi_hba, slot);
>>   }
>>
>> +/*
>> + * This function will issue an abort TMF if a task is still in
>> + * the target. Then it will do the task complete cleanup and
>> + * callbacks.
>> + */
>> +static void hisi_sas_slot_abort(struct work_struct *work)
>> +{
>> +	struct hisi_sas_slot *abort_slot =
>> +		container_of(work, struct hisi_sas_slot, abort_slot);
>> +	struct sas_task *task = abort_slot->task;
>> +	struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
>> +	struct scsi_cmnd *cmnd = task->uldd_task;
>> +	struct hisi_sas_tmf_task tmf_task;
>> +	struct domain_device *device = task->dev;
>> +	struct hisi_sas_device *sas_dev = device->lldd_dev;
>> +	struct scsi_lun lun;
>> +	int tag = abort_slot->idx, rc;
>> +
>> +	int_to_scsilun(cmnd->device->lun, &lun);
>> +	tmf_task.tmf = TMF_QUERY_TASK;
>> +	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>> +
>> +	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
>> +
>> +	/* TMF_RESP_FUNC_SUCC means task is present in the task set */
>> +	if (rc != TMF_RESP_FUNC_SUCC)
>> +		goto out;
>> +	tmf_task.tmf = TMF_ABORT_TASK;
>> +	tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>> +
>> +	rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun, &tmf_task);
>> +out:
>> +	hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
>> +	if (task->task_done)
>> +		task->task_done(task);
>> +	if (sas_dev && sas_dev->running_req)
>> +		sas_dev->running_req--;
>> +}
>> +
>>   static int hisi_sas_task_prep(struct sas_task *task, struct hisi_hba *hisi_hba,
>>   			      int is_tmf, struct hisi_sas_tmf_task *tmf,
>>   			      int *pass)
> Do you really need to query the task first?
> As per SAM a successful return from an ABORT TASK TMF has this meaning:
>
> A response of FUNCTION COMPLETE shall indicate that the task was
> aborted or was not in the task set.
>
> Ie it doesn't matter if the task was present or not.
>
> Cheers,
>
> Hannes
>

I think that it should be ok to the abort without first querying. I'll 
just test this to double-check. (This would make the first patch in the 
series superflous)

Cheers,
John



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
  2016-02-16 15:31     ` Hannes Reinecke
@ 2016-02-16 16:13       ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 16:13 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:31, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> When TRANS_TX_CREDIT_TIMEOUT_ERR or
>> TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
>> command, the command should be re-attempted.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
>>   1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>> index ce5f65d..34f71a1c 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>> @@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
>>   }
>>
>>   /* by default, task resp is complete */
>> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>> -			   struct sas_task *task,
>> -			   struct hisi_sas_slot *slot)
>> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
>> +			   struct hisi_sas_slot *slot, int *abort_slot)
>>   {
>>   	struct task_status_struct *ts = &task->task_status;
>>   	struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
>> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>>   			ts->stat = SAS_NAK_R_ERR;
>>   			break;
>>   		}
>> +		case TRANS_TX_CREDIT_TIMEOUT_ERR:
>> +		case TRANS_TX_CLOSE_NORMAL_ERR:
>> +		{
>> +			/* This will request a retry */
>> +			ts->stat = SAS_QUEUE_FULL;
>> +			++(*abort_slot);
>> +			break;
>> +		}
>>   		default:
>>   		{
>>   			ts->stat = SAM_STAT_CHECK_CONDITION;
>> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>>
>>   	if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>   		!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>> +		int abort_slot = 0;
>>
>> -		slot_err_v1_hw(hisi_hba, task, slot);
>> +		slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>> +		if (unlikely(abort_slot)) {
>> +			queue_work(hisi_hba->wq, &slot->abort_slot);
>> +			sts = ts->stat;
>> +			goto out_1;
>> +		}
>>   		goto out;
>>   	}
>>
> What is the 'abort_slot' variable for?
> Currently it's just a counter, no?
> So why the weird pointer passing?
>
> And it does feel weird. Apparently the driver does get a message,
> but still has to abort the command. Why?
> Isn't the message an indicator that the command has been aborted?
>
> Cheers,
>
> Hannes
>

I'll paste some more code for convenience and to help clarify:

static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
                    struct hisi_sas_slot *slot, int abort)
{
...

     if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
         !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
         int abort_slot = 0;

         slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
         if (unlikely(abort_slot)) { /* check if we need to abort the 
task */
             queue_work(hisi_hba->wq, &slot->abort_slot);
             sts = ts->stat;
             goto out_1;
         }
         goto out;
     }

  ...

out:
     if (sas_dev && sas_dev->running_req)
         sas_dev->running_req--;

     hisi_sas_slot_task_free(hisi_hba, task, slot);
     sts = ts->stat;

     if (task->task_done)
         task->task_done(task);
out_1:

     return sts;
}

Variable abort_slot is really a boolean flag which can be set in 
slot_err_v1_hw(). When error TRANS_TX_CREDIT_TIMEOUT_ERR or 
TRANS_TX_CLOSE_NORMAL_ERR occurs in the slot, abort_slot is set. In this 
case we don't immediately complete the task (goto out and call 
hisi_sas_slot_task_free() and task->task_done()), but instead queue the 
task to be aborted in the device before completing (call queue_work() 
and then goto out_1).
When hisi_sas_slot_abort() [patch #2] runs in the workqueue for the 
task, it first aborts the task in the device with a TMF, and then 
completes the task. Finally the status (SAS_QUEUE_FULL) is passed back 
to SCSI framework, which will request a retry for the scsi command.

This is the method our hw people recommended to handle these types of 
errors.

Hope this explains,
Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
@ 2016-02-16 16:13       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 16:13 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:31, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> When TRANS_TX_CREDIT_TIMEOUT_ERR or
>> TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
>> command, the command should be re-attempted.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
>>   1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>> index ce5f65d..34f71a1c 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>> @@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba *hisi_hba,
>>   }
>>
>>   /* by default, task resp is complete */
>> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>> -			   struct sas_task *task,
>> -			   struct hisi_sas_slot *slot)
>> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
>> +			   struct hisi_sas_slot *slot, int *abort_slot)
>>   {
>>   	struct task_status_struct *ts = &task->task_status;
>>   	struct hisi_sas_err_record_v1 *err_record = slot->status_buffer;
>> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>>   			ts->stat = SAS_NAK_R_ERR;
>>   			break;
>>   		}
>> +		case TRANS_TX_CREDIT_TIMEOUT_ERR:
>> +		case TRANS_TX_CLOSE_NORMAL_ERR:
>> +		{
>> +			/* This will request a retry */
>> +			ts->stat = SAS_QUEUE_FULL;
>> +			++(*abort_slot);
>> +			break;
>> +		}
>>   		default:
>>   		{
>>   			ts->stat = SAM_STAT_CHECK_CONDITION;
>> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>>
>>   	if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>   		!(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>> +		int abort_slot = 0;
>>
>> -		slot_err_v1_hw(hisi_hba, task, slot);
>> +		slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>> +		if (unlikely(abort_slot)) {
>> +			queue_work(hisi_hba->wq, &slot->abort_slot);
>> +			sts = ts->stat;
>> +			goto out_1;
>> +		}
>>   		goto out;
>>   	}
>>
> What is the 'abort_slot' variable for?
> Currently it's just a counter, no?
> So why the weird pointer passing?
>
> And it does feel weird. Apparently the driver does get a message,
> but still has to abort the command. Why?
> Isn't the message an indicator that the command has been aborted?
>
> Cheers,
>
> Hannes
>

I'll paste some more code for convenience and to help clarify:

static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
                    struct hisi_sas_slot *slot, int abort)
{
...

     if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
         !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
         int abort_slot = 0;

         slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
         if (unlikely(abort_slot)) { /* check if we need to abort the 
task */
             queue_work(hisi_hba->wq, &slot->abort_slot);
             sts = ts->stat;
             goto out_1;
         }
         goto out;
     }

  ...

out:
     if (sas_dev && sas_dev->running_req)
         sas_dev->running_req--;

     hisi_sas_slot_task_free(hisi_hba, task, slot);
     sts = ts->stat;

     if (task->task_done)
         task->task_done(task);
out_1:

     return sts;
}

Variable abort_slot is really a boolean flag which can be set in 
slot_err_v1_hw(). When error TRANS_TX_CREDIT_TIMEOUT_ERR or 
TRANS_TX_CLOSE_NORMAL_ERR occurs in the slot, abort_slot is set. In this 
case we don't immediately complete the task (goto out and call 
hisi_sas_slot_task_free() and task->task_done()), but instead queue the 
task to be aborted in the device before completing (call queue_work() 
and then goto out_1).
When hisi_sas_slot_abort() [patch #2] runs in the workqueue for the 
task, it first aborts the task in the device with a TMF, and then 
completes the task. Finally the status (SAS_QUEUE_FULL) is passed back 
to SCSI framework, which will request a retry for the scsi command.

This is the method our hw people recommended to handle these types of 
errors.

Hope this explains,
Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-16 15:33   ` Hannes Reinecke
@ 2016-02-16 16:56       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 16:56 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:33, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> In high-datarate aging tests, it is found that
>> the SCSI framework can periodically
>> issue lu resets to the device. This is because scsi
>> commands begin to timeout. It is found that TASK SET
>> FULL may be returned many times for the same command,
>> causing the timeouts.
>> To overcome this, the queue depth for the device needs
>> to be reduced to 64 (from 256, set in
>> sas_slave_configure()).
>>
> Hmm. TASK SET FULL should cause the queue depth to be reduced
> automatically, no?
>
> Cheers,
>
> Hannes
>

I need to double-check if Task set full reduces the depth, I don't think 
it does.

Regardless I found we were getting a combination of commands being 
retried due to Task Set Full and also SAS_QUEUE_FULL errors. For sure 
the SAS_QUEUE_FULL task errors reduce the queue depth in 
scsi_track_queue_full(). However I found it to be very slow in tracking, 
and we were getting commands timing out before the queue depth fell enough.

It would be nice to change default queue depth in sas_slave_configure() 
to a lower value so we can avoid this patch, but I am not sure if other 
vendor's HBA performance would be affected. From looking at the history 
of sas_slave_configure(), it would seem the value of 256 was inherited 
from mpt2sas driver, so I'm not sure.

Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-16 16:56       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 16:56 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:33, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> In high-datarate aging tests, it is found that
>> the SCSI framework can periodically
>> issue lu resets to the device. This is because scsi
>> commands begin to timeout. It is found that TASK SET
>> FULL may be returned many times for the same command,
>> causing the timeouts.
>> To overcome this, the queue depth for the device needs
>> to be reduced to 64 (from 256, set in
>> sas_slave_configure()).
>>
> Hmm. TASK SET FULL should cause the queue depth to be reduced
> automatically, no?
>
> Cheers,
>
> Hannes
>

I need to double-check if Task set full reduces the depth, I don't think 
it does.

Regardless I found we were getting a combination of commands being 
retried due to Task Set Full and also SAS_QUEUE_FULL errors. For sure 
the SAS_QUEUE_FULL task errors reduce the queue depth in 
scsi_track_queue_full(). However I found it to be very slow in tracking, 
and we were getting commands timing out before the queue depth fell enough.

It would be nice to change default queue depth in sas_slave_configure() 
to a lower value so we can avoid this patch, but I am not sure if other 
vendor's HBA performance would be affected. From looking at the history 
of sas_slave_configure(), it would seem the value of 256 was inherited 
from mpt2sas driver, so I'm not sure.

Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/6] hisi_sas: use slot abort in v2 hw
  2016-02-16 15:32   ` Hannes Reinecke
@ 2016-02-16 16:58       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 16:58 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:32, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> When TRANS_TX_ERR_FRAME_TXED error occurs for
>> a command, the command should be re-attempted.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +++++++++++++++++--
>>   1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>> index 58e1956..2bf93079b 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>> @@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
>>   /* by default, task resp is complete */
>>   static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>>   			   struct sas_task *task,
>> -			   struct hisi_sas_slot *slot)
>> +			   struct hisi_sas_slot *slot,
>> +			   int *abort_slot)
>>   {
>>   	struct task_status_struct *ts = &task->task_status;
>>   	struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
>> @@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>>   			ts->stat = SAS_DATA_UNDERRUN;
>>   			break;
>>   		}
>> +		case TRANS_TX_ERR_FRAME_TXED:
>> +		{
>> +			/* This will request a retry */
>> +			ts->stat = SAS_QUEUE_FULL;
>> +			++(*abort_slot);
>> +			break;
>> +		}
>>   		case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
>>   		case TRANS_TX_ERR_PHY_NOT_ENABLE:
>>   		case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
>> @@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
>>
>>   	if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
>>   		(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
>> +		int abort_slot = 0;
>>   		dev_dbg(dev, "%s slot %d has error info 0x%x\n",
>>   			__func__, slot->cmplt_queue_slot,
>>   			complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
>>
>> -		slot_err_v2_hw(hisi_hba, task, slot);
>> +		slot_err_v2_hw(hisi_hba, task, slot, &abort_slot);
>> +		if (unlikely(abort_slot)) {
>> +			queue_work(hisi_hba->wq, &slot->abort_slot);
>> +			sts = ts->stat;
>> +			goto out_1;
>> +		}
>>   		goto out;
>>   	}
>>
> Again this weird 'abort_slot' pointer reshuffling.
> Care to clarify?
>
> Cheers,
>
> Hannes
>

Hopefully my explanation for patch #3 will help clarify here, as the 
code is functionally the same.

Thanks,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/6] hisi_sas: use slot abort in v2 hw
@ 2016-02-16 16:58       ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-16 16:58 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 16/02/2016 15:32, Hannes Reinecke wrote:
> On 02/16/2016 01:22 PM, John Garry wrote:
>> When TRANS_TX_ERR_FRAME_TXED error occurs for
>> a command, the command should be re-attempted.
>>
>> Signed-off-by: John Garry <john.garry@huawei.com>
>> ---
>>   drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 19 +++++++++++++++++--
>>   1 file changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>> index 58e1956..2bf93079b 100644
>> --- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
>> @@ -1190,7 +1190,8 @@ static void sata_done_v2_hw(struct hisi_hba *hisi_hba, struct sas_task *task,
>>   /* by default, task resp is complete */
>>   static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>>   			   struct sas_task *task,
>> -			   struct hisi_sas_slot *slot)
>> +			   struct hisi_sas_slot *slot,
>> +			   int *abort_slot)
>>   {
>>   	struct task_status_struct *ts = &task->task_status;
>>   	struct hisi_sas_err_record_v2 *err_record = slot->status_buffer;
>> @@ -1299,6 +1300,13 @@ static void slot_err_v2_hw(struct hisi_hba *hisi_hba,
>>   			ts->stat = SAS_DATA_UNDERRUN;
>>   			break;
>>   		}
>> +		case TRANS_TX_ERR_FRAME_TXED:
>> +		{
>> +			/* This will request a retry */
>> +			ts->stat = SAS_QUEUE_FULL;
>> +			++(*abort_slot);
>> +			break;
>> +		}
>>   		case TRANS_TX_OPEN_FAIL_WITH_IT_NEXUS_LOSS:
>>   		case TRANS_TX_ERR_PHY_NOT_ENABLE:
>>   		case TRANS_TX_OPEN_CNX_ERR_BY_OTHER:
>> @@ -1491,11 +1499,17 @@ slot_complete_v2_hw(struct hisi_hba *hisi_hba, struct hisi_sas_slot *slot,
>>
>>   	if ((complete_hdr->dw0 & CMPLT_HDR_ERX_MSK) &&
>>   		(!(complete_hdr->dw0 & CMPLT_HDR_RSPNS_XFRD_MSK))) {
>> +		int abort_slot = 0;
>>   		dev_dbg(dev, "%s slot %d has error info 0x%x\n",
>>   			__func__, slot->cmplt_queue_slot,
>>   			complete_hdr->dw0 & CMPLT_HDR_ERX_MSK);
>>
>> -		slot_err_v2_hw(hisi_hba, task, slot);
>> +		slot_err_v2_hw(hisi_hba, task, slot, &abort_slot);
>> +		if (unlikely(abort_slot)) {
>> +			queue_work(hisi_hba->wq, &slot->abort_slot);
>> +			sts = ts->stat;
>> +			goto out_1;
>> +		}
>>   		goto out;
>>   	}
>>
> Again this weird 'abort_slot' pointer reshuffling.
> Care to clarify?
>
> Cheers,
>
> Hannes
>

Hopefully my explanation for patch #3 will help clarify here, as the 
code is functionally the same.

Thanks,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
  2016-02-16 16:13       ` John Garry
  (?)
@ 2016-02-18  7:16       ` Hannes Reinecke
  2016-02-18  9:52           ` John Garry
  -1 siblings, 1 reply; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-18  7:16 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 05:13 PM, John Garry wrote:
> On 16/02/2016 15:31, Hannes Reinecke wrote:
>> On 02/16/2016 01:22 PM, John Garry wrote:
>>> When TRANS_TX_CREDIT_TIMEOUT_ERR or
>>> TRANS_TX_CLOSE_NORMAL_ERR errors occur for a
>>> command, the command should be re-attempted.
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 22 ++++++++++++++++++----
>>>   1 file changed, 18 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>>> b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>>> index ce5f65d..34f71a1c 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
>>> @@ -1118,9 +1118,8 @@ static int prep_ssp_v1_hw(struct hisi_hba
>>> *hisi_hba,
>>>   }
>>>
>>>   /* by default, task resp is complete */
>>> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>>> -               struct sas_task *task,
>>> -               struct hisi_sas_slot *slot)
>>> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct
>>> sas_task *task,
>>> +               struct hisi_sas_slot *slot, int *abort_slot)
>>>   {
>>>       struct task_status_struct *ts = &task->task_status;
>>>       struct hisi_sas_err_record_v1 *err_record =
>>> slot->status_buffer;
>>> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba
>>> *hisi_hba,
>>>               ts->stat = SAS_NAK_R_ERR;
>>>               break;
>>>           }
>>> +        case TRANS_TX_CREDIT_TIMEOUT_ERR:
>>> +        case TRANS_TX_CLOSE_NORMAL_ERR:
>>> +        {
>>> +            /* This will request a retry */
>>> +            ts->stat = SAS_QUEUE_FULL;
>>> +            ++(*abort_slot);
>>> +            break;
>>> +        }
>>>           default:
>>>           {
>>>               ts->stat = SAM_STAT_CHECK_CONDITION;
>>> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct
>>> hisi_hba *hisi_hba,
>>>
>>>       if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>>           !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>>> +        int abort_slot = 0;
>>>
>>> -        slot_err_v1_hw(hisi_hba, task, slot);
>>> +        slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>>> +        if (unlikely(abort_slot)) {
>>> +            queue_work(hisi_hba->wq, &slot->abort_slot);
>>> +            sts = ts->stat;
>>> +            goto out_1;
>>> +        }
>>>           goto out;
>>>       }
>>>
>> What is the 'abort_slot' variable for?
>> Currently it's just a counter, no?
>> So why the weird pointer passing?
>>
>> And it does feel weird. Apparently the driver does get a message,
>> but still has to abort the command. Why?
>> Isn't the message an indicator that the command has been aborted?
>>
>> Cheers,
>>
>> Hannes
>>
> 
> I'll paste some more code for convenience and to help clarify:
> 
> static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>                    struct hisi_sas_slot *slot, int abort)
> {
> ...
> 
>     if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>         !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>         int abort_slot = 0;
> 
>         slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>         if (unlikely(abort_slot)) { /* check if we need to abort the
> task */
>             queue_work(hisi_hba->wq, &slot->abort_slot);
>             sts = ts->stat;
>             goto out_1;
>         }
>         goto out;
>     }
> 
>  ...
> 
> out:
>     if (sas_dev && sas_dev->running_req)
>         sas_dev->running_req--;
> 
>     hisi_sas_slot_task_free(hisi_hba, task, slot);
>     sts = ts->stat;
> 
>     if (task->task_done)
>         task->task_done(task);
> out_1:
> 
>     return sts;
> }
> 
> Variable abort_slot is really a boolean flag which can be set in
> slot_err_v1_hw(). When error TRANS_TX_CREDIT_TIMEOUT_ERR or
> TRANS_TX_CLOSE_NORMAL_ERR occurs in the slot, abort_slot is set. In
> this case we don't immediately complete the task (goto out and call
> hisi_sas_slot_task_free() and task->task_done()), but instead queue
> the task to be aborted in the device before completing (call
> queue_work() and then goto out_1).
So why not make slot_err_vi_hw() a boolean and have abort_slot as
the return value?

> When hisi_sas_slot_abort() [patch #2] runs in the workqueue for the
> task, it first aborts the task in the device with a TMF, and then
> completes the task. Finally the status (SAS_QUEUE_FULL) is passed
> back to SCSI framework, which will request a retry for the scsi
> command.
> 
> This is the method our hw people recommended to handle these types
> of errors.
> 
Ok, sure, that does explain it.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-16 16:56       ` John Garry
  (?)
@ 2016-02-18  7:40       ` Hannes Reinecke
  2016-02-18 10:12           ` John Garry
  -1 siblings, 1 reply; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-18  7:40 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/16/2016 05:56 PM, John Garry wrote:
> On 16/02/2016 15:33, Hannes Reinecke wrote:
>> On 02/16/2016 01:22 PM, John Garry wrote:
>>> In high-datarate aging tests, it is found that
>>> the SCSI framework can periodically
>>> issue lu resets to the device. This is because scsi
>>> commands begin to timeout. It is found that TASK SET
>>> FULL may be returned many times for the same command,
>>> causing the timeouts.
>>> To overcome this, the queue depth for the device needs
>>> to be reduced to 64 (from 256, set in
>>> sas_slave_configure()).
>>>
>> Hmm. TASK SET FULL should cause the queue depth to be reduced
>> automatically, no?
>>
>> Cheers,
>>
>> Hannes
>>
> 
> I need to double-check if Task set full reduces the depth, I don't
> think it does.
> 
> Regardless I found we were getting a combination of commands being
> retried due to Task Set Full and also SAS_QUEUE_FULL errors. For
> sure the SAS_QUEUE_FULL task errors reduce the queue depth in
> scsi_track_queue_full(). However I found it to be very slow in
> tracking, and we were getting commands timing out before the queue
> depth fell enough.
> 
> It would be nice to change default queue depth in
> sas_slave_configure() to a lower value so we can avoid this patch,
> but I am not sure if other vendor's HBA performance would be
> affected. From looking at the history of sas_slave_configure(), it
> would seem the value of 256 was inherited from mpt2sas driver, so
> I'm not sure.
> 
Well, the classical thing would be to associate each request tag
with a SAS task; or, in your case, associate each slot index with a
request tag.
You probably would need to reserve some slots for TMFs, ie you'd
need to decrease the resulting ->can_queue variable by that.
But once you've done that you shouldn't hit any QUEUE_FULL issues,
as the block layer will ensure that no tags will be reused while the
command is in flight.
Plus this is something you really need to be doing if you ever
consider moving to scsi-mq ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
  2016-02-16 15:41       ` John Garry
@ 2016-02-18  9:30         ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18  9:30 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao

On 16/02/2016 15:41, John Garry wrote:
> On 16/02/2016 15:22, Hannes Reinecke wrote:
>> On 02/16/2016 01:22 PM, John Garry wrote:
>>> Add a function to abort a slot (task) in the device
>>> (if it is in the task set) and then cleanup and
>>> complete the task.
>>> The function is called from work queue context as
>>> it cannot be called from the context where it is
>>> triggered (interrupt).
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
>>>   drivers/scsi/hisi_sas/hisi_sas_main.c | 43
>>> +++++++++++++++++++++++++++++++++++
>>>   2 files changed, 44 insertions(+)
>>>
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h
>>> b/drivers/scsi/hisi_sas/hisi_sas.h
>>> index 02da7e4..a05ce71 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>>> @@ -120,6 +120,7 @@ struct hisi_sas_slot {
>>>       dma_addr_t command_table_dma;
>>>       struct hisi_sas_sge_page *sge_page;
>>>       dma_addr_t sge_page_dma;
>>> +    struct work_struct abort_slot;
>>>   };
>>>
>>>   struct hisi_sas_tmf_task {
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> b/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> index c600f5e..65509eb 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> @@ -15,6 +15,9 @@
>>>   #define DEV_IS_GONE(dev) \
>>>       ((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
>>>
>>> +static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
>>> +                u8 *lun, struct hisi_sas_tmf_task *tmf);
>>> +
>>>   static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
>>>   {
>>>       return device->port->ha->lldd_ha;
>>> @@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct
>>> hisi_hba *hisi_hba,
>>>       return hisi_hba->hw->prep_stp(hisi_hba, slot);
>>>   }
>>>
>>> +/*
>>> + * This function will issue an abort TMF if a task is still in
>>> + * the target. Then it will do the task complete cleanup and
>>> + * callbacks.
>>> + */
>>> +static void hisi_sas_slot_abort(struct work_struct *work)
>>> +{
>>> +    struct hisi_sas_slot *abort_slot =
>>> +        container_of(work, struct hisi_sas_slot, abort_slot);
>>> +    struct sas_task *task = abort_slot->task;
>>> +    struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
>>> +    struct scsi_cmnd *cmnd = task->uldd_task;
>>> +    struct hisi_sas_tmf_task tmf_task;
>>> +    struct domain_device *device = task->dev;
>>> +    struct hisi_sas_device *sas_dev = device->lldd_dev;
>>> +    struct scsi_lun lun;
>>> +    int tag = abort_slot->idx, rc;
>>> +
>>> +    int_to_scsilun(cmnd->device->lun, &lun);
>>> +    tmf_task.tmf = TMF_QUERY_TASK;
>>> +    tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>>> +
>>> +    rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun,
>>> &tmf_task);
>>> +
>>> +    /* TMF_RESP_FUNC_SUCC means task is present in the task set */
>>> +    if (rc != TMF_RESP_FUNC_SUCC)
>>> +        goto out;
>>> +    tmf_task.tmf = TMF_ABORT_TASK;
>>> +    tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>>> +
>>> +    rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun,
>>> &tmf_task);
>>> +out:
>>> +    hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
>>> +    if (task->task_done)
>>> +        task->task_done(task);
>>> +    if (sas_dev && sas_dev->running_req)
>>> +        sas_dev->running_req--;
>>> +}
>>> +
>>>   static int hisi_sas_task_prep(struct sas_task *task, struct
>>> hisi_hba *hisi_hba,
>>>                     int is_tmf, struct hisi_sas_tmf_task *tmf,
>>>                     int *pass)
>> Do you really need to query the task first?
>> As per SAM a successful return from an ABORT TASK TMF has this meaning:
>>
>> A response of FUNCTION COMPLETE shall indicate that the task was
>> aborted or was not in the task set.
>>
>> Ie it doesn't matter if the task was present or not.
>>
>> Cheers,
>>
>> Hannes
>>
>
> I think that it should be ok to the abort without first querying. I'll
> just test this to double-check. (This would make the first patch in the
> series superflous)
>
> Cheers,
> John
>
>
> _______________________________________________
> linuxarm mailing list
> linuxarm@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm

As suggested, it is ok to remove the query tmf.

Thanks,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort()
@ 2016-02-18  9:30         ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18  9:30 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao

On 16/02/2016 15:41, John Garry wrote:
> On 16/02/2016 15:22, Hannes Reinecke wrote:
>> On 02/16/2016 01:22 PM, John Garry wrote:
>>> Add a function to abort a slot (task) in the device
>>> (if it is in the task set) and then cleanup and
>>> complete the task.
>>> The function is called from work queue context as
>>> it cannot be called from the context where it is
>>> triggered (interrupt).
>>>
>>> Signed-off-by: John Garry <john.garry@huawei.com>
>>> ---
>>>   drivers/scsi/hisi_sas/hisi_sas.h      |  1 +
>>>   drivers/scsi/hisi_sas/hisi_sas_main.c | 43
>>> +++++++++++++++++++++++++++++++++++
>>>   2 files changed, 44 insertions(+)
>>>
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas.h
>>> b/drivers/scsi/hisi_sas/hisi_sas.h
>>> index 02da7e4..a05ce71 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas.h
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas.h
>>> @@ -120,6 +120,7 @@ struct hisi_sas_slot {
>>>       dma_addr_t command_table_dma;
>>>       struct hisi_sas_sge_page *sge_page;
>>>       dma_addr_t sge_page_dma;
>>> +    struct work_struct abort_slot;
>>>   };
>>>
>>>   struct hisi_sas_tmf_task {
>>> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> b/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> index c600f5e..65509eb 100644
>>> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
>>> @@ -15,6 +15,9 @@
>>>   #define DEV_IS_GONE(dev) \
>>>       ((!dev) || (dev->dev_type == SAS_PHY_UNUSED))
>>>
>>> +static int hisi_sas_debug_issue_ssp_tmf(struct domain_device *device,
>>> +                u8 *lun, struct hisi_sas_tmf_task *tmf);
>>> +
>>>   static struct hisi_hba *dev_to_hisi_hba(struct domain_device *device)
>>>   {
>>>       return device->port->ha->lldd_ha;
>>> @@ -113,6 +116,45 @@ static int hisi_sas_task_prep_ata(struct
>>> hisi_hba *hisi_hba,
>>>       return hisi_hba->hw->prep_stp(hisi_hba, slot);
>>>   }
>>>
>>> +/*
>>> + * This function will issue an abort TMF if a task is still in
>>> + * the target. Then it will do the task complete cleanup and
>>> + * callbacks.
>>> + */
>>> +static void hisi_sas_slot_abort(struct work_struct *work)
>>> +{
>>> +    struct hisi_sas_slot *abort_slot =
>>> +        container_of(work, struct hisi_sas_slot, abort_slot);
>>> +    struct sas_task *task = abort_slot->task;
>>> +    struct hisi_hba *hisi_hba = dev_to_hisi_hba(task->dev);
>>> +    struct scsi_cmnd *cmnd = task->uldd_task;
>>> +    struct hisi_sas_tmf_task tmf_task;
>>> +    struct domain_device *device = task->dev;
>>> +    struct hisi_sas_device *sas_dev = device->lldd_dev;
>>> +    struct scsi_lun lun;
>>> +    int tag = abort_slot->idx, rc;
>>> +
>>> +    int_to_scsilun(cmnd->device->lun, &lun);
>>> +    tmf_task.tmf = TMF_QUERY_TASK;
>>> +    tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>>> +
>>> +    rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun,
>>> &tmf_task);
>>> +
>>> +    /* TMF_RESP_FUNC_SUCC means task is present in the task set */
>>> +    if (rc != TMF_RESP_FUNC_SUCC)
>>> +        goto out;
>>> +    tmf_task.tmf = TMF_ABORT_TASK;
>>> +    tmf_task.tag_of_task_to_be_managed = cpu_to_le16(tag);
>>> +
>>> +    rc = hisi_sas_debug_issue_ssp_tmf(task->dev, lun.scsi_lun,
>>> &tmf_task);
>>> +out:
>>> +    hisi_sas_slot_task_free(hisi_hba, task, abort_slot);
>>> +    if (task->task_done)
>>> +        task->task_done(task);
>>> +    if (sas_dev && sas_dev->running_req)
>>> +        sas_dev->running_req--;
>>> +}
>>> +
>>>   static int hisi_sas_task_prep(struct sas_task *task, struct
>>> hisi_hba *hisi_hba,
>>>                     int is_tmf, struct hisi_sas_tmf_task *tmf,
>>>                     int *pass)
>> Do you really need to query the task first?
>> As per SAM a successful return from an ABORT TASK TMF has this meaning:
>>
>> A response of FUNCTION COMPLETE shall indicate that the task was
>> aborted or was not in the task set.
>>
>> Ie it doesn't matter if the task was present or not.
>>
>> Cheers,
>>
>> Hannes
>>
>
> I think that it should be ok to the abort without first querying. I'll
> just test this to double-check. (This would make the first patch in the
> series superflous)
>
> Cheers,
> John
>
>
> _______________________________________________
> linuxarm mailing list
> linuxarm@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm

As suggested, it is ok to remove the query tmf.

Thanks,
John


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
  2016-02-18  7:16       ` Hannes Reinecke
@ 2016-02-18  9:52           ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18  9:52 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

>>>>    /* by default, task resp is complete */
>>>> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>>>> -               struct sas_task *task,
>>>> -               struct hisi_sas_slot *slot)
>>>> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct
>>>> sas_task *task,
>>>> +               struct hisi_sas_slot *slot, int *abort_slot)
>>>>    {
>>>>        struct task_status_struct *ts = &task->task_status;
>>>>        struct hisi_sas_err_record_v1 *err_record =
>>>> slot->status_buffer;
>>>> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba
>>>> *hisi_hba,
>>>>                ts->stat = SAS_NAK_R_ERR;
>>>>                break;
>>>>            }
>>>> +        case TRANS_TX_CREDIT_TIMEOUT_ERR:
>>>> +        case TRANS_TX_CLOSE_NORMAL_ERR:
>>>> +        {
>>>> +            /* This will request a retry */
>>>> +            ts->stat = SAS_QUEUE_FULL;
>>>> +            ++(*abort_slot);
>>>> +            break;
>>>> +        }
>>>>            default:
>>>>            {
>>>>                ts->stat = SAM_STAT_CHECK_CONDITION;
>>>> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct
>>>> hisi_hba *hisi_hba,
>>>>
>>>>        if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>>>            !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>>>> +        int abort_slot = 0;
>>>>
>>>> -        slot_err_v1_hw(hisi_hba, task, slot);
>>>> +        slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>>>> +        if (unlikely(abort_slot)) {
>>>> +            queue_work(hisi_hba->wq, &slot->abort_slot);
>>>> +            sts = ts->stat;
>>>> +            goto out_1;
>>>> +        }
>>>>            goto out;
>>>>        }
>>>>
>>
>> static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>>                     struct hisi_sas_slot *slot, int abort)
>> {
>> ...
>>
>>      if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>          !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>>          int abort_slot = 0;
>>
>>          slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>>          if (unlikely(abort_slot)) { /* check if we need to abort the
>> task */
>>              queue_work(hisi_hba->wq, &slot->abort_slot);
>>              sts = ts->stat;
>>              goto out_1;
>>          }
>>          goto out;
>>      }

>>
>> Variable abort_slot is really a boolean flag which can be set in
>> slot_err_v1_hw(). When error TRANS_TX_CREDIT_TIMEOUT_ERR or
>> TRANS_TX_CLOSE_NORMAL_ERR occurs in the slot, abort_slot is set. In
>> this case we don't immediately complete the task (goto out and call
>> hisi_sas_slot_task_free() and task->task_done()), but instead queue
>> the task to be aborted in the device before completing (call
>> queue_work() and then goto out_1).
> So why not make slot_err_vi_hw() a boolean and have abort_slot as
> the return value?
>

I am not happy that this function should return anything, more 
specifically only whether the task should be aborted. I would be 
concerned that if it did return this value then it may have to be 
changed later on if the code needs to be changed. However it would make 
the code a bit tighter now.

Alternatively I could pass a pointer to a boolean (sounds bad), or even 
inline slot_err_v1_hw() in slot_complete_v1_hw(), as this is the only 
place it is called from.

Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/6] hisi_sas: use slot abort in v1 hw
@ 2016-02-18  9:52           ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18  9:52 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

>>>>    /* by default, task resp is complete */
>>>> -static void slot_err_v1_hw(struct hisi_hba *hisi_hba,
>>>> -               struct sas_task *task,
>>>> -               struct hisi_sas_slot *slot)
>>>> +static void slot_err_v1_hw(struct hisi_hba *hisi_hba, struct
>>>> sas_task *task,
>>>> +               struct hisi_sas_slot *slot, int *abort_slot)
>>>>    {
>>>>        struct task_status_struct *ts = &task->task_status;
>>>>        struct hisi_sas_err_record_v1 *err_record =
>>>> slot->status_buffer;
>>>> @@ -1212,6 +1211,14 @@ static void slot_err_v1_hw(struct hisi_hba
>>>> *hisi_hba,
>>>>                ts->stat = SAS_NAK_R_ERR;
>>>>                break;
>>>>            }
>>>> +        case TRANS_TX_CREDIT_TIMEOUT_ERR:
>>>> +        case TRANS_TX_CLOSE_NORMAL_ERR:
>>>> +        {
>>>> +            /* This will request a retry */
>>>> +            ts->stat = SAS_QUEUE_FULL;
>>>> +            ++(*abort_slot);
>>>> +            break;
>>>> +        }
>>>>            default:
>>>>            {
>>>>                ts->stat = SAM_STAT_CHECK_CONDITION;
>>>> @@ -1317,8 +1324,14 @@ static int slot_complete_v1_hw(struct
>>>> hisi_hba *hisi_hba,
>>>>
>>>>        if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>>>            !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>>>> +        int abort_slot = 0;
>>>>
>>>> -        slot_err_v1_hw(hisi_hba, task, slot);
>>>> +        slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>>>> +        if (unlikely(abort_slot)) {
>>>> +            queue_work(hisi_hba->wq, &slot->abort_slot);
>>>> +            sts = ts->stat;
>>>> +            goto out_1;
>>>> +        }
>>>>            goto out;
>>>>        }
>>>>
>>
>> static int slot_complete_v1_hw(struct hisi_hba *hisi_hba,
>>                     struct hisi_sas_slot *slot, int abort)
>> {
>> ...
>>
>>      if (cmplt_hdr_data & CMPLT_HDR_ERR_RCRD_XFRD_MSK &&
>>          !(cmplt_hdr_data & CMPLT_HDR_RSPNS_XFRD_MSK)) {
>>          int abort_slot = 0;
>>
>>          slot_err_v1_hw(hisi_hba, task, slot,  &abort_slot);
>>          if (unlikely(abort_slot)) { /* check if we need to abort the
>> task */
>>              queue_work(hisi_hba->wq, &slot->abort_slot);
>>              sts = ts->stat;
>>              goto out_1;
>>          }
>>          goto out;
>>      }

>>
>> Variable abort_slot is really a boolean flag which can be set in
>> slot_err_v1_hw(). When error TRANS_TX_CREDIT_TIMEOUT_ERR or
>> TRANS_TX_CLOSE_NORMAL_ERR occurs in the slot, abort_slot is set. In
>> this case we don't immediately complete the task (goto out and call
>> hisi_sas_slot_task_free() and task->task_done()), but instead queue
>> the task to be aborted in the device before completing (call
>> queue_work() and then goto out_1).
> So why not make slot_err_vi_hw() a boolean and have abort_slot as
> the return value?
>

I am not happy that this function should return anything, more 
specifically only whether the task should be aborted. I would be 
concerned that if it did return this value then it may have to be 
changed later on if the code needs to be changed. However it would make 
the code a bit tighter now.

Alternatively I could pass a pointer to a boolean (sounds bad), or even 
inline slot_err_v1_hw() in slot_complete_v1_hw(), as this is the only 
place it is called from.

Cheers,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-18  7:40       ` Hannes Reinecke
@ 2016-02-18 10:12           ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18 10:12 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 18/02/2016 07:40, Hannes Reinecke wrote:
> On 02/16/2016 05:56 PM, John Garry wrote:
>> On 16/02/2016 15:33, Hannes Reinecke wrote:
>>> On 02/16/2016 01:22 PM, John Garry wrote:
>>>> In high-datarate aging tests, it is found that
>>>> the SCSI framework can periodically
>>>> issue lu resets to the device. This is because scsi
>>>> commands begin to timeout. It is found that TASK SET
>>>> FULL may be returned many times for the same command,
>>>> causing the timeouts.
>>>> To overcome this, the queue depth for the device needs
>>>> to be reduced to 64 (from 256, set in
>>>> sas_slave_configure()).
>>>>
>>> Hmm. TASK SET FULL should cause the queue depth to be reduced
>>> automatically, no?
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>>
>> I need to double-check if Task set full reduces the depth, I don't
>> think it does.
>>
>> Regardless I found we were getting a combination of commands being
>> retried due to Task Set Full and also SAS_QUEUE_FULL errors. For
>> sure the SAS_QUEUE_FULL task errors reduce the queue depth in
>> scsi_track_queue_full(). However I found it to be very slow in
>> tracking, and we were getting commands timing out before the queue
>> depth fell enough.
>>
>> It would be nice to change default queue depth in
>> sas_slave_configure() to a lower value so we can avoid this patch,
>> but I am not sure if other vendor's HBA performance would be
>> affected. From looking at the history of sas_slave_configure(), it
>> would seem the value of 256 was inherited from mpt2sas driver, so
>> I'm not sure.
>>
> Well, the classical thing would be to associate each request tag
> with a SAS task; or, in your case, associate each slot index with a
> request tag.
> You probably would need to reserve some slots for TMFs, ie you'd
> need to decrease the resulting ->can_queue variable by that.
> But once you've done that you shouldn't hit any QUEUE_FULL issues,
> as the block layer will ensure that no tags will be reused while the
> command is in flight.
> Plus this is something you really need to be doing if you ever
> consider moving to scsi-mq ...
>
> Cheers,
>
> Hannes
>
Hi,

So would you recommend this method under the assumption that the 
can_queue value for the host is similar to the queue depth for the device?

Regards,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-18 10:12           ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18 10:12 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 18/02/2016 07:40, Hannes Reinecke wrote:
> On 02/16/2016 05:56 PM, John Garry wrote:
>> On 16/02/2016 15:33, Hannes Reinecke wrote:
>>> On 02/16/2016 01:22 PM, John Garry wrote:
>>>> In high-datarate aging tests, it is found that
>>>> the SCSI framework can periodically
>>>> issue lu resets to the device. This is because scsi
>>>> commands begin to timeout. It is found that TASK SET
>>>> FULL may be returned many times for the same command,
>>>> causing the timeouts.
>>>> To overcome this, the queue depth for the device needs
>>>> to be reduced to 64 (from 256, set in
>>>> sas_slave_configure()).
>>>>
>>> Hmm. TASK SET FULL should cause the queue depth to be reduced
>>> automatically, no?
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>>
>> I need to double-check if Task set full reduces the depth, I don't
>> think it does.
>>
>> Regardless I found we were getting a combination of commands being
>> retried due to Task Set Full and also SAS_QUEUE_FULL errors. For
>> sure the SAS_QUEUE_FULL task errors reduce the queue depth in
>> scsi_track_queue_full(). However I found it to be very slow in
>> tracking, and we were getting commands timing out before the queue
>> depth fell enough.
>>
>> It would be nice to change default queue depth in
>> sas_slave_configure() to a lower value so we can avoid this patch,
>> but I am not sure if other vendor's HBA performance would be
>> affected. From looking at the history of sas_slave_configure(), it
>> would seem the value of 256 was inherited from mpt2sas driver, so
>> I'm not sure.
>>
> Well, the classical thing would be to associate each request tag
> with a SAS task; or, in your case, associate each slot index with a
> request tag.
> You probably would need to reserve some slots for TMFs, ie you'd
> need to decrease the resulting ->can_queue variable by that.
> But once you've done that you shouldn't hit any QUEUE_FULL issues,
> as the block layer will ensure that no tags will be reused while the
> command is in flight.
> Plus this is something you really need to be doing if you ever
> consider moving to scsi-mq ...
>
> Cheers,
>
> Hannes
>
Hi,

So would you recommend this method under the assumption that the 
can_queue value for the host is similar to the queue depth for the device?

Regards,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-18 10:12           ` John Garry
  (?)
@ 2016-02-18 10:30           ` Hannes Reinecke
  2016-02-18 10:57               ` John Garry
  -1 siblings, 1 reply; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-18 10:30 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 02/18/2016 11:12 AM, John Garry wrote:
> On 18/02/2016 07:40, Hannes Reinecke wrote:
[ .. ]
>> Well, the classical thing would be to associate each request tag
>> with a SAS task; or, in your case, associate each slot index with a
>> request tag.
>> You probably would need to reserve some slots for TMFs, ie you'd
>> need to decrease the resulting ->can_queue variable by that.
>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>> as the block layer will ensure that no tags will be reused while the
>> command is in flight.
>> Plus this is something you really need to be doing if you ever
>> consider moving to scsi-mq ...
>>
>> Cheers,
>>
>> Hannes
>>
> Hi,
> 
> So would you recommend this method under the assumption that the
> can_queue value for the host is similar to the queue depth for the
> device?
> 
That depends.
Typically the can_queue setting reflects the number of commands the
_host_ can queue internally (due to hardware limitations etc).
They do not necessarily reflect the queue depth for the device
(unless you have a single device, of course).
So if the host has a hardware limit on the number of commands it can
queue, it should set the 'can_queue' variable to the appropriate
number; a host-wide shared tag map is always assumed with recent
kernels.

The queue_depth of an individual device is controlled by the
'cmd_per_lun' setting, and of course capped by can_queue.

But yes, I definitely recommend this method.
Is saves one _so much_ time trying to figure out which command slot
to use. Drawback is that you have to have some sort of fixed order
on them slots to do an efficient lookup.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-18 10:30           ` Hannes Reinecke
@ 2016-02-18 10:57               ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18 10:57 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 18/02/2016 10:30, Hannes Reinecke wrote:
> On 02/18/2016 11:12 AM, John Garry wrote:
>> On 18/02/2016 07:40, Hannes Reinecke wrote:
> [ .. ]
>>> Well, the classical thing would be to associate each request tag
>>> with a SAS task; or, in your case, associate each slot index with a
>>> request tag.
>>> You probably would need to reserve some slots for TMFs, ie you'd
>>> need to decrease the resulting ->can_queue variable by that.
>>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>>> as the block layer will ensure that no tags will be reused while the
>>> command is in flight.
>>> Plus this is something you really need to be doing if you ever
>>> consider moving to scsi-mq ...
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>> Hi,
>>
>> So would you recommend this method under the assumption that the
>> can_queue value for the host is similar to the queue depth for the
>> device?
>>
> That depends.
> Typically the can_queue setting reflects the number of commands the
> _host_ can queue internally (due to hardware limitations etc).
> They do not necessarily reflect the queue depth for the device
> (unless you have a single device, of course).
> So if the host has a hardware limit on the number of commands it can
> queue, it should set the 'can_queue' variable to the appropriate
> number; a host-wide shared tag map is always assumed with recent
> kernels.
>
> The queue_depth of an individual device is controlled by the
> 'cmd_per_lun' setting, and of course capped by can_queue.
>
> But yes, I definitely recommend this method.
> Is saves one _so much_ time trying to figure out which command slot
> to use. Drawback is that you have to have some sort of fixed order
> on them slots to do an efficient lookup.
>
> Cheers,
>
> Hannes
>

I would like to make a point on cmd_per_lun before considering tagging 
slots: For our host the can_queue is considerably greater than 
cmd_per_lun (even though we initially set the same in the host template, 
which would be incorrect). Regardless I find the host cmd_per_lun is 
effectively ignored for the slave device queue depth as it is reset in 
sas_slave_configure() to 256 [if this function is used and tagging 
enabled]. So if we we choose a reasonable cmd_per_lun for our host, it 
is ignored, right? Or am I missing something?

Thanks,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-18 10:57               ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-18 10:57 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linuxarm, zhangfei.gao, xuwei5, john.garry2, linux-scsi, linux-kernel

On 18/02/2016 10:30, Hannes Reinecke wrote:
> On 02/18/2016 11:12 AM, John Garry wrote:
>> On 18/02/2016 07:40, Hannes Reinecke wrote:
> [ .. ]
>>> Well, the classical thing would be to associate each request tag
>>> with a SAS task; or, in your case, associate each slot index with a
>>> request tag.
>>> You probably would need to reserve some slots for TMFs, ie you'd
>>> need to decrease the resulting ->can_queue variable by that.
>>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>>> as the block layer will ensure that no tags will be reused while the
>>> command is in flight.
>>> Plus this is something you really need to be doing if you ever
>>> consider moving to scsi-mq ...
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>> Hi,
>>
>> So would you recommend this method under the assumption that the
>> can_queue value for the host is similar to the queue depth for the
>> device?
>>
> That depends.
> Typically the can_queue setting reflects the number of commands the
> _host_ can queue internally (due to hardware limitations etc).
> They do not necessarily reflect the queue depth for the device
> (unless you have a single device, of course).
> So if the host has a hardware limit on the number of commands it can
> queue, it should set the 'can_queue' variable to the appropriate
> number; a host-wide shared tag map is always assumed with recent
> kernels.
>
> The queue_depth of an individual device is controlled by the
> 'cmd_per_lun' setting, and of course capped by can_queue.
>
> But yes, I definitely recommend this method.
> Is saves one _so much_ time trying to figure out which command slot
> to use. Drawback is that you have to have some sort of fixed order
> on them slots to do an efficient lookup.
>
> Cheers,
>
> Hannes
>

I would like to make a point on cmd_per_lun before considering tagging 
slots: For our host the can_queue is considerably greater than 
cmd_per_lun (even though we initially set the same in the host template, 
which would be incorrect). Regardless I find the host cmd_per_lun is 
effectively ignored for the slave device queue depth as it is reset in 
sas_slave_configure() to 256 [if this function is used and tagging 
enabled]. So if we we choose a reasonable cmd_per_lun for our host, it 
is ignored, right? Or am I missing something?

Thanks,
John

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-18 10:57               ` John Garry
@ 2016-02-19 10:46                 ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-19 10:46 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao

On 18/02/2016 10:57, John Garry wrote:
> On 18/02/2016 10:30, Hannes Reinecke wrote:
>> On 02/18/2016 11:12 AM, John Garry wrote:
>>> On 18/02/2016 07:40, Hannes Reinecke wrote:
>> [ .. ]
>>>> Well, the classical thing would be to associate each request tag
>>>> with a SAS task; or, in your case, associate each slot index with a
>>>> request tag.
>>>> You probably would need to reserve some slots for TMFs, ie you'd
>>>> need to decrease the resulting ->can_queue variable by that.
>>>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>>>> as the block layer will ensure that no tags will be reused while the
>>>> command is in flight.
>>>> Plus this is something you really need to be doing if you ever
>>>> consider moving to scsi-mq ...
>>>>
>>>> Cheers,
>>>>
>>>> Hannes
>>>>
>>> Hi,
>>>
>>> So would you recommend this method under the assumption that the
>>> can_queue value for the host is similar to the queue depth for the
>>> device?
>>>
>> That depends.
>> Typically the can_queue setting reflects the number of commands the
>> _host_ can queue internally (due to hardware limitations etc).
>> They do not necessarily reflect the queue depth for the device
>> (unless you have a single device, of course).
>> So if the host has a hardware limit on the number of commands it can
>> queue, it should set the 'can_queue' variable to the appropriate
>> number; a host-wide shared tag map is always assumed with recent
>> kernels.
>>
>> The queue_depth of an individual device is controlled by the
>> 'cmd_per_lun' setting, and of course capped by can_queue.
>>
>> But yes, I definitely recommend this method.
>> Is saves one _so much_ time trying to figure out which command slot
>> to use. Drawback is that you have to have some sort of fixed order
>> on them slots to do an efficient lookup.
>>
>> Cheers,
>>
>> Hannes
>>
>
> I would like to make a point on cmd_per_lun before considering tagging
> slots: For our host the can_queue is considerably greater than
> cmd_per_lun (even though we initially set the same in the host template,
> which would be incorrect). Regardless I find the host cmd_per_lun is
> effectively ignored for the slave device queue depth as it is reset in
> sas_slave_configure() to 256 [if this function is used and tagging
> enabled]. So if we we choose a reasonable cmd_per_lun for our host, it
> is ignored, right? Or am I missing something?
>
> Thanks,
> John
>
>

I would like to make another point about why I am making this change in 
case it is not clear. The queue full events are form 
TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in the 
slot: I want the slot retried when this occurs, so I set status as 
SAS_QUEUE_FULL just so we will report DID_SOFT_ERR to SCSI midlayer so 
we get a retry. I could use SAS_OPEN_REJECT alternatively as the error 
which would have the same affect.
The queue full are not from all slots being consumed in the HBA.

thanks again,
John


> _______________________________________________
> linuxarm mailing list
> linuxarm@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-19 10:46                 ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-19 10:46 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao

On 18/02/2016 10:57, John Garry wrote:
> On 18/02/2016 10:30, Hannes Reinecke wrote:
>> On 02/18/2016 11:12 AM, John Garry wrote:
>>> On 18/02/2016 07:40, Hannes Reinecke wrote:
>> [ .. ]
>>>> Well, the classical thing would be to associate each request tag
>>>> with a SAS task; or, in your case, associate each slot index with a
>>>> request tag.
>>>> You probably would need to reserve some slots for TMFs, ie you'd
>>>> need to decrease the resulting ->can_queue variable by that.
>>>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>>>> as the block layer will ensure that no tags will be reused while the
>>>> command is in flight.
>>>> Plus this is something you really need to be doing if you ever
>>>> consider moving to scsi-mq ...
>>>>
>>>> Cheers,
>>>>
>>>> Hannes
>>>>
>>> Hi,
>>>
>>> So would you recommend this method under the assumption that the
>>> can_queue value for the host is similar to the queue depth for the
>>> device?
>>>
>> That depends.
>> Typically the can_queue setting reflects the number of commands the
>> _host_ can queue internally (due to hardware limitations etc).
>> They do not necessarily reflect the queue depth for the device
>> (unless you have a single device, of course).
>> So if the host has a hardware limit on the number of commands it can
>> queue, it should set the 'can_queue' variable to the appropriate
>> number; a host-wide shared tag map is always assumed with recent
>> kernels.
>>
>> The queue_depth of an individual device is controlled by the
>> 'cmd_per_lun' setting, and of course capped by can_queue.
>>
>> But yes, I definitely recommend this method.
>> Is saves one _so much_ time trying to figure out which command slot
>> to use. Drawback is that you have to have some sort of fixed order
>> on them slots to do an efficient lookup.
>>
>> Cheers,
>>
>> Hannes
>>
>
> I would like to make a point on cmd_per_lun before considering tagging
> slots: For our host the can_queue is considerably greater than
> cmd_per_lun (even though we initially set the same in the host template,
> which would be incorrect). Regardless I find the host cmd_per_lun is
> effectively ignored for the slave device queue depth as it is reset in
> sas_slave_configure() to 256 [if this function is used and tagging
> enabled]. So if we we choose a reasonable cmd_per_lun for our host, it
> is ignored, right? Or am I missing something?
>
> Thanks,
> John
>
>

I would like to make another point about why I am making this change in 
case it is not clear. The queue full events are form 
TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in the 
slot: I want the slot retried when this occurs, so I set status as 
SAS_QUEUE_FULL just so we will report DID_SOFT_ERR to SCSI midlayer so 
we get a retry. I could use SAS_OPEN_REJECT alternatively as the error 
which would have the same affect.
The queue full are not from all slots being consumed in the HBA.

thanks again,
John


> _______________________________________________
> linuxarm mailing list
> linuxarm@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-19 10:46                 ` John Garry
@ 2016-02-19 14:31                   ` Hannes Reinecke
  -1 siblings, 0 replies; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-19 14:31 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao

On 02/19/2016 11:46 AM, John Garry wrote:
> On 18/02/2016 10:57, John Garry wrote:
>> On 18/02/2016 10:30, Hannes Reinecke wrote:
>>> On 02/18/2016 11:12 AM, John Garry wrote:
>>>> On 18/02/2016 07:40, Hannes Reinecke wrote:
>>> [ .. ]
>>>>> Well, the classical thing would be to associate each request tag
>>>>> with a SAS task; or, in your case, associate each slot index
>>>>> with a
>>>>> request tag.
>>>>> You probably would need to reserve some slots for TMFs, ie you'd
>>>>> need to decrease the resulting ->can_queue variable by that.
>>>>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>>>>> as the block layer will ensure that no tags will be reused
>>>>> while the
>>>>> command is in flight.
>>>>> Plus this is something you really need to be doing if you ever
>>>>> consider moving to scsi-mq ...
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Hannes
>>>>>
>>>> Hi,
>>>>
>>>> So would you recommend this method under the assumption that the
>>>> can_queue value for the host is similar to the queue depth for the
>>>> device?
>>>>
>>> That depends.
>>> Typically the can_queue setting reflects the number of commands the
>>> _host_ can queue internally (due to hardware limitations etc).
>>> They do not necessarily reflect the queue depth for the device
>>> (unless you have a single device, of course).
>>> So if the host has a hardware limit on the number of commands it can
>>> queue, it should set the 'can_queue' variable to the appropriate
>>> number; a host-wide shared tag map is always assumed with recent
>>> kernels.
>>>
>>> The queue_depth of an individual device is controlled by the
>>> 'cmd_per_lun' setting, and of course capped by can_queue.
>>>
>>> But yes, I definitely recommend this method.
>>> Is saves one _so much_ time trying to figure out which command slot
>>> to use. Drawback is that you have to have some sort of fixed order
>>> on them slots to do an efficient lookup.
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>>
>> I would like to make a point on cmd_per_lun before considering
>> tagging slots: For our host the can_queue is considerably greater than
>> cmd_per_lun (even though we initially set the same in the host
>> template, which would be incorrect).
That is common behaviour; most hosts support more than one LUN, and
setting cmd_per_lun to a lower value will attempt to distribute the
available commands across all LUNs.

>> Regardless I find the host cmd_per_lun is effectively ignored for
>> the slave device queue depth as it is reset in
>> sas_slave_configure() to 256 [if this function is used and tagging
>> enabled]. So if we we choose a reasonable cmd_per_lun for our
>> host, it is ignored, right? Or am I missing something?
>>
Basically, yes.
As said above, the cmd_per_lun is a _very_ rough attempt to
distribute commands across several LUNs.
However, in general there's no need to rely on that, as each queue
(which is associated with the LUN) will be controlled individually
via the queue_depth ramp-down/ramp-up mechanism.

> I would like to make another point about why I am making this change
> in case it is not clear. The queue full events are form
> TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in
> the slot: I want the slot retried when this occurs, so I set status
> as SAS_QUEUE_FULL just so we will report DID_SOFT_ERR to SCSI
> midlayer so we get a retry. I could use SAS_OPEN_REJECT
> alternatively as the error which would have the same affect.
> The queue full are not from all slots being consumed in the HBA.
> 
Ah, right. So you might be getting those errors even with some free
slots on the HBA. As such they are roughly equivalent to a
QUEUE_FULL SCSI statue, right?
So after reading SPL I guess you are right here; using tags wouldn't
help for this situation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-19 14:31                   ` Hannes Reinecke
  0 siblings, 0 replies; 45+ messages in thread
From: Hannes Reinecke @ 2016-02-19 14:31 UTC (permalink / raw)
  To: John Garry, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao

On 02/19/2016 11:46 AM, John Garry wrote:
> On 18/02/2016 10:57, John Garry wrote:
>> On 18/02/2016 10:30, Hannes Reinecke wrote:
>>> On 02/18/2016 11:12 AM, John Garry wrote:
>>>> On 18/02/2016 07:40, Hannes Reinecke wrote:
>>> [ .. ]
>>>>> Well, the classical thing would be to associate each request tag
>>>>> with a SAS task; or, in your case, associate each slot index
>>>>> with a
>>>>> request tag.
>>>>> You probably would need to reserve some slots for TMFs, ie you'd
>>>>> need to decrease the resulting ->can_queue variable by that.
>>>>> But once you've done that you shouldn't hit any QUEUE_FULL issues,
>>>>> as the block layer will ensure that no tags will be reused
>>>>> while the
>>>>> command is in flight.
>>>>> Plus this is something you really need to be doing if you ever
>>>>> consider moving to scsi-mq ...
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Hannes
>>>>>
>>>> Hi,
>>>>
>>>> So would you recommend this method under the assumption that the
>>>> can_queue value for the host is similar to the queue depth for the
>>>> device?
>>>>
>>> That depends.
>>> Typically the can_queue setting reflects the number of commands the
>>> _host_ can queue internally (due to hardware limitations etc).
>>> They do not necessarily reflect the queue depth for the device
>>> (unless you have a single device, of course).
>>> So if the host has a hardware limit on the number of commands it can
>>> queue, it should set the 'can_queue' variable to the appropriate
>>> number; a host-wide shared tag map is always assumed with recent
>>> kernels.
>>>
>>> The queue_depth of an individual device is controlled by the
>>> 'cmd_per_lun' setting, and of course capped by can_queue.
>>>
>>> But yes, I definitely recommend this method.
>>> Is saves one _so much_ time trying to figure out which command slot
>>> to use. Drawback is that you have to have some sort of fixed order
>>> on them slots to do an efficient lookup.
>>>
>>> Cheers,
>>>
>>> Hannes
>>>
>>
>> I would like to make a point on cmd_per_lun before considering
>> tagging slots: For our host the can_queue is considerably greater than
>> cmd_per_lun (even though we initially set the same in the host
>> template, which would be incorrect).
That is common behaviour; most hosts support more than one LUN, and
setting cmd_per_lun to a lower value will attempt to distribute the
available commands across all LUNs.

>> Regardless I find the host cmd_per_lun is effectively ignored for
>> the slave device queue depth as it is reset in
>> sas_slave_configure() to 256 [if this function is used and tagging
>> enabled]. So if we we choose a reasonable cmd_per_lun for our
>> host, it is ignored, right? Or am I missing something?
>>
Basically, yes.
As said above, the cmd_per_lun is a _very_ rough attempt to
distribute commands across several LUNs.
However, in general there's no need to rely on that, as each queue
(which is associated with the LUN) will be controlled individually
via the queue_depth ramp-down/ramp-up mechanism.

> I would like to make another point about why I am making this change
> in case it is not clear. The queue full events are form
> TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in
> the slot: I want the slot retried when this occurs, so I set status
> as SAS_QUEUE_FULL just so we will report DID_SOFT_ERR to SCSI
> midlayer so we get a retry. I could use SAS_OPEN_REJECT
> alternatively as the error which would have the same affect.
> The queue full are not from all slots being consumed in the HBA.
> 
Ah, right. So you might be getting those errors even with some free
slots on the HBA. As such they are roughly equivalent to a
QUEUE_FULL SCSI statue, right?
So after reading SPL I guess you are right here; using tags wouldn't
help for this situation.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
  2016-02-19 14:31                   ` Hannes Reinecke
@ 2016-02-22 10:02                     ` John Garry
  -1 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-22 10:02 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao


>> I would like to make another point about why I am making this change
>> in case it is not clear. The queue full events are form
>> TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in
>> the slot: I want the slot retried when this occurs, so I set status
>> as SAS_QUEUE_FULL just so we will report DID_SOFT_ERR to SCSI
>> midlayer so we get a retry. I could use SAS_OPEN_REJECT
>> alternatively as the error which would have the same affect.
>> The queue full are not from all slots being consumed in the HBA.
>>
> Ah, right. So you might be getting those errors even with some free
> slots on the HBA. As such they are roughly equivalent to a
> QUEUE_FULL SCSI statue, right?
> So after reading SPL I guess you are right here; using tags wouldn't
> help for this situation.
>

Yes, I think we have 90% of slots free in the host when this occurs for 
one particular test - Our v2 hw has 2K slots, which is >> cmd_per_lun. 
The errors are equivalent to queue full for the device.

Thanks,
John

> Cheers,
>
> Hannes
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure()
@ 2016-02-22 10:02                     ` John Garry
  0 siblings, 0 replies; 45+ messages in thread
From: John Garry @ 2016-02-22 10:02 UTC (permalink / raw)
  To: Hannes Reinecke, JBottomley, martin.petersen
  Cc: linux-scsi, john.garry2, linux-kernel, linuxarm, zhangfei.gao


>> I would like to make another point about why I am making this change
>> in case it is not clear. The queue full events are form
>> TRANS_TX_CREDIT_TIMEOUT_ERR and TRANS_TX_CLOSE_NORMAL_ERR errors in
>> the slot: I want the slot retried when this occurs, so I set status
>> as SAS_QUEUE_FULL just so we will report DID_SOFT_ERR to SCSI
>> midlayer so we get a retry. I could use SAS_OPEN_REJECT
>> alternatively as the error which would have the same affect.
>> The queue full are not from all slots being consumed in the HBA.
>>
> Ah, right. So you might be getting those errors even with some free
> slots on the HBA. As such they are roughly equivalent to a
> QUEUE_FULL SCSI statue, right?
> So after reading SPL I guess you are right here; using tags wouldn't
> help for this situation.
>

Yes, I think we have 90% of slots free in the host when this occurs for 
one particular test - Our v2 hw has 2K slots, which is >> cmd_per_lun. 
The errors are equivalent to queue full for the device.

Thanks,
John

> Cheers,
>
> Hannes
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-02-22 10:02 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-16 12:22 [PATCH 0/6] hisi_sas: add abort and retry feature John Garry
2016-02-16 12:22 ` John Garry
2016-02-16 12:22 ` [PATCH 1/6] hisi_sas: add TMF_RESP_FUNC_SUCC check John Garry
2016-02-16 12:22   ` John Garry
2016-02-16 15:20   ` Hannes Reinecke
2016-02-16 12:22 ` [PATCH 2/6] hisi_sas: add hisi_sas_slot_abort() John Garry
2016-02-16 12:22   ` John Garry
2016-02-16 15:22   ` Hannes Reinecke
2016-02-16 15:41     ` John Garry
2016-02-16 15:41       ` John Garry
2016-02-18  9:30       ` John Garry
2016-02-18  9:30         ` John Garry
2016-02-16 12:22 ` [PATCH 3/6] hisi_sas: use slot abort in v1 hw John Garry
2016-02-16 12:22   ` John Garry
2016-02-16 15:31   ` Hannes Reinecke
2016-02-16 15:31     ` Hannes Reinecke
2016-02-16 16:13     ` John Garry
2016-02-16 16:13       ` John Garry
2016-02-18  7:16       ` Hannes Reinecke
2016-02-18  9:52         ` John Garry
2016-02-18  9:52           ` John Garry
2016-02-16 12:22 ` [PATCH 4/6] hisi_sas: use slot abort in v2 hw John Garry
2016-02-16 12:22   ` John Garry
2016-02-16 15:32   ` Hannes Reinecke
2016-02-16 16:58     ` John Garry
2016-02-16 16:58       ` John Garry
2016-02-16 12:22 ` [PATCH 5/6] hisi_sas: add hisi_sas_slave_configure() John Garry
2016-02-16 12:22   ` John Garry
2016-02-16 15:33   ` Hannes Reinecke
2016-02-16 16:56     ` John Garry
2016-02-16 16:56       ` John Garry
2016-02-18  7:40       ` Hannes Reinecke
2016-02-18 10:12         ` John Garry
2016-02-18 10:12           ` John Garry
2016-02-18 10:30           ` Hannes Reinecke
2016-02-18 10:57             ` John Garry
2016-02-18 10:57               ` John Garry
2016-02-19 10:46               ` John Garry
2016-02-19 10:46                 ` John Garry
2016-02-19 14:31                 ` Hannes Reinecke
2016-02-19 14:31                   ` Hannes Reinecke
2016-02-22 10:02                   ` John Garry
2016-02-22 10:02                     ` John Garry
2016-02-16 12:22 ` [PATCH 6/6] hisi_sas: update driver version to 1.3 John Garry
2016-02-16 12:22   ` John Garry

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.