linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches
@ 2019-02-06 10:52 John Garry
  2019-02-06 10:52 ` [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw John Garry
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linuxarm, linux-kernel, linux-scsi, John Garry

This patchset introduces the following:
- DIX support for v3 hw

- Optimisation based on mapping CPUs to queues, much the same as other
  SCSI drivers do today, without exposing > 1 queue to upper layer.

  It is disabled by default and marked as experimental, due to ongoing
  discussion here:
  https://marc.info/?l=linux-scsi&m=154876335707751&w=2

  In addition, hopefully we can eventually expose > 1 queue to upper
  layer and remove the reply_map, if performance issue discussed here has
  a positive outcome:
  https://marc.info/?l=linux-scsi&m=154874776701343&w=2

- And some other minor tidying and a driver debugfs feature  

This series is based on Martin's 5.1 queue +
"scsi: hisi_sas: Set protection parameters prior to adding SCSI host"

John Garry (2):
  scsi: hisi_sas: Issue internal abort on all relevant queues
  scsi: hisi_sas: Do some more tidy-up

Luo Jiaxing (1):
  scsi: hisi_sas: Add manual trigger for debugfs dump

Xiang Chen (3):
  scsi: hisi_sas: Add support for DIX feature for v3 hw
  scsi: hisi_sas: change queue depth from 512 to 4096
  scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental

 drivers/scsi/hisi_sas/hisi_sas.h       |  48 +++--
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 237 +++++++++++++++++++++----
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |   2 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |   3 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 157 +++++++++++++---
 5 files changed, 377 insertions(+), 70 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
@ 2019-02-06 10:52 ` John Garry
  2019-02-08 23:13   ` Martin K. Petersen
  2019-02-06 10:52 ` [PATCH 2/6] scsi: hisi_sas: Add manual trigger for debugfs dump John Garry
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linuxarm, linux-kernel, linux-scsi, Xiang Chen, John Garry

From: Xiang Chen <chenxiang66@hisilicon.com>

This patch adds support for DIX to v3 hw driver.

For this, we build upon support for DIF, most significantly
is adding new DMA map and unmap paths.

Some pre-existing macro precedence issues are also tidied. They
were detected by checkpatch --strict.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       | 40 ++++++++---
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 96 +++++++++++++++++++++++---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 80 +++++++++++++++++----
 3 files changed, 184 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 9b35f84b5029..df2a92a0e7a1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -41,20 +41,25 @@
 #define HISI_SAS_COMMAND_TABLE_SZ (sizeof(union hisi_sas_command_table))
 
 #define hisi_sas_status_buf_addr(buf) \
-	(buf + offsetof(struct hisi_sas_slot_buf_table, status_buffer))
-#define hisi_sas_status_buf_addr_mem(slot) hisi_sas_status_buf_addr(slot->buf)
+	((buf) + offsetof(struct hisi_sas_slot_buf_table, status_buffer))
+#define hisi_sas_status_buf_addr_mem(slot) hisi_sas_status_buf_addr((slot)->buf)
 #define hisi_sas_status_buf_addr_dma(slot) \
-	hisi_sas_status_buf_addr(slot->buf_dma)
+	hisi_sas_status_buf_addr((slot)->buf_dma)
 
 #define hisi_sas_cmd_hdr_addr(buf) \
-	(buf + offsetof(struct hisi_sas_slot_buf_table, command_header))
-#define hisi_sas_cmd_hdr_addr_mem(slot) hisi_sas_cmd_hdr_addr(slot->buf)
-#define hisi_sas_cmd_hdr_addr_dma(slot) hisi_sas_cmd_hdr_addr(slot->buf_dma)
+	((buf) + offsetof(struct hisi_sas_slot_buf_table, command_header))
+#define hisi_sas_cmd_hdr_addr_mem(slot) hisi_sas_cmd_hdr_addr((slot)->buf)
+#define hisi_sas_cmd_hdr_addr_dma(slot) hisi_sas_cmd_hdr_addr((slot)->buf_dma)
 
 #define hisi_sas_sge_addr(buf) \
-	(buf + offsetof(struct hisi_sas_slot_buf_table, sge_page))
-#define hisi_sas_sge_addr_mem(slot) hisi_sas_sge_addr(slot->buf)
-#define hisi_sas_sge_addr_dma(slot) hisi_sas_sge_addr(slot->buf_dma)
+	((buf) + offsetof(struct hisi_sas_slot_buf_table, sge_page))
+#define hisi_sas_sge_addr_mem(slot) hisi_sas_sge_addr((slot)->buf)
+#define hisi_sas_sge_addr_dma(slot) hisi_sas_sge_addr((slot)->buf_dma)
+
+#define hisi_sas_sge_dif_addr(buf) \
+	((buf) + offsetof(struct hisi_sas_slot_dif_buf_table, sge_dif_page))
+#define hisi_sas_sge_dif_addr_mem(slot) hisi_sas_sge_dif_addr((slot)->buf)
+#define hisi_sas_sge_dif_addr_dma(slot) hisi_sas_sge_dif_addr((slot)->buf_dma)
 
 #define HISI_SAS_MAX_SSP_RESP_SZ (sizeof(struct ssp_frame_hdr) + 1024)
 #define HISI_SAS_MAX_SMP_RESP_SZ 1028
@@ -74,7 +79,11 @@
 				SHOST_DIF_TYPE2_PROTECTION | \
 				SHOST_DIF_TYPE3_PROTECTION)
 
-#define HISI_SAS_PROT_MASK (HISI_SAS_DIF_PROT_MASK)
+#define HISI_SAS_DIX_PROT_MASK (SHOST_DIX_TYPE1_PROTECTION | \
+				SHOST_DIX_TYPE2_PROTECTION | \
+				SHOST_DIX_TYPE3_PROTECTION)
+
+#define HISI_SAS_PROT_MASK (HISI_SAS_DIF_PROT_MASK | HISI_SAS_DIX_PROT_MASK)
 
 #define HISI_SAS_WAIT_PHYUP_TIMEOUT 20
 
@@ -201,6 +210,7 @@ struct hisi_sas_slot {
 	struct sas_task *task;
 	struct hisi_sas_port	*port;
 	u64	n_elem;
+	u64	n_elem_dif;
 	int	dlvry_queue;
 	int	dlvry_queue_slot;
 	int	cmplt_queue;
@@ -464,6 +474,11 @@ struct hisi_sas_sge_page {
 	struct hisi_sas_sge sge[HISI_SAS_SGE_PAGE_CNT];
 }  __aligned(16);
 
+#define HISI_SAS_SGE_DIF_PAGE_CNT   SG_CHUNK_SIZE
+struct hisi_sas_sge_dif_page {
+	struct hisi_sas_sge sge[HISI_SAS_SGE_DIF_PAGE_CNT];
+}  __aligned(16);
+
 struct hisi_sas_command_table_ssp {
 	struct ssp_frame_hdr hdr;
 	union {
@@ -494,6 +509,11 @@ struct hisi_sas_slot_buf_table {
 	struct hisi_sas_sge_page sge_page;
 };
 
+struct hisi_sas_slot_dif_buf_table {
+	struct hisi_sas_slot_buf_table slot_buf;
+	struct hisi_sas_sge_dif_page sge_dif_page;
+};
+
 extern struct scsi_transport_template *hisi_sas_stt;
 
 extern bool hisi_sas_debugfs_enable;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 2d6b5fe90a9e..1a3d31d4258c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -252,11 +252,19 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
 
 		task->lldd_task = NULL;
 
-		if (!sas_protocol_ata(task->task_proto))
+		if (!sas_protocol_ata(task->task_proto)) {
+			struct sas_ssp_task *ssp_task = &task->ssp_task;
+			struct scsi_cmnd *scsi_cmnd = ssp_task->cmd;
+
 			if (slot->n_elem)
 				dma_unmap_sg(dev, task->scatter,
 					     task->num_scatter,
 					     task->data_dir);
+			if (slot->n_elem_dif)
+				dma_unmap_sg(dev, scsi_prot_sglist(scsi_cmnd),
+					     scsi_prot_sg_count(scsi_cmnd),
+					     task->data_dir);
+		}
 	}
 
 
@@ -380,6 +388,59 @@ static int hisi_sas_dma_map(struct hisi_hba *hisi_hba,
 	return rc;
 }
 
+static void hisi_sas_dif_dma_unmap(struct hisi_hba *hisi_hba,
+				   struct sas_task *task, int n_elem_dif)
+{
+	struct device *dev = hisi_hba->dev;
+
+	if (n_elem_dif) {
+		struct sas_ssp_task *ssp_task = &task->ssp_task;
+		struct scsi_cmnd *scsi_cmnd = ssp_task->cmd;
+
+		dma_unmap_sg(dev, scsi_prot_sglist(scsi_cmnd),
+			     scsi_prot_sg_count(scsi_cmnd),
+			     task->data_dir);
+	}
+}
+
+static int hisi_sas_dif_dma_map(struct hisi_hba *hisi_hba,
+				int *n_elem_dif, struct sas_task *task)
+{
+	struct device *dev = hisi_hba->dev;
+	struct sas_ssp_task *ssp_task;
+	struct scsi_cmnd *scsi_cmnd;
+	int rc;
+
+	if (task->num_scatter) {
+		ssp_task = &task->ssp_task;
+		scsi_cmnd = ssp_task->cmd;
+
+		if (scsi_prot_sg_count(scsi_cmnd)) {
+			*n_elem_dif = dma_map_sg(dev,
+						 scsi_prot_sglist(scsi_cmnd),
+						 scsi_prot_sg_count(scsi_cmnd),
+						 task->data_dir);
+
+			if (!*n_elem_dif)
+				return -ENOMEM;
+
+			if (*n_elem_dif > HISI_SAS_SGE_DIF_PAGE_CNT) {
+				dev_err(dev, "task prep: n_elem_dif(%d) too large\n",
+					*n_elem_dif);
+				rc = -EINVAL;
+				goto err_out_dif_dma_unmap;
+			}
+		}
+	}
+
+	return 0;
+
+err_out_dif_dma_unmap:
+	dma_unmap_sg(dev, scsi_prot_sglist(scsi_cmnd),
+		     scsi_prot_sg_count(scsi_cmnd), task->data_dir);
+	return rc;
+}
+
 static int hisi_sas_task_prep(struct sas_task *task,
 			      struct hisi_sas_dq **dq_pointer,
 			      bool is_tmf, struct hisi_sas_tmf_task *tmf,
@@ -394,7 +455,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	struct asd_sas_port *sas_port = device->port;
 	struct device *dev = hisi_hba->dev;
 	int dlvry_queue_slot, dlvry_queue, rc, slot_idx;
-	int n_elem = 0, n_elem_req = 0, n_elem_resp = 0;
+	int n_elem = 0, n_elem_dif = 0, n_elem_req = 0, n_elem_resp = 0;
 	struct hisi_sas_dq *dq;
 	unsigned long flags;
 	int wr_q_index;
@@ -427,6 +488,12 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	if (rc < 0)
 		goto prep_out;
 
+	if (!sas_protocol_ata(task->task_proto)) {
+		rc = hisi_sas_dif_dma_map(hisi_hba, &n_elem_dif, task);
+		if (rc < 0)
+			goto err_out_dma_unmap;
+	}
+
 	if (hisi_hba->hw->slot_index_alloc)
 		rc = hisi_hba->hw->slot_index_alloc(hisi_hba, device);
 	else {
@@ -445,7 +512,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
 	}
 	if (rc < 0)
-		goto err_out_dma_unmap;
+		goto err_out_dif_dma_unmap;
 
 	slot_idx = rc;
 	slot = &hisi_hba->slot_info[slot_idx];
@@ -466,6 +533,7 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	dlvry_queue_slot = wr_q_index;
 
 	slot->n_elem = n_elem;
+	slot->n_elem_dif = n_elem_dif;
 	slot->dlvry_queue = dlvry_queue;
 	slot->dlvry_queue_slot = dlvry_queue_slot;
 	cmd_hdr_base = hisi_hba->cmd_hdr[dlvry_queue];
@@ -509,6 +577,9 @@ static int hisi_sas_task_prep(struct sas_task *task,
 
 err_out_tag:
 	hisi_sas_slot_index_free(hisi_hba, slot_idx);
+err_out_dif_dma_unmap:
+	if (!sas_protocol_ata(task->task_proto))
+		hisi_sas_dif_dma_unmap(hisi_hba, task, n_elem_dif);
 err_out_dma_unmap:
 	hisi_sas_dma_unmap(hisi_hba, task, n_elem,
 			   n_elem_req, n_elem_resp);
@@ -2174,19 +2245,24 @@ int hisi_sas_alloc(struct hisi_hba *hisi_hba)
 
 	/* roundup to avoid overly large block size */
 	max_command_entries_ru = roundup(max_command_entries, 64);
-	sz_slot_buf_ru = roundup(sizeof(struct hisi_sas_slot_buf_table), 64);
+	if (hisi_hba->prot_mask & HISI_SAS_DIX_PROT_MASK)
+		sz_slot_buf_ru = sizeof(struct hisi_sas_slot_dif_buf_table);
+	else
+		sz_slot_buf_ru = sizeof(struct hisi_sas_slot_buf_table);
+	sz_slot_buf_ru = roundup(sz_slot_buf_ru, 64);
 	s = lcm(max_command_entries_ru, sz_slot_buf_ru);
 	blk_cnt = (max_command_entries_ru * sz_slot_buf_ru) / s;
 	slots_per_blk = s / sz_slot_buf_ru;
+
 	for (i = 0; i < blk_cnt; i++) {
-		struct hisi_sas_slot_buf_table *buf;
-		dma_addr_t buf_dma;
 		int slot_index = i * slots_per_blk;
+		dma_addr_t buf_dma;
+		void *buf;
 
-		buf = dmam_alloc_coherent(dev, s, &buf_dma, GFP_KERNEL);
+		buf = dmam_alloc_coherent(dev, s, &buf_dma,
+					  GFP_KERNEL | __GFP_ZERO);
 		if (!buf)
 			goto err_out;
-		memset(buf, 0, s);
 
 		for (j = 0; j < slots_per_blk; j++, slot_index++) {
 			struct hisi_sas_slot *slot;
@@ -2196,8 +2272,8 @@ int hisi_sas_alloc(struct hisi_hba *hisi_hba)
 			slot->buf_dma = buf_dma;
 			slot->idx = slot_index;
 
-			buf++;
-			buf_dma += sizeof(*buf);
+			buf += sz_slot_buf_ru;
+			buf_dma += sz_slot_buf_ru;
 		}
 	}
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index eafcab088c0b..0455fba93daf 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -399,6 +399,8 @@ struct hisi_sas_err_record_v3 {
 #define USR_DATA_BLOCK_SZ_OFF	20
 #define USR_DATA_BLOCK_SZ_MSK	(0x3 << USR_DATA_BLOCK_SZ_OFF)
 #define T10_CHK_MSK_OFF	    16
+#define T10_CHK_REF_TAG_MSK (0xf0 << T10_CHK_MSK_OFF)
+#define T10_CHK_APP_TAG_MSK (0xc << T10_CHK_MSK_OFF)
 
 static bool hisi_sas_intr_conv;
 MODULE_PARM_DESC(intr_conv, "interrupt converge enable (0-1)");
@@ -969,19 +971,44 @@ static void prep_prd_sge_v3_hw(struct hisi_hba *hisi_hba,
 
 	hdr->prd_table_addr = cpu_to_le64(hisi_sas_sge_addr_dma(slot));
 
-	hdr->sg_len = cpu_to_le32(n_elem << CMD_HDR_DATA_SGL_LEN_OFF);
+	hdr->sg_len |= cpu_to_le32(n_elem << CMD_HDR_DATA_SGL_LEN_OFF);
+}
+
+static void prep_prd_sge_dif_v3_hw(struct hisi_hba *hisi_hba,
+				   struct hisi_sas_slot *slot,
+				   struct hisi_sas_cmd_hdr *hdr,
+				   struct scatterlist *scatter,
+				   int n_elem)
+{
+	struct hisi_sas_sge_dif_page *sge_dif_page;
+	struct scatterlist *sg;
+	int i;
+
+	sge_dif_page = hisi_sas_sge_dif_addr_mem(slot);
+
+	for_each_sg(scatter, sg, n_elem, i) {
+		struct hisi_sas_sge *entry = &sge_dif_page->sge[i];
+
+		entry->addr = cpu_to_le64(sg_dma_address(sg));
+		entry->page_ctrl_0 = 0;
+		entry->page_ctrl_1 = 0;
+		entry->data_len = cpu_to_le32(sg_dma_len(sg));
+		entry->data_off = 0;
+	}
+
+	hdr->dif_prd_table_addr =
+		cpu_to_le64(hisi_sas_sge_dif_addr_dma(slot));
+
+	hdr->sg_len |= cpu_to_le32(n_elem << CMD_HDR_DIF_SGL_LEN_OFF);
 }
 
 static u32 get_prot_chk_msk_v3_hw(struct scsi_cmnd *scsi_cmnd)
 {
 	unsigned char prot_flags = scsi_cmnd->prot_flags;
 
-	if (prot_flags & SCSI_PROT_TRANSFER_PI) {
-		if (prot_flags & SCSI_PROT_REF_CHECK)
-			return 0xc << 16;
-		return 0xfc << 16;
-	}
-	return 0;
+	if (prot_flags & SCSI_PROT_REF_CHECK)
+		return T10_CHK_APP_TAG_MSK;
+	return T10_CHK_REF_TAG_MSK | T10_CHK_APP_TAG_MSK;
 }
 
 static void fill_prot_v3_hw(struct scsi_cmnd *scsi_cmnd,
@@ -992,15 +1019,33 @@ static void fill_prot_v3_hw(struct scsi_cmnd *scsi_cmnd,
 	u32 lbrt_chk_val = t10_pi_ref_tag(scsi_cmnd->request);
 
 	switch (prot_op) {
+	case SCSI_PROT_READ_INSERT:
+		prot->dw0 |= T10_INSRT_EN_MSK;
+		prot->lbrtgv = lbrt_chk_val;
+		break;
 	case SCSI_PROT_READ_STRIP:
 		prot->dw0 |= (T10_RMV_EN_MSK | T10_CHK_EN_MSK);
 		prot->lbrtcv = lbrt_chk_val;
 		prot->dw4 |= get_prot_chk_msk_v3_hw(scsi_cmnd);
 		break;
+	case SCSI_PROT_READ_PASS:
+		prot->dw0 |= T10_CHK_EN_MSK;
+		prot->lbrtcv = lbrt_chk_val;
+		prot->dw4 |= get_prot_chk_msk_v3_hw(scsi_cmnd);
+		break;
 	case SCSI_PROT_WRITE_INSERT:
 		prot->dw0 |= T10_INSRT_EN_MSK;
 		prot->lbrtgv = lbrt_chk_val;
 		break;
+	case SCSI_PROT_WRITE_STRIP:
+		prot->dw0 |= (T10_RMV_EN_MSK | T10_CHK_EN_MSK);
+		prot->lbrtcv = lbrt_chk_val;
+		break;
+	case SCSI_PROT_WRITE_PASS:
+		prot->dw0 |= T10_CHK_EN_MSK;
+		prot->lbrtcv = lbrt_chk_val;
+		prot->dw4 |= get_prot_chk_msk_v3_hw(scsi_cmnd);
+		break;
 	default:
 		WARN(1, "prot_op(0x%x) is not valid\n", prot_op);
 		break;
@@ -1077,9 +1122,15 @@ static void prep_ssp_v3_hw(struct hisi_hba *hisi_hba,
 	hdr->dw2 = cpu_to_le32(dw2);
 	hdr->transfer_tags = cpu_to_le32(slot->idx);
 
-	if (has_data)
+	if (has_data) {
 		prep_prd_sge_v3_hw(hisi_hba, slot, hdr, task->scatter,
-					slot->n_elem);
+				   slot->n_elem);
+
+		if (scsi_prot_sg_count(scsi_cmnd))
+			prep_prd_sge_dif_v3_hw(hisi_hba, slot, hdr,
+					       scsi_prot_sglist(scsi_cmnd),
+					       slot->n_elem_dif);
+	}
 
 	hdr->cmd_table_addr = cpu_to_le64(hisi_sas_cmd_hdr_addr_dma(slot));
 	hdr->sts_buffer_addr = cpu_to_le64(hisi_sas_status_buf_addr_dma(slot));
@@ -1120,18 +1171,19 @@ static void prep_ssp_v3_hw(struct hisi_hba *hisi_hba,
 		fill_prot_v3_hw(scsi_cmnd, &prot);
 		memcpy(buf_cmd_prot, &prot,
 		       sizeof(struct hisi_sas_protect_iu_v3_hw));
-
 		/*
 		 * For READ, we need length of info read to memory, while for
 		 * WRITE we need length of data written to the disk.
 		 */
-		if (prot_op == SCSI_PROT_WRITE_INSERT) {
+		if (prot_op == SCSI_PROT_WRITE_INSERT ||
+		    prot_op == SCSI_PROT_READ_INSERT ||
+		    prot_op == SCSI_PROT_WRITE_PASS ||
+		    prot_op == SCSI_PROT_READ_PASS) {
 			unsigned int interval = scsi_prot_interval(scsi_cmnd);
 			unsigned int ilog2_interval = ilog2(interval);
 
 			len = (task->total_xfer_len >> ilog2_interval) * 8;
 		}
-
 	}
 
 	hdr->dw1 = cpu_to_le32(dw1);
@@ -2526,6 +2578,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.bios_param		= sas_bios_param,
 	.this_id		= -1,
 	.sg_tablesize		= HISI_SAS_SGE_PAGE_CNT,
+	.sg_prot_tablesize	= HISI_SAS_SGE_PAGE_CNT,
 	.max_sectors		= SCSI_DEFAULT_MAX_SECTORS,
 	.eh_device_reset_handler = sas_eh_device_reset_handler,
 	.eh_target_reset_handler = sas_eh_target_reset_handler,
@@ -2701,6 +2754,9 @@ hisi_sas_v3_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		dev_info(dev, "Registering for DIF/DIX prot_mask=0x%x\n",
 			 prot_mask);
 		scsi_host_set_prot(hisi_hba->shost, prot_mask);
+		if (hisi_hba->prot_mask & HISI_SAS_DIX_PROT_MASK)
+			scsi_host_set_guard(hisi_hba->shost,
+					    SHOST_DIX_GUARD_CRC);
 	}
 
 	rc = scsi_add_host(shost, dev);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/6] scsi: hisi_sas: Add manual trigger for debugfs dump
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
  2019-02-06 10:52 ` [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw John Garry
@ 2019-02-06 10:52 ` John Garry
  2019-02-06 10:52 ` [PATCH 3/6] scsi: hisi_sas: change queue depth from 512 to 4096 John Garry
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linuxarm, linux-kernel, linux-scsi, Luo Jiaxing, John Garry

From: Luo Jiaxing <luojiaxing@huawei.com>

Add an interface to manually trigger a debugfs dump.

Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c | 34 +++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 1a3d31d4258c..35619bdad1d5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -2977,6 +2977,36 @@ static void hisi_sas_debugfs_snapshot_regs(struct hisi_hba *hisi_hba)
 	hisi_hba->hw->snapshot_restore(hisi_hba);
 }
 
+static ssize_t hisi_sas_debugfs_trigger_dump_write(struct file *file,
+						   const char __user *user_buf,
+						   size_t count, loff_t *ppos)
+{
+	struct hisi_hba *hisi_hba = file->f_inode->i_private;
+	char buf[8];
+
+	/* A bit racy, but don't care too much since it's only debugfs */
+	if (hisi_hba->debugfs_snapshot)
+		return -EFAULT;
+
+	if (count > 8)
+		return -EFAULT;
+
+	if (copy_from_user(buf, user_buf, count))
+		return -EFAULT;
+
+	if (buf[0] != '1')
+		return -EFAULT;
+
+	queue_work(hisi_hba->wq, &hisi_hba->debugfs_work);
+
+	return count;
+}
+
+static const struct file_operations hisi_sas_debugfs_trigger_dump_fops = {
+	.write = &hisi_sas_debugfs_trigger_dump_write,
+	.owner = THIS_MODULE,
+};
+
 void hisi_sas_debugfs_work_handler(struct work_struct *work)
 {
 	struct hisi_hba *hisi_hba =
@@ -2999,6 +3029,10 @@ void hisi_sas_debugfs_init(struct hisi_hba *hisi_hba)
 
 	hisi_hba->debugfs_dir = debugfs_create_dir(dev_name(dev),
 						   hisi_sas_debugfs_dir);
+	debugfs_create_file("trigger_dump", 0600,
+			    hisi_hba->debugfs_dir,
+			    hisi_hba,
+			    &hisi_sas_debugfs_trigger_dump_fops);
 
 	/* Alloc buffer for global */
 	sz = hisi_hba->hw->debugfs_reg_global->count * 4;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/6] scsi: hisi_sas: change queue depth from 512 to 4096
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
  2019-02-06 10:52 ` [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw John Garry
  2019-02-06 10:52 ` [PATCH 2/6] scsi: hisi_sas: Add manual trigger for debugfs dump John Garry
@ 2019-02-06 10:52 ` John Garry
  2019-02-06 10:52 ` [PATCH 4/6] scsi: hisi_sas: Issue internal abort on all relevant queues John Garry
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linuxarm, linux-kernel, linux-scsi, Xiang Chen, John Garry

From: Xiang Chen <chenxiang66@hisilicon.com>

If sending IOs to many disks from single queue, it is possible that the
queue may be full. To avoid the situation, change queue depth from 512
to 4096 which is the max number of IOs for v3 hw.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index df2a92a0e7a1..ff844d4ba0d3 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -30,7 +30,7 @@
 
 #define HISI_SAS_MAX_PHYS	9
 #define HISI_SAS_MAX_QUEUES	32
-#define HISI_SAS_QUEUE_SLOTS 512
+#define HISI_SAS_QUEUE_SLOTS	4096
 #define HISI_SAS_MAX_ITCT_ENTRIES 1024
 #define HISI_SAS_MAX_DEVICES HISI_SAS_MAX_ITCT_ENTRIES
 #define HISI_SAS_RESET_BIT	0
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/6] scsi: hisi_sas: Issue internal abort on all relevant queues
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
                   ` (2 preceding siblings ...)
  2019-02-06 10:52 ` [PATCH 3/6] scsi: hisi_sas: change queue depth from 512 to 4096 John Garry
@ 2019-02-06 10:52 ` John Garry
  2019-02-06 10:52 ` [PATCH 5/6] scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental John Garry
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linuxarm, linux-kernel, linux-scsi, John Garry, Xiang Chen

To support queue mapped to a CPU, it needs to be ensured that issuing an
internal abort is safe, in that it is guaranteed that an internal abort
is processed for a single IO or a device after all the relevant
command(s) which it is attempting to abort have been processed by
the controller.

Currently we only deliver commands for any device on a single queue to
solve this problem, as we know that commands issued on the same queue
will be processed in order, and we will not have a scenario where the
internal abort is racing against a command(s) which it is trying to abort.

To enqueue commands on queue mapped to a CPU, choosing a queue for an
command is based on the associated queue for the current CPU, so
this is not safe for internal abort since it would definitely not be
guaranteed that commands for the command devices are issued on the
same queue.

To solve this issue, we take a bludgeoning approach, and issue a separate
internal abort on any queue(s) relevant to the command or device, in that
we will be guaranteed that at least one of these internal aborts will be
received last in the controller.

So, for aborting a single command, we can just force the internal abort to
be issued on the same queue as the command which we are trying to abort.

For aborting all commands associated with a device, we issue a separate
internal abort on all relevant queues. Issuing multiple internal aborts in
this fashion would have not side affect.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  2 +
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 64 ++++++++++++++++++++------
 drivers/scsi/hisi_sas/hisi_sas_v1_hw.c |  2 +
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |  2 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  8 +++-
 5 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index ff844d4ba0d3..5accad05a86f 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -365,6 +365,8 @@ struct hisi_hba {
 	u32 intr_coal_ticks;	/* Time of interrupt coalesce in us */
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
+	int cq_nvecs;
+
 	/* debugfs memories */
 	u32 *debugfs_global_reg;
 	u32 *debugfs_port_reg[HISI_SAS_MAX_PHYS];
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 35619bdad1d5..40a402f09afb 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -1023,7 +1023,7 @@ static void hisi_sas_dev_gone(struct domain_device *device)
 
 	if (!test_bit(HISI_SAS_RESET_BIT, &hisi_hba->flags)) {
 		hisi_sas_internal_task_abort(hisi_hba, device,
-				     HISI_SAS_INT_ABT_DEV, 0);
+					     HISI_SAS_INT_ABT_DEV, 0);
 
 		hisi_sas_dereg_device(hisi_hba, device);
 
@@ -1630,7 +1630,8 @@ static int hisi_sas_abort_task(struct sas_task *task)
 		task->task_proto & SAS_PROTOCOL_STP) {
 		if (task->dev->dev_type == SAS_SATA_DEV) {
 			rc = hisi_sas_internal_task_abort(hisi_hba, device,
-						HISI_SAS_INT_ABT_DEV, 0);
+							  HISI_SAS_INT_ABT_DEV,
+							  0);
 			if (rc < 0) {
 				dev_err(dev, "abort task: internal abort failed\n");
 				goto out;
@@ -1645,7 +1646,7 @@ static int hisi_sas_abort_task(struct sas_task *task)
 		struct hisi_sas_cq *cq = &hisi_hba->cq[slot->dlvry_queue];
 
 		rc = hisi_sas_internal_task_abort(hisi_hba, device,
-			     HISI_SAS_INT_ABT_CMD, tag);
+						  HISI_SAS_INT_ABT_CMD, tag);
 		if (((rc < 0) || (rc == TMF_RESP_FUNC_FAILED)) &&
 					task->lldd_task) {
 			/*
@@ -1671,7 +1672,7 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
 	int rc = TMF_RESP_FUNC_FAILED;
 
 	rc = hisi_sas_internal_task_abort(hisi_hba, device,
-					HISI_SAS_INT_ABT_DEV, 0);
+					  HISI_SAS_INT_ABT_DEV, 0);
 	if (rc < 0) {
 		dev_err(dev, "abort task set: internal abort rc=%d\n", rc);
 		return TMF_RESP_FUNC_FAILED;
@@ -1743,7 +1744,7 @@ static int hisi_sas_I_T_nexus_reset(struct domain_device *device)
 	int rc = TMF_RESP_FUNC_FAILED;
 
 	rc = hisi_sas_internal_task_abort(hisi_hba, device,
-					HISI_SAS_INT_ABT_DEV, 0);
+					  HISI_SAS_INT_ABT_DEV, 0);
 	if (rc < 0) {
 		dev_err(dev, "I_T nexus reset: internal abort (%d)\n", rc);
 		return TMF_RESP_FUNC_FAILED;
@@ -1788,7 +1789,7 @@ static int hisi_sas_lu_reset(struct domain_device *device, u8 *lun)
 		struct hisi_sas_tmf_task tmf_task = { .tmf =  TMF_LU_RESET };
 
 		rc = hisi_sas_internal_task_abort(hisi_hba, device,
-						HISI_SAS_INT_ABT_DEV, 0);
+						  HISI_SAS_INT_ABT_DEV, 0);
 		if (rc < 0) {
 			dev_err(dev, "lu_reset: internal abort failed\n");
 			goto out;
@@ -1874,7 +1875,7 @@ static int hisi_sas_query_task(struct sas_task *task)
 static int
 hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
 				  struct sas_task *task, int abort_flag,
-				  int task_tag)
+				  int task_tag, struct hisi_sas_dq *dq)
 {
 	struct domain_device *device = task->dev;
 	struct hisi_sas_device *sas_dev = device->lldd_dev;
@@ -1883,7 +1884,6 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
 	struct hisi_sas_slot *slot;
 	struct asd_sas_port *sas_port = device->port;
 	struct hisi_sas_cmd_hdr *cmd_hdr_base;
-	struct hisi_sas_dq *dq = sas_dev->dq;
 	int dlvry_queue_slot, dlvry_queue, n_elem = 0, rc, slot_idx;
 	unsigned long flags, flags_dq = 0;
 	int wr_q_index;
@@ -1955,18 +1955,19 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
 }
 
 /**
- * hisi_sas_internal_task_abort -- execute an internal
+ * _hisi_sas_internal_task_abort -- execute an internal
  * abort command for single IO command or a device
  * @hisi_hba: host controller struct
  * @device: domain device
  * @abort_flag: mode of operation, device or single IO
  * @tag: tag of IO to be aborted (only relevant to single
  *       IO mode)
+ * @dq: delivery queue for this internal abort command
  */
 static int
-hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
-			     struct domain_device *device,
-			     int abort_flag, int tag)
+_hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
+			      struct domain_device *device, int abort_flag,
+			      int tag, struct hisi_sas_dq *dq)
 {
 	struct sas_task *task;
 	struct hisi_sas_device *sas_dev = device->lldd_dev;
@@ -1994,7 +1995,7 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
 	add_timer(&task->slow_task->timer);
 
 	res = hisi_sas_internal_abort_task_exec(hisi_hba, sas_dev->device_id,
-						task, abort_flag, tag);
+						task, abort_flag, tag, dq);
 	if (res) {
 		del_timer(&task->slow_task->timer);
 		dev_err(dev, "internal task abort: executing internal task failed: %d\n",
@@ -2051,6 +2052,41 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
 	return res;
 }
 
+static int
+hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
+			     struct domain_device *device,
+			     int abort_flag, int tag)
+{
+	struct hisi_sas_slot *slot;
+	struct device *dev = hisi_hba->dev;
+	struct hisi_sas_dq *dq;
+	int i, rc;
+
+	switch (abort_flag) {
+	case HISI_SAS_INT_ABT_CMD:
+		slot = &hisi_hba->slot_info[tag];
+		dq = &hisi_hba->dq[slot->dlvry_queue];
+		return _hisi_sas_internal_task_abort(hisi_hba, device,
+						     abort_flag, tag, dq);
+	case HISI_SAS_INT_ABT_DEV:
+		for (i = 0; i < hisi_hba->cq_nvecs; i++) {
+			dq = &hisi_hba->dq[i];
+			rc = _hisi_sas_internal_task_abort(hisi_hba, device,
+							   abort_flag, tag,
+							   dq);
+			if (rc)
+				return rc;
+		}
+		break;
+	default:
+		dev_err(dev, "Unrecognised internal abort flag (%d)\n",
+			abort_flag);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static void hisi_sas_port_formed(struct asd_sas_phy *sas_phy)
 {
 	hisi_sas_port_notify_formed(sas_phy);
@@ -2117,7 +2153,7 @@ void hisi_sas_kill_tasklets(struct hisi_hba *hisi_hba)
 {
 	int i;
 
-	for (i = 0; i < hisi_hba->queue_count; i++) {
+	for (i = 0; i < hisi_hba->cq_nvecs; i++) {
 		struct hisi_sas_cq *cq = &hisi_hba->cq[i];
 
 		tasklet_kill(&cq->tasklet);
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
index c8f808faed95..293807443480 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v1_hw.c
@@ -1749,6 +1749,8 @@ static int interrupt_init_v1_hw(struct hisi_hba *hisi_hba)
 		}
 	}
 
+	hisi_hba->cq_nvecs = hisi_hba->queue_count;
+
 	return 0;
 }
 
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index aa90f17a1cac..39233a9cd1f5 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -3400,6 +3400,8 @@ static int interrupt_init_v2_hw(struct hisi_hba *hisi_hba)
 		tasklet_init(t, cq_tasklet_v2_hw, (unsigned long)cq);
 	}
 
+	hisi_hba->cq_nvecs = hisi_hba->queue_count;
+
 	return 0;
 
 free_cq_int_irqs:
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 0455fba93daf..20cec1d2439c 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -402,6 +402,8 @@ struct hisi_sas_err_record_v3 {
 #define T10_CHK_REF_TAG_MSK (0xf0 << T10_CHK_MSK_OFF)
 #define T10_CHK_APP_TAG_MSK (0xc << T10_CHK_MSK_OFF)
 
+#define BASE_VECTORS_V3_HW  16
+
 static bool hisi_sas_intr_conv;
 MODULE_PARM_DESC(intr_conv, "interrupt converge enable (0-1)");
 
@@ -2050,6 +2052,8 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 		return -ENOENT;
 	}
 
+	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
+
 	rc = devm_request_irq(dev, pci_irq_vector(pdev, 1),
 			      int_phy_up_down_bcast_v3_hw, 0,
 			      DRV_NAME " phy", hisi_hba);
@@ -2078,7 +2082,7 @@ static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 	}
 
 	/* Init tasklets for cq only */
-	for (i = 0; i < hisi_hba->queue_count; i++) {
+	for (i = 0; i < hisi_hba->cq_nvecs; i++) {
 		struct hisi_sas_cq *cq = &hisi_hba->cq[i];
 		struct tasklet_struct *t = &cq->tasklet;
 		int nr = hisi_sas_intr_conv ? 16 : 16 + i;
@@ -2795,7 +2799,7 @@ hisi_sas_v3_destroy_irqs(struct pci_dev *pdev, struct hisi_hba *hisi_hba)
 	free_irq(pci_irq_vector(pdev, 1), hisi_hba);
 	free_irq(pci_irq_vector(pdev, 2), hisi_hba);
 	free_irq(pci_irq_vector(pdev, 11), hisi_hba);
-	for (i = 0; i < hisi_hba->queue_count; i++) {
+	for (i = 0; i < hisi_hba->cq_nvecs; i++) {
 		struct hisi_sas_cq *cq = &hisi_hba->cq[i];
 		int nr = hisi_sas_intr_conv ? 16 : 16 + i;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/6] scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
                   ` (3 preceding siblings ...)
  2019-02-06 10:52 ` [PATCH 4/6] scsi: hisi_sas: Issue internal abort on all relevant queues John Garry
@ 2019-02-06 10:52 ` John Garry
  2019-02-06 10:52 ` [PATCH 6/6] scsi: hisi_sas: Do some more tidy-up John Garry
  2019-02-08 23:08 ` [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches Martin K. Petersen
  6 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linuxarm, linux-kernel, linux-scsi, Xiang Chen, John Garry

From: Xiang Chen <chenxiang66@hisilicon.com>

For auto-control irq affinity mode, choose the dq to deliver IO according
to the current CPU.

Then it decreases the performance regression that fio and CQ interrupts
are processed on different node.

For user control irq affinity mode, keep it as before.

To realize it, also need to distinguish the usage of dq lock and sas_dev
lock.

We mark as experimental due to ongoing discussion on managed MSI IRQ
during hotplug:
https://marc.info/?l=linux-scsi&m=154876335707751&w=2

We're almost at the point where we can expose multiple queues to
the upper layer for SCSI MQ, but we need to sort out the per-HBA
tags performance issue.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas.h       |  4 ++
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 33 ++++++++++---
 drivers/scsi/hisi_sas/hisi_sas_v2_hw.c |  1 +
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 65 +++++++++++++++++++++++---
 4 files changed, 89 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas.h b/drivers/scsi/hisi_sas/hisi_sas.h
index 5accad05a86f..6c87bd34509a 100644
--- a/drivers/scsi/hisi_sas/hisi_sas.h
+++ b/drivers/scsi/hisi_sas/hisi_sas.h
@@ -173,6 +173,7 @@ struct hisi_sas_port {
 
 struct hisi_sas_cq {
 	struct hisi_hba *hisi_hba;
+	const struct cpumask *pci_irq_mask;
 	struct tasklet_struct tasklet;
 	int	rd_point;
 	int	id;
@@ -195,6 +196,7 @@ struct hisi_sas_device {
 	enum sas_device_type	dev_type;
 	int device_id;
 	int sata_idx;
+	spinlock_t lock; /* For protecting slots */
 };
 
 struct hisi_sas_tmf_task {
@@ -217,6 +219,7 @@ struct hisi_sas_slot {
 	int	cmplt_queue_slot;
 	int	abort;
 	int	ready;
+	int	device_id;
 	void	*cmd_hdr;
 	dma_addr_t cmd_hdr_dma;
 	struct timer_list internal_abort_timer;
@@ -366,6 +369,7 @@ struct hisi_hba {
 	u32 intr_coal_count;	/* Interrupt count to coalesce */
 
 	int cq_nvecs;
+	unsigned int *reply_map;
 
 	/* debugfs memories */
 	u32 *debugfs_global_reg;
diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 40a402f09afb..eff31472b96e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -241,8 +241,9 @@ static void hisi_sas_slot_index_init(struct hisi_hba *hisi_hba)
 void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
 			     struct hisi_sas_slot *slot)
 {
-	struct hisi_sas_dq *dq = &hisi_hba->dq[slot->dlvry_queue];
 	unsigned long flags;
+	int device_id = slot->device_id;
+	struct hisi_sas_device *sas_dev = &hisi_hba->devices[device_id];
 
 	if (task) {
 		struct device *dev = hisi_hba->dev;
@@ -267,10 +268,9 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task,
 		}
 	}
 
-
-	spin_lock_irqsave(&dq->lock, flags);
+	spin_lock_irqsave(&sas_dev->lock, flags);
 	list_del_init(&slot->entry);
-	spin_unlock_irqrestore(&dq->lock, flags);
+	spin_unlock_irqrestore(&sas_dev->lock, flags);
 
 	memset(slot, 0, offsetof(struct hisi_sas_slot, buf));
 
@@ -471,7 +471,14 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
-	*dq_pointer = dq = sas_dev->dq;
+	if (hisi_hba->reply_map) {
+		int cpu = raw_smp_processor_id();
+		unsigned int dq_index = hisi_hba->reply_map[cpu];
+
+		*dq_pointer = dq = &hisi_hba->dq[dq_index];
+	} else {
+		*dq_pointer = dq = sas_dev->dq;
+	}
 
 	port = to_hisi_sas_port(sas_port);
 	if (port && !port->port_attached) {
@@ -526,12 +533,15 @@ static int hisi_sas_task_prep(struct sas_task *task,
 	}
 
 	list_add_tail(&slot->delivery, &dq->list);
-	list_add_tail(&slot->entry, &sas_dev->list);
 	spin_unlock_irqrestore(&dq->lock, flags);
+	spin_lock_irqsave(&sas_dev->lock, flags);
+	list_add_tail(&slot->entry, &sas_dev->list);
+	spin_unlock_irqrestore(&sas_dev->lock, flags);
 
 	dlvry_queue = dq->id;
 	dlvry_queue_slot = wr_q_index;
 
+	slot->device_id = sas_dev->device_id;
 	slot->n_elem = n_elem;
 	slot->n_elem_dif = n_elem_dif;
 	slot->dlvry_queue = dlvry_queue;
@@ -701,6 +711,7 @@ static struct hisi_sas_device *hisi_sas_alloc_dev(struct domain_device *device)
 			sas_dev->hisi_hba = hisi_hba;
 			sas_dev->sas_device = device;
 			sas_dev->dq = dq;
+			spin_lock_init(&sas_dev->lock);
 			INIT_LIST_HEAD(&hisi_hba->devices[i].list);
 			break;
 		}
@@ -1913,10 +1924,14 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
 	}
 	list_add_tail(&slot->delivery, &dq->list);
 	spin_unlock_irqrestore(&dq->lock, flags_dq);
+	spin_lock_irqsave(&sas_dev->lock, flags);
+	list_add_tail(&slot->entry, &sas_dev->list);
+	spin_unlock_irqrestore(&sas_dev->lock, flags);
 
 	dlvry_queue = dq->id;
 	dlvry_queue_slot = wr_q_index;
 
+	slot->device_id = sas_dev->device_id;
 	slot->n_elem = n_elem;
 	slot->dlvry_queue = dlvry_queue;
 	slot->dlvry_queue_slot = dlvry_queue_slot;
@@ -1940,7 +1955,6 @@ hisi_sas_internal_abort_task_exec(struct hisi_hba *hisi_hba, int device_id,
 	WRITE_ONCE(slot->ready, 1);
 	/* send abort command to the chip */
 	spin_lock_irqsave(&dq->lock, flags);
-	list_add_tail(&slot->entry, &sas_dev->list);
 	hisi_hba->hw->start_delivery(dq);
 	spin_unlock_irqrestore(&dq->lock, flags);
 
@@ -2070,6 +2084,11 @@ hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
 						     abort_flag, tag, dq);
 	case HISI_SAS_INT_ABT_DEV:
 		for (i = 0; i < hisi_hba->cq_nvecs; i++) {
+			struct hisi_sas_cq *cq = &hisi_hba->cq[i];
+			const struct cpumask *mask = cq->pci_irq_mask;
+
+			if (mask && !cpumask_intersects(cpu_online_mask, mask))
+				continue;
 			dq = &hisi_hba->dq[i];
 			rc = _hisi_sas_internal_task_abort(hisi_hba, device,
 							   abort_flag, tag,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
index 39233a9cd1f5..e40cc6b3b67b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v2_hw.c
@@ -873,6 +873,7 @@ hisi_sas_device *alloc_dev_quirk_v2_hw(struct domain_device *device)
 			sas_dev->sas_device = device;
 			sas_dev->sata_idx = sata_idx;
 			sas_dev->dq = dq;
+			spin_lock_init(&sas_dev->lock);
 			INIT_LIST_HEAD(&hisi_hba->devices[i].list);
 			break;
 		}
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 20cec1d2439c..d734aef263d1 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -403,6 +403,7 @@ struct hisi_sas_err_record_v3 {
 #define T10_CHK_APP_TAG_MSK (0xc << T10_CHK_MSK_OFF)
 
 #define BASE_VECTORS_V3_HW  16
+#define MIN_AFFINE_VECTORS_V3_HW  (BASE_VECTORS_V3_HW + 1)
 
 static bool hisi_sas_intr_conv;
 MODULE_PARM_DESC(intr_conv, "interrupt converge enable (0-1)");
@@ -412,6 +413,11 @@ static int prot_mask;
 module_param(prot_mask, int, 0);
 MODULE_PARM_DESC(prot_mask, " host protection capabilities mask, def=0x0 ");
 
+static bool auto_affine_msi_experimental;
+module_param(auto_affine_msi_experimental, bool, 0444);
+MODULE_PARM_DESC(auto_affine_msi_experimental, "Enable auto-affinity of MSI IRQs as experimental:\n"
+		 "default is off");
+
 static u32 hisi_sas_read32(struct hisi_hba *hisi_hba, u32 off)
 {
 	void __iomem *regs = hisi_hba->regs + off;
@@ -2037,19 +2043,64 @@ static irqreturn_t cq_interrupt_v3_hw(int irq_no, void *p)
 	return IRQ_HANDLED;
 }
 
+static void setup_reply_map_v3_hw(struct hisi_hba *hisi_hba, int nvecs)
+{
+	const struct cpumask *mask;
+	int queue, cpu;
+
+	for (queue = 0; queue < nvecs; queue++) {
+		struct hisi_sas_cq *cq = &hisi_hba->cq[queue];
+
+		mask = pci_irq_get_affinity(hisi_hba->pci_dev, queue +
+					    BASE_VECTORS_V3_HW);
+		if (!mask)
+			goto fallback;
+		cq->pci_irq_mask = mask;
+		for_each_cpu(cpu, mask)
+			hisi_hba->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		hisi_hba->reply_map[cpu] = cpu % hisi_hba->queue_count;
+	/* Don't clean all CQ masks */
+}
+
 static int interrupt_init_v3_hw(struct hisi_hba *hisi_hba)
 {
 	struct device *dev = hisi_hba->dev;
 	struct pci_dev *pdev = hisi_hba->pci_dev;
 	int vectors, rc;
 	int i, k;
-	int max_msi = HISI_SAS_MSI_COUNT_V3_HW;
-
-	vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, 1,
-					max_msi, PCI_IRQ_MSI);
-	if (vectors < max_msi) {
-		dev_err(dev, "could not allocate all msi (%d)\n", vectors);
-		return -ENOENT;
+	int max_msi = HISI_SAS_MSI_COUNT_V3_HW, min_msi;
+
+	if (auto_affine_msi_experimental) {
+		struct irq_affinity desc = {
+			.pre_vectors = BASE_VECTORS_V3_HW,
+		};
+
+		min_msi = MIN_AFFINE_VECTORS_V3_HW;
+
+		hisi_hba->reply_map = devm_kcalloc(dev, nr_cpu_ids,
+						   sizeof(unsigned int),
+						   GFP_KERNEL);
+		if (!hisi_hba->reply_map)
+			return -ENOMEM;
+		vectors = pci_alloc_irq_vectors_affinity(hisi_hba->pci_dev,
+							 min_msi, max_msi,
+							 PCI_IRQ_MSI |
+							 PCI_IRQ_AFFINITY,
+							 &desc);
+		if (vectors < 0)
+			return -ENOENT;
+		setup_reply_map_v3_hw(hisi_hba, vectors - BASE_VECTORS_V3_HW);
+	} else {
+		min_msi = max_msi;
+		vectors = pci_alloc_irq_vectors(hisi_hba->pci_dev, min_msi,
+						max_msi, PCI_IRQ_MSI);
+		if (vectors < 0)
+			return vectors;
 	}
 
 	hisi_hba->cq_nvecs = vectors - BASE_VECTORS_V3_HW;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/6] scsi: hisi_sas: Do some more tidy-up
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
                   ` (4 preceding siblings ...)
  2019-02-06 10:52 ` [PATCH 5/6] scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental John Garry
@ 2019-02-06 10:52 ` John Garry
  2019-02-08 23:08 ` [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches Martin K. Petersen
  6 siblings, 0 replies; 11+ messages in thread
From: John Garry @ 2019-02-06 10:52 UTC (permalink / raw)
  To: jejb, martin.petersen; +Cc: linuxarm, linux-kernel, linux-scsi, John Garry

Do some very minor tidy-up, for things like needlessly initing
variable and not leaving whitespace before quote endings.

Originally-from: Xiang Chen <chenxiang66@hisilicon.com>
Originally-from: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c  | 10 +++++-----
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index eff31472b96e..923296653ed7 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -144,7 +144,7 @@ EXPORT_SYMBOL_GPL(hisi_sas_get_ncq_tag);
  */
 u8 hisi_sas_get_prog_phy_linkrate_mask(enum sas_linkrate max)
 {
-	u16 rate = 0;
+	u8 rate = 0;
 	int i;
 
 	max -= SAS_LINK_RATE_1_5_GBPS;
@@ -1180,7 +1180,7 @@ static int hisi_sas_exec_internal_tmf_task(struct domain_device *device,
 		task->task_done = hisi_sas_task_done;
 
 		task->slow_task->timer.function = hisi_sas_tmf_timedout;
-		task->slow_task->timer.expires = jiffies + TASK_TIMEOUT*HZ;
+		task->slow_task->timer.expires = jiffies + TASK_TIMEOUT * HZ;
 		add_timer(&task->slow_task->timer);
 
 		res = hisi_sas_task_exec(task, GFP_KERNEL, 1, tmf);
@@ -1701,8 +1701,8 @@ static int hisi_sas_abort_task_set(struct domain_device *device, u8 *lun)
 
 static int hisi_sas_clear_aca(struct domain_device *device, u8 *lun)
 {
-	int rc = TMF_RESP_FUNC_FAILED;
 	struct hisi_sas_tmf_task tmf_task;
+	int rc;
 
 	tmf_task.tmf = TMF_CLEAR_ACA;
 	rc = hisi_sas_debug_issue_ssp_tmf(device, lun, &tmf_task);
@@ -1752,7 +1752,7 @@ static int hisi_sas_I_T_nexus_reset(struct domain_device *device)
 {
 	struct hisi_hba *hisi_hba = dev_to_hisi_hba(device);
 	struct device *dev = hisi_hba->dev;
-	int rc = TMF_RESP_FUNC_FAILED;
+	int rc;
 
 	rc = hisi_sas_internal_task_abort(hisi_hba, device,
 					  HISI_SAS_INT_ABT_DEV, 0);
@@ -2005,7 +2005,7 @@ _hisi_sas_internal_task_abort(struct hisi_hba *hisi_hba,
 	task->task_proto = device->tproto;
 	task->task_done = hisi_sas_task_done;
 	task->slow_task->timer.function = hisi_sas_tmf_timedout;
-	task->slow_task->timer.expires = jiffies + INTERNAL_ABORT_TIMEOUT*HZ;
+	task->slow_task->timer.expires = jiffies + INTERNAL_ABORT_TIMEOUT * HZ;
 	add_timer(&task->slow_task->timer);
 
 	res = hisi_sas_internal_abort_task_exec(hisi_hba, sas_dev->device_id,
diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index d734aef263d1..0f05fd409f6b 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -11,7 +11,7 @@
 #include "hisi_sas.h"
 #define DRV_NAME "hisi_sas_v3_hw"
 
-/* global registers need init*/
+/* global registers need init */
 #define DLVRY_QUEUE_ENABLE		0x0
 #define IOST_BASE_ADDR_LO		0x8
 #define IOST_BASE_ADDR_HI		0xc
@@ -728,7 +728,7 @@ static void clear_itct_v3_hw(struct hisi_hba *hisi_hba,
 		hisi_sas_write32(hisi_hba, ENT_INT_SRC3,
 				 ENT_INT_SRC3_ITC_INT_MSK);
 
-	/* clear the itct table*/
+	/* clear the itct table */
 	reg_val = ITCT_CLR_EN_MSK | (dev_id & ITCT_DEV_MSK);
 	hisi_sas_write32(hisi_hba, ITCT_CLR, reg_val);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches
  2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
                   ` (5 preceding siblings ...)
  2019-02-06 10:52 ` [PATCH 6/6] scsi: hisi_sas: Do some more tidy-up John Garry
@ 2019-02-08 23:08 ` Martin K. Petersen
  6 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2019-02-08 23:08 UTC (permalink / raw)
  To: John Garry; +Cc: jejb, martin.petersen, linuxarm, linux-kernel, linux-scsi


John,

> This patchset introduces the following:
> - DIX support for v3 hw

Applied to 5.1/scsi-queue, thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw
  2019-02-06 10:52 ` [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw John Garry
@ 2019-02-08 23:13   ` Martin K. Petersen
  2019-02-11 11:36     ` chenxiang (M)
  0 siblings, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2019-02-08 23:13 UTC (permalink / raw)
  To: John Garry
  Cc: jejb, martin.petersen, linuxarm, linux-kernel, linux-scsi, Xiang Chen


John,

Just noticed this while inspecting the resulting complete diff:

>  static u32 get_prot_chk_msk_v3_hw(struct scsi_cmnd *scsi_cmnd)
>  {
>  	unsigned char prot_flags = scsi_cmnd->prot_flags;
>  
> -	if (prot_flags & SCSI_PROT_TRANSFER_PI) {
> -		if (prot_flags & SCSI_PROT_REF_CHECK)
> -			return 0xc << 16;
> -		return 0xfc << 16;
> -	}
> -	return 0;
> +	if (prot_flags & SCSI_PROT_REF_CHECK)
> +		return T10_CHK_APP_TAG_MSK;

Polarity is a bit unclear here. Is this statement disabling checking of
the app tag?

> +	return T10_CHK_REF_TAG_MSK | T10_CHK_APP_TAG_MSK;
>  }

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw
  2019-02-08 23:13   ` Martin K. Petersen
@ 2019-02-11 11:36     ` chenxiang (M)
  2019-02-13  3:01       ` Martin K. Petersen
  0 siblings, 1 reply; 11+ messages in thread
From: chenxiang (M) @ 2019-02-11 11:36 UTC (permalink / raw)
  To: Martin K. Petersen, John Garry; +Cc: jejb, linuxarm, linux-kernel, linux-scsi

Hi Martin,

在 2019/2/9 7:13, Martin K. Petersen 写道:
> John,
>
> Just noticed this while inspecting the resulting complete diff:
>
>>   static u32 get_prot_chk_msk_v3_hw(struct scsi_cmnd *scsi_cmnd)
>>   {
>>   	unsigned char prot_flags = scsi_cmnd->prot_flags;
>>   
>> -	if (prot_flags & SCSI_PROT_TRANSFER_PI) {
>> -		if (prot_flags & SCSI_PROT_REF_CHECK)
>> -			return 0xc << 16;
>> -		return 0xfc << 16;
>> -	}
>> -	return 0;
>> +	if (prot_flags & SCSI_PROT_REF_CHECK)
>> +		return T10_CHK_APP_TAG_MSK;
> Polarity is a bit unclear here. Is this statement disabling checking of
> the app tag?

Yes, disabling checking of app tag.
For here, 1'b0 presents the related byte in DIF will be checked, and 
1'b1 presents the related byte in DIF will not be checked.


>
>> +	return T10_CHK_REF_TAG_MSK | T10_CHK_APP_TAG_MSK;
>>   }



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw
  2019-02-11 11:36     ` chenxiang (M)
@ 2019-02-13  3:01       ` Martin K. Petersen
  0 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2019-02-13  3:01 UTC (permalink / raw)
  To: chenxiang (M)
  Cc: Martin K. Petersen, John Garry, jejb, linuxarm, linux-kernel, linux-scsi


chenxiang,

>>> +	if (prot_flags & SCSI_PROT_REF_CHECK)
>>> +		return T10_CHK_APP_TAG_MSK;

>> Polarity is a bit unclear here. Is this statement disabling checking of
>> the app tag?
>
> Yes, disabling checking of app tag.
> For here, 1'b0 presents the related byte in DIF will be checked, and
> 1'b1 presents the related byte in DIF will not be checked.

OK, thanks for the clarification!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-02-13  3:02 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06 10:52 [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches John Garry
2019-02-06 10:52 ` [PATCH 1/6] scsi: hisi_sas: Add support for DIX feature for v3 hw John Garry
2019-02-08 23:13   ` Martin K. Petersen
2019-02-11 11:36     ` chenxiang (M)
2019-02-13  3:01       ` Martin K. Petersen
2019-02-06 10:52 ` [PATCH 2/6] scsi: hisi_sas: Add manual trigger for debugfs dump John Garry
2019-02-06 10:52 ` [PATCH 3/6] scsi: hisi_sas: change queue depth from 512 to 4096 John Garry
2019-02-06 10:52 ` [PATCH 4/6] scsi: hisi_sas: Issue internal abort on all relevant queues John Garry
2019-02-06 10:52 ` [PATCH 5/6] scsi: hisi_sas: Use pci_irq_get_affinity() for v3 hw as experimental John Garry
2019-02-06 10:52 ` [PATCH 6/6] scsi: hisi_sas: Do some more tidy-up John Garry
2019-02-08 23:08 ` [PATCH 0/6] hisi_sas: DIX support, performance optimisation, and misc patches Martin K. Petersen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).