linux-block.vger.kernel.org archive mirror
* [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset
@ 2018-02-27 10:07 Ming Lei
  2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
                   ` (8 more replies)
  0 siblings, 9 replies; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei

Hi All,

The first two patches fix reply queue selection. This issue has been
reported and can cause an IO hang during boot, so please consider them
for v4.16.

The following six patches try to improve the host-wide tagset support on
hpsa and megaraid_sas by making the hw queues per-NUMA-node.

I don't have high-performance hpsa or megaraid_sas devices at hand.

Don Brace, could you test this patchset with concurrent IOs on your hpsa
SSDs and see whether this approach works well?

Kashyap, could you test this patchset on your megaraid_sas SSDs?

	gitweb: https://github.com/ming1/linux/tree/v4.16-rc-host-tags-v3.2

thanks,
Ming

Hannes Reinecke (1):
  scsi: Add template flag 'host_tagset'

Ming Lei (7):
  scsi: hpsa: fix selection of reply queue
  scsi: megaraid_sas: fix selection of reply queue
  blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
  blk-mq: introduce BLK_MQ_F_HOST_TAGS
  block: null_blk: introduce module parameter of 'g_host_tags'
  scsi: hpsa: improve scsi_mq performance via .host_tagset
  scsi: megaraid: improve scsi_mq performance via .host_tagset

 block/blk-mq-debugfs.c                      |  2 +
 block/blk-mq-sched.c                        |  2 +-
 block/blk-mq-tag.c                          | 13 +++--
 block/blk-mq-tag.h                          | 11 ++--
 block/blk-mq.c                              | 50 +++++++++++++++---
 block/blk-mq.h                              |  3 +-
 drivers/block/null_blk.c                    |  6 +++
 drivers/scsi/hpsa.c                         | 79 ++++++++++++++++++++++-------
 drivers/scsi/hpsa.h                         |  1 +
 drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
 drivers/scsi/megaraid/megaraid_sas_base.c   | 40 ++++++++++++++-
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++---
 drivers/scsi/scsi_lib.c                     |  2 +
 include/linux/blk-mq.h                      |  2 +
 include/scsi/scsi_host.h                    |  3 ++
 15 files changed, 182 insertions(+), 46 deletions(-)

-- 
2.9.5


* [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-03-01 16:18   ` Don Brace
  2018-03-08  7:50   ` Christoph Hellwig
  2018-02-27 10:07 ` [PATCH V3 2/8] scsi: megaraid_sas: " Ming Lei
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei, Meelis Roos

Since commit 84676c1f21 ("genirq/affinity: assign vectors to all possible
CPUs"), an MSI-X vector can be created without any online CPU mapped to
it, so a command's completion may never be notified.

This patch sets up a mapping between CPU and reply queue according to the
IRQ affinity info retrieved by pci_irq_get_affinity(), and uses this
mapping table to choose the reply queue when queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes the IO
hang caused by using an inactive reply queue that has no online CPU
mapped.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++--------------
 drivers/scsi/hpsa.h |  1 +
 2 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 5293e6827ce5..3a9eca163db8 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
 		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
 		if (unlikely(!h->msix_vectors))
 			return;
-		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-			c->Header.ReplyQueue =
-				raw_smp_processor_id() % h->nreply_queues;
-		else
-			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+		c->Header.ReplyQueue = reply_queue;
 	}
 }
 
@@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->ReplyQueue = reply_queue % h->nreply_queues;
+	cp->ReplyQueue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit (bit 0)
@@ -1087,10 +1080,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
 	/* Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/* Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
 	 *  - pull count (bits 0-3)
@@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
@@ -1157,6 +1144,8 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 {
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
+
+	reply_queue = h->reply_map[raw_smp_processor_id()];
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -7376,6 +7365,26 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
+static void hpsa_setup_reply_map(struct ctlr_info *h)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < h->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(h->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			h->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		h->reply_map[cpu] = 0;
+}
+
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info *h)
 	err = hpsa_interrupt_mode(h);
 	if (err)
 		goto clean1;
+
+	/* setup mapping between CPU and reply queue */
+	hpsa_setup_reply_map(h);
+
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8480,6 +8493,28 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 	return wq;
 }
 
+static void hpda_free_ctlr_info(struct ctlr_info *h)
+{
+	kfree(h->reply_map);
+	kfree(h);
+}
+
+static struct ctlr_info *hpda_alloc_ctlr_info(void)
+{
+	struct ctlr_info *h;
+
+	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	if (!h)
+		return NULL;
+
+	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
+	if (!h->reply_map) {
+		kfree(h);
+		return NULL;
+	}
+	return h;
+}
+
 static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	int dac, rc;
@@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * the driver.  See comments in hpsa.h for more info.
 	 */
 	BUILD_BUG_ON(sizeof(struct CommandList) % COMMANDLIST_ALIGNMENT);
-	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	h = hpda_alloc_ctlr_info();
 	if (!h) {
 		dev_err(&pdev->dev, "Failed to allocate controller head\n");
 		return -ENOMEM;
@@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev *pdev)
 	h->lockup_detected = NULL;			/* init_one 2 */
 	/* (void) pci_disable_pcie_error_reporting(pdev); */	/* init_one 1 */
 
-	kfree(h);					/* init_one 1 */
+	hpda_free_ctlr_info(h);				/* init_one 1 */
 }
 
 static int hpsa_suspend(__attribute__((unused)) struct pci_dev *pdev,
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 018f980a701c..fb9f5e7f8209 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -158,6 +158,7 @@ struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
+	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;
-- 
2.9.5


* [PATCH V3 2/8] scsi: megaraid_sas: fix selection of reply queue
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
  2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-02-27 10:07 ` [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags' Ming Lei
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei, Meelis Roos

Since commit 84676c1f21 ("genirq/affinity: assign vectors to all possible
CPUs"), an MSI-X vector can be created without any online CPU mapped to
it, so a command may be queued on such a vector and its completion will
never be notified.

This patch sets up a mapping between CPU and reply queue according to the
IRQ affinity info retrieved by pci_irq_get_affinity(), and uses this info
to choose the reply queue when queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes the IO
hang caused by using an inactive reply queue that has no online CPU
mapped.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
 drivers/scsi/megaraid/megaraid_sas_base.c   | 34 ++++++++++++++++++++++++++++-
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++++------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index ba6503f37756..a644d2be55b6 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -2127,7 +2127,7 @@ enum MR_PD_TYPE {
 #define MR_NVME_PAGE_SIZE_MASK		0x000000FF
 
 struct megasas_instance {
-
+	unsigned int *reply_map;
 	__le32 *producer;
 	dma_addr_t producer_h;
 	__le32 *consumer;
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index a71ee67df084..065956cb2aeb 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5165,6 +5165,26 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
 		instance->use_seqnum_jbod_fp = false;
 }
 
+static void megasas_setup_reply_map(struct megasas_instance *instance)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < instance->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(instance->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			instance->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		instance->reply_map[cpu] = 0;
+}
+
 /**
  * megasas_init_fw -	Initializes the FW
  * @instance:		Adapter soft state
@@ -5343,6 +5363,8 @@ static int megasas_init_fw(struct megasas_instance *instance)
 			goto fail_setup_irqs;
 	}
 
+	megasas_setup_reply_map(instance);
+
 	dev_info(&instance->pdev->dev,
 		"firmware supports msix\t: (%d)", fw_msix_count);
 	dev_info(&instance->pdev->dev,
@@ -6448,6 +6470,11 @@ static int megasas_probe_one(struct pci_dev *pdev,
 	memset(instance, 0, sizeof(*instance));
 	atomic_set(&instance->fw_reset_no_pci_access, 0);
 
+	instance->reply_map = kzalloc(sizeof(unsigned int) * nr_cpu_ids,
+			GFP_KERNEL);
+	if (!instance->reply_map)
+		goto fail_alloc_reply_map;
+
 	/*
 	 * Initialize PCI related and misc parameters
 	 */
@@ -6539,8 +6566,9 @@ static int megasas_probe_one(struct pci_dev *pdev,
 	if (instance->msix_vectors)
 		pci_free_irq_vectors(instance->pdev);
 fail_init_mfi:
+	kfree(instance->reply_map);
+fail_alloc_reply_map:
 	scsi_host_put(host);
-
 fail_alloc_instance:
 	pci_disable_device(pdev);
 
@@ -6746,6 +6774,8 @@ megasas_resume(struct pci_dev *pdev)
 	if (rval < 0)
 		goto fail_reenable_msix;
 
+	megasas_setup_reply_map(instance);
+
 	if (instance->adapter_type != MFI_SERIES) {
 		megasas_reset_reply_desc(instance);
 		if (megasas_ioc_init_fusion(instance)) {
@@ -6960,6 +6990,8 @@ static void megasas_detach_one(struct pci_dev *pdev)
 
 	megasas_free_ctrl_mem(instance);
 
+	kfree(instance->reply_map);
+
 	scsi_host_put(host);
 
 	pci_disable_device(pdev);
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 073ced07e662..2994176a0121 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2655,11 +2655,8 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 			fp_possible = (io_info.fpOkForIo > 0) ? true : false;
 	}
 
-	/* Use raw_smp_processor_id() for now until cmd->request->cpu is CPU
-	   id by default, not CPU group id, otherwise all MSI-X queues won't
-	   be utilized */
-	cmd->request_desc->SCSIIO.MSIxIndex = instance->msix_vectors ?
-		raw_smp_processor_id() % instance->msix_vectors : 0;
+	cmd->request_desc->SCSIIO.MSIxIndex =
+		instance->reply_map[raw_smp_processor_id()];
 
 	praid_context = &io_request->RaidContext;
 
@@ -2985,10 +2982,9 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	}
 
 	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
-	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->msix_vectors ?
-		(raw_smp_processor_id() % instance->msix_vectors) : 0;
 
+	cmd->request_desc->SCSIIO.MSIxIndex =
+		instance->reply_map[raw_smp_processor_id()];
 
 	if (!fp_possible) {
 		/* system pd firmware path */
-- 
2.9.5


* [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
  2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
  2018-02-27 10:07 ` [PATCH V3 2/8] scsi: megaraid_sas: " Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-03-08  7:51   ` Christoph Hellwig
  2018-02-27 10:07 ` [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS Ming Lei
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei

This patch introduces a 'start_tag' field to 'struct blk_mq_tags' so that
a host-wide tagset can be supported easily in the following patches by
partitioning the host-wide tags across multiple hw queues.
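In other words, the intended convention is as follows ('local_tag' and
'host_tag' are just illustrative names, not fields in the patch; a minimal
sketch assuming the partitioning added later in this series):

	/*
	 * Sketch only: each hw queue owns the range
	 * [start_tag, start_tag + nr_tags) of the host-wide tag space,
	 * while its sbitmap and rqs[]/static_rqs[] arrays stay indexed
	 * from 0, so conversions are a simple offset.
	 */
	host_tag  = local_tag + tags->start_tag;  /* blk_mq_get_tag() */
	local_tag = host_tag - tags->start_tag;   /* blk_mq_put_tag(), rqs[] */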

No functional change.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c | 3 ++-
 block/blk-mq-tag.h | 6 ++++--
 block/blk-mq.c     | 7 ++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 336dde07b230..5014d7343ea9 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -179,12 +179,13 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
 	finish_wait(&ws->wait, &wait);
 
 found_tag:
-	return tag + tag_offset;
+	return tag + tag_offset + tags->start_tag;
 }
 
 void blk_mq_put_tag(struct blk_mq_hw_ctx *hctx, struct blk_mq_tags *tags,
 		    struct blk_mq_ctx *ctx, unsigned int tag)
 {
+	tag -= tags->start_tag;
 	if (!blk_mq_tag_is_reserved(tags, tag)) {
 		const int real_tag = tag - tags->nr_reserved_tags;
 
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..1d629920db69 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,6 +13,8 @@ struct blk_mq_tags {
 
 	atomic_t active_queues;
 
+	unsigned int start_tag;
+
 	struct sbitmap_queue bitmap_tags;
 	struct sbitmap_queue breserved_tags;
 
@@ -78,13 +80,13 @@ static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
 static inline void blk_mq_tag_set_rq(struct blk_mq_hw_ctx *hctx,
 		unsigned int tag, struct request *rq)
 {
-	hctx->tags->rqs[tag] = rq;
+	hctx->tags->rqs[tag - hctx->tags->start_tag] = rq;
 }
 
 static inline bool blk_mq_tag_is_reserved(struct blk_mq_tags *tags,
 					  unsigned int tag)
 {
-	return tag < tags->nr_reserved_tags;
+	return (tag - tags->start_tag) < tags->nr_reserved_tags;
 }
 
 #endif
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 357492712b0e..5ea11d087f7b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -270,7 +270,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		unsigned int tag, unsigned int op)
 {
 	struct blk_mq_tags *tags = blk_mq_tags_from_data(data);
-	struct request *rq = tags->static_rqs[tag];
+	struct request *rq = tags->static_rqs[tag - tags->start_tag];
 	req_flags_t rq_flags = 0;
 
 	if (data->flags & BLK_MQ_REQ_INTERNAL) {
@@ -283,7 +283,7 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		}
 		rq->tag = tag;
 		rq->internal_tag = -1;
-		data->hctx->tags->rqs[rq->tag] = rq;
+		data->hctx->tags->rqs[rq->tag - tags->start_tag] = rq;
 	}
 
 	/* csd/requeue_work/fifo_time is initialized before use */
@@ -801,6 +801,7 @@ EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 {
+	tag -= tags->start_tag;
 	if (tag < tags->nr_tags) {
 		prefetch(tags->rqs[tag]);
 		return tags->rqs[tag];
@@ -1076,7 +1077,7 @@ bool blk_mq_get_driver_tag(struct request *rq, struct blk_mq_hw_ctx **hctx,
 			rq->rq_flags |= RQF_MQ_INFLIGHT;
 			atomic_inc(&data.hctx->nr_active);
 		}
-		data.hctx->tags->rqs[rq->tag] = rq;
+		data.hctx->tags->rqs[rq->tag - data.hctx->tags->start_tag] = rq;
 	}
 
 done:
-- 
2.9.5


* [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
                   ` (2 preceding siblings ...)
  2018-02-27 10:07 ` [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags' Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-03-08  7:52   ` Christoph Hellwig
  2018-02-27 10:07 ` [PATCH V3 5/8] scsi: Add template flag 'host_tagset' Ming Lei
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei

This patch adds support for partitioning host-wide tags across multiple hw
queues, so that each hw queue's related data structures (tags, hctx) can be
accessed with NUMA locality; for example, the hw queues can be made
per-NUMA-node.
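As a worked example with assumed numbers (purely illustrative, not taken
from the patch):

	/*
	 * __host_queue_depth = 62, nr_hw_queues = 4:
	 *   per-queue depth = 62 / 4 = 15, extra = 62 - 4 * 15 = 2
	 *
	 *   hw queue 0: depth 17 (15 + extra), start_tag =  0 -> tags  0..16
	 *   hw queue 1: depth 15,              start_tag = 17 -> tags 17..31
	 *   hw queue 2: depth 15,              start_tag = 32 -> tags 32..46
	 *   hw queue 3: depth 15,              start_tag = 47 -> tags 47..61
	 *
	 * so the driver still sees a single contiguous host-wide tag space.
	 */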

It is observed that IOPS can be improved significantly this way in null_blk tests.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-debugfs.c |  2 ++
 block/blk-mq-sched.c   |  2 +-
 block/blk-mq-tag.c     | 10 +++++++---
 block/blk-mq-tag.h     |  5 ++++-
 block/blk-mq.c         | 43 ++++++++++++++++++++++++++++++++++++++-----
 block/blk-mq.h         |  3 ++-
 include/linux/blk-mq.h |  2 ++
 7 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 21cbc1f071c6..56b4a572f233 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -206,6 +206,7 @@ static const char *const hctx_flag_name[] = {
 	HCTX_FLAG_NAME(SHOULD_MERGE),
 	HCTX_FLAG_NAME(TAG_SHARED),
 	HCTX_FLAG_NAME(SG_MERGE),
+	HCTX_FLAG_NAME(HOST_TAGS),
 	HCTX_FLAG_NAME(BLOCKING),
 	HCTX_FLAG_NAME(NO_SCHED),
 };
@@ -434,6 +435,7 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
 	seq_printf(m, "nr_reserved_tags=%u\n", tags->nr_reserved_tags);
 	seq_printf(m, "active_queues=%d\n",
 		   atomic_read(&tags->active_queues));
+	seq_printf(m, "start_tag=%u\n", tags->start_tag);
 
 	seq_puts(m, "\nbitmap_tags:\n");
 	sbitmap_queue_show(&tags->bitmap_tags, m);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 25c14c58385c..d895a57f945a 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -497,7 +497,7 @@ static int blk_mq_sched_alloc_tags(struct request_queue *q,
 	int ret;
 
 	hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
-					       set->reserved_tags);
+					       set->reserved_tags, 0);
 	if (!hctx->sched_tags)
 		return -ENOMEM;
 
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 5014d7343ea9..cc8886f82c71 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -380,9 +380,11 @@ static struct blk_mq_tags *blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
 	return NULL;
 }
 
-struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
+struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+				     unsigned int total_tags,
 				     unsigned int reserved_tags,
-				     int node, int alloc_policy)
+				     int node, int alloc_policy,
+				     unsigned int start_tag)
 {
 	struct blk_mq_tags *tags;
 
@@ -397,6 +399,7 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
 
 	tags->nr_tags = total_tags;
 	tags->nr_reserved_tags = reserved_tags;
+	tags->start_tag = start_tag;
 
 	return blk_mq_init_bitmap_tags(tags, node, alloc_policy);
 }
@@ -438,7 +441,8 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 		if (tdepth > 16 * BLKDEV_MAX_RQ)
 			return -EINVAL;
 
-		new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, 0);
+		new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth, 0,
+				tags->start_tag);
 		if (!new)
 			return -ENOMEM;
 		ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 1d629920db69..9cd195cb15d0 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -14,6 +14,7 @@ struct blk_mq_tags {
 	atomic_t active_queues;
 
 	unsigned int start_tag;
+	bool	host_wide;
 
 	struct sbitmap_queue bitmap_tags;
 	struct sbitmap_queue breserved_tags;
@@ -24,7 +25,9 @@ struct blk_mq_tags {
 };
 
 
-extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags, unsigned int reserved_tags, int node, int alloc_policy);
+extern struct blk_mq_tags *blk_mq_init_tags(struct blk_mq_tag_set *set,
+		unsigned int nr_tags, unsigned int reserved_tags, int node,
+		int alloc_policy, unsigned int start_tag);
 extern void blk_mq_free_tags(struct blk_mq_tags *tags);
 
 extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ea11d087f7b..6ebe053b2280 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2024,7 +2024,8 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags)
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
-					unsigned int reserved_tags)
+					unsigned int reserved_tags,
+					unsigned int start_tag)
 {
 	struct blk_mq_tags *tags;
 	int node;
@@ -2033,8 +2034,9 @@ struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
 
-	tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
-				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
+	tags = blk_mq_init_tags(set, nr_tags, reserved_tags, node,
+				BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags),
+				start_tag);
 	if (!tags)
 		return NULL;
 
@@ -2086,6 +2088,9 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	size_t rq_size, left;
 	int node;
 
+	if (tags->host_wide && !hctx_idx)
+		depth += set->__host_queue_depth - set->nr_hw_queues * set->queue_depth;
+
 	node = blk_mq_hw_queue_to_node(set->mq_map, hctx_idx);
 	if (node == NUMA_NO_NODE)
 		node = set->numa_node;
@@ -2335,12 +2340,25 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
 {
 	int ret = 0;
+	unsigned int queue_depth = set->queue_depth;
+	unsigned int extra, start_tag = 0;
+
+	if (set->flags & BLK_MQ_F_HOST_TAGS) {
+		extra = set->__host_queue_depth - set->nr_hw_queues * queue_depth;
+		/* Assign extra tags to hw queue 0 */
+		if (hctx_idx == 0)
+			queue_depth += extra;
+		else
+			start_tag = hctx_idx * queue_depth + extra;
+	}
 
-	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
-					set->queue_depth, set->reserved_tags);
+	set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx, queue_depth,
+						  set->reserved_tags,
+						  start_tag);
 	if (!set->tags[hctx_idx])
 		return false;
 
+	set->tags[hctx_idx]->host_wide = !!(set->flags & BLK_MQ_F_HOST_TAGS);
 	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
 				set->queue_depth);
 	if (!ret)
@@ -2892,6 +2910,21 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
+	/*
+	 * Divide host tags to each hw queues equally, and assign extra
+	 * tags to hw queue 0, see __blk_mq_alloc_rq_map().
+	 *
+	 * It is driver's responsility to choose a suitable 'nr_hw_queues'
+	 * for getting a good 'hw queue depth', so that enough parallelism
+	 * can be exploited from device internal view to get good performance,
+	 * for example, 32 is often fine for HDD., and 256 or a bit less is
+	 * enough for SSD.
+	 */
+	if (set->flags & BLK_MQ_F_HOST_TAGS) {
+		set->__host_queue_depth = set->queue_depth;
+		set->queue_depth = set->__host_queue_depth / set->nr_hw_queues;
+	}
+
 	ret = blk_mq_alloc_rq_maps(set);
 	if (ret)
 		goto out_free_mq_map;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..ea9a46517c8a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -61,7 +61,8 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags);
 struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
 					unsigned int hctx_idx,
 					unsigned int nr_tags,
-					unsigned int reserved_tags);
+					unsigned int reserved_tags,
+					unsigned int start_tag);
 int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 		     unsigned int hctx_idx, unsigned int depth);
 
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49796a3..cff01125bba7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -74,6 +74,7 @@ struct blk_mq_tag_set {
 	const struct blk_mq_ops	*ops;
 	unsigned int		nr_hw_queues;
 	unsigned int		queue_depth;	/* max hw supported */
+	unsigned int		__host_queue_depth;	/* BLK_MQ_F_HOST_TAGS */
 	unsigned int		reserved_tags;
 	unsigned int		cmd_size;	/* per-request extra data */
 	int			numa_node;
@@ -175,6 +176,7 @@ enum {
 	BLK_MQ_F_SHOULD_MERGE	= 1 << 0,
 	BLK_MQ_F_TAG_SHARED	= 1 << 1,
 	BLK_MQ_F_SG_MERGE	= 1 << 2,
+	BLK_MQ_F_HOST_TAGS	= 1 << 3,
 	BLK_MQ_F_BLOCKING	= 1 << 5,
 	BLK_MQ_F_NO_SCHED	= 1 << 6,
 	BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
-- 
2.9.5


* [PATCH V3 5/8] scsi: Add template flag 'host_tagset'
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
                   ` (3 preceding siblings ...)
  2018-02-27 10:07 ` [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-02-27 10:07 ` [PATCH V3 6/8] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Hannes Reinecke, Ming Lei

From: Hannes Reinecke <hare@suse.com>

Add a host template flag 'host_tagset' to enable the use of a global
tagset for block-mq.

Cc: Hannes Reinecke <hare@suse.de>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c  | 2 ++
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a86df9ca7d1c..8e6f118f1066 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2291,6 +2291,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	shost->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
 	shost->tag_set.flags |=
 		BLK_ALLOC_POLICY_TO_MQ_FLAG(shost->hostt->tag_alloc_policy);
+	if (shost->hostt->host_tagset)
+		shost->tag_set.flags |= BLK_MQ_F_HOST_TAGS;
 	shost->tag_set.driver_data = shost;
 
 	return blk_mq_alloc_tag_set(&shost->tag_set);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 1a1df0d21ee3..1b35d9cb59b3 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -457,6 +457,9 @@ struct scsi_host_template {
 	 */
 	unsigned int max_host_blocked;
 
+	/* True if the host supports a host-wide tagspace */
+	unsigned host_tagset:1;
+
 	/*
 	 * Default value for the blocking.  If the queue is empty,
 	 * host_blocked counts down in the request_fn until it restarts
-- 
2.9.5


* [PATCH V3 6/8] block: null_blk: introduce module parameter of 'g_host_tags'
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
                   ` (4 preceding siblings ...)
  2018-02-27 10:07 ` [PATCH V3 5/8] scsi: Add template flag 'host_tagset' Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-02-27 10:07 ` [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset Ming Lei
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei

This patch introduces the module parameter 'g_host_tags' so that this
feature can be tested easily with null_blk.

With 'host_tags' enabled and the total hw queue depth kept the same, it is
observed that IOPS can be improved by ~50% on a dual-socket (16 CPU cores
in total) system:

1) no 'host_tags', each hw queue depth is 16, and 1 hw queue
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=0 submit_queues=1 hw_queue_depth=16

IOPS: 1382K

2) 'host_tags', each hw queue depth is 8, and 2 hw queues
modprobe null_blk queue_mode=2 nr_devices=4 shared_tags=1 host_tags=1 submit_queues=2 hw_queue_depth=16

IOPS: 2124K

3) fio test run in both settings above:
fio --bs=4k --size=512G  --rw=randread --norandommap --direct=1 --ioengine=libaio --iodepth=4 --runtime=$RUNTIME --group_reporting=1  --name=nullb0 --filename=/dev/nullb0 --name=nullb1 --filename=/dev/nullb1 --name=nullb2 --filename=/dev/nullb2 --name=nullb3 --filename=/dev/nullb3

Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/block/null_blk.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 287a09611c0f..51b16249028a 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -163,6 +163,10 @@ static int g_submit_queues = 1;
 module_param_named(submit_queues, g_submit_queues, int, S_IRUGO);
 MODULE_PARM_DESC(submit_queues, "Number of submission queues");
 
+static int g_host_tags = 0;
+module_param_named(host_tags, g_host_tags, int, S_IRUGO);
+MODULE_PARM_DESC(host_tags, "All submission queues share one tags");
+
 static int g_home_node = NUMA_NO_NODE;
 module_param_named(home_node, g_home_node, int, S_IRUGO);
 MODULE_PARM_DESC(home_node, "Home node for the device");
@@ -1622,6 +1626,8 @@ static int null_init_tag_set(struct nullb *nullb, struct blk_mq_tag_set *set)
 	set->flags = BLK_MQ_F_SHOULD_MERGE;
 	if (g_no_sched)
 		set->flags |= BLK_MQ_F_NO_SCHED;
+	if (g_host_tags)
+		set->flags |= BLK_MQ_F_HOST_TAGS;
 	set->driver_data = NULL;
 
 	if ((nullb && nullb->dev->blocking) || g_blocking)
-- 
2.9.5


* [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
                   ` (5 preceding siblings ...)
  2018-02-27 10:07 ` [PATCH V3 6/8] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-03-08  7:54   ` Christoph Hellwig
  2018-02-27 10:07 ` [PATCH V3 8/8] scsi: megaraid: " Ming Lei
  2018-03-01 21:46 ` [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Laurence Oberman
  8 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei

It is observed on null_blk that IOPS can be improved considerably simply
by making the hw queues per-NUMA-node, so this patch applies the newly
introduced .host_tagset flag to improve performance.

In practice, .can_queue is quite big and the NUMA node count is small, so
each hw queue's depth should still be high enough to saturate the device.
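For illustration, with assumed (hypothetical) numbers:

	/*
	 * can_queue = 1024, assuming 256 tags are enough per hw queue:
	 *   max_queues  = DIV_ROUND_UP(1024, 256) = 4
	 *   nr_node_ids = 2  ->  nr_hw_queues = min(2, 4) = 2
	 * With BLK_MQ_F_HOST_TAGS the host-wide depth is then divided,
	 * leaving each per-node hw queue a depth of 512.
	 */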

Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 3a9eca163db8..0747751b7e1c 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -978,6 +978,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.shost_attrs = hpsa_shost_attrs,
 	.max_sectors = 1024,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 static inline u32 next_command(struct ctlr_info *h, u8 q)
@@ -5761,6 +5762,11 @@ static int hpsa_scsi_host_alloc(struct ctlr_info *h)
 static int hpsa_scsi_add_host(struct ctlr_info *h)
 {
 	int rv;
+	/* 256 tags should be high enough to saturate device */
+	int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
+
+	/* per NUMA node hw queue */
+	h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
 
 	rv = scsi_add_host(h->scsi_host, &h->pdev->dev);
 	if (rv) {
-- 
2.9.5


* [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
                   ` (6 preceding siblings ...)
  2018-02-27 10:07 ` [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset Ming Lei
@ 2018-02-27 10:07 ` Ming Lei
  2018-02-28 14:58   ` Kashyap Desai
  2018-03-01 21:46 ` [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Laurence Oberman
  8 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-27 10:07 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Ming Lei

It is observed on null_blk that IOPS can be improved considerably simply
by making the hw queues per-NUMA-node, so this patch applies the newly
introduced .host_tagset flag to improve performance.

In practice, .can_queue is quite big and the NUMA node count is small, so
each hw queue's depth should still be high enough to saturate the device.

Cc: Arun Easi <arun.easi@cavium.com>
Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Peter Rivera <peter.rivera@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 065956cb2aeb..0b46f97cbfdb 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3177,6 +3177,7 @@ static struct scsi_host_template megasas_template = {
 	.use_clustering = ENABLE_CLUSTERING,
 	.change_queue_depth = scsi_change_queue_depth,
 	.no_write_same = 1,
+	.host_tagset = 1,
 };
 
 /**
@@ -5947,6 +5948,8 @@ static int megasas_start_aen(struct megasas_instance *instance)
 static int megasas_io_attach(struct megasas_instance *instance)
 {
 	struct Scsi_Host *host = instance->host;
+	/* 256 tags should be high enough to saturate device */
+	int max_queues = DIV_ROUND_UP(host->can_queue, 256);
 
 	/*
 	 * Export parameters required by SCSI mid-layer
@@ -5987,6 +5990,9 @@ static int megasas_io_attach(struct megasas_instance *instance)
 	host->max_lun = MEGASAS_MAX_LUN;
 	host->max_cmd_len = 16;
 
+	/* per NUMA node hw queue */
+	host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
+
 	/*
 	 * Notify the mid-layer about the new controller
 	 */
-- 
2.9.5


* RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-02-27 10:07 ` [PATCH V3 8/8] scsi: megaraid: " Ming Lei
@ 2018-02-28 14:58   ` Kashyap Desai
  2018-02-28 15:21     ` Ming Lei
  2018-03-07  5:27     ` Ming Lei
  0 siblings, 2 replies; 54+ messages in thread
From: Kashyap Desai @ 2018-02-28 14:58 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

Ming -

Quick testing on my setup - performance slightly degraded (a 4-5% drop) for
the megaraid_sas driver with this patch (from 1610K IOPS it goes to 1544K).
I confirm that after applying this patch, we have #queues = #NUMA nodes.

ls -l
/sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2:23/10:
2:23:0/block/sdy/mq
total 0
drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
drwxr-xr-x. 18 root root 0 Feb 28 09:53 1


I would suggest skipping the megaraid_sas driver changes using shared_tagset
unless there is an obvious gain. If the overall shared_tagset interface is
committed to the kernel tree, we will investigate the real benefit of using
it for the megaraid_sas driver in the future.

Without patch -

  4.64%  [megaraid_sas]           [k] complete_cmd_fusion
   3.23%  [kernel]                 [k] irq_entries_start
   3.18%  [kernel]                 [k] _raw_spin_lock
   3.06%  [kernel]                 [k] syscall_return_via_sysret
   2.74%  [kernel]                 [k] bt_iter
   2.55%  [kernel]                 [k] scsi_queue_rq
   2.21%  [megaraid_sas]           [k] megasas_build_io_fusion
   1.80%  [megaraid_sas]           [k] megasas_queue_command
   1.59%  [kernel]                 [k] __audit_syscall_exit
   1.55%  [kernel]                 [k] _raw_spin_lock_irqsave
   1.38%  [megaraid_sas]           [k] megasas_build_and_issue_cmd_fusion
   1.34%  [kernel]                 [k] do_io_submit
   1.33%  [kernel]                 [k] gup_pgd_range
   1.26%  [kernel]                 [k] scsi_softirq_done
   1.20%  fio                      [.] __fio_gettime
   1.20%  [kernel]                 [k] switch_mm_irqs_off
   1.00%  [megaraid_sas]           [k] megasas_build_ldio_fusion
   0.97%  fio                      [.] get_io_u
   0.89%  [kernel]                 [k] lookup_ioctx
   0.80%  [kernel]                 [k] scsi_dec_host_busy
   0.78%  [kernel]                 [k] blkdev_direct_IO
   0.78%  [megaraid_sas]           [k] MR_GetPhyParams
   0.73%  [kernel]                 [k] aio_read_events
   0.70%  [megaraid_sas]           [k] MR_BuildRaidContext
   0.64%  [kernel]                 [k] blk_mq_complete_request
   0.64%  fio                      [.] thread_main
   0.63%  [kernel]                 [k] blk_queue_split
   0.63%  [kernel]                 [k] blk_mq_get_request
   0.61%  [kernel]                 [k] read_tsc
   0.59%  [kernel]                 [k] kmem_cache_a


With patch -

   4.36%  [megaraid_sas]           [k] complete_cmd_fusion
   3.24%  [kernel]                 [k] irq_entries_start
   3.00%  [kernel]                 [k] syscall_return_via_sysret
   2.41%  [kernel]                 [k] scsi_queue_rq
   2.41%  [kernel]                 [k] _raw_spin_lock
   2.22%  [megaraid_sas]           [k] megasas_build_io_fusion
   1.92%  [kernel]                 [k] bt_iter
   1.74%  [megaraid_sas]           [k] megasas_queue_command
   1.48%  [kernel]                 [k] gup_pgd_range
   1.44%  [kernel]                 [k] __audit_syscall_exit
   1.33%  [megaraid_sas]           [k] megasas_build_and_issue_cmd_fusion
   1.29%  [kernel]                 [k] _raw_spin_lock_irqsave
   1.25%  fio                      [.] get_io_u
   1.24%  fio                      [.] __fio_gettime
   1.22%  [kernel]                 [k] do_io_submit
   1.18%  [megaraid_sas]           [k] megasas_build_ldio_fusion
   1.02%  [kernel]                 [k] blk_mq_get_request
   0.91%  [kernel]                 [k] lookup_ioctx
   0.91%  [kernel]                 [k] scsi_softirq_done
   0.88%  [kernel]                 [k] scsi_dec_host_busy
   0.87%  [kernel]                 [k] blkdev_direct_IO
   0.77%  [megaraid_sas]           [k] MR_BuildRaidContext
   0.76%  [megaraid_sas]           [k] MR_GetPhyParams
   0.73%  [kernel]                 [k] __fget
   0.70%  [kernel]                 [k] switch_mm_irqs_off
   0.70%  fio                      [.] thread_main
   0.69%  [kernel]                 [k] aio_read_events
   0.68%  [kernel]                 [k] note_interrupt
   0.65%  [kernel]                 [k] do_syscal

Kashyap

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Tuesday, February 27, 2018 3:38 PM
> To: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
Snitzer
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar
Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Kashyap
> Desai; Peter Rivera; Laurence Oberman; Ming Lei
> Subject: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via
> .host_tagset
>
> It is observed on null_blk that IOPS can be improved much by simply
making
> hw queue per NUMA node, so this patch applies the introduced
.host_tagset
> for improving performance.
>
> In reality, .can_queue is quite big, and NUMA node number is often
small, so
> each hw queue's depth should be high enough to saturate device.
>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 065956cb2aeb..0b46f97cbfdb 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -3177,6 +3177,7 @@ static struct scsi_host_template megasas_template
> = {
>  	.use_clustering = ENABLE_CLUSTERING,
>  	.change_queue_depth = scsi_change_queue_depth,
>  	.no_write_same = 1,
> +	.host_tagset = 1,
>  };
>
>  /**
> @@ -5947,6 +5948,8 @@ static int megasas_start_aen(struct
> megasas_instance *instance)  static int megasas_io_attach(struct
> megasas_instance *instance)  {
>  	struct Scsi_Host *host = instance->host;
> +	/* 256 tags should be high enough to saturate device */
> +	int max_queues = DIV_ROUND_UP(host->can_queue, 256);
>
>  	/*
>  	 * Export parameters required by SCSI mid-layer @@ -5987,6 +5990,9
> @@ static int megasas_io_attach(struct megasas_instance *instance)
>  	host->max_lun = MEGASAS_MAX_LUN;
>  	host->max_cmd_len = 16;
>
> +	/* per NUMA node hw queue */
> +	host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
> +
>  	/*
>  	 * Notify the mid-layer about the new controller
>  	 */
> --
> 2.9.5


* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-02-28 14:58   ` Kashyap Desai
@ 2018-02-28 15:21     ` Ming Lei
  2018-02-28 16:22       ` Laurence Oberman
  2018-03-07  5:27     ` Ming Lei
  1 sibling, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-28 15:21 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> Ming -
> 
> Quick testing on my setup -  Performance slightly degraded (4-5% drop)for
> megaraid_sas driver with this patch. (From 1610K IOPS it goes to 1544K)
> I confirm that after applying this patch, we have #queue = #numa node.
> 
> ls -l
> /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2:23/10:
> 2:23:0/block/sdy/mq
> total 0
> drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
> drwxr-xr-x. 18 root root 0 Feb 28 09:53 1

OK, thanks for your test.

As I mentioned to you, this patch should have improved performance on
megaraid_sas, but I guess the current slight degradation might be caused by
scsi_host_queue_ready() in scsi_queue_rq().

With .host_tagset enabled and per-NUMA-node hw queues, requests can be
queued to the LLD more frequently/quickly than with a single queue, so the
cost of atomic_inc_return(&host->host_busy) may increase considerably;
think of millions of such operations. That is why a slight IOPS drop is
observed once the hw queue depth becomes half of .can_queue.
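Roughly, the accounting in question looks like this on every submission (a
simplified sketch from memory, not the exact scsi_host_queue_ready() code):

	/* simplified sketch of the per-host busy accounting */
	busy = atomic_inc_return(&shost->host_busy);
	if (busy > shost->can_queue) {
		/* host saturated: undo and let blk-mq retry later */
		atomic_dec(&shost->host_busy);
		return false;
	}
	return true;

Every submission from every node updates that shared counter, so per-node
hw queues alone don't remove the contention.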

> 
> 
> I would suggest to skip megaraid_sas driver changes using shared_tagset
> until and unless there is obvious gain. If overall interface of using
> shared_tagset is commit in kernel tree, we will investigate (megaraid_sas
> driver) in future about real benefit of using it.

I'd suggest not merging it until it is proven that performance can be
improved on a real device.

I will try to remove the expensive atomic_inc_return(&host->host_busy)
from scsi_queue_rq(), since it isn't needed for SCSI_MQ; once that is done,
I will ask you to test again.


Thanks,
Ming


* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-02-28 15:21     ` Ming Lei
@ 2018-02-28 16:22       ` Laurence Oberman
  2018-03-01  5:24         ` Kashyap Desai
  0 siblings, 1 reply; 54+ messages in thread
From: Laurence Oberman @ 2018-02-28 16:22 UTC (permalink / raw)
  To: Ming Lei, Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera

On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > Ming -
> > 
> > Quick testing on my setup -  Performance slightly degraded (4-5%
> > drop)for
> > megaraid_sas driver with this patch. (From 1610K IOPS it goes to
> > 1544K)
> > I confirm that after applying this patch, we have #queue = #numa
> > node.
> > 
> > ls -l
> > /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2
> > :23/10:
> > 2:23:0/block/sdy/mq
> > total 0
> > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
> > drwxr-xr-x. 18 root root 0 Feb 28 09:53 1
> 
> OK, thanks for your test.
> 
> As I mentioned to you, this patch should have improved performance on
> megaraid_sas, but the current slight degrade might be caused by
> scsi_host_queue_ready() in scsi_queue_rq(), I guess.
> 
> With .host_tagset enabled and use per-numa-node hw queue, request can
> be
> queued to lld more frequently/quick than single queue, then the cost
> of
> atomic_inc_return(&host->host_busy) may be increased much meantime,
> think about millions of such operations, and finally slight IOPS drop
> is observed when the hw queue depth becomes half of .can_queue.
> 
> > 
> > 
> > I would suggest to skip megaraid_sas driver changes using
> > shared_tagset
> > until and unless there is obvious gain. If overall interface of
> > using
> > shared_tagset is commit in kernel tree, we will investigate
> > (megaraid_sas
> > driver) in future about real benefit of using it.
> 
> I'd suggest to not merge it until it is proved that performance can
> be
> improved in real device.
> 
> I will try to work to remove the expensive atomic_inc_return(&host-
> >host_busy)
> from scsi_queue_rq(), since it isn't needed for SCSI_MQ, once it is
> done, will
> ask you to test again.
> 
> 
> Thanks,
> Ming

I will test this here as well; I just put the MegaRAID card into my system.

Kashyap, do you have SSDs on the back end, and are you using JBODs or
virtual devices? Let me have your config.
I only have 6G SAS shelves though.

Regards
Laurence


* RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-02-28 16:22       ` Laurence Oberman
@ 2018-03-01  5:24         ` Kashyap Desai
  2018-03-01  7:58           ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Kashyap Desai @ 2018-03-01  5:24 UTC (permalink / raw)
  To: Laurence Oberman, Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera

> -----Original Message-----
> From: Laurence Oberman [mailto:loberman@redhat.com]
> Sent: Wednesday, February 28, 2018 9:52 PM
> To: Ming Lei; Kashyap Desai
> Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; Peter
> Rivera
> Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> via
> .host_tagset
>
> On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> > On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > > Ming -
> > >
> > > Quick testing on my setup -  Performance slightly degraded (4-5%
> > > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it
> > > goes to
> > > 1544K)
> > > I confirm that after applying this patch, we have #queue = #numa
> > > node.
> > >
> > > ls -l
> > > /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2
> > > :23/10:
> > > 2:23:0/block/sdy/mq
> > > total 0
> > > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > > Feb 28 09:53 1
> >
> > OK, thanks for your test.
> >
> > As I mentioned to you, this patch should have improved performance on
> > megaraid_sas, but the current slight degrade might be caused by
> > scsi_host_queue_ready() in scsi_queue_rq(), I guess.
> >
> > With .host_tagset enabled and use per-numa-node hw queue, request can
> > be queued to lld more frequently/quick than single queue, then the
> > cost of
> > atomic_inc_return(&host->host_busy) may be increased much meantime,
> > think about millions of such operations, and finally slight IOPS drop
> > is observed when the hw queue depth becomes half of .can_queue.
> >
> > >
> > >
> > > I would suggest to skip megaraid_sas driver changes using
> > > shared_tagset until and unless there is obvious gain. If overall
> > > interface of using shared_tagset is commit in kernel tree, we will
> > > investigate (megaraid_sas
> > > driver) in future about real benefit of using it.
> >
> > I'd suggest to not merge it until it is proved that performance can be
> > improved in real device.

Noted.

> >
> > I will try to work to remove the expensive atomic_inc_return(&host-
> > >host_busy)
> > from scsi_queue_rq(), since it isn't needed for SCSI_MQ, once it is
> > done, will ask you to test again.

Ming - Do you mean that removing the host_busy accounting from scsi_queue_rq()
will still provide a correct host_busy value whenever an IO reaches the LLD?

> >
> >
> > Thanks,
> > Ming
>
> I will test this here as well.
> I just put the MegaRAID card into my system here.
>
> Kashyap, do you have SSDs on the back-end, and are you using JBODs or
> virtual devices? Let me have your config.
> I only have 6G SAS shelves though.

Laurence -
I am using 12 SSD drives in JBOD mode OR single-drive R0 mode.  A single SSD
is capable of ~138K IOPS (4K random read).
With all 12 SSDs, performance scales linearly and goes up to ~1610K IOPS.

I think if you have 6G SAS fully loaded, you may need more drives to reach
1600K IOPS (a sequential load with nomerges=2 on HDD is required to avoid
IO merging at the block layer); a sample fio invocation is sketched below.

The SSD model I am using is HGST "HUSMH8020BSS200".
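
For reference, a test along these lines can be driven with fio; the following
is only a hypothetical sketch (device name and job sizes are placeholders, not
the exact settings behind the numbers above):

# nomerges=2 turns off merge attempts at the block layer (the knob mentioned
# above for the sequential-load-on-HDD case):
echo 2 > /sys/block/sdy/queue/nomerges

# plain 4K random-read job of the kind behind the IOPS figures above:
fio --name=4k-randread --filename=/dev/sdy --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --iodepth=32 --numjobs=8 --runtime=60 \
    --time_based --group_reporting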
Here is lscpu output of my setup -

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               1726.217
BogoMIPS:              4199.37
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31

>
> Regards
> Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-01  5:24         ` Kashyap Desai
@ 2018-03-01  7:58           ` Ming Lei
  0 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-01  7:58 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Laurence Oberman, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Martin K . Petersen, James Bottomley,
	Christoph Hellwig, Don Brace, Peter Rivera

On Thu, Mar 01, 2018 at 10:54:17AM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Laurence Oberman [mailto:loberman@redhat.com]
> > Sent: Wednesday, February 28, 2018 9:52 PM
> > To: Ming Lei; Kashyap Desai
> > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> > Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace; Peter
> > Rivera
> > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> > via
> > .host_tagset
> >
> > On Wed, 2018-02-28 at 23:21 +0800, Ming Lei wrote:
> > > On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > > > Ming -
> > > >
> > > > Quick testing on my setup -  Performance slightly degraded (4-5%
> > > > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it
> > > > goes to
> > > > 1544K)
> > > > I confirm that after applying this patch, we have #queue = #numa
> > > > node.
> > > >
> > > > ls -l
> > > > /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2
> > > > :23/10:
> > > > 2:23:0/block/sdy/mq
> > > > total 0
> > > > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > > > Feb 28 09:53 1
> > >
> > > OK, thanks for your test.
> > >
> > > As I mentioned to you, this patch should have improved performance on
> > > megaraid_sas, but the current slight degrade might be caused by
> > > scsi_host_queue_ready() in scsi_queue_rq(), I guess.
> > >
> > > With .host_tagset enabled and use per-numa-node hw queue, request can
> > > be queued to lld more frequently/quick than single queue, then the
> > > cost of
> > > atomic_inc_return(&host->host_busy) may be increased much meantime,
> > > think about millions of such operations, and finally slight IOPS drop
> > > is observed when the hw queue depth becomes half of .can_queue.
> > >
> > > >
> > > >
> > > > I would suggest to skip megaraid_sas driver changes using
> > > > shared_tagset until and unless there is obvious gain. If overall
> > > > interface of using shared_tagset is commit in kernel tree, we will
> > > > investigate (megaraid_sas
> > > > driver) in future about real benefit of using it.
> > >
> > > I'd suggest to not merge it until it is proved that performance can be
> > > improved in real device.
> 
> Noted.
> 
> > >
> > > I will try to work to remove the expensive atomic_inc_return(&host-
> > > >host_busy)
> > > from scsi_queue_rq(), since it isn't needed for SCSI_MQ, once it is
> > > done, will ask you to test again.
> 
> Ming - Do you mean that removing the host_busy accounting from scsi_queue_rq()
> will still provide a correct host_busy value whenever an IO reaches the LLD?

The host queue depth is already respected by blk-mq before scsi_queue_rq() is
called, so there is no need to enforce it again in scsi_queue_rq(). But this
counter is still needed by the error handler, so we have to figure out a way
to remove it without breaking the error handler.

Also, the megaraid_sas driver needs to be checked for any host-wide lock
used in .queuecommand or the completion path.
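
For reference, the per-command cost in question is the host-wide atomic in
scsi_host_queue_ready(); a simplified sketch of that check follows
(illustrative only, not the verbatim 4.16 code, and the helper name is made
up for this sketch):

/*
 * Simplified sketch of the hot-path accounting in scsi_lib.c discussed
 * above; not verbatim kernel code.
 */
static bool host_queue_ready_sketch(struct Scsi_Host *shost)
{
        /* every queued command hits this shared, host-wide atomic counter */
        unsigned int busy = atomic_inc_return(&shost->host_busy) - 1;

        if (shost->can_queue > 0 && busy >= shost->can_queue) {
                /* host is full: undo the increment, let blk-mq retry later */
                atomic_dec(&shost->host_busy);
                return false;
        }
        return true;
}

With blk-mq the tag allocator has already enforced the queue depth before
scsi_queue_rq() runs, so this extra atomic is pure overhead in the fast path;
only the error handler still wants the count.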

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
@ 2018-03-01 16:18   ` Don Brace
  2018-03-01 19:01     ` Laurence Oberman
  2018-03-02  0:47     ` Ming Lei
  2018-03-08  7:50   ` Christoph Hellwig
  1 sibling, 2 replies; 54+ messages in thread
From: Don Brace @ 2018-03-01 16:18 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Laurence Oberman, Meelis Roos

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Tuesday, February 27, 2018 4:08 AM
> To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org; Christoph
> Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>
> Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>; Arun Easi
> <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K .
> Petersen <martin.petersen@oracle.com>; James Bottomley
> <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@lst.de>;
> Don Brace <don.brace@microsemi.com>; Kashyap Desai
> <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom.com>;
> Laurence Oberman <loberman@redhat.com>; Ming Lei
> <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
>
> EXTERNAL EMAIL
>
>
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, then one
> command's completion may not be notified.
>
> This patch setups mapping between cpu and reply queue according to irq
> affinity info retrived by pci_irq_get_affinity(), and uses this mapping
> table to choose reply queue for queuing one command.
>
> Then the chosen reply queue has to be active, and fixes IO hang caused
> by using inactive reply queue which doesn't have any online CPU mapped.
>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: Arun Easi <arun.easi@cavium.com>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Peter Rivera <peter.rivera@broadcom.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Meelis Roos <mroos@linux.ee>
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

I am getting some issues that need to be tracked down:

[ 1636.032984] hpsa 0000:87:00.0: Acknowledging event: 0xc0000032 (HP SSD Smart Path configuration change)
[ 1638.510656] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En- Exp=0
[ 1653.967695] hpsa 0000:87:00.0: Acknowledging event: 0x80000020 (HP SSD Smart Path configuration change)
[ 1656.770377] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En- Exp=0
[ 2839.762267] hpsa 0000:87:00.0: Acknowledging event: 0x80000020 (HP SSD Smart Path configuration change)
[ 2840.841290] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En- Exp=0
[ 2917.582653] hpsa 0000:87:00.0: Acknowledging event: 0xc0000020 (HP SSD Smart Path configuration change)
[ 2919.087191] hpsa 0000:87:00.0: scsi 3:1:0:1: updated Direct-Access     HP       LOGICAL VOLUME   RAID-5 SSDSmartPathCap+ En+ Exp=1
[ 2919.142527] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs: [3:1:0:2] A phys disk component of LV is missing, turning off offload_enabled for LV.
[ 2919.203915] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs: [3:1:0:2] A phys disk component of LV is missing, turning off offload_enabled for LV.
[ 2919.266921] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs: [3:1:0:2] A phys disk component of LV is missing, turning off offload_enabled for LV.
[ 2934.999629] hpsa 0000:87:00.0: Acknowledging event: 0x40000000 (HP SSD Smart Path state change)
[ 2936.937333] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs: [3:1:0:2] A phys disk component of LV is missing, turning off offload_enabled for LV.
[ 2936.998707] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs: [3:1:0:2] A phys disk component of LV is missing, turning off offload_enabled for LV.
[ 2937.060101] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs: [3:1:0:2] A phys disk component of LV is missing, turning off offload_enabled for LV.
[ 3619.711122] sd 3:1:0:3: [sde] tag#436 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3619.751150] sd 3:1:0:3: [sde] tag#436 Sense Key : Aborted Command [current]
[ 3619.784375] sd 3:1:0:3: [sde] tag#436 Add. Sense: Internal target failure
[ 3619.816530] sd 3:1:0:3: [sde] tag#436 CDB: Read(10) 28 00 01 1b ad af 00 00 01 00
[ 3619.852295] print_req_error: I/O error, dev sde, sector 18591151
[ 3619.880850] sd 3:1:0:3: [sde] tag#461 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3619.920981] sd 3:1:0:3: [sde] tag#461 Sense Key : Aborted Command [current]
[ 3619.955081] sd 3:1:0:3: [sde] tag#461 Add. Sense: Internal target failure
[ 3619.987054] sd 3:1:0:3: [sde] tag#461 CDB: Read(10) 28 00 02 15 31 40 00 00 01 00
[ 3620.022569] print_req_error: I/O error, dev sde, sector 34943296
[ 3620.050873] sd 3:1:0:3: [sde] tag#157 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3620.091124] sd 3:1:0:3: [sde] tag#157 Sense Key : Aborted Command [current]
[ 3620.124179] sd 3:1:0:3: [sde] tag#157 Add. Sense: Internal target failure
[ 3620.156203] sd 3:1:0:3: [sde] tag#157 CDB: Read(10) 28 00 03 65 9d 7e 00 00 01 00
[ 3620.191520] print_req_error: I/O error, dev sde, sector 56991102
[ 3620.220308] sd 3:1:0:3: [sde] tag#266 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3620.260273] sd 3:1:0:3: [sde] tag#266 Sense Key : Aborted Command [current]
[ 3620.294605] sd 3:1:0:3: [sde] tag#266 Add. Sense: Internal target failure
[ 3620.328353] sd 3:1:0:3: [sde] tag#266 CDB: Read(10) 28 00 09 92 94 70 00 00 01 00
[ 3620.364807] print_req_error: I/O error, dev sde, sector 160601200
[ 3620.394342] sd 3:1:0:3: [sde] tag#278 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3620.434462] sd 3:1:0:3: [sde] tag#278 Sense Key : Aborted Command [current]
[ 3620.469059] sd 3:1:0:3: [sde] tag#278 Add. Sense: Internal target failure
[ 3620.471761] sd 3:1:0:3: [sde] tag#467 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3620.502240] sd 3:1:0:3: [sde] tag#278 CDB: Read(10) 28 00 08 00 12 ea 00 00 01 00
[ 3620.543157] sd 3:1:0:3: [sde] tag#467 Sense Key : Aborted Command [current]
[ 3620.580375] print_req_error: I/O error, dev sde, sector 134222570
[ 3620.615355] sd 3:1:0:3: [sde] tag#467 Add. Sense: Internal target failure
[ 3620.645069] sd 3:1:0:3: [sde] tag#244 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3620.678696] sd 3:1:0:3: [sde] tag#467 CDB: Read(10) 28 00 10 3f 2b fc 00 00 01 00
[ 3620.720247] sd 3:1:0:3: [sde] tag#244 Sense Key : Aborted Command [current]
[ 3620.756776] print_req_error: I/O error, dev sde, sector 272575484
[ 3620.791857] sd 3:1:0:3: [sde] tag#244 Add. Sense: Internal target failure
[ 3620.822272] sd 3:1:0:3: [sde] tag#431 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3620.855200] sd 3:1:0:3: [sde] tag#244 CDB: Read(10) 28 00 08 31 86 d9 00 00 01 00
[ 3620.895823] sd 3:1:0:3: [sde] tag#431 Sense Key : Aborted Command [current]
[ 3620.931923] print_req_error: I/O error, dev sde, sector 137463513
[ 3620.966262] sd 3:1:0:3: [sde] tag#431 Add. Sense: Internal target failure
[ 3620.995715] sd 3:1:0:3: [sde] tag#226 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3621.028703] sd 3:1:0:3: [sde] tag#431 CDB: Read(10) 28 00 10 7c b2 b0 00 00 01 00
[ 3621.069686] sd 3:1:0:3: [sde] tag#226 Sense Key : Aborted Command [current]
[ 3621.106253] print_req_error: I/O error, dev sde, sector 276607664
[ 3621.140782] sd 3:1:0:3: [sde] tag#226 Add. Sense: Internal target failure
[ 3621.170241] sd 3:1:0:3: [sde] tag#408 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 3621.202997] sd 3:1:0:3: [sde] tag#226 CDB: Read(10) 28 00 08 ba cf f2 00 00 01 00
[ 3621.243870] sd 3:1:0:3: [sde] tag#408 Sense Key : Aborted Command [current]
[ 3621.280015] print_req_error: I/O error, dev sde, sector 146460658
[ 3621.313941] sd 3:1:0:3: [sde] tag#408 Add. Sense: Internal target failure
[ 3621.343790] print_req_error: I/O error, dev sde, sector 98830586
[ 3621.376164] sd 3:1:0:3: [sde] tag#408 CDB: Read(10) 28 00 14 da 6a 53 00 00 01 00
[ 3641.714842] WARNING: CPU: 3 PID: 0 at kernel/rcu/tree.c:2713 rcu_process_callbacks+0x4d5/0x510
[ 3641.756175] Modules linked in: sg ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr hpilo hpwdt ioatdma shpchp ipmi_si lpc_ich dca mfd_core wmi ipmi_msghandler acpi_power_meter pcc_cpufreq uinput xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect
[ 3642.094993]  sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core tg3 hpsa scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod dax
[ 3642.158883] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.16.0-rc3+ #18
[ 3642.190015] Hardware name: HP ProLiant DL580 Gen8, BIOS P79 08/18/2016
[ 3642.221949] RIP: 0010:rcu_process_callbacks+0x4d5/0x510
[ 3642.247606] RSP: 0018:ffff8e179f6c3f08 EFLAGS: 00010002
[ 3642.273087] RAX: 0000000000000000 RBX: ffff8e179f6e3180 RCX: ffff8e279d1e8918
[ 3642.307426] RDX: ffffffffffffd801 RSI: ffff8e179f6c3f18 RDI: ffff8e179f6e31b8
[ 3642.342219] RBP: ffffffffb70a31c0 R08: ffff8e279d1e8918 R09: 0000000000000100
[ 3642.376929] R10: 0000000000000004 R11: 0000000000000005 R12: ffff8e179f6e31b8
[ 3642.411598] R13: ffff8e179d20ad00 R14: 0000000000000001 R15: 7fffffffffffffff
[ 3642.445957] FS:  0000000000000000(0000) GS:ffff8e179f6c0000(0000) knlGS:0000000000000000
[ 3642.485599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3642.513678] CR2: 00007f30917b9008 CR3: 000000054900a006 CR4: 00000000001606e0
[ 3642.548189] Call Trace:
[ 3642.560411]  <IRQ>
[ 3642.570588]  __do_softirq+0xd1/0x275
[ 3642.588643]  irq_exit+0xd5/0xe0
[ 3642.604134]  smp_apic_timer_interrupt+0x60/0x120
[ 3642.626752]  apic_timer_interrupt+0xf/0x20
[ 3642.646712]  </IRQ>
[ 3642.657330] RIP: 0010:cpuidle_enter_state+0xd4/0x260
[ 3642.681389] RSP: 0018:ffffaed7c00e7ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff12
[ 3642.717937] RAX: ffff8e179f6e2280 RBX: ffffcebfbfec1bb8 RCX: 000000000000001f
[ 3642.752525] RDX: 0000000000000000 RSI: ff6c3b1b90a53a78 RDI: 0000000000000000
[ 3642.787181] RBP: 0000000000000003 R08: 0000000000000005 R09: 0000000000000396
[ 3642.821442] R10: 00000000000003a7 R11: 0000000000000008 R12: 0000000000000003
[ 3642.856381] R13: 0000034fe70ea52c R14: 0000000000000003 R15: 0000034fe71d99d4
[ 3642.890830]  do_idle+0x172/0x1e0
[ 3642.906714]  cpu_startup_entry+0x6f/0x80
[ 3642.925835]  start_secondary+0x187/0x1e0
[ 3642.944975]  secondary_startup_64+0xa5/0xb0
[ 3642.965719] Code: e9 db fd ff ff 4c 89 f6 4c 89 e7 e8 96 b8 63 00 e9 56 fc ff ff 0f 0b e9 34 fc ff ff 0f 0b 0f 1f 84 00 00 00 00 00 e9 e0 fb ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 e5 fd ff ff 0f 0b 66 0f 1f
[ 3643.056198] ---[ end trace 7bdac969b3138de7 ]---
[ 3735.745955] hpsa 0000:87:00.0: SCSI status: LUN:000000c000002601 CDB:12010000040000000000000000000000
[ 3735.790497] hpsa 0000:87:00.0: SCSI Status = 02, Sense key = 0x05, ASC = 0x25, ASCQ = 0x00
> ---
>  drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++--------------
>  drivers/scsi/hpsa.h |  1 +
>  2 files changed, 55 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 5293e6827ce5..3a9eca163db8 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info
> *h, struct CommandList *c,
>                 c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
>                 if (unlikely(!h->msix_vectors))
>                         return;
> -               if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -                       c->Header.ReplyQueue =
> -                               raw_smp_processor_id() % h->nreply_queues;
> -               else
> -                       c->Header.ReplyQueue = reply_queue % h->nreply_queues;
> +               c->Header.ReplyQueue = reply_queue;
>         }
>  }
>
> @@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct
> ctlr_info *h,
>          * Tell the controller to post the reply to the queue for this
>          * processor.  This seems to give the best I/O throughput.
>          */
> -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -               cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
> -       else
> -               cp->ReplyQueue = reply_queue % h->nreply_queues;
> +       cp->ReplyQueue = reply_queue;
>         /*
>          * Set the bits in the address sent down to include:
>          *  - performant mode bit (bit 0)
> @@ -1087,10 +1080,7 @@ static void
> set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
>         /* Tell the controller to post the reply to the queue for this
>          * processor.  This seems to give the best I/O throughput.
>          */
> -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -               cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -       else
> -               cp->reply_queue = reply_queue % h->nreply_queues;
> +       cp->reply_queue = reply_queue;
>         /* Set the bits in the address sent down to include:
>          *  - performant mode bit not used in ioaccel mode 2
>          *  - pull count (bits 0-3)
> @@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct
> ctlr_info *h,
>          * Tell the controller to post the reply to the queue for this
>          * processor.  This seems to give the best I/O throughput.
>          */
> -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> -               cp->reply_queue = smp_processor_id() % h->nreply_queues;
> -       else
> -               cp->reply_queue = reply_queue % h->nreply_queues;
> +       cp->reply_queue = reply_queue;
>         /*
>          * Set the bits in the address sent down to include:
>          *  - performant mode bit not used in ioaccel mode 2
> @@ -1157,6 +1144,8 @@ static void __enqueue_cmd_and_start_io(struct
> ctlr_info *h,
>  {
>         dial_down_lockup_detection_during_fw_flash(h, c);
>         atomic_inc(&h->commands_outstanding);
> +
> +       reply_queue = h->reply_map[raw_smp_processor_id()];
>         switch (c->cmd_type) {
>         case CMD_IOACCEL1:
>                 set_ioaccel1_performant_mode(h, c, reply_queue);
> @@ -7376,6 +7365,26 @@ static void hpsa_disable_interrupt_mode(struct
> ctlr_info *h)
>         h->msix_vectors = 0;
>  }
>
> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> +{
> +       const struct cpumask *mask;
> +       unsigned int queue, cpu;
> +
> +       for (queue = 0; queue < h->msix_vectors; queue++) {
> +               mask = pci_irq_get_affinity(h->pdev, queue);
> +               if (!mask)
> +                       goto fallback;
> +
> +               for_each_cpu(cpu, mask)
> +                       h->reply_map[cpu] = queue;
> +       }
> +       return;
> +
> +fallback:
> +       for_each_possible_cpu(cpu)
> +               h->reply_map[cpu] = 0;
> +}
> +
>  /* If MSI/MSI-X is supported by the kernel we will try to enable it on
>   * controllers that are capable. If not, we use legacy INTx mode.
>   */
> @@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info *h)
>         err = hpsa_interrupt_mode(h);
>         if (err)
>                 goto clean1;
> +
> +       /* setup mapping between CPU and reply queue */
> +       hpsa_setup_reply_map(h);
> +
>         err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
>         if (err)
>                 goto clean2;    /* intmode+region, pci */
> @@ -8480,6 +8493,28 @@ static struct workqueue_struct
> *hpsa_create_controller_wq(struct ctlr_info *h,
>         return wq;
>  }
>
> +static void hpda_free_ctlr_info(struct ctlr_info *h)
> +{
> +       kfree(h->reply_map);
> +       kfree(h);
> +}
> +
> +static struct ctlr_info *hpda_alloc_ctlr_info(void)
> +{
> +       struct ctlr_info *h;
> +
> +       h = kzalloc(sizeof(*h), GFP_KERNEL);
> +       if (!h)
> +               return NULL;
> +
> +       h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> +       if (!h->reply_map) {
> +               kfree(h);
> +               return NULL;
> +       }
> +       return h;
> +}
> +
>  static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  {
>         int dac, rc;
> @@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev *pdev, const
> struct pci_device_id *ent)
>          * the driver.  See comments in hpsa.h for more info.
>          */
>         BUILD_BUG_ON(sizeof(struct CommandList) %
> COMMANDLIST_ALIGNMENT);
> -       h = kzalloc(sizeof(*h), GFP_KERNEL);
> +       h = hpda_alloc_ctlr_info();
>         if (!h) {
>                 dev_err(&pdev->dev, "Failed to allocate controller head\n");
>                 return -ENOMEM;
> @@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev *pdev)
>         h->lockup_detected = NULL;                      /* init_one 2 */
>         /* (void) pci_disable_pcie_error_reporting(pdev); */    /* init_one 1 */
>
> -       kfree(h);                                       /* init_one 1 */
> +       hpda_free_ctlr_info(h);                         /* init_one 1 */
>  }
>
>  static int hpsa_suspend(__attribute__((unused)) struct pci_dev *pdev,
> diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> index 018f980a701c..fb9f5e7f8209 100644
> --- a/drivers/scsi/hpsa.h
> +++ b/drivers/scsi/hpsa.h
> @@ -158,6 +158,7 @@ struct bmic_controller_parameters {
>  #pragma pack()
>
>  struct ctlr_info {
> +       unsigned int *reply_map;
>         int     ctlr;
>         char    devname[8];
>         char    *product_name;
> --
> 2.9.5

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-01 16:18   ` Don Brace
@ 2018-03-01 19:01     ` Laurence Oberman
  2018-03-01 21:19       ` Laurence Oberman
  2018-03-02  0:47     ` Ming Lei
  1 sibling, 1 reply; 54+ messages in thread
From: Laurence Oberman @ 2018-03-01 19:01 UTC (permalink / raw)
  To: Don Brace, Ming Lei, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Tuesday, February 27, 2018 4:08 AM
> > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org;
> > Christoph
> > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>
> > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>;
> > Arun Easi
> > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K .
> > Petersen <martin.petersen@oracle.com>; James Bottomley
> > <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@lst
> > .de>;
> > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom.c
> > om>;
> > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> > 
> > EXTERNAL EMAIL
> > 
> > 
> > From 84676c1f21 (genirq/affinity: assign vectors to all possible
> > CPUs),
> > one msix vector can be created without any online CPU mapped, then
> > one
> > command's completion may not be notified.
> > 
> > This patch setups mapping between cpu and reply queue according to
> > irq
> > affinity info retrived by pci_irq_get_affinity(), and uses this
> > mapping
> > table to choose reply queue for queuing one command.
> > 
> > Then the chosen reply queue has to be active, and fixes IO hang
> > caused
> > by using inactive reply queue which doesn't have any online CPU
> > mapped.
> > 
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Arun Easi <arun.easi@cavium.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: Don Brace <don.brace@microsemi.com>
> > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > Cc: Laurence Oberman <loberman@redhat.com>
> > Cc: Meelis Roos <mroos@linux.ee>
> > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all
> > possible CPUs")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> I am getting some issues that need to be tracked down:
> 
> [ 1636.032984] hpsa 0000:87:00.0: Acknowledging event: 0xc0000032 (HP
> SSD Smart Path configuration change)
> [ 1638.510656] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> Exp=0
> [ 1653.967695] hpsa 0000:87:00.0: Acknowledging event: 0x80000020 (HP
> SSD Smart Path configuration change)
> [ 1656.770377] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> Exp=0
> [ 2839.762267] hpsa 0000:87:00.0: Acknowledging event: 0x80000020 (HP
> SSD Smart Path configuration change)
> [ 2840.841290] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> Exp=0
> [ 2917.582653] hpsa 0000:87:00.0: Acknowledging event: 0xc0000020 (HP
> SSD Smart Path configuration change)
> [ 2919.087191] hpsa 0000:87:00.0: scsi 3:1:0:1: updated Direct-
> Access     HP       LOGICAL VOLUME   RAID-5 SSDSmartPathCap+ En+
> Exp=1
> [ 2919.142527] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> [3:1:0:2] A phys disk component of LV is missing, turning off
> offload_enabled for LV.
> [ 2919.203915] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> [3:1:0:2] A phys disk component of LV is missing, turning off
> offload_enabled for LV.
> [ 2919.266921] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> [3:1:0:2] A phys disk component of LV is missing, turning off
> offload_enabled for LV.
> [ 2934.999629] hpsa 0000:87:00.0: Acknowledging event: 0x40000000 (HP
> SSD Smart Path state change)
> [ 2936.937333] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> [3:1:0:2] A phys disk component of LV is missing, turning off
> offload_enabled for LV.
> [ 2936.998707] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> [3:1:0:2] A phys disk component of LV is missing, turning off
> offload_enabled for LV.
> [ 2937.060101] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> [3:1:0:2] A phys disk component of LV is missing, turning off
> offload_enabled for LV.
> [ 3619.711122] sd 3:1:0:3: [sde] tag#436 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3619.751150] sd 3:1:0:3: [sde] tag#436 Sense Key : Aborted Command
> [current] 
> [ 3619.784375] sd 3:1:0:3: [sde] tag#436 Add. Sense: Internal target
> failure
> [ 3619.816530] sd 3:1:0:3: [sde] tag#436 CDB: Read(10) 28 00 01 1b ad
> af 00 00 01 00
> [ 3619.852295] print_req_error: I/O error, dev sde, sector 18591151
> [ 3619.880850] sd 3:1:0:3: [sde] tag#461 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3619.920981] sd 3:1:0:3: [sde] tag#461 Sense Key : Aborted Command
> [current] 
> [ 3619.955081] sd 3:1:0:3: [sde] tag#461 Add. Sense: Internal target
> failure
> [ 3619.987054] sd 3:1:0:3: [sde] tag#461 CDB: Read(10) 28 00 02 15 31
> 40 00 00 01 00
> [ 3620.022569] print_req_error: I/O error, dev sde, sector 34943296
> [ 3620.050873] sd 3:1:0:3: [sde] tag#157 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3620.091124] sd 3:1:0:3: [sde] tag#157 Sense Key : Aborted Command
> [current] 
> [ 3620.124179] sd 3:1:0:3: [sde] tag#157 Add. Sense: Internal target
> failure
> [ 3620.156203] sd 3:1:0:3: [sde] tag#157 CDB: Read(10) 28 00 03 65 9d
> 7e 00 00 01 00
> [ 3620.191520] print_req_error: I/O error, dev sde, sector 56991102
> [ 3620.220308] sd 3:1:0:3: [sde] tag#266 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3620.260273] sd 3:1:0:3: [sde] tag#266 Sense Key : Aborted Command
> [current] 
> [ 3620.294605] sd 3:1:0:3: [sde] tag#266 Add. Sense: Internal target
> failure
> [ 3620.328353] sd 3:1:0:3: [sde] tag#266 CDB: Read(10) 28 00 09 92 94
> 70 00 00 01 00
> [ 3620.364807] print_req_error: I/O error, dev sde, sector 160601200
> [ 3620.394342] sd 3:1:0:3: [sde] tag#278 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3620.434462] sd 3:1:0:3: [sde] tag#278 Sense Key : Aborted Command
> [current] 
> [ 3620.469059] sd 3:1:0:3: [sde] tag#278 Add. Sense: Internal target
> failure
> [ 3620.471761] sd 3:1:0:3: [sde] tag#467 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3620.502240] sd 3:1:0:3: [sde] tag#278 CDB: Read(10) 28 00 08 00 12
> ea 00 00 01 00
> [ 3620.543157] sd 3:1:0:3: [sde] tag#467 Sense Key : Aborted Command
> [current] 
> [ 3620.580375] print_req_error: I/O error, dev sde, sector 134222570
> [ 3620.615355] sd 3:1:0:3: [sde] tag#467 Add. Sense: Internal target
> failure
> [ 3620.645069] sd 3:1:0:3: [sde] tag#244 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3620.678696] sd 3:1:0:3: [sde] tag#467 CDB: Read(10) 28 00 10 3f 2b
> fc 00 00 01 00
> [ 3620.720247] sd 3:1:0:3: [sde] tag#244 Sense Key : Aborted Command
> [current] 
> [ 3620.756776] print_req_error: I/O error, dev sde, sector 272575484
> [ 3620.791857] sd 3:1:0:3: [sde] tag#244 Add. Sense: Internal target
> failure
> [ 3620.822272] sd 3:1:0:3: [sde] tag#431 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3620.855200] sd 3:1:0:3: [sde] tag#244 CDB: Read(10) 28 00 08 31 86
> d9 00 00 01 00
> [ 3620.895823] sd 3:1:0:3: [sde] tag#431 Sense Key : Aborted Command
> [current] 
> [ 3620.931923] print_req_error: I/O error, dev sde, sector 137463513
> [ 3620.966262] sd 3:1:0:3: [sde] tag#431 Add. Sense: Internal target
> failure
> [ 3620.995715] sd 3:1:0:3: [sde] tag#226 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3621.028703] sd 3:1:0:3: [sde] tag#431 CDB: Read(10) 28 00 10 7c b2
> b0 00 00 01 00
> [ 3621.069686] sd 3:1:0:3: [sde] tag#226 Sense Key : Aborted Command
> [current] 
> [ 3621.106253] print_req_error: I/O error, dev sde, sector 276607664
> [ 3621.140782] sd 3:1:0:3: [sde] tag#226 Add. Sense: Internal target
> failure
> [ 3621.170241] sd 3:1:0:3: [sde] tag#408 FAILED Result:
> hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 3621.202997] sd 3:1:0:3: [sde] tag#226 CDB: Read(10) 28 00 08 ba cf
> f2 00 00 01 00
> [ 3621.243870] sd 3:1:0:3: [sde] tag#408 Sense Key : Aborted Command
> [current] 
> [ 3621.280015] print_req_error: I/O error, dev sde, sector 146460658
> [ 3621.313941] sd 3:1:0:3: [sde] tag#408 Add. Sense: Internal target
> failure
> [ 3621.343790] print_req_error: I/O error, dev sde, sector 98830586
> [ 3621.376164] sd 3:1:0:3: [sde] tag#408 CDB: Read(10) 28 00 14 da 6a
> 53 00 00 01 00
> [ 3641.714842] WARNING: CPU: 3 PID: 0 at kernel/rcu/tree.c:2713
> rcu_process_callbacks+0x4d5/0x510
> [ 3641.756175] Modules linked in: sg ip6t_rpfilter ip6t_REJECT
> nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
> nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp
> llc ebtable_filter ebtables ip6table_mangle ip6table_security
> ip6table_raw ip6table_filter ip6_tables iptable_mangle
> iptable_security iptable_raw iptable_filter ip_tables sb_edac
> x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
> iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd pcspkr
> hpilo hpwdt ioatdma shpchp ipmi_si lpc_ich dca mfd_core wmi
> ipmi_msghandler acpi_power_meter pcc_cpufreq uinput xfs libcrc32c
> mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect
> [ 3642.094993]  sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core
> tg3 hpsa scsi_transport_sas usb_storage dm_mirror dm_region_hash
> dm_log dm_mod dax
> [ 3642.158883] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.16.0-rc3+
> #18
> [ 3642.190015] Hardware name: HP ProLiant DL580 Gen8, BIOS P79
> 08/18/2016
> [ 3642.221949] RIP: 0010:rcu_process_callbacks+0x4d5/0x510
> [ 3642.247606] RSP: 0018:ffff8e179f6c3f08 EFLAGS: 00010002
> [ 3642.273087] RAX: 0000000000000000 RBX: ffff8e179f6e3180 RCX:
> ffff8e279d1e8918
> [ 3642.307426] RDX: ffffffffffffd801 RSI: ffff8e179f6c3f18 RDI:
> ffff8e179f6e31b8
> [ 3642.342219] RBP: ffffffffb70a31c0 R08: ffff8e279d1e8918 R09:
> 0000000000000100
> [ 3642.376929] R10: 0000000000000004 R11: 0000000000000005 R12:
> ffff8e179f6e31b8
> [ 3642.411598] R13: ffff8e179d20ad00 R14: 0000000000000001 R15:
> 7fffffffffffffff
> [ 3642.445957] FS:  0000000000000000(0000) GS:ffff8e179f6c0000(0000)
> knlGS:0000000000000000
> [ 3642.485599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3642.513678] CR2: 00007f30917b9008 CR3: 000000054900a006 CR4:
> 00000000001606e0
> [ 3642.548189] Call Trace:
> [ 3642.560411]  <IRQ>
> [ 3642.570588]  __do_softirq+0xd1/0x275
> [ 3642.588643]  irq_exit+0xd5/0xe0
> [ 3642.604134]  smp_apic_timer_interrupt+0x60/0x120
> [ 3642.626752]  apic_timer_interrupt+0xf/0x20
> [ 3642.646712]  </IRQ>
> [ 3642.657330] RIP: 0010:cpuidle_enter_state+0xd4/0x260
> [ 3642.681389] RSP: 0018:ffffaed7c00e7ea0 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffff12
> [ 3642.717937] RAX: ffff8e179f6e2280 RBX: ffffcebfbfec1bb8 RCX:
> 000000000000001f
> [ 3642.752525] RDX: 0000000000000000 RSI: ff6c3b1b90a53a78 RDI:
> 0000000000000000
> [ 3642.787181] RBP: 0000000000000003 R08: 0000000000000005 R09:
> 0000000000000396
> [ 3642.821442] R10: 00000000000003a7 R11: 0000000000000008 R12:
> 0000000000000003
> [ 3642.856381] R13: 0000034fe70ea52c R14: 0000000000000003 R15:
> 0000034fe71d99d4
> [ 3642.890830]  do_idle+0x172/0x1e0
> [ 3642.906714]  cpu_startup_entry+0x6f/0x80
> [ 3642.925835]  start_secondary+0x187/0x1e0
> [ 3642.944975]  secondary_startup_64+0xa5/0xb0
> [ 3642.965719] Code: e9 db fd ff ff 4c 89 f6 4c 89 e7 e8 96 b8 63 00
> e9 56 fc ff ff 0f 0b e9 34 fc ff ff 0f 0b 0f 1f 84 00 00 00 00 00 e9
> e0 fb ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 e5 fd ff ff 0f 0b
> 66 0f 1f 
> [ 3643.056198] ---[ end trace 7bdac969b3138de7 ]---
> [ 3735.745955] hpsa 0000:87:00.0: SCSI status: LUN:000000c000002601
> CDB:12010000040000000000000000000000
> [ 3735.790497] hpsa 0000:87:00.0: SCSI Status = 02, Sense key = 0x05,
> ASC = 0x25, ASCQ = 0x00
> > ---
> >  drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++
> > --------------
> >  drivers/scsi/hpsa.h |  1 +
> >  2 files changed, 55 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 5293e6827ce5..3a9eca163db8 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct
> > ctlr_info
> > *h, struct CommandList *c,
> >                 c->busaddr |= 1 | (h->blockFetchTable[c-
> > >Header.SGList] << 1);
> >                 if (unlikely(!h->msix_vectors))
> >                         return;
> > -               if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -                       c->Header.ReplyQueue =
> > -                               raw_smp_processor_id() % h-
> > >nreply_queues;
> > -               else
> > -                       c->Header.ReplyQueue = reply_queue % h-
> > >nreply_queues;
> > +               c->Header.ReplyQueue = reply_queue;
> >         }
> >  }
> > 
> > @@ -1063,10 +1059,7 @@ static void
> > set_ioaccel1_performant_mode(struct
> > ctlr_info *h,
> >          * Tell the controller to post the reply to the queue for
> > this
> >          * processor.  This seems to give the best I/O throughput.
> >          */
> > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -               cp->ReplyQueue = smp_processor_id() % h-
> > >nreply_queues;
> > -       else
> > -               cp->ReplyQueue = reply_queue % h->nreply_queues;
> > +       cp->ReplyQueue = reply_queue;
> >         /*
> >          * Set the bits in the address sent down to include:
> >          *  - performant mode bit (bit 0)
> > @@ -1087,10 +1080,7 @@ static void
> > set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
> >         /* Tell the controller to post the reply to the queue for
> > this
> >          * processor.  This seems to give the best I/O throughput.
> >          */
> > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -               cp->reply_queue = smp_processor_id() % h-
> > >nreply_queues;
> > -       else
> > -               cp->reply_queue = reply_queue % h->nreply_queues;
> > +       cp->reply_queue = reply_queue;
> >         /* Set the bits in the address sent down to include:
> >          *  - performant mode bit not used in ioaccel mode 2
> >          *  - pull count (bits 0-3)
> > @@ -1109,10 +1099,7 @@ static void
> > set_ioaccel2_performant_mode(struct
> > ctlr_info *h,
> >          * Tell the controller to post the reply to the queue for
> > this
> >          * processor.  This seems to give the best I/O throughput.
> >          */
> > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > -               cp->reply_queue = smp_processor_id() % h-
> > >nreply_queues;
> > -       else
> > -               cp->reply_queue = reply_queue % h->nreply_queues;
> > +       cp->reply_queue = reply_queue;
> >         /*
> >          * Set the bits in the address sent down to include:
> >          *  - performant mode bit not used in ioaccel mode 2
> > @@ -1157,6 +1144,8 @@ static void __enqueue_cmd_and_start_io(struct
> > ctlr_info *h,
> >  {
> >         dial_down_lockup_detection_during_fw_flash(h, c);
> >         atomic_inc(&h->commands_outstanding);
> > +
> > +       reply_queue = h->reply_map[raw_smp_processor_id()];
> >         switch (c->cmd_type) {
> >         case CMD_IOACCEL1:
> >                 set_ioaccel1_performant_mode(h, c, reply_queue);
> > @@ -7376,6 +7365,26 @@ static void
> > hpsa_disable_interrupt_mode(struct
> > ctlr_info *h)
> >         h->msix_vectors = 0;
> >  }
> > 
> > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > +{
> > +       const struct cpumask *mask;
> > +       unsigned int queue, cpu;
> > +
> > +       for (queue = 0; queue < h->msix_vectors; queue++) {
> > +               mask = pci_irq_get_affinity(h->pdev, queue);
> > +               if (!mask)
> > +                       goto fallback;
> > +
> > +               for_each_cpu(cpu, mask)
> > +                       h->reply_map[cpu] = queue;
> > +       }
> > +       return;
> > +
> > +fallback:
> > +       for_each_possible_cpu(cpu)
> > +               h->reply_map[cpu] = 0;
> > +}
> > +
> >  /* If MSI/MSI-X is supported by the kernel we will try to enable
> > it on
> >   * controllers that are capable. If not, we use legacy INTx mode.
> >   */
> > @@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info
> > *h)
> >         err = hpsa_interrupt_mode(h);
> >         if (err)
> >                 goto clean1;
> > +
> > +       /* setup mapping between CPU and reply queue */
> > +       hpsa_setup_reply_map(h);
> > +
> >         err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
> >         if (err)
> >                 goto clean2;    /* intmode+region, pci */
> > @@ -8480,6 +8493,28 @@ static struct workqueue_struct
> > *hpsa_create_controller_wq(struct ctlr_info *h,
> >         return wq;
> >  }
> > 
> > +static void hpda_free_ctlr_info(struct ctlr_info *h)
> > +{
> > +       kfree(h->reply_map);
> > +       kfree(h);
> > +}
> > +
> > +static struct ctlr_info *hpda_alloc_ctlr_info(void)
> > +{
> > +       struct ctlr_info *h;
> > +
> > +       h = kzalloc(sizeof(*h), GFP_KERNEL);
> > +       if (!h)
> > +               return NULL;
> > +
> > +       h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids,
> > GFP_KERNEL);
> > +       if (!h->reply_map) {
> > +               kfree(h);
> > +               return NULL;
> > +       }
> > +       return h;
> > +}
> > +
> >  static int hpsa_init_one(struct pci_dev *pdev, const struct
> > pci_device_id *ent)
> >  {
> >         int dac, rc;
> > @@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev
> > *pdev, const
> > struct pci_device_id *ent)
> >          * the driver.  See comments in hpsa.h for more info.
> >          */
> >         BUILD_BUG_ON(sizeof(struct CommandList) %
> > COMMANDLIST_ALIGNMENT);
> > -       h = kzalloc(sizeof(*h), GFP_KERNEL);
> > +       h = hpda_alloc_ctlr_info();
> >         if (!h) {
> >                 dev_err(&pdev->dev, "Failed to allocate controller
> > head\n");
> >                 return -ENOMEM;
> > @@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev
> > *pdev)
> >         h->lockup_detected = NULL;                      /* init_one
> > 2 */
> >         /* (void) pci_disable_pcie_error_reporting(pdev); */    /*
> > init_one 1 */
> > 
> > -       kfree(h);                                       /* init_one
> > 1 */
> > +       hpda_free_ctlr_info(h);                         /* init_one
> > 1 */
> >  }
> > 
> >  static int hpsa_suspend(__attribute__((unused)) struct pci_dev
> > *pdev,
> > diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> > index 018f980a701c..fb9f5e7f8209 100644
> > --- a/drivers/scsi/hpsa.h
> > +++ b/drivers/scsi/hpsa.h
> > @@ -158,6 +158,7 @@ struct bmic_controller_parameters {
> >  #pragma pack()
> > 
> >  struct ctlr_info {
> > +       unsigned int *reply_map;
> >         int     ctlr;
> >         char    devname[8];
> >         char    *product_name;
> > --
> > 2.9.5
> 
> 

I have a DL580 here with the following:

Ming's latest tree
4.16.0-rc2.ming+

3:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6
controllers (rev 01) P410i

/dev/sg0  1 0 0 0  12  HP        P410i             6.60
/dev/sg1  1 1 0 0  0  /dev/sda  HP        LOGICAL VOLUME    6.60
Boot volume

/dev/sg2  1 1 0 1  0  /dev/sdb  HP        LOGICAL VOLUME    6.60
Single disk

/dev/sg3  1 1 0 2  0  /dev/sdc  HP        LOGICAL VOLUME    6.60  
2 Disk Mirror


MSA50 shelf at 6G SAS, all JBODs

0e:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
2208 [Thunderbolt] (rev 03)

/dev/sg4  0 0 43 0  0  /dev/sdd  HP        DG072A9BB7        HPD0
/dev/sg5  0 0 44 0  0  /dev/sde  HP        DG146BABCF        HPD5
/dev/sg6  0 0 45 0  0  /dev/sdf  HP        DG146BABCF        HPD6
/dev/sg7  0 0 46 0  0  /dev/sdg  HP        EG0146FAWHU       HPDE   
/dev/sg8  0 0 47 0  0  /dev/sdh  HP        EG0146FAWHU       HPDD
/dev/sg9  0 0 48 0  0  /dev/sdi  HP        EG0146FAWHU       HPDE
/dev/sg10  0 0 49 0  0  /dev/sdj  ATA       OCZ-VERTEX4       1.5 
/dev/sg11  0 0 50 0  0  /dev/sdk  ATA       OCZ-VERTEX4       1.5 
/dev/sg12  0 0 51 0  0  /dev/sdl  ATA       INTEL SSDSC2BW08  DC32
/dev/sg13  0 0 52 0  13  HP        MSA50  -10D25G1   1.20

I have run multiple boot passes on the HPSA, all passing, and have not had
any access issues with Ming's patches on the megaraid_sas drives.

I don't have decent enough SSD hardware to match Kashyap's performance
testing on megaraid_sas, unfortunately.

What I can say is that so far all boot testing has passed.

I will now exercise all the drives to see if I can reproduce the issues
Don is seeing; a rough sketch of the kind of load I have in mind is below.
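
Something along these lines (a rough, hypothetical sketch: run times, job
options and the mixed read/write pattern are placeholders, and it is
destructive to any data on those disks):

# rough sketch only: concurrent mixed I/O across the MSA50 JBOD drives above
for dev in sdd sde sdf sdg sdh sdi sdj sdk sdl; do
    fio --name=exercise-$dev --filename=/dev/$dev --direct=1 \
        --ioengine=libaio --rw=randrw --rwmixread=70 --bs=4k \
        --iodepth=16 --numjobs=2 --runtime=300 --time_based &
done
wait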

Thanks
Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-01 19:01     ` Laurence Oberman
@ 2018-03-01 21:19       ` Laurence Oberman
  2018-03-02  2:16         ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Laurence Oberman @ 2018-03-01 21:19 UTC (permalink / raw)
  To: Don Brace, Ming Lei, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

On Thu, 2018-03-01 at 14:01 -0500, Laurence Oberman wrote:
> On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > > -----Original Message-----
> > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > Sent: Tuesday, February 27, 2018 4:08 AM
> > > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org;
> > > Christoph
> > > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>
> > > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>;
> > > Arun Easi
> > > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K
> > > .
> > > Petersen <martin.petersen@oracle.com>; James Bottomley
> > > <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@l
> > > st
> > > .de>;
> > > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom
> > > .c
> > > om>;
> > > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> > > 
> > > EXTERNAL EMAIL
> > > 
> > > 
> > > From 84676c1f21 (genirq/affinity: assign vectors to all possible
> > > CPUs),
> > > one msix vector can be created without any online CPU mapped,
> > > then
> > > one
> > > command's completion may not be notified.
> > > 
> > > This patch setups mapping between cpu and reply queue according
> > > to
> > > irq
> > > affinity info retrived by pci_irq_get_affinity(), and uses this
> > > mapping
> > > table to choose reply queue for queuing one command.
> > > 
> > > Then the chosen reply queue has to be active, and fixes IO hang
> > > caused
> > > by using inactive reply queue which doesn't have any online CPU
> > > mapped.
> > > 
> > > Cc: Hannes Reinecke <hare@suse.de>
> > > Cc: Arun Easi <arun.easi@cavium.com>
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > > Cc: Christoph Hellwig <hch@lst.de>,
> > > Cc: Don Brace <don.brace@microsemi.com>
> > > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > > Cc: Laurence Oberman <loberman@redhat.com>
> > > Cc: Meelis Roos <mroos@linux.ee>
> > > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all
> > > possible CPUs")
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > 
> > I am getting some issues that need to be tracked down:
> > 
> > [ 1636.032984] hpsa 0000:87:00.0: Acknowledging event: 0xc0000032
> > (HP
> > SSD Smart Path configuration change)
> > [ 1638.510656] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> > Exp=0
> > [ 1653.967695] hpsa 0000:87:00.0: Acknowledging event: 0x80000020
> > (HP
> > SSD Smart Path configuration change)
> > [ 1656.770377] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> > Exp=0
> > [ 2839.762267] hpsa 0000:87:00.0: Acknowledging event: 0x80000020
> > (HP
> > SSD Smart Path configuration change)
> > [ 2840.841290] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> > Exp=0
> > [ 2917.582653] hpsa 0000:87:00.0: Acknowledging event: 0xc0000020
> > (HP
> > SSD Smart Path configuration change)
> > [ 2919.087191] hpsa 0000:87:00.0: scsi 3:1:0:1: updated Direct-
> > Access     HP       LOGICAL VOLUME   RAID-5 SSDSmartPathCap+ En+
> > Exp=1
> > [ 2919.142527] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > [3:1:0:2] A phys disk component of LV is missing, turning off
> > offload_enabled for LV.
> > [ 2919.203915] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > [3:1:0:2] A phys disk component of LV is missing, turning off
> > offload_enabled for LV.
> > [ 2919.266921] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > [3:1:0:2] A phys disk component of LV is missing, turning off
> > offload_enabled for LV.
> > [ 2934.999629] hpsa 0000:87:00.0: Acknowledging event: 0x40000000
> > (HP
> > SSD Smart Path state change)
> > [ 2936.937333] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > [3:1:0:2] A phys disk component of LV is missing, turning off
> > offload_enabled for LV.
> > [ 2936.998707] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > [3:1:0:2] A phys disk component of LV is missing, turning off
> > offload_enabled for LV.
> > [ 2937.060101] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > [3:1:0:2] A phys disk component of LV is missing, turning off
> > offload_enabled for LV.
> > [ 3619.711122] sd 3:1:0:3: [sde] tag#436 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3619.751150] sd 3:1:0:3: [sde] tag#436 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3619.784375] sd 3:1:0:3: [sde] tag#436 Add. Sense: Internal
> > target
> > failure
> > [ 3619.816530] sd 3:1:0:3: [sde] tag#436 CDB: Read(10) 28 00 01 1b
> > ad
> > af 00 00 01 00
> > [ 3619.852295] print_req_error: I/O error, dev sde, sector 18591151
> > [ 3619.880850] sd 3:1:0:3: [sde] tag#461 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3619.920981] sd 3:1:0:3: [sde] tag#461 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3619.955081] sd 3:1:0:3: [sde] tag#461 Add. Sense: Internal
> > target
> > failure
> > [ 3619.987054] sd 3:1:0:3: [sde] tag#461 CDB: Read(10) 28 00 02 15
> > 31
> > 40 00 00 01 00
> > [ 3620.022569] print_req_error: I/O error, dev sde, sector 34943296
> > [ 3620.050873] sd 3:1:0:3: [sde] tag#157 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3620.091124] sd 3:1:0:3: [sde] tag#157 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3620.124179] sd 3:1:0:3: [sde] tag#157 Add. Sense: Internal
> > target
> > failure
> > [ 3620.156203] sd 3:1:0:3: [sde] tag#157 CDB: Read(10) 28 00 03 65
> > 9d
> > 7e 00 00 01 00
> > [ 3620.191520] print_req_error: I/O error, dev sde, sector 56991102
> > [ 3620.220308] sd 3:1:0:3: [sde] tag#266 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3620.260273] sd 3:1:0:3: [sde] tag#266 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3620.294605] sd 3:1:0:3: [sde] tag#266 Add. Sense: Internal
> > target
> > failure
> > [ 3620.328353] sd 3:1:0:3: [sde] tag#266 CDB: Read(10) 28 00 09 92
> > 94
> > 70 00 00 01 00
> > [ 3620.364807] print_req_error: I/O error, dev sde, sector
> > 160601200
> > [ 3620.394342] sd 3:1:0:3: [sde] tag#278 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3620.434462] sd 3:1:0:3: [sde] tag#278 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3620.469059] sd 3:1:0:3: [sde] tag#278 Add. Sense: Internal
> > target
> > failure
> > [ 3620.471761] sd 3:1:0:3: [sde] tag#467 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3620.502240] sd 3:1:0:3: [sde] tag#278 CDB: Read(10) 28 00 08 00
> > 12
> > ea 00 00 01 00
> > [ 3620.543157] sd 3:1:0:3: [sde] tag#467 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3620.580375] print_req_error: I/O error, dev sde, sector
> > 134222570
> > [ 3620.615355] sd 3:1:0:3: [sde] tag#467 Add. Sense: Internal
> > target
> > failure
> > [ 3620.645069] sd 3:1:0:3: [sde] tag#244 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3620.678696] sd 3:1:0:3: [sde] tag#467 CDB: Read(10) 28 00 10 3f
> > 2b
> > fc 00 00 01 00
> > [ 3620.720247] sd 3:1:0:3: [sde] tag#244 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3620.756776] print_req_error: I/O error, dev sde, sector
> > 272575484
> > [ 3620.791857] sd 3:1:0:3: [sde] tag#244 Add. Sense: Internal
> > target
> > failure
> > [ 3620.822272] sd 3:1:0:3: [sde] tag#431 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3620.855200] sd 3:1:0:3: [sde] tag#244 CDB: Read(10) 28 00 08 31
> > 86
> > d9 00 00 01 00
> > [ 3620.895823] sd 3:1:0:3: [sde] tag#431 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3620.931923] print_req_error: I/O error, dev sde, sector
> > 137463513
> > [ 3620.966262] sd 3:1:0:3: [sde] tag#431 Add. Sense: Internal
> > target
> > failure
> > [ 3620.995715] sd 3:1:0:3: [sde] tag#226 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3621.028703] sd 3:1:0:3: [sde] tag#431 CDB: Read(10) 28 00 10 7c
> > b2
> > b0 00 00 01 00
> > [ 3621.069686] sd 3:1:0:3: [sde] tag#226 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3621.106253] print_req_error: I/O error, dev sde, sector
> > 276607664
> > [ 3621.140782] sd 3:1:0:3: [sde] tag#226 Add. Sense: Internal
> > target
> > failure
> > [ 3621.170241] sd 3:1:0:3: [sde] tag#408 FAILED Result:
> > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > [ 3621.202997] sd 3:1:0:3: [sde] tag#226 CDB: Read(10) 28 00 08 ba
> > cf
> > f2 00 00 01 00
> > [ 3621.243870] sd 3:1:0:3: [sde] tag#408 Sense Key : Aborted
> > Command
> > [current] 
> > [ 3621.280015] print_req_error: I/O error, dev sde, sector
> > 146460658
> > [ 3621.313941] sd 3:1:0:3: [sde] tag#408 Add. Sense: Internal
> > target
> > failure
> > [ 3621.343790] print_req_error: I/O error, dev sde, sector 98830586
> > [ 3621.376164] sd 3:1:0:3: [sde] tag#408 CDB: Read(10) 28 00 14 da
> > 6a
> > 53 00 00 01 00
> > [ 3641.714842] WARNING: CPU: 3 PID: 0 at kernel/rcu/tree.c:2713
> > rcu_process_callbacks+0x4d5/0x510
> > [ 3641.756175] Modules linked in: sg ip6t_rpfilter ip6t_REJECT
> > nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
> > nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> > nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp
> > llc ebtable_filter ebtables ip6table_mangle ip6table_security
> > ip6table_raw ip6table_filter ip6_tables iptable_mangle
> > iptable_security iptable_raw iptable_filter ip_tables sb_edac
> > x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass
> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
> > iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd
> > pcspkr
> > hpilo hpwdt ioatdma shpchp ipmi_si lpc_ich dca mfd_core wmi
> > ipmi_msghandler acpi_power_meter pcc_cpufreq uinput xfs libcrc32c
> > mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect
> > [ 3642.094993]  sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core
> > tg3 hpsa scsi_transport_sas usb_storage dm_mirror dm_region_hash
> > dm_log dm_mod dax
> > [ 3642.158883] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.16.0-
> > rc3+
> > #18
> > [ 3642.190015] Hardware name: HP ProLiant DL580 Gen8, BIOS P79
> > 08/18/2016
> > [ 3642.221949] RIP: 0010:rcu_process_callbacks+0x4d5/0x510
> > [ 3642.247606] RSP: 0018:ffff8e179f6c3f08 EFLAGS: 00010002
> > [ 3642.273087] RAX: 0000000000000000 RBX: ffff8e179f6e3180 RCX:
> > ffff8e279d1e8918
> > [ 3642.307426] RDX: ffffffffffffd801 RSI: ffff8e179f6c3f18 RDI:
> > ffff8e179f6e31b8
> > [ 3642.342219] RBP: ffffffffb70a31c0 R08: ffff8e279d1e8918 R09:
> > 0000000000000100
> > [ 3642.376929] R10: 0000000000000004 R11: 0000000000000005 R12:
> > ffff8e179f6e31b8
> > [ 3642.411598] R13: ffff8e179d20ad00 R14: 0000000000000001 R15:
> > 7fffffffffffffff
> > [ 3642.445957] FS:  0000000000000000(0000)
> > GS:ffff8e179f6c0000(0000)
> > knlGS:0000000000000000
> > [ 3642.485599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3642.513678] CR2: 00007f30917b9008 CR3: 000000054900a006 CR4:
> > 00000000001606e0
> > [ 3642.548189] Call Trace:
> > [ 3642.560411]  <IRQ>
> > [ 3642.570588]  __do_softirq+0xd1/0x275
> > [ 3642.588643]  irq_exit+0xd5/0xe0
> > [ 3642.604134]  smp_apic_timer_interrupt+0x60/0x120
> > [ 3642.626752]  apic_timer_interrupt+0xf/0x20
> > [ 3642.646712]  </IRQ>
> > [ 3642.657330] RIP: 0010:cpuidle_enter_state+0xd4/0x260
> > [ 3642.681389] RSP: 0018:ffffaed7c00e7ea0 EFLAGS: 00000246
> > ORIG_RAX:
> > ffffffffffffff12
> > [ 3642.717937] RAX: ffff8e179f6e2280 RBX: ffffcebfbfec1bb8 RCX:
> > 000000000000001f
> > [ 3642.752525] RDX: 0000000000000000 RSI: ff6c3b1b90a53a78 RDI:
> > 0000000000000000
> > [ 3642.787181] RBP: 0000000000000003 R08: 0000000000000005 R09:
> > 0000000000000396
> > [ 3642.821442] R10: 00000000000003a7 R11: 0000000000000008 R12:
> > 0000000000000003
> > [ 3642.856381] R13: 0000034fe70ea52c R14: 0000000000000003 R15:
> > 0000034fe71d99d4
> > [ 3642.890830]  do_idle+0x172/0x1e0
> > [ 3642.906714]  cpu_startup_entry+0x6f/0x80
> > [ 3642.925835]  start_secondary+0x187/0x1e0
> > [ 3642.944975]  secondary_startup_64+0xa5/0xb0
> > [ 3642.965719] Code: e9 db fd ff ff 4c 89 f6 4c 89 e7 e8 96 b8 63
> > 00
> > e9 56 fc ff ff 0f 0b e9 34 fc ff ff 0f 0b 0f 1f 84 00 00 00 00 00
> > e9
> > e0 fb ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 e5 fd ff ff 0f 0b
> > 66 0f 1f 
> > [ 3643.056198] ---[ end trace 7bdac969b3138de7 ]---
> > [ 3735.745955] hpsa 0000:87:00.0: SCSI status: LUN:000000c000002601
> > CDB:12010000040000000000000000000000
> > [ 3735.790497] hpsa 0000:87:00.0: SCSI Status = 02, Sense key =
> > 0x05,
> > ASC = 0x25, ASCQ = 0x00
> > > ---
> > >  drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++
> > > --------------
> > >  drivers/scsi/hpsa.h |  1 +
> > >  2 files changed, 55 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > > index 5293e6827ce5..3a9eca163db8 100644
> > > --- a/drivers/scsi/hpsa.c
> > > +++ b/drivers/scsi/hpsa.c
> > > @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct
> > > ctlr_info
> > > *h, struct CommandList *c,
> > >                 c->busaddr |= 1 | (h->blockFetchTable[c-
> > > > Header.SGList] << 1);
> > > 
> > >                 if (unlikely(!h->msix_vectors))
> > >                         return;
> > > -               if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > -                       c->Header.ReplyQueue =
> > > -                               raw_smp_processor_id() % h-
> > > > nreply_queues;
> > > 
> > > -               else
> > > -                       c->Header.ReplyQueue = reply_queue % h-
> > > > nreply_queues;
> > > 
> > > +               c->Header.ReplyQueue = reply_queue;
> > >         }
> > >  }
> > > 
> > > @@ -1063,10 +1059,7 @@ static void
> > > set_ioaccel1_performant_mode(struct
> > > ctlr_info *h,
> > >          * Tell the controller to post the reply to the queue for
> > > this
> > >          * processor.  This seems to give the best I/O
> > > throughput.
> > >          */
> > > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > -               cp->ReplyQueue = smp_processor_id() % h-
> > > > nreply_queues;
> > > 
> > > -       else
> > > -               cp->ReplyQueue = reply_queue % h->nreply_queues;
> > > +       cp->ReplyQueue = reply_queue;
> > >         /*
> > >          * Set the bits in the address sent down to include:
> > >          *  - performant mode bit (bit 0)
> > > @@ -1087,10 +1080,7 @@ static void
> > > set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
> > >         /* Tell the controller to post the reply to the queue for
> > > this
> > >          * processor.  This seems to give the best I/O
> > > throughput.
> > >          */
> > > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > -               cp->reply_queue = smp_processor_id() % h-
> > > > nreply_queues;
> > > 
> > > -       else
> > > -               cp->reply_queue = reply_queue % h->nreply_queues;
> > > +       cp->reply_queue = reply_queue;
> > >         /* Set the bits in the address sent down to include:
> > >          *  - performant mode bit not used in ioaccel mode 2
> > >          *  - pull count (bits 0-3)
> > > @@ -1109,10 +1099,7 @@ static void
> > > set_ioaccel2_performant_mode(struct
> > > ctlr_info *h,
> > >          * Tell the controller to post the reply to the queue for
> > > this
> > >          * processor.  This seems to give the best I/O
> > > throughput.
> > >          */
> > > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > -               cp->reply_queue = smp_processor_id() % h-
> > > > nreply_queues;
> > > 
> > > -       else
> > > -               cp->reply_queue = reply_queue % h->nreply_queues;
> > > +       cp->reply_queue = reply_queue;
> > >         /*
> > >          * Set the bits in the address sent down to include:
> > >          *  - performant mode bit not used in ioaccel mode 2
> > > @@ -1157,6 +1144,8 @@ static void
> > > __enqueue_cmd_and_start_io(struct
> > > ctlr_info *h,
> > >  {
> > >         dial_down_lockup_detection_during_fw_flash(h, c);
> > >         atomic_inc(&h->commands_outstanding);
> > > +
> > > +       reply_queue = h->reply_map[raw_smp_processor_id()];
> > >         switch (c->cmd_type) {
> > >         case CMD_IOACCEL1:
> > >                 set_ioaccel1_performant_mode(h, c, reply_queue);
> > > @@ -7376,6 +7365,26 @@ static void
> > > hpsa_disable_interrupt_mode(struct
> > > ctlr_info *h)
> > >         h->msix_vectors = 0;
> > >  }
> > > 
> > > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > > +{
> > > +       const struct cpumask *mask;
> > > +       unsigned int queue, cpu;
> > > +
> > > +       for (queue = 0; queue < h->msix_vectors; queue++) {
> > > +               mask = pci_irq_get_affinity(h->pdev, queue);
> > > +               if (!mask)
> > > +                       goto fallback;
> > > +
> > > +               for_each_cpu(cpu, mask)
> > > +                       h->reply_map[cpu] = queue;
> > > +       }
> > > +       return;
> > > +
> > > +fallback:
> > > +       for_each_possible_cpu(cpu)
> > > +               h->reply_map[cpu] = 0;
> > > +}
> > > +
> > >  /* If MSI/MSI-X is supported by the kernel we will try to enable
> > > it on
> > >   * controllers that are capable. If not, we use legacy INTx
> > > mode.
> > >   */
> > > @@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info
> > > *h)
> > >         err = hpsa_interrupt_mode(h);
> > >         if (err)
> > >                 goto clean1;
> > > +
> > > +       /* setup mapping between CPU and reply queue */
> > > +       hpsa_setup_reply_map(h);
> > > +
> > >         err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
> > >         if (err)
> > >                 goto clean2;    /* intmode+region, pci */
> > > @@ -8480,6 +8493,28 @@ static struct workqueue_struct
> > > *hpsa_create_controller_wq(struct ctlr_info *h,
> > >         return wq;
> > >  }
> > > 
> > > +static void hpda_free_ctlr_info(struct ctlr_info *h)
> > > +{
> > > +       kfree(h->reply_map);
> > > +       kfree(h);
> > > +}
> > > +
> > > +static struct ctlr_info *hpda_alloc_ctlr_info(void)
> > > +{
> > > +       struct ctlr_info *h;
> > > +
> > > +       h = kzalloc(sizeof(*h), GFP_KERNEL);
> > > +       if (!h)
> > > +               return NULL;
> > > +
> > > +       h->reply_map = kzalloc(sizeof(*h->reply_map) *
> > > nr_cpu_ids,
> > > GFP_KERNEL);
> > > +       if (!h->reply_map) {
> > > +               kfree(h);
> > > +               return NULL;
> > > +       }
> > > +       return h;
> > > +}
> > > +
> > >  static int hpsa_init_one(struct pci_dev *pdev, const struct
> > > pci_device_id *ent)
> > >  {
> > >         int dac, rc;
> > > @@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev
> > > *pdev, const
> > > struct pci_device_id *ent)
> > >          * the driver.  See comments in hpsa.h for more info.
> > >          */
> > >         BUILD_BUG_ON(sizeof(struct CommandList) %
> > > COMMANDLIST_ALIGNMENT);
> > > -       h = kzalloc(sizeof(*h), GFP_KERNEL);
> > > +       h = hpda_alloc_ctlr_info();
> > >         if (!h) {
> > >                 dev_err(&pdev->dev, "Failed to allocate
> > > controller
> > > head\n");
> > >                 return -ENOMEM;
> > > @@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev
> > > *pdev)
> > >         h->lockup_detected = NULL;                      /*
> > > init_one
> > > 2 */
> > >         /* (void) pci_disable_pcie_error_reporting(pdev);
> > > */    /*
> > > init_one 1 */
> > > 
> > > -       kfree(h);                                       /*
> > > init_one
> > > 1 */
> > > +       hpda_free_ctlr_info(h);                         /*
> > > init_one
> > > 1 */
> > >  }
> > > 
> > >  static int hpsa_suspend(__attribute__((unused)) struct pci_dev
> > > *pdev,
> > > diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> > > index 018f980a701c..fb9f5e7f8209 100644
> > > --- a/drivers/scsi/hpsa.h
> > > +++ b/drivers/scsi/hpsa.h
> > > @@ -158,6 +158,7 @@ struct bmic_controller_parameters {
> > >  #pragma pack()
> > > 
> > >  struct ctlr_info {
> > > +       unsigned int *reply_map;
> > >         int     ctlr;
> > >         char    devname[8];
> > >         char    *product_name;
> > > --
> > > 2.9.5
> > 
> > 
> 
> I have a DL580 here with the following:
> 
> Ming's latest tree
> 4.16.0-rc2.ming+
> 
> 3:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6
> controllers (rev 01) P410i
> 
> /dev/sg0  1 0 0 0  12  HP        P410i             6.60
> /dev/sg1  1 1 0 0  0  /dev/sda  HP        LOGICAL VOLUME    6.60
> Boot volume
> 
> /dev/sg2  1 1 0 1  0  /dev/sdb  HP        LOGICAL VOLUME    6.60
> Single disk
> 
> /dev/sg3  1 1 0 2  0  /dev/sdc  HP        LOGICAL VOLUME    6.60  
> 2 Disk Mirror
> 
> 
> MSA50 Shelf at 6GB, all Jbods
> 
> 0e:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
> 2208 [Thunderbolt] (rev 03)
> 
> /dev/sg4  0 0 43 0  0  /dev/sdd  HP        DG072A9BB7        HPD0
> /dev/sg5  0 0 44 0  0  /dev/sde  HP        DG146BABCF        HPD5
> /dev/sg6  0 0 45 0  0  /dev/sdf  HP        DG146BABCF        HPD6
> /dev/sg7  0 0 46 0  0  /dev/sdg  HP        EG0146FAWHU       HPDE   
> /dev/sg8  0 0 47 0  0  /dev/sdh  HP        EG0146FAWHU       HPDD
> /dev/sg9  0 0 48 0  0  /dev/sdi  HP        EG0146FAWHU       HPDE
> /dev/sg10  0 0 49 0  0  /dev/sdj  ATA       OCZ-VERTEX4       1.5 
> /dev/sg11  0 0 50 0  0  /dev/sdk  ATA       OCZ-VERTEX4       1.5 
> /dev/sg12  0 0 51 0  0  /dev/sdl  ATA       INTEL SSDSC2BW08  DC32
> /dev/sg13  0 0 52 0  13  HP        MSA50  -10D25G1   1.20
> 
> I have run multiple boot passes on the HPSA, all passing, and have not had
> any access issues on the megaraid_sas drives with Ming's patches.
> 
> I don't have SSD hardware decent enough to test megaraid_sas performance
> to match Kashyap's setup, unfortunately.
> 
> What I can say is that so far all boot testing has passed.
> 
> I will exercise all the drives now to see if I can bring about any
> issues seen by Don
> 
> Thanks
> Laurence

Don,

I am not seeing any issues with Ming's V3

So Ming's latest V3 is rock solid for me through multiple fio runs on
the DL580 here.
On both megaraid_sas and hpsa

Using
BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-a28e-
21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
console=ttyS1,115200n8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset
  2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
                   ` (7 preceding siblings ...)
  2018-02-27 10:07 ` [PATCH V3 8/8] scsi: megaraid: " Ming Lei
@ 2018-03-01 21:46 ` Laurence Oberman
  8 siblings, 0 replies; 54+ messages in thread
From: Laurence Oberman @ 2018-03-01 21:46 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer
  Cc: linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera

On Tue, 2018-02-27 at 18:07 +0800, Ming Lei wrote:
> Hi All,
> 
> The 1st two patches fixes reply queue selection, and this issue has
> been
> reported and can cause IO hang during booting, please consider the
> two
> for V4.16.
> 
> The following 6 patches try to improve hostwide tagset on hpsa and
> megaraid_sas by making hw queue per NUMA node.
> 
> I don't have high-performance hpsa and megaraid_sas device at hand.
> 
> Don Brace, could you test this patchset on concurrent IOs over you
> hpsa
> SSD and see if this approach is well?
> 
> Kashyap, could you test this patchset on your megaraid_sas SSDs?
> 
> 	gitweb: https://github.com/ming1/linux/tree/v4.16-rc-host-tags-
> v3.2
> 
> thanks,
> Ming
> 
> Hannes Reinecke (1):
>   scsi: Add template flag 'host_tagset'
> 
> Ming Lei (7):
>   scsi: hpsa: fix selection of reply queue
>   scsi: megaraid_sas: fix selection of reply queue
>   blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
>   blk-mq: introduce BLK_MQ_F_HOST_TAGS
>   block: null_blk: introduce module parameter of 'g_host_tags'
>   scsi: hpsa: improve scsi_mq performance via .host_tagset
>   scsi: megaraid: improve scsi_mq performance via .host_tagset
> 
>  block/blk-mq-debugfs.c                      |  2 +
>  block/blk-mq-sched.c                        |  2 +-
>  block/blk-mq-tag.c                          | 13 +++--
>  block/blk-mq-tag.h                          | 11 ++--
>  block/blk-mq.c                              | 50 +++++++++++++++---
>  block/blk-mq.h                              |  3 +-
>  drivers/block/null_blk.c                    |  6 +++
>  drivers/scsi/hpsa.c                         | 79
> ++++++++++++++++++++++-------
>  drivers/scsi/hpsa.h                         |  1 +
>  drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 40 ++++++++++++++-
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++---
>  drivers/scsi/scsi_lib.c                     |  2 +
>  include/linux/blk-mq.h                      |  2 +
>  include/scsi/scsi_host.h                    |  3 ++
>  15 files changed, 182 insertions(+), 46 deletions(-)
> 

For the patchset above, all functional I/O tests and boot tests passed with
multiple concurrent fio runs.

The original HPSA boot issue is also resolved, which matters; otherwise we
would have to revert the original genirq commit
84676c1f21e8ff54befe985f4f14dc1edc10046b.

Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks
Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-01 16:18   ` Don Brace
  2018-03-01 19:01     ` Laurence Oberman
@ 2018-03-02  0:47     ` Ming Lei
  1 sibling, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-02  0:47 UTC (permalink / raw)
  To: Don Brace
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Laurence Oberman, Meelis Roos

Hi Don,

Thanks for your test!

On Thu, Mar 01, 2018 at 04:18:17PM +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Tuesday, February 27, 2018 4:08 AM
> > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org; Christoph
> > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>
> > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>; Arun Easi
> > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K .
> > Petersen <martin.petersen@oracle.com>; James Bottomley
> > <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@lst.de>;
> > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom.com>;
> > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> > 
> > 
> > From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> > one msix vector can be created without any online CPU mapped, then one
> > command's completion may not be notified.
> > 
> > This patch setups mapping between cpu and reply queue according to irq
> > affinity info retrived by pci_irq_get_affinity(), and uses this mapping
> > table to choose reply queue for queuing one command.
> > 
> > Then the chosen reply queue has to be active, and fixes IO hang caused
> > by using inactive reply queue which doesn't have any online CPU mapped.
> > 
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: Arun Easi <arun.easi@cavium.com>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: Don Brace <don.brace@microsemi.com>
> > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > Cc: Laurence Oberman <loberman@redhat.com>
> > Cc: Meelis Roos <mroos@linux.ee>
> > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> 
> I am getting some issues that need to be tracked down:

I checked the patch one more time and didn't find anything odd. The only
thing I noticed is that inside hpsa_do_reset(), wait_for_device_to_become_ready()
always sends 'test unit ready' via reply queue 0. Do you know whether anything
bad can happen when a non-zero reply queue is used instead?

Could you share how you reproduce this issue?

It looks like you can boot successfully, so could you please provide the
following output?

1) What is your server type? We may be able to find one in our lab so that I
can try to reproduce the issue.

2) lscpu

3) IRQ affinity info; pass the first column of the 'lspci' output for your
hpsa PCI device to this script:

#!/bin/sh
# Pass the PCI ID (first column of 'lspci') of the hpsa device as $1;
# without an argument it falls back to an NVMe controller found via lspci.
if [ $# -ge 1 ]; then
    PCID=$1
else
    PCID=`lspci | grep "Non-Volatile memory" | cut -c1-7`
fi
PCIP=`find /sys/devices -name "*$PCID" | grep pci`
IRQS=`ls $PCIP/msi_irqs`

echo "kernel version: "
uname -a

echo "PCI name is $PCID, dump its irq affinity:"
for IRQ in $IRQS; do
    CPUS=`cat /proc/irq/$IRQ/smp_affinity_list`
    printf "\tirq %s, cpu list %s\n" "$IRQ" "$CPUS"
done
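
For example, assuming the hpsa controller is the one logged above as
0000:87:00.0, the value to pass is the first column of 'lspci' (87:00.0), so
the invocation would look something like (the script name here is arbitrary):

    sh ./hpsa-irq-affinity.sh 87:00.0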


Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-01 21:19       ` Laurence Oberman
@ 2018-03-02  2:16         ` Ming Lei
  2018-03-02 14:09           ` Laurence Oberman
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-02  2:16 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Don Brace, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Martin K . Petersen, James Bottomley,
	Christoph Hellwig, Kashyap Desai, Peter Rivera, Meelis Roos

On Thu, Mar 01, 2018 at 04:19:34PM -0500, Laurence Oberman wrote:
> On Thu, 2018-03-01 at 14:01 -0500, Laurence Oberman wrote:
> > On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > > > -----Original Message-----
> > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > Sent: Tuesday, February 27, 2018 4:08 AM
> > > > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org;
> > > > Christoph
> > > > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>
> > > > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>;
> > > > Arun Easi
> > > > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K
> > > > .
> > > > Petersen <martin.petersen@oracle.com>; James Bottomley
> > > > <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@l
> > > > st
> > > > .de>;
> > > > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > > > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broadcom
> > > > .c
> > > > om>;
> > > > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > > > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > > > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> > > > 
> > > > 
> > > > From 84676c1f21 (genirq/affinity: assign vectors to all possible
> > > > CPUs),
> > > > one msix vector can be created without any online CPU mapped,
> > > > then
> > > > one
> > > > command's completion may not be notified.
> > > > 
> > > > This patch setups mapping between cpu and reply queue according
> > > > to
> > > > irq
> > > > affinity info retrived by pci_irq_get_affinity(), and uses this
> > > > mapping
> > > > table to choose reply queue for queuing one command.
> > > > 
> > > > Then the chosen reply queue has to be active, and fixes IO hang
> > > > caused
> > > > by using inactive reply queue which doesn't have any online CPU
> > > > mapped.
> > > > 
> > > > Cc: Hannes Reinecke <hare@suse.de>
> > > > Cc: Arun Easi <arun.easi@cavium.com>
> > > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > > > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > > > Cc: Christoph Hellwig <hch@lst.de>,
> > > > Cc: Don Brace <don.brace@microsemi.com>
> > > > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > > > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > > > Cc: Laurence Oberman <loberman@redhat.com>
> > > > Cc: Meelis Roos <mroos@linux.ee>
> > > > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all
> > > > possible CPUs")
> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > 
> > > I am getting some issues that need to be tracked down:
> > > 
> > > [ 1636.032984] hpsa 0000:87:00.0: Acknowledging event: 0xc0000032
> > > (HP
> > > SSD Smart Path configuration change)
> > > [ 1638.510656] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> > > Exp=0
> > > [ 1653.967695] hpsa 0000:87:00.0: Acknowledging event: 0x80000020
> > > (HP
> > > SSD Smart Path configuration change)
> > > [ 1656.770377] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> > > Exp=0
> > > [ 2839.762267] hpsa 0000:87:00.0: Acknowledging event: 0x80000020
> > > (HP
> > > SSD Smart Path configuration change)
> > > [ 2840.841290] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap- En-
> > > Exp=0
> > > [ 2917.582653] hpsa 0000:87:00.0: Acknowledging event: 0xc0000020
> > > (HP
> > > SSD Smart Path configuration change)
> > > [ 2919.087191] hpsa 0000:87:00.0: scsi 3:1:0:1: updated Direct-
> > > Access     HP       LOGICAL VOLUME   RAID-5 SSDSmartPathCap+ En+
> > > Exp=1
> > > [ 2919.142527] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > offload_enabled for LV.
> > > [ 2919.203915] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > offload_enabled for LV.
> > > [ 2919.266921] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > offload_enabled for LV.
> > > [ 2934.999629] hpsa 0000:87:00.0: Acknowledging event: 0x40000000
> > > (HP
> > > SSD Smart Path state change)
> > > [ 2936.937333] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > offload_enabled for LV.
> > > [ 2936.998707] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > offload_enabled for LV.
> > > [ 2937.060101] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > offload_enabled for LV.
> > > [ 3619.711122] sd 3:1:0:3: [sde] tag#436 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3619.751150] sd 3:1:0:3: [sde] tag#436 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3619.784375] sd 3:1:0:3: [sde] tag#436 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3619.816530] sd 3:1:0:3: [sde] tag#436 CDB: Read(10) 28 00 01 1b
> > > ad
> > > af 00 00 01 00
> > > [ 3619.852295] print_req_error: I/O error, dev sde, sector 18591151
> > > [ 3619.880850] sd 3:1:0:3: [sde] tag#461 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3619.920981] sd 3:1:0:3: [sde] tag#461 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3619.955081] sd 3:1:0:3: [sde] tag#461 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3619.987054] sd 3:1:0:3: [sde] tag#461 CDB: Read(10) 28 00 02 15
> > > 31
> > > 40 00 00 01 00
> > > [ 3620.022569] print_req_error: I/O error, dev sde, sector 34943296
> > > [ 3620.050873] sd 3:1:0:3: [sde] tag#157 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3620.091124] sd 3:1:0:3: [sde] tag#157 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3620.124179] sd 3:1:0:3: [sde] tag#157 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3620.156203] sd 3:1:0:3: [sde] tag#157 CDB: Read(10) 28 00 03 65
> > > 9d
> > > 7e 00 00 01 00
> > > [ 3620.191520] print_req_error: I/O error, dev sde, sector 56991102
> > > [ 3620.220308] sd 3:1:0:3: [sde] tag#266 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3620.260273] sd 3:1:0:3: [sde] tag#266 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3620.294605] sd 3:1:0:3: [sde] tag#266 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3620.328353] sd 3:1:0:3: [sde] tag#266 CDB: Read(10) 28 00 09 92
> > > 94
> > > 70 00 00 01 00
> > > [ 3620.364807] print_req_error: I/O error, dev sde, sector
> > > 160601200
> > > [ 3620.394342] sd 3:1:0:3: [sde] tag#278 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3620.434462] sd 3:1:0:3: [sde] tag#278 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3620.469059] sd 3:1:0:3: [sde] tag#278 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3620.471761] sd 3:1:0:3: [sde] tag#467 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3620.502240] sd 3:1:0:3: [sde] tag#278 CDB: Read(10) 28 00 08 00
> > > 12
> > > ea 00 00 01 00
> > > [ 3620.543157] sd 3:1:0:3: [sde] tag#467 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3620.580375] print_req_error: I/O error, dev sde, sector
> > > 134222570
> > > [ 3620.615355] sd 3:1:0:3: [sde] tag#467 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3620.645069] sd 3:1:0:3: [sde] tag#244 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3620.678696] sd 3:1:0:3: [sde] tag#467 CDB: Read(10) 28 00 10 3f
> > > 2b
> > > fc 00 00 01 00
> > > [ 3620.720247] sd 3:1:0:3: [sde] tag#244 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3620.756776] print_req_error: I/O error, dev sde, sector
> > > 272575484
> > > [ 3620.791857] sd 3:1:0:3: [sde] tag#244 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3620.822272] sd 3:1:0:3: [sde] tag#431 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3620.855200] sd 3:1:0:3: [sde] tag#244 CDB: Read(10) 28 00 08 31
> > > 86
> > > d9 00 00 01 00
> > > [ 3620.895823] sd 3:1:0:3: [sde] tag#431 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3620.931923] print_req_error: I/O error, dev sde, sector
> > > 137463513
> > > [ 3620.966262] sd 3:1:0:3: [sde] tag#431 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3620.995715] sd 3:1:0:3: [sde] tag#226 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3621.028703] sd 3:1:0:3: [sde] tag#431 CDB: Read(10) 28 00 10 7c
> > > b2
> > > b0 00 00 01 00
> > > [ 3621.069686] sd 3:1:0:3: [sde] tag#226 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3621.106253] print_req_error: I/O error, dev sde, sector
> > > 276607664
> > > [ 3621.140782] sd 3:1:0:3: [sde] tag#226 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3621.170241] sd 3:1:0:3: [sde] tag#408 FAILED Result:
> > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > [ 3621.202997] sd 3:1:0:3: [sde] tag#226 CDB: Read(10) 28 00 08 ba
> > > cf
> > > f2 00 00 01 00
> > > [ 3621.243870] sd 3:1:0:3: [sde] tag#408 Sense Key : Aborted
> > > Command
> > > [current]
> > > [ 3621.280015] print_req_error: I/O error, dev sde, sector
> > > 146460658
> > > [ 3621.313941] sd 3:1:0:3: [sde] tag#408 Add. Sense: Internal
> > > target
> > > failure
> > > [ 3621.343790] print_req_error: I/O error, dev sde, sector 98830586
> > > [ 3621.376164] sd 3:1:0:3: [sde] tag#408 CDB: Read(10) 28 00 14 da
> > > 6a
> > > 53 00 00 01 00
> > > [ 3641.714842] WARNING: CPU: 3 PID: 0 at kernel/rcu/tree.c:2713
> > > rcu_process_callbacks+0x4d5/0x510
> > > [ 3641.756175] Modules linked in: sg ip6t_rpfilter ip6t_REJECT
> > > nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
> > > nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> > > nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp
> > > llc ebtable_filter ebtables ip6table_mangle ip6table_security
> > > ip6table_raw ip6table_filter ip6_tables iptable_mangle
> > > iptable_security iptable_raw iptable_filter ip_tables sb_edac
> > > x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass
> > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
> > > iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd
> > > pcspkr
> > > hpilo hpwdt ioatdma shpchp ipmi_si lpc_ich dca mfd_core wmi
> > > ipmi_msghandler acpi_power_meter pcc_cpufreq uinput xfs libcrc32c
> > > mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea sysfillrect
> > > [ 3642.094993]  sysimgblt fb_sys_fops ttm drm crc32c_intel i2c_core
> > > tg3 hpsa scsi_transport_sas usb_storage dm_mirror dm_region_hash
> > > dm_log dm_mod dax
> > > [ 3642.158883] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.16.0-
> > > rc3+
> > > #18
> > > [ 3642.190015] Hardware name: HP ProLiant DL580 Gen8, BIOS P79
> > > 08/18/2016
> > > [ 3642.221949] RIP: 0010:rcu_process_callbacks+0x4d5/0x510
> > > [ 3642.247606] RSP: 0018:ffff8e179f6c3f08 EFLAGS: 00010002
> > > [ 3642.273087] RAX: 0000000000000000 RBX: ffff8e179f6e3180 RCX:
> > > ffff8e279d1e8918
> > > [ 3642.307426] RDX: ffffffffffffd801 RSI: ffff8e179f6c3f18 RDI:
> > > ffff8e179f6e31b8
> > > [ 3642.342219] RBP: ffffffffb70a31c0 R08: ffff8e279d1e8918 R09:
> > > 0000000000000100
> > > [ 3642.376929] R10: 0000000000000004 R11: 0000000000000005 R12:
> > > ffff8e179f6e31b8
> > > [ 3642.411598] R13: ffff8e179d20ad00 R14: 0000000000000001 R15:
> > > 7fffffffffffffff
> > > [ 3642.445957] FS:  0000000000000000(0000)
> > > GS:ffff8e179f6c0000(0000)
> > > knlGS:0000000000000000
> > > [ 3642.485599] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 3642.513678] CR2: 00007f30917b9008 CR3: 000000054900a006 CR4:
> > > 00000000001606e0
> > > [ 3642.548189] Call Trace:
> > > [ 3642.560411]  <IRQ>
> > > [ 3642.570588]  __do_softirq+0xd1/0x275
> > > [ 3642.588643]  irq_exit+0xd5/0xe0
> > > [ 3642.604134]  smp_apic_timer_interrupt+0x60/0x120
> > > [ 3642.626752]  apic_timer_interrupt+0xf/0x20
> > > [ 3642.646712]  </IRQ>
> > > [ 3642.657330] RIP: 0010:cpuidle_enter_state+0xd4/0x260
> > > [ 3642.681389] RSP: 0018:ffffaed7c00e7ea0 EFLAGS: 00000246
> > > ORIG_RAX:
> > > ffffffffffffff12
> > > [ 3642.717937] RAX: ffff8e179f6e2280 RBX: ffffcebfbfec1bb8 RCX:
> > > 000000000000001f
> > > [ 3642.752525] RDX: 0000000000000000 RSI: ff6c3b1b90a53a78 RDI:
> > > 0000000000000000
> > > [ 3642.787181] RBP: 0000000000000003 R08: 0000000000000005 R09:
> > > 0000000000000396
> > > [ 3642.821442] R10: 00000000000003a7 R11: 0000000000000008 R12:
> > > 0000000000000003
> > > [ 3642.856381] R13: 0000034fe70ea52c R14: 0000000000000003 R15:
> > > 0000034fe71d99d4
> > > [ 3642.890830]  do_idle+0x172/0x1e0
> > > [ 3642.906714]  cpu_startup_entry+0x6f/0x80
> > > [ 3642.925835]  start_secondary+0x187/0x1e0
> > > [ 3642.944975]  secondary_startup_64+0xa5/0xb0
> > > [ 3642.965719] Code: e9 db fd ff ff 4c 89 f6 4c 89 e7 e8 96 b8 63
> > > 00
> > > e9 56 fc ff ff 0f 0b e9 34 fc ff ff 0f 0b 0f 1f 84 00 00 00 00 00
> > > e9
> > > e0 fb ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 e5 fd ff ff 0f 0b
> > > 66 0f 1f
> > > [ 3643.056198] ---[ end trace 7bdac969b3138de7 ]---
> > > [ 3735.745955] hpsa 0000:87:00.0: SCSI status: LUN:000000c000002601
> > > CDB:12010000040000000000000000000000
> > > [ 3735.790497] hpsa 0000:87:00.0: SCSI Status = 02, Sense key =
> > > 0x05,
> > > ASC = 0x25, ASCQ = 0x00
> > > > ---
> > > > �drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++
> > > > --------------
> > > > �drivers/scsi/hpsa.h |��1 +
> > > > �2 files changed, 55 insertions(+), 19 deletions(-)
> > > > 
> > > > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > > > index 5293e6827ce5..3a9eca163db8 100644
> > > > --- a/drivers/scsi/hpsa.c
> > > > +++ b/drivers/scsi/hpsa.c
> > > > @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct
> > > > ctlr_info
> > > > *h, struct CommandList *c,
> > > > ����������������c->busaddr |= 1 | (h->blockFetchTable[c-
> > > > > Header.SGList] << 1);
> > > > 
> > > > ����������������if (unlikely(!h->msix_vectors))
> > > > ������������������������return;
> > > > -���������������if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > -�����������������������c->Header.ReplyQueue =
> > > > -�������������������������������raw_smp_processor_id() % h-
> > > > > nreply_queues;
> > > > 
> > > > -���������������else
> > > > -�����������������������c->Header.ReplyQueue = reply_queue % h-
> > > > > nreply_queues;
> > > > 
> > > > +���������������c->Header.ReplyQueue = reply_queue;
> > > > ��������}
> > > > �}
> > > > 
> > > > @@ -1063,10 +1059,7 @@ static void
> > > > set_ioaccel1_performant_mode(struct
> > > > ctlr_info *h,
> > > > ���������* Tell the controller to post the reply to the queue for
> > > > this
> > > > ���������* processor.��This seems to give the best I/O
> > > > throughput.
> > > > ���������*/
> > > > -�������if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > -���������������cp->ReplyQueue = smp_processor_id() % h-
> > > > > nreply_queues;
> > > > 
> > > > -�������else
> > > > -���������������cp->ReplyQueue = reply_queue % h->nreply_queues;
> > > > +�������cp->ReplyQueue = reply_queue;
> > > > ��������/*
> > > > ���������* Set the bits in the address sent down to include:
> > > > ���������*��- performant mode bit (bit 0)
> > > > @@ -1087,10 +1080,7 @@ static void
> > > > set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
> > > > ��������/* Tell the controller to post the reply to the queue for
> > > > this
> > > > ���������* processor.��This seems to give the best I/O
> > > > throughput.
> > > > ���������*/
> > > > -�������if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > -���������������cp->reply_queue = smp_processor_id() % h-
> > > > > nreply_queues;
> > > > 
> > > > -�������else
> > > > -���������������cp->reply_queue = reply_queue % h->nreply_queues;
> > > > +�������cp->reply_queue = reply_queue;
> > > > ��������/* Set the bits in the address sent down to include:
> > > > ���������*��- performant mode bit not used in ioaccel mode 2
> > > > ���������*��- pull count (bits 0-3)
> > > > @@ -1109,10 +1099,7 @@ static void
> > > > set_ioaccel2_performant_mode(struct
> > > > ctlr_info *h,
> > > > ���������* Tell the controller to post the reply to the queue for
> > > > this
> > > > ���������* processor.��This seems to give the best I/O
> > > > throughput.
> > > > ���������*/
> > > > -�������if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > -���������������cp->reply_queue = smp_processor_id() % h-
> > > > > nreply_queues;
> > > > 
> > > > -�������else
> > > > -���������������cp->reply_queue = reply_queue % h->nreply_queues;
> > > > +�������cp->reply_queue = reply_queue;
> > > > ��������/*
> > > > ���������* Set the bits in the address sent down to include:
> > > > ���������*��- performant mode bit not used in ioaccel mode 2
> > > > @@ -1157,6 +1144,8 @@ static void
> > > > __enqueue_cmd_and_start_io(struct
> > > > ctlr_info *h,
> > > > �{
> > > > ��������dial_down_lockup_detection_during_fw_flash(h, c);
> > > > ��������atomic_inc(&h->commands_outstanding);
> > > > +
> > > > +�������reply_queue = h->reply_map[raw_smp_processor_id()];
> > > > ��������switch (c->cmd_type) {
> > > > ��������case CMD_IOACCEL1:
> > > > ����������������set_ioaccel1_performant_mode(h, c, reply_queue);
> > > > @@ -7376,6 +7365,26 @@ static void
> > > > hpsa_disable_interrupt_mode(struct
> > > > ctlr_info *h)
> > > > ��������h->msix_vectors = 0;
> > > > �}
> > > > 
> > > > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > > > +{
> > > > +�������const struct cpumask *mask;
> > > > +�������unsigned int queue, cpu;
> > > > +
> > > > +�������for (queue = 0; queue < h->msix_vectors; queue++) {
> > > > +���������������mask = pci_irq_get_affinity(h->pdev, queue);
> > > > +���������������if (!mask)
> > > > +�����������������������goto fallback;
> > > > +
> > > > +���������������for_each_cpu(cpu, mask)
> > > > +�����������������������h->reply_map[cpu] = queue;
> > > > +�������}
> > > > +�������return;
> > > > +
> > > > +fallback:
> > > > +�������for_each_possible_cpu(cpu)
> > > > +���������������h->reply_map[cpu] = 0;
> > > > +}
> > > > +
> > > > �/* If MSI/MSI-X is supported by the kernel we will try to enable
> > > > it on
> > > > � * controllers that are capable. If not, we use legacy INTx
> > > > mode.
> > > > � */
> > > > @@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info
> > > > *h)
> > > > ��������err = hpsa_interrupt_mode(h);
> > > > ��������if (err)
> > > > ����������������goto clean1;
> > > > +
> > > > +�������/* setup mapping between CPU and reply queue */
> > > > +�������hpsa_setup_reply_map(h);
> > > > +
> > > > ��������err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
> > > > ��������if (err)
> > > > ����������������goto clean2;����/* intmode+region, pci */
> > > > @@ -8480,6 +8493,28 @@ static struct workqueue_struct
> > > > *hpsa_create_controller_wq(struct ctlr_info *h,
> > > > ��������return wq;
> > > > �}
> > > > 
> > > > +static void hpda_free_ctlr_info(struct ctlr_info *h)
> > > > +{
> > > > +�������kfree(h->reply_map);
> > > > +�������kfree(h);
> > > > +}
> > > > +
> > > > +static struct ctlr_info *hpda_alloc_ctlr_info(void)
> > > > +{
> > > > +�������struct ctlr_info *h;
> > > > +
> > > > +�������h = kzalloc(sizeof(*h), GFP_KERNEL);
> > > > +�������if (!h)
> > > > +���������������return NULL;
> > > > +
> > > > +�������h->reply_map = kzalloc(sizeof(*h->reply_map) *
> > > > nr_cpu_ids,
> > > > GFP_KERNEL);
> > > > +�������if (!h->reply_map) {
> > > > +���������������kfree(h);
> > > > +���������������return NULL;
> > > > +�������}
> > > > +�������return h;
> > > > +}
> > > > +
> > > > �static int hpsa_init_one(struct pci_dev *pdev, const struct
> > > > pci_device_id *ent)
> > > > �{
> > > > ��������int dac, rc;
> > > > @@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev
> > > > *pdev, const
> > > > struct pci_device_id *ent)
> > > > ���������* the driver.��See comments in hpsa.h for more info.
> > > > ���������*/
> > > > ��������BUILD_BUG_ON(sizeof(struct CommandList) %
> > > > COMMANDLIST_ALIGNMENT);
> > > > -�������h = kzalloc(sizeof(*h), GFP_KERNEL);
> > > > +�������h = hpda_alloc_ctlr_info();
> > > > ��������if (!h) {
> > > > ����������������dev_err(&pdev->dev, "Failed to allocate
> > > > controller
> > > > head\n");
> > > > ����������������return -ENOMEM;
> > > > @@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev
> > > > *pdev)
> > > > ��������h->lockup_detected = NULL;����������������������/*
> > > > init_one
> > > > 2 */
> > > > ��������/* (void) pci_disable_pcie_error_reporting(pdev);
> > > > */����/*
> > > > init_one 1 */
> > > > 
> > > > -�������kfree(h);���������������������������������������/*
> > > > init_one
> > > > 1 */
> > > > +�������hpda_free_ctlr_info(h);�������������������������/*
> > > > init_one
> > > > 1 */
> > > > �}
> > > > 
> > > > �static int hpsa_suspend(__attribute__((unused)) struct pci_dev
> > > > *pdev,
> > > > diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> > > > index 018f980a701c..fb9f5e7f8209 100644
> > > > --- a/drivers/scsi/hpsa.h
> > > > +++ b/drivers/scsi/hpsa.h
> > > > @@ -158,6 +158,7 @@ struct bmic_controller_parameters {
> > > > �#pragma pack()
> > > > 
> > > > �struct ctlr_info {
> > > > +�������unsigned int *reply_map;
> > > > ��������int�����ctlr;
> > > > ��������char����devname[8];
> > > > ��������char����*product_name;
> > > > --
> > > > 2.9.5
> > > 
> > > 
> > 
> > I have a DL580 here with the following:
> > 
> > Ming's latest tree
> > 4.16.0-rc2.ming+
> > 
> > 3:00.0 RAID bus controller: Hewlett-Packard Company Smart Array G6
> > controllers (rev 01) P410i
> > 
> > /dev/sg0  1 0 0 0  12  HP        P410i             6.60
> > /dev/sg1  1 1 0 0  0  /dev/sda  HP        LOGICAL VOLUME    6.60
> > Boot volume
> > 
> > /dev/sg2  1 1 0 1  0  /dev/sdb  HP        LOGICAL VOLUME    6.60
> > Single disk
> > 
> > /dev/sg3  1 1 0 2  0  /dev/sdc  HP        LOGICAL VOLUME    6.60
> > 2 Disk Mirror
> > 
> > 
> > MSA50 Shelf at 6GB, all Jbods
> > 
> > 0e:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS
> > 2208 [Thunderbolt] (rev 03)
> > 
> > /dev/sg4  0 0 43 0  0  /dev/sdd  HP        DG072A9BB7        HPD0
> > /dev/sg5  0 0 44 0  0  /dev/sde  HP        DG146BABCF        HPD5
> > /dev/sg6  0 0 45 0  0  /dev/sdf  HP        DG146BABCF        HPD6
> > /dev/sg7  0 0 46 0  0  /dev/sdg  HP        EG0146FAWHU       HPDE
> > /dev/sg8  0 0 47 0  0  /dev/sdh  HP        EG0146FAWHU       HPDD
> > /dev/sg9  0 0 48 0  0  /dev/sdi  HP        EG0146FAWHU       HPDE
> > /dev/sg10  0 0 49 0  0  /dev/sdj  ATA       OCZ-VERTEX4       1.5
> > /dev/sg11  0 0 50 0  0  /dev/sdk  ATA       OCZ-VERTEX4       1.5
> > /dev/sg12  0 0 51 0  0  /dev/sdl  ATA       INTEL SSDSC2BW08  DC32
> > /dev/sg13  0 0 52 0  13  HP        MSA50  -10D25G1   1.20
> > 
> > I have run multiple boot passes on the HPSA, all passing, and have not had
> > any access issues on the megaraid_sas drives with Ming's patches.
> > 
> > I don't have SSD hardware decent enough to test megaraid_sas performance
> > to match Kashyap's setup, unfortunately.
> > 
> > What I can say is that so far all boot testing has passed.
> > 
> > I will exercise all the drives now to see if I can bring about any
> > issues seen by Don
> > 
> > Thanks
> > Laurence
> 
> Don,
> 
> I am not seeing any issues with Ming's V3
> 
> So Ming's latest V3 is rock solid for me through multiple fio runs on
> the DL580 here.
> On both megaraid_sas and hpsa
> 
> Using
> BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-a28e-
> 21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
> console=ttyS1,115200n8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y

Hi Laurence,

Thanks for your test!

It seems Don ran into I/O failures without blk-mq; could you run your tests
again in legacy mode?
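
(For reference, one way to get legacy mode on this kernel is simply to flip
the parameters already on the boot line above, i.e. boot with:

    scsi_mod.use_blk_mq=n dm_mod.use_blk_mq=n

so the hpsa and megaraid_sas devices go back down the legacy request path.)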

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-02  2:16         ` Ming Lei
@ 2018-03-02 14:09           ` Laurence Oberman
  2018-03-02 15:03             ` Don Brace
  0 siblings, 1 reply; 54+ messages in thread
From: Laurence Oberman @ 2018-03-02 14:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: Don Brace, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Martin K . Petersen, James Bottomley,
	Christoph Hellwig, Kashyap Desai, Peter Rivera, Meelis Roos

On Fri, 2018-03-02 at 10:16 +0800, Ming Lei wrote:
> On Thu, Mar 01, 2018 at 04:19:34PM -0500, Laurence Oberman wrote:
> > On Thu, 2018-03-01 at 14:01 -0500, Laurence Oberman wrote:
> > > On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > > > > -----Original Message-----
> > > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > > Sent: Tuesday, February 27, 2018 4:08 AM
> > > > > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org
> > > > > ;
> > > > > Christoph
> > > > > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com
> > > > > >
> > > > > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de
> > > > > >;
> > > > > Arun Easi
> > > > > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>;
> > > > > Martin K
> > > > > .
> > > > > Petersen <martin.petersen@oracle.com>; James Bottomley
> > > > > <james.bottomley@hansenpartnership.com>; Christoph Hellwig <h
> > > > > ch@l
> > > > > st
> > > > > .de>;
> > > > > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > > > > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@broa
> > > > > dcom
> > > > > .c
> > > > > om>;
> > > > > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > > > > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > > > > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply
> > > > > queue
> > > > > 
> > > > > 
> > > > > From 84676c1f21 (genirq/affinity: assign vectors to all
> > > > > possible
> > > > > CPUs),
> > > > > one msix vector can be created without any online CPU mapped,
> > > > > then
> > > > > one
> > > > > command's completion may not be notified.
> > > > > 
> > > > > This patch setups mapping between cpu and reply queue
> > > > > according
> > > > > to
> > > > > irq
> > > > > affinity info retrived by pci_irq_get_affinity(), and uses
> > > > > this
> > > > > mapping
> > > > > table to choose reply queue for queuing one command.
> > > > > 
> > > > > Then the chosen reply queue has to be active, and fixes IO
> > > > > hang
> > > > > caused
> > > > > by using inactive reply queue which doesn't have any online
> > > > > CPU
> > > > > mapped.
> > > > > 
> > > > > Cc: Hannes Reinecke <hare@suse.de>
> > > > > Cc: Arun Easi <arun.easi@cavium.com>
> > > > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > > > > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > > > > Cc: Christoph Hellwig <hch@lst.de>,
> > > > > Cc: Don Brace <don.brace@microsemi.com>
> > > > > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > > > > Cc: Peter Rivera <peter.rivera@broadcom.com>
> > > > > Cc: Laurence Oberman <loberman@redhat.com>
> > > > > Cc: Meelis Roos <mroos@linux.ee>
> > > > > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all
> > > > > possible CPUs")
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > 
> > > > I am getting some issues that need to be tracked down:
> > > > 
> > > > [ 1636.032984] hpsa 0000:87:00.0: Acknowledging event:
> > > > 0xc0000032
> > > > (HP
> > > > SSD Smart Path configuration change)
> > > > [ 1638.510656] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > > > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap-
> > > > En-
> > > > Exp=0
> > > > [ 1653.967695] hpsa 0000:87:00.0: Acknowledging event:
> > > > 0x80000020
> > > > (HP
> > > > SSD Smart Path configuration change)
> > > > [ 1656.770377] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > > > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap-
> > > > En-
> > > > Exp=0
> > > > [ 2839.762267] hpsa 0000:87:00.0: Acknowledging event:
> > > > 0x80000020
> > > > (HP
> > > > SSD Smart Path configuration change)
> > > > [ 2840.841290] hpsa 0000:87:00.0: scsi 3:0:8:0: updated Direct-
> > > > Access     HP       MO0400JDVEU      PHYS DRV SSDSmartPathCap-
> > > > En-
> > > > Exp=0
> > > > [ 2917.582653] hpsa 0000:87:00.0: Acknowledging event:
> > > > 0xc0000020
> > > > (HP
> > > > SSD Smart Path configuration change)
> > > > [ 2919.087191] hpsa 0000:87:00.0: scsi 3:1:0:1: updated Direct-
> > > > Access     HP       LOGICAL VOLUME   RAID-5 SSDSmartPathCap+
> > > > En+
> > > > Exp=1
> > > > [ 2919.142527] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > > offload_enabled for LV.
> > > > [ 2919.203915] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > > offload_enabled for LV.
> > > > [ 2919.266921] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > > offload_enabled for LV.
> > > > [ 2934.999629] hpsa 0000:87:00.0: Acknowledging event:
> > > > 0x40000000
> > > > (HP
> > > > SSD Smart Path state change)
> > > > [ 2936.937333] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > > offload_enabled for LV.
> > > > [ 2936.998707] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > > offload_enabled for LV.
> > > > [ 2937.060101] hpsa 0000:87:00.0: hpsa_figure_phys_disk_ptrs:
> > > > [3:1:0:2] A phys disk component of LV is missing, turning off
> > > > offload_enabled for LV.
> > > > [ 3619.711122] sd 3:1:0:3: [sde] tag#436 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3619.751150] sd 3:1:0:3: [sde] tag#436 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3619.784375] sd 3:1:0:3: [sde] tag#436 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3619.816530] sd 3:1:0:3: [sde] tag#436 CDB: Read(10) 28 00 01
> > > > 1b
> > > > ad
> > > > af 00 00 01 00
> > > > [ 3619.852295] print_req_error: I/O error, dev sde, sector
> > > > 18591151
> > > > [ 3619.880850] sd 3:1:0:3: [sde] tag#461 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3619.920981] sd 3:1:0:3: [sde] tag#461 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3619.955081] sd 3:1:0:3: [sde] tag#461 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3619.987054] sd 3:1:0:3: [sde] tag#461 CDB: Read(10) 28 00 02
> > > > 15
> > > > 31
> > > > 40 00 00 01 00
> > > > [ 3620.022569] print_req_error: I/O error, dev sde, sector
> > > > 34943296
> > > > [ 3620.050873] sd 3:1:0:3: [sde] tag#157 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3620.091124] sd 3:1:0:3: [sde] tag#157 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3620.124179] sd 3:1:0:3: [sde] tag#157 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3620.156203] sd 3:1:0:3: [sde] tag#157 CDB: Read(10) 28 00 03
> > > > 65
> > > > 9d
> > > > 7e 00 00 01 00
> > > > [ 3620.191520] print_req_error: I/O error, dev sde, sector
> > > > 56991102
> > > > [ 3620.220308] sd 3:1:0:3: [sde] tag#266 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3620.260273] sd 3:1:0:3: [sde] tag#266 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3620.294605] sd 3:1:0:3: [sde] tag#266 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3620.328353] sd 3:1:0:3: [sde] tag#266 CDB: Read(10) 28 00 09
> > > > 92
> > > > 94
> > > > 70 00 00 01 00
> > > > [ 3620.364807] print_req_error: I/O error, dev sde, sector
> > > > 160601200
> > > > [ 3620.394342] sd 3:1:0:3: [sde] tag#278 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3620.434462] sd 3:1:0:3: [sde] tag#278 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3620.469059] sd 3:1:0:3: [sde] tag#278 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3620.471761] sd 3:1:0:3: [sde] tag#467 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3620.502240] sd 3:1:0:3: [sde] tag#278 CDB: Read(10) 28 00 08
> > > > 00
> > > > 12
> > > > ea 00 00 01 00
> > > > [ 3620.543157] sd 3:1:0:3: [sde] tag#467 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3620.580375] print_req_error: I/O error, dev sde, sector
> > > > 134222570
> > > > [ 3620.615355] sd 3:1:0:3: [sde] tag#467 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3620.645069] sd 3:1:0:3: [sde] tag#244 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3620.678696] sd 3:1:0:3: [sde] tag#467 CDB: Read(10) 28 00 10
> > > > 3f
> > > > 2b
> > > > fc 00 00 01 00
> > > > [ 3620.720247] sd 3:1:0:3: [sde] tag#244 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3620.756776] print_req_error: I/O error, dev sde, sector
> > > > 272575484
> > > > [ 3620.791857] sd 3:1:0:3: [sde] tag#244 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3620.822272] sd 3:1:0:3: [sde] tag#431 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3620.855200] sd 3:1:0:3: [sde] tag#244 CDB: Read(10) 28 00 08
> > > > 31
> > > > 86
> > > > d9 00 00 01 00
> > > > [ 3620.895823] sd 3:1:0:3: [sde] tag#431 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3620.931923] print_req_error: I/O error, dev sde, sector
> > > > 137463513
> > > > [ 3620.966262] sd 3:1:0:3: [sde] tag#431 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3620.995715] sd 3:1:0:3: [sde] tag#226 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3621.028703] sd 3:1:0:3: [sde] tag#431 CDB: Read(10) 28 00 10
> > > > 7c
> > > > b2
> > > > b0 00 00 01 00
> > > > [ 3621.069686] sd 3:1:0:3: [sde] tag#226 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3621.106253] print_req_error: I/O error, dev sde, sector
> > > > 276607664
> > > > [ 3621.140782] sd 3:1:0:3: [sde] tag#226 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3621.170241] sd 3:1:0:3: [sde] tag#408 FAILED Result:
> > > > hostbyte=DID_OK driverbyte=DRIVER_SENSE
> > > > [ 3621.202997] sd 3:1:0:3: [sde] tag#226 CDB: Read(10) 28 00 08
> > > > ba
> > > > cf
> > > > f2 00 00 01 00
> > > > [ 3621.243870] sd 3:1:0:3: [sde] tag#408 Sense Key : Aborted
> > > > Command
> > > > [current] 
> > > > [ 3621.280015] print_req_error: I/O error, dev sde, sector
> > > > 146460658
> > > > [ 3621.313941] sd 3:1:0:3: [sde] tag#408 Add. Sense: Internal
> > > > target
> > > > failure
> > > > [ 3621.343790] print_req_error: I/O error, dev sde, sector
> > > > 98830586
> > > > [ 3621.376164] sd 3:1:0:3: [sde] tag#408 CDB: Read(10) 28 00 14
> > > > da
> > > > 6a
> > > > 53 00 00 01 00
> > > > [ 3641.714842] WARNING: CPU: 3 PID: 0 at kernel/rcu/tree.c:2713
> > > > rcu_process_callbacks+0x4d5/0x510
> > > > [ 3641.756175] Modules linked in: sg ip6t_rpfilter ip6t_REJECT
> > > > nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT
> > > > nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> > > > nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge
> > > > stp
> > > > llc ebtable_filter ebtables ip6table_mangle ip6table_security
> > > > ip6table_raw ip6table_filter ip6_tables iptable_mangle
> > > > iptable_security iptable_raw iptable_filter ip_tables sb_edac
> > > > x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass
> > > > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt
> > > > iTCO_vendor_support aesni_intel crypto_simd glue_helper cryptd
> > > > pcspkr
> > > > hpilo hpwdt ioatdma shpchp ipmi_si lpc_ich dca mfd_core wmi
> > > > ipmi_msghandler acpi_power_meter pcc_cpufreq uinput xfs
> > > > libcrc32c
> > > > mgag200 i2c_algo_bit drm_kms_helper sd_mod syscopyarea
> > > > sysfillrect
> > > > [ 3642.094993]  sysimgblt fb_sys_fops ttm drm crc32c_intel
> > > > i2c_core
> > > > tg3 hpsa scsi_transport_sas usb_storage dm_mirror
> > > > dm_region_hash
> > > > dm_log dm_mod dax
> > > > [ 3642.158883] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
> > > > 4.16.0-
> > > > rc3+
> > > > #18
> > > > [ 3642.190015] Hardware name: HP ProLiant DL580 Gen8, BIOS P79
> > > > 08/18/2016
> > > > [ 3642.221949] RIP: 0010:rcu_process_callbacks+0x4d5/0x510
> > > > [ 3642.247606] RSP: 0018:ffff8e179f6c3f08 EFLAGS: 00010002
> > > > [ 3642.273087] RAX: 0000000000000000 RBX: ffff8e179f6e3180 RCX:
> > > > ffff8e279d1e8918
> > > > [ 3642.307426] RDX: ffffffffffffd801 RSI: ffff8e179f6c3f18 RDI:
> > > > ffff8e179f6e31b8
> > > > [ 3642.342219] RBP: ffffffffb70a31c0 R08: ffff8e279d1e8918 R09:
> > > > 0000000000000100
> > > > [ 3642.376929] R10: 0000000000000004 R11: 0000000000000005 R12:
> > > > ffff8e179f6e31b8
> > > > [ 3642.411598] R13: ffff8e179d20ad00 R14: 0000000000000001 R15:
> > > > 7fffffffffffffff
> > > > [ 3642.445957] FS:  0000000000000000(0000)
> > > > GS:ffff8e179f6c0000(0000)
> > > > knlGS:0000000000000000
> > > > [ 3642.485599] CS:  0010 DS: 0000 ES: 0000 CR0:
> > > > 0000000080050033
> > > > [ 3642.513678] CR2: 00007f30917b9008 CR3: 000000054900a006 CR4:
> > > > 00000000001606e0
> > > > [ 3642.548189] Call Trace:
> > > > [ 3642.560411]  <IRQ>
> > > > [ 3642.570588]  __do_softirq+0xd1/0x275
> > > > [ 3642.588643]  irq_exit+0xd5/0xe0
> > > > [ 3642.604134]  smp_apic_timer_interrupt+0x60/0x120
> > > > [ 3642.626752]  apic_timer_interrupt+0xf/0x20
> > > > [ 3642.646712]  </IRQ>
> > > > [ 3642.657330] RIP: 0010:cpuidle_enter_state+0xd4/0x260
> > > > [ 3642.681389] RSP: 0018:ffffaed7c00e7ea0 EFLAGS: 00000246
> > > > ORIG_RAX:
> > > > ffffffffffffff12
> > > > [ 3642.717937] RAX: ffff8e179f6e2280 RBX: ffffcebfbfec1bb8 RCX:
> > > > 000000000000001f
> > > > [ 3642.752525] RDX: 0000000000000000 RSI: ff6c3b1b90a53a78 RDI:
> > > > 0000000000000000
> > > > [ 3642.787181] RBP: 0000000000000003 R08: 0000000000000005 R09:
> > > > 0000000000000396
> > > > [ 3642.821442] R10: 00000000000003a7 R11: 0000000000000008 R12:
> > > > 0000000000000003
> > > > [ 3642.856381] R13: 0000034fe70ea52c R14: 0000000000000003 R15:
> > > > 0000034fe71d99d4
> > > > [ 3642.890830]  do_idle+0x172/0x1e0
> > > > [ 3642.906714]  cpu_startup_entry+0x6f/0x80
> > > > [ 3642.925835]  start_secondary+0x187/0x1e0
> > > > [ 3642.944975]  secondary_startup_64+0xa5/0xb0
> > > > [ 3642.965719] Code: e9 db fd ff ff 4c 89 f6 4c 89 e7 e8 96 b8
> > > > 63
> > > > 00
> > > > e9 56 fc ff ff 0f 0b e9 34 fc ff ff 0f 0b 0f 1f 84 00 00 00 00
> > > > 00
> > > > e9
> > > > e0 fb ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 e9 e5 fd ff ff
> > > > 0f 0b
> > > > 66 0f 1f 
> > > > [ 3643.056198] ---[ end trace 7bdac969b3138de7 ]---
> > > > [ 3735.745955] hpsa 0000:87:00.0: SCSI status:
> > > > LUN:000000c000002601
> > > > CDB:12010000040000000000000000000000
> > > > [ 3735.790497] hpsa 0000:87:00.0: SCSI Status = 02, Sense key =
> > > > 0x05,
> > > > ASC = 0x25, ASCQ = 0x00
> > > > > ---
> > > > >  drivers/scsi/hpsa.c | 73
> > > > > +++++++++++++++++++++++++++++++++++++++
> > > > > --------------
> > > > >  drivers/scsi/hpsa.h |  1 +
> > > > >  2 files changed, 55 insertions(+), 19 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > > > > index 5293e6827ce5..3a9eca163db8 100644
> > > > > --- a/drivers/scsi/hpsa.c
> > > > > +++ b/drivers/scsi/hpsa.c
> > > > > @@ -1045,11 +1045,7 @@ static void set_performant_mode(struct
> > > > > ctlr_info
> > > > > *h, struct CommandList *c,
> > > > >                 c->busaddr |= 1 | (h->blockFetchTable[c-
> > > > > > Header.SGList] << 1);
> > > > > 
> > > > >                 if (unlikely(!h->msix_vectors))
> > > > >                         return;
> > > > > -               if (likely(reply_queue ==
> > > > > DEFAULT_REPLY_QUEUE))
> > > > > -                       c->Header.ReplyQueue =
> > > > > -                               raw_smp_processor_id() % h-
> > > > > > nreply_queues;
> > > > > 
> > > > > -               else
> > > > > -                       c->Header.ReplyQueue = reply_queue %
> > > > > h-
> > > > > > nreply_queues;
> > > > > 
> > > > > +               c->Header.ReplyQueue = reply_queue;
> > > > >         }
> > > > >  }
> > > > > 
> > > > > @@ -1063,10 +1059,7 @@ static void
> > > > > set_ioaccel1_performant_mode(struct
> > > > > ctlr_info *h,
> > > > >          * Tell the controller to post the reply to the queue
> > > > > for
> > > > > this
> > > > >          * processor.  This seems to give the best I/O
> > > > > throughput.
> > > > >          */
> > > > > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > > -               cp->ReplyQueue = smp_processor_id() % h-
> > > > > > nreply_queues;
> > > > > 
> > > > > -       else
> > > > > -               cp->ReplyQueue = reply_queue % h-
> > > > > >nreply_queues;
> > > > > +       cp->ReplyQueue = reply_queue;
> > > > >         /*
> > > > >          * Set the bits in the address sent down to include:
> > > > >          *  - performant mode bit (bit 0)
> > > > > @@ -1087,10 +1080,7 @@ static void
> > > > > set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
> > > > >         /* Tell the controller to post the reply to the queue
> > > > > for
> > > > > this
> > > > >          * processor.  This seems to give the best I/O
> > > > > throughput.
> > > > >          */
> > > > > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > > -               cp->reply_queue = smp_processor_id() % h-
> > > > > > nreply_queues;
> > > > > 
> > > > > -       else
> > > > > -               cp->reply_queue = reply_queue % h-
> > > > > >nreply_queues;
> > > > > +       cp->reply_queue = reply_queue;
> > > > >         /* Set the bits in the address sent down to include:
> > > > >          *  - performant mode bit not used in ioaccel mode 2
> > > > >          *  - pull count (bits 0-3)
> > > > > @@ -1109,10 +1099,7 @@ static void
> > > > > set_ioaccel2_performant_mode(struct
> > > > > ctlr_info *h,
> > > > >          * Tell the controller to post the reply to the queue
> > > > > for
> > > > > this
> > > > >          * processor.  This seems to give the best I/O
> > > > > throughput.
> > > > >          */
> > > > > -       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
> > > > > -               cp->reply_queue = smp_processor_id() % h-
> > > > > > nreply_queues;
> > > > > 
> > > > > -       else
> > > > > -               cp->reply_queue = reply_queue % h-
> > > > > >nreply_queues;
> > > > > +       cp->reply_queue = reply_queue;
> > > > >         /*
> > > > >          * Set the bits in the address sent down to include:
> > > > >          *  - performant mode bit not used in ioaccel mode 2
> > > > > @@ -1157,6 +1144,8 @@ static void
> > > > > __enqueue_cmd_and_start_io(struct
> > > > > ctlr_info *h,
> > > > >  {
> > > > >         dial_down_lockup_detection_during_fw_flash(h, c);
> > > > >         atomic_inc(&h->commands_outstanding);
> > > > > +
> > > > > +       reply_queue = h->reply_map[raw_smp_processor_id()];
> > > > >         switch (c->cmd_type) {
> > > > >         case CMD_IOACCEL1:
> > > > >                 set_ioaccel1_performant_mode(h, c,
> > > > > reply_queue);
> > > > > @@ -7376,6 +7365,26 @@ static void
> > > > > hpsa_disable_interrupt_mode(struct
> > > > > ctlr_info *h)
> > > > >         h->msix_vectors = 0;
> > > > >  }
> > > > > 
> > > > > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > > > > +{
> > > > > +       const struct cpumask *mask;
> > > > > +       unsigned int queue, cpu;
> > > > > +
> > > > > +       for (queue = 0; queue < h->msix_vectors; queue++) {
> > > > > +               mask = pci_irq_get_affinity(h->pdev, queue);
> > > > > +               if (!mask)
> > > > > +                       goto fallback;
> > > > > +
> > > > > +               for_each_cpu(cpu, mask)
> > > > > +                       h->reply_map[cpu] = queue;
> > > > > +       }
> > > > > +       return;
> > > > > +
> > > > > +fallback:
> > > > > +       for_each_possible_cpu(cpu)
> > > > > +               h->reply_map[cpu] = 0;
> > > > > +}
> > > > > +
> > > > >  /* If MSI/MSI-X is supported by the kernel we will try to
> > > > > enable
> > > > > it on
> > > > >   * controllers that are capable. If not, we use legacy INTx
> > > > > mode.
> > > > >   */
> > > > > @@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct
> > > > > ctlr_info
> > > > > *h)
> > > > >         err = hpsa_interrupt_mode(h);
> > > > >         if (err)
> > > > >                 goto clean1;
> > > > > +
> > > > > +       /* setup mapping between CPU and reply queue */
> > > > > +       hpsa_setup_reply_map(h);
> > > > > +
> > > > >         err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
> > > > >         if (err)
> > > > >                 goto clean2;    /* intmode+region, pci */
> > > > > @@ -8480,6 +8493,28 @@ static struct workqueue_struct
> > > > > *hpsa_create_controller_wq(struct ctlr_info *h,
> > > > >         return wq;
> > > > >  }
> > > > > 
> > > > > +static void hpda_free_ctlr_info(struct ctlr_info *h)
> > > > > +{
> > > > > +       kfree(h->reply_map);
> > > > > +       kfree(h);
> > > > > +}
> > > > > +
> > > > > +static struct ctlr_info *hpda_alloc_ctlr_info(void)
> > > > > +{
> > > > > +       struct ctlr_info *h;
> > > > > +
> > > > > +       h = kzalloc(sizeof(*h), GFP_KERNEL);
> > > > > +       if (!h)
> > > > > +               return NULL;
> > > > > +
> > > > > +       h->reply_map = kzalloc(sizeof(*h->reply_map) *
> > > > > nr_cpu_ids,
> > > > > GFP_KERNEL);
> > > > > +       if (!h->reply_map) {
> > > > > +               kfree(h);
> > > > > +               return NULL;
> > > > > +       }
> > > > > +       return h;
> > > > > +}
> > > > > +
> > > > >  static int hpsa_init_one(struct pci_dev *pdev, const struct
> > > > > pci_device_id *ent)
> > > > >  {
> > > > >         int dac, rc;
> > > > > @@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev
> > > > > *pdev, const
> > > > > struct pci_device_id *ent)
> > > > >          * the driver.  See comments in hpsa.h for more info.
> > > > >          */
> > > > >         BUILD_BUG_ON(sizeof(struct CommandList) %
> > > > > COMMANDLIST_ALIGNMENT);
> > > > > -       h = kzalloc(sizeof(*h), GFP_KERNEL);
> > > > > +       h = hpda_alloc_ctlr_info();
> > > > >         if (!h) {
> > > > >                 dev_err(&pdev->dev, "Failed to allocate
> > > > > controller
> > > > > head\n");
> > > > >                 return -ENOMEM;
> > > > > @@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct
> > > > > pci_dev
> > > > > *pdev)
> > > > >         h->lockup_detected = NULL;                      /*
> > > > > init_one
> > > > > 2 */
> > > > >         /* (void) pci_disable_pcie_error_reporting(pdev);
> > > > > */    /*
> > > > > init_one 1 */
> > > > > 
> > > > > -       kfree(h);                                       /*
> > > > > init_one
> > > > > 1 */
> > > > > +       hpda_free_ctlr_info(h);                         /*
> > > > > init_one
> > > > > 1 */
> > > > >  }
> > > > > 
> > > > >  static int hpsa_suspend(__attribute__((unused)) struct
> > > > > pci_dev
> > > > > *pdev,
> > > > > diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
> > > > > index 018f980a701c..fb9f5e7f8209 100644
> > > > > --- a/drivers/scsi/hpsa.h
> > > > > +++ b/drivers/scsi/hpsa.h
> > > > > @@ -158,6 +158,7 @@ struct bmic_controller_parameters {
> > > > >  #pragma pack()
> > > > > 
> > > > >  struct ctlr_info {
> > > > > +       unsigned int *reply_map;
> > > > >         int     ctlr;
> > > > >         char    devname[8];
> > > > >         char    *product_name;
> > > > > --
> > > > > 2.9.5
> > > > 
> > > > 
> > > 
> > > I have a DL580 here with the following:
> > > 
> > > Ming's latest tree
> > > 4.16.0-rc2.ming+
> > > 
> > > 3:00.0 RAID bus controller: Hewlett-Packard Company Smart Array
> > > G6
> > > controllers (rev 01) P410i
> > > 
> > > /dev/sg0  1 0 0 0  12  HP        P410i             6.60
> > > /dev/sg1  1 1 0 0  0  /dev/sda  HP        LOGICAL VOLUME    6.60
> > > Boot volume
> > > 
> > > /dev/sg2  1 1 0 1  0  /dev/sdb  HP        LOGICAL VOLUME    6.60
> > > Single disk
> > > 
> > > /dev/sg3  1 1 0 2  0  /dev/sdc  HP        LOGICAL VOLUME    6.60
> > >  
> > > 2 Disk Mirror
> > > 
> > > 
> > > MSA50 Shelf at 6GB, all Jbods
> > > 
> > > 0e:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID
> > > SAS
> > > 2208 [Thunderbolt] (rev 03)
> > > 
> > > /dev/sg4  0 0 43 0  0  /dev/sdd  HP        DG072A9BB7        HPD0
> > > /dev/sg5  0 0 44 0  0  /dev/sde  HP        DG146BABCF        HPD5
> > > /dev/sg6  0 0 45 0  0  /dev/sdf  HP        DG146BABCF        HPD6
> > > /dev/sg7  0 0 46
> > > 0  0  /dev/sdg  HP        EG0146FAWHU       HPDE   
> > > /dev/sg8  0 0 47 0  0  /dev/sdh  HP        EG0146FAWHU       HPDD
> > > /dev/sg9  0 0 48 0  0  /dev/sdi  HP        EG0146FAWHU       HPDE
> > > /dev/sg10  0 0 49 0  0  /dev/sdj  ATA       OCZ-
> > > VERTEX4       1.5 
> > > /dev/sg11  0 0 50 0  0  /dev/sdk  ATA       OCZ-
> > > VERTEX4       1.5 
> > > /dev/sg12  0 0 51 0  0  /dev/sdl  ATA       INTEL
> > > SSDSC2BW08  DC32
> > > /dev/sg13  0 0 52 0  13  HP        MSA50  -10D25G1   1.20
> > > 
> > > I have run multiple boot passes on the HPSA, all passing, and have
> > > not had any access issues with Ming's patches on the megaraid_sas
> > > drives.
> > > 
> > > I don't have decent SSD hardware to test performance on the
> > > megaraid_sas to match Kashyap's setup, unfortunately.
> > > 
> > > What I can say is that so far all boot testing has passed.
> > > 
> > > I will exercise all the drives now to see if I can bring about any
> > > of the issues seen by Don.
> > > 
> > > Thanks
> > > Laurence
> > 
> > Don,
> > 
> > I am not seeing any issues with Ming's V3.
> > 
> > Ming's latest V3 is rock solid for me through multiple fio runs on
> > the DL580 here, on both megaraid_sas and hpsa.
> > 
> > Using
> > BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-
> > a28e-
> > 21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
> > console=ttyS1,115200n8 scsi_mod.use_blk_mq=y dm_mod.use_blk_mq=y
> 
> Hi Laurence,
> 
> Thanks for your test!
> 
> It seems Don ran into IO failures without blk-mq, could you run your
> tests again in legacy mode?
> 
> Thanks,
> Ming

Hello Ming
I ran multiple passes on Legacy and still see no issues in my test bed

BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-a28e-
21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
console=ttyS1,115200n8

HEAD of the git kernel I am using

694e16f scsi: megaraid: improve scsi_mq performance via .host_tagset
793686c scsi: hpsa: improve scsi_mq performance via .host_tagset
60d5b36 block: null_blk: introduce module parameter of 'g_host_tags'
8847067 scsi: Add template flag 'host_tagset'
a8fbdd6 blk-mq: introduce BLK_MQ_F_HOST_TAGS
4710fab blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
09bb153 scsi: megaraid_sas: fix selection of reply queue
52700d8 scsi: hpsa: fix selection of reply queue

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-02 14:09           ` Laurence Oberman
@ 2018-03-02 15:03             ` Don Brace
  2018-03-02 21:53               ` Laurence Oberman
  0 siblings, 1 reply; 54+ messages in thread
From: Don Brace @ 2018-03-02 15:03 UTC (permalink / raw)
  To: Laurence Oberman, Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

> -----Original Message-----
> From: Laurence Oberman [mailto:loberman@redhat.com]
> Sent: Friday, March 02, 2018 8:09 AM
> To: Ming Lei <ming.lei@redhat.com>
> Cc: Don Brace <don.brace@microsemi.com>; Jens Axboe <axboe@kernel.dk>;
> linux-block@vger.kernel.org; Christoph Hellwig <hch@infradead.org>; Mike
> Snitzer <snitzer@redhat.com>; linux-scsi@vger.kernel.org; Hannes Reinecke
> <hare@suse.de>; Arun Easi <arun.easi@cavium.com>; Omar Sandoval
> <osandov@fb.com>; Martin K . Petersen <martin.petersen@oracle.com>; James
> Bottomley <james.bottomley@hansenpartnership.com>; Christoph Hellwig
> <hch@lst.de>; Kashyap Desai <kashyap.desai@broadcom.com>; Peter Rivera
> <peter.rivera@broadcom.com>; Meelis Roos <mroos@linux.ee>
> Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> 
...
> Hello Ming
> I ran multiple passes on Legacy and still see no issues in my test bed
> 
> BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-a28e-
> 21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
> console=ttyS1,115200n8
> 
> HEAD of the git kernel I am using
> 
> 694e16f scsi: megaraid: improve scsi_mq performance via .host_tagset
> 793686c scsi: hpsa: improve scsi_mq performance via .host_tagset
> 60d5b36 block: null_blk: introduce module parameter of 'g_host_tags'
> 8847067 scsi: Add template flag 'host_tagset'
> a8fbdd6 blk-mq: introduce BLK_MQ_F_HOST_TAGS
> 4710fab blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
> 09bb153 scsi: megaraid_sas: fix selection of reply queue
> 52700d8 scsi: hpsa: fix selection of reply queue

I checked out Linus's tree (4.16.0-rc3+) and re-applied the above patches.
I have been running for 24 hours with no issues.
Evidently my forked copy was corrupted.

So, my I/O testing has gone well.

I'll run some performance numbers next.

Thanks,
Don

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-02 15:03             ` Don Brace
@ 2018-03-02 21:53               ` Laurence Oberman
  2018-03-05  2:07                 ` Ming Lei
  2018-03-05  7:23                 ` Kashyap Desai
  0 siblings, 2 replies; 54+ messages in thread
From: Laurence Oberman @ 2018-03-02 21:53 UTC (permalink / raw)
  To: Don Brace, Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

On Fri, 2018-03-02 at 15:03 +0000, Don Brace wrote:
> > -----Original Message-----
> > From: Laurence Oberman [mailto:loberman@redhat.com]
> > Sent: Friday, March 02, 2018 8:09 AM
> > To: Ming Lei <ming.lei@redhat.com>
> > Cc: Don Brace <don.brace@microsemi.com>; Jens Axboe <axboe@kernel.d
> > k>;
> > linux-block@vger.kernel.org; Christoph Hellwig <hch@infradead.org>;
> > Mike
> > Snitzer <snitzer@redhat.com>; linux-scsi@vger.kernel.org; Hannes
> > Reinecke
> > <hare@suse.de>; Arun Easi <arun.easi@cavium.com>; Omar Sandoval
> > <osandov@fb.com>; Martin K . Petersen <martin.petersen@oracle.com>;
> > James
> > Bottomley <james.bottomley@hansenpartnership.com>; Christoph
> > Hellwig
> > <hch@lst.de>; Kashyap Desai <kashyap.desai@broadcom.com>; Peter
> > Rivera
> > <peter.rivera@broadcom.com>; Meelis Roos <mroos@linux.ee>
> > Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply
> > queue
> > 
> > EXTERNAL EMAIL
> > 
> > 
> > On Fri, 2018-03-02 at 10:16 +0800, Ming Lei wrote:
> > > On Thu, Mar 01, 2018 at 04:19:34PM -0500, Laurence Oberman wrote:
> > > > On Thu, 2018-03-01 at 14:01 -0500, Laurence Oberman wrote:
> > > > > On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > > > > Sent: Tuesday, February 27, 2018 4:08 AM
> > > > > > > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel
> > > > > > > .org
> > > > > > > ;
> > > > > > > Christoph
> > > > > > > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat
> > > > > > > .com
> > > > > > > > 
> > > > > > > 
> > > > > > > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@sus
> > > > > > > e.de
> > > > > > > > ;
> > > > > > > 
> > > > > > > Arun Easi
> > > > > > > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>;
> > > > > > > Martin K
> > > > > > > .
> > > > > > > Petersen <martin.petersen@oracle.com>; James Bottomley
> > > > > > > <james.bottomley@hansenpartnership.com>; Christoph
> > > > > > > Hellwig <h
> > > > > > > ch@l
> > > > > > > st
> > > > > > > .de>;
> > > > > > > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > > > > > > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@
> > > > > > > broa
> > > > > > > dcom
> > > > > > > .c
> > > > > > > om>;
> > > > > > > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > > > > > > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > > > > > > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of
> > > > > > > reply
> > > > > > > queue
> > > > > > > 
> > > 
> > > Seems Don run into IO failure without blk-mq, could you run your
> > > tests again
> > > in legacy mode?
> > > 
> > > Thanks,
> > > Ming
> > 
> > Hello Ming
> > I ran multiple passes on Legacy and still see no issues in my test
> > bed
> > 
> > BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-
> > a28e-
> > 21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
> > console=ttyS1,115200n8
> > 
> > HEAD of the git kernel I am using
> > 
> > 694e16f scsi: megaraid: improve scsi_mq performance via
> > .host_tagset
> > 793686c scsi: hpsa: improve scsi_mq performance via .host_tagset
> > 60d5b36 block: null_blk: introduce module parameter of
> > 'g_host_tags'
> > 8847067 scsi: Add template flag 'host_tagset'
> > a8fbdd6 blk-mq: introduce BLK_MQ_F_HOST_TAGS
> > 4710fab blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
> > 09bb153 scsi: megaraid_sas: fix selection of reply queue
> > 52700d8 scsi: hpsa: fix selection of reply queue
> 
> I checked out Linus's tree (4.16.0-rc3+) and re-applied the above
> patches.
> I have been running for 24 hours with no issues.
> Evidently my forked copy was corrupted.
> 
> So, my I/O testing has gone well. 
> 
> I'll run some performance numbers next.
> 
> Thanks,
> Don

Unless Kashyap is unhappy, we need to consider getting this into
Linus's tree now, because we are seeing HPE servers that keep hanging
with the original commit now upstream.

Kashyap, are you good with the v3 patchset, or still concerned about
performance? I was getting pretty good IOPS/sec to individual SSD
drives set up as JBOD devices on the megaraid_sas.

With larger I/O sizes like 1MB I was getting good MB/sec and not seeing
a measurable performance impact.

I don't have the hardware you have to mimic your configuration.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-02 21:53               ` Laurence Oberman
@ 2018-03-05  2:07                 ` Ming Lei
  2018-03-06 17:55                   ` Martin K. Petersen
  2018-03-06 19:24                   ` Martin K. Petersen
  2018-03-05  7:23                 ` Kashyap Desai
  1 sibling, 2 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-05  2:07 UTC (permalink / raw)
  To: Laurence Oberman, Martin K . Petersen, James Bottomley
  Cc: Don Brace, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Martin K . Petersen, James Bottomley,
	Christoph Hellwig, Kashyap Desai, Peter Rivera, Meelis Roos

On Fri, Mar 02, 2018 at 04:53:21PM -0500, Laurence Oberman wrote:
> On Fri, 2018-03-02 at 15:03 +0000, Don Brace wrote:
> > > -----Original Message-----
> > > From: Laurence Oberman [mailto:loberman@redhat.com]
> > > Sent: Friday, March 02, 2018 8:09 AM
> > > To: Ming Lei <ming.lei@redhat.com>
> > > Cc: Don Brace <don.brace@microsemi.com>; Jens Axboe <axboe@kernel.d
> > > k>;
> > > linux-block@vger.kernel.org; Christoph Hellwig <hch@infradead.org>;
> > > Mike
> > > Snitzer <snitzer@redhat.com>; linux-scsi@vger.kernel.org; Hannes
> > > Reinecke
> > > <hare@suse.de>; Arun Easi <arun.easi@cavium.com>; Omar Sandoval
> > > <osandov@fb.com>; Martin K . Petersen <martin.petersen@oracle.com>;
> > > James
> > > Bottomley <james.bottomley@hansenpartnership.com>; Christoph
> > > Hellwig
> > > <hch@lst.de>; Kashyap Desai <kashyap.desai@broadcom.com>; Peter
> > > Rivera
> > > <peter.rivera@broadcom.com>; Meelis Roos <mroos@linux.ee>
> > > Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply
> > > queue
> > > 
> > > EXTERNAL EMAIL
> > > 
> > > 
> > > On Fri, 2018-03-02 at 10:16 +0800, Ming Lei wrote:
> > > > On Thu, Mar 01, 2018 at 04:19:34PM -0500, Laurence Oberman wrote:
> > > > > On Thu, 2018-03-01 at 14:01 -0500, Laurence Oberman wrote:
> > > > > > On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > > > > > Sent: Tuesday, February 27, 2018 4:08 AM
> > > > > > > > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel
> > > > > > > > .org
> > > > > > > > ;
> > > > > > > > Christoph
> > > > > > > > Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat
> > > > > > > > .com
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@sus
> > > > > > > > e.de
> > > > > > > > > ;
> > > > > > > > 
> > > > > > > > Arun Easi
> > > > > > > > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>;
> > > > > > > > Martin K
> > > > > > > > .
> > > > > > > > Petersen <martin.petersen@oracle.com>; James Bottomley
> > > > > > > > <james.bottomley@hansenpartnership.com>; Christoph
> > > > > > > > Hellwig <h
> > > > > > > > ch@l
> > > > > > > > st
> > > > > > > > .de>;
> > > > > > > > Don Brace <don.brace@microsemi.com>; Kashyap Desai
> > > > > > > > <kashyap.desai@broadcom.com>; Peter Rivera <peter.rivera@
> > > > > > > > broa
> > > > > > > > dcom
> > > > > > > > .c
> > > > > > > > om>;
> > > > > > > > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > > > > > > > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > > > > > > > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of
> > > > > > > > reply
> > > > > > > > queue
> > > > > > > > 
> > > > 
> > > > Seems Don run into IO failure without blk-mq, could you run your
> > > > tests again
> > > > in legacy mode?
> > > > 
> > > > Thanks,
> > > > Ming
> > > 
> > > Hello Ming
> > > I ran multiple passes on Legacy and still see no issues in my test
> > > bed
> > > 
> > > BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-
> > > a28e-
> > > 21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
> > > console=ttyS1,115200n8
> > > 
> > > HEAD of the git kernel I am using
> > > 
> > > 694e16f scsi: megaraid: improve scsi_mq performance via
> > > .host_tagset
> > > 793686c scsi: hpsa: improve scsi_mq performance via .host_tagset
> > > 60d5b36 block: null_blk: introduce module parameter of
> > > 'g_host_tags'
> > > 8847067 scsi: Add template flag 'host_tagset'
> > > a8fbdd6 blk-mq: introduce BLK_MQ_F_HOST_TAGS
> > > 4710fab blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
> > > 09bb153 scsi: megaraid_sas: fix selection of reply queue
> > > 52700d8 scsi: hpsa: fix selection of reply queue
> > 
> > I checked out Linus's tree (4.16.0-rc3+) and re-applied the above
> > patches.
> > I have been running for 24 hours with no issues.
> > Evidently my forked copy was corrupted.
> > 
> > So, my I/O testing has gone well.
> > 
> > I'll run some performance numbers next.
> > 
> > Thanks,
> > Don
> 
> Unless Kashyap is unhappy, we need to consider getting this into
> Linus's tree now, because we are seeing HPE servers that keep hanging
> with the original commit now upstream.

Hi Martin,

Given that both Don and Laurence have verified that patch 1 and patch 2
do fix the IO hang, could you consider merging the two first?

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-02 21:53               ` Laurence Oberman
  2018-03-05  2:07                 ` Ming Lei
@ 2018-03-05  7:23                 ` Kashyap Desai
  2018-03-05 14:35                   ` Don Brace
  2018-03-05 15:19                   ` Mike Snitzer
  1 sibling, 2 replies; 54+ messages in thread
From: Kashyap Desai @ 2018-03-05  7:23 UTC (permalink / raw)
  To: Laurence Oberman, Don Brace, Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Peter Rivera, Meelis Roos

> -----Original Message-----
> From: Laurence Oberman [mailto:loberman@redhat.com]
> Sent: Saturday, March 3, 2018 3:23 AM
> To: Don Brace; Ming Lei
> Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Kashyap Desai;
> Peter
> Rivera; Meelis Roos
> Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
>
> On Fri, 2018-03-02 at 15:03 +0000, Don Brace wrote:
> > > -----Original Message-----
> > > From: Laurence Oberman [mailto:loberman@redhat.com]
> > > Sent: Friday, March 02, 2018 8:09 AM
> > > To: Ming Lei <ming.lei@redhat.com>
> > > Cc: Don Brace <don.brace@microsemi.com>; Jens Axboe <axboe@kernel.d
> > > k>;
> > > linux-block@vger.kernel.org; Christoph Hellwig <hch@infradead.org>;
> > > Mike Snitzer <snitzer@redhat.com>; linux-scsi@vger.kernel.org;
> > > Hannes Reinecke <hare@suse.de>; Arun Easi <arun.easi@cavium.com>;
> > > Omar Sandoval <osandov@fb.com>; Martin K . Petersen
> > > <martin.petersen@oracle.com>; James Bottomley
> > > <james.bottomley@hansenpartnership.com>; Christoph Hellwig
> > > <hch@lst.de>; Kashyap Desai <kashyap.desai@broadcom.com>; Peter
> > > Rivera <peter.rivera@broadcom.com>; Meelis Roos <mroos@linux.ee>
> > > Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> > >
> > > EXTERNAL EMAIL
> > >
> > >
> > > On Fri, 2018-03-02 at 10:16 +0800, Ming Lei wrote:
> > > > On Thu, Mar 01, 2018 at 04:19:34PM -0500, Laurence Oberman wrote:
> > > > > On Thu, 2018-03-01 at 14:01 -0500, Laurence Oberman wrote:
> > > > > > On Thu, 2018-03-01 at 16:18 +0000, Don Brace wrote:
> > > > > > > > -----Original Message-----
> > > > > > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > > > > > Sent: Tuesday, February 27, 2018 4:08 AM
> > > > > > > > To: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel
> > > > > > > > .org ; Christoph Hellwig <hch@infradead.org>; Mike Snitzer
> > > > > > > > <snitzer@redhat .com
> > > > > > > > >
> > > > > > > >
> > > > > > > > Cc: linux-scsi@vger.kernel.org; Hannes Reinecke <hare@sus
> > > > > > > > e.de
> > > > > > > > > ;
> > > > > > > >
> > > > > > > > Arun Easi
> > > > > > > > <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>;
> > > > > > > > Martin K .
> > > > > > > > Petersen <martin.petersen@oracle.com>; James Bottomley
> > > > > > > > <james.bottomley@hansenpartnership.com>; Christoph Hellwig
> > > > > > > > <h ch@l st .de>; Don Brace <don.brace@microsemi.com>;
> > > > > > > > Kashyap Desai <kashyap.desai@broadcom.com>; Peter Rivera
> > > > > > > > <peter.rivera@ broa dcom .c
> > > > > > > > om>;
> > > > > > > > Laurence Oberman <loberman@redhat.com>; Ming Lei
> > > > > > > > <ming.lei@redhat.com>; Meelis Roos <mroos@linux.ee>
> > > > > > > > Subject: [PATCH V3 1/8] scsi: hpsa: fix selection of reply
> > > > > > > > queue
> > > > > > > >
> > > >
> > > > Seems Don run into IO failure without blk-mq, could you run your
> > > > tests again in legacy mode?
> > > >
> > > > Thanks,
> > > > Ming
> > >
> > > Hello Ming
> > > I ran multiple passes on Legacy and still see no issues in my test
> > > bed
> > >
> > > BOOT_IMAGE=/vmlinuz-4.16.0-rc2.ming+ root=UUID=43f86d71-b1bf-4789-
> > > a28e-
> > > 21c6ddc90195 ro crashkernel=256M@64M log_buf_len=64M
> > > console=ttyS1,115200n8
> > >
> > > HEAD of the git kernel I am using
> > >
> > > 694e16f scsi: megaraid: improve scsi_mq performance via .host_tagset
> > > 793686c scsi: hpsa: improve scsi_mq performance via .host_tagset
> > > 60d5b36 block: null_blk: introduce module parameter of 'g_host_tags'
> > > 8847067 scsi: Add template flag 'host_tagset'
> > > a8fbdd6 blk-mq: introduce BLK_MQ_F_HOST_TAGS 4710fab blk-mq:
> > > introduce 'start_tag' field to 'struct blk_mq_tags'
> > > 09bb153 scsi: megaraid_sas: fix selection of reply queue
> > > 52700d8 scsi: hpsa: fix selection of reply queue
> >
> > I checkout out Linus's tree (4.16.0-rc3+) and re-applied the above
> > patches.
> > I  and have been running 24 hours with no issues.
> > Evidently my forked copy was corrupted.
> >
> > So, my I/O testing has gone well.
> >
> > I'll run some performance numbers next.
> >
> > Thanks,
> > Don
>
> Unless Kashyap is unhappy, we need to consider getting this into Linus's
> tree now, because we are seeing HPE servers that keep hanging with the
> original commit now upstream.
>
> Kashyap, are you good with the v3 patchset, or still concerned about
> performance? I was getting pretty good IOPS/sec to individual SSD drives
> set up as JBOD devices on the megaraid_sas.

Laurence -
Did you find a difference with/without the patch? What was the IOPS number
with and without the patch?
It is not an urgent feature, so I would like to take some time to get BRCM's
performance team involved and do a full analysis of the performance runs to
find the pros/cons.

Kashyap
>
> With larger I/O sizes like 1MB I was getting good MB/sec and not seeing a
> measurable performance impact.
>
> I dont have the hardware you have to mimic your configuration.
>
> Thanks
> Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-05  7:23                 ` Kashyap Desai
@ 2018-03-05 14:35                   ` Don Brace
  2018-03-05 15:19                   ` Mike Snitzer
  1 sibling, 0 replies; 54+ messages in thread
From: Don Brace @ 2018-03-05 14:35 UTC (permalink / raw)
  To: Kashyap Desai, Laurence Oberman, Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Peter Rivera, Meelis Roos

> -----Original Message-----
> From: Kashyap Desai [mailto:kashyap.desai@broadcom.com]
> Sent: Monday, March 05, 2018 1:24 AM
> To: Laurence Oberman <loberman@redhat.com>; Don Brace
> <don.brace@microsemi.com>; Ming Lei <ming.lei@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>; linux-block@vger.kernel.org; Christoph
> Hellwig <hch@infradead.org>; Mike Snitzer <snitzer@redhat.com>;
> linux-scsi@vger.kernel.org; Hannes Reinecke <hare@suse.de>; Arun Easi
> <arun.easi@cavium.com>; Omar Sandoval <osandov@fb.com>; Martin K .
> Petersen <martin.petersen@oracle.com>; James Bottomley
> <james.bottomley@hansenpartnership.com>; Christoph Hellwig <hch@lst.de>;
> Peter Rivera <peter.rivera@broadcom.com>; Meelis Roos <mroos@linux.ee>
> Subject: RE: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
>
> EXTERNAL EMAIL
>
...
> > > > Hello Ming
> > > > I ran multiple passes on Legacy and still see no issues in my test
> > > > bed

Tests ran all weekend without issues.

...
> Laurence -
> Did you find a difference with/without the patch? What was the IOPS number
> with and without the patch?
> It is not an urgent feature, so I would like to take some time to get BRCM's
> performance team involved and do a full analysis of the performance runs to
> find the pros/cons.
>
> Kashyap
...

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-05  7:23                 ` Kashyap Desai
  2018-03-05 14:35                   ` Don Brace
@ 2018-03-05 15:19                   ` Mike Snitzer
  1 sibling, 0 replies; 54+ messages in thread
From: Mike Snitzer @ 2018-03-05 15:19 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Laurence Oberman, Don Brace, Ming Lei, Jens Axboe, linux-block,
	Christoph Hellwig, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Martin K . Petersen, James Bottomley,
	Christoph Hellwig, Peter Rivera, Meelis Roos

On Mon, Mar 05 2018 at  2:23am -0500,
Kashyap Desai <kashyap.desai@broadcom.com> wrote:

> > -----Original Message-----
> > From: Laurence Oberman [mailto:loberman@redhat.com]
> > Sent: Saturday, March 3, 2018 3:23 AM
> > To: Don Brace; Ming Lei
> > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> > Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Kashyap Desai;
> > Peter
> > Rivera; Meelis Roos
> > Subject: Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
> >
...
> > Unless Kashyap is not happy we need to consider getting this in to Linus
> > now
> > because we are seeing HPE servers that keep hanging now with the original
> > commit now upstream.
> >
> > Kashyap, are you good with the v3 patchset or still concerned with
> > performance. I was getting pretty good IOPS/sec to individual SSD drives
> > set
> > up as jbod devices on the megaraid_sas.
> 
> Laurence -
> Did you find a difference with/without the patch? What was the IOPS number
> with and without the patch?
> It is not an urgent feature, so I would like to take some time to get BRCM's
> performance team involved and do a full analysis of the performance runs to
> find the pros/cons.

Performance doesn't matter if the system cannot even boot (e.g. HPE
servers with hpsa using the latest Linus tree).

Have you tried your testbed with just the first 2 patches applied?  Or
do those cause the performance hit, with the follow-on patches in the
series attempting to recover from it?

Mike

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-05  2:07                 ` Ming Lei
@ 2018-03-06 17:55                   ` Martin K. Petersen
  2018-03-06 19:24                   ` Martin K. Petersen
  1 sibling, 0 replies; 54+ messages in thread
From: Martin K. Petersen @ 2018-03-06 17:55 UTC (permalink / raw)
  To: Ming Lei
  Cc: Laurence Oberman, Martin K . Petersen, James Bottomley,
	Don Brace, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Christoph Hellwig, Kashyap Desai, Peter Rivera,
	Meelis Roos


Hi Ming,

> Given both Don and Laurence have verified that patch 1 and patch 2
> does fix IO hang, could you consider to merge the two first?

I'm not going to merge the MR patch until Kashyap acks it.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-05  2:07                 ` Ming Lei
  2018-03-06 17:55                   ` Martin K. Petersen
@ 2018-03-06 19:24                   ` Martin K. Petersen
  2018-03-07  0:00                     ` Ming Lei
  2018-03-07 14:11                     ` Laurence Oberman
  1 sibling, 2 replies; 54+ messages in thread
From: Martin K. Petersen @ 2018-03-06 19:24 UTC (permalink / raw)
  To: Ming Lei
  Cc: Laurence Oberman, Martin K . Petersen, James Bottomley,
	Don Brace, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Hannes Reinecke, Arun Easi,
	Omar Sandoval, Christoph Hellwig, Kashyap Desai, Peter Rivera,
	Meelis Roos


Ming,

> Given both Don and Laurence have verified that patch 1 and patch 2
> does fix IO hang, could you consider to merge the two first?

Oh, and I would still need a formal Acked-by: from Don and Tested-by:
from Laurence.

Also, for 4.16/scsi-fixes I would prefer verification to be done with
just patch 1/8 and none of the subsequent changes in place. Just to make
sure we're testing the right thing.

Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-06 19:24                   ` Martin K. Petersen
@ 2018-03-07  0:00                     ` Ming Lei
  2018-03-07  3:14                       ` Martin K. Petersen
  2018-03-07 14:11                     ` Laurence Oberman
  1 sibling, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-07  0:00 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Laurence Oberman, James Bottomley, Don Brace, Jens Axboe,
	linux-block, Christoph Hellwig, Mike Snitzer, linux-scsi,
	Hannes Reinecke, Arun Easi, Omar Sandoval, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

On Tue, Mar 06, 2018 at 02:24:25PM -0500, Martin K. Petersen wrote:
> 
> Ming,
> 
> > Given both Don and Laurence have verified that patch 1 and patch 2
> > does fix IO hang, could you consider to merge the two first?
> 
> Oh, and I would still need a formal Acked-by: from Don and Tested-by:
> from Laurence.
> 
> Also, for 4.16/scsi-fixes I would prefer verification to be done with
> just patch 1/8 and none of the subsequent changes in place. Just to make
> sure we're testing the right thing.

Hi Martin,

Please consider 2/8 too since it is still a fix.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-07  0:00                     ` Ming Lei
@ 2018-03-07  3:14                       ` Martin K. Petersen
  0 siblings, 0 replies; 54+ messages in thread
From: Martin K. Petersen @ 2018-03-07  3:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K. Petersen, Laurence Oberman, James Bottomley, Don Brace,
	Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Christoph Hellwig, Kashyap Desai, Peter Rivera, Meelis Roos


Ming,

> Please consider 2/8 too since it is still a fix.

I still need the driver maintainer to ack the change.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-02-28 14:58   ` Kashyap Desai
  2018-02-28 15:21     ` Ming Lei
@ 2018-03-07  5:27     ` Ming Lei
  2018-03-07 15:01       ` Kashyap Desai
  1 sibling, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-07  5:27 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> Ming -
> 
> Quick testing on my setup - performance slightly degraded (a 4-5% drop) for
> the megaraid_sas driver with this patch (from 1610K IOPS it goes to 1544K).
> I confirm that after applying this patch, we have #queues = #NUMA nodes.
> 
> ls -l
> /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2:23/10:
> 2:23:0/block/sdy/mq
> total 0
> drwxr-xr-x. 18 root root 0 Feb 28 09:53 0
> drwxr-xr-x. 18 root root 0 Feb 28 09:53 1
> 
> 
> I would suggest skipping the megaraid_sas driver changes using shared_tagset
> unless and until there is an obvious gain. If the overall interface of using
> shared_tagset is committed to the kernel tree, we will investigate (for the
> megaraid_sas driver) in the future whether there is a real benefit to using it.

Hi Kashyap,

Now I have put patches for removing operating on scsi_host->host_busy
in V4[1], especially which are done in the following 3 patches:

	9221638b9bc9 scsi: avoid to hold host_busy for scsi_mq
	1ffc8c0ffbe4 scsi: read host_busy via scsi_host_busy()
	e453d3983243 scsi: introduce scsi_host_busy()


Could you run your test on V4 and see if IOPS can be improved on
megaraid_sas?


[1] https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4

Thanks,
Ming
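
Judging by the commit titles above, the idea seems to be to stop bumping a
single host-wide atomic counter on every command in the scsi_mq path and to
compute the busy count only when somebody asks for it. A rough sketch of that
approach, assuming the blk-mq busy-iteration helper is available; this is an
illustration, not necessarily the exact code in those commits:

/* Sketch: count in-flight commands on demand by walking the host's tag
 * set, instead of touching a shared shost->host_busy atomic on every
 * submission and completion. */
static void scsi_host_check_in_flight(struct request *rq, void *data,
				      bool reserved)
{
	int *count = data;

	if (blk_mq_request_started(rq))
		(*count)++;
}

int scsi_host_busy(struct Scsi_Host *shost)
{
	int count = 0;

	blk_mq_tagset_busy_iter(&shost->tag_set, scsi_host_check_in_flight,
				&count);
	return count;
}

Avoiding that shared counter is presumably part of why the megaraid_sas IOPS
numbers are being asked for again on V4.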

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-06 19:24                   ` Martin K. Petersen
  2018-03-07  0:00                     ` Ming Lei
@ 2018-03-07 14:11                     ` Laurence Oberman
  2018-03-08 13:42                       ` Ming Lei
  1 sibling, 1 reply; 54+ messages in thread
From: Laurence Oberman @ 2018-03-07 14:11 UTC (permalink / raw)
  To: Martin K. Petersen, Ming Lei
  Cc: James Bottomley, Don Brace, Jens Axboe, linux-block,
	Christoph Hellwig, Mike Snitzer, linux-scsi, Hannes Reinecke,
	Arun Easi, Omar Sandoval, Christoph Hellwig, Kashyap Desai,
	Peter Rivera, Meelis Roos

On Tue, 2018-03-06 at 14:24 -0500, Martin K. Petersen wrote:
> Ming,
> 
> > Given both Don and Laurence have verified that patch 1 and patch 2
> > does fix IO hang, could you consider to merge the two first?
> 
> Oh, and I would still need a formal Acked-by: from Don and Tested-by:
> from Laurence.
> 
> Also, for 4.16/scsi-fixes I would prefer verification to be done with
> just patch 1/8 and none of the subsequent changes in place. Just to
> make
> sure we're testing the right thing.
> 
> Thanks!
> 

Hello Martin

I tested just patch 1/8 from the V3 series.
No issues running the workload and no issues booting on the DL380G7.
Don, can you ack this so we can at least get this one in?

Against: 4.16.0-rc4.v31of8+ on an x86_64

Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks
Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-07  5:27     ` Ming Lei
@ 2018-03-07 15:01       ` Kashyap Desai
  2018-03-07 16:05         ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Kashyap Desai @ 2018-03-07 15:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Wednesday, March 7, 2018 10:58 AM
> To: Kashyap Desai
> Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Peter
> Rivera; Laurence Oberman
> Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
via
> .host_tagset
>
> On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > Ming -
> >
> > Quick testing on my setup -  Performance slightly degraded (4-5%
> > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it goes
> > to 1544K) I confirm that after applying this patch, we have #queue =
#numa
> node.
> >
> > ls -l
> >
>
/sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2:23/10:
> > 2:23:0/block/sdy/mq
> > total 0
> > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > Feb 28 09:53 1
> >
> >
> > I would suggest to skip megaraid_sas driver changes using
> > shared_tagset until and unless there is obvious gain. If overall
> > interface of using shared_tagset is commit in kernel tree, we will
> > investigate (megaraid_sas
> > driver) in future about real benefit of using it.
>
> Hi Kashyap,
>
> Now I have put patches for removing operating on scsi_host->host_busy in
> V4[1], especially which are done in the following 3 patches:
>
> 	9221638b9bc9 scsi: avoid to hold host_busy for scsi_mq
> 	1ffc8c0ffbe4 scsi: read host_busy via scsi_host_busy()
> 	e453d3983243 scsi: introduce scsi_host_busy()
>
>
> Could you run your test on V4 and see if IOPS can be improved on
> megaraid_sas?
>
>
> [1] https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4

I will be doing testing soon.

BTW - Performance impact is due below patch only -
"[PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via
.host_tagset"

Below patch is really needed -
"[PATCH V3 2/8] scsi: megaraid_sas: fix selection of reply queue"

I am currently doing review on my setup.  I think above patch is fixing
real issue of performance (for megaraid_sas) as driver may not be sending
IO to optimal reply queue.
Having CPU to MSIx mapping will solve that. Megaraid_sas driver always
create max MSIx as min (online CPU, # MSIx HW support).
I will do more review and testing for that particular patch as well.
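
For reference, the shape of that fix as I understand it from the hpsa
version in this series (only an illustration; structure and field names in
the actual megaraid_sas change may differ) is to build a per-CPU reply
queue map from the irq affinity masks once at probe time and then index it
with the submitting CPU:

	/* illustrative sketch only, mirroring the hpsa patch in this series */
	static void setup_reply_map(struct my_ctlr *h)	/* my_ctlr is a placeholder */
	{
		const struct cpumask *mask;
		unsigned int queue, cpu;

		for (queue = 0; queue < h->msix_vectors; queue++) {
			mask = pci_irq_get_affinity(h->pdev, queue);
			if (!mask)
				goto fallback;
			for_each_cpu(cpu, mask)
				h->reply_map[cpu] = queue;
		}
		return;

	fallback:
		for_each_possible_cpu(cpu)
			h->reply_map[cpu] = 0;
	}

	/* the submit path then picks something like: */
	cmd->reply_queue = h->reply_map[raw_smp_processor_id()];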

Also one observation using V3 series patch. I am seeing below Affinity
mapping whereas I have only 72 logical CPUs.  It means we are really not
going to use all reply queues.
e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue is
used and that may lead to performance drop as well.

PCI name is 86:00.0, dump its irq affinity:
irq 218, cpu list 0-2,36-37
irq 219, cpu list 3-5,39-40
irq 220, cpu list 6-8,42-43
irq 221, cpu list 9-11,45-46
irq 222, cpu list 12-13,48-49
irq 223, cpu list 14-15,50-51
irq 224, cpu list 16-17,52-53
irq 225, cpu list 38,41,44,47
irq 226, cpu list 72,74,76,78
irq 227, cpu list 80,82,84,86
irq 228, cpu list 88,90,92,94
irq 229, cpu list 96,98,100,102
irq 230, cpu list 104,106,108,110
irq 231, cpu list 112,114,116,118
irq 232, cpu list 120,122,124,126
irq 233, cpu list 128,130,132,134
irq 234, cpu list 136,138,140,142
irq 235, cpu list 144,146,148,150
irq 236, cpu list 152,154,156,158
irq 237, cpu list 160,162,164,166
irq 238, cpu list 168,170,172,174
irq 239, cpu list 176,178,180,182
irq 240, cpu list 184,186,188,190
irq 241, cpu list 192,194,196,198
irq 242, cpu list 200,202,204,206
irq 243, cpu list 208,210,212,214
irq 244, cpu list 216,218,220,222
irq 245, cpu list 224,226,228,230
irq 246, cpu list 232,234,236,238
irq 247, cpu list 240,242,244,246
irq 248, cpu list 248,250,252,254
irq 249, cpu list 256,258,260,262
irq 250, cpu list 264,266,268,270
irq 251, cpu list 272,274,276,278
irq 252, cpu list 280,282,284,286
irq 253, cpu list 288,290,292,294
irq 254, cpu list 18-20,54-55
irq 255, cpu list 21-23,57-58
irq 256, cpu list 24-26,60-61
irq 257, cpu list 27-29,63-64
irq 258, cpu list 30-31,66-67
irq 259, cpu list 32-33,68-69
irq 260, cpu list 34-35,70-71
irq 261, cpu list 56,59,62,65
irq 262, cpu list 73,75,77,79
irq 263, cpu list 81,83,85,87
irq 264, cpu list 89,91,93,95
irq 265, cpu list 97,99,101,103
irq 266, cpu list 105,107,109,111
irq 267, cpu list 113,115,117,119
irq 268, cpu list 121,123,125,127
irq 269, cpu list 129,131,133,135
irq 270, cpu list 137,139,141,143
irq 271, cpu list 145,147,149,151
irq 272, cpu list 153,155,157,159
irq 273, cpu list 161,163,165,167
irq 274, cpu list 169,171,173,175
irq 275, cpu list 177,179,181,183
irq 276, cpu list 185,187,189,191
irq 277, cpu list 193,195,197,199
irq 278, cpu list 201,203,205,207
irq 279, cpu list 209,211,213,215
irq 280, cpu list 217,219,221,223
irq 281, cpu list 225,227,229,231
irq 282, cpu list 233,235,237,239
irq 283, cpu list 241,243,245,247
irq 284, cpu list 249,251,253,255
irq 285, cpu list 257,259,261,263
irq 286, cpu list 265,267,269,271
irq 287, cpu list 273,275,277,279
irq 288, cpu list 281,283,285,287
irq 289, cpu list 289,291,293,295


>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-07 15:01       ` Kashyap Desai
@ 2018-03-07 16:05         ` Ming Lei
  2018-03-07 17:28           ` Kashyap Desai
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-07 16:05 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Wed, Mar 07, 2018 at 08:31:31PM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Wednesday, March 7, 2018 10:58 AM
> > To: Kashyap Desai
> > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> Peter
> > Rivera; Laurence Oberman
> > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> via
> > .host_tagset
> >
> > On Wed, Feb 28, 2018 at 08:28:48PM +0530, Kashyap Desai wrote:
> > > Ming -
> > >
> > > Quick testing on my setup -  Performance slightly degraded (4-5%
> > > drop)for megaraid_sas driver with this patch. (From 1610K IOPS it goes
> > > to 1544K) I confirm that after applying this patch, we have #queue =
> #numa
> > node.
> > >
> > > ls -l
> > >
> >
> /sys/devices/pci0000:80/0000:80:02.0/0000:83:00.0/host10/target10:2:23/10:
> > > 2:23:0/block/sdy/mq
> > > total 0
> > > drwxr-xr-x. 18 root root 0 Feb 28 09:53 0 drwxr-xr-x. 18 root root 0
> > > Feb 28 09:53 1
> > >
> > >
> > > I would suggest to skip megaraid_sas driver changes using
> > > shared_tagset until and unless there is obvious gain. If overall
> > > interface of using shared_tagset is commit in kernel tree, we will
> > > investigate (megaraid_sas
> > > driver) in future about real benefit of using it.
> >
> > Hi Kashyap,
> >
> > Now I have put patches for removing operating on scsi_host->host_busy in
> > V4[1], especially which are done in the following 3 patches:
> >
> > 	9221638b9bc9 scsi: avoid to hold host_busy for scsi_mq
> > 	1ffc8c0ffbe4 scsi: read host_busy via scsi_host_busy()
> > 	e453d3983243 scsi: introduce scsi_host_busy()
> >
> >
> > Could you run your test on V4 and see if IOPS can be improved on
> > megaraid_sas?
> >
> >
> > [1] https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4
> 
> I will be doing testing soon.

Today I also revisited your previous perf trace; the following samples seem
to take a bit more CPU:

   4.64%  [megaraid_sas]           [k] complete_cmd_fusion
   ...
   2.22%  [megaraid_sas]           [k] megasas_build_io_fusion
   ...
   1.33%  [megaraid_sas]           [k] megasas_build_and_issue_cmd_fusion

But V4 should bring a bit of improvement in theory.

And if some of megaraid_sas's host-wide resources can be partitioned into
per-node hw queues, I guess some further improvement can be gained too.

> 
> BTW - Performance impact is due below patch only -
> "[PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via
> .host_tagset"
> 
> Below patch is really needed -
> "[PATCH V3 2/8] scsi: megaraid_sas: fix selection of reply queue"
> 
> I am currently doing review on my setup.  I think above patch is fixing
> real issue of performance (for megaraid_sas) as driver may not be sending
> IO to optimal reply queue.

The ideal way is to map each reply queue to one blk-mq hw queue, but the
SCSI/driver IO path seems too slow for that: even a hw queue depth that is
high enough from the device's internal point of view (for example 256)
still can't reach good performance, as you observed.

> Having CPU to MSIx mapping will solve that. Megaraid_sas driver always
> create max MSIx as min (online CPU, # MSIx HW support).
> I will do more review and testing for that particular patch as well.

OK, thanks!

> 
> Also one observation using V3 series patch. I am seeing below Affinity
> mapping whereas I have only 72 logical CPUs.  It means we are really not
> going to use all reply queues.
> e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue is
> used and that may lead to performance drop as well.

If the mapping is in such shape, I guess it should be quite difficult to
figure out one perfect way to solve this situation because one reply
queue has to handle IOs submitted from 4~5 CPUs at average.

The application should have the knowledge to avoid this kind of usage.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-07 16:05         ` Ming Lei
@ 2018-03-07 17:28           ` Kashyap Desai
  2018-03-08  1:15             ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Kashyap Desai @ 2018-03-07 17:28 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

> >
> > Also one observation using V3 series patch. I am seeing below Affinity
> > mapping whereas I have only 72 logical CPUs.  It means we are really
> > not going to use all reply queues.
> > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue
> > is used and that may lead to performance drop as well.
>
> If the mapping is in such shape, I guess it should be quite difficult to
figure out
> one perfect way to solve this situation because one reply queue has to
handle
> IOs submitted from 4~5 CPUs at average.

4.15.0-rc1 kernel has below mapping - I am not sure which commit id in "
linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.  It
will be really good if we can fall back to below mapping once again.
Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random mapping
of CPU - MSIx. And that will be problematic in performance run.

As I posted earlier, latest repo will only allow us to use *18* reply
queue instead of *72*.  Lots of performance related issue can be pop up on
different setup due to inconsistency in CPU - MSIx mapping. BTW, changes
in this area is intentional @" linux_4.16-rc-host-tags-v3.2". ?

irq 218, cpu list 0
irq 219, cpu list 1
irq 220, cpu list 2
irq 221, cpu list 3
irq 222, cpu list 4
irq 223, cpu list 5
irq 224, cpu list 6
irq 225, cpu list 7
irq 226, cpu list 8
irq 227, cpu list 9
irq 228, cpu list 10
irq 229, cpu list 11
irq 230, cpu list 12
irq 231, cpu list 13
irq 232, cpu list 14
irq 233, cpu list 15
irq 234, cpu list 16
irq 235, cpu list 17
irq 236, cpu list 36
irq 237, cpu list 37
irq 238, cpu list 38
irq 239, cpu list 39
irq 240, cpu list 40
irq 241, cpu list 41
irq 242, cpu list 42
irq 243, cpu list 43
irq 244, cpu list 44
irq 245, cpu list 45
irq 246, cpu list 46
irq 247, cpu list 47
irq 248, cpu list 48
irq 249, cpu list 49
irq 250, cpu list 50
irq 251, cpu list 51
irq 252, cpu list 52
irq 253, cpu list 53
irq 254, cpu list 18
irq 255, cpu list 19
irq 256, cpu list 20
irq 257, cpu list 21
irq 258, cpu list 22
irq 259, cpu list 23
irq 260, cpu list 24
irq 261, cpu list 25
irq 262, cpu list 26
irq 263, cpu list 27
irq 264, cpu list 28
irq 265, cpu list 29
irq 266, cpu list 30
irq 267, cpu list 31
irq 268, cpu list 32
irq 269, cpu list 33
irq 270, cpu list 34
irq 271, cpu list 35
irq 272, cpu list 54
irq 273, cpu list 55
irq 274, cpu list 56
irq 275, cpu list 57
irq 276, cpu list 58
irq 277, cpu list 59
irq 278, cpu list 60
irq 279, cpu list 61
irq 280, cpu list 62
irq 281, cpu list 63
irq 282, cpu list 64
irq 283, cpu list 65
irq 284, cpu list 66
irq 285, cpu list 67
irq 286, cpu list 68
irq 287, cpu list 69
irq 288, cpu list 70
irq 289, cpu list 71


>
> The application should have the knowledge to avoid this kind of usage.
>
>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-07 17:28           ` Kashyap Desai
@ 2018-03-08  1:15             ` Ming Lei
  2018-03-08 10:04               ` Kashyap Desai
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-08  1:15 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > >
> > > Also one observation using V3 series patch. I am seeing below Affinity
> > > mapping whereas I have only 72 logical CPUs.  It means we are really
> > > not going to use all reply queues.
> > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply queue
> > > is used and that may lead to performance drop as well.
> >
> > If the mapping is in such shape, I guess it should be quite difficult to
> figure out
> > one perfect way to solve this situation because one reply queue has to
> handle
> > IOs submitted from 4~5 CPUs at average.
> 
> 4.15.0-rc1 kernel has below mapping - I am not sure which commit id in "
> linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.  It

I guess the mapping you posted is read from /proc/irq/126/smp_affinity.

If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change IRQ
affinity code, which is done in irq_create_affinity_masks(), as you saw, no any
patch in linux_4.16-rc-host-tags-v3.2 touches that code.

Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2 against
4.15-rc1 kernel and see any difference?

> will be really good if we can fall back to below mapping once again.
> Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random mapping
> of CPU - MSIx. And that will be problematic in performance run.
> 
> As I posted earlier, latest repo will only allow us to use *18* reply

Looks not see this report before, could you share us how you conclude that?
The only patch changing reply queue is the following one:

	https://marc.info/?l=linux-block&m=151972611911593&w=2

But not see any issue in this patch yet, can you recover to 72 reply
queues after reverting the patch in above link?

> queue instead of *72*.  Lots of performance related issue can be pop up on
> different setup due to inconsistency in CPU - MSIx mapping. BTW, changes
> in this area is intentional @" linux_4.16-rc-host-tags-v3.2". ?

As you mentioned in the following link, you didn't see big performance drop
with linux_4.16-rc-host-tags-v3.2, right?

	https://marc.info/?l=linux-block&m=151982993810092&w=2


Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
  2018-03-01 16:18   ` Don Brace
@ 2018-03-08  7:50   ` Christoph Hellwig
  2018-03-08  8:15     ` Ming Lei
  1 sibling, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2018-03-08  7:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman,
	Meelis Roos

> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	for (queue = 0; queue < h->msix_vectors; queue++) {
> +		mask = pci_irq_get_affinity(h->pdev, queue);
> +		if (!mask)
> +			goto fallback;
> +
> +		for_each_cpu(cpu, mask)
> +			h->reply_map[cpu] = queue;
> +	}
> +	return;
> +
> +fallback:
> +	for_each_possible_cpu(cpu)
> +		h->reply_map[cpu] = 0;
> +}

It seems a little annoying that we have to duplicate this in the driver.
Wouldn't this be solved by your force_blk_mq flag and relying on the
hw_ctx id?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
  2018-02-27 10:07 ` [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags' Ming Lei
@ 2018-03-08  7:51   ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2018-03-08  7:51 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman

Looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS
  2018-02-27 10:07 ` [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS Ming Lei
@ 2018-03-08  7:52   ` Christoph Hellwig
  2018-03-08  9:35     ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2018-03-08  7:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman

On Tue, Feb 27, 2018 at 06:07:46PM +0800, Ming Lei wrote:
> This patch can support to partition host-wide tags to multiple hw queues,
> so each hw queue related data structures(tags, hctx) can be accessed in
> NUMA locality way, for example, the hw queue can be per NUMA node.
> 
> It is observed IOPS can be improved much in this way on null_blk test.

null_blk isn't too interesting, so some real hardware number would
be very useful here.

Also the documentation should be a lot less sparse.  When are we going
to set this flag?  What help are we going to give driver authors to
guide choosing the option?

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset
  2018-02-27 10:07 ` [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset Ming Lei
@ 2018-03-08  7:54   ` Christoph Hellwig
  2018-03-08 10:59     ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Christoph Hellwig @ 2018-03-08  7:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Kashyap Desai, Peter Rivera, Laurence Oberman

> +	/* 256 tags should be high enough to saturate device */
> +	int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
> +
> +	/* per NUMA node hw queue */
> +	h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);

I don't think this magic should be in a driver.  The per-node hw_queue
selection seems like something we'd better do in the core code.

Also the whole idea to use nr_hw_queues for just partitioning tag
space on hardware that doesn't really support multiple hardware queues
seems more than odd.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-08  7:50   ` Christoph Hellwig
@ 2018-03-08  8:15     ` Ming Lei
  2018-03-08  8:41       ` Hannes Reinecke
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-08  8:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Don Brace, Kashyap Desai,
	Peter Rivera, Laurence Oberman, Meelis Roos

On Thu, Mar 08, 2018 at 08:50:35AM +0100, Christoph Hellwig wrote:
> > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > +{
> > +	const struct cpumask *mask;
> > +	unsigned int queue, cpu;
> > +
> > +	for (queue = 0; queue < h->msix_vectors; queue++) {
> > +		mask = pci_irq_get_affinity(h->pdev, queue);
> > +		if (!mask)
> > +			goto fallback;
> > +
> > +		for_each_cpu(cpu, mask)
> > +			h->reply_map[cpu] = queue;
> > +	}
> > +	return;
> > +
> > +fallback:
> > +	for_each_possible_cpu(cpu)
> > +		h->reply_map[cpu] = 0;
> > +}
> 
> It seems a little annoying that we have to duplicate this in the driver.
> Wouldn't this be solved by your force_blk_mq flag and relying on the
> hw_ctx id?

This issue can be solved by force_blk_mq, but may cause performance
regression for host-wide tagset drivers:

- If the whole tagset is partitioned into each hw queue, each hw queue's
depth may not be high enough, especially SCSI's IO path may be not
efficient enough. Even though we keep each queue's depth as 256, which
should be high enough to exploit parallelism from device internal view,
but still can't get good performance.

- If the whole tagset is still shared among all hw queues, the shared
tags can be accessed from all CPUs, and IOPS is degraded.

Kashyap has tested the above two approaches, both hurts IOPS on megaraid_sas.


thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-08  8:15     ` Ming Lei
@ 2018-03-08  8:41       ` Hannes Reinecke
  2018-03-08  9:19         ` Ming Lei
  2018-03-08 15:31         ` Bart Van Assche
  0 siblings, 2 replies; 54+ messages in thread
From: Hannes Reinecke @ 2018-03-08  8:41 UTC (permalink / raw)
  To: Ming Lei, Christoph Hellwig
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Arun Easi, Omar Sandoval, Martin K . Petersen,
	James Bottomley, Don Brace, Kashyap Desai, Peter Rivera,
	Laurence Oberman, Meelis Roos

On 03/08/2018 09:15 AM, Ming Lei wrote:
> On Thu, Mar 08, 2018 at 08:50:35AM +0100, Christoph Hellwig wrote:
>>> +static void hpsa_setup_reply_map(struct ctlr_info *h)
>>> +{
>>> +	const struct cpumask *mask;
>>> +	unsigned int queue, cpu;
>>> +
>>> +	for (queue = 0; queue < h->msix_vectors; queue++) {
>>> +		mask = pci_irq_get_affinity(h->pdev, queue);
>>> +		if (!mask)
>>> +			goto fallback;
>>> +
>>> +		for_each_cpu(cpu, mask)
>>> +			h->reply_map[cpu] = queue;
>>> +	}
>>> +	return;
>>> +
>>> +fallback:
>>> +	for_each_possible_cpu(cpu)
>>> +		h->reply_map[cpu] = 0;
>>> +}
>>
>> It seems a little annoying that we have to duplicate this in the driver.
>> Wouldn't this be solved by your force_blk_mq flag and relying on the
>> hw_ctx id?
> 
> This issue can be solved by force_blk_mq, but may cause performance
> regression for host-wide tagset drivers:
> 
> - If the whole tagset is partitioned into each hw queue, each hw queue's
> depth may not be high enough, especially SCSI's IO path may be not
> efficient enough. Even though we keep each queue's depth as 256, which
> should be high enough to exploit parallelism from device internal view,
> but still can't get good performance.
> 
> - If the whole tagset is still shared among all hw queues, the shared
> tags can be accessed from all CPUs, and IOPS is degraded.
> 
> Kashyap has tested the above two approaches, both hurts IOPS on megaraid_sas.
> 
This is precisely the issue I have been worried about, too.

The problem is not so much the tagspace (which actually is quite small
memory footprint-wise), but rather the _requests_ indexed by the tags.

We have this:

struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
                                        unsigned int hctx_idx,
                                        unsigned int nr_tags,
                                        unsigned int reserved_tags)
{
        struct blk_mq_tags *tags;
        int node;

        node = blk_mq_hw_queue_to_node(set->mq_map, hctx_idx);
        if (node == NUMA_NO_NODE)
                node = set->numa_node;

        tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
                     BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
        if (!tags)
                return NULL;

        tags->rqs = kzalloc_node(nr_tags * sizeof(struct request *),
                      GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, node);


IE the _entire_ request set is allocated as _one_ array, making it quite
hard to handle from the lower-level CPU caches.
Also the 'node' indicator doesn't really help us here, as the requests
have to be access by all CPUs in the shared tag case.

Would it be possible move tags->rqs to become a _double_ pointer?
Then we would have only a shared lookup table, but the requests
themselves can be allocated per node, depending on the CPU map.
_And_ it should be easier on the CPU cache ...
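
Something like the following, perhaps (just a sketch of the idea inside
blk_mq_alloc_rq_map(), not a patch; tag_to_node() is a made-up helper that
maps a tag range to its node, and real code would keep the page-based bulk
allocation instead of one kzalloc per request):

	unsigned int i;
	size_t rq_size = sizeof(struct request) + set->cmd_size;

	/* shared lookup table, indexed by tag, as today */
	tags->rqs = kzalloc_node(nr_tags * sizeof(struct request *),
			GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, node);

	/* ...but the requests themselves allocated per node */
	for (i = 0; i < nr_tags; i++) {
		int rq_node = tag_to_node(set, i);

		tags->rqs[i] = kzalloc_node(rq_size, GFP_NOIO, rq_node);
	}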

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-08  8:41       ` Hannes Reinecke
@ 2018-03-08  9:19         ` Ming Lei
  2018-03-08 15:31         ` Bart Van Assche
  1 sibling, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-08  9:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Christoph Hellwig,
	Mike Snitzer, linux-scsi, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Don Brace, Kashyap Desai,
	Peter Rivera, Laurence Oberman, Meelis Roos

On Thu, Mar 08, 2018 at 09:41:16AM +0100, Hannes Reinecke wrote:
> On 03/08/2018 09:15 AM, Ming Lei wrote:
> > On Thu, Mar 08, 2018 at 08:50:35AM +0100, Christoph Hellwig wrote:
> >>> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> >>> +{
> >>> +	const struct cpumask *mask;
> >>> +	unsigned int queue, cpu;
> >>> +
> >>> +	for (queue = 0; queue < h->msix_vectors; queue++) {
> >>> +		mask = pci_irq_get_affinity(h->pdev, queue);
> >>> +		if (!mask)
> >>> +			goto fallback;
> >>> +
> >>> +		for_each_cpu(cpu, mask)
> >>> +			h->reply_map[cpu] = queue;
> >>> +	}
> >>> +	return;
> >>> +
> >>> +fallback:
> >>> +	for_each_possible_cpu(cpu)
> >>> +		h->reply_map[cpu] = 0;
> >>> +}
> >>
> >> It seems a little annoying that we have to duplicate this in the driver.
> >> Wouldn't this be solved by your force_blk_mq flag and relying on the
> >> hw_ctx id?
> > 
> > This issue can be solved by force_blk_mq, but may cause performance
> > regression for host-wide tagset drivers:
> > 
> > - If the whole tagset is partitioned into each hw queue, each hw queue's
> > depth may not be high enough, especially SCSI's IO path may be not
> > efficient enough. Even though we keep each queue's depth as 256, which
> > should be high enough to exploit parallelism from device internal view,
> > but still can't get good performance.
> > 
> > - If the whole tagset is still shared among all hw queues, the shared
> > tags can be accessed from all CPUs, and IOPS is degraded.
> > 
> > Kashyap has tested the above two approaches, both hurts IOPS on megaraid_sas.
> > 
> This is precisely the issue I have been worried about, too.
> 
> The problem is not so much the tagspace (which actually is quite small
> memory footprint-wise), but rather the _requests_ indexed by the tags.

But V1 is done in exactly this way: one shared tagset is used and the
requests are allocated per hw queue with NUMA locality. Kashyap confirmed
that IOPS recovered to normal once iostats was set to 0 with V1
applied:

	https://marc.info/?l=linux-scsi&m=151815231026789&w=2

That means the shared tags do have a big effect on performance.

> 
> We have this:
> 
> struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
>                                         unsigned int hctx_idx,
>                                         unsigned int nr_tags,
>                                         unsigned int reserved_tags)
> {
>         struct blk_mq_tags *tags;
>         int node;
> 
>         node = blk_mq_hw_queue_to_node(set->mq_map, hctx_idx);
>         if (node == NUMA_NO_NODE)
>                 node = set->numa_node;
> 
>         tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
>                      BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
>         if (!tags)
>                 return NULL;
> 
>         tags->rqs = kzalloc_node(nr_tags * sizeof(struct request *),
>                       GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY, node);
> 
> 
> IE the _entire_ request set is allocated as _one_ array, making it quite
> hard to handle from the lower-level CPU caches.
> Also the 'node' indicator doesn't really help us here, as the requests
> have to be access by all CPUs in the shared tag case.
> 
> Would it be possible move tags->rqs to become a _double_ pointer?
> Then we would have only a shared lookup table, but the requests
> themselves can be allocated per node, depending on the CPU map.
> _And_ it should be easier on the CPU cache ...

That is basically the same as the approach in V1, and similar to V3, in
which a per-node hw queue is introduced; from Kashyap's test, the
performance isn't bad. I believe IOPS can eventually be improved if the
scsi_host->host_busy operations are removed from the IO path and the
megaraid_sas driver itself is improved, as I mentioned earlier.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS
  2018-03-08  7:52   ` Christoph Hellwig
@ 2018-03-08  9:35     ` Ming Lei
  0 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-08  9:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Don Brace, Kashyap Desai,
	Peter Rivera, Laurence Oberman

On Thu, Mar 08, 2018 at 08:52:52AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 27, 2018 at 06:07:46PM +0800, Ming Lei wrote:
> > This patch can support to partition host-wide tags to multiple hw queues,
> > so each hw queue related data structures(tags, hctx) can be accessed in
> > NUMA locality way, for example, the hw queue can be per NUMA node.
> > 
> > It is observed IOPS can be improved much in this way on null_blk test.
> 
> null_blk isn't too interesting, so some real hardware number would
> be very useful here.

A 10~20% IOPS improvement can be observed on scsi_debug too, set up on one
dual-socket system.

Real numbers need an hpsa or megaraid_sas host with dozens of SSDs, which
is not easy for me to set up.

Kashyap has been very cooperative about testing patches, and V3 looks much
better than before thanks to the per-node hw queue.

If the atomic operations on scsi_host->host_busy are removed and the
megaraid_sas IO path can be optimized a bit, we should get some improvement
from per-node hw queues with BLK_MQ_F_HOST_TAGS on megaraid_sas.

> 
> Also the documentation should be a lot less sparse.  When are we going
> to set this flag?  What help are we going to give driver authors to
> guide choosing the option?

OK, will do that in next version.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-08  1:15             ` Ming Lei
@ 2018-03-08 10:04               ` Kashyap Desai
  2018-03-08 11:06                 ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Kashyap Desai @ 2018-03-08 10:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Thursday, March 8, 2018 6:46 AM
> To: Kashyap Desai
> Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Peter
> Rivera; Laurence Oberman
> Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
via
> .host_tagset
>
> On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > > >
> > > > Also one observation using V3 series patch. I am seeing below
> > > > Affinity mapping whereas I have only 72 logical CPUs.  It means we
> > > > are really not going to use all reply queues.
> > > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply
> > > > queue is used and that may lead to performance drop as well.
> > >
> > > If the mapping is in such shape, I guess it should be quite
> > > difficult to
> > figure out
> > > one perfect way to solve this situation because one reply queue has
> > > to
> > handle
> > > IOs submitted from 4~5 CPUs at average.
> >
> > 4.15.0-rc1 kernel has below mapping - I am not sure which commit id in
"
> > linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.
> > It
>
> I guess the mapping you posted is read from /proc/irq/126/smp_affinity.
>
> If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change IRQ
affinity
> code, which is done in irq_create_affinity_masks(), as you saw, no any
patch
> in linux_4.16-rc-host-tags-v3.2 touches that code.
>
> Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2
against
> 4.15-rc1 kernel and see any difference?
>
> > will be really good if we can fall back to below mapping once again.
> > Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random
> > mapping of CPU - MSIx. And that will be problematic in performance
run.
> >
> > As I posted earlier, latest repo will only allow us to use *18* reply
>
> Looks not see this report before, could you share us how you conclude
that?
> The only patch changing reply queue is the following one:
>
> 	https://marc.info/?l=linux-block&m=151972611911593&w=2
>
> But not see any issue in this patch yet, can you recover to 72 reply
queues
> after reverting the patch in above link?
Ming -

While testing, my system went bad. I debug further and understood that
affinity mapping was changed due to below commit -
84676c1f21e8ff54befe985f4f14dc1edc10046b

[PATCH] genirq/affinity: assign vectors to all possible CPUs

Because of above change, we end up using very less reply queue. Many reply
queues on my setup was mapped to offline/not-available CPUs. This may be
primary contributing to odd performance impact and it may not be truly due
to V3/V4 patch series.

I am planning to check your V3 and V4 series after removing above commit
ID (for performance impact.).

It is good if we spread possible CPUs (instead of online cpus) to all irq
vectors  considering -  We should have at least *one* online CPU mapped to
the vector.
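
Put differently (an illustrative check only, with h->pdev standing in for
the device), a vector's reply queue is only usable today if its affinity
mask intersects the online mask:

	/* is this reply queue usable right now? */
	const struct cpumask *mask = pci_irq_get_affinity(h->pdev, queue);
	bool usable = mask && cpumask_intersects(mask, cpu_online_mask);

	/*
	 * On this setup (72 online CPUs out of ~296 possible), 'usable' is
	 * false for many vectors because they are mapped only to
	 * not-present CPUs, so their reply queues never get picked for IO.
	 */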

>
> > queue instead of *72*.  Lots of performance related issue can be pop
> > up on different setup due to inconsistency in CPU - MSIx mapping. BTW,
> > changes in this area is intentional @" linux_4.16-rc-host-tags-v3.2".
?
>
> As you mentioned in the following link, you didn't see big performance
drop
> with linux_4.16-rc-host-tags-v3.2, right?
>
> 	https://marc.info/?l=linux-block&m=151982993810092&w=2
>
>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset
  2018-03-08  7:54   ` Christoph Hellwig
@ 2018-03-08 10:59     ` Ming Lei
  0 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-08 10:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Don Brace, Kashyap Desai,
	Peter Rivera, Laurence Oberman

On Thu, Mar 08, 2018 at 08:54:43AM +0100, Christoph Hellwig wrote:
> > +	/* 256 tags should be high enough to saturate device */
> > +	int max_queues = DIV_ROUND_UP(h->scsi_host->can_queue, 256);
> > +
> > +	/* per NUMA node hw queue */
> > +	h->scsi_host->nr_hw_queues = min_t(int, nr_node_ids, max_queues);
> 
> I don't think this magic should be in a driver.  The per-node hw_queue
> selection seems like something we'd better do in the core code.

The thing is that the driver may need to know whether multiple queues are
used, so that it can partition its own resources across the hw queues and
improve its .queuecommand and .complete_command paths. That seems to be
what megaraid_sas should do next.
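
A hypothetical illustration of that partitioning (all driver names below
are made up; it needs <linux/blk-mq.h>, <scsi/scsi_cmnd.h> and
<scsi/scsi_host.h>): key per-queue resources off the hw queue a command
was submitted on, instead of using one host-wide structure:

	struct my_hwq_res {
		spinlock_t		lock;		/* per hw queue, not per host */
		struct list_head	free_frames;	/* per hw queue frame pool */
	};

	struct my_host {
		struct my_hwq_res	hwq_res[4];	/* e.g. one per NUMA node */
	};

	static int my_build_and_issue(struct my_hwq_res *res, struct scsi_cmnd *cmd);

	static int my_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
	{
		struct my_host *h = shost_priv(shost);
		u32 hwq = blk_mq_unique_tag_to_hwq(blk_mq_unique_tag(cmd->request));

		/* per-hw-queue lock/frame pool instead of host-wide ones */
		return my_build_and_issue(&h->hwq_res[hwq], cmd);
	}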

> 
> Also the whole idea to use nr_hw_queues for just partitioning tag
> space on hardware that doesn't really support multiple hardware queues
> seems more than odd.

The per-node hw queue is used together with BLK_MQ_F_HOST_TAGS, which is
really about improving the single-queue (single tagset) case. If the
driver/device supports real multiple hw queues, it doesn't need this
approach.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-08 10:04               ` Kashyap Desai
@ 2018-03-08 11:06                 ` Ming Lei
  2018-03-08 11:23                   ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-08 11:06 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Thu, Mar 08, 2018 at 03:34:31PM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Thursday, March 8, 2018 6:46 AM
> > To: Kashyap Desai
> > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> Peter
> > Rivera; Laurence Oberman
> > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> via
> > .host_tagset
> >
> > On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > > > >
> > > > > Also one observation using V3 series patch. I am seeing below
> > > > > Affinity mapping whereas I have only 72 logical CPUs.  It means we
> > > > > are really not going to use all reply queues.
> > > > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply
> > > > > queue is used and that may lead to performance drop as well.
> > > >
> > > > If the mapping is in such shape, I guess it should be quite
> > > > difficult to
> > > figure out
> > > > one perfect way to solve this situation because one reply queue has
> > > > to
> > > handle
> > > > IOs submitted from 4~5 CPUs at average.
> > >
> > > 4.15.0-rc1 kernel has below mapping - I am not sure which commit id in
> "
> > > linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.
> > > It
> >
> > I guess the mapping you posted is read from /proc/irq/126/smp_affinity.
> >
> > If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change IRQ
> affinity
> > code, which is done in irq_create_affinity_masks(), as you saw, no any
> patch
> > in linux_4.16-rc-host-tags-v3.2 touches that code.
> >
> > Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2
> against
> > 4.15-rc1 kernel and see any difference?
> >
> > > will be really good if we can fall back to below mapping once again.
> > > Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random
> > > mapping of CPU - MSIx. And that will be problematic in performance
> run.
> > >
> > > As I posted earlier, latest repo will only allow us to use *18* reply
> >
> > Looks not see this report before, could you share us how you conclude
> that?
> > The only patch changing reply queue is the following one:
> >
> > 	https://marc.info/?l=linux-block&m=151972611911593&w=2
> >
> > But not see any issue in this patch yet, can you recover to 72 reply
> queues
> > after reverting the patch in above link?
> Ming -
> 
> While testing, my system went bad. I debug further and understood that
> affinity mapping was changed due to below commit -
> 84676c1f21e8ff54befe985f4f14dc1edc10046b
> 
> [PATCH] genirq/affinity: assign vectors to all possible CPUs
> 
> Because of above change, we end up using very less reply queue. Many reply
> queues on my setup was mapped to offline/not-available CPUs. This may be
> primary contributing to odd performance impact and it may not be truly due
> to V3/V4 patch series.

Seems a good news, :-)

> 
> I am planning to check your V3 and V4 series after removing above commit
> ID (for performance impact.).

You can run your test on a server in which all CPUs are kept as online
for avoiding this issue.

Or you can apply the following patchset for avoiding this issue:

	https://marc.info/?l=linux-block&m=152050646332092&w=2

> 
> It is good if we spread possible CPUs (instead of online cpus) to all irq
> vectors  considering -  We should have at least *one* online CPU mapped to
> the vector.

Right, that is exactly what the above patchset does.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-08 11:06                 ` Ming Lei
@ 2018-03-08 11:23                   ` Ming Lei
  2018-03-09  6:56                     ` Kashyap Desai
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-08 11:23 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Thu, Mar 08, 2018 at 07:06:25PM +0800, Ming Lei wrote:
> On Thu, Mar 08, 2018 at 03:34:31PM +0530, Kashyap Desai wrote:
> > > -----Original Message-----
> > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > Sent: Thursday, March 8, 2018 6:46 AM
> > > To: Kashyap Desai
> > > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> > Snitzer;
> > > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> > Peter
> > > Rivera; Laurence Oberman
> > > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> > via
> > > .host_tagset
> > >
> > > On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > > > > >
> > > > > > Also one observation using V3 series patch. I am seeing below
> > > > > > Affinity mapping whereas I have only 72 logical CPUs.  It means we
> > > > > > are really not going to use all reply queues.
> > > > > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one reply
> > > > > > queue is used and that may lead to performance drop as well.
> > > > >
> > > > > If the mapping is in such shape, I guess it should be quite
> > > > > difficult to
> > > > figure out
> > > > > one perfect way to solve this situation because one reply queue has
> > > > > to
> > > > handle
> > > > > IOs submitted from 4~5 CPUs at average.
> > > >
> > > > 4.15.0-rc1 kernel has below mapping - I am not sure which commit id in
> > "
> > > > linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to CPU.
> > > > It
> > >
> > > I guess the mapping you posted is read from /proc/irq/126/smp_affinity.
> > >
> > > If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change IRQ
> > affinity
> > > code, which is done in irq_create_affinity_masks(), as you saw, no any
> > patch
> > > in linux_4.16-rc-host-tags-v3.2 touches that code.
> > >
> > > Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2
> > against
> > > 4.15-rc1 kernel and see any difference?
> > >
> > > > will be really good if we can fall back to below mapping once again.
> > > > Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of random
> > > > mapping of CPU - MSIx. And that will be problematic in performance
> > run.
> > > >
> > > > As I posted earlier, latest repo will only allow us to use *18* reply
> > >
> > > Looks not see this report before, could you share us how you conclude
> > that?
> > > The only patch changing reply queue is the following one:
> > >
> > > 	https://marc.info/?l=linux-block&m=151972611911593&w=2
> > >
> > > But not see any issue in this patch yet, can you recover to 72 reply
> > queues
> > > after reverting the patch in above link?
> > Ming -
> > 
> > While testing, my system went bad. I debug further and understood that
> > affinity mapping was changed due to below commit -
> > 84676c1f21e8ff54befe985f4f14dc1edc10046b
> > 
> > [PATCH] genirq/affinity: assign vectors to all possible CPUs
> > 
> > Because of above change, we end up using very less reply queue. Many reply
> > queues on my setup was mapped to offline/not-available CPUs. This may be
> > primary contributing to odd performance impact and it may not be truly due
> > to V3/V4 patch series.
> 
> Seems a good news, :-)
> 
> > 
> > I am planning to check your V3 and V4 series after removing above commit
> > ID (for performance impact.).
> 
> You can run your test on a server in which all CPUs are kept as online
> for avoiding this issue.
> 
> Or you can apply the following patchset for avoiding this issue:
> 
> 	https://marc.info/?l=linux-block&m=152050646332092&w=2

If you want to do this way, all patches have been put into the following
tree(V4):

	https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4

#in reverse order
genirq/affinity: irq vector spread among online CPUs as far as possible
genirq/affinity: support to do irq vectors spread starting from any vector
genirq/affinity: move actual irq vector spread into one helper
genirq/affinity: rename *node_to_possible_cpumask as *node_to_cpumask
scsi: megaraid: improve scsi_mq performance via .host_tagset
scsi: hpsa: improve scsi_mq performance via .host_tagset
block: null_blk: introduce module parameter of 'g_host_tags'
scsi: Add template flag 'host_tagset'
blk-mq: introduce BLK_MQ_F_HOST_TAGS
blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
scsi: avoid to hold host_busy for scsi_mq
scsi: read host_busy via scsi_host_busy()
scsi: introduce scsi_host_busy()
scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
scsi: introduce force_blk_mq
scsi: megaraid_sas: fix selection of reply queue
scsi: hpsa: fix selection of reply queue


Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-07 14:11                     ` Laurence Oberman
@ 2018-03-08 13:42                       ` Ming Lei
  2018-03-08 20:56                         ` Laurence Oberman
  0 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-03-08 13:42 UTC (permalink / raw)
  To: Laurence Oberman
  Cc: Martin K. Petersen, James Bottomley, Don Brace, Jens Axboe,
	linux-block, Christoph Hellwig, Mike Snitzer, linux-scsi,
	Hannes Reinecke, Arun Easi, Omar Sandoval, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

On Wed, Mar 07, 2018 at 09:11:37AM -0500, Laurence Oberman wrote:
> On Tue, 2018-03-06 at 14:24 -0500, Martin K. Petersen wrote:
> > Ming,
> > 
> > > Given both Don and Laurence have verified that patch 1 and patch 2
> > > does fix IO hang, could you consider to merge the two first?
> > 
> > Oh, and I would still need a formal Acked-by: from Don and Tested-by:
> > from Laurence.
> > 
> > Also, for 4.16/scsi-fixes I would prefer verification to be done with
> > just patch 1/8 and none of the subsequent changes in place. Just to
> > make
> > sure we're testing the right thing.
> > 
> > Thanks!
> > 
> 
> Hello Martin
> 
> I tested just Patch 1/8 from the V3 series.
> No issues running workload and no issues booting on the DL380G7.
> Don can you ack this so we can at least get this one in.
> 
> Against: 4.16.0-rc4.v31of8+ on an x86_64
> 
> Tested-by: Laurence Oberman <loberman@redhat.com>

Hi Laurence,

Thanks for your test!

Could you test patch 2 too since you have megaraid_sas controller?

Looks it is better to split the fix patches from the current patchset,
since these fixes should be for V4.16.

Thanks
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-08  8:41       ` Hannes Reinecke
  2018-03-08  9:19         ` Ming Lei
@ 2018-03-08 15:31         ` Bart Van Assche
  1 sibling, 0 replies; 54+ messages in thread
From: Bart Van Assche @ 2018-03-08 15:31 UTC (permalink / raw)
  To: hch, hare, ming.lei
  Cc: linux-block, hch, snitzer, martin.petersen, axboe, mroos,
	linux-scsi, don.brace, james.bottomley, arun.easi, osandov,
	loberman, kashyap.desai, peter.rivera

On Thu, 2018-03-08 at 09:41 +0100, Hannes Reinecke wrote:
> IE the _entire_ request set is allocated as _one_ array, making it quite
> hard to handle from the lower-level CPU caches.
> Also the 'node' indicator doesn't really help us here, as the requests
> have to be access by all CPUs in the shared tag case.
> 
> Would it be possible move tags->rqs to become a _double_ pointer?
> Then we would have only a shared lookup table, but the requests
> themselves can be allocated per node, depending on the CPU map.
> _And_ it should be easier on the CPU cache ...

That is one possible solution. Another solution is to follow the approach
from sbitmap: allocate a single array that is slightly larger than needed
and use the elements in such a way that no two CPUs use the same cache
line.

Bart.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue
  2018-03-08 13:42                       ` Ming Lei
@ 2018-03-08 20:56                         ` Laurence Oberman
  0 siblings, 0 replies; 54+ messages in thread
From: Laurence Oberman @ 2018-03-08 20:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Martin K. Petersen, James Bottomley, Don Brace, Jens Axboe,
	linux-block, Christoph Hellwig, Mike Snitzer, linux-scsi,
	Hannes Reinecke, Arun Easi, Omar Sandoval, Christoph Hellwig,
	Kashyap Desai, Peter Rivera, Meelis Roos

On Thu, 2018-03-08 at 21:42 +0800, Ming Lei wrote:
> On Wed, Mar 07, 2018 at 09:11:37AM -0500, Laurence Oberman wrote:
> > On Tue, 2018-03-06 at 14:24 -0500, Martin K. Petersen wrote:
> > > Ming,
> > > 
> > > > Given both Don and Laurence have verified that patch 1 and
> > > > patch 2
> > > > does fix IO hang, could you consider to merge the two first?
> > > 
> > > Oh, and I would still need a formal Acked-by: from Don and
> > > Tested-by:
> > > from Laurence.
> > > 
> > > Also, for 4.16/scsi-fixes I would prefer verification to be done
> > > with
> > > just patch 1/8 and none of the subsequent changes in place. Just
> > > to
> > > make
> > > sure we're testing the right thing.
> > > 
> > > Thanks!
> > > 
> > 
> > Hello Martin
> > 
> > I tested just Patch 1/8 from the V3 series.
> > No issues running workload and no issues booting on the DL380G7.
> > Don can you ack this so we can at least get this one in.
> > 
> > Against: 4.16.0-rc4.v31of8+ on an x86_64
> > 
> > Tested-by: Laurence Oberman <loberman@redhat.com>
> 
> Hi Laurence,
> 
> Thanks for your test!
> 
> Could you test patch 2 too since you have megaraid_sas controller?
> 
> Looks it is better to split the fix patches from the current
> patchset,
> since these fixes should be for V4.16.
> 
> Thanks
> Ming

Hello Ming, I see a V4 now, so I am going to wait until you split these and
then I will test both HPSA and megaraid_sas once Kashyap agrees.

When I see a V4 show up with the split, I will pull it and act on it.

Thanks
Laurence

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-08 11:23                   ` Ming Lei
@ 2018-03-09  6:56                     ` Kashyap Desai
  2018-03-09  8:13                       ` Ming Lei
  0 siblings, 1 reply; 54+ messages in thread
From: Kashyap Desai @ 2018-03-09  6:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Thursday, March 8, 2018 4:54 PM
> To: Kashyap Desai
> Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
Snitzer;
> linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
Peter
> Rivera; Laurence Oberman
> Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
via
> .host_tagset
>
> On Thu, Mar 08, 2018 at 07:06:25PM +0800, Ming Lei wrote:
> > On Thu, Mar 08, 2018 at 03:34:31PM +0530, Kashyap Desai wrote:
> > > > -----Original Message-----
> > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > Sent: Thursday, March 8, 2018 6:46 AM
> > > > To: Kashyap Desai
> > > > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig;
> > > > Mike
> > > Snitzer;
> > > > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar
> > > > Sandoval; Martin K . Petersen; James Bottomley; Christoph Hellwig;
> > > > Don Brace;
> > > Peter
> > > > Rivera; Laurence Oberman
> > > > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq
> > > > performance
> > > via
> > > > .host_tagset
> > > >
> > > > On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > > > > > >
> > > > > > > Also one observation using V3 series patch. I am seeing
> > > > > > > below Affinity mapping whereas I have only 72 logical CPUs.
> > > > > > > It means we are really not going to use all reply queues.
> > > > > > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one
> > > > > > > reply queue is used and that may lead to performance drop as
well.
> > > > > >
> > > > > > If the mapping is in such shape, I guess it should be quite
> > > > > > difficult to
> > > > > figure out
> > > > > > one perfect way to solve this situation because one reply
> > > > > > queue has to
> > > > > handle
> > > > > > IOs submitted from 4~5 CPUs at average.
> > > > >
> > > > > 4.15.0-rc1 kernel has below mapping - I am not sure which commit
> > > > > id in
> > > "
> > > > > linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to
CPU.
> > > > > It
> > > >
> > > > I guess the mapping you posted is read from
/proc/irq/126/smp_affinity.
> > > >
> > > > If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change
> > > > IRQ
> > > affinity
> > > > code, which is done in irq_create_affinity_masks(), as you saw, no
> > > > any
> > > patch
> > > > in linux_4.16-rc-host-tags-v3.2 touches that code.
> > > >
> > > > Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2
> > > against
> > > > 4.15-rc1 kernel and see any difference?
> > > >
> > > > > will be really good if we can fall back to below mapping once
again.
> > > > > Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of
> > > > > random mapping of CPU - MSIx. And that will be problematic in
> > > > > performance
> > > run.
> > > > >
> > > > > As I posted earlier, latest repo will only allow us to use *18*
> > > > > reply
> > > >
> > > > Looks not see this report before, could you share us how you
> > > > conclude
> > > that?
> > > > The only patch changing reply queue is the following one:
> > > >
> > > > 	https://marc.info/?l=linux-block&m=151972611911593&w=2
> > > >
> > > > But not see any issue in this patch yet, can you recover to 72
> > > > reply
> > > queues
> > > > after reverting the patch in above link?
> > > Ming -
> > >
> > > While testing, my system went bad. I debug further and understood
> > > that affinity mapping was changed due to below commit -
> > > 84676c1f21e8ff54befe985f4f14dc1edc10046b
> > >
> > > [PATCH] genirq/affinity: assign vectors to all possible CPUs
> > >
> > > Because of above change, we end up using very less reply queue. Many
> > > reply queues on my setup was mapped to offline/not-available CPUs.
> > > This may be primary contributing to odd performance impact and it
> > > may not be truly due to V3/V4 patch series.
> >
> > Seems a good news, :-)
> >
> > >
> > > I am planning to check your V3 and V4 series after removing above
> > > commit ID (for performance impact.).
> >
> > You can run your test on a server in which all CPUs are kept as online
> > for avoiding this issue.
> >
> > Or you can apply the following patchset for avoiding this issue:
> >
> > 	https://marc.info/?l=linux-block&m=152050646332092&w=2
>
> If you want to do this way, all patches have been put into the following
> tree(V4):
>
> 	https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4

I tested the V4 commits above. The IRQ-CPU mapping now has at least one
online CPU per vector, as explained in the patch "genirq/affinity: irq
vector spread among online CPUs as far as possible".
The new IRQ-CPU mapping looks better.
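
To illustrate the idea behind that patch (a toy sketch only, not the
kernel's actual irq_create_affinity_masks() algorithm, which also groups
CPUs by NUMA node and sibling threads): every vector is given at least one
online CPU first, and the remaining possible-but-offline CPUs are spread
on top, e.g. with a simple round-robin:

# Toy model only -- NOT the kernel's real spreading code.
def toy_spread(online_cpus, offline_possible_cpus, nr_vectors):
    vectors = [[] for _ in range(nr_vectors)]
    # Pass 1: round-robin the online CPUs so no vector is left without one
    # (assuming nr_vectors <= number of online CPUs).
    for i, cpu in enumerate(sorted(online_cpus)):
        vectors[i % nr_vectors].append(cpu)
    # Pass 2: spread the possible-but-offline CPUs on top.
    for i, cpu in enumerate(sorted(offline_possible_cpus)):
        vectors[i % nr_vectors].append(cpu)
    return vectors

# 72 online CPUs (0-71), 224 offline-possible CPUs (72-295), 72 vectors:
# vector 0 comes out as [0, 72, 144, 216, 288] -- one online CPU plus
# offline ones, which is the property the mapping below also has.
print(toy_spread(range(72), range(72, 296), 72)[0])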

Below is the irq/cpu mapping (I have 0-71 as online logical CPUs).
Kernel version:
Linux rhel7.3 4.16.0-rc4+ #1 SMP Thu Mar 8 10:51:56 EST 2018 x86_64 x86_64
x86_64 GNU/Linux
PCI name is 86:00.0, dump its irq affinity:
irq 218, cpu list 0,72,74,76,78
irq 219, cpu list 1,80,82,84,86
irq 220, cpu list 2,88,90,92,94
irq 221, cpu list 3,96,98,100,102
irq 222, cpu list 4,104,106,108
irq 223, cpu list 5,110,112,114
irq 224, cpu list 6,116,118,120
irq 225, cpu list 7,122,124,126
irq 226, cpu list 8,128,130,132
irq 227, cpu list 9,134,136,138
irq 228, cpu list 10,140,142,144
irq 229, cpu list 11,146,148,150
irq 230, cpu list 12,152,154,156
irq 231, cpu list 13,158,160,162
irq 232, cpu list 14,164,166,168
irq 233, cpu list 15,170,172,174
irq 234, cpu list 16,176,178,180
irq 235, cpu list 17,182,184,186
irq 236, cpu list 36,188,190,192
irq 237, cpu list 37,194,196,198
irq 238, cpu list 38,200,202,204
irq 239, cpu list 39,206,208,210
irq 240, cpu list 40,212,214,216
irq 241, cpu list 41,218,220,222
irq 242, cpu list 42,224,226,228
irq 243, cpu list 43,230,232,234
irq 244, cpu list 44,236,238,240
irq 245, cpu list 45,242,244,246
irq 246, cpu list 46,248,250,252
irq 247, cpu list 47,254,256,258
irq 248, cpu list 48,260,262,264
irq 249, cpu list 49,266,268,270
irq 250, cpu list 50,272,274,276
irq 251, cpu list 51,278,280,282
irq 252, cpu list 52,284,286,288
irq 253, cpu list 53,290,292,294
irq 254, cpu list 18,73,75,77,79
irq 255, cpu list 19,81,83,85,87
irq 256, cpu list 20,89,91,93,95
irq 257, cpu list 21,97,99,101,103
irq 258, cpu list 22,105,107,109
irq 259, cpu list 23,111,113,115
irq 260, cpu list 24,117,119,121
irq 261, cpu list 25,123,125,127
irq 262, cpu list 26,129,131,133
irq 263, cpu list 27,135,137,139
irq 264, cpu list 28,141,143,145
irq 265, cpu list 29,147,149,151
irq 266, cpu list 30,153,155,157
irq 267, cpu list 31,159,161,163
irq 268, cpu list 32,165,167,169
irq 269, cpu list 33,171,173,175
irq 270, cpu list 34,177,179,181
irq 271, cpu list 35,183,185,187
irq 272, cpu list 54,189,191,193
irq 273, cpu list 55,195,197,199
irq 274, cpu list 56,201,203,205
irq 275, cpu list 57,207,209,211
irq 276, cpu list 58,213,215,217
irq 277, cpu list 59,219,221,223
irq 278, cpu list 60,225,227,229
irq 279, cpu list 61,231,233,235
irq 280, cpu list 62,237,239,241
irq 281, cpu list 63,243,245,247
irq 282, cpu list 64,249,251,253
irq 283, cpu list 65,255,257,259
irq 284, cpu list 66,261,263,265
irq 285, cpu list 67,267,269,271
irq 286, cpu list 68,273,275,277
irq 287, cpu list 69,279,281,283
irq 288, cpu list 70,285,287,289
irq 289, cpu list 71,291,293,295

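For reference, an irq -> CPU list table like the one above can also be
pulled from userspace with a short script along these lines (a sketch
only; it assumes the device exposes its vectors under
/sys/bus/pci/devices/<BDF>/msi_irqs and uses the full BDF "0000:86:00.0"):

#!/usr/bin/env python
# Sketch: dump the irq -> cpu-list mapping of one PCI device.
import os, sys

bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:86:00.0"
msi_dir = "/sys/bus/pci/devices/%s/msi_irqs" % bdf

print("PCI name is %s, dump its irq affinity:" % bdf)
for irq in sorted(int(x) for x in os.listdir(msi_dir)):
    # smp_affinity_list gives the CPU list, e.g. "0,72,74,76,78"
    with open("/proc/irq/%d/smp_affinity_list" % irq) as f:
        print("irq %d, cpu list %s" % (irq, f.read().strip()))
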
The high-level result of the performance run is that performance is
unchanged: no improvement and no degradation is observed with the V4
series. For this particular patch - "[PATCH V3 8/8] scsi: megaraid:
improve scsi_mq performance via .host_tagset" - we want to review the
performance numbers in our lab before it is committed upstream.

Please drop this particular patch from the series for now, mainly because
it is tied to performance and no immediate improvement is observed. We
want to review comprehensive results first.

Regarding the patch "[PATCH V3 2/8] scsi: megaraid_sas: fix selection of
reply queue", I see one issue; I have replied separately for that patch.
I see you have posted "[PATCH V4 2/4] scsi: megaraid_sas: fix selection of
reply queue", so let me use the new thread for my reply.



>
> #in reverse order
> genirq/affinity: irq vector spread among online CPUs as far as possible
> genirq/affinity: support to do irq vectors spread starting from any
vector
> genirq/affinity: move actual irq vector spread into one helper
> genirq/affinity: rename *node_to_possible_cpumask as *node_to_cpumask
> scsi: megaraid: improve scsi_mq performance via .host_tagset
> scsi: hpsa: improve scsi_mq performance via .host_tagset
> block: null_blk: introduce module parameter of 'g_host_tags'
> scsi: Add template flag 'host_tagset'
> blk-mq: introduce BLK_MQ_F_HOST_TAGS
> blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags'
> scsi: avoid to hold host_busy for scsi_mq
> scsi: read host_busy via scsi_host_busy()
> scsi: introduce scsi_host_busy()
> scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
> scsi: introduce force_blk_mq
> scsi: megaraid_sas: fix selection of reply queue
> scsi: hpsa: fix selection of reply queue
>
>
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance via .host_tagset
  2018-03-09  6:56                     ` Kashyap Desai
@ 2018-03-09  8:13                       ` Ming Lei
  0 siblings, 0 replies; 54+ messages in thread
From: Ming Lei @ 2018-03-09  8:13 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Mike Snitzer,
	linux-scsi, Hannes Reinecke, Arun Easi, Omar Sandoval,
	Martin K . Petersen, James Bottomley, Christoph Hellwig,
	Don Brace, Peter Rivera, Laurence Oberman

On Fri, Mar 09, 2018 at 12:26:57PM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Thursday, March 8, 2018 4:54 PM
> > To: Kashyap Desai
> > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig; Mike
> Snitzer;
> > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar Sandoval;
> > Martin K . Petersen; James Bottomley; Christoph Hellwig; Don Brace;
> Peter
> > Rivera; Laurence Oberman
> > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq performance
> via
> > .host_tagset
> >
> > On Thu, Mar 08, 2018 at 07:06:25PM +0800, Ming Lei wrote:
> > > On Thu, Mar 08, 2018 at 03:34:31PM +0530, Kashyap Desai wrote:
> > > > > -----Original Message-----
> > > > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > > > Sent: Thursday, March 8, 2018 6:46 AM
> > > > > To: Kashyap Desai
> > > > > Cc: Jens Axboe; linux-block@vger.kernel.org; Christoph Hellwig;
> > > > > Mike
> > > > Snitzer;
> > > > > linux-scsi@vger.kernel.org; Hannes Reinecke; Arun Easi; Omar
> > > > > Sandoval; Martin K . Petersen; James Bottomley; Christoph Hellwig;
> > > > > Don Brace;
> > > > Peter
> > > > > Rivera; Laurence Oberman
> > > > > Subject: Re: [PATCH V3 8/8] scsi: megaraid: improve scsi_mq
> > > > > performance
> > > > via
> > > > > .host_tagset
> > > > >
> > > > > On Wed, Mar 07, 2018 at 10:58:34PM +0530, Kashyap Desai wrote:
> > > > > > > >
> > > > > > > > Also one observation using V3 series patch. I am seeing
> > > > > > > > below Affinity mapping whereas I have only 72 logical CPUs.
> > > > > > > > It means we are really not going to use all reply queues.
> > > > > > > > e.a If I bind fio jobs on CPU 18-20, I am seeing only one
> > > > > > > > reply queue is used and that may lead to performance drop as
> well.
> > > > > > >
> > > > > > > If the mapping is in such shape, I guess it should be quite
> > > > > > > difficult to
> > > > > > figure out
> > > > > > > one perfect way to solve this situation because one reply
> > > > > > > queue has to
> > > > > > handle
> > > > > > > IOs submitted from 4~5 CPUs at average.
> > > > > >
> > > > > > 4.15.0-rc1 kernel has below mapping - I am not sure which commit
> > > > > > id in
> > > > "
> > > > > > linux_4.16-rc-host-tags-v3.2" is changing the mapping of IRQ to
> CPU.
> > > > > > It
> > > > >
> > > > > I guess the mapping you posted is read from
> /proc/irq/126/smp_affinity.
> > > > >
> > > > > If yes, no any patch in linux_4.16-rc-host-tags-v3.2 should change
> > > > > IRQ
> > > > affinity
> > > > > code, which is done in irq_create_affinity_masks(), as you saw, no
> > > > > any
> > > > patch
> > > > > in linux_4.16-rc-host-tags-v3.2 touches that code.
> > > > >
> > > > > Could you simply apply the patches in linux_4.16-rc-host-tags-v3.2
> > > > against
> > > > > 4.15-rc1 kernel and see any difference?
> > > > >
> > > > > > will be really good if we can fall back to below mapping once
> again.
> > > > > > Current repo linux_4.16-rc-host-tags-v3.2 is giving lots of
> > > > > > random mapping of CPU - MSIx. And that will be problematic in
> > > > > > performance
> > > > run.
> > > > > >
> > > > > > As I posted earlier, latest repo will only allow us to use *18*
> > > > > > reply
> > > > >
> > > > > Looks not see this report before, could you share us how you
> > > > > conclude
> > > > that?
> > > > > The only patch changing reply queue is the following one:
> > > > >
> > > > > 	https://marc.info/?l=linux-block&m=151972611911593&w=2
> > > > >
> > > > > But not see any issue in this patch yet, can you recover to 72
> > > > > reply
> > > > queues
> > > > > after reverting the patch in above link?
> > > > Ming -
> > > >
> > > > While testing, my system went bad. I debug further and understood
> > > > that affinity mapping was changed due to below commit -
> > > > 84676c1f21e8ff54befe985f4f14dc1edc10046b
> > > >
> > > > [PATCH] genirq/affinity: assign vectors to all possible CPUs
> > > >
> > > > Because of above change, we end up using very less reply queue. Many
> > > > reply queues on my setup was mapped to offline/not-available CPUs.
> > > > This may be primary contributing to odd performance impact and it
> > > > may not be truly due to V3/V4 patch series.
> > >
> > > Seems a good news, :-)
> > >
> > > >
> > > > I am planning to check your V3 and V4 series after removing above
> > > > commit ID (for performance impact.).
> > >
> > > You can run your test on a server in which all CPUs are kept as online
> > > for avoiding this issue.
> > >
> > > Or you can apply the following patchset for avoiding this issue:
> > >
> > > 	https://marc.info/?l=linux-block&m=152050646332092&w=2
> >
> > If you want to do this way, all patches have been put into the following
> > tree(V4):
> >
> > 	https://github.com/ming1/linux/commits/v4.16-rc-host-tags-v4
> 
> Tested above V4 commits. Now, IRQ - CPU mapping has at least one online
> CPU as explained in patch " genirq/affinity: irq vector spread among
> online CPUs as far as possible".
> New IRQ-CPU mapping looks better.
> 
> Below is irq/cpu mapping. ( I have 0-71 online logical CPUs)
> Kernel version:
> Linux rhel7.3 4.16.0-rc4+ #1 SMP Thu Mar 8 10:51:56 EST 2018 x86_64 x86_64
> x86_64 GNU/Linux
> PCI name is 86:00.0, dump its irq affinity:
> [... 72 "irq N, cpu list ..." lines snipped; the full listing appears in
> the previous message above ...]
> 
> High level result on performance run is - Performance is unchanged. No
> improvement and No degradation is observe using V4 series.
> For this particular patch - " [PATCH V3 8/8] scsi: megaraid: improve
> scsi_mq performance via .host_tagset"
> We want to review performance number in our lab before commit to upstream.

OK, thanks for testing V4.

In theory, it should work. Today I just got a new server (dual socket,
32 cores); using 'host_tags' and two hw queues (one per node) improves
IOPS by >50% (from 950K to 1500K) with V4, compared with a single hw queue.

In your test, maybe the following things need to be checked:

1) whether each per-node hw queue is mapped to online CPUs that all belong
to the same NUMA node (a checking sketch follows after this list).

2) if 1) is yes (it should be), maybe 1600K is the top performance of
these SSDs; then CPU utilization needs to be checked, since this patchset
should decrease CPU usage in this situation too.
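
For 1), one way to check it (just a sketch; it assumes blk-mq sysfs is
enabled and uses a placeholder block device name such as "sdb") is to
compare each hw queue's cpu_list with the per-node cpulist:

#!/usr/bin/env python
# Sketch: report which NUMA node(s) each blk-mq hw queue (hctx) covers.
# A per-node hw queue should map to CPUs of exactly one node.
import glob, os, sys

def parse_cpulist(text):
    # Accepts "0-17,36-53" as well as "0, 72, 74" style lists.
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def read_cpulist(path):
    with open(path) as f:
        return parse_cpulist(f.read())

dev = sys.argv[1] if len(sys.argv) > 1 else "sdb"   # placeholder device

node_cpus = {}
for path in glob.glob("/sys/devices/system/node/node[0-9]*/cpulist"):
    node = os.path.basename(os.path.dirname(path))   # e.g. "node0"
    node_cpus[node] = read_cpulist(path)

for path in sorted(glob.glob("/sys/block/%s/mq/*/cpu_list" % dev)):
    hctx = os.path.basename(os.path.dirname(path))   # hctx index
    cpus = read_cpulist(path)
    nodes = sorted(n for n, ncpus in node_cpus.items() if cpus & ncpus)
    # One node per hctx is what we want; more than one means the hw queue
    # crosses NUMA nodes.
    print("hctx %s: %d cpus -> %s" % (hctx, len(cpus), ", ".join(nodes)))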


thanks, 
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2018-03-09  8:13 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-27 10:07 [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Ming Lei
2018-02-27 10:07 ` [PATCH V3 1/8] scsi: hpsa: fix selection of reply queue Ming Lei
2018-03-01 16:18   ` Don Brace
2018-03-01 19:01     ` Laurence Oberman
2018-03-01 21:19       ` Laurence Oberman
2018-03-02  2:16         ` Ming Lei
2018-03-02 14:09           ` Laurence Oberman
2018-03-02 15:03             ` Don Brace
2018-03-02 21:53               ` Laurence Oberman
2018-03-05  2:07                 ` Ming Lei
2018-03-06 17:55                   ` Martin K. Petersen
2018-03-06 19:24                   ` Martin K. Petersen
2018-03-07  0:00                     ` Ming Lei
2018-03-07  3:14                       ` Martin K. Petersen
2018-03-07 14:11                     ` Laurence Oberman
2018-03-08 13:42                       ` Ming Lei
2018-03-08 20:56                         ` Laurence Oberman
2018-03-05  7:23                 ` Kashyap Desai
2018-03-05 14:35                   ` Don Brace
2018-03-05 15:19                   ` Mike Snitzer
2018-03-02  0:47     ` Ming Lei
2018-03-08  7:50   ` Christoph Hellwig
2018-03-08  8:15     ` Ming Lei
2018-03-08  8:41       ` Hannes Reinecke
2018-03-08  9:19         ` Ming Lei
2018-03-08 15:31         ` Bart Van Assche
2018-02-27 10:07 ` [PATCH V3 2/8] scsi: megaraid_sas: " Ming Lei
2018-02-27 10:07 ` [PATCH V3 3/8] blk-mq: introduce 'start_tag' field to 'struct blk_mq_tags' Ming Lei
2018-03-08  7:51   ` Christoph Hellwig
2018-02-27 10:07 ` [PATCH V3 4/8] blk-mq: introduce BLK_MQ_F_HOST_TAGS Ming Lei
2018-03-08  7:52   ` Christoph Hellwig
2018-03-08  9:35     ` Ming Lei
2018-02-27 10:07 ` [PATCH V3 5/8] scsi: Add template flag 'host_tagset' Ming Lei
2018-02-27 10:07 ` [PATCH V3 6/8] block: null_blk: introduce module parameter of 'g_host_tags' Ming Lei
2018-02-27 10:07 ` [PATCH V3 7/8] scsi: hpsa: improve scsi_mq performance via .host_tagset Ming Lei
2018-03-08  7:54   ` Christoph Hellwig
2018-03-08 10:59     ` Ming Lei
2018-02-27 10:07 ` [PATCH V3 8/8] scsi: megaraid: " Ming Lei
2018-02-28 14:58   ` Kashyap Desai
2018-02-28 15:21     ` Ming Lei
2018-02-28 16:22       ` Laurence Oberman
2018-03-01  5:24         ` Kashyap Desai
2018-03-01  7:58           ` Ming Lei
2018-03-07  5:27     ` Ming Lei
2018-03-07 15:01       ` Kashyap Desai
2018-03-07 16:05         ` Ming Lei
2018-03-07 17:28           ` Kashyap Desai
2018-03-08  1:15             ` Ming Lei
2018-03-08 10:04               ` Kashyap Desai
2018-03-08 11:06                 ` Ming Lei
2018-03-08 11:23                   ` Ming Lei
2018-03-09  6:56                     ` Kashyap Desai
2018-03-09  8:13                       ` Ming Lei
2018-03-01 21:46 ` [PATCH V3 0/8] blk-mq & scsi: fix reply queue selection and improve host wide tagset Laurence Oberman
