* [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue
@ 2018-03-09  3:32 Ming Lei
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Ming Lei @ 2018-03-09  3:32 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Ming Lei

Hi All,

The patches fix reply queue (virt-queue on virtio-scsi) selection on hpsa,
megaraid_sas and virtio-scsi; this issue can easily cause IO hangs.

The issue is triggered by 84676c1f21e8 ("genirq/affinity: assign vectors
to all possible CPUs"). After 84676c1f21e8, it is easy to end up with one
msix vector mapped only to offline CPUs. If the reply queue is selected
from all allocated msix vectors (reply queues) in a round-robin way, the
selected reply queue may not have any online CPU mapped, and an IO hang
results.

Both hpsa and megaraid_sas use a host-wide tagset, so we can't convert the
reply queues to blk-mq hw queues directly, otherwise IO performance degrades
badly according to Kashyap's test. Instead, this patchset sets up one mapping
table for selecting the reply queue, an approach already used by mpt3sas.
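
As a sketch, the idea is to build a per-CPU table once at init time from the
irq affinity masks, and index it with the submitting CPU instead of the old
'raw_smp_processor_id() % nr_queues' style round-robin (names below are
illustrative only; the real code is in the hpsa/megaraid_sas patches):

	static void setup_reply_map(struct pci_dev *pdev,
				    unsigned int *reply_map,
				    unsigned int nr_queues)
	{
		const struct cpumask *mask;
		unsigned int queue, cpu;

		for (queue = 0; queue < nr_queues; queue++) {
			mask = pci_irq_get_affinity(pdev, queue);
			if (!mask)
				goto fallback;
			/* every CPU in this vector's mask posts to 'queue' */
			for_each_cpu(cpu, mask)
				reply_map[cpu] = queue;
		}
		return;

	fallback:
		/* no affinity info available: use reply queue 0 for all CPUs */
		for_each_possible_cpu(cpu)
			reply_map[cpu] = 0;
	}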

For virtio-scsi, the virt-queue really is a hw queue from the blk-mq point
of view, so we introduce 'force_blk_mq' to fix this issue because: 1)
virtio-blk has been used for years in blk-mq mode; 2) we have discussed
recently that scsi_mq will be enabled by default soon.

gitweb:
	https://github.com/ming1/linux/tree/v4.16-rc-select-reply-queue-fix-V4

V4:
	- split from previous patchset
	- handle virtio-scsi by force_blk_mq

Ming Lei (4):
  scsi: hpsa: fix selection of reply queue
  scsi: megaraid_sas: fix selection of reply queue
  scsi: introduce force_blk_mq
  scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity

 drivers/scsi/hosts.c                        |  1 +
 drivers/scsi/hpsa.c                         | 73 +++++++++++++++++++++--------
 drivers/scsi/hpsa.h                         |  1 +
 drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
 drivers/scsi/megaraid/megaraid_sas_base.c   | 34 +++++++++++++-
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++---
 drivers/scsi/virtio_scsi.c                  | 59 ++---------------------
 include/scsi/scsi_host.h                    |  3 ++
 8 files changed, 100 insertions(+), 85 deletions(-)

-- 
2.9.5


* [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
@ 2018-03-09  3:32 ` Ming Lei
  2018-03-09 22:14   ` Don Brace
                     ` (3 more replies)
  2018-03-09  3:32 ` [PATCH V4 2/4] scsi: megaraid_sas: " Ming Lei
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 21+ messages in thread
From: Ming Lei @ 2018-03-09  3:32 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Ming Lei, Hannes Reinecke, James Bottomley, Artem Bityutskiy

From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
one msix vector can be created without any online CPU mapped, so one
command's completion may never be notified.

This patch sets up a mapping between CPUs and reply queues according to the
irq affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
table to choose the reply queue for queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes the IO
hang caused by using an inactive reply queue that has no online CPU mapped.

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++--------------
 drivers/scsi/hpsa.h |  1 +
 2 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 5293e6827ce5..3a9eca163db8 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
 		c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
 		if (unlikely(!h->msix_vectors))
 			return;
-		if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-			c->Header.ReplyQueue =
-				raw_smp_processor_id() % h->nreply_queues;
-		else
-			c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+		c->Header.ReplyQueue = reply_queue;
 	}
 }
 
@@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->ReplyQueue = reply_queue % h->nreply_queues;
+	cp->ReplyQueue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit (bit 0)
@@ -1087,10 +1080,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
 	/* Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/* Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
 	 *  - pull count (bits 0-3)
@@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
 	 * Tell the controller to post the reply to the queue for this
 	 * processor.  This seems to give the best I/O throughput.
 	 */
-	if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-		cp->reply_queue = smp_processor_id() % h->nreply_queues;
-	else
-		cp->reply_queue = reply_queue % h->nreply_queues;
+	cp->reply_queue = reply_queue;
 	/*
 	 * Set the bits in the address sent down to include:
 	 *  - performant mode bit not used in ioaccel mode 2
@@ -1157,6 +1144,8 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 {
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
+
+	reply_queue = h->reply_map[raw_smp_processor_id()];
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -7376,6 +7365,26 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
 	h->msix_vectors = 0;
 }
 
+static void hpsa_setup_reply_map(struct ctlr_info *h)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < h->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(h->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			h->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		h->reply_map[cpu] = 0;
+}
+
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info *h)
 	err = hpsa_interrupt_mode(h);
 	if (err)
 		goto clean1;
+
+	/* setup mapping between CPU and reply queue */
+	hpsa_setup_reply_map(h);
+
 	err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
 	if (err)
 		goto clean2;	/* intmode+region, pci */
@@ -8480,6 +8493,28 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
 	return wq;
 }
 
+static void hpda_free_ctlr_info(struct ctlr_info *h)
+{
+	kfree(h->reply_map);
+	kfree(h);
+}
+
+static struct ctlr_info *hpda_alloc_ctlr_info(void)
+{
+	struct ctlr_info *h;
+
+	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	if (!h)
+		return NULL;
+
+	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
+	if (!h->reply_map) {
+		kfree(h);
+		return NULL;
+	}
+	return h;
+}
+
 static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	int dac, rc;
@@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	 * the driver.  See comments in hpsa.h for more info.
 	 */
 	BUILD_BUG_ON(sizeof(struct CommandList) % COMMANDLIST_ALIGNMENT);
-	h = kzalloc(sizeof(*h), GFP_KERNEL);
+	h = hpda_alloc_ctlr_info();
 	if (!h) {
 		dev_err(&pdev->dev, "Failed to allocate controller head\n");
 		return -ENOMEM;
@@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev *pdev)
 	h->lockup_detected = NULL;			/* init_one 2 */
 	/* (void) pci_disable_pcie_error_reporting(pdev); */	/* init_one 1 */
 
-	kfree(h);					/* init_one 1 */
+	hpda_free_ctlr_info(h);				/* init_one 1 */
 }
 
 static int hpsa_suspend(__attribute__((unused)) struct pci_dev *pdev,
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 018f980a701c..fb9f5e7f8209 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -158,6 +158,7 @@ struct bmic_controller_parameters {
 #pragma pack()
 
 struct ctlr_info {
+	unsigned int *reply_map;
 	int	ctlr;
 	char	devname[8];
 	char    *product_name;
-- 
2.9.5


* [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply queue
  2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
@ 2018-03-09  3:32 ` Ming Lei
  2018-03-09 11:07   ` Kashyap Desai
  2018-03-09  3:32 ` [PATCH V4 3/4] scsi: introduce force_blk_mq Ming Lei
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-03-09  3:32 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Ming Lei, Hannes Reinecke, James Bottomley, Artem Bityutskiy

From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
one msix vector can be created without any online CPU mapped, so a command
may be queued there and never be notified of its completion.

This patch sets up a mapping between CPUs and reply queues according to the
irq affinity info retrieved by pci_irq_get_affinity(), and uses this info
to choose the reply queue for queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes the IO
hang caused by using an inactive reply queue that has no online CPU mapped.

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
 drivers/scsi/megaraid/megaraid_sas_base.c   | 34 ++++++++++++++++++++++++++++-
 drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++++------
 3 files changed, 38 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas.h b/drivers/scsi/megaraid/megaraid_sas.h
index ba6503f37756..a644d2be55b6 100644
--- a/drivers/scsi/megaraid/megaraid_sas.h
+++ b/drivers/scsi/megaraid/megaraid_sas.h
@@ -2127,7 +2127,7 @@ enum MR_PD_TYPE {
 #define MR_NVME_PAGE_SIZE_MASK		0x000000FF
 
 struct megasas_instance {
-
+	unsigned int *reply_map;
 	__le32 *producer;
 	dma_addr_t producer_h;
 	__le32 *consumer;
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index a71ee67df084..065956cb2aeb 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -5165,6 +5165,26 @@ megasas_setup_jbod_map(struct megasas_instance *instance)
 		instance->use_seqnum_jbod_fp = false;
 }
 
+static void megasas_setup_reply_map(struct megasas_instance *instance)
+{
+	const struct cpumask *mask;
+	unsigned int queue, cpu;
+
+	for (queue = 0; queue < instance->msix_vectors; queue++) {
+		mask = pci_irq_get_affinity(instance->pdev, queue);
+		if (!mask)
+			goto fallback;
+
+		for_each_cpu(cpu, mask)
+			instance->reply_map[cpu] = queue;
+	}
+	return;
+
+fallback:
+	for_each_possible_cpu(cpu)
+		instance->reply_map[cpu] = 0;
+}
+
 /**
  * megasas_init_fw -	Initializes the FW
  * @instance:		Adapter soft state
@@ -5343,6 +5363,8 @@ static int megasas_init_fw(struct megasas_instance *instance)
 			goto fail_setup_irqs;
 	}
 
+	megasas_setup_reply_map(instance);
+
 	dev_info(&instance->pdev->dev,
 		"firmware supports msix\t: (%d)", fw_msix_count);
 	dev_info(&instance->pdev->dev,
@@ -6448,6 +6470,11 @@ static int megasas_probe_one(struct pci_dev *pdev,
 	memset(instance, 0, sizeof(*instance));
 	atomic_set(&instance->fw_reset_no_pci_access, 0);
 
+	instance->reply_map = kzalloc(sizeof(unsigned int) * nr_cpu_ids,
+			GFP_KERNEL);
+	if (!instance->reply_map)
+		goto fail_alloc_reply_map;
+
 	/*
 	 * Initialize PCI related and misc parameters
 	 */
@@ -6539,8 +6566,9 @@ static int megasas_probe_one(struct pci_dev *pdev,
 	if (instance->msix_vectors)
 		pci_free_irq_vectors(instance->pdev);
 fail_init_mfi:
+	kfree(instance->reply_map);
+fail_alloc_reply_map:
 	scsi_host_put(host);
-
 fail_alloc_instance:
 	pci_disable_device(pdev);
 
@@ -6746,6 +6774,8 @@ megasas_resume(struct pci_dev *pdev)
 	if (rval < 0)
 		goto fail_reenable_msix;
 
+	megasas_setup_reply_map(instance);
+
 	if (instance->adapter_type != MFI_SERIES) {
 		megasas_reset_reply_desc(instance);
 		if (megasas_ioc_init_fusion(instance)) {
@@ -6960,6 +6990,8 @@ static void megasas_detach_one(struct pci_dev *pdev)
 
 	megasas_free_ctrl_mem(instance);
 
+	kfree(instance->reply_map);
+
 	scsi_host_put(host);
 
 	pci_disable_device(pdev);
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 073ced07e662..2994176a0121 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2655,11 +2655,8 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 			fp_possible = (io_info.fpOkForIo > 0) ? true : false;
 	}
 
-	/* Use raw_smp_processor_id() for now until cmd->request->cpu is CPU
-	   id by default, not CPU group id, otherwise all MSI-X queues won't
-	   be utilized */
-	cmd->request_desc->SCSIIO.MSIxIndex = instance->msix_vectors ?
-		raw_smp_processor_id() % instance->msix_vectors : 0;
+	cmd->request_desc->SCSIIO.MSIxIndex =
+		instance->reply_map[raw_smp_processor_id()];
 
 	praid_context = &io_request->RaidContext;
 
@@ -2985,10 +2982,9 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	}
 
 	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
-	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->msix_vectors ?
-		(raw_smp_processor_id() % instance->msix_vectors) : 0;
 
+	cmd->request_desc->SCSIIO.MSIxIndex =
+		instance->reply_map[raw_smp_processor_id()];
 
 	if (!fp_possible) {
 		/* system pd firmware path */
-- 
2.9.5


* [PATCH V4 3/4] scsi: introduce force_blk_mq
  2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
  2018-03-09  3:32 ` [PATCH V4 2/4] scsi: megaraid_sas: " Ming Lei
@ 2018-03-09  3:32 ` Ming Lei
  2018-03-10 10:10   ` Christoph Hellwig
  2018-03-09  3:32 ` [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity Ming Lei
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-03-09  3:32 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Ming Lei, Omar Sandoval, James Bottomley

From the SCSI driver's point of view, it is a bit troublesome to support
both blk-mq and non-blk-mq at the same time, especially when the driver
needs to support multiple hw queues.

This patch introduces 'force_blk_mq' in scsi_host_template so that a driver
can declare blk-mq-only support and avoid the trouble of supporting both
paths.
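
A driver opts in by setting the flag in its scsi_host_template, as patch 4
below does for virtio-scsi:

	static struct scsi_host_template virtscsi_host_template_multi = {
		/* ... existing fields unchanged ... */
		.track_queue_depth = 1,
		.force_blk_mq = 1,
	};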

Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hosts.c     | 1 +
 include/scsi/scsi_host.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index 57bf43e34863..10f04b089392 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -477,6 +477,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
 		shost->dma_boundary = 0xffffffff;
 
 	shost->use_blk_mq = scsi_use_blk_mq;
+	shost->use_blk_mq = scsi_use_blk_mq || !!shost->hostt->force_blk_mq;
 
 	device_initialize(&shost->shost_gendev);
 	dev_set_name(&shost->shost_gendev, "host%d", shost->host_no);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 1a1df0d21ee3..6c6366f0bd15 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -452,6 +452,9 @@ struct scsi_host_template {
 	/* True if the controller does not support WRITE SAME */
 	unsigned no_write_same:1;
 
+	/* True if the low-level driver supports blk-mq only */
+	unsigned force_blk_mq:1;
+
 	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
-- 
2.9.5


* [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
  2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
                   ` (2 preceding siblings ...)
  2018-03-09  3:32 ` [PATCH V4 3/4] scsi: introduce force_blk_mq Ming Lei
@ 2018-03-09  3:32 ` Ming Lei
  2018-03-10 10:15   ` Christoph Hellwig
  2018-03-09  7:00 ` [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Hannes Reinecke
  2018-03-09 14:01 ` Meelis Roos
  5 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-03-09  3:32 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Ming Lei, Omar Sandoval, James Bottomley, Paolo Bonzini

Now that 84676c1f21e8ff5 ("genirq/affinity: assign vectors to all possible
CPUs") has been merged into v4.16-rc, it is easy for some irq vectors to be
assigned only offline CPUs, and this can't be avoided even if the allocation
is improved.

For example, on an 8-core VM where CPUs 4~7 are not present/offline and
virtio-scsi has 4 queues, the assigned irq affinity can end up in the
following shape:

	irq 36, cpu list 0-7
	irq 37, cpu list 0-7
	irq 38, cpu list 0-7
	irq 39, cpu list 0-1
	irq 40, cpu list 4,6
	irq 41, cpu list 2-3
	irq 42, cpu list 5,7

Then an IO hang is triggered in the non-SCSI_MQ case.

Given storage IO always follows a client/server model, there is no such
issue with SCSI_MQ (blk-mq), because no IO can be submitted to a hw queue
unless that hw queue is mapped to online CPUs.
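
With blk-mq, the virtqueue is derived from the request's hw queue index
rather than from the submitting CPU, as in the existing
virtscsi_pick_vq_mq() visible in the diff below:

	u32 tag = blk_mq_unique_tag(sc->request);
	u16 hwq = blk_mq_unique_tag_to_hwq(tag);

	return &vscsi->req_vqs[hwq];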

Fix this issue by forcing virtio-scsi to use blk-mq.

BTW, I have been using virtio-scsi (scsi_mq) for several years, and it has
been quite stable, so this shouldn't add extra risk.

Cc: Omar Sandoval <osandov@fb.com>,
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/virtio_scsi.c | 59 +++-------------------------------------------
 1 file changed, 3 insertions(+), 56 deletions(-)

diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 7c28e8d4955a..54e3a0f6844c 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -91,9 +91,6 @@ struct virtio_scsi_vq {
 struct virtio_scsi_target_state {
 	seqcount_t tgt_seq;
 
-	/* Count of outstanding requests. */
-	atomic_t reqs;
-
 	/* Currently active virtqueue for requests sent to this target. */
 	struct virtio_scsi_vq *req_vq;
 };
@@ -152,8 +149,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 	struct virtio_scsi_cmd *cmd = buf;
 	struct scsi_cmnd *sc = cmd->sc;
 	struct virtio_scsi_cmd_resp *resp = &cmd->resp.cmd;
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
 
 	dev_dbg(&sc->device->sdev_gendev,
 		"cmd %p response %u status %#02x sense_len %u\n",
@@ -210,8 +205,6 @@ static void virtscsi_complete_cmd(struct virtio_scsi *vscsi, void *buf)
 	}
 
 	sc->scsi_done(sc);
-
-	atomic_dec(&tgt->reqs);
 }
 
 static void virtscsi_vq_done(struct virtio_scsi *vscsi,
@@ -580,10 +573,7 @@ static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
 					struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
 
-	atomic_inc(&tgt->reqs);
 	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
 }
 
@@ -596,55 +586,11 @@ static struct virtio_scsi_vq *virtscsi_pick_vq_mq(struct virtio_scsi *vscsi,
 	return &vscsi->req_vqs[hwq];
 }
 
-static struct virtio_scsi_vq *virtscsi_pick_vq(struct virtio_scsi *vscsi,
-					       struct virtio_scsi_target_state *tgt)
-{
-	struct virtio_scsi_vq *vq;
-	unsigned long flags;
-	u32 queue_num;
-
-	local_irq_save(flags);
-	if (atomic_inc_return(&tgt->reqs) > 1) {
-		unsigned long seq;
-
-		do {
-			seq = read_seqcount_begin(&tgt->tgt_seq);
-			vq = tgt->req_vq;
-		} while (read_seqcount_retry(&tgt->tgt_seq, seq));
-	} else {
-		/* no writes can be concurrent because of atomic_t */
-		write_seqcount_begin(&tgt->tgt_seq);
-
-		/* keep previous req_vq if a reader just arrived */
-		if (unlikely(atomic_read(&tgt->reqs) > 1)) {
-			vq = tgt->req_vq;
-			goto unlock;
-		}
-
-		queue_num = smp_processor_id();
-		while (unlikely(queue_num >= vscsi->num_queues))
-			queue_num -= vscsi->num_queues;
-		tgt->req_vq = vq = &vscsi->req_vqs[queue_num];
- unlock:
-		write_seqcount_end(&tgt->tgt_seq);
-	}
-	local_irq_restore(flags);
-
-	return vq;
-}
-
 static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
 				       struct scsi_cmnd *sc)
 {
 	struct virtio_scsi *vscsi = shost_priv(sh);
-	struct virtio_scsi_target_state *tgt =
-				scsi_target(sc->device)->hostdata;
-	struct virtio_scsi_vq *req_vq;
-
-	if (shost_use_blk_mq(sh))
-		req_vq = virtscsi_pick_vq_mq(vscsi, sc);
-	else
-		req_vq = virtscsi_pick_vq(vscsi, tgt);
+	struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);
 
 	return virtscsi_queuecommand(vscsi, req_vq, sc);
 }
@@ -775,7 +721,6 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 		return -ENOMEM;
 
 	seqcount_init(&tgt->tgt_seq);
-	atomic_set(&tgt->reqs, 0);
 	tgt->req_vq = &vscsi->req_vqs[0];
 
 	starget->hostdata = tgt;
@@ -823,6 +768,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
 	.target_alloc = virtscsi_target_alloc,
 	.target_destroy = virtscsi_target_destroy,
 	.track_queue_depth = 1,
+	.force_blk_mq = 1,
 };
 
 static struct scsi_host_template virtscsi_host_template_multi = {
@@ -844,6 +790,7 @@ static struct scsi_host_template virtscsi_host_template_multi = {
 	.target_destroy = virtscsi_target_destroy,
 	.map_queues = virtscsi_map_queues,
 	.track_queue_depth = 1,
+	.force_blk_mq = 1,
 };
 
 #define virtscsi_config_get(vdev, fld) \
-- 
2.9.5


* Re: [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue
  2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
                   ` (3 preceding siblings ...)
  2018-03-09  3:32 ` [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity Ming Lei
@ 2018-03-09  7:00 ` Hannes Reinecke
  2018-03-09  7:39   ` Ming Lei
  2018-03-09 14:01 ` Meelis Roos
  5 siblings, 1 reply; 21+ messages in thread
From: Hannes Reinecke @ 2018-03-09  7:00 UTC (permalink / raw)
  To: Ming Lei, James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer

On 03/09/2018 04:32 AM, Ming Lei wrote:
> Hi All,
> 
> The patches fix reply queue (virt-queue on virtio-scsi) selection on hpsa,
> megaraid_sas and virtio-scsi; this issue can easily cause IO hangs.
> 
> The issue is triggered by 84676c1f21e8 ("genirq/affinity: assign vectors
> to all possible CPUs"). After 84676c1f21e8, it is easy to end up with one
> msix vector mapped only to offline CPUs. If the reply queue is selected
> from all allocated msix vectors (reply queues) in a round-robin way, the
> selected reply queue may not have any online CPU mapped, and an IO hang
> results.
> 
> Both hpsa and megaraid_sas use a host-wide tagset, so we can't convert the
> reply queues to blk-mq hw queues directly, otherwise IO performance degrades
> badly according to Kashyap's test. Instead, this patchset sets up one mapping
> table for selecting the reply queue, an approach already used by mpt3sas.
> 
> For virtio-scsi, the virt-queue really is a hw queue from the blk-mq point
> of view, so we introduce 'force_blk_mq' to fix this issue because: 1)
> virtio-blk has been used for years in blk-mq mode; 2) we have discussed
> recently that scsi_mq will be enabled by default soon.
> 
> gitweb:
> 	https://github.com/ming1/linux/tree/v4.16-rc-select-reply-queue-fix-V4
> 
> V4:
> 	- split from previous patchset
> 	- handle virtio-scsi by force_blk_mq
> 
> Ming Lei (4):
>   scsi: hpsa: fix selection of reply queue
>   scsi: megaraid_sas: fix selection of reply queue
>   scsi: introduce force_blk_mq
>   scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
> 
>  drivers/scsi/hosts.c                        |  1 +
>  drivers/scsi/hpsa.c                         | 73 +++++++++++++++++++++--------
>  drivers/scsi/hpsa.h                         |  1 +
>  drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 34 +++++++++++++-
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++---
>  drivers/scsi/virtio_scsi.c                  | 59 ++---------------------
>  include/scsi/scsi_host.h                    |  3 ++
>  8 files changed, 100 insertions(+), 85 deletions(-)
> 
Well ... while this looks good in principle, what happens on cpu hotplug?
Don't we have to redo the map then?

Cheers,

Hannes

-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* Re: [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue
  2018-03-09  7:00 ` [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Hannes Reinecke
@ 2018-03-09  7:39   ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2018-03-09  7:39 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer

On Fri, Mar 09, 2018 at 08:00:52AM +0100, Hannes Reinecke wrote:
> On 03/09/2018 04:32 AM, Ming Lei wrote:
> > Hi All,
> > 
> > The patches fix reply queue (virt-queue on virtio-scsi) selection on hpsa,
> > megaraid_sas and virtio-scsi; this issue can easily cause IO hangs.
> > 
> > The issue is triggered by 84676c1f21e8 ("genirq/affinity: assign vectors
> > to all possible CPUs"). After 84676c1f21e8, it is easy to end up with one
> > msix vector mapped only to offline CPUs. If the reply queue is selected
> > from all allocated msix vectors (reply queues) in a round-robin way, the
> > selected reply queue may not have any online CPU mapped, and an IO hang
> > results.
> > 
> > Both hpsa and megaraid_sas use a host-wide tagset, so we can't convert the
> > reply queues to blk-mq hw queues directly, otherwise IO performance degrades
> > badly according to Kashyap's test. Instead, this patchset sets up one mapping
> > table for selecting the reply queue, an approach already used by mpt3sas.
> > 
> > For virtio-scsi, the virt-queue really is a hw queue from the blk-mq point
> > of view, so we introduce 'force_blk_mq' to fix this issue because: 1)
> > virtio-blk has been used for years in blk-mq mode; 2) we have discussed
> > recently that scsi_mq will be enabled by default soon.
> > 
> > gitweb:
> > 	https://github.com/ming1/linux/tree/v4.16-rc-select-reply-queue-fix-V4
> > 
> > 	- split from previous patchset
> > 	- splitted from previous patchset
> > 	- handle virtio-scsi by force_blk_mq
> > 
> > Ming Lei (4):
> >   scsi: hpsa: fix selection of reply queue
> >   scsi: megaraid_sas: fix selection of reply queue
> >   scsi: introduce force_blk_mq
> >   scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
> > 
> >  drivers/scsi/hosts.c                        |  1 +
> >  drivers/scsi/hpsa.c                         | 73 +++++++++++++++++++++--------
> >  drivers/scsi/hpsa.h                         |  1 +
> >  drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
> >  drivers/scsi/megaraid/megaraid_sas_base.c   | 34 +++++++++++++-
> >  drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++---
> >  drivers/scsi/virtio_scsi.c                  | 59 ++---------------------
> >  include/scsi/scsi_host.h                    |  3 ++
> >  8 files changed, 100 insertions(+), 85 deletions(-)
> > 
> Well ... while this looks good in principle, what happens on cpu hotplug?
> Don't we have to redo the map then?

Each item in the table maps one CPU id to a hw queue index, and the size of
the table is 'nr_cpu_ids', so there is no need to redo the map on cpu
hotplug, just like the usage of set->mq_map in blk-mq.
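
A minimal sketch of the lookup on the submission path (this is exactly what
the hpsa and megaraid_sas hunks do):

	/*
	 * reply_map[] has nr_cpu_ids entries, one per possible CPU, so
	 * the lookup stays valid across cpu hotplug events.
	 */
	reply_queue = h->reply_map[raw_smp_processor_id()];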

Thanks,
Ming


* RE: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply queue
  2018-03-09  3:32 ` [PATCH V4 2/4] scsi: megaraid_sas: " Ming Lei
@ 2018-03-09 11:07   ` Kashyap Desai
  2018-03-09 12:03     ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Kashyap Desai @ 2018-03-09 11:07 UTC (permalink / raw)
  To: Ming Lei, James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	Artem Bityutskiy

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Friday, March 9, 2018 9:02 AM
> To: James Bottomley; Jens Axboe; Martin K . Petersen
> Cc: Christoph Hellwig; linux-scsi@vger.kernel.org; linux-
> block@vger.kernel.org; Meelis Roos; Don Brace; Kashyap Desai; Laurence
> Oberman; Mike Snitzer; Ming Lei; Hannes Reinecke; James Bottomley; Artem
> Bityutskiy
> Subject: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply queue
>
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs), one
> msix vector can be created without any online CPU mapped, so a command
> may be queued there and never be notified of its completion.
>
> This patch sets up a mapping between CPUs and reply queues according to the
> irq affinity info retrieved by pci_irq_get_affinity(), and uses this info
> to choose the reply queue for queuing a command.
>
> The chosen reply queue is then guaranteed to be active, which fixes the IO
> hang caused by using an inactive reply queue that has no online CPU mapped.

Also, megaraid FW will use reply queue 0 for any async notification. We
want to set pre_vectors = 1 and make sure reply queue 0 is not part of the
affinity hint.
To meet that requirement, I have to make some more changes, like adding an
extra queue.
For example, if the FW supports 96 reply queues and 16 CPUs are online, the
current driver will allocate 16 msix vectors. We may have to allocate 17
msix vectors and reserve reply queue 0 for async replies from the FW.

I will be sending a follow-up patch soon.
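
For reference, the allocation side could look something like this (a sketch
only, not the actual follow-up patch):

	struct irq_affinity desc = { .pre_vectors = 1 };
	int ret;

	/*
	 * Reserve vector 0 (reply queue 0) for FW async events; only the
	 * remaining vectors get the automatic affinity spread.
	 */
	ret = pci_alloc_irq_vectors_affinity(instance->pdev, 1,
			instance->msix_vectors + 1,
			PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, &desc);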

>
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Meelis Roos <mroos@linux.ee>
> Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible
> CPUs")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 34
> ++++++++++++++++++++++++++++-
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++++------
>  3 files changed, 38 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/megaraid/megaraid_sas.h
> b/drivers/scsi/megaraid/megaraid_sas.h
> index ba6503f37756..a644d2be55b6 100644
> --- a/drivers/scsi/megaraid/megaraid_sas.h
> +++ b/drivers/scsi/megaraid/megaraid_sas.h
> @@ -2127,7 +2127,7 @@ enum MR_PD_TYPE {
>  #define MR_NVME_PAGE_SIZE_MASK		0x000000FF
>
>  struct megasas_instance {
> -
> +	unsigned int *reply_map;
>  	__le32 *producer;
>  	dma_addr_t producer_h;
>  	__le32 *consumer;
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index a71ee67df084..065956cb2aeb 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -5165,6 +5165,26 @@ megasas_setup_jbod_map(struct
> megasas_instance *instance)
>  		instance->use_seqnum_jbod_fp = false;  }
>
> +static void megasas_setup_reply_map(struct megasas_instance *instance)
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	for (queue = 0; queue < instance->msix_vectors; queue++) {
> +		mask = pci_irq_get_affinity(instance->pdev, queue);
> +		if (!mask)
> +			goto fallback;
> +
> +		for_each_cpu(cpu, mask)
> +			instance->reply_map[cpu] = queue;
> +	}
> +	return;
> +
> +fallback:
> +	for_each_possible_cpu(cpu)
> +		instance->reply_map[cpu] = 0;

The fallback would be better spread across the reply queues instead of just
assigning everything to a single reply queue. Maybe something like below:

   for_each_possible_cpu(cpu)
       instance->reply_map[cpu] = cpu % instance->msix_vectors;

If the smp_affinity_enable module parameter is set to 0, I see a
performance drop because all IO goes to a single reply queue.

> +}
> +
>  /**
>   * megasas_init_fw -	Initializes the FW
>   * @instance:		Adapter soft state
> @@ -5343,6 +5363,8 @@ static int megasas_init_fw(struct megasas_instance
> *instance)
>  			goto fail_setup_irqs;
>  	}
>
> +	megasas_setup_reply_map(instance);
> +
>  	dev_info(&instance->pdev->dev,
>  		"firmware supports msix\t: (%d)", fw_msix_count);
>  	dev_info(&instance->pdev->dev,
> @@ -6448,6 +6470,11 @@ static int megasas_probe_one(struct pci_dev
> *pdev,
>  	memset(instance, 0, sizeof(*instance));
>  	atomic_set(&instance->fw_reset_no_pci_access, 0);
>
> +	instance->reply_map = kzalloc(sizeof(unsigned int) * nr_cpu_ids,
> +			GFP_KERNEL);
> +	if (!instance->reply_map)
> +		goto fail_alloc_reply_map;
> +
We have a common function, megasas_alloc_ctrl_mem(), to manage allocations.
This allocation is a good candidate to move into megasas_alloc_ctrl_mem().

>  	/*
>  	 * Initialize PCI related and misc parameters
>  	 */
> @@ -6539,8 +6566,9 @@ static int megasas_probe_one(struct pci_dev *pdev,
>  	if (instance->msix_vectors)
>  		pci_free_irq_vectors(instance->pdev);
>  fail_init_mfi:
> +	kfree(instance->reply_map);
> +fail_alloc_reply_map:
>  	scsi_host_put(host);
> -
>  fail_alloc_instance:
>  	pci_disable_device(pdev);
>
> @@ -6746,6 +6774,8 @@ megasas_resume(struct pci_dev *pdev)
>  	if (rval < 0)
>  		goto fail_reenable_msix;
>
> +	megasas_setup_reply_map(instance);
> +
>  	if (instance->adapter_type != MFI_SERIES) {
>  		megasas_reset_reply_desc(instance);
>  		if (megasas_ioc_init_fusion(instance)) {
> @@ -6960,6 +6990,8 @@ static void megasas_detach_one(struct pci_dev *pdev)
>
>  	megasas_free_ctrl_mem(instance);
>
> +	kfree(instance->reply_map);
> +
>  	scsi_host_put(host);
>
>  	pci_disable_device(pdev);
> diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> index 073ced07e662..2994176a0121 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
> @@ -2655,11 +2655,8 @@ megasas_build_ldio_fusion(struct
> megasas_instance *instance,
> 			fp_possible = (io_info.fpOkForIo > 0) ? true : false;
> 	}
>
> -	/* Use raw_smp_processor_id() for now until cmd->request->cpu is CPU
> -	   id by default, not CPU group id, otherwise all MSI-X queues won't
> -	   be utilized */
> -	cmd->request_desc->SCSIIO.MSIxIndex = instance->msix_vectors ?
> -		raw_smp_processor_id() % instance->msix_vectors : 0;
> +	cmd->request_desc->SCSIIO.MSIxIndex =
> +		instance->reply_map[raw_smp_processor_id()];
>
>  	praid_context = &io_request->RaidContext;
>
> @@ -2985,10 +2982,9 @@ megasas_build_syspd_fusion(struct
> megasas_instance *instance,
>  	}
>
>  	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
> -	cmd->request_desc->SCSIIO.MSIxIndex =
> -		instance->msix_vectors ?
> -		(raw_smp_processor_id() % instance->msix_vectors) : 0;
>
> +	cmd->request_desc->SCSIIO.MSIxIndex =
> +		instance->reply_map[raw_smp_processor_id()];
>
>  	if (!fp_possible) {
>  		/* system pd firmware path */
> --
> 2.9.5


* Re: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply queue
  2018-03-09 11:07   ` Kashyap Desai
@ 2018-03-09 12:03     ` Ming Lei
  2018-03-09 14:03       ` Kashyap Desai
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-03-09 12:03 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	Artem Bityutskiy

On Fri, Mar 09, 2018 at 04:37:56PM +0530, Kashyap Desai wrote:
> > -----Original Message-----
> > From: Ming Lei [mailto:ming.lei@redhat.com]
> > Sent: Friday, March 9, 2018 9:02 AM
> > To: James Bottomley; Jens Axboe; Martin K . Petersen
> > Cc: Christoph Hellwig; linux-scsi@vger.kernel.org; linux-
> > block@vger.kernel.org; Meelis Roos; Don Brace; Kashyap Desai; Laurence
> > Oberman; Mike Snitzer; Ming Lei; Hannes Reinecke; James Bottomley; Artem
> > Bityutskiy
> > Subject: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply queue
> >
> > From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> > one msix vector can be created without any online CPU mapped, so a
> > command may be queued there and never be notified of its completion.
> >
> > This patch sets up a mapping between CPUs and reply queues according to
> > the irq affinity info retrieved by pci_irq_get_affinity(), and uses this
> > info to choose the reply queue for queuing a command.
> >
> > The chosen reply queue is then guaranteed to be active, which fixes the
> > IO hang caused by using an inactive reply queue that has no online CPU
> > mapped.
> 
> Also, megaraid FW will use reply queue 0 for any async notification. We
> want to set pre_vectors = 1 and make sure reply queue 0 is not part of
> the affinity hint.
> To meet that requirement, I have to make some more changes, like adding
> an extra queue.
> For example, if the FW supports 96 reply queues and 16 CPUs are online,
> the current driver will allocate 16 msix vectors. We may have to allocate
> 17 msix vectors and reserve reply queue 0 for async replies from the FW.
> 
> I will be sending a follow-up patch soon.

OK, but the above extra change doesn't belong in this patch, which focuses
on fixing the IO hang caused by reply queue selection.

> 
> >
> > Cc: Hannes Reinecke <hare@suse.de>
> > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > Cc: Christoph Hellwig <hch@lst.de>,
> > Cc: Don Brace <don.brace@microsemi.com>
> > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > Cc: Laurence Oberman <loberman@redhat.com>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: Meelis Roos <mroos@linux.ee>
> > Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
> > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible
> > CPUs")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
> >  drivers/scsi/megaraid/megaraid_sas_base.c   | 34
> > ++++++++++++++++++++++++++++-
> >  drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++++------
> >  3 files changed, 38 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/scsi/megaraid/megaraid_sas.h
> > b/drivers/scsi/megaraid/megaraid_sas.h
> > index ba6503f37756..a644d2be55b6 100644
> > --- a/drivers/scsi/megaraid/megaraid_sas.h
> > +++ b/drivers/scsi/megaraid/megaraid_sas.h
> > @@ -2127,7 +2127,7 @@ enum MR_PD_TYPE {
> >  #define MR_NVME_PAGE_SIZE_MASK		0x000000FF
> >
> >  struct megasas_instance {
> > -
> > +	unsigned int *reply_map;
> >  	__le32 *producer;
> >  	dma_addr_t producer_h;
> >  	__le32 *consumer;
> > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> > b/drivers/scsi/megaraid/megaraid_sas_base.c
> > index a71ee67df084..065956cb2aeb 100644
> > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> > @@ -5165,6 +5165,26 @@ megasas_setup_jbod_map(struct
> > megasas_instance *instance)
> >  		instance->use_seqnum_jbod_fp = false;  }
> >
> > +static void megasas_setup_reply_map(struct megasas_instance *instance)
> > +{
> > +	const struct cpumask *mask;
> > +	unsigned int queue, cpu;
> > +
> > +	for (queue = 0; queue < instance->msix_vectors; queue++) {
> > +		mask = pci_irq_get_affinity(instance->pdev, queue);
> > +		if (!mask)
> > +			goto fallback;
> > +
> > +		for_each_cpu(cpu, mask)
> > +			instance->reply_map[cpu] = queue;
> > +	}
> > +	return;
> > +
> > +fallback:
> > +	for_each_possible_cpu(cpu)
> > +		instance->reply_map[cpu] = 0;
> 
> The fallback would be better spread across the reply queues instead of
> just assigning everything to a single reply queue. Maybe something like
> below:
> 
>    for_each_possible_cpu(cpu)
>        instance->reply_map[cpu] = cpu % instance->msix_vectors;
> 
> If the smp_affinity_enable module parameter is set to 0, I see a
> performance drop because all IO goes to a single reply queue.

OK, that looks like a real issue. If smp_affinity_enable is set to 0, we
should follow the original way.
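
Something like the following in megasas_setup_reply_map() should cover it
(a sketch only, reusing your suggestion; smp_affinity_enable is the existing
megaraid_sas module parameter):

	/*
	 * With affinity spreading disabled (or no affinity mask
	 * available), spread the possible CPUs across reply queues
	 * round-robin instead of pinning everything to queue 0.
	 */
fallback:
	for_each_possible_cpu(cpu)
		instance->reply_map[cpu] = cpu % instance->msix_vectors;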

> 
> > +}
> > +
> >  /**
> >   * megasas_init_fw -	Initializes the FW
> >   * @instance:		Adapter soft state
> > @@ -5343,6 +5363,8 @@ static int megasas_init_fw(struct megasas_instance
> > *instance)
> >  			goto fail_setup_irqs;
> >  	}
> >
> > +	megasas_setup_reply_map(instance);
> > +
> >  	dev_info(&instance->pdev->dev,
> >  		"firmware supports msix\t: (%d)", fw_msix_count);
> >  	dev_info(&instance->pdev->dev,
> > @@ -6448,6 +6470,11 @@ static int megasas_probe_one(struct pci_dev
> > *pdev,
> >  	memset(instance, 0, sizeof(*instance));
> >  	atomic_set(&instance->fw_reset_no_pci_access, 0);
> >
> > +	instance->reply_map = kzalloc(sizeof(unsigned int) * nr_cpu_ids,
> > +			GFP_KERNEL);
> > +	if (!instance->reply_map)
> > +		goto fail_alloc_reply_map;
> > +
> We have a common function, megasas_alloc_ctrl_mem(), to manage allocations.
> This allocation is a good candidate to move into megasas_alloc_ctrl_mem().

Good point, will do that in V5.


-- 
Ming


* Re: [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue
  2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
                   ` (4 preceding siblings ...)
  2018-03-09  7:00 ` [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Hannes Reinecke
@ 2018-03-09 14:01 ` Meelis Roos
  5 siblings, 0 replies; 21+ messages in thread
From: Meelis Roos @ 2018-03-09 14:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Don Brace,
	Kashyap Desai, Laurence Oberman, Mike Snitzer

> V4:
> 	- split from previous patchset
> 	- handle virtio-scsi by force_blk_mq
> 
> Ming Lei (4):
>   scsi: hpsa: fix selection of reply queue
>   scsi: megaraid_sas: fix selection of reply queue
>   scsi: introduce force_blk_mq
>   scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity

Works on HP DL360G6 with integrated smartarray, with no visible 
regressions.

-- 
Meelis Roos (mroos@linux.ee)


* RE: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply queue
  2018-03-09 12:03     ` Ming Lei
@ 2018-03-09 14:03       ` Kashyap Desai
  0 siblings, 0 replies; 21+ messages in thread
From: Kashyap Desai @ 2018-03-09 14:03 UTC (permalink / raw)
  To: Ming Lei
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	Artem Bityutskiy

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Friday, March 9, 2018 5:33 PM
> To: Kashyap Desai
> Cc: James Bottomley; Jens Axboe; Martin K . Petersen; Christoph Hellwig;
> linux-scsi@vger.kernel.org; linux-block@vger.kernel.org; Meelis Roos; Don
> Brace; Laurence Oberman; Mike Snitzer; Hannes Reinecke; Artem Bityutskiy
> Subject: Re: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply
> queue
>
> On Fri, Mar 09, 2018 at 04:37:56PM +0530, Kashyap Desai wrote:
> > > -----Original Message-----
> > > From: Ming Lei [mailto:ming.lei@redhat.com]
> > > Sent: Friday, March 9, 2018 9:02 AM
> > > To: James Bottomley; Jens Axboe; Martin K . Petersen
> > > Cc: Christoph Hellwig; linux-scsi@vger.kernel.org; linux-
> > > block@vger.kernel.org; Meelis Roos; Don Brace; Kashyap Desai;
> > > Laurence Oberman; Mike Snitzer; Ming Lei; Hannes Reinecke; James
> > > Bottomley; Artem Bityutskiy
> > > Subject: [PATCH V4 2/4] scsi: megaraid_sas: fix selection of reply
> > > queue
> > >
> > > From 84676c1f21 (genirq/affinity: assign vectors to all possible
> > > CPUs), one msix vector can be created without any online CPU mapped,
> > > so a command may be queued there and never be notified of its
> > > completion.
> > >
> > > This patch sets up a mapping between CPUs and reply queues according
> > > to the irq affinity info retrieved by pci_irq_get_affinity(), and
> > > uses this info to choose the reply queue for queuing a command.
> > >
> > > The chosen reply queue is then guaranteed to be active, which fixes
> > > the IO hang caused by using an inactive reply queue that has no
> > > online CPU mapped.
> >
> > Also, megaraid FW will use reply queue 0 for any async notification.
> > We want to set pre_vectors = 1 and make sure reply queue 0 is not part
> > of the affinity hint.
> > To meet that requirement, I have to make some more changes, like adding
> > an extra queue.
> > For example, if the FW supports 96 reply queues and 16 CPUs are online,
> > the current driver will allocate 16 msix vectors. We may have to
> > allocate 17 msix vectors and reserve reply queue 0 for async replies
> > from the FW.
> >
> > I will be sending a follow-up patch soon.
>
> OK, but the above extra change doesn't belong in this patch, which
> focuses on fixing the IO hang caused by reply queue selection.

Fine. That will be a separate patch to handle the reply queue 0 affinity
case.
>
> >
> > >
> > > Cc: Hannes Reinecke <hare@suse.de>
> > > Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> > > Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> > > Cc: Christoph Hellwig <hch@lst.de>,
> > > Cc: Don Brace <don.brace@microsemi.com>
> > > Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> > > Cc: Laurence Oberman <loberman@redhat.com>
> > > Cc: Mike Snitzer <snitzer@redhat.com>
> > > Cc: Meelis Roos <mroos@linux.ee>
> > > Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
> > > Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all
> > > possible CPUs")
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  drivers/scsi/megaraid/megaraid_sas.h        |  2 +-
> > >  drivers/scsi/megaraid/megaraid_sas_base.c   | 34
> > > ++++++++++++++++++++++++++++-
> > >  drivers/scsi/megaraid/megaraid_sas_fusion.c | 12 ++++------
> > >  3 files changed, 38 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/scsi/megaraid/megaraid_sas.h
> > > b/drivers/scsi/megaraid/megaraid_sas.h
> > > index ba6503f37756..a644d2be55b6 100644
> > > --- a/drivers/scsi/megaraid/megaraid_sas.h
> > > +++ b/drivers/scsi/megaraid/megaraid_sas.h
> > > @@ -2127,7 +2127,7 @@ enum MR_PD_TYPE {
> > >  #define MR_NVME_PAGE_SIZE_MASK		0x000000FF
> > >
> > >  struct megasas_instance {
> > > -
> > > +	unsigned int *reply_map;
> > >  	__le32 *producer;
> > >  	dma_addr_t producer_h;
> > >  	__le32 *consumer;
> > > diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
> > > b/drivers/scsi/megaraid/megaraid_sas_base.c
> > > index a71ee67df084..065956cb2aeb 100644
> > > --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> > > +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> > > @@ -5165,6 +5165,26 @@ megasas_setup_jbod_map(struct
> > > megasas_instance *instance)
> > >  		instance->use_seqnum_jbod_fp = false;  }
> > >
> > > +static void megasas_setup_reply_map(struct megasas_instance
> > > +*instance) {
> > > +	const struct cpumask *mask;
> > > +	unsigned int queue, cpu;
> > > +
> > > +	for (queue = 0; queue < instance->msix_vectors; queue++) {
> > > +		mask = pci_irq_get_affinity(instance->pdev, queue);
> > > +		if (!mask)
> > > +			goto fallback;
> > > +
> > > +		for_each_cpu(cpu, mask)
> > > +			instance->reply_map[cpu] = queue;
> > > +	}
> > > +	return;
> > > +
> > > +fallback:
> > > +	for_each_possible_cpu(cpu)
> > > +		instance->reply_map[cpu] = 0;
> >
> > The fallback would be better spread across the reply queues instead of
> > just assigning everything to a single reply queue. Maybe something
> > like below:
> >
> >    for_each_possible_cpu(cpu)
> >        instance->reply_map[cpu] = cpu % instance->msix_vectors;
> >
> > If the smp_affinity_enable module parameter is set to 0, I see a
> > performance drop because all IO goes to a single reply queue.
>
> OK, that looks like a real issue. If smp_affinity_enable is set to 0, we
> should follow the original way.

Sent you the patch.
>
> >
> > > +}
> > > +
> > >  /**
> > >   * megasas_init_fw -	Initializes the FW
> > >   * @instance:		Adapter soft state
> > > @@ -5343,6 +5363,8 @@ static int megasas_init_fw(struct
> > > megasas_instance
> > > *instance)
> > >  			goto fail_setup_irqs;
> > >  	}
> > >
> > > +	megasas_setup_reply_map(instance);
> > > +
> > >  	dev_info(&instance->pdev->dev,
> > >  		"firmware supports msix\t: (%d)", fw_msix_count);
> > >  	dev_info(&instance->pdev->dev,
> > > @@ -6448,6 +6470,11 @@ static int megasas_probe_one(struct pci_dev
> > > *pdev,
> > >  	memset(instance, 0, sizeof(*instance));
> > >  	atomic_set(&instance->fw_reset_no_pci_access, 0);
> > >
> > > +	instance->reply_map = kzalloc(sizeof(unsigned int) * nr_cpu_ids,
> > > +			GFP_KERNEL);
> > > +	if (!instance->reply_map)
> > > +		goto fail_alloc_reply_map;
> > > +
> > We have a common function, megasas_alloc_ctrl_mem(), to manage
> > allocations.
> > Good candidate to move this allocation into megasas_alloc_ctrl_mem().
>
> Good point, will do that in V5.

Sent you the patch.



>
>
> --
> Ming


* Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
@ 2018-03-09 22:14   ` Don Brace
  2018-03-10 10:09   ` Christoph Hellwig
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Don Brace @ 2018-03-09 22:14 UTC (permalink / raw)
  To: Ming Lei, James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Kashyap Desai, Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	Artem Bityutskiy


________________________________________
From: Ming Lei <ming.lei@redhat.com>
Sent: Thursday, March 8, 2018 7:32 PM
To: James Bottomley; Jens Axboe; Martin K . Petersen
Cc: Christoph Hellwig; linux-scsi@vger.kernel.org; linux-block@vger.kernel.org;
Meelis Roos; Don Brace; Kashyap Desai; Laurence Oberman; Mike Snitzer;
Ming Lei; Hannes Reinecke; James Bottomley; Artem Bityutskiy
Subject: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue

From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
one msix vector can be created without any online CPU mapped, so one
command's completion may never be notified.

This patch sets up a mapping between CPUs and reply queues according to the
irq affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
table to choose the reply queue for queuing a command.

The chosen reply queue is then guaranteed to be active, which fixes the IO
hang caused by using an inactive reply queue that has no online CPU mapped.

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: Don Brace <don.brace@microsemi.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
Signed-off-by: Ming Lei <ming.lei@redhat.com>

I got the following stack trace while testing; I need to pop off the patches
and re-test for a baseline.


[root@cyflym ~]# [18564.263896] XFS (dm-2): _xfs_buf_find: daddr 0x282084848 out of range, EOFS 0x7298000
[18564.301491] WARNING: CPU: 51 PID: 18275 at fs/xfs/xfs_buf.c:591 _xfs_buf_find+0x3f0/0x530 [xfs]
[18564.342614] Modules linked in: sg ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt aesni_intel iTCO_vendor_support crypto_simd glue_helper cryptd pcspkr ipmi_si hpilo hpwdt lpc_ich ioatdma pcc_cpufreq shpchp dca mfd_core wmi ipmi_msghandler acpi_power_meter uinput xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[18564.678017]  sd_mod fb_sys_fops ttm drm crc32c_intel tg3 hpsa i2c_core scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod dax
[18564.739543] CPU: 51 PID: 18275 Comm: bash Not tainted 4.16.0-rc4+ #14
[18564.769923] Hardware name: HP ProLiant DL580 Gen8, BIOS P79 08/18/2016
[18564.801111] RIP: 0010:_xfs_buf_find+0x3f0/0x530 [xfs]
[18564.825121] RSP: 0018:ffff9f0aaabaf6b8 EFLAGS: 00010246
[18564.849811] RAX: 0000000000000000 RBX: ffff9f0aaabaf808 RCX: 0000000000000000
[18564.883604] RDX: ffff9f0aaabaf5d8 RSI: 000000000000000a RDI: ffffffffc046ad77
[18564.917315] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000021
[18564.951211] R10: 0000000000000000 R11: 000000000000000a R12: ffff8ade9c88dbc0
[18564.984925] R13: ffff8ade9c88dbc0 R14: 0000000000000001 R15: ffff9f0aaabaf808
[18565.018846] FS:  00007f423c899740(0000) GS:ffff8aee9ef80000(0000) knlGS:0000000000000000
[18565.057473] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18565.084562] CR2: 00007ffc480f8070 CR3: 000000105b8ce006 CR4: 00000000001606e0
[18565.118377] Call Trace:
[18565.129851]  ? _cond_resched+0x15/0x30
[18565.147590]  xfs_buf_get_map+0x23/0x260 [xfs]
[18565.168557]  xfs_buf_read_map+0x29/0x180 [xfs]
[18565.189845]  xfs_trans_read_buf_map+0xec/0x300 [xfs]
[18565.213354]  xfs_btree_read_buf_block.constprop.36+0x77/0xd0 [xfs]
[18565.242721]  xfs_btree_lookup_get_block+0x82/0x170 [xfs]
[18565.268117]  xfs_btree_lookup+0xce/0x3c0 [xfs]
[18565.289218]  ? kmem_zone_alloc+0x95/0x100 [xfs]
[18565.310659]  xfs_free_ag_extent+0x93/0x830 [xfs]
[18565.332491]  xfs_free_extent+0xb6/0x150 [xfs]
[18565.353187]  xfs_trans_free_extent+0x4f/0x110 [xfs]
[18565.376544]  ? xfs_trans_add_item+0x50/0x80 [xfs]
[18565.399174]  xfs_extent_free_finish_item+0x21/0x30 [xfs]
[18565.424638]  xfs_defer_finish+0x13d/0x400 [xfs]
[18565.446007]  xfs_itruncate_extents+0x11d/0x2d0 [xfs]
[18565.469501]  xfs_setattr_size+0x275/0x300 [xfs]
[18565.490808]  xfs_vn_setattr+0x40/0x60 [xfs]
[18565.510577]  notify_change+0x269/0x440
[18565.529105]  do_truncate+0x72/0xc0
[18565.545982]  path_openat+0x5ed/0x1210
[18565.563916]  ? xfs_iext_lookup_extent+0x60/0x140 [xfs]
[18565.588955]  ? xfs_bmapi_read+0x158/0x330 [xfs]
[18565.611077]  do_filp_open+0x91/0x100
[18565.628643]  ? xfs_iunlock+0xb9/0x110 [xfs]
[18565.649355]  do_sys_open+0x126/0x210
[18565.666880]  do_syscall_64+0x6e/0x1a0
[18565.684795]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[18565.709158] RIP: 0033:0x7f423bf85a20
[18565.726501] RSP: 002b:00007fffa5c54288 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[18565.763111] RAX: ffffffffffffffda RBX: 0000000000d64580 RCX: 00007f423bf85a20
[18565.797703] RDX: 0000000000000180 RSI: 0000000000000201 RDI: 0000000000d64580
[18565.831962] RBP: 0000000000d706dc R08: 0000000080000000 R09: 0000000000000071
[18565.865923] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000023
[18565.899948] R13: 0000000000d70660 R14: 0000000000d706dc R15: 0000000000003b9a
[18565.933976] Code: 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e5 e9 57 ff ff ff 48 89 c1 48 c7 c2 80 76 46 c0 48 c7 c6 20 e3 46 c0 31 c0 e8 80 7c 01 00 <0f> 0b 31 c0 e9 74 ff ff ff 39 ca 0f 82 56 fe ff ff 48 8b 4c 24
[18566.023930] ---[ end trace 8a6b31ee9f72bc69 ]---
[18566.046443] XFS (dm-2): _xfs_buf_find: daddr 0x282084848 out of range, EOFS 0x7298000
[18566.083672] WARNING: CPU: 52 PID: 18275 at fs/xfs/xfs_buf.c:591 _xfs_buf_find+0x3f0/0x530 [xfs]
[18566.125206] Modules linked in: sg ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_mangle iptable_security iptable_raw iptable_filter ip_tables sb_edac x86_pkg_temp_thermal coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc iTCO_wdt aesni_intel iTCO_vendor_support crypto_simd glue_helper cryptd pcspkr ipmi_si hpilo hpwdt lpc_ich ioatdma pcc_cpufreq shpchp dca mfd_core wmi ipmi_msghandler acpi_power_meter uinput xfs libcrc32c mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
[18566.464387]  sd_mod fb_sys_fops ttm drm crc32c_intel tg3 hpsa i2c_core scsi_transport_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod dax
[18566.526994] CPU: 52 PID: 18275 Comm: bash Tainted: G        W        4.16.0-rc4+ #14
[18566.564460] Hardware name: HP ProLiant DL580 Gen8, BIOS P79 08/18/2016
[18566.595845] RIP: 0010:_xfs_buf_find+0x3f0/0x530 [xfs]
[18566.621209] RSP: 0018:ffff9f0aaabaf6b8 EFLAGS: 00010246
[18566.648069] RAX: 0000000000000000 RBX: ffff9f0aaabaf808 RCX: 0000000000000000
[18566.683203] RDX: ffff9f0aaabaf5d8 RSI: 000000000000000a RDI: ffffffffc046ad77
[18566.718236] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000021
[18566.753277] R10: 0000000000000000 R11: 000000000000000a R12: ffff8ade9c88dbc0
[18566.789035] R13: ffff8ade9c88dbc0 R14: 0000000000000001 R15: ffff8aee96e25200
[18566.823916] FS:  00007f423c899740(0000) GS:ffff8aee9efc0000(0000) knlGS:0000000000000000
[18566.863626] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18566.891659] CR2: 00007f581799f410 CR3: 000000105b8ce004 CR4: 00000000001606e0
[18566.927561] Call Trace:
[18566.940146]  ? xfs_buf_allocate_memory+0x170/0x2b3 [xfs]
[18566.966915]  xfs_buf_get_map+0x1e4/0x260 [xfs]
[18566.989523]  xfs_buf_read_map+0x29/0x180 [xfs]
[18567.011523]  xfs_trans_read_buf_map+0xec/0x300 [xfs]
[18567.035660]  xfs_btree_read_buf_block.constprop.36+0x77/0xd0 [xfs]
[18567.065705]  xfs_btree_lookup_get_block+0x82/0x170 [xfs]
[18567.091499]  xfs_btree_lookup+0xce/0x3c0 [xfs]
[18567.113115]  ? kmem_zone_alloc+0x95/0x100 [xfs]
[18567.135992]  xfs_free_ag_extent+0x93/0x830 [xfs]
[18567.158559]  xfs_free_extent+0xb6/0x150 [xfs]
[18567.180063]  xfs_trans_free_extent+0x4f/0x110 [xfs]
[18567.204174]  ? xfs_trans_add_item+0x50/0x80 [xfs]
[18567.227012]  xfs_extent_free_finish_item+0x21/0x30 [xfs]
[18567.252809]  xfs_defer_finish+0x13d/0x400 [xfs]
[18567.275123]  xfs_itruncate_extents+0x11d/0x2d0 [xfs]
[18567.300006]  xfs_setattr_size+0x275/0x300 [xfs]
[18567.322220]  xfs_vn_setattr+0x40/0x60 [xfs]
[18567.342740]  notify_change+0x269/0x440
[18567.361172]  do_truncate+0x72/0xc0
[18567.378108]  path_openat+0x5ed/0x1210
[18567.396127]  ? xfs_iext_lookup_extent+0x60/0x140 [xfs]
[18567.421433]  ? xfs_bmapi_read+0x158/0x330 [xfs]
[18567.443411]  do_filp_open+0x91/0x100
[18567.460991]  ? xfs_iunlock+0xb9/0x110 [xfs]
[18567.481726]  do_sys_open+0x126/0x210
[18567.499159]  do_syscall_64+0x6e/0x1a0
[18567.516962]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[18567.541857] RIP: 0033:0x7f423bf85a20
[18567.559489] RSP: 002b:00007fffa5c54288 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
[18567.595759] RAX: ffffffffffffffda RBX: 0000000000d64580 RCX: 00007f423bf85a20
[18567.629888] RDX: 0000000000000180 RSI: 0000000000000201 RDI: 0000000000d64580
[18567.664134] RBP: 0000000000d706dc R08: 0000000080000000 R09: 0000000000000071
[18567.698568] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000023
[18567.732918] R13: 0000000000d70660 R14: 0000000000d706dc R15: 0000000000003b9a
[18567.767047] Code: 48 89 de ff d0 49 8b 45 00 48 85 c0 75 e5 e9 57 ff ff ff 48 89 c1 48 c7 c2 80 76 46 c0 48 c7 c6 20 e3 46 c0 31 c0 e8 80 7c 01 00 <0f> 0b 31 c0 e9 74 ff ff ff 39 ca 0f 82 56 fe ff ff 48 8b 4c 24
[18567.857544] ---[ end trace 8a6b31ee9f72bc6a ]---
[18567.879943] XFS (dm-2): xfs_do_force_shutdown(0x1) called from line 236 of file fs/xfs/libxfs/xfs_defer.c.  Return address = 00000000e063696e
[18568.011022] XFS (dm-2): I/O Error Detected. Shutting down filesystem
[18568.042302] XFS (dm-2): Please umount the filesystem and rectify the problem(s)

---
[0:0:0:0]    disk    Apricorn                  0128  /dev/sdg
[1:0:0:0]    storage HP       P830i            3.02  -
[1:1:0:0]    disk    HP       LOGICAL VOLUME   3.02  /dev/sda
[2:0:0:0]    storage HP       P431             4.54  -
[3:0:0:0]    storage HP       P441             6.59  -
[3:1:0:0]    disk    HP       LOGICAL VOLUME   6.59  /dev/sdb
[3:1:0:1]    disk    HP       LOGICAL VOLUME   6.59  /dev/sdc
[3:1:0:2]    disk    HP       LOGICAL VOLUME   6.59  /dev/sdd
[3:1:0:3]    disk    HP       LOGICAL VOLUME   6.59  /dev/sde
[3:1:0:4]    disk    HP       LOGICAL VOLUME   6.59  /dev/sdf
[4:0:0:0]    storage HP       P431             4.54  -
[4:0:1:0]    disk    HP       MO0800JFFCH      HPD0  /dev/sdh
[4:0:2:0]    disk    HP       MO0400JFFCF      HPD0  /dev/sdi
[4:0:3:0]    disk    HP       VO0960JFDGU      HPD0  /dev/sdj
[4:0:4:0]    disk    HP       MO1600JFFCK      HPD0  /dev/sdk
[4:0:5:0]    disk    HP       MO1600JFFCK      HPD0  /dev/sdl
[4:0:6:0]    disk    HP       VO0480JFDGT      HPD0  /dev/sdm
[4:0:7:0]    disk    HP       VO1920JFDGV      HPD0  /dev/sdn
[4:0:8:0]    disk    HP       MO0400JFFCF      HPD0  /dev/sdo
[4:0:9:0]    disk    HP       EO0200JDVFA      HPD1  /dev/sdp
[4:0:10:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sdq
[4:0:11:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sdr
[4:0:12:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sds
[4:0:13:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sdt
[4:0:14:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sdu
[4:0:15:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sdv
[4:0:16:0]   disk    HP       EO0200JDVFA      HPD1  /dev/sdw


[root@cyflym ~]# cat fio_test_5_P441_LV_devices.fio
[global]
ioengine=libaio
rw=randrw
size=200g
bs=512
direct=1

[/dev/sdb]
iodepth=512

[/dev/sdc]
iodepth=512

[/dev/sdd]
iodepth=512

[/dev/sde]
iodepth=512

[/dev/sdf]
iodepth=512

[/dev/sdg]
iodepth=512



[root@cyflym ~]# cat fio_test_10_P431_hba_devices.fio
[global]
ioengine=libaio
rw=randrw
size=200g
bs=512
direct=1

[/dev/sdh]
iodepth=512

[/dev/sdi]
iodepth=512

[/dev/sdj]
iodepth=512

[/dev/sdk]
iodepth=512

[/dev/sdl]
iodepth=512

[/dev/sdm]
iodepth=512

[/dev/sdn]
iodepth=512

[/dev/sdo]
iodepth=512

[/dev/sdp]
iodepth=512

[/dev/sdq]
iodepth=512

---
 drivers/scsi/hpsa.c | 73 +++++++++++++++++++++++++++++++++++++++--------------
 drivers/scsi/hpsa.h |  1 +
 2 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 5293e6827ce5..3a9eca163db8 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1045,11 +1045,7 @@ static void set_performant_mode(struct ctlr_info *h, struct CommandList *c,
                c->busaddr |= 1 | (h->blockFetchTable[c->Header.SGList] << 1);
                if (unlikely(!h->msix_vectors))
                        return;
-               if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-                       c->Header.ReplyQueue =
-                               raw_smp_processor_id() % h->nreply_queues;
-               else
-                       c->Header.ReplyQueue = reply_queue % h->nreply_queues;
+               c->Header.ReplyQueue = reply_queue;
        }
 }

@@ -1063,10 +1059,7 @@ static void set_ioaccel1_performant_mode(struct ctlr_info *h,
         * Tell the controller to post the reply to the queue for this
         * processor.  This seems to give the best I/O throughput.
         */
-       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-               cp->ReplyQueue = smp_processor_id() % h->nreply_queues;
-       else
-               cp->ReplyQueue = reply_queue % h->nreply_queues;
+       cp->ReplyQueue = reply_queue;
        /*
         * Set the bits in the address sent down to include:
         *  - performant mode bit (bit 0)
@@ -1087,10 +1080,7 @@ static void set_ioaccel2_tmf_performant_mode(struct ctlr_info *h,
        /* Tell the controller to post the reply to the queue for this
         * processor.  This seems to give the best I/O throughput.
         */
-       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-               cp->reply_queue = smp_processor_id() % h->nreply_queues;
-       else
-               cp->reply_queue = reply_queue % h->nreply_queues;
+       cp->reply_queue = reply_queue;
        /* Set the bits in the address sent down to include:
         *  - performant mode bit not used in ioaccel mode 2
         *  - pull count (bits 0-3)
@@ -1109,10 +1099,7 @@ static void set_ioaccel2_performant_mode(struct ctlr_info *h,
         * Tell the controller to post the reply to the queue for this
         * processor.  This seems to give the best I/O throughput.
         */
-       if (likely(reply_queue == DEFAULT_REPLY_QUEUE))
-               cp->reply_queue = smp_processor_id() % h->nreply_queues;
-       else
-               cp->reply_queue = reply_queue % h->nreply_queues;
+       cp->reply_queue = reply_queue;
        /*
         * Set the bits in the address sent down to include:
         *  - performant mode bit not used in ioaccel mode 2
@@ -1157,6 +1144,8 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 {
        dial_down_lockup_detection_during_fw_flash(h, c);
        atomic_inc(&h->commands_outstanding);
+
+       reply_queue = h->reply_map[raw_smp_processor_id()];
        switch (c->cmd_type) {
        case CMD_IOACCEL1:
                set_ioaccel1_performant_mode(h, c, reply_queue);
@@ -7376,6 +7365,26 @@ static void hpsa_disable_interrupt_mode(struct ctlr_info *h)
        h->msix_vectors = 0;
 }

+static void hpsa_setup_reply_map(struct ctlr_info *h)
+{
+       const struct cpumask *mask;
+       unsigned int queue, cpu;
+
+       for (queue = 0; queue < h->msix_vectors; queue++) {
+               mask = pci_irq_get_affinity(h->pdev, queue);
+               if (!mask)
+                       goto fallback;
+
+               for_each_cpu(cpu, mask)
+                       h->reply_map[cpu] = queue;
+       }
+       return;
+
+fallback:
+       for_each_possible_cpu(cpu)
+               h->reply_map[cpu] = 0;
+}
+
 /* If MSI/MSI-X is supported by the kernel we will try to enable it on
  * controllers that are capable. If not, we use legacy INTx mode.
  */
@@ -7771,6 +7780,10 @@ static int hpsa_pci_init(struct ctlr_info *h)
        err = hpsa_interrupt_mode(h);
        if (err)
                goto clean1;
+
+       /* setup mapping between CPU and reply queue */
+       hpsa_setup_reply_map(h);
+
        err = hpsa_pci_find_memory_BAR(h->pdev, &h->paddr);
        if (err)
                goto clean2;    /* intmode+region, pci */
@@ -8480,6 +8493,28 @@ static struct workqueue_struct *hpsa_create_controller_wq(struct ctlr_info *h,
        return wq;
 }

+static void hpda_free_ctlr_info(struct ctlr_info *h)
+{
+       kfree(h->reply_map);
+       kfree(h);
+}
+
+static struct ctlr_info *hpda_alloc_ctlr_info(void)
+{
+       struct ctlr_info *h;
+
+       h = kzalloc(sizeof(*h), GFP_KERNEL);
+       if (!h)
+               return NULL;
+
+       h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
+       if (!h->reply_map) {
+               kfree(h);
+               return NULL;
+       }
+       return h;
+}
+
 static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
        int dac, rc;
@@ -8517,7 +8552,7 @@ static int hpsa_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
         * the driver.  See comments in hpsa.h for more info.
         */
        BUILD_BUG_ON(sizeof(struct CommandList) % COMMANDLIST_ALIGNMENT);
-       h = kzalloc(sizeof(*h), GFP_KERNEL);
+       h = hpda_alloc_ctlr_info();
        if (!h) {
                dev_err(&pdev->dev, "Failed to allocate controller head\n");
                return -ENOMEM;
@@ -8916,7 +8951,7 @@ static void hpsa_remove_one(struct pci_dev *pdev)
        h->lockup_detected = NULL;                      /* init_one 2 */
        /* (void) pci_disable_pcie_error_reporting(pdev); */    /* init_one 1 */

-       kfree(h);                                       /* init_one 1 */
+       hpda_free_ctlr_info(h);                         /* init_one 1 */
 }

 static int hpsa_suspend(__attribute__((unused)) struct pci_dev *pdev,
diff --git a/drivers/scsi/hpsa.h b/drivers/scsi/hpsa.h
index 018f980a701c..fb9f5e7f8209 100644
--- a/drivers/scsi/hpsa.h
+++ b/drivers/scsi/hpsa.h
@@ -158,6 +158,7 @@ struct bmic_controller_parameters {
 #pragma pack()

 struct ctlr_info {
+       unsigned int *reply_map;
        int     ctlr;
        char    devname[8];
        char    *product_name;
--
2.9.5

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
  2018-03-09 22:14   ` Don Brace
@ 2018-03-10 10:09   ` Christoph Hellwig
  2018-03-10 15:01     ` Ming Lei
  2018-03-12  7:37   ` Bityutskiy, Artem
  2018-03-12 15:39   ` Don Brace
  3 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-03-10 10:09 UTC (permalink / raw)
  To: Ming Lei
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Hannes Reinecke, Artem Bityutskiy

> +static void hpsa_setup_reply_map(struct ctlr_info *h)
> +{
> +	const struct cpumask *mask;
> +	unsigned int queue, cpu;
> +
> +	for (queue = 0; queue < h->msix_vectors; queue++) {
> +		mask = pci_irq_get_affinity(h->pdev, queue);
> +		if (!mask)
> +			goto fallback;
> +
> +		for_each_cpu(cpu, mask)
> +			h->reply_map[cpu] = queue;
> +	}
> +	return;
> +
> +fallback:
> +	for_each_possible_cpu(cpu)
> +		h->reply_map[cpu] = 0;
> +}

> +	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> +	if (!h->reply_map) {
> +		kfree(h);
> +		return NULL;
> +	}
> +	return h;

I really dislike this being open coded in drivers.  It really should
be a helper shared with the blk-mq map building that drivers just use.

For now just have a low-level blk_pci_map_queues that
blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
it might make sense to change the blk-mq callout to that low-level
prototype as well.
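
Something like the following minimal sketch of that low-level helper
(blk_pci_map_queues is only the suggested name, not an existing symbol,
and the error-return convention here is an assumption), where a non-zero
return tells the caller to apply its own fallback:

	#include <linux/pci.h>
	#include <linux/cpumask.h>

	int blk_pci_map_queues(struct pci_dev *pdev, unsigned int nr_queues,
			       unsigned int *map)
	{
		const struct cpumask *mask;
		unsigned int queue, cpu;

		for (queue = 0; queue < nr_queues; queue++) {
			/* affinity mask set up at MSI-X allocation time */
			mask = pci_irq_get_affinity(pdev, queue);
			if (!mask)
				return -ENOENT;

			for_each_cpu(cpu, mask)
				map[cpu] = queue;
		}
		return 0;
	}

On failure the caller applies its own fallback, which is what would let
hpsa and megaraid_sas keep their different defaults.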

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 3/4] scsi: introduce force_blk_mq
  2018-03-09  3:32 ` [PATCH V4 3/4] scsi: introduce force_blk_mq Ming Lei
@ 2018-03-10 10:10   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2018-03-10 10:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Omar Sandoval

On Fri, Mar 09, 2018 at 11:32:17AM +0800, Ming Lei wrote:
> From scsi driver view, it is a bit troublesome to support both blk-mq
> and non-blk-mq at the same time, especially when drivers need to support
> multi hw-queue.
> 
> This patch introduces 'force_blk_mq' to scsi_host_template so that drivers
> can provide blk-mq only support, so driver code can avoid the trouble
> for supporting both.
> 
> Cc: Omar Sandoval <osandov@fb.com>,
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hosts.c     | 1 +
>  include/scsi/scsi_host.h | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index 57bf43e34863..10f04b089392 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -477,6 +477,7 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize)
>  		shost->dma_boundary = 0xffffffff;
>  
>  	shost->use_blk_mq = scsi_use_blk_mq;
> +	shost->use_blk_mq = scsi_use_blk_mq || !!shost->hostt->force_blk_mq;

No need for the !! here.
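
That is, a sketch of the simplified assignment (|| already evaluates to
0 or 1, so the !! is redundant):

	shost->use_blk_mq = scsi_use_blk_mq || shost->hostt->force_blk_mq;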

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
  2018-03-09  3:32 ` [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity Ming Lei
@ 2018-03-10 10:15   ` Christoph Hellwig
  2018-03-12  9:00     ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-03-10 10:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen,
	Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Omar Sandoval, Paolo Bonzini

This looks generally fine to me:

Reviewed-by: Christoph Hellwig <hch@lst.de>

As a follow-on we should probably kill virtscsi_queuecommand_single and
thus virtscsi_host_template_single as well.
> Given storage IO is always C/S model, there isn't such issue with SCSI_MQ(blk-mq),

What does C/S mean here?

> @@ -580,10 +573,7 @@ static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
>  					struct scsi_cmnd *sc)
>  {
>  	struct virtio_scsi *vscsi = shost_priv(sh);
> -	struct virtio_scsi_target_state *tgt =
> -				scsi_target(sc->device)->hostdata;
>  
> -	atomic_inc(&tgt->reqs);
>  	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
>  }

>  static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
>  				       struct scsi_cmnd *sc)
>  {
>  	struct virtio_scsi *vscsi = shost_priv(sh);
> -	struct virtio_scsi_target_state *tgt =
> -				scsi_target(sc->device)->hostdata;
> -	struct virtio_scsi_vq *req_vq;
> -
> -	if (shost_use_blk_mq(sh))
> -		req_vq = virtscsi_pick_vq_mq(vscsi, sc);
> -	else
> -		req_vq = virtscsi_pick_vq(vscsi, tgt);
> +	struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);
>  
>  	return virtscsi_queuecommand(vscsi, req_vq, sc);

Given how virtscsi_pick_vq_mq works, virtscsi_queuecommand_single and
virtscsi_queuecommand_multi now have identical behavior.  That means
virtscsi_queuecommand_single should be removed, and
virtscsi_queuecommand_multi should be merged into virtscsi_queuecommand.
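
A rough sketch of that merge (assuming the current three-argument
internal function is renamed to __virtscsi_queuecommand; this is not the
actual patch):

	static int virtscsi_queuecommand(struct Scsi_Host *sh,
					 struct scsi_cmnd *sc)
	{
		struct virtio_scsi *vscsi = shost_priv(sh);
		/* always pick the vq via the blk-mq hw queue mapping */
		struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);

		return __virtscsi_queuecommand(vscsi, req_vq, sc);
	}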

> @@ -823,6 +768,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
>  	.target_alloc = virtscsi_target_alloc,
>  	.target_destroy = virtscsi_target_destroy,
>  	.track_queue_depth = 1,
> +	.force_blk_mq = 1,

This probably isn't strictly needed.  That being said, with your
change we could probably just drop virtscsi_host_template_single entirely.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-10 10:09   ` Christoph Hellwig
@ 2018-03-10 15:01     ` Ming Lei
  2018-03-12  7:52       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2018-03-10 15:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen, linux-scsi,
	linux-block, Meelis Roos, Don Brace, Kashyap Desai,
	Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	Artem Bityutskiy

On Sat, Mar 10, 2018 at 11:09:59AM +0100, Christoph Hellwig wrote:
> > +static void hpsa_setup_reply_map(struct ctlr_info *h)
> > +{
> > +	const struct cpumask *mask;
> > +	unsigned int queue, cpu;
> > +
> > +	for (queue = 0; queue < h->msix_vectors; queue++) {
> > +		mask = pci_irq_get_affinity(h->pdev, queue);
> > +		if (!mask)
> > +			goto fallback;
> > +
> > +		for_each_cpu(cpu, mask)
> > +			h->reply_map[cpu] = queue;
> > +	}
> > +	return;
> > +
> > +fallback:
> > +	for_each_possible_cpu(cpu)
> > +		h->reply_map[cpu] = 0;
> > +}
> 
> > +	h->reply_map = kzalloc(sizeof(*h->reply_map) * nr_cpu_ids, GFP_KERNEL);
> > +	if (!h->reply_map) {
> > +		kfree(h);
> > +		return NULL;
> > +	}
> > +	return h;
> 
> > I really dislike this being open coded in drivers.  It really should
> > be a helper shared with the blk-mq map building that drivers just use.
> 
> For now just have a low-level blk_pci_map_queues that
> blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
> it might make sense to change the blk-mq callout to that low-level
> prototype as well.

This way of selecting the reply queue is needed for non-scsi_mq too.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
  2018-03-09 22:14   ` Don Brace
  2018-03-10 10:09   ` Christoph Hellwig
@ 2018-03-12  7:37   ` Bityutskiy, Artem
  2018-03-12 15:39   ` Don Brace
  3 siblings, 0 replies; 21+ messages in thread
From: Bityutskiy, Artem @ 2018-03-12  7:37 UTC (permalink / raw)
  To: martin.petersen, James.Bottomley, axboe, ming.lei
  Cc: linux-block, snitzer, hch, hare, mroos, linux-scsi, don.brace,
	loberman, kashyap.desai

Linux-Regression-ID: lr#15a115

On Fri, 2018-03-09 at 11:32 +0800, Ming Lei wrote:
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, then one
> command's completion may not be notified.
> 
> This patch sets up a mapping between cpu and reply queue according to the irq
> affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
> table to choose the reply queue for queuing one command.
> 
> The chosen reply queue is then guaranteed to be active, which fixes the IO
> hang caused by using an inactive reply queue that has no online CPU mapped.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Meelis Roos <mroos@linux.ee>
> Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Tested-by: Don Brace <don.brace@microsemi.com>
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>

Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Link: https://lkml.kernel.org/r/1519311270.2535.53.camel@intel.com

These 2 patches make the Dell R640 regression that I reported go away.
Tested on top of v4.16-rc5, thanks!

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-10 15:01     ` Ming Lei
@ 2018-03-12  7:52       ` Christoph Hellwig
  2018-03-12  9:19         ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2018-03-12  7:52 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, James Bottomley, Jens Axboe,
	Martin K . Petersen, linux-scsi, linux-block, Meelis Roos,
	Don Brace, Kashyap Desai, Laurence Oberman, Mike Snitzer,
	Hannes Reinecke, Artem Bityutskiy

On Sat, Mar 10, 2018 at 11:01:43PM +0800, Ming Lei wrote:
> > I really dislike this being open coded in drivers.  It really should
> > be a helper shared with the blk-mq map building that drivers just use.
> > 
> > For now just have a low-level blk_pci_map_queues that
> > blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
> > it might make sense to change the blk-mq callout to that low-level
> > prototype as well.
> 
> This way of selecting the reply queue is needed for non-scsi_mq too.

Which still doesn't prevent you from using a common helper.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity
  2018-03-10 10:15   ` Christoph Hellwig
@ 2018-03-12  9:00     ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2018-03-12  9:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen, linux-scsi,
	linux-block, Meelis Roos, Don Brace, Kashyap Desai,
	Laurence Oberman, Mike Snitzer, Omar Sandoval, Paolo Bonzini

On Sat, Mar 10, 2018 at 11:15:20AM +0100, Christoph Hellwig wrote:
> This looks generally fine to me:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> As a follow-on we should probably kill virtscsi_queuecommand_single and
> thus virtscsi_host_template_single as well.
> > Given storage IO is always C/S model, there isn't such issue with SCSI_MQ(blk-mq),
> 
> What does C/S mean here?

Client–Server.

> 
> > @@ -580,10 +573,7 @@ static int virtscsi_queuecommand_single(struct Scsi_Host *sh,
> >  					struct scsi_cmnd *sc)
> >  {
> >  	struct virtio_scsi *vscsi = shost_priv(sh);
> > -	struct virtio_scsi_target_state *tgt =
> > -				scsi_target(sc->device)->hostdata;
> >  
> > -	atomic_inc(&tgt->reqs);
> >  	return virtscsi_queuecommand(vscsi, &vscsi->req_vqs[0], sc);
> >  }
> 
> >  static int virtscsi_queuecommand_multi(struct Scsi_Host *sh,
> >  				       struct scsi_cmnd *sc)
> >  {
> >  	struct virtio_scsi *vscsi = shost_priv(sh);
> > -	struct virtio_scsi_target_state *tgt =
> > -				scsi_target(sc->device)->hostdata;
> > -	struct virtio_scsi_vq *req_vq;
> > -
> > -	if (shost_use_blk_mq(sh))
> > -		req_vq = virtscsi_pick_vq_mq(vscsi, sc);
> > -	else
> > -		req_vq = virtscsi_pick_vq(vscsi, tgt);
> > +	struct virtio_scsi_vq *req_vq = virtscsi_pick_vq_mq(vscsi, sc);
> >  
> >  	return virtscsi_queuecommand(vscsi, req_vq, sc);
> 
> Given how virtscsi_pick_vq_mq works virtscsi_queuecommand_single and
> virtscsi_queuecommand_multi now have identical behavior.  That means
> virtscsi_queuecommand_single should be removed, and
> virtscsi_queuecommand_multi should be merged into virtscsi_queuecommand,

OK.

> 
> > @@ -823,6 +768,7 @@ static struct scsi_host_template virtscsi_host_template_single = {
> >  	.target_alloc = virtscsi_target_alloc,
> >  	.target_destroy = virtscsi_target_destroy,
> >  	.track_queue_depth = 1,
> > +	.force_blk_mq = 1,
> 
> This probably isn't strictly needed.  That being said, with your
> change we could probably just drop virtscsi_host_template_single entirely.
> 

OK.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-12  7:52       ` Christoph Hellwig
@ 2018-03-12  9:19         ` Ming Lei
  0 siblings, 0 replies; 21+ messages in thread
From: Ming Lei @ 2018-03-12  9:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: James Bottomley, Jens Axboe, Martin K . Petersen, linux-scsi,
	linux-block, Meelis Roos, Don Brace, Kashyap Desai,
	Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	Artem Bityutskiy

On Mon, Mar 12, 2018 at 08:52:02AM +0100, Christoph Hellwig wrote:
> On Sat, Mar 10, 2018 at 11:01:43PM +0800, Ming Lei wrote:
> > > I really dislike this being open coded in drivers.  It really should
> > > be a helper shared with the blk-mq map building that drivers just use.
> > > 
> > > For now just have a low-level blk_pci_map_queues that
> > > blk_mq_pci_map_queues, hpsa and megaraid can share.  In the long run
> > > it might make sense to change the blk-mq callout to that low-level
> > > prototype as well.
> > 
> > This way of selecting the reply queue is needed for non-scsi_mq too.
> 
> Which still doesn't prevent you from using a common helper.

The only common code is the following part:

+       for (queue = 0; queue < instance->msix_vectors; queue++) {
+               mask = pci_irq_get_affinity(instance->pdev, queue);
+               if (!mask)
+                       goto fallback;
+
+               for_each_cpu(cpu, mask)
+                       instance->reply_map[cpu] = queue;
+       }

For megaraid_sas, the fallback code needs to handle the mapping in the
following way for legacy vectors:

       for_each_possible_cpu(cpu)
               instance->reply_map[cpu] = cpu % instance->msix_vectors;
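
For illustration only, if a shared helper reported missing affinity info
with an error (as in the earlier blk_pci_map_queues sketch, a hypothetical
name), megaraid_sas could still keep that round-robin fallback on top
of it:

	if (blk_pci_map_queues(instance->pdev, instance->msix_vectors,
			       instance->reply_map) < 0) {
		/* legacy vectors: spread CPUs across vectors round-robin */
		for_each_possible_cpu(cpu)
			instance->reply_map[cpu] = cpu % instance->msix_vectors;
	}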


So I'm not sure it is worth a common helper, given there may not be
other potential users of it.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
  2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
                     ` (2 preceding siblings ...)
  2018-03-12  7:37   ` Bityutskiy, Artem
@ 2018-03-12 15:39   ` Don Brace
  3 siblings, 0 replies; 21+ messages in thread
From: Don Brace @ 2018-03-12 15:39 UTC (permalink / raw)
  To: Ming Lei, James Bottomley, Jens Axboe, Martin K . Petersen
  Cc: Christoph Hellwig, linux-scsi, linux-block, Meelis Roos,
	Kashyap Desai, Laurence Oberman, Mike Snitzer, Hannes Reinecke,
	James Bottomley, Artem Bityutskiy

> -----Original Message-----
> From: Ming Lei [mailto:ming.lei@redhat.com]
> Sent: Thursday, March 08, 2018 9:32 PM
> To: James Bottomley <James.Bottomley@HansenPartnership.com>; Jens Axboe
> <axboe@fb.com>; Martin K . Petersen <martin.petersen@oracle.com>
> Cc: Christoph Hellwig <hch@lst.de>; linux-scsi@vger.kernel.org; linux-
> block@vger.kernel.org; Meelis Roos <mroos@linux.ee>; Don Brace
> <don.brace@microsemi.com>; Kashyap Desai
> <kashyap.desai@broadcom.com>; Laurence Oberman
> <loberman@redhat.com>; Mike Snitzer <snitzer@redhat.com>; Ming Lei
> <ming.lei@redhat.com>; Hannes Reinecke <hare@suse.de>; James Bottomley
> <james.bottomley@hansenpartnership.com>; Artem Bityutskiy
> <artem.bityutskiy@intel.com>
> Subject: [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue
>
> From 84676c1f21 (genirq/affinity: assign vectors to all possible CPUs),
> one msix vector can be created without any online CPU mapped, then one
> command's completion may not be notified.
> 
> This patch sets up a mapping between cpu and reply queue according to the irq
> affinity info retrieved by pci_irq_get_affinity(), and uses this mapping
> table to choose the reply queue for queuing one command.
> 
> The chosen reply queue is then guaranteed to be active, which fixes the IO
> hang caused by using an inactive reply queue that has no online CPU mapped.
> 
> Cc: Hannes Reinecke <hare@suse.de>
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
> Cc: James Bottomley <james.bottomley@hansenpartnership.com>,
> Cc: Christoph Hellwig <hch@lst.de>,
> Cc: Don Brace <don.brace@microsemi.com>
> Cc: Kashyap Desai <kashyap.desai@broadcom.com>
> Cc: Laurence Oberman <loberman@redhat.com>
> Cc: Meelis Roos <mroos@linux.ee>
> Cc: Artem Bityutskiy <artem.bityutskiy@intel.com>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Tested-by: Laurence Oberman <loberman@redhat.com>
> Tested-by: Don Brace <don.brace@microsemi.com>
> Fixes: 84676c1f21e8 ("genirq/affinity: assign vectors to all possible CPUs")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---

Acked-by: Don Brace <don.brace@microsemi.com>
Tested-by: Don Brace <don.brace@microsemi.com>
   * Rebuilt test rig: applied the following patches to Linus's tree 4.16.0-rc4+:
                 [PATCH V4 1_4] scsi: hpsa: fix selection of reply queue - Ming Lei <ming.lei@redhat.com> - 2018-03-08 2132.eml
                 [PATCH V4 3_4] scsi: introduce force_blk_mq - Ming Lei <ming.lei@redhat.com> - 2018-03-08 2132.eml
        * fio tests on 6 LVs on P441 controller (fw 6.59) 5 days.
        * fio tests on 10 HBA disks on P431 (fw 4.54) controller. 3 days. (concurrent with P441 tests)


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-03-12 15:39 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-09  3:32 [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Ming Lei
2018-03-09  3:32 ` [PATCH V4 1/4] scsi: hpsa: fix selection of reply queue Ming Lei
2018-03-09 22:14   ` Don Brace
2018-03-10 10:09   ` Christoph Hellwig
2018-03-10 15:01     ` Ming Lei
2018-03-12  7:52       ` Christoph Hellwig
2018-03-12  9:19         ` Ming Lei
2018-03-12  7:37   ` Bityutskiy, Artem
2018-03-12 15:39   ` Don Brace
2018-03-09  3:32 ` [PATCH V4 2/4] scsi: megaraid_sas: " Ming Lei
2018-03-09 11:07   ` Kashyap Desai
2018-03-09 12:03     ` Ming Lei
2018-03-09 14:03       ` Kashyap Desai
2018-03-09  3:32 ` [PATCH V4 3/4] scsi: introduce force_blk_mq Ming Lei
2018-03-10 10:10   ` Christoph Hellwig
2018-03-09  3:32 ` [PATCH V4 4/4] scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity Ming Lei
2018-03-10 10:15   ` Christoph Hellwig
2018-03-12  9:00     ` Ming Lei
2018-03-09  7:00 ` [PATCH V4 0/4] SCSI: fix selection of reply(hw) queue Hannes Reinecke
2018-03-09  7:39   ` Ming Lei
2018-03-09 14:01 ` Meelis Roos
