Linux-Block Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V2 0/5] blk-mq: Wait for hctx inflight requests on CPU unplug
@ 2019-05-27 15:02 Ming Lei
  2019-05-27 15:02 ` [PATCH V2 1/5] scsi: select reply queue from request's CPU Ming Lei
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-27 15:02 UTC (permalink / raw)
  To: Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Hi,

blk-mq drivers often use managed IRQs, whose affinity is set up
automatically by the genirq core.

For managed IRQs, we need to make sure that there are no in-flight
requests when a managed IRQ is about to be shut down.

This patchset waits for in-flight requests associated with a
going-to-shutdown managed IRQ in blk-mq's CPU hotplug handler.

One special case is that some SCSI devices have multiple private
completion (reply) queues even though they only have one blk-mq hw
queue, and each private completion queue is associated with one managed
IRQ. Wait for in-flight requests for these SCSI devices too if the last
CPU of the completion queue is going to be shut down.

SCSI devices' internal commands aren't covered in this patchset;
they are much less important than requests from the blk-mq/SCSI
core.
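
As background on why per-queue draining is well defined here:
pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY) spreads the vectors so
that every CPU maps to exactly one reply queue. A minimal user-space
sketch of such a CPU-to-queue mapping (the modulo spread and the names
are illustrative only; the kernel derives the real mapping from the irq
affinity masks):

```c
/*
 * Illustrative only: spread nr_cpus CPUs over nr_queues reply queues,
 * approximating what genirq affinity spreading achieves for managed
 * IRQ vectors.  Every CPU maps to exactly one reply queue, which is
 * what makes draining a reply queue on CPU unplug well defined.
 */
static void build_reply_map(unsigned int *reply_map, unsigned int nr_cpus,
			    unsigned int nr_queues)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		reply_map[cpu] = cpu % nr_queues;
}
```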

V2:
	- cover private multiple completion (reply) queues
	- remove the timeout during waiting, because some drivers don't
	implement a proper .timeout handler.


Ming Lei (5):
  scsi: select reply queue from request's CPU
  blk-mq: introduce .complete_queue_affinity
  scsi: core: implement callback of .complete_queue_affinity
  scsi: implement .complete_queue_affinity
  blk-mq: Wait for hctx inflight requests on CPU unplug

 block/blk-mq-tag.c                          |  2 +-
 block/blk-mq-tag.h                          |  5 ++
 block/blk-mq.c                              | 94 +++++++++++++++++++--
 drivers/scsi/hisi_sas/hisi_sas_main.c       |  5 +-
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c      | 11 +++
 drivers/scsi/hpsa.c                         | 14 ++-
 drivers/scsi/megaraid/megaraid_sas_base.c   | 10 +++
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 +-
 drivers/scsi/mpt3sas/mpt3sas_base.c         | 16 ++--
 drivers/scsi/mpt3sas/mpt3sas_scsih.c        | 11 +++
 drivers/scsi/scsi_lib.c                     | 14 +++
 include/linux/blk-mq.h                      | 12 ++-
 include/scsi/scsi_cmnd.h                    | 11 +++
 include/scsi/scsi_host.h                    | 10 +++
 14 files changed, 197 insertions(+), 22 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 1/5] scsi: select reply queue from request's CPU
  2019-05-27 15:02 [PATCH V2 0/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
@ 2019-05-27 15:02 ` Ming Lei
  2019-05-28  5:43   ` Hannes Reinecke
  2019-05-28 10:33   ` John Garry
  2019-05-27 15:02 ` [PATCH V2 2/5] blk-mq: introduce .complete_queue_affinity Ming Lei
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-27 15:02 UTC (permalink / raw)
  To: Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

hisi_sas_v3_hw, hpsa, megaraid and mpt3sas use a single blk-mq hw queue
to submit requests, while using multiple private reply queues as
completion queues. The mapping between CPU and reply queue is set up via
pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY), just like the usual
blk-mq queue mapping.

These drivers always use the current CPU (raw_smp_processor_id) to figure
out the reply queue. Switch to using the request's CPU to get the reply
queue, so we can drain in-flight requests via blk-mq's API before the
last CPU of the reply queue goes offline.
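
The core of the change can be sketched outside the kernel (a
hypothetical cmnd_cpu() standing in for the new scsi_cmnd_cpu() helper;
rq_cpu < 0 models a command with no blk-mq request or mq_ctx attached):

```c
/*
 * Sketch of the fallback in the new scsi_cmnd_cpu() helper: use the
 * CPU recorded in the blk-mq request when one exists, otherwise fall
 * back to the submitting CPU (as these drivers did unconditionally
 * before this patch).  Draining is only reliable in the former case,
 * because blk-mq then knows which requests target which reply queue.
 */
static int cmnd_cpu(int rq_cpu, int submitting_cpu)
{
	return rq_cpu >= 0 ? rq_cpu : submitting_cpu;
}
```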

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hisi_sas/hisi_sas_main.c       |  5 +++--
 drivers/scsi/hpsa.c                         |  2 +-
 drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 ++--
 drivers/scsi/mpt3sas/mpt3sas_base.c         | 16 ++++++++--------
 include/scsi/scsi_cmnd.h                    | 11 +++++++++++
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
index 8a7feb8ed8d6..ab9d8e7bfc8e 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_main.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
@@ -471,9 +471,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
 		return -ECOMM;
 	}
 
+	/* only V3 hardware sets up .reply_map */
 	if (hisi_hba->reply_map) {
-		int cpu = raw_smp_processor_id();
-		unsigned int dq_index = hisi_hba->reply_map[cpu];
+		unsigned int dq_index = hisi_hba->reply_map[
+			scsi_cmnd_cpu(task->uldd_task)];
 
 		*dq_pointer = dq = &hisi_hba->dq[dq_index];
 	} else {
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 1bef1da273c2..72f9edb86752 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1145,7 +1145,7 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
 	dial_down_lockup_detection_during_fw_flash(h, c);
 	atomic_inc(&h->commands_outstanding);
 
-	reply_queue = h->reply_map[raw_smp_processor_id()];
+	reply_queue = h->reply_map[scsi_cmnd_cpu(c->scsi_cmd)];
 	switch (c->cmd_type) {
 	case CMD_IOACCEL1:
 		set_ioaccel1_performant_mode(h, c, reply_queue);
diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c
index 4dfa0685a86c..6bed77cfaf9a 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fusion.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c
@@ -2699,7 +2699,7 @@ megasas_build_ldio_fusion(struct megasas_instance *instance,
 	}
 
 	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->reply_map[raw_smp_processor_id()];
+		instance->reply_map[scsi_cmnd_cpu(scp)];
 
 	if (instance->adapter_type >= VENTURA_SERIES) {
 		/* FP for Optimal raid level 1.
@@ -3013,7 +3013,7 @@ megasas_build_syspd_fusion(struct megasas_instance *instance,
 	cmd->request_desc->SCSIIO.DevHandle = io_request->DevHandle;
 
 	cmd->request_desc->SCSIIO.MSIxIndex =
-		instance->reply_map[raw_smp_processor_id()];
+		instance->reply_map[scsi_cmnd_cpu(scmd)];
 
 	if (!fp_possible) {
 		/* system pd firmware path */
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 8aacbd1e7db2..8135e980f591 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -3266,7 +3266,7 @@ mpt3sas_base_get_reply_virt_addr(struct MPT3SAS_ADAPTER *ioc, u32 phys_addr)
 }
 
 static inline u8
-_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
+_base_get_msix_index(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd)
 {
 	/* Enables reply_queue load balancing */
 	if (ioc->msix_load_balance)
@@ -3274,7 +3274,7 @@ _base_get_msix_index(struct MPT3SAS_ADAPTER *ioc)
 		    base_mod64(atomic64_add_return(1,
 		    &ioc->total_io_cnt), ioc->reply_queue_count) : 0;
 
-	return ioc->cpu_msix_table[raw_smp_processor_id()];
+	return ioc->cpu_msix_table[scsi_cmnd_cpu(scmd)];
 }
 
 /**
@@ -3325,7 +3325,7 @@ mpt3sas_base_get_smid_scsiio(struct MPT3SAS_ADAPTER *ioc, u8 cb_idx,
 
 	smid = tag + 1;
 	request->cb_idx = cb_idx;
-	request->msix_io = _base_get_msix_index(ioc);
+	request->msix_io = _base_get_msix_index(ioc, scmd);
 	request->smid = smid;
 	INIT_LIST_HEAD(&request->chain_list);
 	return smid;
@@ -3498,7 +3498,7 @@ _base_put_smid_mpi_ep_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
 	_base_clone_mpi_to_sys_mem(mpi_req_iomem, (void *)mfp,
 					ioc->request_sz);
 	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3520,7 +3520,7 @@ _base_put_smid_scsi_io(struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 handle)
 
 
 	descriptor.SCSIIO.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3543,7 +3543,7 @@ mpt3sas_base_put_smid_fast_path(struct MPT3SAS_ADAPTER *ioc, u16 smid,
 
 	descriptor.SCSIIO.RequestFlags =
 	    MPI25_REQ_DESCRIPT_FLAGS_FAST_PATH_SCSI_IO;
-	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc);
+	descriptor.SCSIIO.MSIxIndex = _base_get_msix_index(ioc, NULL);
 	descriptor.SCSIIO.SMID = cpu_to_le16(smid);
 	descriptor.SCSIIO.DevHandle = cpu_to_le16(handle);
 	descriptor.SCSIIO.LMID = 0;
@@ -3607,7 +3607,7 @@ mpt3sas_base_put_smid_nvme_encap(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 
 	descriptor.Default.RequestFlags =
 		MPI26_REQ_DESCRIPT_FLAGS_PCIE_ENCAPSULATED;
-	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.Default.SMID = cpu_to_le16(smid);
 	descriptor.Default.LMID = 0;
 	descriptor.Default.DescriptorTypeDependent = 0;
@@ -3639,7 +3639,7 @@ mpt3sas_base_put_smid_default(struct MPT3SAS_ADAPTER *ioc, u16 smid)
 	}
 	request = (u64 *)&descriptor;
 	descriptor.Default.RequestFlags = MPI2_REQ_DESCRIPT_FLAGS_DEFAULT_TYPE;
-	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc);
+	descriptor.Default.MSIxIndex =  _base_get_msix_index(ioc, NULL);
 	descriptor.Default.SMID = cpu_to_le16(smid);
 	descriptor.Default.LMID = 0;
 	descriptor.Default.DescriptorTypeDependent = 0;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 76ed5e4acd38..ab60883c2c40 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -332,4 +332,15 @@ static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
 	return xfer_len;
 }
 
+static inline int scsi_cmnd_cpu(struct scsi_cmnd *scmd)
+{
+	if (!scmd || !scmd->request)
+		return raw_smp_processor_id();
+
+	if (!scmd->request->mq_ctx)
+		return raw_smp_processor_id();
+
+	return blk_mq_rq_cpu(scmd->request);
+}
+
 #endif /* _SCSI_SCSI_CMND_H */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 2/5] blk-mq: introduce .complete_queue_affinity
  2019-05-27 15:02 [PATCH V2 0/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
  2019-05-27 15:02 ` [PATCH V2 1/5] scsi: select reply queue from request's CPU Ming Lei
@ 2019-05-27 15:02 ` Ming Lei
  2019-05-27 15:02 ` [PATCH V2 3/5] scsi: core: implement callback of .complete_queue_affinity Ming Lei
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-27 15:02 UTC (permalink / raw)
  To: Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Some SCSI devices support a single hw queue (tag set) while allowing
multiple private completion queues for handling request delivery &
completion. The mapping between CPU and private completion queue is
set up via pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY), just
like the normal blk-mq queue mapping.

Introduce a .complete_queue_affinity callback for getting a
completion queue's affinity, so that we can drain in-flight requests
delivered from a completion queue if the last CPU of that completion
queue becomes offline.
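
The returned mask feeds a "last online CPU" check in the hotplug path.
With a uint64_t bitmask standing in for struct cpumask (illustrative
only, not the kernel cpumask API):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: draining is required when the CPU going offline is the last
 * online CPU in the completion queue's affinity mask.  uint64_t
 * bitmasks stand in for struct cpumask here.
 */
static bool must_drain(uint64_t queue_affinity, uint64_t online_mask,
		       int dying_cpu)
{
	/* Remove the CPU that is going away from the online set... */
	uint64_t remaining = online_mask & ~(UINT64_C(1) << dying_cpu);

	/* ...and drain if no online CPU can still serve this queue. */
	return (queue_affinity & remaining) == 0;
}
```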

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 include/linux/blk-mq.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 15d1aa53d96c..56f2e2ed62a7 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,7 +140,8 @@ typedef int (poll_fn)(struct blk_mq_hw_ctx *);
 typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
 typedef bool (busy_fn)(struct request_queue *);
 typedef void (complete_fn)(struct request *);
-
+typedef const struct cpumask *(hctx_complete_queue_affinity_fn)(
+		struct blk_mq_hw_ctx *, int);
 
 struct blk_mq_ops {
 	/*
@@ -207,6 +208,15 @@ struct blk_mq_ops {
 
 	map_queues_fn		*map_queues;
 
+	/*
+	 * Some SCSI devices support private completion queues; this
+	 * returns the affinity of the completion queue. The passed 'cpu'
+	 * has to be included in the completion queue's affinity cpumask
+	 * and is used to figure out the mapped reply queue. If NULL is
+	 * returned, this hctx has no private completion queues.
+	 */
+	hctx_complete_queue_affinity_fn *complete_queue_affinity;
+
 #ifdef CONFIG_BLK_DEBUG_FS
 	/*
 	 * Used by the debugfs implementation to show driver-specific
-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 3/5] scsi: core: implement callback of .complete_queue_affinity
  2019-05-27 15:02 [PATCH V2 0/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
  2019-05-27 15:02 ` [PATCH V2 1/5] scsi: select reply queue from request's CPU Ming Lei
  2019-05-27 15:02 ` [PATCH V2 2/5] blk-mq: introduce .complete_queue_affinity Ming Lei
@ 2019-05-27 15:02 ` Ming Lei
  2019-05-27 15:02 ` [PATCH V2 4/5] scsi: implement .complete_queue_affinity Ming Lei
  2019-05-27 15:02 ` [PATCH V2 5/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
  4 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-27 15:02 UTC (permalink / raw)
  To: Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Implement the SCSI core's .complete_queue_affinity callback to
support draining in-flight requests in case the SCSI HBA
supports multiple completion queues.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c  | 14 ++++++++++++++
 include/scsi/scsi_host.h | 10 ++++++++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 65d0a10c76ad..ac57dc98a8c0 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1750,6 +1750,19 @@ static int scsi_map_queues(struct blk_mq_tag_set *set)
 	return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
 }
 
+static const struct cpumask *
+scsi_complete_queue_affinity(struct blk_mq_hw_ctx *hctx, int cpu)
+{
+	struct request_queue *q = hctx->queue;
+	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
+
+	if (shost->hostt->complete_queue_affinity)
+		return shost->hostt->complete_queue_affinity(shost, cpu);
+
+	return NULL;
+}
+
 void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
 {
 	struct device *dev = shost->dma_dev;
@@ -1802,6 +1815,7 @@ static const struct blk_mq_ops scsi_mq_ops = {
 	.initialize_rq_fn = scsi_initialize_rq,
 	.busy		= scsi_mq_lld_busy,
 	.map_queues	= scsi_map_queues,
+	.complete_queue_affinity = scsi_complete_queue_affinity,
 };
 
 struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev)
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index a5fcdad4a03e..65ccac1429a1 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -268,6 +268,16 @@ struct scsi_host_template {
 	 */
 	int (* map_queues)(struct Scsi_Host *shost);
 
+	/*
+	 * This function lets the driver expose a completion queue's CPU
+	 * affinity to the block layer. @cpu is used for retrieving
+	 * the mapped completion queue.
+	 *
+	 * Status: OPTIONAL
+	 */
+	const struct cpumask * (* complete_queue_affinity)(struct Scsi_Host *,
+							   int cpu);
+
 	/*
 	 * This function determines the BIOS parameters for a given
 	 * harddisk.  These tend to be numbers that are made up by
-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 4/5] scsi: implement .complete_queue_affinity
  2019-05-27 15:02 [PATCH V2 0/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
                   ` (2 preceding siblings ...)
  2019-05-27 15:02 ` [PATCH V2 3/5] scsi: core: implement callback of .complete_queue_affinity Ming Lei
@ 2019-05-27 15:02 ` Ming Lei
  2019-05-27 15:02 ` [PATCH V2 5/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
  4 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-27 15:02 UTC (permalink / raw)
  To: Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Implement the .complete_queue_affinity callback for all in-tree drivers
that support private completion queues.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/hisi_sas/hisi_sas_v3_hw.c    | 11 +++++++++++
 drivers/scsi/hpsa.c                       | 12 ++++++++++++
 drivers/scsi/megaraid/megaraid_sas_base.c | 10 ++++++++++
 drivers/scsi/mpt3sas/mpt3sas_scsih.c      | 11 +++++++++++
 4 files changed, 44 insertions(+)

diff --git a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
index 49620c2411df..799ee15c8786 100644
--- a/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
+++ b/drivers/scsi/hisi_sas/hisi_sas_v3_hw.c
@@ -2896,6 +2896,16 @@ static void debugfs_snapshot_restore_v3_hw(struct hisi_hba *hisi_hba)
 	clear_bit(HISI_SAS_REJECT_CMD_BIT, &hisi_hba->flags);
 }
 
+static const struct cpumask *
+hisi_sas_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct hisi_hba *hisi_hba = shost_priv(sh);
+	unsigned reply_queue = hisi_hba->reply_map[cpu];
+
+	return pci_irq_get_affinity(hisi_hba->pci_dev,
+			reply_queue + BASE_VECTORS_V3_HW);
+}
+
 static struct scsi_host_template sht_v3_hw = {
 	.name			= DRV_NAME,
 	.module			= THIS_MODULE,
@@ -2917,6 +2927,7 @@ static struct scsi_host_template sht_v3_hw = {
 	.shost_attrs		= host_attrs_v3_hw,
 	.tag_alloc_policy	= BLK_TAG_ALLOC_RR,
 	.host_reset             = hisi_sas_host_reset,
+	.complete_queue_affinity = hisi_sas_complete_queue_affinity,
 };
 
 static const struct hisi_sas_hw hisi_sas_v3_hw = {
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index 72f9edb86752..87d37f945c76 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -271,6 +271,8 @@ static void hpsa_free_cmd_pool(struct ctlr_info *h);
 #define VPD_PAGE (1 << 8)
 #define HPSA_SIMPLE_ERROR_BITS 0x03
 
+static const struct cpumask *hpsa_complete_queue_affinity(
+		struct Scsi_Host *, int);
 static int hpsa_scsi_queue_command(struct Scsi_Host *h, struct scsi_cmnd *cmd);
 static void hpsa_scan_start(struct Scsi_Host *);
 static int hpsa_scan_finished(struct Scsi_Host *sh,
@@ -962,6 +964,7 @@ static struct scsi_host_template hpsa_driver_template = {
 	.name			= HPSA,
 	.proc_name		= HPSA,
 	.queuecommand		= hpsa_scsi_queue_command,
+	.complete_queue_affinity = hpsa_complete_queue_affinity,
 	.scan_start		= hpsa_scan_start,
 	.scan_finished		= hpsa_scan_finished,
 	.change_queue_depth	= hpsa_change_queue_depth,
@@ -4824,6 +4827,15 @@ static int hpsa_scsi_ioaccel_direct_map(struct ctlr_info *h,
 		cmd->cmnd, cmd->cmd_len, dev->scsi3addr, dev);
 }
 
+static const struct cpumask *
+hpsa_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct ctlr_info *h = shost_to_hba(sh);
+	unsigned reply_queue = h->reply_map[cpu];
+
+	return pci_irq_get_affinity(h->pdev, reply_queue);
+}
+
 /*
  * Set encryption parameters for the ioaccel2 request
  */
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 3dd1df472dc6..59b71e8f98a8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -3165,6 +3165,15 @@ megasas_fw_cmds_outstanding_show(struct device *cdev,
 	return snprintf(buf, PAGE_SIZE, "%d\n", atomic_read(&instance->fw_outstanding));
 }
 
+static const struct cpumask *
+megasas_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct megasas_instance *instance = (struct megasas_instance *)sh->hostdata;
+	unsigned reply_queue = instance->reply_map[cpu];
+
+	return pci_irq_get_affinity(instance->pdev, reply_queue);
+}
+
 static DEVICE_ATTR(fw_crash_buffer, S_IRUGO | S_IWUSR,
 	megasas_fw_crash_buffer_show, megasas_fw_crash_buffer_store);
 static DEVICE_ATTR(fw_crash_buffer_size, S_IRUGO,
@@ -3208,6 +3217,7 @@ static struct scsi_host_template megasas_template = {
 	.bios_param = megasas_bios_param,
 	.change_queue_depth = scsi_change_queue_depth,
 	.no_write_same = 1,
+	.complete_queue_affinity        = megasas_complete_queue_affinity,
 };
 
 /**
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 1ccfbc7eebe0..2db1d6fc4bda 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -10161,6 +10161,15 @@ scsih_scan_finished(struct Scsi_Host *shost, unsigned long time)
 	return 1;
 }
 
+static const struct cpumask *
+mpt3sas_complete_queue_affinity(struct Scsi_Host *sh, int cpu)
+{
+	struct MPT3SAS_ADAPTER *ioc = shost_priv(sh);
+	unsigned reply_queue = ioc->cpu_msix_table[cpu];
+
+	return pci_irq_get_affinity(ioc->pdev, reply_queue);
+}
+
 /* shost template for SAS 2.0 HBA devices */
 static struct scsi_host_template mpt2sas_driver_template = {
 	.module				= THIS_MODULE,
@@ -10189,6 +10198,7 @@ static struct scsi_host_template mpt2sas_driver_template = {
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
+	.complete_queue_affinity        = mpt3sas_complete_queue_affinity,
 };
 
 /* raid transport support for SAS 2.0 HBA devices */
@@ -10227,6 +10237,7 @@ static struct scsi_host_template mpt3sas_driver_template = {
 	.sdev_attrs			= mpt3sas_dev_attrs,
 	.track_queue_depth		= 1,
 	.cmd_size			= sizeof(struct scsiio_tracker),
+	.complete_queue_affinity        = mpt3sas_complete_queue_affinity,
 };
 
 /* raid transport support for SAS 3.0 HBA devices */
-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH V2 5/5] blk-mq: Wait for hctx inflight requests on CPU unplug
  2019-05-27 15:02 [PATCH V2 0/5] blk-mq: Wait for hctx inflight requests on CPU unplug Ming Lei
                   ` (3 preceding siblings ...)
  2019-05-27 15:02 ` [PATCH V2 4/5] scsi: implement .complete_queue_affinity Ming Lei
@ 2019-05-27 15:02 ` Ming Lei
  2019-05-28 16:50   ` John Garry
  4 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2019-05-27 15:02 UTC (permalink / raw)
  To: Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, John Garry, Keith Busch, Thomas Gleixner,
	Don Brace, Kashyap Desai, Sathya Prakash, Christoph Hellwig,
	Ming Lei

Managed interrupts cannot migrate their affinity when their CPUs go
offline. If a CPU is allowed to shut down before the commands dispatched
to its managed queues are returned, those commands won't be able to
complete through their irq handlers.

Wait in the CPU hotplug handler until all in-flight requests on the tags
are completed. Wait only once for each tag set, so we can save time in
the case of shared tags.

Based on the following patch from Keith, but using a simple delay-spin
instead:

https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/

Some SCSI devices may have a single blk-mq hw queue and multiple private
completion queues; for those, wait until all requests on the private
completion queue are completed.
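
The delay-spin itself reduces to a poll loop; a user-space sketch with a
plain counter standing in for blk_mq_tags_inflight_rqs() (simulated
completions replace the kernel's msleep(5) wait, so the loop terminates
deterministically):

```c
/*
 * Sketch of the drain loop in this patch: poll the in-flight count
 * until it reaches zero.  The kernel version calls
 * blk_mq_tags_inflight_rqs() and msleep(5)s between polls; here each
 * poll "completes" one request instead.
 */
static unsigned int drain_inflight(unsigned int *inflight)
{
	unsigned int polls = 0;

	while (*inflight) {
		(*inflight)--;	/* stand-in for requests completing */
		polls++;
	}
	return polls;
}
```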

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq-tag.c |  2 +-
 block/blk-mq-tag.h |  5 +++
 block/blk-mq.c     | 94 ++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 7513c8eaabee..b24334f99c5d 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -332,7 +332,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
  *		true to continue iterating tags, false to stop.
  * @priv:	Will be passed as second argument to @fn.
  */
-static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
+void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
 		busy_tag_iter_fn *fn, void *priv)
 {
 	if (tags->nr_reserved_tags)
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0b5a5a..9ce7606a87f0 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -19,6 +19,9 @@ struct blk_mq_tags {
 	struct request **rqs;
 	struct request **static_rqs;
 	struct list_head page_list;
+
+#define BLK_MQ_TAGS_DRAINED           0
+	unsigned long flags;
 };
 
 
@@ -35,6 +38,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
 extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
 void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 		void *priv);
+void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
+		busy_tag_iter_fn *fn, void *priv);
 
 static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
 						 struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 32b8ad3d341b..ab1fbfd48374 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2215,6 +2215,65 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
 	return -ENOMEM;
 }
 
+static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
+{
+	struct blk_mq_hw_ctx	*hctx =
+		hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+
+	if (hctx->tags)
+		clear_bit(BLK_MQ_TAGS_DRAINED, &hctx->tags->flags);
+
+	return 0;
+}
+
+struct blk_mq_inflight_rq_data {
+	unsigned cnt;
+	const struct cpumask *cpumask;
+};
+
+static bool blk_mq_count_inflight_rq(struct request *rq, void *data,
+				     bool reserved)
+{
+	struct blk_mq_inflight_rq_data *count = data;
+
+	if ((blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) &&
+			cpumask_test_cpu(blk_mq_rq_cpu(rq), count->cpumask))
+		count->cnt++;
+
+	return true;
+}
+
+unsigned blk_mq_tags_inflight_rqs(struct blk_mq_tags *tags,
+		const struct cpumask *completion_cpus)
+{
+	struct blk_mq_inflight_rq_data data = {
+		.cnt = 0,
+		.cpumask = completion_cpus,
+	};
+
+	blk_mq_all_tag_busy_iter(tags, blk_mq_count_inflight_rq, &data);
+
+	return data.cnt;
+}
+
+static void blk_mq_drain_inflight_rqs(struct blk_mq_tags *tags,
+		const struct cpumask *completion_cpus)
+{
+	if (!tags)
+		return;
+
+	/* Can't apply the optimization in case of private completion queues */
+	if (completion_cpus == cpu_all_mask &&
+			test_and_set_bit(BLK_MQ_TAGS_DRAINED, &tags->flags))
+		return;
+
+	while (1) {
+		if (!blk_mq_tags_inflight_rqs(tags, completion_cpus))
+			break;
+		msleep(5);
+	}
+}
+
 /*
  * 'cpu' is going away. splice any existing rq_list entries from this
  * software queue to the hw queue dispatch list, and ensure that it
@@ -2226,6 +2285,8 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	struct blk_mq_ctx *ctx;
 	LIST_HEAD(tmp);
 	enum hctx_type type;
+	struct request_queue *q;
+	const struct cpumask *cpumask = NULL, *completion_cpus;
 
 	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
@@ -2238,14 +2299,32 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	}
 	spin_unlock(&ctx->lock);
 
-	if (list_empty(&tmp))
-		return 0;
+	if (!list_empty(&tmp)) {
+		spin_lock(&hctx->lock);
+		list_splice_tail_init(&tmp, &hctx->dispatch);
+		spin_unlock(&hctx->lock);
 
-	spin_lock(&hctx->lock);
-	list_splice_tail_init(&tmp, &hctx->dispatch);
-	spin_unlock(&hctx->lock);
+		blk_mq_run_hw_queue(hctx, true);
+	}
+
+	/*
+	 * The interrupt for the current completion queue will be shut
+	 * down, so wait until all requests on this queue are completed.
+	 */
+	q = hctx->queue;
+	if (q->mq_ops->complete_queue_affinity)
+		cpumask = q->mq_ops->complete_queue_affinity(hctx, cpu);
+
+	if (!cpumask) {
+		cpumask = hctx->cpumask;
+		completion_cpus = cpu_all_mask;
+	} else {
+		completion_cpus = cpumask;
+	}
+
+	if (cpumask_first_and(cpumask, cpu_online_mask) >= nr_cpu_ids)
+		blk_mq_drain_inflight_rqs(hctx->tags, completion_cpus);
 
-	blk_mq_run_hw_queue(hctx, true);
 	return 0;
 }
 
@@ -3541,7 +3620,8 @@ EXPORT_SYMBOL(blk_mq_rq_cpu);
 
 static int __init blk_mq_init(void)
 {
-	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
+	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead",
+				blk_mq_hctx_notify_prepare,
 				blk_mq_hctx_notify_dead);
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/5] scsi: select reply queue from request's CPU
  2019-05-27 15:02 ` [PATCH V2 1/5] scsi: select reply queue from request's CPU Ming Lei
@ 2019-05-28  5:43   ` Hannes Reinecke
  2019-05-28 10:33   ` John Garry
  1 sibling, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-28  5:43 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	John Garry, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 5/27/19 5:02 PM, Ming Lei wrote:
> Hisi_sas_v3_hw, hpsa, megaraid and mpt3sas use single blk-mq hw queue
> to submit request, meantime apply multiple private reply queues served as
> completion queue. The mapping between CPU and reply queue is setup via
> pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY) just like the usual
> blk-mq queue mapping.
> 
> These drivers always use current CPU(raw_smp_processor_id) to figure out
> the reply queue. Switch to use request's CPU to get the reply queue,
> so we can drain in-flight request via blk-mq's API before the last CPU of
> the reply queue becomes offline.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   drivers/scsi/hisi_sas/hisi_sas_main.c       |  5 +++--
>   drivers/scsi/hpsa.c                         |  2 +-
>   drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 ++--
>   drivers/scsi/mpt3sas/mpt3sas_base.c         | 16 ++++++++--------
>   include/scsi/scsi_cmnd.h                    | 11 +++++++++++
>   5 files changed, 25 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH V2 1/5] scsi: select reply queue from request's CPU
  2019-05-27 15:02 ` [PATCH V2 1/5] scsi: select reply queue from request's CPU Ming Lei
  2019-05-28  5:43   ` Hannes Reinecke
@ 2019-05-28 10:33   ` John Garry
  2019-05-29  2:36     ` Ming Lei
  1 sibling, 1 reply; 20+ messages in thread
From: John Garry @ 2019-05-28 10:33 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 27/05/2019 16:02, Ming Lei wrote:
> Hisi_sas_v3_hw, hpsa, megaraid and mpt3sas use single blk-mq hw queue
> to submit request, meantime apply multiple private reply queues served as
> completion queue. The mapping between CPU and reply queue is setup via
> pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY) just like the usual
> blk-mq queue mapping.
>
> These drivers always use current CPU(raw_smp_processor_id) to figure out
> the reply queue. Switch to use request's CPU to get the reply queue,
> so we can drain in-flight request via blk-mq's API before the last CPU of
> the reply queue becomes offline.
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/hisi_sas/hisi_sas_main.c       |  5 +++--
>  drivers/scsi/hpsa.c                         |  2 +-
>  drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 ++--
>  drivers/scsi/mpt3sas/mpt3sas_base.c         | 16 ++++++++--------
>  include/scsi/scsi_cmnd.h                    | 11 +++++++++++
>  5 files changed, 25 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> index 8a7feb8ed8d6..ab9d8e7bfc8e 100644
> --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> @@ -471,9 +471,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
>  		return -ECOMM;
>  	}
>
> +	/* only V3 hardware setup .reply_map */
>  	if (hisi_hba->reply_map) {
> -		int cpu = raw_smp_processor_id();
> -		unsigned int dq_index = hisi_hba->reply_map[cpu];
> +		unsigned int dq_index = hisi_hba->reply_map[
> +			scsi_cmnd_cpu(task->uldd_task)];

Hi Ming,

There is a problem here. For ATA commands in libsas, task->uldd_task is 
ata_queued_cmd *, and not a scsi_cmnd *. It comes from 
https://elixir.bootlin.com/linux/v5.2-rc2/source/drivers/scsi/libsas/sas_ata.c#L212

Please see this later code, where we have this check:
	if (task->uldd_task) {
		struct ata_queued_cmd *qc;

		if (dev_is_sata(device)) {
			qc = task->uldd_task;
			scsi_cmnd = qc->scsicmd;
		} else {
			scsi_cmnd = task->uldd_task;
		}
	}
	rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);

I suppose that we could solve by finding scsi_cmnd * earlier in 
hisi_sas_task_prep().

>
>  		*dq_pointer = dq = &hisi_hba->dq[dq_index];
>  	} else {
> diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> index 1bef1da273c2..72f9edb86752 100644
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1145,7 +1145,7 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,

[snip]

> diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> index 76ed5e4acd38..ab60883c2c40 100644
> --- a/include/scsi/scsi_cmnd.h
> +++ b/include/scsi/scsi_cmnd.h
> @@ -332,4 +332,15 @@ static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
>  	return xfer_len;
>  }
>
> +static inline int scsi_cmnd_cpu(struct scsi_cmnd *scmd)
> +{
> +	if (!scmd || !scmd->request)
> +		return raw_smp_processor_id();
> +
> +	if (!scmd->request->mq_ctx)
> +		return raw_smp_processor_id();

nit: can we combine these tests? Or do you want a distinct check on 
scmd->request->mq_ctx, since blk_mq_rq_cpu() does not check it?

> +
> +	return blk_mq_rq_cpu(scmd->request);
> +}
> +
>  #endif /* _SCSI_SCSI_CMND_H */

Thanks

>




* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-27 15:02 ` [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug Ming Lei
@ 2019-05-28 16:50   ` John Garry
  2019-05-29  2:28     ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2019-05-28 16:50 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, Martin K . Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On 27/05/2019 16:02, Ming Lei wrote:
> Managed interrupts can not migrate affinity when their CPUs are offline.
> If the CPU is allowed to shutdown before they're returned, commands
> dispatched to managed queues won't be able to complete through their
> irq handlers.
>
> Wait in cpu hotplug handler until all inflight requests on the tags
> are completed or timeout. Wait once for each tags, so we can save time
> in case of shared tags.
>
> Based on the following patch from Keith, and use simple delay-spin
> instead.
>
> https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/
>
> Some SCSI devices may have single blk_mq hw queue and multiple private
> completion queues, and wait until all requests on the private completion
> queue are completed.

Hi Ming,

I'm a bit concerned that this approach won't work due to ordering: it 
seems that the IRQ would be shutdown prior to the CPU dead notification 
for the last CPU in the mask (where we attempt to drain the queue 
associated with the IRQ, which would require the IRQ to be still enabled).

I hope that you can tell me that I'm wrong...

Thanks,
John

>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq-tag.c |  2 +-
>  block/blk-mq-tag.h |  5 +++
>  block/blk-mq.c     | 94 ++++++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 93 insertions(+), 8 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 7513c8eaabee..b24334f99c5d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -332,7 +332,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
>   *		true to continue iterating tags, false to stop.
>   * @priv:	Will be passed as second argument to @fn.
>   */
> -static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
> +void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
>  		busy_tag_iter_fn *fn, void *priv)
>  {
>  	if (tags->nr_reserved_tags)
> diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
> index 61deab0b5a5a..9ce7606a87f0 100644
> --- a/block/blk-mq-tag.h
> +++ b/block/blk-mq-tag.h
> @@ -19,6 +19,9 @@ struct blk_mq_tags {
>  	struct request **rqs;
>  	struct request **static_rqs;
>  	struct list_head page_list;
> +
> +#define BLK_MQ_TAGS_DRAINED           0
> +	unsigned long flags;
>  };
>
>
> @@ -35,6 +38,8 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
>  extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
>  void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  		void *priv);
> +void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
> +		busy_tag_iter_fn *fn, void *priv);
>
>  static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
>  						 struct blk_mq_hw_ctx *hctx)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 32b8ad3d341b..ab1fbfd48374 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2215,6 +2215,65 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
>  	return -ENOMEM;
>  }
>
> +static int blk_mq_hctx_notify_prepare(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct blk_mq_hw_ctx	*hctx =
> +		hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
> +
> +	if (hctx->tags)
> +		clear_bit(BLK_MQ_TAGS_DRAINED, &hctx->tags->flags);
> +
> +	return 0;
> +}
> +
> +struct blk_mq_inflight_rq_data {
> +	unsigned cnt;
> +	const struct cpumask *cpumask;
> +};
> +
> +static bool blk_mq_count_inflight_rq(struct request *rq, void *data,
> +				     bool reserved)
> +{
> +	struct blk_mq_inflight_rq_data *count = data;
> +
> +	if ((blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) &&
> +			cpumask_test_cpu(blk_mq_rq_cpu(rq), count->cpumask))
> +		count->cnt++;
> +
> +	return true;
> +}
> +
> +unsigned blk_mq_tags_inflight_rqs(struct blk_mq_tags *tags,
> +		const struct cpumask *completion_cpus)
> +{
> +	struct blk_mq_inflight_rq_data data = {
> +		.cnt = 0,
> +		.cpumask = completion_cpus,
> +	};
> +
> +	blk_mq_all_tag_busy_iter(tags, blk_mq_count_inflight_rq, &data);
> +
> +	return data.cnt;
> +}
> +
> +static void blk_mq_drain_inflight_rqs(struct blk_mq_tags *tags,
> +		const struct cpumask *completion_cpus)
> +{
> +	if (!tags)
> +		return;
> +
> +	/* Can't apply the optimization in case of private completion queues */
> +	if (completion_cpus == cpu_all_mask &&
> +			test_and_set_bit(BLK_MQ_TAGS_DRAINED, &tags->flags))
> +		return;
> +
> +	while (1) {
> +		if (!blk_mq_tags_inflight_rqs(tags, completion_cpus))
> +			break;
> +		msleep(5);
> +	}
> +}
> +
>  /*
>   * 'cpu' is going away. splice any existing rq_list entries from this
>   * software queue to the hw queue dispatch list, and ensure that it
> @@ -2226,6 +2285,8 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	struct blk_mq_ctx *ctx;
>  	LIST_HEAD(tmp);
>  	enum hctx_type type;
> +	struct request_queue *q;
> +	const struct cpumask *cpumask = NULL, *completion_cpus;
>
>  	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
>  	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
> @@ -2238,14 +2299,32 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	}
>  	spin_unlock(&ctx->lock);
>
> -	if (list_empty(&tmp))
> -		return 0;
> +	if (!list_empty(&tmp)) {
> +		spin_lock(&hctx->lock);
> +		list_splice_tail_init(&tmp, &hctx->dispatch);
> +		spin_unlock(&hctx->lock);
>
> -	spin_lock(&hctx->lock);
> -	list_splice_tail_init(&tmp, &hctx->dispatch);
> -	spin_unlock(&hctx->lock);
> +		blk_mq_run_hw_queue(hctx, true);
> +	}
> +
> +	/*
> +	 * Interrupt for the current completion queue will be shutdown, so
> +	 * wait until all requests on this queue are completed.
> +	 */
> +	q = hctx->queue;
> +	if (q->mq_ops->complete_queue_affinity)
> +		cpumask = q->mq_ops->complete_queue_affinity(hctx, cpu);
> +
> +	if (!cpumask) {
> +		cpumask = hctx->cpumask;
> +		completion_cpus = cpu_all_mask;
> +	} else {
> +		completion_cpus = cpumask;
> +	}
> +
> +	if (cpumask_first_and(cpumask, cpu_online_mask) >= nr_cpu_ids)
> +		blk_mq_drain_inflight_rqs(hctx->tags, completion_cpus);
>
> -	blk_mq_run_hw_queue(hctx, true);
>  	return 0;
>  }
>
> @@ -3541,7 +3620,8 @@ EXPORT_SYMBOL(blk_mq_rq_cpu);
>
>  static int __init blk_mq_init(void)
>  {
> -	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
> +	cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead",
> +				blk_mq_hctx_notify_prepare,
>  				blk_mq_hctx_notify_dead);
>  	return 0;
>  }
>




* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-28 16:50   ` John Garry
@ 2019-05-29  2:28     ` Ming Lei
  2019-05-29  2:42       ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2019-05-29  2:28 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Martin K . Petersen, linux-block, James Bottomley,
	linux-scsi, Bart Van Assche, Hannes Reinecke, Keith Busch,
	Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig

On Tue, May 28, 2019 at 05:50:40PM +0100, John Garry wrote:
> On 27/05/2019 16:02, Ming Lei wrote:
> > Managed interrupts can not migrate affinity when their CPUs are offline.
> > If the CPU is allowed to shutdown before they're returned, commands
> > dispatched to managed queues won't be able to complete through their
> > irq handlers.
> > 
> > Wait in cpu hotplug handler until all inflight requests on the tags
> > are completed or timeout. Wait once for each tags, so we can save time
> > in case of shared tags.
> > 
> > Based on the following patch from Keith, and use simple delay-spin
> > instead.
> > 
> > https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/
> > 
> > Some SCSI devices may have single blk_mq hw queue and multiple private
> > completion queues, and wait until all requests on the private completion
> > queue are completed.
> 
> Hi Ming,
> 
> I'm a bit concerned that this approach won't work due to ordering: it seems
> that the IRQ would be shutdown prior to the CPU dead notification for the

Managed IRQ shutdown is run in irq_migrate_all_off_this_cpu(), which is
called in the callback of takedown_cpu(). And the CPU dead notification
is always sent after that CPU becomes offline, see cpuhp_invoke_callback().

> last CPU in the mask (where we attempt to drain the queue associated with
> the IRQ, which would require the IRQ to be still enabled).
> 
> I hope that you can tell me that I'm wrong...

Or add a one-line printk in both irq_migrate_all_off_this_cpu() and
blk_mq_hctx_notify_dead(), and you will see whether you are wrong.


Thanks, 
Ming


* Re: [PATCH V2 1/5] scsi: select reply queue from request's CPU
  2019-05-28 10:33   ` John Garry
@ 2019-05-29  2:36     ` Ming Lei
  0 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-29  2:36 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Martin K . Petersen, linux-block, James Bottomley,
	linux-scsi, Bart Van Assche, Hannes Reinecke, Keith Busch,
	Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig

On Tue, May 28, 2019 at 11:33:29AM +0100, John Garry wrote:
> On 27/05/2019 16:02, Ming Lei wrote:
> > Hisi_sas_v3_hw, hpsa, megaraid and mpt3sas use single blk-mq hw queue
> > to submit request, meantime apply multiple private reply queues served as
> > completion queue. The mapping between CPU and reply queue is setup via
> > pci_alloc_irq_vectors_affinity(PCI_IRQ_AFFINITY) just like the usual
> > blk-mq queue mapping.
> > 
> > These drivers always use current CPU(raw_smp_processor_id) to figure out
> > the reply queue. Switch to use request's CPU to get the reply queue,
> > so we can drain in-flight request via blk-mq's API before the last CPU of
> > the reply queue becomes offline.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/hisi_sas/hisi_sas_main.c       |  5 +++--
> >  drivers/scsi/hpsa.c                         |  2 +-
> >  drivers/scsi/megaraid/megaraid_sas_fusion.c |  4 ++--
> >  drivers/scsi/mpt3sas/mpt3sas_base.c         | 16 ++++++++--------
> >  include/scsi/scsi_cmnd.h                    | 11 +++++++++++
> >  5 files changed, 25 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c
> > index 8a7feb8ed8d6..ab9d8e7bfc8e 100644
> > --- a/drivers/scsi/hisi_sas/hisi_sas_main.c
> > +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c
> > @@ -471,9 +471,10 @@ static int hisi_sas_task_prep(struct sas_task *task,
> >  		return -ECOMM;
> >  	}
> > 
> > +	/* only V3 hardware setup .reply_map */
> >  	if (hisi_hba->reply_map) {
> > -		int cpu = raw_smp_processor_id();
> > -		unsigned int dq_index = hisi_hba->reply_map[cpu];
> > +		unsigned int dq_index = hisi_hba->reply_map[
> > +			scsi_cmnd_cpu(task->uldd_task)];
> 
> Hi Ming,
> 
> There is a problem here. For ATA commands in libsas, task->uldd_task is
> ata_queued_cmd *, and not a scsi_cmnd *. It comes from https://elixir.bootlin.com/linux/v5.2-rc2/source/drivers/scsi/libsas/sas_ata.c#L212
> 

Yeah, that is one problem.

> Please see this later code, where we have this check:
> 	if (task->uldd_task) {
> 		struct ata_queued_cmd *qc;
> 
> 		if (dev_is_sata(device)) {
> 			qc = task->uldd_task;
> 			scsi_cmnd = qc->scsicmd;
> 		} else {
> 			scsi_cmnd = task->uldd_task;
> 		}
> 	}
> 	rc  = hisi_sas_slot_index_alloc(hisi_hba, scsi_cmnd);
> 
> I suppose that we could solve by finding scsi_cmnd * earlier in
> hisi_sas_task_prep().

Yeah, it can be fixed easily, either that way or by moving the delivery
queue selection after .slot_index_alloc.

> 
> > 
> >  		*dq_pointer = dq = &hisi_hba->dq[dq_index];
> >  	} else {
> > diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
> > index 1bef1da273c2..72f9edb86752 100644
> > --- a/drivers/scsi/hpsa.c
> > +++ b/drivers/scsi/hpsa.c
> > @@ -1145,7 +1145,7 @@ static void __enqueue_cmd_and_start_io(struct ctlr_info *h,
> 
> [snip]
> 
> > diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
> > index 76ed5e4acd38..ab60883c2c40 100644
> > --- a/include/scsi/scsi_cmnd.h
> > +++ b/include/scsi/scsi_cmnd.h
> > @@ -332,4 +332,15 @@ static inline unsigned scsi_transfer_length(struct scsi_cmnd *scmd)
> >  	return xfer_len;
> >  }
> > 
> > +static inline int scsi_cmnd_cpu(struct scsi_cmnd *scmd)
> > +{
> > +	if (!scmd || !scmd->request)
> > +		return raw_smp_processor_id();
> > +
> > +	if (!scmd->request->mq_ctx)
> > +		return raw_smp_processor_id();
> 
> nit: can we combine these tests? Or do you want a distinct check on

OK.

> scmd->request->mq_ctx, since blk_mq_rq_cpu() does not check it?

blk_mq_rq_cpu() needn't check it; however, SCSI has to run the check
because some requests may not have .mq_ctx, such as reset requests.

Thanks,
Ming


* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-29  2:28     ` Ming Lei
@ 2019-05-29  2:42       ` Ming Lei
  2019-05-29  9:42         ` John Garry
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2019-05-29  2:42 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Martin K . Petersen, linux-block, James Bottomley,
	linux-scsi, Bart Van Assche, Hannes Reinecke, Keith Busch,
	Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig

On Wed, May 29, 2019 at 10:28:52AM +0800, Ming Lei wrote:
> On Tue, May 28, 2019 at 05:50:40PM +0100, John Garry wrote:
> > On 27/05/2019 16:02, Ming Lei wrote:
> > > Managed interrupts can not migrate affinity when their CPUs are offline.
> > > If the CPU is allowed to shutdown before they're returned, commands
> > > dispatched to managed queues won't be able to complete through their
> > > irq handlers.
> > > 
> > > Wait in cpu hotplug handler until all inflight requests on the tags
> > > are completed or timeout. Wait once for each tags, so we can save time
> > > in case of shared tags.
> > > 
> > > Based on the following patch from Keith, and use simple delay-spin
> > > instead.
> > > 
> > > https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/
> > > 
> > > Some SCSI devices may have single blk_mq hw queue and multiple private
> > > completion queues, and wait until all requests on the private completion
> > > queue are completed.
> > 
> > Hi Ming,
> > 
> > I'm a bit concerned that this approach won't work due to ordering: it seems
> > that the IRQ would be shutdown prior to the CPU dead notification for the
> 
> Managed IRQ shutdown is run in irq_migrate_all_off_this_cpu(), which is
> called in the callback of takedown_cpu(). And the CPU dead notification
> is always sent after that CPU becomes offline, see cpuhp_invoke_callback().

Hmm, looks like we are both saying the same thing.

Yeah, it is too late to drain requests in the cpu hotplug DEAD handler,
maybe we can try to move managed IRQ shutdown after sending the dead
notification.

I need to think of it further.

Thanks,
Ming


* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-29  2:42       ` Ming Lei
@ 2019-05-29  9:42         ` John Garry
  2019-05-29 10:10           ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2019-05-29  9:42 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Martin K . Petersen, linux-block, James Bottomley,
	linux-scsi, Bart Van Assche, Hannes Reinecke, Keith Busch,
	Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig

On 29/05/2019 03:42, Ming Lei wrote:
> On Wed, May 29, 2019 at 10:28:52AM +0800, Ming Lei wrote:
>> On Tue, May 28, 2019 at 05:50:40PM +0100, John Garry wrote:
>>> On 27/05/2019 16:02, Ming Lei wrote:
>>>> Managed interrupts can not migrate affinity when their CPUs are offline.
>>>> If the CPU is allowed to shutdown before they're returned, commands
>>>> dispatched to managed queues won't be able to complete through their
>>>> irq handlers.
>>>>
>>>> Wait in cpu hotplug handler until all inflight requests on the tags
>>>> are completed or timeout. Wait once for each tags, so we can save time
>>>> in case of shared tags.
>>>>
>>>> Based on the following patch from Keith, and use simple delay-spin
>>>> instead.
>>>>
>>>> https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/
>>>>
>>>> Some SCSI devices may have single blk_mq hw queue and multiple private
>>>> completion queues, and wait until all requests on the private completion
>>>> queue are completed.
>>>
>>> Hi Ming,
>>>
>>> I'm a bit concerned that this approach won't work due to ordering: it seems
>>> that the IRQ would be shutdown prior to the CPU dead notification for the
>>
>> Managed IRQ shutdown is run in irq_migrate_all_off_this_cpu(), which is
>> called in the callback of takedown_cpu(). And the CPU dead notification
>> is always sent after that CPU becomes offline, see cpuhp_invoke_callback().
>
> Hammm, looks we both say same thing.
>
> Yeah, it is too late to drain requests in the cpu hotplug DEAD handler,
> maybe we can try to move managed IRQ shutdown after sending the dead
> notification.
>

Even if the IRQ is shutdown later, all CPUs would still be dead, so none 
available to receive the interrupt or do the work for draining the queue.

> I need to think of it further.

It would seem that we just need to be informed of CPU offlining earlier, 
and plug the drain in there.

>

Cheers,
John

> Thanks,
> Ming
>
> .
>




* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-29  9:42         ` John Garry
@ 2019-05-29 10:10           ` Ming Lei
  2019-05-29 15:33             ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2019-05-29 10:10 UTC (permalink / raw)
  To: John Garry
  Cc: Jens Axboe, Martin K . Petersen, linux-block, James Bottomley,
	linux-scsi, Bart Van Assche, Hannes Reinecke, Keith Busch,
	Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig

On Wed, May 29, 2019 at 10:42:00AM +0100, John Garry wrote:
> On 29/05/2019 03:42, Ming Lei wrote:
> > On Wed, May 29, 2019 at 10:28:52AM +0800, Ming Lei wrote:
> > > On Tue, May 28, 2019 at 05:50:40PM +0100, John Garry wrote:
> > > > On 27/05/2019 16:02, Ming Lei wrote:
> > > > > Managed interrupts can not migrate affinity when their CPUs are offline.
> > > > > If the CPU is allowed to shutdown before they're returned, commands
> > > > > dispatched to managed queues won't be able to complete through their
> > > > > irq handlers.
> > > > > 
> > > > > Wait in cpu hotplug handler until all inflight requests on the tags
> > > > > are completed or timeout. Wait once for each tags, so we can save time
> > > > > in case of shared tags.
> > > > > 
> > > > > Based on the following patch from Keith, and use simple delay-spin
> > > > > instead.
> > > > > 
> > > > > https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/
> > > > > 
> > > > > Some SCSI devices may have single blk_mq hw queue and multiple private
> > > > > completion queues, and wait until all requests on the private completion
> > > > > queue are completed.
> > > > 
> > > > Hi Ming,
> > > > 
> > > > I'm a bit concerned that this approach won't work due to ordering: it seems
> > > > that the IRQ would be shutdown prior to the CPU dead notification for the
> > > 
> > > Managed IRQ shutdown is run in irq_migrate_all_off_this_cpu(), which is
> > > called in the callback of takedown_cpu(). And the CPU dead notification
> > > is always sent after that CPU becomes offline, see cpuhp_invoke_callback().
> > 
> > Hammm, looks we both say same thing.
> > 
> > Yeah, it is too late to drain requests in the cpu hotplug DEAD handler,
> > maybe we can try to move managed IRQ shutdown after sending the dead
> > notification.
> > 
> 
> Even if the IRQ is shutdown later, all CPUs would still be dead, so none
> available to receive the interrupt or do the work for draining the queue.
> 
> > I need to think of it further.
> 
> It would seem that we just need to be informed of CPU offlining earlier, and
> plug the drain in there.

Yes, it looks like blk-mq has to be notified before unplugging a CPU for this
issue.

And we should be careful to handle the multiple reply queue case, given the queue
shouldn't be stopped or quiesced because other reply queues are still active.

The new CPUHP state for blk-mq should be invoked after the to-be-offline
CPU is quiesced and before it becomes offline.

Thanks,
Ming


* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-29 10:10           ` Ming Lei
@ 2019-05-29 15:33             ` Ming Lei
  2019-05-29 16:10               ` John Garry
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2019-05-29 15:33 UTC (permalink / raw)
  To: Ming Lei
  Cc: John Garry, Jens Axboe, Martin K . Petersen, linux-block,
	James Bottomley, Linux SCSI List, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Wed, May 29, 2019 at 6:11 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, May 29, 2019 at 10:42:00AM +0100, John Garry wrote:
> > On 29/05/2019 03:42, Ming Lei wrote:
> > > On Wed, May 29, 2019 at 10:28:52AM +0800, Ming Lei wrote:
> > > > On Tue, May 28, 2019 at 05:50:40PM +0100, John Garry wrote:
> > > > > On 27/05/2019 16:02, Ming Lei wrote:
> > > > > > Managed interrupts can not migrate affinity when their CPUs are offline.
> > > > > > If the CPU is allowed to shutdown before they're returned, commands
> > > > > > dispatched to managed queues won't be able to complete through their
> > > > > > irq handlers.
> > > > > >
> > > > > > Wait in cpu hotplug handler until all inflight requests on the tags
> > > > > > are completed or timeout. Wait once for each tags, so we can save time
> > > > > > in case of shared tags.
> > > > > >
> > > > > > Based on the following patch from Keith, and use simple delay-spin
> > > > > > instead.
> > > > > >
> > > > > > https://lore.kernel.org/linux-block/20190405215920.27085-1-keith.busch@intel.com/
> > > > > >
> > > > > > Some SCSI devices may have single blk_mq hw queue and multiple private
> > > > > > completion queues, and wait until all requests on the private completion
> > > > > > queue are completed.
> > > > >
> > > > > Hi Ming,
> > > > >
> > > > > I'm a bit concerned that this approach won't work due to ordering: it seems
> > > > > that the IRQ would be shutdown prior to the CPU dead notification for the
> > > >
> > > > Managed IRQ shutdown is run in irq_migrate_all_off_this_cpu(), which is
> > > > called in the callback of takedown_cpu(). And the CPU dead notification
> > > > is always sent after that CPU becomes offline, see cpuhp_invoke_callback().
> > >
> > > Hammm, looks we both say same thing.
> > >
> > > Yeah, it is too late to drain requests in the cpu hotplug DEAD handler,
> > > maybe we can try to move managed IRQ shutdown after sending the dead
> > > notification.
> > >
> >
> > Even if the IRQ is shutdown later, all CPUs would still be dead, so none
> > available to receive the interrupt or do the work for draining the queue.
> >
> > > I need to think of it further.
> >
> > It would seem that we just need to be informed of CPU offlining earlier, and
> > plug the drain in there.
>
> Yes, looks blk-mq has to be notified before unplugging CPU for this
> issue.
>
> And we should be careful to handle the multiple reply queue case, given the queue
> shouldn't be stopped or quieseced because other reply queues are still active.
>
> The new CPUHP state for blk-mq should be invoked after the to-be-offline
> CPU is quiesced and before it becomes offline.

Hi John,

Thinking of this issue further, so far, one doable solution is to
expose reply queues
as blk-mq hw queues, as done by the following patchset:

https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@redhat.com/

In which global host-wide tags are shared for all blk-mq hw queues.

Also we can remove all the reply_map stuff in drivers, then solve the problem of
draining in-flight requests during unplugging CPU in a generic approach.

Last time, it was reported that the patchset causes performance regression,
which is actually caused by duplicated io accounting in
blk_mq_queue_tag_busy_iter(),
which should be fixed easily.

What do you think of this approach?

Thanks,
Ming Lei


* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-29 15:33             ` Ming Lei
@ 2019-05-29 16:10               ` John Garry
  2019-05-30  2:28                 ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: John Garry @ 2019-05-29 16:10 UTC (permalink / raw)
  To: Ming Lei, Ming Lei
  Cc: Jens Axboe, Martin K . Petersen, linux-block, James Bottomley,
	Linux SCSI List, Bart Van Assche, Hannes Reinecke, Keith Busch,
	Thomas Gleixner, Don Brace, Kashyap Desai, Sathya Prakash,
	Christoph Hellwig


>>
>> And we should be careful to handle the multiple reply queue case, given the queue
>> shouldn't be stopped or quieseced because other reply queues are still active.
>>
>> The new CPUHP state for blk-mq should be invoked after the to-be-offline
>> CPU is quiesced and before it becomes offline.
>
> Hi John,
>

Hi Ming,

> Thinking of this issue further, so far, one doable solution is to
> expose reply queues
> as blk-mq hw queues, as done by the following patchset:
>
> https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@redhat.com/

I thought that this patchset had fundamental issues, in terms of working 
for all types of hosts. FYI, I did the backport of latest hisi_sas_v3 to 
v4.15 with this patchset (as you may have noticed in my git send 
mistake), but we have not got to test it yet.

On a related topic, we did test exposing reply queues as blk-mq hw 
queues and generating the host-wide tag internally in the LLDD with 
sbitmap, and unfortunately we were experiencing a significant 
performance hit, like 2300K -> 1800K IOPs for 4K read.

We need to test this further. I don't understand why we get such a big hit.

>
> In which global host-wide tags are shared for all blk-mq hw queues.
>
> Also we can remove all the reply_map stuff in drivers, then solve the problem of
> draining in-flight requests during unplugging CPU in a generic approach.

So you're saying that removing this reply queue stuff can make the 
solution to the problem more generic, but do you have an idea of the 
overall solution?

>
> Last time, it was reported that the patchset causes performance regression,
> which is actually caused by duplicated io accounting in
> blk_mq_queue_tag_busy_iter(),
> which should be fixed easily.
>
> What do you think of this approach?

It would still be good to have a forward port of this patchset for 
testing, if we're serious about it, or at least to have this bug you 
mention fixed.

thanks again,
John

>
> Thanks,
> Ming Lei
>
> .
>




* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-29 16:10               ` John Garry
@ 2019-05-30  2:28                 ` Ming Lei
  2019-05-30  4:11                   ` Ming Lei
  2019-05-30  9:31                   ` John Garry
  0 siblings, 2 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-30  2:28 UTC (permalink / raw)
  To: John Garry
  Cc: Ming Lei, Jens Axboe, Martin K . Petersen, linux-block,
	James Bottomley, Linux SCSI List, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Wed, May 29, 2019 at 05:10:38PM +0100, John Garry wrote:
> 
> > > 
> > > And we should be careful to handle the multiple reply queue case, given the queue
> > > shouldn't be stopped or quieseced because other reply queues are still active.
> > > 
> > > The new CPUHP state for blk-mq should be invoked after the to-be-offline
> > > CPU is quiesced and before it becomes offline.
> > 
> > Hi John,
> > 
> 
> Hi Ming,
> 
> > Thinking of this issue further, so far, one doable solution is to
> > expose reply queues
> > as blk-mq hw queues, as done by the following patchset:
> > 
> > https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@redhat.com/
> 
> I thought that this patchset had fundamental issues, in terms of working for
> all types of hosts. FYI, I did the backport of latest hisi_sas_v3 to v4.15

Could you explain a bit more about the fundamental issues for all types of
hosts?

It is just for hosts with multiple reply queues, such as hisi_sas v3,
megaraid_sas, mpt3sas and hpsa.

> with this patchset (as you may have noticed in my git send mistake), but we
> have not got to test it yet.
> 
> On a related topic, we did test exposing reply queues as blk-mq hw queues
> and generating the host-wide tag internally in the LLDD with sbitmap, and
> unfortunately we were experiencing a significant performance hit, like 2300K
> -> 1800K IOPs for 4K read.
> 
> We need to test this further. I don't understand why we get such a big hit.

In theory no performance regression should be introduced; it happens
because blk_mq_queue_tag_busy_iter() iterates over the same shared tags
multiple times, which can be fixed easily.
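The double-accounting can be modeled in userspace (a hypothetical
sketch, not the kernel code): when all hw queues share one host-wide tag
set, walking the tags once per hctx visits every in-flight request
nr_hw_queues times, inflating the count by that factor; walking the
shared set only once restores the correct number.

```python
# Userspace model (hypothetical) of the shared-tags accounting bug:
# with host-wide tags, a per-hctx iteration over the same tag set
# counts each in-flight request once per hw queue.

def busy_iter_naive(nr_hw_queues, inflight_tags):
    # Buggy: walk the shared tag set once for every hctx.
    count = 0
    for _hctx in range(nr_hw_queues):
        count += len(inflight_tags)
    return count

def busy_iter_fixed(nr_hw_queues, inflight_tags):
    # Fixed: shared tags are walked once, regardless of hw queue count.
    return len(inflight_tags)

inflight = {3, 17, 42}                      # three requests in flight, host-wide
assert busy_iter_naive(4, inflight) == 12   # 4x over-count
assert busy_iter_fixed(4, inflight) == 3
```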

> 
> > 
> > In which global host-wide tags are shared for all blk-mq hw queues.
> > 
> > Also we can remove all the reply_map stuff in drivers, then solve the problem of
> > draining in-flight requests during unplugging CPU in a generic approach.
> 
> So you're saying that removing this reply queue stuff can make the solution
> to the problem more generic, but do you have an idea of the overall
> solution?

1) convert the reply queues into blk-mq hw queues first

2) then all drivers are in the same position w.r.t. handling requests
vs. unplugging a CPU (shutting down its managed IRQ)

The current handling in blk_mq_hctx_notify_dead() is actually wrong:
at that time all CPUs mapped to the hctx are dead, yet
blk_mq_run_hw_queue() still dispatches requests on the driver's hw
queue, and the completion interrupts for those requests target the DEAD
CPUs mapped to this hctx, so the interrupts are lost.

Frankly speaking, the above 2nd problem is still hard to solve.

1) take_cpu_down() shuts down the managed IRQs first, then runs the
teardown callbacks for states in [CPUHP_AP_ONLINE, CPUHP_AP_OFFLINE) on
the to-be-offline CPU

2) However, all runnable tasks are removed from the CPU only in the
teardown callback for CPUHP_AP_SCHED_STARTING, which runs after the
managed IRQs are shut down. That said, it is hard to avoid new requests
being queued to a hctx whose CPUs are all DEAD.

3) we don't support freezing the queue for a specific hctx yet, and
such a mechanism may not be accepted because of the extra cost in the
fast path

4) once a request is allocated, it should be submitted to the driver no
matter whether CPU hotplug happens or not. Or should we free it and
re-allocate a new request on a proper sw/hw queue?

> 
> > 
> > Last time, it was reported that the patchset causes performance regression,
> > which is actually caused by duplicated io accounting in
> > blk_mq_queue_tag_busy_iter(),
> > which should be fixed easily.
> > 
> > What do you think of this approach?
> 
> It would still be good to have a forward port of this patchset for testing,
> if we're serious about it. Or at least this bug you mention fixed.

I plan to make this patchset workable on 5.2-rc for your test first.


Thanks,
Ming


* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-30  2:28                 ` Ming Lei
@ 2019-05-30  4:11                   ` Ming Lei
  2019-05-30  9:31                   ` John Garry
  1 sibling, 0 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-30  4:11 UTC (permalink / raw)
  To: Ming Lei
  Cc: John Garry, Jens Axboe, Martin K . Petersen, linux-block,
	James Bottomley, Linux SCSI List, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Thu, May 30, 2019 at 10:28 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Wed, May 29, 2019 at 05:10:38PM +0100, John Garry wrote:
> >
> > > >
> > > > And we should be careful to handle the multiple reply queue case, given the queue
> > > > shouldn't be stopped or quieseced because other reply queues are still active.
> > > >
> > > > The new CPUHP state for blk-mq should be invoked after the to-be-offline
> > > > CPU is quiesced and before it becomes offline.
> > >
> > > Hi John,
> > >
> >
> > Hi Ming,
> >
> > > Thinking of this issue further, so far, one doable solution is to
> > > expose reply queues
> > > as blk-mq hw queues, as done by the following patchset:
> > >
> > > https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@redhat.com/
> >
> > I thought that this patchset had fundamental issues, in terms of working for
> > all types of hosts. FYI, I did the backport of latest hisi_sas_v3 to v4.15
>
> Could you explain it a bit about the fundamental issues for all types of
> host?
>
> It is just for hosts with multiple reply queues, such as hisi_sas v3,
> megaraid_sas, mpt3sas and hpsa.
>
> > with this patchset (as you may have noticed in my git send mistake), but we
> > have not got to test it yet.
> >
> > On a related topic, we did test exposing reply queues as blk-mq hw queues
> > and generating the host-wide tag internally in the LLDD with sbitmap, and
> > unfortunately we were experiencing a significant performance hit, like 2300K
> > -> 1800K IOPs for 4K read.
> >
> > We need to test this further. I don't understand why we get such a big hit.
>
> The performance regression shouldn't have been introduced in theory, and it is
> because blk_mq_queue_tag_busy_iter() iterates over the same duplicated tags multiple
> times, which can be fixed easily.
>
> >
> > >
> > > In which global host-wide tags are shared for all blk-mq hw queues.
> > >
> > > Also we can remove all the reply_map stuff in drivers, then solve the problem of
> > > draining in-flight requests during unplugging CPU in a generic approach.
> >
> > So you're saying that removing this reply queue stuff can make the solution
> > to the problem more generic, but do you have an idea of the overall
> > solution?
>
> 1) convert reply queue into blk-mq hw queue first
>
> 2) then all drivers are in same position wrt. handling requests vs.
> unplugging CPU (shutdown managed IRQ)
>
> The current handling in blk_mq_hctx_notify_dead() is actually wrong,
> at that time, all CPUs on the hctx are dead, blk_mq_run_hw_queue()
> still dispatches requests on driver's hw queue, and driver is invisible
> to DEAD CPUs mapped to this hctx, and finally interrupt for these
> requests on the hctx are lost.
>
> Frankly speaking, the above 2nd problem is still hard to solve.
>
> 1) take_cpu_down() shutdown managed IRQ first, then run teardown callback
> for states in [CPUHP_AP_ONLINE, CPUHP_AP_OFFLINE) on the to-be-offline
> CPU
>
> 2) However, all runnable tasks are removed from the CPU in the teardown
> callback for CPUHP_AP_SCHED_STARTING, which is run after managed IRQs
> are shutdown. That said it is hard to avoid new request queued to
> the hctx with all DEAD CPUs.
>
> 3) we don't support to freeze queue for specific hctx yet, or that way
> may not be accepted because of extra cost in fast path
>
> 4) once request is allocated, it should be submitted to driver no matter
> if CPU hotplug happens or not. Or free it and re-allocate new request
> on proper sw/hw queue?

That looks doable: we may steal the bios from the old in-queue request,
re-submit them via generic_make_request(), and finally free the old
request, but RQF_DONTPREP has to be addressed via one new callback.

So here is the overall solution for waiting for requests vs. CPU
hotplug, done in two stages:

1) in the teardown callback of a new CPUHP state, CPUHP_BLK_MQ_PREP,
which runs before CPUHP_AP_ONLINE_IDLE, while the CPU & managed IRQ are
still alive:

- stop the hctx
- wait for in-flight requests from this hctx until all are completed

2) in the teardown callback of CPUHP_BLK_MQ_DEAD, which runs after the
CPU is dead:

- dequeue requests queued in the sw queue or scheduler queue of this hctx
- steal the bios from each dequeued request and re-submit them via
generic_make_request()
- free the dequeued request; driver resources need to be released via a
new callback for RQF_DONTPREP (looks like only SCSI needs it)
- restart this hctx
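The two stages above can be modeled in userspace (hypothetical names;
the kernel objects and callbacks are only sketched, under the stated
assumption that draining and re-submission behave as described):

```python
# Userspace model (hypothetical) of the two-stage teardown: stage 1
# runs while the CPU and managed IRQ are alive, stage 2 after the CPU
# is dead.

class Hctx:
    def __init__(self):
        self.stopped = False
        self.inflight = []      # dispatched to "hardware"
        self.sw_queue = []      # queued but not yet dispatched

def stage1_prep(hctx, complete):
    # CPUHP_BLK_MQ_PREP teardown: stop the hctx, then wait until all
    # in-flight requests have completed (the managed IRQ is still alive).
    hctx.stopped = True
    while hctx.inflight:
        complete(hctx.inflight.pop())

def stage2_dead(hctx, resubmit):
    # CPUHP_BLK_MQ_DEAD teardown: dequeue requests stuck in the sw/
    # scheduler queue, steal their bios, re-submit them (so they map
    # to a live hctx), free the old request, restart the hctx.
    while hctx.sw_queue:
        rq = hctx.sw_queue.pop()
        for bio in rq["bios"]:
            resubmit(bio)       # stands in for generic_make_request()
    hctx.stopped = False

hctx = Hctx()
hctx.inflight = ["rq0"]
hctx.sw_queue = [{"bios": ["bio1", "bio2"]}]

completed, resubmitted = [], []
stage1_prep(hctx, completed.append)
stage2_dead(hctx, resubmitted.append)

assert completed == ["rq0"]                     # drained while IRQ alive
assert sorted(resubmitted) == ["bio1", "bio2"]  # stolen and re-submitted
assert not hctx.stopped                         # hctx restarted
```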


Thanks,
Ming Lei


* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-30  2:28                 ` Ming Lei
  2019-05-30  4:11                   ` Ming Lei
@ 2019-05-30  9:31                   ` John Garry
  2019-05-30  9:45                     ` Ming Lei
  1 sibling, 1 reply; 20+ messages in thread
From: John Garry @ 2019-05-30  9:31 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Jens Axboe, Martin K . Petersen, linux-block,
	James Bottomley, Linux SCSI List, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

Hi Ming,

>>
>>> Thinking of this issue further, so far, one doable solution is to
>>> expose reply queues
>>> as blk-mq hw queues, as done by the following patchset:
>>>
>>> https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@redhat.com/
>>
>> I thought that this patchset had fundamental issues, in terms of working for
>> all types of hosts. FYI, I did the backport of latest hisi_sas_v3 to v4.15
>
> Could you explain it a bit about the fundamental issues for all types of
> host?
>

*As I understand*, splitting the tagset has issues with dual-mode HBAs - 
as in supporting NVMe and SCSI, as some HBAs do.

> It is just for hosts with multiple reply queues, such as hisi_sas v3,
> megaraid_sas, mpt3sas and hpsa.
>
>> with this patchset (as you may have noticed in my git send mistake), but we
>> have not got to test it yet.
>>
>> On a related topic, we did test exposing reply queues as blk-mq hw queues
>> and generating the host-wide tag internally in the LLDD with sbitmap, and
>> unfortunately we were experiencing a significant performance hit, like 2300K
>> -> 1800K IOPs for 4K read.
>>
>> We need to test this further. I don't understand why we get such a big hit.
>
> The performance regression shouldn't have been introduced in theory, and it is
> because blk_mq_queue_tag_busy_iter() iterates over the same duplicated tags multiple
> times, which can be fixed easily.
>

We are testing further, and I will tentatively say that we're getting 
better results (than previously) after fixing something in the LLDD. TBC.

>>
>>>
>>> In which global host-wide tags are shared for all blk-mq hw queues.
>>>
>>> Also we can remove all the reply_map stuff in drivers, then solve the problem of
>>> draining in-flight requests during unplugging CPU in a generic approach.
>>
>> So you're saying that removing this reply queue stuff can make the solution
>> to the problem more generic, but do you have an idea of the overall
>> solution?
>
> 1) convert reply queue into blk-mq hw queue first
>
> 2) then all drivers are in same position wrt. handling requests vs.
> unplugging CPU (shutdown managed IRQ)
>
> The current handling in blk_mq_hctx_notify_dead() is actually wrong,

Yeah, the comment reads that it's going away, but it's actually gone.

> at that time, all CPUs on the hctx are dead, blk_mq_run_hw_queue()
> still dispatches requests on driver's hw queue, and driver is invisible
> to DEAD CPUs mapped to this hctx, and finally interrupt for these
> requests on the hctx are lost.
>
> Frankly speaking, the above 2nd problem is still hard to solve.
>
> 1) take_cpu_down() shutdown managed IRQ first, then run teardown callback
> for states in [CPUHP_AP_ONLINE, CPUHP_AP_OFFLINE) on the to-be-offline
> CPU
>
> 2) However, all runnable tasks are removed from the CPU in the teardown
> callback for CPUHP_AP_SCHED_STARTING, which is run after managed IRQs
> are shutdown. That said it is hard to avoid new request queued to
> the hctx with all DEAD CPUs.
>
> 3) we don't support to freeze queue for specific hctx yet, or that way
> may not be accepted because of extra cost in fast path
>
> 4) once request is allocated, it should be submitted to driver no matter
> if CPU hotplug happens or not. Or free it and re-allocate new request
> on proper sw/hw queue?
>
>>
>>>
>>> Last time, it was reported that the patchset causes performance regression,
>>> which is actually caused by duplicated io accounting in
>>> blk_mq_queue_tag_busy_iter(),
>>> which should be fixed easily.
>>>
>>> What do you think of this approach?
>>
>> It would still be good to have a forward port of this patchset for testing,
>> if we're serious about it. Or at least this bug you mention fixed.
>
> I plan to make this patchset workable on 5.2-rc for your test first.
>

ok, thanks. I assume that we're still open to not adding support for 
global tags in blk-mq, but rather having the LLDD generate the unique 
tag with sbitmap.
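The LLDD-side scheme mentioned here (generating a host-wide unique tag
internally, in the spirit of the kernel's sbitmap) can be sketched in
userspace; the names below are illustrative, not the kernel
implementation:

```python
# Minimal userspace model of a host-wide tag allocator shared by all
# reply/hw queues, standing in for the kernel's scalable bitmap
# (sbitmap).

class HostTags:
    def __init__(self, depth):
        self.free = set(range(depth))   # kernel uses a scalable bitmap

    def get(self):
        # Returns a tag unique across the whole host, or None if the
        # host-wide depth is exhausted.
        return self.free.pop() if self.free else None

    def put(self, tag):
        self.free.add(tag)

tags = HostTags(depth=4)
got = [tags.get() for _ in range(4)]
assert sorted(got) == [0, 1, 2, 3]      # every tag handed out exactly once
assert tags.get() is None               # host-wide depth exhausted
tags.put(got[0])
assert tags.get() == got[0]             # freed tag is reusable
```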

Cheers,
John

>
> Thanks,
> Ming
>
> .
>




* Re: [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug
  2019-05-30  9:31                   ` John Garry
@ 2019-05-30  9:45                     ` Ming Lei
  0 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2019-05-30  9:45 UTC (permalink / raw)
  To: John Garry
  Cc: Ming Lei, Jens Axboe, Martin K . Petersen, linux-block,
	James Bottomley, Linux SCSI List, Bart Van Assche,
	Hannes Reinecke, Keith Busch, Thomas Gleixner, Don Brace,
	Kashyap Desai, Sathya Prakash, Christoph Hellwig

On Thu, May 30, 2019 at 10:31:34AM +0100, John Garry wrote:
> Hi Ming,
> 
> > > 
> > > > Thinking of this issue further, so far, one doable solution is to
> > > > expose reply queues
> > > > as blk-mq hw queues, as done by the following patchset:
> > > > 
> > > > https://lore.kernel.org/linux-block/20180205152035.15016-1-ming.lei@redhat.com/
> > > 
> > > I thought that this patchset had fundamental issues, in terms of working for
> > > all types of hosts. FYI, I did the backport of latest hisi_sas_v3 to v4.15
> > 
> > Could you explain it a bit about the fundamental issues for all types of
> > host?
> > 
> 
> *As I understand*, splitting the tagset has issues with dual-mode HBAs - as
> in supporting NVMe and SCSI, as some HBAs do.

The patchset I mentioned doesn't split the tagset. It just converts
SCSI's reply queues into blk-mq hw queues, and all hw queues share the
host-wide tags. You can get a unique tag too.

This approach isn't very different from the current single hw queue
(tags), which is why I think the performance shouldn't be worse than
with the current single hw queue. Meanwhile, drivers can be simplified.

> 
> > It is just for hosts with multiple reply queues, such as hisi_sas v3,
> > megaraid_sas, mpt3sas and hpsa.
> > 
> > > with this patchset (as you may have noticed in my git send mistake), but we
> > > have not got to test it yet.
> > > 
> > > On a related topic, we did test exposing reply queues as blk-mq hw queues
> > > and generating the host-wide tag internally in the LLDD with sbitmap, and
> > > unfortunately we were experiencing a significant performance hit, like 2300K
> > > -> 1800K IOPs for 4K read.
> > > 
> > > We need to test this further. I don't understand why we get such a big hit.
> > 
> > The performance regression shouldn't have been introduced in theory, and it is
> > because blk_mq_queue_tag_busy_iter() iterates over the same duplicated tags multiple
> > times, which can be fixed easily.
> > 
> 
> We are testing further, and I will tentatively say that we're getting better
> results (than previously) after fixing something in the LLDD. TBC.
> 
> > > 
> > > > 
> > > > In which global host-wide tags are shared for all blk-mq hw queues.
> > > > 
> > > > Also we can remove all the reply_map stuff in drivers, then solve the problem of
> > > > draining in-flight requests during unplugging CPU in a generic approach.
> > > 
> > > So you're saying that removing this reply queue stuff can make the solution
> > > to the problem more generic, but do you have an idea of the overall
> > > solution?
> > 
> > 1) convert reply queue into blk-mq hw queue first
> > 
> > 2) then all drivers are in same position wrt. handling requests vs.
> > unplugging CPU (shutdown managed IRQ)
> > 
> > The current handling in blk_mq_hctx_notify_dead() is actually wrong,
> 
> Yeah, the comment reads that it's going away, but it's actually gone.
> 
> > at that time, all CPUs on the hctx are dead, blk_mq_run_hw_queue()
> > still dispatches requests on driver's hw queue, and driver is invisible
> > to DEAD CPUs mapped to this hctx, and finally interrupt for these
> > requests on the hctx are lost.
> > 
> > Frankly speaking, the above 2nd problem is still hard to solve.
> > 
> > 1) take_cpu_down() shutdown managed IRQ first, then run teardown callback
> > for states in [CPUHP_AP_ONLINE, CPUHP_AP_OFFLINE) on the to-be-offline
> > CPU
> > 
> > 2) However, all runnable tasks are removed from the CPU in the teardown
> > callback for CPUHP_AP_SCHED_STARTING, which is run after managed IRQs
> > are shutdown. That said it is hard to avoid new request queued to
> > the hctx with all DEAD CPUs.
> > 
> > 3) we don't support to freeze queue for specific hctx yet, or that way
> > may not be accepted because of extra cost in fast path
> > 
> > 4) once request is allocated, it should be submitted to driver no matter
> > if CPU hotplug happens or not. Or free it and re-allocate new request
> > on proper sw/hw queue?
> > 
> > > 
> > > > 
> > > > Last time, it was reported that the patchset causes performance regression,
> > > > which is actually caused by duplicated io accounting in
> > > > blk_mq_queue_tag_busy_iter(),
> > > > which should be fixed easily.
> > > > 
> > > > What do you think of this approach?
> > > 
> > > It would still be good to have a forward port of this patchset for testing,
> > > if we're serious about it. Or at least this bug you mention fixed.
> > 
> > I plan to make this patchset workable on 5.2-rc for your test first.
> > 
> 
> ok, thanks. I assume that we're still open to not adding support for global
> tags in blk-mq, but rather the LLDD generating the unique tag with sbitmap.

Actually it is global tags.

Thanks,
Ming


Thread overview: 20+ messages
2019-05-27 15:02 [PATCH V2 0/5] blk-mq: Wait for for hctx inflight requests on CPU unplug Ming Lei
2019-05-27 15:02 ` [PATCH V2 1/5] scsi: select reply queue from request's CPU Ming Lei
2019-05-28  5:43   ` Hannes Reinecke
2019-05-28 10:33   ` John Garry
2019-05-29  2:36     ` Ming Lei
2019-05-27 15:02 ` [PATCH V2 2/5] blk-mq: introduce .complete_queue_affinity Ming Lei
2019-05-27 15:02 ` [PATCH V2 3/5] scsi: core: implement callback of .complete_queue_affinity Ming Lei
2019-05-27 15:02 ` [PATCH V2 4/5] scsi: implement .complete_queue_affinity Ming Lei
2019-05-27 15:02 ` [PATCH V2 5/5] blk-mq: Wait for for hctx inflight requests on CPU unplug Ming Lei
2019-05-28 16:50   ` John Garry
2019-05-29  2:28     ` Ming Lei
2019-05-29  2:42       ` Ming Lei
2019-05-29  9:42         ` John Garry
2019-05-29 10:10           ` Ming Lei
2019-05-29 15:33             ` Ming Lei
2019-05-29 16:10               ` John Garry
2019-05-30  2:28                 ` Ming Lei
2019-05-30  4:11                   ` Ming Lei
2019-05-30  9:31                   ` John Garry
2019-05-30  9:45                     ` Ming Lei
