* [PATCH 00/12 V2] target: fix cmd plugging and completion
@ 2021-02-09 12:38 Mike Christie
  2021-02-09 12:38 ` [PATCH 01/13] target: move t_task_cdb initialization Mike Christie
                   ` (12 more replies)
  0 siblings, 13 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

The following patches made over Martin's 5.12 branches fix two
issues:

1. target_core_iblock plugs and unplugs the queue for every
command. To fix this, and to address an issue that vhost-scsi
and loop were working around by adding their own workqueues,
I added a new submission workqueue to LIO. Drivers can pass cmds
to it, and we can then submit batches of cmds.

2. vhost-scsi and loop were queueing one work per cmd on the
submission side, and the LIO completion side was also doing one
work per cmd. The cap on running works is 512 (max_active), so we
can end up using a lot of threads when submissions start blocking
because they hit the block tag limit, or when the completion side
blocks trying to send the cmd. In this patchset I just use a cmd
list per session to avoid abusing the workqueue layer.

The combined patchset fixes a major perf issue we've been hitting
where IOPs are stuck at 230K when running:

    fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
    --ioengine=libaio --iodepth=128  --numjobs=8 --time_based
    --group_reporting --runtime=60

The patches in this set get me to 350K when using devices that
have native IOPs of around 400-500K.

Note that 5.12 has some interrupt changes that my patches
collide with. Martin's 5.12 branches had the changes so I
based my patches on that.

V2:
- Fix up container_of use coding style
- Handle an offlist review comment from Laurence: with the
original code and my patches we could hit a bug where the cmd
times out and LIO starts up the TMR code, but the TMR code misses
the cmd because it's still on the workqueue.
- Made the submission work per device instead of per session, both
to handle the previous issue and so that if one device hits an
issue it sleeps on, it won't block other devices.




^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 01/13] target: move t_task_cdb initialization
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-10  8:35     ` Christoph Hellwig
  2021-02-09 12:38 ` [PATCH 02/13] target: split target_submit_cmd_map_sgls Mike Christie
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

The next patch splits target_submit_cmd_map_sgls so the initialization
and submission parts can be called at different times. If the init part
fails, some of the logging and tracing code can reference t_task_cdb
before it has been set. Move the initialization to transport_init_se_cmd
so we don't hit NULL pointer crashes.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index d47bfd8b0f87..5c4adde96d5e 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1389,6 +1389,7 @@ void transport_init_se_cmd(
 	INIT_WORK(&cmd->work, NULL);
 	kref_init(&cmd->cmd_kref);
 
+	cmd->t_task_cdb = &cmd->__t_task_cdb[0];
 	cmd->se_tfo = tfo;
 	cmd->se_sess = se_sess;
 	cmd->data_length = data_length;
@@ -1430,7 +1431,6 @@ target_cmd_init_cdb(struct se_cmd *cmd, unsigned char *cdb)
 {
 	sense_reason_t ret;
 
-	cmd->t_task_cdb = &cmd->__t_task_cdb[0];
 	/*
 	 * Ensure that the received CDB is less than the max (252 + 8) bytes
 	 * for VARIABLE_LENGTH_CMD
-- 
2.25.1



* [PATCH 02/13] target: split target_submit_cmd_map_sgls
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
  2021-02-09 12:38 ` [PATCH 01/13] target: move t_task_cdb initialization Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 16:15     ` kernel test robot
  2021-02-10  8:36     ` Christoph Hellwig
  2021-02-09 12:38 ` [PATCH 03/13] target: add workqueue based cmd submission Mike Christie
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

Separate target_submit_cmd_map_sgls into a part that:
- does the initial cmd setup
- will not sleep
- gives us access to the se_device
and a part that:
- can sleep
- handles the actual submission

This will be needed for loop in upcoming patches: it needs to add
the cmd to the lio workqueue and can't sleep in that initial
submission path.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 149 ++++++++++++++++++-------
 1 file changed, 109 insertions(+), 40 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5c4adde96d5e..71b0a862608b 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1571,12 +1571,9 @@ transport_generic_map_mem_to_cmd(struct se_cmd *cmd, struct scatterlist *sgl,
 }
 
 /**
- * target_submit_cmd_map_sgls - lookup unpacked lun and submit uninitialized
- * 			 se_cmd + use pre-allocated SGL memory.
- *
+ * target_submit_prep - prep cmd for submission to lio core
  * @se_cmd: command descriptor to submit
  * @se_sess: associated se_sess for endpoint
- * @cdb: pointer to SCSI CDB
  * @sense: pointer to SCSI sense buffer
  * @unpacked_lun: unpacked LUN to reference for struct se_lun
  * @data_length: fabric expected data transfer length
@@ -1592,26 +1589,29 @@ transport_generic_map_mem_to_cmd(struct se_cmd *cmd, struct scatterlist *sgl,
  *
  * Task tags are supported if the caller has set @se_cmd->tag.
  *
- * Returns non zero to signal active I/O shutdown failure.  All other
- * setup exceptions will be returned as a SCSI CHECK_CONDITION response,
- * but still return zero here.
+ * Returns:
+ *	- less than zero to signal active I/O shutdown failure
+ *	- zero on success.
+ *	- one for all other setup exceptions. The cmd will be returned as a
+ *	  SCSI CHECK_CONDITION response in this case.
  *
- * This may only be called from process context, and also currently
- * assumes internal allocation of fabric payload buffer by target-core.
+ * This may only be called from interrupt context if the caller's
+ * queue_status and release_cmd callouts do not block.
+ *
+ * This assumes internal allocation of fabric payload buffer by target-core.
  */
-int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess,
-		unsigned char *cdb, unsigned char *sense, u64 unpacked_lun,
-		u32 data_length, int task_attr, int data_dir, int flags,
-		struct scatterlist *sgl, u32 sgl_count,
-		struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
-		struct scatterlist *sgl_prot, u32 sgl_prot_count)
+static int
+target_submit_prep(struct se_cmd *se_cmd, struct se_session *se_sess,
+		   unsigned char *sense, u64 unpacked_lun,
+		   u32 data_length, int task_attr, int data_dir, int flags,
+		   struct scatterlist *sgl, u32 sgl_count,
+		   struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+		   struct scatterlist *sgl_prot, u32 sgl_prot_count)
 {
 	struct se_portal_group *se_tpg;
 	sense_reason_t rc;
 	int ret;
 
-	might_sleep();
-
 	se_tpg = se_sess->se_tpg;
 	BUG_ON(!se_tpg);
 	BUG_ON(se_cmd->se_tfo || se_cmd->se_sess);
@@ -1642,14 +1642,6 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 	 */
 	if (flags & TARGET_SCF_BIDI_OP)
 		se_cmd->se_cmd_flags |= SCF_BIDI;
-
-	rc = target_cmd_init_cdb(se_cmd, cdb);
-	if (rc) {
-		transport_send_check_condition_and_sense(se_cmd, rc, 0);
-		target_put_sess_cmd(se_cmd);
-		return 0;
-	}
-
 	/*
 	 * Locate se_lun pointer and attach it to struct se_cmd
 	 */
@@ -1657,13 +1649,7 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 	if (rc) {
 		transport_send_check_condition_and_sense(se_cmd, rc, 0);
 		target_put_sess_cmd(se_cmd);
-		return 0;
-	}
-
-	rc = target_cmd_parse_cdb(se_cmd);
-	if (rc != 0) {
-		transport_generic_request_failure(se_cmd, rc);
-		return 0;
+		return 1;
 	}
 
 	/*
@@ -1684,6 +1670,43 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 	if (sgl_count != 0) {
 		BUG_ON(!sgl);
 
+		rc = transport_generic_map_mem_to_cmd(se_cmd, sgl, sgl_count,
+				sgl_bidi, sgl_bidi_count);
+		if (rc != 0) {
+			transport_generic_request_failure(se_cmd, rc);
+			return 1;
+		}
+	}
+
+	return 0;
+}
+
+/**
+ * target_submit - perform final initialization and submit cmd to LIO core
+ * @se_cmd: command descriptor to submit
+ * @cdb: pointer to SCSI CDB
+ *
+ * target_submit_prep must have been called on the cmd, and this must be
+ * called from process context.
+ */
+static void target_submit(struct se_cmd *se_cmd, unsigned char *cdb)
+{
+	struct scatterlist *sgl = se_cmd->t_data_sg;
+	unsigned char *buf = NULL;
+	int rc;
+
+	might_sleep();
+
+	rc = target_cmd_init_cdb(se_cmd, cdb);
+	if (rc)
+		goto fail;
+
+	rc = target_cmd_parse_cdb(se_cmd);
+	if (rc != 0)
+		goto fail;
+
+	if (se_cmd->t_data_nents != 0) {
+		BUG_ON(!sgl);
 		/*
 		 * A work-around for tcm_loop as some userspace code via
 		 * scsi-generic do not memset their associated read buffers,
@@ -1694,8 +1717,6 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 		 */
 		if (!(se_cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) &&
 		     se_cmd->data_direction == DMA_FROM_DEVICE) {
-			unsigned char *buf = NULL;
-
 			if (sgl)
 				buf = kmap(sg_page(sgl)) + sgl->offset;
 
@@ -1705,12 +1726,6 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 			}
 		}
 
-		rc = transport_generic_map_mem_to_cmd(se_cmd, sgl, sgl_count,
-				sgl_bidi, sgl_bidi_count);
-		if (rc != 0) {
-			transport_generic_request_failure(se_cmd, rc);
-			return 0;
-		}
 	}
 
 	/*
@@ -1720,6 +1735,60 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 	core_alua_check_nonop_delay(se_cmd);
 
 	transport_handle_cdb_direct(se_cmd);
+	return;
+
+fail:
+	transport_generic_request_failure(se_cmd, rc);
+}
+
+/**
+ * target_submit_cmd_map_sgls - lookup unpacked lun and submit uninitialized
+ *					se_cmd + use pre-allocated SGL memory.
+ *
+ * @se_cmd: command descriptor to submit
+ * @se_sess: associated se_sess for endpoint
+ * @cdb: pointer to SCSI CDB
+ * @sense: pointer to SCSI sense buffer
+ * @unpacked_lun: unpacked LUN to reference for struct se_lun
+ * @data_length: fabric expected data transfer length
+ * @task_attr: SAM task attribute
+ * @data_dir: DMA data direction
+ * @flags: flags for command submission from target_sc_flags_tables
+ * @sgl: struct scatterlist memory for unidirectional mapping
+ * @sgl_count: scatterlist count for unidirectional mapping
+ * @sgl_bidi: struct scatterlist memory for bidirectional READ mapping
+ * @sgl_bidi_count: scatterlist count for bidirectional READ mapping
+ * @sgl_prot: struct scatterlist memory protection information
+ * @sgl_prot_count: scatterlist count for protection information
+ *
+ * Task tags are supported if the caller has set @se_cmd->tag.
+ *
+ * Returns non zero to signal active I/O shutdown failure.  All other
+ * setup exceptions will be returned as a SCSI CHECK_CONDITION response,
+ * but still return zero here.
+ *
+ * This may only be called from process context, and also currently
+ * assumes internal allocation of fabric payload buffer by target-core.
+ */
+int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess,
+		unsigned char *cdb, unsigned char *sense, u64 unpacked_lun,
+		u32 data_length, int task_attr, int data_dir, int flags,
+		struct scatterlist *sgl, u32 sgl_count,
+		struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+		struct scatterlist *sgl_prot, u32 sgl_prot_count)
+{
+	int ret;
+
+	ret = target_submit_prep(se_cmd, se_sess, sense, unpacked_lun,
+				 data_length, task_attr, data_dir, flags,
+				 sgl, sgl_count, sgl_bidi, sgl_bidi_count,
+				 sgl_prot, sgl_prot_count);
+	if (ret < 0)
+		return ret;
+	else if (ret > 0)
+		return 0;
+
+	target_submit(se_cmd, cdb);
 	return 0;
 }
 EXPORT_SYMBOL(target_submit_cmd_map_sgls);
-- 
2.25.1



* [PATCH 03/13] target: add workqueue based cmd submission
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
  2021-02-09 12:38 ` [PATCH 01/13] target: move t_task_cdb initialization Mike Christie
  2021-02-09 12:38 ` [PATCH 02/13] target: split target_submit_cmd_map_sgls Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 15:48   ` Bodo Stroesser
  2021-02-09 12:38 ` [PATCH 04/13] vhost scsi: use lio wq cmd submission helper Mike Christie
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

loop and vhost-scsi do their target cmd submission from driver
workqueues. This allows them to avoid an issue where the backend may
block waiting for resources like tags/requests, memory or locks,
which would otherwise block their entire submission path, and in
the case of vhost-scsi both the submission and completion paths.

This patch adds a helper drivers can use to submit from a lio
workqueue. This code will then be extended in the next patches to
fix the plugging of backend devices.

Note: I'm only converting vhost/loop initially, but the workqueue
based submission will work for other drivers and has similar
benefits, where the main target loops will not end up blocking on
some backend resource. I'll port the others when I have more time
to test, as I think we might want to make it configurable for some
drivers.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_device.c    | 10 ++-
 drivers/target/target_core_internal.h  |  1 +
 drivers/target/target_core_transport.c | 91 +++++++++++++++++++++++++-
 include/target/target_core_base.h      |  8 ++-
 include/target/target_core_fabric.h    |  7 ++
 5 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 7787c527aad3..74d3a4896588 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -735,8 +735,14 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 
 	dev->queue_cnt = nr_cpu_ids;
 	for (i = 0; i < dev->queue_cnt; i++) {
-		INIT_LIST_HEAD(&dev->queues[i].state_list);
-		spin_lock_init(&dev->queues[i].lock);
+		struct se_device_queue *q;
+
+		q = &dev->queues[i];
+		INIT_LIST_HEAD(&q->state_list);
+		spin_lock_init(&q->lock);
+
+		init_llist_head(&q->sq.cmd_list);
+		INIT_WORK(&q->sq.work, target_queued_submit_work);
 	}
 
 	dev->se_hba = hba;
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index e7b3c6e5d574..56f841fd7f04 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -153,6 +153,7 @@ void	target_qf_do_work(struct work_struct *work);
 bool	target_check_wce(struct se_device *dev);
 bool	target_check_fua(struct se_device *dev);
 void	__target_execute_cmd(struct se_cmd *, bool);
+void	target_queued_submit_work(struct work_struct *work);
 
 /* target_core_stat.c */
 void	target_stat_setup_dev_default_groups(struct se_device *);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 71b0a862608b..46e20c42119e 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -41,6 +41,7 @@
 #include <trace/events/target.h>
 
 static struct workqueue_struct *target_completion_wq;
+static struct workqueue_struct *target_submission_wq;
 static struct kmem_cache *se_sess_cache;
 struct kmem_cache *se_ua_cache;
 struct kmem_cache *t10_pr_reg_cache;
@@ -129,8 +130,15 @@ int init_se_kmem_caches(void)
 	if (!target_completion_wq)
 		goto out_free_lba_map_mem_cache;
 
+	target_submission_wq = alloc_workqueue("target_submission",
+					       WQ_MEM_RECLAIM, 0);
+	if (!target_submission_wq)
+		goto out_free_completion_wq;
+
 	return 0;
 
+out_free_completion_wq:
+	destroy_workqueue(target_completion_wq);
 out_free_lba_map_mem_cache:
 	kmem_cache_destroy(t10_alua_lba_map_mem_cache);
 out_free_lba_map_cache:
@@ -153,6 +161,7 @@ int init_se_kmem_caches(void)
 
 void release_se_kmem_caches(void)
 {
+	destroy_workqueue(target_submission_wq);
 	destroy_workqueue(target_completion_wq);
 	kmem_cache_destroy(se_sess_cache);
 	kmem_cache_destroy(se_ua_cache);
@@ -1380,7 +1389,6 @@ void transport_init_se_cmd(
 {
 	INIT_LIST_HEAD(&cmd->se_delayed_node);
 	INIT_LIST_HEAD(&cmd->se_qf_node);
-	INIT_LIST_HEAD(&cmd->se_cmd_list);
 	INIT_LIST_HEAD(&cmd->state_list);
 	init_completion(&cmd->t_transport_stop_comp);
 	cmd->free_compl = NULL;
@@ -1827,6 +1835,87 @@ int target_submit_cmd(struct se_cmd *se_cmd, struct se_session *se_sess,
 }
 EXPORT_SYMBOL(target_submit_cmd);
 
+void target_queued_submit_work(struct work_struct *work)
+{
+	struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
+	struct se_cmd *se_cmd, *next_cmd;
+	struct llist_node *cmd_list;
+	unsigned char *cdb;
+
+	cmd_list = llist_del_all(&sq->cmd_list);
+	if (!cmd_list)
+		/* Previous call took what we were queued to submit */
+		return;
+
+	cmd_list = llist_reverse_order(cmd_list);
+	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
+		cdb = se_cmd->se_tfo->get_cdb(se_cmd);
+		target_submit(se_cmd, cdb);
+	}
+}
+
+static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
+				  int cpu)
+{
+	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
+	queue_work_on(cpu, target_submission_wq, &q->work);
+}
+
+/**
+ * target_queue_cmd_submit - queue the cmd to run on the LIO workqueue
+ * @se_cmd: command descriptor to submit
+ * @se_sess: associated se_sess for endpoint
+ * @sense: pointer to SCSI sense buffer
+ * @unpacked_lun: unpacked LUN to reference for struct se_lun
+ * @data_length: fabric expected data transfer length
+ * @task_attr: SAM task attribute
+ * @data_dir: DMA data direction
+ * @flags: flags for command submission from target_sc_flags_tables
+ * @sgl: struct scatterlist memory for unidirectional mapping
+ * @sgl_count: scatterlist count for unidirectional mapping
+ * @sgl_bidi: struct scatterlist memory for bidirectional READ mapping
+ * @sgl_bidi_count: scatterlist count for bidirectional READ mapping
+ * @sgl_prot: struct scatterlist memory protection information
+ * @sgl_prot_count: scatterlist count for protection information
+ *
+ * Task tags are supported if the caller has set @se_cmd->tag.
+ *
+ * Returns non zero to signal active I/O shutdown failure.  All other
+ * setup exceptions will be returned as a SCSI CHECK_CONDITION response,
+ * but still return zero here.
+ *
+ * This may only be called from interrupt context if the caller's
+ * queue_status and release_cmd callouts do not block.
+ *
+ * This assumes internal allocation of fabric payload buffer by target-core.
+ */
+int
+target_queue_cmd_submit(struct se_cmd *se_cmd, struct se_session *se_sess,
+			unsigned char *sense, u64 unpacked_lun, u32 data_length,
+			int task_attr, int data_dir, int flags,
+			struct scatterlist *sgl, u32 sgl_count,
+			struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+			struct scatterlist *sgl_prot, u32 sgl_prot_count)
+{
+	struct se_device *se_dev;
+	int cpu, ret;
+
+	ret = target_submit_prep(se_cmd, se_sess, sense, unpacked_lun,
+				 data_length, task_attr, data_dir, flags,
+				 sgl, sgl_count, sgl_bidi, sgl_bidi_count,
+				 sgl_prot, sgl_prot_count);
+	if (ret < 0)
+		return ret;
+	else if (ret > 0)
+		return 0;
+
+	cpu = se_cmd->cpuid;
+	se_dev = se_cmd->se_dev;
+	target_queue_cmd_work(&se_dev->queues[cpu].sq, se_cmd, cpu);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
+
 static void target_complete_tmr_failure(struct work_struct *work)
 {
 	struct se_cmd *se_cmd = container_of(work, struct se_cmd, work);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 63dd12124139..815de4c97230 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -487,7 +487,7 @@ struct se_cmd {
 	/* Only used for internal passthrough and legacy TCM fabric modules */
 	struct se_session	*se_sess;
 	struct se_tmr_req	*se_tmr_req;
-	struct list_head	se_cmd_list;
+	struct llist_node	se_cmd_list;
 	struct completion	*free_compl;
 	struct completion	*abrt_compl;
 	const struct target_core_fabric_ops *se_tfo;
@@ -764,9 +764,15 @@ struct se_dev_stat_grps {
 	struct config_group scsi_lu_group;
 };
 
+struct se_cmd_queue {
+	struct llist_head	cmd_list;
+	struct work_struct	work;
+};
+
 struct se_device_queue {
 	struct list_head	state_list;
 	spinlock_t		lock;
+	struct se_cmd_queue	sq;
 };
 
 struct se_device {
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index d60a3eb7517a..d7b54f3e04da 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -80,6 +80,7 @@ struct target_core_fabric_ops {
 	int (*queue_status)(struct se_cmd *);
 	void (*queue_tm_rsp)(struct se_cmd *);
 	void (*aborted_task)(struct se_cmd *);
+	unsigned char *(*get_cdb)(struct se_cmd *);
 	/*
 	 * fabric module calls for target_core_fabric_configfs.c
 	 */
@@ -160,6 +161,12 @@ int	target_submit_cmd_map_sgls(struct se_cmd *, struct se_session *,
 		struct scatterlist *, u32);
 int	target_submit_cmd(struct se_cmd *, struct se_session *, unsigned char *,
 		unsigned char *, u64, u32, int, int, int);
+int	target_queue_cmd_submit(struct se_cmd *se_cmd, struct se_session *se_sess,
+		unsigned char *sense, u64 unpacked_lun, u32 data_length,
+		int task_attr, int data_dir, int flags, struct scatterlist *sgl,
+		u32 sgl_count, struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+		struct scatterlist *sgl_prot, u32 sgl_prot_count);
+
 int	target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess,
 		unsigned char *sense, u64 unpacked_lun,
 		void *fabric_tmr_ptr, unsigned char tm_type,
-- 
2.25.1



* [PATCH 04/13] vhost scsi: use lio wq cmd submission helper
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (2 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 03/13] target: add workqueue based cmd submission Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 12:38 ` [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds Mike Christie
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

Convert vhost-scsi to use the lio wq cmd submission helper.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/vhost/scsi.c | 49 +++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 32 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4ce9f00ae10e..99909c6f3960 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -85,7 +85,7 @@ struct vhost_scsi_cmd {
 	/* The number of scatterlists associated with this cmd */
 	u32 tvc_sgl_count;
 	u32 tvc_prot_sgl_count;
-	/* Saved unpacked SCSI LUN for vhost_scsi_submission_work() */
+	/* Saved unpacked SCSI LUN for vhost_scsi_target_queue_cmd() */
 	u32 tvc_lun;
 	/* Pointer to the SGL formatted memory from virtio-scsi */
 	struct scatterlist *tvc_sgl;
@@ -101,8 +101,6 @@ struct vhost_scsi_cmd {
 	struct vhost_scsi_nexus *tvc_nexus;
 	/* The TCM I/O descriptor that is accessed via container_of() */
 	struct se_cmd tvc_se_cmd;
-	/* work item used for cmwq dispatch to vhost_scsi_submission_work() */
-	struct work_struct work;
 	/* Copy of the incoming SCSI command descriptor block (CDB) */
 	unsigned char tvc_cdb[VHOST_SCSI_MAX_CDB_SIZE];
 	/* Sense buffer that will be mapped into outgoing status */
@@ -240,8 +238,6 @@ struct vhost_scsi_ctx {
 	struct iov_iter out_iter;
 };
 
-static struct workqueue_struct *vhost_scsi_workqueue;
-
 /* Global spinlock to protect vhost_scsi TPG list for vhost IOCTL access */
 static DEFINE_MUTEX(vhost_scsi_mutex);
 static LIST_HEAD(vhost_scsi_list);
@@ -782,12 +778,10 @@ static int vhost_scsi_to_tcm_attr(int attr)
 	return TCM_SIMPLE_TAG;
 }
 
-static void vhost_scsi_submission_work(struct work_struct *work)
+static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
 {
-	struct vhost_scsi_cmd *cmd =
-		container_of(work, struct vhost_scsi_cmd, work);
-	struct vhost_scsi_nexus *tv_nexus;
 	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
+	struct vhost_scsi_nexus *tv_nexus;
 	struct scatterlist *sg_ptr, *sg_prot_ptr = NULL;
 	int rc;
 
@@ -805,9 +799,9 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 	tv_nexus = cmd->tvc_nexus;
 
 	se_cmd->tag = 0;
-	rc = target_submit_cmd_map_sgls(se_cmd, tv_nexus->tvn_se_sess,
-			cmd->tvc_cdb, &cmd->tvc_sense_buf[0],
-			cmd->tvc_lun, cmd->tvc_exp_data_len,
+	rc = target_queue_cmd_submit(se_cmd, tv_nexus->tvn_se_sess,
+			&cmd->tvc_sense_buf[0], cmd->tvc_lun,
+			cmd->tvc_exp_data_len,
 			vhost_scsi_to_tcm_attr(cmd->tvc_task_attr),
 			cmd->tvc_data_direction, TARGET_SCF_ACK_KREF,
 			sg_ptr, cmd->tvc_sgl_count, NULL, 0, sg_prot_ptr,
@@ -819,6 +813,14 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 	}
 }
 
+static unsigned char *vhost_scsi_get_cdb(struct se_cmd *se_cmd)
+{
+	struct vhost_scsi_cmd *cmd = container_of(se_cmd, struct vhost_scsi_cmd,
+						  tvc_se_cmd);
+
+	return cmd->tvc_cdb;
+}
+
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 			   struct vhost_virtqueue *vq,
@@ -1132,14 +1134,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
 		 */
 		cmd->tvc_vq_desc = vc.head;
-		/*
-		 * Dispatch cmd descriptor for cmwq execution in process
-		 * context provided by vhost_scsi_workqueue.  This also ensures
-		 * cmd is executed on the same kworker CPU as this vhost
-		 * thread to gain positive L2 cache locality effects.
-		 */
-		INIT_WORK(&cmd->work, vhost_scsi_submission_work);
-		queue_work(vhost_scsi_workqueue, &cmd->work);
+		vhost_scsi_target_queue_cmd(cmd);
 		ret = 0;
 err:
 		/*
@@ -2466,6 +2461,7 @@ static const struct target_core_fabric_ops vhost_scsi_ops = {
 	.queue_status			= vhost_scsi_queue_status,
 	.queue_tm_rsp			= vhost_scsi_queue_tm_rsp,
 	.aborted_task			= vhost_scsi_aborted_task,
+	.get_cdb			= vhost_scsi_get_cdb,
 	/*
 	 * Setup callers for generic logic in target_core_fabric_configfs.c
 	 */
@@ -2489,17 +2485,9 @@ static int __init vhost_scsi_init(void)
 		" on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
 		utsname()->machine);
 
-	/*
-	 * Use our own dedicated workqueue for submitting I/O into
-	 * target core to avoid contention within system_wq.
-	 */
-	vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0);
-	if (!vhost_scsi_workqueue)
-		goto out;
-
 	ret = vhost_scsi_register();
 	if (ret < 0)
-		goto out_destroy_workqueue;
+		goto out;
 
 	ret = target_register_template(&vhost_scsi_ops);
 	if (ret < 0)
@@ -2509,8 +2497,6 @@ static int __init vhost_scsi_init(void)
 
 out_vhost_scsi_deregister:
 	vhost_scsi_deregister();
-out_destroy_workqueue:
-	destroy_workqueue(vhost_scsi_workqueue);
 out:
 	return ret;
 };
@@ -2519,7 +2505,6 @@ static void vhost_scsi_exit(void)
 {
 	target_unregister_template(&vhost_scsi_ops);
 	vhost_scsi_deregister();
-	destroy_workqueue(vhost_scsi_workqueue);
 };
 
 MODULE_DESCRIPTION("VHOST_SCSI series fabric driver");
-- 
2.25.1



* [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (3 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 04/13] vhost scsi: use lio wq cmd submission helper Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-10  8:37     ` Christoph Hellwig
  2021-02-09 12:38 ` [PATCH 06/13] tcm loop: use lio wq cmd submission helper Mike Christie
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

This just has tcm loop use the block layer cmd allocator for se_cmds
instead of the tcm_loop_cmd_cache. In future patches, when we can
use the host tags for internal requests like TMFs, we can completely
remove the tcm_loop_cmd_cache.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/loopback/tcm_loop.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index badba437e5f9..274826a2b0bd 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -67,8 +67,12 @@ static void tcm_loop_release_cmd(struct se_cmd *se_cmd)
 {
 	struct tcm_loop_cmd *tl_cmd = container_of(se_cmd,
 				struct tcm_loop_cmd, tl_se_cmd);
+	struct scsi_cmnd *sc = tl_cmd->sc;
 
-	kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
+	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
+		kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
+	else
+		sc->scsi_done(sc);
 }
 
 static int tcm_loop_show_info(struct seq_file *m, struct Scsi_Host *host)
@@ -165,7 +169,6 @@ static void tcm_loop_submission_work(struct work_struct *work)
 	return;
 
 out_done:
-	kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
 	sc->scsi_done(sc);
 }
 
@@ -175,20 +178,14 @@ static void tcm_loop_submission_work(struct work_struct *work)
  */
 static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 {
-	struct tcm_loop_cmd *tl_cmd;
+	struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
 
 	pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
 		 __func__, sc->device->host->host_no, sc->device->id,
 		 sc->device->channel, sc->device->lun, sc->cmnd[0],
 		 scsi_bufflen(sc));
 
-	tl_cmd = kmem_cache_zalloc(tcm_loop_cmd_cache, GFP_ATOMIC);
-	if (!tl_cmd) {
-		set_host_byte(sc, DID_ERROR);
-		sc->scsi_done(sc);
-		return 0;
-	}
-
+	memset(tl_cmd, 0, sizeof(*tl_cmd));
 	tl_cmd->sc = sc;
 	tl_cmd->sc_cmd_tag = sc->request->tag;
 	INIT_WORK(&tl_cmd->work, tcm_loop_submission_work);
@@ -320,6 +317,7 @@ static struct scsi_host_template tcm_loop_driver_template = {
 	.dma_boundary		= PAGE_SIZE - 1,
 	.module			= THIS_MODULE,
 	.track_queue_depth	= 1,
+	.cmd_size		= sizeof(struct tcm_loop_cmd),
 };
 
 static int tcm_loop_driver_probe(struct device *dev)
@@ -580,7 +578,6 @@ static int tcm_loop_queue_data_or_status(const char *func,
 	if ((se_cmd->se_cmd_flags & SCF_OVERFLOW_BIT) ||
 	    (se_cmd->se_cmd_flags & SCF_UNDERFLOW_BIT))
 		scsi_set_resid(sc, se_cmd->residual_count);
-	sc->scsi_done(sc);
 	return 0;
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/13] tcm loop: use lio wq cmd submission helper
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (4 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 15:59   ` Bodo Stroesser
  2021-02-09 17:39     ` kernel test robot
  2021-02-09 12:38 ` [PATCH 07/13] target: cleanup cmd flag bits Mike Christie
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

Convert loop to use the lio wq cmd submission helper.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/loopback/tcm_loop.c | 34 +++++++++++++++++-------------
 drivers/target/loopback/tcm_loop.h |  1 -
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 274826a2b0bd..3642f381067e 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -39,7 +39,6 @@
 
 #define to_tcm_loop_hba(hba)	container_of(hba, struct tcm_loop_hba, dev)
 
-static struct workqueue_struct *tcm_loop_workqueue;
 static struct kmem_cache *tcm_loop_cmd_cache;
 
 static int tcm_loop_hba_no_cnt;
@@ -106,10 +105,16 @@ static struct device_driver tcm_loop_driverfs = {
  */
 static struct device *tcm_loop_primary;
 
-static void tcm_loop_submission_work(struct work_struct *work)
+static unsigned char *tcm_loop_get_cdb(struct se_cmd *se_cmd)
+{
+	struct tcm_loop_cmd *tl_cmd = container_of(se_cmd, struct tcm_loop_cmd,
+						   tl_se_cmd);
+
+	return tl_cmd->sc->cmnd;
+}
+
+static void tcm_loop_target_queue_cmd(struct tcm_loop_cmd *tl_cmd)
 {
-	struct tcm_loop_cmd *tl_cmd =
-		container_of(work, struct tcm_loop_cmd, work);
 	struct se_cmd *se_cmd = &tl_cmd->tl_se_cmd;
 	struct scsi_cmnd *sc = tl_cmd->sc;
 	struct tcm_loop_nexus *tl_nexus;
@@ -155,7 +160,7 @@ static void tcm_loop_submission_work(struct work_struct *work)
 	}
 
 	se_cmd->tag = tl_cmd->sc_cmd_tag;
-	rc = target_submit_cmd_map_sgls(se_cmd, tl_nexus->se_sess, sc->cmnd,
+	rc = target_queue_cmd_submit(se_cmd, tl_nexus->se_sess,
 			&tl_cmd->tl_sense_buf[0], tl_cmd->sc->device->lun,
 			transfer_length, TCM_SIMPLE_TAG,
 			sc->sc_data_direction, 0,
@@ -179,6 +184,11 @@ static void tcm_loop_submission_work(struct work_struct *work)
 static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 {
 	struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
+	struct tcm_loop_hba *tl_hba;
+	struct tcm_loop_tpg *tl_tpg;
+
+	tl_hba = *(struct tcm_loop_hba **)shost_priv(sc->device->host);
+	tl_tpg = &tl_hba->tl_hba_tpgs[sc->device->id];
 
 	pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
 		 __func__, sc->device->host->host_no, sc->device->id,
@@ -188,8 +198,8 @@ static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 	memset(tl_cmd, 0, sizeof(*tl_cmd));
 	tl_cmd->sc = sc;
 	tl_cmd->sc_cmd_tag = sc->request->tag;
-	INIT_WORK(&tl_cmd->work, tcm_loop_submission_work);
-	queue_work(tcm_loop_workqueue, &tl_cmd->work);
+
+	tcm_loop_target_queue_cmd(tl_cmd);
 	return 0;
 }
 
@@ -1146,6 +1156,7 @@ static const struct target_core_fabric_ops loop_ops = {
 	.queue_status			= tcm_loop_queue_status,
 	.queue_tm_rsp			= tcm_loop_queue_tm_rsp,
 	.aborted_task			= tcm_loop_aborted_task,
+	.get_cdb			= tcm_loop_get_cdb,
 	.fabric_make_wwn		= tcm_loop_make_scsi_hba,
 	.fabric_drop_wwn		= tcm_loop_drop_scsi_hba,
 	.fabric_make_tpg		= tcm_loop_make_naa_tpg,
@@ -1161,17 +1172,13 @@ static int __init tcm_loop_fabric_init(void)
 {
 	int ret = -ENOMEM;
 
-	tcm_loop_workqueue = alloc_workqueue("tcm_loop", 0, 0);
-	if (!tcm_loop_workqueue)
-		goto out;
-
 	tcm_loop_cmd_cache = kmem_cache_create("tcm_loop_cmd_cache",
 				sizeof(struct tcm_loop_cmd),
 				__alignof__(struct tcm_loop_cmd),
 				0, NULL);
 	if (!tcm_loop_cmd_cache) {
 		pr_debug("kmem_cache_create() for tcm_loop_cmd_cache failed\n");
-		goto out_destroy_workqueue;
+		goto out;
 	}
 
 	ret = tcm_loop_alloc_core_bus();
@@ -1188,8 +1195,6 @@ static int __init tcm_loop_fabric_init(void)
 	tcm_loop_release_core_bus();
 out_destroy_cache:
 	kmem_cache_destroy(tcm_loop_cmd_cache);
-out_destroy_workqueue:
-	destroy_workqueue(tcm_loop_workqueue);
 out:
 	return ret;
 }
@@ -1199,7 +1204,6 @@ static void __exit tcm_loop_fabric_exit(void)
 	target_unregister_template(&loop_ops);
 	tcm_loop_release_core_bus();
 	kmem_cache_destroy(tcm_loop_cmd_cache);
-	destroy_workqueue(tcm_loop_workqueue);
 }
 
 MODULE_DESCRIPTION("TCM loopback virtual Linux/SCSI fabric module");
diff --git a/drivers/target/loopback/tcm_loop.h b/drivers/target/loopback/tcm_loop.h
index d3110909a213..437663b3905c 100644
--- a/drivers/target/loopback/tcm_loop.h
+++ b/drivers/target/loopback/tcm_loop.h
@@ -16,7 +16,6 @@ struct tcm_loop_cmd {
 	struct scsi_cmnd *sc;
 	/* The TCM I/O descriptor that is accessed via container_of() */
 	struct se_cmd tl_se_cmd;
-	struct work_struct work;
 	struct completion tmr_done;
 	/* Sense buffer that will be mapped into outgoing status */
 	unsigned char tl_sense_buf[TRANSPORT_SENSE_BUFFER];
-- 
2.25.1


* [PATCH 07/13] target: cleanup cmd flag bits
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (5 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 06/13] tcm loop: use lio wq cmd submission helper Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-10  8:38     ` Christoph Hellwig
  2021-02-09 12:38 ` [PATCH 08/13] target: fix backend plugging Mike Christie
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie, Chaitanya Kulkarni

We have a couple of holes in the cmd flag definitions. This patch
closes them and converts the definitions to the (1 << n) style to
make them easier to read.
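
As a side note, the (1 << n) style also makes it easy to check at a glance
that the bits are consecutive. A userspace sketch with hypothetical names
(not the kernel enum) showing the style and typical flag tests:

```c
#include <assert.h>

/* Hypothetical flag bits in the (1 << n) style the patch switches to. */
enum demo_flags {
	DF_SUPPORTED	= (1 << 0),
	DF_TASK_SENSE	= (1 << 1),
	DF_DATA_CDB	= (1 << 2),
	DF_TMR_CDB	= (1 << 3),
};

/* Returns nonzero if all of @wanted bits are set in @flags. */
int flags_all_set(unsigned long flags, unsigned long wanted)
{
	return (flags & wanted) == wanted;
}
```

Consecutive shifts leave no gaps, so the OR of all defined bits is a
contiguous mask, which the old hand-written hex values did not guarantee.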

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 include/target/target_core_base.h | 38 +++++++++++++++----------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 815de4c97230..5e6703ca102d 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -127,25 +127,25 @@ enum transport_state_table {
 
 /* Used for struct se_cmd->se_cmd_flags */
 enum se_cmd_flags_table {
-	SCF_SUPPORTED_SAM_OPCODE	= 0x00000001,
-	SCF_TRANSPORT_TASK_SENSE	= 0x00000002,
-	SCF_EMULATED_TASK_SENSE		= 0x00000004,
-	SCF_SCSI_DATA_CDB		= 0x00000008,
-	SCF_SCSI_TMR_CDB		= 0x00000010,
-	SCF_FUA				= 0x00000080,
-	SCF_SE_LUN_CMD			= 0x00000100,
-	SCF_BIDI			= 0x00000400,
-	SCF_SENT_CHECK_CONDITION	= 0x00000800,
-	SCF_OVERFLOW_BIT		= 0x00001000,
-	SCF_UNDERFLOW_BIT		= 0x00002000,
-	SCF_ALUA_NON_OPTIMIZED		= 0x00008000,
-	SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC = 0x00020000,
-	SCF_COMPARE_AND_WRITE		= 0x00080000,
-	SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC = 0x00200000,
-	SCF_ACK_KREF			= 0x00400000,
-	SCF_USE_CPUID			= 0x00800000,
-	SCF_TASK_ATTR_SET		= 0x01000000,
-	SCF_TREAT_READ_AS_NORMAL	= 0x02000000,
+	SCF_SUPPORTED_SAM_OPCODE		= (1 << 0),
+	SCF_TRANSPORT_TASK_SENSE		= (1 << 1),
+	SCF_EMULATED_TASK_SENSE			= (1 << 2),
+	SCF_SCSI_DATA_CDB			= (1 << 3),
+	SCF_SCSI_TMR_CDB			= (1 << 4),
+	SCF_FUA					= (1 << 5),
+	SCF_SE_LUN_CMD				= (1 << 6),
+	SCF_BIDI				= (1 << 7),
+	SCF_SENT_CHECK_CONDITION		= (1 << 8),
+	SCF_OVERFLOW_BIT			= (1 << 9),
+	SCF_UNDERFLOW_BIT			= (1 << 10),
+	SCF_ALUA_NON_OPTIMIZED			= (1 << 11),
+	SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC	= (1 << 12),
+	SCF_COMPARE_AND_WRITE			= (1 << 13),
+	SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC	= (1 << 14),
+	SCF_ACK_KREF				= (1 << 15),
+	SCF_USE_CPUID				= (1 << 16),
+	SCF_TASK_ATTR_SET			= (1 << 17),
+	SCF_TREAT_READ_AS_NORMAL		= (1 << 18),
 };
 
 /*
-- 
2.25.1


* [PATCH 08/13] target: fix backend plugging
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (6 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 07/13] target: cleanup cmd flag bits Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 12:38 ` [PATCH 09/13] target iblock: add backend plug/unplug callouts Mike Christie
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

target_core_iblock plugs and unplugs on every command, and this causes
perf issues for drivers that prefer batched cmds. With the previous
patches we can now take multiple cmds from a fabric driver queue and
then pass them down to the backend drivers in a batch. This patch adds
that support by adding 2 callouts to the backend for plugging and
unplugging the device. The next 2 patches add plugging support to
iblock and tcmu.
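
The shape of the batching loop the callouts enable - plug once before the
first cmd in the batch, unplug once after the last - can be sketched in
userspace (hypothetical names; the real code goes through the backend's
plug_device/unplug_device callouts and takes a config_group ref):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a backend device with plug state (hypothetical). */
struct demo_dev {
	int plugged;
	int submitted;
	int plug_calls;
	int unplug_calls;
};

struct demo_dev *demo_plug(struct demo_dev *dev)
{
	dev->plug_calls++;
	dev->plugged = 1;
	return dev;
}

void demo_unplug(struct demo_dev *dev)
{
	dev->unplug_calls++;
	dev->plugged = 0;
}

/* Submit @nr_cmds as one batch: plug before the first, unplug after the last. */
void demo_submit_batch(struct demo_dev *dev, int nr_cmds)
{
	struct demo_dev *plug = NULL;
	int i;

	for (i = 0; i < nr_cmds; i++) {
		if (!plug)
			plug = demo_plug(dev);
		dev->submitted++;
	}
	if (plug)
		demo_unplug(plug);
}
```

However many cmds are in the batch, the device sees exactly one
plug/unplug pair, which is what the per-command plugging in iblock was
missing.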

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 40 ++++++++++++++++++++++++++
 include/target/target_core_backend.h   |  2 ++
 include/target/target_core_base.h      |  4 +++
 3 files changed, 46 insertions(+)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 46e20c42119e..a90f20222894 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1835,10 +1835,42 @@ int target_submit_cmd(struct se_cmd *se_cmd, struct se_session *se_sess,
 }
 EXPORT_SYMBOL(target_submit_cmd);
 
+
+static struct se_dev_plug *target_plug_device(struct se_device *se_dev)
+{
+	struct se_dev_plug *se_plug;
+
+	if (!se_dev->transport->plug_device)
+		return NULL;
+
+	se_plug = se_dev->transport->plug_device(se_dev);
+	if (!se_plug)
+		return NULL;
+
+	se_plug->se_dev = se_dev;
+	/*
+	 * We have a ref to the lun at this point, but the cmds could
+	 * complete before we unplug, so grab a ref to the se_device so we
+	 * can call back into the backend.
+	 */
+	config_group_get(&se_dev->dev_group);
+	return se_plug;
+}
+
+static void target_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct se_device *se_dev = se_plug->se_dev;
+
+	se_dev->transport->unplug_device(se_plug);
+	config_group_put(&se_dev->dev_group);
+}
+
 void target_queued_submit_work(struct work_struct *work)
 {
 	struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
 	struct se_cmd *se_cmd, *next_cmd;
+	struct se_dev_plug *se_plug = NULL;
+	struct se_device *se_dev = NULL;
 	struct llist_node *cmd_list;
 	unsigned char *cdb;
 
@@ -1849,9 +1881,17 @@ void target_queued_submit_work(struct work_struct *work)
 
 	cmd_list = llist_reverse_order(cmd_list);
 	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
+		if (!se_dev) {
+			se_dev = se_cmd->se_dev;
+			se_plug = target_plug_device(se_dev);
+		}
+
 		cdb = se_cmd->se_tfo->get_cdb(se_cmd);
 		target_submit(se_cmd, cdb);
 	}
+
+	if (se_plug)
+		target_unplug_device(se_plug);
 }
 
 static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 6336780d83a7..aa5f83e55501 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -34,6 +34,8 @@ struct target_backend_ops {
 	int (*configure_device)(struct se_device *);
 	void (*destroy_device)(struct se_device *);
 	void (*free_device)(struct se_device *device);
+	struct se_dev_plug *(*plug_device)(struct se_device *se_dev);
+	void (*unplug_device)(struct se_dev_plug *se_plug);
 
 	ssize_t (*set_configfs_dev_params)(struct se_device *,
 					   const char *, ssize_t);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 5e6703ca102d..b8e0a3250bd0 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -769,6 +769,10 @@ struct se_cmd_queue {
 	struct work_struct	work;
 };
 
+struct se_dev_plug {
+	struct se_device	*se_dev;
+};
+
 struct se_device_queue {
 	struct list_head	state_list;
 	spinlock_t		lock;
-- 
2.25.1


* [PATCH 09/13] target iblock: add backend plug/unplug callouts
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (7 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 08/13] target: fix backend plugging Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 12:38 ` [PATCH 10/13] target_core_user: " Mike Christie
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

This patch adds plug/unplug callouts for iblock. For initiator drivers
like iscsi that want to pass multiple cmds to their xmit thread instead
of one cmd at a time, this increases IOPs by around 10% with vhost-scsi
(combined with the previous patches we see a total 40-50% increase). For
driver combos like tcm_loop and faster initiators like iser, we can
still see IOPs increase by 20-30% when tcm_loop's nr_hw_queues setting
is also increased.
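
The per-CPU plug guard used here - one plug structure per CPU, claimed
with an atomic test-and-set so a nested plug attempt on the same CPU
returns NULL - can be sketched as (single-threaded userspace, hypothetical
names; the kernel uses test_and_set_bit()/clear_bit()):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_CPUS	4
#define DEMO_PLUGGED	0x01

struct demo_plug {
	unsigned long flags;
};

struct demo_plug demo_plugs[DEMO_NR_CPUS];

/* Claim this CPU's plug; returns NULL if it is already plugged. */
struct demo_plug *demo_plug_device(int cpu)
{
	struct demo_plug *p = &demo_plugs[cpu];

	if (p->flags & DEMO_PLUGGED)	/* test_and_set_bit() in the kernel */
		return NULL;
	p->flags |= DEMO_PLUGGED;
	return p;
}

void demo_unplug_device(struct demo_plug *p)
{
	p->flags &= ~DEMO_PLUGGED;	/* clear_bit() in the kernel */
}
```

Each CPU's plug is independent, so submission works on different CPUs
never contend on one plug, and the NULL return keeps a second plug on
the same CPU from double-starting a blk_plug.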

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_iblock.c | 39 ++++++++++++++++++++++++++++-
 drivers/target/target_core_iblock.h | 10 ++++++++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 8ed93fd205c7..58e0d47d5163 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -61,9 +61,18 @@ static struct se_device *iblock_alloc_device(struct se_hba *hba, const char *nam
 		return NULL;
 	}
 
+	ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
+				   GFP_KERNEL);
+	if (!ib_dev->ibd_plug)
+		goto free_dev;
+
 	pr_debug( "IBLOCK: Allocated ib_dev for %s\n", name);
 
 	return &ib_dev->dev;
+
+free_dev:
+	kfree(ib_dev);
+	return NULL;
 }
 
 static int iblock_configure_device(struct se_device *dev)
@@ -171,6 +180,7 @@ static void iblock_dev_call_rcu(struct rcu_head *p)
 	struct se_device *dev = container_of(p, struct se_device, rcu_head);
 	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
 
+	kfree(ib_dev->ibd_plug);
 	kfree(ib_dev);
 }
 
@@ -188,6 +198,28 @@ static void iblock_destroy_device(struct se_device *dev)
 	bioset_exit(&ib_dev->ibd_bio_set);
 }
 
+static struct se_dev_plug *iblock_plug_device(struct se_device *se_dev)
+{
+	struct iblock_dev *ib_dev = IBLOCK_DEV(se_dev);
+	struct iblock_dev_plug *ib_dev_plug;
+
+	ib_dev_plug = &ib_dev->ibd_plug[smp_processor_id()];
+	if (test_and_set_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags))
+		return NULL;
+
+	blk_start_plug(&ib_dev_plug->blk_plug);
+	return &ib_dev_plug->se_plug;
+}
+
+static void iblock_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct iblock_dev_plug *ib_dev_plug = container_of(se_plug,
+					struct iblock_dev_plug, se_plug);
+
+	blk_finish_plug(&ib_dev_plug->blk_plug);
+	clear_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags);
+}
+
 static unsigned long long iblock_emulate_read_cap_with_block_size(
 	struct se_device *dev,
 	struct block_device *bd,
@@ -337,7 +369,10 @@ static void iblock_submit_bios(struct bio_list *list)
 {
 	struct blk_plug plug;
 	struct bio *bio;
-
+	/*
+	 * The block layer handles nested plugs, so just plug/unplug to handle
+	 * fabric drivers that didn't support batching and multi bio cmds.
+	 */
 	blk_start_plug(&plug);
 	while ((bio = bio_list_pop(list)))
 		submit_bio(bio);
@@ -870,6 +905,8 @@ static const struct target_backend_ops iblock_ops = {
 	.configure_device	= iblock_configure_device,
 	.destroy_device		= iblock_destroy_device,
 	.free_device		= iblock_free_device,
+	.plug_device		= iblock_plug_device,
+	.unplug_device		= iblock_unplug_device,
 	.parse_cdb		= iblock_parse_cdb,
 	.set_configfs_dev_params = iblock_set_configfs_dev_params,
 	.show_configfs_dev_params = iblock_show_configfs_dev_params,
diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
index cefc641145b3..8c55375d2f75 100644
--- a/drivers/target/target_core_iblock.h
+++ b/drivers/target/target_core_iblock.h
@@ -4,6 +4,7 @@
 
 #include <linux/atomic.h>
 #include <linux/refcount.h>
+#include <linux/blkdev.h>
 #include <target/target_core_base.h>
 
 #define IBLOCK_VERSION		"4.0"
@@ -17,6 +18,14 @@ struct iblock_req {
 
 #define IBDF_HAS_UDEV_PATH		0x01
 
+#define IBD_PLUGF_PLUGGED		0x01
+
+struct iblock_dev_plug {
+	struct se_dev_plug se_plug;
+	struct blk_plug blk_plug;
+	unsigned long flags;
+};
+
 struct iblock_dev {
 	struct se_device dev;
 	unsigned char ibd_udev_path[SE_UDEV_PATH_LEN];
@@ -24,6 +33,7 @@ struct iblock_dev {
 	struct bio_set	ibd_bio_set;
 	struct block_device *ibd_bd;
 	bool ibd_readonly;
+	struct iblock_dev_plug *ibd_plug;
 } ____cacheline_aligned;
 
 #endif /* TARGET_CORE_IBLOCK_H */
-- 
2.25.1


* [PATCH 10/13] target_core_user: add backend plug/unplug callouts
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (8 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 09/13] target iblock: add backend plug/unplug callouts Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 16:32   ` Bodo Stroesser
  2021-02-09 12:38 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

This patch adds plug/unplug callouts for tcmu, so we can reduce the
number of times we switch to userspace. Using this driver with tcm
loop is a common config, and depending on the nr_hw_queues setting
(nr_hw_queues=1 performs much better) and the fio job count (lower
numjobs, around 4) this patch increases IOPs by only around 5-10%,
because we hit other issues like the big per tcmu device mutex.
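
The idea - suppress the userspace wakeup per cmd while plugged and issue
one wakeup for the whole batch at unplug time - can be sketched as
(userspace, hypothetical names; the real code gates uio_event_notify()
on a TCM_DEV_BIT_PLUGGED flag bit):

```c
#include <assert.h>

/* Count wakeups instead of calling uio_event_notify() (hypothetical). */
struct demo_udev {
	int plugged;
	int notifies;
};

void demo_queue_cmd(struct demo_udev *udev)
{
	/* Only kick userspace if no unplug is coming to do it for us. */
	if (!udev->plugged)
		udev->notifies++;
}

void demo_unplug(struct demo_udev *udev)
{
	udev->notifies++;	/* one wakeup covers the whole batch */
	udev->plugged = 0;
}
```

A batch of N cmds queued while plugged produces one wakeup instead of N,
which is where the reduction in userspace switches comes from.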

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_user.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index a5991df23581..a030ca6f0f4c 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -111,6 +111,7 @@ struct tcmu_dev {
 	struct kref kref;
 
 	struct se_device se_dev;
+	struct se_dev_plug se_plug;
 
 	char *name;
 	struct se_hba *hba;
@@ -119,6 +120,7 @@ struct tcmu_dev {
 #define TCMU_DEV_BIT_BROKEN 1
 #define TCMU_DEV_BIT_BLOCKED 2
 #define TCMU_DEV_BIT_TMR_NOTIFY 3
+#define TCM_DEV_BIT_PLUGGED 4
 	unsigned long flags;
 
 	struct uio_info uio_info;
@@ -959,6 +961,25 @@ static uint32_t ring_insert_padding(struct tcmu_dev *udev, size_t cmd_size)
 	return cmd_head;
 }
 
+static void tcmu_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct se_device *se_dev = se_plug->se_dev;
+	struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+	uio_event_notify(&udev->uio_info);
+	clear_bit(TCM_DEV_BIT_PLUGGED, &udev->flags);
+}
+
+static struct se_dev_plug *tcmu_plug_device(struct se_device *se_dev)
+{
+	struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+	if (!test_and_set_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+		return &udev->se_plug;
+
+	return NULL;
+}
+
 /**
  * queue_cmd_ring - queue cmd to ring or internally
  * @tcmu_cmd: cmd to queue
@@ -1086,8 +1107,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
 
 	list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
 
-	/* TODO: only if FLUSH and FUA? */
-	uio_event_notify(&udev->uio_info);
+	if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+		uio_event_notify(&udev->uio_info);
 
 	return 0;
 
@@ -2840,6 +2861,8 @@ static struct target_backend_ops tcmu_ops = {
 	.configure_device	= tcmu_configure_device,
 	.destroy_device		= tcmu_destroy_device,
 	.free_device		= tcmu_free_device,
+	.unplug_device		= tcmu_unplug_device,
+	.plug_device		= tcmu_plug_device,
 	.parse_cdb		= tcmu_parse_cdb,
 	.tmr_notify		= tcmu_tmr_notify,
 	.set_configfs_dev_params = tcmu_set_configfs_dev_params,
-- 
2.25.1


* [PATCH 11/13] target: replace work per cmd in completion path
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (9 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 10/13] target_core_user: " Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 17:01   ` Bodo Stroesser
  2021-02-10  8:42     ` Christoph Hellwig
  2021-02-09 12:38 ` [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion Mike Christie
  2021-02-09 12:38 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
  12 siblings, 2 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

Doing a work per cmd can lead to lots of threads being created.
This patch replaces the per-cmd completion work with a per-cpu
list. Combined with the earlier patches this allows tcm loop on top of
initiators like iser to go from around 700K IOPs to 1000K, and it
reduces the number of threads that get created when the system is under
heavy load and hitting the initiator drivers' tagging limits.
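
The list pattern used for this - producers push cmds onto a per-cpu list,
and one queued work drains the whole list in a single grab, reversing it
to restore submission order - can be sketched as (single-threaded
userspace with hypothetical names; the kernel llist_add()/llist_del_all()
versions are lock-free):

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
	int tag;
	struct demo_node *next;
};

struct demo_node *demo_list;

void demo_add(struct demo_node *n)		/* llist_add() */
{
	n->next = demo_list;
	demo_list = n;
}

struct demo_node *demo_del_all(void)		/* llist_del_all() */
{
	struct demo_node *head = demo_list;

	demo_list = NULL;
	return head;
}

struct demo_node *demo_reverse(struct demo_node *head) /* llist_reverse_order() */
{
	struct demo_node *prev = NULL;

	while (head) {
		struct demo_node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Since pushes go to the head, the drained list is in reverse submission
order; reversing it once per drain is much cheaper than allocating and
queueing a work struct per cmd.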

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_device.c    |  3 +
 drivers/target/target_core_internal.h  |  1 +
 drivers/target/target_core_transport.c | 98 +++++++++++++++-----------
 include/target/target_core_base.h      |  1 +
 4 files changed, 60 insertions(+), 43 deletions(-)

diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 74d3a4896588..eaa2323843c0 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -743,6 +743,9 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 
 		init_llist_head(&q->sq.cmd_list);
 		INIT_WORK(&q->sq.work, target_queued_submit_work);
+
+		init_llist_head(&q->cq.cmd_list);
+		INIT_WORK(&q->cq.work, target_queued_compl_work);
 	}
 
 	dev->se_hba = hba;
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index 56f841fd7f04..e54d05ae8dfd 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -154,6 +154,7 @@ bool	target_check_wce(struct se_device *dev);
 bool	target_check_fua(struct se_device *dev);
 void	__target_execute_cmd(struct se_cmd *, bool);
 void	target_queued_submit_work(struct work_struct *work);
+void	target_queued_compl_work(struct work_struct *work);
 
 /* target_core_stat.c */
 void	target_stat_setup_dev_default_groups(struct se_device *);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index a90f20222894..b1f920c1b816 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -55,7 +55,6 @@ static void transport_complete_task_attr(struct se_cmd *cmd);
 static void translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason);
 static void transport_handle_queue_full(struct se_cmd *cmd,
 		struct se_device *dev, int err, bool write_pending);
-static void target_complete_ok_work(struct work_struct *work);
 
 int init_se_kmem_caches(void)
 {
@@ -732,14 +731,6 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
 		percpu_ref_put(&lun->lun_ref);
 }
 
-static void target_complete_failure_work(struct work_struct *work)
-{
-	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
-
-	transport_generic_request_failure(cmd,
-			TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
-}
-
 /*
  * Used when asking transport to copy Sense Data from the underlying
  * Linux/SCSI struct scsi_cmnd
@@ -827,11 +818,20 @@ static void target_handle_abort(struct se_cmd *cmd)
 	transport_cmd_check_stop_to_fabric(cmd);
 }
 
-static void target_abort_work(struct work_struct *work)
+static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
+				  int cpu, struct workqueue_struct *wq)
 {
-	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
+	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
+	queue_work_on(cpu, wq, &q->work);
+}
 
-	target_handle_abort(cmd);
+static void target_queue_cmd_compl(struct se_cmd *se_cmd)
+{
+	struct se_device *se_dev = se_cmd->se_dev;
+	int cpu = se_cmd->cpuid;
+
+	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
+			      target_completion_wq);
 }
 
 static bool target_cmd_interrupted(struct se_cmd *cmd)
@@ -841,8 +841,8 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
 	if (cmd->transport_state & CMD_T_ABORTED) {
 		if (cmd->transport_complete_callback)
 			cmd->transport_complete_callback(cmd, false, &post_ret);
-		INIT_WORK(&cmd->work, target_abort_work);
-		queue_work(target_completion_wq, &cmd->work);
+
+		target_queue_cmd_compl(cmd);
 		return true;
 	} else if (cmd->transport_state & CMD_T_STOP) {
 		if (cmd->transport_complete_callback)
@@ -857,7 +857,6 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
 /* May be called from interrupt context so must not sleep. */
 void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 {
-	int success;
 	unsigned long flags;
 
 	if (target_cmd_interrupted(cmd))
@@ -866,25 +865,11 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 	cmd->scsi_status = scsi_status;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	switch (cmd->scsi_status) {
-	case SAM_STAT_CHECK_CONDITION:
-		if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
-			success = 1;
-		else
-			success = 0;
-		break;
-	default:
-		success = 1;
-		break;
-	}
-
 	cmd->t_state = TRANSPORT_COMPLETE;
 	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 
-	INIT_WORK(&cmd->work, success ? target_complete_ok_work :
-		  target_complete_failure_work);
-	queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
+	target_queue_cmd_compl(cmd);
 }
 EXPORT_SYMBOL(target_complete_cmd);
 
@@ -1894,13 +1879,6 @@ void target_queued_submit_work(struct work_struct *work)
 		target_unplug_device(se_plug);
 }
 
-static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
-				  int cpu)
-{
-	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
-	queue_work_on(cpu, target_submission_wq, &q->work);
-}
-
 /**
  * target_queue_cmd_submit - queue the cmd to run on the LIO workqueue
  * @se_cmd: command descriptor to submit
@@ -1951,7 +1929,8 @@ target_queue_cmd_submit(struct se_cmd *se_cmd, struct se_session *se_sess,
 
 	cpu = se_cmd->cpuid;
 	se_dev = se_cmd->se_dev;
-	target_queue_cmd_work(&se_dev->queues[cpu].sq, se_cmd, cpu);
+	target_queue_cmd_work(&se_dev->queues[cpu].sq, se_cmd, cpu,
+			      target_submission_wq);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
@@ -2054,8 +2033,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 		cmd->transport_complete_callback(cmd, false, &post_ret);
 
 	if (cmd->transport_state & CMD_T_ABORTED) {
-		INIT_WORK(&cmd->work, target_abort_work);
-		queue_work(target_completion_wq, &cmd->work);
+		target_queue_cmd_compl(cmd);
 		return;
 	}
 
@@ -2480,10 +2458,32 @@ static bool target_read_prot_action(struct se_cmd *cmd)
 	return false;
 }
 
-static void target_complete_ok_work(struct work_struct *work)
+static void target_complete_cmd_work(struct se_cmd *cmd)
 {
-	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
-	int ret;
+	int ret, success;
+
+	if (cmd->transport_state & CMD_T_ABORTED) {
+		target_handle_abort(cmd);
+		return;
+	}
+
+	switch (cmd->scsi_status) {
+	case SAM_STAT_CHECK_CONDITION:
+		if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
+			success = 1;
+		else
+			success = 0;
+		break;
+	default:
+		success = 1;
+		break;
+	}
+
+	if (!success) {
+		transport_generic_request_failure(cmd,
+				TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+		return;
+	}
 
 	/*
 	 * Check if we need to move delayed/dormant tasks from cmds on the
@@ -2625,6 +2625,18 @@ static void target_complete_ok_work(struct work_struct *work)
 	transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 }
 
+void target_queued_compl_work(struct work_struct *work)
+{
+	struct se_cmd_queue *cq = container_of(work, struct se_cmd_queue,
+					       work);
+	struct se_cmd *se_cmd, *next_cmd;
+	struct llist_node *cmd_list;
+
+	cmd_list = llist_del_all(&cq->cmd_list);
+	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
+		target_complete_cmd_work(se_cmd);
+}
+
 void target_free_sgl(struct scatterlist *sgl, int nents)
 {
 	sgl_free_n_order(sgl, nents, 0);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index b8e0a3250bd0..f2ba7de59da7 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -777,6 +777,7 @@ struct se_device_queue {
 	struct list_head	state_list;
 	spinlock_t		lock;
 	struct se_cmd_queue	sq;
+	struct se_cmd_queue	cq;
 };
 
 struct se_device {
-- 
2.25.1


* [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (10 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-10  8:44     ` Christoph Hellwig
  2021-02-09 12:38 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
  12 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

LIO wants to complete a cmd on the CPU it was submitted on, because
most drivers have per-cpu or per-hw-queue handlers. But for vhost-scsi,
which has a single thread for submissions and completions, this is not
always the best thing to do: the thread could now be running on a
different CPU, and it conflicts with what the user has set up in the
lower layers with settings like the block layer's rq_affinity, or, for
network block devices, what the user has set up on their NIC.

This patch has vhost-scsi tell LIO to complete the cmd on the CPU the
layer below LIO completed the cmd on. We then stop fighting
the block, net and whatever layer/setting is below us.

With this patch and the previous ones I see an increase in IOPs by about
50% (234K -> 350K) for random 4K workloads like:

fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
--ioengine=libaio --iodepth=128  --numjobs=8 --time_based
--group_reporting --runtime=60

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/target/target_core_transport.c | 10 +++++++++-
 drivers/vhost/scsi.c                   |  3 ++-
 include/target/target_core_base.h      |  2 ++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index b1f920c1b816..e5e555dcd73a 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -828,7 +828,12 @@ static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
 static void target_queue_cmd_compl(struct se_cmd *se_cmd)
 {
 	struct se_device *se_dev = se_cmd->se_dev;
-	int cpu = se_cmd->cpuid;
+	int cpu;
+
+	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
+		cpu = smp_processor_id();
+	else
+		cpu = se_cmd->cpuid;
 
 	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
 			      target_completion_wq);
@@ -1609,6 +1614,9 @@ target_submit_prep(struct se_cmd *se_cmd, struct se_session *se_sess,
 	BUG_ON(!se_tpg);
 	BUG_ON(se_cmd->se_tfo || se_cmd->se_sess);
 
+	if (flags & TARGET_SCF_IGNORE_CPUID_COMPL)
+		se_cmd->se_cmd_flags |= SCF_IGNORE_CPUID_COMPL;
+
 	if (flags & TARGET_SCF_USE_CPUID)
 		se_cmd->se_cmd_flags |= SCF_USE_CPUID;
 	/*
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 99909c6f3960..b9addd5fdd28 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -803,7 +803,8 @@ static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
 			&cmd->tvc_sense_buf[0], cmd->tvc_lun,
 			cmd->tvc_exp_data_len,
 			vhost_scsi_to_tcm_attr(cmd->tvc_task_attr),
-			cmd->tvc_data_direction, TARGET_SCF_ACK_KREF,
+			cmd->tvc_data_direction,
+			TARGET_SCF_ACK_KREF | TARGET_SCF_IGNORE_CPUID_COMPL,
 			sg_ptr, cmd->tvc_sgl_count, NULL, 0, sg_prot_ptr,
 			cmd->tvc_prot_sgl_count);
 	if (rc < 0) {
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index f2ba7de59da7..bb4ac7e6f560 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -146,6 +146,7 @@ enum se_cmd_flags_table {
 	SCF_USE_CPUID				= (1 << 16),
 	SCF_TASK_ATTR_SET			= (1 << 17),
 	SCF_TREAT_READ_AS_NORMAL		= (1 << 18),
+	SCF_IGNORE_CPUID_COMPL			= (1 << 19)
 };
 
 /*
@@ -195,6 +196,7 @@ enum target_sc_flags_table {
 	TARGET_SCF_ACK_KREF		= 0x02,
 	TARGET_SCF_UNKNOWN_SIZE		= 0x04,
 	TARGET_SCF_USE_CPUID		= 0x08,
+	TARGET_SCF_IGNORE_CPUID_COMPL	= 0x10,
 };
 
 /* fabric independent task management function values */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/13] target: flush submission work during TMR processing
  2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
                   ` (11 preceding siblings ...)
  2021-02-09 12:38 ` [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion Mike Christie
@ 2021-02-09 12:38 ` Mike Christie
  2021-02-09 14:31   ` Laurence Oberman
  2021-02-09 17:05   ` Bodo Stroesser
  12 siblings, 2 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 12:38 UTC (permalink / raw)
  To: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization
  Cc: Mike Christie

If a cmd is still on the submission workqueue then the TMR code will
miss it, and will end up returning "task not found" or success for
LUN resets. The fabric driver might then tell the initiator that
the running cmds have been handled when they are actually about to run.

This adds a cancel of the submission work while we are processing TMRs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_tmr.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 7347285471fa..9b7f159f9341 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -124,6 +124,8 @@ void core_tmr_abort_task(
 	int i;
 
 	for (i = 0; i < dev->queue_cnt; i++) {
+		cancel_work_sync(&dev->queues[i].sq.work);
+
 		spin_lock_irqsave(&dev->queues[i].lock, flags);
 		list_for_each_entry_safe(se_cmd, next, &dev->queues[i].state_list,
 					 state_list) {
@@ -302,6 +304,8 @@ static void core_tmr_drain_state_list(
 	 * in the Control Mode Page.
 	 */
 	for (i = 0; i < dev->queue_cnt; i++) {
+		cancel_work_sync(&dev->queues[i].sq.work);
+
 		spin_lock_irqsave(&dev->queues[i].lock, flags);
 		list_for_each_entry_safe(cmd, next, &dev->queues[i].state_list,
 					 state_list) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 13/13] target: flush submission work during TMR processing
  2021-02-09 12:38 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
@ 2021-02-09 14:31   ` Laurence Oberman
  2021-02-10 14:25     ` Laurence Oberman
  2021-02-09 17:05   ` Bodo Stroesser
  1 sibling, 1 reply; 50+ messages in thread
From: Laurence Oberman @ 2021-02-09 14:31 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On Tue, 2021-02-09 at 06:38 -0600, Mike Christie wrote:
> If a cmd is on the submission workqueue then the TMR code will
> miss it, and end up returning task not found or success for
> lun resets. The fabric driver might then tell the initiator that
> the running cmds have been handled when they are about to run.
> 
> This adds a cancel when we are processing TMRs.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/target/target_core_tmr.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/target/target_core_tmr.c
> b/drivers/target/target_core_tmr.c
> index 7347285471fa..9b7f159f9341 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -124,6 +124,8 @@ void core_tmr_abort_task(
>  	int i;
>  
>  	for (i = 0; i < dev->queue_cnt; i++) {
> +		cancel_work_sync(&dev->queues[i].sq.work);
> +
>  		spin_lock_irqsave(&dev->queues[i].lock, flags);
>  		list_for_each_entry_safe(se_cmd, next, &dev->queues[i].state_list,
>  					 state_list) {
> @@ -302,6 +304,8 @@ static void core_tmr_drain_state_list(
>  	 * in the Control Mode Page.
>  	 */
>  	for (i = 0; i < dev->queue_cnt; i++) {
> +		cancel_work_sync(&dev->queues[i].sq.work);
> +
>  		spin_lock_irqsave(&dev->queues[i].lock, flags);
>  		list_for_each_entry_safe(cmd, next, &dev->queues[i].state_list,
> 					 state_list) {

Hello Mike,
Thanks for these.
This one in particular is the one that I think will help our case. I
will pull all of these and test them later this week as a bundle.

Many Thanks
Laurence Oberman


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 03/13] target: add workqueue based cmd submission
  2021-02-09 12:38 ` [PATCH 03/13] target: add workqueue based cmd submission Mike Christie
@ 2021-02-09 15:48   ` Bodo Stroesser
  2021-02-09 18:43     ` Mike Christie
  0 siblings, 1 reply; 50+ messages in thread
From: Bodo Stroesser @ 2021-02-09 15:48 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 09.02.21 13:38, Mike Christie wrote:
>   
> +void target_queued_submit_work(struct work_struct *work)
> +{
> +	struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
> +	struct se_cmd *se_cmd, *next_cmd;
> +	struct llist_node *cmd_list;
> +	unsigned char *cdb;
> +
> +	cmd_list = llist_del_all(&sq->cmd_list);
> +	if (!cmd_list)
> +		/* Previous call took what we were queued to submit */
> +		return;
> +
> +	cmd_list = llist_reverse_order(cmd_list);
> +	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
> +		cdb = se_cmd->se_tfo->get_cdb(se_cmd);

If I got it right, get_cdb is a new, optional callback.
So, should we check whether it is set?

Maybe the check could better be done early in target_queue_cmd_submit.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/13] tcm loop: use lio wq cmd submission helper
  2021-02-09 12:38 ` [PATCH 06/13] tcm loop: use lio wq cmd submission helper Mike Christie
@ 2021-02-09 15:59   ` Bodo Stroesser
  2021-02-09 18:44     ` Mike Christie
  2021-02-09 17:39     ` kernel test robot
  1 sibling, 1 reply; 50+ messages in thread
From: Bodo Stroesser @ 2021-02-09 15:59 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 09.02.21 13:38, Mike Christie wrote:
> @@ -179,6 +184,11 @@ static void tcm_loop_submission_work(struct work_struct *work)
>   static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
>   {
>   	struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
> +	struct tcm_loop_hba *tl_hba;
> +	struct tcm_loop_tpg *tl_tpg;
> +
> +	tl_hba = *(struct tcm_loop_hba **)shost_priv(sc->device->host);
> +	tl_tpg = &tl_hba->tl_hba_tpgs[sc->device->id];
>   
>   	pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
>   		 __func__, sc->device->host->host_no, sc->device->id,

AFAICS these new lines are not needed. Or am I missing something?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 02/13] target: split target_submit_cmd_map_sgls
  2021-02-09 12:38 ` [PATCH 02/13] target: split target_submit_cmd_map_sgls Mike Christie
  2021-02-09 16:15     ` kernel test robot
@ 2021-02-09 16:15     ` kernel test robot
  1 sibling, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-02-09 16:15 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization
  Cc: kbuild-all, Mike Christie

[-- Attachment #1: Type: text/plain, Size: 4737 bytes --]

Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mkp-scsi/for-next]
[also build test WARNING on vhost/linux-next v5.11-rc7 next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/target-move-t_task_cdb-initialization/20210209-213926
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
config: h8300-randconfig-s031-20210209 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-215-g0fb77bb6-dirty
        # https://github.com/0day-ci/linux/commit/3382952197b63f11c166ff293f819ce6ac4d52ae
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Christie/target-move-t_task_cdb-initialization/20210209-213926
        git checkout 3382952197b63f11c166ff293f819ce6ac4d52ae
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=h8300 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> drivers/target/target_core_transport.c:1695:12: sparse: sparse: incorrect type in assignment (different base types) @@     expected int rc @@     got restricted sense_reason_t @@
   drivers/target/target_core_transport.c:1695:12: sparse:     expected int rc
   drivers/target/target_core_transport.c:1695:12: sparse:     got restricted sense_reason_t
   drivers/target/target_core_transport.c:1699:12: sparse: sparse: incorrect type in assignment (different base types) @@     expected int rc @@     got restricted sense_reason_t @@
   drivers/target/target_core_transport.c:1699:12: sparse:     expected int rc
   drivers/target/target_core_transport.c:1699:12: sparse:     got restricted sense_reason_t
>> drivers/target/target_core_transport.c:1736:51: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected restricted sense_reason_t [usertype] @@     got int rc @@
   drivers/target/target_core_transport.c:1736:51: sparse:     expected restricted sense_reason_t [usertype]
   drivers/target/target_core_transport.c:1736:51: sparse:     got int rc

vim +1695 drivers/target/target_core_transport.c

  1678	
  1679	/**
  1680	 * target_submit - perform final initialization and submit cmd to LIO core
  1681	 * @se_cmd: command descriptor to submit
  1682	 * @cdb: pointer to SCSI CDB
  1683	 *
  1684	 * target_submit_prep must have been called on the cmd, and this must be
  1685	 * called from process context.
  1686	 */
  1687	static void target_submit(struct se_cmd *se_cmd, unsigned char *cdb)
  1688	{
  1689		struct scatterlist *sgl = se_cmd->t_data_sg;
  1690		unsigned char *buf = NULL;
  1691		int rc;
  1692	
  1693		might_sleep();
  1694	
> 1695		rc = target_cmd_init_cdb(se_cmd, cdb);
  1696		if (rc)
  1697			goto fail;
  1698	
  1699		rc = target_cmd_parse_cdb(se_cmd);
  1700		if (rc != 0)
  1701			goto fail;
  1702	
  1703		if (se_cmd->t_data_nents != 0) {
  1704			BUG_ON(!sgl);
  1705			/*
  1706			 * A work-around for tcm_loop as some userspace code via
  1707			 * scsi-generic do not memset their associated read buffers,
  1708			 * so go ahead and do that here for type non-data CDBs.  Also
  1709			 * note that this is currently guaranteed to be a single SGL
  1710			 * for this case by target core in target_setup_cmd_from_cdb()
  1711			 * -> transport_generic_cmd_sequencer().
  1712			 */
  1713			if (!(se_cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) &&
  1714			     se_cmd->data_direction == DMA_FROM_DEVICE) {
  1715				if (sgl)
  1716					buf = kmap(sg_page(sgl)) + sgl->offset;
  1717	
  1718				if (buf) {
  1719					memset(buf, 0, sgl->length);
  1720					kunmap(sg_page(sgl));
  1721				}
  1722			}
  1723	
  1724		}
  1725	
  1726		/*
  1727		 * Check if we need to delay processing because of ALUA
  1728		 * Active/NonOptimized primary access state..
  1729		 */
  1730		core_alua_check_nonop_delay(se_cmd);
  1731	
  1732		transport_handle_cdb_direct(se_cmd);
  1733		return;
  1734	
  1735	fail:
> 1736		transport_generic_request_failure(se_cmd, rc);
  1737	}
  1738	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23129 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread



* Re: [PATCH 10/13] target_core_user: add backend plug/unplug callouts
  2021-02-09 12:38 ` [PATCH 10/13] target_core_user: " Mike Christie
@ 2021-02-09 16:32   ` Bodo Stroesser
  2021-02-09 18:59     ` Mike Christie
  0 siblings, 1 reply; 50+ messages in thread
From: Bodo Stroesser @ 2021-02-09 16:32 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 09.02.21 13:38, Mike Christie wrote:
> This patch adds plug/unplug callouts for tcmu, so we can reduce the
> number of times we switch to userspace. Using this driver with tcm
> loop is a common config, and depending on the nr_hw_queues
> (nr_hw_queues=1 performs much better) and fio jobs (lower numjobs,
> around 4) this patch can increase IOPs by only around 5-10%, because
> we hit other issues like the big per-tcmu-device mutex.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>   drivers/target/target_core_user.c | 27 +++++++++++++++++++++++++--
>   1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
> index a5991df23581..a030ca6f0f4c 100644
> --- a/drivers/target/target_core_user.c
> +++ b/drivers/target/target_core_user.c
> @@ -111,6 +111,7 @@ struct tcmu_dev {
>   	struct kref kref;
>   
>   	struct se_device se_dev;
> +	struct se_dev_plug se_plug;
>   
>   	char *name;
>   	struct se_hba *hba;
> @@ -119,6 +120,7 @@ struct tcmu_dev {
>   #define TCMU_DEV_BIT_BROKEN 1
>   #define TCMU_DEV_BIT_BLOCKED 2
>   #define TCMU_DEV_BIT_TMR_NOTIFY 3
> +#define TCM_DEV_BIT_PLUGGED 4
>   	unsigned long flags;
>   
>   	struct uio_info uio_info;
> @@ -959,6 +961,25 @@ static uint32_t ring_insert_padding(struct tcmu_dev *udev, size_t cmd_size)
>   	return cmd_head;
>   }
>   
> +static void tcmu_unplug_device(struct se_dev_plug *se_plug)
> +{
> +	struct se_device *se_dev = se_plug->se_dev;
> +	struct tcmu_dev *udev = TCMU_DEV(se_dev);
> +
> +	uio_event_notify(&udev->uio_info);

Don't we have a race here?

Let's assume that
 - right here the thread is interrupted
 - userspace starts, empties the ring and sleeps again
 - another cpu queues a new CDB in the ring
In that - of course very rare - condition, userspace will not wake up for the freshly queued CDB.

I think first clearing the bit and then doing the uio_event_notify() would work (without needing to take the big tcmu mutex).

> +	clear_bit(TCM_DEV_BIT_PLUGGED, &udev->flags);
> +}
> +
> +static struct se_dev_plug *tcmu_plug_device(struct se_device *se_dev)
> +{
> +	struct tcmu_dev *udev = TCMU_DEV(se_dev);
> +
> +	if (!test_and_set_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
> +		return &udev->se_plug;
> +
> +	return NULL;
> +}
> +
>   /**
>    * queue_cmd_ring - queue cmd to ring or internally
>    * @tcmu_cmd: cmd to queue
> @@ -1086,8 +1107,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
>   
>   	list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
>   
> -	/* TODO: only if FLUSH and FUA? */
> -	uio_event_notify(&udev->uio_info);
> +	if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
> +		uio_event_notify(&udev->uio_info);
>   
>   	return 0;
>   
> @@ -2840,6 +2861,8 @@ static struct target_backend_ops tcmu_ops = {
>   	.configure_device	= tcmu_configure_device,
>   	.destroy_device		= tcmu_destroy_device,
>   	.free_device		= tcmu_free_device,
> +	.unplug_device		= tcmu_unplug_device,
> +	.plug_device		= tcmu_plug_device,
>   	.parse_cdb		= tcmu_parse_cdb,
>   	.tmr_notify		= tcmu_tmr_notify,
>   	.set_configfs_dev_params = tcmu_set_configfs_dev_params,
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/13] target: replace work per cmd in completion path
  2021-02-09 12:38 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
@ 2021-02-09 17:01   ` Bodo Stroesser
  2021-02-09 18:50     ` Mike Christie
  2021-02-10  8:42     ` Christoph Hellwig
  1 sibling, 1 reply; 50+ messages in thread
From: Bodo Stroesser @ 2021-02-09 17:01 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 09.02.21 13:38, Mike Christie wrote:
>   
> +void target_queued_compl_work(struct work_struct *work)
> +{
> +	struct se_cmd_queue *cq = container_of(work, struct se_cmd_queue,
> +					       work);
> +	struct se_cmd *se_cmd, *next_cmd;
> +	struct llist_node *cmd_list;
> +
> +	cmd_list = llist_del_all(&cq->cmd_list);

Probably nit-picking: I'd like to reverse the list before processing, as you did during submission.

> +	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
> +		target_complete_cmd_work(se_cmd);
> +}
> +



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 13/13] target: flush submission work during TMR processing
  2021-02-09 12:38 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
  2021-02-09 14:31   ` Laurence Oberman
@ 2021-02-09 17:05   ` Bodo Stroesser
  2021-02-09 18:49     ` Mike Christie
  1 sibling, 1 reply; 50+ messages in thread
From: Bodo Stroesser @ 2021-02-09 17:05 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 09.02.21 13:38, Mike Christie wrote:
> If a cmd is on the submission workqueue then the TMR code will
> miss it, and end up returning task not found or success for
> lun resets. The fabric driver might then tell the initiator that
> the running cmds have been handled when they are about to run.
> 
> This adds a cancel when we are processing TMRs.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>   drivers/target/target_core_tmr.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 7347285471fa..9b7f159f9341 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -124,6 +124,8 @@ void core_tmr_abort_task(
>   	int i;
>   
>   	for (i = 0; i < dev->queue_cnt; i++) {
> +		cancel_work_sync(&dev->queues[i].sq.work);
> +
>   		spin_lock_irqsave(&dev->queues[i].lock, flags);
>   		list_for_each_entry_safe(se_cmd, next, &dev->queues[i].state_list,
>   					 state_list) {
> @@ -302,6 +304,8 @@ static void core_tmr_drain_state_list(
>   	 * in the Control Mode Page.
>   	 */
>   	for (i = 0; i < dev->queue_cnt; i++) {
> +		cancel_work_sync(&dev->queues[i].sq.work);
> +
>   		spin_lock_irqsave(&dev->queues[i].lock, flags);
>   		list_for_each_entry_safe(cmd, next, &dev->queues[i].state_list,
>   					 state_list) {
> 

Is there a possible race?

I mean, if cancel_work_sync() is called for a work that was queued but not started,
can we end up with cmds sleeping in a queue until a further cmd is queued to the same queue?


* Re: [PATCH 06/13] tcm loop: use lio wq cmd submission helper
  2021-02-09 12:38 ` [PATCH 06/13] tcm loop: use lio wq cmd submission helper Mike Christie
@ 2021-02-09 17:39     ` kernel test robot
  2021-02-09 17:39     ` kernel test robot
  1 sibling, 0 replies; 50+ messages in thread
From: kernel test robot @ 2021-02-09 17:39 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization
  Cc: kbuild-all, Mike Christie

[-- Attachment #1: Type: text/plain, Size: 2984 bytes --]

Hi Mike,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on mkp-scsi/for-next]
[also build test WARNING on vhost/linux-next v5.11-rc7 next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mike-Christie/target-move-t_task_cdb-initialization/20210209-213926
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git for-next
config: h8300-randconfig-s031-20210209 (attached as .config)
compiler: h8300-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-215-g0fb77bb6-dirty
        # https://github.com/0day-ci/linux/commit/b5a5f1dde145805b1ea13be05f6a28386284ac2e
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mike-Christie/target-move-t_task_cdb-initialization/20210209-213926
        git checkout b5a5f1dde145805b1ea13be05f6a28386284ac2e
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=h8300 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/target/loopback/tcm_loop.c: In function 'tcm_loop_queuecommand':
>> drivers/target/loopback/tcm_loop.c:188:23: warning: variable 'tl_tpg' set but not used [-Wunused-but-set-variable]
     188 |  struct tcm_loop_tpg *tl_tpg;
         |                       ^~~~~~


vim +/tl_tpg +188 drivers/target/loopback/tcm_loop.c

   179	
   180	/*
   181	 * ->queuecommand can be and usually is called from interrupt context, so
   182	 * defer the actual submission to a workqueue.
   183	 */
   184	static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
   185	{
   186		struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
   187		struct tcm_loop_hba *tl_hba;
 > 188		struct tcm_loop_tpg *tl_tpg;
   189	
   190		tl_hba = *(struct tcm_loop_hba **)shost_priv(sc->device->host);
   191		tl_tpg = &tl_hba->tl_hba_tpgs[sc->device->id];
   192	
   193		pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
   194			 __func__, sc->device->host->host_no, sc->device->id,
   195			 sc->device->channel, sc->device->lun, sc->cmnd[0],
   196			 scsi_bufflen(sc));
   197	
   198		memset(tl_cmd, 0, sizeof(*tl_cmd));
   199		tl_cmd->sc = sc;
   200		tl_cmd->sc_cmd_tag = sc->request->tag;
   201	
   202		tcm_loop_target_queue_cmd(tl_cmd);
   203		return 0;
   204	}
   205	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23129 bytes --]



* Re: [PATCH 02/13] target: split target_submit_cmd_map_sgls
  2021-02-09 16:15     ` kernel test robot
@ 2021-02-09 18:40       ` Mike Christie
  -1 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 18:40 UTC (permalink / raw)
  To: kernel test robot, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization
  Cc: kbuild-all

On 2/9/21 10:15 AM, kernel test robot wrote:
>   1682	 * @cdb: pointer to SCSI CDB
>   1683	 *
>   1684	 * target_submit_prep must have been called on the cmd, and this must be
>   1685	 * called from process context.
>   1686	 */
>   1687	static void target_submit(struct se_cmd *se_cmd, unsigned char *cdb)
>   1688	{
>   1689		struct scatterlist *sgl = se_cmd->t_data_sg;
>   1690		unsigned char *buf = NULL;
>   1691		int rc;
>   1692	
>   1693		might_sleep();
>   1694	
>> 1695		rc = target_cmd_init_cdb(se_cmd, cdb);

This is a valid issue. rc should be a sense_reason_t. I'll fix in the next
posting.



* Re: [PATCH 03/13] target: add workqueue based cmd submission
  2021-02-09 15:48   ` Bodo Stroesser
@ 2021-02-09 18:43     ` Mike Christie
  2021-02-09 19:10       ` Mike Christie
  0 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-09 18:43 UTC (permalink / raw)
  To: Bodo Stroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 2/9/21 9:48 AM, Bodo Stroesser wrote:
> On 09.02.21 13:38, Mike Christie wrote:
>>   +void target_queued_submit_work(struct work_struct *work)
>> +{
>> +    struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
>> +    struct se_cmd *se_cmd, *next_cmd;
>> +    struct llist_node *cmd_list;
>> +    unsigned char *cdb;
>> +
>> +    cmd_list = llist_del_all(&sq->cmd_list);
>> +    if (!cmd_list)
>> +        /* Previous call took what we were queued to submit */
>> +        return;
>> +
>> +    cmd_list = llist_reverse_order(cmd_list);
>> +    llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
>> +        cdb = se_cmd->se_tfo->get_cdb(se_cmd);
> 
> If I got it right, get_cdb is a new, optional callback.
> So, should we check, whether it is set?

I think a check is not really useful. The caller has to implement it
if they call target_queue_cmd_submit. It would be a BUG() and so
either way you would get a crash when you are developing a patch.

> 
> Maybe the check better could be done early in target_queue_cmd_submit.
> 



* Re: [PATCH 06/13] tcm loop: use lio wq cmd submission helper
  2021-02-09 15:59   ` Bodo Stroesser
@ 2021-02-09 18:44     ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 18:44 UTC (permalink / raw)
  To: Bodo Stroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 2/9/21 9:59 AM, Bodo Stroesser wrote:
> On 09.02.21 13:38, Mike Christie wrote:
>> @@ -179,6 +184,11 @@ static void tcm_loop_submission_work(struct work_struct *work)
>>   static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
>>   {
>>       struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
>> +    struct tcm_loop_hba *tl_hba;
>> +    struct tcm_loop_tpg *tl_tpg;
>> +
>> +    tl_hba = *(struct tcm_loop_hba **)shost_priv(sc->device->host);
>> +    tl_tpg = &tl_hba->tl_hba_tpgs[sc->device->id];
>>         pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
>>            __func__, sc->device->host->host_no, sc->device->id,
> 
> AFAICS these new lines are not needed. Or am I missing something?

You're right. I'll fix it.


* Re: [PATCH 13/13] target: flush submission work during TMR processing
  2021-02-09 17:05   ` Bodo Stroesser
@ 2021-02-09 18:49     ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 18:49 UTC (permalink / raw)
  To: Bodo Stroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 2/9/21 11:05 AM, Bodo Stroesser wrote:
> On 09.02.21 13:38, Mike Christie wrote:
>> If a cmd is on the submission workqueue then the TMR code will
>> miss it, and end up returning task not found or success for
>> lun resets. The fabric driver might then tell the initiator that
>> the running cmds have been handled when they are about to run.
>>
>> This adds a cancel when we are processing TMRs.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>   drivers/target/target_core_tmr.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
>> index 7347285471fa..9b7f159f9341 100644
>> --- a/drivers/target/target_core_tmr.c
>> +++ b/drivers/target/target_core_tmr.c
>> @@ -124,6 +124,8 @@ void core_tmr_abort_task(
>>   	int i;
>>   
>>   	for (i = 0; i < dev->queue_cnt; i++) {
>> +		cancel_work_sync(&dev->queues[i].sq.work);
>> +
>>   		spin_lock_irqsave(&dev->queues[i].lock, flags);
>>   		list_for_each_entry_safe(se_cmd, next, &dev->queues[i].state_list,
>>   					 state_list) {
>> @@ -302,6 +304,8 @@ static void core_tmr_drain_state_list(
>>   	 * in the Control Mode Page.
>>   	 */
>>   	for (i = 0; i < dev->queue_cnt; i++) {
>> +		cancel_work_sync(&dev->queues[i].sq.work);
>> +
>>   		spin_lock_irqsave(&dev->queues[i].lock, flags);
>>   		list_for_each_entry_safe(cmd, next, &dev->queues[i].state_list,
>>   					 state_list) {
>>
> 
> Is there a possible race?
> 
> I mean, if cancel_work_sync() is called for a work that was queued but not started,
> can we end up with cmds sleeping in a queue until a further cmd is queued to the same queue?

Oh yeah, you are right cancel is wrong. It should be a flush.



* Re: [PATCH 11/13] target: replace work per cmd in completion path
  2021-02-09 17:01   ` Bodo Stroesser
@ 2021-02-09 18:50     ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 18:50 UTC (permalink / raw)
  To: Bodo Stroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 2/9/21 11:01 AM, Bodo Stroesser wrote:
> On 09.02.21 13:38, Mike Christie wrote:
>>   
>> +void target_queued_compl_work(struct work_struct *work)
>> +{
>> +	struct se_cmd_queue *cq = container_of(work, struct se_cmd_queue,
>> +					       work);
>> +	struct se_cmd *se_cmd, *next_cmd;
>> +	struct llist_node *cmd_list;
>> +
>> +	cmd_list = llist_del_all(&cq->cmd_list);
> 
> Probably nit-picking: I'd like to reverse the list before processing like you did during submission.
> 

Yeah, I flip flopped a lot on both. I'll just do it. I didn't notice
any perf differences.


* Re: [PATCH 10/13] target_core_user: add backend plug/unplug callouts
  2021-02-09 16:32   ` Bodo Stroesser
@ 2021-02-09 18:59     ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 18:59 UTC (permalink / raw)
  To: Bodo Stroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 2/9/21 10:32 AM, Bodo Stroesser wrote:
> On 09.02.21 13:38, Mike Christie wrote:
>> This patch adds plug/unplug callouts for tcmu, so we can reduce the
>> number of times we switch to userspace. Using this driver with tcm
>> loop is a common config, and depending on the nr_hw_queues
>> (nr_hw_queues=1 performs much better) and fio jobs (lower numjobs,
>> around 4) this patch can increase IOPs by only around 5-10% because
>> we hit other issues like the big per tcmu device mutex.
> 
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>   drivers/target/target_core_user.c | 27 +++++++++++++++++++++++++--
>>   1 file changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
>> index a5991df23581..a030ca6f0f4c 100644
>> --- a/drivers/target/target_core_user.c
>> +++ b/drivers/target/target_core_user.c
>> @@ -111,6 +111,7 @@ struct tcmu_dev {
>>   	struct kref kref;
>>   
>>   	struct se_device se_dev;
>> +	struct se_dev_plug se_plug;
>>   
>>   	char *name;
>>   	struct se_hba *hba;
>> @@ -119,6 +120,7 @@ struct tcmu_dev {
>>   #define TCMU_DEV_BIT_BROKEN 1
>>   #define TCMU_DEV_BIT_BLOCKED 2
>>   #define TCMU_DEV_BIT_TMR_NOTIFY 3
>> +#define TCM_DEV_BIT_PLUGGED 4
>>   	unsigned long flags;
>>   
>>   	struct uio_info uio_info;
>> @@ -959,6 +961,25 @@ static uint32_t ring_insert_padding(struct tcmu_dev *udev, size_t cmd_size)
>>   	return cmd_head;
>>   }
>>   
>> +static void tcmu_unplug_device(struct se_dev_plug *se_plug)
>> +{
>> +	struct se_device *se_dev = se_plug->se_dev;
>> +	struct tcmu_dev *udev = TCMU_DEV(se_dev);
>> +
>> +	uio_event_notify(&udev->uio_info);
> 
> Don't we have a race here?
> 
> Let's assume that
>  - just here the thread is interrupted
>  - userspace starts, empties the ring and sleeps again
>  - another cpu queues a new CDB in the ring
> In that - of course very rare condition - userspace will not wake up for the freshly queued CDB.
> 
> I think, first clearing the bit, then doing the uio_event_notify would work (without need to take the big tcmu mutex).

You're right. Will fix. I have the same issue in iblock, and there
I made a mistake where it's per cpu when it should be per task.


* Re: [PATCH 03/13] target: add workqueue based cmd submission
  2021-02-09 18:43     ` Mike Christie
@ 2021-02-09 19:10       ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-09 19:10 UTC (permalink / raw)
  To: Bodo Stroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha, virtualization

On 2/9/21 12:43 PM, Mike Christie wrote:
> On 2/9/21 9:48 AM, Bodo Stroesser wrote:
>> On 09.02.21 13:38, Mike Christie wrote:
>>>   +void target_queued_submit_work(struct work_struct *work)
>>> +{
>>> +    struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
>>> +    struct se_cmd *se_cmd, *next_cmd;
>>> +    struct llist_node *cmd_list;
>>> +    unsigned char *cdb;
>>> +
>>> +    cmd_list = llist_del_all(&sq->cmd_list);
>>> +    if (!cmd_list)
>>> +        /* Previous call took what we were queued to submit */
>>> +        return;
>>> +
>>> +    cmd_list = llist_reverse_order(cmd_list);
>>> +    llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
>>> +        cdb = se_cmd->se_tfo->get_cdb(se_cmd);
>>
>> If I got it right, get_cdb is a new, optional callback.
>> So, should we check, whether it is set?
> 
> I think a check is not really useful. The caller has to implement it
> if they call target_queue_cmd_submit. It would be a BUG() and so
> either way you would get a crash when you are developing a patch.

I'll actually add the BUG. It looks like it's preferred in the lio code.


* Re: [PATCH 01/13] target: move t_task_cdb initialization
  2021-02-09 12:38 ` [PATCH 01/13] target: move t_task_cdb initialization Mike Christie
@ 2021-02-10  8:35     ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-02-10  8:35 UTC (permalink / raw)
  To: Mike Christie
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On Tue, Feb 09, 2021 at 06:38:33AM -0600, Mike Christie wrote:
> The next patch splits target_submit_cmd_map_sgls so the initialization
> and submission part can be called at different times. If the init part
> fails we can reference the t_task_cdb early in some of the logging
> and tracing code. This moves it to transport_init_se_cmd so we don't
> hit NULL pointer crashes.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 02/13] target: split target_submit_cmd_map_sgls
  2021-02-09 12:38 ` [PATCH 02/13] target: split target_submit_cmd_map_sgls Mike Christie
@ 2021-02-10  8:36     ` Christoph Hellwig
  2021-02-10  8:36     ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-02-10  8:36 UTC (permalink / raw)
  To: Mike Christie
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

Can you just kill off target_submit_cmd_map_sgls entirely and just
open code the two calls in the three callers?



* Re: [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds
  2021-02-09 12:38 ` [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds Mike Christie
@ 2021-02-10  8:37     ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-02-10  8:37 UTC (permalink / raw)
  To: Mike Christie
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On Tue, Feb 09, 2021 at 06:38:37AM -0600, Mike Christie wrote:
> This just has tcm loop use the block layer cmd allocator for se_cmds
> instead of using the tcm_loop_cmd_cache. In future patches when we
> can use the host tags for internal requests like TMFs we can completely
> kill the tcm_loop_cmd_cache.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 07/13] target: cleanup cmd flag bits
  2021-02-09 12:38 ` [PATCH 07/13] target: cleanup cmd flag bits Mike Christie
@ 2021-02-10  8:38     ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-02-10  8:38 UTC (permalink / raw)
  To: Mike Christie
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On Tue, Feb 09, 2021 at 06:38:39AM -0600, Mike Christie wrote:
> We have a couple holes in the cmd flags definitions. This cleans
> up the definitions to fix that and make it easier to read.
> 
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH 11/13] target: replace work per cmd in completion path
  2021-02-09 12:38 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
@ 2021-02-10  8:42     ` Christoph Hellwig
  2021-02-10  8:42     ` Christoph Hellwig
  1 sibling, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-02-10  8:42 UTC (permalink / raw)
  To: Mike Christie
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On Tue, Feb 09, 2021 at 06:38:43AM -0600, Mike Christie wrote:
> Doing a work per cmd can lead to lots of threads being created.
> This patch just replaces the completion work per cmd with a per cpu
> list. Combined with the first patches this allows tcm loop on top of
> initiators like iser to go from around 700K IOPs to 1000K and reduces
> the number of threads that get created when the system is under heavy
> load and hitting the initiator drivers' tagging limits.

OTOH it does increase completion latency, which might be the preference
for some workloads.  Do we need a tunable here?

> +static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
> +				  int cpu, struct workqueue_struct *wq)
>  {
> -	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
> +	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
> +	queue_work_on(cpu, wq, &q->work);
> +}

Do we need this helper at all?  Having it open coded in the two callers
would seem easier to follow to me.




* Re: [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion
  2021-02-09 12:38 ` [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion Mike Christie
@ 2021-02-10  8:44     ` Christoph Hellwig
  0 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2021-02-10  8:44 UTC (permalink / raw)
  To: Mike Christie
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

>  	struct se_device *se_dev = se_cmd->se_dev;
> -	int cpu = se_cmd->cpuid;
> +	int cpu;
> +
> +	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
> +		cpu = smp_processor_id();
> +	else
> +		cpu = se_cmd->cpuid;
>  
>  	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
>  			      target_completion_wq);

Can't we just use se_cmd->cpuid == WORK_CPU_UNBOUND as the no affinity
indicator, which would remove all branches here.



* Re: [PATCH 13/13] target: flush submission work during TMR processing
  2021-02-09 14:31   ` Laurence Oberman
@ 2021-02-10 14:25     ` Laurence Oberman
  0 siblings, 0 replies; 50+ messages in thread
From: Laurence Oberman @ 2021-02-10 14:25 UTC (permalink / raw)
  To: Mike Christie, Chaitanya.Kulkarni, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On Tue, 2021-02-09 at 09:31 -0500, Laurence Oberman wrote:
> On Tue, 2021-02-09 at 06:38 -0600, Mike Christie wrote:
> > If a cmd is on the submission workqueue then the TMR code will
> > miss it, and end up returning task not found or success for
> > lun resets. The fabric driver might then tell the initiator that
> > the running cmds have been handled when they are about to run.
> > 
> > This adds a cancel when we are processing TMRs.
> > 
> > Signed-off-by: Mike Christie <michael.christie@oracle.com>
> > ---
> >  drivers/target/target_core_tmr.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/target/target_core_tmr.c
> > b/drivers/target/target_core_tmr.c
> > index 7347285471fa..9b7f159f9341 100644
> > --- a/drivers/target/target_core_tmr.c
> > +++ b/drivers/target/target_core_tmr.c
> > @@ -124,6 +124,8 @@ void core_tmr_abort_task(
> >  	int i;
> >  
> >  	for (i = 0; i < dev->queue_cnt; i++) {
> > +		cancel_work_sync(&dev->queues[i].sq.work);
> > +
> >  		spin_lock_irqsave(&dev->queues[i].lock, flags);
> >  		list_for_each_entry_safe(se_cmd, next, &dev-
> > > queues[i].state_list,
> > 
> >  					 state_list) {
> > @@ -302,6 +304,8 @@ static void core_tmr_drain_state_list(
> >  	 * in the Control Mode Page.
> >  	 */
> >  	for (i = 0; i < dev->queue_cnt; i++) {
> > +		cancel_work_sync(&dev->queues[i].sq.work);
> > +
> >  		spin_lock_irqsave(&dev->queues[i].lock, flags);
> >  		list_for_each_entry_safe(cmd, next, &dev-
> > > queues[i].state_list,
> > 
> >  	
> 
> 
> > 				 state_list) {
> 
> Hello Mike
> Thanks for these
> This one in particular is the one that I think will help our case. I
> will pull all of these and test later this week as a bundle.
> 
> Many Thanks
> Laurence Oberman
> 

I pulled v2 and built a test kernel to start the testing, but I see
Christoph has suggested further changes against your v3 submission.

I will wait until we are finalized and then do a full test.

Thanks
Laurence


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/13] target: replace work per cmd in completion path
  2021-02-10  8:42     ` Christoph Hellwig
  (?)
@ 2021-02-10 18:33     ` Mike Christie
  -1 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-10 18:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On 2/10/21 2:42 AM, Christoph Hellwig wrote:
> On Tue, Feb 09, 2021 at 06:38:43AM -0600, Mike Christie wrote:
>> Doing a work per cmd can lead to lots of threads being created.
>> This patch just replaces the completion work per cmd with a per cpu
>> list. Combined with the first patches this allows tcm loop on top of
>> initiators like iser to go from around 700K IOPs to 1000K and reduces
>> the number of threads that get created when the system is under heavy
>> load and hitting the initiator drivers tagging limits.
> 
> OTOH it does increase completion latency, which might be the preference
> for some workloads.  Do we need a tunable here?
> 

I didn't see an increase with higher perf setups like 100G (really 60
something) iser/srp and even using ram disks. I think it's because either
way we run one cmd on each cpu at a time, or if we do kick off cmds on
different threads they still run on the same cpu, so overall latency
didn't change.

1. Without my patch each cmd has a work struct. target_complete_cmd does
queue_work_on, which puts the cmd's work on the worker pool for that cpu.
With faster drivers, workqueue.c effectively loops over the works/cmds:
one work runs on the cpu, completes, then the next runs.

If we get to the point where the workqueue code runs cmds in different threads
on the same cpu, they end up sharing the cpu, so latency didn't change
(I didn't debug this very much).

I can't tell how much the rescuer thread comes into this though. Maybe it
could come in and run cmds on a different cpu, and that would help.

2. With the patch, we add the cmd to the device's per cpu list, then do
queue_work_on on the list's per cpu work. queue_work_on then either hits
the pending check or we run like #1 but on a per device basis instead of
per cmd.

However,

1. While checking this I noticed that commands which set
transport_complete_callback still need the per-cmd work, since those
callbacks can be slow. I'll fix this in the next posting.

2. For the latency issue, did you want the cmd to be able to run on any cpu?
I think this won't happen now under normal cases, because the workqueue does
not set WQ_UNBOUND. Did we want a tunable for that?



>> +static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
>> +				  int cpu, struct workqueue_struct *wq)
>>  {
>> -	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
>> +	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
>> +	queue_work_on(cpu, wq, &q->work);
>> +}
> 
> Do we need this helper at all?  Having it open coded in the two callers
> would seem easier to follow to me.

I can do that.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion
  2021-02-10  8:44     ` Christoph Hellwig
  (?)
@ 2021-02-10 18:43     ` Mike Christie
  2021-02-10 18:45       ` Mike Christie
  -1 siblings, 1 reply; 50+ messages in thread
From: Mike Christie @ 2021-02-10 18:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On 2/10/21 2:44 AM, Christoph Hellwig wrote:
>>  	struct se_device *se_dev = se_cmd->se_dev;
>> -	int cpu = se_cmd->cpuid;
>> +	int cpu;
>> +
>> +	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
>> +		cpu = smp_processor_id();
>> +	else
>> +		cpu = se_cmd->cpuid;
>>  
>>  	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
>>  			      target_completion_wq);
> 
> Can't we just use se_cmd->cpuid == WORK_CPU_UNBOUND as the no affinity
> indicator, which would remove all branches here.

We can't right now because the workqueue_struct does
not have the WQ_UNBOUND bit set. __queue_work will then do:

        /* pwq which will be used unless @work is executing elsewhere */
        if (wq->flags & WQ_UNBOUND) {
                if (req_cpu == WORK_CPU_UNBOUND)
                        cpu = wq_select_unbound_cpu(raw_smp_processor_id());
                pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
        } else {
                if (req_cpu == WORK_CPU_UNBOUND)
                        cpu = raw_smp_processor_id();
                pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
        }

so even if you pass in WORK_CPU_UNBOUND, it will do raw_smp_processor_id
and add the work to the cpu's worker pool.

I think if I add in a new tunable to make the workqueue bound or unbound like I
mentioned in the other thread then I think it will do what you want for both
review comments.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion
  2021-02-10 18:43     ` Mike Christie
@ 2021-02-10 18:45       ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-10 18:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chaitanya.Kulkarni, loberman, martin.petersen, linux-scsi,
	target-devel, mst, stefanha, virtualization

On 2/10/21 12:43 PM, Mike Christie wrote:
> On 2/10/21 2:44 AM, Christoph Hellwig wrote:
>>>  	struct se_device *se_dev = se_cmd->se_dev;
>>> -	int cpu = se_cmd->cpuid;
>>> +	int cpu;
>>> +
>>> +	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
>>> +		cpu = smp_processor_id();
>>> +	else
>>> +		cpu = se_cmd->cpuid;
>>>  
>>>  	target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
>>>  			      target_completion_wq);
>>
>> Can't we just use se_cmd->cpuid == WORK_CPU_UNBOUND as the no affinity
>> indicator, which would remove all branches here.
> 
> We can't right now because the workqueue_struct does
> not have the WQ_UNBOUND bit set. __queue_work will then do:
> 
>         /* pwq which will be used unless @work is executing elsewhere */
>         if (wq->flags & WQ_UNBOUND) {
>                 if (req_cpu == WORK_CPU_UNBOUND)
>                         cpu = wq_select_unbound_cpu(raw_smp_processor_id());
>                 pwq = unbound_pwq_by_node(wq, cpu_to_node(cpu));
>         } else {
>                 if (req_cpu == WORK_CPU_UNBOUND)
>                         cpu = raw_smp_processor_id();
>                 pwq = per_cpu_ptr(wq->cpu_pwqs, cpu);
>         }
> 
> so even if you pass in WORK_CPU_UNBOUND, it will do raw_smp_processor_id
> and add the work to the cpu's worker pool.

Nevermind. What I wrote is exactly what you asked for :) I can do that.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 10/13] target_core_user: add backend plug/unplug callouts
  2021-02-10  4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
@ 2021-02-10  4:55 ` Mike Christie
  0 siblings, 0 replies; 50+ messages in thread
From: Mike Christie @ 2021-02-10  4:55 UTC (permalink / raw)
  To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
	linux-scsi, target-devel, mst, stefanha
  Cc: Mike Christie

This patch adds plug/unplug callouts for tcmu, so we can reduce the
number of times we switch to userspace. Using this driver with tcm
loop is a common config, and depending on the nr_hw_queues
(nr_hw_queues=1 performs much better) and fio numjobs (lower values,
around 4) this patch can increase IOPs by only around 5-10%, because
we hit other issues like the big per-tcmu-device mutex.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_user.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index a5991df23581..a433bda56b89 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -111,6 +111,7 @@ struct tcmu_dev {
 	struct kref kref;
 
 	struct se_device se_dev;
+	struct se_dev_plug se_plug;
 
 	char *name;
 	struct se_hba *hba;
@@ -119,6 +120,7 @@ struct tcmu_dev {
 #define TCMU_DEV_BIT_BROKEN 1
 #define TCMU_DEV_BIT_BLOCKED 2
 #define TCMU_DEV_BIT_TMR_NOTIFY 3
+#define TCM_DEV_BIT_PLUGGED 4
 	unsigned long flags;
 
 	struct uio_info uio_info;
@@ -959,6 +961,25 @@ static uint32_t ring_insert_padding(struct tcmu_dev *udev, size_t cmd_size)
 	return cmd_head;
 }
 
+static void tcmu_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct se_device *se_dev = se_plug->se_dev;
+	struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+	clear_bit(TCM_DEV_BIT_PLUGGED, &udev->flags);
+	uio_event_notify(&udev->uio_info);
+}
+
+static struct se_dev_plug *tcmu_plug_device(struct se_device *se_dev)
+{
+	struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+	if (!test_and_set_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+		return &udev->se_plug;
+
+	return NULL;
+}
+
 /**
  * queue_cmd_ring - queue cmd to ring or internally
  * @tcmu_cmd: cmd to queue
@@ -1086,8 +1107,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
 
 	list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
 
-	/* TODO: only if FLUSH and FUA? */
-	uio_event_notify(&udev->uio_info);
+	if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+		uio_event_notify(&udev->uio_info);
 
 	return 0;
 
@@ -2840,6 +2861,8 @@ static struct target_backend_ops tcmu_ops = {
 	.configure_device	= tcmu_configure_device,
 	.destroy_device		= tcmu_destroy_device,
 	.free_device		= tcmu_free_device,
+	.unplug_device		= tcmu_unplug_device,
+	.plug_device		= tcmu_plug_device,
 	.parse_cdb		= tcmu_parse_cdb,
 	.tmr_notify		= tcmu_tmr_notify,
 	.set_configfs_dev_params = tcmu_set_configfs_dev_params,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2021-02-10 18:48 UTC | newest]

Thread overview: 50+ messages
2021-02-09 12:38 [PATCH 00/12 V2] target: fix cmd plugging and completion Mike Christie
2021-02-09 12:38 ` [PATCH 01/13] target: move t_task_cdb initialization Mike Christie
2021-02-10  8:35   ` Christoph Hellwig
2021-02-10  8:35     ` Christoph Hellwig
2021-02-09 12:38 ` [PATCH 02/13] target: split target_submit_cmd_map_sgls Mike Christie
2021-02-09 16:15   ` kernel test robot
2021-02-09 16:15     ` kernel test robot
2021-02-09 16:15     ` kernel test robot
2021-02-09 18:40     ` Mike Christie
2021-02-09 18:40       ` Mike Christie
2021-02-10  8:36   ` Christoph Hellwig
2021-02-10  8:36     ` Christoph Hellwig
2021-02-09 12:38 ` [PATCH 03/13] target: add workqueue based cmd submission Mike Christie
2021-02-09 15:48   ` Bodo Stroesser
2021-02-09 18:43     ` Mike Christie
2021-02-09 19:10       ` Mike Christie
2021-02-09 12:38 ` [PATCH 04/13] vhost scsi: use lio wq cmd submission helper Mike Christie
2021-02-09 12:38 ` [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds Mike Christie
2021-02-10  8:37   ` Christoph Hellwig
2021-02-10  8:37     ` Christoph Hellwig
2021-02-09 12:38 ` [PATCH 06/13] tcm loop: use lio wq cmd submission helper Mike Christie
2021-02-09 15:59   ` Bodo Stroesser
2021-02-09 18:44     ` Mike Christie
2021-02-09 17:39   ` kernel test robot
2021-02-09 17:39     ` kernel test robot
2021-02-09 12:38 ` [PATCH 07/13] target: cleanup cmd flag bits Mike Christie
2021-02-10  8:38   ` Christoph Hellwig
2021-02-10  8:38     ` Christoph Hellwig
2021-02-09 12:38 ` [PATCH 08/13] target: fix backend plugging Mike Christie
2021-02-09 12:38 ` [PATCH 09/13] target iblock: add backend plug/unplug callouts Mike Christie
2021-02-09 12:38 ` [PATCH 10/13] target_core_user: " Mike Christie
2021-02-09 16:32   ` Bodo Stroesser
2021-02-09 18:59     ` Mike Christie
2021-02-09 12:38 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
2021-02-09 17:01   ` Bodo Stroesser
2021-02-09 18:50     ` Mike Christie
2021-02-10  8:42   ` Christoph Hellwig
2021-02-10  8:42     ` Christoph Hellwig
2021-02-10 18:33     ` Mike Christie
2021-02-09 12:38 ` [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion Mike Christie
2021-02-10  8:44   ` Christoph Hellwig
2021-02-10  8:44     ` Christoph Hellwig
2021-02-10 18:43     ` Mike Christie
2021-02-10 18:45       ` Mike Christie
2021-02-09 12:38 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
2021-02-09 14:31   ` Laurence Oberman
2021-02-10 14:25     ` Laurence Oberman
2021-02-09 17:05   ` Bodo Stroesser
2021-02-09 18:49     ` Mike Christie
2021-02-10  4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
2021-02-10  4:55 ` [PATCH 10/13] target_core_user: add backend plug/unplug callouts Mike Christie
