* [PATCH 00/11] target: fix cmd plugging and completion
@ 2021-02-04 11:35 Mike Christie
  2021-02-04 11:35 ` [PATCH 01/11] target: pass in fabric ops to session creation Mike Christie
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization

The following patches made over Martin's 5.12 branches fix two
issues:

1. target_core_iblock plugs and unplugs the queue for every
command. To address this, and an issue that vhost-scsi and loop
had been working around by adding their own workqueues, I added
a new submission workqueue to LIO. Drivers can pass cmds to it,
and we can then submit batches of cmds.

2. vhost-scsi and loop were doing one work item per cmd on the
submission side, and LIO was doing one work item per cmd on the
completion side. The cap on running works is 512 (max_active),
so we can end up using a lot of threads when submissions start
blocking because they hit the block tag limit, or when the
completion side blocks trying to send the cmd. In this patchset
I just use a cmd list per session to avoid abusing the
workqueue layer.
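The per-session cmd list idea above can be sketched in plain userspace
C (this is an illustrative analogue, not the kernel's llist API —
names like cmd_push and cmd_drain are hypothetical): producers push
onto a singly linked head, and one drain pass takes the whole list at
once and restores submission order by reversing it.

```c
#include <assert.h>
#include <stddef.h>

struct cmd {
	int tag;
	struct cmd *next;
};

/* Push one cmd (LIFO), playing the role of llist_add(). */
static void cmd_push(struct cmd **head, struct cmd *c)
{
	c->next = *head;
	*head = c;
}

/* Take the entire list in one step, like llist_del_all(). */
static struct cmd *cmd_take_all(struct cmd **head)
{
	struct cmd *list = *head;

	*head = NULL;
	return list;
}

/* Reverse to restore FIFO order, like llist_reverse_order(). */
static struct cmd *cmd_reverse(struct cmd *list)
{
	struct cmd *rev = NULL;

	while (list) {
		struct cmd *next = list->next;

		list->next = rev;
		rev = list;
		list = next;
	}
	return rev;
}

/*
 * Drain: one worker invocation handles the whole batch. Returns the
 * number of cmds drained, filling out_tags in submission order.
 */
static int cmd_drain(struct cmd **head, int *out_tags, int max)
{
	struct cmd *list = cmd_reverse(cmd_take_all(head));
	int n = 0;

	for (; list && n < max; list = list->next)
		out_tags[n++] = list->tag;
	return n;
}
```

The point is that however many cmds are queued, only one work item
runs to submit them all, instead of one work item per cmd.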

The combined patchset fixes a major perf issue we've been hitting
where IOPS are stuck at 230K when running:

    fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
    --ioengine=libaio --iodepth=128  --numjobs=8 --time_based
    --group_reporting --runtime=60

The patches in this set get me to 350K when using devices that
have native IOPS of around 400-500K.

Note that 5.12 has some interrupt changes that my patches
collide with. Martin's 5.12 branches had the changes so I
based my patches on that.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/11] target: pass in fabric ops to session creation
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 11:35 ` [PATCH 02/11] target: add workqueue cmd submission helper Mike Christie
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

The next patch will create a session-level submission workqueue if
the driver's fabric ops implements a new callout. This patch just
converts the target code to take the new fabric ops arg, so we can
check whether the callout is implemented and then call it from other
functions where only the se_session is available to us.
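The shape of the change can be sketched in userspace C (an
illustrative analogue, not the kernel structs — fabric_ops, session,
and session_submit are hypothetical names): the session caches a
pointer to its fabric's ops table at init, and core code that only
has the session can later test for an optional callout.

```c
#include <assert.h>
#include <stddef.h>

struct session;

struct fabric_ops {
	const char *name;
	/* optional: NULL when the fabric has no queued submission path */
	void (*submit_queued_cmd)(struct session *s);
};

struct session {
	const struct fabric_ops *tfo;	/* cached at init time */
	int queued;
};

static void session_init(struct session *s, const struct fabric_ops *tfo)
{
	s->tfo = tfo;
	s->queued = 0;
}

/* Core code: dispatch through the callout only if implemented. */
static int session_submit(struct session *s)
{
	if (s->tfo && s->tfo->submit_queued_cmd) {
		s->tfo->submit_queued_cmd(s);
		return 1;	/* handled via the queued path */
	}
	return 0;		/* fall back to direct submission */
}

/* Two demo fabrics: one implements the callout, one does not. */
static void demo_submit(struct session *s)
{
	s->queued = 1;
}

static const struct fabric_ops demo_ops = {
	.name = "demo",
	.submit_queued_cmd = demo_submit,
};

static const struct fabric_ops plain_ops = {
	.name = "plain",
};
```

Fabrics that never set the callout keep their old behavior; only
drivers that opt in get the new submission path.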

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/iscsi/iscsi_target_login.c |  2 +-
 drivers/target/target_core_transport.c    | 24 ++++++++++++++++-------
 drivers/target/target_core_xcopy.c        |  2 +-
 include/target/target_core_base.h         |  1 +
 include/target/target_core_fabric.h       |  6 ++++--
 5 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 1a9c50401bdb..ddf0c3b13671 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -317,7 +317,7 @@ static int iscsi_login_zero_tsih_s1(
 		goto free_id;
 	}
 
-	sess->se_sess = transport_alloc_session(TARGET_PROT_NORMAL);
+	sess->se_sess = transport_alloc_session(&iscsi_ops, TARGET_PROT_NORMAL);
 	if (IS_ERR(sess->se_sess)) {
 		iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
 				ISCSI_LOGIN_STATUS_NO_RESOURCES);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 93ea17cbad79..7c5d37bac561 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -220,11 +220,13 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
 
 /**
  * transport_init_session - initialize a session object
+ * @tfo: target core fabric ops
  * @se_sess: Session object pointer.
  *
  * The caller must have zero-initialized @se_sess before calling this function.
  */
-int transport_init_session(struct se_session *se_sess)
+int transport_init_session(const struct target_core_fabric_ops *tfo,
+			   struct se_session *se_sess)
 {
 	INIT_LIST_HEAD(&se_sess->sess_list);
 	INIT_LIST_HEAD(&se_sess->sess_acl_list);
@@ -232,6 +234,7 @@ int transport_init_session(struct se_session *se_sess)
 	init_waitqueue_head(&se_sess->cmd_count_wq);
 	init_completion(&se_sess->stop_done);
 	atomic_set(&se_sess->stopped, 0);
+	se_sess->tfo = tfo;
 	return percpu_ref_init(&se_sess->cmd_count,
 			       target_release_sess_cmd_refcnt, 0, GFP_KERNEL);
 }
@@ -252,9 +255,12 @@ void transport_uninit_session(struct se_session *se_sess)
 
 /**
  * transport_alloc_session - allocate a session object and initialize it
+ * @tfo: target core fabric ops
  * @sup_prot_ops: bitmask that defines which T10-PI modes are supported.
  */
-struct se_session *transport_alloc_session(enum target_prot_op sup_prot_ops)
+struct se_session *
+transport_alloc_session(const struct target_core_fabric_ops *tfo,
+			enum target_prot_op sup_prot_ops)
 {
 	struct se_session *se_sess;
 	int ret;
@@ -265,7 +271,8 @@ struct se_session *transport_alloc_session(enum target_prot_op sup_prot_ops)
 				" se_sess_cache\n");
 		return ERR_PTR(-ENOMEM);
 	}
-	ret = transport_init_session(se_sess);
+
+	ret = transport_init_session(tfo, se_sess);
 	if (ret < 0) {
 		kmem_cache_free(se_sess_cache, se_sess);
 		return ERR_PTR(ret);
@@ -311,13 +318,15 @@ EXPORT_SYMBOL(transport_alloc_session_tags);
 
 /**
  * transport_init_session_tags - allocate a session and target driver private data
+ * @tfo: target core fabric ops
  * @tag_num:  Maximum number of in-flight commands between initiator and target.
  * @tag_size: Size in bytes of the private data a target driver associates with
  *	      each command.
  * @sup_prot_ops: bitmask that defines which T10-PI modes are supported.
  */
 static struct se_session *
-transport_init_session_tags(unsigned int tag_num, unsigned int tag_size,
+transport_init_session_tags(const struct target_core_fabric_ops *tfo,
+			    unsigned int tag_num, unsigned int tag_size,
 			    enum target_prot_op sup_prot_ops)
 {
 	struct se_session *se_sess;
@@ -334,7 +343,7 @@ transport_init_session_tags(unsigned int tag_num, unsigned int tag_size,
 		return ERR_PTR(-EINVAL);
 	}
 
-	se_sess = transport_alloc_session(sup_prot_ops);
+	se_sess = transport_alloc_session(tfo, sup_prot_ops);
 	if (IS_ERR(se_sess))
 		return se_sess;
 
@@ -442,9 +451,10 @@ target_setup_session(struct se_portal_group *tpg,
 	 * of I/O descriptor tags, go ahead and perform that setup now..
 	 */
 	if (tag_num != 0)
-		sess = transport_init_session_tags(tag_num, tag_size, prot_op);
+		sess = transport_init_session_tags(tpg->se_tpg_tfo, tag_num,
+						   tag_size, prot_op);
 	else
-		sess = transport_alloc_session(prot_op);
+		sess = transport_alloc_session(tpg->se_tpg_tfo, prot_op);
 
 	if (IS_ERR(sess))
 		return sess;
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 44e15d7fb2f0..a7553712da25 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -472,7 +472,7 @@ int target_xcopy_setup_pt(void)
 	INIT_LIST_HEAD(&xcopy_pt_nacl.acl_list);
 	INIT_LIST_HEAD(&xcopy_pt_nacl.acl_sess_list);
 	memset(&xcopy_pt_sess, 0, sizeof(struct se_session));
-	ret = transport_init_session(&xcopy_pt_sess);
+	ret = transport_init_session(&xcopy_pt_tfo, &xcopy_pt_sess);
 	if (ret < 0)
 		goto destroy_wq;
 
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 54dcc0eb25fa..50103a22b0e2 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -628,6 +628,7 @@ struct se_session {
 	struct completion	stop_done;
 	void			*sess_cmd_map;
 	struct sbitmap_queue	sess_tag_pool;
+	const struct target_core_fabric_ops *tfo;
 };
 
 struct se_device;
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index d60a3eb7517a..cdf610838ba5 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -132,8 +132,10 @@ struct se_session *target_setup_session(struct se_portal_group *,
 				struct se_session *, void *));
 void target_remove_session(struct se_session *);
 
-int transport_init_session(struct se_session *se_sess);
-struct se_session *transport_alloc_session(enum target_prot_op);
+int transport_init_session(const struct target_core_fabric_ops *tfo,
+			   struct se_session *se_sess);
+struct se_session *transport_alloc_session(const struct target_core_fabric_ops *tfo,
+					   enum target_prot_op);
 int transport_alloc_session_tags(struct se_session *, unsigned int,
 		unsigned int);
 void	__transport_register_session(struct se_portal_group *,
-- 
2.25.1



* [PATCH 02/11] target: add workqueue cmd submission helper
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
  2021-02-04 11:35 ` [PATCH 01/11] target: pass in fabric ops to session creation Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 23:13   ` Chaitanya Kulkarni
  2021-02-04 11:35 ` [PATCH 03/11] tcm loop: use blk cmd allocator for se_cmds Mike Christie
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

loop and vhost-scsi do their target cmd submission from driver
workqueues. This allows them to avoid an issue where the backend may
block waiting for resources like tags/requests, memory, or locks,
which would end up blocking their entire submission path, and in the
case of vhost-scsi both the submission and completion paths.

This patch adds a helper these drivers can use to submit from the
lio workqueue. This code will then be extended in the next patches
to fix the plugging of backend devices.
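The patch below allocates one queue per possible CPU and queues the
work on the submitter's CPU; teardown flushes every queue. A minimal
userspace analogue of that indexing and flush logic (hypothetical
names, a fixed CPU count instead of nr_cpu_ids):

```c
#include <assert.h>

#define NR_CPUS_DEMO 4

struct sub_queue {
	int pending;	/* stand-in for the llist of queued cmds */
};

struct demo_sess {
	struct sub_queue sq[NR_CPUS_DEMO];
	int flushed;
};

/* Queue a cmd on the submitting CPU's queue, like queue_work_on(). */
static void queue_on(struct demo_sess *s, int cpu)
{
	s->sq[cpu % NR_CPUS_DEMO].pending++;
}

/*
 * Flush every per-CPU queue, as target_flush_queued_cmds() does with
 * cancel_work_sync() per queue. Returns the total drained.
 */
static int flush_all(struct demo_sess *s)
{
	int total = 0;
	int i;

	for (i = 0; i < NR_CPUS_DEMO; i++) {
		total += s->sq[i].pending;
		s->sq[i].pending = 0;
	}
	s->flushed = 1;
	return total;
}
```

Queuing on the local CPU keeps the submission work on the same CPU as
the caller, while the per-CPU split avoids cross-CPU contention on a
single shared list.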

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 102 ++++++++++++++++++++++++-
 include/target/target_core_base.h      |  10 ++-
 include/target/target_core_fabric.h    |   3 +
 3 files changed, 111 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 7c5d37bac561..dec89e911348 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -41,6 +41,7 @@
 #include <trace/events/target.h>
 
 static struct workqueue_struct *target_completion_wq;
+static struct workqueue_struct *target_submission_wq;
 static struct kmem_cache *se_sess_cache;
 struct kmem_cache *se_ua_cache;
 struct kmem_cache *t10_pr_reg_cache;
@@ -129,8 +130,15 @@ int init_se_kmem_caches(void)
 	if (!target_completion_wq)
 		goto out_free_lba_map_mem_cache;
 
+	target_submission_wq = alloc_workqueue("target_submission",
+					       WQ_MEM_RECLAIM, 0);
+	if (!target_submission_wq)
+		goto out_free_completion_wq;
+
 	return 0;
 
+out_free_completion_wq:
+	destroy_workqueue(target_completion_wq);
 out_free_lba_map_mem_cache:
 	kmem_cache_destroy(t10_alua_lba_map_mem_cache);
 out_free_lba_map_cache:
@@ -153,6 +161,7 @@ int init_se_kmem_caches(void)
 
 void release_se_kmem_caches(void)
 {
+	destroy_workqueue(target_submission_wq);
 	destroy_workqueue(target_completion_wq);
 	kmem_cache_destroy(se_sess_cache);
 	kmem_cache_destroy(se_ua_cache);
@@ -218,6 +227,69 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
 	wake_up(&sess->cmd_count_wq);
 }
 
+static void target_queued_submit_work(struct work_struct *work)
+{
+	struct se_sess_cmd_queue *sq =
+				container_of(work, struct se_sess_cmd_queue,
+					     work);
+	struct se_session *se_sess = sq->se_sess;
+	struct se_cmd *se_cmd, *next_cmd;
+	struct llist_node *cmd_list;
+
+	cmd_list = llist_del_all(&sq->cmd_list);
+	if (!cmd_list)
+		/* Previous call took what we were queued to submit */
+		return;
+
+	cmd_list = llist_reverse_order(cmd_list);
+	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
+		se_sess->tfo->submit_queued_cmd(se_cmd);
+}
+
+static void target_queue_cmd_work(struct se_sess_cmd_queue *q,
+				  struct se_cmd *se_cmd, int cpu)
+{
+	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
+	queue_work_on(cpu, target_submission_wq, &q->work);
+}
+
+/**
+ * target_queue_cmd_submit - queue a se_cmd to be executed from the lio wq
+ * @se_sess: cmd's session
+ * @se_cmd: cmd to queue
+ */
+void target_queue_cmd_submit(struct se_session *se_sess, struct se_cmd *se_cmd)
+{
+	int cpu = smp_processor_id();
+
+	target_queue_cmd_work(&se_sess->sq[cpu], se_cmd, cpu);
+}
+EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
+
+static void target_flush_queued_cmds(struct se_session *se_sess)
+{
+	int i;
+
+	if (!se_sess->sq)
+		return;
+
+	for (i = 0; i < se_sess->q_cnt; i++)
+		cancel_work_sync(&se_sess->sq[i].work);
+}
+
+static void target_init_sess_cmd_queues(struct se_session *se_sess,
+					struct se_sess_cmd_queue *q,
+					void (*work_fn)(struct work_struct *work))
+{
+	int i;
+
+	for (i = 0; i < se_sess->q_cnt; i++) {
+		init_llist_head(&q[i].cmd_list);
+		INIT_WORK(&q[i].work, work_fn);
+		q[i].se_sess = se_sess;
+	}
+}
+
 /**
  * transport_init_session - initialize a session object
  * @tfo: target core fabric ops
@@ -228,6 +300,8 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
 int transport_init_session(const struct target_core_fabric_ops *tfo,
 			   struct se_session *se_sess)
 {
+	int rc;
+
 	INIT_LIST_HEAD(&se_sess->sess_list);
 	INIT_LIST_HEAD(&se_sess->sess_acl_list);
 	spin_lock_init(&se_sess->sess_cmd_lock);
@@ -235,13 +309,34 @@ int transport_init_session(const struct target_core_fabric_ops *tfo,
 	init_completion(&se_sess->stop_done);
 	atomic_set(&se_sess->stopped, 0);
 	se_sess->tfo = tfo;
-	return percpu_ref_init(&se_sess->cmd_count,
-			       target_release_sess_cmd_refcnt, 0, GFP_KERNEL);
+
+	if (tfo->submit_queued_cmd) {
+		se_sess->sq = kcalloc(nr_cpu_ids, sizeof(*se_sess->sq),
+				      GFP_KERNEL);
+		if (!se_sess->sq)
+			return -ENOMEM;
+
+		se_sess->q_cnt = nr_cpu_ids;
+		target_init_sess_cmd_queues(se_sess, se_sess->sq,
+					    target_queued_submit_work);
+	}
+
+	rc = percpu_ref_init(&se_sess->cmd_count,
+			     target_release_sess_cmd_refcnt, 0, GFP_KERNEL);
+	if (rc)
+		goto free_sq;
+
+	return 0;
+
+free_sq:
+	kfree(se_sess->sq);
+	return rc;
 }
 EXPORT_SYMBOL(transport_init_session);
 
 void transport_uninit_session(struct se_session *se_sess)
 {
+	kfree(se_sess->sq);
 	/*
 	 * Drivers like iscsi and loop do not call target_stop_session
 	 * during session shutdown so we have to drop the ref taken at init
@@ -1385,7 +1480,6 @@ void transport_init_se_cmd(
 {
 	INIT_LIST_HEAD(&cmd->se_delayed_node);
 	INIT_LIST_HEAD(&cmd->se_qf_node);
-	INIT_LIST_HEAD(&cmd->se_cmd_list);
 	INIT_LIST_HEAD(&cmd->state_list);
 	init_completion(&cmd->t_transport_stop_comp);
 	cmd->free_compl = NULL;
@@ -2968,6 +3062,8 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
 {
 	int ret;
 
+	target_flush_queued_cmds(se_sess);
+
 	WARN_ON_ONCE(!atomic_read(&se_sess->stopped));
 
 	do {
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 50103a22b0e2..97138bff14d1 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -488,7 +488,7 @@ struct se_cmd {
 	/* Only used for internal passthrough and legacy TCM fabric modules */
 	struct se_session	*se_sess;
 	struct se_tmr_req	*se_tmr_req;
-	struct list_head	se_cmd_list;
+	struct llist_node	se_cmd_list;
 	struct completion	*free_compl;
 	struct completion	*abrt_compl;
 	const struct target_core_fabric_ops *se_tfo;
@@ -612,6 +612,12 @@ static inline struct se_node_acl *fabric_stat_to_nacl(struct config_item *item)
 			acl_fabric_stat_group);
 }
 
+struct se_sess_cmd_queue {
+	struct llist_head	cmd_list;
+	struct work_struct	work;
+	struct se_session	*se_sess;
+};
+
 struct se_session {
 	atomic_t		stopped;
 	u64			sess_bin_isid;
@@ -629,6 +635,8 @@ struct se_session {
 	void			*sess_cmd_map;
 	struct sbitmap_queue	sess_tag_pool;
 	const struct target_core_fabric_ops *tfo;
+	struct se_sess_cmd_queue *sq;
+	int			q_cnt;
 };
 
 struct se_device;
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index cdf610838ba5..899948967a65 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -80,6 +80,7 @@ struct target_core_fabric_ops {
 	int (*queue_status)(struct se_cmd *);
 	void (*queue_tm_rsp)(struct se_cmd *);
 	void (*aborted_task)(struct se_cmd *);
+	void (*submit_queued_cmd)(struct se_cmd *);
 	/*
 	 * fabric module calls for target_core_fabric_configfs.c
 	 */
@@ -166,6 +167,8 @@ int	target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess,
 		unsigned char *sense, u64 unpacked_lun,
 		void *fabric_tmr_ptr, unsigned char tm_type,
 		gfp_t, u64, int);
+void	target_queue_cmd_submit(struct se_session *se_sess,
+				struct se_cmd *se_cmd);
 int	transport_handle_cdb_direct(struct se_cmd *);
 sense_reason_t	transport_generic_new_cmd(struct se_cmd *);
 
-- 
2.25.1



* [PATCH 03/11] tcm loop: use blk cmd allocator for se_cmds
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
  2021-02-04 11:35 ` [PATCH 01/11] target: pass in fabric ops to session creation Mike Christie
  2021-02-04 11:35 ` [PATCH 02/11] target: add workqueue cmd submission helper Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 11:35 ` [PATCH 04/11] tcm loop: use lio wq cmd submission helper Mike Christie
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

This just has tcm loop use the block layer cmd allocator for se_cmds
instead of using the tcm_loop_cmd_cache. In future patches, when we
can use the host tags for internal requests like TMFs, we can
completely kill the tcm_loop_cmd_cache.
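The idea behind the .cmd_size conversion can be sketched in userspace
C (illustrative only — generic_cmd, cmd_priv, and alloc_cmd are
hypothetical stand-ins for the SCSI midlayer's allocator and
scsi_cmd_priv()): the midlayer allocates one buffer big enough for
the generic command plus the driver's private area, so the driver no
longer needs its own per-command cache.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct generic_cmd {
	int tag;
	size_t priv_size;
	/* driver private area follows this struct in the same buffer */
};

/* Return the driver's private area, like scsi_cmd_priv(). */
static void *cmd_priv(struct generic_cmd *cmd)
{
	return cmd + 1;
}

/*
 * Midlayer-side allocation: one calloc covers the generic cmd and
 * cmd_size bytes of driver data, already zeroed.
 */
static struct generic_cmd *alloc_cmd(int tag, size_t cmd_size)
{
	struct generic_cmd *cmd = calloc(1, sizeof(*cmd) + cmd_size);

	if (!cmd)
		return NULL;
	cmd->tag = tag;
	cmd->priv_size = cmd_size;
	return cmd;
}
```

One allocation per command replaces the slab alloc/free pair the
driver was doing in its queuecommand path.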

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/loopback/tcm_loop.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index badba437e5f9..274826a2b0bd 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -67,8 +67,12 @@ static void tcm_loop_release_cmd(struct se_cmd *se_cmd)
 {
 	struct tcm_loop_cmd *tl_cmd = container_of(se_cmd,
 				struct tcm_loop_cmd, tl_se_cmd);
+	struct scsi_cmnd *sc = tl_cmd->sc;
 
-	kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
+	if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
+		kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
+	else
+		sc->scsi_done(sc);
 }
 
 static int tcm_loop_show_info(struct seq_file *m, struct Scsi_Host *host)
@@ -165,7 +169,6 @@ static void tcm_loop_submission_work(struct work_struct *work)
 	return;
 
 out_done:
-	kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
 	sc->scsi_done(sc);
 }
 
@@ -175,20 +178,14 @@ static void tcm_loop_submission_work(struct work_struct *work)
  */
 static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 {
-	struct tcm_loop_cmd *tl_cmd;
+	struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
 
 	pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
 		 __func__, sc->device->host->host_no, sc->device->id,
 		 sc->device->channel, sc->device->lun, sc->cmnd[0],
 		 scsi_bufflen(sc));
 
-	tl_cmd = kmem_cache_zalloc(tcm_loop_cmd_cache, GFP_ATOMIC);
-	if (!tl_cmd) {
-		set_host_byte(sc, DID_ERROR);
-		sc->scsi_done(sc);
-		return 0;
-	}
-
+	memset(tl_cmd, 0, sizeof(*tl_cmd));
 	tl_cmd->sc = sc;
 	tl_cmd->sc_cmd_tag = sc->request->tag;
 	INIT_WORK(&tl_cmd->work, tcm_loop_submission_work);
@@ -320,6 +317,7 @@ static struct scsi_host_template tcm_loop_driver_template = {
 	.dma_boundary		= PAGE_SIZE - 1,
 	.module			= THIS_MODULE,
 	.track_queue_depth	= 1,
+	.cmd_size		= sizeof(struct tcm_loop_cmd),
 };
 
 static int tcm_loop_driver_probe(struct device *dev)
@@ -580,7 +578,6 @@ static int tcm_loop_queue_data_or_status(const char *func,
 	if ((se_cmd->se_cmd_flags & SCF_OVERFLOW_BIT) ||
 	    (se_cmd->se_cmd_flags & SCF_UNDERFLOW_BIT))
 		scsi_set_resid(sc, se_cmd->residual_count);
-	sc->scsi_done(sc);
 	return 0;
 }
 
-- 
2.25.1



* [PATCH 04/11] tcm loop: use lio wq cmd submission helper
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (2 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 03/11] tcm loop: use blk cmd allocator for se_cmds Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 11:35 ` [PATCH 05/11] vhost scsi: " Mike Christie
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

Convert loop to use the lio wq cmd submission helper.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/loopback/tcm_loop.c | 25 +++++++++++--------------
 drivers/target/loopback/tcm_loop.h |  1 -
 2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 274826a2b0bd..8dc45165d33b 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -39,7 +39,6 @@
 
 #define to_tcm_loop_hba(hba)	container_of(hba, struct tcm_loop_hba, dev)
 
-static struct workqueue_struct *tcm_loop_workqueue;
 static struct kmem_cache *tcm_loop_cmd_cache;
 
 static int tcm_loop_hba_no_cnt;
@@ -106,11 +105,10 @@ static struct device_driver tcm_loop_driverfs = {
  */
 static struct device *tcm_loop_primary;
 
-static void tcm_loop_submission_work(struct work_struct *work)
+static void tcm_loop_submit_queued_cmd(struct se_cmd *se_cmd)
 {
 	struct tcm_loop_cmd *tl_cmd =
-		container_of(work, struct tcm_loop_cmd, work);
-	struct se_cmd *se_cmd = &tl_cmd->tl_se_cmd;
+		container_of(se_cmd, struct tcm_loop_cmd, tl_se_cmd);
 	struct scsi_cmnd *sc = tl_cmd->sc;
 	struct tcm_loop_nexus *tl_nexus;
 	struct tcm_loop_hba *tl_hba;
@@ -179,6 +177,11 @@ static void tcm_loop_submission_work(struct work_struct *work)
 static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 {
 	struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
+	struct tcm_loop_hba *tl_hba;
+	struct tcm_loop_tpg *tl_tpg;
+
+	tl_hba = *(struct tcm_loop_hba **)shost_priv(sc->device->host);
+	tl_tpg = &tl_hba->tl_hba_tpgs[sc->device->id];
 
 	pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
 		 __func__, sc->device->host->host_no, sc->device->id,
@@ -188,8 +191,8 @@ static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
 	memset(tl_cmd, 0, sizeof(*tl_cmd));
 	tl_cmd->sc = sc;
 	tl_cmd->sc_cmd_tag = sc->request->tag;
-	INIT_WORK(&tl_cmd->work, tcm_loop_submission_work);
-	queue_work(tcm_loop_workqueue, &tl_cmd->work);
+
+	target_queue_cmd_submit(tl_tpg->tl_nexus->se_sess, &tl_cmd->tl_se_cmd);
 	return 0;
 }
 
@@ -1146,6 +1149,7 @@ static const struct target_core_fabric_ops loop_ops = {
 	.queue_status			= tcm_loop_queue_status,
 	.queue_tm_rsp			= tcm_loop_queue_tm_rsp,
 	.aborted_task			= tcm_loop_aborted_task,
+	.submit_queued_cmd		= tcm_loop_submit_queued_cmd,
 	.fabric_make_wwn		= tcm_loop_make_scsi_hba,
 	.fabric_drop_wwn		= tcm_loop_drop_scsi_hba,
 	.fabric_make_tpg		= tcm_loop_make_naa_tpg,
@@ -1161,17 +1165,13 @@ static int __init tcm_loop_fabric_init(void)
 {
 	int ret = -ENOMEM;
 
-	tcm_loop_workqueue = alloc_workqueue("tcm_loop", 0, 0);
-	if (!tcm_loop_workqueue)
-		goto out;
-
 	tcm_loop_cmd_cache = kmem_cache_create("tcm_loop_cmd_cache",
 				sizeof(struct tcm_loop_cmd),
 				__alignof__(struct tcm_loop_cmd),
 				0, NULL);
 	if (!tcm_loop_cmd_cache) {
 		pr_debug("kmem_cache_create() for tcm_loop_cmd_cache failed\n");
-		goto out_destroy_workqueue;
+		goto out;
 	}
 
 	ret = tcm_loop_alloc_core_bus();
@@ -1188,8 +1188,6 @@ static int __init tcm_loop_fabric_init(void)
 	tcm_loop_release_core_bus();
 out_destroy_cache:
 	kmem_cache_destroy(tcm_loop_cmd_cache);
-out_destroy_workqueue:
-	destroy_workqueue(tcm_loop_workqueue);
 out:
 	return ret;
 }
@@ -1199,7 +1197,6 @@ static void __exit tcm_loop_fabric_exit(void)
 	target_unregister_template(&loop_ops);
 	tcm_loop_release_core_bus();
 	kmem_cache_destroy(tcm_loop_cmd_cache);
-	destroy_workqueue(tcm_loop_workqueue);
 }
 
 MODULE_DESCRIPTION("TCM loopback virtual Linux/SCSI fabric module");
diff --git a/drivers/target/loopback/tcm_loop.h b/drivers/target/loopback/tcm_loop.h
index d3110909a213..437663b3905c 100644
--- a/drivers/target/loopback/tcm_loop.h
+++ b/drivers/target/loopback/tcm_loop.h
@@ -16,7 +16,6 @@ struct tcm_loop_cmd {
 	struct scsi_cmnd *sc;
 	/* The TCM I/O descriptor that is accessed via container_of() */
 	struct se_cmd tl_se_cmd;
-	struct work_struct work;
 	struct completion tmr_done;
 	/* Sense buffer that will be mapped into outgoing status */
 	unsigned char tl_sense_buf[TRANSPORT_SENSE_BUFFER];
-- 
2.25.1



* [PATCH 05/11] vhost scsi: use lio wq cmd submission helper
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (3 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 04/11] tcm loop: use lio wq cmd submission helper Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-05 16:17     ` Michael S. Tsirkin
  2021-02-04 11:35 ` [PATCH 06/11] target: cleanup cmd flag bits Mike Christie
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

Convert vhost-scsi to use the lio wq cmd submission helper.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/vhost/scsi.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4ce9f00ae10e..aacad9e222ff 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -85,7 +85,7 @@ struct vhost_scsi_cmd {
 	/* The number of scatterlists associated with this cmd */
 	u32 tvc_sgl_count;
 	u32 tvc_prot_sgl_count;
-	/* Saved unpacked SCSI LUN for vhost_scsi_submission_work() */
+	/* Saved unpacked SCSI LUN for vhost_scsi_submit_queued_cmd() */
 	u32 tvc_lun;
 	/* Pointer to the SGL formatted memory from virtio-scsi */
 	struct scatterlist *tvc_sgl;
@@ -101,8 +101,6 @@ struct vhost_scsi_cmd {
 	struct vhost_scsi_nexus *tvc_nexus;
 	/* The TCM I/O descriptor that is accessed via container_of() */
 	struct se_cmd tvc_se_cmd;
-	/* work item used for cmwq dispatch to vhost_scsi_submission_work() */
-	struct work_struct work;
 	/* Copy of the incoming SCSI command descriptor block (CDB) */
 	unsigned char tvc_cdb[VHOST_SCSI_MAX_CDB_SIZE];
 	/* Sense buffer that will be mapped into outgoing status */
@@ -240,8 +238,6 @@ struct vhost_scsi_ctx {
 	struct iov_iter out_iter;
 };
 
-static struct workqueue_struct *vhost_scsi_workqueue;
-
 /* Global spinlock to protect vhost_scsi TPG list for vhost IOCTL access */
 static DEFINE_MUTEX(vhost_scsi_mutex);
 static LIST_HEAD(vhost_scsi_list);
@@ -782,12 +778,11 @@ static int vhost_scsi_to_tcm_attr(int attr)
 	return TCM_SIMPLE_TAG;
 }
 
-static void vhost_scsi_submission_work(struct work_struct *work)
+static void vhost_scsi_submit_queued_cmd(struct se_cmd *se_cmd)
 {
 	struct vhost_scsi_cmd *cmd =
-		container_of(work, struct vhost_scsi_cmd, work);
+		container_of(se_cmd, struct vhost_scsi_cmd, tvc_se_cmd);
 	struct vhost_scsi_nexus *tv_nexus;
-	struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
 	struct scatterlist *sg_ptr, *sg_prot_ptr = NULL;
 	int rc;
 
@@ -1132,14 +1127,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
 		 */
 		cmd->tvc_vq_desc = vc.head;
-		/*
-		 * Dispatch cmd descriptor for cmwq execution in process
-		 * context provided by vhost_scsi_workqueue.  This also ensures
-		 * cmd is executed on the same kworker CPU as this vhost
-		 * thread to gain positive L2 cache locality effects.
-		 */
-		INIT_WORK(&cmd->work, vhost_scsi_submission_work);
-		queue_work(vhost_scsi_workqueue, &cmd->work);
+		target_queue_cmd_submit(tpg->tpg_nexus->tvn_se_sess,
+					&cmd->tvc_se_cmd);
 		ret = 0;
 err:
 		/*
@@ -2466,6 +2455,7 @@ static const struct target_core_fabric_ops vhost_scsi_ops = {
 	.queue_status			= vhost_scsi_queue_status,
 	.queue_tm_rsp			= vhost_scsi_queue_tm_rsp,
 	.aborted_task			= vhost_scsi_aborted_task,
+	.submit_queued_cmd		= vhost_scsi_submit_queued_cmd,
 	/*
 	 * Setup callers for generic logic in target_core_fabric_configfs.c
 	 */
@@ -2489,17 +2479,9 @@ static int __init vhost_scsi_init(void)
 		" on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
 		utsname()->machine);
 
-	/*
-	 * Use our own dedicated workqueue for submitting I/O into
-	 * target core to avoid contention within system_wq.
-	 */
-	vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0);
-	if (!vhost_scsi_workqueue)
-		goto out;
-
 	ret = vhost_scsi_register();
 	if (ret < 0)
-		goto out_destroy_workqueue;
+		goto out;
 
 	ret = target_register_template(&vhost_scsi_ops);
 	if (ret < 0)
@@ -2509,8 +2491,6 @@ static int __init vhost_scsi_init(void)
 
 out_vhost_scsi_deregister:
 	vhost_scsi_deregister();
-out_destroy_workqueue:
-	destroy_workqueue(vhost_scsi_workqueue);
 out:
 	return ret;
 };
@@ -2519,7 +2499,6 @@ static void vhost_scsi_exit(void)
 {
 	target_unregister_template(&vhost_scsi_ops);
 	vhost_scsi_deregister();
-	destroy_workqueue(vhost_scsi_workqueue);
 };
 
 MODULE_DESCRIPTION("VHOST_SCSI series fabric driver");
-- 
2.25.1



* [PATCH 06/11] target: cleanup cmd flag bits
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (4 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 05/11] vhost scsi: " Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 23:15   ` Chaitanya Kulkarni
  2021-02-04 11:35 ` [PATCH 07/11] target: fix backend plugging Mike Christie
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

We have a couple of holes in the cmd flags definitions. This cleans
up the definitions to fix that and make them easier to read.
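The shift notation the patch moves to makes gaps and duplicates
obvious at a glance, since each flag states its bit position directly.
A minimal analogue (demo names, not the kernel enum):

```c
#include <assert.h>

/* Each flag names its bit position; skipped bits stand out. */
enum demo_flags {
	DF_A = (1 << 0),
	DF_B = (1 << 1),
	DF_C = (1 << 2),
};

static int has_flag(unsigned int flags, unsigned int f)
{
	return (flags & f) != 0;
}
```

With hex constants like 0x00000080 next to 0x00000010, a reader has
to do the arithmetic to notice that bits 5 and 6 are unused; with
(1 << 5) and (1 << 7) the hole is visible in the source.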

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 include/target/target_core_base.h | 38 +++++++++++++++----------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 97138bff14d1..b7f92a15cd1c 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -127,25 +127,25 @@ enum transport_state_table {
 
 /* Used for struct se_cmd->se_cmd_flags */
 enum se_cmd_flags_table {
-	SCF_SUPPORTED_SAM_OPCODE	= 0x00000001,
-	SCF_TRANSPORT_TASK_SENSE	= 0x00000002,
-	SCF_EMULATED_TASK_SENSE		= 0x00000004,
-	SCF_SCSI_DATA_CDB		= 0x00000008,
-	SCF_SCSI_TMR_CDB		= 0x00000010,
-	SCF_FUA				= 0x00000080,
-	SCF_SE_LUN_CMD			= 0x00000100,
-	SCF_BIDI			= 0x00000400,
-	SCF_SENT_CHECK_CONDITION	= 0x00000800,
-	SCF_OVERFLOW_BIT		= 0x00001000,
-	SCF_UNDERFLOW_BIT		= 0x00002000,
-	SCF_ALUA_NON_OPTIMIZED		= 0x00008000,
-	SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC = 0x00020000,
-	SCF_COMPARE_AND_WRITE		= 0x00080000,
-	SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC = 0x00200000,
-	SCF_ACK_KREF			= 0x00400000,
-	SCF_USE_CPUID			= 0x00800000,
-	SCF_TASK_ATTR_SET		= 0x01000000,
-	SCF_TREAT_READ_AS_NORMAL	= 0x02000000,
+	SCF_SUPPORTED_SAM_OPCODE		= (1 << 0),
+	SCF_TRANSPORT_TASK_SENSE		= (1 << 1),
+	SCF_EMULATED_TASK_SENSE			= (1 << 2),
+	SCF_SCSI_DATA_CDB			= (1 << 3),
+	SCF_SCSI_TMR_CDB			= (1 << 4),
+	SCF_FUA					= (1 << 5),
+	SCF_SE_LUN_CMD				= (1 << 6),
+	SCF_BIDI				= (1 << 7),
+	SCF_SENT_CHECK_CONDITION		= (1 << 8),
+	SCF_OVERFLOW_BIT			= (1 << 9),
+	SCF_UNDERFLOW_BIT			= (1 << 10),
+	SCF_ALUA_NON_OPTIMIZED			= (1 << 11),
+	SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC	= (1 << 12),
+	SCF_COMPARE_AND_WRITE			= (1 << 13),
+	SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC	= (1 << 14),
+	SCF_ACK_KREF				= (1 << 15),
+	SCF_USE_CPUID				= (1 << 16),
+	SCF_TASK_ATTR_SET			= (1 << 17),
+	SCF_TREAT_READ_AS_NORMAL		= (1 << 18),
 };
 
 /*
-- 
2.25.1



* [PATCH 07/11] target: fix backend plugging
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (5 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 06/11] target: cleanup cmd flag bits Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 11:35 ` [PATCH 08/11] target iblock: add backend plug/unplug callouts Mike Christie
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

target_core_iblock plugs and unplugs on every command, and this causes
perf issues for drivers that prefer batched cmds. With the previous
patches we can now take multiple cmds from a fabric driver queue and
then pass them down to the backend drivers in a batch. This patch adds
that support via 2 new backend callouts for plugging and unplugging the
device. The next 2 patches add plugging support to the iblock and tcmu
backends.

Note: These patches currently only work for drivers like vhost and loop,
which can just run target_execute_cmd from their write_pending callout:
they already have all their data, and they have access to their
transport queues, so they can batch multiple cmds to lio core.
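The intended batching behavior can be reduced to a small userspace sketch (the toy_* names are made up for illustration; the lockless llist, per-cpu queues and refcounting in the real patch are elided): however many cmds the queued-submit worker pushes, the backend is plugged at most once per batch and unplugged once when the batch is drained.

```c
#include <assert.h>

/* Toy backend: counts how often it is plugged/unplugged. */
struct toy_dev {
	int plugged;
	int plug_calls;
	int unplug_calls;
	int submitted;
};

/* Mirrors target_plug_device(): a no-op if the device is already plugged. */
static void toy_plug(struct toy_dev *dev)
{
	if (dev->plugged)
		return;
	dev->plugged = 1;
	dev->plug_calls++;
}

/* Mirrors target_unplug_device(). */
static void toy_unplug(struct toy_dev *dev)
{
	dev->plugged = 0;
	dev->unplug_calls++;
}

/* Mirrors target_queued_submit_work(): submit a batch, then unplug once. */
static void toy_submit_batch(struct toy_dev *dev, int ncmds)
{
	for (int i = 0; i < ncmds; i++) {
		toy_plug(dev);		/* per-cmd plug attempt */
		dev->submitted++;	/* per-cmd submission */
	}
	toy_unplug(dev);		/* one unplug per drained batch */
}

/* Encode counters as plug*1000 + unplug*100 + submitted for easy checking. */
static int toy_batch_counts(int ncmds)
{
	struct toy_dev dev = {0};

	toy_submit_batch(&dev, ncmds);
	return dev.plug_calls * 1000 + dev.unplug_calls * 100 + dev.submitted;
}
```

So an 8-cmd batch costs one plug and one unplug instead of eight of each, which is the whole point of the callouts.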

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 53 +++++++++++++++++++++++++-
 include/target/target_core_backend.h   |  2 +
 include/target/target_core_base.h      |  8 ++++
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index dec89e911348..35aa201ed80b 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -227,6 +227,48 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
 	wake_up(&sess->cmd_count_wq);
 }
 
+static void target_plug_device(struct se_cmd *se_cmd)
+{
+	struct se_device *se_dev = se_cmd->se_dev;
+	struct se_sess_cmd_queue *sq = se_cmd->sq;
+	struct se_dev_plug *se_plug;
+
+	if (!(se_cmd->se_cmd_flags & SCF_BATCHED) ||
+	    !se_dev->transport->plug_device)
+		return;
+
+	se_plug = se_dev->transport->plug_device(se_cmd);
+	if (!se_plug)
+		return;
+
+	/*
+	 * We have a ref to the lun at this point, but the cmds could
+	 * complete before we unplug, so grab a ref to the se_device so we
+	 * can call back into the backend.
+	 */
+	config_group_get(&se_dev->dev_group);
+	se_plug->se_dev = se_dev;
+	llist_add(&se_plug->plug_node, &sq->plug_list);
+}
+
+static void target_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct se_device *se_dev = se_plug->se_dev;
+
+	se_dev->transport->unplug_device(se_plug);
+	config_group_put(&se_dev->dev_group);
+}
+
+static void target_unplug_sq(struct se_sess_cmd_queue *sq)
+{
+	struct se_dev_plug *se_plug, *next_plug;
+	struct llist_node *plug_list;
+
+	plug_list = llist_del_all(&sq->plug_list);
+	llist_for_each_entry_safe(se_plug, next_plug, plug_list, plug_node)
+		target_unplug_device(se_plug);
+}
+
 static void target_queued_submit_work(struct work_struct *work)
 {
 	struct se_sess_cmd_queue *sq =
@@ -242,8 +284,14 @@ static void target_queued_submit_work(struct work_struct *work)
 		return;
 
 	cmd_list = llist_reverse_order(cmd_list);
-	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
+	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
+		se_cmd->sq = sq;
+		se_cmd->se_cmd_flags |= SCF_BATCHED;
+
 		se_sess->tfo->submit_queued_cmd(se_cmd);
+	}
+
+	target_unplug_sq(sq);
 }
 
 static void target_queue_cmd_work(struct se_sess_cmd_queue *q,
@@ -284,6 +332,7 @@ static void target_init_sess_cmd_queues(struct se_session *se_sess,
 	int i;
 
 	for (i = 0; i < se_sess->q_cnt; i++) {
+		init_llist_head(&q[i].plug_list);
 		init_llist_head(&q[i].cmd_list);
 		INIT_WORK(&q[i].work, work_fn);
 		q[i].se_sess = se_sess;
@@ -1759,6 +1808,8 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 		return 0;
 	}
 
+	target_plug_device(se_cmd);
+
 	rc = target_cmd_parse_cdb(se_cmd);
 	if (rc != 0) {
 		transport_generic_request_failure(se_cmd, rc);
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 6336780d83a7..45b5ae885af6 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -34,6 +34,8 @@ struct target_backend_ops {
 	int (*configure_device)(struct se_device *);
 	void (*destroy_device)(struct se_device *);
 	void (*free_device)(struct se_device *device);
+	struct se_dev_plug *(*plug_device)(struct se_cmd *se_cmd);
+	void (*unplug_device)(struct se_dev_plug *se_plug);
 
 	ssize_t (*set_configfs_dev_params)(struct se_device *,
 					   const char *, ssize_t);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index b7f92a15cd1c..10ac30f7f638 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -146,6 +146,7 @@ enum se_cmd_flags_table {
 	SCF_USE_CPUID				= (1 << 16),
 	SCF_TASK_ATTR_SET			= (1 << 17),
 	SCF_TREAT_READ_AS_NORMAL		= (1 << 18),
+	SCF_BATCHED				= (1 << 19),
 };
 
 /*
@@ -513,6 +514,7 @@ struct se_cmd {
 	struct completion	t_transport_stop_comp;
 
 	struct work_struct	work;
+	struct se_sess_cmd_queue *sq;
 
 	struct scatterlist	*t_data_sg;
 	struct scatterlist	*t_data_sg_orig;
@@ -612,9 +614,15 @@ static inline struct se_node_acl *fabric_stat_to_nacl(struct config_item *item)
 			acl_fabric_stat_group);
 }
 
+struct se_dev_plug {
+	struct se_device        *se_dev;
+	struct llist_node	plug_node;
+};
+
 struct se_sess_cmd_queue {
 	struct llist_head	cmd_list;
 	struct work_struct	work;
+	struct llist_head	plug_list;
 	struct se_session	*se_sess;
 };
 
-- 
2.25.1



* [PATCH 08/11] target iblock: add backend plug/unplug callouts
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (6 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 07/11] target: fix backend plugging Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 23:23   ` Chaitanya Kulkarni
  2021-02-07  1:06   ` Chaitanya Kulkarni
  2021-02-04 11:35 ` [PATCH 09/11] target_core_user: " Mike Christie
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

This patch adds plug/unplug callouts for iblock. For initiator drivers
like iscsi, which want to pass multiple cmds to their xmit thread instead
of one cmd at a time, this increases IOPs by around 10% with vhost-scsi
(combined with the previous patches we see a total 40-50% increase). For
driver combos like tcm_loop with faster initiators like iser, we still
see IOPs increase by 20-30% when tcm_loop's nr_hw_queues setting is also
increased.
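The per-cpu plug slot in this patch relies on test_and_set_bit() so that only the first cmd of a batch actually starts a blk_plug, and later cmds on the same cpu see NULL and skip it. A userspace sketch of that ownership handshake using C11 atomics (toy_* names are made up; the kernel uses its own bitops, not <stdatomic.h>):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* One plug slot per CPU, playing the role of IBD_PLUGF_PLUGGED. */
static atomic_int toy_plugged[TOY_NR_CPUS];

/* Like test_and_set_bit(): only the caller that flips 0 -> 1 owns the plug. */
static bool toy_try_plug(int cpu)
{
	return atomic_exchange(&toy_plugged[cpu], 1) == 0;
}

/* Like clear_bit() in iblock_unplug_device(). */
static void toy_unplug(int cpu)
{
	atomic_store(&toy_plugged[cpu], 0);
}

/* plug, plug again, unplug, plug: encode the three results as digits. */
static int toy_plug_sequence(int cpu)
{
	int a = toy_try_plug(cpu);	/* first caller wins */
	int b = toy_try_plug(cpu);	/* already plugged -> loses */
	toy_unplug(cpu);
	int c = toy_try_plug(cpu);	/* wins again after unplug */

	return a * 100 + b * 10 + c;
}
```

The same pattern is why iblock_plug_device() can safely return NULL for every cmd after the first: the batch already holds the plug.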

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_iblock.c | 41 ++++++++++++++++++++++++++++-
 drivers/target/target_core_iblock.h | 10 +++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 8ed93fd205c7..a4951e662615 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -61,9 +61,18 @@ static struct se_device *iblock_alloc_device(struct se_hba *hba, const char *nam
 		return NULL;
 	}
 
+	ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
+				   GFP_KERNEL);
+	if (!ib_dev->ibd_plug)
+		goto free_dev;
+
 	pr_debug( "IBLOCK: Allocated ib_dev for %s\n", name);
 
 	return &ib_dev->dev;
+
+free_dev:
+	kfree(ib_dev);
+	return NULL;
 }
 
 static int iblock_configure_device(struct se_device *dev)
@@ -171,6 +180,7 @@ static void iblock_dev_call_rcu(struct rcu_head *p)
 	struct se_device *dev = container_of(p, struct se_device, rcu_head);
 	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
 
+	kfree(ib_dev->ibd_plug);
 	kfree(ib_dev);
 }
 
@@ -188,6 +198,30 @@ static void iblock_destroy_device(struct se_device *dev)
 	bioset_exit(&ib_dev->ibd_bio_set);
 }
 
+static struct se_dev_plug *iblock_plug_device(struct se_cmd *se_cmd)
+{
+	struct se_device *se_dev = se_cmd->se_dev;
+	struct iblock_dev *ib_dev = IBLOCK_DEV(se_dev);
+	struct iblock_dev_plug *ib_dev_plug;
+
+	ib_dev_plug = &ib_dev->ibd_plug[se_cmd->cpuid];
+	if (test_and_set_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags))
+		return NULL;
+
+	blk_start_plug(&ib_dev_plug->blk_plug);
+	return &ib_dev_plug->se_plug;
+}
+
+static void iblock_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct iblock_dev_plug *ib_dev_plug =
+				container_of(se_plug, struct iblock_dev_plug,
+					     se_plug);
+
+	blk_finish_plug(&ib_dev_plug->blk_plug);
+	clear_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags);
+}
+
 static unsigned long long iblock_emulate_read_cap_with_block_size(
 	struct se_device *dev,
 	struct block_device *bd,
@@ -337,7 +371,10 @@ static void iblock_submit_bios(struct bio_list *list)
 {
 	struct blk_plug plug;
 	struct bio *bio;
-
+	/*
+	 * The block layer handles nested plugs, so just plug/unplug to handle
+	 * fabric drivers that didn't support batching and multi bio cmds.
+	 */
 	blk_start_plug(&plug);
 	while ((bio = bio_list_pop(list)))
 		submit_bio(bio);
@@ -870,6 +907,8 @@ static const struct target_backend_ops iblock_ops = {
 	.configure_device	= iblock_configure_device,
 	.destroy_device		= iblock_destroy_device,
 	.free_device		= iblock_free_device,
+	.plug_device		= iblock_plug_device,
+	.unplug_device		= iblock_unplug_device,
 	.parse_cdb		= iblock_parse_cdb,
 	.set_configfs_dev_params = iblock_set_configfs_dev_params,
 	.show_configfs_dev_params = iblock_show_configfs_dev_params,
diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
index cefc641145b3..8c55375d2f75 100644
--- a/drivers/target/target_core_iblock.h
+++ b/drivers/target/target_core_iblock.h
@@ -4,6 +4,7 @@
 
 #include <linux/atomic.h>
 #include <linux/refcount.h>
+#include <linux/blkdev.h>
 #include <target/target_core_base.h>
 
 #define IBLOCK_VERSION		"4.0"
@@ -17,6 +18,14 @@ struct iblock_req {
 
 #define IBDF_HAS_UDEV_PATH		0x01
 
+#define IBD_PLUGF_PLUGGED		0x01
+
+struct iblock_dev_plug {
+	struct se_dev_plug se_plug;
+	struct blk_plug blk_plug;
+	unsigned long flags;
+};
+
 struct iblock_dev {
 	struct se_device dev;
 	unsigned char ibd_udev_path[SE_UDEV_PATH_LEN];
@@ -24,6 +33,7 @@ struct iblock_dev {
 	struct bio_set	ibd_bio_set;
 	struct block_device *ibd_bd;
 	bool ibd_readonly;
+	struct iblock_dev_plug *ibd_plug;
 } ____cacheline_aligned;
 
 #endif /* TARGET_CORE_IBLOCK_H */
-- 
2.25.1



* [PATCH 09/11] target_core_user: add backend plug/unplug callouts
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (7 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 08/11] target iblock: add backend plug/unplug callouts Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 23:25   ` Chaitanya Kulkarni
  2021-02-04 11:35 ` [PATCH 10/11] target: replace work per cmd in completion path Mike Christie
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

This patch adds plug/unplug callouts for tcmu, so we can reduce the
number of times we switch to userspace. Using this driver with tcm
loop is a common config, and depending on the nr_hw_queues and fio
job settings this patch only increases IOPs by around 5%, because we
hit other issues like the big per-tcmu-device mutex.
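For tcmu the plug callouts pay off by collapsing wakeups: unplugged, queue_cmd_ring() calls uio_event_notify() once per cmd; plugged, the notify is deferred to tcmu_unplug_device() and fires once per batch. A toy model of that counting (userspace sketch, names made up):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count userspace notifications for a batch of ncmds.
 * Mirrors queue_cmd_ring() + tcmu_unplug_device() from this patch.
 */
static int toy_notifies(int ncmds, bool plugged)
{
	int notifies = 0;

	for (int i = 0; i < ncmds; i++) {
		/* queue_cmd_ring(): notify per cmd only when not plugged */
		if (!plugged)
			notifies++;
	}
	if (plugged)
		notifies++;	/* tcmu_unplug_device(): one notify per batch */

	return notifies;
}
```

A 16-cmd batch goes from 16 potential userspace wakeups to 1, even if the measured IOPs win is small while the per-device mutex dominates.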

Bodo, because the improvement is so small I'm not sure if we
want this patch. I was thinking that once you fix those other issues
you've been working on, it might become more useful.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_user.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index a5991df23581..d67be2f959b9 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -111,6 +111,7 @@ struct tcmu_dev {
 	struct kref kref;
 
 	struct se_device se_dev;
+	struct se_dev_plug se_plug;
 
 	char *name;
 	struct se_hba *hba;
@@ -119,6 +120,7 @@ struct tcmu_dev {
 #define TCMU_DEV_BIT_BROKEN 1
 #define TCMU_DEV_BIT_BLOCKED 2
 #define TCMU_DEV_BIT_TMR_NOTIFY 3
+#define TCM_DEV_BIT_PLUGGED 4
 	unsigned long flags;
 
 	struct uio_info uio_info;
@@ -959,6 +961,26 @@ static uint32_t ring_insert_padding(struct tcmu_dev *udev, size_t cmd_size)
 	return cmd_head;
 }
 
+static void tcmu_unplug_device(struct se_dev_plug *se_plug)
+{
+	struct se_device *se_dev = se_plug->se_dev;
+	struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+	uio_event_notify(&udev->uio_info);
+	clear_bit(TCM_DEV_BIT_PLUGGED, &udev->flags);
+}
+
+static struct se_dev_plug *tcmu_plug_device(struct se_cmd *se_cmd)
+{
+	struct se_device *se_dev = se_cmd->se_dev;
+	struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+	if (!test_and_set_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+		return &udev->se_plug;
+
+	return NULL;
+}
+
 /**
  * queue_cmd_ring - queue cmd to ring or internally
  * @tcmu_cmd: cmd to queue
@@ -1086,8 +1108,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
 
 	list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
 
-	/* TODO: only if FLUSH and FUA? */
-	uio_event_notify(&udev->uio_info);
+	if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+		uio_event_notify(&udev->uio_info);
 
 	return 0;
 
@@ -2840,6 +2862,8 @@ static struct target_backend_ops tcmu_ops = {
 	.configure_device	= tcmu_configure_device,
 	.destroy_device		= tcmu_destroy_device,
 	.free_device		= tcmu_free_device,
+	.unplug_device		= tcmu_unplug_device,
+	.plug_device		= tcmu_plug_device,
 	.parse_cdb		= tcmu_parse_cdb,
 	.tmr_notify		= tcmu_tmr_notify,
 	.set_configfs_dev_params = tcmu_set_configfs_dev_params,
-- 
2.25.1



* [PATCH 10/11] target: replace work per cmd in completion path
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (8 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 09/11] target_core_user: " Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-04 23:26   ` Chaitanya Kulkarni
  2021-02-04 11:35 ` [PATCH 11/11] target, vhost-scsi: don't switch cpus on completion Mike Christie
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

Doing a work item per cmd can lead to lots of threads being created.
This patch replaces the per-cmd completion work with a cmd list.
Combined with the earlier patches, this allows tcm loop with higher-perf
initiators like iser to go from around 700K IOPs to 1000K, and it
reduces the number of threads that get created when the system is under
heavy load and hitting the initiator drivers' tagging limits.
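The list mechanics used on both the submission and completion side are the kernel's lockless llist: producers push to the front with llist_add(), the worker grabs the whole list with llist_del_all(), and llist_reverse_order() restores submission order before processing. A single-threaded userspace sketch of those semantics (plain pointers here; the real llist uses atomics so producers need no lock):

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	int val;
	struct toy_node *next;
};

/* Like llist_add(): push to the front of the list. */
static void toy_push(struct toy_node **head, struct toy_node *n)
{
	n->next = *head;
	*head = n;
}

/* Like llist_del_all(): detach the whole list in one step. */
static struct toy_node *toy_del_all(struct toy_node **head)
{
	struct toy_node *list = *head;

	*head = NULL;
	return list;
}

/* Like llist_reverse_order(): front-pushed nodes come back FIFO. */
static struct toy_node *toy_reverse(struct toy_node *list)
{
	struct toy_node *rev = NULL;

	while (list) {
		struct toy_node *next = list->next;

		list->next = rev;
		rev = list;
		list = next;
	}
	return rev;
}

/* Push 1, 2, 3, drain, and report the processing order as digits. */
static int toy_drain_order(void)
{
	struct toy_node n1 = {1, NULL}, n2 = {2, NULL}, n3 = {3, NULL};
	struct toy_node *head = NULL, *batch, *n;
	int order = 0;

	toy_push(&head, &n1);
	toy_push(&head, &n2);
	toy_push(&head, &n3);

	batch = toy_reverse(toy_del_all(&head));	/* 3,2,1 -> 1,2,3 */
	for (n = batch; n; n = n->next)
		order = order * 10 + n->val;
	return order;
}
```

This is why one work item per queue suffices: a single worker invocation drains every cmd queued since the last run, instead of one work item per cmd.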

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 124 +++++++++++++++----------
 include/target/target_core_base.h      |   1 +
 2 files changed, 77 insertions(+), 48 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 35aa201ed80b..57022285badb 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -55,7 +55,7 @@ static void transport_complete_task_attr(struct se_cmd *cmd);
 static void translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason);
 static void transport_handle_queue_full(struct se_cmd *cmd,
 		struct se_device *dev, int err, bool write_pending);
-static void target_complete_ok_work(struct work_struct *work);
+static void target_queued_compl_work(struct work_struct *work);
 
 int init_se_kmem_caches(void)
 {
@@ -295,10 +295,20 @@ static void target_queued_submit_work(struct work_struct *work)
 }
 
 static void target_queue_cmd_work(struct se_sess_cmd_queue *q,
-				  struct se_cmd *se_cmd, int cpu)
+				  struct se_cmd *se_cmd, int cpu,
+				  struct workqueue_struct *wq)
 {
 	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
-	queue_work_on(cpu, target_submission_wq, &q->work);
+	queue_work_on(cpu, wq, &q->work);
+}
+
+static void target_queue_cmd_compl(struct se_cmd *se_cmd)
+{
+	struct se_session *se_sess = se_cmd->se_sess;
+	int cpu = se_cmd->cpuid;
+
+	target_queue_cmd_work(&se_sess->cq[cpu], se_cmd, cpu,
+			      target_completion_wq);
 }
 
 /**
@@ -310,7 +320,8 @@ void target_queue_cmd_submit(struct se_session *se_sess, struct se_cmd *se_cmd)
 {
 	int cpu = smp_processor_id();
 
-	target_queue_cmd_work(&se_sess->sq[cpu], se_cmd, cpu);
+	target_queue_cmd_work(&se_sess->sq[cpu], se_cmd, cpu,
+			      target_submission_wq);
 }
 EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
 
@@ -318,11 +329,13 @@ static void target_flush_queued_cmds(struct se_session *se_sess)
 {
 	int i;
 
-	if (!se_sess->sq)
-		return;
+	if (se_sess->sq) {
+		for (i = 0; i < se_sess->q_cnt; i++)
+			cancel_work_sync(&se_sess->sq[i].work);
+	}
 
 	for (i = 0; i < se_sess->q_cnt; i++)
-		cancel_work_sync(&se_sess->sq[i].work);
+		cancel_work_sync(&se_sess->cq[i].work);
 }
 
 static void target_init_sess_cmd_queues(struct se_session *se_sess,
@@ -359,13 +372,21 @@ int transport_init_session(const struct target_core_fabric_ops *tfo,
 	atomic_set(&se_sess->stopped, 0);
 	se_sess->tfo = tfo;
 
+	se_sess->cq = kcalloc(nr_cpu_ids, sizeof(*se_sess->cq), GFP_KERNEL);
+	if (!se_sess->cq)
+		return -ENOMEM;
+	se_sess->q_cnt = nr_cpu_ids;
+	target_init_sess_cmd_queues(se_sess, se_sess->cq,
+				    target_queued_compl_work);
+
 	if (tfo->submit_queued_cmd) {
 		se_sess->sq = kcalloc(nr_cpu_ids, sizeof(*se_sess->sq),
 				      GFP_KERNEL);
-		if (!se_sess->sq)
-			return -ENOMEM;
+		if (!se_sess->sq) {
+			rc = -ENOMEM;
+			goto free_cq;
+		}
 
-		se_sess->q_cnt = nr_cpu_ids;
 		target_init_sess_cmd_queues(se_sess, se_sess->sq,
 					    target_queued_submit_work);
 	}
@@ -379,12 +400,15 @@ int transport_init_session(const struct target_core_fabric_ops *tfo,
 
 free_sq:
 	kfree(se_sess->sq);
+free_cq:
+	kfree(se_sess->cq);
 	return rc;
 }
 EXPORT_SYMBOL(transport_init_session);
 
 void transport_uninit_session(struct se_session *se_sess)
 {
+	kfree(se_sess->cq);
 	kfree(se_sess->sq);
 	/*
 	 * Drivers like iscsi and loop do not call target_stop_session
@@ -877,14 +901,6 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
 		percpu_ref_put(&lun->lun_ref);
 }
 
-static void target_complete_failure_work(struct work_struct *work)
-{
-	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
-
-	transport_generic_request_failure(cmd,
-			TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
-}
-
 /*
  * Used when asking transport to copy Sense Data from the underlying
  * Linux/SCSI struct scsi_cmnd
@@ -972,13 +988,6 @@ static void target_handle_abort(struct se_cmd *cmd)
 	transport_cmd_check_stop_to_fabric(cmd);
 }
 
-static void target_abort_work(struct work_struct *work)
-{
-	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
-
-	target_handle_abort(cmd);
-}
-
 static bool target_cmd_interrupted(struct se_cmd *cmd)
 {
 	int post_ret;
@@ -986,8 +995,8 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
 	if (cmd->transport_state & CMD_T_ABORTED) {
 		if (cmd->transport_complete_callback)
 			cmd->transport_complete_callback(cmd, false, &post_ret);
-		INIT_WORK(&cmd->work, target_abort_work);
-		queue_work(target_completion_wq, &cmd->work);
+
+		target_queue_cmd_compl(cmd);
 		return true;
 	} else if (cmd->transport_state & CMD_T_STOP) {
 		if (cmd->transport_complete_callback)
@@ -1002,7 +1011,6 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
 /* May be called from interrupt context so must not sleep. */
 void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 {
-	int success;
 	unsigned long flags;
 
 	if (target_cmd_interrupted(cmd))
@@ -1011,25 +1019,11 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
 	cmd->scsi_status = scsi_status;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
-	switch (cmd->scsi_status) {
-	case SAM_STAT_CHECK_CONDITION:
-		if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
-			success = 1;
-		else
-			success = 0;
-		break;
-	default:
-		success = 1;
-		break;
-	}
-
 	cmd->t_state = TRANSPORT_COMPLETE;
 	cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
 	spin_unlock_irqrestore(&cmd->t_state_lock, flags);
 
-	INIT_WORK(&cmd->work, success ? target_complete_ok_work :
-		  target_complete_failure_work);
-	queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
+	target_queue_cmd_compl(cmd);
 }
 EXPORT_SYMBOL(target_complete_cmd);
 
@@ -2006,8 +2000,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 		cmd->transport_complete_callback(cmd, false, &post_ret);
 
 	if (cmd->transport_state & CMD_T_ABORTED) {
-		INIT_WORK(&cmd->work, target_abort_work);
-		queue_work(target_completion_wq, &cmd->work);
+		target_queue_cmd_compl(cmd);
 		return;
 	}
 
@@ -2433,10 +2426,32 @@ static bool target_read_prot_action(struct se_cmd *cmd)
 	return false;
 }
 
-static void target_complete_ok_work(struct work_struct *work)
+static void target_complete_cmd_work(struct se_cmd *cmd)
 {
-	struct se_cmd *cmd = container_of(work, struct se_cmd, work);
-	int ret;
+	int ret, success;
+
+	if (cmd->transport_state & CMD_T_ABORTED) {
+		target_handle_abort(cmd);
+		return;
+	}
+
+	switch (cmd->scsi_status) {
+	case SAM_STAT_CHECK_CONDITION:
+		if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
+			success = 1;
+		else
+			success = 0;
+		break;
+	default:
+		success = 1;
+		break;
+	}
+
+	if (!success) {
+		transport_generic_request_failure(cmd,
+				TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+		return;
+	}
 
 	/*
 	 * Check if we need to move delayed/dormant tasks from cmds on the
@@ -2578,6 +2593,19 @@ static void target_complete_ok_work(struct work_struct *work)
 	transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 }
 
+static void target_queued_compl_work(struct work_struct *work)
+{
+	struct se_sess_cmd_queue *cq =
+				container_of(work, struct se_sess_cmd_queue,
+					     work);
+	struct se_cmd *se_cmd, *next_cmd;
+	struct llist_node *cmd_list;
+
+	cmd_list = llist_del_all(&cq->cmd_list);
+	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
+		target_complete_cmd_work(se_cmd);
+}
+
 void target_free_sgl(struct scatterlist *sgl, int nents)
 {
 	sgl_free_n_order(sgl, nents, 0);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 10ac30f7f638..6b32e8d26347 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -643,6 +643,7 @@ struct se_session {
 	void			*sess_cmd_map;
 	struct sbitmap_queue	sess_tag_pool;
 	const struct target_core_fabric_ops *tfo;
+	struct se_sess_cmd_queue *cq;
 	struct se_sess_cmd_queue *sq;
 	int			q_cnt;
 };
-- 
2.25.1



* [PATCH 11/11] target, vhost-scsi: don't switch cpus on completion
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
                   ` (9 preceding siblings ...)
  2021-02-04 11:35 ` [PATCH 10/11] target: replace work per cmd in completion path Mike Christie
@ 2021-02-04 11:35 ` Mike Christie
  2021-02-08 10:48   ` Stefan Hajnoczi
  2021-02-08 12:01   ` Michael S. Tsirkin
  12 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-04 11:35 UTC (permalink / raw)
  To: martin.petersen, linux-scsi, target-devel, mst, jasowang,
	stefanha, virtualization
  Cc: Mike Christie

LIO wants to complete a cmd on the CPU it was submitted on, because
most drivers have per-cpu or per-hw-queue handlers. But for vhost-scsi,
which uses a single thread for submissions and completions, this is not
always the best thing to do: the thread could now be running on a
different CPU, and it conflicts with what the user has set up in the
lower levels with settings like the block layer's rq_affinity or, for
network block devices, the settings on their nic.

This patch has vhost-scsi tell LIO to complete the cmd on the CPU the
layer below LIO has completed the cmd on. We then stop fighting
the block, net and whatever layer/setting is below us.

With this patch and the previous ones I see an increase in IOPs by about
50% (234K -> 350K) for random 4K workloads like:

fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
--ioengine=libaio --iodepth=128  --numjobs=8 --time_based
--group_reporting --runtime=60
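The CPU-selection policy this patch adds boils down to one branch in target_queue_cmd_compl(). A sketch of that decision as a pure function (toy_* names made up; in the kernel the "current cpu" is smp_processor_id() at completion time):

```c
#include <assert.h>

#define TOY_SCF_IGNORE_CPUID_COMPL	(1 << 20)

/*
 * Pick the CPU to queue the completion work on.
 * Mirrors target_queue_cmd_compl() after this patch.
 */
static int toy_completion_cpu(unsigned int flags, int submit_cpu,
			      int current_cpu)
{
	if (flags & TOY_SCF_IGNORE_CPUID_COMPL)
		return current_cpu;	/* vhost-scsi: stay where the lower layer completed */
	return submit_cpu;		/* default: bounce back to the submission CPU */
}
```

Fabric drivers that do want per-cpu affinity simply don't pass TARGET_SCF_IGNORE_CPUID_COMPL and keep the old behavior.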

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/target/target_core_transport.c | 10 +++++++++-
 drivers/vhost/scsi.c                   |  3 ++-
 include/target/target_core_base.h      |  2 ++
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 57022285badb..5475f628a119 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -305,7 +305,12 @@ static void target_queue_cmd_work(struct se_sess_cmd_queue *q,
 static void target_queue_cmd_compl(struct se_cmd *se_cmd)
 {
 	struct se_session *se_sess = se_cmd->se_sess;
-	int cpu = se_cmd->cpuid;
+	int cpu;
+
+	if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
+		cpu = smp_processor_id();
+	else
+		cpu = se_cmd->cpuid;
 
 	target_queue_cmd_work(&se_sess->cq[cpu], se_cmd, cpu,
 			      target_completion_wq);
@@ -1758,6 +1763,9 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
 	BUG_ON(!se_tpg);
 	BUG_ON(se_cmd->se_tfo || se_cmd->se_sess);
 
+	if (flags & TARGET_SCF_IGNORE_CPUID_COMPL)
+		se_cmd->se_cmd_flags |= SCF_IGNORE_CPUID_COMPL;
+
 	if (flags & TARGET_SCF_USE_CPUID)
 		se_cmd->se_cmd_flags |= SCF_USE_CPUID;
 	/*
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index aacad9e222ff..baee85dbf97c 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -804,7 +804,8 @@ static void vhost_scsi_submit_queued_cmd(struct se_cmd *se_cmd)
 			cmd->tvc_cdb, &cmd->tvc_sense_buf[0],
 			cmd->tvc_lun, cmd->tvc_exp_data_len,
 			vhost_scsi_to_tcm_attr(cmd->tvc_task_attr),
-			cmd->tvc_data_direction, TARGET_SCF_ACK_KREF,
+			cmd->tvc_data_direction,
+			TARGET_SCF_ACK_KREF | TARGET_SCF_IGNORE_CPUID_COMPL,
 			sg_ptr, cmd->tvc_sgl_count, NULL, 0, sg_prot_ptr,
 			cmd->tvc_prot_sgl_count);
 	if (rc < 0) {
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 6b32e8d26347..13514c59ae3d 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -147,6 +147,7 @@ enum se_cmd_flags_table {
 	SCF_TASK_ATTR_SET			= (1 << 17),
 	SCF_TREAT_READ_AS_NORMAL		= (1 << 18),
 	SCF_BATCHED				= (1 << 19),
+	SCF_IGNORE_CPUID_COMPL			= (1 << 20),
 };
 
 /*
@@ -197,6 +198,7 @@ enum target_sc_flags_table {
 	TARGET_SCF_ACK_KREF		= 0x02,
 	TARGET_SCF_UNKNOWN_SIZE		= 0x04,
 	TARGET_SCF_USE_CPUID		= 0x08,
+	TARGET_SCF_IGNORE_CPUID_COMPL	= 0x10,
 };
 
 /* fabric independent task management function values */
-- 
2.25.1



* Re: [PATCH 02/11] target: add workqueue cmd submission helper
  2021-02-04 11:35 ` [PATCH 02/11] target: add workqueue cmd submission helper Mike Christie
@ 2021-02-04 23:13   ` Chaitanya Kulkarni
  2021-02-05  0:43     ` michael.christie
  0 siblings, 1 reply; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-04 23:13 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

On 2/4/21 03:41, Mike Christie wrote:
> loop and vhost-scsi do their target cmd submission from driver
> workqueues. This allows them to avoid an issue where the backend may
> block waiting for resources like tags/requests, mem/locks, etc
> and that ends up blocking their entire submission path and for the
> case of vhost-scsi both the submission and completion path.
>
> This patch adds a helper these drivers can use to submit from the
> lio workqueue. This code will then be extended in the next patches
> to fix the plugging of backend devices.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/target/target_core_transport.c | 102 ++++++++++++++++++++++++-
>  include/target/target_core_base.h      |  10 ++-
>  include/target/target_core_fabric.h    |   3 +
>  3 files changed, 111 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 7c5d37bac561..dec89e911348 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -41,6 +41,7 @@
>  #include <trace/events/target.h>
>  
>  static struct workqueue_struct *target_completion_wq;
> +static struct workqueue_struct *target_submission_wq;
>  static struct kmem_cache *se_sess_cache;
>  struct kmem_cache *se_ua_cache;
>  struct kmem_cache *t10_pr_reg_cache;
> @@ -129,8 +130,15 @@ int init_se_kmem_caches(void)
>  	if (!target_completion_wq)
>  		goto out_free_lba_map_mem_cache;
>  
> +	target_submission_wq = alloc_workqueue("target_submission",
> +					       WQ_MEM_RECLAIM, 0);
> +	if (!target_submission_wq)
> +		goto out_free_completion_wq;
> +
>  	return 0;
>  
> +out_free_completion_wq:
> +	destroy_workqueue(target_completion_wq);
>  out_free_lba_map_mem_cache:
>  	kmem_cache_destroy(t10_alua_lba_map_mem_cache);
>  out_free_lba_map_cache:
> @@ -153,6 +161,7 @@ int init_se_kmem_caches(void)
>  
>  void release_se_kmem_caches(void)
>  {
> +	destroy_workqueue(target_submission_wq);
>  	destroy_workqueue(target_completion_wq);
>  	kmem_cache_destroy(se_sess_cache);
>  	kmem_cache_destroy(se_ua_cache);
> @@ -218,6 +227,69 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
>  	wake_up(&sess->cmd_count_wq);
>  }
>  
> +static void target_queued_submit_work(struct work_struct *work)
> +{
> +	struct se_sess_cmd_queue *sq =
> +				container_of(work, struct se_sess_cmd_queue,
> +					     work);
> +	struct se_session *se_sess = sq->se_sess;
> +	struct se_cmd *se_cmd, *next_cmd;
> +	struct llist_node *cmd_list;
> +
> +	cmd_list = llist_del_all(&sq->cmd_list);
> +	if (!cmd_list)
> +		/* Previous call took what we were queued to submit */
> +		return;
> +
> +	cmd_list = llist_reverse_order(cmd_list);
> +	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
> +		se_sess->tfo->submit_queued_cmd(se_cmd);
> +}
> +
> +static void target_queue_cmd_work(struct se_sess_cmd_queue *q,
> +				  struct se_cmd *se_cmd, int cpu)
> +{
> +	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
> +	queue_work_on(cpu, target_submission_wq, &q->work);
> +}
> +
> +/**
> + * target_queue_cmd_submit - queue a se_cmd to be executed from the lio wq
> + * @se_sess: cmd's session
> + * @se_cmd: cmd to queue
> + */
> +void target_queue_cmd_submit(struct se_session *se_sess, struct se_cmd *se_cmd)
> +{
> +	int cpu = smp_processor_id();
> +
> +	target_queue_cmd_work(&se_sess->sq[cpu], se_cmd, cpu);
> +}
> +EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
> +
> +static void target_flush_queued_cmds(struct se_session *se_sess)
> +{
> +	int i;
> +
> +	if (!se_sess->sq)
> +		return;
> +
> +	for (i = 0; i < se_sess->q_cnt; i++)
> +		cancel_work_sync(&se_sess->sq[i].work);
> +}
> +
> +static void target_init_sess_cmd_queues(struct se_session *se_sess,
> +					struct se_sess_cmd_queue *q,
> +					void (*work_fn)(struct work_struct *work))
> +{
> +	int i;
> +
> +	for (i = 0; i < se_sess->q_cnt; i++) {
> +		init_llist_head(&q[i].cmd_list);
> +		INIT_WORK(&q[i].work, work_fn);
> +		q[i].se_sess = se_sess;
> +	}
> +}
> +
Can we open-code the above function if there is only one caller?
Unless there is a specific reason to have it on its own, which I failed
to understand.
>  /**
>   * transport_init_session - initialize a session object
>   * @tfo: target core fabric ops
> @@ -228,6 +300,8 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
>  int transport_init_session(const struct target_core_fabric_ops *tfo,
>  			   struct se_session *se_sess)
>  {
> +	int rc;
> +
>  	INIT_LIST_HEAD(&se_sess->sess_list);
>  	INIT_LIST_HEAD(&se_sess->sess_acl_list);
>  	spin_lock_init(&se_sess->sess_cmd_lock);
> @@ -235,13 +309,34 @@ int transport_init_session(const struct target_core_fabric_ops *tfo,
>  	init_completion(&se_sess->stop_done);
>  	atomic_set(&se_sess->stopped, 0);
>  	se_sess->tfo = tfo;
> -	return percpu_ref_init(&se_sess->cmd_count,
> -			       target_release_sess_cmd_refcnt, 0, GFP_KERNEL);
> +
> +	if (tfo->submit_queued_cmd) {
> +		se_sess->sq = kcalloc(nr_cpu_ids, sizeof(*se_sess->sq),
> +				      GFP_KERNEL);
> +		if (!se_sess->sq)
> +			return -ENOMEM;
> +
> +		se_sess->q_cnt = nr_cpu_ids;
> +		target_init_sess_cmd_queues(se_sess, se_sess->sq,
> +					    target_queued_submit_work);
> +	}
> +
> +	rc = percpu_ref_init(&se_sess->cmd_count,
> +			     target_release_sess_cmd_refcnt, 0, GFP_KERNEL);
> +	if (rc)
> +		goto free_sq;
> +
> +	return 0;
> +
> +free_sq:
> +	kfree(se_sess->sq);
> +	return rc;
>  }
>  EXPORT_SYMBOL(transport_init_session);
>  
>  void transport_uninit_session(struct se_session *se_sess)
>  {
> +	kfree(se_sess->sq);
>  	/*
>  	 * Drivers like iscsi and loop do not call target_stop_session
>  	 * during session shutdown so we have to drop the ref taken at init
> @@ -1385,7 +1480,6 @@ void transport_init_se_cmd(
>  {
>  	INIT_LIST_HEAD(&cmd->se_delayed_node);
>  	INIT_LIST_HEAD(&cmd->se_qf_node);
> -	INIT_LIST_HEAD(&cmd->se_cmd_list);
>  	INIT_LIST_HEAD(&cmd->state_list);
>  	init_completion(&cmd->t_transport_stop_comp);
>  	cmd->free_compl = NULL;
> @@ -2968,6 +3062,8 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
>  {
>  	int ret;
>  
> +	target_flush_queued_cmds(se_sess);
> +
>  	WARN_ON_ONCE(!atomic_read(&se_sess->stopped));
>  
>  	do {
> diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
> index 50103a22b0e2..97138bff14d1 100644
> --- a/include/target/target_core_base.h
> +++ b/include/target/target_core_base.h
> @@ -488,7 +488,7 @@ struct se_cmd {
>  	/* Only used for internal passthrough and legacy TCM fabric modules */
>  	struct se_session	*se_sess;
>  	struct se_tmr_req	*se_tmr_req;
> -	struct list_head	se_cmd_list;
> +	struct llist_node	se_cmd_list;
>  	struct completion	*free_compl;
>  	struct completion	*abrt_compl;
>  	const struct target_core_fabric_ops *se_tfo;
> @@ -612,6 +612,12 @@ static inline struct se_node_acl *fabric_stat_to_nacl(struct config_item *item)
>  			acl_fabric_stat_group);
>  }
>  
> +struct se_sess_cmd_queue {
> +	struct llist_head	cmd_list;
> +	struct work_struct	work;
> +	struct se_session	*se_sess;
> +};
> +
>  struct se_session {
>  	atomic_t		stopped;
>  	u64			sess_bin_isid;
> @@ -629,6 +635,8 @@ struct se_session {
>  	void			*sess_cmd_map;
>  	struct sbitmap_queue	sess_tag_pool;
>  	const struct target_core_fabric_ops *tfo;
> +	struct se_sess_cmd_queue *sq;
> +	int			q_cnt;
>  };
>  
>  struct se_device;
> diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
> index cdf610838ba5..899948967a65 100644
> --- a/include/target/target_core_fabric.h
> +++ b/include/target/target_core_fabric.h
> @@ -80,6 +80,7 @@ struct target_core_fabric_ops {
>  	int (*queue_status)(struct se_cmd *);
>  	void (*queue_tm_rsp)(struct se_cmd *);
>  	void (*aborted_task)(struct se_cmd *);
> +	void (*submit_queued_cmd)(struct se_cmd *);
>  	/*
>  	 * fabric module calls for target_core_fabric_configfs.c
>  	 */
> @@ -166,6 +167,8 @@ int	target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess,
>  		unsigned char *sense, u64 unpacked_lun,
>  		void *fabric_tmr_ptr, unsigned char tm_type,
>  		gfp_t, u64, int);
> +void	target_queue_cmd_submit(struct se_session *se_sess,
> +				struct se_cmd *se_cmd);
>  int	transport_handle_cdb_direct(struct se_cmd *);
>  sense_reason_t	transport_generic_new_cmd(struct se_cmd *);
>  


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 06/11] target: cleanup cmd flag bits
  2021-02-04 11:35 ` [PATCH 06/11] target: cleanup cmd flag bits Mike Christie
@ 2021-02-04 23:15   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-04 23:15 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

On 2/4/21 03:40, Mike Christie wrote:
> We have a couple of holes in the cmd flags definitions. This cleans
> up the definitions to fix that and make them easier to read.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
Looks good.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] target iblock: add backend plug/unplug callouts
  2021-02-04 11:35 ` [PATCH 08/11] target iblock: add backend plug/unplug callouts Mike Christie
@ 2021-02-04 23:23   ` Chaitanya Kulkarni
  2021-02-05  0:45     ` michael.christie
  2021-02-07  1:06   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-04 23:23 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

On 2/4/21 03:40, Mike Christie wrote:
> This patch adds plug/unplug callouts for iblock. For initiator drivers
> like iscsi which want to pass multiple cmds to their xmit thread instead
> of one cmd at a time, this increases IOPs by around 10% with vhost-scsi
> (combined with the last patches we can see a total 40-50% increase). For
> driver combos like tcm_loop and faster drivers like the iser initiator, we
> can still see IOPs increase by 20-30% when tcm_loop's nr_hw_queues setting
> is also increased.
>
> Signed-off-by: Mike Christie <michael.christie@oracle.com>
> ---
>  drivers/target/target_core_iblock.c | 41 ++++++++++++++++++++++++++++-
>  drivers/target/target_core_iblock.h | 10 +++++++
>  2 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
> index 8ed93fd205c7..a4951e662615 100644
> --- a/drivers/target/target_core_iblock.c
> +++ b/drivers/target/target_core_iblock.c
> @@ -61,9 +61,18 @@ static struct se_device *iblock_alloc_device(struct se_hba *hba, const char *nam
>  		return NULL;
>  	}
>  
> +	ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
> +				   GFP_KERNEL);
> +	if (!ib_dev->ibd_plug)
> +		goto free_dev;
> +
>  	pr_debug( "IBLOCK: Allocated ib_dev for %s\n", name);
>  
>  	return &ib_dev->dev;
> +
> +free_dev:
> +	kfree(ib_dev);
> +	return NULL;
>  }
>  
>  static int iblock_configure_device(struct se_device *dev)
> @@ -171,6 +180,7 @@ static void iblock_dev_call_rcu(struct rcu_head *p)
>  	struct se_device *dev = container_of(p, struct se_device, rcu_head);
>  	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
>  
> +	kfree(ib_dev->ibd_plug);
>  	kfree(ib_dev);
>  }
>  
> @@ -188,6 +198,30 @@ static void iblock_destroy_device(struct se_device *dev)
>  	bioset_exit(&ib_dev->ibd_bio_set);
>  }
>  
> +static struct se_dev_plug *iblock_plug_device(struct se_cmd *se_cmd)
> +{
> +	struct se_device *se_dev = se_cmd->se_dev;
> +	struct iblock_dev *ib_dev = IBLOCK_DEV(se_dev);
> +	struct iblock_dev_plug *ib_dev_plug;
> +
> +	ib_dev_plug = &ib_dev->ibd_plug[se_cmd->cpuid];
> +	if (test_and_set_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags))
> +		return NULL;
> +
> +	blk_start_plug(&ib_dev_plug->blk_plug);
> +	return &ib_dev_plug->se_plug;
> +}
> +
> +static void iblock_unplug_device(struct se_dev_plug *se_plug)
> +{
> +	struct iblock_dev_plug *ib_dev_plug =
> +				container_of(se_plug, struct iblock_dev_plug,
> +					     se_plug);
I think something like the following on a new line reads much easier, for me at least:

        ib_dev_plug = container_of(se_plug, struct iblock_dev_plug, se_plug);
> +
> +	blk_finish_plug(&ib_dev_plug->blk_plug);
> +	clear_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags);
> +}
> +
>  static unsigned long long iblock_emulate_read_cap_with_block_size(
>  	struct se_device *dev,
>  	struct block_device *bd,
> @@ -337,7 +371,10 @@ static void iblock_submit_bios(struct bio_list *list)
>  {
>  	struct blk_plug plug;
>  	struct bio *bio;
> -
> +	/*
> +	 * The block layer handles nested plugs, so just plug/unplug to handle
> +	 * fabric drivers that didn't support batching and multi bio cmds.
> +	 */
>  	blk_start_plug(&plug);
>  	while ((bio = bio_list_pop(list)))
>  		submit_bio(bio);
> @@ -870,6 +907,8 @@ static const struct target_backend_ops iblock_ops = {
>  	.configure_device	= iblock_configure_device,
>  	.destroy_device		= iblock_destroy_device,
>  	.free_device		= iblock_free_device,
> +	.plug_device		= iblock_plug_device,
> +	.unplug_device		= iblock_unplug_device,
>  	.parse_cdb		= iblock_parse_cdb,
>  	.set_configfs_dev_params = iblock_set_configfs_dev_params,
>  	.show_configfs_dev_params = iblock_show_configfs_dev_params,
> diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
> index cefc641145b3..8c55375d2f75 100644
> --- a/drivers/target/target_core_iblock.h
> +++ b/drivers/target/target_core_iblock.h
> @@ -4,6 +4,7 @@
>  
>  #include <linux/atomic.h>
>  #include <linux/refcount.h>
> +#include <linux/blkdev.h>
>  #include <target/target_core_base.h>
>  
>  #define IBLOCK_VERSION		"4.0"
> @@ -17,6 +18,14 @@ struct iblock_req {
>  
>  #define IBDF_HAS_UDEV_PATH		0x01
>  
> +#define IBD_PLUGF_PLUGGED		0x01
> +
> +struct iblock_dev_plug {
> +	struct se_dev_plug se_plug;
> +	struct blk_plug blk_plug;
> +	unsigned long flags;
> +};
> +
>  struct iblock_dev {
>  	struct se_device dev;
>  	unsigned char ibd_udev_path[SE_UDEV_PATH_LEN];
> @@ -24,6 +33,7 @@ struct iblock_dev {
>  	struct bio_set	ibd_bio_set;
>  	struct block_device *ibd_bd;
>  	bool ibd_readonly;
> +	struct iblock_dev_plug *ibd_plug;
>  } ____cacheline_aligned;
>  
>  #endif /* TARGET_CORE_IBLOCK_H */


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/11] target_core_user: add backend plug/unplug callouts
  2021-02-04 11:35 ` [PATCH 09/11] target_core_user: " Mike Christie
@ 2021-02-04 23:25   ` Chaitanya Kulkarni
  2021-02-07 21:37     ` Mike Christie
  0 siblings, 1 reply; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-04 23:25 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

>   * queue_cmd_ring - queue cmd to ring or internally
>   * @tcmu_cmd: cmd to queue
> @@ -1086,8 +1108,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
>  
>  	list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
>  
> -	/* TODO: only if FLUSH and FUA? */
> -	uio_event_notify(&udev->uio_info);
> +	if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
> +		uio_event_notify(&udev->uio_info);
>  
Do we need to keep the TODO?
>  	return 0;
>  
> @@ -2840,6 +2862,8 @@ static struct target_backend_ops tcmu_ops = {
>  	.configure_device	= tcmu_configure_device,
>  	.destroy_device		= tcmu_destroy_device,
>  	.free_device		= tcmu_free_device,
> +	.unplug_device		= tcmu_unplug_device,
> +	.plug_device		= tcmu_plug_device,
>  	.parse_cdb		= tcmu_parse_cdb,
>  	.tmr_notify		= tcmu_tmr_notify,
>  	.set_configfs_dev_params = tcmu_set_configfs_dev_params,


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 10/11] target: replace work per cmd in completion path
  2021-02-04 11:35 ` [PATCH 10/11] target: replace work per cmd in completion path Mike Christie
@ 2021-02-04 23:26   ` Chaitanya Kulkarni
  0 siblings, 0 replies; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-04 23:26 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

On 2/4/21 03:41, Mike Christie wrote:
> +static void target_queued_compl_work(struct work_struct *work)
> +{
> +	struct se_sess_cmd_queue *cq =
> +				container_of(work, struct se_sess_cmd_queue,
> +					     work);
same here as previously mentioned.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] target: add workqueue cmd submission helper
  2021-02-04 23:13   ` Chaitanya Kulkarni
@ 2021-02-05  0:43     ` michael.christie
  2021-02-05  1:50       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 32+ messages in thread
From: michael.christie @ 2021-02-05  0:43 UTC (permalink / raw)
  To: Chaitanya Kulkarni, martin.petersen, linux-scsi, target-devel,
	mst, jasowang, stefanha, virtualization

On 2/4/21 5:13 PM, Chaitanya Kulkarni wrote:
> On 2/4/21 03:41, Mike Christie wrote:
>> loop and vhost-scsi do their target cmd submission from driver
>> workqueues. This allows them to avoid an issue where the backend may
>> block waiting for resources like tags/requests, mem/locks, etc
>> and that ends up blocking their entire submission path and for the
>> case of vhost-scsi both the submission and completion path.
>>
>> This patch adds a helper these drivers can use to submit from the
>> lio workqueue. This code will then be extended in the next patches
>> to fix the plugging of backend devices.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>   drivers/target/target_core_transport.c | 102 ++++++++++++++++++++++++-
>>   include/target/target_core_base.h      |  10 ++-
>>   include/target/target_core_fabric.h    |   3 +
>>   3 files changed, 111 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
>> index 7c5d37bac561..dec89e911348 100644
>> --- a/drivers/target/target_core_transport.c
>> +++ b/drivers/target/target_core_transport.c
>> @@ -41,6 +41,7 @@
>>   #include <trace/events/target.h>
>>   
>>   static struct workqueue_struct *target_completion_wq;
>> +static struct workqueue_struct *target_submission_wq;
>>   static struct kmem_cache *se_sess_cache;
>>   struct kmem_cache *se_ua_cache;
>>   struct kmem_cache *t10_pr_reg_cache;
>> @@ -129,8 +130,15 @@ int init_se_kmem_caches(void)
>>   	if (!target_completion_wq)
>>   		goto out_free_lba_map_mem_cache;
>>   
>> +	target_submission_wq = alloc_workqueue("target_submission",
>> +					       WQ_MEM_RECLAIM, 0);
>> +	if (!target_submission_wq)
>> +		goto out_free_completion_wq;
>> +
>>   	return 0;
>>   
>> +out_free_completion_wq:
>> +	destroy_workqueue(target_completion_wq);
>>   out_free_lba_map_mem_cache:
>>   	kmem_cache_destroy(t10_alua_lba_map_mem_cache);
>>   out_free_lba_map_cache:
>> @@ -153,6 +161,7 @@ int init_se_kmem_caches(void)
>>   
>>   void release_se_kmem_caches(void)
>>   {
>> +	destroy_workqueue(target_submission_wq);
>>   	destroy_workqueue(target_completion_wq);
>>   	kmem_cache_destroy(se_sess_cache);
>>   	kmem_cache_destroy(se_ua_cache);
>> @@ -218,6 +227,69 @@ static void target_release_sess_cmd_refcnt(struct percpu_ref *ref)
>>   	wake_up(&sess->cmd_count_wq);
>>   }
>>   
>> +static void target_queued_submit_work(struct work_struct *work)
>> +{
>> +	struct se_sess_cmd_queue *sq =
>> +				container_of(work, struct se_sess_cmd_queue,
>> +					     work);
>> +	struct se_session *se_sess = sq->se_sess;
>> +	struct se_cmd *se_cmd, *next_cmd;
>> +	struct llist_node *cmd_list;
>> +
>> +	cmd_list = llist_del_all(&sq->cmd_list);
>> +	if (!cmd_list)
>> +		/* Previous call took what we were queued to submit */
>> +		return;
>> +
>> +	cmd_list = llist_reverse_order(cmd_list);
>> +	llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
>> +		se_sess->tfo->submit_queued_cmd(se_cmd);
>> +}
>> +
>> +static void target_queue_cmd_work(struct se_sess_cmd_queue *q,
>> +				  struct se_cmd *se_cmd, int cpu)
>> +{
>> +	llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
>> +	queue_work_on(cpu, target_submission_wq, &q->work);
>> +}
>> +
>> +/**
>> + * target_queue_cmd_submit - queue a se_cmd to be executed from the lio wq
>> + * @se_sess: cmd's session
>> + * @se_cmd: cmd to queue
>> + */
>> +void target_queue_cmd_submit(struct se_session *se_sess, struct se_cmd *se_cmd)
>> +{
>> +	int cpu = smp_processor_id();
>> +
>> +	target_queue_cmd_work(&se_sess->sq[cpu], se_cmd, cpu);
>> +}
>> +EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
>> +
>> +static void target_flush_queued_cmds(struct se_session *se_sess)
>> +{
>> +	int i;
>> +
>> +	if (!se_sess->sq)
>> +		return;
>> +
>> +	for (i = 0; i < se_sess->q_cnt; i++)
>> +		cancel_work_sync(&se_sess->sq[i].work);
>> +}
>> +
>> +static void target_init_sess_cmd_queues(struct se_session *se_sess,
>> +					struct se_sess_cmd_queue *q,
>> +					void (*work_fn)(struct work_struct *work))
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < se_sess->q_cnt; i++) {
>> +		init_llist_head(&q[i].cmd_list);
>> +		INIT_WORK(&q[i].work, work_fn);
>> +		q[i].se_sess = se_sess;
>> +	}
>> +}
>> +
> Can we open-code the above function if there is only one caller?
> Unless there is a specific reason to have it on its own, which I failed
> to understand.

Patch 10 also calls it. I tried to say that at the end of the patch
description, but it was not too clear now that I read it again.

I couldn't decide if I should do it now or later. I selected now since
it made the 10th patch smaller.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] target iblock: add backend plug/unplug callouts
  2021-02-04 23:23   ` Chaitanya Kulkarni
@ 2021-02-05  0:45     ` michael.christie
  0 siblings, 0 replies; 32+ messages in thread
From: michael.christie @ 2021-02-05  0:45 UTC (permalink / raw)
  To: Chaitanya Kulkarni, martin.petersen, linux-scsi, target-devel,
	mst, jasowang, stefanha, virtualization

On 2/4/21 5:23 PM, Chaitanya Kulkarni wrote:
> On 2/4/21 03:40, Mike Christie wrote:
>> This patch adds plug/unplug callouts for iblock. For initiator drivers
>> like iscsi which want to pass multiple cmds to their xmit thread instead
>> of one cmd at a time, this increases IOPs by around 10% with vhost-scsi
>> (combined with the last patches we can see a total 40-50% increase). For
>> driver combos like tcm_loop and faster drivers like the iser initiator, we
>> can still see IOPs increase by 20-30% when tcm_loop's nr_hw_queues setting
>> is also increased.
>>
>> Signed-off-by: Mike Christie <michael.christie@oracle.com>
>> ---
>>   drivers/target/target_core_iblock.c | 41 ++++++++++++++++++++++++++++-
>>   drivers/target/target_core_iblock.h | 10 +++++++
>>   2 files changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
>> index 8ed93fd205c7..a4951e662615 100644
>> --- a/drivers/target/target_core_iblock.c
>> +++ b/drivers/target/target_core_iblock.c
>> @@ -61,9 +61,18 @@ static struct se_device *iblock_alloc_device(struct se_hba *hba, const char *nam
>>   		return NULL;
>>   	}
>>   
>> +	ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
>> +				   GFP_KERNEL);
>> +	if (!ib_dev->ibd_plug)
>> +		goto free_dev;
>> +
>>   	pr_debug( "IBLOCK: Allocated ib_dev for %s\n", name);
>>   
>>   	return &ib_dev->dev;
>> +
>> +free_dev:
>> +	kfree(ib_dev);
>> +	return NULL;
>>   }
>>   
>>   static int iblock_configure_device(struct se_device *dev)
>> @@ -171,6 +180,7 @@ static void iblock_dev_call_rcu(struct rcu_head *p)
>>   	struct se_device *dev = container_of(p, struct se_device, rcu_head);
>>   	struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
>>   
>> +	kfree(ib_dev->ibd_plug);
>>   	kfree(ib_dev);
>>   }
>>   
>> @@ -188,6 +198,30 @@ static void iblock_destroy_device(struct se_device *dev)
>>   	bioset_exit(&ib_dev->ibd_bio_set);
>>   }
>>   
>> +static struct se_dev_plug *iblock_plug_device(struct se_cmd *se_cmd)
>> +{
>> +	struct se_device *se_dev = se_cmd->se_dev;
>> +	struct iblock_dev *ib_dev = IBLOCK_DEV(se_dev);
>> +	struct iblock_dev_plug *ib_dev_plug;
>> +
>> +	ib_dev_plug = &ib_dev->ibd_plug[se_cmd->cpuid];
>> +	if (test_and_set_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags))
>> +		return NULL;
>> +
>> +	blk_start_plug(&ib_dev_plug->blk_plug);
>> +	return &ib_dev_plug->se_plug;
>> +}
>> +
>> +static void iblock_unplug_device(struct se_dev_plug *se_plug)
>> +{
>> +	struct iblock_dev_plug *ib_dev_plug =
>> +				container_of(se_plug, struct iblock_dev_plug,
>> +					     se_plug);
> I think something like the following on a new line reads much easier, for me at least:
> 
>          ib_dev_plug = container_of(se_plug, struct iblock_dev_plug, se_plug);

Yeah nicer. Will change this and the other one.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/11] target: add workqueue cmd submission helper
  2021-02-05  0:43     ` michael.christie
@ 2021-02-05  1:50       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-05  1:50 UTC (permalink / raw)
  To: michael.christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

On 2/4/21 16:44, michael.christie@oracle.com wrote:
>>> +
>> Can we opencode above function if there is only one caller ?
>> unless there is a specific reason to have it on its own which I failed to
>> understand.
> Patch 10 also calls it. I tried to say that in the end of the patch 
> description but it was not too clear now that I read it again.
>
> I couldn't decide if I should do it now or later. I selected now since 
> it made the 10th pach smaller.
>
if it does then fine.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/11] vhost scsi: use lio wq cmd submission helper
  2021-02-04 11:35 ` [PATCH 05/11] vhost scsi: " Mike Christie
@ 2021-02-05 16:17     ` Michael S. Tsirkin
  0 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2021-02-05 16:17 UTC (permalink / raw)
  To: Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, stefanha,
	virtualization

On Thu, Feb 04, 2021 at 05:35:07AM -0600, Mike Christie wrote:
> @@ -1132,14 +1127,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
>  		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
>  		 */
>  		cmd->tvc_vq_desc = vc.head;
> -		/*
> -		 * Dispatch cmd descriptor for cmwq execution in process
> -		 * context provided by vhost_scsi_workqueue.  This also ensures
> -		 * cmd is executed on the same kworker CPU as this vhost
> -		 * thread to gain positive L2 cache locality effects.
> -		 */
> -		INIT_WORK(&cmd->work, vhost_scsi_submission_work);
> -		queue_work(vhost_scsi_workqueue, &cmd->work);
> +		target_queue_cmd_submit(tpg->tpg_nexus->tvn_se_sess,
> +					&cmd->tvc_se_cmd);
>  		ret = 0;
>  err:
>  		/*

What about this aspect? Will things still stay on the same CPU?

-- 
MST


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/11] vhost scsi: use lio wq cmd submission helper
  2021-02-05 16:17     ` Michael S. Tsirkin
  (?)
@ 2021-02-05 17:38     ` Mike Christie
  2021-02-05 18:04       ` Mike Christie
  -1 siblings, 1 reply; 32+ messages in thread
From: Mike Christie @ 2021-02-05 17:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, stefanha,
	virtualization

On 2/5/21 10:17 AM, Michael S. Tsirkin wrote:
> On Thu, Feb 04, 2021 at 05:35:07AM -0600, Mike Christie wrote:
>> @@ -1132,14 +1127,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
>>  		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
>>  		 */
>>  		cmd->tvc_vq_desc = vc.head;
>> -		/*
>> -		 * Dispatch cmd descriptor for cmwq execution in process
>> -		 * context provided by vhost_scsi_workqueue.  This also ensures
>> -		 * cmd is executed on the same kworker CPU as this vhost
>> -		 * thread to gain positive L2 cache locality effects.
>> -		 */
>> -		INIT_WORK(&cmd->work, vhost_scsi_submission_work);
>> -		queue_work(vhost_scsi_workqueue, &cmd->work);
>> +		target_queue_cmd_submit(tpg->tpg_nexus->tvn_se_sess,
>> +					&cmd->tvc_se_cmd);
>>  		ret = 0;
>>  err:
>>  		/*
> 
> What about this aspect? Will things still stay on the same CPU?
Yes, if that is what it's configured to do.

On the submission path there is no change in behavior. target_queue_cmd_submit
does queue_work_on so it executes the cmd on the same CPU in LIO. Once
LIO passes it to the block layer then that layer does whatever is setup.

On the completion path the low level works the same. The low level
driver goes by its ISRs/softirq/completion-thread settings, the block layer
then goes by the queue settings like rq_affinity.

The change in behavior is that in LIO we will do what was configured
in the layer below us instead of always trying to complete on the same
CPU it was submitted on.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 05/11] vhost scsi: use lio wq cmd submission helper
  2021-02-05 17:38     ` Mike Christie
@ 2021-02-05 18:04       ` Mike Christie
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-05 18:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, stefanha,
	virtualization

On 2/5/21 11:38 AM, Mike Christie wrote:
> On 2/5/21 10:17 AM, Michael S. Tsirkin wrote:
>> On Thu, Feb 04, 2021 at 05:35:07AM -0600, Mike Christie wrote:
>>> @@ -1132,14 +1127,8 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
>>>  		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
>>>  		 */
>>>  		cmd->tvc_vq_desc = vc.head;
>>> -		/*
>>> -		 * Dispatch cmd descriptor for cmwq execution in process
>>> -		 * context provided by vhost_scsi_workqueue.  This also ensures
>>> -		 * cmd is executed on the same kworker CPU as this vhost
>>> -		 * thread to gain positive L2 cache locality effects.
>>> -		 */
>>> -		INIT_WORK(&cmd->work, vhost_scsi_submission_work);
>>> -		queue_work(vhost_scsi_workqueue, &cmd->work);
>>> +		target_queue_cmd_submit(tpg->tpg_nexus->tvn_se_sess,
>>> +					&cmd->tvc_se_cmd);
>>>  		ret = 0;
>>>  err:
>>>  		/*
>>
>> What about this aspect? Will things still stay on the same CPU?
> Yes, if that is what it's configured to do.

Oh yeah, I wasn't sure if you were only asking about the code in this
patch or the combined patchset. The above chunk modifies the submission
code. Like I wrote below, there are no changes in CPU usage in that path.

Patch:

[PATCH 11/11] target, vhost-scsi: don't switch cpus on completion

modifies the completion path in LIO so we can complete the cmd on the
CPU that the layers below LIO were configured to complete on. The user
can then configure those layers to complete on the specific CPU it was
submitted on, just one that shares a cache, or what the layer below the
block layer completed it on.


> 
> On the submission path there is no change in behavior. target_queue_cmd_submit
> does queue_work_on so it executes the cmd on the same CPU in LIO. Once
> LIO passes it to the block layer then that layer does whatever is setup.
> 
> On the completion path the low level works the same. The low level
> driver goes by its ISRs/softirq/completion-thread settings, the block layer
> then goes by the queue settings like rq_affinity.
> 
> The change in behavior is that in LIO we will do what was configured
> in the layer below us instead of always trying to complete on the same
> CPU it was submitted on.
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] target iblock: add backend plug/unplug callouts
  2021-02-04 11:35 ` [PATCH 08/11] target iblock: add backend plug/unplug callouts Mike Christie
  2021-02-04 23:23   ` Chaitanya Kulkarni
@ 2021-02-07  1:06   ` Chaitanya Kulkarni
  2021-02-07  2:21       ` Bart Van Assche
  1 sibling, 1 reply; 32+ messages in thread
From: Chaitanya Kulkarni @ 2021-02-07  1:06 UTC (permalink / raw)
  To: Mike Christie, martin.petersen, linux-scsi, target-devel, mst,
	jasowang, stefanha, virtualization

On 2/4/21 03:40, Mike Christie wrote:
>  
> +	ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
> +				   GFP_KERNEL);
I'd actually prefer struct xxx in sizeof(), but maybe that is just my
preference. Not sure what the standard practice is in target code.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/11] target iblock: add backend plug/unplug callouts
  2021-02-07  1:06   ` Chaitanya Kulkarni
@ 2021-02-07  2:21       ` Bart Van Assche
  0 siblings, 0 replies; 32+ messages in thread
From: Bart Van Assche @ 2021-02-07  2:21 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Mike Christie, martin.petersen, linux-scsi,
	target-devel, mst, jasowang, stefanha, virtualization

On 2/6/21 5:06 PM, Chaitanya Kulkarni wrote:
> On 2/4/21 03:40, Mike Christie wrote:
>>  
>> +	ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
>> +				   GFP_KERNEL);
> I'd actually prefer struct xxx in sizeof, but maybe that is just my
> preference.
> Not sure what is the standard practice in target code.

The above code is easier to verify than the suggested alternative:
with the alternative, one has to look up the definition of ibd_plug
to check that the size is right, whereas the code above can be
verified without looking up the ibd_plug member at all.

Bart.





* Re: [PATCH 09/11] target_core_user: add backend plug/unplug callouts
  2021-02-04 23:25   ` Chaitanya Kulkarni
@ 2021-02-07 21:37     ` Mike Christie
  0 siblings, 0 replies; 32+ messages in thread
From: Mike Christie @ 2021-02-07 21:37 UTC (permalink / raw)
  To: Chaitanya Kulkarni, martin.petersen, linux-scsi, target-devel,
	mst, jasowang, stefanha, virtualization

On 2/4/21 5:25 PM, Chaitanya Kulkarni wrote:
>>   * queue_cmd_ring - queue cmd to ring or internally
>>   * @tcmu_cmd: cmd to queue
>> @@ -1086,8 +1108,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
>>  
>>  	list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
>>  
>> -	/* TODO: only if FLUSH and FUA? */
>> -	uio_event_notify(&udev->uio_info);
>> +	if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
>> +		uio_event_notify(&udev->uio_info);
>>  
> Do we need to keep the TODO ?
I think it's not helpful.

The reason for the TODO was to avoid calling uio_event_notify for
every command. I think we had thought we could just key of a FLUSH
but then later figured out we might not always get one so that wouldn't
work. The comment should have been removed or if we like to keep TODOs
like that in code it should have been updated to better reflect what the
issue was and the idea to fix it.


* Re: [PATCH 00/11] target: fix cmd plugging and completion
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
@ 2021-02-08 10:48   ` Stefan Hajnoczi
  2021-02-04 11:35 ` [PATCH 02/11] target: add workqueue cmd submission helper Mike Christie
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2021-02-08 10:48 UTC (permalink / raw)
  To: Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, mst, jasowang, virtualization


On Thu, Feb 04, 2021 at 05:35:02AM -0600, Mike Christie wrote:
> The following patches, made over Martin's 5.12 branches, fix two
> issues:
> 
> 1. target_core_iblock plugs and unplugs the queue for every
> command. To handle this issue, and an issue that vhost-scsi and
> loop had been avoiding by adding their own workqueues, I added a
> new submission workqueue to LIO. Drivers can pass cmds to it, and
> we can then submit batches of cmds.
> 
> 2. vhost-scsi and loop were doing a work per cmd on the submission
> side, and the LIO completion side was doing a work per cmd as
> well. The cap on running works is 512 (max_active), so we can end
> up using a lot of threads when submissions start blocking because
> they hit the block tag limit, or when the completion side blocks
> trying to send the cmd. In this patchset I just use a cmd list per
> session to avoid abusing the workqueue layer.
> 
> The combined patchset fixes a major perf issue we've been hitting
> where IOPS are stuck at 230K when running:
> 
>     fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
>     --ioengine=libaio --iodepth=128  --numjobs=8 --time_based
>     --group_reporting --runtime=60
> 
> The patches in this set get me to 350K when using devices that
> have native IOPS of around 400-500K.
> 
> Note that 5.12 has some interrupt changes that my patches collide
> with. Martin's 5.12 branches had those changes, so I based my
> patches on them.

For vhost-scsi:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>




* Re: [PATCH 00/11] target: fix cmd plugging and completion
  2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
@ 2021-02-08 12:01   ` Michael S. Tsirkin
  2021-02-04 11:35 ` [PATCH 02/11] target: add workqueue cmd submission helper Mike Christie
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Michael S. Tsirkin @ 2021-02-08 12:01 UTC (permalink / raw)
  To: Mike Christie
  Cc: martin.petersen, linux-scsi, target-devel, jasowang, stefanha,
	virtualization

On Thu, Feb 04, 2021 at 05:35:02AM -0600, Mike Christie wrote:
> The following patches, made over Martin's 5.12 branches, fix two
> issues:
> 
> 1. target_core_iblock plugs and unplugs the queue for every
> command. To handle this issue, and an issue that vhost-scsi and
> loop had been avoiding by adding their own workqueues, I added a
> new submission workqueue to LIO. Drivers can pass cmds to it, and
> we can then submit batches of cmds.
> 
> 2. vhost-scsi and loop were doing a work per cmd on the submission
> side, and the LIO completion side was doing a work per cmd as
> well. The cap on running works is 512 (max_active), so we can end
> up using a lot of threads when submissions start blocking because
> they hit the block tag limit, or when the completion side blocks
> trying to send the cmd. In this patchset I just use a cmd list per
> session to avoid abusing the workqueue layer.
> 
> The combined patchset fixes a major perf issue we've been hitting
> where IOPS are stuck at 230K when running:
> 
>     fio --filename=/dev/sda  --direct=1 --rw=randrw --bs=4k
>     --ioengine=libaio --iodepth=128  --numjobs=8 --time_based
>     --group_reporting --runtime=60
> 
> The patches in this set get me to 350K when using devices that
> have native IOPS of around 400-500K.
> 
> Note that 5.12 has some interrupt changes that my patches collide
> with. Martin's 5.12 branches had those changes, so I based my
> patches on them.

OK so feel free to merge through that branch.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

-- 
MST




Thread overview: 32+ messages
2021-02-04 11:35 [PATCH 00/11] target: fix cmd plugging and completion Mike Christie
2021-02-04 11:35 ` [PATCH 01/11] target: pass in fabric ops to session creation Mike Christie
2021-02-04 11:35 ` [PATCH 02/11] target: add workqueue cmd submission helper Mike Christie
2021-02-04 23:13   ` Chaitanya Kulkarni
2021-02-05  0:43     ` michael.christie
2021-02-05  1:50       ` Chaitanya Kulkarni
2021-02-04 11:35 ` [PATCH 03/11] tcm loop: use blk cmd allocator for se_cmds Mike Christie
2021-02-04 11:35 ` [PATCH 04/11] tcm loop: use lio wq cmd submission helper Mike Christie
2021-02-04 11:35 ` [PATCH 05/11] vhost scsi: " Mike Christie
2021-02-05 16:17   ` Michael S. Tsirkin
2021-02-05 16:17     ` Michael S. Tsirkin
2021-02-05 17:38     ` Mike Christie
2021-02-05 18:04       ` Mike Christie
2021-02-04 11:35 ` [PATCH 06/11] target: cleanup cmd flag bits Mike Christie
2021-02-04 23:15   ` Chaitanya Kulkarni
2021-02-04 11:35 ` [PATCH 07/11] target: fix backend plugging Mike Christie
2021-02-04 11:35 ` [PATCH 08/11] target iblock: add backend plug/unplug callouts Mike Christie
2021-02-04 23:23   ` Chaitanya Kulkarni
2021-02-05  0:45     ` michael.christie
2021-02-07  1:06   ` Chaitanya Kulkarni
2021-02-07  2:21     ` Bart Van Assche
2021-02-07  2:21       ` Bart Van Assche
2021-02-04 11:35 ` [PATCH 09/11] target_core_user: " Mike Christie
2021-02-04 23:25   ` Chaitanya Kulkarni
2021-02-07 21:37     ` Mike Christie
2021-02-04 11:35 ` [PATCH 10/11] target: replace work per cmd in completion path Mike Christie
2021-02-04 23:26   ` Chaitanya Kulkarni
2021-02-04 11:35 ` [PATCH 11/11] target, vhost-scsi: don't switch cpus on completion Mike Christie
2021-02-08 10:48 ` [PATCH 00/11] target: fix cmd plugging and completion Stefan Hajnoczi
2021-02-08 10:48   ` Stefan Hajnoczi
2021-02-08 12:01 ` Michael S. Tsirkin
2021-02-08 12:01   ` Michael S. Tsirkin
