* [PATCH 01/13] target: move t_task_cdb initialization
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 02/13] target: split target_submit_cmd_map_sgls Mike Christie
` (11 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
The next patch splits target_submit_cmd_map_sgls so the initialization
and submission parts can be called at different times. If the init part
fails, some of the logging and tracing code can end up dereferencing
t_task_cdb before it has been set up. Initialize it in
transport_init_se_cmd instead, so those paths don't hit NULL pointer
crashes.
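The idea can be shown in a small userspace sketch (the struct names mirror se_cmd but are purely illustrative, not the real kernel layout): point the CDB pointer at the inline buffer at init time, so any error or logging path that runs before the CDB is copied in never sees NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CDB_INLINE_SIZE 32

/* Toy command descriptor with an inline CDB buffer, loosely
 * modeled on struct se_cmd (names are illustrative only). */
struct cmd {
	unsigned char *t_task_cdb;                 /* valid right after init */
	unsigned char __t_task_cdb[CDB_INLINE_SIZE];
};

/* Init-time setup: aim t_task_cdb at the inline buffer immediately,
 * so early failure/logging paths can safely dereference it. */
static void cmd_init(struct cmd *c)
{
	memset(c, 0, sizeof(*c));
	c->t_task_cdb = &c->__t_task_cdb[0];
}

/* The later setup stage only has to copy the CDB bytes in. */
static int cmd_init_cdb(struct cmd *c, const unsigned char *cdb, size_t len)
{
	if (len > CDB_INLINE_SIZE)
		return -1;              /* a larger CDB would need a heap buffer */
	memcpy(c->t_task_cdb, cdb, len);
	return 0;
}
```

With this ordering, a tracepoint that prints `t_task_cdb[0]` between the two stages reads a zeroed buffer instead of faulting.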
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_transport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index d47bfd8b0f87..5c4adde96d5e 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1389,6 +1389,7 @@ void transport_init_se_cmd(
INIT_WORK(&cmd->work, NULL);
kref_init(&cmd->cmd_kref);
+ cmd->t_task_cdb = &cmd->__t_task_cdb[0];
cmd->se_tfo = tfo;
cmd->se_sess = se_sess;
cmd->data_length = data_length;
@@ -1430,7 +1431,6 @@ target_cmd_init_cdb(struct se_cmd *cmd, unsigned char *cdb)
{
sense_reason_t ret;
- cmd->t_task_cdb = &cmd->__t_task_cdb[0];
/*
* Ensure that the received CDB is less than the max (252 + 8) bytes
* for VARIABLE_LENGTH_CMD
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 02/13] target: split target_submit_cmd_map_sgls
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
Separate target_submit_cmd_map_sgls into the part that:
- does the initial cmd setup,
- will not sleep, and
- gives us access to the se_device,
and the part that:
- can sleep, and
- handles the actual submission.
This split is needed by tcm_loop in the next patches, which must add
the cmd to the lio workqueue and cannot sleep in that initial
submission path.
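The resulting return-code convention for the prep stage can be sketched in plain C (the enum and helpers below are stand-ins, not the real kernel code): prep returns negative for active I/O shutdown, positive when the failure was already answered as a CHECK_CONDITION, and zero when the caller should go on to the sleeping submit stage.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative three-way return convention of the new prep stage:
 *   < 0  -> active I/O shutdown, caller must bail with the error
 *   > 0  -> setup failed but was already answered as CHECK_CONDITION,
 *           so the caller has nothing more to do and returns 0
 *   == 0 -> cmd is prepped, caller runs the sleeping submit stage
 */
enum prep_outcome { PREP_SHUTDOWN, PREP_BAD_LUN, PREP_OK };

static int submit_prep(enum prep_outcome o)
{
	switch (o) {
	case PREP_SHUTDOWN: return -1;  /* e.g. session is being torn down */
	case PREP_BAD_LUN:  return 1;   /* sense data already queued */
	default:            return 0;
	}
}

static bool submitted;

static void submit(void) { submitted = true; }  /* the part that may sleep */

/* A wrapper keeping the legacy contract: non-zero only on shutdown. */
static int submit_cmd(enum prep_outcome o)
{
	int ret = submit_prep(o);
	if (ret < 0)
		return ret;
	if (ret > 0)
		return 0;
	submit();
	return 0;
}
```

The point of the convention is that a caller queueing to a workqueue can run only `submit_prep()` in atomic context and defer `submit()`.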
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_transport.c | 149 ++++++++++++++++++-------
1 file changed, 109 insertions(+), 40 deletions(-)
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5c4adde96d5e..f19b4b81e3b9 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1571,12 +1571,9 @@ transport_generic_map_mem_to_cmd(struct se_cmd *cmd, struct scatterlist *sgl,
}
/**
- * target_submit_cmd_map_sgls - lookup unpacked lun and submit uninitialized
- * se_cmd + use pre-allocated SGL memory.
- *
+ * target_submit_prep - prep cmd for submission to lio core
* @se_cmd: command descriptor to submit
* @se_sess: associated se_sess for endpoint
- * @cdb: pointer to SCSI CDB
* @sense: pointer to SCSI sense buffer
* @unpacked_lun: unpacked LUN to reference for struct se_lun
* @data_length: fabric expected data transfer length
@@ -1592,26 +1589,29 @@ transport_generic_map_mem_to_cmd(struct se_cmd *cmd, struct scatterlist *sgl,
*
* Task tags are supported if the caller has set @se_cmd->tag.
*
- * Returns non zero to signal active I/O shutdown failure. All other
- * setup exceptions will be returned as a SCSI CHECK_CONDITION response,
- * but still return zero here.
+ * Returns:
+ * - less than zero to signal active I/O shutdown failure
+ * - zero on success.
+ * - one for all other setup exceptions. The cmd will be returned as a
+ * SCSI CHECK_CONDITION response in this case.
*
- * This may only be called from process context, and also currently
- * assumes internal allocation of fabric payload buffer by target-core.
+ * This may only be called from interrupt context if the caller's
+ * queue_status and release_cmd callouts do not block.
+ *
+ * This assumes internal allocation of fabric payload buffer by target-core.
*/
-int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess,
- unsigned char *cdb, unsigned char *sense, u64 unpacked_lun,
- u32 data_length, int task_attr, int data_dir, int flags,
- struct scatterlist *sgl, u32 sgl_count,
- struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
- struct scatterlist *sgl_prot, u32 sgl_prot_count)
+static int
+target_submit_prep(struct se_cmd *se_cmd, struct se_session *se_sess,
+ unsigned char *sense, u64 unpacked_lun,
+ u32 data_length, int task_attr, int data_dir, int flags,
+ struct scatterlist *sgl, u32 sgl_count,
+ struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+ struct scatterlist *sgl_prot, u32 sgl_prot_count)
{
struct se_portal_group *se_tpg;
sense_reason_t rc;
int ret;
- might_sleep();
-
se_tpg = se_sess->se_tpg;
BUG_ON(!se_tpg);
BUG_ON(se_cmd->se_tfo || se_cmd->se_sess);
@@ -1642,14 +1642,6 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
*/
if (flags & TARGET_SCF_BIDI_OP)
se_cmd->se_cmd_flags |= SCF_BIDI;
-
- rc = target_cmd_init_cdb(se_cmd, cdb);
- if (rc) {
- transport_send_check_condition_and_sense(se_cmd, rc, 0);
- target_put_sess_cmd(se_cmd);
- return 0;
- }
-
/*
* Locate se_lun pointer and attach it to struct se_cmd
*/
@@ -1657,13 +1649,7 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
if (rc) {
transport_send_check_condition_and_sense(se_cmd, rc, 0);
target_put_sess_cmd(se_cmd);
- return 0;
- }
-
- rc = target_cmd_parse_cdb(se_cmd);
- if (rc != 0) {
- transport_generic_request_failure(se_cmd, rc);
- return 0;
+ return 1;
}
/*
@@ -1684,6 +1670,43 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
if (sgl_count != 0) {
BUG_ON(!sgl);
+ rc = transport_generic_map_mem_to_cmd(se_cmd, sgl, sgl_count,
+ sgl_bidi, sgl_bidi_count);
+ if (rc != 0) {
+ transport_generic_request_failure(se_cmd, rc);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * target_submit - perform final initialization and submit cmd to LIO core
+ * @se_cmd: command descriptor to submit
+ * @cdb: pointer to SCSI CDB
+ *
+ * target_submit_prep must have been called on the cmd, and this must be
+ * called from process context.
+ */
+static void target_submit(struct se_cmd *se_cmd, unsigned char *cdb)
+{
+ struct scatterlist *sgl = se_cmd->t_data_sg;
+ unsigned char *buf = NULL;
+ sense_reason_t rc;
+
+ might_sleep();
+
+ rc = target_cmd_init_cdb(se_cmd, cdb);
+ if (rc)
+ goto fail;
+
+ rc = target_cmd_parse_cdb(se_cmd);
+ if (rc != 0)
+ goto fail;
+
+ if (se_cmd->t_data_nents != 0) {
+ BUG_ON(!sgl);
/*
* A work-around for tcm_loop as some userspace code via
* scsi-generic do not memset their associated read buffers,
@@ -1694,8 +1717,6 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
*/
if (!(se_cmd->se_cmd_flags & SCF_SCSI_DATA_CDB) &&
se_cmd->data_direction == DMA_FROM_DEVICE) {
- unsigned char *buf = NULL;
-
if (sgl)
buf = kmap(sg_page(sgl)) + sgl->offset;
@@ -1705,12 +1726,6 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
}
}
- rc = transport_generic_map_mem_to_cmd(se_cmd, sgl, sgl_count,
- sgl_bidi, sgl_bidi_count);
- if (rc != 0) {
- transport_generic_request_failure(se_cmd, rc);
- return 0;
- }
}
/*
@@ -1720,6 +1735,60 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess
core_alua_check_nonop_delay(se_cmd);
transport_handle_cdb_direct(se_cmd);
+ return;
+
+fail:
+ transport_generic_request_failure(se_cmd, rc);
+}
+
+/**
+ * target_submit_cmd_map_sgls - lookup unpacked lun and submit uninitialized
+ * se_cmd + use pre-allocated SGL memory.
+ *
+ * @se_cmd: command descriptor to submit
+ * @se_sess: associated se_sess for endpoint
+ * @cdb: pointer to SCSI CDB
+ * @sense: pointer to SCSI sense buffer
+ * @unpacked_lun: unpacked LUN to reference for struct se_lun
+ * @data_length: fabric expected data transfer length
+ * @task_attr: SAM task attribute
+ * @data_dir: DMA data direction
+ * @flags: flags for command submission from target_sc_flags_tables
+ * @sgl: struct scatterlist memory for unidirectional mapping
+ * @sgl_count: scatterlist count for unidirectional mapping
+ * @sgl_bidi: struct scatterlist memory for bidirectional READ mapping
+ * @sgl_bidi_count: scatterlist count for bidirectional READ mapping
+ * @sgl_prot: struct scatterlist memory protection information
+ * @sgl_prot_count: scatterlist count for protection information
+ *
+ * Task tags are supported if the caller has set @se_cmd->tag.
+ *
+ * Returns non zero to signal active I/O shutdown failure. All other
+ * setup exceptions will be returned as a SCSI CHECK_CONDITION response,
+ * but still return zero here.
+ *
+ * This may only be called from process context, and also currently
+ * assumes internal allocation of fabric payload buffer by target-core.
+ */
+int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess,
+ unsigned char *cdb, unsigned char *sense, u64 unpacked_lun,
+ u32 data_length, int task_attr, int data_dir, int flags,
+ struct scatterlist *sgl, u32 sgl_count,
+ struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+ struct scatterlist *sgl_prot, u32 sgl_prot_count)
+{
+ int ret;
+
+ ret = target_submit_prep(se_cmd, se_sess, sense, unpacked_lun,
+ data_length, task_attr, data_dir, flags,
+ sgl, sgl_count, sgl_bidi, sgl_bidi_count,
+ sgl_prot, sgl_prot_count);
+ if (ret < 0)
+ return ret;
+ else if (ret > 0)
+ return 0;
+
+ target_submit(se_cmd, cdb);
return 0;
}
EXPORT_SYMBOL(target_submit_cmd_map_sgls);
--
2.25.1
* [PATCH 03/13] target: add workqueue based cmd submission
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
loop and vhost-scsi do their target cmd submission from driver
workqueues. This allows them to avoid an issue where the backend may
block waiting for resources like tags/requests, memory or locks, which
would end up blocking their entire submission path, and in the case of
vhost-scsi both the submission and completion paths.
This patch adds a helper drivers can use to submit from a lio
workqueue. This code will then be extended in the next patches to
fix the plugging of backend devices.
Note: I'm only converting vhost/loop initially, but the workqueue
based submission will work for other drivers and has similar
benefits, since the main target loops will not end up blocking on
some backend resource. I'll port the other drivers when I have more
time to test, as I think we might want to make this configurable for
some of them.
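The queueing pattern used by target_queued_submit_work can be sketched in userspace C (a plain pointer stands in for the kernel's atomic llist; the real kernel push uses cmpxchg, which is elided here): producers push at the head, the worker detaches everything in one shot like llist_del_all, then reverses the list like llist_reverse_order so cmds execute in submission order.

```c
#include <assert.h>
#include <stddef.h>

/* Singly-linked node, standing in for struct llist_node. */
struct node {
	int id;
	struct node *next;
};

/* Producer side: push at the head (lock-free via cmpxchg in the kernel). */
static void push(struct node **head, struct node *n)
{
	n->next = *head;
	*head = n;
}

/* Worker side: detach the whole pending list at once (llist_del_all). */
static struct node *del_all(struct node **head)
{
	struct node *list = *head;
	*head = NULL;
	return list;
}

/* Head insertion yields LIFO order, so reverse it (llist_reverse_order)
 * before processing, giving FIFO submission order. */
static struct node *reverse(struct node *list)
{
	struct node *rev = NULL;
	while (list) {
		struct node *next = list->next;
		list->next = rev;
		rev = list;
		list = next;
	}
	return rev;
}
```

An empty return from `del_all()` corresponds to the "previous call took what we were queued to submit" early return in the work function.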
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_device.c | 10 ++-
drivers/target/target_core_internal.h | 1 +
drivers/target/target_core_transport.c | 95 +++++++++++++++++++++++++-
include/target/target_core_base.h | 8 ++-
include/target/target_core_fabric.h | 7 ++
5 files changed, 117 insertions(+), 4 deletions(-)
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 7787c527aad3..74d3a4896588 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -735,8 +735,14 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
dev->queue_cnt = nr_cpu_ids;
for (i = 0; i < dev->queue_cnt; i++) {
- INIT_LIST_HEAD(&dev->queues[i].state_list);
- spin_lock_init(&dev->queues[i].lock);
+ struct se_device_queue *q;
+
+ q = &dev->queues[i];
+ INIT_LIST_HEAD(&q->state_list);
+ spin_lock_init(&q->lock);
+
+ init_llist_head(&q->sq.cmd_list);
+ INIT_WORK(&q->sq.work, target_queued_submit_work);
}
dev->se_hba = hba;
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index e7b3c6e5d574..56f841fd7f04 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -153,6 +153,7 @@ void target_qf_do_work(struct work_struct *work);
bool target_check_wce(struct se_device *dev);
bool target_check_fua(struct se_device *dev);
void __target_execute_cmd(struct se_cmd *, bool);
+void target_queued_submit_work(struct work_struct *work);
/* target_core_stat.c */
void target_stat_setup_dev_default_groups(struct se_device *);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index f19b4b81e3b9..5f33730e3d27 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -41,6 +41,7 @@
#include <trace/events/target.h>
static struct workqueue_struct *target_completion_wq;
+static struct workqueue_struct *target_submission_wq;
static struct kmem_cache *se_sess_cache;
struct kmem_cache *se_ua_cache;
struct kmem_cache *t10_pr_reg_cache;
@@ -129,8 +130,15 @@ int init_se_kmem_caches(void)
if (!target_completion_wq)
goto out_free_lba_map_mem_cache;
+ target_submission_wq = alloc_workqueue("target_submission",
+ WQ_MEM_RECLAIM, 0);
+ if (!target_submission_wq)
+ goto out_free_completion_wq;
+
return 0;
+out_free_completion_wq:
+ destroy_workqueue(target_completion_wq);
out_free_lba_map_mem_cache:
kmem_cache_destroy(t10_alua_lba_map_mem_cache);
out_free_lba_map_cache:
@@ -153,6 +161,7 @@ int init_se_kmem_caches(void)
void release_se_kmem_caches(void)
{
+ destroy_workqueue(target_submission_wq);
destroy_workqueue(target_completion_wq);
kmem_cache_destroy(se_sess_cache);
kmem_cache_destroy(se_ua_cache);
@@ -1380,7 +1389,6 @@ void transport_init_se_cmd(
{
INIT_LIST_HEAD(&cmd->se_delayed_node);
INIT_LIST_HEAD(&cmd->se_qf_node);
- INIT_LIST_HEAD(&cmd->se_cmd_list);
INIT_LIST_HEAD(&cmd->state_list);
init_completion(&cmd->t_transport_stop_comp);
cmd->free_compl = NULL;
@@ -1827,6 +1835,91 @@ int target_submit_cmd(struct se_cmd *se_cmd, struct se_session *se_sess,
}
EXPORT_SYMBOL(target_submit_cmd);
+void target_queued_submit_work(struct work_struct *work)
+{
+ struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
+ struct se_cmd *se_cmd, *next_cmd;
+ struct llist_node *cmd_list;
+ unsigned char *cdb;
+
+ cmd_list = llist_del_all(&sq->cmd_list);
+ if (!cmd_list)
+ /* Previous call took what we were queued to submit */
+ return;
+
+ cmd_list = llist_reverse_order(cmd_list);
+ llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
+ cdb = se_cmd->se_tfo->get_cdb(se_cmd);
+ target_submit(se_cmd, cdb);
+ }
+}
+
+static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
+ int cpu)
+{
+ llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
+ queue_work_on(cpu, target_submission_wq, &q->work);
+}
+
+/**
+ * target_queue_cmd_submit - queue the cmd to run on the LIO workqueue
+ * @se_cmd: command descriptor to submit
+ * @se_sess: associated se_sess for endpoint
+ * @sense: pointer to SCSI sense buffer
+ * @unpacked_lun: unpacked LUN to reference for struct se_lun
+ * @data_length: fabric expected data transfer length
+ * @task_attr: SAM task attribute
+ * @data_dir: DMA data direction
+ * @flags: flags for command submission from target_sc_flags_tables
+ * @sgl: struct scatterlist memory for unidirectional mapping
+ * @sgl_count: scatterlist count for unidirectional mapping
+ * @sgl_bidi: struct scatterlist memory for bidirectional READ mapping
+ * @sgl_bidi_count: scatterlist count for bidirectional READ mapping
+ * @sgl_prot: struct scatterlist memory protection information
+ * @sgl_prot_count: scatterlist count for protection information
+ *
+ * Task tags are supported if the caller has set @se_cmd->tag.
+ *
+ * Returns non zero to signal active I/O shutdown failure. All other
+ * setup exceptions will be returned as a SCSI CHECK_CONDITION response,
+ * but still return zero here.
+ *
+ * This may only be called from interrupt context if the caller's
+ * queue_status and release_cmd callouts do not block.
+ *
+ * This assumes internal allocation of fabric payload buffer by target-core.
+ *
+ * Callers must implement the target_core_fabric_ops->get_cdb callback.
+ */
+int
+target_queue_cmd_submit(struct se_cmd *se_cmd, struct se_session *se_sess,
+ unsigned char *sense, u64 unpacked_lun, u32 data_length,
+ int task_attr, int data_dir, int flags,
+ struct scatterlist *sgl, u32 sgl_count,
+ struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+ struct scatterlist *sgl_prot, u32 sgl_prot_count)
+{
+ struct se_device *se_dev;
+ int cpu, ret;
+
+ BUG_ON(!se_sess->se_tpg->se_tpg_tfo->get_cdb);
+
+ ret = target_submit_prep(se_cmd, se_sess, sense, unpacked_lun,
+ data_length, task_attr, data_dir, flags,
+ sgl, sgl_count, sgl_bidi, sgl_bidi_count,
+ sgl_prot, sgl_prot_count);
+ if (ret < 0)
+ return ret;
+ else if (ret > 0)
+ return 0;
+
+ cpu = se_cmd->cpuid;
+ se_dev = se_cmd->se_dev;
+ target_queue_cmd_work(&se_dev->queues[cpu].sq, se_cmd, cpu);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
+
static void target_complete_tmr_failure(struct work_struct *work)
{
struct se_cmd *se_cmd = container_of(work, struct se_cmd, work);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 63dd12124139..815de4c97230 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -487,7 +487,7 @@ struct se_cmd {
/* Only used for internal passthrough and legacy TCM fabric modules */
struct se_session *se_sess;
struct se_tmr_req *se_tmr_req;
- struct list_head se_cmd_list;
+ struct llist_node se_cmd_list;
struct completion *free_compl;
struct completion *abrt_compl;
const struct target_core_fabric_ops *se_tfo;
@@ -764,9 +764,15 @@ struct se_dev_stat_grps {
struct config_group scsi_lu_group;
};
+struct se_cmd_queue {
+ struct llist_head cmd_list;
+ struct work_struct work;
+};
+
struct se_device_queue {
struct list_head state_list;
spinlock_t lock;
+ struct se_cmd_queue sq;
};
struct se_device {
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index d60a3eb7517a..d7b54f3e04da 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -80,6 +80,7 @@ struct target_core_fabric_ops {
int (*queue_status)(struct se_cmd *);
void (*queue_tm_rsp)(struct se_cmd *);
void (*aborted_task)(struct se_cmd *);
+ unsigned char *(*get_cdb)(struct se_cmd *);
/*
* fabric module calls for target_core_fabric_configfs.c
*/
@@ -160,6 +161,12 @@ int target_submit_cmd_map_sgls(struct se_cmd *, struct se_session *,
struct scatterlist *, u32);
int target_submit_cmd(struct se_cmd *, struct se_session *, unsigned char *,
unsigned char *, u64, u32, int, int, int);
+int target_queue_cmd_submit(struct se_cmd *se_cmd, struct se_session *se_sess,
+ unsigned char *sense, u64 unpacked_lun, u32 data_length,
+ int task_attr, int data_dir, int flags, struct scatterlist *sgl,
+ u32 sgl_count, struct scatterlist *sgl_bidi, u32 sgl_bidi_count,
+ struct scatterlist *sgl_prot, u32 sgl_prot_count);
+
int target_submit_tmr(struct se_cmd *se_cmd, struct se_session *se_sess,
unsigned char *sense, u64 unpacked_lun,
void *fabric_tmr_ptr, unsigned char tm_type,
--
2.25.1
* [PATCH 04/13] vhost scsi: use lio wq cmd submission helper
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
Convert vhost-scsi to use the lio wq cmd submission helper.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/vhost/scsi.c | 49 +++++++++++++++-----------------------------
1 file changed, 17 insertions(+), 32 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4ce9f00ae10e..99909c6f3960 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -85,7 +85,7 @@ struct vhost_scsi_cmd {
/* The number of scatterlists associated with this cmd */
u32 tvc_sgl_count;
u32 tvc_prot_sgl_count;
- /* Saved unpacked SCSI LUN for vhost_scsi_submission_work() */
+ /* Saved unpacked SCSI LUN for vhost_scsi_target_queue_cmd() */
u32 tvc_lun;
/* Pointer to the SGL formatted memory from virtio-scsi */
struct scatterlist *tvc_sgl;
@@ -101,8 +101,6 @@ struct vhost_scsi_cmd {
struct vhost_scsi_nexus *tvc_nexus;
/* The TCM I/O descriptor that is accessed via container_of() */
struct se_cmd tvc_se_cmd;
- /* work item used for cmwq dispatch to vhost_scsi_submission_work() */
- struct work_struct work;
/* Copy of the incoming SCSI command descriptor block (CDB) */
unsigned char tvc_cdb[VHOST_SCSI_MAX_CDB_SIZE];
/* Sense buffer that will be mapped into outgoing status */
@@ -240,8 +238,6 @@ struct vhost_scsi_ctx {
struct iov_iter out_iter;
};
-static struct workqueue_struct *vhost_scsi_workqueue;
-
/* Global spinlock to protect vhost_scsi TPG list for vhost IOCTL access */
static DEFINE_MUTEX(vhost_scsi_mutex);
static LIST_HEAD(vhost_scsi_list);
@@ -782,12 +778,10 @@ static int vhost_scsi_to_tcm_attr(int attr)
return TCM_SIMPLE_TAG;
}
-static void vhost_scsi_submission_work(struct work_struct *work)
+static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
{
- struct vhost_scsi_cmd *cmd =
- container_of(work, struct vhost_scsi_cmd, work);
- struct vhost_scsi_nexus *tv_nexus;
struct se_cmd *se_cmd = &cmd->tvc_se_cmd;
+ struct vhost_scsi_nexus *tv_nexus;
struct scatterlist *sg_ptr, *sg_prot_ptr = NULL;
int rc;
@@ -805,9 +799,9 @@ static void vhost_scsi_submission_work(struct work_struct *work)
tv_nexus = cmd->tvc_nexus;
se_cmd->tag = 0;
- rc = target_submit_cmd_map_sgls(se_cmd, tv_nexus->tvn_se_sess,
- cmd->tvc_cdb, &cmd->tvc_sense_buf[0],
- cmd->tvc_lun, cmd->tvc_exp_data_len,
+ rc = target_queue_cmd_submit(se_cmd, tv_nexus->tvn_se_sess,
+ &cmd->tvc_sense_buf[0], cmd->tvc_lun,
+ cmd->tvc_exp_data_len,
vhost_scsi_to_tcm_attr(cmd->tvc_task_attr),
cmd->tvc_data_direction, TARGET_SCF_ACK_KREF,
sg_ptr, cmd->tvc_sgl_count, NULL, 0, sg_prot_ptr,
@@ -819,6 +813,14 @@ static void vhost_scsi_submission_work(struct work_struct *work)
}
}
+static unsigned char *vhost_scsi_get_cdb(struct se_cmd *se_cmd)
+{
+ struct vhost_scsi_cmd *cmd = container_of(se_cmd, struct vhost_scsi_cmd,
+ tvc_se_cmd);
+
+ return cmd->tvc_cdb;
+}
+
static void
vhost_scsi_send_bad_target(struct vhost_scsi *vs,
struct vhost_virtqueue *vq,
@@ -1132,14 +1134,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
* vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
*/
cmd->tvc_vq_desc = vc.head;
- /*
- * Dispatch cmd descriptor for cmwq execution in process
- * context provided by vhost_scsi_workqueue. This also ensures
- * cmd is executed on the same kworker CPU as this vhost
- * thread to gain positive L2 cache locality effects.
- */
- INIT_WORK(&cmd->work, vhost_scsi_submission_work);
- queue_work(vhost_scsi_workqueue, &cmd->work);
+ vhost_scsi_target_queue_cmd(cmd);
ret = 0;
err:
/*
@@ -2466,6 +2461,7 @@ static const struct target_core_fabric_ops vhost_scsi_ops = {
.queue_status = vhost_scsi_queue_status,
.queue_tm_rsp = vhost_scsi_queue_tm_rsp,
.aborted_task = vhost_scsi_aborted_task,
+ .get_cdb = vhost_scsi_get_cdb,
/*
* Setup callers for generic logic in target_core_fabric_configfs.c
*/
@@ -2489,17 +2485,9 @@ static int __init vhost_scsi_init(void)
" on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname,
utsname()->machine);
- /*
- * Use our own dedicated workqueue for submitting I/O into
- * target core to avoid contention within system_wq.
- */
- vhost_scsi_workqueue = alloc_workqueue("vhost_scsi", 0, 0);
- if (!vhost_scsi_workqueue)
- goto out;
-
ret = vhost_scsi_register();
if (ret < 0)
- goto out_destroy_workqueue;
+ goto out;
ret = target_register_template(&vhost_scsi_ops);
if (ret < 0)
@@ -2509,8 +2497,6 @@ static int __init vhost_scsi_init(void)
out_vhost_scsi_deregister:
vhost_scsi_deregister();
-out_destroy_workqueue:
- destroy_workqueue(vhost_scsi_workqueue);
out:
return ret;
};
@@ -2519,7 +2505,6 @@ static void vhost_scsi_exit(void)
{
target_unregister_template(&vhost_scsi_ops);
vhost_scsi_deregister();
- destroy_workqueue(vhost_scsi_workqueue);
};
MODULE_DESCRIPTION("VHOST_SCSI series fabric driver");
--
2.25.1
* [PATCH 05/13] tcm loop: use blk cmd allocator for se_cmds
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
This has tcm loop use the block layer cmd allocator for se_cmds
instead of the tcm_loop_cmd_cache. In future patches, once we can use
the host tags for internal requests like TMFs, we can kill the
tcm_loop_cmd_cache completely.
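The allocation scheme this switches to can be sketched in userspace C (a toy model of `.cmd_size`/`scsi_cmd_priv()`; the struct here is not the real scsi_cmnd layout): the midlayer allocates the driver-private area inline right after each command, so queuecommand only has to memset it instead of taking a GFP_ATOMIC slab allocation that can fail.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the midlayer command header; when the host template
 * sets .cmd_size, a private area of that many bytes follows it. */
struct scsi_cmnd_hdr {
	int tag;
	/* cmd_size bytes of driver-private data follow */
};

/* Stand-in for the driver's per-command state (tcm_loop_cmd). */
struct loop_cmd {
	int sc_cmd_tag;
};

/* What scsi_cmd_priv() does: the private data sits right after the cmd. */
static void *cmd_priv(struct scsi_cmnd_hdr *sc)
{
	return sc + 1;
}

/* Midlayer-side allocation: one block for cmd + private area. */
static struct scsi_cmnd_hdr *alloc_cmd_with_priv(size_t priv_size)
{
	return calloc(1, sizeof(struct scsi_cmnd_hdr) + priv_size);
}
```

Because the private area comes preallocated with the command, the DID_ERROR fallback for a failed kmem_cache_zalloc() in queuecommand goes away.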
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/loopback/tcm_loop.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index badba437e5f9..274826a2b0bd 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -67,8 +67,12 @@ static void tcm_loop_release_cmd(struct se_cmd *se_cmd)
{
struct tcm_loop_cmd *tl_cmd = container_of(se_cmd,
struct tcm_loop_cmd, tl_se_cmd);
+ struct scsi_cmnd *sc = tl_cmd->sc;
- kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
+ if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
+ kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
+ else
+ sc->scsi_done(sc);
}
static int tcm_loop_show_info(struct seq_file *m, struct Scsi_Host *host)
@@ -165,7 +169,6 @@ static void tcm_loop_submission_work(struct work_struct *work)
return;
out_done:
- kmem_cache_free(tcm_loop_cmd_cache, tl_cmd);
sc->scsi_done(sc);
}
@@ -175,20 +178,14 @@ static void tcm_loop_submission_work(struct work_struct *work)
*/
static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
{
- struct tcm_loop_cmd *tl_cmd;
+ struct tcm_loop_cmd *tl_cmd = scsi_cmd_priv(sc);
pr_debug("%s() %d:%d:%d:%llu got CDB: 0x%02x scsi_buf_len: %u\n",
__func__, sc->device->host->host_no, sc->device->id,
sc->device->channel, sc->device->lun, sc->cmnd[0],
scsi_bufflen(sc));
- tl_cmd = kmem_cache_zalloc(tcm_loop_cmd_cache, GFP_ATOMIC);
- if (!tl_cmd) {
- set_host_byte(sc, DID_ERROR);
- sc->scsi_done(sc);
- return 0;
- }
-
+ memset(tl_cmd, 0, sizeof(*tl_cmd));
tl_cmd->sc = sc;
tl_cmd->sc_cmd_tag = sc->request->tag;
INIT_WORK(&tl_cmd->work, tcm_loop_submission_work);
@@ -320,6 +317,7 @@ static struct scsi_host_template tcm_loop_driver_template = {
.dma_boundary = PAGE_SIZE - 1,
.module = THIS_MODULE,
.track_queue_depth = 1,
+ .cmd_size = sizeof(struct tcm_loop_cmd),
};
static int tcm_loop_driver_probe(struct device *dev)
@@ -580,7 +578,6 @@ static int tcm_loop_queue_data_or_status(const char *func,
if ((se_cmd->se_cmd_flags & SCF_OVERFLOW_BIT) ||
(se_cmd->se_cmd_flags & SCF_UNDERFLOW_BIT))
scsi_set_resid(sc, se_cmd->residual_count);
- sc->scsi_done(sc);
return 0;
}
--
2.25.1
* [PATCH 06/13] tcm loop: use lio wq cmd submission helper
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
Convert loop to use the lio wq cmd submission helper.
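The get_cdb callout this relies on follows the usual fabric-ops pattern, sketched here in userspace C (struct names are stubs, not the real se_cmd/fabric ops): the driver command embeds the core descriptor, and the callback recovers the outer struct with container_of, as tcm_loop_get_cdb does.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stub core descriptor and fabric ops table (illustrative only). */
struct se_cmd_stub { int tag; };

struct fabric_ops {
	unsigned char *(*get_cdb)(struct se_cmd_stub *);
};

/* The driver command embeds the core descriptor, so the callback can
 * walk back to the outer struct and return the saved CDB. */
struct loop_cmd_stub {
	unsigned char cdb[16];
	struct se_cmd_stub se_cmd;
};

static unsigned char *loop_get_cdb(struct se_cmd_stub *se_cmd)
{
	struct loop_cmd_stub *cmd =
		container_of(se_cmd, struct loop_cmd_stub, se_cmd);
	return cmd->cdb;
}

static const struct fabric_ops loop_ops_stub = { .get_cdb = loop_get_cdb };
```

The core only ever holds the se_cmd pointer, so this callback is how the deferred workqueue submission gets back to the fabric-owned CDB bytes.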
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/loopback/tcm_loop.c | 29 ++++++++++++++---------------
drivers/target/loopback/tcm_loop.h | 1 -
2 files changed, 14 insertions(+), 16 deletions(-)
diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 274826a2b0bd..d4b008f5b7a6 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -39,7 +39,6 @@
#define to_tcm_loop_hba(hba) container_of(hba, struct tcm_loop_hba, dev)
-static struct workqueue_struct *tcm_loop_workqueue;
static struct kmem_cache *tcm_loop_cmd_cache;
static int tcm_loop_hba_no_cnt;
@@ -106,10 +105,16 @@ static struct device_driver tcm_loop_driverfs = {
*/
static struct device *tcm_loop_primary;
-static void tcm_loop_submission_work(struct work_struct *work)
+static unsigned char *tcm_loop_get_cdb(struct se_cmd *se_cmd)
+{
+ struct tcm_loop_cmd *tl_cmd = container_of(se_cmd, struct tcm_loop_cmd,
+ tl_se_cmd);
+
+ return tl_cmd->sc->cmnd;
+}
+
+static void tcm_loop_target_queue_cmd(struct tcm_loop_cmd *tl_cmd)
{
- struct tcm_loop_cmd *tl_cmd =
- container_of(work, struct tcm_loop_cmd, work);
struct se_cmd *se_cmd = &tl_cmd->tl_se_cmd;
struct scsi_cmnd *sc = tl_cmd->sc;
struct tcm_loop_nexus *tl_nexus;
@@ -155,7 +160,7 @@ static void tcm_loop_submission_work(struct work_struct *work)
}
se_cmd->tag = tl_cmd->sc_cmd_tag;
- rc = target_submit_cmd_map_sgls(se_cmd, tl_nexus->se_sess, sc->cmnd,
+ rc = target_queue_cmd_submit(se_cmd, tl_nexus->se_sess,
&tl_cmd->tl_sense_buf[0], tl_cmd->sc->device->lun,
transfer_length, TCM_SIMPLE_TAG,
sc->sc_data_direction, 0,
@@ -188,8 +193,8 @@ static int tcm_loop_queuecommand(struct Scsi_Host *sh, struct scsi_cmnd *sc)
memset(tl_cmd, 0, sizeof(*tl_cmd));
tl_cmd->sc = sc;
tl_cmd->sc_cmd_tag = sc->request->tag;
- INIT_WORK(&tl_cmd->work, tcm_loop_submission_work);
- queue_work(tcm_loop_workqueue, &tl_cmd->work);
+
+ tcm_loop_target_queue_cmd(tl_cmd);
return 0;
}
@@ -1146,6 +1151,7 @@ static const struct target_core_fabric_ops loop_ops = {
.queue_status = tcm_loop_queue_status,
.queue_tm_rsp = tcm_loop_queue_tm_rsp,
.aborted_task = tcm_loop_aborted_task,
+ .get_cdb = tcm_loop_get_cdb,
.fabric_make_wwn = tcm_loop_make_scsi_hba,
.fabric_drop_wwn = tcm_loop_drop_scsi_hba,
.fabric_make_tpg = tcm_loop_make_naa_tpg,
@@ -1161,17 +1167,13 @@ static int __init tcm_loop_fabric_init(void)
{
int ret = -ENOMEM;
- tcm_loop_workqueue = alloc_workqueue("tcm_loop", 0, 0);
- if (!tcm_loop_workqueue)
- goto out;
-
tcm_loop_cmd_cache = kmem_cache_create("tcm_loop_cmd_cache",
sizeof(struct tcm_loop_cmd),
__alignof__(struct tcm_loop_cmd),
0, NULL);
if (!tcm_loop_cmd_cache) {
pr_debug("kmem_cache_create() for tcm_loop_cmd_cache failed\n");
- goto out_destroy_workqueue;
+ goto out;
}
ret = tcm_loop_alloc_core_bus();
@@ -1188,8 +1190,6 @@ static int __init tcm_loop_fabric_init(void)
tcm_loop_release_core_bus();
out_destroy_cache:
kmem_cache_destroy(tcm_loop_cmd_cache);
-out_destroy_workqueue:
- destroy_workqueue(tcm_loop_workqueue);
out:
return ret;
}
@@ -1199,7 +1199,6 @@ static void __exit tcm_loop_fabric_exit(void)
target_unregister_template(&loop_ops);
tcm_loop_release_core_bus();
kmem_cache_destroy(tcm_loop_cmd_cache);
- destroy_workqueue(tcm_loop_workqueue);
}
MODULE_DESCRIPTION("TCM loopback virtual Linux/SCSI fabric module");
diff --git a/drivers/target/loopback/tcm_loop.h b/drivers/target/loopback/tcm_loop.h
index d3110909a213..437663b3905c 100644
--- a/drivers/target/loopback/tcm_loop.h
+++ b/drivers/target/loopback/tcm_loop.h
@@ -16,7 +16,6 @@ struct tcm_loop_cmd {
struct scsi_cmnd *sc;
/* The TCM I/O descriptor that is accessed via container_of() */
struct se_cmd tl_se_cmd;
- struct work_struct work;
struct completion tmr_done;
/* Sense buffer that will be mapped into outgoing status */
unsigned char tl_sense_buf[TRANSPORT_SENSE_BUFFER];
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 07/13] target: cleanup cmd flag bits
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (5 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 06/13] tcm loop: use lio wq cmd submission helper Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 08/13] target: fix backend plugging Mike Christie
` (5 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie, Chaitanya Kulkarni
We have a couple of holes in the cmd flags definitions. This cleans
up the definitions to fix that and makes them easier to read.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
include/target/target_core_base.h | 38 +++++++++++++++----------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 815de4c97230..5e6703ca102d 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -127,25 +127,25 @@ enum transport_state_table {
/* Used for struct se_cmd->se_cmd_flags */
enum se_cmd_flags_table {
- SCF_SUPPORTED_SAM_OPCODE = 0x00000001,
- SCF_TRANSPORT_TASK_SENSE = 0x00000002,
- SCF_EMULATED_TASK_SENSE = 0x00000004,
- SCF_SCSI_DATA_CDB = 0x00000008,
- SCF_SCSI_TMR_CDB = 0x00000010,
- SCF_FUA = 0x00000080,
- SCF_SE_LUN_CMD = 0x00000100,
- SCF_BIDI = 0x00000400,
- SCF_SENT_CHECK_CONDITION = 0x00000800,
- SCF_OVERFLOW_BIT = 0x00001000,
- SCF_UNDERFLOW_BIT = 0x00002000,
- SCF_ALUA_NON_OPTIMIZED = 0x00008000,
- SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC = 0x00020000,
- SCF_COMPARE_AND_WRITE = 0x00080000,
- SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC = 0x00200000,
- SCF_ACK_KREF = 0x00400000,
- SCF_USE_CPUID = 0x00800000,
- SCF_TASK_ATTR_SET = 0x01000000,
- SCF_TREAT_READ_AS_NORMAL = 0x02000000,
+ SCF_SUPPORTED_SAM_OPCODE = (1 << 0),
+ SCF_TRANSPORT_TASK_SENSE = (1 << 1),
+ SCF_EMULATED_TASK_SENSE = (1 << 2),
+ SCF_SCSI_DATA_CDB = (1 << 3),
+ SCF_SCSI_TMR_CDB = (1 << 4),
+ SCF_FUA = (1 << 5),
+ SCF_SE_LUN_CMD = (1 << 6),
+ SCF_BIDI = (1 << 7),
+ SCF_SENT_CHECK_CONDITION = (1 << 8),
+ SCF_OVERFLOW_BIT = (1 << 9),
+ SCF_UNDERFLOW_BIT = (1 << 10),
+ SCF_ALUA_NON_OPTIMIZED = (1 << 11),
+ SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC = (1 << 12),
+ SCF_COMPARE_AND_WRITE = (1 << 13),
+ SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC = (1 << 14),
+ SCF_ACK_KREF = (1 << 15),
+ SCF_USE_CPUID = (1 << 16),
+ SCF_TASK_ATTR_SET = (1 << 17),
+ SCF_TREAT_READ_AS_NORMAL = (1 << 18),
};
/*
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 08/13] target: fix backend plugging
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (6 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 07/13] target: cleanup cmd flag bits Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 09/13] target iblock: add backend plug/unplug callouts Mike Christie
` (4 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
target_core_iblock is plugging and unplugging on every command, and this
is causing perf issues for drivers that prefer batched cmds. With the
previous patches we can now take multiple cmds from a fabric driver queue
and then pass them down to the backend drivers in a batch. This patch adds
that support by adding 2 callouts to the backend for plugging and
unplugging the device. The next 2 patches add support for iblock and
tcmu device plugging.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_transport.c | 40 ++++++++++++++++++++++++++
include/target/target_core_backend.h | 2 ++
include/target/target_core_base.h | 4 +++
3 files changed, 46 insertions(+)
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5f33730e3d27..c4bc012fc215 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -1835,10 +1835,42 @@ int target_submit_cmd(struct se_cmd *se_cmd, struct se_session *se_sess,
}
EXPORT_SYMBOL(target_submit_cmd);
+
+static struct se_dev_plug *target_plug_device(struct se_device *se_dev)
+{
+ struct se_dev_plug *se_plug;
+
+ if (!se_dev->transport->plug_device)
+ return NULL;
+
+ se_plug = se_dev->transport->plug_device(se_dev);
+ if (!se_plug)
+ return NULL;
+
+ se_plug->se_dev = se_dev;
+ /*
+ * We have a ref to the lun at this point, but the cmds could
+ * complete before we unplug, so grab a ref to the se_device so we
+ * can call back into the backend.
+ */
+ config_group_get(&se_dev->dev_group);
+ return se_plug;
+}
+
+static void target_unplug_device(struct se_dev_plug *se_plug)
+{
+ struct se_device *se_dev = se_plug->se_dev;
+
+ se_dev->transport->unplug_device(se_plug);
+ config_group_put(&se_dev->dev_group);
+}
+
void target_queued_submit_work(struct work_struct *work)
{
struct se_cmd_queue *sq = container_of(work, struct se_cmd_queue, work);
struct se_cmd *se_cmd, *next_cmd;
+ struct se_dev_plug *se_plug = NULL;
+ struct se_device *se_dev = NULL;
struct llist_node *cmd_list;
unsigned char *cdb;
@@ -1849,9 +1881,17 @@ void target_queued_submit_work(struct work_struct *work)
cmd_list = llist_reverse_order(cmd_list);
llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list) {
+ if (!se_dev) {
+ se_dev = se_cmd->se_dev;
+ se_plug = target_plug_device(se_dev);
+ }
+
cdb = se_cmd->se_tfo->get_cdb(se_cmd);
target_submit(se_cmd, cdb);
}
+
+ if (se_plug)
+ target_unplug_device(se_plug);
}
static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 6336780d83a7..aa5f83e55501 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -34,6 +34,8 @@ struct target_backend_ops {
int (*configure_device)(struct se_device *);
void (*destroy_device)(struct se_device *);
void (*free_device)(struct se_device *device);
+ struct se_dev_plug *(*plug_device)(struct se_device *se_dev);
+ void (*unplug_device)(struct se_dev_plug *se_plug);
ssize_t (*set_configfs_dev_params)(struct se_device *,
const char *, ssize_t);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 5e6703ca102d..b8e0a3250bd0 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -769,6 +769,10 @@ struct se_cmd_queue {
struct work_struct work;
};
+struct se_dev_plug {
+ struct se_device *se_dev;
+};
+
struct se_device_queue {
struct list_head state_list;
spinlock_t lock;
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 09/13] target iblock: add backend plug/unplug callouts
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (7 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 08/13] target: fix backend plugging Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 10/13] target_core_user: " Mike Christie
` (3 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
This patch adds plug/unplug callouts for iblock. For initiator drivers
like iscsi, which want to pass multiple cmds to their xmit thread instead
of one cmd at a time, this increases IOPs by around 10% with vhost-scsi
(combined with the previous patches we can see a total 40-50% increase). For
driver combos like tcm_loop and faster initiators like iser, we
can still see IOPs increase by 20-30% when tcm_loop's nr_hw_queues setting
is also increased.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_iblock.c | 44 ++++++++++++++++++++++++++++-
drivers/target/target_core_iblock.h | 10 +++++++
2 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index 8ed93fd205c7..33c88eca090f 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -61,9 +61,18 @@ static struct se_device *iblock_alloc_device(struct se_hba *hba, const char *nam
return NULL;
}
+ ib_dev->ibd_plug = kcalloc(nr_cpu_ids, sizeof(*ib_dev->ibd_plug),
+ GFP_KERNEL);
+ if (!ib_dev->ibd_plug)
+ goto free_dev;
+
pr_debug( "IBLOCK: Allocated ib_dev for %s\n", name);
return &ib_dev->dev;
+
+free_dev:
+ kfree(ib_dev);
+ return NULL;
}
static int iblock_configure_device(struct se_device *dev)
@@ -171,6 +180,7 @@ static void iblock_dev_call_rcu(struct rcu_head *p)
struct se_device *dev = container_of(p, struct se_device, rcu_head);
struct iblock_dev *ib_dev = IBLOCK_DEV(dev);
+ kfree(ib_dev->ibd_plug);
kfree(ib_dev);
}
@@ -188,6 +198,33 @@ static void iblock_destroy_device(struct se_device *dev)
bioset_exit(&ib_dev->ibd_bio_set);
}
+static struct se_dev_plug *iblock_plug_device(struct se_device *se_dev)
+{
+ struct iblock_dev *ib_dev = IBLOCK_DEV(se_dev);
+ struct iblock_dev_plug *ib_dev_plug;
+
+ /*
+ * Each se_device has a per-cpu work this can be run from. We
+ * shouldn't have multiple threads on the same CPU calling this
+ * at the same time.
+ */
+ ib_dev_plug = &ib_dev->ibd_plug[smp_processor_id()];
+ if (test_and_set_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags))
+ return NULL;
+
+ blk_start_plug(&ib_dev_plug->blk_plug);
+ return &ib_dev_plug->se_plug;
+}
+
+static void iblock_unplug_device(struct se_dev_plug *se_plug)
+{
+ struct iblock_dev_plug *ib_dev_plug = container_of(se_plug,
+ struct iblock_dev_plug, se_plug);
+
+ blk_finish_plug(&ib_dev_plug->blk_plug);
+ clear_bit(IBD_PLUGF_PLUGGED, &ib_dev_plug->flags);
+}
+
static unsigned long long iblock_emulate_read_cap_with_block_size(
struct se_device *dev,
struct block_device *bd,
@@ -337,7 +374,10 @@ static void iblock_submit_bios(struct bio_list *list)
{
struct blk_plug plug;
struct bio *bio;
-
+ /*
+ * The block layer handles nested plugs, so just plug/unplug to handle
+ * fabric drivers that don't support batching and multi-bio cmds.
+ */
blk_start_plug(&plug);
while ((bio = bio_list_pop(list)))
submit_bio(bio);
@@ -870,6 +910,8 @@ static const struct target_backend_ops iblock_ops = {
.configure_device = iblock_configure_device,
.destroy_device = iblock_destroy_device,
.free_device = iblock_free_device,
+ .plug_device = iblock_plug_device,
+ .unplug_device = iblock_unplug_device,
.parse_cdb = iblock_parse_cdb,
.set_configfs_dev_params = iblock_set_configfs_dev_params,
.show_configfs_dev_params = iblock_show_configfs_dev_params,
diff --git a/drivers/target/target_core_iblock.h b/drivers/target/target_core_iblock.h
index cefc641145b3..8c55375d2f75 100644
--- a/drivers/target/target_core_iblock.h
+++ b/drivers/target/target_core_iblock.h
@@ -4,6 +4,7 @@
#include <linux/atomic.h>
#include <linux/refcount.h>
+#include <linux/blkdev.h>
#include <target/target_core_base.h>
#define IBLOCK_VERSION "4.0"
@@ -17,6 +18,14 @@ struct iblock_req {
#define IBDF_HAS_UDEV_PATH 0x01
+#define IBD_PLUGF_PLUGGED 0x01
+
+struct iblock_dev_plug {
+ struct se_dev_plug se_plug;
+ struct blk_plug blk_plug;
+ unsigned long flags;
+};
+
struct iblock_dev {
struct se_device dev;
unsigned char ibd_udev_path[SE_UDEV_PATH_LEN];
@@ -24,6 +33,7 @@ struct iblock_dev {
struct bio_set ibd_bio_set;
struct block_device *ibd_bd;
bool ibd_readonly;
+ struct iblock_dev_plug *ibd_plug;
} ____cacheline_aligned;
#endif /* TARGET_CORE_IBLOCK_H */
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 10/13] target_core_user: add backend plug/unplug callouts
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (8 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 09/13] target iblock: add backend plug/unplug callouts Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
` (2 subsequent siblings)
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
This patch adds plug/unplug callouts for tcmu, so we can reduce the
number of times we switch to userspace. Using this driver with tcm
loop is a common config, and depending on the nr_hw_queues setting
(nr_hw_queues=1 performs much better) and the fio jobs (lower numjobs,
around 4) this patch only increases IOPs by around 5-10%, because
we hit other issues like the big per-tcmu-device mutex.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_user.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index a5991df23581..a433bda56b89 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -111,6 +111,7 @@ struct tcmu_dev {
struct kref kref;
struct se_device se_dev;
+ struct se_dev_plug se_plug;
char *name;
struct se_hba *hba;
@@ -119,6 +120,7 @@ struct tcmu_dev {
#define TCMU_DEV_BIT_BROKEN 1
#define TCMU_DEV_BIT_BLOCKED 2
#define TCMU_DEV_BIT_TMR_NOTIFY 3
+#define TCM_DEV_BIT_PLUGGED 4
unsigned long flags;
struct uio_info uio_info;
@@ -959,6 +961,25 @@ static uint32_t ring_insert_padding(struct tcmu_dev *udev, size_t cmd_size)
return cmd_head;
}
+static void tcmu_unplug_device(struct se_dev_plug *se_plug)
+{
+ struct se_device *se_dev = se_plug->se_dev;
+ struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+ clear_bit(TCM_DEV_BIT_PLUGGED, &udev->flags);
+ uio_event_notify(&udev->uio_info);
+}
+
+static struct se_dev_plug *tcmu_plug_device(struct se_device *se_dev)
+{
+ struct tcmu_dev *udev = TCMU_DEV(se_dev);
+
+ if (!test_and_set_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+ return &udev->se_plug;
+
+ return NULL;
+}
+
/**
* queue_cmd_ring - queue cmd to ring or internally
* @tcmu_cmd: cmd to queue
@@ -1086,8 +1107,8 @@ static int queue_cmd_ring(struct tcmu_cmd *tcmu_cmd, sense_reason_t *scsi_err)
list_add_tail(&tcmu_cmd->queue_entry, &udev->inflight_queue);
- /* TODO: only if FLUSH and FUA? */
- uio_event_notify(&udev->uio_info);
+ if (!test_bit(TCM_DEV_BIT_PLUGGED, &udev->flags))
+ uio_event_notify(&udev->uio_info);
return 0;
@@ -2840,6 +2861,8 @@ static struct target_backend_ops tcmu_ops = {
.configure_device = tcmu_configure_device,
.destroy_device = tcmu_destroy_device,
.free_device = tcmu_free_device,
+ .unplug_device = tcmu_unplug_device,
+ .plug_device = tcmu_plug_device,
.parse_cdb = tcmu_parse_cdb,
.tmr_notify = tcmu_tmr_notify,
.set_configfs_dev_params = tcmu_set_configfs_dev_params,
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 11/13] target: replace work per cmd in completion path
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (9 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 10/13] target_core_user: " Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion Mike Christie
2021-02-10 4:55 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
Doing a work per cmd can lead to lots of threads being created.
This patch replaces the per-cmd completion work with a per-cpu
list. Combined with the earlier patches this allows tcm loop on top of
initiators like iser to go from around 700K IOPs to 1000K, and it reduces
the number of threads that get created when the system is under heavy
load and hitting the initiator drivers' tagging limits.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_device.c | 3 +
drivers/target/target_core_internal.h | 1 +
drivers/target/target_core_transport.c | 103 ++++++++++++++-----------
include/target/target_core_base.h | 1 +
4 files changed, 65 insertions(+), 43 deletions(-)
diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
index 74d3a4896588..eaa2323843c0 100644
--- a/drivers/target/target_core_device.c
+++ b/drivers/target/target_core_device.c
@@ -743,6 +743,9 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
init_llist_head(&q->sq.cmd_list);
INIT_WORK(&q->sq.work, target_queued_submit_work);
+
+ init_llist_head(&q->cq.cmd_list);
+ INIT_WORK(&q->cq.work, target_queued_compl_work);
}
dev->se_hba = hba;
diff --git a/drivers/target/target_core_internal.h b/drivers/target/target_core_internal.h
index 56f841fd7f04..e54d05ae8dfd 100644
--- a/drivers/target/target_core_internal.h
+++ b/drivers/target/target_core_internal.h
@@ -154,6 +154,7 @@ bool target_check_wce(struct se_device *dev);
bool target_check_fua(struct se_device *dev);
void __target_execute_cmd(struct se_cmd *, bool);
void target_queued_submit_work(struct work_struct *work);
+void target_queued_compl_work(struct work_struct *work);
/* target_core_stat.c */
void target_stat_setup_dev_default_groups(struct se_device *);
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index c4bc012fc215..039016ab7a77 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -55,7 +55,6 @@ static void transport_complete_task_attr(struct se_cmd *cmd);
static void translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason);
static void transport_handle_queue_full(struct se_cmd *cmd,
struct se_device *dev, int err, bool write_pending);
-static void target_complete_ok_work(struct work_struct *work);
int init_se_kmem_caches(void)
{
@@ -732,14 +731,6 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
percpu_ref_put(&lun->lun_ref);
}
-static void target_complete_failure_work(struct work_struct *work)
-{
- struct se_cmd *cmd = container_of(work, struct se_cmd, work);
-
- transport_generic_request_failure(cmd,
- TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
-}
-
/*
* Used when asking transport to copy Sense Data from the underlying
* Linux/SCSI struct scsi_cmnd
@@ -827,11 +818,20 @@ static void target_handle_abort(struct se_cmd *cmd)
transport_cmd_check_stop_to_fabric(cmd);
}
-static void target_abort_work(struct work_struct *work)
+static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
+ int cpu, struct workqueue_struct *wq)
{
- struct se_cmd *cmd = container_of(work, struct se_cmd, work);
+ llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
+ queue_work_on(cpu, wq, &q->work);
+}
- target_handle_abort(cmd);
+static void target_queue_cmd_compl(struct se_cmd *se_cmd)
+{
+ struct se_device *se_dev = se_cmd->se_dev;
+ int cpu = se_cmd->cpuid;
+
+ target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
+ target_completion_wq);
}
static bool target_cmd_interrupted(struct se_cmd *cmd)
@@ -841,8 +841,8 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
if (cmd->transport_state & CMD_T_ABORTED) {
if (cmd->transport_complete_callback)
cmd->transport_complete_callback(cmd, false, &post_ret);
- INIT_WORK(&cmd->work, target_abort_work);
- queue_work(target_completion_wq, &cmd->work);
+
+ target_queue_cmd_compl(cmd);
return true;
} else if (cmd->transport_state & CMD_T_STOP) {
if (cmd->transport_complete_callback)
@@ -857,7 +857,6 @@ static bool target_cmd_interrupted(struct se_cmd *cmd)
/* May be called from interrupt context so must not sleep. */
void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
{
- int success;
unsigned long flags;
if (target_cmd_interrupted(cmd))
@@ -866,25 +865,11 @@ void target_complete_cmd(struct se_cmd *cmd, u8 scsi_status)
cmd->scsi_status = scsi_status;
spin_lock_irqsave(&cmd->t_state_lock, flags);
- switch (cmd->scsi_status) {
- case SAM_STAT_CHECK_CONDITION:
- if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
- success = 1;
- else
- success = 0;
- break;
- default:
- success = 1;
- break;
- }
-
cmd->t_state = TRANSPORT_COMPLETE;
cmd->transport_state |= (CMD_T_COMPLETE | CMD_T_ACTIVE);
spin_unlock_irqrestore(&cmd->t_state_lock, flags);
- INIT_WORK(&cmd->work, success ? target_complete_ok_work :
- target_complete_failure_work);
- queue_work_on(cmd->cpuid, target_completion_wq, &cmd->work);
+ target_queue_cmd_compl(cmd);
}
EXPORT_SYMBOL(target_complete_cmd);
@@ -1894,13 +1879,6 @@ void target_queued_submit_work(struct work_struct *work)
target_unplug_device(se_plug);
}
-static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
- int cpu)
-{
- llist_add(&se_cmd->se_cmd_list, &q->cmd_list);
- queue_work_on(cpu, target_submission_wq, &q->work);
-}
-
/**
* target_queue_cmd_submit - queue the cmd to run on the LIO workqueue
* @se_cmd: command descriptor to submit
@@ -1955,7 +1933,8 @@ target_queue_cmd_submit(struct se_cmd *se_cmd, struct se_session *se_sess,
cpu = se_cmd->cpuid;
se_dev = se_cmd->se_dev;
- target_queue_cmd_work(&se_dev->queues[cpu].sq, se_cmd, cpu);
+ target_queue_cmd_work(&se_dev->queues[cpu].sq, se_cmd, cpu,
+ target_submission_wq);
return 0;
}
EXPORT_SYMBOL_GPL(target_queue_cmd_submit);
@@ -2058,8 +2037,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
cmd->transport_complete_callback(cmd, false, &post_ret);
if (cmd->transport_state & CMD_T_ABORTED) {
- INIT_WORK(&cmd->work, target_abort_work);
- queue_work(target_completion_wq, &cmd->work);
+ target_queue_cmd_compl(cmd);
return;
}
@@ -2484,10 +2462,32 @@ static bool target_read_prot_action(struct se_cmd *cmd)
return false;
}
-static void target_complete_ok_work(struct work_struct *work)
+static void target_complete_cmd_work(struct se_cmd *cmd)
{
- struct se_cmd *cmd = container_of(work, struct se_cmd, work);
- int ret;
+ int ret, success;
+
+ if (cmd->transport_state & CMD_T_ABORTED) {
+ target_handle_abort(cmd);
+ return;
+ }
+
+ switch (cmd->scsi_status) {
+ case SAM_STAT_CHECK_CONDITION:
+ if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
+ success = 1;
+ else
+ success = 0;
+ break;
+ default:
+ success = 1;
+ break;
+ }
+
+ if (!success) {
+ transport_generic_request_failure(cmd,
+ TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+ return;
+ }
/*
* Check if we need to move delayed/dormant tasks from cmds on the
@@ -2629,6 +2629,23 @@ static void target_complete_ok_work(struct work_struct *work)
transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
}
+void target_queued_compl_work(struct work_struct *work)
+{
+ struct se_cmd_queue *cq = container_of(work, struct se_cmd_queue,
+ work);
+ struct se_cmd *se_cmd, *next_cmd;
+ struct llist_node *cmd_list;
+
+ cmd_list = llist_del_all(&cq->cmd_list);
+ if (!cmd_list)
+ /* Previous call took what we were queued to complete */
+ return;
+
+ cmd_list = llist_reverse_order(cmd_list);
+ llist_for_each_entry_safe(se_cmd, next_cmd, cmd_list, se_cmd_list)
+ target_complete_cmd_work(se_cmd);
+}
+
void target_free_sgl(struct scatterlist *sgl, int nents)
{
sgl_free_n_order(sgl, nents, 0);
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index b8e0a3250bd0..f2ba7de59da7 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -777,6 +777,7 @@ struct se_device_queue {
struct list_head state_list;
spinlock_t lock;
struct se_cmd_queue sq;
+ struct se_cmd_queue cq;
};
struct se_device {
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (10 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 11/13] target: replace work per cmd in completion path Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
2021-02-10 4:55 ` [PATCH 13/13] target: flush submission work during TMR processing Mike Christie
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
LIO wants to complete a cmd on the CPU it was submitted on, because
most drivers have per-cpu or per-hw-queue handlers. But for vhost-scsi,
which has a single thread for submissions and completions, this
is not always the best thing to do, since the thread could now be running
on a different CPU, and it conflicts with what the user has set up
in the lower levels with settings like the block layer's rq_affinity,
or, for network block devices, what the user has set up on their nic.
This patch has vhost-scsi tell LIO to complete the cmd on the CPU the
layer below LIO completed the cmd on. We then stop fighting
the block, net, and whatever layer/setting is below us.
With this patch and the previous ones I see an increase in IOPs by about
50% (234K -> 350K) for random 4K workloads like:
fio --filename=/dev/sda --direct=1 --rw=randrw --bs=4k
--ioengine=libaio --iodepth=128 --numjobs=8 --time_based
--group_reporting --runtime=60
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
drivers/target/target_core_transport.c | 10 +++++++++-
drivers/vhost/scsi.c | 3 ++-
include/target/target_core_base.h | 2 ++
3 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 039016ab7a77..0f5f8a2d101a 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -828,7 +828,12 @@ static void target_queue_cmd_work(struct se_cmd_queue *q, struct se_cmd *se_cmd,
static void target_queue_cmd_compl(struct se_cmd *se_cmd)
{
struct se_device *se_dev = se_cmd->se_dev;
- int cpu = se_cmd->cpuid;
+ int cpu;
+
+ if (se_cmd->se_cmd_flags & SCF_IGNORE_CPUID_COMPL)
+ cpu = smp_processor_id();
+ else
+ cpu = se_cmd->cpuid;
target_queue_cmd_work(&se_dev->queues[cpu].cq, se_cmd, cpu,
target_completion_wq);
@@ -1609,6 +1614,9 @@ target_submit_prep(struct se_cmd *se_cmd, struct se_session *se_sess,
BUG_ON(!se_tpg);
BUG_ON(se_cmd->se_tfo || se_cmd->se_sess);
+ if (flags & TARGET_SCF_IGNORE_CPUID_COMPL)
+ se_cmd->se_cmd_flags |= SCF_IGNORE_CPUID_COMPL;
+
if (flags & TARGET_SCF_USE_CPUID)
se_cmd->se_cmd_flags |= SCF_USE_CPUID;
/*
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 99909c6f3960..b9addd5fdd28 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -803,7 +803,8 @@ static void vhost_scsi_target_queue_cmd(struct vhost_scsi_cmd *cmd)
&cmd->tvc_sense_buf[0], cmd->tvc_lun,
cmd->tvc_exp_data_len,
vhost_scsi_to_tcm_attr(cmd->tvc_task_attr),
- cmd->tvc_data_direction, TARGET_SCF_ACK_KREF,
+ cmd->tvc_data_direction,
+ TARGET_SCF_ACK_KREF | TARGET_SCF_IGNORE_CPUID_COMPL,
sg_ptr, cmd->tvc_sgl_count, NULL, 0, sg_prot_ptr,
cmd->tvc_prot_sgl_count);
if (rc < 0) {
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index f2ba7de59da7..bb4ac7e6f560 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -146,6 +146,7 @@ enum se_cmd_flags_table {
SCF_USE_CPUID = (1 << 16),
SCF_TASK_ATTR_SET = (1 << 17),
SCF_TREAT_READ_AS_NORMAL = (1 << 18),
+ SCF_IGNORE_CPUID_COMPL = (1 << 19)
};
/*
@@ -195,6 +196,7 @@ enum target_sc_flags_table {
TARGET_SCF_ACK_KREF = 0x02,
TARGET_SCF_UNKNOWN_SIZE = 0x04,
TARGET_SCF_USE_CPUID = 0x08,
+ TARGET_SCF_IGNORE_CPUID_COMPL = 0x10,
};
/* fabric independent task management function values */
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 13/13] target: flush submission work during TMR processing
2021-02-10 4:55 [PATCH 00/13 V3] target: fix cmd plugging and completion Mike Christie
` (11 preceding siblings ...)
2021-02-10 4:55 ` [PATCH 12/13] target, vhost-scsi: don't switch cpus on completion Mike Christie
@ 2021-02-10 4:55 ` Mike Christie
12 siblings, 0 replies; 22+ messages in thread
From: Mike Christie @ 2021-02-10 4:55 UTC (permalink / raw)
To: bostroesser, Chaitanya.Kulkarni, loberman, martin.petersen,
linux-scsi, target-devel, mst, stefanha
Cc: Mike Christie
If a cmd is on the submission workqueue then the TMR code will
miss it, and will end up returning task not found or success for
lun resets. The fabric driver might then tell the initiator that
the running cmds have been handled when in fact they are about to run.
This patch adds a flush when we are processing TMRs to make sure queued
cmds do not run after the TMR response has been returned.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
drivers/target/target_core_tmr.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 7347285471fa..e7fcbc09f9db 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -124,6 +124,8 @@ void core_tmr_abort_task(
int i;
for (i = 0; i < dev->queue_cnt; i++) {
+ flush_work(&dev->queues[i].sq.work);
+
spin_lock_irqsave(&dev->queues[i].lock, flags);
list_for_each_entry_safe(se_cmd, next, &dev->queues[i].state_list,
state_list) {
@@ -302,6 +304,8 @@ static void core_tmr_drain_state_list(
* in the Control Mode Page.
*/
for (i = 0; i < dev->queue_cnt; i++) {
+ flush_work(&dev->queues[i].sq.work);
+
spin_lock_irqsave(&dev->queues[i].lock, flags);
list_for_each_entry_safe(cmd, next, &dev->queues[i].state_list,
state_list) {
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread