All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd
@ 2022-03-08  0:39 Mike Christie
  2022-03-08  0:39 ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Mike Christie
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-08  0:39 UTC (permalink / raw)
  To: bvanassche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

The following patches allow iscsi to transmit the initial iscsi PDU and
data from the kblockd context similar to how nvme/tcp does it. This has
the benefit that in a lot of our setups, we run
number_of_sessions == CPUs, so the apps performing IO share the CPU with
the iscsi layer. Right now, a lot of times we end up going from kblockd
to the iscsi workqueue which just adds an extra hop. By running from the
kblockd workqueue we see improvements of 20-30% in IOPs and throughput.

I made this a RFC and cc'd Ming and Bart because I wanted to get some
extra feed back on the first patch:

scsi: Allow drivers to set BLK_MQ_F_BLOCKING

because you guys know that code. For example the iscsi layer doesn't use
scsi_host_block so we won't run into problems there. But I wanted to make
sure there were not other possible issues.

The following patches were made over this iscsi patchset

https://lore.kernel.org/linux-scsi/20220308002747.122682-1-michael.christie@oracle.com/T/#t

the scsi patch applies over Linus or Martin's tree.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-08  0:39 [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Mike Christie
@ 2022-03-08  0:39 ` Mike Christie
  2022-03-08  4:57   ` Bart Van Assche
  2022-03-09  0:53   ` Ming Lei
  2022-03-08  0:39 ` [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block Mike Christie
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-08  0:39 UTC (permalink / raw)
  To: bvanassche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

The software iscsi driver's queuecommand can block and taking the extra
hop from kblockd to its workqueue results in a performance hit. Allowing
it to set BLK_MQ_F_BLOCKING and transmit from that context directly
results in a 20-30% improvement in IOPs for workloads like:

fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio
--iodepth=128  --numjobs=1

and for all write workloads.

when using the none scheduler and the app and iscsi bound to the same
CPUs. Throughput tests show similar gains.

This patch adds a new scsi_host_template field so drivers can tell scsi-ml
that they can block so scsi-ml can setup the tag set with the
BLK_MQ_F_BLOCKING flag.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/scsi_lib.c  | 6 ++++--
 include/scsi/scsi_host.h | 4 ++++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0a70aa763a96..a5dbeb9994ae 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1989,6 +1989,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 	tag_set->driver_data = shost;
 	if (shost->host_tagset)
 		tag_set->flags |= BLK_MQ_F_TAG_HCTX_SHARED;
+	if (shost->hostt->queuecommand_blocks)
+		tag_set->flags |= BLK_MQ_F_BLOCKING;
 
 	return blk_mq_alloc_tag_set(tag_set);
 }
@@ -2952,8 +2954,8 @@ scsi_host_block(struct Scsi_Host *shost)
 	}
 
 	/*
-	 * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
-	 * calling synchronize_rcu() once is enough.
+	 * Drivers that use this helper enable blk-mq's BLK_MQ_F_BLOCKING flag
+	 * so calling synchronize_rcu() once is enough.
 	 */
 	WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 72e1a347baa6..0d106dc9309d 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -75,6 +75,10 @@ struct scsi_host_template {
 	 */
 	int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
 
+	/*
+	 * Set To true if the queuecommand function can block.
+	 */
+	bool queuecommand_blocks;
 	/*
 	 * The commit_rqs function is used to trigger a hardware
 	 * doorbell after some requests have been queued with
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block
  2022-03-08  0:39 [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Mike Christie
  2022-03-08  0:39 ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Mike Christie
@ 2022-03-08  0:39 ` Mike Christie
  2022-03-08  4:59   ` Bart Van Assche
  2022-03-08  0:39 ` [RFC PATCH 3/4] scsi: iscsi: Support transmit from queuecommand Mike Christie
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Mike Christie @ 2022-03-08  0:39 UTC (permalink / raw)
  To: bvanassche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

When in the next patches we xmit from kblockd we will want to perform
non-blocking IO so we don't affect other sessions while waiting for memory.
This adds an arg to the task and pdu xmit functions so libiscsi can tell
the drivers that support transmitting from both queuecommand and the
iscsi_q wq when they can't block.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |  2 +-
 drivers/scsi/be2iscsi/be_main.c          |  2 +-
 drivers/scsi/bnx2i/bnx2i_iscsi.c         |  3 ++-
 drivers/scsi/cxgbi/libcxgbi.c            |  2 +-
 drivers/scsi/cxgbi/libcxgbi.h            |  2 +-
 drivers/scsi/iscsi_tcp.c                 | 16 +++++++++++-----
 drivers/scsi/libiscsi.c                  | 16 ++++++++--------
 drivers/scsi/libiscsi_tcp.c              |  5 +++--
 drivers/scsi/qedi/qedi_iscsi.c           |  2 +-
 drivers/scsi/qla4xxx/ql4_os.c            |  4 ++--
 include/scsi/libiscsi_tcp.h              |  2 +-
 include/scsi/scsi_transport_iscsi.h      |  4 ++--
 12 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 07e47021a71f..244a4540dbf6 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -317,7 +317,7 @@ static int iscsi_iser_task_xmit_unsol_data(struct iscsi_conn *conn,
  *
  * Return: zero on success or escalates $error on failure.
  */
-static int iscsi_iser_task_xmit(struct iscsi_task *task)
+static int iscsi_iser_task_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct iscsi_conn *conn = task->conn;
 	struct iscsi_iser_task *iser_task = task->dd_data;
diff --git a/drivers/scsi/be2iscsi/be_main.c b/drivers/scsi/be2iscsi/be_main.c
index 36fddce2786d..fbd0c6981097 100644
--- a/drivers/scsi/be2iscsi/be_main.c
+++ b/drivers/scsi/be2iscsi/be_main.c
@@ -4745,7 +4745,7 @@ static int beiscsi_mtask(struct iscsi_task *task)
 	return 0;
 }
 
-static int beiscsi_task_xmit(struct iscsi_task *task)
+static int beiscsi_task_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct beiscsi_io_task *io_task = task->dd_data;
 	struct scsi_cmnd *sc = task->sc;
diff --git a/drivers/scsi/bnx2i/bnx2i_iscsi.c b/drivers/scsi/bnx2i/bnx2i_iscsi.c
index a592ca8602f9..d887157c1c02 100644
--- a/drivers/scsi/bnx2i/bnx2i_iscsi.c
+++ b/drivers/scsi/bnx2i/bnx2i_iscsi.c
@@ -1213,10 +1213,11 @@ bnx2i_mtask_xmit(struct iscsi_conn *conn, struct iscsi_task *task)
 /**
  * bnx2i_task_xmit - transmit iscsi command to chip for further processing
  * @task:	transport layer command structure pointer
+ * @dontwait:	true if the driver should not block
  *
  * maps SG buffers and send request to chip/firmware in the form of SQ WQE
  */
-static int bnx2i_task_xmit(struct iscsi_task *task)
+static int bnx2i_task_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct iscsi_conn *conn = task->conn;
 	struct iscsi_session *session = conn->session;
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 411b0d386fad..6fc6cf1a0090 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -2358,7 +2358,7 @@ static int cxgbi_sock_send_skb(struct cxgbi_sock *csk, struct sk_buff *skb)
 	return len;
 }
 
-int cxgbi_conn_xmit_pdu(struct iscsi_task *task)
+int cxgbi_conn_xmit_pdu(struct iscsi_task *task, bool dontwait)
 {
 	struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
 	struct cxgbi_conn *cconn = tcp_conn->dd_data;
diff --git a/drivers/scsi/cxgbi/libcxgbi.h b/drivers/scsi/cxgbi/libcxgbi.h
index 3687b5c0cf90..a852dc31171a 100644
--- a/drivers/scsi/cxgbi/libcxgbi.h
+++ b/drivers/scsi/cxgbi/libcxgbi.h
@@ -604,7 +604,7 @@ void cxgbi_conn_tx_open(struct cxgbi_sock *);
 void cxgbi_conn_pdu_ready(struct cxgbi_sock *);
 int cxgbi_conn_alloc_pdu(struct iscsi_task *, u8);
 int cxgbi_conn_init_pdu(struct iscsi_task *, unsigned int , unsigned int);
-int cxgbi_conn_xmit_pdu(struct iscsi_task *);
+int cxgbi_conn_xmit_pdu(struct iscsi_task *, bool);
 
 void cxgbi_cleanup_task(struct iscsi_task *task);
 
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 974245eab605..c2627505011d 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -279,6 +279,7 @@ iscsi_sw_tcp_conn_restore_callbacks(struct iscsi_conn *conn)
  * iscsi_sw_tcp_xmit_segment - transmit segment
  * @tcp_conn: the iSCSI TCP connection
  * @segment: the buffer to transmnit
+ * @dontwait: true if we should use MSG_DONTWAIT
  *
  * This function transmits as much of the buffer as
  * the network layer will accept, and returns the number of
@@ -289,7 +290,8 @@ iscsi_sw_tcp_conn_restore_callbacks(struct iscsi_conn *conn)
  * it will retrieve the hash value and send it as well.
  */
 static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
-				     struct iscsi_segment *segment)
+				     struct iscsi_segment *segment,
+				     bool dontwait)
 {
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
 	struct socket *sk = tcp_sw_conn->sock;
@@ -308,6 +310,9 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
 		if (segment->total_copied + segment->size < segment->total_size)
 			flags |= MSG_MORE | MSG_SENDPAGE_NOTLAST;
 
+		if (dontwait)
+			flags |= MSG_DONTWAIT;
+
 		/* Use sendpage if we can; else fall back to sendmsg */
 		if (!segment->data) {
 			sg = segment->sg;
@@ -350,8 +355,9 @@ static int iscsi_sw_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
 /**
  * iscsi_sw_tcp_xmit - TCP transmit
  * @conn: iscsi connection
+ * @dontwait: true if we should perform nonblocking IO
  **/
-static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
+static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn, bool dontwait)
 {
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
@@ -360,7 +366,7 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 	int rc = 0;
 
 	while (1) {
-		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment);
+		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment, dontwait);
 		/*
 		 * We may not have been able to send data because the conn
 		 * is getting stopped. libiscsi will know so propagate err
@@ -411,7 +417,7 @@ static inline int iscsi_sw_tcp_xmit_qlen(struct iscsi_conn *conn)
 	return segment->total_copied - segment->total_size;
 }
 
-static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task)
+static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct iscsi_conn *conn = task->conn;
 	unsigned int noreclaim_flag;
@@ -428,7 +434,7 @@ static int iscsi_sw_tcp_pdu_xmit(struct iscsi_task *task)
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	while (iscsi_sw_tcp_xmit_qlen(conn)) {
-		rc = iscsi_sw_tcp_xmit(conn);
+		rc = iscsi_sw_tcp_xmit(conn, dontwait);
 		if (rc == 0) {
 			rc = -EAGAIN;
 			break;
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index fcf5c30614ba..63e0d97df50f 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -812,7 +812,7 @@ static int iscsi_send_mgmt_task(struct iscsi_task *task)
 		if (rc)
 			return rc;
 
-		rc = session->tt->xmit_task(task);
+		rc = session->tt->xmit_task(task, false);
 		if (rc)
 			return rc;
 	} else {
@@ -1498,7 +1498,7 @@ static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn)
 }
 
 static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
-			   bool was_requeue)
+			   bool was_requeue, bool dontwait)
 {
 	int rc;
 
@@ -1539,7 +1539,7 @@ static int iscsi_xmit_task(struct iscsi_conn *conn, struct iscsi_task *task,
 	}
 
 	spin_unlock_bh(&conn->session->frwd_lock);
-	rc = conn->session->tt->xmit_task(task);
+	rc = conn->session->tt->xmit_task(task, dontwait);
 	spin_lock_bh(&conn->session->frwd_lock);
 	if (!rc) {
 		/* done with this task */
@@ -1608,7 +1608,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 	}
 
 	if (conn->task) {
-		rc = iscsi_xmit_task(conn, conn->task, false);
+		rc = iscsi_xmit_task(conn, conn->task, false, false);
 	        if (rc)
 		        goto done;
 	}
@@ -1630,7 +1630,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 			spin_unlock_bh(&conn->session->back_lock);
 			continue;
 		}
-		rc = iscsi_xmit_task(conn, task, false);
+		rc = iscsi_xmit_task(conn, task, false, false);
 		if (rc)
 			goto done;
 	}
@@ -1652,7 +1652,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 				fail_scsi_task(task, DID_ABORT);
 			continue;
 		}
-		rc = iscsi_xmit_task(conn, task, false);
+		rc = iscsi_xmit_task(conn, task, false, false);
 		if (rc)
 			goto done;
 		/*
@@ -1678,7 +1678,7 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 			break;
 
 		list_del_init(&task->running);
-		rc = iscsi_xmit_task(conn, task, true);
+		rc = iscsi_xmit_task(conn, task, true, false);
 		if (rc)
 			goto done;
 		if (!list_empty(&conn->mgmtqueue))
@@ -1843,7 +1843,7 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
 				goto prepd_fault;
 			}
 		}
-		if (session->tt->xmit_task(task)) {
+		if (session->tt->xmit_task(task, true)) {
 			session->cmdsn--;
 			reason = FAILURE_SESSION_NOT_READY;
 			goto prepd_reject;
diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index defe08142b75..2c783862963d 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -1060,12 +1060,13 @@ static struct iscsi_r2t_info *iscsi_tcp_get_curr_r2t(struct iscsi_task *task)
 /**
  * iscsi_tcp_task_xmit - xmit normal PDU task
  * @task: iscsi command task
+ * @dontwait: true if the driver should not wait for wmem space
  *
  * We're expected to return 0 when everything was transmitted successfully,
  * -EAGAIN if there's still data in the queue, or != 0 for any other kind
  * of error.
  */
-int iscsi_tcp_task_xmit(struct iscsi_task *task)
+int iscsi_tcp_task_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct iscsi_conn *conn = task->conn;
 	struct iscsi_session *session = conn->session;
@@ -1074,7 +1075,7 @@ int iscsi_tcp_task_xmit(struct iscsi_task *task)
 
 flush:
 	/* Flush any pending data first. */
-	rc = session->tt->xmit_pdu(task);
+	rc = session->tt->xmit_pdu(task, dontwait);
 	if (rc < 0)
 		return rc;
 
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index 282ecb4e39bb..80a9bd4ef65e 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -827,7 +827,7 @@ static int qedi_mtask_xmit(struct iscsi_conn *conn, struct iscsi_task *task)
 	return qedi_iscsi_send_generic_request(task);
 }
 
-static int qedi_task_xmit(struct iscsi_task *task)
+static int qedi_task_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct iscsi_conn *conn = task->conn;
 	struct qedi_conn *qedi_conn = conn->dd_data;
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 955d8cb675f1..60f8c10c000d 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -137,7 +137,7 @@ qla4xxx_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
 static void qla4xxx_session_destroy(struct iscsi_cls_session *sess);
 static void qla4xxx_task_work(struct work_struct *wdata);
 static int qla4xxx_alloc_pdu(struct iscsi_task *, uint8_t);
-static int qla4xxx_task_xmit(struct iscsi_task *);
+static int qla4xxx_task_xmit(struct iscsi_task *, bool);
 static void qla4xxx_task_cleanup(struct iscsi_task *);
 static void qla4xxx_fail_session(struct iscsi_cls_session *cls_session);
 static void qla4xxx_conn_get_stats(struct iscsi_cls_conn *cls_conn,
@@ -3477,7 +3477,7 @@ static void qla4xxx_task_cleanup(struct iscsi_task *task)
 	return;
 }
 
-static int qla4xxx_task_xmit(struct iscsi_task *task)
+static int qla4xxx_task_xmit(struct iscsi_task *task, bool dontwait)
 {
 	struct scsi_cmnd *sc = task->sc;
 	struct iscsi_session *sess = task->conn->session;
diff --git a/include/scsi/libiscsi_tcp.h b/include/scsi/libiscsi_tcp.h
index 7c8ba9d7378b..9fb97cbfa05c 100644
--- a/include/scsi/libiscsi_tcp.h
+++ b/include/scsi/libiscsi_tcp.h
@@ -88,7 +88,7 @@ extern int iscsi_tcp_recv_skb(struct iscsi_conn *conn, struct sk_buff *skb,
 			      unsigned int offset, bool offloaded, int *status);
 extern void iscsi_tcp_cleanup_task(struct iscsi_task *task);
 extern int iscsi_tcp_task_init(struct iscsi_task *task);
-extern int iscsi_tcp_task_xmit(struct iscsi_task *task);
+extern int iscsi_tcp_task_xmit(struct iscsi_task *task, bool dontwait);
 
 /* segment helpers */
 extern int iscsi_tcp_recv_segment_is_hdr(struct iscsi_tcp_conn *tcp_conn);
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 7a0d24d3b916..d3b37d28e11a 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -108,11 +108,11 @@ struct iscsi_transport {
 			   struct iscsi_stats *stats);
 
 	int (*init_task) (struct iscsi_task *task);
-	int (*xmit_task) (struct iscsi_task *task);
+	int (*xmit_task) (struct iscsi_task *task, bool dontwait);
 	void (*cleanup_task) (struct iscsi_task *task);
 
 	int (*alloc_pdu) (struct iscsi_task *task, uint8_t opcode);
-	int (*xmit_pdu) (struct iscsi_task *task);
+	int (*xmit_pdu) (struct iscsi_task *task, bool dontwait);
 	int (*init_pdu) (struct iscsi_task *task, unsigned int offset,
 			 unsigned int count);
 	void (*parse_pdu_itt) (struct iscsi_conn *conn, itt_t itt,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 3/4] scsi: iscsi: Support transmit from queuecommand
  2022-03-08  0:39 [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Mike Christie
  2022-03-08  0:39 ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Mike Christie
  2022-03-08  0:39 ` [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block Mike Christie
@ 2022-03-08  0:39 ` Mike Christie
  2022-03-08  0:39 ` [RFC PATCH 4/4] scsi: iscsi_tcp: Allow user to control if " Mike Christie
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-08  0:39 UTC (permalink / raw)
  To: bvanassche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

This patch alllow drivers like iscsi_tcp to transmit the task from the
queuecommand. For the case where we were going to run from kblockd and the
app and iscsi share CPUs, we can skip the extra workqueue runs and it can
result in up to 30% increase in 4K IOPs.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/libiscsi.c | 52 +++++++++++++++++++++++++++++++++++------
 include/scsi/libiscsi.h | 11 +++++++--
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 63e0d97df50f..5a2953260a94 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1700,9 +1700,13 @@ static void iscsi_xmitworker(struct work_struct *work)
 	/*
 	 * serialize Xmit worker on a per-connection basis.
 	 */
+	mutex_lock(&conn->xmit_mutex);
+
 	do {
 		rc = iscsi_data_xmit(conn);
 	} while (rc >= 0 || rc == -EAGAIN);
+
+	mutex_unlock(&conn->xmit_mutex);
 }
 
 static inline struct iscsi_task *iscsi_alloc_task(struct iscsi_conn *conn,
@@ -1832,9 +1836,16 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
 		goto reject;
 	}
 
-	if (!ihost->workq) {
+	session->queued_cmdsn++;
+
+	if (!ihost->workq ||
+	    (ihost->xmit_from_qc && !conn->task &&
+	     list_empty(&conn->cmdqueue) && mutex_trylock(&conn->xmit_mutex))) {
 		reason = iscsi_prep_scsi_cmd_pdu(task);
 		if (reason) {
+			if (ihost->workq)
+				mutex_unlock(&conn->xmit_mutex);
+
 			if (reason == -ENOMEM ||  reason == -EACCES) {
 				reason = FAILURE_OOM;
 				goto prepd_reject;
@@ -1843,21 +1854,27 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
 				goto prepd_fault;
 			}
 		}
-		if (session->tt->xmit_task(task, true)) {
-			session->cmdsn--;
-			reason = FAILURE_SESSION_NOT_READY;
-			goto prepd_reject;
+
+		if (!ihost->workq) {
+			if (session->tt->xmit_task(task, true)) {
+				session->cmdsn--;
+				reason = FAILURE_SESSION_NOT_READY;
+				goto prepd_reject;
+			}
+		} else {
+			iscsi_xmit_task(conn, task, false, true);
+			mutex_unlock(&conn->xmit_mutex);
 		}
 	} else {
 		list_add_tail(&task->running, &conn->cmdqueue);
 		iscsi_conn_queue_xmit(conn);
 	}
 
-	session->queued_cmdsn++;
 	spin_unlock_bh(&session->frwd_lock);
 	return 0;
 
 prepd_reject:
+	session->queued_cmdsn--;
 	spin_lock_bh(&session->back_lock);
 	iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
 	spin_unlock_bh(&session->back_lock);
@@ -1868,6 +1885,7 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
 	return SCSI_MLQUEUE_TARGET_BUSY;
 
 prepd_fault:
+	session->queued_cmdsn--;
 	spin_lock_bh(&session->back_lock);
 	iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
 	spin_unlock_bh(&session->back_lock);
@@ -2558,11 +2576,17 @@ int iscsi_eh_device_reset(struct scsi_cmnd *sc)
 
 	iscsi_suspend_tx(conn);
 
+	/*
+	 * If this is called from a SG ioctl the host may not be fully stopped.
+	 * Take the xmit_mutex in case we are xmitting from iscsi_queuecommand.
+	 */
+	mutex_lock(&conn->xmit_mutex);
 	spin_lock_bh(&session->frwd_lock);
 	memset(hdr, 0, sizeof(*hdr));
 	fail_scsi_tasks(conn, sc->device->lun, DID_ERROR);
 	session->tmf_state = TMF_INITIAL;
 	spin_unlock_bh(&session->frwd_lock);
+	mutex_unlock(&conn->xmit_mutex);
 
 	iscsi_start_tx(conn);
 	goto done;
@@ -2720,11 +2744,17 @@ static int iscsi_eh_target_reset(struct scsi_cmnd *sc)
 
 	iscsi_suspend_tx(conn);
 
+	/*
+	 * If this is called from a SG ioctl the host may not be fully stopped.
+	 * Take the xmit_mutex in case we are xmitting from iscsi_queuecommand.
+	 */
+	mutex_lock(&conn->xmit_mutex);
 	spin_lock_bh(&session->frwd_lock);
 	memset(hdr, 0, sizeof(*hdr));
 	fail_scsi_tasks(conn, -1, DID_ERROR);
 	session->tmf_state = TMF_INITIAL;
 	spin_unlock_bh(&session->frwd_lock);
+	mutex_unlock(&conn->xmit_mutex);
 
 	iscsi_start_tx(conn);
 	goto done;
@@ -2909,6 +2939,8 @@ struct Scsi_Host *iscsi_host_alloc(struct scsi_host_template *sht,
 			2, shost->host_no);
 		if (!ihost->workq)
 			goto free_host;
+	} else {
+		ihost->xmit_from_qc = true;
 	}
 
 	spin_lock_init(&ihost->lock);
@@ -3165,6 +3197,7 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, int dd_size,
 	INIT_LIST_HEAD(&conn->cmdqueue);
 	INIT_LIST_HEAD(&conn->requeue);
 	INIT_WORK(&conn->xmitwork, iscsi_xmitworker);
+	mutex_init(&conn->xmit_mutex);
 
 	/* allocate login_task used for the login/text sequences */
 	spin_lock_bh(&session->frwd_lock);
@@ -3395,13 +3428,18 @@ void iscsi_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	}
 
 	/*
-	 * flush queues.
+	 * When flushing the queues we don't wait for the block to complete
+	 * above because it can be expensive when there are lots of devices.
+	 * To make sure there is not an xmit from iscsi_queuecommand running
+	 * take the xmit_mutex.
 	 */
+	mutex_lock(&conn->xmit_mutex);
 	spin_lock_bh(&session->frwd_lock);
 	fail_scsi_tasks(conn, -1, DID_TRANSPORT_DISRUPTED);
 	fail_mgmt_tasks(session, conn);
 	memset(&session->tmhdr, 0, sizeof(session->tmhdr));
 	spin_unlock_bh(&session->frwd_lock);
+	mutex_unlock(&conn->xmit_mutex);
 	mutex_unlock(&session->eh_mutex);
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_stop);
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 412722f44747..1f2accf2bc1b 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -198,6 +198,12 @@ struct iscsi_conn {
 	struct list_head	cmdqueue;	/* data-path cmd queue */
 	struct list_head	requeue;	/* tasks needing another run */
 	struct work_struct	xmitwork;	/* per-conn. xmit workqueue */
+	/*
+	 * This must be held while calling the xmit_task callout if it can
+	 * called from the iscsi_q workqueue. It must be taken after the
+	 * eh_mutex if both mutex's are needed.
+	 */
+	struct mutex		xmit_mutex;
 	/* recv */
 	struct work_struct	recvwork;
 	unsigned long		flags;		/* ISCSI_CONN_FLAGs */
@@ -267,8 +273,8 @@ struct iscsi_session {
 	struct iscsi_cls_session *cls_session;
 	/*
 	 * Syncs up the scsi eh thread with the iscsi eh thread when sending
-	 * task management functions. This must be taken before the session
-	 * and recv lock.
+	 * task management functions. This must be taken before the conn's
+	 * xmit_mutex.
 	 */
 	struct mutex		eh_mutex;
 	/* abort */
@@ -370,6 +376,7 @@ struct iscsi_host {
 	int			num_sessions;
 	int			state;
 
+	bool			xmit_from_qc;
 	struct workqueue_struct	*workq;
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 4/4] scsi: iscsi_tcp: Allow user to control if transmit from queuecommand
  2022-03-08  0:39 [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Mike Christie
                   ` (2 preceding siblings ...)
  2022-03-08  0:39 ` [RFC PATCH 3/4] scsi: iscsi: Support transmit from queuecommand Mike Christie
@ 2022-03-08  0:39 ` Mike Christie
  2022-03-08  5:10   ` Bart Van Assche
       [not found] ` <CGME20220308004023epcas2p12ebd497c14d32f36d0aa6682c0b9d0db@epcms2p7>
  2022-03-15  8:26 ` [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Christoph Hellwig
  5 siblings, 1 reply; 18+ messages in thread
From: Mike Christie @ 2022-03-08  0:39 UTC (permalink / raw)
  To: bvanassche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley
  Cc: Mike Christie

Transmitting from the queuecommand is nice when your app and iscsi threads
have to run on the same CPU. But, for the case where you want higher
throughput/IOPs it's sometimes better to hog multiple CPUs with the app on
one CPU and the xmit/recv paths an another. To allow both configs this
adds a modparam.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
---
 drivers/scsi/iscsi_tcp.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index c2627505011d..c48707b53746 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -56,6 +56,10 @@ static bool iscsi_use_recv_wq;
 module_param_named(use_recv_wq, iscsi_use_recv_wq, bool, 0644);
 MODULE_PARM_DESC(use_recv_wq, "Set to true to read iSCSI data/headers from the iscsi_q workqueue. The default is false which will perform reads from the network softirq context.");
 
+static bool iscsi_xmit_from_qc;
+module_param_named(xmit_from_queuecommand, iscsi_xmit_from_qc, bool, 0644);
+MODULE_PARM_DESC(xmit_from_queuecommand, "Set to true to try to xmit the task from the queuecommand callout. The default is false wihch will xmit the task from the iscsi_q workqueue.");
+
 static int iscsi_sw_tcp_dbg;
 module_param_named(debug_iscsi_tcp, iscsi_sw_tcp_dbg, int,
 		   S_IRUGO | S_IWUSR);
@@ -909,6 +913,7 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
 	struct iscsi_cls_session *cls_session;
 	struct iscsi_session *session;
 	struct iscsi_sw_tcp_host *tcp_sw_host;
+	struct iscsi_host *ihost;
 	struct Scsi_Host *shost;
 	int rc;
 
@@ -928,6 +933,12 @@ iscsi_sw_tcp_session_create(struct iscsi_endpoint *ep, uint16_t cmds_max,
 	shost->max_channel = 0;
 	shost->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
 
+	if (iscsi_xmit_from_qc) {
+		shost->hostt->queuecommand_blocks = true;
+		ihost = shost_priv(shost);
+		ihost->xmit_from_qc = true;
+	}
+
 	rc = iscsi_host_get_max_scsi_cmds(shost, cmds_max);
 	if (rc < 0)
 		goto free_host;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-08  0:39 ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Mike Christie
@ 2022-03-08  4:57   ` Bart Van Assche
  2022-03-09  0:53   ` Ming Lei
  1 sibling, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2022-03-08  4:57 UTC (permalink / raw)
  To: Mike Christie, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

On 3/7/22 16:39, Mike Christie wrote:
> The software iscsi driver's queuecommand can block and taking the extra
> hop from kblockd to its workqueue results in a performance hit. Allowing
> it to set BLK_MQ_F_BLOCKING and transmit from that context directly
> results in a 20-30% improvement in IOPs for workloads like: [...]

That's impressive!

> @@ -2952,8 +2954,8 @@ scsi_host_block(struct Scsi_Host *shost)
>   	}
>   
>   	/*
> -	 * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
> -	 * calling synchronize_rcu() once is enough.
> +	 * Drivers that use this helper enable blk-mq's BLK_MQ_F_BLOCKING flag
> +	 * so calling synchronize_rcu() once is enough.
>   	 */
>   	WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);

s/enable/do not set/ ?

> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 72e1a347baa6..0d106dc9309d 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -75,6 +75,10 @@ struct scsi_host_template {
>   	 */
>   	int (* queuecommand)(struct Scsi_Host *, struct scsi_cmnd *);
>   
> +	/*
> +	 * Set To true if the queuecommand function can block.
> +	 */
> +	bool queuecommand_blocks;
>   	/*
>   	 * The commit_rqs function is used to trigger a hardware
>   	 * doorbell after some requests have been queued with

I'm not sure what the best name is for this new flag. Some function 
names refer to sleeping (e.g. might_sleep()) while the flag 
BLK_MQ_F_BLOCKING has "blocking" in its name. Although I do not have a 
strong opinion about this: has it been considered to use a name like 
queuecommand_sleeps or queuecommand_may_sleep instead of 
queuecommand_blocks?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block
  2022-03-08  0:39 ` [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block Mike Christie
@ 2022-03-08  4:59   ` Bart Van Assche
  2022-03-08 15:58     ` Mike Christie
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2022-03-08  4:59 UTC (permalink / raw)
  To: Mike Christie, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

On 3/7/22 16:39, Mike Christie wrote:
> -static int iscsi_iser_task_xmit(struct iscsi_task *task)
> +static int iscsi_iser_task_xmit(struct iscsi_task *task, bool dontwait)

Arguments with "not" in their name easily lead to double negations. Has 
it been considered to change "bool dontwait" into "bool may_sleep" and 
to inverse the argument passed to this and similar functions?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/4] scsi: iscsi_tcp: Allow user to control if transmit from queuecommand
  2022-03-08  0:39 ` [RFC PATCH 4/4] scsi: iscsi_tcp: Allow user to control if " Mike Christie
@ 2022-03-08  5:10   ` Bart Van Assche
  2022-03-08 16:51     ` Mike Christie
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2022-03-08  5:10 UTC (permalink / raw)
  To: Mike Christie, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

On 3/7/22 16:39, Mike Christie wrote:
> +static bool iscsi_xmit_from_qc;
> +module_param_named(xmit_from_queuecommand, iscsi_xmit_from_qc, bool, 0644);
> +MODULE_PARM_DESC(xmit_from_queuecommand, "Set to true to try to xmit the task from the queuecommand callout. The default is false wihch will xmit the task from the iscsi_q workqueue.");

s/wihch/which/ ?

It may be hard for users to get the value of this parameter right. Has 
it been considered to make the iSCSI initiator select the proper mode 
depending on the load? I think the DPDK and SPDK software supports this. 
This is supported via user-level multithreading and by scaling to more 
CPU cores if required to achieve full performance.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
       [not found] ` <CGME20220308004023epcas2p12ebd497c14d32f36d0aa6682c0b9d0db@epcms2p7>
@ 2022-03-08  7:00   ` Daejun Park
  2022-03-08 16:14     ` Mike Christie
  0 siblings, 1 reply; 18+ messages in thread
From: Daejun Park @ 2022-03-08  7:00 UTC (permalink / raw)
  To: Mike Christie, bvanassche, lduncan, cleech, ming.lei,
	martin.petersen, linux-scsi, james.bottomley

Hi Mike, 

>@@ -2952,8 +2954,8 @@ scsi_host_block(struct Scsi_Host *shost)
>         }
> 
>         /*
>-         * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
>-         * calling synchronize_rcu() once is enough.
>+         * Drivers that use this helper enable blk-mq's BLK_MQ_F_BLOCKING flag
>+         * so calling synchronize_rcu() once is enough.
>          */
>         WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
Should we keep this line?

And I think scsi_host_block() should call synchronize_srcu() rather than
synchronize_rcu(), if the BLK_MQ_F_BLOCKING flag is used.

Thanks,
Daejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block
  2022-03-08  4:59   ` Bart Van Assche
@ 2022-03-08 15:58     ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-08 15:58 UTC (permalink / raw)
  To: Bart Van Assche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

On 3/7/22 10:59 PM, Bart Van Assche wrote:
> On 3/7/22 16:39, Mike Christie wrote:
>> -static int iscsi_iser_task_xmit(struct iscsi_task *task)
>> +static int iscsi_iser_task_xmit(struct iscsi_task *task, bool dontwait)
> 
> Arguments with "not" in their name easily lead to double negations. Has it been considered to change "bool dontwait" into "bool may_sleep" and to inverse the argument passed to this and similar functions?

Yeah, I'm not tied to any naming. I just copied the name from the
network layer's MSG_DONTWAIT flag.

I was going to suggest something with "block" in the name but I
I saw in your other comment you might like something with sleep.

may_sleep is fine with me. For the queuecommand related field,
I'll do a sleep variant too.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-08  7:00   ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Daejun Park
@ 2022-03-08 16:14     ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-08 16:14 UTC (permalink / raw)
  To: daejun7.park, bvanassche, lduncan, cleech, ming.lei,
	martin.petersen, linux-scsi, james.bottomley

On 3/8/22 1:00 AM, Daejun Park wrote:
> Hi Mike, 
> 
>> @@ -2952,8 +2954,8 @@ scsi_host_block(struct Scsi_Host *shost)
>>         }
>>
>>         /*
>> -         * SCSI never enables blk-mq's BLK_MQ_F_BLOCKING flag so
>> -         * calling synchronize_rcu() once is enough.
>> +         * Drivers that use this helper enable blk-mq's BLK_MQ_F_BLOCKING flag
>> +         * so calling synchronize_rcu() once is enough.
>>          */
>>         WARN_ON_ONCE(shost->tag_set.flags & BLK_MQ_F_BLOCKING);
> Should we keep this line?
> 
> And I think scsi_host_block() should call synchronize_srcu() rather than
> synchronize_rcu(), if the BLK_MQ_F_BLOCKING flag is used.
>

Sorry, I messed up the comment.

The comment should have been:

Drivers that set the blk-mq BLK_MQ_F_BLOCKING do not use this function
so calling synchronize_rcu() once is enough.

--------------------------

The iscsi drivers that use BLK_MQ_F_BLOCKING use scsi_target_block which
ends up calling blk_mq_wait_quiesce_done. That function will do the right
thing wrt rcu/srcu based on if the BLK_MQ_F_BLOCKING flag was set when
the scsi host was added.

So in the above comment I meant to express drivers that use BLK_MQ_F_BLOCKING
shouldn't call scsi_host_block and right now the WARN call is valid.

However, I could do something like you are suggesting and do a:

if (shost->hostt->queuecommand_may_sleep_blocks)
	synchronize_srcu()
else
	synchronize_rcu()

However, I don't have any way to test it, so I was a little reluctant
to add that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 4/4] scsi: iscsi_tcp: Allow user to control if transmit from queuecommand
  2022-03-08  5:10   ` Bart Van Assche
@ 2022-03-08 16:51     ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-08 16:51 UTC (permalink / raw)
  To: Bart Van Assche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

On 3/7/22 11:10 PM, Bart Van Assche wrote:
> On 3/7/22 16:39, Mike Christie wrote:
>> +static bool iscsi_xmit_from_qc;
>> +module_param_named(xmit_from_queuecommand, iscsi_xmit_from_qc, bool, 0644);
>> +MODULE_PARM_DESC(xmit_from_queuecommand, "Set to true to try to xmit the task from the queuecommand callout. The default is false wihch will xmit the task from the iscsi_q workqueue.");
> 
> s/wihch/which/ ?
> 
> It may be hard for users to get the value of this parameter right. Has it been considered to make the iSCSI initiator select the proper mode depending on the load? I think the DPDK and SPDK software supports this. This is supported via user-level multithreading and by scaling to more CPU cores if required to achieve full performance.

The problem is guessing what the user is doing and wants.
Users just do a wide range of things for whatever reasons:

1. No tuning at all. Let iscsi, network and app run wild on all CPUs.

2. Pin iscsi to specific CPUs. Let the app run wild. Or pin the
networking and let iscsi and app run wild.

3. Pin iscsi and network to a specific CPU. Let app run wild.

4. Pin everything.

They all have their benefits drawbacks like if you put the network
and iscsi on the same CPU (or put them on different ones), and run
the app on a different one you get the highest performance for a lot
of workloads. Abusing the CPUs overcome the loss of caching. In this
case, you might not want to use xmit_from_queuecommand because we
could be transmitting from the CPU the app is on and it might have
wanted those CPU cycle for it's own work.

However, you might have to put everything on the same CPUs because they
can't interfere with other workloads. In that case you take the perf
hit. In this case xmit_from_queuecommand=true works really well for you.

Note, if you start with xmit_from_queuecommand=true and if the user is
doing a workload that doesn't allow us to transmit from the queucommand
then we do drop back to the old style of doing things. We also do that
for the case where SCSI WRITEs have to be broken up.

So I didn't make it dynamic or change the default so I wouldn't mess
up existing users.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-08  0:39 ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Mike Christie
  2022-03-08  4:57   ` Bart Van Assche
@ 2022-03-09  0:53   ` Ming Lei
  2022-03-09  1:17     ` Mike Christie
  1 sibling, 1 reply; 18+ messages in thread
From: Ming Lei @ 2022-03-09  0:53 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, lduncan, cleech, martin.petersen, linux-scsi,
	james.bottomley

On Mon, Mar 07, 2022 at 06:39:54PM -0600, Mike Christie wrote:
> The software iscsi driver's queuecommand can block and taking the extra
> hop from kblockd to its workqueue results in a performance hit. Allowing
> it to set BLK_MQ_F_BLOCKING and transmit from that context directly
> results in a 20-30% improvement in IOPs for workloads like:
> 
> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio
> --iodepth=128  --numjobs=1
> 
> and for all write workloads.

This single patch shouldn't make any difference for iscsi, so please
make it as last one if performance improvement data is provided
in commit log.

Also is there performance effect for other worloads? such as multiple
jobs? iscsi is SQ hardware, so if driver is blocked in ->queuecommand()
via BLK_MQ_F_BLOCKING, other contexts can't submit IO to scsi ML any more.


thanks,
Ming


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-09  0:53   ` Ming Lei
@ 2022-03-09  1:17     ` Mike Christie
  2022-03-09  1:43       ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Mike Christie @ 2022-03-09  1:17 UTC (permalink / raw)
  To: Ming Lei
  Cc: bvanassche, lduncan, cleech, martin.petersen, linux-scsi,
	james.bottomley

On 3/8/22 6:53 PM, Ming Lei wrote:
> On Mon, Mar 07, 2022 at 06:39:54PM -0600, Mike Christie wrote:
>> The software iscsi driver's queuecommand can block and taking the extra
>> hop from kblockd to its workqueue results in a performance hit. Allowing
>> it to set BLK_MQ_F_BLOCKING and transmit from that context directly
>> results in a 20-30% improvement in IOPs for workloads like:
>>
>> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio
>> --iodepth=128  --numjobs=1
>>
>> and for all write workloads.
> 
> This single patch shouldn't make any difference for iscsi, so please
> make it as last one if performance improvement data is provided
> in commit log.

Ok.

> 
> Also is there performance effect for other worloads? such as multiple
> jobs? iscsi is SQ hardware, so if driver is blocked in ->queuecommand()
> via BLK_MQ_F_BLOCKING, other contexts can't submit IO to scsi ML any more.

If you mean multiple jobs running on the same connection/session then
they are all serialized now. A connection can only do 1 cmd at a time.
There's a big mutex around it in the network layer, so multiple jobs
just suck no matter what.

If you mean multiple jobs from different connection/sessions, then the
iscsi code with this patchset blocks only because the network layer
takes a mutex for a short time. We configure it to not block for things
like socket space, memory allocations, we do zero copy IO normally, etc
so it's quick.

We also can do up to workqueues max_active limit worth of calls so
other things can normally send IO. We haven't found a need to increase
it yet.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-09  1:17     ` Mike Christie
@ 2022-03-09  1:43       ` Ming Lei
  2022-03-09 19:38         ` Mike Christie
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2022-03-09  1:43 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, lduncan, cleech, martin.petersen, linux-scsi,
	james.bottomley

On Tue, Mar 08, 2022 at 07:17:13PM -0600, Mike Christie wrote:
> On 3/8/22 6:53 PM, Ming Lei wrote:
> > On Mon, Mar 07, 2022 at 06:39:54PM -0600, Mike Christie wrote:
> >> The software iscsi driver's queuecommand can block and taking the extra
> >> hop from kblockd to its workqueue results in a performance hit. Allowing
> >> it to set BLK_MQ_F_BLOCKING and transmit from that context directly
> >> results in a 20-30% improvement in IOPs for workloads like:
> >>
> >> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio
> >> --iodepth=128  --numjobs=1
> >>
> >> and for all write workloads.
> > 
> > This single patch shouldn't make any difference for iscsi, so please
> > make it as last one if performance improvement data is provided
> > in commit log.
> 
> Ok.
> 
> > 
> > Also is there performance effect for other worloads? such as multiple
> > jobs? iscsi is SQ hardware, so if driver is blocked in ->queuecommand()
> > via BLK_MQ_F_BLOCKING, other contexts can't submit IO to scsi ML any more.
> 
> If you mean multiple jobs running on the same connection/session then
> they are all serialized now. A connection can only do 1 cmd at a time.
> There's a big mutex around it in the network layer, so multiple jobs
> just suck no matter what.

I guess one block device can only bind to one isci connection, given the
1 cmd per connection limit, so looks multiple jobs is fine.

> 
> If you mean multiple jobs from different connection/sessions, then the
> iscsi code with this patchset blocks only because the network layer
> takes a mutex for a short time. We configure it to not block for things
> like socket space, memory allocations, we do zero copy IO normally, etc
> so it's quick.
> 
> We also can do up to workqueues max_active limit worth of calls so
> other things can normally send IO. We haven't found a need to increase
> it yet.
 
I meant that hctx->run_work is required for blk-mq to dispatch IO, iscsi is
SQ HBA, so there is only single work_struct. If one context is blocked in
->queue_rq or ->queuecommand, other contexts can't submit IO to driver any
more.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING
  2022-03-09  1:43       ` Ming Lei
@ 2022-03-09 19:38         ` Mike Christie
  0 siblings, 0 replies; 18+ messages in thread
From: Mike Christie @ 2022-03-09 19:38 UTC (permalink / raw)
  To: Ming Lei
  Cc: bvanassche, lduncan, cleech, martin.petersen, linux-scsi,
	james.bottomley

On 3/8/22 7:43 PM, Ming Lei wrote:
> On Tue, Mar 08, 2022 at 07:17:13PM -0600, Mike Christie wrote:
>> On 3/8/22 6:53 PM, Ming Lei wrote:
>>> On Mon, Mar 07, 2022 at 06:39:54PM -0600, Mike Christie wrote:
>>>> The software iscsi driver's queuecommand can block and taking the extra
>>>> hop from kblockd to its workqueue results in a performance hit. Allowing
>>>> it to set BLK_MQ_F_BLOCKING and transmit from that context directly
>>>> results in a 20-30% improvement in IOPs for workloads like:
>>>>
>>>> fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=4k --ioengine=libaio
>>>> --iodepth=128  --numjobs=1
>>>>
>>>> and for all write workloads.
>>>
>>> This single patch shouldn't make any difference for iscsi, so please
>>> make it as last one if performance improvement data is provided
>>> in commit log.
>>
>> Ok.
>>
>>>
>>> Also is there performance effect for other worloads? such as multiple
>>> jobs? iscsi is SQ hardware, so if driver is blocked in ->queuecommand()
>>> via BLK_MQ_F_BLOCKING, other contexts can't submit IO to scsi ML any more.
>>
>> If you mean multiple jobs running on the same connection/session then
>> they are all serialized now. A connection can only do 1 cmd at a time.
>> There's a big mutex around it in the network layer, so multiple jobs
>> just suck no matter what.
> 
> I guess one block device can only bind to one isci connection, given the
> 1 cmd per connection limit, so looks multiple jobs is fine.
> 
>>
>> If you mean multiple jobs from different connection/sessions, then the
>> iscsi code with this patchset blocks only because the network layer
>> takes a mutex for a short time. We configure it to not block for things
>> like socket space, memory allocations, we do zero copy IO normally, etc
>> so it's quick.
>>
>> We also can do up to workqueues max_active limit worth of calls so
>> other things can normally send IO. We haven't found a need to increase
>> it yet.
>  
> I meant that hctx->run_work is required for blk-mq to dispatch IO, iscsi is
> SQ HBA, so there is only single work_struct. If one context is blocked in
> ->queue_rq or ->queuecommand, other contexts can't submit IO to driver any
> more.

I see what you mean. With the current code, we have the same issue already.
We have 1 work_struct per connection/session and one connection/session
per scsi_host.

Basically, the iscsi protocol and socket layer only allow us to send the 1
command per connection at a time (you can't have 2 threads doing
sendmsg/sendpage). It's why nvme/tcp is a lot better. It makes N tcp
connections and each hwctx can use a different one.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd
  2022-03-08  0:39 [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Mike Christie
                   ` (4 preceding siblings ...)
       [not found] ` <CGME20220308004023epcas2p12ebd497c14d32f36d0aa6682c0b9d0db@epcms2p7>
@ 2022-03-15  8:26 ` Christoph Hellwig
  2022-03-16  1:08   ` Ming Lei
  5 siblings, 1 reply; 18+ messages in thread
From: Christoph Hellwig @ 2022-03-15  8:26 UTC (permalink / raw)
  To: Mike Christie
  Cc: bvanassche, lduncan, cleech, ming.lei, martin.petersen,
	linux-scsi, james.bottomley

The subject seems a bit misleading, I'd expect to see
BLK_MQ_F_BLOCKING in the subject.

Note that you'll also need the series from Ming to support dm-multipath
on top devices that set BLK_MQ_F_BLOCKING.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd
  2022-03-15  8:26 ` [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Christoph Hellwig
@ 2022-03-16  1:08   ` Ming Lei
  0 siblings, 0 replies; 18+ messages in thread
From: Ming Lei @ 2022-03-16  1:08 UTC (permalink / raw)
  To: Christoph Hellwig, Mike Christie
  Cc: bvanassche, lduncan, cleech, martin.petersen, linux-scsi,
	james.bottomley

On Tue, Mar 15, 2022 at 01:26:40AM -0700, Christoph Hellwig wrote:
> The subject seems a bit misleading, I'd expect to see
> BLK_MQ_F_BLOCKING in the subject.
> 
> Note that you'll also need the series from Ming to support dm-multipath
> on top devices that set BLK_MQ_F_BLOCKING.

Indeed.

Mike, please feel free to fold the following patches into your next
post:

https://lore.kernel.org/linux-block/YcP4FMG9an5ReIiV@T590/#r


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-03-16  1:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-08  0:39 [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Mike Christie
2022-03-08  0:39 ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Mike Christie
2022-03-08  4:57   ` Bart Van Assche
2022-03-09  0:53   ` Ming Lei
2022-03-09  1:17     ` Mike Christie
2022-03-09  1:43       ` Ming Lei
2022-03-09 19:38         ` Mike Christie
2022-03-08  0:39 ` [RFC PATCH 2/4] scsi: iscsi: Tell drivers when we must not block Mike Christie
2022-03-08  4:59   ` Bart Van Assche
2022-03-08 15:58     ` Mike Christie
2022-03-08  0:39 ` [RFC PATCH 3/4] scsi: iscsi: Support transmit from queuecommand Mike Christie
2022-03-08  0:39 ` [RFC PATCH 4/4] scsi: iscsi_tcp: Allow user to control if " Mike Christie
2022-03-08  5:10   ` Bart Van Assche
2022-03-08 16:51     ` Mike Christie
     [not found] ` <CGME20220308004023epcas2p12ebd497c14d32f36d0aa6682c0b9d0db@epcms2p7>
2022-03-08  7:00   ` [RFC PATCH 1/4] scsi: Allow drivers to set BLK_MQ_F_BLOCKING Daejun Park
2022-03-08 16:14     ` Mike Christie
2022-03-15  8:26 ` [RFC PATCH 0/4] scsi/iscsi: Send iscsi data from kblockd Christoph Hellwig
2022-03-16  1:08   ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.