All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads
@ 2014-02-14 11:21 Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 1/5] tgtd: add helper functions for checking iostate of iscsi connections Hitoshi Mitake
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-14 11:21 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

Current tgtd sends and receives iSCSI PDUs in its main event
loop. This design can cause bottleneck when many iSCSI clients connect
to single tgtd process. For example, we need multiple tgtd processes
for utilizing fast network like 10 GbE because typical single
processor core isn't fast enough for processing a bunch of requests.

This patchset lets tgtd offload send/recv iSCSI PDUs and digest checking to
worker threads. The basic strategy of this change is like below:
1. decompose iscsi_[rt]x_handler()
2. re-implement single threaded version of iscsi_tcp_event_handler()
3. implement multi-threaded version of iscsi_tcp_event_handler()

This patch also adds a new option "-T" to specify a number of threads which
send/recieve PDUs. When 1 is passed with the option, the above single threaded
version will be used, because the multi-threaded version incurs overhead which
comes from frequent communication between threads in some case. If users don't
want to use the multi-threaded version, they don't have to use it.

Below is a summary of our performance evaluation:
- 4 physical hosts connected with 10Gbps ethernet
- 1 tgtd process provides 1 logical unit
- 16 VMs read 4GB iso file by dd command in parallel
-- average time required to complete the dd command of 16 VMs is a score
--- original tgtd: 93.718 second
--- changed tgtd (with -T 16): 57.449 second

The above scores show that this patchset can improve performance of parallel
access of initiators. This patchset is not so heavily tested yet. I'd like to
hear your opinion about the design.

Hitoshi Mitake (5):
  tgtd: add helper functions for checking iostate of iscsi connections
  tgtd: decompose iscsi_[rt]x_handler()
  tgtd: add a new option "-T" for specifying a number of threads which
    send/recv iSCSI PDUs
  tgtd: implement a deferred event modification mechanism
  tgtd: offload iSCSI PDU send/recv to worker threads

 usr/iscsi/iscsi_tcp.c |  405 +++++++++++++++++++++++++++++++++++++++++++++++--
 usr/iscsi/iscsid.c    |  101 ++++++++----
 usr/iscsi/iscsid.h    |   11 +-
 usr/tgtd.c            |   10 +-
 usr/tgtd.h            |    1 +
 5 files changed, 483 insertions(+), 45 deletions(-)

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v4 RFC 1/5] tgtd: add helper functions for checking iostate of iscsi connections
  2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
@ 2014-02-14 11:21 ` Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 2/5] tgtd: decompose iscsi_[rt]x_handler() Hitoshi Mitake
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-14 11:21 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

This patch adds helper functions for checking iostate of iscsi
connections from outside of iscsid.c. These functions will be used in
iscsi_tcp.c in the future commits.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
 usr/iscsi/iscsid.c |   20 ++++++++++++++++++++
 usr/iscsi/iscsid.h |    4 ++++
 2 files changed, 24 insertions(+)

diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c
index 30bd13f..c472608 100644
--- a/usr/iscsi/iscsid.c
+++ b/usr/iscsi/iscsid.c
@@ -76,6 +76,26 @@ enum {
 	IOSTATE_TX_END,
 };
 
+int is_conn_rx_bhs(struct iscsi_connection *conn)
+{
+	return conn->rx_iostate == IOSTATE_RX_BHS;
+}
+
+int is_conn_rx_init_ahs(struct iscsi_connection *conn)
+{
+	return conn->rx_iostate == IOSTATE_RX_INIT_AHS;
+}
+
+int is_conn_rx_end(struct iscsi_connection *conn)
+{
+	return conn->rx_iostate == IOSTATE_RX_END;
+}
+
+int is_conn_tx_end(struct iscsi_connection *conn)
+{
+	return conn->tx_iostate == IOSTATE_TX_END;
+}
+
 void conn_read_pdu(struct iscsi_connection *conn)
 {
 	conn->rx_iostate = IOSTATE_RX_BHS;
diff --git a/usr/iscsi/iscsid.h b/usr/iscsi/iscsid.h
index c7f6801..01d0002 100644
--- a/usr/iscsi/iscsid.h
+++ b/usr/iscsi/iscsid.h
@@ -336,6 +336,10 @@ extern int iscsi_param_parse_portals(char *p, int do_add, int do_delete);
 extern void iscsi_update_conn_stats_rx(struct iscsi_connection *conn, int size, int opcode);
 extern void iscsi_update_conn_stats_tx(struct iscsi_connection *conn, int size, int opcode);
 extern void iscsi_rsp_set_residual(struct iscsi_cmd_rsp *rsp, struct scsi_cmd *scmd);
+extern int is_conn_rx_bhs(struct iscsi_connection *conn);
+extern int is_conn_rx_init_ahs(struct iscsi_connection *conn);
+extern int is_conn_rx_end(struct iscsi_connection *conn);
+extern int is_conn_tx_end(struct iscsi_connection *conn);
 
 /* iscsid.c iscsi_task */
 extern void iscsi_free_task(struct iscsi_task *task);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 RFC 2/5] tgtd: decompose iscsi_[rt]x_handler()
  2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 1/5] tgtd: add helper functions for checking iostate of iscsi connections Hitoshi Mitake
@ 2014-02-14 11:21 ` Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 3/5] tgtd: add a new option "-T" for specifying a number of threads which send/recv iSCSI PDUs Hitoshi Mitake
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-14 11:21 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

Basic strategy of parallelizing sending/receiving iSCSI PDUs is like
this: offload do_recv(), do_send() and crc checking to worker threads
and do other part in the main event loop thread.

To follow the above strategy, this patch decomposes iscsi_rx_handler()
and iscsi_tx_handler() into some parts. Some of them can be called in
worker threads and others are not. In the future commit,
multi-threaded TCP event handler will be implemented based on
them. This patch re-implements single threaded TCP event handler
(past iscsi_tcp_event_handler(), renamed iscsi_tcp_st_event_handler())
based on these decomposed parts. iscsi_tcp_st_event_handler() is
also required because under some workload the multi-threaded version
degrades performance because of frequent communication between
threads.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
 usr/iscsi/iscsi_tcp.c |   47 +++++++++++++++++++++++-----
 usr/iscsi/iscsid.c    |   81 +++++++++++++++++++++++++++++--------------------
 usr/iscsi/iscsid.h    |    7 ++++-
 3 files changed, 94 insertions(+), 41 deletions(-)

diff --git a/usr/iscsi/iscsi_tcp.c b/usr/iscsi/iscsi_tcp.c
index 4a9532a..9a41466 100644
--- a/usr/iscsi/iscsi_tcp.c
+++ b/usr/iscsi/iscsi_tcp.c
@@ -37,7 +37,7 @@
 #include "util.h"
 #include "work.h"
 
-static void iscsi_tcp_event_handler(int fd, int events, void *data);
+static void iscsi_tcp_st_event_handler(int fd, int events, void *data);
 static void iscsi_tcp_release(struct iscsi_connection *conn);
 static struct iscsi_task *iscsi_tcp_alloc_task(struct iscsi_connection *conn,
 						size_t ext_len);
@@ -255,7 +255,7 @@ static void accept_connection(int afd, int events, void *data)
 	conn_read_pdu(conn);
 	set_non_blocking(fd);
 
-	ret = tgt_event_add(fd, EPOLLIN, iscsi_tcp_event_handler, conn);
+	ret = tgt_event_add(fd, EPOLLIN, iscsi_tcp_st_event_handler, conn);
 	if (ret) {
 		conn_exit(conn);
 		free(tcp_conn);
@@ -270,19 +270,52 @@ out:
 	return;
 }
 
-static void iscsi_tcp_event_handler(int fd, int events, void *data)
+static void iscsi_tcp_st_event_handler(int fd, int events, void *data)
 {
+	int ret;
+
 	struct iscsi_connection *conn = (struct iscsi_connection *) data;
 
-	if (events & EPOLLIN)
-		iscsi_rx_handler(conn);
+	if (events & EPOLLIN) {
+		if (is_conn_rx_bhs(conn)) {
+			ret = iscsi_rx_bhs_handler(conn);
+			if (!is_conn_rx_init_ahs(conn)
+			    || ret < 0 || conn->state == STATE_CLOSE)
+				goto epollin_end;
+		}
+
+		if (conn->state != STATE_CLOSE && is_conn_rx_init_ahs(conn)) {
+			iscsi_pre_iostate_rx_init_ahs(conn);
+
+			if (conn->state == STATE_CLOSE)
+				goto epollin_end;
+		}
+
+		if (conn->state != STATE_CLOSE)
+			iscsi_rx_handler(conn);
 
+		if (conn->state != STATE_CLOSE && is_conn_rx_end(conn))
+			iscsi_rx_done(conn);
+	}
+
+epollin_end:
 	if (conn->state == STATE_CLOSE)
 		dprintf("connection closed\n");
 
-	if (conn->state != STATE_CLOSE && events & EPOLLOUT)
-		iscsi_tx_handler(conn);
+	if (conn->state != STATE_CLOSE && events & EPOLLOUT) {
+		if (conn->state == STATE_SCSI && !conn->tx_task) {
+			if (iscsi_task_tx_start(conn))
+				goto epollout_end;
+		}
+
+		if (conn->state != STATE_CLOSE)
+			iscsi_tx_handler(conn);
+
+		if (conn->state != STATE_CLOSE && is_conn_tx_end(conn))
+			iscsi_tx_done(conn);
+	}
 
+epollout_end:
 	if (conn->state == STATE_CLOSE) {
 		dprintf("connection closed %p\n", conn);
 		conn_close(conn);
diff --git a/usr/iscsi/iscsid.c b/usr/iscsi/iscsid.c
index c472608..c696d30 100644
--- a/usr/iscsi/iscsid.c
+++ b/usr/iscsi/iscsid.c
@@ -1991,7 +1991,7 @@ static int iscsi_task_tx_done(struct iscsi_connection *conn)
 	return 0;
 }
 
-static int iscsi_task_tx_start(struct iscsi_connection *conn)
+int iscsi_task_tx_start(struct iscsi_connection *conn)
 {
 	struct iscsi_task *task;
 	int is_rsp, err = 0;
@@ -2048,7 +2048,7 @@ static int do_recv(struct iscsi_connection *conn, int next_state)
 		return 0;
 	} else if (ret < 0) {
 		if (errno == EINTR || errno == EAGAIN)
-			return 0;
+			return -errno;
 		else
 			return -EIO;
 	}
@@ -2066,7 +2066,30 @@ static int do_recv(struct iscsi_connection *conn, int next_state)
 	return ret;
 }
 
-void iscsi_rx_handler(struct iscsi_connection *conn)
+int iscsi_rx_bhs_handler(struct iscsi_connection *conn)
+{
+	if (conn->rx_iostate != IOSTATE_RX_BHS) {
+		eprintf("invalid rx_iostate: %d\n", conn->rx_iostate);
+		exit(1);
+	}
+
+	return do_recv(conn, IOSTATE_RX_INIT_AHS);
+}
+
+void iscsi_pre_iostate_rx_init_ahs(struct iscsi_connection *conn)
+{
+	if (conn->state == STATE_SCSI) {
+		if (iscsi_task_rx_start(conn))
+			conn->state = STATE_CLOSE;
+	} else {
+		conn->rx_buffer = conn->req_buffer;
+		conn->req.ahs = conn->rx_buffer;
+		conn->req.data = conn->rx_buffer
+			+ conn->req.bhs.hlength * 4;
+	}
+}
+
+int iscsi_rx_handler(struct iscsi_connection *conn)
 {
 	int ret = 0, hdigest, ddigest;
 	uint32_t crc;
@@ -2080,23 +2103,7 @@ void iscsi_rx_handler(struct iscsi_connection *conn)
 		hdigest = ddigest = 0;
 again:
 	switch (conn->rx_iostate) {
-	case IOSTATE_RX_BHS:
-		ret = do_recv(conn, IOSTATE_RX_INIT_AHS);
-		if (ret <= 0 || conn->rx_iostate != IOSTATE_RX_INIT_AHS)
-			break;
 	case IOSTATE_RX_INIT_AHS:
-		if (conn->state == STATE_SCSI) {
-			ret = iscsi_task_rx_start(conn);
-			if (ret) {
-				conn->state = STATE_CLOSE;
-				break;
-			}
-		} else {
-			conn->rx_buffer = conn->req_buffer;
-			conn->req.ahs = conn->rx_buffer;
-			conn->req.data = conn->rx_buffer
-				+ conn->req.bhs.hlength * 4;
-		}
 		conn->req.ahssize = conn->req.bhs.hlength * 4;
 		conn->req.datasize = ntoh24(conn->req.bhs.dlength);
 		conn->rx_size = conn->req.ahssize;
@@ -2104,7 +2111,7 @@ again:
 		if (conn->state != STATE_SCSI &&
 		    conn->req.ahssize > INCOMING_BUFSIZE) {
 			conn->state = STATE_CLOSE;
-			return;
+			return -1;
 		}
 
 		if (conn->rx_size) {
@@ -2164,7 +2171,7 @@ again:
 				if (conn->req.ahssize + conn->rx_size >
 				    INCOMING_BUFSIZE) {
 					conn->state = STATE_CLOSE;
-					return;
+					return -1;
 				}
 			}
 		} else {
@@ -2202,10 +2209,11 @@ again:
 		exit(1);
 	}
 
-	if (ret < 0 ||
-	    conn->rx_iostate != IOSTATE_RX_END ||
-	    conn->state == STATE_CLOSE)
-		return;
+	if (ret < 0)
+		return ret;
+
+	if (conn->rx_iostate != IOSTATE_RX_END || conn->state == STATE_CLOSE)
+		return 0;
 
 	if (conn->rx_size) {
 		eprintf("error %d %d %d\n", conn->state, conn->rx_iostate,
@@ -2213,6 +2221,13 @@ again:
 		exit(1);
 	}
 
+	return 0;
+}
+
+void iscsi_rx_done(struct iscsi_connection *conn)
+{
+	int ret;
+
 	if (conn->state == STATE_SCSI) {
 		ret = iscsi_task_rx_done(conn);
 		if (ret)
@@ -2268,12 +2283,6 @@ int iscsi_tx_handler(struct iscsi_connection *conn)
 	} else
 		hdigest = ddigest = 0;
 
-	if (conn->state == STATE_SCSI && !conn->tx_task) {
-		ret = iscsi_task_tx_start(conn);
-		if (ret)
-			goto out;
-	}
-
 	/*
 	 * For rdma, grab the data-in or r2t packet and covert to
 	 * an RDMA operation.
@@ -2393,6 +2402,14 @@ again:
 finish:
 	cmnd_finish(conn);
 
+out:
+	return ret;
+}
+
+void iscsi_tx_done(struct iscsi_connection *conn)
+{
+	int ret;
+
 	switch (conn->state) {
 	case STATE_KERNEL:
 		ret = conn_take_fd(conn);
@@ -2416,8 +2433,6 @@ finish:
 		break;
 	}
 
-out:
-	return ret;
 }
 
 int iscsi_transportid(int tid, uint64_t itn_id, char *buf, int size)
diff --git a/usr/iscsi/iscsid.h b/usr/iscsi/iscsid.h
index 01d0002..b795828 100644
--- a/usr/iscsi/iscsid.h
+++ b/usr/iscsi/iscsid.h
@@ -321,8 +321,13 @@ extern tgtadm_err conn_close_admin(uint32_t tid, uint64_t sid, uint32_t cid);
 extern char *text_key_find(struct iscsi_connection *conn, char *searchKey);
 extern void text_key_add(struct iscsi_connection *conn, char *key, char *value);
 extern void conn_read_pdu(struct iscsi_connection *conn);
+extern int iscsi_task_tx_start(struct iscsi_connection *conn);
 extern int iscsi_tx_handler(struct iscsi_connection *conn);
-extern void iscsi_rx_handler(struct iscsi_connection *conn);
+extern void iscsi_tx_done(struct iscsi_connection *conn);
+extern int iscsi_rx_bhs_handler(struct iscsi_connection *conn);
+extern void iscsi_pre_iostate_rx_init_ahs(struct iscsi_connection *conn);
+extern int iscsi_rx_handler(struct iscsi_connection *conn);
+extern void iscsi_rx_done(struct iscsi_connection *conn);
 extern int iscsi_scsi_cmd_execute(struct iscsi_task *task);
 extern int iscsi_transportid(int tid, uint64_t itn_id, char *buf, int size);
 extern int iscsi_add_portal(char *addr, int port, int tpgt);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 RFC 3/5] tgtd: add a new option "-T" for specifying a number of threads which send/recv iSCSI PDUs
  2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 1/5] tgtd: add helper functions for checking iostate of iscsi connections Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 2/5] tgtd: decompose iscsi_[rt]x_handler() Hitoshi Mitake
@ 2014-02-14 11:21 ` Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 4/5] tgtd: implement a deferred event modification mechanism Hitoshi Mitake
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-14 11:21 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

Currently, nr_iothreads, specified with "-t" option, is used as a
number of worker threads per logical unit conventionally in some
backing store drivers.

This patch adds a new option "-T" to tgtd for specified a number of
threads which send/recv iSCSI PDUs. Because using the "-t" option for
this purpose would be introduce inconvenience of performance tuning.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
 usr/iscsi/iscsi_tcp.c |    2 ++
 usr/tgtd.c            |   10 +++++++++-
 usr/tgtd.h            |    1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/usr/iscsi/iscsi_tcp.c b/usr/iscsi/iscsi_tcp.c
index 9a41466..d00265e 100644
--- a/usr/iscsi/iscsi_tcp.c
+++ b/usr/iscsi/iscsi_tcp.c
@@ -37,6 +37,8 @@
 #include "util.h"
 #include "work.h"
 
+int nr_tcp_iothreads = 1;
+
 static void iscsi_tcp_st_event_handler(int fd, int events, void *data);
 static void iscsi_tcp_release(struct iscsi_connection *conn);
 static struct iscsi_task *iscsi_tcp_alloc_task(struct iscsi_connection *conn,
diff --git a/usr/tgtd.c b/usr/tgtd.c
index 50e1c83..09c1442 100644
--- a/usr/tgtd.c
+++ b/usr/tgtd.c
@@ -55,13 +55,14 @@ static struct option const long_options[] = {
 	{"foreground", no_argument, 0, 'f'},
 	{"control-port", required_argument, 0, 'C'},
 	{"nr_iothreads", required_argument, 0, 't'},
+	{"nr_tcp_iothreads", required_argument, 0, 'T'},
 	{"debug", required_argument, 0, 'd'},
 	{"version", no_argument, 0, 'V'},
 	{"help", no_argument, 0, 'h'},
 	{0, 0, 0, 0},
 };
 
-static char *short_options = "fC:d:t:Vh";
+static char *short_options = "fC:d:t:VhT:";
 static char *spare_args;
 
 static void usage(int status)
@@ -77,6 +78,8 @@ static void usage(int status)
 		"-f, --foreground        make the program run in the foreground\n"
 		"-C, --control-port NNNN use port NNNN for the mgmt channel\n"
 		"-t, --nr_iothreads NNNN specify the number of I/O threads\n"
+		"-T, --nr_tcp_iothreads NNNN specify the number of I/O threads"
+		" for send/recv iSCSI PDU\n"
 		"-d, --debug debuglevel  print debugging information\n"
 		"-V, --version           print version and exit\n"
 		"-h, --help              display this help and exit\n",
@@ -546,6 +549,11 @@ int main(int argc, char **argv)
 			if (ret)
 				bad_optarg(ret, ch, optarg);
 			break;
+		case 'T':
+			ret = str_to_int_gt(optarg, nr_tcp_iothreads, 0);
+			if (ret)
+				bad_optarg(ret, ch, optarg);
+			break;
 		case 'd':
 			ret = str_to_int_range(optarg, is_debug, 0, 1);
 			if (ret)
diff --git a/usr/tgtd.h b/usr/tgtd.h
index 8a25521..a708f13 100644
--- a/usr/tgtd.h
+++ b/usr/tgtd.h
@@ -257,6 +257,7 @@ enum mgmt_req_result {
 extern int system_active;
 extern int is_debug;
 extern int nr_iothreads;
+extern int nr_tcp_iothreads;
 extern struct list_head bst_list;
 
 extern int ipc_init(void);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 RFC 4/5] tgtd: implement a deferred event modification mechanism
  2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
                   ` (2 preceding siblings ...)
  2014-02-14 11:21 ` [PATCH v4 RFC 3/5] tgtd: add a new option "-T" for specifying a number of threads which send/recv iSCSI PDUs Hitoshi Mitake
@ 2014-02-14 11:21 ` Hitoshi Mitake
  2014-02-14 11:21 ` [PATCH v4 RFC 5/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
  2014-02-19  3:11 ` [PATCH v4 RFC 0/5] " Hitoshi Mitake
  5 siblings, 0 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-14 11:21 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

In the next commit, worker threads for sending/receiving iSCSI PDUs
will be implemented. When the feature is used, the worker threads read
and write fds created by accept(2). For avoiding conflicts between the
main event loop thread and the worker threads, no event should be
notified to the main event via epoll_wait(2) when the worker threads
are running.

For this purpose, this patch implements a deferred event modification
mechanism. Two new members are added to iscsi_tcp_connection:
used_in_worker_threads and restore_events. The first one is used for
indicating that the connection is used by worker threads. The second
one is used for storing an event mask which should be set to the
fd. When control of the fd is passed back to the main event, the
stored event mask is set via tgt_event_modify().

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
 usr/iscsi/iscsi_tcp.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/usr/iscsi/iscsi_tcp.c b/usr/iscsi/iscsi_tcp.c
index d00265e..bb553a8 100644
--- a/usr/iscsi/iscsi_tcp.c
+++ b/usr/iscsi/iscsi_tcp.c
@@ -61,6 +61,9 @@ struct iscsi_tcp_connection {
 	long ttt;
 
 	struct iscsi_connection iscsi_conn;
+
+	int used_in_worker_thread;
+	int restore_events;
 };
 
 static inline struct iscsi_tcp_connection *TCP_CONN(struct iscsi_connection *conn)
@@ -588,9 +591,13 @@ static void iscsi_event_modify(struct iscsi_connection *conn, int events)
 	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);
 	int ret;
 
-	ret = tgt_event_modify(tcp_conn->fd, events);
-	if (ret)
-		eprintf("tgt_event_modify failed\n");
+	if (tcp_conn->used_in_worker_thread)
+		tcp_conn->restore_events = events;
+	else {
+		ret = tgt_event_modify(tcp_conn->fd, events);
+		if (ret)
+			eprintf("tgt_event_modify failed\n");
+	}
 }
 
 static struct iscsi_task *iscsi_tcp_alloc_task(struct iscsi_connection *conn,
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v4 RFC 5/5] tgtd: offload iSCSI PDU send/recv to worker threads
  2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
                   ` (3 preceding siblings ...)
  2014-02-14 11:21 ` [PATCH v4 RFC 4/5] tgtd: implement a deferred event modification mechanism Hitoshi Mitake
@ 2014-02-14 11:21 ` Hitoshi Mitake
  2014-02-19  3:11 ` [PATCH v4 RFC 0/5] " Hitoshi Mitake
  5 siblings, 0 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-14 11:21 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

Current tgtd sends and receives iSCSI PDUs in its main event
loop. This design can cause bottleneck when many iSCSI clients connect
to single tgtd process. For example, we need multiple tgtd processes
for utilizing fast network like 10 GbE because typical single
processor core isn't fast enough for processing a bunch of requests.

This patch lets tgtd offload send/recv iSCSI PDUs and check digests to
worker threads when a parameter of "-T" option is larger than 1. The
offloading is done by a newly added event handler,
iscsi_tcp_mt_event_handler(). When "-T" isn't passed or value 1 is
passed, single threaded version (iscsi_tcp_st_event_handler()) is
used. The single threaded version is provided because in some cases
the multi-threaded version degrades performance (e.g. a number of
initiators are not so large).

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
 usr/iscsi/iscsi_tcp.c |  345 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 344 insertions(+), 1 deletion(-)

diff --git a/usr/iscsi/iscsi_tcp.c b/usr/iscsi/iscsi_tcp.c
index bb553a8..a78ddbb 100644
--- a/usr/iscsi/iscsi_tcp.c
+++ b/usr/iscsi/iscsi_tcp.c
@@ -31,6 +31,8 @@
 #include <netinet/tcp.h>
 #include <sys/epoll.h>
 #include <sys/socket.h>
+#include <pthread.h>
+#include <sys/eventfd.h>
 
 #include "iscsid.h"
 #include "tgtd.h"
@@ -40,6 +42,7 @@
 int nr_tcp_iothreads = 1;
 
 static void iscsi_tcp_st_event_handler(int fd, int events, void *data);
+static void iscsi_tcp_mt_event_handler(int fd, int events, void *data);
 static void iscsi_tcp_release(struct iscsi_connection *conn);
 static struct iscsi_task *iscsi_tcp_alloc_task(struct iscsi_connection *conn,
 						size_t ext_len);
@@ -50,6 +53,23 @@ static long nop_ttt;
 static int listen_fds[8];
 static struct iscsi_transport iscsi_tcp;
 
+enum iscsi_tcp_work_state {
+	ISCSI_TCP_WORK_INIT,
+	ISCSI_TCP_WORK_RX,
+	ISCSI_TCP_WORK_RX_BHS,
+	ISCSI_TCP_WORK_RX_EAGAIN,
+	ISCSI_TCP_WORK_RX_FAILED,
+	ISCSI_TCP_WORK_TX,
+	ISCSI_TCP_WORK_TX_FAILED,
+};
+
+struct iscsi_tcp_work {
+	/* list: connected to iscsi_tcp_work_list or iscsi_tcp_finished_list */
+	struct list_head list;
+
+	enum iscsi_tcp_work_state state;
+};
+
 struct iscsi_tcp_connection {
 	int fd;
 
@@ -64,13 +84,229 @@ struct iscsi_tcp_connection {
 
 	int used_in_worker_thread;
 	int restore_events;
+
+	struct iscsi_tcp_work work;
 };
 
+static LIST_HEAD(iscsi_tcp_work_list);
+static pthread_mutex_t iscsi_tcp_work_mutex = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t iscsi_tcp_work_cond = PTHREAD_COND_INITIALIZER;
+
+static LIST_HEAD(iscsi_tcp_work_finished_list);
+static pthread_mutex_t iscsi_tcp_work_finished_mutex =
+	PTHREAD_MUTEX_INITIALIZER;
+
+static int iscsi_tcp_work_done_fd;
+
+static pthread_mutex_t iscsi_tcp_worker_startup_mutex =
+	PTHREAD_MUTEX_INITIALIZER;
+
+static int iscsi_tcp_worker_stop;
+
+static pthread_t *iscsi_tcp_worker_threads;
+
+static void queue_iscsi_tcp_work(struct iscsi_connection *conn, int events);
+
+static void iscsi_tcp_work_done_handler(int fd, int events, void *data)
+{
+	LIST_HEAD(list);
+	struct iscsi_tcp_work *work;
+	struct iscsi_connection *conn;
+	struct iscsi_tcp_connection *tcp_conn;
+	int ret, failed;
+	eventfd_t dummy;
+
+	ret = eventfd_read(fd, &dummy);
+	if (ret < 0) {
+		eprintf("iscsi tcp work error: %m\n");
+		exit(1);
+	}
+
+	pthread_mutex_lock(&iscsi_tcp_work_finished_mutex);
+	list_splice_init(&iscsi_tcp_work_finished_list, &list);
+	pthread_mutex_unlock(&iscsi_tcp_work_finished_mutex);
+
+	while (!list_empty(&list)) {
+		work = list_first_entry(&list, struct iscsi_tcp_work, list);
+		list_del(&work->list);
+
+		tcp_conn =
+			container_of(work, struct iscsi_tcp_connection, work);
+		conn = &tcp_conn->iscsi_conn;
+
+		tcp_conn->used_in_worker_thread = 0;
+
+		ret = tgt_event_add(tcp_conn->fd, tcp_conn->restore_events,
+				    iscsi_tcp_mt_event_handler, conn);
+		if (ret < 0) {
+			/* fd is broken by worker threads */
+			failed = 1;
+			goto end;
+		}
+
+		failed = 0;
+
+		if (conn->state == STATE_CLOSE)
+			goto end;
+
+		switch (work->state) {
+		case ISCSI_TCP_WORK_RX_FAILED:
+		case ISCSI_TCP_WORK_TX_FAILED:
+			failed = 1;
+			goto end;
+		case ISCSI_TCP_WORK_RX:
+			if (is_conn_rx_end(conn))
+				iscsi_rx_done(conn);
+			break;
+		case ISCSI_TCP_WORK_RX_BHS:
+			if (is_conn_rx_bhs(conn))
+				/* EAGAIN or EINTR */
+				break;
+
+			iscsi_pre_iostate_rx_init_ahs(conn);
+			if (conn->state == STATE_CLOSE)
+				break;
+
+			/* bypass the main event loop */
+			work->state = ISCSI_TCP_WORK_RX;
+			queue_iscsi_tcp_work(conn, tcp_conn->restore_events);
+			continue;
+		case ISCSI_TCP_WORK_TX:
+			if (is_conn_tx_end(conn))
+				iscsi_tx_done(conn);
+			break;
+		default:
+			eprintf("invalid state of iscsi work tcp: %d\n",
+				work->state);
+			exit(1);
+		}
+
+		tcp_conn->restore_events = 0;
+
+end:
+		work->state = ISCSI_TCP_WORK_INIT;
+		if (failed || conn->state == STATE_CLOSE) {
+			dprintf("connection closed %p\n", conn);
+			conn_close(conn);
+		}
+	}
+}
+
+static void *iscsi_tcp_worker_fn(void *arg)
+{
+	sigset_t set;
+	struct iscsi_tcp_work *work;
+	struct iscsi_connection *conn;
+	struct iscsi_tcp_connection *tcp_conn;
+	int ret;
+
+	sigfillset(&set);
+	sigprocmask(SIG_BLOCK, &set, NULL);
+
+	pthread_mutex_lock(&iscsi_tcp_worker_startup_mutex);
+	pthread_mutex_unlock(&iscsi_tcp_worker_startup_mutex);
+
+	dprintf("starting iscsi tcp worker thread: %lu\n", pthread_self());
+
+	while (!iscsi_tcp_worker_stop) {
+		pthread_mutex_lock(&iscsi_tcp_work_mutex);
+retest:
+		if (list_empty(&iscsi_tcp_work_list)) {
+			pthread_cond_wait(&iscsi_tcp_work_cond,
+					  &iscsi_tcp_work_mutex);
+
+			if (iscsi_tcp_worker_stop) {
+				pthread_mutex_unlock(&iscsi_tcp_work_mutex);
+				pthread_exit(NULL);
+			}
+
+			goto retest;
+		}
+
+		work = list_first_entry(&iscsi_tcp_work_list,
+				       struct iscsi_tcp_work, list);
+
+		list_del(&work->list);
+		pthread_mutex_unlock(&iscsi_tcp_work_mutex);
+
+		tcp_conn =
+			container_of(work, struct iscsi_tcp_connection, work);
+		conn = &tcp_conn->iscsi_conn;
+
+		switch (work->state) {
+		case ISCSI_TCP_WORK_RX_BHS:
+			ret = iscsi_rx_bhs_handler(conn);
+			if (ret < 0)
+				work->state = ISCSI_TCP_WORK_RX_FAILED;
+			break;
+		case ISCSI_TCP_WORK_RX:
+			do {
+				ret = iscsi_rx_handler(conn);
+				if (ret == -EAGAIN)
+					break;
+				if (ret == -EINTR)
+					continue;
+
+				if (ret < 0) {
+					work->state = ISCSI_TCP_WORK_RX_FAILED;
+					break;
+				}
+			} while (conn->state != STATE_CLOSE &&
+				 !is_conn_rx_end(conn));
+			break;
+		case ISCSI_TCP_WORK_TX:
+			do {
+				ret = iscsi_tx_handler(conn);
+				if (ret < 0) {
+					work->state = ISCSI_TCP_WORK_TX_FAILED;
+					break;
+				}
+			} while (conn->state != STATE_CLOSE &&
+				 !is_conn_tx_end(conn));
+			break;
+		default:
+			eprintf("invalid state of iscsi tcp work: %d\n",
+				work->state);
+			exit(1);
+		}
+
+		pthread_mutex_lock(&iscsi_tcp_work_finished_mutex);
+		list_add_tail(&work->list, &iscsi_tcp_work_finished_list);
+		pthread_mutex_unlock(&iscsi_tcp_work_finished_mutex);
+
+		ret = eventfd_write(iscsi_tcp_work_done_fd, 1);
+		if (ret < 0) {
+			eprintf("iscsi tcp work error: %m\n");
+			exit(1);
+		}
+	}
+
+	pthread_exit(NULL);
+}
+
 static inline struct iscsi_tcp_connection *TCP_CONN(struct iscsi_connection *conn)
 {
 	return container_of(conn, struct iscsi_tcp_connection, iscsi_conn);
 }
 
+static void queue_iscsi_tcp_work(struct iscsi_connection *conn, int events)
+{
+	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);
+	struct iscsi_tcp_work *work = &tcp_conn->work;
+
+	tcp_conn->used_in_worker_thread = 1;
+
+	tcp_conn->restore_events = events;
+
+	tgt_event_del(tcp_conn->fd);
+
+	pthread_mutex_lock(&iscsi_tcp_work_mutex);
+	list_add_tail(&work->list, &iscsi_tcp_work_list);
+	pthread_mutex_unlock(&iscsi_tcp_work_mutex);
+
+	pthread_cond_signal(&iscsi_tcp_work_cond);
+}
+
 static struct tgt_work nop_work;
 
 /* all iscsi connections */
@@ -100,6 +336,9 @@ static void iscsi_tcp_nop_work_handler(void *data)
 	struct iscsi_tcp_connection *tcp_conn;
 
 	list_for_each_entry(tcp_conn, &iscsi_tcp_conn_list, tcp_conn_siblings) {
+		if (tcp_conn->used_in_worker_thread)
+			continue;
+
 		if (tcp_conn->nop_interval == 0)
 			continue;
 
@@ -246,6 +485,9 @@ static void accept_connection(int afd, int events, void *data)
 	if (!tcp_conn)
 		goto out;
 
+	INIT_LIST_HEAD(&tcp_conn->work.list);
+	tcp_conn->work.state = ISCSI_TCP_WORK_INIT;
+
 	conn = &tcp_conn->iscsi_conn;
 
 	ret = conn_init(conn);
@@ -260,7 +502,11 @@ static void accept_connection(int afd, int events, void *data)
 	conn_read_pdu(conn);
 	set_non_blocking(fd);
 
-	ret = tgt_event_add(fd, EPOLLIN, iscsi_tcp_st_event_handler, conn);
+	ret = tgt_event_add(fd, EPOLLIN,
+			    nr_tcp_iothreads == 1 ?
+			    iscsi_tcp_st_event_handler :
+			    iscsi_tcp_mt_event_handler,
+			    conn);
 	if (ret) {
 		conn_exit(conn);
 		free(tcp_conn);
@@ -327,6 +573,39 @@ epollout_end:
 	}
 }
 
+static void iscsi_tcp_mt_event_handler(int fd, int events, void *data)
+{
+	struct iscsi_connection *conn = (struct iscsi_connection *) data;
+	struct iscsi_tcp_connection *tcp_conn = TCP_CONN(conn);
+	struct iscsi_tcp_work *work = &tcp_conn->work;
+
+	if (work->state != ISCSI_TCP_WORK_INIT) {
+		eprintf("invalid state of iscsi tcp work: %d\n", work->state);
+		exit(1);
+	}
+
+	if (conn->state == STATE_CLOSE) {
+		conn_close(conn);
+		return;
+	}
+
+	if (events & EPOLLIN) {
+		if (is_conn_rx_bhs(conn))
+			work->state = ISCSI_TCP_WORK_RX_BHS;
+		else
+			work->state = ISCSI_TCP_WORK_RX;
+	} else if (events & EPOLLOUT) {
+		if (conn->state == STATE_SCSI && !conn->tx_task) {
+			if (iscsi_task_tx_start(conn))
+				return;
+		}
+
+		work->state = ISCSI_TCP_WORK_TX;
+	}
+
+	queue_iscsi_tcp_work(conn, events);
+}
+
 int iscsi_tcp_init_portal(char *addr, int port, int tpgt)
 {
 	struct addrinfo hints, *res, *res0;
@@ -461,6 +740,8 @@ int iscsi_delete_portal(char *addr, int port)
 
 static int iscsi_tcp_init(void)
 {
+	int i, ret = 0;
+
 	/* If we were passed any portals on the command line */
 	if (portal_arguments)
 		iscsi_param_parse_portals(portal_arguments, 1, 0);
@@ -478,17 +759,79 @@ static int iscsi_tcp_init(void)
 	nop_work.data = &nop_work;
 	add_work(&nop_work, 1);
 
+	if (1 < nr_tcp_iothreads) {
+		iscsi_tcp_work_done_fd = eventfd(0, EFD_NONBLOCK);
+		if (iscsi_tcp_work_done_fd < 0) {
+			eprintf("failed to create eventfd for tcp work: %m\n");
+			return -1;
+		}
+
+		ret = tgt_event_add(iscsi_tcp_work_done_fd, EPOLLIN,
+				    iscsi_tcp_work_done_handler, NULL);
+		if (ret < 0) {
+			eprintf("failed to register"\
+				"iscsi_tcp_work_done_handler(): %m\n");
+			ret = -1;
+			goto close_done_fd;
+		}
+
+		iscsi_tcp_worker_threads = calloc(nr_tcp_iothreads,
+						  sizeof(pthread_t));
+		if (!iscsi_tcp_worker_threads) {
+			eprintf("failed to allocate memory for pthread"\
+				" identifier: %m\n");
+			ret = -1;
+
+			goto close_done_fd;
+		}
+
+		pthread_mutex_lock(&iscsi_tcp_worker_startup_mutex);
+		for (i = 0; i < nr_tcp_iothreads; i++) {
+			ret = pthread_create(&iscsi_tcp_worker_threads[i], NULL,
+					     iscsi_tcp_worker_fn, NULL);
+			if (ret) {
+				eprintf("creating worker thread failed: %m\n");
+				ret = -1;
+
+				goto terminate_workers;
+			}
+		}
+
+		pthread_mutex_unlock(&iscsi_tcp_worker_startup_mutex);
+		goto out;
+
+terminate_workers:
+		iscsi_tcp_worker_stop = 1;
+		pthread_mutex_unlock(&iscsi_tcp_worker_startup_mutex);
+
+		for (; 0 <= i; i--)
+			pthread_join(iscsi_tcp_worker_threads[i], NULL);
+
+		free(iscsi_tcp_worker_threads);
+
+close_done_fd:
+		close(iscsi_tcp_work_done_fd);
+	}
+
+out:
 	return 0;
 }
 
 static void iscsi_tcp_exit(void)
 {
+	int i;
 	struct iscsi_portal *portal, *ptmp;
 
 	list_for_each_entry_safe(portal, ptmp, &iscsi_portals_list,
 			    iscsi_portal_siblings) {
 		iscsi_delete_portal(portal->addr, portal->port);
 	}
+
+	iscsi_tcp_worker_stop = 1;
+	for (i = 0; i < nr_tcp_iothreads; i++) {
+		pthread_cond_signal(&iscsi_tcp_work_cond);
+		pthread_join(iscsi_tcp_worker_threads[i], NULL);
+	}
 }
 
 static int iscsi_tcp_conn_login_complete(struct iscsi_connection *conn)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads
  2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
                   ` (4 preceding siblings ...)
  2014-02-14 11:21 ` [PATCH v4 RFC 5/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
@ 2014-02-19  3:11 ` Hitoshi Mitake
  5 siblings, 0 replies; 7+ messages in thread
From: Hitoshi Mitake @ 2014-02-19  3:11 UTC (permalink / raw)
  To: Hitoshi Mitake; +Cc: stgt

Sorry, please discard this patchset. We found that the previous commit:
https://github.com/fujita/tgt/commit/0e23d5754d3bc3cf0328255fa7dab8547ecb6e60
improves throughput of tgtd enough.

We've evaluated this patchset based on RHEL's tgt, which doesn't have
the above commit.

Thanks,
Hitoshi


On Fri, Feb 14, 2014 at 8:21 PM, Hitoshi Mitake
<mitake.hitoshi@lab.ntt.co.jp> wrote:
> Current tgtd sends and receives iSCSI PDUs in its main event
> loop. This design can cause bottleneck when many iSCSI clients connect
> to single tgtd process. For example, we need multiple tgtd processes
> for utilizing fast network like 10 GbE because typical single
> processor core isn't fast enough for processing a bunch of requests.
>
> This patchset lets tgtd offload send/recv iSCSI PDUs and digest checking to
> worker threads. The basic strategy of this change is like below:
> 1. decompose iscsi_[rt]x_handler()
> 2. re-implement single threaded version of iscsi_tcp_event_handler()
> 3. implement multi-threaded version of iscsi_tcp_event_handler()
>
> This patch also adds a new option "-T" to specify a number of threads which
> send/recieve PDUs. When 1 is passed with the option, the above single threaded
> version will be used, because the multi-threaded version incurs overhead which
> comes from frequent communication between threads in some case. If users don't
> want to use the multi-threaded version, they don't have to use it.
>
> Below is a summary of our performance evaluation:
> - 4 physical hosts connected with 10Gbps ethernet
> - 1 tgtd process provides 1 logical unit
> - 16 VMs read 4GB iso file by dd command in parallel
> -- average time required to complete the dd command of 16 VMs is a score
> --- original tgtd: 93.718 second
> --- changed tgtd (with -T 16): 57.449 second
>
> The above scores show that this patchset can improve performance of parallel
> access of initiators. This patchset is not so heavily tested yet. I'd like to
> hear your opinion about the design.
>
> Hitoshi Mitake (5):
>   tgtd: add helper functions for checking iostate of iscsi connections
>   tgtd: decompose iscsi_[rt]x_handler()
>   tgtd: add a new option "-T" for specifying a number of threads which
>     send/recv iSCSI PDUs
>   tgtd: implement a deferred event modification mechanism
>   tgtd: offload iSCSI PDU send/recv to worker threads
>
>  usr/iscsi/iscsi_tcp.c |  405 +++++++++++++++++++++++++++++++++++++++++++++++--
>  usr/iscsi/iscsid.c    |  101 ++++++++----
>  usr/iscsi/iscsid.h    |   11 +-
>  usr/tgtd.c            |   10 +-
>  usr/tgtd.h            |    1 +
>  5 files changed, 483 insertions(+), 45 deletions(-)
>
> --
> 1.7.10.4
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-02-19  3:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-14 11:21 [PATCH v4 RFC 0/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
2014-02-14 11:21 ` [PATCH v4 RFC 1/5] tgtd: add helper functions for checking iostate of iscsi connections Hitoshi Mitake
2014-02-14 11:21 ` [PATCH v4 RFC 2/5] tgtd: decompose iscsi_[rt]x_handler() Hitoshi Mitake
2014-02-14 11:21 ` [PATCH v4 RFC 3/5] tgtd: add a new option "-T" for specifying a number of threads which send/recv iSCSI PDUs Hitoshi Mitake
2014-02-14 11:21 ` [PATCH v4 RFC 4/5] tgtd: implement a deferred event modification mechanism Hitoshi Mitake
2014-02-14 11:21 ` [PATCH v4 RFC 5/5] tgtd: offload iSCSI PDU send/recv to worker threads Hitoshi Mitake
2014-02-19  3:11 ` [PATCH v4 RFC 0/5] " Hitoshi Mitake

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.