All of lore.kernel.org
 help / color / mirror / Atom feed
* iscsi update for 2.6.25
@ 2007-12-05  0:42 michaelc
  2007-12-05  0:42 ` [PATCH 01/23] libiscsi, iscsi_tcp: add device support michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi

This patchset was made over scsi-rc-fixes (it also applies to Linus's git
tree). It will not apply over scsi-misc, because there are some patches
which went into scsi-rc-fixes/2.6.24-rc4 and scsi-misc has not been updated
for the scsi-rc-fixes patches.

Sorry for the bigger than normal update. There are patches from Olaf and
Boaz which do the following:

- Rewrite iscsi_tcp data path (from Olaf), so we do not have two ulgy
abstractions for the send and recv path. Now we have one nice iscsi_segment
struct that works for both paths.
- Add basic iscsi spec handling we will need for Bidi support (from Boaz).
This does not add bidi support. It just adds code that is common.
- Device reset support. Some clustering software needed at least
device reset support, so I cleaned up that code and added device
logical unit reset support. I will add warm target reset support
later.
- Lots of fixes for things like shutdown, r2t leaks, and lots of clean
up from Olaf.
- sg chaining support.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/23] libiscsi, iscsi_tcp: add device support
  2007-12-05  0:42 iscsi update for 2.6.25 michaelc
@ 2007-12-05  0:42 ` michaelc
  2007-12-05  0:42   ` [PATCH 02/23] iscsi_tcp: rewrite recv path michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

This patch adds logical unit reset support. This should work for ib_iser,
but I have not finished testing that driver so it is not hooked in yet.

This patch also temporarily reverts the iscsi_tcp r2t write out patch.
That code is completely rewritten in this patchset.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |    6 -
 drivers/scsi/iscsi_tcp.c                 |  151 +++++-----
 drivers/scsi/iscsi_tcp.h                 |   34 +-
 drivers/scsi/libiscsi.c                  |  494 +++++++++++++++++-------------
 drivers/scsi/scsi_transport_iscsi.c      |    4 +-
 include/scsi/iscsi_if.h                  |    2 +
 include/scsi/iscsi_proto.h               |    2 +
 include/scsi/libiscsi.h                  |   25 +-
 8 files changed, 396 insertions(+), 322 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index bad8dac..2eadb6d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -220,12 +220,6 @@ iscsi_iser_ctask_xmit(struct iscsi_conn *conn,
 	debug_scsi("ctask deq [cid %d itt 0x%x]\n",
 		   conn->id, ctask->itt);
 
-	/*
-	 * serialize with TMF AbortTask
-	 */
-	if (ctask->mtask)
-		return error;
-
 	/* Send the cmd PDU */
 	if (!iser_ctask->command_sent) {
 		error = iser_send_command(conn, ctask);
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 57ce225..4b226b8 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -197,7 +197,7 @@ iscsi_tcp_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	if (unlikely(!sc))
 		return;
 
-	tcp_ctask->xmstate = XMSTATE_VALUE_IDLE;
+	tcp_ctask->xmstate = XMSTATE_IDLE;
 	tcp_ctask->r2t = NULL;
 }
 
@@ -369,8 +369,7 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	spin_lock(&session->lock);
 	iscsi_update_cmdsn(session, (struct iscsi_nopin*)rhdr);
 
-	if (!ctask->sc || ctask->mtask ||
-	     session->state != ISCSI_STATE_LOGGED_IN) {
+	if (!ctask->sc || session->state != ISCSI_STATE_LOGGED_IN) {
 		printk(KERN_INFO "iscsi_tcp: dropping R2T itt %d in "
 		       "recovery...\n", ctask->itt);
 		spin_unlock(&session->lock);
@@ -409,11 +408,10 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 
 	tcp_ctask->exp_datasn = r2tsn + 1;
 	__kfifo_put(tcp_ctask->r2tqueue, (void*)&r2t, sizeof(void*));
-	set_bit(XMSTATE_BIT_SOL_HDR_INIT, &tcp_ctask->xmstate);
-	list_move_tail(&ctask->running, &conn->xmitqueue);
-
-	scsi_queue_work(session->host, &conn->xmitwork);
+	tcp_ctask->xmstate |= XMSTATE_SOL_HDR_INIT;
 	conn->r2t_pdus_cnt++;
+
+	iscsi_requeue_ctask(ctask);
 	spin_unlock(&session->lock);
 
 	return 0;
@@ -1254,7 +1252,7 @@ static void iscsi_set_padding(struct iscsi_tcp_cmd_task *tcp_ctask,
 
 	tcp_ctask->pad_count = ISCSI_PAD_LEN - tcp_ctask->pad_count;
 	debug_scsi("write padding %d bytes\n", tcp_ctask->pad_count);
-	set_bit(XMSTATE_BIT_W_PAD, &tcp_ctask->xmstate);
+	tcp_ctask->xmstate |= XMSTATE_W_PAD;
 }
 
 /**
@@ -1269,7 +1267,7 @@ iscsi_tcp_cmd_init(struct iscsi_cmd_task *ctask)
 	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 
 	BUG_ON(__kfifo_len(tcp_ctask->r2tqueue));
-	tcp_ctask->xmstate = 1 << XMSTATE_BIT_CMD_HDR_INIT;
+	tcp_ctask->xmstate = XMSTATE_CMD_HDR_INIT;
 }
 
 /**
@@ -1283,10 +1281,10 @@ iscsi_tcp_cmd_init(struct iscsi_cmd_task *ctask)
  *	xmit.
  *
  *	Management xmit state machine consists of these states:
- *		XMSTATE_BIT_IMM_HDR_INIT - calculate digest of PDU Header
- *		XMSTATE_BIT_IMM_HDR      - PDU Header xmit in progress
- *		XMSTATE_BIT_IMM_DATA     - PDU Data xmit in progress
- *		XMSTATE_VALUE_IDLE       - management PDU is done
+ *		XMSTATE_IMM_HDR_INIT	- calculate digest of PDU Header
+ *		XMSTATE_IMM_HDR 	- PDU Header xmit in progress
+ *		XMSTATE_IMM_DATA 	- PDU Data xmit in progress
+ *		XMSTATE_IDLE		- management PDU is done
  **/
 static int
 iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
@@ -1297,12 +1295,12 @@ iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 	debug_scsi("mtask deq [cid %d state %x itt 0x%x]\n",
 		conn->id, tcp_mtask->xmstate, mtask->itt);
 
-	if (test_bit(XMSTATE_BIT_IMM_HDR_INIT, &tcp_mtask->xmstate)) {
+	if (tcp_mtask->xmstate & XMSTATE_IMM_HDR_INIT) {
 		iscsi_buf_init_iov(&tcp_mtask->headbuf, (char*)mtask->hdr,
 				   sizeof(struct iscsi_hdr));
 
 		if (mtask->data_count) {
-			set_bit(XMSTATE_BIT_IMM_DATA, &tcp_mtask->xmstate);
+			tcp_mtask->xmstate |= XMSTATE_IMM_DATA;
 			iscsi_buf_init_iov(&tcp_mtask->sendbuf,
 					   (char*)mtask->data,
 					   mtask->data_count);
@@ -1315,20 +1313,21 @@ iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 					(u8*)tcp_mtask->hdrext);
 
 		tcp_mtask->sent = 0;
-		clear_bit(XMSTATE_BIT_IMM_HDR_INIT, &tcp_mtask->xmstate);
-		set_bit(XMSTATE_BIT_IMM_HDR, &tcp_mtask->xmstate);
+		tcp_mtask->xmstate &= ~XMSTATE_IMM_HDR_INIT;
+		tcp_mtask->xmstate |= XMSTATE_IMM_HDR;
 	}
 
-	if (test_bit(XMSTATE_BIT_IMM_HDR, &tcp_mtask->xmstate)) {
+	if (tcp_mtask->xmstate & XMSTATE_IMM_HDR) {
 		rc = iscsi_sendhdr(conn, &tcp_mtask->headbuf,
 				   mtask->data_count);
 		if (rc)
 			return rc;
-		clear_bit(XMSTATE_BIT_IMM_HDR, &tcp_mtask->xmstate);
+		tcp_mtask->xmstate &= ~XMSTATE_IMM_HDR;
 	}
 
-	if (test_and_clear_bit(XMSTATE_BIT_IMM_DATA, &tcp_mtask->xmstate)) {
+	if (tcp_mtask->xmstate & XMSTATE_IMM_DATA) {
 		BUG_ON(!mtask->data_count);
+		tcp_mtask->xmstate &= ~XMSTATE_IMM_DATA;
 		/* FIXME: implement.
 		 * Virtual buffer could be spreaded across multiple pages...
 		 */
@@ -1338,13 +1337,13 @@ iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 			rc = iscsi_sendpage(conn, &tcp_mtask->sendbuf,
 					&mtask->data_count, &tcp_mtask->sent);
 			if (rc) {
-				set_bit(XMSTATE_BIT_IMM_DATA, &tcp_mtask->xmstate);
+				tcp_mtask->xmstate |= XMSTATE_IMM_DATA;
 				return rc;
 			}
 		} while (mtask->data_count);
 	}
 
-	BUG_ON(tcp_mtask->xmstate != XMSTATE_VALUE_IDLE);
+	BUG_ON(tcp_mtask->xmstate != XMSTATE_IDLE);
 	if (mtask->hdr->itt == RESERVED_ITT) {
 		struct iscsi_session *session = conn->session;
 
@@ -1364,7 +1363,7 @@ iscsi_send_cmd_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 	int rc = 0;
 
-	if (test_bit(XMSTATE_BIT_CMD_HDR_INIT, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_CMD_HDR_INIT) {
 		tcp_ctask->sent = 0;
 		tcp_ctask->sg_count = 0;
 		tcp_ctask->exp_datasn = 0;
@@ -1389,21 +1388,21 @@ iscsi_send_cmd_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 		if (conn->hdrdgst_en)
 			iscsi_hdr_digest(conn, &tcp_ctask->headbuf,
 					 (u8*)tcp_ctask->hdrext);
-		clear_bit(XMSTATE_BIT_CMD_HDR_INIT, &tcp_ctask->xmstate);
-		set_bit(XMSTATE_BIT_CMD_HDR_XMIT, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_CMD_HDR_INIT;
+		tcp_ctask->xmstate |= XMSTATE_CMD_HDR_XMIT;
 	}
 
-	if (test_bit(XMSTATE_BIT_CMD_HDR_XMIT, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_CMD_HDR_XMIT) {
 		rc = iscsi_sendhdr(conn, &tcp_ctask->headbuf, ctask->imm_count);
 		if (rc)
 			return rc;
-		clear_bit(XMSTATE_BIT_CMD_HDR_XMIT, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_CMD_HDR_XMIT;
 
 		if (sc->sc_data_direction != DMA_TO_DEVICE)
 			return 0;
 
 		if (ctask->imm_count) {
-			set_bit(XMSTATE_BIT_IMM_DATA, &tcp_ctask->xmstate);
+			tcp_ctask->xmstate |= XMSTATE_IMM_DATA;
 			iscsi_set_padding(tcp_ctask, ctask->imm_count);
 
 			if (ctask->conn->datadgst_en) {
@@ -1413,10 +1412,9 @@ iscsi_send_cmd_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 			}
 		}
 
-		if (ctask->unsol_count) {
-			set_bit(XMSTATE_BIT_UNS_HDR, &tcp_ctask->xmstate);
-			set_bit(XMSTATE_BIT_UNS_INIT, &tcp_ctask->xmstate);
-		}
+		if (ctask->unsol_count)
+			tcp_ctask->xmstate |=
+					XMSTATE_UNS_HDR | XMSTATE_UNS_INIT;
 	}
 	return rc;
 }
@@ -1428,25 +1426,25 @@ iscsi_send_padding(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	int sent = 0, rc;
 
-	if (test_bit(XMSTATE_BIT_W_PAD, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_W_PAD) {
 		iscsi_buf_init_iov(&tcp_ctask->sendbuf, (char*)&tcp_ctask->pad,
 				   tcp_ctask->pad_count);
 		if (conn->datadgst_en)
 			crypto_hash_update(&tcp_conn->tx_hash,
 					   &tcp_ctask->sendbuf.sg,
 					   tcp_ctask->sendbuf.sg.length);
-	} else if (!test_bit(XMSTATE_BIT_W_RESEND_PAD, &tcp_ctask->xmstate))
+	} else if (!(tcp_ctask->xmstate & XMSTATE_W_RESEND_PAD))
 		return 0;
 
-	clear_bit(XMSTATE_BIT_W_PAD, &tcp_ctask->xmstate);
-	clear_bit(XMSTATE_BIT_W_RESEND_PAD, &tcp_ctask->xmstate);
+	tcp_ctask->xmstate &= ~XMSTATE_W_PAD;
+	tcp_ctask->xmstate &= ~XMSTATE_W_RESEND_PAD;
 	debug_scsi("sending %d pad bytes for itt 0x%x\n",
 		   tcp_ctask->pad_count, ctask->itt);
 	rc = iscsi_sendpage(conn, &tcp_ctask->sendbuf, &tcp_ctask->pad_count,
 			   &sent);
 	if (rc) {
 		debug_scsi("padding send failed %d\n", rc);
-		set_bit(XMSTATE_BIT_W_RESEND_PAD, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate |= XMSTATE_W_RESEND_PAD;
 	}
 	return rc;
 }
@@ -1465,11 +1463,11 @@ iscsi_send_digest(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 	tcp_ctask = ctask->dd_data;
 	tcp_conn = conn->dd_data;
 
-	if (!test_bit(XMSTATE_BIT_W_RESEND_DATA_DIGEST, &tcp_ctask->xmstate)) {
+	if (!(tcp_ctask->xmstate & XMSTATE_W_RESEND_DATA_DIGEST)) {
 		crypto_hash_final(&tcp_conn->tx_hash, (u8*)digest);
 		iscsi_buf_init_iov(buf, (char*)digest, 4);
 	}
-	clear_bit(XMSTATE_BIT_W_RESEND_DATA_DIGEST, &tcp_ctask->xmstate);
+	tcp_ctask->xmstate &= ~XMSTATE_W_RESEND_DATA_DIGEST;
 
 	rc = iscsi_sendpage(conn, buf, &tcp_ctask->digest_count, &sent);
 	if (!rc)
@@ -1478,7 +1476,7 @@ iscsi_send_digest(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 	else {
 		debug_scsi("sending digest 0x%x failed for itt 0x%x!\n",
 			  *digest, ctask->itt);
-		set_bit(XMSTATE_BIT_W_RESEND_DATA_DIGEST, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate |= XMSTATE_W_RESEND_DATA_DIGEST;
 	}
 	return rc;
 }
@@ -1526,8 +1524,8 @@ iscsi_send_unsol_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	struct iscsi_data_task *dtask;
 	int rc;
 
-	set_bit(XMSTATE_BIT_UNS_DATA, &tcp_ctask->xmstate);
-	if (test_bit(XMSTATE_BIT_UNS_INIT, &tcp_ctask->xmstate)) {
+	tcp_ctask->xmstate |= XMSTATE_UNS_DATA;
+	if (tcp_ctask->xmstate & XMSTATE_UNS_INIT) {
 		dtask = &tcp_ctask->unsol_dtask;
 
 		iscsi_prep_unsolicit_data_pdu(ctask, &dtask->hdr);
@@ -1537,14 +1535,14 @@ iscsi_send_unsol_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 			iscsi_hdr_digest(conn, &tcp_ctask->headbuf,
 					(u8*)dtask->hdrext);
 
-		clear_bit(XMSTATE_BIT_UNS_INIT, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_UNS_INIT;
 		iscsi_set_padding(tcp_ctask, ctask->data_count);
 	}
 
 	rc = iscsi_sendhdr(conn, &tcp_ctask->headbuf, ctask->data_count);
 	if (rc) {
-		clear_bit(XMSTATE_BIT_UNS_DATA, &tcp_ctask->xmstate);
-		set_bit(XMSTATE_BIT_UNS_HDR, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_UNS_DATA;
+		tcp_ctask->xmstate |= XMSTATE_UNS_HDR;
 		return rc;
 	}
 
@@ -1565,15 +1563,16 @@ iscsi_send_unsol_pdu(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 	int rc;
 
-	if (test_and_clear_bit(XMSTATE_BIT_UNS_HDR, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_UNS_HDR) {
 		BUG_ON(!ctask->unsol_count);
+		tcp_ctask->xmstate &= ~XMSTATE_UNS_HDR;
 send_hdr:
 		rc = iscsi_send_unsol_hdr(conn, ctask);
 		if (rc)
 			return rc;
 	}
 
-	if (test_bit(XMSTATE_BIT_UNS_DATA, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_UNS_DATA) {
 		struct iscsi_data_task *dtask = &tcp_ctask->unsol_dtask;
 		int start = tcp_ctask->sent;
 
@@ -1583,14 +1582,14 @@ send_hdr:
 		ctask->unsol_count -= tcp_ctask->sent - start;
 		if (rc)
 			return rc;
-		clear_bit(XMSTATE_BIT_UNS_DATA, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_UNS_DATA;
 		/*
 		 * Done with the Data-Out. Next, check if we need
 		 * to send another unsolicited Data-Out.
 		 */
 		if (ctask->unsol_count) {
 			debug_scsi("sending more uns\n");
-			set_bit(XMSTATE_BIT_UNS_INIT, &tcp_ctask->xmstate);
+			tcp_ctask->xmstate |= XMSTATE_UNS_INIT;
 			goto send_hdr;
 		}
 	}
@@ -1606,7 +1605,7 @@ static int iscsi_send_sol_pdu(struct iscsi_conn *conn,
 	struct iscsi_data_task *dtask;
 	int left, rc;
 
-	if (test_bit(XMSTATE_BIT_SOL_HDR_INIT, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_SOL_HDR_INIT) {
 		if (!tcp_ctask->r2t) {
 			spin_lock_bh(&session->lock);
 			__kfifo_get(tcp_ctask->r2tqueue, (void*)&tcp_ctask->r2t,
@@ -1620,19 +1619,19 @@ send_hdr:
 		if (conn->hdrdgst_en)
 			iscsi_hdr_digest(conn, &r2t->headbuf,
 					(u8*)dtask->hdrext);
-		clear_bit(XMSTATE_BIT_SOL_HDR_INIT, &tcp_ctask->xmstate);
-		set_bit(XMSTATE_BIT_SOL_HDR, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR_INIT;
+		tcp_ctask->xmstate |= XMSTATE_SOL_HDR;
 	}
 
-	if (test_bit(XMSTATE_BIT_SOL_HDR, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_SOL_HDR) {
 		r2t = tcp_ctask->r2t;
 		dtask = &r2t->dtask;
 
 		rc = iscsi_sendhdr(conn, &r2t->headbuf, r2t->data_count);
 		if (rc)
 			return rc;
-		clear_bit(XMSTATE_BIT_SOL_HDR, &tcp_ctask->xmstate);
-		set_bit(XMSTATE_BIT_SOL_DATA, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR;
+		tcp_ctask->xmstate |= XMSTATE_SOL_DATA;
 
 		if (conn->datadgst_en) {
 			iscsi_data_digest_init(conn->dd_data, tcp_ctask);
@@ -1645,7 +1644,7 @@ send_hdr:
 			r2t->sent);
 	}
 
-	if (test_bit(XMSTATE_BIT_SOL_DATA, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_SOL_DATA) {
 		r2t = tcp_ctask->r2t;
 		dtask = &r2t->dtask;
 
@@ -1654,7 +1653,7 @@ send_hdr:
 				     &dtask->digestbuf, &dtask->digest);
 		if (rc)
 			return rc;
-		clear_bit(XMSTATE_BIT_SOL_DATA, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_SOL_DATA;
 
 		/*
 		 * Done with this Data-Out. Next, check if we have
@@ -1699,32 +1698,32 @@ send_hdr:
  *	xmit stages.
  *
  *iscsi_send_cmd_hdr()
- *	XMSTATE_BIT_CMD_HDR_INIT - prepare Header and Data buffers Calculate
- *	                           Header Digest
- *	XMSTATE_BIT_CMD_HDR_XMIT - Transmit header in progress
+ *	XMSTATE_CMD_HDR_INIT - prepare Header and Data buffers Calculate
+ *	                       Header Digest
+ *	XMSTATE_CMD_HDR_XMIT - Transmit header in progress
  *
  *iscsi_send_padding
- *	XMSTATE_BIT_W_PAD        - Prepare and send pading
- *	XMSTATE_BIT_W_RESEND_PAD - retry send pading
+ *	XMSTATE_W_PAD        - Prepare and send pading
+ *	XMSTATE_W_RESEND_PAD - retry send pading
  *
  *iscsi_send_digest
- *	XMSTATE_BIT_W_RESEND_DATA_DIGEST - Finalize and send Data Digest
- *	XMSTATE_BIT_W_RESEND_DATA_DIGEST - retry sending digest
+ *	XMSTATE_W_RESEND_DATA_DIGEST - Finalize and send Data Digest
+ *	XMSTATE_W_RESEND_DATA_DIGEST - retry sending digest
  *
  *iscsi_send_unsol_hdr
- *	XMSTATE_BIT_UNS_INIT     - prepare un-solicit data header and digest
- *	XMSTATE_BIT_UNS_HDR      - send un-solicit header
+ *	XMSTATE_UNS_INIT     - prepare un-solicit data header and digest
+ *	XMSTATE_UNS_HDR      - send un-solicit header
  *
  *iscsi_send_unsol_pdu
- *	XMSTATE_BIT_UNS_DATA     - send un-solicit data in progress
+ *	XMSTATE_UNS_DATA     - send un-solicit data in progress
  *
  *iscsi_send_sol_pdu
- *	XMSTATE_BIT_SOL_HDR_INIT - solicit data header and digest initialize
- *	XMSTATE_BIT_SOL_HDR      - send solicit header
- *	XMSTATE_BIT_SOL_DATA     - send solicit data
+ *	XMSTATE_SOL_HDR_INIT - solicit data header and digest initialize
+ *	XMSTATE_SOL_HDR      - send solicit header
+ *	XMSTATE_SOL_DATA     - send solicit data
  *
  *iscsi_tcp_ctask_xmit
- *	XMSTATE_BIT_IMM_DATA     - xmit managment data (??)
+ *	XMSTATE_IMM_DATA     - xmit managment data (??)
  **/
 static int
 iscsi_tcp_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
@@ -1741,13 +1740,13 @@ iscsi_tcp_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	if (ctask->sc->sc_data_direction != DMA_TO_DEVICE)
 		return 0;
 
-	if (test_bit(XMSTATE_BIT_IMM_DATA, &tcp_ctask->xmstate)) {
+	if (tcp_ctask->xmstate & XMSTATE_IMM_DATA) {
 		rc = iscsi_send_data(ctask, &tcp_ctask->sendbuf, &tcp_ctask->sg,
 				     &tcp_ctask->sent, &ctask->imm_count,
 				     &tcp_ctask->immbuf, &tcp_ctask->immdigest);
 		if (rc)
 			return rc;
-		clear_bit(XMSTATE_BIT_IMM_DATA, &tcp_ctask->xmstate);
+		tcp_ctask->xmstate &= ~XMSTATE_IMM_DATA;
 	}
 
 	rc = iscsi_send_unsol_pdu(conn, ctask);
@@ -1980,7 +1979,7 @@ static void
 iscsi_tcp_mgmt_init(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 {
 	struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data;
-	tcp_mtask->xmstate = 1 << XMSTATE_BIT_IMM_HDR_INIT;
+	tcp_mtask->xmstate = XMSTATE_IMM_HDR_INIT;
 }
 
 static int
@@ -2226,6 +2225,7 @@ static struct scsi_host_template iscsi_sht = {
 	.max_sectors		= 0xFFFF,
 	.cmd_per_lun		= ISCSI_DEF_CMD_PER_LUN,
 	.eh_abort_handler       = iscsi_eh_abort,
+	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_host_reset_handler	= iscsi_eh_host_reset,
 	.use_clustering         = DISABLE_CLUSTERING,
 	.slave_configure        = iscsi_tcp_slave_configure,
@@ -2257,7 +2257,8 @@ static struct iscsi_transport iscsi_tcp_transport = {
 				  ISCSI_PERSISTENT_ADDRESS |
 				  ISCSI_TARGET_NAME | ISCSI_TPGT |
 				  ISCSI_USERNAME | ISCSI_PASSWORD |
-				  ISCSI_USERNAME_IN | ISCSI_PASSWORD_IN,
+				  ISCSI_USERNAME_IN | ISCSI_PASSWORD_IN |
+				  ISCSI_FAST_ABORT,
 	.host_param_mask	= ISCSI_HOST_HWADDRESS | ISCSI_HOST_IPADDRESS |
 				  ISCSI_HOST_INITIATOR_NAME |
 				  ISCSI_HOST_NETDEV_NAME,
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index 68c36cc..7eba44d 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -32,21 +32,21 @@
 #define IN_PROGRESS_PAD_RECV		0x4
 
 /* xmit state machine */
-#define XMSTATE_VALUE_IDLE			0
-#define XMSTATE_BIT_CMD_HDR_INIT		0
-#define XMSTATE_BIT_CMD_HDR_XMIT		1
-#define XMSTATE_BIT_IMM_HDR			2
-#define XMSTATE_BIT_IMM_DATA			3
-#define XMSTATE_BIT_UNS_INIT			4
-#define XMSTATE_BIT_UNS_HDR			5
-#define XMSTATE_BIT_UNS_DATA			6
-#define XMSTATE_BIT_SOL_HDR			7
-#define XMSTATE_BIT_SOL_DATA			8
-#define XMSTATE_BIT_W_PAD			9
-#define XMSTATE_BIT_W_RESEND_PAD		10
-#define XMSTATE_BIT_W_RESEND_DATA_DIGEST	11
-#define XMSTATE_BIT_IMM_HDR_INIT		12
-#define XMSTATE_BIT_SOL_HDR_INIT		13
+#define XMSTATE_IDLE			0x0
+#define XMSTATE_CMD_HDR_INIT		0x1
+#define XMSTATE_CMD_HDR_XMIT		0x2
+#define XMSTATE_IMM_HDR			0x4
+#define XMSTATE_IMM_DATA		0x8
+#define XMSTATE_UNS_INIT		0x10
+#define XMSTATE_UNS_HDR			0x20
+#define XMSTATE_UNS_DATA		0x40
+#define XMSTATE_SOL_HDR			0x80
+#define XMSTATE_SOL_DATA		0x100
+#define XMSTATE_W_PAD			0x200
+#define XMSTATE_W_RESEND_PAD		0x400
+#define XMSTATE_W_RESEND_DATA_DIGEST	0x800
+#define XMSTATE_IMM_HDR_INIT		0x1000
+#define XMSTATE_SOL_HDR_INIT		0x2000
 
 #define ISCSI_PAD_LEN			4
 #define ISCSI_SG_TABLESIZE		SG_ALL
@@ -122,7 +122,7 @@ struct iscsi_data_task {
 struct iscsi_tcp_mgmt_task {
 	struct iscsi_hdr	hdr;
 	char			hdrext[sizeof(__u32)]; /* Header-Digest */
-	unsigned long		xmstate;	/* mgmt xmit progress */
+	int			xmstate;	/* mgmt xmit progress */
 	struct iscsi_buf	headbuf;	/* header buffer */
 	struct iscsi_buf	sendbuf;	/* in progress buffer */
 	int			sent;
@@ -150,7 +150,7 @@ struct iscsi_tcp_cmd_task {
 	int			pad_count;		/* padded bytes */
 	struct iscsi_buf	headbuf;		/* header buf (xmit) */
 	struct iscsi_buf	sendbuf;		/* in progress buffer*/
-	unsigned long		xmstate;		/* xmit xtate machine */
+	int			xmstate;		/* xmit xtate machine */
 	int			sent;
 	struct scatterlist	*sg;			/* per-cmd SG list  */
 	struct scatterlist	*bad_sg;		/* assert statement */
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 8b57af5..176458f 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -86,7 +86,7 @@ iscsi_update_cmdsn(struct iscsi_session *session, struct iscsi_nopin *hdr)
 		 * xmit thread
 		 */
 		if (!list_empty(&session->leadconn->xmitqueue) ||
-		    __kfifo_len(session->leadconn->mgmtqueue))
+		    !list_empty(&session->leadconn->mgmtqueue))
 			scsi_queue_work(session->host,
 					&session->leadconn->xmitwork);
 	}
@@ -318,15 +318,15 @@ static void iscsi_tmf_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 	conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1;
 	conn->tmfrsp_pdus_cnt++;
 
-	if (conn->tmabort_state != TMABORT_INITIAL)
+	if (conn->tmf_state != TMF_QUEUED)
 		return;
 
 	if (tmf->response == ISCSI_TMF_RSP_COMPLETE)
-		conn->tmabort_state = TMABORT_SUCCESS;
+		conn->tmf_state = TMF_SUCCESS;
 	else if (tmf->response == ISCSI_TMF_RSP_NO_TASK)
-		conn->tmabort_state = TMABORT_NOT_FOUND;
+		conn->tmf_state = TMF_NOT_FOUND;
 	else
-		conn->tmabort_state = TMABORT_FAILED;
+		conn->tmf_state = TMF_FAILED;
 	wake_up(&conn->ehwait);
 }
 
@@ -429,7 +429,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			 */
 			if (iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen))
 				rc = ISCSI_ERR_CONN_FAILED;
-			list_del(&mtask->running);
+			list_del_init(&mtask->running);
 			if (conn->login_mtask != mtask)
 				__kfifo_put(session->mgmtpool.queue,
 					    (void*)&mtask, sizeof(void*));
@@ -451,10 +451,9 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 
 			if (iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen))
 				rc = ISCSI_ERR_CONN_FAILED;
-			list_del(&mtask->running);
-			if (conn->login_mtask != mtask)
-				__kfifo_put(session->mgmtpool.queue,
-					    (void*)&mtask, sizeof(void*));
+			list_del_init(&mtask->running);
+			__kfifo_put(session->mgmtpool.queue,
+				    (void*)&mtask, sizeof(void*));
 			break;
 		default:
 			rc = ISCSI_ERR_BAD_OPCODE;
@@ -609,7 +608,8 @@ static void iscsi_prep_mtask(struct iscsi_conn *conn,
 		session->tt->init_mgmt_task(conn, mtask);
 
 	debug_scsi("mgmtpdu [op 0x%x hdr->itt 0x%x datalen %d]\n",
-		   hdr->opcode, hdr->itt, mtask->data_count);
+		   hdr->opcode & ISCSI_OPCODE_MASK, hdr->itt,
+		   mtask->data_count);
 }
 
 static int iscsi_xmit_mtask(struct iscsi_conn *conn)
@@ -658,21 +658,13 @@ static int iscsi_check_cmdsn_window_closed(struct iscsi_conn *conn)
 static int iscsi_xmit_ctask(struct iscsi_conn *conn)
 {
 	struct iscsi_cmd_task *ctask = conn->ctask;
-	int rc = 0;
-
-	/*
-	 * serialize with TMF AbortTask
-	 */
-	if (ctask->state == ISCSI_TASK_ABORTING)
-		goto done;
+	int rc;
 
 	__iscsi_get_ctask(ctask);
 	spin_unlock_bh(&conn->session->lock);
 	rc = conn->session->tt->xmit_cmd_task(conn, ctask);
 	spin_lock_bh(&conn->session->lock);
 	__iscsi_put_ctask(ctask);
-
-done:
 	if (!rc)
 		/* done with this ctask */
 		conn->ctask = NULL;
@@ -680,6 +672,22 @@ done:
 }
 
 /**
+ * iscsi_requeue_ctask - requeue ctask to run from session workqueue
+ * @ctask: ctask to requeue
+ *
+ * LLDs that need to run a ctask from the session workqueue should call
+ * this. The session lock must be held.
+ */
+void iscsi_requeue_ctask(struct iscsi_cmd_task *ctask)
+{
+	struct iscsi_conn *conn = ctask->conn;
+
+	list_move_tail(&ctask->running, &conn->requeue);
+	scsi_queue_work(conn->session->host, &conn->xmitwork);
+}
+EXPORT_SYMBOL_GPL(iscsi_requeue_ctask);
+
+/**
  * iscsi_data_xmit - xmit any command into the scheduled connection
  * @conn: iscsi connection
  *
@@ -717,36 +725,27 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
 	 * overflow us with nop-ins
 	 */
 check_mgmt:
-	while (__kfifo_get(conn->mgmtqueue, (void*)&conn->mtask,
-			   sizeof(void*))) {
+	while (!list_empty(&conn->mgmtqueue)) {
+		conn->mtask = list_entry(conn->mgmtqueue.next,
+					 struct iscsi_mgmt_task, running);
 		iscsi_prep_mtask(conn, conn->mtask);
-		list_add_tail(&conn->mtask->running, &conn->mgmt_run_list);
+		list_move_tail(conn->mgmtqueue.next, &conn->mgmt_run_list);
 		rc = iscsi_xmit_mtask(conn);
 		if (rc)
 			goto again;
 	}
 
-	/* process command queue */
+	/* process pending command queue */
 	while (!list_empty(&conn->xmitqueue)) {
-		/*
-		 * iscsi tcp may readd the task to the xmitqueue to send
-		 * write data
-		 */
+		if (conn->tmf_state == TMF_QUEUED)
+			break;
+
 		conn->ctask = list_entry(conn->xmitqueue.next,
 					 struct iscsi_cmd_task, running);
-		switch (conn->ctask->state) {
-		case ISCSI_TASK_ABORTING:
-			break;
-		case ISCSI_TASK_PENDING:
-			iscsi_prep_scsi_cmd_pdu(conn->ctask);
-			conn->session->tt->init_cmd_task(conn->ctask);
-			/* fall through */
-		default:
-			conn->ctask->state = ISCSI_TASK_RUNNING;
-			break;
-		}
+		iscsi_prep_scsi_cmd_pdu(conn->ctask);
+		conn->session->tt->init_cmd_task(conn->ctask);
+		conn->ctask->state = ISCSI_TASK_RUNNING;
 		list_move_tail(conn->xmitqueue.next, &conn->run_list);
-
 		rc = iscsi_xmit_ctask(conn);
 		if (rc)
 			goto again;
@@ -755,7 +754,22 @@ check_mgmt:
 		 * we need to check the mgmt queue for nops that need to
 		 * be sent to aviod starvation
 		 */
-		if (__kfifo_len(conn->mgmtqueue))
+		if (!list_empty(&conn->mgmtqueue))
+			goto check_mgmt;
+	}
+
+	while (!list_empty(&conn->requeue)) {
+		if (conn->session->fast_abort && conn->tmf_state != TMF_INITIAL)
+			break;
+
+		conn->ctask = list_entry(conn->requeue.next,
+					 struct iscsi_cmd_task, running);
+		conn->ctask->state = ISCSI_TASK_RUNNING;
+		list_move_tail(conn->requeue.next, &conn->run_list);
+		rc = iscsi_xmit_ctask(conn);
+		if (rc)
+			goto again;
+		if (!list_empty(&conn->mgmtqueue))
 			goto check_mgmt;
 	}
 	spin_unlock_bh(&conn->session->lock);
@@ -859,7 +873,6 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 
 	atomic_set(&ctask->refcount, 1);
 	ctask->state = ISCSI_TASK_PENDING;
-	ctask->mtask = NULL;
 	ctask->conn = conn;
 	ctask->sc = sc;
 	INIT_LIST_HEAD(&ctask->running);
@@ -929,9 +942,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 	} else
 		mtask->data_count = 0;
 
-	INIT_LIST_HEAD(&mtask->running);
 	memcpy(mtask->hdr, hdr, sizeof(struct iscsi_hdr));
-	__kfifo_put(conn->mgmtqueue, (void*)&mtask, sizeof(void*));
+	INIT_LIST_HEAD(&mtask->running);
+	list_add_tail(&mtask->running, &conn->mgmtqueue);
 	return mtask;
 }
 
@@ -954,13 +967,12 @@ EXPORT_SYMBOL_GPL(iscsi_conn_send_pdu);
 void iscsi_session_recovery_timedout(struct iscsi_cls_session *cls_session)
 {
 	struct iscsi_session *session = class_to_transport_session(cls_session);
-	struct iscsi_conn *conn = session->leadconn;
 
 	spin_lock_bh(&session->lock);
 	if (session->state != ISCSI_STATE_LOGGED_IN) {
 		session->state = ISCSI_STATE_RECOVERY_FAILED;
-		if (conn)
-			wake_up(&conn->ehwait);
+		if (session->leadconn)
+			wake_up(&session->leadconn->ehwait);
 	}
 	spin_unlock_bh(&session->lock);
 }
@@ -971,7 +983,6 @@ int iscsi_eh_host_reset(struct scsi_cmnd *sc)
 	struct Scsi_Host *host = sc->device->host;
 	struct iscsi_session *session = iscsi_hostdata(host->hostdata);
 	struct iscsi_conn *conn = session->leadconn;
-	int fail_session = 0;
 
 	spin_lock_bh(&session->lock);
 	if (session->state == ISCSI_STATE_TERMINATE) {
@@ -982,19 +993,13 @@ failed:
 		return FAILED;
 	}
 
-	if (sc->SCp.phase == session->age) {
-		debug_scsi("failing connection CID %d due to SCSI host reset\n",
-			   conn->id);
-		fail_session = 1;
-	}
 	spin_unlock_bh(&session->lock);
 
 	/*
 	 * we drop the lock here but the leadconn cannot be destoyed while
 	 * we are in the scsi eh
 	 */
-	if (fail_session)
-		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
 
 	debug_scsi("iscsi_eh_host_reset wait for relogin\n");
 	wait_event_interruptible(conn->ehwait,
@@ -1015,62 +1020,43 @@ failed:
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_host_reset);
 
-static void iscsi_tmabort_timedout(unsigned long data)
+static void iscsi_tmf_timedout(unsigned long data)
 {
-	struct iscsi_cmd_task *ctask = (struct iscsi_cmd_task *)data;
-	struct iscsi_conn *conn = ctask->conn;
+	struct iscsi_conn *conn = (struct iscsi_conn *)data;
 	struct iscsi_session *session = conn->session;
 
 	spin_lock(&session->lock);
-	if (conn->tmabort_state == TMABORT_INITIAL) {
-		conn->tmabort_state = TMABORT_TIMEDOUT;
-		debug_scsi("tmabort timedout [sc %p itt 0x%x]\n",
-			ctask->sc, ctask->itt);
+	if (conn->tmf_state == TMF_QUEUED) {
+		conn->tmf_state = TMF_TIMEDOUT;
+		debug_scsi("tmf timedout\n");
 		/* unblock eh_abort() */
 		wake_up(&conn->ehwait);
 	}
 	spin_unlock(&session->lock);
 }
 
-static int iscsi_exec_abort_task(struct scsi_cmnd *sc,
-				 struct iscsi_cmd_task *ctask)
+static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
+				   struct iscsi_tm *hdr, int age)
 {
-	struct iscsi_conn *conn = ctask->conn;
 	struct iscsi_session *session = conn->session;
-	struct iscsi_tm *hdr = &conn->tmhdr;
-
-	/*
-	 * ctask timed out but session is OK requests must be serialized.
-	 */
-	memset(hdr, 0, sizeof(struct iscsi_tm));
-	hdr->opcode = ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE;
-	hdr->flags = ISCSI_TM_FUNC_ABORT_TASK;
-	hdr->flags |= ISCSI_FLAG_CMD_FINAL;
-	memcpy(hdr->lun, ctask->hdr->lun, sizeof(hdr->lun));
-	hdr->rtt = ctask->hdr->itt;
-	hdr->refcmdsn = ctask->hdr->cmdsn;
+	struct iscsi_mgmt_task *mtask;
 
-	ctask->mtask = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)hdr,
-					    NULL, 0);
-	if (!ctask->mtask) {
+	mtask = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)hdr,
+				      NULL, 0);
+	if (!mtask) {
 		spin_unlock_bh(&session->lock);
 		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
-		spin_lock_bh(&session->lock)
-		debug_scsi("abort sent failure [itt 0x%x]\n", ctask->itt);
+		spin_lock_bh(&session->lock);
+		debug_scsi("tmf exec failure\n");
 		return -EPERM;
 	}
-	ctask->state = ISCSI_TASK_ABORTING;
-
-	debug_scsi("abort sent [itt 0x%x]\n", ctask->itt);
+	conn->tmfcmd_pdus_cnt++;
+	conn->tmf_timer.expires = 30 * HZ + jiffies;
+	conn->tmf_timer.function = iscsi_tmf_timedout;
+	conn->tmf_timer.data = (unsigned long)conn;
+	add_timer(&conn->tmf_timer);
+	debug_scsi("tmf set timeout\n");
 
-	if (conn->tmabort_state == TMABORT_INITIAL) {
-		conn->tmfcmd_pdus_cnt++;
-		conn->tmabort_timer.expires = 20*HZ + jiffies;
-		conn->tmabort_timer.function = iscsi_tmabort_timedout;
-		conn->tmabort_timer.data = (unsigned long)ctask;
-		add_timer(&conn->tmabort_timer);
-		debug_scsi("abort set timeout [itt 0x%x]\n", ctask->itt);
-	}
 	spin_unlock_bh(&session->lock);
 	mutex_unlock(&session->eh_mutex);
 	scsi_queue_work(session->host, &conn->xmitwork);
@@ -1078,61 +1064,30 @@ static int iscsi_exec_abort_task(struct scsi_cmnd *sc,
 	/*
 	 * block eh thread until:
 	 *
-	 * 1) abort response
-	 * 2) abort timeout
+	 * 1) tmf response
+	 * 2) tmf timeout
 	 * 3) session is terminated or restarted or userspace has
 	 * given up on recovery
 	 */
-	wait_event_interruptible(conn->ehwait,
-				 sc->SCp.phase != session->age ||
+	wait_event_interruptible(conn->ehwait, age != session->age ||
 				 session->state != ISCSI_STATE_LOGGED_IN ||
-				 conn->tmabort_state != TMABORT_INITIAL);
+				 conn->tmf_state != TMF_QUEUED);
 	if (signal_pending(current))
 		flush_signals(current);
-	del_timer_sync(&conn->tmabort_timer);
+	del_timer_sync(&conn->tmf_timer);
+
 	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->lock);
-	return 0;
-}
-
-/*
- * session lock must be held
- */
-static struct iscsi_mgmt_task *
-iscsi_remove_mgmt_task(struct kfifo *fifo, uint32_t itt)
-{
-	int i, nr_tasks = __kfifo_len(fifo) / sizeof(void*);
-	struct iscsi_mgmt_task *task;
-
-	debug_scsi("searching %d tasks\n", nr_tasks);
-
-	for (i = 0; i < nr_tasks; i++) {
-		__kfifo_get(fifo, (void*)&task, sizeof(void*));
-		debug_scsi("check task %u\n", task->itt);
-
-		if (task->itt == itt) {
-			debug_scsi("matched task\n");
-			return task;
-		}
+	/* if the session drops it will clean up the mtask */
+	if (age != session->age ||
+	    session->state != ISCSI_STATE_LOGGED_IN)
+		return -ENOTCONN;
 
-		__kfifo_put(fifo, (void*)&task, sizeof(void*));
+	if (!list_empty(&mtask->running)) {
+		list_del_init(&mtask->running);
+		__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
+			    sizeof(void*));
 	}
-	return NULL;
-}
-
-static int iscsi_ctask_mtask_cleanup(struct iscsi_cmd_task *ctask)
-{
-	struct iscsi_conn *conn = ctask->conn;
-	struct iscsi_session *session = conn->session;
-
-	if (!ctask->mtask)
-		return -EINVAL;
-
-	if (!iscsi_remove_mgmt_task(conn->mgmtqueue, ctask->mtask->itt))
-		list_del(&ctask->mtask->running);
-	__kfifo_put(session->mgmtpool.queue, (void*)&ctask->mtask,
-		    sizeof(void*));
-	ctask->mtask = NULL;
 	return 0;
 }
 
@@ -1156,7 +1111,6 @@ static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 		conn->session->queued_cmdsn--;
 	else
 		conn->session->tt->cleanup_cmd_task(conn, ctask);
-	iscsi_ctask_mtask_cleanup(ctask);
 
 	sc->result = err;
 	scsi_set_resid(sc, scsi_bufflen(sc));
@@ -1166,6 +1120,44 @@ static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 	__iscsi_put_ctask(ctask);
 }
 
+/*
+ * Fail commands. session lock held and recv side suspended and xmit
+ * thread flushed
+ */
+static void fail_all_commands(struct iscsi_conn *conn, unsigned lun)
+{
+	struct iscsi_cmd_task *ctask, *tmp;
+
+	if (conn->ctask && (conn->ctask->sc->device->lun == lun || lun == -1))
+		conn->ctask = NULL;
+
+	/* flush pending */
+	list_for_each_entry_safe(ctask, tmp, &conn->xmitqueue, running) {
+		if (lun == ctask->sc->device->lun || lun == -1) {
+			debug_scsi("failing pending sc %p itt 0x%x\n",
+				   ctask->sc, ctask->itt);
+			fail_command(conn, ctask, DID_BUS_BUSY << 16);
+		}
+	}
+
+	list_for_each_entry_safe(ctask, tmp, &conn->requeue, running) {
+		if (lun == ctask->sc->device->lun || lun == -1) {
+			debug_scsi("failing requeued sc %p itt 0x%x\n",
+				   ctask->sc, ctask->itt);
+			fail_command(conn, ctask, DID_BUS_BUSY << 16);
+		}
+	}
+
+	/* fail all other running */
+	list_for_each_entry_safe(ctask, tmp, &conn->run_list, running) {
+		if (lun == ctask->sc->device->lun || lun == -1) {
+			debug_scsi("failing in progress sc %p itt 0x%x\n",
+				   ctask->sc, ctask->itt);
+			fail_command(conn, ctask, DID_BUS_BUSY << 16);
+		}
+	}
+}
+
 static void iscsi_suspend_tx(struct iscsi_conn *conn)
 {
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
@@ -1178,13 +1170,26 @@ static void iscsi_start_tx(struct iscsi_conn *conn)
 	scsi_queue_work(conn->session->host, &conn->xmitwork);
 }
 
+static void iscsi_prep_abort_task_pdu(struct iscsi_cmd_task *ctask,
+				      struct iscsi_tm *hdr)
+{
+	memset(hdr, 0, sizeof(*hdr));
+	hdr->opcode = ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE;
+	hdr->flags = ISCSI_TM_FUNC_ABORT_TASK & ISCSI_FLAG_TM_FUNC_MASK;
+	hdr->flags |= ISCSI_FLAG_CMD_FINAL;
+	memcpy(hdr->lun, ctask->hdr->lun, sizeof(hdr->lun));
+	hdr->rtt = ctask->hdr->itt;
+	hdr->refcmdsn = ctask->hdr->cmdsn;
+}
+
 int iscsi_eh_abort(struct scsi_cmnd *sc)
 {
 	struct Scsi_Host *host = sc->device->host;
 	struct iscsi_session *session = iscsi_hostdata(host->hostdata);
-	struct iscsi_cmd_task *ctask;
 	struct iscsi_conn *conn;
-	int rc;
+	struct iscsi_cmd_task *ctask;
+	struct iscsi_tm *hdr;
+	int rc, age;
 
 	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->lock);
@@ -1199,19 +1204,23 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 		return SUCCESS;
 	}
 
-	ctask = (struct iscsi_cmd_task *)sc->SCp.ptr;
-	conn = ctask->conn;
-
-	conn->eh_abort_cnt++;
-	debug_scsi("aborting [sc %p itt 0x%x]\n", sc, ctask->itt);
-
 	/*
 	 * If we are not logged in or we have started a new session
 	 * then let the host reset code handle this
 	 */
-	if (session->state != ISCSI_STATE_LOGGED_IN ||
-	    sc->SCp.phase != session->age)
-		goto failed;
+	if (!session->leadconn || session->state != ISCSI_STATE_LOGGED_IN ||
+	    sc->SCp.phase != session->age) {
+		spin_unlock_bh(&session->lock);
+		mutex_unlock(&session->eh_mutex);
+		return FAILED;
+	}
+
+	conn = session->leadconn;
+	conn->eh_abort_cnt++;
+	age = session->age;
+
+	ctask = (struct iscsi_cmd_task *)sc->SCp.ptr;
+	debug_scsi("aborting [sc %p itt 0x%x]\n", sc, ctask->itt);
 
 	/* ctask completed before time out */
 	if (!ctask->sc) {
@@ -1219,27 +1228,26 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 		goto success;
 	}
 
-	/* what should we do here ? */
-	if (conn->ctask == ctask) {
-		printk(KERN_INFO "iscsi: sc %p itt 0x%x partially sent. "
-		       "Failing abort\n", sc, ctask->itt);
-		goto failed;
-	}
-
 	if (ctask->state == ISCSI_TASK_PENDING) {
 		fail_command(conn, ctask, DID_ABORT << 16);
 		goto success;
 	}
 
-	conn->tmabort_state = TMABORT_INITIAL;
-	rc = iscsi_exec_abort_task(sc, ctask);
-	if (rc || sc->SCp.phase != session->age ||
-	    session->state != ISCSI_STATE_LOGGED_IN)
+	/* only have one tmf outstanding at a time */
+	if (conn->tmf_state != TMF_INITIAL)
 		goto failed;
-	iscsi_ctask_mtask_cleanup(ctask);
+	conn->tmf_state = TMF_QUEUED;
 
-	switch (conn->tmabort_state) {
-	case TMABORT_SUCCESS:
+	hdr = &conn->tmhdr;
+	iscsi_prep_abort_task_pdu(ctask, hdr);
+
+	if (iscsi_exec_task_mgmt_fn(conn, hdr, age)) {
+		rc = FAILED;
+		goto failed;
+	}
+
+	switch (conn->tmf_state) {
+	case TMF_SUCCESS:
 		spin_unlock_bh(&session->lock);
 		iscsi_suspend_tx(conn);
 		/*
@@ -1248,22 +1256,26 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 		write_lock_bh(conn->recv_lock);
 		spin_lock(&session->lock);
 		fail_command(conn, ctask, DID_ABORT << 16);
+		conn->tmf_state = TMF_INITIAL;
 		spin_unlock(&session->lock);
 		write_unlock_bh(conn->recv_lock);
 		iscsi_start_tx(conn);
 		goto success_unlocked;
-	case TMABORT_NOT_FOUND:
-		if (!ctask->sc) {
+	case TMF_TIMEDOUT:
+		spin_unlock_bh(&session->lock);
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+		goto failed_unlocked;
+	case TMF_NOT_FOUND:
+		if (!sc->SCp.ptr) {
+			conn->tmf_state = TMF_INITIAL;
 			/* ctask completed before tmf abort response */
 			debug_scsi("sc completed while abort in progress\n");
 			goto success;
 		}
 		/* fall through */
 	default:
-		/* timedout or failed */
-		spin_unlock_bh(&session->lock);
-		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
-		goto failed_unlocked;
+		conn->tmf_state = TMF_INITIAL;
+		goto failed;
 	}
 
 success:
@@ -1276,12 +1288,93 @@ success_unlocked:
 failed:
 	spin_unlock_bh(&session->lock);
 failed_unlocked:
-	debug_scsi("abort failed [sc %lx itt 0x%x]\n", (long)sc, ctask->itt);
+	debug_scsi("abort failed [sc %p itt 0x%x]\n", sc,
+		    ctask ? ctask->itt : 0);
 	mutex_unlock(&session->eh_mutex);
 	return FAILED;
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_abort);
 
+static void iscsi_prep_lun_reset_pdu(struct scsi_cmnd *sc, struct iscsi_tm *hdr)
+{
+	memset(hdr, 0, sizeof(*hdr));
+	hdr->opcode = ISCSI_OP_SCSI_TMFUNC | ISCSI_OP_IMMEDIATE;
+	hdr->flags = ISCSI_TM_FUNC_LOGICAL_UNIT_RESET & ISCSI_FLAG_TM_FUNC_MASK;
+	hdr->flags |= ISCSI_FLAG_CMD_FINAL;
+	int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
+	hdr->rtt = ISCSI_RESERVED_TAG;
+}
+
+int iscsi_eh_device_reset(struct scsi_cmnd *sc)
+{
+	struct Scsi_Host *host = sc->device->host;
+	struct iscsi_session *session = iscsi_hostdata(host->hostdata);
+	struct iscsi_conn *conn;
+	struct iscsi_tm *hdr;
+	int rc = FAILED;
+
+	debug_scsi("LU Reset [sc %p lun %u]\n", sc, sc->device->lun);
+
+	mutex_lock(&session->eh_mutex);
+	spin_lock_bh(&session->lock);
+	/*
+	 * Just check if we are not logged in. We cannot check for
+	 * the phase because the reset could come from a ioctl.
+	 */
+	if (!session->leadconn || session->state != ISCSI_STATE_LOGGED_IN)
+		goto unlock;
+	conn = session->leadconn;
+
+	/* only have one tmf outstanding at a time */
+	if (conn->tmf_state != TMF_INITIAL)
+		goto unlock;
+	conn->tmf_state = TMF_QUEUED;
+
+	hdr = &conn->tmhdr;
+	iscsi_prep_lun_reset_pdu(sc, hdr);
+
+	if (iscsi_exec_task_mgmt_fn(conn, hdr, session->age)) {
+		rc = FAILED;
+		goto unlock;
+	}
+
+	switch (conn->tmf_state) {
+	case TMF_SUCCESS:
+		break;
+	case TMF_TIMEDOUT:
+		spin_unlock_bh(&session->lock);
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+		goto done;
+	default:
+		conn->tmf_state = TMF_INITIAL;
+		goto unlock;
+	}
+
+	rc = SUCCESS;
+	spin_unlock_bh(&session->lock);
+
+	iscsi_suspend_tx(conn);
+	/* need to grab the recv lock then session lock */
+	write_lock_bh(conn->recv_lock);
+	spin_lock(&session->lock);
+	fail_all_commands(conn, sc->device->lun);
+	conn->tmf_state = TMF_INITIAL;
+	spin_unlock(&session->lock);
+	write_unlock_bh(conn->recv_lock);
+
+	iscsi_start_tx(conn);
+	goto done;
+
+unlock:
+	spin_unlock_bh(&session->lock);
+done:
+	debug_scsi("iscsi_eh_device_reset %s\n",
+		  rc == SUCCESS ? "SUCCESS" : "FAILED");
+	mutex_unlock(&session->eh_mutex);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(iscsi_eh_device_reset);
+
 int
 iscsi_pool_init(struct iscsi_queue *q, int max, void ***items, int item_size)
 {
@@ -1546,17 +1639,12 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
 	conn->c_stage = ISCSI_CONN_INITIAL_STAGE;
 	conn->id = conn_idx;
 	conn->exp_statsn = 0;
-	conn->tmabort_state = TMABORT_INITIAL;
+	conn->tmf_state = TMF_INITIAL;
 	INIT_LIST_HEAD(&conn->run_list);
 	INIT_LIST_HEAD(&conn->mgmt_run_list);
+	INIT_LIST_HEAD(&conn->mgmtqueue);
 	INIT_LIST_HEAD(&conn->xmitqueue);
-
-	/* initialize general immediate & non-immediate PDU commands queue */
-	conn->mgmtqueue = kfifo_alloc(session->mgmtpool_max * sizeof(void*),
-			                GFP_KERNEL, NULL);
-	if (conn->mgmtqueue == ERR_PTR(-ENOMEM))
-		goto mgmtqueue_alloc_fail;
-
+	INIT_LIST_HEAD(&conn->requeue);
 	INIT_WORK(&conn->xmitwork, iscsi_xmitworker);
 
 	/* allocate login_mtask used for the login/text sequences */
@@ -1574,7 +1662,7 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
 		goto login_mtask_data_alloc_fail;
 	conn->login_mtask->data = conn->data = data;
 
-	init_timer(&conn->tmabort_timer);
+	init_timer(&conn->tmf_timer);
 	init_waitqueue_head(&conn->ehwait);
 
 	return cls_conn;
@@ -1583,8 +1671,6 @@ login_mtask_data_alloc_fail:
 	__kfifo_put(session->mgmtpool.queue, (void*)&conn->login_mtask,
 		    sizeof(void*));
 login_mtask_alloc_fail:
-	kfifo_free(conn->mgmtqueue);
-mgmtqueue_alloc_fail:
 	iscsi_destroy_conn(cls_conn);
 	return NULL;
 }
@@ -1604,7 +1690,6 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 	unsigned long flags;
 
 	spin_lock_bh(&session->lock);
-	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
 	conn->c_stage = ISCSI_CONN_CLEANUP_WAIT;
 	if (session->leadconn == conn) {
 		/*
@@ -1637,7 +1722,7 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 	}
 
 	/* flush queued up work because we free the connection below */
-	scsi_flush_work(session->host);
+	iscsi_suspend_tx(conn);
 
 	spin_lock_bh(&session->lock);
 	kfree(conn->data);
@@ -1648,8 +1733,6 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 		session->leadconn = NULL;
 	spin_unlock_bh(&session->lock);
 
-	kfifo_free(conn->mgmtqueue);
-
 	iscsi_destroy_conn(cls_conn);
 }
 EXPORT_SYMBOL_GPL(iscsi_conn_teardown);
@@ -1684,7 +1767,7 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)
 		 * commands after successful recovery
 		 */
 		conn->stop_stage = 0;
-		conn->tmabort_state = TMABORT_INITIAL;
+		conn->tmf_state = TMF_INITIAL;
 		session->age++;
 		spin_unlock_bh(&session->lock);
 
@@ -1709,10 +1792,11 @@ flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn)
 	struct iscsi_mgmt_task *mtask, *tmp;
 
 	/* handle pending */
-	while (__kfifo_get(conn->mgmtqueue, (void*)&mtask, sizeof(void*))) {
+	list_for_each_entry_safe(mtask, tmp, &conn->mgmtqueue, running) {
+		debug_scsi("flushing pending mgmt task itt 0x%x\n", mtask->itt);
+		list_del_init(&mtask->running);
 		if (mtask == conn->login_mtask)
 			continue;
-		debug_scsi("flushing pending mgmt task itt 0x%x\n", mtask->itt);
 		__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
 			    sizeof(void*));
 	}
@@ -1720,7 +1804,7 @@ flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn)
 	/* handle running */
 	list_for_each_entry_safe(mtask, tmp, &conn->mgmt_run_list, running) {
 		debug_scsi("flushing running mgmt task itt 0x%x\n", mtask->itt);
-		list_del(&mtask->running);
+		list_del_init(&mtask->running);
 
 		if (mtask == conn->login_mtask)
 			continue;
@@ -1731,28 +1815,6 @@ flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn)
 	conn->mtask = NULL;
 }
 
-/* Fail commands. Mutex and session lock held and recv side suspended */
-static void fail_all_commands(struct iscsi_conn *conn)
-{
-	struct iscsi_cmd_task *ctask, *tmp;
-
-	/* flush pending */
-	list_for_each_entry_safe(ctask, tmp, &conn->xmitqueue, running) {
-		debug_scsi("failing pending sc %p itt 0x%x\n", ctask->sc,
-			   ctask->itt);
-		fail_command(conn, ctask, DID_BUS_BUSY << 16);
-	}
-
-	/* fail all other running */
-	list_for_each_entry_safe(ctask, tmp, &conn->run_list, running) {
-		debug_scsi("failing in progress sc %p itt 0x%x\n",
-			   ctask->sc, ctask->itt);
-		fail_command(conn, ctask, DID_BUS_BUSY << 16);
-	}
-
-	conn->ctask = NULL;
-}
-
 static void iscsi_start_session_recovery(struct iscsi_session *session,
 					 struct iscsi_conn *conn, int flag)
 {
@@ -1818,7 +1880,7 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 	 * flush queues.
 	 */
 	spin_lock_bh(&session->lock);
-	fail_all_commands(conn);
+	fail_all_commands(conn, -1);
 	flush_control_queues(session, conn);
 	spin_unlock_bh(&session->lock);
 	mutex_unlock(&session->eh_mutex);
@@ -1869,6 +1931,9 @@ int iscsi_set_param(struct iscsi_cls_conn *cls_conn,
 	uint32_t value;
 
 	switch(param) {
+	case ISCSI_PARAM_FAST_ABORT:
+		sscanf(buf, "%d", &session->fast_abort);
+		break;
 	case ISCSI_PARAM_MAX_RECV_DLENGTH:
 		sscanf(buf, "%d", &conn->max_recv_dlength);
 		break;
@@ -1983,6 +2048,9 @@ int iscsi_session_get_param(struct iscsi_cls_session *cls_session,
 	int len;
 
 	switch(param) {
+	case ISCSI_PARAM_FAST_ABORT:
+		len = sprintf(buf, "%d\n", session->fast_abort);
+		break;
 	case ISCSI_PARAM_INITIAL_R2T_EN:
 		len = sprintf(buf, "%d\n", session->initial_r2t_en);
 		break;
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 5428d15..b6f346c 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -30,7 +30,7 @@
 #include <scsi/scsi_transport_iscsi.h>
 #include <scsi/iscsi_if.h>
 
-#define ISCSI_SESSION_ATTRS 15
+#define ISCSI_SESSION_ATTRS 16
 #define ISCSI_CONN_ATTRS 11
 #define ISCSI_HOST_ATTRS 4
 #define ISCSI_TRANSPORT_VERSION "2.0-724"
@@ -1217,6 +1217,7 @@ iscsi_session_attr(username, ISCSI_PARAM_USERNAME, 1);
 iscsi_session_attr(username_in, ISCSI_PARAM_USERNAME_IN, 1);
 iscsi_session_attr(password, ISCSI_PARAM_PASSWORD, 1);
 iscsi_session_attr(password_in, ISCSI_PARAM_PASSWORD_IN, 1);
+iscsi_session_attr(fast_abort, ISCSI_PARAM_FAST_ABORT, 1);
 
 #define iscsi_priv_session_attr_show(field, format)			\
 static ssize_t								\
@@ -1438,6 +1439,7 @@ iscsi_register_transport(struct iscsi_transport *tt)
 	SETUP_SESSION_RD_ATTR(password_in, ISCSI_USERNAME_IN);
 	SETUP_SESSION_RD_ATTR(username, ISCSI_PASSWORD);
 	SETUP_SESSION_RD_ATTR(username_in, ISCSI_PASSWORD_IN);
+	SETUP_SESSION_RD_ATTR(fast_abort, ISCSI_FAST_ABORT);
 	SETUP_PRIV_SESSION_RD_ATTR(recovery_tmo);
 
 	BUG_ON(count > ISCSI_SESSION_ATTRS);
diff --git a/include/scsi/iscsi_if.h b/include/scsi/iscsi_if.h
index 50e907f..bff0b1f 100644
--- a/include/scsi/iscsi_if.h
+++ b/include/scsi/iscsi_if.h
@@ -236,6 +236,7 @@ enum iscsi_param {
 	ISCSI_PARAM_PASSWORD,
 	ISCSI_PARAM_PASSWORD_IN,
 
+	ISCSI_PARAM_FAST_ABORT,
 	/* must always be last */
 	ISCSI_PARAM_MAX,
 };
@@ -266,6 +267,7 @@ enum iscsi_param {
 #define ISCSI_USERNAME_IN		(1 << ISCSI_PARAM_USERNAME_IN)
 #define ISCSI_PASSWORD			(1 << ISCSI_PARAM_PASSWORD)
 #define ISCSI_PASSWORD_IN		(1 << ISCSI_PARAM_PASSWORD_IN)
+#define ISCSI_FAST_ABORT		(1 << ISCSI_PARAM_FAST_ABORT)
 
 /* iSCSI HBA params */
 enum iscsi_host_param {
diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h
index 8d1e4e8..751c81e 100644
--- a/include/scsi/iscsi_proto.h
+++ b/include/scsi/iscsi_proto.h
@@ -600,6 +600,8 @@ struct iscsi_reject {
 #define ISCSI_MIN_MAX_BURST_LEN			512
 #define ISCSI_MAX_MAX_BURST_LEN			16777215
 
+#define ISCSI_DEF_TIME2WAIT			2
+
 /************************* RFC 3720 End *****************************/
 
 #endif /* ISCSI_PROTO_H */
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index b4b3113..89429f4 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -57,11 +57,14 @@ struct iscsi_nopin;
 #define ISCSI_MAX_CMD_PER_LUN		128
 
 /* Task Mgmt states */
-#define TMABORT_INITIAL			0x0
-#define TMABORT_SUCCESS			0x1
-#define TMABORT_FAILED			0x2
-#define TMABORT_TIMEDOUT		0x3
-#define TMABORT_NOT_FOUND		0x4
+enum {
+	TMF_INITIAL,
+	TMF_QUEUED,
+	TMF_SUCCESS,
+	TMF_FAILED,
+	TMF_TIMEDOUT,
+	TMF_NOT_FOUND,
+};
 
 /* Connection suspend "bit" */
 #define ISCSI_SUSPEND_BIT		1
@@ -91,7 +94,6 @@ enum {
 	ISCSI_TASK_COMPLETED,
 	ISCSI_TASK_PENDING,
 	ISCSI_TASK_RUNNING,
-	ISCSI_TASK_ABORTING,
 };
 
 struct iscsi_cmd_task {
@@ -110,7 +112,6 @@ struct iscsi_cmd_task {
 	unsigned		data_count;	/* remaining Data-Out */
 	struct scsi_cmnd	*sc;		/* associated SCSI cmd*/
 	struct iscsi_conn	*conn;		/* used connection    */
-	struct iscsi_mgmt_task	*mtask;		/* tmf mtask in progr */
 
 	/* state set/tested under session->lock */
 	int			state;
@@ -152,10 +153,11 @@ struct iscsi_conn {
 	struct iscsi_cmd_task	*ctask;		/* xmit ctask in progress */
 
 	/* xmit */
-	struct kfifo		*mgmtqueue;	/* mgmt (control) xmit queue */
+	struct list_head	mgmtqueue;	/* mgmt (control) xmit queue */
 	struct list_head	mgmt_run_list;	/* list of control tasks */
 	struct list_head	xmitqueue;	/* data-path cmd queue */
 	struct list_head	run_list;	/* list of cmds in progress */
+	struct list_head	requeue;	/* tasks needing another run */
 	struct work_struct	xmitwork;	/* per-conn. xmit workqueue */
 	unsigned long		suspend_tx;	/* suspend Tx */
 	unsigned long		suspend_rx;	/* suspend Rx */
@@ -163,8 +165,8 @@ struct iscsi_conn {
 	/* abort */
 	wait_queue_head_t	ehwait;		/* used in eh_abort() */
 	struct iscsi_tm		tmhdr;
-	struct timer_list	tmabort_timer;
-	int			tmabort_state;	/* see TMABORT_INITIAL, etc.*/
+	struct timer_list	tmf_timer;
+	int			tmf_state;	/* see TMF_INITIAL, etc.*/
 
 	/* negotiated params */
 	unsigned		max_recv_dlength; /* initiator_max_recv_dsl*/
@@ -231,6 +233,7 @@ struct iscsi_session {
 	int			pdu_inorder_en;
 	int			dataseq_inorder_en;
 	int			erl;
+	int			fast_abort;
 	int			tpgt;
 	char			*username;
 	char			*username_in;
@@ -268,6 +271,7 @@ struct iscsi_session {
 extern int iscsi_change_queue_depth(struct scsi_device *sdev, int depth);
 extern int iscsi_eh_abort(struct scsi_cmnd *sc);
 extern int iscsi_eh_host_reset(struct scsi_cmnd *sc);
+extern int iscsi_eh_device_reset(struct scsi_cmnd *sc);
 extern int iscsi_queuecommand(struct scsi_cmnd *sc,
 			      void (*done)(struct scsi_cmnd *));
 
@@ -326,6 +330,7 @@ extern int __iscsi_complete_pdu(struct iscsi_conn *, struct iscsi_hdr *,
 				char *, int);
 extern int iscsi_verify_itt(struct iscsi_conn *, struct iscsi_hdr *,
 			    uint32_t *);
+extern void iscsi_requeue_ctask(struct iscsi_cmd_task *ctask);
 
 /*
  * generic helpers
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 02/23] iscsi_tcp: rewrite recv path
  2007-12-05  0:42 ` [PATCH 01/23] libiscsi, iscsi_tcp: add device support michaelc
@ 2007-12-05  0:42   ` michaelc
  2007-12-05  0:42     ` [PATCH 03/23] Prettify resid handling and some extra checks michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, olaf.kirch

From: Mike Christie <michaelc@cs.wisc.edu>

>From Olaf Kirch:

Rewrite recv path. Fixes:
- data digest processing and error handling.
- ahs support.

Some fixups by Mike Christie

Signed-off-by: olaf.kirch@oracle.com
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c | 1018 +++++++++++++++++++++++-----------------------
 drivers/scsi/iscsi_tcp.h |   66 ++--
 include/scsi/libiscsi.h  |    4 +
 3 files changed, 552 insertions(+), 536 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 4b226b8..1b540e0 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -48,7 +48,7 @@ MODULE_AUTHOR("Dmitry Yusupov <dmitry_yus@yahoo.com>, "
 	      "Alex Aizman <itn780@yahoo.com>");
 MODULE_DESCRIPTION("iSCSI/TCP data-path");
 MODULE_LICENSE("GPL");
-/* #define DEBUG_TCP */
+#undef DEBUG_TCP
 #define DEBUG_ASSERT
 
 #ifdef DEBUG_TCP
@@ -67,10 +67,15 @@ MODULE_LICENSE("GPL");
 static unsigned int iscsi_max_lun = 512;
 module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
 
+static int iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
+				   struct iscsi_chunk *chunk);
+
 static inline void
 iscsi_buf_init_iov(struct iscsi_buf *ibuf, char *vbuf, int size)
 {
-	sg_init_one(&ibuf->sg, vbuf, size);
+	ibuf->sg.page = virt_to_page(vbuf);
+	ibuf->sg.offset = offset_in_page(vbuf);
+	ibuf->sg.length = size;
 	ibuf->sent = 0;
 	ibuf->use_sendmsg = 1;
 }
@@ -78,12 +83,13 @@ iscsi_buf_init_iov(struct iscsi_buf *ibuf, char *vbuf, int size)
 static inline void
 iscsi_buf_init_sg(struct iscsi_buf *ibuf, struct scatterlist *sg)
 {
-	sg_init_table(&ibuf->sg, 1);
-	sg_set_page(&ibuf->sg, sg_page(sg), sg->length, sg->offset);
+	ibuf->sg.page = sg->page;
+	ibuf->sg.offset = sg->offset;
+	ibuf->sg.length = sg->length;
 	/*
 	 * Fastpath: sg element fits into single page
 	 */
-	if (sg->length + sg->offset <= PAGE_SIZE && !PageSlab(sg_page(sg)))
+	if (sg->length + sg->offset <= PAGE_SIZE && !PageSlab(sg->page))
 		ibuf->use_sendmsg = 0;
 	else
 		ibuf->use_sendmsg = 1;
@@ -110,72 +116,331 @@ iscsi_hdr_digest(struct iscsi_conn *conn, struct iscsi_buf *buf,
 	buf->sg.length += sizeof(u32);
 }
 
+/*
+ * Scatterlist handling: inside the iscsi_chunk, we
+ * remember an index into the scatterlist, and set data/size
+ * to the current scatterlist entry. For highmem pages, we
+ * kmap as needed.
+ *
+ * Note that the page is unmapped when we return from
+ * TCP's data_ready handler, so we may end up mapping and
+ * unmapping the same page repeatedly. The whole reason
+ * for this is that we shouldn't keep the page mapped
+ * outside the softirq.
+ */
+
+/**
+ * iscsi_tcp_chunk_init_sg - init indicated scatterlist entry
+ * @chunk: the buffer object
+ * @idx: index into scatterlist
+ * @offset: byte offset into that sg entry
+ *
+ * This function sets up the chunk so that subsequent
+ * data is copied to the indicated sg entry, at the given
+ * offset.
+ */
+static inline void
+iscsi_tcp_chunk_init_sg(struct iscsi_chunk *chunk,
+			unsigned int idx, unsigned int offset)
+{
+	struct scatterlist *sg;
+
+	BUG_ON(chunk->sg == NULL);
+
+	sg = &chunk->sg[idx];
+	chunk->sg_index = idx;
+	chunk->sg_offset = offset;
+	chunk->size = min(sg->length - offset, chunk->total_size);
+	chunk->data = NULL;
+}
+
+/**
+ * iscsi_tcp_chunk_map - map the current S/G page
+ * @chunk: iscsi chunk
+ *
+ * We only need to possibly kmap data if scatter lists are being used,
+ * because the iscsi passthrough and internal IO paths will never use high
+ * mem pages.
+ */
+static inline void
+iscsi_tcp_chunk_map(struct iscsi_chunk *chunk)
+{
+	struct scatterlist *sg;
+
+	if (chunk->data != NULL || !chunk->sg)
+		return;
+
+	sg = &chunk->sg[chunk->sg_index];
+	BUG_ON(chunk->sg_mapped);
+	BUG_ON(sg->length == 0);
+	chunk->sg_mapped = kmap_atomic(sg->page, KM_SOFTIRQ0);
+	chunk->data = chunk->sg_mapped + sg->offset + chunk->sg_offset;
+}
+
+static inline void
+iscsi_tcp_chunk_unmap(struct iscsi_chunk *chunk)
+{
+	if (chunk->sg_mapped) {
+		kunmap_atomic(chunk->sg_mapped, KM_SOFTIRQ0);
+		chunk->sg_mapped = NULL;
+		chunk->data = NULL;
+	}
+}
+
+/*
+ * Splice the digest buffer into the buffer
+ */
+static inline void
+iscsi_tcp_chunk_splice_digest(struct iscsi_chunk *chunk, void *digest)
+{
+	chunk->data = digest;
+	chunk->digest_len = ISCSI_DIGEST_SIZE;
+	chunk->total_size += ISCSI_DIGEST_SIZE;
+	chunk->size = ISCSI_DIGEST_SIZE;
+	chunk->copied = 0;
+	chunk->sg = NULL;
+	chunk->sg_index = 0;
+	chunk->hash = NULL;
+}
+
+/**
+ * iscsi_tcp_chunk_done - check whether the chunk is complete
+ * @chunk: iscsi chunk to check
+ *
+ * Check if we're done receiving this chunk. If the receive
+ * buffer is full but we expect more data, move on to the
+ * next entry in the scatterlist.
+ *
+ * If the amount of data we received isn't a multiple of 4,
+ * we will transparently receive the pad bytes, too.
+ *
+ * This function must be re-entrant.
+ */
 static inline int
-iscsi_hdr_extract(struct iscsi_tcp_conn *tcp_conn)
+iscsi_tcp_chunk_done(struct iscsi_chunk *chunk)
 {
-	struct sk_buff *skb = tcp_conn->in.skb;
+	static unsigned char padbuf[ISCSI_PAD_LEN];
+
+	if (chunk->copied < chunk->size) {
+		iscsi_tcp_chunk_map(chunk);
+		return 0;
+	}
 
-	tcp_conn->in.zero_copy_hdr = 0;
+	chunk->total_copied += chunk->copied;
+	chunk->copied = 0;
+	chunk->size = 0;
 
-	if (tcp_conn->in.copy >= tcp_conn->hdr_size &&
-	    tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER) {
-		/*
-		 * Zero-copy PDU Header: using connection context
-		 * to store header pointer.
-		 */
-		if (skb_shinfo(skb)->frag_list == NULL &&
-		    !skb_shinfo(skb)->nr_frags) {
-			tcp_conn->in.hdr = (struct iscsi_hdr *)
-				((char*)skb->data + tcp_conn->in.offset);
-			tcp_conn->in.zero_copy_hdr = 1;
-		} else {
-			/* ignoring return code since we checked
-			 * in.copy before */
-			skb_copy_bits(skb, tcp_conn->in.offset,
-				&tcp_conn->hdr, tcp_conn->hdr_size);
-			tcp_conn->in.hdr = &tcp_conn->hdr;
+	/* Unmap the current scatterlist page, if there is one. */
+	iscsi_tcp_chunk_unmap(chunk);
+
+	/* Do we have more scatterlist entries? */
+	if (chunk->total_copied < chunk->total_size) {
+		/* Proceed to the next entry in the scatterlist. */
+		iscsi_tcp_chunk_init_sg(chunk, chunk->sg_index + 1, 0);
+		iscsi_tcp_chunk_map(chunk);
+		BUG_ON(chunk->size == 0);
+		return 0;
+	}
+
+	/* Do we need to handle padding? */
+	if (chunk->total_copied & (ISCSI_PAD_LEN-1)) {
+		unsigned int pad;
+
+		pad = ISCSI_PAD_LEN - (chunk->total_copied & (ISCSI_PAD_LEN-1));
+		debug_tcp("consume %d pad bytes\n", pad);
+		chunk->total_size += pad;
+		chunk->size = pad;
+		chunk->data = padbuf;
+		return 0;
+	}
+
+	/*
+	 * Set us up for receiving the data digest. hdr digest
+	 * is completely handled in hdr done function.
+	 */
+	if (chunk->hash) {
+		if (chunk->digest_len == 0) {
+			crypto_hash_final(chunk->hash, chunk->digest);
+			iscsi_tcp_chunk_splice_digest(chunk,
+						      chunk->recv_digest);
+			return 0;
 		}
-		tcp_conn->in.offset += tcp_conn->hdr_size;
-		tcp_conn->in.copy -= tcp_conn->hdr_size;
-	} else {
-		int hdr_remains;
-		int copylen;
+	}
 
-		/*
-		 * PDU header scattered across SKB's,
-		 * copying it... This'll happen quite rarely.
-		 */
+	return 1;
+}
+
+/**
+ * iscsi_tcp_chunk_recv - copy data to chunk
+ * @tcp_conn: the iSCSI TCP connection
+ * @chunk: the buffer to copy to
+ * @ptr: data pointer
+ * @len: amount of data available
+ *
+ * This function copies up to @len bytes to the
+ * given buffer, and returns the number of bytes
+ * consumed, which can actually be less than @len.
+ *
+ * If hash digest is enabled, the function will update the
+ * hash while copying.
+ * Combining these two operations doesn't buy us a lot (yet),
+ * but in the future we could implement combined copy+crc,
+ * just way we do for network layer checksums.
+ */
+static int
+iscsi_tcp_chunk_recv(struct iscsi_tcp_conn *tcp_conn,
+		     struct iscsi_chunk *chunk, const void *ptr,
+		     unsigned int len)
+{
+	struct scatterlist sg;
+	unsigned int copy, copied = 0;
+
+	while (!iscsi_tcp_chunk_done(chunk)) {
+		if (copied == len)
+			goto out;
+
+		copy = min(len - copied, chunk->size - chunk->copied);
+		memcpy(chunk->data + chunk->copied, ptr + copied, copy);
+
+		if (chunk->hash) {
+			sg_init_one(&sg, ptr + copied, copy);
+			crypto_hash_update(chunk->hash, &sg, copy);
+		}
+		chunk->copied += copy;
+		copied += copy;
+	}
+
+out:
+	return copied;
+}
 
-		if (tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER)
-			tcp_conn->in.hdr_offset = 0;
+static inline void
+iscsi_tcp_dgst_header(struct hash_desc *hash, const void *hdr, size_t hdrlen,
+		      unsigned char digest[ISCSI_DIGEST_SIZE])
+{
+	struct scatterlist sg;
+
+	sg_init_one(&sg, hdr, hdrlen);
+	crypto_hash_digest(hash, &sg, hdrlen, digest);
+}
+
+static inline int
+iscsi_tcp_dgst_verify(struct iscsi_tcp_conn *tcp_conn,
+		      struct iscsi_chunk *chunk)
+{
+	if (!chunk->digest_len)
+		return 1;
+
+	if (memcmp(chunk->recv_digest, chunk->digest, chunk->digest_len)) {
+		debug_scsi("digest mismatch\n");
+		return 0;
+	}
 
-		hdr_remains = tcp_conn->hdr_size - tcp_conn->in.hdr_offset;
-		BUG_ON(hdr_remains <= 0);
+	return 1;
+}
 
-		copylen = min(tcp_conn->in.copy, hdr_remains);
-		skb_copy_bits(skb, tcp_conn->in.offset,
-			(char*)&tcp_conn->hdr + tcp_conn->in.hdr_offset,
-			copylen);
+/*
+ * Helper function to set up chunk buffer
+ */
+static inline void
+__iscsi_chunk_init(struct iscsi_chunk *chunk, size_t size,
+		   iscsi_chunk_done_fn_t *done, struct hash_desc *hash)
+{
+	memset(chunk, 0, sizeof(*chunk));
+	chunk->total_size = size;
+	chunk->done = done;
 
-		debug_tcp("PDU gather offset %d bytes %d in.offset %d "
-		       "in.copy %d\n", tcp_conn->in.hdr_offset, copylen,
-		       tcp_conn->in.offset, tcp_conn->in.copy);
+	if (hash) {
+		chunk->hash = hash;
+		crypto_hash_init(hash);
+	}
+}
 
-		tcp_conn->in.offset += copylen;
-		tcp_conn->in.copy -= copylen;
-		if (copylen < hdr_remains)  {
-			tcp_conn->in_progress = IN_PROGRESS_HEADER_GATHER;
-			tcp_conn->in.hdr_offset += copylen;
-		        return -EAGAIN;
+static inline void
+iscsi_chunk_init_linear(struct iscsi_chunk *chunk, void *data, size_t size,
+			iscsi_chunk_done_fn_t *done, struct hash_desc *hash)
+{
+	__iscsi_chunk_init(chunk, size, done, hash);
+	chunk->data = data;
+	chunk->size = size;
+}
+
+static inline int
+iscsi_chunk_seek_sg(struct iscsi_chunk *chunk,
+		    struct scatterlist *sg, unsigned int sg_count,
+		    unsigned int offset, size_t size,
+		    iscsi_chunk_done_fn_t *done, struct hash_desc *hash)
+{
+	unsigned int i;
+
+	__iscsi_chunk_init(chunk, size, done, hash);
+	for (i = 0; i < sg_count; ++i) {
+		if (offset < sg[i].length) {
+			chunk->sg = sg;
+			chunk->sg_count = sg_count;
+			iscsi_tcp_chunk_init_sg(chunk, i, offset);
+			return 0;
 		}
-		tcp_conn->in.hdr = &tcp_conn->hdr;
-		tcp_conn->discontiguous_hdr_cnt++;
-	        tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER;
+		offset -= sg[i].length;
 	}
 
+	return ISCSI_ERR_DATA_OFFSET;
+}
+
+/**
+ * iscsi_tcp_hdr_recv_prep - prep chunk for hdr reception
+ * @tcp_conn: iscsi connection to prep for
+ *
+ * This function always passes NULL for the hash argument, because when this
+ * function is called we do not yet know the final size of the header and want
+ * to delay the digest processing until we know that.
+ */
+static void
+iscsi_tcp_hdr_recv_prep(struct iscsi_tcp_conn *tcp_conn)
+{
+	debug_tcp("iscsi_tcp_hdr_recv_prep(%p%s)\n", tcp_conn,
+		  tcp_conn->iscsi_conn->hdrdgst_en ? ", digest enabled" : "");
+	iscsi_chunk_init_linear(&tcp_conn->in.chunk,
+				tcp_conn->in.hdr_buf, sizeof(struct iscsi_hdr),
+				iscsi_tcp_hdr_recv_done, NULL);
+}
+
+/*
+ * Handle incoming reply to any other type of command
+ */
+static int
+iscsi_tcp_data_recv_done(struct iscsi_tcp_conn *tcp_conn,
+			 struct iscsi_chunk *chunk)
+{
+	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
+	int rc = 0;
+
+	if (!iscsi_tcp_dgst_verify(tcp_conn, chunk))
+		return ISCSI_ERR_DATA_DGST;
+
+	rc = iscsi_complete_pdu(conn, tcp_conn->in.hdr,
+			conn->data, tcp_conn->in.datalen);
+	if (rc)
+		return rc;
+
+	iscsi_tcp_hdr_recv_prep(tcp_conn);
 	return 0;
 }
 
+static void
+iscsi_tcp_data_recv_prep(struct iscsi_tcp_conn *tcp_conn)
+{
+	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
+	struct hash_desc *rx_hash = NULL;
+
+	if (conn->datadgst_en)
+		rx_hash = &tcp_conn->rx_hash;
+
+	iscsi_chunk_init_linear(&tcp_conn->in.chunk,
+				conn->data, tcp_conn->in.datalen,
+				iscsi_tcp_data_recv_done, rx_hash);
+}
+
 /*
  * must be called with session lock
  */
@@ -417,16 +682,49 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	return 0;
 }
 
+/*
+ * Handle incoming reply to DataIn command
+ */
 static int
-iscsi_tcp_hdr_recv(struct iscsi_conn *conn)
+iscsi_tcp_process_data_in(struct iscsi_tcp_conn *tcp_conn,
+			  struct iscsi_chunk *chunk)
+{
+	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
+	struct iscsi_hdr *hdr = tcp_conn->in.hdr;
+	int rc;
+
+	if (!iscsi_tcp_dgst_verify(tcp_conn, chunk))
+		return ISCSI_ERR_DATA_DGST;
+
+	/* check for non-exceptional status */
+	if (hdr->flags & ISCSI_FLAG_DATA_STATUS) {
+		rc = iscsi_complete_pdu(conn, tcp_conn->in.hdr, NULL, 0);
+		if (rc)
+			return rc;
+	}
+
+	iscsi_tcp_hdr_recv_prep(tcp_conn);
+	return 0;
+}
+
+/**
+ * iscsi_tcp_hdr_dissect - process PDU header
+ * @conn: iSCSI connection
+ * @hdr: PDU header
+ *
+ * This function analyzes the header of the PDU received,
+ * and performs several sanity checks. If the PDU is accompanied
+ * by data, the receive buffer is set up to copy the incoming data
+ * to the correct location.
+ */
+static int
+iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 {
 	int rc = 0, opcode, ahslen;
-	struct iscsi_hdr *hdr;
 	struct iscsi_session *session = conn->session;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	uint32_t cdgst, rdgst = 0, itt;
-
-	hdr = tcp_conn->in.hdr;
+	struct iscsi_cmd_task *ctask;
+	uint32_t itt;
 
 	/* verify PDU length */
 	tcp_conn->in.datalen = ntoh24(hdr->dlength);
@@ -435,77 +733,72 @@ iscsi_tcp_hdr_recv(struct iscsi_conn *conn)
 		       tcp_conn->in.datalen, conn->max_recv_dlength);
 		return ISCSI_ERR_DATALEN;
 	}
-	tcp_conn->data_copied = 0;
 
-	/* read AHS */
+	/* Additional header segments. So far, we don't
+	 * process additional headers.
+	 */
 	ahslen = hdr->hlength << 2;
-	tcp_conn->in.offset += ahslen;
-	tcp_conn->in.copy -= ahslen;
-	if (tcp_conn->in.copy < 0) {
-		printk(KERN_ERR "iscsi_tcp: can't handle AHS with length "
-		       "%d bytes\n", ahslen);
-		return ISCSI_ERR_AHSLEN;
-	}
-
-	/* calculate read padding */
-	tcp_conn->in.padding = tcp_conn->in.datalen & (ISCSI_PAD_LEN-1);
-	if (tcp_conn->in.padding) {
-		tcp_conn->in.padding = ISCSI_PAD_LEN - tcp_conn->in.padding;
-		debug_scsi("read padding %d bytes\n", tcp_conn->in.padding);
-	}
-
-	if (conn->hdrdgst_en) {
-		struct scatterlist sg;
-
-		sg_init_one(&sg, (u8 *)hdr,
-			    sizeof(struct iscsi_hdr) + ahslen);
-		crypto_hash_digest(&tcp_conn->rx_hash, &sg, sg.length,
-				   (u8 *)&cdgst);
-		rdgst = *(uint32_t*)((char*)hdr + sizeof(struct iscsi_hdr) +
-				     ahslen);
-		if (cdgst != rdgst) {
-			printk(KERN_ERR "iscsi_tcp: hdrdgst error "
-			       "recv 0x%x calc 0x%x\n", rdgst, cdgst);
-			return ISCSI_ERR_HDR_DGST;
-		}
-	}
 
 	opcode = hdr->opcode & ISCSI_OPCODE_MASK;
 	/* verify itt (itt encoding: age+cid+itt) */
 	rc = iscsi_verify_itt(conn, hdr, &itt);
 	if (rc == ISCSI_ERR_NO_SCSI_CMD) {
+		/* XXX: what does this do? */
 		tcp_conn->in.datalen = 0; /* force drop */
 		return 0;
 	} else if (rc)
 		return rc;
 
-	debug_tcp("opcode 0x%x offset %d copy %d ahslen %d datalen %d\n",
-		  opcode, tcp_conn->in.offset, tcp_conn->in.copy,
-		  ahslen, tcp_conn->in.datalen);
+	debug_tcp("opcode 0x%x ahslen %d datalen %d\n",
+		  opcode, ahslen, tcp_conn->in.datalen);
 
 	switch(opcode) {
 	case ISCSI_OP_SCSI_DATA_IN:
-		tcp_conn->in.ctask = session->cmds[itt];
-		rc = iscsi_data_rsp(conn, tcp_conn->in.ctask);
+		ctask = session->cmds[itt];
+		rc = iscsi_data_rsp(conn, ctask);
 		if (rc)
 			return rc;
+		if (tcp_conn->in.datalen) {
+			struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
+			struct hash_desc *rx_hash = NULL;
+
+			/*
+			 * Setup copy of Data-In into the Scsi_Cmnd
+			 * Scatterlist case:
+			 * We set up the iscsi_chunk to point to the next
+			 * scatterlist entry to copy to. As we go along,
+			 * we move on to the next scatterlist entry and
+			 * update the digest per-entry.
+			 */
+			if (conn->datadgst_en)
+				rx_hash = &tcp_conn->rx_hash;
+
+			debug_tcp("iscsi_tcp_begin_data_in(%p, offset=%d, "
+				  "datalen=%d)\n", tcp_conn,
+				  tcp_ctask->data_offset,
+				  tcp_conn->in.datalen);
+			return iscsi_chunk_seek_sg(&tcp_conn->in.chunk,
+						scsi_sglist(ctask->sc),
+						scsi_sg_count(ctask->sc),
+						tcp_ctask->data_offset,
+						tcp_conn->in.datalen,
+						iscsi_tcp_process_data_in,
+						rx_hash);
+		}
 		/* fall through */
 	case ISCSI_OP_SCSI_CMD_RSP:
-		tcp_conn->in.ctask = session->cmds[itt];
-		if (tcp_conn->in.datalen)
-			goto copy_hdr;
-
-		spin_lock(&session->lock);
-		rc = __iscsi_complete_pdu(conn, hdr, NULL, 0);
-		spin_unlock(&session->lock);
+		if (tcp_conn->in.datalen) {
+			iscsi_tcp_data_recv_prep(tcp_conn);
+			return 0;
+		}
+		rc = iscsi_complete_pdu(conn, hdr, NULL, 0);
 		break;
 	case ISCSI_OP_R2T:
-		tcp_conn->in.ctask = session->cmds[itt];
+		ctask = session->cmds[itt];
 		if (ahslen)
 			rc = ISCSI_ERR_AHSLEN;
-		else if (tcp_conn->in.ctask->sc->sc_data_direction ==
-								DMA_TO_DEVICE)
-			rc = iscsi_r2t_rsp(conn, tcp_conn->in.ctask);
+		else if (ctask->sc->sc_data_direction == DMA_TO_DEVICE)
+			rc = iscsi_r2t_rsp(conn, ctask);
 		else
 			rc = ISCSI_ERR_PROTO;
 		break;
@@ -518,8 +811,7 @@ iscsi_tcp_hdr_recv(struct iscsi_conn *conn)
 		 * than 8K, but there are no targets that currently do this.
 		 * For now we fail until we find a vendor that needs it
 		 */
-		if (ISCSI_DEF_MAX_RECV_SEG_LEN <
-		    tcp_conn->in.datalen) {
+		if (ISCSI_DEF_MAX_RECV_SEG_LEN < tcp_conn->in.datalen) {
 			printk(KERN_ERR "iscsi_tcp: received buffer of len %u "
 			      "but conn buffer is only %u (opcode %0x)\n",
 			      tcp_conn->in.datalen,
@@ -528,8 +820,13 @@ iscsi_tcp_hdr_recv(struct iscsi_conn *conn)
 			break;
 		}
 
-		if (tcp_conn->in.datalen)
-			goto copy_hdr;
+		/* If there's data coming in with the response,
+		 * receive it to the connection's buffer.
+		 */
+		if (tcp_conn->in.datalen) {
+			iscsi_tcp_data_recv_prep(tcp_conn);
+			return 0;
+		}
 	/* fall through */
 	case ISCSI_OP_LOGOUT_RSP:
 	case ISCSI_OP_NOOP_IN:
@@ -541,129 +838,15 @@ iscsi_tcp_hdr_recv(struct iscsi_conn *conn)
 		break;
 	}
 
-	return rc;
-
-copy_hdr:
-	/*
-	 * if we did zero copy for the header but we will need multiple
-	 * skbs to complete the command then we have to copy the header
-	 * for later use
-	 */
-	if (tcp_conn->in.zero_copy_hdr && tcp_conn->in.copy <=
-	   (tcp_conn->in.datalen + tcp_conn->in.padding +
-	    (conn->datadgst_en ? 4 : 0))) {
-		debug_tcp("Copying header for later use. in.copy %d in.datalen"
-			  " %d\n", tcp_conn->in.copy, tcp_conn->in.datalen);
-		memcpy(&tcp_conn->hdr, tcp_conn->in.hdr,
-		       sizeof(struct iscsi_hdr));
-		tcp_conn->in.hdr = &tcp_conn->hdr;
-		tcp_conn->in.zero_copy_hdr = 0;
-	}
-	return 0;
-}
-
-/**
- * iscsi_ctask_copy - copy skb bits to the destanation cmd task
- * @conn: iscsi tcp connection
- * @ctask: scsi command task
- * @buf: buffer to copy to
- * @buf_size: size of buffer
- * @offset: offset within the buffer
- *
- * Notes:
- *	The function calls skb_copy_bits() and updates per-connection and
- *	per-cmd byte counters.
- *
- *	Read counters (in bytes):
- *
- *	conn->in.offset		offset within in progress SKB
- *	conn->in.copy		left to copy from in progress SKB
- *				including padding
- *	conn->in.copied		copied already from in progress SKB
- *	conn->data_copied	copied already from in progress buffer
- *	ctask->sent		total bytes sent up to the MidLayer
- *	ctask->data_count	left to copy from in progress Data-In
- *	buf_left		left to copy from in progress buffer
- **/
-static inline int
-iscsi_ctask_copy(struct iscsi_tcp_conn *tcp_conn, struct iscsi_cmd_task *ctask,
-		void *buf, int buf_size, int offset)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	int buf_left = buf_size - (tcp_conn->data_copied + offset);
-	unsigned size = min(tcp_conn->in.copy, buf_left);
-	int rc;
-
-	size = min(size, ctask->data_count);
-
-	debug_tcp("ctask_copy %d bytes at offset %d copied %d\n",
-	       size, tcp_conn->in.offset, tcp_conn->in.copied);
-
-	BUG_ON(size <= 0);
-	BUG_ON(tcp_ctask->sent + size > scsi_bufflen(ctask->sc));
-
-	rc = skb_copy_bits(tcp_conn->in.skb, tcp_conn->in.offset,
-			   (char*)buf + (offset + tcp_conn->data_copied), size);
-	/* must fit into skb->len */
-	BUG_ON(rc);
-
-	tcp_conn->in.offset += size;
-	tcp_conn->in.copy -= size;
-	tcp_conn->in.copied += size;
-	tcp_conn->data_copied += size;
-	tcp_ctask->sent += size;
-	ctask->data_count -= size;
-
-	BUG_ON(tcp_conn->in.copy < 0);
-	BUG_ON(ctask->data_count < 0);
-
-	if (buf_size != (tcp_conn->data_copied + offset)) {
-		if (!ctask->data_count) {
-			BUG_ON(buf_size - tcp_conn->data_copied < 0);
-			/* done with this PDU */
-			return buf_size - tcp_conn->data_copied;
-		}
-		return -EAGAIN;
+	if (rc == 0) {
+		/* Anything that comes with data should have
+		 * been handled above. */
+		if (tcp_conn->in.datalen)
+			return ISCSI_ERR_PROTO;
+		iscsi_tcp_hdr_recv_prep(tcp_conn);
 	}
 
-	/* done with this buffer or with both - PDU and buffer */
-	tcp_conn->data_copied = 0;
-	return 0;
-}
-
-/**
- * iscsi_tcp_copy - copy skb bits to the destanation buffer
- * @conn: iscsi tcp connection
- *
- * Notes:
- *	The function calls skb_copy_bits() and updates per-connection
- *	byte counters.
- **/
-static inline int
-iscsi_tcp_copy(struct iscsi_conn *conn, int buf_size)
-{
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	int buf_left = buf_size - tcp_conn->data_copied;
-	int size = min(tcp_conn->in.copy, buf_left);
-	int rc;
-
-	debug_tcp("tcp_copy %d bytes at offset %d copied %d\n",
-	       size, tcp_conn->in.offset, tcp_conn->data_copied);
-	BUG_ON(size <= 0);
-
-	rc = skb_copy_bits(tcp_conn->in.skb, tcp_conn->in.offset,
-			   (char*)conn->data + tcp_conn->data_copied, size);
-	BUG_ON(rc);
-
-	tcp_conn->in.offset += size;
-	tcp_conn->in.copy -= size;
-	tcp_conn->in.copied += size;
-	tcp_conn->data_copied += size;
-
-	if (buf_size != tcp_conn->data_copied)
-		return -EAGAIN;
-
-	return 0;
+	return rc;
 }
 
 static inline void
@@ -677,325 +860,146 @@ partial_sg_digest_update(struct hash_desc *desc, struct scatterlist *sg,
 	crypto_hash_update(desc, &temp, length);
 }
 
-static void
-iscsi_recv_digest_update(struct iscsi_tcp_conn *tcp_conn, char* buf, int len)
-{
-	struct scatterlist tmp;
-
-	sg_init_one(&tmp, buf, len);
-	crypto_hash_update(&tcp_conn->rx_hash, &tmp, len);
-}
-
-static int iscsi_scsi_data_in(struct iscsi_conn *conn)
+/**
+ * iscsi_tcp_hdr_recv_done - process PDU header
+ *
+ * This is the callback invoked when the PDU header has
+ * been received. If the header is followed by additional
+ * header segments, we go back for more data.
+ */
+static int
+iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
+			struct iscsi_chunk *chunk)
 {
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	struct iscsi_cmd_task *ctask = tcp_conn->in.ctask;
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	struct scsi_cmnd *sc = ctask->sc;
-	struct scatterlist *sg;
-	int i, offset, rc = 0;
-
-	BUG_ON((void*)ctask != sc->SCp.ptr);
-
-	offset = tcp_ctask->data_offset;
-	sg = scsi_sglist(sc);
-
-	if (tcp_ctask->data_offset)
-		for (i = 0; i < tcp_ctask->sg_count; i++)
-			offset -= sg[i].length;
-	/* we've passed through partial sg*/
-	if (offset < 0)
-		offset = 0;
-
-	for (i = tcp_ctask->sg_count; i < scsi_sg_count(sc); i++) {
-		char *dest;
-
-		dest = kmap_atomic(sg_page(&sg[i]), KM_SOFTIRQ0);
-		rc = iscsi_ctask_copy(tcp_conn, ctask, dest + sg[i].offset,
-				      sg[i].length, offset);
-		kunmap_atomic(dest, KM_SOFTIRQ0);
-		if (rc == -EAGAIN)
-			/* continue with the next SKB/PDU */
-			return rc;
-		if (!rc) {
-			if (conn->datadgst_en) {
-				if (!offset)
-					crypto_hash_update(
-							&tcp_conn->rx_hash,
-							&sg[i], sg[i].length);
-				else
-					partial_sg_digest_update(
-							&tcp_conn->rx_hash,
-							&sg[i],
-							sg[i].offset + offset,
-							sg[i].length - offset);
-			}
-			offset = 0;
-			tcp_ctask->sg_count++;
-		}
-
-		if (!ctask->data_count) {
-			if (rc && conn->datadgst_en)
-				/*
-				 * data-in is complete, but buffer not...
-				 */
-				partial_sg_digest_update(&tcp_conn->rx_hash,
-							 &sg[i],
-							 sg[i].offset,
-							 sg[i].length-rc);
-			rc = 0;
-			break;
-		}
-
-		if (!tcp_conn->in.copy)
-			return -EAGAIN;
-	}
-	BUG_ON(ctask->data_count);
+	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
+	struct iscsi_hdr *hdr;
 
-	/* check for non-exceptional status */
-	if (tcp_conn->in.hdr->flags & ISCSI_FLAG_DATA_STATUS) {
-		debug_scsi("done [sc %lx res %d itt 0x%x flags 0x%x]\n",
-			   (long)sc, sc->result, ctask->itt,
-			   tcp_conn->in.hdr->flags);
-		spin_lock(&conn->session->lock);
-		__iscsi_complete_pdu(conn, tcp_conn->in.hdr, NULL, 0);
-		spin_unlock(&conn->session->lock);
+	/* Check if there are additional header segments
+	 * *prior* to computing the digest, because we
+	 * may need to go back to the caller for more.
+	 */
+	hdr = (struct iscsi_hdr *) tcp_conn->in.hdr_buf;
+	if (chunk->copied == sizeof(struct iscsi_hdr) && hdr->hlength) {
+		/* Bump the header length - the caller will
+		 * just loop around and get the AHS for us, and
+		 * call again. */
+		unsigned int ahslen = hdr->hlength << 2;
+
+		/* Make sure we don't overflow */
+		if (sizeof(*hdr) + ahslen > sizeof(tcp_conn->in.hdr_buf))
+			return ISCSI_ERR_AHSLEN;
+
+		chunk->total_size += ahslen;
+		chunk->size += ahslen;
+		return 0;
 	}
 
-	return rc;
-}
-
-static int
-iscsi_data_recv(struct iscsi_conn *conn)
-{
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	int rc = 0, opcode;
-
-	opcode = tcp_conn->in.hdr->opcode & ISCSI_OPCODE_MASK;
-	switch (opcode) {
-	case ISCSI_OP_SCSI_DATA_IN:
-		rc = iscsi_scsi_data_in(conn);
-		break;
-	case ISCSI_OP_SCSI_CMD_RSP:
-	case ISCSI_OP_TEXT_RSP:
-	case ISCSI_OP_LOGIN_RSP:
-	case ISCSI_OP_ASYNC_EVENT:
-	case ISCSI_OP_REJECT:
-		/*
-		 * Collect data segment to the connection's data
-		 * placeholder
-		 */
-		if (iscsi_tcp_copy(conn, tcp_conn->in.datalen)) {
-			rc = -EAGAIN;
-			goto exit;
+	/* We're done processing the header. See if we're doing
+	 * header digests; if so, set up the recv_digest buffer
+	 * and go back for more. */
+	if (conn->hdrdgst_en) {
+		if (chunk->digest_len == 0) {
+			iscsi_tcp_chunk_splice_digest(chunk,
+						      chunk->recv_digest);
+			return 0;
 		}
+		iscsi_tcp_dgst_header(&tcp_conn->rx_hash, hdr,
+				      chunk->total_copied - ISCSI_DIGEST_SIZE,
+				      chunk->digest);
 
-		rc = iscsi_complete_pdu(conn, tcp_conn->in.hdr, conn->data,
-					tcp_conn->in.datalen);
-		if (!rc && conn->datadgst_en && opcode != ISCSI_OP_LOGIN_RSP)
-			iscsi_recv_digest_update(tcp_conn, conn->data,
-			  			tcp_conn->in.datalen);
-		break;
-	default:
-		BUG_ON(1);
+		if (!iscsi_tcp_dgst_verify(tcp_conn, chunk))
+			return ISCSI_ERR_HDR_DGST;
 	}
-exit:
-	return rc;
+
+	tcp_conn->in.hdr = hdr;
+	return iscsi_tcp_hdr_dissect(conn, hdr);
 }
 
 /**
- * iscsi_tcp_data_recv - TCP receive in sendfile fashion
+ * iscsi_tcp_recv - TCP receive in sendfile fashion
  * @rd_desc: read descriptor
  * @skb: socket buffer
  * @offset: offset in skb
  * @len: skb->len - offset
  **/
 static int
-iscsi_tcp_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
-		unsigned int offset, size_t len)
+iscsi_tcp_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
+	       unsigned int offset, size_t len)
 {
-	int rc;
 	struct iscsi_conn *conn = rd_desc->arg.data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	int processed;
-	char pad[ISCSI_PAD_LEN];
-	struct scatterlist sg;
-
-	/*
-	 * Save current SKB and its offset in the corresponding
-	 * connection context.
-	 */
-	tcp_conn->in.copy = skb->len - offset;
-	tcp_conn->in.offset = offset;
-	tcp_conn->in.skb = skb;
-	tcp_conn->in.len = tcp_conn->in.copy;
-	BUG_ON(tcp_conn->in.copy <= 0);
-	debug_tcp("in %d bytes\n", tcp_conn->in.copy);
+	struct iscsi_chunk *chunk = &tcp_conn->in.chunk;
+	struct skb_seq_state seq;
+	unsigned int consumed = 0;
+	int rc = 0;
 
-more:
-	tcp_conn->in.copied = 0;
-	rc = 0;
+	debug_tcp("in %d bytes\n", skb->len - offset);
 
 	if (unlikely(conn->suspend_rx)) {
 		debug_tcp("conn %d Rx suspended!\n", conn->id);
 		return 0;
 	}
 
-	if (tcp_conn->in_progress == IN_PROGRESS_WAIT_HEADER ||
-	    tcp_conn->in_progress == IN_PROGRESS_HEADER_GATHER) {
-		rc = iscsi_hdr_extract(tcp_conn);
-		if (rc) {
-		       if (rc == -EAGAIN)
-				goto nomore;
-		       else {
-				iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
-				return 0;
-		       }
-		}
-
-		/*
-		 * Verify and process incoming PDU header.
-		 */
-		rc = iscsi_tcp_hdr_recv(conn);
-		if (!rc && tcp_conn->in.datalen) {
-			if (conn->datadgst_en)
-				crypto_hash_init(&tcp_conn->rx_hash);
-			tcp_conn->in_progress = IN_PROGRESS_DATA_RECV;
-		} else if (rc) {
-			iscsi_conn_failure(conn, rc);
-			return 0;
-		}
-	}
-
-	if (tcp_conn->in_progress == IN_PROGRESS_DDIGEST_RECV &&
-	    tcp_conn->in.copy) {
-		uint32_t recv_digest;
+	skb_prepare_seq_read(skb, offset, skb->len, &seq);
+	while (1) {
+		unsigned int avail;
+		const u8 *ptr;
 
-		debug_tcp("extra data_recv offset %d copy %d\n",
-			  tcp_conn->in.offset, tcp_conn->in.copy);
-
-		if (!tcp_conn->data_copied) {
-			if (tcp_conn->in.padding) {
-				debug_tcp("padding -> %d\n",
-					  tcp_conn->in.padding);
-				memset(pad, 0, tcp_conn->in.padding);
-				sg_init_one(&sg, pad, tcp_conn->in.padding);
-				crypto_hash_update(&tcp_conn->rx_hash,
-						   &sg, sg.length);
+		avail = skb_seq_read(consumed, &ptr, &seq);
+		if (avail == 0)
+			break;
+		BUG_ON(chunk->copied >= chunk->size);
+
+		debug_tcp("skb %p ptr=%p avail=%u\n", skb, ptr, avail);
+		rc = iscsi_tcp_chunk_recv(tcp_conn, chunk, ptr, avail);
+		BUG_ON(rc == 0);
+		consumed += rc;
+
+		if (chunk->total_copied >= chunk->total_size) {
+			rc = chunk->done(tcp_conn, chunk);
+			if (rc != 0) {
+				skb_abort_seq_read(&seq);
+				goto error;
 			}
-			crypto_hash_final(&tcp_conn->rx_hash,
-					  (u8 *) &tcp_conn->in.datadgst);
-			debug_tcp("rx digest 0x%x\n", tcp_conn->in.datadgst);
-		}
-
-		rc = iscsi_tcp_copy(conn, sizeof(uint32_t));
-		if (rc) {
-			if (rc == -EAGAIN)
-				goto again;
-			iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
-			return 0;
-		}
-
-		memcpy(&recv_digest, conn->data, sizeof(uint32_t));
-		if (recv_digest != tcp_conn->in.datadgst) {
-			debug_tcp("iscsi_tcp: data digest error!"
-				  "0x%x != 0x%x\n", recv_digest,
-				  tcp_conn->in.datadgst);
-			iscsi_conn_failure(conn, ISCSI_ERR_DATA_DGST);
-			return 0;
-		} else {
-			debug_tcp("iscsi_tcp: data digest match!"
-				  "0x%x == 0x%x\n", recv_digest,
-				  tcp_conn->in.datadgst);
-			tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER;
-		}
-	}
-
-	if (tcp_conn->in_progress == IN_PROGRESS_DATA_RECV &&
-	    tcp_conn->in.copy) {
-		debug_tcp("data_recv offset %d copy %d\n",
-		       tcp_conn->in.offset, tcp_conn->in.copy);
 
-		rc = iscsi_data_recv(conn);
-		if (rc) {
-			if (rc == -EAGAIN)
-				goto again;
-			iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
-			return 0;
+			/* The done() functions sets up the
+			 * next chunk. */
 		}
-
-		if (tcp_conn->in.padding)
-			tcp_conn->in_progress = IN_PROGRESS_PAD_RECV;
-		else if (conn->datadgst_en)
-			tcp_conn->in_progress = IN_PROGRESS_DDIGEST_RECV;
-		else
-			tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER;
-		tcp_conn->data_copied = 0;
-	}
-
-	if (tcp_conn->in_progress == IN_PROGRESS_PAD_RECV &&
-	    tcp_conn->in.copy) {
-		int copylen = min(tcp_conn->in.padding - tcp_conn->data_copied,
-				  tcp_conn->in.copy);
-
-		tcp_conn->in.copy -= copylen;
-		tcp_conn->in.offset += copylen;
-		tcp_conn->data_copied += copylen;
-
-		if (tcp_conn->data_copied != tcp_conn->in.padding)
-			tcp_conn->in_progress = IN_PROGRESS_PAD_RECV;
-		else if (conn->datadgst_en)
-			tcp_conn->in_progress = IN_PROGRESS_DDIGEST_RECV;
-		else
-			tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER;
-		tcp_conn->data_copied = 0;
 	}
 
-	debug_tcp("f, processed %d from out of %d padding %d\n",
-	       tcp_conn->in.offset - offset, (int)len, tcp_conn->in.padding);
-	BUG_ON(tcp_conn->in.offset - offset > len);
+	conn->rxdata_octets += consumed;
+	return consumed;
 
-	if (tcp_conn->in.offset - offset != len) {
-		debug_tcp("continue to process %d bytes\n",
-		       (int)len - (tcp_conn->in.offset - offset));
-		goto more;
-	}
-
-nomore:
-	processed = tcp_conn->in.offset - offset;
-	BUG_ON(processed == 0);
-	return processed;
-
-again:
-	processed = tcp_conn->in.offset - offset;
-	debug_tcp("c, processed %d from out of %d rd_desc_cnt %d\n",
-	          processed, (int)len, (int)rd_desc->count);
-	BUG_ON(processed == 0);
-	BUG_ON(processed > len);
-
-	conn->rxdata_octets += processed;
-	return processed;
+error:
+	debug_tcp("Error receiving PDU, errno=%d\n", rc);
+	iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	return 0;
 }
 
 static void
 iscsi_tcp_data_ready(struct sock *sk, int flag)
 {
 	struct iscsi_conn *conn = sk->sk_user_data;
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	read_descriptor_t rd_desc;
 
 	read_lock(&sk->sk_callback_lock);
 
 	/*
-	 * Use rd_desc to pass 'conn' to iscsi_tcp_data_recv.
+	 * Use rd_desc to pass 'conn' to iscsi_tcp_recv.
 	 * We set count to 1 because we want the network layer to
-	 * hand us all the skbs that are available. iscsi_tcp_data_recv
+	 * hand us all the skbs that are available. iscsi_tcp_recv
 	 * handled pdus that cross buffers or pdus that still need data.
 	 */
 	rd_desc.arg.data = conn;
 	rd_desc.count = 1;
-	tcp_read_sock(sk, &rd_desc, iscsi_tcp_data_recv);
+	tcp_read_sock(sk, &rd_desc, iscsi_tcp_recv);
 
 	read_unlock(&sk->sk_callback_lock);
+
+	/* If we had to (atomically) map a highmem page,
+	 * unmap it now. */
+	iscsi_tcp_chunk_unmap(&tcp_conn->in.chunk);
 }
 
 static void
@@ -1097,9 +1101,9 @@ iscsi_send(struct iscsi_conn *conn, struct iscsi_buf *buf, int size, int flags)
 	 * slab case.
 	 */
 	if (buf->use_sendmsg)
-		res = sock_no_sendpage(sk, sg_page(&buf->sg), offset, size, flags);
+		res = sock_no_sendpage(sk, buf->sg.page, offset, size, flags);
 	else
-		res = tcp_conn->sendpage(sk, sg_page(&buf->sg), offset, size, flags);
+		res = tcp_conn->sendpage(sk, buf->sg.page, offset, size, flags);
 
 	if (res >= 0) {
 		conn->txdata_octets += res;
@@ -1783,9 +1787,6 @@ iscsi_tcp_conn_create(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
 
 	conn->dd_data = tcp_conn;
 	tcp_conn->iscsi_conn = conn;
-	tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER;
-	/* initial operational parameters */
-	tcp_conn->hdr_size = sizeof(struct iscsi_hdr);
 
 	tcp_conn->tx_hash.tfm = crypto_alloc_hash("crc32c", 0,
 						  CRYPTO_ALG_ASYNC);
@@ -1862,11 +1863,9 @@ static void
 iscsi_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 {
 	struct iscsi_conn *conn = cls_conn->dd_data;
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 
 	iscsi_conn_stop(cls_conn, flag);
 	iscsi_tcp_release_conn(conn);
-	tcp_conn->hdr_size = sizeof(struct iscsi_hdr);
 }
 
 static int iscsi_tcp_get_addr(struct iscsi_conn *conn, struct socket *sock,
@@ -1966,7 +1965,7 @@ iscsi_tcp_conn_bind(struct iscsi_cls_session *cls_session,
 	/*
 	 * set receive state machine into initial state
 	 */
-	tcp_conn->in_progress = IN_PROGRESS_WAIT_HEADER;
+	iscsi_tcp_hdr_recv_prep(tcp_conn);
 	return 0;
 
 free_socket:
@@ -2059,9 +2058,6 @@ iscsi_conn_set_param(struct iscsi_cls_conn *cls_conn, enum iscsi_param param,
 	switch(param) {
 	case ISCSI_PARAM_HDRDGST_EN:
 		iscsi_set_param(cls_conn, param, buf, buflen);
-		tcp_conn->hdr_size = sizeof(struct iscsi_hdr);
-		if (conn->hdrdgst_en)
-			tcp_conn->hdr_size += sizeof(__u32);
 		break;
 	case ISCSI_PARAM_DATADGST_EN:
 		iscsi_set_param(cls_conn, param, buf, buflen);
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index 7eba44d..f1c5411 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -24,13 +24,6 @@
 
 #include <scsi/libiscsi.h>
 
-/* Socket's Receive state machine */
-#define IN_PROGRESS_WAIT_HEADER		0x0
-#define IN_PROGRESS_HEADER_GATHER	0x1
-#define IN_PROGRESS_DATA_RECV		0x2
-#define IN_PROGRESS_DDIGEST_RECV	0x3
-#define IN_PROGRESS_PAD_RECV		0x4
-
 /* xmit state machine */
 #define XMSTATE_IDLE			0x0
 #define XMSTATE_CMD_HDR_INIT		0x1
@@ -54,41 +47,64 @@
 
 struct crypto_hash;
 struct socket;
+struct iscsi_tcp_conn;
+struct iscsi_chunk;
+
+typedef int iscsi_chunk_done_fn_t(struct iscsi_tcp_conn *,
+				  struct iscsi_chunk *);
+
+struct iscsi_chunk {
+	unsigned char		*data;
+	unsigned int		size;
+	unsigned int		copied;
+	unsigned int		total_size;
+	unsigned int		total_copied;
+
+	struct hash_desc	*hash;
+	unsigned char		recv_digest[ISCSI_DIGEST_SIZE];
+	unsigned char		digest[ISCSI_DIGEST_SIZE];
+	unsigned int		digest_len;
+
+	struct scatterlist	*sg;
+	void			*sg_mapped;
+	unsigned int		sg_offset;
+	unsigned int		sg_index;
+	unsigned int		sg_count;
+
+	iscsi_chunk_done_fn_t	*done;
+};
 
 /* Socket connection recieve helper */
 struct iscsi_tcp_recv {
 	struct iscsi_hdr	*hdr;
-	struct sk_buff		*skb;
-	int			offset;
-	int			len;
-	int			hdr_offset;
-	int			copy;
-	int			copied;
-	int			padding;
-	struct iscsi_cmd_task	*ctask;		/* current cmd in progress */
+	struct iscsi_chunk	chunk;
+
+	/* Allocate buffer for BHS + AHS */
+	uint32_t		hdr_buf[64];
 
 	/* copied and flipped values */
 	int			datalen;
-	int			datadgst;
-	char			zero_copy_hdr;
+};
+
+/* Socket connection send helper */
+struct iscsi_tcp_send {
+	struct iscsi_hdr	*hdr;
+	struct iscsi_chunk	chunk;
+	struct iscsi_chunk	data_chunk;
+
+	/* Allocate buffer for BHS + AHS */
+	uint32_t		hdr_buf[64];
 };
 
 struct iscsi_tcp_conn {
 	struct iscsi_conn	*iscsi_conn;
 	struct socket		*sock;
-	struct iscsi_hdr	hdr;		/* header placeholder */
-	char			hdrext[4*sizeof(__u16) +
-				    sizeof(__u32)];
-	int			data_copied;
 	int			stop_stage;	/* conn_stop() flag: *
 						 * stop to recover,  *
 						 * stop to terminate */
-	/* iSCSI connection-wide sequencing */
-	int			hdr_size;	/* PDU header size */
-
 	/* control data */
 	struct iscsi_tcp_recv	in;		/* TCP receive context */
-	int			in_progress;	/* connection state machine */
+	struct iscsi_tcp_send	out;		/* TCP send context */
 
 	/* old values for socket callbacks */
 	void			(*old_data_ready)(struct sock *, int);
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 89429f4..e1fb3d0 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -77,6 +77,10 @@ enum {
 
 #define ISCSI_ADDRESS_BUF_LEN		64
 
+enum {
+	ISCSI_DIGEST_SIZE = sizeof(__u32),
+};
+
 struct iscsi_mgmt_task {
 	/*
 	 * Becuae LLDs allocate their hdr differently, this is a pointer to
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 03/23] Prettify resid handling and some extra checks
  2007-12-05  0:42   ` [PATCH 02/23] iscsi_tcp: rewrite recv path michaelc
@ 2007-12-05  0:42     ` michaelc
  2007-12-05  0:42       ` [PATCH 04/23] iscsi_tcp, libiscsi: initial AHS Support michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, Boaz Harrosh

From: Mike Christie <michaelc@cs.wisc.edu>

from Boaz Harrosh:
  - Check to see that OVERFLOW is not negative indicating
    a bug.
  - Unify handling of UNDERFLOW and OVERFLOW to the same
    code.
  - Also handle BIDI_OVERFLOW.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |   16 +++++++---------
 drivers/scsi/libiscsi.c  |   12 +++++++-----
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 1b540e0..fd88777 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -507,22 +507,20 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	}
 
 	if (rhdr->flags & ISCSI_FLAG_DATA_STATUS) {
+		sc->result = (DID_OK << 16) | rhdr->cmd_status;
 		conn->exp_statsn = be32_to_cpu(rhdr->statsn) + 1;
-		if (rhdr->flags & ISCSI_FLAG_DATA_UNDERFLOW) {
+		if (rhdr->flags & (ISCSI_FLAG_DATA_UNDERFLOW |
+		                   ISCSI_FLAG_DATA_OVERFLOW)) {
 			int res_count = be32_to_cpu(rhdr->residual_count);
 
 			if (res_count > 0 &&
-			    res_count <= scsi_bufflen(sc)) {
+			    (rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW ||
+			     res_count <= scsi_bufflen(sc)))
 				scsi_set_resid(sc, res_count);
-				sc->result = (DID_OK << 16) | rhdr->cmd_status;
-			} else
+			else
 				sc->result = (DID_BAD_TARGET << 16) |
 					rhdr->cmd_status;
-		} else if (rhdr->flags & ISCSI_FLAG_DATA_OVERFLOW) {
-			scsi_set_resid(sc, be32_to_cpu(rhdr->residual_count));
-			sc->result = (DID_OK << 16) | rhdr->cmd_status;
-		} else
-			sc->result = (DID_OK << 16) | rhdr->cmd_status;
+		}
 	}
 
 	conn->datain_pdus_cnt++;
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 176458f..0beb4c6 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -291,17 +291,19 @@ invalid_datalen:
 			   min_t(uint16_t, senselen, SCSI_SENSE_BUFFERSIZE));
 	}
 
-	if (rhdr->flags & ISCSI_FLAG_CMD_UNDERFLOW) {
+	if (rhdr->flags & (ISCSI_FLAG_CMD_UNDERFLOW |
+	                   ISCSI_FLAG_CMD_OVERFLOW)) {
 		int res_count = be32_to_cpu(rhdr->residual_count);
 
-		if (res_count > 0 && res_count <= scsi_bufflen(sc))
+		if (res_count > 0 &&
+		    (rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW ||
+		     res_count <= scsi_bufflen(sc)))
 			scsi_set_resid(sc, res_count);
 		else
 			sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status;
-	} else if (rhdr->flags & ISCSI_FLAG_CMD_BIDI_UNDERFLOW)
+	} else if (rhdr->flags & (ISCSI_FLAG_CMD_BIDI_UNDERFLOW |
+	                          ISCSI_FLAG_CMD_BIDI_OVERFLOW))
 		sc->result = (DID_BAD_TARGET << 16) | rhdr->cmd_status;
-	else if (rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW)
-		scsi_set_resid(sc, be32_to_cpu(rhdr->residual_count));
 
 out:
 	debug_scsi("done [sc %lx res %d itt 0x%x]\n",
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 04/23] iscsi_tcp, libiscsi: initial AHS Support
  2007-12-05  0:42     ` [PATCH 03/23] Prettify resid handling and some extra checks michaelc
@ 2007-12-05  0:42       ` michaelc
  2007-12-05  0:42         ` [PATCH 05/23] iser patching for AHS support michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, Boaz Harrosh

From: Mike Christie <michaelc@cs.wisc.edu>

  at libiscsi generic code
  - currently code assumes a storage space of pdu header is allocated
    at llds ctask and is pointed to by iscsi_cmd_task->hdr. Here I add
    a hdr_max field pertaining to that storage, and an hdr_len that
    accumulates the current use of the pdu-header.

  - Add an iscsi_next_hdr() inline which returns the next free space
    to write new Header at. Also iscsi_next_hdr() is used to retrieve
    the address at which to write the header-digest.

  - Add iscsi_add_hdr(length). What the user do is calls iscsi_next_hdr()
    for address of the new header, than calls iscsi_add_hdr(length) with
    the size of the new header. iscsi_add_hdr() will check if space is
    available and update to the new size. length must be padded according
    to standard.

  - Add 2 padding inline helpers thanks to Olaf. Current patch does not
    use them but Following patches will.
    Also moved definition of ISCSI_PAD_LEN to iscsi_proto.h which had
    PAD_WORD_LEN that was never used anywhere.

  - Let iscsi_prep_scsi_cmd_pdu() signal an Error return since now  it is
    possible that it will fail.

  - I was tired of yet again writing a "this is a digest" comment next to
    sizeof(__u32) so I defined a new ISCSI_DIGEST_SIZE. Now I don't need
    any comments. Changed all places that used sizeof(__u32) or "4" in
    connection to a digest.

  iscsi_tcp specific code
  - At struct iscsi_tcp_cmd_task allocate maximum space allowed in
    standard for all headers following the iscsi_cmd header. and mark
    it so in iscsi_tcp_session_create()
  - At iscsi_send_cmd_hdr() retrieve the correct headers size and
    write header digest at iscsi_next_hdr().

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c   |   16 ++++++++--------
 drivers/scsi/iscsi_tcp.h   |   13 +++++++------
 drivers/scsi/libiscsi.c    |   41 +++++++++++++++++++++++++++++++++++++++--
 include/scsi/iscsi_proto.h |   10 +++++++++-
 include/scsi/libiscsi.h    |   33 +++++++++++++++++++++++++++++++--
 5 files changed, 94 insertions(+), 19 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index fd88777..491845f 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -113,7 +113,7 @@ iscsi_hdr_digest(struct iscsi_conn *conn, struct iscsi_buf *buf,
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 
 	crypto_hash_digest(&tcp_conn->tx_hash, &buf->sg, buf->sg.length, crc);
-	buf->sg.length += sizeof(u32);
+	buf->sg.length += ISCSI_DIGEST_SIZE;
 }
 
 /*
@@ -220,6 +220,7 @@ static inline int
 iscsi_tcp_chunk_done(struct iscsi_chunk *chunk)
 {
 	static unsigned char padbuf[ISCSI_PAD_LEN];
+	unsigned int pad;
 
 	if (chunk->copied < chunk->size) {
 		iscsi_tcp_chunk_map(chunk);
@@ -243,10 +244,8 @@ iscsi_tcp_chunk_done(struct iscsi_chunk *chunk)
 	}
 
 	/* Do we need to handle padding? */
-	if (chunk->total_copied & (ISCSI_PAD_LEN-1)) {
-		unsigned int pad;
-
-		pad = ISCSI_PAD_LEN - (chunk->total_copied & (ISCSI_PAD_LEN-1));
+	pad = iscsi_padding(chunk->total_copied);
+	if (pad != 0) {
 		debug_tcp("consume %d pad bytes\n", pad);
 		chunk->total_size += pad;
 		chunk->size = pad;
@@ -1385,11 +1384,11 @@ iscsi_send_cmd_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 		}
 
 		iscsi_buf_init_iov(&tcp_ctask->headbuf, (char*)ctask->hdr,
-				  sizeof(struct iscsi_hdr));
+				  ctask->hdr_len);
 
 		if (conn->hdrdgst_en)
 			iscsi_hdr_digest(conn, &tcp_ctask->headbuf,
-					 (u8*)tcp_ctask->hdrext);
+					 iscsi_next_hdr(ctask));
 		tcp_ctask->xmstate &= ~XMSTATE_CMD_HDR_INIT;
 		tcp_ctask->xmstate |= XMSTATE_CMD_HDR_XMIT;
 	}
@@ -2176,7 +2175,8 @@ iscsi_tcp_session_create(struct iscsi_transport *iscsit,
 		struct iscsi_cmd_task *ctask = session->cmds[cmd_i];
 		struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 
-		ctask->hdr = &tcp_ctask->hdr;
+		ctask->hdr = &tcp_ctask->hdr.cmd_hdr;
+		ctask->hdr_max = sizeof(tcp_ctask->hdr) - ISCSI_DIGEST_SIZE;
 	}
 
 	for (cmd_i = 0; cmd_i < session->mgmtpool_max; cmd_i++) {
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index f1c5411..eb3784f 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -41,7 +41,6 @@
 #define XMSTATE_IMM_HDR_INIT		0x1000
 #define XMSTATE_SOL_HDR_INIT		0x2000
 
-#define ISCSI_PAD_LEN			4
 #define ISCSI_SG_TABLESIZE		SG_ALL
 #define ISCSI_TCP_MAX_CMD_LEN		16
 
@@ -130,14 +129,14 @@ struct iscsi_buf {
 
 struct iscsi_data_task {
 	struct iscsi_data	hdr;			/* PDU */
-	char			hdrext[sizeof(__u32)];	/* Header-Digest */
+	char			hdrext[ISCSI_DIGEST_SIZE];/* Header-Digest */
 	struct iscsi_buf	digestbuf;		/* digest buffer */
 	uint32_t		digest;			/* data digest */
 };
 
 struct iscsi_tcp_mgmt_task {
 	struct iscsi_hdr	hdr;
-	char			hdrext[sizeof(__u32)]; /* Header-Digest */
+	char			hdrext[ISCSI_DIGEST_SIZE]; /* Header-Digest */
 	int			xmstate;	/* mgmt xmit progress */
 	struct iscsi_buf	headbuf;	/* header buffer */
 	struct iscsi_buf	sendbuf;	/* in progress buffer */
@@ -159,9 +158,11 @@ struct iscsi_r2t_info {
 };
 
 struct iscsi_tcp_cmd_task {
-	struct iscsi_cmd	hdr;
-	char			hdrext[4*sizeof(__u16)+	/* AHS */
-				    sizeof(__u32)];	/* HeaderDigest */
+	struct iscsi_hdr_buff {
+		struct iscsi_cmd	cmd_hdr;
+		char			hdrextbuf[ISCSI_MAX_AHS_SIZE +
+		                                  ISCSI_DIGEST_SIZE];
+	} hdr;
 	char			pad[ISCSI_PAD_LEN];
 	int			pad_count;		/* padded bytes */
 	struct iscsi_buf	headbuf;		/* header buf (xmit) */
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0beb4c6..0d7914f 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -37,6 +37,9 @@
 #include <scsi/scsi_transport_iscsi.h>
 #include <scsi/libiscsi.h>
 
+static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
+			 int err);
+
 struct iscsi_session *
 class_to_transport_session(struct iscsi_cls_session *cls_session)
 {
@@ -122,6 +125,20 @@ void iscsi_prep_unsolicit_data_pdu(struct iscsi_cmd_task *ctask,
 }
 EXPORT_SYMBOL_GPL(iscsi_prep_unsolicit_data_pdu);
 
+static int iscsi_add_hdr(struct iscsi_cmd_task *ctask, unsigned len)
+{
+	unsigned exp_len = ctask->hdr_len + len;
+
+	if (exp_len > ctask->hdr_max) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+
+	WARN_ON(len & (ISCSI_PAD_LEN - 1)); /* caller must pad the AHS */
+	ctask->hdr_len = exp_len;
+	return 0;
+}
+
 /**
  * iscsi_prep_scsi_cmd_pdu - prep iscsi scsi cmd pdu
  * @ctask: iscsi cmd task
@@ -129,13 +146,19 @@ EXPORT_SYMBOL_GPL(iscsi_prep_unsolicit_data_pdu);
  * Prep basic iSCSI PDU fields for a scsi cmd pdu. The LLD should set
  * fields like dlength or final based on how much data it sends
  */
-static void iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
+static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 {
 	struct iscsi_conn *conn = ctask->conn;
 	struct iscsi_session *session = conn->session;
 	struct iscsi_cmd *hdr = ctask->hdr;
 	struct scsi_cmnd *sc = ctask->sc;
+	unsigned hdrlength;
+	int rc;
 
+	ctask->hdr_len = 0;
+	rc = iscsi_add_hdr(ctask, sizeof(*hdr));
+	if (rc)
+		return rc;
         hdr->opcode = ISCSI_OP_SCSI_CMD;
         hdr->flags = ISCSI_ATTR_SIMPLE;
         int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
@@ -199,6 +222,15 @@ static void iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 			hdr->flags |= ISCSI_FLAG_CMD_READ;
 	}
 
+	/* calculate size of additional header segments (AHSs) */
+	hdrlength = ctask->hdr_len - sizeof(*hdr);
+
+	WARN_ON(hdrlength & (ISCSI_PAD_LEN-1));
+	hdrlength /= ISCSI_PAD_LEN;
+
+	WARN_ON(hdrlength >= 256);
+	hdr->hlength = hdrlength & 0xFF;
+
 	conn->scsicmd_pdus_cnt++;
 
         debug_scsi("iscsi prep [%s cid %d sc %p cdb 0x%x itt 0x%x len %d "
@@ -206,6 +238,7 @@ static void iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
                 sc->sc_data_direction == DMA_TO_DEVICE ? "write" : "read",
 		conn->id, sc, sc->cmnd[0], ctask->itt, scsi_bufflen(sc),
                 session->cmdsn, session->max_cmdsn - session->exp_cmdsn + 1);
+	return 0;
 }
 
 /**
@@ -744,7 +777,10 @@ check_mgmt:
 
 		conn->ctask = list_entry(conn->xmitqueue.next,
 					 struct iscsi_cmd_task, running);
-		iscsi_prep_scsi_cmd_pdu(conn->ctask);
+		if (iscsi_prep_scsi_cmd_pdu(conn->ctask)) {
+			fail_command(conn, conn->ctask, DID_ABORT << 16);
+			continue;
+		}
 		conn->session->tt->init_cmd_task(conn->ctask);
 		conn->ctask->state = ISCSI_TASK_RUNNING;
 		list_move_tail(conn->xmitqueue.next, &conn->run_list);
@@ -1534,6 +1570,7 @@ iscsi_session_setup(struct iscsi_transport *iscsit,
 		if (cmd_task_size)
 			ctask->dd_data = &ctask[1];
 		ctask->itt = cmd_i;
+		ctask->hdr_max = sizeof(struct iscsi_cmd);
 		INIT_LIST_HEAD(&ctask->running);
 	}
 
diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h
index 751c81e..6947082 100644
--- a/include/scsi/iscsi_proto.h
+++ b/include/scsi/iscsi_proto.h
@@ -27,7 +27,7 @@
 #define ISCSI_LISTEN_PORT	3260
 
 /* Padding word length */
-#define PAD_WORD_LEN		4
+#define ISCSI_PAD_LEN		4
 
 /*
  * useful common(control and data pathes) macro
@@ -147,6 +147,14 @@ struct iscsi_rlength_ahdr {
 	__be32 read_length;
 };
 
+/* Extended CDB AHS */
+struct iscsi_ecdb_ahdr {
+	__be16 ahslength;	/* CDB length - 15, including reserved byte */
+	uint8_t ahstype;
+	uint8_t reserved;
+	uint8_t ecdb[260 - 16];	/* 4-byte aligned extended CDB spillover */
+};
+
 /* SCSI Response Header */
 struct iscsi_cmd_rsp {
 	uint8_t opcode;
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index e1fb3d0..a9a9e86 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -78,6 +78,9 @@ enum {
 #define ISCSI_ADDRESS_BUF_LEN		64
 
 enum {
+	/* this is the maximum possible storage for AHSs */
+	ISCSI_MAX_AHS_SIZE = sizeof(struct iscsi_ecdb_ahdr) +
+				sizeof(struct iscsi_rlength_ahdr),
 	ISCSI_DIGEST_SIZE = sizeof(__u32),
 };
 
@@ -102,10 +105,13 @@ enum {
 
 struct iscsi_cmd_task {
 	/*
-	 * Becuae LLDs allocate their hdr differently, this is a pointer to
-	 * that storage. It must be setup at session creation time.
+	 * Because LLDs allocate their hdr differently, this is a pointer
+	 * and length to that storage. It must be setup at session
+	 * creation time.
 	 */
 	struct iscsi_cmd	*hdr;
+	unsigned short		hdr_max;
+	unsigned short		hdr_len;	/* accumulated size of hdr used */
 	int			itt;		/* this ITT */
 
 	uint32_t		unsol_datasn;
@@ -124,6 +130,11 @@ struct iscsi_cmd_task {
 	void			*dd_data;	/* driver/transport data */
 };
 
+static inline void* iscsi_next_hdr(struct iscsi_cmd_task *ctask)
+{
+	return (void*)ctask->hdr + ctask->hdr_len;
+}
+
 struct iscsi_conn {
 	struct iscsi_cls_conn	*cls_conn;	/* ptr to class connection */
 	void			*dd_data;	/* iscsi_transport data */
@@ -342,4 +353,22 @@ extern void iscsi_requeue_ctask(struct iscsi_cmd_task *ctask);
 extern void iscsi_pool_free(struct iscsi_queue *, void **);
 extern int iscsi_pool_init(struct iscsi_queue *, int, void ***, int);
 
+/*
+ * inline functions to deal with padding.
+ */
+static inline unsigned int
+iscsi_padded(unsigned int len)
+{
+	return (len + ISCSI_PAD_LEN - 1) & ~(ISCSI_PAD_LEN - 1);
+}
+
+static inline unsigned int
+iscsi_padding(unsigned int len)
+{
+	len &= (ISCSI_PAD_LEN - 1);
+	if (len)
+		len = ISCSI_PAD_LEN - len;
+	return len;
+}
+
 #endif
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 05/23] iser patching for AHS support
  2007-12-05  0:42       ` [PATCH 04/23] iscsi_tcp, libiscsi: initial AHS Support michaelc
@ 2007-12-05  0:42         ` michaelc
  2007-12-05  0:42           ` [PATCH 06/23] libiscsi, iscsi_tcp: iscsi pool cleanup michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, Boaz Harrosh

From: Mike Christie <michaelc@cs.wisc.edu>

from Boaz Harrosh <boazharrosh@gmail.com>

  - The default initialization of hdr_max is the minimum -
    sizeof(struct iscsi_cmd) - Once this patch goes into iser the default
    initialization at libiscsi can be removed.
  - This is not yet full support for AHSs at iser end. But it should be easy.
    Just allocate more space at iser_desc right after iscsi_hdr. Than
    at transmission time use ctask->hdr_len to retrieve the total
    size of all iscsi pdu headers. See previous patch at iscsi_tcp.[ch]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |    1 +
 drivers/scsi/libiscsi.c                  |    1 -
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 2eadb6d..a2622f4 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -400,6 +400,7 @@ iscsi_iser_session_create(struct iscsi_transport *iscsit,
 		ctask      = session->cmds[i];
 		iser_ctask = ctask->dd_data;
 		ctask->hdr = (struct iscsi_cmd *)&iser_ctask->desc.iscsi_header;
+		ctask->hdr_max = sizeof(iser_ctask->desc.iscsi_header);
 	}
 
 	for (i = 0; i < session->mgmtpool_max; i++) {
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 0d7914f..5936586 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1570,7 +1570,6 @@ iscsi_session_setup(struct iscsi_transport *iscsit,
 		if (cmd_task_size)
 			ctask->dd_data = &ctask[1];
 		ctask->itt = cmd_i;
-		ctask->hdr_max = sizeof(struct iscsi_cmd);
 		INIT_LIST_HEAD(&ctask->running);
 	}
 
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 06/23] libiscsi, iscsi_tcp: iscsi pool cleanup
  2007-12-05  0:42         ` [PATCH 05/23] iser patching for AHS support michaelc
@ 2007-12-05  0:42           ` michaelc
  2007-12-05  0:42             ` [PATCH 07/23] libiscsi: do not block session during logout michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, Olaf Kirch

From: Mike Christie <michaelc@cs.wisc.edu>

from olaf.kirch@oracle.com

iscsi_pool_init simplified

iscsi_pool_init currently has a lot of duplicate kfree() calls it does
when some allocation fails. This patch simplifies the code a little by
using iscsi_pool_free to tear down the pool in case of an error.

iscsi_pool_init also returns a copy of the item array to the caller.
Not all callers use this array, so we make it optional.

Instead of allocating a second array and return that, allocate just one
array, of twice the size.

Update users of iscsi_pool_{init,free}

This patch drops the (now useless) second argument to
iscsi_pool_free, and updates all callers.

It also removes the ctask->r2ts array, which was never
used anyway. Since the items argument to iscsi_pool_init
is now optional, we can pass NULL instead.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |   12 ++-----
 drivers/scsi/iscsi_tcp.h |    3 +-
 drivers/scsi/libiscsi.c  |   75 ++++++++++++++++++++++++---------------------
 include/scsi/libiscsi.h  |   10 +++---
 4 files changed, 50 insertions(+), 50 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 491845f..f79a457 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1998,8 +1998,7 @@ iscsi_r2tpool_alloc(struct iscsi_session *session)
 		 */
 
 		/* R2T pool */
-		if (iscsi_pool_init(&tcp_ctask->r2tpool, session->max_r2t * 4,
-				    (void***)&tcp_ctask->r2ts,
+		if (iscsi_pool_init(&tcp_ctask->r2tpool, session->max_r2t * 4, NULL,
 				    sizeof(struct iscsi_r2t_info))) {
 			goto r2t_alloc_fail;
 		}
@@ -2008,8 +2007,7 @@ iscsi_r2tpool_alloc(struct iscsi_session *session)
 		tcp_ctask->r2tqueue = kfifo_alloc(
 		      session->max_r2t * 4 * sizeof(void*), GFP_KERNEL, NULL);
 		if (tcp_ctask->r2tqueue == ERR_PTR(-ENOMEM)) {
-			iscsi_pool_free(&tcp_ctask->r2tpool,
-					(void**)tcp_ctask->r2ts);
+			iscsi_pool_free(&tcp_ctask->r2tpool);
 			goto r2t_alloc_fail;
 		}
 	}
@@ -2022,8 +2020,7 @@ r2t_alloc_fail:
 		struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 
 		kfifo_free(tcp_ctask->r2tqueue);
-		iscsi_pool_free(&tcp_ctask->r2tpool,
-				(void**)tcp_ctask->r2ts);
+		iscsi_pool_free(&tcp_ctask->r2tpool);
 	}
 	return -ENOMEM;
 }
@@ -2038,8 +2035,7 @@ iscsi_r2tpool_free(struct iscsi_session *session)
 		struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 
 		kfifo_free(tcp_ctask->r2tqueue);
-		iscsi_pool_free(&tcp_ctask->r2tpool,
-				(void**)tcp_ctask->r2ts);
+		iscsi_pool_free(&tcp_ctask->r2tpool);
 	}
 }
 
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index eb3784f..d49d876 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -175,9 +175,8 @@ struct iscsi_tcp_cmd_task {
 	uint32_t		exp_datasn;		/* expected target's R2TSN/DataSN */
 	int			data_offset;
 	struct iscsi_r2t_info	*r2t;			/* in progress R2T    */
-	struct iscsi_queue	r2tpool;
+	struct iscsi_pool	r2tpool;
 	struct kfifo		*r2tqueue;
-	struct iscsi_r2t_info	**r2ts;
 	int			digest_count;
 	uint32_t		immdigest;		/* for imm data */
 	struct iscsi_buf	immbuf;			/* for imm data digest */
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 5936586..d43f909 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1413,59 +1413,64 @@ done:
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_device_reset);
 
+/*
+ * Pre-allocate a pool of @max items of @item_size. By default, the pool
+ * should be accessed via kfifo_{get,put} on q->queue.
+ * Optionally, the caller can obtain the array of object pointers
+ * by passing in a non-NULL @items pointer
+ */
 int
-iscsi_pool_init(struct iscsi_queue *q, int max, void ***items, int item_size)
+iscsi_pool_init(struct iscsi_pool *q, int max, void ***items, int item_size)
 {
-	int i;
+	int i, num_arrays = 1;
 
-	*items = kmalloc(max * sizeof(void*), GFP_KERNEL);
-	if (*items == NULL)
-		return -ENOMEM;
+	memset(q, 0, sizeof(*q));
 
 	q->max = max;
-	q->pool = kmalloc(max * sizeof(void*), GFP_KERNEL);
-	if (q->pool == NULL) {
-		kfree(*items);
-		return -ENOMEM;
-	}
+
+	/* If the user passed an items pointer, he wants a copy of
+	 * the array. */
+	if (items)
+		num_arrays++;
+	q->pool = kzalloc(num_arrays * max * sizeof(void*), GFP_KERNEL);
+	if (q->pool == NULL)
+		goto enomem;
 
 	q->queue = kfifo_init((void*)q->pool, max * sizeof(void*),
 			      GFP_KERNEL, NULL);
-	if (q->queue == ERR_PTR(-ENOMEM)) {
-		kfree(q->pool);
-		kfree(*items);
-		return -ENOMEM;
-	}
+	if (q->queue == ERR_PTR(-ENOMEM))
+		goto enomem;
 
 	for (i = 0; i < max; i++) {
-		q->pool[i] = kmalloc(item_size, GFP_KERNEL);
+		q->pool[i] = kzalloc(item_size, GFP_KERNEL);
 		if (q->pool[i] == NULL) {
-			int j;
-
-			for (j = 0; j < i; j++)
-				kfree(q->pool[j]);
-
-			kfifo_free(q->queue);
-			kfree(q->pool);
-			kfree(*items);
-			return -ENOMEM;
+			q->max = i;
+			goto enomem;
 		}
-		memset(q->pool[i], 0, item_size);
-		(*items)[i] = q->pool[i];
 		__kfifo_put(q->queue, (void*)&q->pool[i], sizeof(void*));
 	}
+
+	if (items) {
+		*items = q->pool + max;
+		memcpy(*items, q->pool, max * sizeof(void *));
+	}
+
 	return 0;
+
+enomem:
+	iscsi_pool_free(q);
+	return -ENOMEM;
 }
 EXPORT_SYMBOL_GPL(iscsi_pool_init);
 
-void iscsi_pool_free(struct iscsi_queue *q, void **items)
+void iscsi_pool_free(struct iscsi_pool *q)
 {
 	int i;
 
 	for (i = 0; i < q->max; i++)
-		kfree(items[i]);
-	kfree(q->pool);
-	kfree(items);
+		kfree(q->pool[i]);
+	if (q->pool)
+		kfree(q->pool);
 }
 EXPORT_SYMBOL_GPL(iscsi_pool_free);
 
@@ -1610,9 +1615,9 @@ module_put:
 cls_session_fail:
 	scsi_remove_host(shost);
 add_host_fail:
-	iscsi_pool_free(&session->mgmtpool, (void**)session->mgmt_cmds);
+	iscsi_pool_free(&session->mgmtpool);
 mgmtpool_alloc_fail:
-	iscsi_pool_free(&session->cmdpool, (void**)session->cmds);
+	iscsi_pool_free(&session->cmdpool);
 cmdpool_alloc_fail:
 	scsi_host_put(shost);
 	return NULL;
@@ -1635,8 +1640,8 @@ void iscsi_session_teardown(struct iscsi_cls_session *cls_session)
 	iscsi_unblock_session(cls_session);
 	scsi_remove_host(shost);
 
-	iscsi_pool_free(&session->mgmtpool, (void**)session->mgmt_cmds);
-	iscsi_pool_free(&session->cmdpool, (void**)session->cmds);
+	iscsi_pool_free(&session->mgmtpool);
+	iscsi_pool_free(&session->cmdpool);
 
 	kfree(session->password);
 	kfree(session->password_in);
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index a9a9e86..4b3e3c1 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -215,7 +215,7 @@ struct iscsi_conn {
 	uint32_t		eh_abort_cnt;
 };
 
-struct iscsi_queue {
+struct iscsi_pool {
 	struct kfifo		*queue;		/* FIFO Queue */
 	void			**pool;		/* Pool of elements */
 	int			max;		/* Max number of elements */
@@ -274,10 +274,10 @@ struct iscsi_session {
 
 	int			cmds_max;	/* size of cmds array */
 	struct iscsi_cmd_task	**cmds;		/* Original Cmds arr */
-	struct iscsi_queue	cmdpool;	/* PDU's pool */
+	struct iscsi_pool	cmdpool;	/* PDU's pool */
 	int			mgmtpool_max;	/* size of mgmt array */
 	struct iscsi_mgmt_task	**mgmt_cmds;	/* Original mgmt arr */
-	struct iscsi_queue	mgmtpool;	/* Mgmt PDU's pool */
+	struct iscsi_pool	mgmtpool;	/* Mgmt PDU's pool */
 };
 
 /*
@@ -350,8 +350,8 @@ extern void iscsi_requeue_ctask(struct iscsi_cmd_task *ctask);
 /*
  * generic helpers
  */
-extern void iscsi_pool_free(struct iscsi_queue *, void **);
-extern int iscsi_pool_init(struct iscsi_queue *, int, void ***, int);
+extern void iscsi_pool_free(struct iscsi_pool *);
+extern int iscsi_pool_init(struct iscsi_pool *, int, void ***, int);
 
 /*
  * inline functions to deal with padding.
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 07/23] libiscsi: do not block session during logout
  2007-12-05  0:42           ` [PATCH 06/23] libiscsi, iscsi_tcp: iscsi pool cleanup michaelc
@ 2007-12-05  0:42             ` michaelc
  2007-12-05  0:42               ` [PATCH 08/23] iscsi class: Use our own workq instead of common system one michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

There is not need to block the session during logout. Since
we are going to fail the commands that were blocked just fail them
immediately instead.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/infiniband/ulp/iser/iser_initiator.c |   11 +--
 drivers/scsi/iscsi_tcp.c                     |    4 +-
 drivers/scsi/libiscsi.c                      |  153 ++++++++++++++------------
 include/scsi/libiscsi.h                      |    2 +
 include/scsi/scsi_transport_iscsi.h          |    1 +
 5 files changed, 89 insertions(+), 82 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
index a6f2303..46fc9b7 100644
--- a/drivers/infiniband/ulp/iser/iser_initiator.c
+++ b/drivers/infiniband/ulp/iser/iser_initiator.c
@@ -617,15 +617,8 @@ void iser_snd_completion(struct iser_desc *tx_desc)
 		/* this arithmetic is legal by libiscsi dd_data allocation */
 		mtask = (void *) ((long)(void *)tx_desc -
 				  sizeof(struct iscsi_mgmt_task));
-		if (mtask->hdr->itt == RESERVED_ITT) {
-			struct iscsi_session *session = conn->session;
-
-			spin_lock(&conn->session->lock);
-			list_del(&mtask->running);
-			__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
-				    sizeof(void*));
-			spin_unlock(&session->lock);
-		}
+		if (mtask->hdr->itt == RESERVED_ITT)
+			iscsi_free_mgmt_task(conn, mtask);
 	}
 }
 
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index f79a457..90eae8e 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1349,9 +1349,7 @@ iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 		struct iscsi_session *session = conn->session;
 
 		spin_lock_bh(&session->lock);
-		list_del(&conn->mtask->running);
-		__kfifo_put(session->mgmtpool.queue, (void*)&conn->mtask,
-			    sizeof(void*));
+		iscsi_free_mgmt_task(conn, mtask);
 		spin_unlock_bh(&session->lock);
 	}
 	return 0;
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index d43f909..b7a2b9a 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -37,9 +37,6 @@
 #include <scsi/scsi_transport_iscsi.h>
 #include <scsi/libiscsi.h>
 
-static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
-			 int err);
-
 struct iscsi_session *
 class_to_transport_session(struct iscsi_cls_session *cls_session)
 {
@@ -274,6 +271,53 @@ static void __iscsi_put_ctask(struct iscsi_cmd_task *ctask)
 		iscsi_complete_command(ctask);
 }
 
+/*
+ * session lock must be held
+ */
+static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
+			 int err)
+{
+	struct scsi_cmnd *sc;
+
+	sc = ctask->sc;
+	if (!sc)
+		return;
+
+	if (ctask->state == ISCSI_TASK_PENDING)
+		/*
+		 * cmd never made it to the xmit thread, so we should not count
+		 * the cmd in the sequencing
+		 */
+		conn->session->queued_cmdsn--;
+	else
+		conn->session->tt->cleanup_cmd_task(conn, ctask);
+
+	sc->result = err;
+	scsi_set_resid(sc, scsi_bufflen(sc));
+	if (conn->ctask == ctask)
+		conn->ctask = NULL;
+	/* release ref from queuecommand */
+	__iscsi_put_ctask(ctask);
+}
+
+/**
+ * iscsi_free_mgmt_task - return mgmt task back to pool
+ * @conn: iscsi connection
+ * @mtask: mtask
+ *
+ * Must be called with session lock.
+ */
+void iscsi_free_mgmt_task(struct iscsi_conn *conn,
+			  struct iscsi_mgmt_task *mtask)
+{
+	list_del_init(&mtask->running);
+	if (conn->login_mtask == mtask)
+		return;
+	__kfifo_put(conn->session->mgmtpool.queue,
+		    (void*)&mtask, sizeof(void*));
+}
+EXPORT_SYMBOL_GPL(iscsi_free_mgmt_task);
+
 /**
  * iscsi_cmd_rsp - SCSI Command Response processing
  * @conn: iscsi connection
@@ -464,10 +508,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			 */
 			if (iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen))
 				rc = ISCSI_ERR_CONN_FAILED;
-			list_del_init(&mtask->running);
-			if (conn->login_mtask != mtask)
-				__kfifo_put(session->mgmtpool.queue,
-					    (void*)&mtask, sizeof(void*));
+			iscsi_free_mgmt_task(conn, mtask);
 			break;
 		case ISCSI_OP_SCSI_TMFUNC_RSP:
 			if (datalen) {
@@ -476,6 +517,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			}
 
 			iscsi_tmf_rsp(conn, hdr);
+			iscsi_free_mgmt_task(conn, mtask);
 			break;
 		case ISCSI_OP_NOOP_IN:
 			if (hdr->ttt != cpu_to_be32(ISCSI_RESERVED_TAG) || datalen) {
@@ -486,9 +528,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 
 			if (iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen))
 				rc = ISCSI_ERR_CONN_FAILED;
-			list_del_init(&mtask->running);
-			__kfifo_put(session->mgmtpool.queue,
-				    (void*)&mtask, sizeof(void*));
+			iscsi_free_mgmt_task(conn, mtask);
 			break;
 		default:
 			rc = ISCSI_ERR_BAD_OPCODE;
@@ -650,14 +690,12 @@ static void iscsi_prep_mtask(struct iscsi_conn *conn,
 static int iscsi_xmit_mtask(struct iscsi_conn *conn)
 {
 	struct iscsi_hdr *hdr = conn->mtask->hdr;
-	int rc, was_logout = 0;
+	int rc;
 
+	if ((hdr->opcode & ISCSI_OPCODE_MASK) == ISCSI_OP_LOGOUT)
+		conn->session->state = ISCSI_STATE_LOGGING_OUT;
 	spin_unlock_bh(&conn->session->lock);
-	if ((hdr->opcode & ISCSI_OPCODE_MASK) == ISCSI_OP_LOGOUT) {
-		conn->session->state = ISCSI_STATE_IN_RECOVERY;
-		iscsi_block_session(session_to_cls(conn->session));
-		was_logout = 1;
-	}
+
 	rc = conn->session->tt->xmit_mgmt_task(conn, conn->mtask);
 	spin_lock_bh(&conn->session->lock);
 	if (rc)
@@ -665,11 +703,6 @@ static int iscsi_xmit_mtask(struct iscsi_conn *conn)
 
 	/* done with this in-progress mtask */
 	conn->mtask = NULL;
-
-	if (was_logout) {
-		set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx);
-		return -ENODATA;
-	}
 	return 0;
 }
 
@@ -763,6 +796,12 @@ check_mgmt:
 	while (!list_empty(&conn->mgmtqueue)) {
 		conn->mtask = list_entry(conn->mgmtqueue.next,
 					 struct iscsi_mgmt_task, running);
+		if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
+			iscsi_free_mgmt_task(conn, conn->mtask);
+			conn->mtask = NULL;
+			continue;
+		}
+
 		iscsi_prep_mtask(conn, conn->mtask);
 		list_move_tail(conn->mgmtqueue.next, &conn->mgmt_run_list);
 		rc = iscsi_xmit_mtask(conn);
@@ -777,6 +816,10 @@ check_mgmt:
 
 		conn->ctask = list_entry(conn->xmitqueue.next,
 					 struct iscsi_cmd_task, running);
+		if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
+			fail_command(conn, conn->ctask, DID_NO_CONNECT << 16);
+			continue;
+		}
 		if (iscsi_prep_scsi_cmd_pdu(conn->ctask)) {
 			fail_command(conn, conn->ctask, DID_ABORT << 16);
 			continue;
@@ -800,6 +843,12 @@ check_mgmt:
 		if (conn->session->fast_abort && conn->tmf_state != TMF_INITIAL)
 			break;
 
+		/*
+		 * we always do fastlogout - conn stop code will clean up.
+		 */
+		if (conn->session->state == ISCSI_STATE_LOGGING_OUT)
+			break;
+
 		conn->ctask = list_entry(conn->requeue.next,
 					 struct iscsi_cmd_task, running);
 		conn->ctask->state = ISCSI_TASK_RUNNING;
@@ -842,6 +891,7 @@ enum {
 	FAILURE_SESSION_TERMINATE,
 	FAILURE_SESSION_IN_RECOVERY,
 	FAILURE_SESSION_RECOVERY_TIMEOUT,
+	FAILURE_SESSION_LOGGING_OUT,
 };
 
 int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
@@ -879,12 +929,19 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 			goto reject;
 		}
 
-		if (session->state == ISCSI_STATE_RECOVERY_FAILED)
+		switch (session->state) {
+		case ISCSI_STATE_RECOVERY_FAILED:
 			reason = FAILURE_SESSION_RECOVERY_TIMEOUT;
-		else if (session->state == ISCSI_STATE_TERMINATE)
+			break;
+		case ISCSI_STATE_TERMINATE:
 			reason = FAILURE_SESSION_TERMINATE;
-		else
+			break;
+		case ISCSI_STATE_LOGGING_OUT:
+			reason = FAILURE_SESSION_LOGGING_OUT;
+			break;
+		default:
 			reason = FAILURE_SESSION_FREED;
+		}
 		goto fault;
 	}
 
@@ -1120,45 +1177,10 @@ static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
 	if (age != session->age ||
 	    session->state != ISCSI_STATE_LOGGED_IN)
 		return -ENOTCONN;
-
-	if (!list_empty(&mtask->running)) {
-		list_del_init(&mtask->running);
-		__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
-			    sizeof(void*));
-	}
 	return 0;
 }
 
 /*
- * session lock must be held
- */
-static void fail_command(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
-			 int err)
-{
-	struct scsi_cmnd *sc;
-
-	sc = ctask->sc;
-	if (!sc)
-		return;
-
-	if (ctask->state == ISCSI_TASK_PENDING)
-		/*
-		 * cmd never made it to the xmit thread, so we should not count
-		 * the cmd in the sequencing
-		 */
-		conn->session->queued_cmdsn--;
-	else
-		conn->session->tt->cleanup_cmd_task(conn, ctask);
-
-	sc->result = err;
-	scsi_set_resid(sc, scsi_bufflen(sc));
-	if (conn->ctask == ctask)
-		conn->ctask = NULL;
-	/* release ref from queuecommand */
-	__iscsi_put_ctask(ctask);
-}
-
-/*
  * Fail commands. session lock held and recv side suspended and xmit
  * thread flushed
  */
@@ -1837,22 +1859,13 @@ flush_control_queues(struct iscsi_session *session, struct iscsi_conn *conn)
 	/* handle pending */
 	list_for_each_entry_safe(mtask, tmp, &conn->mgmtqueue, running) {
 		debug_scsi("flushing pending mgmt task itt 0x%x\n", mtask->itt);
-		list_del_init(&mtask->running);
-		if (mtask == conn->login_mtask)
-			continue;
-		__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
-			    sizeof(void*));
+		iscsi_free_mgmt_task(conn, mtask);
 	}
 
 	/* handle running */
 	list_for_each_entry_safe(mtask, tmp, &conn->mgmt_run_list, running) {
 		debug_scsi("flushing running mgmt task itt 0x%x\n", mtask->itt);
-		list_del_init(&mtask->running);
-
-		if (mtask == conn->login_mtask)
-			continue;
-		__kfifo_put(session->mgmtpool.queue, (void*)&mtask,
-			   sizeof(void*));
+		iscsi_free_mgmt_task(conn, mtask);
 	}
 
 	conn->mtask = NULL;
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index 4b3e3c1..d68f745 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -346,6 +346,8 @@ extern int __iscsi_complete_pdu(struct iscsi_conn *, struct iscsi_hdr *,
 extern int iscsi_verify_itt(struct iscsi_conn *, struct iscsi_hdr *,
 			    uint32_t *);
 extern void iscsi_requeue_ctask(struct iscsi_cmd_task *ctask);
+extern void iscsi_free_mgmt_task(struct iscsi_conn *conn,
+				 struct iscsi_mgmt_task *mtask);
 
 /*
  * generic helpers
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 7ff6199..b8d97bd 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -176,6 +176,7 @@ struct iscsi_cls_conn {
 #define ISCSI_STATE_TERMINATE		4
 #define ISCSI_STATE_IN_RECOVERY		5
 #define ISCSI_STATE_RECOVERY_FAILED	6
+#define ISCSI_STATE_LOGGING_OUT		7
 
 struct iscsi_cls_session {
 	struct list_head sess_list;		/* item in session_list */
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/23] iscsi class: Use our own workq instead of common system one.
  2007-12-05  0:42             ` [PATCH 07/23] libiscsi: do not block session during logout michaelc
@ 2007-12-05  0:42               ` michaelc
  2007-12-05  0:42                 ` [PATCH 09/23] libiscsi: grab eh_mutex during host reset michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

There is just too much going on through the common workq and
something like a scsi device removal through sysfs affects
how long it will take to recover the transport, mark it as
failed, or shut it down gracefully.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/scsi_transport_iscsi.c |   16 ++++++++++++----
 1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index b6f346c..afdc2ae 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -50,6 +50,7 @@ struct iscsi_internal {
 };
 
 static atomic_t iscsi_session_nr; /* sysfs session id for next new session */
+static struct workqueue_struct *iscsi_eh_timer_workq;
 
 /*
  * list of registered transports and lock that must
@@ -252,7 +253,7 @@ static void session_recovery_timedout(struct work_struct *work)
 void iscsi_unblock_session(struct iscsi_cls_session *session)
 {
 	if (!cancel_delayed_work(&session->recovery_work))
-		flush_scheduled_work();
+		flush_workqueue(iscsi_eh_timer_workq);
 	scsi_target_unblock(&session->dev);
 }
 EXPORT_SYMBOL_GPL(iscsi_unblock_session);
@@ -260,8 +261,8 @@ EXPORT_SYMBOL_GPL(iscsi_unblock_session);
 void iscsi_block_session(struct iscsi_cls_session *session)
 {
 	scsi_target_block(&session->dev);
-	schedule_delayed_work(&session->recovery_work,
-			     session->recovery_tmo * HZ);
+	queue_delayed_work(iscsi_eh_timer_workq, &session->recovery_work,
+			   session->recovery_tmo * HZ);
 }
 EXPORT_SYMBOL_GPL(iscsi_block_session);
 
@@ -356,7 +357,7 @@ void iscsi_remove_session(struct iscsi_cls_session *session)
 	struct iscsi_host *ihost = shost->shost_data;
 
 	if (!cancel_delayed_work(&session->recovery_work))
-		flush_scheduled_work();
+		flush_workqueue(iscsi_eh_timer_workq);
 
 	mutex_lock(&ihost->mutex);
 	list_del(&session->host_list);
@@ -1520,8 +1521,14 @@ static __init int iscsi_transport_init(void)
 		goto unregister_session_class;
 	}
 
+	iscsi_eh_timer_workq = create_singlethread_workqueue("iscsi_eh");
+	if (!iscsi_eh_timer_workq)
+		goto release_nls;
+
 	return 0;
 
+release_nls:
+	sock_release(nls->sk_socket);
 unregister_session_class:
 	transport_class_unregister(&iscsi_session_class);
 unregister_conn_class:
@@ -1535,6 +1542,7 @@ unregister_transport_class:
 
 static void __exit iscsi_transport_exit(void)
 {
+	destroy_workqueue(iscsi_eh_timer_workq);
 	sock_release(nls->sk_socket);
 	transport_class_unregister(&iscsi_connection_class);
 	transport_class_unregister(&iscsi_session_class);
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 09/23] libiscsi: grab eh_mutex during host reset
  2007-12-05  0:42               ` [PATCH 08/23] iscsi class: Use our own workq instead of common system one michaelc
@ 2007-12-05  0:42                 ` michaelc
  2007-12-05  0:42                   ` [PATCH 10/23] libiscsi: fix shutdown michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

I thought we may not need the eh mutex during host reset, but that is wrong
with the new shutdown code. When start_session_recovery sets the state to
terminate then drops the session lock. The scsi eh thread could then grab the
session lock see that we are terminating and then return failed to scsi-ml.
scsi-ml's eh then owns the command and will do whatever it wants
with it. But then the iscsi eh thread could grab the session lock
and want to complete the scsi commands that we in the LLD, but
it no longer owns them and kaboom.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index b7a2b9a..441e351 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1079,17 +1079,19 @@ int iscsi_eh_host_reset(struct scsi_cmnd *sc)
 	struct iscsi_session *session = iscsi_hostdata(host->hostdata);
 	struct iscsi_conn *conn = session->leadconn;
 
+	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->lock);
 	if (session->state == ISCSI_STATE_TERMINATE) {
 failed:
 		debug_scsi("failing host reset: session terminated "
 			   "[CID %d age %d]\n", conn->id, session->age);
 		spin_unlock_bh(&session->lock);
+		mutex_unlock(&session->eh_mutex);
 		return FAILED;
 	}
 
 	spin_unlock_bh(&session->lock);
-
+	mutex_unlock(&session->eh_mutex);
 	/*
 	 * we drop the lock here but the leadconn cannot be destoyed while
 	 * we are in the scsi eh
@@ -1104,13 +1106,14 @@ failed:
 	if (signal_pending(current))
 		flush_signals(current);
 
+	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->lock);
 	if (session->state == ISCSI_STATE_LOGGED_IN)
 		printk(KERN_INFO "iscsi: host reset succeeded\n");
 	else
 		goto failed;
 	spin_unlock_bh(&session->lock);
-
+	mutex_unlock(&session->eh_mutex);
 	return SUCCESS;
 }
 EXPORT_SYMBOL_GPL(iscsi_eh_host_reset);
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 10/23] libiscsi: fix shutdown
  2007-12-05  0:42                 ` [PATCH 09/23] libiscsi: grab eh_mutex during host reset michaelc
@ 2007-12-05  0:42                   ` michaelc
  2007-12-05  0:42                     ` [PATCH 11/23] libiscsi: fix nop handling michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

We were using the device delete sysfs file to remove each device
then logout. Now in 2.6.21 and .22 this will not work because
the sysfs delete file returns immediately and does not wait for
the device removal to complete. This causes a hang if a cache sync
is needed during shutdown. Before .21, that approach had other
problems, so this patch fixes the shutdown code so that we remove the target
and unbind the session before logging out and shut down the session.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c             |    4 +-
 drivers/scsi/qla4xxx/ql4_init.c     |    4 +-
 drivers/scsi/qla4xxx/ql4_os.c       |    7 +-
 drivers/scsi/scsi_transport_iscsi.c |  287 +++++++++++++++++++----------------
 include/scsi/iscsi_if.h             |    7 +
 include/scsi/iscsi_proto.h          |    2 +
 include/scsi/scsi_transport_iscsi.h |    7 +-
 7 files changed, 175 insertions(+), 143 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 441e351..5205ef2 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1662,7 +1662,7 @@ void iscsi_session_teardown(struct iscsi_cls_session *cls_session)
 	struct iscsi_session *session = iscsi_hostdata(shost->hostdata);
 	struct module *owner = cls_session->transport->owner;
 
-	iscsi_unblock_session(cls_session);
+	iscsi_remove_session(cls_session);
 	scsi_remove_host(shost);
 
 	iscsi_pool_free(&session->mgmtpool);
@@ -1677,7 +1677,7 @@ void iscsi_session_teardown(struct iscsi_cls_session *cls_session)
 	kfree(session->hwaddress);
 	kfree(session->initiatorname);
 
-	iscsi_destroy_session(cls_session);
+	iscsi_free_session(cls_session);
 	scsi_host_put(shost);
 	module_put(owner);
 }
diff --git a/drivers/scsi/qla4xxx/ql4_init.c b/drivers/scsi/qla4xxx/ql4_init.c
index d692c71..cbe0a17 100644
--- a/drivers/scsi/qla4xxx/ql4_init.c
+++ b/drivers/scsi/qla4xxx/ql4_init.c
@@ -5,6 +5,7 @@
  * See LICENSE.qla4xxx for copyright and licensing details.
  */
 
+#include <scsi/iscsi_if.h>
 #include "ql4_def.h"
 #include "ql4_glbl.h"
 #include "ql4_dbg.h"
@@ -1305,7 +1306,8 @@ int qla4xxx_process_ddb_changed(struct scsi_qla_host *ha,
 		atomic_set(&ddb_entry->relogin_timer, 0);
 		clear_bit(DF_RELOGIN, &ddb_entry->flags);
 		clear_bit(DF_NO_RELOGIN, &ddb_entry->flags);
-		iscsi_if_create_session_done(ddb_entry->conn);
+		iscsi_session_event(ddb_entry->sess,
+				    ISCSI_KEVENT_CREATE_SESSION);
 		/*
 		 * Change the lun state to READY in case the lun TIMEOUT before
 		 * the device came back.
diff --git a/drivers/scsi/qla4xxx/ql4_os.c b/drivers/scsi/qla4xxx/ql4_os.c
index 89460d2..f55b9f7 100644
--- a/drivers/scsi/qla4xxx/ql4_os.c
+++ b/drivers/scsi/qla4xxx/ql4_os.c
@@ -298,8 +298,7 @@ void qla4xxx_destroy_sess(struct ddb_entry *ddb_entry)
 		return;
 
 	if (ddb_entry->conn) {
-		iscsi_if_destroy_session_done(ddb_entry->conn);
-		iscsi_destroy_conn(ddb_entry->conn);
+		atomic_set(&ddb_entry->state, DDB_STATE_DEAD);
 		iscsi_remove_session(ddb_entry->sess);
 	}
 	iscsi_free_session(ddb_entry->sess);
@@ -309,6 +308,7 @@ int qla4xxx_add_sess(struct ddb_entry *ddb_entry)
 {
 	int err;
 
+	ddb_entry->sess->recovery_tmo = ddb_entry->ha->port_down_retry_count;
 	err = iscsi_add_session(ddb_entry->sess, ddb_entry->fw_ddb_index);
 	if (err) {
 		DEBUG2(printk(KERN_ERR "Could not add session.\n"));
@@ -321,9 +321,6 @@ int qla4xxx_add_sess(struct ddb_entry *ddb_entry)
 		DEBUG2(printk(KERN_ERR "Could not add connection.\n"));
 		return -ENOMEM;
 	}
-
-	ddb_entry->sess->recovery_tmo = ddb_entry->ha->port_down_retry_count;
-	iscsi_if_create_session_done(ddb_entry->conn);
 	return 0;
 }
 
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index afdc2ae..0515b7d 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -116,6 +116,8 @@ static struct attribute_group iscsi_transport_group = {
 	.attrs = iscsi_transport_attrs,
 };
 
+
+
 static int iscsi_setup_host(struct transport_container *tc, struct device *dev,
 			    struct class_device *cdev)
 {
@@ -125,13 +127,30 @@ static int iscsi_setup_host(struct transport_container *tc, struct device *dev,
 	memset(ihost, 0, sizeof(*ihost));
 	INIT_LIST_HEAD(&ihost->sessions);
 	mutex_init(&ihost->mutex);
+
+	snprintf(ihost->unbind_workq_name, KOBJ_NAME_LEN, "iscsi_unbind_%d",
+		shost->host_no);
+	ihost->unbind_workq = create_singlethread_workqueue(
+						ihost->unbind_workq_name);
+	if (!ihost->unbind_workq)
+		return -ENOMEM;
+	return 0;
+}
+
+static int iscsi_remove_host(struct transport_container *tc, struct device *dev,
+			     struct class_device *cdev)
+{
+	struct Scsi_Host *shost = dev_to_shost(dev);
+	struct iscsi_host *ihost = shost->shost_data;
+
+	destroy_workqueue(ihost->unbind_workq);
 	return 0;
 }
 
 static DECLARE_TRANSPORT_CLASS(iscsi_host_class,
 			       "iscsi_host",
 			       iscsi_setup_host,
-			       NULL,
+			       iscsi_remove_host,
 			       NULL);
 
 static DECLARE_TRANSPORT_CLASS(iscsi_session_class,
@@ -266,6 +285,35 @@ void iscsi_block_session(struct iscsi_cls_session *session)
 }
 EXPORT_SYMBOL_GPL(iscsi_block_session);
 
+static void __iscsi_unbind_session(struct work_struct *work)
+{
+	struct iscsi_cls_session *session =
+			container_of(work, struct iscsi_cls_session,
+				     unbind_work);
+	struct Scsi_Host *shost = iscsi_session_to_shost(session);
+	struct iscsi_host *ihost = shost->shost_data;
+
+	/* Prevent new scans and make sure scanning is not in progress */
+	mutex_lock(&ihost->mutex);
+	if (list_empty(&session->host_list)) {
+		mutex_unlock(&ihost->mutex);
+		return;
+	}
+	list_del_init(&session->host_list);
+	mutex_unlock(&ihost->mutex);
+
+	scsi_remove_target(&session->dev);
+	iscsi_session_event(session, ISCSI_KEVENT_UNBIND_SESSION);
+}
+
+static int iscsi_unbind_session(struct iscsi_cls_session *session)
+{
+	struct Scsi_Host *shost = iscsi_session_to_shost(session);
+	struct iscsi_host *ihost = shost->shost_data;
+
+	return queue_work(ihost->unbind_workq, &session->unbind_work);
+}
+
 struct iscsi_cls_session *
 iscsi_alloc_session(struct Scsi_Host *shost,
 		    struct iscsi_transport *transport)
@@ -282,6 +330,7 @@ iscsi_alloc_session(struct Scsi_Host *shost,
 	INIT_DELAYED_WORK(&session->recovery_work, session_recovery_timedout);
 	INIT_LIST_HEAD(&session->host_list);
 	INIT_LIST_HEAD(&session->sess_list);
+	INIT_WORK(&session->unbind_work, __iscsi_unbind_session);
 
 	/* this is released in the dev's release function */
 	scsi_host_get(shost);
@@ -298,6 +347,7 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
 {
 	struct Scsi_Host *shost = iscsi_session_to_shost(session);
 	struct iscsi_host *ihost;
+	unsigned long flags;
 	int err;
 
 	ihost = shost->shost_data;
@@ -314,9 +364,15 @@ int iscsi_add_session(struct iscsi_cls_session *session, unsigned int target_id)
 	}
 	transport_register_device(&session->dev);
 
+	spin_lock_irqsave(&sesslock, flags);
+	list_add(&session->sess_list, &sesslist);
+	spin_unlock_irqrestore(&sesslock, flags);
+
 	mutex_lock(&ihost->mutex);
 	list_add(&session->host_list, &ihost->sessions);
 	mutex_unlock(&ihost->mutex);
+
+	iscsi_session_event(session, ISCSI_KEVENT_CREATE_SESSION);
 	return 0;
 
 release_host:
@@ -351,19 +407,58 @@ iscsi_create_session(struct Scsi_Host *shost,
 }
 EXPORT_SYMBOL_GPL(iscsi_create_session);
 
+static void iscsi_conn_release(struct device *dev)
+{
+	struct iscsi_cls_conn *conn = iscsi_dev_to_conn(dev);
+	struct device *parent = conn->dev.parent;
+
+	kfree(conn);
+	put_device(parent);
+}
+
+static int iscsi_is_conn_dev(const struct device *dev)
+{
+	return dev->release == iscsi_conn_release;
+}
+
+static int iscsi_iter_destroy_conn_fn(struct device *dev, void *data)
+{
+	if (!iscsi_is_conn_dev(dev))
+		return 0;
+	return iscsi_destroy_conn(iscsi_dev_to_conn(dev));
+}
+
 void iscsi_remove_session(struct iscsi_cls_session *session)
 {
 	struct Scsi_Host *shost = iscsi_session_to_shost(session);
 	struct iscsi_host *ihost = shost->shost_data;
+	unsigned long flags;
+	int err;
+
+	spin_lock_irqsave(&sesslock, flags);
+	list_del(&session->sess_list);
+	spin_unlock_irqrestore(&sesslock, flags);
 
+	/*
+	 * If we are blocked let commands flow again. The lld or iscsi
+	 * layer should set up the queuecommand to fail commands.
+	 */
+	iscsi_unblock_session(session);
+	iscsi_unbind_session(session);
+	/*
+	 * If the session dropped while removing devices then we need to make
+	 * sure it is not blocked
+	 */
 	if (!cancel_delayed_work(&session->recovery_work))
 		flush_workqueue(iscsi_eh_timer_workq);
+	flush_workqueue(ihost->unbind_workq);
 
-	mutex_lock(&ihost->mutex);
-	list_del(&session->host_list);
-	mutex_unlock(&ihost->mutex);
-
-	scsi_remove_target(&session->dev);
+	/* hw iscsi may not have removed all connections from session */
+	err = device_for_each_child(&session->dev, NULL,
+				    iscsi_iter_destroy_conn_fn);
+	if (err)
+		dev_printk(KERN_ERR, &session->dev, "iscsi: Could not delete "
+			   "all connections for session. Error %d.\n", err);
 
 	transport_unregister_device(&session->dev);
 	device_del(&session->dev);
@@ -372,9 +467,9 @@ EXPORT_SYMBOL_GPL(iscsi_remove_session);
 
 void iscsi_free_session(struct iscsi_cls_session *session)
 {
+	iscsi_session_event(session, ISCSI_KEVENT_DESTROY_SESSION);
 	put_device(&session->dev);
 }
-
 EXPORT_SYMBOL_GPL(iscsi_free_session);
 
 /**
@@ -392,20 +487,6 @@ int iscsi_destroy_session(struct iscsi_cls_session *session)
 }
 EXPORT_SYMBOL_GPL(iscsi_destroy_session);
 
-static void iscsi_conn_release(struct device *dev)
-{
-	struct iscsi_cls_conn *conn = iscsi_dev_to_conn(dev);
-	struct device *parent = conn->dev.parent;
-
-	kfree(conn);
-	put_device(parent);
-}
-
-static int iscsi_is_conn_dev(const struct device *dev)
-{
-	return dev->release == iscsi_conn_release;
-}
-
 /**
  * iscsi_create_conn - create iscsi class connection
  * @session: iscsi cls session
@@ -425,6 +506,7 @@ iscsi_create_conn(struct iscsi_cls_session *session, uint32_t cid)
 {
 	struct iscsi_transport *transport = session->transport;
 	struct iscsi_cls_conn *conn;
+	unsigned long flags;
 	int err;
 
 	conn = kzalloc(sizeof(*conn) + transport->conndata_size, GFP_KERNEL);
@@ -453,6 +535,11 @@ iscsi_create_conn(struct iscsi_cls_session *session, uint32_t cid)
 		goto release_parent_ref;
 	}
 	transport_register_device(&conn->dev);
+
+	spin_lock_irqsave(&connlock, flags);
+	list_add(&conn->conn_list, &connlist);
+	conn->active = 1;
+	spin_unlock_irqrestore(&connlock, flags);
 	return conn;
 
 release_parent_ref:
@@ -472,11 +559,17 @@ EXPORT_SYMBOL_GPL(iscsi_create_conn);
  **/
 int iscsi_destroy_conn(struct iscsi_cls_conn *conn)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&connlock, flags);
+	conn->active = 0;
+	list_del(&conn->conn_list);
+	spin_unlock_irqrestore(&connlock, flags);
+
 	transport_unregister_device(&conn->dev);
 	device_unregister(&conn->dev);
 	return 0;
 }
-
 EXPORT_SYMBOL_GPL(iscsi_destroy_conn);
 
 /*
@@ -686,132 +779,74 @@ iscsi_if_get_stats(struct iscsi_transport *transport, struct nlmsghdr *nlh)
 }
 
 /**
- * iscsi_if_destroy_session_done - send session destr. completion event
- * @conn: last connection for session
- *
- * This is called by HW iscsi LLDs to notify userpsace that its HW has
- * removed a session.
+ * iscsi_session_event - send session destr. completion event
+ * @session: iscsi class session
+ * @event: type of event
  **/
-int iscsi_if_destroy_session_done(struct iscsi_cls_conn *conn)
+int iscsi_session_event(struct iscsi_cls_session *session,
+			enum iscsi_uevent_e event)
 {
 	struct iscsi_internal *priv;
-	struct iscsi_cls_session *session;
 	struct Scsi_Host *shost;
 	struct iscsi_uevent *ev;
 	struct sk_buff  *skb;
 	struct nlmsghdr *nlh;
-	unsigned long flags;
 	int rc, len = NLMSG_SPACE(sizeof(*ev));
 
-	priv = iscsi_if_transport_lookup(conn->transport);
+	priv = iscsi_if_transport_lookup(session->transport);
 	if (!priv)
 		return -EINVAL;
-
-	session = iscsi_dev_to_session(conn->dev.parent);
 	shost = iscsi_session_to_shost(session);
 
 	skb = alloc_skb(len, GFP_KERNEL);
 	if (!skb) {
-		dev_printk(KERN_ERR, &conn->dev, "Cannot notify userspace of "
-			  "session creation event\n");
+		dev_printk(KERN_ERR, &session->dev, "Cannot notify userspace "
+			  "of session event %u\n", event);
 		return -ENOMEM;
 	}
 
 	nlh = __nlmsg_put(skb, priv->daemon_pid, 0, 0, (len - sizeof(*nlh)), 0);
 	ev = NLMSG_DATA(nlh);
-	ev->transport_handle = iscsi_handle(conn->transport);
-	ev->type = ISCSI_KEVENT_DESTROY_SESSION;
-	ev->r.d_session.host_no = shost->host_no;
-	ev->r.d_session.sid = session->sid;
-
-	/*
-	 * this will occur if the daemon is not up, so we just warn
-	 * the user and when the daemon is restarted it will handle it
-	 */
-	rc = iscsi_broadcast_skb(skb, GFP_KERNEL);
-	if (rc < 0)
-		dev_printk(KERN_ERR, &conn->dev, "Cannot notify userspace of "
-			  "session destruction event. Check iscsi daemon\n");
-
-	spin_lock_irqsave(&sesslock, flags);
-	list_del(&session->sess_list);
-	spin_unlock_irqrestore(&sesslock, flags);
-
-	spin_lock_irqsave(&connlock, flags);
-	conn->active = 0;
-	list_del(&conn->conn_list);
-	spin_unlock_irqrestore(&connlock, flags);
-
-	return rc;
-}
-EXPORT_SYMBOL_GPL(iscsi_if_destroy_session_done);
-
-/**
- * iscsi_if_create_session_done - send session creation completion event
- * @conn: leading connection for session
- *
- * This is called by HW iscsi LLDs to notify userpsace that its HW has
- * created a session or a existing session is back in the logged in state.
- **/
-int iscsi_if_create_session_done(struct iscsi_cls_conn *conn)
-{
-	struct iscsi_internal *priv;
-	struct iscsi_cls_session *session;
-	struct Scsi_Host *shost;
-	struct iscsi_uevent *ev;
-	struct sk_buff  *skb;
-	struct nlmsghdr *nlh;
-	unsigned long flags;
-	int rc, len = NLMSG_SPACE(sizeof(*ev));
+	ev->transport_handle = iscsi_handle(session->transport);
 
-	priv = iscsi_if_transport_lookup(conn->transport);
-	if (!priv)
+	ev->type = event;
+	switch (event) {
+	case ISCSI_KEVENT_DESTROY_SESSION:
+		ev->r.d_session.host_no = shost->host_no;
+		ev->r.d_session.sid = session->sid;
+		break;
+	case ISCSI_KEVENT_CREATE_SESSION:
+		ev->r.c_session_ret.host_no = shost->host_no;
+		ev->r.c_session_ret.sid = session->sid;
+		break;
+	case ISCSI_KEVENT_UNBIND_SESSION:
+		ev->r.unbind_session.host_no = shost->host_no;
+		ev->r.unbind_session.sid = session->sid;
+		break;
+	default:
+		dev_printk(KERN_ERR, &session->dev, "Invalid event %u.\n",
+			   event);
+		kfree_skb(skb);
 		return -EINVAL;
-
-	session = iscsi_dev_to_session(conn->dev.parent);
-	shost = iscsi_session_to_shost(session);
-
-	skb = alloc_skb(len, GFP_KERNEL);
-	if (!skb) {
-		dev_printk(KERN_ERR, &conn->dev, "Cannot notify userspace of "
-			  "session creation event\n");
-		return -ENOMEM;
 	}
 
-	nlh = __nlmsg_put(skb, priv->daemon_pid, 0, 0, (len - sizeof(*nlh)), 0);
-	ev = NLMSG_DATA(nlh);
-	ev->transport_handle = iscsi_handle(conn->transport);
-	ev->type = ISCSI_UEVENT_CREATE_SESSION;
-	ev->r.c_session_ret.host_no = shost->host_no;
-	ev->r.c_session_ret.sid = session->sid;
-
 	/*
 	 * this will occur if the daemon is not up, so we just warn
 	 * the user and when the daemon is restarted it will handle it
 	 */
 	rc = iscsi_broadcast_skb(skb, GFP_KERNEL);
 	if (rc < 0)
-		dev_printk(KERN_ERR, &conn->dev, "Cannot notify userspace of "
-			  "session creation event. Check iscsi daemon\n");
-
-	spin_lock_irqsave(&sesslock, flags);
-	list_add(&session->sess_list, &sesslist);
-	spin_unlock_irqrestore(&sesslock, flags);
-
-	spin_lock_irqsave(&connlock, flags);
-	list_add(&conn->conn_list, &connlist);
-	conn->active = 1;
-	spin_unlock_irqrestore(&connlock, flags);
+		dev_printk(KERN_ERR, &session->dev, "Cannot notify userspace "
+			  "of session event %u. Check iscsi daemon\n", event);
 	return rc;
 }
-EXPORT_SYMBOL_GPL(iscsi_if_create_session_done);
+EXPORT_SYMBOL_GPL(iscsi_session_event);
 
 static int
 iscsi_if_create_session(struct iscsi_internal *priv, struct iscsi_uevent *ev)
 {
 	struct iscsi_transport *transport = priv->iscsi_transport;
 	struct iscsi_cls_session *session;
-	unsigned long flags;
 	uint32_t hostno;
 
 	session = transport->create_session(transport, &priv->t,
@@ -822,10 +857,6 @@ iscsi_if_create_session(struct iscsi_internal *priv, struct iscsi_uevent *ev)
 	if (!session)
 		return -ENOMEM;
 
-	spin_lock_irqsave(&sesslock, flags);
-	list_add(&session->sess_list, &sesslist);
-	spin_unlock_irqrestore(&sesslock, flags);
-
 	ev->r.c_session_ret.host_no = hostno;
 	ev->r.c_session_ret.sid = session->sid;
 	return 0;
@@ -836,7 +867,6 @@ iscsi_if_create_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
 {
 	struct iscsi_cls_conn *conn;
 	struct iscsi_cls_session *session;
-	unsigned long flags;
 
 	session = iscsi_session_lookup(ev->u.c_conn.sid);
 	if (!session) {
@@ -855,28 +885,17 @@ iscsi_if_create_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
 
 	ev->r.c_conn_ret.sid = session->sid;
 	ev->r.c_conn_ret.cid = conn->cid;
-
-	spin_lock_irqsave(&connlock, flags);
-	list_add(&conn->conn_list, &connlist);
-	conn->active = 1;
-	spin_unlock_irqrestore(&connlock, flags);
-
 	return 0;
 }
 
 static int
 iscsi_if_destroy_conn(struct iscsi_transport *transport, struct iscsi_uevent *ev)
 {
-	unsigned long flags;
 	struct iscsi_cls_conn *conn;
 
 	conn = iscsi_conn_lookup(ev->u.d_conn.sid, ev->u.d_conn.cid);
 	if (!conn)
 		return -EINVAL;
-	spin_lock_irqsave(&connlock, flags);
-	conn->active = 0;
-	list_del(&conn->conn_list);
-	spin_unlock_irqrestore(&connlock, flags);
 
 	if (transport->destroy_conn)
 		transport->destroy_conn(conn);
@@ -1003,7 +1022,6 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	struct iscsi_internal *priv;
 	struct iscsi_cls_session *session;
 	struct iscsi_cls_conn *conn;
-	unsigned long flags;
 
 	priv = iscsi_if_transport_lookup(iscsi_ptr(ev->transport_handle));
 	if (!priv)
@@ -1021,13 +1039,16 @@ iscsi_if_recv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		break;
 	case ISCSI_UEVENT_DESTROY_SESSION:
 		session = iscsi_session_lookup(ev->u.d_session.sid);
-		if (session) {
-			spin_lock_irqsave(&sesslock, flags);
-			list_del(&session->sess_list);
-			spin_unlock_irqrestore(&sesslock, flags);
-
+		if (session)
 			transport->destroy_session(session);
-		} else
+		else
+			err = -EINVAL;
+		break;
+	case ISCSI_UEVENT_UNBIND_SESSION:
+		session = iscsi_session_lookup(ev->u.d_session.sid);
+		if (session)
+			iscsi_unbind_session(session);
+		else
 			err = -EINVAL;
 		break;
 	case ISCSI_UEVENT_CREATE_CONN:
diff --git a/include/scsi/iscsi_if.h b/include/scsi/iscsi_if.h
index bff0b1f..8a4426d 100644
--- a/include/scsi/iscsi_if.h
+++ b/include/scsi/iscsi_if.h
@@ -49,12 +49,15 @@ enum iscsi_uevent_e {
 
 	ISCSI_UEVENT_TGT_DSCVR		= UEVENT_BASE + 15,
 	ISCSI_UEVENT_SET_HOST_PARAM	= UEVENT_BASE + 16,
+	ISCSI_UEVENT_UNBIND_SESSION	= UEVENT_BASE + 17,
 
 	/* up events */
 	ISCSI_KEVENT_RECV_PDU		= KEVENT_BASE + 1,
 	ISCSI_KEVENT_CONN_ERROR		= KEVENT_BASE + 2,
 	ISCSI_KEVENT_IF_ERROR		= KEVENT_BASE + 3,
 	ISCSI_KEVENT_DESTROY_SESSION	= KEVENT_BASE + 4,
+	ISCSI_KEVENT_UNBIND_SESSION	= KEVENT_BASE + 5,
+	ISCSI_KEVENT_CREATE_SESSION	= KEVENT_BASE + 6,
 };
 
 enum iscsi_tgt_dscvr {
@@ -156,6 +159,10 @@ struct iscsi_uevent {
 			uint32_t	sid;
 			uint32_t	cid;
 		} c_conn_ret;
+		struct msg_unbind_session {
+			uint32_t	sid;
+			uint32_t	host_no;
+		} unbind_session;
 		struct msg_recv_req {
 			uint32_t	sid;
 			uint32_t	cid;
diff --git a/include/scsi/iscsi_proto.h b/include/scsi/iscsi_proto.h
index 6947082..318a909 100644
--- a/include/scsi/iscsi_proto.h
+++ b/include/scsi/iscsi_proto.h
@@ -21,6 +21,8 @@
 #ifndef ISCSI_PROTO_H
 #define ISCSI_PROTO_H
 
+#include <linux/types.h>
+
 #define ISCSI_DRAFT20_VERSION	0x00
 
 /* default iSCSI listen port for incoming connections */
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index b8d97bd..093b403 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -186,6 +186,7 @@ struct iscsi_cls_session {
 	/* recovery fields */
 	int recovery_tmo;
 	struct delayed_work recovery_work;
+	struct work_struct unbind_work;
 
 	int target_id;
 
@@ -206,6 +207,8 @@ struct iscsi_cls_session {
 struct iscsi_host {
 	struct list_head sessions;
 	struct mutex mutex;
+	struct workqueue_struct *unbind_workq;
+	char unbind_workq_name[KOBJ_NAME_LEN];
 };
 
 /*
@@ -215,8 +218,8 @@ extern struct iscsi_cls_session *iscsi_alloc_session(struct Scsi_Host *shost,
 					struct iscsi_transport *transport);
 extern int iscsi_add_session(struct iscsi_cls_session *session,
 			     unsigned int target_id);
-extern int iscsi_if_create_session_done(struct iscsi_cls_conn *conn);
-extern int iscsi_if_destroy_session_done(struct iscsi_cls_conn *conn);
+extern int iscsi_session_event(struct iscsi_cls_session *session,
+			       enum iscsi_uevent_e event);
 extern struct iscsi_cls_session *iscsi_create_session(struct Scsi_Host *shost,
 						struct iscsi_transport *t,
 						unsigned int target_id);
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 11/23] libiscsi: fix nop handling
  2007-12-05  0:42                   ` [PATCH 10/23] libiscsi: fix shutdown michaelc
@ 2007-12-05  0:42                     ` michaelc
  2007-12-05  0:42                       ` [PATCH 12/23] iscsi_tcp: update the website URL michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

During root boot and shutdown the target could send us nops.
At this time iscsid cannot be running, so the target will drop
the session and the boot or shutdown will hang.

To handle this and allow us to better control when to check the network
this patch moves the nop handling to the kernel.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |    4 +-
 drivers/scsi/iscsi_tcp.c                 |    4 +-
 drivers/scsi/libiscsi.c                  |  331 ++++++++++++++++++++++++------
 drivers/scsi/scsi_transport_iscsi.c      |    4 +
 include/scsi/iscsi_if.h                  |   11 +
 include/scsi/libiscsi.h                  |    8 +
 6 files changed, 294 insertions(+), 68 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index a2622f4..2656064 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -577,7 +577,9 @@ static struct iscsi_transport iscsi_iser_transport = {
 				  ISCSI_PERSISTENT_ADDRESS |
 				  ISCSI_TARGET_NAME | ISCSI_TPGT |
 				  ISCSI_USERNAME | ISCSI_PASSWORD |
-				  ISCSI_USERNAME_IN | ISCSI_PASSWORD_IN,
+				  ISCSI_USERNAME_IN | ISCSI_PASSWORD_IN |
+				  ISCSI_FAST_ABORT | ISCSI_ABORT_TMO |
+				  ISCSI_PING_TMO | ISCSI_RECV_TMO,
 	.host_param_mask	= ISCSI_HOST_HWADDRESS |
 				  ISCSI_HOST_NETDEV_NAME |
 				  ISCSI_HOST_INITIATOR_NAME,
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 90eae8e..9b41852 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -2246,7 +2246,9 @@ static struct iscsi_transport iscsi_tcp_transport = {
 				  ISCSI_TARGET_NAME | ISCSI_TPGT |
 				  ISCSI_USERNAME | ISCSI_PASSWORD |
 				  ISCSI_USERNAME_IN | ISCSI_PASSWORD_IN |
-				  ISCSI_FAST_ABORT,
+				  ISCSI_FAST_ABORT | ISCSI_ABORT_TMO |
+				  ISCSI_LU_RESET_TMO |
+				  ISCSI_PING_TMO | ISCSI_RECV_TMO,
 	.host_param_mask	= ISCSI_HOST_HWADDRESS | ISCSI_HOST_IPADDRESS |
 				  ISCSI_HOST_INITIATOR_NAME |
 				  ISCSI_HOST_NETDEV_NAME,
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 5205ef2..9688361 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -313,11 +313,70 @@ void iscsi_free_mgmt_task(struct iscsi_conn *conn,
 	list_del_init(&mtask->running);
 	if (conn->login_mtask == mtask)
 		return;
+
+	if (conn->ping_mtask == mtask)
+		conn->ping_mtask = NULL;
 	__kfifo_put(conn->session->mgmtpool.queue,
 		    (void*)&mtask, sizeof(void*));
 }
 EXPORT_SYMBOL_GPL(iscsi_free_mgmt_task);
 
+static struct iscsi_mgmt_task *
+__iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
+		      char *data, uint32_t data_size)
+{
+	struct iscsi_session *session = conn->session;
+	struct iscsi_mgmt_task *mtask;
+
+	if (session->state == ISCSI_STATE_TERMINATE)
+		return NULL;
+
+	if (hdr->opcode == (ISCSI_OP_LOGIN | ISCSI_OP_IMMEDIATE) ||
+	    hdr->opcode == (ISCSI_OP_TEXT | ISCSI_OP_IMMEDIATE))
+		/*
+		 * Login and Text are sent serially, in
+		 * request-followed-by-response sequence.
+		 * Same mtask can be used. Same ITT must be used.
+		 * Note that login_mtask is preallocated at conn_create().
+		 */
+		mtask = conn->login_mtask;
+	else {
+		BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
+		BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);
+
+		if (!__kfifo_get(session->mgmtpool.queue,
+				 (void*)&mtask, sizeof(void*)))
+			return NULL;
+	}
+
+	if (data_size) {
+		memcpy(mtask->data, data, data_size);
+		mtask->data_count = data_size;
+	} else
+		mtask->data_count = 0;
+
+	memcpy(mtask->hdr, hdr, sizeof(struct iscsi_hdr));
+	INIT_LIST_HEAD(&mtask->running);
+	list_add_tail(&mtask->running, &conn->mgmtqueue);
+	return mtask;
+}
+
+int iscsi_conn_send_pdu(struct iscsi_cls_conn *cls_conn, struct iscsi_hdr *hdr,
+			char *data, uint32_t data_size)
+{
+	struct iscsi_conn *conn = cls_conn->dd_data;
+	struct iscsi_session *session = conn->session;
+	int err = 0;
+
+	spin_lock_bh(&session->lock);
+	if (!__iscsi_conn_send_pdu(conn, hdr, data, data_size))
+		err = -EPERM;
+	spin_unlock_bh(&session->lock);
+	scsi_queue_work(session->host, &conn->xmitwork);
+	return err;
+}
+EXPORT_SYMBOL_GPL(iscsi_conn_send_pdu);
+
 /**
  * iscsi_cmd_rsp - SCSI Command Response processing
  * @conn: iscsi connection
@@ -409,6 +468,39 @@ static void iscsi_tmf_rsp(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 	wake_up(&conn->ehwait);
 }
 
+static void iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr)
+{
+        struct iscsi_nopout hdr;
+	struct iscsi_mgmt_task *mtask;
+
+	if (!rhdr && conn->ping_mtask)
+		return;
+
+	memset(&hdr, 0, sizeof(struct iscsi_nopout));
+	hdr.opcode = ISCSI_OP_NOOP_OUT | ISCSI_OP_IMMEDIATE;
+	hdr.flags = ISCSI_FLAG_CMD_FINAL;
+
+	if (rhdr) {
+		memcpy(hdr.lun, rhdr->lun, 8);
+		hdr.ttt = rhdr->ttt;
+		hdr.itt = RESERVED_ITT;
+	} else
+		hdr.ttt = RESERVED_ITT;
+
+	mtask = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)&hdr, NULL, 0);
+	if (!mtask) {
+		printk(KERN_ERR "Could not send nopout\n");
+		return;
+	}
+
+	/* only track our nops */
+	if (!rhdr) {
+		conn->ping_mtask = mtask;
+		conn->last_ping = jiffies;
+	}
+	scsi_queue_work(conn->session->host, &conn->xmitwork);
+}
+
 static int iscsi_handle_reject(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			       char *data, int datalen)
 {
@@ -453,6 +545,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 	struct iscsi_mgmt_task *mtask;
 	uint32_t itt;
 
+	conn->last_recv = jiffies;
 	if (hdr->itt != RESERVED_ITT)
 		itt = get_itt(hdr->itt);
 	else
@@ -520,14 +613,22 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			iscsi_free_mgmt_task(conn, mtask);
 			break;
 		case ISCSI_OP_NOOP_IN:
-			if (hdr->ttt != cpu_to_be32(ISCSI_RESERVED_TAG) || datalen) {
+			if (hdr->ttt != cpu_to_be32(ISCSI_RESERVED_TAG) ||
+			    datalen) {
 				rc = ISCSI_ERR_PROTO;
 				break;
 			}
 			conn->exp_statsn = be32_to_cpu(hdr->statsn) + 1;
 
-			if (iscsi_recv_pdu(conn->cls_conn, hdr, data, datalen))
-				rc = ISCSI_ERR_CONN_FAILED;
+			if (conn->ping_mtask != mtask) {
+				/*
+				 * If this is not in response to one of our
+				 * nops then it must be from userspace.
+				 */
+				if (iscsi_recv_pdu(conn->cls_conn, hdr, data,
+						   datalen))
+					rc = ISCSI_ERR_CONN_FAILED;
+			}
 			iscsi_free_mgmt_task(conn, mtask);
 			break;
 		default:
@@ -547,8 +648,7 @@ int __iscsi_complete_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
 			if (hdr->ttt == cpu_to_be32(ISCSI_RESERVED_TAG))
 				break;
 
-			if (iscsi_recv_pdu(conn->cls_conn, hdr, NULL, 0))
-				rc = ISCSI_ERR_CONN_FAILED;
+			iscsi_send_nopout(conn, (struct iscsi_nopin*)hdr);
 			break;
 		case ISCSI_OP_REJECT:
 			rc = iscsi_handle_reject(conn, hdr, data, datalen);
@@ -1003,62 +1103,6 @@ int iscsi_change_queue_depth(struct scsi_device *sdev, int depth)
 }
 EXPORT_SYMBOL_GPL(iscsi_change_queue_depth);
 
-static struct iscsi_mgmt_task *
-__iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
-		      char *data, uint32_t data_size)
-{
-	struct iscsi_session *session = conn->session;
-	struct iscsi_mgmt_task *mtask;
-
-	if (session->state == ISCSI_STATE_TERMINATE)
-		return NULL;
-
-	if (hdr->opcode == (ISCSI_OP_LOGIN | ISCSI_OP_IMMEDIATE) ||
-	    hdr->opcode == (ISCSI_OP_TEXT | ISCSI_OP_IMMEDIATE))
-		/*
-		 * Login and Text are sent serially, in
-		 * request-followed-by-response sequence.
-		 * Same mtask can be used. Same ITT must be used.
-		 * Note that login_mtask is preallocated at conn_create().
-		 */
-		mtask = conn->login_mtask;
-	else {
-		BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
-		BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);
-
-		if (!__kfifo_get(session->mgmtpool.queue,
-				 (void*)&mtask, sizeof(void*)))
-			return NULL;
-	}
-
-	if (data_size) {
-		memcpy(mtask->data, data, data_size);
-		mtask->data_count = data_size;
-	} else
-		mtask->data_count = 0;
-
-	memcpy(mtask->hdr, hdr, sizeof(struct iscsi_hdr));
-	INIT_LIST_HEAD(&mtask->running);
-	list_add_tail(&mtask->running, &conn->mgmtqueue);
-	return mtask;
-}
-
-int iscsi_conn_send_pdu(struct iscsi_cls_conn *cls_conn, struct iscsi_hdr *hdr,
-			char *data, uint32_t data_size)
-{
-	struct iscsi_conn *conn = cls_conn->dd_data;
-	struct iscsi_session *session = conn->session;
-	int err = 0;
-
-	spin_lock_bh(&session->lock);
-	if (!__iscsi_conn_send_pdu(conn, hdr, data, data_size))
-		err = -EPERM;
-	spin_unlock_bh(&session->lock);
-	scsi_queue_work(session->host, &conn->xmitwork);
-	return err;
-}
-EXPORT_SYMBOL_GPL(iscsi_conn_send_pdu);
-
 void iscsi_session_recovery_timedout(struct iscsi_cls_session *cls_session)
 {
 	struct iscsi_session *session = class_to_transport_session(cls_session);
@@ -1134,7 +1178,8 @@ static void iscsi_tmf_timedout(unsigned long data)
 }
 
 static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
-				   struct iscsi_tm *hdr, int age)
+				   struct iscsi_tm *hdr, int age,
+				   int timeout)
 {
 	struct iscsi_session *session = conn->session;
 	struct iscsi_mgmt_task *mtask;
@@ -1149,7 +1194,7 @@ static int iscsi_exec_task_mgmt_fn(struct iscsi_conn *conn,
 		return -EPERM;
 	}
 	conn->tmfcmd_pdus_cnt++;
-	conn->tmf_timer.expires = 30 * HZ + jiffies;
+	conn->tmf_timer.expires = timeout * HZ + jiffies;
 	conn->tmf_timer.function = iscsi_tmf_timedout;
 	conn->tmf_timer.data = (unsigned long)conn;
 	add_timer(&conn->tmf_timer);
@@ -1233,6 +1278,106 @@ static void iscsi_start_tx(struct iscsi_conn *conn)
 	scsi_queue_work(conn->session->host, &conn->xmitwork);
 }
 
+static enum scsi_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *scmd)
+{
+	struct iscsi_cls_session *cls_session;
+	struct iscsi_session *session;
+	struct iscsi_conn *conn;
+	enum scsi_eh_timer_return rc = EH_NOT_HANDLED;
+
+	cls_session = starget_to_session(scsi_target(scmd->device));
+	session = class_to_transport_session(cls_session);
+
+	debug_scsi("scsi cmd %p timedout\n", scmd);
+
+	spin_lock(&session->lock);
+	if (session->state != ISCSI_STATE_LOGGED_IN) {
+		/*
+		 * We are probably in the middle of iscsi recovery so let
+		 * that complete and handle the error.
+		 */
+		rc = EH_RESET_TIMER;
+		goto done;
+	}
+
+	conn = session->leadconn;
+	if (!conn) {
+		/* In the middle of shuting down */
+		rc = EH_RESET_TIMER;
+		goto done;
+	}
+
+	if (!conn->recv_timeout && !conn->ping_timeout)
+		goto done;
+	/*
+	 * if the ping timedout then we are in the middle of cleaning up
+	 * and can let the iscsi eh handle it
+	 */
+	if (time_before_eq(conn->last_recv + (conn->recv_timeout * HZ) +
+			    (conn->ping_timeout * HZ), jiffies))
+		rc = EH_RESET_TIMER;
+	/*
+	 * if we are about to check the transport then give the command
+	 * more time
+	 */
+	if (time_before_eq(conn->last_recv + (conn->recv_timeout * HZ),
+			   jiffies))
+		rc = EH_RESET_TIMER;
+	/* if in the middle of checking the transport then give us more time */
+	if (conn->ping_mtask)
+		rc = EH_RESET_TIMER;
+done:
+	spin_unlock(&session->lock);
+	debug_scsi("return %s\n", rc == EH_RESET_TIMER ? "timer reset" : "nh");
+	return rc;
+}
+
+static void iscsi_check_transport_timeouts(unsigned long data)
+{
+	struct iscsi_conn *conn = (struct iscsi_conn *)data;
+	struct iscsi_session *session = conn->session;
+	unsigned long timeout, next_timeout = 0, last_recv;
+
+	spin_lock(&session->lock);
+	if (session->state != ISCSI_STATE_LOGGED_IN)
+		goto done;
+
+	timeout = conn->recv_timeout;
+	if (!timeout)
+		goto done;
+
+	timeout *= HZ;
+	last_recv = conn->last_recv;
+	if (time_before_eq(last_recv + timeout + (conn->ping_timeout * HZ),
+			   jiffies)) {
+		printk(KERN_ERR "ping timeout of %d secs expired, "
+		       "last rx %lu, last ping %lu, now %lu\n",
+		       conn->ping_timeout, last_recv,
+		       conn->last_ping, jiffies);
+		spin_unlock(&session->lock);
+		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+		return;
+	}
+
+	if (time_before_eq(last_recv + timeout, jiffies)) {
+		if (time_before_eq(conn->last_ping, last_recv)) {
+			/* send a ping to try to provoke some traffic */
+			debug_scsi("Sending nopout as ping on conn %p\n", conn);
+			iscsi_send_nopout(conn, NULL);
+		}
+		next_timeout = last_recv + timeout + (conn->ping_timeout * HZ);
+	} else {
+		next_timeout = last_recv + timeout;
+	}
+
+	if (next_timeout) {
+		debug_scsi("Setting next tmo %lu\n", next_timeout);
+		mod_timer(&conn->transport_timer, next_timeout);
+	}
+done:
+	spin_unlock(&session->lock);
+}
+
 static void iscsi_prep_abort_task_pdu(struct iscsi_cmd_task *ctask,
 				      struct iscsi_tm *hdr)
 {
@@ -1304,7 +1449,7 @@ int iscsi_eh_abort(struct scsi_cmnd *sc)
 	hdr = &conn->tmhdr;
 	iscsi_prep_abort_task_pdu(ctask, hdr);
 
-	if (iscsi_exec_task_mgmt_fn(conn, hdr, age)) {
+	if (iscsi_exec_task_mgmt_fn(conn, hdr, age, session->abort_timeout)) {
 		rc = FAILED;
 		goto failed;
 	}
@@ -1365,7 +1510,7 @@ static void iscsi_prep_lun_reset_pdu(struct scsi_cmnd *sc, struct iscsi_tm *hdr)
 	hdr->flags = ISCSI_TM_FUNC_LOGICAL_UNIT_RESET & ISCSI_FLAG_TM_FUNC_MASK;
 	hdr->flags |= ISCSI_FLAG_CMD_FINAL;
 	int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
-	hdr->rtt = ISCSI_RESERVED_TAG;
+	hdr->rtt = RESERVED_ITT;
 }
 
 int iscsi_eh_device_reset(struct scsi_cmnd *sc)
@@ -1396,7 +1541,8 @@ int iscsi_eh_device_reset(struct scsi_cmnd *sc)
 	hdr = &conn->tmhdr;
 	iscsi_prep_lun_reset_pdu(sc, hdr);
 
-	if (iscsi_exec_task_mgmt_fn(conn, hdr, session->age)) {
+	if (iscsi_exec_task_mgmt_fn(conn, hdr, session->age,
+				    session->lu_reset_timeout)) {
 		rc = FAILED;
 		goto unlock;
 	}
@@ -1572,12 +1718,14 @@ iscsi_session_setup(struct iscsi_transport *iscsit,
 	shost->max_cmd_len = iscsit->max_cmd_len;
 	shost->transportt = scsit;
 	shost->transportt->create_work_queue = 1;
+	shost->transportt->eh_timed_out = iscsi_eh_cmd_timed_out;
 	*hostno = shost->host_no;
 
 	session = iscsi_hostdata(shost->hostdata);
 	memset(session, 0, sizeof(struct iscsi_session));
 	session->host = shost;
 	session->state = ISCSI_STATE_FREE;
+	session->fast_abort = 1;
 	session->mgmtpool_max = ISCSI_MGMT_CMDS_MAX;
 	session->cmds_max = cmds_max;
 	session->queued_cmdsn = session->cmdsn = initial_cmdsn;
@@ -1708,6 +1856,11 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, uint32_t conn_idx)
 	conn->id = conn_idx;
 	conn->exp_statsn = 0;
 	conn->tmf_state = TMF_INITIAL;
+
+	init_timer(&conn->transport_timer);
+	conn->transport_timer.data = (unsigned long)conn;
+	conn->transport_timer.function = iscsi_check_transport_timeouts;
+
 	INIT_LIST_HEAD(&conn->run_list);
 	INIT_LIST_HEAD(&conn->mgmt_run_list);
 	INIT_LIST_HEAD(&conn->mgmtqueue);
@@ -1757,6 +1910,8 @@ void iscsi_conn_teardown(struct iscsi_cls_conn *cls_conn)
 	struct iscsi_session *session = conn->session;
 	unsigned long flags;
 
+	del_timer_sync(&conn->transport_timer);
+
 	spin_lock_bh(&session->lock);
 	conn->c_stage = ISCSI_CONN_CLEANUP_WAIT;
 	if (session->leadconn == conn) {
@@ -1823,11 +1978,29 @@ int iscsi_conn_start(struct iscsi_cls_conn *cls_conn)
 		return -EINVAL;
 	}
 
+	if (conn->ping_timeout && !conn->recv_timeout) {
+		printk(KERN_ERR "iscsi: invalid recv timeout of zero "
+		      "Using 5 seconds\n.");
+		conn->recv_timeout = 5;
+	}
+
+	if (conn->recv_timeout && !conn->ping_timeout) {
+		printk(KERN_ERR "iscsi: invalid ping timeout of zero "
+		      "Using 5 seconds.\n");
+		conn->ping_timeout = 5;
+	}
+
 	spin_lock_bh(&session->lock);
 	conn->c_stage = ISCSI_CONN_STARTED;
 	session->state = ISCSI_STATE_LOGGED_IN;
 	session->queued_cmdsn = session->cmdsn;
 
+	conn->last_recv = jiffies;
+	conn->last_ping = jiffies;
+	if (conn->recv_timeout && conn->ping_timeout)
+		mod_timer(&conn->transport_timer,
+			  jiffies + (conn->recv_timeout * HZ));
+
 	switch(conn->stop_stage) {
 	case STOP_CONN_RECOVER:
 		/*
@@ -1879,6 +2052,8 @@ static void iscsi_start_session_recovery(struct iscsi_session *session,
 {
 	int old_stop_stage;
 
+	del_timer_sync(&conn->transport_timer);
+
 	mutex_lock(&session->eh_mutex);
 	spin_lock_bh(&session->lock);
 	if (conn->stop_stage == STOP_CONN_TERM) {
@@ -1993,6 +2168,18 @@ int iscsi_set_param(struct iscsi_cls_conn *cls_conn,
 	case ISCSI_PARAM_FAST_ABORT:
 		sscanf(buf, "%d", &session->fast_abort);
 		break;
+	case ISCSI_PARAM_ABORT_TMO:
+		sscanf(buf, "%d", &session->abort_timeout);
+		break;
+	case ISCSI_PARAM_LU_RESET_TMO:
+		sscanf(buf, "%d", &session->lu_reset_timeout);
+		break;
+	case ISCSI_PARAM_PING_TMO:
+		sscanf(buf, "%d", &conn->ping_timeout);
+		break;
+	case ISCSI_PARAM_RECV_TMO:
+		sscanf(buf, "%d", &conn->recv_timeout);
+		break;
 	case ISCSI_PARAM_MAX_RECV_DLENGTH:
 		sscanf(buf, "%d", &conn->max_recv_dlength);
 		break;
@@ -2110,6 +2297,12 @@ int iscsi_session_get_param(struct iscsi_cls_session *cls_session,
 	case ISCSI_PARAM_FAST_ABORT:
 		len = sprintf(buf, "%d\n", session->fast_abort);
 		break;
+	case ISCSI_PARAM_ABORT_TMO:
+		len = sprintf(buf, "%d\n", session->abort_timeout);
+		break;
+	case ISCSI_PARAM_LU_RESET_TMO:
+		len = sprintf(buf, "%d\n", session->lu_reset_timeout);
+		break;
 	case ISCSI_PARAM_INITIAL_R2T_EN:
 		len = sprintf(buf, "%d\n", session->initial_r2t_en);
 		break;
@@ -2167,6 +2360,12 @@ int iscsi_conn_get_param(struct iscsi_cls_conn *cls_conn,
 	int len;
 
 	switch(param) {
+	case ISCSI_PARAM_PING_TMO:
+		len = sprintf(buf, "%u\n", conn->ping_timeout);
+		break;
+	case ISCSI_PARAM_RECV_TMO:
+		len = sprintf(buf, "%u\n", conn->recv_timeout);
+		break;
 	case ISCSI_PARAM_MAX_RECV_DLENGTH:
 		len = sprintf(buf, "%u\n", conn->max_recv_dlength);
 		break;
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 0515b7d..fb8a765 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1201,6 +1201,8 @@ iscsi_conn_attr(port, ISCSI_PARAM_CONN_PORT);
 iscsi_conn_attr(exp_statsn, ISCSI_PARAM_EXP_STATSN);
 iscsi_conn_attr(persistent_address, ISCSI_PARAM_PERSISTENT_ADDRESS);
 iscsi_conn_attr(address, ISCSI_PARAM_CONN_ADDRESS);
+iscsi_conn_attr(ping_tmo, ISCSI_PARAM_PING_TMO);
+iscsi_conn_attr(recv_tmo, ISCSI_PARAM_RECV_TMO);
 
 #define iscsi_cdev_to_session(_cdev) \
 	iscsi_dev_to_session(_cdev->dev)
@@ -1436,6 +1438,8 @@ iscsi_register_transport(struct iscsi_transport *tt)
 	SETUP_CONN_RD_ATTR(exp_statsn, ISCSI_EXP_STATSN);
 	SETUP_CONN_RD_ATTR(persistent_address, ISCSI_PERSISTENT_ADDRESS);
 	SETUP_CONN_RD_ATTR(persistent_port, ISCSI_PERSISTENT_PORT);
+	SETUP_CONN_RD_ATTR(ping_tmo, ISCSI_PING_TMO);
+	SETUP_CONN_RD_ATTR(recv_tmo, ISCSI_RECV_TMO);
 
 	BUG_ON(count > ISCSI_CONN_ATTRS);
 	priv->conn_attrs[count] = NULL;
diff --git a/include/scsi/iscsi_if.h b/include/scsi/iscsi_if.h
index 8a4426d..e19e584 100644
--- a/include/scsi/iscsi_if.h
+++ b/include/scsi/iscsi_if.h
@@ -244,6 +244,12 @@ enum iscsi_param {
 	ISCSI_PARAM_PASSWORD_IN,
 
 	ISCSI_PARAM_FAST_ABORT,
+	ISCSI_PARAM_ABORT_TMO,
+	ISCSI_PARAM_LU_RESET_TMO,
+	ISCSI_PARAM_HOST_RESET_TMO,
+
+	ISCSI_PARAM_PING_TMO,
+	ISCSI_PARAM_RECV_TMO,
 	/* must always be last */
 	ISCSI_PARAM_MAX,
 };
@@ -275,6 +281,11 @@ enum iscsi_param {
 #define ISCSI_PASSWORD			(1 << ISCSI_PARAM_PASSWORD)
 #define ISCSI_PASSWORD_IN		(1 << ISCSI_PARAM_PASSWORD_IN)
 #define ISCSI_FAST_ABORT		(1 << ISCSI_PARAM_FAST_ABORT)
+#define ISCSI_ABORT_TMO			(1 << ISCSI_PARAM_ABORT_TMO)
+#define ISCSI_LU_RESET_TMO		(1 << ISCSI_PARAM_LU_RESET_TMO)
+#define ISCSI_HOST_RESET_TMO		(1 << ISCSI_PARAM_HOST_RESET_TMO)
+#define ISCSI_PING_TMO			(1 << ISCSI_PARAM_PING_TMO)
+#define ISCSI_RECV_TMO			(1 << ISCSI_PARAM_RECV_TMO)
 
 /* iSCSI HBA params */
 enum iscsi_host_param {
diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h
index d68f745..889f51f 100644
--- a/include/scsi/libiscsi.h
+++ b/include/scsi/libiscsi.h
@@ -148,6 +148,12 @@ struct iscsi_conn {
 	 * conn_stop() flag: stop to recover, stop to terminate
 	 */
         int			stop_stage;
+	struct timer_list	transport_timer;
+	unsigned long		last_recv;
+	unsigned long		last_ping;
+	int			ping_timeout;
+	int			recv_timeout;
+	struct iscsi_mgmt_task	*ping_mtask;
 
 	/* iSCSI connection-wide sequencing */
 	uint32_t		exp_statsn;
@@ -238,6 +244,8 @@ struct iscsi_session {
 	uint32_t		queued_cmdsn;
 
 	/* configuration */
+	int			abort_timeout;
+	int			lu_reset_timeout;
 	int			initial_r2t_en;
 	unsigned		max_r2t;
 	int			imm_data_en;
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 12/23] iscsi_tcp: update the website URL
  2007-12-05  0:42                     ` [PATCH 11/23] libiscsi: fix nop handling michaelc
@ 2007-12-05  0:42                       ` michaelc
  2007-12-05  0:42                         ` [PATCH 13/23] Do not fail commands immediately during logout michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: FUJITA Tomonori, FUJITA Tomonori, Mike Christie

From: FUJITA Tomonori <tomof@acm.org>

Use open-iscsi.org instead of linux-iscsi.sf.net, which hasn't been
updated for ages.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
index a6676be..ab965f5 100644
--- a/drivers/scsi/Kconfig
+++ b/drivers/scsi/Kconfig
@@ -341,7 +341,7 @@ config ISCSI_TCP
 	 The userspace component needed to initialize the driver, documentation,
 	 and sample configuration files can be found here:
 
-	 http://linux-iscsi.sf.net
+	 http://open-iscsi.org
 
 config SGIWD93_SCSI
 	tristate "SGI WD93C93 SCSI Driver"
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 13/23] Do not fail commands immediately during logout
  2007-12-05  0:42                       ` [PATCH 12/23] iscsi_tcp: update the website URL michaelc
@ 2007-12-05  0:42                         ` michaelc
  2007-12-05  0:42                           ` [PATCH 14/23] clear conn->ctask when task is completed early michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

If the target requests a logout, then we do not want
to fail commands to scsi-ml right away. This patch just
fails in pending commands for a requeue immediately, and then lets
iscsid handle running commands like normal recovery.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 9688361..b17081b 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -917,7 +917,7 @@ check_mgmt:
 		conn->ctask = list_entry(conn->xmitqueue.next,
 					 struct iscsi_cmd_task, running);
 		if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
-			fail_command(conn, conn->ctask, DID_NO_CONNECT << 16);
+			fail_command(conn, conn->ctask, DID_IMM_RETRY << 16);
 			continue;
 		}
 		if (iscsi_prep_scsi_cmd_pdu(conn->ctask)) {
@@ -1024,21 +1024,19 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 		 * be entering our queuecommand while a block is starting
 		 * up because the block code is not locked)
 		 */
-		if (session->state == ISCSI_STATE_IN_RECOVERY) {
+		switch (session->state) {
+		case ISCSI_STATE_IN_RECOVERY:
 			reason = FAILURE_SESSION_IN_RECOVERY;
 			goto reject;
-		}
-
-		switch (session->state) {
+		case ISCSI_STATE_LOGGING_OUT:
+			reason = FAILURE_SESSION_LOGGING_OUT;
+			goto reject;
 		case ISCSI_STATE_RECOVERY_FAILED:
 			reason = FAILURE_SESSION_RECOVERY_TIMEOUT;
 			break;
 		case ISCSI_STATE_TERMINATE:
 			reason = FAILURE_SESSION_TERMINATE;
 			break;
-		case ISCSI_STATE_LOGGING_OUT:
-			reason = FAILURE_SESSION_LOGGING_OUT;
-			break;
 		default:
 			reason = FAILURE_SESSION_FREED;
 		}
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 14/23] clear conn->ctask when task is completed early
  2007-12-05  0:42                         ` [PATCH 13/23] Do not fail commands immediately during logout michaelc
@ 2007-12-05  0:42                           ` michaelc
  2007-12-05  0:42                             ` [PATCH 15/23] Drop host lock in queuecommand michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

If the current ctask is failed early, we legt the conn->ctask pointer
pointing to a invalid task. When the xmit thread would send data for
it, we would then oops.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index b17081b..4461317 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -248,13 +248,16 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
  */
 static void iscsi_complete_command(struct iscsi_cmd_task *ctask)
 {
-	struct iscsi_session *session = ctask->conn->session;
+	struct iscsi_conn *conn = ctask->conn;
+	struct iscsi_session *session = conn->session;
 	struct scsi_cmnd *sc = ctask->sc;
 
 	ctask->state = ISCSI_TASK_COMPLETED;
 	ctask->sc = NULL;
 	/* SCSI eh reuses commands to verify us */
 	sc->SCp.ptr = NULL;
+	if (conn->ctask == ctask)
+		conn->ctask = NULL;
 	list_del_init(&ctask->running);
 	__kfifo_put(session->cmdpool.queue, (void*)&ctask, sizeof(void*));
 	sc->scsi_done(sc);
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 15/23] Drop host lock in queuecommand
  2007-12-05  0:42                           ` [PATCH 14/23] clear conn->ctask when task is completed early michaelc
@ 2007-12-05  0:42                             ` michaelc
  2007-12-05  0:42                               ` [PATCH 16/23] convert xmit path to iscsi chunks michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

The driver does not need the host lock in queuecommand so drop it.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 4461317..b0bc8c3 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1010,8 +1010,9 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 	sc->SCp.ptr = NULL;
 
 	host = sc->device->host;
-	session = iscsi_hostdata(host->hostdata);
+	spin_unlock(host->host_lock);
 
+	session = iscsi_hostdata(host->hostdata);
 	spin_lock(&session->lock);
 
 	/*
@@ -1077,11 +1078,13 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 	spin_unlock(&session->lock);
 
 	scsi_queue_work(host, &conn->xmitwork);
+	spin_lock(host->host_lock);
 	return 0;
 
 reject:
 	spin_unlock(&session->lock);
 	debug_scsi("cmd 0x%x rejected (%d)\n", sc->cmnd[0], reason);
+	spin_lock(host->host_lock);
 	return SCSI_MLQUEUE_HOST_BUSY;
 
 fault:
@@ -1091,6 +1094,7 @@ fault:
 	sc->result = (DID_NO_CONNECT << 16);
 	scsi_set_resid(sc, scsi_bufflen(sc));
 	sc->scsi_done(sc);
+	spin_lock(host->host_lock);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(iscsi_queuecommand);
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 16/23] convert xmit path to iscsi chunks
  2007-12-05  0:42                             ` [PATCH 15/23] Drop host lock in queuecommand michaelc
@ 2007-12-05  0:42                               ` michaelc
  2007-12-05  0:42                                 ` [PATCH 17/23] iscsi_tcp: stop leaking r2t_info's when the incoming R2T is bad michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, Olaf Kirch

From: Mike Christie <michaelc@cs.wisc.edu>

from olaf.kirch@oracle.com:

Convert xmit to iscsi chunks.

from michaelc@cs.wisc.edu:

Bug fixes, more digest integration, sg chaining conversion and other
sg wrapper changes, coding style sync up, and removal of io fields,
like pdu_sent, that are not needed.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/infiniband/ulp/iser/iscsi_iser.c |    3 +-
 drivers/scsi/iscsi_tcp.c                 | 1333 ++++++++++++------------------
 drivers/scsi/iscsi_tcp.h                 |   75 +--
 drivers/scsi/libiscsi.c                  |   37 +-
 include/scsi/scsi_transport_iscsi.h      |    2 +-
 5 files changed, 563 insertions(+), 887 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 2656064..fd69fb3 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -129,7 +129,7 @@ error:
  * iscsi_iser_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands
  *
  **/
-static void
+static int
 iscsi_iser_cmd_init(struct iscsi_cmd_task *ctask)
 {
 	struct iscsi_iser_conn     *iser_conn  = ctask->conn->dd_data;
@@ -138,6 +138,7 @@ iscsi_iser_cmd_init(struct iscsi_cmd_task *ctask)
 	iser_ctask->command_sent = 0;
 	iser_ctask->iser_conn    = iser_conn;
 	iser_ctask_rdma_init(iser_ctask);
+	return 0;
 }
 
 /**
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 9b41852..7212fe9 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -68,56 +68,10 @@ static unsigned int iscsi_max_lun = 512;
 module_param_named(max_lun, iscsi_max_lun, uint, S_IRUGO);
 
 static int iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
-				   struct iscsi_chunk *chunk);
-
-static inline void
-iscsi_buf_init_iov(struct iscsi_buf *ibuf, char *vbuf, int size)
-{
-	ibuf->sg.page = virt_to_page(vbuf);
-	ibuf->sg.offset = offset_in_page(vbuf);
-	ibuf->sg.length = size;
-	ibuf->sent = 0;
-	ibuf->use_sendmsg = 1;
-}
-
-static inline void
-iscsi_buf_init_sg(struct iscsi_buf *ibuf, struct scatterlist *sg)
-{
-	ibuf->sg.page = sg->page;
-	ibuf->sg.offset = sg->offset;
-	ibuf->sg.length = sg->length;
-	/*
-	 * Fastpath: sg element fits into single page
-	 */
-	if (sg->length + sg->offset <= PAGE_SIZE && !PageSlab(sg->page))
-		ibuf->use_sendmsg = 0;
-	else
-		ibuf->use_sendmsg = 1;
-	ibuf->sent = 0;
-}
-
-static inline int
-iscsi_buf_left(struct iscsi_buf *ibuf)
-{
-	int rc;
-
-	rc = ibuf->sg.length - ibuf->sent;
-	BUG_ON(rc < 0);
-	return rc;
-}
-
-static inline void
-iscsi_hdr_digest(struct iscsi_conn *conn, struct iscsi_buf *buf,
-		 u8* crc)
-{
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-
-	crypto_hash_digest(&tcp_conn->tx_hash, &buf->sg, buf->sg.length, crc);
-	buf->sg.length += ISCSI_DIGEST_SIZE;
-}
+				   struct iscsi_segment *segment);
 
 /*
- * Scatterlist handling: inside the iscsi_chunk, we
+ * Scatterlist handling: inside the iscsi_segment, we
  * remember an index into the scatterlist, and set data/size
  * to the current scatterlist entry. For highmem pages, we
  * kmap as needed.
@@ -130,60 +84,72 @@ iscsi_hdr_digest(struct iscsi_conn *conn, struct iscsi_buf *buf,
  */
 
 /**
- * iscsi_tcp_chunk_init_sg - init indicated scatterlist entry
- * @chunk: the buffer object
- * @idx: index into scatterlist
+ * iscsi_tcp_segment_init_sg - init indicated scatterlist entry
+ * @segment: the buffer object
+ * @sg: scatterlist
  * @offset: byte offset into that sg entry
  *
- * This function sets up the chunk so that subsequent
+ * This function sets up the segment so that subsequent
  * data is copied to the indicated sg entry, at the given
  * offset.
  */
 static inline void
-iscsi_tcp_chunk_init_sg(struct iscsi_chunk *chunk,
-			unsigned int idx, unsigned int offset)
+iscsi_tcp_segment_init_sg(struct iscsi_segment *segment,
+			  struct scatterlist *sg, unsigned int offset)
 {
-	struct scatterlist *sg;
-
-	BUG_ON(chunk->sg == NULL);
-
-	sg = &chunk->sg[idx];
-	chunk->sg_index = idx;
-	chunk->sg_offset = offset;
-	chunk->size = min(sg->length - offset, chunk->total_size);
-	chunk->data = NULL;
+	segment->sg = sg;
+	segment->sg_offset = offset;
+	segment->size = min(sg->length - offset,
+			    segment->total_size - segment->total_copied);
+	segment->data = NULL;
 }
 
 /**
- * iscsi_tcp_chunk_map - map the current S/G page
- * @chunk: iscsi chunk
+ * iscsi_tcp_segment_map - map the current S/G page
+ * @segment: iscsi_segment
+ * @recv: 1 if called from recv path
  *
  * We only need to possibly kmap data if scatter lists are being used,
  * because the iscsi passthrough and internal IO paths will never use high
  * mem pages.
  */
 static inline void
-iscsi_tcp_chunk_map(struct iscsi_chunk *chunk)
+iscsi_tcp_segment_map(struct iscsi_segment *segment, int recv)
 {
 	struct scatterlist *sg;
 
-	if (chunk->data != NULL || !chunk->sg)
+	if (segment->data != NULL || !segment->sg)
 		return;
 
-	sg = &chunk->sg[chunk->sg_index];
-	BUG_ON(chunk->sg_mapped);
+	sg = segment->sg;
+	BUG_ON(segment->sg_mapped);
 	BUG_ON(sg->length == 0);
-	chunk->sg_mapped = kmap_atomic(sg->page, KM_SOFTIRQ0);
-	chunk->data = chunk->sg_mapped + sg->offset + chunk->sg_offset;
+
+	/*
+	 * If the page count is greater than one it is ok to send
+	 * to the network layer's zero copy send path. If not we
+	 * have to go the slow sendmsg path. We always map for the
+	 * recv path.
+	 */
+	if (page_count(sg_page(sg)) >= 1 && !recv)
+		return;
+
+	debug_tcp("iscsi_tcp_segment_map %s %p\n", recv ? "recv" : "xmit",
+		  segment);
+	segment->sg_mapped = kmap_atomic(sg_page(sg), KM_SOFTIRQ0);
+	segment->data = segment->sg_mapped + sg->offset + segment->sg_offset;
 }
 
 static inline void
-iscsi_tcp_chunk_unmap(struct iscsi_chunk *chunk)
+iscsi_tcp_segment_unmap(struct iscsi_segment *segment)
 {
-	if (chunk->sg_mapped) {
-		kunmap_atomic(chunk->sg_mapped, KM_SOFTIRQ0);
-		chunk->sg_mapped = NULL;
-		chunk->data = NULL;
+	debug_tcp("iscsi_tcp_segment_unmap %p\n", segment);
+
+	if (segment->sg_mapped) {
+		debug_tcp("iscsi_tcp_segment_unmap valid\n");
+		kunmap_atomic(segment->sg_mapped, KM_SOFTIRQ0);
+		segment->sg_mapped = NULL;
+		segment->data = NULL;
 	}
 }
 
@@ -191,23 +157,24 @@ iscsi_tcp_chunk_unmap(struct iscsi_chunk *chunk)
  * Splice the digest buffer into the buffer
  */
 static inline void
-iscsi_tcp_chunk_splice_digest(struct iscsi_chunk *chunk, void *digest)
+iscsi_tcp_segment_splice_digest(struct iscsi_segment *segment, void *digest)
 {
-	chunk->data = digest;
-	chunk->digest_len = ISCSI_DIGEST_SIZE;
-	chunk->total_size += ISCSI_DIGEST_SIZE;
-	chunk->size = ISCSI_DIGEST_SIZE;
-	chunk->copied = 0;
-	chunk->sg = NULL;
-	chunk->sg_index = 0;
-	chunk->hash = NULL;
+	segment->data = digest;
+	segment->digest_len = ISCSI_DIGEST_SIZE;
+	segment->total_size += ISCSI_DIGEST_SIZE;
+	segment->size = ISCSI_DIGEST_SIZE;
+	segment->copied = 0;
+	segment->sg = NULL;
+	segment->hash = NULL;
 }
 
 /**
- * iscsi_tcp_chunk_done - check whether the chunk is complete
- * @chunk: iscsi chunk to check
+ * iscsi_tcp_segment_done - check whether the segment is complete
+ * @segment: iscsi segment to check
+ * @recv: set to one of this is called from the recv path
+ * @copied: number of bytes copied
  *
- * Check if we're done receiving this chunk. If the receive
+ * Check if we're done receiving this segment. If the receive
  * buffer is full but we expect more data, move on to the
  * next entry in the scatterlist.
  *
@@ -217,62 +184,145 @@ iscsi_tcp_chunk_splice_digest(struct iscsi_chunk *chunk, void *digest)
  * This function must be re-entrant.
  */
 static inline int
-iscsi_tcp_chunk_done(struct iscsi_chunk *chunk)
+iscsi_tcp_segment_done(struct iscsi_segment *segment, int recv, unsigned copied)
 {
 	static unsigned char padbuf[ISCSI_PAD_LEN];
+	struct scatterlist sg;
 	unsigned int pad;
 
-	if (chunk->copied < chunk->size) {
-		iscsi_tcp_chunk_map(chunk);
+	debug_tcp("copied %u %u size %u %s\n", segment->copied, copied,
+		  segment->size, recv ? "recv" : "xmit");
+	if (segment->hash && copied) {
+		/*
+		 * If a segment is kmapd we must unmap it before sending
+		 * to the crypto layer since that will try to kmap it again.
+		 */
+		iscsi_tcp_segment_unmap(segment);
+
+		if (!segment->data) {
+			sg_init_table(&sg, 1);
+			sg_set_page(&sg, sg_page(segment->sg), copied,
+				    segment->copied + segment->sg_offset +
+							segment->sg->offset);
+		} else
+			sg_init_one(&sg, segment->data + segment->copied,
+				    copied);
+		crypto_hash_update(segment->hash, &sg, copied);
+	}
+
+	segment->copied += copied;
+	if (segment->copied < segment->size) {
+		iscsi_tcp_segment_map(segment, recv);
 		return 0;
 	}
 
-	chunk->total_copied += chunk->copied;
-	chunk->copied = 0;
-	chunk->size = 0;
+	segment->total_copied += segment->copied;
+	segment->copied = 0;
+	segment->size = 0;
 
 	/* Unmap the current scatterlist page, if there is one. */
-	iscsi_tcp_chunk_unmap(chunk);
+	iscsi_tcp_segment_unmap(segment);
 
 	/* Do we have more scatterlist entries? */
-	if (chunk->total_copied < chunk->total_size) {
+	debug_tcp("total copied %u total size %u\n", segment->total_copied,
+		   segment->total_size);
+	if (segment->total_copied < segment->total_size) {
 		/* Proceed to the next entry in the scatterlist. */
-		iscsi_tcp_chunk_init_sg(chunk, chunk->sg_index + 1, 0);
-		iscsi_tcp_chunk_map(chunk);
-		BUG_ON(chunk->size == 0);
+		iscsi_tcp_segment_init_sg(segment, sg_next(segment->sg),
+					  0);
+		iscsi_tcp_segment_map(segment, recv);
+		BUG_ON(segment->size == 0);
 		return 0;
 	}
 
 	/* Do we need to handle padding? */
-	pad = iscsi_padding(chunk->total_copied);
+	pad = iscsi_padding(segment->total_copied);
 	if (pad != 0) {
 		debug_tcp("consume %d pad bytes\n", pad);
-		chunk->total_size += pad;
-		chunk->size = pad;
-		chunk->data = padbuf;
+		segment->total_size += pad;
+		segment->size = pad;
+		segment->data = padbuf;
 		return 0;
 	}
 
 	/*
-	 * Set us up for receiving the data digest. hdr digest
+	 * Set us up for transferring the data digest. hdr digest
 	 * is completely handled in hdr done function.
 	 */
-	if (chunk->hash) {
-		if (chunk->digest_len == 0) {
-			crypto_hash_final(chunk->hash, chunk->digest);
-			iscsi_tcp_chunk_splice_digest(chunk,
-						      chunk->recv_digest);
-			return 0;
-		}
+	if (segment->hash) {
+		crypto_hash_final(segment->hash, segment->digest);
+		iscsi_tcp_segment_splice_digest(segment,
+				 recv ? segment->recv_digest : segment->digest);
+		return 0;
 	}
 
 	return 1;
 }
 
 /**
- * iscsi_tcp_chunk_recv - copy data to chunk
+ * iscsi_tcp_xmit_segment - transmit segment
  * @tcp_conn: the iSCSI TCP connection
- * @chunk: the buffer to copy to
+ * @segment: the buffer to transmnit
+ *
+ * This function transmits as much of the buffer as
+ * the network layer will accept, and returns the number of
+ * bytes transmitted.
+ *
+ * If CRC hashing is enabled, the function will compute the
+ * hash as it goes. When the entire segment has been transmitted,
+ * it will retrieve the hash value and send it as well.
+ */
+static int
+iscsi_tcp_xmit_segment(struct iscsi_tcp_conn *tcp_conn,
+		       struct iscsi_segment *segment)
+{
+	struct socket *sk = tcp_conn->sock;
+	unsigned int copied = 0;
+	int r = 0;
+
+	while (!iscsi_tcp_segment_done(segment, 0, r)) {
+		struct scatterlist *sg;
+		unsigned int offset, copy;
+		int flags = 0;
+
+		r = 0;
+		offset = segment->copied;
+		copy = segment->size - offset;
+
+		if (segment->total_copied + segment->size < segment->total_size)
+			flags |= MSG_MORE;
+
+		/* Use sendpage if we can; else fall back to sendmsg */
+		if (!segment->data) {
+			sg = segment->sg;
+			offset += segment->sg_offset + sg->offset;
+			r = tcp_conn->sendpage(sk, sg_page(sg), offset, copy,
+					       flags);
+		} else {
+			struct msghdr msg = { .msg_flags = flags };
+			struct kvec iov = {
+				.iov_base = segment->data + offset,
+				.iov_len = copy
+			};
+
+			r = kernel_sendmsg(sk, &msg, &iov, 1, copy);
+		}
+
+		if (r < 0) {
+			iscsi_tcp_segment_unmap(segment);
+			if (copied || r == -EAGAIN)
+				break;
+			return r;
+		}
+		copied += r;
+	}
+	return copied;
+}
+
+/**
+ * iscsi_tcp_segment_recv - copy data to segment
+ * @tcp_conn: the iSCSI TCP connection
+ * @segment: the buffer to copy to
  * @ptr: data pointer
  * @len: amount of data available
  *
@@ -287,29 +337,24 @@ iscsi_tcp_chunk_done(struct iscsi_chunk *chunk)
  * just way we do for network layer checksums.
  */
 static int
-iscsi_tcp_chunk_recv(struct iscsi_tcp_conn *tcp_conn,
-		     struct iscsi_chunk *chunk, const void *ptr,
-		     unsigned int len)
+iscsi_tcp_segment_recv(struct iscsi_tcp_conn *tcp_conn,
+		       struct iscsi_segment *segment, const void *ptr,
+		       unsigned int len)
 {
-	struct scatterlist sg;
-	unsigned int copy, copied = 0;
-
-	while (!iscsi_tcp_chunk_done(chunk)) {
-		if (copied == len)
-			goto out;
+	unsigned int copy = 0, copied = 0;
 
-		copy = min(len - copied, chunk->size - chunk->copied);
-		memcpy(chunk->data + chunk->copied, ptr + copied, copy);
-
-		if (chunk->hash) {
-			sg_init_one(&sg, ptr + copied, copy);
-			crypto_hash_update(chunk->hash, &sg, copy);
+	while (!iscsi_tcp_segment_done(segment, 1, copy)) {
+		if (copied == len) {
+			debug_tcp("iscsi_tcp_segment_recv copied %d bytes\n",
+				  len);
+			break;
 		}
-		chunk->copied += copy;
+
+		copy = min(len - copied, segment->size - segment->copied);
+		debug_tcp("iscsi_tcp_segment_recv copying %d\n", copy);
+		memcpy(segment->data + segment->copied, ptr + copied, copy);
 		copied += copy;
 	}
-
-out:
 	return copied;
 }
 
@@ -325,12 +370,13 @@ iscsi_tcp_dgst_header(struct hash_desc *hash, const void *hdr, size_t hdrlen,
 
 static inline int
 iscsi_tcp_dgst_verify(struct iscsi_tcp_conn *tcp_conn,
-		      struct iscsi_chunk *chunk)
+		      struct iscsi_segment *segment)
 {
-	if (!chunk->digest_len)
+	if (!segment->digest_len)
 		return 1;
 
-	if (memcmp(chunk->recv_digest, chunk->digest, chunk->digest_len)) {
+	if (memcmp(segment->recv_digest, segment->digest,
+		   segment->digest_len)) {
 		debug_scsi("digest mismatch\n");
 		return 0;
 	}
@@ -339,55 +385,59 @@ iscsi_tcp_dgst_verify(struct iscsi_tcp_conn *tcp_conn,
 }
 
 /*
- * Helper function to set up chunk buffer
+ * Helper function to set up segment buffer
  */
 static inline void
-__iscsi_chunk_init(struct iscsi_chunk *chunk, size_t size,
-		   iscsi_chunk_done_fn_t *done, struct hash_desc *hash)
+__iscsi_segment_init(struct iscsi_segment *segment, size_t size,
+		     iscsi_segment_done_fn_t *done, struct hash_desc *hash)
 {
-	memset(chunk, 0, sizeof(*chunk));
-	chunk->total_size = size;
-	chunk->done = done;
+	memset(segment, 0, sizeof(*segment));
+	segment->total_size = size;
+	segment->done = done;
 
 	if (hash) {
-		chunk->hash = hash;
+		segment->hash = hash;
 		crypto_hash_init(hash);
 	}
 }
 
 static inline void
-iscsi_chunk_init_linear(struct iscsi_chunk *chunk, void *data, size_t size,
-			iscsi_chunk_done_fn_t *done, struct hash_desc *hash)
+iscsi_segment_init_linear(struct iscsi_segment *segment, void *data,
+			  size_t size, iscsi_segment_done_fn_t *done,
+			  struct hash_desc *hash)
 {
-	__iscsi_chunk_init(chunk, size, done, hash);
-	chunk->data = data;
-	chunk->size = size;
+	__iscsi_segment_init(segment, size, done, hash);
+	segment->data = data;
+	segment->size = size;
 }
 
 static inline int
-iscsi_chunk_seek_sg(struct iscsi_chunk *chunk,
-		    struct scatterlist *sg, unsigned int sg_count,
-		    unsigned int offset, size_t size,
-		    iscsi_chunk_done_fn_t *done, struct hash_desc *hash)
+iscsi_segment_seek_sg(struct iscsi_segment *segment,
+		      struct scatterlist *sg_list, unsigned int sg_count,
+		      unsigned int offset, size_t size,
+		      iscsi_segment_done_fn_t *done, struct hash_desc *hash)
 {
+	struct scatterlist *sg;
 	unsigned int i;
 
-	__iscsi_chunk_init(chunk, size, done, hash);
-	for (i = 0; i < sg_count; ++i) {
-		if (offset < sg[i].length) {
-			chunk->sg = sg;
-			chunk->sg_count = sg_count;
-			iscsi_tcp_chunk_init_sg(chunk, i, offset);
+	debug_scsi("iscsi_segment_seek_sg offset %u size %llu\n",
+		  offset, size);
+	__iscsi_segment_init(segment, size, done, hash);
+	for_each_sg(sg_list, sg, sg_count, i) {
+		debug_scsi("sg %d, len %u offset %u\n", i, sg->length,
+			   sg->offset);
+		if (offset < sg->length) {
+			iscsi_tcp_segment_init_sg(segment, sg, offset);
 			return 0;
 		}
-		offset -= sg[i].length;
+		offset -= sg->length;
 	}
 
 	return ISCSI_ERR_DATA_OFFSET;
 }
 
 /**
- * iscsi_tcp_hdr_recv_prep - prep chunk for hdr reception
+ * iscsi_tcp_hdr_recv_prep - prep segment for hdr reception
  * @tcp_conn: iscsi connection to prep for
  *
  * This function always passes NULL for the hash argument, because when this
@@ -399,7 +449,7 @@ iscsi_tcp_hdr_recv_prep(struct iscsi_tcp_conn *tcp_conn)
 {
 	debug_tcp("iscsi_tcp_hdr_recv_prep(%p%s)\n", tcp_conn,
 		  tcp_conn->iscsi_conn->hdrdgst_en ? ", digest enabled" : "");
-	iscsi_chunk_init_linear(&tcp_conn->in.chunk,
+	iscsi_segment_init_linear(&tcp_conn->in.segment,
 				tcp_conn->in.hdr_buf, sizeof(struct iscsi_hdr),
 				iscsi_tcp_hdr_recv_done, NULL);
 }
@@ -409,12 +459,12 @@ iscsi_tcp_hdr_recv_prep(struct iscsi_tcp_conn *tcp_conn)
  */
 static int
 iscsi_tcp_data_recv_done(struct iscsi_tcp_conn *tcp_conn,
-			 struct iscsi_chunk *chunk)
+			 struct iscsi_segment *segment)
 {
 	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
 	int rc = 0;
 
-	if (!iscsi_tcp_dgst_verify(tcp_conn, chunk))
+	if (!iscsi_tcp_dgst_verify(tcp_conn, segment))
 		return ISCSI_ERR_DATA_DGST;
 
 	rc = iscsi_complete_pdu(conn, tcp_conn->in.hdr,
@@ -435,7 +485,7 @@ iscsi_tcp_data_recv_prep(struct iscsi_tcp_conn *tcp_conn)
 	if (conn->datadgst_en)
 		rx_hash = &tcp_conn->rx_hash;
 
-	iscsi_chunk_init_linear(&tcp_conn->in.chunk,
+	iscsi_segment_init_linear(&tcp_conn->in.segment,
 				conn->data, tcp_conn->in.datalen,
 				iscsi_tcp_data_recv_done, rx_hash);
 }
@@ -448,7 +498,6 @@ iscsi_tcp_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 {
 	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
 	struct iscsi_r2t_info *r2t;
-	struct scsi_cmnd *sc;
 
 	/* flush ctask's r2t queues */
 	while (__kfifo_get(tcp_ctask->r2tqueue, (void*)&r2t, sizeof(void*))) {
@@ -457,12 +506,12 @@ iscsi_tcp_cleanup_ctask(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 		debug_scsi("iscsi_tcp_cleanup_ctask pending r2t dropped\n");
 	}
 
-	sc = ctask->sc;
-	if (unlikely(!sc))
-		return;
-
-	tcp_ctask->xmstate = XMSTATE_IDLE;
-	tcp_ctask->r2t = NULL;
+	r2t = tcp_ctask->r2t;
+	if (r2t != NULL) {
+		__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
+			    sizeof(void*));
+		tcp_ctask->r2t = NULL;
+	}
 }
 
 /**
@@ -481,11 +530,6 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	int datasn = be32_to_cpu(rhdr->datasn);
 
 	iscsi_update_cmdsn(session, (struct iscsi_nopin*)rhdr);
-	/*
-	 * setup Data-In byte counter (gets decremented..)
-	 */
-	ctask->data_count = tcp_conn->in.datalen;
-
 	if (tcp_conn->in.datalen == 0)
 		return 0;
 
@@ -543,9 +587,6 @@ iscsi_solicit_data_init(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 			struct iscsi_r2t_info *r2t)
 {
 	struct iscsi_data *hdr;
-	struct scsi_cmnd *sc = ctask->sc;
-	int i, sg_count = 0;
-	struct scatterlist *sg;
 
 	hdr = &r2t->dtask.hdr;
 	memset(hdr, 0, sizeof(struct iscsi_data));
@@ -569,34 +610,6 @@ iscsi_solicit_data_init(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 	conn->dataout_pdus_cnt++;
 
 	r2t->sent = 0;
-
-	iscsi_buf_init_iov(&r2t->headbuf, (char*)hdr,
-			   sizeof(struct iscsi_hdr));
-
-	sg = scsi_sglist(sc);
-	r2t->sg = NULL;
-	for (i = 0; i < scsi_sg_count(sc); i++, sg += 1) {
-		/* FIXME: prefetch ? */
-		if (sg_count + sg->length > r2t->data_offset) {
-			int page_offset;
-
-			/* sg page found! */
-
-			/* offset within this page */
-			page_offset = r2t->data_offset - sg_count;
-
-			/* fill in this buffer */
-			iscsi_buf_init_sg(&r2t->sendbuf, sg);
-			r2t->sendbuf.sg.offset += page_offset;
-			r2t->sendbuf.sg.length -= page_offset;
-
-			/* xmit logic will continue with next one */
-			r2t->sg = sg + 1;
-			break;
-		}
-		sg_count += sg->length;
-	}
-	BUG_ON(r2t->sg == NULL);
 }
 
 /**
@@ -670,7 +683,6 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 
 	tcp_ctask->exp_datasn = r2tsn + 1;
 	__kfifo_put(tcp_ctask->r2tqueue, (void*)&r2t, sizeof(void*));
-	tcp_ctask->xmstate |= XMSTATE_SOL_HDR_INIT;
 	conn->r2t_pdus_cnt++;
 
 	iscsi_requeue_ctask(ctask);
@@ -684,13 +696,13 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
  */
 static int
 iscsi_tcp_process_data_in(struct iscsi_tcp_conn *tcp_conn,
-			  struct iscsi_chunk *chunk)
+			  struct iscsi_segment *segment)
 {
 	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
 	struct iscsi_hdr *hdr = tcp_conn->in.hdr;
 	int rc;
 
-	if (!iscsi_tcp_dgst_verify(tcp_conn, chunk))
+	if (!iscsi_tcp_dgst_verify(tcp_conn, segment))
 		return ISCSI_ERR_DATA_DGST;
 
 	/* check for non-exceptional status */
@@ -762,7 +774,7 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 			/*
 			 * Setup copy of Data-In into the Scsi_Cmnd
 			 * Scatterlist case:
-			 * We set up the iscsi_chunk to point to the next
+			 * We set up the iscsi_segment to point to the next
 			 * scatterlist entry to copy to. As we go along,
 			 * we move on to the next scatterlist entry and
 			 * update the digest per-entry.
@@ -774,13 +786,13 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 				  "datalen=%d)\n", tcp_conn,
 				  tcp_ctask->data_offset,
 				  tcp_conn->in.datalen);
-			return iscsi_chunk_seek_sg(&tcp_conn->in.chunk,
-						scsi_sglist(ctask->sc),
-						scsi_sg_count(ctask->sc),
-						tcp_ctask->data_offset,
-						tcp_conn->in.datalen,
-						iscsi_tcp_process_data_in,
-						rx_hash);
+			return iscsi_segment_seek_sg(&tcp_conn->in.segment,
+						     scsi_sglist(ctask->sc),
+						     scsi_sg_count(ctask->sc),
+						     tcp_ctask->data_offset,
+						     tcp_conn->in.datalen,
+						     iscsi_tcp_process_data_in,
+						     rx_hash);
 		}
 		/* fall through */
 	case ISCSI_OP_SCSI_CMD_RSP:
@@ -846,17 +858,6 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 	return rc;
 }
 
-static inline void
-partial_sg_digest_update(struct hash_desc *desc, struct scatterlist *sg,
-			 int offset, int length)
-{
-	struct scatterlist temp;
-
-	sg_init_table(&temp, 1);
-	sg_set_page(&temp, sg_page(sg), length, offset);
-	crypto_hash_update(desc, &temp, length);
-}
-
 /**
  * iscsi_tcp_hdr_recv_done - process PDU header
  *
@@ -866,7 +867,7 @@ partial_sg_digest_update(struct hash_desc *desc, struct scatterlist *sg,
  */
 static int
 iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
-			struct iscsi_chunk *chunk)
+			struct iscsi_segment *segment)
 {
 	struct iscsi_conn *conn = tcp_conn->iscsi_conn;
 	struct iscsi_hdr *hdr;
@@ -876,7 +877,7 @@ iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
 	 * may need to go back to the caller for more.
 	 */
 	hdr = (struct iscsi_hdr *) tcp_conn->in.hdr_buf;
-	if (chunk->copied == sizeof(struct iscsi_hdr) && hdr->hlength) {
+	if (segment->copied == sizeof(struct iscsi_hdr) && hdr->hlength) {
 		/* Bump the header length - the caller will
 		 * just loop around and get the AHS for us, and
 		 * call again. */
@@ -886,8 +887,8 @@ iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
 		if (sizeof(*hdr) + ahslen > sizeof(tcp_conn->in.hdr_buf))
 			return ISCSI_ERR_AHSLEN;
 
-		chunk->total_size += ahslen;
-		chunk->size += ahslen;
+		segment->total_size += ahslen;
+		segment->size += ahslen;
 		return 0;
 	}
 
@@ -895,16 +896,16 @@ iscsi_tcp_hdr_recv_done(struct iscsi_tcp_conn *tcp_conn,
 	 * header digests; if so, set up the recv_digest buffer
 	 * and go back for more. */
 	if (conn->hdrdgst_en) {
-		if (chunk->digest_len == 0) {
-			iscsi_tcp_chunk_splice_digest(chunk,
-						      chunk->recv_digest);
+		if (segment->digest_len == 0) {
+			iscsi_tcp_segment_splice_digest(segment,
+							segment->recv_digest);
 			return 0;
 		}
 		iscsi_tcp_dgst_header(&tcp_conn->rx_hash, hdr,
-				      chunk->total_copied - ISCSI_DIGEST_SIZE,
-				      chunk->digest);
+				      segment->total_copied - ISCSI_DIGEST_SIZE,
+				      segment->digest);
 
-		if (!iscsi_tcp_dgst_verify(tcp_conn, chunk))
+		if (!iscsi_tcp_dgst_verify(tcp_conn, segment))
 			return ISCSI_ERR_HDR_DGST;
 	}
 
@@ -925,7 +926,7 @@ iscsi_tcp_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 {
 	struct iscsi_conn *conn = rd_desc->arg.data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	struct iscsi_chunk *chunk = &tcp_conn->in.chunk;
+	struct iscsi_segment *segment = &tcp_conn->in.segment;
 	struct skb_seq_state seq;
 	unsigned int consumed = 0;
 	int rc = 0;
@@ -943,27 +944,31 @@ iscsi_tcp_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
 		const u8 *ptr;
 
 		avail = skb_seq_read(consumed, &ptr, &seq);
-		if (avail == 0)
+		if (avail == 0) {
+			debug_tcp("no more data avail. Consumed %d\n",
+				  consumed);
 			break;
-		BUG_ON(chunk->copied >= chunk->size);
+		}
+		BUG_ON(segment->copied >= segment->size);
 
 		debug_tcp("skb %p ptr=%p avail=%u\n", skb, ptr, avail);
-		rc = iscsi_tcp_chunk_recv(tcp_conn, chunk, ptr, avail);
+		rc = iscsi_tcp_segment_recv(tcp_conn, segment, ptr, avail);
 		BUG_ON(rc == 0);
 		consumed += rc;
 
-		if (chunk->total_copied >= chunk->total_size) {
-			rc = chunk->done(tcp_conn, chunk);
+		if (segment->total_copied >= segment->total_size) {
+			debug_tcp("segment done\n");
+			rc = segment->done(tcp_conn, segment);
 			if (rc != 0) {
 				skb_abort_seq_read(&seq);
 				goto error;
 			}
 
 			/* The done() functions sets up the
-			 * next chunk. */
+			 * next segment. */
 		}
 	}
-
+	skb_abort_seq_read(&seq);
 	conn->rxdata_octets += consumed;
 	return consumed;
 
@@ -996,7 +1001,7 @@ iscsi_tcp_data_ready(struct sock *sk, int flag)
 
 	/* If we had to (atomically) map a highmem page,
 	 * unmap it now. */
-	iscsi_tcp_chunk_unmap(&tcp_conn->in.chunk);
+	iscsi_tcp_segment_unmap(&tcp_conn->in.segment);
 }
 
 static void
@@ -1076,121 +1081,173 @@ iscsi_conn_restore_callbacks(struct iscsi_tcp_conn *tcp_conn)
 }
 
 /**
- * iscsi_send - generic send routine
- * @sk: kernel's socket
- * @buf: buffer to write from
- * @size: actual size to write
- * @flags: socket's flags
- */
-static inline int
-iscsi_send(struct iscsi_conn *conn, struct iscsi_buf *buf, int size, int flags)
+ * iscsi_xmit - TCP transmit
+ **/
+static int
+iscsi_xmit(struct iscsi_conn *conn)
 {
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	struct socket *sk = tcp_conn->sock;
-	int offset = buf->sg.offset + buf->sent, res;
+	struct iscsi_segment *segment = &tcp_conn->out.segment;
+	unsigned int consumed = 0;
+	int rc = 0;
 
-	/*
-	 * if we got use_sg=0 or are sending something we kmallocd
-	 * then we did not have to do kmap (kmap returns page_address)
-	 *
-	 * if we got use_sg > 0, but had to drop down, we do not
-	 * set clustering so this should only happen for that
-	 * slab case.
-	 */
-	if (buf->use_sendmsg)
-		res = sock_no_sendpage(sk, buf->sg.page, offset, size, flags);
-	else
-		res = tcp_conn->sendpage(sk, buf->sg.page, offset, size, flags);
-
-	if (res >= 0) {
-		conn->txdata_octets += res;
-		buf->sent += res;
-		return res;
+	while (1) {
+		rc = iscsi_tcp_xmit_segment(tcp_conn, segment);
+		if (rc < 0)
+			goto error;
+		if (rc == 0)
+			break;
+
+		consumed += rc;
+
+		if (segment->total_copied >= segment->total_size) {
+			if (segment->done != NULL) {
+				rc = segment->done(tcp_conn, segment);
+				if (rc < 0)
+					goto error;
+			}
+		}
 	}
 
-	tcp_conn->sendpage_failures_cnt++;
-	if (res == -EAGAIN)
-		res = -ENOBUFS;
-	else
-		iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
-	return res;
+	debug_tcp("xmit %d bytes\n", consumed);
+
+	conn->txdata_octets += consumed;
+	return consumed;
+
+error:
+	/* Transmit error. We could initiate error recovery
+	 * here. */
+	debug_tcp("Error sending PDU, errno=%d\n", rc);
+	iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
+	return rc;
 }
 
 /**
- * iscsi_sendhdr - send PDU Header via tcp_sendpage()
- * @conn: iscsi connection
- * @buf: buffer to write from
- * @datalen: lenght of data to be sent after the header
- *
- * Notes:
- *	(Tx, Fast Path)
- **/
+ * iscsi_tcp_xmit_qlen - return the number of bytes queued for xmit
+ */
 static inline int
-iscsi_sendhdr(struct iscsi_conn *conn, struct iscsi_buf *buf, int datalen)
+iscsi_tcp_xmit_qlen(struct iscsi_conn *conn)
 {
-	int flags = 0; /* MSG_DONTWAIT; */
-	int res, size;
-
-	size = buf->sg.length - buf->sent;
-	BUG_ON(buf->sent + size > buf->sg.length);
-	if (buf->sent + size != buf->sg.length || datalen)
-		flags |= MSG_MORE;
-
-	res = iscsi_send(conn, buf, size, flags);
-	debug_tcp("sendhdr %d bytes, sent %d res %d\n", size, buf->sent, res);
-	if (res >= 0) {
-		if (size != res)
-			return -EAGAIN;
-		return 0;
-	}
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct iscsi_segment *segment = &tcp_conn->out.segment;
 
-	return res;
+	return segment->total_copied - segment->total_size;
 }
 
-/**
- * iscsi_sendpage - send one page of iSCSI Data-Out.
- * @conn: iscsi connection
- * @buf: buffer to write from
- * @count: remaining data
- * @sent: number of bytes sent
- *
- * Notes:
- *	(Tx, Fast Path)
- **/
 static inline int
-iscsi_sendpage(struct iscsi_conn *conn, struct iscsi_buf *buf,
-	       int *count, int *sent)
+iscsi_tcp_flush(struct iscsi_conn *conn)
 {
-	int flags = 0; /* MSG_DONTWAIT; */
-	int res, size;
-
-	size = buf->sg.length - buf->sent;
-	BUG_ON(buf->sent + size > buf->sg.length);
-	if (size > *count)
-		size = *count;
-	if (buf->sent + size != buf->sg.length || *count != size)
-		flags |= MSG_MORE;
-
-	res = iscsi_send(conn, buf, size, flags);
-	debug_tcp("sendpage: %d bytes, sent %d left %d sent %d res %d\n",
-		  size, buf->sent, *count, *sent, res);
-	if (res >= 0) {
-		*count -= res;
-		*sent += res;
-		if (size != res)
+	int rc;
+
+	while (iscsi_tcp_xmit_qlen(conn)) {
+		rc = iscsi_xmit(conn);
+		if (rc == 0)
 			return -EAGAIN;
-		return 0;
+		if (rc < 0)
+			return rc;
 	}
 
-	return res;
+	return 0;
 }
 
-static inline void
-iscsi_data_digest_init(struct iscsi_tcp_conn *tcp_conn,
-		      struct iscsi_tcp_cmd_task *tcp_ctask)
+/*
+ * This is called when we're done sending the header.
+ * Simply copy the data_segment to the send segment, and return.
+ */
+static int
+iscsi_tcp_send_hdr_done(struct iscsi_tcp_conn *tcp_conn,
+			struct iscsi_segment *segment)
 {
-	crypto_hash_init(&tcp_conn->tx_hash);
-	tcp_ctask->digest_count = 4;
+	tcp_conn->out.segment = tcp_conn->out.data_segment;
+	debug_tcp("Header done. Next segment size %u total_size %u\n",
+		  tcp_conn->out.segment.size, tcp_conn->out.segment.total_size);
+	return 0;
+}
+
+static void
+iscsi_tcp_send_hdr_prep(struct iscsi_conn *conn, void *hdr, size_t hdrlen)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+
+	debug_tcp("%s(%p%s)\n", __FUNCTION__, tcp_conn,
+			conn->hdrdgst_en? ", digest enabled" : "");
+
+	/* Clear the data segment - needs to be filled in by the
+	 * caller using iscsi_tcp_send_data_prep() */
+	memset(&tcp_conn->out.data_segment, 0, sizeof(struct iscsi_segment));
+
+	/* If header digest is enabled, compute the CRC and
+	 * place the digest into the same buffer. We make
+	 * sure that both iscsi_tcp_ctask and mtask have
+	 * sufficient room.
+	 */
+	if (conn->hdrdgst_en) {
+		iscsi_tcp_dgst_header(&tcp_conn->tx_hash, hdr, hdrlen,
+				      hdr + hdrlen);
+		hdrlen += ISCSI_DIGEST_SIZE;
+	}
+
+	/* Remember header pointer for later, when we need
+	 * to decide whether there's a payload to go along
+	 * with the header. */
+	tcp_conn->out.hdr = hdr;
+
+	iscsi_segment_init_linear(&tcp_conn->out.segment, hdr, hdrlen,
+				iscsi_tcp_send_hdr_done, NULL);
+}
+
+/*
+ * Prepare the send buffer for the payload data.
+ * Padding and checksumming will all be taken care
+ * of by the iscsi_segment routines.
+ */
+static int
+iscsi_tcp_send_data_prep(struct iscsi_conn *conn, struct scatterlist *sg,
+			 unsigned int count, unsigned int offset,
+			 unsigned int len)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct hash_desc *tx_hash = NULL;
+	unsigned int hdr_spec_len;
+
+	debug_tcp("%s(%p, offset=%d, datalen=%d%s)\n", __FUNCTION__,
+			tcp_conn, offset, len,
+			conn->datadgst_en? ", digest enabled" : "");
+
+	/* Make sure the datalen matches what the caller
+	   said he would send. */
+	hdr_spec_len = ntoh24(tcp_conn->out.hdr->dlength);
+	WARN_ON(iscsi_padded(len) != iscsi_padded(hdr_spec_len));
+
+	if (conn->datadgst_en)
+		tx_hash = &tcp_conn->tx_hash;
+
+	return iscsi_segment_seek_sg(&tcp_conn->out.data_segment,
+				   sg, count, offset, len,
+				   NULL, tx_hash);
+}
+
+static void
+iscsi_tcp_send_linear_data_prepare(struct iscsi_conn *conn, void *data,
+				   size_t len)
+{
+	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
+	struct hash_desc *tx_hash = NULL;
+	unsigned int hdr_spec_len;
+
+	debug_tcp("%s(%p, datalen=%d%s)\n", __FUNCTION__, tcp_conn, len,
+		  conn->datadgst_en? ", digest enabled" : "");
+
+	/* Make sure the datalen matches what the caller
+	   said he would send. */
+	hdr_spec_len = ntoh24(tcp_conn->out.hdr->dlength);
+	WARN_ON(iscsi_padded(len) != iscsi_padded(hdr_spec_len));
+
+	if (conn->datadgst_en)
+		tx_hash = &tcp_conn->tx_hash;
+
+	iscsi_segment_init_linear(&tcp_conn->out.data_segment,
+				data, len, NULL, tx_hash);
 }
 
 /**
@@ -1206,12 +1263,17 @@ iscsi_data_digest_init(struct iscsi_tcp_conn *tcp_conn,
  *
  *	Called under connection lock.
  **/
-static void
+static int
 iscsi_solicit_data_cont(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
-			struct iscsi_r2t_info *r2t, int left)
+			struct iscsi_r2t_info *r2t)
 {
 	struct iscsi_data *hdr;
-	int new_offset;
+	int new_offset, left;
+
+	BUG_ON(r2t->data_length - r2t->sent < 0);
+	left = r2t->data_length - r2t->sent;
+	if (left == 0)
+		return 0;
 
 	hdr = &r2t->dtask.hdr;
 	memset(hdr, 0, sizeof(struct iscsi_data));
@@ -1232,43 +1294,46 @@ iscsi_solicit_data_cont(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
 		r2t->data_count = left;
 		hdr->flags = ISCSI_FLAG_CMD_FINAL;
 	}
-	conn->dataout_pdus_cnt++;
-
-	iscsi_buf_init_iov(&r2t->headbuf, (char*)hdr,
-			   sizeof(struct iscsi_hdr));
-
-	if (iscsi_buf_left(&r2t->sendbuf))
-		return;
-
-	iscsi_buf_init_sg(&r2t->sendbuf, r2t->sg);
-	r2t->sg += 1;
-}
-
-static void iscsi_set_padding(struct iscsi_tcp_cmd_task *tcp_ctask,
-			      unsigned long len)
-{
-	tcp_ctask->pad_count = len & (ISCSI_PAD_LEN - 1);
-	if (!tcp_ctask->pad_count)
-		return;
 
-	tcp_ctask->pad_count = ISCSI_PAD_LEN - tcp_ctask->pad_count;
-	debug_scsi("write padding %d bytes\n", tcp_ctask->pad_count);
-	tcp_ctask->xmstate |= XMSTATE_W_PAD;
+	conn->dataout_pdus_cnt++;
+	return 1;
 }
 
 /**
- * iscsi_tcp_cmd_init - Initialize iSCSI SCSI_READ or SCSI_WRITE commands
+ * iscsi_tcp_ctask - Initialize iSCSI SCSI_READ or SCSI_WRITE commands
  * @conn: iscsi connection
  * @ctask: scsi command task
  * @sc: scsi command
  **/
-static void
-iscsi_tcp_cmd_init(struct iscsi_cmd_task *ctask)
+static int
+iscsi_tcp_ctask_init(struct iscsi_cmd_task *ctask)
 {
 	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
+	struct iscsi_conn *conn = ctask->conn;
+	struct scsi_cmnd *sc = ctask->sc;
+	int err;
 
 	BUG_ON(__kfifo_len(tcp_ctask->r2tqueue));
-	tcp_ctask->xmstate = XMSTATE_CMD_HDR_INIT;
+	tcp_ctask->sent = 0;
+	tcp_ctask->exp_datasn = 0;
+
+	/* Prepare PDU, optionally w/ immediate data */
+	debug_scsi("ctask deq [cid %d itt 0x%x imm %d unsol %d]\n",
+		    conn->id, ctask->itt, ctask->imm_count,
+		    ctask->unsol_count);
+	iscsi_tcp_send_hdr_prep(conn, ctask->hdr, ctask->hdr_len);
+
+	if (!ctask->imm_count)
+		return 0;
+
+	/* If we have immediate data, attach a payload */
+	err = iscsi_tcp_send_data_prep(conn, scsi_sglist(sc), scsi_sg_count(sc),
+				       0, ctask->imm_count);
+	if (err)
+		return err;
+	tcp_ctask->sent += ctask->imm_count;
+	ctask->imm_count = 0;
+	return 0;
 }
 
 /**
@@ -1280,71 +1345,17 @@ iscsi_tcp_cmd_init(struct iscsi_cmd_task *ctask)
  *	The function can return -EAGAIN in which case caller must
  *	call it again later, or recover. '0' return code means successful
  *	xmit.
- *
- *	Management xmit state machine consists of these states:
- *		XMSTATE_IMM_HDR_INIT	- calculate digest of PDU Header
- *		XMSTATE_IMM_HDR 	- PDU Header xmit in progress
- *		XMSTATE_IMM_DATA 	- PDU Data xmit in progress
- *		XMSTATE_IDLE		- management PDU is done
  **/
 static int
 iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 {
-	struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data;
 	int rc;
 
-	debug_scsi("mtask deq [cid %d state %x itt 0x%x]\n",
-		conn->id, tcp_mtask->xmstate, mtask->itt);
-
-	if (tcp_mtask->xmstate & XMSTATE_IMM_HDR_INIT) {
-		iscsi_buf_init_iov(&tcp_mtask->headbuf, (char*)mtask->hdr,
-				   sizeof(struct iscsi_hdr));
-
-		if (mtask->data_count) {
-			tcp_mtask->xmstate |= XMSTATE_IMM_DATA;
-			iscsi_buf_init_iov(&tcp_mtask->sendbuf,
-					   (char*)mtask->data,
-					   mtask->data_count);
-		}
-
-		if (conn->c_stage != ISCSI_CONN_INITIAL_STAGE &&
-		    conn->stop_stage != STOP_CONN_RECOVER &&
-		    conn->hdrdgst_en)
-			iscsi_hdr_digest(conn, &tcp_mtask->headbuf,
-					(u8*)tcp_mtask->hdrext);
-
-		tcp_mtask->sent = 0;
-		tcp_mtask->xmstate &= ~XMSTATE_IMM_HDR_INIT;
-		tcp_mtask->xmstate |= XMSTATE_IMM_HDR;
-	}
-
-	if (tcp_mtask->xmstate & XMSTATE_IMM_HDR) {
-		rc = iscsi_sendhdr(conn, &tcp_mtask->headbuf,
-				   mtask->data_count);
-		if (rc)
-			return rc;
-		tcp_mtask->xmstate &= ~XMSTATE_IMM_HDR;
-	}
-
-	if (tcp_mtask->xmstate & XMSTATE_IMM_DATA) {
-		BUG_ON(!mtask->data_count);
-		tcp_mtask->xmstate &= ~XMSTATE_IMM_DATA;
-		/* FIXME: implement.
-		 * Virtual buffer could be spreaded across multiple pages...
-		 */
-		do {
-			int rc;
-
-			rc = iscsi_sendpage(conn, &tcp_mtask->sendbuf,
-					&mtask->data_count, &tcp_mtask->sent);
-			if (rc) {
-				tcp_mtask->xmstate |= XMSTATE_IMM_DATA;
-				return rc;
-			}
-		} while (mtask->data_count);
-	}
+	/* Flush any pending data first. */
+	rc = iscsi_tcp_flush(conn);
+	if (rc < 0)
+		return rc;
 
-	BUG_ON(tcp_mtask->xmstate != XMSTATE_IDLE);
 	if (mtask->hdr->itt == RESERVED_ITT) {
 		struct iscsi_session *session = conn->session;
 
@@ -1352,411 +1363,112 @@ iscsi_tcp_mtask_xmit(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 		iscsi_free_mgmt_task(conn, mtask);
 		spin_unlock_bh(&session->lock);
 	}
+
 	return 0;
 }
 
+/*
+ * iscsi_tcp_ctask_xmit - xmit normal PDU task
+ * @conn: iscsi connection
+ * @ctask: iscsi command task
+ *
+ * We're expected to return 0 when everything was transmitted succesfully,
+ * -EAGAIN if there's still data in the queue, or != 0 for any other kind
+ * of error.
+ */
 static int
-iscsi_send_cmd_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
+iscsi_tcp_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 {
-	struct scsi_cmnd *sc = ctask->sc;
 	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
+	struct scsi_cmnd *sc = ctask->sc;
 	int rc = 0;
 
-	if (tcp_ctask->xmstate & XMSTATE_CMD_HDR_INIT) {
-		tcp_ctask->sent = 0;
-		tcp_ctask->sg_count = 0;
-		tcp_ctask->exp_datasn = 0;
-
-		if (sc->sc_data_direction == DMA_TO_DEVICE) {
-			struct scatterlist *sg = scsi_sglist(sc);
-
-			iscsi_buf_init_sg(&tcp_ctask->sendbuf, sg);
-			tcp_ctask->sg = sg + 1;
-			tcp_ctask->bad_sg = sg + scsi_sg_count(sc);
-
-			debug_scsi("cmd [itt 0x%x total %d imm_data %d "
-				   "unsol count %d, unsol offset %d]\n",
-				   ctask->itt, scsi_bufflen(sc),
-				   ctask->imm_count, ctask->unsol_count,
-				   ctask->unsol_offset);
-		}
-
-		iscsi_buf_init_iov(&tcp_ctask->headbuf, (char*)ctask->hdr,
-				  ctask->hdr_len);
-
-		if (conn->hdrdgst_en)
-			iscsi_hdr_digest(conn, &tcp_ctask->headbuf,
-					 iscsi_next_hdr(ctask));
-		tcp_ctask->xmstate &= ~XMSTATE_CMD_HDR_INIT;
-		tcp_ctask->xmstate |= XMSTATE_CMD_HDR_XMIT;
-	}
-
-	if (tcp_ctask->xmstate & XMSTATE_CMD_HDR_XMIT) {
-		rc = iscsi_sendhdr(conn, &tcp_ctask->headbuf, ctask->imm_count);
-		if (rc)
-			return rc;
-		tcp_ctask->xmstate &= ~XMSTATE_CMD_HDR_XMIT;
-
-		if (sc->sc_data_direction != DMA_TO_DEVICE)
-			return 0;
-
-		if (ctask->imm_count) {
-			tcp_ctask->xmstate |= XMSTATE_IMM_DATA;
-			iscsi_set_padding(tcp_ctask, ctask->imm_count);
-
-			if (ctask->conn->datadgst_en) {
-				iscsi_data_digest_init(ctask->conn->dd_data,
-						       tcp_ctask);
-				tcp_ctask->immdigest = 0;
-			}
-		}
-
-		if (ctask->unsol_count)
-			tcp_ctask->xmstate |=
-					XMSTATE_UNS_HDR | XMSTATE_UNS_INIT;
-	}
-	return rc;
-}
-
-static int
-iscsi_send_padding(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	int sent = 0, rc;
-
-	if (tcp_ctask->xmstate & XMSTATE_W_PAD) {
-		iscsi_buf_init_iov(&tcp_ctask->sendbuf, (char*)&tcp_ctask->pad,
-				   tcp_ctask->pad_count);
-		if (conn->datadgst_en)
-			crypto_hash_update(&tcp_conn->tx_hash,
-					   &tcp_ctask->sendbuf.sg,
-					   tcp_ctask->sendbuf.sg.length);
-	} else if (!(tcp_ctask->xmstate & XMSTATE_W_RESEND_PAD))
-		return 0;
-
-	tcp_ctask->xmstate &= ~XMSTATE_W_PAD;
-	tcp_ctask->xmstate &= ~XMSTATE_W_RESEND_PAD;
-	debug_scsi("sending %d pad bytes for itt 0x%x\n",
-		   tcp_ctask->pad_count, ctask->itt);
-	rc = iscsi_sendpage(conn, &tcp_ctask->sendbuf, &tcp_ctask->pad_count,
-			   &sent);
-	if (rc) {
-		debug_scsi("padding send failed %d\n", rc);
-		tcp_ctask->xmstate |= XMSTATE_W_RESEND_PAD;
-	}
-	return rc;
-}
-
-static int
-iscsi_send_digest(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask,
-			struct iscsi_buf *buf, uint32_t *digest)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask;
-	struct iscsi_tcp_conn *tcp_conn;
-	int rc, sent = 0;
-
-	if (!conn->datadgst_en)
-		return 0;
-
-	tcp_ctask = ctask->dd_data;
-	tcp_conn = conn->dd_data;
-
-	if (!(tcp_ctask->xmstate & XMSTATE_W_RESEND_DATA_DIGEST)) {
-		crypto_hash_final(&tcp_conn->tx_hash, (u8*)digest);
-		iscsi_buf_init_iov(buf, (char*)digest, 4);
-	}
-	tcp_ctask->xmstate &= ~XMSTATE_W_RESEND_DATA_DIGEST;
-
-	rc = iscsi_sendpage(conn, buf, &tcp_ctask->digest_count, &sent);
-	if (!rc)
-		debug_scsi("sent digest 0x%x for itt 0x%x\n", *digest,
-			  ctask->itt);
-	else {
-		debug_scsi("sending digest 0x%x failed for itt 0x%x!\n",
-			  *digest, ctask->itt);
-		tcp_ctask->xmstate |= XMSTATE_W_RESEND_DATA_DIGEST;
-	}
-	return rc;
-}
-
-static int
-iscsi_send_data(struct iscsi_cmd_task *ctask, struct iscsi_buf *sendbuf,
-		struct scatterlist **sg, int *sent, int *count,
-		struct iscsi_buf *digestbuf, uint32_t *digest)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	struct iscsi_conn *conn = ctask->conn;
-	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
-	int rc, buf_sent, offset;
-
-	while (*count) {
-		buf_sent = 0;
-		offset = sendbuf->sent;
-
-		rc = iscsi_sendpage(conn, sendbuf, count, &buf_sent);
-		*sent = *sent + buf_sent;
-		if (buf_sent && conn->datadgst_en)
-			partial_sg_digest_update(&tcp_conn->tx_hash,
-				&sendbuf->sg, sendbuf->sg.offset + offset,
-				buf_sent);
-		if (!iscsi_buf_left(sendbuf) && *sg != tcp_ctask->bad_sg) {
-			iscsi_buf_init_sg(sendbuf, *sg);
-			*sg = *sg + 1;
-		}
-
-		if (rc)
-			return rc;
-	}
-
-	rc = iscsi_send_padding(conn, ctask);
-	if (rc)
+flush:
+	/* Flush any pending data first. */
+	rc = iscsi_tcp_flush(conn);
+	if (rc < 0)
 		return rc;
 
-	return iscsi_send_digest(conn, ctask, digestbuf, digest);
-}
-
-static int
-iscsi_send_unsol_hdr(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	struct iscsi_data_task *dtask;
-	int rc;
-
-	tcp_ctask->xmstate |= XMSTATE_UNS_DATA;
-	if (tcp_ctask->xmstate & XMSTATE_UNS_INIT) {
-		dtask = &tcp_ctask->unsol_dtask;
-
-		iscsi_prep_unsolicit_data_pdu(ctask, &dtask->hdr);
-		iscsi_buf_init_iov(&tcp_ctask->headbuf, (char*)&dtask->hdr,
-				   sizeof(struct iscsi_hdr));
-		if (conn->hdrdgst_en)
-			iscsi_hdr_digest(conn, &tcp_ctask->headbuf,
-					(u8*)dtask->hdrext);
-
-		tcp_ctask->xmstate &= ~XMSTATE_UNS_INIT;
-		iscsi_set_padding(tcp_ctask, ctask->data_count);
-	}
-
-	rc = iscsi_sendhdr(conn, &tcp_ctask->headbuf, ctask->data_count);
-	if (rc) {
-		tcp_ctask->xmstate &= ~XMSTATE_UNS_DATA;
-		tcp_ctask->xmstate |= XMSTATE_UNS_HDR;
-		return rc;
-	}
+	/* Are we done already? */
+	if (sc->sc_data_direction != DMA_TO_DEVICE)
+		return 0;
 
-	if (conn->datadgst_en) {
-		dtask = &tcp_ctask->unsol_dtask;
-		iscsi_data_digest_init(ctask->conn->dd_data, tcp_ctask);
-		dtask->digest = 0;
-	}
+	if (ctask->unsol_count != 0) {
+		struct iscsi_data *hdr = &tcp_ctask->unsol_dtask.hdr;
 
-	debug_scsi("uns dout [itt 0x%x dlen %d sent %d]\n",
-		   ctask->itt, ctask->unsol_count, tcp_ctask->sent);
-	return 0;
-}
+		/* Prepare a header for the unsolicited PDU.
+		 * The amount of data we want to send will be
+		 * in ctask->data_count.
+		 * FIXME: return the data count instead.
+		 */
+		iscsi_prep_unsolicit_data_pdu(ctask, hdr);
 
-static int
-iscsi_send_unsol_pdu(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	int rc;
+		debug_tcp("unsol dout [itt 0x%x doff %d dlen %d]\n",
+				ctask->itt, tcp_ctask->sent, ctask->data_count);
 
-	if (tcp_ctask->xmstate & XMSTATE_UNS_HDR) {
-		BUG_ON(!ctask->unsol_count);
-		tcp_ctask->xmstate &= ~XMSTATE_UNS_HDR;
-send_hdr:
-		rc = iscsi_send_unsol_hdr(conn, ctask);
+		iscsi_tcp_send_hdr_prep(conn, hdr, sizeof(*hdr));
+		rc = iscsi_tcp_send_data_prep(conn, scsi_sglist(sc),
+					      scsi_sg_count(sc),
+					      tcp_ctask->sent,
+					      ctask->data_count);
 		if (rc)
-			return rc;
-	}
-
-	if (tcp_ctask->xmstate & XMSTATE_UNS_DATA) {
-		struct iscsi_data_task *dtask = &tcp_ctask->unsol_dtask;
-		int start = tcp_ctask->sent;
+			goto fail;
+		tcp_ctask->sent += ctask->data_count;
+		ctask->unsol_count -= ctask->data_count;
+		goto flush;
+	} else {
+		struct iscsi_session *session = conn->session;
+		struct iscsi_r2t_info *r2t;
 
-		rc = iscsi_send_data(ctask, &tcp_ctask->sendbuf, &tcp_ctask->sg,
-				     &tcp_ctask->sent, &ctask->data_count,
-				     &dtask->digestbuf, &dtask->digest);
-		ctask->unsol_count -= tcp_ctask->sent - start;
-		if (rc)
-			return rc;
-		tcp_ctask->xmstate &= ~XMSTATE_UNS_DATA;
-		/*
-		 * Done with the Data-Out. Next, check if we need
-		 * to send another unsolicited Data-Out.
+		/* All unsolicited PDUs sent. Check for solicited PDUs.
 		 */
-		if (ctask->unsol_count) {
-			debug_scsi("sending more uns\n");
-			tcp_ctask->xmstate |= XMSTATE_UNS_INIT;
-			goto send_hdr;
+		spin_lock_bh(&session->lock);
+		r2t = tcp_ctask->r2t;
+		if (r2t != NULL) {
+			/* Continue with this R2T? */
+			if (!iscsi_solicit_data_cont(conn, ctask, r2t)) {
+				debug_scsi("  done with r2t %p\n", r2t);
+
+				__kfifo_put(tcp_ctask->r2tpool.queue,
+					    (void*)&r2t, sizeof(void*));
+				tcp_ctask->r2t = r2t = NULL;
+			}
 		}
-	}
-	return 0;
-}
 
-static int iscsi_send_sol_pdu(struct iscsi_conn *conn,
-			      struct iscsi_cmd_task *ctask)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	struct iscsi_session *session = conn->session;
-	struct iscsi_r2t_info *r2t;
-	struct iscsi_data_task *dtask;
-	int left, rc;
-
-	if (tcp_ctask->xmstate & XMSTATE_SOL_HDR_INIT) {
-		if (!tcp_ctask->r2t) {
-			spin_lock_bh(&session->lock);
+		if (r2t == NULL) {
 			__kfifo_get(tcp_ctask->r2tqueue, (void*)&tcp_ctask->r2t,
 				    sizeof(void*));
-			spin_unlock_bh(&session->lock);
+			r2t = tcp_ctask->r2t;
 		}
-send_hdr:
-		r2t = tcp_ctask->r2t;
-		dtask = &r2t->dtask;
-
-		if (conn->hdrdgst_en)
-			iscsi_hdr_digest(conn, &r2t->headbuf,
-					(u8*)dtask->hdrext);
-		tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR_INIT;
-		tcp_ctask->xmstate |= XMSTATE_SOL_HDR;
-	}
-
-	if (tcp_ctask->xmstate & XMSTATE_SOL_HDR) {
-		r2t = tcp_ctask->r2t;
-		dtask = &r2t->dtask;
-
-		rc = iscsi_sendhdr(conn, &r2t->headbuf, r2t->data_count);
-		if (rc)
-			return rc;
-		tcp_ctask->xmstate &= ~XMSTATE_SOL_HDR;
-		tcp_ctask->xmstate |= XMSTATE_SOL_DATA;
+		spin_unlock_bh(&session->lock);
 
-		if (conn->datadgst_en) {
-			iscsi_data_digest_init(conn->dd_data, tcp_ctask);
-			dtask->digest = 0;
+		/* Waiting for more R2Ts to arrive. */
+		if (r2t == NULL) {
+			debug_tcp("no R2Ts yet\n");
+			return 0;
 		}
 
-		iscsi_set_padding(tcp_ctask, r2t->data_count);
-		debug_scsi("sol dout [dsn %d itt 0x%x dlen %d sent %d]\n",
-			r2t->solicit_datasn - 1, ctask->itt, r2t->data_count,
-			r2t->sent);
-	}
+		debug_scsi("sol dout %p [dsn %d itt 0x%x doff %d dlen %d]\n",
+			r2t, r2t->solicit_datasn - 1, ctask->itt,
+			r2t->data_offset + r2t->sent, r2t->data_count);
 
-	if (tcp_ctask->xmstate & XMSTATE_SOL_DATA) {
-		r2t = tcp_ctask->r2t;
-		dtask = &r2t->dtask;
+		iscsi_tcp_send_hdr_prep(conn, &r2t->dtask.hdr,
+					sizeof(struct iscsi_hdr));
 
-		rc = iscsi_send_data(ctask, &r2t->sendbuf, &r2t->sg,
-				     &r2t->sent, &r2t->data_count,
-				     &dtask->digestbuf, &dtask->digest);
+		rc = iscsi_tcp_send_data_prep(conn, scsi_sglist(sc),
+					      scsi_sg_count(sc),
+					      r2t->data_offset + r2t->sent,
+					      r2t->data_count);
 		if (rc)
-			return rc;
-		tcp_ctask->xmstate &= ~XMSTATE_SOL_DATA;
-
-		/*
-		 * Done with this Data-Out. Next, check if we have
-		 * to send another Data-Out for this R2T.
-		 */
-		BUG_ON(r2t->data_length - r2t->sent < 0);
-		left = r2t->data_length - r2t->sent;
-		if (left) {
-			iscsi_solicit_data_cont(conn, ctask, r2t, left);
-			goto send_hdr;
-		}
-
-		/*
-		 * Done with this R2T. Check if there are more
-		 * outstanding R2Ts ready to be processed.
-		 */
-		spin_lock_bh(&session->lock);
-		tcp_ctask->r2t = NULL;
-		__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
-			    sizeof(void*));
-		if (__kfifo_get(tcp_ctask->r2tqueue, (void*)&r2t,
-				sizeof(void*))) {
-			tcp_ctask->r2t = r2t;
-			spin_unlock_bh(&session->lock);
-			goto send_hdr;
-		}
-		spin_unlock_bh(&session->lock);
+			goto fail;
+		tcp_ctask->sent += r2t->data_count;
+		r2t->sent += r2t->data_count;
+		goto flush;
 	}
 	return 0;
-}
-
-/**
- * iscsi_tcp_ctask_xmit - xmit normal PDU task
- * @conn: iscsi connection
- * @ctask: iscsi command task
- *
- * Notes:
- *	The function can return -EAGAIN in which case caller must
- *	call it again later, or recover. '0' return code means successful
- *	xmit.
- *	The function is devided to logical helpers (above) for the different
- *	xmit stages.
- *
- *iscsi_send_cmd_hdr()
- *	XMSTATE_CMD_HDR_INIT - prepare Header and Data buffers Calculate
- *	                       Header Digest
- *	XMSTATE_CMD_HDR_XMIT - Transmit header in progress
- *
- *iscsi_send_padding
- *	XMSTATE_W_PAD        - Prepare and send pading
- *	XMSTATE_W_RESEND_PAD - retry send pading
- *
- *iscsi_send_digest
- *	XMSTATE_W_RESEND_DATA_DIGEST - Finalize and send Data Digest
- *	XMSTATE_W_RESEND_DATA_DIGEST - retry sending digest
- *
- *iscsi_send_unsol_hdr
- *	XMSTATE_UNS_INIT     - prepare un-solicit data header and digest
- *	XMSTATE_UNS_HDR      - send un-solicit header
- *
- *iscsi_send_unsol_pdu
- *	XMSTATE_UNS_DATA     - send un-solicit data in progress
- *
- *iscsi_send_sol_pdu
- *	XMSTATE_SOL_HDR_INIT - solicit data header and digest initialize
- *	XMSTATE_SOL_HDR      - send solicit header
- *	XMSTATE_SOL_DATA     - send solicit data
- *
- *iscsi_tcp_ctask_xmit
- *	XMSTATE_IMM_DATA     - xmit managment data (??)
- **/
-static int
-iscsi_tcp_ctask_xmit(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
-{
-	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
-	int rc = 0;
-
-	debug_scsi("ctask deq [cid %d xmstate %x itt 0x%x]\n",
-		conn->id, tcp_ctask->xmstate, ctask->itt);
-
-	rc = iscsi_send_cmd_hdr(conn, ctask);
-	if (rc)
-		return rc;
-	if (ctask->sc->sc_data_direction != DMA_TO_DEVICE)
-		return 0;
-
-	if (tcp_ctask->xmstate & XMSTATE_IMM_DATA) {
-		rc = iscsi_send_data(ctask, &tcp_ctask->sendbuf, &tcp_ctask->sg,
-				     &tcp_ctask->sent, &ctask->imm_count,
-				     &tcp_ctask->immbuf, &tcp_ctask->immdigest);
-		if (rc)
-			return rc;
-		tcp_ctask->xmstate &= ~XMSTATE_IMM_DATA;
-	}
-
-	rc = iscsi_send_unsol_pdu(conn, ctask);
-	if (rc)
-		return rc;
-
-	rc = iscsi_send_sol_pdu(conn, ctask);
-	if (rc)
-		return rc;
-
-	return rc;
+fail:
+	iscsi_conn_failure(conn, rc);
+	return -EIO;
 }
 
 static struct iscsi_cls_conn *
@@ -1970,10 +1682,17 @@ free_socket:
 
 /* called with host lock */
 static void
-iscsi_tcp_mgmt_init(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
+iscsi_tcp_mtask_init(struct iscsi_conn *conn, struct iscsi_mgmt_task *mtask)
 {
-	struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data;
-	tcp_mtask->xmstate = XMSTATE_IMM_HDR_INIT;
+	debug_scsi("mtask deq [cid %d itt 0x%x]\n", conn->id, mtask->itt);
+
+	/* Prepare PDU, optionally w/ immediate data */
+	iscsi_tcp_send_hdr_prep(conn, mtask->hdr, sizeof(*mtask->hdr));
+
+	/* If we have immediate data, attach a payload */
+	if (mtask->data_count)
+		iscsi_tcp_send_linear_data_prepare(conn, mtask->data,
+						   mtask->data_count);
 }
 
 static int
@@ -2177,7 +1896,7 @@ iscsi_tcp_session_create(struct iscsi_transport *iscsit,
 		struct iscsi_mgmt_task *mtask = session->mgmt_cmds[cmd_i];
 		struct iscsi_tcp_mgmt_task *tcp_mtask = mtask->dd_data;
 
-		mtask->hdr = &tcp_mtask->hdr;
+		mtask->hdr = (struct iscsi_hdr *) &tcp_mtask->hdr;
 	}
 
 	if (iscsi_r2tpool_alloc(class_to_transport_session(cls_session)))
@@ -2274,8 +1993,8 @@ static struct iscsi_transport iscsi_tcp_transport = {
 	/* IO */
 	.send_pdu		= iscsi_conn_send_pdu,
 	.get_stats		= iscsi_conn_get_stats,
-	.init_cmd_task		= iscsi_tcp_cmd_init,
-	.init_mgmt_task		= iscsi_tcp_mgmt_init,
+	.init_cmd_task		= iscsi_tcp_ctask_init,
+	.init_mgmt_task		= iscsi_tcp_mtask_init,
 	.xmit_cmd_task		= iscsi_tcp_ctask_xmit,
 	.xmit_mgmt_task		= iscsi_tcp_mtask_xmit,
 	.cleanup_cmd_task	= iscsi_tcp_cleanup_ctask,
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index d49d876..893cd2e 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -24,35 +24,18 @@
 
 #include <scsi/libiscsi.h>
 
-/* xmit state machine */
-#define XMSTATE_IDLE			0x0
-#define XMSTATE_CMD_HDR_INIT		0x1
-#define XMSTATE_CMD_HDR_XMIT		0x2
-#define XMSTATE_IMM_HDR			0x4
-#define XMSTATE_IMM_DATA		0x8
-#define XMSTATE_UNS_INIT		0x10
-#define XMSTATE_UNS_HDR			0x20
-#define XMSTATE_UNS_DATA		0x40
-#define XMSTATE_SOL_HDR			0x80
-#define XMSTATE_SOL_DATA		0x100
-#define XMSTATE_W_PAD			0x200
-#define XMSTATE_W_RESEND_PAD		0x400
-#define XMSTATE_W_RESEND_DATA_DIGEST	0x800
-#define XMSTATE_IMM_HDR_INIT		0x1000
-#define XMSTATE_SOL_HDR_INIT		0x2000
-
 #define ISCSI_SG_TABLESIZE		SG_ALL
 #define ISCSI_TCP_MAX_CMD_LEN		16
 
 struct crypto_hash;
 struct socket;
 struct iscsi_tcp_conn;
-struct iscsi_chunk;
+struct iscsi_segment;
 
-typedef int iscsi_chunk_done_fn_t(struct iscsi_tcp_conn *,
-				  struct iscsi_chunk *);
+typedef int iscsi_segment_done_fn_t(struct iscsi_tcp_conn *,
+				    struct iscsi_segment *);
 
-struct iscsi_chunk {
+struct iscsi_segment {
 	unsigned char		*data;
 	unsigned int		size;
 	unsigned int		copied;
@@ -67,16 +50,14 @@ struct iscsi_chunk {
 	struct scatterlist	*sg;
 	void			*sg_mapped;
 	unsigned int		sg_offset;
-	unsigned int		sg_index;
-	unsigned int		sg_count;
 
-	iscsi_chunk_done_fn_t	*done;
+	iscsi_segment_done_fn_t	*done;
 };
 
 /* Socket connection recieve helper */
 struct iscsi_tcp_recv {
 	struct iscsi_hdr	*hdr;
-	struct iscsi_chunk	chunk;
+	struct iscsi_segment	segment;
 
 	/* Allocate buffer for BHS + AHS */
 	uint32_t		hdr_buf[64];
@@ -88,11 +69,8 @@ struct iscsi_tcp_recv {
 /* Socket connection send helper */
 struct iscsi_tcp_send {
 	struct iscsi_hdr	*hdr;
-	struct iscsi_chunk	chunk;
-	struct iscsi_chunk	data_chunk;
-
-	/* Allocate buffer for BHS + AHS */
-	uint32_t		hdr_buf[64];
+	struct iscsi_segment	segment;
+	struct iscsi_segment	data_segment;
 };
 
 struct iscsi_tcp_conn {
@@ -118,29 +96,19 @@ struct iscsi_tcp_conn {
 	uint32_t		sendpage_failures_cnt;
 	uint32_t		discontiguous_hdr_cnt;
 
-	ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int);
-};
+	int			error;
 
-struct iscsi_buf {
-	struct scatterlist	sg;
-	unsigned int		sent;
-	char			use_sendmsg;
+	ssize_t (*sendpage)(struct socket *, struct page *, int, size_t, int);
 };
 
 struct iscsi_data_task {
 	struct iscsi_data	hdr;			/* PDU */
 	char			hdrext[ISCSI_DIGEST_SIZE];/* Header-Digest */
-	struct iscsi_buf	digestbuf;		/* digest buffer */
-	uint32_t		digest;			/* data digest */
 };
 
 struct iscsi_tcp_mgmt_task {
 	struct iscsi_hdr	hdr;
 	char			hdrext[ISCSI_DIGEST_SIZE]; /* Header-Digest */
-	int			xmstate;	/* mgmt xmit progress */
-	struct iscsi_buf	headbuf;	/* header buffer */
-	struct iscsi_buf	sendbuf;	/* in progress buffer */
-	int			sent;
 };
 
 struct iscsi_r2t_info {
@@ -148,13 +116,10 @@ struct iscsi_r2t_info {
 	__be32			exp_statsn;	/* copied from R2T */
 	uint32_t		data_length;	/* copied from R2T */
 	uint32_t		data_offset;	/* copied from R2T */
-	struct iscsi_buf	headbuf;	/* Data-Out Header Buffer */
-	struct iscsi_buf	sendbuf;	/* Data-Out in progress buffer*/
 	int			sent;		/* R2T sequence progress */
 	int			data_count;	/* DATA-Out payload progress */
-	struct scatterlist	*sg;		/* per-R2T SG list */
 	int			solicit_datasn;
-	struct iscsi_data_task   dtask;        /* which data task */
+	struct iscsi_data_task	dtask;		/* Data-Out header buf */
 };
 
 struct iscsi_tcp_cmd_task {
@@ -163,24 +128,14 @@ struct iscsi_tcp_cmd_task {
 		char			hdrextbuf[ISCSI_MAX_AHS_SIZE +
 		                                  ISCSI_DIGEST_SIZE];
 	} hdr;
-	char			pad[ISCSI_PAD_LEN];
-	int			pad_count;		/* padded bytes */
-	struct iscsi_buf	headbuf;		/* header buf (xmit) */
-	struct iscsi_buf	sendbuf;		/* in progress buffer*/
-	int			xmstate;		/* xmit xtate machine */
+
 	int			sent;
-	struct scatterlist	*sg;			/* per-cmd SG list  */
-	struct scatterlist	*bad_sg;		/* assert statement */
-	int			sg_count;		/* SG's to process  */
-	uint32_t		exp_datasn;		/* expected target's R2TSN/DataSN */
+	uint32_t		exp_datasn;	/* expected target's R2TSN/DataSN */
 	int			data_offset;
-	struct iscsi_r2t_info	*r2t;			/* in progress R2T    */
+	struct iscsi_r2t_info	*r2t;		/* in progress R2T    */
 	struct iscsi_pool	r2tpool;
 	struct kfifo		*r2tqueue;
-	int			digest_count;
-	uint32_t		immdigest;		/* for imm data */
-	struct iscsi_buf	immbuf;			/* for imm data digest */
-	struct iscsi_data_task	unsol_dtask;	/* unsol data task */
+	struct iscsi_data_task	unsol_dtask;	/* Data-Out header buf */
 };
 
 #endif /* ISCSI_H */
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index b0bc8c3..f15df8d 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -156,20 +156,19 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 	rc = iscsi_add_hdr(ctask, sizeof(*hdr));
 	if (rc)
 		return rc;
-        hdr->opcode = ISCSI_OP_SCSI_CMD;
-        hdr->flags = ISCSI_ATTR_SIMPLE;
-        int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
-        hdr->itt = build_itt(ctask->itt, conn->id, session->age);
-        hdr->data_length = cpu_to_be32(scsi_bufflen(sc));
-        hdr->cmdsn = cpu_to_be32(session->cmdsn);
-        session->cmdsn++;
-        hdr->exp_statsn = cpu_to_be32(conn->exp_statsn);
-        memcpy(hdr->cdb, sc->cmnd, sc->cmd_len);
+	hdr->opcode = ISCSI_OP_SCSI_CMD;
+	hdr->flags = ISCSI_ATTR_SIMPLE;
+	int_to_scsilun(sc->device->lun, (struct scsi_lun *)hdr->lun);
+	hdr->itt = build_itt(ctask->itt, conn->id, session->age);
+	hdr->data_length = cpu_to_be32(scsi_bufflen(sc));
+	hdr->cmdsn = cpu_to_be32(session->cmdsn);
+	session->cmdsn++;
+	hdr->exp_statsn = cpu_to_be32(conn->exp_statsn);
+	memcpy(hdr->cdb, sc->cmnd, sc->cmd_len);
 	if (sc->cmd_len < MAX_COMMAND_SIZE)
 		memset(&hdr->cdb[sc->cmd_len], 0,
 			MAX_COMMAND_SIZE - sc->cmd_len);
 
-	ctask->data_count = 0;
 	ctask->imm_count = 0;
 	if (sc->sc_data_direction == DMA_TO_DEVICE) {
 		hdr->flags |= ISCSI_FLAG_CMD_WRITE;
@@ -198,9 +197,9 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 			else
 				ctask->imm_count = min(scsi_bufflen(sc),
 							conn->max_xmit_dlength);
-			hton24(ctask->hdr->dlength, ctask->imm_count);
+			hton24(hdr->dlength, ctask->imm_count);
 		} else
-			zero_data(ctask->hdr->dlength);
+			zero_data(hdr->dlength);
 
 		if (!session->initial_r2t_en) {
 			ctask->unsol_count = min((session->first_burst),
@@ -210,7 +209,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 
 		if (!ctask->unsol_count)
 			/* No unsolicit Data-Out's */
-			ctask->hdr->flags |= ISCSI_FLAG_CMD_FINAL;
+			hdr->flags |= ISCSI_FLAG_CMD_FINAL;
 	} else {
 		hdr->flags |= ISCSI_FLAG_CMD_FINAL;
 		zero_data(hdr->dlength);
@@ -228,13 +227,15 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_cmd_task *ctask)
 	WARN_ON(hdrlength >= 256);
 	hdr->hlength = hdrlength & 0xFF;
 
-	conn->scsicmd_pdus_cnt++;
+	if (conn->session->tt->init_cmd_task(conn->ctask))
+		return EIO;
 
-        debug_scsi("iscsi prep [%s cid %d sc %p cdb 0x%x itt 0x%x len %d "
+	conn->scsicmd_pdus_cnt++;
+	debug_scsi("iscsi prep [%s cid %d sc %p cdb 0x%x itt 0x%x len %d "
 		"cmdsn %d win %d]\n",
-                sc->sc_data_direction == DMA_TO_DEVICE ? "write" : "read",
+		sc->sc_data_direction == DMA_TO_DEVICE ? "write" : "read",
 		conn->id, sc, sc->cmnd[0], ctask->itt, scsi_bufflen(sc),
-                session->cmdsn, session->max_cmdsn - session->exp_cmdsn + 1);
+		session->cmdsn, session->max_cmdsn - session->exp_cmdsn + 1);
 	return 0;
 }
 
@@ -927,7 +928,7 @@ check_mgmt:
 			fail_command(conn, conn->ctask, DID_ABORT << 16);
 			continue;
 		}
-		conn->session->tt->init_cmd_task(conn->ctask);
+
 		conn->ctask->state = ISCSI_TASK_RUNNING;
 		list_move_tail(conn->xmitqueue.next, &conn->run_list);
 		rc = iscsi_xmit_ctask(conn);
diff --git a/include/scsi/scsi_transport_iscsi.h b/include/scsi/scsi_transport_iscsi.h
index 093b403..404f11d 100644
--- a/include/scsi/scsi_transport_iscsi.h
+++ b/include/scsi/scsi_transport_iscsi.h
@@ -118,7 +118,7 @@ struct iscsi_transport {
 			 char *data, uint32_t data_size);
 	void (*get_stats) (struct iscsi_cls_conn *conn,
 			   struct iscsi_stats *stats);
-	void (*init_cmd_task) (struct iscsi_cmd_task *ctask);
+	int (*init_cmd_task) (struct iscsi_cmd_task *ctask);
 	void (*init_mgmt_task) (struct iscsi_conn *conn,
 				struct iscsi_mgmt_task *mtask);
 	int (*xmit_cmd_task) (struct iscsi_conn *conn,
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 17/23] iscsi_tcp: stop leaking r2t_info's when the incoming R2T is bad
  2007-12-05  0:42                               ` [PATCH 16/23] convert xmit path to iscsi chunks michaelc
@ 2007-12-05  0:42                                 ` michaelc
  2007-12-05  0:42                                   ` [PATCH 18/23] iscsi_tcp: drop session when itt does not match any command michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, Olaf Kirch

From: Mike Christie <michaelc@cs.wisc.edu>

from olaf.kirch@oracle.com:

iscsi_r2t_rsp checks the incoming R2T for sanity, and if it
thinks it's fishy, it will drop it silently. In this case, we
leaked an r2t_info object. If we do this often enough, we run
into a BUG_ON some time later.

Removed r2t wrappers and update patch by Mike Christie

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 7212fe9..ecba606 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -658,6 +658,8 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	r2t->data_length = be32_to_cpu(rhdr->data_length);
 	if (r2t->data_length == 0) {
 		printk(KERN_ERR "iscsi_tcp: invalid R2T with zero data len\n");
+		__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
+			    sizeof(void*));
 		spin_unlock(&session->lock);
 		return ISCSI_ERR_DATALEN;
 	}
@@ -669,10 +671,12 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 
 	r2t->data_offset = be32_to_cpu(rhdr->data_offset);
 	if (r2t->data_offset + r2t->data_length > scsi_bufflen(ctask->sc)) {
-		spin_unlock(&session->lock);
 		printk(KERN_ERR "iscsi_tcp: invalid R2T with data len %u at "
 		       "offset %u and total length %d\n", r2t->data_length,
 		       r2t->data_offset, scsi_bufflen(ctask->sc));
+		__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
+			    sizeof(void*));
+		spin_unlock(&session->lock);
 		return ISCSI_ERR_DATALEN;
 	}
 
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 18/23] iscsi_tcp: drop session when itt does not match any command
  2007-12-05  0:42                                 ` [PATCH 17/23] iscsi_tcp: stop leaking r2t_info's when the incoming R2T is bad michaelc
@ 2007-12-05  0:42                                   ` michaelc
  2007-12-05  0:42                                     ` [PATCH 19/23] libiscsi, iscsi class: set tmf to a safe default and export in sysfs michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

A target should never send us a itt that does not match a running
task. If it does we do not really know what is coming down after the header,
unless we evaluate the hdr and do some guessing sometimes. However,
even if we know what is coming we probably do not have buffers for it or we
cannot respond (if it is a r2t for example), so just drop the session.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index ecba606..65df908 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -755,11 +755,7 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 	opcode = hdr->opcode & ISCSI_OPCODE_MASK;
 	/* verify itt (itt encoding: age+cid+itt) */
 	rc = iscsi_verify_itt(conn, hdr, &itt);
-	if (rc == ISCSI_ERR_NO_SCSI_CMD) {
-		/* XXX: what does this do? */
-		tcp_conn->in.datalen = 0; /* force drop */
-		return 0;
-	} else if (rc)
+	if (rc)
 		return rc;
 
 	debug_tcp("opcode 0x%x ahslen %d datalen %d\n",
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 19/23] libiscsi, iscsi class: set tmf to a safe default and export in sysfs
  2007-12-05  0:42                                   ` [PATCH 18/23] iscsi_tcp: drop session when itt does not match any command michaelc
@ 2007-12-05  0:42                                     ` michaelc
  2007-12-05  0:42                                       ` [PATCH 20/23] iscsi_tcp: enable sg chaining michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

Older tools will not be setting the tmf time outs since they
did not exists, so set them to a safe default.

And export abort and lu reset timeout values in sysfs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c             |    2 ++
 drivers/scsi/scsi_transport_iscsi.c |    8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index f15df8d..6573223 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1732,6 +1732,8 @@ iscsi_session_setup(struct iscsi_transport *iscsit,
 	session->host = shost;
 	session->state = ISCSI_STATE_FREE;
 	session->fast_abort = 1;
+	session->lu_reset_timeout = 15;
+	session->abort_timeout = 10;
 	session->mgmtpool_max = ISCSI_MGMT_CMDS_MAX;
 	session->cmds_max = cmds_max;
 	session->queued_cmdsn = session->cmdsn = initial_cmdsn;
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index fb8a765..d3db318 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -30,7 +30,7 @@
 #include <scsi/scsi_transport_iscsi.h>
 #include <scsi/iscsi_if.h>
 
-#define ISCSI_SESSION_ATTRS 16
+#define ISCSI_SESSION_ATTRS 18
 #define ISCSI_CONN_ATTRS 11
 #define ISCSI_HOST_ATTRS 4
 #define ISCSI_TRANSPORT_VERSION "2.0-724"
@@ -1241,7 +1241,9 @@ iscsi_session_attr(username, ISCSI_PARAM_USERNAME, 1);
 iscsi_session_attr(username_in, ISCSI_PARAM_USERNAME_IN, 1);
 iscsi_session_attr(password, ISCSI_PARAM_PASSWORD, 1);
 iscsi_session_attr(password_in, ISCSI_PARAM_PASSWORD_IN, 1);
-iscsi_session_attr(fast_abort, ISCSI_PARAM_FAST_ABORT, 1);
+iscsi_session_attr(fast_abort, ISCSI_PARAM_FAST_ABORT, 0);
+iscsi_session_attr(abort_tmo, ISCSI_PARAM_ABORT_TMO, 0);
+iscsi_session_attr(lu_reset_tmo, ISCSI_PARAM_LU_RESET_TMO, 0);
 
 #define iscsi_priv_session_attr_show(field, format)			\
 static ssize_t								\
@@ -1466,6 +1468,8 @@ iscsi_register_transport(struct iscsi_transport *tt)
 	SETUP_SESSION_RD_ATTR(username, ISCSI_PASSWORD);
 	SETUP_SESSION_RD_ATTR(username_in, ISCSI_PASSWORD_IN);
 	SETUP_SESSION_RD_ATTR(fast_abort, ISCSI_FAST_ABORT);
+	SETUP_SESSION_RD_ATTR(abort_tmo, ISCSI_ABORT_TMO);
+	SETUP_SESSION_RD_ATTR(lu_reset_tmo,ISCSI_LU_RESET_TMO);
 	SETUP_PRIV_SESSION_RD_ATTR(recovery_tmo);
 
 	BUG_ON(count > ISCSI_SESSION_ATTRS);
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 20/23] iscsi_tcp: enable sg chaining
  2007-12-05  0:42                                     ` [PATCH 19/23] libiscsi, iscsi class: set tmf to a safe default and export in sysfs michaelc
@ 2007-12-05  0:42                                       ` michaelc
  2007-12-05  0:42                                         ` [PATCH 21/23] iscsi_tcp: hold lock during data rsp processing michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

The previous patches converted iscsi_tcp to support sg chaining.
This patch sets the proper flags and sets sg_table size to
4096. This allows fs io to be capped at max_sectors, but passthrough
IO to be limited by some other part of the kernel.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |    5 +++--
 drivers/scsi/iscsi_tcp.h |    3 ---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 65df908..84c4a50 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1928,13 +1928,14 @@ static struct scsi_host_template iscsi_sht = {
 	.queuecommand           = iscsi_queuecommand,
 	.change_queue_depth	= iscsi_change_queue_depth,
 	.can_queue		= ISCSI_DEF_XMIT_CMDS_MAX - 1,
-	.sg_tablesize		= ISCSI_SG_TABLESIZE,
+	.sg_tablesize		= 4096,
 	.max_sectors		= 0xFFFF,
 	.cmd_per_lun		= ISCSI_DEF_CMD_PER_LUN,
 	.eh_abort_handler       = iscsi_eh_abort,
 	.eh_device_reset_handler= iscsi_eh_device_reset,
 	.eh_host_reset_handler	= iscsi_eh_host_reset,
 	.use_clustering         = DISABLE_CLUSTERING,
+	.use_sg_chaining	= ENABLE_SG_CHAINING,
 	.slave_configure        = iscsi_tcp_slave_configure,
 	.proc_name		= "iscsi_tcp",
 	.this_id		= -1,
@@ -1974,7 +1975,7 @@ static struct iscsi_transport iscsi_tcp_transport = {
 	.host_template		= &iscsi_sht,
 	.conndata_size		= sizeof(struct iscsi_conn),
 	.max_conn		= 1,
-	.max_cmd_len		= ISCSI_TCP_MAX_CMD_LEN,
+	.max_cmd_len		= 16,
 	/* session management */
 	.create_session		= iscsi_tcp_session_create,
 	.destroy_session	= iscsi_tcp_session_destroy,
diff --git a/drivers/scsi/iscsi_tcp.h b/drivers/scsi/iscsi_tcp.h
index 893cd2e..ed0b991 100644
--- a/drivers/scsi/iscsi_tcp.h
+++ b/drivers/scsi/iscsi_tcp.h
@@ -24,9 +24,6 @@
 
 #include <scsi/libiscsi.h>
 
-#define ISCSI_SG_TABLESIZE		SG_ALL
-#define ISCSI_TCP_MAX_CMD_LEN		16
-
 struct crypto_hash;
 struct socket;
 struct iscsi_tcp_conn;
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 21/23] iscsi_tcp: hold lock during data rsp processing
  2007-12-05  0:42                                       ` [PATCH 20/23] iscsi_tcp: enable sg chaining michaelc
@ 2007-12-05  0:42                                         ` michaelc
  2007-12-05  0:42                                           ` [PATCH 22/23] libiscsi: use is_power_of_2 michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

iscsi_data_rsp needs to hold the sesison lock when it calls
iscsi_update_cmdsn.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |   14 ++++++--------
 1 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 84c4a50..edebdf2 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -641,13 +641,11 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	}
 
 	/* fill-in new R2T associated with the task */
-	spin_lock(&session->lock);
 	iscsi_update_cmdsn(session, (struct iscsi_nopin*)rhdr);
 
 	if (!ctask->sc || session->state != ISCSI_STATE_LOGGED_IN) {
 		printk(KERN_INFO "iscsi_tcp: dropping R2T itt %d in "
 		       "recovery...\n", ctask->itt);
-		spin_unlock(&session->lock);
 		return 0;
 	}
 
@@ -660,7 +658,6 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 		printk(KERN_ERR "iscsi_tcp: invalid R2T with zero data len\n");
 		__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
 			    sizeof(void*));
-		spin_unlock(&session->lock);
 		return ISCSI_ERR_DATALEN;
 	}
 
@@ -676,7 +673,6 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 		       r2t->data_offset, scsi_bufflen(ctask->sc));
 		__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
 			    sizeof(void*));
-		spin_unlock(&session->lock);
 		return ISCSI_ERR_DATALEN;
 	}
 
@@ -690,8 +686,6 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
 	conn->r2t_pdus_cnt++;
 
 	iscsi_requeue_ctask(ctask);
-	spin_unlock(&session->lock);
-
 	return 0;
 }
 
@@ -764,7 +758,9 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 	switch(opcode) {
 	case ISCSI_OP_SCSI_DATA_IN:
 		ctask = session->cmds[itt];
+		spin_lock(&conn->session->lock);
 		rc = iscsi_data_rsp(conn, ctask);
+		spin_unlock(&conn->session->lock);
 		if (rc)
 			return rc;
 		if (tcp_conn->in.datalen) {
@@ -806,9 +802,11 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct iscsi_hdr *hdr)
 		ctask = session->cmds[itt];
 		if (ahslen)
 			rc = ISCSI_ERR_AHSLEN;
-		else if (ctask->sc->sc_data_direction == DMA_TO_DEVICE)
+		else if (ctask->sc->sc_data_direction == DMA_TO_DEVICE) {
+			spin_lock(&session->lock);
 			rc = iscsi_r2t_rsp(conn, ctask);
-		else
+			spin_unlock(&session->lock);
+		} else
 			rc = ISCSI_ERR_PROTO;
 		break;
 	case ISCSI_OP_LOGIN_RSP:
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 22/23] libiscsi: use is_power_of_2
  2007-12-05  0:42                                         ` [PATCH 21/23] iscsi_tcp: hold lock during data rsp processing michaelc
@ 2007-12-05  0:42                                           ` michaelc
  2007-12-05  0:42                                             ` [PATCH 23/23] iscsi_tcp: fix setting of r2t michaelc
  0 siblings, 1 reply; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie, vignesh babu

From: Mike Christie <michaelc@cs.wisc.edu>

Patch from vignesh babu <vignesh.babu@wipro.com>:

Replacing n & (n - 1) for power of 2 check by is_power_of_2(n)

Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/libiscsi.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 6573223..553168a 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -24,6 +24,7 @@
 #include <linux/types.h>
 #include <linux/kfifo.h>
 #include <linux/delay.h>
+#include <linux/log2.h>
 #include <asm/unaligned.h>
 #include <net/tcp.h>
 #include <scsi/scsi_cmnd.h>
@@ -1700,7 +1701,7 @@ iscsi_session_setup(struct iscsi_transport *iscsit,
 		qdepth = ISCSI_DEF_CMD_PER_LUN;
 	}
 
-	if (cmds_max < 2 || (cmds_max & (cmds_max - 1)) ||
+	if (!is_power_of_2(cmds_max) ||
 	    cmds_max >= ISCSI_MGMT_ITT_OFFSET) {
 		if (cmds_max != 0)
 			printk(KERN_ERR "iscsi: invalid can_queue of %d. "
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 23/23] iscsi_tcp: fix setting of r2t
  2007-12-05  0:42                                           ` [PATCH 22/23] libiscsi: use is_power_of_2 michaelc
@ 2007-12-05  0:42                                             ` michaelc
  0 siblings, 0 replies; 24+ messages in thread
From: michaelc @ 2007-12-05  0:42 UTC (permalink / raw)
  To: linux-scsi; +Cc: Mike Christie

From: Mike Christie <michaelc@cs.wisc.edu>

If we negotiate for X r2ts we have to use only X r2ts. We cannot
round up (we could send less though). It is ok to fail if it
is not something the driver can handle, so this patch just does
that.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
---
 drivers/scsi/iscsi_tcp.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index edebdf2..e5be5fd 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1774,12 +1774,12 @@ iscsi_conn_set_param(struct iscsi_cls_conn *cls_conn, enum iscsi_param param,
 		break;
 	case ISCSI_PARAM_MAX_R2T:
 		sscanf(buf, "%d", &value);
-		if (session->max_r2t == roundup_pow_of_two(value))
+		if (value <= 0 || !is_power_of_2(value))
+			return -EINVAL;
+		if (session->max_r2t == value)
 			break;
 		iscsi_r2tpool_free(session);
 		iscsi_set_param(cls_conn, param, buf, buflen);
-		if (session->max_r2t & (session->max_r2t - 1))
-			session->max_r2t = roundup_pow_of_two(session->max_r2t);
 		if (iscsi_r2tpool_alloc(session))
 			return -ENOMEM;
 		break;
-- 
1.5.1.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2007-12-05  0:44 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-05  0:42 iscsi update for 2.6.25 michaelc
2007-12-05  0:42 ` [PATCH 01/23] libiscsi, iscsi_tcp: add device support michaelc
2007-12-05  0:42   ` [PATCH 02/23] iscsi_tcp: rewrite recv path michaelc
2007-12-05  0:42     ` [PATCH 03/23] Prettify resid handling and some extra checks michaelc
2007-12-05  0:42       ` [PATCH 04/23] iscsi_tcp, libiscsi: initial AHS Support michaelc
2007-12-05  0:42         ` [PATCH 05/23] iser patching for AHS support michaelc
2007-12-05  0:42           ` [PATCH 06/23] libiscsi, iscsi_tcp: iscsi pool cleanup michaelc
2007-12-05  0:42             ` [PATCH 07/23] libiscsi: do not block session during logout michaelc
2007-12-05  0:42               ` [PATCH 08/23] iscsi class: Use our own workq instead of common system one michaelc
2007-12-05  0:42                 ` [PATCH 09/23] libiscsi: grab eh_mutex during host reset michaelc
2007-12-05  0:42                   ` [PATCH 10/23] libiscsi: fix shutdown michaelc
2007-12-05  0:42                     ` [PATCH 11/23] libiscsi: fix nop handling michaelc
2007-12-05  0:42                       ` [PATCH 12/23] iscsi_tcp: update the website URL michaelc
2007-12-05  0:42                         ` [PATCH 13/23] Do not fail commands immediately during logout michaelc
2007-12-05  0:42                           ` [PATCH 14/23] clear conn->ctask when task is completed early michaelc
2007-12-05  0:42                             ` [PATCH 15/23] Drop host lock in queuecommand michaelc
2007-12-05  0:42                               ` [PATCH 16/23] convert xmit path to iscsi chunks michaelc
2007-12-05  0:42                                 ` [PATCH 17/23] iscsi_tcp: stop leaking r2t_info's when the incoming R2T is bad michaelc
2007-12-05  0:42                                   ` [PATCH 18/23] iscsi_tcp: drop session when itt does not match any command michaelc
2007-12-05  0:42                                     ` [PATCH 19/23] libiscsi, iscsi class: set tmf to a safe default and export in sysfs michaelc
2007-12-05  0:42                                       ` [PATCH 20/23] iscsi_tcp: enable sg chaining michaelc
2007-12-05  0:42                                         ` [PATCH 21/23] iscsi_tcp: hold lock during data rsp processing michaelc
2007-12-05  0:42                                           ` [PATCH 22/23] libiscsi: use is_power_of_2 michaelc
2007-12-05  0:42                                             ` [PATCH 23/23] iscsi_tcp: fix setting of r2t michaelc

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.