All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] target: Fix queue-full callback error signaling
@ 2016-10-31  5:25 Nicholas A. Bellinger
  2016-10-31  5:25 ` [PATCH 1/3] target: Fix unknown fabric callback queue-full errors Nicholas A. Bellinger
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-31  5:25 UTC (permalink / raw)
  To: target-devel
  Cc: linux-rdma, Potnuri Bharat Teja, Steve Wise, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

Hi folks,

This series contains target-core queue-full + iscsi-target
callback fixes for a bug reported by Steve and Potnuri
during recent v4.8.y iWARP/iser-target testing.

The first patch fixes target-core queue-full handling response
leaks with non -EAGAIN / -ENOMEM errors.  It uses a new state
TRANSPORT_COMPLETE_QF_ERR to internally generate CHECK_CONDITION
for unknown fabric callback errors, to avoid attempting retry
of fabric data-transfer callbacks for this special case.

This means all non -EAGAIN / -ENOMEM fabric callback errors
during target_core_fabric_ops for:

  *) ->write_pending()
  *) ->queue_data_in()
  *) ->queue_status()

will result in CHECK_CONDITION + LOGICAL_UNIT_COMMUNICATION_FAILURE,
if no non-zero se_cmd->scsi_status was previously set.
It also means target-core ->queue_status() errors retry indefinately,
or until session shutdown explicitly stops outstanding I/O.

The remaining changes are for propagating iscsit_transport
response failure back to target-core queue-full, and updating
iser-target to propagate isert_rdma_rw_ctx_post() errors from
RDMA R/W API back to target-core as well.

Please review.

--nab

Nicholas Bellinger (3):
  target: Fix unknown fabric callback queue-full errors
  iscsi-target: Propigate queue_data_in + queue_status errors
  iser-target: Fix queue-full response handling

 drivers/infiniband/ulp/isert/ib_isert.c      |  53 +++++++++-----
 drivers/target/iscsi/iscsi_target.c          |   3 +-
 drivers/target/iscsi/iscsi_target_configfs.c |  13 ++--
 drivers/target/iscsi/iscsi_target_util.c     |   5 +-
 drivers/target/iscsi/iscsi_target_util.h     |   2 +-
 drivers/target/target_core_transport.c       | 102 ++++++++++++++++++---------
 include/target/target_core_base.h            |   1 +
 7 files changed, 114 insertions(+), 65 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] target: Fix unknown fabric callback queue-full errors
  2016-10-31  5:25 [PATCH 0/3] target: Fix queue-full callback error signaling Nicholas A. Bellinger
@ 2016-10-31  5:25 ` Nicholas A. Bellinger
  2016-10-31  5:25 ` [PATCH 2/3] iscsi-target: Propigate queue_data_in + queue_status errors Nicholas A. Bellinger
       [not found] ` <1477891554-26222-1-git-send-email-nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
  2 siblings, 0 replies; 7+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-31  5:25 UTC (permalink / raw)
  To: target-devel
  Cc: linux-rdma, Potnuri Bharat Teja, Steve Wise, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch fixes a set of queue-full response handling
bugs, where outgoing responses are leaked when a fabric
driver is propagating non -EAGAIN or -ENOMEM errors
to target-core.

It introduces TRANSPORT_COMPLETE_QF_ERR state used to
signal when CHECK_CONDITION status should be generated,
when fabric driver ->write_pending(), ->queue_data_in(),
or ->queue_status() callbacks fail with non -EAGAIN or
-ENOMEM errors, and data-transfer should not be retried.

Note all fabric driver -EAGAIN and -ENOMEM errors are
still retried indefinately with associated data-transfer
callbacks, following existing queue-full logic.

Also fix two missing ->queue_status() queue-full cases
related to CMD_T_ABORTED w/ TAS status handling.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/target/target_core_transport.c | 102 ++++++++++++++++++++++-----------
 include/target/target_core_base.h      |   1 +
 2 files changed, 69 insertions(+), 34 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 7dfefd6..489aa95 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -64,8 +64,9 @@
 struct kmem_cache *t10_alua_lba_map_mem_cache;
 
 static void transport_complete_task_attr(struct se_cmd *cmd);
+static int translate_sense_reason(struct se_cmd *cmd, sense_reason_t reason);
 static void transport_handle_queue_full(struct se_cmd *cmd,
-		struct se_device *dev);
+		struct se_device *dev, int err, bool write_pending);
 static int transport_put_cmd(struct se_cmd *cmd);
 static void target_complete_ok_work(struct work_struct *work);
 
@@ -811,7 +812,8 @@ void target_qf_do_work(struct work_struct *work)
 
 		if (cmd->t_state == TRANSPORT_COMPLETE_QF_WP)
 			transport_write_pending_qf(cmd);
-		else if (cmd->t_state == TRANSPORT_COMPLETE_QF_OK)
+		else if (cmd->t_state == TRANSPORT_COMPLETE_QF_OK ||
+			 cmd->t_state == TRANSPORT_COMPLETE_QF_ERR)
 			transport_complete_qf(cmd);
 	}
 }
@@ -1720,7 +1722,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 		}
 		trace_target_cmd_complete(cmd);
 		ret = cmd->se_tfo->queue_status(cmd);
-		if (ret == -EAGAIN || ret == -ENOMEM)
+		if (ret)
 			goto queue_full;
 		goto check_stop;
 	default:
@@ -1731,7 +1733,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 	}
 
 	ret = transport_send_check_condition_and_sense(cmd, sense_reason, 0);
-	if (ret == -EAGAIN || ret == -ENOMEM)
+	if (ret)
 		goto queue_full;
 
 check_stop:
@@ -1740,8 +1742,7 @@ void transport_generic_request_failure(struct se_cmd *cmd,
 	return;
 
 queue_full:
-	cmd->t_state = TRANSPORT_COMPLETE_QF_OK;
-	transport_handle_queue_full(cmd, cmd->se_dev);
+	transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 }
 EXPORT_SYMBOL(transport_generic_request_failure);
 
@@ -1980,13 +1981,29 @@ static void transport_complete_qf(struct se_cmd *cmd)
 	int ret = 0;
 
 	transport_complete_task_attr(cmd);
+	/*
+	 * If a fabric driver ->write_pending() or ->queue_data_in() callback
+	 * has returned neither -ENOMEM or -EAGAIN, assume it's fatal and
+	 * the same callbacks should not be retried.  Return CHECK_CONDITION
+	 * if a scsi_status is not already set.
+	 *
+	 * If a fabric driver ->queue_status() has returned non zero, always
+	 * keep retrying no matter what..
+	 */
+	if (cmd->t_state == TRANSPORT_COMPLETE_QF_ERR) {
+		if (cmd->scsi_status)
+			goto queue_status;
 
-	if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) {
-		trace_target_cmd_complete(cmd);
-		ret = cmd->se_tfo->queue_status(cmd);
-		goto out;
+		cmd->se_cmd_flags |= SCF_EMULATED_TASK_SENSE;
+		cmd->scsi_status = SAM_STAT_CHECK_CONDITION;
+		cmd->scsi_sense_length  = TRANSPORT_SENSE_BUFFER;
+		translate_sense_reason(cmd, TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE);
+		goto queue_status;
 	}
 
+	if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE)
+		goto queue_status;
+
 	switch (cmd->data_direction) {
 	case DMA_FROM_DEVICE:
 		if (cmd->scsi_status)
@@ -2010,19 +2027,33 @@ static void transport_complete_qf(struct se_cmd *cmd)
 		break;
 	}
 
-out:
 	if (ret < 0) {
-		transport_handle_queue_full(cmd, cmd->se_dev);
+		transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 		return;
 	}
 	transport_lun_remove_cmd(cmd);
 	transport_cmd_check_stop_to_fabric(cmd);
 }
 
-static void transport_handle_queue_full(
-	struct se_cmd *cmd,
-	struct se_device *dev)
+static void transport_handle_queue_full(struct se_cmd *cmd, struct se_device *dev,
+					int err, bool write_pending)
 {
+	/*
+	 * -EAGAIN or -ENOMEM signals retry of ->write_pending() and/or
+	 * ->queue_data_in() callbacks from new process context.
+	 *
+	 * Otherwise for other errors, transport_complete_qf() will send
+	 * CHECK_CONDITION via ->queue_status() instead of attempting to
+	 * retry associated fabric driver data-transfer callbacks.
+	 */
+	if (err == -EAGAIN || err == -ENOMEM) {
+		cmd->t_state = (write_pending) ? TRANSPORT_COMPLETE_QF_WP :
+						 TRANSPORT_COMPLETE_QF_OK;
+	} else {
+		pr_warn_ratelimited("Got unknown fabric queue status: %d\n", err);
+		cmd->t_state = TRANSPORT_COMPLETE_QF_ERR;
+	}
+
 	spin_lock_irq(&dev->qf_cmd_lock);
 	list_add_tail(&cmd->se_qf_node, &cmd->se_dev->qf_cmd_list);
 	atomic_inc_mb(&dev->dev_qf_count);
@@ -2086,7 +2117,7 @@ static void target_complete_ok_work(struct work_struct *work)
 		WARN_ON(!cmd->scsi_status);
 		ret = transport_send_check_condition_and_sense(
 					cmd, 0, 1);
-		if (ret == -EAGAIN || ret == -ENOMEM)
+		if (ret)
 			goto queue_full;
 
 		transport_lun_remove_cmd(cmd);
@@ -2112,7 +2143,7 @@ static void target_complete_ok_work(struct work_struct *work)
 		} else if (rc) {
 			ret = transport_send_check_condition_and_sense(cmd,
 						rc, 0);
-			if (ret == -EAGAIN || ret == -ENOMEM)
+			if (ret)
 				goto queue_full;
 
 			transport_lun_remove_cmd(cmd);
@@ -2137,7 +2168,7 @@ static void target_complete_ok_work(struct work_struct *work)
 		if (target_read_prot_action(cmd)) {
 			ret = transport_send_check_condition_and_sense(cmd,
 						cmd->pi_err, 0);
-			if (ret == -EAGAIN || ret == -ENOMEM)
+			if (ret)
 				goto queue_full;
 
 			transport_lun_remove_cmd(cmd);
@@ -2147,7 +2178,7 @@ static void target_complete_ok_work(struct work_struct *work)
 
 		trace_target_cmd_complete(cmd);
 		ret = cmd->se_tfo->queue_data_in(cmd);
-		if (ret == -EAGAIN || ret == -ENOMEM)
+		if (ret)
 			goto queue_full;
 		break;
 	case DMA_TO_DEVICE:
@@ -2160,7 +2191,7 @@ static void target_complete_ok_work(struct work_struct *work)
 			atomic_long_add(cmd->data_length,
 					&cmd->se_lun->lun_stats.tx_data_octets);
 			ret = cmd->se_tfo->queue_data_in(cmd);
-			if (ret == -EAGAIN || ret == -ENOMEM)
+			if (ret)
 				goto queue_full;
 			break;
 		}
@@ -2169,7 +2200,7 @@ static void target_complete_ok_work(struct work_struct *work)
 queue_status:
 		trace_target_cmd_complete(cmd);
 		ret = cmd->se_tfo->queue_status(cmd);
-		if (ret == -EAGAIN || ret == -ENOMEM)
+		if (ret)
 			goto queue_full;
 		break;
 	default:
@@ -2183,8 +2214,8 @@ static void target_complete_ok_work(struct work_struct *work)
 queue_full:
 	pr_debug("Handling complete_ok QUEUE_FULL: se_cmd: %p,"
 		" data_direction: %d\n", cmd, cmd->data_direction);
-	cmd->t_state = TRANSPORT_COMPLETE_QF_OK;
-	transport_handle_queue_full(cmd, cmd->se_dev);
+
+	transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 }
 
 void target_free_sgl(struct scatterlist *sgl, int nents)
@@ -2434,18 +2465,14 @@ void transport_kunmap_data_sg(struct se_cmd *cmd)
 	transport_cmd_check_stop(cmd, false, true);
 
 	ret = cmd->se_tfo->write_pending(cmd);
-	if (ret == -EAGAIN || ret == -ENOMEM)
+	if (ret)
 		goto queue_full;
 
-	/* fabric drivers should only return -EAGAIN or -ENOMEM as error */
-	WARN_ON(ret);
-
-	return (!ret) ? 0 : TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
+	return 0;
 
 queue_full:
 	pr_debug("Handling write_pending QUEUE__FULL: se_cmd: %p\n", cmd);
-	cmd->t_state = TRANSPORT_COMPLETE_QF_WP;
-	transport_handle_queue_full(cmd, cmd->se_dev);
+	transport_handle_queue_full(cmd, cmd->se_dev, ret, true);
 	return 0;
 }
 EXPORT_SYMBOL(transport_generic_new_cmd);
@@ -2455,10 +2482,10 @@ static void transport_write_pending_qf(struct se_cmd *cmd)
 	int ret;
 
 	ret = cmd->se_tfo->write_pending(cmd);
-	if (ret == -EAGAIN || ret == -ENOMEM) {
+	if (ret) {
 		pr_debug("Handling write_pending QUEUE__FULL: se_cmd: %p\n",
 			 cmd);
-		transport_handle_queue_full(cmd, cmd->se_dev);
+		transport_handle_queue_full(cmd, cmd->se_dev, ret, true);
 	}
 }
 
@@ -2951,6 +2978,8 @@ static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 	__releases(&cmd->t_state_lock)
 	__acquires(&cmd->t_state_lock)
 {
+	int ret;
+
 	assert_spin_locked(&cmd->t_state_lock);
 	WARN_ON_ONCE(!irqs_disabled());
 
@@ -2974,7 +3003,9 @@ static int __transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 	trace_target_cmd_complete(cmd);
 
 	spin_unlock_irq(&cmd->t_state_lock);
-	cmd->se_tfo->queue_status(cmd);
+	ret = cmd->se_tfo->queue_status(cmd);
+	if (ret)
+		transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 	spin_lock_irq(&cmd->t_state_lock);
 
 	return 1;
@@ -2995,6 +3026,7 @@ int transport_check_aborted_status(struct se_cmd *cmd, int send_status)
 void transport_send_task_abort(struct se_cmd *cmd)
 {
 	unsigned long flags;
+	int ret;
 
 	spin_lock_irqsave(&cmd->t_state_lock, flags);
 	if (cmd->se_cmd_flags & (SCF_SENT_CHECK_CONDITION)) {
@@ -3030,7 +3062,9 @@ void transport_send_task_abort(struct se_cmd *cmd)
 		 cmd->t_task_cdb[0], cmd->tag);
 
 	trace_target_cmd_complete(cmd);
-	cmd->se_tfo->queue_status(cmd);
+	ret = cmd->se_tfo->queue_status(cmd);
+	if (ret)
+		transport_handle_queue_full(cmd, cmd->se_dev, ret, false);
 }
 
 static void target_tmr_work(struct work_struct *work)
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index c211900..6a5e439 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -119,6 +119,7 @@ enum transport_state_table {
 	TRANSPORT_ISTATE_PROCESSING = 11,
 	TRANSPORT_COMPLETE_QF_WP = 18,
 	TRANSPORT_COMPLETE_QF_OK = 19,
+	TRANSPORT_COMPLETE_QF_ERR = 20,
 };
 
 /* Used for struct se_cmd->se_cmd_flags */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] iscsi-target: Propigate queue_data_in + queue_status errors
  2016-10-31  5:25 [PATCH 0/3] target: Fix queue-full callback error signaling Nicholas A. Bellinger
  2016-10-31  5:25 ` [PATCH 1/3] target: Fix unknown fabric callback queue-full errors Nicholas A. Bellinger
@ 2016-10-31  5:25 ` Nicholas A. Bellinger
       [not found] ` <1477891554-26222-1-git-send-email-nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
  2 siblings, 0 replies; 7+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-31  5:25 UTC (permalink / raw)
  To: target-devel
  Cc: linux-rdma, Potnuri Bharat Teja, Steve Wise, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab@linux-iscsi.org>

This patch changes iscsi-target to propagate iscsit_transport
->iscsit_queue_data_in() and ->iscsit_queue_status() callback
errors, back up into target-core.

This allows target-core to retry failed iscsit_transport
callbacks using internal queue-full logic.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
---
 drivers/target/iscsi/iscsi_target.c          |  3 +--
 drivers/target/iscsi/iscsi_target_configfs.c | 13 +++++--------
 drivers/target/iscsi/iscsi_target_util.c     |  5 +++--
 drivers/target/iscsi/iscsi_target_util.h     |  2 +-
 4 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index b7d747e..1b03e85 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -482,8 +482,7 @@ int iscsit_del_np(struct iscsi_np *np)
 
 int iscsit_queue_rsp(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 {
-	iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd->i_state);
-	return 0;
+	return iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd->i_state);
 }
 EXPORT_SYMBOL(iscsit_queue_rsp);
 
diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
index 923c032..38a72d3 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1395,11 +1395,10 @@ static u32 lio_sess_get_initiator_sid(
 static int lio_queue_data_in(struct se_cmd *se_cmd)
 {
 	struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd);
+	struct iscsi_conn *conn = cmd->conn;
 
 	cmd->i_state = ISTATE_SEND_DATAIN;
-	cmd->conn->conn_transport->iscsit_queue_data_in(cmd->conn, cmd);
-
-	return 0;
+	return conn->conn_transport->iscsit_queue_data_in(conn, cmd);
 }
 
 static int lio_write_pending(struct se_cmd *se_cmd)
@@ -1428,16 +1427,14 @@ static int lio_write_pending_status(struct se_cmd *se_cmd)
 static int lio_queue_status(struct se_cmd *se_cmd)
 {
 	struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd);
+	struct iscsi_conn *conn = cmd->conn;
 
 	cmd->i_state = ISTATE_SEND_STATUS;
 
 	if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
-		iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd->i_state);
-		return 0;
+		return iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
 	}
-	cmd->conn->conn_transport->iscsit_queue_status(cmd->conn, cmd);
-
-	return 0;
+	return conn->conn_transport->iscsit_queue_status(conn, cmd);
 }
 
 static void lio_queue_tm_rsp(struct se_cmd *se_cmd)
diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c
index 1f38177..eb9c49a 100644
--- a/drivers/target/iscsi/iscsi_target_util.c
+++ b/drivers/target/iscsi/iscsi_target_util.c
@@ -565,7 +565,7 @@ static void iscsit_remove_cmd_from_immediate_queue(
 	}
 }
 
-void iscsit_add_cmd_to_response_queue(
+int iscsit_add_cmd_to_response_queue(
 	struct iscsi_cmd *cmd,
 	struct iscsi_conn *conn,
 	u8 state)
@@ -576,7 +576,7 @@ void iscsit_add_cmd_to_response_queue(
 	if (!qr) {
 		pr_err("Unable to allocate memory for"
 			" struct iscsi_queue_req\n");
-		return;
+		return -ENOMEM;
 	}
 	INIT_LIST_HEAD(&qr->qr_list);
 	qr->cmd = cmd;
@@ -588,6 +588,7 @@ void iscsit_add_cmd_to_response_queue(
 	spin_unlock_bh(&conn->response_queue_lock);
 
 	wake_up(&conn->queues_wq);
+	return 0;
 }
 
 struct iscsi_queue_req *iscsit_get_cmd_from_response_queue(struct iscsi_conn *conn)
diff --git a/drivers/target/iscsi/iscsi_target_util.h b/drivers/target/iscsi/iscsi_target_util.h
index 995f1cb..d96ec7d 100644
--- a/drivers/target/iscsi/iscsi_target_util.h
+++ b/drivers/target/iscsi/iscsi_target_util.h
@@ -23,7 +23,7 @@ extern int iscsit_find_cmd_for_recovery(struct iscsi_session *, struct iscsi_cmd
 			struct iscsi_conn_recovery **, itt_t);
 extern void iscsit_add_cmd_to_immediate_queue(struct iscsi_cmd *, struct iscsi_conn *, u8);
 extern struct iscsi_queue_req *iscsit_get_cmd_from_immediate_queue(struct iscsi_conn *);
-extern void iscsit_add_cmd_to_response_queue(struct iscsi_cmd *, struct iscsi_conn *, u8);
+extern int iscsit_add_cmd_to_response_queue(struct iscsi_cmd *, struct iscsi_conn *, u8);
 extern struct iscsi_queue_req *iscsit_get_cmd_from_response_queue(struct iscsi_conn *);
 extern void iscsit_remove_cmd_from_tx_queues(struct iscsi_cmd *, struct iscsi_conn *);
 extern bool iscsit_conn_all_queues_empty(struct iscsi_conn *);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] iser-target: Fix queue-full response handling
       [not found] ` <1477891554-26222-1-git-send-email-nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
@ 2016-10-31  5:25   ` Nicholas A. Bellinger
  2017-03-23  5:36   ` [PATCH 0/3] target: Fix queue-full callback error signaling Potnuri Bharat Teja
  1 sibling, 0 replies; 7+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-31  5:25 UTC (permalink / raw)
  To: target-devel
  Cc: linux-rdma, Potnuri Bharat Teja, Steve Wise, Sagi Grimberg,
	Nicholas Bellinger

From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>

This patch addresses two queue-full handling bugs in iser-target.

The first is propagating isert_rdma_rw_ctx_post() return back
to target-core via isert_put_datain() + isert_get_dataout()
callbacks, in order to trigger queue-full logic in target-core.
Note target-core expects -EAGAIN or -ENOMEM error to signal
RDMA WRITE/READ data-transfer callbacks should be retried,
after queue-full logic been invoked.

Other types of errors propagated up from RDMA RW API will result
in target-core generating internal CHECK_CONDITION status,
avoiding subsequent isert_put_datain() and isert_get_dataout()
iscsit_transport callback retry attempts.

The second is to use transport_generic_request_failure()
during T10-PI hw-offload errors in isert_rdma_write_done()
and isert_rdma_read_done(), so CHECK_CONDITION queue-full
is handled internally by target-core.

Also add isert_put_response() T10-PI failure case fixme in
isert_rdma_write_done(), which is currently not internally
retried or released until session reinstatement.

Reported-by: Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Cc: Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Reported-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Signed-off-by: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 53 ++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 6dd43f6..e658248 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -1666,10 +1666,23 @@
 	ret = isert_check_pi_status(cmd, isert_cmd->rw.sig->sig_mr);
 	isert_rdma_rw_ctx_destroy(isert_cmd, isert_conn);
 
-	if (ret)
-		transport_send_check_condition_and_sense(cmd, cmd->pi_err, 0);
-	else
-		isert_put_response(isert_conn->conn, isert_cmd->iscsi_cmd);
+	if (ret) {
+		/*
+		 * transport_generic_request_failure() expects to have
+		 * plus two references to handle queue-full, so re-add
+		 * one here as target-core will have already dropped
+		 * it after the first isert_put_datain() callback.
+		 */
+		kref_get(&cmd->cmd_kref);
+		transport_generic_request_failure(cmd, cmd->pi_err);
+	} else {
+		/*
+		 * XXX: isert_put_response() failure is not retried.
+		 */
+		ret = isert_put_response(isert_conn->conn, isert_cmd->iscsi_cmd);
+		if (ret)
+			pr_warn_ratelimited("isert_put_response() ret: %d\n", ret);
+	}
 }
 
 static void
@@ -1706,13 +1719,15 @@
 	cmd->i_state = ISTATE_RECEIVED_LAST_DATAOUT;
 	spin_unlock_bh(&cmd->istate_lock);
 
-	if (ret) {
-		target_put_sess_cmd(se_cmd);
-		transport_send_check_condition_and_sense(se_cmd,
-							 se_cmd->pi_err, 0);
-	} else {
+	/*
+	 * transport_generic_request_failure() will drop the extra
+	 * se_cmd->cmd_kref reference after T10-PI error, and handle
+	 * any non-zero ->queue_status() callback error retries.
+	 */
+	if (ret)
+		transport_generic_request_failure(se_cmd, se_cmd->pi_err);
+	else
 		target_execute_cmd(se_cmd);
-	}
 }
 
 static void
@@ -2172,26 +2187,28 @@
 		chain_wr = &isert_cmd->tx_desc.send_wr;
 	}
 
-	isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
-	isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ\n", isert_cmd);
-	return 1;
+	rc = isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+	isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ rc: %d\n",
+		  isert_cmd, rc);
+	return rc;
 }
 
 static int
 isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 {
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
+	int ret;
 
 	isert_dbg("Cmd: %p RDMA_READ data_length: %u write_data_done: %u\n",
 		 isert_cmd, cmd->se_cmd.data_length, cmd->write_data_done);
 
 	isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
-	isert_rdma_rw_ctx_post(isert_cmd, conn->context,
-			&isert_cmd->tx_desc.tx_cqe, NULL);
+	ret = isert_rdma_rw_ctx_post(isert_cmd, conn->context,
+				     &isert_cmd->tx_desc.tx_cqe, NULL);
 
-	isert_dbg("Cmd: %p posted RDMA_READ memory for ISER Data WRITE\n",
-		 isert_cmd);
-	return 0;
+	isert_dbg("Cmd: %p posted RDMA_READ memory for ISER Data WRITE rc: %d\n",
+		 isert_cmd, ret);
+	return ret;
 }
 
 static int
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] target: Fix queue-full callback error signaling
       [not found] ` <1477891554-26222-1-git-send-email-nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
  2016-10-31  5:25   ` [PATCH 3/3] iser-target: Fix queue-full response handling Nicholas A. Bellinger
@ 2017-03-23  5:36   ` Potnuri Bharat Teja
  2017-03-31  4:32     ` Nicholas A. Bellinger
  1 sibling, 1 reply; 7+ messages in thread
From: Potnuri Bharat Teja @ 2017-03-23  5:36 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: target-devel, linux-rdma, SWise OGC, Sagi Grimberg

On Monday, October 10/31/16, 2016 at 10:55:51 +0530, Nicholas A. Bellinger wrote:
>    From: Nicholas Bellinger <nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
Hi Nicholas,
This patch series fixes the post failures due to Send Queue overflow when
tested along with Sagi's "iser-target: avoid posting a recv buffer twice", which 
goes on top of these 3 patches.

Combined patches series(Nicholas's, Sagi's) works fine.
Tested with iWARP iser. 

As these changes are critical for devices with lower sg resources, hoping this 
series will be queued for next RC.

Reviewed-by: Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Tested-by: Potnuri Bharat Teja <bharat-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

Thanks,
Bharat.
> 
>    Hi folks,
> 
>    This series contains target-core queue-full + iscsi-target
>    callback fixes for a bug reported by Steve and Potnuri
>    during recent v4.8.y iWARP/iser-target testing.
> 
>    The first patch fixes target-core queue-full handling response
>    leaks with non -EAGAIN / -ENOMEM errors.  It uses a new state
>    TRANSPORT_COMPLETE_QF_ERR to internally generate CHECK_CONDITION
>    for unknown fabric callback errors, to avoid attempting retry
>    of fabric data-transfer callbacks for this special case.
> 
>    This means all non -EAGAIN / -ENOMEM fabric callback errors
>    during target_core_fabric_ops for:
> 
>      *) ->write_pending()
>      *) ->queue_data_in()
>      *) ->queue_status()
> 
>    will result in CHECK_CONDITION + LOGICAL_UNIT_COMMUNICATION_FAILURE,
>    if no non-zero se_cmd->scsi_status was previously set.
>    It also means target-core ->queue_status() errors retry indefinately,
>    or until session shutdown explicitly stops outstanding I/O.
> 
>    The remaining changes are for propagating iscsit_transport
>    response failure back to target-core queue-full, and updating
>    iser-target to propagate isert_rdma_rw_ctx_post() errors from
>    RDMA R/W API back to target-core as well.
> 
>    Please review.
> 
>    --nab
> 
>    Nicholas Bellinger (3):
>      target: Fix unknown fabric callback queue-full errors
>      iscsi-target: Propigate queue_data_in + queue_status errors
>      iser-target: Fix queue-full response handling
> 
>     drivers/infiniband/ulp/isert/ib_isert.c      |  53 +++++++++-----
>     drivers/target/iscsi/iscsi_target.c          |   3 +-
>     drivers/target/iscsi/iscsi_target_configfs.c |  13 ++--
>     drivers/target/iscsi/iscsi_target_util.c     |   5 +-
>     drivers/target/iscsi/iscsi_target_util.h     |   2 +-
>     drivers/target/target_core_transport.c       | 102
>    ++++++++++++++++++---------
>     include/target/target_core_base.h            |   1 +
>     7 files changed, 114 insertions(+), 65 deletions(-)
> 
>    --
>    1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] target: Fix queue-full callback error signaling
  2017-03-23  5:36   ` [PATCH 0/3] target: Fix queue-full callback error signaling Potnuri Bharat Teja
@ 2017-03-31  4:32     ` Nicholas A. Bellinger
  2017-04-03  6:23       ` Potnuri Bharat Teja
  0 siblings, 1 reply; 7+ messages in thread
From: Nicholas A. Bellinger @ 2017-03-31  4:32 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: target-devel, linux-rdma, SWise OGC, Sagi Grimberg

Hi Bharat & Co,

On Thu, 2017-03-23 at 11:06 +0530, Potnuri Bharat Teja wrote:
> On Monday, October 10/31/16, 2016 at 10:55:51 +0530, Nicholas A. Bellinger wrote:
> >    From: Nicholas Bellinger <nab@linux-iscsi.org>
> Hi Nicholas,
> This patch series fixes the post failures due to Send Queue overflow when
> tested along with Sagi's "iser-target: avoid posting a recv buffer twice", which 
> goes on top of these 3 patches.
> 
> Combined patches series(Nicholas's, Sagi's) works fine.
> Tested with iWARP iser. 
> 
> As these changes are critical for devices with lower sg resources, hoping this 
> series will be queued for next RC.
> 
> Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
> Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>

As mentioned, the original three patches along with Sagi's have been
applied to target-pending/master.

Thank you for your testing and patch review.

I'm still considering if they should be CC'ed for stable, considering
AFAICT they are pretty much a requirement for iser-target to be stable
with iw_cxgb4.  It should be OK to include the series back to v4.1.y
stable, but I'm reviewing the target change again just to be sure.

Also, I've been thinking a bit more about the hung tasks you reported
earlier during session reinstatement + repeated queue full preceding
Sagi's patch here:

http://www.spinics.net/lists/target-devel/msg14868.html

and have some clarity as to what was going on..

The issue is that queue full handling for TFO->write_pending(),
TFO->queue_data_in() and TFO->queue_status failures will retry
indefinitely until they successfully complete, and currently do not
honor se_cmd quiesce that occurs during normal session reinstatement.

Which means that if a fabric driver fails any of these three TFO
callbacks forever, the se_cmd descriptor will never be properly quiesced
back to fabric driver code, resulting in hung tasks you observed while
session shutdown waits for this action to complete.

So as-is, this patch series doesn't address this particular scenario but
doesn't make it any worse either..  ;)

That said, it's fine to merge this series now to allow iser-target +
iw_cxgb4 to function, and I'll be looking into addressing the infinite
queue_full + session reinstatement scenario for target-core in separate
patch series.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] target: Fix queue-full callback error signaling
  2017-03-31  4:32     ` Nicholas A. Bellinger
@ 2017-04-03  6:23       ` Potnuri Bharat Teja
  0 siblings, 0 replies; 7+ messages in thread
From: Potnuri Bharat Teja @ 2017-04-03  6:23 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: target-devel, linux-rdma, SWise OGC, Sagi Grimberg

On Friday, March 03/31/17, 2017 at 10:02:09 +0530, Nicholas A. Bellinger wrote:
Hi Nicholas,
>    Hi Bharat & Co,
> 
>    On Thu, 2017-03-23 at 11:06 +0530, Potnuri Bharat Teja wrote:
>    > On Monday, October 10/31/16, 2016 at 10:55:51 +0530, Nicholas A.
>    Bellinger wrote:
>    > >    From: Nicholas Bellinger <nab@linux-iscsi.org>
>    > Hi Nicholas,
>    > This patch series fixes the post failures due to Send Queue overflow
>    when
>    > tested along with Sagi's "iser-target: avoid posting a recv buffer
>    twice", which
>    > goes on top of these 3 patches.
>    >
>    > Combined patches series(Nicholas's, Sagi's) works fine.
>    > Tested with iWARP iser.
>    >
>    > As these changes are critical for devices with lower sg resources,
>    hoping this
>    > series will be queued for next RC.
>    >
>    > Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
>    > Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
> 
>    As mentioned, the original three patches along with Sagi's have been
>    applied to target-pending/master.
Thanks a lot!
> 
>    Thank you for your testing and patch review.
> 
>    I'm still considering if they should be CC'ed for stable, considering
>    AFAICT they are pretty much a requirement for iser-target to be stable
>    with iw_cxgb4.  It should be OK to include the series back to v4.1.y
>    stable, but I'm reviewing the target change again just to be sure.
Yes, It would be nice to have them backported to v4.1
> 
>    Also, I've been thinking a bit more about the hung tasks you reported
>    earlier during session reinstatement + repeated queue full preceding
>    Sagi's patch here:
> 
>    [1]http://www.spinics.net/lists/target-devel/msg14868.html
> 
>    and have some clarity as to what was going on..
> 
>    The issue is that queue full handling for TFO->write_pending(),
>    TFO->queue_data_in() and TFO->queue_status failures will retry
>    indefinitely until they successfully complete, and currently do not
>    honor se_cmd quiesce that occurs during normal session reinstatement.
> 
>    Which means that if a fabric driver fails any of these three TFO
>    callbacks forever, the se_cmd descriptor will never be properly quiesced
>    back to fabric driver code, resulting in hung tasks you observed while
>    session shutdown waits for this action to complete.
> 
>    So as-is, this patch series doesn't address this particular scenario but
>    doesn't make it any worse either..  ;)
> 
>    That said, it's fine to merge this series now to allow iser-target +
>    iw_cxgb4 to function, and I'll be looking into addressing the infinite
>    queue_full + session reinstatement scenario for target-core in separate
>    patch series.
Thanks, I shall get back to you with some more debug in the above scenario.
> 
>    --
>    To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>    the body of a message to majordomo@vger.kernel.org
>    More majordomo info at  [2]http://vger.kernel.org/majordomo-info.html
> 
> References
> 
>    Visible links
>    1. http://www.spinics.net/lists/target-devel/msg14868.html
>    2. http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-03  6:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-31  5:25 [PATCH 0/3] target: Fix queue-full callback error signaling Nicholas A. Bellinger
2016-10-31  5:25 ` [PATCH 1/3] target: Fix unknown fabric callback queue-full errors Nicholas A. Bellinger
2016-10-31  5:25 ` [PATCH 2/3] iscsi-target: Propigate queue_data_in + queue_status errors Nicholas A. Bellinger
     [not found] ` <1477891554-26222-1-git-send-email-nab-IzHhD5pYlfBP7FQvKIMDCQ@public.gmane.org>
2016-10-31  5:25   ` [PATCH 3/3] iser-target: Fix queue-full response handling Nicholas A. Bellinger
2017-03-23  5:36   ` [PATCH 0/3] target: Fix queue-full callback error signaling Potnuri Bharat Teja
2017-03-31  4:32     ` Nicholas A. Bellinger
2017-04-03  6:23       ` Potnuri Bharat Teja

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.