[PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted
       [not found] <20170215002612.14566-1-bart.vanassche@sandisk.com>
@ 2017-02-15  0:25 ` Bart Van Assche
  2017-02-20 21:38   ` Nicholas A. Bellinger
  2017-02-15  0:25 ` [PATCH v6 15/33] target: Avoid circular waits between LUN resets Bart Van Assche
  2017-02-15  0:25 ` [PATCH v6 19/33] target: Avoid that LUN reset sporadically triggers data corruption Bart Van Assche
  2 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-02-15  0:25 UTC (permalink / raw)
  To: Nicholas A . Bellinger; +Cc: target-devel, Bart Van Assche, stable

For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
transport_generic_free_cmd() causes RDMA completion processing to stall.
Hence only sleep inside this function if the (iSCSI) target driver
requires this.

This patch avoids that messages similar to the following appear in the
kernel log:

INFO: task kworker/u25:0:1013 blocked for more than 480 seconds.
      Tainted: G        W       4.10.0-rc7-dbg+ #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u25:0   D    0  1013      2 0x00000000
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Call Trace:
 __schedule+0x2da/0xb00
 schedule+0x38/0x90
 schedule_timeout+0x2fe/0x640
 wait_for_completion+0xfe/0x160
 transport_generic_free_cmd+0x2e/0x80 [target_core_mod]
 srpt_send_done+0x59/0x9f [ib_srpt]
 __ib_process_cq+0x4b/0xd0 [ib_core]
 ib_cq_poll_work+0x1b/0x60 [ib_core]
 process_one_work+0x208/0x6a0
 worker_thread+0x49/0x4a0
 kthread+0x107/0x140
 ret_from_fork+0x2e/0x40

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: <stable@vger.kernel.org>
---
 drivers/target/target_core_transport.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 2ed9721a7202..ab1c493a9a16 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2504,15 +2504,13 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 	int ret = 0;
 	bool aborted = false, tas = false;
 
-	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
-		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
-			target_wait_free_cmd(cmd, &aborted, &tas);
+	if (wait_for_tasks)
+		target_wait_free_cmd(cmd, &aborted, &tas);
 
+	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
 		if (!aborted || tas)
 			ret = transport_put_cmd(cmd);
 	} else {
-		if (wait_for_tasks)
-			target_wait_free_cmd(cmd, &aborted, &tas);
 		/*
 		 * Handle WRITE failure case where transport_generic_new_cmd()
 		 * has already added se_cmd to state_list, but fabric has
@@ -2535,7 +2533,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 	 */
 	if (aborted) {
 		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
-		wait_for_completion(&cmd->cmd_wait_comp);
 		cmd->se_tfo->release_cmd(cmd);
 		ret = 1;
 	}
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 15/33] target: Avoid circular waits between LUN resets
       [not found] <20170215002612.14566-1-bart.vanassche@sandisk.com>
  2017-02-15  0:25 ` [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted Bart Van Assche
@ 2017-02-15  0:25 ` Bart Van Assche
  2017-02-20 22:32   ` Nicholas A. Bellinger
  2017-02-15  0:25 ` [PATCH v6 19/33] target: Avoid that LUN reset sporadically triggers data corruption Bart Van Assche
  2 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-02-15  0:25 UTC (permalink / raw)
  To: Nicholas A . Bellinger
  Cc: target-devel, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, David Disseldorp, stable

If an initiator submits a LUN reset while a previous LUN reset is
still being processed, both LUN resets will be added to dev_tmr_list.
Avoid that this results in a circular wait between the two LUN resets
by removing a LUN reset from dev_tmr_list before scanning that list
for TMFs to wait for.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: c66ac9db8d4a ("target: Add LIO target core to kernel v2.6.38")
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Disseldorp <ddiss@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/target/target_core_tmr.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 5eb164dac5cc..5c671fa67339 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -230,13 +230,8 @@ static void core_tmr_drain_tmr_list(
 	 * LUN_RESET tmr..
 	 */
 	spin_lock_irqsave(&dev->se_tmr_lock, flags);
+	list_del_init(&tmr->tmr_list);
 	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
-		/*
-		 * Allow the received TMR to return with FUNCTION_COMPLETE.
-		 */
-		if (tmr_p == tmr)
-			continue;
-
 		cmd = tmr_p->task_cmd;
 		if (!cmd) {
 			pr_err("Unable to locate struct se_cmd for TMR\n");
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 19/33] target: Avoid that LUN reset sporadically triggers data corruption
       [not found] <20170215002612.14566-1-bart.vanassche@sandisk.com>
  2017-02-15  0:25 ` [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted Bart Van Assche
  2017-02-15  0:25 ` [PATCH v6 15/33] target: Avoid circular waits between LUN resets Bart Van Assche
@ 2017-02-15  0:25 ` Bart Van Assche
  2017-02-20 23:52   ` Nicholas A. Bellinger
  2 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-02-15  0:25 UTC (permalink / raw)
  To: Nicholas A . Bellinger
  Cc: target-devel, Bart Van Assche, Hannes Reinecke,
	Christoph Hellwig, Andy Grover, David Disseldorp, stable

If on an initiator system a LUN reset is issued while I/O is in
progress with queue depth > 1, avoid that data corruption occurs
as follows:
- The initiator submits a READ (a).
- The initiator submits a LUN reset before READ (a) completes.
- The target responds that the LUN reset succeeded after READ (a)
  has been marked as CMD_T_COMPLETE and before .queue_status() has
  been called.
- The initiator receives the LUN reset response and frees the
  tag used by READ (a).
- The initiator submits READ (b) and reuses the tag of READ (a).
- The initiator receives the response for READ (a) and interprets
  this as a completion for READ (b).
- The initiator receives the completion for READ (b) and discards
  it.

With the SRP initiator and target drivers and when running fio
concurrently with sg_reset -d it only takes a few minutes to
reproduce this.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: commit febe562c20df ("target: Fix LUN_RESET active I/O handling for ACK_KREF")
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/target/target_core_tmr.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 32ea7c61d6ac..16e748eb32d2 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -109,7 +109,7 @@ static int target_check_cdb_and_preempt(struct list_head *list,
 	return 1;
 }
 
-static bool __target_check_io_state(struct se_cmd *se_cmd,
+static bool __target_check_io_state(struct se_cmd *se_cmd, u32 skip_flags,
 				    struct se_session *tmr_sess, int tas)
 {
 	struct se_session *sess = se_cmd->se_sess;
@@ -127,7 +127,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
 	 * long as se_cmd->cmd_kref is still active unless zero.
 	 */
 	spin_lock(&se_cmd->t_state_lock);
-	if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) {
+	if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) {
 		pr_debug("Attempted to abort io tag: %llu already complete or"
 			" fabric stop, skipping\n", se_cmd->tag);
 		spin_unlock(&se_cmd->t_state_lock);
@@ -182,7 +182,8 @@ void core_tmr_abort_task(
 		printk("ABORT_TASK: Found referenced %s task_tag: %llu\n",
 			se_cmd->se_tfo->get_fabric_name(), ref_tag);
 
-		if (!__target_check_io_state(se_cmd, se_sess, 0))
+		if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE, se_sess,
+					     0))
 			continue;
 
 		list_del_init(&se_cmd->se_cmd_list);
@@ -354,7 +355,7 @@ static void core_tmr_drain_state_list(
 			continue;
 
 		spin_lock(&sess->sess_cmd_lock);
-		rc = __target_check_io_state(cmd, tmr_sess, tas);
+		rc = __target_check_io_state(cmd, 0, tmr_sess, tas);
 		spin_unlock(&sess->sess_cmd_lock);
 		if (!rc)
 			continue;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted
  2017-02-15  0:25 ` [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted Bart Van Assche
@ 2017-02-20 21:38   ` Nicholas A. Bellinger
  2017-02-21 18:58     ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas A. Bellinger @ 2017-02-20 21:38 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: target-devel, stable

On Tue, 2017-02-14 at 16:25 -0800, Bart Van Assche wrote:
> For several target drivers (e.g. ib_srpt and ib_isert) sleeping inside
> transport_generic_free_cmd() causes RDMA completion processing to stall.
> Hence only sleep inside this function if the (iSCSI) target driver
> requires this.
> 
> This patch avoids that messages similar to the following appear in the
> kernel log:
> 
> INFO: task kworker/u25:0:1013 blocked for more than 480 seconds.
>       Tainted: G        W       4.10.0-rc7-dbg+ #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u25:0   D    0  1013      2 0x00000000
> Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
> Call Trace:
>  __schedule+0x2da/0xb00
>  schedule+0x38/0x90
>  schedule_timeout+0x2fe/0x640
>  wait_for_completion+0xfe/0x160
>  transport_generic_free_cmd+0x2e/0x80 [target_core_mod]
>  srpt_send_done+0x59/0x9f [ib_srpt]
>  __ib_process_cq+0x4b/0xd0 [ib_core]
>  ib_cq_poll_work+0x1b/0x60 [ib_core]
>  process_one_work+0x208/0x6a0
>  worker_thread+0x49/0x4a0
>  kthread+0x107/0x140
>  ret_from_fork+0x2e/0x40
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/target/target_core_transport.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 2ed9721a7202..ab1c493a9a16 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2504,15 +2504,13 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	int ret = 0;
>  	bool aborted = false, tas = false;
>  
> -	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
> -		if (wait_for_tasks && (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
> -			target_wait_free_cmd(cmd, &aborted, &tas);
> +	if (wait_for_tasks)
> +		target_wait_free_cmd(cmd, &aborted, &tas);
>  
> +	if (!(cmd->se_cmd_flags & SCF_SE_LUN_CMD)) {
>  		if (!aborted || tas)
>  			ret = transport_put_cmd(cmd);
>  	} else {
> -		if (wait_for_tasks)
> -			target_wait_free_cmd(cmd, &aborted, &tas);
>  		/*
>  		 * Handle WRITE failure case where transport_generic_new_cmd()
>  		 * has already added se_cmd to state_list, but fabric has
> @@ -2535,7 +2533,6 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	 */
>  	if (aborted) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
> -		wait_for_completion(&cmd->cmd_wait_comp);
>  		cmd->se_tfo->release_cmd(cmd);
>  		ret = 1;
>  	}

Removing the aborted wait_for_completion breaks the second order
scenario described for iscsi-target previously here:

http://www.spinics.net/lists/target-devel/msg14456.html

where a concurrent CMD_T_ABORTED occurs while an iscsi connection is
shutting down, and transport_generic_free_cmd() must not return until
cmd->cmd_kref reaches zero and does the complete.

However, looking at transport_generic_free_cmd() in the context of
ib_srpt and ib_isert, I don't see how this scenario can ever happen in
as-is in upstream code.

Namely, all of the transport_generic_free_cmd() callers in ib_srpt and
ib_isert pass wait_for_tasks = false.  And the only time
transport_generic_free_cmd() will block is when wait_for_tasks = true,
or when wait_for_tasks = true && abort = true.

So as-is in upstream, how can transport_generic_free_cmd() ever block
when wait_for_tasks = false..?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 15/33] target: Avoid circular waits between LUN resets
  2017-02-15  0:25 ` [PATCH v6 15/33] target: Avoid circular waits between LUN resets Bart Van Assche
@ 2017-02-20 22:32   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas A. Bellinger @ 2017-02-20 22:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: target-devel, Hannes Reinecke, Christoph Hellwig,
	David Disseldorp, stable

On Tue, 2017-02-14 at 16:25 -0800, Bart Van Assche wrote:
> If an initiator submits a LUN reset while a previous LUN reset is
> still being processed, both LUN resets will be added to dev_tmr_list.
> Avoid that this results in a circular wait between the two LUN resets
> by removing a LUN reset from dev_tmr_list before scanning that list
> for TMFs to wait for.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Fixes: c66ac9db8d4a ("target: Add LIO target core to kernel v2.6.38")
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: David Disseldorp <ddiss@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/target/target_core_tmr.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 5eb164dac5cc..5c671fa67339 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -230,13 +230,8 @@ static void core_tmr_drain_tmr_list(
>  	 * LUN_RESET tmr..
>  	 */
>  	spin_lock_irqsave(&dev->se_tmr_lock, flags);
> +	list_del_init(&tmr->tmr_list);
>  	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
> -		/*
> -		 * Allow the received TMR to return with FUNCTION_COMPLETE.
> -		 */
> -		if (tmr_p == tmr)
> -			continue;
> -
>  		cmd = tmr_p->task_cmd;
>  		if (!cmd) {
>  			pr_err("Unable to locate struct se_cmd for TMR\n");

As-is I don't think this is a stand-alone upstream bug-fix without patch
#7, that allows multiple target_tmr_work() execute concurrently.

That is because target_tmr_work() only executes one context at a time
since se_device->tmr_wq is using WQ_UNBOUND w/ max_active = 1, so this
is not an issue.

However, I think it's a reasonable improvement on it's own, so I'll
apply it to target-pending/for-next with a slightly different
description and without the stable CC.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 19/33] target: Avoid that LUN reset sporadically triggers data corruption
  2017-02-15  0:25 ` [PATCH v6 19/33] target: Avoid that LUN reset sporadically triggers data corruption Bart Van Assche
@ 2017-02-20 23:52   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas A. Bellinger @ 2017-02-20 23:52 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: target-devel, Hannes Reinecke, Christoph Hellwig, Andy Grover,
	David Disseldorp, stable

On Tue, 2017-02-14 at 16:25 -0800, Bart Van Assche wrote:
> If on an initiator system a LUN reset is issued while I/O is in
> progress with queue depth > 1, avoid that data corruption occurs
> as follows:
> - The initiator submits a READ (a).
> - The initiator submits a LUN reset before READ (a) completes.
> - The target responds that the LUN reset succeeded after READ (a)
>   has been marked as CMD_T_COMPLETE and before .queue_status() has
>   been called.
> - The initiator receives the LUN reset response and frees the
>   tag used by READ (a).
> - The initiator submits READ (b) and reuses the tag of READ (a).
> - The initiator receives the response for READ (a) and interprets
>   this as a completion for READ (b).
> - The initiator receives the completion for READ (b) and discards
>   it.
> 
> With the SRP initiator and target drivers and when running fio
> concurrently with sg_reset -d it only takes a few minutes to
> reproduce this.
> 

Now this is an interesting one.

AFAICT this can happen regardless of the previous changes in this
series, so I'll treat it as a stand-alone bug-fix.

Comments below.

> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Fixes: commit febe562c20df ("target: Fix LUN_RESET active I/O handling for ACK_KREF")
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Andy Grover <agrover@redhat.com>
> Cc: David Disseldorp <ddiss@suse.de>
> Cc: <stable@vger.kernel.org>
> ---
>  drivers/target/target_core_tmr.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 32ea7c61d6ac..16e748eb32d2 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -109,7 +109,7 @@ static int target_check_cdb_and_preempt(struct list_head *list,
>  	return 1;
>  }
>  
> -static bool __target_check_io_state(struct se_cmd *se_cmd,
> +static bool __target_check_io_state(struct se_cmd *se_cmd, u32 skip_flags,
>  				    struct se_session *tmr_sess, int tas)
>  {
>  	struct se_session *sess = se_cmd->se_sess;
> @@ -127,7 +127,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
>  	 * long as se_cmd->cmd_kref is still active unless zero.
>  	 */
>  	spin_lock(&se_cmd->t_state_lock);
> -	if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) {
> +	if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) {
>  		pr_debug("Attempted to abort io tag: %llu already complete or"
>  			" fabric stop, skipping\n", se_cmd->tag);
>  		spin_unlock(&se_cmd->t_state_lock);
> @@ -182,7 +182,8 @@ void core_tmr_abort_task(
>  		printk("ABORT_TASK: Found referenced %s task_tag: %llu\n",
>  			se_cmd->se_tfo->get_fabric_name(), ref_tag);
>  
> -		if (!__target_check_io_state(se_cmd, se_sess, 0))
> +		if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE, se_sess,
> +					     0))
>  			continue;
>  
>  		list_del_init(&se_cmd->se_cmd_list);
> @@ -354,7 +355,7 @@ static void core_tmr_drain_state_list(
>  			continue;
>  
>  		spin_lock(&sess->sess_cmd_lock);
> -		rc = __target_check_io_state(cmd, tmr_sess, tas);
> +		rc = __target_check_io_state(cmd, 0, tmr_sess, tas);
>  		spin_unlock(&sess->sess_cmd_lock);
>  		if (!rc)
>  			continue;

As-is just ignoring CMD_T_COMPLETE in core_tmr_drain_state_list() is not
enough to address this bug and not cause other issues, because once a
se_cmd descriptor is handed back to the fabric driver after
transport_cmd_check_stop_to_fabric() is called,
__target_check_io_state() must not attempt to abort the descriptor.

That said, here is how I'd like to address this particular bug.

1) Allow CMD_T_COMPLETE to occur, but still ignore se_cmds that have
already called transport_cmd_check_stop_to_fabric().  Eg: CMD_T_ACTIVE
is not set, but CMD_T_SENT is set.

2) Since transport_complete_ok() can execute while CMD_T_ABORTED is set,
they both need to check for aborted status, and immediately invoke
transport_cmd_check_stop_to_fabric() if detected.

Here is a first pass at a patch that does both of these things.

Since ib_srpt appears to be the only one who can trigger this bug since
it uses fixed tags, would you be so kind to try to reproduce with this
stand-alone patch..?

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index a806d9b..9ba1427 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -109,25 +109,43 @@ static int target_check_cdb_and_preempt(struct list_head *list,
 	return 1;
 }
 
+static bool target_check_abort_state(struct se_cmd *se_cmd)
+{
+	return (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP));
+}
+
+static bool target_check_lur_state(struct se_cmd *se_cmd)
+{
+	return ((se_cmd->transport_state & CMD_T_FABRIC_STOP) ||
+		(!(se_cmd->transport_state & CMD_T_ACTIVE) &&
+		(se_cmd->transport_state & CMD_T_SENT)));
+}
+
 static bool __target_check_io_state(struct se_cmd *se_cmd,
-				    struct se_session *tmr_sess, int tas)
+				    struct se_session *tmr_sess, int tas,
+				    bool (*check_transport_state)(struct se_cmd *))
 {
 	struct se_session *sess = se_cmd->se_sess;
 
 	assert_spin_locked(&sess->sess_cmd_lock);
 	WARN_ON_ONCE(!irqs_disabled());
 	/*
-	 * If command already reached CMD_T_COMPLETE state within
-	 * target_complete_cmd() or CMD_T_FABRIC_STOP due to shutdown,
-	 * this se_cmd has been passed to fabric driver and will
-	 * not be aborted.
+	 * For ABORT_TASK, if command already reached CMD_T_COMPLETE
+	 * state within target_complete_cmd() or CMD_T_FABRIC_STOP
+	 * due to shutdown, this se_cmd has been passed to fabric
+	 * driver and will not be aborted.
+	 *
+	 * For LUN_RESET, this is checked using !CMD_T_ACTIVE and
+	 * CMD_T_SENT to determine if se_cmd has already been
+	 * handed off to fabric driver code, and abort here needs
+	 * to be ignored.
 	 *
 	 * Otherwise, obtain a local se_cmd->cmd_kref now for TMR
 	 * ABORT_TASK + LUN_RESET for CMD_T_ABORTED processing as
 	 * long as se_cmd->cmd_kref is still active unless zero.
 	 */
 	spin_lock(&se_cmd->t_state_lock);
-	if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) {
+	if (check_transport_state(se_cmd)) {
 		pr_debug("Attempted to abort io tag: %llu already complete or"
 			" fabric stop, skipping\n", se_cmd->tag);
 		spin_unlock(&se_cmd->t_state_lock);
@@ -175,7 +193,8 @@ void core_tmr_abort_task(
 		printk("ABORT_TASK: Found referenced %s task_tag: %llu\n",
 			se_cmd->se_tfo->get_fabric_name(), ref_tag);
 
-		if (!__target_check_io_state(se_cmd, se_sess, 0))
+		if (!__target_check_io_state(se_cmd, se_sess, 0,
+					     target_check_abort_state))
 			continue;
 
 		list_del_init(&se_cmd->se_cmd_list);
@@ -339,7 +358,8 @@ static void core_tmr_drain_state_list(
 			continue;
 
 		spin_lock(&sess->sess_cmd_lock);
-		rc = __target_check_io_state(cmd, tmr_sess, tas);
+		rc = __target_check_io_state(cmd, tmr_sess, tas,
+					     target_check_lur_state);
 		spin_unlock(&sess->sess_cmd_lock);
 		if (!rc)
 			continue;
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index efb9e6f..246995a 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2082,6 +2082,10 @@ static void target_complete_ok_work(struct work_struct *work)
 	 */
 	if (cmd->se_cmd_flags & SCF_TRANSPORT_TASK_SENSE) {
 		WARN_ON(!cmd->scsi_status);
+
+		if (cmd->transport_state & CMD_T_ABORTED)
+			goto queue_out;
+
 		ret = transport_send_check_condition_and_sense(
 					cmd, 0, 1);
 		if (ret == -EAGAIN || ret == -ENOMEM)
@@ -2108,6 +2112,9 @@ static void target_complete_ok_work(struct work_struct *work)
 
 			return;
 		} else if (rc) {
+			if (cmd->transport_state & CMD_T_ABORTED)
+				goto queue_out;
+
 			ret = transport_send_check_condition_and_sense(cmd,
 						rc, 0);
 			if (ret == -EAGAIN || ret == -ENOMEM)
@@ -2120,6 +2127,9 @@ static void target_complete_ok_work(struct work_struct *work)
 	}
 
 queue_rsp:
+	if (cmd->transport_state & CMD_T_ABORTED)
+		goto queue_out;
+
 	switch (cmd->data_direction) {
 	case DMA_FROM_DEVICE:
 		if (cmd->scsi_status)
@@ -2174,6 +2184,7 @@ static void target_complete_ok_work(struct work_struct *work)
 		break;
 	}
 
+queue_out:
 	transport_lun_remove_cmd(cmd);
 	transport_cmd_check_stop_to_fabric(cmd);
 	return;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted
  2017-02-20 21:38   ` Nicholas A. Bellinger
@ 2017-02-21 18:58     ` Bart Van Assche
  2017-03-02  5:21       ` Nicholas A. Bellinger
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-02-21 18:58 UTC (permalink / raw)
  To: Nicholas A. Bellinger; +Cc: target-devel, stable

On 02/20/2017 01:38 PM, Nicholas A. Bellinger wrote:
> However, looking at transport_generic_free_cmd() in the context of
> ib_srpt and ib_isert, I don't see how this scenario can ever happen in
> as-is in upstream code.
> 
> Namely, all of the transport_generic_free_cmd() callers in ib_srpt and
> ib_isert pass wait_for_tasks = false.  And the only time
> transport_generic_free_cmd() will block is when wait_for_tasks = true,
> or when wait_for_tasks = true && abort = true.
> 
> So as-is in upstream, how can transport_generic_free_cmd() ever block
> when wait_for_tasks = false..?

I will check the other patches in this series for changes that trigger a
call to wait_for_completion() if wait_for_tasks == false.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted
  2017-02-21 18:58     ` Bart Van Assche
@ 2017-03-02  5:21       ` Nicholas A. Bellinger
  2017-03-02  5:24         ` Bart Van Assche
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas A. Bellinger @ 2017-03-02  5:21 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: target-devel, stable

On Tue, 2017-02-21 at 18:58 +0000, Bart Van Assche wrote:
> On 02/20/2017 01:38 PM, Nicholas A. Bellinger wrote:
> > However, looking at transport_generic_free_cmd() in the context of
> > ib_srpt and ib_isert, I don't see how this scenario can ever happen in
> > as-is in upstream code.
> > 
> > Namely, all of the transport_generic_free_cmd() callers in ib_srpt and
> > ib_isert pass wait_for_tasks = false.  And the only time
> > transport_generic_free_cmd() will block is when wait_for_tasks = true,
> > or when wait_for_tasks = true && abort = true.
> > 
> > So as-is in upstream, how can transport_generic_free_cmd() ever block
> > when wait_for_tasks = false..?
> 
> I will check the other patches in this series for changes that trigger a
> call to wait_for_completion() if wait_for_tasks == false.
> 

Given there has not been any follow-up to identify a case in upstream
where ib_isert or ib_srpt is using target_generic_free_cmd() with
wait_for_tasks = true or a case where wait_for_tasks = false blocks,
I'll conclude this was a bugfix incorrectly CC'ed to stable for breakage
introduced by other changes in this series.

To reiterate the importance of having bug-fixes, especially those
intended for stable, always be leading other patches..

There is no way for a maintainer to know which bug-fixes are to existing
code unless they precede all other patches and not intermixed with
various other changes.  The bug-fixes to existing upstream code need to
be tested on their own without the other changes (especially those that
effect the same area) invalidating the tests.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted
  2017-03-02  5:21       ` Nicholas A. Bellinger
@ 2017-03-02  5:24         ` Bart Van Assche
  2017-03-02  7:02           ` Nicholas A. Bellinger
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Van Assche @ 2017-03-02  5:24 UTC (permalink / raw)
  To: nab; +Cc: target-devel, stable

On Wed, 2017-03-01 at 21:21 -0800, Nicholas A. Bellinger wrote:
> To reiterate the importance of having bug-fixes, especially those
> intended for stable, always be leading other patches..

You should know that I can't group all bugfixes at the start of the series
because some of the bugfixes depend on patches that are not bugfixes.

Bart.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted
  2017-03-02  5:24         ` Bart Van Assche
@ 2017-03-02  7:02           ` Nicholas A. Bellinger
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas A. Bellinger @ 2017-03-02  7:02 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: target-devel, stable

On Thu, 2017-03-02 at 05:24 +0000, Bart Van Assche wrote:
> On Wed, 2017-03-01 at 21:21 -0800, Nicholas A. Bellinger wrote:
> > To reiterate the importance of having bug-fixes, especially those
> > intended for stable, always be leading other patches..
> 
> You should know that I can't group all bugfixes at the start of the series
> because some of the bugfixes depend on patches that are not bugfixes.
> 

That makes no sense.  Either it's a bug-fix to existing upstream code,
or it's not.

This patch was not a bug-fix to upstream, because it detailed a scenario
that doesn't existing in upstream.  That is, a case where ib_isert or
ib_srpt calls target_generic_free_cmd() with wait_for_tasks = true or a
case where wait_for_tasks = false blocks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-03-02  7:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170215002612.14566-1-bart.vanassche@sandisk.com>
2017-02-15  0:25 ` [PATCH v6 14/33] target: Avoid that target drivers hang if a command is aborted Bart Van Assche
2017-02-20 21:38   ` Nicholas A. Bellinger
2017-02-21 18:58     ` Bart Van Assche
2017-03-02  5:21       ` Nicholas A. Bellinger
2017-03-02  5:24         ` Bart Van Assche
2017-03-02  7:02           ` Nicholas A. Bellinger
2017-02-15  0:25 ` [PATCH v6 15/33] target: Avoid circular waits between LUN resets Bart Van Assche
2017-02-20 22:32   ` Nicholas A. Bellinger
2017-02-15  0:25 ` [PATCH v6 19/33] target: Avoid that LUN reset sporadically triggers data corruption Bart Van Assche
2017-02-20 23:52   ` Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.