All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: Konstantin Shelekhin <k.shelekhin@yadro.com>,
	target-devel@vger.kernel.org
Cc: linux@yadro.com, Maurizio Lombardi <mlombard@redhat.com>
Subject: Re: iSCSI Abort Task and WRITE PENDING
Date: Wed, 13 Oct 2021 13:05:26 -0500	[thread overview]
Message-ID: <b49d2f3b-4762-832f-55ff-7543e08f387b@oracle.com> (raw)
In-Reply-To: <17647b68-f039-3fc3-808e-0feb652ddf8c@oracle.com>

On 10/13/21 12:51 PM, Mike Christie wrote:
> On 10/13/21 8:21 AM, Konstantin Shelekhin wrote:
>> Hi,
>>
>> I really need the collective wisdom.
>>
>> Not long ago we've uncovered the problem with iSCSI and ABORT TASK
>> handling. Currently it's not possible to abort a WRITE_10 command in
>> TRANSPORT_WRITE_PENDING state, because ABORT TASK  will hang itself in
>> the process:
>>
>>   # dmesg | tail -2
>>   [   83.563505] ABORT_TASK: Found referenced iSCSI task_tag: 3372979269
>>   [   84.593545] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus <nexus>
>>
>>   # ps aux | awk '$8 ~/D/'
>>   root        32  0.0  0.0      0     0 ?        D    15:19   0:00 [kworker/0:1+events]
>>   root      1187  0.0  0.0      0     0 ?        D    15:20   0:00 [iscsi_ttx]
>>
>>   # cat /proc/32/stack
>>   [<0>] target_put_cmd_and_wait+0x68/0xa0
>>   [<0>] core_tmr_abort_task.cold+0x16b/0x192
>>   [<0>] target_tmr_work+0x9e/0xe0
>>   [<0>] process_one_work+0x1d4/0x370
>>   [<0>] worker_thread+0x48/0x3d0
>>   [<0>] kthread+0x122/0x140
>>   [<0>] ret_from_fork+0x22/0x30
>>
>>   # cat /proc/1187/stack
>>   [<0>] __transport_wait_for_tasks+0xaf/0x100
>>   [<0>] transport_generic_free_cmd+0xe9/0x180
>>   [<0>] iscsit_free_cmd+0x50/0xb0
>>   [<0>] iscsit_close_connection+0x47d/0x8c0
>>   [<0>] iscsit_take_action_for_connection_exit+0x6f/0xf0
>>   [<0>] iscsi_target_tx_thread+0x184/0x200
>>   [<0>] kthread+0x122/0x140
>>   [<0>] ret_from_fork+0x22/0x30
>>
>> What happens:
>>
>>   1. Initiator sends WRITE_10 CDB
>>   2. Target parses the CDB and sends R2T
>>   3. Target starts the Data-Out timer
>>   4. Initiator sends ABORT TASK; no new data from Initiator after this
>>   5. Target starts aborting WRITE_10, gets into core_tmr_abort_task()
>>      and starts waiting for the request completion
>>   6. Nothing happens
>>   7. The Data-Out timers expires, connection teardown starts and gets
>>      stuck waiting for ABORT TASK that waits for WRITE_10
>>
>> The ABORT TASK processing looks roughly like this:
>>
>>   iscsi_rx_opcode
>>     iscsi_handle_task_mgt_cmd
>>       iscsi_tmr_abort_task
>>       transport_generic_handle_tmr
>>         if (tmr_cmd->transport_state & CMD_T_ABORTED)
>>           target_handle_abort
>>         else
>>           target_tmr_work
>>             if (tmr_cmd->transport_state & CMD_T_ABORTED)
>>               target_handle_abort
>>             else
>>               core_tmr_abort_task
>>                 ret = __target_check_io_state
>>                   if (write_cmd->transport_state & CMD_T_STOP)
>>                     return -1
>>                   write_cmd->transport_state |= CMD_T_ABORTED
>>                   return 0
>>                 if (!ret)
>>                   list_move_tail(&write_cmd->state_list, &aborted)
>>                   target_put_cmd_and_wait(&write_cmd)
>>
>> As I see it, the main problem is that the abort path can't initiate the
>> command termination, it simply waits for the request to handle this on
>> the execution path like in target_execute_cmd():
>>
>>   target_execute_cmd
>>     target_cmd_interrupted
>>       INIT_WORK(&cmd->work, target_abort_work)
>>
>> However, in this case the request is not going to be executed because
>> Initiator will not send the Data-Out buffer.
>>
>> I have a couple of ideas on how to fix this, but they all look kinda
>> ugly. The one that currently works around this for me:
>>
>>   core_tmr_abort_task():
>>
>>     [...]
>>
>>     spin_lock_irqsave(&se_cmd->t_state_lock, flags);
>>     write_pending = se_cmd->t_state == TRANSPORT_WRITE_PENDING;
>>     spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
>>     
>>     if (write_pending && se_cmd->se_tfo->abort_write_pending)
>>             se_cmd->se_tfo->abort_write_pending(se_cmd);
>>     
>>     target_put_cmd_and_wait(se_cmd);
>>
>>     [...]
>>
>> The new method abort_write_pending() is defined only for iSCSI and calls
>> target_handle_abort(). However, this opens up another can of worms
>> because this code heavily races with R2T sending and requires a couple
>> of checks to "work most of the time". Not ideal, by far.
>>
>> I can make this one better by introducing R2T list draining that ensures
>> the proper order during cleanup, but maybe there is a much easier way
>> that I'm not seeing here.
> 
> Ccing Maurizio to make sure I don't add his original bug back.
> 
> If I understand you, I think I added this bug in:
> 
> commit f36199355c64a39fe82cfddc7623d827c7e050da
> Author: Mike Christie <michael.christie@oracle.com>
> Date:   Fri Nov 13 19:46:18 2020 -0600
> 
>     scsi: target: iscsi: Fix cmd abort fabric stop race
> 
> With that patch if the abort or a lun reset has got to lio core then we
> are going to be stuck waiting for the data which won't come because we
> killed the iscsi threads.
> 
> Can go back to always having the iscsi target clean up the cmd, but if
> LIO has started to abort the cmd we take an extra ref so we don't free
> the cmd from under each other.
> 
> This patch is completely untested:
> 
> 
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index 2c54c5d8412d..d221e9be7468 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -4090,12 +4090,13 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn)
>  			spin_lock_irq(&se_cmd->t_state_lock);
>  			if (se_cmd->transport_state & CMD_T_ABORTED) {
>  				/*
> -				 * LIO's abort path owns the cleanup for this,
> -				 * so put it back on the list and let
> -				 * aborted_task handle it.
> +				 * The LIO TMR handler owns the cmd but if
> +				 * we were waiting for data from the initiator
> +				 * then we need to internally cleanup to be
> +				 * able to complete it. Get an extra ref so
> +				 * we don't free the cmd from under LIO core.
>  				 */
> -				list_move_tail(&cmd->i_conn_node,
> -					       &conn->conn_cmd_list);
> +				target_get_sess_cmd(se_cmd, false);
>  			} else {
>  				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
>  			}
> 

Another alternative would be to have iscsi check if it was waiting on
data before sending a TMR to LIO core. If it is, then it would just cleanup
internally and complete the TMR with success/failed depending on the TMR or
drop it and let the initiator escalate.

  reply	other threads:[~2021-10-13 18:05 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-13 13:21 iSCSI Abort Task and WRITE PENDING Konstantin Shelekhin
2021-10-13 14:22 ` Hannes Reinecke
2021-10-13 14:53   ` Konstantin Shelekhin
2021-10-13 14:56     ` Konstantin Shelekhin
2021-10-14  7:09     ` Hannes Reinecke
2021-10-14  7:52       ` Konstantin Shelekhin
2021-10-13 17:51 ` Mike Christie
2021-10-13 18:05   ` Mike Christie [this message]
2021-10-13 18:11     ` Konstantin Shelekhin
2021-10-13 18:08   ` Konstantin Shelekhin
2021-10-13 18:24     ` Mike Christie
2021-10-13 18:30       ` Mike Christie
2021-10-13 18:58         ` Konstantin Shelekhin
2021-10-13 19:01           ` Konstantin Shelekhin
2021-10-13 20:21             ` Mike Christie
2021-10-14 23:12               ` Konstantin Shelekhin
2021-10-15  3:18                 ` michael.christie
2021-10-18 11:56                   ` Konstantin Shelekhin
2021-10-18 16:29                     ` Mike Christie
2021-10-18 17:08                       ` Mike Christie
2021-10-26 10:59                         ` Konstantin Shelekhin
2021-10-18 17:32                       ` Konstantin Shelekhin
2021-10-18 20:20                         ` Mike Christie
2021-10-18 20:34                           ` Mike Christie
2021-10-18 21:50                             ` Konstantin Shelekhin
2021-10-18 21:48                           ` Konstantin Shelekhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b49d2f3b-4762-832f-55ff-7543e08f387b@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=k.shelekhin@yadro.com \
    --cc=linux@yadro.com \
    --cc=mlombard@redhat.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.