All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Christie <michael.christie@oracle.com>
To: Dmitry Bogdanov <d.bogdanov@yadro.com>,
	Martin Petersen <martin.petersen@oracle.com>,
	target-devel@vger.kernel.org
Cc: linux-scsi@vger.kernel.org, linux@yadro.com,
	Roman Bolshakov <r.bolshakov@yadro.com>
Subject: Re: [PATCH v2] target: core: remove from tmr_list at lun unlink
Date: Fri, 16 Apr 2021 17:39:02 -0500	[thread overview]
Message-ID: <d4a19bc6-a89a-7572-b726-31df86fc84fd@oracle.com> (raw)
In-Reply-To: <20210416092146.3201-1-d.bogdanov@yadro.com>

On 4/16/21 4:21 AM, Dmitry Bogdanov wrote:
> Currently TMF commands are removed from de_device.dev_tmf_list at
> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
> up on a command status (response) is queued in transport layer.
> It means that LUN and backend device can be deleted meantime and at
> the moment of repsonse completion a panic is occured:
> 
> target_tmr_work()
> 	cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
> 	transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
> - // - // - // -
> <<<--- lun remove
> <<<--- core backend device remove
> - // - // - // -
> qlt_handle_abts_completion()
>   tfo->free_mcmd()
>     transport_generic_free_cmd()
>       target_put_sess_cmd()
>         core_tmr_release_req() {
>           if (dev) { // backend device, can not be null
>             spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
> 
> Call Trace:
> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
> Call Trace:
> (unreliable)
> 0x0
> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
> process_one_work+0x2c4/0x5c0
> worker_thread+0x88/0x690
> 
> For FC protocol it is a race condition, but for iSCSI protocol it is
> easyly reproduced by manual sending iSCSI commands:
> - Send some SCSI sommand
> - Send Abort of that command over iSCSI
> - Remove LUN on target
> - Send next iSCSI command to acknowledge the Abort_Response
> - target panics
> 
> There is no sense to keep the command in tmr_list until response
> completion, so move the removal from tmr_list from the response
> completion to the response queueing when lun is unlinked.
> Move the removal from state list too as it is a subject to the same
> race condition.
> 
> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
> 
> ---
> v2:
>  fix cmd stuck in tmr list in error case in iscsi
>  move clearing cmd->se_lun to transport_lun_remove_cmd
> 
> The issue exists from the very begining.
> I uploaded a scapy script that helps to reproduce the issue at
> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!MjANCRIp5ZrtKYonxEKclROOwOe7s-adKHLiUd2Njis6m1774RMpLGNkHyapFf68BwFr$ 
> ---
>  drivers/target/iscsi/iscsi_target.c    |  2 +-
>  drivers/target/target_core_tmr.c       |  9 --------
>  drivers/target/target_core_transport.c | 29 +++++++++++++++++++-------
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
> index d0e7ed8f28cc..3311b031a812 100644
> --- a/drivers/target/iscsi/iscsi_target.c
> +++ b/drivers/target/iscsi/iscsi_target.c
> @@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
>  	 * TMR TASK_REASSIGN.
>  	 */
>  	iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
> -	target_put_sess_cmd(&cmd->se_cmd);
> +	transport_generic_free_cmd(&cmd->se_cmd, false);
>  	return 0;
>  }

Doh. I see how I got all confused. I guess this path leaks the lun_ref
taken by transport_lookup_tmr_lun. It looks like an old issue and nothing
to do with your patch.

I'm not sure if we are supposed to be calling transport_generic_free_cmd
twice. It looks like it works ok, because your patch added the "cmd->se_lun = NULL"
in transport_lun_remove_cmd, so we won't do a double list deletion.
It feels dirty though. I can feel Bart saying, "Mike did you see the comment
at the top of the function". :)

Maybe it's best to more cleanly unwind what was setup in the rror path. I think we
can fix lun_ref leak too.

So instead of doing transport_generic_free_cmd above do transport_lun_remove_cmd
to match/undo the transport_lookup_tmr_lun call in iscsit_handle_task_mgt_cmd?

  parent reply	other threads:[~2021-04-16 22:39 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16  9:21 [PATCH v2] target: core: remove from tmr_list at lun unlink Dmitry Bogdanov
2021-04-16 21:38 ` Mike Christie
2021-04-16 21:41   ` Mike Christie
2021-04-16 22:39 ` Mike Christie [this message]
2021-04-16 22:50   ` Mike Christie
2021-04-21 15:07     ` Dmitriy Bogdanov
2021-04-17  9:39 ` Konstantin Shelekhin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4a19bc6-a89a-7572-b726-31df86fc84fd@oracle.com \
    --to=michael.christie@oracle.com \
    --cc=d.bogdanov@yadro.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@yadro.com \
    --cc=martin.petersen@oracle.com \
    --cc=r.bolshakov@yadro.com \
    --cc=target-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.