* [PATCH] target: core: remove from tmr_list at lun unlink
@ 2021-04-07 14:21 Dmitry Bogdanov
2021-04-09 17:18 ` Mike Christie
0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Bogdanov @ 2021-04-07 14:21 UTC (permalink / raw)
To: Martin Petersen, target-devel
Cc: linux-scsi, linux, Dmitry Bogdanov, Roman Bolshakov
Currently TMF commands are removed from de_device.dev_tmf_list at
the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
up on a command status (response) is queued in transport layer.
It means that LUN and backend device can be deleted meantime and at
the moment of repsonse completion a panic is occured:
target_tmr_work()
cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
— // — // — // —
<<<--- lun remove
<<<--- core backend device remove
— // — // — // —
qlt_handle_abts_completion()
tfo->free_mcmd()
transport_generic_free_cmd()
target_put_sess_cmd()
core_tmr_release_req() {
if (dev) { // backend device, can not be null
spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
Call Trace:
NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
Call Trace:
(unreliable)
0x0
target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
process_one_work+0x2c4/0x5c0
worker_thread+0x88/0x690
For FC protocol it is a race condition, but for iSCSI protocol it is
easyly reproduced by manual sending iSCSI commands:
- Send some SCSI sommand
- Send Abort of that command over iSCSI
- Remove LUN on target
- Send next iSCSI command to acknowledge the Abort_Response
- target panics
There is no sense to keep the command in tmr_list until response
completion, so move the removal from tmr_list from the response
completion to the response queueing when lun is unlinked.
Move the removal from state list too as it is a subject to the same
race condition.
Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
---
The issue exists from the very begining.
I uploaded a script that helps to reproduce the issue at
https://gist.github.com/logost/cb93df41dd2432454324449b390403c4
---
drivers/target/target_core_tmr.c | 9 ---------
drivers/target/target_core_transport.c | 19 +++++++++++++++++--
2 files changed, 17 insertions(+), 11 deletions(-)
diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 7347285471fa..323a173010c1 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req);
void core_tmr_release_req(struct se_tmr_req *tmr)
{
- struct se_device *dev = tmr->tmr_dev;
- unsigned long flags;
-
- if (dev) {
- spin_lock_irqsave(&dev->se_tmr_lock, flags);
- list_del_init(&tmr->tmr_list);
- spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
- }
-
kfree(tmr);
}
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 5ecb9f18a53d..4d9968f43cf1 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -667,6 +667,20 @@ static void target_remove_from_state_list(struct se_cmd *cmd)
spin_unlock_irqrestore(&dev->queues[cmd->cpuid].lock, flags);
}
+static void target_remove_from_tmr_list(struct se_cmd *cmd)
+{
+ struct se_device *dev = NULL;
+ unsigned long flags;
+
+ if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
+ dev = cmd->se_tmr_req->tmr_dev;
+
+ if (dev) {
+ spin_lock_irqsave(&dev->se_tmr_lock, flags);
+ list_del_init(&cmd->se_tmr_req->tmr_list);
+ spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
+ }
+}
/*
* This function is called by the target core after the target core has
* finished processing a SCSI command or SCSI TMF. Both the regular command
@@ -678,8 +692,6 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd)
{
unsigned long flags;
- target_remove_from_state_list(cmd);
-
/*
* Clear struct se_cmd->se_lun before the handoff to FE.
*/
@@ -719,6 +731,9 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
if (!lun)
return;
+ target_remove_from_state_list(cmd);
+ target_remove_from_tmr_list(cmd);
+
if (cmpxchg(&cmd->lun_ref_active, true, false))
percpu_ref_put(&lun->lun_ref);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] target: core: remove from tmr_list at lun unlink
2021-04-07 14:21 [PATCH] target: core: remove from tmr_list at lun unlink Dmitry Bogdanov
@ 2021-04-09 17:18 ` Mike Christie
2021-04-09 17:34 ` Mike Christie
0 siblings, 1 reply; 4+ messages in thread
From: Mike Christie @ 2021-04-09 17:18 UTC (permalink / raw)
To: Dmitry Bogdanov, Martin Petersen, target-devel
Cc: linux-scsi, linux, Roman Bolshakov
On 4/7/21 9:21 AM, Dmitry Bogdanov wrote:
> Currently TMF commands are removed from de_device.dev_tmf_list at
> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
> up on a command status (response) is queued in transport layer.
> It means that LUN and backend device can be deleted meantime and at
> the moment of repsonse completion a panic is occured:
>
> target_tmr_work()
> cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
> transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
> — // — // — // —
> <<<--- lun remove
> <<<--- core backend device remove
> — // — // — // —
> qlt_handle_abts_completion()
> tfo->free_mcmd()
> transport_generic_free_cmd()
> target_put_sess_cmd()
> core_tmr_release_req() {
> if (dev) { // backend device, can not be null
> spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
>
> Call Trace:
> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
> Call Trace:
> (unreliable)
> 0x0
> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
> process_one_work+0x2c4/0x5c0
> worker_thread+0x88/0x690
>
> For FC protocol it is a race condition, but for iSCSI protocol it is
> easyly reproduced by manual sending iSCSI commands:
> - Send some SCSI sommand
> - Send Abort of that command over iSCSI
> - Remove LUN on target
> - Send next iSCSI command to acknowledge the Abort_Response
> - target panics
>
> There is no sense to keep the command in tmr_list until response
> completion, so move the removal from tmr_list from the response
> completion to the response queueing when lun is unlinked.
> Move the removal from state list too as it is a subject to the same
> race condition.
>
> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
>
> ---
> The issue exists from the very begining.
> I uploaded a script that helps to reproduce the issue at
> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!KN-N7Ult7Itn2AzY6LdP2vm0D81UIjpbCyOAH3uf2OoLGUykpGP3dJTQlLm9m71MjsxY$
> ---
> drivers/target/target_core_tmr.c | 9 ---------
> drivers/target/target_core_transport.c | 19 +++++++++++++++++--
> 2 files changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
> index 7347285471fa..323a173010c1 100644
> --- a/drivers/target/target_core_tmr.c
> +++ b/drivers/target/target_core_tmr.c
> @@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req);
>
> void core_tmr_release_req(struct se_tmr_req *tmr)
> {
> - struct se_device *dev = tmr->tmr_dev;
> - unsigned long flags;
> -
> - if (dev) {
> - spin_lock_irqsave(&dev->se_tmr_lock, flags);
> - list_del_init(&tmr->tmr_list);
> - spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> - }
> -
> kfree(tmr);
> }
>
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 5ecb9f18a53d..4d9968f43cf1 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -667,6 +667,20 @@ static void target_remove_from_state_list(struct se_cmd *cmd)
> spin_unlock_irqrestore(&dev->queues[cmd->cpuid].lock, flags);
> }
>
> +static void target_remove_from_tmr_list(struct se_cmd *cmd)
> +{
> + struct se_device *dev = NULL;
> + unsigned long flags;
> +
This is just a nit. Maybe just do:
struct se_device *dev = NULL;
unsigned long flags;
if (!(cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
return;
dev = cmd->se_tmr_req->tmr_dev;
spin_lock_irqsave(&dev->se_tmr_lock, flags);
list_del_init(&cmd->se_tmr_req->tmr_list);
spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
It will look like target_remove_from_state_list. Below I was expecting
some sort of twist ending with the extra if in there. But we just wanted
to run the function for tmrs.
> + if (cmd->se_cmd_flags & SCF_SCSI_TMR_CDB)
> + dev = cmd->se_tmr_req->tmr_dev;
> +
> + if (dev) {
> + spin_lock_irqsave(&dev->se_tmr_lock, flags);
> + list_del_init(&cmd->se_tmr_req->tmr_list);
> + spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> + }
> +}
> /*
> * This function is called by the target core after the target core has
> * finished processing a SCSI command or SCSI TMF. Both the regular command
> @@ -678,8 +692,6 @@ static int transport_cmd_check_stop_to_fabric(struct se_cmd *cmd)
> {
> unsigned long flags;
>
> - target_remove_from_state_list(cmd);
> -
> /*
> * Clear struct se_cmd->se_lun before the handoff to FE.
> */
Right below this line we clear se_cmd->selun. I think that should go in
transport_lun_remove_cmd since it's the function that does the put and now
with your changes we handle all the last refs there.
> @@ -719,6 +731,9 @@ static void transport_lun_remove_cmd(struct se_cmd *cmd)
> if (!lun)
> return;
>
> + target_remove_from_state_list(cmd);
> + target_remove_from_tmr_list(cmd);
> +
> if (cmpxchg(&cmd->lun_ref_active, true, false))
> percpu_ref_put(&lun->lun_ref);
> }
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] target: core: remove from tmr_list at lun unlink
2021-04-09 17:18 ` Mike Christie
@ 2021-04-09 17:34 ` Mike Christie
2021-04-13 15:53 ` Dmitriy Bogdanov
0 siblings, 1 reply; 4+ messages in thread
From: Mike Christie @ 2021-04-09 17:34 UTC (permalink / raw)
To: Dmitry Bogdanov, Martin Petersen, target-devel
Cc: linux-scsi, linux, Roman Bolshakov
On 4/9/21 12:18 PM, Mike Christie wrote:
> On 4/7/21 9:21 AM, Dmitry Bogdanov wrote:
>> Currently TMF commands are removed from de_device.dev_tmf_list at
>> the very end of se_cmd lifecycle. But se_lun unlinks from se_cmd
>> up on a command status (response) is queued in transport layer.
>> It means that LUN and backend device can be deleted meantime and at
>> the moment of repsonse completion a panic is occured:
>>
>> target_tmr_work()
>> cmd->se_tfo->queue_tm_rsp(cmd); // send abort_rsp to a wire
>> transport_lun_remove_cmd(cmd) // unlink se_cmd from se_lun
>> — // — // — // —
>> <<<--- lun remove
>> <<<--- core backend device remove
>> — // — // — // —
>> qlt_handle_abts_completion()
>> tfo->free_mcmd()
>> transport_generic_free_cmd()
>> target_put_sess_cmd()
>> core_tmr_release_req() {
>> if (dev) { // backend device, can not be null
>> spin_lock_irqsave(&dev->se_tmr_lock, flags); //<<<--- CRASH
>>
>> Call Trace:
>> NIP [c000000000e1683c] _raw_spin_lock_irqsave+0x2c/0xc0
>> LR [c00800000e433338] core_tmr_release_req+0x40/0xa0 [target_core_mod]
>> Call Trace:
>> (unreliable)
>> 0x0
>> target_put_sess_cmd+0x2a0/0x370 [target_core_mod]
>> transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod]
>> tcm_qla2xxx_complete_mcmd+0x28/0x50 [tcm_qla2xxx]
>> process_one_work+0x2c4/0x5c0
>> worker_thread+0x88/0x690
>>
>> For FC protocol it is a race condition, but for iSCSI protocol it is
>> easyly reproduced by manual sending iSCSI commands:
>> - Send some SCSI sommand
>> - Send Abort of that command over iSCSI
>> - Remove LUN on target
>> - Send next iSCSI command to acknowledge the Abort_Response
>> - target panics
>>
>> There is no sense to keep the command in tmr_list until response
>> completion, so move the removal from tmr_list from the response
>> completion to the response queueing when lun is unlinked.
>> Move the removal from state list too as it is a subject to the same
>> race condition.
>>
>> Fixes: c66ac9db8d4a ("[SCSI] target: Add LIO target core v4.0.0-rc6")
>> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com>
>> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com>
>>
>> ---
>> The issue exists from the very begining.
>> I uploaded a script that helps to reproduce the issue at
>> https://urldefense.com/v3/__https://gist.github.com/logost/cb93df41dd2432454324449b390403c4__;!!GqivPVa7Brio!KN-N7Ult7Itn2AzY6LdP2vm0D81UIjpbCyOAH3uf2OoLGUykpGP3dJTQlLm9m71MjsxY$
>> ---
>> drivers/target/target_core_tmr.c | 9 ---------
>> drivers/target/target_core_transport.c | 19 +++++++++++++++++--
>> 2 files changed, 17 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
>> index 7347285471fa..323a173010c1 100644
>> --- a/drivers/target/target_core_tmr.c
>> +++ b/drivers/target/target_core_tmr.c
>> @@ -50,15 +50,6 @@ EXPORT_SYMBOL(core_tmr_alloc_req);
>>
>> void core_tmr_release_req(struct se_tmr_req *tmr)
>> {
>> - struct se_device *dev = tmr->tmr_dev;
>> - unsigned long flags;
>> -
>> - if (dev) {
>> - spin_lock_irqsave(&dev->se_tmr_lock, flags);
>> - list_del_init(&tmr->tmr_list);
>> - spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
>> - }
>> -
>> kfree(tmr);
>> }
>>
>> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
>> index 5ecb9f18a53d..4d9968f43cf1 100644
>> --- a/drivers/target/target_core_transport.c
>> +++ b/drivers/target/target_core_transport.c
>> @@ -667,6 +667,20 @@ static void target_remove_from_state_list(struct se_cmd *cmd)
>> spin_unlock_irqrestore(&dev->queues[cmd->cpuid].lock, flags);
>> }
>>
>> +static void target_remove_from_tmr_list(struct se_cmd *cmd)
>> +{
>> + struct se_device *dev = NULL;
>> + unsigned long flags;
>> +
>
> This is just a nit. Maybe just do:
>
> struct se_device *dev = NULL;
> unsigned long flags;
>
> if (!(cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
> return;
>
> dev = cmd->se_tmr_req->tmr_dev;
> spin_lock_irqsave(&dev->se_tmr_lock, flags);
> list_del_init(&cmd->se_tmr_req->tmr_list);
> spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
>
This might be wrong. I thought when you moved the deletion to
transport_cmd_check_stop_to_fabric, we would always have a dev set.
But in core_tmr_abort_task there is that transport_lookup_tmr_lun
check. If that is a valid check in there and it could be NULL in
the path:
transport_generic_handle_tmr -> target_handle_abort ->
ransport_cmd_check_stop_to_fabric
then we would hit a NULL pointer. I'm not sure how we would get a NULL
dev there though. The driver would have to not use the standard
target_submit_tmr or be iscsi and not call transport_lookup_tmr_lun.
One issue with the patch though is if iscsit_tmr_abort_task fails then we don't
call transport_cmd_check_stop_to_fabric, so the tmr will be stuck on the list.
^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [PATCH] target: core: remove from tmr_list at lun unlink
2021-04-09 17:34 ` Mike Christie
@ 2021-04-13 15:53 ` Dmitriy Bogdanov
0 siblings, 0 replies; 4+ messages in thread
From: Dmitriy Bogdanov @ 2021-04-13 15:53 UTC (permalink / raw)
To: Mike Christie, Martin Petersen, target-devel
Cc: linux-scsi, linux, Roman Bolshakov
Hi Mike,
> > This is just a nit. Maybe just do:
> >
> > struct se_device *dev = NULL;
> > unsigned long flags;
> >
> > if (!(cmd->se_cmd_flags & SCF_SCSI_TMR_CDB))
> > return;
> >
> > dev = cmd->se_tmr_req->tmr_dev;
> > spin_lock_irqsave(&dev->se_tmr_lock, flags);
> > list_del_init(&cmd->se_tmr_req->tmr_list);
> > spin_unlock_irqrestore(&dev->se_tmr_lock, flags);
> >
> This might be wrong. I thought when you moved the deletion to
> transport_cmd_check_stop_to_fabric, we would always have a dev set.
> But in core_tmr_abort_task there is that transport_lookup_tmr_lun
> check. If that is a valid check in there and it could be NULL in
> the path:
> transport_generic_handle_tmr -> target_handle_abort ->
> ransport_cmd_check_stop_to_fabric
>
> then we would hit a NULL pointer. I'm not sure how we would get a NULL
> dev there though. The driver would have to not use the standard
> target_submit_tmr or be iscsi and not call transport_lookup_tmr_lun.
In theory it can do some out-of-tree transport driver, ok I will keep a guard
in target_remove_from_tmr_list().
>
> One issue with the patch though is if iscsit_tmr_abort_task fails then we don't
> call transport_cmd_check_stop_to_fabric, so the tmr will be stuck on the list.
Yes, I see. I am going to use this patch to solve it:
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -2142,7 +2142,7 @@ iscsit_handle_task_mgt_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
* TMR TASK_REASSIGN.
*/
iscsit_add_cmd_to_response_queue(cmd, conn, cmd->i_state);
- target_put_sess_cmd(&cmd->se_cmd);
+ transport_generic_free_cmd(&cmd->se_cmd, false);
return 0;
}
EXPORT_SYMBOL(iscsit_handle_task_mgt_cmd);
BR,
Dmitry
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-04-13 15:53 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-07 14:21 [PATCH] target: core: remove from tmr_list at lun unlink Dmitry Bogdanov
2021-04-09 17:18 ` Mike Christie
2021-04-09 17:34 ` Mike Christie
2021-04-13 15:53 ` Dmitriy Bogdanov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.