* [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
@ 2023-02-13  3:43 Zhong Jinghua
  2023-03-01  3:37 ` zhongjinghua
  2023-03-01 19:46 ` Bart Van Assche
  0 siblings, 2 replies; 9+ messages in thread
From: Zhong Jinghua @ 2023-02-13  3:43 UTC (permalink / raw)
  To: jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, zhongjinghua, yi.zhang, yukuai3

From: Zhong Jinghua <zhongjinghua@huawei.com>

KASAN reports the following use-after-free:

BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70

Workqueue: scsi_wq_1 __iscsi_unbind_session [scsi_transport_iscsi]
Call trace:
 dump_backtrace+0x0/0x320
 show_stack+0x24/0x30
 dump_stack+0xdc/0x128
 print_address_description+0x68/0x278
 kasan_report+0x1e4/0x308
 __asan_report_load4_noabort+0x30/0x40
 scsi_target_reap+0x6c/0x70
 scsi_remove_target+0x430/0x640
 __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
 process_one_work+0x67c/0x1350
 worker_thread+0x370/0xf90
 kthread+0x2a4/0x320
 ret_from_fork+0x10/0x18

The problem is caused by a concurrency scenario:

T0: delete target
// echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
T1: logout
// iscsiadm -m node --logout

T0							T1
 sdev_store_delete
  scsi_remove_device
   device_remove_file
    __scsi_remove_device
        					__iscsi_unbind_session
        					 scsi_remove_target
						  spin_lock_irqsave
        					  list_for_each_entry
     scsi_target_reap // starget->reaf 1 -> 0
     						  kref_get(&starget->reap_ref);
						  // warn use-after-free.
						  spin_unlock_irqrestore
      scsi_target_reap_ref_release
	scsi_target_destroy
	... // delete starget
						  scsi_target_reap
						  // UAF

After T0 drops the reference count to 0 but before it frees the target,
T1 can still find the target in list_for_each_entry, and its kref_get()
on the dying target leads to the use-after-free.

Fix it by using kref_get_unless_zero() to check for a reference count of
0.

Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
---
 drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e7893835b99a..0ad357ff4c59 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
 		    starget->state == STARGET_CREATED_REMOVE)
 			continue;
 		if (starget->dev.parent == dev || &starget->dev == dev) {
-			kref_get(&starget->reap_ref);
+
+			/*
+			 * If starget->reap_ref is reduced to 0, it means
+			 * that other processes are releasing it and
+			 * there is no need to delete it again
+			 */
+			if (!kref_get_unless_zero(&starget->reap_ref)) {
+				spin_unlock_irqrestore(shost->host_lock, flags);
+				goto restart;
+			}
+
 			if (starget->state == STARGET_CREATED)
 				starget->state = STARGET_CREATED_REMOVE;
 			else
-- 
2.31.1
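
The patch above replaces an unconditional kref_get() with
kref_get_unless_zero(). The difference can be sketched in userspace as
follows. This is only a minimal model, assuming C11 atomics in place of the
kernel's refcount machinery; struct model_ref and the model_ref_*() helpers
are hypothetical names and are not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a kref; not the kernel type. */
struct model_ref {
	atomic_int count;
};

static inline void model_ref_init(struct model_ref *r)
{
	atomic_init(&r->count, 1);
}

/* Unconditional get: increments even if the count already hit zero. */
static inline void model_ref_get(struct model_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Conditional get: succeeds only while the count is still non-zero. */
static inline bool model_ref_get_unless_zero(struct model_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure the CAS reloads 'old' with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference; returns true if this was the last one. */
static inline bool model_ref_put(struct model_ref *r)
{
	int old = atomic_fetch_sub(&r->count, 1);

	assert(old > 0);
	return old == 1;
}
```

In this model, once the last model_ref_put() returns true, a later
model_ref_get() still bumps the count from 0 to 1 and hands out a reference
to an object whose release is already under way, while
model_ref_get_unless_zero() returns false and lets the caller skip it.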



* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-02-13  3:43 [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target Zhong Jinghua
@ 2023-03-01  3:37 ` zhongjinghua
  2023-03-01  3:40   ` zhongjinghua
  2023-03-01 19:46 ` Bart Van Assche
  1 sibling, 1 reply; 9+ messages in thread
From: zhongjinghua @ 2023-03-01  3:37 UTC (permalink / raw)
  To: Zhong Jinghua, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3

ping...

Hello,

Is anyone looking at this?

On 2023/2/13 11:43, Zhong Jinghua wrote:
> From: Zhong Jinghua <zhongjinghua@huawei.com>
>
> A use-after-free problem like below:
>
> BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
>
> Workqueue: scsi_wq_1 __iscsi_unbind_session [scsi_transport_iscsi]
> Call trace:
>   dump_backtrace+0x0/0x320
>   show_stack+0x24/0x30
>   dump_stack+0xdc/0x128
>   print_address_description+0x68/0x278
>   kasan_report+0x1e4/0x308
>   __asan_report_load4_noabort+0x30/0x40
>   scsi_target_reap+0x6c/0x70
>   scsi_remove_target+0x430/0x640
>   __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
>   process_one_work+0x67c/0x1350
>   worker_thread+0x370/0xf90
>   kthread+0x2a4/0x320
>   ret_from_fork+0x10/0x18
>
> The problem is caused by a concurrency scenario:
>
> T0: delete target
> // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
> T1: logout
> // iscsiadm -m node --logout
>
> T0							T1
>   sdev_store_delete
>    scsi_remove_device
>     device_remove_file
>      __scsi_remove_device
>          					__iscsi_unbind_session
>          					 scsi_remove_target
> 						  spin_lock_irqsave
>          					  list_for_each_entry
>       scsi_target_reap // starget->reaf 1 -> 0
>       						  kref_get(&starget->reap_ref);
> 						  // warn use-after-free.
> 						  spin_unlock_irqrestore
>        scsi_target_reap_ref_release
> 	scsi_target_destroy
> 	... // delete starget
> 						  scsi_target_reap
> 						  // UAF
>
> When T0 reduces the reference count to 0, but has not been released,
> T1 can still enter list_for_each_entry, and then kref_get reports UAF.
>
> Fix it by using kref_get_unless_zero() to check for a reference count of
> 0.
>
> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
> ---
>   drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
>   1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index e7893835b99a..0ad357ff4c59 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>   		    starget->state == STARGET_CREATED_REMOVE)
>   			continue;
>   		if (starget->dev.parent == dev || &starget->dev == dev) {
> -			kref_get(&starget->reap_ref);
> +
> +			/*
> +			 * If starget->reap_ref is reduced to 0, it means
> +			 * that other processes are releasing it and
> +			 * there is no need to delete it again
> +			 */
> +			if (!kref_get_unless_zero(&starget->reap_ref)) {
> +				spin_unlock_irqrestore(shost->host_lock, flags);
> +				goto restart;
> +			}
> +
>   			if (starget->state == STARGET_CREATED)
>   				starget->state = STARGET_CREATED_REMOVE;
>   			else


* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-03-01  3:37 ` zhongjinghua
@ 2023-03-01  3:40   ` zhongjinghua
  2023-03-01 19:51     ` James Bottomley
  2023-03-01 21:15     ` Mike Christie
  0 siblings, 2 replies; 9+ messages in thread
From: zhongjinghua @ 2023-03-01  3:40 UTC (permalink / raw)
  To: zhongjinghua, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3

ping...

Hello,

Is anyone looking at this?

On 2023/3/1 11:37, zhongjinghua wrote:
> ping...
>
> Hello,
>
> Is anyone looking at this?
>
> On 2023/2/13 11:43, Zhong Jinghua wrote:
>> From: Zhong Jinghua <zhongjinghua@huawei.com>
>>
>> A use-after-free problem like below:
>>
>> BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
>>
>> Workqueue: scsi_wq_1 __iscsi_unbind_session [scsi_transport_iscsi]
>> Call trace:
>>   dump_backtrace+0x0/0x320
>>   show_stack+0x24/0x30
>>   dump_stack+0xdc/0x128
>>   print_address_description+0x68/0x278
>>   kasan_report+0x1e4/0x308
>>   __asan_report_load4_noabort+0x30/0x40
>>   scsi_target_reap+0x6c/0x70
>>   scsi_remove_target+0x430/0x640
>>   __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
>>   process_one_work+0x67c/0x1350
>>   worker_thread+0x370/0xf90
>>   kthread+0x2a4/0x320
>>   ret_from_fork+0x10/0x18
>>
>> The problem is caused by a concurrency scenario:
>>
>> T0: delete target
>> // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
>> T1: logout
>> // iscsiadm -m node --logout
>>
>> T0                            T1
>>   sdev_store_delete
>>    scsi_remove_device
>>     device_remove_file
>>      __scsi_remove_device
>>                               __iscsi_unbind_session
>>                                scsi_remove_target
>>                                 spin_lock_irqsave
>>                                 list_for_each_entry
>>       scsi_target_reap // starget->reaf 1 -> 0
>>                                 kref_get(&starget->reap_ref);
>>                                 // warn use-after-free.
>>                                 spin_unlock_irqrestore
>>        scsi_target_reap_ref_release
>>         scsi_target_destroy
>>         ... // delete starget
>>                                 scsi_target_reap
>>                                 // UAF
>>
>> When T0 reduces the reference count to 0, but has not been released,
>> T1 can still enter list_for_each_entry, and then kref_get reports UAF.
>>
>> Fix it by using kref_get_unless_zero() to check for a reference count of
>> 0.
>>
>> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
>> ---
>>   drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index e7893835b99a..0ad357ff4c59 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>>               starget->state == STARGET_CREATED_REMOVE)
>>               continue;
>>           if (starget->dev.parent == dev || &starget->dev == dev) {
>> -            kref_get(&starget->reap_ref);
>> +
>> +            /*
>> +             * If starget->reap_ref is reduced to 0, it means
>> +             * that other processes are releasing it and
>> +             * there is no need to delete it again
>> +             */
>> +            if (!kref_get_unless_zero(&starget->reap_ref)) {
>> +                spin_unlock_irqrestore(shost->host_lock, flags);
>> +                goto restart;
>> +            }
>> +
>>               if (starget->state == STARGET_CREATED)
>>                   starget->state = STARGET_CREATED_REMOVE;
>>               else



* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-02-13  3:43 [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target Zhong Jinghua
  2023-03-01  3:37 ` zhongjinghua
@ 2023-03-01 19:46 ` Bart Van Assche
  2023-03-06  8:29   ` zhongjinghua
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2023-03-01 19:46 UTC (permalink / raw)
  To: Zhong Jinghua, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, zhongjinghua, yi.zhang, yukuai3

On 2/12/23 19:43, Zhong Jinghua wrote:
> T0							T1
>   sdev_store_delete
>    scsi_remove_device
>     device_remove_file
>      __scsi_remove_device
>          					__iscsi_unbind_session
>          					 scsi_remove_target
> 						  spin_lock_irqsave
>          					  list_for_each_entry
>       scsi_target_reap // starget->reaf 1 -> 0

What is "reaf"? Did you perhaps want to write "reap_ref"?

> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index e7893835b99a..0ad357ff4c59 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>   		    starget->state == STARGET_CREATED_REMOVE)
>   			continue;
>   		if (starget->dev.parent == dev || &starget->dev == dev) {
> -			kref_get(&starget->reap_ref);
> +
> +			/*
> +			 * If starget->reap_ref is reduced to 0, it means
> +			 * that other processes are releasing it and
> +			 * there is no need to delete it again
> +			 */
> +			if (!kref_get_unless_zero(&starget->reap_ref)) {
> +				spin_unlock_irqrestore(shost->host_lock, flags);
> +				goto restart;
> +			}
> +
>   			if (starget->state == STARGET_CREATED)
>   				starget->state = STARGET_CREATED_REMOVE;
>   			else

The above comment should be made more clear, e.g. as follows: "If the 
reference count is already zero, skip this target. Calling 
kref_get_unless_zero() if the reference count is zero is safe because 
scsi_target_destroy() will wait until the host lock has been released 
before freeing starget."

Otherwise this patch looks fine to me.

Thanks,

Bart.



* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-03-01  3:40   ` zhongjinghua
@ 2023-03-01 19:51     ` James Bottomley
  2023-03-02  8:56       ` zhongjinghua
  2023-03-01 21:15     ` Mike Christie
  1 sibling, 1 reply; 9+ messages in thread
From: James Bottomley @ 2023-03-01 19:51 UTC (permalink / raw)
  To: zhongjinghua, zhongjinghua, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3

On Wed, 2023-03-01 at 11:40 +0800, zhongjinghua wrote:
> ping...
> 
> Hello,
> 
> Is anyone looking at this?
> 
> On 2023/3/1 11:37, zhongjinghua wrote:
> > ping...
> > 
> > Hello,
> > 
> > Is anyone looking at this?
> > 
> > On 2023/2/13 11:43, Zhong Jinghua wrote:
> > > From: Zhong Jinghua <zhongjinghua@huawei.com>
> > > 
> > > A use-after-free problem like below:
> > > 
> > > BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
> > > 
> > > Workqueue: scsi_wq_1 __iscsi_unbind_session
> > > [scsi_transport_iscsi]
> > > Call trace:
> > >   dump_backtrace+0x0/0x320
> > >   show_stack+0x24/0x30
> > >   dump_stack+0xdc/0x128
> > >   print_address_description+0x68/0x278
> > >   kasan_report+0x1e4/0x308
> > >   __asan_report_load4_noabort+0x30/0x40
> > >   scsi_target_reap+0x6c/0x70
> > >   scsi_remove_target+0x430/0x640
> > >   __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
> > >   process_one_work+0x67c/0x1350
> > >   worker_thread+0x370/0xf90
> > >   kthread+0x2a4/0x320
> > >   ret_from_fork+0x10/0x18
> > > 
> > > The problem is caused by a concurrency scenario:
> > > 
> > > T0: delete target
> > > // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
> > > T1: logout
> > > // iscsiadm -m node --logout
> > > 
> > > T0                            T1
> > >   sdev_store_delete
> > >    scsi_remove_device
> > >     device_remove_file
> > >      __scsi_remove_device
> > >                               __iscsi_unbind_session
> > >                                scsi_remove_target
> > >                                 spin_lock_irqsave
> > >                                 list_for_each_entry
> > >       scsi_target_reap // starget->reaf 1 -> 0
> > >                                 kref_get(&starget->reap_ref);
> > >                                 // warn use-after-free.
> > >                                 spin_unlock_irqrestore
> > >        scsi_target_reap_ref_release
> > >         scsi_target_destroy
> > >         ... // delete starget
> > >                                 scsi_target_reap
> > >                                 // UAF
> > > 
> > > When T0 reduces the reference count to 0, but has not been
> > > released,
> > > T1 can still enter list_for_each_entry, and then kref_get reports
> > > UAF.
> > > 
> > > Fix it by using kref_get_unless_zero() to check for a reference
> > > count of
> > > 0.
> > > 
> > > Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
> > > ---
> > >   drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
> > >   1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/scsi/scsi_sysfs.c
> > > b/drivers/scsi/scsi_sysfs.c
> > > index e7893835b99a..0ad357ff4c59 100644
> > > --- a/drivers/scsi/scsi_sysfs.c
> > > +++ b/drivers/scsi/scsi_sysfs.c
> > > @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device
> > > *dev)
> > >               starget->state == STARGET_CREATED_REMOVE)
> > >               continue;
> > >           if (starget->dev.parent == dev || &starget->dev == dev)
> > > {
> > > -            kref_get(&starget->reap_ref);
> > > +
> > > +            /*
> > > +             * If starget->reap_ref is reduced to 0, it means
> > > +             * that other processes are releasing it and
> > > +             * there is no need to delete it again
> > > +             */
> > > +            if (!kref_get_unless_zero(&starget->reap_ref)) {
> > > +                spin_unlock_irqrestore(shost->host_lock, flags);
> > > +                goto restart;

This doesn't seem to be a good idea: you're asking for a live lock
where the thread that's already reduced the refcount to 0 and will
eventually remove the target from the list doesn't progress before you
take the lock again in the restart and then you find the same result
and go round again (and again ...).

Since there should only be one match in the target list and you found
it and know it's going away, what about break instead of unlock and
goto restart?

James
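
The live-lock concern above can be illustrated with a small single-threaded
model. It is only a sketch under stated assumptions: struct model_target,
scan_with_restart() and scan_with_skip() are hypothetical names, the table is
never modified between passes (standing in for "the other thread has not made
progress yet"), and max_passes exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a list entry; not struct scsi_target. */
struct model_target {
	int parent;   /* id of the parent device */
	int refcount; /* 0 means another thread is tearing it down */
};

/*
 * Rescan from the start whenever a dying match is seen, like the
 * "unlock and goto restart" in v1. Returns the number of passes made,
 * capped at max_passes.
 */
static inline int scan_with_restart(const struct model_target *t, size_t n,
				    int parent, int max_passes)
{
	int passes = 0;
	bool restart = true;

	assert(max_passes > 0);
	while (restart && passes < max_passes) {
		restart = false;
		passes++;
		for (size_t i = 0; i < n; i++) {
			if (t[i].parent == parent && t[i].refcount == 0) {
				restart = true; /* same entry is found again */
				break;
			}
		}
	}
	return passes;
}

/* Single pass that steps over dying matches; returns the live matches. */
static inline size_t scan_with_skip(const struct model_target *t, size_t n,
				    int parent)
{
	size_t live = 0;

	for (size_t i = 0; i < n; i++) {
		if (t[i].parent != parent || t[i].refcount == 0)
			continue;
		live++;
	}
	return live;
}
```

With a dying match in the table, scan_with_restart() uses up every pass it is
allowed, because nothing changes between passes. scan_with_skip() finishes in
one pass regardless of what the other thread is doing.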



* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-03-01  3:40   ` zhongjinghua
  2023-03-01 19:51     ` James Bottomley
@ 2023-03-01 21:15     ` Mike Christie
  2023-03-02 16:38       ` Mike Christie
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Christie @ 2023-03-01 21:15 UTC (permalink / raw)
  To: zhongjinghua, zhongjinghua, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3

On 2/28/23 9:40 PM, zhongjinghua wrote:
>> On 2023/2/13 11:43, Zhong Jinghua wrote:
>>> From: Zhong Jinghua <zhongjinghua@huawei.com>
>>>
>>> A use-after-free problem like below:
>>>
>>> BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
>>>
>>> Workqueue: scsi_wq_1 __iscsi_unbind_session [scsi_transport_iscsi]
>>> Call trace:
>>>   dump_backtrace+0x0/0x320
>>>   show_stack+0x24/0x30
>>>   dump_stack+0xdc/0x128
>>>   print_address_description+0x68/0x278
>>>   kasan_report+0x1e4/0x308
>>>   __asan_report_load4_noabort+0x30/0x40
>>>   scsi_target_reap+0x6c/0x70
>>>   scsi_remove_target+0x430/0x640
>>>   __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
>>>   process_one_work+0x67c/0x1350
>>>   worker_thread+0x370/0xf90
>>>   kthread+0x2a4/0x320
>>>   ret_from_fork+0x10/0x18
>>>
>>> The problem is caused by a concurrency scenario:
>>>
>>> T0: delete target
>>> // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
>>> T1: logout
>>> // iscsiadm -m node --logout
>>>
>>> T0                            T1
>>>   sdev_store_delete
>>>    scsi_remove_device
>>>     device_remove_file
>>>      __scsi_remove_device
>>>                               __iscsi_unbind_session
>>>                                scsi_remove_target
>>>                                 spin_lock_irqsave
>>>                                 list_for_each_entry
>>>       scsi_target_reap // starget->reaf 1 -> 0
>>>                                 kref_get(&starget->reap_ref);
>>>                                 // warn use-after-free.
>>>                                 spin_unlock_irqrestore
>>>        scsi_target_reap_ref_release
>>>         scsi_target_destroy
>>>         ... // delete starget
>>>                                 scsi_target_reap
>>>                                 // UAF
>>>
>>> When T0 reduces the reference count to 0, but has not been released,
>>> T1 can still enter list_for_each_entry, and then kref_get reports UAF.
>>>
>>> Fix it by using kref_get_unless_zero() to check for a reference count of
>>> 0.
>>>
>>> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
>>> ---
>>>   drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
>>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>> index e7893835b99a..0ad357ff4c59 100644
>>> --- a/drivers/scsi/scsi_sysfs.c
>>> +++ b/drivers/scsi/scsi_sysfs.c
>>> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>>>               starget->state == STARGET_CREATED_REMOVE)
>>>               continue;
>>>           if (starget->dev.parent == dev || &starget->dev == dev) {
>>> -            kref_get(&starget->reap_ref);
>>> +
>>> +            /*
>>> +             * If starget->reap_ref is reduced to 0, it means
>>> +             * that other processes are releasing it and
>>> +             * there is no need to delete it again
>>> +             */
>>> +            if (!kref_get_unless_zero(&starget->reap_ref)) {
>>> +                spin_unlock_irqrestore(shost->host_lock, flags);
>>> +                goto restart;
>>> +            }
>>> +

Patch looks ok.

Is there another bug in the existing kref_get_unless_zero(&starget->reap_ref)
call in scsi_alloc_target?

I think scsi_alloc_target can find the target on the __targets list, and
its call to kref_get_unless_zero will succeed if scsi_remove_target has only
gotten as far as taking its own ref above (it has not yet done
__scsi_remove_target or the scsi_target_reap call at the end of the function).

But if scsi_remove_target has set the target state to STARGET_REMOVE, the thread
that did scsi_alloc_target wouldn't be able to put the target into the correct state
(the scsi_target_add call will see the target state and return). So later if the
driver/transport class did scsi_remove_target again to remove the target that
the scsi_alloc_target call re-added, we see the target->state still in STARGET_REMOVE
and it won't get deleted.

Can we solve both issues at the same time?
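
The sequence described above can be written down as a tiny state model. This
is a sketch of the scenario as described in this mail only, not of the actual
kernel code paths: enum model_state, struct model_target and the model_*()
helpers are hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of target states; not the kernel's enum. */
enum model_state { MODEL_CREATED, MODEL_RUNNING, MODEL_REMOVE };

struct model_target {
	enum model_state state;
	int refcount;
};

/* Lookup path: an existing target is reused while its refcount is non-zero. */
static inline bool model_lookup(struct model_target *t)
{
	if (t->refcount == 0)
		return false;
	t->refcount++;
	return true;
}

/* Add path: only a target still in the created state is brought up. */
static inline bool model_add(struct model_target *t)
{
	if (t->state != MODEL_CREATED)
		return false;
	t->state = MODEL_RUNNING;
	return true;
}

/* Remove path: a target already marked for removal is skipped. */
static inline bool model_remove(struct model_target *t)
{
	assert(t->refcount > 0);
	if (t->state == MODEL_REMOVE)
		return false;
	t->state = MODEL_REMOVE;
	return true;
}
```

Starting from a running target, the order model_remove(), model_lookup(),
model_add(), model_remove() leaves the target referenced but still in
MODEL_REMOVE: the add is a no-op and the second remove skips it, which is the
stuck state described above.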


* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-03-01 19:51     ` James Bottomley
@ 2023-03-02  8:56       ` zhongjinghua
  0 siblings, 0 replies; 9+ messages in thread
From: zhongjinghua @ 2023-03-02  8:56 UTC (permalink / raw)
  To: jejb, zhongjinghua, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3


On 2023/3/2 3:51, James Bottomley wrote:
> On Wed, 2023-03-01 at 11:40 +0800, zhongjinghua wrote:
>> ping...
>>
>> Hello,
>>
>> Is anyone looking at this?
>>
>> On 2023/3/1 11:37, zhongjinghua wrote:
>>> ping...
>>>
>>> Hello,
>>>
>>> Is anyone looking at this?
>>>
>>> On 2023/2/13 11:43, Zhong Jinghua wrote:
>>>> From: Zhong Jinghua <zhongjinghua@huawei.com>
>>>>
>>>> A use-after-free problem like below:
>>>>
>>>> BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
>>>>
>>>> Workqueue: scsi_wq_1 __iscsi_unbind_session
>>>> [scsi_transport_iscsi]
>>>> Call trace:
>>>>    dump_backtrace+0x0/0x320
>>>>    show_stack+0x24/0x30
>>>>    dump_stack+0xdc/0x128
>>>>    print_address_description+0x68/0x278
>>>>    kasan_report+0x1e4/0x308
>>>>    __asan_report_load4_noabort+0x30/0x40
>>>>    scsi_target_reap+0x6c/0x70
>>>>    scsi_remove_target+0x430/0x640
>>>>    __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
>>>>    process_one_work+0x67c/0x1350
>>>>    worker_thread+0x370/0xf90
>>>>    kthread+0x2a4/0x320
>>>>    ret_from_fork+0x10/0x18
>>>>
>>>> The problem is caused by a concurrency scenario:
>>>>
>>>> T0: delete target
>>>> // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
>>>> T1: logout
>>>> // iscsiadm -m node --logout
>>>>
>>>> T0                            T1
>>>>   sdev_store_delete
>>>>    scsi_remove_device
>>>>     device_remove_file
>>>>      __scsi_remove_device
>>>>                               __iscsi_unbind_session
>>>>                                scsi_remove_target
>>>>                                 spin_lock_irqsave
>>>>                                 list_for_each_entry
>>>>       scsi_target_reap // starget->reaf 1 -> 0
>>>>                                 kref_get(&starget->reap_ref);
>>>>                                 // warn use-after-free.
>>>>                                 spin_unlock_irqrestore
>>>>        scsi_target_reap_ref_release
>>>>         scsi_target_destroy
>>>>         ... // delete starget
>>>>                                 scsi_target_reap
>>>>                                 // UAF
>>>>
>>>> When T0 reduces the reference count to 0, but has not been
>>>> released,
>>>> T1 can still enter list_for_each_entry, and then kref_get reports
>>>> UAF.
>>>>
>>>> Fix it by using kref_get_unless_zero() to check for a reference
>>>> count of
>>>> 0.
>>>>
>>>> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
>>>> ---
>>>>    drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
>>>>    1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_sysfs.c
>>>> b/drivers/scsi/scsi_sysfs.c
>>>> index e7893835b99a..0ad357ff4c59 100644
>>>> --- a/drivers/scsi/scsi_sysfs.c
>>>> +++ b/drivers/scsi/scsi_sysfs.c
>>>> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device
>>>> *dev)
>>>>                starget->state == STARGET_CREATED_REMOVE)
>>>>                continue;
>>>>            if (starget->dev.parent == dev || &starget->dev == dev)
>>>> {
>>>> -            kref_get(&starget->reap_ref);
>>>> +
>>>> +            /*
>>>> +             * If starget->reap_ref is reduced to 0, it means
>>>> +             * that other processes are releasing it and
>>>> +             * there is no need to delete it again
>>>> +             */
>>>> +            if (!kref_get_unless_zero(&starget->reap_ref)) {
>>>> +                spin_unlock_irqrestore(shost->host_lock, flags);
>>>> +                goto restart;
> This doesn't seem to be a good idea: you're asking for a live lock
> where the thread that's already reduced the refcount to 0 and will
> eventually remove the target from the list doesn't progress before you
> take the lock again in the restart and then you find the same result
> and go round again (and again ...).

I agree with this, no need to use goto restart.

> Since there should only be one match in the target list and you found
> it and know it's going away, what about break instead of unlock and
> goto restart?

Wouldn't it be better to use continue? If the dev parameter is a
session, more than one target may match.

>
> James

Thanks.

Jinghua.
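
The difference between break and continue discussed above can be sketched
with a simplified model. The names struct model_target, enum on_dying and
take_matches() are hypothetical, and the model takes all matches in a single
pass, whereas the real loop handles one match at a time with the host lock
dropped in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a list entry; not struct scsi_target. */
struct model_target {
	int parent;   /* id of the parent device */
	int refcount; /* 0 means another thread is tearing it down */
};

/* What the walker does when it meets a matching but dying entry. */
enum on_dying { ON_DYING_BREAK, ON_DYING_CONTINUE };

/*
 * Take a reference on every live entry whose parent matches.
 * Returns how many references were taken.
 */
static inline size_t take_matches(struct model_target *t, size_t n,
				  int parent, enum on_dying policy)
{
	size_t taken = 0;

	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (t[i].parent != parent)
			continue;
		if (t[i].refcount == 0) {
			if (policy == ON_DYING_BREAK)
				break;    /* stops: later matches are missed */
			continue;         /* skips only the dying entry */
		}
		t[i].refcount++;
		taken++;
	}
	return taken;
}
```

If two entries share a parent and the first one is dying, ON_DYING_BREAK
takes nothing, while ON_DYING_CONTINUE still reaches the second entry. When
only one entry can ever match, the two policies behave the same.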



* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-03-01 21:15     ` Mike Christie
@ 2023-03-02 16:38       ` Mike Christie
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Christie @ 2023-03-02 16:38 UTC (permalink / raw)
  To: zhongjinghua, zhongjinghua, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3

On 3/1/23 3:15 PM, Mike Christie wrote:
> On 2/28/23 9:40 PM, zhongjinghua wrote:
>>> On 2023/2/13 11:43, Zhong Jinghua wrote:
>>>> From: Zhong Jinghua <zhongjinghua@huawei.com>
>>>>
>>>> A use-after-free problem like below:
>>>>
>>>> BUG: KASAN: use-after-free in scsi_target_reap+0x6c/0x70
>>>>
>>>> Workqueue: scsi_wq_1 __iscsi_unbind_session [scsi_transport_iscsi]
>>>> Call trace:
>>>>   dump_backtrace+0x0/0x320
>>>>   show_stack+0x24/0x30
>>>>   dump_stack+0xdc/0x128
>>>>   print_address_description+0x68/0x278
>>>>   kasan_report+0x1e4/0x308
>>>>   __asan_report_load4_noabort+0x30/0x40
>>>>   scsi_target_reap+0x6c/0x70
>>>>   scsi_remove_target+0x430/0x640
>>>>   __iscsi_unbind_session+0x164/0x268 [scsi_transport_iscsi]
>>>>   process_one_work+0x67c/0x1350
>>>>   worker_thread+0x370/0xf90
>>>>   kthread+0x2a4/0x320
>>>>   ret_from_fork+0x10/0x18
>>>>
>>>> The problem is caused by a concurrency scenario:
>>>>
>>>> T0: delete target
>>>> // echo 1 > /sys/devices/platform/host1/session1/target1:0:0/1:0:0:1/delete
>>>> T1: logout
>>>> // iscsiadm -m node --logout
>>>>
>>>> T0                            T1
>>>>   sdev_store_delete
>>>>    scsi_remove_device
>>>>     device_remove_file
>>>>      __scsi_remove_device
>>>>                               __iscsi_unbind_session
>>>>                                scsi_remove_target
>>>>                                 spin_lock_irqsave
>>>>                                 list_for_each_entry
>>>>       scsi_target_reap // starget->reaf 1 -> 0
>>>>                                 kref_get(&starget->reap_ref);
>>>>                                 // warn use-after-free.
>>>>                                 spin_unlock_irqrestore
>>>>        scsi_target_reap_ref_release
>>>>         scsi_target_destroy
>>>>         ... // delete starget
>>>>                                 scsi_target_reap
>>>>                                 // UAF
>>>>
>>>> When T0 reduces the reference count to 0, but has not been released,
>>>> T1 can still enter list_for_each_entry, and then kref_get reports UAF.
>>>>
>>>> Fix it by using kref_get_unless_zero() to check for a reference count of
>>>> 0.
>>>>
>>>> Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
>>>> ---
>>>>   drivers/scsi/scsi_sysfs.c | 12 +++++++++++-
>>>>   1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>>>> index e7893835b99a..0ad357ff4c59 100644
>>>> --- a/drivers/scsi/scsi_sysfs.c
>>>> +++ b/drivers/scsi/scsi_sysfs.c
>>>> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>>>>               starget->state == STARGET_CREATED_REMOVE)
>>>>               continue;
>>>>           if (starget->dev.parent == dev || &starget->dev == dev) {
>>>> -            kref_get(&starget->reap_ref);
>>>> +
>>>> +            /*
>>>> +             * If starget->reap_ref is reduced to 0, it means
>>>> +             * that other processes are releasing it and
>>>> +             * there is no need to delete it again
>>>> +             */
>>>> +            if (!kref_get_unless_zero(&starget->reap_ref)) {
>>>> +                spin_unlock_irqrestore(shost->host_lock, flags);
>>>> +                goto restart;
>>>> +            }
>>>> +
> 
> Patch looks ok.
> 
> Is there another bug in the existing kref_get_unless_zero(&starget->reap_ref)
> call in scsi_alloc_target?
> 
> I think scsi_alloc_target can find the target on the __targets list, and
> its call to kref_get_unless_zero will succeed if scsi_remove_target has only
> gotten as far as taking its own ref above (it has not yet done
> __scsi_remove_target or the scsi_target_reap call at the end of the function).
> 
> But if scsi_remove_target has set the target state to STARGET_REMOVE, the thread
> that did scsi_alloc_target wouldn't be able to put the target into the correct state
> (the scsi_target_add call will see the target state and return). So later if the
> driver/transport class did scsi_remove_target again to remove the target that
> the scsi_alloc_target call re-added, we see the target->state still in STARGET_REMOVE
> and it won't get deleted.
> 
> Can we solve both issues at the same time?

I looked into this last part of my comment, and I don't think it's possible.
I thought we could just change around when we add/delete the target from the
__targets list and when the target_alloc/destroy callouts are done, but that
is more difficult than it looks.


* Re: [PATCH-next] scsi: fix use-after-free problem in scsi_remove_target
  2023-03-01 19:46 ` Bart Van Assche
@ 2023-03-06  8:29   ` zhongjinghua
  0 siblings, 0 replies; 9+ messages in thread
From: zhongjinghua @ 2023-03-06  8:29 UTC (permalink / raw)
  To: Bart Van Assche, Zhong Jinghua, jejb, martin.petersen
  Cc: linux-scsi, linux-kernel, yi.zhang, yukuai3


On 2023/3/2 3:46, Bart Van Assche wrote:
> On 2/12/23 19:43, Zhong Jinghua wrote:
>> T0                            T1
>>   sdev_store_delete
>>    scsi_remove_device
>>     device_remove_file
>>      __scsi_remove_device
>>                               __iscsi_unbind_session
>>                                scsi_remove_target
>>                                 spin_lock_irqsave
>>                                 list_for_each_entry
>>       scsi_target_reap // starget->reaf 1 -> 0
>
> What is "reaf"? Did you perhaps want to write "reap_ref"?
Yes, I will fix it later.
>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index e7893835b99a..0ad357ff4c59 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1561,7 +1561,17 @@ void scsi_remove_target(struct device *dev)
>>               starget->state == STARGET_CREATED_REMOVE)
>>               continue;
>>           if (starget->dev.parent == dev || &starget->dev == dev) {
>> -            kref_get(&starget->reap_ref);
>> +
>> +            /*
>> +             * If starget->reap_ref is reduced to 0, it means
>> +             * that other processes are releasing it and
>> +             * there is no need to delete it again
>> +             */
>> +            if (!kref_get_unless_zero(&starget->reap_ref)) {
>> +                spin_unlock_irqrestore(shost->host_lock, flags);
>> +                goto restart;
>> +            }
>> +
>>               if (starget->state == STARGET_CREATED)
>>                   starget->state = STARGET_CREATED_REMOVE;
>>               else
>
> The above comment should be made more clear, e.g. as follows: "If the 
> reference count is already zero, skip this target. Calling 
> kref_get_unless_zero() if the reference count is zero is safe because 
> scsi_target_destroy() will wait until the host lock has been released 
> before freeing starget."

Agreed. Thanks for the suggested wording.

I will send v2 later.

>
> Otherwise this patch looks fine to me.
>
> Thanks,
>
> Bart.
>
>
Thanks,

Jinghua



