All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] scsi: sg: NULL pointer dereference
@ 2024-03-05 15:05 Alexander Wetzel
  2024-03-18 17:50 ` [PATCH] scsi: sg: Avoid sg device teardown race Alexander Wetzel
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Wetzel @ 2024-03-05 15:05 UTC (permalink / raw)
  To: Doug Gilbert; +Cc: linux-scsi, Alexander Wetzel

My new notebook is nearly always crashing when I phsically disconnect
(some?) USB devices. For testing I used a SealOne USB device. But I
have the same issue with an USB DVD rom drive.

I have the issue with quite some kernel release. Never had a working
kernel on the new system, so the issue should go back till at least
kernel 6.2. (Not tested that. When needed I'm happy to try older
versions, too.)

The attached patch avoides the null pointer dereference without
aiming to be a proper fix. (Instead I then get the Warning from the
new WARN_ON.)

Obvoiusly sdp->device->request_queue in sg_device_destroy() is
sometimes NULL. And since the system is not always crashing it looks
like some kind of cleanup race.

My "normal" kernel with the issue currently is 6.7.5-gentoo.
But since that one is tainted the kernel Oops and debug patch here is
using 6.8.0-rc6-wt. (Interestingly the BUG report below did not freeze
the system as usual, the system continued to be working.)

The kernel messages are:

usb 1-1: USB disconnect, device number 7
BUG: kernel NULL pointer dereference, address: 0000000000000370
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 683 Comm: kworker/5:11 Not tainted 6.8.0-rc6-wt+ #2
Hardware name: LENOVO 21D6CTO1WW/21D6CTO1WW, BIOS N3FET34W (1.19 ) 03/10/2023
Workqueue: events sg_remove_sfp_usercontext
RIP: 0010:mutex_lock+0x19/0x30
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 89 fb e8 22 dd ff ff 31 c0 65 48 8b 14 25 40 fb 02 00 <f0> 48 0f b1 13 75 06 5b c3 cc cc cc cc 48 89 df 5b eb b4 0f 1f 40
RSP: 0000:ffffbb0d412bfdd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000370 RCX: 00000000820001c6
RDX: ffff985d58152080 RSI: fffff6c0041083c0 RDI: 0000000000000370
RBP: 0000000000000000 R08: ffff985d4420fc28 R09: 00000000820001c6
R10: ffff985d4420fea8 R11: 0000000000000181 R12: 0000000000000370
R13: ffff985d400518b0 R14: ffff985dddd046c0 R15: ffff985d4ebeb328
FS:  0000000000000000(0000) GS:ffff986c6f340000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000370 CR3: 000000015887a000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
 <TASK>
 ? __die+0x1f/0x70
 ? page_fault_oops+0x171/0x4d0
 ? __slab_free+0xe1/0x320
 ? exc_page_fault+0x7b/0x180
 ? asm_exc_page_fault+0x22/0x30
 ? mutex_lock+0x19/0x30
 blk_trace_remove+0x16/0xb0
 sg_device_destroy+0x26/0xa0
 sg_remove_sfp_usercontext+0x12c/0x190
 process_one_work+0x162/0x330
 worker_thread+0x2f1/0x410
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xe1/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2d/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
Modules linked in: uas usb_storage rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables bridge stp llc bnep snd_ctl_led ledtrig_audio snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_sof_probes snd_soc_intel_hda_dsp_common snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda intel_uncore_frequency snd_hda_ext_core intel_uncore_frequency_common snd_soc_acpi_intel_match intel_tcc_cooling snd_soc_acpi soundwire_bus x86_pkg_temp_thermal snd_hda_codec_hdmi intel_powerclamp snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine coretemp snd_hda_intel kvm_intel uvcvideo
 snd_intel_dspcfg snd_intel_sdw_acpi uvc videobuf2_vmalloc snd_hda_codec videobuf2_memops kvm iwlmvm processor_thermal_device_pci snd_hda_core videobuf2_v4l2 btusb processor_thermal_device iTCO_wdt btrtl snd_hwdep irqbypass videobuf2_common intel_pmc_bxt processor_thermal_wt_hint btintel mac80211 snd_pcm nxp_nci_i2c processor_thermal_rfim iTCO_vendor_support btbcm rapl nxp_nci snd_timer processor_thermal_rapl mei_wdt videodev vfat btmtk fat libarc4 intel_rapl_msr intel_rapl_common bluetooth mc nci intel_cstate processor_thermal_wt_req iwlwifi snd mei_me processor_thermal_power_floor i2c_i801 think_lmi intel_uncore pcspkr firmware_attributes_class i2c_smbus wmi_bmof mei idma64 soundcore processor_thermal_mbox nfc intel_pmc_core cfg80211 rfkill intel_vsec int3403_thermal int3400_thermal int340x_thermal_zone pmt_telemetry intel_hid acpi_thermal_rel pmt_class sparse_keymap acpi_tad acpi_pad joydev loop fuse nfnetlink mmc_block nvme nvme_core crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni
 rtsx_pci_sdmmc polyval_generic ghash_clmulni_intel sha512_ssse3 mmc_core sha256_ssse3 ucsi_acpi hid_multitouch sha1_ssse3 thunderbolt typec_ucsi rtsx_pci vmd typec i2c_hid_acpi i2c_hid pinctrl_alderlake serio_raw
CR2: 0000000000000370
---[ end trace 0000000000000000 ]---
RIP: 0010:mutex_lock+0x19/0x30
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 89 fb e8 22 dd ff ff 31 c0 65 48 8b 14 25 40 fb 02 00 <f0> 48 0f b1 13 75 06 5b c3 cc cc cc cc 48 89 df 5b eb b4 0f 1f 40
RSP: 0000:ffffbb0d412bfdd0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000370 RCX: 00000000820001c6
RDX: ffff985d58152080 RSI: fffff6c0041083c0 RDI: 0000000000000370
RBP: 0000000000000000 R08: ffff985d4420fc28 R09: 00000000820001c6
R10: ffff985d4420fea8 R11: 0000000000000181 R12: 0000000000000370
R13: ffff985d400518b0 R14: ffff985dddd046c0 R15: ffff985d4ebeb328
FS:  0000000000000000(0000) GS:ffff986c6f340000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000370 CR3: 000000015887a000 CR4: 0000000000750ef0
PKRU: 55555554
note: kworker/5:11[683] exited with irqs disabled
---
 drivers/scsi/sg.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 86210e4dd0d3..94c07cd318a0 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1575,8 +1575,10 @@ sg_device_destroy(struct kref *kref)
 	 * any other cleanup.
 	 */
 
-	blk_trace_remove(q);
-	blk_put_queue(q);
+	if (!WARN_ON(!q)) {
+		blk_trace_remove(q);
+		blk_put_queue(q);
+	}
 
 	write_lock_irqsave(&sg_index_lock, flags);
 	idr_remove(&sg_index_idr, sdp->index);
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH] scsi: sg: Avoid sg device teardown race
  2024-03-05 15:05 [BUG] scsi: sg: NULL pointer dereference Alexander Wetzel
@ 2024-03-18 17:50 ` Alexander Wetzel
  2024-03-20 11:08   ` [PATCH v2] " Alexander Wetzel
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Wetzel @ 2024-03-18 17:50 UTC (permalink / raw)
  To: dgilbert; +Cc: linux-scsi, Alexander Wetzel, stable

sg_remove_sfp_usercontext() must not use sg_device_destroy() after
calling scsi_device_put().

sg_device_destroy() is accessling the device queue. Which will be set to
NULL if scsi_device_put() removes the last reference to the sg device.

Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
---

This is my best shot for a real fix of the issue.
I confirmed with printk's that I get the NULL pointer freeze ony when
scsi_device_put() is deleting the last reference to the device.
In the cases where it's not crashing there is still a reference left
after the call.

I don't see any obvious down side of simply swapping the calls.
The alternative would by my first patch, just without the WARN_ON.

Alexander
---
 drivers/scsi/sg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 86210e4dd0d3..80e0d1981191 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			"sg_remove_sfp: sfp=0x%p\n", sfp));
 	kfree(sfp);
 
-	scsi_device_put(sdp->device);
 	kref_put(&sdp->d_ref, sg_device_destroy);
+	scsi_device_put(sdp->device);
 	module_put(THIS_MODULE);
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-18 17:50 ` [PATCH] scsi: sg: Avoid sg device teardown race Alexander Wetzel
@ 2024-03-20 11:08   ` Alexander Wetzel
  2024-03-20 11:16     ` Greg KH
                       ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Alexander Wetzel @ 2024-03-20 11:08 UTC (permalink / raw)
  To: dgilbert; +Cc: linux-scsi, Alexander Wetzel, stable

sg_remove_sfp_usercontext() must not use sg_device_destroy() after
calling scsi_device_put().

sg_device_destroy() is accessing the parent scsi device request_queue.
Which will already be set to NULL when the preceding call to
scsi_device_put() removed the last reference to the parent scsi device.

The resulting NULL pointer exception will then crash the kernel.

Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
---
Changes compared to V1:
Reworked the commit message

Alexander
---
 drivers/scsi/sg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 86210e4dd0d3..80e0d1981191 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			"sg_remove_sfp: sfp=0x%p\n", sfp));
 	kfree(sfp);
 
-	scsi_device_put(sdp->device);
 	kref_put(&sdp->d_ref, sg_device_destroy);
+	scsi_device_put(sdp->device);
 	module_put(THIS_MODULE);
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-20 11:08   ` [PATCH v2] " Alexander Wetzel
@ 2024-03-20 11:16     ` Greg KH
  2024-03-20 11:42       ` Alexander Wetzel
  2024-03-20 15:02     ` Bart Van Assche
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2024-03-20 11:16 UTC (permalink / raw)
  To: Alexander Wetzel; +Cc: dgilbert, linux-scsi, stable

On Wed, Mar 20, 2024 at 12:08:09PM +0100, Alexander Wetzel wrote:
> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
> 
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
> 
> The resulting NULL pointer exception will then crash the kernel.
> 
> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
> ---
> Changes compared to V1:
> Reworked the commit message

What commit id does this fix?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-20 11:16     ` Greg KH
@ 2024-03-20 11:42       ` Alexander Wetzel
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Wetzel @ 2024-03-20 11:42 UTC (permalink / raw)
  To: Greg KH; +Cc: dgilbert, linux-scsi, stable

On 20.03.24 12:16, Greg KH wrote:
> On Wed, Mar 20, 2024 at 12:08:09PM +0100, Alexander Wetzel wrote:
>> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
>> calling scsi_device_put().
>>
>> sg_device_destroy() is accessing the parent scsi device request_queue.
>> Which will already be set to NULL when the preceding call to
>> scsi_device_put() removed the last reference to the parent scsi device.
>>
>> The resulting NULL pointer exception will then crash the kernel.
>>
>> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
>> ---
>> Changes compared to V1:
>> Reworked the commit message
> 
> What commit id does this fix?

It's a combination of patches. I think
db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage") was the 
one which finally broke it.

The in the hindsight wrong sequence was introduced via:
c6517b7942fa ("[SCSI] sg: fix races during device removal")
and cc833acbee9d ("sg: O_EXCL and other lock handling")

Alexander

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-20 11:08   ` [PATCH v2] " Alexander Wetzel
  2024-03-20 11:16     ` Greg KH
@ 2024-03-20 15:02     ` Bart Van Assche
  2024-03-20 16:58       ` Alexander Wetzel
  2024-03-20 17:46     ` Bart Van Assche
  2024-03-20 21:30     ` [PATCH v3] " Alexander Wetzel
  3 siblings, 1 reply; 12+ messages in thread
From: Bart Van Assche @ 2024-03-20 15:02 UTC (permalink / raw)
  To: Alexander Wetzel, dgilbert; +Cc: linux-scsi, stable

On 3/20/24 04:08, Alexander Wetzel wrote:
> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
> 
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
> 
> The resulting NULL pointer exception will then crash the kernel.
> 
> Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
> ---
> Changes compared to V1:
> Reworked the commit message
> 
> Alexander
> ---
>   drivers/scsi/sg.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 86210e4dd0d3..80e0d1981191 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>   			"sg_remove_sfp: sfp=0x%p\n", sfp));
>   	kfree(sfp);
>   
> -	scsi_device_put(sdp->device);
>   	kref_put(&sdp->d_ref, sg_device_destroy);
> +	scsi_device_put(sdp->device);
>   	module_put(THIS_MODULE);
>   }
>   

Is it guaranteed that the above kref_put() call is the last kref_put()
call on sdp->d_ref? If not, how about inserting code between the
kref_put() call and the scsi_device_put() call that waits until
sg_device_destroy() has finished?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-20 15:02     ` Bart Van Assche
@ 2024-03-20 16:58       ` Alexander Wetzel
  2024-03-20 17:45         ` Bart Van Assche
  0 siblings, 1 reply; 12+ messages in thread
From: Alexander Wetzel @ 2024-03-20 16:58 UTC (permalink / raw)
  To: Bart Van Assche, dgilbert; +Cc: linux-scsi, stable

On 20.03.24 16:02, Bart Van Assche wrote:
> On 3/20/24 04:08, Alexander Wetzel wrote:
>> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
>> calling scsi_device_put().
>>
>> sg_device_destroy() is accessing the parent scsi device request_queue.
>> Which will already be set to NULL when the preceding call to
>> scsi_device_put() removed the last reference to the parent scsi device.
>>
>> The resulting NULL pointer exception will then crash the kernel.
>>
>> Link: 
>> https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
>> ---
>> Changes compared to V1:
>> Reworked the commit message
>>
>> Alexander
>> ---
>>   drivers/scsi/sg.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>> index 86210e4dd0d3..80e0d1981191 100644
>> --- a/drivers/scsi/sg.c
>> +++ b/drivers/scsi/sg.c
>> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>>               "sg_remove_sfp: sfp=0x%p\n", sfp));
>>       kfree(sfp);
>> -    scsi_device_put(sdp->device);
>>       kref_put(&sdp->d_ref, sg_device_destroy);
>> +    scsi_device_put(sdp->device);
>>       module_put(THIS_MODULE);
>>   }
> 
> Is it guaranteed that the above kref_put() call is the last kref_put()
> call on sdp->d_ref? If not, how about inserting code between the
> kref_put() call and the scsi_device_put() call that waits until
> sg_device_destroy() has finished?
> 

While I'm not familiar with the code, I'm pretty sure kref_put() is 
removing the last reference to d_ref here. Anything else would be odd, 
based on my - really sketchy - understanding of the flows.

Also waiting for another process looks wrong. I guess we would then have 
to delay the call to sg_release().

And at least for me it's always the last d_ref reference.
I changed the section to:

         kref_put(&sdp->d_ref, sg_device_destroy);
         printk("XXXX scsi=%u, dref=%u\n", \
		kref_read(&sdp->device->sdev_gendev.kobj.kref), \
		kref_read(&sdp->d_ref));
         scsi_device_put(sdp->device);

And connected/disconnected my test USB device a few times:
  XXXX scsi=2, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=2, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0
  XXXX scsi=1, dref=0

(scsi=1 are the cases which would cause the NULL pointer exceptions with 
the unpatched driver.)

Alexander


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-20 16:58       ` Alexander Wetzel
@ 2024-03-20 17:45         ` Bart Van Assche
  0 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2024-03-20 17:45 UTC (permalink / raw)
  To: Alexander Wetzel, dgilbert; +Cc: linux-scsi, stable

On 3/20/24 09:58, Alexander Wetzel wrote:
> While I'm not familiar with the code, I'm pretty sure kref_put() is 
> removing the last reference to d_ref here. Anything else would be odd, 
> based on my - really sketchy - understanding of the flows.

Please document this by adding a WARN_ON_ONCE() statement before the
kref_put() call that checks that the refcount equals one.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] scsi: sg: Avoid sg device teardown race
  2024-03-20 11:08   ` [PATCH v2] " Alexander Wetzel
  2024-03-20 11:16     ` Greg KH
  2024-03-20 15:02     ` Bart Van Assche
@ 2024-03-20 17:46     ` Bart Van Assche
  2024-03-20 21:30     ` [PATCH v3] " Alexander Wetzel
  3 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2024-03-20 17:46 UTC (permalink / raw)
  To: Alexander Wetzel, dgilbert; +Cc: linux-scsi, stable

On 3/20/24 04:08, Alexander Wetzel wrote:
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 86210e4dd0d3..80e0d1981191 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -2232,8 +2232,8 @@ sg_remove_sfp_usercontext(struct work_struct *work)
>   			"sg_remove_sfp: sfp=0x%p\n", sfp));
>   	kfree(sfp);
>   
> -	scsi_device_put(sdp->device);
>   	kref_put(&sdp->d_ref, sg_device_destroy);
> +	scsi_device_put(sdp->device);
>   	module_put(THIS_MODULE);
>   }

Since sg_device_destroy() frees struct sg_device and since the
scsi_device_put() call reads from struct sg_device, does this patch
introduce a use-after-free? Has it been tested with KASAN enabled?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] scsi: sg: Avoid sg device teardown race
  2024-03-20 11:08   ` [PATCH v2] " Alexander Wetzel
                       ` (2 preceding siblings ...)
  2024-03-20 17:46     ` Bart Van Assche
@ 2024-03-20 21:30     ` Alexander Wetzel
  2024-03-20 21:39       ` Bart Van Assche
  2024-03-26  1:21       ` Martin K. Petersen
  3 siblings, 2 replies; 12+ messages in thread
From: Alexander Wetzel @ 2024-03-20 21:30 UTC (permalink / raw)
  To: dgilbert; +Cc: linux-scsi, bvanassche, gregkh, Alexander Wetzel, stable

sg_remove_sfp_usercontext() must not use sg_device_destroy() after
calling scsi_device_put().

sg_device_destroy() is accessing the parent scsi device request_queue.
Which will already be set to NULL when the preceding call to
scsi_device_put() removed the last reference to the parent scsi device.

The resulting NULL pointer exception will then crash the kernel.

Link: https://lore.kernel.org/r/20240305150509.23896-1-Alexander@wetzel-home.de
Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de>
---
Changes compared to V2:
- Fixed the use-after-free pointed out by Bart
- Added the WARN_ON_ONCE() requested by Bart
- added the Fixes tag pointed out by Greg

This patch has now been tested with KASAN enabled. I also  verified,
that db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
introduced the issue.

Thanks for all your help!

Alexander
---
 drivers/scsi/sg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 86210e4dd0d3..ff6894ce5404 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -2207,6 +2207,7 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 {
 	struct sg_fd *sfp = container_of(work, struct sg_fd, ew.work);
 	struct sg_device *sdp = sfp->parentdp;
+	struct scsi_device *device = sdp->device;
 	Sg_request *srp;
 	unsigned long iflags;
 
@@ -2232,8 +2233,9 @@ sg_remove_sfp_usercontext(struct work_struct *work)
 			"sg_remove_sfp: sfp=0x%p\n", sfp));
 	kfree(sfp);
 
-	scsi_device_put(sdp->device);
+	WARN_ON_ONCE(kref_read(&sdp->d_ref) != 1);
 	kref_put(&sdp->d_ref, sg_device_destroy);
+	scsi_device_put(device);
 	module_put(THIS_MODULE);
 }
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] scsi: sg: Avoid sg device teardown race
  2024-03-20 21:30     ` [PATCH v3] " Alexander Wetzel
@ 2024-03-20 21:39       ` Bart Van Assche
  2024-03-26  1:21       ` Martin K. Petersen
  1 sibling, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2024-03-20 21:39 UTC (permalink / raw)
  To: Alexander Wetzel, dgilbert; +Cc: linux-scsi, gregkh, stable

On 3/20/24 14:30, Alexander Wetzel wrote:
> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
> 
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
> 
> The resulting NULL pointer exception will then crash the kernel.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] scsi: sg: Avoid sg device teardown race
  2024-03-20 21:30     ` [PATCH v3] " Alexander Wetzel
  2024-03-20 21:39       ` Bart Van Assche
@ 2024-03-26  1:21       ` Martin K. Petersen
  1 sibling, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2024-03-26  1:21 UTC (permalink / raw)
  To: dgilbert, Alexander Wetzel
  Cc: Martin K . Petersen, linux-scsi, bvanassche, gregkh, stable

On Wed, 20 Mar 2024 22:30:32 +0100, Alexander Wetzel wrote:

> sg_remove_sfp_usercontext() must not use sg_device_destroy() after
> calling scsi_device_put().
> 
> sg_device_destroy() is accessing the parent scsi device request_queue.
> Which will already be set to NULL when the preceding call to
> scsi_device_put() removed the last reference to the parent scsi device.
> 
> [...]

Applied to 6.9/scsi-fixes, thanks!

[1/1] scsi: sg: Avoid sg device teardown race
      https://git.kernel.org/mkp/scsi/c/27f58c04a8f4

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-03-26  1:22 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-05 15:05 [BUG] scsi: sg: NULL pointer dereference Alexander Wetzel
2024-03-18 17:50 ` [PATCH] scsi: sg: Avoid sg device teardown race Alexander Wetzel
2024-03-20 11:08   ` [PATCH v2] " Alexander Wetzel
2024-03-20 11:16     ` Greg KH
2024-03-20 11:42       ` Alexander Wetzel
2024-03-20 15:02     ` Bart Van Assche
2024-03-20 16:58       ` Alexander Wetzel
2024-03-20 17:45         ` Bart Van Assche
2024-03-20 17:46     ` Bart Van Assche
2024-03-20 21:30     ` [PATCH v3] " Alexander Wetzel
2024-03-20 21:39       ` Bart Van Assche
2024-03-26  1:21       ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.