All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth
@ 2022-09-29 18:53 bugzilla-daemon
  2022-09-30  8:53 ` Greg KH
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-29 18:53 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

            Bug ID: 216543
           Summary: kernel NULL pointer dereference
                    usb_hcd_alloc_bandwidth
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.19.10
          Hardware: AMD
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: USB
          Assignee: drivers_usb@kernel-bugs.kernel.org
          Reporter: nazar@mokrynskyi.com
        Regression: No

With a flaky USB 3.0 cable (3m extension + 2m cable + 90 degree adapter) and
Logitech BRIO webcam I got exactly the same null pointer dereference twice
already.

I'm sorry for not using vanilla kernel upfront, but I strongly doubt something
as fundamental as this would be different in Xanmod kernel.

Call traces are quite similar at the top, so while triggered from different
places, the actual bug must be the same.

Here are two instances (from different boots):
[64977.148098] BUG: kernel NULL pointer dereference, address: 0000000000000000
[64977.148101] #PF: supervisor read access in kernel mode
[64977.148102] #PF: error_code(0x0000) - not-present page
[64977.148103] PGD 101370067 P4D 101370067 PUD 0
[64977.148105] Oops: 0000 [#1] SMP NOPTI
[64977.148107] CPU: 14 PID: 27951 Comm: VideoCapture Not tainted
5.19.10-xanmod1-x64v2 #0~20220920.git017c598
[64977.148109] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550
VISION D, BIOS F15d 07/20/2022
[64977.148109] RIP: 0010:usb_ifnum_to_if+0x34/0x60
[64977.148113] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00
00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b 0a
0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
[64977.148114] RSP: 0018:ffffb20951407bb0 EFLAGS: 00010202
[64977.148115] RAX: ffff8cfbbc618098 RBX: ffff8ceb844cc800 RCX:
0000000000000004
[64977.148116] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff8cfbbc6180c0
[64977.148117] RBP: 0000000000000000 R08: 0000000080000000 R09:
ffffffff8f590de8
[64977.148117] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff8cf67c70f398
[64977.148118] R13: 0000000000000000 R14: ffff8cf67c70f208 R15:
ffff8ceb8123c000
[64977.148119] FS:  00007f5f51379640(0000) GS:ffff8d0a3ed80000(0000)
knlGS:0000000000000000
[64977.148120] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[64977.148120] CR2: 0000000000000000 CR3: 000000023b842000 CR4:
0000000000750ee0
[64977.148121] PKRU: 55555554
[64977.148122] Call Trace:
[64977.148123]  <TASK>
[64977.148124]  usb_hcd_alloc_bandwidth+0x241/0x360
[64977.148127]  usb_set_interface+0x11d/0x340
[64977.148130]  uvc_video_start_transfer+0x17b/0x4b0 [uvcvideo]
[64977.148134]  uvc_video_start_streaming+0x6f/0xc0 [uvcvideo]
[64977.148137]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
[64977.148139]  vb2_start_streaming+0x7f/0x120 [videobuf2_common]
[64977.148142]  vb2_core_streamon+0x53/0xb0 [videobuf2_common]
[64977.148144]  uvc_queue_streamon+0x22/0x40 [uvcvideo]
[64977.148146]  uvc_ioctl_streamon+0x33/0x50 [uvcvideo]
[64977.148148]  __video_do_ioctl+0x197/0x3e0 [videodev]
[64977.148153]  ? kernel_clone+0xfb/0x3d0
[64977.148156]  video_usercopy+0x2b3/0x670 [videodev]
[64977.148160]  ? v4l_print_control+0x20/0x20 [videodev]
[64977.148163]  ? handle_mm_fault+0xcb/0x2b0
[64977.148166]  v4l2_ioctl+0x44/0x50 [videodev]
[64977.148169]  __x64_sys_ioctl+0x8b/0xc0
[64977.148171]  do_syscall_64+0x5b/0x80
[64977.148174]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[64977.148176] RIP: 0033:0x7f5f8c300aff
[64977.148177] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00
00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0
3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[64977.148178] RSP: 002b:00007f5f51378320 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[64977.148179] RAX: ffffffffffffffda RBX: 00007f5f513783a0 RCX:
00007f5f8c300aff
[64977.148180] RDX: 00007f5f513783c0 RSI: 0000000040045612 RDI:
0000000000000178
[64977.148180] RBP: 00007f5f51378630 R08: 00007f5f331b1640 R09:
00007f5f5137811f
[64977.148181] R10: 0000000000000008 R11: 0000000000000246 R12:
00007f5f513783c0
[64977.148181] R13: 00007f5d8c8eb390 R14: 00007f5d8c8eb000 R15:
0000000000000000
[64977.148182]  </TASK>
[64977.148183] Modules linked in: xt_nat veth nf_conntrack_netlink xfrm_user
xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack
ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc
overlay nvme_fabrics binfmt_misc nls_iso8859_1 iwlmvm snd_hda_codec_hdmi
sch_fq_codel intel_rapl_msr snd_hda_intel intel_rapl_common uvcvideo
snd_intel_dspcfg mac80211 snd_usb_audio libarc4 videobuf2_vmalloc
snd_intel_sdw_acpi edac_mce_amd videobuf2_memops snd_hda_codec snd_usbmidi_lib
videobuf2_v4l2 snd_hda_core videobuf2_common snd_rawmidi nct6775_core snd_hwdep
snd_seq_device videodev iwlwifi btusb hwmon_vid kvm_amd btrtl vfio_pci
input_leds joydev snd_pcm btbcm mc btintel vfio_pci_core iwlmei snd_timer kvm
vfio_virqfd btmtk cfg80211 irqbypass bluetooth snd ucsi_ccg cuse mei ccp
typec_ucsi soundcore lp k10temp serio_raw wmi_bmof typec ecdh_generic ecc
gigabyte_wmi rapl mac_hid parport msr bfq
[64977.148209]  ramoops reed_solomon pstore_blk pstore_zone efi_pstore
ip_tables x_tables autofs4 btrfs blake2b_generic dm_crypt raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid1 raid0 multipath linear uas usb_storage hid_generic usbhid hid
amdgpu iommu_v2 gpu_sched drm_ttm_helper ttm drm_display_helper cec rc_core
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd igb cryptd psmouse
i2c_nvidia_gpu drm i2c_piix4 nvme ahci i2c_ccgx_ucsi dca xhci_pci i2c_algo_bit
thunderbolt nvme_core libahci xhci_pci_renesas wmi gpio_amdpt
[64977.148231] CR2: 0000000000000000
[64977.148232] ---[ end trace 0000000000000000 ]---
[64977.308559] RIP: 0010:usb_ifnum_to_if+0x34/0x60
[64977.308566] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00
00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b 0a
0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
[64977.308568] RSP: 0018:ffffb20951407bb0 EFLAGS: 00010202
[64977.308570] RAX: ffff8cfbbc618098 RBX: ffff8ceb844cc800 RCX:
0000000000000004
[64977.308571] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff8cfbbc6180c0
[64977.308572] RBP: 0000000000000000 R08: 0000000080000000 R09:
ffffffff8f590de8
[64977.308574] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff8cf67c70f398
[64977.308574] R13: 0000000000000000 R14: ffff8cf67c70f208 R15:
ffff8ceb8123c000
[64977.308576] FS:  00007f5f51379640(0000) GS:ffff8d0a3ed80000(0000)
knlGS:0000000000000000
[64977.308577] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[64977.308578] CR2: 0000000000000000 CR3: 000000023b842000 CR4:
0000000000750ee0
[64977.308579] PKRU: 55555554


[112221.564394] usb 10-4: USB disconnect, device number 8
[112222.544520] BUG: kernel NULL pointer dereference, address: 0000000000000000
[112222.544524] #PF: supervisor read access in kernel mode
[112222.544525] #PF: error_code(0x0000) - not-present page
[112222.544526] PGD 0 P4D 0
[112222.544528] Oops: 0000 [#1] SMP NOPTI
[112222.544530] CPU: 9 PID: 9584 Comm: VideoCapture Not tainted
5.19.10-xanmod1-x64v2 #0~20220920.git017c598
[112222.544533] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550
VISION D, BIOS F15d 07/20/2022
[112222.544533] RIP: 0010:usb_ifnum_to_if+0x34/0x60
[112222.544538] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00
00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b 0a
0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
[112222.544540] RSP: 0018:ffffb3bb10eb7b70 EFLAGS: 00010206
[112222.544541] RAX: ffff91ccf8026898 RBX: ffff91ccc45b9800 RCX:
0000000000000005
[112222.544542] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff91ccf80268c8
[112222.544543] RBP: 0000000000000000 R08: 0000000080000000 R09:
ffffffffaff90de8
[112222.544544] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff91ccdf4484f8
[112222.544544] R13: 0000000000000000 R14: ffff91ccdf448408 R15:
ffff91ccdef7e000
[112222.544545] FS:  00007f8f9efae640(0000) GS:ffff91eb7ec40000(0000)
knlGS:0000000000000000
[112222.544546] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112222.544547] CR2: 0000000000000000 CR3: 0000000110eb8000 CR4:
0000000000750ee0
[112222.544548] PKRU: 55555554
[112222.544549] Call Trace:
[112222.544550]  <TASK>
[112222.544551]  usb_hcd_alloc_bandwidth+0x241/0x360
[112222.544555]  usb_set_interface+0x11d/0x340
[112222.544558]  uvc_video_start_transfer+0x17b/0x4b0 [uvcvideo]
[112222.544563]  uvc_video_start_streaming+0x6f/0xc0 [uvcvideo]
[112222.544566]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
[112222.544570]  vb2_start_streaming+0x7f/0x120 [videobuf2_common]
[112222.544573]  vb2_core_streamon+0x53/0xb0 [videobuf2_common]
[112222.544575]  uvc_queue_streamon+0x22/0x40 [uvcvideo]
[112222.544578]  uvc_ioctl_streamon+0x33/0x50 [uvcvideo]
[112222.544581]  __video_do_ioctl+0x197/0x3e0 [videodev]
[112222.544588]  ? __do_sys_clone3+0xc2/0x100
[112222.544590]  video_usercopy+0x2b3/0x670 [videodev]
[112222.544596]  ? v4l_print_control+0x20/0x20 [videodev]
[112222.544600]  ? sigprocmask+0xa0/0xd0
[112222.544602]  ? sigprocmask+0xa0/0xd0
[112222.544602]  ? exit_to_user_mode_prepare+0x2b/0x130
[112222.544605]  ? syscall_exit_to_user_mode+0x22/0x50
[112222.544607]  ? do_syscall_64+0x67/0x80
[112222.544609]  v4l2_ioctl+0x44/0x50 [videodev]
[112222.544613]  __x64_sys_ioctl+0x8b/0xc0
[112222.544616]  do_syscall_64+0x5b/0x80
[112222.544618]  ? syscall_exit_to_user_mode+0x22/0x50
[112222.544619]  ? exit_to_user_mode_prepare+0x2b/0x130
[112222.544620]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[112222.544622] RIP: 0033:0x7f8fdc256aff
[112222.544624] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00
00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0
3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[112222.544625] RSP: 002b:00007f8f9efad320 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[112222.544626] RAX: ffffffffffffffda RBX: 00007f8f9efad3a0 RCX:
00007f8fdc256aff
[112222.544627] RDX: 00007f8f9efad3c0 RSI: 0000000040045612 RDI:
000000000000003b
[112222.544628] RBP: 00007f8f9efad630 R08: 00007f8f800c7640 R09:
00007f8f9efad11f
[112222.544629] R10: 0000000000000008 R11: 0000000000000246 R12:
00007f8f9efad3c0
[112222.544629] R13: 00007f8f267c8390 R14: 00007f8f267c8000 R15:
0000000000000000
[112222.544631]  </TASK>
[112222.544631] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack
ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc
overlay nvme_fabrics binfmt_misc nls_iso8859_1 snd_hda_codec_hdmi sch_fq_codel
intel_rapl_msr snd_hda_intel iwlmvm uvcvideo intel_rapl_common
videobuf2_vmalloc mac80211 snd_intel_dspcfg libarc4 videobuf2_memops
snd_usb_audio snd_intel_sdw_acpi videobuf2_v4l2 edac_mce_amd snd_hda_codec
nct6775_core snd_usbmidi_lib btusb videobuf2_common btrtl snd_rawmidi
snd_hda_core hwmon_vid videodev btbcm snd_seq_device snd_hwdep iwlwifi btintel
kvm_amd snd_pcm btmtk vfio_pci joydev input_leds mc iwlmei kvm bluetooth
cfg80211 snd_timer vfio_pci_core ucsi_ccg snd typec_ucsi mei ccp ecdh_generic
typec soundcore serio_raw gigabyte_wmi ecc k10temp rapl wmi_bmof mac_hid
vfio_virqfd irqbypass cuse lp parport msr bfq ramoops reed_solomon pstore_blk
pstore_zone efi_pstore ip_tables x_tables
[112222.544663]  autofs4 btrfs blake2b_generic dm_crypt raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid1 raid0 multipath linear uas usb_storage hid_generic usbhid
amdgpu hid iommu_v2 gpu_sched drm_ttm_helper ttm drm_display_helper cec rc_core
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd igb cryptd
i2c_nvidia_gpu dca psmouse drm i2c_piix4 i2c_ccgx_ucsi nvme i2c_algo_bit
thunderbolt ahci xhci_pci nvme_core libahci xhci_pci_renesas wmi gpio_amdpt
[112222.544683] CR2: 0000000000000000
[112222.544684] ---[ end trace 0000000000000000 ]---
[112222.711095] RIP: 0010:usb_ifnum_to_if+0x34/0x60
[112222.711101] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00
00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b 0a
0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
[112222.711103] RSP: 0018:ffffb3bb10eb7b70 EFLAGS: 00010206
[112222.711104] RAX: ffff91ccf8026898 RBX: ffff91ccc45b9800 RCX:
0000000000000005
[112222.711105] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff91ccf80268c8
[112222.711106] RBP: 0000000000000000 R08: 0000000080000000 R09:
ffffffffaff90de8
[112222.711106] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff91ccdf4484f8
[112222.711107] R13: 0000000000000000 R14: ffff91ccdf448408 R15:
ffff91ccdef7e000
[112222.711108] FS:  00007f8f9efae640(0000) GS:ffff91eb7ec40000(0000)
knlGS:0000000000000000
[112222.711109] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112222.711109] CR2: 0000000000000000 CR3: 0000000110eb8000 CR4:
0000000000750ee0
[112222.711110] PKRU: 55555554

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
@ 2022-09-30  8:53 ` Greg KH
  2022-09-30  8:53 ` [Bug 216543] " bugzilla-daemon
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2022-09-30  8:53 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-usb

On Thu, Sep 29, 2022 at 06:53:46PM +0000, bugzilla-daemon@kernel.org wrote:
> With a flaky USB 3.0 cable (3m extension + 2m cable + 90 degree adapter) and
> Logitech BRIO webcam I got exactly the same null pointer dereference twice
> already.

That's really an unstable and unsupported system, sorry.  If you fix
your cable it should work properly, right?

> Here are two instances (from different boots):
> [64977.148098] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [64977.148101] #PF: supervisor read access in kernel mode
> [64977.148102] #PF: error_code(0x0000) - not-present page
> [64977.148103] PGD 101370067 P4D 101370067 PUD 0
> [64977.148105] Oops: 0000 [#1] SMP NOPTI
> [64977.148107] CPU: 14 PID: 27951 Comm: VideoCapture Not tainted
> 5.19.10-xanmod1-x64v2 #0~20220920.git017c598

What about any kernel log messages from right before this crashed?
There should be some disconnect or other USB messages, right?  Specifics
here would be good to see.


> [64977.148109] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550
> VISION D, BIOS F15d 07/20/2022
> [64977.148109] RIP: 0010:usb_ifnum_to_if+0x34/0x60
> [64977.148113] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00
> 00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b 0a
> 0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
> [64977.148114] RSP: 0018:ffffb20951407bb0 EFLAGS: 00010202
> [64977.148115] RAX: ffff8cfbbc618098 RBX: ffff8ceb844cc800 RCX:
> 0000000000000004
> [64977.148116] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> ffff8cfbbc6180c0
> [64977.148117] RBP: 0000000000000000 R08: 0000000080000000 R09:
> ffffffff8f590de8
> [64977.148117] R10: 0000000000000001 R11: 0000000000000001 R12:
> ffff8cf67c70f398
> [64977.148118] R13: 0000000000000000 R14: ffff8cf67c70f208 R15:
> ffff8ceb8123c000
> [64977.148119] FS:  00007f5f51379640(0000) GS:ffff8d0a3ed80000(0000)
> knlGS:0000000000000000
> [64977.148120] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64977.148120] CR2: 0000000000000000 CR3: 000000023b842000 CR4:
> 0000000000750ee0
> [64977.148121] PKRU: 55555554
> [64977.148122] Call Trace:
> [64977.148123]  <TASK>
> [64977.148124]  usb_hcd_alloc_bandwidth+0x241/0x360
> [64977.148127]  usb_set_interface+0x11d/0x340
> [64977.148130]  uvc_video_start_transfer+0x17b/0x4b0 [uvcvideo]

This isn't good, we shouldn't crash when a device is removed, but this
might be an issue with some reference counting.  More log messages might
help out here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
  2022-09-30  8:53 ` Greg KH
@ 2022-09-30  8:53 ` bugzilla-daemon
  2022-09-30 11:32 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-30  8:53 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #1 from gregkh@linuxfoundation.org ---
On Thu, Sep 29, 2022 at 06:53:46PM +0000, bugzilla-daemon@kernel.org wrote:
> With a flaky USB 3.0 cable (3m extension + 2m cable + 90 degree adapter) and
> Logitech BRIO webcam I got exactly the same null pointer dereference twice
> already.

That's really an unstable and unsupported system, sorry.  If you fix
your cable it should work properly, right?

> Here are two instances (from different boots):
> [64977.148098] BUG: kernel NULL pointer dereference, address:
> 0000000000000000
> [64977.148101] #PF: supervisor read access in kernel mode
> [64977.148102] #PF: error_code(0x0000) - not-present page
> [64977.148103] PGD 101370067 P4D 101370067 PUD 0
> [64977.148105] Oops: 0000 [#1] SMP NOPTI
> [64977.148107] CPU: 14 PID: 27951 Comm: VideoCapture Not tainted
> 5.19.10-xanmod1-x64v2 #0~20220920.git017c598

What about any kernel log messages from right before this crashed?
There should be some disconnect or other USB messages, right?  Specifics
here would be good to see.


> [64977.148109] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION
> D/B550
> VISION D, BIOS F15d 07/20/2022
> [64977.148109] RIP: 0010:usb_ifnum_to_if+0x34/0x60
> [64977.148113] Code: 74 33 0f b6 4a 04 84 c9 74 33 83 e9 01 48 8d 82 98 00 00
> 00 48 8d bc ca a0 00 00 00 eb 09 48 83 c0 08 48 39 f8 74 16 48 8b 10 <48> 8b
> 0a
> 0f b6 49 02 39 f1 75 e9 48 89 d0 c3 cc cc cc cc 31 d2 48
> [64977.148114] RSP: 0018:ffffb20951407bb0 EFLAGS: 00010202
> [64977.148115] RAX: ffff8cfbbc618098 RBX: ffff8ceb844cc800 RCX:
> 0000000000000004
> [64977.148116] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> ffff8cfbbc6180c0
> [64977.148117] RBP: 0000000000000000 R08: 0000000080000000 R09:
> ffffffff8f590de8
> [64977.148117] R10: 0000000000000001 R11: 0000000000000001 R12:
> ffff8cf67c70f398
> [64977.148118] R13: 0000000000000000 R14: ffff8cf67c70f208 R15:
> ffff8ceb8123c000
> [64977.148119] FS:  00007f5f51379640(0000) GS:ffff8d0a3ed80000(0000)
> knlGS:0000000000000000
> [64977.148120] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64977.148120] CR2: 0000000000000000 CR3: 000000023b842000 CR4:
> 0000000000750ee0
> [64977.148121] PKRU: 55555554
> [64977.148122] Call Trace:
> [64977.148123]  <TASK>
> [64977.148124]  usb_hcd_alloc_bandwidth+0x241/0x360
> [64977.148127]  usb_set_interface+0x11d/0x340
> [64977.148130]  uvc_video_start_transfer+0x17b/0x4b0 [uvcvideo]

This isn't good, we shouldn't crash when a device is removed, but this
might be an issue with some reference counting.  More log messages might
help out here.

thanks,

greg k-h

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
  2022-09-30  8:53 ` Greg KH
  2022-09-30  8:53 ` [Bug 216543] " bugzilla-daemon
@ 2022-09-30 11:32 ` bugzilla-daemon
  2022-09-30 11:32 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-30 11:32 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #2 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
Created attachment 301905
  --> https://bugzilla.kernel.org/attachment.cgi?id=301905&action=edit
kernel log from latest crash

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
                   ` (2 preceding siblings ...)
  2022-09-30 11:32 ` bugzilla-daemon
@ 2022-09-30 11:32 ` bugzilla-daemon
  2022-09-30 11:38 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-30 11:32 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #3 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> That's really an unstable and unsupported system, sorry.  If you fix your
> cable it should work properly, right?

Yes. And I totally understand that is not supported, the only reason I posted
this is because it seemed to uncover some race condition in the code that might
be beneficial to fix.

> What about any kernel log messages from right before this crashed?
> There should be some disconnect or other USB messages, right?  Specifics
here would be good to see.

Attached full kernel log.

> This isn't good, we shouldn't crash when a device is removed, but this
might be an issue with some reference counting.  More log messages might
help out here.

Yes, that is the reason I decided to create a bug report, just hoping it is
useful.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-09-30 11:32 ` bugzilla-daemon
@ 2022-09-30 11:38 ` bugzilla-daemon
  2022-09-30 12:32   ` Greg KH
  2022-09-30 12:32 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-30 11:38 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #4 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
Created attachment 301906
  --> https://bugzilla.kernel.org/attachment.cgi?id=301906&action=edit
kernel log from first crash

Previously uploaded file is from second log snippet, this is the first one for
completeness since stack traces are slightly different there.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-30 11:38 ` bugzilla-daemon
@ 2022-09-30 12:32   ` Greg KH
  0 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2022-09-30 12:32 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: linux-usb

On Fri, Sep 30, 2022 at 11:38:46AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216543
> 
> --- Comment #4 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> Created attachment 301906
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301906&action=edit
> kernel log from first crash
> 
> Previously uploaded file is from second log snippet, this is the first one for
> completeness since stack traces are slightly different there.

The log file is full of warnings and other messages way before USB is
ever involved.  You might want to resolve those first.

Anyway, yes, the device disconnects itself from the USB bus which is an
electrical event and the video driver fails trying to send data to it,
and then things blow up again.  As there is a real solution for this
(fix the cable), I recommend doing that first :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-09-30 11:38 ` bugzilla-daemon
@ 2022-09-30 12:32 ` bugzilla-daemon
  2022-09-30 14:28 ` bugzilla-daemon
  2022-10-17 17:59 ` bugzilla-daemon
  7 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-30 12:32 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #5 from gregkh@linuxfoundation.org ---
On Fri, Sep 30, 2022 at 11:38:46AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216543
> 
> --- Comment #4 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> Created attachment 301906
>   --> https://bugzilla.kernel.org/attachment.cgi?id=301906&action=edit
> kernel log from first crash
> 
> Previously uploaded file is from second log snippet, this is the first one
> for
> completeness since stack traces are slightly different there.

The log file is full of warnings and other messages way before USB is
ever involved.  You might want to resolve those first.

Anyway, yes, the device disconnects itself from the USB bus which is an
electrical event and the video driver fails trying to send data to it,
and then things blow up again.  As there is a real solution for this
(fix the cable), I recommend doing that first :)

thanks,

greg k-h

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
                   ` (5 preceding siblings ...)
  2022-09-30 12:32 ` bugzilla-daemon
@ 2022-09-30 14:28 ` bugzilla-daemon
  2022-10-17 17:59 ` bugzilla-daemon
  7 siblings, 0 replies; 21+ messages in thread
From: bugzilla-daemon @ 2022-09-30 14:28 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #6 from Alan Stern (stern@rowland.harvard.edu) ---
Created attachment 301908
  --> https://bugzilla.kernel.org/attachment.cgi?id=301908&action=edit
Diagnostic patch for uvcvideo driver

This looks like a race in the uvcvideo driver, possibly between disconnect and
video start.

You might be able to trace this down more by running with the attached patch.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
                   ` (6 preceding siblings ...)
  2022-09-30 14:28 ` bugzilla-daemon
@ 2022-10-17 17:59 ` bugzilla-daemon
  2022-10-17 21:25   ` Alan Stern
  7 siblings, 1 reply; 21+ messages in thread
From: bugzilla-daemon @ 2022-10-17 17:59 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=216543

--- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
Created attachment 303022
  --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
Kernel log with uvc-trace patch applied

I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
within a few minutes.

USB seems to reset, often camera stops or freezes in the browser, but the light
on the camera itself remains on. Sometimes I can enable/disable/enable camera
for it to reboot, but the last time I did that in the log I got null pointer
de-reference again.

Please let me know if there is any other information I can provide and what
could be the root cause of this annoying behavior.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-17 17:59 ` bugzilla-daemon
@ 2022-10-17 21:25   ` Alan Stern
  2022-10-18  5:40     ` Ricardo Ribalda
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2022-10-17 21:25 UTC (permalink / raw)
  To: Nazar Mokrynskyi; +Cc: Laurent Pinchart, linux-media, linux-usb

Moving this bug report from bugzilla to the mailing lists.

The short description of the bug is that in uvcvideo, disconnect races 
with starting a video transfer.  The race shows up on Nazar's system 
because of a marginal USB cable which leads to a lot of spontaneous 
disconnections.

On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216543
> 
> --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> Created attachment 303022
>   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> Kernel log with uvc-trace patch applied

For everyone's information, here is the uvc-trace patch.  All it does is 
add messages to the kernel log when uvcvideo's probe and disconnect 
routines run, and just before uvc_video_start_transfer() calls 
usb_set_interface().

--- usb-devel/drivers/media/usb/uvc/uvc_video.c	
+++ usb-devel/drivers/media/usb/uvc/uvc_video.c	
@@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
 			"Selecting alternate setting %u (%u B/frame bandwidth)\n",
 			altsetting, best_psize);
 
+		dev_info(&intf->dev, "uvc set alt\n");
 		ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
 		if (ret < 0)
 			return ret;
--- usb-devel/drivers/media/usb/uvc/uvc_driver.c	
+++ usb-devel/drivers/media/usb/uvc/uvc_driver.c	
@@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
 	int function;
 	int ret;
 
+	dev_info(&intf->dev, "uvc_probe start\n");
+
 	/* Allocate memory for the device and initialize it. */
 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
 	if (dev == NULL)
@@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
 		return;
 
 	uvc_unregister_video(dev);
+	dev_info(&intf->dev, "uvc_disconnect done\n");
 	kref_put(&dev->ref, uvc_delete);
 }
 
The output in the kernel log below clearly shows that there is a bug in 
the uvcvideo driver.

> I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> within a few minutes.
> 
> USB seems to reset, often camera stops or freezes in the browser, but the light
> on the camera itself remains on. Sometimes I can enable/disable/enable camera
> for it to reboot, but the last time I did that in the log I got null pointer
> de-reference again.

Here is the important part of the log:

[  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
[  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
[  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
[  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
[  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
...
[  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt

This is where an ioctl calls uvc_video_start_transfer.

[  748.654701] usb 8-2.4.4: USB disconnect, device number 6
[  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done

This is where the disconnect starts and finishes

[  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  748.898344] #PF: supervisor read access in kernel mode
[  748.898346] #PF: error_code(0x0000) - not-present page
[  748.898347] PGD 0 P4D 0
[  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
[  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
[  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
...
[  748.898368] Call Trace:
[  748.898370]  <TASK>
[  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
[  748.898375]  usb_set_interface+0x122/0x350
[  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
[  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
[  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
[  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
[  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
[  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
[  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
[  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]

And this proves that uvc_disconnect() returned before the driver was 
finished accessing the device.

I don't know how the driver works or how it tries to prevent this sort 
of race from occurring, but apparently the strategy isn't working.

> Please let me know if there is any other information I can provide and what
> could be the root cause of this annoying behavior.

At this point I will bow out of the discussion; it's up to the uvcvideo 
maintainers to investigate further.  Maybe they can provide a patch for 
you to test.

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-17 21:25   ` Alan Stern
@ 2022-10-18  5:40     ` Ricardo Ribalda
  2022-10-18  5:42       ` Ricardo Ribalda
                         ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-18  5:40 UTC (permalink / raw)
  To: Alan Stern, inux
  Cc: Nazar Mokrynskyi, Laurent Pinchart, linux-media, linux-usb

Hi

Guenter already provided some patches to fix this issue:
https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/

Until we have a solution on the core (or rewrite the kernel in rust
;P) , I think we should merge them (or something similar).

I can prepare a patchset merging Guenter set and my "grannular PM"
https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/

It can always be reverted when we reach consensus on how to do it for
every driver.

Regards!


On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> Moving this bug report from bugzilla to the mailing lists.
>
> The short description of the bug is that in uvcvideo, disconnect races
> with starting a video transfer.  The race shows up on Nazar's system
> because of a marginal USB cable which leads to a lot of spontaneous
> disconnections.
>
> On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> >
> > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > Created attachment 303022
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > Kernel log with uvc-trace patch applied
>
> For everyone's information, here is the uvc-trace patch.  All it does is
> add messages to the kernel log when uvcvideo's probe and disconnect
> routines run, and just before uvc_video_start_transfer() calls
> usb_set_interface().
>
> --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
>                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
>                         altsetting, best_psize);
>
> +               dev_info(&intf->dev, "uvc set alt\n");
>                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
>                 if (ret < 0)
>                         return ret;
> --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
>         int function;
>         int ret;
>
> +       dev_info(&intf->dev, "uvc_probe start\n");
> +
>         /* Allocate memory for the device and initialize it. */
>         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
>         if (dev == NULL)
> @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
>                 return;
>
>         uvc_unregister_video(dev);
> +       dev_info(&intf->dev, "uvc_disconnect done\n");
>         kref_put(&dev->ref, uvc_delete);
>  }
>
> The output in the kernel log below clearly shows that there is a bug in
> the uvcvideo driver.
>
> > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > within a few minutes.
> >
> > USB seems to reset, often camera stops or freezes in the browser, but the light
> > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > for it to reboot, but the last time I did that in the log I got null pointer
> > de-reference again.
>
> Here is the important part of the log:
>
> [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> ...
> [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
>
> This is where an ioctl calls uvc_video_start_transfer.
>
> [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
>
> This is where the disconnect starts and finishes
>
> [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  748.898344] #PF: supervisor read access in kernel mode
> [  748.898346] #PF: error_code(0x0000) - not-present page
> [  748.898347] PGD 0 P4D 0
> [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> ...
> [  748.898368] Call Trace:
> [  748.898370]  <TASK>
> [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> [  748.898375]  usb_set_interface+0x122/0x350
> [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
>
> And this proves that uvc_disconnect() returned before the driver was
> finished accessing the device.
>
> I don't know how the driver works or how it tries to prevent this sort
> of race from occurring, but apparently the strategy isn't working.
>
> > Please let me know if there is any other information I can provide and what
> > could be the root cause of this annoying behavior.
>
> At this point I will bow out of the discussion; it's up to the uvcvideo
> maintainers to investigate further.  Maybe they can provide a patch for
> you to test.
>
> Alan Stern



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-18  5:40     ` Ricardo Ribalda
@ 2022-10-18  5:42       ` Ricardo Ribalda
  2022-10-18 14:46       ` Alan Stern
  2022-10-18 15:02       ` Laurent Pinchart
  2 siblings, 0 replies; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-18  5:42 UTC (permalink / raw)
  To: Alan Stern, linux
  Cc: Nazar Mokrynskyi, Laurent Pinchart, linux-media, linux-usb

using proper Guenter email

On Tue, 18 Oct 2022 at 14:40, Ricardo Ribalda <ribalda@chromium.org> wrote:
>
> Hi
>
> Guenter already provided some patches to fix this issue:
> https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
>
> Until we have a solution on the core (or rewrite the kernel in rust
> ;P) , I think we should merge them (or something similar).
>
> I can prepare a patchset merging Guenter set and my "grannular PM"
> https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
>
> It can always be reverted when we reach consensus on how to do it for
> every driver.
>
> Regards!
>
>
> On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Moving this bug report from bugzilla to the mailing lists.
> >
> > The short description of the bug is that in uvcvideo, disconnect races
> > with starting a video transfer.  The race shows up on Nazar's system
> > because of a marginal USB cable which leads to a lot of spontaneous
> > disconnections.
> >
> > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > >
> > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > Created attachment 303022
> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > Kernel log with uvc-trace patch applied
> >
> > For everyone's information, here is the uvc-trace patch.  All it does is
> > add messages to the kernel log when uvcvideo's probe and disconnect
> > routines run, and just before uvc_video_start_transfer() calls
> > usb_set_interface().
> >
> > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> >                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> >                         altsetting, best_psize);
> >
> > +               dev_info(&intf->dev, "uvc set alt\n");
> >                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> >                 if (ret < 0)
> >                         return ret;
> > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> >         int function;
> >         int ret;
> >
> > +       dev_info(&intf->dev, "uvc_probe start\n");
> > +
> >         /* Allocate memory for the device and initialize it. */
> >         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> >         if (dev == NULL)
> > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> >                 return;
> >
> >         uvc_unregister_video(dev);
> > +       dev_info(&intf->dev, "uvc_disconnect done\n");
> >         kref_put(&dev->ref, uvc_delete);
> >  }
> >
> > The output in the kernel log below clearly shows that there is a bug in
> > the uvcvideo driver.
> >
> > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > within a few minutes.
> > >
> > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > for it to reboot, but the last time I did that in the log I got null pointer
> > > de-reference again.
> >
> > Here is the important part of the log:
> >
> > [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > ...
> > [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> >
> > This is where an ioctl calls uvc_video_start_transfer.
> >
> > [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> >
> > This is where the disconnect starts and finishes
> >
> > [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [  748.898344] #PF: supervisor read access in kernel mode
> > [  748.898346] #PF: error_code(0x0000) - not-present page
> > [  748.898347] PGD 0 P4D 0
> > [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > ...
> > [  748.898368] Call Trace:
> > [  748.898370]  <TASK>
> > [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> > [  748.898375]  usb_set_interface+0x122/0x350
> > [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
> >
> > And this proves that uvc_disconnect() returned before the driver was
> > finished accessing the device.
> >
> > I don't know how the driver works or how it tries to prevent this sort
> > of race from occurring, but apparently the strategy isn't working.
> >
> > > Please let me know if there is any other information I can provide and what
> > > could be the root cause of this annoying behavior.
> >
> > At this point I will bow out of the discussion; it's up to the uvcvideo
> > maintainers to investigate further.  Maybe they can provide a patch for
> > you to test.
> >
> > Alan Stern
>
>
>
> --
> Ricardo Ribalda



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-18  5:40     ` Ricardo Ribalda
  2022-10-18  5:42       ` Ricardo Ribalda
@ 2022-10-18 14:46       ` Alan Stern
  2022-10-18 15:02       ` Laurent Pinchart
  2 siblings, 0 replies; 21+ messages in thread
From: Alan Stern @ 2022-10-18 14:46 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: inux, Nazar Mokrynskyi, Laurent Pinchart, linux-media, linux-usb

On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> Hi
> 
> Guenter already provided some patches to fix this issue:
> https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> 
> Until we have a solution on the core (or rewrite the kernel in rust
> ;P) , I think we should merge them (or something similar).
> 
> I can prepare a patchset merging Guenter set and my "grannular PM"
> https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
> 
> It can always be reverted when we reach consensus on how to do it for
> every driver.
> 
> Regards!

I was going to say "Wow! Quick work, thank you!" until I noticed that
the patch set was submitted over two years ago and still hasn't been
merged.  :-(

Well, Nazar, at least now you have an answer and a potential fix.

Alan Stern

> On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Moving this bug report from bugzilla to the mailing lists.
> >
> > The short description of the bug is that in uvcvideo, disconnect races
> > with starting a video transfer.  The race shows up on Nazar's system
> > because of a marginal USB cable which leads to a lot of spontaneous
> > disconnections.
> >
> > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > >
> > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > Created attachment 303022
> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > Kernel log with uvc-trace patch applied
> >
> > For everyone's information, here is the uvc-trace patch.  All it does is
> > add messages to the kernel log when uvcvideo's probe and disconnect
> > routines run, and just before uvc_video_start_transfer() calls
> > usb_set_interface().
> >
> > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> >                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> >                         altsetting, best_psize);
> >
> > +               dev_info(&intf->dev, "uvc set alt\n");
> >                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> >                 if (ret < 0)
> >                         return ret;
> > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> >         int function;
> >         int ret;
> >
> > +       dev_info(&intf->dev, "uvc_probe start\n");
> > +
> >         /* Allocate memory for the device and initialize it. */
> >         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> >         if (dev == NULL)
> > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> >                 return;
> >
> >         uvc_unregister_video(dev);
> > +       dev_info(&intf->dev, "uvc_disconnect done\n");
> >         kref_put(&dev->ref, uvc_delete);
> >  }
> >
> > The output in the kernel log below clearly shows that there is a bug in
> > the uvcvideo driver.
> >
> > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > within a few minutes.
> > >
> > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > for it to reboot, but the last time I did that in the log I got null pointer
> > > de-reference again.
> >
> > Here is the important part of the log:
> >
> > [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > ...
> > [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> >
> > This is where an ioctl calls uvc_video_start_transfer.
> >
> > [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> >
> > This is where the disconnect starts and finishes
> >
> > [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [  748.898344] #PF: supervisor read access in kernel mode
> > [  748.898346] #PF: error_code(0x0000) - not-present page
> > [  748.898347] PGD 0 P4D 0
> > [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > ...
> > [  748.898368] Call Trace:
> > [  748.898370]  <TASK>
> > [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> > [  748.898375]  usb_set_interface+0x122/0x350
> > [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
> >
> > And this proves that uvc_disconnect() returned before the driver was
> > finished accessing the device.
> >
> > I don't know how the driver works or how it tries to prevent this sort
> > of race from occurring, but apparently the strategy isn't working.
> >
> > > Please let me know if there is any other information I can provide and what
> > > could be the root cause of this annoying behavior.
> >
> > At this point I will bow out of the discussion; it's up to the uvcvideo
> > maintainers to investigate further.  Maybe they can provide a patch for
> > you to test.
> >
> > Alan Stern
> 
> 
> 
> -- 
> Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-18  5:40     ` Ricardo Ribalda
  2022-10-18  5:42       ` Ricardo Ribalda
  2022-10-18 14:46       ` Alan Stern
@ 2022-10-18 15:02       ` Laurent Pinchart
  2022-10-19  1:35         ` Ricardo Ribalda
  2 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2022-10-18 15:02 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Alan Stern, inux, Nazar Mokrynskyi, linux-media, linux-usb

hi Ricardo,

On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> Hi
> 
> Guenter already provided some patches to fix this issue:
> https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> 
> Until we have a solution on the core (or rewrite the kernel in rust
> ;P) , I think we should merge them (or something similar).
> 
> I can prepare a patchset merging Guenter set and my "grannular PM"
> https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/

How about working on a proper fix instead ? :-)

> It can always be reverted when we reach consensus on how to do it for
> every driver.
> 
> Regards!
> 
> 
> On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > Moving this bug report from bugzilla to the mailing lists.
> >
> > The short description of the bug is that in uvcvideo, disconnect races
> > with starting a video transfer.  The race shows up on Nazar's system
> > because of a marginal USB cable which leads to a lot of spontaneous
> > disconnections.
> >
> > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > >
> > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > Created attachment 303022
> > >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > Kernel log with uvc-trace patch applied
> >
> > For everyone's information, here is the uvc-trace patch.  All it does is
> > add messages to the kernel log when uvcvideo's probe and disconnect
> > routines run, and just before uvc_video_start_transfer() calls
> > usb_set_interface().
> >
> > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> >                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> >                         altsetting, best_psize);
> >
> > +               dev_info(&intf->dev, "uvc set alt\n");
> >                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> >                 if (ret < 0)
> >                         return ret;
> > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> >         int function;
> >         int ret;
> >
> > +       dev_info(&intf->dev, "uvc_probe start\n");
> > +
> >         /* Allocate memory for the device and initialize it. */
> >         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> >         if (dev == NULL)
> > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> >                 return;
> >
> >         uvc_unregister_video(dev);
> > +       dev_info(&intf->dev, "uvc_disconnect done\n");
> >         kref_put(&dev->ref, uvc_delete);
> >  }
> >
> > The output in the kernel log below clearly shows that there is a bug in
> > the uvcvideo driver.
> >
> > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > within a few minutes.
> > >
> > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > for it to reboot, but the last time I did that in the log I got null pointer
> > > de-reference again.
> >
> > Here is the important part of the log:
> >
> > [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > ...
> > [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> >
> > This is where an ioctl calls uvc_video_start_transfer.
> >
> > [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> >
> > This is where the disconnect starts and finishes
> >
> > [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [  748.898344] #PF: supervisor read access in kernel mode
> > [  748.898346] #PF: error_code(0x0000) - not-present page
> > [  748.898347] PGD 0 P4D 0
> > [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > ...
> > [  748.898368] Call Trace:
> > [  748.898370]  <TASK>
> > [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> > [  748.898375]  usb_set_interface+0x122/0x350
> > [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
> >
> > And this proves that uvc_disconnect() returned before the driver was
> > finished accessing the device.
> >
> > I don't know how the driver works or how it tries to prevent this sort
> > of race from occurring, but apparently the strategy isn't working.
> >
> > > Please let me know if there is any other information I can provide and what
> > > could be the root cause of this annoying behavior.
> >
> > At this point I will bow out of the discussion; it's up to the uvcvideo
> > maintainers to investigate further.  Maybe they can provide a patch for
> > you to test.
> >
> > Alan Stern
> 
> 
> 
> -- 
> Ricardo Ribalda

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-18 15:02       ` Laurent Pinchart
@ 2022-10-19  1:35         ` Ricardo Ribalda
  2022-10-19  1:44           ` Laurent Pinchart
  0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-19  1:35 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Alan Stern, Nazar Mokrynskyi, linux-media, linux-usb, linux, Tomasz Figa

Hi Laurent

On Wed, 19 Oct 2022 at 00:02, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> hi Ricardo,
>
> On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> > Hi
> >
> > Guenter already provided some patches to fix this issue:
> > https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> >
> > Until we have a solution on the core (or rewrite the kernel in rust
> > ;P) , I think we should merge them (or something similar).
> >
> > I can prepare a patchset merging Guenter set and my "grannular PM"
> > https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
>
> How about working on a proper fix instead ? :-)

We already have a fix that has been extensively tested ;P

When put on top of granular PM it is a tiny patch:
https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/commit/?h=b4/resend-powersave&id=cf826010bedda38f8faf8d072f95a9ca69ed452d
that can be cleanly reverted when/if we fix it in core.

I would like to avoid that more and more people/distros have
downstream patches on top of uvc to fix real issues just because we
think that it is not the "perfect" solution.

Would you please take a second look at the combined patchset?


Thanks!

>
> > It can always be reverted when we reach consensus on how to do it for
> > every driver.
> >
> > Regards!
> >
> >
> > On Tue, 18 Oct 2022 at 06:46, Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > Moving this bug report from bugzilla to the mailing lists.
> > >
> > > The short description of the bug is that in uvcvideo, disconnect races
> > > with starting a video transfer.  The race shows up on Nazar's system
> > > because of a marginal USB cable which leads to a lot of spontaneous
> > > disconnections.
> > >
> > > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > > >
> > > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > > Created attachment 303022
> > > >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > > Kernel log with uvc-trace patch applied
> > >
> > > For everyone's information, here is the uvc-trace patch.  All it does is
> > > add messages to the kernel log when uvcvideo's probe and disconnect
> > > routines run, and just before uvc_video_start_transfer() calls
> > > usb_set_interface().
> > >
> > > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > >                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > >                         altsetting, best_psize);
> > >
> > > +               dev_info(&intf->dev, "uvc set alt\n");
> > >                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > >                 if (ret < 0)
> > >                         return ret;
> > > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > >         int function;
> > >         int ret;
> > >
> > > +       dev_info(&intf->dev, "uvc_probe start\n");
> > > +
> > >         /* Allocate memory for the device and initialize it. */
> > >         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > >         if (dev == NULL)
> > > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > >                 return;
> > >
> > >         uvc_unregister_video(dev);
> > > +       dev_info(&intf->dev, "uvc_disconnect done\n");
> > >         kref_put(&dev->ref, uvc_delete);
> > >  }
> > >
> > > The output in the kernel log below clearly shows that there is a bug in
> > > the uvcvideo driver.
> > >
> > > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > > within a few minutes.
> > > >
> > > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > > for it to reboot, but the last time I did that in the log I got null pointer
> > > > de-reference again.
> > >
> > > Here is the important part of the log:
> > >
> > > [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > > [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > > [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > > [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > > [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > > ...
> > > [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> > >
> > > This is where an ioctl calls uvc_video_start_transfer.
> > >
> > > [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > > [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> > >
> > > This is where the disconnect starts and finishes
> > >
> > > [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > [  748.898344] #PF: supervisor read access in kernel mode
> > > [  748.898346] #PF: error_code(0x0000) - not-present page
> > > [  748.898347] PGD 0 P4D 0
> > > [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > > [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > > [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > > ...
> > > [  748.898368] Call Trace:
> > > [  748.898370]  <TASK>
> > > [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> > > [  748.898375]  usb_set_interface+0x122/0x350
> > > [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > > [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > > [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > > [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > > [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > > [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > > [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > > [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
> > >
> > > And this proves that uvc_disconnect() returned before the driver was
> > > finished accessing the device.
> > >
> > > I don't know how the driver works or how it tries to prevent this sort
> > > of race from occurring, but apparently the strategy isn't working.
> > >
> > > > Please let me know if there is any other information I can provide and what
> > > > could be the root cause of this annoying behavior.
> > >
> > > At this point I will bow out of the discussion; it's up to the uvcvideo
> > > maintainers to investigate further.  Maybe they can provide a patch for
> > > you to test.
> > >
> > > Alan Stern
> >
> >
> >
> > --
> > Ricardo Ribalda
>
> --
> Regards,
>
> Laurent Pinchart



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-19  1:35         ` Ricardo Ribalda
@ 2022-10-19  1:44           ` Laurent Pinchart
  2022-10-19  4:22             ` Ricardo Ribalda
  0 siblings, 1 reply; 21+ messages in thread
From: Laurent Pinchart @ 2022-10-19  1:44 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Alan Stern, Nazar Mokrynskyi, linux-media, linux-usb, linux, Tomasz Figa

Hi Ricardo,

On Wed, Oct 19, 2022 at 10:35:00AM +0900, Ricardo Ribalda wrote:
> On Wed, 19 Oct 2022 at 00:02, Laurent Pinchart wrote:
> > On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> > > Hi
> > >
> > > Guenter already provided some patches to fix this issue:
> > > https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> > >
> > > Until we have a solution on the core (or rewrite the kernel in rust
> > > ;P) , I think we should merge them (or something similar).
> > >
> > > I can prepare a patchset merging Guenter set and my "grannular PM"
> > > https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
> >
> > How about working on a proper fix instead ? :-)
> 
> We already have a fix that has been extensively tested ;P
> 
> When put on top of granular PM it is a tiny patch:
> https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/commit/?h=b4/resend-powersave&id=cf826010bedda38f8faf8d072f95a9ca69ed452d
> that can be cleanly reverted when/if we fix it in core.
> 
> I would like to avoid that more and more people/distros have
> downstream patches on top of uvc to fix real issues just because we
> think that it is not the "perfect" solution.

And I would like to avoid having to roll out manual changes to all
drivers when the problem can be fixed in the core, just because nobody
can be bothered to spend time to implement a good fix. We don't have to
aim for a solution at the cdev level if that takes too long, an
implementation in V4L2 would be enough to start with.

I'm getting tired of having to reexplain this continuously with nobody
listening. This could have been solved a long time ago.

> Would you please take a second look at the combined patchset?

I will have a look. If I recall correctly, there were some patches in
Guenter's series that I had no issue with, I'll start with those.

> > > It can always be reverted when we reach consensus on how to do it for
> > > every driver.
> > >
> > > Regards!
> > >
> > > On Tue, 18 Oct 2022 at 06:46, Alan Stern wrote:
> > > >
> > > > Moving this bug report from bugzilla to the mailing lists.
> > > >
> > > > The short description of the bug is that in uvcvideo, disconnect races
> > > > with starting a video transfer.  The race shows up on Nazar's system
> > > > because of a marginal USB cable which leads to a lot of spontaneous
> > > > disconnections.
> > > >
> > > > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > > > >
> > > > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > > > Created attachment 303022
> > > > >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > > > Kernel log with uvc-trace patch applied
> > > >
> > > > For everyone's information, here is the uvc-trace patch.  All it does is
> > > > add messages to the kernel log when uvcvideo's probe and disconnect
> > > > routines run, and just before uvc_video_start_transfer() calls
> > > > usb_set_interface().
> > > >
> > > > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > > >                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > > >                         altsetting, best_psize);
> > > >
> > > > +               dev_info(&intf->dev, "uvc set alt\n");
> > > >                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > > >                 if (ret < 0)
> > > >                         return ret;
> > > > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > > >         int function;
> > > >         int ret;
> > > >
> > > > +       dev_info(&intf->dev, "uvc_probe start\n");
> > > > +
> > > >         /* Allocate memory for the device and initialize it. */
> > > >         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > > >         if (dev == NULL)
> > > > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > > >                 return;
> > > >
> > > >         uvc_unregister_video(dev);
> > > > +       dev_info(&intf->dev, "uvc_disconnect done\n");
> > > >         kref_put(&dev->ref, uvc_delete);
> > > >  }
> > > >
> > > > The output in the kernel log below clearly shows that there is a bug in
> > > > the uvcvideo driver.
> > > >
> > > > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > > > within a few minutes.
> > > > >
> > > > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > > > for it to reboot, but the last time I did that in the log I got null pointer
> > > > > de-reference again.
> > > >
> > > > Here is the important part of the log:
> > > >
> > > > [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > > > [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > > > [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > > > [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > > > [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > > > ...
> > > > [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> > > >
> > > > This is where an ioctl calls uvc_video_start_transfer.
> > > >
> > > > [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > > > [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> > > >
> > > > This is where the disconnect starts and finishes
> > > >
> > > > [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > > [  748.898344] #PF: supervisor read access in kernel mode
> > > > [  748.898346] #PF: error_code(0x0000) - not-present page
> > > > [  748.898347] PGD 0 P4D 0
> > > > [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > > [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > > > [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > > > [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > > > ...
> > > > [  748.898368] Call Trace:
> > > > [  748.898370]  <TASK>
> > > > [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> > > > [  748.898375]  usb_set_interface+0x122/0x350
> > > > [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > > > [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > > > [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > > > [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > > > [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > > > [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > > > [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > > > [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
> > > >
> > > > And this proves that uvc_disconnect() returned before the driver was
> > > > finished accessing the device.
> > > >
> > > > I don't know how the driver works or how it tries to prevent this sort
> > > > of race from occurring, but apparently the strategy isn't working.
> > > >
> > > > > Please let me know if there is any other information I can provide and what
> > > > > could be the root cause of this annoying behavior.
> > > >
> > > > At this point I will bow out of the discussion; it's up to the uvcvideo
> > > > maintainers to investigate further.  Maybe they can provide a patch for
> > > > you to test.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-19  1:44           ` Laurent Pinchart
@ 2022-10-19  4:22             ` Ricardo Ribalda
  2022-10-19 15:08               ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-19  4:22 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Alan Stern, Nazar Mokrynskyi, linux-media, linux-usb, linux, Tomasz Figa

Hi Laurent

On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
<laurent.pinchart@ideasonboard.com> wrote:
>
> Hi Ricardo,
>
> On Wed, Oct 19, 2022 at 10:35:00AM +0900, Ricardo Ribalda wrote:
> > On Wed, 19 Oct 2022 at 00:02, Laurent Pinchart wrote:
> > > On Tue, Oct 18, 2022 at 02:40:44PM +0900, Ricardo Ribalda wrote:
> > > > Hi
> > > >
> > > > Guenter already provided some patches to fix this issue:
> > > > https://lore.kernel.org/lkml/20200917022547.198090-1-linux@roeck-us.net/
> > > >
> > > > Until we have a solution on the core (or rewrite the kernel in rust
> > > > ;P) , I think we should merge them (or something similar).
> > > >
> > > > I can prepare a patchset merging Guenter set and my "grannular PM"
> > > > https://lore.kernel.org/linux-media/20220920-resend-powersave-v1-0-123aa2ba3836@chromium.org/
> > >
> > > How about working on a proper fix instead ? :-)
> >
> > We already have a fix that has been extensively tested ;P
> >
> > When put on top of granular PM it is a tiny patch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/commit/?h=b4/resend-powersave&id=cf826010bedda38f8faf8d072f95a9ca69ed452d
> > that can be cleanly reverted when/if we fix it in core.
> >
> > I would like to avoid that more and more people/distros have
> > downstream patches on top of uvc to fix real issues just because we
> > think that it is not the "perfect" solution.
>
> And I would like to avoid having to roll out manual changes to all
> drivers when the problem can be fixed in the core, just because nobody
> can be bothered to spend time to implement a good fix. We don't have to
> aim for a solution at the cdev level if that takes too long, an
> implementation in V4L2 would be enough to start with.

Do we know what a "good fix" would look like?. This is a race
condition between cdev, v4l2, and usb_driver. The only entity that
knows about the three of them is the driver.

If we "fix" v4l2 to provide a callback to notify the framework about a
"bus disconnect". It can prevent new syscalls, but it cannot interrupt
the current ones.

So this is not something we can easily fix in O(months). Is there
anyone working on it after your LPC presentation?

Until then, landing a 10 lines patch that solves a real fix, that
distros are backporting it already is not a bad compromise....

>
> I'm getting tired of having to reexplain this continuously with nobody
> listening. This could have been solved a long time ago.

People listen, but it is a change that goes across multiple boundaries

>
> > Would you please take a second look at the combined patchset?
>
> I will have a look. If I recall correctly, there were some patches in
> Guenter's series that I had no issue with, I'll start with those.

Thanks, I will post the combined series today.

>
> > > > It can always be reverted when we reach consensus on how to do it for
> > > > every driver.
> > > >
> > > > Regards!
> > > >
> > > > On Tue, 18 Oct 2022 at 06:46, Alan Stern wrote:
> > > > >
> > > > > Moving this bug report from bugzilla to the mailing lists.
> > > > >
> > > > > The short description of the bug is that in uvcvideo, disconnect races
> > > > > with starting a video transfer.  The race shows up on Nazar's system
> > > > > because of a marginal USB cable which leads to a lot of spontaneous
> > > > > disconnections.
> > > > >
> > > > > On Mon, Oct 17, 2022 at 05:59:48PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216543
> > > > > >
> > > > > > --- Comment #7 from Nazar Mokrynskyi (nazar@mokrynskyi.com) ---
> > > > > > Created attachment 303022
> > > > > >   --> https://bugzilla.kernel.org/attachment.cgi?id=303022&action=edit
> > > > > > Kernel log with uvc-trace patch applied
> > > > >
> > > > > For everyone's information, here is the uvc-trace patch.  All it does is
> > > > > add messages to the kernel log when uvcvideo's probe and disconnect
> > > > > routines run, and just before uvc_video_start_transfer() calls
> > > > > usb_set_interface().
> > > > >
> > > > > --- usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > > +++ usb-devel/drivers/media/usb/uvc/uvc_video.c
> > > > > @@ -1965,6 +1965,7 @@ static int uvc_video_start_transfer(stru
> > > > >                         "Selecting alternate setting %u (%u B/frame bandwidth)\n",
> > > > >                         altsetting, best_psize);
> > > > >
> > > > > +               dev_info(&intf->dev, "uvc set alt\n");
> > > > >                 ret = usb_set_interface(stream->dev->udev, intfnum, altsetting);
> > > > >                 if (ret < 0)
> > > > >                         return ret;
> > > > > --- usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > > +++ usb-devel/drivers/media/usb/uvc/uvc_driver.c
> > > > > @@ -2374,6 +2374,8 @@ static int uvc_probe(struct usb_interfac
> > > > >         int function;
> > > > >         int ret;
> > > > >
> > > > > +       dev_info(&intf->dev, "uvc_probe start\n");
> > > > > +
> > > > >         /* Allocate memory for the device and initialize it. */
> > > > >         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> > > > >         if (dev == NULL)
> > > > > @@ -2535,6 +2537,7 @@ static void uvc_disconnect(struct usb_in
> > > > >                 return;
> > > > >
> > > > >         uvc_unregister_video(dev);
> > > > > +       dev_info(&intf->dev, "uvc_disconnect done\n");
> > > > >         kref_put(&dev->ref, uvc_delete);
> > > > >  }
> > > > >
> > > > > The output in the kernel log below clearly shows that there is a bug in
> > > > > the uvcvideo driver.
> > > > >
> > > > > > I'm on 6.0.2 and seemingly get this even more frequently with good cable and no
> > > > > > extra adapters. So I patched 6.0.2 with uvc-trace above and reproduced it
> > > > > > within a few minutes.
> > > > > >
> > > > > > USB seems to reset, often camera stops or freezes in the browser, but the light
> > > > > > on the camera itself remains on. Sometimes I can enable/disable/enable camera
> > > > > > for it to reboot, but the last time I did that in the log I got null pointer
> > > > > > de-reference again.
> > > > >
> > > > > Here is the important part of the log:
> > > > >
> > > > > [  684.746848] usb 8-2.4.4: reset SuperSpeed USB device number 6 using xhci_hcd
> > > > > [  684.810979] uvcvideo 8-2.4.4:1.0: uvc_probe start
> > > > > [  684.811032] usb 8-2.4.4: Found UVC 1.00 device Logitech BRIO (046d:085e)
> > > > > [  684.843413] input: Logitech BRIO as /devices/pci0000:00/0000:00:08.1/0000:59:00.3/usb8/8-2/8-2.4/8-2.4.4/8-2.4.4:1.0/input/input43
> > > > > [  684.911255] usb 8-2.4.4: current rate 16000 is different from the runtime rate 24000
> > > > > ...
> > > > > [  743.800368] uvcvideo 8-2.4.4:1.1: uvc set alt
> > > > >
> > > > > This is where an ioctl calls uvc_video_start_transfer.
> > > > >
> > > > > [  748.654701] usb 8-2.4.4: USB disconnect, device number 6
> > > > > [  748.714355] uvcvideo 8-2.4.4:1.0: uvc_disconnect done
> > > > >
> > > > > This is where the disconnect starts and finishes
> > > > >
> > > > > [  748.898340] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > > > [  748.898344] #PF: supervisor read access in kernel mode
> > > > > [  748.898346] #PF: error_code(0x0000) - not-present page
> > > > > [  748.898347] PGD 0 P4D 0
> > > > > [  748.898349] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > > > [  748.898351] CPU: 16 PID: 11890 Comm: VideoCapture Not tainted 6.0.2-x64v2-uvc-trace-xanmod1 #1
> > > > > [  748.898353] Hardware name: Gigabyte Technology Co., Ltd. B550 VISION D/B550 VISION D, BIOS F15d 07/20/2022
> > > > > [  748.898354] RIP: 0010:usb_ifnum_to_if+0x35/0x60
> > > > > ...
> > > > > [  748.898368] Call Trace:
> > > > > [  748.898370]  <TASK>
> > > > > [  748.898370]  usb_hcd_alloc_bandwidth+0x240/0x370
> > > > > [  748.898375]  usb_set_interface+0x122/0x350
> > > > > [  748.898378]  uvc_video_start_transfer.cold+0xd8/0x2ae [uvcvideo]
> > > > > [  748.898383]  uvc_video_start_streaming+0x75/0xd0 [uvcvideo]
> > > > > [  748.898386]  uvc_start_streaming+0x25/0xe0 [uvcvideo]
> > > > > [  748.898390]  vb2_start_streaming+0x86/0x140 [videobuf2_common]
> > > > > [  748.898393]  vb2_core_streamon+0x57/0xc0 [videobuf2_common]
> > > > > [  748.898395]  uvc_queue_streamon+0x25/0x40 [uvcvideo]
> > > > > [  748.898398]  uvc_ioctl_streamon+0x35/0x60 [uvcvideo]
> > > > > [  748.898401]  __video_do_ioctl+0x19a/0x3f0 [videodev]
> > > > >
> > > > > And this proves that uvc_disconnect() returned before the driver was
> > > > > finished accessing the device.
> > > > >
> > > > > I don't know how the driver works or how it tries to prevent this sort
> > > > > of race from occurring, but apparently the strategy isn't working.
> > > > >
> > > > > > Please let me know if there is any other information I can provide and what
> > > > > > could be the root cause of this annoying behavior.
> > > > >
> > > > > At this point I will bow out of the discussion; it's up to the uvcvideo
> > > > > maintainers to investigate further.  Maybe they can provide a patch for
> > > > > you to test.
>
> --
> Regards,
>
> Laurent Pinchart



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-19  4:22             ` Ricardo Ribalda
@ 2022-10-19 15:08               ` Alan Stern
  2022-10-20  0:57                 ` Ricardo Ribalda
  0 siblings, 1 reply; 21+ messages in thread
From: Alan Stern @ 2022-10-19 15:08 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Laurent Pinchart, Nazar Mokrynskyi, linux-media, linux-usb,
	linux, Tomasz Figa

On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> Hi Laurent
> 
> On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> <laurent.pinchart@ideasonboard.com> wrote:
> > And I would like to avoid having to roll out manual changes to all
> > drivers when the problem can be fixed in the core, just because nobody
> > can be bothered to spend time to implement a good fix. We don't have to
> > aim for a solution at the cdev level if that takes too long, an
> > implementation in V4L2 would be enough to start with.
> 
> Do we know what a "good fix" would look like?. This is a race
> condition between cdev, v4l2, and usb_driver. The only entity that
> knows about the three of them is the driver.
> 
> If we "fix" v4l2 to provide a callback to notify the framework about a
> "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> the current ones.

It doesn't need to interrupt current syscalls.  It merely needs to wait 
until the current ones complete (and help them to complete early by 
making them aware of the disconnection) and to prevent new ones from 
starting.

I have no idea what facility (if any) the framework uses for this 
already.  However, if it turns out that proper synchronization needs a 
new approach, I suggest trying SRCU.  It can be viewed in some respects 
as a kind of read-write mutex that is highly optimized for rapid 
read-locks and -unlocks at the cost of very slow write-locks -- 
appropriate here since every syscall would need a read-lock whereas 
write-locking would be needed only when a disconnect occurs.

Alan Stern

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-19 15:08               ` Alan Stern
@ 2022-10-20  0:57                 ` Ricardo Ribalda
  2022-10-20 14:25                   ` Alan Stern
  0 siblings, 1 reply; 21+ messages in thread
From: Ricardo Ribalda @ 2022-10-20  0:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Laurent Pinchart, Nazar Mokrynskyi, linux-media, linux-usb,
	linux, Tomasz Figa

Hi Alan

On Thu, 20 Oct 2022 at 00:08, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> > Hi Laurent
> >
> > On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> > <laurent.pinchart@ideasonboard.com> wrote:
> > > And I would like to avoid having to roll out manual changes to all
> > > drivers when the problem can be fixed in the core, just because nobody
> > > can be bothered to spend time to implement a good fix. We don't have to
> > > aim for a solution at the cdev level if that takes too long, an
> > > implementation in V4L2 would be enough to start with.
> >
> > Do we know what a "good fix" would look like?. This is a race
> > condition between cdev, v4l2, and usb_driver. The only entity that
> > knows about the three of them is the driver.
> >
> > If we "fix" v4l2 to provide a callback to notify the framework about a
> > "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> > the current ones.
>
> It doesn't need to interrupt current syscalls.  It merely needs to wait
> until the current ones complete (and help them to complete early by
> making them aware of the disconnection) and to prevent new ones from
> starting.
>

The USB subsystem is not aware of the current syscalls running for that device,
it just triggers the callback disconnect() to notify the driver that
they are not allowed
anymore to access the hardware.

Even when/if we fix this for real, a "basic test" checking if the
device is disconnected is a
nice thing to have. I think of it as a protective programming :)

Something like:

if WARN_ON(is_connected)
   return -EIO;



> I have no idea what facility (if any) the framework uses for this
> already.  However, if it turns out that proper synchronization needs a
> new approach, I suggest trying SRCU.  It can be viewed in some respects
> as a kind of read-write mutex that is highly optimized for rapid
> read-locks and -unlocks at the cost of very slow write-locks --
> appropriate here since every syscall would need a read-lock whereas
> write-locking would be needed only when a disconnect occurs.


Thanks for the pointer :)

>
> Alan Stern



-- 
Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug 216543] kernel NULL pointer dereference usb_hcd_alloc_bandwidth
  2022-10-20  0:57                 ` Ricardo Ribalda
@ 2022-10-20 14:25                   ` Alan Stern
  0 siblings, 0 replies; 21+ messages in thread
From: Alan Stern @ 2022-10-20 14:25 UTC (permalink / raw)
  To: Ricardo Ribalda
  Cc: Laurent Pinchart, Nazar Mokrynskyi, linux-media, linux-usb,
	linux, Tomasz Figa

On Thu, Oct 20, 2022 at 09:57:24AM +0900, Ricardo Ribalda wrote:
> Hi Alan
> 
> On Thu, 20 Oct 2022 at 00:08, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Wed, Oct 19, 2022 at 01:22:48PM +0900, Ricardo Ribalda wrote:
> > > Hi Laurent
> > >
> > > On Wed, 19 Oct 2022 at 10:45, Laurent Pinchart
> > > <laurent.pinchart@ideasonboard.com> wrote:
> > > > And I would like to avoid having to roll out manual changes to all
> > > > drivers when the problem can be fixed in the core, just because nobody
> > > > can be bothered to spend time to implement a good fix. We don't have to
> > > > aim for a solution at the cdev level if that takes too long, an
> > > > implementation in V4L2 would be enough to start with.
> > >
> > > Do we know what a "good fix" would look like?. This is a race
> > > condition between cdev, v4l2, and usb_driver. The only entity that
> > > knows about the three of them is the driver.
> > >
> > > If we "fix" v4l2 to provide a callback to notify the framework about a
> > > "bus disconnect". It can prevent new syscalls, but it cannot interrupt
> > > the current ones.
> >
> > It doesn't need to interrupt current syscalls.  It merely needs to wait
> > until the current ones complete (and help them to complete early by
> > making them aware of the disconnection) and to prevent new ones from
> > starting.
> >
> 
> The USB subsystem is not aware of the current syscalls running for that device,
> it just triggers the callback disconnect() to notify the driver that
> they are not allowed
> anymore to access the hardware.

Right.  The question is: At what level in the various video code paths 
should the check for "device gone" then be made?

One possibility is to do all these checks at the USB driver level.  This 
has the advantage of not requiring any changes to the V4L2 core or 
elsewhere in the framework.  But it has the disadvantage of spreading 
the checks all over the place, making it that much easier to forget one.  
Furthermore it would help only the USB driver, not any of the other 
video drivers.

Another possibility is to do the checks at the framework level, or at 
least, higher up than the driver level.  For example, upon syscall 
entry, and for long-running commands, at suitable intermediate points.  
This can be implemented by having the driver call a core function from 
within its disconnect callback; that function would let the framework 
know the device is gone and wouldn't return until all existing syscall 
threads were aware of this fact and had stopped accessing the device.
This approach has advantages and disadvantages complementary to the 
first approach.

I can't tell you which approach is better -- that's up to Laurent :-) -- 
I just want to make sure you are aware of the possibilities and their 
tradeoffs.

> Even when/if we fix this for real, a "basic test" checking if the
> device is disconnected is a
> nice thing to have. I think of it as a protective programming :)
> 
> Something like:
> 
> if WARN_ON(is_connected)
>    return -EIO;

Sure, it wouldn't hurt to sprinkle things like that here and there.  But 
obviously they won't fix the problem by themselves.

Alan Stern

> > I have no idea what facility (if any) the framework uses for this
> > already.  However, if it turns out that proper synchronization needs a
> > new approach, I suggest trying SRCU.  It can be viewed in some respects
> > as a kind of read-write mutex that is highly optimized for rapid
> > read-locks and -unlocks at the cost of very slow write-locks --
> > appropriate here since every syscall would need a read-lock whereas
> > write-locking would be needed only when a disconnect occurs.
> 
> 
> Thanks for the pointer :)
> 
> >
> > Alan Stern
> 
> 
> 
> -- 
> Ricardo Ribalda

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-10-20 14:25 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-29 18:53 [Bug 216543] New: kernel NULL pointer dereference usb_hcd_alloc_bandwidth bugzilla-daemon
2022-09-30  8:53 ` Greg KH
2022-09-30  8:53 ` [Bug 216543] " bugzilla-daemon
2022-09-30 11:32 ` bugzilla-daemon
2022-09-30 11:32 ` bugzilla-daemon
2022-09-30 11:38 ` bugzilla-daemon
2022-09-30 12:32   ` Greg KH
2022-09-30 12:32 ` bugzilla-daemon
2022-09-30 14:28 ` bugzilla-daemon
2022-10-17 17:59 ` bugzilla-daemon
2022-10-17 21:25   ` Alan Stern
2022-10-18  5:40     ` Ricardo Ribalda
2022-10-18  5:42       ` Ricardo Ribalda
2022-10-18 14:46       ` Alan Stern
2022-10-18 15:02       ` Laurent Pinchart
2022-10-19  1:35         ` Ricardo Ribalda
2022-10-19  1:44           ` Laurent Pinchart
2022-10-19  4:22             ` Ricardo Ribalda
2022-10-19 15:08               ` Alan Stern
2022-10-20  0:57                 ` Ricardo Ribalda
2022-10-20 14:25                   ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.