Linux-USB Archive on lore.kernel.org
* BUG with linux 5.9.0 with dwc3 in gadget mode
@ 2020-10-16 20:21 Ferry Toth
  2020-10-19  5:45 ` Felipe Balbi
  0 siblings, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-16 20:21 UTC (permalink / raw)
  To: linux-usb; +Cc: felipe.balbi

This occurs with the edison-arduino board, which has a nifty switch for
toggling between gadget and host mode. In host mode it boots fine, then it
crashes when I flip the switch to gadget.

The trace below is what I get from the console when booting with gadget
mode selected.

The last kernel I used where everything was working fine is 5.6.0.

The kernel is built specifically for this platform; there is nothing
suspicious going on in the dwc3 area, see
https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0

Magic signature found

Starting kernel ...

[    2.395631] Initramfs unpacking failed: invalid magic at start of compressed archive
Scanning for Btrfs filesystems
Starting version 243.2+
Kernel with acpi enabled detected
Loading acpi tables
Waiting for root device /dev/mmcblk0p8
   10Found device '/run/media/mmcblk0p8'
   9Init found, booting...
[   10.834272] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43340-sdio for chip BCM43340/2
[   11.179662] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43340-sdio for chip BCM43340/2
[   11.194223] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
[   11.234779] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43340/2 wl0: Oct 23 2017 08:41:23 version 6.10.190.70 (r674464) FWID 01-98d71006
[   12.401620] BUG: unable to handle page fault for address: 0000000100000000
[   12.408496] #PF: supervisor instruction fetch in kernel mode
[   12.414145] #PF: error_code(0x0010) - not-present page
[   12.419276] PGD 0 P4D 0
[   12.421817] Oops: 0010 [#1] SMP PTI
[   12.425307] CPU: 0 PID: 488 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
[   12.433297] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[   12.442075] RIP: 0010:0x100000000
[   12.445382] Code: Bad RIP value.
[   12.448605] RSP: 0000:ffff9a95403fbbf8 EFLAGS: 00010046
[   12.453827] RAX: 0000000100000000 RBX: ffff8ee8bd32f828 RCX: ffff8ee8bacc4000
[   12.460950] RDX: 00000000ffffff94 RSI: ffff8ee8bc01a5a0 RDI: ffff8ee887228700
[   12.468075] RBP: ffff8ee8bc01a5a0 R08: 0000000000000046 R09: 0000000000000238
[   12.475199] R10: 0000000000000004 R11: ffff8ee8ba8ba248 R12: ffff8ee887228700
[   12.482322] R13: ffff8ee8bd32f828 R14: 0000000000000002 R15: ffff8ee8bae93200
[   12.489449] FS:  0000000000000000(0000) GS:ffff8ee8be200000(0000) knlGS:0000000000000000
[   12.497524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.503262] CR2: 0000000100000000 CR3: 000000003c5ae000 CR4: 00000000001006f0
[   12.510382] Call Trace:
[   12.512841]  ? dwc3_gadget_giveback+0xbf/0x120
[   12.517286]  ? __dwc3_gadget_ep_disable+0xc5/0x250
[   12.522077]  ? dwc3_gadget_ep_disable+0x3d/0xd0
[   12.526608]  ? usb_ep_disable+0x1d/0x80
[   12.530451]  ? u_audio_stop_capture+0x87/0x9a [u_audio]
[   12.535680]  ? afunc_set_alt+0x73/0x80 [usb_f_uac2]
[   12.540562]  ? composite_setup+0x20f/0x1b20 [libcomposite]
[   12.546053]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
[   12.552060]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
[   12.558062]  ? dwc3_ep0_delegate_req+0x24/0x40
[   12.562502]  ? dwc3_ep0_interrupt+0x40a/0x9d8
[   12.566858]  ? dwc3_thread_interrupt+0x880/0xf70
[   12.571475]  ? __schedule+0x3ee/0x640
[   12.575143]  ? irq_forced_thread_fn+0x70/0x70
[   12.579497]  ? irq_thread_fn+0x1b/0x60
[   12.583245]  ? irq_thread+0xd3/0x150
[   12.586821]  ? wake_threads_waitq+0x30/0x30
[   12.591001]  ? irq_thread_dtor+0x80/0x80
[   12.594925]  ? kthread+0xf9/0x130
[   12.598238]  ? kthread_park+0x80/0x80
[   12.601901]  ? ret_from_fork+0x22/0x30
[   12.605644] Modules linked in: spi_pxa2xx_platform dw_dmac usb_f_uac2 
u_audio usb_f_mass_storage usb_f_eem u_ether usb_f_serial u_serial 
libcomposite pwm_lpss_pci snd_sof_pci snd_sof_intel_byt pwm_lpss 
snd_sof_intel_ipc snd_sof_xtensa_dsp intel_mrfld_pwrbtn intel_mrfld_adc 
snd_sof snd_sof_nocodec snd_soc_acpi spi_pxa2xx_pci brcmfmac brcmutil 
leds_gpio hci_uart btbcm ti_ads7950 industrialio_triggered_buffer 
kfifo_buf spidev ledtrig_heartbeat mmc_block extcon_intel_mrfld 
sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs 
libcrc32c xor zstd_compress zlib_deflate raid6_pq
[   12.657416] CR2: 0000000100000000
[   12.660729] ---[ end trace 9b92dea6da33c71e ]---


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-16 20:21 BUG with linux 5.9.0 with dwc3 in gadget mode Ferry Toth
@ 2020-10-19  5:45 ` Felipe Balbi
  2020-10-19  7:14   ` Ferry Toth
                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Felipe Balbi @ 2020-10-19  5:45 UTC (permalink / raw)
  To: Ferry Toth, linux-usb, Andy Shevchenko; +Cc: felipe.balbi




Hi Andy,

Ferry Toth <fntoth@gmail.com> writes:
> This occurs with the edison-arduino board, which has a nifty switch for
> toggling between gadget and host mode. In host mode it boots fine, then it
> crashes when I flip the switch to gadget.
>
> The trace below is what I get from the console when booting with gadget
> mode selected.
>
> The last kernel I used where everything was working fine is 5.6.0.
>
> The kernel is built specifically for this platform; there is nothing
> suspicious going on in the dwc3 area, see
> https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>
> [ boot log and oops trace snipped ]

Is this something you can reproduce on your end? Ferry, can you get dwc3
trace logs when this happens? ftrace_dump_on_oops may help here.
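A minimal sketch of what capturing those dwc3 trace logs could look like, assuming tracefs is reachable at /sys/kernel/tracing (or under debugfs at /sys/kernel/debug/tracing on older setups). Setting `ftrace_dump_on_oops` by itself only dumps whatever is already in the ring buffer, so the dwc3 trace events have to be enabled too; otherwise the oops output ends with "(ftrace buffer empty)". The helper name `enable_dwc3_tracing`, its optional path arguments, and the `gadget` event group for the UDC core are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch (hypothetical helper): enable dwc3 trace events, plus the generic
# UDC "gadget" events if present, so an oops in gadget mode has trace data
# to dump. Source this file, then run enable_dwc3_tracing as root before
# flipping the host/gadget switch.
# Assumption: tracefs is at /sys/kernel/tracing, or under debugfs on older setups.

enable_dwc3_tracing() {
    # Optional arguments allow pointing at other locations (e.g. for testing).
    proc_root=${1:-/proc}
    tracefs=${2:-}
    if [ -z "$tracefs" ]; then
        tracefs=/sys/kernel/tracing
        [ -d "$tracefs/events" ] || tracefs=/sys/kernel/debug/tracing
    fi

    # Dump the ftrace ring buffer to the console when the kernel oopses.
    echo 1 > "$proc_root/sys/kernel/ftrace_dump_on_oops" || return 1

    # The dump is only useful if events are being recorded; without this
    # the oops output shows "(ftrace buffer empty)".
    echo 1 > "$tracefs/events/dwc3/enable" || return 1

    # Generic UDC gadget events, only if this kernel provides them.
    if [ -d "$tracefs/events/gadget" ]; then
        echo 1 > "$tracefs/events/gadget/enable" || return 1
    fi

    # Make sure recording into the ring buffer is switched on.
    echo 1 > "$tracefs/tracing_on"
}
```

Usage would be to source the file and run `enable_dwc3_tracing` as root before flipping the switch. For the crash that happens while booting with gadget mode selected, the same should presumably be achievable from the kernel command line with `ftrace_dump_on_oops trace_event=dwc3:*`, since dwc3 is built in on this kernel (it does not appear under "Modules linked in").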

-- 
balbi


* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19  5:45 ` Felipe Balbi
@ 2020-10-19  7:14   ` Ferry Toth
  2020-10-19 18:49     ` Ferry Toth
  2020-10-19  7:18   ` Ferry Toth
  2020-10-19 19:46   ` Andy Shevchenko
  2 siblings, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-19  7:14 UTC (permalink / raw)
  To: linux-usb; +Cc: felipe.balbi

Op 19-10-2020 om 07:45 schreef Felipe Balbi:
> 
> Hi Andy,
> 
> Ferry Toth <fntoth@gmail.com> writes:
>> [ original report, boot log and oops trace snipped ]
> 
> Is this something you can reproduce on your end? Ferry, can you get dwc3
> trace logs when this happens? ftrace_dump_on_oops may help here.
> 
I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or 
do I need to do more?

BTW, after posting this I found that dwc3 is not working properly in host
mode either. No oops, but no driver gets loaded when a device is plugged in.



* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19  5:45 ` Felipe Balbi
  2020-10-19  7:14   ` Ferry Toth
@ 2020-10-19  7:18   ` Ferry Toth
  2020-10-20 12:32     ` Felipe Balbi
  2020-10-19 19:46   ` Andy Shevchenko
  2 siblings, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-19  7:18 UTC (permalink / raw)
  To: Felipe Balbi, linux-usb, Andy Shevchenko; +Cc: felipe.balbi

Op 19-10-2020 om 07:45 schreef Felipe Balbi:
> 
> Hi Andy,
> 
> Ferry Toth <fntoth@gmail.com> writes:
>> [ original report, boot log and oops trace snipped ]
> 
> Is this something you can reproduce on your end? Ferry, can you get dwc3
> trace logs when this happens? ftrace_dump_on_oops may help here.

I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or
do I need to do more?

BTW, after posting this I found that dwc3 is not working properly in host
mode either. No oops, but no driver gets loaded when a device is plugged in.




* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19  7:14   ` Ferry Toth
@ 2020-10-19 18:49     ` Ferry Toth
  2020-10-20 12:35       ` Felipe Balbi
  0 siblings, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-19 18:49 UTC (permalink / raw)
  To: linux-usb; +Cc: felipe.balbi

Op 19-10-2020 om 09:14 schreef Ferry Toth:
> Op 19-10-2020 om 07:45 schreef Felipe Balbi:
>>
>> Hi Andy,
>>
>> Ferry Toth <fntoth@gmail.com> writes:
>>> [ original report, boot log and oops trace snipped ]
>>
>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>
> I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or
> do I need to do more?
>
> BTW, after posting this I found that dwc3 is not working properly in host
> mode either. No oops, but no driver gets loaded when a device is plugged in.
> 
Not sure if this is what you are looking for (otherwise let me know):

root@edison:/proc/sys/kernel# echo 1 > ftrace_dump_on_oops
## flip the switch from host to gadget
root@edison:/proc/sys/kernel# [  515.866590] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  515.873553] #PF: supervisor read access in kernel mode
[  515.878682] #PF: error_code(0x0000) - not-present page
[  515.883814] PGD 0 P4D 0
[  515.886352] Oops: 0000 [#1] SMP PTI
[  515.889844] CPU: 0 PID: 490 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
[  515.897836] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[  515.906621] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
[  515.911842] Code: 0f 1f 44 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 05 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c
[  515.930581] RSP: 0018:ffff945f8044fc40 EFLAGS: 00010083
[  515.935802] RAX: ffff8a347b2e3c48 RBX: ffff8a347d3af828 RCX: 0000000000000000
[  515.942926] RDX: ffffffffffffffa0 RSI: ffff8a347dc4ed80 RDI: ffff8a347d3af958
[  515.950049] RBP: ffff8a347dc4ed80 R08: ffff8a347b2e3c68 R09: 00000000dbfbb796
[  515.957173] R10: ffff945f8044fd90 R11: ffff8a347d3afb00 R12: ffff8a347d3af958
[  515.964297] R13: 0000000000000082 R14: ffff8a347b2e3c00 R15: ffff8a347b103600
[  515.971423] FS:  0000000000000000(0000) GS:ffff8a347e200000(0000) knlGS:0000000000000000
[  515.979503] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.985240] CR2: 0000000000000000 CR3: 000000002f40a000 CR4: 00000000001006f0
[  515.992362] Call Trace:
[  515.994823]  usb_ep_dequeue+0x19/0x80
[  515.998499]  u_audio_stop_capture+0x54/0x9a [u_audio]
[  516.003554]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
[  516.008267]  composite_setup+0x20f/0x1b20 [libcomposite]
[  516.013588]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
[  516.019597]  configfs_composite_setup+0x6b/0x90 [libcomposite]
[  516.025432]  dwc3_ep0_delegate_req+0x24/0x40
[  516.029703]  dwc3_ep0_interrupt+0x40a/0x9d8
[  516.033890]  dwc3_thread_interrupt+0x880/0xf70
[  516.038336]  ? __schedule+0x3ee/0x640
[  516.042002]  ? irq_forced_thread_fn+0x70/0x70
[  516.046356]  irq_thread_fn+0x1b/0x60
[  516.049934]  irq_thread+0xd3/0x150
[  516.053335]  ? wake_threads_waitq+0x30/0x30
[  516.057516]  ? irq_thread_dtor+0x80/0x80
[  516.061438]  kthread+0xf9/0x130
[  516.064579]  ? kthread_park+0x80/0x80
[  516.068241]  ret_from_fork+0x22/0x30
[  516.071814] Modules linked in: rfcomm iptable_nat bnep usb_f_uac2 
u_audio usb_f_mass_storage spi_pxa2xx_platform dw_dmac usb_f_eem u_ether 
usb_f_serial u_serial libcomposite pwm_lpss_pci pwm_lpss snd_sof_pci 
snd_sof_intel_byt snd_sof_intel_ipc intel_mrfld_pwrbtn 
snd_sof_xtensa_dsp intel_mrfld_adc snd_sof snd_sof_nocodec snd_soc_acpi 
spi_pxa2xx_pci brcmfmac brcmutil leds_gpio ti_ads7950 hci_uart 
industrialio_triggered_buffer btbcm spidev kfifo_buf ledtrig_heartbeat 
mmc_block extcon_intel_mrfld sdhci_pci cqhci intel_soc_pmic_mrfld sdhci 
led_class mmc_core btrfs libcrc32c xor zstd_compress zlib_deflate raid6_pq
[  516.125674] Dumping ftrace buffer:
[  516.129074]    (ftrace buffer empty)
[  516.132642] CR2: 0000000000000000
[  516.135957] ---[ end trace 2386f834a3643685 ]---
[  516.140574] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
[  516.145793] Code: 0f 1f 44 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 05 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c
[  516.164532] RSP: 0018:ffff945f8044fc40 EFLAGS: 00010083
[  516.169749] RAX: ffff8a347b2e3c48 RBX: ffff8a347d3af828 RCX: 0000000000000000
[  516.176873] RDX: ffffffffffffffa0 RSI: ffff8a347dc4ed80 RDI: ffff8a347d3af958
[  516.183998] RBP: ffff8a347dc4ed80 R08: ffff8a347b2e3c68 R09: 00000000dbfbb796
[  516.191121] R10: ffff945f8044fd90 R11: ffff8a347d3afb00 R12: ffff8a347d3af958
[  516.198246] R13: 0000000000000082 R14: ffff8a347b2e3c00 R15: ffff8a347b103600
[  516.205371] FS:  0000000000000000(0000) GS:ffff8a347e200000(0000) knlGS:0000000000000000
[  516.213447] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  516.219185] CR2: 0000000000000000 CR3: 000000002f40a000 CR4: 00000000001006f0
[  516.226454] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  516.233414] #PF: supervisor instruction fetch in kernel mode
[  516.239063] #PF: error_code(0x0010) - not-present page
[  516.244190] PGD 0 P4D 0
[  516.246729] Oops: 0010 [#2] SMP PTI
[  516.250221] CPU: 0 PID: 490 Comm: irq/15-dwc3 Tainted: G      D      5.9.0-edison-acpi-standard #1
[  516.259595] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[  516.268368] RIP: 0010:0x0
[  516.270988] Code: Bad RIP value.
[  516.274212] RSP: 0018:ffff945f8044fec0 EFLAGS: 00010246
[  516.279430] RAX: 0000000000000000 RBX: ffff8a347afa0cc0 RCX: 0000000000000000
[  516.286555] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff945f8044fec8
[  516.293679] RBP: ffff8a347afa0cc0 R08: 000000000000000f R09: ffffffffac26b701
[  516.300802] R10: ffff8a347c965000 R11: 0000000000000001 R12: ffff8a347afa13fc
[  516.307925] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
[  516.315053] FS:  0000000000000000(0000) GS:ffff8a347e200000(0000) knlGS:0000000000000000
[  516.323132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  516.328869] CR2: ffffffffffffffd6 CR3: 000000002f40a000 CR4: 00000000001006f0
[  516.335991] Call Trace:
[  516.338443]  task_work_run+0x5a/0x90
[  516.342022]  do_exit+0x358/0xab0
[  516.345251]  ? kthread+0xf9/0x130
[  516.348566]  rewind_stack_do_exit+0x17/0x20
[  516.352742] RIP: 0000:0x0
[  516.355361] Code: Bad RIP value.
[  516.358587] RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
[  516.366148] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  516.373273] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  516.380396] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  516.387519] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  516.394644] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  516.401769] Modules linked in: rfcomm iptable_nat bnep usb_f_uac2 
u_audio usb_f_mass_storage spi_pxa2xx_platform dw_dmac usb_f_eem u_ether 
usb_f_serial u_serial libcomposite pwm_lpss_pci pwm_lpss snd_sof_pci 
snd_sof_intel_byt snd_sof_intel_ipc intel_mrfld_pwrbtn 
snd_sof_xtensa_dsp intel_mrfld_adc snd_sof snd_sof_nocodec snd_soc_acpi 
spi_pxa2xx_pci brcmfmac brcmutil leds_gpio ti_ads7950 hci_uart 
industrialio_triggered_buffer btbcm spidev kfifo_buf ledtrig_heartbeat 
mmc_block extcon_intel_mrfld sdhci_pci cqhci intel_soc_pmic_mrfld sdhci 
led_class mmc_core btrfs libcrc32c xor zstd_compress zlib_deflate raid6_pq
[  516.455620] Dumping ftrace buffer:
[  516.459018]    (ftrace buffer empty)
[  516.462586] CR2: 0000000000000000
[  516.465902] ---[ end trace 2386f834a3643686 ]---
[  516.470520] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
[  516.475743] Code: 0f 1f 44 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 05 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c
[  516.494481] RSP: 0018:ffff945f8044fc40 EFLAGS: 00010083
[  516.499703] RAX: ffff8a347b2e3c48 RBX: ffff8a347d3af828 RCX: 0000000000000000
[  516.506826] RDX: ffffffffffffffa0 RSI: ffff8a347dc4ed80 RDI: ffff8a347d3af958
[  516.513950] RBP: ffff8a347dc4ed80 R08: ffff8a347b2e3c68 R09: 00000000dbfbb796
[  516.521075] R10: ffff945f8044fd90 R11: ffff8a347d3afb00 R12: ffff8a347d3af958
[  516.528198] R13: 0000000000000082 R14: ffff8a347b2e3c00 R15: ffff8a347b103600
[  516.535324] FS:  0000000000000000(0000) GS:ffff8a347e200000(0000) knlGS:0000000000000000
[  516.543406] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  516.549142] CR2: ffffffffffffffd6 CR3: 000000002f40a000 CR4: 00000000001006f0
[  516.556264] Fixing recursive fault but reboot is needed!
[  544.795126] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  544.801062] rcu: 	0-...0: (2 GPs behind) idle=a5a/1/0x4000000000000000 softirq=18432/18432 fqs=5250
[  544.810182] 	(detected by 1, t=21002 jiffies, g=31497, q=9)
[  544.815751] Sending NMI from CPU 1 to CPUs 0:
[  544.821106] NMI backtrace for cpu 0
[  544.821112] CPU: 0 PID: 194 Comm: kworker/0:2H Tainted: G      D      5.9.0-edison-acpi-standard #1
[  544.821116] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[  544.821119] Workqueue:  0x0 (mmc_complete)
[  544.821127] RIP: 0010:queued_spin_lock_slowpath+0x3c/0x1a0
[  544.821135] Code: 41 f0 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 1b 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 c3 f6 c4 01 75 04 c6 47
[  544.821138] RSP: 0018:ffff945f80003af8 EFLAGS: 00000002
[  544.821144] RAX: 0000000000000101 RBX: ffff8a347b2e3e00 RCX: 0000000000000000
[  544.821148] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8a347d3af958
[  544.821152] RBP: ffff8a347b655840 R08: 0000000000000004 R09: 0000000000000282
[  544.821155] R10: ffff8a347c95a400 R11: 0000000000000002 R12: 0000000000000202
[  544.821159] R13: ffff8a347b2238e0 R14: ffff8a347b223000 R15: ffff8a347b655880
[  544.821163] FS:  0000000000000000(0000) GS:ffff8a347e200000(0000) knlGS:0000000000000000
[  544.821166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  544.821170] CR2: ffffffffffffffd6 CR3: 000000002f40a000 CR4: 00000000001006f0
[  544.821172] Call Trace:
[  544.821173]  <IRQ>
[  544.821176]  _raw_spin_lock_irqsave+0x20/0x30
[  544.821179]  dwc3_gadget_ep_queue+0x26/0x1a0
[  544.821181]  usb_ep_queue+0x2b/0xa0
[  544.821184]  eth_start_xmit+0x1d5/0x360 [u_ether]
[  544.821186]  dev_hard_start_xmit+0x88/0x1d0
[  544.821189]  sch_direct_xmit+0xe1/0x210
[  544.821191]  __qdisc_run+0x145/0x520
[  544.821194]  __dev_queue_xmit+0x44e/0x6d0
[  544.821196]  ? sock_alloc_send_pskb+0x202/0x220
[  544.821199]  ip6_finish_output2+0x23b/0x550
[  544.821201]  ip6_output+0x6e/0x120
[  544.821204]  ? __ip6_finish_output+0x110/0x110
[  544.821207]  mld_sendpack+0x1b2/0x220
[  544.821209]  mld_ifc_timer_expire+0x191/0x2f0
[  544.821212]  ? ip6_mc_leave_src+0x90/0x90
[  544.821214]  call_timer_fn+0x28/0x120
[  544.821217]  run_timer_softirq+0x395/0x450
[  544.821219]  ? hrtimer_wakeup+0x19/0x20
[  544.821222]  ? __hrtimer_run_queues+0x100/0x260
[  544.821225]  ? recalibrate_cpu_khz+0x10/0x10
[  544.821227]  ? ktime_get+0x33/0x90
[  544.821229]  __do_softirq+0xdb/0x2dc
[  544.821232]  asm_call_irq_on_stack+0x12/0x20
[  544.821234]  </IRQ>
[  544.821236]  do_softirq_own_stack+0x32/0x40
[  544.821239]  irq_exit_rcu+0x92/0xa0
[  544.821241]  sysvec_apic_timer_interrupt+0x2e/0x80
[  544.821244]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  544.821247] RIP: 0010:finish_task_switch+0x6e/0x200
[  544.821255] Code: 00 00 4d 8b 7d 10 65 48 8b 1c 25 00 6d 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 45 2c 00 00 00 00 41 c6 04 24 00 fb 4d 85 f6 <74> 1d 65 48 8b 04 25 00 6d 01 00 4c 3b b0 e0 03 00 00 74 33 f0 41
[  544.821258] RSP: 0018:ffff945f802cfe18 EFLAGS: 00000246
[  544.821264] RAX: ffff8a347afa0cc0 RBX: ffff8a3446040cc0 RCX: 0000000000000000
[  544.821267] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8a347afa0cc0
[  544.821271] RBP: ffff945f802cfe40 R08: 0000000000000368 R09: 0000000000000009
[  544.821275] R10: 00000000fe6e673f R11: 000000000000013f R12: ffff8a347e22a0c0
[  544.821279] R13: ffff8a347afa0cc0 R14: 0000000000000000 R15: 0000000000000002
[  544.821281]  ? __switch_to_asm+0x36/0x70
[  544.821284]  __schedule+0x3ee/0x640
[  544.821286]  schedule+0x45/0xb0
[  544.821288]  worker_thread+0xb7/0x3b0
[  544.821291]  ? process_one_work+0x380/0x380
[  544.821293]  kthread+0xf9/0x130
[  544.821296]  ? kthread_park+0x80/0x80
[  544.821298]  ret_from_fork+0x22/0x30

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19  5:45 ` Felipe Balbi
  2020-10-19  7:14   ` Ferry Toth
  2020-10-19  7:18   ` Ferry Toth
@ 2020-10-19 19:46   ` Andy Shevchenko
  2020-10-19 20:46     ` Ferry Toth
  2020-10-20 13:27     ` Andy Shevchenko
  2 siblings, 2 replies; 31+ messages in thread
From: Andy Shevchenko @ 2020-10-19 19:46 UTC (permalink / raw)
  To: Felipe Balbi, Heikki Krogerus; +Cc: Ferry Toth, linux-usb, felipe.balbi

On Mon, Oct 19, 2020 at 08:45:10AM +0300, Felipe Balbi wrote:
> Ferry Toth <fntoth@gmail.com> writes:
> > This occurs with the edison-arduino board, which has a nifty switch for
> > toggling between gadget/host mode. In host mode it boots fine, then
> > crashes when I flip the switch to gadget.
> >
> > The trace below is what I get from the console when booting with gadget
> > mode selected.
> >
> > The last kernel where everything was obviously working fine is 5.6.0.
> >
> > The kernel is built specifically for the platform; nothing suspicious is
> > going on in the dwc3 area, see
> > https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0

> Is this something you can reproduce on your end? Ferry, can you get dwc3
> trace logs when this happens? ftrace_dump_on_oops may help here.

For the time being I can confirm that the switch stopped working between v5.7 (v5.8)
and v5.8.16. But I haven't seen any crash so far (I don't use any predefined
gadget, though).

afb420486016 usb: dwc3: gadget: Handle ZLP for sg requests
8301e3aa1c8d usb: dwc3: gadget: Fix handling ZLP
d884a90cec5a usb: dwc3: gadget: Don't setup more than requested

Reverting them does not help, so I looked into drivers/usb changes.

Manual guesswork did not give any result, so I bisected:

# good: [9ece50d8a470ca7235ffd6ac0f9c5f0f201fe2c8] Linux 5.8.5
# good: [96d020ddff6adff267a6900bcfcd46a8993f5152] xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
# bad: [ccc9838fed80f04e45a2c317e4a2dacdf2f1e3c2] drm/amd/pm: correct the thermal alert temperature limit settings
# bad: [bbf423c28efcde2beec2b187806eda0041cb0582] x86/irq: Unbreak interrupt affinity setting
# good: [9a9cc8c9b1c715317c5fc18ac695751577bdf250] powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
# bad: [8cb3561d084ef532cd13d4f1f9077a900ff9f740] usbip: Implement a match function to fix usbip
# bad: [3c491c44194253789d568549fac3b34dccdbcecd] crypto: af_alg - Work around empty control messages without MSG_MORE
# bad: [1d35dfde2a7d9a0627b1e9465e8e4305478fb945] device property: Fix the secondary firmware node handling in set_primary_fwnode()
# first bad commit: [1d35dfde2a7d9a0627b1e9465e8e4305478fb945] device property: Fix the secondary firmware node handling in set_primary_fwnode()
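The log above is what `git bisect` produces: each step tests a commit roughly halfway between the last known-good and first known-bad points, so every build/boot test halves the remaining range. A minimal sketch of that search in Python, assuming a linear list of commits (real kernel history is a merge graph, which `git bisect` handles itself) and a hypothetical `is_bad` callback standing in for "build, boot, flip the switch":

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later one is bad too (linear history).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # hypothetical "build, boot, flip switch" test
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Toy history of 10 commits where the regression lands in c7.
history = [f"c{i}" for i in range(10)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits actually had to be tested
    return int(commit[1:]) >= 7


print(first_bad(history, is_bad), tested)  # prints: c7 ['c4', 'c6', 'c7']
```

Only three of the eight untested commits need a build and boot here, which is why bisecting beats manual guesswork once the range is large.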

Revert on v5.9 helps.

Heikki, any idea?

-- 
With Best Regards,
Andy Shevchenko

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19 19:46   ` Andy Shevchenko
@ 2020-10-19 20:46     ` Ferry Toth
  2020-10-20 13:27     ` Andy Shevchenko
  1 sibling, 0 replies; 31+ messages in thread
From: Ferry Toth @ 2020-10-19 20:46 UTC (permalink / raw)
  To: linux-usb
  Cc: Ferry Toth, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	felipe.balbi-VuQAYsv1563Yd54FQh9/CA

Op 19-10-2020 om 21:46 schreef Andy Shevchenko:
> On Mon, Oct 19, 2020 at 08:45:10AM +0300, Felipe Balbi wrote:
>> Ferry Toth <fntoth@gmail.com> writes:
>>> This occurs with the edison-arduino board, which has a nifty switch for
>>> toggling between gadget/host mode. In host mode it boots fine, then
>>> crashes when I flip the switch to gadget.
>>>
>>> The trace below is what I get from the console when booting with gadget
>>> mode selected.
>>>
>>> The last kernel where everything was obviously working fine is 5.6.0.
>>>
>>> The kernel is built specifically for the platform; nothing suspicious is
>>> going on in the dwc3 area, see
>>> https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
> 
>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>> trace logs when this happens? ftrace_dump_on_oops may help here.
> 
> For the time being I can confirm that the switch stopped working between v5.7 (v5.8)
> and v5.8.16. But I haven't seen any crash so far (I don't use any predefined
> gadget, though).
> 
> afb420486016 usb: dwc3: gadget: Handle ZLP for sg requests
> 8301e3aa1c8d usb: dwc3: gadget: Fix handling ZLP
> d884a90cec5a usb: dwc3: gadget: Don't setup more than requested
> 
> Reverting them does not help, so I looked into drivers/usb changes.
> 
> Manual guesswork did not give any result, so I bisected:
> 
> # good: [9ece50d8a470ca7235ffd6ac0f9c5f0f201fe2c8] Linux 5.8.5
> # good: [96d020ddff6adff267a6900bcfcd46a8993f5152] xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
> # bad: [ccc9838fed80f04e45a2c317e4a2dacdf2f1e3c2] drm/amd/pm: correct the thermal alert temperature limit settings
> # bad: [bbf423c28efcde2beec2b187806eda0041cb0582] x86/irq: Unbreak interrupt affinity setting
> # good: [9a9cc8c9b1c715317c5fc18ac695751577bdf250] powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
> # bad: [8cb3561d084ef532cd13d4f1f9077a900ff9f740] usbip: Implement a match function to fix usbip
> # bad: [3c491c44194253789d568549fac3b34dccdbcecd] crypto: af_alg - Work around empty control messages without MSG_MORE
> # bad: [1d35dfde2a7d9a0627b1e9465e8e4305478fb945] device property: Fix the secondary firmware node handling in set_primary_fwnode()
> # first bad commit: [1d35dfde2a7d9a0627b1e9465e8e4305478fb945] device property: Fix the secondary firmware node handling in set_primary_fwnode()
> 
> Revert on v5.9 helps.
> 
> Heikki, any idea?
> 
Hi Andy, that was fast.

I can confirm that reverting this patch (which I found as
c15e1bdda4365a5f17cdadf22bf1c1df13884a9e in 5.9-rc3) makes host mode
work again on 5.9.0. I can see the USB controller and the USB stick with
`lsusb -t`, and can mount/umount the stick.

Booting with the switch in gadget position I still get an oops.
Same with booting in host mode, then setting the switch to gadget mode.

I noticed that on 5.9.0 in host mode I get:
root@edison:~# journalctl -b | grep dwc
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: GPIO lookup for consumer reset
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: using ACPI for GPIO lookup
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: using lookup tables for GPIO lookup
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: No GPIO consumer reset found
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: GPIO lookup for consumer cs
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: using ACPI for GPIO lookup
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: using lookup tables for GPIO lookup
Oct 04 16:49:44 edison kernel: tusb1210 dwc3.0.auto.ulpi: No GPIO consumer cs found
...
<repeats a few times>

This is new on 5.9.0, but it is not affected by reverting c15e1bdda, so it
may be unrelated.

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19  7:18   ` Ferry Toth
@ 2020-10-20 12:32     ` Felipe Balbi
  2020-10-20 19:46       ` Ferry Toth
  2020-10-20 20:37       ` Ferry Toth
  0 siblings, 2 replies; 31+ messages in thread
From: Felipe Balbi @ 2020-10-20 12:32 UTC (permalink / raw)
  To: Ferry Toth, linux-usb, Andy Shevchenko; +Cc: felipe.balbi

Hi,

Ferry Toth <fntoth@gmail.com> writes:

8< snip

>>> [   12.657416] CR2: 0000000100000000
>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>> 
>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>> trace logs when this happens? ftrace_dump_on_oops may help here.
> I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or 
> do I need to do more?

you'd have to enable dwc3 trace events first ;-)
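A minimal sketch of that setup in Python, assuming tracefs is reachable under /sys/kernel/debug/tracing (the path used later in this thread; kernels since 4.1 can also mount it at /sys/kernel/tracing). Writing "1" to `events/dwc3/enable` turns on every event in the dwc3 trace system, and `/proc/sys/kernel/ftrace_dump_on_oops` makes the kernel dump the trace buffer to the console on an oops. The function name is illustrative, and the writes need root on a real system:

```python
from pathlib import Path


def enable_dwc3_tracing(tracefs: str = "/sys/kernel/debug/tracing",
                        procfs: str = "/proc") -> list:
    """Enable all dwc3 trace events and dump the trace buffer on oops.

    The directories are parameters so the same code can be pointed at a
    different tracefs mount (or at a scratch directory for testing).
    """
    targets = [
        # every event in the dwc3 trace system (dwc3_ep_dequeue, ...)
        Path(tracefs) / "events" / "dwc3" / "enable",
        # print the ftrace ring buffer when the kernel oopses
        Path(procfs) / "sys" / "kernel" / "ftrace_dump_on_oops",
    ]
    for path in targets:
        path.write_text("1\n")
    return targets
```

Enabling the whole dwc3 system rather than a single event gives more context in the dump, at the cost of a larger buffer to read through.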

> BTW after posting this I found that in host mode dwc3 is not working properly
> either. No oops, but no driver gets loaded on device plug-in.

okay

-- 
balbi

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19 18:49     ` Ferry Toth
@ 2020-10-20 12:35       ` Felipe Balbi
  2020-10-20 21:01         ` Ferry Toth
  0 siblings, 1 reply; 31+ messages in thread
From: Felipe Balbi @ 2020-10-20 12:35 UTC (permalink / raw)
  To: Ferry Toth, linux-usb
  Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA-XMD5yJDbdMReXY1tMh2IBg

Ferry Toth <fntoth@gmail.com> writes:

Hi,

> Op 19-10-2020 om 09:14 schreef Ferry Toth:
>> Op 19-10-2020 om 07:45 schreef Felipe Balbi:
>>>
>>> Hi Andy,
>>>
>>> Ferry Toth <fntoth@gmail.com> writes:
>>>> This occurs with the edison-arduino board, which has a nifty switch for
>>>> toggling between gadget/host mode. In host mode it boots fine, then
>>>> crashes when I flip the switch to gadget.
>>>>
>>>> The trace below is what I get from the console when booting with gadget
>>>> mode selected.
>>>>
>>>> The last kernel where everything was obviously working fine is
>>>> 5.6.0.
>>>>
>>>> The kernel is built specifically for the platform; nothing suspicious is
>>>> going on in the dwc3 area, see
>>>> https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>>>>
>>>> Magic signature found
>>>>
>>>> Starting kernel ...
>>>>
>>>> [    2.395631] Initramfs unpacking failed: invalid magic at start of compressed archive
>>>> Scanning for Btrfs filesystems
>>>> Starting version 243.2+
>>>> Kernel with acpi enabled detected
>>>> Loading acpi tables
>>>> Waiting for root device /dev/mmcblk0p8
>>>>     10Found device '/run/media/mmcblk0p8'
>>>>     9Init found, booting...
>>>> [   10.834272] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43340-sdio for chip BCM43340/2
>>>> [   11.179662] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43340-sdio for chip BCM43340/2
>>>> [   11.194223] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available (err=-2), device may have limited channels available
>>>> [   11.234779] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43340/2 wl0: Oct 23 2017 08:41:23 version 6.10.190.70 (r674464) FWID 01-98d71006
>>>> [   12.401620] BUG: unable to handle page fault for address: 0000000100000000
>>>> [   12.408496] #PF: supervisor instruction fetch in kernel mode
>>>> [   12.414145] #PF: error_code(0x0010) - not-present page
>>>> [   12.419276] PGD 0 P4D 0
>>>> [   12.421817] Oops: 0010 [#1] SMP PTI
>>>> [   12.425307] CPU: 0 PID: 488 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
>>>> [   12.433297] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
>>>> [   12.442075] RIP: 0010:0x100000000
>>>> [   12.445382] Code: Bad RIP value.
>>>> [   12.448605] RSP: 0000:ffff9a95403fbbf8 EFLAGS: 00010046
>>>> [   12.453827] RAX: 0000000100000000 RBX: ffff8ee8bd32f828 RCX: ffff8ee8bacc4000
>>>> [   12.460950] RDX: 00000000ffffff94 RSI: ffff8ee8bc01a5a0 RDI: ffff8ee887228700
>>>> [   12.468075] RBP: ffff8ee8bc01a5a0 R08: 0000000000000046 R09: 0000000000000238
>>>> [   12.475199] R10: 0000000000000004 R11: ffff8ee8ba8ba248 R12: ffff8ee887228700
>>>> [   12.482322] R13: ffff8ee8bd32f828 R14: 0000000000000002 R15: ffff8ee8bae93200
>>>> [   12.489449] FS:  0000000000000000(0000) GS:ffff8ee8be200000(0000) knlGS:0000000000000000
>>>> [   12.497524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   12.503262] CR2: 0000000100000000 CR3: 000000003c5ae000 CR4: 00000000001006f0
>>>> [   12.510382] Call Trace:
>>>> [   12.512841]  ? dwc3_gadget_giveback+0xbf/0x120
>>>> [   12.517286]  ? __dwc3_gadget_ep_disable+0xc5/0x250
>>>> [   12.522077]  ? dwc3_gadget_ep_disable+0x3d/0xd0
>>>> [   12.526608]  ? usb_ep_disable+0x1d/0x80
>>>> [   12.530451]  ? u_audio_stop_capture+0x87/0x9a [u_audio]
>>>> [   12.535680]  ? afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>>> [   12.540562]  ? composite_setup+0x20f/0x1b20 [libcomposite]
>>>> [   12.546053]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>>>> [   12.552060]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>>>> [   12.558062]  ? dwc3_ep0_delegate_req+0x24/0x40
>>>> [   12.562502]  ? dwc3_ep0_interrupt+0x40a/0x9d8
>>>> [   12.566858]  ? dwc3_thread_interrupt+0x880/0xf70
>>>> [   12.571475]  ? __schedule+0x3ee/0x640
>>>> [   12.575143]  ? irq_forced_thread_fn+0x70/0x70
>>>> [   12.579497]  ? irq_thread_fn+0x1b/0x60
>>>> [   12.583245]  ? irq_thread+0xd3/0x150
>>>> [   12.586821]  ? wake_threads_waitq+0x30/0x30
>>>> [   12.591001]  ? irq_thread_dtor+0x80/0x80
>>>> [   12.594925]  ? kthread+0xf9/0x130
>>>> [   12.598238]  ? kthread_park+0x80/0x80
>>>> [   12.601901]  ? ret_from_fork+0x22/0x30
>>>> [   12.605644] Modules linked in: spi_pxa2xx_platform dw_dmac usb_f_uac2
>>>> u_audio usb_f_mass_storage usb_f_eem u_ether usb_f_serial u_serial
>>>> libcomposite pwm_lpss_pci snd_sof_pci snd_sof_intel_byt pwm_lpss
>>>> snd_sof_intel_ipc snd_sof_xtensa_dsp intel_mrfld_pwrbtn intel_mrfld_adc
>>>> snd_sof snd_sof_nocodec snd_soc_acpi spi_pxa2xx_pci brcmfmac brcmutil
>>>> leds_gpio hci_uart btbcm ti_ads7950 industrialio_triggered_buffer
>>>> kfifo_buf spidev ledtrig_heartbeat mmc_block extcon_intel_mrfld
>>>> sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs
>>>> libcrc32c xor zstd_compress zlib_deflate raid6_pq
>>>> [   12.657416] CR2: 0000000100000000
>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>
>>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>
>> I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or 
>> do I need to do more?
>> 
>> BTW after posting this I found that in host mode dwc3 is not working properly
>> either. No oops, but no driver gets loaded on device plug-in.
>> 
> Not sure if this is what you are looking for (otherwise let me know):
>
> root@edison:/proc/sys/kernel# echo 1 > ftrace_dump_on_oops
> ## flip the switch from host to gadget
> root@edison:/proc/sys/kernel# [  515.866590] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  515.873553] #PF: supervisor read access in kernel mode
> [  515.878682] #PF: error_code(0x0000) - not-present page
> [  515.883814] PGD 0 P4D 0
> [  515.886352] Oops: 0000 [#1] SMP PTI
> [  515.889844] CPU: 0 PID: 490 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
> [  515.897836] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
> [  515.906621] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0

what do you get with:

$ gdb vmlinux
(gdb) l *(dwc3_gadget_ep_dequeue+0x41)

??
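Even without a debug vmlinux at hand, the oops itself carries some information. In the `Code:` line the faulting instruction starts at the byte marked `<48>`; decoded by hand (worth double-checking with the kernel's `scripts/decodecode`), `48 8b 4a 60` is `mov 0x60(%rdx),%rcx`, and the `48 8d 51 a0` around it is `lea -0x60(%rcx),%rdx`. The Python sketch below only redoes the address arithmetic with the register values from the trace: with RCX = 0 the `lea` yields RDX = 0xffffffffffffffa0, and the `mov` then reads from RDX + 0x60 = 0, matching CR2. That pattern looks like a list walk reaching a NULL `next` pointer, but this is an inference from the bytes, not something established here. The helper names are hypothetical:

```python
MASK64 = (1 << 64) - 1  # x86-64 addresses wrap modulo 2**64


def faulting_bytes(code_line: str) -> list:
    """Return the bytes from the <..>-marked byte to the end of an oops Code: line."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def disp8_address(base: int, disp8: int) -> int:
    """Effective address base + disp8, with disp8 sign-extended from 8 bits."""
    if disp8 >= 0x80:
        disp8 -= 0x100
    return (base + disp8) & MASK64


# Values copied from the oops earlier in this thread.
CODE = ("Code: 0f 1f 44 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 05 e6 42 00 "
        "49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e "
        "<48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c")
RCX = 0x0000000000000000

insn = faulting_bytes(CODE)[:4]           # 48 8b 4a 60: mov 0x60(%rdx),%rcx
rdx = disp8_address(RCX, 0xa0)            # lea -0x60(%rcx),%rdx
fault_addr = disp8_address(rdx, insn[3])  # address the mov reads from
print(f"RDX={rdx:#018x} fault address={fault_addr:#x}")
# prints: RDX=0xffffffffffffffa0 fault address=0x0
```

Both computed values agree with the RDX and CR2 printed in the oops, so the `gdb` lookup of `dwc3_gadget_ep_dequeue+0x41` should point at whatever loop dereferences that 0x60 offset.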

-- 
balbi

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-19 19:46   ` Andy Shevchenko
  2020-10-19 20:46     ` Ferry Toth
@ 2020-10-20 13:27     ` Andy Shevchenko
  1 sibling, 0 replies; 31+ messages in thread
From: Andy Shevchenko @ 2020-10-20 13:27 UTC (permalink / raw)
  To: Andy Shevchenko; +Cc: Felipe Balbi, Heikki Krogerus, Ferry Toth, USB

+Cc: Rafael.

On Tue, Oct 20, 2020 at 10:32 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Mon, Oct 19, 2020 at 08:45:10AM +0300, Felipe Balbi wrote:
> > Ferry Toth <fntoth@gmail.com> writes:
> > > This occurs with the edison-arduino board, which has a nifty switch for
> > > toggling between gadget/host mode. In host mode it boots fine, then
> > > crashes when I flip the switch to gadget.
> > >
> > > The trace below is what I get from the console when booting with gadget
> > > mode selected.
> > >
> > > The last kernel where everything was obviously working fine is 5.6.0.
> > >
> > > The kernel is built specifically for the platform; nothing suspicious is
> > > going on in the dwc3 area, see
> > > https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>
> > Is this something you can reproduce on your end? Ferry, can you get dwc3
> > trace logs when this happens? ftrace_dump_on_oops may help here.
>
> For the time being I can confirm that the switch stopped working between v5.7 (v5.8)
> and v5.8.16. But I haven't seen any crash so far (I don't use any predefined
> gadget, though).
>
> afb420486016 usb: dwc3: gadget: Handle ZLP for sg requests
> 8301e3aa1c8d usb: dwc3: gadget: Fix handling ZLP
> d884a90cec5a usb: dwc3: gadget: Don't setup more than requested
>
> Reverting them does not help, so I looked into drivers/usb changes.
>
> Manual guess work did not give any result, so I bisected:
>
> # good: [9ece50d8a470ca7235ffd6ac0f9c5f0f201fe2c8] Linux 5.8.5
> # good: [96d020ddff6adff267a6900bcfcd46a8993f5152] xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
> # bad: [ccc9838fed80f04e45a2c317e4a2dacdf2f1e3c2] drm/amd/pm: correct the thermal alert temperature limit settings
> # bad: [bbf423c28efcde2beec2b187806eda0041cb0582] x86/irq: Unbreak interrupt affinity setting
> # good: [9a9cc8c9b1c715317c5fc18ac695751577bdf250] powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
> # bad: [8cb3561d084ef532cd13d4f1f9077a900ff9f740] usbip: Implement a match function to fix usbip
> # bad: [3c491c44194253789d568549fac3b34dccdbcecd] crypto: af_alg - Work around empty control messages without MSG_MORE
> # bad: [1d35dfde2a7d9a0627b1e9465e8e4305478fb945] device property: Fix the secondary firmware node handling in set_primary_fwnode()
> # first bad commit: [1d35dfde2a7d9a0627b1e9465e8e4305478fb945] device property: Fix the secondary firmware node handling in set_primary_fwnode()
>
> Revert on v5.9 helps.
>
> Heikki, any idea?

Rafael, this patch introduced a regression, and it seems the proper fix
might not be small and neat enough to backport easily.
Any advice on how to proceed here?

-- 
With Best Regards,
Andy Shevchenko

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 12:32     ` Felipe Balbi
@ 2020-10-20 19:46       ` Ferry Toth
  2020-10-20 20:37       ` Ferry Toth
  1 sibling, 0 replies; 31+ messages in thread
From: Ferry Toth @ 2020-10-20 19:46 UTC (permalink / raw)
  To: linux-usb; +Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA


[-- Attachment #1: Type: text/plain, Size: 1042 bytes --]

Op 20-10-2020 om 14:32 schreef Felipe Balbi:
> 
> Hi,
> 
> Ferry Toth <fntoth@gmail.com> writes:
> 
> 8< snip
> 
>>>> [   12.657416] CR2: 0000000100000000
>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>
>>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>> I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or
>> do I need to do more?
> 
> you'd have to enable dwc3 trace events first ;-)

OK, this is with c15e1bdda reverted, so host mode seems to be working.

I booted with the switch in host mode, then briefly flipped it to gadget
mode. A long trace follows, and finally the watchdog (wdt) kicks in. I'll
attach it here and see if it gets past the list filters; otherwise, here is
a link to Google Drive:
https://docs.google.com/document/d/17XaBF03l0KDBEey8XiDTAPUWQKKdYjPZcabn5VJXt2o/edit?usp=sharing

>> BTW after posting this I found that in host mode dwc3 is not working properly
>> either. No oops, but no driver gets loaded on device plug-in.
> 
> okay
> 


[-- Attachment #2: ftrace.7z --]
[-- Type: application/x-7z-compressed, Size: 44356 bytes --]

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 12:32     ` Felipe Balbi
  2020-10-20 19:46       ` Ferry Toth
@ 2020-10-20 20:37       ` Ferry Toth
  2020-10-20 22:10         ` Thinh Nguyen
  1 sibling, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-20 20:37 UTC (permalink / raw)
  To: linux-usb; +Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA

Op 20-10-2020 om 14:32 schreef Felipe Balbi:
> 
> Hi,
> 
> Ferry Toth <fntoth@gmail.com> writes:
> 
> 8< snip
> 
>>>> [   12.657416] CR2: 0000000100000000
>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>
>>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>> I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or
>> do I need to do more?
> 
> you'd have to enable dwc3 trace events first ;-)
> 
>> BTW after posting this I found that in host mode dwc3 is not working properly
>> either. No oops, but no driver gets loaded on device plug-in.
> 
> okay
> 
Ahem, maybe you only meant for me to enable /dwc3/dwc3_ep_dequeue/enable:

root@edison:/boot# uname -a
Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@edison:/boot# echo 1 > /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
root@edison:/boot#
root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2608.592288] #PF: supervisor read access in kernel mode
[ 2608.597419] #PF: error_code(0x0000) - not-present page
[ 2608.602549] PGD 0 P4D 0
[ 2608.605090] Oops: 0000 [#1] SMP PTI
[ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
[ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
[ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
[ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c
[ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
[ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX: 0000000000000000
[ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI: ffff9a5f46f48158
[ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09: ffff9a5f7dd6a000
[ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12: ffff9a5f46f48158
[ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15: ffff9a5f7ad77e00
[ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000) knlGS:0000000000000000
[ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4: 00000000001006e0
[ 2608.711102] Call Trace:
[ 2608.713561]  usb_ep_dequeue+0x19/0x80
[ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
[ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
[ 2608.727003]  composite_setup+0x20f/0x1b20 [libcomposite]
[ 2608.732324]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
[ 2608.738332]  configfs_composite_setup+0x6b/0x90 [libcomposite]
[ 2608.744163]  dwc3_ep0_delegate_req+0x24/0x40
[ 2608.748435]  dwc3_ep0_interrupt+0x40a/0x9d8
[ 2608.752620]  dwc3_thread_interrupt+0x880/0xf70
[ 2608.757069]  ? __schedule+0x3ee/0x640
[ 2608.760734]  ? irq_forced_thread_fn+0x70/0x70
[ 2608.765089]  irq_thread_fn+0x1b/0x60
[ 2608.768666]  irq_thread+0xd3/0x150
[ 2608.772068]  ? wake_threads_waitq+0x30/0x30
[ 2608.776248]  ? irq_thread_dtor+0x80/0x80
[ 2608.780170]  kthread+0xf9/0x130
[ 2608.783312]  ? kthread_park+0x80/0x80
[ 2608.786975]  ret_from_fork+0x22/0x30
[ 2608.790546] Modules linked in: usb_f_uac2 u_audio usb_f_mass_storage 
usb_f_eem u_ether usb_f_serial u_serial libcomposite rfcomm iptable_nat 
bnep spi_pxa2xx_platform dw_dmac pwm_lpss_pci pwm_lpss snd_sof_pci 
intel_mrfld_adc snd_sof_intel_byt intel_mrfld_pwrbtn snd_sof_intel_ipc 
snd_sof_xtensa_dsp snd_sof snd_sof_nocodec snd_soc_acpi spi_pxa2xx_pci 
brcmfmac brcmutil hci_uart leds_gpio btbcm ti_ads7950 spidev 
industrialio_triggered_buffer kfifo_buf ledtrig_heartbeat mmc_block 
extcon_intel_mrfld sdhci_pci cqhci sdhci led_class intel_soc_pmic_mrfld 
mmc_core btrfs libcrc32c xor zstd_compress zlib_deflate raid6_pq
[ 2608.844407] Dumping ftrace buffer:
[ 2608.847805] ---------------------------------
[ 2608.852198] irq/15-d-733       1d... 2608518943us : dwc3_ep_dequeue: ep4out: req 00000000a40fdf40 length 0/256 zsI ==> -115
[ 2608.863334] irq/15-d-733       1d... 2608518954us : dwc3_ep_dequeue: ep4out: req 00000000545565de length 0/256 zsI ==> -115
[ 2608.874467] irq/15-d-733       1d... 2608520323us : dwc3_ep_dequeue: ep5in: req 00000000545565de length 0/192 zsI ==> -115
[ 2608.885513] irq/15-d-733       1d... 2608520331us : dwc3_ep_dequeue: ep5in: req 00000000a5936556 length 0/192 zsI ==> -115
[ 2608.896558] irq/15-d-733       1d... 2608578454us : dwc3_ep_dequeue: ep5in: req 00000000545565de length 0/192 zsI ==> -115
[ 2608.907603] irq/15-d-733       1d... 2608578464us : dwc3_ep_dequeue: ep5in: req 0000000036de95f5 length 0/192 zsI ==> -115
[ 2608.918650] irq/15-d-733       1d... 2608580113us : dwc3_ep_dequeue: ep5in: req 0000000036de95f5 length 0/192 zsI ==> -115
[ 2608.929694] irq/15-d-733       1d... 2608580124us : dwc3_ep_dequeue: ep5in: req 00000000545565de length 0/192 zsI ==> -115
[ 2608.940739] irq/15-d-733       1d... 2608582968us : dwc3_ep_dequeue: ep5in: req 00000000aa8c59ad length 0/192 zsI ==> -115
[ 2608.951787] irq/15-d-733       1d... 2608582976us : dwc3_ep_dequeue: ep5in: req 00000000a40fdf40 length 0/192 zsI ==> -115
[ 2608.962832] irq/15-d-733       1d... 2608590151us : dwc3_ep_dequeue: ep4out: req 00000000545565de length 0/256 zsI ==> -115
[ 2608.973963] irq/15-d-733       1d... 2608590164us : dwc3_ep_dequeue: ep4out: req 0000000036de95f5 length 0/256 zsI ==> -115
[ 2608.985074] ---------------------------------
[ 2608.989425] CR2: 0000000000000000
[ 2608.992740] ---[ end trace b72f9adf1da68308 ]---



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 12:35       ` Felipe Balbi
@ 2020-10-20 21:01         ` Ferry Toth
  0 siblings, 0 replies; 31+ messages in thread
From: Ferry Toth @ 2020-10-20 21:01 UTC (permalink / raw)
  To: linux-usb
  Cc: felipe.balbi

Op 20-10-2020 om 14:35 schreef Felipe Balbi:
> Ferry Toth <fntoth@gmail.com> writes:
> 
> Hi,
> 
>> Op 19-10-2020 om 09:14 schreef Ferry Toth:
>>> Op 19-10-2020 om 07:45 schreef Felipe Balbi:
>>>>
>>>> Hi Andy,
>>>>
>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>> This occurs with the edison-arduino board, which has a nifty switch for
>>>>> toggling between gadget and host mode. In host mode it boots fine, but it
>>>>> crashes when I flip the switch to gadget.
>>>>>
>>>>> The trace below is what I get from the console when booting with gadget
>>>>> mode selected.
>>>>>
>>>>> The last kernel I used where everything was obviously working fine is
>>>>> 5.6.0.
>>>>>
>>>>> The kernel is built specifically for the platform, with nothing suspicious
>>>>> going on in the dwc3 area; see
>>>>> https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>>>>>
>>>>> Magic signature found
>>>>>
>>>>> Starting kernel ...
>>>>>
>>>>> [    2.395631] Initramfs unpacking failed: invalid magic at start of
>>>>> compressed archive
>>>>> Scanning for Btrfs filesystems
>>>>> Starting version 243.2+
>>>>> Kernel with acpi enabled detected
>>>>> Loading acpi tables
>>>>> Waiting for root device /dev/mmcblk0p8
>>>>>      10Found device '/run/media/mmcblk0p8'
>>>>>      9Init found, booting...
>>>>> [   10.834272] brcmfmac: brcmf_fw_alloc_request: using
>>>>> brcm/brcmfmac43340-sdio for chip BCM43340/2
>>>>> [   11.179662] brcmfmac: brcmf_fw_alloc_request: using
>>>>> brcm/brcmfmac43340-sdio for chip BCM43340/2
>>>>> [   11.194223] brcmfmac: brcmf_c_process_clm_blob: no clm_blob available
>>>>> (err=-2), device may have limited channels available
>>>>> [   11.234779] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM43340/2
>>>>> wl0: Oct 23 2017 08:41:23 version 6.10.190.70 (r674464) FWID 01-98d71006
>>>>> [   12.401620] BUG: unable to handle page fault for address:
>>>>> 0000000100000000
>>>>> [   12.408496] #PF: supervisor instruction fetch in kernel mode
>>>>> [   12.414145] #PF: error_code(0x0010) - not-present page
>>>>> [   12.419276] PGD 0 P4D 0
>>>>> [   12.421817] Oops: 0010 [#1] SMP PTI
>>>>> [   12.425307] CPU: 0 PID: 488 Comm: irq/15-dwc3 Not tainted
>>>>> 5.9.0-edison-acpi-standard #1
>>>>> [   12.433297] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>>>> BIOS 542 2015.01.21:18.19.48
>>>>> [   12.442075] RIP: 0010:0x100000000
>>>>> [   12.445382] Code: Bad RIP value.
>>>>> [   12.448605] RSP: 0000:ffff9a95403fbbf8 EFLAGS: 00010046
>>>>> [   12.453827] RAX: 0000000100000000 RBX: ffff8ee8bd32f828 RCX:
>>>>> ffff8ee8bacc4000
>>>>> [   12.460950] RDX: 00000000ffffff94 RSI: ffff8ee8bc01a5a0 RDI:
>>>>> ffff8ee887228700
>>>>> [   12.468075] RBP: ffff8ee8bc01a5a0 R08: 0000000000000046 R09:
>>>>> 0000000000000238
>>>>> [   12.475199] R10: 0000000000000004 R11: ffff8ee8ba8ba248 R12:
>>>>> ffff8ee887228700
>>>>> [   12.482322] R13: ffff8ee8bd32f828 R14: 0000000000000002 R15:
>>>>> ffff8ee8bae93200
>>>>> [   12.489449] FS:  0000000000000000(0000) GS:ffff8ee8be200000(0000)
>>>>> knlGS:0000000000000000
>>>>> [   12.497524] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [   12.503262] CR2: 0000000100000000 CR3: 000000003c5ae000 CR4:
>>>>> 00000000001006f0
>>>>> [   12.510382] Call Trace:
>>>>> [   12.512841]  ? dwc3_gadget_giveback+0xbf/0x120
>>>>> [   12.517286]  ? __dwc3_gadget_ep_disable+0xc5/0x250
>>>>> [   12.522077]  ? dwc3_gadget_ep_disable+0x3d/0xd0
>>>>> [   12.526608]  ? usb_ep_disable+0x1d/0x80
>>>>> [   12.530451]  ? u_audio_stop_capture+0x87/0x9a [u_audio]
>>>>> [   12.535680]  ? afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>>>> [   12.540562]  ? composite_setup+0x20f/0x1b20 [libcomposite]
>>>>> [   12.546053]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>>>>> [   12.552060]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>>>>> [   12.558062]  ? dwc3_ep0_delegate_req+0x24/0x40
>>>>> [   12.562502]  ? dwc3_ep0_interrupt+0x40a/0x9d8
>>>>> [   12.566858]  ? dwc3_thread_interrupt+0x880/0xf70
>>>>> [   12.571475]  ? __schedule+0x3ee/0x640
>>>>> [   12.575143]  ? irq_forced_thread_fn+0x70/0x70
>>>>> [   12.579497]  ? irq_thread_fn+0x1b/0x60
>>>>> [   12.583245]  ? irq_thread+0xd3/0x150
>>>>> [   12.586821]  ? wake_threads_waitq+0x30/0x30
>>>>> [   12.591001]  ? irq_thread_dtor+0x80/0x80
>>>>> [   12.594925]  ? kthread+0xf9/0x130
>>>>> [   12.598238]  ? kthread_park+0x80/0x80
>>>>> [   12.601901]  ? ret_from_fork+0x22/0x30
>>>>> [   12.605644] Modules linked in: spi_pxa2xx_platform dw_dmac usb_f_uac2
>>>>> u_audio usb_f_mass_storage usb_f_eem u_ether usb_f_serial u_serial
>>>>> libcomposite pwm_lpss_pci snd_sof_pci snd_sof_intel_byt pwm_lpss
>>>>> snd_sof_intel_ipc snd_sof_xtensa_dsp intel_mrfld_pwrbtn intel_mrfld_adc
>>>>> snd_sof snd_sof_nocodec snd_soc_acpi spi_pxa2xx_pci brcmfmac brcmutil
>>>>> leds_gpio hci_uart btbcm ti_ads7950 industrialio_triggered_buffer
>>>>> kfifo_buf spidev ledtrig_heartbeat mmc_block extcon_intel_mrfld
>>>>> sdhci_pci cqhci sdhci led_class mmc_core intel_soc_pmic_mrfld btrfs
>>>>> libcrc32c xor zstd_compress zlib_deflate raid6_pq
>>>>> [   12.657416] CR2: 0000000100000000
>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>
>>>> Is this something you can reproduce on your end? Ferry, can you get dwc3
>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>
>>> I will do that tonight. Is flipping on ftrace_dump_on_oops sufficient or
>>> do I need to do more?
>>>
>>> BTW, after posting this I found that dwc3 is not working properly in host
>>> mode either. No oops, but no driver gets loaded when a device is plugged in.
>>>
>> Not sure if this is what you are looking for (otherwise let me know):
>>
>> root@edison:/proc/sys/kernel# echo 1 > ftrace_dump_on_oops
>> ## flip the switch from host to gadget
>> root@edison:/proc/sys/kernel# [  515.866590] BUG: kernel NULL pointer
>> dereference, address: 0000000000000000
>> [  515.873553] #PF: supervisor read access in kernel mode
>> [  515.878682] #PF: error_code(0x0000) - not-present page
>> [  515.883814] PGD 0 P4D 0
>> [  515.886352] Oops: 0000 [#1] SMP PTI
>> [  515.889844] CPU: 0 PID: 490 Comm: irq/15-dwc3 Not tainted
>> 5.9.0-edison-acpi-standard #1
>> [  515.897836] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>> BIOS 542 2015.01.21:18.19.48
>> [  515.906621] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
> 
> what do you get with:
> 
> $ gdb vmlinux
> (gdb) l *(dwc3_gadget_ep_dequeue+0x41)
> 
> ??
> 
Unfortunately I have no debug symbols, but here is the disassembly:
Dump of assembler code for function dwc3_gadget_ep_dequeue:
    0xffffffff8177afe0 <+0>:     push   %r15
    0xffffffff8177afe2 <+2>:     push   %r14
    0xffffffff8177afe4 <+4>:     mov    %rdi,%r14
    0xffffffff8177afe7 <+7>:     push   %r13
    0xffffffff8177afe9 <+9>:     push   %r12
    0xffffffff8177afeb <+11>:    push   %rbp
    0xffffffff8177afec <+12>:    mov    %rsi,%rbp
    0xffffffff8177afef <+15>:    push   %rbx
    0xffffffff8177aff0 <+16>:    mov    0x90(%rdi),%rbx
    0xffffffff8177aff7 <+23>:    nopl   0x0(%rax,%rax,1)
    0xffffffff8177affc <+28>:    lea    0x130(%rbx),%r12
    0xffffffff8177b003 <+35>:    mov    %r12,%rdi
    0xffffffff8177b006 <+38>:    callq  0xffffffff81ba9620 <_raw_spin_lock_irqsave>
# list_for_each_entry(r, &dep->cancelled_list, list) {
    0xffffffff8177b00b <+43>:    mov    0x48(%r14),%rcx
    0xffffffff8177b00f <+47>:    mov    %rax,%r13
    0xffffffff8177b012 <+50>:    lea    0x48(%r14),%rax
    0xffffffff8177b016 <+54>:    lea    -0x60(%rcx),%rdx
    0xffffffff8177b01a <+58>:    cmp    %rcx,%rax
    0xffffffff8177b01d <+61>:    jne    0xffffffff8177b02e <dwc3_gadget_ep_dequeue+78>
    0xffffffff8177b01f <+63>:    jmp    0xffffffff8177b04f <dwc3_gadget_ep_dequeue+111>
# crash
    0xffffffff8177b021 <+65>:    mov    0x60(%rdx),%rcx
    0xffffffff8177b025 <+69>:    lea    -0x60(%rcx),%rdx
    0xffffffff8177b029 <+73>:    cmp    %rcx,%rax
    0xffffffff8177b02c <+76>:    je     0xffffffff8177b04f <dwc3_gadget_ep_dequeue+111>

    0xffffffff8177b02e <+78>:    cmp    %rdx,%rbp
    0xffffffff8177b031 <+81>:    jne    0xffffffff8177b021 <dwc3_gadget_ep_dequeue+65>
# }
# out:
    0xffffffff8177b033 <+83>:    xor    %r14d,%r14d
    0xffffffff8177b036 <+86>:    mov    %r13,%rsi
    0xffffffff8177b039 <+89>:    mov    %r12,%rdi
    0xffffffff8177b03c <+92>:    callq  0xffffffff81ba9450 <_raw_spin_unlock_irqrestore>
    0xffffffff8177b041 <+97>:    mov    %r14d,%eax
    0xffffffff8177b044 <+100>:   pop    %rbx
    0xffffffff8177b045 <+101>:   pop    %rbp
    0xffffffff8177b046 <+102>:   pop    %r12
    0xffffffff8177b048 <+104>:   pop    %r13
    0xffffffff8177b04a <+106>:   pop    %r14
    0xffffffff8177b04c <+108>:   pop    %r15
    0xffffffff8177b04e <+110>:   retq
# list_for_each_entry(r, &dep->pending_list, list) {



* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 20:37       ` Ferry Toth
@ 2020-10-20 22:10         ` Thinh Nguyen
  2020-10-20 22:58           ` Thinh Nguyen
  0 siblings, 1 reply; 31+ messages in thread
From: Thinh Nguyen @ 2020-10-20 22:10 UTC (permalink / raw)
  To: Ferry Toth, linux-usb; +Cc: felipe.balbi

Hi,

Ferry Toth wrote:
> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>
>> Hi,
>>
>> Ferry Toth <fntoth@gmail.com> writes:
>>
>> 8< snip
>>
>>>>> [   12.657416] CR2: 0000000100000000
>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>
>>>> It this something you can reproduce on your end? Ferry, can you get
>>>> dwc3
>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>> sufficient or
>>> do I need to do more?
>>
>> you'd have to enable dwc3 trace events first ;-)
>>
>>> BTW after posting this I found in host mode dwc3 is not working
>>> properly
>>> either. No oops, but no driver get loaded on device plug in.
>>
>> okay
>>
> Ehem, maybe you only wanted me to enable /dwc3/dwc3_ep_dequeue/enable:
>
> root@edison:/boot# uname -a
> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
> 2020 x86_64 x86_64 x86_64 GNU/Linux
> root@edison:/boot# echo 1 >
> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
> root@edison:/boot#
> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
> dereference, address: 0000000000000000
> [ 2608.592288] #PF: supervisor read access in kernel mode
> [ 2608.597419] #PF: error_code(0x0000) - not-present page
> [ 2608.602549] PGD 0 P4D 0
> [ 2608.605090] Oops: 0000 [#1] SMP PTI
> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
> 5.9.0-edison-acpi-standard #1
> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
> BIOS 542 2015.01.21:18.19.48
> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
> f6 4c
> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
> 0000000000000000
> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
> ffff9a5f46f48158
> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
> ffff9a5f7dd6a000
> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
> ffff9a5f46f48158
> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
> ffff9a5f7ad77e00
> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
> knlGS:0000000000000000
> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
> 00000000001006e0
> [ 2608.711102] Call Trace:
> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]

I took a look at how the audio function is handling switching alternate
setting and dequeuing endpoints, and I think I found the issue.

Here's a snippet of the free_ep() code in u_audio.c:

static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
{
    .....
        for (i = 0; i < params->req_number; i++) {
                if (prm->ureq[i].req) {
                        usb_ep_dequeue(ep, prm->ureq[i].req);
                        usb_ep_free_request(ep, prm->ureq[i].req);
                        prm->ureq[i].req = NULL;
                }
        }
  ....


usb_ep_dequeue() can be asynchronous. The dwc3 driver still owns the
request until it gives it back. Freeing the request immediately here
therefore leaves dwc3 holding a pointer to freed memory.

BR,
Thinh

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 22:10         ` Thinh Nguyen
@ 2020-10-20 22:58           ` Thinh Nguyen
  2020-10-21  1:47             ` Jack Pham
  2020-10-21 19:45             ` Ferry Toth
  0 siblings, 2 replies; 31+ messages in thread
From: Thinh Nguyen @ 2020-10-20 22:58 UTC (permalink / raw)
  To: Ferry Toth, linux-usb; +Cc: felipe.balbi

Thinh Nguyen wrote:
> Hi,
>
> Ferry Toth wrote:
>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>> Hi,
>>>
>>> Ferry Toth <fntoth@gmail.com> writes:
>>>
>>> 8< snip
>>>
>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>> It this something you can reproduce on your end? Ferry, can you get
>>>>> dwc3
>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>> sufficient or
>>>> do I need to do more?
>>> you'd have to enable dwc3 trace events first ;-)
>>>
>>>> BTW after posting this I found in host mode dwc3 is not working
>>>> properly
>>>> either. No oops, but no driver get loaded on device plug in.
>>> okay
>>>
>> Ehem, you maybe only me to enable /dwc3/dwc3_ep_dequeue/enable:
>>
>> root@edison:/boot# uname -a
>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>> root@edison:/boot# echo 1 >
>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>> root@edison:/boot#
>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>> dereference, address: 0000000000000000
>> [ 2608.592288] #PF: supervisor read access in kernel mode
>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>> [ 2608.602549] PGD 0 P4D 0
>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>> 5.9.0-edison-acpi-standard #1
>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>> BIOS 542 2015.01.21:18.19.48
>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
>> f6 4c
>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>> 0000000000000000
>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>> ffff9a5f46f48158
>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>> ffff9a5f7dd6a000
>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>> ffff9a5f46f48158
>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>> ffff9a5f7ad77e00
>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>> knlGS:0000000000000000
>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>> 00000000001006e0
>> [ 2608.711102] Call Trace:
>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
> I took a look at how the audio function is handling switching alternate
> setting and dequeuing endpoints, and I think I found the issue.
>
> Here's a snippet of the free_ep() code in u_audio.c:
>
> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
> {
>     .....
>         for (i = 0; i < params->req_number; i++) {
>                 if (prm->ureq[i].req) {
>                         usb_ep_dequeue(ep, prm->ureq[i].req);
>                         usb_ep_free_request(ep, prm->ureq[i].req);
>                         prm->ureq[i].req = NULL;
>                 }
>         }
>   ....
>
>
> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
> the request until it gives back the request. Freeing the request
> immediately here will cause a problem.

To confirm my suspicion, can you try this and see if you still get oops?

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index eec8e9a9e3ed..b66eb24ec070 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
                        list_for_each_entry_safe(r, t, &dep->started_list, list)
                                dwc3_gadget_move_cancelled_request(r);
 
+                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
                        goto out;
                }
        }


This will make usb_ep_dequeue() synchronous. (Note that this is not tested).

BR,
Thinh

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 22:58           ` Thinh Nguyen
@ 2020-10-21  1:47             ` Jack Pham
  2020-10-21  1:56               ` Thinh Nguyen
  2020-10-22  9:23               ` Andy Shevchenko
  2020-10-21 19:45             ` Ferry Toth
  1 sibling, 2 replies; 31+ messages in thread
From: Jack Pham @ 2020-10-21  1:47 UTC (permalink / raw)
  To: Thinh Nguyen; +Cc: Ferry Toth, linux-usb, felipe.balbi

Hi Thinh, Ferry,

On Tue, Oct 20, 2020 at 10:58:31PM +0000, Thinh Nguyen wrote:
> Thinh Nguyen wrote:
> > Hi,
> >
> > Ferry Toth wrote:
> >> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
> >>> Hi,
> >>>
> >>> Ferry Toth <fntoth@gmail.com> writes:
> >>>
> >>> 8< snip
> >>>
> >>>>>> [   12.657416] CR2: 0000000100000000
> >>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
> >>>>> It this something you can reproduce on your end? Ferry, can you get
> >>>>> dwc3
> >>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
> >>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
> >>>> sufficient or
> >>>> do I need to do more?
> >>> you'd have to enable dwc3 trace events first ;-)
> >>>
> >>>> BTW after posting this I found in host mode dwc3 is not working
> >>>> properly
> >>>> either. No oops, but no driver get loaded on device plug in.
> >>> okay
> >>>
> >> Ehem, you maybe only me to enable /dwc3/dwc3_ep_dequeue/enable:
> >>
> >> root@edison:/boot# uname -a
> >> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
> >> 2020 x86_64 x86_64 x86_64 GNU/Linux
> >> root@edison:/boot# echo 1 >
> >> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
> >> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
> >> root@edison:/boot#
> >> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
> >> dereference, address: 0000000000000000
> >> [ 2608.592288] #PF: supervisor read access in kernel mode
> >> [ 2608.597419] #PF: error_code(0x0000) - not-present page
> >> [ 2608.602549] PGD 0 P4D 0
> >> [ 2608.605090] Oops: 0000 [#1] SMP PTI
> >> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
> >> 5.9.0-edison-acpi-standard #1
> >> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
> >> BIOS 542 2015.01.21:18.19.48
> >> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
> >> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
> >> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
> >> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
> >> f6 4c
> >> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
> >> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
> >> 0000000000000000
> >> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
> >> ffff9a5f46f48158
> >> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
> >> ffff9a5f7dd6a000
> >> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
> >> ffff9a5f46f48158
> >> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
> >> ffff9a5f7ad77e00
> >> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
> >> knlGS:0000000000000000
> >> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
> >> 00000000001006e0
> >> [ 2608.711102] Call Trace:
> >> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
> >> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
> >> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
> > I took a look at how the audio function is handling switching alternate
> > setting and dequeuing endpoints, and I think I found the issue.
> >
> > Here's a snippet of the free_ep() code in u_audio.c:
> >
> > static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
> > {
> >     .....
> >         for (i = 0; i < params->req_number; i++) {
> >                 if (prm->ureq[i].req) {
> >                         usb_ep_dequeue(ep, prm->ureq[i].req);
> >                         usb_ep_free_request(ep, prm->ureq[i].req);
> >                         prm->ureq[i].req = NULL;
> >                 }
> >         }
> >   ....
> >
> >
> > usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
> > the request until it gives back the request. Freeing the request
> > immediately here will cause a problem.
> 
> To confirm my suspicion, can you try this and see if you still get oops?
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index eec8e9a9e3ed..b66eb24ec070 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>                         list_for_each_entry_safe(r, t,
> &dep->started_list, list)
>                                 dwc3_gadget_move_cancelled_request(r);
>  
> +                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>                         goto out;
>                 }
>         }
> 
> 
> This will make usb_ep_dequeue() synchronous. (Note that this is not tested).

But only for dwc3, right? In general, do other UDC drivers provide
synchronous behavior? The kerneldoc for usb_ep_dequeue() states clearly
that the completion is asynchronous. From
drivers/usb/gadget/udc/core.c:

 * If the request is still active on the endpoint, it is dequeued and
 * eventually its completion routine is called (with status -ECONNRESET);
 * else a negative error code is returned.  This routine is asynchronous,
 * that is, it may return before the completion routine runs.

Alternatively, could we not fix up u_audio.c to deal with this?

diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c
index 56906d15fb55..f08f036d520e 100644
--- a/drivers/usb/gadget/function/u_audio.c
+++ b/drivers/usb/gadget/function/u_audio.c
@@ -89,7 +89,12 @@ static void u_audio_iso_complete(struct usb_ep *ep, struct usb_request *req)
	struct snd_uac_chip *uac = prm->uac;

	/* i/f shutting down */
-	if (!prm->ep_enabled || req->status == -ESHUTDOWN)
+	if (!prm->ep_enabled) {
+		usb_ep_free_request(ep, req);
+		return;
+	}
+
+	if (req->status == -ESHUTDOWN)
		return;

	/*
@@ -352,7 +357,6 @@ static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
	for (i = 0; i < params->req_number; i++) {
		if (prm->ureq[i].req) {
			usb_ep_dequeue(ep, prm->ureq[i].req);
-			usb_ep_free_request(ep, prm->ureq[i].req);
			prm->ureq[i].req = NULL;
		}
	}

Jack

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21  1:47             ` Jack Pham
@ 2020-10-21  1:56               ` Thinh Nguyen
  2020-10-21 20:01                 ` Ferry Toth
  2020-10-22  9:23               ` Andy Shevchenko
  1 sibling, 1 reply; 31+ messages in thread
From: Thinh Nguyen @ 2020-10-21  1:56 UTC (permalink / raw)
  To: Jack Pham, Thinh Nguyen
  Cc: Ferry Toth, linux-usb, felipe.balbi

Jack Pham wrote:
> Hi Thinh, Ferry,
>
> On Tue, Oct 20, 2020 at 10:58:31PM +0000, Thinh Nguyen wrote:
>> Thinh Nguyen wrote:
>>> Hi,
>>>
>>> Ferry Toth wrote:
>>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>>> Hi,
>>>>>
>>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>>
>>>>> 8< snip
>>>>>
>>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>>> It this something you can reproduce on your end? Ferry, can you get
>>>>>>> dwc3
>>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>>> sufficient or
>>>>>> do I need to do more?
>>>>> you'd have to enable dwc3 trace events first ;-)
>>>>>
>>>>>> BTW after posting this I found in host mode dwc3 is not working
>>>>>> properly
>>>>>> either. No oops, but no driver get loaded on device plug in.
>>>>> okay
>>>>>
>>>> Ehem, you maybe only me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>>
>>>> root@edison:/boot# uname -a
>>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
>>>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>>>> root@edison:/boot# echo 1 >
>>>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>>> root@edison:/boot#
>>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>>>> dereference, address: 0000000000000000
>>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>>> [ 2608.602549] PGD 0 P4D 0
>>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>>>> 5.9.0-edison-acpi-standard #1
>>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>>> BIOS 542 2015.01.21:18.19.48
>>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>>>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>>>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
>>>> f6 4c
>>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>>>> 0000000000000000
>>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>>>> ffff9a5f46f48158
>>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>>>> ffff9a5f7dd6a000
>>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>>>> ffff9a5f46f48158
>>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>>>> ffff9a5f7ad77e00
>>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>>>> knlGS:0000000000000000
>>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>>>> 00000000001006e0
>>>> [ 2608.711102] Call Trace:
>>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>> I took a look at how the audio function is handling switching alternate
>>> setting and dequeuing endpoints, and I think I found the issue.
>>>
>>> Here's a snippet of the free_ep() code in u_audio.c:
>>>
>>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>>> {
>>>     .....
>>>         for (i = 0; i < params->req_number; i++) {
>>>                 if (prm->ureq[i].req) {
>>>                         usb_ep_dequeue(ep, prm->ureq[i].req);
>>>                         usb_ep_free_request(ep, prm->ureq[i].req);
>>>                         prm->ureq[i].req = NULL;
>>>                 }
>>>         }
>>>   ....
>>>
>>>
>>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>>> the request until it gives back the request. Freeing the request
>>> immediately here will cause a problem.
>> To confirm my suspicion, can you try this and see if you still get oops?
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index eec8e9a9e3ed..b66eb24ec070 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>>                         list_for_each_entry_safe(r, t,
>> &dep->started_list, list)
>>                                 dwc3_gadget_move_cancelled_request(r);
>>  
>> +                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>>                         goto out;
>>                 }
>>         }
>>
>>
>> This will make usb_ep_dequeue() synchronous. (Note that this is not tested).
> But only for dwc3 right? In general do other UDC drivers provide
> synchronous behavior? It does states clearly in the kerneldoc for
> usb_ep_dequeue() that the completion is asynchronous.  From
> drivers/usb/gadget/udc/core.c:
>
>  * If the request is still active on the endpoint, it is dequeued and
>  * eventually its completion routine is called (with status -ECONNRESET);
>  * else a negative error code is returned.  This routine is asynchronous,
>  * that is, it may return before the completion routine runs.
>
> Alternatively, could we not fix up u_audio.c to deal with this?

The issue is in u_audio.c, but I want to confirm my suspicion that the
oops Ferry is seeing is caused by this (most likely it is).

We don't want to apply this as a workaround and mask the issue in u_audio.c.

>
> diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c
> index 56906d15fb55..f08f036d520e 100644
> --- a/drivers/usb/gadget/function/u_audio.c
> +++ b/drivers/usb/gadget/function/u_audio.c
> @@ -89,7 +89,12 @@ static void u_audio_iso_complete(struct usb_ep *ep, struct usb_request *req)
> 	struct snd_uac_chip *uac = prm->uac;
>
> 	/* i/f shutting down */
> -	if (!prm->ep_enabled || req->status == -ESHUTDOWN)
> +	if (!prm->ep_enabled) {
> +		usb_ep_free_request(ep, req);
> +		return;
> +	}
> +
> +	if (req->status == -ESHUTDOWN)
> 		return;
>
> 	/*
> @@ -352,7 +357,6 @@ static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
> 	for (i = 0; i < params->req_number; i++) {
> 		if (prm->ureq[i].req) {
> 			usb_ep_dequeue(ep, prm->ureq[i].req);
> -			usb_ep_free_request(ep, prm->ureq[i].req);
> 			prm->ureq[i].req = NULL;
> 		}
> 	}
>
> Jack

Yes, the u_audio.c needs to do something like that.

BR,
Thinh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-20 22:58           ` Thinh Nguyen
  2020-10-21  1:47             ` Jack Pham
@ 2020-10-21 19:45             ` Ferry Toth
  2020-10-21 19:50               ` Thinh Nguyen
  1 sibling, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-21 19:45 UTC (permalink / raw)
  To: linux-usb
  Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA-XMD5yJDbdMReXY1tMh2IBg,
	Heikki Krogerus, Andy Shevchenko

Op 21-10-2020 om 00:58 schreef Thinh Nguyen:
> Thinh Nguyen wrote:
>> Hi,
>>
>> Ferry Toth wrote:
>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>> Hi,
>>>>
>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>
>>>> 8< snip
>>>>
>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>> Is this something you can reproduce on your end? Ferry, can you get
>>>>>> dwc3
>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>> sufficient or
>>>>> do I need to do more?
>>>> you'd have to enable dwc3 trace events first ;-)
>>>>
>>>>> BTW after posting this I found in host mode dwc3 is not working
>>>>> properly
>>>>> either. No oops, but no driver gets loaded on device plug in.
>>>> okay
>>>>
>>> Ahem, maybe you only need me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>
>>> root@edison:/boot# uname -a
>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
>>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>>> root@edison:/boot# echo 1 >
>>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>> root@edison:/boot#
>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>>> dereference, address: 0000000000000000
>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>> [ 2608.602549] PGD 0 P4D 0
>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>>> 5.9.0-edison-acpi-standard #1
>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>> BIOS 542 2015.01.21:18.19.48
>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
>>> f6 4c
>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>>> 0000000000000000
>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>>> ffff9a5f46f48158
>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>>> ffff9a5f7dd6a000
>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>>> ffff9a5f46f48158
>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>>> ffff9a5f7ad77e00
>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>>> knlGS:0000000000000000
>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>>> 00000000001006e0
>>> [ 2608.711102] Call Trace:
>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>> I took a look at how the audio function is handling switching alternate
>> setting and dequeuing endpoints, and I think I found the issue.
>>
>> Here's a snippet of the free_ep() code in u_audio.c:
>>
>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>> {
>>      .....
>>          for (i = 0; i < params->req_number; i++) {
>>                  if (prm->ureq[i].req) {
>>                          usb_ep_dequeue(ep, prm->ureq[i].req);
>>                          usb_ep_free_request(ep, prm->ureq[i].req);
>>                          prm->ureq[i].req = NULL;
>>                  }
>>          }
>>    ....
>>
>>
>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>> the request until it gives back the request. Freeing the request
>> immediately here will cause a problem.
> 
> To confirm my suspicion, can you try this and see if you still get oops?
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index eec8e9a9e3ed..b66eb24ec070 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>                          list_for_each_entry_safe(r, t,
> &dep->started_list, list)
>                                  dwc3_gadget_move_cancelled_request(r);
>   
> +                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>                          goto out;
>                  }
>          }
> 
> 
> This will make usb_ep_dequeue() synchronous. (Note that this is not tested).

Unfortunately, it doesn't work. The trace changes to:
root@edison:~# [  104.418264] BUG: kernel NULL pointer dereference, 
address: 0000000000000000
[  104.425227] #PF: supervisor instruction fetch in kernel mode
[  104.430877] #PF: error_code(0x0010) - not-present page
[  104.436007] PGD 0 P4D 0
[  104.438547] Oops: 0010 [#1] SMP PTI
[  104.442039] CPU: 1 PID: 605 Comm: irq/15-dwc3 Not tainted 
5.9.0-edison-acpi-standard #1
[  104.450027] Hardware name: Intel Corporation Merrifield/BODEGA BAY, 
BIOS 542 2015.01.21:18.19.48
[  104.458802] RIP: 0010:0x0
[  104.461425] Code: Bad RIP value.
[  104.464649] RSP: 0018:ffffae584034fbf8 EFLAGS: 00010046
[  104.469870] RAX: 0000000000000000 RBX: ffff8c198608a028 RCX: 
ffff8c19bb87fa00
[  104.476993] RDX: 00000000ffffff94 RSI: ffff8c19bafa54e0 RDI: 
ffff8c198609ee00
[  104.484118] RBP: ffff8c19bafa54e0 R08: 0000000000000046 R09: 
0000000000000238
[  104.491241] R10: 000000000000002c R11: ffff8c19bcf62490 R12: 
ffff8c198609ee00
[  104.498366] R13: ffff8c198608a028 R14: 0000000000000002 R15: 
ffff8c19bb8ff000
[  104.505493] FS:  0000000000000000(0000) GS:ffff8c19be300000(0000) 
knlGS:0000000000000000
[  104.513572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.519309] CR2: ffffffffffffffd6 CR3: 000000002e80a000 CR4: 
00000000001006e0
[  104.526432] Call Trace:
[  104.528892]  dwc3_gadget_giveback+0xbf/0x120
[  104.533169]  __dwc3_gadget_ep_disable+0xc5/0x250
[  104.537789]  dwc3_gadget_ep_disable+0x3d/0xd0
[  104.542149]  usb_ep_disable+0x1d/0x80
[  104.545823]  u_audio_stop_capture+0x87/0x9a [u_audio]
[  104.550880]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
[  104.555594]  composite_setup+0x20f/0x1b20 [libcomposite]
[  104.560912]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
[  104.566921]  configfs_composite_setup+0x6b/0x90 [libcomposite]
[  104.572752]  dwc3_ep0_delegate_req+0x24/0x40
[  104.577022]  dwc3_ep0_interrupt+0x40a/0x9d8
[  104.581205]  dwc3_thread_interrupt+0x880/0xf70

> BR,
> Thinh
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21 19:45             ` Ferry Toth
@ 2020-10-21 19:50               ` Thinh Nguyen
  2020-10-21 20:42                 ` Ferry Toth
  0 siblings, 1 reply; 31+ messages in thread
From: Thinh Nguyen @ 2020-10-21 19:50 UTC (permalink / raw)
  To: Ferry Toth, linux-usb
  Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA-XMD5yJDbdMReXY1tMh2IBg,
	Heikki Krogerus, Andy Shevchenko

Ferry Toth wrote:
> Op 21-10-2020 om 00:58 schreef Thinh Nguyen:
>> Thinh Nguyen wrote:
>>> Hi,
>>>
>>> Ferry Toth wrote:
>>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>>> Hi,
>>>>>
>>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>>
>>>>> 8< snip
>>>>>
>>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>>> Is this something you can reproduce on your end? Ferry, can you get
>>>>>>> dwc3
>>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>>> sufficient or
>>>>>> do I need to do more?
>>>>> you'd have to enable dwc3 trace events first ;-)
>>>>>
>>>>>> BTW after posting this I found in host mode dwc3 is not working
>>>>>> properly
>>>>>> either. No oops, but no driver gets loaded on device plug in.
>>>>> okay
>>>>>
>>>> Ahem, maybe you only need me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>>
>>>> root@edison:/boot# uname -a
>>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
>>>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>>>> root@edison:/boot# echo 1 >
>>>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>>> root@edison:/boot#
>>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>>>> dereference, address: 0000000000000000
>>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>>> [ 2608.602549] PGD 0 P4D 0
>>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>>>> 5.9.0-edison-acpi-standard #1
>>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>>> BIOS 542 2015.01.21:18.19.48
>>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>>>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>>>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
>>>> f6 4c
>>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>>>> 0000000000000000
>>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>>>> ffff9a5f46f48158
>>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>>>> ffff9a5f7dd6a000
>>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>>>> ffff9a5f46f48158
>>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>>>> ffff9a5f7ad77e00
>>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>>>> knlGS:0000000000000000
>>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>>>> 00000000001006e0
>>>> [ 2608.711102] Call Trace:
>>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>> I took a look at how the audio function is handling switching alternate
>>> setting and dequeuing endpoints, and I think I found the issue.
>>>
>>> Here's a snippet of the free_ep() code in u_audio.c:
>>>
>>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>>> {
>>>      .....
>>>          for (i = 0; i < params->req_number; i++) {
>>>                  if (prm->ureq[i].req) {
>>>                          usb_ep_dequeue(ep, prm->ureq[i].req);
>>>                          usb_ep_free_request(ep, prm->ureq[i].req);
>>>                          prm->ureq[i].req = NULL;
>>>                  }
>>>          }
>>>    ....
>>>
>>>
>>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>>> the request until it gives back the request. Freeing the request
>>> immediately here will cause a problem.
>>
>> To confirm my suspicion, can you try this and see if you still get oops?
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index eec8e9a9e3ed..b66eb24ec070 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>>                          list_for_each_entry_safe(r, t,
>> &dep->started_list, list)
>>                                  dwc3_gadget_move_cancelled_request(r);
>>   
>> +                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>>                          goto out;
>>                  }
>>          }
>>
>>
>> This will make usb_ep_dequeue() synchronous. (Note that this is not
>> tested).
>
> Unfortunately, it doesn't work. The trace changes to:
> root@edison:~# [  104.418264] BUG: kernel NULL pointer dereference,
> address: 0000000000000000
> [  104.425227] #PF: supervisor instruction fetch in kernel mode
> [  104.430877] #PF: error_code(0x0010) - not-present page
> [  104.436007] PGD 0 P4D 0
> [  104.438547] Oops: 0010 [#1] SMP PTI
> [  104.442039] CPU: 1 PID: 605 Comm: irq/15-dwc3 Not tainted
> 5.9.0-edison-acpi-standard #1
> [  104.450027] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
> BIOS 542 2015.01.21:18.19.48
> [  104.458802] RIP: 0010:0x0
> [  104.461425] Code: Bad RIP value.
> [  104.464649] RSP: 0018:ffffae584034fbf8 EFLAGS: 00010046
> [  104.469870] RAX: 0000000000000000 RBX: ffff8c198608a028 RCX:
> ffff8c19bb87fa00
> [  104.476993] RDX: 00000000ffffff94 RSI: ffff8c19bafa54e0 RDI:
> ffff8c198609ee00
> [  104.484118] RBP: ffff8c19bafa54e0 R08: 0000000000000046 R09:
> 0000000000000238
> [  104.491241] R10: 000000000000002c R11: ffff8c19bcf62490 R12:
> ffff8c198609ee00
> [  104.498366] R13: ffff8c198608a028 R14: 0000000000000002 R15:
> ffff8c19bb8ff000
> [  104.505493] FS:  0000000000000000(0000) GS:ffff8c19be300000(0000)
> knlGS:0000000000000000
> [  104.513572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  104.519309] CR2: ffffffffffffffd6 CR3: 000000002e80a000 CR4:
> 00000000001006e0
> [  104.526432] Call Trace:
> [  104.528892]  dwc3_gadget_giveback+0xbf/0x120
> [  104.533169]  __dwc3_gadget_ep_disable+0xc5/0x250
> [  104.537789]  dwc3_gadget_ep_disable+0x3d/0xd0
> [  104.542149]  usb_ep_disable+0x1d/0x80
> [  104.545823]  u_audio_stop_capture+0x87/0x9a [u_audio]
> [  104.550880]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
> [  104.555594]  composite_setup+0x20f/0x1b20 [libcomposite]
> [  104.560912]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
> [  104.566921]  configfs_composite_setup+0x6b/0x90 [libcomposite]
> [  104.572752]  dwc3_ep0_delegate_req+0x24/0x40
> [  104.577022]  dwc3_ep0_interrupt+0x40a/0x9d8
> [  104.581205]  dwc3_thread_interrupt+0x880/0xf70
>

Oops, it looks like I can't make it synchronous this way. Can you try
Jack's change to u_audio.c instead?

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21  1:56               ` Thinh Nguyen
@ 2020-10-21 20:01                 ` Ferry Toth
  0 siblings, 0 replies; 31+ messages in thread
From: Ferry Toth @ 2020-10-21 20:01 UTC (permalink / raw)
  To: linux-usb
  Cc: Ferry Toth, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	felipe.balbi-VuQAYsv1563Yd54FQh9/CA-XMD5yJDbdMReXY1tMh2IBg,
	Heikki Krogerus, Andy Shevchenko

Op 21-10-2020 om 03:56 schreef Thinh Nguyen:
> Jack Pham wrote:
>> Hi Thinh, Ferry,
>>
>> On Tue, Oct 20, 2020 at 10:58:31PM +0000, Thinh Nguyen wrote:
>>> Thinh Nguyen wrote:
>>>> Hi,
>>>>
>>>> Ferry Toth wrote:
>>>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>>>> Hi,
>>>>>>
>>>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>>>
>>>>>> 8< snip
>>>>>>
>>>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>>>> Is this something you can reproduce on your end? Ferry, can you get
>>>>>>>> dwc3
>>>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>>>> sufficient or
>>>>>>> do I need to do more?
>>>>>> you'd have to enable dwc3 trace events first ;-)
>>>>>>
>>>>>>> BTW after posting this I found in host mode dwc3 is not working
>>>>>>> properly
>>>>>>> either. No oops, but no driver gets loaded on device plug in.
>>>>>> okay
>>>>>>
>>>>> Ahem, maybe you only need me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>>>
>>>>> root@edison:/boot# uname -a
>>>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
>>>>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>>>>> root@edison:/boot# echo 1 >
>>>>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>>>> root@edison:/boot#
>>>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>>>>> dereference, address: 0000000000000000
>>>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>>>> [ 2608.602549] PGD 0 P4D 0
>>>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>>>>> 5.9.0-edison-acpi-standard #1
>>>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>>>> BIOS 542 2015.01.21:18.19.48
>>>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>>>>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>>>>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
>>>>> f6 4c
>>>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>>>>> 0000000000000000
>>>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>>>>> ffff9a5f46f48158
>>>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>>>>> ffff9a5f7dd6a000
>>>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>>>>> ffff9a5f46f48158
>>>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>>>>> ffff9a5f7ad77e00
>>>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>>>>> 00000000001006e0
>>>>> [ 2608.711102] Call Trace:
>>>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>>> I took a look at how the audio function is handling switching alternate
>>>> setting and dequeuing endpoints, and I think I found the issue.
>>>>
>>>> Here's a snippet of the free_ep() code in u_audio.c:
>>>>
>>>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>>>> {
>>>>      .....
>>>>          for (i = 0; i < params->req_number; i++) {
>>>>                  if (prm->ureq[i].req) {
>>>>                          usb_ep_dequeue(ep, prm->ureq[i].req);
>>>>                          usb_ep_free_request(ep, prm->ureq[i].req);
>>>>                          prm->ureq[i].req = NULL;
>>>>                  }
>>>>          }
>>>>    ....
>>>>
>>>>
>>>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>>>> the request until it gives back the request. Freeing the request
>>>> immediately here will cause a problem.
>>> To confirm my suspicion, can you try this and see if you still get oops?
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index eec8e9a9e3ed..b66eb24ec070 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>>>                          list_for_each_entry_safe(r, t,
>>> &dep->started_list, list)
>>>                                  dwc3_gadget_move_cancelled_request(r);
>>>   
>>> +                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>>>                          goto out;
>>>                  }
>>>          }
>>>
>>>
>>> This will make usb_ep_dequeue() synchronous. (Note that this is not tested).
>> But only for dwc3 right? In general do other UDC drivers provide
>> synchronous behavior? It does state clearly in the kerneldoc for
>> usb_ep_dequeue() that the completion is asynchronous.  From
>> drivers/usb/gadget/udc/core.c:
>>
>>   * If the request is still active on the endpoint, it is dequeued and
>>   * eventually its completion routine is called (with status -ECONNRESET);
>>   * else a negative error code is returned.  This routine is asynchronous,
>>   * that is, it may return before the completion routine runs.
>>
>> Alternatively, could we not fix up u_audio.c to deal with this?
> 
> The issue is in u_audio.c. But I want to confirm my suspicion about
> whether the oops Ferry is seeing is due to this (most likely it is).

When I disable uac2 in my config, this oops goes away.
But I also have eem (Ethernet), which appears and then drops away again.
So it seems more things are wrong.

> We don't want to apply this as a workaround and mask the issue in u_audio.c.
> 
>>
>> diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c
>> index 56906d15fb55..f08f036d520e 100644
>> --- a/drivers/usb/gadget/function/u_audio.c
>> +++ b/drivers/usb/gadget/function/u_audio.c
>> @@ -89,7 +89,12 @@ static void u_audio_iso_complete(struct usb_ep *ep, struct usb_request *req)
>> 	struct snd_uac_chip *uac = prm->uac;
>>
>> 	/* i/f shutting down */
>> -	if (!prm->ep_enabled || req->status == -ESHUTDOWN)
>> +	if (!prm->ep_enabled) {
>> +		usb_ep_free_request(ep, req);
>> +		return;
>> +	}
>> +
>> +	if (req->status == -ESHUTDOWN)
>> 		return;
>>
>> 	/*
>> @@ -352,7 +357,6 @@ static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>> 	for (i = 0; i < params->req_number; i++) {
>> 		if (prm->ureq[i].req) {
>> 			usb_ep_dequeue(ep, prm->ureq[i].req);
>> -			usb_ep_free_request(ep, prm->ureq[i].req);
>> 			prm->ureq[i].req = NULL;
>> 		}
>> 	}
>>
>> Jack
> 
> Yes, the u_audio.c needs to do something like that.
> 
> BR,
> Thinh
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21 19:50               ` Thinh Nguyen
@ 2020-10-21 20:42                 ` Ferry Toth
  2020-10-21 23:32                   ` Thinh Nguyen
  0 siblings, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-21 20:42 UTC (permalink / raw)
  To: linux-usb
  Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA-XMD5yJDbdMReXY1tMh2IBg-XMD5yJDbdMReXY1tMh2IBg,
	Heikki Krogerus, Andy Shevchenko

Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
> Ferry Toth wrote:
>> Op 21-10-2020 om 00:58 schreef Thinh Nguyen:
>>> Thinh Nguyen wrote:
>>>> Hi,
>>>>
>>>> Ferry Toth wrote:
>>>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>>>> Hi,
>>>>>>
>>>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>>>
>>>>>> 8< snip
>>>>>>
>>>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>>>> Is this something you can reproduce on your end? Ferry, can you get
>>>>>>>> dwc3
>>>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>>>> sufficient or
>>>>>>> do I need to do more?
>>>>>> you'd have to enable dwc3 trace events first ;-)
>>>>>>
>>>>>>> BTW after posting this I found in host mode dwc3 is not working
>>>>>>> properly
>>>>>>> either. No oops, but no driver gets loaded on device plug in.
>>>>>> okay
>>>>>>
>>>>> Ahem, maybe you only need me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>>>
>>>>> root@edison:/boot# uname -a
>>>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC
>>>>> 2020 x86_64 x86_64 x86_64 GNU/Linux
>>>>> root@edison:/boot# echo 1 >
>>>>> /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>>>> root@edison:/boot#
>>>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer
>>>>> dereference, address: 0000000000000000
>>>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>>>> [ 2608.602549] PGD 0 P4D 0
>>>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted
>>>>> 5.9.0-edison-acpi-standard #1
>>>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>>>>> BIOS 542 2015.01.21:18.19.48
>>>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8
>>>>> 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75
>>>>> 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31
>>>>> f6 4c
>>>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX:
>>>>> 0000000000000000
>>>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI:
>>>>> ffff9a5f46f48158
>>>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09:
>>>>> ffff9a5f7dd6a000
>>>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12:
>>>>> ffff9a5f46f48158
>>>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15:
>>>>> ffff9a5f7ad77e00
>>>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4:
>>>>> 00000000001006e0
>>>>> [ 2608.711102] Call Trace:
>>>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>>> I took a look at how the audio function is handling switching alternate
>>>> setting and dequeuing endpoints, and I think I found the issue.
>>>>
>>>> Here's a snippet of the free_ep() code in u_audio.c:
>>>>
>>>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>>>> {
>>>>       .....
>>>>           for (i = 0; i < params->req_number; i++) {
>>>>                   if (prm->ureq[i].req) {
>>>>                           usb_ep_dequeue(ep, prm->ureq[i].req);
>>>>                           usb_ep_free_request(ep, prm->ureq[i].req);
>>>>                           prm->ureq[i].req = NULL;
>>>>                   }
>>>>           }
>>>>     ....
>>>>
>>>>
>>>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>>>> the request until it gives back the request. Freeing the request
>>>> immediately here will cause a problem.
>>>
>>> To confirm my suspicion, can you try this and see if you still get oops?
>>>
>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>> index eec8e9a9e3ed..b66eb24ec070 100644
>>> --- a/drivers/usb/dwc3/gadget.c
>>> +++ b/drivers/usb/dwc3/gadget.c
>>> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>>>                           list_for_each_entry_safe(r, t,
>>> &dep->started_list, list)
>>>                                   dwc3_gadget_move_cancelled_request(r);
>>>    
>>> +                        dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>>>                           goto out;
>>>                   }
>>>           }
>>>
>>>
>>> This will make usb_ep_dequeue() synchronous. (Note that this is not
>>> tested).
>>
>> Unfortunately, it doesn't work. The trace changes to:
>> root@edison:~# [  104.418264] BUG: kernel NULL pointer dereference,
>> address: 0000000000000000
>> [  104.425227] #PF: supervisor instruction fetch in kernel mode
>> [  104.430877] #PF: error_code(0x0010) - not-present page
>> [  104.436007] PGD 0 P4D 0
>> [  104.438547] Oops: 0010 [#1] SMP PTI
>> [  104.442039] CPU: 1 PID: 605 Comm: irq/15-dwc3 Not tainted
>> 5.9.0-edison-acpi-standard #1
>> [  104.450027] Hardware name: Intel Corporation Merrifield/BODEGA BAY,
>> BIOS 542 2015.01.21:18.19.48
>> [  104.458802] RIP: 0010:0x0
>> [  104.461425] Code: Bad RIP value.
>> [  104.464649] RSP: 0018:ffffae584034fbf8 EFLAGS: 00010046
>> [  104.469870] RAX: 0000000000000000 RBX: ffff8c198608a028 RCX:
>> ffff8c19bb87fa00
>> [  104.476993] RDX: 00000000ffffff94 RSI: ffff8c19bafa54e0 RDI:
>> ffff8c198609ee00
>> [  104.484118] RBP: ffff8c19bafa54e0 R08: 0000000000000046 R09:
>> 0000000000000238
>> [  104.491241] R10: 000000000000002c R11: ffff8c19bcf62490 R12:
>> ffff8c198609ee00
>> [  104.498366] R13: ffff8c198608a028 R14: 0000000000000002 R15:
>> ffff8c19bb8ff000
>> [  104.505493] FS:  0000000000000000(0000) GS:ffff8c19be300000(0000)
>> knlGS:0000000000000000
>> [  104.513572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  104.519309] CR2: ffffffffffffffd6 CR3: 000000002e80a000 CR4:
>> 00000000001006e0
>> [  104.526432] Call Trace:
>> [  104.528892]  dwc3_gadget_giveback+0xbf/0x120
>> [  104.533169]  __dwc3_gadget_ep_disable+0xc5/0x250
>> [  104.537789]  dwc3_gadget_ep_disable+0x3d/0xd0
>> [  104.542149]  usb_ep_disable+0x1d/0x80
>> [  104.545823]  u_audio_stop_capture+0x87/0x9a [u_audio]
>> [  104.550880]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>> [  104.555594]  composite_setup+0x20f/0x1b20 [libcomposite]
>> [  104.560912]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>> [  104.566921]  configfs_composite_setup+0x6b/0x90 [libcomposite]
>> [  104.572752]  dwc3_ep0_delegate_req+0x24/0x40
>> [  104.577022]  dwc3_ep0_interrupt+0x40a/0x9d8
>> [  104.581205]  dwc3_thread_interrupt+0x880/0xf70
>>
> 
> Oops, looks like I can't make it synchronous this way. Can you try
> Jack's change to the u_audio.c instead?

The oops indeed goes away with Jack's change, but the USB connection goes
up and down continuously: my host sees the USB network and audio devices
repeatedly appearing and disappearing.

The mass_storage device does not appear at all.

Host:
21-10-2020 22:36	kernel	sd 7:0:0:0: [sdd] tag#0 device offline or changed
21-10-2020 22:36	kernel	blk_update_request: I/O error, dev sdd, sector 0 
op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
21-10-2020 22:36	kernel	Buffer I/O error on dev sdd, logical block 0, 
async page read

And on edison:
Oct 21 22:37:37 edison kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link 
becomes ready
Oct 21 22:37:37 edison kernel[499]: [  436.595952] IPv6: 
ADDRCONF(NETDEV_CHANGE): usb0: link b>
Oct 21 22:37:37 edison systemd-networkd[521]: usb0: Gained carrier
Oct 21 22:37:38 edison systemd-networkd[521]: usb0: Gained IPv6LL
Oct 21 22:38:07 edison systemd-networkd[521]: usb0: Lost carrier
Oct 21 22:38:07 edison systemd-journald[435]: Forwarding to syslog 
missed 4 messages.


> Thanks,
> Thinh
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21 20:42                 ` Ferry Toth
@ 2020-10-21 23:32                   ` Thinh Nguyen
  2020-10-22 13:43                     ` Andy Shevchenko
  0 siblings, 1 reply; 31+ messages in thread
From: Thinh Nguyen @ 2020-10-21 23:32 UTC (permalink / raw)
  To: Ferry Toth, linux-usb
  Cc: felipe.balbi-VuQAYsv1563Yd54FQh9/CA-XMD5yJDbdMReXY1tMh2IBg-XMD5yJDbdMReXY1tMh2IBg,
	Heikki Krogerus, Andy Shevchenko

Ferry Toth wrote:
> Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
>> Ferry Toth wrote:
>>> Op 21-10-2020 om 00:58 schreef Thinh Nguyen:
>>>> Thinh Nguyen wrote:
>>>>> Hi,
>>>>>
>>>>> Ferry Toth wrote:
>>>>>> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Ferry Toth <fntoth@gmail.com> writes:
>>>>>>>
>>>>>>> 8< snip
>>>>>>>
>>>>>>>>>> [   12.657416] CR2: 0000000100000000
>>>>>>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
>>>>>>>>> Is this something you can reproduce on your end? Ferry, can
>>>>>>>>> you get
>>>>>>>>> dwc3
>>>>>>>>> trace logs when this happens? ftrace_dump_on_oops may help here.
>>>>>>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
>>>>>>>> sufficient or
>>>>>>>> do I need to do more?
>>>>>>> you'd have to enable dwc3 trace events first ;-)
>>>>>>>
>>>>>>>> BTW, after posting this I found that in host mode dwc3 is not working
>>>>>>>> properly either. No oops, but no driver gets loaded on device plug-in.
>>>>>>> okay
>>>>>>>
>>>>>> Ehem, maybe you only meant me to enable /dwc3/dwc3_ep_dequeue/enable:
>>>>>>
>>>>>> root@edison:/boot# uname -a
>>>>>> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> root@edison:/boot# echo 1 > /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
>>>>>> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
>>>>>> root@edison:/boot#
>>>>>> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>>> [ 2608.592288] #PF: supervisor read access in kernel mode
>>>>>> [ 2608.597419] #PF: error_code(0x0000) - not-present page
>>>>>> [ 2608.602549] PGD 0 P4D 0
>>>>>> [ 2608.605090] Oops: 0000 [#1] SMP PTI
>>>>>> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
>>>>>> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
>>>>>> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
>>>>>> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c
>>>>>> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
>>>>>> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX: 0000000000000000
>>>>>> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI: ffff9a5f46f48158
>>>>>> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09: ffff9a5f7dd6a000
>>>>>> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12: ffff9a5f46f48158
>>>>>> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15: ffff9a5f7ad77e00
>>>>>> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000) knlGS:0000000000000000
>>>>>> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4: 00000000001006e0
>>>>>> [ 2608.711102] Call Trace:
>>>>>> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
>>>>>> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
>>>>>> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>>>> I took a look at how the audio function is handling switching alternate
>>>>> setting and dequeuing endpoints, and I think I found the issue.
>>>>>
>>>>> Here's a snippet of the free_ep() code in u_audio.c:
>>>>>
>>>>> static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>>>>> {
>>>>>       .....
>>>>>           for (i = 0; i < params->req_number; i++) {
>>>>>                   if (prm->ureq[i].req) {
>>>>>                           usb_ep_dequeue(ep, prm->ureq[i].req);
>>>>>                           usb_ep_free_request(ep, prm->ureq[i].req);
>>>>>                           prm->ureq[i].req = NULL;
>>>>>                   }
>>>>>           }
>>>>>     ....
>>>>>
>>>>>
>>>>> usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
>>>>> the request until it gives back the request. Freeing the request
>>>>> immediately here will cause a problem.
>>>>
>>>> To confirm my suspicion, can you try this and see if you still get oops?
>>>>
>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>> index eec8e9a9e3ed..b66eb24ec070 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
>>>>                           list_for_each_entry_safe(r, t, &dep->started_list, list)
>>>>                                   dwc3_gadget_move_cancelled_request(r);
>>>>
>>>> +                         dwc3_gadget_ep_cleanup_cancelled_requests(dep);
>>>>                           goto out;
>>>>                   }
>>>>           }
>>>>
>>>>
>>>> This will make usb_ep_dequeue() synchronous. (Note that this is not
>>>> tested).
>>>
>>> Unfortunately, it doesn't work. The trace changes to:
>>> root@edison:~# [  104.418264] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> [  104.425227] #PF: supervisor instruction fetch in kernel mode
>>> [  104.430877] #PF: error_code(0x0010) - not-present page
>>> [  104.436007] PGD 0 P4D 0
>>> [  104.438547] Oops: 0010 [#1] SMP PTI
>>> [  104.442039] CPU: 1 PID: 605 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
>>> [  104.450027] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
>>> [  104.458802] RIP: 0010:0x0
>>> [  104.461425] Code: Bad RIP value.
>>> [  104.464649] RSP: 0018:ffffae584034fbf8 EFLAGS: 00010046
>>> [  104.469870] RAX: 0000000000000000 RBX: ffff8c198608a028 RCX: ffff8c19bb87fa00
>>> [  104.476993] RDX: 00000000ffffff94 RSI: ffff8c19bafa54e0 RDI: ffff8c198609ee00
>>> [  104.484118] RBP: ffff8c19bafa54e0 R08: 0000000000000046 R09: 0000000000000238
>>> [  104.491241] R10: 000000000000002c R11: ffff8c19bcf62490 R12: ffff8c198609ee00
>>> [  104.498366] R13: ffff8c198608a028 R14: 0000000000000002 R15: ffff8c19bb8ff000
>>> [  104.505493] FS:  0000000000000000(0000) GS:ffff8c19be300000(0000) knlGS:0000000000000000
>>> [  104.513572] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  104.519309] CR2: ffffffffffffffd6 CR3: 000000002e80a000 CR4: 00000000001006e0
>>> [  104.526432] Call Trace:
>>> [  104.528892]  dwc3_gadget_giveback+0xbf/0x120
>>> [  104.533169]  __dwc3_gadget_ep_disable+0xc5/0x250
>>> [  104.537789]  dwc3_gadget_ep_disable+0x3d/0xd0
>>> [  104.542149]  usb_ep_disable+0x1d/0x80
>>> [  104.545823]  u_audio_stop_capture+0x87/0x9a [u_audio]
>>> [  104.550880]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
>>> [  104.555594]  composite_setup+0x20f/0x1b20 [libcomposite]
>>> [  104.560912]  ? configfs_composite_setup+0x6b/0x90 [libcomposite]
>>> [  104.566921]  configfs_composite_setup+0x6b/0x90 [libcomposite]
>>> [  104.572752]  dwc3_ep0_delegate_req+0x24/0x40
>>> [  104.577022]  dwc3_ep0_interrupt+0x40a/0x9d8
>>> [  104.581205]  dwc3_thread_interrupt+0x880/0xf70
>>>
>>
>> Oops, looks like I can't make it synchronous this way. Can you try
>> Jack's change to the u_audio.c instead?
>
> Oops indeed goes away with Jack's change, but usb connection goes
> up/down continuously, meaning: my host sees usb network and audio
> device appearing / disappearing.
>

Ok, thanks for verifying that it went away.

> mass_storage device does not appear at all.

There are some fixes to dwc3 in kernel mainline. Is it possible to test
this against linux-next?

Thanks,
Thinh

>
> Host:
> 21-10-2020 22:36    kernel    sd 7:0:0:0: [sdd] tag#0 device offline or changed
> 21-10-2020 22:36    kernel    blk_update_request: I/O error, dev sdd, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> 21-10-2020 22:36    kernel    Buffer I/O error on dev sdd, logical block 0, async page read
>
> And on edison:
> Oct 21 22:37:37 edison kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
> Oct 21 22:37:37 edison kernel[499]: [  436.595952] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link b>
> Oct 21 22:37:37 edison systemd-networkd[521]: usb0: Gained carrier
> Oct 21 22:37:38 edison systemd-networkd[521]: usb0: Gained IPv6LL
> Oct 21 22:38:07 edison systemd-networkd[521]: usb0: Lost carrier
> Oct 21 22:38:07 edison systemd-journald[435]: Forwarding to syslog missed 4 messages.
>
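
To make the ownership rule from Thinh's analysis quoted above concrete, here is a small user-space model in C. It is only a sketch under stated assumptions: `model_req`, `model_ep_dequeue()`, `model_udc_giveback()` and `model_free()` are hypothetical stand-ins, not the gadget API, and the static cancel list plays the role of the controller's list of cancelled requests. It shows the point made in the usb_ep_dequeue() kerneldoc quoted in this thread: dequeue may return before the completion runs, so the request must not be freed until it has been given back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model only; none of these names exist in the kernel. */
#define MODEL_ECONNRESET 104     /* value of ECONNRESET on Linux */
#define MODEL_MAX_CANCEL 8

struct model_req {
    bool owned_by_udc;           /* controller may still access the request */
    int status;                  /* set when the request is given back */
    int completions;             /* times the completion "callback" ran */
};

static struct model_req *cancel_list[MODEL_MAX_CANCEL];
static int ncancel;

/* Like usb_ep_dequeue(): schedules the cancellation and returns at once. */
static int model_ep_dequeue(struct model_req *req)
{
    if (!req->owned_by_udc || ncancel == MODEL_MAX_CANCEL)
        return -1;
    cancel_list[ncancel++] = req;   /* still owned by the controller */
    return 0;
}

/* Runs later, e.g. from the controller's interrupt thread. */
static void model_udc_giveback(void)
{
    for (int i = 0; i < ncancel; i++) {
        struct model_req *req = cancel_list[i];
        req->owned_by_udc = false;          /* ownership returns to the function */
        req->status = -MODEL_ECONNRESET;
        req->completions++;                 /* stands in for req->complete() */
    }
    ncancel = 0;
}

static bool model_may_free(const struct model_req *req)
{
    return !req->owned_by_udc;
}

/* Freeing while the controller still owns the request is the bug pattern. */
static void model_free(struct model_req *req)
{
    assert(model_may_free(req) && "request still owned by the controller");
    free(req);
}
```

Calling `model_free()` directly after `model_ep_dequeue()`, which is what the original free_ep() loop does with usb_ep_free_request(), trips the assertion; calling it after `model_udc_giveback()` is fine.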


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21  1:47             ` Jack Pham
  2020-10-21  1:56               ` Thinh Nguyen
@ 2020-10-22  9:23               ` Andy Shevchenko
  1 sibling, 0 replies; 31+ messages in thread
From: Andy Shevchenko @ 2020-10-22  9:23 UTC (permalink / raw)
  To: Jack Pham, Ruslan Bilovol
  Cc: Thinh Nguyen, Ferry Toth, linux-usb, felipe.balbi

+Cc: Ruslan (maybe you have an insight)
On Wed, Oct 21, 2020 at 3:03 PM Jack Pham <jackp@codeaurora.org> wrote:
>
> Hi Thinh, Ferry,
>
> On Tue, Oct 20, 2020 at 10:58:31PM +0000, Thinh Nguyen wrote:
> > Thinh Nguyen wrote:
> > > Hi,
> > >
> > > Ferry Toth wrote:
> > >> Op 20-10-2020 om 14:32 schreef Felipe Balbi:
> > >>> Hi,
> > >>>
> > >>> Ferry Toth <fntoth@gmail.com> writes:
> > >>>
> > >>> 8< snip
> > >>>
> > >>>>>> [   12.657416] CR2: 0000000100000000
> > >>>>>> [   12.660729] ---[ end trace 9b92dea6da33c71e ]---
> > >>>>> Is this something you can reproduce on your end? Ferry, can you get
> > >>>>> dwc3 trace logs when this happens? ftrace_dump_on_oops may help here.
> > >>>> I will do that tonight. Is flipping on ftrace_dump_on_oops
> > >>>> sufficient or do I need to do more?
> > >>> you'd have to enable dwc3 trace events first ;-)
> > >>>
> > >>>> BTW, after posting this I found that in host mode dwc3 is not working
> > >>>> properly either. No oops, but no driver gets loaded on device plug-in.
> > >>> okay
> > >>>
> > >> Ehem, maybe you only meant me to enable /dwc3/dwc3_ep_dequeue/enable:
> > >>
> > >> root@edison:/boot# uname -a
> > >> Linux edison 5.9.0-edison-acpi-standard #1 SMP Mon Oct 19 20:17:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> > >> root@edison:/boot# echo 1 > /sys/kernel/debug/tracing/events/dwc3/dwc3_ep_dequeue/enable
> > >> root@edison:/boot# echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
> > >> root@edison:/boot#
> > >> root@edison:/boot# [ 2608.585323] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >> [ 2608.592288] #PF: supervisor read access in kernel mode
> > >> [ 2608.597419] #PF: error_code(0x0000) - not-present page
> > >> [ 2608.602549] PGD 0 P4D 0
> > >> [ 2608.605090] Oops: 0000 [#1] SMP PTI
> > >> [ 2608.608580] CPU: 1 PID: 733 Comm: irq/15-dwc3 Not tainted 5.9.0-edison-acpi-standard #1
> > >> [ 2608.616571] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
> > >> [ 2608.625356] RIP: 0010:dwc3_gadget_ep_dequeue+0x41/0x1c0
> > >> [ 2608.630580] Code: e9 51 01 00 00 4c 8d a3 30 01 00 00 4c 89 e7 e8 15 e6 42 00 49 8b 4e 48 49 89 c5 49 8d 46 48 48 8d 51 a0 48 39 c8 75 0f eb 2e <48> 8b 4a 60 48 8d 51 a0 48 39 c8 74 21 48 39 d5 75 ee 45 31 f6 4c
> > >> [ 2608.649320] RSP: 0018:ffffa838002a7c40 EFLAGS: 00010087
> > >> [ 2608.654543] RAX: ffff9a5f4609c048 RBX: ffff9a5f46f48028 RCX: 0000000000000000
> > >> [ 2608.661666] RDX: ffffffffffffffa0 RSI: 0000000000000008 RDI: ffff9a5f46f48158
> > >> [ 2608.668790] RBP: ffff9a5f7bd09b40 R08: 00000000000002d8 R09: ffff9a5f7dd6a000
> > >> [ 2608.675913] R10: ffffa838002a7d90 R11: ffff9a5f46f48300 R12: ffff9a5f46f48158
> > >> [ 2608.683039] R13: 0000000000000046 R14: ffff9a5f4609c000 R15: ffff9a5f7ad77e00
> > >> [ 2608.690165] FS:  0000000000000000(0000) GS:ffff9a5f7e300000(0000) knlGS:0000000000000000
> > >> [ 2608.698244] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> [ 2608.703980] CR2: 0000000000000000 CR3: 000000003780a000 CR4: 00000000001006e0
> > >> [ 2608.711102] Call Trace:
> > >> [ 2608.713561]  usb_ep_dequeue+0x19/0x80
> > >> [ 2608.717234]  u_audio_stop_capture+0x54/0x9a [u_audio]
> > >> [ 2608.722289]  afunc_set_alt+0x73/0x80 [usb_f_uac2]
> > > I took a look at how the audio function is handling switching alternate
> > > setting and dequeuing endpoints, and I think I found the issue.
> > >
> > > Here's a snippet of the free_ep() code in u_audio.c:
> > >
> > > static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
> > > {
> > >     .....
> > >         for (i = 0; i < params->req_number; i++) {
> > >                 if (prm->ureq[i].req) {
> > >                         usb_ep_dequeue(ep, prm->ureq[i].req);
> > >                         usb_ep_free_request(ep, prm->ureq[i].req);
> > >                         prm->ureq[i].req = NULL;
> > >                 }
> > >         }
> > >   ....
> > >
> > >
> > > usb_ep_dequeue() can be asynchronous. The dwc3 still has ownership of
> > > the request until it gives back the request. Freeing the request
> > > immediately here will cause a problem.
> >
> > To confirm my suspicion, can you try this and see if you still get oops?
> >
> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > index eec8e9a9e3ed..b66eb24ec070 100644
> > --- a/drivers/usb/dwc3/gadget.c
> > +++ b/drivers/usb/dwc3/gadget.c
> > @@ -2031,6 +2031,7 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
> >                         list_for_each_entry_safe(r, t, &dep->started_list, list)
> >                                 dwc3_gadget_move_cancelled_request(r);
> >
> > +                       dwc3_gadget_ep_cleanup_cancelled_requests(dep);
> >                         goto out;
> >                 }
> >         }
> >
> >
> > This will make usb_ep_dequeue() synchronous. (Note that this is not tested).
>
> But only for dwc3, right? In general, do other UDC drivers provide
> synchronous behavior? It does state clearly in the kerneldoc for
> usb_ep_dequeue() that the completion is asynchronous.  From
> drivers/usb/gadget/udc/core.c:
>
>  * If the request is still active on the endpoint, it is dequeued and
>  * eventually its completion routine is called (with status -ECONNRESET);
>  * else a negative error code is returned.  This routine is asynchronous,
>  * that is, it may return before the completion routine runs.
>
> Alternatively, could we not fix up u_audio.c to deal with this?
>
> diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c
> index 56906d15fb55..f08f036d520e 100644
> --- a/drivers/usb/gadget/function/u_audio.c
> +++ b/drivers/usb/gadget/function/u_audio.c
> @@ -89,7 +89,12 @@ static void u_audio_iso_complete(struct usb_ep *ep, struct usb_request *req)
>         struct snd_uac_chip *uac = prm->uac;
>
>         /* i/f shutting down */
> -       if (!prm->ep_enabled || req->status == -ESHUTDOWN)
> +       if (!prm->ep_enabled) {
> +               usb_ep_free_request(ep, req);
> +               return;
> +       }
> +
> +       if (req->status == -ESHUTDOWN)
>                 return;
>
>         /*
> @@ -352,7 +357,6 @@ static inline void free_ep(struct uac_rtd_params *prm, struct usb_ep *ep)
>         for (i = 0; i < params->req_number; i++) {
>                 if (prm->ureq[i].req) {
>                         usb_ep_dequeue(ep, prm->ureq[i].req);
> -                       usb_ep_free_request(ep, prm->ureq[i].req);
>                         prm->ureq[i].req = NULL;
>                 }
>         }
>
> Jack
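
Jack's alternative moves the free into the completion path. A minimal model of that pattern follows, again with hypothetical names (`model_ep`, `model_req`, `model_iso_complete()`, `model_free_ep()`) rather than the real u_audio.c code. It assumes that the endpoint is already marked disabled (prm->ep_enabled false) by the time the cancelled request's completion runs; that part of free_ep() is elided with "....." in the quoted snippet, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the pattern in Jack's patch; not the real driver. */
#define MODEL_ESHUTDOWN 108          /* value of ESHUTDOWN on Linux */

struct model_ep {
    bool enabled;                    /* plays the role of prm->ep_enabled */
    int freed;                       /* requests released so far */
};

struct model_req {
    int status;
};

static void model_free_request(struct model_ep *ep, struct model_req *req)
{
    assert(!ep->enabled);            /* in this model, only freed on shutdown */
    free(req);
    ep->freed++;
}

/* Mirrors the patched prologue of u_audio_iso_complete(). */
static void model_iso_complete(struct model_ep *ep, struct model_req *req)
{
    if (!ep->enabled) {              /* interface shutting down: we own req again */
        model_free_request(ep, req);
        return;
    }
    if (req->status == -MODEL_ESHUTDOWN)
        return;
    /* ...normal audio buffer handling and re-queueing would go here... */
}

/* Mirrors the patched free_ep(): dequeue only, no free. The request is
 * handed to *cancelled to stand in for the controller's pending giveback. */
static void model_free_ep(struct model_ep *ep, struct model_req **slot,
                          struct model_req **cancelled)
{
    ep->enabled = false;
    if (*slot) {
        *cancelled = *slot;          /* "usb_ep_dequeue()" */
        *slot = NULL;                /* forget the pointer, keep the memory */
    }
}
```

The design choice is that whoever touches the request last releases it: once free_ep() drops its pointer, the controller's giveback is the final user, so the completion handler is the safe place to free.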



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-21 23:32                   ` Thinh Nguyen
@ 2020-10-22 13:43                     ` Andy Shevchenko
  2020-10-27 20:13                       ` Ferry Toth
  0 siblings, 1 reply; 31+ messages in thread
From: Andy Shevchenko @ 2020-10-22 13:43 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Ferry Toth, linux-usb,
	felipe.balbi,
	Heikki Krogerus, Andy Shevchenko

On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> Ferry Toth wrote:
> > Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
> >> Ferry Toth wrote:

...

> >> Oops, looks like I can't make it synchronous this way. Can you try
> >> Jack's change to the u_audio.c instead?
> >
> > Oops indeed goes away with Jack's change, but usb connection goes
> > up/down continuously, meaning: my host sees usb network and audio
> > device appearing / disappearing.
>
> Ok, thanks for verifying that it went away.
>
> > mass_storage device does not appear at all.
>
> There are some fixes to dwc3 in kernel mainline. Is it possible to test
> this against linux-next?

I think the best is to wait for v5.10-rc1 and retest.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-22 13:43                     ` Andy Shevchenko
@ 2020-10-27 20:13                       ` Ferry Toth
  2020-10-27 21:06                         ` Jack Pham
                                           ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Ferry Toth @ 2020-10-27 20:13 UTC (permalink / raw)
  To: Andy Shevchenko, linux-usb
  Cc: Thinh Nguyen, Andy Shevchenko, felipe.balbi, Heikki Krogerus, Jack Pham

Hi guys,

Sorry for messing up the CC list. This was partly thanks to gmane, 
partly my own stupidity. I hope it is complete now.

I am summarizing the status of this one at the bottom.

Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
> On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>> Ferry Toth wrote:
>>> Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
>>>> Ferry Toth wrote:
> ...
>
>>>> Oops, looks like I can't make it synchronous this way. Can you try
>>>> Jack's change to the u_audio.c instead?
>>> Oops indeed goes away with Jack's change, but usb connection goes
>>> up/down continuously, meaning: my host sees usb network and audio
>>> device appearing / disappearing.
>> Ok, thanks for verifying that it went away.
>>
>>> mass_storage device does not appear at all.
>> There are some fixes to dwc3 in kernel mainline. Is it possible to test
>> this against linux-next?
> I think the best is to wait for v5.10-rc1 and retest.
>
It looks like there have been at least 3 problems:

1) dwc3 was not working in host mode, but not causing an oops. This may 
have been caused by platform changes. Andy has provided a fix for this, 
dwc3 now working in host mode on 5.9

2) dwc3 was causing the oops in gadget mode as referenced in this 
thread. The experimental patch from Jack Pham indeed fixes this.

Code here: https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0

3) With the above 2 fixes gadgets work but seem to be powered down 
(after 15 sec. or so) and up (after 1 sec.) continuously. No oops, no 
errors in journal. The gadgets I enabled are a network, sound and mass 
storage. The latter stops working due to going up/down quickly. But my 
host shows network/sound appearing/disappearing. Journal of edison shows:

systemd-networkd[525]: usb0: Gained carrier
systemd-networkd[525]: usb0: Gained IPv6LL
systemd-networkd[525]: usb0: Lost carrier
systemd-networkd[525]: usb0: Gained carrier
systemd-networkd[525]: usb0: Gained IPv6LL
systemd-networkd[525]: usb0: Lost carrier
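
The flapping described above shows up as alternating "Gained carrier" / "Lost carrier" lines from systemd-networkd. A tiny C filter can count them, which may make it easier to compare kernels (for example 5.9 with the experimental patch against a later release candidate). This is an illustrative helper, not part of any tool mentioned in the thread; `tally_line()` and `tally_stream()` are hypothetical names, and the code assumes the journal text format shown above with simple `strstr()` matching on an "ifname: " prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct carrier_stats {
    int gained;
    int lost;
};

/* Count one systemd-networkd carrier event for ifname, if the line has one. */
static void tally_line(const char *line, const char *ifname,
                       struct carrier_stats *st)
{
    assert(line != NULL && ifname != NULL && st != NULL);
    char tag[64];
    snprintf(tag, sizeof tag, "%s: ", ifname);   /* e.g. "usb0: " */
    const char *p = strstr(line, tag);
    if (!p)
        return;
    p += strlen(tag);
    /* "Gained IPv6LL" and "link becomes ready" deliberately do not match. */
    if (strncmp(p, "Gained carrier", strlen("Gained carrier")) == 0)
        st->gained++;
    else if (strncmp(p, "Lost carrier", strlen("Lost carrier")) == 0)
        st->lost++;
}

/* Tally a whole stream, e.g. the output of `journalctl -b | grep usb0`. */
static struct carrier_stats tally_stream(FILE *fp, const char *ifname)
{
    struct carrier_stats st = { 0, 0 };
    char line[512];
    while (fgets(line, sizeof line, fp))
        tally_line(line, ifname, &st);
    return st;
}
```

Wrapped in a small main() that calls `tally_stream(stdin, "usb0")`, a `lost` count that keeps growing over a few minutes reproduces the symptom described here.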

Any ideas how to proceed are highly welcomed!



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-27 20:13                       ` Ferry Toth
@ 2020-10-27 21:06                         ` Jack Pham
  2020-10-27 22:07                           ` Ferry Toth
  2020-10-27 21:16                         ` Andy Shevchenko
  2020-10-27 21:19                         ` Andy Shevchenko
  2 siblings, 1 reply; 31+ messages in thread
From: Jack Pham @ 2020-10-27 21:06 UTC (permalink / raw)
  To: Ferry Toth
  Cc: Andy Shevchenko, linux-usb, Thinh Nguyen, felipe.balbi, Heikki Krogerus

Hi Ferry,

On Tue, Oct 27, 2020 at 09:13:31PM +0100, Ferry Toth wrote:
> Hi guys,
> 
> Sorry for messing up the CC list. This was partly thanks to gmane, partly my
> own stupidity. I hope it is complete now.
> 
> I am summarizing the status of this one at the bottom.
> 
> Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
> > On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> > > Ferry Toth wrote:
> > > > Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
> > > > > Ferry Toth wrote:
> > ...
> > 
> > > > > Oops, looks like I can't make it synchronous this way. Can you try
> > > > > Jack's change to the u_audio.c instead?
> > > > Oops indeed goes away with Jack's change, but usb connection goes
> > > > up/down continuously, meaning: my host sees usb network and audio
> > > > device appearing / disappearing.
> > > Ok, thanks for verifying that it went away.
> > > 
> > > > mass_storage device does not appear at all.
> > > There are some fixes to dwc3 in kernel mainline. Is it possible to test
> > > this against linux-next?
> > I think the best is to wait for v5.10-rc1 and retest.
> > 
> It looks like there have been at least 3 problems:
> 
> 1) dwc3 was not working in host mode, but not causing an oops. This may have
> been caused by platform changes. Andy has provided a fix for this, dwc3 now
> working in host mode on 5.9
> 
> 2) dwc3 was causing the oops in gadget mode as referenced in this thread.
> The experimental patch from Jack Pham indeed fixes this.
> 
> Code here: https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0

Great, thanks! I'll submit the patch to the list. Is it alright if I
add a Reported-and-tested-by tag from you?

> 3) With the above 2 fixes gadgets work but seem to be powered down (after 15
> sec. or so) and up (after 1 sec.) continuously. No oops, no errors in
> journal. The gadgets I enabled are a network, sound and mass storage. The
> latter stops working due to going up/down quickly. But my host shows
> network/sound appearing/disappearing. Journal of edison shows:
> 
> systemd-networkd[525]: usb0: Gained carrier
> systemd-networkd[525]: usb0: Gained IPv6LL
> systemd-networkd[525]: usb0: Lost carrier
> systemd-networkd[525]: usb0: Gained carrier
> systemd-networkd[525]: usb0: Gained IPv6LL
> systemd-networkd[525]: usb0: Lost carrier
> 
> Any ideas how to proceed are highly welcomed!

I suppose you can start with enabling dwc3 trace events and try to see
what's going on from the gadget side. Please refer to
Documentation/driver-api/usb/dwc3.rst **Reporting Bugs**

Also what happens if you enable the network, sound and mass storage
functions individually rather than all at once (assuming you are using
ConfigFS)?
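
Enabling the dwc3 trace events is what Ferry did earlier in the thread with `echo 1 > ...`, but only for `dwc3_ep_dequeue`. A small C helper doing the same for the whole dwc3 event group is sketched below. It assumes debugfs is mounted at /sys/kernel/debug (as in Ferry's session) and uses the standard ftrace per-subsystem switch `events/dwc3/enable`; check Documentation/driver-api/usb/dwc3.rst for the exact procedure the maintainers expect. `write_ctl()` and `enable_dwc3_tracing()` are illustrative names, and the program must run as root.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* C equivalent of `echo value > path` for sysfs/procfs/tracefs control files.
 * Returns 0 on success or a negative errno value on failure. */
static int write_ctl(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (!f)
        return -errno;
    int rc = 0;
    if (fputs(value, f) == EOF)
        rc = errno ? -errno : -EIO;
    /* Buffered data is flushed here, so the kernel may reject it at close. */
    if (fclose(f) == EOF && rc == 0)
        rc = errno ? -errno : -EIO;
    return rc;
}

/* Enable all dwc3 trace events, then dump the trace buffer if the kernel
 * oopses. tracing_dir is e.g. "/sys/kernel/debug/tracing" (assumed mount). */
int enable_dwc3_tracing(const char *tracing_dir)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/events/dwc3/enable", tracing_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -ENAMETOOLONG;
    int rc = write_ctl(path, "1\n");
    if (rc)
        return rc;
    return write_ctl("/proc/sys/kernel/ftrace_dump_on_oops", "1\n");
}
```

After calling `enable_dwc3_tracing("/sys/kernel/debug/tracing")`, reproduce the problem and read the `trace` file in the same directory, or rely on ftrace_dump_on_oops for the oops case.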

Jack
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-27 20:13                       ` Ferry Toth
  2020-10-27 21:06                         ` Jack Pham
@ 2020-10-27 21:16                         ` Andy Shevchenko
  2020-10-27 21:54                           ` Ferry Toth
  2020-10-27 21:19                         ` Andy Shevchenko
  2 siblings, 1 reply; 31+ messages in thread
From: Andy Shevchenko @ 2020-10-27 21:16 UTC (permalink / raw)
  To: Ferry Toth
  Cc: linux-usb, Thinh Nguyen, Felipe Balbi, Heikki Krogerus, Jack Pham

On Tue, Oct 27, 2020 at 10:13 PM Ferry Toth <fntoth@gmail.com> wrote:
> Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
> > On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> >> Ferry Toth wrote:

...

> >> There are some fixes to dwc3 in kernel mainline. Is it possible to test
> >> this against linux-next?
> > I think the best is to wait for v5.10-rc1 and retest.

Can you give a try of v5.10-rc1?

> It looks like there have been at least 3 problems:
>
> 1) dwc3 was not working in host mode, but not causing an oops. This may
> have been caused by platform changes. Andy has provided a fix for this,
> dwc3 now working in host mode on 5.9

Rafael took the above mentioned change(s) for v5.10-rcX.
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=99aed9227073fb34ce2880cbc7063e04185a65e1

> 2) dwc3 was causing the oops in gadget mode as referenced in this
> thread. The experimental patch from Jack Pham indeed fixes this.
>
> Code here: https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>
> 3) With the above 2 fixes gadgets work but seem to be powered down
> (after 15 sec. or so) and up (after 1 sec.) continuously. No oops, no
> errors in journal. The gadgets I enabled are a network, sound and mass
> storage. The latter stops working due to going up/down quickly. But my
> host shows network/sound appearing/disappearing. Journal of edison shows:
>
> systemd-networkd[525]: usb0: Gained carrier
> systemd-networkd[525]: usb0: Gained IPv6LL
> systemd-networkd[525]: usb0: Lost carrier
> systemd-networkd[525]: usb0: Gained carrier
> systemd-networkd[525]: usb0: Gained IPv6LL
> systemd-networkd[525]: usb0: Lost carrier
>
> Any ideas how to proceed are highly welcomed!

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-27 20:13                       ` Ferry Toth
  2020-10-27 21:06                         ` Jack Pham
  2020-10-27 21:16                         ` Andy Shevchenko
@ 2020-10-27 21:19                         ` Andy Shevchenko
  2 siblings, 0 replies; 31+ messages in thread
From: Andy Shevchenko @ 2020-10-27 21:19 UTC (permalink / raw)
  To: Ferry Toth, Felipe Balbi
  Cc: linux-usb, Thinh Nguyen, Heikki Krogerus, Jack Pham

+Cc: Felipe (Ferry, note the correct address)

On Tue, Oct 27, 2020 at 10:13 PM Ferry Toth <fntoth@gmail.com> wrote:
>
> Hi guys,
>
> Sorry for messing up the CC list. This was partly thanks to gmane,
> partly my own stupidity. I hope it is complete now.
>
> I am summarizing the status of this one at the bottom.
>
> Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
> > On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
> >> Ferry Toth wrote:
> >>> Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
> >>>> Ferry Toth wrote:
> > ...
> >
> >>>> Oops, looks like I can't make it synchronous this way. Can you try
> >>>> Jack's change to the u_audio.c instead?
> >>> Oops indeed goes away with Jack's change, but usb connection goes
> >>> up/down continuously, meaning: my host sees usb network and audio
> >>> device appearing / disappearing.
> >> Ok, thanks for verifying that it went away.
> >>
> >>> mass_storage device does not appear at all.
> >> There are some fixes to dwc3 in kernel mainline. Is it possible to test
> >> this against linux-next?
> > I think the best is to wait for v5.10-rc1 and retest.
> >
> It looks like there have been at least 3 problems:
>
> 1) dwc3 was not working in host mode, but not causing an oops. This may
> have been caused by platform changes. Andy has provided a fix for this,
> dwc3 now working in host mode on 5.9
>
> 2) dwc3 was causing the oops in gadget mode as referenced in this
> thread. The experimental patch from Jack Pham indeed fixes this.
>
> Code here: https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>
> 3) With the above 2 fixes gadgets work but seem to be powered down
> (after 15 sec. or so) and up (after 1 sec.) continuously. No oops, no
> errors in journal. The gadgets I enabled are a network, sound and mass
> storage. The latter stops working due to going up/down quickly. But my
> host shows network/sound appearing/disappearing. Journal of edison shows:
>
> systemd-networkd[525]: usb0: Gained carrier
> systemd-networkd[525]: usb0: Gained IPv6LL
> systemd-networkd[525]: usb0: Lost carrier
> systemd-networkd[525]: usb0: Gained carrier
> systemd-networkd[525]: usb0: Gained IPv6LL
> systemd-networkd[525]: usb0: Lost carrier
>
> Any ideas how to proceed are highly welcomed!
>
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-27 21:16                         ` Andy Shevchenko
@ 2020-10-27 21:54                           ` Ferry Toth
  2020-10-28  9:18                             ` Felipe Balbi
  0 siblings, 1 reply; 31+ messages in thread
From: Ferry Toth @ 2020-10-27 21:54 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: linux-usb, Thinh Nguyen, Felipe Balbi, Heikki Krogerus, Jack Pham

Hi,

Op 27-10-2020 om 22:16 schreef Andy Shevchenko:
> On Tue, Oct 27, 2020 at 10:13 PM Ferry Toth <fntoth@gmail.com> wrote:
>> Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
>>> On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>> Ferry Toth wrote:
> ...
>
>>>> There are some fixes to dwc3 in kernel mainline. Is it possible to test
>>>> this against linux-next?
>>> I think the best is to wait for v5.10-rc1 and retest.
> Can you give a try of v5.10-rc1?

Yes, I just tried:

I booted in host mode, then flipped the switch. The gadgets come up, go
down once, then come up again and stay up.

I tested that mass storage mounts and the network works. The sound device
is detected by the host; I didn't actually try to play a sound.

So, big improvement, thanks!

>> It looks like there have been at least 3 problems:
>>
>> 1) dwc3 was not working in host mode, but not causing an oops. This may
>> have been caused by platform changes. Andy has provided a fix for this,
>> dwc3 now working in host mode on 5.9
> Rafael took the above mentioned change(s) for v5.10-rcX.
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=99aed9227073fb34ce2880cbc7063e04185a65e1
>
>> 2) dwc3 was causing the oops in gadget mode as referenced in this
>> thread. The experimental patch from Jack Pham indeed fixes this.
>>
>> Code here: https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
>>
>> 3) With the above 2 fixes gadgets work but seem to be powered down
>> (after 15 sec. or so) and up (after 1 sec.) continuously. No oops, no
>> errors in journal. The gadgets I enabled are a network, sound and mass
>> storage. The latter stops working due to going up/down quickly. But my
>> host shows network/sound appearing/disappearing. Journal of edison shows:
>>
>> systemd-networkd[525]: usb0: Gained carrier
>> systemd-networkd[525]: usb0: Gained IPv6LL
>> systemd-networkd[525]: usb0: Lost carrier
>> systemd-networkd[525]: usb0: Gained carrier
>> systemd-networkd[525]: usb0: Gained IPv6LL
>> systemd-networkd[525]: usb0: Lost carrier
>>
>> Any ideas how to proceed are highly welcomed!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-27 21:06                         ` Jack Pham
@ 2020-10-27 22:07                           ` Ferry Toth
  0 siblings, 0 replies; 31+ messages in thread
From: Ferry Toth @ 2020-10-27 22:07 UTC (permalink / raw)
  To: Jack Pham
  Cc: Andy Shevchenko, linux-usb, Thinh Nguyen, Felipe Balbi, Heikki Krogerus

Hi,

Op 27-10-2020 om 22:06 schreef Jack Pham:
> Hi Ferry,
>
> On Tue, Oct 27, 2020 at 09:13:31PM +0100, Ferry Toth wrote:
>> Hi guys,
>>
>> Sorry for messing up the CC list. This was partly thanks to gmane, partly my
>> own stupidity. I hope it is complete now.
>>
>> I am summarizing the status of this one at the bottom.
>>
>> Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
>>> On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>> Ferry Toth wrote:
>>>>> Op 21-10-2020 om 21:50 schreef Thinh Nguyen:
>>>>>> Ferry Toth wrote:
>>> ...
>>>
>>>>>> Oops, looks like I can't make it synchronous this way. Can you try
>>>>>> Jack's change to the u_audio.c instead?
>>>>> Oops indeed goes away with Jack's change, but usb connection goes
>>>>> up/down continuously, meaning: my host sees usb network and audio
>>>>> device appearing / disappearing.
>>>> Ok, thanks for verifying that it went away.
>>>>
>>>>> mass_storage device does not appear at all.
>>>> There are some fixes to dwc3 in kernel mainline. Is it possible to test
>>>> this against linux-next?
>>> I think the best is to wait for v5.10-rc1 and retest.
>>>
>> It looks like there have been at least 3 problems:
>>
>> 1) dwc3 was not working in host mode, but not causing an oops. This may have
>> been caused by platform changes. Andy has provided a fix for this, dwc3 now
>> working in host mode on 5.9
>>
>> 2) dwc3 was causing the oops in gadget mode as referenced in this thread.
>> The experimental patch from Jack Pham indeed fixes this.
>>
>> Code here: https://github.com/edison-fw/linux/commits/eds-acpi-5.9.0
> Great, thanks! I'll submit the patch to the list. Is it alright if I
> add a Reported-and-tested-by tag from you?

Sure.

I also tested against 5.10-rc1 now, see below.

>> 3) With the above 2 fixes, the gadgets work but seem to be powered down (after 15
>> sec. or so) and up again (after 1 sec.) continuously. No oops, no errors in
>> the journal. The gadgets I enabled are network, sound and mass storage. The
>> latter stops working because of the rapid up/down cycling, and my host shows
>> network/sound appearing/disappearing. The journal on the Edison shows:
>>
>> systemd-networkd[525]: usb0: Gained carrier
>> systemd-networkd[525]: usb0: Gained IPv6LL
>> systemd-networkd[525]: usb0: Lost carrier
>> systemd-networkd[525]: usb0: Gained carrier
>> systemd-networkd[525]: usb0: Gained IPv6LL
>> systemd-networkd[525]: usb0: Lost carrier

With 5.10-rc1, booting in host mode and then switching to gadget mode, the
carrier is lost (after 30 sec.) only once; it then comes back and stays:

root@edison:~# journalctl -b -1 | grep usb0
Oct 27 22:36:52 edison kernel: usb0: HOST MAC aa:bb:cc:dd:ee:f2
Oct 27 22:36:52 edison kernel: usb0: MAC aa:bb:cc:dd:ee:f1
Oct 27 22:36:52 edison kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
Oct 27 22:36:52 edison systemd-networkd[527]: usb0: Gained carrier
Oct 27 22:36:54 edison systemd-networkd[527]: usb0: Gained IPv6LL
Oct 27 22:37:25 edison systemd-networkd[527]: usb0: Lost carrier
Oct 27 22:37:26 edison kernel: IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
Oct 27 22:37:26 edison systemd-networkd[527]: usb0: Gained carrier
Oct 27 22:37:27 edison systemd-networkd[527]: usb0: Gained IPv6LL
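
The up/down periods described above can be read straight off journal timestamps like these. A minimal Python sketch, assuming journalctl's default short output format and the systemd-networkd "Gained carrier"/"Lost carrier" messages shown in this journal; the function name carrier_intervals and the fixed year (the short format omits it) are illustrative assumptions:

```python
import re
from datetime import datetime

# Matches systemd-networkd carrier messages in journalctl's default short format, e.g.
# "Oct 27 22:37:25 edison systemd-networkd[527]: usb0: Lost carrier".
CARRIER_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ systemd-networkd\[\d+\]: "
    r"(?P<iface>\S+): (?P<event>Gained|Lost) carrier$"
)


def carrier_intervals(lines, iface="usb0", year=2020):
    """Return ("up" | "down", seconds) for each completed carrier interval.

    The short format has no year, so a fixed one is assumed; only the
    differences between timestamps matter here.
    """
    events = []
    for line in lines:
        m = CARRIER_RE.match(line.strip())
        if m and m.group("iface") == iface:
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("event")))
    intervals = []
    # Each pair of consecutive events bounds one interval; its state is set
    # by the event that opened it.
    for (start, event), (end, _) in zip(events, events[1:]):
        state = "up" if event == "Gained" else "down"
        intervals.append((state, int((end - start).total_seconds())))
    return intervals
```

Fed the 5.10-rc1 journal lines above, it yields [("up", 33), ("down", 1)]: a single drop roughly 30 s after the link came up, followed by a 1 s gap before the carrier returns.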

When booting in gadget mode, the sequence of events is different:
root@edison:~# journalctl -b | grep usb0
Oct 27 22:58:54 edison kernel: usb0: HOST MAC aa:bb:cc:dd:ee:f2
Oct 27 22:58:54 edison kernel: usb0: MAC aa:bb:cc:dd:ee:f1
Oct 27 22:58:57 edison systemd-networkd[543]: usb0: Gained IPv6LL
Oct 27 22:58:57 edison systemd-networkd[543]: usb0: Lost carrier
Oct 27 22:58:57 edison systemd-networkd[543]: usb0: Gained carrier


>> Any ideas how to proceed are highly welcomed!
> I suppose you can start with enabling dwc3 trace events and try to see
> what's going on from the gadget side. Please refer to
> Documentation/driver-api/usb/dwc3.rst **Reporting Bugs**
I'll try to get something meaningful here.
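
The recipe in Documentation/driver-api/usb/dwc3.rst boils down to a few writes into tracefs. The same steps as a minimal Python sketch, assuming tracefs is mounted at /sys/kernel/tracing (older setups expose it under /sys/kernel/debug/tracing once debugfs is mounted) and that the script runs as root; the helper names and the 81920 KB buffer size are illustrative choices, not taken from the thread:

```python
from pathlib import Path

# Assumed tracefs mount point; adjust if tracefs lives under debugfs instead.
TRACEFS = Path("/sys/kernel/tracing")


def enable_dwc3_tracing(tracefs=TRACEFS, buffer_kb=81920):
    """Enlarge the ring buffer, clear it and enable every dwc3 trace event."""
    (tracefs / "buffer_size_kb").write_text(f"{buffer_kb}\n")
    # Opening 'trace' for writing with truncation clears the ring buffer.
    (tracefs / "trace").write_text("")
    (tracefs / "events" / "dwc3" / "enable").write_text("1\n")
    (tracefs / "tracing_on").write_text("1\n")


def save_trace(dest, tracefs=TRACEFS):
    """Copy the captured events (e.g. after flipping the host/gadget switch)."""
    data = (tracefs / "trace").read_text()
    Path(dest).write_text(data)
    return len(data)
```

Call enable_dwc3_tracing() before flipping the switch, reproduce the carrier drop, then call save_trace("/tmp/dwc3-trace.txt") and send the file together with the matching dmesg output.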
> Also what happens if you enable the network, sound and mass storage
> functions individually rather than all at once (assuming you are using
> ConfigFS)?
I haven't tried this yet. I did try disabling sound entirely but that 
made no difference.
> Jack
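
Jack's suggestion can be tried by building a gadget with only one function linked into its configuration. A minimal Python sketch, assuming configfs is mounted at /sys/kernel/config with libcomposite loaded and that the ecm, uac2 and mass_storage function drivers are available (the thread does not say which network or audio functions this setup actually uses); the gadget name g1, the VID/PID pair, the FUNCTIONS mapping and the helper name make_gadget are placeholders:

```python
import os
from pathlib import Path

# Assumed configfs mount point for USB gadgets.
GADGET_ROOT = Path("/sys/kernel/config/usb_gadget")

# Hypothetical mapping from the thread's three gadgets to function instances.
FUNCTIONS = {
    "network": "ecm.usb0",
    "sound": "uac2.usb0",
    "storage": "mass_storage.usb0",
}


def make_gadget(which, udc, root=GADGET_ROOT, name="g1", backing_file=None):
    """Create gadget `name` exposing only the functions listed in `which`."""
    g = root / name
    config = g / "configs" / "c.1"
    (g / "strings" / "0x409").mkdir(parents=True, exist_ok=True)
    (config / "strings" / "0x409").mkdir(parents=True, exist_ok=True)
    (g / "idVendor").write_text("0x1d6b\n")   # placeholder VID
    (g / "idProduct").write_text("0x0104\n")  # placeholder PID
    (config / "strings" / "0x409" / "configuration").write_text("test\n")
    for key in which:
        func = g / "functions" / FUNCTIONS[key]
        func.mkdir(parents=True, exist_ok=True)
        if key == "storage" and backing_file:
            # configfs creates lun.0 automatically; exist_ok covers that case.
            (func / "lun.0").mkdir(exist_ok=True)
            (func / "lun.0" / "file").write_text(f"{backing_file}\n")
        link = config / FUNCTIONS[key]
        if not link.is_symlink():
            # Linking a function into the configuration activates it there.
            os.symlink(func, link)
    # Writing a UDC name (as listed in /sys/class/udc) binds the gadget.
    (g / "UDC").write_text(f"{udc}\n")
    return g
```

For example, make_gadget(["network"], udc) exposes only the network function. Before trying another function, unbind by writing an empty line to UDC, remove the configuration symlinks, and then remove the directories in reverse order of creation.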


* Re: BUG with linux 5.9.0 with dwc3 in gadget mode
  2020-10-27 21:54                           ` Ferry Toth
@ 2020-10-28  9:18                             ` Felipe Balbi
  0 siblings, 0 replies; 31+ messages in thread
From: Felipe Balbi @ 2020-10-28  9:18 UTC (permalink / raw)
  To: Ferry Toth, Andy Shevchenko
  Cc: linux-usb, Thinh Nguyen, Heikki Krogerus, Jack Pham


Hi,

Ferry Toth <fntoth@gmail.com> writes:
> Op 27-10-2020 om 22:16 schreef Andy Shevchenko:
>> On Tue, Oct 27, 2020 at 10:13 PM Ferry Toth <fntoth@gmail.com> wrote:
>>> Op 22-10-2020 om 15:43 schreef Andy Shevchenko:
>>>> On Thu, Oct 22, 2020 at 4:21 PM Thinh Nguyen <Thinh.Nguyen@synopsys.com> wrote:
>>>>> Ferry Toth wrote:
>> ...
>>
>>>>> There are some fixes to dwc3 in kernel mainline. Is it possible to test
>>>>> this against linux-next?
>>>> I think the best is to wait for v5.10-rc1 and retest.
>> Can you give a try of v5.10-rc1?
>
> Yes, I just tried:
>
> I booted in host mode, then flipped the switch. The gadgets come up, go down
> once, then come up again and stay up.

please collect trace events. It's important to figure out why it's going
down, even if only once. Make sure to collect trace *and* dmesg so we
can correlate trace with the reenumeration that should show up in dmesg.
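
Correlating the two logs mostly means putting them on one timeline. A minimal Python sketch, assuming the trace was recorded with ftrace's default clock, whose seconds-since-boot timestamps normally line up with the printk timestamps in dmesg (worth double-checking on the target); the function name merge_logs is hypothetical:

```python
import re

# dmesg lines look like "[   12.401620] message"; ftrace lines look like
# "  <task>-<pid>   [000] d..1    12.401600: <event>: <details>".
DMESG_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s?(?P<msg>.*)$")
TRACE_RE = re.compile(r"^\s*\S.*?\s(?P<ts>\d+\.\d+):\s(?P<msg>.*)$")


def merge_logs(dmesg_lines, trace_lines):
    """Interleave dmesg and trace lines into one list sorted by timestamp.

    Each entry is (seconds, source, message) with source "dmesg" or "trace".
    """
    merged = []
    for line in dmesg_lines:
        m = DMESG_RE.match(line)
        if m:
            merged.append((float(m.group("ts")), "dmesg", m.group("msg")))
    for line in trace_lines:
        if line.startswith("#"):
            continue  # skip the ftrace header lines
        m = TRACE_RE.match(line)
        if m:
            merged.append((float(m.group("ts")), "trace", m.group("msg")))
    # sort() is stable, so entries with equal timestamps keep dmesg first.
    merged.sort(key=lambda item: item[0])
    return merged
```

The dmesg lines reporting the reenumeration can then be read alongside the dwc3 events that immediately precede them.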

thanks

-- 
balbi

Thread overview: 31+ messages
2020-10-16 20:21 BUG with linux 5.9.0 with dwc3 in gadget mode Ferry Toth
2020-10-19  5:45 ` Felipe Balbi
2020-10-19  7:14   ` Ferry Toth
2020-10-19 18:49     ` Ferry Toth
2020-10-20 12:35       ` Felipe Balbi
2020-10-20 21:01         ` Ferry Toth
2020-10-19  7:18   ` Ferry Toth
2020-10-20 12:32     ` Felipe Balbi
2020-10-20 19:46       ` Ferry Toth
2020-10-20 20:37       ` Ferry Toth
2020-10-20 22:10         ` Thinh Nguyen
2020-10-20 22:58           ` Thinh Nguyen
2020-10-21  1:47             ` Jack Pham
2020-10-21  1:56               ` Thinh Nguyen
2020-10-21 20:01                 ` Ferry Toth
2020-10-22  9:23               ` Andy Shevchenko
2020-10-21 19:45             ` Ferry Toth
2020-10-21 19:50               ` Thinh Nguyen
2020-10-21 20:42                 ` Ferry Toth
2020-10-21 23:32                   ` Thinh Nguyen
2020-10-22 13:43                     ` Andy Shevchenko
2020-10-27 20:13                       ` Ferry Toth
2020-10-27 21:06                         ` Jack Pham
2020-10-27 22:07                           ` Ferry Toth
2020-10-27 21:16                         ` Andy Shevchenko
2020-10-27 21:54                           ` Ferry Toth
2020-10-28  9:18                             ` Felipe Balbi
2020-10-27 21:19                         ` Andy Shevchenko
2020-10-19 19:46   ` Andy Shevchenko
2020-10-19 20:46     ` Ferry Toth
2020-10-20 13:27     ` Andy Shevchenko
