* Oops in xhci_endpoint_reset
@ 2019-07-27  3:15 Bob Gleitsmann
  2019-07-27 10:59 ` Greg KH
  0 siblings, 1 reply; 11+ messages in thread
From: Bob Gleitsmann @ 2019-07-27  3:15 UTC
  To: linux-usb

[-- Attachment #1: Type: text/plain, Size: 386 bytes --]

Hello,


I have seen kernel oopses on waking from suspend to memory. I got this
twice, one dmesg with backtrace attached. The other one had the failure
in the same place in the code.


This is kernel 5.3.0-rc1, patched for another problem in ethernet PHY
driver. Have not had the problem with earlier kernels. Using Gentoo
linux, amd64, but git kernel.


Best Wishes,


Bob Gleitsmann


[-- Attachment #2: dmesg-efi-1564019 --]
[-- Type: text/plain, Size: 24853 bytes --]

Oops#1 Part14
<6>[    1.091483]  sda: sda1 sda2 sda3
<5>[    1.091976] sd 2:0:0:0: [sdb] 4096-byte physical blocks
<5>[    1.093079] sd 1:0:0:0: [sda] Attached SCSI disk
<5>[    1.093356] sd 2:0:0:0: [sdb] Write Protect is off
<7>[    1.094495] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
<5>[    1.094514] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
<6>[    1.139099]  sdb: sdb1 sdb2 sdb3 sdb4
<5>[    1.140084] sd 2:0:0:0: [sdb] Attached SCSI disk
<7>[    1.167738] PM: Image not found (code -22)
<6>[    1.207334] EXT4-fs (sdb3): mounted filesystem with ordered data mode. Opts: (null)
<6>[    1.207966] VFS: Mounted root (ext4 filesystem) readonly on device 8:19.
<6>[    1.210562] Freeing unused kernel image memory: 2292K
<6>[    1.211188] Write protecting the kernel read-only data: 18432k
<6>[    1.212437] Freeing unused kernel image memory: 2028K
<6>[    1.213310] Freeing unused kernel image memory: 840K
<6>[    1.213904] rodata_test: all tests were successful
<6>[    1.214516] Run /sbin/init as init process
<6>[    1.239825] usb 7-1: New USB device found, idVendor=0763, idProduct=1011, bcdDevice= 1.21
<6>[    1.240419] usb 7-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
<6>[    1.412871] tsc: Refined TSC clocksource calibration: 4219.303 MHz
<6>[    1.413658] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x3cd19c0c3f5, max_idle_ns: 440795202126 ns
<6>[    1.414342] clocksource: Switched to clocksource tsc
<5>[    1.561167] audit: type=1404 audit(1564013745.305:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=selinux res=1
<5>[    1.604674] random: fast init done
<6>[    1.627185] SELinux:  policy capability network_peer_controls=0
<6>[    1.627868] SELinux:  policy capability open_perms=1
Oops#1 Part13
<6>[    1.628536] SELinux:  policy capability extended_socket_class=1
<6>[    1.629171] SELinux:  policy capability always_check_network=0
<6>[    1.629828] SELinux:  policy capability cgroup_seclabel=1
<6>[    1.630440] SELinux:  policy capability nnp_nosuid_transition=1
<6>[    1.638864] usb 3-3: new high-speed USB device number 3 using ehci-pci
<5>[    1.643997] audit: type=1403 audit(1564013745.388:3): auid=4294967295 ses=4294967295 lsm=selinux res=1
<6>[    1.769120] usb 3-3: New USB device found, idVendor=058f, idProduct=6254, bcdDevice= 1.00
<6>[    1.769794] usb 3-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
<6>[    1.771240] hub 3-3:1.0: USB hub found
<6>[    1.772422] hub 3-3:1.0: 4 ports detected
<6>[    2.017868] usb 7-4: new low-speed USB device number 3 using ohci-pci
<5>[    2.075368] random: crng init done
<6>[    2.181898] usb 7-4: New USB device found, idVendor=05b8, idProduct=3279, bcdDevice= 0.01
<6>[    2.182697] usb 7-4: New USB device strings: Mfr=0, Product=2, SerialNumber=0
<6>[    2.183347] usb 7-4: Product: Wired Keyboard
<6>[    2.190294] input: Wired Keyboard as /devices/pci0000:00/0000:00:16.0/usb7/7-4/7-4:1.0/0003:05B8:3279.0001/input/input2
<6>[    2.243248] hid-generic 0003:05B8:3279.0001: input,hidraw0: USB HID v1.11 Keyboard [Wired Keyboard] on usb-0000:00:16.0-4/input0
<6>[    2.249162] input: Wired Keyboard Consumer Control as /devices/pci0000:00/0000:00:16.0/usb7/7-4/7-4:1.1/0003:05B8:3279.0002/input/input3
<6>[    2.302089] input: Wired Keyboard System Control as /devices/pci0000:00/0000:00:16.0/usb7/7-4/7-4:1.1/0003:05B8:3279.0002/input/input4
<6>[    2.303797] hid-generic 0003:05B8:3279.0002: input,hidraw1: USB HID v1.11 Device [Wired Keyboard] on usb-0000:00:16.0-4/input1
Oops#1 Part12
<6>[    2.601802] usb 3-3.1: new full-speed USB device number 5 using ehci-pci
<6>[    2.697733] usb 3-3.1: New USB device found, idVendor=046d, idProduct=c52b, bcdDevice=12.01
<6>[    2.698509] usb 3-3.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
<6>[    2.699309] usb 3-3.1: Product: USB Receiver
<6>[    2.700092] usb 3-3.1: Manufacturer: Logitech
<6>[    2.702849] input: Logitech USB Receiver as /devices/pci0000:00/0000:00:16.2/usb3/3-3/3-3.1/3-3.1:1.0/0003:046D:C52B.0003/input/input5
<6>[    2.756138] hid-generic 0003:046D:C52B.0003: input,hidraw2: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-0000:00:16.2-3.1/input0
<6>[    2.760142] input: Logitech USB Receiver Mouse as /devices/pci0000:00/0000:00:16.2/usb3/3-3/3-3.1/3-3.1:1.1/0003:046D:C52B.0004/input/input6
<6>[    2.761859] input: Logitech USB Receiver Consumer Control as /devices/pci0000:00/0000:00:16.2/usb3/3-3/3-3.1/3-3.1:1.1/0003:046D:C52B.0004/input/input7
<6>[    2.815022] input: Logitech USB Receiver System Control as /devices/pci0000:00/0000:00:16.2/usb3/3-3/3-3.1/3-3.1:1.1/0003:046D:C52B.0004/input/input8
<6>[    2.817123] hid-generic 0003:046D:C52B.0004: input,hiddev96,hidraw3: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-0000:00:16.2-3.1/input1
<6>[    2.820802] hid-generic 0003:046D:C52B.0005: hiddev97,hidraw4: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:16.2-3.1/input2
<30>[    6.628763] udevd[652]: starting version 3.2.5
<4>[    7.100252] Failed to create system directory mdio
<5>[    7.100261] audit: type=1400 audit(1564013750.844:4): avc:  denied  { search } for  pid=742 comm="modprobe" name="events" dev="tracefs" ino=34 scontext=system_u:system_r:kmod_t tcontext=system_u:object_r:tracefs_t tclass=dir permissive=0
Oops#1 Part11
<4>[    7.214206] r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
<6>[    7.217716] libphy: r8169: probed
<6>[    7.217879] r8169 0000:03:00.0 eth0: RTL8168g/8111g, fc:aa:14:c9:ba:a5, XID 4c0, IRQ 32
<6>[    7.217881] r8169 0000:03:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
<6>[    7.330618] cryptd: max_cpu_qlen set to 1000
<6>[    7.670358] fuse: init (API version 7.31)
<30>[    7.714861] udevd[652]: starting eudev-3.2.5
<6>[    7.776379] it87: Found IT8620E chip at 0x228, revision 4
<6>[    7.776401] it87: Beeping is supported
<6>[    7.819883] acpi_cpufreq: overriding BIOS provided _PSD data
<6>[    7.884959] scsi host4: pata_atiixp
<6>[    7.885050] scsi host5: pata_atiixp
<6>[    7.885080] ata5: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
<6>[    7.885080] ata6: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
<6>[    7.931619] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00, revision 0
<6>[    7.931624] piix4_smbus 0000:00:14.0: Using register 0x2c for SMBus port selection
<6>[    7.931694] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
<6>[    8.025830] Emu10k1_gameport 0000:04:06.1: enabling device (0000 -> 0001)
<6>[    8.041464] gameport gameport0: EMU10K1 is pci0000:04:06.1/gameport0, io 0xc040, speed 793kHz
<6>[    8.064877] firewire_ohci 0000:04:06.2: enabling device (0000 -> 0002)
<5>[    8.117819] firewire_ohci 0000:04:06.2: added OHCI v1.10 device as card 0, 4 IR + 8 IT contexts, quirks 0x2
<6>[    8.288074] EFI Variables Facility v0.08 2004-May-17
Oops#1 Part10
<6>[    8.301205] xhci_hcd 0000:02:00.0: xHCI Host Controller
<6>[    8.301325] xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 8
<6>[    8.301394] xhci_hcd 0000:02:00.0: hcc params 0x002841eb hci version 0x100 quirks 0x0000000000000090
<6>[    8.301554] usb usb8: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.03
<6>[    8.301556] usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
<6>[    8.301557] usb usb8: Product: xHCI Host Controller
<6>[    8.301558] usb usb8: Manufacturer: Linux 5.3.0-rc1+ xhci-hcd
<6>[    8.301559] usb usb8: SerialNumber: 0000:02:00.0
<6>[    8.301649] hub 8-0:1.0: USB hub found
<6>[    8.301655] hub 8-0:1.0: 1 port detected
<6>[    8.301739] xhci_hcd 0000:02:00.0: xHCI Host Controller
<6>[    8.301780] xhci_hcd 0000:02:00.0: new USB bus registered, assigned bus number 9
<6>[    8.301783] xhci_hcd 0000:02:00.0: Host supports USB 3.0 SuperSpeed
<6>[    8.301802] usb usb9: We don't know the algorithms for LPM for this host, disabling LPM.
<6>[    8.301826] usb usb9: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.03
<6>[    8.301827] usb usb9: New USB device strings: Mfr=3, Product=2, SerialNumber=1
<6>[    8.301828] usb usb9: Product: xHCI Host Controller
<6>[    8.301829] usb usb9: Manufacturer: Linux 5.3.0-rc1+ xhci-hcd
<6>[    8.301830] usb usb9: SerialNumber: 0000:02:00.0
<6>[    8.301912] hub 9-0:1.0: USB hub found
<6>[    8.301919] hub 9-0:1.0: 4 ports detected
<6>[    8.429878] pstore: Using crash dump compression: deflate
<6>[    8.429890] pstore: Registered efi as persistent store backend
<6>[    8.441893] logitech-djreceiver 0003:046D:C52B.0005: hiddev96,hidraw2: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:16.2-3.1/input2
Oops#1 Part9
<6>[    8.518732] input: PC Speaker as /devices/platform/pcspkr/input/input10
<6>[    8.549045] input: Logitech Unifying Device. Wireless PID:1025 Mouse as /devices/pci0000:00/0000:00:16.2/usb3/3-3/3-3.1/3-3.1:1.2/0003:046D:C52B.0005/0003:046D:1025.0006/input/input11
<6>[    8.549179] hid-generic 0003:046D:1025.0006: input,hidraw3: USB HID v1.11 Mouse [Logitech Unifying Device. Wireless PID:1025] on usb-0000:00:16.2-3.1/input2:1
<6>[    8.610683] usbcore: registered new interface driver snd-usb-audio
<6>[    8.612598] AVX version of gcm_enc/dec engaged.
<6>[    8.612599] AES CTR mode by8 optimization enabled
<5>[    8.620860] firewire_core 0000:04:06.2: created device fw0: GUID 00023c015113a989, S400
<6>[    8.622787] usb 8-1: new high-speed USB device number 2 using xhci_hcd
<6>[    8.688596] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC892: line_outs=4 (0x14/0x15/0x16/0x17/0x0) type:line
<6>[    8.688599] snd_hda_codec_realtek hdaudioC1D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
<6>[    8.688602] snd_hda_codec_realtek hdaudioC1D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
<6>[    8.688603] snd_hda_codec_realtek hdaudioC1D0:    mono: mono_out=0x0
<6>[    8.688605] snd_hda_codec_realtek hdaudioC1D0:    dig-out=0x11/0x1e
<6>[    8.688607] snd_hda_codec_realtek hdaudioC1D0:    inputs:
<6>[    8.688609] snd_hda_codec_realtek hdaudioC1D0:      Front Mic=0x19
<6>[    8.688611] snd_hda_codec_realtek hdaudioC1D0:      Rear Mic=0x18
<6>[    8.688612] snd_hda_codec_realtek hdaudioC1D0:      Line=0x1a
<6>[    8.688801] snd_emu10k1 0000:04:06.0: enabling device (0000 -> 0001)
<6>[    8.694273] snd_emu10k1 0000:04:06.0: Installing spdif_bug patch: SB Audigy 2 ZS [SB0350]
<6>[    8.709178] input: HDA ATI SB Front Mic as /devices/pci0000:00/0000:00:14.2/sound/card1/input15
Oops#1 Part8
<6>[    8.709377] input: HDA ATI SB Rear Mic as /devices/pci0000:00/0000:00:14.2/sound/card1/input16
<6>[    8.709539] input: HDA ATI SB Line as /devices/pci0000:00/0000:00:14.2/sound/card1/input17
<6>[    8.709597] input: HDA ATI SB Line Out Front as /devices/pci0000:00/0000:00:14.2/sound/card1/input18
<6>[    8.709659] input: HDA ATI SB Line Out Surround as /devices/pci0000:00/0000:00:14.2/sound/card1/input19
<6>[    8.709723] input: HDA ATI SB Line Out CLFE as /devices/pci0000:00/0000:00:14.2/sound/card1/input20
<6>[    8.709766] input: HDA ATI SB Line Out Side as /devices/pci0000:00/0000:00:14.2/sound/card1/input21
<6>[    8.709818] input: HDA ATI SB Front Headphone as /devices/pci0000:00/0000:00:14.2/sound/card1/input22
<6>[    8.750916] usb 8-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.20
<6>[    8.750919] usb 8-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
<6>[    8.750920] usb 8-1: Product: USB2.0 Hub
<6>[    8.751587] hub 8-1:1.0: USB hub found
<6>[    8.751945] hub 8-1:1.0: 4 ports detected
<6>[    8.757505] r8169 0000:03:00.0 enp3s0: renamed from eth0
<6>[    8.799012] input: Logitech M510 as /devices/pci0000:00/0000:00:16.2/usb3/3-3/3-3.1/3-3.1:1.2/0003:046D:C52B.0005/0003:046D:1025.0006/input/input23
<6>[    8.799178] logitech-hidpp-device 0003:046D:1025.0006: input,hidraw3: USB HID v1.11 Mouse [Logitech M510] on usb-0000:00:16.2-3.1/input2:1
<6>[    9.028869] usb 8-1.1: new full-speed USB device number 3 using xhci_hcd
<6>[    9.113622] usb 8-1.1: New USB device found, idVendor=051d, idProduct=0002, bcdDevice= 0.90
<6>[    9.113626] usb 8-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
<6>[    9.113628] usb 8-1.1: Product: Back-UPS XS 1300G FW:864.L8 .D USB FW:L8 
Oops#1 Part7
<6>[    9.113630] usb 8-1.1: Manufacturer: American Power Conversion
<6>[    9.113632] usb 8-1.1: SerialNumber: 4B1519P43922  
<6>[    9.129850] hid-generic 0003:051D:0002.0007: hiddev97,hidraw4: USB HID v1.00 Device [American Power Conversion Back-UPS XS 1300G FW:864.L8 .D USB FW:L8 ] on usb-0000:02:00.0-1.1/input0
<6>[    9.193889] usb 8-1.2: new full-speed USB device number 4 using xhci_hcd
<6>[    9.525423] usb 8-1.2: New USB device found, idVendor=03f0, idProduct=1004, bcdDevice= 1.00
<6>[    9.525428] usb 8-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
<6>[    9.525430] usb 8-1.2: Product: DeskJet 970C
<6>[    9.525433] usb 8-1.2: Manufacturer: Hewlett-Packard
<6>[    9.525436] usb 8-1.2: SerialNumber: xxxxx
<6>[    9.593321] kvm: Nested Virtualization enabled
<6>[    9.593326] kvm: Nested Paging enabled
<6>[    9.635605] MCE: In-kernel MCE decoding enabled.
<6>[    9.668335] usblp 8-1.2:1.0: usblp0: USB Bidirectional printer dev 4 if 0 alt 1 proto 2 vid 0x03F0 pid 0x1004
<6>[    9.668403] usbcore: registered new interface driver usblp
<6>[    9.703745] EDAC amd64: Node 0: DRAM ECC disabled.
<6>[    9.703746] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
<6>[    9.703746]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
<6>[    9.703746]  (Note that use of the override may cause unknown side effects.)
<6>[    9.717245] EDAC amd64: Node 0: DRAM ECC disabled.
<6>[    9.717246] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
<6>[    9.717246]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
<6>[    9.717246]  (Note that use of the override may cause unknown side effects.)
Oops#1 Part6
<6>[    9.733583] EDAC amd64: Node 0: DRAM ECC disabled.
<6>[    9.733586] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
<6>[    9.733586]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
<6>[    9.733586]  (Note that use of the override may cause unknown side effects.)
<6>[    9.748661] EDAC amd64: Node 0: DRAM ECC disabled.
<6>[    9.748664] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
<6>[    9.748664]  Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
<6>[    9.748664]  (Note that use of the override may cause unknown side effects.)
<6>[   12.280195] EXT4-fs (sdb3): re-mounted. Opts: (null)
<6>[   12.880856] Adding 19433468k swap on /dev/sdb4.  Priority:-2 extents:1 across:19433468k FS
<7>[   16.821920] checking generic (f5000000 300000) vs hw (c0000000 10000000)
<7>[   16.821922] checking generic (f5000000 300000) vs hw (f4000000 2000000)
<6>[   16.821923] fb0: switching to nouveaufb from simple
<6>[   16.822023] Console: switching to colour dummy device 80x25
<6>[   16.822128] nouveau 0000:01:00.0: NVIDIA G92 (092a00a2)
<6>[   16.937255] nouveau 0000:01:00.0: bios: version 62.92.ad.00.00
<6>[   16.957682] nouveau 0000:01:00.0: fb: 1024 MiB GDDR3
<6>[   17.116860] [TTM] Zone  kernel: Available graphics memory: 4026756 KiB
<6>[   17.116861] [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
<6>[   17.116862] [TTM] Initializing pool allocator
<6>[   17.116866] [TTM] Initializing DMA pool allocator
<6>[   17.116877] nouveau 0000:01:00.0: DRM: VRAM: 1024 MiB
<6>[   17.116878] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
Oops#1 Part5
<6>[   17.116881] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
<6>[   17.116882] nouveau 0000:01:00.0: DRM: DCB version 4.0
<6>[   17.116883] nouveau 0000:01:00.0: DRM: DCB outp 00: 02000300 00000028
<6>[   17.116885] nouveau 0000:01:00.0: DRM: DCB outp 01: 01000302 00020030
<6>[   17.116886] nouveau 0000:01:00.0: DRM: DCB outp 02: 04011310 00000028
<6>[   17.116887] nouveau 0000:01:00.0: DRM: DCB outp 03: 02011312 00020030
<6>[   17.116888] nouveau 0000:01:00.0: DRM: DCB conn 00: 00001030
<6>[   17.116889] nouveau 0000:01:00.0: DRM: DCB conn 01: 00002130
<6>[   17.118677] nouveau 0000:01:00.0: DRM: MM: using CRYPT for buffer copies
<6>[   17.119113] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
<6>[   17.119114] [drm] Driver supports precise vblank timestamp query.
<6>[   17.159424] nouveau 0000:01:00.0: DRM: allocated 1920x1080 fb: 0x70000, bo 000000001c943f0c
<6>[   17.187563] fbcon: nouveaudrmfb (fb0) is primary device
<6>[   17.245738] Console: switching to colour frame buffer device 240x67
<6>[   17.247686] nouveau 0000:01:00.0: fb0: nouveaudrmfb frame buffer device
<6>[   17.256326] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 0
<38>[   17.919469] elogind-daemon[1465]: New seat seat0.
<38>[   17.920283] elogind-daemon[1465]: Watching system buttons on /dev/input/event1 (Power Button)
<38>[   17.920346] elogind-daemon[1465]: Watching system buttons on /dev/input/event0 (Power Button)
<38>[   17.920440] elogind-daemon[1465]: Watching system buttons on /dev/input/event2 (Wired Keyboard)
<38>[   17.920532] elogind-daemon[1465]: Watching system buttons on /dev/input/event3 (Wired Keyboard Consumer Control)
<38>[   17.920622] elogind-daemon[1465]: Watching system buttons on /dev/input/event4 (Wired Keyboard System Control)
Oops#1 Part4
<6>[   19.712581] Generic Realtek PHY r8169-300:00: attached PHY driver [Generic Realtek PHY] (mii_bus:phy_addr=r8169-300:00, irq=IGNORE)
<6>[   19.812678] r8169 0000:03:00.0 enp3s0: Link is Down
<6>[   21.486763] r8169 0000:03:00.0 enp3s0: Link is Up - 100Mbps/Full - flow control rx/tx
<6>[   21.486800] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
<38>[   27.078754] elogind-daemon[1465]: New session c1 of user sddm.
<38>[   37.171593] elogind-daemon[1465]: New session 1 of user xx.
<38>[   38.279554] elogind-daemon[1465]: Removed session c1.
<6>[   93.931664] logitech-hidpp-device 0003:046D:1025.0006: HID++ 1.0 device connected.
<6>[  231.948951] EXT4-fs (sda1): mounting ext3 file system using the ext4 subsystem
<6>[  232.166282] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
<6>[ 1118.875810] traps: plugin-containe[3573] general protection fault ip:7f482d094339 sp:7ffca48d7470 error:0 in ld-2.29.so[7f482d081000+1e000]
<38>[ 2990.080236] elogind-daemon[1465]: Suspending system...
<6>[ 2990.080247] PM: suspend entry (deep)
<6>[ 2990.174030] Filesystems sync: 0.093 seconds
<6>[ 2990.174196] Freezing user space processes ... (elapsed 0.001 seconds) done.
<6>[ 2990.175677] OOM killer disabled.
<6>[ 2990.175678] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<6>[ 2990.176928] printk: Suspending console(s) (use no_console_suspend to debug)
<6>[ 2990.177558] r8169 0000:03:00.0 enp3s0: Link is Down
<6>[ 2990.177833] serial 00:03: disabled
<5>[ 2990.201881] sd 1:0:0:0: [sda] Synchronizing SCSI cache
<5>[ 2990.201909] sd 2:0:0:0: [sdb] Synchronizing SCSI cache
<5>[ 2990.201926] sd 1:0:0:0: [sda] Stopping disk
<5>[ 2990.202628] sd 2:0:0:0: [sdb] Stopping disk
<6>[ 2994.344739] ACPI: Preparing to enter system sleep state S3
Oops#1 Part3
<5>[ 2994.344811] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
<6>[ 2994.345404] PM: Saving platform NVS memory
<6>[ 2994.345460] Disabling non-boot CPUs ...
<6>[ 2994.347565] smpboot: CPU 1 is now offline
<6>[ 2994.349075] smpboot: CPU 2 is now offline
<6>[ 2994.351136] smpboot: CPU 3 is now offline
<6>[ 2994.352639] ACPI: Low-level resume complete
<6>[ 2994.352665] PM: Restoring platform NVS memory
<6>[ 2994.393134] LVT offset 0 assigned for vector 0x400
<6>[ 2994.393374] Enabling non-boot CPUs ...
<6>[ 2994.393449] x86: Booting SMP configuration:
<6>[ 2994.393449] smpboot: Booting Node 0 Processor 1 APIC 0x11
<6>[ 2994.394160] microcode: CPU1: patch_level=0x06000822
<6>[ 2994.396574] CPU1 is up
<6>[ 2994.396634] smpboot: Booting Node 0 Processor 2 APIC 0x12
<6>[ 2994.397219] microcode: CPU2: patch_level=0x06000822
<6>[ 2994.399636] CPU2 is up
<6>[ 2994.399679] smpboot: Booting Node 0 Processor 3 APIC 0x13
<6>[ 2994.400386] microcode: CPU3: patch_level=0x06000822
<6>[ 2994.402796] CPU3 is up
<6>[ 2994.407603] ACPI: Waking up from system sleep state S3
<5>[ 2994.539409] usb usb8: root hub lost power or was reset
<5>[ 2994.539411] usb usb9: root hub lost power or was reset
<6>[ 2994.539984] serial 00:03: activated
<5>[ 2994.557061] sd 1:0:0:0: [sda] Starting disk
<5>[ 2994.557086] sd 2:0:0:0: [sdb] Starting disk
<6>[ 2994.619132] r8169 0000:03:00.0 enp3s0: Link is Down
<6>[ 2994.849077] ata4: SATA link down (SStatus 0 SControl 300)
<6>[ 2994.885335] usb 8-1: reset high-speed USB device number 2 using xhci_hcd
<6>[ 2995.003150] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
<6>[ 2995.003180] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<4>[ 2995.039287] ata3.00: NCQ Send/Recv Log not supported
<6>[ 2995.050422] ata1.00: configured for UDMA/100
Oops#1 Part2
<4>[ 2995.097769] ata3.00: NCQ Send/Recv Log not supported
<6>[ 2995.097772] ata3.00: configured for UDMA/133
<5>[ 2995.131187] firewire_core 0000:04:06.2: rediscovered device fw0
<1>[ 2995.227345] BUG: kernel NULL pointer dereference, address: 0000000000000030
<1>[ 2995.227346] #PF: supervisor read access in kernel mode
<1>[ 2995.227347] #PF: error_code(0x0000) - not-present page
<6>[ 2995.227348] PGD 0 P4D 0 
<4>[ 2995.227350] Oops: 0000 [#1] SMP NOPTI
<4>[ 2995.227352] CPU: 0 PID: 4302 Comm: kworker/u8:88 Not tainted 5.3.0-rc1+ #88
<4>[ 2995.227352] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./970A-D3P, BIOS FC 06/01/2015
<4>[ 2995.227356] Workqueue: events_unbound async_run_entry_fn
<4>[ 2995.227366] RIP: 0010:xhci_endpoint_reset+0x74/0x2e0 [xhci_hcd]
<4>[ 2995.227367] Code: 03 49 63 84 24 10 05 00 00 41 83 e7 03 4c 8b ac c3 f8 03 00 00 75 4f 41 0f b6 6e 02 83 e5 0f 8d 44 2d 00 48 69 d0 a8 00 00 00 <41> 8b 54 15 30 f6 c2 40 74 0f 48 69 c0 a8 00 00 00 83 e2 bf 41 89
<4>[ 2995.227368] RSP: 0018:ffffb3bc81ea7c38 EFLAGS: 00010246
<4>[ 2995.227369] RAX: 0000000000000000 RBX: ffff95bf31418000 RCX: 0000000000000000
<4>[ 2995.227369] RDX: 0000000000000000 RSI: ffff95bf3398f050 RDI: ffff95bf31418000
<4>[ 2995.227370] RBP: 0000000000000000 R08: ffffffff9be1c560 R09: 0000000000000000
<4>[ 2995.227370] R10: 00000002af68b8cd R11: 0000000000000000 R12: ffff95bf3398f000
<4>[ 2995.227371] R13: 0000000000000000 R14: ffff95bf3398f050 R15: 0000000000000000
<4>[ 2995.227372] FS:  0000000000000000(0000) GS:ffff95bf36a00000(0000) knlGS:0000000000000000
<4>[ 2995.227372] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 2995.227373] CR2: 0000000000000030 CR3: 0000000214212000 CR4: 00000000000406f0
<4>[ 2995.227373] Call Trace:
Oops#1 Part1
<4>[ 2995.227379]  usb_enable_endpoint+0xa5/0xb0
<4>[ 2995.227381]  usb_reset_and_verify_device+0x10d/0x740
<4>[ 2995.227383]  ? _raw_spin_unlock_irqrestore+0x16/0x30
<4>[ 2995.227384]  usb_port_resume+0x596/0x780
<4>[ 2995.227386]  usb_resume_both+0x91/0x130
<4>[ 2995.227387]  usb_resume+0x21/0x80
<4>[ 2995.227388]  ? usb_dev_thaw+0x10/0x10
<4>[ 2995.227390]  dpm_run_callback+0x65/0x190
<4>[ 2995.227391]  device_resume+0xac/0x1b0
<4>[ 2995.227393]  async_resume+0x19/0x40
<4>[ 2995.227394]  async_run_entry_fn+0x4a/0x180
<4>[ 2995.227395]  process_one_work+0x185/0x3a0
<4>[ 2995.227397]  worker_thread+0x30/0x3b0
<4>[ 2995.227398]  ? process_one_work+0x3a0/0x3a0
<4>[ 2995.227399]  kthread+0x113/0x130
<4>[ 2995.227399]  ? kthread_park+0xa0/0xa0
<4>[ 2995.227400]  ret_from_fork+0x27/0x50
<4>[ 2995.227402] Modules linked in: snd_seq_dummy snd_seq_oss snd_emu10k1_synth snd_emux_synth snd_seq_midi_emul snd_seq_virmidi snd_seq_midi snd_seq_midi_event snd_seq nouveau wmi i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops usblp ttm edac_mce_amd kvm_amd drm kvm hid_logitech_hidpp irqbypass snd_hda_codec_realtek snd_emu10k1 aesni_intel snd_usb_audio snd_hda_codec_generic snd_hda_intel aes_x86_64 snd_util_mem glue_helper pcspkr crypto_simd snd_ac97_codec snd_hda_codec ac97_bus snd_usbmidi_lib efi_pstore snd_rawmidi snd_hda_core hid_logitech_dj snd_seq_device snd_hwdep xhci_pci k10temp efivars snd_pcm fam15h_power xhci_hcd ata_generic firewire_ohci snd_timer emu10k1_gp firewire_core pata_acpi snd crc_itu_t i2c_piix4 soundcore gameport pata_atiixp i2c_core acpi_cpufreq it87 hwmon_vid fuse autofs4 ghash_clmulni_intel cryptd crc32c_intel r8169 realtek libphy configfs efivarfs
<4>[ 2995.227424] CR2: 0000000000000030
<4>[ 2995.227425] ---[ end trace 0af16ad166d3b33a ]---
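
One way to sanity-check the oops above is to recompute the faulting address from the register dump. The sketch below assumes that the marked bytes `<41> 8b 54 15 30` in the Code: line decode to `mov 0x30(%r13,%rdx,1),%edx`; that decoding is a reading of the dump, not something stated in the report, and the helper names are hypothetical. With R13 and RDX both zero, the effective address comes out as 0x30, which matches CR2 and is consistent with a field being read through a NULL structure pointer.

```python
import re

# Register lines copied from the oops above (log prefixes removed).
OOPS_REGISTERS = """\
RAX: 0000000000000000 RBX: ffff95bf31418000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff95bf3398f050 RDI: ffff95bf31418000
RBP: 0000000000000000 R08: ffffffff9be1c560 R09: 0000000000000000
R10: 00000002af68b8cd R11: 0000000000000000 R12: ffff95bf3398f000
R13: 0000000000000000 R14: ffff95bf3398f050 R15: 0000000000000000
CR2: 0000000000000030
"""


def parse_registers(text):
    """Return {register name: integer value} for 'NAME: 16-hex-digit' pairs."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(regs, base, index, scale, disp):
    """x86-64 effective address: base + index * scale + displacement."""
    return (regs[base] + regs[index] * scale + disp) & 0xFFFFFFFFFFFFFFFF


regs = parse_registers(OOPS_REGISTERS)

# Assumed operand of the faulting instruction: 0x30(%r13,%rdx,1)
fault = effective_address(regs, base="R13", index="RDX", scale=1, disp=0x30)

print(hex(fault), fault == regs["CR2"])  # prints: 0x30 True
```

A fault address this close to zero usually points at the base pointer rather than the offset, so R13 is the value worth tracing back in the source.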


* Re: Oops in xhci_endpoint_reset
  2019-07-27  3:15 Oops in xhci_endpoint_reset Bob Gleitsmann
@ 2019-07-27 10:59 ` Greg KH
  2019-07-27 15:05   ` Bob Gleitsmann
  2019-07-27 20:43   ` Bob Gleitsmann
  0 siblings, 2 replies; 11+ messages in thread
From: Greg KH @ 2019-07-27 10:59 UTC
  To: Bob Gleitsmann; +Cc: linux-usb

On Fri, Jul 26, 2019 at 11:15:46PM -0400, Bob Gleitsmann wrote:
> Hello,
> 
> 
> I have seen kernel oopses on waking from suspend to memory. I got this
> twice, one dmesg with backtrace attached. The other one had the failure
> in the same place in the code.
> 
> 
> This is kernel 5.3.0-rc1, patched for another problem in ethernet PHY
> driver. Have not had the problem with earlier kernels. Using Gentoo
> linux, amd64, but git kernel.

Any chance you can run 'git bisect' to track down the offending commit?

thanks,

greg k-h
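
For readers unfamiliar with the request above: bisection is a binary search over the commits between a known-good and a known-bad kernel. The sketch below illustrates the idea only; it is not the git implementation (real history is a graph, not a list), and the commit names and the `is_bad` predicate are hypothetical stand-ins for "build, boot, suspend, resume, check for the oops".

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is assumed good and
    commits[-1] is assumed bad. Returns (first bad commit, tests run).
    """
    lo, hi = 0, len(commits) - 1   # lo is always good, hi is always bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid               # the regression is at mid or earlier
        else:
            lo = mid               # the regression is after mid
    return commits[hi], tests


# Hypothetical history of 1000 commits with the regression at index 613.
history = [f"c{i:04d}" for i in range(1000)]
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 613)
print(culprit, tests)
```

Each test halves the remaining range, so about ten kernel builds are enough to cover a thousand commits.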


* Re: Oops in xhci_endpoint_reset
  2019-07-27 10:59 ` Greg KH
@ 2019-07-27 15:05   ` Bob Gleitsmann
  2019-07-27 20:43   ` Bob Gleitsmann
  1 sibling, 0 replies; 11+ messages in thread
From: Bob Gleitsmann @ 2019-07-27 15:05 UTC
  To: Greg KH; +Cc: linux-usb

I'm working on it.

On 7/27/19 6:59 AM, Greg KH wrote:
> On Fri, Jul 26, 2019 at 11:15:46PM -0400, Bob Gleitsmann wrote:
>> Hello,
>>
>>
>> I have seen kernel oopses on waking from suspend to memory. I got this
>> twice, one dmesg with backtrace attached. The other one had the failure
>> in the same place in the code.
>>
>>
>> This is kernel 5.3.0-rc1, patched for another problem in ethernet PHY
>> driver. Have not had the problem with earlier kernels. Using Gentoo
>> linux, amd64, but git kernel.
> Any chance you can run 'git bisect' to track down the offending commit?
>
> thanks,
>
> greg k-h
>


* Re: Oops in xhci_endpoint_reset
  2019-07-27 10:59 ` Greg KH
  2019-07-27 15:05   ` Bob Gleitsmann
@ 2019-07-27 20:43   ` Bob Gleitsmann
  2019-07-30 15:49     ` Enric Balletbo Serra
  2019-07-30 15:57     ` Mathias Nyman
  1 sibling, 2 replies; 11+ messages in thread
From: Bob Gleitsmann @ 2019-07-27 20:43 UTC
  To: Greg KH; +Cc: linux-usb

OK, here's the result of the bisection:

ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
commit ef513be0a9057cc6baf5d29566aaaefa214ba344
Author: Jim Lin <jilin@nvidia.com>
Date:   Mon Jun 3 18:53:44 2019 +0800

    usb: xhci: Add Clear_TT_Buffer

    USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
    processing for full-/low-speed endpoints connected via a TT, the host
    software must use the Clear_TT_Buffer request to the TT to ensure
    that the buffer is not in the busy state".

    In our case, a full-speed speaker (ConferenceCam) is behind a high-
    speed hub (ConferenceCam Connect), sometimes once we get STALL on a
    request we may continue to get STALL with the folllowing requests,
    like Set_Interface.

    Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
    request to the hub of the device for the following Set_Interface
    requests to the device to get ACK successfully.

    Signed-off-by: Jim Lin <jilin@nvidia.com>
    Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
 drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
 drivers/usb/host/xhci.h      |  5 +++++
 3 files changed, 52 insertions(+), 1 deletion(-)
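
For context on the request named in the commit message, here is a rough sketch of the 8-byte control setup packet for Clear_TT_Buffer. The field layout follows my reading of USB 2.0 section 11.24.2.3 (wValue carries endpoint number, device address, endpoint type and direction; wIndex carries the TT port) and should be checked against the specification. The function name and the example values are hypothetical; this is not the kernel's usb_hub_clear_tt_buffer().

```python
import struct

USB_RT_PORT = 0x23          # bmRequestType: class request, recipient "other"
HUB_CLEAR_TT_BUFFER = 8     # bRequest value for Clear_TT_Buffer


def clear_tt_buffer_setup(dev_addr, ep_num, ep_type, is_in, tt_port):
    """Build the setup packet (illustrative only).

    Assumed wValue layout:
      bits 3..0   endpoint number
      bits 10..4  device address
      bits 12..11 endpoint type (0 = control, 2 = bulk)
      bit  15     direction (1 = IN)
    """
    if not (0 <= dev_addr <= 127 and 0 <= ep_num <= 15 and 0 <= ep_type <= 3):
        raise ValueError("field out of range")
    w_value = ep_num | (dev_addr << 4) | (ep_type << 11) | (int(is_in) << 15)
    # Little-endian: bmRequestType, bRequest, wValue, wIndex, wLength (no data stage)
    return struct.pack("<BBHHH", USB_RT_PORT, HUB_CLEAR_TT_BUFFER,
                       w_value, tt_port, 0)


# Made-up example: control endpoint 0 of device address 5 behind TT port 1.
packet = clear_tt_buffer_setup(dev_addr=5, ep_num=0, ep_type=0,
                               is_in=False, tt_port=1)
print(packet.hex())  # prints: 2308500001000000
```

The request is addressed to the hub that contains the TT, not to the full-/low-speed device itself.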


On 7/27/19 6:59 AM, Greg KH wrote:
> On Fri, Jul 26, 2019 at 11:15:46PM -0400, Bob Gleitsmann wrote:
>> Hello,
>>
>>
>> I have seen kernel oopses on waking from suspend to memory. I got this
>> twice, one dmesg with backtrace attached. The other one had the failure
>> in the same place in the code.
>>
>>
>> This is kernel 5.3.0-rc1, patched for another problem in ethernet PHY
>> driver. Have not had the problem with earlier kernels. Using Gentoo
>> linux, amd64, but git kernel.
> Any chance you can run 'git bisect' to track down the offending commit?
>
> thanks,
>
> greg k-h
>


* Re: Oops in xhci_endpoint_reset
  2019-07-27 20:43   ` Bob Gleitsmann
@ 2019-07-30 15:49     ` Enric Balletbo Serra
  2019-07-30 16:28       ` Mathias Nyman
  2019-07-30 15:57     ` Mathias Nyman
  1 sibling, 1 reply; 11+ messages in thread
From: Enric Balletbo Serra @ 2019-07-30 15:49 UTC
  To: Bob Gleitsmann; +Cc: Greg KH, linux-usb

Hi,

On Sat, Jul 27, 2019 at 23:39, Bob Gleitsmann <rjgleits@bellsouth.net>
wrote:
>
> OK, here's the result of the bisection:
>
> ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
> commit ef513be0a9057cc6baf5d29566aaaefa214ba344
> Author: Jim Lin <jilin@nvidia.com>
> Date:   Mon Jun 3 18:53:44 2019 +0800
>
>     usb: xhci: Add Clear_TT_Buffer

I can confirm that I get the same oops on a Samsung Chromebook
Plus (rk3399) and that reverting the above commit fixes the issue.

In case it helps, a decoded stack trace is below. I need to gain some
USB knowledge before I can deal with this myself; others probably have
a better idea of what is happening.
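
The `ESR = 0x96000004` value in the trace that follows can be unpacked by hand. The sketch below assumes the ARMv8-A ESR_ELx layout as I understand it (EC in bits 31:26, IL in bit 25, ISS in bits 24:0; for data aborts, WnR in ISS bit 6 and the fault status code in ISS bits 5:0). The function name is hypothetical and the field positions should be checked against the Arm architecture manual.

```python
def decode_esr(esr):
    """Split an arm64 ESR value into a few fields (data-abort view)."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "ISS": iss,
        "WnR": (iss >> 6) & 0x1,       # 0 = read, 1 = write
        "DFSC": iss & 0x3F,            # data fault status code
    }


fields = decode_esr(0x96000004)
print({name: hex(value) for name, value in fields.items()})
```

Under those assumptions EC is 0x25 (data abort without a change of exception level), WnR is 0 (a read) and DFSC is 0x04 (level 0 translation fault), which agrees with the "DABT (current EL)", "WnR = 0" and "pgd=0000000000000000" lines in the trace.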

[   75.613254] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000030
[   75.623102] Mem abort info:
[   75.626224]   ESR = 0x96000004
[   75.629636]   Exception class = DABT (current EL), IL = 32 bits
[   75.636252]   SET = 0, FnV = 0
[   75.639662]   EA = 0, S1PTW = 0
[   75.643164] Data abort info:
[   75.646381]   ISV = 0, ISS = 0x00000004
[   75.650667]   CM = 0, WnR = 0
[   75.653981] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e359e000
[   75.661181] [0000000000000030] pgd=0000000000000000
[   75.666633] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   75.672856] Modules linked in: btusb btrtl ...
[   75.751693] CPU: 4 PID: 916 Comm: systemd-sleep Not tainted 5.3.0-rc2+ #103
[   75.759470] Hardware name: Google Kevin (DT)
[   75.764237] pstate: 40000005 (nZcv daif -PAN -UAO)
[   75.769594] pc : xhci_endpoint_reset (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3096)
[   75.774741] lr : xhci_endpoint_reset (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.h:1913 /home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3087)
[   75.779797] sp : ffff000011b6b930
[   75.783494] x29: ffff000011b6b930 x28: 00000000ffffff95
[   75.789426] x27: ffff8000ef657e00 x26: 0000000000000000
[   75.795358] x25: ffff8000efafeb80 x24: 0000000000000000
[   75.801289] x23: ffff8000efa4a250 x22: 0000000000000001
[   75.807212] x21: ffff8000efafe800 x20: ffff8000efa4a000
[   75.813143] x19: ffff8000efafe850 x18: 0000000000000000
[   75.819074] x17: 0000000000000000 x16: 0000000000000000
[   75.824997] x15: 0000000000000000 x14: 0000000000000000
[   75.830920] x13: ffff8000ef5ff180 x12: 0000000034d4d91d
[   75.836851] x11: 0000000000000000 x10: 0000000000000990
[   75.842773] x9 : ffff8000efa3d000 x8 : 0000000000000004
[   75.848695] x7 : ffff8000f55b8340 x6 : ffff8000ef65e700
[   75.854618] x5 : ffff8000efe844c0 x4 : 0000000000000000
[   75.860549] x3 : 0000000000000000 x2 : 0000000000000000
[   75.866471] x1 : 0000000000000000 x0 : 0000000000000000
[   75.872394] Call trace:
[   75.875122] xhci_endpoint_reset (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3096)
[   75.879889] usb_hcd_reset_endpoint (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hcd.c:2090)
[   75.884753] usb_enable_endpoint (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/message.c:1294)
[   75.889324] usb_ep0_reinit (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:4423)
[   75.893402] usb_reset_and_verify_device (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:5716)
[   75.898848] usb_port_resume (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:3379 /home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/hub.c:3579)
[   75.903217] generic_resume (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/generic.c:277)
[   75.907304] usb_resume_both (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/driver.c:1182 /home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/driver.c:1406)
[   75.911584] usb_resume (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/driver.c:1501)
[   75.915281] usb_dev_resume (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/core/usb.c:471)
[   75.919361] dpm_run_callback.isra.6 (/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:458)
[   75.924322] device_resume (/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:999)
[   75.928408] dpm_resume (/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:1055)
[   75.932203] dpm_resume_end (/home/eballetbo/Projects/chromebooks/kernel/drivers/base/power/main.c:1171)

Thanks,
~ Enric

>
>     USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
>     processing for full-/low-speed endpoints connected via a TT, the host
>     software must use the Clear_TT_Buffer request to the TT to ensure
>     that the buffer is not in the busy state".
>
>     In our case, a full-speed speaker (ConferenceCam) is behind a high-
>     speed hub (ConferenceCam Connect), sometimes once we get STALL on a
>     request we may continue to get STALL with the following requests,
>     like Set_Interface.
>
>     Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
>     request to the hub of the device for the following Set_Interface
>     requests to the device to get ACK successfully.
>
>     Signed-off-by: Jim Lin <jilin@nvidia.com>
>     Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
>  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
>  drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
>  drivers/usb/host/xhci.h      |  5 +++++
>  3 files changed, 52 insertions(+), 1 deletion(-)
>
>
> On 7/27/19 6:59 AM, Greg KH wrote:
> > On Fri, Jul 26, 2019 at 11:15:46PM -0400, Bob Gleitsmann wrote:
> >> Hello,
> >>
> >>
> >> I have seen kernel oopses on waking from suspend to memory. I got this
> >> twice, one dmesg with backtrace attached. The other one had the failure
> >> in the same place in the code.
> >>
> >>
> >> This is kernel 5.3.0-rc1, patched for another problem in ethernet PHY
> >> driver. Have not had the problem with earlier kernels. Using Gentoo
> >> linux, amd64, but git kernel.
> > Any chance you can run 'git bisect' to track down the offending commit?
> >
> > thanks,
> >
> > greg k-h
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Oops in xhci_endpoint_reset
  2019-07-27 20:43   ` Bob Gleitsmann
  2019-07-30 15:49     ` Enric Balletbo Serra
@ 2019-07-30 15:57     ` Mathias Nyman
  2019-07-31  9:18       ` Enric Balletbo Serra
  1 sibling, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2019-07-30 15:57 UTC (permalink / raw)
  To: Bob Gleitsmann, Greg KH; +Cc: linux-usb

On 27.7.2019 23.43, Bob Gleitsmann wrote:
> OK, here's the result of the bisection:
> 
> ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
> commit ef513be0a9057cc6baf5d29566aaaefa214ba344
> Author: Jim Lin <jilin@nvidia.com>
> Date:   Mon Jun 3 18:53:44 2019 +0800
>
>     usb: xhci: Add Clear_TT_Buffer
>
>     USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
>     processing for full-/low-speed endpoints connected via a TT, the host
>     software must use the Clear_TT_Buffer request to the TT to ensure
>     that the buffer is not in the busy state".
>
>     In our case, a full-speed speaker (ConferenceCam) is behind a high-
>     speed hub (ConferenceCam Connect), sometimes once we get STALL on a
>     request we may continue to get STALL with the following requests,
>     like Set_Interface.
>
>     Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
>     request to the hub of the device for the following Set_Interface
>     requests to the device to get ACK successfully.
>
>     Signed-off-by: Jim Lin <jilin@nvidia.com>
>     Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
>  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
>  drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
>  drivers/usb/host/xhci.h      |  5 +++++
>  3 files changed, 52 insertions(+), 1 deletion(-)
> 
> 

Thanks. A quick look doesn't immediately reveal the cause to me.
Most likely an endpoint or struct usb_device got dropped and freed at suspend/resume,
but a TD or URB probably still holds a stale pointer to it.

Could you apply the hack below? It should show more details about this issue.

Grep dmesg for "Mathias" after resume; if you find it, we just prevented a crash.

Adding more xhci debugging and tracing would also help:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< suspend/resume >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace

8<---

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9741cde..98a515c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1809,14 +1809,33 @@ struct xhci_segment *trb_in_td(struct xhci_hcd *xhci,
  static void xhci_clear_hub_tt_buffer(struct xhci_hcd *xhci, struct xhci_td *td,
                 struct xhci_virt_ep *ep)
  {
+       struct usb_device *udev;
+
         /*
          * As part of low/full-speed endpoint-halt processing
          * we must clear the TT buffer (USB 2.0 specification 11.17.5).
          */
+
         if (td->urb->dev->tt && !usb_pipeint(td->urb->pipe) &&
             (td->urb->dev->tt->hub != xhci_to_hcd(xhci)->self.root_hub) &&
             !(ep->ep_state & EP_CLEARING_TT)) {
+               udev = td->urb->dev;
+               if (!udev) {
+                       xhci_err(xhci, "Mathias: missing udev\n");
+                       return;
+               }
+               if (!udev->slot_id)  {
+                       xhci_err(xhci, "Mathias: missing udev->slot_id\n");
+                       return;
+               }
+
+               if (!xhci->devs[udev->slot_id])  {
+                       xhci_err(xhci, "Mathias: missing xhci->devs[udev->slot_id]\n");
+                       return;
+               }
                 ep->ep_state |= EP_CLEARING_TT;
+               xhci_err(xhci, "urb->ep->hcpriv %p,  urb->hcpriv %p\n",
+                        td->urb->ep->hcpriv, td->urb->dev);
                 td->urb->ep->hcpriv = td->urb->dev;
                 if (usb_hub_clear_tt_buffer(td->urb))
                         ep->ep_state &= ~EP_CLEARING_TT;
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 248cd7a..d7978e0 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3090,8 +3090,19 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
         udev = (struct usb_device *) host_ep->hcpriv;
         vdev = xhci->devs[udev->slot_id];
         ep_index = xhci_get_endpoint_index(&host_ep->desc);
+
+       if (!vdev) {
+               xhci_warn(xhci, "Mathias: No vdev for slot id %d\n", udev->slot_id);
+               return;
+       }
         ep = &vdev->eps[ep_index];
  
+       if (!ep) {
+               xhci_warn(xhci, "Mathias: No ep for slot %d ep_index %d\n",
+                         udev->slot_id, ep_index);
+               return;
+       }
+
         /* Bail out if toggle is already being cleared by a endpoint reset */
         if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
                 ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE;


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Oops in xhci_endpoint_reset
  2019-07-30 15:49     ` Enric Balletbo Serra
@ 2019-07-30 16:28       ` Mathias Nyman
  0 siblings, 0 replies; 11+ messages in thread
From: Mathias Nyman @ 2019-07-30 16:28 UTC (permalink / raw)
  To: Enric Balletbo Serra, Bob Gleitsmann; +Cc: Greg KH, linux-usb

On 30.7.2019 18.49, Enric Balletbo Serra wrote:
> Hi,
> 
> Message from Bob Gleitsmann <rjgleits@bellsouth.net> on Sat, 27 Jul
> 2019 at 23:39:
>>
>> OK, here's the result of the bisection:
>>
>> ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
>> commit ef513be0a9057cc6baf5d29566aaaefa214ba344
>> Author: Jim Lin <jilin@nvidia.com>
>> Date:   Mon Jun 3 18:53:44 2019 +0800
>>
>>     usb: xhci: Add Clear_TT_Buffer
> 
> I want to confirm that I get the same oops on a Samsung Chromebook
> Plus (rk3399) and that reverting the above commit fixes the issue.
> 
> If it helps there is a decoded stacktrace below (I need to gain some
> usb knowledge to deal with this), probably others can have a better
> idea on what is happening.
> 
> [   75.613254] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030

> [   75.769594] pc : xhci_endpoint_reset (/home/eballetbo/Projects/chromebooks/kernel/drivers/usb/host/xhci.c:3096)

Thanks. My guess is that host_ep->hcpriv used to be cleared after an endpoint was dropped,
which would normally cause xhci_endpoint_reset() to return early.

3074	static void xhci_endpoint_reset(struct usb_hcd *hcd,
3075			struct usb_host_endpoint *host_ep)
3076	{
3077		struct xhci_hcd *xhci;
3078		struct usb_device *udev;
3079		struct xhci_virt_device *vdev;
3080		struct xhci_virt_ep *ep;
3081		struct xhci_input_control_ctx *ctrl_ctx;
3082		struct xhci_command *stop_cmd, *cfg_cmd;
3083		unsigned int ep_index;
3084		unsigned long flags;
3085		u32 ep_flag;
3086	
3087		xhci = hcd_to_xhci(hcd);
3088		if (!host_ep->hcpriv)
3089			return;
3090		udev = (struct usb_device *) host_ep->hcpriv;
3091		vdev = xhci->devs[udev->slot_id];
3092		ep_index = xhci_get_endpoint_index(&host_ep->desc);
3093		ep = &vdev->eps[ep_index];
3094	
3095		/* Bail out if toggle is already being cleared by a endpoint reset */
3096		if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {

Commit ef513be ("usb: xhci: Add Clear_TT_Buffer") sets hcpriv again when handling a halted endpoint behind a TT hub.
If the event for the stalled endpoint is handled late, it's possible that we write a stale value to ep->hcpriv,
which should just stay cleared.

+static void xhci_clear_hub_tt_buffer(struct xhci_hcd *xhci, struct xhci_td *td,
+               struct xhci_virt_ep *ep)
+{
+       /*
+        * As part of low/full-speed endpoint-halt processing
+        * we must clear the TT buffer (USB 2.0 specification 11.17.5).
+        */
+       if (td->urb->dev->tt && !usb_pipeint(td->urb->pipe) &&
+           (td->urb->dev->tt->hub != xhci_to_hcd(xhci)->self.root_hub) &&
+           !(ep->ep_state & EP_CLEARING_TT)) {
+               ep->ep_state |= EP_CLEARING_TT;
+               td->urb->ep->hcpriv = td->urb->dev;
+               if (usb_hub_clear_tt_buffer(td->urb))
+                       ep->ep_state &= ~EP_CLEARING_TT;
+       }
+}
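
The suspected sequence can be sketched in user space. This is only a model of the
guess above, not kernel code: the tt_* types and functions are hypothetical stand-ins,
and only the hcpriv field name is taken from the code shown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel structures. */
struct tt_udev {
	int slot_id;
};

struct tt_host_ep {
	void *hcpriv;	/* back-pointer tested by the reset callback */
};

/* Endpoint dropped: the back-pointer is cleared. */
void tt_drop_endpoint(struct tt_host_ep *ep)
{
	ep->hcpriv = NULL;
}

/* Late halt handling: mirrors "td->urb->ep->hcpriv = td->urb->dev". */
void tt_late_stall_event(struct tt_host_ep *ep, struct tt_udev *udev)
{
	ep->hcpriv = udev;
}

/* True when the reset callback would take its "if (!host_ep->hcpriv)" exit. */
bool tt_reset_returns_early(const struct tt_host_ep *ep)
{
	return ep->hcpriv == NULL;
}
```

In this model, calling tt_drop_endpoint() and then tt_late_stall_event() leaves hcpriv
non-NULL again, so the early return is skipped even though the device state behind the
pointer may already be gone.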

This is still just a guess.
Does the code below fix your issue?

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 248cd7a..a0984aa 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3092,6 +3092,10 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
         ep_index = xhci_get_endpoint_index(&host_ep->desc);
         ep = &vdev->eps[ep_index];
  
+       if (!ep) {
+               xhci_err(xhci, "Mathias: No ep for endpoint reset, bail out\n");
+               return;
+       }
         /* Bail out if toggle is already being cleared by a endpoint reset */
         if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
                 ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE;


Logs and traces would also show the root cause better:

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< suspend/resume >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace

-Mathias




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Oops in xhci_endpoint_reset
  2019-07-30 15:57     ` Mathias Nyman
@ 2019-07-31  9:18       ` Enric Balletbo Serra
  2019-07-31 14:18         ` Mathias Nyman
  0 siblings, 1 reply; 11+ messages in thread
From: Enric Balletbo Serra @ 2019-07-31  9:18 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Bob Gleitsmann, Greg KH, linux-usb

Hi Mathias,

Thanks for looking into this.

Message from Mathias Nyman <mathias.nyman@linux.intel.com> on Tue,
30 Jul 2019 at 21:39:
>
> On 27.7.2019 23.43, Bob Gleitsmann wrote:
> > OK, here's the result of the bisection:
> >
> > ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
> > commit ef513be0a9057cc6baf5d29566aaaefa214ba344
> > Author: Jim Lin <jilin@nvidia.com>
> > Date:   Mon Jun 3 18:53:44 2019 +0800
> >
> >     usb: xhci: Add Clear_TT_Buffer
> >
> >     USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
> >     processing for full-/low-speed endpoints connected via a TT, the host
> >     software must use the Clear_TT_Buffer request to the TT to ensure
> >     that the buffer is not in the busy state".
> >
> >     In our case, a full-speed speaker (ConferenceCam) is behind a high-
> >     speed hub (ConferenceCam Connect), sometimes once we get STALL on a
> >     request we may continue to get STALL with the following requests,
> >     like Set_Interface.
> >
> >     Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
> >     request to the hub of the device for the following Set_Interface
> >     requests to the device to get ACK successfully.
> >
> >     Signed-off-by: Jim Lin <jilin@nvidia.com>
> >     Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >
> >  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
> >  drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
> >  drivers/usb/host/xhci.h      |  5 +++++
> >  3 files changed, 52 insertions(+), 1 deletion(-)
> >
> >
>
> Thanks. A quick look doesn't immediately reveal the cause to me.
> Most likely an endpoint or struct usb_device got dropped and freed at suspend/resume,
> but a TD or URB probably still holds a stale pointer to it.
>
> Could you apply the hack below? It should show more details about this issue.
>
> Grep dmesg for "Mathias" after resume; if you find it, we just prevented a crash.
>

With the patch below the oops disappears, and the reason is:

root@debian:~# dmesg | grep "Mathias"
[   67.747933] xhci-hcd xhci-hcd.8.auto: Mathias: No vdev for slot id 0
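
The fault address 0x30 in the earlier oops is consistent with a NULL vdev. A rough
user-space sketch of the arithmetic follows. The field order in the off_* structs is
an assumption (three 64-bit pointers ahead of eps[], and three ahead of ep_state); it
is not copied from xhci.h and should be checked there. Endpoint index 0 is assumed as
well, because the call trace goes through usb_ep0_reinit.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout only: uint64_t fields stand in for 64-bit pointers. */
struct off_virt_ep {
	uint64_t ring;
	uint64_t stream_info;
	uint64_t new_ring;
	unsigned int ep_state;
};

struct off_virt_device {
	uint64_t udev;
	uint64_t out_ctx;
	uint64_t in_ctx;
	struct off_virt_ep eps[31];
};

/*
 * Byte offset of eps[ep_index].ep_state from the start of the device
 * structure, i.e. the address that is read when the base pointer is NULL.
 */
size_t off_ep_state_offset(unsigned int ep_index)
{
	return offsetof(struct off_virt_device, eps)
	     + (size_t)ep_index * sizeof(struct off_virt_ep)
	     + offsetof(struct off_virt_ep, ep_state);
}
```

Under those assumptions off_ep_state_offset(0) evaluates to 0x30. It also means that
&vdev->eps[0] computed from a NULL vdev is a small non-NULL address, so a later !ep
test cannot catch the problem; only a check on vdev itself can.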


> also adding more xhci debugging and tracing would help:
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < suspend/resume >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
>

Unfortunately, when the oops happens the machine is unresponsive :-(

Thanks,
~ Enric


> 8<---
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 9741cde..98a515c 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -1809,14 +1809,33 @@ struct xhci_segment *trb_in_td(struct xhci_hcd *xhci,
>   static void xhci_clear_hub_tt_buffer(struct xhci_hcd *xhci, struct xhci_td *td,
>                  struct xhci_virt_ep *ep)
>   {
> +       struct usb_device *udev;
> +
>          /*
>           * As part of low/full-speed endpoint-halt processing
>           * we must clear the TT buffer (USB 2.0 specification 11.17.5).
>           */
> +
>          if (td->urb->dev->tt && !usb_pipeint(td->urb->pipe) &&
>              (td->urb->dev->tt->hub != xhci_to_hcd(xhci)->self.root_hub) &&
>              !(ep->ep_state & EP_CLEARING_TT)) {
> +               udev = td->urb->dev;
> +               if (!udev) {
> +                       xhci_err(xhci, "Mathias: missing udev\n");
> +                       return;
> +               }
> +               if (!udev->slot_id)  {
> +                       xhci_err(xhci, "Mathias: missing udev->slot_id\n");
> +                       return;
> +               }
> +
> +               if (!xhci->devs[udev->slot_id])  {
> +                       xhci_err(xhci, "Mathias: missing xhci->devs[udev->slot_id]\n");
> +                       return;
> +               }
>                  ep->ep_state |= EP_CLEARING_TT;
> +               xhci_err(xhci, "urb->ep->hcpriv %p,  urb->hcpriv %p\n",
> +                        td->urb->ep->hcpriv, td->urb->dev);
>                  td->urb->ep->hcpriv = td->urb->dev;
>                  if (usb_hub_clear_tt_buffer(td->urb))
>                          ep->ep_state &= ~EP_CLEARING_TT;
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 248cd7a..d7978e0 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -3090,8 +3090,19 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
>          udev = (struct usb_device *) host_ep->hcpriv;
>          vdev = xhci->devs[udev->slot_id];
>          ep_index = xhci_get_endpoint_index(&host_ep->desc);
> +
> +       if (!vdev) {
> +               xhci_warn(xhci, "Mathias: No vdev for slot id %d\n", udev->slot_id);
> +               return;
> +       }
>          ep = &vdev->eps[ep_index];
>
> +       if (!ep) {
> +               xhci_warn(xhci, "Mathias: No ep for slot %d ep_index %d\n",
> +                         udev->slot_id, ep_index);
> +               return;
> +       }
> +
>          /* Bail out if toggle is already being cleared by a endpoint reset */
>          if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
>                  ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE;
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Oops in xhci_endpoint_reset
  2019-07-31  9:18       ` Enric Balletbo Serra
@ 2019-07-31 14:18         ` Mathias Nyman
  2019-07-31 16:31           ` Enric Balletbo Serra
  0 siblings, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2019-07-31 14:18 UTC (permalink / raw)
  To: Enric Balletbo Serra; +Cc: Bob Gleitsmann, Greg KH, linux-usb

On 31.7.2019 12.18, Enric Balletbo Serra wrote:
> Hi Mathias,
> 
> Thanks for looking into this.
> 
> Message from Mathias Nyman <mathias.nyman@linux.intel.com> on Tue,
> 30 Jul 2019 at 21:39:
>>
>> On 27.7.2019 23.43, Bob Gleitsmann wrote:
>>> OK, here's the result of the bisection:
>>>
>>> ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
>>> commit ef513be0a9057cc6baf5d29566aaaefa214ba344
>>> Author: Jim Lin <jilin@nvidia.com>
>>> Date:   Mon Jun 3 18:53:44 2019 +0800
>>>
>>>     usb: xhci: Add Clear_TT_Buffer
>>>
>>>     USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
>>>     processing for full-/low-speed endpoints connected via a TT, the host
>>>     software must use the Clear_TT_Buffer request to the TT to ensure
>>>     that the buffer is not in the busy state".
>>>
>>>     In our case, a full-speed speaker (ConferenceCam) is behind a high-
>>>     speed hub (ConferenceCam Connect), sometimes once we get STALL on a
>>>     request we may continue to get STALL with the following requests,
>>>     like Set_Interface.
>>>
>>>     Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
>>>     request to the hub of the device for the following Set_Interface
>>>     requests to the device to get ACK successfully.
>>>
>>>     Signed-off-by: Jim Lin <jilin@nvidia.com>
>>>     Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>
>>>  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
>>>  drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
>>>  drivers/usb/host/xhci.h      |  5 +++++
>>>  3 files changed, 52 insertions(+), 1 deletion(-)
>>>
>>>
>>
>> Thanks. A quick look doesn't immediately reveal the cause to me.
>> Most likely an endpoint or struct usb_device got dropped and freed at suspend/resume,
>> but a TD or URB probably still holds a stale pointer to it.
>>
>> Could you apply the hack below? It should show more details about this issue.
>>
>> Grep dmesg for "Mathias" after resume; if you find it, we just prevented a crash.
>>
> 
> With the patch below the oops disappears, and the reason is:
> 
> root@debian:~# dmesg | grep "Mathias"
> [   67.747933] xhci-hcd xhci-hcd.8.auto: Mathias: No vdev for slot id 0
> 

OK, thanks.
When we free the xhci virt_dev, udev->slot_id is set to zero as well.
It looks like the whole xHCI was reset at resume:

[ 2994.539409] usb usb8: root hub lost power or was reset
[ 2994.539411] usb usb9: root hub lost power or was reset

This means that the xHC controller was reset and the xhci driver re-allocated everything.

It makes sense to check that the xhci virt_device exists in the endpoint reset callback.
This will fix the oops, but I'm still missing the big picture of how we ended up here.
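
A minimal user-space sketch of such a check follows. The rst_* types are simplified,
hypothetical stand-ins for the real structures, as are the result codes and the
bounds checks; it only illustrates the order of the checks and is not the eventual patch.

```c
#include <stddef.h>

#define RST_MAX_SLOTS 4		/* model-only sizes, chosen arbitrarily */
#define RST_NUM_EPS 31

struct rst_virt_ep {
	unsigned int ep_state;
};

struct rst_virt_device {
	struct rst_virt_ep eps[RST_NUM_EPS];
};

struct rst_udev {
	int slot_id;
};

struct rst_xhci {
	struct rst_virt_device *devs[RST_MAX_SLOTS];
};

struct rst_host_ep {
	void *hcpriv;
};

enum rst_result { RST_NO_HCPRIV, RST_NO_VDEV, RST_RESET_OK };

enum rst_result rst_endpoint_reset(struct rst_xhci *xhci,
				   struct rst_host_ep *host_ep,
				   unsigned int ep_index)
{
	struct rst_udev *udev;
	struct rst_virt_device *vdev;

	if (!host_ep->hcpriv)
		return RST_NO_HCPRIV;
	udev = host_ep->hcpriv;

	/* Model-only bounds checks so the sketch never indexes out of range. */
	if (udev->slot_id < 0 || udev->slot_id >= RST_MAX_SLOTS ||
	    ep_index >= RST_NUM_EPS)
		return RST_NO_VDEV;

	vdev = xhci->devs[udev->slot_id];
	/* The check discussed above: bail out before touching vdev->eps[]. */
	if (!vdev)
		return RST_NO_VDEV;

	(void)vdev->eps[ep_index].ep_state;	/* safe to read from here on */
	return RST_RESET_OK;
}
```

With slot_id 0 and an empty devs[] table, as in the "No vdev for slot id 0" message,
the model returns RST_NO_VDEV instead of dereferencing a NULL pointer.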

Would it be possible for you to take traces and logs with the previous patch, which prevents
the oops but shows the "Mathias: No vdev for slot id 0" message?

-Mathias

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Oops in xhci_endpoint_reset
  2019-07-31 14:18         ` Mathias Nyman
@ 2019-07-31 16:31           ` Enric Balletbo Serra
  2019-08-02 11:22             ` Mathias Nyman
  0 siblings, 1 reply; 11+ messages in thread
From: Enric Balletbo Serra @ 2019-07-31 16:31 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Bob Gleitsmann, Greg KH, linux-usb

Message from Mathias Nyman <mathias.nyman@linux.intel.com> on Wed,
31 Jul 2019 at 16:16:
>
> On 31.7.2019 12.18, Enric Balletbo Serra wrote:
> > Hi Mathias,
> >
> > Thanks for looking into this.
> >
> > Message from Mathias Nyman <mathias.nyman@linux.intel.com> on Tue,
> > 30 Jul 2019 at 21:39:
> >>
> >> On 27.7.2019 23.43, Bob Gleitsmann wrote:
> >>> OK, here's the result of the bisection:
> >>>
> >>> ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
> >>> commit ef513be0a9057cc6baf5d29566aaaefa214ba344
> >>> Author: Jim Lin <jilin@nvidia.com>
> >>> Date:   Mon Jun 3 18:53:44 2019 +0800
> >>>
> >>>     usb: xhci: Add Clear_TT_Buffer
> >>>
> >>>     USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
> >>>     processing for full-/low-speed endpoints connected via a TT, the host
> >>>     software must use the Clear_TT_Buffer request to the TT to ensure
> >>>     that the buffer is not in the busy state".
> >>>
> >>>     In our case, a full-speed speaker (ConferenceCam) is behind a high-
> >>>     speed hub (ConferenceCam Connect), sometimes once we get STALL on a
> >>>     request we may continue to get STALL with the following requests,
> >>>     like Set_Interface.
> >>>
> >>>     Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
> >>>     request to the hub of the device for the following Set_Interface
> >>>     requests to the device to get ACK successfully.
> >>>
> >>>     Signed-off-by: Jim Lin <jilin@nvidia.com>
> >>>     Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> >>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> >>>
> >>>  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
> >>>  drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
> >>>  drivers/usb/host/xhci.h      |  5 +++++
> >>>  3 files changed, 52 insertions(+), 1 deletion(-)
> >>>
> >>>
> >>
> >> Thanks. A quick look doesn't immediately reveal the cause to me.
> >> Most likely an endpoint or struct usb_device got dropped and freed at suspend/resume,
> >> but a TD or URB probably still holds a stale pointer to it.
> >>
> >> Could you apply the hack below? It should show more details about this issue.
> >>
> >> Grep dmesg for "Mathias" after resume; if you find it, we just prevented a crash.
> >>
> >
> > With the patch below the oops disappears, and the reason is:
> >
> > root@debian:~# dmesg | grep "Mathias"
> > [   67.747933] xhci-hcd xhci-hcd.8.auto: Mathias: No vdev for slot id 0
> >
>
> OK, thanks.
> When we free the xhci virt_dev, udev->slot_id is set to zero as well.
> It looks like the whole xHCI was reset at resume:
>
> [ 2994.539409] usb usb8: root hub lost power or was reset
> [ 2994.539411] usb usb9: root hub lost power or was reset
>
> This means that the xHC controller was reset and the xhci driver re-allocated everything.
>
> It makes sense to check that the xhci virt_device exists in the endpoint reset callback.
> This will fix the oops, but I'm still missing the big picture of how we ended up here.
>
> Would it be possible for you to take traces and logs with the previous patch, which prevents
> the oops but shows the "Mathias: No vdev for slot id 0" message?
>

Sure, here they are:

dmesg: https://paste.debian.net/1093737/
traces: https://drive.google.com/open?id=1So-_zsu8ROtMH08hYVKIAZfr_51QLUPD

Thanks,
~ Enric




> -Mathias


* Re: Oops in xhci_endpoint_reset
  2019-07-31 16:31           ` Enric Balletbo Serra
@ 2019-08-02 11:22             ` Mathias Nyman
  0 siblings, 0 replies; 11+ messages in thread
From: Mathias Nyman @ 2019-08-02 11:22 UTC (permalink / raw)
  To: Enric Balletbo Serra; +Cc: Bob Gleitsmann, Greg KH, linux-usb

On 31.7.2019 19.31, Enric Balletbo Serra wrote:
> Message from Mathias Nyman <mathias.nyman@linux.intel.com> on Wed,
> 31 Jul 2019 at 16:16:
>>
>> On 31.7.2019 12.18, Enric Balletbo Serra wrote:
>>> Hi Mathias,
>>>
>>> Thanks for looking into this.
>>>
>>> Message from Mathias Nyman <mathias.nyman@linux.intel.com> on Tue,
>>> 30 Jul 2019 at 21:39:
>>>>
>>>> On 27.7.2019 23.43, Bob Gleitsmann wrote:
>>>>> OK, here's the result of the bisection:
>>>>>
>>>>> ef513be0a9057cc6baf5d29566aaaefa214ba344 is the first bad commit
>>>>> commit ef513be0a9057cc6baf5d29566aaaefa214ba344
>>>>> Author: Jim Lin <jilin@nvidia.com>
>>>>> Date:   Mon Jun 3 18:53:44 2019 +0800
>>>>>
>>>>>     usb: xhci: Add Clear_TT_Buffer
>>>>>
>>>>>     USB 2.0 specification chapter 11.17.5 says "as part of endpoint halt
>>>>>     processing for full-/low-speed endpoints connected via a TT, the host
>>>>>     software must use the Clear_TT_Buffer request to the TT to ensure
>>>>>     that the buffer is not in the busy state".
>>>>>
>>>>>     In our case, a full-speed speaker (ConferenceCam) is behind a high-
>>>>>     speed hub (ConferenceCam Connect), sometimes once we get STALL on a
>>>>>     request we may continue to get STALL with the folllowing requests,
>>>>>     like Set_Interface.
>>>>>
>>>>>     Here we invoke usb_hub_clear_tt_buffer() to send Clear_TT_Buffer
>>>>>     request to the hub of the device for the following Set_Interface
>>>>>     requests to the device to get ACK successfully.
>>>>>
>>>>>     Signed-off-by: Jim Lin <jilin@nvidia.com>
>>>>>     Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>>>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>>>
>>>>>  drivers/usb/host/xhci-ring.c | 27 ++++++++++++++++++++++++++-
>>>>>  drivers/usb/host/xhci.c      | 21 +++++++++++++++++++++
>>>>>  drivers/usb/host/xhci.h      |  5 +++++
>>>>>  3 files changed, 52 insertions(+), 1 deletion(-)
>>>>>
>>>>>
>>>>
>>>> Thanks, a quick look doesn't immediately open up the cause to me.
>>>> Most likely an endpoint or struct usb_device got dropped and freed at suspend/resume,
>>>> but we probably still have some old stale pointer to it in a TD or URB.
>>>>
>>>> could you apply the hack below, it should show more details about this issue.
>>>>
>>>> grep for "Mathias" after resume, if you find it we just prevented a crash.
>>>>
>>>
>>> With the below patch the oops disappears and the reason is
>>>
>>> root@debian:~# dmesg | grep "Mathias"
>>> [   67.747933] xhci-hcd xhci-hcd.8.auto: Mathias: No vdev for slot id 0
>>>
>>
>> Ok, thanks,
>> When we free the xhci virt_dev the udev->slot_id is set to zero as well.
>> Looks like the whole xHCI was reset at resume:
>>
>> [ 2994.539409] usb usb8: root hub lost power or was reset
>> [ 2994.539411] usb usb9: root hub lost power or was reset
>>
>> This means that xHC controller was reset and xhci driver re-allocated everything.
>>
>> It makes sense to check that xhci virt_device exists in the endpoint reset callback.
>> This will fix the oops, but I'm still missing the big picture, how we ended up here.
>>
>> Would it be possible for you to take traces and logs with the previous patch  that prevents
>> the oops, but shows the "Mathias: No vdev for slot id 0" message?
>>
> 
> Sure, here they are:
> 
> dmesg: https://paste.debian.net/1093737/
> traces: https://drive.google.com/open?id=1So-_zsu8ROtMH08hYVKIAZfr_51QLUPD
> 

Thanks, now I understand what is happening.

The xhci host driver does nothing in xhci_endpoint_reset() unless the hcpriv pointer
in struct usb_host_endpoint points to a usb device:

static void xhci_endpoint_reset(struct usb_hcd *hcd, struct usb_host_endpoint *host_ep)
{       ...
         if (!host_ep->hcpriv)
                 return;
         udev = (struct usb_device *) host_ep->hcpriv;

host_ep->hcpriv is set in xhci_add_endpoint() when allocating the xhci parts of the endpoint.
But the default control endpoint ep0 is never added; it's allocated by default together
with the xhci slot, so host_ep->hcpriv for ep0 was always NULL until
commit "usb: xhci: Add Clear_TT_Buffer" changed that.

ep0 is special, and usb core will reset it before resetting the device.

usb_reset_and_verify_device()
{       ...
         /* ep0 maxpacket size may change; let the HCD know about it.
          * Other endpoints will be handled by re-enumeration. */
         usb_ep0_reinit(udev);
         ret = hub_port_init(parent_hub, udev, port1, i);

If xhci is reset at resume, all xhci slots are released, and slot_id in struct usb_device
is set to 0. At device reset the xhci driver notices slot_id doesn't point
to a valid xhci slot, so a new slot is enabled and allocated.

Endpoints other than ep0 are always reset after the device is reset, and by then there
is a valid xhci slot in place, so that's why this never triggered before.

So this is triggered if a full-speed or low-speed device behind a high-speed hub
stalled, which would set host_ep->hcpriv (patch "usb: xhci: Add Clear_TT_Buffer"),
followed by a resume that requires an xhci host reset, losing all slots and
making host_ep->hcpriv->slot_id == 0.

So fixing this by checking that slot_id and udev are valid in xhci_endpoint_reset() should be ok,
but after that a better look at how we use host_ep->hcpriv wouldn't hurt.
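
To make the sequence above concrete, here is a rough userspace model of the
check, not the actual kernel patch. The struct layouts, MAX_SLOTS, the
reset_result enum and the names endpoint_reset_checked() and
host_reset_at_resume() are all made up for illustration; only hcpriv, slot_id
and the per-slot virt device array mirror what is discussed in this thread.

```c
/* Userspace model only: simplified stand-ins, not the kernel's definitions. */
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4 /* hypothetical; real hardware reports its own limit */

struct usb_device { int slot_id; };             /* slot_id 0 means "no slot" */
struct usb_host_endpoint { void *hcpriv; };     /* NULL or a usb_device */
struct xhci_virt_device { int placeholder; };
struct xhci_model { struct xhci_virt_device *devs[MAX_SLOTS + 1]; };

enum reset_result {
	RESET_SKIPPED_NO_HCPRIV, /* the check that already existed */
	RESET_SKIPPED_NO_SLOT,   /* assumed extra check: slot_id not valid */
	RESET_SKIPPED_NO_VDEV,   /* assumed extra check: no virt device for slot */
	RESET_DONE
};

/* Model of an endpoint reset callback that validates before dereferencing. */
enum reset_result endpoint_reset_checked(struct xhci_model *xhci,
					 struct usb_host_endpoint *host_ep)
{
	struct usb_device *udev;

	assert(xhci != NULL && host_ep != NULL);

	if (!host_ep->hcpriv)
		return RESET_SKIPPED_NO_HCPRIV;
	udev = host_ep->hcpriv;

	/* After a host reset slot_id is 0, so stop before indexing devs[]. */
	if (udev->slot_id <= 0 || udev->slot_id > MAX_SLOTS)
		return RESET_SKIPPED_NO_SLOT;
	if (!xhci->devs[udev->slot_id])
		return RESET_SKIPPED_NO_VDEV;

	return RESET_DONE; /* the real reset work would go here */
}

/* Model of a host reset at resume: slots are released, slot_id is cleared. */
void host_reset_at_resume(struct xhci_model *xhci, struct usb_device *udev)
{
	for (int i = 0; i <= MAX_SLOTS; i++)
		xhci->devs[i] = NULL;
	udev->slot_id = 0;
}
```

In this model, an ep0 whose hcpriv was set by a stall, followed by
host_reset_at_resume(), returns RESET_SKIPPED_NO_SLOT instead of touching a
virt device that no longer exists.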

-Mathias


end of thread, other threads:[~2019-08-02 11:21 UTC | newest]

Thread overview: 11+ messages
2019-07-27  3:15 Oops in xhci_endpoint_reset Bob Gleitsmann
2019-07-27 10:59 ` Greg KH
2019-07-27 15:05   ` Bob Gleitsmann
2019-07-27 20:43   ` Bob Gleitsmann
2019-07-30 15:49     ` Enric Balletbo Serra
2019-07-30 16:28       ` Mathias Nyman
2019-07-30 15:57     ` Mathias Nyman
2019-07-31  9:18       ` Enric Balletbo Serra
2019-07-31 14:18         ` Mathias Nyman
2019-07-31 16:31           ` Enric Balletbo Serra
2019-08-02 11:22             ` Mathias Nyman
