linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrzej Pietrasiewicz <andrzej.p@collabora.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>
Cc: "kernel@collabora.com" <kernel@collabora.com>
Subject: Re: xhci problem -> general protection fault
Date: Fri, 18 Sep 2020 16:20:41 +0200	[thread overview]
Message-ID: <8230c2a2-719c-ef81-e85d-5921bf8e98e6@collabora.com> (raw)
In-Reply-To: <a6364bd9-58d9-e66e-5595-7d887a8f3fc9@linux.intel.com>

Hi Mathias,

W dniu 18.09.2020 o 12:50, Mathias Nyman pisze:
> On 17.9.2020 18.30, Andrzej Pietrasiewicz wrote:
>> Dear All,
>>
>> I have encountered a problem in xhci which leads to general protection fault.
>>
>> The problem is triggered by running this program:
>>
>> https://gitlab.collabora.com/andrzej.p/bulk-cancel.git
>>
>> $ sudo ./bulk-cancel -D /dev/bus/usb/002/006 -i 1 -b 1
>>
>> where /dev/bus/usb/002/006 is a Gadget Zero:
>>
>> It takes less than a minute until the crash happens.
>> The DMAR (iommu) errors don't happen always, i.e. there are crashes
>> when they are not reported in the system log. In either case the
>>
>> "WARN Cannot submit Set TR Deq Ptr"
>> "A Set TR Deq Ptr command is pending."
>> "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state."
>>
>> messages do appear.
>>
> 
> Nice testcase and report, thanks.
> 
> I started looking at issues in this area some time ago, and wrote a couple patches but
> it was left hanging. The two patches (now rebased on 5.9-rc3) can be found in my tree in the
> fix_invalid_context_at_stop_endpoint branch
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git fix_invalid_context_at_stop_endpoint
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=fix_invalid_context_at_stop_endpoint
> 
> If you could give those a try and see if they help I'd be grateful.

No, it doesn't help, albeit the errors are slightly different:

xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
dmar_fault: 44 callbacks suppressed
DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr ffcda000 
[fault reason 05] PTE Write access is not set
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr ffee8000 
[fault reason 05] PTE Write access is not set
xhci_hcd 0000:00:14.0: WARN Cannot submit Set TR Deq Ptr
xhci_hcd 0000:00:14.0: A Set TR Deq Ptr command is pending.
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr fff8b000 
[fault reason 05] PTE Write access is not set
DMAR: DRHD: handling fault status reg 3
xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD 
ep_index 2 comp_code 3
xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff49dd0 trb-start 
00000000fff49e00 trb-end 00000000fff49e10 seg-start 00000000fff49000 seg-end 
00000000fff49ff0
xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr not part of current TD 
ep_index 2 comp_code 3
xhci_hcd 0000:00:14.0: Looking for event-dma 00000000fff49dd0 trb-start 
00000000fff49e00 trb-end 00000000fff49e20 seg-start 00000000fff49000 seg-end

> 
> also, if possible, can you add xhci dynamic debug and traces to show in more detail what is going on.
> 
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < Run the testcase >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
> 

For 5.9-rc3 _without_ your patches:

xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x1
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4b010 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 000000006e3d8c35 (0xfff4b010 dma), new cycle = 1
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 000000008b374ed9, len = 0, expected = 256, 
status = -75
xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4b010
xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x1
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4b020 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 00000000355aae2b (0xfff4b020 dma), new cycle = 1
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 0000000014c841b0, len = 0, expected = 256, 
status = -75
xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4b020
xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x1
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4b030 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 00000000ce04ba9f (0xfff4b030 dma), new cycle = 1
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 000000008b374ed9, len = 0, expected = 256, 
status = -75


[above blocks repeat a gazillion times]

xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4bd10
xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x0
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4bd20 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 00000000e2b08aa6 (0xfff4bd20 dma), new cycle = 0
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 00000000ea1584be, len = 0, expected = 256, 
status = -75
xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4bd20
xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x0
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4bd30 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 00000000d04439dd (0xfff4bd30 dma), new cycle = 0
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 000000007e94171c, len = 0, expected = 256, 
status = -75
xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4bd30
xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x0
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4bd40 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 000000007a585f64 (0xfff4bd40 dma), new cycle = 0
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 000000000c30ab36, len = 0, expected = 256, 
status = -75
xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4bd40
xhci_hcd 0000:00:14.0: Babble error for slot 5 ep 2 on endpoint
xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
xhci_hcd 0000:00:14.0: Finding endpoint context
xhci_hcd 0000:00:14.0: Cycle state = 0x0
xhci_hcd 0000:00:14.0: New dequeue segment = 000000000a108eaa (virtual)
xhci_hcd 0000:00:14.0: New dequeue pointer = 0xfff4bd50 (DMA)
xhci_hcd 0000:00:14.0: Queueing new dequeue state
xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000000a108eaa 
(0xfff4b000 dma), new deq ptr = 00000000bbfef907 (0xfff4bd50 dma), new cycle = 0
xhci_hcd 0000:00:14.0: // Ding dong!
xhci_hcd 0000:00:14.0: Giveback URB 0000000042fc26fe, len = 0, expected = 256, 
status = -75
xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @fff4bd50
general protection fault: 0000 [#2] SMP PTI
CPU: 3 PID: 1061 Comm: gnome-shell Tainted: G      D           5.4.0-47-generic 
#51~18.04.1-Ubuntu
Hardware name: LENOVO 20B7S4L800/20B7S4L800, BIOS GJET82WW (2.32 ) 01/09/2015
RIP: 0010:kmem_cache_alloc_trace+0x7e/0x220
Code: b9 01 00 00 4d 8b 01 65 49 8b 50 08 65 4c 03 05 50 8e b6 6e 4d 8b 30 4d 85 
f6 0f 84 56 01 00 00 41 8b 41 20 49 8b 39 4c 01 f0 <48> 8b 18 49 33 99 70 01 00 
00 48 89 c1 48 0f c9 4c 89 f0 48 31 cb
RSP: 0018:ffffa3a640bfbeb8 EFLAGS: 00010206
RAX: 56329a4a4f062eb2 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000053b RSI: 0000000000000dc0 RDI: 000000000002f120
RBP: ffffa3a640bfbef0 R08: ffff9563164ef120 R09: ffff956315406f40
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000dc0
R13: ffff956315406f40 R14: 56329a4a4f062eb2 R15: ffffffff9153683c
FS:  00007fefcba2dac0(0000) GS:ffff9563164c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00001d94d9640000 CR3: 00000001ec1ec006 CR4: 00000000001606e0
Call Trace:
  __x64_sys_timerfd_create+0x6c/0x1a0
  do_syscall_64+0x57/0x190
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fefc8af6757
Code: 73 01 c3 48 8b 0d 31 87 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 
00 00 00 00 00 0f 1f 44 00 00 b8 1b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 
48 8b 0d 01 87 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffee678ad78 EFLAGS: 00000246 ORIG_RAX: 000000000000011b
RAX: ffffffffffffffda RBX: 0000555750f80560 RCX: 00007fefc8af6757
RDX: 0000000ed6f808b0 RSI: 0000000000080000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 0000555750f80560 R09: 00007fefc8dbfc40
R10: 0000000000000007 R11: 0000000000000246 R12: 000000005f64c06c
R13: 000000005f64c030 R14: 0000000000000000 R15: 0000000000000000
Modules linked in: ccm aufs overlay cmac bnep intel_rapl_msr mei_hdcp 
intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic aesni_intel crypto_simd cryptd 
glue_helper snd_hda_intel rapl snd_intel_dspcfg snd_hda_codec iwlmvm 
snd_hda_core snd_hwdep mac80211 uvcvideo intel_cstate videobuf2_vmalloc libarc4 
videobuf2_memops videobuf2_v4l2 videobuf2_common snd_pcm cp210x joydev usbtest 
usbserial videodev input_leds rtsx_pci_ms iwlwifi serio_raw mc wmi_bmof memstick 
i915 snd_seq_midi snd_seq_midi_event btusb btrtl btbcm btintel bluetooth 
thinkpad_acpi snd_rawmidi ecdh_generic cfg80211 ecc nvram ledtrig_audio snd_seq 
drm_kms_helper snd_seq_device drm snd_timer snd i2c_algo_bit fb_sys_fops 
syscopyarea mac_hid sysfillrect sysimgblt mei_me lpc_ich mei soundcore 
sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 
rtsx_pci_sdmmc psmouse
  ahci libahci rtsx_pci e1000e wmi video
---[ end trace ac9831e33eee8ef0 ]---
RIP: 0010:__kmalloc+0x9e/0x270
Code: 7f 01 00 00 4d 8b 01 65 49 8b 50 08 65 4c 03 05 70 84 b6 6e 4d 8b 20 4d 85 
e4 0f 84 8d 01 00 00 41 8b 41 20 49 8b 39 4c 01 e0 <48> 8b 18 49 33 99 70 01 00 
00 48 89 c1 48 0f c9 4c 89 e0 48 31 cb
RSP: 0018:ffffa3a640577ca8 EFLAGS: 00010206
RAX: 56329a4a4f062eb2 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000053b RSI: 0000000000000cc0 RDI: 000000000002f120
RBP: ffffa3a640577cd8 R08: ffff9563164ef120 R09: ffff956315406f40
R10: ffff95630af85598 R11: ffff9562ed344338 R12: 56329a4a4f062eb2
R13: 0000000000000cc0 R14: 0000000000000100 R15: ffff956315406f40
FS:  00007fefcba2dac0(0000) GS:ffff9563164c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00001d94d9640000 CR3: 00000001ec1ec006 CR4: 00000000001606e0


# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |

[empty contents!]

Andrzej

  reply	other threads:[~2020-09-18 14:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17 15:30 xhci problem -> general protection fault Andrzej Pietrasiewicz
2020-09-18 10:50 ` Mathias Nyman
2020-09-18 14:20   ` Andrzej Pietrasiewicz [this message]
2020-09-25 13:40     ` Mathias Nyman
2020-09-25 21:05       ` Ross Zwisler
2020-09-28 13:32         ` Andrzej Pietrasiewicz
2020-09-29  7:13           ` Mathias Nyman
2020-10-01 14:13             ` Andrzej Pietrasiewicz
2020-09-28 22:35         ` Mathias Nyman
2020-10-01 16:43           ` zwisler
2020-10-12 19:20             ` Mathias Nyman
2020-10-12 21:53               ` zwisler
2020-10-13  7:49                 ` Mathias Nyman
2020-10-13  8:29                   ` Andrzej Pietrasiewicz
2020-10-13 16:44                     ` zwisler
2020-11-19 16:52                   ` Ross Zwisler
2020-11-23 15:06                     ` Mathias Nyman
2020-12-02 22:59                       ` Ross Zwisler
2020-12-04 18:07                         ` Mathias Nyman
2020-12-08 17:24                           ` Ross Zwisler
2020-12-09 13:11                             ` Mathias Nyman
2020-12-09 18:54                               ` Ross Zwisler
2020-12-30 12:33                                 ` Mathias Nyman
2021-01-06 18:52                                   ` Ross Zwisler
2021-01-07  8:57                                     ` Mathias Nyman
2021-01-07 16:07                                       ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8230c2a2-719c-ef81-e85d-5921bf8e98e6@collabora.com \
    --to=andrzej.p@collabora.com \
    --cc=kernel@collabora.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).