On 21.11.2019 18.43, Kai-Heng Feng wrote: > > >> On Nov 21, 2019, at 19:48, Mathias Nyman wrote: >> >> On 21.11.2019 9.38, Kai-Heng Feng wrote: >>> Hi, >>> Currently there are two quirks for Dell TB16 to workaround an issue: >>> "xhci: Bad Ethernet performance plugged in ASM1042A host", and >>> "r8152: disable RX aggregation on Dell TB16 dock". >>> However, the issue wasn't analyzed and root caused. I'd like to understand what's really going on here. >>> Reverting those two patches, and as soon as doing TX/RX over the ethernet on TB16, xhci_hcd errors out: >>> [ 69.803804] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.803807] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1010 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 69.803900] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.803901] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1020 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 69.803998] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.803999] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1030 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 69.804096] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.804099] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1040 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 69.955830] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.955847] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1050 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 69.956704] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.956718] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1060 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 69.963802] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 69.963819] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1070 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> [ 70.112302] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 >>> [ 70.112321] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000047c4b1080 trb-start 000000047c4b0fe0 trb-end 000000047c4b0fe0 seg-start 000000047c4b0000 seg-end 000000047c4b0ff0 >>> So, under what circumstances the DMA pointer can be out of the TD? >> >> When the TRB pointer field of the transfer event doesn't point to a TRB >> in the TD the xhci driver was expecting next. >> In your case xhci driver expects an event for a TD (TRB) at 0x47c4b0fe0 >> >> Looks like driver didn't get that event, or didn't handle it properly. >> Hardware continues processing TRBs in the ring, sending events for later TDs. >> >> Your case looks something like this: >> endpoint ring segment 1: 256 TRBs (47c047c4b0000 -47c4b0ff0) >> 000000047c4b0000 1st TRB >> 000000047c4b0010 2nd TRB >> .... >> 000000047c4b0fd0 TRB that driver has handled >> 000000047c4b0fe0 TRB driver expect an event for >> 000000047c4b0ff0 LINK TRB, Last TRB of this segment, points to next segment. (probably pointing to 000000047c4b1000 >> endpoint ring segment 2: 256 TRBs (probably 47c047c4b1000 -47c4b1ff0) >> 000000047c4b1000 TRB (Log donsn't cover this, probably hw competed?) >> 000000047c4b1010 TRB hardware has completed (short packet 13) >> 000000047c4b1020 TRB hardware has completed (short packet 13) >> 000000047c4b1030 TRB hardware has completed (short packet 13) >> 000000047c4b1040 TRB hardware has completed (short packet 13) >> 000000047c4b1050 TRB hardware has completed (short packet 13) >> 000000047c4b1060 TRB hardware has completed (short packet 13) >> 000000047c4b1070 TRB hardware has completed (short packet 13) >> 000000047c4b1080 TRB hardware has completed (short packet 13), hardware handling is here while software still waiting for an event for TRB at 47c4b0fe0 >> >> If you can get xhci traces, dmesg and a snapshot of the xhci event and transfer ring >> ring after this, we could see if hardware ever wrote the missing event to the event ring, >> or if driver just failed to handle it properly. > > [ 112.224838] rfkill: input handler disabled > [ 434.653968] xhci_hcd 0000:3f:00.0: ep 0x81 - asked for 32768 bytes, 32392 bytes untransferred > [ 438.461323] xhci_hcd 0000:3f:00.0: Cancel URB 00000000c8b2b6a0, dev 1.5, ep 0x7, starting at offset 0x47ccc97b0 > [ 438.461338] xhci_hcd 0000:3f:00.0: // Ding dong! > [ 438.461356] xhci_hcd 0000:3f:00.0: Cancel URB 00000000fb7b60a7, dev 1.5, ep 0x7, starting at offset 0x47ccc96f0 > [ 438.461367] xhci_hcd 0000:3f:00.0: Cancel URB 0000000008bbc1d8, dev 1.5, ep 0x7, starting at offset 0x47ccc9750 > [ 438.461988] xhci_hcd 0000:3f:00.0: Stopped on Transfer TRB for slot 4 ep 13 > [ 438.462000] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc97b0 (dma). > [ 438.462006] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc97c0 (dma). > [ 438.462009] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc97d0 (dma). > [ 438.462013] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc97e0 (dma). > [ 438.462017] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc97f0 (dma). > [ 438.462020] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9800 (dma). > [ 438.462024] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9730 (dma). > [ 438.462028] xhci_hcd 0000:3f:00.0: Finding endpoint context > [ 438.462032] xhci_hcd 0000:3f:00.0: Cycle state = 0x0 > [ 438.462037] xhci_hcd 0000:3f:00.0: New dequeue segment = 000000002f634d54 (virtual) > [ 438.462041] xhci_hcd 0000:3f:00.0: New dequeue pointer = 0x47ccc9740 (DMA) > [ 438.462045] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9740 (dma). > [ 438.462049] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9750 (dma). > [ 438.462053] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9760 (dma). > [ 438.462056] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9770 (dma). > [ 438.462060] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9780 (dma). > [ 438.462063] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc9790 (dma). > [ 438.462067] xhci_hcd 0000:3f:00.0: Removing canceled TD starting at 0x47ccc97a0 (dma). > [ 438.462074] xhci_hcd 0000:3f:00.0: Set TR Deq Ptr cmd, new deq seg = 000000002f634d54 (0x47ccc9000 dma), new deq ptr = 00000000872eb19e (0x47ccc9740 dma), new cycle = 0 > [ 438.462080] xhci_hcd 0000:3f:00.0: // Ding dong! > [ 438.462304] xhci_hcd 0000:3f:00.0: Successful Set TR Deq Ptr cmd, deq = @47ccc9740 > [ 438.467571] xhci_hcd 0000:3f:00.0: xhci_drop_endpoint called for udev 0000000097a7fbc6 > [ 438.467613] xhci_hcd 0000:3f:00.0: drop ep 0x7, slot id 4, new drop flags = 0x4000, new add flags = 0x0 > [ 438.467621] xhci_hcd 0000:3f:00.0: xhci_check_bandwidth called for udev 0000000097a7fbc6 > [ 438.467633] xhci_hcd 0000:3f:00.0: // Ding dong! > [ 438.468374] xhci_hcd 0000:3f:00.0: Successful Endpoint Configure command > [ 443.093658] xhci_hcd 0000:3f:00.0: ep 0x81 - asked for 32768 bytes, 32392 bytes untransferred > [ 446.365530] xhci_hcd 0000:3f:00.0: ep 0x81 - asked for 32768 bytes, 32656 bytes untransferred > [ 446.366192] xhci_hcd 0000:3f:00.0: ep 0x81 - asked for 32768 bytes, 32480 bytes untransferred > ... > [ 447.067231] xhci_hcd 0000:3f:00.0: ep 0x81 - asked for 32768 bytes, 25048 bytes untransferred > [ 447.067544] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.067553] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d010 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.076669] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.076682] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d020 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.175990] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.176009] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d030 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.189334] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.189350] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d040 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.375676] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.375690] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d050 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.376920] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.376930] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d060 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.386611] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.386628] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d070 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > [ 447.488879] xhci_hcd 0000:3f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13 > [ 447.488891] xhci_hcd 0000:3f:00.0: Looking for event-dma 000000049522d080 trb-start 000000049522cfe0 trb-end 000000049522cfe0 seg-start 000000049522c000 seg-end 000000049522cff0 > > >> >> I don't have a nice way of triggering ring dumping at some specific instance, but there >> is generic debugfs ring dump code in xhci-debugfs.c >> >> (cat /sys/kernel/debug/usb/xhci/0000:3f:00.0/event-ring/trbs for sample output) > > 0x0000000498764000: TRB 0000000499941730 status 'Success' len 0 slot 3 ep 7 type 'Transfer Event' flags e:C > 0x0000000498764010: TRB 0000000499941740 status 'Success' len 0 slot 3 ep 7 type 'Transfer Event' flags e:C The event ring is already overwritten with new events when dumped. Could you apply the code below, it will dump both the endpoint transfer ring and the event ring at the right moment: Attached as a patch as well. diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 620846f30b4f..a1a9a408c479 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -201,6 +201,31 @@ void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring) * @more_trbs_coming: Will you enqueue more TRBs before calling * prepare_transfer()? */ + +static int xhci_dump_ring(struct xhci_hcd *xhci, struct xhci_ring *ring) +{ + struct xhci_segment *seg; + dma_addr_t dma; + union xhci_trb *trb; + int i, j; + + seg = ring->first_seg; + + for (i = 0; i < ring->num_segs; i++) { + for (j = 0; j < TRBS_PER_SEGMENT; j++) { + trb = &seg->trbs[j]; + dma = seg->dma + j * sizeof(*trb); + xhci_err(xhci, "%pad: %s\n", &dma, + xhci_decode_trb(le32_to_cpu(trb->generic.field[0]), + le32_to_cpu(trb->generic.field[1]), + le32_to_cpu(trb->generic.field[2]), + le32_to_cpu(trb->generic.field[3]))); + } + seg = seg->next; + } + return 0; +} + static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring, bool more_trbs_coming) { @@ -2318,6 +2343,7 @@ static int handle_tx_event(struct xhci_hcd *xhci, u32 trb_comp_code; int td_num = 0; bool handling_skipped_tds = false; + static unsigned int err_cnt; slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags)); ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1; @@ -2579,6 +2605,12 @@ static int handle_tx_event(struct xhci_hcd *xhci, trb_in_td(xhci, ep_ring->deq_seg, ep_ring->dequeue, td->last_trb, ep_trb_dma, true); + if (err_cnt++ == 5) { + xhci_err(xhci, "EVENT RING:\n"); + xhci_dump_ring(xhci, xhci->event_ring); + xhci_err(xhci, "ENDPOINT RING:\n"); + xhci_dump_ring(xhci, ep_ring); + } return -ESHUTDOWN; }