From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: zwisler@google.com
Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
    "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
    "kernel@collabora.com" <kernel@collabora.com>
Subject: Re: xhci problem -> general protection fault
Date: Mon, 12 Oct 2020 22:20:31 +0300
Message-ID: <69f8cbc3-0ae7-cfb2-2fdd-556ada77381f@linux.intel.com>
In-Reply-To: <20201001164352.GA13249@google.com>

On 1.10.2020 19.43, zwisler@google.com wrote:
> On Tue, Sep 29, 2020 at 01:35:31AM +0300, Mathias Nyman wrote:
> <>
>> The race I was referring to is if a driver issues a "Stop endpoint" command,
>> and it races with an endpoint error/halt initiated by the xHC controller.
>>
>> The additional note in xhci 4.6.9 - Stop Endpoint, explains it:
>> "Note: A Busy endpoint may asynchronously transition from the Running to the Halted
>> or Error state due to error conditions detected while processing TRBs. A possible
>> race condition may occur if software, thinking an endpoint is in the Running state,
>> issues a Stop Endpoint Command however at the same time the xHC
>> asynchronously transitions the endpoint to the Halted or Error state. In this case,
>> a Context State Error may be generated for the command completion. Software
>> may verify that this case occurred by inspecting the EP State for Halted or Error
>> when a Stop Endpoint Command results in a Context State Error."
>>
>> There are several context state errors in your trace.
>>
>> Thanks
>> -Mathias
>
> Interestingly it looks like it's the actions that we take at the end of
> xhci_handle_cmd_set_deq() for the broken command which break the HC.
> Specifically, this line:
>
> 	dev->eps[ep_index].ep_state &= ~SET_DEQ_PENDING;
>
> If I skip this line when I notice that ep_ctx->deq==0, the system will keep
> running happily.

Skipping this will prevent this endpoint from running, and thus prevent
the issues seen if we continue.
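A minimal sketch of the check suggested by the xHCI 4.6.9 note quoted above: when a Stop Endpoint command completes with a Context State Error, look at the EP State in the endpoint context to tell whether the controller had already moved the endpoint to Halted or Error on its own. The numeric values follow the xHCI specification (EP State in bits 2:0 of Endpoint Context dword 0; completion code 19 is Context State Error), but the macro and function names here are hypothetical and local to this sketch, not the driver's definitions. It also assumes the caller has already converted dword 0 from little-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EP State values from the xHCI Endpoint Context (bits 2:0 of dword 0). */
#define EP_STATE_DISABLED 0u
#define EP_STATE_RUNNING  1u
#define EP_STATE_HALTED   2u
#define EP_STATE_STOPPED  3u
#define EP_STATE_ERROR    4u

/* The two command completion codes this sketch cares about. */
#define COMP_SUCCESS             1u
#define COMP_CONTEXT_STATE_ERROR 19u

/* Hypothetical helper: extract EP State from a CPU-endian dword 0. */
unsigned int ep_state_from_dw0(uint32_t dw0)
{
	return dw0 & 0x7u;
}

/*
 * Hypothetical helper: true if a Stop Endpoint command most likely lost
 * the race described in xHCI 4.6.9, i.e. it completed with a Context
 * State Error because the xHC had already moved the endpoint to Halted
 * or Error by itself.
 */
bool stop_ep_lost_race(unsigned int comp_code, unsigned int ep_state)
{
	if (comp_code != COMP_CONTEXT_STATE_ERROR)
		return false;
	return ep_state == EP_STATE_HALTED || ep_state == EP_STATE_ERROR;
}

/* Quick self-check of the classification above. */
void stop_ep_race_self_check(void)
{
	assert(stop_ep_lost_race(COMP_CONTEXT_STATE_ERROR, EP_STATE_HALTED));
	assert(stop_ep_lost_race(COMP_CONTEXT_STATE_ERROR, EP_STATE_ERROR));
	/* A Context State Error on a Stopped endpoint is not this race. */
	assert(!stop_ep_lost_race(COMP_CONTEXT_STATE_ERROR, EP_STATE_STOPPED));
	/* A successful stop never needs the extra check. */
	assert(!stop_ep_lost_race(COMP_SUCCESS, EP_STATE_HALTED));
}
```

In this sketch a "lost race" result means the endpoint needs halt/error recovery rather than being treated as a failed stop.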
>
> Here is a trace and dmesg for a run with the patch at the bottom of this mail.
> I trimmed the trace a bit since it was very large, but I think I've left the
> important bits intact:
>
> https://gist.github.com/rzwisler/422e55321d9d2db5fc258d6d5b93d018
>
> I've been able to run with this patch and survive through many "Mismatch"
> occurrences, both with ep_ctx->deq set to 0 and set to some other value which
> just seems to be wrong.
>
> It seems like there are a few other places where we notice that we're in a bad
> state, and we just bail, specifically these in xhci_queue_new_dequeue_state():
>
> 	addr = xhci_trb_virt_to_dma(deq_state->new_deq_seg,
> 				    deq_state->new_deq_ptr);
> 	if (addr == 0) {
> 		xhci_warn(xhci, "WARN Cannot submit Set TR Deq Ptr\n");
> 		xhci_warn(xhci, "WARN deq seg = %px, deq pt = %px\n",
> 			  deq_state->new_deq_seg, deq_state->new_deq_ptr);
> 		return;
> 	}
> 	ep = &xhci->devs[slot_id]->eps[ep_index];
> 	if ((ep->ep_state & SET_DEQ_PENDING)) {
> 		xhci_warn(xhci, "WARN Cannot submit Set TR Deq Ptr\n");
> 		xhci_warn(xhci, "A Set TR Deq Ptr command is pending.\n");
> 		return;
> 	}
>
> Is noticing that the HC has given us bad data via the "Mismatch" check in
> xhci_handle_cmd_set_deq() and bailing out enough, or should we figure out
> exactly why the HC is getting into a bad state?

I'm rewriting how the xhci driver handles halted and canceled transfers.
While looking into it I found an older case where hardware gives bad data
in the output context. This was 10 years ago and on some specific hardware,
see commit:

ac9d8fe7c6a8 USB: xhci: Add quirk for Fresco Logic xHCI hardware.

>
> I'm happy to gather logs with more debug or run other experiments, if that
> would be helpful. As it is I don't really know how to debug the internal
> state of the HC further, but hopefully the knowledge that the patch below
> makes a difference will help us move forward.

Great, thanks. It will take some time before the rewrite is ready.

-Mathias
Linux-USB Archive on lore.kernel.org