Linux-USB Archive on lore.kernel.org
 help / color / Atom feed
From: zwisler@google.com
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"kernel@collabora.com" <kernel@collabora.com>
Subject: Re: xhci problem -> general protection fault
Date: Thu, 1 Oct 2020 10:43:52 -0600
Message-ID: <20201001164352.GA13249@google.com> (raw)
In-Reply-To: <7e38c533-6ea1-63a6-fc92-2ecef7ee1f84@linux.intel.com>

On Tue, Sep 29, 2020 at 01:35:31AM +0300, Mathias Nyman wrote:
<>
> The race I was referring to is if a driver issues a "Stop endpoint" command,
> and it races with an endpoint error/halt initiated by the xHC controller. 
> 
> The additional note in xhci 4.6.9 - Stop Endpoint, explains it:
> "Note: A Busy endpoint may asynchronously transition from the Running to the Halted
> or Error state due to error conditions detected while processing TRBs. A possible
> race condition may occur if software, thinking an endpoint is in the Running state,
> issues a Stop Endpoint Command however at the same time the xHC
> asynchronously transitions the endpoint to the Halted or Error state. In this case,
> a Context State Error may be generated for the command completion. Software
> may verify that this case occurred by inspecting the EP State for Halted or Error
> when a Stop Endpoint Command results in a Context State Error."
> 
> There are several context state errors in your trace.
> 
> Thanks
> -Mathias

Interestingly it looks like it's the actions that we take at the end of
xhci_handle_cmd_set_deq() for the broken command which break the HC.
Specifically, this line:

	dev->eps[ep_index].ep_state &= ~SET_DEQ_PENDING;

If I skip this line when I notice that ep_ctx->deq==0, the system will keep
running happily.

Here is a trace and dmesg for a run with the patch at the bottom of this mail.
I trimmed the trace a bit since it was very large, but I think I've left the
important bits intact:

https://gist.github.com/rzwisler/422e55321d9d2db5fc258d6d5b93d018

I've been able to run with this patch and survive through many "Mismatch"
occurrences, both with ep_ctx->deq set to 0 and set to some other value which
just seems to be wrong.

It seems like there are a few other places where we notice that we're in a bad
state, and we just bail, specifically these in xhci_queue_new_dequeue_state():

	addr = xhci_trb_virt_to_dma(deq_state->new_deq_seg,
				    deq_state->new_deq_ptr);
	if (addr == 0) {
		xhci_warn(xhci, "WARN Cannot submit Set TR Deq Ptr\n");
		xhci_warn(xhci, "WARN deq seg = %px, deq pt = %px\n",
			  deq_state->new_deq_seg, deq_state->new_deq_ptr);
		return;
	}
	ep = &xhci->devs[slot_id]->eps[ep_index];
	if ((ep->ep_state & SET_DEQ_PENDING)) {
		xhci_warn(xhci, "WARN Cannot submit Set TR Deq Ptr\n");
		xhci_warn(xhci, "A Set TR Deq Ptr command is pending.\n");
		return;
	}

Is noticing that the HC has given us bad data via the "Mismatch" check in
xhci_handle_cmd_set_deq() and bailing out enough, or should we figure out
exactly why the HC is getting into a bad state?

I'm happy to gather logs with more debug or run other experiments, if that
would be helpful.  As it is I don't really know how to debug the internal
state of the HC further, but hopefully the knowledge that the patch below
makes a difference will help us move forward.

--- >8 ---
From 1642f6f375d4822919fb03a56ce8c9186f58d6f7 Mon Sep 17 00:00:00 2001
From: Ross Zwisler <zwisler@google.com>
Date: Wed, 30 Sep 2020 17:45:24 -0600
Subject: [PATCH] DEBUG: avoid xhci crash

---
 drivers/usb/host/xhci-ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2c255d0620b05..379570facbc9e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1142,6 +1142,8 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 			xhci_warn(xhci, "Mismatch between completed Set TR Deq Ptr command & xHCI internal state.\n");
 			xhci_warn(xhci, "ep deq seg = %p, deq ptr = %p\n",
 				  ep->queued_deq_seg, ep->queued_deq_ptr);
+			xhci_warn(xhci, "calculated deq:%#llx  real deq:%#llx\n", xhci_trb_virt_to_dma(ep->queued_deq_seg, ep->queued_deq_ptr), deq);
+			return;
 		}
 	}
 
-- 
2.28.0.709.gb0816b6eb0-goog


  reply index

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17 15:30 Andrzej Pietrasiewicz
2020-09-18 10:50 ` Mathias Nyman
2020-09-18 14:20   ` Andrzej Pietrasiewicz
2020-09-25 13:40     ` Mathias Nyman
2020-09-25 21:05       ` Ross Zwisler
2020-09-28 13:32         ` Andrzej Pietrasiewicz
2020-09-29  7:13           ` Mathias Nyman
2020-10-01 14:13             ` Andrzej Pietrasiewicz
2020-09-28 22:35         ` Mathias Nyman
2020-10-01 16:43           ` zwisler [this message]
2020-10-12 19:20             ` Mathias Nyman
2020-10-12 21:53               ` zwisler
2020-10-13  7:49                 ` Mathias Nyman
2020-10-13  8:29                   ` Andrzej Pietrasiewicz
2020-10-13 16:44                     ` zwisler
2020-11-19 16:52                   ` Ross Zwisler
2020-11-23 15:06                     ` Mathias Nyman
2020-12-02 22:59                       ` Ross Zwisler
2020-12-04 18:07                         ` Mathias Nyman
2020-12-08 17:24                           ` Ross Zwisler
2020-12-09 13:11                             ` Mathias Nyman
2020-12-09 18:54                               ` Ross Zwisler
2020-12-30 12:33                                 ` Mathias Nyman
2021-01-06 18:52                                   ` Ross Zwisler
2021-01-07  8:57                                     ` Mathias Nyman
2021-01-07 16:07                                       ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201001164352.GA13249@google.com \
    --to=zwisler@google.com \
    --cc=andrzej.p@collabora.com \
    --cc=kernel@collabora.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-USB Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-usb/0 linux-usb/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-usb linux-usb/ https://lore.kernel.org/linux-usb \
		linux-usb@vger.kernel.org
	public-inbox-index linux-usb

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-usb


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git