Linux-USB Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/8] xhci fixes for usb-linus
@ 2019-10-04 11:59 Mathias Nyman
  2019-10-04 11:59 ` [PATCH 1/8] xhci: Fix false warning message about wrong bounce buffer write length Mathias Nyman
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman

Hi Greg

A set of xhci fixes for usb-linus, for exaple fix a NULL pointer
dereference, remove a false warning, resolve a couple link power management
related bandwidth issues, and correctly detect USB 3.1 capability on hosts
based on early xHCI 1.1 specs.

-Mathias

Bill Kuzeja (1):
  xhci: Prevent deadlock when xhci adapter breaks during init

Jan Schmidt (1):
  xhci: Check all endpoints for LPM timeout

Kai-Heng Feng (1):
  xhci: Increase STS_SAVE timeout in xhci_suspend()

Mathias Nyman (4):
  xhci: Fix false warning message about wrong bounce buffer write length
  xhci: Prevent device initiated U1/U2 link pm if exit latency is too
    long
  xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based
    hosts
  xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()

Rick Tseng (1):
  usb: xhci: wait for CNR controller not ready bit in xhci resume

 drivers/usb/host/xhci-ring.c |  4 +--
 drivers/usb/host/xhci.c      | 78 +++++++++++++++++++++++++++++++++++---------
 2 files changed, 65 insertions(+), 17 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/8] xhci: Fix false warning message about wrong bounce buffer write length
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-04 11:59 ` [PATCH 2/8] xhci: Prevent device initiated U1/U2 link pm if exit latency is too long Mathias Nyman
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman, # v4 . 8+

The check printing out the "WARN Wrong bounce buffer write length:"
uses incorrect values when comparing bytes written from scatterlist
to bounce buffer. Actual copied lengths are fine.

The used seg->bounce_len will be set to equal new_buf_len a few lines later
in the code, but is incorrect when doing the comparison.

The patch which added this false warning was backported to 4.8+ kernels
so this should be backported as far as well.

Cc: <stable@vger.kernel.org> # v4.8+
Fixes: 597c56e372da ("xhci: update bounce buffer with correct sg num")
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9741cdeea9d7..85ceb43e3405 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3202,10 +3202,10 @@ static int xhci_align_td(struct xhci_hcd *xhci, struct urb *urb, u32 enqd_len,
 	if (usb_urb_dir_out(urb)) {
 		len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs,
 				   seg->bounce_buf, new_buff_len, enqd_len);
-		if (len != seg->bounce_len)
+		if (len != new_buff_len)
 			xhci_warn(xhci,
 				"WARN Wrong bounce buffer write length: %zu != %d\n",
-				len, seg->bounce_len);
+				len, new_buff_len);
 		seg->bounce_dma = dma_map_single(dev, seg->bounce_buf,
 						 max_pkt, DMA_TO_DEVICE);
 	} else {
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/8] xhci: Prevent device initiated U1/U2 link pm if exit latency is too long
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
  2019-10-04 11:59 ` [PATCH 1/8] xhci: Fix false warning message about wrong bounce buffer write length Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-04 11:59 ` [PATCH 3/8] xhci: Check all endpoints for LPM timeout Mathias Nyman
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman, stable

If host/hub initiated link pm is prevented by a driver flag we still must
ensure that periodic endpoints have longer service intervals than link pm
exit latency before allowing device initiated link pm.

Fix this by continue walking and checking endpoint service interval if
xhci_get_timeout_no_hub_lpm() returns anything else than USB3_LPM_DISABLED

While at it fix the split line error message

Tested-by: Jan Schmidt <jan@centricular.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 500865975687..aed5a65f25bd 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4790,10 +4790,12 @@ static u16 xhci_calculate_lpm_timeout(struct usb_hcd *hcd,
 		if (intf->dev.driver) {
 			driver = to_usb_driver(intf->dev.driver);
 			if (driver && driver->disable_hub_initiated_lpm) {
-				dev_dbg(&udev->dev, "Hub-initiated %s disabled "
-						"at request of driver %s\n",
-						state_name, driver->name);
-				return xhci_get_timeout_no_hub_lpm(udev, state);
+				dev_dbg(&udev->dev, "Hub-initiated %s disabled at request of driver %s\n",
+					state_name, driver->name);
+				timeout = xhci_get_timeout_no_hub_lpm(udev,
+								      state);
+				if (timeout == USB3_LPM_DISABLED)
+					return timeout;
 			}
 		}
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/8] xhci: Check all endpoints for LPM timeout
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
  2019-10-04 11:59 ` [PATCH 1/8] xhci: Fix false warning message about wrong bounce buffer write length Mathias Nyman
  2019-10-04 11:59 ` [PATCH 2/8] xhci: Prevent device initiated U1/U2 link pm if exit latency is too long Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-04 11:59 ` [PATCH 4/8] xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts Mathias Nyman
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Jan Schmidt, stable, Mathias Nyman

From: Jan Schmidt <jan@centricular.com>

If an endpoint is encountered that returns USB3_LPM_DEVICE_INITIATED, keep
checking further endpoints, as there might be periodic endpoints later
that return USB3_LPM_DISABLED due to shorter service intervals.

Without this, the code can set too high a maximum-exit-latency and
prevent the use of multiple USB3 cameras that should be able to work.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jan Schmidt <jan@centricular.com>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index aed5a65f25bd..a5ba5ace3d00 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4674,12 +4674,12 @@ static int xhci_update_timeout_for_endpoint(struct xhci_hcd *xhci,
 	alt_timeout = xhci_call_host_update_timeout_for_endpoint(xhci, udev,
 		desc, state, timeout);
 
-	/* If we found we can't enable hub-initiated LPM, or
+	/* If we found we can't enable hub-initiated LPM, and
 	 * the U1 or U2 exit latency was too high to allow
-	 * device-initiated LPM as well, just stop searching.
+	 * device-initiated LPM as well, then we will disable LPM
+	 * for this device, so stop searching any further.
 	 */
-	if (alt_timeout == USB3_LPM_DISABLED ||
-			alt_timeout == USB3_LPM_DEVICE_INITIATED) {
+	if (alt_timeout == USB3_LPM_DISABLED) {
 		*timeout = alt_timeout;
 		return -E2BIG;
 	}
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/8] xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
                   ` (2 preceding siblings ...)
  2019-10-04 11:59 ` [PATCH 3/8] xhci: Check all endpoints for LPM timeout Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-04 11:59 ` [PATCH 5/8] usb: xhci: wait for CNR controller not ready bit in xhci resume Mathias Nyman
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman, # v4 . 18+, Loïc Yhuel

Early xHCI 1.1 spec did not mention USB 3.1 capable hosts should set
sbrn to 0x31, or that the minor revision is a two digit BCD
containing minor and sub-minor numbers.
This was later clarified in xHCI 1.2.

Some USB 3.1 capable hosts therefore have sbrn set to 0x30, or minor
revision set to 0x1 instead of 0x10.

Detect the USB 3.1 capability correctly for these hosts as well

Fixes: ddd57980a0fd ("xhci: detect USB 3.2 capable host controllers correctly")
Cc: <stable@vger.kernel.org> # v4.18+
Cc: Loïc Yhuel <loic.yhuel@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index a5ba5ace3d00..aec69d294973 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -5079,11 +5079,18 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 		hcd->has_tt = 1;
 	} else {
 		/*
-		 * Some 3.1 hosts return sbrn 0x30, use xhci supported protocol
-		 * minor revision instead of sbrn. Minor revision is a two digit
-		 * BCD containing minor and sub-minor numbers, only show minor.
+		 * Early xHCI 1.1 spec did not mention USB 3.1 capable hosts
+		 * should return 0x31 for sbrn, or that the minor revision
+		 * is a two digit BCD containig minor and sub-minor numbers.
+		 * This was later clarified in xHCI 1.2.
+		 *
+		 * Some USB 3.1 capable hosts therefore have sbrn 0x30, and
+		 * minor revision set to 0x1 instead of 0x10.
 		 */
-		minor_rev = xhci->usb3_rhub.min_rev / 0x10;
+		if (xhci->usb3_rhub.min_rev == 0x1)
+			minor_rev = 1;
+		else
+			minor_rev = xhci->usb3_rhub.min_rev / 0x10;
 
 		switch (minor_rev) {
 		case 2:
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/8] usb: xhci: wait for CNR controller not ready bit in xhci resume
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
                   ` (3 preceding siblings ...)
  2019-10-04 11:59 ` [PATCH 4/8] xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-04 11:59 ` [PATCH 6/8] xhci: Prevent deadlock when xhci adapter breaks during init Mathias Nyman
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Rick Tseng, stable, Mathias Nyman

From: Rick Tseng <rtseng@nvidia.com>

NVIDIA 3.1 xHCI card would lose power when moving power state into D3Cold.
Thus we need to wait for CNR bit to clear in xhci resume, just as in
xhci init.

[Minor changes to comment and commit message -Mathias]
Cc: <stable@vger.kernel.org>
Signed-off-by: Rick Tseng <rtseng@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index aec69d294973..fe8f9ff58015 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1108,6 +1108,18 @@ int xhci_resume(struct xhci_hcd *xhci, bool hibernated)
 		hibernated = true;
 
 	if (!hibernated) {
+		/*
+		 * Some controllers might lose power during suspend, so wait
+		 * for controller not ready bit to clear, just as in xHC init.
+		 */
+		retval = xhci_handshake(&xhci->op_regs->status,
+					STS_CNR, 0, 10 * 1000 * 1000);
+		if (retval) {
+			xhci_warn(xhci, "Controller not ready at resume %d\n",
+				  retval);
+			spin_unlock_irq(&xhci->lock);
+			return retval;
+		}
 		/* step 1: restore register */
 		xhci_restore_registers(xhci);
 		/* step 2: initialize command ring buffer */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 6/8] xhci: Prevent deadlock when xhci adapter breaks during init
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
                   ` (4 preceding siblings ...)
  2019-10-04 11:59 ` [PATCH 5/8] usb: xhci: wait for CNR controller not ready bit in xhci resume Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-04 11:59 ` [PATCH 7/8] xhci: Increase STS_SAVE timeout in xhci_suspend() Mathias Nyman
  2019-10-04 11:59 ` [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete() Mathias Nyman
  7 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh
  Cc: linux-usb, Bill Kuzeja, # v4 . 17+,
	Torez Smith, Bill Kuzeja, Mathias Nyman

From: Bill Kuzeja <William.Kuzeja@stratus.com>

The system can hit a deadlock if an xhci adapter breaks while initializing.
The deadlock is between two threads: thread 1 is tearing down the
adapter and is stuck in usb_unlocked_disable_lpm waiting to lock the
hcd->handwidth_mutex. Thread 2 is holding this mutex (while still trying
to add a usb device), but is stuck in xhci_endpoint_reset waiting for a
stop or config command to complete. A reboot is required to resolve.

It turns out when calling xhci_queue_stop_endpoint and
xhci_queue_configure_endpoint in xhci_endpoint_reset, the return code is
not checked for errors. If the timing is right and the adapter dies just
before either of these commands get issued, we hang indefinitely waiting
for a completion on a command that didn't get issued.

This wasn't a problem before the following fix because we didn't send
commands in xhci_endpoint_reset:

commit f5249461b504 ("xhci: Clear the host side toggle manually when
    endpoint is soft reset")

With the patch I am submitting, a duration test which breaks adapters
during initialization (and which deadlocks with the standard kernel) runs
without issue.

Fixes: f5249461b504 ("xhci: Clear the host side toggle manually when
    endpoint is soft reset")
Cc: <stable@vger.kernel.org> # v4.17+
Cc: Torez Smith <torez@redhat.com>
Signed-off-by: Bill Kuzeja <william.kuzeja@stratus.com>
Signed-off-by: Torez Smith <torez@redhat.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index fe8f9ff58015..8c068d0cc7c1 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3095,6 +3095,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 	unsigned int ep_index;
 	unsigned long flags;
 	u32 ep_flag;
+	int err;
 
 	xhci = hcd_to_xhci(hcd);
 	if (!host_ep->hcpriv)
@@ -3154,7 +3155,17 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 		xhci_free_command(xhci, cfg_cmd);
 		goto cleanup;
 	}
-	xhci_queue_stop_endpoint(xhci, stop_cmd, udev->slot_id, ep_index, 0);
+
+	err = xhci_queue_stop_endpoint(xhci, stop_cmd, udev->slot_id,
+					ep_index, 0);
+	if (err < 0) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		xhci_free_command(xhci, cfg_cmd);
+		xhci_dbg(xhci, "%s: Failed to queue stop ep command, %d ",
+				__func__, err);
+		goto cleanup;
+	}
+
 	xhci_ring_cmd_db(xhci);
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
@@ -3168,8 +3179,16 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 					   ctrl_ctx, ep_flag, ep_flag);
 	xhci_endpoint_copy(xhci, cfg_cmd->in_ctx, vdev->out_ctx, ep_index);
 
-	xhci_queue_configure_endpoint(xhci, cfg_cmd, cfg_cmd->in_ctx->dma,
+	err = xhci_queue_configure_endpoint(xhci, cfg_cmd, cfg_cmd->in_ctx->dma,
 				      udev->slot_id, false);
+	if (err < 0) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		xhci_free_command(xhci, cfg_cmd);
+		xhci_dbg(xhci, "%s: Failed to queue config ep command, %d ",
+				__func__, err);
+		goto cleanup;
+	}
+
 	xhci_ring_cmd_db(xhci);
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 7/8] xhci: Increase STS_SAVE timeout in xhci_suspend()
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
                   ` (5 preceding siblings ...)
  2019-10-04 11:59 ` [PATCH 6/8] xhci: Prevent deadlock when xhci adapter breaks during init Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
       [not found]   ` <20191006120750.5334F2087E@mail.kernel.org>
  2019-10-04 11:59 ` [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete() Mathias Nyman
  7 siblings, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Kai-Heng Feng, # v5 . 2+, Mathias Nyman

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

After commit f7fac17ca925 ("xhci: Convert xhci_handshake() to use
readl_poll_timeout_atomic()"), ASMedia xHCI may fail to suspend.

Although the algorithms are essentially the same, the old max timeout is
(usec + usec * time of doing readl()), and the new max timeout is just
usec, which is much less than the old one.

Increase the timeout to make ASMedia xHCI able to suspend again.

BugLink: https://bugs.launchpad.net/bugs/1844021
Fixes: f7fac17ca925 ("xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()")
Cc: <stable@vger.kernel.org> # v5.2+
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 8c068d0cc7c1..00f3804f7aa7 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1032,7 +1032,7 @@ int xhci_suspend(struct xhci_hcd *xhci, bool do_wakeup)
 	writel(command, &xhci->op_regs->command);
 	xhci->broken_suspend = 0;
 	if (xhci_handshake(&xhci->op_regs->status,
-				STS_SAVE, 0, 10 * 1000)) {
+				STS_SAVE, 0, 20 * 1000)) {
 	/*
 	 * AMD SNPS xHC 3.0 occasionally does not clear the
 	 * SSS bit of USBSTS and when driver tries to poll
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()
  2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
                   ` (6 preceding siblings ...)
  2019-10-04 11:59 ` [PATCH 7/8] xhci: Increase STS_SAVE timeout in xhci_suspend() Mathias Nyman
@ 2019-10-04 11:59 ` Mathias Nyman
  2019-10-07 14:02   ` Johan Hovold
  7 siblings, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2019-10-04 11:59 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman, # v5 . 3

udev stored in ep->hcpriv might be NULL if tt buffer is cleared
due to a halted control endpoint during device enumeration

xhci_clear_tt_buffer_complete is called by hub_tt_work() once it's
scheduled,  and by then usb core might have freed and allocated a
new udev for the next enumeration attempt.

Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
Cc: <stable@vger.kernel.org> # v5.3
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 00f3804f7aa7..517ec3206f6e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -5238,8 +5238,16 @@ static void xhci_clear_tt_buffer_complete(struct usb_hcd *hcd,
 	unsigned int ep_index;
 	unsigned long flags;
 
+	/*
+	 * udev might be NULL if tt buffer is cleared during a failed device
+	 * enumeration due to a halted control endpoint. Usb core might
+	 * have allocated a new udev for the next enumeration attempt.
+	 */
+
 	xhci = hcd_to_xhci(hcd);
 	udev = (struct usb_device *)ep->hcpriv;
+	if (!udev)
+		return;
 	slot_id = udev->slot_id;
 	ep_index = xhci_get_endpoint_index(&ep->desc);
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()
  2019-10-04 11:59 ` [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete() Mathias Nyman
@ 2019-10-07 14:02   ` Johan Hovold
  2019-10-08  8:15     ` Mathias Nyman
  0 siblings, 1 reply; 16+ messages in thread
From: Johan Hovold @ 2019-10-07 14:02 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: gregkh, linux-usb, # v5 . 3, Alan Stern

[ +CC: Alan ]

On Fri, Oct 04, 2019 at 02:59:33PM +0300, Mathias Nyman wrote:
> udev stored in ep->hcpriv might be NULL if tt buffer is cleared
> due to a halted control endpoint during device enumeration
> 
> xhci_clear_tt_buffer_complete is called by hub_tt_work() once it's
> scheduled,  and by then usb core might have freed and allocated a
> new udev for the next enumeration attempt.
>
> Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
> Cc: <stable@vger.kernel.org> # v5.3
> Reported-by: Johan Hovold <johan@kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
>  drivers/usb/host/xhci.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 00f3804f7aa7..517ec3206f6e 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -5238,8 +5238,16 @@ static void xhci_clear_tt_buffer_complete(struct usb_hcd *hcd,
>  	unsigned int ep_index;
>  	unsigned long flags;
>  
> +	/*
> +	 * udev might be NULL if tt buffer is cleared during a failed device
> +	 * enumeration due to a halted control endpoint. Usb core might
> +	 * have allocated a new udev for the next enumeration attempt.
> +	 */
> +
>  	xhci = hcd_to_xhci(hcd);
>  	udev = (struct usb_device *)ep->hcpriv;
> +	if (!udev)
> +		return;

I didn't have time to look into this myself last week, or comment on the
patch before Greg picked it up, but this clearly isn't the right fix.

As your comment suggests, ep->hcpriv may indeed be NULL here if USB core
have allocated a new udev. But this only happens after USB has freed the
old usb_device and the new one happens to get the same address.

Note that even the usb_host_endpoint itself (ep) has then been freed and
reallocated since it is member of struct usb_device, and it is the
use-after-free that needs fixing.

I've even been able to trigger another NULL-deref in this function
before a new udev has been allocated, due to the virt dev having been
freed by xhci_free_dev as part of usb_release_dev:

[   19.627771] usb 2-2.4: unable to read config index 0 descriptor/start: -32
[   19.627966] usb 2-2.4: chopping to 0 config(s)
[   19.628133] usb 2-2.4: can't read configurations, error -32
[   19.629017] usb 2-2.4: usb_release_dev - udev = ffff930b14d82000

udev is freed here

[   19.629258] usb 2-2-port4: attempt power cycle
[   19.629461] xhci_clear_tt_buffer_complete - udev = ffff930b14d82000

use-after-free when tt work is scheduled (note than udev is non-NULL
since udev hasn't been reallocated and initialised yet):

[   19.629643] xhci_clear_tt_buffer_complete - xhci->devs[4] = 0000000000000000

virt dev is NULL after having been freed by xhci_free_dev()

[   19.629876] BUG: kernel NULL pointer dereference, address: 0000000000000030

and is later dereferenced

[   19.630034] #PF: supervisor write access in kernel mode
[   19.630155] #PF: error_code(0x0002) - not-present page
[   19.630270] PGD 0 P4D 0 
[   19.630341] Oops: 0002 [#1] SMP
[   19.630425] CPU: 2 PID: 110 Comm: kworker/2:2 Not tainted 5.4.0-rc1 #28
[   19.630572] Hardware name:  /D34010WYK, BIOS WYLPT10H.86A.0051.2019.0322.1320 03/22/2019
[   19.636141] Workqueue: events hub_tt_work
[   19.638125] RIP: 0010:xhci_clear_tt_buffer_complete.cold.69+0x9b/0xcd

It seems the xhci clear-tt implementation was incomplete since it did
not take care to wait for any ongoing work before disabling the
endpoint. EHCI does this in ehci_endpoint_disable(), but xhci doesn't
even implement that callback.

As this may be something you could end up hitting in other paths as
well, perhaps we should even consider reverting the offending commit
pending a more complete implementation?

>  	slot_id = udev->slot_id;
>  	ep_index = xhci_get_endpoint_index(&ep->desc);

For reference, my original report is here:

	https://lkml.kernel.org/r/20190930103107.GC13531@localhost

Johan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 7/8] xhci: Increase STS_SAVE timeout in xhci_suspend()
       [not found]   ` <20191006120750.5334F2087E@mail.kernel.org>
@ 2019-10-07 18:51     ` Kai-Heng Feng
  0 siblings, 0 replies; 16+ messages in thread
From: Kai-Heng Feng @ 2019-10-07 18:51 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Mathias Nyman, gregkh, linux-usb, stable

Hi Sasha,

> On Oct 6, 2019, at 20:07, Sasha Levin <sashal@kernel.org> wrote:
> 
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: f7fac17ca925 xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic().
> 
> The bot has tested the following trees: v5.3.2, v5.2.18, v4.19.76, v4.14.146, v4.9.194, v4.4.194.
> 
> v5.3.2: Build OK!
> v5.2.18: Build OK!
> v4.19.76: Build OK!
> v4.14.146: Build OK!
> v4.9.194: Failed to apply! Possible dependencies:
>    0b6c324c8b60 ("xhci: cleanup and refactor process_ctrl_td()")
>    0f1d832ed1fb ("usb: xhci: Add port test modes support for usb2.")
>    11644a765952 ("xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc")
>    191edc5e2e51 ("xhci: Fix front USB ports on ASUS PRIME B350M-A")
>    1cc6d8617b91 ("usb: xhci: remove unnecessary second abort try")
>    2a72126de1bb ("xhci: Remove duplicate xhci urb giveback functions")
>    2d6d5769f82d ("xhci: fix non static symbol warning")
>    30a65b45bfb1 ("xhci: cleanup and refactor process_bulk_intr_td()")
>    446b31419cb1 ("xhci: refactor handle_tx_event() urb giveback")
>    4750bc78efdb ("usb: host: xhci support option to disable the xHCI USB2 HW LPM")
>    488dc164914f ("xhci: remove WARN_ON if dma mask is not set for platform devices")
>    4c39d4b949d3 ("usb: xhci: use bus->sysdev for DMA configuration")
>    505f581c48bc ("xhci: simplify if statement to make it more readable")
>    52ab86852f74 ("xhci: remove extra URB_SHORT_NOT_OK checks in xhci, core handles most cases")
>    6b7f40f71234 ("xhci: change xhci_set_link_state() to work with port structures")
>    76a35293b901 ("usb: host: xhci: simplify irq handler return")
>    9983a5fc39bf ("xhci: rename EP_HALT_PENDING to EP_STOP_CMD_PENDING")
>    9ef7fbbb4fdf ("xhci: Rename variables related to transfer descritpors")
>    a6ff6cbf1fab ("usb: xhci: Add helper function xhci_set_power_on().")
>    a7d57abcc8a5 ("xhci: workaround CSS timeout on AMD SNPS 3.0 xHC")
>    d3519b9d9606 ("xhci: Manually give back cancelled URB if we can't queue it for cancel")
>    d9f11ba9f107 ("xhci: Rework how we handle unresponsive or hoptlug removed hosts")
>    e740b019d7c6 ("xhci: xhci-hub: use new port structures to get port address instead of port array")
>    eaefcf246b56 ("xhci: change xhci_test_and_clear_bit() to use new port structure")
>    f97c08ae329b ("xhci: rename endpoint related trb variables")
>    f99265965b32 ("xhci: detect stop endpoint race using pending timer instead of counter.")
>    ffd4b4fc0b9a ("xhci: Add helper to get xhci roothub from hcd")
> 
> v4.4.194: Failed to apply! Possible dependencies:
>    11644a765952 ("xhci: Add quirk to workaround the errata seen on Cavium Thunder-X2 Soc")
>    191edc5e2e51 ("xhci: Fix front USB ports on ASUS PRIME B350M-A")
>    21939f003ad0 ("usb: host: xhci-plat: enable BROKEN_PED quirk if platform requested")
>    41135de1e7fd ("usb: xhci: add quirk flag for broken PED bits")
>    4750bc78efdb ("usb: host: xhci support option to disable the xHCI USB2 HW LPM")
>    488dc164914f ("xhci: remove WARN_ON if dma mask is not set for platform devices")
>    4c39d4b949d3 ("usb: xhci: use bus->sysdev for DMA configuration")
>    4efb2f694114 ("usb: host: xhci-plat: add struct xhci_plat_priv")
>    69307ccb9ad7 ("usb: xhci: bInterval quirk for TI TUSB73x0")
>    76f9502fe761 ("xhci: plat: adapt to unified device property interface")
>    9da5a1092b13 ("xhci: Bad Ethernet performance plugged in ASM1042A host")
>    a3aef3793071 ("xhci: get rid of platform data")
>    a7d57abcc8a5 ("xhci: workaround CSS timeout on AMD SNPS 3.0 xHC")
>    dec08194ffec ("xhci: Limit USB2 port wake support for AMD Promontory hosts")
>    def4e6f7b419 ("xhci: refactor and cleanup endpoint initialization.")
> 
> 
> NOTE: The patch will not be queued to stable trees until it is upstream.

Where do I send backport for v4.4 and v4.9?

Kai-Heng

> 
> How should we proceed with this patch?
> 
> -- 
> Thanks,
> Sasha


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()
  2019-10-07 14:02   ` Johan Hovold
@ 2019-10-08  8:15     ` Mathias Nyman
  2019-10-11 12:47       ` Mathias Nyman
  0 siblings, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2019-10-08  8:15 UTC (permalink / raw)
  To: Johan Hovold; +Cc: gregkh, linux-usb, # v5 . 3, Alan Stern

On 7.10.2019 17.02, Johan Hovold wrote:
> [ +CC: Alan ]
> 
> On Fri, Oct 04, 2019 at 02:59:33PM +0300, Mathias Nyman wrote:
>> udev stored in ep->hcpriv might be NULL if tt buffer is cleared
>> due to a halted control endpoint during device enumeration
>>
>> xhci_clear_tt_buffer_complete is called by hub_tt_work() once it's
>> scheduled,  and by then usb core might have freed and allocated a
>> new udev for the next enumeration attempt.
>>
>> Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
>> Cc: <stable@vger.kernel.org> # v5.3
>> Reported-by: Johan Hovold <johan@kernel.org>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>> ---
>>   drivers/usb/host/xhci.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
>> index 00f3804f7aa7..517ec3206f6e 100644
>> --- a/drivers/usb/host/xhci.c
>> +++ b/drivers/usb/host/xhci.c
>> @@ -5238,8 +5238,16 @@ static void xhci_clear_tt_buffer_complete(struct usb_hcd *hcd,
>>   	unsigned int ep_index;
>>   	unsigned long flags;
>>   
>> +	/*
>> +	 * udev might be NULL if tt buffer is cleared during a failed device
>> +	 * enumeration due to a halted control endpoint. Usb core might
>> +	 * have allocated a new udev for the next enumeration attempt.
>> +	 */
>> +
>>   	xhci = hcd_to_xhci(hcd);
>>   	udev = (struct usb_device *)ep->hcpriv;
>> +	if (!udev)
>> +		return;
> 
> I didn't have time to look into this myself last week, or comment on the
> patch before Greg picked it up, but this clearly isn't the right fix.
> 
> As your comment suggests, ep->hcpriv may indeed be NULL here if USB core
> have allocated a new udev. But this only happens after USB has freed the
> old usb_device and the new one happens to get the same address.
> 

You're right, that fix doesn't solve the actual issue, it avoids a few specific
null pointer dereference cases, but leaves both root cause and several other
use-after-free cases open.

> Note that even the usb_host_endpoint itself (ep) has then been freed and
> reallocated since it is member of struct usb_device, and it is the
> use-after-free that needs fixing.
> 
> I've even been able to trigger another NULL-deref in this function
> before a new udev has been allocated, due to the virt dev having been
> freed by xhci_free_dev as part of usb_release_dev:
> 
> It seems the xhci clear-tt implementation was incomplete since it did
> not take care to wait for any ongoing work before disabling the
> endpoint. EHCI does this in ehci_endpoint_disable(), but xhci doesn't
> even implement that callback.
> 

So it seems, it might be possible to remove pending clear_tt work for
most endpoints in the .drop_endpoint callbacks, but ep0 is different,
it isn't dropped, we might need to implement the endpoint_disable()
callback for this.

> As this may be something you could end up hitting in other paths as
> well, perhaps we should even consider reverting the offending commit
> pending a more complete implementation?

Possibly, if we can't find a working solution early enough in this release cycle

-Mathias

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete()
  2019-10-08  8:15     ` Mathias Nyman
@ 2019-10-11 12:47       ` Mathias Nyman
  2019-10-11 12:58         ` [RFT PATCH] xhci: Fix use-after-free regression in xhci clear hub TT implementation Mathias Nyman
  0 siblings, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2019-10-11 12:47 UTC (permalink / raw)
  To: Johan Hovold; +Cc: gregkh, USB, Alan Stern

On 8.10.2019 11.15, Mathias Nyman wrote:
> On 7.10.2019 17.02, Johan Hovold wrote:
>>
>> I didn't have time to look into this myself last week, or comment on the
>> patch before Greg picked it up, but this clearly isn't the right fix.
>>
>> As your comment suggests, ep->hcpriv may indeed be NULL here if USB core
>> have allocated a new udev. But this only happens after USB has freed the
>> old usb_device and the new one happens to get the same address.
>>
> 
> You're right, that fix doesn't solve the actual issue, it avoids a few specific
> null pointer dereference cases, but leaves both root cause and several other
> use-after-free cases open.
> 
>> Note that even the usb_host_endpoint itself (ep) has then been freed and
>> reallocated since it is member of struct usb_device, and it is the
>> use-after-free that needs fixing.
>>
>> I've even been able to trigger another NULL-deref in this function
>> before a new udev has been allocated, due to the virt dev having been
>> freed by xhci_free_dev as part of usb_release_dev:
>>
>> It seems the xhci clear-tt implementation was incomplete since it did
>> not take care to wait for any ongoing work before disabling the
>> endpoint. EHCI does this in ehci_endpoint_disable(), but xhci doesn't
>> even implement that callback.
>>
> 
> So it seems, it might be possible to remove pending clear_tt work for
> most endpoints in the .drop_endpoint callbacks, but ep0 is different,
> it isn't dropped, we might need to implement the endpoint_disable()
> callback for this.
> 

I was able to reproduce the use-after-free issue by faking a endpoint halt,
and resetting the endpoint early in enumeration at Get device descriptor request.

To fix this I added the endpoint_disable() callback that will wait for
clear_tt_work to finish before returning, similar to the ehci solution.
It works in my case.

the endpoint_disable() callback is called by usb core both before dropping
xhci endpoints belonging to a interface, and separately for ep0 before udev is
freed during enumeration retry.

Both the hack that triggers the issue (LS/FS behind HS bug won't ever enumerate with this)
and the fix to prevent use after free can be found in  clear_tt_fix branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git clear_tt_fix

I'll send the fix out to the list as well, any chance you could try that out?

-Mathias

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFT PATCH] xhci: Fix use-after-free regression in xhci clear hub TT implementation
  2019-10-11 12:47       ` Mathias Nyman
@ 2019-10-11 12:58         ` Mathias Nyman
  2019-10-14 10:16           ` Johan Hovold
  0 siblings, 1 reply; 16+ messages in thread
From: Mathias Nyman @ 2019-10-11 12:58 UTC (permalink / raw)
  To: johan; +Cc: gregkh, stern, linux-usb, Mathias Nyman, # v5 . 3

commit ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer") schedules work
to clear TT buffer, but causes a use-after-free regression at the same time

Make sure hub_tt_work finishes before endpoint is disabled, otherwise
the work will dereference already freed endpoint and device related
pointers.

This was triggered when usb core failed to read the configuration
descriptor of a FS/LS device during enumeration.
xhci driver queued clear_tt_work while usb core freed and reallocated
a new device for the next enumeration attempt.

EHCI driver implents ehci_endpoint_disable() that makes sure
clear_tt_work has finished before it returns, but xhci lacks this support.
usb core will call hcd->driver->endpoint_disable() callback before
disabling endpoints, so we want this in xhci as well.

The added xhci_endpoint_disable() is based on ehci_endpoint_disable()

Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
Cc: <stable@vger.kernel.org> # v5.3
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 5cfbf9a04494..6e817686d04f 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3071,6 +3071,48 @@ void xhci_cleanup_stalled_ring(struct xhci_hcd *xhci, unsigned int ep_index,
 	}
 }
 
+static void xhci_endpoint_disable(struct usb_hcd *hcd,
+				  struct usb_host_endpoint *host_ep)
+{
+	struct xhci_hcd		*xhci;
+	struct xhci_virt_device	*vdev;
+	struct xhci_virt_ep	*ep;
+	struct usb_device	*udev;
+	unsigned long		flags;
+	unsigned int		ep_index;
+
+	xhci = hcd_to_xhci(hcd);
+rescan:
+	spin_lock_irqsave(&xhci->lock, flags);
+
+	udev = (struct usb_device *)host_ep->hcpriv;
+	if (!udev || !udev->slot_id)
+		goto done;
+
+	vdev = xhci->devs[udev->slot_id];
+	if (!vdev)
+		goto done;
+
+	ep_index = xhci_get_endpoint_index(&host_ep->desc);
+	ep = &vdev->eps[ep_index];
+	if (!ep)
+		goto done;
+
+	/* wait for hub_tt_work to finish clearing hub TT */
+	if (ep->ep_state & EP_CLEARING_TT) {
+		spin_unlock_irqrestore(&xhci->lock, flags);
+		schedule_timeout_uninterruptible(1);
+		goto rescan;
+	}
+
+	if (ep->ep_state)
+		xhci_dbg(xhci, "endpoint disable with ep_state 0x%x\n",
+			 ep->ep_state);
+done:
+	host_ep->hcpriv = NULL;
+	spin_unlock_irqrestore(&xhci->lock, flags);
+}
+
 /*
  * Called after usb core issues a clear halt control message.
  * The host side of the halt should already be cleared by a reset endpoint
@@ -5290,6 +5332,7 @@ static const struct hc_driver xhci_hc_driver = {
 	.free_streams =		xhci_free_streams,
 	.add_endpoint =		xhci_add_endpoint,
 	.drop_endpoint =	xhci_drop_endpoint,
+	.endpoint_disable =	xhci_endpoint_disable,
 	.endpoint_reset =	xhci_endpoint_reset,
 	.check_bandwidth =	xhci_check_bandwidth,
 	.reset_bandwidth =	xhci_reset_bandwidth,
-- 
2.7.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFT PATCH] xhci: Fix use-after-free regression in xhci clear hub TT implementation
  2019-10-11 12:58         ` [RFT PATCH] xhci: Fix use-after-free regression in xhci clear hub TT implementation Mathias Nyman
@ 2019-10-14 10:16           ` Johan Hovold
  2019-10-14 13:18             ` Mathias Nyman
  0 siblings, 1 reply; 16+ messages in thread
From: Johan Hovold @ 2019-10-14 10:16 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: johan, gregkh, stern, linux-usb, # v5 . 3

On Fri, Oct 11, 2019 at 03:58:42PM +0300, Mathias Nyman wrote:
> commit ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer") schedules work
> to clear TT buffer, but causes a use-after-free regression at the same time
> 
> Make sure hub_tt_work finishes before endpoint is disabled, otherwise
> the work will dereference already freed endpoint and device related
> pointers.
> 
> This was triggered when usb core failed to read the configuration
> descriptor of a FS/LS device during enumeration.
> xhci driver queued clear_tt_work while usb core freed and reallocated
> a new device for the next enumeration attempt.
> 
> EHCI driver implents ehci_endpoint_disable() that makes sure
> clear_tt_work has finished before it returns, but xhci lacks this support.
> usb core will call hcd->driver->endpoint_disable() callback before
> disabling endpoints, so we want this in xhci as well.
> 
> The added xhci_endpoint_disable() is based on ehci_endpoint_disable()
> 
> Fixes: ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer")
> Cc: <stable@vger.kernel.org> # v5.3
> Reported-by: Johan Hovold <johan@kernel.org>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
>  drivers/usb/host/xhci.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 43 insertions(+)
> 
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 5cfbf9a04494..6e817686d04f 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -3071,6 +3071,48 @@ void xhci_cleanup_stalled_ring(struct xhci_hcd *xhci, unsigned int ep_index,
>  	}
>  }
>  
> +static void xhci_endpoint_disable(struct usb_hcd *hcd,
> +				  struct usb_host_endpoint *host_ep)
> +{
> +	struct xhci_hcd		*xhci;
> +	struct xhci_virt_device	*vdev;
> +	struct xhci_virt_ep	*ep;
> +	struct usb_device	*udev;
> +	unsigned long		flags;
> +	unsigned int		ep_index;
> +
> +	xhci = hcd_to_xhci(hcd);
> +rescan:
> +	spin_lock_irqsave(&xhci->lock, flags);
> +
> +	udev = (struct usb_device *)host_ep->hcpriv;
> +	if (!udev || !udev->slot_id)
> +		goto done;
> +
> +	vdev = xhci->devs[udev->slot_id];
> +	if (!vdev)
> +		goto done;
> +
> +	ep_index = xhci_get_endpoint_index(&host_ep->desc);
> +	ep = &vdev->eps[ep_index];
> +	if (!ep)
> +		goto done;
> +
> +	/* wait for hub_tt_work to finish clearing hub TT */
> +	if (ep->ep_state & EP_CLEARING_TT) {
> +		spin_unlock_irqrestore(&xhci->lock, flags);
> +		schedule_timeout_uninterruptible(1);
> +		goto rescan;
> +	}
> +
> +	if (ep->ep_state)
> +		xhci_dbg(xhci, "endpoint disable with ep_state 0x%x\n",
> +			 ep->ep_state);
> +done:
> +	host_ep->hcpriv = NULL;
> +	spin_unlock_irqrestore(&xhci->lock, flags);
> +}
> +

I used essentially the same reproducer as you did for debugging this
after I first hit it with an actually stalled control endpoint, and this
patch works also with my fault-injection hack.

I've reviewed the code and it looks good to me except for one mostly
theoretical issue. You need to check ep->hc_priv while holding the
xhci->lock in xhci_clear_tt_buffer_complete() or you could end up having
xhci_endpoint_disable() reschedule indefinitely while waiting for
EP_CLEARING_TT to be cleared on a sufficiently weakly ordered
system.

Since cfbb8a84c2d2 ("xhci: Fix NULL pointer dereference in
xhci_clear_tt_buffer_complete()") isn't needed anymore and is slightly
misleading, I suggest amending the patch with the following:

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 9b1e15fe2c8e..6c17e3fe181a 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -5280,20 +5280,13 @@ static void xhci_clear_tt_buffer_complete(struct usb_hcd *hcd,
        unsigned int ep_index;
        unsigned long flags;
 
-       /*
-        * udev might be NULL if tt buffer is cleared during a failed device
-        * enumeration due to a halted control endpoint. Usb core might
-        * have allocated a new udev for the next enumeration attempt.
-        */
-
        xhci = hcd_to_xhci(hcd);
+
+       spin_lock_irqsave(&xhci->lock, flags);
        udev = (struct usb_device *)ep->hcpriv;
-       if (!udev)
-               return;
        slot_id = udev->slot_id;
        ep_index = xhci_get_endpoint_index(&ep->desc);
 
-       spin_lock_irqsave(&xhci->lock, flags);
        xhci->devs[slot_id]->eps[ep_index].ep_state &= ~EP_CLEARING_TT;
        xhci_ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
        spin_unlock_irqrestore(&xhci->lock, flags);

Feel free to add my:

Suggested-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Johan Hovold <johan@kernel.org>
Tested-by: Johan Hovold <johan@kernel.org>

Johan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFT PATCH] xhci: Fix use-after-free regression in xhci clear hub TT implementation
  2019-10-14 10:16           ` Johan Hovold
@ 2019-10-14 13:18             ` Mathias Nyman
  0 siblings, 0 replies; 16+ messages in thread
From: Mathias Nyman @ 2019-10-14 13:18 UTC (permalink / raw)
  To: Johan Hovold; +Cc: gregkh, stern, linux-usb, # v5 . 3

On 14.10.2019 13.16, Johan Hovold wrote:
> On Fri, Oct 11, 2019 at 03:58:42PM +0300, Mathias Nyman wrote:
>> commit ef513be0a905 ("usb: xhci: Add Clear_TT_Buffer") schedules work
>> to clear TT buffer, but causes a use-after-free regression at the same time
>>
>> Make sure hub_tt_work finishes before endpoint is disabled, otherwise
>> the work will dereference already freed endpoint and device related
>> pointers.
>>
>> This was triggered when usb core failed to read the configuration
>> descriptor of a FS/LS device during enumeration.
>> xhci driver queued clear_tt_work while usb core freed and reallocated
>> a new device for the next enumeration attempt.
>>
>> EHCI driver implents ehci_endpoint_disable() that makes sure
>> clear_tt_work has finished before it returns, but xhci lacks this support.
>> usb core will call hcd->driver->endpoint_disable() callback before
>> disabling endpoints, so we want this in xhci as well.
>>
>> The added xhci_endpoint_disable() is based on ehci_endpoint_disable()
>>
> 
> I used essentially the same reproducer as you did for debugging this
> after I first hit it with an actually stalled control endpoint, and this
> patch works also with my fault-injection hack.
> 
> I've reviewed the code and it looks good to me except for one mostly
> theoretical issue. You need to check ep->hc_priv while holding the
> xhci->lock in xhci_clear_tt_buffer_complete() or you could end up having
> xhci_endpoint_disable() reschedule indefinitely while waiting for
> EP_CLEARING_TT to be cleared on a sufficiently weakly ordered
> system.

Good point, I'll change that

> 
> Since cfbb8a84c2d2 ("xhci: Fix NULL pointer dereference in
> xhci_clear_tt_buffer_complete()") isn't needed anymore and is slightly
> misleading, I suggest amending the patch with the following:
> 

I'll add those changes and your tags to the patch

Thanks
Mathias

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, back to index

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-04 11:59 [PATCH 0/8] xhci fixes for usb-linus Mathias Nyman
2019-10-04 11:59 ` [PATCH 1/8] xhci: Fix false warning message about wrong bounce buffer write length Mathias Nyman
2019-10-04 11:59 ` [PATCH 2/8] xhci: Prevent device initiated U1/U2 link pm if exit latency is too long Mathias Nyman
2019-10-04 11:59 ` [PATCH 3/8] xhci: Check all endpoints for LPM timeout Mathias Nyman
2019-10-04 11:59 ` [PATCH 4/8] xhci: Fix USB 3.1 capability detection on early xHCI 1.1 spec based hosts Mathias Nyman
2019-10-04 11:59 ` [PATCH 5/8] usb: xhci: wait for CNR controller not ready bit in xhci resume Mathias Nyman
2019-10-04 11:59 ` [PATCH 6/8] xhci: Prevent deadlock when xhci adapter breaks during init Mathias Nyman
2019-10-04 11:59 ` [PATCH 7/8] xhci: Increase STS_SAVE timeout in xhci_suspend() Mathias Nyman
     [not found]   ` <20191006120750.5334F2087E@mail.kernel.org>
2019-10-07 18:51     ` Kai-Heng Feng
2019-10-04 11:59 ` [PATCH 8/8] xhci: Fix NULL pointer dereference in xhci_clear_tt_buffer_complete() Mathias Nyman
2019-10-07 14:02   ` Johan Hovold
2019-10-08  8:15     ` Mathias Nyman
2019-10-11 12:47       ` Mathias Nyman
2019-10-11 12:58         ` [RFT PATCH] xhci: Fix use-after-free regression in xhci clear hub TT implementation Mathias Nyman
2019-10-14 10:16           ` Johan Hovold
2019-10-14 13:18             ` Mathias Nyman

Linux-USB Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-usb/0 linux-usb/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-usb linux-usb/ https://lore.kernel.org/linux-usb \
		linux-usb@vger.kernel.org
	public-inbox-index linux-usb

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-usb


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git