All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit
@ 2015-11-04 22:53 ` Douglas Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2015-11-04 22:53 UTC (permalink / raw)
  To: John Youn, balbi
  Cc: Yunzhi Li, Heiko Stübner, linux-rockchip, Julius Werner,
	gregory.herrero, yousaf.kaukab, dinguyen, Douglas Anderson,
	johnyoun, gregkh, linux-usb, linux-kernel

The dwc2 interrupt handler is quite slow.  On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696MHz (to simulate
real world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us.  Note that SOF interrupts come
every 125 us with high speed USB, so taking > 120 us in the interrupt
handler is a big deal.

The patches here will speed up the interrupt controller significantly.
After this series, I have a hard time seeing the interrupt controller
taking > 20 us and I don't ever see it taking > 30 us ever in my tests
unless I bring the cpufreq back down.  With the cpufreq at 126 MHz I can
still see the interrupt handler take > 50 us, so I'm sure we could
improve this further.  ...but hey, it's a start.

In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications).  Picking this series up will
help your diffstat and likely win you friends.  ;)

===

Steps for gathering data with ftrace:

cd /sys/devices/system/cpu/cpu0/cpufreq/
echo userspace > scaling_governor
echo 696000 > scaling_setspeed

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo "" > trace
echo nop > current_tracer
echo function_graph > current_tracer
echo dwc2_handle_hcd_intr > set_graph_function
echo dwc2_handle_common_intr >> set_graph_function
echo dwc2_handle_hcd_intr > set_ftrace_filter
echo dwc2_handle_common_intr >> set_ftrace_filter
echo funcgraph-abstime > trace_options
echo 70 > tracing_thresh
echo 1 > /sys/kernel/debug/tracing/tracing_on

sleep 2
cat trace

===

NOTE: This series doesn't replace any other patches I've submitted
recently, it merely adds another set of changes that upstream could
benefit from.


Douglas Anderson (3):
  usb: dwc2: rockchip: Make the max_transfer_size automatic
  usb: dwc2: host: Giveback URB in tasklet context
  usb: dwc2: host: Get aligned DMA in a more supported way

 drivers/usb/dwc2/core.c      |  21 +-----
 drivers/usb/dwc2/hcd.c       | 170 ++++++++++++++++++++-----------------------
 drivers/usb/dwc2/hcd.h       |  10 ---
 drivers/usb/dwc2/hcd_intr.c  |  65 -----------------
 drivers/usb/dwc2/hcd_queue.c |   7 +-
 drivers/usb/dwc2/platform.c  |   2 +-
 6 files changed, 85 insertions(+), 190 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit
@ 2015-11-04 22:53 ` Douglas Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2015-11-04 22:53 UTC (permalink / raw)
  To: John Youn, balbi-l0cyMroinI0
  Cc: Yunzhi Li, Heiko Stübner,
	linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Julius Werner,
	gregory.herrero-ral2JQCrhuEAvxtiuMwx3w,
	yousaf.kaukab-ral2JQCrhuEAvxtiuMwx3w,
	dinguyen-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx, Douglas Anderson,
	johnyoun-HKixBCOQz3hWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

The dwc2 interrupt handler is quite slow.  On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696MHz (to simulate
real world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us.  Note that SOF interrupts come
every 125 us with high speed USB, so taking > 120 us in the interrupt
handler is a big deal.

The patches here will speed up the interrupt controller significantly.
After this series, I have a hard time seeing the interrupt controller
taking > 20 us and I don't ever see it taking > 30 us ever in my tests
unless I bring the cpufreq back down.  With the cpufreq at 126 MHz I can
still see the interrupt handler take > 50 us, so I'm sure we could
improve this further.  ...but hey, it's a start.

In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications).  Picking this series up will
help your diffstat and likely win you friends.  ;)

===

Steps for gathering data with ftrace:

cd /sys/devices/system/cpu/cpu0/cpufreq/
echo userspace > scaling_governor
echo 696000 > scaling_setspeed

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo "" > trace
echo nop > current_tracer
echo function_graph > current_tracer
echo dwc2_handle_hcd_intr > set_graph_function
echo dwc2_handle_common_intr >> set_graph_function
echo dwc2_handle_hcd_intr > set_ftrace_filter
echo dwc2_handle_common_intr >> set_ftrace_filter
echo funcgraph-abstime > trace_options
echo 70 > tracing_thresh
echo 1 > /sys/kernel/debug/tracing/tracing_on

sleep 2
cat trace

===

NOTE: This series doesn't replace any other patches I've submitted
recently, it merely adds another set of changes that upstream could
benefit from.


Douglas Anderson (3):
  usb: dwc2: rockchip: Make the max_transfer_size automatic
  usb: dwc2: host: Giveback URB in tasklet context
  usb: dwc2: host: Get aligned DMA in a more supported way

 drivers/usb/dwc2/core.c      |  21 +-----
 drivers/usb/dwc2/hcd.c       | 170 ++++++++++++++++++++-----------------------
 drivers/usb/dwc2/hcd.h       |  10 ---
 drivers/usb/dwc2/hcd_intr.c  |  65 -----------------
 drivers/usb/dwc2/hcd_queue.c |   7 +-
 drivers/usb/dwc2/platform.c  |   2 +-
 6 files changed, 85 insertions(+), 190 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/3] usb: dwc2: rockchip: Make the max_transfer_size automatic
  2015-11-04 22:53 ` Douglas Anderson
  (?)
@ 2015-11-04 22:53 ` Douglas Anderson
  -1 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2015-11-04 22:53 UTC (permalink / raw)
  To: John Youn, balbi
  Cc: Yunzhi Li, Heiko Stübner, linux-rockchip, Julius Werner,
	gregory.herrero, yousaf.kaukab, dinguyen, Douglas Anderson,
	johnyoun, gregkh, linux-usb, linux-kernel

Previously we needed to set the max_transfer_size to explicitly be 65535
because the old driver would detect that our hardware could support much
bigger transfers and then would try to do them.  This wouldn't work
since the DMA alignment code couldn't support it.

Later in commit e8f8c14d9da7 ("usb: dwc2: clip max_transfer_size to
65535") upstream added support for clipping this automatically.  Since
that commit it has been OK to just use "-1" (default), but nobody
bothered to change it.

Let's change it to default now for two reasons:
- It's nice to use autodetected params.
- If we can remove the 65535 limit, we can transfer more!

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/usb/dwc2/platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
index 5859b0fa19ee..f26e0c31c07e 100644
--- a/drivers/usb/dwc2/platform.c
+++ b/drivers/usb/dwc2/platform.c
@@ -95,7 +95,7 @@ static const struct dwc2_core_params params_rk3066 = {
 	.host_rx_fifo_size		= 520,	/* 520 DWORDs */
 	.host_nperio_tx_fifo_size	= 128,	/* 128 DWORDs */
 	.host_perio_tx_fifo_size	= 256,	/* 256 DWORDs */
-	.max_transfer_size		= 65535,
+	.max_transfer_size		= -1,
 	.max_packet_count		= -1,
 	.host_channels			= -1,
 	.phy_type			= -1,
-- 
2.6.0.rc2.230.g3dd15c0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
@ 2015-11-04 22:53   ` Douglas Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2015-11-04 22:53 UTC (permalink / raw)
  To: John Youn, balbi
  Cc: Yunzhi Li, Heiko Stübner, linux-rockchip, Julius Werner,
	gregory.herrero, yousaf.kaukab, dinguyen, Douglas Anderson,
	johnyoun, gregkh, linux-usb, linux-kernel

In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.

This speeds up the dwc2 interrupt handler considerably.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/usb/dwc2/hcd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..9e7988950c7a 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
 	kfree(qtd->urb);
 	qtd->urb = NULL;
 
-	spin_unlock(&hsotg->lock);
 	usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
-	spin_lock(&hsotg->lock);
 }
 
 /*
@@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
 	.hcd_priv_size = sizeof(struct wrapper_priv_data),
 
 	.irq = _dwc2_hcd_irq,
-	.flags = HCD_MEMORY | HCD_USB2,
+	.flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
 
 	.start = _dwc2_hcd_start,
 	.stop = _dwc2_hcd_stop,
-- 
2.6.0.rc2.230.g3dd15c0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
@ 2015-11-04 22:53   ` Douglas Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2015-11-04 22:53 UTC (permalink / raw)
  To: John Youn, balbi-l0cyMroinI0
  Cc: gregory.herrero-ral2JQCrhuEAvxtiuMwx3w, Heiko Stübner,
	johnyoun-HKixBCOQz3hWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Douglas Anderson,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	yousaf.kaukab-ral2JQCrhuEAvxtiuMwx3w, Yunzhi Li, Julius Werner,
	dinguyen-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx

In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.

This speeds up the dwc2 interrupt handler considerably.

Signed-off-by: Douglas Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
 drivers/usb/dwc2/hcd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..9e7988950c7a 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
 	kfree(qtd->urb);
 	qtd->urb = NULL;
 
-	spin_unlock(&hsotg->lock);
 	usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
-	spin_lock(&hsotg->lock);
 }
 
 /*
@@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
 	.hcd_priv_size = sizeof(struct wrapper_priv_data),
 
 	.irq = _dwc2_hcd_irq,
-	.flags = HCD_MEMORY | HCD_USB2,
+	.flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
 
 	.start = _dwc2_hcd_start,
 	.stop = _dwc2_hcd_stop,
-- 
2.6.0.rc2.230.g3dd15c0

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] usb: dwc2: host: Get aligned DMA in a more supported way
  2015-11-04 22:53 ` Douglas Anderson
                   ` (2 preceding siblings ...)
  (?)
@ 2015-11-04 22:53 ` Douglas Anderson
  -1 siblings, 0 replies; 15+ messages in thread
From: Douglas Anderson @ 2015-11-04 22:53 UTC (permalink / raw)
  To: John Youn, balbi
  Cc: Yunzhi Li, Heiko Stübner, linux-rockchip, Julius Werner,
	gregory.herrero, yousaf.kaukab, dinguyen, Douglas Anderson,
	johnyoun, gregkh, linux-usb, linux-kernel

All other host controllers who want aligned buffers for DMA do it a
certain way.  Let's do that too instead of working behind the USB core's
back.  This makes our interrupt handler not take forever and also rips
out a lot of code, simplifying things a bunch.

This also has the side effect of removing the 65535 max transfer size
limit.

NOTE: The actual code to allocate the aligned buffers is ripped almost
completely from the tegra EHCI driver.  At some point in the future we
may want to add this functionality to the USB core to share more code
everywhere.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/usb/dwc2/core.c      |  21 +-----
 drivers/usb/dwc2/hcd.c       | 166 ++++++++++++++++++++-----------------------
 drivers/usb/dwc2/hcd.h       |  10 ---
 drivers/usb/dwc2/hcd_intr.c  |  65 -----------------
 drivers/usb/dwc2/hcd_queue.c |   7 +-
 5 files changed, 83 insertions(+), 186 deletions(-)

diff --git a/drivers/usb/dwc2/core.c b/drivers/usb/dwc2/core.c
index ef73e498e98f..7e28cfafcfd8 100644
--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -1830,19 +1830,11 @@ void dwc2_hc_start_transfer(struct dwc2_hsotg *hsotg,
 	}
 
 	if (hsotg->core_params->dma_enable > 0) {
-		dma_addr_t dma_addr;
-
-		if (chan->align_buf) {
-			if (dbg_hc(chan))
-				dev_vdbg(hsotg->dev, "align_buf\n");
-			dma_addr = chan->align_buf;
-		} else {
-			dma_addr = chan->xfer_dma;
-		}
-		dwc2_writel((u32)dma_addr, hsotg->regs + HCDMA(chan->hc_num));
+		dwc2_writel((u32)chan->xfer_dma,
+			    hsotg->regs + HCDMA(chan->hc_num));
 		if (dbg_hc(chan))
 			dev_vdbg(hsotg->dev, "Wrote %08lx to HCDMA(%d)\n",
-				 (unsigned long)dma_addr, chan->hc_num);
+				 (unsigned long)chan->xfer_dma, chan->hc_num);
 	}
 
 	/* Start the split */
@@ -3137,13 +3129,6 @@ int dwc2_get_hwparams(struct dwc2_hsotg *hsotg)
 	width = (hwcfg3 & GHWCFG3_XFER_SIZE_CNTR_WIDTH_MASK) >>
 		GHWCFG3_XFER_SIZE_CNTR_WIDTH_SHIFT;
 	hw->max_transfer_size = (1 << (width + 11)) - 1;
-	/*
-	 * Clip max_transfer_size to 65535. dwc2_hc_setup_align_buf() allocates
-	 * coherent buffers with this size, and if it's too large we can
-	 * exhaust the coherent DMA pool.
-	 */
-	if (hw->max_transfer_size > 65535)
-		hw->max_transfer_size = 65535;
 	width = (hwcfg3 & GHWCFG3_PACKET_SIZE_CNTR_WIDTH_MASK) >>
 		GHWCFG3_PACKET_SIZE_CNTR_WIDTH_SHIFT;
 	hw->max_packet_count = (1 << (width + 4)) - 1;
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 9e7988950c7a..4487f1b262b2 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -598,9 +598,9 @@ static void dwc2_hc_init_split(struct dwc2_hsotg *hsotg,
 	chan->hub_port = (u8)hub_port;
 }
 
-static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
-			       struct dwc2_host_chan *chan,
-			       struct dwc2_qtd *qtd, void *bufptr)
+static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
+			      struct dwc2_host_chan *chan,
+			      struct dwc2_qtd *qtd)
 {
 	struct dwc2_hcd_urb *urb = qtd->urb;
 	struct dwc2_hcd_iso_packet_desc *frame_desc;
@@ -620,7 +620,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
 			else
 				chan->xfer_buf = urb->setup_packet;
 			chan->xfer_len = 8;
-			bufptr = NULL;
 			break;
 
 		case DWC2_CONTROL_DATA:
@@ -647,7 +646,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
 				chan->xfer_dma = hsotg->status_buf_dma;
 			else
 				chan->xfer_buf = hsotg->status_buf;
-			bufptr = NULL;
 			break;
 		}
 		break;
@@ -680,14 +678,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
 
 		chan->xfer_len = frame_desc->length - qtd->isoc_split_offset;
 
-		/* For non-dword aligned buffers */
-		if (hsotg->core_params->dma_enable > 0 &&
-		    (chan->xfer_dma & 0x3))
-			bufptr = (u8 *)urb->buf + frame_desc->offset +
-					qtd->isoc_split_offset;
-		else
-			bufptr = NULL;
-
 		if (chan->xact_pos == DWC2_HCSPLT_XACTPOS_ALL) {
 			if (chan->xfer_len <= 188)
 				chan->xact_pos = DWC2_HCSPLT_XACTPOS_ALL;
@@ -696,63 +686,89 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
 		}
 		break;
 	}
+}
+
+#define DWC2_USB_DMA_ALIGN 4
+
+struct dma_aligned_buffer {
+	void *kmalloc_ptr;
+	void *old_xfer_buffer;
+	u8 data[0];
+};
+
+static void dwc2_free_dma_aligned_buffer(struct urb *urb)
+{
+	struct dma_aligned_buffer *temp;
+
+	if (!(urb->transfer_flags & URB_ALIGNED_TEMP_BUFFER))
+		return;
 
-	return bufptr;
+	temp = container_of(urb->transfer_buffer,
+		struct dma_aligned_buffer, data);
+
+	if (usb_urb_dir_in(urb))
+		memcpy(temp->old_xfer_buffer, temp->data,
+		       urb->transfer_buffer_length);
+	urb->transfer_buffer = temp->old_xfer_buffer;
+	kfree(temp->kmalloc_ptr);
+
+	urb->transfer_flags &= ~URB_ALIGNED_TEMP_BUFFER;
 }
 
-static int dwc2_hc_setup_align_buf(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
-				   struct dwc2_host_chan *chan,
-				   struct dwc2_hcd_urb *urb, void *bufptr)
+static int dwc2_alloc_dma_aligned_buffer(struct urb *urb, gfp_t mem_flags)
 {
-	u32 buf_size;
-	struct urb *usb_urb;
-	struct usb_hcd *hcd;
+	struct dma_aligned_buffer *temp, *kmalloc_ptr;
+	size_t kmalloc_size;
 
-	if (!qh->dw_align_buf) {
-		if (chan->ep_type != USB_ENDPOINT_XFER_ISOC)
-			buf_size = hsotg->core_params->max_transfer_size;
-		else
-			/* 3072 = 3 max-size Isoc packets */
-			buf_size = 3072;
+	if (urb->num_sgs || urb->sg ||
+	    urb->transfer_buffer_length == 0 ||
+	    !((uintptr_t)urb->transfer_buffer & (DWC2_USB_DMA_ALIGN - 1)))
+		return 0;
 
-		qh->dw_align_buf = kmalloc(buf_size, GFP_ATOMIC | GFP_DMA);
-		if (!qh->dw_align_buf)
-			return -ENOMEM;
-		qh->dw_align_buf_size = buf_size;
-	}
+	/* Allocate a buffer with enough padding for alignment */
+	kmalloc_size = urb->transfer_buffer_length +
+		sizeof(struct dma_aligned_buffer) + DWC2_USB_DMA_ALIGN - 1;
 
-	if (chan->xfer_len) {
-		dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
-		usb_urb = urb->priv;
+	kmalloc_ptr = kmalloc(kmalloc_size, mem_flags);
+	if (!kmalloc_ptr)
+		return -ENOMEM;
 
-		if (usb_urb) {
-			if (usb_urb->transfer_flags &
-			    (URB_SETUP_MAP_SINGLE | URB_DMA_MAP_SG |
-			     URB_DMA_MAP_PAGE | URB_DMA_MAP_SINGLE)) {
-				hcd = dwc2_hsotg_to_hcd(hsotg);
-				usb_hcd_unmap_urb_for_dma(hcd, usb_urb);
-			}
-			if (!chan->ep_is_in)
-				memcpy(qh->dw_align_buf, bufptr,
-				       chan->xfer_len);
-		} else {
-			dev_warn(hsotg->dev, "no URB in dwc2_urb\n");
-		}
-	}
+	/* Position our struct dma_aligned_buffer such that data is aligned */
+	temp = PTR_ALIGN(kmalloc_ptr + 1, DWC2_USB_DMA_ALIGN) - 1;
+	temp->kmalloc_ptr = kmalloc_ptr;
+	temp->old_xfer_buffer = urb->transfer_buffer;
+	if (usb_urb_dir_out(urb))
+		memcpy(temp->data, urb->transfer_buffer,
+		       urb->transfer_buffer_length);
+	urb->transfer_buffer = temp->data;
 
-	qh->dw_align_buf_dma = dma_map_single(hsotg->dev,
-			qh->dw_align_buf, qh->dw_align_buf_size,
-			chan->ep_is_in ? DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	if (dma_mapping_error(hsotg->dev, qh->dw_align_buf_dma)) {
-		dev_err(hsotg->dev, "can't map align_buf\n");
-		chan->align_buf = 0;
-		return -EINVAL;
-	}
+	urb->transfer_flags |= URB_ALIGNED_TEMP_BUFFER;
 
-	chan->align_buf = qh->dw_align_buf_dma;
 	return 0;
 }
 
+static int dwc2_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
+				      gfp_t mem_flags)
+{
+	int ret;
+
+	ret = dwc2_alloc_dma_aligned_buffer(urb, mem_flags);
+	if (ret)
+		return ret;
+
+	ret = usb_hcd_map_urb_for_dma(hcd, urb, mem_flags);
+	if (ret)
+		dwc2_free_dma_aligned_buffer(urb);
+
+	return ret;
+}
+
+static void dwc2_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
+{
+	usb_hcd_unmap_urb_for_dma(hcd, urb);
+	dwc2_free_dma_aligned_buffer(urb);
+}
+
 /**
  * dwc2_assign_and_init_hc() - Assigns transactions from a QTD to a free host
  * channel and initializes the host channel to perform the transactions. The
@@ -767,7 +783,6 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 	struct dwc2_host_chan *chan;
 	struct dwc2_hcd_urb *urb;
 	struct dwc2_qtd *qtd;
-	void *bufptr = NULL;
 
 	if (dbg_qh(qh))
 		dev_vdbg(hsotg->dev, "%s(%p,%p)\n", __func__, hsotg, qh);
@@ -829,16 +844,10 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 		!dwc2_hcd_is_pipe_in(&urb->pipe_info))
 		urb->actual_length = urb->length;
 
-	if (hsotg->core_params->dma_enable > 0) {
+	if (hsotg->core_params->dma_enable > 0)
 		chan->xfer_dma = urb->dma + urb->actual_length;
-
-		/* For non-dword aligned case */
-		if (hsotg->core_params->dma_desc_enable <= 0 &&
-		    (chan->xfer_dma & 0x3))
-			bufptr = (u8 *)urb->buf + urb->actual_length;
-	} else {
+	else
 		chan->xfer_buf = (u8 *)urb->buf + urb->actual_length;
-	}
 
 	chan->xfer_len = urb->length - urb->actual_length;
 	chan->xfer_count = 0;
@@ -850,27 +859,7 @@ static int dwc2_assign_and_init_hc(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 		chan->do_split = 0;
 
 	/* Set the transfer attributes */
-	bufptr = dwc2_hc_init_xfer(hsotg, chan, qtd, bufptr);
-
-	/* Non DWORD-aligned buffer case */
-	if (bufptr) {
-		dev_vdbg(hsotg->dev, "Non-aligned buffer\n");
-		if (dwc2_hc_setup_align_buf(hsotg, qh, chan, urb, bufptr)) {
-			dev_err(hsotg->dev,
-				"%s: Failed to allocate memory to handle non-dword aligned buffer\n",
-				__func__);
-			/* Add channel back to free list */
-			chan->align_buf = 0;
-			chan->multi_count = 0;
-			list_add_tail(&chan->hc_list_entry,
-				      &hsotg->free_hc_list);
-			qtd->in_process = 0;
-			qh->channel = NULL;
-			return -ENOMEM;
-		}
-	} else {
-		chan->align_buf = 0;
-	}
+	dwc2_hc_init_xfer(hsotg, chan, qtd);
 
 	if (chan->ep_type == USB_ENDPOINT_XFER_INT ||
 	    chan->ep_type == USB_ENDPOINT_XFER_ISOC)
@@ -2902,6 +2891,9 @@ static struct hc_driver dwc2_hc_driver = {
 
 	.bus_suspend = _dwc2_hcd_suspend,
 	.bus_resume = _dwc2_hcd_resume,
+
+	.map_urb_for_dma	= dwc2_map_urb_for_dma,
+	.unmap_urb_for_dma	= dwc2_unmap_urb_for_dma,
 };
 
 /*
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index f105bada2fd1..8a4486e1a542 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -75,8 +75,6 @@ struct dwc2_qh;
  *                      (micro)frame
  * @xfer_buf:           Pointer to current transfer buffer position
  * @xfer_dma:           DMA address of xfer_buf
- * @align_buf:          In Buffer DMA mode this will be used if xfer_buf is not
- *                      DWORD aligned
  * @xfer_len:           Total number of bytes to transfer
  * @xfer_count:         Number of bytes transferred so far
  * @start_pkt_count:    Packet count at start of transfer
@@ -132,7 +130,6 @@ struct dwc2_host_chan {
 
 	u8 *xfer_buf;
 	dma_addr_t xfer_dma;
-	dma_addr_t align_buf;
 	u32 xfer_len;
 	u32 xfer_count;
 	u16 start_pkt_count;
@@ -241,10 +238,6 @@ enum dwc2_transaction_type {
  * @frame_usecs:        Internal variable used by the microframe scheduler
  * @start_split_frame:  (Micro)frame at which last start split was initialized
  * @ntd:                Actual number of transfer descriptors in a list
- * @dw_align_buf:       Used instead of original buffer if its physical address
- *                      is not dword-aligned
- * @dw_align_buf_size:  Size of dw_align_buf
- * @dw_align_buf_dma:   DMA address for dw_align_buf
  * @qtd_list:           List of QTDs for this QH
  * @channel:            Host channel currently processing transfers for this QH
  * @qh_list_entry:      Entry for QH in either the periodic or non-periodic
@@ -276,9 +269,6 @@ struct dwc2_qh {
 	u16 frame_usecs[8];
 	u16 start_split_frame;
 	u16 ntd;
-	u8 *dw_align_buf;
-	int dw_align_buf_size;
-	dma_addr_t dw_align_buf_dma;
 	struct list_head qtd_list;
 	struct dwc2_host_chan *channel;
 	struct list_head qh_list_entry;
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index bda0b21b850f..84190243b8be 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -465,18 +465,6 @@ static int dwc2_update_urb_state(struct dwc2_hsotg *hsotg,
 		xfer_length = urb->length - urb->actual_length;
 	}
 
-	/* Non DWORD-aligned buffer case handling */
-	if (chan->align_buf && xfer_length) {
-		dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
-		dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
-				chan->qh->dw_align_buf_size,
-				chan->ep_is_in ?
-				DMA_FROM_DEVICE : DMA_TO_DEVICE);
-		if (chan->ep_is_in)
-			memcpy(urb->buf + urb->actual_length,
-					chan->qh->dw_align_buf, xfer_length);
-	}
-
 	dev_vdbg(hsotg->dev, "urb->actual_length=%d xfer_length=%d\n",
 		 urb->actual_length, xfer_length);
 	urb->actual_length += xfer_length;
@@ -558,21 +546,6 @@ static enum dwc2_halt_status dwc2_update_isoc_urb_state(
 		frame_desc->status = 0;
 		frame_desc->actual_length = dwc2_get_actual_xfer_length(hsotg,
 					chan, chnum, qtd, halt_status, NULL);
-
-		/* Non DWORD-aligned buffer case handling */
-		if (chan->align_buf && frame_desc->actual_length) {
-			dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n",
-				 __func__);
-			dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
-					chan->qh->dw_align_buf_size,
-					chan->ep_is_in ?
-					DMA_FROM_DEVICE : DMA_TO_DEVICE);
-			if (chan->ep_is_in)
-				memcpy(urb->buf + frame_desc->offset +
-					qtd->isoc_split_offset,
-					chan->qh->dw_align_buf,
-					frame_desc->actual_length);
-		}
 		break;
 	case DWC2_HC_XFER_FRAME_OVERRUN:
 		urb->error_count++;
@@ -593,21 +566,6 @@ static enum dwc2_halt_status dwc2_update_isoc_urb_state(
 		frame_desc->actual_length = dwc2_get_actual_xfer_length(hsotg,
 					chan, chnum, qtd, halt_status, NULL);
 
-		/* Non DWORD-aligned buffer case handling */
-		if (chan->align_buf && frame_desc->actual_length) {
-			dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n",
-				 __func__);
-			dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
-					chan->qh->dw_align_buf_size,
-					chan->ep_is_in ?
-					DMA_FROM_DEVICE : DMA_TO_DEVICE);
-			if (chan->ep_is_in)
-				memcpy(urb->buf + frame_desc->offset +
-					qtd->isoc_split_offset,
-					chan->qh->dw_align_buf,
-					frame_desc->actual_length);
-		}
-
 		/* Skip whole frame */
 		if (chan->qh->do_split &&
 		    chan->ep_type == USB_ENDPOINT_XFER_ISOC && chan->ep_is_in &&
@@ -673,8 +631,6 @@ static void dwc2_deactivate_qh(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
 	}
 
 no_qtd:
-	if (qh->channel)
-		qh->channel->align_buf = 0;
 	qh->channel = NULL;
 	dwc2_hcd_qh_deactivate(hsotg, qh, continue_split);
 }
@@ -939,14 +895,6 @@ static int dwc2_xfercomp_isoc_split_in(struct dwc2_hsotg *hsotg,
 
 	frame_desc->actual_length += len;
 
-	if (chan->align_buf) {
-		dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
-		dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
-				chan->qh->dw_align_buf_size, DMA_FROM_DEVICE);
-		memcpy(qtd->urb->buf + frame_desc->offset +
-		       qtd->isoc_split_offset, chan->qh->dw_align_buf, len);
-	}
-
 	qtd->isoc_split_offset += len;
 
 	if (frame_desc->actual_length >= frame_desc->length) {
@@ -1169,19 +1117,6 @@ static void dwc2_update_urb_state_abn(struct dwc2_hsotg *hsotg,
 		xfer_length = urb->length - urb->actual_length;
 	}
 
-	/* Non DWORD-aligned buffer case handling */
-	if (chan->align_buf && xfer_length && chan->ep_is_in) {
-		dev_vdbg(hsotg->dev, "%s(): non-aligned buffer\n", __func__);
-		dma_unmap_single(hsotg->dev, chan->qh->dw_align_buf_dma,
-				chan->qh->dw_align_buf_size,
-				chan->ep_is_in ?
-				DMA_FROM_DEVICE : DMA_TO_DEVICE);
-		if (chan->ep_is_in)
-			memcpy(urb->buf + urb->actual_length,
-					chan->qh->dw_align_buf,
-					xfer_length);
-	}
-
 	urb->actual_length += xfer_length;
 
 	hctsiz = dwc2_readl(hsotg->regs + HCTSIZ(chnum));
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 7d8d06cfe3c1..3e1edafc593c 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -232,13 +232,8 @@ struct dwc2_qh *dwc2_hcd_qh_create(struct dwc2_hsotg *hsotg,
  */
 void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 {
-	if (hsotg->core_params->dma_desc_enable > 0) {
+	if (hsotg->core_params->dma_desc_enable > 0)
 		dwc2_hcd_qh_free_ddma(hsotg, qh);
-	} else {
-		/* kfree(NULL) is safe */
-		kfree(qh->dw_align_buf);
-		qh->dw_align_buf_dma = (dma_addr_t)0;
-	}
 	kfree(qh);
 }
 
-- 
2.6.0.rc2.230.g3dd15c0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
@ 2015-11-05  0:30     ` Doug Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Anderson @ 2015-11-05  0:30 UTC (permalink / raw)
  To: John Youn, Felipe Balbi
  Cc: Yunzhi Li, Heiko Stübner, open list:ARM/Rockchip SoC...,
	Julius Werner, Herrero, Gregory, Kaukab, Yousaf, Dinh Nguyen,
	Douglas Anderson, John Youn, Greg Kroah-Hartman, linux-usb,
	linux-kernel, Ming Lei, Alan Stern

Hi,

On Wed, Nov 4, 2015 at 2:53 PM, Douglas Anderson <dianders@chromium.org> wrote:
> In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
> context") support was added to give back the URB in tasklet context.
> Let's take advantage of this in dwc2.
>
> This speeds up the dwc2 interrupt handler considerably.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
>  drivers/usb/dwc2/hcd.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
> index e79baf73c234..9e7988950c7a 100644
> --- a/drivers/usb/dwc2/hcd.c
> +++ b/drivers/usb/dwc2/hcd.c
> @@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
>         kfree(qtd->urb);
>         qtd->urb = NULL;
>
> -       spin_unlock(&hsotg->lock);
>         usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
> -       spin_lock(&hsotg->lock);
>  }
>
>  /*
> @@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
>         .hcd_priv_size = sizeof(struct wrapper_priv_data),
>
>         .irq = _dwc2_hcd_irq,
> -       .flags = HCD_MEMORY | HCD_USB2,
> +       .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
>
>         .start = _dwc2_hcd_start,
>         .stop = _dwc2_hcd_stop,

In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.

I'll see if I can do some investigation about this and also some
benchmarking before and after.  Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.

If anyone else knows better than I, please speak up!  :)

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
@ 2015-11-05  0:30     ` Doug Anderson
  0 siblings, 0 replies; 15+ messages in thread
From: Doug Anderson @ 2015-11-05  0:30 UTC (permalink / raw)
  To: John Youn, Felipe Balbi
  Cc: Herrero, Gregory, Heiko Stübner, John Youn,
	Greg Kroah-Hartman, Ming Lei, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Douglas Anderson, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	open list:ARM/Rockchip SoC...,
	Kaukab, Yousaf, Alan Stern, Yunzhi Li, Julius Werner,
	Dinh Nguyen

Hi,

On Wed, Nov 4, 2015 at 2:53 PM, Douglas Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
> In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
> context") support was added to give back the URB in tasklet context.
> Let's take advantage of this in dwc2.
>
> This speeds up the dwc2 interrupt handler considerably.
>
> Signed-off-by: Douglas Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> ---
>  drivers/usb/dwc2/hcd.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
> index e79baf73c234..9e7988950c7a 100644
> --- a/drivers/usb/dwc2/hcd.c
> +++ b/drivers/usb/dwc2/hcd.c
> @@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct dwc2_qtd *qtd,
>         kfree(qtd->urb);
>         qtd->urb = NULL;
>
> -       spin_unlock(&hsotg->lock);
>         usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
> -       spin_lock(&hsotg->lock);
>  }
>
>  /*
> @@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
>         .hcd_priv_size = sizeof(struct wrapper_priv_data),
>
>         .irq = _dwc2_hcd_irq,
> -       .flags = HCD_MEMORY | HCD_USB2,
> +       .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
>
>         .start = _dwc2_hcd_start,
>         .stop = _dwc2_hcd_stop,

In the ChromeOS gerrit
<https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
points out that for EHCI it was good to take the optimization from
commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
whether we also need a similar change before landing.

I'll see if I can do some investigation about this and also some
benchmarking before and after.  Certainly profiling the interrupt
handler itself showed a huge improvement, but I'd hate to see a
regression elsewhere.

If anyone else knows better than I, please speak up!  :)

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
  2015-11-05  0:30     ` Doug Anderson
  (?)
@ 2015-11-05 15:19     ` Alan Stern
  2015-11-06  0:29       ` Doug Anderson
  -1 siblings, 1 reply; 15+ messages in thread
From: Alan Stern @ 2015-11-05 15:19 UTC (permalink / raw)
  To: Doug Anderson
  Cc: John Youn, Felipe Balbi, Yunzhi Li, Heiko Stübner,
	open list:ARM/Rockchip SoC...,
	Julius Werner, Herrero, Gregory, Kaukab, Yousaf, Dinh Nguyen,
	John Youn, Greg Kroah-Hartman, linux-usb, linux-kernel, Ming Lei

On Wed, 4 Nov 2015, Doug Anderson wrote:

> In the ChromeOS gerrit
> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
> points out that for EHCI it was good to take the optimization from
> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
> whether we also need a similar change before landing.
> 
> I'll see if I can do some investigation about this and also some
> benchmarking before and after.  Certainly profiling the interrupt
> handler itself showed a huge improvement, but I'd hate to see a
> regression elsewhere.
> 
> If anyone else knows better than I, please speak up!  :)

This is a matter of both efficiency and correctness.  Giving back URBs 
in a tasklet is not a simple change.

Have you read the kerneldoc for usb_submit_urb() in 
drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth 
Transfers" is highly relevant.  I don't know how dwc2 goes about 
reserving bandwidth for periodic transfers, but if it relies on the 
endpoint queue being non-empty to maintain a reservation then it will 
be affected by this change.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit
@ 2015-11-05 19:18   ` Heiko Stuebner
  0 siblings, 0 replies; 15+ messages in thread
From: Heiko Stuebner @ 2015-11-05 19:18 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: John Youn, balbi, Yunzhi Li, linux-rockchip, Julius Werner,
	gregory.herrero, yousaf.kaukab, dinguyen, johnyoun, gregkh,
	linux-usb, linux-kernel

Am Mittwoch, 4. November 2015, 14:53:02 schrieb Douglas Anderson:
> The dwc2 interrupt handler is quite slow.  On rk3288 with a few things
> plugged into the ports and with cpufreq locked at 696MHz (to simulate
> real world idle system), I can easily observe dwc2_handle_hcd_intr()
> taking > 120 us, sometimes > 150 us.  Note that SOF interrupts come
> every 125 us with high speed USB, so taking > 120 us in the interrupt
> handler is a big deal.
> 
> The patches here will speed up the interrupt controller significantly.
> After this series, I have a hard time seeing the interrupt controller
> taking > 20 us and I don't ever see it taking > 30 us ever in my tests
> unless I bring the cpufreq back down.  With the cpufreq at 126 MHz I can
> still see the interrupt handler take > 50 us, so I'm sure we could
> improve this further.  ...but hey, it's a start.
> 
> In addition to the speedup, this series also has the advantage of
> simplifying dwc2 and making it more like everyone else (introducing the
> possibility of future simplifications).  Picking this series up will
> help your diffstat and likely win you friends.  ;)

I gave this a simple spin on my veyron-pinky with both a device attached 
directly to the port as well as with an usb-hub in between. Everything was 
still working smoothly, so

Tested-by: Heiko Stuebner <heiko@sntech.de>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit
@ 2015-11-05 19:18   ` Heiko Stuebner
  0 siblings, 0 replies; 15+ messages in thread
From: Heiko Stuebner @ 2015-11-05 19:18 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: gregory.herrero-ral2JQCrhuEAvxtiuMwx3w,
	johnyoun-HKixBCOQz3hWk0Htik3J/w,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, balbi-l0cyMroinI0,
	John Youn, linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	yousaf.kaukab-ral2JQCrhuEAvxtiuMwx3w, Yunzhi Li, Julius Werner,
	dinguyen-yzvPICuk2ABMcg4IHK0kFoH6Mc4MB0Vx

Am Mittwoch, 4. November 2015, 14:53:02 schrieb Douglas Anderson:
> The dwc2 interrupt handler is quite slow.  On rk3288 with a few things
> plugged into the ports and with cpufreq locked at 696MHz (to simulate
> real world idle system), I can easily observe dwc2_handle_hcd_intr()
> taking > 120 us, sometimes > 150 us.  Note that SOF interrupts come
> every 125 us with high speed USB, so taking > 120 us in the interrupt
> handler is a big deal.
> 
> The patches here will speed up the interrupt controller significantly.
> After this series, I have a hard time seeing the interrupt controller
> taking > 20 us and I don't ever see it taking > 30 us ever in my tests
> unless I bring the cpufreq back down.  With the cpufreq at 126 MHz I can
> still see the interrupt handler take > 50 us, so I'm sure we could
> improve this further.  ...but hey, it's a start.
> 
> In addition to the speedup, this series also has the advantage of
> simplifying dwc2 and making it more like everyone else (introducing the
> possibility of future simplifications).  Picking this series up will
> help your diffstat and likely win you friends.  ;)

I gave this a simple spin on my veyron-pinky with both a device attached 
directly to the port as well as with an usb-hub in between. Everything was 
still working smoothly, so

Tested-by: Heiko Stuebner <heiko-4mtYJXux2i+zQB+pC5nmwQ@public.gmane.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
  2015-11-05 15:19     ` Alan Stern
@ 2015-11-06  0:29       ` Doug Anderson
  2015-11-06 15:40         ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Anderson @ 2015-11-06  0:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Youn, Felipe Balbi, Yunzhi Li, Heiko Stübner,
	open list:ARM/Rockchip SoC...,
	Julius Werner, Herrero, Gregory, Kaukab, Yousaf, Dinh Nguyen,
	John Youn, Greg Kroah-Hartman, linux-usb, linux-kernel, Ming Lei

Alan,

On Thu, Nov 5, 2015 at 7:19 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 4 Nov 2015, Doug Anderson wrote:
>
>> In the ChromeOS gerrit
>> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
>> points out that for EHCI it was good to take the optimization from
>> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
>> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
>> whether we also need a similar change before landing.
>>
>> I'll see if I can do some investigation about this and also some
>> benchmarking before and after.  Certainly profiling the interrupt
>> handler itself showed a huge improvement, but I'd hate to see a
>> regression elsewhere.
>>
>> If anyone else knows better than I, please speak up!  :)
>
> This is a matter of both efficiency and correctness.  Giving back URBs
> in a tasklet is not a simple change.
>
> Have you read the kerneldoc for usb_submit_urb() in
> drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth
> Transfers" is highly relevant.  I don't know how dwc2 goes about
> reserving bandwidth for periodic transfers, but if it relies on the
> endpoint queue being non-empty to maintain a reservation then it will
> be affected by this change.

It does look as if you are right and the reservation will end up being
released.  It looks to me like dwc2_deschedule_periodic() is in charge
of releasing the reservation.  I'll work on trying to actually confirm
this.  I guess I need to find a USB test setup where there are enough
devices that I exceed the available time so I can see the brokenness
of my old solution...

I hadn't realized that this was a correctness problem and not just an
optimization problem, so thank you very much for the info!  :)  I ran
with a bunch of USB devices and it worked fine (and performance
improved!) so I figured I was good to go...  Now I've read the
kerneldoc you pointed at and it was very helpful.  As I understand it,
it's considered OK if I copy what EHCI did and release the reservation
if nothing has been scheduled for 5 ms.

Quoting a friend of mine: I'm now all done adding the delayed
reservation release code.  Now I just need to compile it and test it.
:-P  My plan is add some printouts to my current implementation to see
cases where the deferred "unreserve" actually saved us and (to me)
that will help indicate that it's working properly.  Presumably I
won't see this case hit (or not much) without HCD_BH and I will see
this case with HCD_BH.

Please consider this patch "on hold" until my next spin.

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
  2015-11-06  0:29       ` Doug Anderson
@ 2015-11-06 15:40         ` Alan Stern
  2015-11-07  1:26           ` Doug Anderson
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Stern @ 2015-11-06 15:40 UTC (permalink / raw)
  To: Doug Anderson
  Cc: John Youn, Felipe Balbi, Yunzhi Li, Heiko Stübner,
	open list:ARM/Rockchip SoC...,
	Julius Werner, Herrero, Gregory, Kaukab, Yousaf, Dinh Nguyen,
	John Youn, Greg Kroah-Hartman, linux-usb, linux-kernel, Ming Lei

On Thu, 5 Nov 2015, Doug Anderson wrote:

> Alan,
> 
> On Thu, Nov 5, 2015 at 7:19 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Wed, 4 Nov 2015, Doug Anderson wrote:
> >
> >> In the ChromeOS gerrit
> >> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
> >> points out that for EHCI it was good to take the optimization from
> >> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
> >> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
> >> whether we also need a similar change before landing.
> >>
> >> I'll see if I can do some investigation about this and also some
> >> benchmarking before and after.  Certainly profiling the interrupt
> >> handler itself showed a huge improvement, but I'd hate to see a
> >> regression elsewhere.
> >>
> >> If anyone else knows better than I, please speak up!  :)
> >
> > This is a matter of both efficiency and correctness.  Giving back URBs
> > in a tasklet is not a simple change.
> >
> > Have you read the kerneldoc for usb_submit_urb() in
> > drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth
> > Transfers" is highly relevant.  I don't know how dwc2 goes about
> > reserving bandwidth for periodic transfers, but if it relies on the
> > endpoint queue being non-empty to maintain a reservation then it will
> > be affected by this change.
> 
> It does look as if you are right and the reservation will end up being
> released.  It looks to me like dwc2_deschedule_periodic() is in charge
> of releasing the reservation.  I'll work on trying to actually confirm
> this.  I guess I need to find a USB test setup where there are enough
> devices that I exceed the available time so I can see the brokenness
> of my old solution...

You may not need that.  Try a single USB keyboard and see what happens
when the interrupt URB is given back.

> I hadn't realized that this was a correctness problem and not just an
> optimization problem, so thank you very much for the info!  :)  I ran
> with a bunch of USB devices and it worked fine (and performance
> improved!) so I figured I was good to go...  Now I've read the
> kerneldoc you pointed at and it was very helpful.  As I understand it,
> it's considered OK if I copy what EHCI did and release the reservation
> if nothing has been scheduled for 5 ms.

You might also look into the issues surrounding isochronous URBs.  In 
particular, when an URB is submitted, which frames or microframes 
should it be scheduled in?  The problem is that when the submission 
occurs, some of the slots following the previous URB may already have 
expired.  The explanations for EXDEV and EFBIG in 
Documentation/usb/error-codes.txt are relevant here, although terse.

The host controller drivers that I maintain work like this:

	If the endpoint's queue is empty and none of the endpoint's
	URBs are still being given back by the tasklet, pretend that
	the URB_ISO_ASAP flag is set.  Note that this involves
	testing hcd_periodic_completion_in_progress() -- that's
	where switching over to tasklets makes a difference.

	If the URB_ISO_ASAP flag is set, the URB is scheduled for
	the first unallocated slot that hasn't already expired.

	If the flag isn't set, try to schedule the URB for the first
	unallocated slot.  If that means all the isoc packets in the
	URB would be assigned to expired slots, return -EXDEV.  If
	some but not all of the packets would be assigned to expired
	slots, skip them -- only schedule the packets whose slots
	haven't expired yet.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
  2015-11-06 15:40         ` Alan Stern
@ 2015-11-07  1:26           ` Doug Anderson
  2015-11-07 15:09             ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: Doug Anderson @ 2015-11-07  1:26 UTC (permalink / raw)
  To: Alan Stern
  Cc: John Youn, Felipe Balbi, Yunzhi Li, Heiko Stübner,
	open list:ARM/Rockchip SoC...,
	Julius Werner, Herrero, Gregory, Kaukab, Yousaf, Dinh Nguyen,
	John Youn, Greg Kroah-Hartman, linux-usb, linux-kernel, Ming Lei

Alan,

On Fri, Nov 6, 2015 at 7:40 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Thu, 5 Nov 2015, Doug Anderson wrote:
>
>> Alan,
>>
>> On Thu, Nov 5, 2015 at 7:19 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Wed, 4 Nov 2015, Doug Anderson wrote:
>> >
>> >> In the ChromeOS gerrit
>> >> <https://chromium-review.googlesource.com/#/c/310583/> Julius Werner
>> >> points out that for EHCI it was good to take the optimization from
>> >> commit 9118f9eb4f1e ("USB: EHCI: improve interrupt qh unlink") before
>> >> this one.  I'm still trying to learn USB / dwc2 so it's unclear to me
>> >> whether we also need a similar change before landing.
>> >>
>> >> I'll see if I can do some investigation about this and also some
>> >> benchmarking before and after.  Certainly profiling the interrupt
>> >> handler itself showed a huge improvement, but I'd hate to see a
>> >> regression elsewhere.
>> >>
>> >> If anyone else knows better than I, please speak up!  :)
>> >
>> > This is a matter of both efficiency and correctness.  Giving back URBs
>> > in a tasklet is not a simple change.
>> >
>> > Have you read the kerneldoc for usb_submit_urb() in
>> > drivers/usb/core/urb.c?  The portion about "Reserved Bandwidth
>> > Transfers" is highly relevant.  I don't know how dwc2 goes about
>> > reserving bandwidth for periodic transfers, but if it relies on the
>> > endpoint queue being non-empty to maintain a reservation then it will
>> > be affected by this change.
>>
>> It does look as if you are right and the reservation will end up being
>> released.  It looks to me like dwc2_deschedule_periodic() is in charge
>> of releasing the reservation.  I'll work on trying to actually confirm
>> this.  I guess I need to find a USB test setup where there are enough
>> devices that I exceed the available time so I can see the brokenness
>> of my old solution...
>
> You may not need that.  Try a single USB keyboard and see what happens
> when the interrupt URB is given back.

I haven't been able to reproduce any new errors with my patch, but I
have with trace_printk I have proven that it does cause reservations
to be lost and then re-gained, which isn't good.

Note that dwc2 is currently having scheduling problems anyway (even
before my patch).  lyz@rock-chips.com posted an RFC patch to fix
problems he was seeing, but I have a setup that has problems too and
his patch doesn't fix it.  :(  ...so possibly if everything was
working then I could see my patch break things, but as it is now it's
hard to say.

In any case, I've finished testing the patch that adds a 5us delay
before releasing the reservation.  I'll post that so we can be on the
same page.


>> I hadn't realized that this was a correctness problem and not just an
>> optimization problem, so thank you very much for the info!  :)  I ran
>> with a bunch of USB devices and it worked fine (and performance
>> improved!) so I figured I was good to go...  Now I've read the
>> kerneldoc you pointed at and it was very helpful.  As I understand it,
>> it's considered OK if I copy what EHCI did and release the reservation
>> if nothing has been scheduled for 5 ms.
>
> You might also look into the issues surrounding isochronous URBs.  In
> particular, when an URB is submitted, which frames or microframes
> should it be scheduled in?  The problem is that when the submission
> occurs, some of the slots following the previous URB may already have
> expired.  The explanations for EXDEV and EFBIG in
> Documentation/usb/error-codes.txt are relevant here, although terse.
>
> The host controller drivers that I maintain work like this:
>
>         If the endpoint's queue is empty and none of the endpoint's
>         URBs are still being given back by the tasklet, pretend that
>         the URB_ISO_ASAP flag is set.  Note that this involves
>         testing hcd_periodic_completion_in_progress() -- that's
>         where switching over to tasklets makes a difference.
>
>         If the URB_ISO_ASAP flag is set, the URB is scheduled for
>         the first unallocated slot that hasn't already expired.
>
>         If the flag isn't set, try to schedule the URB for the first
>         unallocated slot.  If that means all the isoc packets in the
>         URB would be assigned to expired slots, return -EXDEV.  If
>         some but not all of the packets would be assigned to expired
>         slots, skip them -- only schedule the packets whose slots
>         haven't expired yet.

You're talking to someone who has only been looking at the details of
USB for about 2 days now.  :-P  ...I'm trying to grok all of that, but
I'm not sure I got it all...

I will say that "URB_ISO_ASAP" is not referenced anywhere in dwc2.
Presumably that's a bug (or missing feature).  Would it be terrible if
I just ignored that for now and said that if you try to add something
to the queue and we're currently in the process of waiting for our
timer to expire then you'll just get back -ENOSPC?  dwc2 won't deal
perfectly well if you've very close to exhausting the available
bandwidth, but that could be a task for another day.

-Doug

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context
  2015-11-07  1:26           ` Doug Anderson
@ 2015-11-07 15:09             ` Alan Stern
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Stern @ 2015-11-07 15:09 UTC (permalink / raw)
  To: Doug Anderson
  Cc: John Youn, Felipe Balbi, Yunzhi Li, Heiko Stübner,
	open list:ARM/Rockchip SoC...,
	Julius Werner, Herrero, Gregory, Kaukab, Yousaf, Dinh Nguyen,
	John Youn, Greg Kroah-Hartman, linux-usb, linux-kernel, Ming Lei

On Fri, 6 Nov 2015, Doug Anderson wrote:

> You're talking to someone who has only been looking at the details of
> USB for about 2 days now.  :-P  ...I'm trying to grok all of that, but
> I'm not sure I got it all...
> 
> I will say that "URB_ISO_ASAP" is not referenced anywhere in dwc2.
> Presumably that's a bug (or missing feature).  Would it be terrible if
> I just ignored that for now and said that if you try to add something
> to the queue and we're currently in the process of waiting for our
> timer to expire then you'll just get back -ENOSPC?  dwc2 won't deal
> perfectly well if you've very close to exhausting the available
> bandwidth, but that could be a task for another day.

Sure.  It's not necessary to support everything perfectly, right from
the beginning.

Alan Stern


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-07 15:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-04 22:53 [PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit Douglas Anderson
2015-11-04 22:53 ` Douglas Anderson
2015-11-04 22:53 ` [PATCH 1/3] usb: dwc2: rockchip: Make the max_transfer_size automatic Douglas Anderson
2015-11-04 22:53 ` [PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context Douglas Anderson
2015-11-04 22:53   ` Douglas Anderson
2015-11-05  0:30   ` Doug Anderson
2015-11-05  0:30     ` Doug Anderson
2015-11-05 15:19     ` Alan Stern
2015-11-06  0:29       ` Doug Anderson
2015-11-06 15:40         ` Alan Stern
2015-11-07  1:26           ` Doug Anderson
2015-11-07 15:09             ` Alan Stern
2015-11-04 22:53 ` [PATCH 3/3] usb: dwc2: host: Get aligned DMA in a more supported way Douglas Anderson
2015-11-05 19:18 ` [PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit Heiko Stuebner
2015-11-05 19:18   ` Heiko Stuebner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.