* [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
@ 2011-08-30 16:03 ` ming.lei at canonical.com
  0 siblings, 0 replies; 65+ messages in thread
From: ming.lei at canonical.com @ 2011-08-30 16:03 UTC (permalink / raw)
  To: greg, stern
  Cc: linux-usb, linux-arm-kernel, linux-omap, Ming Lei, Russell King

From: Ming Lei <ming.lei@canonical.com>

This patch introduces a helper, ehci_sync_mem(), which flushes
qtd/qh descriptors into memory immediately on some ARM platforms,
so that the HC can see the up-to-date qtd/qh descriptors ASAP.

This patch fixes a performance bug on the dual-core ARM Cortex-A9
platform, which has been reported on quite a few ARM machines
(OMAP4, Tegra 2, Snowball...); for details see
https://bugs.launchpad.net/bugs/709245.

The patch has been tested OK on an OMAP4 Panda A1 board, where the
performance of 'dd' over USB mass storage increases from
4~5MB/sec to 14~16MB/sec after applying this patch.

Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 drivers/usb/host/ehci-q.c |   18 ++++++++++++++++++
 drivers/usb/host/ehci.h   |   17 +++++++++++++++++
 2 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 0917e3a..2719879 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -995,6 +995,12 @@ static void qh_link_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
 	head->qh_next.qh = qh;
 	head->hw->hw_next = dma;
 
+	/*
+	 * flush qh descriptor into memory immediately,
+	 * see comments in qh_append_tds.
+	 * */
+	ehci_sync_mem();
+
 	qh_get(qh);
 	qh->xacterrs = 0;
 	qh->qh_state = QH_STATE_LINKED;
@@ -1082,6 +1088,18 @@ static struct ehci_qh *qh_append_tds (
 			wmb ();
 			dummy->hw_token = token;
 
+			/*
+			 * Writing to dma coherent buffer on ARM may
+			 * be delayed to reach memory, so HC may not see
+			 * hw_token of dummy qtd in time, which can cause
+			 * the qtd transaction to be executed very late,
+			 * and degrade performance a lot. ehci_sync_mem
+			 * is added to flush 'token' immediately into
+			 * memory, so that ehci can execute the transaction
+			 * ASAP.
+			 * */
+			ehci_sync_mem();
+
 			urb->hcpriv = qh_get (qh);
 		}
 	}
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index cc7d337..313d9d6 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -738,6 +738,23 @@ static inline u32 hc32_to_cpup (const struct ehci_hcd *ehci, const __hc32 *x)
 
 #endif
 
+/*
+ * Writing to dma coherent memory on ARM may be delayed via L2
+ * writing buffer, so introduce the helper which can flush L2 writing
+ * buffer into memory immediately, especially used to flush ehci
+ * descriptor to memory.
+ * */
+#ifdef	CONFIG_ARM_DMA_MEM_BUFFERABLE
+static inline void ehci_sync_mem(void)
+{
+	mb();
+}
+#else
+static inline void ehci_sync_mem(void)
+{
+}
+#endif
+
 /*-------------------------------------------------------------------------*/
 
 #ifndef DEBUG
-- 
1.7.4.1
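
For readers who want to see the shape of the change outside the diff context, here is a minimal userspace sketch of the sequence the patch sets up in qh_append_tds(): fill in the descriptor, order that before the token write, store the token, then ask for the write to be pushed out. Everything named here (fake_qtd, publish_qtd, wmb_sketch, ehci_sync_mem_sketch, sync_calls, SKETCH_DMA_MEM_BUFFERABLE) is a hypothetical stand-in, and C11 fences only approximate the kernel's wmb()/mb(). In particular, a userspace fence does not drain an outer (L2) write buffer, so this models the call order only, not the hardware effect.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for an EHCI qtd; not the kernel's struct ehci_qtd. */
struct fake_qtd {
	uint32_t hw_buf;            /* payload the HC reads once it sees the token */
	_Atomic uint32_t hw_token;  /* the field the HC polls */
};

static unsigned int sync_calls;     /* counts drain requests, illustration only */

/* Assumption: a release fence stands in for the kernel's wmb(). */
static inline void wmb_sketch(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Mirrors the patch: a full barrier when writes may be buffered, else a no-op. */
#define SKETCH_DMA_MEM_BUFFERABLE 1
static inline void ehci_sync_mem_sketch(void)
{
#if SKETCH_DMA_MEM_BUFFERABLE
	atomic_thread_fence(memory_order_seq_cst);
	sync_calls++;
#endif
}

static void publish_qtd(struct fake_qtd *dummy, uint32_t buf, uint32_t token)
{
	assert(dummy);
	dummy->hw_buf = buf;        /* 1. fill in the descriptor body */
	wmb_sketch();               /* 2. order the body before the token */
	atomic_store_explicit(&dummy->hw_token, token, memory_order_relaxed);
	ehci_sync_mem_sketch();     /* 3. request that the token reach memory now */
}
```

Step 2 is about ordering and was already present before the patch; step 3 is the only new part, and it is purely about how soon the token becomes visible to the controller.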


* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 16:03 ` ming.lei at canonical.com
@ 2011-08-30 16:15   ` Alan Stern
  -1 siblings, 0 replies; 65+ messages in thread
From: Alan Stern @ 2011-08-30 16:15 UTC (permalink / raw)
  To: Ming Lei; +Cc: greg, linux-omap, linux-usb, Russell King, linux-arm-kernel

On Wed, 31 Aug 2011 ming.lei@canonical.com wrote:

> From: Ming Lei <ming.lei@canonical.com>
> 
> This patch introduces a helper, ehci_sync_mem(), which flushes
> qtd/qh descriptors into memory immediately on some ARM platforms,
> so that the HC can see the up-to-date qtd/qh descriptors ASAP.
> 
> This patch fixes a performance bug on the dual-core ARM Cortex-A9
> platform, which has been reported on quite a few ARM machines
> (OMAP4, Tegra 2, Snowball...); for details see
> https://bugs.launchpad.net/bugs/709245.
> 
> The patch has been tested OK on an OMAP4 Panda A1 board, where the
> performance of 'dd' over USB mass storage increases from
> 4~5MB/sec to 14~16MB/sec after applying this patch.
> 
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Russell King <linux@arm.linux.org.uk>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  drivers/usb/host/ehci-q.c |   18 ++++++++++++++++++
>  drivers/usb/host/ehci.h   |   17 +++++++++++++++++
>  2 files changed, 35 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> index 0917e3a..2719879 100644
> --- a/drivers/usb/host/ehci-q.c
> +++ b/drivers/usb/host/ehci-q.c
> @@ -995,6 +995,12 @@ static void qh_link_async (struct ehci_hcd *ehci, struct ehci_qh *qh)
>  	head->qh_next.qh = qh;
>  	head->hw->hw_next = dma;
>  
> +	/*
> +	 * flush qh descriptor into memory immediately,
> +	 * see comments in qh_append_tds.
> +	 * */

Comments are supposed to look like this:

	/*
	 * Blah blah blah
	 * blah blah blah
	 */

> +	ehci_sync_mem();
> +
>  	qh_get(qh);
>  	qh->xacterrs = 0;
>  	qh->qh_state = QH_STATE_LINKED;
> @@ -1082,6 +1088,18 @@ static struct ehci_qh *qh_append_tds (
>  			wmb ();
>  			dummy->hw_token = token;
>  
> +			/*
> +			 * Writing to dma coherent buffer on ARM may
> +			 * be delayed to reach memory, so HC may not see
> +			 * hw_token of dummy qtd in time, which can cause
> +			 * the qtd transaction to be executed very late,
> +			 * and degrade performance a lot. ehci_sync_mem
> +			 * is added to flush 'token' immediately into
> +			 * memory, so that ehci can execute the transaction
> +			 * ASAP.
> +			 * */

Here too.

> +			ehci_sync_mem();
> +
>  			urb->hcpriv = qh_get (qh);
>  		}
>  	}
> diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
> index cc7d337..313d9d6 100644
> --- a/drivers/usb/host/ehci.h
> +++ b/drivers/usb/host/ehci.h
> @@ -738,6 +738,23 @@ static inline u32 hc32_to_cpup (const struct ehci_hcd *ehci, const __hc32 *x)
>  
>  #endif
>  
> +/*
> + * Writing to dma coherent memory on ARM may be delayed via L2
> + * writing buffer, so introduce the helper which can flush L2 writing
> + * buffer into memory immediately, especially used to flush ehci
> + * descriptor to memory.
> + * */

And here.

> +#ifdef	CONFIG_ARM_DMA_MEM_BUFFERABLE
> +static inline void ehci_sync_mem()
> +{
> +	mb();
> +}
> +#else
> +static inline void ehci_sync_mem()
> +{
> +}
> +#endif
> +

Except for the formatting of the comments, this is fine.  When you fix 
up the comments, you can add:

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

Alan Stern


* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 16:03 ` ming.lei at canonical.com
@ 2011-08-30 16:38   ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-08-30 16:38 UTC (permalink / raw)
  To: ming.lei
  Cc: greg, stern, linux-usb, linux-arm-kernel, linux-omap, Russell King

On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
> +/*
> + * Writing to dma coherent memory on ARM may be delayed via L2
> + * writing buffer, so introduce the helper which can flush L2 writing
> + * buffer into memory immediately, especially used to flush ehci
> + * descriptor to memory.
> + * */
> +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> +static inline void ehci_sync_mem()
> +{
> +       mb();
> +}
> +#else
> +static inline void ehci_sync_mem()
> +{
> +}
> +#endif
> +

I'm wondering if this doesn't really belong in the DMA API for any
future architectures that can't avoid prolonged write buffering to DMA
coherent memory. IIUC, ARM mitigates this for most drivers by including
an implicit write buffer flush in the mmio write routines. This takes
care of the drivers which write to a mmio device register after writing
something to shared DMA memory. IIUC, this doesn't help ehci because the
host controller is polling to see what the cpu writes to the shared
memory. Other hardware which polls shared memory like that will likely
have the same problem and could use buffer drain helpers as well.

--Mark
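
The difference between the two driver styles described above can be illustrated with a toy model, under heavy assumptions: a one-entry "write buffer" sits between the CPU and memory, an MMIO write is modelled as draining it first (as the implicit flush in the ARM mmio write routines is described above), and the device can only observe what has reached memory. All names (toy_bus, toy_drain, cpu_write_coherent, cpu_write_mmio, device_poll) are hypothetical, and real hardware eventually drains its buffers on its own, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one pending store sits in a write buffer until drained. */
struct toy_bus {
	uint32_t memory;      /* what a polling device can observe */
	uint32_t pending;     /* store still held in the write buffer */
	bool has_pending;
	uint32_t doorbell;    /* device register written via MMIO */
};

/* Push any buffered store out to memory. */
static void toy_drain(struct toy_bus *bus)
{
	if (bus->has_pending) {
		bus->memory = bus->pending;
		bus->has_pending = false;
	}
}

/* Store to coherent DMA memory: lands in the buffer, not yet in memory. */
static void cpu_write_coherent(struct toy_bus *bus, uint32_t val)
{
	toy_drain(bus);       /* one-entry buffer: the older store must leave first */
	bus->pending = val;
	bus->has_pending = true;
}

/* MMIO write: modelled as draining the buffer, then ringing the doorbell. */
static void cpu_write_mmio(struct toy_bus *bus, uint32_t val)
{
	toy_drain(bus);
	bus->doorbell = val;
}

/* Device polling shared memory: sees only what has reached memory. */
static uint32_t device_poll(const struct toy_bus *bus)
{
	return bus->memory;
}
```

In this model a driver that follows cpu_write_coherent() with cpu_write_mmio() gets the drain for free, whereas a device that only calls device_poll() keeps seeing the old value until something calls toy_drain() explicitly, which is the role ehci_sync_mem() plays in the patch.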




* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 16:38   ` Mark Salter
@ 2011-08-30 17:15     ` Alan Stern
  -1 siblings, 0 replies; 65+ messages in thread
From: Alan Stern @ 2011-08-30 17:15 UTC (permalink / raw)
  To: Mark Salter
  Cc: ming.lei, greg, linux-usb, linux-arm-kernel, linux-omap, Russell King

On Tue, 30 Aug 2011, Mark Salter wrote:

> On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
> > +/*
> > + * Writing to dma coherent memory on ARM may be delayed via L2
> > + * writing buffer, so introduce the helper which can flush L2 writing
> > + * buffer into memory immediately, especially used to flush ehci
> > + * descriptor to memory.
> > + * */
> > +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> > +static inline void ehci_sync_mem()
> > +{
> > +       mb();
> > +}
> > +#else
> > +static inline void ehci_sync_mem()
> > +{
> > +}
> > +#endif
> > +
> 
> I'm wondering if this doesn't really belong in the DMA API for any
> future architectures that can't avoid prolonged write buffering to DMA
> coherent memory. IIUC, ARM mitigates this for most drivers by including
> an implicit write buffer flush in the mmio write routines. This takes
> care of the drivers which write to a mmio device register after writing
> something to shared DMA memory. IIUC, this doesn't help ehci because the
> host controller is polling to see what the cpu writes to the shared
> memory. Other hardware which polls shared memory like that will likely
> have the same problem and could use buffer drain helpers as well.

This would be a good thing to define centrally.  Would you like to 
post an RFC on LKML?

Do you know of any other examples of hardware that polls shared DMA 
memory?

Alan Stern



* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 16:38   ` Mark Salter
@ 2011-08-30 17:26     ` Will Deacon
  -1 siblings, 0 replies; 65+ messages in thread
From: Will Deacon @ 2011-08-30 17:26 UTC (permalink / raw)
  To: Mark Salter
  Cc: Russell King, greg, ming.lei, linux-usb, stern, linux-omap,
	linux-arm-kernel

On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
> On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
> > +/*
> > + * Writing to dma coherent memory on ARM may be delayed via L2
> > + * writing buffer, so introduce the helper which can flush L2 writing
> > + * buffer into memory immediately, especially used to flush ehci
> > + * descriptor to memory.
> > + * */
> > +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> > +static inline void ehci_sync_mem()
> > +{
> > +       mb();
> > +}
> > +#else
> > +static inline void ehci_sync_mem()
> > +{
> > +}
> > +#endif
> > +
> 
> I'm wondering if this doesn't really belong in the DMA API for any
> future architectures that can't avoid prolonged write buffering to DMA
> coherent memory. IIUC, ARM mitigates this for most drivers by including
> an implicit write buffer flush in the mmio write routines. This takes
> care of the drivers which write to a mmio device register after writing
> something to shared DMA memory. IIUC, this doesn't help ehci because the
> host controller is polling to see what the cpu writes to the shared
> memory. Other hardware which polls shared memory like that will likely
> have the same problem and could use buffer drain helpers as well.

Right. In this case the buffering is happening at L2 which becomes
noticeable when measuring performance. We also buffer stores at the
CPU (regardless of memory type) but because these tend to become visible
fairly quickly, there isn't a comparable performance problem.

Given that I would expect other architectures to buffer writes at the CPU,
would it not be worth having an API for flushing to L3 (devices)? It seems
like this would be a useful addition to the coherent DMA API on platforms
that handle coherency with non-cacheable memory attributes.

Will


* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 17:26     ` Will Deacon
@ 2011-08-30 17:48         ` Greg KH
  -1 siblings, 0 replies; 65+ messages in thread
From: Greg KH @ 2011-08-30 17:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Salter, ming.lei, Russell King, linux-usb, stern,
	linux-omap, linux-arm-kernel

On Tue, Aug 30, 2011 at 06:26:42PM +0100, Will Deacon wrote:
> On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
> > > On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
> > > +/*
> > > + * Writing to dma coherent memory on ARM may be delayed via L2
> > > + * writing buffer, so introduce the helper which can flush L2 writing
> > > + * buffer into memory immediately, especially used to flush ehci
> > > + * descriptor to memory.
> > > + * */
> > > +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> > > +static inline void ehci_sync_mem()
> > > +{
> > > +       mb();
> > > +}
> > > +#else
> > > +static inline void ehci_sync_mem()
> > > +{
> > > +}
> > > +#endif
> > > +
> > 
> > I'm wondering if this doesn't really belong in the DMA API for any
> > future architectures that can't avoid prolonged write buffering to DMA
> > coherent memory. IIUC, ARM mitigates this for most drivers by including
> > an implicit write buffer flush in the mmio write routines. This takes
> > care of the drivers which write to a mmio device register after writing
> > something to shared DMA memory. IIUC, this doesn't help ehci because the
> > host controller is polling to see what the cpu writes to the shared
> > memory. Other hardware which polls shared memory like that will likely
> > have the same problem and could use buffer drain helpers as well.
> 
> Right. In this case the buffering is happening at L2 which becomes
> noticeable when measuring performance. We also buffer stores at the
> CPU (regardless of memory type) but because these tend to become visible
> fairly quickly, there isn't a comparable performance problem.
> 
> Given that I would expect other architectures to buffer writes at the CPU,
> would it not be worth having an API for flushing to L3 (devices)? It seems
> like this would be a useful addition to the coherent DMA API on platforms
> that handle coherency with non-cacheable memory attributes.

I agree, this seems to be a "new" type of barrier that is needed, as the
code comment above seems to go against what the kernel memory barrier
documentation says about what a memory barrier really does on the
hardware.

greg k-h

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 17:48         ` Greg KH
@ 2011-08-30 17:54           ` Will Deacon
  -1 siblings, 0 replies; 65+ messages in thread
From: Will Deacon @ 2011-08-30 17:54 UTC (permalink / raw)
  To: Greg KH
  Cc: Russell King, ming.lei, linux-usb, stern, Mark Salter,
	linux-omap, linux-arm-kernel

On Tue, Aug 30, 2011 at 06:48:59PM +0100, Greg KH wrote:
> On Tue, Aug 30, 2011 at 06:26:42PM +0100, Will Deacon wrote:
> > On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
> > > On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
> > > > +/*
> > > > + * Writing to dma coherent memory on ARM may be delayed via L2
> > > > + * writing buffer, so introduce the helper which can flush L2 writing
> > > > + * buffer into memory immediately, especially used to flush ehci
> > > > + * descriptor to memory.
> > > > + * */
> > > > +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> > > > +static inline void ehci_sync_mem()
> > > > +{
> > > > +       mb();
> > > > +}
> > > > +#else
> > > > +static inline void ehci_sync_mem()
> > > > +{
> > > > +}
> > > > +#endif
> > > > +
> > > 
> > > I'm wondering if this doesn't really belong in the DMA API for any
> > > future architectures that can't avoid prolonged write buffering to DMA
> > > coherent memory. IIUC, ARM mitigates this for most drivers by including
> > > an implicit write buffer flush in the mmio write routines. This takes
> > > care of the drivers which write to a mmio device register after writing
> > > something to shared DMA memory. IIUC, this doesn't help ehci because the
> > > host controller is polling to see what the cpu writes to the shared
> > > memory. Other hardware which polls shared memory like that will likely
> > > have the same problem and could use buffer drain helpers as well.
> > 
> > Right. In this case the buffering is happening at L2 which becomes
> > noticeable when measuring performance. We also buffer stores at the
> > CPU (regardless of memory type) but because these tend to become visible
> > fairly quickly, there isn't a comparable performance problem.
> > 
> > Given that I would expect other architectures to buffer writes at the CPU,
> > would it not be worth having an API for flushing to L3 (devices)? It seems
> > like this would be a useful addition to the coherent DMA API on platforms
> > that handle coherency with non-cacheable memory attributes.
> 
> I agree, this seems to be a "new" type of barrier that is needed, as the
> code comment above seems to go against what the kernel memory barrier
> documentation says about what a memory barrier really does on the
> hardware.

Although this doesn't have anything to do with ordering; it's all to do with
immediacy so I think we should try to avoid using the term `barrier'. If
this can be made part of the coherent DMA API, that might be the best place
for it (I can't think of any other areas this is needed given that the
streaming DMA API and I/O accessors already deal with it).

Will
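
To illustrate the immediacy point above, here is a minimal userspace sketch
of a polled descriptor update followed by an explicit sync. The names
dma_coherent_write_sync(), struct fake_qh and fake_qh_link() are hypothetical
and are not kernel APIs. The C11 fence is only a stand-in for the mb() used
in the posted patch; it does not drain any L2 write buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Counts sync calls so the sketch can be checked in userspace. */
static unsigned int sync_calls;

/*
 * Hypothetical name for a "make my coherent writes visible now" helper.
 * The posted patch uses mb() for this on ARM; the fence below is an
 * assumption made only so the sketch compiles outside the kernel.
 */
static inline void dma_coherent_write_sync(void)
{
	atomic_thread_fence(memory_order_seq_cst);
	sync_calls++;
}

/* Simplified stand-in for a descriptor the host controller polls. */
struct fake_qh {
	volatile uint32_t hw_next;
};

/*
 * Link a new descriptor behind head. No MMIO register write follows, so
 * nothing else would push the store out promptly; the explicit sync call
 * is what asks for immediacy.
 */
static void fake_qh_link(struct fake_qh *head, uint32_t new_dma)
{
	assert(head != NULL);
	head->hw_next = new_dma;
	dma_coherent_write_sync();
}
```

The helper takes no arguments, like ehci_sync_mem() in the patch. Whether a
real coherent DMA API call would want a device or buffer argument is not
discussed here.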

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 17:15     ` Alan Stern
@ 2011-08-30 18:45       ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-08-30 18:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: ming.lei, greg, linux-usb, linux-arm-kernel, linux-omap, Russell King

On Tue, 2011-08-30 at 13:15 -0400, Alan Stern wrote:
> On Tue, 30 Aug 2011, Mark Salter wrote:
> 
> > On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
> > > +/*
> > > + * Writing to dma coherent memory on ARM may be delayed via L2
> > > + * writing buffer, so introduce the helper which can flush L2 writing
> > > + * buffer into memory immediately, especially used to flush ehci
> > > + * descriptor to memory.
> > > + * */
> > > +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> > > +static inline void ehci_sync_mem()
> > > +{
> > > +       mb();
> > > +}
> > > +#else
> > > +static inline void ehci_sync_mem()
> > > +{
> > > +}
> > > +#endif
> > > +
> > 
> > I'm wondering if this doesn't really belong in the DMA API for any
> > future architectures that can't avoid prolonged write buffering to DMA
> > coherent memory. IIUC, ARM mitigates this for most drivers by including
> > an implicit write buffer flush in the mmio write routines. This takes
> > care of the drivers which write to a mmio device register after writing
> > something to shared DMA memory. IIUC, this doesn't help ehci because the
> > host controller is polling to see what the cpu writes to the shared
> > memory. Other hardware which polls shared memory like that will likely
> > have the same problem and could use buffer drain helpers as well.
> 
> This would be a good thing to define centrally.  Would you like to 
> post an RFC on LKML?

Yes, I can take a stab at that.

> 
> Do you know of any other examples of hardware that polls shared DMA 
> memory?

Not offhand nor after a quick search. I don't think it is a common
way of doing things.

--Mark
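
For contrast, the MMIO-write mitigation quoted above can be modelled in
userspace roughly as follows. This is a sketch under stated assumptions:
fake_writel(), fake_write_buffer_flush() and fake_submit() are hypothetical
names, and the flush only records an event instead of touching any hardware
write buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Event log so the ordering can be inspected in userspace. */
enum fake_event { EV_NONE, EV_FLUSH, EV_MMIO_STORE };
static enum fake_event fake_log[4];
static unsigned int fake_log_len;

static void fake_log_event(enum fake_event ev)
{
	if (fake_log_len < sizeof(fake_log) / sizeof(fake_log[0]))
		fake_log[fake_log_len++] = ev;
}

/* Stand-in for the write buffer flush; it only records that it ran. */
static void fake_write_buffer_flush(void)
{
	fake_log_event(EV_FLUSH);
}

/*
 * Hypothetical model of an MMIO write routine with the implicit flush:
 * earlier stores to shared DMA memory are pushed out before the device
 * register is written.
 */
static void fake_writel(uint32_t val, volatile uint32_t *reg)
{
	assert(reg != NULL);
	fake_write_buffer_flush();
	*reg = val;
	fake_log_event(EV_MMIO_STORE);
}

/* Doorbell-style driver step: fill the descriptor, then ring the doorbell. */
static void fake_submit(volatile uint32_t *desc, volatile uint32_t *doorbell)
{
	*desc = 0xabcd0001u;       /* descriptor in coherent DMA memory */
	fake_writel(1u, doorbell); /* the flush happens here as a side effect */
}
```

A driver whose device polls shared memory never reaches the fake_writel()
step, which is why it gets no flush for free.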



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 17:54           ` Will Deacon
@ 2011-08-31  0:23               ` Chen Peter-B29397
  -1 siblings, 0 replies; 65+ messages in thread
From: Chen Peter-B29397 @ 2011-08-31  0:23 UTC (permalink / raw)
  To: Will Deacon
  Cc: Greg KH, Mark Salter, ming.lei, Russell King, linux-usb, stern,
	linux-omap, linux-arm-kernel



On Aug 31, 2011, at 1:54 AM, Will Deacon wrote:

> On Tue, Aug 30, 2011 at 06:48:59PM +0100, Greg KH wrote:
>> On Tue, Aug 30, 2011 at 06:26:42PM +0100, Will Deacon wrote:
>>> On Tue, Aug 30, 2011 at 05:38:30PM +0100, Mark Salter wrote:
>>>> On Wed, 2011-08-31 at 00:03 +0800, ming.lei@canonical.com wrote:
>>>>> +/*
>>>>> + * Writing to dma coherent memory on ARM may be delayed via L2
>>>>> + * writing buffer, so introduce the helper which can flush L2 writing
>>>>> + * buffer into memory immediately, especially used to flush ehci
>>>>> + * descriptor to memory.
>>>>> + * */
>>>>> +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
>>>>> +static inline void ehci_sync_mem()
>>>>> +{
>>>>> +       mb();
>>>>> +}
>>>>> +#else
>>>>> +static inline void ehci_sync_mem()
>>>>> +{
>>>>> +}
>>>>> +#endif
>>>>> +
>>>> 
>>>> I'm wondering if this doesn't really belong in the DMA API for any
>>>> future architectures that can't avoid prolonged write buffering to DMA
>>>> coherent memory. IIUC, ARM mitigates this for most drivers by including
>>>> an implicit write buffer flush in the mmio write routines. This takes
>>>> care of the drivers which write to a mmio device register after writing
>>>> something to shared DMA memory. IIUC, this doesn't help ehci because the
>>>> host controller is polling to see what the cpu writes to the shared
>>>> memory. Other hardware which polls shared memory like that will likely
>>>> have the same problem and could use buffer drain helpers as well.
>>> 
>>> Right. In this case the buffering is happening at L2 which becomes
>>> noticeable when measuring performance. We also buffer stores at the
>>> CPU (regardless of memory type) but because these tend to become visible
>>> fairly quickly, there isn't a comparable performance problem.
>>> 
>>> Given that I would expect other architectures to buffer writes at the CPU,
>>> would it not be worth having an API for flushing to L3 (devices)? It seems
>>> like this would be a useful addition to the coherent DMA API on platforms
>>> that handle coherency with non-cacheable memory attributes.
>> 
>> I agree, this seems to be a "new" type of barrier that is needed, as the
>> code comment above seems to go against what the kernel memory barrier
>> documentation says about what a memory barrier really does on the
>> hardware.
> 
> Although this doesn't have anything to do with ordering; it's all to do with
> immediacy so I think we should try to avoid using the term `barrier'. If
> this can be made part of the coherent DMA API, that might be the best place
> for it (I can't think of any other areas this is needed given that the
> streaming DMA API and I/O accessors already deal with it).

I agree with you. I hit the same issue in both the USB device driver (when
adding the next dTD pointer while the current one is being handled) and the
USB host driver (the performance issue this thread has discussed) on the
Freescale i.MX6Q platform (4 cores, ARM SMP).
So now I need to add two barriers in two different drivers.

One question: why did this write buffer issue not happen on a UP ARMv7
platform, whose DMA buffer is also uncached but bufferable?

> 
> Will

Best regards,
Peter Chen
 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 17:54           ` Will Deacon
@ 2011-08-31  0:56             ` Ming Lei
  -1 siblings, 0 replies; 65+ messages in thread
From: Ming Lei @ 2011-08-31  0:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: Greg KH, Russell King, linux-usb, stern, Mark Salter, linux-omap,
	linux-arm-kernel

On Wed, Aug 31, 2011 at 1:54 AM, Will Deacon <will.deacon@arm.com> wrote:

> Although this doesn't have anything to do with ordering; it's all to do with
> immediacy so I think we should try to avoid using the term `barrier'. If
> this can be made part of the coherent DMA API, that might be the best place
> for it (I can't think of any other areas this is needed given that the
> streaming DMA API and I/O accessors already deal with it).

Agree too.


thanks,
--
Ming Lei

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31  0:23               ` Chen Peter-B29397
@ 2011-08-31  8:49                 ` Will Deacon
  -1 siblings, 0 replies; 65+ messages in thread
From: Will Deacon @ 2011-08-31  8:49 UTC (permalink / raw)
  To: Chen Peter-B29397
  Cc: Russell King, Greg KH, ming.lei, linux-usb, stern, Mark Salter,
	linux-omap, linux-arm-kernel

On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> is also uncached but bufferable?

Which CPU was on this platform?

Will

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31  8:49                 ` Will Deacon
@ 2011-08-31 12:33                   ` Chen Peter-B29397
  -1 siblings, 0 replies; 65+ messages in thread
From: Chen Peter-B29397 @ 2011-08-31 12:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Chen Peter-B29397, ming.lei, linux-usb, Greg KH,
	Mark Salter, stern, linux-omap, linux-arm-kernel



Best regards,
Peter Chen
 



On Aug 31, 2011, at 4:49 PM, Will Deacon wrote:

> On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
>> One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
>> is also uncached but bufferable?
> 
> Which CPU was on this platform?

Cortex-A8 UP system (Freescale i.MX5x platform)
> 
> Will
> 

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31  8:49                 ` Will Deacon
@ 2011-08-31 13:43                   ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-08-31 13:43 UTC (permalink / raw)
  To: Will Deacon
  Cc: Chen Peter-B29397, Greg KH, ming.lei, Russell King, linux-usb,
	stern, linux-omap, linux-arm-kernel

On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
> On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> > One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> > is also uncached but bufferable?
> 
> Which CPU was on this platform?

Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
nosmp on the commandline, I see 20.3MB/s.

Can someone explain why nosmp would make such a difference?

--Mark



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 13:43                   ` Mark Salter
@ 2011-08-31 15:21                     ` Will Deacon
  -1 siblings, 0 replies; 65+ messages in thread
From: Will Deacon @ 2011-08-31 15:21 UTC (permalink / raw)
  To: Mark Salter
  Cc: Russell King, Chen Peter-B29397, ming.lei, linux-usb, stern,
	Greg KH, linux-omap, linux-arm-kernel

On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
> On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
> > On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> > > One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> > > is also uncached but bufferable?
> > 
> > Which CPU was on this platform?
> 
> Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
> usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
> nosmp on the commandline, I see 20.3MB/s.
> 
> Can someone explain why nosmp would make such a difference?

Oh gawd, that's horrible. I have a feeling it's probably a separate issue
though, caused by:

omap_modify_auxcoreboot0(0x200, 0xfffffdff);

in boot_secondary for OMAP. Unfortunately I have no idea what that line is
doing because it ends up talking to the secure monitor.

Will

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 15:21                     ` Will Deacon
@ 2011-08-31 15:27                       ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-08-31 15:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Chen Peter-B29397, ming.lei, linux-usb, stern,
	Greg KH, linux-omap, linux-arm-kernel

On Wed, 2011-08-31 at 16:21 +0100, Will Deacon wrote:
> On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
> > On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
> > > On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> > > > One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> > > > is also uncached but bufferable?
> > > 
> > > Which CPU was on this platform?
> > 
> > Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
> > usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
> > nosmp on the commandline, I see 20.3MB/s.
> > 
> > Can someone explain why nosmp would make such a difference?
> 
> Oh gawd, that's horrible. I have a feeling it's probably a separate issue
> though, caused by:
> 
> omap_modify_auxcoreboot0(0x200, 0xfffffdff);
> 
> in boot_secondary for OMAP. Unfortunately I have no idea what that line is
> doing because it ends up talking to the secure monitor.

Okay, I may poke around a bit with that to see if I can get a better
understanding.

With the patched ehci-q.c, I see no noticeable difference between smp
and nosmp. Both get me around 23.5MB/s with my setup.

--Mark



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 15:27                       ` Mark Salter
@ 2011-08-31 16:12                         ` Marc Zyngier
  -1 siblings, 0 replies; 65+ messages in thread
From: Marc Zyngier @ 2011-08-31 16:12 UTC (permalink / raw)
  To: Mark Salter
  Cc: Russell King, Chen Peter-B29397, ming.lei, linux-usb,
	Will Deacon, stern, Greg KH, linux-omap, linux-arm-kernel

On 31/08/11 16:27, Mark Salter wrote:
> On Wed, 2011-08-31 at 16:21 +0100, Will Deacon wrote:
>> On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
>>> On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
>>>> On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
>>>>> One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
>>>>> is also uncached but bufferable?
>>>>
>>>> Which CPU was on this platform?
>>>
>>> Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
>>> usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
>>> nosmp on the commandline, I see 20.3MB/s.
>>>
>>> Can someone explain why nosmp would make such a difference?
>>
>> Oh gawd, that's horrible. I have a feeling it's probably a separate issue
>> though, caused by:
>>
>> omap_modify_auxcoreboot0(0x200, 0xfffffdff);
>>
>> in boot_secondary for OMAP. Unfortunately I have no idea what that line is
>> doing because it ends up talking to the secure monitor.
> 
> Okay, I may poke around a bit with that to see if I can get a better
> understanding.
> 
> With the patched ehci-q.c, I see no noticeable difference between smp
> and nosmp. Both get me around 23.5MB/s with my setup.

Oddly enough, this patch doesn't do anything on my Tegra setup. In both
cases, I get around 17MB/s from a crap SD card plugged in a USB reader.

This leads me to suspect that this issue is very much OMAP4 specific.
Can anyone verify this theory on some other A9 platforms?

Cheers,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 16:12                         ` Marc Zyngier
@ 2011-08-31 16:55                           ` Marc Dietrich
  -1 siblings, 0 replies; 65+ messages in thread
From: Marc Dietrich @ 2011-08-31 16:55 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Marc Zyngier, Mark Salter, Russell King, Chen Peter-B29397,
	ming.lei, linux-usb, Will Deacon, stern, Greg KH, linux-omap

Am Mittwoch 31 August 2011, 18:12:48 schrieb Marc Zyngier:
> [...]
> Oddly enough, this patch doesn't do anything on my Tegra setup. In both
> cases, I get around 17MB/s from a crap SD card plugged in a USB reader.
> 
> This leads me to suspect that this issue is very much OMAP4 specific.
> Can anyone verify this theory on some other A9 platforms?

That's odd. On my Tegra2 (on ac100) it boosts the transfer rate from 7 to 
17 MB/s. 

Marc



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 15:21                     ` Will Deacon
@ 2011-08-31 17:46                       ` Nicolas Pitre
  -1 siblings, 0 replies; 65+ messages in thread
From: Nicolas Pitre @ 2011-08-31 17:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Salter, Russell King, Chen Peter-B29397, ming.lei,
	linux-usb, stern, Greg KH, linux-omap, linux-arm-kernel

On Wed, 31 Aug 2011, Will Deacon wrote:

> On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
> > On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
> > > On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> > > > One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> > > > is also uncached but bufferable?
> > > 
> > > Which CPU was on this platform?
> > 
> > Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
> > usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
> > nosmp on the commandline, I see 20.3MB/s.
> > 
> > Can someone explain why nosmp would make such a difference?
> 
> Oh gawd, that's horrible. I have a feeling it's probably a separate issue
> though, caused by:
> 
> omap_modify_auxcoreboot0(0x200, 0xfffffdff);
> 
> in boot_secondary for OMAP. Unfortunately I have no idea what that line is
> doing because it ends up talking to the secure monitor.

Well, this issue is apparently affecting other ARMv7 implementations
too.  In which case this code in arch/arm/mm/mmu.c could be responsible:

                if (is_smp()) {
                        /*
                         * Mark memory with the "shared" attribute
                         * for SMP systems
                         */
                        user_pgprot |= L_PTE_SHARED;
                        kern_pgprot |= L_PTE_SHARED;
                        vecs_pgprot |= L_PTE_SHARED;
                        mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
                        mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
                        mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
                        mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
                        mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
                        mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
                        mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
                        mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
                }

However I don't see the nosmp kernel argument having any effect on the 
result from is_smp().
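
To make the distinction concrete, here is a small userspace model (not
kernel code). It assumes, as observed above, that is_smp() reflects the
CPU and kernel configuration while nosmp only limits how many CPUs get
brought up. The struct, the function names and the bit value are all
made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit value; the real L_PTE_SHARED is defined by the kernel. */
#define SKETCH_PTE_SHARED (1u << 0)

struct boot_state {
	bool cpu_is_smp_capable; /* stands in for what is_smp() reports */
	bool nosmp_on_cmdline;   /* stands in for the nosmp argument */
};

/* Models the mmu.c snippet: only the is_smp() result feeds the attribute. */
static unsigned int sketch_kern_pgprot(const struct boot_state *s)
{
	unsigned int prot = 0;

	if (s->cpu_is_smp_capable)
		prot |= SKETCH_PTE_SHARED;
	return prot;
}

/* Models the assumed effect of nosmp: fewer CPUs booted, nothing else. */
static unsigned int sketch_cpus_to_boot(const struct boot_state *s,
					unsigned int present)
{
	return s->nosmp_on_cmdline ? 1 : present;
}
```

Under these assumptions, two boots that differ only in nosmp end up with
the same page protection bits, so the shared attribute alone would not
explain the difference in throughput.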


Nicolas

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 17:46                       ` Nicolas Pitre
@ 2011-08-31 17:51                         ` Will Deacon
  -1 siblings, 0 replies; 65+ messages in thread
From: Will Deacon @ 2011-08-31 17:51 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Russell King, Greg KH, Chen Peter-B29397, ming.lei, linux-usb,
	stern, Mark Salter, linux-omap, linux-arm-kernel

On Wed, Aug 31, 2011 at 06:46:50PM +0100, Nicolas Pitre wrote:
> On Wed, 31 Aug 2011, Will Deacon wrote:
> 
> > On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
> > > On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
> > > > On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> > > > > One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> > > > > is also uncached but bufferable?
> > > > 
> > > > Which CPU was on this platform?
> > > 
> > > Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
> > > usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
> > > nosmp on the commandline, I see 20.3MB/s.
> > > 
> > > Can someone explain why nosmp would make such a difference?
> > 
> > Oh gawd, that's horrible. I have a feeling it's probably a separate issue
> > though, caused by:
> > 
> > omap_modify_auxcoreboot0(0x200, 0xfffffdff);
> > 
> > in boot_secondary for OMAP. Unfortunately I have no idea what that line is
> > doing because it ends up talking to the secure monitor.
> 
> Well, this issue is apparently affecting other ARMv7 implementations
> too.  In which case this code in arch/arm/mm/mmu.c could be responsible:
> 
>                 if (is_smp()) {
>                         /*
>                          * Mark memory with the "shared" attribute
>                          * for SMP systems
>                          */
>                         user_pgprot |= L_PTE_SHARED;
>                         kern_pgprot |= L_PTE_SHARED;
>                         vecs_pgprot |= L_PTE_SHARED;
>                         mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
>                         mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
>                         mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
>                         mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
>                         mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
>                         mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
>                         mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
>                         mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
>                 }
> 
> However I don't see the nosmp kernel argument having any effect on the 
> result from is_smp().

Yes, the first thing that sprung to mind was the shared attribute, but like
you say, that doesn't seem to be affected by the nosmp command line
argument.

Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
CPU during boot (by commenting out most of smp_init). In this case, I/O
performance was good until we tried to online the secondary CPU. The online
failed but after that the I/O performance was certainly degraded.

Will

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 17:51                         ` Will Deacon
@ 2011-08-31 18:19                             ` Rob Herring
  -1 siblings, 0 replies; 65+ messages in thread
From: Rob Herring @ 2011-08-31 18:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Nicolas Pitre, Russell King, Greg KH, Chen Peter-B29397,
	ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz, Mark Salter,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 08/31/2011 12:51 PM, Will Deacon wrote:
> On Wed, Aug 31, 2011 at 06:46:50PM +0100, Nicolas Pitre wrote:
>> On Wed, 31 Aug 2011, Will Deacon wrote:
>>
>>> On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
>>>> On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
>>>>> On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
>>>>>> One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
>>>>>> is also uncached but bufferable?
>>>>>
>>>>> Which CPU was on this platform?
>>>>
>>>> Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
>>>> usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
>>>> nosmp on the commandline, I see 20.3MB/s.
>>>>
>>>> Can someone explain why nosmp would make such a difference?
>>>
>>> Oh gawd, that's horrible. I have a feeling it's probably a separate issue
>>> though, caused by:
>>>
>>> omap_modify_auxcoreboot0(0x200, 0xfffffdff);
>>>
>>> in boot_secondary for OMAP. Unfortunately I have no idea what that line is
>>> doing because it ends up talking to the secure monitor.
>>
>> Well, this issue is apparently affecting other ARMv7 implementations
>> too.  In which case this code in arch/arm/mm/mmu.c could be responsible:
>>
>>                 if (is_smp()) {
>>                         /*
>>                          * Mark memory with the "shared" attribute
>>                          * for SMP systems
>>                          */
>>                         user_pgprot |= L_PTE_SHARED;
>>                         kern_pgprot |= L_PTE_SHARED;
>>                         vecs_pgprot |= L_PTE_SHARED;
>>                         mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
>>                         mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
>>                         mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
>>                         mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
>>                         mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
>>                         mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
>>                         mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
>>                         mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
>>                 }
>>
>> However I don't see the nosmp kernel argument having any effect on the 
>> result from is_smp().
> 
> Yes, the first thing that sprung to mind was the shared attribute, but like
> you say, that doesn't seem to be affected by the nosmp command line
> argument.
> 
> Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
> CPU during boot (by commenting out most of smp_init). In this case, I/O
> performance was good until we tried to online the secondary CPU. The online
> failed but after that the I/O performance was certainly degraded.
> 

Was the SCU enabled at that point? One diff between nosmp boot and
offlining the 2nd core would be that the SCU remains enabled in the
latter case. I think the SCU does not get enabled for nosmp.

Do we really know which write buffer the data is sitting in? Some
experiments to only flush the L1 write buffer would be interesting.
Perhaps something executed on the 2nd core has a mb which doesn't help
for SMP because the other core's L1 write buffer is not flushed, but it
helps for nosmp because everything runs on 1 core and any occurrence of
a mb will flush all data out. I wouldn't expect the behavior to be so
consistent though. Could it be something is not visible to the other
core rather than not visible to the EHCI controller?

Rob

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 18:19                             ` Rob Herring
@ 2011-08-31 18:35                               ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-08-31 18:35 UTC (permalink / raw)
  To: Rob Herring
  Cc: Will Deacon, Nicolas Pitre, Russell King, Greg KH,
	Chen Peter-B29397, ming.lei, linux-usb, stern, linux-omap,
	linux-arm-kernel

On Wed, 2011-08-31 at 13:19 -0500, Rob Herring wrote:
> On 08/31/2011 12:51 PM, Will Deacon wrote:
> > On Wed, Aug 31, 2011 at 06:46:50PM +0100, Nicolas Pitre wrote:
> >> On Wed, 31 Aug 2011, Will Deacon wrote:
> >>
> >>> On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
> >>>> On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
> >>>>> On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
> >>>>>> One question: why did this write buffer issue not happen on a UP ARMv7 platform, whose DMA buffer
> >>>>>> is also uncached but bufferable?
> >>>>>
> >>>>> Which CPU was on this platform?
> >>>>
> >>>> Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
> >>>> usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
> >>>> nosmp on the commandline, I see 20.3MB/s.
> >>>>
> >>>> Can someone explain why nosmp would make such a difference?
> >>>
> >>> Oh gawd, that's horrible. I have a feeling it's probably a separate issue
> >>> though, caused by:
> >>>
> >>> omap_modify_auxcoreboot0(0x200, 0xfffffdff);
> >>>
> >>> in boot_secondary for OMAP. Unfortunately I have no idea what that line is
> >>> doing because it ends up talking to the secure monitor.
> >>
> >> Well, this issue is apparently affecting other ARMv7 implementations
> >> too.  In which case this code in arch/arm/mm/mmu.c could be responsible:
> >>
> >>                 if (is_smp()) {
> >>                         /*
> >>                          * Mark memory with the "shared" attribute
> >>                          * for SMP systems
> >>                          */
> >>                         user_pgprot |= L_PTE_SHARED;
> >>                         kern_pgprot |= L_PTE_SHARED;
> >>                         vecs_pgprot |= L_PTE_SHARED;
> >>                         mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
> >>                         mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
> >>                         mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
> >>                         mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
> >>                         mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
> >>                         mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
> >>                         mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
> >>                         mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
> >>                 }
> >>
> >> However I don't see the nosmp kernel argument having any effect on the 
> >> result from is_smp().
> > 
> > Yes, the first thing that sprung to mind was the shared attribute, but like
> > you say, that doesn't seem to be affected by the nosmp command line
> > argument.
> > 
> > Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
> > CPU during boot (by commenting out most of smp_init). In this case, I/O
> > performance was good until we tried to online the secondary CPU. The online
> > failed but after that the I/O performance was certainly degraded.
> > 
> 
> Was the SCU enabled at that point? One diff between nosmp boot and
> offlining the 2nd core would be that the SCU remains enabled in the
> latter case. I think the SCU does not get enabled for nosmp.
> 
> Do we really know which write buffer the data is sitting in? Some
> experiments to only flush the L1 write buffer would be interesting.
> Perhaps something executed on the 2nd core has a mb which doesn't help
> for SMP because the other core's L1 write buffer is not flushed, but it
> helps for nosmp because everything runs on 1 core and any occurrence of
> a mb will flush all data out. I wouldn't expect the behavior to be so
> consistent though. Could it be something is not visible to the other
> core rather than not visible to the EHCI controller?

One experiment I did a few days ago was to pin processes and interrupts
to core#0 (except IPI and local timer). This didn't make any noticeable
difference.

My current understanding is that the writes are getting hung up in a
cache and not a write buffer. I am seeing delays of 10-15ms between
queuing the urb and getting an interrupt for urb completion. That
drops to a few hundred microseconds with the explicit flushing added
to the ehci driver. I don't see how any write buffer could hold data
that long without draining out on its own. What I see seems to suggest
that the memory is only coherent among the cores and not coherent for
CPU writes/device reads. Adding just a dsb() for the ehci flush does
not help. An outer_sync() is also necessary.
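
As a rough sketch of what that combination looks like, assuming the
flush helper does nothing more than issue the two operations in order:
the stubs below are userspace stand-ins that only record the call
sequence, since the real dsb() and outer_sync() are ARM kernel
primitives, and the function body is a guess for illustration, not the
body from the patch.

```c
#include <string.h>

/* Records which stand-in primitives ran, in order. */
static char sync_trace[32];

/* Stand-in for dsb(): in the kernel this drains the CPU's own write buffer. */
static void dsb_stub(void)
{
	strcat(sync_trace, "dsb;");
}

/* Stand-in for outer_sync(): in the kernel this syncs the outer (L2) cache. */
static void outer_sync_stub(void)
{
	strcat(sync_trace, "outer_sync;");
}

/* Hypothetical body for a flush helper such as ehci_sync_mem(). */
static void ehci_sync_mem_sketch(void)
{
	sync_trace[0] = '\0';
	dsb_stub();        /* on its own this was not enough in my tests */
	outer_sync_stub(); /* push the descriptor writes past the outer cache */
}
```

In the driver such a helper would be called right after the qh/qtd
descriptor is updated and before the HC is expected to fetch it.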

--Mark





^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 18:35                               ` Mark Salter
@ 2011-08-31 18:49                                 ` Rob Herring
  -1 siblings, 0 replies; 65+ messages in thread
From: Rob Herring @ 2011-08-31 18:49 UTC (permalink / raw)
  To: Mark Salter
  Cc: Russell King, Nicolas Pitre, Chen Peter-B29397, ming.lei,
	linux-usb, Will Deacon, stern, Greg KH, linux-omap,
	linux-arm-kernel

On 08/31/2011 01:35 PM, Mark Salter wrote:
> On Wed, 2011-08-31 at 13:19 -0500, Rob Herring wrote:
>> On 08/31/2011 12:51 PM, Will Deacon wrote:
>>> On Wed, Aug 31, 2011 at 06:46:50PM +0100, Nicolas Pitre wrote:
>>>> On Wed, 31 Aug 2011, Will Deacon wrote:
>>>>
>>>>> On Wed, Aug 31, 2011 at 02:43:33PM +0100, Mark Salter wrote:
>>>>>> On Wed, 2011-08-31 at 09:49 +0100, Will Deacon wrote:
>>>>>>> On Wed, Aug 31, 2011 at 01:23:47AM +0100, Chen Peter-B29397 wrote:
>>>>>>>> One question: why this write buffer issue did not happen at UP ARM V7 platform, whose dma buffer
>>>>>>>> also uncache, but bufferable?
>>>>>>>
>>>>>>> Which CPU was on this platform?
>>>>>>
>>>>>> Using a 3.1.0-rc4+ kernel on a Pandaboard, and running 'hdparm -t' on a
>>>>>> usb disk drive, I see ~5.8MB/s read speed. Same kernel, but passing
>>>>>> nosmp on the commandline, I see 20.3MB/s.
>>>>>>
>>>>>> Can someone explain why nosmp would make such a difference?
>>>>>
>>>>> Oh gawd, that's horrible. I have a feeling it's probably a separate issue
>>>>> though, caused by:
>>>>>
>>>>> omap_modify_auxcoreboot0(0x200, 0xfffffdff);
>>>>>
>>>>> in boot_secondary for OMAP. Unfortunately I have no idea what that line is
>>>>> doing because it ends up talking to the secure monitor.
>>>>
>>>> Well, this issue is apparently affecting other ARMv7 implementations
>>>> too.  In which case this code in arch/arm/mm/mmu.c could be responsible:
>>>>
>>>>                 if (is_smp()) {
>>>>                         /*
>>>>                          * Mark memory with the "shared" attribute
>>>>                          * for SMP systems
>>>>                          */
>>>>                         user_pgprot |= L_PTE_SHARED;
>>>>                         kern_pgprot |= L_PTE_SHARED;
>>>>                         vecs_pgprot |= L_PTE_SHARED;
>>>>                         mem_types[MT_DEVICE_WC].prot_sect |= PMD_SECT_S;
>>>>                         mem_types[MT_DEVICE_WC].prot_pte |= L_PTE_SHARED;
>>>>                         mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
>>>>                         mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
>>>>                         mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
>>>>                         mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
>>>>                         mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
>>>>                         mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
>>>>                 }
>>>>
>>>> However I don't see the nosmp kernel argument having any effect on the 
>>>> result from is_smp().
>>>
>>> Yes, the first thing that sprung to mind was the shared attribute, but like
>>> you say, that doesn't seem to be affected by the nosmp command line
>>> argument.
>>>
>>> Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
>>> CPU during boot (by commenting out most of smp_init). In this case, I/O
>>> performance was good until we tried to online the secondary CPU. The online
>>> failed but after that the I/O performance was certainly degraded.
>>>
>>
>> Was the SCU enabled at that point? One diff between nosmp boot and
>> offlining the 2nd core would be that the SCU remains enabled in the
>> latter case. I think the SCU does not get enabled for nosmp.
>>
>> Do we really know which write buffer the data is sitting? Some
>> experiments to only flush the L1 write buffer would be interesting.
>> Perhaps something executed on the 2nd core has a mb which doesn't help
>> for SMP because the other core's L1 write buffer is not flushed, but it
>> helps for nosmp because everything runs on 1 core and any occurrence of
>> a mb will flush all data out. I wouldn't expect the behavior to be so
>> consistent though. Could it be something is not visible to the other
>> core rather than not visible to the EHCI controller?
> 
> One experiment I did a few days ago was to pin processes and interrupts
> to core#0 (except IPI and local timer). This didn't make any noticeable
> difference.
> 
> My current understanding is that the writes are getting hung up in a
> cache and not a write buffer. I am seeing delays of 10-15ms between
> queuing the urb and getting an interrupt for urb completion. That
> drops to a few hundred microseconds with the explicit flushing added
> to the ehci driver. I don't see how any write buffer could hold data
> that long without draining out on its own. What I see seems to suggest
> that the memory is only coherent among the cores and not coherent for
> CPU writes/device reads. Adding just a dsb() for the ehci flush does
> not help. An outer_sync() is also necessary.
> 
An outer_sync will only drain the write buffer of the L2. It does not
flush the cache though. If the write buffer does in fact keep data as
long as possible (until it needs a free slot or the line is full), then
long delays to write out data are certainly possible. The exact
operation is not documented AFAIR.
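
For reference, a minimal userspace sketch of the barrier sequence being discussed. It assumes that mb() on ARM with CONFIG_ARM_DMA_MEM_BUFFERABLE expands to dsb() followed by outer_sync(), which is my understanding of the ARM definition of that era and should be checked against the tree in use. The stubs and barrier_log are hypothetical stand-ins that only record the call order; they do not touch any hardware.

```c
#include <string.h>

/* Hypothetical userspace stand-ins: each stub only records that it ran. */
static char barrier_log[32];

/* In the kernel this drains the CPU (L1) write buffer. */
static void dsb(void)
{
	strcat(barrier_log, "dsb;");
}

/* In the kernel this drains the outer (L2) cache's write buffer. */
static void outer_sync(void)
{
	strcat(barrier_log, "outer_sync;");
}

/* Assumed shape of ARM mb() when CONFIG_ARM_DMA_MEM_BUFFERABLE is set. */
#define mb() do { dsb(); outer_sync(); } while (0)

/* The helper proposed in the patch, reduced to its barrier. */
static void ehci_sync_mem(void)
{
	mb();
}
```

This matches the observation quoted above that dsb() alone was not enough: the outer_sync() step is the part that reaches the L2 write buffer.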

Rob

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 18:49                                 ` Rob Herring
@ 2011-08-31 18:58                                     ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-08-31 18:58 UTC (permalink / raw)
  To: Rob Herring
  Cc: Will Deacon, Nicolas Pitre, Russell King, Greg KH,
	Chen Peter-B29397, ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, 2011-08-31 at 13:49 -0500, Rob Herring wrote:
> An outer_sync will only drain the write buffer of the L2. It does not
> flush the cache though. If the write buffer does in fact keep data as
> long as possible (until it needs a free slot or the line is full), then
> long delays to write out data are certainly possible. The exact
> operation is not documented AFAIR.

Ah, thanks for that. I really haven't been paying close attention to
ARMv6/7 hardware and I'm in the process of playing catch up.

--Mark


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 18:19                             ` Rob Herring
@ 2011-08-31 19:35                               ` Will Deacon
  -1 siblings, 0 replies; 65+ messages in thread
From: Will Deacon @ 2011-08-31 19:35 UTC (permalink / raw)
  To: Rob Herring
  Cc: Russell King, Nicolas Pitre, Greg KH, ming.lei, linux-usb, stern,
	Mark Salter, Chen Peter-B29397, linux-omap, linux-arm-kernel

On Wed, Aug 31, 2011 at 07:19:33PM +0100, Rob Herring wrote:
> On 08/31/2011 12:51 PM, Will Deacon wrote:
> > Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
> > CPU during boot (by commenting out most of smp_init). In this case, I/O
> > performance was good until we tried to online the secondary CPU. The online
> > failed but after that the I/O performance was certainly degraded.
> > 
> 
> Was the SCU enabled at that point? One diff between nosmp boot and
> offlining the 2nd core would be that the SCU remains enabled in the
> latter case. I think the SCU does not get enabled for nosmp.

Our rudimentary test (printing out the SCU control register during boot)
showed that it *was* enabled for nosmp. I think this is due to the secure
world having to do that on OMAP so it's probably not true for other
platforms.

Will

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 16:55                           ` Marc Dietrich
@ 2011-09-01 10:34                             ` Marc Zyngier
  -1 siblings, 0 replies; 65+ messages in thread
From: Marc Zyngier @ 2011-09-01 10:34 UTC (permalink / raw)
  To: Marc Dietrich
  Cc: Russell King, Greg KH, ming.lei, linux-usb, Will Deacon, stern,
	Mark Salter, Chen Peter-B29397, linux-omap, linux-arm-kernel

Hi Marc,

On 31/08/11 17:55, Marc Dietrich wrote:
> Am Mittwoch 31 August 2011, 18:12:48 schrieb Marc Zyngier:
>> [...]
>> Oddly enough, this patch doesn't do anything on my Tegra setup. In both
>> cases, I get around 17MB/s from a crap SD card plugged in a USB reader.
>>
>> This leads me to suspect that this issue is very much OMAP4 specific.
>> Can anyone verify this theory on other some A9 platforms?
> 
> That's odd. On my Tegra2 (on ac100) it boosts the transfer rate from 7 to 
> 17 MB/s. 

I'm using a Harmony board. Could you share your kernel version, .config
and dmesg?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-09-01 10:34                             ` Marc Zyngier
@ 2011-09-01 11:13                                 ` Marc Dietich
  -1 siblings, 0 replies; 65+ messages in thread
From: Marc Dietich @ 2011-09-01 11:13 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Russell King,
	Greg KH, Chen Peter-B29397, ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Will Deacon,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz, Mark Salter,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA, Stephen Warren

> Hi Marc,

^ditto,

> On 31/08/11 17:55, Marc Dietrich wrote:
> > Am Mittwoch 31 August 2011, 18:12:48 schrieb Marc Zyngier:
> >> [...]
> >> Oddly enough, this patch doesn't do anything on my Tegra setup. In both
> >> cases, I get around 17MB/s from a crap SD card plugged in a USB reader.
> >> 
> >> This leads me to suspect that this issue is very much OMAP4 specific.
> >> Can anyone verify this theory on other some A9 platforms?
> > 
> > That's odd. On my Tegra2 (on ac100) it boosts the transfer rate from 7 to
> > 17 MB/s.
> 
> I'm using a Harmony board. Could you share your kernel version, .config
> and dmesg?
> 
> Thanks,
> 
> 	M.

I use the chromiumos tree (for chromiumos 2.6.38 kernel: 
http://git.chromium.org/chromiumos/third_party/kernel-next.git) with some 
additions to make it run on the AC100. This modified tree is on 
git://gitorious.org/~marvin24/ac100/marvin24s-kernel.git. The config is 
paz00_defconfig and a dmesg you can get e.g. from http://pastebin.com/9uVfDWma
(it's not very current, but it should be sufficient).

I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test on. 

Btw, this is the patch I used:
http://gitorious.org/~marvin24/ac100/marvin24s-kernel/commit/cce8d9e25d009a45c219a6ad0b9ac4e27d034ab0

	Marc

^ permalink raw reply	[flat|nested] 65+ messages in thread

* RE: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-09-01 11:13                                 ` Marc Dietich
@ 2011-09-01 19:08                                   ` Stephen Warren
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Warren @ 2011-09-01 19:08 UTC (permalink / raw)
  To: Marc Dietich, Marc Zyngier
  Cc: Russell King, Greg KH, ming.lei, linux-usb, Will Deacon, stern,
	Mark Salter, Chen Peter-B29397, linux-tegra, linux-omap,
	linux-arm-kernel

Marc Dietich wrote at Thursday, September 01, 2011 5:14 AM:
> I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test on.

Here are the results I found:

Harmony:
Tegra USB3 -> SMSC9514 hub: NOT affected
(Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change this)

Seaboard (springbank; clamshell):
Tegra USB1 -> no hub: Affected

Seaboard (seaboard non-clamshell):
Tegra USB1 -> no hub: Affected
Tegra USB3 -> no hub: Affected

TrimSlice:
Tegra USB3 -> unknown hub: Affected

This implies there's something different about Harmony.

Is the USB hub a clue? Seaboard doesn't have one, and although I don't
know what model TrimSlice uses, I assume it's different since I know
TrimSlice's Ethernet is not the same as Harmony's.

I don't see anything in board-harmony.c vs. board-seaboard.c that'd affect
anything USB-related.

Perhaps there's some kind of bootloader or BCT difference. However, my
Harmony and both Seaboards both use (a very old) U-Boot and BCT from
ChromeOS, so I don't imagine there's actually much difference there.

-- 
nvpublic

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-30 16:38   ` Mark Salter
                     ` (2 preceding siblings ...)
  (?)
@ 2011-09-01 23:16   ` Grant Grundler
  -1 siblings, 0 replies; 65+ messages in thread
From: Grant Grundler @ 2011-09-01 23:16 UTC (permalink / raw)
  To: linux-omap

Mark Salter <msalter <at> redhat.com> writes:
> On Wed, 2011-08-31 at 00:03 +0800, ming.lei <at> canonical.com wrote:
...
> > +#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
> > +static inline void ehci_sync_mem()
> > +{
> > +       mb();
> > +}
> > +#else
> > +static inline void ehci_sync_mem()
> > +{
> > +}
> > +#endif
...

Consider moving the #ifdef inside the function. :)
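
A minimal sketch of what that could look like, assuming the helper keeps the name ehci_sync_mem() from the patch. The mb() macro and the mb_calls counter below are hypothetical userspace stand-ins so the fragment builds outside the kernel, and the config symbol is defined here only to exercise the barrier path.

```c
/* Hypothetical userspace stand-in for the kernel's mb(); it counts calls. */
static int mb_calls;
#define mb() do { mb_calls++; } while (0)

/* Defined here only so the sketch takes the barrier path. */
#define CONFIG_ARM_DMA_MEM_BUFFERABLE 1

/*
 * One definition instead of two: the body is empty when the config
 * symbol is not set, so callers need no #ifdef of their own.
 */
static inline void ehci_sync_mem(void)
{
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
	mb();
#endif
}
```

Declaring the parameter list as (void) rather than () also gives the helper a proper prototype in C.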

> I'm wondering if this doesn't really belong in the DMA API for any
> future architectures that can't avoid prolonged write buffering to DMA
> coherent memory.

I suspect the semantics needed exist in mmiowb().  Honestly, I'm not an expert
in either EHCI USB or ARM platforms, but it looks that way to me after reading
the thread.

BTW, ChromiumOS has had a performance bug open for quite a while:
    http://code.google.com/p/chromium-os/issues/detail?id=11503

Vince Palatin said he would try it.


mmiowb() was originally implemented for SGI SN2 (IA64/Altix) machines but
applies to any case where MMIO (any uncached writes really) and "regular" (ie
cached) memory writes need to be ordered.

The "wmb() vs mmiowb()" thread explains this pretty well:
    http://www.mail-archive.com/linux-ia64@vger.kernel.org/msg03379.html


> IIUC, ARM mitigates this for most drivers by including
> an implicit write buffer flush in the mmio write routines.

The write flush is part of the MMIO writel() semantics AFAIK because MMIO writes
have to be strongly ordered (PCI requirement).

> This takes
> care of the drivers which write to a mmio device register after writing
> something to shared DMA memory. IIUC, this doesn't help ehci because the
> host controller is polling to see what the cpu writes to the shared
> memory.

Write flush in writel() does help for the "not RUNNING" -> "RUNNING" state
transition (I'm looking at qh_link_async()). It doesn't help for "already
RUNNING" case.

> Other hardware which polls shared memory like that will likely
> have the same problem and could use buffer drain helpers as well.

Yup - agreed.

cheers,
grant



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-09-01 19:08                                   ` Stephen Warren
@ 2011-09-02  9:50                                     ` Marc Zyngier
  -1 siblings, 0 replies; 65+ messages in thread
From: Marc Zyngier @ 2011-09-02  9:50 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Russell King, Greg KH, ming.lei, linux-usb, Will Deacon,
	Marc Dietich, stern, Mark Salter, Chen Peter-B29397, linux-tegra,
	linux-omap, linux-arm-kernel

On 01/09/11 20:08, Stephen Warren wrote:
> Marc Dietich wrote at Thursday, September 01, 2011 5:14 AM:
>> I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test on.
> 
> Here are the results I found:
> 
> Harmony:
> Tegra USB3 -> SMSC9514 hub: NOT affected
> (Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change this)
> 
> Seaboard (springbank; clamshell):
> Tegra USB1 -> no hub: Affected
> 
> Seaboard (seaboard non-clamshell):
> Tegra USB1 -> no hub: Affected
> Tegra USB3 -> no hub: Affected
> 
> TrimSlice:
> Tegra USB3 -> unknown hub: Affected
> 
> This implies there's something different about Harmony.
> 
> Is the USB hub a clue? Seaboard doesn't have one, and although I don't
> know what model TrimSlice uses, I assume it's different since I know
> TrimSlice's Ethernet is not the same as Harmony's.

Panda has the exact same USB hub configuration, and is affected. So we
can rule this out.

> I don't see anything in board-harmony.c vs. board-seaboard.c that'd affect
> anything USB-related.
> 
> Perhaps there's some kind of bootloader or BCT difference. However, my
> Harmony and both Seaboards both use (a very old) U-Boot and BCT from
> ChromeOS, so I don't imagine there's actually much difference there.

I just noticed something else. Harmony is fast *most of the time*. In
about one in 3 reboots, I get the slow behavior. When USB is fast, I
also have I2C interrupts "screaming":

 85:     294321          0       GIC  tegra-i2c
116:          0          0       GIC  tegra-i2c
118:      98542          0       GIC  tps6586x

This is a couple of seconds after boot.

When USB is slow, I see the following:
[    0.385270] tps6586x 3-0034: Chip ID read failed: -121
[    0.390584] tps6586x: probe of 3-0034 failed with error -5

... and I2C interrupt is quiet.

The I2C interrupt handler calls writel(), which does a cache sync. That
would explain the "fast" behavior of Harmony.
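
A toy model of that theory, as a sketch only. It assumes that ARM writel() issues a write barrier before the relaxed register store, and that this barrier drains a write buffer shared with descriptor writes. Every name below (cpu_store_desc, wmb_model, writel_model, i2c_reg) is hypothetical, and whether the hardware really behaves this way is exactly what this thread is trying to establish.

```c
#include <stdint.h>

/* Toy model: a store to "DMA memory" sits in one pending slot until drained. */
static uint32_t pending_desc, visible_desc;
static int has_pending;

/* CPU writes a descriptor word; the host controller cannot see it yet. */
static void cpu_store_desc(uint32_t v)
{
	pending_desc = v;
	has_pending = 1;
}

/* Stand-in for dsb() + outer_sync(): drains the modelled write buffer. */
static void wmb_model(void)
{
	if (has_pending) {
		visible_desc = pending_desc;
		has_pending = 0;
	}
}

/* A register belonging to an unrelated device, e.g. the I2C controller. */
static uint32_t i2c_reg;

/* Assumed shape of ARM writel(): barrier first, then the relaxed store. */
static void writel_model(uint32_t v, uint32_t *reg)
{
	wmb_model();
	*reg = v;
}
```

In this model, a call such as writel_model(1, &i2c_reg) from a frequently firing interrupt handler makes a previously stored descriptor visible as a side effect, which is the mechanism suggested above for the "fast" boots.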

Do you see the same thing on your board?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-09-01 19:08                                   ` Stephen Warren
@ 2011-09-02 11:13                                       ` Marc Dietich
  -1 siblings, 0 replies; 65+ messages in thread
From: Marc Dietich @ 2011-09-02 11:13 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Marc Zyngier, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Russell King, Greg KH, Chen Peter-B29397,
	ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Will Deacon,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz, Mark Salter,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

Just another measurement point:

> Stephen Warren wrote at Thursday:
>
> Here are the results I found:
> 
> Harmony:
> Tegra USB3 -> SMSC9514 hub: NOT affected
> (Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change
> this)
> 
> Seaboard (springbank; clamshell):
> Tegra USB1 -> no hub: Affected
> 
> Seaboard (seaboard non-clamshell):
> Tegra USB1 -> no hub: Affected
> Tegra USB3 -> no hub: Affected
> 
> TrimSlice:
> Tegra USB3 -> unknown hub: Affected

PAZ00:
ULPI -> SMSC 2512: Affected
Tegra USB3 -> SMSC 2514: Affected

The patch also cures high latencies/packet drops on wifi connected to ULPI via 
the 2512 hub. The pen drive was connected to USB3/2514.

> This implies there's something different about Harmony.
> 
> Is the USB hub a clue? Seaboard doesn't have one, and although I don't
> know what model TrimSlice uses, I assume it's different since I know
> TrimSlice's Ethernet is not the same as Harmony's.
> 
> I don't see anything in board-harmony.c vs. board-seaboard.c that'd affect
> anything USB-related.
> 
> Perhaps there's some kind of bootloader or BCT difference. However, my
> Harmony and both Seaboards all use (a very old) U-Boot and BCT from
> ChromeOS, so I don't imagine there's actually much difference there.

* RE: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-09-02  9:50                                     ` Marc Zyngier
@ 2011-09-02 17:07                                       ` Stephen Warren
  -1 siblings, 0 replies; 65+ messages in thread
From: Stephen Warren @ 2011-09-02 17:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Russell King, Greg KH, ming.lei, linux-usb, Will Deacon,
	Marc Dietich, stern, Mark Salter, Chen Peter-B29397, linux-tegra,
	linux-omap, linux-arm-kernel

Marc Zyngier wrote at Friday, September 02, 2011 3:51 AM:
> On 01/09/11 20:08, Stephen Warren wrote:
> > Marc Dietich wrote at Thursday, September 01, 2011 5:14 AM:
> >> I'll add Stephen Warren from NVIDIA to the CC list. He has more HW to test on.
> >
> > Here are the results I found:
> >
> > Harmony:
> > Tegra USB3 -> SMSC9514 hub: NOT affected
> > (Unplugging LAN cable, or disabling SMSC9514 LAN driver doesn't change this)
...
> I just noticed something else. Harmony is fast *most of the time*. In
> about one in 3 reboots, I get the slow behavior. When USB is fast, I
> also have I2C interrupts "screaming":
> 
>  85:     294321          0       GIC  tegra-i2c
> 116:          0          0       GIC  tegra-i2c
> 118:      98542          0       GIC  tps6586x
> 
> This is a couple of seconds after boot.
> 
> When USB is slow, I see the following:
> [    0.385270] tps6586x 3-0034: Chip ID read failed: -121
> [    0.390584] tps6586x: probe of 3-0034 failed with error -5
> 
> ... and I2C interrupt is quiet.
> 
> The I2C interrupt handler calls writel(), which does a cache sync. That
> would explain the "fast" behavior of Harmony.
> 
> Do you see the same thing on your board?

Yes, I re-ran the test a few more times and see those exact same symptoms.

In a case with the screaming I2C interrupts and fast USB, I then did:

echo 3-0034 > /sys/bus/i2c/drivers/tps6586x/unbind
(I got a kernel BUG and bash crashed here, but just logged back in)

which stopped the I2C interrupt handler from running. I then re-ran the
test and saw the slow USB speed.

So, now I think *all* platforms (boards) are affected, right?

-- 
nvpublic

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-08-31 19:35                               ` Will Deacon
@ 2011-09-08 22:41                                 ` Mark Salter
  -1 siblings, 0 replies; 65+ messages in thread
From: Mark Salter @ 2011-09-08 22:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: Russell King, Nicolas Pitre, Chen Peter-B29397, ming.lei,
	linux-usb, Greg KH, stern, linux-omap, linux-arm-kernel

On Wed, 2011-08-31 at 20:35 +0100, Will Deacon wrote:
> On Wed, Aug 31, 2011 at 07:19:33PM +0100, Rob Herring wrote:
> > On 08/31/2011 12:51 PM, Will Deacon wrote:
> > > Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
> > > CPU during boot (by commenting out most of smp_init). In this case, I/O
> > > performance was good until we tried to online the secondary CPU. The online
> > > failed but after that the I/O performance was certainly degraded.
> > > 
> > 
> > Was the SCU enabled at that point? One diff between nosmp boot and
> > offlining the 2nd core would be that the SCU remains enabled in the
> > latter case. I think the SCU does not get enabled for nosmp.
> 
> Our rudimentary test (printing out the SCU control register during boot)
> showed that it *was* enabled for nosmp. I think this is due to the secure
> world having to do that on OMAP so it's probably not true for other
> platforms.

I've done a little test and found that turning on the MMU of the second
core causes the problem to show up. I patched head.S to stop the second
core in an infinite loop just before it turns on the MMU. The system
continues booting on core#0 and I see ~20MB/s with hdparm -t to an
attached USB disk. The same setup, but with the second core stopped in an
infinite loop just after the MMU is enabled, shows ~5MB/s. So whatever is
going wrong, it's not because of anything the second core is doing beyond
turning on its MMU and running an empty loop.

--Mark

* Re: [PATCH] usb: ehci: make HC see up-to-date qh/qtd descriptor ASAP
  2011-09-08 22:41                                 ` Mark Salter
@ 2011-10-31  6:49                                     ` Pandita, Vikram
  -1 siblings, 0 replies; 65+ messages in thread
From: Pandita, Vikram @ 2011-10-31  6:49 UTC (permalink / raw)
  To: Mark Salter
  Cc: Will Deacon, Russell King, Nicolas Pitre, Chen Peter-B29397,
	ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Greg KH,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Sep 8, 2011 at 3:41 PM, Mark Salter <msalter-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Wed, 2011-08-31 at 20:35 +0100, Will Deacon wrote:
>> On Wed, Aug 31, 2011 at 07:19:33PM +0100, Rob Herring wrote:
>> > On 08/31/2011 12:51 PM, Will Deacon wrote:
>> > > Another thing that Marc and I tried on OMAP4 was not bringing up the secondary
>> > > CPU during boot (by commenting out most of smp_init). In this case, I/O
>> > > performance was good until we tried to online the secondary CPU. The online
>> > > failed but after that the I/O performance was certainly degraded.
>> > >
>> >
>> > Was the SCU enabled at that point? One diff between nosmp boot and
>> > offlining the 2nd core would be that the SCU remains enabled in the
>> > latter case. I think the SCU does not get enabled for nosmp.
>>
>> Our rudimentary test (printing out the SCU control register during boot)
>> showed that it *was* enabled for nosmp. I think this is due to the secure
>> world having to do that on OMAP so it's probably not true for other
>> platforms.
>
> I've done a little test and found that turning on the MMU of the second
> core causes the problem to show up. I patched head.S to stop the second
> core in an infinite loop just before it turns on the MMU. The system
> continues booting on core#0 and I see ~20MB/s with hdparm -t to an
> attached USB disk. The same setup, but with the second core stopped in an
> infinite loop just after the MMU is enabled, shows ~5MB/s. So whatever is
> going wrong, it's not because of anything the second core is doing beyond
> turning on its MMU and running an empty loop.

What was the final take on the optimization?
Excuse me if I could not follow the whole thread; could someone
summarize it for the benefit of many?

Thanks

