All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm: Improve MMC performance on Versatile Express
@ 2011-01-21 13:59 Pawel Moll
  2011-01-21 15:09 ` Sergei Shtylyov
  2011-01-21 19:59 ` Nicolas Pitre
  0 siblings, 2 replies; 25+ messages in thread
From: Pawel Moll @ 2011-01-21 13:59 UTC (permalink / raw)
  To: linux-arm-kernel

VE suffers from serious problem with MMC transfers - low performance,
errors when other IO peripherals (especially USB) are used at the
same time etc.

It all boils down to the MMC controller short FIFO and - in result -
timing constrains. The most problematic case - USB driver hogging
CPU and MMC FIFO under/overruns in the result - can be mitigated on
SMP system by distributing interrupts handling for these peripherals
between cores.

With this, the MMC card clock can be safely (at least it looks like
it) increased to 1.5MHz, improving performance.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
---
 arch/arm/mach-vexpress/v2m.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
index 4b5af01..300ac72 100644
--- a/arch/arm/mach-vexpress/v2m.c
+++ b/arch/arm/mach-vexpress/v2m.c
@@ -244,6 +244,7 @@ static unsigned int v2m_mmci_status(struct device *dev)
 static struct mmci_platform_data v2m_mmci_data = {
 	.ocr_mask	= MMC_VDD_32_33|MMC_VDD_33_34,
 	.status		= v2m_mmci_status,
+	.f_max		= 1500000,
 };
 
 static AMBA_DEVICE(aaci,  "mb:aaci",  V2M_AACI, NULL);
@@ -391,6 +392,19 @@ static void __init v2m_init(void)
 	for (i = 0; i < ARRAY_SIZE(v2m_amba_devs); i++)
 		amba_device_register(v2m_amba_devs[i], &iomem_resource);
 
+	/* WARNING: HACK HACK HACK!
+	 *
+	 * MMCI cell has very tight timing requirements regarding interrupt
+	 * handling (fractions of millisecond), while the USB host controller
+	 * interrupt handler can hog the CPU for more then 1 millisecond!
+	 *
+	 * To mitigate the problem on SMP systems, we can set CPU affinities
+	 * of these interrupts to different cores... */
+	BUG_ON(v2m_usb_resources[1].flags != IORESOURCE_IRQ);
+	irq_set_affinity(v2m_usb_resources[1].start, cpumask_of(0));
+	irq_set_affinity(mmci_device.irq[0], cpumask_of(1));
+	irq_set_affinity(mmci_device.irq[1], cpumask_of(1));
+
 	pm_power_off = v2m_power_off;
 	arm_pm_restart = v2m_restart;
 
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 13:59 [PATCH] arm: Improve MMC performance on Versatile Express Pawel Moll
@ 2011-01-21 15:09 ` Sergei Shtylyov
  2011-01-21 19:59 ` Nicolas Pitre
  1 sibling, 0 replies; 25+ messages in thread
From: Sergei Shtylyov @ 2011-01-21 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hello.

Pawel Moll wrote:

> VE suffers from serious problem with MMC transfers - low performance,
> errors when other IO peripherals (especially USB) are used at the
> same time etc.

> It all boils down to the MMC controller short FIFO and - in result -
> timing constrains. The most problematic case - USB driver hogging
> CPU and MMC FIFO under/overruns in the result - can be mitigated on
> SMP system by distributing interrupts handling for these peripherals
> between cores.

> With this, the MMC card clock can be safely (at least it looks like
> it) increased to 1.5MHz, improving performance.

> Signed-off-by: Pawel Moll <pawel.moll@arm.com>
> ---
>  arch/arm/mach-vexpress/v2m.c |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)

> diff --git a/arch/arm/mach-vexpress/v2m.c b/arch/arm/mach-vexpress/v2m.c
> index 4b5af01..300ac72 100644
> --- a/arch/arm/mach-vexpress/v2m.c
> +++ b/arch/arm/mach-vexpress/v2m.c
[...]
>  static AMBA_DEVICE(aaci,  "mb:aaci",  V2M_AACI, NULL);
> @@ -391,6 +392,19 @@ static void __init v2m_init(void)
>  	for (i = 0; i < ARRAY_SIZE(v2m_amba_devs); i++)
>  		amba_device_register(v2m_amba_devs[i], &iomem_resource);
>  
> +	/* WARNING: HACK HACK HACK!
> +	 *
> +	 * MMCI cell has very tight timing requirements regarding interrupt
> +	 * handling (fractions of millisecond), while the USB host controller
> +	 * interrupt handler can hog the CPU for more then 1 millisecond!
> +	 *
> +	 * To mitigate the problem on SMP systems, we can set CPU affinities
> +	 * of these interrupts to different cores... */

    The preferred style for the mutli-line comments is this:

/*
  * bla
  * bla
  */

    You were close. :-)

> +	BUG_ON(v2m_usb_resources[1].flags != IORESOURCE_IRQ);
> +	irq_set_affinity(v2m_usb_resources[1].start, cpumask_of(0));
> +	irq_set_affinity(mmci_device.irq[0], cpumask_of(1));
> +	irq_set_affinity(mmci_device.irq[1], cpumask_of(1));
> +
>  	pm_power_off = v2m_power_off;
>  	arm_pm_restart = v2m_restart;

WBR, Sergei

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 13:59 [PATCH] arm: Improve MMC performance on Versatile Express Pawel Moll
  2011-01-21 15:09 ` Sergei Shtylyov
@ 2011-01-21 19:59 ` Nicolas Pitre
  2011-01-21 21:51   ` Russell King - ARM Linux
  2011-01-24 12:27   ` Pawel Moll
  1 sibling, 2 replies; 25+ messages in thread
From: Nicolas Pitre @ 2011-01-21 19:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 21 Jan 2011, Pawel Moll wrote:

> VE suffers from serious problem with MMC transfers - low performance,
> errors when other IO peripherals (especially USB) are used at the
> same time etc.
> 
> It all boils down to the MMC controller short FIFO and - in result -
> timing constrains. The most problematic case - USB driver hogging
> CPU and MMC FIFO under/overruns in the result - can be mitigated on
> SMP system by distributing interrupts handling for these peripherals
> between cores.

Wouldn't the ultimate solution be to simply use FIQs to service the MMC FIFO?


Nicolas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 19:59 ` Nicolas Pitre
@ 2011-01-21 21:51   ` Russell King - ARM Linux
  2011-01-21 22:20     ` Nicolas Pitre
  2011-01-24 12:27   ` Pawel Moll
  1 sibling, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-21 21:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 21, 2011 at 02:59:35PM -0500, Nicolas Pitre wrote:
> On Fri, 21 Jan 2011, Pawel Moll wrote:
> 
> > VE suffers from serious problem with MMC transfers - low performance,
> > errors when other IO peripherals (especially USB) are used at the
> > same time etc.
> > 
> > It all boils down to the MMC controller short FIFO and - in result -
> > timing constrains. The most problematic case - USB driver hogging
> > CPU and MMC FIFO under/overruns in the result - can be mitigated on
> > SMP system by distributing interrupts handling for these peripherals
> > between cores.
> 
> Wouldn't the ultimate solution be to simply use FIQs to service the MMC FIFO?

Could you suggest how to route an arbitary interrupt to the FIQ using
the GIC?

On systems which implement security extensions, the non-secure world can
only use IRQs and not FIQs.  The secure world can use FIQs - in which
case interrupts marked as secure interrupts can go to FIQ.

On systems without security extensions, the GIC only supports IRQ, and
you need a second GIC implemented for FIQs.  I'm not aware of any system
doing that.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 21:51   ` Russell King - ARM Linux
@ 2011-01-21 22:20     ` Nicolas Pitre
  2011-01-21 22:29       ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Nicolas Pitre @ 2011-01-21 22:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 21 Jan 2011, Russell King - ARM Linux wrote:

> On Fri, Jan 21, 2011 at 02:59:35PM -0500, Nicolas Pitre wrote:
> > On Fri, 21 Jan 2011, Pawel Moll wrote:
> > 
> > > VE suffers from serious problem with MMC transfers - low performance,
> > > errors when other IO peripherals (especially USB) are used at the
> > > same time etc.
> > > 
> > > It all boils down to the MMC controller short FIFO and - in result -
> > > timing constrains. The most problematic case - USB driver hogging
> > > CPU and MMC FIFO under/overruns in the result - can be mitigated on
> > > SMP system by distributing interrupts handling for these peripherals
> > > between cores.
> > 
> > Wouldn't the ultimate solution be to simply use FIQs to service the MMC FIFO?
> 
> Could you suggest how to route an arbitary interrupt to the FIQ using
> the GIC?

No I can't, hence my question.

> On systems which implement security extensions, the non-secure world can
> only use IRQs and not FIQs.  The secure world can use FIQs - in which
> case interrupts marked as secure interrupts can go to FIQ.

In that case, in theory, the secure world could provide some kind of 
software based DMA engine facility to the non-secure world for this FIFO 
servicing purpose.

> On systems without security extensions, the GIC only supports IRQ, and
> you need a second GIC implemented for FIQs.  I'm not aware of any system
> doing that.

The only solution in that case is to give top priority to the FIFO IRQ 
and never disable IRQs when in interrupt context, except for that FIFO 
servicing handler which should keep IRQs masked out throughout.  In any 
case this would certainly be only a hack for badly misdesigned hardware.


Nicolas

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 22:20     ` Nicolas Pitre
@ 2011-01-21 22:29       ` Russell King - ARM Linux
  2011-02-01 14:29         ` Linus Walleij
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-21 22:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 21, 2011 at 05:20:57PM -0500, Nicolas Pitre wrote:
> The only solution in that case is to give top priority to the FIFO IRQ 
> and never disable IRQs when in interrupt context, except for that FIFO 
> servicing handler which should keep IRQs masked out throughout.  In any 
> case this would certainly be only a hack for badly misdesigned hardware.

Not possible anymore.  The kernel's IRQ handling has changed such that
generic code now ensures that IRQs are disabled irrespective of the
IRQF_DISABLED flag.  All IRQ handlers are called with IRQs disabled,
and they remain that way until they call local_irq_enable().

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 19:59 ` Nicolas Pitre
  2011-01-21 21:51   ` Russell King - ARM Linux
@ 2011-01-24 12:27   ` Pawel Moll
  2011-01-24 13:35     ` Russell King - ARM Linux
  1 sibling, 1 reply; 25+ messages in thread
From: Pawel Moll @ 2011-01-24 12:27 UTC (permalink / raw)
  To: linux-arm-kernel


> Wouldn't the ultimate solution be to simply use FIQs to service the
> MMC FIFO?

The ultimate solution is to super-size FIFO ;-) It's not impossible (as the motherboard peripherals live in a FPGA), but according to our board people may be tricky due to the MMCI design (it's just ancient, that's all).

Now, the first thing I did was bumping the MMCI interrupts priorities, then I was considering use of FIQ. But - as Russell pointed out - it's simply not so simple...

> the secure world could provide some kind of 
> software based DMA engine facility to the non-secure world for this FIFO 
> servicing purpose.

It's very, very unrealistic to expect any secure stack implementation (say, Visa or MasterCard authorization software) to provide such feature for anyone.

So - we'll try to enlarge FIFO. For the moment - playing with interrupts affinity seem to be a viable workaround.

Cheers!

Pawe?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 12:27   ` Pawel Moll
@ 2011-01-24 13:35     ` Russell King - ARM Linux
  2011-01-24 16:13       ` Pawel Moll
  2011-02-03 14:15       ` Linus Walleij
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 13:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 12:27:16PM -0000, Pawel Moll wrote:
> So - we'll try to enlarge FIFO. For the moment - playing with interrupts
> affinity seem to be a viable workaround.

I don't think enlarging the FIFO will help too much.  The issue is
whether the CPU can keep up with the data rate coming off the card.
If it can't, then no matter how large the FIFO is, it will eventually
overflow.

The real answer is to avoid PIO mode, and use DMA support.  However,
I've had problems using DMA on the ARM development boards.  You can
find details my DMA issues internally within ARM by talking to Catalin.

The alternative answer, I believe implemented by some of ARMs silicon
partners, is to turn the card clock off when the FIFO becomes full/empty
to stop it sending more data.  I think this violates some of the MMC/SD
requirements, but it seems to work for the silicon partners just fine.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 13:35     ` Russell King - ARM Linux
@ 2011-01-24 16:13       ` Pawel Moll
  2011-01-24 16:24         ` Russell King - ARM Linux
  2011-02-03 14:15       ` Linus Walleij
  1 sibling, 1 reply; 25+ messages in thread
From: Pawel Moll @ 2011-01-24 16:13 UTC (permalink / raw)
  To: linux-arm-kernel

> I don't think enlarging the FIFO will help too much.  The issue is
> whether the CPU can keep up with the data rate coming off the card.
> If it can't, then no matter how large the FIFO is, it will
> eventually overflow.

Yes, I realize that. But so far the only time when problem happens is the timeout caused by ISP1761 handler. If we substantially increase the FIFO depth, we'll just have much more margin - enough to "mostly work". To my taste, the 1.3ms required to service the USB interrupt is already waaay to long. I'd consider any substantially larger latencies pathological, so making sure that we have margin of 3, maybe even 5 ms sounds good enough to me.

It's not about making this interface perfect. This won't happen, I afraid :-( It's just about making it good enough :-)

> The real answer is to avoid PIO mode, and use DMA support.
> However,
> I've had problems using DMA on the ARM development boards.  You can

Well, using DMA won't be easy on VE, will it? ;-) Besides even with this, in some extreme situation, the bus could be potentially stalled long enough to cause an underrun. Yes, very unlikely, but not impossible.

> The alternative answer, I believe implemented by some of ARMs silicon
> partners, is to turn the card clock off when the FIFO becomes full/empty
> to stop it sending more data.  I think this violates some of the MMC/SD
> requirements, but it seems to work for the silicon partners just fine.

Oh no, that's absolutely legal - see JEDEC 84-A441, p. 7.7:
"The MultiMediaCard bus clock signal can be used by the host to put the card into energy saving mode, or to control the data flow (to avoid under-run or over-run conditions) on the bus."

So this would be the technically correct fix. The problem is that this would require even more MMCI modifications then enlarging FIFO, so it's unlikely to happen.

Pawe?

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 16:13       ` Pawel Moll
@ 2011-01-24 16:24         ` Russell King - ARM Linux
  2011-01-24 16:30           ` Russell King - ARM Linux
  2011-01-24 16:39           ` Pawel Moll
  0 siblings, 2 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 16:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 04:13:14PM +0000, Pawel Moll wrote:
> > I don't think enlarging the FIFO will help too much.  The issue is
> > whether the CPU can keep up with the data rate coming off the card.
> > If it can't, then no matter how large the FIFO is, it will
> > eventually overflow.
> 
> Yes, I realize that. But so far the only time when problem happens is
> the timeout caused by ISP1761 handler.

That's fixable by having the block driver retry.

I have patches lodged with cjb which rework the MMC block driver error
handling, and I now have MMC rootfs entirely usable with the ISP1761
driver.  With ISP1761 also fixed (god knows why the driver maintainer
isn't doing anything with that patch) then the problem goes away.

So really this is a non-issue.

> If we substantially increase the FIFO depth, we'll just have much more
> margin - enough to "mostly work". To my taste, the 1.3ms required to
> service the USB interrupt is already waaay to long. I'd consider any
> substantially larger latencies pathological, so making sure that we
> have margin of 3, maybe even 5 ms sounds good enough to me.

Well, fixing MMCI because ISP1761 is buggy is not the way forward.  The
answer is to fix the broken ISP1761 driver.

We know as it currently stands in mainline that it's utterly unusable
with certain serial devices.  That doesn't mean we go off and fix MMCI.

> > The real answer is to avoid PIO mode, and use DMA support.
> > However,
> > I've had problems using DMA on the ARM development boards.  You can
> 
> Well, using DMA won't be easy on VE, will it? ;-) Besides even with
> this, in some extreme situation, the bus could be potentially stalled
> long enough to cause an underrun. Yes, very unlikely, but not impossible.

I'm running SD rootfs alongside NFS/ssh/NTP and the buggy ISP1761.  With the
pile of 'fixes' patches I have here (which are currently stuck with various
maintainers) I have a completely stable system.

I've also had it running gstreamer on CLCD with AACI too, again with SD
rootfs.  Provided the video is already loaded off the SD card (because
the transfer rate is too slow) it's fine.

The only remaining problem I have with it is !"?$%% busybox shell which
doesn't like readonly rootfs with command history.  The last command is
/sbin/reboot - I'm sure you can imagine what keeps on happening,
particularly at the most annoying points in time.

> > The alternative answer, I believe implemented by some of ARMs silicon
> > partners, is to turn the card clock off when the FIFO becomes full/empty
> > to stop it sending more data.  I think this violates some of the MMC/SD
> > requirements, but it seems to work for the silicon partners just fine.
> 
> Oh no, that's absolutely legal - see JEDEC 84-A441, p. 7.7:
> "The MultiMediaCard bus clock signal can be used by the host to put the
> card into energy saving mode, or to control the data flow (to avoid
> under-run or over-run conditions) on the bus."
> 
> So this would be the technically correct fix. The problem is that this
> would require even more MMCI modifications then enlarging FIFO, so it's
> unlikely to happen.

Well as I see it, it's pointless enlarging the FIFO.  ARM Ltd isn't going
to give me updated FPGA images for the Integrator/CP, Versatile PB926,
Realview EB and Versatile Express.

Given that the problem is already fixed via a set of patches, I see no
reason to mess about with the hardware, thereby making the driver more
complicated for *no* benefit.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 16:24         ` Russell King - ARM Linux
@ 2011-01-24 16:30           ` Russell King - ARM Linux
  2011-01-24 16:39           ` Pawel Moll
  1 sibling, 0 replies; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 16:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 04:24:00PM +0000, Russell King - ARM Linux wrote:
> On Mon, Jan 24, 2011 at 04:13:14PM +0000, Pawel Moll wrote:
> > > The real answer is to avoid PIO mode, and use DMA support.
> > > However,
> > > I've had problems using DMA on the ARM development boards.  You can
> > 
> > Well, using DMA won't be easy on VE, will it? ;-) Besides even with
> > this, in some extreme situation, the bus could be potentially stalled
> > long enough to cause an underrun. Yes, very unlikely, but not impossible.
> 
> I'm running SD rootfs alongside NFS/ssh/NTP and the buggy ISP1761.  With the
> pile of 'fixes' patches I have here (which are currently stuck with various
> maintainers) I have a completely stable system.
> 
> I've also had it running gstreamer on CLCD with AACI too, again with SD
> rootfs.  Provided the video is already loaded off the SD card (because
> the transfer rate is too slow) it's fine.
> 
> The only remaining problem I have with it is !"?$%% busybox shell which
> doesn't like readonly rootfs with command history.  The last command is
> /sbin/reboot - I'm sure you can imagine what keeps on happening,
> particularly at the most annoying points in time.

I should also point out that my Versatile Express has been on continuously
for the last two to three weeks, running off SD rootfs, and since fixing
the ISP1761 driver and MMC block driver error handling there's been zero
issues with MMC or ISP1761.

So, MMCI overruns is now a complete non-issue for me.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 16:24         ` Russell King - ARM Linux
  2011-01-24 16:30           ` Russell King - ARM Linux
@ 2011-01-24 16:39           ` Pawel Moll
  2011-01-24 16:53             ` Russell King - ARM Linux
  1 sibling, 1 reply; 25+ messages in thread
From: Pawel Moll @ 2011-01-24 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

> Given that the problem is already fixed via a set of patches, I see
> no reason to mess about with the hardware, thereby making the driver
> more complicated for *no* benefit.

There will be virtually no change in the driver - all required stuff is already there. See the STE UX500 variant - they did exactly the same, enlarged the FIFO. One of the main reason I've immediately abandoned the idea of using FIQ was the need of changing the driver for no good reason.

> Well, fixing MMCI because ISP1761 is buggy is not the way forward.
> The answer is to fix the broken ISP1761 driver.

Totally agree. The thing is that you are the only person right now who doesn't have any problems. The others can quickly benefit from the workaround. I think I made the status of this patch clear enough - let me quote myself: "HACK HACK HACK" ;-)

> Well as I see it, it's pointless enlarging the FIFO.  ARM Ltd isn't
> going to give me updated FPGA images for the Integrator/CP, Versatile
> PB926, Realview EB and Versatile Express.

If we succeed with this, you (and all other users) will get the "improved" FPGA image. But for VE only indeed.

Pawe?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 16:39           ` Pawel Moll
@ 2011-01-24 16:53             ` Russell King - ARM Linux
  2011-01-24 17:03               ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 04:39:21PM -0000, Pawel Moll wrote:
> > Well, fixing MMCI because ISP1761 is buggy is not the way forward.
> > The answer is to fix the broken ISP1761 driver.
> 
> Totally agree. The thing is that you are the only person right now who
> doesn't have any problems.

That's not my problem.  A simple search of the linux-usb mailing list
will give you the patch:

	http://marc.info/?l=linux-usb&m=129469114904445&w=2

> The others can quickly benefit from the workaround.

Others _can_ quickly benefit from the ISP1761 fix by applying the above
patch, which not only fixes the MMCI overruns, but also allows USB to
serial devices to work with Versatile Express.  The above patch is
the correct solution.

For christ sake, that patch is a fix for a *known* errata issued by the
ISP1761 manufacturer.

> > Well as I see it, it's pointless enlarging the FIFO.  ARM Ltd isn't
> > going to give me updated FPGA images for the Integrator/CP, Versatile
> > PB926, Realview EB and Versatile Express.
> 
> If we succeed with this, you (and all other users) will get the
> "improved" FPGA image. But for VE only indeed.

That's utterly useless and completely pointless.

Look, stop jumping off a cliff with this and come back to reality.  The
ISP1761 is buggy.  We have a fix.  We have a tested fix.  We have a tested
fix out on mailing lists for everyone to use.  It fixes more than MMC
overruns and makes the ISP1761 usable.

The only problem is that it's taking *FAR* too long for it to get into
mainline for no apparant reason.  That doesn't stop anyone taking the
tested patch in the URL above and gaining proper USB functionality.

The right thing to do is to follow up on my reply on the ISP1761 thread
to find out why it's taking soo long for it to get into Linus' tree.

I repeat, with the proper set of patches in mainline THERE IS NO ISSUE
WITH MMCI.  THE HARDWARE DOES NOT NEED "FIXING".  Please listen to me.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 16:53             ` Russell King - ARM Linux
@ 2011-01-24 17:03               ` Russell King - ARM Linux
  2011-01-24 17:54                 ` Pawel Moll
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 17:03 UTC (permalink / raw)
  To: linux-arm-kernel

My final mail on this subject.  I'm adding Philippe and Catalin so that
they're in the loop on this.

Take a mainline kernel.  Apply the attached three patches.  MMC will then
work without any problems.  No hacks required.  There is *absolutely* *no*
need to waste time with hardware modifications.

With this you will find that MMCI FIFO underruns/overruns are no longer
any problem.
-------------- next part --------------
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index bfc8a8a..264a6bb 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -245,6 +245,20 @@ static u32 mmc_sd_num_wr_blocks(struct mmc_card *card)
 	return result;
 }
 
+static void send_stop(struct mmc_card *card, struct request *req)
+{
+	struct mmc_command cmd;
+	int err;
+
+	memset(&cmd, 0, sizeof(struct mmc_command));
+	cmd.opcode = MMC_STOP_TRANSMISSION;
+	cmd.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
+	err = mmc_wait_for_cmd(card->host, &cmd, 0);
+	if (err)
+		pr_err("%s: error %d sending stop command\n",
+		       req->rq_disk->disk_name, err);
+}
+
 static u32 get_card_status(struct mmc_card *card, struct request *req)
 {
 	struct mmc_command cmd;
@@ -255,9 +269,9 @@ static u32 get_card_status(struct mmc_card *card, struct request *req)
 	if (!mmc_host_is_spi(card->host))
 		cmd.arg = card->rca << 16;
 	cmd.flags = MMC_RSP_SPI_R2 | MMC_RSP_R1 | MMC_CMD_AC;
-	err = mmc_wait_for_cmd(card->host, &cmd, 0);
+	err = mmc_wait_for_cmd(card->host, &cmd, 2);
 	if (err)
-		printk(KERN_ERR "%s: error %d sending status command",
+		pr_err("%s: error %d sending status command\n",
 		       req->rq_disk->disk_name, err);
 	return cmd.resp[0];
 }
@@ -336,7 +350,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 	struct mmc_blk_data *md = mq->data;
 	struct mmc_card *card = md->queue.card;
 	struct mmc_blk_request brq;
-	int ret = 1, disable_multi = 0;
+	int ret = 1, disable_multi = 0, retry = 0;
 
 	mmc_claim_host(card->host);
 
@@ -432,6 +446,53 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 		 * programming mode even when things go wrong.
 		 */
 		if (brq.cmd.error || brq.data.error || brq.stop.error) {
+			status = get_card_status(card, req);
+
+			/* First print what's up */
+			if (brq.cmd.error)
+				pr_err("%s: error %d sending read/write command, card status %#x\n",
+				       req->rq_disk->disk_name, brq.cmd.error,
+				       status);
+
+			if (brq.data.error)
+				pr_err("%s: error %d transferring data, sector %u, nr %u, cmd response %#x, card status %#x\n",
+				       req->rq_disk->disk_name, brq.data.error,
+				       (unsigned)blk_rq_pos(req),
+				       (unsigned)blk_rq_sectors(req),
+				       brq.cmd.resp[0], status);
+
+			if (brq.stop.error)
+				pr_err("%s: error %d sending stop command, original cmd response %#x, card status %#x\n",
+				       req->rq_disk->disk_name, brq.stop.error,
+				       brq.cmd.resp[0], status);
+
+			/*
+			 * Now check the current card state.  If it is
+			 * in some data transfer mode, tell it to stop
+			 * (and hopefully transition back to TRAN.)
+			 */
+			if (R1_CURRENT_STATE(status) == R1_STATE_DATA ||
+			    R1_CURRENT_STATE(status) == R1_STATE_RCV)
+				send_stop(card, req);
+
+			/*
+			 * r/w cmd failure - get_card_status() should
+			 * tell us why the command was not accepted
+			 */
+			if (brq.cmd.error && retry < 2) {
+				/*
+				 * if it was a r/w cmd crc error, or illegal
+				 * command (eg, issued in wrong state) then
+				 * retry - we should have corrected the
+				 * state problem above.
+				 */
+				if (status & (R1_COM_CRC_ERROR |
+					      R1_ILLEGAL_COMMAND)) {
+					retry++;
+					continue;
+				}
+			}
+
 			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
 				/* Redo read one sector at a time */
 				printk(KERN_WARNING "%s: retrying using single "
@@ -439,32 +500,9 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				disable_multi = 1;
 				continue;
 			}
-			status = get_card_status(card, req);
-		}
-
-		if (brq.cmd.error) {
-			printk(KERN_ERR "%s: error %d sending read/write "
-			       "command, response %#x, card status %#x\n",
-			       req->rq_disk->disk_name, brq.cmd.error,
-			       brq.cmd.resp[0], status);
-		}
 
-		if (brq.data.error) {
-			if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
-				/* 'Stop' response contains card status */
-				status = brq.mrq.stop->resp[0];
-			printk(KERN_ERR "%s: error %d transferring data,"
-			       " sector %u, nr %u, card status %#x\n",
-			       req->rq_disk->disk_name, brq.data.error,
-			       (unsigned)blk_rq_pos(req),
-			       (unsigned)blk_rq_sectors(req), status);
-		}
-
-		if (brq.stop.error) {
-			printk(KERN_ERR "%s: error %d sending stop command, "
-			       "response %#x, card status %#x\n",
-			       req->rq_disk->disk_name, brq.stop.error,
-			       brq.stop.resp[0], status);
+			if (retry++ < 5)
+				continue;
 		}
 
 		if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
@@ -486,7 +524,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *req)
 				 * indication and the card state.
 				 */
 			} while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
-				(R1_CURRENT_STATE(cmd.resp[0]) == 7));
+			    (R1_CURRENT_STATE(cmd.resp[0]) == R1_STATE_PRG));
 
 #if 0
 			if (cmd.resp[0] & ~0x00000900)
diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h
index 612301f..9d067ee 100644
--- a/include/linux/mmc/mmc.h
+++ b/include/linux/mmc/mmc.h
@@ -133,6 +133,16 @@
 #define R1_SWITCH_ERROR		(1 << 7)	/* sx, c */
 #define R1_APP_CMD		(1 << 5)	/* sr, c */
 
+#define R1_STATE_IDLE	0
+#define R1_STATE_READY	1
+#define R1_STATE_IDENT	2
+#define R1_STATE_STBY	3
+#define R1_STATE_TRAN	4
+#define R1_STATE_DATA	5
+#define R1_STATE_RCV	6
+#define R1_STATE_PRG	7
+#define R1_STATE_DIS	8
+
 /*
  * MMC/SD in SPI mode reports R1 status always, and R2 for SEND_STATUS
  * R1 is the low order byte; R2 is the next highest byte, when present.
-------------- next part --------------
diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 5630228..040de4f 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -342,15 +342,15 @@ mmci_cmd_irq(struct mmci_host *host, struct mmc_command *cmd,
 
 	host->cmd = NULL;
 
-	cmd->resp[0] = readl(base + MMCIRESPONSE0);
-	cmd->resp[1] = readl(base + MMCIRESPONSE1);
-	cmd->resp[2] = readl(base + MMCIRESPONSE2);
-	cmd->resp[3] = readl(base + MMCIRESPONSE3);
-
 	if (status & MCI_CMDTIMEOUT) {
 		cmd->error = -ETIMEDOUT;
 	} else if (status & MCI_CMDCRCFAIL && cmd->flags & MMC_RSP_CRC) {
 		cmd->error = -EILSEQ;
+	} else {
+		cmd->resp[0] = readl(base + MMCIRESPONSE0);
+		cmd->resp[1] = readl(base + MMCIRESPONSE1);
+		cmd->resp[2] = readl(base + MMCIRESPONSE2);
+		cmd->resp[3] = readl(base + MMCIRESPONSE3);
 	}
 
 	if (!cmd->data || cmd->error) {
-------------- next part --------------
diff --git a/drivers/usb/host/isp1760-hcd.c b/drivers/usb/host/isp1760-hcd.c
index bdba8c5..9cb8722 100644
--- a/drivers/usb/host/isp1760-hcd.c
+++ b/drivers/usb/host/isp1760-hcd.c
@@ -1049,17 +1049,6 @@ static void do_atl_int(struct usb_hcd *usb_hcd)
 		if (!nakcount && (dw3 & DW3_QTD_ACTIVE)) {
 			u32 buffstatus;
 
-			/*
-			 * NAKs are handled in HW by the chip. Usually if the
-			 * device is not able to send data fast enough.
-			 * This happens mostly on slower hardware.
-			 */
-			printk(KERN_NOTICE "Reloading ptd %p/%p... qh %p read: "
-					"%d of %zu done: %08x cur: %08x\n", qtd,
-					urb, qh, PTD_XFERRED_LENGTH(dw3),
-					qtd->length, done_map,
-					(1 << queue_entry));
-
 			/* RL counter = ERR counter */
 			dw3 &= ~(0xf << 19);
 			dw3 |= rl << 19;
@@ -1770,7 +1759,7 @@ static irqreturn_t isp1760_irq(struct usb_hcd *usb_hcd)
 		goto leave;
 
 	isp1760_writel(imask, usb_hcd->regs + HC_INTERRUPT_REG);
-	if (imask & HC_ATL_INT)
+	if (imask & (HC_ATL_INT | HC_SOT_INT))
 		do_atl_int(usb_hcd);
 
 	if (imask & HC_INTL_INT)
diff --git a/drivers/usb/host/isp1760-hcd.h b/drivers/usb/host/isp1760-hcd.h
index 6931ef5..0d6023b 100644
--- a/drivers/usb/host/isp1760-hcd.h
+++ b/drivers/usb/host/isp1760-hcd.h
@@ -68,7 +68,7 @@ void deinit_kmem_cache(void);
 #define HC_INTERRUPT_REG	0x310
 
 #define HC_INTERRUPT_ENABLE	0x314
-#define INTERRUPT_ENABLE_MASK	(HC_INTL_INT | HC_ATL_INT | HC_EOT_INT)
+#define INTERRUPT_ENABLE_MASK	(HC_INTL_INT | HC_SOT_INT | HC_EOT_INT)
 
 #define HC_ISO_INT		(1 << 9)
 #define HC_ATL_INT		(1 << 8)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 17:03               ` Russell King - ARM Linux
@ 2011-01-24 17:54                 ` Pawel Moll
  2011-01-24 18:09                   ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Pawel Moll @ 2011-01-24 17:54 UTC (permalink / raw)
  To: linux-arm-kernel

> My final mail on this subject.

Ah, and I was really enjoying our pleasant discussion...

> Take a mainline kernel.  Apply the attached three patches.  MMC
> will then work without any problems.  No hacks required.  There is
> *absolutely* *no* need to waste time with hardware modifications.
> 
> With this you will find that MMCI FIFO underruns/overruns are no
> longer any problem.

$ git log -1
commit d315777b32a4696feb86f2a0c9e9f39c94683649
Merge: 5a05a6d 1765e3a
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Jan 24 19:58:39 2011 +1000
[...]
$ git diff > changes.diff

Attached, I trust you will find all your changes there (and only them).

/ # uname -a
Linux (none) 2.6.38-rc2+ #11 SMP Mon Jan 24 17:11:35 GMT 2011 armv7l GNU/Linux
/ # cat /dev/sda > /dev/null &
/ # dd if=/dev/mmcblk0 of=/dev/null bs=64k count=128
dd: /dev/mmcblk0: Input/output error

And dmesg is full of:
[...]
mmcblk0: error -5 transferring data, sector 254, nr 2, cmd response 0x900, card status 0xb00
end_request: I/O error, dev mmcblk0, sector 254
mmcblk0: error -5 transferring data, sector 255, nr 1, cmd response 0x900, card status 0xb00
end_request: I/O error, dev mmcblk0, sector 255
mmcblk0: error -5 transferring data, sector 0, nr 8, cmd response 0x900, card status 0x900
mmcblk0: retrying using single block read
mmcblk0: error -5 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb00
[...]

Now:

$ git am 0001-arm-Improve-MMC-performance-on-Versatile-Express.patch

/ # uname -a
Linux (none) 2.6.38-rc2+ #12 SMP Mon Jan 24 17:41:53 GMT 2011 armv7l GNU/Linux
/ # cat /dev/sda > /dev/null &
/ # dd if=/dev/mmcblk0 of=/dev/null bs=64k count=128
128+0 records in
128+0 records out
8388608 bytes (8.0MB) copied, 1.501529 seconds, 5.3MB/s

> THERE IS NO ISSUE WITH MMCI.  THE HARDWARE DOES NOT NEED "FIXING".

I'm really pleased you think so. It's always nice to have happy customers :-)

Cheers!

Pawe?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: changes.diff
Type: application/octet-stream
Size: 7990 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20110124/35e18346/attachment-0001.obj>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 17:54                 ` Pawel Moll
@ 2011-01-24 18:09                   ` Russell King - ARM Linux
  2011-01-24 19:59                     ` Pawel Moll
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 18:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 05:54:18PM -0000, Pawel Moll wrote:
> $ git am 0001-arm-Improve-MMC-performance-on-Versatile-Express.patch
> 
> / # uname -a
> Linux (none) 2.6.38-rc2+ #12 SMP Mon Jan 24 17:41:53 GMT 2011 armv7l GNU/Linux
> / # cat /dev/sda > /dev/null &
> / # dd if=/dev/mmcblk0 of=/dev/null bs=64k count=128
> 128+0 records in
> 128+0 records out
> 8388608 bytes (8.0MB) copied, 1.501529 seconds, 5.3MB/s
> 
> > THERE IS NO ISSUE WITH MMCI.  THE HARDWARE DOES NOT NEED "FIXING".
> 
> I'm really pleased you think so. It's always nice to have happy customers :-)

If you're flooding the system with USB traffic, enlargening the FIFO size
won't help.  Making the FIFO larger just decreases the _interrupt_ _latency_
requirements.  It doesn't mean you can cope with the amount of data being
transferred.

	if (cycles-to-transfer-usb-packets + cycles-to-transfer-MMC-data >
			available-cpu-cycles)
		you_are_stuffed(even_with_larger_fifo);

So I'm not surprised that running USB and MMC at full speed results in
MMC losing out.  You will find that even with a larger FIFO, MMC will
still lose out.

Why?  The ISP1761 can store the packets without complaint, and the host
CPU can read them when it's ready.  As soon as one packet is read off
the host, the next packet is probably sitting there waiting for the
host CPU to read it.  When the ISP1761 buffer becomes full, it can
tell the device to stop sending data.

The result of that is the ISP1761 will be able to transfer data as fast
as the host CPU can possibly go - to the exclusion of the MMC interface.
No amount of FIFO (well, except several KB to cover the _largest_ size
of MMC transfer we request) will resolve the problem of USB and MMC
sharing the same CPU.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 18:09                   ` Russell King - ARM Linux
@ 2011-01-24 19:59                     ` Pawel Moll
  2011-01-24 23:10                       ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Pawel Moll @ 2011-01-24 19:59 UTC (permalink / raw)
  To: linux-arm-kernel

> If you're flooding the system with USB traffic, enlargening the
> FIFO size won't help.  Making the FIFO larger just decreases the
> _interrupt_ _latency_ requirements.  It doesn't mean you can
> cope with the amount of data being transferred.

On VE both ISP and MMCI are sharing the same static memory interface, which naturally limit their throughput. The ISP has a limited transfer buffer (about 60kB) and will not start new transfers before there is some space available. And of course USB devices are unable to initiate _any_ transactions on their own, so host can simply stall the bus on its discretion.

In my example, when I'm reading as much data from an USB device as possible, the ISP buffer will be filled at some point and the interrupt asserted. Now the handler will perform its duties. I'm guessing (not sure, I'm just unable to read this code today) it gathers all the received transfers and schedules new descriptors, then return. And now it's the MMC opportunity - if it can "survive", not being starved, till this moment (and, as experiments suggest it's about 1.3ms in the worst case scenario), it has a chance to grab CPU's attention, before the USB transfers are finished and the USB interrupt asserted again. If not that - I'll try to bump up MMC interrupt priority.

But it's all theory right now, just what I'm hoping to achieve, and - will all due respect - it's not your decision whether or not we perform this exercise. And there are virtually no changes to the driver required - all we have to do is to add additional "struct variant_data", so you may not be concerned about horrible driver hacks.

Of course you may be absolutely right and we won't succeed to make it much better. But even in this case we should be able to clock the card faster, which will bring performance gain on its own.

Good night!

Pawe?

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 19:59                     ` Pawel Moll
@ 2011-01-24 23:10                       ` Russell King - ARM Linux
  2011-02-01 11:18                         ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-01-24 23:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 07:59:03PM +0000, Pawel Moll wrote:
> > If you're flooding the system with USB traffic, enlargening the
> > FIFO size won't help.  Making the FIFO larger just decreases the
> > _interrupt_ _latency_ requirements.  It doesn't mean you can
> > cope with the amount of data being transferred.
> 
> On VE both ISP and MMCI are sharing the same static memory interface,

What has that to do with it?  If the static memory controller was the
bottleneck, don't you think that two CPUs running in parallel, one
reading data from the ISP1761 and the other reading the MMCI would
suffer bus starvation?  Your "HACK HACK HACK" patch shows that clearly
isn't the case.

You've already told me that you've measured the ISP1761 interrupt
handler taking 1.3ms to empty data off of the chip.  If that's 60K
of data, that's a data rate of around 47MiB/s.

At 521kHz transfer rate, it takes about 490us for MMCI to half-fill
its FIFO, or 980us to fully fill it.  It takes (measured) about 6-9us
to unload 32 bytes of data from the FIFO.

Translating the CPU read rate, that's a data rate of around 4MiB/s.

So I put it to you that there's plenty of bus bandwidth present to
service both the ISP1761 and MMCI.  What we're lacking is CPU
bandwidth.

I guess you haven't thought about moving MMCI to an adaptive clocking
solution?  What I'm suggesting is halve the clock rate on FIFO error
and retry.  Increase the clock rate on each successful transfer up
to the maximum provided by the core MMC code.

That should _significantly_ increase the achievable PIO data rate
way beyond what a deeper FIFO could ever hope to achieve, and will
allow it to adapt to situations where you load the system up beyond
the maximum latency which the MMCI can stand.

This would benefit a whole range of platforms, improving performance
across the board, which as you've already indicated merely going for
a deeper FIFO would be unable to do.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 23:10                       ` Russell King - ARM Linux
@ 2011-02-01 11:18                         ` Russell King - ARM Linux
  2011-02-01 13:34                           ` Pawel Moll
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-02-01 11:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 24, 2011 at 11:10:50PM +0000, Russell King - ARM Linux wrote:
> I guess you haven't thought about moving MMCI to an adaptive clocking
> solution?  What I'm suggesting is halve the clock rate on FIFO error
> and retry.  Increase the clock rate on each successful transfer up
> to the maximum provided by the core MMC code.
> 
> That should _significantly_ increase the achievable PIO data rate
> way beyond what a deeper FIFO could ever hope to achieve, and will
> allow it to adapt to situations where you load the system up beyond
> the maximum latency which the MMCI can stand.

And to prove the point, I have MMCI running at up to 4Mbps, an 8 fold
increase over what the current fixed upper-rate implementation does.
The adaptive rate implementation is just a proof of concept at the
moment and requires further work to improve the rate selection algorithm.

The real solution to this is for there to be proper working DMA support
implemented on ARM platforms, rather than requiring every peripheral to
be driven via CPU intensive PIO all the time.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-02-01 11:18                         ` Russell King - ARM Linux
@ 2011-02-01 13:34                           ` Pawel Moll
  2011-02-01 14:28                             ` Russell King - ARM Linux
  0 siblings, 1 reply; 25+ messages in thread
From: Pawel Moll @ 2011-02-01 13:34 UTC (permalink / raw)
  To: linux-arm-kernel

> And to prove the point, I have MMCI running at up to 4Mbps, an 8 fold
> increase over what the current fixed upper-rate implementation does.
> The adaptive rate implementation is just a proof of concept at the
> moment and requires further work to improve the rate selection algorithm.

Great, I've terribly glad you managed to have a go at this (I honestly
wanted to, but simply had no time). I'm looking forward to see the
patches and will be more than happy to backport them for the sake of the
Linaro guys using 2.6.35 and 2.6.37 right now.

On our side we did extend the FIFO and performed some tests (not very
extensive yet though). The change seems not to break anything and help
in the pathological (heavy USB traffic) scenario.

When I get your changes and some official FPGA release, I'll try to push
the bandwidth limits even further - hopefully changes will complement.
Of course there is absolutely no need of using the modified version if
one opts not to do so ;-) It's just a line in the board configuration
file, selecting one FPGA image or the other - they have different
"configuration value" in the peripheral ID so the driver can distinguish
them.

> The real solution to this is for there to be proper working DMA support
> implemented on ARM platforms,

In case of VE this is all about getting an engine into the test chips,
what didn't happen for A9 (the request lines are routed between the
motherboard and the tile and IO FPGA can - theoretically - use the MMCI
requests). As far as I'm told this cell is simply huge (silicon-wise)
and therefore it's the first candidate to cut down when area is
scarce... Anyway, I've spoken to guys around and asked them to keep the
problem in mind, so we may get something with the next releases.

Cheers!

Pawe?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-02-01 13:34                           ` Pawel Moll
@ 2011-02-01 14:28                             ` Russell King - ARM Linux
  2011-02-01 17:25                               ` Pawel Moll
  0 siblings, 1 reply; 25+ messages in thread
From: Russell King - ARM Linux @ 2011-02-01 14:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 01, 2011 at 01:34:59PM +0000, Pawel Moll wrote:
> > And to prove the point, I have MMCI running at up to 4Mbps, an 8 fold
> > increase over what the current fixed upper-rate implementation does.
> > The adaptive rate implementation is just a proof of concept at the
> > moment and requires further work to improve the rate selection algorithm.
> 
> Great, I've terribly glad you managed to have a go at this (I honestly
> wanted to, but simply had no time). I'm looking forward to see the
> patches and will be more than happy to backport them for the sake of the
> Linaro guys using 2.6.35 and 2.6.37 right now.
> 
> On our side we did extend the FIFO and performed some tests (not very
> extensive yet though). The change seems not to break anything and help
> in the pathological (heavy USB traffic) scenario.
> 
> When I get your changes and some official FPGA release, I'll try to push
> the bandwidth limits even further - hopefully changes will complement.

You can't push it any further without increasing the CPU/bus clock rates.
My measurements show that it takes the CPU in the region of 6-9us to
unload 32 bytes from the FIFO, which gives a theoretical limit of 2.8
to 4.2Mbps, depending on how the platform booted (some reboots its
consistently in the order of 6us, some boots its consistently around 9us.)

> > The real solution to this is for there to be proper working DMA support
> > implemented on ARM platforms,
> 
> In case of VE this is all about getting an engine into the test chips,
> what didn't happen for A9 (the request lines are routed between the
> motherboard and the tile and IO FPGA can - theoretically - use the MMCI
> requests). As far as I'm told this cell is simply huge (silicon-wise)
> and therefore it's the first candidate to cut down when area is
> scarce... Anyway, I've spoken to guys around and asked them to keep the
> problem in mind, so we may get something with the next releases.

Bear in mind that PL18x + PL08x doesn't work.  Catalin forwarded my
concerns over this to ARM Support - where I basically ask how to program
the hardware up to DMA a single 64K transfer off a MMC card into a set
of scattered memory locations.

I've yet to have a response, so I'll take it that it's just possible
(the TRMs say as much).

The problem is that for a transfer, the MMCI produces BREQ * n + LBREQ,
and the DMAC will only listen for a LBREQ if it's in peripheral flow
control.  If it's in peripheral flow control, then it ignores the transfer
length field in the control register, only moving to the next LLI when it
sees LBREQ or LSREQ.

It ignores LBREQ and LSREQ in DMAC flow control mode..  You can DMA almost
all the data too/from the MMCI, but you miss the last half-fifo-size worth
of data.  While you can unload that manually for a read, you can't load
it manually for a write.

With peripheral flow control, you can only DMA the requested data to a
single contiguous buffer without breaking the MMC request into much
smaller chunks.  As Peter Pearse's PL08x code seems to suggest, the
maximum size of those chunks is 1K.

This seems to be a fundamental problem with the way each primecell has
been designed.

So, I do hope that someone decides to implement something more reasonable
if Versatile Express were to get a DMA controller.  If it's another PL08x
then it isn't worth it - it won't work.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-21 22:29       ` Russell King - ARM Linux
@ 2011-02-01 14:29         ` Linus Walleij
  2011-02-01 17:25           ` Pawel Moll
  0 siblings, 1 reply; 25+ messages in thread
From: Linus Walleij @ 2011-02-01 14:29 UTC (permalink / raw)
  To: linux-arm-kernel

2011/1/21 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Fri, Jan 21, 2011 at 05:20:57PM -0500, Nicolas Pitre wrote:
>> The only solution in that case is to give top priority to the FIFO IRQ
>> and never disable IRQs when in interrupt context, except for that FIFO
>> servicing handler which should keep IRQs masked out throughout. ?In any
>> case this would certainly be only a hack for badly misdesigned hardware.
>
> Not possible anymore. ?The kernel's IRQ handling has changed such that
> generic code now ensures that IRQs are disabled irrespective of the
> IRQF_DISABLED flag. ?All IRQ handlers are called with IRQs disabled,
> and they remain that way until they call local_irq_enable().

Then the only way of assuring low latency on this one IRQ would
be to convert the IRQ handlers for all the *other* hardware in the
Vexpress to request_threaded_irq(), and that's what the RealTime
folks do all the time I believe.

[Pawel Moll]
> (...) so far the only time when problem happens is the timeout caused
> by ISP1761 handler.

Have you considered switching the ISP1761 handler to
request_threaded_irq() with IRQF_ONESHOT | IRQF_NO_SUSPEND
so it runs in process context with that IRQ masked off, until completion?

Linus Walleij

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-02-01 14:28                             ` Russell King - ARM Linux
@ 2011-02-01 17:25                               ` Pawel Moll
  0 siblings, 0 replies; 25+ messages in thread
From: Pawel Moll @ 2011-02-01 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

> So, I do hope that someone decides to implement something more reasonable
> if Versatile Express were to get a DMA controller.  If it's another PL08x
> then it isn't worth it - it won't work.

More likely it will be DMA-330 formerly known as PL330. But as I said -
it's about the chip that Core Tile carries (whether it's feasible to
squeeze it inside).

Alternatively one could use Logic Tile (FPGA based) in the second
daughterboard site with an engine implementation, but I don't expect
many users willing to pay $$$$ just for this reason :-)

Cheers!

Pawe?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-02-01 14:29         ` Linus Walleij
@ 2011-02-01 17:25           ` Pawel Moll
  0 siblings, 0 replies; 25+ messages in thread
From: Pawel Moll @ 2011-02-01 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

> Have you considered switching the ISP1761 handler to
> request_threaded_irq() with IRQF_ONESHOT | IRQF_NO_SUSPEND
> so it runs in process context with that IRQ masked off, until completion?

That's something that Will suggested, but no - I didn't try it. This may
be worth discussing with the ISP1761 driver maintainer...

Cheers!

Pawe?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] arm: Improve MMC performance on Versatile Express
  2011-01-24 13:35     ` Russell King - ARM Linux
  2011-01-24 16:13       ` Pawel Moll
@ 2011-02-03 14:15       ` Linus Walleij
  1 sibling, 0 replies; 25+ messages in thread
From: Linus Walleij @ 2011-02-03 14:15 UTC (permalink / raw)
  To: linux-arm-kernel

Following up on this:

2011/1/24 Russell King - ARM Linux <linux@arm.linux.org.uk>:
> On Mon, Jan 24, 2011 at 12:27:16PM -0000, Pawel Moll wrote:
>> So - we'll try to enlarge FIFO. For the moment - playing with interrupts
>> affinity seem to be a viable workaround.
>
> I don't think enlarging the FIFO will help too much. ?The issue is
> whether the CPU can keep up with the data rate coming off the card.
> If it can't, then no matter how large the FIFO is, it will eventually
> overflow.
>
> The real answer is to avoid PIO mode, and use DMA support. ?However,
> I've had problems using DMA on the ARM development boards. ?You can
> find details my DMA issues internally within ARM by talking to Catalin.

I fully agree with Russell, MMC by nature begs to be used with DMA.
Hopefully PL330 does not have all the basic problems found in
PL080/PL081, yet Samsung (som version) and ST-Ericsson Nomadik
does use the PL080, albeit in modified versions.

> The alternative answer, I believe implemented by some of ARMs silicon
> partners, is to turn the card clock off when the FIFO becomes full/empty
> to stop it sending more data. ?I think this violates some of the MMC/SD
> requirements, but it seems to work for the silicon partners just fine.

One of these fixes does not exclude the other.

We have this "hardware flow control" in U300, Nomadik and Ux500.
Basically the clock to the card is simply gated if the FIFO risk
to over/underflow.

To be precise, it gates the MCIFBCLK, MCICLKOUT and
MCICLK to the card when either RX FIFO is full and DPSM
is enabled, or TX FIFO is empty and DPSM is enabled.
We do not mess with the internal MCLK clock.

We have some experience in not even DMA being quick enough to
avoid overflows under all conditions, making it necessary to clock
down the host undesirably low. Increasing FIFO depth will
actually help to some extent in this case.

For example: the SD spec permits us to clock the card at something
like 23 MHz and with 4 data lines and a standard 64 byte (16 word)
FIFO this will fill up in 2.8 microseconds. So that, or rather half of
it 1.3 us, is the maximum allowed interupt latency, unless you clock
down the card.

The system IRQ latency is a swamp of heuristics unless you have
things like FIQ or realtime patches as mentioned.

And as mentioned by Russell one way to mitigate the effect
that would also benefit current Versatiles and RealViews would
be to dynamically recalibrate the card clock with some
error-feedback loop. (Would be pretty cool actually!)

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-02-03 14:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-21 13:59 [PATCH] arm: Improve MMC performance on Versatile Express Pawel Moll
2011-01-21 15:09 ` Sergei Shtylyov
2011-01-21 19:59 ` Nicolas Pitre
2011-01-21 21:51   ` Russell King - ARM Linux
2011-01-21 22:20     ` Nicolas Pitre
2011-01-21 22:29       ` Russell King - ARM Linux
2011-02-01 14:29         ` Linus Walleij
2011-02-01 17:25           ` Pawel Moll
2011-01-24 12:27   ` Pawel Moll
2011-01-24 13:35     ` Russell King - ARM Linux
2011-01-24 16:13       ` Pawel Moll
2011-01-24 16:24         ` Russell King - ARM Linux
2011-01-24 16:30           ` Russell King - ARM Linux
2011-01-24 16:39           ` Pawel Moll
2011-01-24 16:53             ` Russell King - ARM Linux
2011-01-24 17:03               ` Russell King - ARM Linux
2011-01-24 17:54                 ` Pawel Moll
2011-01-24 18:09                   ` Russell King - ARM Linux
2011-01-24 19:59                     ` Pawel Moll
2011-01-24 23:10                       ` Russell King - ARM Linux
2011-02-01 11:18                         ` Russell King - ARM Linux
2011-02-01 13:34                           ` Pawel Moll
2011-02-01 14:28                             ` Russell King - ARM Linux
2011-02-01 17:25                               ` Pawel Moll
2011-02-03 14:15       ` Linus Walleij

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.