From: linux@arm.linux.org.uk (Russell King - ARM Linux)
Date: Mon, 24 Jan 2011 23:10:50 +0000
Subject: [PATCH] arm: Improve MMC performance on Versatile Express
In-Reply-To:
References: <000001cbbbc2$0e815e80$2b841b80$@moll@arm.com>
 <20110124133513.GL16202@n2100.arm.linux.org.uk>
 <20110124162400.GC24104@n2100.arm.linux.org.uk>
 <000101cbbbe5$45d47ed0$d17d7c70$@moll@arm.com>
 <20110124165356.GG24104@n2100.arm.linux.org.uk>
 <20110124170304.GH24104@n2100.arm.linux.org.uk>
 <000201cbbbef$bcec4610$36c4d230$@moll@arm.com>
 <20110124180944.GK24104@n2100.arm.linux.org.uk>
Message-ID: <20110124231050.GP24104@n2100.arm.linux.org.uk>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Mon, Jan 24, 2011 at 07:59:03PM +0000, Pawel Moll wrote:
> > If you're flooding the system with USB traffic, enlarging the
> > FIFO size won't help. Making the FIFO larger just decreases the
> > _interrupt_ _latency_ requirements. It doesn't mean you can
> > cope with the amount of data being transferred.
>
> On VE both ISP and MMCI are sharing the same static memory interface,

What has that to do with it? If the static memory controller were the
bottleneck, don't you think that two CPUs running in parallel, one
reading data from the ISP1761 and the other reading the MMCI, would
suffer bus starvation? Your "HACK HACK HACK" patch shows that clearly
isn't the case.

You've already told me that you've measured the ISP1761 interrupt
handler taking 1.3ms to empty data off the chip. If that's 60K of
data, that's a data rate of around 47MiB/s.

At a 521kHz transfer rate, it takes about 490us for MMCI to half-fill
its FIFO, or 980us to fill it completely. It takes (measured) about
6-9us to unload 32 bytes of data from the FIFO, which translates to a
CPU read rate of around 4MiB/s.

So I put it to you that there's plenty of bus bandwidth present to
service both the ISP1761 and MMCI. What we're lacking is CPU
bandwidth.
I guess you haven't thought about moving MMCI to an adaptive clocking
solution? What I'm suggesting is: halve the clock rate on FIFO error
and retry, and increase the clock rate after each successful transfer,
up to the maximum provided by the core MMC code.

That should _significantly_ increase the achievable PIO data rate, way
beyond what a deeper FIFO could ever hope to achieve, and will allow
it to adapt to situations where you load the system up beyond the
maximum latency which the MMCI can stand. This would benefit a whole
range of platforms, improving performance across the board, which, as
you've already indicated, merely going for a deeper FIFO cannot do.