From mboxrd@z Thu Jan 1 00:00:00 1970
From: arnd@arndb.de (Arnd Bergmann)
Date: Thu, 11 Sep 2014 12:20:42 +0200
Subject: [PATCH] ARM: cache-l2x0: optimize aurora range operations
In-Reply-To: <20140911120839.59d4b728@free-electrons.com>
References: <2852268.nkG1OoBDfE@wuerfel> <2885617.4rXk1QzsZ5@wuerfel> <20140911120839.59d4b728@free-electrons.com>
Message-ID: <3371190.Z44AUJudFN@wuerfel>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Thursday 11 September 2014 12:08:39 Thomas Petazzoni wrote:
> On Mon, 08 Sep 2014 22:43:32 +0200, Arnd Bergmann wrote:
> > The aurora_inv_range(), aurora_clean_range() and aurora_flush_range()
> > functions are highly redundant, both in source and in object code, and
> > they are harder to understand than necessary.
> >
> > By moving the range loop into the aurora_pa_range() function, they
> > become trivial wrappers, and the object code start looking like what
> > one would expect for an optimal implementation.
> >
> > Signed-off-by: Arnd Bergmann
> > ---
> >  arch/arm/mm/cache-l2x0.c | 67 +++++++++++++++++----------------------------------------
> >  1 file changed, 22 insertions(+), 45 deletions(-)
>
> Tested-by: Thomas Petazzoni
> (on Armada 370 RD and Armada XP GP, boot tested, plus a little bit of
> DMA traffic by reading data from a SD card)
>

Ok, thanks!

I wonder if it's worth doing benchmarks over this, as there is still a
little optimization potential in the range function if we hold the
spinlock a little longer. Right now we drop and reacquire the lock for
every 1024 bytes being flushed, which can cause a lot of traffic on the
coherency bus if we flush really long ranges.

We could hold the lock across the entire loop to improve that, but that
can also cause extra latency if multiple CPUs try to do cache operations
simultaneously.

	Arnd
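
The locking trade-off described above can be illustrated with a small
user-space sketch. This is not the kernel code: the lock is a plain
counter, the range operation is a stub, and every name here
(fake_lock, fake_range_op, chunk_end, range_lock_per_chunk,
range_lock_whole_loop) is hypothetical. The only figure taken from the
message is the 1024-byte chunk size. It just counts how often the lock
is taken under each scheme.

```c
#include <assert.h>

/* Assumption: 1024 bytes handled per lock hold, as stated in the mail. */
#define CHUNK_BYTES 1024UL

/* Hypothetical stand-in for a spinlock: counts acquisitions only. */
struct fake_lock {
	unsigned long acquisitions;
	int held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical stub for issuing one range operation to the controller. */
static void fake_range_op(unsigned long start, unsigned long end,
			  unsigned long *chunks)
{
	assert(start < end && end - start <= CHUNK_BYTES);
	(*chunks)++;
}

/* End of the chunk beginning at 'start', capped at CHUNK_BYTES. */
static unsigned long chunk_end(unsigned long start, unsigned long end)
{
	return (end - start > CHUNK_BYTES) ? start + CHUNK_BYTES : end;
}

/* Variant A: drop and reacquire the lock for every chunk. */
static unsigned long range_lock_per_chunk(struct fake_lock *l,
					  unsigned long start,
					  unsigned long end)
{
	unsigned long chunks = 0;

	while (start < end) {
		unsigned long ce = chunk_end(start, end);

		fake_lock_acquire(l);
		fake_range_op(start, ce, &chunks);
		fake_lock_release(l);
		start = ce;
	}
	return chunks;
}

/* Variant B: hold the lock across the entire loop. */
static unsigned long range_lock_whole_loop(struct fake_lock *l,
					   unsigned long start,
					   unsigned long end)
{
	unsigned long chunks = 0;

	if (start >= end)
		return 0;

	fake_lock_acquire(l);
	while (start < end) {
		unsigned long ce = chunk_end(start, end);

		fake_range_op(start, ce, &chunks);
		start = ce;
	}
	fake_lock_release(l);
	return chunks;
}
```

For a 4096-byte range, variant A takes the lock four times and variant B
once. Variant B's single hold lasts for the whole range, which is where
the extra latency for other CPUs would come from. The sketch says
nothing about actual bus traffic or timing; that would need the
benchmarks mentioned above.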