From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cyrille Pitchen
Subject: SPI: performance regression when using the common message queuing infrastructure
Date: Wed, 6 Jul 2016 11:50:28 +0200
Message-ID: <577CD464.6050506@atmel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: "Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org" , , "Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org" , , "linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org" , "linux-spi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
To: Mark Brown
Return-path:
Sender: linux-spi-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-ID:

Hi Mark,

recently Heiko reported to us a performance regression with Atmel SPI controllers. He noticed the issue on a sam9g15ek board and I was also able to reproduce it on a sama5d36ek board.

We found out that the performance regression was introduced in 3.14 by commit:
8090d6d1a415d3ae1a7208995decfab8f60f4f36
spi: atmel: Refactor spi-atmel to use SPI framework queue

For the test, I connected a Spansion S25FL512 memory to the SPI1 controller of a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock and MOSI signals on the SPI bus.

1 - Reading 512 bytes from the memory

# dd if=/dev/mtd6 bs=512 count=1 of=/dev/null

With the oscilloscope, I measured the time from the falling edge of the chip-select before the Read Status command (05h) to the rising edge of the chip-select after all data had been read by the 4-byte address Fast Read 1-1-1 command (13h).

3.14 vanilla                          : 305 µs
3.14 commit 8090d6d1a415 reverted     : 242 µs  -21%

2 - Reading 1000 x 1024 bytes from the memory

# dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null

Still with the scope, I measured the time to read all data.

3.14 vanilla                          : 435 ms
3.14 commit 8090d6d1a415 reverted     : 361 ms  -17%

Indeed the oscilloscope shows that more time is spent between messages and transfers.

Commit 8090d6d1a415 replaced the tasklet used to manage the SPI message/transfer queue with a workqueue provided by the SPI framework. Support for this (optional) workqueue was introduced by commit:
ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
spi: create a message queuing infrastructure

Though the commit message claims that this common infrastructure is optional, the patch also claims the .transfer() hook is deprecated, suggesting drivers should implement the new .transfer_one_message() hook instead.

This is the reason why commit 8090d6d1a415 was submitted. However, we lost quite a lot of performance moving from our tasklet to the generic workqueue.

So do you recommend that we keep our current generic implementation relying on the SPI framework workqueue, or that we go back to a custom implementation using a tasklet?

If we keep the current implementation, is there a way to improve the performance so that we get back to something close to what we had before? We saw in commit ffbbdd21329f that we can change the workqueue thread scheduling policy to SCHED_FIFO by setting master->rt.

Best regards,

Cyrille