SPI: performance regression when using the common message queuing infrastructure

All of lore.kernel.org
 help / color / mirror / Atom feed

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-06  9:50 ` Cyrille Pitchen
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrille Pitchen @ 2016-07-06  9:50 UTC (permalink / raw)
  To: Mark Brown
  Cc: Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w, hs-ynQEQJNshbs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-spi-u79uwXL29TY76Z2rM5mHXA

Hi Mark,

recently Heiko reported to us a performance regression with Atmel SPI
controllers. He noticed the issue on a sam9g15ek board and I was also able to
reproduce it on a sama5d36ek board.

We found out that the performance regression was introduced in 3.14 by commit:
8090d6d1a415d3ae1a7208995decfab8f60f4f36
spi: atmel: Refactor spi-atmel to use SPI framework queue

For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
and MOSI signals on the SPI bus.

1 - Reading 512 bytes from the memory

# dd if=/dev/mtd6 bs=512 count=1 of=/dev/null

With the oscilloscope, I measured the time between the chip-select fell before
the Read Status command (05h) and the chip-select rose after all data had been
read by the 4-byte address Fast Read 1-1-1 command (13h).

3.14 vanilla                      : 305 µs
3.14 commit 8090d6d1a415 reverted : 242 µs   -21%

2 - Reading 1000 x 1024 bytes from the memory

# dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null

Still with the scope, I measured the time to read all data.

3.14 vanilla                      : 435 ms
3.14 commit 8090d6d1a415 reverted : 361 ms   -17%

Indeed the oscilloscope shows that more time is spent between messages and
transfers.

commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
queue by a workqueue provided by the SPI framework.

The support of this (optional) workqueue was introduced by commit:
ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
spi: create a message queuing infrastructure

Though the commit message claims that is common infrastructure is optional,
the patch also claims the .transfer() hook is deprecated, suggesting drivers
should implement the new .transfer_one_message() hook instead.

This is the reason why commit 8090d6d1a415 was submitted. However we lost
quite amount of performances moving from our tasklet to the generic workqueue.

So do you recommend us to keep our current generic implementation relying on
the SPI framework workqueue or to go back to a custom implementation using
tasklet?
If we keep the current implementation, is there a way to improve the
performances so we go back to something close to what he had before?

We saw in commit ffbbdd21329f that we can change the workqueue thread
scheduling policy to SCHED_FIFO by setting master->rt.

Best regards,

Cyrille
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-06  9:50 ` Cyrille Pitchen
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrille Pitchen @ 2016-07-06  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Mark,

recently Heiko reported to us a performance regression with Atmel SPI
controllers. He noticed the issue on a sam9g15ek board and I was also able to
reproduce it on a sama5d36ek board.

We found out that the performance regression was introduced in 3.14 by commit:
8090d6d1a415d3ae1a7208995decfab8f60f4f36
spi: atmel: Refactor spi-atmel to use SPI framework queue

For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
and MOSI signals on the SPI bus.

1 - Reading 512 bytes from the memory

# dd if=/dev/mtd6 bs=512 count=1 of=/dev/null

With the oscilloscope, I measured the time between the chip-select fell before
the Read Status command (05h) and the chip-select rose after all data had been
read by the 4-byte address Fast Read 1-1-1 command (13h).

3.14 vanilla                      : 305 ?s
3.14 commit 8090d6d1a415 reverted : 242 ?s   -21%

2 - Reading 1000 x 1024 bytes from the memory

# dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null

Still with the scope, I measured the time to read all data.

3.14 vanilla                      : 435 ms
3.14 commit 8090d6d1a415 reverted : 361 ms   -17%

Indeed the oscilloscope shows that more time is spent between messages and
transfers.

commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
queue by a workqueue provided by the SPI framework.

The support of this (optional) workqueue was introduced by commit:
ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
spi: create a message queuing infrastructure

Though the commit message claims that is common infrastructure is optional,
the patch also claims the .transfer() hook is deprecated, suggesting drivers
should implement the new .transfer_one_message() hook instead.

This is the reason why commit 8090d6d1a415 was submitted. However we lost
quite amount of performances moving from our tasklet to the generic workqueue.

So do you recommend us to keep our current generic implementation relying on
the SPI framework workqueue or to go back to a custom implementation using
tasklet?
If we keep the current implementation, is there a way to improve the
performances so we go back to something close to what he had before?

We saw in commit ffbbdd21329f that we can change the workqueue thread
scheduling policy to SCHED_FIFO by setting master->rt.

Best regards,

Cyrille

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SPI: performance regression when using the common message queuing infrastructure
  2016-07-06  9:50 ` Cyrille Pitchen
@ 2016-07-06 10:03     ` Grygorii Strashko
  -1 siblings, 0 replies; 14+ messages in thread
From: Grygorii Strashko @ 2016-07-06 10:03 UTC (permalink / raw)
  To: Cyrille Pitchen, Mark Brown
  Cc: linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w,
	Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linux-spi-u79uwXL29TY76Z2rM5mHXA, hs-ynQEQJNshbs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
> Hi Mark,
>
> recently Heiko reported to us a performance regression with Atmel SPI
> controllers. He noticed the issue on a sam9g15ek board and I was also able to
> reproduce it on a sama5d36ek board.
>
> We found out that the performance regression was introduced in 3.14 by commit:
> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
> spi: atmel: Refactor spi-atmel to use SPI framework queue
>
> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
> and MOSI signals on the SPI bus.
>
>
> 1 - Reading 512 bytes from the memory
>
> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>
> With the oscilloscope, I measured the time between the chip-select fell before
> the Read Status command (05h) and the chip-select rose after all data had been
> read by the 4-byte address Fast Read 1-1-1 command (13h).
>
> 3.14 vanilla                      : 305 µs
> 3.14 commit 8090d6d1a415 reverted : 242 µs   -21%
>
> 2 - Reading 1000 x 1024 bytes from the memory
>
> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>
> Still with the scope, I measured the time to read all data.
>
> 3.14 vanilla                      : 435 ms
> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>
>
> Indeed the oscilloscope shows that more time is spent between messages and
> transfers.
>
> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
> queue by a workqueue provided by the SPI framework.
>
> The support of this (optional) workqueue was introduced by commit:
> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
> spi: create a message queuing infrastructure
>
> Though the commit message claims that is common infrastructure is optional,
> the patch also claims the .transfer() hook is deprecated, suggesting drivers
> should implement the new .transfer_one_message() hook instead.
>
> This is the reason why commit 8090d6d1a415 was submitted. However we lost
> quite amount of performances moving from our tasklet to the generic workqueue.
>
> So do you recommend us to keep our current generic implementation relying on
> the SPI framework workqueue or to go back to a custom implementation using
> tasklet?
> If we keep the current implementation, is there a way to improve the
> performances so we go back to something close to what he had before?
>
> We saw in commit ffbbdd21329f that we can change the workqueue thread
> scheduling policy to SCHED_FIFO by setting master->rt.
>

master->rt is not a good choice as i know and
you may find thread [1] useful for you.

[1] http://www.spinics.net/lists/linux-rt-users/msg14347.html

-- 
regards,
-grygorii
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-06 10:03     ` Grygorii Strashko
  0 siblings, 0 replies; 14+ messages in thread
From: Grygorii Strashko @ 2016-07-06 10:03 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
> Hi Mark,
>
> recently Heiko reported to us a performance regression with Atmel SPI
> controllers. He noticed the issue on a sam9g15ek board and I was also able to
> reproduce it on a sama5d36ek board.
>
> We found out that the performance regression was introduced in 3.14 by commit:
> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
> spi: atmel: Refactor spi-atmel to use SPI framework queue
>
> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
> and MOSI signals on the SPI bus.
>
>
> 1 - Reading 512 bytes from the memory
>
> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>
> With the oscilloscope, I measured the time between the chip-select fell before
> the Read Status command (05h) and the chip-select rose after all data had been
> read by the 4-byte address Fast Read 1-1-1 command (13h).
>
> 3.14 vanilla                      : 305 ?s
> 3.14 commit 8090d6d1a415 reverted : 242 ?s   -21%
>
> 2 - Reading 1000 x 1024 bytes from the memory
>
> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>
> Still with the scope, I measured the time to read all data.
>
> 3.14 vanilla                      : 435 ms
> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>
>
> Indeed the oscilloscope shows that more time is spent between messages and
> transfers.
>
> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
> queue by a workqueue provided by the SPI framework.
>
> The support of this (optional) workqueue was introduced by commit:
> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
> spi: create a message queuing infrastructure
>
> Though the commit message claims that is common infrastructure is optional,
> the patch also claims the .transfer() hook is deprecated, suggesting drivers
> should implement the new .transfer_one_message() hook instead.
>
> This is the reason why commit 8090d6d1a415 was submitted. However we lost
> quite amount of performances moving from our tasklet to the generic workqueue.
>
> So do you recommend us to keep our current generic implementation relying on
> the SPI framework workqueue or to go back to a custom implementation using
> tasklet?
> If we keep the current implementation, is there a way to improve the
> performances so we go back to something close to what he had before?
>
> We saw in commit ffbbdd21329f that we can change the workqueue thread
> scheduling policy to SCHED_FIFO by setting master->rt.
>

master->rt is not a good choice as i know and
you may find thread [1] useful for you.

[1] http://www.spinics.net/lists/linux-rt-users/msg14347.html

-- 
regards,
-grygorii

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SPI: performance regression when using the common message queuing infrastructure
  2016-07-06 10:03     ` Grygorii Strashko
@ 2016-07-07  8:12         ` Cyrille Pitchen
  -1 siblings, 0 replies; 14+ messages in thread
From: Cyrille Pitchen @ 2016-07-07  8:12 UTC (permalink / raw)
  To: Grygorii Strashko, Mark Brown
  Cc: linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w,
	Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linux-spi-u79uwXL29TY76Z2rM5mHXA, hs-ynQEQJNshbs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Grygorii,

Le 06/07/2016 12:03, Grygorii Strashko a écrit :
> On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
>> Hi Mark,
>>
>> recently Heiko reported to us a performance regression with Atmel SPI
>> controllers. He noticed the issue on a sam9g15ek board and I was also able to
>> reproduce it on a sama5d36ek board.
>>
>> We found out that the performance regression was introduced in 3.14 by commit:
>> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
>> spi: atmel: Refactor spi-atmel to use SPI framework queue
>>
>> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
>> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
>> and MOSI signals on the SPI bus.
>>
>>
>> 1 - Reading 512 bytes from the memory
>>
>> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>>
>> With the oscilloscope, I measured the time between the chip-select fell before
>> the Read Status command (05h) and the chip-select rose after all data had been
>> read by the 4-byte address Fast Read 1-1-1 command (13h).
>>
>> 3.14 vanilla                      : 305 µs
>> 3.14 commit 8090d6d1a415 reverted : 242 µs   -21%
>>
>> 2 - Reading 1000 x 1024 bytes from the memory
>>
>> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>>
>> Still with the scope, I measured the time to read all data.
>>
>> 3.14 vanilla                      : 435 ms
>> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>>
>>
>> Indeed the oscilloscope shows that more time is spent between messages and
>> transfers.
>>
>> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
>> queue by a workqueue provided by the SPI framework.
>>
>> The support of this (optional) workqueue was introduced by commit:
>> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
>> spi: create a message queuing infrastructure
>>
>> Though the commit message claims that is common infrastructure is optional,
>> the patch also claims the .transfer() hook is deprecated, suggesting drivers
>> should implement the new .transfer_one_message() hook instead.
>>
>> This is the reason why commit 8090d6d1a415 was submitted. However we lost
>> quite amount of performances moving from our tasklet to the generic workqueue.
>>
>> So do you recommend us to keep our current generic implementation relying on
>> the SPI framework workqueue or to go back to a custom implementation using
>> tasklet?
>> If we keep the current implementation, is there a way to improve the
>> performances so we go back to something close to what he had before?
>>
>> We saw in commit ffbbdd21329f that we can change the workqueue thread
>> scheduling policy to SCHED_FIFO by setting master->rt.
>>
> 
> master->rt is not a good choice as i know and
> you may find thread [1] useful for you.
> 
> [1] http://www.spinics.net/lists/linux-rt-users/msg14347.html
> 

thanks for the link, I'll look at it :)

Best regards,

Cyrille

--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-07  8:12         ` Cyrille Pitchen
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrille Pitchen @ 2016-07-07  8:12 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Grygorii,

Le 06/07/2016 12:03, Grygorii Strashko a ?crit :
> On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
>> Hi Mark,
>>
>> recently Heiko reported to us a performance regression with Atmel SPI
>> controllers. He noticed the issue on a sam9g15ek board and I was also able to
>> reproduce it on a sama5d36ek board.
>>
>> We found out that the performance regression was introduced in 3.14 by commit:
>> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
>> spi: atmel: Refactor spi-atmel to use SPI framework queue
>>
>> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
>> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
>> and MOSI signals on the SPI bus.
>>
>>
>> 1 - Reading 512 bytes from the memory
>>
>> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>>
>> With the oscilloscope, I measured the time between the chip-select fell before
>> the Read Status command (05h) and the chip-select rose after all data had been
>> read by the 4-byte address Fast Read 1-1-1 command (13h).
>>
>> 3.14 vanilla                      : 305 ?s
>> 3.14 commit 8090d6d1a415 reverted : 242 ?s   -21%
>>
>> 2 - Reading 1000 x 1024 bytes from the memory
>>
>> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>>
>> Still with the scope, I measured the time to read all data.
>>
>> 3.14 vanilla                      : 435 ms
>> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>>
>>
>> Indeed the oscilloscope shows that more time is spent between messages and
>> transfers.
>>
>> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
>> queue by a workqueue provided by the SPI framework.
>>
>> The support of this (optional) workqueue was introduced by commit:
>> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
>> spi: create a message queuing infrastructure
>>
>> Though the commit message claims that is common infrastructure is optional,
>> the patch also claims the .transfer() hook is deprecated, suggesting drivers
>> should implement the new .transfer_one_message() hook instead.
>>
>> This is the reason why commit 8090d6d1a415 was submitted. However we lost
>> quite amount of performances moving from our tasklet to the generic workqueue.
>>
>> So do you recommend us to keep our current generic implementation relying on
>> the SPI framework workqueue or to go back to a custom implementation using
>> tasklet?
>> If we keep the current implementation, is there a way to improve the
>> performances so we go back to something close to what he had before?
>>
>> We saw in commit ffbbdd21329f that we can change the workqueue thread
>> scheduling policy to SCHED_FIFO by setting master->rt.
>>
> 
> master->rt is not a good choice as i know and
> you may find thread [1] useful for you.
> 
> [1] http://www.spinics.net/lists/linux-rt-users/msg14347.html
> 

thanks for the link, I'll look at it :)

Best regards,

Cyrille

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SPI: performance regression when using the common message queuing infrastructure
  2016-07-06  9:50 ` Cyrille Pitchen
@ 2016-07-07  9:50     ` Mark Brown
  -1 siblings, 0 replies; 14+ messages in thread
From: Mark Brown @ 2016-07-07  9:50 UTC (permalink / raw)
  To: Cyrille Pitchen
  Cc: Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w, hs-ynQEQJNshbs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-spi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 677 bytes --]

On Wed, Jul 06, 2016 at 11:50:28AM +0200, Cyrille Pitchen wrote:

> So do you recommend us to keep our current generic implementation relying on
> the SPI framework workqueue or to go back to a custom implementation using
> tasklet?
> If we keep the current implementation, is there a way to improve the
> performances so we go back to something close to what he had before?

I'd suggest trying to implement whatever it is that your customm
implementation was doing in the core, having individual drivers trying
to open code the message pump is obviously not sensible.  If what you're
doing makes your controllers run faster it'll probably make everyone
else's run faster too.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-07  9:50     ` Mark Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Brown @ 2016-07-07  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 06, 2016 at 11:50:28AM +0200, Cyrille Pitchen wrote:

> So do you recommend us to keep our current generic implementation relying on
> the SPI framework workqueue or to go back to a custom implementation using
> tasklet?
> If we keep the current implementation, is there a way to improve the
> performances so we go back to something close to what he had before?

I'd suggest trying to implement whatever it is that your customm
implementation was doing in the core, having individual drivers trying
to open code the message pump is obviously not sensible.  If what you're
doing makes your controllers run faster it'll probably make everyone
else's run faster too.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160707/e453244b/attachment.sig>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SPI: performance regression when using the common message queuing infrastructure
  2016-07-07  8:12         ` Cyrille Pitchen
@ 2016-07-25  4:51             ` Heiko Schocher
  -1 siblings, 0 replies; 14+ messages in thread
From: Heiko Schocher @ 2016-07-25  4:51 UTC (permalink / raw)
  To: Cyrille Pitchen
  Cc: Grygorii Strashko, Mark Brown,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w,
	Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linux-spi-u79uwXL29TY76Z2rM5mHXA, hs-ynQEQJNshbs,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hello Cyrille,

sorry for the late answer, but just back from holidays ...

Am 07.07.2016 um 10:12 schrieb Cyrille Pitchen:
> Hi Grygorii,
>
> Le 06/07/2016 12:03, Grygorii Strashko a écrit :
>> On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
>>> Hi Mark,
>>>
>>> recently Heiko reported to us a performance regression with Atmel SPI
>>> controllers. He noticed the issue on a sam9g15ek board and I was also able to
>>> reproduce it on a sama5d36ek board.
>>>
>>> We found out that the performance regression was introduced in 3.14 by commit:
>>> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
>>> spi: atmel: Refactor spi-atmel to use SPI framework queue
>>>
>>> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
>>> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
>>> and MOSI signals on the SPI bus.
>>>
>>>
>>> 1 - Reading 512 bytes from the memory
>>>
>>> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>>>
>>> With the oscilloscope, I measured the time between the chip-select fell before
>>> the Read Status command (05h) and the chip-select rose after all data had been
>>> read by the 4-byte address Fast Read 1-1-1 command (13h).
>>>
>>> 3.14 vanilla                      : 305 µs
>>> 3.14 commit 8090d6d1a415 reverted : 242 µs   -21%
>>>
>>> 2 - Reading 1000 x 1024 bytes from the memory
>>>
>>> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>>>
>>> Still with the scope, I measured the time to read all data.
>>>
>>> 3.14 vanilla                      : 435 ms
>>> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>>>
>>>
>>> Indeed the oscilloscope shows that more time is spent between messages and
>>> transfers.

Yes this fits with my observations.

>>> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
>>> queue by a workqueue provided by the SPI framework.
>>>
>>> The support of this (optional) workqueue was introduced by commit:
>>> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
>>> spi: create a message queuing infrastructure
>>>
>>> Though the commit message claims that is common infrastructure is optional,
>>> the patch also claims the .transfer() hook is deprecated, suggesting drivers
>>> should implement the new .transfer_one_message() hook instead.
>>>
>>> This is the reason why commit 8090d6d1a415 was submitted. However we lost
>>> quite amount of performances moving from our tasklet to the generic workqueue.
>>>
>>> So do you recommend us to keep our current generic implementation relying on
>>> the SPI framework workqueue or to go back to a custom implementation using
>>> tasklet?
>>> If we keep the current implementation, is there a way to improve the
>>> performances so we go back to something close to what he had before?
>>>
>>> We saw in commit ffbbdd21329f that we can change the workqueue thread
>>> scheduling policy to SCHED_FIFO by setting master->rt.
>>>
>>
>> master->rt is not a good choice as i know and
>> you may find thread [1] useful for you.
>>
>> [1] http://www.spinics.net/lists/linux-rt-users/msg14347.html
>>
>
> thanks for the link, I'll look at it :)

Thanks for digging into this issue and your tests!

Do you have some new results? Can I help you?

bye,
Heiko
-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-25  4:51             ` Heiko Schocher
  0 siblings, 0 replies; 14+ messages in thread
From: Heiko Schocher @ 2016-07-25  4:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Cyrille,

sorry for the late answer, but just back from holidays ...

Am 07.07.2016 um 10:12 schrieb Cyrille Pitchen:
> Hi Grygorii,
>
> Le 06/07/2016 12:03, Grygorii Strashko a ?crit :
>> On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
>>> Hi Mark,
>>>
>>> recently Heiko reported to us a performance regression with Atmel SPI
>>> controllers. He noticed the issue on a sam9g15ek board and I was also able to
>>> reproduce it on a sama5d36ek board.
>>>
>>> We found out that the performance regression was introduced in 3.14 by commit:
>>> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
>>> spi: atmel: Refactor spi-atmel to use SPI framework queue
>>>
>>> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
>>> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
>>> and MOSI signals on the SPI bus.
>>>
>>>
>>> 1 - Reading 512 bytes from the memory
>>>
>>> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>>>
>>> With the oscilloscope, I measured the time between the chip-select fell before
>>> the Read Status command (05h) and the chip-select rose after all data had been
>>> read by the 4-byte address Fast Read 1-1-1 command (13h).
>>>
>>> 3.14 vanilla                      : 305 ?s
>>> 3.14 commit 8090d6d1a415 reverted : 242 ?s   -21%
>>>
>>> 2 - Reading 1000 x 1024 bytes from the memory
>>>
>>> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>>>
>>> Still with the scope, I measured the time to read all data.
>>>
>>> 3.14 vanilla                      : 435 ms
>>> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>>>
>>>
>>> Indeed the oscilloscope shows that more time is spent between messages and
>>> transfers.

Yes this fits with my observations.

>>> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
>>> queue by a workqueue provided by the SPI framework.
>>>
>>> The support of this (optional) workqueue was introduced by commit:
>>> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
>>> spi: create a message queuing infrastructure
>>>
>>> Though the commit message claims that is common infrastructure is optional,
>>> the patch also claims the .transfer() hook is deprecated, suggesting drivers
>>> should implement the new .transfer_one_message() hook instead.
>>>
>>> This is the reason why commit 8090d6d1a415 was submitted. However we lost
>>> quite amount of performances moving from our tasklet to the generic workqueue.
>>>
>>> So do you recommend us to keep our current generic implementation relying on
>>> the SPI framework workqueue or to go back to a custom implementation using
>>> tasklet?
>>> If we keep the current implementation, is there a way to improve the
>>> performances so we go back to something close to what he had before?
>>>
>>> We saw in commit ffbbdd21329f that we can change the workqueue thread
>>> scheduling policy to SCHED_FIFO by setting master->rt.
>>>
>>
>> master->rt is not a good choice as i know and
>> you may find thread [1] useful for you.
>>
>> [1] http://www.spinics.net/lists/linux-rt-users/msg14347.html
>>
>
> thanks for the link, I'll look at it :)

Thanks for digging into this issue and your tests!

Do you have some new results? Can I help you?

bye,
Heiko
-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SPI: performance regression when using the common message queuing infrastructure
  2016-07-25  4:51             ` Heiko Schocher
@ 2016-07-29  9:33                 ` Cyrille Pitchen
  -1 siblings, 0 replies; 14+ messages in thread
From: Cyrille Pitchen @ 2016-07-29  9:33 UTC (permalink / raw)
  To: hs-ynQEQJNshbs
  Cc: Grygorii Strashko, Mark Brown,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w,
	Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linux-spi-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Heiko,

Le 25/07/2016 à 06:51, Heiko Schocher a écrit :
> Hello Cyrille,
> 
> sorry for the late answer, but just back from holidays ...
> 
> Am 07.07.2016 um 10:12 schrieb Cyrille Pitchen:
>> Hi Grygorii,
>>
>> Le 06/07/2016 12:03, Grygorii Strashko a écrit :
>>> On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
>>>> Hi Mark,
>>>>
>>>> recently Heiko reported to us a performance regression with Atmel SPI
>>>> controllers. He noticed the issue on a sam9g15ek board and I was also able to
>>>> reproduce it on a sama5d36ek board.
>>>>
>>>> We found out that the performance regression was introduced in 3.14 by commit:
>>>> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
>>>> spi: atmel: Refactor spi-atmel to use SPI framework queue
>>>>
>>>> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
>>>> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
>>>> and MOSI signals on the SPI bus.
>>>>
>>>>
>>>> 1 - Reading 512 bytes from the memory
>>>>
>>>> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>>>>
>>>> With the oscilloscope, I measured the time between the chip-select fell before
>>>> the Read Status command (05h) and the chip-select rose after all data had been
>>>> read by the 4-byte address Fast Read 1-1-1 command (13h).
>>>>
>>>> 3.14 vanilla                      : 305 µs
>>>> 3.14 commit 8090d6d1a415 reverted : 242 µs   -21%
>>>>
>>>> 2 - Reading 1000 x 1024 bytes from the memory
>>>>
>>>> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>>>>
>>>> Still with the scope, I measured the time to read all data.
>>>>
>>>> 3.14 vanilla                      : 435 ms
>>>> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>>>>
>>>>
>>>> Indeed the oscilloscope shows that more time is spent between messages and
>>>> transfers.
> 
> Yes this fits with my observations.
> 
>>>> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
>>>> queue by a workqueue provided by the SPI framework.
>>>>
>>>> The support of this (optional) workqueue was introduced by commit:
>>>> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
>>>> spi: create a message queuing infrastructure
>>>>
>>>> Though the commit message claims that is common infrastructure is optional,
>>>> the patch also claims the .transfer() hook is deprecated, suggesting drivers
>>>> should implement the new .transfer_one_message() hook instead.
>>>>
>>>> This is the reason why commit 8090d6d1a415 was submitted. However we lost
>>>> quite amount of performances moving from our tasklet to the generic workqueue.
>>>>
>>>> So do you recommend us to keep our current generic implementation relying on
>>>> the SPI framework workqueue or to go back to a custom implementation using
>>>> tasklet?
>>>> If we keep the current implementation, is there a way to improve the
>>>> performances so we go back to something close to what he had before?
>>>>
>>>> We saw in commit ffbbdd21329f that we can change the workqueue thread
>>>> scheduling policy to SCHED_FIFO by setting master->rt.
>>>>
>>>
>>> master->rt is not a good choice as i know and
>>> you may find thread [1] useful for you.
>>>
>>> [1] http://www.spinics.net/lists/linux-rt-users/msg14347.html
>>>
>>
>> thanks for the link, I'll look at it :)
> 
> Thanks for digging into this issue and your tests!
> 
> Do you have some new results? Can I help you?
> 
> bye,
> Heiko


We talked about moving back to a tasklet implementation but nothing was done
yet so nothing new for now, sorry.
Also, I will be out of office for the next 3 weeks: I will be back on August,
22th.


Best regards,

Cyrille
--
To unsubscribe from this list: send the line "unsubscribe linux-spi" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-29  9:33                 ` Cyrille Pitchen
  0 siblings, 0 replies; 14+ messages in thread
From: Cyrille Pitchen @ 2016-07-29  9:33 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Heiko,

Le 25/07/2016 ? 06:51, Heiko Schocher a ?crit :
> Hello Cyrille,
> 
> sorry for the late answer, but just back from holidays ...
> 
> Am 07.07.2016 um 10:12 schrieb Cyrille Pitchen:
>> Hi Grygorii,
>>
>> Le 06/07/2016 12:03, Grygorii Strashko a ?crit :
>>> On 07/06/2016 12:50 PM, Cyrille Pitchen wrote:
>>>> Hi Mark,
>>>>
>>>> recently Heiko reported to us a performance regression with Atmel SPI
>>>> controllers. He noticed the issue on a sam9g15ek board and I was also able to
>>>> reproduce it on a sama5d36ek board.
>>>>
>>>> We found out that the performance regression was introduced in 3.14 by commit:
>>>> 8090d6d1a415d3ae1a7208995decfab8f60f4f36
>>>> spi: atmel: Refactor spi-atmel to use SPI framework queue
>>>>
>>>> For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
>>>> a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
>>>> and MOSI signals on the SPI bus.
>>>>
>>>>
>>>> 1 - Reading 512 bytes from the memory
>>>>
>>>> # dd if=/dev/mtd6 bs=512 count=1 of=/dev/null
>>>>
>>>> With the oscilloscope, I measured the time between the chip-select fell before
>>>> the Read Status command (05h) and the chip-select rose after all data had been
>>>> read by the 4-byte address Fast Read 1-1-1 command (13h).
>>>>
>>>> 3.14 vanilla                      : 305 ?s
>>>> 3.14 commit 8090d6d1a415 reverted : 242 ?s   -21%
>>>>
>>>> 2 - Reading 1000 x 1024 bytes from the memory
>>>>
>>>> # dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null
>>>>
>>>> Still with the scope, I measured the time to read all data.
>>>>
>>>> 3.14 vanilla                      : 435 ms
>>>> 3.14 commit 8090d6d1a415 reverted : 361 ms   -17%
>>>>
>>>>
>>>> Indeed the oscilloscope shows that more time is spent between messages and
>>>> transfers.
> 
> Yes this fits with my observations.
> 
>>>> commit 8090d6d1a415 replaced the tasklet used to manage a SPI message/transfer
>>>> queue by a workqueue provided by the SPI framework.
>>>>
>>>> The support of this (optional) workqueue was introduced by commit:
>>>> ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
>>>> spi: create a message queuing infrastructure
>>>>
>>>> Though the commit message claims that is common infrastructure is optional,
>>>> the patch also claims the .transfer() hook is deprecated, suggesting drivers
>>>> should implement the new .transfer_one_message() hook instead.
>>>>
>>>> This is the reason why commit 8090d6d1a415 was submitted. However we lost
>>>> quite amount of performances moving from our tasklet to the generic workqueue.
>>>>
>>>> So do you recommend us to keep our current generic implementation relying on
>>>> the SPI framework workqueue or to go back to a custom implementation using
>>>> tasklet?
>>>> If we keep the current implementation, is there a way to improve the
>>>> performances so we go back to something close to what he had before?
>>>>
>>>> We saw in commit ffbbdd21329f that we can change the workqueue thread
>>>> scheduling policy to SCHED_FIFO by setting master->rt.
>>>>
>>>
>>> master->rt is not a good choice as i know and
>>> you may find thread [1] useful for you.
>>>
>>> [1] http://www.spinics.net/lists/linux-rt-users/msg14347.html
>>>
>>
>> thanks for the link, I'll look at it :)
> 
> Thanks for digging into this issue and your tests!
> 
> Do you have some new results? Can I help you?
> 
> bye,
> Heiko


We talked about moving back to a tasklet implementation but nothing was done
yet so nothing new for now, sorry.
Also, I will be out of office for the next 3 weeks: I will be back on August,
22th.


Best regards,

Cyrille

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: SPI: performance regression when using the common message queuing infrastructure
  2016-07-29  9:33                 ` Cyrille Pitchen
@ 2016-07-29 12:11                     ` Mark Brown
  -1 siblings, 0 replies; 14+ messages in thread
From: Mark Brown @ 2016-07-29 12:11 UTC (permalink / raw)
  To: Cyrille Pitchen
  Cc: hs-ynQEQJNshbs, Grygorii Strashko,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	Nicolas.FERRE-AIFe0yeh4nAAvxtiuMwx3w,
	Wenyou.Yang-AIFe0yeh4nAAvxtiuMwx3w,
	linux-spi-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

[-- Attachment #1: Type: text/plain, Size: 418 bytes --]

On Fri, Jul 29, 2016 at 11:33:00AM +0200, Cyrille Pitchen wrote:

> We talked about moving back to a tasklet implementation but nothing was done
> yet so nothing new for now, sorry.
> Also, I will be out of office for the next 3 weeks: I will be back on August,
> 22th.

To repeat what I said before the thing to do here is to improve the
performance of the core implementation rather than just open coding
something.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* SPI: performance regression when using the common message queuing infrastructure
@ 2016-07-29 12:11                     ` Mark Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Brown @ 2016-07-29 12:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 29, 2016 at 11:33:00AM +0200, Cyrille Pitchen wrote:

> We talked about moving back to a tasklet implementation but nothing was done
> yet so nothing new for now, sorry.
> Also, I will be out of office for the next 3 weeks: I will be back on August,
> 22th.

To repeat what I said before the thing to do here is to improve the
performance of the core implementation rather than just open coding
something.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20160729/05ddab58/attachment.sig>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-07-29 12:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-06  9:50 SPI: performance regression when using the common message queuing infrastructure Cyrille Pitchen
2016-07-06  9:50 ` Cyrille Pitchen
     [not found] ` <577CD464.6050506-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org>
2016-07-06 10:03   ` Grygorii Strashko
2016-07-06 10:03     ` Grygorii Strashko
     [not found]     ` <577CD767.2080309-l0cyMroinI0@public.gmane.org>
2016-07-07  8:12       ` Cyrille Pitchen
2016-07-07  8:12         ` Cyrille Pitchen
     [not found]         ` <577E0EF3.6000308-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org>
2016-07-25  4:51           ` Heiko Schocher
2016-07-25  4:51             ` Heiko Schocher
     [not found]             ` <57959ADD.40700-ynQEQJNshbs@public.gmane.org>
2016-07-29  9:33               ` Cyrille Pitchen
2016-07-29  9:33                 ` Cyrille Pitchen
     [not found]                 ` <41cb8a2a-7138-d2c0-e668-6c03add1882e-AIFe0yeh4nAAvxtiuMwx3w@public.gmane.org>
2016-07-29 12:11                   ` Mark Brown
2016-07-29 12:11                     ` Mark Brown
2016-07-07  9:50   ` Mark Brown
2016-07-07  9:50     ` Mark Brown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.