* stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
@ 2017-03-23 10:08 ` Corentin Labbe
  0 siblings, 0 replies; 20+ messages in thread
From: Corentin Labbe @ 2017-03-23 10:08 UTC (permalink / raw)
  To: Joao.Pinto, peppe.cavallaro, alexandre.torgue
  Cc: netdev, linux-kernel, linux-arm-kernel

Hello

Using next-20170323 produces a huge performance regression on my sunxi boards.
On dwmac-sun8i, iperf goes from 94 Mbits/s to 37 Mbits/s when sending.

On cubieboard2 (dwmac-sunxi), iperf makes the kernel flood the log with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
and the network is lost afterwards.

Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fixes the issue.
I am still trying to find which part of this patch lowers the performance.

Regards
Corentin Labbe

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:08 ` Corentin Labbe
@ 2017-03-23 10:12   ` Joao Pinto
  -1 siblings, 0 replies; 20+ messages in thread
From: Joao Pinto @ 2017-03-23 10:12 UTC (permalink / raw)
  To: Corentin Labbe, Joao.Pinto, peppe.cavallaro, alexandre.torgue
  Cc: netdev, linux-kernel, linux-arm-kernel


Hi Corentin,

At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
> Hello
> 
> Using next-20170323 produce a huge performance regression on my sunxi boards.
> On dwmac-sun8i, iperf goes from 94mbs/s to 37 when sending.
> 
> On cubieboard2(dwmac-sunxi), iperf made the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> and network is lost after.
> 
> Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fix the issue.
> I still try to found which part of this patch mades the performance lower.
> 
> Regards
> Corentin Labbe
> 

I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
Could you please share the iperf commands you are using so that I can reproduce
this on my side?

@stmmac users: It would be great if people who have a setup could also run
the same iperf test in order to clear this up for everyone.

Thanks,
Joao


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:12   ` Joao Pinto
@ 2017-03-23 10:20     ` Corentin Labbe
  -1 siblings, 0 replies; 20+ messages in thread
From: Corentin Labbe @ 2017-03-23 10:20 UTC (permalink / raw)
  To: Joao Pinto
  Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel,
	linux-arm-kernel

On Thu, Mar 23, 2017 at 10:12:18AM +0000, Joao Pinto wrote:
> 
> Hi Corentin,
> 
> At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
> > Hello
> > 
> > Using next-20170323 produce a huge performance regression on my sunxi boards.
> > On dwmac-sun8i, iperf goes from 94mbs/s to 37 when sending.
> > 
> > On cubieboard2(dwmac-sunxi), iperf made the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> > and network is lost after.
> > 
> > Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fix the issue.
> > I still try to found which part of this patch mades the performance lower.
> > 
> > Regards
> > Corentin Labbe
> > 
> 
> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
> Could you please share the iperf cmds you are using in order for me to reproduce
> in my side?

A simple iperf -c serverip on both boards.


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:20     ` Corentin Labbe
@ 2017-03-23 10:40       ` Joao Pinto
  -1 siblings, 0 replies; 20+ messages in thread
From: Joao Pinto @ 2017-03-23 10:40 UTC (permalink / raw)
  To: Corentin Labbe, Joao Pinto
  Cc: peppe.cavallaro, alexandre.torgue, netdev, linux-kernel,
	linux-arm-kernel

At 10:20 AM on 3/23/2017, Corentin Labbe wrote:
> On Thu, Mar 23, 2017 at 10:12:18AM +0000, Joao Pinto wrote:
>>
>> Hi Corentin,
>>
>> At 10:08 AM on 3/23/2017, Corentin Labbe wrote:
>>> Hello
>>>
>>> Using next-20170323 produce a huge performance regression on my sunxi boards.
>>> On dwmac-sun8i, iperf goes from 94mbs/s to 37 when sending.
>>>
>>> On cubieboard2(dwmac-sunxi), iperf made the kernel flood with "ndesc_get_rx_status: Oversized frame spanned multiple buffers"
>>> and network is lost after.
>>>
>>> Reverting aff3d9eff84399e433c4aca65a9bb236581bc082 fix the issue.
>>> I still try to found which part of this patch mades the performance lower.
>>>
>>> Regards
>>> Corentin Labbe
>>>
>>
>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>> Could you please share the iperf cmds you are using in order for me to reproduce
>> in my side?
> 
> simple iperf -c serverip for both board
> 

OK, I am going to run my tests with a fresh net-next and get back to you soon.

Thanks,
Joao


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:40       ` Joao Pinto
  (?)
@ 2017-03-23 10:48         ` Giuseppe CAVALLARO
  -1 siblings, 0 replies; 20+ messages in thread
From: Giuseppe CAVALLARO @ 2017-03-23 10:48 UTC (permalink / raw)
  To: Joao Pinto, Corentin Labbe
  Cc: alexandre.torgue, netdev, linux-kernel, linux-arm-kernel

Hello

On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>> Could you please share the iperf cmds you are using in order for me to reproduce
>> in my side?

Joao, you have a really powerful HW integration with multiple channels
for both RX and TX.
This is often not the case for other setups where usually just DMA0 is
present or, sometimes, there
is just one extra RX channel.

My question is: what happens on this kind of configuration? Are we
still guaranteeing the best performance?

We also have to guarantee that TSO and SG always work.
Another point is the buffer sizes, which
can differ among platforms.

The problem reported by Corentin below makes me think that there is a
bug, so we should
understand when it was introduced and whether it can be fixed by some
configuration that we are
not taking care of right now.

"ndesc_get_rx_status: Oversized frame spanned multiple buffers"


Best Regards
Peppe
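
For readers following the thread, the warning quoted above can be illustrated with a minimal sketch. It assumes the message is printed when a completed RX descriptor does not carry the last-segment flag, that is, when a received frame did not fit into a single buffer. All names, bit positions and the buffer-count helper below are hypothetical stand-ins, not the driver's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative stand-ins only; the real driver has its own definitions. */
#define SKETCH_RDES0_OWN       (UINT32_C(1) << 31) /* descriptor still owned by DMA */
#define SKETCH_RDES0_LAST_DESC (UINT32_C(1) << 8)  /* last segment of a frame */

enum sketch_rx_status {
	SKETCH_GOOD_FRAME,
	SKETCH_DISCARD_FRAME,
	SKETCH_DMA_OWN,
};

struct sketch_rx_stats {
	unsigned long rx_length_errors;
};

/* Number of fixed-size RX buffers a frame of frame_len bytes occupies. */
static size_t sketch_buffers_needed(size_t frame_len, size_t buf_size)
{
	return (frame_len + buf_size - 1) / buf_size;
}

/*
 * Classify one RX descriptor status word. A descriptor that the DMA has
 * released but that is not marked as the last segment means the frame
 * continued into another buffer; this model discards it and counts a
 * length error.
 */
static enum sketch_rx_status
sketch_get_rx_status(uint32_t rdes0, struct sketch_rx_stats *stats)
{
	if (rdes0 & SKETCH_RDES0_OWN)
		return SKETCH_DMA_OWN;

	if (!(rdes0 & SKETCH_RDES0_LAST_DESC)) {
		fprintf(stderr, "%s: Oversized frame spanned multiple buffers\n",
			__func__);
		stats->rx_length_errors++;
		return SKETCH_DISCARD_FRAME;
	}

	return SKETCH_GOOD_FRAME;
}
```

With a 1536-byte buffer, for example, a 1514-byte frame needs one buffer while a 1600-byte frame needs two, and in this model the first of those two descriptors would be discarded with the warning.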


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:48         ` Giuseppe CAVALLARO
  (?)
@ 2017-03-23 10:51           ` Giuseppe CAVALLARO
  -1 siblings, 0 replies; 20+ messages in thread
From: Giuseppe CAVALLARO @ 2017-03-23 10:51 UTC (permalink / raw)
  To: Joao Pinto, Corentin Labbe
  Cc: alexandre.torgue, netdev, linux-kernel, linux-arm-kernel

On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
> Hello
>
> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>> Could you please share the iperf cmds you are using in order for me
>>> to reproduce
>>> in my side?
>
> Joao, you have a really powerful HW integration with multiple channels 
> for both RX and TX.
> Often this is not the same for other setup where, usually just a DMA0 
> is present or, sometime, there
> is just one RX extra channel.
>
> My question is, what happens on this kind of configurations? Are we 
> still guarantying the best performances?
>
> Also we have to guarantee, that the TSO and SG are always working. 
> Another point is the buffer sizes that
> can be different among platforms.
>
> The problem  below reported by Corentin push me to think that there is 
> a bug, so we should
> understand when this has been introduced and if likely fixed by some 
> configuration we are
> not take care right now.
>
> ndesc_get_rx_status: Oversized frame spanned multiple buffers"

I wonder whether this could be easily triggered by fetching a big file via
FTP, in which case it would not be strictly related to performance benchmarks.

peppe

>
>
> Best Regards
> Peppe
>


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:48         ` Giuseppe CAVALLARO
@ 2017-03-23 10:54           ` Joao Pinto
  -1 siblings, 0 replies; 20+ messages in thread
From: Joao Pinto @ 2017-03-23 10:54 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Joao Pinto, Corentin Labbe
  Cc: alexandre.torgue, netdev, linux-kernel, linux-arm-kernel


Hi Peppe,

At 10:48 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
> Hello
> 
> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>> Could you please share the iperf cmds you are using in order for me to
>>> reproduce
>>> in my side?
> 
> Joao, you have a really powerful HW integration with multiple channels for both
> RX and TX.
> Often this is not the same for other setup where, usually just a DMA0 is present
> or, sometime, there
> is just one RX extra channel.

My opinion is that we should not have problems, since the majority of the features
introduced are used only if you configure rx queues > 1 or tx queues > 1, so if you
use the default (=1) those configurations will not take effect.

> 
> My question is, what happens on this kind of configurations? Are we still
> guarantying the best performances?
> 
> Also we have to guarantee, that the TSO and SG are always working. Another point
> is the buffer sizes that
> can be different among platforms.

We have to pay attention to the RX buffer size, since I had problems with DHCP
messages not being received because the buffer size was too small.
Currently the TX buffer size is not configurable; in the future it would be
useful to make it configurable too.

> 
> The problem  below reported by Corentin push me to think that there is a bug, so
> we should
> understand when this has been introduced and if likely fixed by some
> configuration we are
> not take care right now.

Of course.

> 
> ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> 
> 
> Best Regards
> Peppe

Thanks,
Joao
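
Joao's point about the default queue count can be pictured with a short sketch. It assumes that the multi-queue setup is applied only when more than one RX or TX queue is requested. The struct is a hypothetical, trimmed-down mirror of the two fields named in this thread, and treating an unset (zero) count as the single-queue default is an assumption made here for illustration only.

```c
#include <stdbool.h>

/* Hypothetical mirror of the two platform fields discussed in the thread. */
struct sketch_plat {
	unsigned int rx_queues_to_use;
	unsigned int tx_queues_to_use;
};

/* Assumption: an unset (zero) count falls back to the single-queue default. */
static unsigned int sketch_queue_count(unsigned int configured)
{
	return configured ? configured : 1;
}

/* True only when the platform asks for more than one RX or TX queue. */
static bool sketch_needs_multiqueue_setup(const struct sketch_plat *plat)
{
	return sketch_queue_count(plat->rx_queues_to_use) > 1 ||
	       sketch_queue_count(plat->tx_queues_to_use) > 1;
}
```

Under this model a platform that leaves both counts at 1 never enters the multi-queue path, whereas a 4 RX + 4 TX setup does.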


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:51           ` Giuseppe CAVALLARO
@ 2017-03-23 10:56             ` Joao Pinto
  -1 siblings, 0 replies; 20+ messages in thread
From: Joao Pinto @ 2017-03-23 10:56 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Joao Pinto, Corentin Labbe
  Cc: alexandre.torgue, netdev, linux-kernel, linux-arm-kernel

At 10:51 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
> On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
>> Hello
>>
>> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>>> Could you please share the iperf cmds you are using in order for me to
>>>> reproduce
>>>> in my side?
>>
>> Joao, you have a really powerful HW integration with multiple channels for
>> both RX and TX.
>> Often this is not the same for other setup where, usually just a DMA0 is
>> present or, sometime, there
>> is just one RX extra channel.
>>
>> My question is, what happens on this kind of configurations? Are we still
>> guarantying the best performances?
>>
>> Also we have to guarantee, that the TSO and SG are always working. Another
>> point is the buffer sizes that
>> can be different among platforms.
>>
>> The problem  below reported by Corentin push me to think that there is a bug,
>> so we should
>> understand when this has been introduced and if likely fixed by some
>> configuration we are
>> not take care right now.
>>
>> ndesc_get_rx_status: Oversized frame spanned multiple buffers"
> 
> I wonder if this could be easily triggered by getting a big file via FTP. So not
> properly related on performance benchs

I am going to run that test and also run iperf a couple of
times. I plan to do this today and send you the results later. If
anyone gets results sooner, please share.

> 
> peppe
> 
>>
>>
>> Best Regards
>> Peppe
>>
> 

Thanks.


* Re: stmmac: Performance regression after commit aff3d9eff843 "net: stmmac: enable multiple buffers"
  2017-03-23 10:56             ` Joao Pinto
@ 2017-03-23 12:55               ` Joao Pinto
  -1 siblings, 0 replies; 20+ messages in thread
From: Joao Pinto @ 2017-03-23 12:55 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Joao Pinto, Corentin Labbe
  Cc: alexandre.torgue, netdev, linux-kernel, linux-arm-kernel

At 10:56 AM on 3/23/2017, Joao Pinto wrote:
> At 10:51 AM on 3/23/2017, Giuseppe CAVALLARO wrote:
>> On 3/23/2017 11:48 AM, Giuseppe CAVALLARO wrote:
>>> Hello
>>>
>>> On 3/23/2017 11:20 AM, Corentin Labbe wrote:
>>>>> I have a 4.21 QoS Core with 4 RX + 4 TX and detected no regression.
>>>>> Could you please share the iperf cmds you are using in order for me to
>>>>> reproduce
>>>>> in my side?
>>>

HW Version: 4.21 QoS Core in HAPS DX7 (FPGA)
The connection between the FPGA and PC where stmmac is running is PCIe.
My configurations are done in stmmac_pci. Here they are:

@@ -68,10 +70,52 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
 {
 	plat->bus_id = 1;
 	plat->phy_addr = 0;
-	plat->interface = PHY_INTERFACE_MODE_GMII;
-	plat->clk_csr = 2;	/* clk_csr_i = 20-35MHz & MDC = clk_csr_i/16 */
-	plat->has_gmac = 1;
-	plat->force_sf_dma_mode = 1;
+	plat->interface = PHY_INTERFACE_MODE_SGMII;
+	plat->clk_csr = 0x5;
+	plat->has_gmac = 0;
+	plat->has_gmac4 = 1;
+	plat->force_sf_dma_mode = 0;
+
+	plat->rx_queues_to_use = 4;
+	plat->tx_queues_to_use = 4;
+
+	plat->rx_sched_algorithm = MTL_RX_ALGORITHM_SP;
+
+	plat->rx_queues_cfg[0].mode_to_use = MTL_QUEUE_AVB;
+	plat->rx_queues_cfg[1].mode_to_use = MTL_QUEUE_DCB;
+	plat->rx_queues_cfg[2].mode_to_use = MTL_QUEUE_DCB;
+	plat->rx_queues_cfg[3].mode_to_use = MTL_QUEUE_DCB;
+
+	plat->tx_queues_cfg[0].mode_to_use = MTL_QUEUE_DCB;
+	plat->tx_queues_cfg[1].mode_to_use = MTL_QUEUE_AVB;
+	plat->tx_queues_cfg[2].mode_to_use = MTL_QUEUE_DCB;
+	plat->tx_queues_cfg[3].mode_to_use = MTL_QUEUE_DCB;
+
+	plat->tx_queues_cfg[1].send_slope = 0xCCC;
+	plat->tx_queues_cfg[1].idle_slope = 0x1333;
+	plat->tx_queues_cfg[1].high_credit = 0x4B0000;
+	plat->tx_queues_cfg[1].low_credit = 0xFFB50000;
+
+	plat->rx_queues_cfg[0].chan = 0;
+	plat->rx_queues_cfg[1].chan = 1;
+	plat->rx_queues_cfg[2].chan = 2;
+	plat->rx_queues_cfg[3].chan = 3;
+
+	plat->tx_sched_algorithm = MTL_TX_ALGORITHM_WRR;
+	plat->tx_queues_cfg[0].weight = 0x10;
+	plat->tx_queues_cfg[1].weight = 0x11;
+	plat->tx_queues_cfg[2].weight = 0x12;
+	plat->tx_queues_cfg[3].weight = 0x13;
+
+	/* Disable Priority config by default */
+	plat->tx_queues_cfg[0].use_prio = false;
+	plat->rx_queues_cfg[0].use_prio = false;
+
+	/* Disable RX queues routing by default */
+	plat->rx_queues_cfg[0].pkt_route = 0x0;
+	plat->rx_queues_cfg[1].pkt_route = 0x0;
+	plat->rx_queues_cfg[2].pkt_route = 0x0;
+	plat->rx_queues_cfg[3].pkt_route = 0x0;

 	plat->mdio_bus_data->phy_reset = NULL;
 	plat->mdio_bus_data->phy_mask = 0;
@@ -83,22 +127,14 @@ static void stmmac_default_data(struct plat_stmmacenet_data *plat)
 	/* Set default value for multicast hash bins */
 	plat->multicast_filter_bins = HASH_TABLE_SIZE;

+	plat->dma_cfg->fixed_burst = 0;
+	plat->dma_cfg->aal = 0;
+
 	/* Set default value for unicast filter entries */
 	plat->unicast_filter_entries = 1;

 	/* Set the maxmtu to a default of JUMBO_LEN */
 	plat->maxmtu = JUMBO_LEN;
-
-	/* Set default number of RX and TX queues to use */
-	plat->tx_queues_to_use = 1;
-	plat->rx_queues_to_use = 1;
-
-	/* Disable Priority config by default */
-	plat->tx_queues_cfg[0].use_prio = false;
-	plat->rx_queues_cfg[0].use_prio = false;
-
-	/* Disable RX queues routing by default */
-	plat->rx_queues_cfg[0].pkt_route = 0x0;
 }


******* TESTS *******


*TEST 1: File (linux-next tarball) transfer of ~1.4G by scp to the DUT*

scp net-next-20170323.tar.gz xxxxx@XXXXXXX:/home/synopsys/
The authenticity of host 'XXXXX' can't be established.
ECDSA key fingerprint is SHA256:/XXXXXX.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'XXXXXX' (ECDSA) to the list of known hosts.
XXXXXX@XXXXX's password:
net-next20170323.tar.gz                       100% 1366MB  79.3MB/s   00:17

ifconfig after transfer:

eth1      Link encap:Ethernet  HWaddr XXXX
          inet addr:XXXX  Bcast:XXXX  Mask:XXXX
          inet6 addr: XXXXX Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1026614 errors:0 dropped:0 overruns:0 frame:0
          TX packets:56804 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1502856063 (1.5 GB)  TX bytes:4224767 (4.2 MB)
          Interrupt:16

*stmmac Log after transfer:

#:~/temp$ dmesg | grep stmmac
[    0.278200] stmmac - user ID: 0x10, Synopsys ID: 0x42
[    0.278207] stmmaceth 0000:01:00.0: DMA HW capability register supported
[    0.278209] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported
[    0.278211] stmmaceth 0000:01:00.0: TX Checksum insertion supported
[    0.278224] stmmaceth 0000:01:00.0: Enable RX Mitigation via HW Watchdog Timer
[    0.315596] libphy: stmmac: probed
[    0.315601] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 0 IRQ POLL (stmmac-1:00) active
[    0.315605] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 1 IRQ POLL (stmmac-1:01)
[    0.315608] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 2 IRQ POLL (stmmac-1:02)
[    0.315612] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): PHY ID 01410cc2 at 3 IRQ POLL (stmmac-1:03)
[   13.380009] Generic PHY stmmac-1:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=stmmac-1:00, irq=-1)
[   13.390093] stmmaceth 0000:01:00.0 eth1: IEEE 1588-2008 Advanced Timestamp supported
[   13.390200] stmmaceth 0000:01:00.0 eth1: registered PTP clock
[   14.436743] stmmaceth 0000:01:00.0 eth1: Link is Up - 1Gbps/Full - flow control off
[   21.056476]  stmmac_set_wol+0x55/0xc0

Conclusion: no packets lost, and the stmmac log is clean.


*TEST 2: iperf*

Server side:

#:/media/DevDisk/gitrepo/mainline-net$ iperf -s -B XXXX.0.3
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address XXXXX.0.3
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local XXXX.0.3 port 5001 connected with XXXX.0.2 port 54092
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-20.1 sec  1.03 GBytes   443 Mbits/sec

Client side:

#:~/temp$ iperf -c XXXX.0.3 --port 5001 -t 20 -i 5
------------------------------------------------------------
Client connecting to XXXX.0.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local XXXXX.0.2 port 54092 connected with XXXXX.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   265 MBytes   445 Mbits/sec
[  3]  5.0-10.0 sec   265 MBytes   444 Mbits/sec
[  3] 10.0-15.0 sec   264 MBytes   444 Mbits/sec
[  3] 15.0-20.0 sec   263 MBytes   442 Mbits/sec
[  3]  0.0-20.0 sec  1.03 GBytes   444 Mbits/sec
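
The client report can be sanity-checked by summing the per-interval transfers and recomputing the average bandwidth. The sketch below assumes the usual iperf 2 units (MBytes = 2^20 bytes, Mbits = 10^6 bits); the function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-interval transfer sizes, in MBytes as printed by iperf. */
double total_mbytes(const double *mbytes, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += mbytes[i];
	return sum;
}

/* Average bandwidth in Mbits/sec for `mbytes` MBytes sent in `seconds`,
 * assuming 1 MByte = 2^20 bytes and 1 Mbit = 10^6 bits. */
double avg_mbits_per_sec(double mbytes, double seconds)
{
	assert(seconds > 0.0);
	return mbytes * 1048576.0 * 8.0 / (seconds * 1e6);
}
```

The four intervals (265, 265, 264 and 263 MBytes) add up to 1057 MBytes, about 1.03 GBytes, and avg_mbits_per_sec(1057.0, 20.0) gives roughly 443 Mbits/sec. That matches the reported 443 to 444 Mbits/sec within the rounding of the printed values.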


Thanks,
Joao



