linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
@ 2023-09-27 11:16 Vishvambar Panth S
  2023-09-29 17:35 ` Jacob Keller
  2023-10-04 19:20 ` Jakub Kicinski
  0 siblings, 2 replies; 10+ messages in thread
From: Vishvambar Panth S @ 2023-09-27 11:16 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: bryan.whitehead, UNGLinuxDriver, davem, edumazet, kuba, pabeni, andrew

The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
device supports placing the descriptors in memory back to back or
reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
configurable hardware setting. Currently DSPACE is unnecessarily set to
match the host's L1 cache line size, resulting in space reserved in
between descriptors in most platforms and causing a suboptimal behavior
(single PCIe Mem transaction per descriptor). By changing the setting
to DSPACE=16 many descriptors can be packed in a single PCIe Mem
transaction resulting in a massive performance improvement in
bidirectional tests without any negative effects.
Tested and verified improvements on x64 PC and several ARM platforms
(typical data below)

Test setup 1: x64 PC with LAN7430 ---> x64 PC

iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
[  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec  receiver
[  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
[  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec  receiver

iperf3 UDP bidirectional with DSPACE set to 16 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
[  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec  receiver
[  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
[  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec  receiver

Test setup 2 : RK3399 with LAN7430 ---> x64 PC

RK3399 Spec:
The SOM-RK3399 is ARM module designed and developed by FriendlyElec.
Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
Frequency: Cortex-A72(up to 2.0GHz), Cortex-A53(up to 1.5GHz)
PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode

iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
[  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec  receiver
[  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
[  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec  receiver

iperf3 UDP bidirectional with DSPACE set to 16 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate
[  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
[  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec   receiver
[  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
[  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec   receiver

Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
---
 drivers/net/ethernet/microchip/lan743x_main.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h
index 52609fc13ad9..6dac6fef7d24 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.h
+++ b/drivers/net/ethernet/microchip/lan743x_main.h
@@ -1067,7 +1067,7 @@ struct lan743x_adapter {
 #define DMA_DESCRIPTOR_SPACING_32       (32)
 #define DMA_DESCRIPTOR_SPACING_64       (64)
 #define DMA_DESCRIPTOR_SPACING_128      (128)
-#define DEFAULT_DMA_DESCRIPTOR_SPACING  (L1_CACHE_BYTES)
+#define DEFAULT_DMA_DESCRIPTOR_SPACING  (DMA_DESCRIPTOR_SPACING_16)
 
 #define DMAC_CHANNEL_STATE_SET(start_bit, stop_bit) \
 	(((start_bit) ? 2 : 0) | ((stop_bit) ? 1 : 0))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-09-27 11:16 [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement Vishvambar Panth S
@ 2023-09-29 17:35 ` Jacob Keller
  2023-10-05  5:17   ` VishvambarPanth.S
  2023-10-04 19:20 ` Jakub Kicinski
  1 sibling, 1 reply; 10+ messages in thread
From: Jacob Keller @ 2023-09-29 17:35 UTC (permalink / raw)
  To: Vishvambar Panth S, netdev, linux-kernel
  Cc: bryan.whitehead, UNGLinuxDriver, davem, edumazet, kuba, pabeni, andrew



On 9/27/2023 4:16 AM, Vishvambar Panth S wrote:
> The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
> device supports placing the descriptors in memory back to back or
> reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
> configurable hardware setting. Currently DSPACE is unnecessarily set to
> match the host's L1 cache line size, resulting in space reserved in
> between descriptors in most platforms and causing a suboptimal behavior
> (single PCIe Mem transaction per descriptor). By changing the setting
> to DSPACE=16 many descriptors can be packed in a single PCIe Mem
> transaction resulting in a massive performance improvement in
> bidirectional tests without any negative effects.
> Tested and verified improvements on x64 PC and several ARM platforms
> (typical data below)
> 

I assume here the choice of 16 is to get the 16 bytes from 4 dwords?

> Test setup 1: x64 PC with LAN7430 ---> x64 PC
> 
> iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
> [  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec  receiver
> [  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
> [  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec  receiver
> 
> iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
> [  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec  receiver
> [  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
> [  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec  receiver
> 

Going from barely transmitting to hitting the line rate *and* improving
both Tx and Rx slightly is fantastic. Huge win just avoiding all that
unnecessary wasted PCIe bandwidth.

> Test setup 2 : RK3399 with LAN7430 ---> x64 PC
> 
> RK3399 Spec:
> The SOM-RK3399 is ARM module designed and developed by FriendlyElec.
> Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
> Frequency: Cortex-A72(up to 2.0GHz), Cortex-A53(up to 1.5GHz)
> PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode
> 
> iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
> [  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec  receiver
> [  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
> [  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec  receiver
> 
> iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID][Role] Interval           Transfer     Bitrate
> [  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
> [  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec   receiver
> [  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
> [  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec   receiver
> 
> Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
> Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
> ---

Always fun to see a massive win like this! Even just the 2x improvement
on the RK3399 is huge, let alone 6x improvement on the x86 platform.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

>  drivers/net/ethernet/microchip/lan743x_main.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h
> index 52609fc13ad9..6dac6fef7d24 100644
> --- a/drivers/net/ethernet/microchip/lan743x_main.h
> +++ b/drivers/net/ethernet/microchip/lan743x_main.h
> @@ -1067,7 +1067,7 @@ struct lan743x_adapter {
>  #define DMA_DESCRIPTOR_SPACING_32       (32)
>  #define DMA_DESCRIPTOR_SPACING_64       (64)
>  #define DMA_DESCRIPTOR_SPACING_128      (128)
> -#define DEFAULT_DMA_DESCRIPTOR_SPACING  (L1_CACHE_BYTES)
> +#define DEFAULT_DMA_DESCRIPTOR_SPACING  (DMA_DESCRIPTOR_SPACING_16)
>  
>  #define DMAC_CHANNEL_STATE_SET(start_bit, stop_bit) \
>  	(((start_bit) ? 2 : 0) | ((stop_bit) ? 1 : 0))

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-09-27 11:16 [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement Vishvambar Panth S
  2023-09-29 17:35 ` Jacob Keller
@ 2023-10-04 19:20 ` Jakub Kicinski
  2023-10-04 20:02   ` Florian Fainelli
  1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-04 19:20 UTC (permalink / raw)
  To: Vishvambar Panth S
  Cc: netdev, linux-kernel, bryan.whitehead, UNGLinuxDriver, davem,
	edumazet, pabeni, andrew

On Wed, 27 Sep 2023 16:46:23 +0530 Vishvambar Panth S wrote:
> The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
> device supports placing the descriptors in memory back to back or
> reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
> configurable hardware setting. Currently DSPACE is unnecessarily set to
> match the host's L1 cache line size, resulting in space reserved in
> between descriptors in most platforms and causing a suboptimal behavior
> (single PCIe Mem transaction per descriptor). By changing the setting
> to DSPACE=16 many descriptors can be packed in a single PCIe Mem
> transaction resulting in a massive performance improvement in
> bidirectional tests without any negative effects.
> Tested and verified improvements on x64 PC and several ARM platforms
> (typical data below)

Nobody complained for 5 years, and it's not a regression.
Let's not treat this as a fix, please repost without the Fixes tag for
net-next.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-10-04 19:20 ` Jakub Kicinski
@ 2023-10-04 20:02   ` Florian Fainelli
  2023-10-04 20:09     ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Fainelli @ 2023-10-04 20:02 UTC (permalink / raw)
  To: Jakub Kicinski, Vishvambar Panth S
  Cc: netdev, linux-kernel, bryan.whitehead, UNGLinuxDriver, davem,
	edumazet, pabeni, andrew

On 10/4/23 12:20, Jakub Kicinski wrote:
> On Wed, 27 Sep 2023 16:46:23 +0530 Vishvambar Panth S wrote:
>> The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but the
>> device supports placing the descriptors in memory back to back or
>> reserving space in between them using its DMA_DESCRIPTOR_SPACE (DSPACE)
>> configurable hardware setting. Currently DSPACE is unnecessarily set to
>> match the host's L1 cache line size, resulting in space reserved in
>> between descriptors in most platforms and causing a suboptimal behavior
>> (single PCIe Mem transaction per descriptor). By changing the setting
>> to DSPACE=16 many descriptors can be packed in a single PCIe Mem
>> transaction resulting in a massive performance improvement in
>> bidirectional tests without any negative effects.
>> Tested and verified improvements on x64 PC and several ARM platforms
>> (typical data below)
> 
> Nobody complained for 5 years, and it's not a regression.
> Let's not treat this as a fix, please repost without the Fixes tag for
> net-next.

As a driver maintainer, you may want to provide some guarantees to your 
end users/customers that from stable version X.Y.Z the performance 
issues have been fixed. Performance improvements are definitively border 
line in terms of being considered as bug fixes though.
-- 
Florian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-10-04 20:02   ` Florian Fainelli
@ 2023-10-04 20:09     ` Jakub Kicinski
  2023-11-01  7:20       ` VishvambarPanth.S
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-10-04 20:09 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Vishvambar Panth S, netdev, linux-kernel, bryan.whitehead,
	UNGLinuxDriver, davem, edumazet, pabeni, andrew

On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > Nobody complained for 5 years, and it's not a regression.
> > Let's not treat this as a fix, please repost without the Fixes tag for
> > net-next.  
> 
> As a driver maintainer, you may want to provide some guarantees to your 
> end users/customers that from stable version X.Y.Z the performance 
> issues have been fixed. Performance improvements are definitively border 
> line in terms of being considered as bug fixes though.

I understand that, but too often people just "feel like a device which
advertises X Mbps / Gbps should reach line rate" while no end user
cares.

Luckily stable rules are pretty clear about this (search for
"performance"): 
https://docs.kernel.org/process/stable-kernel-rules.html

As posted it doesn't fulfill the requirements 🤷️

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-09-29 17:35 ` Jacob Keller
@ 2023-10-05  5:17   ` VishvambarPanth.S
  0 siblings, 0 replies; 10+ messages in thread
From: VishvambarPanth.S @ 2023-10-05  5:17 UTC (permalink / raw)
  To: jacob.e.keller, netdev, linux-kernel
  Cc: andrew, pabeni, edumazet, davem, UNGLinuxDriver, Bryan.Whitehead, kuba

On Fri, 2023-09-29 at 10:35 -0700, Jacob Keller wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> On 9/27/2023 4:16 AM, Vishvambar Panth S wrote:
> > The LAN743x/PCI11xxx DMA descriptors are always 4 dwords long, but
> > the
> > device supports placing the descriptors in memory back to back or
> > reserving space in between them using its DMA_DESCRIPTOR_SPACE
> > (DSPACE)
> > configurable hardware setting. Currently DSPACE is unnecessarily
> > set to
> > match the host's L1 cache line size, resulting in space reserved in
> > between descriptors in most platforms and causing a suboptimal
> > behavior
> > (single PCIe Mem transaction per descriptor). By changing the
> > setting
> > to DSPACE=16 many descriptors can be packed in a single PCIe Mem
> > transaction resulting in a massive performance improvement in
> > bidirectional tests without any negative effects.
> > Tested and verified improvements on x64 PC and several ARM
> > platforms
> > (typical data below)
> > 
> 
> I assume here the choice of 16 is to get the 16 bytes from 4 dwords?

Thanks for the comments. Yes, it is set to 16 bytes to match the size
of the descriptors which are 4 dwords long.
> 
> > Test setup 1: x64 PC with LAN7430 ---> x64 PC
> > 
> > iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec   170 MBytes   143 Mbits/sec  sender
> > [  5][TX-C]   0.00-10.04  sec   169 MBytes   141 Mbits/sec 
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.02 GBytes   876 Mbits/sec  sender
> > [  7][RX-C]   0.00-10.04  sec  1.02 GBytes   870 Mbits/sec 
> > receiver
> > 
> > iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec  sender
> > [  5][TX-C]   0.00-10.04  sec  1.11 GBytes   951 Mbits/sec 
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.10 GBytes   948 Mbits/sec  sender
> > [  7][RX-C]   0.00-10.04  sec  1.10 GBytes   942 Mbits/sec 
> > receiver
> > 
> 
> Going from barely transmitting to hitting the line rate *and*
> improving
> both Tx and Rx slightly is fantastic. Huge win just avoiding all that
> unnecessary wasted PCIe bandwidth.
> 
> > Test setup 2 : RK3399 with LAN7430 ---> x64 PC
> > 
> > RK3399 Spec:
> > The SOM-RK3399 is ARM module designed and developed by
> > FriendlyElec.
> > Cores: 64-bit Dual Core Cortex-A72 + Quad Core Cortex-A53
> > Frequency: Cortex-A72(up to 2.0GHz), Cortex-A53(up to 1.5GHz)
> > PCIe: PCIe x4, compatible with PCIe 2.1, Dual operation mode
> > 
> > iperf3 UDP bidirectional with DSPACE set to L1 CACHE Size:
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec   534 MBytes   448 Mbits/sec  sender
> > [  5][TX-C]   0.00-10.05  sec   534 MBytes   446 Mbits/sec 
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.12 GBytes   961 Mbits/sec  sender
> > [  7][RX-C]   0.00-10.05  sec  1.11 GBytes   946 Mbits/sec 
> > receiver
> > 
> > iperf3 UDP bidirectional with DSPACE set to 16 Bytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID][Role] Interval           Transfer     Bitrate
> > [  5][TX-C]   0.00-10.00  sec   966 MBytes   810 Mbits/sec   sender
> > [  5][TX-C]   0.00-10.04  sec   965 MBytes   806 Mbits/sec  
> > receiver
> > [  7][RX-C]   0.00-10.00  sec  1.11 GBytes   956 Mbits/sec   sender
> > [  7][RX-C]   0.00-10.04  sec  1.07 GBytes   919 Mbits/sec  
> > receiver
> > 
> > Fixes: 23f0703c125b ("lan743x: Add main source files for new
> > lan743x driver")
> > Signed-off-by: Vishvambar Panth S <vishvambarpanth.s@microchip.com>
> > ---
> 
> Always fun to see a massive win like this! Even just the 2x
> improvement
> on the RK3399 is huge, let alone 6x improvement on the x86 platform.
> 
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> 
> >  drivers/net/ethernet/microchip/lan743x_main.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/ethernet/microchip/lan743x_main.h
> > b/drivers/net/ethernet/microchip/lan743x_main.h
> > index 52609fc13ad9..6dac6fef7d24 100644
> > --- a/drivers/net/ethernet/microchip/lan743x_main.h
> > +++ b/drivers/net/ethernet/microchip/lan743x_main.h
> > @@ -1067,7 +1067,7 @@ struct lan743x_adapter {
> >  #define DMA_DESCRIPTOR_SPACING_32       (32)
> >  #define DMA_DESCRIPTOR_SPACING_64       (64)
> >  #define DMA_DESCRIPTOR_SPACING_128      (128)
> > -#define DEFAULT_DMA_DESCRIPTOR_SPACING  (L1_CACHE_BYTES)
> > +#define DEFAULT_DMA_DESCRIPTOR_SPACING 
> > (DMA_DESCRIPTOR_SPACING_16)
> > 
> >  #define DMAC_CHANNEL_STATE_SET(start_bit, stop_bit) \
> >       (((start_bit) ? 2 : 0) | ((stop_bit) ? 1 : 0))


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-10-04 20:09     ` Jakub Kicinski
@ 2023-11-01  7:20       ` VishvambarPanth.S
  2023-11-09 10:53         ` VishvambarPanth.S
  0 siblings, 1 reply; 10+ messages in thread
From: VishvambarPanth.S @ 2023-11-01  7:20 UTC (permalink / raw)
  To: kuba, f.fainelli
  Cc: Bryan.Whitehead, andrew, davem, linux-kernel, pabeni, netdev,
	UNGLinuxDriver, edumazet

On Wed, 2023-10-04 at 13:09 -0700, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > > Nobody complained for 5 years, and it's not a regression.
> > > Let's not treat this as a fix, please repost without the Fixes
> > > tag for
> > > net-next.
> > 
> > As a driver maintainer, you may want to provide some guarantees to
> > your
> > end users/customers that from stable version X.Y.Z the performance
> > issues have been fixed. Performance improvements are definitively
> > border
> > line in terms of being considered as bug fixes though.
> 
> I understand that, but too often people just "feel like a device
> which
> advertises X Mbps / Gbps should reach line rate" while no end user
> cares.
> 
> Luckily stable rules are pretty clear about this (search for
> "performance"):
> https://docs.kernel.org/process/stable-kernel-rules.html
> 
> As posted it doesn't fulfill the requirements 🤷️

Thanks for your feedback. I apologize for the delayed response.
 
The data presented in the patch description was aimed to convince a
reviewer with the visible impact of the performance boosts in both x64
and ARM platforms. However, the main motivation behind the patch was
not merely a "good-to-have" improvement but a solution to the
throughput issues reported by multiple customers in several platforms.
We received lots of customer requests through our ticket site system
urging us to address the performance issues on multiple kernel versions
including LTS. While it's acknowledged that stable branch rules
typically do not consider performance fixes that are not documented in
public Bugzilla, this performance enhancement is essential to many of
our customers and their end users and we believe should therefore be
considered for stable branch on the basis of it’s visible user impact.
 
Few issues reported by our customers are mentioned below, even though
these issues have existed for a long time, the data presented below is
collected from the customer within last 3 months. 

Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
improved the performance to ~900Mbps Rx  in their platform.

Customer-B using lan743x with v5.10 has an issue with Tx UDP being only
157Mbps in their platform. Including the fix in the patch boosts the
performance to ~600Mbps in Tx UDP.

Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
Tx/Rx to be 126/723 Mbps and the fix improved the performance to
828/956 Mbps.

Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
for their platform from UDP Rx 200Mbps. The fix along with few other
changes helped us to bring Rx perf to 800Mbps in customer’s platform
 
This is a kind request for considering the acceptance of this patch
into the net branch, as it has a significant positive impact on users
and does not have any adverse effects.
 
Thanks,
Vishvambar Panth S
 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-11-01  7:20       ` VishvambarPanth.S
@ 2023-11-09 10:53         ` VishvambarPanth.S
  2023-11-09 23:04           ` Jakub Kicinski
  0 siblings, 1 reply; 10+ messages in thread
From: VishvambarPanth.S @ 2023-11-09 10:53 UTC (permalink / raw)
  To: kuba, f.fainelli
  Cc: Bryan.Whitehead, andrew, davem, linux-kernel, pabeni, netdev,
	UNGLinuxDriver, edumazet

On Wed, 2023-11-01 at 12:52 +0530, Vishvambar Panth S wrote:
> On Wed, 2023-10-04 at 13:09 -0700, Jakub Kicinski wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you
> > know the content is safe
> > 
> > On Wed, 4 Oct 2023 13:02:17 -0700 Florian Fainelli wrote:
> > > > Nobody complained for 5 years, and it's not a regression.
> > > > Let's not treat this as a fix, please repost without the Fixes
> > > > tag for
> > > > net-next.
> > > 
> > > As a driver maintainer, you may want to provide some guarantees
> > > to
> > > your
> > > end users/customers that from stable version X.Y.Z the
> > > performance
> > > issues have been fixed. Performance improvements are definitively
> > > border
> > > line in terms of being considered as bug fixes though.
> > 
> > I understand that, but too often people just "feel like a device
> > which
> > advertises X Mbps / Gbps should reach line rate" while no end user
> > cares.
> > 
> > Luckily stable rules are pretty clear about this (search for
> > "performance"):
> > https://docs.kernel.org/process/stable-kernel-rules.html
> > 
> > As posted it doesn't fulfill the requirements 
> 
> Thanks for your feedback. I apologize for the delayed response.
>  
> The data presented in the patch description was aimed to convince a
> reviewer with the visible impact of the performance boosts in both
> x64
> and ARM platforms. However, the main motivation behind the patch was
> not merely a "good-to-have" improvement but a solution to the
> throughput issues reported by multiple customers in several
> platforms.
> We received lots of customer requests through our ticket site system
> urging us to address the performance issues on multiple kernel
> versions
> including LTS. While it's acknowledged that stable branch rules
> typically do not consider performance fixes that are not documented
> in
> public Bugzilla, this performance enhancement is essential to many of
> our customers and their end users and we believe should therefore be
> considered for stable branch on the basis of it’s visible user
> impact.
>  
> Few issues reported by our customers are mentioned below, even though
> these issues have existed for a long time, the data presented below
> is
> collected from the customer within last 3 months. 
> 
> Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
> kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
> improved the performance to ~900Mbps Rx  in their platform.
> 
> Customer-B using lan743x with v5.10 has an issue with Tx UDP being
> only
> 157Mbps in their platform. Including the fix in the patch boosts the
> performance to ~600Mbps in Tx UDP.
> 
> Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
> Tx/Rx to be 126/723 Mbps and the fix improved the performance to
> 828/956 Mbps.
> 
> Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
> for their platform from UDP Rx 200Mbps. The fix along with few other
> changes helped us to bring Rx perf to 800Mbps in customer’s platform
>  
> This is a kind request for considering the acceptance of this patch
> into the net branch, as it has a significant positive impact on users
> and does not have any adverse effects.
>  
> Thanks,
> Vishvambar Panth S
>  
> 
It has come to my attention that some people may not have received my
whole reply dated Nov 1st (as per
https://patchwork.kernel.org/project/netdevbpf/patch/20230927111623.9966-1-vishvambarpanth.s@microchip.com/#25577895
), possibly due to a non-ASCII character at the cut-off point.
Therefore, I am resending the part that was cut short below.
 
Jakub, would it be possible for you to apply the patch to the net
branch given the additional justification now posted below?
 
-----
 
Thanks for your feedback. I apologize for the delayed response.

The data presented in the patch description was aimed to convince a
reviewer with the visible impact of the performance boosts in both x64
and ARM platforms. However, the main motivation behind the patch was
not merely a "good-to-have" improvement but a solution to the
throughput issues reported by multiple customers in several platforms.
We received lots of customer requests through our ticket site system
urging us to address the performance issues on multiple kernel versions
including LTS. While it's acknowledged that stable branch rules
typically do not consider performance fixes that are not documented in
public Bugzilla, this performance enhancement is essential to many of
our customers and their end users and we believe should therefore be
considered for stable branch on the basis of it’s visible user impact.
Few issues reported by our customers are mentioned below, even though
these issues have existed for a long time, the data presented below is
collected from the customer within last 3 months.
 
Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
improved the performance to ~900Mbps Rx  in their platform.
 
Customer-B using lan743x with v5.10 has an issue with Tx UDP being only
157Mbps in their platform. Including the fix in the patch boosts the
performance to ~600Mbps in Tx UDP.
 
Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
Tx/Rx to be 126/723 Mbps and the fix improved the performance to
828/956 Mbps.
 
Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
for their platform from UDP Rx 200Mbps. The fix along with few other
changes helped us to bring Rx perf to 800Mbps in customer’s platform

This is a kind request for considering the acceptance of this patch
into the net branch, as it has a significant positive impact on users
and does not have any adverse effects.

Thanks,
Vishvambar Panth S

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-11-09 10:53         ` VishvambarPanth.S
@ 2023-11-09 23:04           ` Jakub Kicinski
  2023-11-16  5:49             ` VishvambarPanth.S
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-11-09 23:04 UTC (permalink / raw)
  To: VishvambarPanth.S
  Cc: f.fainelli, Bryan.Whitehead, andrew, davem, linux-kernel, pabeni,
	netdev, UNGLinuxDriver, edumazet

On Thu, 9 Nov 2023 10:53:26 +0000 VishvambarPanth.S@microchip.com wrote:
> Thanks for your feedback. I apologize for the delayed response.
> 
> The data presented in the patch description was aimed to convince a
> reviewer with the visible impact of the performance boosts in both x64
> and ARM platforms. However, the main motivation behind the patch was
> not merely a "good-to-have" improvement but a solution to the
> throughput issues reported by multiple customers in several platforms.
> We received lots of customer requests through our ticket site system
> urging us to address the performance issues on multiple kernel versions
> including LTS. While it's acknowledged that stable branch rules
> typically do not consider performance fixes that are not documented in
> public Bugzilla, this performance enhancement is essential to many of
> our customers and their end users and we believe should therefore be
> considered for stable branch on the basis of it’s visible user impact.
> Few issues reported by our customers are mentioned below, even though
> these issues have existed for a long time, the data presented below is
> collected from the customer within last 3 months.
>  
> Customer-A using lan743x with Hisilicon- Kirin 990 processor in 5.10
> kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
> improved the performance to ~900Mbps Rx  in their platform.
>  
> Customer-B using lan743x with v5.10 has an issue with Tx UDP being only
> 157Mbps in their platform. Including the fix in the patch boosts the
> performance to ~600Mbps in Tx UDP.
>  
> Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
> Tx/Rx to be 126/723 Mbps and the fix improved the performance to
> 828/956 Mbps.
>  
> Customer-D using lan743x with Qcom 6490 with v5.4 wanted improvements
> for their platform from UDP Rx 200Mbps. The fix along with few other
> changes helped us to bring Rx perf to 800Mbps in customer’s platform
> 
> This is a kind request for considering the acceptance of this patch
> into the net branch, as it has a significant positive impact on users
> and does not have any adverse effects.

Thanks a lot for the details. Unfortunately after further consideration
I can't accept this patch as a fix with clear conscience. The code has
been this way for a long time, performance improvements should end up
in new kernels and people who want to benefit from faster kernels should
not be sticking to old LTS releases.

So please repost for net-next next week, when it's open again.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement
  2023-11-09 23:04           ` Jakub Kicinski
@ 2023-11-16  5:49             ` VishvambarPanth.S
  0 siblings, 0 replies; 10+ messages in thread
From: VishvambarPanth.S @ 2023-11-16  5:49 UTC (permalink / raw)
  To: kuba
  Cc: Bryan.Whitehead, andrew, davem, linux-kernel, pabeni, netdev,
	UNGLinuxDriver, f.fainelli, edumazet

On Thu, 2023-11-09 at 15:04 -0800, Jakub Kicinski wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you
> know the content is safe
> 
> On Thu, 9 Nov 2023 10:53:26 +0000 VishvambarPanth.S@microchip.com
> wrote:
> > Thanks for your feedback. I apologize for the delayed response.
> > 
> > The data presented in the patch description was aimed to convince a
> > reviewer with the visible impact of the performance boosts in both
> > x64
> > and ARM platforms. However, the main motivation behind the patch
> > was
> > not merely a "good-to-have" improvement but a solution to the
> > throughput issues reported by multiple customers in several
> > platforms.
> > We received lots of customer requests through our ticket site
> > system
> > urging us to address the performance issues on multiple kernel
> > versions
> > including LTS. While it's acknowledged that stable branch rules
> > typically do not consider performance fixes that are not documented
> > in
> > public Bugzilla, this performance enhancement is essential to many
> > of
> > our customers and their end users and we believe should therefore
> > be
> > considered for stable branch on the basis of it’s visible user
> > impact.
> > Few issues reported by our customers are mentioned below, even
> > though
> > these issues have existed for a long time, the data presented below
> > is
> > collected from the customer within last 3 months.
> > 
> > Customer-A using lan743x with Hisilicon- Kirin 990 processor in
> > 5.10
> > kernel, reported a mere ~300Mbps in Rx UDP. The fix significantly
> > improved the performance to ~900Mbps Rx  in their platform.
> > 
> > Customer-B using lan743x with v5.10 has an issue with Tx UDP being
> > only
> > 157Mbps in their platform. Including the fix in the patch boosts
> > the
> > performance to ~600Mbps in Tx UDP.
> > 
> > Customer-C using lan743x with ADAS Ref Design in v5.10 reported UDP
> > Tx/Rx to be 126/723 Mbps and the fix improved the performance to
> > 828/956 Mbps.
> > 
> > Customer-D using lan743x with Qcom 6490 with v5.4 wanted
> > improvements
> > for their platform from UDP Rx 200Mbps. The fix along with few
> > other
> > changes helped us to bring Rx perf to 800Mbps in customer’s
> > platform
> > 
> > This is a kind request for considering the acceptance of this patch
> > into the net branch, as it has a significant positive impact on
> > users
> > and does not have any adverse effects.
> 
> Thanks a lot for the details. Unfortunately after further
> consideration
> I can't accept this patch as a fix with clear conscience. The code
> has
> been this way for a long time, performance improvements should end up
> in new kernels and people who want to benefit from faster kernels
> should
> not be sticking to old LTS releases.
> 
> So please repost for net-next next week, when it's open again.

Hi Jakub,
Thanks for your inputs. Have submitted this patch to the net-next
branch. 


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-11-16  5:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-27 11:16 [PATCH net] net: microchip: lan743x : bidirectional throughuput improvement Vishvambar Panth S
2023-09-29 17:35 ` Jacob Keller
2023-10-05  5:17   ` VishvambarPanth.S
2023-10-04 19:20 ` Jakub Kicinski
2023-10-04 20:02   ` Florian Fainelli
2023-10-04 20:09     ` Jakub Kicinski
2023-11-01  7:20       ` VishvambarPanth.S
2023-11-09 10:53         ` VishvambarPanth.S
2023-11-09 23:04           ` Jakub Kicinski
2023-11-16  5:49             ` VishvambarPanth.S

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).