* [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-17  9:22 Joel Stanley
  2022-05-18 13:10 ` patchwork-bot+netdevbpf
  2022-05-21  2:51 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 5+ messages in thread
From: Joel Stanley @ 2022-05-17  9:22 UTC (permalink / raw)
  To: Andrew Lunn, David S . Miller, Jakub Kicinski, Andrew Jeffery
  Cc: netdev, linux-aspeed, David Wilder, Dylan Hung

The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.

On a local network, the following test was performed using a 1MB file of
random data.

On the receiver run this script:

 #!/bin/bash
 while [ 1 ]; do
        # Zero the stats
        nstat -r  > /dev/null
        nc -l 9899 > test-file
        # Check for checksum errors
        TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
        if [ -z "$TcpInCsumErrors" ]; then
                echo No TcpInCsumErrors
        else
                echo TcpInCsumErrors = $TcpInCsumErrors
        fi
 done

On an AST2600 system:

 # nc <IP of  receiver host> 9899 < test-file

The test was repeated with various MTU values:

 # ip link set mtu 1410 dev eth0

The observed results:

 1500 - good
 1434 - bad
 1400 - good
 1410 - bad
 1420 - good

The test was repeated after disabling tx checksumming:

 # ethtool -K eth0 tx-checksumming off

And all MTU values tested resulted in transfers without error.

An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.

David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.

The vendor suspects this is a hardware issue when using NC-SI. The
fixes line refers to the patch that introduced AST2600 support.

Reported-by: David Wilder <wilder@us.ibm.com>
Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
v3 modifies the wrapping of the commit message.

v2 updates the commit message with confirmation from the vendor that
this is a hardware issue, and clarifies why the commit used in the fixes

 drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index caf48023f8ea..5231818943c6 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
 	/* AST2400  doesn't have working HW checksum generation */
 	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
 		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
+	/* AST2600 tx checksum with NCSI is broken */
+	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
+		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
 	if (np && of_get_property(np, "no-hw-checksum", NULL))
 		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
 	netdev->features |= netdev->hw_features;
-- 
2.35.1



* Re: [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-17  9:22 [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600 Joel Stanley
@ 2022-05-18 13:10 ` patchwork-bot+netdevbpf
  2022-05-21  2:51 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-18 13:10 UTC (permalink / raw)
  To: Joel Stanley
  Cc: andrew, davem, kuba, andrew, netdev, linux-aspeed, wilder, dylan_hung

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 17 May 2022 18:52:17 +0930 you wrote:
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was
> first observed when sending data across a long distance set of networks.
> 
> On a local network, the following test was performed using a 1MB file of
> random data.
> 
> [...]

Here is the summary with links:
  - [net,v3] net: ftgmac100: Disable hardware checksum on AST2600
    https://git.kernel.org/netdev/net/c/6fd45e79e8b9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-17  9:22 [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600 Joel Stanley
  2022-05-18 13:10 ` patchwork-bot+netdevbpf
@ 2022-05-21  2:51 ` Benjamin Herrenschmidt
  2022-05-23 22:25   ` Joel Stanley
  1 sibling, 1 reply; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2022-05-21  2:51 UTC (permalink / raw)
  To: Joel Stanley, Andrew Lunn, David S . Miller, Jakub Kicinski,
	Andrew Jeffery
  Cc: netdev, David Wilder, linux-aspeed

On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote:
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was
> first observed when sending data across a long distance set of
> networks.
> 
> On a local network, the following test was performed using a 1MB file
> of random data.

Can you double check with Aspeed what's going on there, and whether
there's a way to instead identify the bad case in the TX path and do
on-demand SW checksumming only in those cases?

Because disabling HW checksum will kill performance afaik... (doesn't
it also end up disabling zero-copy and SG?)

Cheers,
Ben.

> On the receiver run this script:
> 
>  #!/bin/bash
>  while [ 1 ]; do
>         # Zero the stats
>         nstat -r  > /dev/null
>         nc -l 9899 > test-file
>         # Check for checksum errors
>         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
>         if [ -z "$TcpInCsumErrors" ]; then
>                 echo No TcpInCsumErrors
>         else
>                 echo TcpInCsumErrors = $TcpInCsumErrors
>         fi
>  done
> 
> On an AST2600 system:
> 
>  # nc <IP of  receiver host> 9899 < test-file
> 
> The test was repeated with various MTU values:
> 
>  # ip link set mtu 1410 dev eth0
> 
> The observed results:
> 
>  1500 - good
>  1434 - bad
>  1400 - good
>  1410 - bad
>  1420 - good
> 
> The test was repeated after disabling tx checksumming:
> 
>  # ethtool -K eth0 tx-checksumming off
> 
> And all MTU values tested resulted in transfers without error.
> 
> An issue with the driver cannot be ruled out, however there has been no
> bug discovered so far.
> 
> David has done the work to take the original bug report of slow data
> transfer between long distance connections and triaged it down to this
> test case.
> 
> The vendor suspects this is a hardware issue when using NC-SI. The
> fixes line refers to the patch that introduced AST2600 support.
> 
> Reported-by: David Wilder <wilder@us.ibm.com>
> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
> v3 modifies the wrapping of the commit message.
> 
> v2 updates the commit message with confirmation from the vendor that
> this is a hardware issue, and clarifies why the commit used in the fixes
> 
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index caf48023f8ea..5231818943c6 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
>  	/* AST2400  doesn't have working HW checksum generation */
>  	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
>  		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
> +	/* AST2600 tx checksum with NCSI is broken */
> +	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
> +		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
>  	if (np && of_get_property(np, "no-hw-checksum", NULL))
>  		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
>  	netdev->features |= netdev->hw_features;



* Re: [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-21  2:51 ` Benjamin Herrenschmidt
@ 2022-05-23 22:25   ` Joel Stanley
  2022-05-23 23:44     ` Andrew Lunn
  0 siblings, 1 reply; 5+ messages in thread
From: Joel Stanley @ 2022-05-23 22:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andrew Lunn, David S . Miller, Jakub Kicinski, Andrew Jeffery,
	Networking, David Wilder, linux-aspeed

On Sat, 21 May 2022 at 02:53, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Tue, 2022-05-17 at 18:52 +0930, Joel Stanley wrote:
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was
> > first observed when sending data across a long distance set of
> > networks.
> >
> > On a local network, the following test was performed using a 1MB file
> > of random data.
>
> Can you double check with Aspeed what's going on there, and whether
> there's a way to instead identify the bad case in the TX path and do
> on-demand SW checksumming only in those cases?

Keep in mind this is only for the NC-SI case, where the link is
limited to 100Mbit anyway.

I did some tests with the openbmc kernel; a v5.15 tree with whatever
options we have enabled there.

Averaging a few iperf3 runs I see about 92Mbit/s with hardware
checksumming enabled, and 90Mbit/s with it disabled. So we can see the
difference, and it would be good if Aspeed could find the root cause
so this only needs to be disabled when hitting the problematic path as
you say.

> Because disabling HW checksum will kill performance afaik... (doesn't
> it also end up disabling zero-copy and SG?)

Not sure?
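
One way to check this on a running system is to compare the relevant
offload flags before and after turning tx checksumming off. The helper
below is only a sketch: offload_state is a made-up name, eth0 is assumed
to be the NC-SI interface, and the feature names are the ones that
"ethtool -k" normally prints.

```shell
#!/bin/bash
# Hypothetical helper: filter `ethtool -k` output (read from stdin) down
# to the offloads discussed in this thread.
offload_state() {
        grep -E '^(tx-checksumming|scatter-gather|tcp-segmentation-offload):'
}

# Usage on the target (assumes the interface is eth0):
#   ethtool -k eth0 | offload_state
#   ethtool -K eth0 tx-checksumming off
#   ethtool -k eth0 | offload_state
```

If scatter-gather or TSO flip to off in the second listing, they were
dropped along with tx checksumming on that kernel.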

>
> Cheers,
> Ben.
>
> > On the receiver run this script:
> >
> >  #!/bin/bash
> >  while [ 1 ]; do
> >         # Zero the stats
> >         nstat -r  > /dev/null
> >         nc -l 9899 > test-file
> >         # Check for checksum errors
> >         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >         if [ -z "$TcpInCsumErrors" ]; then
> >                 echo No TcpInCsumErrors
> >         else
> >                 echo TcpInCsumErrors = $TcpInCsumErrors
> >         fi
> >  done
> >
> > On an AST2600 system:
> >
> >  # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >  # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >  1500 - good
> >  1434 - bad
> >  1400 - good
> >  1410 - bad
> >  1420 - good
> >
> > The test was repeated after disabling tx checksumming:
> >
> >  # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been no
> > bug discovered so far.
> >
> > David has done the work to take the original bug report of slow data
> > transfer between long distance connections and triaged it down to this
> > test case.
> >
> > The vendor suspects this is a hardware issue when using NC-SI. The
> > fixes line refers to the patch that introduced AST2600 support.
> >
> > Reported-by: David Wilder <wilder@us.ibm.com>
> > Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
> > ---
> > v3 modifies the wrapping of the commit message.
> >
> > v2 updates the commit message with confirmation from the vendor that
> > this is a hardware issue, and clarifies why the commit used in the fixes
> >
> >  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > index caf48023f8ea..5231818943c6 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
> >       /* AST2400  doesn't have working HW checksum generation */
> >       if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> >               netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> > +     /* AST2600 tx checksum with NCSI is broken */
> > +     if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
> > +             netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> >       if (np && of_get_property(np, "no-hw-checksum", NULL))
> >               netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
> >       netdev->features |= netdev->hw_features;
>


* Re: [PATCH net v3] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-23 22:25   ` Joel Stanley
@ 2022-05-23 23:44     ` Andrew Lunn
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Lunn @ 2022-05-23 23:44 UTC (permalink / raw)
  To: Joel Stanley
  Cc: Benjamin Herrenschmidt, David S . Miller, Jakub Kicinski,
	Andrew Jeffery, Networking, David Wilder, linux-aspeed

> > > The observed results:
> > >
> > >  1500 - good
> > >  1434 - bad
> > >  1400 - good
> > >  1410 - bad
> > >  1420 - good

Looking at these numbers, all the good cases are divisible by 4. All the
bad cases are not.
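
The pattern can be checked with a little shell arithmetic. The snippet
below is just an illustration using the five MTU values reported in the
patch description; mtu_class is a made-up helper name.

```shell
#!/bin/bash
# Hypothetical helper: describe an MTU by its remainder modulo 4.
mtu_class() {
        local rem=$(( $1 % 4 ))
        if [ "$rem" -eq 0 ]; then
                echo "multiple of 4"
        else
                echo "remainder $rem"
        fi
}

# MTU values and outcomes exactly as listed in the patch description.
for entry in 1500:good 1434:bad 1400:good 1410:bad 1420:good; do
        mtu=${entry%%:*}
        result=${entry##*:}
        echo "$mtu - $result - $(mtu_class "$mtu")"
done
```

Both bad values happen to leave the same remainder, so five samples are
not enough to tell "not a multiple of 4" apart from a narrower pattern.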

Could you extend the test to automatically test 64 through 1500?  Or
manually try 1499, 1498, 1497, 1496. Maybe the workaround is if the
packet length is divisible by 4 let the hardware do the checksum,
otherwise do it in software.
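
A sender-side sweep along those lines might look like the sketch below.
It assumes the receiver script from the patch description is already
looping on the other host. DEV, RECEIVER, PORT and the function names are
placeholders (192.0.2.1 is a documentation address), and DRY_RUN defaults
to 1 so that the commands are only printed until the values are filled in.

```shell
#!/bin/bash
# Placeholders, not values from the original report.
DEV=${DEV:-eth0}
RECEIVER=${RECEIVER:-192.0.2.1}
PORT=${PORT:-9899}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
        if [ "$DRY_RUN" = 1 ]; then
                echo "$*"
        else
                "$@"
        fi
}

# Send the 1MB test file, as in the original test.
send_file() {
        if [ "$DRY_RUN" = 1 ]; then
                echo "nc $RECEIVER $PORT < test-file"
        else
                nc "$RECEIVER" "$PORT" < test-file
        fi
}

# sweep FIRST LAST: set each MTU in turn and do one transfer per value.
sweep() {
        local first=$1 last=$2 mtu
        for (( mtu = first; mtu <= last; mtu++ )); do
                echo "# MTU $mtu"
                run ip link set mtu "$mtu" dev "$DEV"
                send_file
                # Give the receiver loop time to start listening again.
                run sleep 1
        done
}

# Example: DRY_RUN=0 RECEIVER=<IP of receiver host> sweep 1496 1500
```

The receiver prints one result per transfer, so its output lines up with
the MTU values in order. Some nc variants do not exit at end of input
without an extra option (for example -N or -q), which would stall the
loop.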

      Andrew
