All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-12 23:19 ` Joel Stanley
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Stanley @ 2022-05-12 23:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, Dylan Hung, David Wilder
  Cc: openbmc, netdev, linux-kernel, David Wilder

The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.

On a local network, the following test was performed using a 1MB file of
random data.

On the receiver run this script:

 #!/bin/bash
 while [ 1 ]; do
        # Zero the stats
        nstat -r  > /dev/null
        nc -l 9899 > test-file
        # Check for checksum errors
        TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
        if [ -z "$TcpInCsumErrors" ]; then
                echo No TcpInCsumErrors
        else
                echo TcpInCsumErrors = $TcpInCsumErrors
        fi
 done

On an AST2600 system:

 # nc <IP of  receiver host> 9899 < test-file

The test was repeated with various MTU values:

 # ip link set mtu 1410 dev eth0

The observed results:

 1500 - good
 1434 - bad
 1400 - good
 1410 - bad
 1420 - good

The test was repeated after disabling tx checksumming:

 # ethtool -K eth0 tx-checksumming off

And all MTU values tested resulted in transfers without error.

An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.

David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.

The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers
to the patch that introduced AST2600 support.

Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Reported-by: David Wilder <wilder@us.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
v2 updates the commit message with confirmation form the vendor that
this is a hardware issue, and clarifes why the commit used in the fixes
tag was chosen.

 drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index caf48023f8ea..5231818943c6 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
 	/* AST2400  doesn't have working HW checksum generation */
 	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
 		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
+	/* AST2600 tx checksum with NC-SI is broken */
+	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
+		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
 	if (np && of_get_property(np, "no-hw-checksum", NULL))
 		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
 	netdev->features |= netdev->hw_features;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-12 23:19 ` Joel Stanley
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Stanley @ 2022-05-12 23:19 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, Dylan Hung, David Wilder
  Cc: netdev, openbmc, David Wilder, linux-kernel

The AST2600 when using the i210 NIC over NC-SI has been observed to
produce incorrect checksum results with specific MTU values. This was
first observed when sending data across a long distance set of networks.

On a local network, the following test was performed using a 1MB file of
random data.

On the receiver run this script:

 #!/bin/bash
 while [ 1 ]; do
        # Zero the stats
        nstat -r  > /dev/null
        nc -l 9899 > test-file
        # Check for checksum errors
        TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
        if [ -z "$TcpInCsumErrors" ]; then
                echo No TcpInCsumErrors
        else
                echo TcpInCsumErrors = $TcpInCsumErrors
        fi
 done

On an AST2600 system:

 # nc <IP of  receiver host> 9899 < test-file

The test was repeated with various MTU values:

 # ip link set mtu 1410 dev eth0

The observed results:

 1500 - good
 1434 - bad
 1400 - good
 1410 - bad
 1420 - good

The test was repeated after disabling tx checksumming:

 # ethtool -K eth0 tx-checksumming off

And all MTU values tested resulted in transfers without error.

An issue with the driver cannot be ruled out, however there has been no
bug discovered so far.

David has done the work to take the original bug report of slow data
transfer between long distance connections and triaged it down to this
test case.

The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers
to the patch that introduced AST2600 support.

Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
Reported-by: David Wilder <wilder@us.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
---
v2 updates the commit message with confirmation form the vendor that
this is a hardware issue, and clarifes why the commit used in the fixes
tag was chosen.

 drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index caf48023f8ea..5231818943c6 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
 	/* AST2400  doesn't have working HW checksum generation */
 	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
 		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
+	/* AST2600 tx checksum with NC-SI is broken */
+	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
+		netdev->hw_features &= ~NETIF_F_HW_CSUM;
+
 	if (np && of_get_property(np, "no-hw-checksum", NULL))
 		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
 	netdev->features |= netdev->hw_features;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-12 23:19 ` Joel Stanley
@ 2022-05-13  1:46   ` Dylan Hung
  -1 siblings, 0 replies; 10+ messages in thread
From: Dylan Hung @ 2022-05-13  1:46 UTC (permalink / raw)
  To: Joel Stanley, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, David Wilder
  Cc: openbmc, netdev, linux-kernel, David Wilder

> -----Original Message-----
> From: joel.stan@gmail.com [mailto:joel.stan@gmail.com] On Behalf Of Joel
> Stanley
> Sent: Friday, May 13, 2022 7:20 AM
> To: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Benjamin
> Herrenschmidt <benh@kernel.crashing.org>; Dylan Hung
> <dylan_hung@aspeedtech.com>; David Wilder <dwilder@us.ibm.com>
> Cc: openbmc@lists.ozlabs.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; David Wilder <wilder@us.ibm.com>
> Subject: [PATCH net v2] net: ftgmac100: Disable hardware checksum on
> AST2600
> 
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was first
> observed when sending data across a long distance set of networks.
> 
> On a local network, the following test was performed using a 1MB file of
> random data.
> 
> On the receiver run this script:
> 
>  #!/bin/bash
>  while [ 1 ]; do
>         # Zero the stats
>         nstat -r  > /dev/null
>         nc -l 9899 > test-file
>         # Check for checksum errors
>         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
>         if [ -z "$TcpInCsumErrors" ]; then
>                 echo No TcpInCsumErrors
>         else
>                 echo TcpInCsumErrors = $TcpInCsumErrors
>         fi
>  done
> 
> On an AST2600 system:
> 
>  # nc <IP of  receiver host> 9899 < test-file
> 
> The test was repeated with various MTU values:
> 
>  # ip link set mtu 1410 dev eth0
> 
> The observed results:
> 
>  1500 - good
>  1434 - bad
>  1400 - good
>  1410 - bad
>  1420 - good
> 
> The test was repeated after disabling tx checksumming:
> 
>  # ethtool -K eth0 tx-checksumming off
> 
> And all MTU values tested resulted in transfers without error.
> 
> An issue with the driver cannot be ruled out, however there has been no bug
> discovered so far.
> 
> David has done the work to take the original bug report of slow data transfer
> between long distance connections and triaged it down to this test case.
> 
> The vendor suspects this this is a hardware issue when using NC-SI. The fixes
> line refers to the patch that introduced AST2600 support.
> 
> Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle
> property")
> Reported-by: David Wilder <wilder@us.ibm.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
> v2 updates the commit message with confirmation form the vendor that this is
> a hardware issue, and clarifes why the commit used in the fixes tag was
> chosen.
> 
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index caf48023f8ea..5231818943c6 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device
> *pdev)
>  	/* AST2400  doesn't have working HW checksum generation */
>  	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
>  		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
> +	/* AST2600 tx checksum with NC-SI is broken */
> +	if (priv->use_ncsi && of_device_is_compatible(np,
> "aspeed,ast2600-mac"))
> +		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
>  	if (np && of_get_property(np, "no-hw-checksum", NULL))
>  		netdev->hw_features &= ~(NETIF_F_HW_CSUM |
> NETIF_F_RXCSUM);
>  	netdev->features |= netdev->hw_features;
> --
> 2.35.1
Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-13  1:46   ` Dylan Hung
  0 siblings, 0 replies; 10+ messages in thread
From: Dylan Hung @ 2022-05-13  1:46 UTC (permalink / raw)
  To: Joel Stanley, David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, David Wilder
  Cc: netdev, openbmc, David Wilder, linux-kernel

> -----Original Message-----
> From: joel.stan@gmail.com [mailto:joel.stan@gmail.com] On Behalf Of Joel
> Stanley
> Sent: Friday, May 13, 2022 7:20 AM
> To: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Benjamin
> Herrenschmidt <benh@kernel.crashing.org>; Dylan Hung
> <dylan_hung@aspeedtech.com>; David Wilder <dwilder@us.ibm.com>
> Cc: openbmc@lists.ozlabs.org; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; David Wilder <wilder@us.ibm.com>
> Subject: [PATCH net v2] net: ftgmac100: Disable hardware checksum on
> AST2600
> 
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was first
> observed when sending data across a long distance set of networks.
> 
> On a local network, the following test was performed using a 1MB file of
> random data.
> 
> On the receiver run this script:
> 
>  #!/bin/bash
>  while [ 1 ]; do
>         # Zero the stats
>         nstat -r  > /dev/null
>         nc -l 9899 > test-file
>         # Check for checksum errors
>         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
>         if [ -z "$TcpInCsumErrors" ]; then
>                 echo No TcpInCsumErrors
>         else
>                 echo TcpInCsumErrors = $TcpInCsumErrors
>         fi
>  done
> 
> On an AST2600 system:
> 
>  # nc <IP of  receiver host> 9899 < test-file
> 
> The test was repeated with various MTU values:
> 
>  # ip link set mtu 1410 dev eth0
> 
> The observed results:
> 
>  1500 - good
>  1434 - bad
>  1400 - good
>  1410 - bad
>  1420 - good
> 
> The test was repeated after disabling tx checksumming:
> 
>  # ethtool -K eth0 tx-checksumming off
> 
> And all MTU values tested resulted in transfers without error.
> 
> An issue with the driver cannot be ruled out, however there has been no bug
> discovered so far.
> 
> David has done the work to take the original bug report of slow data transfer
> between long distance connections and triaged it down to this test case.
> 
> The vendor suspects this this is a hardware issue when using NC-SI. The fixes
> line refers to the patch that introduced AST2600 support.
> 
> Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle
> property")
> Reported-by: David Wilder <wilder@us.ibm.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>
> ---
> v2 updates the commit message with confirmation form the vendor that this is
> a hardware issue, and clarifes why the commit used in the fixes tag was
> chosen.
> 
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
> b/drivers/net/ethernet/faraday/ftgmac100.c
> index caf48023f8ea..5231818943c6 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device
> *pdev)
>  	/* AST2400  doesn't have working HW checksum generation */
>  	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
>  		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
> +	/* AST2600 tx checksum with NC-SI is broken */
> +	if (priv->use_ncsi && of_device_is_compatible(np,
> "aspeed,ast2600-mac"))
> +		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
>  	if (np && of_get_property(np, "no-hw-checksum", NULL))
>  		netdev->hw_features &= ~(NETIF_F_HW_CSUM |
> NETIF_F_RXCSUM);
>  	netdev->features |= netdev->hw_features;
> --
> 2.35.1
Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-12 23:19 ` Joel Stanley
@ 2022-05-13  5:11   ` Paul Menzel
  -1 siblings, 0 replies; 10+ messages in thread
From: Paul Menzel @ 2022-05-13  5:11 UTC (permalink / raw)
  To: Joel Stanley
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, Dylan Hung, David Wilder, netdev,
	openbmc, David Wilder, linux-kernel

Dear Joel,


Am 13.05.22 um 01:19 schrieb Joel Stanley:
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was
> first observed when sending data across a long distance set of networks.
> 
> On a local network, the following test was performed using a 1MB file of
> random data.
> 
> On the receiver run this script:
> 
>   #!/bin/bash
>   while [ 1 ]; do
>          # Zero the stats
>          nstat -r  > /dev/null
>          nc -l 9899 > test-file
>          # Check for checksum errors
>          TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
>          if [ -z "$TcpInCsumErrors" ]; then
>                  echo No TcpInCsumErrors
>          else
>                  echo TcpInCsumErrors = $TcpInCsumErrors
>          fi
>   done
> 
> On an AST2600 system:
> 
>   # nc <IP of  receiver host> 9899 < test-file
> 
> The test was repeated with various MTU values:
> 
>   # ip link set mtu 1410 dev eth0
> 
> The observed results:
> 
>   1500 - good
>   1434 - bad
>   1400 - good
>   1410 - bad
>   1420 - good

Sort the values? As some MTUs are good, should a allow list for these 
values be added?

> The test was repeated after disabling tx checksumming:
> 
>   # ethtool -K eth0 tx-checksumming off
> 
> And all MTU values tested resulted in transfers without error.
> 
> An issue with the driver cannot be ruled out, however there has been no
> bug discovered so far.
> 
> David has done the work to take the original bug report of slow data
> transfer between long distance connections and triaged it down to this
> test case.
> 
> The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers
> to the patch that introduced AST2600 support.

Please wrap the line after 75 characters.

Can the problem be reproduced with QEMU?

> Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
> Reported-by: David Wilder <wilder@us.ibm.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>

Should the intel-wired-lan folks be put in Cc?

> ---
> v2 updates the commit message with confirmation form the vendor that

from

> this is a hardware issue, and clarifes why the commit used in the fixes

clarifies

> tag was chosen.
> 
>   drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index caf48023f8ea..5231818943c6 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
>   	/* AST2400  doesn't have working HW checksum generation */
>   	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
>   		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
> +	/* AST2600 tx checksum with NC-SI is broken */

Does ASPEED have an internal bug for this, so should there be new 
revisions of the AST2600, the bug can be fixed?

> +	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
> +		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +

I would fancy a note or even warning about this hardware issue.

>   	if (np && of_get_property(np, "no-hw-checksum", NULL))
>   		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
>   	netdev->features |= netdev->hw_features;


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-13  5:11   ` Paul Menzel
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Menzel @ 2022-05-13  5:11 UTC (permalink / raw)
  To: Joel Stanley
  Cc: David Wilder, openbmc, David Wilder, linux-kernel, netdev,
	Jakub Kicinski, Dylan Hung, Paolo Abeni, David S . Miller

Dear Joel,


Am 13.05.22 um 01:19 schrieb Joel Stanley:
> The AST2600 when using the i210 NIC over NC-SI has been observed to
> produce incorrect checksum results with specific MTU values. This was
> first observed when sending data across a long distance set of networks.
> 
> On a local network, the following test was performed using a 1MB file of
> random data.
> 
> On the receiver run this script:
> 
>   #!/bin/bash
>   while [ 1 ]; do
>          # Zero the stats
>          nstat -r  > /dev/null
>          nc -l 9899 > test-file
>          # Check for checksum errors
>          TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
>          if [ -z "$TcpInCsumErrors" ]; then
>                  echo No TcpInCsumErrors
>          else
>                  echo TcpInCsumErrors = $TcpInCsumErrors
>          fi
>   done
> 
> On an AST2600 system:
> 
>   # nc <IP of  receiver host> 9899 < test-file
> 
> The test was repeated with various MTU values:
> 
>   # ip link set mtu 1410 dev eth0
> 
> The observed results:
> 
>   1500 - good
>   1434 - bad
>   1400 - good
>   1410 - bad
>   1420 - good

Sort the values? As some MTUs are good, should a allow list for these 
values be added?

> The test was repeated after disabling tx checksumming:
> 
>   # ethtool -K eth0 tx-checksumming off
> 
> And all MTU values tested resulted in transfers without error.
> 
> An issue with the driver cannot be ruled out, however there has been no
> bug discovered so far.
> 
> David has done the work to take the original bug report of slow data
> transfer between long distance connections and triaged it down to this
> test case.
> 
> The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers
> to the patch that introduced AST2600 support.

Please wrap the line after 75 characters.

Can the problem be reproduced with QEMU?

> Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
> Reported-by: David Wilder <wilder@us.ibm.com>
> Signed-off-by: Joel Stanley <joel@jms.id.au>

Should the intel-wired-lan folks be put in Cc?

> ---
> v2 updates the commit message with confirmation form the vendor that

from

> this is a hardware issue, and clarifes why the commit used in the fixes

clarifies

> tag was chosen.
> 
>   drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index caf48023f8ea..5231818943c6 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
>   	/* AST2400  doesn't have working HW checksum generation */
>   	if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
>   		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +
> +	/* AST2600 tx checksum with NC-SI is broken */

Does ASPEED have an internal bug for this, so should there be new 
revisions of the AST2600, the bug can be fixed?

> +	if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
> +		netdev->hw_features &= ~NETIF_F_HW_CSUM;
> +

I would fancy a note or even warning about this hardware issue.

>   	if (np && of_get_property(np, "no-hw-checksum", NULL))
>   		netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
>   	netdev->features |= netdev->hw_features;


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-13  5:11   ` Paul Menzel
@ 2022-05-17  9:19     ` Joel Stanley
  -1 siblings, 0 replies; 10+ messages in thread
From: Joel Stanley @ 2022-05-17  9:19 UTC (permalink / raw)
  To: Paul Menzel
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, Dylan Hung, David Wilder, Networking,
	OpenBMC Maillist, David Wilder, Linux Kernel Mailing List

On Fri, 13 May 2022 at 05:11, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Joel,
>
>
> Am 13.05.22 um 01:19 schrieb Joel Stanley:
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was
> > first observed when sending data across a long distance set of networks.
> >
> > On a local network, the following test was performed using a 1MB file of
> > random data.
> >
> > On the receiver run this script:
> >
> >   #!/bin/bash
> >   while [ 1 ]; do
> >          # Zero the stats
> >          nstat -r  > /dev/null
> >          nc -l 9899 > test-file
> >          # Check for checksum errors
> >          TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >          if [ -z "$TcpInCsumErrors" ]; then
> >                  echo No TcpInCsumErrors
> >          else
> >                  echo TcpInCsumErrors = $TcpInCsumErrors
> >          fi
> >   done
> >
> > On an AST2600 system:
> >
> >   # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >   # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >   1500 - good
> >   1434 - bad
> >   1400 - good
> >   1410 - bad
> >   1420 - good
>
> Sort the values? As some MTUs are good, should a allow list for these
> values be added?

No.

>
> > The test was repeated after disabling tx checksumming:
> >
> >   # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been no
> > bug discovered so far.
> >
> > David has done the work to take the original bug report of slow data
> > transfer between long distance connections and triaged it down to this
> > test case.
> >
> > The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers
> > to the patch that introduced AST2600 support.
>
> Please wrap the line after 75 characters.
>
> Can the problem be reproduced with QEMU?

It can not. If you wanted to try you could modify the model to corrupt
tx checksums, but I would consider this of limited value.

>
> > Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
> > Reported-by: David Wilder <wilder@us.ibm.com>
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
>
> Should the intel-wired-lan folks be put in Cc?

No, as all evidence points towards this being an AST2600 problem.

ASPEED did not report the issue relates to the i210 doing anything
wrong. The issue is not seen from the host PCIe interface, and the
i210 is in widespread use with other BMCs without issue.

>
> > ---
> > v2 updates the commit message with confirmation form the vendor that
>
> from
>
> > this is a hardware issue, and clarifes why the commit used in the fixes
>
> clarifies
>
> > tag was chosen.
> >
> >   drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > index caf48023f8ea..5231818943c6 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
> >       /* AST2400  doesn't have working HW checksum generation */
> >       if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> >               netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> > +     /* AST2600 tx checksum with NC-SI is broken */
>
> Does ASPEED have an internal bug for this, so should there be new
> revisions of the AST2600, the bug can be fixed?

There are no plans to fix it that I'm aware of.

>
> > +     if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
> > +             netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
>
> I would fancy a note or even warning about this hardware issue.

I don't see the need to clutter up the kernel logs.

We've had a similar workaround for the 2400 since support was added
for the aspeed part. It doesn't affect the operation of the system; in
fact it improves it as without this we see degraded throughput due to
retransmissions.

We have git history for detailed notes on why a change was made.

>
> >       if (np && of_get_property(np, "no-hw-checksum", NULL))
> >               netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
> >       netdev->features |= netdev->hw_features;
>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-17  9:19     ` Joel Stanley
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Stanley @ 2022-05-17  9:19 UTC (permalink / raw)
  To: Paul Menzel
  Cc: David Wilder, OpenBMC Maillist, David Wilder,
	Linux Kernel Mailing List, Networking, Jakub Kicinski,
	Dylan Hung, Paolo Abeni, David S . Miller

On Fri, 13 May 2022 at 05:11, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>
> Dear Joel,
>
>
> Am 13.05.22 um 01:19 schrieb Joel Stanley:
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was
> > first observed when sending data across a long distance set of networks.
> >
> > On a local network, the following test was performed using a 1MB file of
> > random data.
> >
> > On the receiver run this script:
> >
> >   #!/bin/bash
> >   while [ 1 ]; do
> >          # Zero the stats
> >          nstat -r  > /dev/null
> >          nc -l 9899 > test-file
> >          # Check for checksum errors
> >          TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >          if [ -z "$TcpInCsumErrors" ]; then
> >                  echo No TcpInCsumErrors
> >          else
> >                  echo TcpInCsumErrors = $TcpInCsumErrors
> >          fi
> >   done
> >
> > On an AST2600 system:
> >
> >   # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >   # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >   1500 - good
> >   1434 - bad
> >   1400 - good
> >   1410 - bad
> >   1420 - good
>
> Sort the values? As some MTUs are good, should a allow list for these
> values be added?

No.

>
> > The test was repeated after disabling tx checksumming:
> >
> >   # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been no
> > bug discovered so far.
> >
> > David has done the work to take the original bug report of slow data
> > transfer between long distance connections and triaged it down to this
> > test case.
> >
> > The vendor suspects this this is a hardware issue when using NC-SI. The fixes line refers
> > to the patch that introduced AST2600 support.
>
> Please wrap the line after 75 characters.
>
> Can the problem be reproduced with QEMU?

It can not. If you wanted to try you could modify the model to corrupt
tx checksums, but I would consider this of limited value.

>
> > Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle property")
> > Reported-by: David Wilder <wilder@us.ibm.com>
> > Signed-off-by: Joel Stanley <joel@jms.id.au>
>
> Should the intel-wired-lan folks be put in Cc?

No, as all evidence points towards this being an AST2600 problem.

ASPEED did not report the issue relates to the i210 doing anything
wrong. The issue is not seen from the host PCIe interface, and the
i210 is in widespread use with other BMCs without issue.

>
> > ---
> > v2 updates the commit message with confirmation form the vendor that
>
> from
>
> > this is a hardware issue, and clarifes why the commit used in the fixes
>
> clarifies
>
> > tag was chosen.
> >
> >   drivers/net/ethernet/faraday/ftgmac100.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > index caf48023f8ea..5231818943c6 100644
> > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > @@ -1928,6 +1928,11 @@ static int ftgmac100_probe(struct platform_device *pdev)
> >       /* AST2400  doesn't have working HW checksum generation */
> >       if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> >               netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
> > +     /* AST2600 tx checksum with NC-SI is broken */
>
> Does ASPEED have an internal bug for this, so should there be new
> revisions of the AST2600, the bug can be fixed?

There are no plans to fix it that I'm aware of.

>
> > +     if (priv->use_ncsi && of_device_is_compatible(np, "aspeed,ast2600-mac"))
> > +             netdev->hw_features &= ~NETIF_F_HW_CSUM;
> > +
>
> I would fancy a note or even warning about this hardware issue.

I don't see the need to clutter up the kernel logs.

We've had a similar workaround for the 2400 since support was added
for the aspeed part. It doesn't affect the operation of the system; in
fact it improves it as without this we see degraded throughput due to
retransmissions.

We have git history for detailed notes on why a change was made.

>
> >       if (np && of_get_property(np, "no-hw-checksum", NULL))
> >               netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
> >       netdev->features |= netdev->hw_features;
>
>
> Kind regards,
>
> Paul

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
  2022-05-13  1:46   ` Dylan Hung
@ 2022-05-17  9:20     ` Joel Stanley
  -1 siblings, 0 replies; 10+ messages in thread
From: Joel Stanley @ 2022-05-17  9:20 UTC (permalink / raw)
  To: Dylan Hung
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni,
	Benjamin Herrenschmidt, David Wilder, openbmc, netdev,
	linux-kernel, David Wilder

On Fri, 13 May 2022 at 01:46, Dylan Hung <dylan_hung@aspeedtech.com> wrote:
>
> > -----Original Message-----
> > From: joel.stan@gmail.com [mailto:joel.stan@gmail.com] On Behalf Of Joel
> > Stanley
> > Sent: Friday, May 13, 2022 7:20 AM
> > To: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> > <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Benjamin
> > Herrenschmidt <benh@kernel.crashing.org>; Dylan Hung
> > <dylan_hung@aspeedtech.com>; David Wilder <dwilder@us.ibm.com>
> > Cc: openbmc@lists.ozlabs.org; netdev@vger.kernel.org;
> > linux-kernel@vger.kernel.org; David Wilder <wilder@us.ibm.com>
> > Subject: [PATCH net v2] net: ftgmac100: Disable hardware checksum on
> > AST2600
> >
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was first
> > observed when sending data across a long distance set of networks.
> >
> > On a local network, the following test was performed using a 1MB file of
> > random data.
> >
> > On the receiver run this script:
> >
> >  #!/bin/bash
> >  while [ 1 ]; do
> >         # Zero the stats
> >         nstat -r  > /dev/null
> >         nc -l 9899 > test-file
> >         # Check for checksum errors
> >         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >         if [ -z "$TcpInCsumErrors" ]; then
> >                 echo No TcpInCsumErrors
> >         else
> >                 echo TcpInCsumErrors = $TcpInCsumErrors
> >         fi
> >  done
> >
> > On an AST2600 system:
> >
> >  # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >  # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >  1500 - good
> >  1434 - bad
> >  1400 - good
> >  1410 - bad
> >  1420 - good
> >
> > The test was repeated after disabling tx checksumming:
> >
> >  # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been no bug
> > discovered so far.
> >
> > David has done the work to take the original bug report of slow data transfer
> > between long distance connections and triaged it down to this test case.
> >
> > The vendor suspects this this is a hardware issue when using NC-SI. The fixes
> > line refers to the patch that introduced AST2600 support.
> >
> > Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle
> > property")
> > Reported-by: David Wilder <wilder@us.ibm.com>
> > Signed-off-by: Joel Stanley <joel@jms.id.au>

> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>

Thank you Dylan. I've added your r-b to v3, as the only changes are to
the wrapping of the commit message.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600
@ 2022-05-17  9:20     ` Joel Stanley
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Stanley @ 2022-05-17  9:20 UTC (permalink / raw)
  To: Dylan Hung
  Cc: David Wilder, openbmc, David Wilder, linux-kernel, netdev,
	Jakub Kicinski, Paolo Abeni, David S . Miller

On Fri, 13 May 2022 at 01:46, Dylan Hung <dylan_hung@aspeedtech.com> wrote:
>
> > -----Original Message-----
> > From: joel.stan@gmail.com [mailto:joel.stan@gmail.com] On Behalf Of Joel
> > Stanley
> > Sent: Friday, May 13, 2022 7:20 AM
> > To: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> > <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Benjamin
> > Herrenschmidt <benh@kernel.crashing.org>; Dylan Hung
> > <dylan_hung@aspeedtech.com>; David Wilder <dwilder@us.ibm.com>
> > Cc: openbmc@lists.ozlabs.org; netdev@vger.kernel.org;
> > linux-kernel@vger.kernel.org; David Wilder <wilder@us.ibm.com>
> > Subject: [PATCH net v2] net: ftgmac100: Disable hardware checksum on
> > AST2600
> >
> > The AST2600 when using the i210 NIC over NC-SI has been observed to
> > produce incorrect checksum results with specific MTU values. This was first
> > observed when sending data across a long distance set of networks.
> >
> > On a local network, the following test was performed using a 1MB file of
> > random data.
> >
> > On the receiver run this script:
> >
> >  #!/bin/bash
> >  while [ 1 ]; do
> >         # Zero the stats
> >         nstat -r  > /dev/null
> >         nc -l 9899 > test-file
> >         # Check for checksum errors
> >         TcpInCsumErrors=$(nstat | grep TcpInCsumErrors)
> >         if [ -z "$TcpInCsumErrors" ]; then
> >                 echo No TcpInCsumErrors
> >         else
> >                 echo TcpInCsumErrors = $TcpInCsumErrors
> >         fi
> >  done
> >
> > On an AST2600 system:
> >
> >  # nc <IP of  receiver host> 9899 < test-file
> >
> > The test was repeated with various MTU values:
> >
> >  # ip link set mtu 1410 dev eth0
> >
> > The observed results:
> >
> >  1500 - good
> >  1434 - bad
> >  1400 - good
> >  1410 - bad
> >  1420 - good
> >
> > The test was repeated after disabling tx checksumming:
> >
> >  # ethtool -K eth0 tx-checksumming off
> >
> > And all MTU values tested resulted in transfers without error.
> >
> > An issue with the driver cannot be ruled out, however there has been no bug
> > discovered so far.
> >
> > David has done the work to take the original bug report of slow data transfer
> > between long distance connections and triaged it down to this test case.
> >
> > The vendor suspects this this is a hardware issue when using NC-SI. The fixes
> > line refers to the patch that introduced AST2600 support.
> >
> > Fixes: 39bfab8844a0 ("net: ftgmac100: Add support for DT phy-handle
> > property")
> > Reported-by: David Wilder <wilder@us.ibm.com>
> > Signed-off-by: Joel Stanley <joel@jms.id.au>

> Reviewed-by: Dylan Hung <dylan_hung@aspeedtech.com>

Thank you Dylan. I've added your r-b to v3, as the only changes are to
the wrapping of the commit message.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-05-17  9:21 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 23:19 [PATCH net v2] net: ftgmac100: Disable hardware checksum on AST2600 Joel Stanley
2022-05-12 23:19 ` Joel Stanley
2022-05-13  1:46 ` Dylan Hung
2022-05-13  1:46   ` Dylan Hung
2022-05-17  9:20   ` Joel Stanley
2022-05-17  9:20     ` Joel Stanley
2022-05-13  5:11 ` Paul Menzel
2022-05-13  5:11   ` Paul Menzel
2022-05-17  9:19   ` Joel Stanley
2022-05-17  9:19     ` Joel Stanley

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.