* [PATCH net v5 0/2] Fix a regression in the Gemini ethernet controller.
@ 2024-01-02 20:34 Linus Walleij
  2024-01-02 20:34 ` [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO Linus Walleij
  2024-01-02 20:34 ` [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes Linus Walleij
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Walleij @ 2024-01-02 20:34 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Vladimir Oltean, Household Cang, Romain Gantois
  Cc: netdev, Linus Walleij

These fixes were developed on top of the earlier fixes.

Finding the right solution is hard because the Gemini checksumming
engine is completely undocumented in the datasheets.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v5:
- Drop the patch re-implementing eth_header_parse_protocol()
- Link to v4: https://lore.kernel.org/r/20231222-new-gemini-ethernet-regression-v4-0-a36e71b0f32b@linaro.org

Changes in v4:
- Properly drop all MTU/TSO muckery in the TX function, the
  whole approach is bogus.
- Make the raw ethertype retrieval return __be16, it is the
  caller's job to deal with endianness (as per the pattern
  from if_vlan.h)
- Use __vlan_get_protocol() instead of vlan_get_protocol()
- Only actively bypass the TSS if the frame is over a certain
  size.
- Drop comment that no longer applies.
- Link to v3: https://lore.kernel.org/r/20231221-new-gemini-ethernet-regression-v3-0-a96b4374bfe8@linaro.org

Changes in v3:
- Fix a whitespace bug in the first patch.
- Add generic accessors to obtain the raw ethertype of an
  ethernet frame. VLAN already has the right accessors.
- Link to v2: https://lore.kernel.org/r/20231216-new-gemini-ethernet-regression-v2-0-64c269413dfa@linaro.org

Changes in v2:
- Drop the TSO and length checks altogether, this was never
  working properly.
- Plan to make a proper TSO implementation in the next kernel
  cycle.
- Link to v1: https://lore.kernel.org/r/20231215-new-gemini-ethernet-regression-v1-0-93033544be23@linaro.org

---
Linus Walleij (2):
      net: ethernet: cortina: Drop software checksum and TSO
      net: ethernet: cortina: Bypass checksumming engine of alien ethertypes

 drivers/net/ethernet/cortina/gemini.c | 62 +++++++++++++++--------------------
 1 file changed, 26 insertions(+), 36 deletions(-)
---
base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
change-id: 20231203-new-gemini-ethernet-regression-3c672de9cfd9

Best regards,
-- 
Linus Walleij <linus.walleij@linaro.org>


* [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-02 20:34 [PATCH net v5 0/2] Fix a regression in the Gemini ethernet controller Linus Walleij
@ 2024-01-02 20:34 ` Linus Walleij
  2024-01-04  0:24   ` Vladimir Oltean
                     ` (2 more replies)
  2024-01-02 20:34 ` [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes Linus Walleij
  1 sibling, 3 replies; 11+ messages in thread
From: Linus Walleij @ 2024-01-02 20:34 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Vladimir Oltean, Household Cang, Romain Gantois
  Cc: netdev, Linus Walleij

The recent change to allow large frames without hardware checksumming
slotted in software checksumming in the driver if hardware could not
do it.

This will however upset TSO (TCP Segmentation Offload). Typical
error dumps include this:

skb len=2961 headroom=222 headlen=66 tailroom=0
(...)
WARNING: CPU: 0 PID: 956 at net/core/dev.c:3259 skb_warn_bad_offload+0x7c/0x108
gemini-ethernet-port: caps=(0x0000010000154813, 0x00002007ffdd7889)

And the packets do not go through.

After investigating I drilled it down to the introduction of the
software checksumming in the driver.

This makes some sense: when the hardware segments the packets,
the hardware also has to keep track of the checksumming.

That raises the question why large TCP or UDP packets also have to
bypass the checksumming (like e.g. ICMP does). If the hardware is
splitting it into smaller packets per-MTU setting, and checksumming
them, why is this happening then? I don't know. I know it is needed,
from tests: the OpenWrt webserver uhttpd starts sending big skb:s (up
to 2047 bytes, the max MTU) and above 1514 bytes it starts to fail
and hang unless the bypass bit is set: the frames are not getting
through.

Drop the size check and the offloading features for now: this
needs to be fixed up properly.

Suggested-by: Eric Dumazet <edumazet@google.com>
Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 35 ++++-------------------------------
 1 file changed, 4 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 78287cfcbf63..5e399c6e095b 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -79,8 +79,7 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 #define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
 
 #define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
-		NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
-		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
+			       NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
 
 /**
  * struct gmac_queue_page - page buffer per-page info
@@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	struct gmac_txdesc *txd;
 	skb_frag_t *skb_frag;
 	dma_addr_t mapping;
-	unsigned short mtu;
 	void *buffer;
-	int ret;
-
-	mtu  = ETH_HLEN;
-	mtu += netdev->mtu;
-	if (skb->protocol == htons(ETH_P_8021Q))
-		mtu += VLAN_HLEN;
 
+	/* TODO: implement proper TSO using MTU in word3 */
 	word1 = skb->len;
-	word3 = SOF_BIT;
-
-	if (word1 > mtu) {
-		word1 |= TSS_MTU_ENABLE_BIT;
-		word3 |= mtu;
-	}
+	word3 = SOF_BIT | skb->len;
 
-	if (skb->len >= ETH_FRAME_LEN) {
-		/* Hardware offloaded checksumming isn't working on frames
-		 * bigger than 1514 bytes. A hypothesis about this is that the
-		 * checksum buffer is only 1518 bytes, so when the frames get
-		 * bigger they get truncated, or the last few bytes get
-		 * overwritten by the FCS.
-		 *
-		 * Just use software checksumming and bypass on bigger frames.
-		 */
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			ret = skb_checksum_help(skb);
-			if (ret)
-				return ret;
-		}
-		word1 |= TSS_BYPASS_BIT;
-	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		int tcp = 0;
 
 		/* We do not switch off the checksumming on non TCP/UDP

-- 
2.34.1


* [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes
  2024-01-02 20:34 [PATCH net v5 0/2] Fix a regression in the Gemini ethernet controller Linus Walleij
  2024-01-02 20:34 ` [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO Linus Walleij
@ 2024-01-02 20:34 ` Linus Walleij
  2024-01-04  0:53   ` Vladimir Oltean
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Walleij @ 2024-01-02 20:34 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Vladimir Oltean, Household Cang, Romain Gantois
  Cc: netdev, Linus Walleij

We had workarounds were the ethernet checksumming engine would be bypassed
for larger frames, this fixed devices using DSA, but regressed devices
where the ethernet was connected directly to a PHY.

The devices with a PHY connected directly can't handle large frames
either way, with or without bypass. Looking at the size of the frame
is probably just wrong.

Rework the workaround such that we don't activate the checksumming engine if
the ethertype inside the actual frame is something other than 0x0800
(IPv4) or 0x86dd (IPv6). These are the only frames the checksumming engine
can actually handle. VLAN framing (0x8100) also works fine.

We can't inspect skb->protocol because DSA frames will sometimes have a
custom ethertype even though skb->protocol is e.g. 0x0800.
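
For illustration, a minimal userspace sketch of reading the ethertype from the frame bytes themselves, looking past a single 802.1Q tag. This is not the driver code: read_be16(), frame_ethertype() and engine_can_checksum() are hypothetical names, and the constants are redefined locally so the snippet builds without kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN     6
#define ETH_HLEN     14
#define VLAN_HLEN    4
#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_IPV6   0x86DD

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the ethertype found in the frame bytes (host order), looking
 * past one 802.1Q tag if present. Returns 0 if the frame is too short.
 */
static uint16_t frame_ethertype(const uint8_t *frame, size_t len)
{
	uint16_t type;

	if (len < ETH_HLEN)
		return 0;
	/* The ethertype follows the destination and source MAC addresses. */
	type = read_be16(frame + 2 * ETH_ALEN);
	if (type == ETH_P_8021Q) {
		if (len < ETH_HLEN + VLAN_HLEN)
			return 0;
		/* Skip the 2-byte TCI; the inner ethertype comes next. */
		type = read_be16(frame + ETH_HLEN + 2);
	}
	return type;
}

/* Hypothetical helper: is this a frame type the engine is said to handle? */
static int engine_can_checksum(const uint8_t *frame, size_t len)
{
	uint16_t type = frame_ethertype(frame, len);

	return type == ETH_P_IP || type == ETH_P_IPV6;
}
```

A frame carrying a switch tag with its own ethertype makes engine_can_checksum() return 0 here, regardless of what the stack recorded as the protocol of the payload.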

If the frame is ALSO over the size of an ordinary ethernet frame,
we will actively bypass the checksumming engine. (Always doing this
makes the hardware unstable.)

After this both devices with direct ethernet attached such as D-Link
DNS-313 and devices with a DSA switch with a custom ethertype such as
D-Link DIR-685 work fine.

Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5e399c6e095b..68da4ae26248 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -29,6 +29,7 @@
 #include <linux/of_net.h>
 #include <linux/of_platform.h>
 #include <linux/etherdevice.h>
+#include <linux/if_ether.h>
 #include <linux/if_vlan.h>
 #include <linux/skbuff.h>
 #include <linux/phy.h>
@@ -1142,22 +1143,38 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	struct gmac_txdesc *txd;
 	skb_frag_t *skb_frag;
 	dma_addr_t mapping;
+	u16 ethertype;
 	void *buffer;
 
 	/* TODO: implement proper TSO using MTU in word3 */
 	word1 = skb->len;
 	word3 = SOF_BIT | skb->len;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+	/* Dig out the the ethertype actually in the buffer and not what the
+	 * protocol claims to be. This is the raw data that the checksumming
+	 * offload engine will have to deal with.
+	 */
+	ethertype = ntohs(eth_header_parse_protocol(skb));
+	/* This is the only VLAN type supported by this hardware so check for
+	 * that: the checksumming engine can handle IP and IPv6 inside 802.1Q.
+	 */
+	if (ethertype == ETH_P_8021Q)
+		ethertype = ntohs(__vlan_get_protocol(skb, htons(ethertype), NULL));
+
+	if (ethertype != ETH_P_IP && ethertype != ETH_P_IPV6) {
+		/* Hardware offloaded checksumming isn't working on non-IP frames.
+		 * This happens for example on some DSA switches using a custom
+		 * ethertype. When a frame gets bigger than a standard ethernet
+		 * frame, it also needs to actively bypass the checksumming engine.
+		 * There is no clear explanation to why it is like this, the
+		 * reference manual has left the TSS completely undocumented.
+		 */
+		if (skb->len > ETH_FRAME_LEN)
+			word1 |= TSS_BYPASS_BIT;
+	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		int tcp = 0;
 
-		/* We do not switch off the checksumming on non TCP/UDP
-		 * frames: as is shown from tests, the checksumming engine
-		 * is smart enough to see that a frame is not actually TCP
-		 * or UDP and then just pass it through without any changes
-		 * to the frame.
-		 */
-		if (skb->protocol == htons(ETH_P_IP)) {
+		if (ethertype == ETH_P_IP) {
 			word1 |= TSS_IP_CHKSUM_BIT;
 			tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
 		} else { /* IPv6 */

-- 
2.34.1


* Re: [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-02 20:34 ` [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO Linus Walleij
@ 2024-01-04  0:24   ` Vladimir Oltean
  2024-01-05  0:00     ` Linus Walleij
  2024-01-05 11:32   ` Vladimir Oltean
  2024-01-05 14:40   ` Eric Dumazet
  2 siblings, 1 reply; 11+ messages in thread
From: Vladimir Oltean @ 2024-01-04  0:24 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

Hi Linus,

On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:
> That begs the question why large TCP or UDP packets also have to
> bypass the checksumming (like e.g. ICMP does). If the hardware is
> splitting it into smaller packets per-MTU setting, and checksumming
> them, why is this happening then? I don't know. I know it is needed,
> from tests: the OpenWrt webserver uhttpd starts sending big skb:s (up
> to 2047 bytes, the max MTU) and above 1514 bytes it starts to fail
> and hang unless the bypass bit is set: the frames are not getting
> through.

This uhttpd traffic is plain TCP, or TCP wrapped in DSA?

* Re: [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes
  2024-01-02 20:34 ` [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes Linus Walleij
@ 2024-01-04  0:53   ` Vladimir Oltean
  2024-01-06  0:17     ` Linus Walleij
  0 siblings, 1 reply; 11+ messages in thread
From: Vladimir Oltean @ 2024-01-04  0:53 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

On Tue, Jan 02, 2024 at 09:34:26PM +0100, Linus Walleij wrote:
> We had workarounds were the ethernet checksumming engine would be bypassed

s/were/where/

> for larger frames, this fixed devices using DSA, but regressed devices
> where the ethernet was connected directly to a PHY.
> 
> The devices with a PHY connected directly can't handle large frames
> either way, with or without bypass. Looking at the size of the frame
> is probably just wrong.

"Looking at the size of the frame is probably just wrong." yet you keep it.

Not only is it confusing for you to say this, but I believe that the
skb->len check is the _only_ thing that is needed. Explanation below.

> Rework the workaround such that we don't activate the checksumming engine if
> the ethertype inside the actual frame is something else than 0x0800
> (IPv4) or 0x86dd (IPv6). These are the only frames the checksumming engine
> can actually handle. VLAN framing (0x8100) also works fine.

Premise:

This driver does not set NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM (or anything)
in dev->vlan_features. Upper interface drivers which look at dev->vlan_features
in order to determine their own features are 8021q and DSA.

Packets transmitted through stacked interfaces have 3 checksumming points.
Two in software, during validate_xmit_skb() on the respective netdev,
depending on its features and skb->ip_summed, and one in the xmit
procedure of the hardware driver - gmac_start_xmit().

In short, I believe that the code which you have added to inspect the
ethertype - and based on that to avoid the "if (skb->ip_summed == CHECKSUM_PARTIAL)"
test - is bogus (a cost you are paying for nothing).

I'm saying this because I think that those
"(ethertype != ETH_P_IP && ethertype != ETH_P_IPV6)" frames wouldn't
have entered the "skb->ip_summed == CHECKSUM_PARTIAL" test anyway.

DSA-tagged frames should come with CHECKSUM_NONE, having been checksummed
in software already, by the first validate_xmit_skb() - DSA not having
inherited the checksum offload feature, because it's not in dev->vlan_features.

Coincidentally, this is also the reason why in your tests, DSA-tagged
TCP/UDP traffic still has a proper checksum, despite you bypassing the
hardware offload, and no longer calling skb_checksum_help() from the
driver. It was never needed, because the checksum was always already
calculated.

And VLAN traffic should also come with CHECKSUM_NONE, for the same reason.

The one difference between DSA and VLAN is that for DSA, you sometimes
set TSS_BYPASS_BIT (for large frames) and for VLAN you never do.
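
The argument above can be modelled in a few lines of userspace C. This is only a toy model under stated assumptions: the struct and function names are hypothetical, the feature bits are stand-ins for NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM, and the real validate_xmit_skb() does considerably more than this.

```c
#include <stdint.h>

#define F_IP_CSUM    (1u << 0)   /* stand-in for NETIF_F_IP_CSUM */
#define F_IPV6_CSUM  (1u << 1)   /* stand-in for NETIF_F_IPV6_CSUM */

enum csum_state { CSUM_NONE, CSUM_PARTIAL };

struct model_dev {
	uint32_t features;       /* what this device offloads itself */
	uint32_t vlan_features;  /* what stacked upper devices may inherit */
};

struct model_skb {
	enum csum_state ip_summed;
	int sw_checksummed;      /* set once software filled in the checksum */
};

/* An upper device (8021q or DSA) inherits only lower->vlan_features. */
static struct model_dev make_upper(const struct model_dev *lower)
{
	struct model_dev upper = { lower->vlan_features, 0 };

	return upper;
}

/* Per-netdev validation: without checksum offload, checksum in software. */
static void model_validate_xmit(const struct model_dev *dev,
				struct model_skb *skb)
{
	if (skb->ip_summed == CSUM_PARTIAL &&
	    !(dev->features & (F_IP_CSUM | F_IPV6_CSUM))) {
		skb->sw_checksummed = 1;
		skb->ip_summed = CSUM_NONE;
	}
}
```

With vlan_features left at zero, a CSUM_PARTIAL packet sent through the upper device is already CSUM_NONE by the time it reaches the lower device, so a test on CSUM_PARTIAL in the lower driver's transmit path is never true for it.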

> 
> We can't inspect skb->protocol because DSA frames will sometimes have a
> custom ethertype despite skb->protocol is e.g. 0x0800.
> 
> If the frame is ALSO over the size of an ordinary ethernet frame,
> we will actively bypass the checksumming engine. (Always doing this
> makes the hardware unstable.)
> 
> After this both devices with direct ethernet attached such as D-Link
> DNS-313 and devices with a DSA switch with a custom ethertype such as
> D-Link DIR-685 work fine.
> 
> Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 33 +++++++++++++++++++++++++--------
>  1 file changed, 25 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index 5e399c6e095b..68da4ae26248 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -1142,22 +1143,38 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	struct gmac_txdesc *txd;
>  	skb_frag_t *skb_frag;
>  	dma_addr_t mapping;
> +	u16 ethertype;
>  	void *buffer;
>  
>  	/* TODO: implement proper TSO using MTU in word3 */
>  	word1 = skb->len;
>  	word3 = SOF_BIT | skb->len;
>  
> -	if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +	/* Dig out the the ethertype actually in the buffer and not what the
> +	 * protocol claims to be. This is the raw data that the checksumming
> +	 * offload engine will have to deal with.
> +	 */
> +	ethertype = ntohs(eth_header_parse_protocol(skb));
> +	/* This is the only VLAN type supported by this hardware so check for
> +	 * that: the checksumming engine can handle IP and IPv6 inside 802.1Q.
> +	 */
> +	if (ethertype == ETH_P_8021Q)
> +		ethertype = ntohs(__vlan_get_protocol(skb, htons(ethertype), NULL));

Random fact: if you store "ethertype" as __be16 and perform htons() on the
constant value instead, the htons() operation will be performed at compile
time and should result in fewer instructions per packet in the fast path.
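
A small userspace sketch of the two variants, for comparison. The be16 typedef is a stand-in for the kernel's __be16 annotation, the TYPE_* constants and function names are hypothetical, and the libc htons() is used in place of the kernel macro, so whether the conversion of the constants is folded at compile time here depends on the libc and compiler.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

#define TYPE_IPV4  0x0800  /* same value as ETH_P_IP */
#define TYPE_IPV6  0x86DD  /* same value as ETH_P_IPV6 */

typedef uint16_t be16;  /* stand-in for the kernel's __be16 annotation */

/* Load the on-wire ethertype without converting it. */
static be16 raw_ethertype(const uint8_t *frame)
{
	be16 type;

	memcpy(&type, frame + 12, sizeof(type));
	return type;
}

/* Variant A: convert the per-packet value, compare in host order. */
static int is_ip_host_order(const uint8_t *frame)
{
	uint16_t type = ntohs(raw_ethertype(frame));

	return type == TYPE_IPV4 || type == TYPE_IPV6;
}

/* Variant B: keep the packet value as-is, convert the constants instead. */
static int is_ip_wire_order(const uint8_t *frame)
{
	be16 type = raw_ethertype(frame);

	return type == htons(TYPE_IPV4) || type == htons(TYPE_IPV6);
}
```

Both variants give the same answer for every frame; variant B only moves the byte swap from the per-packet value to the constants.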

> +
> +	if (ethertype != ETH_P_IP && ethertype != ETH_P_IPV6) {
> +		/* Hardware offloaded checksumming isn't working on non-IP frames.
> +		 * This happens for example on some DSA switches using a custom
> +		 * ethertype. When a frame gets bigger than a standard ethernet
> +		 * frame, it also needs to actively bypass the checksumming engine.
> +		 * There is no clear explanation to why it is like this, the
> +		 * reference manual has left the TSS completely undocumented.
> +		 */
> +		if (skb->len > ETH_FRAME_LEN)
> +			word1 |= TSS_BYPASS_BIT;

Do you know what "TSS_BYPASS_BIT" does, exactly?

> +	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
>  		int tcp = 0;
>  
> -		/* We do not switch off the checksumming on non TCP/UDP
> -		 * frames: as is shown from tests, the checksumming engine
> -		 * is smart enough to see that a frame is not actually TCP
> -		 * or UDP and then just pass it through without any changes
> -		 * to the frame.
> -		 */
> -		if (skb->protocol == htons(ETH_P_IP)) {
> +		if (ethertype == ETH_P_IP) {
>  			word1 |= TSS_IP_CHKSUM_BIT;
>  			tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
>  		} else { /* IPv6 */
> 
> -- 
> 2.34.1
> 

* Re: [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-04  0:24   ` Vladimir Oltean
@ 2024-01-05  0:00     ` Linus Walleij
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-01-05  0:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

On Thu, Jan 4, 2024 at 1:24 AM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:

> > That begs the question why large TCP or UDP packets also have to
> > bypass the checksumming (like e.g. ICMP does). If the hardware is
> > splitting it into smaller packets per-MTU setting, and checksumming
> > them, why is this happening then? I don't know. I know it is needed,
> > from tests: the OpenWrt webserver uhttpd starts sending big skb:s (up
> > to 2047 bytes, the max MTU) and above 1514 bytes it starts to fail
> > and hang unless the bypass bit is set: the frames are not getting
> > through.
>
> This uhttpd traffic is plain TCP, or TCP wrapped in DSA?

Wrapped in DSA, rtl_a_4.

Yours,
Linus Walleij

* Re: [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-02 20:34 ` [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO Linus Walleij
  2024-01-04  0:24   ` Vladimir Oltean
@ 2024-01-05 11:32   ` Vladimir Oltean
  2024-01-05 14:36     ` Eric Dumazet
  2024-01-05 23:35     ` Linus Walleij
  2024-01-05 14:40   ` Eric Dumazet
  2 siblings, 2 replies; 11+ messages in thread
From: Vladimir Oltean @ 2024-01-05 11:32 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:
> @@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	struct gmac_txdesc *txd;
>  	skb_frag_t *skb_frag;
>  	dma_addr_t mapping;
> -	unsigned short mtu;
>  	void *buffer;
> -	int ret;
> -
> -	mtu  = ETH_HLEN;
> -	mtu += netdev->mtu;
> -	if (skb->protocol == htons(ETH_P_8021Q))
> -		mtu += VLAN_HLEN;
>  
> +	/* TODO: implement proper TSO using MTU in word3 */
>  	word1 = skb->len;
> -	word3 = SOF_BIT;
> -
> -	if (word1 > mtu) {
> -		word1 |= TSS_MTU_ENABLE_BIT;
> -		word3 |= mtu;
> -	}
> +	word3 = SOF_BIT | skb->len;
>  
> -	if (skb->len >= ETH_FRAME_LEN) {
> -		/* Hardware offloaded checksumming isn't working on frames
> -		 * bigger than 1514 bytes. A hypothesis about this is that the
> -		 * checksum buffer is only 1518 bytes, so when the frames get
> -		 * bigger they get truncated, or the last few bytes get
> -		 * overwritten by the FCS.
> -		 *
> -		 * Just use software checksumming and bypass on bigger frames.
> -		 */
> -		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -			ret = skb_checksum_help(skb);
> -			if (ret)
> -				return ret;
> -		}
> -		word1 |= TSS_BYPASS_BIT;
> -	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {

So are you taking back the statement that "Hardware offloaded
checksumming isn't working on frames bigger than 1514 bytes"?

Have you increased the interface MTU beyond 1500, and tested with plain
TCP (no DSA) on top of it? Who will provide the TCP checksum for them now?

I don't understand why you remove the skb_checksum_help() call.
It doesn't play nice with skb_is_gso() packets, agreed, but you removed
the TSO netdev feature.
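
For reference, the arithmetic a software checksum fallback has to perform is the Internet checksum from RFC 1071. The sketch below shows only that core sum over a byte buffer; internet_checksum() is a hypothetical name, and the pseudo-header seeding and the offsets that the kernel helper works from are left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Ones' complement sum of 16-bit big-endian words, folded to 16 bits
 * and inverted. The 32-bit accumulator is ample for frame-sized buffers.
 */
static uint16_t internet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((data[i] << 8) | data[i + 1]);
	/* An odd trailing byte is padded with a zero byte on the right. */
	if (len & 1)
		sum += (uint32_t)(data[len - 1] << 8);
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7 this returns 0x220d, and running it again over the data with that checksum appended returns 0.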

> +	if (skb->ip_summed == CHECKSUM_PARTIAL) {
>  		int tcp = 0;
>  
>  		/* We do not switch off the checksumming on non TCP/UDP
> 
> -- 
> 2.34.1
> 

* Re: [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-05 11:32   ` Vladimir Oltean
@ 2024-01-05 14:36     ` Eric Dumazet
  2024-01-05 23:35     ` Linus Walleij
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2024-01-05 14:36 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Linus Walleij, Hans Ulli Kroll, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

On Fri, Jan 5, 2024 at 12:32 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> On Tue, Jan 02, 2024 at 09:34:25PM +0100, Linus Walleij wrote:
> > @@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
> >       struct gmac_txdesc *txd;
> >       skb_frag_t *skb_frag;
> >       dma_addr_t mapping;
> > -     unsigned short mtu;
> >       void *buffer;
> > -     int ret;
> > -
> > -     mtu  = ETH_HLEN;
> > -     mtu += netdev->mtu;
> > -     if (skb->protocol == htons(ETH_P_8021Q))
> > -             mtu += VLAN_HLEN;
> >
> > +     /* TODO: implement proper TSO using MTU in word3 */
> >       word1 = skb->len;
> > -     word3 = SOF_BIT;
> > -
> > -     if (word1 > mtu) {
> > -             word1 |= TSS_MTU_ENABLE_BIT;
> > -             word3 |= mtu;
> > -     }
> > +     word3 = SOF_BIT | skb->len;
> >
> > -     if (skb->len >= ETH_FRAME_LEN) {
> > -             /* Hardware offloaded checksumming isn't working on frames
> > -              * bigger than 1514 bytes. A hypothesis about this is that the
> > -              * checksum buffer is only 1518 bytes, so when the frames get
> > -              * bigger they get truncated, or the last few bytes get
> > -              * overwritten by the FCS.
> > -              *
> > -              * Just use software checksumming and bypass on bigger frames.
> > -              */
> > -             if (skb->ip_summed == CHECKSUM_PARTIAL) {
> > -                     ret = skb_checksum_help(skb);
> > -                     if (ret)
> > -                             return ret;
> > -             }
> > -             word1 |= TSS_BYPASS_BIT;
> > -     } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
>
> So are you taking back the statement that "Hardware offloaded
> checksumming isn't working on frames bigger than 1514 bytes"?
>
> Have you increased the interface MTU beyond 1500, and tested with plain
> TCP (no DSA) on top of it? Who will provide the TCP checksum for them now?
>
> I don't understand why you remove the skb_checksum_help() call.
> It doesn't play nice with skb_is_gso() packets, agreed, but you removed
> the TSO netdev feature.

This TSO feature could never have worked.

This was probably hidden because TCP eventually retransmits non-TSO packets.

A TSO enabled driver must use/propagate skb_shinfo(skb)->gso_size
value to the TSO engine on the NIC.
Otherwise, this is absolutely broken.
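
To make the role of gso_size concrete, here is a userspace sketch of the split a TSO engine is expected to perform. The struct and function names are hypothetical and say nothing about the Gemini hardware interface; the numbers in the note below reuse the skb from the warning (len=2961, headlen=66) with an assumed gso_size of 1448.

```c
#include <stddef.h>

/* Hypothetical stand-in for what a driver would hand to a TSO engine. */
struct tso_request {
	size_t header_len;   /* Ethernet + IP + TCP headers, repeated per segment */
	size_t payload_len;  /* TCP payload carried by the large skb */
	size_t gso_size;     /* per-segment payload size, i.e. the flow MSS */
};

/* Number of wire segments the engine must emit (0 if gso_size is unset). */
static size_t tso_segment_count(const struct tso_request *req)
{
	if (req->gso_size == 0)
		return 0;
	return (req->payload_len + req->gso_size - 1) / req->gso_size;
}

/* On-wire length of segment idx, or 0 if idx is out of range. */
static size_t tso_segment_wire_len(const struct tso_request *req, size_t idx)
{
	size_t offset, chunk;

	if (idx >= tso_segment_count(req))
		return 0;
	offset = idx * req->gso_size;
	chunk = req->payload_len - offset;
	if (chunk > req->gso_size)
		chunk = req->gso_size;
	return req->header_len + chunk;
}
```

With header_len 66, payload_len 2895 and the assumed gso_size of 1448, this yields two segments of 1514 and 1513 bytes on the wire. A split derived from the interface MTU instead of the flow's gso_size only matches this when the two happen to agree.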

Please look at my original suggestion. I think the plan is to try to
add back TSO in the next release, with proper testing (i.e. not relying
on TCP resilience)

https://lore.kernel.org/netdev/CANn89iJLfxng1sYL5Zk0mknXpyYQPCp83m3KgD2KJ2_hKCpEUg@mail.gmail.com/

* Re: [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-02 20:34 ` [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO Linus Walleij
  2024-01-04  0:24   ` Vladimir Oltean
  2024-01-05 11:32   ` Vladimir Oltean
@ 2024-01-05 14:40   ` Eric Dumazet
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2024-01-05 14:40 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Vladimir Oltean, Household Cang, Romain Gantois, netdev

On Tue, Jan 2, 2024 at 9:34 PM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> The recent change to allow large frames without hardware checksumming
> slotted in software checksumming in the driver if hardware could not
> do it.
>
> This will however upset TSO (TCP Segment Offloading). Typical
> error dumps includes this:
>
> skb len=2961 headroom=222 headlen=66 tailroom=0
> (...)
> WARNING: CPU: 0 PID: 956 at net/core/dev.c:3259 skb_warn_bad_offload+0x7c/0x108
> gemini-ethernet-port: caps=(0x0000010000154813, 0x00002007ffdd7889)
>
> And the packets do not go through.
>
> After investigating I drilled it down to the introduction of the
> software checksumming in the driver.
>
> Since the segmenting of packets will be done by the hardware this
> makes a bit of sense since in that case the hardware also needs to
> be keeping track of the checksumming.
>
> That begs the question why large TCP or UDP packets also have to
> bypass the checksumming (like e.g. ICMP does). If the hardware is
> splitting it into smaller packets per-MTU setting, and checksumming
> them, why is this happening then? I don't know. I know it is needed,
> from tests: the OpenWrt webserver uhttpd starts sending big skb:s (up
> to 2047 bytes, the max MTU) and above 1514 bytes it starts to fail
> and hang unless the bypass bit is set: the frames are not getting
> through.
>
> Drop the size check and the offloading features for now: this
> needs to be fixed up properly.
>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Fixes: d4d0c5b4d279 ("net: ethernet: cortina: Handle large frames")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 35 ++++-------------------------------
>  1 file changed, 4 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index 78287cfcbf63..5e399c6e095b 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -79,8 +79,7 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>  #define GMAC0_IRQ4_8 (GMAC0_MIB_INT_BIT | GMAC0_RX_OVERRUN_INT_BIT)
>
>  #define GMAC_OFFLOAD_FEATURES (NETIF_F_SG | NETIF_F_IP_CSUM | \
> -               NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM | \
> -               NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
> +                              NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM)
>
>  /**
>   * struct gmac_queue_page - page buffer per-page info
> @@ -1143,39 +1142,13 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>         struct gmac_txdesc *txd;
>         skb_frag_t *skb_frag;
>         dma_addr_t mapping;
> -       unsigned short mtu;
>         void *buffer;
> -       int ret;
> -
> -       mtu  = ETH_HLEN;
> -       mtu += netdev->mtu;
> -       if (skb->protocol == htons(ETH_P_8021Q))
> -               mtu += VLAN_HLEN;
>
> +       /* TODO: implement proper TSO using MTU in word3 */

I would not use MTU in this comment, but gso_size (or flow MSS).

>         word1 = skb->len;
> -       word3 = SOF_BIT;
> -
> -       if (word1 > mtu) {
> -               word1 |= TSS_MTU_ENABLE_BIT;
> -               word3 |= mtu;
> -       }
> +       word3 = SOF_BIT | skb->len;

Probably word3 could be left with SOF_BIT ?
I am guessing the 'length' would only be used by the NIC if TSO is requested.

>
> -       if (skb->len >= ETH_FRAME_LEN) {
> -               /* Hardware offloaded checksumming isn't working on frames
> -                * bigger than 1514 bytes. A hypothesis about this is that the
> -                * checksum buffer is only 1518 bytes, so when the frames get
> -                * bigger they get truncated, or the last few bytes get
> -                * overwritten by the FCS.
> -                *
> -                * Just use software checksumming and bypass on bigger frames.
> -                */
> -               if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -                       ret = skb_checksum_help(skb);
> -                       if (ret)
> -                               return ret;
> -               }
> -               word1 |= TSS_BYPASS_BIT;
> -       } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +       if (skb->ip_summed == CHECKSUM_PARTIAL) {
>                 int tcp = 0;
>
>                 /* We do not switch off the checksumming on non TCP/UDP
>
> --
> 2.34.1
>

* Re: [PATCH net v5 1/2] net: ethernet: cortina: Drop software checksum and TSO
  2024-01-05 11:32   ` Vladimir Oltean
  2024-01-05 14:36     ` Eric Dumazet
@ 2024-01-05 23:35     ` Linus Walleij
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-01-05 23:35 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

On Fri, Jan 5, 2024 at 12:32 PM Vladimir Oltean <olteanv@gmail.com> wrote:

> So are you taking back the statement that "Hardware offloaded
> checksumming isn't working on frames bigger than 1514 bytes"?

Yes, the correct statement is that it isn't working in frames
bigger than 1514 bytes, if they have a custom DSA ethernet
tag.

The previous workaround has made the driver work fine
with the device that has a Realtek DSA switch with custom
ethertype, but it broke the driver for devices that have a
PHY connected directly to the ethernet block.

(I blame manual testing...)

> Have you increased the interface MTU beyond 1500, and tested with plain
> TCP (no DSA) on top of it? Who will provide the TCP checksum for them now?
>
> I don't understand why you remove the skb_checksum_help() call.
> It doesn't play nice with skb_is_gso() packets, agreed, but you removed
> the TSO netdev feature.

You're right, I was stuck there and larger MTU would not work.

Simply dropping the TSO and leaving the SW checksum in place
makes it all work nicely!

Thank you so much Vladimir for pointing this out!

Yours,
Linus Walleij

* Re: [PATCH net v5 2/2] net: ethernet: cortina: Bypass checksumming engine of alien ethertypes
  2024-01-04  0:53   ` Vladimir Oltean
@ 2024-01-06  0:17     ` Linus Walleij
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2024-01-06  0:17 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Household Cang, Romain Gantois, netdev

On Thu, Jan 4, 2024 at 1:53 AM Vladimir Oltean <olteanv@gmail.com> wrote:

> "Looking at the size of the frame is probably just wrong." yet you keep it.
>
> Not only is this confusing for you to say this, but I believe that the
> skb->len check is the _only_ thing that is needed. Explanation below.

You are right (as usual).

And the analysis you wrote makes perfect sense.

I dropped the entire patch, and am sending only 1/2 in v6.

> The one difference between DSA and VLAN is that for DSA, you sometimes
> set TSS_BYPASS_BIT (for large frames) and for VLAN you never do.
(...)
> Do you know what "TSS_BYPASS_BIT" does, exactly?

No.

The datasheet very annoyingly omits all details on the TSS
(the checksumming engine), and the documentation of the bits
in "word1" and "word3" only says it is a way to pass configuration
to the checksumming engine. I think it is a genuine oversight
by the document author actually.

Yours,
Linus Walleij
