linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/4] Fix large frames in the Gemini ethernet driver
@ 2023-11-05 20:57 Linus Walleij
  2023-11-05 20:57 ` [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting Linus Walleij
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Linus Walleij @ 2023-11-05 20:57 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Vladimir Oltean,
	Andrew Lunn
  Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij

This is the result of a bug hunt for a problem with the
RTL8366RB DSA switch leading me wrong all over the place.

I am indebted to Vladimir Oltean who as usual pointed
out where the real problem was, many thanks!

Tryig to actually use big ("jumbo") frames on this
hardware uncovered the real bugs. Then I tested it on
the DSA switch and it indeed fixes the issue.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Changes in v2:
- Don't check for oversized MTU request: the framework makes sure it doesn't
  happen.
- Drop unrelated BIT() macro cleanups (I might send these later for net-next)
- Use a special error code if the skbuff is too big and fail gracefully
  is this happens.
- Do proper checksum of the frame using a software fallback when the frame
  is too long for hardware checksumming.
- Link to v1: https://lore.kernel.org/r/20231104-gemini-largeframe-fix-v1-0-9c5513f22f33@linaro.org

---
Linus Walleij (4):
      net: ethernet: cortina: Fix MTU max setting
      net: ethernet: cortina: Fix max RX frame define
      net: ethernet: cortina: Protect against oversized frames
      net: ethernet: cortina: Handle large frames

 drivers/net/ethernet/cortina/gemini.c | 39 ++++++++++++++++++++++++++++-------
 drivers/net/ethernet/cortina/gemini.h |  4 ++--
 2 files changed, 34 insertions(+), 9 deletions(-)
---
base-commit: e85fd73c7d9630d392f451fcf69a457c8e3f21dd
change-id: 20231104-gemini-largeframe-fix-c143d2c781b5

Best regards,
-- 
Linus Walleij <linus.walleij@linaro.org>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting
  2023-11-05 20:57 [PATCH net v2 0/4] Fix large frames in the Gemini ethernet driver Linus Walleij
@ 2023-11-05 20:57 ` Linus Walleij
  2023-11-05 23:39   ` Andrew Lunn
  2023-11-05 20:57 ` [PATCH net v2 2/4] net: ethernet: cortina: Fix max RX frame define Linus Walleij
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Linus Walleij @ 2023-11-05 20:57 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Vladimir Oltean,
	Andrew Lunn
  Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij

The RX max frame size is over 10000 for the Gemini ethernet,
but the TX max frame size is actually just 2047 (0x7ff after
checking the datasheet). Reflect this in what we offer to Linux,
cap the MTU at the TX max frame minus ethernet headers.

Use the BIT() macro for related bit flags so these TX settings
are consistent.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 7 ++++---
 drivers/net/ethernet/cortina/gemini.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 5423fe26b4ef..ed9701f8ad9a 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -2464,11 +2464,12 @@ static int gemini_ethernet_port_probe(struct platform_device *pdev)
 
 	netdev->hw_features = GMAC_OFFLOAD_FEATURES;
 	netdev->features |= GMAC_OFFLOAD_FEATURES | NETIF_F_GRO;
-	/* We can handle jumbo frames up to 10236 bytes so, let's accept
-	 * payloads of 10236 bytes minus VLAN and ethernet header
+	/* We can receive jumbo frames up to 10236 bytes but only
+	 * transmit 2047 bytes so, let's accept payloads of 2047
+	 * bytes minus VLAN and ethernet header
 	 */
 	netdev->min_mtu = ETH_MIN_MTU;
-	netdev->max_mtu = 10236 - VLAN_ETH_HLEN;
+	netdev->max_mtu = MTU_SIZE_BIT_MASK - VLAN_ETH_HLEN;
 
 	port->freeq_refill = 0;
 	netif_napi_add(netdev, &port->napi, gmac_napi_poll);
diff --git a/drivers/net/ethernet/cortina/gemini.h b/drivers/net/ethernet/cortina/gemini.h
index 9fdf77d5eb37..201b4efe2937 100644
--- a/drivers/net/ethernet/cortina/gemini.h
+++ b/drivers/net/ethernet/cortina/gemini.h
@@ -502,7 +502,7 @@ union gmac_txdesc_3 {
 #define SOF_BIT			0x80000000
 #define EOF_BIT			0x40000000
 #define EOFIE_BIT		BIT(29)
-#define MTU_SIZE_BIT_MASK	0x1fff
+#define MTU_SIZE_BIT_MASK	0x7ff /* Max MTU 2047 bytes */
 
 /* GMAC Tx Descriptor */
 struct gmac_txdesc {

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net v2 2/4] net: ethernet: cortina: Fix max RX frame define
  2023-11-05 20:57 [PATCH net v2 0/4] Fix large frames in the Gemini ethernet driver Linus Walleij
  2023-11-05 20:57 ` [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting Linus Walleij
@ 2023-11-05 20:57 ` Linus Walleij
  2023-11-05 20:57 ` [PATCH net v2 3/4] net: ethernet: cortina: Protect against oversized frames Linus Walleij
  2023-11-05 20:57 ` [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames Linus Walleij
  3 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2023-11-05 20:57 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Vladimir Oltean,
	Andrew Lunn
  Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij

Enumerator 3 is 1548 bytes according to the datasheet.
Not 1542.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 4 ++--
 drivers/net/ethernet/cortina/gemini.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index ed9701f8ad9a..b21a94b4ab5c 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -432,8 +432,8 @@ static const struct gmac_max_framelen gmac_maxlens[] = {
 		.val = CONFIG0_MAXLEN_1536,
 	},
 	{
-		.max_l3_len = 1542,
-		.val = CONFIG0_MAXLEN_1542,
+		.max_l3_len = 1548,
+		.val = CONFIG0_MAXLEN_1548,
 	},
 	{
 		.max_l3_len = 9212,
diff --git a/drivers/net/ethernet/cortina/gemini.h b/drivers/net/ethernet/cortina/gemini.h
index 201b4efe2937..24bb989981f2 100644
--- a/drivers/net/ethernet/cortina/gemini.h
+++ b/drivers/net/ethernet/cortina/gemini.h
@@ -787,7 +787,7 @@ union gmac_config0 {
 #define  CONFIG0_MAXLEN_1536	0
 #define  CONFIG0_MAXLEN_1518	1
 #define  CONFIG0_MAXLEN_1522	2
-#define  CONFIG0_MAXLEN_1542	3
+#define  CONFIG0_MAXLEN_1548	3
 #define  CONFIG0_MAXLEN_9k	4	/* 9212 */
 #define  CONFIG0_MAXLEN_10k	5	/* 10236 */
 #define  CONFIG0_MAXLEN_1518__6	6

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net v2 3/4] net: ethernet: cortina: Protect against oversized frames
  2023-11-05 20:57 [PATCH net v2 0/4] Fix large frames in the Gemini ethernet driver Linus Walleij
  2023-11-05 20:57 ` [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting Linus Walleij
  2023-11-05 20:57 ` [PATCH net v2 2/4] net: ethernet: cortina: Fix max RX frame define Linus Walleij
@ 2023-11-05 20:57 ` Linus Walleij
  2023-11-06 12:40   ` Vladimir Oltean
  2023-11-05 20:57 ` [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames Linus Walleij
  3 siblings, 1 reply; 11+ messages in thread
From: Linus Walleij @ 2023-11-05 20:57 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Vladimir Oltean,
	Andrew Lunn
  Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij

The max size of a transfer no matter the MTU is 64KB-1 so immediately
bail out if the skb exceeds that.

The calling site tries to linearize the skbuff on error so return a
special error code -E2BIG to indicate that this will not work in
any way and bail out immediately if this happens.

Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index b21a94b4ab5c..576174a862a9 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1151,6 +1151,12 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	if (skb->protocol == htons(ETH_P_8021Q))
 		mtu += VLAN_HLEN;
 
+	if (skb->len > 65535) {
+		/* The field for length is only 16 bits */
+		netdev_err(netdev, "%s: frame too big, max size 65535 bytes\n", __func__);
+		return -E2BIG;
+	}
+
 	word1 = skb->len;
 	word3 = SOF_BIT;
 
@@ -1232,6 +1238,7 @@ static netdev_tx_t gmac_start_xmit(struct sk_buff *skb,
 	struct gmac_txq *txq;
 	int txq_num, nfrags;
 	union dma_rwptr rw;
+	int ret;
 
 	if (skb->len >= 0x10000)
 		goto out_drop_free;
@@ -1269,7 +1276,11 @@ static netdev_tx_t gmac_start_xmit(struct sk_buff *skb,
 		}
 	}
 
-	if (gmac_map_tx_bufs(netdev, skb, txq, &w)) {
+	ret = gmac_map_tx_bufs(netdev, skb, txq, &w);
+	if (ret == -E2BIG)
+		goto out_drop;
+	if (ret) {
+		/* Linearize and retry */
 		if (skb_linearize(skb))
 			goto out_drop;
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames
  2023-11-05 20:57 [PATCH net v2 0/4] Fix large frames in the Gemini ethernet driver Linus Walleij
                   ` (2 preceding siblings ...)
  2023-11-05 20:57 ` [PATCH net v2 3/4] net: ethernet: cortina: Protect against oversized frames Linus Walleij
@ 2023-11-05 20:57 ` Linus Walleij
  2023-11-06 13:26   ` Vladimir Oltean
  3 siblings, 1 reply; 11+ messages in thread
From: Linus Walleij @ 2023-11-05 20:57 UTC (permalink / raw)
  To: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Vladimir Oltean,
	Andrew Lunn
  Cc: linux-arm-kernel, netdev, linux-kernel, Linus Walleij

The Gemini ethernet controller provides hardware checksumming
for frames up to 1514 bytes including ethernet headers but not
FCS.

If we start sending bigger frames (after first bumping up the MTU
on both interfaces sending and receiveing the frames), truncated
packets start to appear on the target such as in this tcpdump
resulting from ping -s 1474:

23:34:17.241983 14:d6:4d:a8:3c:4f (oui Unknown) > bc:ae:c5:6b:a8:3d (oui Unknown),
ethertype IPv4 (0x0800), length 1514: truncated-ip - 2 bytes missing!
(tos 0x0, ttl 64, id 32653, offset 0, flags [DF], proto ICMP (1), length 1502)
OpenWrt.lan > Fecusia: ICMP echo request, id 1672, seq 50, length 1482

If we bypass the hardware checksumming and provide a software
fallback, everything starts working fine up to the max TX MTU
of 2047 bytes, for example ping -s2000 192.168.1.2:

00:44:29.587598 bc:ae:c5:6b:a8:3d (oui Unknown) > 14:d6:4d:a8:3c:4f (oui Unknown),
ethertype IPv4 (0x0800), length 2042:
(tos 0x0, ttl 64, id 51828, offset 0, flags [none], proto ICMP (1), length 2028)
Fecusia > OpenWrt.lan: ICMP echo reply, id 1683, seq 4, length 2008

The bit enabling to bypass hardware checksum (or any of the
"TSS" bits) are undocumented in the hardware reference manual.
The entire hardware checksum unit appears undocumented. The
conclusion that we need to use the "bypass" bit was found by
trial-and-error.

Since no hardware checksum will happen, we slot in a software
checksum fallback.

Check for the condition where we need to compute checksum on the
skb with either hardware or software using == CHECKSUM_PARTIAL instead
of != CHECKSUM_NONE which is an incomplete check according to
<linux/skbuff.h>.

On the D-Link DIR-685 router this fixes a bug on the conduit
interface to the RTL8366RB DSA switch: as the switch needs to add
space for its tag it increases the MTU on the conduit interface
to 1504 and that means that when the router sends packages
of 1500 bytes these get an extra 4 bytes of DSA tag and the
transfer fails because of the erroneous hardware checksumming,
affecting such basic functionality as the LuCI web interface.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/ethernet/cortina/gemini.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
index 576174a862a9..84295c1b87e6 100644
--- a/drivers/net/ethernet/cortina/gemini.c
+++ b/drivers/net/ethernet/cortina/gemini.c
@@ -1145,6 +1145,7 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 	dma_addr_t mapping;
 	unsigned short mtu;
 	void *buffer;
+	int ret;
 
 	mtu  = ETH_HLEN;
 	mtu += netdev->mtu;
@@ -1165,7 +1166,19 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
 		word3 |= mtu;
 	}
 
-	if (skb->ip_summed != CHECKSUM_NONE) {
+	if (skb->len >= ETH_FRAME_LEN) {
+		/* Hardware offloaded checksumming isn't working on frames
+		 * bigger than 1514 bytes. Perhaps the buffer is only 1518
+		 * bytes fitting a normal frame and a checksum?
+		 * Just use software checksumming and bypass on bigger frames.
+		 */
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			ret = skb_checksum_help(skb);
+			if (ret)
+				return ret;
+		}
+		word1 |= TSS_BYPASS_BIT;
+	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		int tcp = 0;
 
 		if (skb->protocol == htons(ETH_P_IP)) {

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting
  2023-11-05 20:57 ` [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting Linus Walleij
@ 2023-11-05 23:39   ` Andrew Lunn
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Lunn @ 2023-11-05 23:39 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Vladimir Oltean,
	linux-arm-kernel, netdev, linux-kernel

On Sun, Nov 05, 2023 at 09:57:23PM +0100, Linus Walleij wrote:
> The RX max frame size is over 10000 for the Gemini ethernet,
> but the TX max frame size is actually just 2047 (0x7ff after
> checking the datasheet). Reflect this in what we offer to Linux,
> cap the MTU at the TX max frame minus ethernet headers.
> 
> Use the BIT() macro for related bit flags so these TX settings
> are consistent.
> 
> Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2 3/4] net: ethernet: cortina: Protect against oversized frames
  2023-11-05 20:57 ` [PATCH net v2 3/4] net: ethernet: cortina: Protect against oversized frames Linus Walleij
@ 2023-11-06 12:40   ` Vladimir Oltean
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Oltean @ 2023-11-06 12:40 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Andrew Lunn,
	linux-arm-kernel, netdev, linux-kernel

On Sun, Nov 05, 2023 at 09:57:25PM +0100, Linus Walleij wrote:
> The max size of a transfer no matter the MTU is 64KB-1 so immediately
> bail out if the skb exceeds that.
> 
> The calling site tries to linearize the skbuff on error so return a
> special error code -E2BIG to indicate that this will not work in
> any way and bail out immediately if this happens.
> 
> Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index b21a94b4ab5c..576174a862a9 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -1151,6 +1151,12 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	if (skb->protocol == htons(ETH_P_8021Q))
>  		mtu += VLAN_HLEN;
>  
> +	if (skb->len > 65535) {
> +		/* The field for length is only 16 bits */
> +		netdev_err(netdev, "%s: frame too big, max size 65535 bytes\n", __func__);
> +		return -E2BIG;
> +	}
> +

Prints in the packet data path are extremely discouraged, since if they
trigger, they will spam your serial console and make it unusable.

I see that the out_drop label already bumps a counter. That should be
enough to signal there is a problem.

>  	word1 = skb->len;
>  	word3 = SOF_BIT;
>  
> @@ -1232,6 +1238,7 @@ static netdev_tx_t gmac_start_xmit(struct sk_buff *skb,
>  	struct gmac_txq *txq;
>  	int txq_num, nfrags;
>  	union dma_rwptr rw;
> +	int ret;
>  
>  	if (skb->len >= 0x10000)
>  		goto out_drop_free;

Since you already have this test, does the newly introduced one make
this redundant? Why not just change the limit here?

> @@ -1269,7 +1276,11 @@ static netdev_tx_t gmac_start_xmit(struct sk_buff *skb,
>  		}
>  	}
>  
> -	if (gmac_map_tx_bufs(netdev, skb, txq, &w)) {
> +	ret = gmac_map_tx_bufs(netdev, skb, txq, &w);
> +	if (ret == -E2BIG)
> +		goto out_drop;

Why out_drop and not out_drop_free? This handling will eventually cause an OOM.

The fact that it didn't makes me suspect that you never actually hit this condition,
because the network stack isn't delivering skbs larger than dev->mtu.
Maybe net/core/pktgen.c doesn't take the MTU into consideration, I'm not
completely sure there...

> +	if (ret) {
> +		/* Linearize and retry */
>  		if (skb_linearize(skb))
>  			goto out_drop;
>  
> 
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames
  2023-11-05 20:57 ` [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames Linus Walleij
@ 2023-11-06 13:26   ` Vladimir Oltean
  2023-11-06 13:43     ` Vladimir Oltean
  2023-11-06 22:44     ` Linus Walleij
  0 siblings, 2 replies; 11+ messages in thread
From: Vladimir Oltean @ 2023-11-06 13:26 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Andrew Lunn,
	linux-arm-kernel, netdev, linux-kernel

On Sun, Nov 05, 2023 at 09:57:26PM +0100, Linus Walleij wrote:
> The Gemini ethernet controller provides hardware checksumming
> for frames up to 1514 bytes including ethernet headers but not
> FCS.
> 
> If we start sending bigger frames (after first bumping up the MTU
> on both interfaces sending and receiveing the frames), truncated
> packets start to appear on the target such as in this tcpdump
> resulting from ping -s 1474:

A bit related: what is gmac_fix_features() supposed to do? I see it
unsets GMAC_OFFLOAD_FEATURES when the MTU goes over a certain limit,
and that also includes NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM. Is that
limit correct, or is it supposed to kick in sooner, to allow
validate_xmit_skb() -> skb_csum_hwoffload_help() do the software
checksuum for you? I'm not sure whether that was the intention.

> 
> 23:34:17.241983 14:d6:4d:a8:3c:4f (oui Unknown) > bc:ae:c5:6b:a8:3d (oui Unknown),
> ethertype IPv4 (0x0800), length 1514: truncated-ip - 2 bytes missing!
> (tos 0x0, ttl 64, id 32653, offset 0, flags [DF], proto ICMP (1), length 1502)
> OpenWrt.lan > Fecusia: ICMP echo request, id 1672, seq 50, length 1482
> 
> If we bypass the hardware checksumming and provide a software
> fallback, everything starts working fine up to the max TX MTU
> of 2047 bytes, for example ping -s2000 192.168.1.2:
> 
> 00:44:29.587598 bc:ae:c5:6b:a8:3d (oui Unknown) > 14:d6:4d:a8:3c:4f (oui Unknown),
> ethertype IPv4 (0x0800), length 2042:
> (tos 0x0, ttl 64, id 51828, offset 0, flags [none], proto ICMP (1), length 2028)
> Fecusia > OpenWrt.lan: ICMP echo reply, id 1683, seq 4, length 2008
> 
> The bit enabling to bypass hardware checksum (or any of the
> "TSS" bits) are undocumented in the hardware reference manual.
> The entire hardware checksum unit appears undocumented. The
> conclusion that we need to use the "bypass" bit was found by
> trial-and-error.
> 
> Since no hardware checksum will happen, we slot in a software
> checksum fallback.
> 
> Check for the condition where we need to compute checksum on the
> skb with either hardware or software using == CHECKSUM_PARTIAL instead
> of != CHECKSUM_NONE which is an incomplete check according to
> <linux/skbuff.h>.
> 
> On the D-Link DIR-685 router this fixes a bug on the conduit
> interface to the RTL8366RB DSA switch: as the switch needs to add
> space for its tag it increases the MTU on the conduit interface
> to 1504 and that means that when the router sends packages
> of 1500 bytes these get an extra 4 bytes of DSA tag and the
> transfer fails because of the erroneous hardware checksumming,
> affecting such basic functionality as the LuCI web interface.
> 
> Suggested-by: Vladimir Oltean <olteanv@gmail.com>

To be clear, I didn't suggest any of this. I just pointed towards the gemini.c
driver as being the problem. Please remove my Suggested-by tag.

> Fixes: 4d5ae32f5e1e ("net: ethernet: Add a driver for Gemini gigabit ethernet")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
>  drivers/net/ethernet/cortina/gemini.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cortina/gemini.c b/drivers/net/ethernet/cortina/gemini.c
> index 576174a862a9..84295c1b87e6 100644
> --- a/drivers/net/ethernet/cortina/gemini.c
> +++ b/drivers/net/ethernet/cortina/gemini.c
> @@ -1145,6 +1145,7 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  	dma_addr_t mapping;
>  	unsigned short mtu;
>  	void *buffer;
> +	int ret;
>  
>  	mtu  = ETH_HLEN;
>  	mtu += netdev->mtu;
> @@ -1165,7 +1166,19 @@ static int gmac_map_tx_bufs(struct net_device *netdev, struct sk_buff *skb,
>  		word3 |= mtu;
>  	}
>  
> -	if (skb->ip_summed != CHECKSUM_NONE) {
> +	if (skb->len >= ETH_FRAME_LEN) {
> +		/* Hardware offloaded checksumming isn't working on frames
> +		 * bigger than 1514 bytes. Perhaps the buffer is only 1518
> +		 * bytes fitting a normal frame and a checksum?
> +		 * Just use software checksumming and bypass on bigger frames.
> +		 */
> +		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +			ret = skb_checksum_help(skb);
> +			if (ret)
> +				return ret;
> +		}
> +		word1 |= TSS_BYPASS_BIT;
> +	} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
>  		int tcp = 0;
>  
>  		if (skb->protocol == htons(ETH_P_IP)) {
> 
> -- 
> 2.34.1
> 

[ context: tag_rtl4_a.c is a "category 2", aka "Ethertype", tagger ]

We say this in Documentation/networking/dsa/dsa.rst:

Checksum offload should work with category 1 and 2 taggers when the DSA conduit
driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and
csum_offset. For those cases, DSA will shift the checksum start and offset by
the tag size. If the DSA conduit driver still uses the legacy NETIF_F_IP_CSUM
or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
offload hardware already expects that specific tag (perhaps due to matching
vendors). DSA user ports inherit those flags from the conduit, and it is up to
the driver to correctly fall back to software checksum when the IP header is not
where the hardware expects. If that check is ineffective, the packets might go
to the network without a proper checksum (the checksum field will have the
pseudo IP header sum).

Shouldn't "word1 |= TSS_BYPASS_BIT;" be done depending on skb->protocol,
rather than depending on skb->len?!

		if (skb->protocol == htons(ETH_P_IP)) {
			word1 |= TSS_IP_CHKSUM_BIT;
			tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
		} else { /* IPv6 */
			word1 |= TSS_IPV6_ENABLE_BIT;
			tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
		} // here
			word1 |= TSS_BYPASS_BIT;

Gemini should never attempt to provide checksums for DSA-tagged packets
unless it is able to take skb->csum_start into consideration, otherwise
it will get it wrong.

This is somewhat independent of the other problem you've found, which
seems to be that large non-DSA packets get truncated anyway. But not
bypassing TX checksum offload truncates a packet? Hmm, strange.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames
  2023-11-06 13:26   ` Vladimir Oltean
@ 2023-11-06 13:43     ` Vladimir Oltean
  2023-11-06 22:44     ` Linus Walleij
  1 sibling, 0 replies; 11+ messages in thread
From: Vladimir Oltean @ 2023-11-06 13:43 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Andrew Lunn,
	linux-arm-kernel, netdev, linux-kernel

On Mon, Nov 06, 2023 at 03:26:26PM +0200, Vladimir Oltean wrote:
> Gemini should never attempt to provide checksums for DSA-tagged packets
> unless it is able to take skb->csum_start into consideration, otherwise
> it will get it wrong.

Additionally, since Gemini does not put NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM
in vlan_features, DSA won't inherit (thus won't have) them. So,
validate_xmit_skb() should perform the skb checksum during the xmit on
the user port, which is earlier compared to the xmit on the conduit.
So, I guess skb->ip_summed should already be CHECKSUM_NONE here?
I think the only problem for DSA is the lack of the TSS_BYPASS_BIT. The
rest is unrelated.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames
  2023-11-06 13:26   ` Vladimir Oltean
  2023-11-06 13:43     ` Vladimir Oltean
@ 2023-11-06 22:44     ` Linus Walleij
  2023-11-08 15:18       ` Vladimir Oltean
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Walleij @ 2023-11-06 22:44 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Andrew Lunn,
	linux-arm-kernel, netdev, linux-kernel

On Mon, Nov 6, 2023 at 2:26 PM Vladimir Oltean <olteanv@gmail.com> wrote:
> On Sun, Nov 05, 2023 at 09:57:26PM +0100, Linus Walleij wrote:

> > If we start sending bigger frames (after first bumping up the MTU
> > on both interfaces sending and receiveing the frames), truncated
> > packets start to appear on the target such as in this tcpdump
> > resulting from ping -s 1474:
>
> A bit related: what is gmac_fix_features() supposed to do? I see it
> unsets GMAC_OFFLOAD_FEATURES when the MTU goes over a certain limit,
> and that also includes NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM. Is that
> limit correct, or is it supposed to kick in sooner, to allow
> validate_xmit_skb() -> skb_csum_hwoffload_help() do the software
> checksuum for you? I'm not sure whether that was the intention.

That indeed seems like the intention. But it's a bit suboptimal, because it
disables hardware checksum just because the MTU goes over a
certain level, and stops using the hardware checksum also for all
packets smaller than the MTU :(

I'll delete this and make the driver slot in the SW fallback per-packet
instead, I think it is fair to assume that most packets will be < MTU
and it is really just a question of where the fallback gets called.

> > Suggested-by: Vladimir Oltean <olteanv@gmail.com>
>
> To be clear, I didn't suggest any of this. I just pointed towards the gemini.c
> driver as being the problem. Please remove my Suggested-by tag.

OK sorry, it was just my way of trying to provide credit where
credit is due, because you helped so much with this bug.

> > -     if (skb->ip_summed != CHECKSUM_NONE) {
> > +     if (skb->len >= ETH_FRAME_LEN) {
> > +             /* Hardware offloaded checksumming isn't working on frames
> > +              * bigger than 1514 bytes. Perhaps the buffer is only 1518
> > +              * bytes fitting a normal frame and a checksum?
> > +              * Just use software checksumming and bypass on bigger frames.
> > +              */
> > +             if (skb->ip_summed == CHECKSUM_PARTIAL) {
> > +                     ret = skb_checksum_help(skb);
> > +                     if (ret)
> > +                             return ret;
> > +             }
> > +             word1 |= TSS_BYPASS_BIT;
> > +     } else if (skb->ip_summed == CHECKSUM_PARTIAL) {
> >               int tcp = 0;
> >
> >               if (skb->protocol == htons(ETH_P_IP)) {
>
> [ context: tag_rtl4_a.c is a "category 2", aka "Ethertype", tagger ]
>
> We say this in Documentation/networking/dsa/dsa.rst:
>
> Checksum offload should work with category 1 and 2 taggers when the DSA conduit
> driver declares NETIF_F_HW_CSUM in vlan_features
> and looks at csum_start and csum_offset.
> For those cases, DSA will shift the checksum start and offset by
> the tag size. If the DSA conduit driver still uses the legacy NETIF_F_IP_CSUM
> or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
> offload hardware already expects that specific tag (perhaps due to matching
> vendors).

Since things work smoothly I can only assume that the Gemini
checksum engine actually knows about the Realtek ethertype (0x8899)
and the protocol (0xa) and takes action on that, since the switch works.

OR: it has some heuristic on for how to handle it. (Such as looking for
a valid TCP or UDP header to figure out where to put the checksum.)

But I have no idea how it does it. It doesn't have a firmware AFAIK.

Examples listed were ICMP so just IP checksums but I tried for example
SSH, and HTTP and packets look like this:

22:51:35.457191 9a:ec:30:5a:46:96 (oui Unknown) > bc:ae:c5:6b:a8:3d
(oui Unknown),
ethertype IPv4 (0x0800), length 434: (tos 0x48, ttl 64, id 8221,
offset 0, flags [DF], proto TCP (6), length 420)
    _gateway.48102 > fedora.ssh: Flags [P.], cksum 0xcf1b (correct),
seq 811:1179, ack 2310,
   win 2054, options [nop,nop,TS val 74858741 ecr 1981407207], length 368

Checksum correct. So...

> DSA user ports inherit those flags from the conduit, and it is up to
> the driver to correctly fall back to software checksum when the IP header is not
> where the hardware expects. If that check is ineffective, the packets might go
> to the network without a proper checksum (the checksum field will have the
> pseudo IP header sum).

It definately does not contain the pseudo IP header sum because
it would be the same all the time but tcpdump is happy:
cksum 0xcf1b (correct)
cksum 0x0655 (correct)
cksum 0xd247 (correct)
cksum 0x06b1 (correct)

> Shouldn't "word1 |= TSS_BYPASS_BIT;" be done depending on skb->protocol,
> rather than depending on skb->len?!
>
>                 if (skb->protocol == htons(ETH_P_IP)) {
>                         word1 |= TSS_IP_CHKSUM_BIT;
>                         tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
>                 } else { /* IPv6 */
>                         word1 |= TSS_IPV6_ENABLE_BIT;
>                         tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
>                 } // here
>                         word1 |= TSS_BYPASS_BIT;

Oddly it assumes everything is either TCP or UDP on
IPv4 or IPv6. And yet things such as ICMP work just fine.

I think the checksum engine can contain some various
heuristics, such as if it cannot recognize what is coming
in as the selected TCP or UDP, it will pass right through.

> Gemini should never attempt to provide checksums for DSA-tagged packets
> unless it is able to take skb->csum_start into consideration, otherwise
> it will get it wrong.
>
> This is somewhat independent of the other problem you've found, which
> seems to be that large non-DSA packets get truncated anyway. But not
> bypassing TX checksum offload truncates a packet? Hmm, strange.

I have a theory about that in the comment, which is that when they
engineered the hardware they only put in a hardware buffer for
1518 bytes in the checksum engine. If you try to put in any more
it gets truncated. It's a reasonable guess.

If you do not set the checkumming engine to "bypass" it will try
to fit the incoming paket into the checksumming buffer, and then
it will look to see if it can find the right TCP or UDP headers.
If it can't it will pass the packet out from the buffer without doing
any changes. But the buffer is just 1518 bytes, which means that
no matter what kind of package it is, it will get truncated if it
does not fit into the checksumming buffer.

This would give exactly the behaviour we're seeing.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames
  2023-11-06 22:44     ` Linus Walleij
@ 2023-11-08 15:18       ` Vladimir Oltean
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Oltean @ 2023-11-08 15:18 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Hans Ulli Kroll, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Michał Mirosław, Andrew Lunn,
	linux-arm-kernel, netdev, linux-kernel

On Mon, Nov 06, 2023 at 11:44:37PM +0100, Linus Walleij wrote:
> That indeed seems like the intention. But it's a bit suboptimal, because it
> disables hardware checksum just because the MTU goes over a
> certain level, and stops using the hardware checksum also for all
> packets smaller than the MTU :(
> 
> I'll delete this and make the driver slot in the SW fallback per-packet
> instead, I think it is fair to assume that most packets will be < MTU
> and it is really just a question of where the fallback gets called.

Performing the software checksum per packet instead of doing it for all
packets seems more sensible. I haven't looked at the other offload bits
from GMAC_OFFLOAD_FEATURES to determine if changing those is going to be
problematic.

> > > Suggested-by: Vladimir Oltean <olteanv@gmail.com>
> >
> > To be clear, I didn't suggest any of this. I just pointed towards the gemini.c
> > driver as being the problem. Please remove my Suggested-by tag.
> 
> OK sorry, it was just my way of trying to provide credit where
> credit is due, because you helped so much with this bug.

Ok, but asserting the validity of the DSA code and vaguely pointing
towards an unrelated driver and not even specifying what might be wrong
with its xmit procedure still does not count as worth a Suggested-by tag
to me. So still, please remove it.

> > We say this in Documentation/networking/dsa/dsa.rst:
> >
> > Checksum offload should work with category 1 and 2 taggers when the DSA conduit
> > driver declares NETIF_F_HW_CSUM in vlan_features
> > and looks at csum_start and csum_offset.
> > For those cases, DSA will shift the checksum start and offset by
> > the tag size. If the DSA conduit driver still uses the legacy NETIF_F_IP_CSUM
> > or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the
> > offload hardware already expects that specific tag (perhaps due to matching
> > vendors).
> 
> Since things work smoothly I can only assume that the Gemini
> checksum engine actually knows about the Realtek ethertype (0x8899)
> and the protocol (0xa) and takes action on that, since the switch works.
> 
> OR: it has some heuristic on for how to handle it. (Such as looking for
> a valid TCP or UDP header to figure out where to put the checksum.)

The second option seems more plausible to me. It seems unlikely that a
non-Realtek controller would have information about a Realtek
proprietary protocol, that even Linux people have trouble finding a lot
of information about, when explicitly trying to do so.

> But I have no idea how it does it. It doesn't have a firmware AFAIK.
> 
> Examples listed were ICMP so just IP checksums but I tried for example
> SSH, and HTTP and packets look like this:
> 
> 22:51:35.457191 9a:ec:30:5a:46:96 (oui Unknown) > bc:ae:c5:6b:a8:3d
> (oui Unknown),
> ethertype IPv4 (0x0800), length 434: (tos 0x48, ttl 64, id 8221,
> offset 0, flags [DF], proto TCP (6), length 420)
>     _gateway.48102 > fedora.ssh: Flags [P.], cksum 0xcf1b (correct),
> seq 811:1179, ack 2310,
>    win 2054, options [nop,nop,TS val 74858741 ecr 1981407207], length 368
> 
> Checksum correct. So...

Well, the checksum is correct because it's done in software by the
network stack, prior to dsa_user_xmit() being even called, and thus
prior to the DSA tag being added. "ethtool -k lan0 | grep sum" will tell
you as much (compare it to "ethtool -k eth0" which will list some
offload bits set).

> > Shouldn't "word1 |= TSS_BYPASS_BIT;" be done depending on skb->protocol,
> > rather than depending on skb->len?!
> >
> >                 if (skb->protocol == htons(ETH_P_IP)) {
> >                         word1 |= TSS_IP_CHKSUM_BIT;
> >                         tcp = ip_hdr(skb)->protocol == IPPROTO_TCP;
> >                 } else { /* IPv6 */
> >                         word1 |= TSS_IPV6_ENABLE_BIT;
> >                         tcp = ipv6_hdr(skb)->nexthdr == IPPROTO_TCP;
> >                 } // here
> >                         word1 |= TSS_BYPASS_BIT;
> 
> Oddly it assumes everything is either TCP or UDP on
> IPv4 or IPv6. And yet things such as ICMP work just fine.
> 
> I think the checksum engine can contain some various
> heuristics, such as if it cannot recognize what is coming
> in as the selected TCP or UDP, it will pass right through.
> 
> > Gemini should never attempt to provide checksums for DSA-tagged packets
> > unless it is able to take skb->csum_start into consideration, otherwise
> > it will get it wrong.
> >
> > This is somewhat independent of the other problem you've found, which
> > seems to be that large non-DSA packets get truncated anyway. But not
> > bypassing TX checksum offload truncates a packet? Hmm, strange.
> 
> I have a theory about that in the comment, which is that when they
> engineered the hardware they only put in a hardware buffer for
> 1518 bytes in the checksum engine. If you try to put in any more
> it gets truncated. It's a reasonable guess.
> 
> If you do not set the checkumming engine to "bypass" it will try
> to fit the incoming paket into the checksumming buffer, and then
> it will look to see if it can find the right TCP or UDP headers.
> If it can't it will pass the packet out from the buffer without doing
> any changes. But the buffer is just 1518 bytes, which means that
> no matter what kind of package it is, it will get truncated if it
> does not fit into the checksumming buffer.
> 
> This would give exactly the behaviour we're seeing.

The evidence you've presented seems to support your position and not
mine. The controller does not attempt to provide checksums for
DSA-tagged packets, that is not a problem because those have checksums
already, but the packet size is the actual problem, irrespective of
whether the packets are DSA-tagged or not.

But in that case I have some comments on your latest version of the
patch set.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-11-08 15:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-05 20:57 [PATCH net v2 0/4] Fix large frames in the Gemini ethernet driver Linus Walleij
2023-11-05 20:57 ` [PATCH net v2 1/4] net: ethernet: cortina: Fix MTU max setting Linus Walleij
2023-11-05 23:39   ` Andrew Lunn
2023-11-05 20:57 ` [PATCH net v2 2/4] net: ethernet: cortina: Fix max RX frame define Linus Walleij
2023-11-05 20:57 ` [PATCH net v2 3/4] net: ethernet: cortina: Protect against oversized frames Linus Walleij
2023-11-06 12:40   ` Vladimir Oltean
2023-11-05 20:57 ` [PATCH net v2 4/4] net: ethernet: cortina: Handle large frames Linus Walleij
2023-11-06 13:26   ` Vladimir Oltean
2023-11-06 13:43     ` Vladimir Oltean
2023-11-06 22:44     ` Linus Walleij
2023-11-08 15:18       ` Vladimir Oltean

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).