All of lore.kernel.org
 help / color / mirror / Atom feed
* [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
@ 2018-12-04  6:14 Cong Wang
  2018-12-04  6:34 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Cong Wang @ 2018-12-04  6:14 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Saeed Mahameed, Eric Dumazet, Tariq Toukan

When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are ususally zero's, which don't affect
checksum. However, we have a switch which pads non-zero octets, this
causes kernel hardware checksum fault repeatedly.

Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
skb checksum was forced to be CHECKSUM_NONE when padding is detected.
After it, we need to keep skb->csum updated, like what we do for RXFCS.
However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
headers, it is not worthy the effort as the packets are so small that
CHECKSUM_COMPLETE can't save anything.

I tested this patch with RXFCS on and off, it works fine without any
warning in both cases.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 624eed345b5d..1c153b8091da 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int network_depth, __be16 proto)
 					    ((struct ipv6hdr *)ip_p)->nexthdr;
 }
 
+static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
+{
+	u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
+
+	return frame_len <= ETH_ZLEN;
+}
+
 static inline void mlx5e_handle_csum(struct net_device *netdev,
 				     struct mlx5_cqe64 *cqe,
 				     struct mlx5e_rq *rq,
@@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 		goto csum_unnecessary;
 
 	if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
+		bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
+
 		if (unlikely(get_ip_proto(skb, network_depth, proto) == IPPROTO_SCTP))
 			goto csum_unnecessary;
 
+		/* CQE csum doesn't cover padding octets in short ethernet
+		 * frames. And the pad field is appended prior to calculating
+		 * and appending the FCS field.
+		 *
+		 * Detecting these padded frames requires to verify and parse
+		 * IP headers, so we simply force all those small frames to be
+		 * CHECKSUM_NONE even if they are not padded.
+		 */
+		if (unlikely(is_short_frame(skb, has_fcs)))
+			goto csum_none;
+
 		skb->ip_summed = CHECKSUM_COMPLETE;
 		skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
 		if (network_depth > ETH_HLEN)
@@ -768,7 +788,7 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
 			skb->csum = csum_partial(skb->data + ETH_HLEN,
 						 network_depth - ETH_HLEN,
 						 skb->csum);
-		if (unlikely(netdev->features & NETIF_F_RXFCS))
+		if (unlikely(has_fcs))
 			skb->csum = csum_block_add(skb->csum,
 						   (__force __wsum)mlx5e_get_fcs(skb),
 						   skb->len - ETH_FCS_LEN);
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  6:14 [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames Cong Wang
@ 2018-12-04  6:34 ` Eric Dumazet
  2018-12-04  6:48   ` Cong Wang
  2018-12-04 19:02 ` Saeed Mahameed
  2018-12-04 20:44 ` Cong Wang
  2 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2018-12-04  6:34 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Saeed Mahameed, Tariq Toukan

On Mon, Dec 3, 2018 at 10:14 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are ususally zero's, which don't affect
> checksum. However, we have a switch which pads non-zero octets, this
> causes kernel hardware checksum fault repeatedly.
>
> Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> skb checksum was forced to be CHECKSUM_NONE when padding is detected.
> After it, we need to keep skb->csum updated, like what we do for RXFCS.
> However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
> headers, it is not worthy the effort as the packets are so small that
> CHECKSUM_COMPLETE can't save anything.
>
> I tested this patch with RXFCS on and off, it works fine without any
> warning in both cases.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 624eed345b5d..1c153b8091da 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int network_depth, __be16 proto)
>                                             ((struct ipv6hdr *)ip_p)->nexthdr;
>  }
>
> +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
> +{
> +       u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
> +
> +       return frame_len <= ETH_ZLEN;
> +}
> +
>  static inline void mlx5e_handle_csum(struct net_device *netdev,
>                                      struct mlx5_cqe64 *cqe,
>                                      struct mlx5e_rq *rq,
> @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
>                 goto csum_unnecessary;
>
>         if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
> +               bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
> +
>                 if (unlikely(get_ip_proto(skb, network_depth, proto) == IPPROTO_SCTP))
>                         goto csum_unnecessary;
>
> +               /* CQE csum doesn't cover padding octets in short ethernet
> +                * frames. And the pad field is appended prior to calculating
> +                * and appending the FCS field.
> +                *
> +                * Detecting these padded frames requires to verify and parse
> +                * IP headers, so we simply force all those small frames to be
> +                * CHECKSUM_NONE even if they are not padded.
> +                */
> +               if (unlikely(is_short_frame(skb, has_fcs)))
> +                       goto csum_none;

Should not this go to csum_unnecessary instead ?

Probably not a big deal, but small UDP frames might hit this code path,
so ethtool -S would show a lot of csum_none which could confuse mlx5 owners.

BTW,
It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead of
CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY
would be slightly faster, by avoiding  various csum_partial() costs
when headers are parsed.


> +
>                 skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
>                 if (network_depth > ETH_HLEN)
> @@ -768,7 +788,7 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
>                         skb->csum = csum_partial(skb->data + ETH_HLEN,
>                                                  network_depth - ETH_HLEN,
>                                                  skb->csum);
> -               if (unlikely(netdev->features & NETIF_F_RXFCS))
> +               if (unlikely(has_fcs))
>                         skb->csum = csum_block_add(skb->csum,
>                                                    (__force __wsum)mlx5e_get_fcs(skb),
>                                                    skb->len - ETH_FCS_LEN);
> --
> 2.19.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  6:34 ` Eric Dumazet
@ 2018-12-04  6:48   ` Cong Wang
       [not found]     ` <CANn89iK0j=2LYK=szVO+Fpg1-tX=wSz+ghZx8RnwZSEbxZjf5w@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2018-12-04  6:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Saeed Mahameed, Tariq Toukan

On Mon, Dec 3, 2018 at 10:34 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Dec 3, 2018 at 10:14 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > When an ethernet frame is padded to meet the minimum ethernet frame
> > size, the padding octets are not covered by the hardware checksum.
> > Fortunately the padding octets are ususally zero's, which don't affect
> > checksum. However, we have a switch which pads non-zero octets, this
> > causes kernel hardware checksum fault repeatedly.
> >
> > Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> > skb checksum was forced to be CHECKSUM_NONE when padding is detected.
> > After it, we need to keep skb->csum updated, like what we do for RXFCS.
> > However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
> > headers, it is not worthy the effort as the packets are so small that
> > CHECKSUM_COMPLETE can't save anything.
> >
> > I tested this patch with RXFCS on and off, it works fine without any
> > warning in both cases.
> >
> > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> > Cc: Saeed Mahameed <saeedm@mellanox.com>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Tariq Toukan <tariqt@mellanox.com>
> > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> > ---
> >  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++++++++++++++++-
> >  1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index 624eed345b5d..1c153b8091da 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int network_depth, __be16 proto)
> >                                             ((struct ipv6hdr *)ip_p)->nexthdr;
> >  }
> >
> > +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
> > +{
> > +       u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
> > +
> > +       return frame_len <= ETH_ZLEN;
> > +}
> > +
> >  static inline void mlx5e_handle_csum(struct net_device *netdev,
> >                                      struct mlx5_cqe64 *cqe,
> >                                      struct mlx5e_rq *rq,
> > @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
> >                 goto csum_unnecessary;
> >
> >         if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
> > +               bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
> > +
> >                 if (unlikely(get_ip_proto(skb, network_depth, proto) == IPPROTO_SCTP))
> >                         goto csum_unnecessary;
> >
> > +               /* CQE csum doesn't cover padding octets in short ethernet
> > +                * frames. And the pad field is appended prior to calculating
> > +                * and appending the FCS field.
> > +                *
> > +                * Detecting these padded frames requires to verify and parse
> > +                * IP headers, so we simply force all those small frames to be
> > +                * CHECKSUM_NONE even if they are not padded.
> > +                */
> > +               if (unlikely(is_short_frame(skb, has_fcs)))
> > +                       goto csum_none;
>
> Should not this go to csum_unnecessary instead ?

I don't see why we don't even want to validate the protocol checksum
here.

Any reason you are suggesting so?


>
> Probably not a big deal, but small UDP frames might hit this code path,
> so ethtool -S would show a lot of csum_none which could confuse mlx5 owners.

Why it is confusing? We intentionally bypass hardware checksum
and let protocol layer validate it.


>
> BTW,
> It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead of
> CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY
> would be slightly faster, by avoiding  various csum_partial() costs
> when headers are parsed.

Sure, it is certainly faster if you don't want to validate L4 checksum.
The only question is why we don't either validate hardware checksum
or L4 checksum?

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
       [not found]     ` <CANn89iK0j=2LYK=szVO+Fpg1-tX=wSz+ghZx8RnwZSEbxZjf5w@mail.gmail.com>
@ 2018-12-04  7:09       ` Eric Dumazet
  2018-12-04  7:29       ` Cong Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-12-04  7:09 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Saeed Mahameed, Tariq Toukan

Resent to netdev@ without htm formatting, sorry.

On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <edumazet@google.com> wrote:
>
>
>
> On Mon, Dec 3, 2018 at 10:48 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> On Mon, Dec 3, 2018 at 10:34 PM Eric Dumazet <edumazet@google.com> wrote:
>> >
>> > On Mon, Dec 3, 2018 at 10:14 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > >
>> > > When an ethernet frame is padded to meet the minimum ethernet frame
>> > > size, the padding octets are not covered by the hardware checksum.
>> > > Fortunately the padding octets are ususally zero's, which don't affect
>> > > checksum. However, we have a switch which pads non-zero octets, this
>> > > causes kernel hardware checksum fault repeatedly.
>> > >
>> > > Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
>> > > skb checksum was forced to be CHECKSUM_NONE when padding is detected.
>> > > After it, we need to keep skb->csum updated, like what we do for RXFCS.
>> > > However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
>> > > headers, it is not worthy the effort as the packets are so small that
>> > > CHECKSUM_COMPLETE can't save anything.
>> > >
>> > > I tested this patch with RXFCS on and off, it works fine without any
>> > > warning in both cases.
>> > >
>> > > Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
>> > > Cc: Saeed Mahameed <saeedm@mellanox.com>
>> > > Cc: Eric Dumazet <edumazet@google.com>
>> > > Cc: Tariq Toukan <tariqt@mellanox.com>
>> > > Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
>> > > ---
>> > >  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++++++++++++++++-
>> > >  1 file changed, 21 insertions(+), 1 deletion(-)
>> > >
>> > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > > index 624eed345b5d..1c153b8091da 100644
>> > > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> > > @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int network_depth, __be16 proto)
>> > >                                             ((struct ipv6hdr *)ip_p)->nexthdr;
>> > >  }
>> > >
>> > > +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
>> > > +{
>> > > +       u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
>> > > +
>> > > +       return frame_len <= ETH_ZLEN;
>> > > +}
>> > > +
>> > >  static inline void mlx5e_handle_csum(struct net_device *netdev,
>> > >                                      struct mlx5_cqe64 *cqe,
>> > >                                      struct mlx5e_rq *rq,
>> > > @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
>> > >                 goto csum_unnecessary;
>> > >
>> > >         if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
>> > > +               bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
>> > > +
>> > >                 if (unlikely(get_ip_proto(skb, network_depth, proto) == IPPROTO_SCTP))
>> > >                         goto csum_unnecessary;
>> > >
>> > > +               /* CQE csum doesn't cover padding octets in short ethernet
>> > > +                * frames. And the pad field is appended prior to calculating
>> > > +                * and appending the FCS field.
>> > > +                *
>> > > +                * Detecting these padded frames requires to verify and parse
>> > > +                * IP headers, so we simply force all those small frames to be
>> > > +                * CHECKSUM_NONE even if they are not padded.
>> > > +                */
>> > > +               if (unlikely(is_short_frame(skb, has_fcs)))
>> > > +                       goto csum_none;
>> >
>> > Should not this go to csum_unnecessary instead ?
>>
>> I don't see why we don't even want to validate the protocol checksum
>> here.
>>
>> Any reason you are suggesting so?
>>
>>
>> >
>> > Probably not a big deal, but small UDP frames might hit this code path,
>> > so ethtool -S would show a lot of csum_none which could confuse mlx5 owners.
>>
>> Why it is confusing? We intentionally bypass hardware checksum
>> and let protocol layer validate it.
>
>
> The hardware has probably validated the L3 & L4 checksum just fine.
>
> Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> have no impact on the csum that has been verified by the NIC.
>
>
>>
>>
>> >
>> > BTW,
>> > It looks like mlx5 prefers delivering skbs with CHECKSUM_COMPLETE instead of
>> > CHECKSUM_UNNECESSARY but at least for ipv6 CHECKSUM_UNNECESSARY
>> > would be slightly faster, by avoiding  various csum_partial() costs
>> > when headers are parsed.
>>
>> Sure, it is certainly faster if you don't want to validate L4 checksum.
>
>
> Sorry I do not get your point.
>
>>
>> The only question is why we don't either validate hardware checksum
>> or L4 checksum?
>>
>
> CHECKSUM_UNNECESSARY means the NIC has fully  validated the checksums,
> including L4 one.
>
> For example, mlx4 drivers first checks if CHECKSUM_UNNECESSARY status can be given,
> and only fallback on CHECKSUM_COMPLETE for the rare cases this CHECKSUM_UNNECESSARY
> status was not given by the NIC.
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
       [not found]     ` <CANn89iK0j=2LYK=szVO+Fpg1-tX=wSz+ghZx8RnwZSEbxZjf5w@mail.gmail.com>
  2018-12-04  7:09       ` Eric Dumazet
@ 2018-12-04  7:29       ` Cong Wang
  2018-12-04  7:51         ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Cong Wang @ 2018-12-04  7:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Saeed Mahameed, Tariq Toukan

On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <edumazet@google.com> wrote:
>
> The hardware has probably validated the L3 & L4 checksum just fine.
>
> Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> have no impact on the csum that has been verified by the NIC.


Why? Why does the hardware validates L3/L4 checksum when it
supplies a full-packet checksum? What's its point here?

If it really validates L3/L4 checksum, then a full-packet checksum
is not needed.

If a full-packet checksum is supplied, the software is able to use
it to validate L3/L4 checksum, then the hardware doesn't need to
validate it.

I see no reason it provides both at the same time. If it really does,
then all CHECKSUM_COMPLETE code here could be just removed
and would be faster.

Something must be wrong with your argument.

>
> Sorry I do not get your point.


I don't get your point either.


Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  7:29       ` Cong Wang
@ 2018-12-04  7:51         ` Eric Dumazet
  2018-12-04 19:17           ` Saeed Mahameed
  2018-12-04 20:31           ` Cong Wang
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2018-12-04  7:51 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, Saeed Mahameed, Tariq Toukan

On Mon, Dec 3, 2018 at 11:30 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > The hardware has probably validated the L3 & L4 checksum just fine.
> >
> > Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> > have no impact on the csum that has been verified by the NIC.
>
>
> Why? Why does the hardware validates L3/L4 checksum when it
> supplies a full-packet checksum? What's its point here?

The point is that the driver author can decide what is best.

For native IP+TCP or IP+UDP, the NIC has the ability to fully
understand the packet and fully validate the checksum.

>
> If it really validates L3/L4 checksum, then a full-packet checksum
> is not needed.

Yes, this is exactly what CHECKSUM_UNNECESSARY means.
linux stack does not have to perform the check another time.

For example, no call to csum_partial() is needed, even for IPv6+TCP or IPv6+UDP

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  6:14 [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames Cong Wang
  2018-12-04  6:34 ` Eric Dumazet
@ 2018-12-04 19:02 ` Saeed Mahameed
  2018-12-04 20:44 ` Cong Wang
  2 siblings, 0 replies; 15+ messages in thread
From: Saeed Mahameed @ 2018-12-04 19:02 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Netdev List, Saeed Mahameed, Eric Dumazet, Tariq Toukan

On Mon, Dec 3, 2018 at 10:14 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are ususally zero's, which don't affect
> checksum. However, we have a switch which pads non-zero octets, this
> causes kernel hardware checksum fault repeatedly.
>
> Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> skb checksum was forced to be CHECKSUM_NONE when padding is detected.
> After it, we need to keep skb->csum updated, like what we do for RXFCS.
> However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
> headers, it is not worthy the effort as the packets are so small that
> CHECKSUM_COMPLETE can't save anything.
>
> I tested this patch with RXFCS on and off, it works fine without any
> warning in both cases.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 22 ++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index 624eed345b5d..1c153b8091da 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -732,6 +732,13 @@ static u8 get_ip_proto(struct sk_buff *skb, int network_depth, __be16 proto)
>                                             ((struct ipv6hdr *)ip_p)->nexthdr;
>  }
>
> +static bool is_short_frame(struct sk_buff *skb, bool has_fcs)
> +{
> +       u32 frame_len = has_fcs ? skb->len - ETH_FCS_LEN : skb->len;
> +
> +       return frame_len <= ETH_ZLEN;
> +}
> +

Do we really need to handle FCS here ?
maybe increase the small packet size to always assume FCS is there:

return skb->len <= ETH_ZLEN + ETH_FCS_LEN;
to avoid conditional statements at Data path.

>  static inline void mlx5e_handle_csum(struct net_device *netdev,
>                                      struct mlx5_cqe64 *cqe,
>                                      struct mlx5e_rq *rq,
> @@ -755,9 +762,22 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
>                 goto csum_unnecessary;
>
>         if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
> +               bool has_fcs = !!(netdev->features & NETIF_F_RXFCS);
> +
>                 if (unlikely(get_ip_proto(skb, network_depth, proto) == IPPROTO_SCTP))
>                         goto csum_unnecessary;
>
> +               /* CQE csum doesn't cover padding octets in short ethernet
> +                * frames. And the pad field is appended prior to calculating
> +                * and appending the FCS field.
> +                *
> +                * Detecting these padded frames requires to verify and parse
> +                * IP headers, so we simply force all those small frames to be
> +                * CHECKSUM_NONE even if they are not padded.
> +                */
> +               if (unlikely(is_short_frame(skb, has_fcs)))
> +                       goto csum_none;
> +

As Eric mentioned, goto csum_unnecessary; here, the code will handle the rest.
for l3/l4 packets the HW already verifies the checksum we must leverage that.

>                 skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
>                 if (network_depth > ETH_HLEN)
> @@ -768,7 +788,7 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
>                         skb->csum = csum_partial(skb->data + ETH_HLEN,
>                                                  network_depth - ETH_HLEN,
>                                                  skb->csum);
> -               if (unlikely(netdev->features & NETIF_F_RXFCS))
> +               if (unlikely(has_fcs))
>                         skb->csum = csum_block_add(skb->csum,
>                                                    (__force __wsum)mlx5e_get_fcs(skb),
>                                                    skb->len - ETH_FCS_LEN);
> --
> 2.19.1
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  7:51         ` Eric Dumazet
@ 2018-12-04 19:17           ` Saeed Mahameed
  2018-12-04 20:35             ` Cong Wang
  2018-12-04 20:31           ` Cong Wang
  1 sibling, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2018-12-04 19:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Cong Wang, Linux Netdev List, Saeed Mahameed, Tariq Toukan

On Mon, Dec 3, 2018 at 11:52 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Dec 3, 2018 at 11:30 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > The hardware has probably validated the L3 & L4 checksum just fine.
> > >
> > > Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> > > have no impact on the csum that has been verified by the NIC.
> >
> >
> > Why? Why does the hardware validates L3/L4 checksum when it
> > supplies a full-packet checksum? What's its point here?
>
> The point is that the driver author can decide what is best.
>
> For native IP+TCP or IP+UDP, the NIC has the ability to fully
> understand the packet and fully validate the checksum.

Also for Native IP4 and IP6 plain L3 packets.
The Hardware validates the csum when it can, and always provides
checksum complete for all packets.
One of the reason to validate is that sometimes we want to skip
checksum complete, but still leverage the hw validation,
like in your patch :), or LRO case, or many other cases in other
kernels/OSes/drivers.

So i agree with Eric, let's jump to checksum_unnecessary for short packets.

>
> >
> > If it really validates L3/L4 checksum, then a full-packet checksum
> > is not needed.
>
> Yes, this is exactly what CHECKSUM_UNNECESSARY means.
> linux stack does not have to perform the check another time.
>
> For example, no call to csum_partial() is needed, even for IPv6+TCP or IPv6+UDP

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  7:51         ` Eric Dumazet
  2018-12-04 19:17           ` Saeed Mahameed
@ 2018-12-04 20:31           ` Cong Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Cong Wang @ 2018-12-04 20:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Linux Kernel Network Developers, Saeed Mahameed, Tariq Toukan

On Mon, Dec 3, 2018 at 11:51 PM Eric Dumazet <edumazet@google.com> wrote:
> > If it really validates L3/L4 checksum, then a full-packet checksum
> > is not needed.
>
> Yes, this is exactly what CHECKSUM_UNNECESSARY means.
> linux stack does not have to perform the check another time.

So you are suggesting to get rid of CHECKSUM_COMPLETE, right?
Like below:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7cb988640c9c..b9626c8b6b91 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -764,32 +764,6 @@ static inline void mlx5e_handle_csum(struct
net_device *netdev,
                return;
        }

-       if (unlikely(test_bit(MLX5E_RQ_STATE_NO_CSUM_COMPLETE, &rq->state)))
-               goto csum_unnecessary;
-
-       if (likely(is_last_ethertype_ip(skb, &network_depth, &proto))) {
-               if (unlikely(get_ip_proto(skb, network_depth, proto)
== IPPROTO_SCTP))
-                       goto csum_unnecessary;
-
-               skb->ip_summed = CHECKSUM_COMPLETE;
-               skb->csum = csum_unfold((__force __sum16)cqe->check_sum);
-               if (network_depth > ETH_HLEN)
-                       /* CQE csum is calculated from the IP header and does
-                        * not cover VLAN headers (if present). This will add
-                        * the checksum manually.
-                        */
-                       skb->csum = csum_partial(skb->data + ETH_HLEN,
-                                                network_depth - ETH_HLEN,
-                                                skb->csum);
-               if (unlikely(netdev->features & NETIF_F_RXFCS))
-                       skb->csum = csum_block_add(skb->csum,
-                                                  (__force
__wsum)mlx5e_get_fcs(skb),
-                                                  skb->len - ETH_FCS_LEN);
-               stats->csum_complete++;
-               return;
-       }
-
-csum_unnecessary:


>
> For example, no call to csum_partial() is needed, even for IPv6+TCP or IPv6+UDP

Sure, if you mean to get rid of CHECKSUM_COMPLETE.

Remember you fixed CHECKSUM_COMPLETE too when you touch it,
see d48051c5b837 ("net/mlx5e: fix csum adjustments caused by RXFCS").
So why didn't you remove CHECKSUM_COMPLETE but fixed it instead?

Thanks.

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04 19:17           ` Saeed Mahameed
@ 2018-12-04 20:35             ` Cong Wang
  2018-12-04 21:16               ` Eric Dumazet
  2018-12-05  0:59               ` Saeed Mahameed
  0 siblings, 2 replies; 15+ messages in thread
From: Cong Wang @ 2018-12-04 20:35 UTC (permalink / raw)
  To: saeedm
  Cc: Eric Dumazet, Linux Kernel Network Developers, Saeed Mahameed,
	Tariq Toukan

On Tue, Dec 4, 2018 at 11:17 AM Saeed Mahameed
<saeedm@dev.mellanox.co.il> wrote:
>
> On Mon, Dec 3, 2018 at 11:52 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Dec 3, 2018 at 11:30 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > The hardware has probably validated the L3 & L4 checksum just fine.
> > > >
> > > > Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
> > > > have no impact on the csum that has been verified by the NIC.
> > >
> > >
> > > Why? Why does the hardware validates L3/L4 checksum when it
> > > supplies a full-packet checksum? What's its point here?
> >
> > The point is that the driver author can decide what is best.
> >
> > For native IP+TCP or IP+UDP, the NIC has the ability to fully
> > understand the packet and fully validate the checksum.
>
> Also for Native IP4 and IP6 plain L3 packets.
> The Hardware validates the csum when it can, and always provides
> checksum complete for all packets.
> One of the reason to validate is that sometimes we want to skip
> checksum complete, but still leverage the hw validation,
> like in your patch :), or LRO case, or many other cases in other
> kernels/OSes/drivers.

This sounds wrong to me too.

If the HW already validates it, the software doesn't need to do it,
therefore must skip hw csum for performance gain.,


>
> So i agree with Eric, let's jump to checksum_unnecessary for short packets.

This is odd, if Eric is right, then we should completely get rid of
CHECKSUM_COMPLETE. Short packets are not exceptions.

I still don't understand why people including Eric kept fixing this
thing which could be just removed from the very beginning.
Sounds like nobody even looked into it until my patch.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04  6:14 [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames Cong Wang
  2018-12-04  6:34 ` Eric Dumazet
  2018-12-04 19:02 ` Saeed Mahameed
@ 2018-12-04 20:44 ` Cong Wang
  2 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2018-12-04 20:44 UTC (permalink / raw)
  To: Linux Kernel Network Developers
  Cc: Saeed Mahameed, Eric Dumazet, Tariq Toukan

On Mon, Dec 3, 2018 at 10:14 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> When an ethernet frame is padded to meet the minimum ethernet frame
> size, the padding octets are not covered by the hardware checksum.
> Fortunately the padding octets are ususally zero's, which don't affect
> checksum. However, we have a switch which pads non-zero octets, this
> causes kernel hardware checksum fault repeatedly.
>
> Prior to commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> skb checksum was forced to be CHECKSUM_NONE when padding is detected.
> After it, we need to keep skb->csum updated, like what we do for RXFCS.
> However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
> headers, it is not worthy the effort as the packets are so small that
> CHECKSUM_COMPLETE can't save anything.
>
> I tested this patch with RXFCS on and off, it works fine without any
> warning in both cases.
>
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Nacked-by: Cong Wang <xiyou.wangcong@gmail.com>

This patch has no value for upstream. Let's discard it.

Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04 20:35             ` Cong Wang
@ 2018-12-04 21:16               ` Eric Dumazet
  2018-12-04 21:20                 ` Cong Wang
  2018-12-05  0:59               ` Saeed Mahameed
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2018-12-04 21:16 UTC (permalink / raw)
  To: Cong Wang, saeedm
  Cc: Eric Dumazet, Linux Kernel Network Developers, Saeed Mahameed,
	Tariq Toukan



On 12/04/2018 12:35 PM, Cong Wang wrote:
> On Tue, Dec 4, 2018 at 11:17 AM Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
>>
>> On Mon, Dec 3, 2018 at 11:52 PM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Mon, Dec 3, 2018 at 11:30 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>
>>>> On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <edumazet@google.com> wrote:
>>>>>
>>>>> The hardware has probably validated the L3 & L4 checksum just fine.
>>>>>
>>>>> Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding bytes (if any)
>>>>> have no impact on the csum that has been verified by the NIC.
>>>>
>>>>
>>>> Why? Why does the hardware validates L3/L4 checksum when it
>>>> supplies a full-packet checksum? What's its point here?
>>>
>>> The point is that the driver author can decide what is best.
>>>
>>> For native IP+TCP or IP+UDP, the NIC has the ability to fully
>>> understand the packet and fully validate the checksum.
>>
>> Also for Native IP4 and IP6 plain L3 packets.
>> The Hardware validates the csum when it can, and always provides
>> checksum complete for all packets.
>> One of the reason to validate is that sometimes we want to skip
>> checksum complete, but still leverage the hw validation,
>> like in your patch :), or LRO case, or many other cases in other
>> kernels/OSes/drivers.
> 
> This sounds wrong to me too.
> 
> If the HW already validates it, the software doesn't need to do it,
> therefore must skip hw csum for performance gain.,
> 
> 
>>
>> So i agree with Eric, let's jump to checksum_unnecessary for short packets.
> 
> This is odd, if Eric is right, then we should completely get rid of
> CHECKSUM_COMPLETE. Short packets are not exceptions.
> 
> I still don't understand why people including Eric kept fixing this
> thing which could be just removed from the very beginning.
> Sounds like nobody even looked into it until my patch.
> 

Erm I never suggested to get rid of CHECKSUM_COMPLETE...
My suggestion was to reorder the mlx5 logic to match mlx4 one.

CHECKSUM_COMPLETE is very nice _when_/_if_ the NIC is unable to
fully dissect a packet and validate L4, as a fallback.

I am pretty sure for example that IP reassembly can benefit from CHECKSUM_COMPLETE.
(Although for some reason mlx4 code does not handle IPv6 fragments in its CHECKSUM_COMPLETE path)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04 21:16               ` Eric Dumazet
@ 2018-12-04 21:20                 ` Cong Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2018-12-04 21:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: saeedm, Eric Dumazet, Linux Kernel Network Developers,
	Saeed Mahameed, Tariq Toukan

On Tue, Dec 4, 2018 at 1:16 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Erm I never suggested to get rid of CHECKSUM_COMPLETE...
> My suggestion was to reorder the mlx5 logic to match mlx4 one.
>
> CHECKSUM_COMPLETE is very nice _when_/_if_ the NIC is unable to
> fully dissect a packet and validate L4, as a fallback.

Quote from Eric:

"For native IP+TCP or IP+UDP, the NIC has the ability to fully
understand the packet and fully validate the checksum."

Therefore CHECKSUM_COMPLETE is not nice here.

>
> I am pretty sure for example that IP reassembly can benefit from CHECKSUM_COMPLETE.
> (Although for some reason mlx4 code does not handle IPv6 fragments in its CHECKSUM_COMPLETE path)
>

I am pretty sure mlx5 driver code doesn't check for IP fragments.


My patch is dropped, let's keep the current code as it is and pretend
everything just works fine.

Thanks a lot!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-04 20:35             ` Cong Wang
  2018-12-04 21:16               ` Eric Dumazet
@ 2018-12-05  0:59               ` Saeed Mahameed
  2018-12-05  2:48                 ` Cong Wang
  1 sibling, 1 reply; 15+ messages in thread
From: Saeed Mahameed @ 2018-12-05  0:59 UTC (permalink / raw)
  To: saeedm, xiyou.wangcong; +Cc: netdev, Tariq Toukan, edumazet

On Tue, 2018-12-04 at 12:35 -0800, Cong Wang wrote:
> On Tue, Dec 4, 2018 at 11:17 AM Saeed Mahameed
> <saeedm@dev.mellanox.co.il> wrote:
> > On Mon, Dec 3, 2018 at 11:52 PM Eric Dumazet <edumazet@google.com>
> > wrote:
> > > On Mon, Dec 3, 2018 at 11:30 PM Cong Wang <
> > > xiyou.wangcong@gmail.com> wrote:
> > > > On Mon, Dec 3, 2018 at 11:08 PM Eric Dumazet <
> > > > edumazet@google.com> wrote:
> > > > > The hardware has probably validated the L3 & L4 checksum just
> > > > > fine.
> > > > > 
> > > > > Note that if ip_summed is CHECKSUM_UNNECESSARY, the padding
> > > > > bytes (if any)
> > > > > have no impact on the csum that has been verified by the NIC.
> > > > 
> > > > Why? Why does the hardware validates L3/L4 checksum when it
> > > > supplies a full-packet checksum? What's its point here?
> > > 
> > > The point is that the driver author can decide what is best.
> > > 
> > > For native IP+TCP or IP+UDP, the NIC has the ability to fully
> > > understand the packet and fully validate the checksum.
> > 
> > Also for Native IP4 and IP6 plain L3 packets.
> > The Hardware validates the csum when it can, and always provides
> > checksum complete for all packets.
> > One of the reason to validate is that sometimes we want to skip
> > checksum complete, but still leverage the hw validation,
> > like in your patch :), or LRO case, or many other cases in other
> > kernels/OSes/drivers.
> 
> This sounds wrong to me too.
> 
> If the HW already validates it, the software doesn't need to do it,
> therefore must skip hw csum for performance gain.,
> 

Cong, you are missing some important details, hardware can only parse a
handful of protocols IP/TCP/UDP some tunneling like vxlan,GRE, etc.. 
but for complicated protocols and new generic tunneling protocols,
which the hardware wasn't built to understand, the checksum complete
comes to the rescue.

not only that, imagine you are doing some proprietary tunneling and you
want to encapsulate the rx traffic and send it back to wire, how would
you want to recalculate the csum before txing ? manually on the whole
new packet or use the csum complete and just figure out the differences
? i am sure you gonna want to use the checksum complete of the original
packet.

So again csum complete is not only for validation, it is more powerful
than that.

> 
> > So i agree with Eric, let's jump to checksum_unnecessary for short
> > packets.
> 
> This is odd, if Eric is right, then we should completely get rid of
> CHECKSUM_COMPLETE. Short packets are not exceptions.
> 

short packets are exception, 
1. because of the small packet padding issue
2. because they are short enough not to feel the performance hit of
recalculating their whole csum (if necessary) .

Again still it is nice to report csum unnecessary for those packets, if
possible.

for large packets csum complete is favorable because of the above
reasons.


if you don't like checksum complete you have a way to totally disable
it in mlx5 via ethtool private flags.

I hope this helps.


> I still don't understand why people including Eric kept fixing this
> thing which could be just removed from the very beginning.
> Sounds like nobody even looked into it until my patch.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames
  2018-12-05  0:59               ` Saeed Mahameed
@ 2018-12-05  2:48                 ` Cong Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2018-12-05  2:48 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: saeedm, Linux Kernel Network Developers, Tariq Toukan, Eric Dumazet

On Tue, Dec 4, 2018 at 4:59 PM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> Cong, you are missing some important details, hardware can only parse a
> handful of protocols IP/TCP/UDP some tunneling like vxlan,GRE, etc..
> but for complicated protocols and new generic tunneling protocols,
> which the hardware wasn't built to understand, the checksum complete
> comes to the rescue.
>
> not only that, imagine you are doing some proprietary tunneling and you
> want to encapsulate the rx traffic and send it back to wire, how would
> you want to recalculate the csum before txing ? manually on the whole
> new packet or use the csum complete and just figure out the differences
> ? i am sure you gonna want to use the checksum complete of the original
> packet.
>
> So again csum complete is not only for validation, it is more powerful
> than that.


OK, so how do you detect this? Clearly not by just checking skb->len,
right?

Let me show you why your arguments are non-sense.

1. From your description of how mlx5 hardware works, the driver logic
must be something like this:

if (hardware can checksum L4)
    use CHECKSUM_UNNECESSARY
else
    use CHECKSUM_COMPLETE

This obviously makes sense. Does the current code base match this?
No. The current code base only checks for IP/IPv6 and sets
CHECKSUM_COMPLETE (unless SCTP), far from matching what
you are trying to sell.

2. What you and Eric suggest to change to:

if (the frame is short)
    use CHECKSUM_UNNECESSARY
else
    use CHECKSUM_COMPLETE

Does this match what you describe? No, because short frame does not
necessarily mean hardware can't check its L4 checksum. It only misleads
people to believe so, it helps nothing.


3. What my patch actually does:

if (the frame is short)
    use CHECKSUM_NONE
else
    use CHECKSUM_COMPLETE

Why this makes sense? Because when skb->csum is not correct,
treating it as none is reasonable. And, checking skb->len doesn't
have indication of whether hardware can checksum L4 checksum
or not. This is _good_ when the rest code remains unchanged,
we have no clue whether hardware has done L4 checksum validation
or not, therefore choose to pass it to software to validate.

Of course you can eventually change all the code to match what
you are describing, but _until_ that happens CHECKSUM_NONE
makes more sense than CHECKSUM_UNNECESSARY here.



>
> >
> > > So i agree with Eric, let's jump to checksum_unnecessary for short
> > > packets.
> >
> > This is odd, if Eric is right, then we should completely get rid of
> > CHECKSUM_COMPLETE. Short packets are not exceptions.
> >
>
> short packets are exception,
> 1. because of the small packet padding issue

Why padding issue makes it CHECKSUM_UNNECESSARY not
CHECKSUM_NONE? Isn't the hardware's capability what makes the
difference?


> 2. because they are short enough not to feel the performance hit of
> recalculating their whole csum (if necessary) .

With CHECKSUM_UNNECESSARY, checksum is not recalculated
or validated at L4.


>
> Again still it is nice to report csum unnecessary for those packets, if
> possible.


Please define "if possible". You are hiding a lot of things behind
"if possible" to favor your own argument.


>
> for large packets csum complete is favorable because of the above
> reasons.


Only if hardware can checksum all large packets.


>
>
> if you don't like checksum complete you have a way to totally disable
> it in mlx5 via ethtool private flags.

Thanks but we are very happy to carry the fix locally.

This is why I stop arguing with this non-sense. Why not we all stop
here and agree to an disagreement? Why are we wasting our time?


Thanks!

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-12-05  2:48 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-04  6:14 [Patch net v3] mlx5: force CHECKSUM_NONE for short ethernet frames Cong Wang
2018-12-04  6:34 ` Eric Dumazet
2018-12-04  6:48   ` Cong Wang
     [not found]     ` <CANn89iK0j=2LYK=szVO+Fpg1-tX=wSz+ghZx8RnwZSEbxZjf5w@mail.gmail.com>
2018-12-04  7:09       ` Eric Dumazet
2018-12-04  7:29       ` Cong Wang
2018-12-04  7:51         ` Eric Dumazet
2018-12-04 19:17           ` Saeed Mahameed
2018-12-04 20:35             ` Cong Wang
2018-12-04 21:16               ` Eric Dumazet
2018-12-04 21:20                 ` Cong Wang
2018-12-05  0:59               ` Saeed Mahameed
2018-12-05  2:48                 ` Cong Wang
2018-12-04 20:31           ` Cong Wang
2018-12-04 19:02 ` Saeed Mahameed
2018-12-04 20:44 ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.