All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] Local checksum offload for VXLAN
@ 2015-12-17 15:27 Edward Cree
  2015-12-17 15:29 ` [PATCH net-next 1/2] net: udp: local checksum offload for encapsulation Edward Cree
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Edward Cree @ 2015-12-17 15:27 UTC (permalink / raw)
  To: David Miller; +Cc: Tom Herbert, netdev

When the inner packet checksum is offloaded, the outer UDP checksum is easy
 to calculate as it doesn't depend on the payload (because the inner checksum
 cancels out everything from the inner packet except the pseudo header).
Thus, transmit checksums for VXLAN (and in principle other encapsulations,
 but I haven't enabled it for / tested with those) can be offloaded on any
 device supporting NETIF_F_HW_CSUM.  Only the innermost checksum has to be
 offloaded, the rest are filled in by the stack.
Tested by hacking a driver to report NETIF_F_HW_CSUM, call skb_checksum_help
 before transmitting a packet, and not actually offload anything to the hw.
In principle it should also be possible to apply this technique when the
 inner packet has been checksummed by software, but only if skb->csum_start
 and skb->csum_offset have been filled in to describe the inner checksum.
 However in this case it is easier to use skb->csum and skb->csum_start, as
 gso_make_checksum() already does - a similar but simpler technique.  It's
 not clear to me where else this should be done, so this is out of scope for
 this patch series.

Edward Cree (2):
  net: udp: local checksum offload for encapsulation
  net: vxlan: enable local checksum offload on HW_CSUM devices

 drivers/net/vxlan.c |  5 ++++-
 net/ipv4/udp.c      | 34 +++++++++++++++++++++++++++++-----
 2 files changed, 33 insertions(+), 6 deletions(-)

-- 
2.4.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net-next 1/2] net: udp: local checksum offload for encapsulation
  2015-12-17 15:27 [PATCH net-next 0/2] Local checksum offload for VXLAN Edward Cree
@ 2015-12-17 15:29 ` Edward Cree
  2015-12-17 15:30 ` [PATCH net-next 2/2] net: vxlan: enable local checksum offload on HW_CSUM devices Edward Cree
  2015-12-17 18:06 ` [PATCH net-next 0/2] Local checksum offload for VXLAN Tom Herbert
  2 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2015-12-17 15:29 UTC (permalink / raw)
  To: David Miller; +Cc: Tom Herbert, netdev

The arithmetic properties of the ones-complement checksum mean that a
 correctly checksummed inner packet, including its checksum, has a ones
 complement sum depending only on whatever value was used to initialise
 the checksum field before checksumming (in the case of TCP and UDP,
 this is the ones complement sum of the pseudo header, complemented).
Consequently, if we are going to offload the inner checksum with
 CHECKSUM_PARTIAL, we can compute the outer checksum based only on the
 packed data not covered by the inner checksum, and the initial value of
 the inner checksum field.

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 net/ipv4/udp.c | 34 +++++++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8841e98..3e63c3d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -767,13 +767,37 @@ void udp_set_csum(bool nocheck, struct sk_buff *skb,
 {
 	struct udphdr *uh = udp_hdr(skb);
 
-	if (nocheck)
+	if (nocheck) {
 		uh->check = 0;
-	else if (skb_is_gso(skb))
+	} else if (skb_is_gso(skb)) {
 		uh->check = ~udp_v4_check(len, saddr, daddr, 0);
-	else if (skb_dst(skb) && skb_dst(skb)->dev &&
-		 (skb_dst(skb)->dev->features &
-		  (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
+	} else if (skb->ip_summed == CHECKSUM_PARTIAL &&
+		   skb_dst(skb) && skb_dst(skb)->dev &&
+		   (skb_dst(skb)->dev->features &
+		    (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
+		/* Everything from csum_start onwards will be
+		 * checksummed and will thus have a sum of whatever
+		 * we previously put in the checksum field (eg. sum
+		 * of pseudo-header)
+		 */
+		__wsum csum;
+
+		/* Fill in our pseudo-header checksum */
+		uh->check = ~udp_v4_check(len, saddr, daddr, 0);
+		/* Start with complement of inner pseudo-header checksum */
+		csum = ~skb_checksum(skb, skb_checksum_start_offset(skb) + skb->csum_offset,
+				     2, 0);
+		/* Add in checksum of our headers (incl. pseudo-header
+		 * checksum filled in above)
+		 */
+		csum = skb_checksum(skb, 0, skb_checksum_start_offset(skb), csum);
+		/* The result is the outer checksum */
+		uh->check = csum_fold(csum);
+		if (uh->check == 0)
+			uh->check = CSUM_MANGLED_0;
+	} else if (skb_dst(skb) && skb_dst(skb)->dev &&
+		   (skb_dst(skb)->dev->features &
+		    (NETIF_F_IP_CSUM | NETIF_F_HW_CSUM))) {
 
 		BUG_ON(skb->ip_summed == CHECKSUM_PARTIAL);
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net-next 2/2] net: vxlan: enable local checksum offload on HW_CSUM devices
  2015-12-17 15:27 [PATCH net-next 0/2] Local checksum offload for VXLAN Edward Cree
  2015-12-17 15:29 ` [PATCH net-next 1/2] net: udp: local checksum offload for encapsulation Edward Cree
@ 2015-12-17 15:30 ` Edward Cree
  2015-12-17 18:06 ` [PATCH net-next 0/2] Local checksum offload for VXLAN Tom Herbert
  2 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2015-12-17 15:30 UTC (permalink / raw)
  To: David Miller; +Cc: Tom Herbert, netdev

Signed-off-by: Edward Cree <ecree@solarflare.com>
---
 drivers/net/vxlan.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6369a57..c1660d6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1785,6 +1785,9 @@ static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *sk
 	bool udp_sum = !!(vxflags & VXLAN_F_UDP_CSUM);
 	int type = udp_sum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
 	u16 hdrlen = sizeof(struct vxlanhdr);
+	/* Is device able to do the inner checksum? */
+	bool inner_csum = skb_dst(skb) && skb_dst(skb)->dev &&
+				(skb_dst(skb)->dev->features & NETIF_F_HW_CSUM);
 
 	if ((vxflags & VXLAN_F_REMCSUM_TX) &&
 	    skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -1814,7 +1817,7 @@ static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *sk
 	if (WARN_ON(!skb))
 		return -ENOMEM;
 
-	skb = iptunnel_handle_offloads(skb, udp_sum, type);
+	skb = iptunnel_handle_offloads(skb, udp_sum && !inner_csum, type);
 	if (IS_ERR(skb))
 		return PTR_ERR(skb);
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/2] Local checksum offload for VXLAN
  2015-12-17 15:27 [PATCH net-next 0/2] Local checksum offload for VXLAN Edward Cree
  2015-12-17 15:29 ` [PATCH net-next 1/2] net: udp: local checksum offload for encapsulation Edward Cree
  2015-12-17 15:30 ` [PATCH net-next 2/2] net: vxlan: enable local checksum offload on HW_CSUM devices Edward Cree
@ 2015-12-17 18:06 ` Tom Herbert
  2015-12-18 10:41   ` Edward Cree
  2015-12-26  4:41   ` David Miller
  2 siblings, 2 replies; 8+ messages in thread
From: Tom Herbert @ 2015-12-17 18:06 UTC (permalink / raw)
  To: Edward Cree; +Cc: David Miller, netdev

On Thu, Dec 17, 2015 at 7:27 AM, Edward Cree <ecree@solarflare.com> wrote:
> When the inner packet checksum is offloaded, the outer UDP checksum is easy
>  to calculate as it doesn't depend on the payload (because the inner checksum
>  cancels out everything from the inner packet except the pseudo header).
> Thus, transmit checksums for VXLAN (and in principle other encapsulations,
>  but I haven't enabled it for / tested with those) can be offloaded on any
>  device supporting NETIF_F_HW_CSUM.  Only the innermost checksum has to be
>  offloaded, the rest are filled in by the stack.
> Tested by hacking a driver to report NETIF_F_HW_CSUM, call skb_checksum_help
>  before transmitting a packet, and not actually offload anything to the hw.
> In principle it should also be possible to apply this technique when the
>  inner packet has been checksummed by software, but only if skb->csum_start
>  and skb->csum_offset have been filled in to describe the inner checksum.
>  However in this case it is easier to use skb->csum and skb->csum_start, as
>  gso_make_checksum() already does - a similar but simpler technique.  It's
>  not clear to me where else this should be done, so this is out of scope for
>  this patch series.
>
Edward, it took me a while to understand how this works, but this
really is an amazing trick! This implies that we don't need to worry
about HW support for offloading multiple checksums.

I'm not sure that we need bits in VXLAN or any other encapsulation. It
should be sufficient in udp_set_csum that if we already have
CHECKSUM_PARTIAL that can always be used to do local checksum offload.
This is also should be independent as to whether the device does
NETIF_F_HW_CSUM or can offload  NETIF_F_IP[V6]_CSUM for encapsulated
packets.

It would be nice to have a more formal documentation also. This is a
very powerful mechanism but the math behind it and requirements are
subtle.

Tom

> Edward Cree (2):
>   net: udp: local checksum offload for encapsulation
>   net: vxlan: enable local checksum offload on HW_CSUM devices
>
>  drivers/net/vxlan.c |  5 ++++-
>  net/ipv4/udp.c      | 34 +++++++++++++++++++++++++++++-----
>  2 files changed, 33 insertions(+), 6 deletions(-)
>
> --
> 2.4.3
>
> The information contained in this message is confidential and is intended for the addressee(s) only. If you have received this message in error, please notify the sender immediately and delete the message. Unless you are an addressee (or authorized to receive for an addressee), you may not use, copy or disclose to anyone this message or any information contained in this message. The unauthorized use, disclosure, copying or alteration of this message is strictly prohibited.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/2] Local checksum offload for VXLAN
  2015-12-17 18:06 ` [PATCH net-next 0/2] Local checksum offload for VXLAN Tom Herbert
@ 2015-12-18 10:41   ` Edward Cree
  2015-12-18 19:41     ` Tom Herbert
  2015-12-26  4:41   ` David Miller
  1 sibling, 1 reply; 8+ messages in thread
From: Edward Cree @ 2015-12-18 10:41 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev

On 17/12/15 18:06, Tom Herbert wrote:
> I'm not sure that we need bits in VXLAN or any other encapsulation. It
> should be sufficient in udp_set_csum that if we already have
> CHECKSUM_PARTIAL that can always be used to do local checksum offload.
My understandingis that otherwise iptunnel_handle_offloads() will do the
 inner checksum in sw, because csum_help will be passed as true.  It will
 call skb_checksum_help().
> This is also should be independent as to whether the device does
> NETIF_F_HW_CSUM or can offload  NETIF_F_IP[V6]_CSUM for encapsulated
> packets.
I was wary of drivers that declare NETIF_F_IP[V6]_CSUM but don't cope with
 encapsulated packets.  Would they do the right thing if the inner_csum bool
 in patch 2 just tested for NETIF_F_CSUM_MASK, or do I need to test things
 like NETIF_F_GSO_UDP_TUNNEL_CSUM?  I'm afraid I don't entirely understand
 the infrastructure here, so I just did the minimal thing I was sure worked,
 i.e. testing for NETIF_F_HW_CSUM.
> It would be nice to have a more formal documentation also. This is a
> very powerful mechanism but the math behind it and requirements are
> subtle.
>
> Tom
What would be a good place to put such documentation?  In
 Documentation/networking, or as part of the big checksums comment at the
 top of skbuff.h?

-ed

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/2] Local checksum offload for VXLAN
  2015-12-18 10:41   ` Edward Cree
@ 2015-12-18 19:41     ` Tom Herbert
  2016-01-06 19:23       ` Edward Cree
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Herbert @ 2015-12-18 19:41 UTC (permalink / raw)
  To: Edward Cree; +Cc: David Miller, netdev

On Fri, Dec 18, 2015 at 2:41 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 17/12/15 18:06, Tom Herbert wrote:
>> I'm not sure that we need bits in VXLAN or any other encapsulation. It
>> should be sufficient in udp_set_csum that if we already have
>> CHECKSUM_PARTIAL that can always be used to do local checksum offload.
> My understandingis that otherwise iptunnel_handle_offloads() will do the
>  inner checksum in sw, because csum_help will be passed as true.  It will
>  call skb_checksum_help().
>> This is also should be independent as to whether the device does
>> NETIF_F_HW_CSUM or can offload  NETIF_F_IP[V6]_CSUM for encapsulated
>> packets.
> I was wary of drivers that declare NETIF_F_IP[V6]_CSUM but don't cope with
>  encapsulated packets.  Would they do the right thing if the inner_csum bool
>  in patch 2 just tested for NETIF_F_CSUM_MASK, or do I need to test things
>  like NETIF_F_GSO_UDP_TUNNEL_CSUM?  I'm afraid I don't entirely understand
>  the infrastructure here, so I just did the minimal thing I was sure worked,
>  i.e. testing for NETIF_F_HW_CSUM.

Drivers indicate that can do NETIF_F_IP[V6]_CSUM for encapsulation by
setting enc_features. This is checked in validate_xmit_skb so that if
drive can't handle encapsulated checksum skb_checksum_help is called
there.

 >> It would be nice to have a more formal documentation also. This is a
>> very powerful mechanism but the math behind it and requirements are
>> subtle.
>>
>> Tom
> What would be a good place to put such documentation?  In
>  Documentation/networking, or as part of the big checksums comment at the
>  top of skbuff.h?
>
I don't think this right for skbuff.h that should just describe the
interface. LCO has no interface like checksum-unnecessary conversion.
Checksumming, encapsulation, segmentation offloads are complex enough
now there should really be a Linux doc on this maybe modeled after
scaling.txt. That's probably more than you bargained for in this
patch, but if someone wants to learn how this infrastructure really
works, writing a doc is a good way! :-)

Tom

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/2] Local checksum offload for VXLAN
  2015-12-17 18:06 ` [PATCH net-next 0/2] Local checksum offload for VXLAN Tom Herbert
  2015-12-18 10:41   ` Edward Cree
@ 2015-12-26  4:41   ` David Miller
  1 sibling, 0 replies; 8+ messages in thread
From: David Miller @ 2015-12-26  4:41 UTC (permalink / raw)
  To: tom; +Cc: ecree, netdev

From: Tom Herbert <tom@herbertland.com>
Date: Thu, 17 Dec 2015 10:06:13 -0800

> On Thu, Dec 17, 2015 at 7:27 AM, Edward Cree <ecree@solarflare.com> wrote:
>> When the inner packet checksum is offloaded, the outer UDP checksum is easy
>>  to calculate as it doesn't depend on the payload (because the inner checksum
>>  cancels out everything from the inner packet except the pseudo header).
>> Thus, transmit checksums for VXLAN (and in principle other encapsulations,
>>  but I haven't enabled it for / tested with those) can be offloaded on any
>>  device supporting NETIF_F_HW_CSUM.  Only the innermost checksum has to be
>>  offloaded, the rest are filled in by the stack.
>> Tested by hacking a driver to report NETIF_F_HW_CSUM, call skb_checksum_help
>>  before transmitting a packet, and not actually offload anything to the hw.
>> In principle it should also be possible to apply this technique when the
>>  inner packet has been checksummed by software, but only if skb->csum_start
>>  and skb->csum_offset have been filled in to describe the inner checksum.
>>  However in this case it is easier to use skb->csum and skb->csum_start, as
>>  gso_make_checksum() already does - a similar but simpler technique.  It's
>>  not clear to me where else this should be done, so this is out of scope for
>>  this patch series.
>>
> Edward, it took me a while to understand how this works, but this
> really is an amazing trick! This implies that we don't need to worry
> about HW support for offloading multiple checksums.
> 
> I'm not sure that we need bits in VXLAN or any other encapsulation. It
> should be sufficient in udp_set_csum that if we already have
> CHECKSUM_PARTIAL that can always be used to do local checksum offload.
> This is also should be independent as to whether the device does
> NETIF_F_HW_CSUM or can offload  NETIF_F_IP[V6]_CSUM for encapsulated
> packets.
> 
> It would be nice to have a more formal documentation also. This is a
> very powerful mechanism but the math behind it and requirements are
> subtle.

I also think this is amazing, but should be generalized.

Also the pseudo header bits should be optimized a bit, running
skb_checksum_help() on a two byte part of the packet can certainly be
improved for example. :-)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next 0/2] Local checksum offload for VXLAN
  2015-12-18 19:41     ` Tom Herbert
@ 2016-01-06 19:23       ` Edward Cree
  0 siblings, 0 replies; 8+ messages in thread
From: Edward Cree @ 2016-01-06 19:23 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev

On 18/12/15 19:41, Tom Herbert wrote:
> Drivers indicate that can do NETIF_F_IP[V6]_CSUM for encapsulation by
> setting enc_features. This is checked in validate_xmit_skb so that if
> drive can't handle encapsulated checksum skb_checksum_help is called
> there.
In that case, why is udp_set_csum ever looking at skb_dst(skb)->dev->features
 in the first place?  Shouldn't it just always set up checksum offload, and
 rely on it getting fixed up in validate_xmit_skb()?
> On Fri, Dec 18, 2015 at 2:41 AM, Edward Cree <ecree@solarflare.com> wrote:
>> On 17/12/15 18:06, Tom Herbert wrote:
>>> I'm not sure that we need bits in VXLAN or any other encapsulation. It
>>> should be sufficient in udp_set_csum that if we already have
>>> CHECKSUM_PARTIAL that can always be used to do local checksum offload.
>> My understandingis that otherwise iptunnel_handle_offloads() will do the
>>  inner checksum in sw, because csum_help will be passed as true.  It will
>>  call skb_checksum_help().
I'm beginning to think that csum_help should just go away, and all tunnels
 that have both inner and outer checksums should make sure they support LCO.
 I don't see any situation in which it should be necessary for
 iptunnel_handle_offloads to call skb_checksum_help, now that we never need
 to offload the outer checksum.
As far as I can tell from a quick LXR search, the protocols which sometimes
 pass csum_help as true are:
* FOU.  Calls udp_set_csum, so will get LCO support for free.
* GRE.  The GRE header has its own checksum, but it uses ones complement, so
 ip_gre.c:build_header needs to implement LCO.  Probably we want a helper
 function for LCO, that does everything except fill in the outer pseudo-
 header sum, which can then be called by both GRE and udp_set_csum.
* Geneve.  This goes through udp_tunnel_xmit_skb, so udp_set_csum gets
 called, so Geneve will get LCO support for free.
* VXLAN.  Like Geneve, goes through udp_tunnel_xmit_skb, so gets LCO support
 for free.
So, if we implement LCO for GRE, we can then drop the csum_help argument to
 iptunnel_handle_offloads() entirely, and act as though it's always false.
Also, of course, we need to do the same stuff for the IPv6 versions: add LCO
 to udp6_set_csum(), and to the IPv6-GRE implementation, which appears to be
 the call to ip_compute_csum() inside ip6gre_xmit2().  Although, it looks
 like IPv6-GRE has so little in common with other tunnel code, that simply
 leaving it as it is wouldn't break anything.  If anyone is actually *using*
 GRE over IPv6 (the spec explicitly says it doesn't cover IPv6) and wants
 the performance boost, they can implement LCO for it themselves :)
The only remaining problem is GSO: I haven't got quite as far yet with
 understanding that path.  But I don't _think_ implementing LCO for non-GSO
 will break GSO.  In which case we can get the non-GSO side working first,
 then try to implement the GSO version later.
> I don't think this right for skbuff.h that should just describe the
> interface. LCO has no interface like checksum-unnecessary conversion.
> Checksumming, encapsulation, segmentation offloads are complex enough
> now there should really be a Linux doc on this maybe modeled after
> scaling.txt. That's probably more than you bargained for in this
> patch, but if someone wants to learn how this infrastructure really
> works, writing a doc is a good way! :-)
I'm making some attempt to write a doc for checksumming and encapsulation.
 But I'd prefer to punt on segmentation because that's a whole other thing
 and I don't know any of the code yet - and I don't think it blocks any of
 the csum/encap work.  (Though that work will need to be done again for the
 segmentation offload path, I think.)  So I'll leave that bit for someone
 else to document ;)
Can I claim that canonically the only part of the TX stack that should look
 at the netdev offload features is validate_xmit_skb(), and everything else
 should just assume all features are supported?  (Like the udp_set_csum bit
 I started this email with.)  This would seem like the Right Thing, as it
 mirrors the new rule for device drivers that they should have sw fallbacks
 rather than trying to tell the stack exactly what they support.
Of course, that does include all the functions validate_xmit_skb() _calls_,
 which appears to include the entirety of GSO.  But like I say, I'm
 pretending GSO doesn't exist for now...

-Edward

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-01-06 19:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-17 15:27 [PATCH net-next 0/2] Local checksum offload for VXLAN Edward Cree
2015-12-17 15:29 ` [PATCH net-next 1/2] net: udp: local checksum offload for encapsulation Edward Cree
2015-12-17 15:30 ` [PATCH net-next 2/2] net: vxlan: enable local checksum offload on HW_CSUM devices Edward Cree
2015-12-17 18:06 ` [PATCH net-next 0/2] Local checksum offload for VXLAN Tom Herbert
2015-12-18 10:41   ` Edward Cree
2015-12-18 19:41     ` Tom Herbert
2016-01-06 19:23       ` Edward Cree
2015-12-26  4:41   ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.