All of lore.kernel.org
 help / color / mirror / Atom feed
* RFC: Checksum offload and XDP
@ 2017-04-10 18:26 Tom Herbert
  2017-04-11 15:55 ` Edward Cree
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2017-04-10 18:26 UTC (permalink / raw)
  To: Linux Kernel Network Developers

Not having checksum offload in XDP is going to get more painful once
we start seeing a lot programs doing packet modifications. One nice
thing we do for ILA router is pre-compute the checksum delta necessary
to maintain checksum neutral property in the packet. So that after
doing ILA routing in XDP the checksum complete value is still valid as
is the transport layer checksum.

It's conceivable we could generalize this by having a u16 checksum
delta returned from XDP program. If the checksum diff can be
pre-computed in a structure for doing the translation, then there
should be little cost other than making API a little more complex. On
return the checksum_complete value is updated jusy by adding in the
diff value.

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-10 18:26 RFC: Checksum offload and XDP Tom Herbert
@ 2017-04-11 15:55 ` Edward Cree
  2017-04-11 16:43   ` Daniel Borkmann
  2017-04-11 16:46   ` Tom Herbert
  0 siblings, 2 replies; 9+ messages in thread
From: Edward Cree @ 2017-04-11 15:55 UTC (permalink / raw)
  To: Tom Herbert, Linux Kernel Network Developers, Alexei Starovoitov

On 10/04/17 19:26, Tom Herbert wrote:
> Not having checksum offload in XDP is going to get more painful once
> we start seeing a lot programs doing packet modifications. One nice
> thing we do for ILA router is pre-compute the checksum delta necessary
> to maintain checksum neutral property in the packet. So that after
> doing ILA routing in XDP the checksum complete value is still valid as
> is the transport layer checksum.
>
> It's conceivable we could generalize this by having a u16 checksum
> delta returned from XDP program. If the checksum diff can be
> pre-computed in a structure for doing the translation, then there
> should be little cost other than making API a little more complex. On
> return the checksum_complete value is updated jusy by adding in the
> diff value.
>
> Tom

AIUI you're suggesting to have the user's BPF program do this calculation  and (somehow) feed the diff back to the caller.  As well as adding work
 for the XDP program writer, it also means trusting they got it right.
My suggestion was that the eBPF verifier could insert instructions around
 every write that update the diff value automagically, and not allow the
 user's program to touch it directly.
There would be some cleverness required, for instance to determine which
 byte of the checksum to add to (and thus sometimes shift the byte diff
 by 8 before adding it in) - which might require a runtime check when the
 load is indexed.  Or might not, if the verifier sees something like
 packet[ETH_HLEN + (ihl << 2)] and can deduce the low bit of the offset.
For the simple case, we would translate "write byte to packet+1" into:
	diff -= packet[1];
	write byte to packet+1;
	diff += packet[1];
Alexei, does this sound reasonable?

The counter-argument, of course, is that if the XDP program edits fields
 that are protected by an Internet checksum (which in practice usually
 means anything but the Ethernet header) and then fixes up the checksums
 itself, we will edit our diff value twice just to conclude that diff==0.
 And maybe we are willing to trust root with the ability to lie to the
 kernel about incoming packets' checksum_complete values.

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 15:55 ` Edward Cree
@ 2017-04-11 16:43   ` Daniel Borkmann
  2017-04-11 18:34     ` Jakub Kicinski
  2017-04-11 16:46   ` Tom Herbert
  1 sibling, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2017-04-11 16:43 UTC (permalink / raw)
  To: Edward Cree
  Cc: Tom Herbert, Linux Kernel Network Developers, Alexei Starovoitov

On 04/11/2017 05:55 PM, Edward Cree wrote:
> On 10/04/17 19:26, Tom Herbert wrote:
>> Not having checksum offload in XDP is going to get more painful once
>> we start seeing a lot programs doing packet modifications. One nice
>> thing we do for ILA router is pre-compute the checksum delta necessary
>> to maintain checksum neutral property in the packet. So that after
>> doing ILA routing in XDP the checksum complete value is still valid as
>> is the transport layer checksum.
>>
>> It's conceivable we could generalize this by having a u16 checksum
>> delta returned from XDP program. If the checksum diff can be
>> pre-computed in a structure for doing the translation, then there
>> should be little cost other than making API a little more complex. On
>> return the checksum_complete value is updated jusy by adding in the
>> diff value.
>>
>> Tom
>
> AIUI you're suggesting to have the user's BPF program do this calculation  and (somehow) feed the diff back to the caller.  As well as adding work
>   for the XDP program writer, it also means trusting they got it right.
> My suggestion was that the eBPF verifier could insert instructions around
>   every write that update the diff value automagically, and not allow the
>   user's program to touch it directly.
> There would be some cleverness required, for instance to determine which
>   byte of the checksum to add to (and thus sometimes shift the byte diff
>   by 8 before adding it in) - which might require a runtime check when the
>   load is indexed.  Or might not, if the verifier sees something like
>   packet[ETH_HLEN + (ihl << 2)] and can deduce the low bit of the offset.
> For the simple case, we would translate "write byte to packet+1" into:
> 	diff -= packet[1];
> 	write byte to packet+1;
> 	diff += packet[1];
> Alexei, does this sound reasonable?
>
> The counter-argument, of course, is that if the XDP program edits fields
>   that are protected by an Internet checksum (which in practice usually
>   means anything but the Ethernet header) and then fixes up the checksums
>   itself, we will edit our diff value twice just to conclude that diff==0.
>   And maybe we are willing to trust root with the ability to lie to the
>   kernel about incoming packets' checksum_complete values.

For cls_bpf programs, we effectively trust the user and I think it makes
sense to do a similar model in XDP as well. If the verifier would perform
such rewrites we basically assume csum complete everywhere, since at
verification time, it's unclear where the resulting fd for XDP will be
attached to wrt netdevice. Also, I think it's probably not so straight
forward to do this efficiently through insn rewrites.

cls_bpf has a couple of helpers like bpf_l3_csum_replace(), bpf_l4_csum_replace()
bpf_csum_diff(), bpf_csum_update(), where we (cilium at least) use checksum
diffs extensively. You can then also leave the option to the user to feed
pre-computed diffs from a map, etc, or, to just not care at all when not
necessary. Pushing this via return code seems a bit odd, perhaps a xdp->csum
member could be populated before calling into the program and the resulting
xdp->csum processed further upon exit.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 15:55 ` Edward Cree
  2017-04-11 16:43   ` Daniel Borkmann
@ 2017-04-11 16:46   ` Tom Herbert
  2017-04-11 17:13     ` Edward Cree
  1 sibling, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2017-04-11 16:46 UTC (permalink / raw)
  To: Edward Cree; +Cc: Linux Kernel Network Developers, Alexei Starovoitov

On Tue, Apr 11, 2017 at 8:55 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 10/04/17 19:26, Tom Herbert wrote:
>> Not having checksum offload in XDP is going to get more painful once
>> we start seeing a lot programs doing packet modifications. One nice
>> thing we do for ILA router is pre-compute the checksum delta necessary
>> to maintain checksum neutral property in the packet. So that after
>> doing ILA routing in XDP the checksum complete value is still valid as
>> is the transport layer checksum.
>>
>> It's conceivable we could generalize this by having a u16 checksum
>> delta returned from XDP program. If the checksum diff can be
>> pre-computed in a structure for doing the translation, then there
>> should be little cost other than making API a little more complex. On
>> return the checksum_complete value is updated jusy by adding in the
>> diff value.
>>
>> Tom
>
> AIUI you're suggesting to have the user's BPF program do this calculation  and (somehow) feed the diff back to the caller.  As well as adding work
>  for the XDP program writer, it also means trusting they got it right.
> My suggestion was that the eBPF verifier could insert instructions around
>  every write that update the diff value automagically, and not allow the
>  user's program to touch it directly.
> There would be some cleverness required, for instance to determine which
>  byte of the checksum to add to (and thus sometimes shift the byte diff
>  by 8 before adding it in) - which might require a runtime check when the
>  load is indexed.  Or might not, if the verifier sees something like
>  packet[ETH_HLEN + (ihl << 2)] and can deduce the low bit of the offset.
> For the simple case, we would translate "write byte to packet+1" into:
>         diff -= packet[1];
>         write byte to packet+1;
>         diff += packet[1];
> Alexei, does this sound reasonable?
>
> The counter-argument, of course, is that if the XDP program edits fields
>  that are protected by an Internet checksum (which in practice usually
>  means anything but the Ethernet header) and then fixes up the checksums
>  itself, we will edit our diff value twice just to conclude that diff==0.
>  And maybe we are willing to trust root with the ability to lie to the
>  kernel about incoming packets' checksum_complete values.
>
Works for modifying IPv4 or when we are expressly doing a checksum
neutral operation, but that doesn't cover the general case. Suppose we
simply want to pop an IPv6 header in IPv6/IPv6 encpasulation (there is
some push in IETF for expanded use of IPIP tunneling with IPv6 to deal
with the inserting EH into a packet problem). In that case the bits
being chucked won't sum to zero and they're probably not also the same
so the trick for ILA won't work so we need something to rectify the
checksum if we're doing XDP_PASS for instance.

As far as trust is concerned I don't think that is a major issue. The
XDP program can already modify the packet in arbitrary ways so if the
checksum is messed up this likely results in unnecessarily dropping
packets that actually a good checksum (as opposed to allowing packets
with bad checksum to pass). Note that this only applies to
checksum_complete, if we were to allow XDP program to return
checksum_unnecessary for instance then it's more a leap of faith that
things are always correct.

Tom

> -Ed
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 16:46   ` Tom Herbert
@ 2017-04-11 17:13     ` Edward Cree
  2017-04-11 17:26       ` Tom Herbert
  2017-04-11 18:43       ` Jakub Kicinski
  0 siblings, 2 replies; 9+ messages in thread
From: Edward Cree @ 2017-04-11 17:13 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers, Alexei Starovoitov

On 11/04/17 17:46, Tom Herbert wrote:
> On Tue, Apr 11, 2017 at 8:55 AM, Edward Cree <ecree@solarflare.com> wrote:
>> The counter-argument, of course, is that if the XDP program edits fields
>>  that are protected by an Internet checksum (which in practice usually
>>  means anything but the Ethernet header) and then fixes up the checksums
>>  itself, we will edit our diff value twice just to conclude that diff==0.
>>  And maybe we are willing to trust root with the ability to lie to the
>>  kernel about incoming packets' checksum_complete values.
>>
> Works for modifying IPv4 or when we are expressly doing a checksum
> neutral operation, but that doesn't cover the general case. Suppose we
> simply want to pop an IPv6 header in IPv6/IPv6 encpasulation (there is
> some push in IETF for expanded use of IPIP tunneling with IPv6 to deal
> with the inserting EH into a packet problem). In that case the bits
> being chucked won't sum to zero and they're probably not also the same
> so the trick for ILA won't work so we need something to rectify the
> checksum if we're doing XDP_PASS for instance.
Oh, I agree that if we don't do the rewriting we have to have some way
 for XDP to output a diff.  I agree with Daniel that a helper function
 seems the obvious way to get this information out.  My "counter-
 argument" was about "user does it" vs. "verifier does it".
> As far as trust is concerned I don't think that is a major issue. The
> XDP program can already modify the packet in arbitrary ways so if the
> checksum is messed up this likely results in unnecessarily dropping
> packets that actually a good checksum (as opposed to allowing packets
> with bad checksum to pass).
I wasn't thinking so much of trust in the security sense (I agree that's
 not much of an argument since XDP programs are privileged already), but
 rather as a reliability thing; it maybe being an easy thing for root to
 get wrong ("hey, this bit only edits Ethernet headers so I don't have
 to fix any checksums, job done.  Why is my network broken?")
So I think we'd have to assume that any program that _doesn't_ call the
 helper function has invalidated any checksum_complete value we had; if
 all the program's changes were balanced, it can call update_csum(0).
> Note that this only applies to
> checksum_complete, if we were to allow XDP program to return
> checksum_unnecessary for instance then it's more a leap of faith that
> things are always correct.
Speaking of checksum_unnecessary, it might still be useful for the
 verifier to tell us whether the program contains any writes, since if
 so any drivers using checksum_unnecessary will have to clear it when
 calling XDP or else do conversion to checksum_complete beforehand.  (We
 at sfc will probably take the latter path.)  But this can be elided if
 the XDP program is known not to do writes or header adjustments,
 potentially speeding things up for the pure drop case.
(Also we wouldn't require such a program to call update_csum for the
 checksum_complete case, of course.)

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 17:13     ` Edward Cree
@ 2017-04-11 17:26       ` Tom Herbert
  2017-04-11 18:43       ` Jakub Kicinski
  1 sibling, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2017-04-11 17:26 UTC (permalink / raw)
  To: Edward Cree; +Cc: Linux Kernel Network Developers, Alexei Starovoitov

On Tue, Apr 11, 2017 at 10:13 AM, Edward Cree <ecree@solarflare.com> wrote:
> On 11/04/17 17:46, Tom Herbert wrote:
>> On Tue, Apr 11, 2017 at 8:55 AM, Edward Cree <ecree@solarflare.com> wrote:
>>> The counter-argument, of course, is that if the XDP program edits fields
>>>  that are protected by an Internet checksum (which in practice usually
>>>  means anything but the Ethernet header) and then fixes up the checksums
>>>  itself, we will edit our diff value twice just to conclude that diff==0.
>>>  And maybe we are willing to trust root with the ability to lie to the
>>>  kernel about incoming packets' checksum_complete values.
>>>
>> Works for modifying IPv4 or when we are expressly doing a checksum
>> neutral operation, but that doesn't cover the general case. Suppose we
>> simply want to pop an IPv6 header in IPv6/IPv6 encpasulation (there is
>> some push in IETF for expanded use of IPIP tunneling with IPv6 to deal
>> with the inserting EH into a packet problem). In that case the bits
>> being chucked won't sum to zero and they're probably not also the same
>> so the trick for ILA won't work so we need something to rectify the
>> checksum if we're doing XDP_PASS for instance.
> Oh, I agree that if we don't do the rewriting we have to have some way
>  for XDP to output a diff.  I agree with Daniel that a helper function
>  seems the obvious way to get this information out.  My "counter-
>  argument" was about "user does it" vs. "verifier does it".
>> As far as trust is concerned I don't think that is a major issue. The
>> XDP program can already modify the packet in arbitrary ways so if the
>> checksum is messed up this likely results in unnecessarily dropping
>> packets that actually a good checksum (as opposed to allowing packets
>> with bad checksum to pass).
> I wasn't thinking so much of trust in the security sense (I agree that's
>  not much of an argument since XDP programs are privileged already), but
>  rather as a reliability thing; it maybe being an easy thing for root to
>  get wrong ("hey, this bit only edits Ethernet headers so I don't have
>  to fix any checksums, job done.  Why is my network broken?")
> So I think we'd have to assume that any program that _doesn't_ call the
>  helper function has invalidated any checksum_complete value we had; if
>  all the program's changes were balanced, it can call update_csum(0).
>> Note that this only applies to
>> checksum_complete, if we were to allow XDP program to return
>> checksum_unnecessary for instance then it's more a leap of faith that
>> things are always correct.
> Speaking of checksum_unnecessary, it might still be useful for the
>  verifier to tell us whether the program contains any writes, since if
>  so any drivers using checksum_unnecessary will have to clear it when
>  calling XDP or else do conversion to checksum_complete beforehand.  (We
>  at sfc will probably take the latter path.)  But this can be elided if
>  the XDP program is known not to do writes or header adjustments,
>  potentially speeding things up for the pure drop case.
> (Also we wouldn't require such a program to call update_csum for the
>  checksum_complete case, of course.)
>
I don't think we need to do anything special for checksum_unnecessary
other than what we need for checksum_complete. For instance, if we do
return the checksum diff from XDP program then a non-zero value would
be sufficient to invalidate the checksum_unnecessary. Of course, it's
likely this is overkill since in many cases the changes wouldn't
affect checksum_unnecessary setting, but trying to unravel that to
preserve checksum_unnecessary would be a mess. checksum_unnecessary is
considered deprecated anyway!

Tom

> -Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 16:43   ` Daniel Borkmann
@ 2017-04-11 18:34     ` Jakub Kicinski
  2017-04-11 19:00       ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2017-04-11 18:34 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Edward Cree, Tom Herbert, Linux Kernel Network Developers,
	Alexei Starovoitov

On Tue, 11 Apr 2017 18:43:47 +0200, Daniel Borkmann wrote:
> cls_bpf has a couple of helpers like bpf_l3_csum_replace(), bpf_l4_csum_replace()
> bpf_csum_diff(), bpf_csum_update(), where we (cilium at least) use checksum
> diffs extensively. You can then also leave the option to the user to feed
> pre-computed diffs from a map, etc, or, to just not care at all when not
> necessary. Pushing this via return code seems a bit odd, perhaps a xdp->csum
> member could be populated before calling into the program and the resulting
> xdp->csum processed further upon exit.

+1 on exposing xdp->csum.  I think having the csum complete available
in the eBPF program (rather than diff) may be useful for apps doing
encap to a csumed tunnel (e.g. UDP).  If the header is pre-calculated
in the map, having csum complete makes calculating the outer csum
extremely easy, right?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 17:13     ` Edward Cree
  2017-04-11 17:26       ` Tom Herbert
@ 2017-04-11 18:43       ` Jakub Kicinski
  1 sibling, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2017-04-11 18:43 UTC (permalink / raw)
  To: Edward Cree
  Cc: Tom Herbert, Linux Kernel Network Developers, Alexei Starovoitov

On Tue, 11 Apr 2017 18:13:37 +0100, Edward Cree wrote:
> > Note that this only applies to
> > checksum_complete, if we were to allow XDP program to return
> > checksum_unnecessary for instance then it's more a leap of faith that
> > things are always correct.  
> Speaking of checksum_unnecessary, it might still be useful for the
>  verifier to tell us whether the program contains any writes, since if
>  so any drivers using checksum_unnecessary will have to clear it when
>  calling XDP or else do conversion to checksum_complete beforehand.

If we exposed csum_complete as a value in xdp buffer, we could make the
unnecessary -> complete conversion dependent on accesses to that field
rather than writes and header adjustments.  If program doesn't read or
write the csum it doesn't care about the csum.

And make invalidation of unnecessary dependent on writes and header
adjustments.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: RFC: Checksum offload and XDP
  2017-04-11 18:34     ` Jakub Kicinski
@ 2017-04-11 19:00       ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2017-04-11 19:00 UTC (permalink / raw)
  To: kubakici; +Cc: daniel, ecree, tom, netdev, ast

From: Jakub Kicinski <kubakici@wp.pl>
Date: Tue, 11 Apr 2017 11:34:25 -0700

> On Tue, 11 Apr 2017 18:43:47 +0200, Daniel Borkmann wrote:
>> cls_bpf has a couple of helpers like bpf_l3_csum_replace(), bpf_l4_csum_replace()
>> bpf_csum_diff(), bpf_csum_update(), where we (cilium at least) use checksum
>> diffs extensively. You can then also leave the option to the user to feed
>> pre-computed diffs from a map, etc, or, to just not care at all when not
>> necessary. Pushing this via return code seems a bit odd, perhaps a xdp->csum
>> member could be populated before calling into the program and the resulting
>> xdp->csum processed further upon exit.
> 
> +1 on exposing xdp->csum.  I think having the csum complete available
> in the eBPF program (rather than diff) may be useful for apps doing
> encap to a csumed tunnel (e.g. UDP).  If the header is pre-calculated
> in the map, having csum complete makes calculating the outer csum
> extremely easy, right?

The outer checksum is constant because when the inner checksum is
installed the inner frame checksums to zero.

I really want to see how people are actually changing fields in XDP
that requires special checksum facilities, that isn't already handled
using the csum helpers Daniel mentioned.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-04-11 19:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-10 18:26 RFC: Checksum offload and XDP Tom Herbert
2017-04-11 15:55 ` Edward Cree
2017-04-11 16:43   ` Daniel Borkmann
2017-04-11 18:34     ` Jakub Kicinski
2017-04-11 19:00       ` David Miller
2017-04-11 16:46   ` Tom Herbert
2017-04-11 17:13     ` Edward Cree
2017-04-11 17:26       ` Tom Herbert
2017-04-11 18:43       ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.