All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ansis Atteka <ansisatteka@gmail.com>
To: Steffen Klassert <steffen.klassert@secunet.com>,
	netdev@vger.kernel.org, Ansis Atteka <aatteka@ovn.org>
Subject: Re: [PATCH net] xfrm: calculate L4 checksums also for GSO case before encrypting packets
Date: Thu, 13 Apr 2017 19:54:35 -0700	[thread overview]
Message-ID: <CAMQa7Bhr3VTSzRucCX_nJEFTuO7nFpV+g_DD_qekvJm5ZMxdBA@mail.gmail.com> (raw)
In-Reply-To: <CAMQa7Biajree5Kc1fOWQN42R1UDDGp7ZevZZRtUMZOKDWTk-Vw@mail.gmail.com>

On 13 April 2017 at 19:45, Ansis Atteka <ansisatteka@gmail.com> wrote:
>
>
>
> On 11 April 2017 at 00:07, Steffen Klassert <steffen.klassert@secunet.com> wrote:
>>
>> On Mon, Apr 10, 2017 at 11:42:07AM -0700, Ansis Atteka wrote:
>> > Otherwise, if L4 checksum calculation is done after encryption,
>> > then all ESP packets end up being corrupted at the location
>> > where pre-encryption L4 checksum field resides.
>> >
>> > One of the ways to reproduce this bug is to have a VM with virtio_net
>> > driver (UFO set to ON in the guest VM); and then encapsulate all guest's
>> > Ethernet frames in GENEVE; and then further encrypt GENEVE with IPsec.
>> > In this case following symptoms are observed:
>> > 1. If using ixgbe NIC, then the driver will also emit following
>> >    warning message:
>> >    ixgbe 0000:01:00.1: partial checksum but l4 proto=32!
>> > 2. Receiving VM will drop all the corrupted ESP packets, hence UDP iperf test
>> >    with large packets will fail completely or TCP iperf will get ridiculously
>> >    low performance because TCP window will never grow above MTU.
>> >
>> > Signed-off-by: Ansis Atteka <aatteka@ovn.org>
>> > ---
>> >  net/xfrm/xfrm_output.c | 19 +++++++++++++------
>> >  1 file changed, 13 insertions(+), 6 deletions(-)
>> >
>> > diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
>> > index 8ba29fe..7ad7e5f 100644
>> > --- a/net/xfrm/xfrm_output.c
>> > +++ b/net/xfrm/xfrm_output.c
>> > @@ -168,7 +168,8 @@ static int xfrm_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
>> >
>> >  static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb)
>> >  {
>> > -     struct sk_buff *segs;
>> > +     struct sk_buff *segs, *nskb;
>> > +     int err;
>> >
>> >       BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
>> >       BUILD_BUG_ON(sizeof(*IP6CB(skb)) > SKB_SGO_CB_OFFSET);
>> > @@ -180,21 +181,27 @@ static int xfrm_output_gso(struct net *net, struct sock *sk, struct sk_buff *skb
>> >               return -EINVAL;
>> >
>> >       do {
>> > -             struct sk_buff *nskb = segs->next;
>> > -             int err;
>> > +             nskb = segs->next;
>> >
>> >               segs->next = NULL;
>> > -             err = xfrm_output2(net, sk, segs);
>> > +             err = skb_checksum_help(segs);
>>
>> What's wrong with the checksum provided by the GSO layer and
>> why we have to do this unconditionally here?

I believe with "GSO layer" you meant the skb_gso_segment() function
invocation from xfrm_output_gso()?

If so, then the problem with that is that the list of the skb's
returned by that function could be in CHECKSUM_PARTIAL state, if skbs
came from a UDP tunnel such as Geneve:

   xfrm_output() {
     __skb_gso_segment() {
       skb_mac_gso_segment() {
         skb_network_protocol();
         inet_gso_segment() {
           udp4_ufo_fragment() {
             skb_udp_tunnel_segment() {
               skb_mac_gso_segment() {
                 skb_network_protocol();
                 inet_gso_segment() {
                   udp4_ufo_fragment() {
                     skb_checksum() {
                       __skb_checksum() {
                         csum_partial() {
                           do_csum();
                         }
                         csum_partial() {
                           do_csum();
                         }
                       }


Since those skbs could remain in CHECKSUM_PARTIAL state even after
IPsec encryption, then ixgbe tries to calculate L4 checksums on
already encrypted skb where L4 layer is already protected through
IPsec integrity checks. Hence, ESP packets end up being corrupted and
dropped on receive side by XFRM. I clearly see this ESP packet
corruption happening by observing:
1. in wireshark that the same ESP packet differs at the offset where
UDP checksum field should reside; AND
2. in dmesg that ixgbe driver complains on send side with "partial
checksum but L4 proto is 0x32 (ESP)". AND
3. in /proc/net/xfrm_stat where XfrmInStateProtoErrorcounter is
incremented on receive side each time it receives corrupted packet.


>>
>>
>> We don't announce any checksum capabilities, so the GSO
>> layer should provide the checksum. If this is not the case,
>> something along the path is taking wrong assumptions.
>
>
The same explicit checksum calculation is done from xfrm_output() for
non-GSO case, so it was tempting for me to simply put a similar
skb_checksum_help() for GSO case as well.
>
>> Btw. all GSO packets on a standard IPv4 xfrm tunnel are getting
>> dropped with your patch applied.
>>
I think I just noticed possible issue with my patch that I sent out.
In your setup were packets getting dropped on receive side due to UDP
checksum failure (and not IPsec integrity check failure)? If so then I
wonder if after my patch applied skb_checksum_help() was called twice
under conditions that you tested for. Hence the skbs ended up with
wrong checksums.

So, would you mind to give another spin for my patch after applying
this small amendment that calls skb_checkum_help() only in
CHECKSUM_PARTIAL case:

diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 7ad7e5f..a870164 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -184,10 +184,12 @@ static int xfrm_output_gso(struct net *net,
struct sock *sk, struct sk_buff *skb
                nskb = segs->next;

                segs->next = NULL;
-               err = skb_checksum_help(segs);
-               if (unlikely(err)) {
-                       XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
-                       goto error;
+               if (skb->ip_summed == CHECKSUM_PARTIAL) {
+                       err = skb_checksum_help(segs);
+                       if (unlikely(err)) {
+                               XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
+                               goto error;
+                       }
                }

                err = xfrm_output2(net, sk, segs);

  parent reply	other threads:[~2017-04-14  2:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-10 18:42 [PATCH net] xfrm: calculate L4 checksums also for GSO case before encrypting packets Ansis Atteka
2017-04-11  7:07 ` Steffen Klassert
     [not found]   ` <CAMQa7Biajree5Kc1fOWQN42R1UDDGp7ZevZZRtUMZOKDWTk-Vw@mail.gmail.com>
2017-04-14  2:54     ` Ansis Atteka [this message]
2017-04-18  9:09     ` Steffen Klassert
2017-04-19  2:10       ` Ansis Atteka
2017-04-20  9:47         ` Steffen Klassert
2017-04-21 21:45           ` Ansis Atteka
2017-04-27  9:04             ` Steffen Klassert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMQa7Bhr3VTSzRucCX_nJEFTuO7nFpV+g_DD_qekvJm5ZMxdBA@mail.gmail.com \
    --to=ansisatteka@gmail.com \
    --cc=aatteka@ovn.org \
    --cc=netdev@vger.kernel.org \
    --cc=steffen.klassert@secunet.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.