All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ed Swierk <eswierk@skyportsystems.com>
To: Pravin Shelar <pshelar@ovn.org>
Cc: ovs-dev <ovs-dev@openvswitch.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Benjamin Warren <ben@skyportsystems.com>,
	Keith Holleman <holleman@skyportsystems.com>
Subject: Re: [PATCH v2] openvswitch: Trim off padding before L3+ netfilter processing
Date: Fri, 5 Jan 2018 10:14:05 -0800	[thread overview]
Message-ID: <CAO_EM_kFtsEBbVS9T00skKCpO0C72uDh=KQa8XWqS_8hSUfzVg@mail.gmail.com> (raw)
In-Reply-To: <CAOrHB_D1mj_0O+_e3+GxxcTiuk0B5w5cc6qf90jM+fGiw8oAhw@mail.gmail.com>

On Thu, Jan 4, 2018 at 7:36 PM, Pravin Shelar <pshelar@ovn.org> wrote:
> On Wed, Jan 3, 2018 at 7:49 PM, Ed Swierk <eswierk@skyportsystems.com> wrote:
>> On Fri, Dec 22, 2017 at 3:31 PM, Pravin Shelar <pshelar@ovn.org> wrote:
>>> On Thu, Dec 21, 2017 at 7:17 AM, Ed Swierk <eswierk@skyportsystems.com> wrote:
>>>> IPv4 and IPv6 packets may arrive with lower-layer padding that is not
>>>> included in the L3 length. For example, a short IPv4 packet may have
>>>> up to 6 bytes of padding following the IP payload when received on an
>>>> Ethernet device. In the normal IPv4 receive path, ip_rcv() trims the
>>>> packet to ip_hdr->tot_len before invoking netfilter hooks (including
>>>> conntrack and nat).
>>>>
>>>> In the IPv6 receive path, ip6_rcv() does the same using
>>>> ipv6_hdr->payload_len. Similarly in the br_netfilter receive path,
>>>> br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3
>>>> length before invoking NF_INET_PRE_ROUTING hooks.
>>>>
>>>> In the OVS conntrack receive path, ovs_ct_execute() pulls the skb to
>>>> the L3 header but does not trim it to the L3 length before calling
>>>> nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp
>>>> encounters a packet with lower-layer padding, nf_checksum() fails and
>>>> logs "nf_ct_tcp: bad TCP checksum". While extra zero bytes don't
>>>> affect the checksum, the length in the IP pseudoheader does. That
>>>> length is based on skb->len, and without trimming, it doesn't match
>>>> the length the sender used when computing the checksum.
>>>>
>>>> The assumption throughout nf_conntrack and nf_nat is that skb->len
>>>> reflects the length of the L3 header and payload, so there is no need
>>>> to refer back to ip_hdr->tot_len or ipv6_hdr->payload_len.
>>>>
>>>> This change brings OVS into line with other netfilter users, trimming
>>>> IPv4 and IPv6 packets prior to L3+ netfilter processing.
>>>>
>>>> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
>>>> ---
>>>> v2:
>>>> - Trim packet in nat receive path as well as conntrack
>>>> - Free skb on error
>>>> ---
>>>>  net/openvswitch/conntrack.c | 34 ++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 34 insertions(+)
>>>>
>>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>>>> index b27c5c6..1bdc78f 100644
>>>> --- a/net/openvswitch/conntrack.c
>>>> +++ b/net/openvswitch/conntrack.c
>>>> @@ -703,6 +703,33 @@ static bool skb_nfct_cached(struct net *net,
>>>>         return ct_executed;
>>>>  }
>>>>
>>>> +/* Trim the skb to the L3 length. Assumes the skb is already pulled to
>>>> + * the L3 header. The skb is freed on error.
>>>> + */
>>>> +static int skb_trim_l3(struct sk_buff *skb)
>>>> +{
>>>> +       unsigned int nh_len;
>>>> +       int err;
>>>> +
>>>> +       switch (skb->protocol) {
>>>> +       case htons(ETH_P_IP):
>>>> +               nh_len = ntohs(ip_hdr(skb)->tot_len);
>>>> +               break;
>>>> +       case htons(ETH_P_IPV6):
>>>> +               nh_len = ntohs(ipv6_hdr(skb)->payload_len)
>>>> +                       + sizeof(struct ipv6hdr);
>>>> +               break;
>>>> +       default:
>>>> +               nh_len = skb->len;
>>>> +       }
>>>> +
>>>> +       err = pskb_trim_rcsum(skb, nh_len);
>>>> +       if (err)
>>> This should is unlikely.
>>>> +               kfree_skb(skb);
>>>> +
>>>> +       return err;
>>>> +}
>>>> +
>>> This looks like a generic function, it probably does not belong to OVS
>>> code base.
>>
>> It occurs to me that skb_trim_l3() can't just reach into ip_hdr(skb)
>> before calling pskb_may_pull(skb, sizeof(struct iphdr)) to make sure
>> the IP header is actually there; and for IPv4 it should validate the
>> IP header checksum, including options. Once we add all these steps,
>> skb_trim_l3() starts to look an awful lot like br_validate_ipv4() and
>> br_validate_ipv6(). And those in turn are eerily similar to ip_rcv()
>> and ip6_rcv(). It would be nice to avoid duplicating this logic yet
>> again.
>>
> OVS already pull all required headers in skb linear data, so no need
> to redo all of it. only check required is the ip-checksum validation.
> I think we could avoid it in most of cases by checking skb length to
> ipheader length before verifying the ip header-checksum.

Shouldn't the IP header checksum be verified even earlier, like in
key_extract(), before actually using any of the fields in the IP
header?

And since key_extract() is already inspecting the IP/IPv6 header, it
would be a convenient spot to check whether the skb->len matches. If
there's a difference, it could record the number of bytes to trim in
an ovs_skb_cb field. Then ovs_ct_execute() would look at this field
and trim the skb only if necessary.

  reply	other threads:[~2018-01-05 18:14 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-12 16:17 [PATCH] openvswitch: Trim off padding before L3 conntrack processing Ed Swierk
2017-12-14  0:58 ` Pravin Shelar
2017-12-14 20:05   ` Ed Swierk
2017-12-17 19:22     ` Pravin Shelar
2017-12-17 19:22     ` Pravin Shelar
2017-12-21 15:17 ` [PATCH v2] openvswitch: Trim off padding before L3+ netfilter processing Ed Swierk
2017-12-22 23:31   ` Pravin Shelar
2017-12-22 23:31   ` Pravin Shelar
2017-12-23  0:39     ` Ed Swierk
2018-01-03  6:21       ` Pravin Shelar
2018-01-03  6:21       ` Pravin Shelar
2017-12-23  0:39     ` Ed Swierk
2018-01-04  3:49     ` Ed Swierk
2018-01-05  3:36       ` Pravin Shelar
2018-01-05  3:36       ` Pravin Shelar
2018-01-05 18:14         ` Ed Swierk [this message]
     [not found]           ` <CAO_EM_mQgURXZNtW7Qw7OkW4rjp4JWKBmqS8e4pUR=ZuiGCcZQ@mail.gmail.com>
2018-01-06  6:17             ` Pravin Shelar
2018-01-06  6:17             ` Pravin Shelar
     [not found]               ` <CAO_EM_=2qt3zSW1xprkLvcQVKGRTFMUQxCc4-cVLsUcRLj63Hg@mail.gmail.com>
2018-01-06 18:57                 ` Pravin Shelar
2018-01-09  0:05                   ` Pravin Shelar
2018-01-09  3:02                   ` Ed Swierk
2018-01-09  3:02                   ` Ed Swierk
2018-01-09 22:06                     ` Pravin Shelar
2018-01-05 18:14         ` Ed Swierk
2017-12-21 15:21 ` [PATCH v2 RESEND] " Ed Swierk
2017-12-21 15:21 ` Ed Swierk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAO_EM_kFtsEBbVS9T00skKCpO0C72uDh=KQa8XWqS_8hSUfzVg@mail.gmail.com' \
    --to=eswierk@skyportsystems.com \
    --cc=ben@skyportsystems.com \
    --cc=holleman@skyportsystems.com \
    --cc=netdev@vger.kernel.org \
    --cc=ovs-dev@openvswitch.org \
    --cc=pshelar@ovn.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.