From: Pravin Shelar <pshelar@ovn.org>
To: Ed Swierk <eswierk@skyportsystems.com>
Cc: ovs-dev <ovs-dev@openvswitch.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Benjamin Warren <ben@skyportsystems.com>,
	Keith Holleman <holleman@skyportsystems.com>
Subject: Re: [PATCH v2] openvswitch: Trim off padding before L3+ netfilter processing
Date: Fri, 22 Dec 2017 15:31:19 -0800
Message-ID: <CAOrHB_D_xUoynncs_W3EcKtpc2yyYuvUgAg1G=7GNJvgnFt=ow__2975.00752280957$1513985443$gmane$org@mail.gmail.com>
In-Reply-To: <1513869437-20059-1-git-send-email-eswierk@skyportsystems.com>

On Thu, Dec 21, 2017 at 7:17 AM, Ed Swierk <eswierk@skyportsystems.com> wrote:
> IPv4 and IPv6 packets may arrive with lower-layer padding that is not
> included in the L3 length. For example, a short IPv4 packet may have
> up to 6 bytes of padding following the IP payload when received on an
> Ethernet device. In the normal IPv4 receive path, ip_rcv() trims the
> packet to ip_hdr->tot_len before invoking netfilter hooks (including
> conntrack and nat).
>
> In the IPv6 receive path, ip6_rcv() does the same using
> ipv6_hdr->payload_len. Similarly in the br_netfilter receive path,
> br_validate_ipv4() and br_validate_ipv6() trim the packet to the L3
> length before invoking NF_INET_PRE_ROUTING hooks.
>
> In the OVS conntrack receive path, ovs_ct_execute() pulls the skb to
> the L3 header but does not trim it to the L3 length before calling
> nf_conntrack_in(NF_INET_PRE_ROUTING). When nf_conntrack_proto_tcp
> encounters a packet with lower-layer padding, nf_checksum() fails and
> logs "nf_ct_tcp: bad TCP checksum". While extra zero bytes don't
> affect the checksum, the length in the IP pseudoheader does. That
> length is based on skb->len, and without trimming, it doesn't match
> the length the sender used when computing the checksum.
>
> The assumption throughout nf_conntrack and nf_nat is that skb->len
> reflects the length of the L3 header and payload, so there is no need
> to refer back to ip_hdr->tot_len or ipv6_hdr->payload_len.
>
> This change brings OVS into line with other netfilter users, trimming
> IPv4 and IPv6 packets prior to L3+ netfilter processing.
>
> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
> ---
> v2:
> - Trim packet in nat receive path as well as conntrack
> - Free skb on error
> ---
>  net/openvswitch/conntrack.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
>
> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> index b27c5c6..1bdc78f 100644
> --- a/net/openvswitch/conntrack.c
> +++ b/net/openvswitch/conntrack.c
> @@ -703,6 +703,33 @@ static bool skb_nfct_cached(struct net *net,
>         return ct_executed;
>  }
>
> +/* Trim the skb to the L3 length. Assumes the skb is already pulled to
> + * the L3 header. The skb is freed on error.
> + */
> +static int skb_trim_l3(struct sk_buff *skb)
> +{
> +       unsigned int nh_len;
> +       int err;
> +
> +       switch (skb->protocol) {
> +       case htons(ETH_P_IP):
> +               nh_len = ntohs(ip_hdr(skb)->tot_len);
> +               break;
> +       case htons(ETH_P_IPV6):
> +               nh_len = ntohs(ipv6_hdr(skb)->payload_len)
> +                       + sizeof(struct ipv6hdr);
> +               break;
> +       default:
> +               nh_len = skb->len;
> +       }
> +
> +       err = pskb_trim_rcsum(skb, nh_len);
> +       if (err)
This error check should be marked unlikely().
> +               kfree_skb(skb);
> +
> +       return err;
> +}
> +
This looks like a generic function; it probably does not belong in the
OVS code base.

>  #ifdef CONFIG_NF_NAT_NEEDED
>  /* Modelled after nf_nat_ipv[46]_fn().
>   * range is only used for new, uninitialized NAT state.
> @@ -715,8 +742,12 @@ static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
>  {
>         int hooknum, nh_off, err = NF_ACCEPT;
>
> +       /* The nat module expects to be working at L3. */
>         nh_off = skb_network_offset(skb);
>         skb_pull_rcsum(skb, nh_off);
> +       err = skb_trim_l3(skb);
> +       if (err)
> +               return err;
>
ct-nat is executed within the ct action, so I do not see why you call
skb_trim_l3() again from ovs_ct_nat_execute().
The trim in ovs_ct_execute() should already take care of the skb.

>         /* See HOOK2MANIP(). */
>         if (maniptype == NF_NAT_MANIP_SRC)
> @@ -1111,6 +1142,9 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
>         /* The conntrack module expects to be working at L3. */
>         nh_ofs = skb_network_offset(skb);
>         skb_pull_rcsum(skb, nh_ofs);
> +       err = skb_trim_l3(skb);
> +       if (err)
> +               return err;
>
>         if (key->ip.frag != OVS_FRAG_TYPE_NONE) {
>                 err = handle_fragments(net, key, info->zone.id, skb);
> --
> 1.9.1
>
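
Note for readers of this thread: the checksum failure described in the
commit message can be reproduced outside the kernel. The following is a
minimal user-space sketch, not kernel code, and it does not call
nf_checksum() or any skb helper. The function names (csum_add, csum_fold,
tcp_sum, tcp_csum_ok, padding_breaks_checksum), the addresses, and the
ports are all hypothetical and chosen only for illustration. It builds a
20-byte TCP header followed by 6 zero bytes of padding, then verifies the
checksum once with the true segment length and once with the padded
length, standing in for an untrimmed skb->len.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running one's-complement sum, reading it as
 * big-endian 16-bit words. An odd trailing byte is padded with zero. */
static uint32_t csum_add(const uint8_t *data, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return sum;
}

/* Fold the carries of a 32-bit accumulator back into 16 bits. */
static uint16_t csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Folded sum of the IPv4 pseudo-header (addresses, protocol 6, length)
 * plus len bytes of TCP segment. The length argument feeds both the
 * pseudo-header and the data walk, as an untrimmed buffer length would. */
static uint16_t tcp_sum(uint32_t saddr, uint32_t daddr,
			const uint8_t *seg, size_t len)
{
	uint32_t sum = 0;

	sum += (saddr >> 16) + (saddr & 0xffff);
	sum += (daddr >> 16) + (daddr & 0xffff);
	sum += 6;			/* IPPROTO_TCP */
	sum += (uint32_t)len;		/* pseudo-header TCP length */
	return csum_fold(csum_add(seg, len, sum));
}

/* A segment verifies when the sum including its checksum field is 0xffff. */
static int tcp_csum_ok(uint32_t saddr, uint32_t daddr,
		       const uint8_t *seg, size_t len)
{
	return tcp_sum(saddr, daddr, seg, len) == 0xffff;
}

/* Returns 1 when the trimmed segment verifies and the padded one fails. */
int padding_breaks_checksum(void)
{
	const uint32_t saddr = 0xc0a80001;	/* 192.168.0.1, illustrative */
	const uint32_t daddr = 0xc0a80002;	/* 192.168.0.2, illustrative */
	uint8_t frame[26] = {0};		/* 20-byte header + 6 pad bytes */
	uint16_t csum;

	frame[0] = 0x30; frame[1] = 0x39;	/* source port 12345 */
	frame[2] = 0x00; frame[3] = 0x50;	/* destination port 80 */
	frame[12] = 0x50;			/* data offset: 5 words */
	frame[13] = 0x10;			/* ACK flag */

	/* Sender side: checksum over the real 20-byte segment. */
	csum = (uint16_t)~tcp_sum(saddr, daddr, frame, 20);
	frame[16] = (uint8_t)(csum >> 8);
	frame[17] = (uint8_t)(csum & 0xff);

	return tcp_csum_ok(saddr, daddr, frame, 20) &&
	       !tcp_csum_ok(saddr, daddr, frame, 26);
}
```

The zero padding adds nothing to the data sum, so the only difference
between the two verifications is the length placed in the pseudo-header.
That is the mismatch the patch avoids by trimming the skb to the L3
length before conntrack sees it.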
