All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jarno Rajahalme <jarno@ovn.org>
To: Joe Stringer <joe@ovn.org>
Cc: netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next 5/7] openvswitch: Add original direction conntrack tuple to sw_flow_key.
Date: Tue, 7 Feb 2017 21:31:16 -0800	[thread overview]
Message-ID: <DBED96B8-64E3-480B-93B2-1175682FF3B4@ovn.org> (raw)
In-Reply-To: <CAPWQB7Hhm+dVqaOxU-56wV6MmDJw70716ir7cUAnxNnWs_a1LA@mail.gmail.com>


> On Feb 6, 2017, at 11:15 PM, Joe Stringer <joe@ovn.org> wrote:
> 
> On 2 February 2017 at 17:10, Jarno Rajahalme <jarno@ovn.org> wrote:
>> Add the fields of the conntrack original direction 5-tuple to struct
>> sw_flow_key.  The new fields are initially zeroed, and are populated
>> whenever a conntrack action is executed and either finds or generates
>> a conntrack entry.  This means that these fields exist for all packets
>> were not rejected by conntrack as untrackable.
>> 
>> The original tuple fields in the sw_flow_key are filled from the
>> original direction tuple of the conntrack entry relating to the
>> current packet, or from the original direction tuple of the master
>> conntrack entry, if the current conntrack entry has a master.
>> Generally, expected connections of connections having an assigned
>> helper (e.g., FTP), have a master conntrack entry.
>> 
>> The main purpose of the new conntrack original tuple fields is to
>> allow matching on them for policy decision purposes, with the premise
>> that the admissibility of tracked connections reply packets (as well
>> as original direction packets), and both direction packets of any
>> related connections may be based on ACL rules applying to the master
>> connection's original direction 5-tuple.  This also makes it easier to
>> make policy decisions when the actual packet headers might have been
>> transformed by NAT, as the original direction 5-tuple represents the
>> packet headers before any such transformation.
>> 
>> When using the original direction 5-tuple the admissibility of return
>> and/or related packets need not be based on the mere existence of a
>> conntrack entry, allowing separation of admission policy from the
>> established conntrack state.  While existence of a conntrack entry is
>> required for admission of the return or related packets, policy
>> changes can render connections that were initially admitted to be
>> rejected or dropped afterwards.  If the admission of the return and
>> related packets was based on mere conntrack state (e.g., connection
>> being in an established state), a policy change that would make the
>> connection rejected or dropped would need to find and delete all
>> conntrack entries affected by such a change.  When using the original
>> direction 5-tuple matching the affected conntrack entries can be
>> allowed to time out instead, as the established state of the
>> connection would not need to be the basis for packet admission any
>> more.
>> 
>> It should be noted that the directionality of related connections may
>> be the same or different than that of the master connection, and
>> neither the original direction 5-tuple nor the conntrack state bits
>> carry this information.  If needed, the directionality of the master
>> connection can be stored in master's conntrack mark or labels, which
>> are automatically inherited by the expected related connections.
>> 
>> The fact that neither ARP not ND packets are trackable by conntrack
>> allows mutual exclusion between ARP/ND and the new conntrack original
>> tuple fields.  Hence, the IP addresses are overlaid in union with ARP
>> and ND fields.  This allows the sw_flow_key to not grow much due to
>> this patch, but it also means that we must be careful to never use the
>> new key fields with ARP or ND packets.  ARP is easy to distinguish and
>> keep mutually exclusive based on the ethernet type, but ND being an
>> ICMPv6 protocol requires a bit more attention.
>> 
>> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
>> ---
> 
> OK, maybe we need to do something a bit more to handle the NATed
> related connections to address the problem in patch 1.
> 
> <snip>
> 
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 738a4fa..1afe153 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -155,6 +155,59 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
>>        key->ct.zone = zone->id;
>>        key->ct.mark = ovs_ct_get_mark(ct);
>>        ovs_ct_get_labels(ct, &key->ct.labels);
>> +
>> +       /* Use the master if we have one. */
>> +       if (ct && ct->master)
>> +               ct = ct->master;
> 
> Perhaps:
> 
> if (!ct || sw_flow_key_is_nd(key) || !is_ip_any(key->eth.type)) {
>    /* zero everything */
>    return;
> }
> 
> One of the things this helps us to avoid is having a comment in the
> middle of an if statement.
> 
> Then afterwards,
> if (ct->master)
>    ct = ct->master;
> 
>> +
>> +       key->ct.orig_proto = 0;
>> +       key->ct.orig_tp.src = 0;
>> +       key->ct.orig_tp.dst = 0;
>> +       if (key->eth.type == htons(ETH_P_IP)) {
>> +               /* IP version must match. */
>> +               if (ct && nf_ct_l3num(ct) == NFPROTO_IPV4) {
> 
> I don't quite understand how we could end up with a connection NFPROTO
> that is mismatched with an IP version that we should handle here, but
> if there are some legitimite cases perhaps we can pick them up and
> handle them in the early exit condition above?
> 

One would be if an IPv4 FTP control connection opened an IPv6 data connection. Our flow key layout assumes the IP versions between master and related connection match. I added a comment to that effect.

> We can probably share a few more lines between IPv4 and IPv6 here.
> 

I refactored all the repeated code away for v2.


>> @@ -208,24 +261,54 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
>>        ovs_ct_update_key(skb, NULL, key, false, false);
>> }
>> 
>> -int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
>> +#define IN6_ADDR_INITIALIZER(ADDR) \
>> +       { (ADDR).s6_addr32[0], (ADDR).s6_addr32[1], \
>> +         (ADDR).s6_addr32[2], (ADDR).s6_addr32[3] }
> 
> scripts/checkpatch.pl asks:
> 
> CHECK: Macro argument reuse 'ADDR' - possible side-effects?
> #704: FILE: net/openvswitch/conntrack.c:264:
> +#define IN6_ADDR_INITIALIZER(ADDR) \
> +       { (ADDR).s6_addr32[0], (ADDR).s6_addr32[1], \
> +         (ADDR).s6_addr32[2], (ADDR).s6_addr32[3] }
> 
> I'm guessing it just doesn't like your use of the name 'ADDR’.

Rather it is recommending caution about ‘ADDR’ being evaluated multiple times, which is no problem here.

> 
>> +       if (swkey->eth.type == htons(ETH_P_IP) && swkey->ct.orig_proto) {
>> +               struct ovs_key_ct_tuple_ipv4 orig = {
>> +                       output->ipv4.ct_orig.src,
>> +                       output->ipv4.ct_orig.dst,
>> +                       output->ct.orig_tp.src,
>> +                       output->ct.orig_tp.dst,
>> +                       output->ct.orig_proto,
>> +               };
>> +               if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, sizeof(orig),
>> +                           &orig))
>> +                       return -EMSGSIZE;
>> +       } else if (swkey->eth.type == htons(ETH_P_IPV6) &&
>> +                  swkey->ct.orig_proto) {
>> +               struct ovs_key_ct_tuple_ipv6 orig = {
>> +                       IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.src),
>> +                       IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
>> +                       output->ct.orig_tp.src,
>> +                       output->ct.orig_tp.dst,
>> +                       output->ct.orig_proto,
>> +               };
>> +               if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, sizeof(orig),
>> +                           &orig))
>> +                       return -EMSGSIZE;
>> +       }
> 
> swkey->ct.orig_proto check is common across both conditions, maybe
> check that first then nest the other logic within?

Done.

> 
>> +
>>        return 0;
>> }
>> 
>> diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
>> index 8f6230b..8573ab3 100644
>> --- a/net/openvswitch/conntrack.h
>> +++ b/net/openvswitch/conntrack.h
>> @@ -32,7 +32,8 @@ int ovs_ct_execute(struct net *, struct sk_buff *, struct sw_flow_key *,
>>                   const struct ovs_conntrack_info *);
>> 
>> void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
>> -int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
>> +int ovs_ct_put_key(const struct sw_flow_key *swkey,
>> +                  const struct sw_flow_key *output, struct sk_buff *skb);
>> void ovs_ct_free_action(const struct nlattr *a);
>> 
>> #define CT_SUPPORTED_MASK (OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | \
>> @@ -79,9 +80,17 @@ static inline void ovs_ct_fill_key(const struct sk_buff *skb,
>>        key->ct.zone = 0;
>>        key->ct.mark = 0;
>>        memset(&key->ct.labels, 0, sizeof(key->ct.labels));
>> +       key->ct.orig_proto = 0;
>> +       key->ct.orig_tp.src = 0;
>> +       key->ct.orig_tp.dst = 0;
>> +       if (key->eth.type == htons(ETH_P_IP))
>> +               memset(&key->ipv4.ct_orig, 0, sizeof(key->ipv4.ct_orig));
>> +       else if (key->eth.type == htons(ETH_P_IPV6) && !sw_flow_key_is_nd(key))
>> +               memset(&key->ipv6.ct_orig, 0, sizeof(key->ipv6.ct_orig));
> 
> If we only ever use these fields based on key->ct.orig_proto being
> nonzero, then we m
> 

All these fields need to be initialized for megaflow matching purposes. We use orig_proto non-zero check to avoid encoding all-zero netlink attribute.

> <snip>
> 
>> +static inline bool sw_flow_key_is_nd(const struct sw_flow_key *key)
>> +{
>> +       return key->eth.type == htons(ETH_P_IPV6) &&
>> +               key->ip.proto == NEXTHDR_ICMP &&
>> +               key->tp.dst == 0 &&
>> +               (key->tp.src == ntohs(NDISC_NEIGHBOUR_SOLICITATION) ||
>> +                key->tp.src == ntohs(NDISC_NEIGHBOUR_ADVERTISEMENT));
> 
> Sparse complains:
> 
> net/openvswitch/flow.h:163:33: warning: cast to restricted __be16
> net/openvswitch/flow.h:163:25: warning: restricted __be16 degrades to integer
> net/openvswitch/flow.h:164:33: warning: cast to restricted __be16
> net/openvswitch/flow.h:164:25: warning: restricted __be16 degrades to integer
> 

Oops, should have used htons() instead.

>> @@ -161,8 +163,11 @@ static bool match_validate(const struct sw_flow_match *match,
>> 
>>        if (match->key->eth.type == htons(ETH_P_IP)) {
>>                key_expected |= 1 << OVS_KEY_ATTR_IPV4;
>> -               if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
>> +               if (match->mask &&
>> +                   (match->mask->key.eth.type == htons(0xffff))) {
> 
> Minor nit, can probably drop the parentheses and keep this to one line?
> 

OK.

>> @@ -1430,6 +1476,10 @@ int ovs_nla_get_match(struct net *net, struct sw_flow_match *match,
>>                        goto free_newmask;
>>        }
>> 
>> +       /* This also checks that conntrack original direction metadata exists
>> +        * only for packets for which it makes sense.  Otherwise the key may be
>> +        * corrupted due to overlapping key fields.
>> +        */
>>        if (!match_validate(match, key_attrs, mask_attrs, log))
>>                err = -EINVAL;
>> 
> 
> I believe that this is already explained in the relevant portion of
> match_validate(), not sure it's worth highlighting that particular
> piece of validation above a call to such a general function.

Removed.

  parent reply	other threads:[~2017-02-08  5:31 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-03  1:10 [PATCH net-next 1/7] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted Jarno Rajahalme
2017-02-03  1:10 ` [PATCH net-next 2/7] openvswitch: Unionize ovs_key_ct_label with a u32 array Jarno Rajahalme
2017-02-03  1:10 ` [PATCH net-next 3/7] openvswitch: Do not trigger events for unconfirmed connection Jarno Rajahalme
2017-02-06 21:46   ` Joe Stringer
2017-02-08  5:30     ` Jarno Rajahalme
2017-02-03  1:10 ` [PATCH net-next 4/7] openvswitch: Inherit master's labels Jarno Rajahalme
2017-02-06 21:53   ` Joe Stringer
2017-02-08  5:31     ` Jarno Rajahalme
2017-02-03  1:10 ` [PATCH net-next 5/7] openvswitch: Add original direction conntrack tuple to sw_flow_key Jarno Rajahalme
2017-02-07  7:15   ` Joe Stringer
2017-02-07 21:38     ` Joe Stringer
2017-02-08  5:31     ` Jarno Rajahalme [this message]
2017-02-03  1:10 ` [PATCH net-next 6/7] openvswitch: Add force commit Jarno Rajahalme
2017-02-06 17:08   ` Pravin Shelar
2017-02-07  7:28     ` Joe Stringer
2017-02-07 17:14       ` Pravin Shelar
2017-02-07 22:15   ` Joe Stringer
     [not found]     ` <5B795D0B-4C7F-4297-BA2A-6BE3444033D0@ovn.org>
2017-02-08  1:32       ` Joe Stringer
2017-02-08  5:31     ` Jarno Rajahalme
2017-02-03  1:10 ` [PATCH net-next 7/7] openvswitch: Pack struct sw_flow_key Jarno Rajahalme
2017-02-07  7:15   ` Joe Stringer
2017-02-08  1:11     ` Jarno Rajahalme
2017-02-08  5:31     ` Jarno Rajahalme
2017-02-05 22:28 ` [PATCH net-next 1/7] openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted David Miller
2017-02-06 17:06   ` Pravin Shelar
2017-02-06 17:15     ` David Miller
2017-02-07 17:14       ` Pravin Shelar
2017-02-07 21:29         ` Jarno Rajahalme
2017-02-07  0:45   ` Joe Stringer
2017-02-06 17:07 ` Pravin Shelar
2017-02-08  5:30   ` Jarno Rajahalme

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DBED96B8-64E3-480B-93B2-1175682FF3B4@ovn.org \
    --to=jarno@ovn.org \
    --cc=joe@ovn.org \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.