All of lore.kernel.org
 help / color / mirror / Atom feed
* [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
@ 2021-03-10  7:51 Louis Peens
  2021-03-10 11:06 ` Ilya Maximets
  0 siblings, 1 reply; 8+ messages in thread
From: Louis Peens @ 2021-03-10  7:51 UTC (permalink / raw)
  To: ovs-dev; +Cc: netdev, Simon Horman, Yinjun Zhang

Hi all

We've recently encountered an interesting situation with OVS conntrack
when offloading to the TC datapath, and would like some feedback. Sorry
about the longish wall of text, but I'm trying to explain the problem
as clearly as possible. The very short summary is that there is a mismatch
in behaviour between the OVS datapath and OVS+TC datapath, and we're
not sure how to resolve this. Here goes:

We have a set of rules looking like this:
ovs-ofctl add-flow br0 "table=0,in_port=p1,ct_state=-trk,ipv6,actions=ct(table=1)"
ovs-ofctl add-flow br0 "table=0,in_port=p2,ct_state=-trk,ipv6,actions=ct(table=1)"
#post_ct flows"
ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+new,ipv6,actions=ct(commit),output:p2"
ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+new,ipv6,actions=ct(commit),output:p1"
ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+est,ipv6,actions=output:p2"
ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+est,ipv6,actions=output:p1"

p1/p2 are the endpoints of two different veth pairs, just to keep this simple.
The rules above work well enough with UDP/TCP traffic, however ICMPv6 packets
(08:56:39.984375 IP6 2001:db8:0:f101::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:db8:0:f101::2, length 32)
breaks this somewhat. With TC offload disabled:

ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false

we get the following datapath rules:

ovs-appctl dpctl/dump-flows --names
recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:drop
recirc_id(0),in_port(p1),ct_state(-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:ct,recirc(0x1)

This part is still fine, we do not have a rule for just matching +trk, so the
the drop rule is to be expected. The problem however is when we enable TC
offload:

ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true

This is the result in the datapath:

ovs-appctl dpctl/dump-flows --names
ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:144, used:0.920s, actions:ct,recirc(0x1)
recirc_id(0x1),in_port(p1),ct_state(-new-est-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:86, used:0.928s, actions:drop
recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:drop

Notice the installation of the two recirc rules, one with -trk and one with +trk,
with the -trk one being the rule that handles all the next packets. Further
investigation reveals that something like the following is happening:

1) The first packet arrives and is handled by the OVS datapath,
   triggering the installation of the two rules like in the non-offloaded
   case. So the recirc_id(0) rule gets installed into tc, and recirc_id(0x1)
   gets installed into the ovs datapath. This bit of code in the OVS module
   makes sure that +trk is set.

    /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
     * previously sent the packet to conntrack via the ct action.....
     * /
   static void ovs_ct_update_key(const struct sk_buff *skb,
                              const struct ovs_conntrack_info *info,
                              struct sw_flow_key *key, bool post_ct,
                              bool keep_nat_flags)
    {
            ...
            ct = nf_ct_get(skb, &ctinfo);
            if (ct) {//tracked
                    ...
            } else if (post_ct) {
                    state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
                    if (info)
                            zone = &info->zone;
            }
            __ovs_ct_update_key(key, state, zone, ct);

    }
    Obviously this is not the case when the packet was sent to conntrack
    via tc.

2) The second packet arrives, and now hits the rule installed in
   TC. However, TC does not handle ICMPv6 (Neighbor Solicitation), and explicitely
   clears the tracked bit (net/netfilter/nf_conntrack_proto_icmpv6.c):

    int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
                                  struct sk_buff *skb,
                                  unsigned int dataoff,
                                  const struct nf_hook_state *state)

    {
    ...
            type = icmp6h->icmp6_type - 130;
            if (type >= 0 && type < sizeof(noct_valid_new) &&
                noct_valid_new[type]) {
                    nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
                    return NF_ACCEPT;
            }
    ...
    }
    (The above code gets triggered a few function calls down from act_ct.c)

3) So now the packet does not hit the +trk rule after the recirc, and leads
   to the installation of the "recirc_id(0x1),..-trk" rule, since +trk wasn't
   set by TC.

This is now the point where we're a bit stuck and is hoping for some ideas
on how to best resolve this. A workaround is of course just to modify the
userspace rules to not send the icmp packets to conntrack and that should
work, but it is a workaround. I think this inconsistency between TC
offload and non-TC is quite undesirable, and could lead to some interesting
results, for instance this was first detected by the observation of packets
getting stuck in a loop in the datapath:

recirc_id(0xe),...ct_state(0/0x20),....,in_port(eth9),eth_type(0x86dd),... ,dp:tc, actions:ct,recirc(0xe)

Where the userspace rule was doing ct to the same table instead of moving to the next table:

ovs-ofctl add-flow br0 "table=0,in_port=eth9,ct_state=-trk,ipv6,actions=ct(table=0)"

So far we've not managed to think of a good way to resolve this in the code.
I don't think changing the kernel behaviour would be desirable, at least
not in that specific function as that is common conntrack code. I suspect
that ideally this is something we can try and address from the OVS side,
but at this moment I have no idea how this will be achieved, hence this
email.

Looking forward to get some suggestions on this

Regards
Louis Peens

PS: Tested on:
net-next kernel:
    d310ec03a34e Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
OVS:
    "cdaa7e0fd dpif-netdev: Fix crash when add dp flow without in_port field."
        +
    "[ovs-dev] [PATCH v3 0/3] Add offload support for ct_state rpl and inv flags"
    (The behaviour before and after the patch series in terms of the problem
     above is the same. Whether the recirc rules end up in the ovs datapath or tc
     datapath doesn't really matter)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
  2021-03-10  7:51 [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6 Louis Peens
@ 2021-03-10 11:06 ` Ilya Maximets
  2021-03-12 22:06   ` Marcelo Leitner
  0 siblings, 1 reply; 8+ messages in thread
From: Ilya Maximets @ 2021-03-10 11:06 UTC (permalink / raw)
  To: Louis Peens, ovs-dev, Marcelo Leitner, Paul Blakey
  Cc: netdev, Yinjun Zhang, Simon Horman, i.maximets

Hi, Louis.  Thanks for your report!

Marcelo, Paul, could you, please, take a look?

Best regards, Ilya Maximets.

On 3/10/21 8:51 AM, Louis Peens wrote:
> Hi all
> 
> We've recently encountered an interesting situation with OVS conntrack
> when offloading to the TC datapath, and would like some feedback. Sorry
> about the longish wall of text, but I'm trying to explain the problem
> as clearly as possible. The very short summary is that there is a mismatch
> in behaviour between the OVS datapath and OVS+TC datapath, and we're
> not sure how to resolve this. Here goes:
> 
> We have a set of rules looking like this:
> ovs-ofctl add-flow br0 "table=0,in_port=p1,ct_state=-trk,ipv6,actions=ct(table=1)"
> ovs-ofctl add-flow br0 "table=0,in_port=p2,ct_state=-trk,ipv6,actions=ct(table=1)"
> #post_ct flows"
> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+new,ipv6,actions=ct(commit),output:p2"
> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+new,ipv6,actions=ct(commit),output:p1"
> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+est,ipv6,actions=output:p2"
> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+est,ipv6,actions=output:p1"
> 
> p1/p2 are the endpoints of two different veth pairs, just to keep this simple.
> The rules above work well enough with UDP/TCP traffic, however ICMPv6 packets
> (08:56:39.984375 IP6 2001:db8:0:f101::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:db8:0:f101::2, length 32)
> breaks this somewhat. With TC offload disabled:
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
> 
> we get the following datapath rules:
> 
> ovs-appctl dpctl/dump-flows --names
> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:drop
> recirc_id(0),in_port(p1),ct_state(-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:ct,recirc(0x1)
> 
> This part is still fine, we do not have a rule for just matching +trk, so the
> the drop rule is to be expected. The problem however is when we enable TC
> offload:
> 
> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true
> 
> This is the result in the datapath:
> 
> ovs-appctl dpctl/dump-flows --names
> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:144, used:0.920s, actions:ct,recirc(0x1)
> recirc_id(0x1),in_port(p1),ct_state(-new-est-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:86, used:0.928s, actions:drop
> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:drop
> 
> Notice the installation of the two recirc rules, one with -trk and one with +trk,
> with the -trk one being the rule that handles all the next packets. Further
> investigation reveals that something like the following is happening:
> 
> 1) The first packet arrives and is handled by the OVS datapath,
>    triggering the installation of the two rules like in the non-offloaded
>    case. So the recirc_id(0) rule gets installed into tc, and recirc_id(0x1)
>    gets installed into the ovs datapath. This bit of code in the OVS module
>    makes sure that +trk is set.
> 
>     /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
>      * previously sent the packet to conntrack via the ct action.....
>      * /
>    static void ovs_ct_update_key(const struct sk_buff *skb,
>                               const struct ovs_conntrack_info *info,
>                               struct sw_flow_key *key, bool post_ct,
>                               bool keep_nat_flags)
>     {
>             ...
>             ct = nf_ct_get(skb, &ctinfo);
>             if (ct) {//tracked
>                     ...
>             } else if (post_ct) {
>                     state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
>                     if (info)
>                             zone = &info->zone;
>             }
>             __ovs_ct_update_key(key, state, zone, ct);
> 
>     }
>     Obviously this is not the case when the packet was sent to conntrack
>     via tc.
> 
> 2) The second packet arrives, and now hits the rule installed in
>    TC. However, TC does not handle ICMPv6 (Neighbor Solicitation), and explicitely
>    clears the tracked bit (net/netfilter/nf_conntrack_proto_icmpv6.c):
> 
>     int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
>                                   struct sk_buff *skb,
>                                   unsigned int dataoff,
>                                   const struct nf_hook_state *state)
> 
>     {
>     ...
>             type = icmp6h->icmp6_type - 130;
>             if (type >= 0 && type < sizeof(noct_valid_new) &&
>                 noct_valid_new[type]) {
>                     nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
>                     return NF_ACCEPT;
>             }
>     ...
>     }
>     (The above code gets triggered a few function calls down from act_ct.c)
> 
> 3) So now the packet does not hit the +trk rule after the recirc, and leads
>    to the installation of the "recirc_id(0x1),..-trk" rule, since +trk wasn't
>    set by TC.
> 
> This is now the point where we're a bit stuck and is hoping for some ideas
> on how to best resolve this. A workaround is of course just to modify the
> userspace rules to not send the icmp packets to conntrack and that should
> work, but it is a workaround. I think this inconsistency between TC
> offload and non-TC is quite undesirable, and could lead to some interesting
> results, for instance this was first detected by the observation of packets
> getting stuck in a loop in the datapath:
> 
> recirc_id(0xe),...ct_state(0/0x20),....,in_port(eth9),eth_type(0x86dd),... ,dp:tc, actions:ct,recirc(0xe)
> 
> Where the userspace rule was doing ct to the same table instead of moving to the next table:
> 
> ovs-ofctl add-flow br0 "table=0,in_port=eth9,ct_state=-trk,ipv6,actions=ct(table=0)"
> 
> So far we've not managed to think of a good way to resolve this in the code.
> I don't think changing the kernel behaviour would be desirable, at least
> not in that specific function as that is common conntrack code. I suspect
> that ideally this is something we can try and address from the OVS side,
> but at this moment I have no idea how this will be achieved, hence this
> email.
> 
> Looking forward to get some suggestions on this
> 
> Regards
> Louis Peens
> 
> PS: Tested on:
> net-next kernel:
>     d310ec03a34e Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> OVS:
>     "cdaa7e0fd dpif-netdev: Fix crash when add dp flow without in_port field."
>         +
>     "[ovs-dev] [PATCH v3 0/3] Add offload support for ct_state rpl and inv flags"
>     (The behaviour before and after the patch series in terms of the problem
>      above is the same. Whether the recirc rules end up in the ovs datapath or tc
>      datapath doesn't really matter)
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
  2021-03-10 11:06 ` Ilya Maximets
@ 2021-03-12 22:06   ` Marcelo Leitner
       [not found]     ` <58820355-7337-d51b-32dd-be944600832d@corigine.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Marcelo Leitner @ 2021-03-12 22:06 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Louis Peens, ovs-dev, Paul Blakey, netdev, Yinjun Zhang,
	Simon Horman, wenxu

Hi there,

On Wed, Mar 10, 2021 at 12:06:52PM +0100, Ilya Maximets wrote:
> Hi, Louis.  Thanks for your report!
> 
> Marcelo, Paul, could you, please, take a look?

Thanks for the ping.
+wenxu

> 
> Best regards, Ilya Maximets.
> 
> On 3/10/21 8:51 AM, Louis Peens wrote:
> > Hi all
> > 
> > We've recently encountered an interesting situation with OVS conntrack
> > when offloading to the TC datapath, and would like some feedback. Sorry
> > about the longish wall of text, but I'm trying to explain the problem
> > as clearly as possible. The very short summary is that there is a mismatch

Details are very welcomed, thanks for them.

> > in behaviour between the OVS datapath and OVS+TC datapath, and we're
> > not sure how to resolve this. Here goes:
> > 
> > We have a set of rules looking like this:
> > ovs-ofctl add-flow br0 "table=0,in_port=p1,ct_state=-trk,ipv6,actions=ct(table=1)"
> > ovs-ofctl add-flow br0 "table=0,in_port=p2,ct_state=-trk,ipv6,actions=ct(table=1)"
> > #post_ct flows"
> > ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+new,ipv6,actions=ct(commit),output:p2"
> > ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+new,ipv6,actions=ct(commit),output:p1"
> > ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+est,ipv6,actions=output:p2"
> > ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+est,ipv6,actions=output:p1"
> > 
> > p1/p2 are the endpoints of two different veth pairs, just to keep this simple.
> > The rules above work well enough with UDP/TCP traffic, however ICMPv6 packets
> > (08:56:39.984375 IP6 2001:db8:0:f101::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:db8:0:f101::2, length 32)
> > breaks this somewhat. With TC offload disabled:
> > 
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
> > 
> > we get the following datapath rules:
> > 
> > ovs-appctl dpctl/dump-flows --names
> > recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:drop
> > recirc_id(0),in_port(p1),ct_state(-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:ct,recirc(0x1)
> > 
> > This part is still fine, we do not have a rule for just matching +trk, so the
> > the drop rule is to be expected. The problem however is when we enable TC
> > offload:
> > 
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true
> > 
> > This is the result in the datapath:
> > 
> > ovs-appctl dpctl/dump-flows --names
> > ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:144, used:0.920s, actions:ct,recirc(0x1)
> > recirc_id(0x1),in_port(p1),ct_state(-new-est-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:86, used:0.928s, actions:drop
> > recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:drop
> > 
> > Notice the installation of the two recirc rules, one with -trk and one with +trk,
> > with the -trk one being the rule that handles all the next packets. Further
> > investigation reveals that something like the following is happening:
> > 
> > 1) The first packet arrives and is handled by the OVS datapath,

Hmm. This shouldn't happen if hw-offload=true, because the first rule
should be installed on tc datapath already. Or maybe you mean OVS
vswitchd when you referred to OVS datapath?

What does  dpctl/dump-flows --names -m  gives in this situation, are
all flows installed on dp:tc?

> >    triggering the installation of the two rules like in the non-offloaded
> >    case. So the recirc_id(0) rule gets installed into tc, and recirc_id(0x1)
> >    gets installed into the ovs datapath. This bit of code in the OVS module
> >    makes sure that +trk is set.
> > 
> >     /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
> >      * previously sent the packet to conntrack via the ct action.....
> >      * /
> >    static void ovs_ct_update_key(const struct sk_buff *skb,
> >                               const struct ovs_conntrack_info *info,
> >                               struct sw_flow_key *key, bool post_ct,
> >                               bool keep_nat_flags)
> >     {
> >             ...
> >             ct = nf_ct_get(skb, &ctinfo);
> >             if (ct) {//tracked
> >                     ...
> >             } else if (post_ct) {
> >                     state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
> >                     if (info)
> >                             zone = &info->zone;
> >             }
> >             __ovs_ct_update_key(key, state, zone, ct);
> > 
> >     }
> >     Obviously this is not the case when the packet was sent to conntrack
> >     via tc.
> > 
> > 2) The second packet arrives, and now hits the rule installed in
> >    TC. However, TC does not handle ICMPv6 (Neighbor Solicitation), and explicitely
> >    clears the tracked bit (net/netfilter/nf_conntrack_proto_icmpv6.c):
> > 
> >     int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
> >                                   struct sk_buff *skb,
> >                                   unsigned int dataoff,
> >                                   const struct nf_hook_state *state)
> > 
> >     {
> >     ...
> >             type = icmp6h->icmp6_type - 130;
> >             if (type >= 0 && type < sizeof(noct_valid_new) &&
> >                 noct_valid_new[type]) {
> >                     nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
> >                     return NF_ACCEPT;
> >             }
> >     ...
> >     }
> >     (The above code gets triggered a few function calls down from act_ct.c)

I don't follow this part, and it seems it would affect ovs kernel
dp as well. Can you please elaborate on the call chain you're focusing
here?

> > 
> > 3) So now the packet does not hit the +trk rule after the recirc, and leads
> >    to the installation of the "recirc_id(0x1),..-trk" rule, since +trk wasn't
> >    set by TC.

If you meant vswitchd above, this can be the problem, yes.
ovs_ct_update_key() is updating the key, and AFAICT that's reflected
on the upcall. Which, then, it's fair to assume (I didn't check)
vswitchd does the same.

But for tc, +trk+inv is synthetsized when tc is trying to match again
on this packet, when skb_flow_dissect_ct() in it will:

       if (!ct) {
               key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
                               TCA_FLOWER_KEY_CT_FLAGS_INVALID;
               return;
       }

Note that 'key' here is not part of the packet in any way. The only
information that is stored within the packet, is
qdisc_skb_cb(skb)->post_ct, which ovs kernel doesn't know about. So
this wouldn't be reflected on an upcall, causing vswitchd to not see
these flags.

IOW, an upcall right after this flow:
ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), actions:ct,recirc(0x1)
can be different if it's from tc datapath or ovs kernel/vswitchd
regarding these flags in this case.

Makes sense? I think we're mostly on the same page on this part,
actually.

Thanks,
Marcelo

> > 
> > This is now the point where we're a bit stuck and is hoping for some ideas
> > on how to best resolve this. A workaround is of course just to modify the
> > userspace rules to not send the icmp packets to conntrack and that should
> > work, but it is a workaround. I think this inconsistency between TC
> > offload and non-TC is quite undesirable, and could lead to some interesting
> > results, for instance this was first detected by the observation of packets
> > getting stuck in a loop in the datapath:
> > 
> > recirc_id(0xe),...ct_state(0/0x20),....,in_port(eth9),eth_type(0x86dd),... ,dp:tc, actions:ct,recirc(0xe)
> > 
> > Where the userspace rule was doing ct to the same table instead of moving to the next table:
> > 
> > ovs-ofctl add-flow br0 "table=0,in_port=eth9,ct_state=-trk,ipv6,actions=ct(table=0)"
> > 
> > So far we've not managed to think of a good way to resolve this in the code.
> > I don't think changing the kernel behaviour would be desirable, at least
> > not in that specific function as that is common conntrack code. I suspect
> > that ideally this is something we can try and address from the OVS side,
> > but at this moment I have no idea how this will be achieved, hence this
> > email.
> > 
> > Looking forward to get some suggestions on this
> > 
> > Regards
> > Louis Peens
> > 
> > PS: Tested on:
> > net-next kernel:
> >     d310ec03a34e Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> > OVS:
> >     "cdaa7e0fd dpif-netdev: Fix crash when add dp flow without in_port field."
> >         +
> >     "[ovs-dev] [PATCH v3 0/3] Add offload support for ct_state rpl and inv flags"
> >     (The behaviour before and after the patch series in terms of the problem
> >      above is the same. Whether the recirc rules end up in the ovs datapath or tc
> >      datapath doesn't really matter)
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > 
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
       [not found]     ` <58820355-7337-d51b-32dd-be944600832d@corigine.com>
@ 2021-03-15 15:50       ` Louis Peens
  2021-03-16  7:00       ` wenxu
  1 sibling, 0 replies; 8+ messages in thread
From: Louis Peens @ 2021-03-15 15:50 UTC (permalink / raw)
  To: Marcelo Leitner, Ilya Maximets
  Cc: ovs-dev, Paul Blakey, netdev, Yinjun Zhang, Simon Horman, wenxu

Apologies for the horrible formatting on the previous mail, I had to move email
clients and have not figured it's settings out properly yet for mailing list use.
Hopefully this is better already. I think it's still mostly possible to follow, but let me
know if I should rather resend.

Regards
Louis

On 2021/03/15 17:29, Louis Peens wrote:
> Hi Marcelo
> 
> Thanks for taking time to take a look. I've replied inline - and also found
> a bit more info, although I'm not sure if it clears things up much. I do think
> that the main problem is the different upcall behaviour, I have not figured
> out what to do about it yet.
> 
> On 2021/03/13 00:06, Marcelo Leitner wrote:
>> Hi there,
>>
>> On Wed, Mar 10, 2021 at 12:06:52PM +0100, Ilya Maximets wrote:
>>> Hi, Louis.  Thanks for your report!
>>>
>>> Marcelo, Paul, could you, please, take a look?
>> Thanks for the ping.
>> +wenxu
>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 3/10/21 8:51 AM, Louis Peens wrote:
>>>> Hi all
>>>>
>>>> We've recently encountered an interesting situation with OVS conntrack
>>>> when offloading to the TC datapath, and would like some feedback. Sorry
>>>> about the longish wall of text, but I'm trying to explain the problem
>>>> as clearly as possible. The very short summary is that there is a mismatch
>> Details are very welcomed, thanks for them.
>>
>>>> in behaviour between the OVS datapath and OVS+TC datapath, and we're
>>>> not sure how to resolve this. Here goes:
>>>>
>>>> We have a set of rules looking like this:
>>>> ovs-ofctl add-flow br0 "table=0,in_port=p1,ct_state=-trk,ipv6,actions=ct(table=1)"
>>>> ovs-ofctl add-flow br0 "table=0,in_port=p2,ct_state=-trk,ipv6,actions=ct(table=1)"
>>>> #post_ct flows"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+new,ipv6,actions=ct(commit),output:p2"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+new,ipv6,actions=ct(commit),output:p1"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+est,ipv6,actions=output:p2"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+est,ipv6,actions=output:p1"
>>>>
>>>> p1/p2 are the endpoints of two different veth pairs, just to keep this simple.
>>>> The rules above work well enough with UDP/TCP traffic, however ICMPv6 packets
>>>> (08:56:39.984375 IP6 2001:db8:0:f101::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:db8:0:f101::2, length 32)
>>>> breaks this somewhat. With TC offload disabled:
>>>>
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
>>>>
>>>> we get the following datapath rules:
>>>>
>>>> ovs-appctl dpctl/dump-flows --names
>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:drop
>>>> recirc_id(0),in_port(p1),ct_state(-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:ct,recirc(0x1)
>>>>
>>>> This part is still fine, we do not have a rule for just matching +trk, so the
>>>> the drop rule is to be expected. The problem however is when we enable TC
>>>> offload:
>>>>
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true
>>>>
>>>> This is the result in the datapath:
>>>>
>>>> ovs-appctl dpctl/dump-flows --names
>>>> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:144, used:0.920s, actions:ct,recirc(0x1)
>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:86, used:0.928s, actions:drop
>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:drop
>>>>
>>>> Notice the installation of the two recirc rules, one with -trk and one with +trk,
>>>> with the -trk one being the rule that handles all the next packets. Further
>>>> investigation reveals that something like the following is happening:
>>>>
>>>> 1) The first packet arrives and is handled by the OVS datapath,
>> Hmm. This shouldn't happen if hw-offload=true, because the first rule
>> should be installed on tc datapath already. Or maybe you mean OVS
>> vswitchd when you referred to OVS datapath?
>>
>> What does  dpctl/dump-flows --names -m  gives in this situation, are
>> all flows installed on dp:tc?
> 
> Yes, packet would be handled by vswitchd here, triggering the installation of datapath
> flow rules. I think I may have mixed up packet handling and flow rule installation a bit.
> These are the expanded flows:
> ovs-appctl dpctl/dump-flows --more
> ufid:65c93f06-9f09-4b62-b309-951c36d3d98a, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:2.080s, dp:tc, actions:ct,recirc(0x1)
> ufid:30fd8977-0bcc-41b1-8800-1262cea71005, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
> ufid:c176bac4-a42d-4e8b-99d9-bce386c1be4f, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
> 
> The pre-recirc rule is in tc, but the post-recirc rules are in OVS.
> 
>> triggering the installation of the two rules like in the non-offloaded
>>>>    case. So the recirc_id(0) rule gets installed into tc, and recirc_id(0x1)
>>>>    gets installed into the ovs datapath. This bit of code in the OVS module
>>>>    makes sure that +trk is set.
>>>>
>>>>     /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
>>>>      * previously sent the packet to conntrack via the ct action.....
>>>>      * /
>>>>    static void ovs_ct_update_key(const struct sk_buff *skb,
>>>>                               const struct ovs_conntrack_info *info,
>>>>                               struct sw_flow_key *key, bool post_ct,
>>>>                               bool keep_nat_flags)
>>>>     {
>>>>             ...
>>>>             ct = nf_ct_get(skb, &ctinfo);
>>>>             if (ct) {//tracked
>>>>                     ...
>>>>             } else if (post_ct) {
>>>>                     state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
>>>>                     if (info)
>>>>                             zone = &info->zone;
>>>>             }
>>>>             __ovs_ct_update_key(key, state, zone, ct);
>>>>
>>>>     }
>>>>     Obviously this is not the case when the packet was sent to conntrack
>>>>     via tc.
>>>>
>>>> 2) The second packet arrives, and now hits the rule installed in
>>>>    TC. However, TC does not handle ICMPv6 (Neighbor Solicitation), and explicitely
>>>>    clears the tracked bit (net/netfilter/nf_conntrack_proto_icmpv6.c):
>>>>
>>>>     int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
>>>>                                   struct sk_buff *skb,
>>>>                                   unsigned int dataoff,
>>>>                                   const struct nf_hook_state *state)
>>>>
>>>>     {
>>>>     ...
>>>>             type = icmp6h->icmp6_type - 130;
>>>>             if (type >= 0 && type < sizeof(noct_valid_new) &&
>>>>                 noct_valid_new[type]) {
>>>>                     nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
>>>>                     return NF_ACCEPT;
>>>>             }
>>>>     ...
>>>>     }
>>>>     (The above code gets triggered a few function calls down from act_ct.c)
>> I don't follow this part, and it seems it would affect ovs kernel
>> dp as well. Can you please elaborate on the call chain you're focusing
>> here?
> The chain I was referring to is:
> tcf_ct_act->nf_conntrack_in->nf_conntrack_handle_icmp->nf_conntrack_icmpv6_error
> However I'm not so sure anymore that this is relevant, and yes, would probably affect
> ovs as well.
>>>> 3) So now the packet does not hit the +trk rule after the recirc, and leads
>>>>    to the installation of the "recirc_id(0x1),..-trk" rule, since +trk wasn't
>>>>    set by TC.
>> If you meant vswitchd above, this can be the problem, yes.
>> ovs_ct_update_key() is updating the key, and AFAICT that's reflected
>> on the upcall. Which, then, it's fair to assume (I didn't check)
>> vswitchd does the same.
>>
>> But for tc, +trk+inv is synthetsized when tc is trying to match again
>> on this packet, when skb_flow_dissect_ct() in it will:
>>
>>        if (!ct) {
>>                key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
>>                                TCA_FLOWER_KEY_CT_FLAGS_INVALID;
>>                return;
>>        }
>>
>> Note that 'key' here is not part of the packet in any way. The only
>> information that is stored within the packet, is
>> qdisc_skb_cb(skb)->post_ct, which ovs kernel doesn't know about. So
>> this wouldn't be reflected on an upcall, causing vswitchd to not see
>> these flags.
>>
>> IOW, an upcall right after this flow:
>> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), actions:ct,recirc(0x1)
>> can be different if it's from tc datapath or ovs kernel/vswitchd
>> regarding these flags in this case.
>>
>> Makes sense? I think we're mostly on the same page on this part,
>> actually.
> I think this is what it boils down to in the end yes. I did do some bisecting of the kernel tree in the mean time:
> Just before "7baf2429a1a9 net/sched: cls_flower add CT_FLAGS_INVALID flag support" all the rules end up in
> in dp:tc, but we have have -trk and +trk as explained. Then after the commit does look to be working as expected,
> rules get's installed in tc, and only +trk set:
> ovs-appctl dpctl/dump-flows --more
> ufid:e476d5a2-3133-405c-9826-ab911c2c3240, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.890s, dp:tc, actions:ct,recirc(0x2)
> ufid:b8369209-3069-4280-9914-820d98a3a536, skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x2),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.890s, dp:tc, actions:drop
> 
> Then things go south again with this commit:
> "1bcc51ac0731 net/sched: cls_flower: Reject invalid ct_state flags rules"
> This is the point where the recirc rule is rejected by tc, and leads to the installation
> of the two ovs rules as in the dump at the start of the email.
> 
> Also, not everything is good at this commit:
> "7baf2429a1a9 net/sched: cls_flower add CT_FLAGS_INVALID flag support"
> If I add userspace rules to also match on +inv so that it isn't wildcarded I
> (with the ovs patches for tc +inv support removed) I get the same two-recirc-rule
> behaviour. I do think that it matches the current theory on the upcall behaviour.
> We will keep on digging on our side as well, this did give some more avenues
> of thought, thanks.
> 
> Regards
> Louis
>> Thanks,
>> Marcelo
>>
>>>> This is now the point where we're a bit stuck and is hoping for some ideas
>>>> on how to best resolve this. A workaround is of course just to modify the
>>>> userspace rules to not send the icmp packets to conntrack and that should
>>>> work, but it is a workaround. I think this inconsistency between TC
>>>> offload and non-TC is quite undesirable, and could lead to some interesting
>>>> results, for instance this was first detected by the observation of packets
>>>> getting stuck in a loop in the datapath:
>>>>
>>>> recirc_id(0xe),...ct_state(0/0x20),....,in_port(eth9),eth_type(0x86dd),... ,dp:tc, actions:ct,recirc(0xe)
>>>>
>>>> Where the userspace rule was doing ct to the same table instead of moving to the next table:
>>>>
>>>> ovs-ofctl add-flow br0 "table=0,in_port=eth9,ct_state=-trk,ipv6,actions=ct(table=0)"
>>>>
>>>> So far we've not managed to think of a good way to resolve this in the code.
>>>> I don't think changing the kernel behaviour would be desirable, at least
>>>> not in that specific function as that is common conntrack code. I suspect
>>>> that ideally this is something we can try and address from the OVS side,
>>>> but at this moment I have no idea how this will be achieved, hence this
>>>> email.
>>>>
>>>> Looking forward to get some suggestions on this
>>>>
>>>> Regards
>>>> Louis Peens
>>>>
>>>> PS: Tested on:
>>>> net-next kernel:
>>>>     d310ec03a34e Merge tag 'perf-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>> OVS:
>>>>     "cdaa7e0fd dpif-netdev: Fix crash when add dp flow without in_port field."
>>>>         +
>>>>     "[ovs-dev] [PATCH v3 0/3] Add offload support for ct_state rpl and inv flags"
>>>>     (The behaviour before and after the patch series in terms of the problem
>>>>      above is the same. Whether the recirc rules end up in the ovs datapath or tc
>>>>      datapath doesn't really matter)
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev@openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
       [not found]     ` <58820355-7337-d51b-32dd-be944600832d@corigine.com>
  2021-03-15 15:50       ` Louis Peens
@ 2021-03-16  7:00       ` wenxu
  2021-03-16 15:12         ` Louis Peens
  1 sibling, 1 reply; 8+ messages in thread
From: wenxu @ 2021-03-16  7:00 UTC (permalink / raw)
  To: Louis Peens, Marcelo Leitner, Ilya Maximets
  Cc: ovs-dev, Paul Blakey, netdev, Yinjun Zhang, Simon Horman


On 3/15/2021 11:29 PM, Louis Peens wrote:
> Hi Marcelo
>
> Thanks for taking time to take a look. I've replied inline - and also found
> a bit more info, although I'm not sure if it clears things up much. I do think
> that the main problem is the different upcall behaviour, I have not figured
> out what to do about it yet.
>
> On 2021/03/13 00:06, Marcelo Leitner wrote:
>> Hi there,
>>
>> On Wed, Mar 10, 2021 at 12:06:52PM +0100, Ilya Maximets wrote:
>>> Hi, Louis.  Thanks for your report!
>>>
>>> Marcelo, Paul, could you, please, take a look?
>> Thanks for the ping.
>> +wenxu
>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 3/10/21 8:51 AM, Louis Peens wrote:
>>>> Hi all
>>>>
>>>> We've recently encountered an interesting situation with OVS conntrack
>>>> when offloading to the TC datapath, and would like some feedback. Sorry
>>>> about the longish wall of text, but I'm trying to explain the problem
>>>> as clearly as possible. The very short summary is that there is a mismatch
>> Details are very welcomed, thanks for them.
>>
>>>> in behaviour between the OVS datapath and OVS+TC datapath, and we're
>>>> not sure how to resolve this. Here goes:
>>>>
>>>> We have a set of rules looking like this:
>>>> ovs-ofctl add-flow br0 "table=0,in_port=p1,ct_state=-trk,ipv6,actions=ct(table=1)"
>>>> ovs-ofctl add-flow br0 "table=0,in_port=p2,ct_state=-trk,ipv6,actions=ct(table=1)"
>>>> #post_ct flows"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+new,ipv6,actions=ct(commit),output:p2"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+new,ipv6,actions=ct(commit),output:p1"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+est,ipv6,actions=output:p2"
>>>> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+est,ipv6,actions=output:p1"
>>>>
>>>> p1/p2 are the endpoints of two different veth pairs, just to keep this simple.
>>>> The rules above work well enough with UDP/TCP traffic, however ICMPv6 packets
>>>> (08:56:39.984375 IP6 2001:db8:0:f101::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:db8:0:f101::2, length 32)
>>>> breaks this somewhat. With TC offload disabled:
>>>>
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
>>>>
>>>> we get the following datapath rules:
>>>>
>>>> ovs-appctl dpctl/dump-flows --names
>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:drop
>>>> recirc_id(0),in_port(p1),ct_state(-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:ct,recirc(0x1)
>>>>
>>>> This part is still fine, we do not have a rule for just matching +trk, so the
>>>> the drop rule is to be expected. The problem however is when we enable TC
>>>> offload:
>>>>
>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true
>>>>
>>>> This is the result in the datapath:
>>>>
>>>> ovs-appctl dpctl/dump-flows --names
>>>> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:144, used:0.920s, actions:ct,recirc(0x1)
>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:86, used:0.928s, actions:drop
>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:drop
>>>>
>>>> Notice the installation of the two recirc rules, one with -trk and one with +trk,
>>>> with the -trk one being the rule that handles all the next packets. Further
>>>> investigation reveals that something like the following is happening:
>>>>
>>>> 1) The first packet arrives and is handled by the OVS datapath,
>> Hmm. This shouldn't happen if hw-offload=true, because the first rule
>> should be installed on tc datapath already. Or maybe you mean OVS
>> vswitchd when you referred to OVS datapath?
>>
>> What does  dpctl/dump-flows --names -m  gives in this situation, are
>> all flows installed on dp:tc?
>
> Yes, packet would be handled by vswitchd here, triggering the installation of datapath
> flow rules. I think I may have mixed up packet handling and flow rule installation a bit.
> These are the expanded flows:
> ovs-appctl dpctl/dump-flows --more
> ufid:65c93f06-9f09-4b62-b309-951c36d3d98a, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:2.080s, dp:tc, actions:ct,recirc(0x1)
> ufid:30fd8977-0bcc-41b1-8800-1262cea71005, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
> ufid:c176bac4-a42d-4e8b-99d9-bce386c1be4f, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
>
> The pre-recirc rule is in tc, but the post-recirc rules are in OVS.
>
>> triggering the installation of the two rules like in the non-offloaded
>>>>     case. So the recirc_id(0) rule gets installed into tc, and recirc_id(0x1)
>>>>     gets installed into the ovs datapath. This bit of code in the OVS module
>>>>     makes sure that +trk is set.
>>>>
>>>>      /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
>>>>       * previously sent the packet to conntrack via the ct action.....
>>>>       * /
>>>>     static void ovs_ct_update_key(const struct sk_buff *skb,
>>>>                                const struct ovs_conntrack_info *info,
>>>>                                struct sw_flow_key *key, bool post_ct,
>>>>                                bool keep_nat_flags)
>>>>      {
>>>>              ...
>>>>              ct = nf_ct_get(skb, &ctinfo);
>>>>              if (ct) {//tracked
>>>>                      ...
>>>>              } else if (post_ct) {
>>>>                      state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
>>>>                      if (info)
>>>>                              zone = &info->zone;
>>>>              }
>>>>              __ovs_ct_update_key(key, state, zone, ct);
>>>>
>>>>      }
>>>>      Obviously this is not the case when the packet was sent to conntrack
>>>>      via tc.
>>>>
>>>> 2) The second packet arrives, and now hits the rule installed in
>>>>     TC. However, TC does not handle ICMPv6 (Neighbor Solicitation), and explicitely
>>>>     clears the tracked bit (net/netfilter/nf_conntrack_proto_icmpv6.c):
>>>>
>>>>      int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
>>>>                                    struct sk_buff *skb,
>>>>                                    unsigned int dataoff,
>>>>                                    const struct nf_hook_state *state)
>>>>
>>>>      {
>>>>      ...
>>>>              type = icmp6h->icmp6_type - 130;
>>>>              if (type >= 0 && type < sizeof(noct_valid_new) &&
>>>>                  noct_valid_new[type]) {
>>>>                      nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
>>>>                      return NF_ACCEPT;
>>>>              }
>>>>      ...
>>>>      }
>>>>      (The above code gets triggered a few function calls down from act_ct.c)
>> I don't follow this part, and it seems it would affect ovs kernel
>> dp as well. Can you please elaborate on the call chain you're focusing
>> here?
> The chain I was referring to is:
> tcf_ct_act->nf_conntrack_in->nf_conntrack_handle_icmp->nf_conntrack_icmpv6_error
> However I'm not so sure anymore that this is relevant, and yes, would probably affect
> ovs as well.
>>>> 3) So now the packet does not hit the +trk rule after the recirc, and leads
>>>>     to the installation of the "recirc_id(0x1),..-trk" rule, since +trk wasn't
>>>>     set by TC.
>> If you meant vswitchd above, this can be the problem, yes.
>> ovs_ct_update_key() is updating the key, and AFAICT that's reflected
>> on the upcall. Which, then, it's fair to assume (I didn't check)
>> vswitchd does the same.
>>
>> But for tc, +trk+inv is synthetsized when tc is trying to match again
>> on this packet, when skb_flow_dissect_ct() in it will:
>>
>>         if (!ct) {
>>                 key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
>>                                 TCA_FLOWER_KEY_CT_FLAGS_INVALID;
>>                 return;
>>         }
>>
>> Note that 'key' here is not part of the packet in any way. The only
>> information that is stored within the packet, is
>> qdisc_skb_cb(skb)->post_ct, which ovs kernel doesn't know about. So
>> this wouldn't be reflected on an upcall, causing vswitchd to not see
>> these flags.
>>
>> IOW, an upcall right after this flow:
>> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), actions:ct,recirc(0x1)
>> can be different if it's from tc datapath or ovs kernel/vswitchd
>> regarding these flags in this case.
>>
>> Makes sense? I think we're mostly on the same page on this part,
>> actually.
> I think this is what it boils down to in the end yes. I did do some bisecting of the kernel tree in the mean time:
> Just before "7baf2429a1a9 net/sched: cls_flower add CT_FLAGS_INVALID flag support" all the rules end up in
> in dp:tc, but we have have -trk and +trk as explained. Then after the commit does look to be working as expected,
> rules get's installed in tc, and only +trk set:
> ovs-appctl dpctl/dump-flows --more
> ufid:e476d5a2-3133-405c-9826-ab911c2c3240, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.890s, dp:tc, actions:ct,recirc(0x2)
> ufid:b8369209-3069-4280-9914-820d98a3a536, skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x2),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.890s, dp:tc, actions:drop
>
> Then things go south again with this commit:
> "1bcc51ac0731 net/sched: cls_flower: Reject invalid ct_state flags rules"
> This is the point where the recirc rule is rejected by tc, and leads to the installation
> of the two ovs rules as in the dump at the start of the email.


I think there are some problem in the commit 1bcc51ac0731, The dp flow

with est and new in the mask, This flow will be reject by in the fl_validate_ct_state.

We will fix it.  This validate should not only based on flags in the mask.


And in this case the dp flow does't contain inv flags?

What about the detail dp flows for tc-offload=false case?



>
> Also, not everything is good at this commit:
> "7baf2429a1a9 net/sched: cls_flower add CT_FLAGS_INVALID flag support"
> If I add userspace rules to also match on +inv so that it isn't wildcarded I
> (with the ovs patches for tc +inv support removed) I get the same two-recirc-rule
> behaviour. I do think that it matches the current theory on the upcall behaviour.
> We will keep on digging on our side as well, this did give some more avenues
> of thought, thanks.
>
You means the ovs with userspace flow +inv? And the dp flow doesn't contain the inv flags?

Or the flow not be hit anymore?



BR

wenxu




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
  2021-03-16  7:00       ` wenxu
@ 2021-03-16 15:12         ` Louis Peens
  2021-03-26 16:58           ` Marcelo Leitner
  0 siblings, 1 reply; 8+ messages in thread
From: Louis Peens @ 2021-03-16 15:12 UTC (permalink / raw)
  To: wenxu, Marcelo Leitner, Ilya Maximets
  Cc: ovs-dev, Paul Blakey, netdev, Yinjun Zhang, Simon Horman



On 2021/03/16 09:00, wenxu wrote:
> 
> On 3/15/2021 11:29 PM, Louis Peens wrote:
>> Hi Marcelo
>>
>> Thanks for taking time to take a look. I've replied inline - and also found
>> a bit more info, although I'm not sure if it clears things up much. I do think
>> that the main problem is the different upcall behaviour, I have not figured
>> out what to do about it yet.
>>
>> On 2021/03/13 00:06, Marcelo Leitner wrote:
>>> Hi there,
>>>
>>> On Wed, Mar 10, 2021 at 12:06:52PM +0100, Ilya Maximets wrote:
>>>> Hi, Louis.  Thanks for your report!
>>>>
>>>> Marcelo, Paul, could you, please, take a look?
>>> Thanks for the ping.
>>> +wenxu
>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>> On 3/10/21 8:51 AM, Louis Peens wrote:
>>>>> Hi all
>>>>>
>>>>> We've recently encountered an interesting situation with OVS conntrack
>>>>> when offloading to the TC datapath, and would like some feedback. Sorry
>>>>> about the longish wall of text, but I'm trying to explain the problem
>>>>> as clearly as possible. The very short summary is that there is a mismatch
>>> Details are very welcomed, thanks for them.
>>>
>>>>> in behaviour between the OVS datapath and OVS+TC datapath, and we're
>>>>> not sure how to resolve this. Here goes:
>>>>>
>>>>> We have a set of rules looking like this:
>>>>> ovs-ofctl add-flow br0 "table=0,in_port=p1,ct_state=-trk,ipv6,actions=ct(table=1)"
>>>>> ovs-ofctl add-flow br0 "table=0,in_port=p2,ct_state=-trk,ipv6,actions=ct(table=1)"
>>>>> #post_ct flows"
>>>>> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+new,ipv6,actions=ct(commit),output:p2"
>>>>> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+new,ipv6,actions=ct(commit),output:p1"
>>>>> ovs-ofctl add-flow br0 "table=1,in_port=p1,ct_state=+trk+est,ipv6,actions=output:p2"
>>>>> ovs-ofctl add-flow br0 "table=1,in_port=p2,ct_state=+trk+est,ipv6,actions=output:p1"
>>>>>
>>>>> p1/p2 are the endpoints of two different veth pairs, just to keep this simple.
>>>>> The rules above work well enough with UDP/TCP traffic, however ICMPv6 packets
>>>>> (08:56:39.984375 IP6 2001:db8:0:f101::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2001:db8:0:f101::2, length 32)
>>>>> breaks this somewhat. With TC offload disabled:
>>>>>
>>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=false
>>>>>
>>>>> we get the following datapath rules:
>>>>>
>>>>> ovs-appctl dpctl/dump-flows --names
>>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:drop
>>>>> recirc_id(0),in_port(p1),ct_state(-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:172, used:1.329s, actions:ct,recirc(0x1)
>>>>>
>>>>> This part is still fine, we do not have a rule for just matching +trk, so the
>>>>> the drop rule is to be expected. The problem however is when we enable TC
>>>>> offload:
>>>>>
>>>>> ovs-vsctl --no-wait set Open_vSwitch . other_config:hw-offload=true
>>>>>
>>>>> This is the result in the datapath:
>>>>>
>>>>> ovs-appctl dpctl/dump-flows --names
>>>>> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), packets:2, bytes:144, used:0.920s, actions:ct,recirc(0x1)
>>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est-trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:1, bytes:86, used:0.928s, actions:drop
>>>>> recirc_id(0x1),in_port(p1),ct_state(-new-est+trk),eth(),eth_type(0x86dd),ipv6(frag=no), packets:0, bytes:0, used:never, actions:drop
>>>>>
>>>>> Notice the installation of the two recirc rules, one with -trk and one with +trk,
>>>>> with the -trk one being the rule that handles all the next packets. Further
>>>>> investigation reveals that something like the following is happening:
>>>>>
>>>>> 1) The first packet arrives and is handled by the OVS datapath,
>>> Hmm. This shouldn't happen if hw-offload=true, because the first rule
>>> should be installed on tc datapath already. Or maybe you mean OVS
>>> vswitchd when you referred to OVS datapath?
>>>
>>> What does  dpctl/dump-flows --names -m  gives in this situation, are
>>> all flows installed on dp:tc?
>>
>> Yes, packet would be handled by vswitchd here, triggering the installation of datapath
>> flow rules. I think I may have mixed up packet handling and flow rule installation a bit.
>> These are the expanded flows:
>> ovs-appctl dpctl/dump-flows --more
>> ufid:65c93f06-9f09-4b62-b309-951c36d3d98a, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:2.080s, dp:tc, actions:ct,recirc(0x1)
>> ufid:30fd8977-0bcc-41b1-8800-1262cea71005, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
>> ufid:c176bac4-a42d-4e8b-99d9-bce386c1be4f, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
>>
>> The pre-recirc rule is in tc, but the post-recirc rules are in OVS.
>>
>>> triggering the installation of the two rules like in the non-offloaded
>>>>>     case. So the recirc_id(0) rule gets installed into tc, and recirc_id(0x1)
>>>>>     gets installed into the ovs datapath. This bit of code in the OVS module
>>>>>     makes sure that +trk is set.
>>>>>
>>>>>      /* Update 'key' based on skb->_nfct.  If 'post_ct' is true, then OVS has
>>>>>       * previously sent the packet to conntrack via the ct action.....
>>>>>       * /
>>>>>     static void ovs_ct_update_key(const struct sk_buff *skb,
>>>>>                                const struct ovs_conntrack_info *info,
>>>>>                                struct sw_flow_key *key, bool post_ct,
>>>>>                                bool keep_nat_flags)
>>>>>      {
>>>>>              ...
>>>>>              ct = nf_ct_get(skb, &ctinfo);
>>>>>              if (ct) {//tracked
>>>>>                      ...
>>>>>              } else if (post_ct) {
>>>>>                      state = OVS_CS_F_TRACKED | OVS_CS_F_INVALID;
>>>>>                      if (info)
>>>>>                              zone = &info->zone;
>>>>>              }
>>>>>              __ovs_ct_update_key(key, state, zone, ct);
>>>>>
>>>>>      }
>>>>>      Obviously this is not the case when the packet was sent to conntrack
>>>>>      via tc.
>>>>>
>>>>> 2) The second packet arrives, and now hits the rule installed in
>>>>>     TC. However, TC does not handle ICMPv6 (Neighbor Solicitation), and explicitely
>>>>>     clears the tracked bit (net/netfilter/nf_conntrack_proto_icmpv6.c):
>>>>>
>>>>>      int nf_conntrack_icmpv6_error(struct nf_conn *tmpl,
>>>>>                                    struct sk_buff *skb,
>>>>>                                    unsigned int dataoff,
>>>>>                                    const struct nf_hook_state *state)
>>>>>
>>>>>      {
>>>>>      ...
>>>>>              type = icmp6h->icmp6_type - 130;
>>>>>              if (type >= 0 && type < sizeof(noct_valid_new) &&
>>>>>                  noct_valid_new[type]) {
>>>>>                      nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
>>>>>                      return NF_ACCEPT;
>>>>>              }
>>>>>      ...
>>>>>      }
>>>>>      (The above code gets triggered a few function calls down from act_ct.c)
>>> I don't follow this part, and it seems it would affect ovs kernel
>>> dp as well. Can you please elaborate on the call chain you're focusing
>>> here?
>> The chain I was referring to is:
>> tcf_ct_act->nf_conntrack_in->nf_conntrack_handle_icmp->nf_conntrack_icmpv6_error
>> However I'm not so sure anymore that this is relevant, and yes, would probably affect
>> ovs as well.
>>>>> 3) So now the packet does not hit the +trk rule after the recirc, and leads
>>>>>     to the installation of the "recirc_id(0x1),..-trk" rule, since +trk wasn't
>>>>>     set by TC.
>>> If you meant vswitchd above, this can be the problem, yes.
>>> ovs_ct_update_key() is updating the key, and AFAICT that's reflected
>>> on the upcall. Which, then, it's fair to assume (I didn't check)
>>> vswitchd does the same.
>>>
>>> But for tc, +trk+inv is synthetsized when tc is trying to match again
>>> on this packet, when skb_flow_dissect_ct() in it will:
>>>
>>>         if (!ct) {
>>>                 key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED |
>>>                                 TCA_FLOWER_KEY_CT_FLAGS_INVALID;
>>>                 return;
>>>         }
>>>
>>> Note that 'key' here is not part of the packet in any way. The only
>>> information that is stored within the packet, is
>>> qdisc_skb_cb(skb)->post_ct, which ovs kernel doesn't know about. So
>>> this wouldn't be reflected on an upcall, causing vswitchd to not see
>>> these flags.
>>>
>>> IOW, an upcall right after this flow:
>>> ct_state(-trk),recirc_id(0),in_port(p1),eth_type(0x86dd),ipv6(frag=no), actions:ct,recirc(0x1)
>>> can be different if it's from tc datapath or ovs kernel/vswitchd
>>> regarding these flags in this case.
>>>
>>> Makes sense? I think we're mostly on the same page on this part,
>>> actually.
>> I think this is what it boils down to in the end yes. I did do some bisecting of the kernel tree in the mean time:
>> Just before "7baf2429a1a9 net/sched: cls_flower add CT_FLAGS_INVALID flag support" all the rules end up in
>> in dp:tc, but we have have -trk and +trk as explained. Then after the commit does look to be working as expected,
>> rules get's installed in tc, and only +trk set:
>> ovs-appctl dpctl/dump-flows --more
>> ufid:e476d5a2-3133-405c-9826-ab911c2c3240, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.890s, dp:tc, actions:ct,recirc(0x2)
>> ufid:b8369209-3069-4280-9914-820d98a3a536, skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x2),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.890s, dp:tc, actions:drop
>>
>> Then things go south again with this commit:
>> "1bcc51ac0731 net/sched: cls_flower: Reject invalid ct_state flags rules"
>> This is the point where the recirc rule is rejected by tc, and leads to the installation
>> of the two ovs rules as in the dump at the start of the email.
> 
> 
> I think there are some problem in the commit 1bcc51ac0731, The dp flow
> 
> with est and new in the mask, This flow will be reject by in the fl_validate_ct_state.
> 
> We will fix it.  This validate should not only based on flags in the mask.
Ah yes, that does make sense! Thanks.
> 
> 
> And in this case the dp flow does't contain inv flags?

This does NOT have +inv userspace match rules. For offload=true (on commit 1bcc51ac0731). So I think in this case the inv match is just wildcarded:
ovs-appctl dpctl/dump-flows --more
ufid:fa080783-115c-46bf-b605-2b1e8a1e9b1e, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.030s, dp:tc, actions:ct,recirc(0x1)
ufid:2a129dbc-d2a0-4f77-b7bc-2b13fd4f10dc, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
ufid:7ccc0cc9-800d-4ccc-9403-4b2c10d47da3, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop

> 
> What about the detail dp flows for tc-offload=false case?
For offload=false:
ovs-appctl dpctl/dump-flows --more
ufid:48c19d71-9e84-4b4f-b896-26c451bd5102, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00
:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:86, used:2.785s, dp:ovs, actions:ct,recirc(0x1)
ufid:427b06f5-d1c3-4c24-9ae2-01ef56b2c2a2, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:
00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:86, used:2.785s, dp:ovs, actions:drop

> 
> 
> 
>>
>> Also, not everything is good at this commit:
>> "7baf2429a1a9 net/sched: cls_flower add CT_FLAGS_INVALID flag support"
>> If I add userspace rules to also match on +inv so that it isn't wildcarded I
>> (with the ovs patches for tc +inv support removed) I get the same two-recirc-rule
>> behaviour. I do think that it matches the current theory on the upcall behaviour.
>> We will keep on digging on our side as well, this did give some more avenues
>> of thought, thanks.
>>
> You means the ovs with userspace flow +inv? And the dp flow doesn't contain the inv flags?
With these userspace rules:
ovs-ofctl dump-flows br0
 cookie=0x0, duration=104.387s, table=0, n_packets=2, n_bytes=158, ct_state=-trk,ipv6,in_port=p1 actions=ct(table=1)
 cookie=0x0, duration=104.382s, table=0, n_packets=0, n_bytes=0, ct_state=-trk,ipv6,in_port=p2 actions=ct(table=1)
 cookie=0x0, duration=104.376s, table=1, n_packets=0, n_bytes=0, ct_state=+new+trk,ipv6,in_port=p1 actions=ct(commit),output:p2
 cookie=0x0, duration=104.371s, table=1, n_packets=0, n_bytes=0, ct_state=+new+trk,ipv6,in_port=p2 actions=ct(commit),output:p1
 cookie=0x0, duration=104.366s, table=1, n_packets=0, n_bytes=0, ct_state=+est+trk,ipv6,in_port=p1 actions=output:p2
 cookie=0x0, duration=104.361s, table=1, n_packets=0, n_bytes=0, ct_state=+est+trk,ipv6,in_port=p2 actions=output:p1
# These two rules are new for this part
 cookie=0x0, duration=104.356s, table=1, n_packets=1, n_bytes=86, ct_state=+inv+trk,ipv6,in_port=p1 actions=output:p2
 cookie=0x0, duration=104.352s, table=1, n_packets=0, n_bytes=0, ct_state=+inv+trk,ipv6,in_port=p2 actions=output:p1


I get this (on commit 7baf2429a1a9), with offload=true:
ovs-appctl dpctl/dump-flows --more
ufid:f8c9546b-6bfc-46a9-b901-a44f39bfff71, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:2.160s, dp:tc, actions:ct,recirc(0x1)
ufid:45dce3af-9b92-4161-a129-20fdeab76f9f, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x33),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:drop
ufid:a1957f09-85c5-4240-a690-4eb0bbcc467a, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x30/0x33),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:never, dp:ovs, actions:p2
> 
> Or the flow not be hit anymore?
With offload=false it looks like this:
ovs-appctl dpctl/dump-flows --more
ufid:ea73a727-65b8-44d5-9997-b3dff3e3098a, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x30/0x33),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:
00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:86, used:1.741s, dp:ovs, actions:p2
ufid:32ceedd2-1809-41cf-a512-322093fc2135, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00
:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:86, used:1.741s, dp:ovs, actions:ct,recirc(0x1)

WITHOUT the +inv userspace rules this commit gets these two tc rules (offload=true, commit 7baf2429a1a9), which looks good:
ovs-appctl dpctl/dump-flows --more
ufid:03e76570-41ba-44a2-b4cf-15bf6890a3e6, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.220s, dp:tc, actions:ct,recirc(0x1)
ufid:cc469290-a485-4433-a365-18719f4d9f17, skb_priority(0/0),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x1),dp_hash(0/0),in_port(p1),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:72, used:1.220s, dp:tc, actions:drop

and for completeness, the (offload=false) version of the WITHOUT +inv case, at commit 7baf2429a1a9:
ovs-appctl dpctl/dump-flows --more
ufid:5e12c343-d73b-4dec-9864-6d957a429794, recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00
:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:86, used:1.247s, dp:ovs, actions:ct,recirc(0x1)
ufid:7ee39b78-338b-4dfa-b75d-89d42204c98d, recirc_id(0x1),dp_hash(0/0),skb_priority(0/0),in_port(p1),skb_mark(0/0),ct_state(0x20/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:
00/00:00:00:00:00:00),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:86, used:1.247s, dp:ovs, actions:drop
> 
> 
> 
> BR
> 
> wenxu
So in the end I think there are two problems - the on you identified with only checking
the mask in commit 1bcc51ac0731. And then the second bigger one is that the behaviour
differs depending on whether the recirc upcall is after the a rule installed in tc
or a rule installed in ovs, as Marcelo mentioned.

Thanks for helping to look into this

Regards
Louis
> 
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
  2021-03-16 15:12         ` Louis Peens
@ 2021-03-26 16:58           ` Marcelo Leitner
  2021-03-26 18:03             ` Louis Peens
  0 siblings, 1 reply; 8+ messages in thread
From: Marcelo Leitner @ 2021-03-26 16:58 UTC (permalink / raw)
  To: Louis Peens
  Cc: wenxu, Ilya Maximets, ovs-dev, Paul Blakey, netdev, Yinjun Zhang,
	Simon Horman

On Tue, Mar 16, 2021 at 05:12:22PM +0200, Louis Peens wrote:
> So in the end I think there are two problems - the on you identified with only checking
> the mask in commit 1bcc51ac0731. And then the second bigger one is that the behaviour
> differs depending on whether the recirc upcall is after the a rule installed in tc
> or a rule installed in ovs, as Marcelo mentioned.

Hi Louis,

Not sure if you noticed but both fixes landed in upstream kernel
already.
That's basically:
afa536d8405a ("net/sched: cls_flower: fix only mask bit check in the
validate_ct_state")
d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after
do conntrack in act_ct")

If testing again, it's probably better if you use the latest one.

Thanks,
Marcelo


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6
  2021-03-26 16:58           ` Marcelo Leitner
@ 2021-03-26 18:03             ` Louis Peens
  0 siblings, 0 replies; 8+ messages in thread
From: Louis Peens @ 2021-03-26 18:03 UTC (permalink / raw)
  To: Marcelo Leitner
  Cc: wenxu, Ilya Maximets, ovs-dev, Paul Blakey, netdev, Yinjun Zhang,
	Simon Horman



On 2021/03/26 18:58, Marcelo Leitner wrote:
> On Tue, Mar 16, 2021 at 05:12:22PM +0200, Louis Peens wrote:
>> So in the end I think there are two problems - the on you identified with only checking
>> the mask in commit 1bcc51ac0731. And then the second bigger one is that the behaviour
>> differs depending on whether the recirc upcall is after the a rule installed in tc
>> or a rule installed in ovs, as Marcelo mentioned.
> 
> Hi Louis,
> 
> Not sure if you noticed but both fixes landed in upstream kernel
> already.
> That's basically:
> afa536d8405a ("net/sched: cls_flower: fix only mask bit check in the
> validate_ct_state")
> d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after
> do conntrack in act_ct")
> 
> If testing again, it's probably better if you use the latest one.
Hi Marcelo

Thanks for the ping, I saw the mask fix, but I did indeed miss the
post_ct fix. Looking at the change it looks like it would address
the issue that we saw. Will test it out and report back in case
something is still wrong, but hopefully this is the end of the
chain. Thanks for the help everyone.

Regards
Louis

> 
> Thanks,
> Marcelo
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-03-26 18:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-10  7:51 [ovs-dev] tc-conntrack: inconsistent behaviour with icmpv6 Louis Peens
2021-03-10 11:06 ` Ilya Maximets
2021-03-12 22:06   ` Marcelo Leitner
     [not found]     ` <58820355-7337-d51b-32dd-be944600832d@corigine.com>
2021-03-15 15:50       ` Louis Peens
2021-03-16  7:00       ` wenxu
2021-03-16 15:12         ` Louis Peens
2021-03-26 16:58           ` Marcelo Leitner
2021-03-26 18:03             ` Louis Peens

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.