All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC]  Connection Tracking Offload netdev RFC v1.0, part 1/2: command line + implementation
@ 2019-01-23 11:29 Guy Shattah
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Guy Shattah @ 2019-01-23 11:29 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

--------------------------------------------------------------------------
Connection Tracking Offload netdev RFC v1.0 Part 1/2  - TC with  Connection Tracking - command line + implementation
--------------------------------------------------------------------------

OVS recirculation ID is to be translated to TC chain, as described in https://www.netdevconf.org/2.2/papers/efraim-extendtctoct-talk.pdf

------------------------------------------------------------------------------------
CT Matches:
------------------------------------------------------------------------------------
The ct match acts on ct_state bits or ct variables which were modified as a result from a connection tracking action.

Some of the information can be extacted directly from struct nf_conn and the rest of the information could be taken by using
conntrack_mt...() [/net/netfilter/xt_conntrack.c] 


1.  ct_state  - a new variable
    The ct_state match is used to test result of connection tracking.
    The bits are set or unset according to the results of the connection tracking module.

The following Match able ct_state items are supported:
*   ±trk - Tracked - Been through the connection tracker 
*   ±new – a new connection
*   ±est - Established connection 
*   ±dnat - Packet’s source address/port was mangled by NAT. 
*   ±snat - Packet’s destination address/port was mangled by NAT.
*   ±inv - Invalid packet
*   ±rel – Related  to an existing connection
*   ±rpl  - Reply: Connection must be established

Example #1: "tc filter add dev eth5 protocol ip parent ffff: chain 100 flower ct_state 
       +trk +est -dnat action mirred egress redirect dev eth6"

2.  three additional integer variables.
These variables, which can be set from within the ct_action, are introduced: 
     ct_zone - to commit the connection in (u16) Logically separate connection tracking 
               table/Multi-tenancy 
     ct_mark - Attach metadata to particular connections (u32) 
     ct_label – similar to mark (128 bits)  

Example #2:  "tc filter add dev eth5 protocol ip parent ffff: chain 100 flower  
                  ct_state +trk +est ct_label 10 ct_zone 9 action drop "

Complete list of the flags and their  description can be found at:
http://www.openvswitch.org/support/dist-docs/ovs-fields.7.txt


------------------------------------------------------------------------------------
CT actions:
------------------------------------------------------------------------------------
The ct_action action sends packet to ConnTrack ( nf_conntrack_in() method) and then updates ct_state bits according to the result from connection tracking.


[1] CT Action has the following possible arguments:
1. commit: Commit the connection to the connection tracking module which will be
      stored beyond the lifetime of packet in the pipeline.
2. force: The force flag may be used in addition to commit flag to effectively terminate
     the existing connection and start a new one in the current direction.
3.  chain = K (chain is similar to ct 'table' in OVS syntax) :  Clone packet to send to
      connection tracker. When the connection tracker is finished, resume processing
       in chain K for that packet. The original packet continues right after the ct(...) action.
4.  Set variable: ct_zone, ct_mark, ct_label (see description above)
    Example #3:: "tc .... action ct ct_zone 7 commit ct_label 0x0123456789ABCDEF0000111222"
5.  NAT: Specifies the address and port translation for the connection being tracked.
      Example #4:
      "ct_action nat src 10.0.0.1 10.1.1.0" rewrite source ip+port from the list.

      Example #5: "tc ... action ct nat src 10.0.0.1 10.1.1.0" rewrite source ip+port
  from the list.
      Example #6: "tc ... action ct nat auto" rewrite packets automatically from
  saved kernel NAT list
  
-----

[2] CT action also has 3 new parameters
Three new variables which can be set from within the ct_action.
1. ct_zone: 16 bit
2. ct_mark: 32bit
3. ct_label: 128bit

Example #7: tc..... action ct ct_zone 7 commit ct_label x0123456789ABCDEF0000111222 


------

[3] NAT action. 
Supporting
(1) specific NAT for source
(2) specific NAT for destination
(3) automatic.

TC, when instructed when and how to do so, will do a NAT translation by using the kernel NAT module. 
Resulting in a modified skb returning to the following TC chain for further  processing

Example #8: "tc filter add dev eth5 protocol ip parent ffff: chain 100 flower action 
          ct commit nat src 10.0.0.0 10.0.0.255"
Commit a new connection to Conntrack and replace NAT the source ip address

Example #9: "tc filter add dev eth5 protocol ip parent ffff: chain 100 flower action ct 
             commit nat auto"
Commit a new connection to Conntrack and replace NAT the source ip address

Additional examples can be found at OVS NAT patch comments:
https://lwn.net/Articles/674868/


[3] match on newly added variables ( ct_zone, ct_mark, ct_label) Example #10: "tc ct_zone 3 ct_mark 0x333 ...."

----------------------------------------
Connection-Tracking action:
----------------------------
TC data path calls Connection Tracking  nf_conntrack_in() method with skb which returns connTrack result inside skb->_nfct which is of type struct nf_conn.

Connection-Tracking Match:
----------------------------
connection tracking match can be done using conntrack_mt...() [/net/netfilter/xt_conntrack.c] calls which can be used to match connection tracking information. 

Connection-Tracking NAT:
-------------------------------
NAT implementation details are the same as in OVS. As described in:

* https://lwn.net/Articles/674868/
* https://lwn.net/Articles/671459/
* http://www.openvswitch.org/support/ovscon2014/17/1030-conntrack_nat.pdf


Required OVS changes
-------------------------------
1. OVS has to be modified to send Connection-tracking datapath messages to TC 2. OVS datapath has to be enhanced to support enforcement of window-validation


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT
  2019-01-23 11:29 [RFC] Connection Tracking Offload netdev RFC v1.0, part 1/2: command line + implementation Guy Shattah
@ 2019-01-25  2:32 ` Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 1/6] flow_dissector: add support for matching on ConnTrack Marcelo Ricardo Leitner
                     ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

We have been working on the sw datapath of tc+CT. We may not have much
yet, but this should help to shed some light on what is needed,
sw-datapath-wise speaking. Lets grease the wheels!

Some key features are still missing like proper handling of conntrack
labels, indexing all CT entries on a given act_ct action (so that we can
purge them if the action is removed) and properly match on ct_state.

All in all, if anything in there is not aligned with the planning RFC PATCH,
is because it is still in progress, but fell free to highlight it
anyway.

A LOT more will be needed for handling the offloading.

With these patches, this construction:

./tc filter del dev veth1 ingress
./tc filter add dev veth1 ingress proto ip \
        matchall \
        action ct zone 1 commit \
        action goto chain 100
./tc filter add dev veth1 ingress proto ip chain 100 \
        flower ct_zone 2 \
        action drop
./tc filter add dev veth1 ingress proto ip chain 100 \
        flower ct_zone 1 \
        action drop

works, in the sense that replaying a tcp packet gets dropped by the last
rule on chain 100, while the first one misses it. Regarding the goto
chain used here, yes, that action has to be done within the ct action
(as described in the planning and in the FIXME tag in 3rd patch).

Marcelo Ricardo Leitner (6):
  flow_dissector: add support for matching on ConnTrack
  net/sched: flower: add support for matching on ConnTrack
  net/sched: add CT action
  net/sched: act_ct: add support for force flag
  net/sched: act_ct: add support for clear flag
  net/sched: act_ct: allow sending a packet through conntrack multiple
    times

 include/net/flow_dissector.h                |  17 +
 include/net/tc_act/tc_ct.h                  |  29 ++
 include/uapi/linux/netfilter/xt_connlabel.h |   5 +
 include/uapi/linux/pkt_cls.h                |   9 +
 include/uapi/linux/tc_act/tc_ct.h           |  38 ++
 net/core/flow_dissector.c                   |  25 ++
 net/sched/Kconfig                           |   6 +
 net/sched/Makefile                          |   1 +
 net/sched/act_ct.c                          | 385 ++++++++++++++++++++
 net/sched/cls_flower.c                      |  33 ++
 10 files changed, 548 insertions(+)
 create mode 100644 include/net/tc_act/tc_ct.h
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 net/sched/act_ct.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH 1/6] flow_dissector: add support for matching on ConnTrack
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
@ 2019-01-25  2:32   ` Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 2/6] net/sched: flower: " Marcelo Ricardo Leitner
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

This a preliminary patch to add support on flow dissector for matching on
ConnTrack information.

2 FIXMEs in place:
- reusing nf_conn_labels may not be feasible, as we don't want to pull too
  much of ConnTrack into flow dissector.
- CT may be there, but it may not be using labels. As hashing zeroes is
  different than not having the information, it should either be handled as
  a new dissector key or have an extra bit in flow_dissector_key_ct
  indicating its presence. Having it as an extra key may speed searches not
  using labels.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/net/flow_dissector.h                | 17 ++++++++++++++
 include/uapi/linux/netfilter/xt_connlabel.h |  5 +++++
 net/core/flow_dissector.c                   | 25 +++++++++++++++++++++
 3 files changed, 47 insertions(+)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 6a4586dcdeded9b6cfe7d299d368b6a6ea6801cc..2b5a20a0f65c28d9907697ac3ef7e03d0e20209a 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -4,7 +4,9 @@
 
 #include <linux/types.h>
 #include <linux/in6.h>
+#include <linux/bits.h>
 #include <uapi/linux/if_ether.h>
+#include <uapi/linux/netfilter/xt_connlabel.h>
 
 /**
  * struct flow_dissector_key_control:
@@ -199,6 +201,19 @@ struct flow_dissector_key_ip {
 	__u8	ttl;
 };
 
+/**
+ * struct flow_dissector_key_ct:
+ */
+struct flow_dissector_key_ct {
+	__u16	zone;
+	__u8	state;
+	__u32	mark;
+	/* FIXME: Use nf_conn_labels instead? But it pulls all netfilter */
+#define NF_CT_LABELS_MAX_SIZE ((XT_CONNLABEL_MAXBIT + 1) / BITS_PER_BYTE)
+	unsigned long label[NF_CT_LABELS_MAX_SIZE / sizeof(long)];
+#undef NF_CT_LABELS_MAX_SIZE
+};
+
 enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
 	FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -224,6 +239,7 @@ enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_CVLAN, /* struct flow_dissector_key_flow_vlan */
 	FLOW_DISSECTOR_KEY_ENC_IP, /* struct flow_dissector_key_ip */
 	FLOW_DISSECTOR_KEY_ENC_OPTS, /* struct flow_dissector_key_enc_opts */
+	FLOW_DISSECTOR_KEY_CT, /* struct flow_dissector_key_ct */
 
 	FLOW_DISSECTOR_KEY_MAX,
 };
@@ -254,6 +270,7 @@ struct flow_keys {
 #define FLOW_KEYS_HASH_START_FIELD basic
 	struct flow_dissector_key_basic basic;
 	struct flow_dissector_key_tags tags;
+	struct flow_dissector_key_ct ct;
 	struct flow_dissector_key_vlan vlan;
 	struct flow_dissector_key_vlan cvlan;
 	struct flow_dissector_key_keyid keyid;
diff --git a/include/uapi/linux/netfilter/xt_connlabel.h b/include/uapi/linux/netfilter/xt_connlabel.h
index 2312f0ec07b2791ffaece0a95eebaefa727f14be..20a1c1fe79a7676c4b9f8727c393443f2d545784 100644
--- a/include/uapi/linux/netfilter/xt_connlabel.h
+++ b/include/uapi/linux/netfilter/xt_connlabel.h
@@ -1,4 +1,7 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _NF_CONNTRACK_CONNLABEL_H
+#define _NF_CONNTRACK_CONNLABEL_H
+
 #include <linux/types.h>
 
 #define XT_CONNLABEL_MAXBIT 127
@@ -11,3 +14,5 @@ struct xt_connlabel_mtinfo {
 	__u16 bit;
 	__u16 options;
 };
+
+#endif
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 2e8d91e54179c32b41ab53b9fa424fe1691acde0..73336466423aedff9a6fc4724f6c6efe44d5225a 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -26,6 +26,8 @@
 #include <scsi/fc/fc_fcoe.h>
 #include <uapi/linux/batadv_packet.h>
 #include <linux/bpf.h>
+#include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_labels.h>
 
 static DEFINE_MUTEX(flow_dissector_mutex);
 
@@ -798,6 +800,29 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	}
 	rcu_read_unlock();
 
+	if (dissector_uses_key(flow_dissector,
+			       FLOW_DISSECTOR_KEY_CT)) {
+		struct flow_dissector_key_ct *key_ct;
+		enum ip_conntrack_info ctinfo;
+		struct nf_conn_labels *labels;
+		struct nf_conn *ct;
+
+		ct = nf_ct_get(skb, &ctinfo);
+		if (ct) {
+			key_ct = skb_flow_dissector_target(flow_dissector,
+							   FLOW_DISSECTOR_KEY_CT,
+							   target_container);
+			key_ct->zone = ct->zone.id;
+			key_ct->state = ctinfo;
+			key_ct->mark = ct->mark;
+			labels = nf_ct_labels_find(ct);
+			/* FIXME: should this be a new key then? */
+			if (labels)
+				memcpy(key_ct->label, labels->bits,
+				       sizeof(key_ct->label));
+		}
+	}
+
 	if (dissector_uses_key(flow_dissector,
 			       FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
 		struct ethhdr *eth = eth_hdr(skb);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 2/6] net/sched: flower: add support for matching on ConnTrack
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 1/6] flow_dissector: add support for matching on ConnTrack Marcelo Ricardo Leitner
@ 2019-01-25  2:32   ` Marcelo Ricardo Leitner
  2019-01-25 13:37     ` Simon Horman
  2019-01-25  2:32   ` [RFC PATCH 3/6] net/sched: add CT action Marcelo Ricardo Leitner
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Hook on flow dissector's new interface on ConnTrack from previous patch.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/pkt_cls.h |  9 +++++++++
 net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -490,6 +490,15 @@ enum {
 	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
 	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
 
+	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
+	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
+	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
+	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
+	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
+	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 85e9f8e1da10aa7b01b0f51768edfefbe63d6a10..430b7fceeca0998b8c904acd91f8de53571814ff 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -57,6 +57,7 @@ struct fl_flow_key {
 	struct flow_dissector_key_enc_opts enc_opts;
 	struct flow_dissector_key_ports tp_min;
 	struct flow_dissector_key_ports tp_max;
+	struct flow_dissector_key_ct ct;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -1079,6 +1080,22 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 
 	fl_set_key_ip(tb, true, &key->enc_ip, &mask->enc_ip);
 
+	fl_set_key_val(tb, &key->ct.mark, TCA_FLOWER_KEY_CT_MARK,
+		       &mask->ct.mark, TCA_FLOWER_KEY_CT_MARK_MASK,
+		       sizeof(key->ct.mark));
+
+	fl_set_key_val(tb, &key->ct.zone, TCA_FLOWER_KEY_CT_ZONE,
+		       &mask->ct.zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
+		       sizeof(key->ct.zone));
+
+	fl_set_key_val(tb, &key->ct.state, TCA_FLOWER_KEY_CT_STATE,
+		       &mask->ct.state, TCA_FLOWER_KEY_CT_STATE_MASK,
+		       sizeof(key->ct.state));
+
+	fl_set_key_val(tb, &key->ct.label, TCA_FLOWER_KEY_CT_LABEL,
+		       &mask->ct.label, TCA_FLOWER_KEY_CT_LABEL_MASK,
+		       sizeof(key->ct.label));
+
 	if (tb[TCA_FLOWER_KEY_ENC_OPTS]) {
 		ret = fl_set_enc_opt(tb, key, mask, extack);
 		if (ret)
@@ -1183,6 +1200,8 @@ static void fl_init_dissector(struct flow_dissector *dissector,
 			     FLOW_DISSECTOR_KEY_ENC_IP, enc_ip);
 	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
 			     FLOW_DISSECTOR_KEY_ENC_OPTS, enc_opts);
+	FL_KEY_SET_IF_MASKED(mask, keys, cnt,
+			     FLOW_DISSECTOR_KEY_CT, ct);
 
 	skb_flow_dissector_init(dissector, keys, cnt);
 }
@@ -1994,6 +2013,20 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net,
 	    fl_dump_key_enc_opt(skb, &key->enc_opts, &mask->enc_opts))
 		goto nla_put_failure;
 
+	if (fl_dump_key_val(skb, &key->ct.zone, TCA_FLOWER_KEY_CT_ZONE,
+			    &mask->ct.zone, TCA_FLOWER_KEY_CT_ZONE_MASK,
+			    sizeof(key->ct.zone)) ||
+	    fl_dump_key_val(skb, &key->ct.mark, TCA_FLOWER_KEY_CT_MARK,
+			    &mask->ct.mark, TCA_FLOWER_KEY_CT_MARK_MASK,
+			    sizeof(key->ct.mark)) ||
+	    fl_dump_key_val(skb, &key->ct.state, TCA_FLOWER_KEY_CT_STATE,
+			    &mask->ct.state, TCA_FLOWER_KEY_CT_STATE_MASK,
+			    sizeof(key->ct.state)) ||
+	    fl_dump_key_val(skb, &key->ct.label, TCA_FLOWER_KEY_CT_LABEL,
+			    &mask->ct.label, TCA_FLOWER_KEY_CT_LABEL_MASK,
+			    sizeof(key->ct.label)))
+		goto nla_put_failure;
+
 	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
 		goto nla_put_failure;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 3/6] net/sched: add CT action
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 1/6] flow_dissector: add support for matching on ConnTrack Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 2/6] net/sched: flower: " Marcelo Ricardo Leitner
@ 2019-01-25  2:32   ` Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 4/6] net/sched: act_ct: add support for force flag Marcelo Ricardo Leitner
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

This is where most of the code is and the main pain points.

The implementation is using spinlock on the datapath for now just for
simplicity. Lets get the basics done and then move forward.

Open points:
- nf_ct_netns_get() accepts IPv4, IPv6 or both. It would be interesting to
  match on what was specified to the filter, but not sure if that's really
  wanted neither how.
- iptables CT target can set a different zone for each direction and also
  infer it from the mark. These are NOT used by OvS. We can focus on this
  later but probably want to consider the need now.
- datapath fork
  As described in the planning RFC PATCH, OvS ct action creates a fork in the
  datapath: consider that the packet is being sent through conntrack. The
  original packet, without conntrack information, will first finish
  executing the current list of actions. After that is done, a packet clone
  created by the ct action will be inserted into the specified chain, and
  resume its processing. Somehow we need to be able to inject this packet
  and we can't use the interface backlog for it, as that would create a
  massive reordering.
- The handling of multiple calls to CT action is needed because the first
  call may be on a packet still with tunnel headers, and then without it.
  It is handled in a subsequent patch by dropping any conntrack present in
  the skb.

On protocol type on datapath, note that tc can match on both at once, IPv4
and IPv6.  So far we can't easily tell which filter tc is using. We could
tell conntrack to work with both (NFPROTO_INET), but that would be kind of
a lazy solution here.  Instead, lets trust the packet header: if it is
here, it's because tc matched, so we can either process it as IPv4 or IPv6.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/net/tc_act/tc_ct.h        |  29 +++
 include/uapi/linux/tc_act/tc_ct.h |  36 +++
 net/sched/Kconfig                 |   6 +
 net/sched/Makefile                |   1 +
 net/sched/act_ct.c                | 356 ++++++++++++++++++++++++++++++
 5 files changed, 428 insertions(+)
 create mode 100644 include/net/tc_act/tc_ct.h
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 net/sched/act_ct.c

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
new file mode 100644
index 0000000000000000000000000000000000000000..65682460f501b5886d9266f811c8ed30a4510304
--- /dev/null
+++ b/include/net/tc_act/tc_ct.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __NET_TC_CT_H
+#define __NET_TC_CT_H
+
+#include <linux/types.h>
+#include <net/act_api.h>
+#include <uapi/linux/netfilter/xt_connlabel.h>
+
+struct tcf_ct {
+	struct tc_action common;
+	struct net *net;
+
+	u16 zone;
+	u32 mark;
+	u32 mark_mask;
+	u32 chain;
+
+	/* FIXME: Use nf_conn_labels instead? But it pulls all netfilter */
+#define NF_CT_LABELS_MAX_SIZE ((XT_CONNLABEL_MAXBIT + 1) / BITS_PER_BYTE)
+	u32 label[NF_CT_LABELS_MAX_SIZE / sizeof(long)];
+	u32 label_mask[NF_CT_LABELS_MAX_SIZE / sizeof(long)];
+
+	u32 flags;
+	struct nf_conn *ct;
+};
+
+#define to_tcf_ct(a) ((struct tcf_ct *)a)
+
+#endif /* __NET_TC_CT_H */
diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
new file mode 100644
index 0000000000000000000000000000000000000000..37b95cda1dedd283b0244a03a20860ba22966dfa
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_CT_H
+#define __LINUX_TC_CT_H
+
+#include <linux/pkt_cls.h>
+#include <linux/types.h>
+
+#define TCA_ACT_CT 27
+
+enum {
+	TCA_CT_UNSPEC,
+	TCA_CT_TM,
+	TCA_CT_PARMS,
+	TCA_CT_PAD,
+	TCA_CT_ZONE,
+	TCA_CT_MARK,
+	TCA_CT_MARK_MASK,
+	TCA_CT_LABEL,
+	TCA_CT_LABEL_MASK,
+	TCA_CT_CHAIN,
+	TCA_CT_FLAGS,
+	__TCA_CT_MAX
+};
+#define TCA_CT_MAX (__TCA_CT_MAX - 1)
+
+enum {
+	TC_CT_COMMIT,
+	__TC_CT_MAX
+};
+#define TC_CT_MAX (__TC_CT_MAX - 1)
+
+struct tc_ct {
+	tc_gen;
+};
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 1b9afdee5ba976ba64200d8f85050cf053b7d65c..2c7f963b78f7511bbee8814b1c5bfdb488386c5d 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -912,6 +912,12 @@ config NET_ACT_TUNNEL_KEY
 	  To compile this code as a module, choose M here: the
 	  module will be called act_tunnel_key.
 
+config NET_ACT_CT
+        tristate "Conntrack manipulation"
+        depends on NET_CLS_ACT
+        ---help---
+	  FIXME
+
 config NET_IFE_SKBMARK
         tristate "Support to encoding decoding skb mark on IFE action"
         depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 8a40431d7b5c420d86427933a9af383e093812b7..f2f6db5b8352a9594b72bc6197caf2228b45c079 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
 obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
 obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
+obj-$(CONFIG_NET_ACT_CT)	+= act_ct.o
 obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)	+= act_meta_skbprio.o
 obj-$(CONFIG_NET_IFE_SKBTCINDEX)	+= act_meta_skbtcindex.o
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
new file mode 100644
index 0000000000000000000000000000000000000000..f69509954149a0c8be710916a5289a4448049b5d
--- /dev/null
+++ b/net/sched/act_ct.c
@@ -0,0 +1,356 @@
+/*
+ * Conntrack manipulation
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ */
+
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/netfilter.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/tc_act/tc_ct.h>
+#include <net/act_api.h>
+#include <net/netlink.h>
+#include <net/tc_act/tc_ct.h>
+#include <net/netfilter/nf_conntrack_core.h>
+#include <net/netfilter/nf_conntrack_labels.h>
+
+static unsigned int ct_net_id;
+static struct tc_action_ops act_ct_ops;
+
+static const struct nla_policy ct_policy[TCA_CT_MAX + 1] = {
+	[TCA_CT_PARMS]		= { .len = sizeof(struct tc_ct) },
+	[TCA_CT_ZONE]		= { .type = NLA_U16 },
+	[TCA_CT_MARK]		= { .type = NLA_U32 },
+	[TCA_CT_MARK_MASK]	= { .type = NLA_U32 },
+	[TCA_CT_LABEL]		= { .type = NLA_BINARY,
+				    .len = 128/BITS_PER_BYTE },
+	[TCA_CT_LABEL_MASK]	= { .type = NLA_BINARY,
+				    .len = 128/BITS_PER_BYTE },
+	[TCA_CT_CHAIN]		= { .type = NLA_U32 },
+	[TCA_CT_FLAGS]		= { .type = NLA_U32 },
+};
+
+static int tcf_ct_init(struct net *net, struct nlattr *nla, struct nlattr *est,
+		       struct tc_action **a, int ovr, int bind,
+		       bool rtnl_held, struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+	struct nlattr *tb[TCA_CT_MAX + 1];
+	struct nf_conntrack_zone zone;
+	struct nf_conn *ct, *_ct;
+	struct tc_ct *parm;
+	int ret = 0, err;
+	struct tcf_ct *p;
+	u16 zone_id = NF_CT_DEFAULT_ZONE_ID;
+
+	if (!nla)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CT_MAX, nla, ct_policy, NULL);
+	if (err < 0)
+		return err;
+
+	if (!tb[TCA_CT_PARMS])
+		return -EINVAL;
+	parm = nla_data(tb[TCA_CT_PARMS]);
+
+	if (tb[TCA_CT_ZONE])
+		zone_id = nla_get_u16(tb[TCA_CT_ZONE]);
+
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!err) {
+		ret = tcf_idr_create(tn, parm->index, est, a,
+				     &act_ct_ops, bind, false);
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
+			return ret;
+		}
+		ret = ACT_P_CREATED;
+	} else if (err > 0) {
+		if (bind)
+			return 0;
+		if (!ovr) {
+			ret = -EEXIST;
+			goto err1;
+		}
+	} else {
+		return err;
+	}
+
+	/* XXX Need translation from AF_INET to NFPROTO_ */
+	err = nf_ct_netns_get(net, NFPROTO_IPV4 /* XXX par->family */);
+	if (err < 0) {
+		ret = err;
+		goto err1;
+	}
+
+	/* XXX: CT target supports setting a different zone on each direction */
+	/* XXX: CT supports inferring zone id from the mark, but we probably
+	 * don't need that here.
+	if (info->flags & XT_CT_ZONE_MARK)
+		zone.flags |= NF_CT_FLAG_MARK;
+	 */
+	nf_ct_zone_init(&zone, zone_id, NF_CT_DEFAULT_ZONE_DIR, 0);
+
+	ct = nf_ct_tmpl_alloc(net, &zone, GFP_KERNEL);
+	if (!ct) {
+		ret = -ENOMEM;
+		goto err1;
+	}
+
+	__set_bit(IPS_CONFIRMED_BIT, &ct->status);
+	nf_conntrack_get(&ct->ct_general);
+
+	p = to_tcf_ct(*a);
+	spin_lock_bh(&p->tcf_lock);
+	p->zone = zone_id;
+	if (tb[TCA_CT_MARK] && tb[TCA_CT_MARK_MASK]) {
+		p->mark = nla_get_u32(tb[TCA_CT_MARK]);
+		p->mark_mask = nla_get_u32(tb[TCA_CT_MARK_MASK]);
+	}
+	if (tb[TCA_CT_LABEL] && tb[TCA_CT_LABEL_MASK]) {
+		nla_memcpy(p->label, tb[TCA_CT_LABEL], sizeof(p->label));
+		nla_memcpy(p->label_mask, tb[TCA_CT_LABEL_MASK],
+			   sizeof(p->label_mask));
+		nf_connlabels_replace(ct, p->label, p->label_mask,
+				      sizeof(p->label)/sizeof(u32));
+	}
+	if (tb[TCA_CT_CHAIN])
+		p->chain = nla_get_u32(tb[TCA_CT_CHAIN]);
+	if (tb[TCA_CT_FLAGS])
+		p->flags = nla_get_u32(tb[TCA_CT_FLAGS]);
+	p->net = net;
+
+	p->tcf_action = parm->action;
+
+	_ct = p->ct;
+	p->ct = ct;
+
+	spin_unlock_bh(&p->tcf_lock);
+
+	if (_ct) {
+		nf_conntrack_put(&_ct->ct_general);
+	}
+
+	if (ret == ACT_P_CREATED)
+		tcf_idr_insert(tn, *a);
+
+	return ret;
+
+err1:
+	tcf_idr_release(*a, bind);
+	return ret;
+}
+
+static void tcf_ct_cleanup(struct tc_action *a)
+{
+	struct tcf_ct *p = to_tcf_ct(a);
+
+	if (p->ct) {
+		nf_conntrack_put(&p->ct->ct_general);
+	}
+}
+
+static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
+		      struct tcf_result *res)
+{
+	struct tcf_ct *p = to_tcf_ct(a);
+	struct nf_hook_state state = {
+		.hook = NF_INET_PRE_ROUTING,
+	};
+	struct nf_conn *ct, *new_ct;
+	u32 mark, mark_mask, flags;
+	int action, err;
+	int nh_ofs;
+
+	spin_lock(&p->tcf_lock);
+
+	tcf_lastuse_update(&p->tcf_tm);
+	mark = p->mark;
+	mark_mask = p->mark_mask;
+	flags = p->flags;
+	state.net = p->net;
+	action = p->tcf_action;
+	ct = p->ct;
+	if (ct)
+		/* This gets transferred to conntrack */
+		nf_conntrack_get(&ct->ct_general);
+
+	bstats_update(&p->tcf_bstats, skb);
+
+	spin_unlock(&p->tcf_lock);
+
+	if (unlikely(action == TC_ACT_SHOT))
+		goto drop;
+
+	/* FIXME: For when we support cloning the packet
+	orig_skb = skb;
+	skb = skb_clone(orig_skb, GFP_ATOMIC);
+	 */
+
+	/* The conntrack module expects to be working at L3. */
+	nh_ofs = skb_network_offset(skb);
+	skb_pull_rcsum(skb, nh_ofs);
+	/* FIXME: OvS trims the packet here. Should we? */
+
+	/* FIXME: Need to handle multiple calls to CT action here. */
+	if (ct)
+		nf_ct_set(skb, ct, IP_CT_NEW);
+
+	if (skb->protocol == htons(ETH_P_IPV6)) {
+		state.pf = NFPROTO_IPV6;
+	} else {
+		/* FIXME: should we restrict this even further? */
+		state.pf = NFPROTO_IPV4;
+	}
+
+	err = nf_conntrack_in(skb, &state);
+	if (err != NF_ACCEPT)
+		goto drop;
+
+	new_ct = (struct nf_conn *)skb_nfct(skb);
+	if (new_ct) {
+		if (mark_mask) {
+			new_ct->mark = (new_ct->mark &~ mark_mask) | (mark & mark_mask);
+			if (nf_ct_is_confirmed(new_ct))
+				nf_conntrack_event_cache(IPCT_MARK, new_ct);
+		}
+
+		nf_ct_deliver_cached_events(new_ct);
+	}
+
+	if (flags & BIT(TC_CT_COMMIT)) {
+		err = nf_conntrack_confirm(skb);
+		if (err != NF_ACCEPT) {
+			printk("failed to confirm %d\n", err);
+			goto drop;
+		}
+	}
+
+	/* FIXME: inject the packet into another chain (as it would happen if
+	 * it had a miss in hw too)
+	 */
+
+	skb_push(skb, nh_ofs);
+	skb_postpush_rcsum(skb, skb->data, nh_ofs);
+	return TC_ACT_PIPE;
+
+drop:
+	spin_lock(&p->tcf_lock);
+	p->tcf_qstats.drops++;
+	spin_unlock(&p->tcf_lock);
+	return TC_ACT_SHOT;
+}
+
+static int tcf_ct_dump(struct sk_buff *skb, struct tc_action *a,
+		       int bind, int ref)
+{
+	unsigned char *b = skb_tail_pointer(skb);
+	struct tcf_ct *p = to_tcf_ct(a);
+	struct tc_ct opt = {
+		.index    = p->tcf_index,
+		.refcnt   = refcount_read(&p->tcf_refcnt) - ref,
+		.bindcnt  = atomic_read(&p->tcf_bindcnt) - bind,
+	};
+	struct tcf_t t;
+
+	spin_lock_bh(&p->tcf_lock);
+	nla_put_u16(skb, TCA_CT_ZONE, p->zone);
+	nla_put_u32(skb, TCA_CT_MARK, p->mark);
+	nla_put_u32(skb, TCA_CT_MARK_MASK, p->mark_mask);
+	nla_put_u32(skb, TCA_CT_CHAIN, p->chain);
+	nla_put(skb, TCA_CT_LABEL, sizeof(p->label), p->label);
+	nla_put(skb, TCA_CT_LABEL_MASK, sizeof(p->label_mask), p->label_mask);
+	nla_put_u32(skb, TCA_CT_FLAGS, p->flags);
+	opt.action = p->tcf_action;
+
+	if (nla_put(skb, TCA_CT_PARMS, sizeof(opt), &opt))
+		goto nla_put_failure;
+
+	tcf_tm_dump(&t, &p->tcf_tm);
+	if (nla_put_64bit(skb, TCA_CT_TM, sizeof(t), &t, TCA_CT_PAD))
+		goto nla_put_failure;
+	spin_unlock_bh(&p->tcf_lock);
+
+	return skb->len;
+
+nla_put_failure:
+	spin_unlock_bh(&p->tcf_lock);
+	nlmsg_trim(skb, b);
+	return -1;
+}
+
+static int tcf_ct_walker(struct net *net, struct sk_buff *skb,
+			 struct netlink_callback *cb, int type,
+			 const struct tc_action_ops *ops,
+			 struct netlink_ext_ack *extack)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+
+	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static int tcf_ct_search(struct net *net, struct tc_action **a, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+
+	return tcf_idr_search(tn, a, index);
+}
+
+static struct tc_action_ops act_ct_ops = {
+	.kind		=	"ct",
+	.type		=	TCA_ACT_CT,
+	.owner		=	THIS_MODULE,
+	.act		=	tcf_ct_act,
+	.dump		=	tcf_ct_dump,
+	.init		=	tcf_ct_init,
+	.cleanup	=	tcf_ct_cleanup,
+	.walk		=	tcf_ct_walker,
+	.lookup		=	tcf_ct_search,
+	.size		=	sizeof(struct tcf_ct),
+};
+
+static __net_init int ct_init_net(struct net *net)
+{
+	struct tc_action_net *tn = net_generic(net, ct_net_id);
+
+	return tc_action_net_init(tn, &act_ct_ops);
+}
+
+static void __net_exit ct_exit_net(struct list_head *net_list)
+{
+	tc_action_net_exit(net_list, ct_net_id);
+}
+
+static struct pernet_operations ct_net_ops = {
+	.init = ct_init_net,
+	.exit_batch = ct_exit_net,
+	.id   = &ct_net_id,
+	.size = sizeof(struct tc_action_net),
+};
+
+MODULE_DESCRIPTION("Connection Tracking actions");
+MODULE_LICENSE("GPL");
+
+static int __init ct_init_module(void)
+{
+	return tcf_register_action(&act_ct_ops, &ct_net_ops);
+}
+
+static void __exit ct_cleanup_module(void)
+{
+	tcf_unregister_action(&act_ct_ops, &ct_net_ops);
+}
+
+module_init(ct_init_module);
+module_exit(ct_cleanup_module);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 4/6] net/sched: act_ct: add support for force flag
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
                     ` (2 preceding siblings ...)
  2019-01-25  2:32   ` [RFC PATCH 3/6] net/sched: add CT action Marcelo Ricardo Leitner
@ 2019-01-25  2:32   ` Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 5/6] net/sched: act_ct: add support for clear flag Marcelo Ricardo Leitner
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

OvS ct action has this 'force' flag, which basically forces ConnTrack to
consider that this packet, this specific direction, is the original one.

Implement that similarly: if the ct entry is there and the direction is not
the expected one, destroy it and create a new one.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/tc_act/tc_ct.h |  1 +
 net/sched/act_ct.c                | 16 +++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index 37b95cda1dedd283b0244a03a20860ba22966dfa..009e53ee83fb3125bc5c4ca86954af3bf6a0287a 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -25,6 +25,7 @@ enum {
 
 enum {
 	TC_CT_COMMIT,
+	TC_CT_FORCE,
 	__TC_CT_MAX
 };
 #define TC_CT_MAX (__TC_CT_MAX - 1)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index f69509954149a0c8be710916a5289a4448049b5d..8a1b5d6a7cd8360c50011d992368464db213a020 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -165,6 +165,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 		      struct tcf_result *res)
 {
 	struct tcf_ct *p = to_tcf_ct(a);
+	enum ip_conntrack_info ctinfo;
 	struct nf_hook_state state = {
 		.hook = NF_INET_PRE_ROUTING,
 	};
@@ -173,6 +174,8 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 	int action, err;
 	int nh_ofs;
 
+	/* Again needs to be here because we need a new ref on the ct. */
+again:
 	spin_lock(&p->tcf_lock);
 
 	tcf_lastuse_update(&p->tcf_tm);
@@ -218,8 +221,19 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 	if (err != NF_ACCEPT)
 		goto drop;
 
-	new_ct = (struct nf_conn *)skb_nfct(skb);
+	new_ct = nf_ct_get(skb, &ctinfo);
 	if (new_ct) {
+		/* Force conntrack entry direction. */
+		if (flags & BIT(TC_CT_FORCE) &&
+		    CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) {
+			if (nf_ct_is_confirmed(new_ct))
+				nf_ct_delete(new_ct, 0, 0);
+
+			nf_conntrack_put(&new_ct->ct_general);
+			nf_ct_set(skb, NULL, 0);
+			goto again;
+		}
+
 		if (mark_mask) {
 			new_ct->mark = (new_ct->mark &~ mark_mask) | (mark & mark_mask);
 			if (nf_ct_is_confirmed(new_ct))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 5/6] net/sched: act_ct: add support for clear flag
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
                     ` (3 preceding siblings ...)
  2019-01-25  2:32   ` [RFC PATCH 4/6] net/sched: act_ct: add support for force flag Marcelo Ricardo Leitner
@ 2019-01-25  2:32   ` Marcelo Ricardo Leitner
  2019-01-25  2:32   ` [RFC PATCH 6/6] net/sched: act_ct: allow sending a packet through conntrack multiple times Marcelo Ricardo Leitner
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  6 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

OvS ct action supports a 'clear' flag: it removes any ConnTrack marking in
the packet. Implement it similarly here: drop the reference and return.
Note that the packet is also marked as UNTRACKED.

Yes, parsing should ensure that clear is not used with any other flags as
they are mutually exclusive.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/tc_act/tc_ct.h |  1 +
 net/sched/act_ct.c                | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index 009e53ee83fb3125bc5c4ca86954af3bf6a0287a..636f435b86e006aa36034f86c65fd5c220ca8a13 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -26,6 +26,7 @@ enum {
 enum {
 	TC_CT_COMMIT,
 	TC_CT_FORCE,
+	TC_CT_CLEAR,
 	__TC_CT_MAX
 };
 #define TC_CT_MAX (__TC_CT_MAX - 1)
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 8a1b5d6a7cd8360c50011d992368464db213a020..77d55c05ed95d8abc8c35a3d19f453a586139914 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -196,6 +196,18 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 	if (unlikely(action == TC_ACT_SHOT))
 		goto drop;
 
+	if (flags & BIT(TC_CT_CLEAR)) {
+		new_ct = nf_ct_get(skb, &ctinfo);
+		if (new_ct) {
+			if (nf_ct_is_confirmed(new_ct))
+				nf_ct_delete(new_ct, 0, 0);
+
+			nf_conntrack_put(&new_ct->ct_general);
+			nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
+			goto out;
+		}
+	}
+
 	/* FIXME: For when we support cloning the packet
 	orig_skb = skb;
 	skb = skb_clone(orig_skb, GFP_ATOMIC);
@@ -257,6 +269,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 
 	skb_push(skb, nh_ofs);
 	skb_postpush_rcsum(skb, skb->data, nh_ofs);
+out:
 	return TC_ACT_PIPE;
 
 drop:
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH 6/6] net/sched: act_ct: allow sending a packet through conntrack multiple times
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
                     ` (4 preceding siblings ...)
  2019-01-25  2:32   ` [RFC PATCH 5/6] net/sched: act_ct: add support for clear flag Marcelo Ricardo Leitner
@ 2019-01-25  2:32   ` Marcelo Ricardo Leitner
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  6 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:32 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

The first time it may use conntrack to track the tunnel information,
then jump into another chain, and go through conntrack again so that
the inner header is tracked.

This commit clears previous conntrack info if any so that we can
submit it to conntrack again.

Header offsets are supposed to be updated by the decapsulating action.

The main difference from just adding another act_ct(clear) action is that
the clear flag also sets the UNTRACKED mark in the packet (like OvS does).

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 net/sched/act_ct.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 77d55c05ed95d8abc8c35a3d19f453a586139914..6e446db3bcdda772dbe1090d5c584156f6cc59eb 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -196,16 +196,19 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 	if (unlikely(action == TC_ACT_SHOT))
 		goto drop;
 
-	if (flags & BIT(TC_CT_CLEAR)) {
-		new_ct = nf_ct_get(skb, &ctinfo);
-		if (new_ct) {
-			if (nf_ct_is_confirmed(new_ct))
-				nf_ct_delete(new_ct, 0, 0);
+	new_ct = nf_ct_get(skb, &ctinfo);
+	if (new_ct) {
+		if (nf_ct_is_confirmed(new_ct))
+			nf_ct_delete(new_ct, 0, 0);
 
-			nf_conntrack_put(&new_ct->ct_general);
+		nf_conntrack_put(&new_ct->ct_general);
+
+		if (flags & BIT(TC_CT_CLEAR)) {
 			nf_ct_set(skb, NULL, IP_CT_UNTRACKED);
 			goto out;
 		}
+
+		nf_ct_set(skb, NULL, 0);
 	}
 
 	/* FIXME: For when we support cloning the packet
@@ -218,7 +221,6 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 	skb_pull_rcsum(skb, nh_ofs);
 	/* FIXME: OvS trims the packet here. Should we? */
 
-	/* FIXME: Need to handle multiple calls to CT action here. */
 	if (ct)
 		nf_ct_set(skb, ct, IP_CT_NEW);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT
  2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
                     ` (5 preceding siblings ...)
  2019-01-25  2:32   ` [RFC PATCH 6/6] net/sched: act_ct: allow sending a packet through conntrack multiple times Marcelo Ricardo Leitner
@ 2019-01-25  2:33   ` Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 1/5] flower: add support for CT fields Marcelo Ricardo Leitner
                       ` (4 more replies)
  6 siblings, 5 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:33 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Same comments as for the kernel patches. Whatever is not in accordance
to the planning RFC, is because it is still in progress.

Marcelo Ricardo Leitner (5):
  flower: add support for CT fields
  act_ct: first import
  act_ct: add support for commit flag
  act/ct: add support for force flag
  act/ct: add support for clear flag

 include/uapi/linux/pkt_cls.h      |   9 +
 include/uapi/linux/tc_act/tc_ct.h |  38 ++++
 tc/Makefile                       |   1 +
 tc/f_flower.c                     | 158 +++++++++++++-
 tc/m_ct.c                         | 337 ++++++++++++++++++++++++++++++
 5 files changed, 541 insertions(+), 2 deletions(-)
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 tc/m_ct.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCH iproute2 1/5] flower: add support for CT fields
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
@ 2019-01-25  2:33     ` Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 2/5] act_ct: first import Marcelo Ricardo Leitner
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:33 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Except ct_label, just a place holder for now.

Parsing of ct_state definitely should be handled better.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/pkt_cls.h |   9 ++
 tc/f_flower.c                | 158 ++++++++++++++++++++++++++++++++++-
 2 files changed, 165 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -490,6 +490,15 @@ enum {
 	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
 	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
 
+	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
+	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
+	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
+	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
+	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
+	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
+	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
+	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
+
 	__TCA_FLOWER_MAX,
 };
 
diff --git a/tc/f_flower.c b/tc/f_flower.c
index c563666702b50973703f37c0174bfae3f242fdf3..40706d7156131f0b6603a4abdbe9108451e5cff7 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -81,7 +81,11 @@ static void explain(void)
 		"                       enc_ttl MASKED-IP_TTL |\n"
 		"                       geneve_opts MASKED-OPTIONS |\n"
 		"                       ip_flags IP-FLAGS | \n"
-		"                       enc_dst_port [ port_number ] }\n"
+		"                       enc_dst_port [ port_number ] | \n"
+		"                       ct_mark MASKED-MARK | \n"
+		"                       ct_zone MASKED-ZONE | \n"
+		"                       ct_state MASKED-state | \n"
+		"                       ct_label MASKED-label } \n"
 		"       FILTERID := X:Y:Z\n"
 		"       MASKED_LLADDR := { LLADDR | LLADDR/MASK | LLADDR/BITS }\n"
 		"       ACTION-SPEC := ... look at individual actions\n"
@@ -331,7 +335,7 @@ static int flower_parse_arp_ip_addr(char *str, __be16 eth_type,
 
 static int flower_parse_u8(char *str, int value_type, int mask_type,
 			   int (*value_from_name)(const char *str,
-						 __u8 *value),
+						  __u8 *value),
 			   bool (*value_validate)(__u8 value),
 			   struct nlmsghdr *n)
 {
@@ -372,6 +376,92 @@ err:
 	return err;
 }
 
+static int flower_parse_u16(char *str, int value_type, int mask_type,
+			    int (*value_from_name)(const char *str,
+						   __u16 *value),
+			    bool (*value_validate)(__u16 value),
+			    struct nlmsghdr *n)
+{
+	char *slash;
+	int ret, err = -1;
+	__u16 value, mask;
+
+	slash = strchr(str, '/');
+	if (slash)
+		*slash = '\0';
+
+	ret = value_from_name ? value_from_name(str, &value) : -1;
+	if (ret < 0) {
+		ret = get_u16(&value, str, 10);
+		if (ret)
+			goto err;
+	}
+
+	if (value_validate && !value_validate(value))
+		goto err;
+
+	if (slash) {
+		ret = get_u16(&mask, slash + 1, 10);
+		if (ret)
+			goto err;
+	}
+	else {
+		mask = UINT16_MAX;
+	}
+
+	addattr16(n, MAX_MSG, value_type, value);
+	addattr16(n, MAX_MSG, mask_type, mask);
+
+	err = 0;
+err:
+	if (slash)
+		*slash = '/';
+	return err;
+}
+
+static int flower_parse_u32(char *str, int value_type, int mask_type,
+			    int (*value_from_name)(const char *str,
+						   __u32 *value),
+			    bool (*value_validate)(__u32 value),
+			    struct nlmsghdr *n)
+{
+	char *slash;
+	int ret, err = -1;
+	__u32 value, mask;
+
+	slash = strchr(str, '/');
+	if (slash)
+		*slash = '\0';
+
+	ret = value_from_name ? value_from_name(str, &value) : -1;
+	if (ret < 0) {
+		ret = get_u32(&value, str, 10);
+		if (ret)
+			goto err;
+	}
+
+	if (value_validate && !value_validate(value))
+		goto err;
+
+	if (slash) {
+		ret = get_u32(&mask, slash + 1, 10);
+		if (ret)
+			goto err;
+	}
+	else {
+		mask = UINT32_MAX;
+	}
+
+	addattr32(n, MAX_MSG, value_type, value);
+	addattr32(n, MAX_MSG, mask_type, mask);
+
+	err = 0;
+err:
+	if (slash)
+		*slash = '/';
+	return err;
+}
+
 static const char *flower_print_arp_op_to_name(__u8 op)
 {
 	switch (op) {
@@ -796,6 +886,40 @@ static int flower_parse_enc_opts(char *str, struct nlmsghdr *n)
 	return 0;
 }
 
+static int flower_parse_ct_mark(char *str, struct nlmsghdr *n)
+{
+	return flower_parse_u32(str,
+				TCA_FLOWER_KEY_CT_MARK,
+				TCA_FLOWER_KEY_CT_MARK_MASK,
+				NULL, NULL, n);
+}
+
+static int flower_parse_ct_zone(char *str, struct nlmsghdr *n)
+{
+	return flower_parse_u16(str,
+				TCA_FLOWER_KEY_CT_ZONE,
+				TCA_FLOWER_KEY_CT_ZONE_MASK,
+				NULL, NULL, n);
+}
+
+static int flower_parse_ct_state(char *str, struct nlmsghdr *n)
+{
+	return flower_parse_u8(str,
+			       TCA_FLOWER_KEY_CT_STATE,
+			       TCA_FLOWER_KEY_CT_STATE_MASK,
+			       NULL, NULL, n);
+}
+
+/*
+static int flower_parse_ct_label(char *str, struct nlmsghdr *n)
+{
+	return flower_parse_u128_masked(str,
+					TCA_FLOWER_KEY_CT_LABEL,
+					TCA_FLOWER_KEY_CT_LABEL_MASK,
+					n);
+}
+*/
+
 static int flower_parse_opt(struct filter_util *qu, char *handle,
 			    int argc, char **argv, struct nlmsghdr *n)
 {
@@ -1255,6 +1379,35 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 				fprintf(stderr, "Illegal \"geneve_opts\"\n");
 				return -1;
 			}
+		} else if (matches(*argv, "ct_mark") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ct_mark(*argv, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"ct_mark\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "ct_zone") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ct_zone(*argv, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"ct_zone\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "ct_state") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ct_state(*argv, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"ct_state\"\n");
+				return -1;
+			}
+/*		} else if (matches(*argv, "ct_label") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ct_label(*argv, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"ct_label\"\n");
+				return -1;
+			}
+*/
 		} else if (matches(*argv, "action") == 0) {
 			NEXT_ARG();
 			ret = parse_action(&argc, &argv, TCA_FLOWER_ACT, n);
@@ -1919,6 +2072,7 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 			    tb[TCA_FLOWER_KEY_ENC_IP_TTL_MASK]);
 	flower_print_enc_opts("enc_opt", tb[TCA_FLOWER_KEY_ENC_OPTS],
 			      tb[TCA_FLOWER_KEY_ENC_OPTS_MASK]);
+	/* FIXME: print ct fields */
 
 	flower_print_matching_flags("ip_flags", FLOWER_IP_FLAGS,
 				    tb[TCA_FLOWER_KEY_FLAGS],
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH iproute2 2/5] act_ct: first import
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 1/5] flower: add support for CT fields Marcelo Ricardo Leitner
@ 2019-01-25  2:33     ` Marcelo Ricardo Leitner
  2019-02-05 22:56       ` Stephen Hemminger
  2019-01-25  2:33     ` [RFC PATCH iproute2 3/5] act_ct: add support for commit flag Marcelo Ricardo Leitner
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:33 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/tc_act/tc_ct.h |  30 +++
 tc/Makefile                       |   1 +
 tc/m_ct.c                         | 314 ++++++++++++++++++++++++++++++
 3 files changed, 345 insertions(+)
 create mode 100644 include/uapi/linux/tc_act/tc_ct.h
 create mode 100644 tc/m_ct.c

diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
new file mode 100644
index 0000000000000000000000000000000000000000..d08a5afdc4b453c5388ad2ae63a00fd3b48457f0
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef __LINUX_TC_CT_H
+#define __LINUX_TC_CT_H
+
+#include <linux/pkt_cls.h>
+#include <linux/types.h>
+
+#define TCA_ACT_CT 27
+
+enum {
+	TCA_CT_UNSPEC,
+	TCA_CT_TM,
+	TCA_CT_PARMS,
+	TCA_CT_PAD,
+	TCA_CT_ZONE,
+	TCA_CT_MARK,
+	TCA_CT_MARK_MASK,
+	TCA_CT_LABEL,
+	TCA_CT_LABEL_MASK,
+	TCA_CT_CHAIN,
+	TCA_CT_FLAGS,
+	__TCA_CT_MAX
+};
+#define TCA_CT_MAX (__TCA_CT_MAX - 1)
+
+struct tc_ct {
+	tc_gen;
+};
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index 2edaf2c8836475c78e713e789a05203e1c598327..79e10a1a5493bbc9703b47660aa145f98e533aea 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -51,6 +51,7 @@ TCMODULES += m_connmark.o
 TCMODULES += m_bpf.o
 TCMODULES += m_tunnel_key.o
 TCMODULES += m_sample.o
+TCMODULES += m_ct.o
 TCMODULES += p_ip.o
 TCMODULES += p_ip6.o
 TCMODULES += p_icmp.o
diff --git a/tc/m_ct.c b/tc/m_ct.c
new file mode 100644
index 0000000000000000000000000000000000000000..e20837ba4f9c49d1603b14721cabca1fbeca0c74
--- /dev/null
+++ b/tc/m_ct.c
@@ -0,0 +1,314 @@
+/*
+ * m_ct.c	Connection Tracking target module
+ *
+ *		This program is free software; you can distribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors: Marcelo Ricardo Leitner <mleitner@redhat.com>
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+#include "utils.h"
+#include "tc_util.h"
+#include "tc_common.h"
+#include <linux/tc_act/tc_ct.h>
+
+static void
+explain(void)
+{
+	fprintf(stderr,
+		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>]\n"
+		"where:\n");
+}
+
+static void
+usage(void)
+{
+	explain();
+	exit(-1);
+}
+
+#if 0
+static int ct_parse_u8(char *str, int value_type, int mask_type,
+			   int (*value_from_name)(const char *str,
+						  __u8 *value),
+			   bool (*value_validate)(__u8 value),
+			   struct nlmsghdr *n)
+{
+	char *slash;
+	int ret, err = -1;
+	__u8 value, mask;
+
+	slash = strchr(str, '/');
+	if (slash)
+		*slash = '\0';
+
+	ret = value_from_name ? value_from_name(str, &value) : -1;
+	if (ret < 0) {
+		ret = get_u8(&value, str, 10);
+		if (ret)
+			goto err;
+	}
+
+	if (value_validate && !value_validate(value))
+		goto err;
+
+	if (slash) {
+		ret = get_u8(&mask, slash + 1, 10);
+		if (ret)
+			goto err;
+	}
+	else {
+		mask = UINT8_MAX;
+	}
+
+	addattr8(n, MAX_MSG, value_type, value);
+	addattr8(n, MAX_MSG, mask_type, mask);
+
+	err = 0;
+err:
+	if (slash)
+		*slash = '/';
+	return err;
+}
+#endif
+
+static int ct_parse_u16(char *str, int value_type, int mask_type,
+			    int (*value_from_name)(const char *str,
+						   __u16 *value),
+			    bool (*value_validate)(__u16 value),
+			    struct nlmsghdr *n)
+{
+	char *slash;
+	int ret, err = -1;
+	__u16 value, mask;
+
+	slash = strchr(str, '/');
+	if (slash)
+		*slash = '\0';
+
+	ret = value_from_name ? value_from_name(str, &value) : -1;
+	if (ret < 0) {
+		ret = get_u16(&value, str, 10);
+		if (ret)
+			goto err;
+	}
+
+	if (value_validate && !value_validate(value))
+		goto err;
+
+	if (slash) {
+		ret = get_u16(&mask, slash + 1, 10);
+		if (ret)
+			goto err;
+	}
+	else {
+		mask = UINT16_MAX;
+	}
+
+	print_uint(PRINT_ANY, "zone", "zone %u ", value);
+	addattr16(n, MAX_MSG, value_type, value);
+	/* FIXME: find another parse function to avoid this */
+	if (mask_type)
+		addattr16(n, MAX_MSG, mask_type, mask);
+
+	err = 0;
+err:
+	if (slash)
+		*slash = '/';
+	return err;
+}
+
+static int ct_parse_u32(char *str, int value_type, int mask_type,
+			    int (*value_from_name)(const char *str,
+						   __u32 *value),
+			    bool (*value_validate)(__u32 value),
+			    struct nlmsghdr *n)
+{
+	char *slash;
+	int ret, err = -1;
+	__u32 value, mask;
+
+	slash = strchr(str, '/');
+	if (slash)
+		*slash = '\0';
+
+	ret = value_from_name ? value_from_name(str, &value) : -1;
+	if (ret < 0) {
+		ret = get_u32(&value, str, 10);
+		if (ret)
+			goto err;
+	}
+
+	if (value_validate && !value_validate(value))
+		goto err;
+
+	if (slash) {
+		ret = get_u32(&mask, slash + 1, 10);
+		if (ret)
+			goto err;
+	}
+	else {
+		mask = UINT32_MAX;
+	}
+
+	addattr32(n, MAX_MSG, value_type, value);
+	addattr32(n, MAX_MSG, mask_type, mask);
+
+	err = 0;
+err:
+	if (slash)
+		*slash = '/';
+	return err;
+}
+
+
+static int
+parse_ct(struct action_util *a, int *argc_p, char ***argv_p,
+	 int tca_id, struct nlmsghdr *n)
+{
+	int argc = *argc_p;
+	char **argv = *argv_p;
+	struct tc_ct p = {};
+	struct rtattr *tail;
+
+	if (argc <= 0) {
+		fprintf(stderr, "ct bad argument count %d\n", argc);
+		return -1;
+	}
+
+	if (!matches(*argv, "ct")) {
+		NEXT_ARG_FWD();
+	} else {
+		fprintf(stderr, "ct bad argument %s\n", *argv);
+		return -1;
+	}
+
+	tail = addattr_nest(n, MAX_MSG, tca_id);
+
+again:
+	if (argc <= 0) {
+		addattr_l(n, MAX_MSG, TCA_CT_PARMS, &p, sizeof(p));
+		addattr_nest_end(n, tail);
+
+		*argc_p = argc;
+		*argv_p = argv;
+		return 0;
+	}
+
+	if (!matches(*argv, "mark")) {
+		NEXT_ARG();
+		ct_parse_u32(*argv,
+			       TCA_CT_MARK,
+			       TCA_CT_MARK_MASK,
+			       NULL, NULL, n);
+		NEXT_ARG_FWD();
+		goto again;
+	}
+	if (!matches(*argv, "zone")) {
+		NEXT_ARG();
+		ct_parse_u16(*argv,
+				TCA_CT_ZONE,
+				0,
+				NULL, NULL, n);
+		NEXT_ARG_FWD();
+		goto again;
+	}
+/*	if (!matches(*argv, "state")) {
+		NEXT_ARG();
+		ct_parse_u8(*argv,
+			       TCA_FLOWER_KEY_CT_STATE,
+			       TCA_FLOWER_KEY_CT_STATE_MASK,
+			       NULL, NULL, n);
+		NEXT_ARG_FWD();
+		goto again;
+	}*/
+/*	if (!matches(*argv, "label")) {
+		NEXT_ARG();
+		goto again;
+	}*/
+/*	if (!matches(*argv, "chain")) {
+		NEXT_ARG();
+		goto again;
+	}*/
+
+	if (!matches(*argv, "help") == 0) {
+		usage();
+	} else {
+		fprintf(stderr, "ct option not supported %s\n", *argv);
+	}
+
+	return -1;
+}
+
+static void ct_print_mask(const char *name, __u32 key, __u32 mask)
+{
+	SPRINT_BUF(namefrm);
+	SPRINT_BUF(out);
+	size_t done;
+
+	if (!key && !mask)
+		return;
+
+	done = sprintf(out, "0x%x", key);
+	if (mask)
+		sprintf(out + done, "/%x", mask);
+
+	print_string(PRINT_FP, NULL, "%s  ", _SL_);
+	sprintf(namefrm, "%s %%s", name);
+	print_string(PRINT_ANY, name, namefrm, out);
+}
+
+static int
+print_ct(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+	struct tc_ct *p;
+	struct rtattr *tb[TCA_CT_MAX + 1];
+
+	if (arg == NULL)
+		return -1;
+
+	parse_rtattr_nested(tb, TCA_CT_MAX, arg);
+
+	if (tb[TCA_CT_PARMS] == NULL) {
+		print_string(PRINT_FP, NULL, "%s", "[NULL ct parameters]");
+		return -1;
+	}
+	p = RTA_DATA(tb[TCA_CT_PARMS]);
+
+	print_string(PRINT_ANY, "kind", "%s ", "ct");
+	ct_print_mask("mark", rta_getattr_u32(tb[TCA_CT_MARK]),
+		      rta_getattr_u32(tb[TCA_CT_MARK_MASK]));
+	print_uint(PRINT_ANY, "zone", "zone %u ", rta_getattr_u16(tb[TCA_CT_ZONE]));
+	print_uint(PRINT_ANY, "chain", "chain %u", rta_getattr_u32(tb[TCA_CT_CHAIN]));
+
+	print_uint(PRINT_ANY, "index", "\n \tindex %u", p->index);
+	print_int(PRINT_ANY, "ref", " ref %d", p->refcnt);
+	print_int(PRINT_ANY, "bind", " bind %d", p->bindcnt);
+
+	if (show_stats) {
+		if (tb[TCA_CT_TM]) {
+			struct tcf_t *tm = RTA_DATA(tb[TCA_CT_TM]);
+
+			print_tm(f, tm);
+		}
+	}
+	print_string(PRINT_FP, NULL, "%s", "\n ");
+	return 0;
+}
+
+struct action_util ct_action_util = {
+	.id = "ct",
+	.parse_aopt = parse_ct,
+	.print_aopt = print_ct,
+};
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH iproute2 3/5] act_ct: add support for commit flag
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 1/5] flower: add support for CT fields Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 2/5] act_ct: first import Marcelo Ricardo Leitner
@ 2019-01-25  2:33     ` Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 4/5] act/ct: add support for force flag Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 5/5] act/ct: add support for clear flag Marcelo Ricardo Leitner
  4 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:33 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/tc_act/tc_ct.h |  6 ++++++
 tc/m_ct.c                         | 14 ++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index d08a5afdc4b453c5388ad2ae63a00fd3b48457f0..37b95cda1dedd283b0244a03a20860ba22966dfa 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -23,6 +23,12 @@ enum {
 };
 #define TCA_CT_MAX (__TCA_CT_MAX - 1)
 
+enum {
+	TC_CT_COMMIT,
+	__TC_CT_MAX
+};
+#define TC_CT_MAX (__TC_CT_MAX - 1)
+
 struct tc_ct {
 	tc_gen;
 };
diff --git a/tc/m_ct.c b/tc/m_ct.c
index e20837ba4f9c49d1603b14721cabca1fbeca0c74..0e9b20edab8c870f93657d43bb5e72c13e9b6bd4 100644
--- a/tc/m_ct.c
+++ b/tc/m_ct.c
@@ -27,7 +27,7 @@ static void
 explain(void)
 {
 	fprintf(stderr,
-		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>]\n"
+		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>] [commit]\n"
 		"where:\n");
 }
 
@@ -181,6 +181,7 @@ parse_ct(struct action_util *a, int *argc_p, char ***argv_p,
 	char **argv = *argv_p;
 	struct tc_ct p = {};
 	struct rtattr *tail;
+	__u32 flags = 0;
 
 	if (argc <= 0) {
 		fprintf(stderr, "ct bad argument count %d\n", argc);
@@ -198,6 +199,8 @@ parse_ct(struct action_util *a, int *argc_p, char ***argv_p,
 
 again:
 	if (argc <= 0) {
+out:
+		addattr32(n, MAX_MSG, TCA_CT_FLAGS, flags);
 		addattr_l(n, MAX_MSG, TCA_CT_PARMS, &p, sizeof(p));
 		addattr_nest_end(n, tail);
 
@@ -224,6 +227,11 @@ again:
 		NEXT_ARG_FWD();
 		goto again;
 	}
+	if (!matches(*argv, "commit")) {
+		NEXT_ARG_FWD();
+		flags |= BIT(TC_CT_COMMIT);
+		goto again;
+	}
 /*	if (!matches(*argv, "state")) {
 		NEXT_ARG();
 		ct_parse_u8(*argv,
@@ -242,7 +250,9 @@ again:
 		goto again;
 	}*/
 
-	if (!matches(*argv, "help") == 0) {
+	if (!matches(*argv, "action"))
+		goto out;
+	if (!matches(*argv, "help")) {
 		usage();
 	} else {
 		fprintf(stderr, "ct option not supported %s\n", *argv);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH iproute2 4/5] act/ct: add support for force flag
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
                       ` (2 preceding siblings ...)
  2019-01-25  2:33     ` [RFC PATCH iproute2 3/5] act_ct: add support for commit flag Marcelo Ricardo Leitner
@ 2019-01-25  2:33     ` Marcelo Ricardo Leitner
  2019-01-25  2:33     ` [RFC PATCH iproute2 5/5] act/ct: add support for clear flag Marcelo Ricardo Leitner
  4 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:33 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/tc_act/tc_ct.h |  1 +
 tc/m_ct.c                         | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index 37b95cda1dedd283b0244a03a20860ba22966dfa..009e53ee83fb3125bc5c4ca86954af3bf6a0287a 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -25,6 +25,7 @@ enum {
 
 enum {
 	TC_CT_COMMIT,
+	TC_CT_FORCE,
 	__TC_CT_MAX
 };
 #define TC_CT_MAX (__TC_CT_MAX - 1)
diff --git a/tc/m_ct.c b/tc/m_ct.c
index 0e9b20edab8c870f93657d43bb5e72c13e9b6bd4..c3c1a66848ae52c4522ac7e07822febf2b52e5f1 100644
--- a/tc/m_ct.c
+++ b/tc/m_ct.c
@@ -27,7 +27,7 @@ static void
 explain(void)
 {
 	fprintf(stderr,
-		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>] [commit]\n"
+		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>] [commit [force]]\n"
 		"where:\n");
 }
 
@@ -232,6 +232,13 @@ out:
 		flags |= BIT(TC_CT_COMMIT);
 		goto again;
 	}
+	if (!matches(*argv, "force")) {
+		if (!(flags & BIT(TC_CT_COMMIT)))
+			goto help;
+		NEXT_ARG_FWD();
+		flags |= BIT(TC_CT_FORCE);
+		goto again;
+	}
 /*	if (!matches(*argv, "state")) {
 		NEXT_ARG();
 		ct_parse_u8(*argv,
@@ -253,6 +260,7 @@ out:
 	if (!matches(*argv, "action"))
 		goto out;
 	if (!matches(*argv, "help")) {
+help:
 		usage();
 	} else {
 		fprintf(stderr, "ct option not supported %s\n", *argv);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCH iproute2 5/5] act/ct: add support for clear flag
  2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
                       ` (3 preceding siblings ...)
  2019-01-25  2:33     ` [RFC PATCH iproute2 4/5] act/ct: add support for force flag Marcelo Ricardo Leitner
@ 2019-01-25  2:33     ` Marcelo Ricardo Leitner
  4 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-25  2:33 UTC (permalink / raw)
  To: Guy Shattah, Marcelo Leitner, Aaron Conole, John Hurley,
	Simon Horman, Justin Pettit, Gregory Rose, Eelco Chaudron,
	Flavio Leitner, Florian Westphal, Jiri Pirko, Rashid Khan,
	Sushil Kulkarni, Andy Gospodarek, Roi Dayan, Yossi Kuperman,
	Or Gerlitz, Rony Efraim, davem
  Cc: netdev

Same comment as in the kernel patch: parsing and argument checking should
be done better here.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
---
 include/uapi/linux/tc_act/tc_ct.h | 1 +
 tc/m_ct.c                         | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index 009e53ee83fb3125bc5c4ca86954af3bf6a0287a..636f435b86e006aa36034f86c65fd5c220ca8a13 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -26,6 +26,7 @@ enum {
 enum {
 	TC_CT_COMMIT,
 	TC_CT_FORCE,
+	TC_CT_CLEAR,
 	__TC_CT_MAX
 };
 #define TC_CT_MAX (__TC_CT_MAX - 1)
diff --git a/tc/m_ct.c b/tc/m_ct.c
index c3c1a66848ae52c4522ac7e07822febf2b52e5f1..0221fc666a84dc3df73eff97beca997b9b11a9f0 100644
--- a/tc/m_ct.c
+++ b/tc/m_ct.c
@@ -27,7 +27,7 @@ static void
 explain(void)
 {
 	fprintf(stderr,
-		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>] [commit [force]]\n"
+		"Usage: ct [mark <mark>] [zone <zone>] [label <label>] [chain <chain>] [commit [force]] [clear]\n"
 		"where:\n");
 }
 
@@ -239,6 +239,11 @@ out:
 		flags |= BIT(TC_CT_FORCE);
 		goto again;
 	}
+	if (!matches(*argv, "clear")) {
+		NEXT_ARG_FWD();
+		flags |= BIT(TC_CT_CLEAR);
+		goto again;
+	}
 /*	if (!matches(*argv, "state")) {
 		NEXT_ARG();
 		ct_parse_u8(*argv,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 2/6] net/sched: flower: add support for matching on ConnTrack
  2019-01-25  2:32   ` [RFC PATCH 2/6] net/sched: flower: " Marcelo Ricardo Leitner
@ 2019-01-25 13:37     ` Simon Horman
  2019-01-26 15:52       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Simon Horman @ 2019-01-25 13:37 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Guy Shattah, Aaron Conole, John Hurley, Justin Pettit,
	Gregory Rose, Eelco Chaudron, Flavio Leitner, Florian Westphal,
	Jiri Pirko, Rashid Khan, Sushil Kulkarni, Andy Gospodarek,
	Roi Dayan, Yossi Kuperman, Or Gerlitz, Rony Efraim, davem,
	netdev

Hi Marcelo,

On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> Hook on flow dissector's new interface on ConnTrack from previous patch.
> 
> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> ---
>  include/uapi/linux/pkt_cls.h |  9 +++++++++
>  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -490,6 +490,15 @@ enum {
>  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
>  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
>  
> +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */

With the corresponding flow dissector patch this API is
exposing the contents of an instance of enum ip_conntrack_info
as an ABI for conntrack state.

I believe (after getting similar review for my geneve options macthing
patches for flower) that this exposes implementation details as an ABI
to a degree that is not desirable.

My suggested would be to define, say in the form of named bits,
an ABI, that describes the state information that is exposed.
These bits may not correspond directly to the implementation of
ip_conntrack_info.

I think there should also be some consideration of if a mask makes
sense for the state as, f.e. in the implementation of enum
ip_conntrack_info not all bit combinations are valid. 

> +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> +
>  	__TCA_FLOWER_MAX,
>  };

...

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 2/6] net/sched: flower: add support for matching on ConnTrack
  2019-01-25 13:37     ` Simon Horman
@ 2019-01-26 15:52       ` Marcelo Ricardo Leitner
  2019-01-28  9:44         ` Simon Horman
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-26 15:52 UTC (permalink / raw)
  To: Simon Horman
  Cc: Guy Shattah, Aaron Conole, John Hurley, Justin Pettit,
	Gregory Rose, Eelco Chaudron, Flavio Leitner, Florian Westphal,
	Jiri Pirko, Rashid Khan, Sushil Kulkarni, Andy Gospodarek,
	Roi Dayan, Yossi Kuperman, Or Gerlitz, Rony Efraim, davem,
	netdev

On Fri, Jan 25, 2019 at 02:37:13PM +0100, Simon Horman wrote:
> Hi Marcelo,
> 
> On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> > Hook on flow dissector's new interface on ConnTrack from previous patch.
> > 
> > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > ---
> >  include/uapi/linux/pkt_cls.h |  9 +++++++++
> >  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
> >  2 files changed, 42 insertions(+)
> > 
> > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -490,6 +490,15 @@ enum {
> >  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
> >  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
> >  
> > +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> > +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> > +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> > +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
> 
> With the corresponding flow dissector patch this API is
> exposing the contents of an instance of enum ip_conntrack_info
> as an ABI for conntrack state.
> 
> I believe (after getting similar review for my geneve options macthing
> patches for flower) that this exposes implementation details as an ABI
> to a degree that is not desirable.
> 
> My suggested would be to define, say in the form of named bits,
> an ABI, that describes the state information that is exposed.
> These bits may not correspond directly to the implementation of
> ip_conntrack_info.
> 
> I think there should also be some consideration of if a mask makes
> sense for the state as, f.e. in the implementation of enum
> ip_conntrack_info not all bit combinations are valid. 

Right. ct_state must be handled differently. For conntrack it is a
linear enum and as we want to be able to OR match, we will have to
convert the states in a bitfield as you were saying or so.

I don't think the representation above wouldn't change, though: we have
8 bits wrapped under a u8. What would change is how we deal with it.

If iproute tc is able to parse the cmdline and set a corresponding bit
for each state, the flower-side of flow dissector here should be
mostly fine (need to consider the invalid bits as you mentioned, as
part of sanity checking).
Then just need to change on how flow dissector is reading ct_state
from the packet.


Is your comment only related to ct_state or other fields too? I'm
thinking only ct_state.

> 
> > +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> > +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> > +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> > +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> > +
> >  	__TCA_FLOWER_MAX,
> >  };
> 
> ...
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 2/6] net/sched: flower: add support for matching on ConnTrack
  2019-01-26 15:52       ` Marcelo Ricardo Leitner
@ 2019-01-28  9:44         ` Simon Horman
  2019-01-28 12:55           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Simon Horman @ 2019-01-28  9:44 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Guy Shattah, Aaron Conole, John Hurley, Justin Pettit,
	Gregory Rose, Eelco Chaudron, Flavio Leitner, Florian Westphal,
	Jiri Pirko, Rashid Khan, Sushil Kulkarni, Andy Gospodarek,
	Roi Dayan, Yossi Kuperman, Or Gerlitz, Rony Efraim, davem,
	netdev

On Sat, Jan 26, 2019 at 01:52:01PM -0200, Marcelo Ricardo Leitner wrote:
> On Fri, Jan 25, 2019 at 02:37:13PM +0100, Simon Horman wrote:
> > Hi Marcelo,
> > 
> > On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> > > Hook on flow dissector's new interface on ConnTrack from previous patch.
> > > 
> > > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > ---
> > >  include/uapi/linux/pkt_cls.h |  9 +++++++++
> > >  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
> > >  2 files changed, 42 insertions(+)
> > > 
> > > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > > index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> > > --- a/include/uapi/linux/pkt_cls.h
> > > +++ b/include/uapi/linux/pkt_cls.h
> > > @@ -490,6 +490,15 @@ enum {
> > >  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
> > >  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
> > >  
> > > +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> > > +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> > > +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> > > +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
> > 
> > With the corresponding flow dissector patch this API is
> > exposing the contents of an instance of enum ip_conntrack_info
> > as an ABI for conntrack state.
> > 
> > I believe (after getting similar review for my geneve options macthing
> > patches for flower) that this exposes implementation details as an ABI
> > to a degree that is not desirable.
> > 
> > My suggested would be to define, say in the form of named bits,
> > an ABI, that describes the state information that is exposed.
> > These bits may not correspond directly to the implementation of
> > ip_conntrack_info.
> > 
> > I think there should also be some consideration of if a mask makes
> > sense for the state as, f.e. in the implementation of enum
> > ip_conntrack_info not all bit combinations are valid. 
> 
> Right. ct_state must be handled differently. For conntrack it is a
> linear enum and as we want to be able to OR match, we will have to
> convert the states in a bitfield as you were saying or so.
> 
> I don't think the representation above wouldn't change, though: we have
> 8 bits wrapped under a u8. What would change is how we deal with it.
> 
> If iproute tc is able to parse the cmdline and set a corresponding bit
> for each state, the flower-side of flow dissector here should be
> mostly fine (need to consider the invalid bits as you mentioned, as
> part of sanity checking).
> Then just need to change on how flow dissector is reading ct_state
> from the packet.

I'm not entirely opposed to a KABI which defines bits of
TCA_FLOWER_KEY_CT_STATE in such a way that they match exactly
the current implementation of enum ip_conntrack_info (though I do suspect
that a better representation is possible, for some value of better).

But, on the other hand, I am not comfortable in simply sating that
TCA_FLOWER_KEY_CT_STATE is the same as enum ip_conntrack_info, because
that exposes an internal implementation detail which may change.

> Is your comment only related to ct_state or other fields too? I'm
> thinking only ct_state.

Sorry that I wasn't clear, I was only referring to ct_state.

> > > +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> > > +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> > > +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> > > +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> > > +
> > >  	__TCA_FLOWER_MAX,
> > >  };
> > 
> > ...
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 2/6] net/sched: flower: add support for matching on ConnTrack
  2019-01-28  9:44         ` Simon Horman
@ 2019-01-28 12:55           ` Marcelo Ricardo Leitner
  2019-01-28 13:02             ` Florian Westphal
  0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-01-28 12:55 UTC (permalink / raw)
  To: Simon Horman
  Cc: Guy Shattah, Aaron Conole, John Hurley, Justin Pettit,
	Gregory Rose, Eelco Chaudron, Flavio Leitner, Florian Westphal,
	Jiri Pirko, Rashid Khan, Sushil Kulkarni, Andy Gospodarek,
	Roi Dayan, Yossi Kuperman, Or Gerlitz, Rony Efraim, davem,
	netdev

On Mon, Jan 28, 2019 at 10:44:23AM +0100, Simon Horman wrote:
> On Sat, Jan 26, 2019 at 01:52:01PM -0200, Marcelo Ricardo Leitner wrote:
> > On Fri, Jan 25, 2019 at 02:37:13PM +0100, Simon Horman wrote:
> > > Hi Marcelo,
> > > 
> > > On Fri, Jan 25, 2019 at 12:32:31AM -0200, Marcelo Ricardo Leitner wrote:
> > > > Hook on flow dissector's new interface on ConnTrack from previous patch.
> > > > 
> > > > Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
> > > > ---
> > > >  include/uapi/linux/pkt_cls.h |  9 +++++++++
> > > >  net/sched/cls_flower.c       | 33 +++++++++++++++++++++++++++++++++
> > > >  2 files changed, 42 insertions(+)
> > > > 
> > > > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> > > > index 95d0db2a8350dffb1dd20816591f3b179913fb2e..ba1f3bc01b2fdfd810e37a2b3853a1da1f838acf 100644
> > > > --- a/include/uapi/linux/pkt_cls.h
> > > > +++ b/include/uapi/linux/pkt_cls.h
> > > > @@ -490,6 +490,15 @@ enum {
> > > >  	TCA_FLOWER_KEY_PORT_DST_MIN,	/* be16 */
> > > >  	TCA_FLOWER_KEY_PORT_DST_MAX,	/* be16 */
> > > >  
> > > > +	TCA_FLOWER_KEY_CT_ZONE,		/* u16 */
> > > > +	TCA_FLOWER_KEY_CT_ZONE_MASK,	/* u16 */
> > > > +	TCA_FLOWER_KEY_CT_STATE,	/* u8 */
> > > > +	TCA_FLOWER_KEY_CT_STATE_MASK,	/* u8 */
> > > 
> > > With the corresponding flow dissector patch this API is
> > > exposing the contents of an instance of enum ip_conntrack_info
> > > as an ABI for conntrack state.
> > > 
> > > I believe (after getting similar review for my geneve options macthing
> > > patches for flower) that this exposes implementation details as an ABI
> > > to a degree that is not desirable.
> > > 
> > > My suggested would be to define, say in the form of named bits,
> > > an ABI, that describes the state information that is exposed.
> > > These bits may not correspond directly to the implementation of
> > > ip_conntrack_info.
> > > 
> > > I think there should also be some consideration of if a mask makes
> > > sense for the state as, f.e. in the implementation of enum
> > > ip_conntrack_info not all bit combinations are valid. 
> > 
> > Right. ct_state must be handled differently. For conntrack it is a
> > linear enum and as we want to be able to OR match, we will have to
> > convert the states in a bitfield as you were saying or so.
> > 
> > I don't think the representation above wouldn't change, though: we have
> > 8 bits wrapped under a u8. What would change is how we deal with it.
> > 
> > If iproute tc is able to parse the cmdline and set a corresponding bit
> > for each state, the flower-side of flow dissector here should be
> > mostly fine (need to consider the invalid bits as you mentioned, as
> > part of sanity checking).
> > Then just need to change on how flow dissector is reading ct_state
> > from the packet.
> 
> I'm not entirely opposed to a KABI which defines bits of
> TCA_FLOWER_KEY_CT_STATE in such a way that they match exactly
> the current implementation of enum ip_conntrack_info (though I do suspect
> that a better representation is possible, for some value of better).

Ok. Will check.

> 
> But, on the other hand, I am not comfortable in simply sating that
> TCA_FLOWER_KEY_CT_STATE is the same as enum ip_conntrack_info, because
> that exposes an internal implementation detail which may change.

"internal" here is relative because enum ip_conntrack_info is on UAPI
already, at include/uapi/linux/netfilter/nf_conntrack_common.h.
Anyhow, agree that a new listing may make more sense.

The duplicity in enum ip_conntrack_info might hide some surpises for
us too:

enum ip_conntrack_info {
        IP_CT_ESTABLISHED,                = 0
        IP_CT_RELATED,                    = 1
        IP_CT_NEW,                        = 2
        IP_CT_IS_REPLY,                   = 3   <--
        IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
                                    0 + 3 = 3   <--
        IP_CT_RELATED_REPLY = IP_CT_RELATED + IP_CT_IS_REPLY,
                                    1 + 3 = 4

> 
> > Is your comment only related to ct_state or other fields too? I'm
> > thinking only ct_state.
> 
> Sorry that I wasn't clear, I was only referring to ct_state.

No need to be :-)

Thanks,
Marcelo

> 
> > > > +	TCA_FLOWER_KEY_CT_MARK,		/* u32 */
> > > > +	TCA_FLOWER_KEY_CT_MARK_MASK,	/* u32 */
> > > > +	TCA_FLOWER_KEY_CT_LABEL,	/* 128 bits */
> > > > +	TCA_FLOWER_KEY_CT_LABEL_MASK,	/* 128 bits */
> > > > +
> > > >  	__TCA_FLOWER_MAX,
> > > >  };
> > > 
> > > ...
> > > 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH 2/6] net/sched: flower: add support for matching on ConnTrack
  2019-01-28 12:55           ` Marcelo Ricardo Leitner
@ 2019-01-28 13:02             ` Florian Westphal
  0 siblings, 0 replies; 21+ messages in thread
From: Florian Westphal @ 2019-01-28 13:02 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Simon Horman, Guy Shattah, Aaron Conole, John Hurley,
	Justin Pettit, Gregory Rose, Eelco Chaudron, Flavio Leitner,
	Florian Westphal, Jiri Pirko, Rashid Khan, Sushil Kulkarni,
	Andy Gospodarek, Roi Dayan, Yossi Kuperman, Or Gerlitz,
	Rony Efraim, davem, netdev

Marcelo Ricardo Leitner <mleitner@redhat.com> wrote:
> 
> us too:
> 
> enum ip_conntrack_info {
>         IP_CT_ESTABLISHED,                = 0
>         IP_CT_RELATED,                    = 1
>         IP_CT_NEW,                        = 2
>         IP_CT_IS_REPLY,                   = 3   <--
>         IP_CT_ESTABLISHED_REPLY = IP_CT_ESTABLISHED + IP_CT_IS_REPLY,
>                                     0 + 3 = 3   <--

Don't worry, there is no IP_CT_ESTABLISHED_REPLY state, just pretend
you did not see it :-)

As for UAPI, I suggest to check net/netfilter/nft_ct.c which already
exposes established/new/invalid and so on.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH iproute2 2/5] act_ct: first import
  2019-01-25  2:33     ` [RFC PATCH iproute2 2/5] act_ct: first import Marcelo Ricardo Leitner
@ 2019-02-05 22:56       ` Stephen Hemminger
  2019-02-06  0:09         ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2019-02-05 22:56 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Guy Shattah, Aaron Conole, John Hurley, Simon Horman,
	Justin Pettit, Gregory Rose, Eelco Chaudron, Flavio Leitner,
	Florian Westphal, Jiri Pirko, Rashid Khan, Sushil Kulkarni,
	Andy Gospodarek, Roi Dayan, Yossi Kuperman, Or Gerlitz,
	Rony Efraim, davem, netdev

On Fri, 25 Jan 2019 00:33:30 -0200
Marcelo Ricardo Leitner <mleitner@redhat.com> wrote:

> +/*
> + * m_ct.c	Connection Tracking target module
> + *
> + *		This program is free software; you can distribute it and/or
> + *		modify it under the terms of the GNU General Public License
> + *		as published by the Free Software Foundation; either version
> + *		2 of the License, or (at your option) any later version.

Please just use SPDX for license info not GPL boilerplate.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [RFC PATCH iproute2 2/5] act_ct: first import
  2019-02-05 22:56       ` Stephen Hemminger
@ 2019-02-06  0:09         ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 21+ messages in thread
From: Marcelo Ricardo Leitner @ 2019-02-06  0:09 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Guy Shattah, Aaron Conole, John Hurley, Simon Horman,
	Justin Pettit, Gregory Rose, Eelco Chaudron, Flavio Leitner,
	Florian Westphal, Jiri Pirko, Rashid Khan, Sushil Kulkarni,
	Andy Gospodarek, Roi Dayan, Yossi Kuperman, Or Gerlitz,
	Rony Efraim, davem, netdev

On Tue, Feb 05, 2019 at 02:56:13PM -0800, Stephen Hemminger wrote:
> On Fri, 25 Jan 2019 00:33:30 -0200
> Marcelo Ricardo Leitner <mleitner@redhat.com> wrote:
> 
> > +/*
> > + * m_ct.c	Connection Tracking target module
> > + *
> > + *		This program is free software; you can distribute it and/or
> > + *		modify it under the terms of the GNU General Public License
> > + *		as published by the Free Software Foundation; either version
> > + *		2 of the License, or (at your option) any later version.
> 
> Please just use SPDX for license info not GPL boilerplate.
> 

Will do.

  Marcelo

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-02-06  0:09 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-23 11:29 [RFC] Connection Tracking Offload netdev RFC v1.0, part 1/2: command line + implementation Guy Shattah
2019-01-25  2:32 ` [RFC PATCH 0/6] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
2019-01-25  2:32   ` [RFC PATCH 1/6] flow_dissector: add support for matching on ConnTrack Marcelo Ricardo Leitner
2019-01-25  2:32   ` [RFC PATCH 2/6] net/sched: flower: " Marcelo Ricardo Leitner
2019-01-25 13:37     ` Simon Horman
2019-01-26 15:52       ` Marcelo Ricardo Leitner
2019-01-28  9:44         ` Simon Horman
2019-01-28 12:55           ` Marcelo Ricardo Leitner
2019-01-28 13:02             ` Florian Westphal
2019-01-25  2:32   ` [RFC PATCH 3/6] net/sched: add CT action Marcelo Ricardo Leitner
2019-01-25  2:32   ` [RFC PATCH 4/6] net/sched: act_ct: add support for force flag Marcelo Ricardo Leitner
2019-01-25  2:32   ` [RFC PATCH 5/6] net/sched: act_ct: add support for clear flag Marcelo Ricardo Leitner
2019-01-25  2:32   ` [RFC PATCH 6/6] net/sched: act_ct: allow sending a packet through conntrack multiple times Marcelo Ricardo Leitner
2019-01-25  2:33   ` [RFC PATCH iproute2 0/5] Initial, PoC implementation of sw datapath of tc+CT Marcelo Ricardo Leitner
2019-01-25  2:33     ` [RFC PATCH iproute2 1/5] flower: add support for CT fields Marcelo Ricardo Leitner
2019-01-25  2:33     ` [RFC PATCH iproute2 2/5] act_ct: first import Marcelo Ricardo Leitner
2019-02-05 22:56       ` Stephen Hemminger
2019-02-06  0:09         ` Marcelo Ricardo Leitner
2019-01-25  2:33     ` [RFC PATCH iproute2 3/5] act_ct: add support for commit flag Marcelo Ricardo Leitner
2019-01-25  2:33     ` [RFC PATCH iproute2 4/5] act/ct: add support for force flag Marcelo Ricardo Leitner
2019-01-25  2:33     ` [RFC PATCH iproute2 5/5] act/ct: add support for clear flag Marcelo Ricardo Leitner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.