* [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
@ 2020-05-14 13:48 Paul Blakey
  2020-05-14 13:48 ` [PATCH net-next 1/3] netfilter: flowtable: Control flow offload timeout interval Paul Blakey
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Paul Blakey @ 2020-05-14 13:48 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

This patchset adds support for specifying the offload policy of act ct
flows offloaded to the nf flow table (and then to hardware).

policy_pkts - specifies how many packets must be seen in software before
a flow is offloaded to the flow table

policy_timeout - specifies the aging timeout, in seconds, measured from
the last seen packet

Usage is:
$ tc filter add dev ens1f0_0 ingress chain 0 flower ct_state -trk \
action ct policy_timeout 120 policy_pkts 10 pipe \
action goto chain 1

$ tc filter add dev ens1f0_0 ingress chain 1 flower ct_state +trk+new \
action ct commit policy_timeout 120 policy_pkts 10 pipe \
action mirred egress redirect dev ens1f0_1

$ tc filter add dev ens1f0_0 ingress chain 1 flower ct_state +trk+est \
action mirred egress redirect dev ens1f0_1

To avoid conflicting policies, the policy is applied per zone on the first
act ct instance for that zone, and must be repeated in all further act ct
instances of the same zone.

Paul Blakey (3):
  netfilter: flowtable: Control flow offload timeout interval
  net/sched: act_ct: Add policy_pkts tuple offload control policy
  net/sched: act_ct: Add policy_timeout tuple offload control policy

 include/net/netfilter/nf_flow_table.h |  7 ++-
 include/net/tc_act/tc_ct.h            |  5 ++
 include/uapi/linux/tc_act/tc_ct.h     |  2 +
 net/netfilter/nf_flow_table_core.c    | 12 ++++-
 net/netfilter/nf_flow_table_offload.c |  5 +-
 net/sched/act_ct.c                    | 93 ++++++++++++++++++++++++++++++++++-
 6 files changed, 117 insertions(+), 7 deletions(-)

-- 
1.8.3.1



* [PATCH net-next 1/3] netfilter: flowtable: Control flow offload timeout interval
  2020-05-14 13:48 [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Paul Blakey
@ 2020-05-14 13:48 ` Paul Blakey
  2020-05-14 13:48 ` [PATCH net-next 2/3] net/sched: act_ct: Add policy_pkts tuple offload control policy Paul Blakey
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Paul Blakey @ 2020-05-14 13:48 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Add support for setting a per flow table flow timeout, in seconds,
which determines how many seconds after the last seen packet the flow
is deleted from offload. This can be set by the user of the API
before initializing the table; otherwise, the default value (30
seconds) will be used.
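
As a minimal sketch (hypothetical caller, not part of this patch), a
user of the API would set the field before initializing the table:

	/* Hypothetical user of the API: request a 120 second offload
	 * timeout. Leaving flow_timeout at 0 keeps the 30 second default.
	 */
	static int example_flowtable_init(struct nf_flowtable *ft)
	{
		ft->flow_timeout = 120;	/* seconds, converted to HZ by init */
		return nf_flow_table_init(ft);
	}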

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/netfilter/nf_flow_table.h |  7 ++++++-
 net/netfilter/nf_flow_table_core.c    | 12 ++++++++++--
 net/netfilter/nf_flow_table_offload.c |  5 +++--
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 6bf6965..8dca957 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -75,6 +75,7 @@ struct nf_flowtable {
 	unsigned int			flags;
 	struct flow_block		flow_block;
 	struct rw_semaphore		flow_block_lock; /* Guards flow_block */
+	u32				flow_timeout;
 	possible_net_t			net;
 };
 
@@ -143,7 +144,6 @@ struct flow_offload {
 	struct rcu_head				rcu_head;
 };
 
-#define NF_FLOW_TIMEOUT (30 * HZ)
 #define nf_flowtable_time_stamp	(u32)jiffies
 
 static inline __s32 nf_flow_timeout_delta(unsigned int timeout)
@@ -151,6 +151,11 @@ static inline __s32 nf_flow_timeout_delta(unsigned int timeout)
 	return (__s32)(timeout - nf_flowtable_time_stamp);
 }
 
+static inline __s32 nf_flow_offload_timeout(struct nf_flowtable *flow_table)
+{
+	return flow_table->flow_timeout;
+}
+
 struct nf_flow_route {
 	struct {
 		struct dst_entry	*dst;
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 4344e57..b9af9f2 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -225,7 +225,8 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 {
 	int err;
 
-	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
+	flow->timeout = nf_flowtable_time_stamp +
+			nf_flow_offload_timeout(flow_table);
 
 	err = rhashtable_insert_fast(&flow_table->rhashtable,
 				     &flow->tuplehash[0].node,
@@ -255,7 +256,8 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 void flow_offload_refresh(struct nf_flowtable *flow_table,
 			  struct flow_offload *flow)
 {
-	flow->timeout = nf_flowtable_time_stamp + NF_FLOW_TIMEOUT;
+	flow->timeout = nf_flowtable_time_stamp +
+			nf_flow_offload_timeout(flow_table);
 
 	if (likely(!nf_flowtable_hw_offload(flow_table) ||
 		   !test_and_clear_bit(NF_FLOW_HW_REFRESH, &flow->flags)))
@@ -547,10 +549,16 @@ int nf_flow_dnat_port(const struct flow_offload *flow,
 }
 EXPORT_SYMBOL_GPL(nf_flow_dnat_port);
 
+#define NF_DEFAULT_FLOW_TIMEOUT (30 * HZ)
+
 int nf_flow_table_init(struct nf_flowtable *flowtable)
 {
 	int err;
 
+	flowtable->flow_timeout = flowtable->flow_timeout ?
+				  flowtable->flow_timeout * HZ :
+				  NF_DEFAULT_FLOW_TIMEOUT;
+
 	INIT_DEFERRABLE_WORK(&flowtable->gc_work, nf_flow_offload_work_gc);
 	flow_block_init(&flowtable->flow_block);
 	init_rwsem(&flowtable->flow_block_lock);
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 6dccbc7..2a43476 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -779,6 +779,7 @@ static void flow_offload_tuple_stats(struct flow_offload_work *offload,
 
 static void flow_offload_work_stats(struct flow_offload_work *offload)
 {
+	u64 flow_timeout = nf_flow_offload_timeout(offload->flowtable);
 	struct flow_stats stats[FLOW_OFFLOAD_DIR_MAX] = {};
 	u64 lastused;
 
@@ -787,7 +788,7 @@ static void flow_offload_work_stats(struct flow_offload_work *offload)
 
 	lastused = max_t(u64, stats[0].lastused, stats[1].lastused);
 	offload->flow->timeout = max_t(u64, offload->flow->timeout,
-				       lastused + NF_FLOW_TIMEOUT);
+				       lastused + flow_timeout);
 
 	if (offload->flowtable->flags & NF_FLOWTABLE_COUNTER) {
 		if (stats[0].pkts)
@@ -880,7 +881,7 @@ void nf_flow_offload_stats(struct nf_flowtable *flowtable,
 	__s32 delta;
 
 	delta = nf_flow_timeout_delta(flow->timeout);
-	if ((delta >= (9 * NF_FLOW_TIMEOUT) / 10))
+	if ((delta >= (9 * nf_flow_offload_timeout(flowtable)) / 10))
 		return;
 
 	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_STATS);
-- 
1.8.3.1



* [PATCH net-next 2/3] net/sched: act_ct: Add policy_pkts tuple offload control policy
  2020-05-14 13:48 [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Paul Blakey
  2020-05-14 13:48 ` [PATCH net-next 1/3] netfilter: flowtable: Control flow offload timeout interval Paul Blakey
@ 2020-05-14 13:48 ` Paul Blakey
  2020-05-14 13:48 ` [PATCH net-next 3/3] net/sched: act_ct: Add policy_timeout " Paul Blakey
  2020-05-14 14:04 ` [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Edward Cree
  3 siblings, 0 replies; 13+ messages in thread
From: Paul Blakey @ 2020-05-14 13:48 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Use netfilter packet accounting to add support for the tc user
specifying a per-zone policy_pkts, which determines after how
many packets a tuple will be offloaded to the zone's flow table.

To avoid conflicting policies, the same policy must be given for all
act ct instances of the same zone.

Usage example:
$ tc filter add dev ens1f0_0 ingress chain 0 flower ct_state -trk \
action ct policy_pkts 10 pipe action goto chain 1

$ tc filter add dev ens1f0_0 ingress chain 1 flower ct_state +trk+new \
action ct commit policy_pkts 10 pipe \
action mirred egress redirect dev ens1f0_1
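
For context, the per-connection packet counts come from standard
conntrack accounting; when a packet policy is set, the patch enables it
via nf_ct_set_acct(), which (assuming the usual conntrack sysctl)
corresponds to:

$ sysctl -w net.netfilter.nf_conntrack_acct=1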

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_ct.h        |  4 +++
 include/uapi/linux/tc_act/tc_ct.h |  1 +
 net/sched/act_ct.c                | 74 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 79654bc..9d9fdcc 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -19,6 +19,10 @@ struct tcf_ct_params {
 	u32 labels[NF_CT_LABELS_MAX_SIZE / sizeof(u32)];
 	u32 labels_mask[NF_CT_LABELS_MAX_SIZE / sizeof(u32)];
 
+	struct {
+		u32 pkts;
+	} offload_policy;
+
 	struct nf_nat_range2 range;
 	bool ipv4_range;
 
diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index 5fb1d7a..a97e740 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -21,6 +21,7 @@ enum {
 	TCA_CT_NAT_IPV6_MAX,	/* struct in6_addr */
 	TCA_CT_NAT_PORT_MIN,	/* be16 */
 	TCA_CT_NAT_PORT_MAX,	/* be16 */
+	TCA_CT_OFFLOAD_POLICY_PKTS,	/* u32 */
 	TCA_CT_PAD,
 	__TCA_CT_MAX
 };
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 9adff83..5b18d62 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -27,6 +27,7 @@
 
 #include <net/netfilter/nf_flow_table.h>
 #include <net/netfilter/nf_conntrack.h>
+#include <net/netfilter/nf_conntrack_acct.h>
 #include <net/netfilter/nf_conntrack_core.h>
 #include <net/netfilter/nf_conntrack_zones.h>
 #include <net/netfilter/nf_conntrack_helper.h>
@@ -46,6 +47,10 @@ struct tcf_ct_flow_table {
 	refcount_t ref;
 	u16 zone;
 
+	struct {
+		u32 pkts;
+	} offload_policy;
+
 	bool dying;
 };
 
@@ -267,12 +272,57 @@ static int tcf_ct_flow_table_fill_actions(struct net *net,
 	return err;
 }
 
+static int tcf_ct_flow_table_policy_set(struct net *net,
+					struct tcf_ct_params *params,
+					struct netlink_ext_ack *extack,
+					bool ct_ft_created)
+{
+	struct tcf_ct_flow_table *ct_ft = params->ct_ft;
+
+	if (!ct_ft_created) {
+		if (params->offload_policy.pkts != ct_ft->offload_policy.pkts) {
+			NL_SET_ERR_MSG_MOD(extack, "Policy pkts must match previous action ct instance");
+			return -EOPNOTSUPP;
+		}
+		return 0;
+	}
+
+	if (params->offload_policy.pkts) {
+		ct_ft->offload_policy.pkts = params->offload_policy.pkts;
+		if (!nf_ct_acct_enabled(net))
+			nf_ct_set_acct(net, true);
+	}
+	return 0;
+}
+
+static bool tcf_ct_check_offload_policy(struct tcf_ct_flow_table *ct_ft,
+					struct nf_conn *ct)
+{
+	struct nf_conn_counter *counter;
+	struct nf_conn_acct *acct;
+	u64 pkts;
+
+	if (!ct_ft->offload_policy.pkts)
+		return true;
+
+	acct = nf_conn_acct_find(ct);
+	if (!acct)
+		return false;
+
+	counter = acct->counter;
+	pkts = atomic64_read(&counter[IP_CT_DIR_ORIGINAL].packets) +
+	       atomic64_read(&counter[IP_CT_DIR_REPLY].packets);
+
+	return pkts >= ct_ft->offload_policy.pkts;
+}
+
 static struct nf_flowtable_type flowtable_ct = {
 	.action		= tcf_ct_flow_table_fill_actions,
 	.owner		= THIS_MODULE,
 };
 
-static int tcf_ct_flow_table_get(struct tcf_ct_params *params)
+static int tcf_ct_flow_table_get(struct tcf_ct_params *params,
+				 bool *ct_ft_created)
 {
 	struct tcf_ct_flow_table *ct_ft;
 	int err = -ENOMEM;
@@ -299,6 +349,7 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params *params)
 		goto err_init;
 
 	__module_get(THIS_MODULE);
+	*ct_ft_created = true;
 out_unlock:
 	params->ct_ft = ct_ft;
 	params->nf_ft = &ct_ft->nf_ft;
@@ -345,6 +396,9 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 	struct flow_offload *entry;
 	int err;
 
+	if (!tcf_ct_check_offload_policy(ct_ft, ct))
+		return;
+
 	if (test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status))
 		return;
 
@@ -1034,6 +1088,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 				   .len = sizeof(struct in6_addr) },
 	[TCA_CT_NAT_PORT_MIN] = { .type = NLA_U16 },
 	[TCA_CT_NAT_PORT_MAX] = { .type = NLA_U16 },
+	[TCA_CT_OFFLOAD_POLICY_PKTS] = { .type = NLA_U32 },
 };
 
 static int tcf_ct_fill_params_nat(struct tcf_ct_params *p,
@@ -1179,6 +1234,10 @@ static int tcf_ct_fill_params(struct net *net,
 				   sizeof(p->zone));
 	}
 
+	if (tb[TCA_CT_OFFLOAD_POLICY_PKTS])
+		p->offload_policy.pkts =
+			nla_get_u32(tb[TCA_CT_OFFLOAD_POLICY_PKTS]);
+
 	if (p->zone == NF_CT_DEFAULT_ZONE_ID)
 		return 0;
 
@@ -1205,6 +1264,7 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla,
 	struct tcf_ct_params *params = NULL;
 	struct nlattr *tb[TCA_CT_MAX + 1];
 	struct tcf_chain *goto_ch = NULL;
+	bool ct_ft_created = false;
 	struct tc_ct *parm;
 	struct tcf_ct *c;
 	int err, res = 0;
@@ -1262,7 +1322,11 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla,
 	if (err)
 		goto cleanup;
 
-	err = tcf_ct_flow_table_get(params);
+	err = tcf_ct_flow_table_get(params, &ct_ft_created);
+	if (err)
+		goto cleanup;
+
+	err = tcf_ct_flow_table_policy_set(net, params, extack, ct_ft_created);
 	if (err)
 		goto cleanup;
 
@@ -1282,6 +1346,8 @@ static int tcf_ct_init(struct net *net, struct nlattr *nla,
 	return res;
 
 cleanup:
+	if (params->ct_ft)
+		tcf_ct_flow_table_put(params);
 	if (goto_ch)
 		tcf_chain_put_by_act(goto_ch);
 	kfree(params);
@@ -1414,6 +1480,10 @@ static inline int tcf_ct_dump(struct sk_buff *skb, struct tc_action *a,
 	if (tcf_ct_dump_nat(skb, p))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_CT_OFFLOAD_POLICY_PKTS,
+			p->offload_policy.pkts))
+		goto nla_put_failure;
+
 skip_dump:
 	if (nla_put(skb, TCA_CT_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
-- 
1.8.3.1



* [PATCH net-next 3/3] net/sched: act_ct: Add policy_timeout tuple offload control policy
  2020-05-14 13:48 [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Paul Blakey
  2020-05-14 13:48 ` [PATCH net-next 1/3] netfilter: flowtable: Control flow offload timeout interval Paul Blakey
  2020-05-14 13:48 ` [PATCH net-next 2/3] net/sched: act_ct: Add policy_pkts tuple offload control policy Paul Blakey
@ 2020-05-14 13:48 ` Paul Blakey
  2020-05-14 14:04 ` [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Edward Cree
  3 siblings, 0 replies; 13+ messages in thread
From: Paul Blakey @ 2020-05-14 13:48 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Use the nf flow table flow timeout parameter to add support for the
tc user specifying a per-zone policy_timeout, which determines the
time, in seconds, a tuple will remain offloaded in the flow table
after the last seen packet.

To avoid conflicting policies, the same policy must be given for all
act ct instances of the same zone.

Usage example:
$ tc filter add dev ens1f0_0 ingress chain 0 flower ct_state -trk \
action ct policy_timeout 120 pipe action goto chain 1

$ tc filter add dev ens1f0_0 ingress chain 1 flower ct_state +trk+new \
action ct commit policy_timeout 120 pipe \
action mirred egress redirect dev ens1f0_1

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_ct.h        |  1 +
 include/uapi/linux/tc_act/tc_ct.h |  1 +
 net/sched/act_ct.c                | 19 +++++++++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/include/net/tc_act/tc_ct.h b/include/net/tc_act/tc_ct.h
index 9d9fdcc..1de3769 100644
--- a/include/net/tc_act/tc_ct.h
+++ b/include/net/tc_act/tc_ct.h
@@ -21,6 +21,7 @@ struct tcf_ct_params {
 
 	struct {
 		u32 pkts;
+		u32 timeout;
 	} offload_policy;
 
 	struct nf_nat_range2 range;
diff --git a/include/uapi/linux/tc_act/tc_ct.h b/include/uapi/linux/tc_act/tc_ct.h
index a97e740..501709e 100644
--- a/include/uapi/linux/tc_act/tc_ct.h
+++ b/include/uapi/linux/tc_act/tc_ct.h
@@ -22,6 +22,7 @@ enum {
 	TCA_CT_NAT_PORT_MIN,	/* be16 */
 	TCA_CT_NAT_PORT_MAX,	/* be16 */
 	TCA_CT_OFFLOAD_POLICY_PKTS,	/* u32 */
+	TCA_CT_OFFLOAD_POLICY_TIMEOUT,	/* u32 */
 	TCA_CT_PAD,
 	__TCA_CT_MAX
 };
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 5b18d62..de95d9d 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -49,6 +49,7 @@ struct tcf_ct_flow_table {
 
 	struct {
 		u32 pkts;
+		u32 timeout;
 	} offload_policy;
 
 	bool dying;
@@ -284,6 +285,12 @@ static int tcf_ct_flow_table_policy_set(struct net *net,
 			NL_SET_ERR_MSG_MOD(extack, "Policy pkts must match previous action ct instance");
 			return -EOPNOTSUPP;
 		}
+
+		if (params->offload_policy.timeout !=
+		    ct_ft->offload_policy.timeout) {
+			NL_SET_ERR_MSG_MOD(extack, "Policy timeout must match previous action ct instance");
+			return -EOPNOTSUPP;
+		}
 		return 0;
 	}
 
@@ -292,6 +299,8 @@ static int tcf_ct_flow_table_policy_set(struct net *net,
 		if (!nf_ct_acct_enabled(net))
 			nf_ct_set_acct(net, true);
 	}
+
+	ct_ft->offload_policy.timeout = params->offload_policy.timeout;
 	return 0;
 }
 
@@ -344,6 +353,7 @@ static int tcf_ct_flow_table_get(struct tcf_ct_params *params,
 
 	ct_ft->nf_ft.type = &flowtable_ct;
 	ct_ft->nf_ft.flags |= NF_FLOWTABLE_HW_OFFLOAD;
+	ct_ft->nf_ft.flow_timeout = params->offload_policy.timeout;
 	err = nf_flow_table_init(&ct_ft->nf_ft);
 	if (err)
 		goto err_init;
@@ -1089,6 +1099,7 @@ static int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
 	[TCA_CT_NAT_PORT_MIN] = { .type = NLA_U16 },
 	[TCA_CT_NAT_PORT_MAX] = { .type = NLA_U16 },
 	[TCA_CT_OFFLOAD_POLICY_PKTS] = { .type = NLA_U32 },
+	[TCA_CT_OFFLOAD_POLICY_TIMEOUT] = { .type = NLA_U32 },
 };
 
 static int tcf_ct_fill_params_nat(struct tcf_ct_params *p,
@@ -1238,6 +1249,10 @@ static int tcf_ct_fill_params(struct net *net,
 		p->offload_policy.pkts =
 			nla_get_u32(tb[TCA_CT_OFFLOAD_POLICY_PKTS]);
 
+	if (tb[TCA_CT_OFFLOAD_POLICY_TIMEOUT])
+		p->offload_policy.timeout =
+			nla_get_u32(tb[TCA_CT_OFFLOAD_POLICY_TIMEOUT]);
+
 	if (p->zone == NF_CT_DEFAULT_ZONE_ID)
 		return 0;
 
@@ -1484,6 +1499,10 @@ static inline int tcf_ct_dump(struct sk_buff *skb, struct tc_action *a,
 			p->offload_policy.pkts))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_CT_OFFLOAD_POLICY_TIMEOUT,
+			p->offload_policy.timeout))
+		goto nla_put_failure;
+
 skip_dump:
 	if (nla_put(skb, TCA_CT_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
-- 
1.8.3.1



* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-14 13:48 [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Paul Blakey
                   ` (2 preceding siblings ...)
  2020-05-14 13:48 ` [PATCH net-next 3/3] net/sched: act_ct: Add policy_timeout " Paul Blakey
@ 2020-05-14 14:04 ` Edward Cree
  2020-05-14 14:49   ` Jiri Pirko
  3 siblings, 1 reply; 13+ messages in thread
From: Edward Cree @ 2020-05-14 14:04 UTC (permalink / raw)
  To: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

On 14/05/2020 14:48, Paul Blakey wrote:
> To avoid conflicting policies, the policy is applied per zone on the first
> act ct instance for that zone, and must be repeated in all further act ct
> instances of the same zone.
Is the scope of this the entire zone, or just offload of that zone to a
 specific device?
Either way, the need to repeat the policy on every tc command suggests
 that there really ought to instead be a separate API for configuring
 conntrack offload policy, either per zone or per (zone, device) pair,
 as appropriate.

-ed


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-14 14:04 ` [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy Edward Cree
@ 2020-05-14 14:49   ` Jiri Pirko
  2020-05-14 15:28     ` Edward Cree
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Pirko @ 2020-05-14 14:49 UTC (permalink / raw)
  To: Edward Cree
  Cc: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Thu, May 14, 2020 at 04:04:02PM CEST, ecree@solarflare.com wrote:
>On 14/05/2020 14:48, Paul Blakey wrote:
>> To avoid conflicting policies, the policy is applied per zone on the first
>> act ct instance for that zone, and must be repeated in all further act ct
>> instances of the same zone.
>Is the scope of this the entire zone, or just offload of that zone to a
> specific device?
>Either way, the need to repeat the policy on every tc command suggests
> that there really ought to instead be a separate API for configuring
> conntrack offload policy, either per zone or per (zone, device) pair,
> as appropriate.

You don't have to repeat. You specify it in the first one, in the rest
you omit it.


>
>-ed


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-14 14:49   ` Jiri Pirko
@ 2020-05-14 15:28     ` Edward Cree
  2020-05-18 16:17       ` Paul Blakey
  0 siblings, 1 reply; 13+ messages in thread
From: Edward Cree @ 2020-05-14 15:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

On 14/05/2020 15:49, Jiri Pirko wrote:
> Thu, May 14, 2020 at 04:04:02PM CEST, ecree@solarflare.com wrote:
>> Either way, the need to repeat the policy on every tc command suggests
>>  that there really ought to instead be a separate API for configuring
>>  conntrack offload policy, either per zone or per (zone, device) pair,
>>  as appropriate.
> You don't have to repeat. You specify it in the first one, in the rest
> you omit it.
Ok, well (a) the commit message needs changing to make that clear, and
 (b) that kind of implicit action-at-a-distance slightly bothers me.
If you don't repeat, then the order of tc commands matters, and any
 program (e.g. OvS) generating these commands will need to either keep
 track of which zones it's configured policy for already, or just
 repeat on every command just in case.
It really doesn't feel like an orthogonal, Unixy API to me.

<offtopic rambliness="very">
TBH I think that when a tc rule with a ct action is offloaded, drivers
 should get three callbacks in order:
1) a CT zone callback (if the CT zone is 'new')
2) an action callback, including a pointer to the CT zone info (in case
  the driver chose to ignore the CT zone callback because it had no
   offloading work to do at that point) (if the action is 'new')
3) a rule callback, including a pointer to the action info (in case the
   driver ignored the action creation).
And each of these should be drivable directly from userspace as well as
 being called automatically by the level below it.
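
An illustrative sketch of what the driver-facing side of that ordering
 could look like (all names here are hypothetical; none of these
 callbacks exist today):

struct hypothetical_zone_policy {
	u32 pkts;	/* offload a connection after this many packets */
	u32 timeout;	/* aging timeout, in seconds */
};

struct hypothetical_ct_offload_ops {
	/* (1) CT zone first referenced, carrying its offload policy */
	int (*ct_zone_add)(struct net_device *dev, u16 zone,
			   const struct hypothetical_zone_policy *policy);
	/* (2) ct action created, with a pointer to the zone info */
	int (*ct_action_add)(struct net_device *dev,
			     const struct flow_action_entry *act);
	/* (3) rule created, with a pointer to the action info */
	int (*rule_add)(struct net_device *dev,
			const struct flow_cls_offload *cls);
};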
Currently we have (2) as a distinct entity in TC, but no-one offloads
 it (and as I've ranted before, that makes a mess of stats), and AIUI
 it's not called when a user creates a rule, only when using the
 'tc action' command directly.  And (1) doesn't exist at all; drivers just have
 to notice the first time a tc ct action they're offloading mentions a
 given zone, and call into nf_flow_table_offload to register for
 tracked conns in that zone.  I feel that this hierarchical 'offload
 dependencies first' model would make drivers simpler and more robust,
 as well as helping to ensure different drivers share a consistent
 interpretation of the API.
RFC on the above?  Obviously I'm not likely to start implementing it
 until after we have a TC-supporting sfc driver upstream (it's coming
 Real Soon Now™, I swear!) but I thought it worthwhile to throw the
 design up for discussion earlier rather than later.
</offtopic>

-ed


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-14 15:28     ` Edward Cree
@ 2020-05-18 16:17       ` Paul Blakey
  2020-05-18 16:48         ` Edward Cree
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Blakey @ 2020-05-18 16:17 UTC (permalink / raw)
  To: Edward Cree, Jiri Pirko
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko, Roi Dayan


On 14/05/2020 18:28, Edward Cree wrote:
> On 14/05/2020 15:49, Jiri Pirko wrote:
>> Thu, May 14, 2020 at 04:04:02PM CEST, ecree@solarflare.com wrote:
>>> Either way, the need to repeat the policy on every tc command suggests
>>>  that there really ought to instead be a separate API for configuring
>>>  conntrack offload policy, either per zone or per (zone, device) pair,
>>>  as appropriate.
>> You don't have to repeat. You specify it in the first one, in the rest
>> you omit it.
> Ok, well (a) the commit message needs changing to make that clear, and
>  (b) that kind of implicit action-at-a-distance slightly bothers me.
> If you don't repeat, then the order of tc commands matters, and any
>  program (e.g. OvS) generating these commands will need to either keep
>  track of which zones it's configured policy for already, or just
>  repeat on every command just in case.
> It really doesn't feel like an orthogonal, Unixy API to me.
You were right that a tc user needs to supply the same policy for all ct action instances of the same zone (not per zone/dev).
Actually, in my internal previous version of this patchset, if the user doesn't specify a policy, it uses whatever was configured before,
and if the user does specify, it must match the previously configured table parameter or override a previously unconfigured table parameter (or a new table).
But we think, as you pointed out, that being explicit as done here is better; there is just no API to configure the flow table currently, so we suggested this.
Do you have any suggestion for an API that would be better?

Yes, in OvS we planned on submitting the same global policy for all ct actions.

Paul.



* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-18 16:17       ` Paul Blakey
@ 2020-05-18 16:48         ` Edward Cree
  2020-05-18 17:25           ` Jiri Pirko
  0 siblings, 1 reply; 13+ messages in thread
From: Edward Cree @ 2020-05-18 16:48 UTC (permalink / raw)
  To: Paul Blakey, Jiri Pirko
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko, Roi Dayan

On 18/05/2020 17:17, Paul Blakey wrote:
> But we think, as you pointed out, explicit as here is better, there is just no API to configure the flow table currently so we suggested this.
> Do you have any suggestion for an API that would be better?
I see two possible approaches.  We could either say "conntrack is
 part of netfilter, so this should be an nftnetlink API", or we
 could say "this is about configuring TC offload (of conntracks),
 so it belongs in a TC command".  I lean towards the latter mainly
 so that I don't have to install & learn netfilter commands (the
 current conntrack offload can be enabled without touching them).
So it'd be something like "tc ct add zone 1 timeout 120 pkts 10",
 and if a 'tc filter add' or 'tc action add' references a ct zone
 that hasn't been 'tc ct add'ed, it gets automatically added with
 the default policy (and if you come along later and try to 'tc ct
 add' it you get an EBUSY or EEXIST or something).
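
For illustration only (the 'tc ct' object here is hypothetical), the
 resulting flow might look something like:

$ tc ct add zone 1 timeout 120 pkts 10
$ tc filter add dev ens1f0_0 ingress chain 1 flower ct_state +trk+new \
    action ct commit zone 1 pipe \
    action mirred egress redirect dev ens1f0_1
$ tc ct add zone 1 timeout 60    # fails: policy already set for zone 1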

WDYT?

-ed


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-18 16:48         ` Edward Cree
@ 2020-05-18 17:25           ` Jiri Pirko
  2020-05-18 18:02             ` Edward Cree
  0 siblings, 1 reply; 13+ messages in thread
From: Jiri Pirko @ 2020-05-18 17:25 UTC (permalink / raw)
  To: Edward Cree
  Cc: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

Mon, May 18, 2020 at 06:48:46PM CEST, ecree@solarflare.com wrote:
>On 18/05/2020 17:17, Paul Blakey wrote:
>> But we think, as you pointed out, explicit as here is better, there is just no API to configure the flow table currently so we suggested this.
>> Do you have any suggestion for an API that would be better?
>I see two possible approaches.  We could either say "conntrack is
> part of netfilter, so this should be an nftnetlink API", or we
> could say "this is about configuring TC offload (of conntracks),
> so it belongs in a TC command".  I lean towards the latter mainly
> so that I don't have to install & learn netfilter commands (the
> current conntrack offload can be enabled without touching them).
>So it'd be something like "tc ct add zone 1 timeout 120 pkts 10",
> and if a 'tc filter add' or 'tc action add' references a ct zone
> that hasn't been 'tc ct add'ed, it gets automatically added with
> the default policy (and if you come along later and try to 'tc ct
> add' it you get an EBUSY or EEXIST or something).

Is it worth to have an object just for this particular purpose? In the
past I was trying to push a tc block object that could be added/removed
and being used to insert filters w/o being attached to any qdisc. This
was frowned upon and refused because the existence of block does not
have any meaning w/o being attached.

What you suggest with zone sounds quite similar. More to that, it is
related only to act_ct. Is it a good idea to have a common object in TC
which is actually used as internal part of act_ct only?

To be honest, I don't like the idea.


>
>WDYT?
>
>-ed


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-18 17:25           ` Jiri Pirko
@ 2020-05-18 18:02             ` Edward Cree
  2020-05-26  9:25               ` Paul Blakey
  0 siblings, 1 reply; 13+ messages in thread
From: Edward Cree @ 2020-05-18 18:02 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Paul Blakey, Saeed Mahameed, Oz Shlomo, Jakub Kicinski,
	Vlad Buslov, David Miller, netdev, Jiri Pirko, Roi Dayan

On 18/05/2020 18:25, Jiri Pirko wrote:
> Is it worth to have an object just for this particular purpose? In the
> past I was trying to push a tc block object that could be added/removed
> and being used to insert filters w/o being attached to any qdisc. This
> was frowned upon and refused because the existence of block does not
> have any meaning w/o being attached.
A tc action doesn't have any meaning either until it is attached to a
 filter.  Is the consensus that the 'tc action' API/command set was a
 mistake, or am I misunderstanding the objection?

> What you suggest with zone sounds quite similar. More to that, it is
> related only to act_ct. Is it a good idea to have a common object in TC
> which is actually used as internal part of act_ct only?
Well, really it's related as much to flower ct_state as to act_ct: the
 policy numbers control when a conntrack rule (from the zone) gets
 offloaded into drivers, thus determining whether a packet (which has
 been through an act_ct to make it +trk) is ±est.
It's because it has a scope broader than a single ct action that I'm
 resistant to hanging it off act_ct in this way.

Also it still somewhat bothers me that this policy isn't scoped to the
 device; I realise that the current implementation of a single flow
 table shared by all offloading devices is what forces that, but it
 just doesn't seem semantically right that the policy on when to
 offload a connection is global across devices with potentially
 differing capabilities (e.g. size of their conntrack tables) that
 might want different policies.
(And a 'tc ct add dev eth0 zone 1 policy_blah...' would conveniently
 give a hook point for callback (1) from my offtopic ramble, that the
 driver could use to register for connections in the zone and start
 offloading them to hardware, rather than doing it the first time it
 sees that zone show up in an act_ct it's offloading.  You don't
 really want to do the same in the non-device-qualified case because
 that could use up HW table space for connections in a zone you're
 not offloading any rules for.)

Basically I'm just dreaming of a world where TC does a lot more with
 explicit objects that it creates and then references, rather than
 drivers having to implicitly create HW objects for things the first
 time a rule tries to reference them.
"Is it worth" all these extra objects?  Really that depends on how
 much simpler the drivers can become as a result; this is the control
 path, so programmer time is worth more than machine time, and space
 in the programmer's head is worth more than machine RAM ;-)

-ed


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-18 18:02             ` Edward Cree
@ 2020-05-26  9:25               ` Paul Blakey
  2020-05-26 16:17                 ` Edward Cree
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Blakey @ 2020-05-26  9:25 UTC (permalink / raw)
  To: Edward Cree, Jiri Pirko
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko, Roi Dayan

On 5/18/2020 9:02 PM, Edward Cree wrote:
> On 18/05/2020 18:25, Jiri Pirko wrote:
>> Is it worth to have an object just for this particular purpose? In the
>> past I was trying to push a tc block object that could be added/removed
>> and being used to insert filters w/o being attached to any qdisc. This
>> was frowned upon and refused because the existence of block does not
>> have any meaning w/o being attached.
> A tc action doesn't have any meaning either until it is attached to a
>  filter.  Is the consensus that the 'tc action' API/command set was a
>  mistake, or am I misunderstanding the objection?
>
>> What you suggest with zone sounds quite similar. More to that, it is
>> related only to act_ct. Is it a good idea to have a common object in TC
>> which is actually used as internal part of act_ct only?
> Well, really it's related as much to flower ct_state as to act_ct: the
>  policy numbers control when a conntrack rule (from the zone) gets
>  offloaded into drivers, thus determining whether a packet (which has
>  been through an act_ct to make it +trk) is ±est.

It doesn't affect when a connection will become established (+est),
just the offloading of such connections.

> It's because it has a scope broader than a single ct action that I'm
>  resistant to hanging it off act_ct in this way.
>
> Also it still somewhat bothers me that this policy isn't scoped to the
>  device; I realise that the current implementation of a single flow
>  table shared by all offloading devices is what forces that, but it
>  just doesn't seem semantically right that the policy on when to
>  offload a connection is global across devices with potentially
>  differing capabilities (e.g. size of their conntrack tables) that
>  might want different policies.
> (And a 'tc ct add dev eth0 zone 1 policy_blah...' would conveniently
>  give a hook point for callback (1) from my offtopic ramble, that the
>  driver could use to register for connections in the zone and start
>  offloading them to hardware, rather than doing it the first time it
>  sees that zone show up in an act_ct it's offloading.  You don't
>  really want to do the same in the non-device-qualified case because
>  that could use up HW table space for connections in a zone you're
>  not offloading any rules for.)
>
> Basically I'm just dreaming of a world where TC does a lot more with
>  explicit objects that it creates and then references, rather than
>  drivers having to implicitly create HW objects for things the first
>  time a rule tries to reference them.
> "Is it worth" all these extra objects?  Really that depends on how
>  much simpler the drivers can become as a result; this is the control
>  path, so programmer time is worth more than machine time, and space
>  in the programmer's head is worth more than machine RAM ;-)
>
> -ed

I see what you mean here, but this is only used to control action ct behavior
and we don't expect this to be used or referenced in other actions/filters.

What you are suggesting will require new userspace and kernel (builtin)
tc netlink API to manage conntrack zones/nf flow tables policies.

I'm not sure how well it will sit with the flow table having a device while
the filter has a tc block which can have multiple devices.

And then we have the single IPS_OFFLOAD_BIT so a flow can't currently be
shared between different flow tables that will be created for different devices.
We will need to do an atomic lookup/insert into each table.

So this will need a lot of work, and I think it might be overkill until we have more
use cases besides the policy-per-device case, which can still be achieved, if needed,
with different conntrack zones.


 


* Re: [PATCH net-next 0/3] net/sched: act_ct: Add support for specifying tuple offload policy
  2020-05-26  9:25               ` Paul Blakey
@ 2020-05-26 16:17                 ` Edward Cree
  0 siblings, 0 replies; 13+ messages in thread
From: Edward Cree @ 2020-05-26 16:17 UTC (permalink / raw)
  To: Paul Blakey, Jiri Pirko
  Cc: Saeed Mahameed, Oz Shlomo, Jakub Kicinski, Vlad Buslov,
	David Miller, netdev, Jiri Pirko, Roi Dayan

On 26/05/2020 10:25, Paul Blakey wrote:
> On 5/18/2020 9:02 PM, Edward Cree wrote:
>> Well, really it's related as much to flower ct_state as to act_ct: the
>>  policy numbers control when a conntrack rule (from the zone) gets
>>  offloaded into drivers, thus determining whether a packet (which has
>>  been through an act_ct to make it +trk) is ±est.
> It doesn't affect when a connection will become established (+est),
> just the offloading of such connections.
Yes; what I meant was that the packet wouldn't match a +trk+est rule in
 hardware if the connection hadn't been offloaded (it would instead go
 to software, where it would match).
So the policy is in a sense controlling ct_state offload, as much as it
 is act_ct offload.

> I'm not sure how well it will sit with the flow table having a device while
> the filter has a tc block which can have multiple devices.
>
> And then we have the single IPS_OFFLOAD_BIT so a flow can't currently be
> shared between different flow tables that will be created for different devices.
> We will need to do a an atomic lookup/insert to each table.
I see; it does get a bit complicated.  I agree that the 'policy per
 device' use case is too speculative to justify all that work.

But I think I can still make my dream happen with the global policy,
 just by putting the "we used this CT zone in an action, we need to
 offload it" smarts into TC rather than all the drivers.  Which I
 suppose is orthogonal to the TC CT policy uAPI question; but I still
 prefer the 'explicit CT object' approach.  To do those smarts, TC
 would need a CT object internally anyway, so we may as well expose
 it and get a clean policy API into the bargain.

> What you are suggesting will require new userspace and kernel (builtin)
> tc netlink API to manage conntrack zones/nf flow tables policies.
Either way there's new API.  Yours may be a smaller change (just adding
 new attributes hung off the act_ct), but you still need to update
 userspace to take advantage of it, else you get the default policy.
And `tc ct add zone 1 policy_timeout 120` (global policy) can be done
 backwards-compatibly, too, as I described in my message before last.

(Policy-per-device would be much more of a pain new-uAPI-wise; again,
 I agree to drop that idea.)

-ed


