All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection
@ 2018-07-30 12:30 Paolo Abeni
  2018-07-30 12:30 ` [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Paolo Abeni @ 2018-07-30 12:30 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

This series is aimed at improving the act_mirred redirect performances.
Such action is used by OVS to represent TC S/W flows, and it's current largest
bottle-neck is the need for a skb_clone() for each packet.

The first 2 patches introduce some cleanup and safeguards to allow extending 
tca_result - we will use it to store RCU protected redirect information - and
introduce a clear separation between user-space accessible tcfa_action
values and internal values accessible only by the kernel.
Then a new tcfa_action value is introduced: TC_ACT_REINJECT, similar to
TC_ACT_REDIRECT, but preserving the mirred semantic. Such value is not
accessible from user-space.
The last patch exploits the newly introduced infrastructure in the act_mirred
action, to avoid a skb_clone, when possible.

Overall this the above gives a ~10% performance improvement in forwarding tput,
when using the TC S/W datapath.

v1 -> v2:
 - preserve the rcu lock in act_bpf
 - add and use a new action value to reinject the packets, preserving the mirred
   semantic

v2 -> v3:
 - renamed to new action as TC_ACT_REINJECT
 - TC_ACT_REINJECT is not exposed to user-space

v3 -> v4:
 - dropped the TC_ACT_REDIRECT patch
 - report failure via extack, too
 - rename the new action as TC_ACT_REINSERT
 - skip clone only if the control action don't touch tcf_result

v4 -> v5:
 - fix a couple of build issue reported by kbuild bot
 - dont split messages

Paolo Abeni (4):
  net/sched: user-space can't set unknown tcfa_action values
  tc/act: remove unneeded RCU lock in action callback
  net/tc: introduce TC_ACT_REINSERT.
  act_mirred: use TC_ACT_REINSERT when possible

 include/net/act_api.h        |  2 +-
 include/net/pkt_cls.h        |  3 ++
 include/net/sch_generic.h    | 30 +++++++++++++++++++
 include/uapi/linux/pkt_cls.h |  6 ++--
 net/core/dev.c               |  6 +++-
 net/sched/act_api.c          | 14 +++++++++
 net/sched/act_csum.c         | 12 ++------
 net/sched/act_ife.c          |  5 +---
 net/sched/act_mirred.c       | 57 ++++++++++++++++++++++++++++--------
 net/sched/act_sample.c       |  4 +--
 net/sched/act_skbedit.c      | 10 ++-----
 net/sched/act_skbmod.c       | 21 ++++++-------
 net/sched/act_tunnel_key.c   |  6 +---
 net/sched/act_vlan.c         | 19 +++++-------
 14 files changed, 126 insertions(+), 69 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 12:30 [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection Paolo Abeni
@ 2018-07-30 12:30 ` Paolo Abeni
  2018-07-30 12:36   ` Jiri Pirko
  2018-07-30 14:03   ` Jamal Hadi Salim
  2018-07-30 12:30 ` [PATCH net-next v5 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 20+ messages in thread
From: Paolo Abeni @ 2018-07-30 12:30 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Currently, when initializing an action, the user-space can specify
and use arbitrary values for the tcfa_action field. If the value
is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC.

This change explicitly checks for unknown values at action creation
time, and explicitly convert them to TC_ACT_UNSPEC. No functional
changes are introduced, but this will allow introducing tcfa_action
values not exposed to user-space in a later patch.

Note: we can't use the above to hide TC_ACT_REDIRECT from user-space,
as the latter is already part of uAPI.

v3 -> v4:
 - use an helper to check for action validity (JiriP)
 - emit an extack for invalid actions (JiriP)
v4 -> v5:
 - keep messages on a single line, drop net_warn (Marcelo)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/uapi/linux/pkt_cls.h |  6 ++++--
 net/sched/act_api.c          | 14 ++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index b4512254036b..48e5b5d49a34 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -45,6 +45,7 @@ enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in
@@ -55,11 +56,12 @@ enum {
 #define __TC_ACT_EXT_SHIFT 28
 #define __TC_ACT_EXT(local) ((local) << __TC_ACT_EXT_SHIFT)
 #define TC_ACT_EXT_VAL_MASK ((1 << __TC_ACT_EXT_SHIFT) - 1)
-#define TC_ACT_EXT_CMP(combined, opcode) \
-	(((combined) & (~TC_ACT_EXT_VAL_MASK)) == opcode)
+#define TC_ACT_EXT_OPCODE(combined) ((combined) & (~TC_ACT_EXT_VAL_MASK))
+#define TC_ACT_EXT_CMP(combined, opcode) (TC_ACT_EXT_OPCODE(combined) == opcode)
 
 #define TC_ACT_JUMP __TC_ACT_EXT(1)
 #define TC_ACT_GOTO_CHAIN __TC_ACT_EXT(2)
+#define TC_ACT_EXT_OPCODE_MAX	TC_ACT_GOTO_CHAIN
 
 /* Action type identifiers*/
 enum {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index b43df1e25c6d..229d63c99be2 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -786,6 +786,15 @@ static struct tc_cookie *nla_memdup_cookie(struct nlattr **tb)
 	return c;
 }
 
+static bool tcf_action_valid(int action)
+{
+	int opcode = TC_ACT_EXT_OPCODE(action);
+
+	if (!opcode)
+		return action <= TC_ACT_VALUE_MAX;
+	return opcode <= TC_ACT_EXT_OPCODE_MAX || action == TC_ACT_UNSPEC;
+}
+
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
@@ -895,6 +904,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 		}
 	}
 
+	if (!tcf_action_valid(a->tcfa_action)) {
+		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
+		a->tcfa_action = TC_ACT_UNSPEC;
+	}
+
 	return a;
 
 err_mod:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 2/4] tc/act: remove unneeded RCU lock in action callback
  2018-07-30 12:30 [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection Paolo Abeni
  2018-07-30 12:30 ` [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
@ 2018-07-30 12:30 ` Paolo Abeni
  2018-07-30 12:30 ` [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Paolo Abeni @ 2018-07-30 12:30 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Each lockless action currently does its own RCU locking in ->act().
This allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with the _bh variant, cleans up a bit the surrounding code and
documents the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -> v2:
 - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
   as pointed out by Daniel

v3 -> v4:
 - fixed some typos in the commit message (JiriP)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/act_api.h      |  2 +-
 include/net/sch_generic.h  |  2 ++
 net/sched/act_csum.c       | 12 +++---------
 net/sched/act_ife.c        |  5 +----
 net/sched/act_mirred.c     |  4 +---
 net/sched/act_sample.c     |  4 +---
 net/sched/act_skbedit.c    | 10 +++-------
 net/sched/act_skbmod.c     | 21 +++++++++------------
 net/sched/act_tunnel_key.c |  6 +-----
 net/sched/act_vlan.c       | 19 +++++++------------
 10 files changed, 29 insertions(+), 56 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 683ce41053d9..8c9bc02d05e1 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -85,7 +85,7 @@ struct tc_action_ops {
 	size_t	size;
 	struct module		*owner;
 	int     (*act)(struct sk_buff *, const struct tc_action *,
-		       struct tcf_result *);
+		       struct tcf_result *); /* called under RCU BH lock*/
 	int     (*dump)(struct sk_buff *, struct tc_action *, int, int);
 	void	(*cleanup)(struct tc_action *);
 	int     (*lookup)(struct net *net, struct tc_action **a, u32 index,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index c5432362dc26..bcae181c1857 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -285,6 +285,8 @@ struct tcf_proto {
 	/* Fast access part */
 	struct tcf_proto __rcu	*next;
 	void __rcu		*root;
+
+	/* called under RCU BH lock*/
 	int			(*classify)(struct sk_buff *,
 					    const struct tcf_proto *,
 					    struct tcf_result *);
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 4e8c383f379e..648a3a35b720 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -561,15 +561,14 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
 	u32 update_flags;
 	int action;
 
-	rcu_read_lock();
-	params = rcu_dereference(p->params);
+	params = rcu_dereference_bh(p->params);
 
 	tcf_lastuse_update(&p->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(p->common.cpu_bstats), skb);
 
 	action = READ_ONCE(p->tcf_action);
 	if (unlikely(action == TC_ACT_SHOT))
-		goto drop_stats;
+		goto drop;
 
 	update_flags = params->update_flags;
 	switch (tc_skb_protocol(skb)) {
@@ -583,16 +582,11 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
 		break;
 	}
 
-unlock:
-	rcu_read_unlock();
 	return action;
 
 drop:
-	action = TC_ACT_SHOT;
-
-drop_stats:
 	qstats_drop_inc(this_cpu_ptr(p->common.cpu_qstats));
-	goto unlock;
+	return TC_ACT_SHOT;
 }
 
 static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind,
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 3d6e265758c0..df4060e32d43 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -820,14 +820,11 @@ static int tcf_ife_act(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_ife_params *p;
 	int ret;
 
-	rcu_read_lock();
-	p = rcu_dereference(ife->params);
+	p = rcu_dereference_bh(ife->params);
 	if (p->flags & IFE_ENCODE) {
 		ret = tcf_ife_encode(skb, a, res, p);
-		rcu_read_unlock();
 		return ret;
 	}
-	rcu_read_unlock();
 
 	return tcf_ife_decode(skb, a, res);
 }
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 6afd89a36c69..eeb335f03102 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -181,11 +181,10 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	tcf_lastuse_update(&m->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
 
-	rcu_read_lock();
 	m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit);
 	m_eaction = READ_ONCE(m->tcfm_eaction);
 	retval = READ_ONCE(m->tcf_action);
-	dev = rcu_dereference(m->tcfm_dev);
+	dev = rcu_dereference_bh(m->tcfm_dev);
 	if (unlikely(!dev)) {
 		pr_notice_once("tc mirred: target device is gone\n");
 		goto out;
@@ -236,7 +235,6 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		if (tcf_mirred_is_act_redirect(m_eaction))
 			retval = TC_ACT_SHOT;
 	}
-	rcu_read_unlock();
 
 	return retval;
 }
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 3079e7be5bde..2608ccc83e5e 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -140,8 +140,7 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 	bstats_cpu_update(this_cpu_ptr(s->common.cpu_bstats), skb);
 	retval = READ_ONCE(s->tcf_action);
 
-	rcu_read_lock();
-	psample_group = rcu_dereference(s->psample_group);
+	psample_group = rcu_dereference_bh(s->psample_group);
 
 	/* randomly sample packets according to rate */
 	if (psample_group && (prandom_u32() % s->rate == 0)) {
@@ -165,7 +164,6 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 			skb_pull(skb, skb->mac_len);
 	}
 
-	rcu_read_unlock();
 	return retval;
 }
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index da56e6938c9e..a6db47ebec11 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -43,8 +43,7 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 	tcf_lastuse_update(&d->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
 
-	rcu_read_lock();
-	params = rcu_dereference(d->params);
+	params = rcu_dereference_bh(d->params);
 	action = READ_ONCE(d->tcf_action);
 
 	if (params->flags & SKBEDIT_F_PRIORITY)
@@ -77,14 +76,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 	}
 	if (params->flags & SKBEDIT_F_PTYPE)
 		skb->pkt_type = params->ptype;
-
-unlock:
-	rcu_read_unlock();
 	return action;
+
 err:
 	qstats_drop_inc(this_cpu_ptr(d->common.cpu_qstats));
-	action = TC_ACT_SHOT;
-	goto unlock;
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index cdc6bacfb190..c437c6d51a71 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -41,20 +41,14 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 	 * then MAX_EDIT_LEN needs to change appropriately
 	*/
 	err = skb_ensure_writable(skb, MAX_EDIT_LEN);
-	if (unlikely(err)) { /* best policy is to drop on the floor */
-		qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
-		return TC_ACT_SHOT;
-	}
+	if (unlikely(err)) /* best policy is to drop on the floor */
+		goto drop;
 
-	rcu_read_lock();
 	action = READ_ONCE(d->tcf_action);
-	if (unlikely(action == TC_ACT_SHOT)) {
-		qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
-		rcu_read_unlock();
-		return action;
-	}
+	if (unlikely(action == TC_ACT_SHOT))
+		goto drop;
 
-	p = rcu_dereference(d->skbmod_p);
+	p = rcu_dereference_bh(d->skbmod_p);
 	flags = p->flags;
 	if (flags & SKBMOD_F_DMAC)
 		ether_addr_copy(eth_hdr(skb)->h_dest, p->eth_dst);
@@ -62,7 +56,6 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 		ether_addr_copy(eth_hdr(skb)->h_source, p->eth_src);
 	if (flags & SKBMOD_F_ETYPE)
 		eth_hdr(skb)->h_proto = p->eth_type;
-	rcu_read_unlock();
 
 	if (flags & SKBMOD_F_SWAPMAC) {
 		u16 tmpaddr[ETH_ALEN / 2]; /* ether_addr_copy() requirement */
@@ -73,6 +66,10 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 	}
 
 	return action;
+
+drop:
+	qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy skbmod_policy[TCA_SKBMOD_MAX + 1] = {
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index f811850fd1d0..d42d9e112789 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -31,9 +31,7 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_tunnel_key_params *params;
 	int action;
 
-	rcu_read_lock();
-
-	params = rcu_dereference(t->params);
+	params = rcu_dereference_bh(t->params);
 
 	tcf_lastuse_update(&t->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
@@ -53,8 +51,6 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 		break;
 	}
 
-	rcu_read_unlock();
-
 	return action;
 }
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index ad37f308175a..15a0ee214c9c 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -40,11 +40,9 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	if (skb_at_tc_ingress(skb))
 		skb_push_rcsum(skb, skb->mac_len);
 
-	rcu_read_lock();
-
 	action = READ_ONCE(v->tcf_action);
 
-	p = rcu_dereference(v->vlan_p);
+	p = rcu_dereference_bh(v->vlan_p);
 
 	switch (p->tcfv_action) {
 	case TCA_VLAN_ACT_POP:
@@ -61,7 +59,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	case TCA_VLAN_ACT_MODIFY:
 		/* No-op if no vlan tag (either hw-accel or in-payload) */
 		if (!skb_vlan_tagged(skb))
-			goto unlock;
+			goto out;
 		/* extract existing tag (and guarantee no hw-accel tag) */
 		if (skb_vlan_tag_present(skb)) {
 			tci = skb_vlan_tag_get(skb);
@@ -86,18 +84,15 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 		BUG();
 	}
 
-	goto unlock;
-
-drop:
-	action = TC_ACT_SHOT;
-	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
-
-unlock:
-	rcu_read_unlock();
+out:
 	if (skb_at_tc_ingress(skb))
 		skb_pull_rcsum(skb, skb->mac_len);
 
 	return action;
+
+drop:
+	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT.
  2018-07-30 12:30 [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection Paolo Abeni
  2018-07-30 12:30 ` [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
  2018-07-30 12:30 ` [PATCH net-next v5 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
@ 2018-07-30 12:30 ` Paolo Abeni
  2018-07-30 12:40   ` Jiri Pirko
  2018-07-30 12:30 ` [PATCH net-next v5 4/4] act_mirred: use TC_ACT_REINSERT when possible Paolo Abeni
  2018-07-30 16:31 ` [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection David Miller
  4 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2018-07-30 12:30 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

This is similar TC_ACT_REDIRECT, but with a slightly different
semantic:
- on ingress the mirred skbs are passed to the target device
network stack without any additional check not scrubbing.
- the rcu-protected stats provided via the tcf_result struct
  are updated on error conditions.

This new tcfa_action value is not exposed to the user-space
and can be used only internally by clsact.

v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
 a new action type instead
v2 -> v3:
 - rename the new action value TC_ACT_REINJECT, update the
   helper accordingly
 - take care of uncloned reinjected packets in XDP generic
   hook
v3 -> v4:
 - renamed again the new action value (JiriP)
v4 -> v5:
 - fix build error with !NET_CLS_ACT (kbuild bot)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: since I had to change the netif_receive_generic_xdp chunk
in respect to revision v4, so I did not include Cong's ack, added
to such revision
---
 include/net/pkt_cls.h     |  3 +++
 include/net/sch_generic.h | 28 ++++++++++++++++++++++++++++
 net/core/dev.c            |  6 +++++-
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 6d02f31abba8..22bfc3a13c25 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -7,6 +7,9 @@
 #include <net/sch_generic.h>
 #include <net/act_api.h>
 
+/* TC action not accessible from user space */
+#define TC_ACT_REINSERT		(TC_ACT_VALUE_MAX + 1)
+
 /* Basic packet classifier frontend definitions. */
 
 struct tcf_walker {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index bcae181c1857..a6d00093f35e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -235,6 +235,12 @@ struct tcf_result {
 			u32		classid;
 		};
 		const struct tcf_proto *goto_tp;
+
+		/* used by the TC_ACT_REINSERT action */
+		struct {
+			bool		ingress;
+			struct gnet_stats_queue *qstats;
+		};
 	};
 };
 
@@ -569,6 +575,15 @@ static inline void skb_reset_tc(struct sk_buff *skb)
 #endif
 }
 
+static inline bool skb_is_tc_redirected(const struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	return skb->tc_redirected;
+#else
+	return false;
+#endif
+}
+
 static inline bool skb_at_tc_ingress(const struct sk_buff *skb)
 {
 #ifdef CONFIG_NET_CLS_ACT
@@ -1108,4 +1123,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 			  struct mini_Qdisc __rcu **p_miniq);
 
+static inline void skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res)
+{
+	struct gnet_stats_queue *stats = res->qstats;
+	int ret;
+
+	if (res->ingress)
+		ret = netif_receive_skb(skb);
+	else
+		ret = dev_queue_xmit(skb);
+	if (ret && stats)
+		qstats_overlimit_inc(res->qstats);
+}
+
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 89031b5fef9f..38b0c414d780 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	/* Reinjected packets coming from act_mirred or similar should
 	 * not get XDP generic processing.
 	 */
-	if (skb_cloned(skb))
+	if (skb_cloned(skb) || skb_is_tc_redirected(skb))
 		return XDP_PASS;
 
 	/* XDP packets must be linear and must have sufficient headroom
@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		__skb_push(skb, skb->mac_len);
 		skb_do_redirect(skb);
 		return NULL;
+	case TC_ACT_REINSERT:
+		/* this does not scrub the packet, and updates stats on error */
+		skb_tc_reinsert(skb, &cl_res);
+		return NULL;
 	default:
 		break;
 	}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 4/4] act_mirred: use TC_ACT_REINSERT when possible
  2018-07-30 12:30 [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection Paolo Abeni
                   ` (2 preceding siblings ...)
  2018-07-30 12:30 ` [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT Paolo Abeni
@ 2018-07-30 12:30 ` Paolo Abeni
  2018-07-30 12:42   ` Jiri Pirko
  2018-07-30 16:31 ` [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection David Miller
  4 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2018-07-30 12:30 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

When mirred is invoked from the ingress path, and it wants to redirect
the processed packet, it can now use the TC_ACT_REINSERT action,
filling the tcf_result accordingly, and avoiding a per packet
skb_clone().

Overall this gives a ~10% improvement in forwarding performance for the
TC S/W data path and TC S/W performances are now comparable to the
kernel openvswitch datapath.

v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT
v2 -> v3: updated after action rename, fixed typo into the commit
	message
v3 -> v4: updated again after action rename, added more comments to
	the code (JiriP), skip the optimization if the control action
	need to touch the tcf_result (Paolo)
v4 -> v5: fix sparse warning (kbuild bot)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: I had to do some minor change in the last chunk since v4, so
I did not include Cong's ack, added to such revision
---
 net/sched/act_mirred.c | 53 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index eeb335f03102..b26d060da08e 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -25,6 +25,7 @@
 #include <net/net_namespace.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/pkt_cls.h>
 #include <linux/tc_act/tc_mirred.h>
 #include <net/tc_act/tc_mirred.h>
 
@@ -49,6 +50,18 @@ static bool tcf_mirred_act_wants_ingress(int action)
 	}
 }
 
+static bool tcf_mirred_can_reinsert(int action)
+{
+	switch (action) {
+	case TC_ACT_SHOT:
+	case TC_ACT_STOLEN:
+	case TC_ACT_QUEUED:
+	case TC_ACT_TRAP:
+		return true;
+	}
+	return false;
+}
+
 static void tcf_mirred_release(struct tc_action *a)
 {
 	struct tcf_mirred *m = to_mirred(a);
@@ -171,10 +184,13 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		      struct tcf_result *res)
 {
 	struct tcf_mirred *m = to_mirred(a);
+	struct sk_buff *skb2 = skb;
 	bool m_mac_header_xmit;
 	struct net_device *dev;
-	struct sk_buff *skb2;
 	int retval, err = 0;
+	bool use_reinsert;
+	bool want_ingress;
+	bool is_redirect;
 	int m_eaction;
 	int mac_len;
 
@@ -196,16 +212,25 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		goto out;
 	}
 
-	skb2 = skb_clone(skb, GFP_ATOMIC);
-	if (!skb2)
-		goto out;
+	/* we could easily avoid the clone only if called by ingress and clsact;
+	 * since we can't easily detect the clsact caller, skip clone only for
+	 * ingress - that covers the TC S/W datapath.
+	 */
+	is_redirect = tcf_mirred_is_act_redirect(m_eaction);
+	use_reinsert = skb_at_tc_ingress(skb) && is_redirect &&
+		       tcf_mirred_can_reinsert(retval);
+	if (!use_reinsert) {
+		skb2 = skb_clone(skb, GFP_ATOMIC);
+		if (!skb2)
+			goto out;
+	}
 
 	/* If action's target direction differs than filter's direction,
 	 * and devices expect a mac header on xmit, then mac push/pull is
 	 * needed.
 	 */
-	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
-	    m_mac_header_xmit) {
+	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
+	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
 		if (!skb_at_tc_ingress(skb)) {
 			/* caught at egress, act ingress: pull mac */
 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
@@ -216,15 +241,23 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 
+	skb2->skb_iif = skb->dev->ifindex;
+	skb2->dev = dev;
+
 	/* mirror is always swallowed */
-	if (tcf_mirred_is_act_redirect(m_eaction)) {
+	if (is_redirect) {
 		skb2->tc_redirected = 1;
 		skb2->tc_from_ingress = skb2->tc_at_ingress;
+
+		/* let's the caller reinsert the packet, if possible */
+		if (use_reinsert) {
+			res->ingress = want_ingress;
+			res->qstats = this_cpu_ptr(m->common.cpu_qstats);
+			return TC_ACT_REINSERT;
+		}
 	}
 
-	skb2->skb_iif = skb->dev->ifindex;
-	skb2->dev = dev;
-	if (!tcf_mirred_act_wants_ingress(m_eaction))
+	if (!want_ingress)
 		err = dev_queue_xmit(skb2);
 	else
 		err = netif_receive_skb(skb2);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 12:30 ` [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
@ 2018-07-30 12:36   ` Jiri Pirko
  2018-07-30 14:03   ` Jamal Hadi Salim
  1 sibling, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2018-07-30 12:36 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Mon, Jul 30, 2018 at 02:30:42PM CEST, pabeni@redhat.com wrote:
>Currently, when initializing an action, the user-space can specify
>and use arbitrary values for the tcfa_action field. If the value
>is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC.
>
>This change explicitly checks for unknown values at action creation
>time, and explicitly convert them to TC_ACT_UNSPEC. No functional
>changes are introduced, but this will allow introducing tcfa_action
>values not exposed to user-space in a later patch.
>
>Note: we can't use the above to hide TC_ACT_REDIRECT from user-space,
>as the latter is already part of uAPI.
>
>v3 -> v4:
> - use an helper to check for action validity (JiriP)
> - emit an extack for invalid actions (JiriP)
>v4 -> v5:
> - keep messages on a single line, drop net_warn (Marcelo)

Little nitpick: The changelog should be in the reverse order.

>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT.
  2018-07-30 12:30 ` [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT Paolo Abeni
@ 2018-07-30 12:40   ` Jiri Pirko
  2018-07-30 16:31     ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2018-07-30 12:40 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Mon, Jul 30, 2018 at 02:30:44PM CEST, pabeni@redhat.com wrote:
>This is similar TC_ACT_REDIRECT, but with a slightly different
>semantic:
>- on ingress the mirred skbs are passed to the target device
>network stack without any additional check not scrubbing.
>- the rcu-protected stats provided via the tcf_result struct
>  are updated on error conditions.
>
>This new tcfa_action value is not exposed to the user-space
>and can be used only internally by clsact.
>
>v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> a new action type instead
>v2 -> v3:
> - rename the new action value TC_ACT_REINJECT, update the
>   helper accordingly
> - take care of uncloned reinjected packets in XDP generic
>   hook
>v3 -> v4:
> - renamed again the new action value (JiriP)
>v4 -> v5:
> - fix build error with !NET_CLS_ACT (kbuild bot)

Oh, another nitpick: The changelog should go under "---" line. That way,
the maintainer does not have to deal with it during apply (git apply
will cut it out).

>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 4/4] act_mirred: use TC_ACT_REINSERT when possible
  2018-07-30 12:30 ` [PATCH net-next v5 4/4] act_mirred: use TC_ACT_REINSERT when possible Paolo Abeni
@ 2018-07-30 12:42   ` Jiri Pirko
  0 siblings, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2018-07-30 12:42 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Mon, Jul 30, 2018 at 02:30:45PM CEST, pabeni@redhat.com wrote:
>When mirred is invoked from the ingress path, and it wants to redirect
>the processed packet, it can now use the TC_ACT_REINSERT action,
>filling the tcf_result accordingly, and avoiding a per packet
>skb_clone().
>
>Overall this gives a ~10% improvement in forwarding performance for the
>TC S/W data path and TC S/W performances are now comparable to the
>kernel openvswitch datapath.
>
>v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT
>v2 -> v3: updated after action rename, fixed typo into the commit
>	message
>v3 -> v4: updated again after action rename, added more comments to
>	the code (JiriP), skip the optimization if the control action
>	need to touch the tcf_result (Paolo)
>v4 -> v5: fix sparse warning (kbuild bot)
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 12:30 ` [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
  2018-07-30 12:36   ` Jiri Pirko
@ 2018-07-30 14:03   ` Jamal Hadi Salim
  2018-07-30 14:21     ` Jiri Pirko
  2018-07-30 16:41     ` Paolo Abeni
  1 sibling, 2 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2018-07-30 14:03 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On 30/07/18 08:30 AM, Paolo Abeni wrote:
>   	}
>   
> +	if (!tcf_action_valid(a->tcfa_action)) {
> +		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
> +		a->tcfa_action = TC_ACT_UNSPEC;
> +	}
> +
>   	return a;
>   


I think it would make a lot more sense to just reject the entry than
changing it underneath the user to a default value. Least element of
suprise.

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 14:03   ` Jamal Hadi Salim
@ 2018-07-30 14:21     ` Jiri Pirko
  2018-07-30 16:41     ` Paolo Abeni
  1 sibling, 0 replies; 20+ messages in thread
From: Jiri Pirko @ 2018-07-30 14:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Paolo Abeni, netdev, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Mon, Jul 30, 2018 at 04:03:50PM CEST, jhs@mojatatu.com wrote:
>On 30/07/18 08:30 AM, Paolo Abeni wrote:
>>   	}
>> +	if (!tcf_action_valid(a->tcfa_action)) {
>> +		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
>> +		a->tcfa_action = TC_ACT_UNSPEC;
>> +	}
>> +
>>   	return a;
>
>
>I think it would make a lot more sense to just reject the entry than
>changing it underneath the user to a default value. Least element of
>suprise.

It might break existing user who is incorrectly doing it. But I'm okay
with both solutions.

>
>cheers,
>jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT.
  2018-07-30 12:40   ` Jiri Pirko
@ 2018-07-30 16:31     ` David Miller
  2018-07-30 16:49       ` Jiri Pirko
  0 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2018-07-30 16:31 UTC (permalink / raw)
  To: jiri
  Cc: pabeni, netdev, jhs, xiyou.wangcong, daniel, marcelo.leitner,
	eyal.birger

From: Jiri Pirko <jiri@resnulli.us>
Date: Mon, 30 Jul 2018 14:40:13 +0200

> Oh, another nitpick: The changelog should go under "---" line. That way,
> the maintainer does not have to deal with it during apply (git apply
> will cut it out).

I actually like the changelog to be _IN_ the commit message.

It's useful information that helps someone investigating history
in the tree later.

Please do not tell people to pull changelogs out of the commit
message, thank you.  I've been telling them to do the exact
opposite.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection
  2018-07-30 12:30 [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection Paolo Abeni
                   ` (3 preceding siblings ...)
  2018-07-30 12:30 ` [PATCH net-next v5 4/4] act_mirred: use TC_ACT_REINSERT when possible Paolo Abeni
@ 2018-07-30 16:31 ` David Miller
  4 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2018-07-30 16:31 UTC (permalink / raw)
  To: pabeni
  Cc: netdev, jhs, xiyou.wangcong, jiri, daniel, marcelo.leitner, eyal.birger

From: Paolo Abeni <pabeni@redhat.com>
Date: Mon, 30 Jul 2018 14:30:41 +0200

> This series is aimed at improving the act_mirred redirect performances.
> Such action is used by OVS to represent TC S/W flows, and it's current largest
> bottle-neck is the need for a skb_clone() for each packet.
> 
> The first 2 patches introduce some cleanup and safeguards to allow extending 
> tca_result - we will use it to store RCU protected redirect information - and
> introduce a clear separation between user-space accessible tcfa_action
> values and internal values accessible only by the kernel.
> Then a new tcfa_action value is introduced: TC_ACT_REINJECT, similar to
> TC_ACT_REDIRECT, but preserving the mirred semantic. Such value is not
> accessible from user-space.
> The last patch exploits the newly introduced infrastructure in the act_mirred
> action, to avoid a skb_clone, when possible.
> 
> Overall this the above gives a ~10% performance improvement in forwarding tput,
> when using the TC S/W datapath.
 ...

Series applied, thank you.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 14:03   ` Jamal Hadi Salim
  2018-07-30 14:21     ` Jiri Pirko
@ 2018-07-30 16:41     ` Paolo Abeni
  2018-07-30 19:31       ` Jamal Hadi Salim
  1 sibling, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2018-07-30 16:41 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On Mon, 2018-07-30 at 10:03 -0400, Jamal Hadi Salim wrote:
> On 30/07/18 08:30 AM, Paolo Abeni wrote:
> >   	}
> >   
> > +	if (!tcf_action_valid(a->tcfa_action)) {
> > +		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
> > +		a->tcfa_action = TC_ACT_UNSPEC;
> > +	}
> > +
> >   	return a;
> >   
> 
> 
> I think it would make a lot more sense to just reject the entry than
> changing it underneath the user to a default value. Least element of
> suprise.

I fear that would break existing (bad) users ?!? This way, such users
are notified they are doing something uncorrect, but still continue to
work.

The patch can be changed to reject bad actions, if there is agreement,
but it would not look as the safest way to me.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT.
  2018-07-30 16:31     ` David Miller
@ 2018-07-30 16:49       ` Jiri Pirko
  2018-07-30 16:56         ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2018-07-30 16:49 UTC (permalink / raw)
  To: David Miller
  Cc: pabeni, netdev, jhs, xiyou.wangcong, daniel, marcelo.leitner,
	eyal.birger

Mon, Jul 30, 2018 at 06:31:00PM CEST, davem@davemloft.net wrote:
>From: Jiri Pirko <jiri@resnulli.us>
>Date: Mon, 30 Jul 2018 14:40:13 +0200
>
>> Oh, another nitpick: The changelog should go under "---" line. That way,
>> the maintainer does not have to deal with it during apply (git apply
>> will cut it out).
>
>I actually like the changelog to be _IN_ the commit message.
>
>It's useful information that helps someone investigating history
>in the tree later.
>
>Please do not tell people to pull changelogs out of the commit
>message, thank you.  I've been telling them to do the exact
>opposite.

Oups. Interesting. So the changelog stays in the commit message, right?
I noticed that in the past but thought it was an accident. Sorry about
that. I owe you just another beer :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT.
  2018-07-30 16:49       ` Jiri Pirko
@ 2018-07-30 16:56         ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2018-07-30 16:56 UTC (permalink / raw)
  To: jiri
  Cc: pabeni, netdev, jhs, xiyou.wangcong, daniel, marcelo.leitner,
	eyal.birger

From: Jiri Pirko <jiri@resnulli.us>
Date: Mon, 30 Jul 2018 18:49:23 +0200

> So the changelog stays in the commit message, right?

Yes.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 16:41     ` Paolo Abeni
@ 2018-07-30 19:31       ` Jamal Hadi Salim
  2018-07-31  9:41         ` Paolo Abeni
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2018-07-30 19:31 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On 30/07/18 12:41 PM, Paolo Abeni wrote:
> On Mon, 2018-07-30 at 10:03 -0400, Jamal Hadi Salim wrote:
>> On 30/07/18 08:30 AM, Paolo Abeni wrote:
>>>    	}
>>>    
>>> +	if (!tcf_action_valid(a->tcfa_action)) {
>>> +		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
>>> +		a->tcfa_action = TC_ACT_UNSPEC;
>>> +	}
>>> +
>>>    	return a;
>>>    
>>
>>
>> I think it would make a lot more sense to just reject the entry than
>> changing it underneath the user to a default value. Least element of
>> suprise.
> 
> I fear that would break existing (bad) users ?!? This way, such users
> are notified they are doing something uncorrect, but still continue to
> work.


By "bad users" I think you mean someone setting a policy expecting
one behavior but getting a different one? If yes, that policy was
already wrong/buggy. As an example, if i configured:

match xxx action foo action goo action bar action gah

where action goo has a bad opcode
If  you "fix it"  with TC_ACT_UNSPEC then basically the above
policy is now equivalent to:

match xxx action foo action goo

Infact if there was a lower prio rule in the chain
then lookup will continue there and produce even stranger
results.


cheers,
jamal


> 
> The patch can be changed to reject bad actions, if there is agreement,
> but it would not look as the safest way to me.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-30 19:31       ` Jamal Hadi Salim
@ 2018-07-31  9:41         ` Paolo Abeni
  2018-07-31 13:53           ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2018-07-31  9:41 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On Mon, 2018-07-30 at 15:31 -0400, Jamal Hadi Salim wrote:
> On 30/07/18 12:41 PM, Paolo Abeni wrote:
> > On Mon, 2018-07-30 at 10:03 -0400, Jamal Hadi Salim wrote:
> > > On 30/07/18 08:30 AM, Paolo Abeni wrote:
> > > >    	}
> > > >    
> > > > +	if (!tcf_action_valid(a->tcfa_action)) {
> > > > +		NL_SET_ERR_MSG(extack, "invalid action value, using TC_ACT_UNSPEC instead");
> > > > +		a->tcfa_action = TC_ACT_UNSPEC;
> > > > +	}
> > > > +
> > > >    	return a;
> > > >    
> > > 
> > > 
> > > I think it would make a lot more sense to just reject the entry than
> > > changing it underneath the user to a default value. Least element of
> > > suprise.
> > 
> > I fear that would break existing (bad) users ?!? This way, such users
> > are notified they are doing something uncorrect, but still continue to
> > work.
> 
> 
> By "bad users" I think you mean someone setting a policy expecting
> one behavior but getting a different one? 

Generally speaking I thought about user-space pushing to the kernel
some uninitialized/random value for 'tcfa_action'.

> If yes, that policy was
> already wrong/buggy. As an example, if i configured:
> 
> match xxx action foo action goo action bar action gah
> 
> where action goo has a bad opcode
> If  you "fix it"  with TC_ACT_UNSPEC then basically the above
> policy is now equivalent to:
> 
> match xxx action foo action goo
> 
> Infact if there was a lower prio rule in the chain
> then lookup will continue there and produce even stranger
> results.

I see. 

Before this patch, the kernel exposed the same behaviour for negative
value of 'bar', while, for positive 'bar' values, the overall behaviour
was more complex (some classifier always stops with unknown positive
action value, others go to lower prio).

Overall, the kernel behavior should be more well-defined now, but yes,
there is a change of behavior under some circumstances.

What about instead mapping undefined/unknown actions value to TC_ACT_OK
(still at initialization time)? this is what is already done by
tcf_action_exec() for faulty opcodes/graphs and by tcf_ipt() and would
handle the above example more conistently.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-31  9:41         ` Paolo Abeni
@ 2018-07-31 13:53           ` Jamal Hadi Salim
  2018-07-31 14:40             ` Paolo Abeni
  0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2018-07-31 13:53 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On 31/07/18 05:41 AM, Paolo Abeni wrote:

> Before this patch, the kernel exposed the same behaviour for negative
> value of 'bar', while, for positive 'bar' values, the overall behaviour
> was more complex (some classifier always stops with unknown positive
> action value, others go to lower prio).
> 
 >
> Overall, the kernel behavior should be more well-defined now, but yes,
> there is a change of behavior under some circumstances.
> 
> What about instead mapping undefined/unknown actions value to TC_ACT_OK
> (still at initialization time)? >
> this is what is already done by
> tcf_action_exec() for faulty opcodes/graphs and by tcf_ipt() and would
> handle the above example more conistently.
> 

I think PIPE maybe more reasonable. You still continue the action graph
even on such a mis-configuration.

But let me see if i can make a convincing arguement for rejecting at
init time:
I would be _very suprised_ if someone is depending on a misconfiguration 
such as this in a deployment because they would get different results 
than what they are expecting and sooner or later fix it after a lot of
debugging and cursing. Your patch helps them notice it sooner. And a
rejection even much much sooner. With a rejection you dont get to
execute a "fixup" the kernel assumes for you.

BTW, I asked this earlier and Jiri said it was addressed in patch 2.
I just looked again and i may be missing something basic:
Lets say tomorrow in a new kernel we add new TC_ACT_XXX that then gets 
exposed to uapi - so user space tc is updated.
You then use the new tc specifying TC_ACT_XXX policy on kernel with your
changes.
If i read correctly because TC_ACT_XXX is out of bounds for current
kernel(which has your changes) you will fix it to be UNSPEC, no?

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-31 13:53           ` Jamal Hadi Salim
@ 2018-07-31 14:40             ` Paolo Abeni
  2018-08-01 14:34               ` Jamal Hadi Salim
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Abeni @ 2018-07-31 14:40 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On Tue, 2018-07-31 at 09:53 -0400, Jamal Hadi Salim wrote:
> BTW, I asked this earlier and Jiri said it was addressed in patch 2.
> I just looked again and i may be missing something basic:
> Lets say tomorrow in a new kernel we add new TC_ACT_XXX that then gets 
> exposed to uapi - so user space tc is updated.
> You then use the new tc specifying TC_ACT_XXX policy on kernel with your
> changes.
> If i read correctly because TC_ACT_XXX is out of bounds for current
> kernel(which has your changes) you will fix it to be UNSPEC, no?

You are right.

If we choose to reject unknown opcodes, such user-space configuration
will fail.

What would happen before this patch is that configurations using such
TC_ACT_XXXX value would be successful. This is why I proposed to keep
the fixup.

I initially thought the kernel behavior in the above scenario would
match exactly TC_ACT_UNSPEC processing, but as you noted with the
example in your previous email, TC_ACT_UNSPEC processing is actually a
bit different.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values
  2018-07-31 14:40             ` Paolo Abeni
@ 2018-08-01 14:34               ` Jamal Hadi Salim
  0 siblings, 0 replies; 20+ messages in thread
From: Jamal Hadi Salim @ 2018-08-01 14:34 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On 31/07/18 10:40 AM, Paolo Abeni wrote:

> If we choose to reject unknown opcodes, such user-space configuration
> will fail.
> 

I think that is a good thing. The kernel should not be accepting things
it doesnt understand. This is a good opportunity to enforce that.

> What would happen before this patch is that configurations using such
> TC_ACT_XXXX value would be successful. This is why I proposed to keep
> the fixup.
>

Note: Such behavior can only occur if tc(user space) allows you
to pass illegitimate values which today can only happen when you have a 
new user space but older kernel (with "old" starting with your current
changes).
iow, fixing a policy in a kernel which has no support for TC_ACT_XXXX
to translate intent to be TC_ACT_OK/PIPE is problematic (as i was
showing earlier).

> I initially thought the kernel behavior in the above scenario would
> match exactly TC_ACT_UNSPEC processing, but as you noted with the
> example in your previous email, TC_ACT_UNSPEC processing is actually a
> bit different.
> 

I worry: I dont think we can get a good default for most use
cases. No point in maintaining faulty  expectations
(because  IMO: the user will - eventually - fix their scripts if they
dont see expected behavior).

cheers,
jamal

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2018-08-01 16:20 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-30 12:30 [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection Paolo Abeni
2018-07-30 12:30 ` [PATCH net-next v5 1/4] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
2018-07-30 12:36   ` Jiri Pirko
2018-07-30 14:03   ` Jamal Hadi Salim
2018-07-30 14:21     ` Jiri Pirko
2018-07-30 16:41     ` Paolo Abeni
2018-07-30 19:31       ` Jamal Hadi Salim
2018-07-31  9:41         ` Paolo Abeni
2018-07-31 13:53           ` Jamal Hadi Salim
2018-07-31 14:40             ` Paolo Abeni
2018-08-01 14:34               ` Jamal Hadi Salim
2018-07-30 12:30 ` [PATCH net-next v5 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
2018-07-30 12:30 ` [PATCH net-next v5 3/4] net/tc: introduce TC_ACT_REINSERT Paolo Abeni
2018-07-30 12:40   ` Jiri Pirko
2018-07-30 16:31     ` David Miller
2018-07-30 16:49       ` Jiri Pirko
2018-07-30 16:56         ` David Miller
2018-07-30 12:30 ` [PATCH net-next v5 4/4] act_mirred: use TC_ACT_REINSERT when possible Paolo Abeni
2018-07-30 12:42   ` Jiri Pirko
2018-07-30 16:31 ` [PATCH net-next v5 0/4] TC: refactor act_mirred packets re-injection David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.