All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection
@ 2018-07-24 20:06 Paolo Abeni
  2018-07-24 20:06 ` [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
                   ` (5 more replies)
  0 siblings, 6 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-24 20:06 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

This series is aimed at improving the act_mirred redirect performances.
Such action is used by OVS to represent TC S/W flows, and it's current largest
bottle-neck is the need for a skb_clone() for each packet.

The first 3 patches introduce some cleanup and safeguards to allow extending 
tca_result - we will use it to store RCU protected redirect information - and
introduce a clear separation between user-space accessible tcfa_action
value and internal values accessible only by the kernel.
Then a new tcfa_action value is introduced: TC_ACT_REINJECT, similar to
TC_ACT_REDIRECT, but preserving the mirred semantic. Such value is not
accessible from user-space.
The last patch exploits the newly introduced infrastructure in the act_mirred
action, to avoid a skb_clone, when possible.

Overall this the above gives a ~10% performance improvement in forwarding tput,
when using the TC S/W datapath.

v1 -> v2:
 - preserve the rcu lock in act_bpf
 - add and use a new action value to reinject the packets, preserving the mirred
   semantic

v2 -> v3:
 - renamed to new action as TC_ACT_REINJECT
 - TC_ACT_REINJECT is not exposed to user-space

Paolo Abeni (5):
  tc/act: user space can't use TC_ACT_REDIRECT directly
  net/sched: user-space can't set unknown tcfa_action values
  tc/act: remove unneeded RCU lock in action callback
  net/tc: introduce TC_ACT_REINJECT.
  act_mirred: use TC_ACT_REINJECT when possible

 include/net/act_api.h        |  2 +-
 include/net/pkt_cls.h        |  3 +++
 include/net/sch_generic.h    | 21 ++++++++++++++++++++
 include/uapi/linux/pkt_cls.h |  6 ++++--
 net/core/dev.c               |  6 +++++-
 net/sched/act_api.c          | 15 +++++++++++++-
 net/sched/act_csum.c         | 12 +++---------
 net/sched/act_ife.c          |  5 +----
 net/sched/act_mirred.c       | 38 ++++++++++++++++++++++++------------
 net/sched/act_sample.c       |  4 +---
 net/sched/act_skbedit.c      | 10 +++-------
 net/sched/act_skbmod.c       | 21 +++++++++-----------
 net/sched/act_tunnel_key.c   |  6 +-----
 net/sched/act_vlan.c         | 19 +++++++-----------
 14 files changed, 98 insertions(+), 70 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
@ 2018-07-24 20:06 ` Paolo Abeni
  2018-07-25 11:55   ` Jamal Hadi Salim
  2018-07-25 11:56   ` Jiri Pirko
  2018-07-24 20:06 ` [PATCH net-next v3 2/5] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-24 20:06 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Only cls_bpf and act_bpf can safely use such value. If a generic
action is configured by user space to return TC_ACT_REDIRECT,
the usually visible behavior is passing the skb up the stack - as
for unknown action, but, with complex configuration, more random
results can be obtained.

This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
at action init time, making the kernel behavior more consistent.

v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/act_api.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 148a89ab789b..24b5534967fe 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 		}
 	}
 
+	if (a->tcfa_action == TC_ACT_REDIRECT) {
+		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
+		a->tcfa_action = TC_ACT_UNSPEC;
+	}
+
 	return a;
 
 err_mod:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v3 2/5] net/sched: user-space can't set unknown tcfa_action values
  2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
  2018-07-24 20:06 ` [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
@ 2018-07-24 20:06 ` Paolo Abeni
  2018-07-25 12:26   ` Jiri Pirko
  2018-07-24 20:06 ` [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-24 20:06 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Currently, when initializing an action, the user-space can specify
and use arbitrary values for the tcfa_action field. If the value
is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC.

This change explicitly checks for unknown values at action creation
time, and explicitly convert them to TC_ACT_UNSPEC. No functional
changes are introduced, but this will allow introducing tcfa_action
values not exposed to user-space in a later patch.

Note: we can't use the above to hide TC_ACT_REDIRECT from user-space,
as the latter is already part of uAPI.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/uapi/linux/pkt_cls.h |  6 ++++--
 net/sched/act_api.c          | 10 +++++++++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index c4262d911596..c8a24861d4c8 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -45,6 +45,7 @@ enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in
@@ -55,11 +56,12 @@ enum {
 #define __TC_ACT_EXT_SHIFT 28
 #define __TC_ACT_EXT(local) ((local) << __TC_ACT_EXT_SHIFT)
 #define TC_ACT_EXT_VAL_MASK ((1 << __TC_ACT_EXT_SHIFT) - 1)
-#define TC_ACT_EXT_CMP(combined, opcode) \
-	(((combined) & (~TC_ACT_EXT_VAL_MASK)) == opcode)
+#define TC_ACT_EXT_OPCODE(combined) ((combined) & (~TC_ACT_EXT_VAL_MASK))
+#define TC_ACT_EXT_CMP(combined, opcode) (TC_ACT_EXT_OPCODE(combined) == opcode)
 
 #define TC_ACT_JUMP __TC_ACT_EXT(1)
 #define TC_ACT_GOTO_CHAIN __TC_ACT_EXT(2)
+#define TC_ACT_EXT_OPCODE_MAX	TC_ACT_GOTO_CHAIN
 
 /* Action type identifiers*/
 enum {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 24b5534967fe..5044f4809b37 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -798,6 +798,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	char act_name[IFNAMSIZ];
 	struct nlattr *tb[TCA_ACT_MAX + 1];
 	struct nlattr *kind;
+	int opcode;
 	int err;
 
 	if (name == NULL) {
@@ -884,7 +885,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	if (err != ACT_P_CREATED)
 		module_put(a_o->owner);
 
-	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) {
+	opcode = TC_ACT_EXT_OPCODE(a->tcfa_action);
+	if (opcode == TC_ACT_GOTO_CHAIN) {
 		err = tcf_action_goto_chain_init(a, tp);
 		if (err) {
 			struct tc_action *actions[] = { a, NULL };
@@ -898,6 +900,12 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	if (a->tcfa_action == TC_ACT_REDIRECT) {
 		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
 		a->tcfa_action = TC_ACT_UNSPEC;
+	} else if ((!opcode && a->tcfa_action > TC_ACT_VALUE_MAX) ||
+		   (opcode && opcode > TC_ACT_EXT_OPCODE_MAX &&
+		    a->tcfa_action != TC_ACT_UNSPEC)) {
+		net_warn_ratelimited("invalid %d action value",
+				     a->tcfa_action);
+		a->tcfa_action = TC_ACT_UNSPEC;
 	}
 
 	return a;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback
  2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
  2018-07-24 20:06 ` [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
  2018-07-24 20:06 ` [PATCH net-next v3 2/5] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
@ 2018-07-24 20:06 ` Paolo Abeni
  2018-07-25 11:59   ` Jamal Hadi Salim
  2018-07-25 12:32   ` Jiri Pirko
  2018-07-24 20:06 ` [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT Paolo Abeni
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-24 20:06 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Each lockless action currently does its own RCU locking in ->act().
This is allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with _bh variant, cleans up a bit the surronding code and documents
the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

v1 -> v2:
 - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
   as pointed out by Daniel

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/act_api.h      |  2 +-
 include/net/sch_generic.h  |  2 ++
 net/sched/act_csum.c       | 12 +++---------
 net/sched/act_ife.c        |  5 +----
 net/sched/act_mirred.c     |  4 +---
 net/sched/act_sample.c     |  4 +---
 net/sched/act_skbedit.c    | 10 +++-------
 net/sched/act_skbmod.c     | 21 +++++++++------------
 net/sched/act_tunnel_key.c |  6 +-----
 net/sched/act_vlan.c       | 19 +++++++------------
 10 files changed, 29 insertions(+), 56 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 683ce41053d9..8c9bc02d05e1 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -85,7 +85,7 @@ struct tc_action_ops {
 	size_t	size;
 	struct module		*owner;
 	int     (*act)(struct sk_buff *, const struct tc_action *,
-		       struct tcf_result *);
+		       struct tcf_result *); /* called under RCU BH lock*/
 	int     (*dump)(struct sk_buff *, struct tc_action *, int, int);
 	void	(*cleanup)(struct tc_action *);
 	int     (*lookup)(struct net *net, struct tc_action **a, u32 index,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7432100027b7..056dc1083aa3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -275,6 +275,8 @@ struct tcf_proto {
 	/* Fast access part */
 	struct tcf_proto __rcu	*next;
 	void __rcu		*root;
+
+	/* called under RCU BH lock*/
 	int			(*classify)(struct sk_buff *,
 					    const struct tcf_proto *,
 					    struct tcf_result *);
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index bd232d3bd022..4f092fbd1e20 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -561,15 +561,14 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
 	u32 update_flags;
 	int action;
 
-	rcu_read_lock();
-	params = rcu_dereference(p->params);
+	params = rcu_dereference_bh(p->params);
 
 	tcf_lastuse_update(&p->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(p->common.cpu_bstats), skb);
 
 	action = params->action;
 	if (unlikely(action == TC_ACT_SHOT))
-		goto drop_stats;
+		goto drop;
 
 	update_flags = params->update_flags;
 	switch (tc_skb_protocol(skb)) {
@@ -583,16 +582,11 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
 		break;
 	}
 
-unlock:
-	rcu_read_unlock();
 	return action;
 
 drop:
-	action = TC_ACT_SHOT;
-
-drop_stats:
 	qstats_drop_inc(this_cpu_ptr(p->common.cpu_qstats));
-	goto unlock;
+	return TC_ACT_SHOT;
 }
 
 static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind,
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 3d6e265758c0..df4060e32d43 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -820,14 +820,11 @@ static int tcf_ife_act(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_ife_params *p;
 	int ret;
 
-	rcu_read_lock();
-	p = rcu_dereference(ife->params);
+	p = rcu_dereference_bh(ife->params);
 	if (p->flags & IFE_ENCODE) {
 		ret = tcf_ife_encode(skb, a, res, p);
-		rcu_read_unlock();
 		return ret;
 	}
-	rcu_read_unlock();
 
 	return tcf_ife_decode(skb, a, res);
 }
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 6afd89a36c69..eeb335f03102 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -181,11 +181,10 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	tcf_lastuse_update(&m->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
 
-	rcu_read_lock();
 	m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit);
 	m_eaction = READ_ONCE(m->tcfm_eaction);
 	retval = READ_ONCE(m->tcf_action);
-	dev = rcu_dereference(m->tcfm_dev);
+	dev = rcu_dereference_bh(m->tcfm_dev);
 	if (unlikely(!dev)) {
 		pr_notice_once("tc mirred: target device is gone\n");
 		goto out;
@@ -236,7 +235,6 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		if (tcf_mirred_is_act_redirect(m_eaction))
 			retval = TC_ACT_SHOT;
 	}
-	rcu_read_unlock();
 
 	return retval;
 }
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 3079e7be5bde..2608ccc83e5e 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -140,8 +140,7 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 	bstats_cpu_update(this_cpu_ptr(s->common.cpu_bstats), skb);
 	retval = READ_ONCE(s->tcf_action);
 
-	rcu_read_lock();
-	psample_group = rcu_dereference(s->psample_group);
+	psample_group = rcu_dereference_bh(s->psample_group);
 
 	/* randomly sample packets according to rate */
 	if (psample_group && (prandom_u32() % s->rate == 0)) {
@@ -165,7 +164,6 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 			skb_pull(skb, skb->mac_len);
 	}
 
-	rcu_read_unlock();
 	return retval;
 }
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index da56e6938c9e..a6db47ebec11 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -43,8 +43,7 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 	tcf_lastuse_update(&d->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
 
-	rcu_read_lock();
-	params = rcu_dereference(d->params);
+	params = rcu_dereference_bh(d->params);
 	action = READ_ONCE(d->tcf_action);
 
 	if (params->flags & SKBEDIT_F_PRIORITY)
@@ -77,14 +76,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 	}
 	if (params->flags & SKBEDIT_F_PTYPE)
 		skb->pkt_type = params->ptype;
-
-unlock:
-	rcu_read_unlock();
 	return action;
+
 err:
 	qstats_drop_inc(this_cpu_ptr(d->common.cpu_qstats));
-	action = TC_ACT_SHOT;
-	goto unlock;
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index cdc6bacfb190..c437c6d51a71 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -41,20 +41,14 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 	 * then MAX_EDIT_LEN needs to change appropriately
 	*/
 	err = skb_ensure_writable(skb, MAX_EDIT_LEN);
-	if (unlikely(err)) { /* best policy is to drop on the floor */
-		qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
-		return TC_ACT_SHOT;
-	}
+	if (unlikely(err)) /* best policy is to drop on the floor */
+		goto drop;
 
-	rcu_read_lock();
 	action = READ_ONCE(d->tcf_action);
-	if (unlikely(action == TC_ACT_SHOT)) {
-		qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
-		rcu_read_unlock();
-		return action;
-	}
+	if (unlikely(action == TC_ACT_SHOT))
+		goto drop;
 
-	p = rcu_dereference(d->skbmod_p);
+	p = rcu_dereference_bh(d->skbmod_p);
 	flags = p->flags;
 	if (flags & SKBMOD_F_DMAC)
 		ether_addr_copy(eth_hdr(skb)->h_dest, p->eth_dst);
@@ -62,7 +56,6 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 		ether_addr_copy(eth_hdr(skb)->h_source, p->eth_src);
 	if (flags & SKBMOD_F_ETYPE)
 		eth_hdr(skb)->h_proto = p->eth_type;
-	rcu_read_unlock();
 
 	if (flags & SKBMOD_F_SWAPMAC) {
 		u16 tmpaddr[ETH_ALEN / 2]; /* ether_addr_copy() requirement */
@@ -73,6 +66,10 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 	}
 
 	return action;
+
+drop:
+	qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy skbmod_policy[TCA_SKBMOD_MAX + 1] = {
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 3ec585d58762..1ab959ef3547 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -31,9 +31,7 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_tunnel_key_params *params;
 	int action;
 
-	rcu_read_lock();
-
-	params = rcu_dereference(t->params);
+	params = rcu_dereference_bh(t->params);
 
 	tcf_lastuse_update(&t->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
@@ -53,8 +51,6 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 		break;
 	}
 
-	rcu_read_unlock();
-
 	return action;
 }
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index ad37f308175a..15a0ee214c9c 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -40,11 +40,9 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	if (skb_at_tc_ingress(skb))
 		skb_push_rcsum(skb, skb->mac_len);
 
-	rcu_read_lock();
-
 	action = READ_ONCE(v->tcf_action);
 
-	p = rcu_dereference(v->vlan_p);
+	p = rcu_dereference_bh(v->vlan_p);
 
 	switch (p->tcfv_action) {
 	case TCA_VLAN_ACT_POP:
@@ -61,7 +59,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	case TCA_VLAN_ACT_MODIFY:
 		/* No-op if no vlan tag (either hw-accel or in-payload) */
 		if (!skb_vlan_tagged(skb))
-			goto unlock;
+			goto out;
 		/* extract existing tag (and guarantee no hw-accel tag) */
 		if (skb_vlan_tag_present(skb)) {
 			tci = skb_vlan_tag_get(skb);
@@ -86,18 +84,15 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 		BUG();
 	}
 
-	goto unlock;
-
-drop:
-	action = TC_ACT_SHOT;
-	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
-
-unlock:
-	rcu_read_unlock();
+out:
 	if (skb_at_tc_ingress(skb))
 		skb_pull_rcsum(skb, skb->mac_len);
 
 	return action;
+
+drop:
+	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
                   ` (2 preceding siblings ...)
  2018-07-24 20:06 ` [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
@ 2018-07-24 20:06 ` Paolo Abeni
  2018-07-24 20:38   ` Cong Wang
                     ` (2 more replies)
  2018-07-24 20:06 ` [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible Paolo Abeni
  2018-07-25 11:53 ` [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Jiri Pirko
  5 siblings, 3 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-24 20:06 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

This is similar TC_ACT_REDIRECT, but with a slightly different
semantic:
- on ingress the mirred skbs are passed to the target device
network stack without any additional check not scrubbing.
- the rcu-protected stats provided via the tcf_result struct
  are updated on error conditions.

This new tcfa_action value is not exposed to the user-space
and can be used only internally by clsact.

v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
 a new action type instead

v2 -> v3:
 - rename the new action value TC_ACT_REINJECT, update the
   helper accordingly
 - take care of uncloned reinjected packets in XDP generic
   hook

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/pkt_cls.h     |  3 +++
 include/net/sch_generic.h | 19 +++++++++++++++++++
 net/core/dev.c            |  6 +++++-
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2081e4219f81..36ccfe2a303a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -7,6 +7,9 @@
 #include <net/sch_generic.h>
 #include <net/act_api.h>
 
+/* TC action not accessible from user space */
+#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
+
 /* Basic packet classifier frontend definitions. */
 
 struct tcf_walker {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 056dc1083aa3..95e81a70f549 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -235,6 +235,12 @@ struct tcf_result {
 			u32		classid;
 		};
 		const struct tcf_proto *goto_tp;
+
+		/* used by the TC_ACT_REINJECT action */
+		struct {
+			bool		ingress;
+			struct gnet_stats_queue *qstats;
+		};
 	};
 };
 
@@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 			  struct mini_Qdisc __rcu **p_miniq);
 
+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
+{
+	struct gnet_stats_queue *stats = res->qstats;
+	int ret;
+
+	if (res->ingress)
+		ret = netif_receive_skb(skb);
+	else
+		ret = dev_queue_xmit(skb);
+	if (ret && stats)
+		qstats_overlimit_inc(res->qstats);
+}
+
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a748ee8cc9..826ec74fe1d9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	/* Reinjected packets coming from act_mirred or similar should
 	 * not get XDP generic processing.
 	 */
-	if (skb_cloned(skb))
+	if (skb_cloned(skb) || skb->tc_redirected)
 		return XDP_PASS;
 
 	/* XDP packets must be linear and must have sufficient headroom
@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		__skb_push(skb, skb->mac_len);
 		skb_do_redirect(skb);
 		return NULL;
+	case TC_ACT_REINJECT:
+		/* this does not scrub the packet, and updates stats on error */
+		skb_tc_reinject(skb, &cl_res);
+		return NULL;
 	default:
 		break;
 	}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
                   ` (3 preceding siblings ...)
  2018-07-24 20:06 ` [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT Paolo Abeni
@ 2018-07-24 20:06 ` Paolo Abeni
  2018-07-24 21:15   ` Cong Wang
  2018-07-25 13:52   ` Jiri Pirko
  2018-07-25 11:53 ` [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Jiri Pirko
  5 siblings, 2 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-24 20:06 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

When mirred is invoked from the ingress path, and it wants to redirect
the processed packet, it can now use the TC_ACT_REINJECT action,
filling the tcf_result accordingly, and avoiding a per packet
skb_clone().

Overall this gives a ~10% improvement in forwarding performance for the
TC S/W data path and TC S/W performances are now comparable to the
kernel openvswitch datapath.

v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT
v2 -> v3: updated after action rename, fixed a typo into the commit
	message

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/act_mirred.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index eeb335f03102..368187312136 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -25,6 +25,7 @@
 #include <net/net_namespace.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
+#include <net/pkt_cls.h>
 #include <linux/tc_act/tc_mirred.h>
 #include <net/tc_act/tc_mirred.h>
 
@@ -171,10 +172,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		      struct tcf_result *res)
 {
 	struct tcf_mirred *m = to_mirred(a);
+	struct sk_buff *skb2 = skb;
 	bool m_mac_header_xmit;
 	struct net_device *dev;
-	struct sk_buff *skb2;
 	int retval, err = 0;
+	bool want_ingress;
+	bool is_redirect;
 	int m_eaction;
 	int mac_len;
 
@@ -196,16 +199,19 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		goto out;
 	}
 
-	skb2 = skb_clone(skb, GFP_ATOMIC);
-	if (!skb2)
-		goto out;
+	is_redirect = tcf_mirred_is_act_redirect(m_eaction);
+	if (!skb_at_tc_ingress(skb) || !is_redirect) {
+		skb2 = skb_clone(skb, GFP_ATOMIC);
+		if (!skb2)
+			goto out;
+	}
 
 	/* If action's target direction differs than filter's direction,
 	 * and devices expect a mac header on xmit, then mac push/pull is
 	 * needed.
 	 */
-	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
-	    m_mac_header_xmit) {
+	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
+	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
 		if (!skb_at_tc_ingress(skb)) {
 			/* caught at egress, act ingress: pull mac */
 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
@@ -216,15 +222,23 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		}
 	}
 
+	skb2->skb_iif = skb->dev->ifindex;
+	skb2->dev = dev;
+
 	/* mirror is always swallowed */
-	if (tcf_mirred_is_act_redirect(m_eaction)) {
+	if (is_redirect) {
 		skb2->tc_redirected = 1;
 		skb2->tc_from_ingress = skb2->tc_at_ingress;
+
+		/* let's the caller reinject the packet, if possible */
+		if (skb_at_tc_ingress(skb)) {
+			res->ingress = want_ingress;
+			res->qstats = this_cpu_ptr(m->common.cpu_qstats);
+			return TC_ACT_REINJECT;
+		}
 	}
 
-	skb2->skb_iif = skb->dev->ifindex;
-	skb2->dev = dev;
-	if (!tcf_mirred_act_wants_ingress(m_eaction))
+	if (!want_ingress)
 		err = dev_queue_xmit(skb2);
 	else
 		err = netif_receive_skb(skb2);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-24 20:06 ` [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT Paolo Abeni
@ 2018-07-24 20:38   ` Cong Wang
  2018-07-24 20:50     ` Cong Wang
  2018-07-25 12:16   ` Jamal Hadi Salim
  2018-07-25 12:57   ` Jiri Pirko
  2 siblings, 1 reply; 43+ messages in thread
From: Cong Wang @ 2018-07-24 20:38 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Daniel Borkmann, Marcelo Ricardo Leitner, Eyal Birger,
	David Miller

On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> +{
> +       struct gnet_stats_queue *stats = res->qstats;
> +       int ret;
> +
> +       if (res->ingress)
> +               ret = netif_receive_skb(skb);
> +       else
> +               ret = dev_queue_xmit(skb);
> +       if (ret && stats)
> +               qstats_overlimit_inc(res->qstats);

Why increasing overlimit? Overlimit is typically increased
by traffic shapers to indicate there is no bandwidth to send
out the packet.

I fail to understand why overlimit is increased in your case
here. I guess you want to increase 'drops' instead.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-24 20:38   ` Cong Wang
@ 2018-07-24 20:50     ` Cong Wang
  2018-07-25  8:29       ` Paolo Abeni
  0 siblings, 1 reply; 43+ messages in thread
From: Cong Wang @ 2018-07-24 20:50 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Daniel Borkmann, Marcelo Ricardo Leitner, Eyal Birger,
	David Miller

On Tue, Jul 24, 2018 at 1:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> > +{
> > +       struct gnet_stats_queue *stats = res->qstats;
> > +       int ret;
> > +
> > +       if (res->ingress)
> > +               ret = netif_receive_skb(skb);
> > +       else
> > +               ret = dev_queue_xmit(skb);
> > +       if (ret && stats)
> > +               qstats_overlimit_inc(res->qstats);
>
> Why increasing overlimit? Overlimit is typically increased
> by traffic shapers to indicate there is no bandwidth to send
> out the packet.
>
> I fail to understand why overlimit is increased in your case
> here. I guess you want to increase 'drops' instead.

Hmm, actually the current mirred code increases overlimit too.
But I still don't think it makes sense.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-24 20:06 ` [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible Paolo Abeni
@ 2018-07-24 21:15   ` Cong Wang
  2018-07-25 10:14     ` Paolo Abeni
  2018-07-25 11:50     ` Jamal Hadi Salim
  2018-07-25 13:52   ` Jiri Pirko
  1 sibling, 2 replies; 43+ messages in thread
From: Cong Wang @ 2018-07-24 21:15 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Daniel Borkmann, Marcelo Ricardo Leitner, Eyal Birger,
	David Miller

On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> +
> +               /* let's the caller reinject the packet, if possible */
> +               if (skb_at_tc_ingress(skb)) {
> +                       res->ingress = want_ingress;
> +                       res->qstats = this_cpu_ptr(m->common.cpu_qstats);
> +                       return TC_ACT_REINJECT;
> +               }

Looks good to me, but here we no longer return user-specified
return value here, I am sure it is safe for TC_ACT_STOLEN, but
I am not sure if it is safe for other values, like TC_ACT_RECLASSIFY.

Jamal, is there any use case of returning !TC_ACT_STOLEN for
ingress redirections?

Thanks.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-24 20:50     ` Cong Wang
@ 2018-07-25  8:29       ` Paolo Abeni
  2018-07-25 12:27         ` Jamal Hadi Salim
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25  8:29 UTC (permalink / raw)
  To: Cong Wang, Jamal Hadi Salim
  Cc: Linux Kernel Network Developers, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David Miller

On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:
> On Tue, Jul 24, 2018 at 1:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > 
> > On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> > > +{
> > > +       struct gnet_stats_queue *stats = res->qstats;
> > > +       int ret;
> > > +
> > > +       if (res->ingress)
> > > +               ret = netif_receive_skb(skb);
> > > +       else
> > > +               ret = dev_queue_xmit(skb);
> > > +       if (ret && stats)
> > > +               qstats_overlimit_inc(res->qstats);
> > 
> > Why increasing overlimit? Overlimit is typically increased
> > by traffic shapers to indicate there is no bandwidth to send
> > out the packet.
> > 
> > I fail to understand why overlimit is increased in your case
> > here. I guess you want to increase 'drops' instead.
> 
> Hmm, actually the current mirred code increases overlimit too.
> But I still don't think it makes sense.

Yep, I chose to increment 'overlimits' to preserve the current mirred
semantic. 

AFAICS, that was first introduced with:

commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
Author: Jamal Hadi Salim <jhs@mojatatu.com>
Date:   Mon Aug 15 05:25:40 2011 +0000

    net_sched: fix port mirror/redirect stats reporting

Likely increasing 'drops' would be "better", but I'm unsure we can
change this established behavior without affecting some user.

Anyway, I'm fine either way. Please advice.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-24 21:15   ` Cong Wang
@ 2018-07-25 10:14     ` Paolo Abeni
  2018-07-25 13:30       ` Jiri Pirko
  2018-07-25 11:50     ` Jamal Hadi Salim
  1 sibling, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 10:14 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Daniel Borkmann, Marcelo Ricardo Leitner, Eyal Birger,
	David Miller

On Tue, 2018-07-24 at 14:15 -0700, Cong Wang wrote:
> On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > +
> > +               /* let's the caller reinject the packet, if possible */
> > +               if (skb_at_tc_ingress(skb)) {
> > +                       res->ingress = want_ingress;
> > +                       res->qstats = this_cpu_ptr(m->common.cpu_qstats);
> > +                       return TC_ACT_REINJECT;
> > +               }
> 
> Looks good to me, but here we no longer return user-specified
> return value here, I am sure it is safe for TC_ACT_STOLEN, but
> I am not sure if it is safe for other values, like TC_ACT_RECLASSIFY.

I can make it safer, using the no clone path only if tcf_action is
TC_ACT_STOLEN. That will still cover the relevant use-cases.

Will do that in v4.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-24 21:15   ` Cong Wang
  2018-07-25 10:14     ` Paolo Abeni
@ 2018-07-25 11:50     ` Jamal Hadi Salim
  1 sibling, 0 replies; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-25 11:50 UTC (permalink / raw)
  To: Cong Wang, Paolo Abeni
  Cc: Linux Kernel Network Developers, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David Miller

On 24/07/18 05:15 PM, Cong Wang wrote:
> On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> +
>> +               /* let's the caller reinject the packet, if possible */
>> +               if (skb_at_tc_ingress(skb)) {
>> +                       res->ingress = want_ingress;
>> +                       res->qstats = this_cpu_ptr(m->common.cpu_qstats);
>> +                       return TC_ACT_REINJECT;
>> +               }
> 
> Looks good to me, but here we no longer return user-specified
> return value here, I am sure it is safe for TC_ACT_STOLEN, but
> I am not sure if it is safe for other values, like TC_ACT_RECLASSIFY.
> 
> Jamal, is there any use case of returning !TC_ACT_STOLEN for
> ingress redirections?

I cant think of one off top of my head.
There maybe a future use case where it is not so - maybe just allow
to return the user programmed action? that value will always be
TC_ACT_STOLEN if the rule was specified via iproute2/tc.

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection
  2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
                   ` (4 preceding siblings ...)
  2018-07-24 20:06 ` [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible Paolo Abeni
@ 2018-07-25 11:53 ` Jiri Pirko
  2018-07-25 12:07   ` Paolo Abeni
  5 siblings, 1 reply; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 11:53 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Tue, Jul 24, 2018 at 10:06:38PM CEST, pabeni@redhat.com wrote:
>This series is aimed at improving the act_mirred redirect performances.
>Such action is used by OVS to represent TC S/W flows, and it's current largest
>bottle-neck is the need for a skb_clone() for each packet.
>
>The first 3 patches introduce some cleanup and safeguards to allow extending 
>tca_result - we will use it to store RCU protected redirect information - and
>introduce a clear separation between user-space accessible tcfa_action
>value and internal values accessible only by the kernel.
>Then a new tcfa_action value is introduced: TC_ACT_REINJECT, similar to
>TC_ACT_REDIRECT, but preserving the mirred semantic. Such value is not
>accessible from user-space.
>The last patch exploits the newly introduced infrastructure in the act_mirred
>action, to avoid a skb_clone, when possible.
>
>Overall this the above gives a ~10% performance improvement in forwarding tput,
>when using the TC S/W datapath.
>
>v1 -> v2:
> - preserve the rcu lock in act_bpf
> - add and use a new action value to reinject the packets, preserving the mirred
>   semantic
>
>v2 -> v3:
> - renamed to new action as TC_ACT_REINJECT
> - TC_ACT_REINJECT is not exposed to user-space
>
>Paolo Abeni (5):
>  tc/act: user space can't use TC_ACT_REDIRECT directly
>  net/sched: user-space can't set unknown tcfa_action values
>  tc/act: remove unneeded RCU lock in action callback
>  net/tc: introduce TC_ACT_REINJECT.
>  act_mirred: use TC_ACT_REINJECT when possible

Could you please send the userspace iproute2 patch and list some
examples of usage along with the next patchset version?

Also, I would like you to do selftest script to test this and have it as
a part of this patchset.

Thanks!


>
> include/net/act_api.h        |  2 +-
> include/net/pkt_cls.h        |  3 +++
> include/net/sch_generic.h    | 21 ++++++++++++++++++++
> include/uapi/linux/pkt_cls.h |  6 ++++--
> net/core/dev.c               |  6 +++++-
> net/sched/act_api.c          | 15 +++++++++++++-
> net/sched/act_csum.c         | 12 +++---------
> net/sched/act_ife.c          |  5 +----
> net/sched/act_mirred.c       | 38 ++++++++++++++++++++++++------------
> net/sched/act_sample.c       |  4 +---
> net/sched/act_skbedit.c      | 10 +++-------
> net/sched/act_skbmod.c       | 21 +++++++++-----------
> net/sched/act_tunnel_key.c   |  6 +-----
> net/sched/act_vlan.c         | 19 +++++++-----------
> 14 files changed, 98 insertions(+), 70 deletions(-)
>
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-24 20:06 ` [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
@ 2018-07-25 11:55   ` Jamal Hadi Salim
  2018-07-25 11:56   ` Jiri Pirko
  1 sibling, 0 replies; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-25 11:55 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On 24/07/18 04:06 PM, Paolo Abeni wrote:

> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
> @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>   		}
>   	}
>   
> +	if (a->tcfa_action == TC_ACT_REDIRECT) {
> +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
> +		a->tcfa_action = TC_ACT_UNSPEC;
> +	}
> +

Why not just reject the rule instead of changing the code underneath the
user?

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-24 20:06 ` [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
  2018-07-25 11:55   ` Jamal Hadi Salim
@ 2018-07-25 11:56   ` Jiri Pirko
  2018-07-25 12:54     ` Paolo Abeni
  1 sibling, 1 reply; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 11:56 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
>Only cls_bpf and act_bpf can safely use such value. If a generic
>action is configured by user space to return TC_ACT_REDIRECT,
>the usually visible behavior is passing the skb up the stack - as
>for unknown action, but, with complex configuration, more random
>results can be obtained.
>
>This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
>at action init time, making the kernel behavior more consistent.
>
>v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>---
> net/sched/act_api.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 148a89ab789b..24b5534967fe 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
> 		}
> 	}
> 
>+	if (a->tcfa_action == TC_ACT_REDIRECT) {
>+		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");

Can't you push this warning through extack?

But, wouldn't it be more appropriate to fail here? User is passing
invalid configuration....


>+		a->tcfa_action = TC_ACT_UNSPEC;
>+	}
>+
> 	return a;
> 
> err_mod:
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback
  2018-07-24 20:06 ` [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
@ 2018-07-25 11:59   ` Jamal Hadi Salim
  2018-07-25 18:24     ` Marcelo Ricardo Leitner
  2018-07-25 12:32   ` Jiri Pirko
  1 sibling, 1 reply; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-25 11:59 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

On 24/07/18 04:06 PM, Paolo Abeni wrote:
> Each lockless action currently does its own RCU locking in ->act().
> This is allows using plain RCU accessor, even if the context
> is really RCU BH.
> 
> This change drops the per action RCU lock, replace the accessors
> with _bh variant, cleans up a bit the surronding code and documents
> the RCU status in the relevant header.
> No functional nor performance change is intended.
> 
> The goal of this patch is clarifying that the RCU critical section
> used by the tc actions extends up to the classifier's caller.
> 

This and 2/5 seems to stand on their own merit.

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection
  2018-07-25 11:53 ` [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Jiri Pirko
@ 2018-07-25 12:07   ` Paolo Abeni
  2018-07-25 12:17     ` Jiri Pirko
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 12:07 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Hi,

On Wed, 2018-07-25 at 13:53 +0200, Jiri Pirko wrote:
> Could you please send the userspace iproute2 patch and list some
> examples of usage along with the next patchset version?

There are no iproute2 patches, as TC_ACT_REINJECT is not accessible
from user-space: act_mirred will use such value when possible.

User-space can have any notion of the above only via the perf tool
and/or performances tests ;)

An use case is the OVS TC S/W datapath. openvswitch alreday leverage
this when configured with:

other_config:hw-offload=true
other_config:tc-policy=none

> Also, I would like you to do selftest script to test this and have it as
> a part of this patchset.

As said, there is no user-space change. AFAICS the current act_mirred
self tests already cover this. Do you have other things in mind?

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-24 20:06 ` [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT Paolo Abeni
  2018-07-24 20:38   ` Cong Wang
@ 2018-07-25 12:16   ` Jamal Hadi Salim
  2018-07-25 12:59     ` Jiri Pirko
  2018-07-25 13:55     ` Paolo Abeni
  2018-07-25 12:57   ` Jiri Pirko
  2 siblings, 2 replies; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-25 12:16 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller, Shmulik Ladkani

+Cc Shmulik

Paolo - please also run the tdc tests (and add anymore if you
feel they dont do coverage to your changes)

On 24/07/18 04:06 PM, Paolo Abeni wrote:
> This is similar TC_ACT_REDIRECT, but with a slightly different
> semantic:
> - on ingress the mirred skbs are passed to the target device
> network stack without any additional check not scrubbing.
> - the rcu-protected stats provided via the tcf_result struct
>    are updated on error conditions.
> 
> This new tcfa_action value is not exposed to the user-space
> and can be used only internally by clsact.
> 
> v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
>   a new action type instead
> 
> v2 -> v3:
>   - rename the new action value TC_ACT_REINJECT, update the
>     helper accordingly
>   - take care of uncloned reinjected packets in XDP generic
>     hook
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>   include/net/pkt_cls.h     |  3 +++
>   include/net/sch_generic.h | 19 +++++++++++++++++++
>   net/core/dev.c            |  6 +++++-
>   3 files changed, 27 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 2081e4219f81..36ccfe2a303a 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -7,6 +7,9 @@
>   #include <net/sch_generic.h>
>   #include <net/act_api.h>
>   
> +/* TC action not accessible from user space */
> +#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)

Lets say in the future we add a new opcode.
Will old kernel, new iproute2 (new value) work?
Maybe use a negative number below -1.
I am honestly still unclear about introducing this new
code.
Could you not use the result code to carry sufficient
info to indicate the intent?

cheers,
jamal

> +
>   /* Basic packet classifier frontend definitions. */
>   
>   struct tcf_walker {
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 056dc1083aa3..95e81a70f549 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -235,6 +235,12 @@ struct tcf_result {
>   			u32		classid;
>   		};
>   		const struct tcf_proto *goto_tp;
> +
> +		/* used by the TC_ACT_REINJECT action */
> +		struct {
> +			bool		ingress;
> +			struct gnet_stats_queue *qstats;
> +		};
>   	};
>   };
>   
> @@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>   void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
>   			  struct mini_Qdisc __rcu **p_miniq);
>   
> +static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
> +{
> +	struct gnet_stats_queue *stats = res->qstats;
> +	int ret;
> +
> +	if (res->ingress)
> +		ret = netif_receive_skb(skb);
> +	else
> +		ret = dev_queue_xmit(skb);
> +	if (ret && stats)
> +		qstats_overlimit_inc(res->qstats);
> +}
> +
>   #endif
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 14a748ee8cc9..826ec74fe1d9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>   	/* Reinjected packets coming from act_mirred or similar should
>   	 * not get XDP generic processing.
>   	 */
> -	if (skb_cloned(skb))
> +	if (skb_cloned(skb) || skb->tc_redirected)
>   		return XDP_PASS;
>   
>   	/* XDP packets must be linear and must have sufficient headroom
> @@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>   		__skb_push(skb, skb->mac_len);
>   		skb_do_redirect(skb);
>   		return NULL;
> +	case TC_ACT_REINJECT:
> +		/* this does not scrub the packet, and updates stats on error */
> +		skb_tc_reinject(skb, &cl_res);
> +		return NULL;
>   	default:
>   		break;
>   	}
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection
  2018-07-25 12:07   ` Paolo Abeni
@ 2018-07-25 12:17     ` Jiri Pirko
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 12:17 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Wed, Jul 25, 2018 at 02:07:15PM CEST, pabeni@redhat.com wrote:
>Hi,
>
>On Wed, 2018-07-25 at 13:53 +0200, Jiri Pirko wrote:
>> Could you please send the userspace iproute2 patch and list some
>> examples of usage along with the next patchset version?
>
>There are no iproute2 patches, as TC_ACT_REINJECT is not accessible
>from user-space: act_mirred will use such value when possible.

Okay.

>
>User-space can have any notion of the above only via the perf tool
>and/or performances tests ;)
>
>An use case is the OVS TC S/W datapath. openvswitch alreday leverage
>this when configured with:
>
>other_config:hw-offload=true
>other_config:tc-policy=none
>
>> Also, I would like you to do selftest script to test this and have it as
>> a part of this patchset.
>
>As said, there is no user-space change. AFAICS the current act_mirred
>self tests already cover this. Do you have other things in mind?

Sounds allright. Thanks for the explanation!


>
>Cheers,
>
>Paolo
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 2/5] net/sched: user-space can't set unknown tcfa_action values
  2018-07-24 20:06 ` [PATCH net-next v3 2/5] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
@ 2018-07-25 12:26   ` Jiri Pirko
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 12:26 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Tue, Jul 24, 2018 at 10:06:40PM CEST, pabeni@redhat.com wrote:
>Currently, when initializing an action, the user-space can specify
>and use arbitrary values for the tcfa_action field. If the value
>is unknown by the kernel, is implicitly threaded as TC_ACT_UNSPEC.
>
>This change explicitly checks for unknown values at action creation
>time, and explicitly convert them to TC_ACT_UNSPEC. No functional
>changes are introduced, but this will allow introducing tcfa_action
>values not exposed to user-space in a later patch.
>
>Note: we can't use the above to hide TC_ACT_REDIRECT from user-space,
>as the latter is already part of uAPI.
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>---
> include/uapi/linux/pkt_cls.h |  6 ++++--
> net/sched/act_api.c          | 10 +++++++++-
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index c4262d911596..c8a24861d4c8 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -45,6 +45,7 @@ enum {
> 				   * the skb and act like everything
> 				   * is alright.
> 				   */
>+#define TC_ACT_VALUE_MAX	TC_ACT_TRAP
> 
> /* There is a special kind of actions called "extended actions",
>  * which need a value parameter. These have a local opcode located in
>@@ -55,11 +56,12 @@ enum {
> #define __TC_ACT_EXT_SHIFT 28
> #define __TC_ACT_EXT(local) ((local) << __TC_ACT_EXT_SHIFT)
> #define TC_ACT_EXT_VAL_MASK ((1 << __TC_ACT_EXT_SHIFT) - 1)
>-#define TC_ACT_EXT_CMP(combined, opcode) \
>-	(((combined) & (~TC_ACT_EXT_VAL_MASK)) == opcode)
>+#define TC_ACT_EXT_OPCODE(combined) ((combined) & (~TC_ACT_EXT_VAL_MASK))
>+#define TC_ACT_EXT_CMP(combined, opcode) (TC_ACT_EXT_OPCODE(combined) == opcode)
> 
> #define TC_ACT_JUMP __TC_ACT_EXT(1)
> #define TC_ACT_GOTO_CHAIN __TC_ACT_EXT(2)
>+#define TC_ACT_EXT_OPCODE_MAX	TC_ACT_GOTO_CHAIN
> 
> /* Action type identifiers*/
> enum {
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 24b5534967fe..5044f4809b37 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -798,6 +798,7 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
> 	char act_name[IFNAMSIZ];
> 	struct nlattr *tb[TCA_ACT_MAX + 1];
> 	struct nlattr *kind;
>+	int opcode;
> 	int err;
> 
> 	if (name == NULL) {
>@@ -884,7 +885,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
> 	if (err != ACT_P_CREATED)
> 		module_put(a_o->owner);
> 
>-	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) {
>+	opcode = TC_ACT_EXT_OPCODE(a->tcfa_action);
>+	if (opcode == TC_ACT_GOTO_CHAIN) {
> 		err = tcf_action_goto_chain_init(a, tp);
> 		if (err) {
> 			struct tc_action *actions[] = { a, NULL };
>@@ -898,6 +900,12 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
> 	if (a->tcfa_action == TC_ACT_REDIRECT) {
> 		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
> 		a->tcfa_action = TC_ACT_UNSPEC;
>+	} else if ((!opcode && a->tcfa_action > TC_ACT_VALUE_MAX) ||
>+		   (opcode && opcode > TC_ACT_EXT_OPCODE_MAX &&
>+		    a->tcfa_action != TC_ACT_UNSPEC)) {
>+		net_warn_ratelimited("invalid %d action value",
>+				     a->tcfa_action);
>+		a->tcfa_action = TC_ACT_UNSPEC;

Maybe this could be a separate helper function?

Also, the warn might go along with extack to user too.

Otherwise, this looks fine to me.


> 	}
> 
> 	return a;
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25  8:29       ` Paolo Abeni
@ 2018-07-25 12:27         ` Jamal Hadi Salim
  2018-07-25 14:24           ` Paolo Abeni
  2018-07-25 16:48           ` Cong Wang
  0 siblings, 2 replies; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-25 12:27 UTC (permalink / raw)
  To: Paolo Abeni, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David Miller

On 25/07/18 04:29 AM, Paolo Abeni wrote:
> On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:

[..]
>>> I fail to understand why overlimit is increased in your case
>>> here. I guess you want to increase 'drops' instead.
>>
>> Hmm, actually the current mirred code increases overlimit too.
>> But I still don't think it makes sense.
> 
> Yep, I chose to increment 'overlimits' to preserve the current mirred
> semantic.
> 
> AFAICS, that was first introduced with:
> 
> commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
> Author: Jamal Hadi Salim <jhs@mojatatu.com>
> Date:   Mon Aug 15 05:25:40 2011 +0000
> 
>      net_sched: fix port mirror/redirect stats reporting
> 
> Likely increasing 'drops' would be "better", but I'm unsure we can
> change this established behavior without affecting some user.
> 

Those changes were there from the beginning (above patch did
not introduce them).
IIRC, the reason was to distinguish between policy intended
drops and drops because of errors.

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback
  2018-07-24 20:06 ` [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
  2018-07-25 11:59   ` Jamal Hadi Salim
@ 2018-07-25 12:32   ` Jiri Pirko
  1 sibling, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 12:32 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Tue, Jul 24, 2018 at 10:06:41PM CEST, pabeni@redhat.com wrote:
>Each lockless action currently does its own RCU locking in ->act().
>This is allows using plain RCU accessor, even if the context
>is really RCU BH.
>
>This change drops the per action RCU lock, replace the accessors
>with _bh variant, cleans up a bit the surronding code and documents

s/surronding/surrounding/


>the RCU status in the relevant header.
>No functional nor performance change is intended.
>
>The goal of this patch is clarifying that the RCU critical section
>used by the tc actions extends up to the classifier's caller.
>
>v1 -> v2:
> - preserve rcu lock in act_bpf: it's needed by eBPF helpers,
>   as pointed out by Daniel
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-25 11:56   ` Jiri Pirko
@ 2018-07-25 12:54     ` Paolo Abeni
  2018-07-25 13:03       ` Jiri Pirko
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 12:54 UTC (permalink / raw)
  To: Jiri Pirko, Jamal Hadi Salim
  Cc: netdev, Cong Wang, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller

Hi,

On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
> Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
> > Only cls_bpf and act_bpf can safely use such value. If a generic
> > action is configured by user space to return TC_ACT_REDIRECT,
> > the usually visible behavior is passing the skb up the stack - as
> > for unknown action, but, with complex configuration, more random
> > results can be obtained.
> > 
> > This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
> > at action init time, making the kernel behavior more consistent.
> > 
> > v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > net/sched/act_api.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> > 
> > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> > index 148a89ab789b..24b5534967fe 100644
> > --- a/net/sched/act_api.c
> > +++ b/net/sched/act_api.c
> > @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
> > 		}
> > 	}
> > 
> > +	if (a->tcfa_action == TC_ACT_REDIRECT) {
> > +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
> 
> Can't you push this warning through extack?
> 
> But, wouldn't it be more appropriate to fail here? User is passing
> invalid configuration....

Jiri, Jamal, thank you for the feedback.

Please allow me to answer both of you here, since you raised similar
concers.

I thought about rejecting the action, but that change of behavior could
break some users, as currently most kind of invalid tcfa_action values
are simply accepted.

If there is consensus about it, I can simply fail.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-24 20:06 ` [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT Paolo Abeni
  2018-07-24 20:38   ` Cong Wang
  2018-07-25 12:16   ` Jamal Hadi Salim
@ 2018-07-25 12:57   ` Jiri Pirko
  2 siblings, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 12:57 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Tue, Jul 24, 2018 at 10:06:42PM CEST, pabeni@redhat.com wrote:
>This is similar TC_ACT_REDIRECT, but with a slightly different
>semantic:
>- on ingress the mirred skbs are passed to the target device
>network stack without any additional check not scrubbing.
>- the rcu-protected stats provided via the tcf_result struct
>  are updated on error conditions.
>
>This new tcfa_action value is not exposed to the user-space
>and can be used only internally by clsact.
>
>v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> a new action type instead
>
>v2 -> v3:
> - rename the new action value TC_ACT_REINJECT, update the
>   helper accordingly
> - take care of uncloned reinjected packets in XDP generic
>   hook
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>---
> include/net/pkt_cls.h     |  3 +++
> include/net/sch_generic.h | 19 +++++++++++++++++++
> net/core/dev.c            |  6 +++++-
> 3 files changed, 27 insertions(+), 1 deletion(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 2081e4219f81..36ccfe2a303a 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -7,6 +7,9 @@
> #include <net/sch_generic.h>
> #include <net/act_api.h>
> 
>+/* TC action not accessible from user space */
>+#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
>+
> /* Basic packet classifier frontend definitions. */
> 
> struct tcf_walker {
>diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>index 056dc1083aa3..95e81a70f549 100644
>--- a/include/net/sch_generic.h
>+++ b/include/net/sch_generic.h
>@@ -235,6 +235,12 @@ struct tcf_result {
> 			u32		classid;
> 		};
> 		const struct tcf_proto *goto_tp;
>+
>+		/* used by the TC_ACT_REINJECT action */
>+		struct {
>+			bool		ingress;
>+			struct gnet_stats_queue *qstats;
>+		};
> 	};
> };
> 
>@@ -1091,4 +1097,17 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
> 			  struct mini_Qdisc __rcu **p_miniq);
> 
>+static inline void skb_tc_reinject(struct sk_buff *skb, struct tcf_result *res)
>+{
>+	struct gnet_stats_queue *stats = res->qstats;
>+	int ret;
>+
>+	if (res->ingress)
>+		ret = netif_receive_skb(skb);
>+	else
>+		ret = dev_queue_xmit(skb);

Hmm. "reinject" by the name tells me that the packet should be injected
again. By "inject", I understand beginning of the rx path. However, this
does xmit as well :/ It is a bit misleading. Maybe "reinsert" would
sound better?



>+	if (ret && stats)
>+		qstats_overlimit_inc(res->qstats);
>+}
>+
> #endif
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 14a748ee8cc9..826ec74fe1d9 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -4252,7 +4252,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> 	/* Reinjected packets coming from act_mirred or similar should
> 	 * not get XDP generic processing.
> 	 */
>-	if (skb_cloned(skb))
>+	if (skb_cloned(skb) || skb->tc_redirected)
> 		return XDP_PASS;
> 
> 	/* XDP packets must be linear and must have sufficient headroom
>@@ -4602,6 +4602,10 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> 		__skb_push(skb, skb->mac_len);
> 		skb_do_redirect(skb);
> 		return NULL;
>+	case TC_ACT_REINJECT:
>+		/* this does not scrub the packet, and updates stats on error */
>+		skb_tc_reinject(skb, &cl_res);
>+		return NULL;
> 	default:
> 		break;
> 	}
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 12:16   ` Jamal Hadi Salim
@ 2018-07-25 12:59     ` Jiri Pirko
  2018-07-25 13:55     ` Paolo Abeni
  1 sibling, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 12:59 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Paolo Abeni, netdev, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller,
	Shmulik Ladkani

Wed, Jul 25, 2018 at 02:16:12PM CEST, jhs@mojatatu.com wrote:

[...]

>> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>> index 2081e4219f81..36ccfe2a303a 100644
>> --- a/include/net/pkt_cls.h
>> +++ b/include/net/pkt_cls.h
>> @@ -7,6 +7,9 @@
>>   #include <net/sch_generic.h>
>>   #include <net/act_api.h>
>> +/* TC action not accessible from user space */
>> +#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
>
>Lets say in the future we add a new opcode.
>Will old kernel, new iproute2 (new value) work?

This is safe. See patch #2.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-25 12:54     ` Paolo Abeni
@ 2018-07-25 13:03       ` Jiri Pirko
  2018-07-25 15:48         ` Paolo Abeni
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 13:03 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jamal Hadi Salim, netdev, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Wed, Jul 25, 2018 at 02:54:04PM CEST, pabeni@redhat.com wrote:
>Hi,
>
>On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
>> Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
>> > Only cls_bpf and act_bpf can safely use such value. If a generic
>> > action is configured by user space to return TC_ACT_REDIRECT,
>> > the usually visible behavior is passing the skb up the stack - as
>> > for unknown action, but, with complex configuration, more random
>> > results can be obtained.
>> > 
>> > This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
>> > at action init time, making the kernel behavior more consistent.
>> > 
>> > v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
>> > 
>> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> > ---
>> > net/sched/act_api.c | 5 +++++
>> > 1 file changed, 5 insertions(+)
>> > 
>> > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>> > index 148a89ab789b..24b5534967fe 100644
>> > --- a/net/sched/act_api.c
>> > +++ b/net/sched/act_api.c
>> > @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>> > 		}
>> > 	}
>> > 
>> > +	if (a->tcfa_action == TC_ACT_REDIRECT) {
>> > +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
>> 
>> Can't you push this warning through extack?
>> 
>> But, wouldn't it be more appropriate to fail here? User is passing
>> invalid configuration....
>
>Jiri, Jamal, thank you for the feedback.
>
>Please allow me to answer both of you here, since you raised similar
>concers.
>
>I thought about rejecting the action, but that change of behavior could
>break some users, as currently most kind of invalid tcfa_action values
>are simply accepted.
>
>If there is consensus about it, I can simply fail.

Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
really has no meaning for anyone to use it throughout its whole history.
I would vote for "fail", yet I admit that I am usually alone in opinion
about similar uapi changes :)



>
>Thanks,
>
>Paolo
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-25 10:14     ` Paolo Abeni
@ 2018-07-25 13:30       ` Jiri Pirko
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 13:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Cong Wang, Linux Kernel Network Developers, Jamal Hadi Salim,
	Daniel Borkmann, Marcelo Ricardo Leitner, Eyal Birger,
	David Miller

Wed, Jul 25, 2018 at 12:14:32PM CEST, pabeni@redhat.com wrote:
>On Tue, 2018-07-24 at 14:15 -0700, Cong Wang wrote:
>> On Tue, Jul 24, 2018 at 1:07 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> > +
>> > +               /* let's the caller reinject the packet, if possible */
>> > +               if (skb_at_tc_ingress(skb)) {
>> > +                       res->ingress = want_ingress;
>> > +                       res->qstats = this_cpu_ptr(m->common.cpu_qstats);
>> > +                       return TC_ACT_REINJECT;
>> > +               }
>> 
>> Looks good to me, but here we no longer return user-specified
>> return value here, I am sure it is safe for TC_ACT_STOLEN, but
>> I am not sure if it is safe for other values, like TC_ACT_RECLASSIFY.
>
>I can make it safer, using the no clone path only if tcf_action is
>TC_ACT_STOLEN. That will still cover the relevant use-cases.

That is a good idea. Thanks!


>
>Will do that in v4.
>
>Cheers,
>
>Paolo
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-24 20:06 ` [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible Paolo Abeni
  2018-07-24 21:15   ` Cong Wang
@ 2018-07-25 13:52   ` Jiri Pirko
  2018-07-25 14:04     ` Paolo Abeni
  1 sibling, 1 reply; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 13:52 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Tue, Jul 24, 2018 at 10:06:43PM CEST, pabeni@redhat.com wrote:
>When mirred is invoked from the ingress path, and it wants to redirect
>the processed packet, it can now use the TC_ACT_REINJECT action,
>filling the tcf_result accordingly, and avoiding a per packet
>skb_clone().
>
>Overall this gives a ~10% improvement in forwarding performance for the
>TC S/W data path and TC S/W performances are now comparable to the
>kernel openvswitch datapath.
>
>v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT
>v2 -> v3: updated after action rename, fixed a typo into the commit
>	message
>
>Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>---
> net/sched/act_mirred.c | 34 ++++++++++++++++++++++++----------
> 1 file changed, 24 insertions(+), 10 deletions(-)
>
>diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>index eeb335f03102..368187312136 100644
>--- a/net/sched/act_mirred.c
>+++ b/net/sched/act_mirred.c
>@@ -25,6 +25,7 @@
> #include <net/net_namespace.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
>+#include <net/pkt_cls.h>
> #include <linux/tc_act/tc_mirred.h>
> #include <net/tc_act/tc_mirred.h>
> 
>@@ -171,10 +172,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
> 		      struct tcf_result *res)
> {
> 	struct tcf_mirred *m = to_mirred(a);
>+	struct sk_buff *skb2 = skb;
> 	bool m_mac_header_xmit;
> 	struct net_device *dev;
>-	struct sk_buff *skb2;
> 	int retval, err = 0;
>+	bool want_ingress;
>+	bool is_redirect;
> 	int m_eaction;
> 	int mac_len;
> 
>@@ -196,16 +199,19 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
> 		goto out;
> 	}
> 
>-	skb2 = skb_clone(skb, GFP_ATOMIC);
>-	if (!skb2)
>-		goto out;
>+	is_redirect = tcf_mirred_is_act_redirect(m_eaction);
>+	if (!skb_at_tc_ingress(skb) || !is_redirect) {
>+		skb2 = skb_clone(skb, GFP_ATOMIC);
>+		if (!skb2)
>+			goto out;
>+	}
> 
> 	/* If action's target direction differs than filter's direction,
> 	 * and devices expect a mac header on xmit, then mac push/pull is
> 	 * needed.
> 	 */
>-	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
>-	    m_mac_header_xmit) {
>+	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
>+	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
> 		if (!skb_at_tc_ingress(skb)) {
> 			/* caught at egress, act ingress: pull mac */
> 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
>@@ -216,15 +222,23 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
> 		}
> 	}
> 
>+	skb2->skb_iif = skb->dev->ifindex;
>+	skb2->dev = dev;
>+
> 	/* mirror is always swallowed */
>-	if (tcf_mirred_is_act_redirect(m_eaction)) {
>+	if (is_redirect) {
> 		skb2->tc_redirected = 1;
> 		skb2->tc_from_ingress = skb2->tc_at_ingress;
>+
>+		/* let's the caller reinject the packet, if possible */
>+		if (skb_at_tc_ingress(skb)) {

I probably missed something. Why only on ingress?

The patch looks good to me.


>+			res->ingress = want_ingress;
>+			res->qstats = this_cpu_ptr(m->common.cpu_qstats);
>+			return TC_ACT_REINJECT;
>+		}
> 	}
> 
>-	skb2->skb_iif = skb->dev->ifindex;
>-	skb2->dev = dev;
>-	if (!tcf_mirred_act_wants_ingress(m_eaction))
>+	if (!want_ingress)
> 		err = dev_queue_xmit(skb2);
> 	else
> 		err = netif_receive_skb(skb2);
>-- 
>2.17.1
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 12:16   ` Jamal Hadi Salim
  2018-07-25 12:59     ` Jiri Pirko
@ 2018-07-25 13:55     ` Paolo Abeni
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 13:55 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Cong Wang, Jiri Pirko, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller, Shmulik Ladkani

Hi,

On Wed, 2018-07-25 at 08:16 -0400, Jamal Hadi Salim wrote:
> +Cc Shmulik
> 
> Paolo - please also run the tdc tests (and add anymore if you
> feel they dont do coverage to your changes)

I run successfully tdc tests on a patched before posting. I plan to
rerun them before posting the v4.

> On 24/07/18 04:06 PM, Paolo Abeni wrote:
> > This is similar TC_ACT_REDIRECT, but with a slightly different
> > semantic:
> > - on ingress the mirred skbs are passed to the target device
> > network stack without any additional check not scrubbing.
> > - the rcu-protected stats provided via the tcf_result struct
> >    are updated on error conditions.
> > 
> > This new tcfa_action value is not exposed to the user-space
> > and can be used only internally by clsact.
> > 
> > v1 -> v2: do not touch TC_ACT_REDIRECT code path, introduce
> >   a new action type instead
> > 
> > v2 -> v3:
> >   - rename the new action value TC_ACT_REINJECT, update the
> >     helper accordingly
> >   - take care of uncloned reinjected packets in XDP generic
> >     hook
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >   include/net/pkt_cls.h     |  3 +++
> >   include/net/sch_generic.h | 19 +++++++++++++++++++
> >   net/core/dev.c            |  6 +++++-
> >   3 files changed, 27 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> > index 2081e4219f81..36ccfe2a303a 100644
> > --- a/include/net/pkt_cls.h
> > +++ b/include/net/pkt_cls.h
> > @@ -7,6 +7,9 @@
> >   #include <net/sch_generic.h>
> >   #include <net/act_api.h>
> >   
> > +/* TC action not accessible from user space */
> > +#define TC_ACT_REINJECT		(TC_ACT_VALUE_MAX + 1)
> 
> Lets say in the future we add a new opcode.
> Will old kernel, new iproute2 (new value) work?

It will works as it currently does in similar situation: opcode unknown
to the kernel are treated as TC_ACT_UNSPEC even now. 

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-25 13:52   ` Jiri Pirko
@ 2018-07-25 14:04     ` Paolo Abeni
  2018-07-25 14:30       ` Jiri Pirko
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 14:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

On Wed, 2018-07-25 at 15:52 +0200, Jiri Pirko wrote:
> Tue, Jul 24, 2018 at 10:06:43PM CEST, pabeni@redhat.com wrote:
> > When mirred is invoked from the ingress path, and it wants to redirect
> > the processed packet, it can now use the TC_ACT_REINJECT action,
> > filling the tcf_result accordingly, and avoiding a per packet
> > skb_clone().
> > 
> > Overall this gives a ~10% improvement in forwarding performance for the
> > TC S/W data path and TC S/W performances are now comparable to the
> > kernel openvswitch datapath.
> > 
> > v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT
> > v2 -> v3: updated after action rename, fixed a typo into the commit
> > 	message
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > net/sched/act_mirred.c | 34 ++++++++++++++++++++++++----------
> > 1 file changed, 24 insertions(+), 10 deletions(-)
> > 
> > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > index eeb335f03102..368187312136 100644
> > --- a/net/sched/act_mirred.c
> > +++ b/net/sched/act_mirred.c
> > @@ -25,6 +25,7 @@
> > #include <net/net_namespace.h>
> > #include <net/netlink.h>
> > #include <net/pkt_sched.h>
> > +#include <net/pkt_cls.h>
> > #include <linux/tc_act/tc_mirred.h>
> > #include <net/tc_act/tc_mirred.h>
> > 
> > @@ -171,10 +172,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
> > 		      struct tcf_result *res)
> > {
> > 	struct tcf_mirred *m = to_mirred(a);
> > +	struct sk_buff *skb2 = skb;
> > 	bool m_mac_header_xmit;
> > 	struct net_device *dev;
> > -	struct sk_buff *skb2;
> > 	int retval, err = 0;
> > +	bool want_ingress;
> > +	bool is_redirect;
> > 	int m_eaction;
> > 	int mac_len;
> > 
> > @@ -196,16 +199,19 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
> > 		goto out;
> > 	}
> > 
> > -	skb2 = skb_clone(skb, GFP_ATOMIC);
> > -	if (!skb2)
> > -		goto out;
> > +	is_redirect = tcf_mirred_is_act_redirect(m_eaction);
> > +	if (!skb_at_tc_ingress(skb) || !is_redirect) {
> > +		skb2 = skb_clone(skb, GFP_ATOMIC);
> > +		if (!skb2)
> > +			goto out;
> > +	}
> > 
> > 	/* If action's target direction differs than filter's direction,
> > 	 * and devices expect a mac header on xmit, then mac push/pull is
> > 	 * needed.
> > 	 */
> > -	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
> > -	    m_mac_header_xmit) {
> > +	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
> > +	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
> > 		if (!skb_at_tc_ingress(skb)) {
> > 			/* caught at egress, act ingress: pull mac */
> > 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
> > @@ -216,15 +222,23 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
> > 		}
> > 	}
> > 
> > +	skb2->skb_iif = skb->dev->ifindex;
> > +	skb2->dev = dev;
> > +
> > 	/* mirror is always swallowed */
> > -	if (tcf_mirred_is_act_redirect(m_eaction)) {
> > +	if (is_redirect) {
> > 		skb2->tc_redirected = 1;
> > 		skb2->tc_from_ingress = skb2->tc_at_ingress;
> > +
> > +		/* let's the caller reinject the packet, if possible */
> > +		if (skb_at_tc_ingress(skb)) {
> 
> I probably missed something. Why only on ingress?

To keep the implementation as simple as possible: if I read correctly,
it is impossible for a filter detect if called by the clsact or the dev
root qdisc, and I think we could safely avoid the skb clone with a not
invasive patch, only if called from the clsact.

[please let me know if the above is somewhat clear ;)]

Also this covers nicely the relevant use case (TC S/W datapath).

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 12:27         ` Jamal Hadi Salim
@ 2018-07-25 14:24           ` Paolo Abeni
  2018-07-25 15:26             ` Jamal Hadi Salim
  2018-07-25 16:48           ` Cong Wang
  1 sibling, 1 reply; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 14:24 UTC (permalink / raw)
  To: Jamal Hadi Salim, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David Miller

On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:
> On 25/07/18 04:29 AM, Paolo Abeni wrote:
> > On Tue, 2018-07-24 at 13:50 -0700, Cong Wang wrote:
> 
> [..]
> > > > I fail to understand why overlimit is increased in your case
> > > > here. I guess you want to increase 'drops' instead.
> > > 
> > > Hmm, actually the current mirred code increases overlimit too.
> > > But I still don't think it makes sense.
> > 
> > Yep, I chose to increment 'overlimits' to preserve the current mirred
> > semantic.
> > 
> > AFAICS, that was first introduced with:
> > 
> > commit 8919bc13e8d92c5b082c5c0321567383a071f5bc
> > Author: Jamal Hadi Salim <jhs@mojatatu.com>
> > Date:   Mon Aug 15 05:25:40 2011 +0000
> > 
> >      net_sched: fix port mirror/redirect stats reporting
> > 
> > Likely increasing 'drops' would be "better", but I'm unsure we can
> > change this established behavior without affecting some user.
> > 
> 
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

Double-checking to avoid misinterepration on my side: you are ok with
keeping the 'overlimits' increment, right?

Thanks,

Paolo 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible
  2018-07-25 14:04     ` Paolo Abeni
@ 2018-07-25 14:30       ` Jiri Pirko
  0 siblings, 0 replies; 43+ messages in thread
From: Jiri Pirko @ 2018-07-25 14:30 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

Wed, Jul 25, 2018 at 04:04:03PM CEST, pabeni@redhat.com wrote:
>On Wed, 2018-07-25 at 15:52 +0200, Jiri Pirko wrote:
>> Tue, Jul 24, 2018 at 10:06:43PM CEST, pabeni@redhat.com wrote:
>> > When mirred is invoked from the ingress path, and it wants to redirect
>> > the processed packet, it can now use the TC_ACT_REINJECT action,
>> > filling the tcf_result accordingly, and avoiding a per packet
>> > skb_clone().
>> > 
>> > Overall this gives a ~10% improvement in forwarding performance for the
>> > TC S/W data path and TC S/W performances are now comparable to the
>> > kernel openvswitch datapath.
>> > 
>> > v1 -> v2: use ACT_MIRRED instead of ACT_REDIRECT
>> > v2 -> v3: updated after action rename, fixed a typo into the commit
>> > 	message
>> > 
>> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> > ---
>> > net/sched/act_mirred.c | 34 ++++++++++++++++++++++++----------
>> > 1 file changed, 24 insertions(+), 10 deletions(-)
>> > 
>> > diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>> > index eeb335f03102..368187312136 100644
>> > --- a/net/sched/act_mirred.c
>> > +++ b/net/sched/act_mirred.c
>> > @@ -25,6 +25,7 @@
>> > #include <net/net_namespace.h>
>> > #include <net/netlink.h>
>> > #include <net/pkt_sched.h>
>> > +#include <net/pkt_cls.h>
>> > #include <linux/tc_act/tc_mirred.h>
>> > #include <net/tc_act/tc_mirred.h>
>> > 
>> > @@ -171,10 +172,12 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
>> > 		      struct tcf_result *res)
>> > {
>> > 	struct tcf_mirred *m = to_mirred(a);
>> > +	struct sk_buff *skb2 = skb;
>> > 	bool m_mac_header_xmit;
>> > 	struct net_device *dev;
>> > -	struct sk_buff *skb2;
>> > 	int retval, err = 0;
>> > +	bool want_ingress;
>> > +	bool is_redirect;
>> > 	int m_eaction;
>> > 	int mac_len;
>> > 
>> > @@ -196,16 +199,19 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
>> > 		goto out;
>> > 	}
>> > 
>> > -	skb2 = skb_clone(skb, GFP_ATOMIC);
>> > -	if (!skb2)
>> > -		goto out;
>> > +	is_redirect = tcf_mirred_is_act_redirect(m_eaction);
>> > +	if (!skb_at_tc_ingress(skb) || !is_redirect) {
>> > +		skb2 = skb_clone(skb, GFP_ATOMIC);
>> > +		if (!skb2)
>> > +			goto out;
>> > +	}
>> > 
>> > 	/* If action's target direction differs than filter's direction,
>> > 	 * and devices expect a mac header on xmit, then mac push/pull is
>> > 	 * needed.
>> > 	 */
>> > -	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
>> > -	    m_mac_header_xmit) {
>> > +	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
>> > +	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
>> > 		if (!skb_at_tc_ingress(skb)) {
>> > 			/* caught at egress, act ingress: pull mac */
>> > 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
>> > @@ -216,15 +222,23 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
>> > 		}
>> > 	}
>> > 
>> > +	skb2->skb_iif = skb->dev->ifindex;
>> > +	skb2->dev = dev;
>> > +
>> > 	/* mirror is always swallowed */
>> > -	if (tcf_mirred_is_act_redirect(m_eaction)) {
>> > +	if (is_redirect) {
>> > 		skb2->tc_redirected = 1;
>> > 		skb2->tc_from_ingress = skb2->tc_at_ingress;
>> > +
>> > +		/* let's the caller reinject the packet, if possible */
>> > +		if (skb_at_tc_ingress(skb)) {
>> 
>> I probably missed something. Why only on ingress?
>
>To keep the implementation as simple as possible: if I read correctly,
>it is impossible for a filter detect if called by the clsact or the dev
>root qdisc, and I think we could safely avoid the skb clone with a not
>invasive patch, only if called from the clsact.
>
>[please let me know if the above is somewhat clear ;)]
>
>Also this covers nicely the relevant use case (TC S/W datapath).

Sure. I was just curious. Perhaps put a comment to this optimisation
describing why it is not possible for egress. It might help future
readers.

Thanks!

>
>Thanks,
>
>Paolo
>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 14:24           ` Paolo Abeni
@ 2018-07-25 15:26             ` Jamal Hadi Salim
  0 siblings, 0 replies; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-25 15:26 UTC (permalink / raw)
  To: Paolo Abeni, Cong Wang
  Cc: Linux Kernel Network Developers, Jiri Pirko, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David Miller

On 25/07/18 10:24 AM, Paolo Abeni wrote:
> On Wed, 2018-07-25 at 08:27 -0400, Jamal Hadi Salim wrote:

>> Those changes were there from the beginning (above patch did
>> not introduce them).
>> IIRC, the reason was to distinguish between policy intended
>> drops and drops because of errors.
> 
> Double-checking to avoid misinterepration on my side: you are ok with
> keeping the 'overlimits' increment, right?

Yes.

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-25 13:03       ` Jiri Pirko
@ 2018-07-25 15:48         ` Paolo Abeni
  2018-07-25 16:29           ` Paolo Abeni
  2018-07-25 16:29           ` Daniel Borkmann
  0 siblings, 2 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 15:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, netdev, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
> Wed, Jul 25, 2018 at 02:54:04PM CEST, pabeni@redhat.com wrote:
> > Hi,
> > 
> > On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
> > > Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
> > > > Only cls_bpf and act_bpf can safely use such value. If a generic
> > > > action is configured by user space to return TC_ACT_REDIRECT,
> > > > the usually visible behavior is passing the skb up the stack - as
> > > > for unknown action, but, with complex configuration, more random
> > > > results can be obtained.
> > > > 
> > > > This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
> > > > at action init time, making the kernel behavior more consistent.
> > > > 
> > > > v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
> > > > 
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > > net/sched/act_api.c | 5 +++++
> > > > 1 file changed, 5 insertions(+)
> > > > 
> > > > diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> > > > index 148a89ab789b..24b5534967fe 100644
> > > > --- a/net/sched/act_api.c
> > > > +++ b/net/sched/act_api.c
> > > > @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
> > > > 		}
> > > > 	}
> > > > 
> > > > +	if (a->tcfa_action == TC_ACT_REDIRECT) {
> > > > +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
> > > 
> > > Can't you push this warning through extack?
> > > 
> > > But, wouldn't it be more appropriate to fail here? User is passing
> > > invalid configuration....
> > 
> > Jiri, Jamal, thank you for the feedback.
> > 
> > Please allow me to answer both of you here, since you raised similar
> > concers.
> > 
> > I thought about rejecting the action, but that change of behavior could
> > break some users, as currently most kind of invalid tcfa_action values
> > are simply accepted.
> > 
> > If there is consensus about it, I can simply fail.
> 
> Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
> really has no meaning for anyone to use it throughout its whole history.
> I would vote for "fail", yet I admit that I am usually alone in opinion
> about similar uapi changes :)

Since even Jamal suggested the same, unless someone else voice some
opposition soon, in v4 I'll opt for rejecting actions using
TC_ACT_REDIRECT.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-25 15:48         ` Paolo Abeni
@ 2018-07-25 16:29           ` Paolo Abeni
  2018-07-25 16:29           ` Daniel Borkmann
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Abeni @ 2018-07-25 16:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jamal Hadi Salim, netdev, Cong Wang, Daniel Borkmann,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller

On Wed, 2018-07-25 at 17:48 +0200, Paolo Abeni wrote:
> On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
> > Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
> > really has no meaning for anyone to use it throughout its whole history.
> > I would vote for "fail", yet I admit that I am usually alone in opinion
> > about similar uapi changes :)
> 
> Since even Jamal suggested the same, unless someone else voice some
> opposition soon, in v4 I'll opt for rejecting actions using
> TC_ACT_REDIRECT.

Thinking again about it, I'm going to drop this patch from this series.
Since v2 is not strictly needed anymore and actually quite unrelated.

Thanks and sorry for the reiterated noise ;)

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-25 15:48         ` Paolo Abeni
  2018-07-25 16:29           ` Paolo Abeni
@ 2018-07-25 16:29           ` Daniel Borkmann
  2018-07-26  7:43             ` Jiri Pirko
  1 sibling, 1 reply; 43+ messages in thread
From: Daniel Borkmann @ 2018-07-25 16:29 UTC (permalink / raw)
  To: Paolo Abeni, Jiri Pirko
  Cc: Jamal Hadi Salim, netdev, Cong Wang, Marcelo Ricardo Leitner,
	Eyal Birger, David S. Miller, alexei.starovoitov

On 07/25/2018 05:48 PM, Paolo Abeni wrote:
> On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
>> Wed, Jul 25, 2018 at 02:54:04PM CEST, pabeni@redhat.com wrote:
>>> On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
>>>> Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
>>>>> Only cls_bpf and act_bpf can safely use such value. If a generic
>>>>> action is configured by user space to return TC_ACT_REDIRECT,
>>>>> the usually visible behavior is passing the skb up the stack - as
>>>>> for unknown action, but, with complex configuration, more random
>>>>> results can be obtained.
>>>>>
>>>>> This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
>>>>> at action init time, making the kernel behavior more consistent.
>>>>>
>>>>> v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
>>>>>
>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>> ---
>>>>> net/sched/act_api.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>>> index 148a89ab789b..24b5534967fe 100644
>>>>> --- a/net/sched/act_api.c
>>>>> +++ b/net/sched/act_api.c
>>>>> @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>>>>> 		}
>>>>> 	}
>>>>>
>>>>> +	if (a->tcfa_action == TC_ACT_REDIRECT) {
>>>>> +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
>>>>
>>>> Can't you push this warning through extack?
>>>>
>>>> But, wouldn't it be more appropriate to fail here? User is passing
>>>> invalid configuration....
>>>
>>> Jiri, Jamal, thank you for the feedback.
>>>
>>> Please allow me to answer both of you here, since you raised similar
>>> concers.
>>>
>>> I thought about rejecting the action, but that change of behavior could
>>> break some users, as currently most kind of invalid tcfa_action values
>>> are simply accepted.
>>>
>>> If there is consensus about it, I can simply fail.
>>
>> Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
>> really has no meaning for anyone to use it throughout its whole history.

That claim is completely wrong.

>> I would vote for "fail", yet I admit that I am usually alone in opinion
>> about similar uapi changes :)
> 
> Since even Jamal suggested the same, unless someone else voice some
> opposition soon, in v4 I'll opt for rejecting actions using
> TC_ACT_REDIRECT.

You should probably leave out act_bpf from that rejection as there may be
a small chance that users could potentially use it as default action.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 12:27         ` Jamal Hadi Salim
  2018-07-25 14:24           ` Paolo Abeni
@ 2018-07-25 16:48           ` Cong Wang
  2018-07-25 17:09             ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 43+ messages in thread
From: Cong Wang @ 2018-07-25 16:48 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Paolo Abeni, Linux Kernel Network Developers, Jiri Pirko,
	Daniel Borkmann, Marcelo Ricardo Leitner, Eyal Birger,
	David Miller

On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> Those changes were there from the beginning (above patch did
> not introduce them).
> IIRC, the reason was to distinguish between policy intended
> drops and drops because of errors.

There must be a limit for "overlimit" to make sense. There is
no limit in mirred action's context, probably there is only
such a limit in act_police. So, all rest should not touch overlimit.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 16:48           ` Cong Wang
@ 2018-07-25 17:09             ` Marcelo Ricardo Leitner
  2018-07-26 12:52               ` Jamal Hadi Salim
  0 siblings, 1 reply; 43+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-07-25 17:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, Paolo Abeni, Linux Kernel Network Developers,
	Jiri Pirko, Daniel Borkmann, Eyal Birger, David Miller

On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >
> > Those changes were there from the beginning (above patch did
> > not introduce them).
> > IIRC, the reason was to distinguish between policy intended
> > drops and drops because of errors.
> 
> There must be a limit for "overlimit" to make sense. There is
> no limit in mirred action's context, probably there is only
> such a limit in act_police. So, all rest should not touch overlimit.

+1

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback
  2018-07-25 11:59   ` Jamal Hadi Salim
@ 2018-07-25 18:24     ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 43+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-07-25 18:24 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Paolo Abeni, netdev, Cong Wang, Jiri Pirko, Daniel Borkmann,
	Eyal Birger, David S. Miller

On Wed, Jul 25, 2018 at 07:59:48AM -0400, Jamal Hadi Salim wrote:
> On 24/07/18 04:06 PM, Paolo Abeni wrote:
> > Each lockless action currently does its own RCU locking in ->act().
> > This is allows using plain RCU accessor, even if the context
> > is really RCU BH.
> > 
> > This change drops the per action RCU lock, replace the accessors
> > with _bh variant, cleans up a bit the surronding code and documents
> > the RCU status in the relevant header.
> > No functional nor performance change is intended.
> > 
> > The goal of this patch is clarifying that the RCU critical section
> > used by the tc actions extends up to the classifier's caller.
> > 
> 
> This and 2/5 seems to stand on their own merit.

So does 1/5, I think.

> 
> cheers,
> jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-25 16:29           ` Daniel Borkmann
@ 2018-07-26  7:43             ` Jiri Pirko
  2018-07-27  2:48               ` Daniel Borkmann
  0 siblings, 1 reply; 43+ messages in thread
From: Jiri Pirko @ 2018-07-26  7:43 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Paolo Abeni, Jamal Hadi Salim, netdev, Cong Wang,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller,
	alexei.starovoitov

Wed, Jul 25, 2018 at 06:29:54PM CEST, daniel@iogearbox.net wrote:
>On 07/25/2018 05:48 PM, Paolo Abeni wrote:
>> On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
>>> Wed, Jul 25, 2018 at 02:54:04PM CEST, pabeni@redhat.com wrote:
>>>> On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
>>>>> Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
>>>>>> Only cls_bpf and act_bpf can safely use such value. If a generic
>>>>>> action is configured by user space to return TC_ACT_REDIRECT,
>>>>>> the usually visible behavior is passing the skb up the stack - as
>>>>>> for unknown action, but, with complex configuration, more random
>>>>>> results can be obtained.
>>>>>>
>>>>>> This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
>>>>>> at action init time, making the kernel behavior more consistent.
>>>>>>
>>>>>> v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
>>>>>>
>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>> ---
>>>>>> net/sched/act_api.c | 5 +++++
>>>>>> 1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>>>> index 148a89ab789b..24b5534967fe 100644
>>>>>> --- a/net/sched/act_api.c
>>>>>> +++ b/net/sched/act_api.c
>>>>>> @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>>>>>> 		}
>>>>>> 	}
>>>>>>
>>>>>> +	if (a->tcfa_action == TC_ACT_REDIRECT) {
>>>>>> +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
>>>>>
>>>>> Can't you push this warning through extack?
>>>>>
>>>>> But, wouldn't it be more appropriate to fail here? User is passing
>>>>> invalid configuration....
>>>>
>>>> Jiri, Jamal, thank you for the feedback.
>>>>
>>>> Please allow me to answer both of you here, since you raised similar
>>>> concers.
>>>>
>>>> I thought about rejecting the action, but that change of behavior could
>>>> break some users, as currently most kind of invalid tcfa_action values
>>>> are simply accepted.
>>>>
>>>> If there is consensus about it, I can simply fail.
>>>
>>> Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
>>> really has no meaning for anyone to use it throughout its whole history.
>
>That claim is completely wrong.

Why? Does addition of TC_ACT_REDIRECT to uapi have any meaning?


>
>>> I would vote for "fail", yet I admit that I am usually alone in opinion
>>> about similar uapi changes :)
>> 
>> Since even Jamal suggested the same, unless someone else voice some
>> opposition soon, in v4 I'll opt for rejecting actions using
>> TC_ACT_REDIRECT.
>
>You should probably leave out act_bpf from that rejection as there may be
>a small chance that users could potentially use it as default action.
>
>Thanks,
>Daniel

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-25 17:09             ` Marcelo Ricardo Leitner
@ 2018-07-26 12:52               ` Jamal Hadi Salim
  2018-07-26 23:25                 ` Cong Wang
  0 siblings, 1 reply; 43+ messages in thread
From: Jamal Hadi Salim @ 2018-07-26 12:52 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, Cong Wang
  Cc: Paolo Abeni, Linux Kernel Network Developers, Jiri Pirko,
	Daniel Borkmann, Eyal Birger, David Miller

On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
>> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>>
>>> Those changes were there from the beginning (above patch did
>>> not introduce them).
>>> IIRC, the reason was to distinguish between policy intended
>>> drops and drops because of errors.
>>
>> There must be a limit for "overlimit" to make sense. There is
>> no limit in mirred action's context, probably there is only
>> such a limit in act_police. So, all rest should not touch overlimit.
> 
> +1
> 

I agree we should at least record drop count(unrelated patch though).
we should keep overlimit (for no other reason other than this
has been around for at least 15 years).

On why "overlimit"? It is just a name for a counter that is useless
for most actions (but was still being transfered to user space).
It is the closest counter to counting "this failed because of
runtime errors" as opposed to "user asked us to drop this".

Probably a good alternative is to make a very small stats v3 structure
(we have migrated stats structures before) and extend for
each action/classifier/qdisc to add its extra counters using XSTATS.

Note:
If you are _mirroring_ packets - incrementing the drop counter is
misleading because the packet is not dropped by the system.
i.e the qdisc will not record it as dropped; it should for
redirect policy.  It is useful to be able to tell
them apart when you are collecting  analytics just for actions.
(if youve worked on a massive amount of machines you'll appreciate
being able to debug by looking at counters that reduce ambiguity).

cheers,
jamal

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT.
  2018-07-26 12:52               ` Jamal Hadi Salim
@ 2018-07-26 23:25                 ` Cong Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Cong Wang @ 2018-07-26 23:25 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Marcelo Ricardo Leitner, Paolo Abeni,
	Linux Kernel Network Developers, Jiri Pirko, Daniel Borkmann,
	Eyal Birger, David Miller

On Thu, Jul 26, 2018 at 5:52 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 25/07/18 01:09 PM, Marcelo Ricardo Leitner wrote:
> > On Wed, Jul 25, 2018 at 09:48:16AM -0700, Cong Wang wrote:
> >> On Wed, Jul 25, 2018 at 5:27 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> >>>
> >>> Those changes were there from the beginning (above patch did
> >>> not introduce them).
> >>> IIRC, the reason was to distinguish between policy intended
> >>> drops and drops because of errors.
> >>
> >> There must be a limit for "overlimit" to make sense. There is
> >> no limit in mirred action's context, probably there is only
> >> such a limit in act_police. So, all rest should not touch overlimit.
> >
> > +1
> >
>
> I agree we should at least record drop count(unrelated patch though).
> we should keep overlimit (for no other reason other than this
> has been around for at least 15 years).
>
> On why "overlimit"? It is just a name for a counter that is useless
> for most actions (but was still being transfered to user space).
> It is the closest counter to counting "this failed because of
> runtime errors" as opposed to "user asked us to drop this".
>
> Probably a good alternative is to make a very small stats v3 structure
> (we have migrated stats structures before) and extend for
> each action/classifier/qdisc to add its extra counters using XSTATS.

Agreed.

>
> Note:
> If you are _mirroring_ packets - incrementing the drop counter is
> misleading because the packet is not dropped by the system.
> i.e the qdisc will not record it as dropped; it should for
> redirect policy.  It is useful to be able to tell
> them apart when you are collecting  analytics just for actions.

Sounds like we just need another counter rather than re-using overlimit
or drops.


> (if youve worked on a massive amount of machines you'll appreciate
> being able to debug by looking at counters that reduce ambiguity).
>

Yes, this is how I found out the overlimit of htb qdisc is inaccurate
or misleading, instead the one in htb class is accurate, see:

commit 3c75f6ee139d464351f8ab77a042dd3635769d98
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Sep 18 12:36:22 2017 -0700

    net_sched: sch_htb: add per class overlimits counter

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-26  7:43             ` Jiri Pirko
@ 2018-07-27  2:48               ` Daniel Borkmann
  0 siblings, 0 replies; 43+ messages in thread
From: Daniel Borkmann @ 2018-07-27  2:48 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Paolo Abeni, Jamal Hadi Salim, netdev, Cong Wang,
	Marcelo Ricardo Leitner, Eyal Birger, David S. Miller,
	alexei.starovoitov

On 07/26/2018 09:43 AM, Jiri Pirko wrote:
> Wed, Jul 25, 2018 at 06:29:54PM CEST, daniel@iogearbox.net wrote:
>> On 07/25/2018 05:48 PM, Paolo Abeni wrote:
>>> On Wed, 2018-07-25 at 15:03 +0200, Jiri Pirko wrote:
>>>> Wed, Jul 25, 2018 at 02:54:04PM CEST, pabeni@redhat.com wrote:
>>>>> On Wed, 2018-07-25 at 13:56 +0200, Jiri Pirko wrote:
>>>>>> Tue, Jul 24, 2018 at 10:06:39PM CEST, pabeni@redhat.com wrote:
>>>>>>> Only cls_bpf and act_bpf can safely use such value. If a generic
>>>>>>> action is configured by user space to return TC_ACT_REDIRECT,
>>>>>>> the usually visible behavior is passing the skb up the stack - as
>>>>>>> for unknown action, but, with complex configuration, more random
>>>>>>> results can be obtained.
>>>>>>>
>>>>>>> This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_UNSPEC
>>>>>>> at action init time, making the kernel behavior more consistent.
>>>>>>>
>>>>>>> v1 -> v3: use TC_ACT_UNSPEC instead of a newly definied act value
>>>>>>>
>>>>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>>>>> ---
>>>>>>> net/sched/act_api.c | 5 +++++
>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>>>>>> index 148a89ab789b..24b5534967fe 100644
>>>>>>> --- a/net/sched/act_api.c
>>>>>>> +++ b/net/sched/act_api.c
>>>>>>> @@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
>>>>>>> 		}
>>>>>>> 	}
>>>>>>>
>>>>>>> +	if (a->tcfa_action == TC_ACT_REDIRECT) {
>>>>>>> +		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
>>>>>>
>>>>>> Can't you push this warning through extack?
>>>>>>
>>>>>> But, wouldn't it be more appropriate to fail here? User is passing
>>>>>> invalid configuration....
>>>>>
>>>>> Jiri, Jamal, thank you for the feedback.
>>>>>
>>>>> Please allow me to answer both of you here, since you raised similar
>>>>> concers.
>>>>>
>>>>> I thought about rejecting the action, but that change of behavior could
>>>>> break some users, as currently most kind of invalid tcfa_action values
>>>>> are simply accepted.
>>>>>
>>>>> If there is consensus about it, I can simply fail.
>>>>
>>>> Well it was obviously wrong to expose TC_ACT_REDIRECT to uapi and it
>>>> really has no meaning for anyone to use it throughout its whole history.
>>
>> That claim is completely wrong.
> 
> Why? Does addition of TC_ACT_REDIRECT to uapi have any meaning?

BPF programs return TC_ACT_* as a verdict which is then further processed,
including TC_ACT_REDIRECT. Hence, it's a contract for them similarly as e.g.
helper enums (BPF_FUNC_*) and other things they make use of.

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2018-07-27  4:08 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-24 20:06 [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Paolo Abeni
2018-07-24 20:06 ` [PATCH net-next v3 1/5] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
2018-07-25 11:55   ` Jamal Hadi Salim
2018-07-25 11:56   ` Jiri Pirko
2018-07-25 12:54     ` Paolo Abeni
2018-07-25 13:03       ` Jiri Pirko
2018-07-25 15:48         ` Paolo Abeni
2018-07-25 16:29           ` Paolo Abeni
2018-07-25 16:29           ` Daniel Borkmann
2018-07-26  7:43             ` Jiri Pirko
2018-07-27  2:48               ` Daniel Borkmann
2018-07-24 20:06 ` [PATCH net-next v3 2/5] net/sched: user-space can't set unknown tcfa_action values Paolo Abeni
2018-07-25 12:26   ` Jiri Pirko
2018-07-24 20:06 ` [PATCH net-next v3 3/5] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
2018-07-25 11:59   ` Jamal Hadi Salim
2018-07-25 18:24     ` Marcelo Ricardo Leitner
2018-07-25 12:32   ` Jiri Pirko
2018-07-24 20:06 ` [PATCH net-next v3 4/5] net/tc: introduce TC_ACT_REINJECT Paolo Abeni
2018-07-24 20:38   ` Cong Wang
2018-07-24 20:50     ` Cong Wang
2018-07-25  8:29       ` Paolo Abeni
2018-07-25 12:27         ` Jamal Hadi Salim
2018-07-25 14:24           ` Paolo Abeni
2018-07-25 15:26             ` Jamal Hadi Salim
2018-07-25 16:48           ` Cong Wang
2018-07-25 17:09             ` Marcelo Ricardo Leitner
2018-07-26 12:52               ` Jamal Hadi Salim
2018-07-26 23:25                 ` Cong Wang
2018-07-25 12:16   ` Jamal Hadi Salim
2018-07-25 12:59     ` Jiri Pirko
2018-07-25 13:55     ` Paolo Abeni
2018-07-25 12:57   ` Jiri Pirko
2018-07-24 20:06 ` [PATCH net-next v3 5/5] act_mirred: use TC_ACT_REINJECT when possible Paolo Abeni
2018-07-24 21:15   ` Cong Wang
2018-07-25 10:14     ` Paolo Abeni
2018-07-25 13:30       ` Jiri Pirko
2018-07-25 11:50     ` Jamal Hadi Salim
2018-07-25 13:52   ` Jiri Pirko
2018-07-25 14:04     ` Paolo Abeni
2018-07-25 14:30       ` Jiri Pirko
2018-07-25 11:53 ` [PATCH net-next v3 0/5] TC: refactor act_mirred packets re-injection Jiri Pirko
2018-07-25 12:07   ` Paolo Abeni
2018-07-25 12:17     ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.