All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] TC: refactor TC_ACT_REDIRECT action
@ 2018-07-13  9:54 Paolo Abeni
  2018-07-13  9:54 ` [PATCH net-next 1/4] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13  9:54 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Marcelo Ricardo Leitner

This series is aimed at improving the act_mirred redirect performances.
Such action is used by OVS to represent TC S/W flows, and it's current largest
bottle-neck is the need for a skb_clone() for each packet.

The first 2 patches introduce some cleanup and safeguards to allow changing the
TC_ACT_REDIRECT handling: currently the action context is stored in per CPU
variables, and we allow also to use the tcf_result. Finally we use this extended
infrastructure in act_mirred to implement redirect via TC_ACT_REDIRECT, when
possible.

Overall this the above gives a ~10% performance improvement in forwarding tput,
when using the TC S/W datapath.

Paolo Abeni (4):
  tc/act: user space can't use TC_ACT_REDIRECT directly
  tc/act: remove unneeded RCU lock in action callback
  net/sched: refactor TC_ACT_REDIRECT handling
  act_mirred: use ACT_REDIRECT when possible

 include/net/act_api.h        |  2 +-
 include/net/sch_generic.h    | 17 ++++++++++++++++-
 include/uapi/linux/pkt_cls.h |  1 +
 net/core/dev.c               |  4 ++--
 net/core/filter.c            | 29 +++++++++++++++++++++++------
 net/core/lwt_bpf.c           |  5 ++++-
 net/sched/act_api.c          |  5 +++++
 net/sched/act_bpf.c          |  8 ++++----
 net/sched/act_csum.c         | 12 +++---------
 net/sched/act_ife.c          |  5 +----
 net/sched/act_mirred.c       | 19 ++++++++++++++-----
 net/sched/act_sample.c       |  4 +---
 net/sched/act_skbedit.c      | 10 +++-------
 net/sched/act_skbmod.c       | 21 +++++++++------------
 net/sched/act_tunnel_key.c   |  6 +-----
 net/sched/act_vlan.c         | 19 +++++++------------
 net/sched/cls_bpf.c          |  8 +++++---
 17 files changed, 100 insertions(+), 75 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH net-next 1/4] tc/act: user space can't use TC_ACT_REDIRECT directly
  2018-07-13  9:54 [PATCH net-next 0/4] TC: refactor TC_ACT_REDIRECT action Paolo Abeni
@ 2018-07-13  9:54 ` Paolo Abeni
  2018-07-13  9:55 ` [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13  9:54 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Marcelo Ricardo Leitner

Only cls_bpf and act_bpf can safely use such value. If a generic
action is configured by user space to return TC_ACT_REDIRECT,
the usually visible behavior is passing the skb up the stack - as
for unknown action, but, with complex configuration, more random
results can be obtained.

This patch forcefully converts TC_ACT_REDIRECT to TC_ACT_LAST + 1
at action init time, making the kernel behavior more consistent.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/uapi/linux/pkt_cls.h | 1 +
 net/sched/act_api.c          | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index c4262d911596..7cdd62b51106 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -45,6 +45,7 @@ enum {
 				   * the skb and act like everything
 				   * is alright.
 				   */
+#define TC_ACT_LAST		TC_ACT_TRAP
 
 /* There is a special kind of actions called "extended actions",
  * which need a value parameter. These have a local opcode located in
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 148a89ab789b..f6438f246dab 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -895,6 +895,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 		}
 	}
 
+	if (a->tcfa_action == TC_ACT_REDIRECT) {
+		net_warn_ratelimited("TC_ACT_REDIRECT can't be used directly");
+		a->tcfa_action = TC_ACT_LAST + 1;
+	}
+
 	return a;
 
 err_mod:
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback
  2018-07-13  9:54 [PATCH net-next 0/4] TC: refactor TC_ACT_REDIRECT action Paolo Abeni
  2018-07-13  9:54 ` [PATCH net-next 1/4] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
@ 2018-07-13  9:55 ` Paolo Abeni
  2018-07-13 14:08   ` Daniel Borkmann
  2018-07-13  9:55 ` [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling Paolo Abeni
  2018-07-13  9:55 ` [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible Paolo Abeni
  3 siblings, 1 reply; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Marcelo Ricardo Leitner

Each lockless action currently does its own RCU locking in ->act().
This is allows using plain RCU accessor, even if the context
is really RCU BH.

This change drops the per action RCU lock, replace the accessors
with _bh variant, cleans up a bit the surronding code and documents
the RCU status in the relevant header.
No functional nor performance change is intended.

The goal of this patch is clarifying that the RCU critical section
used by the tc actions extends up to the classifier's caller.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/act_api.h      |  2 +-
 include/net/sch_generic.h  |  2 ++
 net/sched/act_bpf.c        |  4 +---
 net/sched/act_csum.c       | 12 +++---------
 net/sched/act_ife.c        |  5 +----
 net/sched/act_mirred.c     |  4 +---
 net/sched/act_sample.c     |  4 +---
 net/sched/act_skbedit.c    | 10 +++-------
 net/sched/act_skbmod.c     | 21 +++++++++------------
 net/sched/act_tunnel_key.c |  6 +-----
 net/sched/act_vlan.c       | 19 +++++++------------
 11 files changed, 30 insertions(+), 59 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 683ce41053d9..8c9bc02d05e1 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -85,7 +85,7 @@ struct tc_action_ops {
 	size_t	size;
 	struct module		*owner;
 	int     (*act)(struct sk_buff *, const struct tc_action *,
-		       struct tcf_result *);
+		       struct tcf_result *); /* called under RCU BH lock*/
 	int     (*dump)(struct sk_buff *, struct tc_action *, int, int);
 	void	(*cleanup)(struct tc_action *);
 	int     (*lookup)(struct net *net, struct tc_action **a, u32 index,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7432100027b7..056dc1083aa3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -275,6 +275,8 @@ struct tcf_proto {
 	/* Fast access part */
 	struct tcf_proto __rcu	*next;
 	void __rcu		*root;
+
+	/* called under RCU BH lock*/
 	int			(*classify)(struct sk_buff *,
 					    const struct tcf_proto *,
 					    struct tcf_result *);
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 06f743d8ed41..ac20266460c0 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -45,8 +45,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
 	tcf_lastuse_update(&prog->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
 
-	rcu_read_lock();
-	filter = rcu_dereference(prog->filter);
+	filter = rcu_dereference_bh(prog->filter);
 	if (at_ingress) {
 		__skb_push(skb, skb->mac_len);
 		bpf_compute_data_pointers(skb);
@@ -56,7 +55,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
 		bpf_compute_data_pointers(skb);
 		filter_res = BPF_PROG_RUN(filter, skb);
 	}
-	rcu_read_unlock();
 
 	/* A BPF program may overwrite the default action opcode.
 	 * Similarly as in cls_bpf, if filter_res == -1 we use the
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index bd232d3bd022..4f092fbd1e20 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -561,15 +561,14 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
 	u32 update_flags;
 	int action;
 
-	rcu_read_lock();
-	params = rcu_dereference(p->params);
+	params = rcu_dereference_bh(p->params);
 
 	tcf_lastuse_update(&p->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(p->common.cpu_bstats), skb);
 
 	action = params->action;
 	if (unlikely(action == TC_ACT_SHOT))
-		goto drop_stats;
+		goto drop;
 
 	update_flags = params->update_flags;
 	switch (tc_skb_protocol(skb)) {
@@ -583,16 +582,11 @@ static int tcf_csum(struct sk_buff *skb, const struct tc_action *a,
 		break;
 	}
 
-unlock:
-	rcu_read_unlock();
 	return action;
 
 drop:
-	action = TC_ACT_SHOT;
-
-drop_stats:
 	qstats_drop_inc(this_cpu_ptr(p->common.cpu_qstats));
-	goto unlock;
+	return TC_ACT_SHOT;
 }
 
 static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind,
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 3d6e265758c0..df4060e32d43 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -820,14 +820,11 @@ static int tcf_ife_act(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_ife_params *p;
 	int ret;
 
-	rcu_read_lock();
-	p = rcu_dereference(ife->params);
+	p = rcu_dereference_bh(ife->params);
 	if (p->flags & IFE_ENCODE) {
 		ret = tcf_ife_encode(skb, a, res, p);
-		rcu_read_unlock();
 		return ret;
 	}
-	rcu_read_unlock();
 
 	return tcf_ife_decode(skb, a, res);
 }
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 6afd89a36c69..eeb335f03102 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -181,11 +181,10 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	tcf_lastuse_update(&m->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
 
-	rcu_read_lock();
 	m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit);
 	m_eaction = READ_ONCE(m->tcfm_eaction);
 	retval = READ_ONCE(m->tcf_action);
-	dev = rcu_dereference(m->tcfm_dev);
+	dev = rcu_dereference_bh(m->tcfm_dev);
 	if (unlikely(!dev)) {
 		pr_notice_once("tc mirred: target device is gone\n");
 		goto out;
@@ -236,7 +235,6 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		if (tcf_mirred_is_act_redirect(m_eaction))
 			retval = TC_ACT_SHOT;
 	}
-	rcu_read_unlock();
 
 	return retval;
 }
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 3079e7be5bde..2608ccc83e5e 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -140,8 +140,7 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 	bstats_cpu_update(this_cpu_ptr(s->common.cpu_bstats), skb);
 	retval = READ_ONCE(s->tcf_action);
 
-	rcu_read_lock();
-	psample_group = rcu_dereference(s->psample_group);
+	psample_group = rcu_dereference_bh(s->psample_group);
 
 	/* randomly sample packets according to rate */
 	if (psample_group && (prandom_u32() % s->rate == 0)) {
@@ -165,7 +164,6 @@ static int tcf_sample_act(struct sk_buff *skb, const struct tc_action *a,
 			skb_pull(skb, skb->mac_len);
 	}
 
-	rcu_read_unlock();
 	return retval;
 }
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index da56e6938c9e..a6db47ebec11 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -43,8 +43,7 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 	tcf_lastuse_update(&d->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
 
-	rcu_read_lock();
-	params = rcu_dereference(d->params);
+	params = rcu_dereference_bh(d->params);
 	action = READ_ONCE(d->tcf_action);
 
 	if (params->flags & SKBEDIT_F_PRIORITY)
@@ -77,14 +76,11 @@ static int tcf_skbedit(struct sk_buff *skb, const struct tc_action *a,
 	}
 	if (params->flags & SKBEDIT_F_PTYPE)
 		skb->pkt_type = params->ptype;
-
-unlock:
-	rcu_read_unlock();
 	return action;
+
 err:
 	qstats_drop_inc(this_cpu_ptr(d->common.cpu_qstats));
-	action = TC_ACT_SHOT;
-	goto unlock;
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index cdc6bacfb190..c437c6d51a71 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -41,20 +41,14 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 	 * then MAX_EDIT_LEN needs to change appropriately
 	*/
 	err = skb_ensure_writable(skb, MAX_EDIT_LEN);
-	if (unlikely(err)) { /* best policy is to drop on the floor */
-		qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
-		return TC_ACT_SHOT;
-	}
+	if (unlikely(err)) /* best policy is to drop on the floor */
+		goto drop;
 
-	rcu_read_lock();
 	action = READ_ONCE(d->tcf_action);
-	if (unlikely(action == TC_ACT_SHOT)) {
-		qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
-		rcu_read_unlock();
-		return action;
-	}
+	if (unlikely(action == TC_ACT_SHOT))
+		goto drop;
 
-	p = rcu_dereference(d->skbmod_p);
+	p = rcu_dereference_bh(d->skbmod_p);
 	flags = p->flags;
 	if (flags & SKBMOD_F_DMAC)
 		ether_addr_copy(eth_hdr(skb)->h_dest, p->eth_dst);
@@ -62,7 +56,6 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 		ether_addr_copy(eth_hdr(skb)->h_source, p->eth_src);
 	if (flags & SKBMOD_F_ETYPE)
 		eth_hdr(skb)->h_proto = p->eth_type;
-	rcu_read_unlock();
 
 	if (flags & SKBMOD_F_SWAPMAC) {
 		u16 tmpaddr[ETH_ALEN / 2]; /* ether_addr_copy() requirement */
@@ -73,6 +66,10 @@ static int tcf_skbmod_run(struct sk_buff *skb, const struct tc_action *a,
 	}
 
 	return action;
+
+drop:
+	qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy skbmod_policy[TCA_SKBMOD_MAX + 1] = {
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 3ec585d58762..1ab959ef3547 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -31,9 +31,7 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_tunnel_key_params *params;
 	int action;
 
-	rcu_read_lock();
-
-	params = rcu_dereference(t->params);
+	params = rcu_dereference_bh(t->params);
 
 	tcf_lastuse_update(&t->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(t->common.cpu_bstats), skb);
@@ -53,8 +51,6 @@ static int tunnel_key_act(struct sk_buff *skb, const struct tc_action *a,
 		break;
 	}
 
-	rcu_read_unlock();
-
 	return action;
 }
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index ad37f308175a..15a0ee214c9c 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -40,11 +40,9 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	if (skb_at_tc_ingress(skb))
 		skb_push_rcsum(skb, skb->mac_len);
 
-	rcu_read_lock();
-
 	action = READ_ONCE(v->tcf_action);
 
-	p = rcu_dereference(v->vlan_p);
+	p = rcu_dereference_bh(v->vlan_p);
 
 	switch (p->tcfv_action) {
 	case TCA_VLAN_ACT_POP:
@@ -61,7 +59,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	case TCA_VLAN_ACT_MODIFY:
 		/* No-op if no vlan tag (either hw-accel or in-payload) */
 		if (!skb_vlan_tagged(skb))
-			goto unlock;
+			goto out;
 		/* extract existing tag (and guarantee no hw-accel tag) */
 		if (skb_vlan_tag_present(skb)) {
 			tci = skb_vlan_tag_get(skb);
@@ -86,18 +84,15 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 		BUG();
 	}
 
-	goto unlock;
-
-drop:
-	action = TC_ACT_SHOT;
-	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
-
-unlock:
-	rcu_read_unlock();
+out:
 	if (skb_at_tc_ingress(skb))
 		skb_pull_rcsum(skb, skb->mac_len);
 
 	return action;
+
+drop:
+	qstats_drop_inc(this_cpu_ptr(v->common.cpu_qstats));
+	return TC_ACT_SHOT;
 }
 
 static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling
  2018-07-13  9:54 [PATCH net-next 0/4] TC: refactor TC_ACT_REDIRECT action Paolo Abeni
  2018-07-13  9:54 ` [PATCH net-next 1/4] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
  2018-07-13  9:55 ` [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
@ 2018-07-13  9:55 ` Paolo Abeni
  2018-07-13 14:13   ` Daniel Borkmann
  2018-07-13  9:55 ` [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible Paolo Abeni
  3 siblings, 1 reply; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Marcelo Ricardo Leitner

This patch changes the TC_ACT_REDIRECT code path to allow
providing the redirect parameters via the tcf_result argument.

Such union is expanded to host the redirect device, the redirect
direction (ingress/egress) and the stats to be updated on error
conditions.

Actions/classifiers using TC_ACT_REDIRECT can either:
* fill the tcf_result redirect related fields
* clear such fields and use the bpf per cpu redirect info

skb_do_redirect now tries to fetch the relevant data from tcf_result
and fall back to access redirect info. It also updates the stats
accordingly to the redirect result, if provided by the caller.

This will allow using the TC_ACT_REDIRECT action in more places in
the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/sch_generic.h | 15 ++++++++++++++-
 net/core/dev.c            |  4 ++--
 net/core/filter.c         | 29 +++++++++++++++++++++++------
 net/core/lwt_bpf.c        |  5 ++++-
 net/sched/act_bpf.c       |  4 +++-
 net/sched/cls_bpf.c       |  8 +++++---
 6 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 056dc1083aa3..dd9e00d017b3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -235,9 +235,22 @@ struct tcf_result {
 			u32		classid;
 		};
 		const struct tcf_proto *goto_tp;
+
+		/* used by the TC_ACT_REDIRECT action */
+		struct {
+			/* device and direction, or 0 bpf redirect */
+			long		dev_ingress;
+			struct gnet_stats_queue *qstats;
+		};
 	};
 };
 
+#define TCF_RESULT_REDIR_DEV(res) \
+	((struct net_device *)((res)->dev_ingress & ~1))
+#define TCF_RESULT_REDIR_INGRESS(res) ((res)->dev_ingress & 1)
+#define TCF_RESULT_SET_REDIRECT(res, dev, ingress) \
+	((res)->dev_ingress = (long)(dev) | (!!(ingress)))
+
 struct tcf_proto_ops {
 	struct list_head	head;
 	char			kind[IFNAMSIZ];
@@ -543,7 +556,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 				struct netlink_ext_ack *extack);
 void __qdisc_calculate_pkt_len(struct sk_buff *skb,
 			       const struct qdisc_size_table *stab);
-int skb_do_redirect(struct sk_buff *);
+int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res);
 
 static inline void skb_reset_tc(struct sk_buff *skb)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 14a748ee8cc9..a283dbfde30c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3538,7 +3538,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 		return NULL;
 	case TC_ACT_REDIRECT:
 		/* No need to push/pop skb's mac_header here on egress! */
-		skb_do_redirect(skb);
+		skb_do_redirect(skb, &cl_res);
 		*ret = NET_XMIT_SUCCESS;
 		return NULL;
 	default:
@@ -4600,7 +4600,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
 		 * redirecting to another netdev
 		 */
 		__skb_push(skb, skb->mac_len);
-		skb_do_redirect(skb);
+		skb_do_redirect(skb, &cl_res);
 		return NULL;
 	default:
 		break;
diff --git a/net/core/filter.c b/net/core/filter.c
index b9ec916f4e3a..4f64cf5189e6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2062,19 +2062,36 @@ BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags)
 	return TC_ACT_REDIRECT;
 }
 
-int skb_do_redirect(struct sk_buff *skb)
+int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res)
 {
-	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+	struct gnet_stats_queue *stats;
 	struct net_device *dev;
+	int ret, flags;
 
-	dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
-	ri->ifindex = 0;
+	if (!res->dev_ingress) {
+		struct redirect_info *ri = this_cpu_ptr(&redirect_info);
+
+		dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
+		flags = ri->flags;
+		ri->ifindex = 0;
+		stats = NULL;
+	} else {
+		dev = TCF_RESULT_REDIR_DEV(res);
+		flags = TCF_RESULT_REDIR_INGRESS(res) ? BPF_F_INGRESS : 0;
+		stats = res->qstats;
+	}
 	if (unlikely(!dev)) {
 		kfree_skb(skb);
-		return -EINVAL;
+		ret = -EINVAL;
+		goto out;
 	}
 
-	return __bpf_redirect(skb, dev, ri->flags);
+	ret = __bpf_redirect(skb, dev, flags);
+
+out:
+	if (ret && stats)
+		qstats_overlimit_inc(res->qstats);
+	return ret;
 }
 
 static const struct bpf_func_proto bpf_redirect_proto = {
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index e7e626fb87bb..8dde1093994a 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -65,7 +65,10 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
 				     lwt->name ? : "<unknown>");
 			ret = BPF_OK;
 		} else {
-			ret = skb_do_redirect(skb);
+			struct tcf_result res;
+
+			res.dev_ingress = 0;
+			ret = skb_do_redirect(skb, &res);
 			if (ret == 0)
 				ret = BPF_REDIRECT;
 		}
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index ac20266460c0..6fd46b691181 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -67,10 +67,12 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
 	 * returned.
 	 */
 	switch (filter_res) {
+	case TC_ACT_REDIRECT:
+		res->dev_ingress = 0;
+		/* fall-through */
 	case TC_ACT_PIPE:
 	case TC_ACT_RECLASSIFY:
 	case TC_ACT_OK:
-	case TC_ACT_REDIRECT:
 		action = filter_res;
 		break;
 	case TC_ACT_SHOT:
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 66e0ac9811f9..f0fb7ded8fe2 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -65,14 +65,16 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
 				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
 };
 
-static int cls_bpf_exec_opcode(int code)
+static int cls_bpf_exec_opcode(int code, struct tcf_result *res)
 {
 	switch (code) {
+	case TC_ACT_REDIRECT:
+		res->dev_ingress = 0;
+		/* fall-through */
 	case TC_ACT_OK:
 	case TC_ACT_SHOT:
 	case TC_ACT_STOLEN:
 	case TC_ACT_TRAP:
-	case TC_ACT_REDIRECT:
 	case TC_ACT_UNSPEC:
 		return code;
 	default:
@@ -113,7 +115,7 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 			res->classid = TC_H_MAJ(prog->res.classid) |
 				       qdisc_skb_cb(skb)->tc_classid;
 
-			ret = cls_bpf_exec_opcode(filter_res);
+			ret = cls_bpf_exec_opcode(filter_res, res);
 			if (ret == TC_ACT_UNSPEC)
 				continue;
 			break;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-13  9:54 [PATCH net-next 0/4] TC: refactor TC_ACT_REDIRECT action Paolo Abeni
                   ` (2 preceding siblings ...)
  2018-07-13  9:55 ` [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling Paolo Abeni
@ 2018-07-13  9:55 ` Paolo Abeni
  2018-07-16 23:39   ` Cong Wang
  2018-07-19 17:16   ` kbuild test robot
  3 siblings, 2 replies; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13  9:55 UTC (permalink / raw)
  To: netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Daniel Borkmann, Marcelo Ricardo Leitner

When mirred is invoked from the ingress path, and it wants to redirect
the processed packet, it can now use the ACT_REDIRECT action,
filling the tcf_result accordingly.

This avoids a skb_clone() in the TC S/W data path giving a ~10%
improvement in forwarding performances. Overall TC S/W performances
are now comparable to the kernel openswitch datapath.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/sched/act_mirred.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index eeb335f03102..b19317426117 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -175,6 +175,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	struct net_device *dev;
 	struct sk_buff *skb2;
 	int retval, err = 0;
+	bool want_ingress;
 	int m_eaction;
 	int mac_len;
 
@@ -185,6 +186,17 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	m_eaction = READ_ONCE(m->tcfm_eaction);
 	retval = READ_ONCE(m->tcf_action);
 	dev = rcu_dereference_bh(m->tcfm_dev);
+	want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
+	if (skb_at_tc_ingress(skb) && tcf_mirred_is_act_redirect(m_eaction)) {
+		skb->tc_redirected = 1;
+		skb->tc_from_ingress = 1;
+
+		/* the core redirect code will check dev and its status */
+		TCF_RESULT_SET_REDIRECT(res, dev, want_ingress);
+		res->qstats = this_cpu_ptr(m->common.cpu_qstats);
+		return TC_ACT_REDIRECT;
+	}
+
 	if (unlikely(!dev)) {
 		pr_notice_once("tc mirred: target device is gone\n");
 		goto out;
@@ -204,8 +216,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	 * and devices expect a mac header on xmit, then mac push/pull is
 	 * needed.
 	 */
-	if (skb_at_tc_ingress(skb) != tcf_mirred_act_wants_ingress(m_eaction) &&
-	    m_mac_header_xmit) {
+	if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
 		if (!skb_at_tc_ingress(skb)) {
 			/* caught at egress, act ingress: pull mac */
 			mac_len = skb_network_header(skb) - skb_mac_header(skb);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback
  2018-07-13  9:55 ` [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
@ 2018-07-13 14:08   ` Daniel Borkmann
  2018-07-13 14:26     ` Paolo Abeni
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2018-07-13 14:08 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

Hi Paolo,

On 07/13/2018 11:55 AM, Paolo Abeni wrote:
> Each lockless action currently does its own RCU locking in ->act().
> This is allows using plain RCU accessor, even if the context
> is really RCU BH.
> 
> This change drops the per action RCU lock, replace the accessors
> with _bh variant, cleans up a bit the surronding code and documents
> the RCU status in the relevant header.
> No functional nor performance change is intended.
> 
> The goal of this patch is clarifying that the RCU critical section
> used by the tc actions extends up to the classifier's caller.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
[...]
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index 06f743d8ed41..ac20266460c0 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -45,8 +45,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
>  	tcf_lastuse_update(&prog->tcf_tm);
>  	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
>  
> -	rcu_read_lock();
> -	filter = rcu_dereference(prog->filter);
> +	filter = rcu_dereference_bh(prog->filter);
>  	if (at_ingress) {
>  		__skb_push(skb, skb->mac_len);
>  		bpf_compute_data_pointers(skb);
> @@ -56,7 +55,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
>  		bpf_compute_data_pointers(skb);
>  		filter_res = BPF_PROG_RUN(filter, skb);
>  	}
> -	rcu_read_unlock();

This conversion is not correct, BPF itself relies on RCU but not RCU-bh flavor.
You might probably see a splat if you do e.g. a map lookup with this change in
interpreter mode on tx side.

>  	/* A BPF program may overwrite the default action opcode.
>  	 * Similarly as in cls_bpf, if filter_res == -1 we use the

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling
  2018-07-13  9:55 ` [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling Paolo Abeni
@ 2018-07-13 14:13   ` Daniel Borkmann
  2018-07-13 14:37     ` Paolo Abeni
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2018-07-13 14:13 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

On 07/13/2018 11:55 AM, Paolo Abeni wrote:
> This patch changes the TC_ACT_REDIRECT code path to allow
> providing the redirect parameters via the tcf_result argument.
> 
> Such union is expanded to host the redirect device, the redirect
> direction (ingress/egress) and the stats to be updated on error
> conditions.
> 
> Actions/classifiers using TC_ACT_REDIRECT can either:
> * fill the tcf_result redirect related fields
> * clear such fields and use the bpf per cpu redirect info
> 
> skb_do_redirect now tries to fetch the relevant data from tcf_result
> and fall back to access redirect info. It also updates the stats
> accordingly to the redirect result, if provided by the caller.
> 
> This will allow using the TC_ACT_REDIRECT action in more places in
> the next patch.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/net/sch_generic.h | 15 ++++++++++++++-
>  net/core/dev.c            |  4 ++--
>  net/core/filter.c         | 29 +++++++++++++++++++++++------
>  net/core/lwt_bpf.c        |  5 ++++-
>  net/sched/act_bpf.c       |  4 +++-
>  net/sched/cls_bpf.c       |  8 +++++---
>  6 files changed, 51 insertions(+), 14 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 056dc1083aa3..dd9e00d017b3 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -235,9 +235,22 @@ struct tcf_result {
>  			u32		classid;
>  		};
>  		const struct tcf_proto *goto_tp;
> +
> +		/* used by the TC_ACT_REDIRECT action */
> +		struct {
> +			/* device and direction, or 0 bpf redirect */
> +			long		dev_ingress;
> +			struct gnet_stats_queue *qstats;
> +		};
>  	};
>  };
>  
> +#define TCF_RESULT_REDIR_DEV(res) \
> +	((struct net_device *)((res)->dev_ingress & ~1))
> +#define TCF_RESULT_REDIR_INGRESS(res) ((res)->dev_ingress & 1)
> +#define TCF_RESULT_SET_REDIRECT(res, dev, ingress) \
> +	((res)->dev_ingress = (long)(dev) | (!!(ingress)))
> +
>  struct tcf_proto_ops {
>  	struct list_head	head;
>  	char			kind[IFNAMSIZ];
> @@ -543,7 +556,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
>  				struct netlink_ext_ack *extack);
>  void __qdisc_calculate_pkt_len(struct sk_buff *skb,
>  			       const struct qdisc_size_table *stab);
> -int skb_do_redirect(struct sk_buff *);
> +int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res);
>  
>  static inline void skb_reset_tc(struct sk_buff *skb)
>  {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 14a748ee8cc9..a283dbfde30c 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3538,7 +3538,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>  		return NULL;
>  	case TC_ACT_REDIRECT:
>  		/* No need to push/pop skb's mac_header here on egress! */
> -		skb_do_redirect(skb);
> +		skb_do_redirect(skb, &cl_res);
>  		*ret = NET_XMIT_SUCCESS;
>  		return NULL;
>  	default:
> @@ -4600,7 +4600,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>  		 * redirecting to another netdev
>  		 */
>  		__skb_push(skb, skb->mac_len);
> -		skb_do_redirect(skb);
> +		skb_do_redirect(skb, &cl_res);
>  		return NULL;
>  	default:
>  		break;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index b9ec916f4e3a..4f64cf5189e6 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2062,19 +2062,36 @@ BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags)
>  	return TC_ACT_REDIRECT;
>  }
>  
> -int skb_do_redirect(struct sk_buff *skb)
> +int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res)
>  {
> -	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> +	struct gnet_stats_queue *stats;
>  	struct net_device *dev;
> +	int ret, flags;
>  
> -	dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
> -	ri->ifindex = 0;
> +	if (!res->dev_ingress) {
> +		struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> +
> +		dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
> +		flags = ri->flags;
> +		ri->ifindex = 0;
> +		stats = NULL;
> +	} else {
> +		dev = TCF_RESULT_REDIR_DEV(res);
> +		flags = TCF_RESULT_REDIR_INGRESS(res) ? BPF_F_INGRESS : 0;
> +		stats = res->qstats;
> +	}
>  	if (unlikely(!dev)) {
>  		kfree_skb(skb);
> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto out;
>  	}
>  
> -	return __bpf_redirect(skb, dev, ri->flags);
> +	ret = __bpf_redirect(skb, dev, flags);
> +
> +out:
> +	if (ret && stats)
> +		qstats_overlimit_inc(res->qstats);
> +	return ret;
>  }
>  
>  static const struct bpf_func_proto bpf_redirect_proto = {
> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index e7e626fb87bb..8dde1093994a 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -65,7 +65,10 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
>  				     lwt->name ? : "<unknown>");
>  			ret = BPF_OK;
>  		} else {
> -			ret = skb_do_redirect(skb);
> +			struct tcf_result res;
> +
> +			res.dev_ingress = 0;
> +			ret = skb_do_redirect(skb, &res);
>  			if (ret == 0)
>  				ret = BPF_REDIRECT;
>  		}
> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> index ac20266460c0..6fd46b691181 100644
> --- a/net/sched/act_bpf.c
> +++ b/net/sched/act_bpf.c
> @@ -67,10 +67,12 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
>  	 * returned.
>  	 */
>  	switch (filter_res) {
> +	case TC_ACT_REDIRECT:
> +		res->dev_ingress = 0;
> +		/* fall-through */
>  	case TC_ACT_PIPE:
>  	case TC_ACT_RECLASSIFY:
>  	case TC_ACT_OK:
> -	case TC_ACT_REDIRECT:
>  		action = filter_res;
>  		break;
>  	case TC_ACT_SHOT:
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 66e0ac9811f9..f0fb7ded8fe2 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -65,14 +65,16 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
>  				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
>  };
>  
> -static int cls_bpf_exec_opcode(int code)
> +static int cls_bpf_exec_opcode(int code, struct tcf_result *res)
>  {
>  	switch (code) {
> +	case TC_ACT_REDIRECT:
> +		res->dev_ingress = 0;
> +		/* fall-through */
>  	case TC_ACT_OK:
>  	case TC_ACT_SHOT:
>  	case TC_ACT_STOLEN:
>  	case TC_ACT_TRAP:
> -	case TC_ACT_REDIRECT:
>  	case TC_ACT_UNSPEC:
>  		return code;
>  	default:
> @@ -113,7 +115,7 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>  			res->classid = TC_H_MAJ(prog->res.classid) |
>  				       qdisc_skb_cb(skb)->tc_classid;
>  
> -			ret = cls_bpf_exec_opcode(filter_res);
> +			ret = cls_bpf_exec_opcode(filter_res, res);
>  			if (ret == TC_ACT_UNSPEC)
>  				continue;
>  			break;
> 

Can't we just export the struct redirect_info and let others like
act_mirred use it, then we wouldn't need all these extra changes in
fast path?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback
  2018-07-13 14:08   ` Daniel Borkmann
@ 2018-07-13 14:26     ` Paolo Abeni
  2018-07-13 14:41       ` Daniel Borkmann
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13 14:26 UTC (permalink / raw)
  To: Daniel Borkmann, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

On Fri, 2018-07-13 at 16:08 +0200, Daniel Borkmann wrote:
> Hi Paolo,
> 
> On 07/13/2018 11:55 AM, Paolo Abeni wrote:
> > Each lockless action currently does its own RCU locking in ->act().
> > This is allows using plain RCU accessor, even if the context
> > is really RCU BH.
> > 
> > This change drops the per action RCU lock, replace the accessors
> > with _bh variant, cleans up a bit the surronding code and documents
> > the RCU status in the relevant header.
> > No functional nor performance change is intended.
> > 
> > The goal of this patch is clarifying that the RCU critical section
> > used by the tc actions extends up to the classifier's caller.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> [...]
> > diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> > index 06f743d8ed41..ac20266460c0 100644
> > --- a/net/sched/act_bpf.c
> > +++ b/net/sched/act_bpf.c
> > @@ -45,8 +45,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
> >  	tcf_lastuse_update(&prog->tcf_tm);
> >  	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
> >  
> > -	rcu_read_lock();
> > -	filter = rcu_dereference(prog->filter);
> > +	filter = rcu_dereference_bh(prog->filter);
> >  	if (at_ingress) {
> >  		__skb_push(skb, skb->mac_len);
> >  		bpf_compute_data_pointers(skb);
> > @@ -56,7 +55,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
> >  		bpf_compute_data_pointers(skb);
> >  		filter_res = BPF_PROG_RUN(filter, skb);
> >  	}
> > -	rcu_read_unlock();
> 
> This conversion is not correct, BPF itself relies on RCU but not RCU-bh flavor.
> You might probably see a splat if you do e.g. a map lookup with this change in
> interpreter mode on tx side.

Thank you for your review.

I actually tested with lockdep, and lockdep is happy about it.

The not so nice fact is that many TC modules already use plain RCU
primitives in the control path (call_rcu, kfree_rcu, etc.) and
rcu_derefence_bh() in the datapath (e.g. all the classifiers). AFACS,
despite the mix, this use is safe.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling
  2018-07-13 14:13   ` Daniel Borkmann
@ 2018-07-13 14:37     ` Paolo Abeni
  2018-07-13 16:32       ` Daniel Borkmann
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13 14:37 UTC (permalink / raw)
  To: Daniel Borkmann, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

On Fri, 2018-07-13 at 16:13 +0200, Daniel Borkmann wrote:
> On 07/13/2018 11:55 AM, Paolo Abeni wrote:
> > This patch changes the TC_ACT_REDIRECT code path to allow
> > providing the redirect parameters via the tcf_result argument.
> > 
> > Such union is expanded to host the redirect device, the redirect
> > direction (ingress/egress) and the stats to be updated on error
> > conditions.
> > 
> > Actions/classifiers using TC_ACT_REDIRECT can either:
> > * fill the tcf_result redirect related fields
> > * clear such fields and use the bpf per cpu redirect info
> > 
> > skb_do_redirect now tries to fetch the relevant data from tcf_result
> > and fall back to access redirect info. It also updates the stats
> > accordingly to the redirect result, if provided by the caller.
> > 
> > This will allow using the TC_ACT_REDIRECT action in more places in
> > the next patch.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  include/net/sch_generic.h | 15 ++++++++++++++-
> >  net/core/dev.c            |  4 ++--
> >  net/core/filter.c         | 29 +++++++++++++++++++++++------
> >  net/core/lwt_bpf.c        |  5 ++++-
> >  net/sched/act_bpf.c       |  4 +++-
> >  net/sched/cls_bpf.c       |  8 +++++---
> >  6 files changed, 51 insertions(+), 14 deletions(-)
> > 
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 056dc1083aa3..dd9e00d017b3 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -235,9 +235,22 @@ struct tcf_result {
> >  			u32		classid;
> >  		};
> >  		const struct tcf_proto *goto_tp;
> > +
> > +		/* used by the TC_ACT_REDIRECT action */
> > +		struct {
> > +			/* device and direction, or 0 bpf redirect */
> > +			long		dev_ingress;
> > +			struct gnet_stats_queue *qstats;
> > +		};
> >  	};
> >  };
> >  
> > +#define TCF_RESULT_REDIR_DEV(res) \
> > +	((struct net_device *)((res)->dev_ingress & ~1))
> > +#define TCF_RESULT_REDIR_INGRESS(res) ((res)->dev_ingress & 1)
> > +#define TCF_RESULT_SET_REDIRECT(res, dev, ingress) \
> > +	((res)->dev_ingress = (long)(dev) | (!!(ingress)))
> > +
> >  struct tcf_proto_ops {
> >  	struct list_head	head;
> >  	char			kind[IFNAMSIZ];
> > @@ -543,7 +556,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
> >  				struct netlink_ext_ack *extack);
> >  void __qdisc_calculate_pkt_len(struct sk_buff *skb,
> >  			       const struct qdisc_size_table *stab);
> > -int skb_do_redirect(struct sk_buff *);
> > +int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res);
> >  
> >  static inline void skb_reset_tc(struct sk_buff *skb)
> >  {
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 14a748ee8cc9..a283dbfde30c 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3538,7 +3538,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> >  		return NULL;
> >  	case TC_ACT_REDIRECT:
> >  		/* No need to push/pop skb's mac_header here on egress! */
> > -		skb_do_redirect(skb);
> > +		skb_do_redirect(skb, &cl_res);
> >  		*ret = NET_XMIT_SUCCESS;
> >  		return NULL;
> >  	default:
> > @@ -4600,7 +4600,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
> >  		 * redirecting to another netdev
> >  		 */
> >  		__skb_push(skb, skb->mac_len);
> > -		skb_do_redirect(skb);
> > +		skb_do_redirect(skb, &cl_res);
> >  		return NULL;
> >  	default:
> >  		break;
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index b9ec916f4e3a..4f64cf5189e6 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2062,19 +2062,36 @@ BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags)
> >  	return TC_ACT_REDIRECT;
> >  }
> >  
> > -int skb_do_redirect(struct sk_buff *skb)
> > +int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res)
> >  {
> > -	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> > +	struct gnet_stats_queue *stats;
> >  	struct net_device *dev;
> > +	int ret, flags;
> >  
> > -	dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
> > -	ri->ifindex = 0;
> > +	if (!res->dev_ingress) {
> > +		struct redirect_info *ri = this_cpu_ptr(&redirect_info);
> > +
> > +		dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
> > +		flags = ri->flags;
> > +		ri->ifindex = 0;
> > +		stats = NULL;
> > +	} else {
> > +		dev = TCF_RESULT_REDIR_DEV(res);
> > +		flags = TCF_RESULT_REDIR_INGRESS(res) ? BPF_F_INGRESS : 0;
> > +		stats = res->qstats;
> > +	}
> >  	if (unlikely(!dev)) {
> >  		kfree_skb(skb);
> > -		return -EINVAL;
> > +		ret = -EINVAL;
> > +		goto out;
> >  	}
> >  
> > -	return __bpf_redirect(skb, dev, ri->flags);
> > +	ret = __bpf_redirect(skb, dev, flags);
> > +
> > +out:
> > +	if (ret && stats)
> > +		qstats_overlimit_inc(res->qstats);
> > +	return ret;
> >  }
> >  
> >  static const struct bpf_func_proto bpf_redirect_proto = {
> > diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> > index e7e626fb87bb..8dde1093994a 100644
> > --- a/net/core/lwt_bpf.c
> > +++ b/net/core/lwt_bpf.c
> > @@ -65,7 +65,10 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
> >  				     lwt->name ? : "<unknown>");
> >  			ret = BPF_OK;
> >  		} else {
> > -			ret = skb_do_redirect(skb);
> > +			struct tcf_result res;
> > +
> > +			res.dev_ingress = 0;
> > +			ret = skb_do_redirect(skb, &res);
> >  			if (ret == 0)
> >  				ret = BPF_REDIRECT;
> >  		}
> > diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> > index ac20266460c0..6fd46b691181 100644
> > --- a/net/sched/act_bpf.c
> > +++ b/net/sched/act_bpf.c
> > @@ -67,10 +67,12 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
> >  	 * returned.
> >  	 */
> >  	switch (filter_res) {
> > +	case TC_ACT_REDIRECT:
> > +		res->dev_ingress = 0;
> > +		/* fall-through */
> >  	case TC_ACT_PIPE:
> >  	case TC_ACT_RECLASSIFY:
> >  	case TC_ACT_OK:
> > -	case TC_ACT_REDIRECT:
> >  		action = filter_res;
> >  		break;
> >  	case TC_ACT_SHOT:
> > diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> > index 66e0ac9811f9..f0fb7ded8fe2 100644
> > --- a/net/sched/cls_bpf.c
> > +++ b/net/sched/cls_bpf.c
> > @@ -65,14 +65,16 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
> >  				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
> >  };
> >  
> > -static int cls_bpf_exec_opcode(int code)
> > +static int cls_bpf_exec_opcode(int code, struct tcf_result *res)
> >  {
> >  	switch (code) {
> > +	case TC_ACT_REDIRECT:
> > +		res->dev_ingress = 0;
> > +		/* fall-through */
> >  	case TC_ACT_OK:
> >  	case TC_ACT_SHOT:
> >  	case TC_ACT_STOLEN:
> >  	case TC_ACT_TRAP:
> > -	case TC_ACT_REDIRECT:
> >  	case TC_ACT_UNSPEC:
> >  		return code;
> >  	default:
> > @@ -113,7 +115,7 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
> >  			res->classid = TC_H_MAJ(prog->res.classid) |
> >  				       qdisc_skb_cb(skb)->tc_classid;
> >  
> > -			ret = cls_bpf_exec_opcode(filter_res);
> > +			ret = cls_bpf_exec_opcode(filter_res, res);
> >  			if (ret == TC_ACT_UNSPEC)
> >  				continue;
> >  			break;
> > 
> 
> Can't we just export the struct redirect_info and let others like
> act_mirred use it, then we wouldn't need all these extra changes in
> fast path?

Thank you for the feedback.

The use of tcf_result allows passing to the redirect helper the stats
to be updated, and avoids an additional dev lookup for the mirred
datapath.

Some changes are necessary to make the skb_do_redirect() function more
generic, and code duplication could be reduced with some additional
helper. Do you have so strong opinion against that?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback
  2018-07-13 14:26     ` Paolo Abeni
@ 2018-07-13 14:41       ` Daniel Borkmann
  2018-07-13 15:00         ` Paolo Abeni
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2018-07-13 14:41 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

On 07/13/2018 04:26 PM, Paolo Abeni wrote:
> On Fri, 2018-07-13 at 16:08 +0200, Daniel Borkmann wrote:
>> On 07/13/2018 11:55 AM, Paolo Abeni wrote:
>>> Each lockless action currently does its own RCU locking in ->act().
>>> This is allows using plain RCU accessor, even if the context
>>> is really RCU BH.
>>>
>>> This change drops the per action RCU lock, replace the accessors
>>> with _bh variant, cleans up a bit the surronding code and documents
>>> the RCU status in the relevant header.
>>> No functional nor performance change is intended.
>>>
>>> The goal of this patch is clarifying that the RCU critical section
>>> used by the tc actions extends up to the classifier's caller.
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>
>> [...]
>>> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>> index 06f743d8ed41..ac20266460c0 100644
>>> --- a/net/sched/act_bpf.c
>>> +++ b/net/sched/act_bpf.c
>>> @@ -45,8 +45,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
>>>  	tcf_lastuse_update(&prog->tcf_tm);
>>>  	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
>>>  
>>> -	rcu_read_lock();
>>> -	filter = rcu_dereference(prog->filter);
>>> +	filter = rcu_dereference_bh(prog->filter);
>>>  	if (at_ingress) {
>>>  		__skb_push(skb, skb->mac_len);
>>>  		bpf_compute_data_pointers(skb);
>>> @@ -56,7 +55,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
>>>  		bpf_compute_data_pointers(skb);
>>>  		filter_res = BPF_PROG_RUN(filter, skb);
>>>  	}
>>> -	rcu_read_unlock();
>>
>> This conversion is not correct, BPF itself relies on RCU but not RCU-bh flavor.
>> You might probably see a splat if you do e.g. a map lookup with this change in
>> interpreter mode on tx side.
> 
> Thank you for your review.
> 
> I actually tested with lockdep, and lockdep is happy about it.
> 
> The not so nice fact is that many TC modules already use plain RCU
> primitives in the control path (call_rcu, kfree_rcu, etc.) and
> rcu_derefence_bh() in the datapath (e.g. all the classifiers). AFACS,
> despite the mix, this use is safe.

Hmm, so out of __dev_queue_xmit() we do the RCU-bh read-side. We call
into sch_handle_egress() which calls into tcf_classify() which may be
a matchall one e.g. mall_classify(). It invokes tcf_exts_exec() that
does the a->ops->act() which is the tcf_bpf() from here. If you then
call a helper like bpf_map_lookup_elem(), there's a WARN_ON_ONCE() for
!rcu_read_lock_held() since all of BPF is under normal RCU flavor. Why
would that not trigger? RCU != RCU-bh. Only the rcu_read_lock_bh_held()
would hold true.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback
  2018-07-13 14:41       ` Daniel Borkmann
@ 2018-07-13 15:00         ` Paolo Abeni
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Abeni @ 2018-07-13 15:00 UTC (permalink / raw)
  To: Daniel Borkmann, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

On Fri, 2018-07-13 at 16:41 +0200, Daniel Borkmann wrote:
> On 07/13/2018 04:26 PM, Paolo Abeni wrote:
> > On Fri, 2018-07-13 at 16:08 +0200, Daniel Borkmann wrote:
> > > On 07/13/2018 11:55 AM, Paolo Abeni wrote:
> > > > Each lockless action currently does its own RCU locking in ->act().
> > > > This is allows using plain RCU accessor, even if the context
> > > > is really RCU BH.
> > > > 
> > > > This change drops the per action RCU lock, replace the accessors
> > > > with _bh variant, cleans up a bit the surronding code and documents
> > > > the RCU status in the relevant header.
> > > > No functional nor performance change is intended.
> > > > 
> > > > The goal of this patch is clarifying that the RCU critical section
> > > > used by the tc actions extends up to the classifier's caller.
> > > > 
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > 
> > > [...]
> > > > diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
> > > > index 06f743d8ed41..ac20266460c0 100644
> > > > --- a/net/sched/act_bpf.c
> > > > +++ b/net/sched/act_bpf.c
> > > > @@ -45,8 +45,7 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
> > > >  	tcf_lastuse_update(&prog->tcf_tm);
> > > >  	bstats_cpu_update(this_cpu_ptr(prog->common.cpu_bstats), skb);
> > > >  
> > > > -	rcu_read_lock();
> > > > -	filter = rcu_dereference(prog->filter);
> > > > +	filter = rcu_dereference_bh(prog->filter);
> > > >  	if (at_ingress) {
> > > >  		__skb_push(skb, skb->mac_len);
> > > >  		bpf_compute_data_pointers(skb);
> > > > @@ -56,7 +55,6 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
> > > >  		bpf_compute_data_pointers(skb);
> > > >  		filter_res = BPF_PROG_RUN(filter, skb);
> > > >  	}
> > > > -	rcu_read_unlock();
> > > 
> > > This conversion is not correct, BPF itself relies on RCU but not RCU-bh flavor.
> > > You might probably see a splat if you do e.g. a map lookup with this change in
> > > interpreter mode on tx side.
> > 
> > Thank you for your review.
> > 
> > I actually tested with lockdep, and lockdep is happy about it.
> > 
> > The not so nice fact is that many TC modules already use plain RCU
> > primitives in the control path (call_rcu, kfree_rcu, etc.) and
> > rcu_derefence_bh() in the datapath (e.g. all the classifiers). AFACS,
> > despite the mix, this use is safe.
> 
> Hmm, so out of __dev_queue_xmit() we do the RCU-bh read-side. We call
> into sch_handle_egress() which calls into tcf_classify() which may be
> a matchall one e.g. mall_classify(). It invokes tcf_exts_exec() that
> does the a->ops->act() which is the tcf_bpf() from here. If you then
> call a helper like bpf_map_lookup_elem(), there's a WARN_ON_ONCE() for
> !rcu_read_lock_held() since all of BPF is under normal RCU flavor. Why
> would that not trigger? 

Because the basic sample I used did not call any other ebpf helper
beyond bpf_redirect(), I guess.

I see rcu_read_lock() is still needed here, thanks.

As a side note, after:

rcu_read_lock_bh()
rcu_read_lock();

both rcu_dereference() and rcu_derefernce_bh() are considered fine by
lockdep.

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling
  2018-07-13 14:37     ` Paolo Abeni
@ 2018-07-13 16:32       ` Daniel Borkmann
  0 siblings, 0 replies; 21+ messages in thread
From: Daniel Borkmann @ 2018-07-13 16:32 UTC (permalink / raw)
  To: Paolo Abeni, netdev
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Alexei Starovoitov,
	Marcelo Ricardo Leitner

On 07/13/2018 04:37 PM, Paolo Abeni wrote:
> On Fri, 2018-07-13 at 16:13 +0200, Daniel Borkmann wrote:
>> On 07/13/2018 11:55 AM, Paolo Abeni wrote:
>>> This patch changes the TC_ACT_REDIRECT code path to allow
>>> providing the redirect parameters via the tcf_result argument.
>>>
>>> Such union is expanded to host the redirect device, the redirect
>>> direction (ingress/egress) and the stats to be updated on error
>>> conditions.
>>>
>>> Actions/classifiers using TC_ACT_REDIRECT can either:
>>> * fill the tcf_result redirect related fields
>>> * clear such fields and use the bpf per cpu redirect info
>>>
>>> skb_do_redirect now tries to fetch the relevant data from tcf_result
>>> and fall back to access redirect info. It also updates the stats
>>> accordingly to the redirect result, if provided by the caller.
>>>
>>> This will allow using the TC_ACT_REDIRECT action in more places in
>>> the next patch.
>>>
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>>  include/net/sch_generic.h | 15 ++++++++++++++-
>>>  net/core/dev.c            |  4 ++--
>>>  net/core/filter.c         | 29 +++++++++++++++++++++++------
>>>  net/core/lwt_bpf.c        |  5 ++++-
>>>  net/sched/act_bpf.c       |  4 +++-
>>>  net/sched/cls_bpf.c       |  8 +++++---
>>>  6 files changed, 51 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>>> index 056dc1083aa3..dd9e00d017b3 100644
>>> --- a/include/net/sch_generic.h
>>> +++ b/include/net/sch_generic.h
>>> @@ -235,9 +235,22 @@ struct tcf_result {
>>>  			u32		classid;
>>>  		};
>>>  		const struct tcf_proto *goto_tp;
>>> +
>>> +		/* used by the TC_ACT_REDIRECT action */
>>> +		struct {
>>> +			/* device and direction, or 0 bpf redirect */
>>> +			long		dev_ingress;
>>> +			struct gnet_stats_queue *qstats;
>>> +		};
>>>  	};
>>>  };
>>>  
>>> +#define TCF_RESULT_REDIR_DEV(res) \
>>> +	((struct net_device *)((res)->dev_ingress & ~1))
>>> +#define TCF_RESULT_REDIR_INGRESS(res) ((res)->dev_ingress & 1)
>>> +#define TCF_RESULT_SET_REDIRECT(res, dev, ingress) \
>>> +	((res)->dev_ingress = (long)(dev) | (!!(ingress)))
>>> +
>>>  struct tcf_proto_ops {
>>>  	struct list_head	head;
>>>  	char			kind[IFNAMSIZ];
>>> @@ -543,7 +556,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
>>>  				struct netlink_ext_ack *extack);
>>>  void __qdisc_calculate_pkt_len(struct sk_buff *skb,
>>>  			       const struct qdisc_size_table *stab);
>>> -int skb_do_redirect(struct sk_buff *);
>>> +int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res);
>>>  
>>>  static inline void skb_reset_tc(struct sk_buff *skb)
>>>  {
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 14a748ee8cc9..a283dbfde30c 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3538,7 +3538,7 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>>>  		return NULL;
>>>  	case TC_ACT_REDIRECT:
>>>  		/* No need to push/pop skb's mac_header here on egress! */
>>> -		skb_do_redirect(skb);
>>> +		skb_do_redirect(skb, &cl_res);
>>>  		*ret = NET_XMIT_SUCCESS;
>>>  		return NULL;
>>>  	default:
>>> @@ -4600,7 +4600,7 @@ sch_handle_ingress(struct sk_buff *skb, struct packet_type **pt_prev, int *ret,
>>>  		 * redirecting to another netdev
>>>  		 */
>>>  		__skb_push(skb, skb->mac_len);
>>> -		skb_do_redirect(skb);
>>> +		skb_do_redirect(skb, &cl_res);
>>>  		return NULL;
>>>  	default:
>>>  		break;
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index b9ec916f4e3a..4f64cf5189e6 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -2062,19 +2062,36 @@ BPF_CALL_2(bpf_redirect, u32, ifindex, u64, flags)
>>>  	return TC_ACT_REDIRECT;
>>>  }
>>>  
>>> -int skb_do_redirect(struct sk_buff *skb)
>>> +int skb_do_redirect(struct sk_buff *skb, struct tcf_result *res)
>>>  {
>>> -	struct redirect_info *ri = this_cpu_ptr(&redirect_info);
>>> +	struct gnet_stats_queue *stats;
>>>  	struct net_device *dev;
>>> +	int ret, flags;
>>>  
>>> -	dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
>>> -	ri->ifindex = 0;
>>> +	if (!res->dev_ingress) {
>>> +		struct redirect_info *ri = this_cpu_ptr(&redirect_info);
>>> +
>>> +		dev = dev_get_by_index_rcu(dev_net(skb->dev), ri->ifindex);
>>> +		flags = ri->flags;
>>> +		ri->ifindex = 0;
>>> +		stats = NULL;
>>> +	} else {
>>> +		dev = TCF_RESULT_REDIR_DEV(res);
>>> +		flags = TCF_RESULT_REDIR_INGRESS(res) ? BPF_F_INGRESS : 0;
>>> +		stats = res->qstats;
>>> +	}
>>>  	if (unlikely(!dev)) {
>>>  		kfree_skb(skb);
>>> -		return -EINVAL;
>>> +		ret = -EINVAL;
>>> +		goto out;
>>>  	}
>>>  
>>> -	return __bpf_redirect(skb, dev, ri->flags);
>>> +	ret = __bpf_redirect(skb, dev, flags);
>>> +
>>> +out:
>>> +	if (ret && stats)
>>> +		qstats_overlimit_inc(res->qstats);
>>> +	return ret;
>>>  }
>>>  
>>>  static const struct bpf_func_proto bpf_redirect_proto = {
>>> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
>>> index e7e626fb87bb..8dde1093994a 100644
>>> --- a/net/core/lwt_bpf.c
>>> +++ b/net/core/lwt_bpf.c
>>> @@ -65,7 +65,10 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
>>>  				     lwt->name ? : "<unknown>");
>>>  			ret = BPF_OK;
>>>  		} else {
>>> -			ret = skb_do_redirect(skb);
>>> +			struct tcf_result res;
>>> +
>>> +			res.dev_ingress = 0;
>>> +			ret = skb_do_redirect(skb, &res);
>>>  			if (ret == 0)
>>>  				ret = BPF_REDIRECT;
>>>  		}
>>> diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
>>> index ac20266460c0..6fd46b691181 100644
>>> --- a/net/sched/act_bpf.c
>>> +++ b/net/sched/act_bpf.c
>>> @@ -67,10 +67,12 @@ static int tcf_bpf(struct sk_buff *skb, const struct tc_action *act,
>>>  	 * returned.
>>>  	 */
>>>  	switch (filter_res) {
>>> +	case TC_ACT_REDIRECT:
>>> +		res->dev_ingress = 0;
>>> +		/* fall-through */
>>>  	case TC_ACT_PIPE:
>>>  	case TC_ACT_RECLASSIFY:
>>>  	case TC_ACT_OK:
>>> -	case TC_ACT_REDIRECT:
>>>  		action = filter_res;
>>>  		break;
>>>  	case TC_ACT_SHOT:
>>> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
>>> index 66e0ac9811f9..f0fb7ded8fe2 100644
>>> --- a/net/sched/cls_bpf.c
>>> +++ b/net/sched/cls_bpf.c
>>> @@ -65,14 +65,16 @@ static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
>>>  				    .len = sizeof(struct sock_filter) * BPF_MAXINSNS },
>>>  };
>>>  
>>> -static int cls_bpf_exec_opcode(int code)
>>> +static int cls_bpf_exec_opcode(int code, struct tcf_result *res)
>>>  {
>>>  	switch (code) {
>>> +	case TC_ACT_REDIRECT:
>>> +		res->dev_ingress = 0;
>>> +		/* fall-through */
>>>  	case TC_ACT_OK:
>>>  	case TC_ACT_SHOT:
>>>  	case TC_ACT_STOLEN:
>>>  	case TC_ACT_TRAP:
>>> -	case TC_ACT_REDIRECT:
>>>  	case TC_ACT_UNSPEC:
>>>  		return code;
>>>  	default:
>>> @@ -113,7 +115,7 @@ static int cls_bpf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
>>>  			res->classid = TC_H_MAJ(prog->res.classid) |
>>>  				       qdisc_skb_cb(skb)->tc_classid;
>>>  
>>> -			ret = cls_bpf_exec_opcode(filter_res);
>>> +			ret = cls_bpf_exec_opcode(filter_res, res);
>>>  			if (ret == TC_ACT_UNSPEC)
>>>  				continue;
>>>  			break;
>>>
>>
>> Can't we just export the struct redirect_info and let others like
>> act_mirred use it, then we wouldn't need all these extra changes in
>> fast path?
> 
> Thank you for the feedback.
> 
> The use of tcf_result allows passing to the redirect helper the stats
> to be updated, and avoids an additional dev lookup for the mirred
> datapath.
> 
> Some changes are necessary to make the skb_do_redirect() function more
> generic, and code duplication could be reduced with some additional
> helper. Do you have so strong opinion against that?

Couldn't the needed pointers be added into struct redirect_info (and/or the
dev lookup be deferred for this specific scenario) thus that we would avoid
all the extra hassle just to transfer this state from yet a second entity?
I'd like to have the fast-path minimal and short, and I think this should be
doable this way for more broader reuse.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-13  9:55 ` [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible Paolo Abeni
@ 2018-07-16 23:39   ` Cong Wang
  2018-07-17  7:01     ` Eyal Birger
  2018-07-17  9:15     ` Paolo Abeni
  2018-07-19 17:16   ` kbuild test robot
  1 sibling, 2 replies; 21+ messages in thread
From: Cong Wang @ 2018-07-16 23:39 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner

On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When mirred is invoked from the ingress path, and it wants to redirect
> the processed packet, it can now use the ACT_REDIRECT action,
> filling the tcf_result accordingly.
>
> This avoids a skb_clone() in the TC S/W data path giving a ~10%
> improvement in forwarding performances. Overall TC S/W performances
> are now comparable to the kernel openswitch datapath.

Avoiding skb_clone() for redirection is cool, but why need to use
skb_do_redirect() here?

There is a subtle difference here:

skb_do_redirect() calls __bpf_rx_skb() which calls
dev_forward_skb().

while the current mirred action doesn't scrub packets when
redirecting to ingress (from egress). Although I forget if it is
intentionally.

Also, skb->skb_iif is unset in skb_do_redirect() when
redirecting to ingress, I recall we have to set it correctly
for input routing. Probably yet another reason why we
can't scrub it, unless my memory goes wrong. :)

Thanks!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-16 23:39   ` Cong Wang
@ 2018-07-17  7:01     ` Eyal Birger
  2018-07-17  9:15     ` Paolo Abeni
  1 sibling, 0 replies; 21+ messages in thread
From: Eyal Birger @ 2018-07-17  7:01 UTC (permalink / raw)
  To: Cong Wang
  Cc: Paolo Abeni, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Marcelo Ricardo Leitner, shmulik.ladkani

Hi,

On Mon, 16 Jul 2018 16:39:55 -0700
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > When mirred is invoked from the ingress path, and it wants to
> > redirect the processed packet, it can now use the ACT_REDIRECT
> > action, filling the tcf_result accordingly.
> >
> > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > improvement in forwarding performances. Overall TC S/W performances
> > are now comparable to the kernel openswitch datapath.  
> 
> Avoiding skb_clone() for redirection is cool, but why need to use
> skb_do_redirect() here?
> 
> There is a subtle difference here:
> 
> skb_do_redirect() calls __bpf_rx_skb() which calls
> dev_forward_skb().
> 
> while the current mirred action doesn't scrub packets when
> redirecting to ingress (from egress). Although I forget if it is
> intentionally.
> 
> Also, skb->skb_iif is unset in skb_do_redirect() when
> redirecting to ingress, I recall we have to set it correctly
> for input routing. Probably yet another reason why we
> can't scrub it, unless my memory goes wrong. :)

Also dev_forward_skb() enforces MTU checks on the packet which are not
done at the moment. I am aware of deployments where this would break
things.

Eyal.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-16 23:39   ` Cong Wang
  2018-07-17  7:01     ` Eyal Birger
@ 2018-07-17  9:15     ` Paolo Abeni
  2018-07-17  9:38       ` Dave Taht
  2018-07-17 17:24       ` Cong Wang
  1 sibling, 2 replies; 21+ messages in thread
From: Paolo Abeni @ 2018-07-17  9:15 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger

Hi,

On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote:
> On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > When mirred is invoked from the ingress path, and it wants to redirect
> > the processed packet, it can now use the ACT_REDIRECT action,
> > filling the tcf_result accordingly.
> > 
> > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > improvement in forwarding performances. Overall TC S/W performances
> > are now comparable to the kernel openswitch datapath.

Thank you for the feedback.

> Avoiding skb_clone() for redirection is cool, but why need to use
> skb_do_redirect() here?

Well, it's not needed. I tried to reduce code duplication, and I tried
to avoid adding another TC_ACT_* value.

> There is a subtle difference here:
> 
> skb_do_redirect() calls __bpf_rx_skb() which calls
> dev_forward_skb().
>
> while the current mirred action doesn't scrub packets when
> redirecting to ingress (from egress). Although I forget if it is
> intentionally.

Understood.

A possible option out of this issues would be adding another action
value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with
the appropriate semantic. That should address also Daniel and Eyal
concerns.

Would you consider the above acceptable?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-17  9:15     ` Paolo Abeni
@ 2018-07-17  9:38       ` Dave Taht
  2018-07-17 17:24       ` Cong Wang
  1 sibling, 0 replies; 21+ messages in thread
From: Dave Taht @ 2018-07-17  9:38 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Cong Wang, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiří Pírko, ast, Daniel Borkmann,
	Marcelo Ricardo Leitner, eyal.birger

On Tue, Jul 17, 2018 at 2:16 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote:
> > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > When mirred is invoked from the ingress path, and it wants to redirect
> > > the processed packet, it can now use the ACT_REDIRECT action,
> > > filling the tcf_result accordingly.
> > >
> > > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > > improvement in forwarding performances. Overall TC S/W performances
> > > are now comparable to the kernel openswitch datapath.
>
> Thank you for the feedback.
>
> > Avoiding skb_clone() for redirection is cool, but why need to use
> > skb_do_redirect() here?
>
> Well, it's not needed. I tried to reduce code duplication, and I tried
> to avoid adding another TC_ACT_* value.
>
> > There is a subtle difference here:
> >
> > skb_do_redirect() calls __bpf_rx_skb() which calls
> > dev_forward_skb().
> >
> > while the current mirred action doesn't scrub packets when
> > redirecting to ingress (from egress). Although I forget if it is
> > intentionally.
>
> Understood.
>
> A possible option out of this issues would be adding another action
> value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with
> the appropriate semantic. That should address also Daniel and Eyal
> concerns.
>
> Would you consider the above acceptable?

Our dream has been to be able to specify a 1 liner:

tc qdisc add dev eth0 ingress cake bandwidth 200mbit besteffort wash

and be done with it, eliminating ifb, mirred, etc, entirely.

(I am otherwise delighted to see anything appear that makes inbound
shaping cheaper along the lines of this patchset)
>
> Thanks,
>
> Paolo
>
>


-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-17  9:15     ` Paolo Abeni
  2018-07-17  9:38       ` Dave Taht
@ 2018-07-17 17:24       ` Cong Wang
  2018-07-18 10:05         ` Paolo Abeni
  1 sibling, 1 reply; 21+ messages in thread
From: Cong Wang @ 2018-07-17 17:24 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger

On Tue, Jul 17, 2018 at 2:16 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Mon, 2018-07-16 at 16:39 -0700, Cong Wang wrote:
> > On Fri, Jul 13, 2018 at 2:55 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > When mirred is invoked from the ingress path, and it wants to redirect
> > > the processed packet, it can now use the ACT_REDIRECT action,
> > > filling the tcf_result accordingly.
> > >
> > > This avoids a skb_clone() in the TC S/W data path giving a ~10%
> > > improvement in forwarding performances. Overall TC S/W performances
> > > are now comparable to the kernel openswitch datapath.
>
> Thank you for the feedback.
>
> > Avoiding skb_clone() for redirection is cool, but why need to use
> > skb_do_redirect() here?
>
> Well, it's not needed. I tried to reduce code duplication, and I tried
> to avoid adding another TC_ACT_* value.


If you consider dev_forward_skb(), it is not a duplication.


>
> > There is a subtle difference here:
> >
> > skb_do_redirect() calls __bpf_rx_skb() which calls
> > dev_forward_skb().
> >
> > while the current mirred action doesn't scrub packets when
> > redirecting to ingress (from egress). Although I forget if it is
> > intentionally.
>
> Understood.
>
> A possible option out of this issues would be adding another action
> value - TC_ACT_MIRRED ? - and handle it in sch_handle_egress()[1] with
> the appropriate semantic. That should address also Daniel and Eyal
> concerns.
>
> Would you consider the above acceptable?

If you goal is to get rid of skb_clone(), why not just do the following?

        if (tcf_mirred_is_act_redirect(m_eaction)) {
                skb2 = skb;
        } else {
                skb2 = skb_clone(skb, GFP_ATOMIC);
                if (!skb2)
                        goto out;
        }

For redirect, we return TC_ACT_SHOT, so upper layer should not
touch the skb after that.

What am I missing here?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-17 17:24       ` Cong Wang
@ 2018-07-18 10:05         ` Paolo Abeni
  2018-07-19 17:56           ` Cong Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Abeni @ 2018-07-18 10:05 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger

Hi,

On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote:
> If you goal is to get rid of skb_clone(), why not just do the following?
> 
>         if (tcf_mirred_is_act_redirect(m_eaction)) {
>                 skb2 = skb;
>         } else {
>                 skb2 = skb_clone(skb, GFP_ATOMIC);
>                 if (!skb2)
>                         goto out;
>         }
> 
> For redirect, we return TC_ACT_SHOT, so upper layer should not
> touch the skb after that.
> 
> What am I missing here?

With ACT_SHOT caller/upper layer will free the skb, too. We will have
an use after free (from either the upper layer and the xmit device).
Similar issues with STOLEN, TRAP, etc.

In the past, Changli Gao attempted to avoid the clone incrementing the
skb usage count:

commit 210d6de78c5d7c785fc532556cea340e517955e1
Author: Changli Gao <xiaosuo@gmail.com>
Date:   Thu Jun 24 16:25:12 2010 +0000

    act_mirred: don't clone skb when skb isn't shared

but some/many device drivers expect an skb usage count of 1, and that
caused ooops and was revered.

I think the only other option (beyond re-using ACT_MIRROR) is adding
another action value, and let the upper layer re-inject the packet
while handling such action (similar to what ACT_MIRROR currently does,
but preserving the current mirred semantic).

Cheers,

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-13  9:55 ` [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible Paolo Abeni
  2018-07-16 23:39   ` Cong Wang
@ 2018-07-19 17:16   ` kbuild test robot
  1 sibling, 0 replies; 21+ messages in thread
From: kbuild test robot @ 2018-07-19 17:16 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: kbuild-all, netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Paolo-Abeni/TC-refactor-TC_ACT_REDIRECT-action/20180716-011055
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/sched/act_mirred.c:195:17: sparse: dubious: x | !y
   net/sched/act_mirred.c:260:23: sparse: expression using sizeof(void)
   net/sched/act_mirred.c:260:23: sparse: expression using sizeof(void)

vim +195 net/sched/act_mirred.c

   169	
   170	static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
   171			      struct tcf_result *res)
   172	{
   173		struct tcf_mirred *m = to_mirred(a);
   174		bool m_mac_header_xmit;
   175		struct net_device *dev;
   176		struct sk_buff *skb2;
   177		int retval, err = 0;
   178		bool want_ingress;
   179		int m_eaction;
   180		int mac_len;
   181	
   182		tcf_lastuse_update(&m->tcf_tm);
   183		bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
   184	
   185		m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit);
   186		m_eaction = READ_ONCE(m->tcfm_eaction);
   187		retval = READ_ONCE(m->tcf_action);
   188		dev = rcu_dereference_bh(m->tcfm_dev);
   189		want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
   190		if (skb_at_tc_ingress(skb) && tcf_mirred_is_act_redirect(m_eaction)) {
   191			skb->tc_redirected = 1;
   192			skb->tc_from_ingress = 1;
   193	
   194			/* the core redirect code will check dev and its status */
 > 195			TCF_RESULT_SET_REDIRECT(res, dev, want_ingress);
   196			res->qstats = this_cpu_ptr(m->common.cpu_qstats);
   197			return TC_ACT_REDIRECT;
   198		}
   199	
   200		if (unlikely(!dev)) {
   201			pr_notice_once("tc mirred: target device is gone\n");
   202			goto out;
   203		}
   204	
   205		if (unlikely(!(dev->flags & IFF_UP))) {
   206			net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
   207					       dev->name);
   208			goto out;
   209		}
   210	
   211		skb2 = skb_clone(skb, GFP_ATOMIC);
   212		if (!skb2)
   213			goto out;
   214	
   215		/* If action's target direction differs than filter's direction,
   216		 * and devices expect a mac header on xmit, then mac push/pull is
   217		 * needed.
   218		 */
   219		if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
   220			if (!skb_at_tc_ingress(skb)) {
   221				/* caught at egress, act ingress: pull mac */
   222				mac_len = skb_network_header(skb) - skb_mac_header(skb);
   223				skb_pull_rcsum(skb2, mac_len);
   224			} else {
   225				/* caught at ingress, act egress: push mac */
   226				skb_push_rcsum(skb2, skb->mac_len);
   227			}
   228		}
   229	
   230		/* mirror is always swallowed */
   231		if (tcf_mirred_is_act_redirect(m_eaction)) {
   232			skb2->tc_redirected = 1;
   233			skb2->tc_from_ingress = skb2->tc_at_ingress;
   234		}
   235	
   236		skb2->skb_iif = skb->dev->ifindex;
   237		skb2->dev = dev;
   238		if (!tcf_mirred_act_wants_ingress(m_eaction))
   239			err = dev_queue_xmit(skb2);
   240		else
   241			err = netif_receive_skb(skb2);
   242	
   243		if (err) {
   244	out:
   245			qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats));
   246			if (tcf_mirred_is_act_redirect(m_eaction))
   247				retval = TC_ACT_SHOT;
   248		}
   249	
   250		return retval;
   251	}
   252	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-18 10:05         ` Paolo Abeni
@ 2018-07-19 17:56           ` Cong Wang
  2018-07-20 10:16             ` Paolo Abeni
  0 siblings, 1 reply; 21+ messages in thread
From: Cong Wang @ 2018-07-19 17:56 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger

On Wed, Jul 18, 2018 at 3:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote:
> > If you goal is to get rid of skb_clone(), why not just do the following?
> >
> >         if (tcf_mirred_is_act_redirect(m_eaction)) {
> >                 skb2 = skb;
> >         } else {
> >                 skb2 = skb_clone(skb, GFP_ATOMIC);
> >                 if (!skb2)
> >                         goto out;
> >         }
> >
> > For redirect, we return TC_ACT_SHOT, so upper layer should not
> > touch the skb after that.
> >
> > What am I missing here?
>
> With ACT_SHOT caller/upper layer will free the skb, too. We will have
> an use after free (from either the upper layer and the xmit device).
> Similar issues with STOLEN, TRAP, etc.
>
> In the past, Changli Gao attempted to avoid the clone incrementing the
> skb usage count:
>
> commit 210d6de78c5d7c785fc532556cea340e517955e1
> Author: Changli Gao <xiaosuo@gmail.com>
> Date:   Thu Jun 24 16:25:12 2010 +0000
>
>     act_mirred: don't clone skb when skb isn't shared
>
> but some/many device drivers expect an skb usage count of 1, and that
> caused ooops and was revered.

Interesting, I wasn't aware of the above commit and its revert.

First, I didn't use skb_get() above.

Second, I think the caller of dev_queue_xmit() should not
touch the skb after it, the skb is either freed by dev_queue_xmit()
or successfully transmitted, in either case, the ownership belongs
to dev_queue_xmit(). So, I think we should skip the qdisc_drop()
for this case.

Not sure about netif_receive_skb() case, given veth calls in its
xmit too, I speculate the rule is probably same.

Not sure about other ACT_SHOT case than act_mirred...

>
> I think the only other option (beyond re-using ACT_MIRROR) is adding
> another action value, and let the upper layer re-inject the packet
> while handling such action (similar to what ACT_MIRROR currently does,
> but preserving the current mirred semantic).

Maybe if you mean to avoid breaking ACT_SHOT.

Thanks.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible
  2018-07-19 17:56           ` Cong Wang
@ 2018-07-20 10:16             ` Paolo Abeni
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Abeni @ 2018-07-20 10:16 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Eyal Birger

On Thu, 2018-07-19 at 10:56 -0700, Cong Wang wrote:
> On Wed, Jul 18, 2018 at 3:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Hi,
> > 
> > On Tue, 2018-07-17 at 10:24 -0700, Cong Wang wrote:
> > > If you goal is to get rid of skb_clone(), why not just do the following?
> > > 
> > >         if (tcf_mirred_is_act_redirect(m_eaction)) {
> > >                 skb2 = skb;
> > >         } else {
> > >                 skb2 = skb_clone(skb, GFP_ATOMIC);
> > >                 if (!skb2)
> > >                         goto out;
> > >         }
> > > 
> > > For redirect, we return TC_ACT_SHOT, so upper layer should not
> > > touch the skb after that.
> > > 
> > > What am I missing here?
> > 
> > With ACT_SHOT caller/upper layer will free the skb, too. We will have
> > an use after free (from either the upper layer and the xmit device).
> > Similar issues with STOLEN, TRAP, etc.
> > 
> > In the past, Changli Gao attempted to avoid the clone incrementing the
> > skb usage count:
> > 
> > commit 210d6de78c5d7c785fc532556cea340e517955e1
> > Author: Changli Gao <xiaosuo@gmail.com>
> > Date:   Thu Jun 24 16:25:12 2010 +0000
> > 
> >     act_mirred: don't clone skb when skb isn't shared
> > 
> > but some/many device drivers expect an skb usage count of 1, and that
> > caused ooops and was revered.
> 
> Interesting, I wasn't aware of the above commit and its revert.
> 
> First, I didn't use skb_get() above.
> 
> Second, I think the caller of dev_queue_xmit() should not
> touch the skb after it, the skb is either freed by dev_queue_xmit()
> or successfully transmitted, in either case, the ownership belongs
> to dev_queue_xmit(). So, I think we should skip the qdisc_drop()
> for this case.
> 
> Not sure about netif_receive_skb() case, given veth calls in its
> xmit too, I speculate the rule is probably same.
> 
> Not sure about other ACT_SHOT case than act_mirred...

I think any tc filter can be configured from user space to return
ACT_SHOT, so changing the ACT_SHOT handling would be quite invasive and
error prone, as all the tc filters (to free the skb on ACT_SHOT) and tc
schedulers (to avoid touching the skb) must be modified.

If there are no strong objection vs a new action value, I would opt for
such option.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-07-20 11:04 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-13  9:54 [PATCH net-next 0/4] TC: refactor TC_ACT_REDIRECT action Paolo Abeni
2018-07-13  9:54 ` [PATCH net-next 1/4] tc/act: user space can't use TC_ACT_REDIRECT directly Paolo Abeni
2018-07-13  9:55 ` [PATCH net-next 2/4] tc/act: remove unneeded RCU lock in action callback Paolo Abeni
2018-07-13 14:08   ` Daniel Borkmann
2018-07-13 14:26     ` Paolo Abeni
2018-07-13 14:41       ` Daniel Borkmann
2018-07-13 15:00         ` Paolo Abeni
2018-07-13  9:55 ` [PATCH net-next 3/4] net/sched: refactor TC_ACT_REDIRECT handling Paolo Abeni
2018-07-13 14:13   ` Daniel Borkmann
2018-07-13 14:37     ` Paolo Abeni
2018-07-13 16:32       ` Daniel Borkmann
2018-07-13  9:55 ` [PATCH net-next 4/4] act_mirred: use ACT_REDIRECT when possible Paolo Abeni
2018-07-16 23:39   ` Cong Wang
2018-07-17  7:01     ` Eyal Birger
2018-07-17  9:15     ` Paolo Abeni
2018-07-17  9:38       ` Dave Taht
2018-07-17 17:24       ` Cong Wang
2018-07-18 10:05         ` Paolo Abeni
2018-07-19 17:56           ` Cong Wang
2018-07-20 10:16             ` Paolo Abeni
2018-07-19 17:16   ` kbuild test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.