* [PATCH net-next v6 00/11] Modify action API for implementing lockless actions
@ 2018-07-05 14:24 Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update Vlad Buslov
                   ` (12 more replies)
  0 siblings, 13 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn, Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected by a single global rtnl lock, which removes any
possibility of parallelism. This patch set is a first step towards
removing the rtnl lock dependency from the TC rules update path.

Recently, a new rtnl registration flag, RTNL_FLAG_DOIT_UNLOCKED, was
added. Handlers registered with this flag are called without RTNL taken.
The end goal is to have rule update handlers (RTM_NEWTFILTER,
RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow
parallel execution. However, there is no intention to completely remove
or split the rtnl lock itself. This patch set addresses specific
problems in the action API that prevent it from being executed
concurrently. It does not completely unlock the rules or actions update
path. Additional patch sets are required to refactor individual actions
and filters update for parallel execution.

In preparation for executing TC rules update handlers without the rtnl
lock, the action API code was audited to determine areas that assume
external synchronization with the rtnl lock and must be changed to allow
safe concurrent access, with the following results:

1. Action idr is already protected with a spinlock. However, some code
   paths assume that idr state does not change between several
   consecutive tcf_idr_* function calls.
2. tc_action reference and bind counters are implemented as plain
   integers. Their purpose was to allow a single action to be shared
   between multiple filters, not to provide a means for concurrent
   modification.
3. tc_action 'cookie' pointer field is not protected against
   concurrent modification.
4. Action API functions that work with a set of actions use an
   intrusive linked list, which cannot be used concurrently without
   additional synchronization.
5. Action API functions don't take a reference to actions while using
   them, assuming external synchronization with the rtnl lock.

The following solutions to these problems are implemented:

1. To remove the assumption that idr state doesn't change between
   tcf_idr_* calls, implement new functions that atomically perform
   several operations on the idr without releasing the idr spinlock
   (a function to atomically look up and delete an action by index, a
   function to atomically check if an action exists and allocate a new
   one if necessary, etc.).
2. Use atomic operations on counters to make them suitable for
   concurrent get/put operations.
3. Data that 'cookie' points to is never modified, so it is enough to
   refactor it to an rcu pointer to prevent concurrent de-allocation.
4. Action API doesn't actually use any linked-list-specific operations
   on the actions intrusive linked list, so it can be refactored to an
   array in a straightforward manner.
5. Always take a reference to an action while accessing it in the action
   API. The tcf_idr_search function is modified to take a reference to
   the action before returning it, so there is no way to look up an
   action without incrementing its reference counter. All users of this
   function are modified to release the reference after they are done
   using the action. With all users using reference counting, it is now
   safe to concurrently delete actions.
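
As an illustration of solutions 1, 2 and 5, here is a minimal userspace
sketch, not the kernel code from this series. It assumes a fixed-size
table guarded by a pthread mutex as a stand-in for the action idr and
its spinlock, and a C11 atomic_int in place of refcount_t. All names
(struct action, action_get_or_create, action_search, action_put) are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TABLE_SIZE 16	/* hypothetical; stands in for the action idr */

struct action {
	unsigned int index;
	atomic_int refcnt;
};

static struct action *table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check for an existing action and allocate one if it is missing, all
 * under a single lock hold, so no other thread can slip in between the
 * check and the insert. The caller owns one reference either way.
 */
static struct action *action_get_or_create(unsigned int index)
{
	struct action *a;

	if (index >= TABLE_SIZE)
		return NULL;

	pthread_mutex_lock(&table_lock);
	a = table[index];
	if (a) {
		atomic_fetch_add(&a->refcnt, 1);
	} else {
		a = calloc(1, sizeof(*a));
		if (a) {
			a->index = index;
			atomic_init(&a->refcnt, 1);
			table[index] = a;
		}
	}
	pthread_mutex_unlock(&table_lock);
	return a;
}

/* Lookup never returns an action without taking a reference first. */
static struct action *action_search(unsigned int index)
{
	struct action *a;

	if (index >= TABLE_SIZE)
		return NULL;

	pthread_mutex_lock(&table_lock);
	a = table[index];
	if (a)
		atomic_fetch_add(&a->refcnt, 1);
	pthread_mutex_unlock(&table_lock);
	return a;
}

/* Drop one reference. The last put unlinks the action while the lock is
 * held, so a concurrent search cannot find it. Returns 1 if freed.
 */
static int action_put(struct action *a)
{
	int freed = 0;

	pthread_mutex_lock(&table_lock);
	if (atomic_fetch_sub(&a->refcnt, 1) == 1) {
		table[a->index] = NULL;
		freed = 1;
	}
	pthread_mutex_unlock(&table_lock);

	if (freed)
		free(a);
	return freed;
}
```

Every successful action_search() or action_get_or_create() in this
sketch must be paired with one action_put(). Taking the lock on every
put is a simplification chosen for clarity.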

Additionally, the actions init function signature was expanded with an
'rtnl_held' argument that allows actions that have an internal
dependency on the rtnl lock to take/release it when necessary.
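
The 'rtnl_held' idea can be sketched in userspace as follows. This is
only an illustration under stated assumptions: a pthread mutex named
big_lock stands in for the rtnl lock, and load_module_blocking() is a
hypothetical placeholder for a blocking step such as a module load. The
callee drops and retakes the lock only when the caller says it holds it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the global rtnl lock (hypothetical name). */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Records whether the lock was free while the blocking step ran. */
static bool lock_was_free;

/* Hypothetical blocking operation that must not run under big_lock. */
static void load_module_blocking(void)
{
	if (pthread_mutex_trylock(&big_lock) == 0) {
		lock_was_free = true;
		pthread_mutex_unlock(&big_lock);
	} else {
		lock_was_free = false;
	}
}

/* The init path releases the lock around the blocking step only if the
 * caller holds it, and restores the caller's lock state before return.
 */
static void action_init_sketch(bool lock_held)
{
	if (lock_held)
		pthread_mutex_unlock(&big_lock);

	load_module_blocking();

	if (lock_held)
		pthread_mutex_lock(&big_lock);
}
```

A caller that holds the lock passes true and still holds it on return;
a caller on an unlocked path passes false and the lock is never touched.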

Since the only shared state in the action API module is the actions
themselves and the action idr, these changes are sufficient to not rely
on the global rtnl lock for protection of internal action API data
structures.

Changes from V5 to V6:
- Rebase on current net-next
- When action is deleted, set pointer in actions array to NULL to
  prevent double freeing.

Changes from V4 to V5:
- Change action delete API to track actions that were deleted, to
  prevent releasing them on error.

Changes from V3 to V4:
- Expand cover letter.
- Reduce actions array size in tcf_action_init_1.
- Rebase on latest net-next.

Changes from V2 to V3:
- Re-send with changelog copied to individual patches.

Changes from V1 to V2:
- Removed redundant actions ops lookup during delete.
- Merge action ops delete definition and implementation.
- Assume all actions have delete implemented and don't check for it
  explicitly.
- Resplit action lookup/release code to prevent memory leaks in
  individual patches.
- Make __tcf_idr_check function static
- Remove unique idr insertion function. Change original idr insert to do
  the same thing.
- Merge changes that take reference to action when performing lookup and
  changes that account for this additional reference when dumping action
  to user space into single patch.
- Change convoluted commit message.
- Rename "unlocked" to "rtnl_held" for clarity.
- Remove estimator lock add patch.
- Refactor action check-alloc code into standalone function.
- Rename tcf_idr_find_delete to tcf_idr_delete_index.
- Rearrange variable definitions in tc_action_delete.
- Add patch that refactors action API code to use array of pointers to
  actions instead of intrusive linked list.
- Expand cover letter.

Vlad Buslov (11):
  net: sched: use rcu for action cookie update
  net: sched: change type of reference and bind counters
  net: sched: implement unlocked action init API
  net: sched: always take reference to action
  net: sched: implement action API that deletes action by index
  net: sched: add 'delete' function to action ops
  net: sched: implement reference counted action release
  net: sched: don't release reference on action overwrite
  net: sched: use reference counting action init
  net: sched: atomically check-allocate action
  net: sched: change action API to use array of pointers to actions

 include/net/act_api.h      |  25 ++-
 include/net/pkt_cls.h      |   1 +
 net/sched/act_api.c        | 415 +++++++++++++++++++++++++++++++--------------
 net/sched/act_bpf.c        |  34 ++--
 net/sched/act_connmark.c   |  29 +++-
 net/sched/act_csum.c       |  34 ++--
 net/sched/act_gact.c       |  31 +++-
 net/sched/act_ife.c        |  31 +++-
 net/sched/act_ipt.c        |  44 ++++-
 net/sched/act_mirred.c     |  38 +++--
 net/sched/act_nat.c        |  30 +++-
 net/sched/act_pedit.c      |  29 +++-
 net/sched/act_police.c     |  31 ++--
 net/sched/act_sample.c     |  34 ++--
 net/sched/act_simple.c     |  31 +++-
 net/sched/act_skbedit.c    |  31 +++-
 net/sched/act_skbmod.c     |  34 ++--
 net/sched/act_tunnel_key.c |  35 ++--
 net/sched/act_vlan.c       |  40 +++--
 net/sched/cls_api.c        |  25 +--
 20 files changed, 707 insertions(+), 295 deletions(-)

-- 
2.7.5


* [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-13  3:52   ` Cong Wang
  2018-07-05 14:24 ` [PATCH net-next v6 02/11] net: sched: change type of reference and bind counters Vlad Buslov
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Implement functions to atomically update and free action cookie
using rcu mechanism.
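
The swap-then-defer-free pattern described above can be sketched in
userspace like this. It is an assumption-laden illustration, not the
patch itself: atomic_exchange() plays the role of xchg(), and a simple
single-threaded pending list (defer_free / run_deferred_frees, both
hypothetical) stands in for call_rcu() and the grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct cookie {
	unsigned char *data;
	size_t len;
	struct cookie *next_deferred;	/* stand-in for struct rcu_head */
};

/* Pending frees; not thread-safe, models the call_rcu() queue only. */
static struct cookie *deferred;

static void defer_free(struct cookie *c)
{
	c->next_deferred = deferred;
	deferred = c;
}

/* Models the end of a grace period: free everything that was queued.
 * Returns the number of cookies freed.
 */
static int run_deferred_frees(void)
{
	int n = 0;

	while (deferred) {
		struct cookie *c = deferred;

		deferred = c->next_deferred;
		free(c->data);
		free(c);
		n++;
	}
	return n;
}

/* Publish the new cookie in one atomic step; the old one is not freed
 * immediately because readers may still be looking at it.
 */
static void set_cookie(struct cookie *_Atomic *slot,
		       struct cookie *new_cookie)
{
	struct cookie *old = atomic_exchange(slot, new_cookie);

	if (old)
		defer_free(old);
}

/* Helper for the example: build a cookie from a C string. */
static struct cookie *make_cookie(const char *s)
{
	struct cookie *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->len = strlen(s);
	c->data = malloc(c->len ? c->len : 1);
	if (!c->data) {
		free(c);
		return NULL;
	}
	memcpy(c->data, s, c->len);
	c->next_deferred = NULL;
	return c;
}
```

Passing NULL as the new cookie clears the slot and queues the old
cookie for freeing, which mirrors how the patch reuses the setter on
the teardown path.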

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/act_api.h |  2 +-
 include/net/pkt_cls.h |  1 +
 net/sched/act_api.c   | 44 ++++++++++++++++++++++++++++++--------------
 3 files changed, 32 insertions(+), 15 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 5ff11adbe2a6..ffc3ef321776 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -37,7 +37,7 @@ struct tc_action {
 	spinlock_t			tcfa_lock;
 	struct gnet_stats_basic_cpu __percpu *cpu_bstats;
 	struct gnet_stats_queue __percpu *cpu_qstats;
-	struct tc_cookie	*act_cookie;
+	struct tc_cookie	__rcu *act_cookie;
 	struct tcf_chain	*goto_chain;
 };
 #define tcf_index	common.tcfa_index
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 6641584b27f1..2081e4219f81 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -781,6 +781,7 @@ struct tc_mqprio_qopt_offload {
 struct tc_cookie {
 	u8  *data;
 	u32 len;
+	struct rcu_head rcu;
 };
 
 struct tc_qopt_offload_stats {
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 3f4cf930f809..02670c7489e3 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -55,6 +55,24 @@ static void tcf_action_goto_chain_exec(const struct tc_action *a,
 	res->goto_tp = rcu_dereference_bh(chain->filter_chain);
 }
 
+static void tcf_free_cookie_rcu(struct rcu_head *p)
+{
+	struct tc_cookie *cookie = container_of(p, struct tc_cookie, rcu);
+
+	kfree(cookie->data);
+	kfree(cookie);
+}
+
+static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
+				  struct tc_cookie *new_cookie)
+{
+	struct tc_cookie *old;
+
+	old = xchg(old_cookie, new_cookie);
+	if (old)
+		call_rcu(&old->rcu, tcf_free_cookie_rcu);
+}
+
 /* XXX: For standalone actions, we don't need a RCU grace period either, because
  * actions are always connected to filters and filters are already destroyed in
  * RCU callbacks, so after a RCU grace period actions are already disconnected
@@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
 	free_percpu(p->cpu_bstats);
 	free_percpu(p->cpu_qstats);
 
-	if (p->act_cookie) {
-		kfree(p->act_cookie->data);
-		kfree(p->act_cookie);
-	}
+	tcf_set_action_cookie(&p->act_cookie, NULL);
 	if (p->goto_chain)
 		tcf_action_goto_chain_fini(p);
 
@@ -567,16 +582,22 @@ tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 	int err = -EINVAL;
 	unsigned char *b = skb_tail_pointer(skb);
 	struct nlattr *nest;
+	struct tc_cookie *cookie;
 
 	if (nla_put_string(skb, TCA_KIND, a->ops->kind))
 		goto nla_put_failure;
 	if (tcf_action_copy_stats(skb, a, 0))
 		goto nla_put_failure;
-	if (a->act_cookie) {
-		if (nla_put(skb, TCA_ACT_COOKIE, a->act_cookie->len,
-			    a->act_cookie->data))
+
+	rcu_read_lock();
+	cookie = rcu_dereference(a->act_cookie);
+	if (cookie) {
+		if (nla_put(skb, TCA_ACT_COOKIE, cookie->len, cookie->data)) {
+			rcu_read_unlock();
 			goto nla_put_failure;
+		}
 	}
+	rcu_read_unlock();
 
 	nest = nla_nest_start(skb, TCA_OPTIONS);
 	if (nest == NULL)
@@ -719,13 +740,8 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	if (err < 0)
 		goto err_mod;
 
-	if (name == NULL && tb[TCA_ACT_COOKIE]) {
-		if (a->act_cookie) {
-			kfree(a->act_cookie->data);
-			kfree(a->act_cookie);
-		}
-		a->act_cookie = cookie;
-	}
+	if (!name && tb[TCA_ACT_COOKIE])
+		tcf_set_action_cookie(&a->act_cookie, cookie);
 
 	/* module count goes up only when brand new policy is created
 	 * if it exists and is only bound to in a_o->init() then
-- 
2.7.5


* [PATCH net-next v6 02/11] net: sched: change type of reference and bind counters
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 03/11] net: sched: implement unlocked action init API Vlad Buslov
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Change type of action reference counter to refcount_t.

Change type of action bind counter to atomic_t.
This type is used to allow decrementing bind counter without testing
for 0 result.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/act_api.h      |  5 +++--
 net/sched/act_api.c        | 32 ++++++++++++++++++++++----------
 net/sched/act_bpf.c        |  4 ++--
 net/sched/act_connmark.c   |  4 ++--
 net/sched/act_csum.c       |  4 ++--
 net/sched/act_gact.c       |  4 ++--
 net/sched/act_ife.c        |  4 ++--
 net/sched/act_ipt.c        |  4 ++--
 net/sched/act_mirred.c     |  4 ++--
 net/sched/act_nat.c        |  4 ++--
 net/sched/act_pedit.c      |  4 ++--
 net/sched/act_police.c     |  4 ++--
 net/sched/act_sample.c     |  4 ++--
 net/sched/act_simple.c     |  4 ++--
 net/sched/act_skbedit.c    |  4 ++--
 net/sched/act_skbmod.c     |  4 ++--
 net/sched/act_tunnel_key.c |  4 ++--
 net/sched/act_vlan.c       |  4 ++--
 18 files changed, 57 insertions(+), 44 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index ffc3ef321776..2759226527a2 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -6,6 +6,7 @@
  * Public action API for classifiers/qdiscs
 */
 
+#include <linux/refcount.h>
 #include <net/sch_generic.h>
 #include <net/pkt_sched.h>
 #include <net/net_namespace.h>
@@ -26,8 +27,8 @@ struct tc_action {
 	struct tcf_idrinfo		*idrinfo;
 
 	u32				tcfa_index;
-	int				tcfa_refcnt;
-	int				tcfa_bindcnt;
+	refcount_t			tcfa_refcnt;
+	atomic_t			tcfa_bindcnt;
 	u32				tcfa_capab;
 	int				tcfa_action;
 	struct tcf_t			tcfa_tm;
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 02670c7489e3..4f064ecab882 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -105,14 +105,26 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 
 	ASSERT_RTNL();
 
+	/* Release with strict==1 and bind==0 is only called through act API
+	 * interface (classifiers always bind). Only case when action with
+	 * positive reference count and zero bind count can exist is when it was
+	 * also created with act API (unbinding last classifier will destroy the
+	 * action if it was created by classifier). So only case when bind count
+	 * can be changed after initial check is when unbound action is
+	 * destroyed by act API while classifier binds to action with same id
+	 * concurrently. This result either creation of new action(same behavior
+	 * as before), or reusing existing action if concurrent process
+	 * increments reference count before action is deleted. Both scenarios
+	 * are acceptable.
+	 */
 	if (p) {
 		if (bind)
-			p->tcfa_bindcnt--;
-		else if (strict && p->tcfa_bindcnt > 0)
+			atomic_dec(&p->tcfa_bindcnt);
+		else if (strict && atomic_read(&p->tcfa_bindcnt) > 0)
 			return -EPERM;
 
-		p->tcfa_refcnt--;
-		if (p->tcfa_bindcnt <= 0 && p->tcfa_refcnt <= 0) {
+		if (atomic_read(&p->tcfa_bindcnt) <= 0 &&
+		    refcount_dec_and_test(&p->tcfa_refcnt)) {
 			if (p->ops->cleanup)
 				p->ops->cleanup(p);
 			tcf_idr_remove(p->idrinfo, p);
@@ -304,8 +316,8 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
 
 	if (index && p) {
 		if (bind)
-			p->tcfa_bindcnt++;
-		p->tcfa_refcnt++;
+			atomic_inc(&p->tcfa_bindcnt);
+		refcount_inc(&p->tcfa_refcnt);
 		*a = p;
 		return true;
 	}
@@ -324,9 +336,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 
 	if (unlikely(!p))
 		return -ENOMEM;
-	p->tcfa_refcnt = 1;
+	refcount_set(&p->tcfa_refcnt, 1);
 	if (bind)
-		p->tcfa_bindcnt = 1;
+		atomic_set(&p->tcfa_bindcnt, 1);
 
 	if (cpustats) {
 		p->cpu_bstats = netdev_alloc_pcpu_stats(struct gnet_stats_basic_cpu);
@@ -782,7 +794,7 @@ static void cleanup_a(struct list_head *actions, int ovr)
 		return;
 
 	list_for_each_entry(a, actions, list)
-		a->tcfa_refcnt--;
+		refcount_dec(&a->tcfa_refcnt);
 }
 
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
@@ -810,7 +822,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		act->order = i;
 		sz += tcf_action_fill_size(act);
 		if (ovr)
-			act->tcfa_refcnt++;
+			refcount_inc(&act->tcfa_refcnt);
 		list_add_tail(&act->list, actions);
 	}
 
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 18089c02e557..15a2a53cbde1 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -141,8 +141,8 @@ static int tcf_bpf_dump(struct sk_buff *skb, struct tc_action *act,
 	struct tcf_bpf *prog = to_bpf(act);
 	struct tc_act_bpf opt = {
 		.index   = prog->tcf_index,
-		.refcnt  = prog->tcf_refcnt - ref,
-		.bindcnt = prog->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&prog->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&prog->tcf_bindcnt) - bind,
 		.action  = prog->tcf_action,
 	};
 	struct tcf_t tm;
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index e4b880fa51fe..188865034f9a 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -154,8 +154,8 @@ static inline int tcf_connmark_dump(struct sk_buff *skb, struct tc_action *a,
 
 	struct tc_connmark opt = {
 		.index   = ci->tcf_index,
-		.refcnt  = ci->tcf_refcnt - ref,
-		.bindcnt = ci->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
 		.action  = ci->tcf_action,
 		.zone   = ci->zone,
 	};
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 526a8e491626..da865f7b390a 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -597,8 +597,8 @@ static int tcf_csum_dump(struct sk_buff *skb, struct tc_action *a, int bind,
 	struct tcf_csum_params *params;
 	struct tc_csum opt = {
 		.index   = p->tcf_index,
-		.refcnt  = p->tcf_refcnt - ref,
-		.bindcnt = p->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&p->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&p->tcf_bindcnt) - bind,
 	};
 	struct tcf_t t;
 
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 4dc4f153cad8..ca83debd5a70 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -169,8 +169,8 @@ static int tcf_gact_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tcf_gact *gact = to_gact(a);
 	struct tc_gact opt = {
 		.index   = gact->tcf_index,
-		.refcnt  = gact->tcf_refcnt - ref,
-		.bindcnt = gact->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&gact->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&gact->tcf_bindcnt) - bind,
 		.action  = gact->tcf_action,
 	};
 	struct tcf_t t;
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 20d7d36b2fc9..3536a23f46b5 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -596,8 +596,8 @@ static int tcf_ife_dump(struct sk_buff *skb, struct tc_action *a, int bind,
 	struct tcf_ife_params *p = rtnl_dereference(ife->params);
 	struct tc_ife opt = {
 		.index = ife->tcf_index,
-		.refcnt = ife->tcf_refcnt - ref,
-		.bindcnt = ife->tcf_bindcnt - bind,
+		.refcnt = refcount_read(&ife->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&ife->tcf_bindcnt) - bind,
 		.action = ife->tcf_action,
 		.flags = p->flags,
 	};
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 14c312d7908f..7bce88dc11c9 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -280,8 +280,8 @@ static int tcf_ipt_dump(struct sk_buff *skb, struct tc_action *a, int bind,
 	if (unlikely(!t))
 		goto nla_put_failure;
 
-	c.bindcnt = ipt->tcf_bindcnt - bind;
-	c.refcnt = ipt->tcf_refcnt - ref;
+	c.bindcnt = atomic_read(&ipt->tcf_bindcnt) - bind;
+	c.refcnt = refcount_read(&ipt->tcf_refcnt) - ref;
 	strcpy(t->u.user.name, ipt->tcfi_t->u.kernel.target->name);
 
 	if (nla_put(skb, TCA_IPT_TARG, ipt->tcfi_t->u.user.target_size, t) ||
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index fd34015331ab..82a8bdd67c47 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -250,8 +250,8 @@ static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind,
 	struct tc_mirred opt = {
 		.index   = m->tcf_index,
 		.action  = m->tcf_action,
-		.refcnt  = m->tcf_refcnt - ref,
-		.bindcnt = m->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&m->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&m->tcf_bindcnt) - bind,
 		.eaction = m->tcfm_eaction,
 		.ifindex = dev ? dev->ifindex : 0,
 	};
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 4b5848b6c252..457c2ae3de46 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -257,8 +257,8 @@ static int tcf_nat_dump(struct sk_buff *skb, struct tc_action *a,
 
 		.index    = p->tcf_index,
 		.action   = p->tcf_action,
-		.refcnt   = p->tcf_refcnt - ref,
-		.bindcnt  = p->tcf_bindcnt - bind,
+		.refcnt   = refcount_read(&p->tcf_refcnt) - ref,
+		.bindcnt  = atomic_read(&p->tcf_bindcnt) - bind,
 	};
 	struct tcf_t t;
 
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index e43aef28fdac..889690e0ec39 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -409,8 +409,8 @@ static int tcf_pedit_dump(struct sk_buff *skb, struct tc_action *a,
 	opt->nkeys = p->tcfp_nkeys;
 	opt->flags = p->tcfp_flags;
 	opt->action = p->tcf_action;
-	opt->refcnt = p->tcf_refcnt - ref;
-	opt->bindcnt = p->tcf_bindcnt - bind;
+	opt->refcnt = refcount_read(&p->tcf_refcnt) - ref;
+	opt->bindcnt = atomic_read(&p->tcf_bindcnt) - bind;
 
 	if (p->tcfp_keys_ex) {
 		tcf_pedit_key_ex_dump(skb, p->tcfp_keys_ex, p->tcfp_nkeys);
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 4e72bc2a0dfb..a789b8060968 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -274,8 +274,8 @@ static int tcf_act_police_dump(struct sk_buff *skb, struct tc_action *a,
 		.action = police->tcf_action,
 		.mtu = police->tcfp_mtu,
 		.burst = PSCHED_NS2TICKS(police->tcfp_burst),
-		.refcnt = police->tcf_refcnt - ref,
-		.bindcnt = police->tcf_bindcnt - bind,
+		.refcnt = refcount_read(&police->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&police->tcf_bindcnt) - bind,
 	};
 	struct tcf_t t;
 
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 5db358497c9e..4a46978db092 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -173,8 +173,8 @@ static int tcf_sample_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tc_sample opt = {
 		.index      = s->tcf_index,
 		.action     = s->tcf_action,
-		.refcnt     = s->tcf_refcnt - ref,
-		.bindcnt    = s->tcf_bindcnt - bind,
+		.refcnt     = refcount_read(&s->tcf_refcnt) - ref,
+		.bindcnt    = atomic_read(&s->tcf_bindcnt) - bind,
 	};
 	struct tcf_t t;
 
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 98c4afe7c15b..c3a761097b01 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -145,8 +145,8 @@ static int tcf_simp_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tcf_defact *d = to_defact(a);
 	struct tc_defact opt = {
 		.index   = d->tcf_index,
-		.refcnt  = d->tcf_refcnt - ref,
-		.bindcnt = d->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
 		.action  = d->tcf_action,
 	};
 	struct tcf_t t;
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index dfaf5d8028dd..cfd20d3d2ca9 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -208,8 +208,8 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tcf_skbedit *d = to_skbedit(a);
 	struct tc_skbedit opt = {
 		.index   = d->tcf_index,
-		.refcnt  = d->tcf_refcnt - ref,
-		.bindcnt = d->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
 		.action  = d->tcf_action,
 	};
 	struct tcf_t t;
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index ad050d7d4b46..ff90d720eda3 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -205,8 +205,8 @@ static int tcf_skbmod_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tcf_skbmod_params  *p = rtnl_dereference(d->skbmod_p);
 	struct tc_skbmod opt = {
 		.index   = d->tcf_index,
-		.refcnt  = d->tcf_refcnt - ref,
-		.bindcnt = d->tcf_bindcnt - bind,
+		.refcnt  = refcount_read(&d->tcf_refcnt) - ref,
+		.bindcnt = atomic_read(&d->tcf_bindcnt) - bind,
 		.action  = d->tcf_action,
 	};
 	struct tcf_t t;
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index ea203e386a92..2354f07eba15 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -474,8 +474,8 @@ static int tunnel_key_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tcf_tunnel_key_params *params;
 	struct tc_tunnel_key opt = {
 		.index    = t->tcf_index,
-		.refcnt   = t->tcf_refcnt - ref,
-		.bindcnt  = t->tcf_bindcnt - bind,
+		.refcnt   = refcount_read(&t->tcf_refcnt) - ref,
+		.bindcnt  = atomic_read(&t->tcf_bindcnt) - bind,
 	};
 	struct tcf_t tm;
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 1fb39e1f9d07..799e3deb44ac 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -239,8 +239,8 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct tc_action *a,
 	struct tcf_vlan_params *p = rtnl_dereference(v->vlan_p);
 	struct tc_vlan opt = {
 		.index    = v->tcf_index,
-		.refcnt   = v->tcf_refcnt - ref,
-		.bindcnt  = v->tcf_bindcnt - bind,
+		.refcnt   = refcount_read(&v->tcf_refcnt) - ref,
+		.bindcnt  = atomic_read(&v->tcf_bindcnt) - bind,
 		.action   = v->tcf_action,
 		.v_action = p->tcfv_action,
 	};
-- 
2.7.5


* [PATCH net-next v6 03/11] net: sched: implement unlocked action init API
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 02/11] net: sched: change type of reference and bind counters Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 04/11] net: sched: always take reference to action Vlad Buslov
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Add an additional 'rtnl_held' argument to act API init functions. It is
required to implement actions that need to release the rtnl lock before
loading a kernel module and reacquire it afterwards.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Rename "unlocked" to "rtnl_held" for clarity.

 include/net/act_api.h      |  6 ++++--
 net/sched/act_api.c        | 18 +++++++++++-------
 net/sched/act_bpf.c        |  3 ++-
 net/sched/act_connmark.c   |  2 +-
 net/sched/act_csum.c       |  3 ++-
 net/sched/act_gact.c       |  3 ++-
 net/sched/act_ife.c        |  3 ++-
 net/sched/act_ipt.c        |  6 ++++--
 net/sched/act_mirred.c     |  5 +++--
 net/sched/act_nat.c        |  2 +-
 net/sched/act_pedit.c      |  3 ++-
 net/sched/act_police.c     |  2 +-
 net/sched/act_sample.c     |  3 ++-
 net/sched/act_simple.c     |  3 ++-
 net/sched/act_skbedit.c    |  3 ++-
 net/sched/act_skbmod.c     |  3 ++-
 net/sched/act_tunnel_key.c |  3 ++-
 net/sched/act_vlan.c       |  3 ++-
 net/sched/cls_api.c        |  5 +++--
 19 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 2759226527a2..27823f4e24c4 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -92,7 +92,8 @@ struct tc_action_ops {
 			  struct netlink_ext_ack *extack);
 	int     (*init)(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **act, int ovr,
-			int bind, struct netlink_ext_ack *extack);
+			int bind, bool rtnl_held,
+			struct netlink_ext_ack *extack);
 	int     (*walk)(struct net *, struct sk_buff *,
 			struct netlink_callback *, int,
 			const struct tc_action_ops *,
@@ -168,10 +169,11 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
 		    struct list_head *actions, size_t *attr_size,
-		    struct netlink_ext_ack *extack);
+		    bool rtnl_held, struct netlink_ext_ack *extack);
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
+				    bool rtnl_held,
 				    struct netlink_ext_ack *extack);
 int tcf_action_dump(struct sk_buff *skb, struct list_head *, int, int);
 int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 4f064ecab882..256b0c93916c 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -671,6 +671,7 @@ static struct tc_cookie *nla_memdup_cookie(struct nlattr **tb)
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
+				    bool rtnl_held,
 				    struct netlink_ext_ack *extack)
 {
 	struct tc_action *a;
@@ -721,9 +722,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	a_o = tc_lookup_action_n(act_name);
 	if (a_o == NULL) {
 #ifdef CONFIG_MODULES
-		rtnl_unlock();
+		if (rtnl_held)
+			rtnl_unlock();
 		request_module("act_%s", act_name);
-		rtnl_lock();
+		if (rtnl_held)
+			rtnl_lock();
 
 		a_o = tc_lookup_action_n(act_name);
 
@@ -746,9 +749,10 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	/* backward compatibility for policer */
 	if (name == NULL)
 		err = a_o->init(net, tb[TCA_ACT_OPTIONS], est, &a, ovr, bind,
-				extack);
+				rtnl_held, extack);
 	else
-		err = a_o->init(net, nla, est, &a, ovr, bind, extack);
+		err = a_o->init(net, nla, est, &a, ovr, bind, rtnl_held,
+				extack);
 	if (err < 0)
 		goto err_mod;
 
@@ -800,7 +804,7 @@ static void cleanup_a(struct list_head *actions, int ovr)
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
 		    struct list_head *actions, size_t *attr_size,
-		    struct netlink_ext_ack *extack)
+		    bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
 	struct tc_action *act;
@@ -814,7 +818,7 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 
 	for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
 		act = tcf_action_init_1(net, tp, tb[i], est, name, ovr, bind,
-					extack);
+					rtnl_held, extack);
 		if (IS_ERR(act)) {
 			err = PTR_ERR(act);
 			goto err;
@@ -1173,7 +1177,7 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
 	LIST_HEAD(actions);
 
 	ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0, &actions,
-			      &attr_size, extack);
+			      &attr_size, true, extack);
 	if (ret)
 		return ret;
 
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 15a2a53cbde1..8ebf40a3506c 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -276,7 +276,8 @@ static void tcf_bpf_prog_fill_cfg(const struct tcf_bpf *prog,
 
 static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **act,
-			int replace, int bind, struct netlink_ext_ack *extack)
+			int replace, int bind, bool rtnl_held,
+			struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, bpf_net_id);
 	struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 188865034f9a..e3787aa0025a 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -96,7 +96,7 @@ static const struct nla_policy connmark_policy[TCA_CONNMARK_MAX + 1] = {
 
 static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 			     struct nlattr *est, struct tc_action **a,
-			     int ovr, int bind,
+			     int ovr, int bind, bool rtnl_held,
 			     struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, connmark_net_id);
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index da865f7b390a..334261943f9f 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -46,7 +46,8 @@ static struct tc_action_ops act_csum_ops;
 
 static int tcf_csum_init(struct net *net, struct nlattr *nla,
 			 struct nlattr *est, struct tc_action **a, int ovr,
-			 int bind, struct netlink_ext_ack *extack)
+			 int bind, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, csum_net_id);
 	struct tcf_csum_params *params_old, *params_new;
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index ca83debd5a70..b4dfb2b4addc 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -56,7 +56,8 @@ static const struct nla_policy gact_policy[TCA_GACT_MAX + 1] = {
 
 static int tcf_gact_init(struct net *net, struct nlattr *nla,
 			 struct nlattr *est, struct tc_action **a,
-			 int ovr, int bind, struct netlink_ext_ack *extack)
+			 int ovr, int bind, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, gact_net_id);
 	struct nlattr *tb[TCA_GACT_MAX + 1];
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 3536a23f46b5..576ffbba61c3 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -448,7 +448,8 @@ static int populate_metalist(struct tcf_ife_info *ife, struct nlattr **tb,
 
 static int tcf_ife_init(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **a,
-			int ovr, int bind, struct netlink_ext_ack *extack)
+			int ovr, int bind, bool rtnl_held,
+			struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, ife_net_id);
 	struct nlattr *tb[TCA_IFE_MAX + 1];
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 7bce88dc11c9..9c21663a86a6 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -196,7 +196,8 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
 
 static int tcf_ipt_init(struct net *net, struct nlattr *nla,
 			struct nlattr *est, struct tc_action **a, int ovr,
-			int bind, struct netlink_ext_ack *extack)
+			int bind, bool rtnl_held,
+			struct netlink_ext_ack *extack)
 {
 	return __tcf_ipt_init(net, ipt_net_id, nla, est, a, &act_ipt_ops, ovr,
 			      bind);
@@ -204,7 +205,8 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla,
 
 static int tcf_xt_init(struct net *net, struct nlattr *nla,
 		       struct nlattr *est, struct tc_action **a, int ovr,
-		       int bind, struct netlink_ext_ack *extack)
+		       int bind, bool unlocked,
+		       struct netlink_ext_ack *extack)
 {
 	return __tcf_ipt_init(net, xt_net_id, nla, est, a, &act_xt_ops, ovr,
 			      bind);
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 82a8bdd67c47..5434f08f2eb7 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -68,8 +68,9 @@ static unsigned int mirred_net_id;
 static struct tc_action_ops act_mirred_ops;
 
 static int tcf_mirred_init(struct net *net, struct nlattr *nla,
-			   struct nlattr *est, struct tc_action **a, int ovr,
-			   int bind, struct netlink_ext_ack *extack)
+			   struct nlattr *est, struct tc_action **a,
+			   int ovr, int bind, bool rtnl_held,
+			   struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, mirred_net_id);
 	struct nlattr *tb[TCA_MIRRED_MAX + 1];
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 457c2ae3de46..e6487ad1e4a8 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -38,7 +38,7 @@ static const struct nla_policy nat_policy[TCA_NAT_MAX + 1] = {
 
 static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
 			struct tc_action **a, int ovr, int bind,
-			struct netlink_ext_ack *extack)
+			bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, nat_net_id);
 	struct nlattr *tb[TCA_NAT_MAX + 1];
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 889690e0ec39..f7965f35585b 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -132,7 +132,8 @@ static int tcf_pedit_key_ex_dump(struct sk_buff *skb,
 
 static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 			  struct nlattr *est, struct tc_action **a,
-			  int ovr, int bind, struct netlink_ext_ack *extack)
+			  int ovr, int bind, bool rtnl_held,
+			  struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, pedit_net_id);
 	struct nlattr *tb[TCA_PEDIT_MAX + 1];
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index a789b8060968..0e1c2fb0ebea 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -75,7 +75,7 @@ static const struct nla_policy police_policy[TCA_POLICE_MAX + 1] = {
 
 static int tcf_act_police_init(struct net *net, struct nlattr *nla,
 			       struct nlattr *est, struct tc_action **a,
-			       int ovr, int bind,
+			       int ovr, int bind, bool rtnl_held,
 			       struct netlink_ext_ack *extack)
 {
 	int ret = 0, err;
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 4a46978db092..316fc645595d 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -37,7 +37,8 @@ static const struct nla_policy sample_policy[TCA_SAMPLE_MAX + 1] = {
 
 static int tcf_sample_init(struct net *net, struct nlattr *nla,
 			   struct nlattr *est, struct tc_action **a, int ovr,
-			   int bind, struct netlink_ext_ack *extack)
+			   int bind, bool rtnl_held,
+			   struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, sample_net_id);
 	struct nlattr *tb[TCA_SAMPLE_MAX + 1];
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index c3a761097b01..dc591cc87f4a 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -79,7 +79,8 @@ static const struct nla_policy simple_policy[TCA_DEF_MAX + 1] = {
 
 static int tcf_simp_init(struct net *net, struct nlattr *nla,
 			 struct nlattr *est, struct tc_action **a,
-			 int ovr, int bind, struct netlink_ext_ack *extack)
+			 int ovr, int bind, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, simp_net_id);
 	struct nlattr *tb[TCA_DEF_MAX + 1];
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index cfd20d3d2ca9..c4ae4bd830aa 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -94,7 +94,8 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 			    struct nlattr *est, struct tc_action **a,
-			    int ovr, int bind, struct netlink_ext_ack *extack)
+			    int ovr, int bind, bool rtnl_held,
+			    struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, skbedit_net_id);
 	struct nlattr *tb[TCA_SKBEDIT_MAX + 1];
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index ff90d720eda3..026d6f58eda1 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -84,7 +84,8 @@ static const struct nla_policy skbmod_policy[TCA_SKBMOD_MAX + 1] = {
 
 static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
 			   struct nlattr *est, struct tc_action **a,
-			   int ovr, int bind, struct netlink_ext_ack *extack)
+			   int ovr, int bind, bool rtnl_held,
+			   struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, skbmod_net_id);
 	struct nlattr *tb[TCA_SKBMOD_MAX + 1];
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 2354f07eba15..15ea5ce0f9ed 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -201,7 +201,8 @@ static const struct nla_policy tunnel_key_policy[TCA_TUNNEL_KEY_MAX + 1] = {
 
 static int tunnel_key_init(struct net *net, struct nlattr *nla,
 			   struct nlattr *est, struct tc_action **a,
-			   int ovr, int bind, struct netlink_ext_ack *extack)
+			   int ovr, int bind, bool rtnl_held,
+			   struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
 	struct nlattr *tb[TCA_TUNNEL_KEY_MAX + 1];
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 799e3deb44ac..c61775250722 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -109,7 +109,8 @@ static const struct nla_policy vlan_policy[TCA_VLAN_MAX + 1] = {
 
 static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 			 struct nlattr *est, struct tc_action **a,
-			 int ovr, int bind, struct netlink_ext_ack *extack)
+			 int ovr, int bind, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct tc_action_net *tn = net_generic(net, vlan_net_id);
 	struct nlattr *tb[TCA_VLAN_MAX + 1];
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bbf8dda96b0e..ebc2b9dd783f 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1632,7 +1632,7 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 		if (exts->police && tb[exts->police]) {
 			act = tcf_action_init_1(net, tp, tb[exts->police],
 						rate_tlv, "police", ovr,
-						TCA_ACT_BIND, extack);
+						TCA_ACT_BIND, true, extack);
 			if (IS_ERR(act))
 				return PTR_ERR(act);
 
@@ -1645,7 +1645,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 
 			err = tcf_action_init(net, tp, tb[exts->action],
 					      rate_tlv, NULL, ovr, TCA_ACT_BIND,
-					      &actions, &attr_size, extack);
+					      &actions, &attr_size, true,
+					      extack);
 			if (err)
 				return err;
 			list_for_each_entry(act, &actions, list)
-- 
2.7.5

* [PATCH net-next v6 04/11] net: sched: always take reference to action
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (2 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 03/11] net: sched: implement unlocked action init API Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 05/11] net: sched: implement action API that deletes action by index Vlad Buslov
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Without rtnl lock protection it is no longer safe to use a pointer to a tc
action without holding a reference to it, because the action can be
destroyed concurrently.

Remove the unsafe action idr lookup function. In its place, implement a safe
tcf idr check function that atomically looks up the action in the idr and
increments its reference counter and, when binding, its bind counter.
Implement both action search and check using the new safe function.

The reference taken by the idr check is temporary and should not be visible
to userspace clients (both logically and to preserve current API behavior).
Subtract the temporary reference when dumping an action to userspace, using
the existing tca_get_fill function arguments.
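
For illustration only, the lookup-and-reference pattern described above can
be sketched in userspace C. This is not the kernel code: struct action,
struct action_table, table_check() and reported_refcnt() are hypothetical
stand-ins, a fixed array replaces the idr, and an atomic_flag spinlock
replaces idrinfo->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 16 /* hypothetical fixed-size stand-in for the action idr */

struct action {
	atomic_int refcnt;  /* stand-in for tcfa_refcnt */
	atomic_int bindcnt; /* stand-in for tcfa_bindcnt */
};

struct action_table {
	atomic_flag lock;            /* stand-in for idrinfo->lock */
	struct action *slots[SLOTS]; /* stand-in for idrinfo->action_idr */
};

static void table_init(struct action_table *t)
{
	atomic_flag_clear(&t->lock);
	for (size_t i = 0; i < SLOTS; i++)
		t->slots[i] = NULL;
}

/* Find the action and take the reference (and, if requested, the bind
 * count) before the lock is dropped, so the caller never holds a
 * pointer that another thread is free to destroy. */
static bool table_check(struct action_table *t, unsigned int index,
			struct action **a, bool bind)
{
	struct action *p = NULL;

	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin until the lock is ours */
	if (index < SLOTS)
		p = t->slots[index];
	if (p) {
		atomic_fetch_add(&p->refcnt, 1);
		if (bind)
			atomic_fetch_add(&p->bindcnt, 1);
	}
	atomic_flag_clear_explicit(&t->lock, memory_order_release);

	if (!p)
		return false;
	*a = p;
	return true;
}

/* Value a dump would report: the live count minus the temporary
 * references (ref) held by the caller doing the dump. */
static int reported_refcnt(struct action *a, int ref)
{
	return atomic_load(&a->refcnt) - ref;
}
```

A caller that found an action through table_check() and then dumps it would
pass ref = 1 to reported_refcnt(), so the reported count matches what
userspace saw before lookups started taking a reference.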

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Make __tcf_idr_check function static
- Merge changes that take reference to action when performing lookup and
  changes that account for this additional reference when dumping action
  to user space into single patch.

 net/sched/act_api.c | 46 ++++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 256b0c93916c..aa304d36fee0 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -284,44 +284,38 @@ int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
 }
 EXPORT_SYMBOL(tcf_generic_walker);
 
-static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
+static bool __tcf_idr_check(struct tc_action_net *tn, u32 index,
+			    struct tc_action **a, int bind)
 {
-	struct tc_action *p = NULL;
+	struct tcf_idrinfo *idrinfo = tn->idrinfo;
+	struct tc_action *p;
 
 	spin_lock(&idrinfo->lock);
 	p = idr_find(&idrinfo->action_idr, index);
+	if (p) {
+		refcount_inc(&p->tcfa_refcnt);
+		if (bind)
+			atomic_inc(&p->tcfa_bindcnt);
+	}
 	spin_unlock(&idrinfo->lock);
 
-	return p;
+	if (p) {
+		*a = p;
+		return true;
+	}
+	return false;
 }
 
 int tcf_idr_search(struct tc_action_net *tn, struct tc_action **a, u32 index)
 {
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-	struct tc_action *p = tcf_idr_lookup(index, idrinfo);
-
-	if (p) {
-		*a = p;
-		return 1;
-	}
-	return 0;
+	return __tcf_idr_check(tn, index, a, 0);
 }
 EXPORT_SYMBOL(tcf_idr_search);
 
 bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
 		   int bind)
 {
-	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-	struct tc_action *p = tcf_idr_lookup(index, idrinfo);
-
-	if (index && p) {
-		if (bind)
-			atomic_inc(&p->tcfa_bindcnt);
-		refcount_inc(&p->tcfa_refcnt);
-		*a = p;
-		return true;
-	}
-	return false;
+	return __tcf_idr_check(tn, index, a, bind);
 }
 EXPORT_SYMBOL(tcf_idr_check);
 
@@ -932,7 +926,7 @@ tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n,
 	if (!skb)
 		return -ENOBUFS;
 	if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, event,
-			 0, 0) <= 0) {
+			 0, 1) <= 0) {
 		NL_SET_ERR_MSG(extack, "Failed to fill netlink attributes while adding TC action");
 		kfree_skb(skb);
 		return -EINVAL;
@@ -1072,7 +1066,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
 		return -ENOBUFS;
 
 	if (tca_get_fill(skb, actions, portid, n->nlmsg_seq, 0, RTM_DELACTION,
-			 0, 1) <= 0) {
+			 0, 2) <= 0) {
 		NL_SET_ERR_MSG(extack, "Failed to fill netlink TC action attributes");
 		kfree_skb(skb);
 		return -EINVAL;
@@ -1131,14 +1125,14 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
 	if (event == RTM_GETACTION)
 		ret = tcf_get_notify(net, portid, n, &actions, event, extack);
 	else { /* delete */
+		cleanup_a(&actions, 1); /* lookup took reference */
 		ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
 		if (ret)
 			goto err;
 		return ret;
 	}
 err:
-	if (event != RTM_GETACTION)
-		tcf_action_destroy(&actions, 0);
+	tcf_action_destroy(&actions, 0);
 	return ret;
 }
 
-- 
2.7.5

* [PATCH net-next v6 05/11] net: sched: implement action API that deletes action by index
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (3 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 04/11] net: sched: always take reference to action Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops Vlad Buslov
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Implement a new action API function that atomically finds and deletes an
action from the idr by index. It is intended for lockless actions that do
not rely on the rtnl lock.
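
As a rough userspace sketch of the find-and-delete semantics (assumptions:
a fixed array instead of the idr, an atomic_flag spinlock instead of
idrinfo->lock, and hypothetical names action_new() and
table_delete_index(); the cleanup callback, rate estimator and module
reference handling are omitted):

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

#define SLOTS 16 /* hypothetical fixed-size stand-in for the action idr */

struct action {
	atomic_int refcnt;
	atomic_int bindcnt;
};

struct action_table {
	atomic_flag lock;
	struct action *slots[SLOTS];
};

static void table_lock(struct action_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		; /* spin */
}

static void table_unlock(struct action_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Allocate an action with the given initial counters (NULL on failure). */
static struct action *action_new(int refcnt, int bindcnt)
{
	struct action *a = malloc(sizeof(*a));

	if (a) {
		atomic_init(&a->refcnt, refcnt);
		atomic_init(&a->bindcnt, bindcnt);
	}
	return a;
}

/* Lookup, bind check, reference drop and removal all happen under one
 * lock acquisition, so two racing deleters cannot both free the action. */
static int table_delete_index(struct action_table *t, unsigned int index)
{
	struct action *p;

	table_lock(t);
	p = index < SLOTS ? t->slots[index] : NULL;
	if (!p) {
		table_unlock(t);
		return -ENOENT; /* already gone, or never existed */
	}
	if (atomic_load(&p->bindcnt) != 0) {
		table_unlock(t);
		return -EPERM; /* still bound to a filter */
	}
	if (atomic_fetch_sub(&p->refcnt, 1) == 1) {
		/* Last reference: unpublish first, then free outside the lock. */
		t->slots[index] = NULL;
		table_unlock(t);
		free(p);
		return 0;
	}
	table_unlock(t);
	return 0; /* other holders remain; the action stays in the table */
}
```

In this sketch a second call for the same index returns -ENOENT once the
first caller has removed the entry, and a bound action is left untouched.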

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Rename tcf_idr_find_delete to tcf_idr_delete_index.

 include/net/act_api.h |  1 +
 net/sched/act_api.c   | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 27823f4e24c4..a8eaae67c264 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -153,6 +153,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		   int bind, bool cpustats);
 void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
 
+int tcf_idr_delete_index(struct tc_action_net *tn, u32 index);
 int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
 
 static inline int tcf_idr_release(struct tc_action *a, bool bind)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index aa304d36fee0..0f31f09946ab 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -319,6 +319,45 @@ bool tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
 }
 EXPORT_SYMBOL(tcf_idr_check);
 
+int tcf_idr_delete_index(struct tc_action_net *tn, u32 index)
+{
+	struct tcf_idrinfo *idrinfo = tn->idrinfo;
+	struct tc_action *p;
+	int ret = 0;
+
+	spin_lock(&idrinfo->lock);
+	p = idr_find(&idrinfo->action_idr, index);
+	if (!p) {
+		spin_unlock(&idrinfo->lock);
+		return -ENOENT;
+	}
+
+	if (!atomic_read(&p->tcfa_bindcnt)) {
+		if (refcount_dec_and_test(&p->tcfa_refcnt)) {
+			struct module *owner = p->ops->owner;
+
+			WARN_ON(p != idr_remove(&idrinfo->action_idr,
+						p->tcfa_index));
+			spin_unlock(&idrinfo->lock);
+
+			if (p->ops->cleanup)
+				p->ops->cleanup(p);
+
+			gen_kill_estimator(&p->tcfa_rate_est);
+			free_tcf(p);
+			module_put(owner);
+			return 0;
+		}
+		ret = 0;
+	} else {
+		ret = -EPERM;
+	}
+
+	spin_unlock(&idrinfo->lock);
+	return ret;
+}
+EXPORT_SYMBOL(tcf_idr_delete_index);
+
 int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		   struct tc_action **a, const struct tc_action_ops *ops,
 		   int bind, bool cpustats)
-- 
2.7.5

* [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (4 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 05/11] net: sched: implement action API that deletes action by index Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-08-09 19:38   ` Cong Wang
  2018-07-05 14:24 ` [PATCH net-next v6 07/11] net: sched: implement reference counted action release Vlad Buslov
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Extend action ops with a 'delete' function. Each action type implements
its own delete function that doesn't depend on the rtnl lock.

Implement the delete function that is required to delete actions without
holding the rtnl lock. Use the action API function that atomically deletes
an action only if it is still in the action idr. This implementation
prevents concurrent threads from deleting the same action twice.
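
The per-type wiring can be pictured with a small userspace sketch. All
names here are hypothetical (struct type_table, struct action_ops,
gact_delete, vlan_delete, action_delete), the per-type tables are plain
globals rather than per-netns data, and locking is left out to keep the
dispatch visible:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SLOTS 8 /* hypothetical table size */

/* Per-type table; a flag marks whether an index is currently in use. */
struct type_table {
	bool present[SLOTS];
};

/* Shared helper, standing in for the delete-by-index API function. */
static int table_delete_index(struct type_table *t, unsigned int index)
{
	if (index >= SLOTS || !t->present[index])
		return -ENOENT;
	t->present[index] = false;
	return 0;
}

/* Reduced ops structure: only the fields needed for this sketch. */
struct action_ops {
	const char *kind;
	int (*delete)(unsigned int index);
};

static struct type_table gact_table, vlan_table;

/* Each type resolves its own table and delegates to the shared helper. */
static int gact_delete(unsigned int index)
{
	return table_delete_index(&gact_table, index);
}

static int vlan_delete(unsigned int index)
{
	return table_delete_index(&vlan_table, index);
}

static const struct action_ops gact_ops = { .kind = "gact", .delete = gact_delete };
static const struct action_ops vlan_ops = { .kind = "vlan", .delete = vlan_delete };

/* Caller side: delete by index through the ops of the action's type. */
static int action_delete(const struct action_ops *ops, unsigned int index)
{
	return ops->delete(index);
}
```

Because the callback takes only an index, the caller needs no action
pointer, and a repeated delete of the same index fails with -ENOENT
instead of touching freed memory.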

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Merge action ops delete definition and implementation.

 include/net/act_api.h      |  1 +
 net/sched/act_bpf.c        |  8 ++++++++
 net/sched/act_connmark.c   |  8 ++++++++
 net/sched/act_csum.c       |  8 ++++++++
 net/sched/act_gact.c       |  8 ++++++++
 net/sched/act_ife.c        |  8 ++++++++
 net/sched/act_ipt.c        | 16 ++++++++++++++++
 net/sched/act_mirred.c     |  8 ++++++++
 net/sched/act_nat.c        |  8 ++++++++
 net/sched/act_pedit.c      |  8 ++++++++
 net/sched/act_police.c     |  8 ++++++++
 net/sched/act_sample.c     |  8 ++++++++
 net/sched/act_simple.c     |  8 ++++++++
 net/sched/act_skbedit.c    |  8 ++++++++
 net/sched/act_skbmod.c     |  8 ++++++++
 net/sched/act_tunnel_key.c |  8 ++++++++
 net/sched/act_vlan.c       |  8 ++++++++
 17 files changed, 137 insertions(+)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index a8eaae67c264..b9ed2b8256a5 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -101,6 +101,7 @@ struct tc_action_ops {
 	void	(*stats_update)(struct tc_action *, u64, u32, u64);
 	size_t  (*get_fill_size)(const struct tc_action *act);
 	struct net_device *(*get_dev)(const struct tc_action *a);
+	int     (*delete)(struct net *net, u32 index);
 };
 
 struct tc_action_net {
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 8ebf40a3506c..7941dd66ff83 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -388,6 +388,13 @@ static int tcf_bpf_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_bpf_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, bpf_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_bpf_ops __read_mostly = {
 	.kind		=	"bpf",
 	.type		=	TCA_ACT_BPF,
@@ -398,6 +405,7 @@ static struct tc_action_ops act_bpf_ops __read_mostly = {
 	.init		=	tcf_bpf_init,
 	.walk		=	tcf_bpf_walker,
 	.lookup		=	tcf_bpf_search,
+	.delete		=	tcf_bpf_delete,
 	.size		=	sizeof(struct tcf_bpf),
 };
 
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index e3787aa0025a..143c2d3de723 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -193,6 +193,13 @@ static int tcf_connmark_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_connmark_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, connmark_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_connmark_ops = {
 	.kind		=	"connmark",
 	.type		=	TCA_ACT_CONNMARK,
@@ -202,6 +209,7 @@ static struct tc_action_ops act_connmark_ops = {
 	.init		=	tcf_connmark_init,
 	.walk		=	tcf_connmark_walker,
 	.lookup		=	tcf_connmark_search,
+	.delete		=	tcf_connmark_delete,
 	.size		=	sizeof(struct tcf_connmark_info),
 };
 
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 334261943f9f..3768539340e0 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -654,6 +654,13 @@ static size_t tcf_csum_get_fill_size(const struct tc_action *act)
 	return nla_total_size(sizeof(struct tc_csum));
 }
 
+static int tcf_csum_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, csum_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_csum_ops = {
 	.kind		= "csum",
 	.type		= TCA_ACT_CSUM,
@@ -665,6 +672,7 @@ static struct tc_action_ops act_csum_ops = {
 	.walk		= tcf_csum_walker,
 	.lookup		= tcf_csum_search,
 	.get_fill_size  = tcf_csum_get_fill_size,
+	.delete		= tcf_csum_delete,
 	.size		= sizeof(struct tcf_csum),
 };
 
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index b4dfb2b4addc..a431a711f0dd 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -231,6 +231,13 @@ static size_t tcf_gact_get_fill_size(const struct tc_action *act)
 	return sz;
 }
 
+static int tcf_gact_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, gact_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_gact_ops = {
 	.kind		=	"gact",
 	.type		=	TCA_ACT_GACT,
@@ -242,6 +249,7 @@ static struct tc_action_ops act_gact_ops = {
 	.walk		=	tcf_gact_walker,
 	.lookup		=	tcf_gact_search,
 	.get_fill_size	=	tcf_gact_get_fill_size,
+	.delete		=	tcf_gact_delete,
 	.size		=	sizeof(struct tcf_gact),
 };
 
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 576ffbba61c3..89a761395c94 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -844,6 +844,13 @@ static int tcf_ife_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_ife_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ife_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_ife_ops = {
 	.kind = "ife",
 	.type = TCA_ACT_IFE,
@@ -854,6 +861,7 @@ static struct tc_action_ops act_ife_ops = {
 	.init = tcf_ife_init,
 	.walk = tcf_ife_walker,
 	.lookup = tcf_ife_search,
+	.delete = tcf_ife_delete,
 	.size =	sizeof(struct tcf_ife_info),
 };
 
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 9c21663a86a6..6c234411c771 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -324,6 +324,13 @@ static int tcf_ipt_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_ipt_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, ipt_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_ipt_ops = {
 	.kind		=	"ipt",
 	.type		=	TCA_ACT_IPT,
@@ -334,6 +341,7 @@ static struct tc_action_ops act_ipt_ops = {
 	.init		=	tcf_ipt_init,
 	.walk		=	tcf_ipt_walker,
 	.lookup		=	tcf_ipt_search,
+	.delete		=	tcf_ipt_delete,
 	.size		=	sizeof(struct tcf_ipt),
 };
 
@@ -374,6 +382,13 @@ static int tcf_xt_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_xt_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, xt_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_xt_ops = {
 	.kind		=	"xt",
 	.type		=	TCA_ACT_XT,
@@ -384,6 +399,7 @@ static struct tc_action_ops act_xt_ops = {
 	.init		=	tcf_xt_init,
 	.walk		=	tcf_xt_walker,
 	.lookup		=	tcf_xt_search,
+	.delete		=	tcf_xt_delete,
 	.size		=	sizeof(struct tcf_ipt),
 };
 
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5434f08f2eb7..3d8300bce7e4 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -322,6 +322,13 @@ static struct net_device *tcf_mirred_get_dev(const struct tc_action *a)
 	return rtnl_dereference(m->tcfm_dev);
 }
 
+static int tcf_mirred_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, mirred_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_mirred_ops = {
 	.kind		=	"mirred",
 	.type		=	TCA_ACT_MIRRED,
@@ -335,6 +342,7 @@ static struct tc_action_ops act_mirred_ops = {
 	.lookup		=	tcf_mirred_search,
 	.size		=	sizeof(struct tcf_mirred),
 	.get_dev	=	tcf_mirred_get_dev,
+	.delete		=	tcf_mirred_delete,
 };
 
 static __net_init int mirred_init_net(struct net *net)
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index e6487ad1e4a8..9eb27c89dc46 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -294,6 +294,13 @@ static int tcf_nat_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_nat_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, nat_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_nat_ops = {
 	.kind		=	"nat",
 	.type		=	TCA_ACT_NAT,
@@ -303,6 +310,7 @@ static struct tc_action_ops act_nat_ops = {
 	.init		=	tcf_nat_init,
 	.walk		=	tcf_nat_walker,
 	.lookup		=	tcf_nat_search,
+	.delete		=	tcf_nat_delete,
 	.size		=	sizeof(struct tcf_nat),
 };
 
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index f7965f35585b..45871052840f 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -454,6 +454,13 @@ static int tcf_pedit_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_pedit_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, pedit_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_pedit_ops = {
 	.kind		=	"pedit",
 	.type		=	TCA_ACT_PEDIT,
@@ -464,6 +471,7 @@ static struct tc_action_ops act_pedit_ops = {
 	.init		=	tcf_pedit_init,
 	.walk		=	tcf_pedit_walker,
 	.lookup		=	tcf_pedit_search,
+	.delete		=	tcf_pedit_delete,
 	.size		=	sizeof(struct tcf_pedit),
 };
 
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 0e1c2fb0ebea..c955fb0d4f3f 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -314,6 +314,13 @@ static int tcf_police_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_police_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, police_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 MODULE_AUTHOR("Alexey Kuznetsov");
 MODULE_DESCRIPTION("Policing actions");
 MODULE_LICENSE("GPL");
@@ -327,6 +334,7 @@ static struct tc_action_ops act_police_ops = {
 	.init		=	tcf_act_police_init,
 	.walk		=	tcf_act_police_walker,
 	.lookup		=	tcf_police_search,
+	.delete		=	tcf_police_delete,
 	.size		=	sizeof(struct tcf_police),
 };
 
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 316fc645595d..6f79d2afcba2 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -220,6 +220,13 @@ static int tcf_sample_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_sample_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, sample_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_sample_ops = {
 	.kind	  = "sample",
 	.type	  = TCA_ACT_SAMPLE,
@@ -230,6 +237,7 @@ static struct tc_action_ops act_sample_ops = {
 	.cleanup  = tcf_sample_cleanup,
 	.walk	  = tcf_sample_walker,
 	.lookup	  = tcf_sample_search,
+	.delete	  = tcf_sample_delete,
 	.size	  = sizeof(struct tcf_sample),
 };
 
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index dc591cc87f4a..446c750f3d3c 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -184,6 +184,13 @@ static int tcf_simp_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_simp_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, simp_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_simp_ops = {
 	.kind		=	"simple",
 	.type		=	TCA_ACT_SIMP,
@@ -194,6 +201,7 @@ static struct tc_action_ops act_simp_ops = {
 	.init		=	tcf_simp_init,
 	.walk		=	tcf_simp_walker,
 	.lookup		=	tcf_simp_search,
+	.delete		=	tcf_simp_delete,
 	.size		=	sizeof(struct tcf_defact),
 };
 
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index c4ae4bd830aa..b3eaa120c7f4 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -267,6 +267,13 @@ static int tcf_skbedit_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_skbedit_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, skbedit_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_skbedit_ops = {
 	.kind		=	"skbedit",
 	.type		=	TCA_ACT_SKBEDIT,
@@ -276,6 +283,7 @@ static struct tc_action_ops act_skbedit_ops = {
 	.init		=	tcf_skbedit_init,
 	.walk		=	tcf_skbedit_walker,
 	.lookup		=	tcf_skbedit_search,
+	.delete		=	tcf_skbedit_delete,
 	.size		=	sizeof(struct tcf_skbedit),
 };
 
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 026d6f58eda1..30be3f767495 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -253,6 +253,13 @@ static int tcf_skbmod_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_skbmod_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, skbmod_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_skbmod_ops = {
 	.kind		=	"skbmod",
 	.type		=	TCA_ACT_SKBMOD,
@@ -263,6 +270,7 @@ static struct tc_action_ops act_skbmod_ops = {
 	.cleanup	=	tcf_skbmod_cleanup,
 	.walk		=	tcf_skbmod_walker,
 	.lookup		=	tcf_skbmod_search,
+	.delete		=	tcf_skbmod_delete,
 	.size		=	sizeof(struct tcf_skbmod),
 };
 
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 15ea5ce0f9ed..655ed0b3fc67 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -534,6 +534,13 @@ static int tunnel_key_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tunnel_key_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, tunnel_key_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_tunnel_key_ops = {
 	.kind		=	"tunnel_key",
 	.type		=	TCA_ACT_TUNNEL_KEY,
@@ -544,6 +551,7 @@ static struct tc_action_ops act_tunnel_key_ops = {
 	.cleanup	=	tunnel_key_release,
 	.walk		=	tunnel_key_walker,
 	.lookup		=	tunnel_key_search,
+	.delete		=	tunnel_key_delete,
 	.size		=	sizeof(struct tcf_tunnel_key),
 };
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index c61775250722..e334d2751784 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -287,6 +287,13 @@ static int tcf_vlan_search(struct net *net, struct tc_action **a, u32 index,
 	return tcf_idr_search(tn, a, index);
 }
 
+static int tcf_vlan_delete(struct net *net, u32 index)
+{
+	struct tc_action_net *tn = net_generic(net, vlan_net_id);
+
+	return tcf_idr_delete_index(tn, index);
+}
+
 static struct tc_action_ops act_vlan_ops = {
 	.kind		=	"vlan",
 	.type		=	TCA_ACT_VLAN,
@@ -297,6 +304,7 @@ static struct tc_action_ops act_vlan_ops = {
 	.cleanup	=	tcf_vlan_cleanup,
 	.walk		=	tcf_vlan_walker,
 	.lookup		=	tcf_vlan_search,
+	.delete		=	tcf_vlan_delete,
 	.size		=	sizeof(struct tcf_vlan),
 };
 
-- 
2.7.5


* [PATCH net-next v6 07/11] net: sched: implement reference counted action release
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (5 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite Vlad Buslov
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Implement a helper delete function that uses the new action op 'delete'
instead of destroying the action directly. This is required so that the act
API can delete actions by index, without holding any reference to the
action that is being deleted.

Implement function __tcf_action_put() that releases a reference to an
action and frees the action when the last reference is dropped. Refactor
the action deletion code to use the new put function and not to rely on
the rtnl lock. Remove rtnl lock assertions that are no longer needed.
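
For readers unfamiliar with the "decrement and lock" idiom behind
__tcf_action_put(), here is a minimal userspace sketch. It is not the kernel
code: the demo_* names, the fixed-size slot table and the atomic_flag spinlock
are assumptions made for illustration, and demo_dec_and_lock() only
approximates what refcount_dec_and_lock() provides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_obj;

/* Hypothetical lookup table; stands in for the action idr and its lock. */
struct demo_table {
	atomic_flag lock;
	struct demo_obj *slots[8];
};

struct demo_obj {
	atomic_int refcnt;
	unsigned int index;
	struct demo_table *tbl;
};

static void demo_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set(l))
		;	/* spin until the flag is released */
}

static void demo_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Drop one reference. Returns true, with the lock held, only when the
 * caller dropped the last reference.
 */
static bool demo_dec_and_lock(atomic_int *cnt, atomic_flag *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: not the last reference, no lock needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
	}

	/* Slow path: possibly the last reference, decide under the lock. */
	demo_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	demo_unlock(lock);
	return false;
}

/* Returns 1 if the object was unpublished and freed, 0 otherwise. */
static int demo_put(struct demo_obj *p)
{
	struct demo_table *tbl = p->tbl;

	if (demo_dec_and_lock(&p->refcnt, &tbl->lock)) {
		/* No lookup can find the object once the slot is cleared. */
		assert(tbl->slots[p->index] == p);
		tbl->slots[p->index] = NULL;
		demo_unlock(&tbl->lock);
		free(p);
		return 1;
	}
	return 0;
}
```

The point of the idiom is that the final decrement and the removal from the
lookup table happen under the same lock, so a concurrent lookup that also takes
the lock can never take a new reference to an object whose count already
reached zero.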

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Removed redundant actions ops lookup during delete.
- Assume all actions have delete implemented and don't check for it
  explicitly.
- Rearrange variable definitions in tcf_action_delete.

 net/sched/act_api.c | 84 +++++++++++++++++++++++++++++++++++++++--------------
 net/sched/cls_api.c |  1 -
 2 files changed, 62 insertions(+), 23 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 0f31f09946ab..a023873db713 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -90,21 +90,39 @@ static void free_tcf(struct tc_action *p)
 	kfree(p);
 }
 
-static void tcf_idr_remove(struct tcf_idrinfo *idrinfo, struct tc_action *p)
+static void tcf_action_cleanup(struct tc_action *p)
 {
-	spin_lock(&idrinfo->lock);
-	idr_remove(&idrinfo->action_idr, p->tcfa_index);
-	spin_unlock(&idrinfo->lock);
+	if (p->ops->cleanup)
+		p->ops->cleanup(p);
+
 	gen_kill_estimator(&p->tcfa_rate_est);
 	free_tcf(p);
 }
 
+static int __tcf_action_put(struct tc_action *p, bool bind)
+{
+	struct tcf_idrinfo *idrinfo = p->idrinfo;
+
+	if (refcount_dec_and_lock(&p->tcfa_refcnt, &idrinfo->lock)) {
+		if (bind)
+			atomic_dec(&p->tcfa_bindcnt);
+		idr_remove(&idrinfo->action_idr, p->tcfa_index);
+		spin_unlock(&idrinfo->lock);
+
+		tcf_action_cleanup(p);
+		return 1;
+	}
+
+	if (bind)
+		atomic_dec(&p->tcfa_bindcnt);
+
+	return 0;
+}
+
 int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 {
 	int ret = 0;
 
-	ASSERT_RTNL();
-
 	/* Release with strict==1 and bind==0 is only called through act API
 	 * interface (classifiers always bind). Only case when action with
 	 * positive reference count and zero bind count can exist is when it was
@@ -118,18 +136,11 @@ int __tcf_idr_release(struct tc_action *p, bool bind, bool strict)
 	 * are acceptable.
 	 */
 	if (p) {
-		if (bind)
-			atomic_dec(&p->tcfa_bindcnt);
-		else if (strict && atomic_read(&p->tcfa_bindcnt) > 0)
+		if (!bind && strict && atomic_read(&p->tcfa_bindcnt) > 0)
 			return -EPERM;
 
-		if (atomic_read(&p->tcfa_bindcnt) <= 0 &&
-		    refcount_dec_and_test(&p->tcfa_refcnt)) {
-			if (p->ops->cleanup)
-				p->ops->cleanup(p);
-			tcf_idr_remove(p->idrinfo, p);
+		if (__tcf_action_put(p, bind))
 			ret = ACT_P_DELETED;
-		}
 	}
 
 	return ret;
@@ -340,11 +351,7 @@ int tcf_idr_delete_index(struct tc_action_net *tn, u32 index)
 						p->tcfa_index));
 			spin_unlock(&idrinfo->lock);
 
-			if (p->ops->cleanup)
-				p->ops->cleanup(p);
-
-			gen_kill_estimator(&p->tcfa_rate_est);
-			free_tcf(p);
+			tcf_action_cleanup(p);
 			module_put(owner);
 			return 0;
 		}
@@ -615,6 +622,11 @@ int tcf_action_destroy(struct list_head *actions, int bind)
 	return ret;
 }
 
+static int tcf_action_put(struct tc_action *p)
+{
+	return __tcf_action_put(p, false);
+}
+
 int
 tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 {
@@ -1092,6 +1104,35 @@ static int tca_action_flush(struct net *net, struct nlattr *nla,
 	return err;
 }
 
+static int tcf_action_delete(struct net *net, struct list_head *actions,
+			     struct netlink_ext_ack *extack)
+{
+	struct tc_action *a, *tmp;
+	u32 act_index;
+	int ret;
+
+	list_for_each_entry_safe(a, tmp, actions, list) {
+		const struct tc_action_ops *ops = a->ops;
+
+		/* Actions can be deleted concurrently so we must save their
+		 * type and id to search again after reference is released.
+		 */
+		act_index = a->tcfa_index;
+
+		list_del(&a->list);
+		if (tcf_action_put(a)) {
+			/* last reference, action was deleted concurrently */
+			module_put(ops->owner);
+		} else  {
+			/* now do the delete */
+			ret = ops->delete(net, act_index);
+			if (ret < 0)
+				return ret;
+		}
+	}
+	return 0;
+}
+
 static int
 tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
 	       u32 portid, size_t attr_size, struct netlink_ext_ack *extack)
@@ -1112,7 +1153,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
 	}
 
 	/* now do the delete */
-	ret = tcf_action_destroy(actions, 0);
+	ret = tcf_action_delete(net, actions, extack);
 	if (ret < 0) {
 		NL_SET_ERR_MSG(extack, "Failed to delete TC action");
 		kfree_skb(skb);
@@ -1164,7 +1205,6 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
 	if (event == RTM_GETACTION)
 		ret = tcf_get_notify(net, portid, n, &actions, event, extack);
 	else { /* delete */
-		cleanup_a(&actions, 1); /* lookup took reference */
 		ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
 		if (ret)
 			goto err;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index ebc2b9dd783f..9041f0e43e9a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1611,7 +1611,6 @@ void tcf_exts_destroy(struct tcf_exts *exts)
 #ifdef CONFIG_NET_CLS_ACT
 	LIST_HEAD(actions);
 
-	ASSERT_RTNL();
 	tcf_exts_to_list(exts, &actions);
 	tcf_action_destroy(&actions, TCA_ACT_UNBIND);
 	kfree(exts->actions);
-- 
2.7.5


* [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (6 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 07/11] net: sched: implement reference counted action release Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-08-13 23:00   ` Cong Wang
  2018-07-05 14:24 ` [PATCH net-next v6 09/11] net: sched: use reference counting action init Vlad Buslov
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Return from the action init function with a reference to the action taken,
even when overwriting an existing action.

The action init API initializes its fourth argument (pointer to pointer to
tc action) to either the existing action with the same index or a newly
created action. In case of an existing index (and a zero bind argument),
the init function returns without incrementing the action reference
counter. The caller of action init then proceeds to work with the action
without actually holding a reference to it. This means that the action
could be deleted concurrently.

Change the action init behavior to always take a reference to the action
before returning successfully, in order to protect against concurrent
deletion.
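
The rule described above can be illustrated with a small single-threaded
sketch. Everything in it is hypothetical (demo_init(), demo_release(), the
four-slot table, plain int counters instead of refcount_t); it only shows the
shape of the contract: on success the caller holds a reference, and the
reference taken by the lookup is given back only on the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_SLOTS	4
#define DEMO_CREATED	1	/* plays the role of ACT_P_CREATED */

struct demo_act {
	int refcnt;
	unsigned int index;
	int value;
};

static struct demo_act *demo_table[DEMO_SLOTS];

/* Drop one reference; unpublish and free the action on the last one. */
static void demo_release(struct demo_act *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0) {
		demo_table[p->index] = NULL;
		free(p);
	}
}

/* Create or overwrite the action at 'index'. On success (return >= 0)
 * the caller holds a reference to *a in both cases.
 */
static int demo_init(unsigned int index, int value, bool ovr,
		     struct demo_act **a)
{
	struct demo_act *p;

	if (index >= DEMO_SLOTS)
		return -EINVAL;

	p = demo_table[index];
	if (!p) {
		p = calloc(1, sizeof(*p));
		if (!p)
			return -ENOMEM;
		p->refcnt = 1;
		p->index = index;
		p->value = value;
		demo_table[index] = p;
		*a = p;
		return DEMO_CREATED;
	}

	p->refcnt++;			/* the lookup takes a reference */
	if (!ovr) {
		demo_release(p);	/* error path: give it back */
		return -EEXIST;
	}

	p->value = value;		/* overwrite while holding the reference */
	*a = p;
	return 0;
}
```

A caller that overwrites an existing action is then expected to call
demo_release() once it has finished using the returned pointer.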

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Resplit action lookup/release code to prevent memory leaks in
  individual patches.
- Change convoluted commit message.

 net/sched/act_api.c        |  2 --
 net/sched/act_bpf.c        |  8 ++++----
 net/sched/act_connmark.c   |  5 +++--
 net/sched/act_csum.c       |  8 ++++----
 net/sched/act_gact.c       |  5 +++--
 net/sched/act_ife.c        | 10 +++++-----
 net/sched/act_ipt.c        |  5 +++--
 net/sched/act_mirred.c     |  5 ++---
 net/sched/act_nat.c        |  5 +++--
 net/sched/act_pedit.c      |  2 +-
 net/sched/act_police.c     |  8 +++-----
 net/sched/act_sample.c     |  8 +++-----
 net/sched/act_simple.c     |  5 +++--
 net/sched/act_skbedit.c    |  5 +++--
 net/sched/act_skbmod.c     |  8 +++-----
 net/sched/act_tunnel_key.c | 11 ++++-------
 net/sched/act_vlan.c       |  8 +++-----
 17 files changed, 50 insertions(+), 58 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index a023873db713..f019f0464cec 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -870,8 +870,6 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		}
 		act->order = i;
 		sz += tcf_action_fill_size(act);
-		if (ovr)
-			refcount_inc(&act->tcfa_refcnt);
 		list_add_tail(&act->list, actions);
 	}
 
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 7941dd66ff83..d3f4ac6f2c4b 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -311,9 +311,10 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 		if (bind)
 			return 0;
 
-		tcf_idr_release(*act, bind);
-		if (!replace)
+		if (!replace) {
+			tcf_idr_release(*act, bind);
 			return -EEXIST;
+		}
 	}
 
 	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
@@ -356,8 +357,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	return res;
 out:
-	if (res == ACT_P_CREATED)
-		tcf_idr_release(*act, bind);
+	tcf_idr_release(*act, bind);
 
 	return ret;
 }
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 143c2d3de723..701e90244eff 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -135,9 +135,10 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 		ci = to_connmark(*a);
 		if (bind)
 			return 0;
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 		/* replacing action and zone */
 		ci->tcf_action = parm->action;
 		ci->zone = parm->zone;
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 3768539340e0..5dbee136b0a1 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -76,9 +76,10 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
 	} else {
 		if (bind)/* dont override defaults */
 			return 0;
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 	}
 
 	p = to_tcf_csum(*a);
@@ -86,8 +87,7 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
 
 	params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
 	if (unlikely(!params_new)) {
-		if (ret == ACT_P_CREATED)
-			tcf_idr_release(*a, bind);
+		tcf_idr_release(*a, bind);
 		return -ENOMEM;
 	}
 	params_old = rtnl_dereference(p->params);
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index a431a711f0dd..11c4de3f344e 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -100,9 +100,10 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 	} else {
 		if (bind)/* dont override defaults */
 			return 0;
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 	}
 
 	gact = to_gact(*a);
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index 89a761395c94..acea3feae762 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -498,12 +498,10 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 			return ret;
 		}
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr) {
-			kfree(p);
-			return -EEXIST;
-		}
+		kfree(p);
+		return -EEXIST;
 	}
 
 	ife = to_ife(*a);
@@ -548,6 +546,8 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 
 			if (exists)
 				spin_unlock_bh(&ife->tcf_lock);
+			tcf_idr_release(*a, bind);
+
 			kfree(p);
 			return err;
 		}
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 6c234411c771..85e85dfba401 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -145,10 +145,11 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
 	} else {
 		if (bind)/* dont override defaults */
 			return 0;
-		tcf_idr_release(*a, bind);
 
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 	}
 	hook = nla_get_u32(tb[TCA_IPT_HOOK]);
 
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 3d8300bce7e4..e08aed06d7f8 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -132,10 +132,9 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr)
-			return -EEXIST;
+		return -EEXIST;
 	}
 	m = to_mirred(*a);
 
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 9eb27c89dc46..1f91e8e66c0f 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -66,9 +66,10 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
 	} else {
 		if (bind)
 			return 0;
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 	}
 	p = to_tcf_nat(*a);
 
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 45871052840f..3a0e2f762f4e 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -194,8 +194,8 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 	} else {
 		if (bind)
 			goto out_free;
-		tcf_idr_release(*a, bind);
 		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			ret = -EEXIST;
 			goto out_free;
 		}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index c955fb0d4f3f..99335cca739e 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -111,10 +111,9 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr)
-			return -EEXIST;
+		return -EEXIST;
 	}
 
 	police = to_police(*a);
@@ -195,8 +194,7 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
 failure:
 	qdisc_put_rtab(P_tab);
 	qdisc_put_rtab(R_tab);
-	if (ret == ACT_P_CREATED)
-		tcf_idr_release(*a, bind);
+	tcf_idr_release(*a, bind);
 	return err;
 }
 
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index 6f79d2afcba2..a8582e1347db 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -69,10 +69,9 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
 		if (ret)
 			return ret;
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr)
-			return -EEXIST;
+		return -EEXIST;
 	}
 	s = to_sample(*a);
 
@@ -81,8 +80,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
 	s->psample_group_num = nla_get_u32(tb[TCA_SAMPLE_PSAMPLE_GROUP]);
 	psample_group = psample_group_get(net, s->psample_group_num);
 	if (!psample_group) {
-		if (ret == ACT_P_CREATED)
-			tcf_idr_release(*a, bind);
+		tcf_idr_release(*a, bind);
 		return -ENOMEM;
 	}
 	RCU_INIT_POINTER(s->psample_group, psample_group);
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 446c750f3d3c..2da47c682a30 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -127,9 +127,10 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
 	} else {
 		d = to_defact(*a);
 
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 
 		reset_policy(d, tb[TCA_DEF_DATA], parm);
 	}
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b3eaa120c7f4..4616a2c1821f 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -172,9 +172,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 		ret = ACT_P_CREATED;
 	} else {
 		d = to_skbedit(*a);
-		tcf_idr_release(*a, bind);
-		if (!ovr)
+		if (!ovr) {
+			tcf_idr_release(*a, bind);
 			return -EEXIST;
+		}
 	}
 
 	spin_lock_bh(&d->tcf_lock);
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 30be3f767495..e844381af066 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -145,10 +145,9 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
 			return ret;
 
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr)
-			return -EEXIST;
+		return -EEXIST;
 	}
 
 	d = to_skbmod(*a);
@@ -156,8 +155,7 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
 	ASSERT_RTNL();
 	p = kzalloc(sizeof(struct tcf_skbmod_params), GFP_KERNEL);
 	if (unlikely(!p)) {
-		if (ret == ACT_P_CREATED)
-			tcf_idr_release(*a, bind);
+		tcf_idr_release(*a, bind);
 		return -ENOMEM;
 	}
 
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 655ed0b3fc67..ab5bf5c13f87 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -329,12 +329,10 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 		}
 
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr) {
-			NL_SET_ERR_MSG(extack, "TC IDR already exists");
-			return -EEXIST;
-		}
+		NL_SET_ERR_MSG(extack, "TC IDR already exists");
+		return -EEXIST;
 	}
 
 	t = to_tunnel_key(*a);
@@ -342,8 +340,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	ASSERT_RTNL();
 	params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
 	if (unlikely(!params_new)) {
-		if (ret == ACT_P_CREATED)
-			tcf_idr_release(*a, bind);
+		tcf_idr_release(*a, bind);
 		NL_SET_ERR_MSG(extack, "Cannot allocate tunnel key parameters");
 		return -ENOMEM;
 	}
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index e334d2751784..9b600faaccbb 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -187,10 +187,9 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 			return ret;
 
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
-		if (!ovr)
-			return -EEXIST;
+		return -EEXIST;
 	}
 
 	v = to_vlan(*a);
@@ -198,8 +197,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	ASSERT_RTNL();
 	p = kzalloc(sizeof(*p), GFP_KERNEL);
 	if (!p) {
-		if (ret == ACT_P_CREATED)
-			tcf_idr_release(*a, bind);
+		tcf_idr_release(*a, bind);
 		return -ENOMEM;
 	}
 
-- 
2.7.5


* [PATCH net-next v6 09/11] net: sched: use reference counting action init
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (7 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-07-05 14:24 ` [PATCH net-next v6 10/11] net: sched: atomically check-allocate action Vlad Buslov
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Change the action API to assume that the action init function always takes
a reference to the action, even when overwriting an existing action. This
is necessary because the action API continues to use the action pointer
after the init function is done. At this point the action becomes
accessible for concurrent modifications, so the user must always hold a
reference to it.

Implement a helper put list function to atomically release a list of
actions after the action API init code is done using them.
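
As a rough illustration of the put list helper, the sketch below walks a
singly linked list and drops one reference per element. The demo_* types are
invented for this example and use plain int counters. The two details it tries
to show are the ones the helper depends on: the next pointer and the owner are
read before the put, because the put may free the element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a module with its own reference count. */
struct demo_module {
	int refs;
};

struct demo_act {
	int refcnt;
	struct demo_module *owner;
	struct demo_act *next;
};

/* Drop one reference. Returns 1 if the action was freed, 0 otherwise. */
static int demo_put(struct demo_act *a)
{
	assert(a->refcnt > 0);
	if (--a->refcnt == 0) {
		free(a);
		return 1;
	}
	return 0;
}

/* Release one reference on every action in the list and return how many
 * actions were freed.
 */
static int demo_put_list(struct demo_act *head)
{
	struct demo_act *a, *next;
	int freed = 0;

	for (a = head; a; a = next) {
		struct demo_module *owner = a->owner;

		next = a->next;	/* read before the put: 'a' may be freed */
		a->next = NULL;	/* sketch only: survivors must not dangle */
		if (demo_put(a)) {
			owner->refs--;	/* module reference goes with the action */
			freed++;
		}
	}
	return freed;
}
```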

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Resplit action lookup/release code to prevent memory leaks in
  individual patches.

 net/sched/act_api.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index f019f0464cec..eefe8c2fe667 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -627,6 +627,18 @@ static int tcf_action_put(struct tc_action *p)
 	return __tcf_action_put(p, false);
 }
 
+static void tcf_action_put_lst(struct list_head *actions)
+{
+	struct tc_action *a, *tmp;
+
+	list_for_each_entry_safe(a, tmp, actions, list) {
+		const struct tc_action_ops *ops = a->ops;
+
+		if (tcf_action_put(a))
+			module_put(ops->owner);
+	}
+}
+
 int
 tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 {
@@ -835,17 +847,6 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	return ERR_PTR(err);
 }
 
-static void cleanup_a(struct list_head *actions, int ovr)
-{
-	struct tc_action *a;
-
-	if (!ovr)
-		return;
-
-	list_for_each_entry(a, actions, list)
-		refcount_dec(&a->tcfa_refcnt);
-}
-
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
 		    struct list_head *actions, size_t *attr_size,
@@ -874,11 +875,6 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 	}
 
 	*attr_size = tcf_action_full_attrs_size(sz);
-
-	/* Remove the temp refcnt which was necessary to protect against
-	 * destroying an existing action which was being replaced
-	 */
-	cleanup_a(actions, ovr);
 	return 0;
 
 err:
@@ -1209,7 +1205,7 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
 		return ret;
 	}
 err:
-	tcf_action_destroy(&actions, 0);
+	tcf_action_put_lst(&actions);
 	return ret;
 }
 
@@ -1251,8 +1247,11 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
 			      &attr_size, true, extack);
 	if (ret)
 		return ret;
+	ret = tcf_add_notify(net, n, &actions, portid, attr_size, extack);
+	if (ovr)
+		tcf_action_put_lst(&actions);
 
-	return tcf_add_notify(net, n, &actions, portid, attr_size, extack);
+	return ret;
 }
 
 static u32 tcaa_root_flags_allowed = TCA_FLAG_LARGE_DUMP_ON;
-- 
2.7.5


* [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (8 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 09/11] net: sched: use reference counting action init Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-08-08  1:20   ` Cong Wang
  2018-07-05 14:24 ` [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions Vlad Buslov
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn,
	Vlad Buslov, Jiri Pirko

Implement a function that atomically checks if an action exists and either
takes a reference to it, or allocates an idr slot for the action index to
prevent concurrent allocations of actions with the same index. Use an EBUSY
error pointer to indicate that the idr slot is reserved.

Implement a cleanup helper function that removes the temporary error
pointer from the idr, in case of an error between idr allocation and
insertion of the newly created action at the specified index.

Refactor all action init functions to insert new actions into the idr using
this API.
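
The reserve-then-publish protocol can be sketched in userspace as follows.
This is only an illustration under stated assumptions: the demo_* names, the
fixed-size array and the atomic_flag spinlock are made up, and a sentinel
pointer stands in for ERR_PTR(-EBUSY). One deliberate difference: where the
patch retries when it finds a reserved slot, the sketch returns -EBUSY so that
a single-threaded caller cannot spin forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_SLOTS 8

struct demo_act {
	int refcnt;
};

/* Sentinel marking a slot as reserved but not yet populated. */
static struct demo_act demo_reserved;
#define DEMO_RESERVED (&demo_reserved)

static atomic_flag demo_table_lock = ATOMIC_FLAG_INIT;
static struct demo_act *demo_slots[DEMO_SLOTS];

static void demo_lock(void)
{
	while (atomic_flag_test_and_set(&demo_table_lock))
		;	/* spin until the flag is released */
}

static void demo_unlock(void)
{
	atomic_flag_clear(&demo_table_lock);
}

/* Returns 1 and takes a reference if the action exists, 0 if the slot was
 * reserved for the caller, or a negative errno. *index == 0 means "pick
 * any free index", which is then written back.
 */
static int demo_check_alloc(unsigned int *index, struct demo_act **a)
{
	unsigned int i;
	int ret = -ENOSPC;

	*a = NULL;
	demo_lock();
	if (*index) {
		if (*index >= DEMO_SLOTS) {
			ret = -ENOSPC;
		} else if (demo_slots[*index] == DEMO_RESERVED) {
			ret = -EBUSY;	/* the patch retries here instead */
		} else if (demo_slots[*index]) {
			demo_slots[*index]->refcnt++;
			*a = demo_slots[*index];
			ret = 1;
		} else {
			demo_slots[*index] = DEMO_RESERVED;
			ret = 0;
		}
	} else {
		for (i = 1; i < DEMO_SLOTS; i++) {
			if (!demo_slots[i]) {
				demo_slots[i] = DEMO_RESERVED;
				*index = i;
				ret = 0;
				break;
			}
		}
	}
	demo_unlock();
	return ret;
}

/* Publish a fully initialized action into its reserved slot. */
static void demo_insert(unsigned int index, struct demo_act *a)
{
	demo_lock();
	assert(demo_slots[index] == DEMO_RESERVED);
	demo_slots[index] = a;
	demo_unlock();
}

/* Undo a reservation when initialization failed before the insert. */
static void demo_cleanup(unsigned int index)
{
	demo_lock();
	assert(demo_slots[index] == DEMO_RESERVED);
	demo_slots[index] = NULL;
	demo_unlock();
}
```

A caller is expected to follow a successful reservation (return value 0) with
exactly one of demo_insert() or demo_cleanup(), mirroring how the init
functions in this patch pair tcf_idr_check_alloc() with tcf_idr_insert() or
tcf_idr_cleanup().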

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
Changes from V1 to V2:
- Remove unique idr insertion function. Change original idr insert to do
  the same thing.
- Refactor action check-alloc code into standalone function.

 include/net/act_api.h      |  3 ++
 net/sched/act_api.c        | 92 ++++++++++++++++++++++++++++++++++++----------
 net/sched/act_bpf.c        | 11 ++++--
 net/sched/act_connmark.c   | 10 +++--
 net/sched/act_csum.c       | 11 ++++--
 net/sched/act_gact.c       | 11 ++++--
 net/sched/act_ife.c        |  6 ++-
 net/sched/act_ipt.c        | 13 ++++++-
 net/sched/act_mirred.c     | 16 ++++++--
 net/sched/act_nat.c        | 11 ++++--
 net/sched/act_pedit.c      | 12 ++++--
 net/sched/act_police.c     |  9 ++++-
 net/sched/act_sample.c     | 11 ++++--
 net/sched/act_simple.c     | 11 +++++-
 net/sched/act_skbedit.c    | 11 +++++-
 net/sched/act_skbmod.c     | 11 +++++-
 net/sched/act_tunnel_key.c |  9 ++++-
 net/sched/act_vlan.c       | 17 ++++++++-
 18 files changed, 216 insertions(+), 59 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index b9ed2b8256a5..8090de2edab7 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -154,6 +154,9 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 		   int bind, bool cpustats);
 void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a);
 
+void tcf_idr_cleanup(struct tc_action_net *tn, u32 index);
+int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
+			struct tc_action **a, int bind);
 int tcf_idr_delete_index(struct tc_action_net *tn, u32 index);
 int __tcf_idr_release(struct tc_action *a, bool bind, bool strict);
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index eefe8c2fe667..9511502e1cbb 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -303,7 +303,9 @@ static bool __tcf_idr_check(struct tc_action_net *tn, u32 index,
 
 	spin_lock(&idrinfo->lock);
 	p = idr_find(&idrinfo->action_idr, index);
-	if (p) {
+	if (IS_ERR(p)) {
+		p = NULL;
+	} else if (p) {
 		refcount_inc(&p->tcfa_refcnt);
 		if (bind)
 			atomic_inc(&p->tcfa_bindcnt);
@@ -371,7 +373,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 {
 	struct tc_action *p = kzalloc(ops->size, GFP_KERNEL);
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
-	struct idr *idr = &idrinfo->action_idr;
 	int err = -ENOMEM;
 
 	if (unlikely(!p))
@@ -389,20 +390,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 			goto err2;
 	}
 	spin_lock_init(&p->tcfa_lock);
-	idr_preload(GFP_KERNEL);
-	spin_lock(&idrinfo->lock);
-	/* user doesn't specify an index */
-	if (!index) {
-		index = 1;
-		err = idr_alloc_u32(idr, NULL, &index, UINT_MAX, GFP_ATOMIC);
-	} else {
-		err = idr_alloc_u32(idr, NULL, &index, index, GFP_ATOMIC);
-	}
-	spin_unlock(&idrinfo->lock);
-	idr_preload_end();
-	if (err)
-		goto err3;
-
 	p->tcfa_index = index;
 	p->tcfa_tm.install = jiffies;
 	p->tcfa_tm.lastuse = jiffies;
@@ -412,7 +399,7 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 					&p->tcfa_rate_est,
 					&p->tcfa_lock, NULL, est);
 		if (err)
-			goto err4;
+			goto err3;
 	}
 
 	p->idrinfo = idrinfo;
@@ -420,8 +407,6 @@ int tcf_idr_create(struct tc_action_net *tn, u32 index, struct nlattr *est,
 	INIT_LIST_HEAD(&p->list);
 	*a = p;
 	return 0;
-err4:
-	idr_remove(idr, index);
 err3:
 	free_percpu(p->cpu_qstats);
 err2:
@@ -437,11 +422,78 @@ void tcf_idr_insert(struct tc_action_net *tn, struct tc_action *a)
 	struct tcf_idrinfo *idrinfo = tn->idrinfo;
 
 	spin_lock(&idrinfo->lock);
-	idr_replace(&idrinfo->action_idr, a, a->tcfa_index);
+	/* Replace ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
+	WARN_ON(!IS_ERR(idr_replace(&idrinfo->action_idr, a, a->tcfa_index)));
 	spin_unlock(&idrinfo->lock);
 }
 EXPORT_SYMBOL(tcf_idr_insert);
 
+/* Cleanup idr index that was allocated but not initialized. */
+
+void tcf_idr_cleanup(struct tc_action_net *tn, u32 index)
+{
+	struct tcf_idrinfo *idrinfo = tn->idrinfo;
+
+	spin_lock(&idrinfo->lock);
+	/* Remove ERR_PTR(-EBUSY) allocated by tcf_idr_check_alloc */
+	WARN_ON(!IS_ERR(idr_remove(&idrinfo->action_idr, index)));
+	spin_unlock(&idrinfo->lock);
+}
+EXPORT_SYMBOL(tcf_idr_cleanup);
+
+/* Check if action with specified index exists. If action is found, increment
+ * its reference and bind counters, and return 1. Otherwise insert temporary
+ * error pointer (to prevent concurrent users from inserting actions with same
+ * index) and return 0.
+ */
+
+int tcf_idr_check_alloc(struct tc_action_net *tn, u32 *index,
+			struct tc_action **a, int bind)
+{
+	struct tcf_idrinfo *idrinfo = tn->idrinfo;
+	struct tc_action *p;
+	int ret;
+
+again:
+	spin_lock(&idrinfo->lock);
+	if (*index) {
+		p = idr_find(&idrinfo->action_idr, *index);
+		if (IS_ERR(p)) {
+			/* This means that another process allocated
+			 * index but did not assign the pointer yet.
+			 */
+			spin_unlock(&idrinfo->lock);
+			goto again;
+		}
+
+		if (p) {
+			refcount_inc(&p->tcfa_refcnt);
+			if (bind)
+				atomic_inc(&p->tcfa_bindcnt);
+			*a = p;
+			ret = 1;
+		} else {
+			*a = NULL;
+			ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
+					    *index, GFP_ATOMIC);
+			if (!ret)
+				idr_replace(&idrinfo->action_idr,
+					    ERR_PTR(-EBUSY), *index);
+		}
+	} else {
+		*index = 1;
+		*a = NULL;
+		ret = idr_alloc_u32(&idrinfo->action_idr, NULL, index,
+				    UINT_MAX, GFP_ATOMIC);
+		if (!ret)
+			idr_replace(&idrinfo->action_idr, ERR_PTR(-EBUSY),
+				    *index);
+	}
+	spin_unlock(&idrinfo->lock);
+	return ret;
+}
+EXPORT_SYMBOL(tcf_idr_check_alloc);
+
 void tcf_idrinfo_destroy(const struct tc_action_ops *ops,
 			 struct tcf_idrinfo *idrinfo)
 {
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index d3f4ac6f2c4b..06f743d8ed41 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -299,14 +299,17 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_ACT_BPF_PARMS]);
 
-	if (!tcf_idr_check(tn, parm->index, act, bind)) {
+	ret = tcf_idr_check_alloc(tn, &parm->index, act, bind);
+	if (!ret) {
 		ret = tcf_idr_create(tn, parm->index, est, act,
 				     &act_bpf_ops, bind, true);
-		if (ret < 0)
+		if (ret < 0) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		res = ACT_P_CREATED;
-	} else {
+	} else if (ret > 0) {
 		/* Don't override defaults. */
 		if (bind)
 			return 0;
@@ -315,6 +318,8 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
 			tcf_idr_release(*act, bind);
 			return -EEXIST;
 		}
+	} else {
+		return ret;
 	}
 
 	is_bpf = tb[TCA_ACT_BPF_OPS_LEN] && tb[TCA_ACT_BPF_OPS];
diff --git a/net/sched/act_connmark.c b/net/sched/act_connmark.c
index 701e90244eff..1e31f0e448e2 100644
--- a/net/sched/act_connmark.c
+++ b/net/sched/act_connmark.c
@@ -118,11 +118,14 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_CONNMARK_PARMS]);
 
-	if (!tcf_idr_check(tn, parm->index, a, bind)) {
+	ret = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!ret) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_connmark_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		ci = to_connmark(*a);
 		ci->tcf_action = parm->action;
@@ -131,7 +134,7 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 
 		tcf_idr_insert(tn, *a);
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (ret > 0) {
 		ci = to_connmark(*a);
 		if (bind)
 			return 0;
@@ -142,6 +145,7 @@ static int tcf_connmark_init(struct net *net, struct nlattr *nla,
 		/* replacing action and zone */
 		ci->tcf_action = parm->action;
 		ci->zone = parm->zone;
+		ret = 0;
 	}
 
 	return ret;
diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c
index 5dbee136b0a1..bd232d3bd022 100644
--- a/net/sched/act_csum.c
+++ b/net/sched/act_csum.c
@@ -67,19 +67,24 @@ static int tcf_csum_init(struct net *net, struct nlattr *nla,
 		return -EINVAL;
 	parm = nla_data(tb[TCA_CSUM_PARMS]);
 
-	if (!tcf_idr_check(tn, parm->index, a, bind)) {
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!err) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_csum_ops, bind, true);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (err > 0) {
 		if (bind)/* dont override defaults */
 			return 0;
 		if (!ovr) {
 			tcf_idr_release(*a, bind);
 			return -EEXIST;
 		}
+	} else {
+		return err;
 	}
 
 	p = to_tcf_csum(*a);
diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 11c4de3f344e..661b72b9147d 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -91,19 +91,24 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla,
 	}
 #endif
 
-	if (!tcf_idr_check(tn, parm->index, a, bind)) {
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!err) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_gact_ops, bind, true);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (err > 0) {
 		if (bind)/* dont override defaults */
 			return 0;
 		if (!ovr) {
 			tcf_idr_release(*a, bind);
 			return -EEXIST;
 		}
+	} else {
+		return err;
 	}
 
 	gact = to_gact(*a);
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index acea3feae762..a3eef00cd711 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -484,7 +484,10 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 	if (!p)
 		return -ENOMEM;
 
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind) {
 		kfree(p);
 		return 0;
@@ -494,6 +497,7 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
 		ret = tcf_idr_create(tn, parm->index, est, a, &act_ife_ops,
 				     bind, true);
 		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			kfree(p);
 			return ret;
 		}
diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 85e85dfba401..0dc787a57798 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -119,13 +119,18 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
 	if (tb[TCA_IPT_INDEX] != NULL)
 		index = nla_get_u32(tb[TCA_IPT_INDEX]);
 
-	exists = tcf_idr_check(tn, index, a, bind);
+	err = tcf_idr_check_alloc(tn, &index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
 	if (tb[TCA_IPT_HOOK] == NULL || tb[TCA_IPT_TARG] == NULL) {
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, index);
 		return -EINVAL;
 	}
 
@@ -133,14 +138,18 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
 	if (nla_len(tb[TCA_IPT_TARG]) < td->u.target_size) {
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, index);
 		return -EINVAL;
 	}
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, index, est, a, ops, bind,
 				     false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
 	} else {
 		if (bind)/* dont override defaults */
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index e08aed06d7f8..6afd89a36c69 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -79,7 +79,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	struct tcf_mirred *m;
 	struct net_device *dev;
 	bool exists = false;
-	int ret;
+	int ret, err;
 
 	if (!nla) {
 		NL_SET_ERR_MSG_MOD(extack, "Mirred requires attributes to be passed");
@@ -94,7 +94,10 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	}
 	parm = nla_data(tb[TCA_MIRRED_PARMS]);
 
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
@@ -107,6 +110,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	default:
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, parm->index);
 		NL_SET_ERR_MSG_MOD(extack, "Unknown mirred option");
 		return -EINVAL;
 	}
@@ -115,6 +120,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 		if (dev == NULL) {
 			if (exists)
 				tcf_idr_release(*a, bind);
+			else
+				tcf_idr_cleanup(tn, parm->index);
 			return -ENODEV;
 		}
 		mac_header_xmit = dev_is_mac_header_xmit(dev);
@@ -124,13 +131,16 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 
 	if (!exists) {
 		if (!dev) {
+			tcf_idr_cleanup(tn, parm->index);
 			NL_SET_ERR_MSG_MOD(extack, "Specified device does not exist");
 			return -EINVAL;
 		}
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_mirred_ops, bind, true);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
 	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c
index 1f91e8e66c0f..4dd9188a72fd 100644
--- a/net/sched/act_nat.c
+++ b/net/sched/act_nat.c
@@ -57,19 +57,24 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est,
 		return -EINVAL;
 	parm = nla_data(tb[TCA_NAT_PARMS]);
 
-	if (!tcf_idr_check(tn, parm->index, a, bind)) {
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!err) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_nat_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (err > 0) {
 		if (bind)
 			return 0;
 		if (!ovr) {
 			tcf_idr_release(*a, bind);
 			return -EEXIST;
 		}
+	} else {
+		return err;
 	}
 	p = to_tcf_nat(*a);
 
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 3a0e2f762f4e..cc8ffcd1ddb5 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -173,16 +173,20 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 	if (IS_ERR(keys_ex))
 		return PTR_ERR(keys_ex);
 
-	if (!tcf_idr_check(tn, parm->index, a, bind)) {
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (!err) {
 		if (!parm->nkeys) {
+			tcf_idr_cleanup(tn, parm->index);
 			NL_SET_ERR_MSG_MOD(extack, "Pedit requires keys to be passed");
 			ret = -EINVAL;
 			goto out_free;
 		}
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_pedit_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			goto out_free;
+		}
 		p = to_pedit(*a);
 		keys = kmalloc(ksize, GFP_KERNEL);
 		if (!keys) {
@@ -191,7 +195,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 			goto out_free;
 		}
 		ret = ACT_P_CREATED;
-	} else {
+	} else if (err > 0) {
 		if (bind)
 			goto out_free;
 		if (!ovr) {
@@ -207,6 +211,8 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
 				goto out_free;
 			}
 		}
+	} else {
+		return err;
 	}
 
 	spin_lock_bh(&p->tcf_lock);
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 99335cca739e..1f3192ea8df7 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -101,15 +101,20 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
 		return -EINVAL;
 
 	parm = nla_data(tb[TCA_POLICE_TBF]);
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, NULL, a,
 				     &act_police_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
 	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
diff --git a/net/sched/act_sample.c b/net/sched/act_sample.c
index a8582e1347db..3079e7be5bde 100644
--- a/net/sched/act_sample.c
+++ b/net/sched/act_sample.c
@@ -46,7 +46,7 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
 	struct tc_sample *parm;
 	struct tcf_sample *s;
 	bool exists = false;
-	int ret;
+	int ret, err;
 
 	if (!nla)
 		return -EINVAL;
@@ -59,15 +59,20 @@ static int tcf_sample_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_SAMPLE_PARMS]);
 
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_sample_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 		ret = ACT_P_CREATED;
 	} else if (!ovr) {
 		tcf_idr_release(*a, bind);
diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 2da47c682a30..aa51152e0066 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -100,21 +100,28 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
 		return -EINVAL;
 
 	parm = nla_data(tb[TCA_DEF_PARMS]);
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
 	if (tb[TCA_DEF_DATA] == NULL) {
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, parm->index);
 		return -EINVAL;
 	}
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_simp_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		d = to_defact(*a);
 		ret = alloc_defdata(d, tb[TCA_DEF_DATA]);
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 4616a2c1821f..86521a74ecdd 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -152,21 +152,28 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 
 	parm = nla_data(tb[TCA_SKBEDIT_PARMS]);
 
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
 	if (!flags) {
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, parm->index);
 		return -EINVAL;
 	}
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_skbedit_ops, bind, false);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		d = to_skbedit(*a);
 		ret = ACT_P_CREATED;
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index e844381af066..cdc6bacfb190 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -128,21 +128,28 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
 	if (parm->flags & SKBMOD_F_SWAPMAC)
 		lflags = SKBMOD_F_SWAPMAC;
 
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
 	if (!lflags) {
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, parm->index);
 		return -EINVAL;
 	}
 
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_skbmod_ops, bind, true);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		ret = ACT_P_CREATED;
 	} else if (!ovr) {
diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index ab5bf5c13f87..3ec585d58762 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -237,7 +237,10 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 	}
 
 	parm = nla_data(tb[TCA_TUNNEL_KEY_PARMS]);
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
@@ -325,7 +328,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 				     &act_tunnel_key_ops, bind, true);
 		if (ret) {
 			NL_SET_ERR_MSG(extack, "Cannot create TC IDR");
-			return ret;
+			goto err_out;
 		}
 
 		ret = ACT_P_CREATED;
@@ -364,6 +367,8 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 err_out:
 	if (exists)
 		tcf_idr_release(*a, bind);
+	else
+		tcf_idr_cleanup(tn, parm->index);
 	return ret;
 }
 
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 9b600faaccbb..ad37f308175a 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -134,7 +134,10 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	if (!tb[TCA_VLAN_PARMS])
 		return -EINVAL;
 	parm = nla_data(tb[TCA_VLAN_PARMS]);
-	exists = tcf_idr_check(tn, parm->index, a, bind);
+	err = tcf_idr_check_alloc(tn, &parm->index, a, bind);
+	if (err < 0)
+		return err;
+	exists = err;
 	if (exists && bind)
 		return 0;
 
@@ -146,12 +149,16 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 		if (!tb[TCA_VLAN_PUSH_VLAN_ID]) {
 			if (exists)
 				tcf_idr_release(*a, bind);
+			else
+				tcf_idr_cleanup(tn, parm->index);
 			return -EINVAL;
 		}
 		push_vid = nla_get_u16(tb[TCA_VLAN_PUSH_VLAN_ID]);
 		if (push_vid >= VLAN_VID_MASK) {
 			if (exists)
 				tcf_idr_release(*a, bind);
+			else
+				tcf_idr_cleanup(tn, parm->index);
 			return -ERANGE;
 		}
 
@@ -164,6 +171,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 			default:
 				if (exists)
 					tcf_idr_release(*a, bind);
+				else
+					tcf_idr_cleanup(tn, parm->index);
 				return -EPROTONOSUPPORT;
 			}
 		} else {
@@ -176,6 +185,8 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	default:
 		if (exists)
 			tcf_idr_release(*a, bind);
+		else
+			tcf_idr_cleanup(tn, parm->index);
 		return -EINVAL;
 	}
 	action = parm->v_action;
@@ -183,8 +194,10 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	if (!exists) {
 		ret = tcf_idr_create(tn, parm->index, est, a,
 				     &act_vlan_ops, bind, true);
-		if (ret)
+		if (ret) {
+			tcf_idr_cleanup(tn, parm->index);
 			return ret;
+		}
 
 		ret = ACT_P_CREATED;
 	} else if (!ovr) {
-- 
2.7.5

* [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (9 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 10/11] net: sched: atomically check-allocate action Vlad Buslov
@ 2018-07-05 14:24 ` Vlad Buslov
  2018-08-07 23:26   ` Cong Wang
  2018-07-07 11:41 ` [PATCH net-next v6 00/11] Modify action API for implementing lockless actions David Miller
  2018-07-08  3:43 ` David Miller
  12 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-05 14:24 UTC (permalink / raw)
  To: netdev
  Cc: davem, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn, Vlad Buslov

Act API used a linked list to pass a set of actions to functions. It is an
intrusive data structure that stores list nodes inside the action structure
itself, which means it is not safe to modify such a list concurrently.
However, the action API doesn't use any linked-list-specific operations on
this set of actions, so it can be safely refactored into a plain pointer array.

Refactor the action API to use an array of pointers to tc_actions instead of
a linked list. Change the type of the 'actions' argument of the exported
action init, destroy and dump functions.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
---
Changes from V5 to V6:
- When action is deleted, set pointer in actions array to NULL to
  prevent double freeing.

Changes from V4 to V5:
- Change action delete API to track actions that were deleted, to
  prevent releasing them on error.

Changes from V3 to V4:
- Reduce actions array size in tcf_action_init_1.
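
For readers unfamiliar with the pattern, the array convention used throughout
this patch (bounded by a maximum count, terminated early by the first NULL
slot, with slots cleared as they are consumed) can be sketched in plain
userspace C. This is only an illustration, not code from the patch:
struct action, MAX_ACTS and release_all() are hypothetical names standing in
for struct tc_action, TCA_ACT_MAX_PRIO and tcf_action_destroy().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTS 4 /* hypothetical bound, stands in for TCA_ACT_MAX_PRIO */

/* Hypothetical stand-in for struct tc_action; no list node is embedded. */
struct action {
	int refcnt;
};

/* Drop one reference from every action in a NULL-terminated array.
 * Each slot is cleared before the reference is dropped, so calling
 * this twice on the same array cannot release an action twice.
 * Returns the number of slots processed.
 */
static int release_all(struct action *acts[])
{
	int i;

	for (i = 0; i < MAX_ACTS && acts[i]; i++) {
		struct action *a = acts[i];

		acts[i] = NULL;
		a->refcnt--;
	}
	return i;
}
```

Because the array holds only pointers, each caller owns its own array and no
state shared between callers is modified while walking it, which is the
property the intrusive list lacked.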

 include/net/act_api.h |  7 ++--
 net/sched/act_api.c   | 89 +++++++++++++++++++++++++++++----------------------
 net/sched/cls_api.c   | 21 ++++--------
 3 files changed, 60 insertions(+), 57 deletions(-)

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8090de2edab7..683ce41053d9 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -168,19 +168,20 @@ static inline int tcf_idr_release(struct tc_action *a, bool bind)
 int tcf_register_action(struct tc_action_ops *a, struct pernet_operations *ops);
 int tcf_unregister_action(struct tc_action_ops *a,
 			  struct pernet_operations *ops);
-int tcf_action_destroy(struct list_head *actions, int bind);
+int tcf_action_destroy(struct tc_action *actions[], int bind);
 int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 		    int nr_actions, struct tcf_result *res);
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
-		    struct list_head *actions, size_t *attr_size,
+		    struct tc_action *actions[], size_t *attr_size,
 		    bool rtnl_held, struct netlink_ext_ack *extack);
 struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 				    struct nlattr *nla, struct nlattr *est,
 				    char *name, int ovr, int bind,
 				    bool rtnl_held,
 				    struct netlink_ext_ack *extack);
-int tcf_action_dump(struct sk_buff *skb, struct list_head *, int, int);
+int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[], int bind,
+		    int ref);
 int tcf_action_dump_old(struct sk_buff *skb, struct tc_action *a, int, int);
 int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int, int);
 int tcf_action_copy_stats(struct sk_buff *, struct tc_action *, int);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 9511502e1cbb..bf1c35f3deb6 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -657,13 +657,15 @@ int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
 }
 EXPORT_SYMBOL(tcf_action_exec);
 
-int tcf_action_destroy(struct list_head *actions, int bind)
+int tcf_action_destroy(struct tc_action *actions[], int bind)
 {
 	const struct tc_action_ops *ops;
-	struct tc_action *a, *tmp;
-	int ret = 0;
+	struct tc_action *a;
+	int ret = 0, i;
 
-	list_for_each_entry_safe(a, tmp, actions, list) {
+	for (i = 0; i < TCA_ACT_MAX_PRIO && actions[i]; i++) {
+		a = actions[i];
+		actions[i] = NULL;
 		ops = a->ops;
 		ret = __tcf_idr_release(a, bind, true);
 		if (ret == ACT_P_DELETED)
@@ -679,11 +681,12 @@ static int tcf_action_put(struct tc_action *p)
 	return __tcf_action_put(p, false);
 }
 
-static void tcf_action_put_lst(struct list_head *actions)
+static void tcf_action_put_many(struct tc_action *actions[])
 {
-	struct tc_action *a, *tmp;
+	int i;
 
-	list_for_each_entry_safe(a, tmp, actions, list) {
+	for (i = 0; i < TCA_ACT_MAX_PRIO && actions[i]; i++) {
+		struct tc_action *a = actions[i];
 		const struct tc_action_ops *ops = a->ops;
 
 		if (tcf_action_put(a))
@@ -735,14 +738,15 @@ tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
 }
 EXPORT_SYMBOL(tcf_action_dump_1);
 
-int tcf_action_dump(struct sk_buff *skb, struct list_head *actions,
+int tcf_action_dump(struct sk_buff *skb, struct tc_action *actions[],
 		    int bind, int ref)
 {
 	struct tc_action *a;
-	int err = -EINVAL;
+	int err = -EINVAL, i;
 	struct nlattr *nest;
 
-	list_for_each_entry(a, actions, list) {
+	for (i = 0; i < TCA_ACT_MAX_PRIO && actions[i]; i++) {
+		a = actions[i];
 		nest = nla_nest_start(skb, a->order);
 		if (nest == NULL)
 			goto nla_put_failure;
@@ -878,10 +882,9 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	if (TC_ACT_EXT_CMP(a->tcfa_action, TC_ACT_GOTO_CHAIN)) {
 		err = tcf_action_goto_chain_init(a, tp);
 		if (err) {
-			LIST_HEAD(actions);
+			struct tc_action *actions[] = { a, NULL };
 
-			list_add_tail(&a->list, &actions);
-			tcf_action_destroy(&actions, bind);
+			tcf_action_destroy(actions, bind);
 			NL_SET_ERR_MSG(extack, "Failed to init TC action chain");
 			return ERR_PTR(err);
 		}
@@ -899,9 +902,11 @@ struct tc_action *tcf_action_init_1(struct net *net, struct tcf_proto *tp,
 	return ERR_PTR(err);
 }
 
+/* Returns number of initialized actions or negative error. */
+
 int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		    struct nlattr *est, char *name, int ovr, int bind,
-		    struct list_head *actions, size_t *attr_size,
+		    struct tc_action *actions[], size_t *attr_size,
 		    bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
@@ -923,11 +928,12 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
 		}
 		act->order = i;
 		sz += tcf_action_fill_size(act);
-		list_add_tail(&act->list, actions);
+		/* Start from index 0 */
+		actions[i - 1] = act;
 	}
 
 	*attr_size = tcf_action_full_attrs_size(sz);
-	return 0;
+	return i - 1;
 
 err:
 	tcf_action_destroy(actions, bind);
@@ -978,7 +984,7 @@ int tcf_action_copy_stats(struct sk_buff *skb, struct tc_action *p,
 	return -1;
 }
 
-static int tca_get_fill(struct sk_buff *skb, struct list_head *actions,
+static int tca_get_fill(struct sk_buff *skb, struct tc_action *actions[],
 			u32 portid, u32 seq, u16 flags, int event, int bind,
 			int ref)
 {
@@ -1014,7 +1020,7 @@ static int tca_get_fill(struct sk_buff *skb, struct list_head *actions,
 
 static int
 tcf_get_notify(struct net *net, u32 portid, struct nlmsghdr *n,
-	       struct list_head *actions, int event,
+	       struct tc_action *actions[], int event,
 	       struct netlink_ext_ack *extack)
 {
 	struct sk_buff *skb;
@@ -1150,14 +1156,14 @@ static int tca_action_flush(struct net *net, struct nlattr *nla,
 	return err;
 }
 
-static int tcf_action_delete(struct net *net, struct list_head *actions,
-			     struct netlink_ext_ack *extack)
+static int tcf_action_delete(struct net *net, struct tc_action *actions[],
+			     int *acts_deleted, struct netlink_ext_ack *extack)
 {
-	struct tc_action *a, *tmp;
 	u32 act_index;
-	int ret;
+	int ret, i;
 
-	list_for_each_entry_safe(a, tmp, actions, list) {
+	for (i = 0; i < TCA_ACT_MAX_PRIO && actions[i]; i++) {
+		struct tc_action *a = actions[i];
 		const struct tc_action_ops *ops = a->ops;
 
 		/* Actions can be deleted concurrently so we must save their
@@ -1165,23 +1171,26 @@ static int tcf_action_delete(struct net *net, struct list_head *actions,
 		 */
 		act_index = a->tcfa_index;
 
-		list_del(&a->list);
 		if (tcf_action_put(a)) {
 			/* last reference, action was deleted concurrently */
 			module_put(ops->owner);
 		} else  {
 			/* now do the delete */
 			ret = ops->delete(net, act_index);
-			if (ret < 0)
+			if (ret < 0) {
+				*acts_deleted = i + 1;
 				return ret;
+			}
 		}
 	}
+	*acts_deleted = i;
 	return 0;
 }
 
 static int
-tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
-	       u32 portid, size_t attr_size, struct netlink_ext_ack *extack)
+tcf_del_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[],
+	       int *acts_deleted, u32 portid, size_t attr_size,
+	       struct netlink_ext_ack *extack)
 {
 	int ret;
 	struct sk_buff *skb;
@@ -1199,7 +1208,7 @@ tcf_del_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
 	}
 
 	/* now do the delete */
-	ret = tcf_action_delete(net, actions, extack);
+	ret = tcf_action_delete(net, actions, acts_deleted, extack);
 	if (ret < 0) {
 		NL_SET_ERR_MSG(extack, "Failed to delete TC action");
 		kfree_skb(skb);
@@ -1221,7 +1230,8 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
 	struct nlattr *tb[TCA_ACT_MAX_PRIO + 1];
 	struct tc_action *act;
 	size_t attr_size = 0;
-	LIST_HEAD(actions);
+	struct tc_action *actions[TCA_ACT_MAX_PRIO + 1] = {};
+	int acts_deleted = 0;
 
 	ret = nla_parse_nested(tb, TCA_ACT_MAX_PRIO, nla, NULL, extack);
 	if (ret < 0)
@@ -1243,26 +1253,27 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct nlmsghdr *n,
 		}
 		act->order = i;
 		attr_size += tcf_action_fill_size(act);
-		list_add_tail(&act->list, &actions);
+		actions[i - 1] = act;
 	}
 
 	attr_size = tcf_action_full_attrs_size(attr_size);
 
 	if (event == RTM_GETACTION)
-		ret = tcf_get_notify(net, portid, n, &actions, event, extack);
+		ret = tcf_get_notify(net, portid, n, actions, event, extack);
 	else { /* delete */
-		ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
+		ret = tcf_del_notify(net, n, actions, &acts_deleted, portid,
+				     attr_size, extack);
 		if (ret)
 			goto err;
 		return ret;
 	}
 err:
-	tcf_action_put_lst(&actions);
+	tcf_action_put_many(&actions[acts_deleted]);
 	return ret;
 }
 
 static int
-tcf_add_notify(struct net *net, struct nlmsghdr *n, struct list_head *actions,
+tcf_add_notify(struct net *net, struct nlmsghdr *n, struct tc_action *actions[],
 	       u32 portid, size_t attr_size, struct netlink_ext_ack *extack)
 {
 	struct sk_buff *skb;
@@ -1293,15 +1304,15 @@ static int tcf_action_add(struct net *net, struct nlattr *nla,
 {
 	size_t attr_size = 0;
 	int ret = 0;
-	LIST_HEAD(actions);
+	struct tc_action *actions[TCA_ACT_MAX_PRIO] = {};
 
-	ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0, &actions,
+	ret = tcf_action_init(net, NULL, nla, NULL, NULL, ovr, 0, actions,
 			      &attr_size, true, extack);
-	if (ret)
+	if (ret < 0)
 		return ret;
-	ret = tcf_add_notify(net, n, &actions, portid, attr_size, extack);
+	ret = tcf_add_notify(net, n, actions, portid, attr_size, extack);
 	if (ovr)
-		tcf_action_put_lst(&actions);
+		tcf_action_put_many(actions);
 
 	return ret;
 }
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 9041f0e43e9a..73d9967c3739 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1609,10 +1609,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 void tcf_exts_destroy(struct tcf_exts *exts)
 {
 #ifdef CONFIG_NET_CLS_ACT
-	LIST_HEAD(actions);
-
-	tcf_exts_to_list(exts, &actions);
-	tcf_action_destroy(&actions, TCA_ACT_UNBIND);
+	tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
 	kfree(exts->actions);
 	exts->nr_actions = 0;
 #endif
@@ -1639,18 +1636,15 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 			exts->actions[0] = act;
 			exts->nr_actions = 1;
 		} else if (exts->action && tb[exts->action]) {
-			LIST_HEAD(actions);
-			int err, i = 0;
+			int err;
 
 			err = tcf_action_init(net, tp, tb[exts->action],
 					      rate_tlv, NULL, ovr, TCA_ACT_BIND,
-					      &actions, &attr_size, true,
+					      exts->actions, &attr_size, true,
 					      extack);
-			if (err)
+			if (err < 0)
 				return err;
-			list_for_each_entry(act, &actions, list)
-				exts->actions[i++] = act;
-			exts->nr_actions = i;
+			exts->nr_actions = err;
 		}
 		exts->net = net;
 	}
@@ -1699,14 +1693,11 @@ int tcf_exts_dump(struct sk_buff *skb, struct tcf_exts *exts)
 		 * tc data even if iproute2  was newer - jhs
 		 */
 		if (exts->type != TCA_OLD_COMPAT) {
-			LIST_HEAD(actions);
-
 			nest = nla_nest_start(skb, exts->action);
 			if (nest == NULL)
 				goto nla_put_failure;
 
-			tcf_exts_to_list(exts, &actions);
-			if (tcf_action_dump(skb, &actions, 0, 0) < 0)
+			if (tcf_action_dump(skb, exts->actions, 0, 0) < 0)
 				goto nla_put_failure;
 			nla_nest_end(skb, nest);
 		} else if (exts->police) {
-- 
2.7.5

* Re: [PATCH net-next v6 00/11] Modify action API for implementing lockless actions
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (10 preceding siblings ...)
  2018-07-05 14:24 ` [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions Vlad Buslov
@ 2018-07-07 11:41 ` David Miller
  2018-07-08  3:43 ` David Miller
  12 siblings, 0 replies; 37+ messages in thread
From: David Miller @ 2018-07-07 11:41 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn

From: Vlad Buslov <vladbu@mellanox.com>
Date: Thu,  5 Jul 2018 17:24:22 +0300

> Currently, all netlink protocol handlers for updating rules, actions and
> qdiscs are protected with single global rtnl lock which removes any
> possibility for parallelism. This patch set is a first step to remove
> rtnl lock dependency from TC rules update path.

I've reviewed this a few times but since this is a rather non-trivial
set of changes I'm going to let others have a chance to review and
give feedback as well.

Thanks.

* Re: [PATCH net-next v6 00/11] Modify action API for implementing lockless actions
  2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
                   ` (11 preceding siblings ...)
  2018-07-07 11:41 ` [PATCH net-next v6 00/11] Modify action API for implementing lockless actions David Miller
@ 2018-07-08  3:43 ` David Miller
  2018-07-13  3:54   ` Cong Wang
  12 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2018-07-08  3:43 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel, kliteyn

From: Vlad Buslov <vladbu@mellanox.com>
Date: Thu,  5 Jul 2018 17:24:22 +0300

> Currently, all netlink protocol handlers for updating rules, actions and
> qdiscs are protected with single global rtnl lock which removes any
> possibility for parallelism. This patch set is a first step to remove
> rtnl lock dependency from TC rules update path.
 ...

I'll apply this for now, I reviewed it a few more times and I see
where you are going with this.

I hope there are no new performance regressions in the control path
for cases people care about, and if there are I definitely expect
you to address them.

Thank you.

* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-05 14:24 ` [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update Vlad Buslov
@ 2018-07-13  3:52   ` Cong Wang
  2018-07-13 13:30     ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-07-13  3:52 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Implement functions to atomically update and free action cookie
> using rcu mechanism.

Without stating any reason..... Is this even a changelog?

>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Dear Marcelo, how did it pass your review? See below:


> +static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
> +                                 struct tc_cookie *new_cookie)
> +{
> +       struct tc_cookie *old;
> +
> +       old = xchg(old_cookie, new_cookie);


This is an incorrect use of RCU, obviously should be rcu_assign_pointer()
here.


> @@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
>         free_percpu(p->cpu_bstats);
>         free_percpu(p->cpu_qstats);
>
> -       if (p->act_cookie) {
> -               kfree(p->act_cookie->data);
> -               kfree(p->act_cookie);
> -       }
> +       tcf_set_action_cookie(&p->act_cookie, NULL);

So, this is called in free_tcf(), where the action is already
invisible from readers so it is ready to be freed.

The question is:

If the action itself is already ready to be freed, why do you
need RCU here? What could still read 'act->act_cookie'
while 'act' is already invisible?

Its last refcnt is already gone, the fast path RCU readers
are gone too given filters use rcu work already.

Standalone action dump? Again, the last refcnt is already
gone.

Marcelo, Vlad, Jiri, please explain.

Thanks!

* Re: [PATCH net-next v6 00/11] Modify action API for implementing lockless actions
  2018-07-08  3:43 ` David Miller
@ 2018-07-13  3:54   ` Cong Wang
  2018-07-13 13:40     ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-07-13  3:54 UTC (permalink / raw)
  To: David Miller
  Cc: Vlad Buslov, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik

On Sat, Jul 7, 2018 at 8:43 PM David Miller <davem@davemloft.net> wrote:
>
> From: Vlad Buslov <vladbu@mellanox.com>
> Date: Thu,  5 Jul 2018 17:24:22 +0300
>
> > Currently, all netlink protocol handlers for updating rules, actions and
> > qdiscs are protected with single global rtnl lock which removes any
> > possibility for parallelism. This patch set is a first step to remove
> > rtnl lock dependency from TC rules update path.
>  ...
>
> I'll apply this for now, I reviewed it a few more times and I see
> where you are going with this.

Dear David,

I don't understand why you even believe the claim of lockless
updaters here; any claim of this kind should at least raise a red
flag.

I know you don't trust me, how about thinking it in this way:

Why does RCU still require a lock for RCU writers? (Or at least
RCU recommends a lock, if anyone really wants to point out some
lockless algorithm here.)

or:

If writers could really go lockless as easily as Vlad claims, how could
even Paul E. McKenney never bring it into RCU?

Maybe Vlad is much cleverer than any of us here, and maybe he really
discovers a very brilliant algorithm to allow TC actions to be updated
locklessly, why not wait until he shows a proof (either code or a paper)?
Is there a rush? I don't see it.

In fact, I discussed this with Vlad a little bit at the netdev TC workshop.
I did not see any brilliant algorithm in his slides, and he told me he
used "copy and replace" to achieve parallel updaters. I told him that
is basically how RCU works, and that RCU writers have to be sync'ed
with a lock (or at least that this is recommended).

Also, to confirm my judgement, I checked this with Paul privately too.
Paul said you have to be extremely careful to go lockless, it is very hard
to be bug free for lockless, although he _never_ says it is impossible.

My _personal_ bet is that lockless updates for TC filters or actions
are impossible unless there is more hiding behind "copy and
replace", for example, some brilliant lockless algorithm. If lockless is
really impossible in this circumstance, then many of your efforts in
this patchset are in vain, by the way.

I _do_ believe you can break RTNL down to per device, per filter or per
action, but no matter how small the locking scope is, there is still a lock.
With a lock, there is no need to make things friendly to lockless, like
making an integer increment inside an action to be atomic (your patch
02/11).

Please _do_ prove my personal judgement is wrong, by showing your
final code or a formal paper/article. I am very *happy* to be proved
to be wrong here, I am very open to change my mind here.

Vlad, we need your proof. Please prove I am wrong, seriously!!! :)

Thanks in advance to anyone who proves me wrong, just in case!!! :)

* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-13  3:52   ` Cong Wang
@ 2018-07-13 13:30     ` Vlad Buslov
  2018-07-13 21:51       ` Cong Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-13 13:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko


On Fri 13 Jul 2018 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Implement functions to atomically update and free action cookie
>> using rcu mechanism.
>
> Without stating any reason..... Is this even a changelog?

Yes, it is.

>
>>
>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>
> Dear Marcelo, how did it pass your review? See below:
>
>
>> +static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
>> +                                 struct tc_cookie *new_cookie)
>> +{
>> +       struct tc_cookie *old;
>> +
>> +       old = xchg(old_cookie, new_cookie);
>
>
> This is an incorrect use of RCU, obviously should be rcu_assign_pointer()
> here.

Could you please explain your concern in more detail? A similar pattern
is already widely used in the kernel for re-assigning rcu pointers. For
example, Eric Dumazet uses it in 1c0d32fde5bd ("net_sched:
gen_estimator: complete rewrite of rate estimators"):

void gen_kill_estimator(struct net_rate_estimator __rcu **rate_est)
{
	struct net_rate_estimator *est;

	est = xchg((__force struct net_rate_estimator **)rate_est, NULL);
	if (est) {
		del_timer_sync(&est->timer);
		kfree_rcu(est, rcu);
	}
}

Tom Herbert uses the same idiom in a8c5f90fb59a ("ip_tunnel: Ops
registration for secondary encap (fou, gue)"):

int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *ops,
			    unsigned int num)
{
	if (num >= MAX_IPTUN_ENCAP_OPS)
		return -ERANGE;

	return !cmpxchg((const struct ip_tunnel_encap_ops **)
			&iptun_encaps[num],
			NULL, ops) ? 0 : -1;
}

Again, Eric uses xchg to re-assign rcu pointer in 45f6fad84cc3 ("ipv6:
add complete rcu protection around np->opt"):

struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
					   struct ipv6_txoptions *opt)
{
	if (inet_sk(sk)->is_icsk) {
		if (opt &&
		    !((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
		    inet_sk(sk)->inet_daddr != LOOPBACK4_IPV6) {
			struct inet_connection_sock *icsk = inet_csk(sk);
			icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
			icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
		}
	}
	opt = xchg((__force struct ipv6_txoptions **)&inet6_sk(sk)->opt,
		   opt);
	sk_dst_reset(sk);

	return opt;
}

>
>
>> @@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
>>         free_percpu(p->cpu_bstats);
>>         free_percpu(p->cpu_qstats);
>>
>> -       if (p->act_cookie) {
>> -               kfree(p->act_cookie->data);
>> -               kfree(p->act_cookie);
>> -       }
>> +       tcf_set_action_cookie(&p->act_cookie, NULL);
>
> So, this is called in free_tcf(), where the action is already
> invisible from readers so it is ready to be freed.
>
> The question is:
>
> If the action itself is already ready to be freed, why do you
> need RCU here? What could still read 'act->act_cookie'
> while 'act' is already invisible?
>
> Its last refcnt is already gone, the fast path RCU readers
> are gone too given filters use rcu work already.
>
> Standalone action dump? Again, the last refcnt is already
> gone.

It is not necessary here. I just reused tcf_set_action_cookie(), which
already implements cookie pointer cleanup, to avoid code duplication.
I'm open to changing it if you are concerned about the performance
impact of using an atomic operation to re-assign the cookie pointer.

>
> Marcelo, Vlad, Jiri, please explain.
>
> Thanks!

Thank you for reviewing my code!

* Re: [PATCH net-next v6 00/11] Modify action API for implementing lockless actions
  2018-07-13  3:54   ` Cong Wang
@ 2018-07-13 13:40     ` Vlad Buslov
  0 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-07-13 13:40 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Miller, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik


On Fri 13 Jul 2018 at 03:54, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sat, Jul 7, 2018 at 8:43 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Vlad Buslov <vladbu@mellanox.com>
>> Date: Thu,  5 Jul 2018 17:24:22 +0300
>>
>> > Currently, all netlink protocol handlers for updating rules, actions and
>> > qdiscs are protected with single global rtnl lock which removes any
>> > possibility for parallelism. This patch set is a first step to remove
>> > rtnl lock dependency from TC rules update path.
>>  ...
>>
>> I'll apply this for now, I reviewed it a few more times and I see
>> where you are going with this.
>
> Dear David,
>
> I don't understand why you even believe the claim of lockless
> updaters here, it at least should raise a red flag when you see any
> kinda of this claim.
>
> I know you don't trust me, how about thinking it in this way:
>
> Why does RCU still require a lock for RCU writers? (Or at least
> RCU recommends a lock, if anyone really wants to point out some
> lockless algorithm here.)
>
> or:
>
> If writers could really go lockless as easily as Vlad claims, how could
> even Paul E. McKenney never bring it into RCU?
>
> Maybe Vlad is much cleverer than any of us here, and maybe he really
> discovers a very brilliant algorithm to allow TC actions to be updated
> locklessly, why not wait until he shows a proof (either code or a paper)?
> Is there a rush? I don't see it.
>
> In fact, I discussed this with Vlad a little bit at netdev TC workshop.
> I never see any brilliant algorithm from him from his slides, and I was
> told by him he used "copy and replace" to achieve parallel updaters, I
> told him that is basically how RCU works and RCU writers have to be
> sync'ed with a lock (or at least recommended).
>
> Also, to confirm my judgement, I checked this with Paul privately too.
> Paul said you have to be extremely careful to go lockless, it is very hard
> to be bug free for lockless, although he _never_ says it is impossible.
>
> My _personal_ bet is that, lockless updates for TC filters or actions
> are impossible unless there are more things hiding behind "copy and
> replace", for example, some brilliant lockless algorithm. If lockless is
> really impossible in this circumstance, then many of your efforts in
> this patchset are vain, by the way.
>
> I _do_ believe you can break RTNL down to per device, per filter or per
> action, but no matter how small the locking scope is, there is still a lock.
> With a lock, there is no need to make things friendly to lockless, like
> making an integer increment inside an action to be atomic (your patch
> 02/11).
>
> Please _do_ prove my personal judgement is wrong, by showing your
> final code or a formal paper/article. I am very *happy* to be proved
> to be wrong here, I am very open to change my mind here.
>
> Vlad, we need your proof. Please prove I am wrong, seriously!!! :)
>
> Thanks to anyone for proving me I am wrong just in case!!! :)

Dear Cong,

I never claimed to have some new brilliant algorithm that completely
removes all locks from the rules update path. Obviously, fine-grained
locking is introduced where necessary. I'm sorry if my liberal usage of
the term "lockless" confused you. I guess I should have been more
specific. I fully agree with you that totally removing any and all locks
from the rules update path would require some engineering marvel.

* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-13 13:30     ` Vlad Buslov
@ 2018-07-13 21:51       ` Cong Wang
  2018-07-13 22:11         ` David Miller
  2018-07-16  8:31         ` Vlad Buslov
  0 siblings, 2 replies; 37+ messages in thread
From: Cong Wang @ 2018-07-13 21:51 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Fri, Jul 13, 2018 at 6:30 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Fri 13 Jul 2018 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Implement functions to atomically update and free action cookie
> >> using rcu mechanism.
> >
> > Without stating any reason..... Is this even a changelog?
>
> Yes, it is.

What do you expect in a changelog generally? Repeating what
your code does? Thanks but we don't even want to read any code
unless the need of this code is reasonably justified.

Can we at least agree you have no justification for this change
in this changelog? Or you believe this patch is as trivial as
a white space change which doesn't need a justification?


>
> >
> >>
> >> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> >
> > Dear Marcelo, how did it pass your review? See below:
> >
> >
> >> +static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
> >> +                                 struct tc_cookie *new_cookie)
> >> +{
> >> +       struct tc_cookie *old;
> >> +
> >> +       old = xchg(old_cookie, new_cookie);
> >
> >
> > This is an incorrect use of RCU, obviously should be rcu_assign_pointer()
> > here.
>
> Could you please explain your concern in more details? Similar pattern
> is already widely used in kernel for re-assigning rcu pointers. For

My reasoning is simple: searching whatisRCU.txt for "xchg",
I find nothing. :)

Here is the link:
https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt

Of course, both xchg() and rcu_assign_pointer() are meant to assign
a pointer. But even without looking into their
implementations, there must be a reason for rcu_assign_pointer() to
exist, right? Can we agree on that, or do you believe rcu_assign_pointer()
can eventually be replaced by xchg() and removed?

This also means you need to justify your pick of xchg() in your
changelog, where there is literally nothing.


> example, Eric Dumazet uses it in 1c0d32fde5bd ("net_sched:
> gen_estimator: complete rewrite of rate estimators"):
>
> void gen_kill_estimator(struct net_rate_estimator __rcu **rate_est)
> {
>         struct net_rate_estimator *est;
>
>         est = xchg((__force struct net_rate_estimator **)rate_est, NULL);
>         if (est) {
>                 del_timer_sync(&est->timer);
>                 kfree_rcu(est, rcu);
>         }
> }

In this case, *I think* the only reason is the burden of the API
gen_kill_estimator(). It aims to be a core API for both netfilter and
net_sched, therefore it _has to_ provide a wrapper for its users,
because otherwise each user has to repeat rcu_assign_pointer()
+ del_timer_sync() + kfree_rcu(), just a matter of duplication.

Apparently this rule does NOT apply to your case, where
tcf_set_action_cookie() is merely a static function called by two
users in the same C file, and without anything but a call_rcu().


>
> Tom Herbert uses same idiom in a8c5f90fb59a ("ip_tunnel: Ops
> registration for secondary encap (fou, gue)"):
>
> int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *ops,
>                             unsigned int num)
> {
>         if (num >= MAX_IPTUN_ENCAP_OPS)
>                 return -ERANGE;
>
>         return !cmpxchg((const struct ip_tunnel_encap_ops **)
>                         &iptun_encaps[num],
>                         NULL, ops) ? 0 : -1;
> }


First of all, cmpxchg() is completely different from xchg().

In this case, the caller needs to know whether this cmpxchg() failed.
How could this even be related to your case, given that
tcf_set_action_cookie() returns void?


>
> Again, Eric uses xchg to re-assign rcu pointer in 45f6fad84cc3 ("ipv6:
> add complete rcu protection around np->opt"):
>
> struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
>                                            struct ipv6_txoptions *opt)
> {
>         if (inet_sk(sk)->is_icsk) {
>                 if (opt &&
>                     !((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
>                     inet_sk(sk)->inet_daddr != LOOPBACK4_IPV6) {
>                         struct inet_connection_sock *icsk = inet_csk(sk);
>                         icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
>                         icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
>                 }
>         }
>         opt = xchg((__force struct ipv6_txoptions **)&inet6_sk(sk)->opt,
>                    opt);
>         sk_dst_reset(sk);


In this case, it is the caller's requirement. The callers of
ipv6_update_options() want to get the old pointer and do
something about it:

                opt = ipv6_update_options(sk, opt);
                if (opt) {
                        atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
                        txopt_put(opt);
                }

It should be functionally equivalent to saving the old pointer
before a rcu_assign_pointer().

So, this does NOT apply to your case either, you only call
call_rcu() and the callers require nothing.
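
As a rough illustration of the equivalence claimed above, here is a
userspace sketch using C11 atomics rather than kernel primitives. The
names struct opts, update_xchg() and update_load_store() are
hypothetical and are not taken from the patch or from the kernel.
atomic_exchange() stands in for xchg(), and a release store stands in,
approximately, for rcu_assign_pointer(). The two variants return the
same old pointer only as long as writers are serialized by some
external lock; without one, two racing callers of the second variant
could both observe the same old pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an RCU-protected options object. */
struct opts {
	int tot_len;
};

/*
 * Variant A: a single atomic read-modify-write. Each previous pointer is
 * handed to exactly one caller, even if several writers race.
 */
static struct opts *update_xchg(struct opts *_Atomic *slot,
				struct opts *new_opts)
{
	return atomic_exchange(slot, new_opts);
}

/*
 * Variant B: save the old pointer, then publish the new one with a release
 * store. This matches variant A only when writers are serialized by an
 * external lock (assumed here, not shown).
 */
static struct opts *update_load_store(struct opts *_Atomic *slot,
				      struct opts *new_opts)
{
	struct opts *old = atomic_load_explicit(slot, memory_order_relaxed);

	atomic_store_explicit(slot, new_opts, memory_order_release);
	return old;
}
```

With a single writer, calling either helper on a slot that holds &a and
passing &b returns &a and leaves &b published.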


>
>         return opt;
> }
>
> >
> >
> >> @@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
> >>         free_percpu(p->cpu_bstats);
> >>         free_percpu(p->cpu_qstats);
> >>
> >> -       if (p->act_cookie) {
> >> -               kfree(p->act_cookie->data);
> >> -               kfree(p->act_cookie);
> >> -       }
> >> +       tcf_set_action_cookie(&p->act_cookie, NULL);
> >
> > So, this is called in free_tcf(), where the action is already
> > invisible from readers so it is ready to be freed.
> >
> > The question is:
> >
> > If the action itself is already ready to be freed, why do you
> > need RCU here? What could still read 'act->act_cookie'
> > while 'act' is already invisible?
> >
> > Its last refcnt is already gone, the fast path RCU readers
> > are gone too given filters use rcu work already.
> >
> > Standalone action dump? Again, the last refcnt is already
> > gone.
>
> It is not necessary here, I just used tcf_set_action_cookie() that
> already implements cookie pointer cleanup to prevent code duplication.
> I'm open to changing it, if you concerned with performance impact of
> using atomic operation for re-assigning cookie pointer.

Yeah, totally understand. But ->act_cookie is very special here:
it requires no copying on update, unlike the normal cases.

This means we can remove the "C" from RCU here. I know
you still copy it when dumping it, but that is a part of Read, not
a part of Update, so it is safe to say you only need R and U
here.

Which in turn means two things:

1. You don't have to use RCU anymore.

2. You don't need a lock for writers given there is no copy
during update, if you still stick to RCU.

This is why I keep saying you need to justify it, it is not trivial
and it is not easy to understand either.

Thanks!

* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-13 21:51       ` Cong Wang
@ 2018-07-13 22:11         ` David Miller
  2018-07-14  0:14           ` Cong Wang
  2018-07-16  8:31         ` Vlad Buslov
  1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2018-07-13 22:11 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: vladbu, netdev, jhs, jiri, ast, daniel, kliteyn, jiri

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Fri, 13 Jul 2018 14:51:15 -0700

> Can we at least agree you have no justification for this change in
> this changelog?

He stated that he wishes to make this subsystem more lockless, and he
cannot do that without making the action cookie handling use RCU.

I agree with the stated goal, and the necessity of this kind of change.

Therefore I applied the patch.

I really don't see what the problem is.

I also gave a couple days for this patch set to get reviewed.  If you
have a problem, please respond to the patch posting.  When I see nobody
is reviewing, that is when I step in and make my own judgment.

So when you want your objection to be heard, please do so in a timely
manner.  That helps all of us.

Thank you.

* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-13 22:11         ` David Miller
@ 2018-07-14  0:14           ` Cong Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Cong Wang @ 2018-07-14  0:14 UTC (permalink / raw)
  To: David Miller
  Cc: Vlad Buslov, Linux Kernel Network Developers, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Fri, Jul 13, 2018 at 3:11 PM David Miller <davem@davemloft.net> wrote:
>
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Fri, 13 Jul 2018 14:51:15 -0700
>
> > Can we at least agree you have no justification for this change in
> > this changelog?
>
> He stated that he wishes to make this subsystem more lockless, and he
> cannot do that without making the action cookie handling use RCU.

This isn't enough given RCU writers are recommended (subject
to exceptions) to have locks. Let's move this discussion on patch
00/11 where I provided more details. :)


>
> I agree with the stated goal, and the necessity of this kind of change.
>
> Therefore I applied the patch.
>
> I really don't see what the problem is.
>
> I also gave a couple days for this patch set to get reviewed.  If you
> have a problem, please respond to the patch posting.  When I see nobody
> is reviewing, that is when I step in and make my own judgment.
>
> So when you want your objection to be heard, please do so in a timely
> manner.  That helps all of us.

I 100% understand given how much workload you have. I am not even
saying to revert or something.

My only complaint is that the goal of going lockless is very hard or
nearly impossible to achieve, unless there is some secret hidden from me.
And I am trying to get it exposed in my response to 00/11, by offering
an opportunity to prove I am wrong! :)

The problem with this patch, 01/11, is trivial compared to the
discussion in 00/11, which is crucial to whether the whole patchset(s)
makes sense.

Thanks for taking care of it anyway!

* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-13 21:51       ` Cong Wang
  2018-07-13 22:11         ` David Miller
@ 2018-07-16  8:31         ` Vlad Buslov
  2018-07-17 20:46           ` Cong Wang
  1 sibling, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-07-16  8:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko


On Fri 13 Jul 2018 at 21:51, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Jul 13, 2018 at 6:30 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Fri 13 Jul 2018 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Implement functions to atomically update and free action cookie
>> >> using rcu mechanism.
>> >
>> > Without stating any reason..... Is this even a changelog?
>>
>> Yes, it is.
>
> What do you expect in a changelog generally? Repeating what
> your code does? Thanks but we don't even want to read any code
> unless the need of this code is reasonably justified.

In my cover letter:
 - Motivation for patchset is presented in first paragraph.
 - Problems that prevent us from removing rtnl lock dependency are
 described, problem 3 is about cookie pointer.
 - In implementation section, point 3 presents solution for that
 problem.

>
> Can we at least agree you have no justification for this change
> in this changelog? Or you believe this patch is as trivial as
> a white space change which doesn't need a justification?

Cong, from your last letter I understand that you want to have
justification specifically for using an atomic operation in this particular
patch. I agree with you that I should have explained it in more detail.
I found a lot of prior art for the same or similar atomic ops usage for rcu
pointers (examples in my previous mail) and assumed it to be trivial, but
now I understand that I was wrong in this case.

>
>
>>
>> >
>> >>
>> >> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> >
>> > Dear Marcelo, how did it pass your review? See below:
>> >
>> >
>> >> +static void tcf_set_action_cookie(struct tc_cookie __rcu **old_cookie,
>> >> +                                 struct tc_cookie *new_cookie)
>> >> +{
>> >> +       struct tc_cookie *old;
>> >> +
>> >> +       old = xchg(old_cookie, new_cookie);
>> >
>> >
>> > This is an incorrect use of RCU, obviously should be rcu_assign_pointer()
>> > here.
>>
>> Could you please explain your concern in more details? Similar pattern
>> is already widely used in kernel for re-assigning rcu pointers. For
>
> My reasoning is too simple: search whatisRCU.txt for "xchg",
> I find nothing. :)
>
> Here is the link:
> https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt
>
> Of course, both xchg() and rcu_assign_pointer() are aimed to make
> an assignment of a pointer. But even without looking into their
> implementations, there must be a reason for rcu_assign_pointer() to
> exist, right? Can we agree or you believe rcu_assign_pointer() can
> be replaced by xchg() and removed finally?
>
> This also means you need to justify your pick of xchg() in your
> changelog where there is nothing literally.

Now I understand that you want a justification for using xchg
specifically, and I do agree with you that it should have been described
in detail.

>
>
>> example, Eric Dumazet uses it in 1c0d32fde5bd ("net_sched:
>> gen_estimator: complete rewrite of rate estimators"):
>>
>> void gen_kill_estimator(struct net_rate_estimator __rcu **rate_est)
>> {
>>         struct net_rate_estimator *est;
>>
>>         est = xchg((__force struct net_rate_estimator **)rate_est, NULL);
>>         if (est) {
>>                 del_timer_sync(&est->timer);
>>                 kfree_rcu(est, rcu);
>>         }
>> }
>
> In this case, *I think* the only reason is the burden of the API
> gen_kill_estimator(). It aims to be a core API for both netfilter and
> net_sched, therefore it _has to_ provide a wrapper for its users,
> because otherwise each user has to repeat rcu_assign_pointer()
> + del_timer_sync() + kfree_rcu(), just a matter of duplication.
>
> Apparently this rule does NOT apply to your case, where
> tcf_set_action_cookie() is merely a static function called by two
> users in the same C file, and without anything but a call_rcu().
>
>
>>
>> Tom Herbert uses same idiom in a8c5f90fb59a ("ip_tunnel: Ops
>> registration for secondary encap (fou, gue)"):
>>
>> int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *ops,
>>                             unsigned int num)
>> {
>>         if (num >= MAX_IPTUN_ENCAP_OPS)
>>                 return -ERANGE;
>>
>>         return !cmpxchg((const struct ip_tunnel_encap_ops **)
>>                         &iptun_encaps[num],
>>                         NULL, ops) ? 0 : -1;
>> }
>
>
> cmpxchg() is completely different with xchg(), first of all.
>
> In this case, its caller expects if this cmpxchg() fails or not.
> How this could be even related to your case given
> tcf_set_action_cookie() returns void?
>
>
>>
>> Again, Eric uses xchg to re-assign rcu pointer in 45f6fad84cc3 ("ipv6:
>> add complete rcu protection around np->opt"):
>>
>> struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
>>                                            struct ipv6_txoptions *opt)
>> {
>>         if (inet_sk(sk)->is_icsk) {
>>                 if (opt &&
>>                     !((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
>>                     inet_sk(sk)->inet_daddr != LOOPBACK4_IPV6) {
>>                         struct inet_connection_sock *icsk = inet_csk(sk);
>>                         icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
>>                         icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
>>                 }
>>         }
>>         opt = xchg((__force struct ipv6_txoptions **)&inet6_sk(sk)->opt,
>>                    opt);
>>         sk_dst_reset(sk);
>
>
> In this case, it is the caller's requirement. The callers of
> ipv6_update_options() want to get the old pointer and do
> something about it:
>
>                 opt = ipv6_update_options(sk, opt);
>                 if (opt) {
>                         atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
>                         txopt_put(opt);
>                 }
>
> It should be functionally equivalent to saving the old pointer
> before a rcu_assign_pointer().
>
> So, this does NOT apply to your case either, you only call
> call_rcu() and the callers require nothing.
>
>
>>
>>         return opt;
>> }
>>
>> >
>> >
>> >> @@ -65,10 +83,7 @@ static void free_tcf(struct tc_action *p)
>> >>         free_percpu(p->cpu_bstats);
>> >>         free_percpu(p->cpu_qstats);
>> >>
>> >> -       if (p->act_cookie) {
>> >> -               kfree(p->act_cookie->data);
>> >> -               kfree(p->act_cookie);
>> >> -       }
>> >> +       tcf_set_action_cookie(&p->act_cookie, NULL);
>> >
>> > So, this is called in free_tcf(), where the action is already
>> > invisible from readers so it is ready to be freed.
>> >
>> > The question is:
>> >
>> > If the action itself is already ready to be freed, why do you
>> > need RCU here? What could still read 'act->act_cookie'
>> > while 'act' is already invisible?
>> >
>> > Its last refcnt is already gone, the fast path RCU readers
>> > are gone too given filters use rcu work already.
>> >
>> > Standalone action dump? Again, the last refcnt is already
>> > gone.
>>
>> It is not necessary here, I just used tcf_set_action_cookie() that
>> already implements cookie pointer cleanup to prevent code duplication.
>> I'm open to changing it, if you concerned with performance impact of
>> using atomic operation for re-assigning cookie pointer.
>
> Yeah, totally understand. But >act_cookie very special here,
> it requires no copying when update, unlike the normal cases.
>
> This means from RCU we can remove the "C" here. I know
> you still copy it when dumping it, but it is a part of Read, not
> a part of Update, so it is safe to say you only need R and U
> here.
>
> Which in turn means two things:
>
> 1. You don't have to use RCU anymore.
>
> 2. You don't need a lock for writers given there is no copy
> during update, if you still stick to RCU.
>
> This is why I keep saying you need to justify it, it is not trivial
> and it is not easy to understand either.

I agree with your analysis. I just want to use the rcu mechanism here to
manage the lifetime of the cookie pointer (protecting reads with the rcu
read lock and freeing the cookie after an rcu grace period). However,
because there is no actual copy phase, I omit locking on update and just
use atomic ops to substitute the cookie pointer in a concurrency-safe manner.
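
The update scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the patch itself: `struct cookie`, `cookie_new()`, `cookie_swap()` and `cookie_release()` are hypothetical names, C11 `atomic_exchange()` stands in for the kernel's `xchg()`, and the old cookie is freed immediately where the kernel would defer the free until readers are done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the action cookie; not the kernel's type. */
struct cookie {
	size_t len;
	unsigned char data[16];
};

/* Build a cookie from a byte string; NULL on oversize or allocation failure. */
static struct cookie *cookie_new(const void *data, size_t len)
{
	struct cookie *c;

	if (len > sizeof(c->data))
		return NULL;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	memcpy(c->data, data, len);
	c->len = len;
	return c;
}

/*
 * Publish new_cookie (may be NULL) and hand the previous pointer back.
 * The exchange is atomic, so two concurrent writers each receive a
 * different old pointer: nothing is freed twice and nothing leaks,
 * even though no writer-side lock is taken.
 */
static struct cookie *cookie_swap(struct cookie *_Atomic *slot,
				  struct cookie *new_cookie)
{
	assert(slot != NULL);
	return atomic_exchange(slot, new_cookie);
}

/*
 * Release a cookie returned by cookie_swap(). Assumption: the sketch has
 * no concurrent readers, so it frees at once; kernel code would defer
 * this until an rcu grace period has elapsed.
 */
static void cookie_release(struct cookie *old)
{
	free(old);
}
```

A writer would call `cookie_release(cookie_swap(&slot, new_cookie))`; passing NULL as the new cookie gives the cleanup case used when the action is freed.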

>
> Thanks!

Thank you for reviewing my code!


* Re: [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update
  2018-07-16  8:31         ` Vlad Buslov
@ 2018-07-17 20:46           ` Cong Wang
  0 siblings, 0 replies; 37+ messages in thread
From: Cong Wang @ 2018-07-17 20:46 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Mon, Jul 16, 2018 at 1:31 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Fri 13 Jul 2018 at 21:51, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Fri, Jul 13, 2018 at 6:30 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Fri 13 Jul 2018 at 03:52, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >> Implement functions to atomically update and free action cookie
> >> >> using rcu mechanism.
> >> >
> >> > Without stating any reason..... Is this even a changelog?
> >>
> >> Yes, it is.
> >
> > What do you expect in a changelog generally? Repeating what
> > your code does? Thanks but we don't even want to read any code
> > unless the need of this code is reasonably justified.
>
> In my cover letter:
>  - Motivation for patchset is presented in first paragraph.
>  - Problems that prevent us from removing rtnl lock dependency are
>  described, problem 3 is about cookie pointer.
>  - In implementation section, point 3 presents solution for that
>  problem.


Do you want to use the cover letter as a changelog for all patches in
your patchset? Seriously? :)

Every patch in your patchset is unique, because you are not fixing
a problem that can be expressed by a pattern.

Given how hard lockless code generally is, you probably can't even
find a pattern. If you really do, I am happy to learn!


>
> >
> > Can we at least agree you have no justification for this change
> > in this changelog? Or you believe this patch is as trivial as
> > a white space change which doesn't need a justification?
>
> Cong, from your last letter I understand that you want to have
> justification specifically for using atomic operation in this particular
> patch. I agree with you that I should have explained it in more details.
> I found a lot of prior art for same or similar atomic ops usage for rcu
> pointers(examples in my previous mail) and assumed it to be trivial, but
> now I understand that I was wrong in this case.

Thanks, glad we agree on this!

I expect to see a more detailed changelog in your future patches! :)


* Re: [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions
  2018-07-05 14:24 ` [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions Vlad Buslov
@ 2018-08-07 23:26   ` Cong Wang
  2018-08-08 11:41     ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-07 23:26 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik

On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>         attr_size = tcf_action_full_attrs_size(attr_size);
>
>         if (event == RTM_GETACTION)
> -               ret = tcf_get_notify(net, portid, n, &actions, event, extack);
> +               ret = tcf_get_notify(net, portid, n, actions, event, extack);
>         else { /* delete */
> -               ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
> +               ret = tcf_del_notify(net, n, actions, &acts_deleted, portid,
> +                                    attr_size, extack);
>                 if (ret)
>                         goto err;
>                 return ret;
>         }
>  err:
> -       tcf_action_put_lst(&actions);
> +       tcf_action_put_many(&actions[acts_deleted]);
>         return ret;

How does this even work?

You save an index in 'acts_deleted', but you pass &actions[acts_deleted]
to tcf_action_put_many(), which suggests you want to start from
where it failed. But inside tcf_action_put_many() the loop runs from 0
to TCA_ACT_MAX_PRIO, so isn't that at least an out-of-bounds access?


* Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-07-05 14:24 ` [PATCH net-next v6 10/11] net: sched: atomically check-allocate action Vlad Buslov
@ 2018-08-08  1:20   ` Cong Wang
  2018-08-08 12:06     ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-08  1:20 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Implement function that atomically checks if action exists and either takes
> reference to it, or allocates idr slot for action index to prevent
> concurrent allocations of actions with same index. Use EBUSY error pointer
> to indicate that idr slot is reserved.

A dumb question:

How could "concurrent allocations of actions with same index" happen
when you already take idrinfo->lock for the whole of tcf_idr_check_alloc()?

To me, only one allocation should succeed, and all others
should fail.

Maybe you are trying to prevent others from treating it like an existing
one, but in that case you can just hold idrinfo->lock for all idr operations.

And more importantly, the upper layer is able to tell whether it is a
creation or just a replace, so you don't have to check this in such a
complicated way.

IOW, all of this complicated code should not exist.


* Re: [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions
  2018-08-07 23:26   ` Cong Wang
@ 2018-08-08 11:41     ` Vlad Buslov
  2018-08-08 18:29       ` Cong Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-08-08 11:41 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik


On Tue 07 Aug 2018 at 23:26, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>         attr_size = tcf_action_full_attrs_size(attr_size);
>>
>>         if (event == RTM_GETACTION)
>> -               ret = tcf_get_notify(net, portid, n, &actions, event, extack);
>> +               ret = tcf_get_notify(net, portid, n, actions, event, extack);
>>         else { /* delete */
>> -               ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
>> +               ret = tcf_del_notify(net, n, actions, &acts_deleted, portid,
>> +                                    attr_size, extack);
>>                 if (ret)
>>                         goto err;
>>                 return ret;
>>         }
>>  err:
>> -       tcf_action_put_lst(&actions);
>> +       tcf_action_put_many(&actions[acts_deleted]);
>>         return ret;
>
> How does this even work?
>
> You save an index in 'acts_deleted', but you pass &actions[acts_deleted]
> to tcf_action_put_many(), which seems you want to start from
> where it fails, but inside tcf_action_put_many() it starts from 0
> to TCA_ACT_MAX_PRIO, out-of-bound access at least?

The actions array is declared with TCA_ACT_MAX_PRIO+1 elements and
initialized to NULL pointers. The loop inside tcf_action_put_many() has
two checks: one that the index is less than TCA_ACT_MAX_PRIO, and
another that the pointer is not NULL. In this case I rely on the extra NULL
pointer at the end of the actions array to prevent out-of-bounds access.


* Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-08-08  1:20   ` Cong Wang
@ 2018-08-08 12:06     ` Vlad Buslov
  2018-08-09 23:43       ` Cong Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-08-08 12:06 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko


On Wed 08 Aug 2018 at 01:20, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Implement function that atomically checks if action exists and either takes
>> reference to it, or allocates idr slot for action index to prevent
>> concurrent allocations of actions with same index. Use EBUSY error pointer
>> to indicate that idr slot is reserved.
>
> A dumb question:
>
> How could "concurrent allocations of actions with same index" happen
> as you already take idrinfo->lock for the whole
> tcf_idr_check_alloc()??

I guess my changelog is not precise enough in this description.
Let's look at the sequence of events when a new action is initialized:
1) tcf_idr_check_alloc() is called by action init.
2) idrinfo->lock is taken.
3) Lookup in idr is performed to determine if action with specified
index already exists.
4) EBUSY pointer is inserted to indicate that id is taken.
5) idrinfo->lock is released.
6) tcf_idr_check_alloc() returns to action init code.
7) New action is allocated and initialized.
8) tcf_idr_insert() is called.
9) idrinfo->lock is taken.
10) EBUSY pointer is substituted with pointer to new action.
11) idrinfo->lock is released.
12) tcf_idr_insert() returns.

So in this case "concurrent allocations of actions with same index"
means not an allocation with the same index during tcf_idr_check_alloc(),
but one during the period when idrinfo->lock is released (steps 6-8).
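
The reserve-then-insert sequence above can be sketched in userspace C. This is a simplified model under stated assumptions, not the patch's code: a fixed array stands in for the idr, a toy `atomic_flag` spinlock stands in for idrinfo->lock, `RESERVED` plays the role of the EBUSY error pointer, and `check_alloc()` and `insert()` are hypothetical names. The sketch reports a reserved slot to the caller as -EBUSY; what the real function does in that case is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 8
/* Placeholder marking a reserved index; stands in for the EBUSY pointer. */
#define RESERVED ((struct action *)-1)

struct action { int refcnt; };

struct table {
	atomic_flag lock;		/* toy spinlock in place of idrinfo->lock */
	struct action *slot[NSLOTS];
};

static void table_lock(struct table *t)
{
	while (atomic_flag_test_and_set(&t->lock))
		;			/* spin until the flag is released */
}

static void table_unlock(struct table *t)
{
	atomic_flag_clear(&t->lock);
}

/*
 * Steps 1-6. Returns 1 with a reference taken in *found if the action
 * exists, 0 if the index was free and is now reserved for the caller,
 * -EBUSY if another caller currently holds the reservation.
 */
static int check_alloc(struct table *t, unsigned int idx,
		       struct action **found)
{
	int ret;

	if (idx >= NSLOTS)
		return -EINVAL;
	table_lock(t);
	if (t->slot[idx] == RESERVED) {
		ret = -EBUSY;
	} else if (t->slot[idx]) {
		t->slot[idx]->refcnt++;
		*found = t->slot[idx];
		ret = 1;
	} else {
		t->slot[idx] = RESERVED;
		ret = 0;
	}
	table_unlock(t);
	return ret;
}

/* Steps 8-12: swap the placeholder for the fully initialized action. */
static void insert(struct table *t, unsigned int idx, struct action *a)
{
	table_lock(t);
	assert(idx < NSLOTS && t->slot[idx] == RESERVED);
	t->slot[idx] = a;
	table_unlock(t);
}
```

Between `check_alloc()` returning 0 and the matching `insert()`, the lock is not held, yet a second caller asking for the same index sees the placeholder rather than an empty slot, so it cannot also believe it is the creator.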

>
> For me, it should be only one allocation could succeed, all others
> should fail.

Correct! And this change is made specifically to enforce that rule.

Otherwise, multiple processes could try to create a new action with the
same id at the same time, and all processes that executed step 3 before
any process reached step 10 would "succeed" by overwriting each other's
action in the idr (and leak memory while doing so).

>
> Maybe you are trying to prevent others treat it like existing one,
> but in that case you can just hold the idinfo->lock for all idr operations.
>
> And more importantly, upper layer is able to tell it is a creation or
> just replace, you don't have to check this in this complicated way.
>
> IOW, all of these complicated code should not exist.

The original code was simpler and didn't involve a temporary EBUSY pointer.
This change was made at Jiri's request. He wanted to have a
unified API to be used by all actions and suggested this approach
specifically.


* Re: [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions
  2018-08-08 11:41     ` Vlad Buslov
@ 2018-08-08 18:29       ` Cong Wang
  2018-08-09  7:03         ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-08 18:29 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik

On Wed, Aug 8, 2018 at 4:41 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Tue 07 Aug 2018 at 23:26, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>         attr_size = tcf_action_full_attrs_size(attr_size);
> >>
> >>         if (event == RTM_GETACTION)
> >> -               ret = tcf_get_notify(net, portid, n, &actions, event, extack);
> >> +               ret = tcf_get_notify(net, portid, n, actions, event, extack);
> >>         else { /* delete */
> >> -               ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
> >> +               ret = tcf_del_notify(net, n, actions, &acts_deleted, portid,
> >> +                                    attr_size, extack);
> >>                 if (ret)
> >>                         goto err;
> >>                 return ret;
> >>         }
> >>  err:
> >> -       tcf_action_put_lst(&actions);
> >> +       tcf_action_put_many(&actions[acts_deleted]);
> >>         return ret;
> >
> > How does this even work?
> >
> > You save an index in 'acts_deleted', but you pass &actions[acts_deleted]
> > to tcf_action_put_many(), which seems you want to start from
> > where it fails, but inside tcf_action_put_many() it starts from 0
> > to TCA_ACT_MAX_PRIO, out-of-bound access at least?
>
> Actions array is declared to be TCA_ACT_MAX_PRIO+1 in size, and


Declaration doesn't matter at all, functions see it as a pure pointer
once you pass it as an argument.


> initialized to NULL pointers. In loop inside tcf_action_put_many() there
> are two checks: One is that index is less than TCA_ACT_MAX_PRIO and
> another one that pointer is not NULL. In this case I rely on extra NULL
> pointer at the end of actions array to prevent out-of-bound access.

True, but you pass &actions[acts_deleted] as the start of the array,
so inside it would be:

&actions[acts_deleted][0]...&actions[acts_deleted][MAX_PRIO]

So, overall, the range is:

actions[acts_deleted]...actions[acts_deleted + MAX_PRIO]

You have an out-of-bounds access when acts_deleted > 1.

And if acts_deleted == MAX_PRIO-1, then you don't have any
NULL pointer to rely on.
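
The index arithmetic in this argument can be checked with a small sketch. It models only the `i < TCA_ACT_MAX_PRIO` bound and deliberately ignores the NULL check in the loop condition, so it says nothing about whether a NULL entry stops the loop first. The value 32 for MAX_PRIO and the helper names are assumptions made for illustration; the array length of MAX_PRIO+1 is the one stated earlier in the thread.

```c
#include <assert.h>

enum {
	MAX_PRIO = 32,			/* assumed value of TCA_ACT_MAX_PRIO */
	ARRAY_LEN = MAX_PRIO + 1,	/* array size stated in the thread */
};

/*
 * Highest index of the original array that the loop could touch when it
 * is handed &actions[start] and only the i < MAX_PRIO bound stops it.
 */
static int last_index_by_bound(int start)
{
	assert(start >= 0 && start <= MAX_PRIO);
	return start + MAX_PRIO - 1;
}

/* Nonzero if the bound check alone would let the loop run past the array. */
static int bound_alone_overruns(int start)
{
	return last_index_by_bound(start) >= ARRAY_LEN;
}
```

With these numbers the bound alone is sufficient for a start offset of 0 or 1 and insufficient from 2 upward, which matches the "acts_deleted > 1" threshold above.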


* Re: [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions
  2018-08-08 18:29       ` Cong Wang
@ 2018-08-09  7:03         ` Vlad Buslov
  0 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-08-09  7:03 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik


On Wed 08 Aug 2018 at 18:29, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Aug 8, 2018 at 4:41 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Tue 07 Aug 2018 at 23:26, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>         attr_size = tcf_action_full_attrs_size(attr_size);
>> >>
>> >>         if (event == RTM_GETACTION)
>> >> -               ret = tcf_get_notify(net, portid, n, &actions, event, extack);
>> >> +               ret = tcf_get_notify(net, portid, n, actions, event, extack);
>> >>         else { /* delete */
>> >> -               ret = tcf_del_notify(net, n, &actions, portid, attr_size, extack);
>> >> +               ret = tcf_del_notify(net, n, actions, &acts_deleted, portid,
>> >> +                                    attr_size, extack);
>> >>                 if (ret)
>> >>                         goto err;
>> >>                 return ret;
>> >>         }
>> >>  err:
>> >> -       tcf_action_put_lst(&actions);
>> >> +       tcf_action_put_many(&actions[acts_deleted]);
>> >>         return ret;
>> >
>> > How does this even work?
>> >
>> > You save an index in 'acts_deleted', but you pass &actions[acts_deleted]
>> > to tcf_action_put_many(), which seems you want to start from
>> > where it fails, but inside tcf_action_put_many() it starts from 0
>> > to TCA_ACT_MAX_PRIO, out-of-bound access at least?
>>
>> Actions array is declared to be TCA_ACT_MAX_PRIO+1 in size, and
>
>
> Declaration doesn't matter at all, functions see it as a pure pointer
> once you pass it as an argument.
>
>
>> initialized to NULL pointers. In loop inside tcf_action_put_many() there
>> are two checks: One is that index is less than TCA_ACT_MAX_PRIO and
>> another one that pointer is not NULL. In this case I rely on extra NULL
>> pointer at the end of actions array to prevent out-of-bound access.
>
> True, but you pass &actions[acts_deleted] as the start of the array,
> so inside it would be:
>
> &actions[acts_deleted][0]...&actions[acts_deleted][MAX_PRIO]
>
> So, the overall of the result is:
>
> actions[acts_deleted]...actions[acts_deleted + MAX_PRIO]
>
> You have out-of-bound access when acts_deleted > 1.
>
> And if acts_deleted == MAX_PRIO-1, then you don't have any
> NULL pointer to rely on.

Let's look at the loop inside tcf_action_put_many():

	for (i = 0; i < TCA_ACT_MAX_PRIO && actions[i]; i++) {
		struct tc_action *a = actions[i];
		const struct tc_action_ops *ops = a->ops;

		if (tcf_action_put(a))
			module_put(ops->owner);
	}

In the case you highlighted I rely on the second condition: the pointer to
the action in the array is not NULL. As I already explained in my previous
email, by making the initial array TCA_ACT_MAX_PRIO+1 in size I ensure that
there is always a NULL pointer at the end of the sequence of actions pointed
to by the 'actions' pointer/array.
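
The sentinel argument can be exercised with a userspace sketch of the same loop shape. `struct action`, `put_many()` and the value 32 for MAX_PRIO are assumptions for illustration; the reference drop is reduced to a counter decrement, and the module handling of the real loop is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRIO 32	/* assumed value of TCA_ACT_MAX_PRIO */

struct action { int refcnt; };

/*
 * Same loop conditions as the quoted code: stop at MAX_PRIO entries or
 * at the first NULL pointer, whichever comes first. Returns the number
 * of actions visited.
 */
static int put_many(struct action *actions[])
{
	int i;

	for (i = 0; i < MAX_PRIO && actions[i]; i++) {
		assert(actions[i]->refcnt > 0);
		actions[i]->refcnt--;
	}
	return i;
}
```

If the caller's array has MAX_PRIO+1 slots and slot MAX_PRIO is never written, then `put_many(&actions[k])` for any k up to MAX_PRIO reaches that NULL slot no later than the bound would let it run past the array.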


* Re: [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops
  2018-07-05 14:24 ` [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops Vlad Buslov
@ 2018-08-09 19:38   ` Cong Wang
  2018-08-10  9:41     ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-09 19:38 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Extend action ops with 'delete' function. Each action type to implements
> its own delete function that doesn't depend on rtnl lock.
>
> Implement delete function that is required to delete actions without
> holding rtnl lock. Use action API function that atomically deletes action
> only if it is still in action idr. This implementation prevents concurrent
> threads from deleting same action twice.

I fail to understand why you introduce ops->delete(). It seems all
you want is to get tn->idrinfo, but you already have the tc_action
before calling ops->delete(), and tc_action has ->idrinfo...

Each type of action does the same thing too, that is, it just calls
tcf_idr_delete_index()...

This changelog sucks again: it claims this is for skipping the rtnl lock,
but you can skip the rtnl lock by just calling tcf_idr_delete_index()
directly too, so that is not a reason for adding ops->delete().


* Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-08-08 12:06     ` Vlad Buslov
@ 2018-08-09 23:43       ` Cong Wang
  2018-08-10 10:29         ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-09 23:43 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Wed, Aug 8, 2018 at 5:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Wed 08 Aug 2018 at 01:20, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Implement function that atomically checks if action exists and either takes
> >> reference to it, or allocates idr slot for action index to prevent
> >> concurrent allocations of actions with same index. Use EBUSY error pointer
> >> to indicate that idr slot is reserved.
> >
> > A dumb question:
> >
> > How could "concurrent allocations of actions with same index" happen
> > as you already take idrinfo->lock for the whole
> > tcf_idr_check_alloc()??
>
> I guess my changelog is not precise enough in this description.
> Let look into sequence of events of initialization of new action:
> 1) tcf_idr_check_alloc() is called by action init.
> 2) idrinfo->lock is taken.
> 3) Lookup in idr is performed to determine if action with specified
> index already exists.
> 4) EBUSY pointer is inserted to indicate that id is taken.
> 5) idrinfo->lock is released.
> 6) tcf_idr_check_alloc() returns to action init code.
> 7) New action is allocated and initialized.
> 8) tcf_idr_insert() is called.
> 9) idrinfo->lock is taken.
> 10) EBUSY pointer is substituted with pointer to new action.
> 11) idrinfo->lock is released.
> 12) tcf_idr_insert() returns.
>
> So in this case "concurrent allocations of actions with same index"
> means not the allocation with same index during tcf_idr_check_alloc(),
> but during the period when idrinfo->lock was released(6-8).

Yes but it is unnecessary:

a) When adding a new action, you can actually allocate and init it before
touching idrinfo, therefore the check and insert can be done in one step
instead of breaking it down into multiple steps, which means you only
need to acquire idrinfo->lock once.

b) When updating an existing action, it is slightly complicated.
However, you can still allocate a new one first, then find the old one
and copy it into the new one and finally replace it.

In summary, we can do the following:

1. always allocate a new action
2. acquire idrinfo->lock
3a. if it is an add operation: allocate a new ID and insert the new action
3b. if it is a replace operation: find the old one with ID, copy it into the
new one and replace it
4. release idrinfo->lock
5. If 3a or 3b fails, free the allocation. Otherwise succeed.
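
The five steps above can be sketched in userspace C as follows. This is a simplified model under stated assumptions: a fixed array replaces the idr, a toy `atomic_flag` spinlock replaces idrinfo->lock, the add path takes an explicit index instead of allocating a new ID, and `action_add()`, `action_replace()` and the `packets` field are hypothetical. Step 1 (allocation) and step 5 (freeing on failure) are left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 8

struct action {
	int param;	/* supplied by the caller on every add or replace */
	long packets;	/* state that a replace carries over from the old one */
};

struct table {
	atomic_flag lock;		/* toy spinlock in place of idrinfo->lock */
	struct action *slot[NSLOTS];
};

static void table_lock(struct table *t)
{
	while (atomic_flag_test_and_set(&t->lock))
		;
}

static void table_unlock(struct table *t)
{
	atomic_flag_clear(&t->lock);
}

/* Step 3a: insert a pre-allocated action, failing if the index is taken. */
static int action_add(struct table *t, unsigned int idx,
		      struct action *new_act)
{
	int ret = 0;

	assert(new_act != NULL);
	if (idx >= NSLOTS)
		return -EINVAL;
	table_lock(t);
	if (t->slot[idx])
		ret = -EEXIST;
	else
		t->slot[idx] = new_act;
	table_unlock(t);
	return ret;
}

/*
 * Step 3b: find the old action, copy its carried-over state into the
 * pre-allocated new one, and swap them under a single lock acquisition.
 * The old action is returned through *old for the caller to release.
 */
static int action_replace(struct table *t, unsigned int idx,
			  struct action *new_act, struct action **old)
{
	int ret = 0;

	assert(new_act != NULL);
	if (idx >= NSLOTS)
		return -EINVAL;
	table_lock(t);
	if (!t->slot[idx]) {
		ret = -ENOENT;
	} else {
		new_act->packets = t->slot[idx]->packets;
		*old = t->slot[idx];
		t->slot[idx] = new_act;
	}
	table_unlock(t);
	return ret;
}
```

On any negative return the caller still owns `new_act` and frees it, which corresponds to step 5.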

I know, the locking scope is now per netns rather than per action,
but this can be optimized for replacing, you can hold the old action
and then release the idrinfo->lock, as idr_replace() later doesn't
require idrinfo->lock AFAIK.

Is there anything I am missing here?


>
> >
> > For me, it should be only one allocation could succeed, all others
> > should fail.
>
> Correct! And this change is made specifically to enforce that rule.
>
> Otherwise, multiple processes could try to create new action with same
> id at the same time, and all processes that executed 3, before any
> process reached 10, will "succeed" by overwriting each others action in
> idr. (and leak memory while doing so)

I know, but again it doesn't look necessary for achieving the same goal.


>
> >
> > Maybe you are trying to prevent others treat it like existing one,
> > but in that case you can just hold the idinfo->lock for all idr operations.
> >
> > And more importantly, upper layer is able to tell it is a creation or
> > just replace, you don't have to check this in this complicated way.
> >
> > IOW, all of these complicated code should not exist.
>
> Original code was simpler and didn't involve temporary EBUSY pointer.
> This change was made according to Jiri's request. He wanted to have
> unified API to be used by all actions and suggested this approach
> specifically.

I will work on this, as this is aligned to my work to make
it RCU-complete.


* Re: [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops
  2018-08-09 19:38   ` Cong Wang
@ 2018-08-10  9:41     ` Vlad Buslov
  0 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-08-10  9:41 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko


On Thu 09 Aug 2018 at 19:38, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Extend action ops with 'delete' function. Each action type to implements
>> its own delete function that doesn't depend on rtnl lock.
>>
>> Implement delete function that is required to delete actions without
>> holding rtnl lock. Use action API function that atomically deletes action
>> only if it is still in action idr. This implementation prevents concurrent
>> threads from deleting same action twice.
>
> I fail to understand why you introduce ops->delete(), it seems all
> you want is getting the tn->idrinfo, but you already have tc_action
> before calling ops->delete(), and tc_action has ->idrinfo...
>
> Each type of action does the same too, that is, just calling
> tcf_idr_delete_index()...

I agree with your assessment. I should have implemented it by just calling
tcf_idr_delete_index() directly.

>
> This changelog sucks again, it claims for skipping rtnl lock,
> but you can skip rtnl lock by just calling tcf_idr_delete_index()
> directly too, it is not the reason for adding ops->delete().

My intention was to implement some generic and parallel-safe API that
could be used to implement delete for all actions. It turns out that, as
you noted, just calling tcf_idr_delete_index() is enough because any
special per-action delete code is already implemented by ops->cleanup().
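
The "delete only if it is still in the idr" behaviour mentioned in the changelog can be sketched in userspace C. This is an illustration under stated assumptions, not the body of tcf_idr_delete_index(): a fixed array replaces the idr, a toy `atomic_flag` spinlock replaces idrinfo->lock, and `delete_index()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 8

struct action { int index; };

struct table {
	atomic_flag lock;		/* toy spinlock in place of idrinfo->lock */
	struct action *slot[NSLOTS];
};

static void table_lock(struct table *t)
{
	while (atomic_flag_test_and_set(&t->lock))
		;
}

static void table_unlock(struct table *t)
{
	atomic_flag_clear(&t->lock);
}

/*
 * Look up and unlink under one lock acquisition. Exactly one caller
 * gets the action back through *removed; every later caller for the
 * same index finds the slot empty and gets -ENOENT, so the action
 * cannot be torn down twice.
 */
static int delete_index(struct table *t, unsigned int idx,
			struct action **removed)
{
	int ret = -ENOENT;

	assert(removed != NULL);
	if (idx >= NSLOTS)
		return -EINVAL;
	table_lock(t);
	if (t->slot[idx]) {
		*removed = t->slot[idx];
		t->slot[idx] = NULL;
		ret = 0;
	}
	table_unlock(t);
	return ret;
}
```

Because the lookup and the unlink happen in the same critical section, the function needs no help from a per-action callback to stay safe against concurrent deletes of the same index.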


* Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-08-09 23:43       ` Cong Wang
@ 2018-08-10 10:29         ` Vlad Buslov
  2018-08-10 21:45           ` Cong Wang
  0 siblings, 1 reply; 37+ messages in thread
From: Vlad Buslov @ 2018-08-10 10:29 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko


On Thu 09 Aug 2018 at 23:43, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Aug 8, 2018 at 5:06 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Wed 08 Aug 2018 at 01:20, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Implement function that atomically checks if action exists and either takes
>> >> reference to it, or allocates idr slot for action index to prevent
>> >> concurrent allocations of actions with same index. Use EBUSY error pointer
>> >> to indicate that idr slot is reserved.
>> >
>> > A dumb question:
>> >
>> > How could "concurrent allocations of actions with same index" happen
>> > as you already take idrinfo->lock for the whole
>> > tcf_idr_check_alloc()??
>>
>> I guess my changelog is not precise enough in this description.
>> Let look into sequence of events of initialization of new action:
>> 1) tcf_idr_check_alloc() is called by action init.
>> 2) idrinfo->lock is taken.
>> 3) Lookup in idr is performed to determine if action with specified
>> index already exists.
>> 4) EBUSY pointer is inserted to indicate that id is taken.
>> 5) idrinfo->lock is released.
>> 6) tcf_idr_check_alloc() returns to action init code.
>> 7) New action is allocated and initialized.
>> 8) tcf_idr_insert() is called.
>> 9) idrinfo->lock is taken.
>> 10) EBUSY pointer is substituted with pointer to new action.
>> 11) idrinfo->lock is released.
>> 12) tcf_idr_insert() returns.
>>
>> So in this case "concurrent allocations of actions with same index"
>> means not the allocation with same index during tcf_idr_check_alloc(),
>> but during the period when idrinfo->lock was released(6-8).
>
> Yes but it is unnecessary:
>
> a) When adding a new action, you can actually allocate and init it before
> touching idrinfo, therefore the check and insert can be done in one step
> instead of breaking down it into multiple steps, which means you can
> acquire idrinfo->lock once.
>
> b) When updating an existing action, it is slightly complicated.
> However, you can still allocate a new one first, then find the old one
> and copy it into the new one and finally replace it.
>
> In summary, we can do the following:
>
> 1. always allocate a new action
> 2. acquire idrinfo->lock
> 3a. if it is an add operation: allocate a new ID and insert the new action
> 3b. if it is a replace operation: find the old one with ID, copy it into the
> new one and replace it
> 4. release idrinfo->lock
> 5. If 3a or 3b fails, free the allocation. Otherwise succeed.
>
> I know, the locking scope is now per netns rather than per action,
> but this can be optimized for replacing, you can hold the old action
> and then release the idrinfo->lock, as idr_replace() later doesn't
> require idrinfo->lock AFAIK.
>
> Is there anything I miss here?

The approach you suggest is valid, but has its own trade-offs:

- As you noted, lock granularity becomes coarse-grained due to per-netns
scope.

- I am not sure it is possible to call idr_replace() without obtaining
idrinfo->lock in this particular case. Concurrent delete of action with
same id is possible and, according to idr_replace() description,
unlocked execution is not supported for such use-case:

/**
 * idr_replace() - replace pointer for given ID.
 * @idr: IDR handle.
 * @ptr: New pointer to associate with the ID.
 * @id: ID to change.
 *
 * Replace the pointer registered with an ID and return the old value.
 * This function can be called under the RCU read lock concurrently with
 * idr_alloc() and idr_remove() (as long as the ID being removed is not
 * the one being replaced!).
 *
 * Returns: the old value on success.  %-ENOENT indicates that @id was not
 * found.  %-EINVAL indicates that @ptr was not valid.
 */

- A high rate of replace requests will generate a lot of unnecessary memory
allocations and deallocations.

>
>
>>
>> >
>> > For me, it should be only one allocation could succeed, all others
>> > should fail.
>>
>> Correct! And this change is made specifically to enforce that rule.
>>
>> Otherwise, multiple processes could try to create new action with same
>> id at the same time, and all processes that executed 3, before any
>> process reached 10, will "succeed" by overwriting each others action in
>> idr. (and leak memory while doing so)
>
> I know but again it doesn't look necessary to achieve a same goal.
>
>
>>
>> >
>> > Maybe you are trying to prevent others treat it like existing one,
>> > but in that case you can just hold the idinfo->lock for all idr operations.
>> >
>> > And more importantly, upper layer is able to tell it is a creation or
>> > just replace, you don't have to check this in this complicated way.
>> >
>> > IOW, all of these complicated code should not exist.
>>
>> Original code was simpler and didn't involve temporary EBUSY pointer.
>> This change was made according to Jiri's request. He wanted to have
>> unified API to be used by all actions and suggested this approach
>> specifically.
>
> I will work on this, as this is aligned to my work to make
> it RCU-complete.


* Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-08-10 10:29         ` Vlad Buslov
@ 2018-08-10 21:45           ` Cong Wang
  2018-08-13  7:55             ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-10 21:45 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Fri, Aug 10, 2018 at 3:29 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Approach you suggest is valid, but has its own trade-offs:
>
> - As you noted, lock granularity becomes coarse-grained due to per-netns
> scope.

Sure, you acquire idrinfo->lock too; the only difference is how long
you hold it.

The bottleneck of your approach is the same. Also, you take idrinfo->lock
twice, so the contention is heavier.


>
> - I am not sure it is possible to call idr_replace() without obtaining
> idrinfo->lock in this particular case. Concurrent delete of action with
> same id is possible and, according to idr_replace() description,
> unlocked execution is not supported for such use-case:

But we can hold its refcnt before releasing idrinfo->lock, so
idr_replace() can't race with concurrent delete.


>
> - High rate or replace request will generate a lot of unnecessary memory
> allocations and deallocations.
>

Yes, this is literally how RCU works: always allocate and copy,
release upon error.

Also, if this is really a problem, we have SLAB_TYPESAFE_BY_RCU
too. ;)
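
The "allocate and copy, release upon error" pattern can be sketched in userspace as below. This is a minimal model under assumptions, not kernel code: `struct params`, `update_prio()` and `current_prio()` are hypothetical, a single updater is assumed, and the old object is freed immediately where real RCU code would defer the free (for example with kfree_rcu(), or after synchronize_rcu()) until readers are done.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical action parameters. */
struct params {
    int prio;
    int mark;
};

/* Pointer that readers would dereference. */
_Atomic(struct params *) current_params;

/* Copy-update: the published object is never modified in place. */
int update_prio(int new_prio)
{
    struct params *old = atomic_load(&current_params);
    struct params *copy = malloc(sizeof(*copy));

    if (!copy)
        return -ENOMEM;
    if (old)
        *copy = *old;                 /* start from the current state */
    else
        *copy = (struct params){ 0 };

    if (new_prio < 0) {               /* validation failed: release the copy */
        free(copy);
        return -EINVAL;
    }
    copy->prio = new_prio;

    old = atomic_exchange(&current_params, copy);
    free(old);                        /* real RCU defers this past a grace period */
    return 0;
}

/* Reader side: returns -1 if nothing has been published yet. */
int current_prio(void)
{
    struct params *p = atomic_load(&current_params);

    return p ? p->prio : -1;
}
```

Every update costs one allocation and one free even when it fails validation, which is the overhead the quoted objection refers to.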


* Re: [PATCH net-next v6 10/11] net: sched: atomically check-allocate action
  2018-08-10 21:45           ` Cong Wang
@ 2018-08-13  7:55             ` Vlad Buslov
  0 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-08-13  7:55 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko


On Fri 10 Aug 2018 at 21:45, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Aug 10, 2018 at 3:29 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Approach you suggest is valid, but has its own trade-offs:
>>
>> - As you noted, lock granularity becomes coarse-grained due to per-netns
>> scope.
>
> Sure, you acquire idrinfo->lock too, the only difference is how long
> you take it.
>
> The bottleneck of your approach is the same, also you take idrinfo->lock
> twice, so the contention is heavier.
>
>
>>
>> - I am not sure it is possible to call idr_replace() without obtaining
>> idrinfo->lock in this particular case. Concurrent delete of action with
>> same id is possible and, according to idr_replace() description,
>> unlocked execution is not supported for such use-case:
>
> But we can hold its refcnt before releasing idrinfo->lock, so
> idr_replace() can't race with concurrent delete.

Yes, for the concurrent delete case I agree. An action is removed from the
idr only when the last reference is released and, in the case of an existing
action update, init holds a reference.

What about the case when multiple tasks race to update the same existing
action? I assume idr_replace() can be used for such a case, but what would
be the algorithm in case init replaced some other action, and not the
action it actually copied before calling idr_replace()?
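
The race in question can be modelled in userspace as follows. This is only an illustrative sketch: `slot_replace()` is a hypothetical stand-in that, like idr_replace(), returns the previous pointer, and `publish_copy()` merely detects after the fact that the replaced object was not the one the caller copied. It does not answer what the caller should do next.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct action {
    int value;
};

/* Hypothetical single idr slot holding the published action. */
_Atomic(struct action *) slot;

/* Stand-in for a replace operation that returns the old pointer. */
struct action *slot_replace(struct action *newp)
{
    return atomic_exchange(&slot, newp);
}

/*
 * Publish 'copy', which was derived from 'snapshot'. Returns true if the
 * replaced pointer is the snapshot, false if another updater published in
 * between, in which case that updater's changes have just been overwritten.
 */
bool publish_copy(struct action *snapshot, struct action *copy)
{
    struct action *replaced = slot_replace(copy);

    return replaced == snapshot;
}
```

With two tasks that both copied the same original, the first publish returns true and the second returns false: the second task replaced the first task's update rather than the action it copied.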

>
>
>>
>> - High rate or replace request will generate a lot of unnecessary memory
>> allocations and deallocations.
>>
>
> Yes, this is literally how RCU works, always allocate and copy,
> release upon error.
>
> Also, if this is really a problem, we have SLAB_TYPESAFE_BY_RCU
> too. ;)

The current action update implementation is in-place, so there is no "copy"
stage, apart from members of some actions that are RCU pointers. But I
guess it makes sense if your goal is to refactor all actions to be
updated with RCU.


* Re: [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite
  2018-07-05 14:24 ` [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite Vlad Buslov
@ 2018-08-13 23:00   ` Cong Wang
  2018-08-14 17:23     ` Vlad Buslov
  0 siblings, 1 reply; 37+ messages in thread
From: Cong Wang @ 2018-08-13 23:00 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
> index 89a761395c94..acea3feae762 100644
> --- a/net/sched/act_ife.c
> +++ b/net/sched/act_ife.c
...
> @@ -548,6 +546,8 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>
>                         if (exists)
>                                 spin_unlock_bh(&ife->tcf_lock);
> +                       tcf_idr_release(*a, bind);
> +
>                         kfree(p);
>                         return err;
>                 }

With this change, you seem to release it twice when nla_parse_nested() fails
for the ACT_P_CREATED case...?

Looks like what you want is the following?

                if (err) {
                        tcf_idr_release(*a, bind);
                        kfree(p);
                        return err;
                }
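
The effect on the reference count can be modelled with plain integers. This is a sketch under an assumption the hunk above does not show: that the error path still contains an earlier release that runs only for a newly created action, as the comment above implies. `action_put()`, `err_path_v6()` and `err_path_fixed()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct action {
    int refcnt;
};

/* Stand-in for dropping one reference; no freeing in this model. */
void action_put(struct action *a)
{
    a->refcnt--;
}

/*
 * Error path as described for v6: an assumed pre-existing release for the
 * newly created case, followed by the unconditional release the patch adds.
 */
void err_path_v6(struct action *a, bool created)
{
    if (created)
        action_put(a);
    action_put(a);
}

/* Suggested shape: exactly one release on error. */
void err_path_fixed(struct action *a)
{
    action_put(a);
}
```

Starting from a newly created action holding a single reference, the first variant drives the count to -1, while the second leaves it at 0.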


* Re: [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite
  2018-08-13 23:00   ` Cong Wang
@ 2018-08-14 17:23     ` Vlad Buslov
  0 siblings, 0 replies; 37+ messages in thread
From: Vlad Buslov @ 2018-08-14 17:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
	Jiri Pirko, Alexei Starovoitov, Daniel Borkmann,
	Yevgeny Kliteynik, Jiri Pirko

On Mon 13 Aug 2018 at 23:00, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Jul 5, 2018 at 7:24 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
>> index 89a761395c94..acea3feae762 100644
>> --- a/net/sched/act_ife.c
>> +++ b/net/sched/act_ife.c
> ...
>> @@ -548,6 +546,8 @@ static int tcf_ife_init(struct net *net, struct nlattr *nla,
>>
>>                         if (exists)
>>                                 spin_unlock_bh(&ife->tcf_lock);
>> +                       tcf_idr_release(*a, bind);
>> +
>>                         kfree(p);
>>                         return err;
>>                 }
>
> With this change, you seem release it twice when nla_parse_nested() fails
> for ACT_P_CREATED case...?

Thank you, great catch!

>
> Looks like what you want is the following?
>
>                 if (err) {
>                         tcf_idr_release(*a, bind);
>                         kfree(p);
>                         return err;
>                 }

Yes. Sending the fix.


Thread overview: 37+ messages
2018-07-05 14:24 [PATCH net-next v6 00/11] Modify action API for implementing lockless actions Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 01/11] net: sched: use rcu for action cookie update Vlad Buslov
2018-07-13  3:52   ` Cong Wang
2018-07-13 13:30     ` Vlad Buslov
2018-07-13 21:51       ` Cong Wang
2018-07-13 22:11         ` David Miller
2018-07-14  0:14           ` Cong Wang
2018-07-16  8:31         ` Vlad Buslov
2018-07-17 20:46           ` Cong Wang
2018-07-05 14:24 ` [PATCH net-next v6 02/11] net: sched: change type of reference and bind counters Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 03/11] net: sched: implement unlocked action init API Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 04/11] net: sched: always take reference to action Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 05/11] net: sched: implement action API that deletes action by index Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 06/11] net: sched: add 'delete' function to action ops Vlad Buslov
2018-08-09 19:38   ` Cong Wang
2018-08-10  9:41     ` Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 07/11] net: sched: implement reference counted action release Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 08/11] net: sched: don't release reference on action overwrite Vlad Buslov
2018-08-13 23:00   ` Cong Wang
2018-08-14 17:23     ` Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 09/11] net: sched: use reference counting action init Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 10/11] net: sched: atomically check-allocate action Vlad Buslov
2018-08-08  1:20   ` Cong Wang
2018-08-08 12:06     ` Vlad Buslov
2018-08-09 23:43       ` Cong Wang
2018-08-10 10:29         ` Vlad Buslov
2018-08-10 21:45           ` Cong Wang
2018-08-13  7:55             ` Vlad Buslov
2018-07-05 14:24 ` [PATCH net-next v6 11/11] net: sched: change action API to use array of pointers to actions Vlad Buslov
2018-08-07 23:26   ` Cong Wang
2018-08-08 11:41     ` Vlad Buslov
2018-08-08 18:29       ` Cong Wang
2018-08-09  7:03         ` Vlad Buslov
2018-07-07 11:41 ` [PATCH net-next v6 00/11] Modify action API for implementing lockless actions David Miller
2018-07-08  3:43 ` David Miller
2018-07-13  3:54   ` Cong Wang
2018-07-13 13:40     ` Vlad Buslov
