* [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock
@ 2018-11-12  7:55 Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
                   ` (16 more replies)
  0 siblings, 17 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected by a single global rtnl lock, which removes any
possibility of parallelism. This patch set is the third step in removing
the rtnl lock dependency from the TC rules update path.

Recently, the new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was
added. Handlers registered with this flag are called without RTNL taken.
The end goal is to have the rule update handlers (RTM_NEWTFILTER,
RTM_DELTFILTER, etc.) registered with the UNLOCKED flag to allow parallel
execution. However, there is no intention to completely remove or split
the rtnl lock itself. This patch set addresses specific problems in the
implementation of the classifiers API that prevent its control path from
being executed concurrently, and completes the refactoring of the cls API
rules update handlers by removing the rtnl lock dependency from the code
that handles chains and classifiers. The rule update handlers are
registered with the RTNL_FLAG_DOIT_UNLOCKED flag.
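
For illustration, the end state registers the rule update handlers along
these lines (a sketch based on the handler names used later in this
series; see the final patch for the exact call sites):

	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
		      RTNL_FLAG_DOIT_UNLOCKED);
	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
		      RTNL_FLAG_DOIT_UNLOCKED);
	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
		      tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);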

This patch set removes the global rtnl lock dependency from the rules
update path in the cls API by extending its data structures with the
following locks (sketched in simplified form below):
- tcf_block with a 'lock' spinlock. It is used to protect block state and
  the lifetime-management fields of chains on the block (chain->refcnt,
  chain->action_refcnt, chain->explicitly_created, etc.).
- tcf_chain with a 'filter_chain_lock' spinlock, which is used to protect
  the list of classifier instances attached to the chain.
- tcf_proto with a 'lock' spinlock that is intended to be used to
  synchronize access to classifiers that support unlocked execution.
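
In simplified sketch form (only the new locks and the fields they guard
are shown; all other members are omitted):

struct tcf_block {
	/* Protects block state and lifetime-management fields of chains
	 * attached to the block.
	 */
	spinlock_t lock;
	struct list_head chain_list;
	/* ... other members omitted ... */
};

struct tcf_chain {
	/* Protects the list of classifier instances attached to the chain. */
	spinlock_t filter_chain_lock;
	struct tcf_proto __rcu *filter_chain;
	/* ... other members omitted ... */
};

struct tcf_proto {
	/* Synchronizes access for classifiers that support unlocked
	 * execution.
	 */
	spinlock_t lock;
	/* ... other members omitted ... */
};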

Chain0 head change callbacks can sleep and therefore cannot be protected
by the block spinlock. To solve this issue, the sleeping miniqp swap
function (used as the head change callback by the ingress and clsact
Qdiscs) is refactored to offload sleeping operations to a workqueue. A
new ordered workqueue 'tc_proto_workqueue' is created in cls_api to be
used by miniqp and for tcf_proto deallocation, which is also moved to the
workqueue to prevent deallocation of tp's that are still in use by the
block. Performing both the miniqp swap and the tp deallocation on the
same ordered workqueue ensures that any pending head change requests
involving a tp complete before that tp is deallocated.
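
The ordering guarantee relied on here can be sketched as follows
(illustrative helper names; the real queueing goes through
tc_queue_proto_work() introduced in patch 1):

static struct workqueue_struct *tc_proto_wq;

static int example_init(void)
{
	/* An ordered workqueue executes at most one work item at a time,
	 * in queueing order.
	 */
	tc_proto_wq = alloc_ordered_workqueue("tc_proto_workqueue", 0);
	return tc_proto_wq ? 0 : -ENOMEM;
}

static void example_queue(struct work_struct *swap_work,
			  struct work_struct *destroy_work)
{
	queue_work(tc_proto_wq, swap_work);	/* runs first */
	queue_work(tc_proto_wq, destroy_work);	/* runs only after swap_work
						 * has completed
						 */
}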

Classifiers are extended with reference counting to accommodate parallel
access by the unlocked cls API. The classifier ops structure is extended
with an additional 'put' function that allows reference counting of
filters and is intended to be used by classifiers that implement the
rtnl-unlocked API. Users of classifiers and of individual filter
instances are modified to always hold a reference while working with
them.
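
The intended usage pattern, sketched (the exact 'put' signature is
defined later in the series; this assumes it takes the same filter handle
that 'get' returns):

static int example_filter_user(struct tcf_proto *tp, u32 handle)
{
	void *fh = tp->ops->get(tp, handle);	/* takes a reference */

	if (!fh)
		return -ENOENT;
	/* ... work with the filter instance ... */
	if (tp->ops->put)
		tp->ops->put(tp, fh);		/* releases the reference */
	return 0;
}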

Classifiers that support unlocked execution still need to know the status
of the rtnl lock, so their API is extended with an additional 'rtnl_held'
argument that indicates whether the caller holds the rtnl lock.
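
A reduced sketch of the resulting callback shape (the real callbacks take
more arguments, and the exact placement of 'rtnl_held' is defined by the
later patches):

static int example_cls_change(struct tcf_proto *tp, bool rtnl_held,
			      struct netlink_ext_ack *extack)
{
	if (!rtnl_held)
		rtnl_lock();	/* take rtnl only if the caller did not */

	/* ... modify classifier state that still requires rtnl ... */

	if (!rtnl_held)
		rtnl_unlock();
	return 0;
}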

Vlad Buslov (17):
  net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  net: sched: protect block state with spinlock
  net: sched: refactor tc_ctl_chain() to use block->lock
  net: sched: protect block->chain0 with block->lock
  net: sched: traverse chains in block with tcf_get_next_chain()
  net: sched: protect chain template accesses with block lock
  net: sched: lock the chain when accessing filter_chain list
  net: sched: introduce reference counting for tcf proto
  net: sched: traverse classifiers in chain with tcf_get_next_proto()
  net: sched: refactor tp insert/delete for concurrent execution
  net: sched: prevent insertion of new classifiers during chain flush
  net: sched: track rtnl lock status when validating extensions
  net: sched: extend proto ops with 'put' callback
  net: sched: extend proto ops to support unlocked classifiers
  net: sched: add flags to Qdisc class ops struct
  net: sched: conditionally take rtnl lock on rules update path
  net: sched: unlock rules update API

 include/net/pkt_cls.h     |   6 +-
 include/net/sch_generic.h |  73 +++-
 net/sched/cls_api.c       | 919 +++++++++++++++++++++++++++++++++++++---------
 net/sched/cls_basic.c     |  14 +-
 net/sched/cls_bpf.c       |  15 +-
 net/sched/cls_cgroup.c    |  13 +-
 net/sched/cls_flow.c      |  15 +-
 net/sched/cls_flower.c    |  16 +-
 net/sched/cls_fw.c        |  15 +-
 net/sched/cls_matchall.c  |  16 +-
 net/sched/cls_route.c     |  14 +-
 net/sched/cls_rsvp.h      |  16 +-
 net/sched/cls_tcindex.c   |  17 +-
 net/sched/cls_u32.c       |  14 +-
 net/sched/sch_api.c       |  10 +-
 net/sched/sch_generic.c   |  37 +-
 net/sched/sch_ingress.c   |   4 +
 17 files changed, 955 insertions(+), 259 deletions(-)

-- 
2.7.5


* [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12 17:28   ` David Miller
  2018-11-12  7:55 ` [PATCH net-next 02/17] net: sched: protect block state with spinlock Vlad Buslov
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

As part of the effort to remove the dependency on the rtnl lock, the cls
API is being converted to use fine-grained locking mechanisms instead of
the global rtnl lock. However, the chain_head_change callback for the
ingress Qdisc is a sleeping function and cannot be executed while holding
a spinlock.

Extend the cls API with a new workqueue intended to be used for tcf_proto
lifetime management. Modify tcf_proto_destroy() to deallocate the proto
asynchronously on the workqueue in order to ensure that all
chain_head_change callbacks involving the proto complete before it is
freed. Convert mini_qdisc_pair_swap(), which is used as the
chain_head_change callback for the ingress and clsact Qdiscs, to use the
workqueue.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  6 ++++++
 net/sched/cls_api.c       | 48 +++++++++++++++++++++++++++++++++++++++++------
 net/sched/sch_generic.c   | 37 +++++++++++++++++++++++++++++++++---
 net/sched/sch_ingress.c   |  4 ++++
 4 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4d736427a4cb..29118bbd528f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -319,6 +319,7 @@ struct tcf_proto {
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
 	struct rcu_head		rcu;
+	struct work_struct	work;
 };
 
 struct qdisc_skb_cb {
@@ -397,6 +398,8 @@ tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
 	}
 }
 
+bool tc_queue_proto_work(struct work_struct *work);
+
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 {
 	struct qdisc_skb_cb *qcb;
@@ -1148,12 +1151,15 @@ struct mini_Qdisc_pair {
 	struct mini_Qdisc miniq1;
 	struct mini_Qdisc miniq2;
 	struct mini_Qdisc __rcu **p_miniq;
+	struct tcf_proto *tp_head;
+	struct work_struct work;
 };
 
 void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 			  struct tcf_proto *tp_head);
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 			  struct mini_Qdisc __rcu **p_miniq);
+void mini_qdisc_pair_cleanup(struct mini_Qdisc_pair *miniqp);
 
 static inline void skb_tc_reinsert(struct sk_buff *skb, struct tcf_result *res)
 {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f427a1e00e7e..17c1691bf0c0 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -107,12 +107,14 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops)
 EXPORT_SYMBOL(register_tcf_proto_ops);
 
 static struct workqueue_struct *tc_filter_wq;
+static struct workqueue_struct *tc_proto_wq;
 
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops)
 {
 	struct tcf_proto_ops *t;
 	int rc = -ENOENT;
 
+	flush_workqueue(tc_proto_wq);
 	/* Wait for outstanding call_rcu()s, if any, from a
 	 * tcf_proto_ops's destroy() handler.
 	 */
@@ -139,6 +141,12 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t func)
 }
 EXPORT_SYMBOL(tcf_queue_work);
 
+bool tc_queue_proto_work(struct work_struct *work)
+{
+	return queue_work(tc_proto_wq, work);
+}
+EXPORT_SYMBOL(tc_queue_proto_work);
+
 /* Select new prio value from the range, managed by kernel. */
 
 static inline u32 tcf_auto_prio(struct tcf_proto *tp)
@@ -151,6 +159,23 @@ static inline u32 tcf_auto_prio(struct tcf_proto *tp)
 	return TC_H_MAJ(first);
 }
 
+static void tcf_chain_put(struct tcf_chain *chain);
+
+static void tcf_proto_destroy_work(struct work_struct *work)
+{
+	struct tcf_proto *tp = container_of(work, struct tcf_proto, work);
+	struct tcf_chain *chain = tp->chain;
+
+	rtnl_lock();
+
+	tp->ops->destroy(tp, NULL);
+	module_put(tp->ops->owner);
+	kfree_rcu(tp, rcu);
+	tcf_chain_put(chain);
+
+	rtnl_unlock();
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
 					  struct netlink_ext_ack *extack)
@@ -171,6 +196,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->protocol = protocol;
 	tp->prio = prio;
 	tp->chain = chain;
+	INIT_WORK(&tp->work, tcf_proto_destroy_work);
 
 	err = tp->ops->init(tp);
 	if (err) {
@@ -187,9 +213,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 static void tcf_proto_destroy(struct tcf_proto *tp,
 			      struct netlink_ext_ack *extack)
 {
-	tp->ops->destroy(tp, extack);
-	module_put(tp->ops->owner);
-	kfree_rcu(tp, rcu);
+	tc_queue_proto_work(&tp->work);
 }
 
 struct tcf_filter_chain_list_item {
@@ -361,7 +385,6 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
 		tcf_proto_destroy(tp, NULL);
 		tp = rtnl_dereference(chain->filter_chain);
-		tcf_chain_put(chain);
 	}
 }
 
@@ -1131,7 +1154,6 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
-	tcf_chain_put(chain);
 }
 
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
@@ -1420,8 +1442,14 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
 	} else {
-		if (tp_created)
+		if (tp_created) {
+			/* tp wasn't inserted to chain tp list. Take reference
+			 * to chain manually for tcf_proto_destroy() to
+			 * release.
+			 */
+			tcf_chain_hold(chain);
 			tcf_proto_destroy(tp, NULL);
+		}
 	}
 
 errout:
@@ -2351,6 +2379,12 @@ static int __init tc_filter_init(void)
 	if (!tc_filter_wq)
 		return -ENOMEM;
 
+	tc_proto_wq = alloc_ordered_workqueue("tc_proto_workqueue", 0);
+	if (!tc_proto_wq) {
+		err = -ENOMEM;
+		goto err_proto_wq;
+	}
+
 	err = register_pernet_subsys(&tcf_net_ops);
 	if (err)
 		goto err_register_pernet_subsys;
@@ -2367,6 +2401,8 @@ static int __init tc_filter_init(void)
 	return 0;
 
 err_register_pernet_subsys:
+	destroy_workqueue(tc_proto_wq);
+err_proto_wq:
 	destroy_workqueue(tc_filter_wq);
 	return err;
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index de1663f7d3ad..5aa8e485fcb0 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1363,10 +1363,14 @@ static void mini_qdisc_rcu_func(struct rcu_head *head)
 {
 }
 
-void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
-			  struct tcf_proto *tp_head)
+void __mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
+			    struct tcf_proto *tp_head)
 {
-	struct mini_Qdisc *miniq_old = rtnl_dereference(*miniqp->p_miniq);
+	/* p_miniq is either changed on ordered workqueue or during miniqp
+	 * cleanup. In both cases no concurrent modification is possible.
+	 */
+	struct mini_Qdisc *miniq_old =
+		rcu_dereference_protected(*miniqp->p_miniq, 1);
 	struct mini_Qdisc *miniq;
 
 	if (!tp_head) {
@@ -1394,6 +1398,25 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
 		 */
 		call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func);
 }
+
+void mini_qdisc_pair_swap_work(struct work_struct *work)
+{
+	struct mini_Qdisc_pair *miniqp = container_of(work,
+						      struct mini_Qdisc_pair,
+						      work);
+	struct tcf_proto *tp_head;
+
+	tp_head = xchg(&miniqp->tp_head, ERR_PTR(-ENOENT));
+	if (!IS_ERR(tp_head))
+		__mini_qdisc_pair_swap(miniqp, tp_head);
+}
+
+void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
+			  struct tcf_proto *tp_head)
+{
+	xchg(&miniqp->tp_head, tp_head);
+	tc_queue_proto_work(&miniqp->work);
+}
 EXPORT_SYMBOL(mini_qdisc_pair_swap);
 
 void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
@@ -1404,5 +1427,13 @@ void mini_qdisc_pair_init(struct mini_Qdisc_pair *miniqp, struct Qdisc *qdisc,
 	miniqp->miniq2.cpu_bstats = qdisc->cpu_bstats;
 	miniqp->miniq2.cpu_qstats = qdisc->cpu_qstats;
 	miniqp->p_miniq = p_miniq;
+	INIT_WORK(&miniqp->work, mini_qdisc_pair_swap_work);
 }
 EXPORT_SYMBOL(mini_qdisc_pair_init);
+
+void mini_qdisc_pair_cleanup(struct mini_Qdisc_pair *miniqp)
+{
+	cancel_work_sync(&miniqp->work);
+	__mini_qdisc_pair_swap(miniqp, NULL);
+}
+EXPORT_SYMBOL(mini_qdisc_pair_cleanup);
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index ce3f55259d0d..8edc3c46bd9c 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -99,6 +99,7 @@ static void ingress_destroy(struct Qdisc *sch)
 	struct ingress_sched_data *q = qdisc_priv(sch);
 
 	tcf_block_put_ext(q->block, sch, &q->block_info);
+	mini_qdisc_pair_cleanup(&q->miniqp);
 	net_dec_ingress_queue();
 }
 
@@ -245,6 +246,9 @@ static void clsact_destroy(struct Qdisc *sch)
 	tcf_block_put_ext(q->egress_block, sch, &q->egress_block_info);
 	tcf_block_put_ext(q->ingress_block, sch, &q->ingress_block_info);
 
+	mini_qdisc_pair_cleanup(&q->miniqp_ingress);
+	mini_qdisc_pair_cleanup(&q->miniqp_egress);
+
 	net_dec_ingress_queue();
 	net_dec_egress_queue();
 }
-- 
2.7.5


* [PATCH net-next 02/17] net: sched: protect block state with spinlock
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12 17:28   ` David Miller
  2018-11-12  7:55 ` [PATCH net-next 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Currently, tcf_block doesn't use any synchronization mechanism to protect
the code that manages the lifetime of its chains. block->chain_list and
multiple variables in tcf_chain that control its lifetime assume external
synchronization provided by the global rtnl lock. Converting chain
reference counting to atomic reference counters is not possible because
the cls API uses multiple counters and flags to control chain lifetime,
so all of them must be synchronized in the chain get/put code.

Use a single per-block lock to protect block data and manage the lifetime
of all chains on the block. Always take block->lock when accessing
chain_list. Chain get and put modify chain lifetime-management data and
the parent block's chain_list, so take the lock in these functions.
Verify block->lock state with assertions in functions that expect to be
called with the lock taken and are called from multiple places. Take
block->lock when accessing filter_chain_list.

block->lock is a spinlock, which means that blocking functions like
classifier ops callbacks cannot be called while holding it. Rearrange the
chain get and put code to only access protected chain data while holding
the block lock, and move blocking calls outside the critical section:
- Check whether the chain was explicitly created inside the put function
  while holding the block lock. Add an additional argument to
  __tcf_chain_put() to only put an explicitly created chain.
- Rearrange code to only access the chain reference counter and the chain
  action reference counter while holding the block lock.
- Split the tcf_chain_destroy() helper into two functions: one that
  requires block->lock, and another one that needs to call sleeping
  functions and can be executed after the lock is released. The first
  helper detaches the chain from the block and makes it inaccessible to
  concurrent users; the second actually deallocates the chain memory (and
  the parent block, if applicable).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  4 ++
 net/sched/cls_api.c       | 93 +++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 81 insertions(+), 16 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 29118bbd528f..5f4fc28fc77a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -350,6 +350,10 @@ struct tcf_chain {
 };
 
 struct tcf_block {
+	/* Lock protects tcf_block and lifetime-management data of chains
+	 * attached to the block (refcnt, action_refcnt, explicitly_created).
+	 */
+	spinlock_t lock;
 	struct list_head chain_list;
 	u32 index; /* block index for shared blocks */
 	refcount_t refcnt;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 17c1691bf0c0..df3326dd33ef 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -216,6 +216,10 @@ static void tcf_proto_destroy(struct tcf_proto *tp,
 	tc_queue_proto_work(&tp->work);
 }
 
+#define ASSERT_BLOCK_LOCKED(block)					\
+	WARN_ONCE(!spin_is_locked(&(block)->lock),			\
+		  "BLOCK: assertion failed at %s (%d)\n", __FILE__, __LINE__)
+
 struct tcf_filter_chain_list_item {
 	struct list_head list;
 	tcf_chain_head_change_t *chain_head_change;
@@ -227,7 +231,9 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 {
 	struct tcf_chain *chain;
 
-	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
+	ASSERT_BLOCK_LOCKED(block);
+
+	chain = kzalloc(sizeof(*chain), GFP_ATOMIC);
 	if (!chain)
 		return NULL;
 	list_add_tail(&chain->list, &block->chain_list);
@@ -258,13 +264,29 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 		tcf_chain_head_change_item(item, tp_head);
 }
 
-static void tcf_chain_destroy(struct tcf_chain *chain)
+/* Returns true if block can be safely freed. */
+
+static bool tcf_chain_detach(struct tcf_chain *chain)
 {
 	struct tcf_block *block = chain->block;
 
+	ASSERT_BLOCK_LOCKED(block);
+
 	list_del(&chain->list);
 	if (!chain->index)
 		block->chain0.chain = NULL;
+
+	if (list_empty(&block->chain_list) &&
+	    refcount_read(&block->refcnt) == 0)
+		return true;
+
+	return false;
+}
+
+static void tcf_chain_destroy(struct tcf_chain *chain, bool free_block)
+{
+	struct tcf_block *block = chain->block;
+
 	kfree(chain);
-	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
+	if (free_block)
 		kfree_rcu(block, rcu);
@@ -272,11 +294,15 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 
 static void tcf_chain_hold(struct tcf_chain *chain)
 {
+	ASSERT_BLOCK_LOCKED(chain->block);
+
 	++chain->refcnt;
 }
 
 static bool tcf_chain_held_by_acts_only(struct tcf_chain *chain)
 {
+	ASSERT_BLOCK_LOCKED(chain->block);
+
 	/* In case all the references are action references, this
 	 * chain should not be shown to the user.
 	 */
@@ -288,6 +314,8 @@ static struct tcf_chain *tcf_chain_lookup(struct tcf_block *block,
 {
 	struct tcf_chain *chain;
 
+	ASSERT_BLOCK_LOCKED(block);
+
 	list_for_each_entry(chain, &block->chain_list, list) {
 		if (chain->index == chain_index)
 			return chain;
@@ -302,31 +330,40 @@ static struct tcf_chain *__tcf_chain_get(struct tcf_block *block,
 					 u32 chain_index, bool create,
 					 bool by_act)
 {
-	struct tcf_chain *chain = tcf_chain_lookup(block, chain_index);
+	struct tcf_chain *chain = NULL;
+	bool is_first_reference;
 
+	spin_lock(&block->lock);
+	chain = tcf_chain_lookup(block, chain_index);
 	if (chain) {
 		tcf_chain_hold(chain);
 	} else {
 		if (!create)
-			return NULL;
+			goto errout;
 		chain = tcf_chain_create(block, chain_index);
 		if (!chain)
-			return NULL;
+			goto errout;
 	}
 
 	if (by_act)
 		++chain->action_refcnt;
+	is_first_reference = chain->refcnt - chain->action_refcnt == 1;
+	spin_unlock(&block->lock);
 
 	/* Send notification only in case we got the first
 	 * non-action reference. Until then, the chain acts only as
 	 * a placeholder for actions pointing to it and user ought
 	 * not know about them.
 	 */
-	if (chain->refcnt - chain->action_refcnt == 1 && !by_act)
+	if (is_first_reference && !by_act)
 		tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
 				RTM_NEWCHAIN, false);
 
 	return chain;
+
+errout:
+	spin_unlock(&block->lock);
+	return chain;
 }
 
 static struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
@@ -343,37 +380,59 @@ EXPORT_SYMBOL(tcf_chain_get_by_act);
 
 static void tc_chain_tmplt_del(struct tcf_chain *chain);
 
-static void __tcf_chain_put(struct tcf_chain *chain, bool by_act)
+static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
+			    bool explicitly_created)
 {
+	struct tcf_block *block = chain->block;
+	bool is_last, free_block = false;
+	unsigned int refcnt;
+
+	spin_lock(&block->lock);
+	if (explicitly_created) {
+		if (!chain->explicitly_created) {
+			spin_unlock(&block->lock);
+			return;
+		}
+		chain->explicitly_created = false;
+	}
+
 	if (by_act)
 		chain->action_refcnt--;
-	chain->refcnt--;
+
+	/* tc_chain_notify_delete can't be called while holding block lock.
+	 * However, when block is unlocked chain can be changed concurrently, so
+	 * save these to temporary variables.
+	 */
+	refcnt = --chain->refcnt;
+	is_last = refcnt - chain->action_refcnt == 0;
+	if (refcnt == 0)
+		free_block = tcf_chain_detach(chain);
+	spin_unlock(&block->lock);
 
 	/* The last dropped non-action reference will trigger notification. */
-	if (chain->refcnt - chain->action_refcnt == 0 && !by_act)
+	if (is_last && !by_act)
 		tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
 
-	if (chain->refcnt == 0) {
+	if (refcnt == 0) {
 		tc_chain_tmplt_del(chain);
-		tcf_chain_destroy(chain);
+		tcf_chain_destroy(chain, free_block);
 	}
 }
 
 static void tcf_chain_put(struct tcf_chain *chain)
 {
-	__tcf_chain_put(chain, false);
+	__tcf_chain_put(chain, false, false);
 }
 
 void tcf_chain_put_by_act(struct tcf_chain *chain)
 {
-	__tcf_chain_put(chain, true);
+	__tcf_chain_put(chain, true, false);
 }
 EXPORT_SYMBOL(tcf_chain_put_by_act);
 
 static void tcf_chain_put_explicitly_created(struct tcf_chain *chain)
 {
-	if (chain->explicitly_created)
-		tcf_chain_put(chain);
+	__tcf_chain_put(chain, false, true);
 }
 
 static void tcf_chain_flush(struct tcf_chain *chain)
@@ -540,6 +599,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 		NL_SET_ERR_MSG(extack, "Memory allocation for block failed");
 		return ERR_PTR(-ENOMEM);
 	}
+	spin_lock_init(&block->lock);
 	INIT_LIST_HEAD(&block->chain_list);
 	INIT_LIST_HEAD(&block->cb_list);
 	INIT_LIST_HEAD(&block->owner_list);
@@ -603,7 +663,7 @@ static void tcf_block_put_all_chains(struct tcf_block *block)
 static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 			    struct tcf_block_ext_info *ei)
 {
-	if (refcount_dec_and_test(&block->refcnt)) {
+	if (refcount_dec_and_lock(&block->refcnt, &block->lock)) {
 		/* Flushing/putting all chains will cause the block to be
 		 * deallocated when last chain is freed. However, if chain_list
 		 * is empty, block has to be manually deallocated. After block
@@ -612,6 +672,7 @@ static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 		 */
 		bool free_block = list_empty(&block->chain_list);
 
+		spin_unlock(&block->lock);
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
 		if (!free_block)
-- 
2.7.5


* [PATCH net-next 03/17] net: sched: refactor tc_ctl_chain() to use block->lock
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 02/17] net: sched: protect block state with spinlock Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

In order to remove the dependency on the rtnl lock, modify the chain API
to use block->lock to protect chains from concurrent modification.
Rearrange the tc_ctl_chain() code to call tcf_chain_hold() while holding
block->lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index df3326dd33ef..146a02094905 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2047,6 +2047,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		err = -EINVAL;
 		goto errout_block;
 	}
+
+	spin_lock(&block->lock);
 	chain = tcf_chain_lookup(block, chain_index);
 	if (n->nlmsg_type == RTM_NEWCHAIN) {
 		if (chain) {
@@ -2058,41 +2060,49 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 			} else {
 				NL_SET_ERR_MSG(extack, "Filter chain already exists");
 				err = -EEXIST;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 		} else {
 			if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 				NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN and NLM_F_CREATE to create a new chain");
 				err = -ENOENT;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 			chain = tcf_chain_create(block, chain_index);
 			if (!chain) {
 				NL_SET_ERR_MSG(extack, "Failed to create filter chain");
 				err = -ENOMEM;
-				goto errout_block;
+				goto errout_block_locked;
 			}
 		}
 	} else {
 		if (!chain || tcf_chain_held_by_acts_only(chain)) {
 			NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
 			err = -EINVAL;
-			goto errout_block;
+			goto errout_block_locked;
 		}
 		tcf_chain_hold(chain);
 	}
 
+	if (n->nlmsg_type == RTM_NEWCHAIN) {
+		/* Modifying chain requires holding parent block lock. In case
+		 * the chain was successfully added, take a reference to the
+		 * chain. This ensures that an empty chain does not disappear at
+		 * the end of this function.
+		 */
+		tcf_chain_hold(chain);
+		chain->explicitly_created = true;
+	}
+	spin_unlock(&block->lock);
+
 	switch (n->nlmsg_type) {
 	case RTM_NEWCHAIN:
 		err = tc_chain_tmplt_add(chain, net, tca, extack);
-		if (err)
+		if (err) {
+			tcf_chain_put_explicitly_created(chain);
 			goto errout;
-		/* In case the chain was successfully added, take a reference
-		 * to the chain. This ensures that an empty chain
-		 * does not disappear at the end of this function.
-		 */
-		tcf_chain_hold(chain);
-		chain->explicitly_created = true;
+		}
+
 		tc_chain_notify(chain, NULL, 0, NLM_F_CREATE | NLM_F_EXCL,
 				RTM_NEWCHAIN, false);
 		break;
@@ -2127,6 +2137,10 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		/* Replay the request. */
 		goto replay;
 	return err;
+
+errout_block_locked:
+	spin_unlock(&block->lock);
+	goto errout_block;
 }
 
 /* called with RTNL */
-- 
2.7.5


* [PATCH net-next 04/17] net: sched: protect block->chain0 with block->lock
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (2 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

In order to remove the dependency on the rtnl lock, use block->lock to
protect the chain0 struct from concurrent modification. Rearrange the
code in the chain0 callback add and del functions to only access chain0
while block->lock is held.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 146a02094905..beffcc2ab1fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -260,8 +260,11 @@ static void tcf_chain0_head_change(struct tcf_chain *chain,
 
 	if (chain->index)
 		return;
+
+	spin_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list)
 		tcf_chain_head_change_item(item, tp_head);
+	spin_unlock(&block->lock);
 }
 
 /* Returns true if block can be safely freed. */
@@ -519,8 +522,8 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 			      struct tcf_block_ext_info *ei,
 			      struct netlink_ext_ack *extack)
 {
-	struct tcf_chain *chain0 = block->chain0.chain;
 	struct tcf_filter_chain_list_item *item;
+	struct tcf_chain *chain0;
 
 	item = kmalloc(sizeof(*item), GFP_KERNEL);
 	if (!item) {
@@ -529,9 +532,14 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 	}
 	item->chain_head_change = ei->chain_head_change;
 	item->chain_head_change_priv = ei->chain_head_change_priv;
+
+	spin_lock(&block->lock);
+	chain0 = block->chain0.chain;
 	if (chain0 && chain0->filter_chain)
 		tcf_chain_head_change_item(item, chain0->filter_chain);
 	list_add(&item->list, &block->chain0.filter_chain_list);
+	spin_unlock(&block->lock);
+
 	return 0;
 }
 
@@ -539,20 +547,23 @@ static void
 tcf_chain0_head_change_cb_del(struct tcf_block *block,
 			      struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain0 = block->chain0.chain;
 	struct tcf_filter_chain_list_item *item;
 
+	spin_lock(&block->lock);
 	list_for_each_entry(item, &block->chain0.filter_chain_list, list) {
 		if ((!ei->chain_head_change && !ei->chain_head_change_priv) ||
 		    (item->chain_head_change == ei->chain_head_change &&
 		     item->chain_head_change_priv == ei->chain_head_change_priv)) {
-			if (chain0)
+			if (block->chain0.chain)
 				tcf_chain_head_change_item(item, NULL);
 			list_del(&item->list);
+			spin_unlock(&block->lock);
+
 			kfree(item);
 			return;
 		}
 	}
+	spin_unlock(&block->lock);
 	WARN_ON(1);
 }
 
-- 
2.7.5


* [PATCH net-next 05/17] net: sched: traverse chains in block with tcf_get_next_chain()
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (3 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

All users of block->chain_list rely on the rtnl lock and assume that no
new chains are added while the list is being traversed. Use
tcf_get_next_chain() to traverse the chain list without relying on the
rtnl mutex. This function iterates over chains by taking a reference only
to the current iterator chain and doesn't assume external synchronization
of the chain list.

Don't take a reference to all chains in the block when flushing; use
tcf_get_next_chain() to safely iterate over the chain list instead.
Remove tcf_block_put_all_chains(), which is no longer used.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 96 +++++++++++++++++++++++++++++++++++++--------------
 net/sched/sch_api.c   |  4 ++-
 3 files changed, 76 insertions(+), 26 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 72ffb3120ced..8396d51e78a1 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -43,6 +43,8 @@ bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 				       u32 chain_index);
 void tcf_chain_put_by_act(struct tcf_chain *chain);
+struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
+				     struct tcf_chain *chain);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index beffcc2ab1fa..bc4666985d4d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -646,28 +646,62 @@ static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
 	return block;
 }
 
-static void tcf_block_flush_all_chains(struct tcf_block *block)
+static struct tcf_chain *
+__tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 {
-	struct tcf_chain *chain;
+	spin_lock(&block->lock);
+	if (chain)
+		chain = list_is_last(&chain->list, &block->chain_list) ?
+			NULL : list_next_entry(chain, list);
+	else
+		chain = list_first_entry_or_null(&block->chain_list,
+						 struct tcf_chain, list);
 
-	/* Hold a refcnt for all chains, so that they don't disappear
-	 * while we are iterating.
-	 */
-	list_for_each_entry(chain, &block->chain_list, list)
+	/* skip all action-only chains */
+	while (chain && tcf_chain_held_by_acts_only(chain))
+		chain = list_is_last(&chain->list, &block->chain_list) ?
+			NULL : list_next_entry(chain, list);
+
+	if (chain)
 		tcf_chain_hold(chain);
+	spin_unlock(&block->lock);
 
-	list_for_each_entry(chain, &block->chain_list, list)
-		tcf_chain_flush(chain);
+	return chain;
 }
 
-static void tcf_block_put_all_chains(struct tcf_block *block)
+/* Function to be used by all clients that want to iterate over all chains on
+ * block. It properly obtains block->lock and takes reference to chain before
+ * returning it. Users of this function must be tolerant to concurrent chain
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that all netlink dump callbacks cannot guarantee to provide
+ * consistent dump because rtnl lock is released each time skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_chain *
+tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 {
-	struct tcf_chain *chain, *tmp;
+	struct tcf_chain *chain_next = __tcf_get_next_chain(block, chain);
 
-	/* At this point, all the chains should have refcnt >= 1. */
-	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-		tcf_chain_put_explicitly_created(chain);
+	if (chain)
 		tcf_chain_put(chain);
+
+	return chain_next;
+}
+EXPORT_SYMBOL(tcf_get_next_chain);
+
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain;
+
+	/* Last reference to block. At this point chains cannot be added or
+	 * removed concurrently.
+	 */
+	for (chain = tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain = tcf_get_next_chain(block, chain)) {
+		tcf_chain_put_explicitly_created(chain);
+		tcf_chain_flush(chain);
 	}
 }
 
@@ -686,8 +720,6 @@ static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 		spin_unlock(&block->lock);
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
-		if (!free_block)
-			tcf_block_flush_all_chains(block);
 
 		if (q)
 			tcf_block_offload_unbind(block, q, ei);
@@ -695,7 +727,7 @@ static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
 		if (free_block)
 			kfree_rcu(block, rcu);
 		else
-			tcf_block_put_all_chains(block);
+			tcf_block_flush_all_chains(block);
 	} else if (q) {
 		tcf_block_offload_unbind(block, q, ei);
 	}
@@ -1029,11 +1061,15 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 			    void *cb_priv, bool add, bool offload_in_use,
 			    struct netlink_ext_ack *extack)
 {
-	struct tcf_chain *chain;
+	struct tcf_chain *chain, *chain_prev;
 	struct tcf_proto *tp;
 	int err;
 
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = __tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain_prev = chain,
+		     chain = __tcf_get_next_chain(block, chain),
+		     tcf_chain_put(chain_prev)) {
 		for (tp = rtnl_dereference(chain->filter_chain); tp;
 		     tp = rtnl_dereference(tp->next)) {
 			if (tp->ops->reoffload) {
@@ -1052,6 +1088,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	return 0;
 
 err_playback_remove:
+	tcf_chain_put(chain);
 	tcf_block_playback_offloads(block, cb, cb_priv, false, offload_in_use,
 				    extack);
 	return err;
@@ -1815,11 +1852,11 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 /* called with RTNL */
 static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct tcf_chain *chain, *chain_prev;
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct Qdisc *q = NULL;
 	struct tcf_block *block;
-	struct tcf_chain *chain;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
 	long index_start;
 	long index;
@@ -1883,12 +1920,17 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	index_start = cb->args[0];
 	index = 0;
 
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = __tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain_prev = chain,
+		     chain = __tcf_get_next_chain(block, chain),
+		     tcf_chain_put(chain_prev)) {
 		if (tca[TCA_CHAIN] &&
 		    nla_get_u32(tca[TCA_CHAIN]) != chain->index)
 			continue;
 		if (!tcf_chain_dump(chain, q, parent, skb, cb,
 				    index_start, &index)) {
+			tcf_chain_put(chain);
 			err = -EMSGSIZE;
 			break;
 		}
@@ -2157,11 +2199,11 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 /* called with RTNL */
 static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	struct tcf_chain *chain, *chain_prev;
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *tca[TCA_MAX + 1];
 	struct Qdisc *q = NULL;
 	struct tcf_block *block;
-	struct tcf_chain *chain;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
 	long index_start;
 	long index;
@@ -2225,7 +2267,11 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 	index_start = cb->args[0];
 	index = 0;
 
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = __tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain_prev = chain,
+		     chain = __tcf_get_next_chain(block, chain),
+		     tcf_chain_put(chain_prev)) {
 		if ((tca[TCA_CHAIN] &&
 		     nla_get_u32(tca[TCA_CHAIN]) != chain->index))
 			continue;
@@ -2233,14 +2279,14 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 			index++;
 			continue;
 		}
-		if (tcf_chain_held_by_acts_only(chain))
-			continue;
 		err = tc_chain_fill_node(chain, net, skb, block,
 					 NETLINK_CB(cb->skb).portid,
 					 cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					 RTM_NEWCHAIN);
-		if (err <= 0)
+		if (err <= 0) {
+			tcf_chain_put(chain);
 			break;
+		}
 		index++;
 	}
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ca3b0f46de53..28e9c3d9981f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1845,7 +1845,9 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 	block = cops->tcf_block(q, cl, NULL);
 	if (!block)
 		return;
-	list_for_each_entry(chain, &block->chain_list, list) {
+	for (chain = tcf_get_next_chain(block, NULL);
+	     chain;
+	     chain = tcf_get_next_chain(block, chain)) {
 		struct tcf_proto *tp;
 
 		for (tp = rtnl_dereference(chain->filter_chain);
-- 
2.7.5


* [PATCH net-next 06/17] net: sched: protect chain template accesses with block lock
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (4 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 07/17] net: sched: lock the chain when accessing filter_chain list Vlad Buslov
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

When the cls API is called without the protection of the rtnl lock,
parallel modification of a chain is possible, which means that the chain
template can be changed concurrently in certain circumstances. For
example, when a chain is 'deleted' by the new user-space chain API, the
chain might continue to be used if it is referenced by actions, and can
then be 're-created' by the user. In such a case the same chain structure
is reused and its template is changed. To protect against this scenario,
cache the chain template values while holding the block lock. Introduce a
standalone tc_chain_notify_delete() function that works with the cached
template values instead of the chains themselves.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 73 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 57 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bc4666985d4d..bd0dac35e26b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -381,14 +381,22 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block, u32 chain_index)
 }
 EXPORT_SYMBOL(tcf_chain_get_by_act);
 
-static void tc_chain_tmplt_del(struct tcf_chain *chain);
+static void tc_chain_tmplt_del(const struct tcf_proto_ops *tmplt_ops,
+			       void *tmplt_priv);
+static int tc_chain_notify_delete(const struct tcf_proto_ops *tmplt_ops,
+				  void *tmplt_priv, u32 chain_index,
+				  struct tcf_block *block, struct sk_buff *oskb,
+				  u32 seq, u16 flags, bool unicast);
 
 static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 			    bool explicitly_created)
 {
 	struct tcf_block *block = chain->block;
+	const struct tcf_proto_ops *tmplt_ops;
 	bool is_last, free_block = false;
 	unsigned int refcnt;
+	void *tmplt_priv;
+	u32 chain_index;
 
 	spin_lock(&block->lock);
 	if (explicitly_created) {
@@ -408,16 +416,21 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 	 */
 	refcnt = --chain->refcnt;
 	is_last = refcnt - chain->action_refcnt == 0;
+	tmplt_ops = chain->tmplt_ops;
+	tmplt_priv = chain->tmplt_priv;
+	chain_index = chain->index;
+
 	if (refcnt == 0)
 		free_block = tcf_chain_detach(chain);
 	spin_unlock(&block->lock);
 
 	/* The last dropped non-action reference will trigger notification. */
 	if (is_last && !by_act)
-		tc_chain_notify(chain, NULL, 0, 0, RTM_DELCHAIN, false);
+		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
+				       block, NULL, 0, 0, false);
 
 	if (refcnt == 0) {
-		tc_chain_tmplt_del(chain);
+		tc_chain_tmplt_del(tmplt_ops, tmplt_priv);
 		tcf_chain_destroy(chain, free_block);
 	}
 }
@@ -1947,8 +1960,10 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	return skb->len;
 }
 
-static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
-			      struct sk_buff *skb, struct tcf_block *block,
+static int tc_chain_fill_node(const struct tcf_proto_ops *tmplt_ops,
+			      void *tmplt_priv, u32 chain_index,
+			      struct net *net, struct sk_buff *skb,
+			      struct tcf_block *block,
 			      u32 portid, u32 seq, u16 flags, int event)
 {
 	unsigned char *b = skb_tail_pointer(skb);
@@ -1957,8 +1972,8 @@ static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
 	struct tcmsg *tcm;
 	void *priv;
 
-	ops = chain->tmplt_ops;
-	priv = chain->tmplt_priv;
+	ops = tmplt_ops;
+	priv = tmplt_priv;
 
 	nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
 	if (!nlh)
@@ -1976,7 +1991,7 @@ static int tc_chain_fill_node(struct tcf_chain *chain, struct net *net,
 		tcm->tcm_block_index = block->index;
 	}
 
-	if (nla_put_u32(skb, TCA_CHAIN, chain->index))
+	if (nla_put_u32(skb, TCA_CHAIN, chain_index))
 		goto nla_put_failure;
 
 	if (ops) {
@@ -2007,7 +2022,8 @@ static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb,
 	if (!skb)
 		return -ENOBUFS;
 
-	if (tc_chain_fill_node(chain, net, skb, block, portid,
+	if (tc_chain_fill_node(chain->tmplt_ops, chain->tmplt_priv,
+			       chain->index, net, skb, block, portid,
 			       seq, flags, event) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
@@ -2019,6 +2035,31 @@ static int tc_chain_notify(struct tcf_chain *chain, struct sk_buff *oskb,
 	return rtnetlink_send(skb, net, portid, RTNLGRP_TC, flags & NLM_F_ECHO);
 }
 
+static int tc_chain_notify_delete(const struct tcf_proto_ops *tmplt_ops,
+				  void *tmplt_priv, u32 chain_index,
+				  struct tcf_block *block, struct sk_buff *oskb,
+				  u32 seq, u16 flags, bool unicast)
+{
+	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
+	struct net *net = block->net;
+	struct sk_buff *skb;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOBUFS;
+
+	if (tc_chain_fill_node(tmplt_ops, tmplt_priv, chain_index, net, skb,
+			       block, portid, seq, flags, RTM_DELCHAIN) <= 0) {
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	if (unicast)
+		return netlink_unicast(net->rtnl, skb, portid, MSG_DONTWAIT);
+
+	return rtnetlink_send(skb, net, portid, RTNLGRP_TC, flags & NLM_F_ECHO);
+}
+
 static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
 			      struct nlattr **tca,
 			      struct netlink_ext_ack *extack)
@@ -2048,16 +2089,15 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
 	return 0;
 }
 
-static void tc_chain_tmplt_del(struct tcf_chain *chain)
+static void tc_chain_tmplt_del(const struct tcf_proto_ops *tmplt_ops,
+			       void *tmplt_priv)
 {
-	const struct tcf_proto_ops *ops = chain->tmplt_ops;
-
 	/* If template ops are set, no work to do for us. */
-	if (!ops)
+	if (!tmplt_ops)
 		return;
 
-	ops->tmplt_destroy(chain->tmplt_priv);
-	module_put(ops->owner);
+	tmplt_ops->tmplt_destroy(tmplt_priv);
+	module_put(tmplt_ops->owner);
 }
 
 /* Add/delete/get a chain */
@@ -2279,7 +2319,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 			index++;
 			continue;
 		}
-		err = tc_chain_fill_node(chain, net, skb, block,
+		err = tc_chain_fill_node(chain->tmplt_ops, chain->tmplt_priv,
+					 chain->index, net, skb, block,
 					 NETLINK_CB(cb->skb).portid,
 					 cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					 RTM_NEWCHAIN);
-- 
2.7.5


* [PATCH net-next 07/17] net: sched: lock the chain when accessing filter_chain list
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (5 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 08/17] net: sched: introduce reference counting for tcf proto Vlad Buslov
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Always lock the chain when accessing the filter_chain list, instead of
relying on the rtnl lock. Dereference filter_chain with the
tcf_chain_dereference() lockdep macro to verify that all users of the
filter_chain list have the lock taken.

Rearrange the tp insert/remove code in tc_new_tfilter/tc_del_tfilter to
execute all necessary code while holding the chain lock in order to
prevent invalidation of the chain_info structure by a potential
concurrent change.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  17 ++++++++
 net/sched/cls_api.c       | 103 +++++++++++++++++++++++++++++-----------------
 2 files changed, 83 insertions(+), 37 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5f4fc28fc77a..82fa23da4969 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -338,6 +338,8 @@ struct qdisc_skb_cb {
 typedef void tcf_chain_head_change_t(struct tcf_proto *tp_head, void *priv);
 
 struct tcf_chain {
+	/* Protects filter_chain and filter_chain_list. */
+	spinlock_t filter_chain_lock;
 	struct tcf_proto __rcu *filter_chain;
 	struct list_head list;
 	struct tcf_block *block;
@@ -371,6 +373,21 @@ struct tcf_block {
 	struct rcu_head rcu;
 };
 
+#ifdef CONFIG_PROVE_LOCKING
+static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
+{
+	return lockdep_is_held(&chain->filter_chain_lock);
+}
+#else
+static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
+{
+	return true;
+}
+#endif /* #ifdef CONFIG_PROVE_LOCKING */
+
+#define tcf_chain_dereference(p, chain)					\
+	rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bd0dac35e26b..8f5dfa3ffb1c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -237,6 +237,7 @@ static struct tcf_chain *tcf_chain_create(struct tcf_block *block,
 	if (!chain)
 		return NULL;
 	list_add_tail(&chain->list, &block->chain_list);
+	spin_lock_init(&chain->filter_chain_lock);
 	chain->block = block;
 	chain->index = chain_index;
 	chain->refcnt = 1;
@@ -453,9 +454,13 @@ static void tcf_chain_put_explicitly_created(struct tcf_chain *chain)
 
 static void tcf_chain_flush(struct tcf_chain *chain)
 {
-	struct tcf_proto *tp = rtnl_dereference(chain->filter_chain);
+	struct tcf_proto *tp;
 
+	spin_lock(&chain->filter_chain_lock);
+	tp = tcf_chain_dereference(chain->filter_chain, chain);
 	tcf_chain0_head_change(chain, NULL);
+	spin_unlock(&chain->filter_chain_lock);
+
 	while (tp) {
 		RCU_INIT_POINTER(chain->filter_chain, tp->next);
 		tcf_proto_destroy(tp, NULL);
@@ -548,8 +553,15 @@ tcf_chain0_head_change_cb_add(struct tcf_block *block,
 
 	spin_lock(&block->lock);
 	chain0 = block->chain0.chain;
-	if (chain0 && chain0->filter_chain)
-		tcf_chain_head_change_item(item, chain0->filter_chain);
+	if (chain0) {
+		struct tcf_proto *tp_head;
+
+		spin_lock(&chain0->filter_chain_lock);
+		tp_head = tcf_chain_dereference(chain0->filter_chain, chain0);
+		if (tp_head)
+			tcf_chain_head_change_item(item, tp_head);
+		spin_unlock(&chain0->filter_chain_lock);
+	}
 	list_add(&item->list, &block->chain0.filter_chain_list);
 	spin_unlock(&block->lock);
 
@@ -1251,9 +1263,10 @@ struct tcf_chain_info {
 	struct tcf_proto __rcu *next;
 };
 
-static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain_info *chain_info)
+static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain *chain,
+					   struct tcf_chain_info *chain_info)
 {
-	return rtnl_dereference(*chain_info->pprev);
+	return tcf_chain_dereference(*chain_info->pprev, chain);
 }
 
 static void tcf_chain_tp_insert(struct tcf_chain *chain,
@@ -1262,7 +1275,7 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
-	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain_info));
+	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
 	tcf_chain_hold(chain);
 }
@@ -1271,7 +1284,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 				struct tcf_chain_info *chain_info,
 				struct tcf_proto *tp)
 {
-	struct tcf_proto *next = rtnl_dereference(chain_info->next);
+	struct tcf_proto *next = tcf_chain_dereference(chain_info->next, chain);
 
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
@@ -1288,7 +1301,8 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 
 	/* Check the chain for existence of proto-tcf with this priority */
 	for (pprev = &chain->filter_chain;
-	     (tp = rtnl_dereference(*pprev)); pprev = &tp->next) {
+	     (tp = tcf_chain_dereference(*pprev, chain));
+	     pprev = &tp->next) {
 		if (tp->prio >= prio) {
 			if (tp->prio == prio) {
 				if (prio_allocate ||
@@ -1496,12 +1510,13 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
 			       prio, prio_allocate);
 	if (IS_ERR(tp)) {
 		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
 		err = PTR_ERR(tp);
-		goto errout;
+		goto errout_locked;
 	}
 
 	if (tp == NULL) {
@@ -1510,29 +1525,37 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		if (tca[TCA_KIND] == NULL || !protocol) {
 			NL_SET_ERR_MSG(extack, "Filter kind and protocol must be specified");
 			err = -EINVAL;
-			goto errout;
+			goto errout_locked;
 		}
 
 		if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 			NL_SET_ERR_MSG(extack, "Need both RTM_NEWTFILTER and NLM_F_CREATE to create a new filter");
 			err = -ENOENT;
-			goto errout;
+			goto errout_locked;
 		}
 
 		if (prio_allocate)
-			prio = tcf_auto_prio(tcf_chain_tp_prev(&chain_info));
+			prio = tcf_auto_prio(tcf_chain_tp_prev(chain,
+							       &chain_info));
 
+		spin_unlock(&chain->filter_chain_lock);
 		tp = tcf_proto_create(nla_data(tca[TCA_KIND]),
 				      protocol, prio, chain, extack);
 		if (IS_ERR(tp)) {
 			err = PTR_ERR(tp);
 			goto errout;
 		}
+
+		spin_lock(&chain->filter_chain_lock);
+		tcf_chain_tp_insert(chain, &chain_info, tp);
+		spin_unlock(&chain->filter_chain_lock);
 		tp_created = 1;
 	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
 		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
 		err = -EINVAL;
-		goto errout;
+		goto errout_locked;
+	} else {
+		spin_unlock(&chain->filter_chain_lock);
 	}
 
 	fh = tp->ops->get(tp, t->tcm_handle);
@@ -1558,21 +1581,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
 			      extack);
-	if (err == 0) {
-		if (tp_created)
-			tcf_chain_tp_insert(chain, &chain_info, tp);
+	if (err == 0)
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
-	} else {
-		if (tp_created) {
-			/* tp wasn't inserted to chain tp list. Take reference
-			 * to chain manually for tcf_proto_destroy() to
-			 * release.
-			 */
-			tcf_chain_hold(chain);
-			tcf_proto_destroy(tp, NULL);
-		}
-	}
+	else if (tp_created)
+		tcf_proto_destroy(tp, NULL);
 
 errout:
 	if (chain)
@@ -1582,6 +1595,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		/* Replay the request. */
 		goto replay;
 	return err;
+
+errout_locked:
+	spin_unlock(&chain->filter_chain_lock);
+	goto errout;
 }
 
 static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
@@ -1657,31 +1674,34 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
 			       prio, false);
 	if (!tp || IS_ERR(tp)) {
 		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
 		err = tp ? PTR_ERR(tp) : -ENOENT;
-		goto errout;
+		goto errout_locked;
 	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
 		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
 		err = -EINVAL;
+		goto errout_locked;
+	} else if (t->tcm_handle == 0) {
+		tcf_chain_tp_remove(chain, &chain_info, tp);
+		spin_unlock(&chain->filter_chain_lock);
+
+		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
+			       RTM_DELTFILTER, false);
+		tcf_proto_destroy(tp, extack);
+		err = 0;
 		goto errout;
 	}
+	spin_unlock(&chain->filter_chain_lock);
 
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
-		if (t->tcm_handle == 0) {
-			tcf_chain_tp_remove(chain, &chain_info, tp);
-			tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-				       RTM_DELTFILTER, false);
-			tcf_proto_destroy(tp, extack);
-			err = 0;
-		} else {
-			NL_SET_ERR_MSG(extack, "Specified filter handle not found");
-			err = -ENOENT;
-		}
+		NL_SET_ERR_MSG(extack, "Specified filter handle not found");
+		err = -ENOENT;
 	} else {
 		bool last;
 
@@ -1691,7 +1711,10 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		if (err)
 			goto errout;
 		if (last) {
+			spin_lock(&chain->filter_chain_lock);
 			tcf_chain_tp_remove(chain, &chain_info, tp);
+			spin_unlock(&chain->filter_chain_lock);
+
 			tcf_proto_destroy(tp, extack);
 		}
 	}
@@ -1701,6 +1724,10 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_put(chain);
 	tcf_block_release(q, block);
 	return err;
+
+errout_locked:
+	spin_unlock(&chain->filter_chain_lock);
+	goto errout;
 }
 
 static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
@@ -1758,8 +1785,10 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_tp_find(chain, &chain_info, protocol,
 			       prio, false);
+	spin_unlock(&chain->filter_chain_lock);
 	if (!tp || IS_ERR(tp)) {
 		NL_SET_ERR_MSG(extack, "Filter with specified priority/protocol not found");
 		err = tp ? PTR_ERR(tp) : -ENOENT;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 08/17] net: sched: introduce reference counting for tcf proto
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (6 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 07/17] net: sched: lock the chain when accessing filter_chain list Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Add reference counter to tcf proto. Use it to manage tcf proto life cycle
in cls API.

Implement helper get/put functions for tcf proto and use them to modify cls
API to always take a reference to tcf proto while using it. This change
allows the proto to be modified concurrently, instead of relying on rtnl
lock for protection.
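
A minimal caller-side sketch of the resulting pattern (assuming the
helpers below; error handling elided):

	spin_lock(&chain->filter_chain_lock);
	/* tcf_chain_tp_find() now returns tp with a reference taken */
	tp = tcf_chain_tp_find(chain, &chain_info, protocol, prio, false);
	spin_unlock(&chain->filter_chain_lock);

	if (tp && !IS_ERR(tp)) {
		/* tp stays alive here even if it is concurrently
		 * removed from the chain; the reference pins it.
		 */
		tcf_proto_put(tp, NULL);
	}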

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 44 +++++++++++++++++++++++++++++++++++++-------
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 82fa23da4969..1015e3491187 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -318,6 +318,7 @@ struct tcf_proto {
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
+	refcount_t		refcnt;
 	struct rcu_head		rcu;
 	struct work_struct	work;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 8f5dfa3ffb1c..6cbdb28017d3 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -197,6 +197,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->prio = prio;
 	tp->chain = chain;
 	INIT_WORK(&tp->work, tcf_proto_destroy_work);
+	refcount_set(&tp->refcnt, 1);
 
 	err = tp->ops->init(tp);
 	if (err) {
@@ -210,12 +211,24 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	return ERR_PTR(err);
 }
 
+static void tcf_proto_get(struct tcf_proto *tp)
+{
+	refcount_inc(&tp->refcnt);
+}
+
 static void tcf_proto_destroy(struct tcf_proto *tp,
 			      struct netlink_ext_ack *extack)
 {
 	tc_queue_proto_work(&tp->work);
 }
 
+static void tcf_proto_put(struct tcf_proto *tp,
+			  struct netlink_ext_ack *extack)
+{
+	if (refcount_dec_and_test(&tp->refcnt))
+		tcf_proto_destroy(tp, extack);
+}
+
 #define ASSERT_BLOCK_LOCKED(block)					\
 	WARN_ONCE(!spin_is_locked(&(block)->lock),		\
 		  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)
@@ -458,13 +471,13 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 
 	spin_lock(&chain->filter_chain_lock);
 	tp = tcf_chain_dereference(chain->filter_chain, chain);
+	RCU_INIT_POINTER(chain->filter_chain, NULL);
 	tcf_chain0_head_change(chain, NULL);
 	spin_unlock(&chain->filter_chain_lock);
 
 	while (tp) {
-		RCU_INIT_POINTER(chain->filter_chain, tp->next);
-		tcf_proto_destroy(tp, NULL);
-		tp = rtnl_dereference(chain->filter_chain);
+		tcf_proto_put(tp, NULL);
+		tp = tp->next;
 	}
 }
 
@@ -1275,9 +1288,9 @@ static void tcf_chain_tp_insert(struct tcf_chain *chain,
 {
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
+	tcf_proto_get(tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
-	tcf_chain_hold(chain);
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1315,7 +1328,12 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 		}
 	}
 	chain_info->pprev = pprev;
-	chain_info->next = tp ? tp->next : NULL;
+	if (tp) {
+		chain_info->next = tp->next;
+		tcf_proto_get(tp);
+	} else {
+		chain_info->next = NULL;
+	}
 	return tp;
 }
 
@@ -1590,6 +1608,12 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
+		if (!tp_created)
+			tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
@@ -1720,8 +1744,11 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 errout:
-	if (chain)
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	return err;
 
@@ -1812,8 +1839,11 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 errout:
-	if (chain)
+	if (chain) {
+		if (tp && !IS_ERR(tp))
+			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
+	}
 	tcf_block_release(q, block);
 	return err;
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto()
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (7 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 08/17] net: sched: introduce reference counting for tcf proto Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

All users of chain->filter_chain rely on rtnl lock and assume that no new
classifier instances are added when traversing the list. Use
tcf_get_next_proto() to traverse the filters list without relying on the
rtnl mutex. This function iterates over classifiers by taking a reference
to the current iterator classifier only and doesn't assume external
synchronization of the filters list.
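
Usage sketch (the same loop the sch_api.c hunk below adopts); the helper
releases the previous tp internally, so a complete traversal leaves no
reference held:

	struct tcf_proto *tp;

	for (tp = tcf_get_next_proto(chain, NULL); tp;
	     tp = tcf_get_next_proto(chain, tp)) {
		/* tp is guaranteed to stay alive for this iteration */
	}
	/* breaking out of the loop early requires an explicit
	 * tcf_proto_put(tp, NULL) by the caller
	 */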

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/cls_api.c   | 70 +++++++++++++++++++++++++++++++++++++++++++--------
 net/sched/sch_api.c   |  4 +--
 3 files changed, 64 insertions(+), 12 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 8396d51e78a1..727cb99da5d4 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -45,6 +45,8 @@ struct tcf_chain *tcf_chain_get_by_act(struct tcf_block *block,
 void tcf_chain_put_by_act(struct tcf_chain *chain);
 struct tcf_chain *tcf_get_next_chain(struct tcf_block *block,
 				     struct tcf_chain *chain);
+struct tcf_proto *tcf_get_next_proto(struct tcf_chain *chain,
+				     struct tcf_proto *tp);
 void tcf_block_netif_keep_dst(struct tcf_block *block);
 int tcf_block_get(struct tcf_block **p_block,
 		  struct tcf_proto __rcu **p_filter_chain, struct Qdisc *q,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6cbdb28017d3..02130f8d89e1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -728,6 +728,45 @@ tcf_get_next_chain(struct tcf_block *block, struct tcf_chain *chain)
 }
 EXPORT_SYMBOL(tcf_get_next_chain);
 
+static struct tcf_proto *
+__tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+	ASSERT_RTNL();
+	spin_lock(&chain->filter_chain_lock);
+
+	if (!tp)
+		tp = tcf_chain_dereference(chain->filter_chain, chain);
+	else
+		tp = tcf_chain_dereference(tp->next, chain);
+
+	if (tp)
+		tcf_proto_get(tp);
+
+	spin_unlock(&chain->filter_chain_lock);
+
+	return tp;
+}
+
+/* Function to be used by all clients that want to iterate over all tp's on
+ * chain. Users of this function must be tolerant to concurrent tp
+ * insertion/deletion or ensure that no concurrent chain modification is
+ * possible. Note that all netlink dump callbacks cannot guarantee to provide
+ * consistent dump because rtnl lock is released each time skb is filled with
+ * data and sent to user-space.
+ */
+
+struct tcf_proto *
+tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
+{
+	struct tcf_proto *tp_next = __tcf_get_next_proto(chain, tp);
+
+	if (tp)
+		tcf_proto_put(tp, NULL);
+
+	return tp_next;
+}
+EXPORT_SYMBOL(tcf_get_next_proto);
+
 static void tcf_block_flush_all_chains(struct tcf_block *block)
 {
 	struct tcf_chain *chain;
@@ -1100,7 +1139,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcf_chain *chain, *chain_prev;
-	struct tcf_proto *tp;
+	struct tcf_proto *tp, *tp_prev;
 	int err;
 
 	for (chain = __tcf_get_next_chain(block, NULL);
@@ -1108,8 +1147,10 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	     chain_prev = chain,
 		     chain = __tcf_get_next_chain(block, chain),
 		     tcf_chain_put(chain_prev)) {
-		for (tp = rtnl_dereference(chain->filter_chain); tp;
-		     tp = rtnl_dereference(tp->next)) {
+		for (tp = __tcf_get_next_proto(chain, NULL); tp;
+		     tp_prev = tp,
+			     tp = __tcf_get_next_proto(chain, tp),
+			     tcf_proto_put(tp_prev, NULL)) {
 			if (tp->ops->reoffload) {
 				err = tp->ops->reoffload(tp, add, cb, cb_priv,
 							 extack);
@@ -1126,6 +1167,7 @@ tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
 	return 0;
 
 err_playback_remove:
+	tcf_proto_put(tp, NULL);
 	tcf_chain_put(chain);
 	tcf_block_playback_offloads(block, cb, cb_priv, false, offload_in_use,
 				    extack);
@@ -1449,8 +1491,8 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 {
 	struct tcf_proto *tp;
 
-	for (tp = rtnl_dereference(chain->filter_chain);
-	     tp; tp = rtnl_dereference(tp->next))
+	for (tp = tcf_get_next_proto(chain, NULL);
+	     tp; tp = tcf_get_next_proto(chain, tp))
 		tfilter_notify(net, oskb, n, tp, block,
 			       q, parent, NULL, event, false);
 }
@@ -1875,11 +1917,15 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 	struct net *net = sock_net(skb->sk);
 	struct tcf_block *block = chain->block;
 	struct tcmsg *tcm = nlmsg_data(cb->nlh);
+	struct tcf_proto *tp, *tp_prev;
 	struct tcf_dump_args arg;
-	struct tcf_proto *tp;
 
-	for (tp = rtnl_dereference(chain->filter_chain);
-	     tp; tp = rtnl_dereference(tp->next), (*p_index)++) {
+	for (tp = __tcf_get_next_proto(chain, NULL);
+	     tp;
+	     tp_prev = tp,
+		     tp = __tcf_get_next_proto(chain, tp),
+		     tcf_proto_put(tp_prev, NULL),
+		     (*p_index)++) {
 		if (*p_index < index_start)
 			continue;
 		if (TC_H_MAJ(tcm->tcm_info) &&
@@ -1896,7 +1942,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
 					  RTM_NEWTFILTER) <= 0)
-				return false;
+				goto errout;
 
 			cb->args[1] = 1;
 		}
@@ -1916,9 +1962,13 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 		cb->args[2] = arg.w.cookie;
 		cb->args[1] = arg.w.count + 1;
 		if (arg.w.stop)
-			return false;
+			goto errout;
 	}
 	return true;
+
+errout:
+	tcf_proto_put(tp, NULL);
+	return false;
 }
 
 /* called with RTNL */
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 28e9c3d9981f..0605f01a419b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1850,8 +1850,8 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 	     chain = tcf_get_next_chain(block, chain)) {
 		struct tcf_proto *tp;
 
-		for (tp = rtnl_dereference(chain->filter_chain);
-		     tp; tp = rtnl_dereference(tp->next)) {
+		for (tp = tcf_get_next_proto(chain, NULL);
+		     tp; tp = tcf_get_next_proto(chain, tp)) {
 			struct tcf_bind_args arg = {};
 
 			arg.w.fn = tcf_node_bind;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 10/17] net: sched: refactor tp insert/delete for concurrent execution
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (8 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Implement unique insertion function to atomically attach tcf proto to chain
after verifying that no other tcf proto with specified priority exists.
Implement delete function that verifies that tp is actually empty before
deleting it. Use these functions to refactor cls API to account for
concurrent tp and rule updates instead of relying on rtnl lock. Add new
'deleting' flag to tcf proto. Use it to restart search when iterating over
tp's on chain to prevent accessing a potentially invalid tp->next pointer.

Extend tcf proto with spinlock that is intended to be used to protect its
data from concurrent modification instead of relying on rtnl mutex. Use it
to protect 'deleting' flag. Add lockdep macros to validate that lock is
held when accessing protected fields.
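
Caller-side sketch of the unique insertion contract (mirroring its use
in tc_new_tfilter() below):

	tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
	if (tp != tp_new) {
		/* Another tp with the same priority was attached
		 * concurrently; tp_new has already been freed by the
		 * helper and the existing instance is returned instead.
		 */
	}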

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  18 +++++
 net/sched/cls_api.c       | 183 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 176 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 1015e3491187..4809eca41f95 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -318,6 +318,11 @@ struct tcf_proto {
 	void			*data;
 	const struct tcf_proto_ops	*ops;
 	struct tcf_chain	*chain;
+	/* Lock protects tcf_proto shared state and can be used by unlocked
+	 * classifiers to protect their private data.
+	 */
+	spinlock_t		lock;
+	bool			deleting;
 	refcount_t		refcnt;
 	struct rcu_head		rcu;
 	struct work_struct	work;
@@ -379,16 +384,29 @@ static inline bool lockdep_tcf_chain_is_locked(struct tcf_chain *chain)
 {
 	return lockdep_is_held(&chain->filter_chain_lock);
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+	return lockdep_is_held(&tp->lock);
+}
 #else
 static inline bool lockdep_tcf_chain_is_locked(struct tcf_block *chain)
 {
 	return true;
 }
+
+static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
+{
+	return true;
+}
 #endif /* #ifdef CONFIG_PROVE_LOCKING */
 
 #define tcf_chain_dereference(p, chain)					\
 	rcu_dereference_protected(p, lockdep_tcf_chain_is_locked(chain))
 
+#define tcf_proto_dereference(p, tp)					\
+	rcu_dereference_protected(p, lockdep_tcf_proto_is_locked(tp))
+
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 {
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 02130f8d89e1..3ce244fbfb4d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -197,6 +197,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	tp->prio = prio;
 	tp->chain = chain;
 	INIT_WORK(&tp->work, tcf_proto_destroy_work);
+	spin_lock_init(&tp->lock);
 	refcount_set(&tp->refcnt, 1);
 
 	err = tp->ops->init(tp);
@@ -229,6 +230,49 @@ static void tcf_proto_put(struct tcf_proto *tp,
 		tcf_proto_destroy(tp, extack);
 }
 
+static int walker_noop(struct tcf_proto *tp, void *d, struct tcf_walker *arg)
+{
+	return -1;
+}
+
+static bool tcf_proto_is_empty(struct tcf_proto *tp)
+{
+	struct tcf_walker walker = { .fn = walker_noop, };
+
+	if (tp->ops->walk) {
+		tp->ops->walk(tp, &walker);
+		return !walker.stop;
+	}
+	return true;
+}
+
+static bool tcf_proto_check_delete(struct tcf_proto *tp)
+{
+	spin_lock(&tp->lock);
+	if (tcf_proto_is_empty(tp))
+		tp->deleting = true;
+	spin_unlock(&tp->lock);
+	return tp->deleting;
+}
+
+static void tcf_proto_mark_delete(struct tcf_proto *tp)
+{
+	spin_lock(&tp->lock);
+	tp->deleting = true;
+	spin_unlock(&tp->lock);
+}
+
+static bool tcf_proto_is_deleting(struct tcf_proto *tp)
+{
+	bool deleting;
+
+	spin_lock(&tp->lock);
+	deleting = tp->deleting;
+	spin_unlock(&tp->lock);
+
+	return deleting;
+}
+
 #define ASSERT_BLOCK_LOCKED(block)					\
 	WARN_ONCE(!spin_is_locked(&(block)->lock),		\
 		  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)
@@ -731,13 +775,27 @@ EXPORT_SYMBOL(tcf_get_next_chain);
 static struct tcf_proto *
 __tcf_get_next_proto(struct tcf_chain *chain, struct tcf_proto *tp)
 {
+	u32 prio = 0;
+
 	ASSERT_RTNL();
 	spin_lock(&chain->filter_chain_lock);
 
-	if (!tp)
+	if (!tp) {
 		tp = tcf_chain_dereference(chain->filter_chain, chain);
-	else
+	} else if (tcf_proto_is_deleting(tp)) {
+		/* 'deleting' flag is set and chain->filter_chain_lock was
+		 * unlocked, which means next pointer could be invalid. Restart
+		 * search.
+		 */
+		prio = tp->prio + 1;
+		tp = tcf_chain_dereference(chain->filter_chain, chain);
+
+		for (; tp; tp = tcf_chain_dereference(tp->next, chain))
+			if (!tp->deleting && tp->prio >= prio)
+				break;
+	} else {
 		tp = tcf_chain_dereference(tp->next, chain);
+	}
 
 	if (tp)
 		tcf_proto_get(tp);
@@ -1341,6 +1399,7 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 {
 	struct tcf_proto *next = tcf_chain_dereference(chain_info->next, chain);
 
+	tcf_proto_mark_delete(tp);
 	if (tp == chain->filter_chain)
 		tcf_chain0_head_change(chain, next);
 	RCU_INIT_POINTER(*chain_info->pprev, next);
@@ -1349,6 +1408,79 @@ static void tcf_chain_tp_remove(struct tcf_chain *chain,
 static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 					   struct tcf_chain_info *chain_info,
 					   u32 protocol, u32 prio,
+					   bool prio_allocate);
+
+/* Try to insert new proto.
+ * If proto with specified priority already exists, free new proto
+ * and return existing one.
+ */
+
+static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
+						    struct tcf_proto *tp_new,
+						    u32 protocol, u32 prio)
+{
+	struct tcf_chain_info chain_info;
+	struct tcf_proto *tp;
+
+	spin_lock(&chain->filter_chain_lock);
+
+	tp = tcf_chain_tp_find(chain, &chain_info,
+			       protocol, prio, false);
+	if (!tp)
+		tcf_chain_tp_insert(chain, &chain_info, tp_new);
+	spin_unlock(&chain->filter_chain_lock);
+
+	if (tp) {
+		tcf_proto_destroy(tp_new, NULL);
+		tp_new = tp;
+	}
+
+	return tp_new;
+}
+
+static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
+				      struct tcf_proto *tp,
+				      struct netlink_ext_ack *extack)
+{
+	struct tcf_chain_info chain_info;
+	struct tcf_proto *tp_iter;
+	struct tcf_proto **pprev;
+	struct tcf_proto *next;
+
+	spin_lock(&chain->filter_chain_lock);
+
+	/* Atomically find and remove tp from chain. */
+	for (pprev = &chain->filter_chain;
+	     (tp_iter = tcf_chain_dereference(*pprev, chain));
+	     pprev = &tp_iter->next) {
+		if (tp_iter == tp) {
+			chain_info.pprev = pprev;
+			chain_info.next = tp_iter->next;
+			WARN_ON(tp_iter->deleting);
+			break;
+		}
+	}
+	/* Verify that tp still exists and no new filters were inserted
+	 * concurrently.
+	 * Mark tp for deletion if it is empty.
+	 */
+	if (!tp_iter || !tcf_proto_check_delete(tp)) {
+		spin_unlock(&chain->filter_chain_lock);
+		return;
+	}
+
+	next = tcf_chain_dereference(chain_info.next, chain);
+	if (tp == chain->filter_chain)
+		tcf_chain0_head_change(chain, next);
+	RCU_INIT_POINTER(*chain_info.pprev, next);
+	spin_unlock(&chain->filter_chain_lock);
+
+	tcf_proto_put(tp, extack);
+}
+
+static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
+					   struct tcf_chain_info *chain_info,
+					   u32 protocol, u32 prio,
 					   bool prio_allocate)
 {
 	struct tcf_proto **pprev;
@@ -1533,6 +1665,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	prio = TC_H_MAJ(t->tcm_info);
 	prio_allocate = false;
 	parent = t->tcm_parent;
+	tp = NULL;
 	cl = 0;
 
 	if (prio == 0) {
@@ -1580,6 +1713,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 
 	if (tp == NULL) {
+		struct tcf_proto *tp_new = NULL;
+
 		/* Proto-tcf does not exist, create new one */
 
 		if (tca[TCA_KIND] == NULL || !protocol) {
@@ -1599,23 +1734,28 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 							       &chain_info));
 
 		spin_unlock(&chain->filter_chain_lock);
-		tp = tcf_proto_create(nla_data(tca[TCA_KIND]),
-				      protocol, prio, chain, extack);
-		if (IS_ERR(tp)) {
-			err = PTR_ERR(tp);
+		tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]),
+					  protocol, prio, chain, extack);
+		if (IS_ERR(tp_new)) {
+			err = PTR_ERR(tp_new);
 			goto errout;
 		}
 
-		spin_lock(&chain->filter_chain_lock);
-		tcf_chain_tp_insert(chain, &chain_info, tp);
+		tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
+
+		/* tp insert function can return another tp instance, if it was
+		 * created concurrently.
+		 */
+		if (tp == tp_new)
+			tp_created = 1;
+	} else {
 		spin_unlock(&chain->filter_chain_lock);
-		tp_created = 1;
-	} else if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
+	}
+
+	if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
 		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
 		err = -EINVAL;
-		goto errout_locked;
-	} else {
-		spin_unlock(&chain->filter_chain_lock);
+		goto errout;
 	}
 
 	fh = tp->ops->get(tp, t->tcm_handle);
@@ -1644,12 +1784,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	if (err == 0)
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
-	else if (tp_created)
-		tcf_proto_destroy(tp, NULL);
 
 errout:
-	if (chain)
-		tcf_chain_put(chain);
+	if (err && tp_created)
+		tcf_chain_tp_delete_empty(chain, tp, NULL);
 	if (chain) {
 		if (tp && !IS_ERR(tp))
 			tcf_proto_put(tp, NULL);
@@ -1755,9 +1893,9 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		tcf_chain_tp_remove(chain, &chain_info, tp);
 		spin_unlock(&chain->filter_chain_lock);
 
+		tcf_proto_put(tp, NULL);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_DELTFILTER, false);
-		tcf_proto_destroy(tp, extack);
 		err = 0;
 		goto errout;
 	}
@@ -1776,13 +1914,8 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 					 extack);
 		if (err)
 			goto errout;
-		if (last) {
-			spin_lock(&chain->filter_chain_lock);
-			tcf_chain_tp_remove(chain, &chain_info, tp);
-			spin_unlock(&chain->filter_chain_lock);
-
-			tcf_proto_destroy(tp, extack);
-		}
+		if (last)
+			tcf_chain_tp_delete_empty(chain, tp, extack);
 	}
 
 errout:
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 11/17] net: sched: prevent insertion of new classifiers during chain flush
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (9 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Extend tcf_chain with 'flushing' flag. Use the flag to prevent insertion of
new classifier instances when chain flushing is in progress, which would
otherwise leak resources when tcf_proto is created concurrently by unlocked
users.

Return EAGAIN error from tcf_chain_tp_insert_unique() to restart
tc_new_tfilter() and look up the chain/proto again.
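
Sketch of the ordering that makes the flag safe ('flushing' is only set
and tested while filter_chain_lock is held):

	spin_lock(&chain->filter_chain_lock);
	if (chain->flushing) {
		spin_unlock(&chain->filter_chain_lock);
		return -EAGAIN;	/* tc_new_tfilter() replays the request */
	}
	/* ... link the new tp while still holding the lock ... */
	spin_unlock(&chain->filter_chain_lock);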

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 35 ++++++++++++++++++++++++++---------
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4809eca41f95..100368594524 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -353,6 +353,7 @@ struct tcf_chain {
 	unsigned int refcnt;
 	unsigned int action_refcnt;
 	bool explicitly_created;
+	bool flushing;
 	const struct tcf_proto_ops *tmplt_ops;
 	void *tmplt_priv;
 };
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3ce244fbfb4d..d4f763525412 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -483,9 +483,12 @@ static void __tcf_chain_put(struct tcf_chain *chain, bool by_act,
 	spin_unlock(&block->lock);
 
 	/* The last dropped non-action reference will trigger notification. */
-	if (is_last && !by_act)
+	if (is_last && !by_act) {
 		tc_chain_notify_delete(tmplt_ops, tmplt_priv, chain_index,
 				       block, NULL, 0, 0, false);
+		/* Last reference to chain, no need to lock. */
+		chain->flushing = false;
+	}
 
 	if (refcnt == 0) {
 		tc_chain_tmplt_del(tmplt_ops, tmplt_priv);
@@ -517,6 +520,7 @@ static void tcf_chain_flush(struct tcf_chain *chain)
 	tp = tcf_chain_dereference(chain->filter_chain, chain);
 	RCU_INIT_POINTER(chain->filter_chain, NULL);
 	tcf_chain0_head_change(chain, NULL);
+	chain->flushing = true;
 	spin_unlock(&chain->filter_chain_lock);
 
 	while (tp) {
@@ -1382,15 +1386,20 @@ static struct tcf_proto *tcf_chain_tp_prev(struct tcf_chain *chain,
 	return tcf_chain_dereference(*chain_info->pprev, chain);
 }
 
-static void tcf_chain_tp_insert(struct tcf_chain *chain,
-				struct tcf_chain_info *chain_info,
-				struct tcf_proto *tp)
+static int tcf_chain_tp_insert(struct tcf_chain *chain,
+			       struct tcf_chain_info *chain_info,
+			       struct tcf_proto *tp)
 {
+	if (chain->flushing)
+		return -EAGAIN;
+
 	if (*chain_info->pprev == chain->filter_chain)
 		tcf_chain0_head_change(chain, tp);
 	tcf_proto_get(tp);
 	RCU_INIT_POINTER(tp->next, tcf_chain_tp_prev(chain, chain_info));
 	rcu_assign_pointer(*chain_info->pprev, tp);
+
+	return 0;
 }
 
 static void tcf_chain_tp_remove(struct tcf_chain *chain,
@@ -1421,18 +1430,22 @@ static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
 {
 	struct tcf_chain_info chain_info;
 	struct tcf_proto *tp;
+	int err = 0;
 
 	spin_lock(&chain->filter_chain_lock);
 
 	tp = tcf_chain_tp_find(chain, &chain_info,
 			       protocol, prio, false);
 	if (!tp)
-		tcf_chain_tp_insert(chain, &chain_info, tp_new);
+		err = tcf_chain_tp_insert(chain, &chain_info, tp_new);
 	spin_unlock(&chain->filter_chain_lock);
 
 	if (tp) {
 		tcf_proto_destroy(tp_new, NULL);
 		tp_new = tp;
+	} else if (err) {
+		tcf_proto_destroy(tp_new, NULL);
+		tp_new = ERR_PTR(err);
 	}
 
 	return tp_new;
@@ -1743,11 +1756,15 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		tp = tcf_chain_tp_insert_unique(chain, tp_new, protocol, prio);
 
-		/* tp insert function can return another tp instance, if it was
-		 * created concurrently.
-		 */
-		if (tp == tp_new)
+		if (IS_ERR(tp)) {
+			err = PTR_ERR(tp);
+			goto errout;
+		} else if (tp == tp_new) {
+			/* tp insert function can return another tp instance, if
+			 * it was created concurrently.
+			 */
 			tp_created = 1;
+		}
 	} else {
 		spin_unlock(&chain->filter_chain_lock);
 	}
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 12/17] net: sched: track rtnl lock status when validating extensions
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (10 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Actions API is already updated to not rely on rtnl lock for
synchronization. However, it needs to be provided with the rtnl lock status
when called from classifiers API in order to correctly release the lock
when loading a kernel module.

Extend extension validation function with 'rtnl_held' flag which is passed
to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls
API. No classifier is currently updated to support unlocked execution, so
pass hardcoded 'true' as the flag value.
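
The resulting call-site pattern in classifiers (cls_basic below is
representative):

	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr,
				true /* rtnl_held */, extack);
	if (err < 0)
		return err;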

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h    | 2 +-
 net/sched/cls_api.c      | 9 +++++----
 net/sched/cls_basic.c    | 2 +-
 net/sched/cls_bpf.c      | 3 ++-
 net/sched/cls_cgroup.c   | 2 +-
 net/sched/cls_flow.c     | 2 +-
 net/sched/cls_flower.c   | 3 ++-
 net/sched/cls_fw.c       | 2 +-
 net/sched/cls_matchall.c | 3 ++-
 net/sched/cls_route.c    | 2 +-
 net/sched/cls_rsvp.h     | 3 ++-
 net/sched/cls_tcindex.c  | 2 +-
 net/sched/cls_u32.c      | 2 +-
 13 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 727cb99da5d4..1f1b06311622 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -381,7 +381,7 @@ tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
 
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp,
 		      struct nlattr **tb, struct nlattr *rate_tlv,
-		      struct tcf_exts *exts, bool ovr,
+		      struct tcf_exts *exts, bool ovr, bool rtnl_held,
 		      struct netlink_ext_ack *extack);
 void tcf_exts_destroy(struct tcf_exts *exts);
 void tcf_exts_change(struct tcf_exts *dst, struct tcf_exts *src);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d4f763525412..0949502e31b9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -2613,7 +2613,7 @@ EXPORT_SYMBOL(tcf_exts_destroy);
 
 int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 		      struct nlattr *rate_tlv, struct tcf_exts *exts, bool ovr,
-		      struct netlink_ext_ack *extack)
+		      bool rtnl_held, struct netlink_ext_ack *extack)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	{
@@ -2623,7 +2623,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 		if (exts->police && tb[exts->police]) {
 			act = tcf_action_init_1(net, tp, tb[exts->police],
 						rate_tlv, "police", ovr,
-						TCA_ACT_BIND, true, extack);
+						TCA_ACT_BIND, rtnl_held,
+						extack);
 			if (IS_ERR(act))
 				return PTR_ERR(act);
 
@@ -2635,8 +2636,8 @@ int tcf_exts_validate(struct net *net, struct tcf_proto *tp, struct nlattr **tb,
 
 			err = tcf_action_init(net, tp, tb[exts->action],
 					      rate_tlv, NULL, ovr, TCA_ACT_BIND,
-					      exts->actions, &attr_size, true,
-					      extack);
+					      exts->actions, &attr_size,
+					      rtnl_held, extack);
 			if (err < 0)
 				return err;
 			exts->nr_actions = err;
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 6a5dce8baf19..90e44888f85d 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -148,7 +148,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true, extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index fa6fe2fe0f32..9c0ac7c23ad7 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -417,7 +417,8 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 	if ((!is_bpf && !is_ebpf) || (is_bpf && is_ebpf))
 		return -EINVAL;
 
-	ret = tcf_exts_validate(net, tp, tb, est, &prog->exts, ovr, extack);
+	ret = tcf_exts_validate(net, tp, tb, est, &prog->exts, ovr, true,
+				extack);
 	if (ret < 0)
 		return ret;
 
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 3bc01bdde165..663ee1c6d606 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -110,7 +110,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 		goto errout;
 
 	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &new->exts, ovr,
-				extack);
+				true, extack);
 	if (err < 0)
 		goto errout;
 
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 2bb043cd436b..39a6407d4832 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -445,7 +445,7 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 		goto err2;
 
 	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &fnew->exts, ovr,
-				extack);
+				true, extack);
 	if (err < 0)
 		goto err2;
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 9aada2d0ef06..8bf3d4660fac 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1132,7 +1132,8 @@ static int fl_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true,
+				extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index 29eeeaf3ea44..c8173ebb69f2 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -217,7 +217,7 @@ static int fw_set_parms(struct net *net, struct tcf_proto *tp,
 	int err;
 
 	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &f->exts, ovr,
-				extack);
+				true, extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 856fa79d4ffd..5621c38a303a 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -142,7 +142,8 @@ static int mall_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &head->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &head->exts, ovr, true,
+				extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 0404aa5fa7cb..44b26038c4c4 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -393,7 +393,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 	struct route4_bucket *b;
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true, extack);
 	if (err < 0)
 		return err;
 
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index e9ccf7daea7d..9dd9530e6a52 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -502,7 +502,8 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 	err = tcf_exts_init(&e, TCA_RSVP_ACT, TCA_RSVP_POLICE);
 	if (err < 0)
 		return err;
-	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, tca[TCA_RATE], &e, ovr, true,
+				extack);
 	if (err < 0)
 		goto errout2;
 
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 9ccc93f257db..b7dc667b6ec0 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -314,7 +314,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	err = tcf_exts_init(&e, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 	if (err < 0)
 		return err;
-	err = tcf_exts_validate(net, tp, tb, est, &e, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &e, ovr, true, extack);
 	if (err < 0)
 		goto errout;
 
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 4b28fd44576d..cd79ae3b605a 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -725,7 +725,7 @@ static int u32_set_parms(struct net *net, struct tcf_proto *tp,
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &n->exts, ovr, extack);
+	err = tcf_exts_validate(net, tp, tb, est, &n->exts, ovr, true, extack);
 	if (err < 0)
 		return err;
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 13/17] net: sched: extend proto ops with 'put' callback
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (11 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Add optional tp->ops->put() API to be implemented for filter reference
counting. This new function is called by cls API to release the reference
taken on filters returned by the tp->ops->change() or tp->ops->get()
functions. Implement tfilter_put() helper to call tp->ops->put() only for
classifiers that implement it.
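
A hypothetical classifier-side sketch of the new callback; 'foo_filter',
'foo_get' and 'foo_put' are illustrative names, not part of this patch:

	static void foo_put(struct tcf_proto *tp, void *arg)
	{
		struct foo_filter *f = arg;

		/* drop the reference taken by foo_get() or foo_change() */
		if (refcount_dec_and_test(&f->refcnt))
			kfree_rcu(f, rcu);
	}

	static struct tcf_proto_ops cls_foo_ops __read_mostly = {
		.kind	= "foo",
		.get	= foo_get,
		.put	= foo_put,
		/* ... */
	};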

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 12 +++++++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 100368594524..24103b7282bd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -273,6 +273,7 @@ struct tcf_proto_ops {
 					   struct netlink_ext_ack *extack);
 
 	void*			(*get)(struct tcf_proto*, u32 handle);
+	void			(*put)(struct tcf_proto *tp, void *f);
 	int			(*change)(struct net *net, struct sk_buff *,
 					struct tcf_proto*, unsigned long,
 					u32 handle, struct nlattr **,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0949502e31b9..7f65ed84b5e5 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1642,6 +1642,12 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 			       q, parent, NULL, event, false);
 }
 
+static void tfilter_put(struct tcf_proto *tp, void *fh)
+{
+	if (tp->ops->put && fh)
+		tp->ops->put(tp, fh);
+}
+
 static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			  struct netlink_ext_ack *extack)
 {
@@ -1784,6 +1790,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			goto errout;
 		}
 	} else if (n->nlmsg_flags & NLM_F_EXCL) {
+		tfilter_put(tp, fh);
 		NL_SET_ERR_MSG(extack, "Filter already exists");
 		err = -EEXIST;
 		goto errout;
@@ -1798,9 +1805,11 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
 			      extack);
-	if (err == 0)
+	if (err == 0) {
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false);
+		tfilter_put(tp, fh);
+	}
 
 errout:
 	if (err && tp_created)
@@ -2030,6 +2039,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
 	}
 
+	tfilter_put(tp, fh);
 errout:
 	if (chain) {
 		if (tp && !IS_ERR(tp))
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 14/17] net: sched: extend proto ops to support unlocked classifiers
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (12 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk
functions to track rtnl lock status. This allows classifiers to release
rtnl lock when necessary and to pass rtnl lock status to extensions and
driver offload callbacks.

Add flags field to tcf proto ops. Add flag value to indicate that
classifier doesn't require rtnl lock.
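
A hypothetical registration sketch for a future unlocked classifier
('cls_foo_ops' is an illustrative name, not part of this series):

	static struct tcf_proto_ops cls_foo_ops __read_mostly = {
		.kind	= "foo",
		/* cls API skips taking rtnl before calling these ops;
		 * change/delete/walk/dump/destroy receive 'rtnl_held'
		 * and must not assume the rtnl mutex is taken.
		 */
		.flags	= TCF_PROTO_OPS_DOIT_UNLOCKED,
	};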

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 17 +++++++---
 net/sched/cls_api.c       | 85 ++++++++++++++++++++++++++++-------------------
 net/sched/cls_basic.c     | 12 ++++---
 net/sched/cls_bpf.c       | 12 ++++---
 net/sched/cls_cgroup.c    | 11 +++---
 net/sched/cls_flow.c      | 13 +++++---
 net/sched/cls_flower.c    | 13 +++++---
 net/sched/cls_fw.c        | 13 +++++---
 net/sched/cls_matchall.c  | 13 +++++---
 net/sched/cls_route.c     | 12 ++++---
 net/sched/cls_rsvp.h      | 13 +++++---
 net/sched/cls_tcindex.c   | 15 +++++----
 net/sched/cls_u32.c       | 12 ++++---
 net/sched/sch_api.c       |  2 +-
 14 files changed, 145 insertions(+), 98 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 24103b7282bd..7b158d6fae85 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -269,7 +269,7 @@ struct tcf_proto_ops {
 					    const struct tcf_proto *,
 					    struct tcf_result *);
 	int			(*init)(struct tcf_proto*);
-	void			(*destroy)(struct tcf_proto *tp,
+	void			(*destroy)(struct tcf_proto *tp, bool rtnl_held,
 					   struct netlink_ext_ack *extack);
 
 	void*			(*get)(struct tcf_proto*, u32 handle);
@@ -277,12 +277,13 @@ struct tcf_proto_ops {
 	int			(*change)(struct net *net, struct sk_buff *,
 					struct tcf_proto*, unsigned long,
 					u32 handle, struct nlattr **,
-					void **, bool,
+					void **, bool, bool,
 					struct netlink_ext_ack *);
 	int			(*delete)(struct tcf_proto *tp, void *arg,
-					  bool *last,
+					  bool *last, bool rtnl_held,
 					  struct netlink_ext_ack *);
-	void			(*walk)(struct tcf_proto*, struct tcf_walker *arg);
+	void			(*walk)(struct tcf_proto *tp,
+					struct tcf_walker *arg, bool rtnl_held);
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
 					     tc_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
@@ -295,12 +296,18 @@ struct tcf_proto_ops {
 
 	/* rtnetlink specific */
 	int			(*dump)(struct net*, struct tcf_proto*, void *,
-					struct sk_buff *skb, struct tcmsg*);
+					struct sk_buff *skb, struct tcmsg*,
+					bool);
 	int			(*tmplt_dump)(struct sk_buff *skb,
 					      struct net *net,
 					      void *tmplt_priv);
 
 	struct module		*owner;
+	int			flags;
+};
+
+enum tcf_proto_ops_flags {
+	TCF_PROTO_OPS_DOIT_UNLOCKED = 1,
 };
 
 struct tcf_proto {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 7f65ed84b5e5..1956c5df5f89 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -60,7 +60,8 @@ static const struct tcf_proto_ops *__tcf_proto_lookup_ops(const char *kind)
 }
 
 static const struct tcf_proto_ops *
-tcf_proto_lookup_ops(const char *kind, struct netlink_ext_ack *extack)
+tcf_proto_lookup_ops(const char *kind, bool rtnl_held,
+		     struct netlink_ext_ack *extack)
 {
 	const struct tcf_proto_ops *ops;
 
@@ -68,9 +69,11 @@ tcf_proto_lookup_ops(const char *kind, struct netlink_ext_ack *extack)
 	if (ops)
 		return ops;
 #ifdef CONFIG_MODULES
-	rtnl_unlock();
+	if (rtnl_held)
+		rtnl_unlock();
 	request_module("cls_%s", kind);
-	rtnl_lock();
+	if (rtnl_held)
+		rtnl_lock();
 	ops = __tcf_proto_lookup_ops(kind);
 	/* We dropped the RTNL semaphore in order to perform
 	 * the module load. So, even if we succeeded in loading
@@ -168,7 +171,7 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 
 	rtnl_lock();
 
-	tp->ops->destroy(tp, NULL);
+	tp->ops->destroy(tp, true, NULL);
 	module_put(tp->ops->owner);
 	kfree_rcu(tp, rcu);
 	tcf_chain_put(chain);
@@ -178,6 +181,7 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
+					  bool rtnl_held,
 					  struct netlink_ext_ack *extack)
 {
 	struct tcf_proto *tp;
@@ -187,7 +191,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	if (!tp)
 		return ERR_PTR(-ENOBUFS);
 
-	tp->ops = tcf_proto_lookup_ops(kind, extack);
+	tp->ops = tcf_proto_lookup_ops(kind, rtnl_held, extack);
 	if (IS_ERR(tp->ops)) {
 		err = PTR_ERR(tp->ops);
 		goto errout;
@@ -235,21 +239,21 @@ static int walker_noop(struct tcf_proto *tp, void *d, struct tcf_walker *arg)
 	return -1;
 }
 
-static bool tcf_proto_is_empty(struct tcf_proto *tp)
+static bool tcf_proto_is_empty(struct tcf_proto *tp, bool rtnl_held)
 {
 	struct tcf_walker walker = { .fn = walker_noop, };
 
 	if (tp->ops->walk) {
-		tp->ops->walk(tp, &walker);
+		tp->ops->walk(tp, &walker, rtnl_held);
 		return !walker.stop;
 	}
 	return true;
 }
 
-static bool tcf_proto_check_delete(struct tcf_proto *tp)
+static bool tcf_proto_check_delete(struct tcf_proto *tp, bool rtnl_held)
 {
 	spin_lock(&tp->lock);
-	if (tcf_proto_is_empty(tp))
+	if (tcf_proto_is_empty(tp, rtnl_held))
 		tp->deleting = true;
 	spin_unlock(&tp->lock);
 	return tp->deleting;
@@ -1452,7 +1456,7 @@ static struct tcf_proto *tcf_chain_tp_insert_unique(struct tcf_chain *chain,
 }
 
 static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
-				      struct tcf_proto *tp,
+				      struct tcf_proto *tp, bool rtnl_held,
 				      struct netlink_ext_ack *extack)
 {
 	struct tcf_chain_info chain_info;
@@ -1477,7 +1481,7 @@ static void tcf_chain_tp_delete_empty(struct tcf_chain *chain,
 	 * concurrently.
 	 * Mark tp for deletion if it is empty.
 	 */
-	if (!tp_iter || !tcf_proto_check_delete(tp)) {
+	if (!tp_iter || !tcf_proto_check_delete(tp, rtnl_held)) {
 		spin_unlock(&chain->filter_chain_lock);
 		return;
 	}
@@ -1527,7 +1531,8 @@ static struct tcf_proto *tcf_chain_tp_find(struct tcf_chain *chain,
 static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 			 struct tcf_proto *tp, struct tcf_block *block,
 			 struct Qdisc *q, u32 parent, void *fh,
-			 u32 portid, u32 seq, u16 flags, int event)
+			 u32 portid, u32 seq, u16 flags, int event,
+			 bool rtnl_held)
 {
 	struct tcmsg *tcm;
 	struct nlmsghdr  *nlh;
@@ -1555,7 +1560,8 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 	if (!fh) {
 		tcm->tcm_handle = 0;
 	} else {
-		if (tp->ops->dump && tp->ops->dump(net, tp, fh, skb, tcm) < 0)
+		if (tp->ops->dump &&
+		    tp->ops->dump(net, tp, fh, skb, tcm, rtnl_held) < 0)
 			goto nla_put_failure;
 	}
 	nlh->nlmsg_len = skb_tail_pointer(skb) - b;
@@ -1570,7 +1576,8 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb,
 static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 			  struct nlmsghdr *n, struct tcf_proto *tp,
 			  struct tcf_block *block, struct Qdisc *q,
-			  u32 parent, void *fh, int event, bool unicast)
+			  u32 parent, void *fh, int event, bool unicast,
+			  bool rtnl_held)
 {
 	struct sk_buff *skb;
 	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
@@ -1580,7 +1587,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 		return -ENOBUFS;
 
 	if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
-			  n->nlmsg_seq, n->nlmsg_flags, event) <= 0) {
+			  n->nlmsg_seq, n->nlmsg_flags, event,
+			  rtnl_held) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
@@ -1596,7 +1604,7 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 			      struct nlmsghdr *n, struct tcf_proto *tp,
 			      struct tcf_block *block, struct Qdisc *q,
 			      u32 parent, void *fh, bool unicast, bool *last,
-			      struct netlink_ext_ack *extack)
+			      bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct sk_buff *skb;
 	u32 portid = oskb ? NETLINK_CB(oskb).portid : 0;
@@ -1607,13 +1615,14 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 		return -ENOBUFS;
 
 	if (tcf_fill_node(net, skb, tp, block, q, parent, fh, portid,
-			  n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER) <= 0) {
+			  n->nlmsg_seq, n->nlmsg_flags, RTM_DELTFILTER,
+			  rtnl_held) <= 0) {
 		NL_SET_ERR_MSG(extack, "Failed to build del event notification");
 		kfree_skb(skb);
 		return -EINVAL;
 	}
 
-	err = tp->ops->delete(tp, fh, last, extack);
+	err = tp->ops->delete(tp, fh, last, rtnl_held, extack);
 	if (err) {
 		kfree_skb(skb);
 		return err;
@@ -1632,14 +1641,15 @@ static int tfilter_del_notify(struct net *net, struct sk_buff *oskb,
 static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 				 struct tcf_block *block, struct Qdisc *q,
 				 u32 parent, struct nlmsghdr *n,
-				 struct tcf_chain *chain, int event)
+				 struct tcf_chain *chain, int event,
+				 bool rtnl_held)
 {
 	struct tcf_proto *tp;
 
 	for (tp = tcf_get_next_proto(chain, NULL);
 	     tp; tp = tcf_get_next_proto(chain, tp))
 		tfilter_notify(net, oskb, n, tp, block,
-			       q, parent, NULL, event, false);
+			       q, parent, NULL, event, false, rtnl_held);
 }
 
 static void tfilter_put(struct tcf_proto *tp, void *fh)
@@ -1668,6 +1678,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	void *fh;
 	int err;
 	int tp_created;
+	bool rtnl_held;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -1686,6 +1697,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	parent = t->tcm_parent;
 	tp = NULL;
 	cl = 0;
+	rtnl_held = true;
 
 	if (prio == 0) {
 		/* If no priority is provided by the user,
@@ -1754,7 +1766,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		spin_unlock(&chain->filter_chain_lock);
 		tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]),
-					  protocol, prio, chain, extack);
+					  protocol, prio, chain, rtnl_held,
+					  extack);
 		if (IS_ERR(tp_new)) {
 			err = PTR_ERR(tp_new);
 			goto errout;
@@ -1804,16 +1817,16 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      n->nlmsg_flags & NLM_F_CREATE ? TCA_ACT_NOREPLACE : TCA_ACT_REPLACE,
-			      extack);
+			      rtnl_held, extack);
 	if (err == 0) {
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-			       RTM_NEWTFILTER, false);
+			       RTM_NEWTFILTER, false, rtnl_held);
 		tfilter_put(tp, fh);
 	}
 
 errout:
 	if (err && tp_created)
-		tcf_chain_tp_delete_empty(chain, tp, NULL);
+		tcf_chain_tp_delete_empty(chain, tp, rtnl_held, NULL);
 	if (chain) {
 		if (tp && !IS_ERR(tp))
 			tcf_proto_put(tp, NULL);
@@ -1849,6 +1862,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
+	bool rtnl_held = true;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -1898,7 +1912,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 	if (prio == 0) {
 		tfilter_notify_chain(net, skb, block, q, parent, n,
-				     chain, RTM_DELTFILTER);
+				     chain, RTM_DELTFILTER, rtnl_held);
 		tcf_chain_flush(chain);
 		err = 0;
 		goto errout;
@@ -1921,7 +1935,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		tcf_proto_put(tp, NULL);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
-			       RTM_DELTFILTER, false);
+			       RTM_DELTFILTER, false, rtnl_held);
 		err = 0;
 		goto errout;
 	}
@@ -1937,11 +1951,12 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		err = tfilter_del_notify(net, skb, n, tp, block,
 					 q, parent, fh, false, &last,
-					 extack);
+					 rtnl_held, extack);
+
 		if (err)
 			goto errout;
 		if (last)
-			tcf_chain_tp_delete_empty(chain, tp, extack);
+			tcf_chain_tp_delete_empty(chain, tp, rtnl_held, extack);
 	}
 
 errout:
@@ -1976,6 +1991,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
+	bool rtnl_held = true;
 
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, rtm_tca_policy, extack);
 	if (err < 0)
@@ -2034,7 +2050,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		err = -ENOENT;
 	} else {
 		err = tfilter_notify(net, skb, n, tp, block, q, parent,
-				     fh, RTM_NEWTFILTER, true);
+				     fh, RTM_NEWTFILTER, true, rtnl_held);
 		if (err < 0)
 			NL_SET_ERR_MSG(extack, "Failed to send filter notify message");
 	}
@@ -2067,7 +2083,7 @@ static int tcf_node_dump(struct tcf_proto *tp, void *n, struct tcf_walker *arg)
 	return tcf_fill_node(net, a->skb, tp, a->block, a->q, a->parent,
 			     n, NETLINK_CB(a->cb->skb).portid,
 			     a->cb->nlh->nlmsg_seq, NLM_F_MULTI,
-			     RTM_NEWTFILTER);
+			     RTM_NEWTFILTER, true);
 }
 
 static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
@@ -2101,9 +2117,8 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 			if (tcf_fill_node(net, skb, tp, block, q, parent, NULL,
 					  NETLINK_CB(cb->skb).portid,
 					  cb->nlh->nlmsg_seq, NLM_F_MULTI,
-					  RTM_NEWTFILTER) <= 0)
+					  RTM_NEWTFILTER, true) <= 0)
 				goto errout;
-
 			cb->args[1] = 1;
 		}
 		if (!tp->ops->walk)
@@ -2118,7 +2133,7 @@ static bool tcf_chain_dump(struct tcf_chain *chain, struct Qdisc *q, u32 parent,
 		arg.w.skip = cb->args[1] - 1;
 		arg.w.count = 0;
 		arg.w.cookie = cb->args[2];
-		tp->ops->walk(tp, &arg.w);
+		tp->ops->walk(tp, &arg.w, true);
 		cb->args[2] = arg.w.cookie;
 		cb->args[1] = arg.w.count + 1;
 		if (arg.w.stop)
@@ -2340,7 +2355,7 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net,
 	if (!tca[TCA_KIND])
 		return 0;
 
-	ops = tcf_proto_lookup_ops(nla_data(tca[TCA_KIND]), extack);
+	ops = tcf_proto_lookup_ops(nla_data(tca[TCA_KIND]), true, extack);
 	if (IS_ERR(ops))
 		return PTR_ERR(ops);
 	if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump) {
@@ -2470,7 +2485,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 		break;
 	case RTM_DELCHAIN:
 		tfilter_notify_chain(net, skb, block, q, parent, n,
-				     chain, RTM_DELTFILTER);
+				     chain, RTM_DELTFILTER, true);
 		/* Flush the chain first as the user requested chain removal. */
 		tcf_chain_flush(chain);
 		/* In case the chain was successfully deleted, put a reference
diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index 90e44888f85d..7bf9e4335d74 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -102,7 +102,8 @@ static void basic_delete_filter_work(struct work_struct *work)
 	rtnl_unlock();
 }
 
-static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void basic_destroy(struct tcf_proto *tp, bool rtnl_held,
+			  struct netlink_ext_ack *extack)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f, *n;
@@ -121,7 +122,7 @@ static void basic_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int basic_delete(struct tcf_proto *tp, void *arg, bool *last,
-			struct netlink_ext_ack *extack)
+			bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f = arg;
@@ -168,7 +169,7 @@ static int basic_set_parms(struct net *net, struct tcf_proto *tp,
 static int basic_change(struct net *net, struct sk_buff *in_skb,
 			struct tcf_proto *tp, unsigned long base, u32 handle,
 			struct nlattr **tca, void **arg, bool ovr,
-			struct netlink_ext_ack *extack)
+			bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	int err;
 	struct basic_head *head = rtnl_dereference(tp->root);
@@ -236,7 +237,8 @@ static int basic_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void basic_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		       bool rtnl_held)
 {
 	struct basic_head *head = rtnl_dereference(tp->root);
 	struct basic_filter *f;
@@ -263,7 +265,7 @@ static void basic_bind_class(void *fh, u32 classid, unsigned long cl)
 }
 
 static int basic_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		      struct sk_buff *skb, struct tcmsg *t)
+		      struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct basic_filter *f = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 9c0ac7c23ad7..455bb0bca959 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -298,7 +298,7 @@ static void __cls_bpf_delete(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 }
 
 static int cls_bpf_delete(struct tcf_proto *tp, void *arg, bool *last,
-			  struct netlink_ext_ack *extack)
+			  bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 
@@ -307,7 +307,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, void *arg, bool *last,
 	return 0;
 }
 
-static void cls_bpf_destroy(struct tcf_proto *tp,
+static void cls_bpf_destroy(struct tcf_proto *tp, bool rtnl_held,
 			    struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
@@ -456,7 +456,8 @@ static int cls_bpf_set_parms(struct net *net, struct tcf_proto *tp,
 static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
 			  struct tcf_proto *tp, unsigned long base,
 			  u32 handle, struct nlattr **tca,
-			  void **arg, bool ovr, struct netlink_ext_ack *extack)
+			  void **arg, bool ovr, bool rtnl_held,
+			  struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *oldprog = *arg;
@@ -576,7 +577,7 @@ static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 }
 
 static int cls_bpf_dump(struct net *net, struct tcf_proto *tp, void *fh,
-			struct sk_buff *skb, struct tcmsg *tm)
+			struct sk_buff *skb, struct tcmsg *tm, bool rtnl_held)
 {
 	struct cls_bpf_prog *prog = fh;
 	struct nlattr *nest;
@@ -636,7 +637,8 @@ static void cls_bpf_bind_class(void *fh, u32 classid, unsigned long cl)
 		prog->res.class = cl;
 }
 
-static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			 bool rtnl_held)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
 	struct cls_bpf_prog *prog;
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 663ee1c6d606..1cef3b416094 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -78,7 +78,7 @@ static void cls_cgroup_destroy_work(struct work_struct *work)
 static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 			     struct tcf_proto *tp, unsigned long base,
 			     u32 handle, struct nlattr **tca,
-			     void **arg, bool ovr,
+			     void **arg, bool ovr, bool rtnl_held,
 			     struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_CGROUP_MAX + 1];
@@ -130,7 +130,7 @@ static int cls_cgroup_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void cls_cgroup_destroy(struct tcf_proto *tp,
+static void cls_cgroup_destroy(struct tcf_proto *tp, bool rtnl_held,
 			       struct netlink_ext_ack *extack)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
@@ -145,12 +145,13 @@ static void cls_cgroup_destroy(struct tcf_proto *tp,
 }
 
 static int cls_cgroup_delete(struct tcf_proto *tp, void *arg, bool *last,
-			     struct netlink_ext_ack *extack)
+			     bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	return -EOPNOTSUPP;
 }
 
-static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			    bool rtnl_held)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 
@@ -166,7 +167,7 @@ static void cls_cgroup_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int cls_cgroup_dump(struct net *net, struct tcf_proto *tp, void *fh,
-			   struct sk_buff *skb, struct tcmsg *t)
+			   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct cls_cgroup_head *head = rtnl_dereference(tp->root);
 	struct nlattr *nest;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index 39a6407d4832..204e2edae8d5 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -391,7 +391,8 @@ static void flow_destroy_filter_work(struct work_struct *work)
 static int flow_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle, struct nlattr **tca,
-		       void **arg, bool ovr, struct netlink_ext_ack *extack)
+		       void **arg, bool ovr, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *fold, *fnew;
@@ -566,7 +567,7 @@ static int flow_change(struct net *net, struct sk_buff *in_skb,
 }
 
 static int flow_delete(struct tcf_proto *tp, void *arg, bool *last,
-		       struct netlink_ext_ack *extack)
+		       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f = arg;
@@ -590,7 +591,8 @@ static int flow_init(struct tcf_proto *tp)
 	return 0;
 }
 
-static void flow_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void flow_destroy(struct tcf_proto *tp, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f, *next;
@@ -617,7 +619,7 @@ static void *flow_get(struct tcf_proto *tp, u32 handle)
 }
 
 static int flow_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		     struct sk_buff *skb, struct tcmsg *t)
+		     struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct flow_filter *f = fh;
 	struct nlattr *nest;
@@ -677,7 +679,8 @@ static int flow_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	return -1;
 }
 
-static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void flow_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		      bool rtnl_held)
 {
 	struct flow_head *head = rtnl_dereference(tp->root);
 	struct flow_filter *f;
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 8bf3d4660fac..b82efcab9c82 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -375,7 +375,8 @@ static void fl_destroy_sleepable(struct work_struct *work)
 	module_put(THIS_MODULE);
 }
 
-static void fl_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct fl_flow_mask *mask, *next_mask;
@@ -1160,7 +1161,8 @@ static int fl_set_parms(struct net *net, struct tcf_proto *tp,
 static int fl_change(struct net *net, struct sk_buff *in_skb,
 		     struct tcf_proto *tp, unsigned long base,
 		     u32 handle, struct nlattr **tca,
-		     void **arg, bool ovr, struct netlink_ext_ack *extack)
+		     void **arg, bool ovr, bool rtnl_held,
+		     struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *fold = *arg;
@@ -1287,7 +1289,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 }
 
 static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
-		     struct netlink_ext_ack *extack)
+		     bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *f = arg;
@@ -1300,7 +1302,8 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 	return 0;
 }
 
-static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		    bool rtnl_held)
 {
 	struct cls_fl_head *head = rtnl_dereference(tp->root);
 	struct cls_fl_filter *f;
@@ -1850,7 +1853,7 @@ static int fl_dump_key(struct sk_buff *skb, struct net *net,
 }
 
 static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		   struct sk_buff *skb, struct tcmsg *t)
+		   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct cls_fl_filter *f = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
index c8173ebb69f2..317151bae73b 100644
--- a/net/sched/cls_fw.c
+++ b/net/sched/cls_fw.c
@@ -139,7 +139,8 @@ static void fw_delete_filter_work(struct work_struct *work)
 	rtnl_unlock();
 }
 
-static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void fw_destroy(struct tcf_proto *tp, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f;
@@ -163,7 +164,7 @@ static void fw_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int fw_delete(struct tcf_proto *tp, void *arg, bool *last,
-		     struct netlink_ext_ack *extack)
+		     bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = arg;
@@ -250,7 +251,8 @@ static int fw_set_parms(struct net *net, struct tcf_proto *tp,
 static int fw_change(struct net *net, struct sk_buff *in_skb,
 		     struct tcf_proto *tp, unsigned long base,
 		     u32 handle, struct nlattr **tca, void **arg,
-		     bool ovr, struct netlink_ext_ack *extack)
+		     bool ovr, bool rtnl_held,
+		     struct netlink_ext_ack *extack)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = *arg;
@@ -354,7 +356,8 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		    bool rtnl_held)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	int h;
@@ -384,7 +387,7 @@ static void fw_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int fw_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		   struct sk_buff *skb, struct tcmsg *t)
+		   struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct fw_head *head = rtnl_dereference(tp->root);
 	struct fw_filter *f = fh;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 5621c38a303a..4e6d1139434a 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -106,7 +106,8 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 	return 0;
 }
 
-static void mall_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void mall_destroy(struct tcf_proto *tp, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 
@@ -157,7 +158,8 @@ static int mall_set_parms(struct net *net, struct tcf_proto *tp,
 static int mall_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle, struct nlattr **tca,
-		       void **arg, bool ovr, struct netlink_ext_ack *extack)
+		       void **arg, bool ovr, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 	struct nlattr *tb[TCA_MATCHALL_MAX + 1];
@@ -223,12 +225,13 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
 }
 
 static int mall_delete(struct tcf_proto *tp, void *arg, bool *last,
-		       struct netlink_ext_ack *extack)
+		       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	return -EOPNOTSUPP;
 }
 
-static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		      bool rtnl_held)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
 
@@ -270,7 +273,7 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 }
 
 static int mall_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		     struct sk_buff *skb, struct tcmsg *t)
+		     struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct cls_mall_head *head = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_route.c b/net/sched/cls_route.c
index 44b26038c4c4..e590c3a2999d 100644
--- a/net/sched/cls_route.c
+++ b/net/sched/cls_route.c
@@ -276,7 +276,8 @@ static void route4_queue_work(struct route4_filter *f)
 	tcf_queue_work(&f->rwork, route4_delete_filter_work);
 }
 
-static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void route4_destroy(struct tcf_proto *tp, bool rtnl_held,
+			   struct netlink_ext_ack *extack)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	int h1, h2;
@@ -312,7 +313,7 @@ static void route4_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int route4_delete(struct tcf_proto *tp, void *arg, bool *last,
-			 struct netlink_ext_ack *extack)
+			 bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_filter *f = arg;
@@ -468,7 +469,7 @@ static int route4_set_parms(struct net *net, struct tcf_proto *tp,
 static int route4_change(struct net *net, struct sk_buff *in_skb,
 			 struct tcf_proto *tp, unsigned long base, u32 handle,
 			 struct nlattr **tca, void **arg, bool ovr,
-			 struct netlink_ext_ack *extack)
+			 bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	struct route4_filter __rcu **fp;
@@ -560,7 +561,8 @@ static int route4_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+			bool rtnl_held)
 {
 	struct route4_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
@@ -597,7 +599,7 @@ static void route4_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int route4_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		       struct sk_buff *skb, struct tcmsg *t)
+		       struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct route4_filter *f = fh;
 	struct nlattr *nest;
diff --git a/net/sched/cls_rsvp.h b/net/sched/cls_rsvp.h
index 9dd9530e6a52..4d3836178fa5 100644
--- a/net/sched/cls_rsvp.h
+++ b/net/sched/cls_rsvp.h
@@ -312,7 +312,8 @@ static void rsvp_delete_filter(struct tcf_proto *tp, struct rsvp_filter *f)
 		__rsvp_delete_filter(f);
 }
 
-static void rsvp_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void rsvp_destroy(struct tcf_proto *tp, bool rtnl_held,
+			 struct netlink_ext_ack *extack)
 {
 	struct rsvp_head *data = rtnl_dereference(tp->root);
 	int h1, h2;
@@ -341,7 +342,7 @@ static void rsvp_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int rsvp_delete(struct tcf_proto *tp, void *arg, bool *last,
-		       struct netlink_ext_ack *extack)
+		       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct rsvp_head *head = rtnl_dereference(tp->root);
 	struct rsvp_filter *nfp, *f = arg;
@@ -477,7 +478,8 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 		       struct tcf_proto *tp, unsigned long base,
 		       u32 handle,
 		       struct nlattr **tca,
-		       void **arg, bool ovr, struct netlink_ext_ack *extack)
+		       void **arg, bool ovr, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct rsvp_head *data = rtnl_dereference(tp->root);
 	struct rsvp_filter *f, *nfp;
@@ -655,7 +657,8 @@ static int rsvp_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		      bool rtnl_held)
 {
 	struct rsvp_head *head = rtnl_dereference(tp->root);
 	unsigned int h, h1;
@@ -689,7 +692,7 @@ static void rsvp_walk(struct tcf_proto *tp, struct tcf_walker *arg)
 }
 
 static int rsvp_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		     struct sk_buff *skb, struct tcmsg *t)
+		     struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct rsvp_filter *f = fh;
 	struct rsvp_session *s;
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index b7dc667b6ec0..14d6b4058045 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -173,7 +173,7 @@ static void tcindex_destroy_fexts_work(struct work_struct *work)
 }
 
 static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
-			  struct netlink_ext_ack *extack)
+			  bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = arg;
@@ -226,7 +226,7 @@ static int tcindex_destroy_element(struct tcf_proto *tp,
 {
 	bool last;
 
-	return tcindex_delete(tp, arg, &last, NULL);
+	return tcindex_delete(tp, arg, &last, false, NULL);
 }
 
 static void __tcindex_destroy(struct rcu_head *head)
@@ -499,7 +499,7 @@ static int
 tcindex_change(struct net *net, struct sk_buff *in_skb,
 	       struct tcf_proto *tp, unsigned long base, u32 handle,
 	       struct nlattr **tca, void **arg, bool ovr,
-	       struct netlink_ext_ack *extack)
+	       bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct nlattr *opt = tca[TCA_OPTIONS];
 	struct nlattr *tb[TCA_TCINDEX_MAX + 1];
@@ -522,7 +522,8 @@ tcindex_change(struct net *net, struct sk_buff *in_skb,
 				 tca[TCA_RATE], ovr, extack);
 }
 
-static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
+static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker,
+			 bool rtnl_held)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter *f, *next;
@@ -558,7 +559,7 @@ static void tcindex_walk(struct tcf_proto *tp, struct tcf_walker *walker)
 	}
 }
 
-static void tcindex_destroy(struct tcf_proto *tp,
+static void tcindex_destroy(struct tcf_proto *tp, bool rtnl_held,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
@@ -568,14 +569,14 @@ static void tcindex_destroy(struct tcf_proto *tp,
 	walker.count = 0;
 	walker.skip = 0;
 	walker.fn = tcindex_destroy_element;
-	tcindex_walk(tp, &walker);
+	tcindex_walk(tp, &walker, true);
 
 	call_rcu(&p->rcu, __tcindex_destroy);
 }
 
 
 static int tcindex_dump(struct net *net, struct tcf_proto *tp, void *fh,
-			struct sk_buff *skb, struct tcmsg *t)
+			struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
 	struct tcindex_filter_result *r = fh;
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index cd79ae3b605a..3d3c79e19370 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -628,7 +628,8 @@ static int u32_destroy_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 	return -ENOENT;
 }
 
-static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
+static void u32_destroy(struct tcf_proto *tp, bool rtnl_held,
+			struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
 	struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
@@ -662,7 +663,7 @@ static void u32_destroy(struct tcf_proto *tp, struct netlink_ext_ack *extack)
 }
 
 static int u32_delete(struct tcf_proto *tp, void *arg, bool *last,
-		      struct netlink_ext_ack *extack)
+		      bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tc_u_hnode *ht = arg;
 	struct tc_u_common *tp_c = tp->data;
@@ -857,7 +858,7 @@ static struct tc_u_knode *u32_init_knode(struct tcf_proto *tp,
 
 static int u32_change(struct net *net, struct sk_buff *in_skb,
 		      struct tcf_proto *tp, unsigned long base, u32 handle,
-		      struct nlattr **tca, void **arg, bool ovr,
+		      struct nlattr **tca, void **arg, bool ovr, bool rtnl_held,
 		      struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
@@ -1122,7 +1123,8 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
 	return err;
 }
 
-static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg)
+static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
+		     bool rtnl_held)
 {
 	struct tc_u_common *tp_c = tp->data;
 	struct tc_u_hnode *ht;
@@ -1279,7 +1281,7 @@ static void u32_bind_class(void *fh, u32 classid, unsigned long cl)
 }
 
 static int u32_dump(struct net *net, struct tcf_proto *tp, void *fh,
-		    struct sk_buff *skb, struct tcmsg *t)
+		    struct sk_buff *skb, struct tcmsg *t, bool rtnl_held)
 {
 	struct tc_u_knode *n = fh;
 	struct tc_u_hnode *ht_up, *ht_down;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0605f01a419b..d868da10c019 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1857,7 +1857,7 @@ static void tc_bind_tclass(struct Qdisc *q, u32 portid, u32 clid,
 			arg.w.fn = tcf_node_bind;
 			arg.classid = clid;
 			arg.cl = new_cl;
-			tp->ops->walk(tp, &arg.w);
+			tp->ops->walk(tp, &arg.w, true);
 		}
 	}
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 15/17] net: sched: add flags to Qdisc class ops struct
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (13 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path Vlad Buslov
  2018-11-12  7:55 ` [PATCH net-next 17/17] net: sched: unlock rules update API Vlad Buslov
  16 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Extend Qdisc_class_ops with a 'flags' field and create an enum to hold
the possible class ops flag values. Add the first flag value,
QDISC_CLASS_OPS_DOIT_UNLOCKED, to indicate that the class ops functions
can be called without taking the rtnl lock.
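
As an illustrative aside (not part of this patch), a qdisc whose class
ops are safe to call without rtnl would advertise the flag roughly as
follows; the ops and their names here are hypothetical:

	/* Hypothetical qdisc advertising unlocked class ops; the
	 * .flags line is the only point of this sketch.
	 */
	static const struct Qdisc_class_ops example_class_ops = {
		.flags	= QDISC_CLASS_OPS_DOIT_UNLOCKED,
		.find	= example_find,
		.walk	= example_walk,
		.dump	= example_dump,
	};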

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 7b158d6fae85..c97300f5ea5c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -175,6 +175,7 @@ static inline int qdisc_avail_bulklimit(const struct netdev_queue *txq)
 }
 
 struct Qdisc_class_ops {
+	unsigned int		flags;
 	/* Child qdisc manipulation */
 	struct netdev_queue *	(*select_queue)(struct Qdisc *, struct tcmsg *);
 	int			(*graft)(struct Qdisc *, unsigned long cl,
@@ -206,6 +207,13 @@ struct Qdisc_class_ops {
 					struct gnet_dump *);
 };
 
+/* Qdisc_class_ops flag values */
+
+/* Implements API that doesn't require rtnl lock */
+enum qdisc_class_ops_flags {
+	QDISC_CLASS_OPS_DOIT_UNLOCKED = 1,
+};
+
 struct Qdisc_ops {
 	struct Qdisc_ops	*next;
 	const struct Qdisc_class_ops	*cl_ops;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (14 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-13  9:40   ` Stefano Brivio
  2018-11-12  7:55 ` [PATCH net-next 17/17] net: sched: unlock rules update API Vlad Buslov
  16 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

As a preparation for registering rules update netlink handlers as
unlocked, conditionally take rtnl in the following cases:
- Parent qdisc doesn't support unlocked execution.
- Requested classifier type doesn't support unlocked execution.
- User requested to flush the whole chain using the old filter update
API instead of the new chains API.

Add helper function tcf_require_rtnl() to lock rtnl only when the
specified condition is true and the lock hasn't been taken already.
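
Condensed from the diff below, the resulting call-site pattern is
(rtnl_held is a stack variable owned by each rule update handler):

	bool rtnl_held = true;

	/* ... after the classifier instance 'tp' has been looked up ... */

	/* take rtnl mutex if classifier doesn't support unlocked execution */
	tcf_require_rtnl(!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED),
			 &rtnl_held);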

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 74 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 11 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1956c5df5f89..848f148f1019 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct *work)
 	rtnl_unlock();
 }
 
+/* Helper function to lock rtnl mutex when specified condition is true and mutex
+ * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl lock.
+ * Note that this function does nothing if rtnl is already held. This is
+ * intended to be used by cls API rules update API when multiple conditions
+ * could require rtnl lock and its state needs to be tracked to prevent trying
+ * to obtain lock multiple times.
+ */
+
+static void tcf_require_rtnl(bool cond, bool *rtnl_held)
+{
+	if (!*rtnl_held && cond) {
+		*rtnl_held = true;
+		rtnl_lock();
+	}
+}
+
 static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 					  u32 prio, struct tcf_chain *chain,
-					  bool rtnl_held,
+					  bool *rtnl_held,
 					  struct netlink_ext_ack *extack)
 {
 	struct tcf_proto *tp;
@@ -191,7 +207,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	if (!tp)
 		return ERR_PTR(-ENOBUFS);
 
-	tp->ops = tcf_proto_lookup_ops(kind, rtnl_held, extack);
+	tp->ops = tcf_proto_lookup_ops(kind, *rtnl_held, extack);
 	if (IS_ERR(tp->ops)) {
 		err = PTR_ERR(tp->ops);
 		goto errout;
@@ -204,6 +220,8 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
 	spin_lock_init(&tp->lock);
 	refcount_set(&tp->refcnt, 1);
 
+	tcf_require_rtnl(!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED),
+			 rtnl_held);
 	err = tp->ops->init(tp);
 	if (err) {
 		module_put(tp->ops->owner);
@@ -888,6 +906,7 @@ static void tcf_block_refcnt_put(struct tcf_block *block)
 static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 					u32 *parent, unsigned long *cl,
 					int ifindex, u32 block_index,
+					bool *rtnl_held,
 					struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block;
@@ -953,6 +972,12 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 		 */
 		rcu_read_unlock();
 
+		/* Take rtnl mutex if qdisc doesn't support unlocked
+		 * execution.
+		 */
+		tcf_require_rtnl(!(cops->flags & QDISC_CLASS_OPS_DOIT_UNLOCKED),
+				 rtnl_held);
+
 		/* Do we search for filter, attached to class? */
 		if (TC_H_MIN(*parent)) {
 			*cl = cops->find(*q, *parent);
@@ -990,7 +1015,10 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	rcu_read_unlock();
 errout_qdisc:
 	if (*q) {
-		qdisc_put(*q);
+		if (*rtnl_held)
+			qdisc_put(*q);
+		else
+			qdisc_put_unlocked(*q);
 		*q = NULL;
 	}
 	return ERR_PTR(err);
@@ -1678,7 +1706,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	void *fh;
 	int err;
 	int tp_created;
-	bool rtnl_held;
+	bool rtnl_held = true;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -1697,7 +1725,6 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	parent = t->tcm_parent;
 	tp = NULL;
 	cl = 0;
-	rtnl_held = true;
 
 	if (prio == 0) {
 		/* If no priority is provided by the user,
@@ -1715,7 +1742,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	/* Find head of filter chain. */
 
 	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+			       t->tcm_ifindex, t->tcm_block_index, &rtnl_held,
+			       extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -1766,7 +1794,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 
 		spin_unlock(&chain->filter_chain_lock);
 		tp_new = tcf_proto_create(nla_data(tca[TCA_KIND]),
-					  protocol, prio, chain, rtnl_held,
+					  protocol, prio, chain, &rtnl_held,
 					  extack);
 		if (IS_ERR(tp_new)) {
 			err = PTR_ERR(tp_new);
@@ -1788,6 +1816,10 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		spin_unlock(&chain->filter_chain_lock);
 	}
 
+	/* take rtnl mutex if classifier doesn't support unlocked execution */
+	tcf_require_rtnl(!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED),
+			 &rtnl_held);
+
 	if (tca[TCA_KIND] && nla_strcmp(tca[TCA_KIND], tp->ops->kind)) {
 		NL_SET_ERR_MSG(extack, "Specified filter kind does not match existing one");
 		err = -EINVAL;
@@ -1834,9 +1866,14 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			tcf_chain_put(chain);
 	}
 	tcf_block_release(q, block);
-	if (err == -EAGAIN)
+	if (err == -EAGAIN) {
+		/* Take rtnl lock in case EAGAIN is caused by concurrent flush
+		 * of target chain.
+		 */
+		tcf_require_rtnl(true, &rtnl_held);
 		/* Replay the request. */
 		goto replay;
+	}
 	return err;
 
 errout_locked:
@@ -1881,10 +1918,16 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		return -ENOENT;
 	}
 
+	/* Always take rtnl mutex when flushing whole chain in order to
+	 * synchronize with chain locked API.
+	 */
+	tcf_require_rtnl(!prio, &rtnl_held);
+
 	/* Find head of filter chain. */
 
 	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+			       t->tcm_ifindex, t->tcm_block_index, &rtnl_held,
+			       extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -1941,6 +1984,9 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	}
 	spin_unlock(&chain->filter_chain_lock);
 
+	/* take rtnl mutex if classifier doesn't support unlocked execution */
+	tcf_require_rtnl(!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED),
+			 &rtnl_held);
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
@@ -2010,7 +2056,8 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	/* Find head of filter chain. */
 
 	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+			       t->tcm_ifindex, t->tcm_block_index, &rtnl_held,
+			       extack);
 	if (IS_ERR(block)) {
 		err = PTR_ERR(block);
 		goto errout;
@@ -2043,6 +2090,9 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		goto errout;
 	}
 
+	/* take rtnl mutex if classifier doesn't support unlocked execution */
+	tcf_require_rtnl(!(tp->ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED),
+			 &rtnl_held);
 	fh = tp->ops->get(tp, t->tcm_handle);
 
 	if (!fh) {
@@ -2397,6 +2447,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 	struct Qdisc *q = NULL;
 	struct tcf_chain *chain = NULL;
 	struct tcf_block *block;
+	bool rtnl_held = true;
 	unsigned long cl;
 	int err;
 
@@ -2414,7 +2465,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 	cl = 0;
 
 	block = tcf_block_find(net, &q, &parent, &cl,
-			       t->tcm_ifindex, t->tcm_block_index, extack);
+			       t->tcm_ifindex, t->tcm_block_index, &rtnl_held,
+			       extack);
 	if (IS_ERR(block))
 		return PTR_ERR(block);
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH net-next 17/17] net: sched: unlock rules update API
  2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
                   ` (15 preceding siblings ...)
  2018-11-12  7:55 ` [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path Vlad Buslov
@ 2018-11-12  7:55 ` Vlad Buslov
  2018-11-12 17:30   ` David Miller
  16 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-12  7:55 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, ast, daniel, Vlad Buslov

Register netlink protocol handlers for message types RTM_NEWTFILTER,
RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable
that tracks rtnl mutex state to false by default.

Modify tcf_block_release() to release the rtnl lock if it was taken
earlier. Move the code that releases the block and qdisc to a new
function, __tcf_block_release(), used internally by the regular block
release and by the chain update function; the latter is not unlocked
and doesn't need to release rtnl.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 43 ++++++++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 848f148f1019..a23aeac8ea4e 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1024,13 +1024,28 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	return ERR_PTR(err);
 }
 
-static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+static void __tcf_block_release(struct Qdisc *q, struct tcf_block *block,
+				bool rtnl_held)
 {
 	if (!IS_ERR_OR_NULL(block))
 		tcf_block_refcnt_put(block);
 
-	if (q)
-		qdisc_put(q);
+	if (q) {
+		if (rtnl_held)
+			qdisc_put(q);
+		else
+			qdisc_put_unlocked(q);
+	}
+}
+
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block,
+			      bool *rtnl_held)
+{
+	if (*rtnl_held) {
+		rtnl_unlock();
+		*rtnl_held = false;
+	}
+	__tcf_block_release(q, block, false);
 }
 
 struct tcf_block_owner_item {
@@ -1706,7 +1721,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	void *fh;
 	int err;
 	int tp_created;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -1865,7 +1880,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		if (!tp_created)
 			tcf_chain_put(chain);
 	}
-	tcf_block_release(q, block);
+	tcf_block_release(q, block, &rtnl_held);
 	if (err == -EAGAIN) {
 		/* Take rtnl lock in case EAGAIN is caused by concurrent flush
 		 * of target chain.
@@ -1899,7 +1914,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
@@ -2011,7 +2026,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
 	}
-	tcf_block_release(q, block);
+	tcf_block_release(q, block, &rtnl_held);
 	return err;
 
 errout_locked:
@@ -2037,7 +2052,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	unsigned long cl = 0;
 	void *fh = NULL;
 	int err;
-	bool rtnl_held = true;
+	bool rtnl_held = false;
 
 	err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, rtm_tca_policy, extack);
 	if (err < 0)
@@ -2112,7 +2127,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			tcf_proto_put(tp, NULL);
 		tcf_chain_put(chain);
 	}
-	tcf_block_release(q, block);
+	tcf_block_release(q, block, &rtnl_held);
 	return err;
 }
 
@@ -2561,7 +2576,7 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	tcf_chain_put(chain);
 errout_block:
-	tcf_block_release(q, block);
+	__tcf_block_release(q, block, true);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
@@ -2899,10 +2914,12 @@ static int __init tc_filter_init(void)
 	if (err)
 		goto err_register_pernet_subsys;
 
-	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, 0);
-	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, 0);
+	rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
+	rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL,
+		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter,
-		      tc_dump_tfilter, 0);
+		      tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
 	rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain,
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-11-12  7:55 ` [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
@ 2018-11-12 17:28   ` David Miller
  2018-11-13 13:13     ` Vlad Buslov
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2018-11-12 17:28 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel

From: Vlad Buslov <vladbu@mellanox.com>
Date: Mon, 12 Nov 2018 09:55:30 +0200

> +void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
> +			  struct tcf_proto *tp_head)
> +{
> +	xchg(&miniqp->tp_head, tp_head);

If you are not checking the return value of xchg(), then this is
simply a store with optionally a memory barrier of some sort
either before or after.
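
As a sketch of the alternative being pointed at here (assuming a plain
store with release ordering is what is actually wanted, and that the
reader side uses READ_ONCE() or rcu_dereference()), the line could
instead be spelled:

	/* plain store; pairs with READ_ONCE()/rcu_dereference() on the
	 * reader side (assumed)
	 */
	smp_store_release(&miniqp->tp_head, tp_head);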

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 02/17] net: sched: protect block state with spinlock
  2018-11-12  7:55 ` [PATCH net-next 02/17] net: sched: protect block state with spinlock Vlad Buslov
@ 2018-11-12 17:28   ` David Miller
  2018-11-13 10:07     ` Stefano Brivio
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2018-11-12 17:28 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel

From: Vlad Buslov <vladbu@mellanox.com>
Date: Mon, 12 Nov 2018 09:55:31 +0200

> +#define ASSERT_BLOCK_LOCKED(block)					\
> +	WARN_ONCE(!spin_is_locked(&(block)->lock),		\
> +		  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)

spin_is_locked() is not usable for assertions.

When SMP=n and DEBUG_SPINLOCK=n, it always returns zero.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 17/17] net: sched: unlock rules update API
  2018-11-12  7:55 ` [PATCH net-next 17/17] net: sched: unlock rules update API Vlad Buslov
@ 2018-11-12 17:30   ` David Miller
  2018-11-13 13:46     ` Vlad Buslov
  0 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2018-11-12 17:30 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel

From: Vlad Buslov <vladbu@mellanox.com>
Date: Mon, 12 Nov 2018 09:55:46 +0200

> Register netlink protocol handlers for message types RTM_NEWTFILTER,
> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable
> that tracks rtnl mutex state to false by default.

This whole conditional locking mechanism is really not clean and makes
this code so much harder to understand and audit.

Please improve the code so that this kind of construct is not needed.

Thank you.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-12  7:55 ` [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path Vlad Buslov
@ 2018-11-13  9:40   ` Stefano Brivio
  2018-11-13 13:25     ` Vlad Buslov
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2018-11-13  9:40 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, ast, daniel

Hi Vlad,

On Mon, 12 Nov 2018 09:55:45 +0200
Vlad Buslov <vladbu@mellanox.com> wrote:

> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct *work)
>  	rtnl_unlock();
>  }
>  
> +/* Helper function to lock rtnl mutex when specified condition is true and mutex
> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl lock.
> + * Note that this function does nothing if rtnl is already held. This is
> + * intended to be used by cls API rules update API when multiple conditions
> + * could require rtnl lock and its state needs to be tracked to prevent trying
> + * to obtain lock multiple times.
> + */
> +
> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
> +{
> +	if (!*rtnl_held && cond) {
> +		*rtnl_held = true;
> +		rtnl_lock();
> +	}
> +}

I guess calls to this function are supposed to be serialised. If that's
the case (which is my tentative understanding so far), I would indicate
that in the comment.

If that's not the case, you would be introducing a race I guess.

Same applies to tcf_block_release() from 17/17.

-- 
Stefano

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 02/17] net: sched: protect block state with spinlock
  2018-11-12 17:28   ` David Miller
@ 2018-11-13 10:07     ` Stefano Brivio
  2018-11-13 13:28       ` Vlad Buslov
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2018-11-13 10:07 UTC (permalink / raw)
  To: vladbu; +Cc: David Miller, netdev, jhs, xiyou.wangcong, jiri, ast, daniel

Vlad,

On Mon, 12 Nov 2018 09:28:59 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Vlad Buslov <vladbu@mellanox.com>
> Date: Mon, 12 Nov 2018 09:55:31 +0200
> 
> > +#define ASSERT_BLOCK_LOCKED(block)					\
> > +	WARN_ONCE(!spin_is_locked(&(block)->lock),		\
> > +		  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)  
> 
> spin_is_locked() is not usable for assertions.

See also b86077207d0c ("igbvf: Replace spin_is_locked() with lockdep").
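
For reference, a sketch of what the lockdep-based assertion could look
like, assuming block->lock stays a spinlock:

	#define ASSERT_BLOCK_LOCKED(block)	\
		lockdep_assert_held(&(block)->lock)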

-- 
Stefano

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-11-12 17:28   ` David Miller
@ 2018-11-13 13:13     ` Vlad Buslov
  2018-11-13 16:08       ` David Miller
  0 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-13 13:13 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel


On Mon 12 Nov 2018 at 17:28, David Miller <davem@davemloft.net> wrote:
> From: Vlad Buslov <vladbu@mellanox.com>
> Date: Mon, 12 Nov 2018 09:55:30 +0200
>
>> +void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>> +			  struct tcf_proto *tp_head)
>> +{
>> +	xchg(&miniqp->tp_head, tp_head);
>
> If you are not checking the return value of xchg(), then this is
> simply a store with optionally a memory barrier of some sort
> either before or after.

That was my intention. What would be a better way to atomically
reset a pointer? Should I just change this line to explicit
assignment+barrier?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-13  9:40   ` Stefano Brivio
@ 2018-11-13 13:25     ` Vlad Buslov
  2018-11-13 13:40       ` Stefano Brivio
  0 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-13 13:25 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, ast, daniel


On Tue 13 Nov 2018 at 09:40, Stefano Brivio <sbrivio@redhat.com> wrote:
> Hi Vlad,
>
> On Mon, 12 Nov 2018 09:55:45 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct *work)
>>  	rtnl_unlock();
>>  }
>>  
>> +/* Helper function to lock rtnl mutex when specified condition is true and mutex
>> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl lock.
>> + * Note that this function does nothing if rtnl is already held. This is
>> + * intended to be used by cls API rules update API when multiple conditions
>> + * could require rtnl lock and its state needs to be tracked to prevent trying
>> + * to obtain lock multiple times.
>> + */
>> +
>> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
>> +{
>> +	if (!*rtnl_held && cond) {
>> +		*rtnl_held = true;
>> +		rtnl_lock();
>> +	}
>> +}
>
> I guess calls to this function are supposed to be serialised. If that's
> the case (which is my tentative understanding so far), I would indicate
> that in the comment.
>
> If that's not the case, you would be introducing a race I guess.
>
> Same applies to tcf_block_release() from 17/17.

Hi Stefano,

Thank you for reviewing my code!

I did not intend for this function to be serialized. The first argument
to tcf_require_rtnl() is passed by value, and the second argument is
always a pointer to a local, stack-allocated variable of the caller. The
same applies to tcf_block_release() - its arguments are the Qdisc and
block, which support concurrency-safe reference counting, and a pointer
to the local variable rtnl_held, which is not accessible to concurrent
users.

What is the race in these cases? Am I missing something?

Vlad

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 02/17] net: sched: protect block state with spinlock
  2018-11-13 10:07     ` Stefano Brivio
@ 2018-11-13 13:28       ` Vlad Buslov
  0 siblings, 0 replies; 35+ messages in thread
From: Vlad Buslov @ 2018-11-13 13:28 UTC (permalink / raw)
  To: Stefano Brivio
  Cc: David Miller, netdev, jhs, xiyou.wangcong, jiri, ast, daniel


On Tue 13 Nov 2018 at 10:07, Stefano Brivio <sbrivio@redhat.com> wrote:
> Vlad,
>
> On Mon, 12 Nov 2018 09:28:59 -0800 (PST)
> David Miller <davem@davemloft.net> wrote:
>
>> From: Vlad Buslov <vladbu@mellanox.com>
>> Date: Mon, 12 Nov 2018 09:55:31 +0200
>> 
>> > +#define ASSERT_BLOCK_LOCKED(block)					\
>> > +	WARN_ONCE(!spin_is_locked(&(block)->lock),		\
>> > +		  "BLOCK: assertion failed at %s (%d)\n", __FILE__,  __LINE__)  
>> 
>> spin_is_locked() is not usable for assertions.
>
> See also b86077207d0c ("igbvf: Replace spin_is_locked() with
> lockdep").

Stefano,

Thanks for the tip. I will check it out.

Vlad.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-13 13:25     ` Vlad Buslov
@ 2018-11-13 13:40       ` Stefano Brivio
  2018-11-13 13:58         ` Vlad Buslov
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2018-11-13 13:40 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, ast, daniel

On Tue, 13 Nov 2018 13:25:52 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Tue 13 Nov 2018 at 09:40, Stefano Brivio <sbrivio@redhat.com> wrote:
> > Hi Vlad,
> >
> > On Mon, 12 Nov 2018 09:55:45 +0200
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct *work)
> >>  	rtnl_unlock();
> >>  }
> >>  
> >> +/* Helper function to lock rtnl mutex when specified condition is true and mutex
> >> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl lock.
> >> + * Note that this function does nothing if rtnl is already held. This is
> >> + * intended to be used by cls API rules update API when multiple conditions
> >> + * could require rtnl lock and its state needs to be tracked to prevent trying
> >> + * to obtain lock multiple times.
> >> + */
> >> +
> >> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
> >> +{
> >> +	if (!*rtnl_held && cond) {
> >> +		*rtnl_held = true;
> >> +		rtnl_lock();
> >> +	}
> >> +}  
> >
> > I guess calls to this function are supposed to be serialised. If that's
> > the case (which is my tentative understanding so far), I would indicate
> > that in the comment.
> >
> > If that's not the case, you would be introducing a race I guess.
> >
> > Same applies to tcf_block_release() from 17/17.  
> 
> Hi Stefano,
> 
> Thank you for reviewing my code!
> 
> I did not intend for this function to be serialized. First argument to
> tcf_require_rtnl() is passed by value, and second argument is always a
> pointer to local stack-allocated value of the caller.

Yes, sorry, I haven't been terribly clear, that's what I meant by
serialised: it won't be called concurrently with the same *rtnl_held.

Perhaps the risk that somebody uses it that way is close to zero, so
I'm not even too sure this is worth a comment, but if you can come up
with a concise way of saying this, that would be nice.

> Same applies to tcf_block_release() - its arguments are Qdisc and block
> which support concurrency-safe reference counting, and pointer to local
> variable rtnl_held, which is not accessible to concurrent users.

Same there.

> What is the race in these cases? Am I missing something?

No, no race then. My only concern was:

thread A:                             thread B:
- x = false;
- tcf_require_rtnl(true, &x);         - tcf_require_rtnl(true, &x);
  - if (!*x && true)                    - if (!*x && true)
    - *x = true;
    - rtnl_lock()                         - *x = true;
                                          - rtnl_lock()

but this cannot happen as you explained.

-- 
Stefano

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 17/17] net: sched: unlock rules update API
  2018-11-12 17:30   ` David Miller
@ 2018-11-13 13:46     ` Vlad Buslov
  2018-11-14  6:44       ` Jiri Pirko
  0 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-13 13:46 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel

On Mon 12 Nov 2018 at 17:30, David Miller <davem@davemloft.net> wrote:
> From: Vlad Buslov <vladbu@mellanox.com>
> Date: Mon, 12 Nov 2018 09:55:46 +0200
>
>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set the rtnl_held variable
>> that tracks rtnl mutex state to false by default.
>
> This whole conditional locking mechanism is really not clean and makes
> this code so much harder to understand and audit.
>
> Please improve the code so that this kind of construct is not needed.
>
> Thank you.

Hi David,

I considered several approaches to this problem and decided that this
one is the most straightforward to implement. I understand your concern
and agree that this code is not the easiest to understand. I can suggest
several possible solutions that do not require this kind of elaborate
locking mechanism in cls API, but they have their own drawbacks:

1. Convert all qdiscs and classifiers to support unlocked execution,
like we did for actions. However, judging from my experience converting
the flower classifier, this requires much more code than the actions
did. I would estimate it to be more work than the whole current
unlocking effort (a hundred-plus patches). Also, the authors of some of
them might be unhappy with such intrusive changes. I don't think this
approach is realistic.

2. Somehow determine whether rtnl is needed at the beginning of the cls
API rule update functions. Currently this is not possible, because
locking requirements are determined by the qdisc_class_ops and
tcf_proto_ops 'flags' fields, which requires the code to first perform
the whole ops lookup sequence. However, instead of the ops field, I
could put the flags in some kind of hash table or array that maps the
qdisc/classifier type string to its flags, so locking requirements
could be determined just by parsing the netlink message and looking up
the flags by type (see the rough sketch below). I do not consider it a
pretty solution either, but maybe you have a different opinion.

3. Anything you can suggest? I might be missing something simple that
you would consider a more elegant solution to this problem.
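
A rough sketch of what option 2 could look like; the table layout,
names and sizes are illustrative only:

	/* Hypothetical: map classifier kind string to its ops flags so
	 * locking requirements can be decided before the full ops lookup.
	 */
	struct tcf_kind_flags {
		char kind[IFNAMSIZ];
		u32 flags;
	};

	static struct tcf_kind_flags tcf_kind_flags_tbl[] = {
		{ .kind = "flower", .flags = TCF_PROTO_OPS_DOIT_UNLOCKED },
		/* one entry per registered classifier */
	};

	static u32 tcf_kind_lookup_flags(const char *kind)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(tcf_kind_flags_tbl); i++)
			if (!strcmp(tcf_kind_flags_tbl[i].kind, kind))
				return tcf_kind_flags_tbl[i].flags;
		return 0;
	}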

Thanks,
Vlad

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-13 13:40       ` Stefano Brivio
@ 2018-11-13 13:58         ` Vlad Buslov
  2018-11-13 15:53           ` Stefano Brivio
  0 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-13 13:58 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, ast, daniel


On Tue 13 Nov 2018 at 13:40, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Tue, 13 Nov 2018 13:25:52 +0000
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> On Tue 13 Nov 2018 at 09:40, Stefano Brivio <sbrivio@redhat.com> wrote:
>> > Hi Vlad,
>> >
>> > On Mon, 12 Nov 2018 09:55:45 +0200
>> > Vlad Buslov <vladbu@mellanox.com> wrote:
>> >  
>> >> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct *work)
>> >>  	rtnl_unlock();
>> >>  }
>> >>  
>> >> +/* Helper function to lock rtnl mutex when specified condition is true and mutex
>> >> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl lock.
>> >> + * Note that this function does nothing if rtnl is already held. This is
>> >> + * intended to be used by cls API rules update API when multiple conditions
>> >> + * could require rtnl lock and its state needs to be tracked to prevent trying
>> >> + * to obtain lock multiple times.
>> >> + */
>> >> +
>> >> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
>> >> +{
>> >> +	if (!*rtnl_held && cond) {
>> >> +		*rtnl_held = true;
>> >> +		rtnl_lock();
>> >> +	}
>> >> +}  
>> >
>> > I guess calls to this function are supposed to be serialised. If that's
>> > the case (which is my tentative understanding so far), I would indicate
>> > that in the comment.
>> >
>> > If that's not the case, you would be introducing a race I guess.
>> >
>> > Same applies to tcf_block_release() from 17/17.  
>> 
>> Hi Stefano,
>> 
>> Thank you for reviewing my code!
>> 
>> I did not intend for this function to be serialized. First argument to
>> tcf_require_rtnl() is passed by value, and second argument is always a
>> pointer to local stack-allocated value of the caller.
>
> Yes, sorry, I haven't been terribly clear, that's what I meant by
> serialised: it won't be called concurrently with the same *rtnl_held.
>
> Perhaps the risk that somebody uses it that way is close to zero, so
> I'm not even too sure this is worth a comment, but if you can come up
> with a concise way of saying this, that would be nice.

I considered the comment's statement that the function "Will set
rtnl_held to 'true' before taking rtnl lock" to be a red flag telling
callers not to pass a pointer to a variable that can be accessed
concurrently. I guess I can add an additional sentence to explicitly
warn potential users. Or I can just move the rtnl_held assignment in
both functions so that it is performed while holding the rtnl mutex. I
implemented it the way I did as an overzealous optimization, but
realistically the price of an assignment is negligible in this case.
Suggestions are welcome!

>
>> Same applies to tcf_block_release() - its arguments are Qdisc and block
>> which support concurrency-safe reference counting, and pointer to local
>> variable rtnl_held, which is not accessible to concurrent users.
>
> Same there.
>
>> What is the race in these cases? Am I missing something?
>
> No, no race then. My only concern was:
>
> thread A:                             thread B:
> - x = false;
> - tcf_require_rtnl(true, &x);         - tcf_require_rtnl(true, &x);
>   - if (!*x && true)                    - if (!*x && true)
>     - *x = true;
>     - rtnl_lock()                         - *x = true;
>                                           - rtnl_lock()
>
> but this cannot happen as you explained.


* Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-13 13:58         ` Vlad Buslov
@ 2018-11-13 15:53           ` Stefano Brivio
  2018-11-13 16:57             ` Stefano Brivio
  0 siblings, 1 reply; 35+ messages in thread
From: Stefano Brivio @ 2018-11-13 15:53 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, ast, daniel

On Tue, 13 Nov 2018 13:58:05 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Tue 13 Nov 2018 at 13:40, Stefano Brivio <sbrivio@redhat.com> wrote:
> > On Tue, 13 Nov 2018 13:25:52 +0000
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> On Tue 13 Nov 2018 at 09:40, Stefano Brivio <sbrivio@redhat.com> wrote:  
> >> > Hi Vlad,
> >> >
> >> > On Mon, 12 Nov 2018 09:55:45 +0200
> >> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >    
> >> >> @@ -179,9 +179,25 @@ static void tcf_proto_destroy_work(struct work_struct *work)
> >> >>  	rtnl_unlock();
> >> >>  }
> >> >>  
> >> >> +/* Helper function to lock rtnl mutex when specified condition is true and mutex
> >> >> + * hasn't been locked yet. Will set rtnl_held to 'true' before taking rtnl lock.
> >> >> + * Note that this function does nothing if rtnl is already held. This is
> >> >> + * intended to be used by cls API rules update API when multiple conditions
> >> >> + * could require rtnl lock and its state needs to be tracked to prevent trying
> >> >> + * to obtain lock multiple times.
> >> >> + */
> >> >> +
> >> >> +static void tcf_require_rtnl(bool cond, bool *rtnl_held)
> >> >> +{
> >> >> +	if (!*rtnl_held && cond) {
> >> >> +		*rtnl_held = true;
> >> >> +		rtnl_lock();
> >> >> +	}
> >> >> +}    
> >> >
> >> > I guess calls to this function are supposed to be serialised. If that's
> >> > the case (which is my tentative understanding so far), I would indicate
> >> > that in the comment.
> >> >
> >> > If that's not the case, you would be introducing a race I guess.
> >> >
> >> > Same applies to tcf_block_release() from 17/17.    
> >> 
> >> Hi Stefano,
> >> 
> >> Thank you for reviewing my code!
> >> 
> >> I did not intend for this function to be serialized. The first argument
> >> to tcf_require_rtnl() is passed by value, and the second argument is
> >> always a pointer to a stack-local variable of the caller.
> >
> > Yes, sorry, I haven't been terribly clear, that's what I meant by
> > serialised: it won't be called concurrently with the same *rtnl_held.
> >
> > Perhaps the risk that somebody uses it that way is close to zero, so
> > I'm not even too sure this is worth a comment, but if you can come up
> > with a concise way of saying this, that would be nice.  
> 
> I considered my comment that the function "Will set rtnl_held to 'true'
> before taking rtnl lock" to be a red flag telling callers not to pass a
> pointer to a variable that can be accessed concurrently. I can add an
> additional sentence to explicitly warn potential users. Or I can just
> move the rtnl_held assignment in both functions so that it is performed
> while holding the rtnl mutex. I implemented it the way I did as an
> overzealous optimization, but realistically the price of an assignment
> is negligible in this case.

But to make that effective, you would need to protect the read too, and
that makes your optimisation not really overzealous, I think.

I'd rather go with an additional comment, if that doesn't become
unreadable.

-- 
Stefano


* Re: [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue
  2018-11-13 13:13     ` Vlad Buslov
@ 2018-11-13 16:08       ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2018-11-13 16:08 UTC (permalink / raw)
  To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel

From: Vlad Buslov <vladbu@mellanox.com>
Date: Tue, 13 Nov 2018 13:13:19 +0000

> 
> On Mon 12 Nov 2018 at 17:28, David Miller <davem@davemloft.net> wrote:
>> From: Vlad Buslov <vladbu@mellanox.com>
>> Date: Mon, 12 Nov 2018 09:55:30 +0200
>>
>>> +void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
>>> +			  struct tcf_proto *tp_head)
>>> +{
>>> +	xchg(&miniqp->tp_head, tp_head);
>>
>> If you are not checking the return value of xchg(), then this is
>> simply a store with optionally a memory barrier of some sort
>> either before or after.
> 
> That was my intention. What would be a better way to atomically
> reset a pointer? Should I just change this line to explicit
> assignment+barrier?

We have all kinds of helpers in the kernel for doing things like this;
grep for things like "smp_load_acquire()", "smp_store_release()", etc.
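
For example, a sketch of the store-release variant (assuming the old
tp_head value really is unused here, and that release/acquire ordering
is what the swap needs):

void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,
			  struct tcf_proto *tp_head)
{
	/* Publish the new tp_head; pairs with smp_load_acquire() on the
	 * reader side, so a reader that sees the new pointer also sees
	 * its fully initialized contents.
	 */
	smp_store_release(&miniqp->tp_head, tp_head);
	/* ... rest of the swap logic stays as in the patch ... */
}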


* Re: [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path
  2018-11-13 15:53           ` Stefano Brivio
@ 2018-11-13 16:57             ` Stefano Brivio
  0 siblings, 0 replies; 35+ messages in thread
From: Stefano Brivio @ 2018-11-13 16:57 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, ast, daniel

On Tue, 13 Nov 2018 16:53:07 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> But to make that effective, you would need to protect the read too, and
> that makes your optimisation not really overzealous, I think.
> 
> I'd rather go with an additional comment, if that doesn't become
> unreadable.

Oh, and of course, this whole thing makes sense only if for some reason
you end up keeping this function. Maybe ignore my comments on this patch,
at least for the time being ;)

-- 
Stefano


* Re: [PATCH net-next 17/17] net: sched: unlock rules update API
  2018-11-13 13:46     ` Vlad Buslov
@ 2018-11-14  6:44       ` Jiri Pirko
  2018-11-14 16:45         ` Vlad Buslov
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Pirko @ 2018-11-14  6:44 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: David Miller, netdev, jhs, xiyou.wangcong, ast, daniel

Tue, Nov 13, 2018 at 02:46:54PM CET, vladbu@mellanox.com wrote:
>On Mon 12 Nov 2018 at 17:30, David Miller <davem@davemloft.net> wrote:
>> From: Vlad Buslov <vladbu@mellanox.com>
>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>
>>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
>>> tracks rtnl mutex state to be false by default.
>>
>> This whole conditional locking mechanism is really not clean and makes
>> this code so much harder to understand and audit.
>>
>> Please improve the code so that this kind of construct is not needed.
>>
>> Thank you.
>
>Hi David,
>
>I considered several approaches to this problem and decided that this
>one is the most straightforward to implement. I understand your concern
>and agree that this code is not the easiest to understand. I can
>suggest several possible solutions that do not require this kind of
>elaborate locking mechanism in cls API, but they have their own
>drawbacks:
>
>1. Convert all qdiscs and classifiers to support unlocked execution,
>like we did for actions. However, judging by my experience converting
>the flower classifier, this requires much more code than actions did. I
>would estimate it to be more work than the whole current unlocking
>effort (a hundred+ patches). Also, the authors of some of them might be
>unhappy with such intrusive changes. I don't think this approach is
>realistic.
>
>2. Somehow determine if rtnl is needed at the beginning of the cls API
>rule update functions. Currently, this is not possible because locking
>requirements are determined by the qdisc_class_ops and tcf_proto_ops
>'flags' fields, which requires the code to first do the whole ops
>lookup sequence. However, instead of an ops field, I could put 'flags'
>in some kind of hash table or array that maps qdisc/classifier type
>strings to flags, making it possible to determine locking requirements
>by just parsing the netlink message and obtaining the flags by
>qdisc/classifier type. I do not consider it a pretty solution either,
>but maybe you have a different opinion.

I think you will have to go with 2. or some modification of it. Can't you
just check for the cls ability to run unlocked early on in tc_new_tfilter()?
You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
which would do tcf_proto_lookup_ops() for the ops and check the flags?
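
Roughly something like this (tcf_proto_locking_check() is hypothetical,
and both the tcf_proto_lookup_ops() signature and the flag name are
assumptions here):

static bool tcf_proto_locking_check(const char *kind)
{
	const struct tcf_proto_ops *ops;
	bool unlocked;

	/* Look up the ops early, only to read the flags, then drop the
	 * module reference again.
	 */
	ops = tcf_proto_lookup_ops(kind, NULL);
	if (!ops)
		return false;	/* unknown kind: keep taking rtnl */
	unlocked = ops->flags & TCF_PROTO_OPS_DOIT_UNLOCKED;
	module_put(ops->owner);
	return unlocked;
}

tc_new_tfilter() could then start with
rtnl_held = !tcf_proto_locking_check(nla_data(tca[TCA_KIND]));
before deciding whether to take the lock at all.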


>
>3. Anything you can suggest? I might be missing something simple that
>you would consider a more elegant solution to this problem.
>
>Thanks,
>Vlad
>


* Re: [PATCH net-next 17/17] net: sched: unlock rules update API
  2018-11-14  6:44       ` Jiri Pirko
@ 2018-11-14 16:45         ` Vlad Buslov
  2018-11-15 10:20           ` Jiri Pirko
  0 siblings, 1 reply; 35+ messages in thread
From: Vlad Buslov @ 2018-11-14 16:45 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: David Miller, netdev, jhs, xiyou.wangcong, ast, daniel


On Wed 14 Nov 2018 at 06:44, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, Nov 13, 2018 at 02:46:54PM CET, vladbu@mellanox.com wrote:
>>On Mon 12 Nov 2018 at 17:30, David Miller <davem@davemloft.net> wrote:
>>> From: Vlad Buslov <vladbu@mellanox.com>
>>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>>
>>>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>>>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
>>>> tracks rtnl mutex state to be false by default.
>>>
>>> This whole conditional locking mechanism is really not clean and makes
>>> this code so much harder to understand and audit.
>>>
>>> Please improve the code so that this kind of construct is not needed.
>>>
>>> Thank you.
>>
>>Hi David,
>>
>>I considered several approaches to this problem and decided that this
>>one is the most straightforward to implement. I understand your concern
>>and agree that this code is not the easiest to understand. I can
>>suggest several possible solutions that do not require this kind of
>>elaborate locking mechanism in cls API, but they have their own
>>drawbacks:
>>
>>1. Convert all qdiscs and classifiers to support unlocked execution,
>>like we did for actions. However, judging by my experience converting
>>the flower classifier, this requires much more code than actions did. I
>>would estimate it to be more work than the whole current unlocking
>>effort (a hundred+ patches). Also, the authors of some of them might be
>>unhappy with such intrusive changes. I don't think this approach is
>>realistic.
>>
>>2. Somehow determine if rtnl is needed at the beginning of the cls API
>>rule update functions. Currently, this is not possible because locking
>>requirements are determined by the qdisc_class_ops and tcf_proto_ops
>>'flags' fields, which requires the code to first do the whole ops
>>lookup sequence. However, instead of an ops field, I could put 'flags'
>>in some kind of hash table or array that maps qdisc/classifier type
>>strings to flags, making it possible to determine locking requirements
>>by just parsing the netlink message and obtaining the flags by
>>qdisc/classifier type. I do not consider it a pretty solution either,
>>but maybe you have a different opinion.
>
> I think you will have to go with 2. or some modification of it. Can't you
> just check for the cls ability to run unlocked early on in tc_new_tfilter()?
> You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
> which would do tcf_proto_lookup_ops() for the ops and check the flags?

I guess that would work. However, such a solution requires calling
tcf_proto_lookup_ops(), which iterates over the tcf_proto_base list and
calls strcmp() for each proto, on every rule update call. That is why I
suggested using some kind of optimized data structure for that purpose
in my first reply. I don't know whether such a lookup will significantly
impact rule update performance. We don't have that many classifiers and
their names are short, so I guess not?
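
(For reference, the lookup in question is roughly this linear walk in
cls_api.c, abridged:)

static const struct tcf_proto_ops *tcf_proto_lookup_ops(const char *kind)
{
	const struct tcf_proto_ops *t, *res = NULL;

	read_lock(&cls_mod_lock);
	list_for_each_entry(t, &tcf_proto_base, head) {
		if (strcmp(kind, t->kind) == 0) {
			/* Take a module reference so ops can't go away. */
			if (try_module_get(t->owner))
				res = t;
			break;
		}
	}
	read_unlock(&cls_mod_lock);
	return res;
}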

>
>
>>
>>3. Anything you can suggest? I might be missing something simple that
>>you would consider a more elegant solution to this problem.
>>
>>Thanks,
>>Vlad
>>


* Re: [PATCH net-next 17/17] net: sched: unlock rules update API
  2018-11-14 16:45         ` Vlad Buslov
@ 2018-11-15 10:20           ` Jiri Pirko
  0 siblings, 0 replies; 35+ messages in thread
From: Jiri Pirko @ 2018-11-15 10:20 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: David Miller, netdev, jhs, xiyou.wangcong, ast, daniel

Wed, Nov 14, 2018 at 05:45:34PM CET, vladbu@mellanox.com wrote:
>
>On Wed 14 Nov 2018 at 06:44, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Nov 13, 2018 at 02:46:54PM CET, vladbu@mellanox.com wrote:
>>>On Mon 12 Nov 2018 at 17:30, David Miller <davem@davemloft.net> wrote:
>>>> From: Vlad Buslov <vladbu@mellanox.com>
>>>> Date: Mon, 12 Nov 2018 09:55:46 +0200
>>>>
>>>>> Register netlink protocol handlers for message types RTM_NEWTFILTER,
>>>>> RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that
>>>>> tracks rtnl mutex state to be false by default.
>>>>
>>>> This whole conditional locking mechanism is really not clean and makes
>>>> this code so much harder to understand and audit.
>>>>
>>>> Please improve the code so that this kind of construct is not needed.
>>>>
>>>> Thank you.
>>>
>>>Hi David,
>>>
>>>I considered several approaches to this problem and decided that this
>>>one is the most straightforward to implement. I understand your concern
>>>and agree that this code is not the easiest to understand. I can
>>>suggest several possible solutions that do not require this kind of
>>>elaborate locking mechanism in cls API, but they have their own
>>>drawbacks:
>>>
>>>1. Convert all qdiscs and classifiers to support unlocked execution,
>>>like we did for actions. However, judging by my experience converting
>>>the flower classifier, this requires much more code than actions did. I
>>>would estimate it to be more work than the whole current unlocking
>>>effort (a hundred+ patches). Also, the authors of some of them might be
>>>unhappy with such intrusive changes. I don't think this approach is
>>>realistic.
>>>
>>>2. Somehow determine if rtnl is needed at the beginning of the cls API
>>>rule update functions. Currently, this is not possible because locking
>>>requirements are determined by the qdisc_class_ops and tcf_proto_ops
>>>'flags' fields, which requires the code to first do the whole ops
>>>lookup sequence. However, instead of an ops field, I could put 'flags'
>>>in some kind of hash table or array that maps qdisc/classifier type
>>>strings to flags, making it possible to determine locking requirements
>>>by just parsing the netlink message and obtaining the flags by
>>>qdisc/classifier type. I do not consider it a pretty solution either,
>>>but maybe you have a different opinion.
>>
>> I think you will have to go with 2. or some modification of it. Can't you
>> just check for the cls ability to run unlocked early on in tc_new_tfilter()?
>> You would call tcf_proto_locking_check(nla_data(tca[TCA_KIND]), ...),
>> which would do tcf_proto_lookup_ops() for the ops and check the flags?
>
>I guess that would work. However, such a solution requires calling
>tcf_proto_lookup_ops(), which iterates over the tcf_proto_base list and
>calls strcmp() for each proto, on every rule update call. That is why I
>suggested using some kind of optimized data structure for that purpose
>in my first reply. I don't know whether such a lookup will significantly
>impact rule update performance. We don't have that many classifiers and
>their names are short, so I guess not?

Let's do it like this for unlocked first, then we can optimize if
necessary.


>
>>
>>
>>>
>>>3. Anything you can suggest? I might be missing something simple that
>>>you would consider a more elegant solution to this problem.
>>>
>>>Thanks,
>>>Vlad
>>>
>


end of thread

Thread overview: 35+ messages
2018-11-12  7:55 [PATCH net-next 00/17] Refactor classifier API to work with chain/classifiers without rtnl lock Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 01/17] net: sched: refactor mini_qdisc_pair_swap() to use workqueue Vlad Buslov
2018-11-12 17:28   ` David Miller
2018-11-13 13:13     ` Vlad Buslov
2018-11-13 16:08       ` David Miller
2018-11-12  7:55 ` [PATCH net-next 02/17] net: sched: protect block state with spinlock Vlad Buslov
2018-11-12 17:28   ` David Miller
2018-11-13 10:07     ` Stefano Brivio
2018-11-13 13:28       ` Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 03/17] net: sched: refactor tc_ctl_chain() to use block->lock Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 04/17] net: sched: protect block->chain0 with block->lock Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 06/17] net: sched: protect chain template accesses with block lock Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 07/17] net: sched: lock the chain when accessing filter_chain list Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 08/17] net: sched: introduce reference counting for tcf proto Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 09/17] net: sched: traverse classifiers in chain with tcf_get_next_proto() Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 10/17] net: sched: refactor tp insert/delete for concurrent execution Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 11/17] net: sched: prevent insertion of new classifiers during chain flush Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 12/17] net: sched: track rtnl lock status when validating extensions Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 13/17] net: sched: extend proto ops with 'put' callback Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 14/17] net: sched: extend proto ops to support unlocked classifiers Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 15/17] net: sched: add flags to Qdisc class ops struct Vlad Buslov
2018-11-12  7:55 ` [PATCH net-next 16/17] net: sched: conditionally take rtnl lock on rules update path Vlad Buslov
2018-11-13  9:40   ` Stefano Brivio
2018-11-13 13:25     ` Vlad Buslov
2018-11-13 13:40       ` Stefano Brivio
2018-11-13 13:58         ` Vlad Buslov
2018-11-13 15:53           ` Stefano Brivio
2018-11-13 16:57             ` Stefano Brivio
2018-11-12  7:55 ` [PATCH net-next 17/17] net: sched: unlock rules update API Vlad Buslov
2018-11-12 17:30   ` David Miller
2018-11-13 13:46     ` Vlad Buslov
2018-11-14  6:44       ` Jiri Pirko
2018-11-14 16:45         ` Vlad Buslov
2018-11-15 10:20           ` Jiri Pirko
