All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock
@ 2018-09-17  7:17 Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function Vlad Buslov
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with single global rtnl lock which removes any
possibility for parallelism. This patch set is a third step to remove
rtnl lock dependency from TC rules update path.

Recently, new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
Handlers registered with this flag are called without RTNL taken. End
goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER,
etc.) to be registered with UNLOCKED flag to allow parallel execution.
However, there is no intention to completely remove or split rtnl lock
itself. This patch set addresses specific problems in implementation of
classifiers API that prevent its control path from being executed
concurrently. Additional changes are required to refactor classifiers
API and individual classifiers for parallel execution. This patch set
lays groundwork to eventually register rule update handlers as
rtnl-unlocked by modifying code in cls API that works with Qdiscs and
blocks. Following patch set does the same for chains and classifiers.

The goal of this change is to refactor tcf_block_find() and its
dependencies to allow concurrent execution:
- Extend Qdisc API with rcu to lookup and take reference to Qdisc
  without relying on rtnl lock.
- Extend tcf_block with atomic reference counting and rcu.
- Always take reference to tcf_block while working with it.
- Implement tcf_block_release() to release resources obtained by
  tcf_block_find()
- Create infrastructure to allow registering Qdiscs with class ops that
  do not require the caller to hold rtnl lock.

All three netlink rule update handlers use tcf_block_find() to lookup
Qdisc and block, and this patch set introduces additional means of
synchronization to substitute rtnl lock in cls API.

Some functions in cls and sch APIs have historic names that no longer
clearly describe their intent. In order not make this code even more
confusing when introducing their concurrency-friendly versions, rename
these functions to describe actual implementation.

Changes from V1 to V2:
- Rebase on latest net-next.
- Patch 8 - remove.
- Patch 9 - fold into patch 11.
- Patch 11:
  - Rename tcf_block_{get|put}() to tcf_block_refcnt_{get|put}().
- Patch 13 - remove.

Vlad Buslov (10):
  net: core: netlink: add helper refcount dec and lock function
  net: sched: rename qdisc_destroy() to qdisc_put()
  net: sched: extend Qdisc with rcu
  net: sched: add helper function to take reference to Qdisc
  net: sched: use Qdisc rcu API instead of relying on rtnl lock
  net: sched: change tcf block reference counter type to refcount_t
  net: sched: implement functions to put and flush all chains
  net: sched: protect block idr with spinlock
  net: sched: implement tcf_block_refcnt_{get|put}()
  net: sched: use reference counting for tcf blocks on rules update

 include/linux/rtnetlink.h |   6 ++
 include/net/pkt_sched.h   |   1 +
 include/net/sch_generic.h |  20 +++-
 net/core/rtnetlink.c      |   6 ++
 net/sched/cls_api.c       | 250 +++++++++++++++++++++++++++++++++-------------
 net/sched/sch_api.c       |  24 ++++-
 net/sched/sch_atm.c       |   2 +-
 net/sched/sch_cbq.c       |   2 +-
 net/sched/sch_cbs.c       |   2 +-
 net/sched/sch_drr.c       |   4 +-
 net/sched/sch_dsmark.c    |   2 +-
 net/sched/sch_fifo.c      |   2 +-
 net/sched/sch_generic.c   |  48 +++++++--
 net/sched/sch_hfsc.c      |   2 +-
 net/sched/sch_htb.c       |   4 +-
 net/sched/sch_mq.c        |   4 +-
 net/sched/sch_mqprio.c    |   4 +-
 net/sched/sch_multiq.c    |   6 +-
 net/sched/sch_netem.c     |   2 +-
 net/sched/sch_prio.c      |   6 +-
 net/sched/sch_qfq.c       |   4 +-
 net/sched/sch_red.c       |   4 +-
 net/sched/sch_sfb.c       |   4 +-
 net/sched/sch_tbf.c       |   4 +-
 24 files changed, 301 insertions(+), 112 deletions(-)

-- 
2.7.5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-19 21:46   ` Cong Wang
  2018-09-17  7:17 ` [PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put() Vlad Buslov
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Rtnl lock is encapsulated in netlink and cannot be accessed by other
modules directly. This means that reference counted objects that rely on
rtnl lock cannot use it with refcounter helper function that atomically
releases decrements reference and obtains mutex.

This patch implements simple wrapper function around refcount_dec_and_lock
that obtains rtnl lock if reference counter value reached 0.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/rtnetlink.h | 1 +
 net/core/rtnetlink.c      | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5225832bd6ff..dffbf665c086 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -34,6 +34,7 @@ extern void rtnl_unlock(void);
 extern int rtnl_trylock(void);
 extern int rtnl_is_locked(void);
 extern int rtnl_lock_killable(void);
+extern bool refcount_dec_and_rtnl_lock(refcount_t *r);
 
 extern wait_queue_head_t netdev_unregistering_wq;
 extern struct rw_semaphore pernet_ops_rwsem;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e4ae0319e189..ad9d1493cb27 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -130,6 +130,12 @@ int rtnl_is_locked(void)
 }
 EXPORT_SYMBOL(rtnl_is_locked);
 
+bool refcount_dec_and_rtnl_lock(refcount_t *r)
+{
+	return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
+}
+EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);
+
 #ifdef CONFIG_PROVE_LOCKING
 bool lockdep_rtnl_is_held(void)
 {
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put()
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu Vlad Buslov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Current implementation of qdisc_destroy() decrements Qdisc reference
counter and only actually destroy Qdisc if reference counter value reached
zero. Rename qdisc_destroy() to qdisc_put() in order for it to better
describe the way in which this function currently implemented and used.

Extract code that deallocates Qdisc into new private qdisc_destroy()
function. It is intended to be shared between regular qdisc_put() and its
unlocked version that is introduced in next patch in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/sch_api.c       |  6 +++---
 net/sched/sch_atm.c       |  2 +-
 net/sched/sch_cbq.c       |  2 +-
 net/sched/sch_cbs.c       |  2 +-
 net/sched/sch_drr.c       |  4 ++--
 net/sched/sch_dsmark.c    |  2 +-
 net/sched/sch_fifo.c      |  2 +-
 net/sched/sch_generic.c   | 23 ++++++++++++++---------
 net/sched/sch_hfsc.c      |  2 +-
 net/sched/sch_htb.c       |  4 ++--
 net/sched/sch_mq.c        |  4 ++--
 net/sched/sch_mqprio.c    |  4 ++--
 net/sched/sch_multiq.c    |  6 +++---
 net/sched/sch_netem.c     |  2 +-
 net/sched/sch_prio.c      |  6 +++---
 net/sched/sch_qfq.c       |  4 ++--
 net/sched/sch_red.c       |  4 ++--
 net/sched/sch_sfb.c       |  4 ++--
 net/sched/sch_tbf.c       |  4 ++--
 20 files changed, 47 insertions(+), 42 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d326fd553b58..fadb1a4d4ee8 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -554,7 +554,7 @@ void dev_deactivate_many(struct list_head *head);
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
-void qdisc_destroy(struct Qdisc *qdisc);
+void qdisc_put(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
 			       unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 411c40344b77..2096138c4bf6 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -920,7 +920,7 @@ static void notify_and_destroy(struct net *net, struct sk_buff *skb,
 		qdisc_notify(net, skb, n, clid, old, new);
 
 	if (old)
-		qdisc_destroy(old);
+		qdisc_put(old);
 }
 
 /* Graft qdisc "new" to class "classid" of qdisc "parent" or
@@ -973,7 +973,7 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent,
 				qdisc_refcount_inc(new);
 
 			if (!ingress)
-				qdisc_destroy(old);
+				qdisc_put(old);
 		}
 
 skip:
@@ -1561,7 +1561,7 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 	err = qdisc_graft(dev, p, skb, n, clid, q, NULL, extack);
 	if (err) {
 		if (q)
-			qdisc_destroy(q);
+			qdisc_put(q);
 		return err;
 	}
 
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index cd49afca9617..d714d3747bcb 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -150,7 +150,7 @@ static void atm_tc_put(struct Qdisc *sch, unsigned long cl)
 	pr_debug("atm_tc_put: destroying\n");
 	list_del_init(&flow->list);
 	pr_debug("atm_tc_put: qdisc %p\n", flow->q);
-	qdisc_destroy(flow->q);
+	qdisc_put(flow->q);
 	tcf_block_put(flow->block);
 	if (flow->sock) {
 		pr_debug("atm_tc_put: f_count %ld\n",
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index f42025d53cfe..4dc05409e3fb 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1418,7 +1418,7 @@ static void cbq_destroy_class(struct Qdisc *sch, struct cbq_class *cl)
 	WARN_ON(cl->filters);
 
 	tcf_block_put(cl->block);
-	qdisc_destroy(cl->q);
+	qdisc_put(cl->q);
 	qdisc_put_rtab(cl->R_tab);
 	gen_kill_estimator(&cl->rate_est);
 	if (cl != &q->link)
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index e26a24017faa..e689e11b6d0f 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -379,7 +379,7 @@ static void cbs_destroy(struct Qdisc *sch)
 	cbs_disable_offload(dev, q);
 
 	if (q->qdisc)
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 }
 
 static int cbs_dump(struct Qdisc *sch, struct sk_buff *skb)
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index e0b0cf8a9939..cdebaed0f8cf 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -134,7 +134,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 					    tca[TCA_RATE]);
 		if (err) {
 			NL_SET_ERR_MSG(extack, "Failed to replace estimator");
-			qdisc_destroy(cl->qdisc);
+			qdisc_put(cl->qdisc);
 			kfree(cl);
 			return err;
 		}
@@ -153,7 +153,7 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 static void drr_destroy_class(struct Qdisc *sch, struct drr_class *cl)
 {
 	gen_kill_estimator(&cl->rate_est);
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	kfree(cl);
 }
 
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 049714c57075..f6f480784bc6 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -412,7 +412,7 @@ static void dsmark_destroy(struct Qdisc *sch)
 	pr_debug("%s(sch %p,[qdisc %p])\n", __func__, sch, p);
 
 	tcf_block_put(p->block);
-	qdisc_destroy(p->q);
+	qdisc_put(p->q);
 	if (p->mv != p->embedded)
 		kfree(p->mv);
 }
diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c
index 24893d3b5d22..3809c9bf8896 100644
--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -177,7 +177,7 @@ struct Qdisc *fifo_create_dflt(struct Qdisc *sch, struct Qdisc_ops *ops,
 	if (q) {
 		err = fifo_set_limit(q, limit);
 		if (err < 0) {
-			qdisc_destroy(q);
+			qdisc_put(q);
 			q = NULL;
 		}
 	}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a64132a5db36..3e7696f3e053 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -901,7 +901,7 @@ struct Qdisc *qdisc_create_dflt(struct netdev_queue *dev_queue,
 	if (!ops->init || ops->init(sch, NULL, extack) == 0)
 		return sch;
 
-	qdisc_destroy(sch);
+	qdisc_put(sch);
 	return NULL;
 }
 EXPORT_SYMBOL(qdisc_create_dflt);
@@ -941,15 +941,11 @@ void qdisc_free(struct Qdisc *qdisc)
 	kfree((char *) qdisc - qdisc->padded);
 }
 
-void qdisc_destroy(struct Qdisc *qdisc)
+static void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
 	struct sk_buff *skb, *tmp;
 
-	if (qdisc->flags & TCQ_F_BUILTIN ||
-	    !refcount_dec_and_test(&qdisc->refcnt))
-		return;
-
 #ifdef CONFIG_NET_SCHED
 	qdisc_hash_del(qdisc);
 
@@ -976,7 +972,16 @@ void qdisc_destroy(struct Qdisc *qdisc)
 
 	qdisc_free(qdisc);
 }
-EXPORT_SYMBOL(qdisc_destroy);
+
+void qdisc_put(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN ||
+	    !refcount_dec_and_test(&qdisc->refcnt))
+		return;
+
+	qdisc_destroy(qdisc);
+}
+EXPORT_SYMBOL(qdisc_put);
 
 /* Attach toplevel qdisc to device queue. */
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
@@ -1270,7 +1275,7 @@ static void shutdown_scheduler_queue(struct net_device *dev,
 		rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
 		dev_queue->qdisc_sleeping = qdisc_default;
 
-		qdisc_destroy(qdisc);
+		qdisc_put(qdisc);
 	}
 }
 
@@ -1279,7 +1284,7 @@ void dev_shutdown(struct net_device *dev)
 	netdev_for_each_tx_queue(dev, shutdown_scheduler_queue, &noop_qdisc);
 	if (dev_ingress_queue(dev))
 		shutdown_scheduler_queue(dev, dev_ingress_queue(dev), &noop_qdisc);
-	qdisc_destroy(dev->qdisc);
+	qdisc_put(dev->qdisc);
 	dev->qdisc = &noop_qdisc;
 
 	WARN_ON(timer_pending(&dev->watchdog_timer));
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 3278a76f6861..b18ec1f6de60 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1092,7 +1092,7 @@ hfsc_destroy_class(struct Qdisc *sch, struct hfsc_class *cl)
 	struct hfsc_sched *q = qdisc_priv(sch);
 
 	tcf_block_put(cl->block);
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	gen_kill_estimator(&cl->rate_est);
 	if (cl != &q->root)
 		kfree(cl);
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 18ac2d6ca294..58b449490757 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1208,7 +1208,7 @@ static void htb_destroy_class(struct Qdisc *sch, struct htb_class *cl)
 {
 	if (!cl->level) {
 		WARN_ON(!cl->leaf.q);
-		qdisc_destroy(cl->leaf.q);
+		qdisc_put(cl->leaf.q);
 	}
 	gen_kill_estimator(&cl->rate_est);
 	tcf_block_put(cl->block);
@@ -1409,7 +1409,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid,
 			/* turn parent into inner node */
 			qdisc_reset(parent->leaf.q);
 			qdisc_tree_reduce_backlog(parent->leaf.q, qlen, backlog);
-			qdisc_destroy(parent->leaf.q);
+			qdisc_put(parent->leaf.q);
 			if (parent->prio_activity)
 				htb_deactivate(q, parent);
 
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index d6b8ae4ed7a3..f20f3a0f8424 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -65,7 +65,7 @@ static void mq_destroy(struct Qdisc *sch)
 	if (!priv->qdiscs)
 		return;
 	for (ntx = 0; ntx < dev->num_tx_queues && priv->qdiscs[ntx]; ntx++)
-		qdisc_destroy(priv->qdiscs[ntx]);
+		qdisc_put(priv->qdiscs[ntx]);
 	kfree(priv->qdiscs);
 }
 
@@ -119,7 +119,7 @@ static void mq_attach(struct Qdisc *sch)
 		qdisc = priv->qdiscs[ntx];
 		old = dev_graft_qdisc(qdisc->dev_queue, qdisc);
 		if (old)
-			qdisc_destroy(old);
+			qdisc_put(old);
 #ifdef CONFIG_NET_SCHED
 		if (ntx < dev->real_num_tx_queues)
 			qdisc_hash_add(qdisc, false);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 0e9d761cdd80..d364e63c396d 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -40,7 +40,7 @@ static void mqprio_destroy(struct Qdisc *sch)
 		for (ntx = 0;
 		     ntx < dev->num_tx_queues && priv->qdiscs[ntx];
 		     ntx++)
-			qdisc_destroy(priv->qdiscs[ntx]);
+			qdisc_put(priv->qdiscs[ntx]);
 		kfree(priv->qdiscs);
 	}
 
@@ -300,7 +300,7 @@ static void mqprio_attach(struct Qdisc *sch)
 		qdisc = priv->qdiscs[ntx];
 		old = dev_graft_qdisc(qdisc->dev_queue, qdisc);
 		if (old)
-			qdisc_destroy(old);
+			qdisc_put(old);
 		if (ntx < dev->real_num_tx_queues)
 			qdisc_hash_add(qdisc, false);
 	}
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 1da7ea8de0ad..7410ce4d0321 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -175,7 +175,7 @@ multiq_destroy(struct Qdisc *sch)
 
 	tcf_block_put(q->block);
 	for (band = 0; band < q->bands; band++)
-		qdisc_destroy(q->queues[band]);
+		qdisc_put(q->queues[band]);
 
 	kfree(q->queues);
 }
@@ -204,7 +204,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
 			q->queues[i] = &noop_qdisc;
 			qdisc_tree_reduce_backlog(child, child->q.qlen,
 						  child->qstats.backlog);
-			qdisc_destroy(child);
+			qdisc_put(child);
 		}
 	}
 
@@ -228,7 +228,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
 					qdisc_tree_reduce_backlog(old,
 								  old->q.qlen,
 								  old->qstats.backlog);
-					qdisc_destroy(old);
+					qdisc_put(old);
 				}
 				sch_tree_unlock(sch);
 			}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 506e1960ed7f..57b3ad9394ad 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -1022,7 +1022,7 @@ static void netem_destroy(struct Qdisc *sch)
 
 	qdisc_watchdog_cancel(&q->watchdog);
 	if (q->qdisc)
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 	dist_free(q->delay_dist);
 	dist_free(q->slot_dist);
 }
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 222e53d3d27a..f8af98621179 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -175,7 +175,7 @@ prio_destroy(struct Qdisc *sch)
 	tcf_block_put(q->block);
 	prio_offload(sch, NULL);
 	for (prio = 0; prio < q->bands; prio++)
-		qdisc_destroy(q->queues[prio]);
+		qdisc_put(q->queues[prio]);
 }
 
 static int prio_tune(struct Qdisc *sch, struct nlattr *opt,
@@ -205,7 +205,7 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt,
 					      extack);
 		if (!queues[i]) {
 			while (i > oldbands)
-				qdisc_destroy(queues[--i]);
+				qdisc_put(queues[--i]);
 			return -ENOMEM;
 		}
 	}
@@ -220,7 +220,7 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt,
 
 		qdisc_tree_reduce_backlog(child, child->q.qlen,
 					  child->qstats.backlog);
-		qdisc_destroy(child);
+		qdisc_put(child);
 	}
 
 	for (i = oldbands; i < q->bands; i++) {
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index bb1a9c11fc54..dc37c4ead439 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -526,7 +526,7 @@ static int qfq_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
 	return 0;
 
 destroy_class:
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	kfree(cl);
 	return err;
 }
@@ -537,7 +537,7 @@ static void qfq_destroy_class(struct Qdisc *sch, struct qfq_class *cl)
 
 	qfq_rm_from_agg(q, cl);
 	gen_kill_estimator(&cl->rate_est);
-	qdisc_destroy(cl->qdisc);
+	qdisc_put(cl->qdisc);
 	kfree(cl);
 }
 
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 56c181c3feeb..3ce6c0a2c493 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -181,7 +181,7 @@ static void red_destroy(struct Qdisc *sch)
 
 	del_timer_sync(&q->adapt_timer);
 	red_offload(sch, false);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 }
 
 static const struct nla_policy red_policy[TCA_RED_MAX + 1] = {
@@ -233,7 +233,7 @@ static int red_change(struct Qdisc *sch, struct nlattr *opt,
 	if (child) {
 		qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
 					  q->qdisc->qstats.backlog);
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 		q->qdisc = child;
 	}
 
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 7cbdad8419b7..bab506b01a32 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -469,7 +469,7 @@ static void sfb_destroy(struct Qdisc *sch)
 	struct sfb_sched_data *q = qdisc_priv(sch);
 
 	tcf_block_put(q->block);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 }
 
 static const struct nla_policy sfb_policy[TCA_SFB_MAX + 1] = {
@@ -523,7 +523,7 @@ static int sfb_change(struct Qdisc *sch, struct nlattr *opt,
 
 	qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
 				  q->qdisc->qstats.backlog);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 	q->qdisc = child;
 
 	q->rehash_interval = msecs_to_jiffies(ctl->rehash_interval);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index a4530e85bd02..942dcca09cf2 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -392,7 +392,7 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt,
 	if (child) {
 		qdisc_tree_reduce_backlog(q->qdisc, q->qdisc->q.qlen,
 					  q->qdisc->qstats.backlog);
-		qdisc_destroy(q->qdisc);
+		qdisc_put(q->qdisc);
 		q->qdisc = child;
 	}
 	q->limit = qopt->limit;
@@ -438,7 +438,7 @@ static void tbf_destroy(struct Qdisc *sch)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 
 	qdisc_watchdog_cancel(&q->watchdog);
-	qdisc_destroy(q->qdisc);
+	qdisc_put(q->qdisc);
 }
 
 static int tbf_dump(struct Qdisc *sch, struct sk_buff *skb)
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put() Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-19  2:45   ` David Miller
  2018-09-17  7:17 ` [PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc Vlad Buslov
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Currently, Qdisc API functions assume that users have rtnl lock taken. To
implement rtnl unlocked classifiers update interface, Qdisc API must be
extended with functions that do not require rtnl lock.

Extend Qdisc structure with rcu. Implement special version of put function
qdisc_put_unlocked() that is called without rtnl lock taken. This function
only takes rtnl lock if Qdisc reference counter reached zero and is
intended to be used as optimization.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/rtnetlink.h |  5 +++++
 include/net/pkt_sched.h   |  1 +
 include/net/sch_generic.h |  2 ++
 net/sched/sch_api.c       | 18 ++++++++++++++++++
 net/sched/sch_generic.c   | 25 ++++++++++++++++++++++++-
 5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index dffbf665c086..d3dff3e41e6c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -84,6 +84,11 @@ static inline struct netdev_queue *dev_ingress_queue(struct net_device *dev)
 	return rtnl_dereference(dev->ingress_queue);
 }
 
+static inline struct netdev_queue *dev_ingress_queue_rcu(struct net_device *dev)
+{
+	return rcu_dereference(dev->ingress_queue);
+}
+
 struct netdev_queue *dev_ingress_queue_create(struct net_device *dev);
 
 #ifdef CONFIG_NET_INGRESS
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 7dc769e5452b..a16fbe9a2a67 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -102,6 +102,7 @@ int qdisc_set_default(const char *id);
 void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
 					struct nlattr *tab,
 					struct netlink_ext_ack *extack);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index fadb1a4d4ee8..8a0d1523d11b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -90,6 +90,7 @@ struct Qdisc {
 	struct gnet_stats_queue	__percpu *cpu_qstats;
 	int			padded;
 	refcount_t		refcnt;
+	struct rcu_head		rcu;
 
 	/*
 	 * For performance sake on SMP, we put highly modified fields at the end
@@ -555,6 +556,7 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc);
 void qdisc_reset(struct Qdisc *qdisc);
 void qdisc_put(struct Qdisc *qdisc);
+void qdisc_put_unlocked(struct Qdisc *qdisc);
 void qdisc_tree_reduce_backlog(struct Qdisc *qdisc, unsigned int n,
 			       unsigned int len);
 struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2096138c4bf6..070bed39155b 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -314,6 +314,24 @@ struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 	return q;
 }
 
+struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
+{
+	struct Qdisc *q;
+	struct netdev_queue *nq;
+
+	if (!handle)
+		return NULL;
+	q = qdisc_match_from_root(dev->qdisc, handle);
+	if (q)
+		goto out;
+
+	nq = dev_ingress_queue_rcu(dev);
+	if (nq)
+		q = qdisc_match_from_root(nq->qdisc_sleeping, handle);
+out:
+	return q;
+}
+
 static struct Qdisc *qdisc_leaf(struct Qdisc *p, u32 classid)
 {
 	unsigned long cl;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3e7696f3e053..531fac1d2875 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -941,6 +941,13 @@ void qdisc_free(struct Qdisc *qdisc)
 	kfree((char *) qdisc - qdisc->padded);
 }
 
+void qdisc_free_cb(struct rcu_head *head)
+{
+	struct Qdisc *q = container_of(head, struct Qdisc, rcu);
+
+	qdisc_free(q);
+}
+
 static void qdisc_destroy(struct Qdisc *qdisc)
 {
 	const struct Qdisc_ops  *ops = qdisc->ops;
@@ -970,7 +977,7 @@ static void qdisc_destroy(struct Qdisc *qdisc)
 		kfree_skb_list(skb);
 	}
 
-	qdisc_free(qdisc);
+	call_rcu(&qdisc->rcu, qdisc_free_cb);
 }
 
 void qdisc_put(struct Qdisc *qdisc)
@@ -983,6 +990,22 @@ void qdisc_put(struct Qdisc *qdisc)
 }
 EXPORT_SYMBOL(qdisc_put);
 
+/* Version of qdisc_put() that is called with rtnl mutex unlocked.
+ * Intended to be used as optimization, this function only takes rtnl lock if
+ * qdisc reference counter reached zero.
+ */
+
+void qdisc_put_unlocked(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN ||
+	    !refcount_dec_and_rtnl_lock(&qdisc->refcnt))
+		return;
+
+	qdisc_destroy(qdisc);
+	rtnl_unlock();
+}
+EXPORT_SYMBOL(qdisc_put_unlocked);
+
 /* Attach toplevel qdisc to device queue. */
 struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 			      struct Qdisc *qdisc)
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (2 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock Vlad Buslov
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Implement function to take reference to Qdisc that relies on rcu read lock
instead of rtnl mutex. Function only takes reference to Qdisc if reference
counter isn't zero. Intended to be used by unlocked cls API.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8a0d1523d11b..9a295e690efe 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -115,6 +115,19 @@ static inline void qdisc_refcount_inc(struct Qdisc *qdisc)
 	refcount_inc(&qdisc->refcnt);
 }
 
+/* Intended to be used by unlocked users, when concurrent qdisc release is
+ * possible.
+ */
+
+static inline struct Qdisc *qdisc_refcount_inc_nz(struct Qdisc *qdisc)
+{
+	if (qdisc->flags & TCQ_F_BUILTIN)
+		return qdisc;
+	if (refcount_inc_not_zero(&qdisc->refcnt))
+		return qdisc;
+	return NULL;
+}
+
 static inline bool qdisc_is_running(struct Qdisc *qdisc)
 {
 	if (qdisc->flags & TCQ_F_NOLOCK)
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (3 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-19 22:04   ` Cong Wang
  2018-09-17  7:17 ` [PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t Vlad Buslov
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

As a preparation from removing rtnl lock dependency from rules update path,
use Qdisc rcu and reference counting capabilities instead of relying on
rtnl lock while working with Qdiscs. Create new tcf_block_release()
function, and use it to free resources taken by tcf_block_find().
Currently, this function only releases Qdisc and it is extended in next
patches in this series.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 88 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 73 insertions(+), 15 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1a67af8a6e8c..cfa4a02a6a1a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -527,6 +527,17 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 	return idr_find(&tn->idr, block_index);
 }
 
+static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
+{
+	if (!q)
+		return;
+
+	if (rtnl_held)
+		qdisc_put(q);
+	else
+		qdisc_put_unlocked(q);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -537,6 +548,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 					struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block;
+	int err = 0;
 
 	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
 		block = tcf_block_lookup(net, block_index);
@@ -548,55 +560,91 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 		const struct Qdisc_class_ops *cops;
 		struct net_device *dev;
 
+		rcu_read_lock();
+
 		/* Find link */
-		dev = __dev_get_by_index(net, ifindex);
-		if (!dev)
+		dev = dev_get_by_index_rcu(net, ifindex);
+		if (!dev) {
+			rcu_read_unlock();
 			return ERR_PTR(-ENODEV);
+		}
 
 		/* Find qdisc */
 		if (!*parent) {
 			*q = dev->qdisc;
 			*parent = (*q)->handle;
 		} else {
-			*q = qdisc_lookup(dev, TC_H_MAJ(*parent));
+			*q = qdisc_lookup_rcu(dev, TC_H_MAJ(*parent));
 			if (!*q) {
 				NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
-				return ERR_PTR(-EINVAL);
+				err = -EINVAL;
+				goto errout_rcu;
 			}
 		}
 
+		*q = qdisc_refcount_inc_nz(*q);
+		if (!*q) {
+			NL_SET_ERR_MSG(extack, "Parent Qdisc doesn't exists");
+			err = -EINVAL;
+			goto errout_rcu;
+		}
+
 		/* Is it classful? */
 		cops = (*q)->ops->cl_ops;
 		if (!cops) {
 			NL_SET_ERR_MSG(extack, "Qdisc not classful");
-			return ERR_PTR(-EINVAL);
+			err = -EINVAL;
+			goto errout_rcu;
 		}
 
 		if (!cops->tcf_block) {
 			NL_SET_ERR_MSG(extack, "Class doesn't support blocks");
-			return ERR_PTR(-EOPNOTSUPP);
+			err = -EOPNOTSUPP;
+			goto errout_rcu;
 		}
 
+		/* At this point we know that qdisc is not noop_qdisc,
+		 * which means that qdisc holds a reference to net_device
+		 * and we hold a reference to qdisc, so it is safe to release
+		 * rcu read lock.
+		 */
+		rcu_read_unlock();
+
 		/* Do we search for filter, attached to class? */
 		if (TC_H_MIN(*parent)) {
 			*cl = cops->find(*q, *parent);
 			if (*cl == 0) {
 				NL_SET_ERR_MSG(extack, "Specified class doesn't exist");
-				return ERR_PTR(-ENOENT);
+				err = -ENOENT;
+				goto errout_qdisc;
 			}
 		}
 
 		/* And the last stroke */
 		block = cops->tcf_block(*q, *cl, extack);
-		if (!block)
-			return ERR_PTR(-EINVAL);
+		if (!block) {
+			err = -EINVAL;
+			goto errout_qdisc;
+		}
 		if (tcf_block_shared(block)) {
 			NL_SET_ERR_MSG(extack, "This filter block is shared. Please use the block index to manipulate the filters");
-			return ERR_PTR(-EOPNOTSUPP);
+			err = -EOPNOTSUPP;
+			goto errout_qdisc;
 		}
 	}
 
 	return block;
+
+errout_rcu:
+	rcu_read_unlock();
+errout_qdisc:
+	tcf_qdisc_put(*q, true);
+	return ERR_PTR(err);
+}
+
+static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
+{
+	tcf_qdisc_put(q, true);
 }
 
 struct tcf_block_owner_item {
@@ -1332,6 +1380,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
@@ -1453,6 +1502,7 @@ static int tc_del_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	return err;
 }
 
@@ -1538,6 +1588,7 @@ static int tc_get_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 errout:
 	if (chain)
 		tcf_chain_put(chain);
+	tcf_block_release(q, block);
 	return err;
 }
 
@@ -1854,7 +1905,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 	chain_index = tca[TCA_CHAIN] ? nla_get_u32(tca[TCA_CHAIN]) : 0;
 	if (chain_index > TC_ACT_EXT_VAL_MASK) {
 		NL_SET_ERR_MSG(extack, "Specified chain index exceeds upper limit");
-		return -EINVAL;
+		err = -EINVAL;
+		goto errout_block;
 	}
 	chain = tcf_chain_lookup(block, chain_index);
 	if (n->nlmsg_type == RTM_NEWCHAIN) {
@@ -1866,23 +1918,27 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 				tcf_chain_hold(chain);
 			} else {
 				NL_SET_ERR_MSG(extack, "Filter chain already exists");
-				return -EEXIST;
+				err = -EEXIST;
+				goto errout_block;
 			}
 		} else {
 			if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 				NL_SET_ERR_MSG(extack, "Need both RTM_NEWCHAIN and NLM_F_CREATE to create a new chain");
-				return -ENOENT;
+				err = -ENOENT;
+				goto errout_block;
 			}
 			chain = tcf_chain_create(block, chain_index);
 			if (!chain) {
 				NL_SET_ERR_MSG(extack, "Failed to create filter chain");
-				return -ENOMEM;
+				err = -ENOMEM;
+				goto errout_block;
 			}
 		}
 	} else {
 		if (!chain || tcf_chain_held_by_acts_only(chain)) {
 			NL_SET_ERR_MSG(extack, "Cannot find specified filter chain");
-			return -EINVAL;
+			err = -EINVAL;
+			goto errout_block;
 		}
 		tcf_chain_hold(chain);
 	}
@@ -1924,6 +1980,8 @@ static int tc_ctl_chain(struct sk_buff *skb, struct nlmsghdr *n,
 
 errout:
 	tcf_chain_put(chain);
+errout_block:
+	tcf_block_release(q, block);
 	if (err == -EAGAIN)
 		/* Replay the request. */
 		goto replay;
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (4 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains Vlad Buslov
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

As a preparation for removing rtnl lock dependency from rules update path,
change tcf block reference counter type to refcount_t to allow modification
by concurrent users.

In block put function perform decrement and check reference counter once to
accommodate concurrent modification by unlocked users. After this change
tcf_chain_put at the end of block put function is called with
block->refcnt==0 and will deallocate block after the last chain is
released, so there is no need to manually deallocate block in this case.
However, if block reference counter reached 0 and there are no chains to
release, block must still be deallocated manually.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 +-
 net/sched/cls_api.c       | 59 ++++++++++++++++++++++++++++-------------------
 2 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9a295e690efe..45fee65468d0 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -345,7 +345,7 @@ struct tcf_chain {
 struct tcf_block {
 	struct list_head chain_list;
 	u32 index; /* block index for shared blocks */
-	unsigned int refcnt;
+	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
 	struct list_head cb_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index cfa4a02a6a1a..c3c7d4e2f84c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -240,7 +240,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 	if (!chain->index)
 		block->chain0.chain = NULL;
 	kfree(chain);
-	if (list_empty(&block->chain_list) && block->refcnt == 0)
+	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
 		kfree(block);
 }
 
@@ -510,7 +510,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
-	block->refcnt = 1;
+	refcount_set(&block->refcnt, 1);
 	block->net = net;
 	block->index = block_index;
 
@@ -719,7 +719,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		/* block_index not 0 means the shared block is requested */
 		block = tcf_block_lookup(net, ei->block_index);
 		if (block)
-			block->refcnt++;
+			refcount_inc(&block->refcnt);
 	}
 
 	if (!block) {
@@ -762,7 +762,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 err_block_insert:
 		kfree(block);
 	} else {
-		block->refcnt--;
+		refcount_dec(&block->refcnt);
 	}
 	return err;
 }
@@ -802,34 +802,45 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 	tcf_chain0_head_change_cb_del(block, ei);
 	tcf_block_owner_del(block, q, ei->binder_type);
 
-	if (block->refcnt == 1) {
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, block->net);
-
-		/* Hold a refcnt for all chains, so that they don't disappear
-		 * while we are iterating.
+	if (refcount_dec_and_test(&block->refcnt)) {
+		/* Flushing/putting all chains will cause the block to be
+		 * deallocated when last chain is freed. However, if chain_list
+		 * is empty, block has to be manually deallocated. After block
+		 * reference counter reached 0, it is no longer possible to
+		 * increment it or add new chains to block.
 		 */
-		list_for_each_entry(chain, &block->chain_list, list)
-			tcf_chain_hold(chain);
+		bool free_block = list_empty(&block->chain_list);
 
-		list_for_each_entry(chain, &block->chain_list, list)
-			tcf_chain_flush(chain);
-	}
+		if (tcf_block_shared(block))
+			tcf_block_remove(block, block->net);
 
-	tcf_block_offload_unbind(block, q, ei);
+		if (!free_block) {
+			/* Hold a refcnt for all chains, so that they don't
+			 * disappear while we are iterating.
+			 */
+			list_for_each_entry(chain, &block->chain_list, list)
+				tcf_chain_hold(chain);
 
-	if (block->refcnt == 1) {
-		/* At this point, all the chains should have refcnt >= 1. */
-		list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
-			tcf_chain_put_explicitly_created(chain);
-			tcf_chain_put(chain);
+			list_for_each_entry(chain, &block->chain_list, list)
+				tcf_chain_flush(chain);
 		}
 
-		block->refcnt--;
-		if (list_empty(&block->chain_list))
+		tcf_block_offload_unbind(block, q, ei);
+
+		if (free_block) {
 			kfree(block);
+		} else {
+			/* At this point, all the chains should have
+			 * refcnt >= 1.
+			 */
+			list_for_each_entry_safe(chain, tmp, &block->chain_list,
+						 list) {
+				tcf_chain_put_explicitly_created(chain);
+				tcf_chain_put(chain);
+			}
+		}
 	} else {
-		block->refcnt--;
+		tcf_block_offload_unbind(block, q, ei);
 	}
 }
 EXPORT_SYMBOL(tcf_block_put_ext);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (5 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock Vlad Buslov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Extract code that flushes and puts all chains on tcf block to two
standalone function to be shared with functions that locklessly get/put
reference to block.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 55 +++++++++++++++++++++++++++++------------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index c3c7d4e2f84c..58b2d8443f6a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -538,6 +538,31 @@ static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 		qdisc_put_unlocked(q);
 }
 
+static void tcf_block_flush_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain;
+
+	/* Hold a refcnt for all chains, so that they don't disappear
+	 * while we are iterating.
+	 */
+	list_for_each_entry(chain, &block->chain_list, list)
+		tcf_chain_hold(chain);
+
+	list_for_each_entry(chain, &block->chain_list, list)
+		tcf_chain_flush(chain);
+}
+
+static void tcf_block_put_all_chains(struct tcf_block *block)
+{
+	struct tcf_chain *chain, *tmp;
+
+	/* At this point, all the chains should have refcnt >= 1. */
+	list_for_each_entry_safe(chain, tmp, &block->chain_list, list) {
+		tcf_chain_put_explicitly_created(chain);
+		tcf_chain_put(chain);
+	}
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,8 +820,6 @@ EXPORT_SYMBOL(tcf_block_get);
 void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 		       struct tcf_block_ext_info *ei)
 {
-	struct tcf_chain *chain, *tmp;
-
 	if (!block)
 		return;
 	tcf_chain0_head_change_cb_del(block, ei);
@@ -813,32 +836,14 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, block->net);
-
-		if (!free_block) {
-			/* Hold a refcnt for all chains, so that they don't
-			 * disappear while we are iterating.
-			 */
-			list_for_each_entry(chain, &block->chain_list, list)
-				tcf_chain_hold(chain);
-
-			list_for_each_entry(chain, &block->chain_list, list)
-				tcf_chain_flush(chain);
-		}
-
+		if (!free_block)
+			tcf_block_flush_all_chains(block);
 		tcf_block_offload_unbind(block, q, ei);
 
-		if (free_block) {
+		if (free_block)
 			kfree(block);
-		} else {
-			/* At this point, all the chains should have
-			 * refcnt >= 1.
-			 */
-			list_for_each_entry_safe(chain, tmp, &block->chain_list,
-						 list) {
-				tcf_chain_put_explicitly_created(chain);
-				tcf_chain_put(chain);
-			}
-		}
+		else
+			tcf_block_put_all_chains(block);
 	} else {
 		tcf_block_offload_unbind(block, q, ei);
 	}
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (6 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-19 22:09   ` Cong Wang
  2018-09-17  7:17 ` [PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}() Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update Vlad Buslov
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Protect block idr access with spinlock, instead of relying on rtnl lock.
Take tn->idr_lock spinlock during block insertion and removal.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 58b2d8443f6a..924723fb74f6 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -473,6 +473,7 @@ tcf_chain0_head_change_cb_del(struct tcf_block *block,
 }
 
 struct tcf_net {
+	spinlock_t idr_lock; /* Protects idr */
 	struct idr idr;
 };
 
@@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, struct net *net,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
+	int err;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&tn->idr_lock);
+	err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
+			    GFP_NOWAIT);
+	spin_unlock(&tn->idr_lock);
+	idr_preload_end();
 
-	return idr_alloc_u32(&tn->idr, block, &block->index, block->index,
-			     GFP_KERNEL);
+	return err;
 }
 
 static void tcf_block_remove(struct tcf_block *block, struct net *net)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+	spin_lock(&tn->idr_lock);
 	idr_remove(&tn->idr, block->index);
+	spin_unlock(&tn->idr_lock);
 }
 
 static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
@@ -2285,6 +2295,7 @@ static __net_init int tcf_net_init(struct net *net)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
 
+	spin_lock_init(&tn->idr_lock);
 	idr_init(&tn->idr);
 	return 0;
 }
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}()
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (7 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  2018-09-17  7:17 ` [PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update Vlad Buslov
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

Implement get/put function for blocks that only take/release the reference
and perform deallocation. These functions are intended to be used by
unlocked rules update path to always hold reference to block while working
with it. They use on new fine-grained locking mechanisms introduced in
previous patches in this set, instead of relying on global protection
provided by rtnl lock.

Extract code that is common with tcf_block_detach_ext() into common
function __tcf_block_put().

Extend tcf_block with rcu to allow safe deallocation when it is accessed
concurrently.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  1 +
 net/sched/cls_api.c       | 74 ++++++++++++++++++++++++++++++++---------------
 2 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 45fee65468d0..931fcdadf64a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -357,6 +357,7 @@ struct tcf_block {
 		struct tcf_chain *chain;
 		struct list_head filter_chain_list;
 	} chain0;
+	struct rcu_head rcu;
 };
 
 static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 924723fb74f6..0a7a3ace2da9 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -241,7 +241,7 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
 		block->chain0.chain = NULL;
 	kfree(chain);
 	if (list_empty(&block->chain_list) && !refcount_read(&block->refcnt))
-		kfree(block);
+		kfree_rcu(block, rcu);
 }
 
 static void tcf_chain_hold(struct tcf_chain *chain)
@@ -537,6 +537,19 @@ static struct tcf_block *tcf_block_lookup(struct net *net, u32 block_index)
 	return idr_find(&tn->idr, block_index);
 }
 
+static struct tcf_block *tcf_block_refcnt_get(struct net *net, u32 block_index)
+{
+	struct tcf_block *block;
+
+	rcu_read_lock();
+	block = tcf_block_lookup(net, block_index);
+	if (block && !refcount_inc_not_zero(&block->refcnt))
+		block = NULL;
+	rcu_read_unlock();
+
+	return block;
+}
+
 static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
 {
 	if (!q)
@@ -573,6 +586,40 @@ static void tcf_block_put_all_chains(struct tcf_block *block)
 	}
 }
 
+static void __tcf_block_put(struct tcf_block *block, struct Qdisc *q,
+			    struct tcf_block_ext_info *ei)
+{
+	if (refcount_dec_and_test(&block->refcnt)) {
+		/* Flushing/putting all chains will cause the block to be
+		 * deallocated when last chain is freed. However, if chain_list
+		 * is empty, block has to be manually deallocated. After block
+		 * reference counter reached 0, it is no longer possible to
+		 * increment it or add new chains to block.
+		 */
+		bool free_block = list_empty(&block->chain_list);
+
+		if (tcf_block_shared(block))
+			tcf_block_remove(block, block->net);
+		if (!free_block)
+			tcf_block_flush_all_chains(block);
+
+		if (q)
+			tcf_block_offload_unbind(block, q, ei);
+
+		if (free_block)
+			kfree_rcu(block, rcu);
+		else
+			tcf_block_put_all_chains(block);
+	} else if (q) {
+		tcf_block_offload_unbind(block, q, ei);
+	}
+}
+
+static void tcf_block_refcnt_put(struct tcf_block *block)
+{
+	__tcf_block_put(block, NULL, NULL);
+}
+
 /* Find tcf block.
  * Set q, parent, cl when appropriate.
  */
@@ -795,7 +842,7 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 		if (tcf_block_shared(block))
 			tcf_block_remove(block, net);
 err_block_insert:
-		kfree(block);
+		kfree_rcu(block, rcu);
 	} else {
 		refcount_dec(&block->refcnt);
 	}
@@ -835,28 +882,7 @@ void tcf_block_put_ext(struct tcf_block *block, struct Qdisc *q,
 	tcf_chain0_head_change_cb_del(block, ei);
 	tcf_block_owner_del(block, q, ei->binder_type);
 
-	if (refcount_dec_and_test(&block->refcnt)) {
-		/* Flushing/putting all chains will cause the block to be
-		 * deallocated when last chain is freed. However, if chain_list
-		 * is empty, block has to be manually deallocated. After block
-		 * reference counter reached 0, it is no longer possible to
-		 * increment it or add new chains to block.
-		 */
-		bool free_block = list_empty(&block->chain_list);
-
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, block->net);
-		if (!free_block)
-			tcf_block_flush_all_chains(block);
-		tcf_block_offload_unbind(block, q, ei);
-
-		if (free_block)
-			kfree(block);
-		else
-			tcf_block_put_all_chains(block);
-	} else {
-		tcf_block_offload_unbind(block, q, ei);
-	}
+	__tcf_block_put(block, q, ei);
 }
 EXPORT_SYMBOL(tcf_block_put_ext);
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update
  2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
                   ` (8 preceding siblings ...)
  2018-09-17  7:17 ` [PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}() Vlad Buslov
@ 2018-09-17  7:17 ` Vlad Buslov
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2018-09-17  7:17 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc,
	Vlad Buslov

In order to remove dependency on rtnl lock on rules update path, always
take reference to block while using it on rules update path. Change
tcf_block_get() error handling to properly release block with reference
counting, instead of just destroying it, in order to accommodate potential
concurrent users.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0a7a3ace2da9..6a3eec5dbdf1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -633,7 +633,7 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 	int err = 0;
 
 	if (ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, block_index);
+		block = tcf_block_refcnt_get(net, block_index);
 		if (!block) {
 			NL_SET_ERR_MSG(extack, "Block of given index was not found");
 			return ERR_PTR(-EINVAL);
@@ -713,6 +713,14 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 			err = -EOPNOTSUPP;
 			goto errout_qdisc;
 		}
+
+		/* Always take reference to block in order to support execution
+		 * of rules update path of cls API without rtnl lock. Caller
+		 * must release block when it is finished using it. 'if' block
+		 * of this conditional obtain reference to block by calling
+		 * tcf_block_refcnt_get().
+		 */
+		refcount_inc(&block->refcnt);
 	}
 
 	return block;
@@ -726,6 +734,8 @@ static struct tcf_block *tcf_block_find(struct net *net, struct Qdisc **q,
 
 static void tcf_block_release(struct Qdisc *q, struct tcf_block *block)
 {
+	if (!IS_ERR_OR_NULL(block))
+		tcf_block_refcnt_put(block);
 	tcf_qdisc_put(q, true);
 }
 
@@ -794,21 +804,16 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 {
 	struct net *net = qdisc_net(q);
 	struct tcf_block *block = NULL;
-	bool created = false;
 	int err;
 
-	if (ei->block_index) {
+	if (ei->block_index)
 		/* block_index not 0 means the shared block is requested */
-		block = tcf_block_lookup(net, ei->block_index);
-		if (block)
-			refcount_inc(&block->refcnt);
-	}
+		block = tcf_block_refcnt_get(net, ei->block_index);
 
 	if (!block) {
 		block = tcf_block_create(net, q, ei->block_index, extack);
 		if (IS_ERR(block))
 			return PTR_ERR(block);
-		created = true;
 		if (tcf_block_shared(block)) {
 			err = tcf_block_insert(block, net, extack);
 			if (err)
@@ -838,14 +843,8 @@ int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
 err_chain0_head_change_cb_add:
 	tcf_block_owner_del(block, q, ei->binder_type);
 err_block_owner_add:
-	if (created) {
-		if (tcf_block_shared(block))
-			tcf_block_remove(block, net);
 err_block_insert:
-		kfree_rcu(block, rcu);
-	} else {
-		refcount_dec(&block->refcnt);
-	}
+	tcf_block_refcnt_put(block);
 	return err;
 }
 EXPORT_SYMBOL(tcf_block_get_ext);
@@ -1739,7 +1738,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		return err;
 
 	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, tcm->tcm_block_index);
+		block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
 		if (!block)
 			goto out;
 		/* If we work with block index, q is NULL and parent value
@@ -1798,6 +1797,8 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 		}
 	}
 
+	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		tcf_block_refcnt_put(block);
 	cb->args[0] = index;
 
 out:
@@ -2062,7 +2063,7 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 		return err;
 
 	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK) {
-		block = tcf_block_lookup(net, tcm->tcm_block_index);
+		block = tcf_block_refcnt_get(net, tcm->tcm_block_index);
 		if (!block)
 			goto out;
 		/* If we work with block index, q is NULL and parent value
@@ -2129,6 +2130,8 @@ static int tc_dump_chain(struct sk_buff *skb, struct netlink_callback *cb)
 		index++;
 	}
 
+	if (tcm->tcm_ifindex == TCM_IFINDEX_MAGIC_BLOCK)
+		tcf_block_refcnt_put(block);
 	cb->args[0] = index;
 
 out:
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu
  2018-09-17  7:17 ` [PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu Vlad Buslov
@ 2018-09-19  2:45   ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2018-09-19  2:45 UTC (permalink / raw)
  To: vladbu
  Cc: netdev, jhs, xiyou.wangcong, jiri, stephen, ktkhai,
	nicolas.dichtel, gregkh, mark.rutland, leon, paulmck, fw,
	dsahern, christian, lucien.xin, jakub.kicinski, jbenc

From: Vlad Buslov <vladbu@mellanox.com>
Date: Mon, 17 Sep 2018 10:17:33 +0300

> +struct Qdisc *qdisc_lookup_rcu(struct net_device *dev, u32 handle)
> +{
> +	struct Qdisc *q;
> +	struct netdev_queue *nq;

Reverse christmas tree for the local variables, please.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function
  2018-09-17  7:17 ` [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function Vlad Buslov
@ 2018-09-19 21:46   ` Cong Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2018-09-19 21:46 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc

On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -130,6 +130,12 @@ int rtnl_is_locked(void)
>  }
>  EXPORT_SYMBOL(rtnl_is_locked);
>
> +bool refcount_dec_and_rtnl_lock(refcount_t *r)
> +{
> +       return refcount_dec_and_mutex_lock(r, &rtnl_mutex);
> +}
> +EXPORT_SYMBOL(refcount_dec_and_rtnl_lock);

I think you probably want to #include <linux/refcount.h>
explicitly.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock
  2018-09-17  7:17 ` [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock Vlad Buslov
@ 2018-09-19 22:04   ` Cong Wang
  2018-09-20  7:20     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2018-09-19 22:04 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc

On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> +static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
> +{
> +       if (!q)
> +               return;
> +
> +       if (rtnl_held)
> +               qdisc_put(q);
> +       else
> +               qdisc_put_unlocked(q);
> +}

This is very ugly. You should know whether RTNL is held or
not when calling it.

What's more, all of your code passes true, so why do you
need a parameter for rtnl_held?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock
  2018-09-17  7:17 ` [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock Vlad Buslov
@ 2018-09-19 22:09   ` Cong Wang
  2018-09-20  7:36     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2018-09-19 22:09 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc

On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> @@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, struct net *net,
>                             struct netlink_ext_ack *extack)
>  {
>         struct tcf_net *tn = net_generic(net, tcf_net_id);
> +       int err;
> +
> +       idr_preload(GFP_KERNEL);
> +       spin_lock(&tn->idr_lock);
> +       err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
> +                           GFP_NOWAIT);


Why GFP_NOWAIT rather than GFP_ATOMIC here?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock
  2018-09-19 22:04   ` Cong Wang
@ 2018-09-20  7:20     ` Vlad Buslov
  2018-09-21 19:29       ` Cong Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2018-09-20  7:20 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc


On Wed 19 Sep 2018 at 22:04, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> +static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
>> +{
>> +       if (!q)
>> +               return;
>> +
>> +       if (rtnl_held)
>> +               qdisc_put(q);
>> +       else
>> +               qdisc_put_unlocked(q);
>> +}
>
> This is very ugly. You should know whether RTNL is held or
> not when calling it.
>
> What's more, all of your code passes true, so why do you
> need a parameter for rtnl_held?

It passes true because currently rule update handlers still registered
as locked. This is a preparation for next patch set where this would be
changed to proper variable that depends on qdics and classifier type.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock
  2018-09-19 22:09   ` Cong Wang
@ 2018-09-20  7:36     ` Vlad Buslov
  2018-09-21 19:33       ` Cong Wang
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2018-09-20  7:36 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc


On Wed 19 Sep 2018 at 22:09, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> @@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, struct net *net,
>>                             struct netlink_ext_ack *extack)
>>  {
>>         struct tcf_net *tn = net_generic(net, tcf_net_id);
>> +       int err;
>> +
>> +       idr_preload(GFP_KERNEL);
>> +       spin_lock(&tn->idr_lock);
>> +       err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
>> +                           GFP_NOWAIT);
>
>
> Why GFP_NOWAIT rather than GFP_ATOMIC here?

I checked how idr_preload is used in kernel and in most places following
allocation uses GFP_NOWAIT (including idr-test.c). You suggest I should
change it to GFP_ATOMIC?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock
  2018-09-20  7:20     ` Vlad Buslov
@ 2018-09-21 19:29       ` Cong Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2018-09-21 19:29 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc

On Thu, Sep 20, 2018 at 12:21 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Wed 19 Sep 2018 at 22:04, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> +static void tcf_qdisc_put(struct Qdisc *q, bool rtnl_held)
> >> +{
> >> +       if (!q)
> >> +               return;
> >> +
> >> +       if (rtnl_held)
> >> +               qdisc_put(q);
> >> +       else
> >> +               qdisc_put_unlocked(q);
> >> +}
> >
> > This is very ugly. You should know whether RTNL is held or
> > not when calling it.
> >
> > What's more, all of your code passes true, so why do you
> > need a parameter for rtnl_held?
>
> It passes true because currently rule update handlers still registered
> as locked. This is a preparation for next patch set where this would be
> changed to proper variable that depends on qdics and classifier type.

You can always add it when you really need it.

I doubt you need such a tiny wrapper even in the next patchset,
as it can be easily folded into callers.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock
  2018-09-20  7:36     ` Vlad Buslov
@ 2018-09-21 19:33       ` Cong Wang
  0 siblings, 0 replies; 19+ messages in thread
From: Cong Wang @ 2018-09-21 19:33 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller, Stephen Hemminger, Kirill Tkhai, Nicolas Dichtel,
	Greg KH, mark.rutland, Leon Romanovsky, Paul E. McKenney,
	Florian Westphal, David Ahern, christian, lucien xin,
	Jakub Kicinski, Jiri Benc

On Thu, Sep 20, 2018 at 12:36 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Wed 19 Sep 2018 at 22:09, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Mon, Sep 17, 2018 at 12:19 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> @@ -482,16 +483,25 @@ static int tcf_block_insert(struct tcf_block *block, struct net *net,
> >>                             struct netlink_ext_ack *extack)
> >>  {
> >>         struct tcf_net *tn = net_generic(net, tcf_net_id);
> >> +       int err;
> >> +
> >> +       idr_preload(GFP_KERNEL);
> >> +       spin_lock(&tn->idr_lock);
> >> +       err = idr_alloc_u32(&tn->idr, block, &block->index, block->index,
> >> +                           GFP_NOWAIT);
> >
> >
> > Why GFP_NOWAIT rather than GFP_ATOMIC here?
>
> I checked how idr_preload is used in kernel and in most places following
> allocation uses GFP_NOWAIT (including idr-test.c). You suggest I should
> change it to GFP_ATOMIC?

No, I am just curious, as GFP_ATOMIC is more widely used when holding
spinlock. I thought you have a special reason to use GFP_NOWAIT here,
but anyway, GFP_NOWAIT is probably fine too.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-09-22  1:23 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17  7:17 [PATCH net-next v2 00/10] Refactor classifier API to work with Qdisc/blocks without rtnl lock Vlad Buslov
2018-09-17  7:17 ` [PATCH net-next v2 01/10] net: core: netlink: add helper refcount dec and lock function Vlad Buslov
2018-09-19 21:46   ` Cong Wang
2018-09-17  7:17 ` [PATCH net-next v2 02/10] net: sched: rename qdisc_destroy() to qdisc_put() Vlad Buslov
2018-09-17  7:17 ` [PATCH net-next v2 03/10] net: sched: extend Qdisc with rcu Vlad Buslov
2018-09-19  2:45   ` David Miller
2018-09-17  7:17 ` [PATCH net-next v2 04/10] net: sched: add helper function to take reference to Qdisc Vlad Buslov
2018-09-17  7:17 ` [PATCH net-next v2 05/10] net: sched: use Qdisc rcu API instead of relying on rtnl lock Vlad Buslov
2018-09-19 22:04   ` Cong Wang
2018-09-20  7:20     ` Vlad Buslov
2018-09-21 19:29       ` Cong Wang
2018-09-17  7:17 ` [PATCH net-next v2 06/10] net: sched: change tcf block reference counter type to refcount_t Vlad Buslov
2018-09-17  7:17 ` [PATCH net-next v2 07/10] net: sched: implement functions to put and flush all chains Vlad Buslov
2018-09-17  7:17 ` [PATCH net-next v2 08/10] net: sched: protect block idr with spinlock Vlad Buslov
2018-09-19 22:09   ` Cong Wang
2018-09-20  7:36     ` Vlad Buslov
2018-09-21 19:33       ` Cong Wang
2018-09-17  7:17 ` [PATCH net-next v2 09/10] net: sched: implement tcf_block_refcnt_{get|put}() Vlad Buslov
2018-09-17  7:17 ` [PATCH net-next v2 10/10] net: sched: use reference counting for tcf blocks on rules update Vlad Buslov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.