* [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock
@ 2019-02-14  7:47 Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Vlad Buslov
                   ` (12 more replies)
  0 siblings, 13 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Currently, all netlink protocol handlers for updating rules, actions and
qdiscs are protected with a single global rtnl lock, which removes any
possibility for parallelism. This patch set is the third step in removing
the rtnl lock dependency from the TC rules update path.

Recently, the new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was added.
TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take the rtnl lock when a qdisc
or classifier requires it. Classifiers can indicate that their ops
callbacks don't require the caller to hold the rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. The goal of this change is to refactor
the flower classifier to support unlocked execution and register it with
that flag.

This patch set implements the following changes to make the flower
classifier concurrency-safe:

- Implement reference counting for individual filters. Change fl_get to
  take a reference to the filter. Implement the tp->ops->put callback that
  was introduced in the cls API patch set to release the reference to a
  flower filter.

- Use tp->lock spinlock to protect internal classifier data structures
  from concurrent modification.

- Handle concurrent tcf proto deletion by returning EAGAIN, which will
  cause cls API to retry and create a new proto instance or return an
  error to the user (depending on message type).

- Handle concurrent insertion of a filter with the same priority and handle
  by returning EAGAIN, which will cause cls API to look up the filter again
  and process it according to the netlink message flags.

- Extend the flower mask with reference counting and protect the masks list
  with the masks_lock spinlock.

- Prevent concurrent mask insertion by inserting a temporary entry into the
  masks hash table (a simplified sketch follows this list). This is
  necessary because mask initialization is a sleeping operation and cannot
  be done while holding tp->lock.
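
A simplified sketch of that lookup-or-insert step is below. It is
illustrative only: it assumes the flower types and mask_ht_params from
cls_flower.c, trims error paths, leaves the actual placeholder replacement
to fl_create_new_mask(), and the helper name is made up. The caller passes
a stack-allocated 'mask' whose refcnt is left at zero, so a concurrent task
that finds it fails refcount_inc_not_zero() and retries with EAGAIN:

	/* Sketch: find an existing mask or claim the slot with a
	 * zero-refcnt placeholder that concurrent users cannot take.
	 */
	static int fl_mask_lookup_or_insert(struct cls_fl_head *head,
					    struct cls_fl_filter *fnew,
					    struct fl_flow_mask *mask)
	{
		struct fl_flow_mask *cur;

		cur = rhashtable_lookup_get_insert_fast(&head->ht,
							&mask->ht_node,
							mask_ht_params);
		if (!cur)
			/* We own the placeholder; go create the real mask. */
			return 0;
		if (IS_ERR(cur))
			return PTR_ERR(cur);
		if (!refcount_inc_not_zero(&cur->refcnt))
			/* Placeholder or dying mask found; caller retries. */
			return -EAGAIN;

		/* Reuse the existing mask for the new filter. */
		fnew->mask = cur;
		return 0;
	}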

The tcf hw offloads API is not changed by this patch set and still requires
the caller to hold the rtnl lock. The refactored flower classifier tracks
rtnl lock state by means of the 'rtnl_held' flag provided by cls API and
obtains the lock before calling hw offloads.
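
For reference, the conditional locking that patch 11 wraps around the hw
offload calls looks roughly like the sketch below (the real functions also
build the tc_cls_flower_offload command and update offload counters under
tp->lock; the helper name here is made up for illustration):

	static void fl_hw_call_sketch(struct tcf_proto *tp, bool rtnl_held)
	{
		/* Take rtnl only if the cls API caller does not already
		 * hold it; the hw offload path still requires rtnl.
		 */
		if (!rtnl_held)
			rtnl_lock();

		/* ... tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, ...) ... */

		if (!rtnl_held)
			rtnl_unlock();
	}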

With these changes in place, the flower classifier is safely registered with
the TCF_PROTO_OPS_DOIT_UNLOCKED flag in the last patch.

Github: [https://github.com/vbuslov/linux/tree/unlocked_flower_cong_1]

Vlad Buslov (12):
  net: sched: flower: don't check for rtnl on head dereference
  net: sched: flower: refactor fl_change
  net: sched: flower: introduce reference counting for filters
  net: sched: flower: track filter deletion with flag
  net: sched: flower: add reference counter to flower mask
  net: sched: flower: handle concurrent mask insertion
  net: sched: flower: protect masks list with spinlock
  net: sched: flower: handle concurrent filter insertion in fl_change
  net: sched: flower: handle concurrent tcf proto deletion
  net: sched: flower: protect flower classifier state with spinlock
  net: sched: flower: track rtnl lock state
  net: sched: flower: set unlocked flag for flower proto ops

 net/sched/cls_flower.c | 424 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 321 insertions(+), 103 deletions(-)

-- 
2.13.6



* [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-18 19:08   ` Cong Wang
  2019-02-14  7:47 ` [PATCH net-next 02/12] net: sched: flower: refactor fl_change Vlad Buslov
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

The flower classifier only changes its root pointer during init and destroy.
The cls API implements reference counting for tcf_proto, so there is no
danger of concurrent access to tp when it is being destroyed, even without
the protection provided by the rtnl lock.

Implement a new function, fl_head_dereference(), to dereference tp->root
without checking for the rtnl lock. Use it instead of rtnl_dereference() in
all flower functions that obtain the head pointer.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 32fa3e20adc5..88d7af78ba7e 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -433,10 +433,20 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
 			      cls_flower.stats.lastused);
 }
 
+static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
+{
+	/* Flower classifier only changes root pointer during init and destroy.
+	 * Cls API implements reference counting for tcf_proto, so there is no
+	 * danger of concurrent access to tp when it is being destroyed, even
+	 * without protection provided by rtnl lock.
+	 */
+	return rcu_dereference_protected(tp->root, 1);
+}
+
 static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 			struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 	bool async = tcf_exts_get_net(&f->exts);
 	bool last;
 
@@ -468,7 +478,7 @@ static void fl_destroy_sleepable(struct work_struct *work)
 static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
 		       struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct fl_flow_mask *mask, *next_mask;
 	struct cls_fl_filter *f, *next;
 
@@ -486,7 +496,7 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
 
 static void *fl_get(struct tcf_proto *tp, u32 handle)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 
 	return idr_find(&head->handle_idr, handle);
 }
@@ -1304,7 +1314,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		     void **arg, bool ovr, bool rtnl_held,
 		     struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct cls_fl_filter *fold = *arg;
 	struct cls_fl_filter *fnew;
 	struct fl_flow_mask *mask;
@@ -1441,7 +1451,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 		     bool rtnl_held, struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct cls_fl_filter *f = arg;
 
 	rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
@@ -1454,7 +1464,7 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 		    bool rtnl_held)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct cls_fl_filter *f;
 
 	arg->count = arg->skip;
@@ -1473,7 +1483,7 @@ static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 			void *cb_priv, struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = rtnl_dereference(tp->root);
+	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct tc_cls_flower_offload cls_flower = {};
 	struct tcf_block *block = tp->chain->block;
 	struct fl_flow_mask *mask;
-- 
2.13.6



* [PATCH net-next 02/12] net: sched: flower: refactor fl_change
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14 20:34   ` Stefano Brivio
  2019-02-14  7:47 ` [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters Vlad Buslov
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

As a preparation for using the classifier spinlock instead of relying on the
external rtnl lock, rearrange the code in fl_change. The goal is to group
the code that changes classifier state into a single block in order to allow
the following commits in this set to protect it from parallel modification
with tp->lock. The data structures that require tp->lock protection are the
mask hashtable, the filters list, and the classifier handle_idr.

fl_hw_replace_filter() is a sleeping function and cannot be called while
holding a spinlock. In order to execute the whole sequence of changes to
shared classifier data structures atomically, call fl_hw_replace_filter()
before modifying them.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
 1 file changed, 44 insertions(+), 41 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 88d7af78ba7e..91596a6271f8 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (err < 0)
 		goto errout;
 
-	if (!handle) {
-		handle = 1;
-		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
-				    INT_MAX, GFP_KERNEL);
-	} else if (!fold) {
-		/* user specifies a handle and it doesn't exist */
-		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
-				    handle, GFP_KERNEL);
-	}
-	if (err)
-		goto errout;
-	fnew->handle = handle;
-
 	if (tb[TCA_FLOWER_FLAGS]) {
 		fnew->flags = nla_get_u32(tb[TCA_FLOWER_FLAGS]);
 
 		if (!tc_flags_valid(fnew->flags)) {
 			err = -EINVAL;
-			goto errout_idr;
+			goto errout;
 		}
 	}
 
 	err = fl_set_parms(net, tp, fnew, mask, base, tb, tca[TCA_RATE], ovr,
 			   tp->chain->tmplt_priv, extack);
 	if (err)
-		goto errout_idr;
+		goto errout;
 
 	err = fl_check_assign_mask(head, fnew, fold, mask);
 	if (err)
-		goto errout_idr;
-
-	if (!fold && __fl_lookup(fnew->mask, &fnew->mkey)) {
-		err = -EEXIST;
-		goto errout_mask;
-	}
-
-	err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
-				     fnew->mask->filter_ht_params);
-	if (err)
-		goto errout_mask;
+		goto errout;
 
 	if (!tc_skip_hw(fnew->flags)) {
 		err = fl_hw_replace_filter(tp, fnew, extack);
 		if (err)
-			goto errout_mask_ht;
+			goto errout_mask;
 	}
 
 	if (!tc_in_hw(fnew->flags))
 		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
 	if (fold) {
+		fnew->handle = handle;
+
+		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
+					     fnew->mask->filter_ht_params);
+		if (err)
+			goto errout_hw;
+
 		rhashtable_remove_fast(&fold->mask->ht,
 				       &fold->ht_node,
 				       fold->mask->filter_ht_params);
-		if (!tc_skip_hw(fold->flags))
-			fl_hw_destroy_filter(tp, fold, NULL);
-	}
-
-	*arg = fnew;
-
-	if (fold) {
 		idr_replace(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->list, &fnew->list);
+
+		if (!tc_skip_hw(fold->flags))
+			fl_hw_destroy_filter(tp, fold, NULL);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
 		tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
 	} else {
+		if (__fl_lookup(fnew->mask, &fnew->mkey)) {
+			err = -EEXIST;
+			goto errout_hw;
+		}
+
+		if (handle) {
+			/* user specifies a handle and it doesn't exist */
+			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+					    handle, GFP_ATOMIC);
+		} else {
+			handle = 1;
+			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
+					    INT_MAX, GFP_ATOMIC);
+		}
+		if (err)
+			goto errout_hw;
+		fnew->handle = handle;
+
+		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
+					     fnew->mask->filter_ht_params);
+		if (err)
+			goto errout_idr;
+
 		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
 	}
 
+	*arg = fnew;
+
 	kfree(tb);
 	kfree(mask);
 	return 0;
 
-errout_mask_ht:
-	rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
-			       fnew->mask->filter_ht_params);
-
-errout_mask:
-	fl_mask_put(head, fnew->mask, false);
-
 errout_idr:
 	if (!fold)
 		idr_remove(&head->handle_idr, fnew->handle);
+errout_hw:
+	if (!tc_skip_hw(fnew->flags))
+		fl_hw_destroy_filter(tp, fnew, NULL);
+errout_mask:
+	fl_mask_put(head, fnew->mask, false);
 errout:
 	tcf_exts_destroy(&fnew->exts);
 	kfree(fnew);
-- 
2.13.6



* [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 02/12] net: sched: flower: refactor fl_change Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14 20:34   ` Stefano Brivio
  2019-02-14  7:47 ` [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag Vlad Buslov
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Extend flower filters with reference counting in order to remove the
dependency on the rtnl lock in flower ops and allow filters to be modified
concurrently. References to a flower filter can be taken/released
concurrently as soon as the classifier is marked 'unlocked' by the last
patch in this series. Use an atomic reference counter type to make
concurrent modifications safe.

Always take a reference to a flower filter while working with it:
- Modify fl_get() to take a reference to the filter.
- Implement the tp->put() callback as fl_put() to allow cls API to release
the reference taken by fl_get().
- Modify fl_change() to assume that the caller holds a reference to fold and
to take a reference to fnew.
- Take a reference to the filter while using it in fl_walk().

Implement helper functions to get/put filter reference counter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 95 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 81 insertions(+), 14 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 91596a6271f8..b216ed26f344 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -14,6 +14,7 @@
 #include <linux/module.h>
 #include <linux/rhashtable.h>
 #include <linux/workqueue.h>
+#include <linux/refcount.h>
 
 #include <linux/if_ether.h>
 #include <linux/in6.h>
@@ -104,6 +105,11 @@ struct cls_fl_filter {
 	u32 in_hw_count;
 	struct rcu_work rwork;
 	struct net_device *hw_dev;
+	/* Flower classifier is unlocked, which means that its reference counter
+	 * can be changed concurrently without any kind of external
+	 * synchronization. Use atomic reference counter to be concurrency-safe.
+	 */
+	refcount_t refcnt;
 };
 
 static const struct rhashtable_params mask_ht_params = {
@@ -443,6 +449,47 @@ static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
 	return rcu_dereference_protected(tp->root, 1);
 }
 
+static void __fl_put(struct cls_fl_filter *f)
+{
+	if (!refcount_dec_and_test(&f->refcnt))
+		return;
+
+	if (tcf_exts_get_net(&f->exts))
+		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
+	else
+		__fl_destroy_filter(f);
+}
+
+static struct cls_fl_filter *__fl_get(struct cls_fl_head *head, u32 handle)
+{
+	struct cls_fl_filter *f;
+
+	rcu_read_lock();
+	f = idr_find(&head->handle_idr, handle);
+	if (f && !refcount_inc_not_zero(&f->refcnt))
+		f = NULL;
+	rcu_read_unlock();
+
+	return f;
+}
+
+static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
+						unsigned long *handle)
+{
+	struct cls_fl_head *head = fl_head_dereference(tp);
+	struct cls_fl_filter *f;
+
+	rcu_read_lock();
+	/* don't return filters that are being deleted */
+	while ((f = idr_get_next_ul(&head->handle_idr,
+				    handle)) != NULL &&
+	       !refcount_inc_not_zero(&f->refcnt))
+		++(*handle);
+	rcu_read_unlock();
+
+	return f;
+}
+
 static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 			struct netlink_ext_ack *extack)
 {
@@ -456,10 +503,7 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 	if (!tc_skip_hw(f->flags))
 		fl_hw_destroy_filter(tp, f, extack);
 	tcf_unbind_filter(tp, &f->res);
-	if (async)
-		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
-	else
-		__fl_destroy_filter(f);
+	__fl_put(f);
 
 	return last;
 }
@@ -494,11 +538,18 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
 	tcf_queue_work(&head->rwork, fl_destroy_sleepable);
 }
 
+static void fl_put(struct tcf_proto *tp, void *arg)
+{
+	struct cls_fl_filter *f = arg;
+
+	__fl_put(f);
+}
+
 static void *fl_get(struct tcf_proto *tp, u32 handle)
 {
 	struct cls_fl_head *head = fl_head_dereference(tp);
 
-	return idr_find(&head->handle_idr, handle);
+	return __fl_get(head, handle);
 }
 
 static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
@@ -1321,12 +1372,16 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	struct nlattr **tb;
 	int err;
 
-	if (!tca[TCA_OPTIONS])
-		return -EINVAL;
+	if (!tca[TCA_OPTIONS]) {
+		err = -EINVAL;
+		goto errout_fold;
+	}
 
 	mask = kzalloc(sizeof(struct fl_flow_mask), GFP_KERNEL);
-	if (!mask)
-		return -ENOBUFS;
+	if (!mask) {
+		err = -ENOBUFS;
+		goto errout_fold;
+	}
 
 	tb = kcalloc(TCA_FLOWER_MAX + 1, sizeof(struct nlattr *), GFP_KERNEL);
 	if (!tb) {
@@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		err = -ENOBUFS;
 		goto errout_tb;
 	}
+	refcount_set(&fnew->refcnt, 1);
 
 	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
 	if (err < 0)
@@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (!tc_in_hw(fnew->flags))
 		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
+	refcount_inc(&fnew->refcnt);
 	if (fold) {
 		fnew->handle = handle;
 
@@ -1399,7 +1456,11 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 			fl_hw_destroy_filter(tp, fold, NULL);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
-		tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
+		/* Caller holds reference to fold, so refcnt is always > 0
+		 * after this.
+		 */
+		refcount_dec(&fold->refcnt);
+		__fl_put(fold);
 	} else {
 		if (__fl_lookup(fnew->mask, &fnew->mkey)) {
 			err = -EEXIST;
@@ -1448,6 +1509,9 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	kfree(tb);
 errout_mask_alloc:
 	kfree(mask);
+errout_fold:
+	if (fold)
+		__fl_put(fold);
 	return err;
 }
 
@@ -1461,24 +1525,26 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 			       f->mask->filter_ht_params);
 	__fl_delete(tp, f, extack);
 	*last = list_empty(&head->masks);
+	__fl_put(f);
+
 	return 0;
 }
 
 static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 		    bool rtnl_held)
 {
-	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct cls_fl_filter *f;
 
 	arg->count = arg->skip;
 
-	while ((f = idr_get_next_ul(&head->handle_idr,
-				    &arg->cookie)) != NULL) {
+	while ((f = fl_get_next_filter(tp, &arg->cookie)) != NULL) {
 		if (arg->fn(tp, f, arg) < 0) {
+			__fl_put(f);
 			arg->stop = 1;
 			break;
 		}
-		arg->cookie = f->handle + 1;
+		__fl_put(f);
+		arg->cookie++;
 		arg->count++;
 	}
 }
@@ -2148,6 +2214,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.init		= fl_init,
 	.destroy	= fl_destroy,
 	.get		= fl_get,
+	.put		= fl_put,
 	.change		= fl_change,
 	.delete		= fl_delete,
 	.walk		= fl_walk,
-- 
2.13.6



* [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (2 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14 20:49   ` Stefano Brivio
  2019-02-14  7:47 ` [PATCH net-next 05/12] net: sched: flower: add reference counter to flower mask Vlad Buslov
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

In order to prevent double deletion of a filter by concurrent tasks when the
rtnl lock is not used for synchronization, add a 'deleted' filter field.
Check the value of this field when modifying filters and return an error if
concurrent deletion is detected.

Refactor __fl_delete() to accept a pointer to a 'last' boolean as argument
and to return an error code as the function return value instead. This is
necessary to signal a concurrent filter delete to the caller.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 55 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 38 insertions(+), 17 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b216ed26f344..fa5465f890e1 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -110,6 +110,7 @@ struct cls_fl_filter {
 	 * synchronization. Use atomic reference counter to be concurrency-safe.
 	 */
 	refcount_t refcnt;
+	bool deleted;
 };
 
 static const struct rhashtable_params mask_ht_params = {
@@ -454,6 +455,8 @@ static void __fl_put(struct cls_fl_filter *f)
 	if (!refcount_dec_and_test(&f->refcnt))
 		return;
 
+	WARN_ON(!f->deleted);
+
 	if (tcf_exts_get_net(&f->exts))
 		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
 	else
@@ -490,22 +493,31 @@ static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
 	return f;
 }
 
-static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
-			struct netlink_ext_ack *extack)
+static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
+		       bool *last, struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = fl_head_dereference(tp);
 	bool async = tcf_exts_get_net(&f->exts);
-	bool last;
-
-	idr_remove(&head->handle_idr, f->handle);
-	list_del_rcu(&f->list);
-	last = fl_mask_put(head, f->mask, async);
-	if (!tc_skip_hw(f->flags))
-		fl_hw_destroy_filter(tp, f, extack);
-	tcf_unbind_filter(tp, &f->res);
-	__fl_put(f);
+	int err = 0;
+
+	(*last) = false;
+
+	if (!f->deleted) {
+		f->deleted = true;
+		rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
+				       f->mask->filter_ht_params);
+		idr_remove(&head->handle_idr, f->handle);
+		list_del_rcu(&f->list);
+		(*last) = fl_mask_put(head, f->mask, async);
+		if (!tc_skip_hw(f->flags))
+			fl_hw_destroy_filter(tp, f, extack);
+		tcf_unbind_filter(tp, &f->res);
+		__fl_put(f);
+	} else {
+		err = -ENOENT;
+	}
 
-	return last;
+	return err;
 }
 
 static void fl_destroy_sleepable(struct work_struct *work)
@@ -525,10 +537,12 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
 	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct fl_flow_mask *mask, *next_mask;
 	struct cls_fl_filter *f, *next;
+	bool last;
 
 	list_for_each_entry_safe(mask, next_mask, &head->masks, list) {
 		list_for_each_entry_safe(f, next, &mask->filters, list) {
-			if (__fl_delete(tp, f, extack))
+			__fl_delete(tp, f, &last, extack);
+			if (last)
 				break;
 		}
 	}
@@ -1439,6 +1453,12 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 
 	refcount_inc(&fnew->refcnt);
 	if (fold) {
+		/* Fold filter was deleted concurrently. Retry lookup. */
+		if (fold->deleted) {
+			err = -EAGAIN;
+			goto errout_hw;
+		}
+
 		fnew->handle = handle;
 
 		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
@@ -1451,6 +1471,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 				       fold->mask->filter_ht_params);
 		idr_replace(&head->handle_idr, fnew, fnew->handle);
 		list_replace_rcu(&fold->list, &fnew->list);
+		fold->deleted = true;
 
 		if (!tc_skip_hw(fold->flags))
 			fl_hw_destroy_filter(tp, fold, NULL);
@@ -1520,14 +1541,14 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 {
 	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct cls_fl_filter *f = arg;
+	bool last_on_mask;
+	int err = 0;
 
-	rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
-			       f->mask->filter_ht_params);
-	__fl_delete(tp, f, extack);
+	err = __fl_delete(tp, f, &last_on_mask, extack);
 	*last = list_empty(&head->masks);
 	__fl_put(f);
 
-	return 0;
+	return err;
 }
 
 static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,
-- 
2.13.6



* [PATCH net-next 05/12] net: sched: flower: add reference counter to flower mask
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (3 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 06/12] net: sched: flower: handle concurrent mask insertion Vlad Buslov
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Extend the fl_flow_mask structure with a reference counter to allow parallel
modification without relying on the rtnl lock. Use the rcu read lock to
safely look up a mask and increment its reference counter in order to
accommodate concurrent deletes.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index fa5465f890e1..b41b72e894a6 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -76,6 +76,7 @@ struct fl_flow_mask {
 	struct list_head filters;
 	struct rcu_work rwork;
 	struct list_head list;
+	refcount_t refcnt;
 };
 
 struct fl_flow_tmplt {
@@ -320,6 +321,7 @@ static int fl_init(struct tcf_proto *tp)
 
 static void fl_mask_free(struct fl_flow_mask *mask)
 {
+	WARN_ON(!list_empty(&mask->filters));
 	rhashtable_destroy(&mask->ht);
 	kfree(mask);
 }
@@ -335,7 +337,7 @@ static void fl_mask_free_work(struct work_struct *work)
 static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
 			bool async)
 {
-	if (!list_empty(&mask->filters))
+	if (!refcount_dec_and_test(&mask->refcnt))
 		return false;
 
 	rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
@@ -1298,6 +1300,7 @@ static struct fl_flow_mask *fl_create_new_mask(struct cls_fl_head *head,
 
 	INIT_LIST_HEAD_RCU(&newmask->filters);
 
+	refcount_set(&newmask->refcnt, 1);
 	err = rhashtable_insert_fast(&head->ht, &newmask->ht_node,
 				     mask_ht_params);
 	if (err)
@@ -1321,9 +1324,13 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 				struct fl_flow_mask *mask)
 {
 	struct fl_flow_mask *newmask;
+	int ret = 0;
 
+	rcu_read_lock();
 	fnew->mask = rhashtable_lookup_fast(&head->ht, mask, mask_ht_params);
 	if (!fnew->mask) {
+		rcu_read_unlock();
+
 		if (fold)
 			return -EINVAL;
 
@@ -1332,11 +1339,15 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 			return PTR_ERR(newmask);
 
 		fnew->mask = newmask;
+		return 0;
 	} else if (fold && fold->mask != fnew->mask) {
-		return -EINVAL;
+		ret = -EINVAL;
+	} else if (!refcount_inc_not_zero(&fnew->mask->refcnt)) {
+		/* Mask was deleted concurrently, try again */
+		ret = -EAGAIN;
 	}
-
-	return 0;
+	rcu_read_unlock();
+	return ret;
 }
 
 static int fl_set_parms(struct net *net, struct tcf_proto *tp,
@@ -1473,6 +1484,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		list_replace_rcu(&fold->list, &fnew->list);
 		fold->deleted = true;
 
+		fl_mask_put(head, fold->mask, true);
 		if (!tc_skip_hw(fold->flags))
 			fl_hw_destroy_filter(tp, fold, NULL);
 		tcf_unbind_filter(tp, &fold->res);
@@ -1522,7 +1534,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (!tc_skip_hw(fnew->flags))
 		fl_hw_destroy_filter(tp, fnew, NULL);
 errout_mask:
-	fl_mask_put(head, fnew->mask, false);
+	fl_mask_put(head, fnew->mask, true);
 errout:
 	tcf_exts_destroy(&fnew->exts);
 	kfree(fnew);
-- 
2.13.6



* [PATCH net-next 06/12] net: sched: flower: handle concurrent mask insertion
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (4 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 05/12] net: sched: flower: add reference counter to flower mask Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-15 22:46   ` Stefano Brivio
  2019-02-14  7:47 ` [PATCH net-next 07/12] net: sched: flower: protect masks list with spinlock Vlad Buslov
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Without rtnl lock protection, masks with the same key can be inserted
concurrently. Insert a temporary mask with reference count zero into the
masks hashtable. This will cause any concurrent modifications to retry.

Wait for an rcu grace period to complete after removing the temporary mask
from the masks hashtable to accommodate concurrent readers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Suggested-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 41 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b41b72e894a6..2b032303f8d5 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1301,11 +1301,14 @@ static struct fl_flow_mask *fl_create_new_mask(struct cls_fl_head *head,
 	INIT_LIST_HEAD_RCU(&newmask->filters);
 
 	refcount_set(&newmask->refcnt, 1);
-	err = rhashtable_insert_fast(&head->ht, &newmask->ht_node,
-				     mask_ht_params);
+	err = rhashtable_replace_fast(&head->ht, &mask->ht_node,
+				      &newmask->ht_node, mask_ht_params);
 	if (err)
 		goto errout_destroy;
 
+	/* Wait until any potential concurrent users of mask are finished */
+	synchronize_rcu();
+
 	list_add_tail_rcu(&newmask->list, &head->masks);
 
 	return newmask;
@@ -1327,19 +1330,36 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 	int ret = 0;
 
 	rcu_read_lock();
-	fnew->mask = rhashtable_lookup_fast(&head->ht, mask, mask_ht_params);
+
+	/* Insert mask as temporary node to prevent concurrent creation of mask
+	 * with same key. Any concurrent lookups with same key will return
+	 * EAGAIN because mask's refcnt is zero. It is safe to insert
+	 * stack-allocated 'mask' to masks hash table because we call
+	 * synchronize_rcu() before returning from this function (either in case
+	 * of error or after replacing it with heap-allocated mask in
+	 * fl_create_new_mask()).
+	 */
+	fnew->mask = rhashtable_lookup_get_insert_fast(&head->ht,
+						       &mask->ht_node,
+						       mask_ht_params);
 	if (!fnew->mask) {
 		rcu_read_unlock();
 
-		if (fold)
-			return -EINVAL;
+		if (fold) {
+			ret = -EINVAL;
+			goto errout_cleanup;
+		}
 
 		newmask = fl_create_new_mask(head, mask);
-		if (IS_ERR(newmask))
-			return PTR_ERR(newmask);
+		if (IS_ERR(newmask)) {
+			ret = PTR_ERR(newmask);
+			goto errout_cleanup;
+		}
 
 		fnew->mask = newmask;
 		return 0;
+	} else if (IS_ERR(fnew->mask)) {
+		ret = PTR_ERR(fnew->mask);
 	} else if (fold && fold->mask != fnew->mask) {
 		ret = -EINVAL;
 	} else if (!refcount_inc_not_zero(&fnew->mask->refcnt)) {
@@ -1348,6 +1368,13 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 	}
 	rcu_read_unlock();
 	return ret;
+
+errout_cleanup:
+	rhashtable_remove_fast(&head->ht, &mask->ht_node,
+			       mask_ht_params);
+	/* Wait until any potential concurrent users of mask are finished */
+	synchronize_rcu();
+	return ret;
 }
 
 static int fl_set_parms(struct net *net, struct tcf_proto *tp,
-- 
2.13.6



* [PATCH net-next 07/12] net: sched: flower: protect masks list with spinlock
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (5 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 06/12] net: sched: flower: handle concurrent mask insertion Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 08/12] net: sched: flower: handle concurrent filter insertion in fl_change Vlad Buslov
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Protect modifications of the flower masks list with a spinlock to remove the
dependency on the rtnl lock and allow concurrent access.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 2b032303f8d5..fc6371a9b0f9 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -88,6 +88,7 @@ struct fl_flow_tmplt {
 
 struct cls_fl_head {
 	struct rhashtable ht;
+	spinlock_t masks_lock; /* Protect masks list */
 	struct list_head masks;
 	struct rcu_work rwork;
 	struct idr handle_idr;
@@ -312,6 +313,7 @@ static int fl_init(struct tcf_proto *tp)
 	if (!head)
 		return -ENOBUFS;
 
+	spin_lock_init(&head->masks_lock);
 	INIT_LIST_HEAD_RCU(&head->masks);
 	rcu_assign_pointer(tp->root, head);
 	idr_init(&head->handle_idr);
@@ -341,7 +343,11 @@ static bool fl_mask_put(struct cls_fl_head *head, struct fl_flow_mask *mask,
 		return false;
 
 	rhashtable_remove_fast(&head->ht, &mask->ht_node, mask_ht_params);
+
+	spin_lock(&head->masks_lock);
 	list_del_rcu(&mask->list);
+	spin_unlock(&head->masks_lock);
+
 	if (async)
 		tcf_queue_work(&mask->rwork, fl_mask_free_work);
 	else
@@ -1309,7 +1315,9 @@ static struct fl_flow_mask *fl_create_new_mask(struct cls_fl_head *head,
 	/* Wait until any potential concurrent users of mask are finished */
 	synchronize_rcu();
 
+	spin_lock(&head->masks_lock);
 	list_add_tail_rcu(&newmask->list, &head->masks);
+	spin_unlock(&head->masks_lock);
 
 	return newmask;
 
-- 
2.13.6



* [PATCH net-next 08/12] net: sched: flower: handle concurrent filter insertion in fl_change
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (6 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 07/12] net: sched: flower: protect masks list with spinlock Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion Vlad Buslov
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Check whether the user specified a handle and another filter with the same
handle was inserted concurrently. Return EAGAIN to retry filter processing
(in case it is an overwrite request).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index fc6371a9b0f9..114cb7876133 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1539,6 +1539,15 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 			/* user specifies a handle and it doesn't exist */
 			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
 					    handle, GFP_ATOMIC);
+
+			/* Filter with specified handle was concurrently
+			 * inserted after initial check in cls_api. This is not
+			 * necessarily an error if NLM_F_EXCL is not set in
+			 * message flags. Returning EAGAIN will cause cls_api to
+			 * try to update concurrently inserted rule.
+			 */
+			if (err == -ENOSPC)
+				err = -EAGAIN;
 		} else {
 			handle = 1;
 			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
-- 
2.13.6



* [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (7 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 08/12] net: sched: flower: handle concurrent filter insertion in fl_change Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-18 20:47   ` Cong Wang
  2019-02-14  7:47 ` [PATCH net-next 10/12] net: sched: flower: protect flower classifier state with spinlock Vlad Buslov
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Without rtnl lock protection, a tcf proto can be deleted concurrently. Check
the tcf proto 'deleting' flag after taking the tcf spinlock to verify that
no concurrent deletion is in progress. Return an EAGAIN error if concurrent
deletion is detected, which will cause the caller to retry and possibly
create a new instance of the tcf proto.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 114cb7876133..bfef7d6c597d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1497,6 +1497,14 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (!tc_in_hw(fnew->flags))
 		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
+	/* tp was deleted concurrently. EAGAIN will cause caller to lookup proto
+	 * again or create new one, if necessary.
+	 */
+	if (tp->deleting) {
+		err = -EAGAIN;
+		goto errout_hw;
+	}
+
 	refcount_inc(&fnew->refcnt);
 	if (fold) {
 		/* Fold filter was deleted concurrently. Retry lookup. */
-- 
2.13.6



* [PATCH net-next 10/12] net: sched: flower: protect flower classifier state with spinlock
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (8 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-14  7:47 ` [PATCH net-next 11/12] net: sched: flower: track rtnl lock state Vlad Buslov
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

struct tcf_proto was extended with a spinlock to be used by classifiers
instead of the global rtnl lock. Use it to protect shared flower classifier
data structures (handle_idr, mask hashtable and list) and the fields of
individual filters that can be accessed concurrently. This patch set uses
tcf_proto->lock as a per-instance lock that protects all filters on a
tcf_proto.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 36 ++++++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index bfef7d6c597d..556f7a1c694a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -384,7 +384,9 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 	cls_flower.cookie = (unsigned long) f;
 
 	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	spin_lock(&tp->lock);
 	tcf_block_offload_dec(block, &f->flags);
+	spin_unlock(&tp->lock);
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
@@ -422,7 +424,9 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		return err;
 	} else if (err > 0) {
 		f->in_hw_count = err;
+		spin_lock(&tp->lock);
 		tcf_block_offload_inc(block, &f->flags);
+		spin_unlock(&tp->lock);
 	}
 
 	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
@@ -510,18 +514,22 @@ static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 
 	(*last) = false;
 
+	spin_lock(&tp->lock);
 	if (!f->deleted) {
 		f->deleted = true;
 		rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
 				       f->mask->filter_ht_params);
 		idr_remove(&head->handle_idr, f->handle);
 		list_del_rcu(&f->list);
+		spin_unlock(&tp->lock);
+
 		(*last) = fl_mask_put(head, f->mask, async);
 		if (!tc_skip_hw(f->flags))
 			fl_hw_destroy_filter(tp, f, extack);
 		tcf_unbind_filter(tp, &f->res);
 		__fl_put(f);
 	} else {
+		spin_unlock(&tp->lock);
 		err = -ENOENT;
 	}
 
@@ -1497,6 +1505,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (!tc_in_hw(fnew->flags))
 		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
 
+	spin_lock(&tp->lock);
+
 	/* tp was deleted concurrently. EAGAIN will cause caller to lookup proto
 	 * again or create new one, if necessary.
 	 */
@@ -1527,6 +1537,8 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		list_replace_rcu(&fold->list, &fnew->list);
 		fold->deleted = true;
 
+		spin_unlock(&tp->lock);
+
 		fl_mask_put(head, fold->mask, true);
 		if (!tc_skip_hw(fold->flags))
 			fl_hw_destroy_filter(tp, fold, NULL);
@@ -1571,6 +1583,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 			goto errout_idr;
 
 		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
+		spin_unlock(&tp->lock);
 	}
 
 	*arg = fnew;
@@ -1583,6 +1596,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (!fold)
 		idr_remove(&head->handle_idr, fnew->handle);
 errout_hw:
+	spin_unlock(&tp->lock);
 	if (!tc_skip_hw(fnew->flags))
 		fl_hw_destroy_filter(tp, fnew, NULL);
 errout_mask:
@@ -1681,8 +1695,10 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
 				continue;
 			}
 
+			spin_lock(&tp->lock);
 			tc_cls_offload_cnt_update(block, &f->in_hw_count,
 						  &f->flags, add);
+			spin_unlock(&tp->lock);
 		}
 	}
 
@@ -2216,6 +2232,7 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	struct cls_fl_filter *f = fh;
 	struct nlattr *nest;
 	struct fl_flow_key *key, *mask;
+	bool skip_hw;
 
 	if (!f)
 		return skb->len;
@@ -2226,21 +2243,26 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	if (!nest)
 		goto nla_put_failure;
 
+	spin_lock(&tp->lock);
+
 	if (f->res.classid &&
 	    nla_put_u32(skb, TCA_FLOWER_CLASSID, f->res.classid))
-		goto nla_put_failure;
+		goto nla_put_failure_locked;
 
 	key = &f->key;
 	mask = &f->mask->key;
+	skip_hw = tc_skip_hw(f->flags);
 
 	if (fl_dump_key(skb, net, key, mask))
-		goto nla_put_failure;
-
-	if (!tc_skip_hw(f->flags))
-		fl_hw_update_stats(tp, f);
+		goto nla_put_failure_locked;
 
 	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
-		goto nla_put_failure;
+		goto nla_put_failure_locked;
+
+	spin_unlock(&tp->lock);
+
+	if (!skip_hw)
+		fl_hw_update_stats(tp, f);
 
 	if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
 		goto nla_put_failure;
@@ -2255,6 +2277,8 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 
 	return skb->len;
 
+nla_put_failure_locked:
+	spin_unlock(&tp->lock);
 nla_put_failure:
 	nla_nest_cancel(skb, nest);
 	return -1;
-- 
2.13.6



* [PATCH net-next 11/12] net: sched: flower: track rtnl lock state
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (9 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 10/12] net: sched: flower: protect flower classifier state with spinlock Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-15 22:46   ` Stefano Brivio
  2019-02-14  7:47 ` [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops Vlad Buslov
  2019-02-18 19:15 ` [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Cong Wang
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Use the 'rtnl_held' flag to track whether the caller holds the rtnl lock.
Propagate the flag to internal functions that need to know the rtnl lock
state. Take the rtnl lock before calling tcf APIs that require it (hw
offload, bind filter, etc.).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 68 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 556f7a1c694a..8b53959ca716 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -374,11 +374,14 @@ static void fl_destroy_filter_work(struct work_struct *work)
 }
 
 static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
-				 struct netlink_ext_ack *extack)
+				 bool rtnl_held, struct netlink_ext_ack *extack)
 {
 	struct tc_cls_flower_offload cls_flower = {};
 	struct tcf_block *block = tp->chain->block;
 
+	if (!rtnl_held)
+		rtnl_lock();
+
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = TC_CLSFLOWER_DESTROY;
 	cls_flower.cookie = (unsigned long) f;
@@ -387,16 +390,22 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 	spin_lock(&tp->lock);
 	tcf_block_offload_dec(block, &f->flags);
 	spin_unlock(&tp->lock);
+
+	if (!rtnl_held)
+		rtnl_unlock();
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
-				struct cls_fl_filter *f,
+				struct cls_fl_filter *f, bool rtnl_held,
 				struct netlink_ext_ack *extack)
 {
 	struct tc_cls_flower_offload cls_flower = {};
 	struct tcf_block *block = tp->chain->block;
 	bool skip_sw = tc_skip_sw(f->flags);
-	int err;
+	int err = 0;
+
+	if (!rtnl_held)
+		rtnl_lock();
 
 	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
 	if (!cls_flower.rule)
@@ -420,26 +429,37 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	kfree(cls_flower.rule);
 
 	if (err < 0) {
-		fl_hw_destroy_filter(tp, f, NULL);
-		return err;
+		fl_hw_destroy_filter(tp, f, true, NULL);
+		goto errout;
 	} else if (err > 0) {
 		f->in_hw_count = err;
+		err = 0;
 		spin_lock(&tp->lock);
 		tcf_block_offload_inc(block, &f->flags);
 		spin_unlock(&tp->lock);
 	}
 
-	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
-		return -EINVAL;
+	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
+		err = -EINVAL;
+		goto errout;
+	}
 
-	return 0;
+errout:
+	if (!rtnl_held)
+		rtnl_unlock();
+
+	return err;
 }
 
-static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
+static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
+			       bool rtnl_held)
 {
 	struct tc_cls_flower_offload cls_flower = {};
 	struct tcf_block *block = tp->chain->block;
 
+	if (!rtnl_held)
+		rtnl_lock();
+
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, NULL);
 	cls_flower.command = TC_CLSFLOWER_STATS;
 	cls_flower.cookie = (unsigned long) f;
@@ -450,6 +470,9 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f)
 	tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
 			      cls_flower.stats.pkts,
 			      cls_flower.stats.lastused);
+
+	if (!rtnl_held)
+		rtnl_unlock();
 }
 
 static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
@@ -506,7 +529,8 @@ static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
 }
 
 static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
-		       bool *last, struct netlink_ext_ack *extack)
+		       bool *last, bool rtnl_held,
+		       struct netlink_ext_ack *extack)
 {
 	struct cls_fl_head *head = fl_head_dereference(tp);
 	bool async = tcf_exts_get_net(&f->exts);
@@ -525,7 +549,7 @@ static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
 
 		(*last) = fl_mask_put(head, f->mask, async);
 		if (!tc_skip_hw(f->flags))
-			fl_hw_destroy_filter(tp, f, extack);
+			fl_hw_destroy_filter(tp, f, rtnl_held, extack);
 		tcf_unbind_filter(tp, &f->res);
 		__fl_put(f);
 	} else {
@@ -557,7 +581,7 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
 
 	list_for_each_entry_safe(mask, next_mask, &head->masks, list) {
 		list_for_each_entry_safe(f, next, &mask->filters, list) {
-			__fl_delete(tp, f, &last, extack);
+			__fl_delete(tp, f, &last, rtnl_held, extack);
 			if (last)
 				break;
 		}
@@ -1397,19 +1421,23 @@ static int fl_set_parms(struct net *net, struct tcf_proto *tp,
 			struct cls_fl_filter *f, struct fl_flow_mask *mask,
 			unsigned long base, struct nlattr **tb,
 			struct nlattr *est, bool ovr,
-			struct fl_flow_tmplt *tmplt,
+			struct fl_flow_tmplt *tmplt, bool rtnl_held,
 			struct netlink_ext_ack *extack)
 {
 	int err;
 
-	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, true,
+	err = tcf_exts_validate(net, tp, tb, est, &f->exts, ovr, rtnl_held,
 				extack);
 	if (err < 0)
 		return err;
 
 	if (tb[TCA_FLOWER_CLASSID]) {
 		f->res.classid = nla_get_u32(tb[TCA_FLOWER_CLASSID]);
+		if (!rtnl_held)
+			rtnl_lock();
 		tcf_bind_filter(tp, &f->res, base);
+		if (!rtnl_held)
+			rtnl_unlock();
 	}
 
 	err = fl_set_key(net, tb, &f->key, &mask->key, extack);
@@ -1488,7 +1516,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	}
 
 	err = fl_set_parms(net, tp, fnew, mask, base, tb, tca[TCA_RATE], ovr,
-			   tp->chain->tmplt_priv, extack);
+			   tp->chain->tmplt_priv, rtnl_held, extack);
 	if (err)
 		goto errout;
 
@@ -1497,7 +1525,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 		goto errout;
 
 	if (!tc_skip_hw(fnew->flags)) {
-		err = fl_hw_replace_filter(tp, fnew, extack);
+		err = fl_hw_replace_filter(tp, fnew, rtnl_held, extack);
 		if (err)
 			goto errout_mask;
 	}
@@ -1541,7 +1569,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 
 		fl_mask_put(head, fold->mask, true);
 		if (!tc_skip_hw(fold->flags))
-			fl_hw_destroy_filter(tp, fold, NULL);
+			fl_hw_destroy_filter(tp, fold, rtnl_held, NULL);
 		tcf_unbind_filter(tp, &fold->res);
 		tcf_exts_get_net(&fold->exts);
 		/* Caller holds reference to fold, so refcnt is always > 0
@@ -1598,7 +1626,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 errout_hw:
 	spin_unlock(&tp->lock);
 	if (!tc_skip_hw(fnew->flags))
-		fl_hw_destroy_filter(tp, fnew, NULL);
+		fl_hw_destroy_filter(tp, fnew, rtnl_held, NULL);
 errout_mask:
 	fl_mask_put(head, fnew->mask, true);
 errout:
@@ -1622,7 +1650,7 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
 	bool last_on_mask;
 	int err = 0;
 
-	err = __fl_delete(tp, f, &last_on_mask, extack);
+	err = __fl_delete(tp, f, &last_on_mask, rtnl_held, extack);
 	*last = list_empty(&head->masks);
 	__fl_put(f);
 
@@ -2262,7 +2290,7 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
 	spin_unlock(&tp->lock);
 
 	if (!skip_hw)
-		fl_hw_update_stats(tp, f);
+		fl_hw_update_stats(tp, f, rtnl_held);
 
 	if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
 		goto nla_put_failure;
-- 
2.13.6



* [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (10 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 11/12] net: sched: flower: track rtnl lock state Vlad Buslov
@ 2019-02-14  7:47 ` Vlad Buslov
  2019-02-18 19:27   ` Cong Wang
  2019-02-18 19:15 ` [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Cong Wang
  12 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-14  7:47 UTC (permalink / raw)
  To: netdev; +Cc: jhs, xiyou.wangcong, jiri, davem, Vlad Buslov

Set TCF_PROTO_OPS_DOIT_UNLOCKED for the flower classifier to indicate that
its ops callbacks don't require the caller to hold the rtnl lock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 8b53959ca716..360cac828cad 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -2362,6 +2362,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.tmplt_destroy	= fl_tmplt_destroy,
 	.tmplt_dump	= fl_tmplt_dump,
 	.owner		= THIS_MODULE,
+	.flags		= TCF_PROTO_OPS_DOIT_UNLOCKED,
 };
 
 static int __init cls_fl_init(void)
-- 
2.13.6



* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
  2019-02-14  7:47 ` [PATCH net-next 02/12] net: sched: flower: refactor fl_change Vlad Buslov
@ 2019-02-14 20:34   ` Stefano Brivio
  2019-02-15 10:38     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Stefano Brivio @ 2019-02-14 20:34 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Thu, 14 Feb 2019 09:47:02 +0200
Vlad Buslov <vladbu@mellanox.com> wrote:

> As a preparation for using classifier spinlock instead of relying on
> external rtnl lock, rearrange code in fl_change. The goal is to group the
> code which changes classifier state in single block in order to allow
> following commits in this set to protect it from parallel modification with
> tp->lock. Data structures that require tp->lock protection are mask
> hashtable and filters list, and classifier handle_idr.
> 
> fl_hw_replace_filter() is a sleeping function and cannot be called while
> holding a spinlock. In order to execute all sequence of changes to shared
> classifier data structures atomically, call fl_hw_replace_filter() before
> modifying them.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
>  1 file changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index 88d7af78ba7e..91596a6271f8 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>  	if (err < 0)
>  		goto errout;
>  
> -	if (!handle) {
> -		handle = 1;
> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> -				    INT_MAX, GFP_KERNEL);
> -	} else if (!fold) {
> -		/* user specifies a handle and it doesn't exist */
> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> -				    handle, GFP_KERNEL);
> -	}
> -	if (err)
> -		goto errout;
> -	fnew->handle = handle;
> -
>
> [...]
>
>  	if (fold) {
> +		fnew->handle = handle;

I'm probably missing something, but what if fold is passed and the
handle isn't specified? That can still happen, right? In that case we
wouldn't be allocating the handle.

> +
> +		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
> +					     fnew->mask->filter_ht_params);
> +		if (err)
> +			goto errout_hw;
> +
>  		rhashtable_remove_fast(&fold->mask->ht,
>  				       &fold->ht_node,
>  				       fold->mask->filter_ht_params);
> -		if (!tc_skip_hw(fold->flags))
> -			fl_hw_destroy_filter(tp, fold, NULL);
> -	}
> -
> -	*arg = fnew;
> -
> -	if (fold) {
>  		idr_replace(&head->handle_idr, fnew, fnew->handle);
>  		list_replace_rcu(&fold->list, &fnew->list);
> +
> +		if (!tc_skip_hw(fold->flags))
> +			fl_hw_destroy_filter(tp, fold, NULL);
>  		tcf_unbind_filter(tp, &fold->res);
>  		tcf_exts_get_net(&fold->exts);
>  		tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
>  	} else {
> +		if (__fl_lookup(fnew->mask, &fnew->mkey)) {
> +			err = -EEXIST;
> +			goto errout_hw;
> +		}
> +
> +		if (handle) {
> +			/* user specifies a handle and it doesn't exist */
> +			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> +					    handle, GFP_ATOMIC);
> +		} else {
> +			handle = 1;
> +			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> +					    INT_MAX, GFP_ATOMIC);
> +		}
> +		if (err)
> +			goto errout_hw;

Just if you respin: a newline here would be nice to have.

> +		fnew->handle = handle;
> +
> +		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
> +					     fnew->mask->filter_ht_params);
> +		if (err)
> +			goto errout_idr;
> +
>  		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
>  	}
>  
> +	*arg = fnew;
> +
>  	kfree(tb);
>  	kfree(mask);
>  	return 0;
>  
> -errout_mask_ht:
> -	rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
> -			       fnew->mask->filter_ht_params);
> -
> -errout_mask:
> -	fl_mask_put(head, fnew->mask, false);
> -
>  errout_idr:
>  	if (!fold)

This check could go away, I guess (not a strong preference though).

>  		idr_remove(&head->handle_idr, fnew->handle);
> +errout_hw:
> +	if (!tc_skip_hw(fnew->flags))
> +		fl_hw_destroy_filter(tp, fnew, NULL);
> +errout_mask:
> +	fl_mask_put(head, fnew->mask, false);
>  errout:
>  	tcf_exts_destroy(&fnew->exts);
>  	kfree(fnew);

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
  2019-02-14  7:47 ` [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters Vlad Buslov
@ 2019-02-14 20:34   ` Stefano Brivio
  2019-02-15 11:22     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Stefano Brivio @ 2019-02-14 20:34 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Thu, 14 Feb 2019 09:47:03 +0200
Vlad Buslov <vladbu@mellanox.com> wrote:

> +static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
> +						unsigned long *handle)
> +{
> +	struct cls_fl_head *head = fl_head_dereference(tp);
> +	struct cls_fl_filter *f;
> +
> +	rcu_read_lock();
> +	/* don't return filters that are being deleted */
> +	while ((f = idr_get_next_ul(&head->handle_idr,
> +				    handle)) != NULL &&
> +	       !refcount_inc_not_zero(&f->refcnt))
> +		++(*handle);

This... hurts :) What about:

	while ((f = idr_get_next_ul(&head->handle_idr, &handle))) {
		if (refcount_inc_not_zero(&f->refcnt))
			break;
		++(*handle);
	}

?

> +	rcu_read_unlock();
> +
> +	return f;
> +}
> +
>  static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>  			struct netlink_ext_ack *extack)
>  {
> @@ -456,10 +503,7 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>  	if (!tc_skip_hw(f->flags))
>  		fl_hw_destroy_filter(tp, f, extack);
>  	tcf_unbind_filter(tp, &f->res);
> -	if (async)
> -		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
> -	else
> -		__fl_destroy_filter(f);
> +	__fl_put(f);
>  
>  	return last;
>  }
> @@ -494,11 +538,18 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
>  	tcf_queue_work(&head->rwork, fl_destroy_sleepable);
>  }
>  
> +static void fl_put(struct tcf_proto *tp, void *arg)
> +{
> +	struct cls_fl_filter *f = arg;
> +
> +	__fl_put(f);
> +}
> +
>  static void *fl_get(struct tcf_proto *tp, u32 handle)
>  {
>  	struct cls_fl_head *head = fl_head_dereference(tp);
>  
> -	return idr_find(&head->handle_idr, handle);
> +	return __fl_get(head, handle);
>  }
>  
>  static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
> @@ -1321,12 +1372,16 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>  	struct nlattr **tb;
>  	int err;
>  
> -	if (!tca[TCA_OPTIONS])
> -		return -EINVAL;
> +	if (!tca[TCA_OPTIONS]) {
> +		err = -EINVAL;
> +		goto errout_fold;
> +	}
>  
>  	mask = kzalloc(sizeof(struct fl_flow_mask), GFP_KERNEL);
> -	if (!mask)
> -		return -ENOBUFS;
> +	if (!mask) {
> +		err = -ENOBUFS;
> +		goto errout_fold;
> +	}
>  
>  	tb = kcalloc(TCA_FLOWER_MAX + 1, sizeof(struct nlattr *), GFP_KERNEL);
>  	if (!tb) {
> @@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>  		err = -ENOBUFS;
>  		goto errout_tb;
>  	}
> +	refcount_set(&fnew->refcnt, 1);
>  
>  	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
>  	if (err < 0)
> @@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>  	if (!tc_in_hw(fnew->flags))
>  		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>  
> +	refcount_inc(&fnew->refcnt);

I guess I'm not getting the semantics but... why is it 2 now?

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag
  2019-02-14  7:47 ` [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag Vlad Buslov
@ 2019-02-14 20:49   ` Stefano Brivio
  2019-02-15 15:54     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Stefano Brivio @ 2019-02-14 20:49 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Thu, 14 Feb 2019 09:47:04 +0200
Vlad Buslov <vladbu@mellanox.com> wrote:

> +static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
> +		       bool *last, struct netlink_ext_ack *extack)

This would be easier to follow (at least for me):

>  {
>  	struct cls_fl_head *head = fl_head_dereference(tp);
>  	bool async = tcf_exts_get_net(&f->exts);
> -	bool last;
> -
> -	idr_remove(&head->handle_idr, f->handle);
> -	list_del_rcu(&f->list);
> -	last = fl_mask_put(head, f->mask, async);
> -	if (!tc_skip_hw(f->flags))
> -		fl_hw_destroy_filter(tp, f, extack);
> -	tcf_unbind_filter(tp, &f->res);
> -	__fl_put(f);
> +	int err = 0;

without this

> +
> +	(*last) = false;

with *last = false;

> +
> +	if (!f->deleted) {

with:
	if (f->deleted)
		return -ENOENT;

> +		f->deleted = true;
> +		rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
> +				       f->mask->filter_ht_params);
> +		idr_remove(&head->handle_idr, f->handle);
> +		list_del_rcu(&f->list);
> +		(*last) = fl_mask_put(head, f->mask, async);

with:
	*last = fl_mask_put(head, f->mask, async);

> +		if (!tc_skip_hw(f->flags))
> +			fl_hw_destroy_filter(tp, f, extack);
> +		tcf_unbind_filter(tp, &f->res);
> +		__fl_put(f);

and a return 0; here

> +	} else {
> +		err = -ENOENT;
> +	}
>  
> -	return last;
> +	return err;
>  }
>  
> [...]
>
> @@ -1520,14 +1541,14 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
>  {
>  	struct cls_fl_head *head = fl_head_dereference(tp);
>  	struct cls_fl_filter *f = arg;
> +	bool last_on_mask;

This is unused in this series, maybe change __fl_delete() to optionally
take NULL as 'bool *last' argument?

> +	int err = 0;

Nit: no need to initialise this.
 
> -	rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
> -			       f->mask->filter_ht_params);
> -	__fl_delete(tp, f, extack);
> +	err = __fl_delete(tp, f, &last_on_mask, extack);
>  	*last = list_empty(&head->masks);
>  	__fl_put(f);
>  
> -	return 0;
> +	return err;
>  }
>  
>  static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
  2019-02-14 20:34   ` Stefano Brivio
@ 2019-02-15 10:38     ` Vlad Buslov
  2019-02-15 10:47       ` Stefano Brivio
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-15 10:38 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:02 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> As a preparation for using classifier spinlock instead of relying on
>> external rtnl lock, rearrange code in fl_change. The goal is to group the
>> code which changes classifier state in single block in order to allow
>> following commits in this set to protect it from parallel modification with
>> tp->lock. Data structures that require tp->lock protection are mask
>> hashtable and filters list, and classifier handle_idr.
>>
>> fl_hw_replace_filter() is a sleeping function and cannot be called while
>> holding a spinlock. In order to execute all sequence of changes to shared
>> classifier data structures atomically, call fl_hw_replace_filter() before
>> modifying them.
>>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
>>  1 file changed, 44 insertions(+), 41 deletions(-)
>>
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 88d7af78ba7e..91596a6271f8 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	if (err < 0)
>>  		goto errout;
>>
>> -	if (!handle) {
>> -		handle = 1;
>> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> -				    INT_MAX, GFP_KERNEL);
>> -	} else if (!fold) {
>> -		/* user specifies a handle and it doesn't exist */
>> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> -				    handle, GFP_KERNEL);
>> -	}
>> -	if (err)
>> -		goto errout;
>> -	fnew->handle = handle;
>> -
>>
>> [...]
>>
>>  	if (fold) {
>> +		fnew->handle = handle;
>
> I'm probably missing something, but what if fold is passed and the
> handle isn't specified? That can still happen, right? In that case we
> wouldn't be allocating the handle.

Hi Stefano,

Thank you for reviewing my code.

Cls API looks up fold by handle, so this pointer can only be non-NULL
when the user specified a handle and a filter with that handle exists on tp.
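
For illustration, on the cls API side this is roughly (a simplified
sketch, not the exact code in net/sched/cls_api.c; variable names may
differ):

	/* The existing filter is looked up through the classifier's
	 * ->get() callback, which for flower is fl_get(). A zero handle
	 * finds nothing, so a non-NULL fold implies the user passed the
	 * handle of an existing filter.
	 */
	fold = tp->ops->get(tp, t->tcm_handle);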

>
>> +
>> +		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
>> +					     fnew->mask->filter_ht_params);
>> +		if (err)
>> +			goto errout_hw;
>> +
>>  		rhashtable_remove_fast(&fold->mask->ht,
>>  				       &fold->ht_node,
>>  				       fold->mask->filter_ht_params);
>> -		if (!tc_skip_hw(fold->flags))
>> -			fl_hw_destroy_filter(tp, fold, NULL);
>> -	}
>> -
>> -	*arg = fnew;
>> -
>> -	if (fold) {
>>  		idr_replace(&head->handle_idr, fnew, fnew->handle);
>>  		list_replace_rcu(&fold->list, &fnew->list);
>> +
>> +		if (!tc_skip_hw(fold->flags))
>> +			fl_hw_destroy_filter(tp, fold, NULL);
>>  		tcf_unbind_filter(tp, &fold->res);
>>  		tcf_exts_get_net(&fold->exts);
>>  		tcf_queue_work(&fold->rwork, fl_destroy_filter_work);
>>  	} else {
>> +		if (__fl_lookup(fnew->mask, &fnew->mkey)) {
>> +			err = -EEXIST;
>> +			goto errout_hw;
>> +		}
>> +
>> +		if (handle) {
>> +			/* user specifies a handle and it doesn't exist */
>> +			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> +					    handle, GFP_ATOMIC);
>> +		} else {
>> +			handle = 1;
>> +			err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> +					    INT_MAX, GFP_ATOMIC);
>> +		}
>> +		if (err)
>> +			goto errout_hw;
>
> Just if you respin: a newline here would be nice to have.

Agree.

>
>> +		fnew->handle = handle;
>> +
>> +		err = rhashtable_insert_fast(&fnew->mask->ht, &fnew->ht_node,
>> +					     fnew->mask->filter_ht_params);
>> +		if (err)
>> +			goto errout_idr;
>> +
>>  		list_add_tail_rcu(&fnew->list, &fnew->mask->filters);
>>  	}
>>
>> +	*arg = fnew;
>> +
>>  	kfree(tb);
>>  	kfree(mask);
>>  	return 0;
>>
>> -errout_mask_ht:
>> -	rhashtable_remove_fast(&fnew->mask->ht, &fnew->ht_node,
>> -			       fnew->mask->filter_ht_params);
>> -
>> -errout_mask:
>> -	fl_mask_put(head, fnew->mask, false);
>> -
>>  errout_idr:
>>  	if (!fold)
>
> This check could go away, I guess (not a strong preference though).

Yes, it seems that after this change the errout_idr label is only accessed
from the else branch of the if (fold) conditional, so the check is redundant.

>
>>  		idr_remove(&head->handle_idr, fnew->handle);
>> +errout_hw:
>> +	if (!tc_skip_hw(fnew->flags))
>> +		fl_hw_destroy_filter(tp, fnew, NULL);
>> +errout_mask:
>> +	fl_mask_put(head, fnew->mask, false);
>>  errout:
>>  	tcf_exts_destroy(&fnew->exts);
>>  	kfree(fnew);

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
  2019-02-15 10:38     ` Vlad Buslov
@ 2019-02-15 10:47       ` Stefano Brivio
  2019-02-15 16:25         ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Stefano Brivio @ 2019-02-15 10:47 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Fri, 15 Feb 2019 10:38:04 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> > On Thu, 14 Feb 2019 09:47:02 +0200
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> As a preparation for using classifier spinlock instead of relying on
> >> external rtnl lock, rearrange code in fl_change. The goal is to group the
> >> code which changes classifier state in single block in order to allow
> >> following commits in this set to protect it from parallel modification with
> >> tp->lock. Data structures that require tp->lock protection are mask
> >> hashtable and filters list, and classifier handle_idr.
> >>
> >> fl_hw_replace_filter() is a sleeping function and cannot be called while
> >> holding a spinlock. In order to execute all sequence of changes to shared
> >> classifier data structures atomically, call fl_hw_replace_filter() before
> >> modifying them.
> >>
> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
> >> ---
> >>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
> >>  1 file changed, 44 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> >> index 88d7af78ba7e..91596a6271f8 100644
> >> --- a/net/sched/cls_flower.c
> >> +++ b/net/sched/cls_flower.c
> >> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  	if (err < 0)
> >>  		goto errout;
> >>
> >> -	if (!handle) {
> >> -		handle = 1;
> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> >> -				    INT_MAX, GFP_KERNEL);
> >> -	} else if (!fold) {
> >> -		/* user specifies a handle and it doesn't exist */
> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
> >> -				    handle, GFP_KERNEL);
> >> -	}
> >> -	if (err)
> >> -		goto errout;
> >> -	fnew->handle = handle;
> >> -
> >>
> >> [...]
> >>
> >>  	if (fold) {
> >> +		fnew->handle = handle;  
> >
> > I'm probably missing something, but what if fold is passed and the
> > handle isn't specified? That can still happen, right? In that case we
> > wouldn't be allocating the handle.  
> 
> Hi Stefano,
> 
> Thank you for reviewing my code.
> 
> Cls API looks up fold by handle, so this pointer can only be non-NULL
> when the user specified a handle and a filter with that handle exists on tp.

Ah, of course. Thanks for clarifying. By the way, what tricked me here
was this check in fl_change():

	if (fold && handle && fold->handle != handle)
		...

which could be turned into:

	if (fold && fold->handle != handle)
		...

at this point.

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
  2019-02-14 20:34   ` Stefano Brivio
@ 2019-02-15 11:22     ` Vlad Buslov
  2019-02-15 12:32       ` Stefano Brivio
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-15 11:22 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem


On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:03 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> +static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
>> +						unsigned long *handle)
>> +{
>> +	struct cls_fl_head *head = fl_head_dereference(tp);
>> +	struct cls_fl_filter *f;
>> +
>> +	rcu_read_lock();
>> +	/* don't return filters that are being deleted */
>> +	while ((f = idr_get_next_ul(&head->handle_idr,
>> +				    handle)) != NULL &&
>> +	       !refcount_inc_not_zero(&f->refcnt))
>> +		++(*handle);
>
> This... hurts :) What about:
>
> 	while ((f = idr_get_next_ul(&head->handle_idr, &handle))) {
> 		if (refcount_inc_not_zero(&f->refcnt))
> 			break;
> 		++(*handle);
> 	}
>
> ?

I prefer to avoid using value of assignment as boolean and
non-structured jumps, when possible. In this case it seems OK either
way, but how about:

	for (f = idr_get_next_ul(&head->handle_idr, handle);
	     f && !refcount_inc_not_zero(&f->refcnt);
	     f = idr_get_next_ul(&head->handle_idr, handle))
		++(*handle);

>
>> +	rcu_read_unlock();
>> +
>> +	return f;
>> +}
>> +
>>  static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  			struct netlink_ext_ack *extack)
>>  {
>> @@ -456,10 +503,7 @@ static bool __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  	if (!tc_skip_hw(f->flags))
>>  		fl_hw_destroy_filter(tp, f, extack);
>>  	tcf_unbind_filter(tp, &f->res);
>> -	if (async)
>> -		tcf_queue_work(&f->rwork, fl_destroy_filter_work);
>> -	else
>> -		__fl_destroy_filter(f);
>> +	__fl_put(f);
>>
>>  	return last;
>>  }
>> @@ -494,11 +538,18 @@ static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
>>  	tcf_queue_work(&head->rwork, fl_destroy_sleepable);
>>  }
>>
>> +static void fl_put(struct tcf_proto *tp, void *arg)
>> +{
>> +	struct cls_fl_filter *f = arg;
>> +
>> +	__fl_put(f);
>> +}
>> +
>>  static void *fl_get(struct tcf_proto *tp, u32 handle)
>>  {
>>  	struct cls_fl_head *head = fl_head_dereference(tp);
>>
>> -	return idr_find(&head->handle_idr, handle);
>> +	return __fl_get(head, handle);
>>  }
>>
>>  static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
>> @@ -1321,12 +1372,16 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	struct nlattr **tb;
>>  	int err;
>>
>> -	if (!tca[TCA_OPTIONS])
>> -		return -EINVAL;
>> +	if (!tca[TCA_OPTIONS]) {
>> +		err = -EINVAL;
>> +		goto errout_fold;
>> +	}
>>
>>  	mask = kzalloc(sizeof(struct fl_flow_mask), GFP_KERNEL);
>> -	if (!mask)
>> -		return -ENOBUFS;
>> +	if (!mask) {
>> +		err = -ENOBUFS;
>> +		goto errout_fold;
>> +	}
>>
>>  	tb = kcalloc(TCA_FLOWER_MAX + 1, sizeof(struct nlattr *), GFP_KERNEL);
>>  	if (!tb) {
>> @@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  		err = -ENOBUFS;
>>  		goto errout_tb;
>>  	}
>> +	refcount_set(&fnew->refcnt, 1);
>>
>>  	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
>>  	if (err < 0)
>> @@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>>  	if (!tc_in_hw(fnew->flags))
>>  		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
>>
>> +	refcount_inc(&fnew->refcnt);
>
> I guess I'm not getting the semantics but... why is it 2 now?

As soon as fnew is inserted into head->handle_idr (one reference), it
becomes accessible to concurrent users, which means that it can be
deleted at any time. However, tp->change() returns a reference to newly
created filter to cls_api by assigning "arg" parameter to it (second
reference). After tp->change() returns, cls API continues to use fnew
and releases it with tfilter_put() when finished.
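
A minimal sketch of the intended ownership (names taken from the patch,
error paths omitted):

	refcount_set(&fnew->refcnt, 1);	/* reference that will be owned by
					 * head->handle_idr once the filter
					 * is inserted there
					 */
	...
	refcount_inc(&fnew->refcnt);	/* reference returned to cls API
					 * through *arg
					 */
	...
	*arg = fnew;			/* cls API later drops this reference
					 * with tfilter_put() -> fl_put() ->
					 * __fl_put()
					 */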

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters
  2019-02-15 11:22     ` Vlad Buslov
@ 2019-02-15 12:32       ` Stefano Brivio
  0 siblings, 0 replies; 44+ messages in thread
From: Stefano Brivio @ 2019-02-15 12:32 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Fri, 15 Feb 2019 11:22:45 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
> > On Thu, 14 Feb 2019 09:47:03 +0200
> > Vlad Buslov <vladbu@mellanox.com> wrote:
> >  
> >> +static struct cls_fl_filter *fl_get_next_filter(struct tcf_proto *tp,
> >> +						unsigned long *handle)
> >> +{
> >> +	struct cls_fl_head *head = fl_head_dereference(tp);
> >> +	struct cls_fl_filter *f;
> >> +
> >> +	rcu_read_lock();
> >> +	/* don't return filters that are being deleted */
> >> +	while ((f = idr_get_next_ul(&head->handle_idr,
> >> +				    handle)) != NULL &&
> >> +	       !refcount_inc_not_zero(&f->refcnt))
> >> +		++(*handle);  
> >
> > This... hurts :) What about:
> >
> > 	while ((f = idr_get_next_ul(&head->handle_idr, &handle))) {
> > 		if (refcount_inc_not_zero(&f->refcnt))
> > 			break;
> > 		++(*handle);
> > 	}
> >
> > ?  
> 
> I prefer to avoid using value of assignment as boolean and
> non-structured jumps, when possible. In this case it seems OK either
> way, but how about:
> 
> 	for (f = idr_get_next_ul(&head->handle_idr, handle);
> 	     f && !refcount_inc_not_zero(&f->refcnt);
> 	     f = idr_get_next_ul(&head->handle_idr, handle))
> 		++(*handle);

Honestly, I preferred the original, this is repeating idr_get_next_ul()
twice.

Maybe, just:

	[...]
	struct idr *idr;

	[...]
	idr = &head->handle_idr;
	while ((f = idr_get_next_ul(idr, handle)) != NULL &&
	       !refcount_inc_not_zero(&f->refcnt))
		++(*handle);

also rather ugly, but not entirely unreadable. I tried drafting a
helper for this, but it just ends up hiding what this does.

> >> @@ -1349,6 +1404,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  		err = -ENOBUFS;
> >>  		goto errout_tb;
> >>  	}
> >> +	refcount_set(&fnew->refcnt, 1);
> >>
> >>  	err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
> >>  	if (err < 0)
> >> @@ -1381,6 +1437,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> >>  	if (!tc_in_hw(fnew->flags))
> >>  		fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
> >>
> >> +	refcount_inc(&fnew->refcnt);  
> >
> > I guess I'm not getting the semantics but... why is it 2 now?  
> 
> As soon as fnew is inserted into head->handle_idr (one reference), it
> becomes accessible to concurrent users, which means that it can be
> deleted at any time. However, tp->change() returns a reference to newly
> created filter to cls_api by assigning "arg" parameter to it (second
> reference). After tp->change() returns, cls API continues to use fnew
> and releases it with tfilter_put() when finished.

I see, thanks for the explanation!

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag
  2019-02-14 20:49   ` Stefano Brivio
@ 2019-02-15 15:54     ` Vlad Buslov
  0 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-15 15:54 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem


On Thu 14 Feb 2019 at 20:49, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:04 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> +static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
>> +		       bool *last, struct netlink_ext_ack *extack)
>
> This would be easier to follow (at least for me):
>
>>  {
>>  	struct cls_fl_head *head = fl_head_dereference(tp);
>>  	bool async = tcf_exts_get_net(&f->exts);
>> -	bool last;
>> -
>> -	idr_remove(&head->handle_idr, f->handle);
>> -	list_del_rcu(&f->list);
>> -	last = fl_mask_put(head, f->mask, async);
>> -	if (!tc_skip_hw(f->flags))
>> -		fl_hw_destroy_filter(tp, f, extack);
>> -	tcf_unbind_filter(tp, &f->res);
>> -	__fl_put(f);
>> +	int err = 0;
>
> without this
>
>> +
>> +	(*last) = false;
>
> with *last = false;
>
>> +
>> +	if (!f->deleted) {
>
> with:
> 	if (f->deleted)
> 		return -ENOENT;
>
>> +		f->deleted = true;
>> +		rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
>> +				       f->mask->filter_ht_params);
>> +		idr_remove(&head->handle_idr, f->handle);
>> +		list_del_rcu(&f->list);
>> +		(*last) = fl_mask_put(head, f->mask, async);
>
> with:
> 	*last = fl_mask_put(head, f->mask, async);
>
>> +		if (!tc_skip_hw(f->flags))
>> +			fl_hw_destroy_filter(tp, f, extack);
>> +		tcf_unbind_filter(tp, &f->res);
>> +		__fl_put(f);
>
> and a return 0; here

Agree, this function looks better when structured in the way you
suggest. Will change it in V2.
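
For reference, the restructured function would look roughly like this
(a sketch assembled from the hunks above and your suggestion; the
actual V2 code may differ):

static int __fl_delete(struct tcf_proto *tp, struct cls_fl_filter *f,
		       bool *last, struct netlink_ext_ack *extack)
{
	struct cls_fl_head *head = fl_head_dereference(tp);
	bool async = tcf_exts_get_net(&f->exts);

	*last = false;

	if (f->deleted)
		return -ENOENT;

	f->deleted = true;
	rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
			       f->mask->filter_ht_params);
	idr_remove(&head->handle_idr, f->handle);
	list_del_rcu(&f->list);
	*last = fl_mask_put(head, f->mask, async);
	if (!tc_skip_hw(f->flags))
		fl_hw_destroy_filter(tp, f, extack);
	tcf_unbind_filter(tp, &f->res);
	__fl_put(f);

	return 0;
}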

>
>> +	} else {
>> +		err = -ENOENT;
>> +	}
>>  
>> -	return last;
>> +	return err;
>>  }
>>  
>> [...]
>>
>> @@ -1520,14 +1541,14 @@ static int fl_delete(struct tcf_proto *tp, void *arg, bool *last,
>>  {
>>  	struct cls_fl_head *head = fl_head_dereference(tp);
>>  	struct cls_fl_filter *f = arg;
>> +	bool last_on_mask;
>
> This is unused in this series, maybe change __fl_delete() to optionally
> take NULL as 'bool *last' argument?

It was implemented like that originally, but on internal review with
Jiri we decided that having an unused variable here is better than
complicating __fl_delete() with support for "last" being NULL without
a good reason.

>
>> +	int err = 0;
>
> Nit: no need to initialise this.

Yes, but I always regret having uninitialized variables in my functions
later on :(

>  
>> -	rhashtable_remove_fast(&f->mask->ht, &f->ht_node,
>> -			       f->mask->filter_ht_params);
>> -	__fl_delete(tp, f, extack);
>> +	err = __fl_delete(tp, f, &last_on_mask, extack);
>>  	*last = list_empty(&head->masks);
>>  	__fl_put(f);
>>  
>> -	return 0;
>> +	return err;
>>  }
>>  
>>  static void fl_walk(struct tcf_proto *tp, struct tcf_walker *arg,


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
  2019-02-15 10:47       ` Stefano Brivio
@ 2019-02-15 16:25         ` Vlad Buslov
  2019-02-18 18:20           ` Stefano Brivio
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-15 16:25 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem


On Fri 15 Feb 2019 at 10:47, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Fri, 15 Feb 2019 10:38:04 +0000
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>> On Thu 14 Feb 2019 at 20:34, Stefano Brivio <sbrivio@redhat.com> wrote:
>> > On Thu, 14 Feb 2019 09:47:02 +0200
>> > Vlad Buslov <vladbu@mellanox.com> wrote:
>> >
>> >> As a preparation for using classifier spinlock instead of relying on
>> >> external rtnl lock, rearrange code in fl_change. The goal is to group the
>> >> code which changes classifier state in single block in order to allow
>> >> following commits in this set to protect it from parallel modification with
>> >> tp->lock. Data structures that require tp->lock protection are mask
>> >> hashtable and filters list, and classifier handle_idr.
>> >>
>> >> fl_hw_replace_filter() is a sleeping function and cannot be called while
>> >> holding a spinlock. In order to execute all sequence of changes to shared
>> >> classifier data structures atomically, call fl_hw_replace_filter() before
>> >> modifying them.
>> >>
>> >> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> >> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> >> ---
>> >>  net/sched/cls_flower.c | 85 ++++++++++++++++++++++++++------------------------
>> >>  1 file changed, 44 insertions(+), 41 deletions(-)
>> >>
>> >> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> >> index 88d7af78ba7e..91596a6271f8 100644
>> >> --- a/net/sched/cls_flower.c
>> >> +++ b/net/sched/cls_flower.c
>> >> @@ -1354,90 +1354,93 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
>> >>  	if (err < 0)
>> >>  		goto errout;
>> >>
>> >> -	if (!handle) {
>> >> -		handle = 1;
>> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> >> -				    INT_MAX, GFP_KERNEL);
>> >> -	} else if (!fold) {
>> >> -		/* user specifies a handle and it doesn't exist */
>> >> -		err = idr_alloc_u32(&head->handle_idr, fnew, &handle,
>> >> -				    handle, GFP_KERNEL);
>> >> -	}
>> >> -	if (err)
>> >> -		goto errout;
>> >> -	fnew->handle = handle;
>> >> -
>> >>
>> >> [...]
>> >>
>> >>  	if (fold) {
>> >> +		fnew->handle = handle;
>> >
>> > I'm probably missing something, but what if fold is passed and the
>> > handle isn't specified? That can still happen, right? In that case we
>> > wouldn't be allocating the handle.
>>
>> Hi Stefano,
>>
>> Thank you for reviewing my code.
>>
>> Cls API looks up fold by handle, so this pointer can only be non-NULL
>> when the user specified a handle and a filter with that handle exists on tp.
>
> Ah, of course. Thanks for clarifying. By the way, what tricked me here
> was this check in fl_change():
>
> 	if (fold && handle && fold->handle != handle)
> 		...
>
> which could be turned into:
>
> 	if (fold && fold->handle != handle)
> 		...
>
> at this point.

At this point I don't think this check is needed at all, because fold
can't suddenly change its handle between this check and the initial
lookup in cls API. Looking at the commit history, this check has been
present since the original commit by Jiri that implemented the flower
classifier. Maybe the semantics of cls API were different back then?



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 06/12] net: sched: flower: handle concurrent mask insertion
  2019-02-14  7:47 ` [PATCH net-next 06/12] net: sched: flower: handle concurrent mask insertion Vlad Buslov
@ 2019-02-15 22:46   ` Stefano Brivio
  0 siblings, 0 replies; 44+ messages in thread
From: Stefano Brivio @ 2019-02-15 22:46 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Thu, 14 Feb 2019 09:47:06 +0200
Vlad Buslov <vladbu@mellanox.com> wrote:

> @@ -1327,19 +1330,36 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
>  	int ret = 0;
>  
>  	rcu_read_lock();
> -	fnew->mask = rhashtable_lookup_fast(&head->ht, mask, mask_ht_params);
> +
> +	/* Insert mask as temporary node to prevent concurrent creation of mask
> +	 * with same key. Any concurrent lookups with same key will return
> +	 * EAGAIN because mask's refcnt is zero. It is safe to insert

Nit: -EAGAIN

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 11/12] net: sched: flower: track rtnl lock state
  2019-02-14  7:47 ` [PATCH net-next 11/12] net: sched: flower: track rtnl lock state Vlad Buslov
@ 2019-02-15 22:46   ` Stefano Brivio
  2019-02-18  9:35     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Stefano Brivio @ 2019-02-15 22:46 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem

On Thu, 14 Feb 2019 09:47:11 +0200
Vlad Buslov <vladbu@mellanox.com> wrote:

>  static int fl_hw_replace_filter(struct tcf_proto *tp,
> -				struct cls_fl_filter *f,
> +				struct cls_fl_filter *f, bool rtnl_held,
>  				struct netlink_ext_ack *extack)
>  {
>  	struct tc_cls_flower_offload cls_flower = {};
>  	struct tcf_block *block = tp->chain->block;
>  	bool skip_sw = tc_skip_sw(f->flags);
> -	int err;
> +	int err = 0;
> +
> +	if (!rtnl_held)
> +		rtnl_lock();
>  
>  	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
>  	if (!cls_flower.rule)

                return -ENOMEM;

Don't you need to:

		err = -ENOMEM;
		goto errout;

here?

Same...

        err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
        if (err) {
                kfree(cls_flower.rule);
                if (skip_sw) {
                        NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
                        return err;

here,

                }
                return 0;

and here.

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 11/12] net: sched: flower: track rtnl lock state
  2019-02-15 22:46   ` Stefano Brivio
@ 2019-02-18  9:35     ` Vlad Buslov
  0 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-18  9:35 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem


On Fri 15 Feb 2019 at 22:46, Stefano Brivio <sbrivio@redhat.com> wrote:
> On Thu, 14 Feb 2019 09:47:11 +0200
> Vlad Buslov <vladbu@mellanox.com> wrote:
>
>>  static int fl_hw_replace_filter(struct tcf_proto *tp,
>> -				struct cls_fl_filter *f,
>> +				struct cls_fl_filter *f, bool rtnl_held,
>>  				struct netlink_ext_ack *extack)
>>  {
>>  	struct tc_cls_flower_offload cls_flower = {};
>>  	struct tcf_block *block = tp->chain->block;
>>  	bool skip_sw = tc_skip_sw(f->flags);
>> -	int err;
>> +	int err = 0;
>> +
>> +	if (!rtnl_held)
>> +		rtnl_lock();
>>  
>>  	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
>>  	if (!cls_flower.rule)
>
>                 return -ENOMEM;
>
> Don't you need to:
>
> 		err = -ENOMEM;
> 		goto errout;
>
> here?
>
> Same...
>
>         err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
>         if (err) {
>                 kfree(cls_flower.rule);
>                 if (skip_sw) {
>                         NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
>                         return err;
>
> here,
>
>                 }
>                 return 0;
>
> and here.

Thanks for catching this!
These error handlers were introduced by Pablo's recent patches and I
missed them during the rebase.
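
For reference, the fix would roughly take this shape (a sketch only,
assuming a single errout label that drops rtnl when this function took
it; the actual V2 code may differ):

	if (!rtnl_held)
		rtnl_lock();

	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
	if (!cls_flower.rule) {
		err = -ENOMEM;
		goto errout;
	}

	/* ... */

	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
	if (err) {
		kfree(cls_flower.rule);
		if (skip_sw)
			NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
		else
			err = 0;
		goto errout;
	}

	/* ... */

errout:
	if (!rtnl_held)
		rtnl_unlock();

	return err;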

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 02/12] net: sched: flower: refactor fl_change
  2019-02-15 16:25         ` Vlad Buslov
@ 2019-02-18 18:20           ` Stefano Brivio
  0 siblings, 0 replies; 44+ messages in thread
From: Stefano Brivio @ 2019-02-18 18:20 UTC (permalink / raw)
  To: Vlad Buslov, jiri; +Cc: netdev, jhs, xiyou.wangcong, davem

On Fri, 15 Feb 2019 16:25:52 +0000
Vlad Buslov <vladbu@mellanox.com> wrote:

> On Fri 15 Feb 2019 at 10:47, Stefano Brivio <sbrivio@redhat.com> wrote:
>
> > Ah, of course. Thanks for clarifying. By the way, what tricked me here
> > was this check in fl_change():
> >
> > 	if (fold && handle && fold->handle != handle)
> > 		...
> >
> > which could be turned into:
> >
> > 	if (fold && fold->handle != handle)
> > 		...
> >
> > at this point.  
> 
> At this point I don't think this check is needed at all because fold
> can't suddenly change its handle in between this check and initial
> lookup in cls API.

Oh, right.

> Looking at commit history, this check is present since original commit
> by Jiri that implements flower classifier. Maybe semantics of cls API
> was different back then?

I wasn't able to figure that out either... Jiri?

-- 
Stefano

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-14  7:47 ` [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Vlad Buslov
@ 2019-02-18 19:08   ` Cong Wang
  2019-02-19  9:45     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-18 19:08 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Flower classifier only changes root pointer during init and destroy. Cls
> API implements reference counting for tcf_proto, so there is no danger of
> concurrent access to tp when it is being destroyed, even without protection
> provided by rtnl lock.

How about atomicity? Refcnt doesn't guarantee atomicity, how do
you make sure two concurrent modifications are atomic?


>
> Implement new function fl_head_dereference() to dereference tp->root
> without checking for rtnl lock. Use it in all flower function that obtain
> head pointer instead of rtnl_dereference().
>

So what lock protects RCU writers after this patch?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock
  2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
                   ` (11 preceding siblings ...)
  2019-02-14  7:47 ` [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops Vlad Buslov
@ 2019-02-18 19:15 ` Cong Wang
  2019-02-19 10:00   ` Vlad Buslov
  12 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-18 19:15 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

Hi,

>  net/sched/cls_flower.c | 424 +++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 321 insertions(+), 103 deletions(-)
>

Given you change cls_flower so much, please also add a test case for
verifying your changes, especially focusing on the atomicity of concurrent
modifications.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops
  2019-02-14  7:47 ` [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops Vlad Buslov
@ 2019-02-18 19:27   ` Cong Wang
  2019-02-19 10:15     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-18 19:27 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its
> ops callbacks don't require caller to hold rtnl lock.

So, if this means RTNL is gone for all cls_flower changes, why
do I still see rtnl_lock() in cls_flower.c after all your patches in this set?

For instance:

 366 static void fl_destroy_filter_work(struct work_struct *work)
 367 {
 368         struct cls_fl_filter *f = container_of(to_rcu_work(work),
 369                                         struct cls_fl_filter, rwork);
 370
 371         rtnl_lock();
 372         __fl_destroy_filter(f);
 373         rtnl_unlock();
 374 }

and...

 382         if (!rtnl_held)
 383                 rtnl_lock();

...

1436                 if (!rtnl_held)
1437                         rtnl_lock();


Please explain in your changelog, otherwise it is very confusing.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion
  2019-02-14  7:47 ` [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion Vlad Buslov
@ 2019-02-18 20:47   ` Cong Wang
  2019-02-19 14:08     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-18 20:47 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>
> Without rtnl lock protection tcf proto can be deleted concurrently. Check
> tcf proto 'deleting' flag after taking tcf spinlock to verify that no
> concurrent deletion is in progress. Return EAGAIN error if concurrent
> deletion detected, which will cause caller to retry and possibly create new
> instance of tcf proto.
>

Please state the reason why you prefer retry over locking the whole
tp without retrying; that is, why and how is it better?

Personally I always prefer non-retry logic, because it is very easy
to understand and justify its correctness.

As you prefer otherwise, please share your reasoning in changelog.

Thanks!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-18 19:08   ` Cong Wang
@ 2019-02-19  9:45     ` Vlad Buslov
  2019-02-20 22:33       ` Cong Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-19  9:45 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Mon 18 Feb 2019 at 19:08, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Flower classifier only changes root pointer during init and destroy. Cls
>> API implements reference counting for tcf_proto, so there is no danger of
>> concurrent access to tp when it is being destroyed, even without protection
>> provided by rtnl lock.
>
> How about atomicity? Refcnt doesn't guarantee atomicity, how do
> you make sure two concurrent modifications are atomic?

In order to guarantee atomicity I lock shared flower classifier data
structures with tp->lock in following patches.

>
>
>>
>> Implement new function fl_head_dereference() to dereference tp->root
>> without checking for rtnl lock. Use it in all flower function that obtain
>> head pointer instead of rtnl_dereference().
>>
>
> So what lock protects RCU writers after this patch?

I explained it in comment for fl_head_dereference(), but should have
copied this information to changelog as well:
Flower classifier only changes root pointer during init and destroy.
Cls API implements reference counting for tcf_proto, so there is no
danger of concurrent access to tp when it is being destroyed, even
without protection provided by rtnl lock.

In initial version of this change I used tp->lock to protect tp->root
access and verified it with lockdep, but during internal review Jiri
noted that this is not needed in current flower implementation.
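
For readers following along, the helper in question looks roughly like
this (a sketch of the patch under review; the lockdep condition is
reduced to 1 here for brevity):

static struct cls_fl_head *fl_head_dereference(struct tcf_proto *tp)
{
	/* Flower classifier only changes root pointer during init and
	 * destroy. Users must obtain a reference to the tcf_proto
	 * instance before calling its API, so tp->root is protected from
	 * a concurrent fl_destroy() by reference counting.
	 */
	return rcu_dereference_protected(tp->root, 1);
}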

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock
  2019-02-18 19:15 ` [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Cong Wang
@ 2019-02-19 10:00   ` Vlad Buslov
  0 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-19 10:00 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Mon 18 Feb 2019 at 19:15, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Hi,
>
>>  net/sched/cls_flower.c | 424 +++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 321 insertions(+), 103 deletions(-)
>>
>
> Given you change cls_flower so much, please also add a test case for
> verifying your changes, especially focusing on the atomicity of concurrent
> modifications.
>
> Thanks.

Will do.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops
  2019-02-18 19:27   ` Cong Wang
@ 2019-02-19 10:15     ` Vlad Buslov
  2019-02-20 22:36       ` Cong Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-19 10:15 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Mon 18 Feb 2019 at 19:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its
>> ops callbacks don't require caller to hold rtnl lock.
>
> So, if this means RTNL is gone for all cls_flower changes, why
> do I still see rtnl_lock() in cls_flower.c after all your patches in
> this set?

It doesn't say that rtnl lock is gone; what it says is that the caller
doesn't have to obtain rtnl lock before calling flower ops callbacks.

>
> For instance:
>
>  366 static void fl_destroy_filter_work(struct work_struct *work)
>  367 {
>  368         struct cls_fl_filter *f = container_of(to_rcu_work(work),
>  369                                         struct cls_fl_filter, rwork);
>  370
>  371         rtnl_lock();
>  372         __fl_destroy_filter(f);
>  373         rtnl_unlock();
>  374 }

This shouldn't be needed. Thanks for spotting it.

>
> and...
>
>  382         if (!rtnl_held)
>  383                 rtnl_lock();
>
> ...
>
> 1436                 if (!rtnl_held)
> 1437                         rtnl_lock();

Drivers assume rtnl lock is held, so flower obtains it before calling
the offloads API.
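
Roughly, the pattern around the offload calls in this series is (a
simplified sketch; the actual offload request setup is elided):

	if (!rtnl_held)
		rtnl_lock();

	/* ... build the tc_cls_flower_offload request and call into the
	 * drivers, which still expect rtnl to be held ...
	 */

	if (!rtnl_held)
		rtnl_unlock();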

>
>
> Please explain in your changelog, otherwise it is very confusing.

Sorry for not making this stuff clear. I will expand the cover letter
with more details.

>
> Thanks.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion
  2019-02-18 20:47   ` Cong Wang
@ 2019-02-19 14:08     ` Vlad Buslov
  0 siblings, 0 replies; 44+ messages in thread
From: Vlad Buslov @ 2019-02-19 14:08 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Mon 18 Feb 2019 at 20:47, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>> Without rtnl lock protection tcf proto can be deleted concurrently. Check
>> tcf proto 'deleting' flag after taking tcf spinlock to verify that no
>> concurrent deletion is in progress. Return EAGAIN error if concurrent
>> deletion detected, which will cause caller to retry and possibly create new
>> instance of tcf proto.
>>
>
> Please state the reason why you prefer retry over locking the whole
> tp without retrying; that is, why and how is it better?
>
> Personally I always prefer non-retry logic, because it is very easy
> to understand and justify its correctness.
>
> As you prefer otherwise, please share your reasoning in changelog.
>
> Thanks!

At the moment the filter removal code is implemented by cls API in the
following fashion:

1) tc_del_tfilter() obtains an opaque void pointer to the filter by
calling tp->ops->get()

2) Passes the filter pointer to tfilter_del_notify(), which prepares an
skb with all necessary info about the filter that is being removed and...

3) ... calls tp->ops->delete() to actually delete the filter.

Between 1) and 3) the filter can be removed concurrently, and there is
nothing we can do about it in flower besides accounting for that with
some kind of retry logic. I will explain why I prefer cls API to not
just lock the whole classifier instance when modifying it in any way in
my reply to the cls API patch "net: sched: protect filter_chain list
with filter_chain_lock mutex" discussion.
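
In code, the race window is roughly here (a simplified sketch of
tc_del_tfilter(), argument lists elided):

	fh = tp->ops->get(tp, t->tcm_handle);	/* 1) fl_get(), takes a reference */

	/* a concurrent task may delete the filter at this point */

	err = tfilter_del_notify(...);		/* 2) prepares the notification skb
						 * 3) and calls tp->ops->delete();
						 *    fl_delete() now returns -ENOENT
						 *    if the filter was already removed
						 */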

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-19  9:45     ` Vlad Buslov
@ 2019-02-20 22:33       ` Cong Wang
  2019-02-21 17:45         ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-20 22:33 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Tue, Feb 19, 2019 at 1:46 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Mon 18 Feb 2019 at 19:08, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Flower classifier only changes root pointer during init and destroy. Cls
> >> API implements reference counting for tcf_proto, so there is no danger of
> >> concurrent access to tp when it is being destroyed, even without protection
> >> provided by rtnl lock.
> >
> > How about atomicity? Refcnt doesn't guarantee atomicity, how do
> > you make sure two concurrent modifications are atomic?
>
> In order to guarantee atomicity I lock shared flower classifier data
> structures with tp->lock in following patches.

Sure, I meant the atomicity of the _whole_ change, as you know
the TC filters are stored in hierarchical structures: a block, a chain,
a tp struct, some type-specific data structure like a hash table.

Locking tp only solves part of the atomicity here. Are you
going to restart the whole change from top down to the bottom?


>
> >
> >
> >>
> >> Implement new function fl_head_dereference() to dereference tp->root
> >> without checking for rtnl lock. Use it in all flower function that obtain
> >> head pointer instead of rtnl_dereference().
> >>
> >
> > So what lock protects RCU writers after this patch?
>
> I explained it in comment for fl_head_dereference(), but should have
> copied this information to changelog as well:
> Flower classifier only changes root pointer during init and destroy.
> Cls API implements reference counting for tcf_proto, so there is no
> danger of concurrent access to tp when it is being destroyed, even
> without protection provided by rtnl lock.

So you are saying an RCU pointer is okay to dereference without
any lock, even without the RCU read lock, right?


>
> In initial version of this change I used tp->lock to protect tp->root
> access and verified it with lockdep, but during internal review Jiri
> noted that this is not needed in current flower implementation.

Let's see what you have on top of your own branch
unlocked_flower_cong_1:

1458 static int fl_change(struct net *net, struct sk_buff *in_skb,
1459                      struct tcf_proto *tp, unsigned long base,
1460                      u32 handle, struct nlattr **tca,
1461                      void **arg, bool ovr, bool rtnl_held,
1462                      struct netlink_ext_ack *extack)
1463 {
1464         struct cls_fl_head *head = fl_head_dereference(tp);

At the point of line 1464, there is no lock taken, tp->lock is taken
after it, block->lock or chain lock is already unlocked before ->change().

So, what protects this RCU structure? According to RCU, it must be
either RCU read lock or some writer lock. I see none here. :(

What am I missing?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops
  2019-02-19 10:15     ` Vlad Buslov
@ 2019-02-20 22:36       ` Cong Wang
  0 siblings, 0 replies; 44+ messages in thread
From: Cong Wang @ 2019-02-20 22:36 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Tue, Feb 19, 2019 at 2:15 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Mon 18 Feb 2019 at 19:27, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >> Set TCF_PROTO_OPS_DOIT_UNLOCKED for flower classifier to indicate that its
> >> ops callbacks don't require caller to hold rtnl lock.
> >
> > So, if this means RTNL is gone for all cls_flower changes, why
> > do I still see rtnl_lock() in cls_flower.c after all your patches in
> > this set?
>
> It doesn't say that rtnl lock is gone, what it says is that caller
> doesn't have to obtain rtnl lock before calling flower ops callbacks.

So RTNL lock is still a bottleneck after all of these patches,
isn't it? :)

Yeah, please kindly add an explanation for why RTNL lock is still
here.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-20 22:33       ` Cong Wang
@ 2019-02-21 17:45         ` Vlad Buslov
  2019-02-22 19:32           ` Cong Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-21 17:45 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Wed 20 Feb 2019 at 22:33, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, Feb 19, 2019 at 1:46 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Mon 18 Feb 2019 at 19:08, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >> Flower classifier only changes root pointer during init and destroy. Cls
>> >> API implements reference counting for tcf_proto, so there is no danger of
>> >> concurrent access to tp when it is being destroyed, even without protection
>> >> provided by rtnl lock.
>> >
>> > How about atomicity? Refcnt doesn't guarantee atomicity, how do
>> > you make sure two concurrent modifications are atomic?
>>
>> In order to guarantee atomicity I lock shared flower classifier data
>> structures with tp->lock in following patches.
>
> Sure, I meant the atomicity of the _whole_ change, as you know
> the TC filters are stored in hierarchical structures: a block, a chain,
> a tp struct, some type-specific data structure like a hash table.
>
> > Locking tp only solves part of the atomicity here. Are you
> going to restart the whole change from top down to the bottom?

You mean in cases where there is a conflict? Yes, processing EAGAIN in
tc_new_tfilter() involves releasing and re-obtaining all locks and
references.
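
A rough sketch of that retry (assuming the long-standing replay label
in tc_new_tfilter(); locking and lookup details elided):

replay:
	/* look up or create block, chain and tp, taking references */
	...
	err = tp->ops->change(...);	/* fl_change(); returns -EAGAIN if the
					 * tp is concurrently being deleted
					 */
	...
	/* release the references (and locks) taken above */
	if (err == -EAGAIN)
		goto replay;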

>
>
>>
>> >
>> >
>> >>
>> >> Implement new function fl_head_dereference() to dereference tp->root
>> >> without checking for rtnl lock. Use it in all flower function that obtain
>> >> head pointer instead of rtnl_dereference().
>> >>
>> >
>> > So what lock protects RCU writers after this patch?
>>
>> I explained it in comment for fl_head_dereference(), but should have
>> copied this information to changelog as well:
>> Flower classifier only changes root pointer during init and destroy.
>> Cls API implements reference counting for tcf_proto, so there is no
>> danger of concurrent access to tp when it is being destroyed, even
>> without protection provided by rtnl lock.
>
> So you are saying an RCU pointer is okay to deference without
> any lock eve without RCU read lock, right?
>
>
>>
>> In initial version of this change I used tp->lock to protect tp->root
>> access and verified it with lockdep, but during internal review Jiri
>> noted that this is not needed in current flower implementation.
>
> Let's see what you have on top of your own branch
> unlocked_flower_cong_1:
>
> 1458 static int fl_change(struct net *net, struct sk_buff *in_skb,
> 1459                      struct tcf_proto *tp, unsigned long base,
> 1460                      u32 handle, struct nlattr **tca,
> 1461                      void **arg, bool ovr, bool rtnl_held,
> 1462                      struct netlink_ext_ack *extack)
> 1463 {
> 1464         struct cls_fl_head *head = fl_head_dereference(tp);
>
> At the point of line 1464, there is no lock taken, tp->lock is taken
> after it, block->lock or chain lock is already unlocked before ->change().
>
> So, what protects this RCU structure? According to RCU, it must be
> either RCU read lock or some writer lock. I see none here. :(
>
> What am I missing?

Initially I had flower implementation that protected tp->root access
with tp->lock, re-obtaining lock and head reference each time it was
used, sometimes multiple times per function (head reference is usually
obtained in beginning of every flower API function and used multiple
times afterwards). This complicated the code and reduced parallelism.
However, in case of flower classifier tp->root is never reassigned after
init, so essentially there is no "CU" part in this "RCU" pointer. This
realization allowed me to refactor unlocked flower code to look much
nicer and perform better. Am I missing something in flower classifier
implementation?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-21 17:45         ` Vlad Buslov
@ 2019-02-22 19:32           ` Cong Wang
  2019-02-25 16:11             ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-22 19:32 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Thu, Feb 21, 2019 at 9:45 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Wed 20 Feb 2019 at 22:33, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Tue, Feb 19, 2019 at 1:46 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Mon 18 Feb 2019 at 19:08, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >> Flower classifier only changes root pointer during init and destroy. Cls
> >> >> API implements reference counting for tcf_proto, so there is no danger of
> >> >> concurrent access to tp when it is being destroyed, even without protection
> >> >> provided by rtnl lock.
> >> >
> >> > How about atomicity? Refcnt doesn't guarantee atomicity, how do
> >> > you make sure two concurrent modifications are atomic?
> >>
> >> In order to guarantee atomicity I lock shared flower classifier data
> >> structures with tp->lock in following patches.
> >
> > Sure, I meant the atomicity of the _whole_ change, as you know
> > the TC filters are stored in hierarchical structures: a block, a chain,
> > a tp struct, some type-specific data structure like a hash table.
> >
> > Locking tp only solves part of the atomicity problem here. Are you
> > going to restart the whole change from top down to the bottom?
>
> You mean in cases where there is a conflict? Yes, processing EAGAIN in
> tc_new_tfilter() involves releasing and re-obtaining all locks and
> references.

Sure, a restart only happens when a conflict is detected, which is
why I called it the worst-case scenario.


>
> >
> >
> >>
> >> >
> >> >
> >> >>
> >> >> Implement new function fl_head_dereference() to dereference tp->root
> >> >> without checking for rtnl lock. Use it in all flower functions that obtain
> >> >> the head pointer instead of rtnl_dereference().
> >> >>
> >> >
> >> > So what lock protects RCU writers after this patch?
> >>
> >> I explained it in comment for fl_head_dereference(), but should have
> >> copied this information to changelog as well:
> >> Flower classifier only changes root pointer during init and destroy.
> >> Cls API implements reference counting for tcf_proto, so there is no
> >> danger of concurrent access to tp when it is being destroyed, even
> >> without protection provided by rtnl lock.
> >
> > So you are saying an RCU pointer is okay to dereference without
> > any lock, even without the RCU read lock, right?
> >
> >
> >>
> >> In initial version of this change I used tp->lock to protect tp->root
> >> access and verified it with lockdep, but during internal review Jiri
> >> noted that this is not needed in current flower implementation.
> >
> > Let's see what you have on top of your own branch
> > unlocked_flower_cong_1:
> >
> > 1458 static int fl_change(struct net *net, struct sk_buff *in_skb,
> > 1459                      struct tcf_proto *tp, unsigned long base,
> > 1460                      u32 handle, struct nlattr **tca,
> > 1461                      void **arg, bool ovr, bool rtnl_held,
> > 1462                      struct netlink_ext_ack *extack)
> > 1463 {
> > 1464         struct cls_fl_head *head = fl_head_dereference(tp);
> >
> > At the point of line 1464, there is no lock taken, tp->lock is taken
> > after it, block->lock or chain lock is already unlocked before ->change().
> >
> > So, what protects this RCU structure? According to RCU, it must be
> > either RCU read lock or some writer lock. I see none here. :(
> >
> > What am I missing?
>
> Initially I had flower implementation that protected tp->root access
> with tp->lock, re-obtaining lock and head reference each time it was
> used, sometimes multiple times per function (head reference is usually
> obtained in beginning of every flower API function and used multiple
> times afterwards). This complicated the code and reduced parallelism.
> However, in case of flower classifier tp->root is never reassigned after
> init, so essentially there is no "CU" part in this "RCU" pointer. This
> realization allowed me to refactor unlocked flower code to look much
> nicer and perform better. Am I missing something in flower classifier
> implementation?

So if it is no longer RCU, why do you still use
rcu_dereference_protected()? That is, why not just deref it as a raw
pointer?

And, I don't think I can buy your argument here. The RCU infrastructure
should not be changed even after your patches: the fast path is still
protected by the RCU read lock, while the slow path is now protected by
some smaller-scope locks. What makes cls_flower so unique that
it doesn't even need RCU here? tp->root is not reassigned but it is still
freed via RCU infra, that is in fl_destroy_sleepable().

Thanks.
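
The deferred free referenced here looks roughly like the sketch below,
reconstructed from memory of cls_flower.c around this series rather than
quoted verbatim, so treat it as illustrative: fl_destroy() hands the final
free to a work item that only runs after an RCU grace period, which is what
keeps fast-path readers safe even though slow-path writers no longer take
rtnl.

static void fl_destroy_sleepable(struct work_struct *work)
{
	struct cls_fl_head *head = container_of(to_rcu_work(work),
						struct cls_fl_head, rwork);

	rhashtable_destroy(&head->ht);
	kfree(head);
	module_put(THIS_MODULE);
}

static void fl_destroy(struct tcf_proto *tp, bool rtnl_held,
		       struct netlink_ext_ack *extack)
{
	struct cls_fl_head *head = fl_head_dereference(tp);

	/* ... tear down filters and masks ... */

	__module_get(THIS_MODULE);
	/* final kfree() of head runs only after an RCU grace period */
	tcf_queue_work(&head->rwork, fl_destroy_sleepable);
}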

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-22 19:32           ` Cong Wang
@ 2019-02-25 16:11             ` Vlad Buslov
  2019-02-25 22:39               ` Cong Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-25 16:11 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Feb 21, 2019 at 9:45 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Wed 20 Feb 2019 at 22:33, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Tue, Feb 19, 2019 at 1:46 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >>
>> >> On Mon 18 Feb 2019 at 19:08, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >> >>
>> >> >> Flower classifier only changes root pointer during init and destroy. Cls
>> >> >> API implements reference counting for tcf_proto, so there is no danger of
>> >> >> concurrent access to tp when it is being destroyed, even without protection
>> >> >> provided by rtnl lock.
>> >> >
>> >> > How about atomicity? Refcnt doesn't guarantee atomicity, how do
>> >> > you make sure two concurrent modifications are atomic?
>> >>
>> >> In order to guarantee atomicity I lock shared flower classifier data
>> >> structures with tp->lock in following patches.
>> >
>> > Sure, I meant the atomicity of the _whole_ change, as you know
>> > the TC filters are stored in hierarchical structures: a block, a chain,
>> > a tp struct, some type-specific data structure like a hash table.
>> >
>> > Locking tp only solves part of the atomicity problem here. Are you
>> > going to restart the whole change from top down to the bottom?
>>
>> You mean in cases where there is a conflict? Yes, processing EAGAIN in
>> tc_new_tfilter() involves releasing and re-obtaining all locks and
>> references.
>
> Sure, a restart only happens when a conflict is detected, which is
> why I called it the worst-case scenario.
>
>
>>
>> >
>> >
>> >>
>> >> >
>> >> >
>> >> >>
>> >> >> Implement new function fl_head_dereference() to dereference tp->root
>> >> >> without checking for rtnl lock. Use it in all flower functions that obtain
>> >> >> the head pointer instead of rtnl_dereference().
>> >> >>
>> >> >
>> >> > So what lock protects RCU writers after this patch?
>> >>
>> >> I explained it in comment for fl_head_dereference(), but should have
>> >> copied this information to changelog as well:
>> >> Flower classifier only changes root pointer during init and destroy.
>> >> Cls API implements reference counting for tcf_proto, so there is no
>> >> danger of concurrent access to tp when it is being destroyed, even
>> >> without protection provided by rtnl lock.
>> >
>> > So you are saying an RCU pointer is okay to dereference without
>> > any lock, even without the RCU read lock, right?
>> >
>> >
>> >>
>> >> In initial version of this change I used tp->lock to protect tp->root
>> >> access and verified it with lockdep, but during internal review Jiri
>> >> noted that this is not needed in current flower implementation.
>> >
>> > Let's see what you have on top of your own branch
>> > unlocked_flower_cong_1:
>> >
>> > 1458 static int fl_change(struct net *net, struct sk_buff *in_skb,
>> > 1459                      struct tcf_proto *tp, unsigned long base,
>> > 1460                      u32 handle, struct nlattr **tca,
>> > 1461                      void **arg, bool ovr, bool rtnl_held,
>> > 1462                      struct netlink_ext_ack *extack)
>> > 1463 {
>> > 1464         struct cls_fl_head *head = fl_head_dereference(tp);
>> >
>> > At the point of line 1464, there is no lock taken, tp->lock is taken
>> > after it, block->lock or chain lock is already unlocked before ->change().
>> >
>> > So, what protects this RCU structure? According to RCU, it must be
>> > either RCU read lock or some writer lock. I see none here. :(
>> >
>> > What am I missing?
>>
>> Initially I had flower implementation that protected tp->root access
>> with tp->lock, re-obtaining lock and head reference each time it was
>> used, sometimes multiple times per function (head reference is usually
>> obtained in beginning of every flower API function and used multiple
>> times afterwards). This complicated the code and reduced parallelism.
>> However, in case of flower classifier tp->root is never reassigned after
>> init, so essentially there is no "CU" part in this "RCU" pointer. This
>> realization allowed me to refactor unlocked flower code to look much
>> nicer and perform better. Am I missing something in flower classifier
>> implementation?
>
> So if it is no longer RCU any more, why do you still use
> rcu_dereference_protected()? That is, why not just deref it as a raw
> pointer?
>
> And, I don't think I can buy your argument here. The RCU infrastructure
> should not be changed even after your patches, the fast path is still
> > protected by RCU read lock, while the slow path now is protected by
> some smaller-scope locks. What makes cls_flower so unique that
> it doesn't even need RCU here? tp->root is not reassigned but it is still
> freed via RCU infra, that is in fl_destroy_sleepable().
>
> Thanks.

My cls API patch set introduced reference counting for the tcf_proto
structure. With that change tp->ops->destroy() (which calls fl_destroy()
and fl_destroy_sleepable(), in the case of the flower classifier) is only
called after the last reference to tp is released. All slow path users of
tp->ops must obtain a reference to tp, so a concurrent call to fl_destroy()
is not possible. Before this change the tcf_proto structure didn't have
reference counting support and required users to obtain the rtnl mutex
before calling its ops callbacks. This was verified in flower by using
rtnl_dereference to obtain tp->root.
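
In sketch form, the reference counting described above amounts to the
pattern below. This is illustrative, not a verbatim copy of cls_api.c; the
helper names and the rtnl_held argument are a reconstruction of the earlier
series. Destroy is driven purely by the last put, so any caller holding a
reference can safely call into tp->ops without rtnl.

static void tcf_proto_get(struct tcf_proto *tp)
{
	refcount_inc(&tp->refcnt);
}

static void tcf_proto_put(struct tcf_proto *tp, bool rtnl_held,
			  struct netlink_ext_ack *extack)
{
	/* last put triggers tp->ops->destroy() via tcf_proto_destroy() */
	if (refcount_dec_and_test(&tp->refcnt))
		tcf_proto_destroy(tp, rtnl_held, extack);
}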

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-25 16:11             ` Vlad Buslov
@ 2019-02-25 22:39               ` Cong Wang
  2019-02-26 14:57                 ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-25 22:39 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Mon, Feb 25, 2019 at 8:11 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > So if it is no longer RCU any more, why do you still use
> > rcu_dereference_protected()? That is, why not just deref it as a raw
> > pointer?


Any answer for this question?


> >
> > And, I don't think I can buy your argument here. The RCU infrastructure
> > should not be changed even after your patches, the fast path is still
> > protected by RCU read lock, while the slow path now is protected by
> > some smaller-scope locks. What makes cls_flower so unique that
> > it doesn't even need RCU here? tp->root is not reassigned but it is still
> > freed via RCU infra, that is in fl_destroy_sleepable().
> >
> > Thanks.
>
> My cls API patch set introduced reference counting for tcf_proto
> structure. With that change tp->ops->destroy() (which calls fl_destroy()
> and fl_destroy_sleepable(), in case of flower classifier) is only called
> after last reference to tp is released. All slow path users of tp->ops
> must obtain reference to tp, so concurrent call to fl_destroy() is not
> possible. Before this change tcf_proto structure didn't have reference
> counting support and required users to obtain rtnl mutex before calling
> its ops callbacks. This was verified in flower by using rtnl_dereference
> to obtain tp->root.

Yes, but the fast path doesn't hold a refcnt on tp, does it? If not, you
still rely on RCU to sync with readers. If yes, then RCU can probably go
away.

Now you are in the middle of the two: taking the RCU read lock on the
fast path without a refcnt, while still using rcu_dereference on the
slow paths without any lock.

To me, at the very least, you are not using the RCU API correctly here.

Thanks.
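
To make the two access patterns being contrasted concrete, here is a
simplified sketch; the _sketch function names are hypothetical, while the
RCU accessors are the real ones. The fast path takes no tp reference and
relies entirely on the RCU read-side critical section around
classification, whereas the refactored slow path relies on the tcf_proto
reference obtained by cls API:

/* fast path: runs in softirq context under RCU protection, no refcnt */
static int fl_classify_sketch(struct tcf_proto *tp)
{
	struct cls_fl_head *head = rcu_dereference_bh(tp->root);

	/* ... rhashtable lookups in the per-mask tables under head ... */
	return 0;
}

/* slow path: caller already holds a tcf_proto reference, so head cannot
 * be freed underneath us and no RCU read lock is taken */
static void *fl_get_sketch(struct tcf_proto *tp, u32 handle)
{
	struct cls_fl_head *head = rcu_dereference_raw(tp->root);

	return idr_find(&head->handle_idr, handle);
}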

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-25 22:39               ` Cong Wang
@ 2019-02-26 14:57                 ` Vlad Buslov
  2019-02-28  0:49                   ` Cong Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-26 14:57 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller


On Mon 25 Feb 2019 at 22:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Feb 25, 2019 at 8:11 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >
>> > So if it is no longer RCU any more, why do you still use
>> > rcu_dereference_protected()? That is, why not just deref it as a raw
>> > pointer?
>
>
> Any answer for this question?

I decided that since there is no possibility of either concurrent pointer
assignment or deallocation of the object it points to, the most performant
solution would be to use rcu_dereference_protected(), which is the only
RCU dereference helper that doesn't use READ_ONCE. I now understand that
this is confusing (and most likely doesn't provide any noticeable
performance improvement anyway!) and will change this patch to use
rcu_dereference_raw() as you suggest.

>
>
>> >
>> > And, I don't think I can buy your argument here. The RCU infrastructure
>> > should not be changed even after your patches, the fast path is still
>> > protected by RCU read lock, while the slow path now is protected by
>> > some smaller-scope locks. What makes cls_flower so unique that
>> > it doesn't even need RCU here? tp->root is not reassigned but it is still
>> > freed via RCU infra, that is in fl_destroy_sleepable().
>> >
>> > Thanks.
>>
>> My cls API patch set introduced reference counting for tcf_proto
>> structure. With that change tp->ops->destroy() (which calls fl_destroy()
>> and fl_destroy_sleepable(), in case of flower classifier) is only called
>> after last reference to tp is released. All slow path users of tp->ops
>> must obtain reference to tp, so concurrent call to fl_destroy() is not
>> possible. Before this change tcf_proto structure didn't have reference
>> counting support and required users to obtain rtnl mutex before calling
>> its ops callbacks. This was verified in flower by using rtnl_dereference
>> to obtain tp->root.
>
> Yes, but fast path doesn't hold a refcnt of tp, does it? If not, you still
> rely on RCU for sync with readers. If yes, then probably RCU can be
> gone.
>
> Now you are in a middle of the two, that is taking RCU read lock on
> fast path without a refcnt, meanwhile still uses rcu_dereference on
> slow paths without any lock.
>
> For me, you at least don't use the RCU API correctly here.
>
> Thanks.

Yes, the fast path still relies on RCU. What I meant is that the slow path
(cls API) now only calls tp ops after obtaining a reference to tp, so there
is no need to protect it from a concurrent tp->ops->destroy() by means of
rtnl or any other lock. I understand that using rcu_dereference_protected()
is confusing in this case and will refactor this patch accordingly.
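
For reference, the two accessors being weighed, applied to the helper in
question (a sketch: the first form is the one in the posted patch, the
second is what the patch is to be changed to):

/* v1 of the patch: asserts exclusive access, so the load is a plain
 * access with no READ_ONCE and no lockdep condition to verify */
static struct cls_fl_head *fl_head_dereference_v1(struct tcf_proto *tp)
{
	return rcu_dereference_protected(tp->root, 1);
}

/* proposed change: no lockdep checking either, but READ_ONCE is kept
 * and the __rcu annotation on tp->root still satisfies sparse */
static struct cls_fl_head *fl_head_dereference_v2(struct tcf_proto *tp)
{
	return rcu_dereference_raw(tp->root);
}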

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-26 14:57                 ` Vlad Buslov
@ 2019-02-28  0:49                   ` Cong Wang
  2019-02-28 18:35                     ` Vlad Buslov
  0 siblings, 1 reply; 44+ messages in thread
From: Cong Wang @ 2019-02-28  0:49 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, Jiri Pirko,
	David Miller

On Tue, Feb 26, 2019 at 6:57 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Mon 25 Feb 2019 at 22:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Mon, Feb 25, 2019 at 8:11 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> >
> >> > So if it is no longer RCU any more, why do you still use
> >> > rcu_dereference_protected()? That is, why not just deref it as a raw
> >> > pointer?
> >
> >
> > Any answer for this question?
>
> I decided that since there is neither possibility of concurrent pointer
> assignment nor deallocation of object that it points to, most performant
> solution would be using rcu_dereference_protected() which is the only
> RCU dereference helper that doesn't use READ_ONCE. I now understand that
> this is confusing (and most likely doesn't provide any noticeable
> performance improvement anyway!) and will change this patch to use
> rcu_dereference_raw() as you suggest.

Yeah, please make sure sparse is happy with that. :)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-28  0:49                   ` Cong Wang
@ 2019-02-28 18:35                     ` Vlad Buslov
  2019-03-02  0:51                       ` Cong Wang
  0 siblings, 1 reply; 44+ messages in thread
From: Vlad Buslov @ 2019-02-28 18:35 UTC (permalink / raw)
  To: Cong Wang, Jiri Pirko
  Cc: Linux Kernel Network Developers, Jamal Hadi Salim, David Miller


On Thu 28 Feb 2019 at 00:49, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Tue, Feb 26, 2019 at 6:57 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>>
>>
>> On Mon 25 Feb 2019 at 22:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Mon, Feb 25, 2019 at 8:11 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>> >>
>> >>
>> >> On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> >> >
>> >> > So if it is no longer RCU any more, why do you still use
>> >> > rcu_dereference_protected()? That is, why not just deref it as a raw
>> >> > pointer?
>> >
>> >
>> > Any answer for this question?
>>
>> I decided that since there is neither possibility of concurrent pointer
>> assignment nor deallocation of object that it points to, most performant
>> solution would be using rcu_dereference_protected() which is the only
>> RCU dereference helper that doesn't use READ_ONCE. I now understand that
>> this is confusing (and most likely doesn't provide any noticeable
>> performance improvement anyway!) and will change this patch to use
>> rcu_dereference_raw() as you suggest.
>
> Yeah, please make sure sparse is happy with that. :)

I checked my flower change with sparse. It produced a lot of warnings,
some of which are several years old. None are in the code I changed, though:

  CHECK   net/sched/cls_flower.c
net/sched/cls_flower.c:200:20: warning: cast from restricted __be16
net/sched/cls_flower.c:200:20: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:200:20:    expected unsigned short [usertype] val
net/sched/cls_flower.c:200:20:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:200:20: warning: cast from restricted __be16
net/sched/cls_flower.c:200:20: warning: cast from restricted __be16
net/sched/cls_flower.c:201:20: warning: cast from restricted __be16
net/sched/cls_flower.c:201:20: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:201:20:    expected unsigned short [usertype] val
net/sched/cls_flower.c:201:20:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:201:20: warning: cast from restricted __be16
net/sched/cls_flower.c:201:20: warning: cast from restricted __be16
net/sched/cls_flower.c:202:19: warning: cast from restricted __be16
net/sched/cls_flower.c:202:19: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:202:19:    expected unsigned short [usertype] val
net/sched/cls_flower.c:202:19:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:202:19: warning: cast from restricted __be16
net/sched/cls_flower.c:202:19: warning: cast from restricted __be16
net/sched/cls_flower.c:203:19: warning: cast from restricted __be16
net/sched/cls_flower.c:203:19: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:203:19:    expected unsigned short [usertype] val
net/sched/cls_flower.c:203:19:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:203:19: warning: cast from restricted __be16
net/sched/cls_flower.c:203:19: warning: cast from restricted __be16
net/sched/cls_flower.c:206:21: warning: cast from restricted __be16
net/sched/cls_flower.c:206:21: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:206:21:    expected unsigned short [usertype] val
net/sched/cls_flower.c:206:21:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:206:21: warning: cast from restricted __be16
net/sched/cls_flower.c:206:21: warning: cast from restricted __be16
net/sched/cls_flower.c:206:21: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:206:42: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:207:21: warning: cast from restricted __be16
net/sched/cls_flower.c:207:21: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:207:21:    expected unsigned short [usertype] val
net/sched/cls_flower.c:207:21:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:207:21: warning: cast from restricted __be16
net/sched/cls_flower.c:207:21: warning: cast from restricted __be16
net/sched/cls_flower.c:207:21: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:207:42: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:223:20: warning: cast from restricted __be16
net/sched/cls_flower.c:223:20: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:223:20:    expected unsigned short [usertype] val
net/sched/cls_flower.c:223:20:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:223:20: warning: cast from restricted __be16
net/sched/cls_flower.c:223:20: warning: cast from restricted __be16
net/sched/cls_flower.c:224:20: warning: cast from restricted __be16
net/sched/cls_flower.c:224:20: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:224:20:    expected unsigned short [usertype] val
net/sched/cls_flower.c:224:20:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:224:20: warning: cast from restricted __be16
net/sched/cls_flower.c:224:20: warning: cast from restricted __be16
net/sched/cls_flower.c:225:19: warning: cast from restricted __be16
net/sched/cls_flower.c:225:19: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:225:19:    expected unsigned short [usertype] val
net/sched/cls_flower.c:225:19:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:225:19: warning: cast from restricted __be16
net/sched/cls_flower.c:225:19: warning: cast from restricted __be16
net/sched/cls_flower.c:226:19: warning: cast from restricted __be16
net/sched/cls_flower.c:226:19: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:226:19:    expected unsigned short [usertype] val
net/sched/cls_flower.c:226:19:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:226:19: warning: cast from restricted __be16
net/sched/cls_flower.c:226:19: warning: cast from restricted __be16
net/sched/cls_flower.c:229:21: warning: cast from restricted __be16
net/sched/cls_flower.c:229:21: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:229:21:    expected unsigned short [usertype] val
net/sched/cls_flower.c:229:21:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:229:21: warning: cast from restricted __be16
net/sched/cls_flower.c:229:21: warning: cast from restricted __be16
net/sched/cls_flower.c:229:21: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:229:42: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:230:21: warning: cast from restricted __be16
net/sched/cls_flower.c:230:21: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:230:21:    expected unsigned short [usertype] val
net/sched/cls_flower.c:230:21:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:230:21: warning: cast from restricted __be16
net/sched/cls_flower.c:230:21: warning: cast from restricted __be16
net/sched/cls_flower.c:230:21: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:230:42: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:744:14: warning: cast from restricted __be16
net/sched/cls_flower.c:744:14: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:744:14:    expected unsigned short [usertype] val
net/sched/cls_flower.c:744:14:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:744:14: warning: cast from restricted __be16
net/sched/cls_flower.c:744:14: warning: cast from restricted __be16
net/sched/cls_flower.c:744:40: warning: cast from restricted __be16
net/sched/cls_flower.c:744:40: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:744:40:    expected unsigned short [usertype] val
net/sched/cls_flower.c:744:40:    got restricted __be16 [usertype] dst
net/sched/cls_flower.c:744:40: warning: cast from restricted __be16
net/sched/cls_flower.c:744:40: warning: cast from restricted __be16
net/sched/cls_flower.c:744:14: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:744:40: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:746:15: warning: cast from restricted __be16
net/sched/cls_flower.c:746:15: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:746:15:    expected unsigned short [usertype] val
net/sched/cls_flower.c:746:15:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:746:15: warning: cast from restricted __be16
net/sched/cls_flower.c:746:15: warning: cast from restricted __be16
net/sched/cls_flower.c:746:41: warning: cast from restricted __be16
net/sched/cls_flower.c:746:41: warning: incorrect type in argument 1 (different base types)
net/sched/cls_flower.c:746:41:    expected unsigned short [usertype] val
net/sched/cls_flower.c:746:41:    got restricted __be16 [usertype] src
net/sched/cls_flower.c:746:41: warning: cast from restricted __be16
net/sched/cls_flower.c:746:41: warning: cast from restricted __be16
net/sched/cls_flower.c:746:15: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:746:41: warning: restricted __be16 degrades to integer
net/sched/cls_flower.c:830:15: warning: cast to restricted __be32
net/sched/cls_flower.c:830:15: warning: cast to restricted __be32
net/sched/cls_flower.c:830:15: warning: cast to restricted __be32
net/sched/cls_flower.c:830:15: warning: cast to restricted __be32
net/sched/cls_flower.c:830:15: warning: cast to restricted __be32
net/sched/cls_flower.c:830:15: warning: cast to restricted __be32
net/sched/cls_flower.c:831:16: warning: cast to restricted __be32
net/sched/cls_flower.c:831:16: warning: cast to restricted __be32
net/sched/cls_flower.c:831:16: warning: cast to restricted __be32
net/sched/cls_flower.c:831:16: warning: cast to restricted __be32
net/sched/cls_flower.c:831:16: warning: cast to restricted __be32
net/sched/cls_flower.c:831:16: warning: cast to restricted __be32
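
As an aside for readers unfamiliar with these messages: they are sparse's
usual endianness complaints about mixing bitwise types such as __be16 with
plain integers, along the lines of the hypothetical example below (not code
from cls_flower.c); the typical fix is an explicit conversion such as
be16_to_cpu()/ntohs().

#include <linux/types.h>
#include <asm/byteorder.h>

static u16 record_port(__be16 port)
{
	u16 host_port;

	host_port = (u16)port;		/* sparse: cast from restricted __be16 */
	host_port = be16_to_cpu(port);	/* explicit conversion, no warning */
	return host_port;
}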

Cong, Jiri, do you want me to send a patch that fixes all of these first?

Regards,
Vlad

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference
  2019-02-28 18:35                     ` Vlad Buslov
@ 2019-03-02  0:51                       ` Cong Wang
  0 siblings, 0 replies; 44+ messages in thread
From: Cong Wang @ 2019-03-02  0:51 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Jiri Pirko, Linux Kernel Network Developers, Jamal Hadi Salim,
	David Miller

On Thu, Feb 28, 2019 at 10:35 AM Vlad Buslov <vladbu@mellanox.com> wrote:
>
>
> On Thu 28 Feb 2019 at 00:49, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Tue, Feb 26, 2019 at 6:57 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >>
> >>
> >> On Mon 25 Feb 2019 at 22:39, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> > On Mon, Feb 25, 2019 at 8:11 AM Vlad Buslov <vladbu@mellanox.com> wrote:
> >> >>
> >> >>
> >> >> On Fri 22 Feb 2019 at 19:32, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >> >> >
> >> >> > So if it is no longer RCU any more, why do you still use
> >> >> > rcu_dereference_protected()? That is, why not just deref it as a raw
> >> >> > pointer?
> >> >
> >> >
> >> > Any answer for this question?
> >>
> >> I decided that since there is neither possibility of concurrent pointer
> >> assignment nor deallocation of object that it points to, most performant
> >> solution would be using rcu_dereference_protected() which is the only
> >> RCU dereference helper that doesn't use READ_ONCE. I now understand that
> >> this is confusing (and most likely doesn't provide any noticeable
> >> performance improvement anyway!) and will change this patch to use
> >> rcu_dereference_raw() as you suggest.
> >
> > Yeah, please make sure sparse is happy with that. :)
>
> I checked my flower change with sparse. It produced a lot of warnings,
> some of which are several years old. None in the code I changed though:

If so, we can address this later; it is not urgent.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-03-02  0:51 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-14  7:47 [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Vlad Buslov
2019-02-18 19:08   ` Cong Wang
2019-02-19  9:45     ` Vlad Buslov
2019-02-20 22:33       ` Cong Wang
2019-02-21 17:45         ` Vlad Buslov
2019-02-22 19:32           ` Cong Wang
2019-02-25 16:11             ` Vlad Buslov
2019-02-25 22:39               ` Cong Wang
2019-02-26 14:57                 ` Vlad Buslov
2019-02-28  0:49                   ` Cong Wang
2019-02-28 18:35                     ` Vlad Buslov
2019-03-02  0:51                       ` Cong Wang
2019-02-14  7:47 ` [PATCH net-next 02/12] net: sched: flower: refactor fl_change Vlad Buslov
2019-02-14 20:34   ` Stefano Brivio
2019-02-15 10:38     ` Vlad Buslov
2019-02-15 10:47       ` Stefano Brivio
2019-02-15 16:25         ` Vlad Buslov
2019-02-18 18:20           ` Stefano Brivio
2019-02-14  7:47 ` [PATCH net-next 03/12] net: sched: flower: introduce reference counting for filters Vlad Buslov
2019-02-14 20:34   ` Stefano Brivio
2019-02-15 11:22     ` Vlad Buslov
2019-02-15 12:32       ` Stefano Brivio
2019-02-14  7:47 ` [PATCH net-next 04/12] net: sched: flower: track filter deletion with flag Vlad Buslov
2019-02-14 20:49   ` Stefano Brivio
2019-02-15 15:54     ` Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 05/12] net: sched: flower: add reference counter to flower mask Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 06/12] net: sched: flower: handle concurrent mask insertion Vlad Buslov
2019-02-15 22:46   ` Stefano Brivio
2019-02-14  7:47 ` [PATCH net-next 07/12] net: sched: flower: protect masks list with spinlock Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 08/12] net: sched: flower: handle concurrent filter insertion in fl_change Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 09/12] net: sched: flower: handle concurrent tcf proto deletion Vlad Buslov
2019-02-18 20:47   ` Cong Wang
2019-02-19 14:08     ` Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 10/12] net: sched: flower: protect flower classifier state with spinlock Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 11/12] net: sched: flower: track rtnl lock state Vlad Buslov
2019-02-15 22:46   ` Stefano Brivio
2019-02-18  9:35     ` Vlad Buslov
2019-02-14  7:47 ` [PATCH net-next 12/12] net: sched: flower: set unlocked flag for flower proto ops Vlad Buslov
2019-02-18 19:27   ` Cong Wang
2019-02-19 10:15     ` Vlad Buslov
2019-02-20 22:36       ` Cong Wang
2019-02-18 19:15 ` [PATCH net-next 00/12] Refactor flower classifier to remove dependency on rtnl lock Cong Wang
2019-02-19 10:00   ` Vlad Buslov
