netdev.vger.kernel.org archive mirror
* [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers
@ 2019-08-22 12:43 Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore Vlad Buslov
                   ` (9 more replies)
  0 siblings, 10 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo, Vlad Buslov

Currently, all cls API hardware offload driver callbacks require the
caller to hold rtnl lock when calling them. This patch set introduces a
new API that allows drivers to register callbacks that do not depend on
rtnl lock, and unlocked classifiers to offload filters without obtaining
rtnl lock first, which is intended to allow offloading tc rules in
parallel.

Recently, the new rtnl registration flag RTNL_FLAG_DOIT_UNLOCKED was
added. TC rule update handlers (RTM_NEWTFILTER, RTM_DELTFILTER, etc.) are
already registered with this flag and only take rtnl lock when the qdisc
or classifier requires it. Classifiers can indicate that their ops
callbacks don't require the caller to hold rtnl lock by setting the
TCF_PROTO_OPS_DOIT_UNLOCKED flag. An unlocked implementation of the
flower classifier is already upstream. However, it still obtains rtnl
lock before calling the hardware offloads API.

Implement the following cls API changes:

- Introduce a new "unlocked_driver_cb" flag to struct flow_block_offload
  to allow registering and unregistering block hardware offload callbacks
  that do not require the caller to hold rtnl lock. Drivers that don't
  require users of their tc offload callbacks to hold rtnl lock set the
  flag to true on block bind/unbind. Internally, tcf_block is extended
  with an additional lockeddevcnt counter that counts the number of
  rtnl-dependent devices the block is bound to. When this counter is
  zero, the tc_setup_cb_*() functions execute callbacks without obtaining
  rtnl lock.

- Extend the cls API single hardware rule update function
  tc_setup_cb_call() with new tc_setup_cb_add(), tc_setup_cb_replace()
  and tc_setup_cb_destroy() functions. These new APIs are needed to move
  management of the block offload counter, the filter's in-hardware
  counter and flag from classifier implementations to the cls API, which
  is now responsible for managing them in a concurrency-safe manner (a
  usage sketch follows this list). Access to cb_list from callback
  execution code is synchronized by taking the new 'cb_lock' rw_semaphore
  in read mode, which allows executing callbacks in parallel but excludes
  any modification of the data by register/unregister code. The tcf_block
  offloads counter type is changed to an atomic integer to allow updating
  the counter concurrently.

- Extend classifier ops with new ops->hw_add() and ops->hw_del()
  callbacks which are used to notify unlocked classifiers when a filter
  is successfully added to or deleted from hardware, without releasing
  cb_lock. This is necessary to update classifier state atomically with
  the callback list traversal and the update of all relevant counters,
  and allows unlocked classifiers to synchronize with concurrent
  reoffload without requiring any changes to driver callback API
  implementations.
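
For illustration, this is roughly how an unlocked classifier is expected
to use the new helpers (a condensed sketch based on the flower conversion
in this series; error handling and unrelated details are trimmed):

  /* offload add: in_hw_count, the IN_HW flag and block->offloadcnt are
   * now updated by cls API itself, under block->cb_lock
   */
  err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
                        skip_sw, &f->flags, &f->in_hw_count, rtnl_held);
  if (err < 0)
          goto errout;  /* previously offloaded state is left intact */

  /* offload destroy: unconditionally drops the filter from the counters */
  tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
                      &f->flags, &f->in_hw_count, rtnl_held);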

The new tc flow_action infrastructure is also modified to allow its
users to execute without rtnl lock protection. tc_setup_flow_action() is
modified to conditionally obtain rtnl lock before accessing action
state. Action data that is accessed by reference is either copied or
reference counted to prevent a concurrent action overwrite from
deallocating it. A new function, tc_cleanup_flow_action(), is introduced
to cleanup/release all such data obtained by tc_setup_flow_action().
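
Roughly, the two calls are expected to be paired around the offload
request itself (a sketch of the intended rtnl-independent usage, based on
the flower changes in this series):

  /* copies/refcounts action data, taking rtnl only if the caller
   * doesn't hold it already
   */
  err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
                             rtnl_held);
  if (err)
          goto errout;

  err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
                        skip_sw, &f->flags, &f->in_hw_count, rtnl_held);

  /* release copied/reference-counted action data */
  tc_cleanup_flow_action(&cls_flower.rule->action);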

The flower classifier (the only unlocked classifier at the moment) is
modified to use the new cls hardware offloads API and no longer obtains
rtnl lock before calling it.

Vlad Buslov (10):
  net: sched: protect block offload-related fields with rw_semaphore
  net: sched: change tcf block offload counter type to atomic_t
  net: sched: refactor block offloads counter usage
  net: sched: notify classifier on successful offload add/delete
  net: sched: add API for registering unlocked offload block callbacks
  net: sched: conditionally obtain rtnl lock in cls hw offloads API
  net: sched: take rtnl lock in tc_setup_flow_action()
  net: sched: take reference to action dev before calling offloads
  net: sched: copy tunnel info when setting flow_action entry->tunnel
  net: sched: flower: don't take rtnl lock for cls hw offloads API

 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +
 .../net/ethernet/mellanox/mlx5/core/en_rep.c  |   3 +
 include/net/flow_offload.h                    |   1 +
 include/net/pkt_cls.h                         |  17 +-
 include/net/sch_generic.h                     |  42 +--
 include/net/tc_act/tc_tunnel_key.h            |  17 +
 net/sched/cls_api.c                           | 322 +++++++++++++++++-
 net/sched/cls_bpf.c                           |  22 +-
 net/sched/cls_flower.c                        | 111 +++---
 net/sched/cls_matchall.c                      |  19 +-
 net/sched/cls_u32.c                           |  17 +-
 11 files changed, 437 insertions(+), 136 deletions(-)

-- 
2.21.0



* [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 22:15   ` Jakub Kicinski
  2019-08-22 12:43 ` [PATCH net-next 02/10] net: sched: change tcf block offload counter type to atomic_t Vlad Buslov
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

In order to remove the dependency on rtnl lock, extend tcf_block with a
'cb_lock' rwsem and use it to protect flow_block->cb_list and related
counters from concurrent modification. The lock is taken in read mode for
read-only traversal of cb_list in tc_setup_cb_call() and in write mode in
all other cases. This approach ensures that:

- cb_list is not changed concurrently while filters are being offloaded
  on the block.

- block->nooffloaddevcnt is checked while holding the lock in read mode,
  but is only changed by bind/unbind code when holding the cb_lock in write
  mode to prevent concurrent modification.
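
In short, the locking pattern introduced here looks like this (condensed
sketch of the hunks below):

  /* filter offload path: parallel callback execution is allowed */
  down_read(&block->cb_lock);
  list_for_each_entry(block_cb, &block->flow_block.cb_list, list)
          block_cb->cb(type, type_data, block_cb->cb_priv);
  up_read(&block->cb_lock);

  /* block bind/unbind path: excludes offloading and other (un)binds */
  down_write(&block->cb_lock);
  cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
  tcf_block_setup(block, &bo);
  up_write(&block->cb_lock);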

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  2 ++
 net/sched/cls_api.c       | 45 +++++++++++++++++++++++++++++++--------
 2 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d9f359af0b93..a3eaf5f9d28f 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -13,6 +13,7 @@
 #include <linux/refcount.h>
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
+#include <linux/rwsem.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 #include <net/flow_offload.h>
@@ -396,6 +397,7 @@ struct tcf_block {
 	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
+	struct rw_semaphore cb_lock; /* protects cb_list and offload counters */
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e0d8b456e9f5..a3b07c6e3f53 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -568,9 +568,11 @@ static void tc_indr_block_ing_cmd(struct net_device *dev,
 
 	bo.block = &block->flow_block;
 
+	down_write(&block->cb_lock);
 	cb(dev, cb_priv, TC_SETUP_BLOCK, &bo);
 
 	tcf_block_setup(block, &bo);
+	up_write(&block->cb_lock);
 }
 
 static struct tcf_block *tc_dev_ingress_block(struct net_device *dev)
@@ -661,6 +663,7 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
 
+	down_write(&block->cb_lock);
 	if (!dev->netdev_ops->ndo_setup_tc)
 		goto no_offload_dev_inc;
 
@@ -669,24 +672,31 @@ static int tcf_block_offload_bind(struct tcf_block *block, struct Qdisc *q,
 	 */
 	if (!tc_can_offload(dev) && tcf_block_offload_in_use(block)) {
 		NL_SET_ERR_MSG(extack, "Bind to offloaded block failed as dev has offload disabled");
-		return -EOPNOTSUPP;
+		err = -EOPNOTSUPP;
+		goto errout;
 	}
 
 	err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_BIND, extack);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_inc;
 	if (err)
-		return err;
+		goto errout;
 
 	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
+	up_write(&block->cb_lock);
 	return 0;
 
 no_offload_dev_inc:
-	if (tcf_block_offload_in_use(block))
-		return -EOPNOTSUPP;
+	if (tcf_block_offload_in_use(block)) {
+		err = -EOPNOTSUPP;
+		goto errout;
+	}
+	err = 0;
 	block->nooffloaddevcnt++;
 	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_BIND, extack);
-	return 0;
+errout:
+	up_write(&block->cb_lock);
+	return err;
 }
 
 static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
@@ -695,6 +705,7 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	struct net_device *dev = q->dev_queue->dev;
 	int err;
 
+	down_write(&block->cb_lock);
 	tc_indr_block_call(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
 
 	if (!dev->netdev_ops->ndo_setup_tc)
@@ -702,10 +713,12 @@ static void tcf_block_offload_unbind(struct tcf_block *block, struct Qdisc *q,
 	err = tcf_block_offload_cmd(block, dev, ei, FLOW_BLOCK_UNBIND, NULL);
 	if (err == -EOPNOTSUPP)
 		goto no_offload_dev_dec;
+	up_write(&block->cb_lock);
 	return;
 
 no_offload_dev_dec:
 	WARN_ON(block->nooffloaddevcnt-- == 0);
+	up_write(&block->cb_lock);
 }
 
 static int
@@ -820,6 +833,7 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 		return ERR_PTR(-ENOMEM);
 	}
 	mutex_init(&block->lock);
+	init_rwsem(&block->cb_lock);
 	flow_block_init(&block->flow_block);
 	INIT_LIST_HEAD(&block->chain_list);
 	INIT_LIST_HEAD(&block->owner_list);
@@ -1355,6 +1369,8 @@ tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
 	struct tcf_proto *tp, *tp_prev;
 	int err;
 
+	lockdep_assert_held(&block->cb_lock);
+
 	for (chain = __tcf_get_next_chain(block, NULL);
 	     chain;
 	     chain_prev = chain,
@@ -1393,6 +1409,8 @@ static int tcf_block_bind(struct tcf_block *block,
 	struct flow_block_cb *block_cb, *next;
 	int err, i = 0;
 
+	lockdep_assert_held(&block->cb_lock);
+
 	list_for_each_entry(block_cb, &bo->cb_list, list) {
 		err = tcf_block_playback_offloads(block, block_cb->cb,
 						  block_cb->cb_priv, true,
@@ -1427,6 +1445,8 @@ static void tcf_block_unbind(struct tcf_block *block,
 {
 	struct flow_block_cb *block_cb, *next;
 
+	lockdep_assert_held(&block->cb_lock);
+
 	list_for_each_entry_safe(block_cb, next, &bo->cb_list, list) {
 		tcf_block_playback_offloads(block, block_cb->cb,
 					    block_cb->cb_priv, false,
@@ -2987,19 +3007,26 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	int ok_count = 0;
 	int err;
 
+	down_read(&block->cb_lock);
 	/* Make sure all netdevs sharing this block are offload-capable. */
-	if (block->nooffloaddevcnt && err_stop)
-		return -EOPNOTSUPP;
+	if (block->nooffloaddevcnt && err_stop) {
+		ok_count = -EOPNOTSUPP;
+		goto errout;
+	}
 
 	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
-			if (err_stop)
-				return err;
+			if (err_stop) {
+				ok_count = err;
+				goto errout;
+			}
 		} else {
 			ok_count++;
 		}
 	}
+errout:
+	up_read(&block->cb_lock);
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
-- 
2.21.0



* [PATCH net-next 02/10] net: sched: change tcf block offload counter type to atomic_t
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 03/10] net: sched: refactor block offloads counter usage Vlad Buslov
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

As a preparation for running proto ops functions without rtnl lock,
change the offload counter type to atomic_t. This is necessary to allow
the counter to be updated by multiple concurrent users when offloading
filters to hardware from unlocked classifiers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h | 7 ++++---
 net/sched/cls_api.c       | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index a3eaf5f9d28f..d778c502decd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -14,6 +14,7 @@
 #include <linux/workqueue.h>
 #include <linux/mutex.h>
 #include <linux/rwsem.h>
+#include <linux/atomic.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
 #include <net/flow_offload.h>
@@ -401,7 +402,7 @@ struct tcf_block {
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
-	unsigned int offloadcnt; /* Number of oddloaded filters */
+	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
 	struct {
 		struct tcf_chain *chain;
@@ -443,7 +444,7 @@ static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
 		return;
 	*flags |= TCA_CLS_FLAGS_IN_HW;
-	block->offloadcnt++;
+	atomic_inc(&block->offloadcnt);
 }
 
 static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
@@ -451,7 +452,7 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
 	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
 		return;
 	*flags &= ~TCA_CLS_FLAGS_IN_HW;
-	block->offloadcnt--;
+	atomic_dec(&block->offloadcnt);
 }
 
 static inline void
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a3b07c6e3f53..8502bd006b37 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -629,7 +629,7 @@ static void tc_indr_block_call(struct tcf_block *block,
 
 static bool tcf_block_offload_in_use(struct tcf_block *block)
 {
-	return block->offloadcnt;
+	return atomic_read(&block->offloadcnt);
 }
 
 static int tcf_block_offload_cmd(struct tcf_block *block,
-- 
2.21.0



* [PATCH net-next 03/10] net: sched: refactor block offloads counter usage
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 02/10] net: sched: change tcf block offload counter type to atomic_t Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 22:53   ` Jakub Kicinski
  2019-08-22 12:43 ` [PATCH net-next 04/10] net: sched: notify classifier on successful offload add/delete Vlad Buslov
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

Without rtnl lock protection, filters can no longer safely manage the
block offloads counter themselves. Refactor the cls API to protect the
block offloadcnt with tcf_block->cb_lock, which is already used to
protect the driver callback list and the nooffloaddevcnt counter. The
counter can be modified by concurrent tasks through the new functions
that execute block callbacks (which is safe given the previous patch that
changed its type to atomic_t); however, block bind/unbind code that
checks the counter value takes cb_lock in write mode to exclude any
concurrent modifications. This approach prevents race conditions between
bind/unbind and callback execution code, but allows concurrency on the tc
rule update path.

Move the block offload counter, filter in-hardware counter and filter
flags management from classifiers into the cls hardware offloads API.
Make tcf_block_offload_inc() and tcf_block_offload_dec() private to the
cls API. Implement the following new cls API functions to be used
instead:

  tc_setup_cb_add() - non-destructive filter add. If a filter that wasn't
  already in hardware is successfully offloaded, increment the block
  offloads counter and set the filter's in-hardware counter and flag. On
  failure, the previously offloaded filter is considered intact and the
  offloads counter is not decremented.

  tc_setup_cb_replace() - destructive filter replace. Release the
  existing filter's block offload counter and reset its in-hardware
  counter and flag. Set the new filter's in-hardware counter and flag. On
  failure, the previously offloaded filter is considered destroyed and
  the offload counter is decremented.

  tc_setup_cb_destroy() - filter destroy. Unconditionally decrement the
  block offloads counter.

Refactor all offload-capable classifiers to atomically offload filters
to hardware, change the block offload counter, and set the filter's
in-hardware counter and flag by means of the new cls API functions.
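
As an example, the u32 knode offload path changes roughly as follows
(condensed from the hunks below):

  /* before: classifier updated the counters itself */
  err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw);
  if (err > 0) {
          n->in_hw_count = err;
          tcf_block_offload_inc(block, &n->flags);
  }

  /* after: cls API updates n->in_hw_count, n->flags and
   * block->offloadcnt internally
   */
  err = tc_setup_cb_add(block, tp, TC_SETUP_CLSU32, &cls_u32, skip_sw,
                        &n->flags, &n->in_hw_count, true);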

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h     |  13 +++-
 include/net/sch_generic.h |  32 +-------
 net/sched/cls_api.c       | 155 ++++++++++++++++++++++++++++++++++----
 net/sched/cls_bpf.c       |  22 +++---
 net/sched/cls_flower.c    |  23 +++---
 net/sched/cls_matchall.c  |  15 ++--
 net/sched/cls_u32.c       |  17 ++---
 7 files changed, 193 insertions(+), 84 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 64999ffcb486..a5937fb3c4b2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -506,7 +506,18 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts);
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
-		     void *type_data, bool err_stop);
+		     void *type_data, bool err_stop, bool rtnl_held);
+int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
+		    enum tc_setup_type type, void *type_data, bool err_stop,
+		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held);
+int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *old_flags, unsigned int *old_in_hw_count,
+			u32 *new_flags, unsigned int *new_in_hw_count,
+			bool rtnl_held);
+int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *flags, unsigned int *in_hw_count, bool rtnl_held);
 unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
 
 struct tc_cls_u32_knode {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index d778c502decd..373f9476c1de 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -439,36 +439,8 @@ static inline bool lockdep_tcf_proto_is_locked(struct tcf_proto *tp)
 #define tcf_proto_dereference(p, tp)					\
 	rcu_dereference_protected(p, lockdep_tcf_proto_is_locked(tp))
 
-static inline void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
-{
-	if (*flags & TCA_CLS_FLAGS_IN_HW)
-		return;
-	*flags |= TCA_CLS_FLAGS_IN_HW;
-	atomic_inc(&block->offloadcnt);
-}
-
-static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
-{
-	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
-		return;
-	*flags &= ~TCA_CLS_FLAGS_IN_HW;
-	atomic_dec(&block->offloadcnt);
-}
-
-static inline void
-tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
-			  u32 *flags, bool add)
-{
-	if (add) {
-		if (!*cnt)
-			tcf_block_offload_inc(block, flags);
-		(*cnt)++;
-	} else {
-		(*cnt)--;
-		if (!*cnt)
-			tcf_block_offload_dec(block, flags);
-	}
-}
+void tc_cls_offload_cnt_update(struct tcf_block *block, struct tcf_proto *tp,
+			       u32 *cnt, u32 *flags, u32 diff, bool add);
 
 static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
 {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 8502bd006b37..4215c849f4a3 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3000,13 +3000,97 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_dump_stats);
 
-int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
-		     void *type_data, bool err_stop)
+static void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
+{
+	if (*flags & TCA_CLS_FLAGS_IN_HW)
+		return;
+	*flags |= TCA_CLS_FLAGS_IN_HW;
+	atomic_inc(&block->offloadcnt);
+}
+
+static void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
+{
+	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
+		return;
+	*flags &= ~TCA_CLS_FLAGS_IN_HW;
+	atomic_dec(&block->offloadcnt);
+}
+
+void tc_cls_offload_cnt_update(struct tcf_block *block, struct tcf_proto *tp,
+			       u32 *cnt, u32 *flags, u32 diff, bool add)
+{
+	lockdep_assert_held(&block->cb_lock);
+
+	spin_lock(&tp->lock);
+	if (add) {
+		if (!*cnt)
+			tcf_block_offload_inc(block, flags);
+		(*cnt) += diff;
+	} else {
+		(*cnt) -= diff;
+		if (!*cnt)
+			tcf_block_offload_dec(block, flags);
+	}
+	spin_unlock(&tp->lock);
+}
+EXPORT_SYMBOL(tc_cls_offload_cnt_update);
+
+static void
+tc_cls_offload_cnt_reset(struct tcf_block *block, struct tcf_proto *tp,
+			 u32 *cnt, u32 *flags)
+{
+	lockdep_assert_held(&block->cb_lock);
+
+	spin_lock(&tp->lock);
+	tcf_block_offload_dec(block, flags);
+	(*cnt) = 0;
+	spin_unlock(&tp->lock);
+}
+
+static int
+__tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
+		   void *type_data, bool err_stop)
 {
 	struct flow_block_cb *block_cb;
 	int ok_count = 0;
 	int err;
 
+	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
+		err = block_cb->cb(type, type_data, block_cb->cb_priv);
+		if (err) {
+			if (err_stop)
+				return err;
+		} else {
+			ok_count++;
+		}
+	}
+	return ok_count;
+}
+
+int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
+		     void *type_data, bool err_stop, bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	up_read(&block->cb_lock);
+	return ok_count;
+}
+EXPORT_SYMBOL(tc_setup_cb_call);
+
+/* Non-destructive filter add. If filter that wasn't already in hardware is
+ * successfully offloaded, increment block offloads counter. On failure,
+ * previously offloaded filter is considered to be intact and offloads counter
+ * is not decremented.
+ */
+
+int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
+		    enum tc_setup_type type, void *type_data, bool err_stop,
+		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
+{
+	int ok_count;
+
 	down_read(&block->cb_lock);
 	/* Make sure all netdevs sharing this block are offload-capable. */
 	if (block->nooffloaddevcnt && err_stop) {
@@ -3014,22 +3098,67 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		goto errout;
 	}
 
-	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
-		err = block_cb->cb(type, type_data, block_cb->cb_priv);
-		if (err) {
-			if (err_stop) {
-				ok_count = err;
-				goto errout;
-			}
-		} else {
-			ok_count++;
-		}
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	if (ok_count > 0)
+		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
+					  ok_count, true);
+errout:
+	up_read(&block->cb_lock);
+	return ok_count;
+}
+EXPORT_SYMBOL(tc_setup_cb_add);
+
+/* Destructive filter replace. If filter that wasn't already in hardware is
+ * successfully offloaded, increment block offload counter. On failure,
+ * previously offloaded filter is considered to be destroyed and offload counter
+ * is decremented.
+ */
+
+int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *old_flags, unsigned int *old_in_hw_count,
+			u32 *new_flags, unsigned int *new_in_hw_count,
+			bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	/* Make sure all netdevs sharing this block are offload-capable. */
+	if (block->nooffloaddevcnt && err_stop) {
+		ok_count = -EOPNOTSUPP;
+		goto errout;
 	}
+
+	tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
+
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+	if (ok_count > 0)
+		tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
+					  ok_count, true);
 errout:
 	up_read(&block->cb_lock);
 	return ok_count;
 }
-EXPORT_SYMBOL(tc_setup_cb_call);
+EXPORT_SYMBOL(tc_setup_cb_replace);
+
+/* Destroy filter and decrement block offload counter, if filter was previously
+ * offloaded.
+ */
+
+int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
+			enum tc_setup_type type, void *type_data, bool err_stop,
+			u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
+{
+	int ok_count;
+
+	down_read(&block->cb_lock);
+	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+
+	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
+	up_read(&block->cb_lock);
+	return ok_count;
+}
+EXPORT_SYMBOL(tc_setup_cb_destroy);
 
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts)
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 3f7a9c02b70c..7f304db7e697 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -162,17 +162,21 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
 	cls_bpf.name = obj->bpf_name;
 	cls_bpf.exts_integrated = obj->exts_integrated;
 
-	if (oldprog)
-		tcf_block_offload_dec(block, &oldprog->gen_flags);
+	if (cls_bpf.oldprog)
+		err = tc_setup_cb_replace(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
+					  skip_sw, &oldprog->gen_flags,
+					  &oldprog->in_hw_count,
+					  &prog->gen_flags, &prog->in_hw_count,
+					  true);
+	else
+		err = tc_setup_cb_add(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
+				      skip_sw, &prog->gen_flags,
+				      &prog->in_hw_count, true);
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
 	if (prog) {
 		if (err < 0) {
 			cls_bpf_offload_cmd(tp, oldprog, prog, extack);
 			return err;
-		} else if (err > 0) {
-			prog->in_hw_count = err;
-			tcf_block_offload_inc(block, &prog->gen_flags);
 		}
 	}
 
@@ -230,7 +234,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
 	cls_bpf.name = prog->bpf_name;
 	cls_bpf.exts_integrated = prog->exts_integrated;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false, true);
 }
 
 static int cls_bpf_init(struct tcf_proto *tp)
@@ -680,8 +684,8 @@ static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb
 			continue;
 		}
 
-		tc_cls_offload_cnt_update(block, &prog->in_hw_count,
-					  &prog->gen_flags, add);
+		tc_cls_offload_cnt_update(block, tp, &prog->in_hw_count,
+					  &prog->gen_flags, 1, add);
 	}
 
 	return 0;
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 054123742e32..0001a933d48b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -419,10 +419,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 	cls_flower.command = FLOW_CLS_DESTROY;
 	cls_flower.cookie = (unsigned long) f;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
+			    &f->flags, &f->in_hw_count, true);
 	spin_lock(&tp->lock);
 	list_del_init(&f->hw_list);
-	tcf_block_offload_dec(block, &f->flags);
 	spin_unlock(&tp->lock);
 
 	if (!rtnl_held)
@@ -466,18 +466,15 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		goto errout;
 	}
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
+	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
+			      skip_sw, &f->flags, &f->in_hw_count, true);
 	kfree(cls_flower.rule);
 
 	if (err < 0) {
 		fl_hw_destroy_filter(tp, f, true, NULL);
 		goto errout;
 	} else if (err > 0) {
-		f->in_hw_count = err;
 		err = 0;
-		spin_lock(&tp->lock);
-		tcf_block_offload_inc(block, &f->flags);
-		spin_unlock(&tp->lock);
 	}
 
 	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
@@ -509,7 +506,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
 	cls_flower.cookie = (unsigned long) f;
 	cls_flower.classid = f->res.classid;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
 
 	tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
 			      cls_flower.stats.pkts,
@@ -1855,10 +1852,8 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			goto next_flow;
 		}
 
-		spin_lock(&tp->lock);
-		tc_cls_offload_cnt_update(block, &f->in_hw_count, &f->flags,
-					  add);
-		spin_unlock(&tp->lock);
+		tc_cls_offload_cnt_update(block, tp, &f->in_hw_count, &f->flags,
+					  1, add);
 next_flow:
 		__fl_put(f);
 	}
@@ -1886,7 +1881,7 @@ static int fl_hw_create_tmplt(struct tcf_chain *chain,
 	/* We don't care if driver (any of them) fails to handle this
 	 * call. It serves just as a hint for it.
 	 */
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
 	kfree(cls_flower.rule);
 
 	return 0;
@@ -1902,7 +1897,7 @@ static void fl_hw_destroy_tmplt(struct tcf_chain *chain,
 	cls_flower.command = FLOW_CLS_TMPLT_DESTROY;
 	cls_flower.cookie = (unsigned long) tmplt;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
 }
 
 static void *fl_tmplt_create(struct net *net, struct tcf_chain *chain,
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 455ea2793f9b..d38e329d71bd 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -75,8 +75,8 @@ static void mall_destroy_hw_filter(struct tcf_proto *tp,
 	cls_mall.command = TC_CLSMATCHALL_DESTROY;
 	cls_mall.cookie = cookie;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false);
-	tcf_block_offload_dec(block, &head->flags);
+	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true);
+	head->flags &= ~TCA_CLS_FLAGS_IN_HW;
 }
 
 static int mall_replace_hw_filter(struct tcf_proto *tp,
@@ -109,15 +109,13 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 		return err;
 	}
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, skip_sw);
+	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSMATCHALL, &cls_mall,
+			      skip_sw, &head->flags, &head->in_hw_count, true);
 	kfree(cls_mall.rule);
 
 	if (err < 0) {
 		mall_destroy_hw_filter(tp, head, cookie, NULL);
 		return err;
-	} else if (err > 0) {
-		head->in_hw_count = err;
-		tcf_block_offload_inc(block, &head->flags);
 	}
 
 	if (skip_sw && !(head->flags & TCA_CLS_FLAGS_IN_HW))
@@ -321,7 +319,8 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		return 0;
 	}
 
-	tc_cls_offload_cnt_update(block, &head->in_hw_count, &head->flags, add);
+	tc_cls_offload_cnt_update(block, tp, &head->in_hw_count, &head->flags,
+				  1, add);
 
 	return 0;
 }
@@ -337,7 +336,7 @@ static void mall_stats_hw_filter(struct tcf_proto *tp,
 	cls_mall.command = TC_CLSMATCHALL_STATS;
 	cls_mall.cookie = cookie;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true);
 
 	tcf_exts_stats_update(&head->exts, cls_mall.stats.bytes,
 			      cls_mall.stats.pkts, cls_mall.stats.lastused);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index 8614088edd1b..94c6eca191f9 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -480,7 +480,7 @@ static void u32_clear_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
 	cls_u32.hnode.handle = h->handle;
 	cls_u32.hnode.prio = h->prio;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, false);
+	tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, false, true);
 }
 
 static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
@@ -498,7 +498,7 @@ static int u32_replace_hw_hnode(struct tcf_proto *tp, struct tc_u_hnode *h,
 	cls_u32.hnode.handle = h->handle;
 	cls_u32.hnode.prio = h->prio;
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw);
+	err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw, true);
 	if (err < 0) {
 		u32_clear_hw_hnode(tp, h, NULL);
 		return err;
@@ -522,8 +522,8 @@ static void u32_remove_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	cls_u32.command = TC_CLSU32_DELETE_KNODE;
 	cls_u32.knode.handle = n->handle;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, false);
-	tcf_block_offload_dec(block, &n->flags);
+	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSU32, &cls_u32, false,
+			    &n->flags, &n->in_hw_count, true);
 }
 
 static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
@@ -552,13 +552,11 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	if (n->ht_down)
 		cls_u32.knode.link_handle = ht->handle;
 
-	err = tc_setup_cb_call(block, TC_SETUP_CLSU32, &cls_u32, skip_sw);
+	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSU32, &cls_u32, skip_sw,
+			      &n->flags, &n->in_hw_count, true);
 	if (err < 0) {
 		u32_remove_hw_knode(tp, n, NULL);
 		return err;
-	} else if (err > 0) {
-		n->in_hw_count = err;
-		tcf_block_offload_inc(block, &n->flags);
 	}
 
 	if (skip_sw && !(n->flags & TCA_CLS_FLAGS_IN_HW))
@@ -1208,7 +1206,8 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 		return 0;
 	}
 
-	tc_cls_offload_cnt_update(block, &n->in_hw_count, &n->flags, add);
+	tc_cls_offload_cnt_update(block, tp, &n->in_hw_count, &n->flags, 1,
+				  add);
 
 	return 0;
 }
-- 
2.21.0



* [PATCH net-next 04/10] net: sched: notify classifier on successful offload add/delete
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (2 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 03/10] net: sched: refactor block offloads counter usage Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 22:58   ` Jakub Kicinski
  2019-08-22 12:43 ` [PATCH net-next 05/10] net: sched: add API for registering unlocked offload block callbacks Vlad Buslov
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

To remove the dependency on rtnl lock, extend classifier ops with new
ops->hw_add() and ops->hw_del() callbacks. Call them from the cls API
while holding cb_lock every time a filter is successfully added to or
deleted from hardware.

Implement the new API in the flower classifier. Use it to manage the
hw_filters list under cb_lock protection, instead of relying on rtnl lock
to synchronize with a concurrent fl_reoffload() call.
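
The hooks are plain notifications invoked by the cls API while
block->cb_lock is still held (condensed sketch of the cls_api.c changes
below):

  /* tc_setup_cb_add()/tc_setup_cb_replace(): after driver callbacks
   * succeeded for the new filter
   */
  if (ok_count >= 0 && tp->ops->hw_add)
          tp->ops->hw_add(tp, type_data);

  /* tc_setup_cb_replace()/tc_setup_cb_destroy(): once the old filter is
   * no longer offloaded
   */
  if (tp->ops->hw_del)
          tp->ops->hw_del(tp, type_data);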

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/sch_generic.h |  4 ++++
 net/sched/cls_api.c       | 25 +++++++++++++++++++------
 net/sched/cls_flower.c    | 33 ++++++++++++++++++++++++++-------
 3 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 373f9476c1de..4ad43335cae5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -312,6 +312,10 @@ struct tcf_proto_ops {
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
 					     flow_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
+	void			(*hw_add)(struct tcf_proto *tp,
+					  void *type_data);
+	void			(*hw_del)(struct tcf_proto *tp,
+					  void *type_data);
 	void			(*bind_class)(void *, u32, unsigned long);
 	void *			(*tmplt_create)(struct net *net,
 						struct tcf_chain *chain,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4215c849f4a3..d8ef7a9e6906 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3099,9 +3099,13 @@ int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
 	}
 
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
-	if (ok_count > 0)
-		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
-					  ok_count, true);
+	if (ok_count >= 0) {
+		if (tp->ops->hw_add)
+			tp->ops->hw_add(tp, type_data);
+		if (ok_count > 0)
+			tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
+						  ok_count, true);
+	}
 errout:
 	up_read(&block->cb_lock);
 	return ok_count;
@@ -3130,11 +3134,17 @@ int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
 	}
 
 	tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
+	if (tp->ops->hw_del)
+		tp->ops->hw_del(tp, type_data);
 
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
-	if (ok_count > 0)
-		tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
-					  ok_count, true);
+	if (ok_count >= 0) {
+		if (tp->ops->hw_add)
+			tp->ops->hw_add(tp, type_data);
+		if (ok_count > 0)
+			tc_cls_offload_cnt_update(block, tp, new_in_hw_count,
+						  new_flags, ok_count, true);
+	}
 errout:
 	up_read(&block->cb_lock);
 	return ok_count;
@@ -3155,6 +3165,9 @@ int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
 
 	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
+	if (tp->ops->hw_del)
+		tp->ops->hw_del(tp, type_data);
+
 	up_read(&block->cb_lock);
 	return ok_count;
 }
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0001a933d48b..cd1686a5abe7 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -421,9 +421,6 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 
 	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
 			    &f->flags, &f->in_hw_count, true);
-	spin_lock(&tp->lock);
-	list_del_init(&f->hw_list);
-	spin_unlock(&tp->lock);
 
 	if (!rtnl_held)
 		rtnl_unlock();
@@ -433,7 +430,6 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 				struct cls_fl_filter *f, bool rtnl_held,
 				struct netlink_ext_ack *extack)
 {
-	struct cls_fl_head *head = fl_head_dereference(tp);
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
 	bool skip_sw = tc_skip_sw(f->flags);
@@ -482,9 +478,6 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 		goto errout;
 	}
 
-	spin_lock(&tp->lock);
-	list_add(&f->hw_list, &head->hw_filters);
-	spin_unlock(&tp->lock);
 errout:
 	if (!rtnl_held)
 		rtnl_unlock();
@@ -1861,6 +1854,30 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 	return 0;
 }
 
+static void fl_hw_add(struct tcf_proto *tp, void *type_data)
+{
+	struct flow_cls_offload *cls_flower = type_data;
+	struct cls_fl_filter *f =
+		(struct cls_fl_filter *) cls_flower->cookie;
+	struct cls_fl_head *head = fl_head_dereference(tp);
+
+	spin_lock(&tp->lock);
+	list_add(&f->hw_list, &head->hw_filters);
+	spin_unlock(&tp->lock);
+}
+
+static void fl_hw_del(struct tcf_proto *tp, void *type_data)
+{
+	struct flow_cls_offload *cls_flower = type_data;
+	struct cls_fl_filter *f =
+		(struct cls_fl_filter *) cls_flower->cookie;
+
+	spin_lock(&tp->lock);
+	if (!list_empty(&f->hw_list))
+		list_del_init(&f->hw_list);
+	spin_unlock(&tp->lock);
+}
+
 static int fl_hw_create_tmplt(struct tcf_chain *chain,
 			      struct fl_flow_tmplt *tmplt)
 {
@@ -2521,6 +2538,8 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = {
 	.delete		= fl_delete,
 	.walk		= fl_walk,
 	.reoffload	= fl_reoffload,
+	.hw_add		= fl_hw_add,
+	.hw_del		= fl_hw_del,
 	.dump		= fl_dump,
 	.bind_class	= fl_bind_class,
 	.tmplt_create	= fl_tmplt_create,
-- 
2.21.0



* [PATCH net-next 05/10] net: sched: add API for registering unlocked offload block callbacks
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (3 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 04/10] net: sched: notify classifier on successful offload add/delete Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 06/10] net: sched: conditionally obtain rtnl lock in cls hw offloads API Vlad Buslov
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

Extend struct flow_block_offload with an "unlocked_driver_cb" flag to
allow registering and unregistering block hardware offload callbacks that
do not require the caller to hold rtnl lock. Extend tcf_block with an
additional lockeddevcnt counter that is incremented for each non-unlocked
driver callback attached to the device. This counter is necessary to
conditionally obtain rtnl lock before calling hardware callbacks in the
following patches.

Register mlx5 tc block offload callbacks as "unlocked".
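
A driver opts in by setting the flag before handling the block bind,
e.g. (condensed from the mlx5 hunks below; remaining arguments omitted):

  case TC_SETUP_BLOCK:
          /* callbacks registered by this bind may run without rtnl */
          f->unlocked_driver_cb = true;
          return flow_block_cb_setup_simple(type_data,
                                            &mlx5e_block_cb_list,
                                            mlx5e_setup_tc_block_cb,
                                            ...);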

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c  | 3 +++
 include/net/flow_offload.h                        | 1 +
 include/net/sch_generic.h                         | 1 +
 net/sched/cls_api.c                               | 6 ++++++
 5 files changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 7fdea6479ff6..8abdef1f8470 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3469,10 +3469,12 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			  void *type_data)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct flow_block_offload *f = type_data;
 
 	switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
 	case TC_SETUP_BLOCK:
+		f->unlocked_driver_cb = true;
 		return flow_block_cb_setup_simple(type_data,
 						  &mlx5e_block_cb_list,
 						  mlx5e_setup_tc_block_cb,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 3c0d36b2b91c..e7ac6233037d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -763,6 +763,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 	if (f->binder_type != FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		return -EOPNOTSUPP;
 
+	f->unlocked_driver_cb = true;
 	f->driver_block_list = &mlx5e_block_cb_list;
 
 	switch (f->command) {
@@ -1245,9 +1246,11 @@ static int mlx5e_rep_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			      void *type_data)
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct flow_block_offload *f = type_data;
 
 	switch (type) {
 	case TC_SETUP_BLOCK:
+		f->unlocked_driver_cb = true;
 		return flow_block_cb_setup_simple(type_data,
 						  &mlx5e_rep_block_cb_list,
 						  mlx5e_rep_setup_tc_cb,
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 757fa84de654..fc881875f856 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -284,6 +284,7 @@ struct flow_block_offload {
 	enum flow_block_command command;
 	enum flow_block_binder_type binder_type;
 	bool block_shared;
+	bool unlocked_driver_cb;
 	struct net *net;
 	struct flow_block *block;
 	struct list_head cb_list;
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4ad43335cae5..d0c282e3630b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -408,6 +408,7 @@ struct tcf_block {
 	bool keep_dst;
 	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
+	unsigned int lockeddevcnt; /* Number of devs that require rtnl lock. */
 	struct {
 		struct tcf_chain *chain;
 		struct list_head filter_chain_list;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index d8ef7a9e6906..02a547aa77c0 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1418,6 +1418,8 @@ static int tcf_block_bind(struct tcf_block *block,
 						  bo->extack);
 		if (err)
 			goto err_unroll;
+		if (!bo->unlocked_driver_cb)
+			block->lockeddevcnt++;
 
 		i++;
 	}
@@ -1433,6 +1435,8 @@ static int tcf_block_bind(struct tcf_block *block,
 						    block_cb->cb_priv, false,
 						    tcf_block_offload_in_use(block),
 						    NULL);
+			if (!bo->unlocked_driver_cb)
+				block->lockeddevcnt--;
 		}
 		flow_block_cb_free(block_cb);
 	}
@@ -1454,6 +1458,8 @@ static void tcf_block_unbind(struct tcf_block *block,
 					    NULL);
 		list_del(&block_cb->list);
 		flow_block_cb_free(block_cb);
+		if (!bo->unlocked_driver_cb)
+			block->lockeddevcnt--;
 	}
 }
 
-- 
2.21.0



* [PATCH net-next 06/10] net: sched: conditionally obtain rtnl lock in cls hw offloads API
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (4 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 05/10] net: sched: add API for registering unlocked offload block callbacks Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 23:03   ` Jakub Kicinski
  2019-08-22 12:43 ` [PATCH net-next 07/10] net: sched: take rtnl lock in tc_setup_flow_action() Vlad Buslov
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

In order to remove the dependency on rtnl lock from the offload code of
classifiers, take rtnl lock conditionally before executing driver
callbacks. Only obtain rtnl lock if the block is bound to devices that
require it.

Block bind/unbind code is rtnl-locked and obtains block->cb_lock while
holding rtnl lock. Obtain the locks in the same order in the
tc_setup_cb_*() functions to prevent deadlock.
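
The resulting pattern in every tc_setup_cb_*() helper is (condensed from
the hunks below):

  retry:
          if (take_rtnl)
                  rtnl_lock();
          down_read(&block->cb_lock);
          /* bind/unbind takes cb_lock while holding rtnl, so if an
           * rtnl-dependent dev is bound to the block, restart in the
           * same lock order
           */
          if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
                  up_read(&block->cb_lock);
                  take_rtnl = true;
                  goto retry;
          }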

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_api.c | 65 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 02a547aa77c0..bda42f1b5514 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3076,11 +3076,28 @@ __tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		     void *type_data, bool err_stop, bool rtnl_held)
 {
+	bool take_rtnl = false;
 	int ok_count;
 
+retry:
+	if (take_rtnl)
+		rtnl_lock();
 	down_read(&block->cb_lock);
+	/* Need to obtain rtnl lock if block is bound to devs that require it.
+	 * In block bind code cb_lock is obtained while holding rtnl, so we must
+	 * obtain the locks in same order here.
+	 */
+	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
+		up_read(&block->cb_lock);
+		take_rtnl = true;
+		goto retry;
+	}
+
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
+
 	up_read(&block->cb_lock);
+	if (take_rtnl)
+		rtnl_unlock();
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_call);
@@ -3095,9 +3112,23 @@ int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
 		    enum tc_setup_type type, void *type_data, bool err_stop,
 		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
 {
+	bool take_rtnl = false;
 	int ok_count;
 
+retry:
+	if (take_rtnl)
+		rtnl_lock();
 	down_read(&block->cb_lock);
+	/* Need to obtain rtnl lock if block is bound to devs that require it.
+	 * In block bind code cb_lock is obtained while holding rtnl, so we must
+	 * obtain the locks in same order here.
+	 */
+	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
+		up_read(&block->cb_lock);
+		take_rtnl = true;
+		goto retry;
+	}
+
 	/* Make sure all netdevs sharing this block are offload-capable. */
 	if (block->nooffloaddevcnt && err_stop) {
 		ok_count = -EOPNOTSUPP;
@@ -3114,6 +3145,8 @@ int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
 	}
 errout:
 	up_read(&block->cb_lock);
+	if (take_rtnl)
+		rtnl_unlock();
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_add);
@@ -3130,9 +3163,23 @@ int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
 			u32 *new_flags, unsigned int *new_in_hw_count,
 			bool rtnl_held)
 {
+	bool take_rtnl = false;
 	int ok_count;
 
+retry:
+	if (take_rtnl)
+		rtnl_lock();
 	down_read(&block->cb_lock);
+	/* Need to obtain rtnl lock if block is bound to devs that require it.
+	 * In block bind code cb_lock is obtained while holding rtnl, so we must
+	 * obtain the locks in same order here.
+	 */
+	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
+		up_read(&block->cb_lock);
+		take_rtnl = true;
+		goto retry;
+	}
+
 	/* Make sure all netdevs sharing this block are offload-capable. */
 	if (block->nooffloaddevcnt && err_stop) {
 		ok_count = -EOPNOTSUPP;
@@ -3153,6 +3200,8 @@ int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
 	}
 errout:
 	up_read(&block->cb_lock);
+	if (take_rtnl)
+		rtnl_unlock();
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_replace);
@@ -3165,9 +3214,23 @@ int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
 			enum tc_setup_type type, void *type_data, bool err_stop,
 			u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
 {
+	bool take_rtnl = false;
 	int ok_count;
 
+retry:
+	if (take_rtnl)
+		rtnl_lock();
 	down_read(&block->cb_lock);
+	/* Need to obtain rtnl lock if block is bound to devs that require it.
+	 * In block bind code cb_lock is obtained while holding rtnl, so we must
+	 * obtain the locks in same order here.
+	 */
+	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
+		up_read(&block->cb_lock);
+		take_rtnl = true;
+		goto retry;
+	}
+
 	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
 
 	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
@@ -3175,6 +3238,8 @@ int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
 		tp->ops->hw_del(tp, type_data);
 
 	up_read(&block->cb_lock);
+	if (take_rtnl)
+		rtnl_unlock();
 	return ok_count;
 }
 EXPORT_SYMBOL(tc_setup_cb_destroy);
-- 
2.21.0



* [PATCH net-next 07/10] net: sched: take rtnl lock in tc_setup_flow_action()
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (5 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 06/10] net: sched: conditionally obtain rtnl lock in cls hw offloads API Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 08/10] net: sched: take reference to action dev before calling offloads Vlad Buslov
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

In order to allow using the new flow_action infrastructure from unlocked
classifiers, modify tc_setup_flow_action() to accept a new 'rtnl_held'
argument and take rtnl lock before accessing tc_action data when the
caller does not already hold it. This is necessary to protect against
concurrent action replace.
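
The locking added by this patch reduces to (sketch):

  if (!rtnl_held)
          rtnl_lock();

  /* ... translate tcf_exts actions into flow_action entries ... */

  err_out:
  if (!rtnl_held)
          rtnl_unlock();
  return err;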

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h    |  2 +-
 net/sched/cls_api.c      | 17 +++++++++++++----
 net/sched/cls_flower.c   |  6 ++++--
 net/sched/cls_matchall.c |  4 ++--
 4 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a5937fb3c4b2..4d861426507c 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -504,7 +504,7 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 }
 
 int tc_setup_flow_action(struct flow_action *flow_action,
-			 const struct tcf_exts *exts);
+			 const struct tcf_exts *exts, bool rtnl_held);
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		     void *type_data, bool err_stop, bool rtnl_held);
 int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index bda42f1b5514..31680493b2b1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3245,14 +3245,17 @@ int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
 EXPORT_SYMBOL(tc_setup_cb_destroy);
 
 int tc_setup_flow_action(struct flow_action *flow_action,
-			 const struct tcf_exts *exts)
+			 const struct tcf_exts *exts, bool rtnl_held)
 {
 	const struct tc_action *act;
-	int i, j, k;
+	int i, j, k, err = 0;
 
 	if (!exts)
 		return 0;
 
+	if (!rtnl_held)
+		rtnl_lock();
+
 	j = 0;
 	tcf_exts_for_each_action(i, act, exts) {
 		struct flow_action_entry *entry;
@@ -3297,6 +3300,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 				entry->vlan.prio = tcf_vlan_push_prio(act);
 				break;
 			default:
+				err = -EOPNOTSUPP;
 				goto err_out;
 			}
 		} else if (is_tcf_tunnel_set(act)) {
@@ -3314,6 +3318,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 					entry->id = FLOW_ACTION_ADD;
 					break;
 				default:
+					err = -EOPNOTSUPP;
 					goto err_out;
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
@@ -3372,15 +3377,19 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			entry->id = FLOW_ACTION_PTYPE;
 			entry->ptype = tcf_skbedit_ptype(act);
 		} else {
+			err = -EOPNOTSUPP;
 			goto err_out;
 		}
 
 		if (!is_tcf_pedit(act))
 			j++;
 	}
-	return 0;
+
 err_out:
-	return -EOPNOTSUPP;
+	if (!rtnl_held)
+		rtnl_unlock();
+
+	return err;
 }
 EXPORT_SYMBOL(tc_setup_flow_action);
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index cd1686a5abe7..006759b9c78a 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -452,7 +452,8 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	cls_flower.rule->match.key = &f->mkey;
 	cls_flower.classid = f->res.classid;
 
-	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
+	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
+				   true);
 	if (err) {
 		kfree(cls_flower.rule);
 		if (skip_sw)
@@ -1821,7 +1822,8 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		cls_flower.rule->match.mask = &f->mask->key;
 		cls_flower.rule->match.key = &f->mkey;
 
-		err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts);
+		err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
+					   true);
 		if (err) {
 			kfree(cls_flower.rule);
 			if (tc_skip_sw(f->flags)) {
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index d38e329d71bd..52e6ef7538b7 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -97,7 +97,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
 	cls_mall.command = TC_CLSMATCHALL_REPLACE;
 	cls_mall.cookie = cookie;
 
-	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts);
+	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts, true);
 	if (err) {
 		kfree(cls_mall.rule);
 		mall_destroy_hw_filter(tp, head, cookie, NULL);
@@ -300,7 +300,7 @@ static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		TC_CLSMATCHALL_REPLACE : TC_CLSMATCHALL_DESTROY;
 	cls_mall.cookie = (unsigned long)head;
 
-	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts);
+	err = tc_setup_flow_action(&cls_mall.rule->action, &head->exts, true);
 	if (err) {
 		kfree(cls_mall.rule);
 		if (add && tc_skip_sw(head->flags)) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 08/10] net: sched: take reference to action dev before calling offloads
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (6 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 07/10] net: sched: take rtnl lock in tc_setup_flow_action() Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 09/10] net: sched: copy tunnel info when setting flow_action entry->tunnel Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 10/10] net: sched: flower: don't take rtnl lock for cls hw offloads API Vlad Buslov
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

In order to remove the dependency on rtnl lock when calling the hardware
offload API, take a reference to the mirred action's dev when initializing
the flow_action structure in tc_setup_flow_action(). Implement a new
function, tc_cleanup_flow_action(), and use it to release the device after
the hardware offload API is done using it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/pkt_cls.h  |  2 ++
 net/sched/cls_api.c    | 32 ++++++++++++++++++++++++++++++++
 net/sched/cls_flower.c |  2 ++
 3 files changed, 36 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 4d861426507c..b20017793952 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -505,6 +505,8 @@ tcf_match_indev(struct sk_buff *skb, int ifindex)
 
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts, bool rtnl_held);
+void tc_cleanup_flow_action(struct flow_action *flow_action);
+
 int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 		     void *type_data, bool err_stop, bool rtnl_held);
 int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 31680493b2b1..96adbff9e845 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3244,6 +3244,27 @@ int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
 }
 EXPORT_SYMBOL(tc_setup_cb_destroy);
 
+void tc_cleanup_flow_action(struct flow_action *flow_action)
+{
+	struct flow_action_entry *entry;
+	int i;
+
+	flow_action_for_each(i, entry, flow_action) {
+		switch (entry->id) {
+		case FLOW_ACTION_REDIRECT:
+		case FLOW_ACTION_MIRRED:
+		case FLOW_ACTION_REDIRECT_INGRESS:
+		case FLOW_ACTION_MIRRED_INGRESS:
+			if (entry->dev)
+				dev_put(entry->dev);
+			break;
+		default:
+			break;
+		}
+	}
+}
+EXPORT_SYMBOL(tc_cleanup_flow_action);
+
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts, bool rtnl_held)
 {
@@ -3273,15 +3294,23 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 		} else if (is_tcf_mirred_egress_redirect(act)) {
 			entry->id = FLOW_ACTION_REDIRECT;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_mirred_egress_mirror(act)) {
 			entry->id = FLOW_ACTION_MIRRED;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_mirred_ingress_redirect(act)) {
 			entry->id = FLOW_ACTION_REDIRECT_INGRESS;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_mirred_ingress_mirror(act)) {
 			entry->id = FLOW_ACTION_MIRRED_INGRESS;
 			entry->dev = tcf_mirred_dev(act);
+			if (entry->dev)
+				dev_hold(entry->dev);
 		} else if (is_tcf_vlan(act)) {
 			switch (tcf_vlan_action(act)) {
 			case TCA_VLAN_ACT_PUSH:
@@ -3389,6 +3418,9 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 	if (!rtnl_held)
 		rtnl_unlock();
 
+	if (err)
+		tc_cleanup_flow_action(flow_action);
+
 	return err;
 }
 EXPORT_SYMBOL(tc_setup_flow_action);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 006759b9c78a..1cc68702f93d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -465,6 +465,7 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 
 	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
 			      skip_sw, &f->flags, &f->in_hw_count, true);
+	tc_cleanup_flow_action(&cls_flower.rule->action);
 	kfree(cls_flower.rule);
 
 	if (err < 0) {
@@ -1837,6 +1838,7 @@ static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 		cls_flower.classid = f->res.classid;
 
 		err = cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv);
+		tc_cleanup_flow_action(&cls_flower.rule->action);
 		kfree(cls_flower.rule);
 
 		if (err) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 09/10] net: sched: copy tunnel info when setting flow_action entry->tunnel
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (7 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 08/10] net: sched: take reference to action dev before calling offloads Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  2019-08-22 12:43 ` [PATCH net-next 10/10] net: sched: flower: don't take rtnl lock for cls hw offloads API Vlad Buslov
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

In order to remove the dependency on rtnl lock, modify tc_setup_flow_action()
to copy the tunnel info instead of just saving a pointer to the tunnel_key
action's tunnel info. This is necessary to prevent a concurrent action
overwrite from releasing the tunnel info while it is still in use by an
rtnl-unlocked driver.

Implement a helper, tcf_tunnel_info_copy(), that copies the tunnel info with
all of its options to a dynamically allocated memory block. Modify
tc_cleanup_flow_action() to free the dynamically allocated tunnel info.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/tc_act/tc_tunnel_key.h | 17 +++++++++++++++++
 net/sched/cls_api.c                |  9 ++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/net/tc_act/tc_tunnel_key.h b/include/net/tc_act/tc_tunnel_key.h
index 7c3f777c168c..0689d9bcdf84 100644
--- a/include/net/tc_act/tc_tunnel_key.h
+++ b/include/net/tc_act/tc_tunnel_key.h
@@ -59,4 +59,21 @@ static inline struct ip_tunnel_info *tcf_tunnel_info(const struct tc_action *a)
 	return NULL;
 #endif
 }
+
+static inline struct ip_tunnel_info *
+tcf_tunnel_info_copy(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+	struct ip_tunnel_info *tun = tcf_tunnel_info(a);
+
+	if (tun) {
+		size_t tun_size = sizeof(*tun) + tun->options_len;
+		struct ip_tunnel_info *tun_copy = kmemdup(tun, tun_size,
+							  GFP_KERNEL);
+
+		return tun_copy;
+	}
+#endif
+	return NULL;
+}
 #endif /* __NET_TC_TUNNEL_KEY_H */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 96adbff9e845..15423d3a89fd 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3258,6 +3258,9 @@ void tc_cleanup_flow_action(struct flow_action *flow_action)
 			if (entry->dev)
 				dev_put(entry->dev);
 			break;
+		case FLOW_ACTION_TUNNEL_ENCAP:
+			kfree(entry->tunnel);
+			break;
 		default:
 			break;
 		}
@@ -3334,7 +3337,11 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			}
 		} else if (is_tcf_tunnel_set(act)) {
 			entry->id = FLOW_ACTION_TUNNEL_ENCAP;
-			entry->tunnel = tcf_tunnel_info(act);
+			entry->tunnel = tcf_tunnel_info_copy(act);
+			if (!entry->tunnel) {
+				err = -ENOMEM;
+				goto err_out;
+			}
 		} else if (is_tcf_tunnel_release(act)) {
 			entry->id = FLOW_ACTION_TUNNEL_DECAP;
 		} else if (is_tcf_pedit(act)) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 10/10] net: sched: flower: don't take rtnl lock for cls hw offloads API
  2019-08-22 12:43 [PATCH net-next 00/10] Refactor cls hardware offload API to support rtnl-independent drivers Vlad Buslov
                   ` (8 preceding siblings ...)
  2019-08-22 12:43 ` [PATCH net-next 09/10] net: sched: copy tunnel info when setting flow_action entry->tunnel Vlad Buslov
@ 2019-08-22 12:43 ` Vlad Buslov
  9 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-22 12:43 UTC (permalink / raw)
  To: netdev
  Cc: jhs, xiyou.wangcong, jiri, davem, jakub.kicinski, pablo,
	Vlad Buslov, Jiri Pirko

Don't manually take the rtnl lock in the flower classifier before calling
the cls hardware offloads API. Instead, pass the rtnl lock status via the
'rtnl_held' parameter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/sched/cls_flower.c | 55 ++++++++++++------------------------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 1cc68702f93d..786427997be6 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -412,18 +412,13 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
 
-	if (!rtnl_held)
-		rtnl_lock();
-
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = FLOW_CLS_DESTROY;
 	cls_flower.cookie = (unsigned long) f;
 
 	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
-			    &f->flags, &f->in_hw_count, true);
+			    &f->flags, &f->in_hw_count, rtnl_held);
 
-	if (!rtnl_held)
-		rtnl_unlock();
 }
 
 static int fl_hw_replace_filter(struct tcf_proto *tp,
@@ -435,14 +430,9 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	bool skip_sw = tc_skip_sw(f->flags);
 	int err = 0;
 
-	if (!rtnl_held)
-		rtnl_lock();
-
 	cls_flower.rule = flow_rule_alloc(tcf_exts_num_actions(&f->exts));
-	if (!cls_flower.rule) {
-		err = -ENOMEM;
-		goto errout;
-	}
+	if (!cls_flower.rule)
+		return -ENOMEM;
 
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, extack);
 	cls_flower.command = FLOW_CLS_REPLACE;
@@ -453,38 +443,30 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 	cls_flower.classid = f->res.classid;
 
 	err = tc_setup_flow_action(&cls_flower.rule->action, &f->exts,
-				   true);
+				   rtnl_held);
 	if (err) {
 		kfree(cls_flower.rule);
-		if (skip_sw)
+		if (skip_sw) {
 			NL_SET_ERR_MSG_MOD(extack, "Failed to setup flow action");
-		else
-			err = 0;
-		goto errout;
+			return err;
+		}
+		return 0;
 	}
 
 	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
-			      skip_sw, &f->flags, &f->in_hw_count, true);
+			      skip_sw, &f->flags, &f->in_hw_count, rtnl_held);
 	tc_cleanup_flow_action(&cls_flower.rule->action);
 	kfree(cls_flower.rule);
 
 	if (err < 0) {
-		fl_hw_destroy_filter(tp, f, true, NULL);
-		goto errout;
-	} else if (err > 0) {
-		err = 0;
-	}
-
-	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {
-		err = -EINVAL;
-		goto errout;
+		fl_hw_destroy_filter(tp, f, rtnl_held, NULL);
+		return err;
 	}
 
-errout:
-	if (!rtnl_held)
-		rtnl_unlock();
+	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW))
+		return -EINVAL;
 
-	return err;
+	return 0;
 }
 
 static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
@@ -493,22 +475,17 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,
 	struct tcf_block *block = tp->chain->block;
 	struct flow_cls_offload cls_flower = {};
 
-	if (!rtnl_held)
-		rtnl_lock();
-
 	tc_cls_common_offload_init(&cls_flower.common, tp, f->flags, NULL);
 	cls_flower.command = FLOW_CLS_STATS;
 	cls_flower.cookie = (unsigned long) f;
 	cls_flower.classid = f->res.classid;
 
-	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false, true);
+	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false,
+			 rtnl_held);
 
 	tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
 			      cls_flower.stats.pkts,
 			      cls_flower.stats.lastused);
-
-	if (!rtnl_held)
-		rtnl_unlock();
 }
 
 static void __fl_put(struct cls_fl_filter *f)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore
  2019-08-22 12:43 ` [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore Vlad Buslov
@ 2019-08-22 22:15   ` Jakub Kicinski
  2019-08-23 10:25     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2019-08-22 22:15 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko

On Thu, 22 Aug 2019 15:43:44 +0300, Vlad Buslov wrote:
> @@ -2987,19 +3007,26 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>  	int ok_count = 0;
>  	int err;
>  
> +	down_read(&block->cb_lock);
>  	/* Make sure all netdevs sharing this block are offload-capable. */
> -	if (block->nooffloaddevcnt && err_stop)
> -		return -EOPNOTSUPP;
> +	if (block->nooffloaddevcnt && err_stop) {
> +		ok_count = -EOPNOTSUPP;
> +		goto errout;
> +	}
>  
>  	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
>  		err = block_cb->cb(type, type_data, block_cb->cb_priv);
>  		if (err) {
> -			if (err_stop)
> -				return err;
> +			if (err_stop) {
> +				ok_count = err;
> +				goto errout;
> +			}
>  		} else {
>  			ok_count++;
>  		}
>  	}
> +errout:

Please name the labels with the first action they perform. Here:

err_unlock:

> +	up_read(&block->cb_lock);
>  	return ok_count;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 03/10] net: sched: refactor block offloads counter usage
  2019-08-22 12:43 ` [PATCH net-next 03/10] net: sched: refactor block offloads counter usage Vlad Buslov
@ 2019-08-22 22:53   ` Jakub Kicinski
  2019-08-23 10:39     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2019-08-22 22:53 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko

On Thu, 22 Aug 2019 15:43:46 +0300, Vlad Buslov wrote:
> Without rtnl lock protection filters can no longer safely manage block
> offloads counter themselves. Refactor cls API to protect block offloadcnt
> with tcf_block->cb_lock that is already used to protect driver callback
> list and nooffloaddevcnt counter. The counter can be modified by concurrent
> tasks by new functions that execute block callbacks (which is safe with
> previous patch that changed its type to atomic_t), however, block
> bind/unbind code that checks the counter value takes cb_lock in write mode
> to exclude any concurrent modifications. This approach prevents race
> conditions between bind/unbind and callback execution code but allows for
> concurrency for tc rule update path.
> 
> Move block offload counter, filter in hardware counter and filter flags
> management from classifiers into cls hardware offloads API. Make functions
> tcf_block_offload_inc() and tcf_block_offload_dec() to be cls API private.
> Implement following new cls API to be used instead:
> 
>   tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
>   already in hardware is successfully offloaded, increment block offloads
>   counter, set filter in hardware counter and flag. On failure, previously
>   offloaded filter is considered to be intact and offloads counter is not
>   decremented.
> 
>   tc_setup_cb_replace() - destructive filter replace. Release existing
>   filter block offload counter and reset its in hardware counter and flag.
>   Set new filter in hardware counter and flag. On failure, previously
>   offloaded filter is considered to be destroyed and offload counter is
>   decremented.
> 
>   tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
>   offloads counter.
> 
> Refactor all offload-capable classifiers to atomically offload filters to
> hardware, change block offload counter, and set filter in hardware counter
> and flag by means of the new cls API functions.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>

Looks good, minor nits

> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 8502bd006b37..4215c849f4a3 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3000,13 +3000,97 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
>  }
>  EXPORT_SYMBOL(tcf_exts_dump_stats);
>  
> -int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> -		     void *type_data, bool err_stop)
> +static void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
> +{
> +	if (*flags & TCA_CLS_FLAGS_IN_HW)
> +		return;
> +	*flags |= TCA_CLS_FLAGS_IN_HW;
> +	atomic_inc(&block->offloadcnt);
> +}
> +
> +static void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
> +{
> +	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
> +		return;
> +	*flags &= ~TCA_CLS_FLAGS_IN_HW;
> +	atomic_dec(&block->offloadcnt);
> +}
> +
> +void tc_cls_offload_cnt_update(struct tcf_block *block, struct tcf_proto *tp,
> +			       u32 *cnt, u32 *flags, u32 diff, bool add)
> +{
> +	lockdep_assert_held(&block->cb_lock);
> +
> +	spin_lock(&tp->lock);
> +	if (add) {
> +		if (!*cnt)
> +			tcf_block_offload_inc(block, flags);
> +		(*cnt) += diff;

brackets unnecessary

> +	} else {
> +		(*cnt) -= diff;
> +		if (!*cnt)
> +			tcf_block_offload_dec(block, flags);
> +	}
> +	spin_unlock(&tp->lock);
> +}
> +EXPORT_SYMBOL(tc_cls_offload_cnt_update);
> +
> +static void
> +tc_cls_offload_cnt_reset(struct tcf_block *block, struct tcf_proto *tp,
> +			 u32 *cnt, u32 *flags)
> +{
> +	lockdep_assert_held(&block->cb_lock);
> +
> +	spin_lock(&tp->lock);
> +	tcf_block_offload_dec(block, flags);
> +	(*cnt) = 0;

ditto

> +	spin_unlock(&tp->lock);
> +}
> +
> +static int
> +__tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> +		   void *type_data, bool err_stop)
>  {
>  	struct flow_block_cb *block_cb;
>  	int ok_count = 0;
>  	int err;
>  
> +	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
> +		err = block_cb->cb(type, type_data, block_cb->cb_priv);
> +		if (err) {
> +			if (err_stop)
> +				return err;
> +		} else {
> +			ok_count++;
> +		}
> +	}
> +	return ok_count;
> +}
> +
> +int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
> +		     void *type_data, bool err_stop, bool rtnl_held)
> +{
> +	int ok_count;
> +
> +	down_read(&block->cb_lock);
> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> +	up_read(&block->cb_lock);
> +	return ok_count;
> +}
> +EXPORT_SYMBOL(tc_setup_cb_call);
> +
> +/* Non-destructive filter add. If filter that wasn't already in hardware is
> + * successfully offloaded, increment block offloads counter. On failure,
> + * previously offloaded filter is considered to be intact and offloads counter
> + * is not decremented.
> + */
> +

Spurious new line here?

> +int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
> +		    enum tc_setup_type type, void *type_data, bool err_stop,
> +		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
> +{
> +	int ok_count;
> +
>  	down_read(&block->cb_lock);
>  	/* Make sure all netdevs sharing this block are offload-capable. */
>  	if (block->nooffloaddevcnt && err_stop) {
> @@ -3014,22 +3098,67 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>  		goto errout;
>  	}
>  
> -	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
> -		err = block_cb->cb(type, type_data, block_cb->cb_priv);
> -		if (err) {
> -			if (err_stop) {
> -				ok_count = err;
> -				goto errout;
> -			}
> -		} else {
> -			ok_count++;
> -		}
> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> +	if (ok_count > 0)
> +		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
> +					  ok_count, true);
> +errout:

and the labels again

> +	up_read(&block->cb_lock);
> +	return ok_count;
> +}
> +EXPORT_SYMBOL(tc_setup_cb_add);
> +
> +/* Destructive filter replace. If filter that wasn't already in hardware is
> + * successfully offloaded, increment block offload counter. On failure,
> + * previously offloaded filter is considered to be destroyed and offload counter
> + * is decremented.
> + */
> +

spurious new line?

> +int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
> +			enum tc_setup_type type, void *type_data, bool err_stop,
> +			u32 *old_flags, unsigned int *old_in_hw_count,
> +			u32 *new_flags, unsigned int *new_in_hw_count,
> +			bool rtnl_held)
> +{
> +	int ok_count;
> +
> +	down_read(&block->cb_lock);
> +	/* Make sure all netdevs sharing this block are offload-capable. */
> +	if (block->nooffloaddevcnt && err_stop) {
> +		ok_count = -EOPNOTSUPP;
> +		goto errout;
>  	}
> +
> +	tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
> +
> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> +	if (ok_count > 0)
> +		tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
> +					  ok_count, true);
>  errout:
>  	up_read(&block->cb_lock);
>  	return ok_count;
>  }
> -EXPORT_SYMBOL(tc_setup_cb_call);
> +EXPORT_SYMBOL(tc_setup_cb_replace);
> +
> +/* Destroy filter and decrement block offload counter, if filter was previously
> + * offloaded.
> + */
> +

hm.. is this gap between comment and function it pertains to
intentional?

> +int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
> +			enum tc_setup_type type, void *type_data, bool err_stop,
> +			u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
> +{
> +	int ok_count;
> +
> +	down_read(&block->cb_lock);
> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> +
> +	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
> +	up_read(&block->cb_lock);
> +	return ok_count;
> +}
> +EXPORT_SYMBOL(tc_setup_cb_destroy);
>  
>  int tc_setup_flow_action(struct flow_action *flow_action,
>  			 const struct tcf_exts *exts)
> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
> index 3f7a9c02b70c..7f304db7e697 100644
> --- a/net/sched/cls_bpf.c
> +++ b/net/sched/cls_bpf.c
> @@ -162,17 +162,21 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
>  	cls_bpf.name = obj->bpf_name;
>  	cls_bpf.exts_integrated = obj->exts_integrated;
>  
> -	if (oldprog)
> -		tcf_block_offload_dec(block, &oldprog->gen_flags);
> +	if (cls_bpf.oldprog)

why the change from oldprog to cls_bpf.oldprog?

> +		err = tc_setup_cb_replace(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
> +					  skip_sw, &oldprog->gen_flags,
> +					  &oldprog->in_hw_count,
> +					  &prog->gen_flags, &prog->in_hw_count,
> +					  true);
> +	else
> +		err = tc_setup_cb_add(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
> +				      skip_sw, &prog->gen_flags,
> +				      &prog->in_hw_count, true);
>  
> -	err = tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
>  	if (prog) {
>  		if (err < 0) {
>  			cls_bpf_offload_cmd(tp, oldprog, prog, extack);
>  			return err;
> -		} else if (err > 0) {
> -			prog->in_hw_count = err;
> -			tcf_block_offload_inc(block, &prog->gen_flags);
>  		}
>  	}
>  
> @@ -230,7 +234,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
>  	cls_bpf.name = prog->bpf_name;
>  	cls_bpf.exts_integrated = prog->exts_integrated;
>  
> -	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false);
> +	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false, true);
>  }
>  
>  static int cls_bpf_init(struct tcf_proto *tp)
> @@ -680,8 +684,8 @@ static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb
>  			continue;
>  		}
>  
> -		tc_cls_offload_cnt_update(block, &prog->in_hw_count,
> -					  &prog->gen_flags, add);
> +		tc_cls_offload_cnt_update(block, tp, &prog->in_hw_count,
> +					  &prog->gen_flags, 1, add);

Since we're adding those higher level add/replace/destroy helpers,
would it also be possible to have a helper which takes care of
reoffload? tc_cls_offload_cnt_update() is kind of low level now, it'd
be cool to also hide it in the core.

>  	}
>  
>  	return 0;
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index 054123742e32..0001a933d48b 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -419,10 +419,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
>  	cls_flower.command = FLOW_CLS_DESTROY;
>  	cls_flower.cookie = (unsigned long) f;
>  
> -	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
> +	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
> +			    &f->flags, &f->in_hw_count, true);
>  	spin_lock(&tp->lock);
>  	list_del_init(&f->hw_list);
> -	tcf_block_offload_dec(block, &f->flags);
>  	spin_unlock(&tp->lock);
>  
>  	if (!rtnl_held)
> @@ -466,18 +466,15 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>  		goto errout;
>  	}
>  
> -	err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
> +	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
> +			      skip_sw, &f->flags, &f->in_hw_count, true);
>  	kfree(cls_flower.rule);
>  
>  	if (err < 0) {
>  		fl_hw_destroy_filter(tp, f, true, NULL);
>  		goto errout;
>  	} else if (err > 0) {
> -		f->in_hw_count = err;
>  		err = 0;

Why does the tc_setup_cb* API still return the positive values, the
callers should no longer care, right?

> -		spin_lock(&tp->lock);
> -		tcf_block_offload_inc(block, &f->flags);
> -		spin_unlock(&tp->lock);
>  	}
>  
>  	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 04/10] net: sched: notify classifier on successful offload add/delete
  2019-08-22 12:43 ` [PATCH net-next 04/10] net: sched: notify classifier on successful offload add/delete Vlad Buslov
@ 2019-08-22 22:58   ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2019-08-22 22:58 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko

On Thu, 22 Aug 2019 15:43:47 +0300, Vlad Buslov wrote:
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 4215c849f4a3..d8ef7a9e6906 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3099,9 +3099,13 @@ int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
>  	}
>  
>  	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> -	if (ok_count > 0)
> -		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
> -					  ok_count, true);
> +	if (ok_count >= 0) {
> +		if (tp->ops->hw_add)
> +			tp->ops->hw_add(tp, type_data);
> +		if (ok_count > 0)
> +			tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
> +						  ok_count, true);
> +	}

nit:

	if (ok_count < 0)
		goto err_unlock;

	if (tp->ops->hw_add)
		tp->ops->hw_add(tp, type_data);
	if (ok_count > 0)
		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
					  ok_count, true);

IOW try to keep the "success flow" unindented.

>  errout:
>  	up_read(&block->cb_lock);
>  	return ok_count;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 06/10] net: sched: conditionally obtain rtnl lock in cls hw offloads API
  2019-08-22 12:43 ` [PATCH net-next 06/10] net: sched: conditionally obtain rtnl lock in cls hw offloads API Vlad Buslov
@ 2019-08-22 23:03   ` Jakub Kicinski
  2019-08-23 10:42     ` Vlad Buslov
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2019-08-22 23:03 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko

On Thu, 22 Aug 2019 15:43:49 +0300, Vlad Buslov wrote:
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 02a547aa77c0..bda42f1b5514 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3076,11 +3076,28 @@ __tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>  int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>  		     void *type_data, bool err_stop, bool rtnl_held)
>  {
> +	bool take_rtnl = false;

Should we perhaps speculatively:

	 bool take_rtnl = READ_ONCE(block->lockeddevcnt);

here? It shouldn't hurt, really, and otherwise every offload that
requires rtnl will have to re-lock cb_lock, every single time..

>  	int ok_count;
>  
> +retry:
> +	if (take_rtnl)
> +		rtnl_lock();
>  	down_read(&block->cb_lock);
> +	/* Need to obtain rtnl lock if block is bound to devs that require it.
> +	 * In block bind code cb_lock is obtained while holding rtnl, so we must
> +	 * obtain the locks in same order here.
> +	 */
> +	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
> +		up_read(&block->cb_lock);
> +		take_rtnl = true;
> +		goto retry;
> +	}
> +
>  	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
> +
>  	up_read(&block->cb_lock);
> +	if (take_rtnl)
> +		rtnl_unlock();
>  	return ok_count;
>  }
>  EXPORT_SYMBOL(tc_setup_cb_call);


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 01/10] net: sched: protect block offload-related fields with rw_semaphore
  2019-08-22 22:15   ` Jakub Kicinski
@ 2019-08-23 10:25     ` Vlad Buslov
  0 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-23 10:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko


On Fri 23 Aug 2019 at 01:15, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Thu, 22 Aug 2019 15:43:44 +0300, Vlad Buslov wrote:
>> @@ -2987,19 +3007,26 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>>  	int ok_count = 0;
>>  	int err;
>>  
>> +	down_read(&block->cb_lock);
>>  	/* Make sure all netdevs sharing this block are offload-capable. */
>> -	if (block->nooffloaddevcnt && err_stop)
>> -		return -EOPNOTSUPP;
>> +	if (block->nooffloaddevcnt && err_stop) {
>> +		ok_count = -EOPNOTSUPP;
>> +		goto errout;
>> +	}
>>  
>>  	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
>>  		err = block_cb->cb(type, type_data, block_cb->cb_priv);
>>  		if (err) {
>> -			if (err_stop)
>> -				return err;
>> +			if (err_stop) {
>> +				ok_count = err;
>> +				goto errout;
>> +			}
>>  		} else {
>>  			ok_count++;
>>  		}
>>  	}
>> +errout:
>
> Please name the labels with the first action they perform. Here:
>
> err_unlock:

Thanks for reviewing. Will fix in V2.

>
>> +	up_read(&block->cb_lock);
>>  	return ok_count;


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 03/10] net: sched: refactor block offloads counter usage
  2019-08-22 22:53   ` Jakub Kicinski
@ 2019-08-23 10:39     ` Vlad Buslov
  2019-08-23 18:26       ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Vlad Buslov @ 2019-08-23 10:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko

On Fri 23 Aug 2019 at 01:53, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Thu, 22 Aug 2019 15:43:46 +0300, Vlad Buslov wrote:
>> Without rtnl lock protection filters can no longer safely manage block
>> offloads counter themselves. Refactor cls API to protect block offloadcnt
>> with tcf_block->cb_lock that is already used to protect driver callback
>> list and nooffloaddevcnt counter. The counter can be modified by concurrent
>> tasks by new functions that execute block callbacks (which is safe with
>> previous patch that changed its type to atomic_t), however, block
>> bind/unbind code that checks the counter value takes cb_lock in write mode
>> to exclude any concurrent modifications. This approach prevents race
>> conditions between bind/unbind and callback execution code but allows for
>> concurrency for tc rule update path.
>>
>> Move block offload counter, filter in hardware counter and filter flags
>> management from classifiers into cls hardware offloads API. Make functions
>> tcf_block_offload_inc() and tcf_block_offload_dec() to be cls API private.
>> Implement following new cls API to be used instead:
>>
>>   tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
>>   already in hardware is successfully offloaded, increment block offloads
>>   counter, set filter in hardware counter and flag. On failure, previously
>>   offloaded filter is considered to be intact and offloads counter is not
>>   decremented.
>>
>>   tc_setup_cb_replace() - destructive filter replace. Release existing
>>   filter block offload counter and reset its in hardware counter and flag.
>>   Set new filter in hardware counter and flag. On failure, previously
>>   offloaded filter is considered to be destroyed and offload counter is
>>   decremented.
>>
>>   tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
>>   offloads counter.
>>
>> Refactor all offload-capable classifiers to atomically offload filters to
>> hardware, change block offload counter, and set filter in hardware counter
>> and flag by means of the new cls API functions.
>>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>
> Looks good, minor nits
>
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 8502bd006b37..4215c849f4a3 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -3000,13 +3000,97 @@ int tcf_exts_dump_stats(struct sk_buff *skb, struct tcf_exts *exts)
>>  }
>>  EXPORT_SYMBOL(tcf_exts_dump_stats);
>>
>> -int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>> -		     void *type_data, bool err_stop)
>> +static void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
>> +{
>> +	if (*flags & TCA_CLS_FLAGS_IN_HW)
>> +		return;
>> +	*flags |= TCA_CLS_FLAGS_IN_HW;
>> +	atomic_inc(&block->offloadcnt);
>> +}
>> +
>> +static void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
>> +{
>> +	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
>> +		return;
>> +	*flags &= ~TCA_CLS_FLAGS_IN_HW;
>> +	atomic_dec(&block->offloadcnt);
>> +}
>> +
>> +void tc_cls_offload_cnt_update(struct tcf_block *block, struct tcf_proto *tp,
>> +			       u32 *cnt, u32 *flags, u32 diff, bool add)
>> +{
>> +	lockdep_assert_held(&block->cb_lock);
>> +
>> +	spin_lock(&tp->lock);
>> +	if (add) {
>> +		if (!*cnt)
>> +			tcf_block_offload_inc(block, flags);
>> +		(*cnt) += diff;
>
> brackets unnecessary
>
>> +	} else {
>> +		(*cnt) -= diff;
>> +		if (!*cnt)
>> +			tcf_block_offload_dec(block, flags);
>> +	}
>> +	spin_unlock(&tp->lock);
>> +}
>> +EXPORT_SYMBOL(tc_cls_offload_cnt_update);
>> +
>> +static void
>> +tc_cls_offload_cnt_reset(struct tcf_block *block, struct tcf_proto *tp,
>> +			 u32 *cnt, u32 *flags)
>> +{
>> +	lockdep_assert_held(&block->cb_lock);
>> +
>> +	spin_lock(&tp->lock);
>> +	tcf_block_offload_dec(block, flags);
>> +	(*cnt) = 0;
>
> ditto
>
>> +	spin_unlock(&tp->lock);
>> +}
>> +
>> +static int
>> +__tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>> +		   void *type_data, bool err_stop)
>>  {
>>  	struct flow_block_cb *block_cb;
>>  	int ok_count = 0;
>>  	int err;
>>
>> +	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
>> +		err = block_cb->cb(type, type_data, block_cb->cb_priv);
>> +		if (err) {
>> +			if (err_stop)
>> +				return err;
>> +		} else {
>> +			ok_count++;
>> +		}
>> +	}
>> +	return ok_count;
>> +}
>> +
>> +int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>> +		     void *type_data, bool err_stop, bool rtnl_held)
>> +{
>> +	int ok_count;
>> +
>> +	down_read(&block->cb_lock);
>> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
>> +	up_read(&block->cb_lock);
>> +	return ok_count;
>> +}
>> +EXPORT_SYMBOL(tc_setup_cb_call);
>> +
>> +/* Non-destructive filter add. If filter that wasn't already in hardware is
>> + * successfully offloaded, increment block offloads counter. On failure,
>> + * previously offloaded filter is considered to be intact and offloads counter
>> + * is not decremented.
>> + */
>> +
>
> Spurious new line here?
>
>> +int tc_setup_cb_add(struct tcf_block *block, struct tcf_proto *tp,
>> +		    enum tc_setup_type type, void *type_data, bool err_stop,
>> +		    u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
>> +{
>> +	int ok_count;
>> +
>>  	down_read(&block->cb_lock);
>>  	/* Make sure all netdevs sharing this block are offload-capable. */
>>  	if (block->nooffloaddevcnt && err_stop) {
>> @@ -3014,22 +3098,67 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>>  		goto errout;
>>  	}
>>
>> -	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
>> -		err = block_cb->cb(type, type_data, block_cb->cb_priv);
>> -		if (err) {
>> -			if (err_stop) {
>> -				ok_count = err;
>> -				goto errout;
>> -			}
>> -		} else {
>> -			ok_count++;
>> -		}
>> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
>> +	if (ok_count > 0)
>> +		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags,
>> +					  ok_count, true);
>> +errout:
>
> and the labels again
>
>> +	up_read(&block->cb_lock);
>> +	return ok_count;
>> +}
>> +EXPORT_SYMBOL(tc_setup_cb_add);
>> +
>> +/* Destructive filter replace. If filter that wasn't already in hardware is
>> + * successfully offloaded, increment block offload counter. On failure,
>> + * previously offloaded filter is considered to be destroyed and offload counter
>> + * is decremented.
>> + */
>> +
>
> spurious new line?
>
>> +int tc_setup_cb_replace(struct tcf_block *block, struct tcf_proto *tp,
>> +			enum tc_setup_type type, void *type_data, bool err_stop,
>> +			u32 *old_flags, unsigned int *old_in_hw_count,
>> +			u32 *new_flags, unsigned int *new_in_hw_count,
>> +			bool rtnl_held)
>> +{
>> +	int ok_count;
>> +
>> +	down_read(&block->cb_lock);
>> +	/* Make sure all netdevs sharing this block are offload-capable. */
>> +	if (block->nooffloaddevcnt && err_stop) {
>> +		ok_count = -EOPNOTSUPP;
>> +		goto errout;
>>  	}
>> +
>> +	tc_cls_offload_cnt_reset(block, tp, old_in_hw_count, old_flags);
>> +
>> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
>> +	if (ok_count > 0)
>> +		tc_cls_offload_cnt_update(block, tp, new_in_hw_count, new_flags,
>> +					  ok_count, true);
>>  errout:
>>  	up_read(&block->cb_lock);
>>  	return ok_count;
>>  }
>> -EXPORT_SYMBOL(tc_setup_cb_call);
>> +EXPORT_SYMBOL(tc_setup_cb_replace);
>> +
>> +/* Destroy filter and decrement block offload counter, if filter was previously
>> + * offloaded.
>> + */
>> +
>
> hm.. is this gap between comment and function it pertains to
> intentional?

The majority of function comments in cls_api.c have a newline after them
(not all of them, though). I don't have a strong opinion on this. Are you
suggesting it is better not to have blank lines after function comments?

>
>> +int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
>> +			enum tc_setup_type type, void *type_data, bool err_stop,
>> +			u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
>> +{
>> +	int ok_count;
>> +
>> +	down_read(&block->cb_lock);
>> +	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
>> +
>> +	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
>> +	up_read(&block->cb_lock);
>> +	return ok_count;
>> +}
>> +EXPORT_SYMBOL(tc_setup_cb_destroy);
>>
>>  int tc_setup_flow_action(struct flow_action *flow_action,
>>  			 const struct tcf_exts *exts)
>> diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
>> index 3f7a9c02b70c..7f304db7e697 100644
>> --- a/net/sched/cls_bpf.c
>> +++ b/net/sched/cls_bpf.c
>> @@ -162,17 +162,21 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
>>  	cls_bpf.name = obj->bpf_name;
>>  	cls_bpf.exts_integrated = obj->exts_integrated;
>>
>> -	if (oldprog)
>> -		tcf_block_offload_dec(block, &oldprog->gen_flags);
>> +	if (cls_bpf.oldprog)
>
> why the change from oldprog to cls_bpf.oldprog?

No reason. Looks like a mistake I made when rewriting the conditional for
the new tc_setup_cb_*() API.

>
>> +		err = tc_setup_cb_replace(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
>> +					  skip_sw, &oldprog->gen_flags,
>> +					  &oldprog->in_hw_count,
>> +					  &prog->gen_flags, &prog->in_hw_count,
>> +					  true);
>> +	else
>> +		err = tc_setup_cb_add(block, tp, TC_SETUP_CLSBPF, &cls_bpf,
>> +				      skip_sw, &prog->gen_flags,
>> +				      &prog->in_hw_count, true);
>>
>> -	err = tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, skip_sw);
>>  	if (prog) {
>>  		if (err < 0) {
>>  			cls_bpf_offload_cmd(tp, oldprog, prog, extack);
>>  			return err;
>> -		} else if (err > 0) {
>> -			prog->in_hw_count = err;
>> -			tcf_block_offload_inc(block, &prog->gen_flags);
>>  		}
>>  	}
>>
>> @@ -230,7 +234,7 @@ static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
>>  	cls_bpf.name = prog->bpf_name;
>>  	cls_bpf.exts_integrated = prog->exts_integrated;
>>
>> -	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false);
>> +	tc_setup_cb_call(block, TC_SETUP_CLSBPF, &cls_bpf, false, true);
>>  }
>>
>>  static int cls_bpf_init(struct tcf_proto *tp)
>> @@ -680,8 +684,8 @@ static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb
>>  			continue;
>>  		}
>>
>> -		tc_cls_offload_cnt_update(block, &prog->in_hw_count,
>> -					  &prog->gen_flags, add);
>> +		tc_cls_offload_cnt_update(block, tp, &prog->in_hw_count,
>> +					  &prog->gen_flags, 1, add);
>
> Since we're adding those higher level add/replace/destroy helpers,
> would it also be possible to have a helper which takes care of
> reoffload? tc_cls_offload_cnt_update() is kind of low level now, it'd
> be cool to also hide it in the core.

Agree. I'll try to come up with something more elegant.
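
Roughly, I'm thinking of something like a tc_setup_cb_reoffload() core
helper (name and exact signature are not final, just a sketch), so that
classifiers stop calling tc_cls_offload_cnt_update() directly from their
reoffload callbacks:

int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
			  bool add, flow_setup_cb_t *cb,
			  enum tc_setup_type type, void *type_data,
			  void *cb_priv, u32 *flags,
			  unsigned int *in_hw_count)
{
	int err = cb(type, type_data, cb_priv);

	if (err) {
		/* Only fail reoffload if the filter must be in hardware. */
		if (add && tc_skip_sw(*flags))
			return err;
	} else {
		tc_cls_offload_cnt_update(block, tp, in_hw_count, flags, 1,
					  add);
	}

	return 0;
}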

>
>>  	}
>>
>>  	return 0;
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 054123742e32..0001a933d48b 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -419,10 +419,10 @@ static void fl_hw_destroy_filter(struct tcf_proto *tp, struct cls_fl_filter *f,
>>  	cls_flower.command = FLOW_CLS_DESTROY;
>>  	cls_flower.cookie = (unsigned long) f;
>>
>> -	tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, false);
>> +	tc_setup_cb_destroy(block, tp, TC_SETUP_CLSFLOWER, &cls_flower, false,
>> +			    &f->flags, &f->in_hw_count, true);
>>  	spin_lock(&tp->lock);
>>  	list_del_init(&f->hw_list);
>> -	tcf_block_offload_dec(block, &f->flags);
>>  	spin_unlock(&tp->lock);
>>
>>  	if (!rtnl_held)
>> @@ -466,18 +466,15 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>>  		goto errout;
>>  	}
>>
>> -	err = tc_setup_cb_call(block, TC_SETUP_CLSFLOWER, &cls_flower, skip_sw);
>> +	err = tc_setup_cb_add(block, tp, TC_SETUP_CLSFLOWER, &cls_flower,
>> +			      skip_sw, &f->flags, &f->in_hw_count, true);
>>  	kfree(cls_flower.rule);
>>
>>  	if (err < 0) {
>>  		fl_hw_destroy_filter(tp, f, true, NULL);
>>  		goto errout;
>>  	} else if (err > 0) {
>> -		f->in_hw_count = err;
>>  		err = 0;
>
> Why does the tc_setup_cb* API still return the positive values, the
> callers should no longer care, right?

Yep. I'll refactor this for V2 and simplify related conditionals in
classifiers.
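
For example, the setup helpers could stop returning positive ok_count
values altogether and just return 0 on success (sketch based on the
tc_setup_cb_destroy() above; the same change would apply to add/replace):

int tc_setup_cb_destroy(struct tcf_block *block, struct tcf_proto *tp,
			enum tc_setup_type type, void *type_data, bool err_stop,
			u32 *flags, unsigned int *in_hw_count, bool rtnl_held)
{
	int ok_count;

	down_read(&block->cb_lock);
	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);

	tc_cls_offload_cnt_reset(block, tp, in_hw_count, flags);
	up_read(&block->cb_lock);

	/* callers only need to know about failures now */
	return ok_count < 0 ? ok_count : 0;
}

Then the 'else if (err > 0)' branches in the classifiers can simply be
dropped.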

>
>> -		spin_lock(&tp->lock);
>> -		tcf_block_offload_inc(block, &f->flags);
>> -		spin_unlock(&tp->lock);
>>  	}
>>
>>  	if (skip_sw && !(f->flags & TCA_CLS_FLAGS_IN_HW)) {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 06/10] net: sched: conditionally obtain rtnl lock in cls hw offloads API
  2019-08-22 23:03   ` Jakub Kicinski
@ 2019-08-23 10:42     ` Vlad Buslov
  0 siblings, 0 replies; 19+ messages in thread
From: Vlad Buslov @ 2019-08-23 10:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vlad Buslov, netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko


On Fri 23 Aug 2019 at 02:03, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Thu, 22 Aug 2019 15:43:49 +0300, Vlad Buslov wrote:
>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>> index 02a547aa77c0..bda42f1b5514 100644
>> --- a/net/sched/cls_api.c
>> +++ b/net/sched/cls_api.c
>> @@ -3076,11 +3076,28 @@ __tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>>  int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
>>  		     void *type_data, bool err_stop, bool rtnl_held)
>>  {
>> +	bool take_rtnl = false;
>
> Should we perhaps speculatively:
>
> 	 bool take_rtnl = READ_ONCE(block->lockeddevcnt);
>
> here? It shouldn't hurt, really, and otherwise every offload that
> requires rtnl will have to re-lock cb_lock, every single time..

Great idea! This looks like a straightforward opportunity for
optimization.
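
With that, tc_setup_cb_call() would look roughly like this (sketch; the
additional !rtnl_held check is only there so we don't try to take rtnl
when the caller already holds it):

int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
		     void *type_data, bool err_stop, bool rtnl_held)
{
	bool take_rtnl = READ_ONCE(block->lockeddevcnt) && !rtnl_held;
	int ok_count;

retry:
	if (take_rtnl)
		rtnl_lock();
	down_read(&block->cb_lock);
	/* Need to obtain rtnl lock if block is bound to devs that require it.
	 * In block bind code cb_lock is obtained while holding rtnl, so we
	 * must take the locks in same order here.
	 */
	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
		up_read(&block->cb_lock);
		take_rtnl = true;
		goto retry;
	}

	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);

	up_read(&block->cb_lock);
	if (take_rtnl)
		rtnl_unlock();
	return ok_count;
}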

>
>>  	int ok_count;
>>
>> +retry:
>> +	if (take_rtnl)
>> +		rtnl_lock();
>>  	down_read(&block->cb_lock);
>> +	/* Need to obtain rtnl lock if block is bound to devs that require it.
>> +	 * In block bind code cb_lock is obtained while holding rtnl, so we must
>> +	 * obtain the locks in same order here.
>> +	 */
>> +	if (!rtnl_held && !take_rtnl && block->lockeddevcnt) {
>> +		up_read(&block->cb_lock);
>> +		take_rtnl = true;
>> +		goto retry;
>> +	}
>> +
>>  	ok_count = __tc_setup_cb_call(block, type, type_data, err_stop);
>> +
>>  	up_read(&block->cb_lock);
>> +	if (take_rtnl)
>> +		rtnl_unlock();
>>  	return ok_count;
>>  }
>>  EXPORT_SYMBOL(tc_setup_cb_call);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 03/10] net: sched: refactor block offloads counter usage
  2019-08-23 10:39     ` Vlad Buslov
@ 2019-08-23 18:26       ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2019-08-23 18:26 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo, Jiri Pirko, linux-doc

On Fri, 23 Aug 2019 10:39:50 +0000, Vlad Buslov wrote:
> >> +/* Destroy filter and decrement block offload counter, if filter was previously
> >> + * offloaded.
> >> + */
> >> +  
> >
> > hm.. is this gap between comment and function it pertains to
> > intentional?  
> 
> Majority of function comments in cls_api.c have newline after them (not
> all of them though). I don't have any strong opinions regarding this.
> You suggest it is better not to have blank lines after function
> comments?

Ah, you're right. I think it's pretty strange to have a newline after a
comment which pertains only to the function immediately following it.
Often the newline is used as a separator when the comment describes a
whole section of the file.

I kind of wish kdoc allowed none of the parameters to be described.
Often you want to document the function but the parameters are kind 
of obvious.

Anyway... feel free to leave this as is.

^ permalink raw reply	[flat|nested] 19+ messages in thread
