All of lore.kernel.org
* [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT
@ 2021-06-09 10:33 Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer Toke Høiland-Jørgensen
                   ` (17 more replies)
  0 siblings, 18 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E. McKenney,
	Toke Høiland-Jørgensen

During the discussion[0] of Hangbin's multicast patch series, Martin pointed out
that the lifetime of the RCU-protected map entries used by XDP_REDIRECT is by
no means obvious. I promised to look into cleaning this up, and Paul helpfully
provided some hints and a new unrcu_pointer() helper to aid in this.

This is mostly a documentation exercise, clearing up the description of the
lifetime expectations and adding __rcu annotations so sparse and lockdep can
help verify it.

Patches 1-3 are preparatory: patch 1 adds Paul's unrcu_pointer() helper (which
has already been added to his tree), patch 2 allows the RCU checks in the map
lookup code to accept bh-disabled contexts, and patch 3 is a small fix for
dev_get_by_index_rcu() so lockdep understands _bh-disabled access to it. Patch
4 is the main bit that adds the __rcu annotations and updates documentation
comments, and the rest are patches updating the drivers, with one patch per
distinct maintainer.

Unfortunately I don't have any hardware to test any of the driver patches;
Jesper helpfully verified that it doesn't break anything on i40e, but the rest
of the driver patches are only compile-tested.

[0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/

Paul E. McKenney (1):
  rcu: Create an unrcu_pointer() to remove __rcu from a pointer

Toke Høiland-Jørgensen (16):
  bpf: allow RCU-protected lookups to happen from bh context
  dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU
    dev ref
  xdp: add proper __rcu annotations to redirect map entries
  ena: remove rcu_read_lock() around XDP program invocation
  bnxt: remove rcu_read_lock() around XDP program invocation
  thunderx: remove rcu_read_lock() around XDP program invocation
  freescale: remove rcu_read_lock() around XDP program invocation
  net: intel: remove rcu_read_lock() around XDP program invocation
  marvell: remove rcu_read_lock() around XDP program invocation
  mlx4: remove rcu_read_lock() around XDP program invocation
  nfp: remove rcu_read_lock() around XDP program invocation
  qede: remove rcu_read_lock() around XDP program invocation
  sfc: remove rcu_read_lock() around XDP program invocation
  netsec: remove rcu_read_lock() around XDP program invocation
  stmmac: remove rcu_read_lock() around XDP program invocation
  net: ti: remove rcu_read_lock() around XDP program invocation

 drivers/net/ethernet/amazon/ena/ena_netdev.c  |  3 --
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  2 -
 .../net/ethernet/cavium/thunder/nicvf_main.c  |  2 -
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    |  8 +--
 .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  3 --
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 -
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  6 +--
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  6 +--
 drivers/net/ethernet/intel/ice/ice_xsk.c      |  6 +--
 drivers/net/ethernet/intel/igb/igb_main.c     |  2 -
 drivers/net/ethernet/intel/igc/igc_main.c     |  7 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 +--
 .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 -
 drivers/net/ethernet/marvell/mvneta.c         |  2 -
 .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 --
 drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  8 +--
 .../ethernet/netronome/nfp/nfp_net_common.c   |  2 -
 drivers/net/ethernet/qlogic/qede/qede_fp.c    |  6 ---
 drivers/net/ethernet/sfc/rx.c                 |  9 +---
 drivers/net/ethernet/socionext/netsec.c       |  3 --
 .../net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +---
 drivers/net/ethernet/ti/cpsw_priv.c           | 10 +---
 include/linux/rcupdate.h                      | 14 +++++
 include/net/xdp_sock.h                        |  2 +-
 kernel/bpf/cpumap.c                           | 14 +++--
 kernel/bpf/devmap.c                           | 52 ++++++++-----------
 kernel/bpf/hashtab.c                          | 21 +++++---
 kernel/bpf/helpers.c                          |  6 +--
 kernel/bpf/lpm_trie.c                         |  6 ++-
 net/core/dev.c                                |  2 +-
 net/core/filter.c                             | 28 ++++++++++
 net/xdp/xsk.c                                 |  4 +-
 net/xdp/xsk.h                                 |  4 +-
 net/xdp/xskmap.c                              | 29 ++++++-----
 35 files changed, 134 insertions(+), 159 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E. McKenney,
	Toke Høiland-Jørgensen

From: "Paul E. McKenney" <paulmck@kernel.org>

The xchg() and cmpxchg() functions are sometimes used to carry out RCU
updates.  Unfortunately, this can result in sparse warnings for both
the old-value and new-value arguments, as well as for the return value.
The arguments can be dealt with using RCU_INITIALIZER():

        old_p = xchg(&p, RCU_INITIALIZER(new_p));

But a sparse warning still remains due to assigning the __rcu pointer
returned from xchg to the (most likely) non-__rcu pointer old_p.

This commit therefore provides an unrcu_pointer() macro that strips
the __rcu.  This macro can be used as follows:

        old_p = unrcu_pointer(xchg(&p, RCU_INITIALIZER(new_p)));

Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/rcupdate.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9455476c5ba2..d7895b81264e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -363,6 +363,20 @@ static inline void rcu_preempt_sleep_check(void) { }
 #define rcu_check_sparse(p, space)
 #endif /* #else #ifdef __CHECKER__ */
 
+/**
+ * unrcu_pointer - mark a pointer as not being RCU protected
+ * @p: pointer needing to lose its __rcu property
+ *
+ * Converts @p from an __rcu pointer to a __kernel pointer.
+ * This allows an __rcu pointer to be used with xchg() and friends.
+ */
+#define unrcu_pointer(p)						\
+({									\
+	typeof(*p) *_________p1 = (typeof(*p) *__force)(p);		\
+	rcu_check_sparse(p, __rcu); 					\
+	((typeof(*p) __force __kernel *)(_________p1)); 		\
+})
+
 #define __rcu_access_pointer(p, space) \
 ({ \
 	typeof(*p) *_________p1 = (typeof(*p) *__force)READ_ONCE(p); \
-- 
2.31.1



* [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-10 18:38   ` Alexei Starovoitov
  2021-06-10 19:33   ` Martin KaFai Lau
  2021-06-09 10:33 ` [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref Toke Høiland-Jørgensen
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E. McKenney,
	Toke Høiland-Jørgensen

XDP programs are called from a NAPI poll context, which means the RCU
reference liveness is ensured by local_bh_disable(). Add
rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
lockdep understands that the dereferences are safe from inside *either* an
rcu_read_lock() section *or* a local_bh_disable() section. This is done in
preparation for removing the redundant rcu_read_lock()s from the drivers.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
 kernel/bpf/helpers.c  |  6 +++---
 kernel/bpf/lpm_trie.c |  6 ++++--
 3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 6f6681b07364..72c58cc516a3 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
 	struct htab_elem *l;
 	u32 hash, key_size;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
@@ -989,7 +990,8 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
 		/* unknown flags */
 		return -EINVAL;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
@@ -1082,7 +1084,8 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
 		/* unknown flags */
 		return -EINVAL;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
@@ -1148,7 +1151,8 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
 		/* unknown flags */
 		return -EINVAL;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
@@ -1202,7 +1206,8 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 		/* unknown flags */
 		return -EINVAL;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
@@ -1276,7 +1281,8 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	u32 hash, key_size;
 	int ret;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
@@ -1311,7 +1317,8 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 	u32 hash, key_size;
 	int ret;
 
-	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
+		     !rcu_read_lock_bh_held());
 
 	key_size = map->key_size;
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 544773970dbc..e880f6bb6f28 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -28,7 +28,7 @@
  */
 BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
 {
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 	return (unsigned long) map->ops->map_lookup_elem(map, key);
 }
 
@@ -44,7 +44,7 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
 BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
 	   void *, value, u64, flags)
 {
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 	return map->ops->map_update_elem(map, key, value, flags);
 }
 
@@ -61,7 +61,7 @@ const struct bpf_func_proto bpf_map_update_elem_proto = {
 
 BPF_CALL_2(bpf_map_delete_elem, struct bpf_map *, map, void *, key)
 {
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 	return map->ops->map_delete_elem(map, key);
 }
 
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 1b7b8a6f34ee..423549d2c52e 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -232,7 +232,8 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 
 	/* Start walking the trie from the root node ... */
 
-	for (node = rcu_dereference(trie->root); node;) {
+	for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
+	     node;) {
 		unsigned int next_bit;
 		size_t matchlen;
 
@@ -264,7 +265,8 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
 		 * traverse down.
 		 */
 		next_bit = extract_bit(key->data, node->prefixlen);
-		node = rcu_dereference(node->child[next_bit]);
+		node = rcu_dereference_check(node->child[next_bit],
+					     rcu_read_lock_bh_held());
 	}
 
 	if (!found)
-- 
2.31.1



* [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-10 19:37   ` Martin KaFai Lau
  2021-06-09 10:33 ` [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries Toke Høiland-Jørgensen
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E. McKenney,
	Toke Høiland-Jørgensen

Some of the XDP helpers (in particular, xdp_do_redirect()) will get a
struct net_device reference using dev_get_by_index_rcu(). These are called
from a NAPI poll context, which means the RCU reference liveness is ensured
by local_bh_disable(). Add rcu_read_lock_bh_held() as a condition to the
RCU list traversal in dev_get_by_index_rcu() so lockdep understands that
the dereferences are safe from inside *either* an rcu_read_lock() section
*or* a local_bh_disable() section.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index febb23708184..a499c5ffe4a5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1002,7 +1002,7 @@ struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex)
 	struct net_device *dev;
 	struct hlist_head *head = dev_index_hash(net, ifindex);
 
-	hlist_for_each_entry_rcu(dev, head, index_hlist)
+	hlist_for_each_entry_rcu(dev, head, index_hlist, rcu_read_lock_bh_held())
 		if (dev->ifindex == ifindex)
 			return dev;
 
-- 
2.31.1



* [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-10 21:09   ` Martin KaFai Lau
  2021-06-09 10:33 ` [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation Toke Høiland-Jørgensen
                   ` (13 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E. McKenney,
	Toke Høiland-Jørgensen

XDP_REDIRECT works by a three-step process: the bpf_redirect() and
bpf_redirect_map() helpers will lookup the target of the redirect and store
it (along with some other metadata) in a per-CPU struct bpf_redirect_info.
Next, when the program returns the XDP_REDIRECT return code, the driver
will call xdp_do_redirect() which will use the information thus stored to
actually enqueue the frame into a bulk queue structure (that differs
slightly by map type, but shares the same principle). Finally, before
exiting its NAPI poll loop, the driver will call xdp_do_flush(), which will
flush all the different bulk queues, thus completing the redirect.

Pointers to the map entries will be kept around for this whole sequence of
steps, protected by RCU. However, there is no top-level rcu_read_lock() in
the core code; instead drivers add their own rcu_read_lock() around the XDP
portions of the code, but somewhat inconsistently as Martin discovered[0].
However, things still work because everything happens inside a single NAPI
poll sequence, which means it's between a pair of calls to
local_bh_disable()/local_bh_enable(). So Paul suggested[1] that we could
document this intention by using rcu_dereference_check() with
rcu_read_lock_bh_held() as a second parameter, thus allowing sparse and
lockdep to verify that everything is done correctly.

This patch does just that: we add an __rcu annotation to the map entry
pointers and remove the various comments explaining the NAPI poll assurance
strewn through devmap.c in favour of a longer explanation in filter.c. The
goal is to have one coherent documentation of the entire flow, and rely on
the RCU annotations as a "standard" way of communicating the flow in the
map code (which can additionally be understood by sparse and lockdep).

The RCU annotation replacements are fairly straightforward: READ_ONCE()
becomes rcu_dereference_check(), WRITE_ONCE() becomes rcu_assign_pointer(),
and xchg() and cmpxchg() get wrapped in the proper constructs to cast the
pointer back and forth between the __rcu and __kernel address spaces (for
the benefit of sparse). The one complication is
that xskmap has a few constructions where double-pointers are passed back
and forth; these simply all gain __rcu annotations, and only the final
reference/dereference to the inner-most pointer gets changed.

With this, everything can be run through sparse without eliciting
complaints, and lockdep can verify correctness even without the use of
rcu_read_lock() in the drivers. Subsequent patches will clean these up from
the drivers.

[0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
[1] https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp_sock.h |  2 +-
 kernel/bpf/cpumap.c    | 14 ++++++++----
 kernel/bpf/devmap.c    | 52 +++++++++++++++++++-----------------------
 net/core/filter.c      | 28 +++++++++++++++++++++++
 net/xdp/xsk.c          |  4 ++--
 net/xdp/xsk.h          |  4 ++--
 net/xdp/xskmap.c       | 29 +++++++++++++----------
 7 files changed, 83 insertions(+), 50 deletions(-)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 9c0722c6d7ac..fff069d2ed1b 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -37,7 +37,7 @@ struct xdp_umem {
 struct xsk_map {
 	struct bpf_map map;
 	spinlock_t lock; /* Synchronize map updates */
-	struct xdp_sock *xsk_map[];
+	struct xdp_sock __rcu *xsk_map[];
 };
 
 struct xdp_sock {
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index a1a0c4e791c6..bfa94efc5c05 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -74,7 +74,7 @@ struct bpf_cpu_map_entry {
 struct bpf_cpu_map {
 	struct bpf_map map;
 	/* Below members specific for map type */
-	struct bpf_cpu_map_entry **cpu_map;
+	struct bpf_cpu_map_entry __rcu **cpu_map;
 };
 
 static DEFINE_PER_CPU(struct list_head, cpu_map_flush_list);
@@ -469,7 +469,7 @@ static void __cpu_map_entry_replace(struct bpf_cpu_map *cmap,
 {
 	struct bpf_cpu_map_entry *old_rcpu;
 
-	old_rcpu = xchg(&cmap->cpu_map[key_cpu], rcpu);
+	old_rcpu = unrcu_pointer(xchg(&cmap->cpu_map[key_cpu], RCU_INITIALIZER(rcpu)));
 	if (old_rcpu) {
 		call_rcu(&old_rcpu->rcu, __cpu_map_entry_free);
 		INIT_WORK(&old_rcpu->kthread_stop_wq, cpu_map_kthread_stop);
@@ -551,7 +551,8 @@ static void cpu_map_free(struct bpf_map *map)
 	for (i = 0; i < cmap->map.max_entries; i++) {
 		struct bpf_cpu_map_entry *rcpu;
 
-		rcpu = READ_ONCE(cmap->cpu_map[i]);
+		rcpu = rcu_dereference_check(cmap->cpu_map[i],
+					     rcu_read_lock_bh_held());
 		if (!rcpu)
 			continue;
 
@@ -562,6 +563,10 @@ static void cpu_map_free(struct bpf_map *map)
 	kfree(cmap);
 }
 
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
+ */
 static void *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_cpu_map *cmap = container_of(map, struct bpf_cpu_map, map);
@@ -570,7 +575,8 @@ static void *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
 	if (key >= map->max_entries)
 		return NULL;
 
-	rcpu = READ_ONCE(cmap->cpu_map[key]);
+	rcpu = rcu_dereference_check(cmap->cpu_map[key],
+				     rcu_read_lock_bh_held());
 	return rcpu;
 }
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a75e6c2d27d..ae6d9bfeae06 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -73,7 +73,7 @@ struct bpf_dtab_netdev {
 
 struct bpf_dtab {
 	struct bpf_map map;
-	struct bpf_dtab_netdev **netdev_map; /* DEVMAP type only */
+	struct bpf_dtab_netdev __rcu **netdev_map; /* DEVMAP type only */
 	struct list_head list;
 
 	/* these are only used for DEVMAP_HASH type maps */
@@ -226,7 +226,7 @@ static void dev_map_free(struct bpf_map *map)
 		for (i = 0; i < dtab->map.max_entries; i++) {
 			struct bpf_dtab_netdev *dev;
 
-			dev = dtab->netdev_map[i];
+			dev = rcu_dereference_raw(dtab->netdev_map[i]);
 			if (!dev)
 				continue;
 
@@ -259,6 +259,10 @@ static int dev_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
 	return 0;
 }
 
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
+ */
 static void *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
@@ -266,7 +270,8 @@ static void *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key)
 	struct bpf_dtab_netdev *dev;
 
 	hlist_for_each_entry_rcu(dev, head, index_hlist,
-				 lockdep_is_held(&dtab->index_lock))
+				 (lockdep_is_held(&dtab->index_lock) ||
+				  rcu_read_lock_bh_held()))
 		if (dev->idx == key)
 			return dev;
 
@@ -410,15 +415,9 @@ static void bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	trace_xdp_devmap_xmit(bq->dev_rx, dev, sent, cnt - sent, err);
 }
 
-/* __dev_flush is called from xdp_do_flush() which _must_ be signaled
- * from the driver before returning from its napi->poll() routine. The poll()
- * routine is called either from busy_poll context or net_rx_action signaled
- * from NET_RX_SOFTIRQ. Either way the poll routine must complete before the
- * net device can be torn down. On devmap tear down we ensure the flush list
- * is empty before completing to ensure all flush operations have completed.
- * When drivers update the bpf program they may need to ensure any flush ops
- * are also complete. Using synchronize_rcu or call_rcu will suffice for this
- * because both wait for napi context to exit.
+/* __dev_flush is called from xdp_do_flush() which _must_ be signalled from the
+ * driver before returning from its napi->poll() routine. See the comment above
+ * xdp_do_flush() in filter.c.
  */
 void __dev_flush(void)
 {
@@ -433,9 +432,9 @@ void __dev_flush(void)
 	}
 }
 
-/* rcu_read_lock (from syscall and BPF contexts) ensures that if a delete and/or
- * update happens in parallel here a dev_put won't happen until after reading
- * the ifindex.
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
  */
 static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 {
@@ -445,12 +444,14 @@ static void *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
 	if (key >= map->max_entries)
 		return NULL;
 
-	obj = READ_ONCE(dtab->netdev_map[key]);
+	obj = rcu_dereference_check(dtab->netdev_map[key],
+				    rcu_read_lock_bh_held());
 	return obj;
 }
 
-/* Runs under RCU-read-side, plus in softirq under NAPI protection.
- * Thus, safe percpu variable access.
+/* Runs in NAPI, i.e., softirq under local_bh_disable(). Thus, safe percpu
+ * variable access, and map elements stick around. See comment above
+ * xdp_do_flush() in filter.c.
  */
 static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
 		       struct net_device *dev_rx, struct bpf_prog *xdp_prog)
@@ -735,14 +736,7 @@ static int dev_map_delete_elem(struct bpf_map *map, void *key)
 	if (k >= map->max_entries)
 		return -EINVAL;
 
-	/* Use call_rcu() here to ensure any rcu critical sections have
-	 * completed as well as any flush operations because call_rcu
-	 * will wait for preempt-disable region to complete, NAPI in this
-	 * context.  And additionally, the driver tear down ensures all
-	 * soft irqs are complete before removing the net device in the
-	 * case of dev_put equals zero.
-	 */
-	old_dev = xchg(&dtab->netdev_map[k], NULL);
+	old_dev = unrcu_pointer(xchg(&dtab->netdev_map[k], NULL));
 	if (old_dev)
 		call_rcu(&old_dev->rcu, __dev_map_entry_free);
 	return 0;
@@ -851,7 +845,7 @@ static int __dev_map_update_elem(struct net *net, struct bpf_map *map,
 	 * Remembering the driver side flush operation will happen before the
 	 * net device is removed.
 	 */
-	old_dev = xchg(&dtab->netdev_map[i], dev);
+	old_dev = unrcu_pointer(xchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev)));
 	if (old_dev)
 		call_rcu(&old_dev->rcu, __dev_map_entry_free);
 
@@ -1031,10 +1025,10 @@ static int dev_map_notification(struct notifier_block *notifier,
 			for (i = 0; i < dtab->map.max_entries; i++) {
 				struct bpf_dtab_netdev *dev, *odev;
 
-				dev = READ_ONCE(dtab->netdev_map[i]);
+				dev = rcu_dereference(dtab->netdev_map[i]);
 				if (!dev || netdev != dev->dev)
 					continue;
-				odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
+				odev = unrcu_pointer(cmpxchg(&dtab->netdev_map[i], RCU_INITIALIZER(dev), NULL));
 				if (dev == odev)
 					call_rcu(&dev->rcu,
 						 __dev_map_entry_free);
diff --git a/net/core/filter.c b/net/core/filter.c
index caa88955562e..0b7db5c70385 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3922,6 +3922,34 @@ static const struct bpf_func_proto bpf_xdp_adjust_meta_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+/* XDP_REDIRECT works by a three-step process, implemented in the functions
+ * below:
+ *
+ * 1. The bpf_redirect() and bpf_redirect_map() helpers will lookup the target
+ *    of the redirect and store it (along with some other metadata) in a per-CPU
+ *    struct bpf_redirect_info.
+ *
+ * 2. When the program returns the XDP_REDIRECT return code, the driver will
+ *    call xdp_do_redirect() which will use the information in struct
+ *    bpf_redirect_info to actually enqueue the frame into a map type-specific
+ *    bulk queue structure.
+ *
+ * 3. Before exiting its NAPI poll loop, the driver will call xdp_do_flush(),
+ *    which will flush all the different bulk queues, thus completing the
+ *    redirect.
+ *
+ * Pointers to the map entries will be kept around for this whole sequence of
+ * steps, protected by RCU. However, there is no top-level rcu_read_lock() in
+ * the core code; instead, the RCU protection relies on everything happening
+ * inside a single NAPI poll sequence, which means it's between a pair of calls
+ * to local_bh_disable()/local_bh_enable().
+ *
+ * The map entries are marked as __rcu and the map code makes sure to
+ * dereference those pointers with rcu_dereference_check() in a way that works
+ * for both sections that hold an rcu_read_lock() and sections that are
+ * called from NAPI without a separate rcu_read_lock(). The code below does not
+ * use RCU annotations, but relies on those in the map code.
+ */
 void xdp_do_flush(void)
 {
 	__dev_flush();
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index cd62d4ba87a9..996da915f520 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -749,7 +749,7 @@ static void xsk_unbind_dev(struct xdp_sock *xs)
 }
 
 static struct xsk_map *xsk_get_map_list_entry(struct xdp_sock *xs,
-					      struct xdp_sock ***map_entry)
+					      struct xdp_sock __rcu ***map_entry)
 {
 	struct xsk_map *map = NULL;
 	struct xsk_map_node *node;
@@ -785,7 +785,7 @@ static void xsk_delete_from_maps(struct xdp_sock *xs)
 	 * might be updates to the map between
 	 * xsk_get_map_list_entry() and xsk_map_try_sock_delete().
 	 */
-	struct xdp_sock **map_entry = NULL;
+	struct xdp_sock __rcu **map_entry = NULL;
 	struct xsk_map *map;
 
 	while ((map = xsk_get_map_list_entry(xs, &map_entry))) {
diff --git a/net/xdp/xsk.h b/net/xdp/xsk.h
index edcf249ad1f1..a4bc4749faac 100644
--- a/net/xdp/xsk.h
+++ b/net/xdp/xsk.h
@@ -31,7 +31,7 @@ struct xdp_mmap_offsets_v1 {
 struct xsk_map_node {
 	struct list_head node;
 	struct xsk_map *map;
-	struct xdp_sock **map_entry;
+	struct xdp_sock __rcu **map_entry;
 };
 
 static inline struct xdp_sock *xdp_sk(struct sock *sk)
@@ -40,7 +40,7 @@ static inline struct xdp_sock *xdp_sk(struct sock *sk)
 }
 
 void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
-			     struct xdp_sock **map_entry);
+			     struct xdp_sock __rcu **map_entry);
 void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id);
 int xsk_reg_pool_at_qid(struct net_device *dev, struct xsk_buff_pool *pool,
 			u16 queue_id);
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 9df75ea4a567..a754262779ba 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -12,7 +12,7 @@
 #include "xsk.h"
 
 static struct xsk_map_node *xsk_map_node_alloc(struct xsk_map *map,
-					       struct xdp_sock **map_entry)
+					       struct xdp_sock __rcu **map_entry)
 {
 	struct xsk_map_node *node;
 
@@ -42,7 +42,7 @@ static void xsk_map_sock_add(struct xdp_sock *xs, struct xsk_map_node *node)
 }
 
 static void xsk_map_sock_delete(struct xdp_sock *xs,
-				struct xdp_sock **map_entry)
+				struct xdp_sock __rcu **map_entry)
 {
 	struct xsk_map_node *n, *tmp;
 
@@ -124,6 +124,10 @@ static int xsk_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 	return insn - insn_buf;
 }
 
+/* Elements are kept alive by RCU; either by rcu_read_lock() (from syscall) or
+ * by local_bh_disable() (from XDP calls inside NAPI). The
+ * rcu_read_lock_bh_held() below makes lockdep accept both.
+ */
 static void *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
@@ -131,12 +135,11 @@ static void *__xsk_map_lookup_elem(struct bpf_map *map, u32 key)
 	if (key >= map->max_entries)
 		return NULL;
 
-	return READ_ONCE(m->xsk_map[key]);
+	return rcu_dereference_check(m->xsk_map[key], rcu_read_lock_bh_held());
 }
 
 static void *xsk_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	WARN_ON_ONCE(!rcu_read_lock_held());
 	return __xsk_map_lookup_elem(map, *(u32 *)key);
 }
 
@@ -149,7 +152,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 			       u64 map_flags)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct xdp_sock *xs, *old_xs, **map_entry;
+	struct xdp_sock __rcu **map_entry;
+	struct xdp_sock *xs, *old_xs;
 	u32 i = *(u32 *)key, fd = *(u32 *)value;
 	struct xsk_map_node *node;
 	struct socket *sock;
@@ -179,7 +183,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 	}
 
 	spin_lock_bh(&m->lock);
-	old_xs = READ_ONCE(*map_entry);
+	old_xs = rcu_dereference_check(*map_entry, rcu_read_lock_bh_held());
 	if (old_xs == xs) {
 		err = 0;
 		goto out;
@@ -191,7 +195,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 		goto out;
 	}
 	xsk_map_sock_add(xs, node);
-	WRITE_ONCE(*map_entry, xs);
+	rcu_assign_pointer(*map_entry, xs);
 	if (old_xs)
 		xsk_map_sock_delete(old_xs, map_entry);
 	spin_unlock_bh(&m->lock);
@@ -208,7 +212,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
 static int xsk_map_delete_elem(struct bpf_map *map, void *key)
 {
 	struct xsk_map *m = container_of(map, struct xsk_map, map);
-	struct xdp_sock *old_xs, **map_entry;
+	struct xdp_sock __rcu **map_entry;
+	struct xdp_sock *old_xs;
 	int k = *(u32 *)key;
 
 	if (k >= map->max_entries)
@@ -216,7 +221,7 @@ static int xsk_map_delete_elem(struct bpf_map *map, void *key)
 
 	spin_lock_bh(&m->lock);
 	map_entry = &m->xsk_map[k];
-	old_xs = xchg(map_entry, NULL);
+	old_xs = unrcu_pointer(xchg(map_entry, NULL));
 	if (old_xs)
 		xsk_map_sock_delete(old_xs, map_entry);
 	spin_unlock_bh(&m->lock);
@@ -231,11 +236,11 @@ static int xsk_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
 }
 
 void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
-			     struct xdp_sock **map_entry)
+			     struct xdp_sock __rcu **map_entry)
 {
 	spin_lock_bh(&map->lock);
-	if (READ_ONCE(*map_entry) == xs) {
-		WRITE_ONCE(*map_entry, NULL);
+	if (rcu_dereference(*map_entry) == xs) {
+		rcu_assign_pointer(*map_entry, NULL);
 		xsk_map_sock_delete(xs, map_entry);
 	}
 	spin_unlock_bh(&map->lock);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (3 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 13:57   ` Paul E. McKenney
  2021-06-09 10:33 ` [PATCH bpf-next 06/17] bnxt: " Toke Høiland-Jørgensen
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Guy Tzalik, Saeed Bishara

The ena driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred to
by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Guy Tzalik <gtzalik@amazon.com>
Cc: Saeed Bishara <saeedb@amazon.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 881f88754bf6..a4378b14af4c 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -385,7 +385,6 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp)
 	u64 *xdp_stat;
 	int qid;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_bpf_prog);
 
 	if (!xdp_prog)
@@ -443,8 +442,6 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp)
 
 	ena_increase_stat(xdp_stat, 1, &rx_ring->syncp);
 out:
-	rcu_read_unlock();
-
 	return verdict;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 06/17] bnxt: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (4 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 13:58   ` Paul E. McKenney
  2021-06-09 10:33   ` Toke Høiland-Jørgensen
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Michael Chan

The bnxt driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred to
by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index ec9564e584e0..bee6e091a997 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -138,9 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	xdp_prepare_buff(&xdp, *data_ptr - offset, offset, *len, false);
 	orig_data = xdp.data;
 
-	rcu_read_lock();
 	act = bpf_prog_run_xdp(xdp_prog, &xdp);
-	rcu_read_unlock();
 
 	tx_avail = bnxt_tx_avail(bp, txr);
 	/* If the tx ring is not full, we must not update the rx producer yet
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 07/17] thunderx: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
@ 2021-06-09 10:33   ` Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Sunil Goutham,
	linux-arm-kernel

The thunderx driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred to
by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index c33b4e837515..e2b290135fd9 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -555,9 +555,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
 	xdp_prepare_buff(&xdp, hard_start, data - hard_start, len, false);
 	orig_data = xdp.data;
 
-	rcu_read_lock();
 	action = bpf_prog_run_xdp(prog, &xdp);
-	rcu_read_unlock();
 
 	len = xdp.data_end - xdp.data;
 	/* Check if XDP program has changed headers */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 08/17] freescale: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (6 preceding siblings ...)
  2021-06-09 10:33   ` Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 10:33   ` [Intel-wired-lan] " Toke Høiland-Jørgensen
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Madalin Bucur, Ioana Ciornei,
	Ioana Radulescu

The dpaa and dpaa2 drivers have rcu_read_lock()/rcu_read_unlock() pairs
around XDP program invocations. However, the actual lifetime of the objects
referred to by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Madalin Bucur <madalin.bucur@nxp.com>
Cc: Ioana Ciornei <ioana.ciornei@nxp.com>
Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c   | 8 +-------
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 3 ---
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 177c020bf34a..e6826561cf11 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2558,13 +2558,9 @@ static u32 dpaa_run_xdp(struct dpaa_priv *priv, struct qm_fd *fd, void *vaddr,
 	u32 xdp_act;
 	int err;
 
-	rcu_read_lock();
-
 	xdp_prog = READ_ONCE(priv->xdp_prog);
-	if (!xdp_prog) {
-		rcu_read_unlock();
+	if (!xdp_prog)
 		return XDP_PASS;
-	}
 
 	xdp_init_buff(&xdp, DPAA_BP_RAW_SIZE - DPAA_TX_PRIV_DATA_SIZE,
 		      &dpaa_fq->xdp_rxq);
@@ -2638,8 +2634,6 @@ static u32 dpaa_run_xdp(struct dpaa_priv *priv, struct qm_fd *fd, void *vaddr,
 		break;
 	}
 
-	rcu_read_unlock();
-
 	return xdp_act;
 }
 
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 8433aa730c42..973352393bd4 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -352,8 +352,6 @@ static u32 dpaa2_eth_run_xdp(struct dpaa2_eth_priv *priv,
 	u32 xdp_act = XDP_PASS;
 	int err, offset;
 
-	rcu_read_lock();
-
 	xdp_prog = READ_ONCE(ch->xdp.prog);
 	if (!xdp_prog)
 		goto out;
@@ -414,7 +412,6 @@ static u32 dpaa2_eth_run_xdp(struct dpaa2_eth_priv *priv,
 
 	ch->xdp.res |= xdp_act;
 out:
-	rcu_read_unlock();
 	return xdp_act;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 09/17] net: intel: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
@ 2021-06-09 10:33   ` Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Jesse Brandeburg, Tony Nguyen,
	intel-wired-lan

The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around
XDP program invocations. However, the actual lifetime of the objects
referred to by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c       | 2 --
 drivers/net/ethernet/intel/i40e/i40e_xsk.c        | 6 +-----
 drivers/net/ethernet/intel/ice/ice_txrx.c         | 6 +-----
 drivers/net/ethernet/intel/ice/ice_xsk.c          | 6 +-----
 drivers/net/ethernet/intel/igb/igb_main.c         | 2 --
 drivers/net/ethernet/intel/igc/igc_main.c         | 7 ++-----
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c     | 2 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c      | 6 +-----
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 2 --
 9 files changed, 6 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index de70c16ef619..ae3a64b6f5f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2298,7 +2298,6 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
@@ -2329,7 +2328,6 @@ static int i40e_run_xdp(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		break;
 	}
 xdp_out:
-	rcu_read_unlock();
 	return result;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 46d884417c63..8dca53b7daff 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -153,7 +153,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	/* NB! xdp_prog will always be !NULL, due to the fact that
 	 * this path is enabled by setting an XDP program.
 	 */
@@ -162,9 +161,7 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
-		rcu_read_unlock();
-		return result;
+		return !err ? I40E_XDP_REDIR : I40E_XDP_CONSUMED;
 	}
 
 	switch (act) {
@@ -184,7 +181,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		result = I40E_XDP_CONSUMED;
 		break;
 	}
-	rcu_read_unlock();
 	return result;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index e2b4b29ea207..1a311e91fb6d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1129,15 +1129,11 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
 #endif
 
-		rcu_read_lock();
 		xdp_prog = READ_ONCE(rx_ring->xdp_prog);
-		if (!xdp_prog) {
-			rcu_read_unlock();
+		if (!xdp_prog)
 			goto construct_skb;
-		}
 
 		xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog);
-		rcu_read_unlock();
 		if (!xdp_res)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index faa7b8d96adb..d6da377f5ac3 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -463,7 +463,6 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
 	struct ice_ring *xdp_ring;
 	u32 act;
 
-	rcu_read_lock();
 	/* ZC patch is enabled only when XDP program is set,
 	 * so here it can not be NULL
 	 */
@@ -473,9 +472,7 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED;
-		rcu_read_unlock();
-		return result;
+		return !err ? ICE_XDP_REDIR : ICE_XDP_CONSUMED;
 	}
 
 	switch (act) {
@@ -496,7 +493,6 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
 		break;
 	}
 
-	rcu_read_unlock();
 	return result;
 }
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 038a9fd1af44..8a11b7e55326 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8387,7 +8387,6 @@ static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
@@ -8420,7 +8419,6 @@ static struct sk_buff *igb_run_xdp(struct igb_adapter *adapter,
 		break;
 	}
 xdp_out:
-	rcu_read_unlock();
 	return ERR_PTR(-result);
 }
 
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index ea998d2defa4..2b666a6ec989 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -2175,18 +2175,15 @@ static struct sk_buff *igc_xdp_run_prog(struct igc_adapter *adapter,
 	struct bpf_prog *prog;
 	int res;
 
-	rcu_read_lock();
-
 	prog = READ_ONCE(adapter->xdp_prog);
 	if (!prog) {
 		res = IGC_XDP_PASS;
-		goto unlock;
+		goto out;
 	}
 
 	res = __igc_xdp_run_prog(adapter, prog, xdp);
 
-unlock:
-	rcu_read_unlock();
+out:
 	return ERR_PTR(-res);
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c5ec17d19c59..27d7467534e0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2199,7 +2199,6 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	struct xdp_frame *xdpf;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
@@ -2237,7 +2236,6 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 		break;
 	}
 xdp_out:
-	rcu_read_unlock();
 	return ERR_PTR(-result);
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 91ad5b902673..ffbf8a694362 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -100,15 +100,12 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 	struct xdp_frame *xdpf;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
-		rcu_read_unlock();
-		return result;
+		return !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
 	}
 
 	switch (act) {
@@ -132,7 +129,6 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 		result = IXGBE_XDP_CONSUMED;
 		break;
 	}
-	rcu_read_unlock();
 	return result;
 }
 
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index ba2ed8a43d2d..fabada4ce315 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1054,7 +1054,6 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
 	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
@@ -1079,7 +1078,6 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
 		break;
 	}
 xdp_out:
-	rcu_read_unlock();
 	return ERR_PTR(-result);
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 10/17] marvell: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (8 preceding siblings ...)
  2021-06-09 10:33   ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 11/17] mlx4: " Toke Høiland-Jørgensen
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Thomas Petazzoni,
	Marcin Wojtas, Russell King

The mvneta and mvpp2 drivers have rcu_read_lock()/rcu_read_unlock() pairs
around XDP program invocations. However, the actual lifetime of the objects
referred to by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.
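
To make the protection concrete, the lifetime argument can be sketched as a
plain userspace mock. Everything here is an illustrative stand-in for the
kernel primitives (local_bh_disable(), xdp_do_redirect(), xdp_do_flush()),
not the real implementations; the only point is the scoping:

```c
/*
 * Illustrative userspace mock -- NOT kernel code. All names are
 * hypothetical stand-ins. The point: the BH-disabled section spans
 * both the XDP program invocation and the final xdp_do_flush(), so
 * the map entries stay valid for the whole poll cycle without any
 * extra rcu_read_lock().
 */
#include <assert.h>

static int bh_disabled;   /* nesting count, mimics local_bh_disable() */
static int queued;        /* packets queued for the deferred flush    */

static void local_bh_disable(void) { bh_disabled++; }
static void local_bh_enable(void)  { bh_disabled--; }

/* Redirect queues a packet against an RCU-protected map entry. */
static void xdp_do_redirect(void)
{
	assert(bh_disabled);  /* entry must not be freed underneath us */
	queued++;
}

/* The flush at the end of the poll cycle touches the same entries,
 * so it must still run inside the BH-disabled (RCU-bh) section. */
static void xdp_do_flush(void)
{
	assert(bh_disabled);
	queued = 0;
}

/* One NAPI poll cycle: BH stays disabled across program run + flush. */
static int napi_poll(int budget)
{
	local_bh_disable();
	for (int i = 0; i < budget; i++)
		xdp_do_redirect();
	xdp_do_flush();
	local_bh_enable();
	return budget;
}
```

The BH-disabled section *is* the RCU read-side critical section here, which
is why dropping the narrower rcu_read_lock()/rcu_read_unlock() pair is safe.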

Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Cc: Marcin Wojtas <mw@semihalf.com>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c           | 2 --
 drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 4 ----
 2 files changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 7d5cd9bc6c99..b9e5875b20bc 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2370,7 +2370,6 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	/* Get number of received packets */
 	rx_todo = mvneta_rxq_busy_desc_num_get(pp, rxq);
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(pp->xdp_prog);
 
 	/* Fairness NAPI loop */
@@ -2448,7 +2447,6 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		xdp_buf.data_hard_start = NULL;
 		sinfo.nr_frags = 0;
 	}
-	rcu_read_unlock();
 
 	if (xdp_buf.data_hard_start)
 		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index b2259bf1d299..521ed3c1cfe9 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3852,8 +3852,6 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
 	int rx_done = 0;
 	u32 xdp_ret = 0;
 
-	rcu_read_lock();
-
 	xdp_prog = READ_ONCE(port->xdp_prog);
 
 	/* Get number of received packets and clamp the to-do */
@@ -3988,8 +3986,6 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
 		mvpp2_bm_pool_put(port, pool, dma_addr, phys_addr);
 	}
 
-	rcu_read_unlock();
-
 	if (xdp_ret & MVPP2_XDP_REDIR)
 		xdp_do_flush_map();
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 11/17] mlx4: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (9 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 10/17] marvell: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-10  7:10   ` Tariq Toukan
  2021-06-09 10:33 ` [PATCH bpf-next 12/17] nfp: " Toke Høiland-Jørgensen
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Tariq Toukan

The mlx4 driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
to by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around. Also switch the RCU
dereferences in the driver loop itself to the _bh variants.
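
The difference between the two dereference variants can be sketched with a
userspace mock; the checks below are hypothetical reductions of what lockdep
verifies, not the kernel implementation:

```c
/*
 * Illustrative userspace mock -- NOT the kernel implementation. It
 * models only the checking difference: rcu_dereference() wants a
 * plain rcu_read_lock() section, while rcu_dereference_bh() accepts
 * a BH-disabled section such as a NAPI poll cycle, which is exactly
 * the context mlx4_en_process_rx_cq() runs in.
 */
#include <assert.h>

static int rcu_nesting;   /* mimics rcu_read_lock() nesting    */
static int bh_disabled;   /* mimics local_bh_disable() nesting */

static void local_bh_disable(void) { bh_disabled++; }
static void local_bh_enable(void)  { bh_disabled--; }

/* Lockdep-style checks, reduced to plain assertions. */
#define rcu_dereference(p)    (assert(rcu_nesting > 0), (p))
#define rcu_dereference_bh(p) (assert(bh_disabled > 0), (p))

static int prog = 42;     /* stands in for ring->xdp_prog */

/* A poll cycle only disables BH; the _bh variant accepts that. */
static int poll_cycle(void)
{
	local_bh_disable();
	int *p = rcu_dereference_bh(&prog);
	int val = *p;
	local_bh_enable();
	return val;
}
```

With lockdep enabled, the plain rcu_dereference() would complain in a
section that only has BH disabled; the _bh variant documents the actual
protection in use.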

Cc: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index e35e4d7ef4d1..3f08c14d0441 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -679,9 +679,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
 	ring = priv->rx_ring[cq_ring];
 
-	/* Protect accesses to: ring->xdp_prog, priv->mac_hash list */
-	rcu_read_lock();
-	xdp_prog = rcu_dereference(ring->xdp_prog);
+	xdp_prog = rcu_dereference_bh(ring->xdp_prog);
 	xdp_init_buff(&xdp, priv->frag_info[0].frag_stride, &ring->xdp_rxq);
 	doorbell_pending = false;
 
@@ -744,7 +742,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 				/* Drop the packet, since HW loopback-ed it */
 				mac_hash = ethh->h_source[MLX4_EN_MAC_HASH_IDX];
 				bucket = &priv->mac_hash[mac_hash];
-				hlist_for_each_entry_rcu(entry, bucket, hlist) {
+				hlist_for_each_entry_rcu_bh(entry, bucket, hlist) {
 					if (ether_addr_equal_64bits(entry->mac,
 								    ethh->h_source))
 						goto next;
@@ -899,8 +897,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			break;
 	}
 
-	rcu_read_unlock();
-
 	if (likely(polled)) {
 		if (doorbell_pending) {
 			priv->tx_cq[TX_XDP][cq_ring]->xdp_busy = true;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 12/17] nfp: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (10 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 11/17] mlx4: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-11 16:30   ` Simon Horman
  2021-06-09 10:33 ` [PATCH bpf-next 13/17] qede: " Toke Høiland-Jørgensen
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Simon Horman, oss-drivers

The nfp driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
to by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small.

While this is not actually an issue for the nfp driver because it doesn't
support XDP_REDIRECT (and thus doesn't call xdp_do_flush()), the
rcu_read_lock() is still unneeded. With the addition of RCU annotations
to the XDP_REDIRECT map types that take bh execution into account, lockdep
even understands this to be safe, so there's really no reason to keep it
around.

Cc: Simon Horman <simon.horman@netronome.com>
Cc: oss-drivers@netronome.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index eeb30680b4dc..5dfa4799c34f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1819,7 +1819,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 	struct xdp_buff xdp;
 	int idx;
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(dp->xdp_prog);
 	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
 	xdp_init_buff(&xdp, PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM,
@@ -2036,7 +2035,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			if (!nfp_net_xdp_complete(tx_ring))
 				pkts_polled = budget;
 	}
-	rcu_read_unlock();
 
 	return pkts_polled;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 13/17] qede: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (11 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 12/17] nfp: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 14/17] sfc: " Toke Høiland-Jørgensen
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Ariel Elior,
	GR-everest-linux-l2

The qede driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
to by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 8e150dd4f899..065e9004598e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1089,13 +1089,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
 	xdp_prepare_buff(&xdp, page_address(bd->data), *data_offset,
 			 *len, false);
 
-	/* Queues always have a full reset currently, so for the time
-	 * being until there's atomic program replace just mark read
-	 * side for map helpers.
-	 */
-	rcu_read_lock();
 	act = bpf_prog_run_xdp(prog, &xdp);
-	rcu_read_unlock();
 
 	/* Recalculate, as XDP might have changed the headers */
 	*data_offset = xdp.data - xdp.data_hard_start;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 14/17] sfc: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (12 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 13/17] qede: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 12:15   ` Edward Cree
  2021-06-09 10:33 ` [PATCH bpf-next 15/17] netsec: " Toke Høiland-Jørgensen
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Edward Cree, Martin Habets

The sfc driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
to by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/sfc/rx.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 17b8119c48e5..606750938b89 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -260,18 +260,14 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	s16 offset;
 	int err;
 
-	rcu_read_lock();
-	xdp_prog = rcu_dereference(efx->xdp_prog);
-	if (!xdp_prog) {
-		rcu_read_unlock();
+	xdp_prog = rcu_dereference_bh(efx->xdp_prog);
+	if (!xdp_prog)
 		return true;
-	}
 
 	rx_queue = efx_channel_get_rx_queue(channel);
 
 	if (unlikely(channel->rx_pkt_n_frags > 1)) {
 		/* We can't do XDP on fragmented packets - drop. */
-		rcu_read_unlock();
 		efx_free_rx_buffers(rx_queue, rx_buf,
 				    channel->rx_pkt_n_frags);
 		if (net_ratelimit())
@@ -296,7 +292,6 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 			 rx_buf->len, false);
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
-	rcu_read_unlock();
 
 	offset = (u8 *)xdp.data - *ehp;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 15/17] netsec: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (13 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 14/17] sfc: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-10  5:30   ` Ilias Apalodimas
  2021-06-09 10:33 ` [PATCH bpf-next 16/17] stmmac: " Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Jassi Brar, Ilias Apalodimas

The netsec driver has an rcu_read_lock()/rcu_read_unlock() pair around the
full RX loop, covering everything up to and including xdp_do_flush(). This
is actually the correct behaviour, but because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), it is also technically
redundant.

With the addition of RCU annotations to the XDP_REDIRECT map types that
take bh execution into account, lockdep even understands this to be safe,
so there's really no reason to keep the rcu_read_lock() around anymore, so
let's just remove it.

Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/socionext/netsec.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index dfc85cc68173..20d148c019d8 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -958,7 +958,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 
 	xdp_init_buff(&xdp, PAGE_SIZE, &dring->xdp_rxq);
 
-	rcu_read_lock();
 	xdp_prog = READ_ONCE(priv->xdp_prog);
 	dma_dir = page_pool_get_dma_dir(dring->page_pool);
 
@@ -1069,8 +1068,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 	}
 	netsec_finalize_xdp_rx(priv, xdp_act, xdp_xmit);
 
-	rcu_read_unlock();
-
 	return done;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 16/17] stmmac: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (14 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 15/17] netsec: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 10:33 ` [PATCH bpf-next 17/17] net: ti: " Toke Høiland-Jørgensen
  2021-06-10  0:18 ` [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Yonghong Song
  17 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Giuseppe Cavallaro,
	Alexandre Torgue, Jose Abreu

The stmmac driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
to by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bf9fe25fed69..08c4b999e1ba 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4654,7 +4654,6 @@ static int stmmac_xdp_xmit_back(struct stmmac_priv *priv,
 	return res;
 }
 
-/* This function assumes rcu_read_lock() is held by the caller. */
 static int __stmmac_xdp_run_prog(struct stmmac_priv *priv,
 				 struct bpf_prog *prog,
 				 struct xdp_buff *xdp)
@@ -4696,17 +4695,14 @@ static struct sk_buff *stmmac_xdp_run_prog(struct stmmac_priv *priv,
 	struct bpf_prog *prog;
 	int res;
 
-	rcu_read_lock();
-
 	prog = READ_ONCE(priv->xdp_prog);
 	if (!prog) {
 		res = STMMAC_XDP_PASS;
-		goto unlock;
+		goto out;
 	}
 
 	res = __stmmac_xdp_run_prog(priv, prog, xdp);
-unlock:
-	rcu_read_unlock();
+out:
 	return ERR_PTR(-res);
 }
 
@@ -4976,10 +4972,8 @@ static int stmmac_rx_zc(struct stmmac_priv *priv, int limit, u32 queue)
 		buf->xdp->data_end = buf->xdp->data + buf1_len;
 		xsk_buff_dma_sync_for_cpu(buf->xdp, rx_q->xsk_pool);
 
-		rcu_read_lock();
 		prog = READ_ONCE(priv->xdp_prog);
 		res = __stmmac_xdp_run_prog(priv, prog, buf->xdp);
-		rcu_read_unlock();
 
 		switch (res) {
 		case STMMAC_XDP_PASS:
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH bpf-next 17/17] net: ti: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (15 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 16/17] stmmac: " Toke Høiland-Jørgensen
@ 2021-06-09 10:33 ` Toke Høiland-Jørgensen
  2021-06-09 17:04   ` Grygorii Strashko
  2021-06-10  0:18 ` [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Yonghong Song
  17 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-09 10:33 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney,
	Toke Høiland-Jørgensen, Grygorii Strashko, linux-omap

The cpsw driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
program invocations. However, the actual lifetime of the objects referred
to by the XDP program invocation is longer, all the way through to the call to
xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
turns out to be harmless because it all happens in a single NAPI poll
cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/ethernet/ti/cpsw_priv.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw_priv.c b/drivers/net/ethernet/ti/cpsw_priv.c
index 5862f0a4a975..ecc2a6b7e28f 100644
--- a/drivers/net/ethernet/ti/cpsw_priv.c
+++ b/drivers/net/ethernet/ti/cpsw_priv.c
@@ -1328,13 +1328,9 @@ int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp,
 	struct bpf_prog *prog;
 	u32 act;
 
-	rcu_read_lock();
-
 	prog = READ_ONCE(priv->xdp_prog);
-	if (!prog) {
-		ret = CPSW_XDP_PASS;
-		goto out;
-	}
+	if (!prog)
+		return CPSW_XDP_PASS;
 
 	act = bpf_prog_run_xdp(prog, xdp);
 	/* XDP prog might have changed packet data and boundaries */
@@ -1378,10 +1374,8 @@ int cpsw_run_xdp(struct cpsw_priv *priv, int ch, struct xdp_buff *xdp,
 	ndev->stats.rx_bytes += *len;
 	ndev->stats.rx_packets++;
 out:
-	rcu_read_unlock();
 	return ret;
 drop:
-	rcu_read_unlock();
 	page_pool_recycle_direct(cpsw->page_pool[ch], page);
 	return ret;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH bpf-next 14/17] sfc: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 14/17] sfc: " Toke Høiland-Jørgensen
@ 2021-06-09 12:15   ` Edward Cree
  0 siblings, 0 replies; 39+ messages in thread
From: Edward Cree @ 2021-06-09 12:15 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney, Martin Habets

On 09/06/2021 11:33, Toke Høiland-Jørgensen wrote:
> The sfc driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> to by the XDP program invocation is longer, all the way through to the call to
> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.
> 
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around.
> 
> Cc: Edward Cree <ecree.xilinx@gmail.com>
> Cc: Martin Habets <habetsm.xilinx@gmail.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
Acked-by: Edward Cree <ecree.xilinx@gmail.com>

>  drivers/net/ethernet/sfc/rx.c | 9 ++-------
>  1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
> index 17b8119c48e5..606750938b89 100644
> --- a/drivers/net/ethernet/sfc/rx.c
> +++ b/drivers/net/ethernet/sfc/rx.c
> @@ -260,18 +260,14 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
>  	s16 offset;
>  	int err;
>  
> -	rcu_read_lock();
> -	xdp_prog = rcu_dereference(efx->xdp_prog);
> -	if (!xdp_prog) {
> -		rcu_read_unlock();
> +	xdp_prog = rcu_dereference_bh(efx->xdp_prog);
> +	if (!xdp_prog)
>  		return true;
> -	}
>  
>  	rx_queue = efx_channel_get_rx_queue(channel);
>  
>  	if (unlikely(channel->rx_pkt_n_frags > 1)) {
>  		/* We can't do XDP on fragmented packets - drop. */
> -		rcu_read_unlock();
>  		efx_free_rx_buffers(rx_queue, rx_buf,
>  				    channel->rx_pkt_n_frags);
>  		if (net_ratelimit())
> @@ -296,7 +292,6 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
>  			 rx_buf->len, false);
>  
>  	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
> -	rcu_read_unlock();
>  
>  	offset = (u8 *)xdp.data - *ehp;
>  
> 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation Toke Høiland-Jørgensen
@ 2021-06-09 13:57   ` Paul E. McKenney
  0 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2021-06-09 13:57 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Guy Tzalik,
	Saeed Bishara

On Wed, Jun 09, 2021 at 12:33:14PM +0200, Toke Høiland-Jørgensen wrote:
> The ena driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> to by the XDP program invocation is longer, all the way through to the call to
> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.
> 
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around.

It might be worth adding a comment, perhaps where the rcu_read_lock()
used to be, stating what the protection is.  Maybe something like this?

	/*
	 * This code is invoked within a single NAPI poll cycle
	 * and thus under local_bh_disable(), which provides the
	 * needed RCU protection.
	 */

							Thanx, Paul

> Cc: Guy Tzalik <gtzalik@amazon.com>
> Cc: Saeed Bishara <saeedb@amazon.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  drivers/net/ethernet/amazon/ena/ena_netdev.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> index 881f88754bf6..a4378b14af4c 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
> @@ -385,7 +385,6 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp)
>  	u64 *xdp_stat;
>  	int qid;
>  
> -	rcu_read_lock();
>  	xdp_prog = READ_ONCE(rx_ring->xdp_bpf_prog);
>  
>  	if (!xdp_prog)
> @@ -443,8 +442,6 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp)
>  
>  	ena_increase_stat(xdp_stat, 1, &rx_ring->syncp);
>  out:
> -	rcu_read_unlock();
> -
>  	return verdict;
>  }
>  
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH bpf-next 06/17] bnxt: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 06/17] bnxt: " Toke Høiland-Jørgensen
@ 2021-06-09 13:58   ` Paul E. McKenney
  2021-06-10  8:47     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2021-06-09 13:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Michael Chan

On Wed, Jun 09, 2021 at 12:33:15PM +0200, Toke Høiland-Jørgensen wrote:
> The bnxt driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> to by the XDP program invocation is longer, all the way through to the call to
> xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.
> 
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around.

And same for the rest of these removals.  Someone might be very happy
to have that comment at some later date, and that someone just might
be you.  ;-)

							Thanx, Paul

> Cc: Michael Chan <michael.chan@broadcom.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> index ec9564e584e0..bee6e091a997 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
> @@ -138,9 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
>  	xdp_prepare_buff(&xdp, *data_ptr - offset, offset, *len, false);
>  	orig_data = xdp.data;
>  
> -	rcu_read_lock();
>  	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> -	rcu_read_unlock();
>  
>  	tx_avail = bnxt_tx_avail(bp, txr);
>  	/* If the tx ring is not full, we must not update the rx producer yet
> -- 
> 2.31.1
> 

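To make the lifetime argument in the commit message above concrete, here is a minimal userspace model. All names are stand-ins for the kernel APIs, not the real implementations: the point is only that the redirect target stashed during the program invocation is consumed by xdp_do_flush() later in the same NAPI poll cycle, so a read-side marker that ends right after bpf_prog_run_xdp() never covered the object's full period of use; it is the bh-disabled softirq context that does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one NAPI poll cycle. The names mirror the kernel APIs
 * purely for illustration; none of this is the real implementation. */
struct redirect_state {
	bool bh_disabled;	/* models the softirq context of NAPI poll */
	bool target_stashed;	/* models the per-CPU redirect info */
};

static void bpf_prog_run_xdp(struct redirect_state *st)
{
	/* Program returns XDP_REDIRECT: the looked-up map entry is
	 * remembered past the program invocation itself. */
	st->target_stashed = true;
}

static bool xdp_do_flush(struct redirect_state *st)
{
	/* The stashed target is only safe to use if we are still inside
	 * the same bh-disabled region the lookup happened in. */
	bool safe = st->bh_disabled && st->target_stashed;

	st->target_stashed = false;
	return safe;
}

static bool napi_poll_cycle(void)
{
	/* The whole cycle runs with bottom halves disabled. */
	struct redirect_state st = { .bh_disabled = true, .target_stashed = false };

	bpf_prog_run_xdp(&st);
	return xdp_do_flush(&st);	/* still under local_bh_disable() */
}
```

Under this model the entire poll cycle runs with bh_disabled set, which is why dropping the narrower per-invocation rcu_read_lock() is safe once lockdep accepts bh-disabled sections.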

* Re: [PATCH bpf-next 17/17] net: ti: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 17/17] net: ti: " Toke Høiland-Jørgensen
@ 2021-06-09 17:04   ` Grygorii Strashko
  0 siblings, 0 replies; 39+ messages in thread
From: Grygorii Strashko @ 2021-06-09 17:04 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney, linux-omap



On 09/06/2021 13:33, Toke Høiland-Jørgensen wrote:
> The cpsw driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> to by the XDP program invocation is longer, all the way through to the call
> to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.
> 
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around.
> 
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: linux-omap@vger.kernel.org
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>   drivers/net/ethernet/ti/cpsw_priv.c | 10 ++--------
>   1 file changed, 2 insertions(+), 8 deletions(-)

Tested-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>

-- 
Best regards,
grygorii


* Re: [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT
  2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
                   ` (16 preceding siblings ...)
  2021-06-09 10:33 ` [PATCH bpf-next 17/17] net: ti: " Toke Høiland-Jørgensen
@ 2021-06-10  0:18 ` Yonghong Song
  17 siblings, 0 replies; 39+ messages in thread
From: Yonghong Song @ 2021-06-10  0:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney



On 6/9/21 3:33 AM, Toke Høiland-Jørgensen wrote:
> During the discussion[0] of Hangbin's multicast patch series, Martin pointed out
> that the lifetime of the RCU-protected map entries used by XDP_REDIRECT is by
> no means obvious. I promised to look into cleaning this up, and Paul helpfully
> provided some hints and a new unrcu_pointer() helper to aid in this.
> 
> This is mostly a documentation exercise, clearing up the description of the
> lifetime expectations and adding __rcu annotations so sparse and lockdep can
> help verify it.
> 
> Patches 1-2 are preparatory: Patch 1 adds Paul's unrcu_pointer() helper (which has
> already been added to his tree) and patch 2 is a small fix for
> dev_get_by_index_rcu() so lockdep understands _bh-disabled access to it. Patch 3
> is the main bit that adds the __rcu annotations and updates documentation
> comments, and the rest are patches updating the drivers, with one patch per
> distinct maintainer.
> 
> Unfortunately I don't have any hardware to test any of the driver patches;
> Jesper helpfully verified that it doesn't break anything on i40e, but the rest
> of the driver patches are only compile-tested.
> 
> [0] https://lore.kernel.org/bpf/20210415173551.7ma4slcbqeyiba2r@kafai-mbp.dhcp.thefacebook.com/
> 
> Paul E. McKenney (1):
>    rcu: Create an unrcu_pointer() to remove __rcu from a pointer
> 
> Toke Høiland-Jørgensen (16):
>    bpf: allow RCU-protected lookups to happen from bh context
>    dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU
>      dev ref
>    xdp: add proper __rcu annotations to redirect map entries
>    ena: remove rcu_read_lock() around XDP program invocation
>    bnxt: remove rcu_read_lock() around XDP program invocation
>    thunderx: remove rcu_read_lock() around XDP program invocation
>    freescale: remove rcu_read_lock() around XDP program invocation
>    net: intel: remove rcu_read_lock() around XDP program invocation
>    marvell: remove rcu_read_lock() around XDP program invocation
>    mlx4: remove rcu_read_lock() around XDP program invocation
>    nfp: remove rcu_read_lock() around XDP program invocation
>    qede: remove rcu_read_lock() around XDP program invocation
>    sfc: remove rcu_read_lock() around XDP program invocation
>    netsec: remove rcu_read_lock() around XDP program invocation
>    stmmac: remove rcu_read_lock() around XDP program invocation
>    net: ti: remove rcu_read_lock() around XDP program invocation
> 
>   drivers/net/ethernet/amazon/ena/ena_netdev.c  |  3 --
>   drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  2 -
>   .../net/ethernet/cavium/thunder/nicvf_main.c  |  2 -
>   .../net/ethernet/freescale/dpaa/dpaa_eth.c    |  8 +--
>   .../net/ethernet/freescale/dpaa2/dpaa2-eth.c  |  3 --
>   drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  2 -
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    |  6 +--
>   drivers/net/ethernet/intel/ice/ice_txrx.c     |  6 +--
>   drivers/net/ethernet/intel/ice/ice_xsk.c      |  6 +--
>   drivers/net/ethernet/intel/igb/igb_main.c     |  2 -
>   drivers/net/ethernet/intel/igc/igc_main.c     |  7 +--
>   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  2 -
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  |  6 +--
>   .../net/ethernet/intel/ixgbevf/ixgbevf_main.c |  2 -
>   drivers/net/ethernet/marvell/mvneta.c         |  2 -
>   .../net/ethernet/marvell/mvpp2/mvpp2_main.c   |  4 --
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c    |  8 +--
>   .../ethernet/netronome/nfp/nfp_net_common.c   |  2 -
>   drivers/net/ethernet/qlogic/qede/qede_fp.c    |  6 ---
>   drivers/net/ethernet/sfc/rx.c                 |  9 +---
>   drivers/net/ethernet/socionext/netsec.c       |  3 --
>   .../net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +---
>   drivers/net/ethernet/ti/cpsw_priv.c           | 10 +---
>   include/linux/rcupdate.h                      | 14 +++++
>   include/net/xdp_sock.h                        |  2 +-
>   kernel/bpf/cpumap.c                           | 14 +++--
>   kernel/bpf/devmap.c                           | 52 ++++++++-----------
>   kernel/bpf/hashtab.c                          | 21 +++++---
>   kernel/bpf/helpers.c                          |  6 +--
>   kernel/bpf/lpm_trie.c                         |  6 ++-
>   net/core/dev.c                                |  2 +-
>   net/core/filter.c                             | 28 ++++++++++
>   net/xdp/xsk.c                                 |  4 +-
>   net/xdp/xsk.h                                 |  4 +-
>   net/xdp/xskmap.c                              | 29 ++++++-----
>   35 files changed, 134 insertions(+), 159 deletions(-)

Martin, could you help review this patch set? You had participated
in early discussions related to this patch. Thanks!


* Re: [PATCH bpf-next 15/17] netsec: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 15/17] netsec: " Toke Høiland-Jørgensen
@ 2021-06-10  5:30   ` Ilias Apalodimas
  0 siblings, 0 replies; 39+ messages in thread
From: Ilias Apalodimas @ 2021-06-10  5:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, Networking, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Paul E . McKenney,
	Jassi Brar

On Wed, 9 Jun 2021 at 13:33, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> The netsec driver has a rcu_read_lock()/rcu_read_unlock() pair around the
> full RX loop, covering everything up to and including xdp_do_flush(). This
> is actually the correct behaviour, but because it all happens in a single
> NAPI poll cycle (and thus under local_bh_disable()), it is also technically
> redundant.
>
> With the addition of RCU annotations to the XDP_REDIRECT map types that
> take bh execution into account, lockdep even understands this to be safe,
> so there's really no reason to keep the rcu_read_lock() around anymore;
> let's just remove it.
>
> Cc: Jassi Brar <jaswinder.singh@linaro.org>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  drivers/net/ethernet/socionext/netsec.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index dfc85cc68173..20d148c019d8 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -958,7 +958,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
>
>         xdp_init_buff(&xdp, PAGE_SIZE, &dring->xdp_rxq);
>
> -       rcu_read_lock();
>         xdp_prog = READ_ONCE(priv->xdp_prog);
>         dma_dir = page_pool_get_dma_dir(dring->page_pool);
>
> @@ -1069,8 +1068,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
>         }
>         netsec_finalize_xdp_rx(priv, xdp_act, xdp_xmit);
>
> -       rcu_read_unlock();
> -
>         return done;
>  }
>
> --
> 2.31.1
>

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>


* Re: [PATCH bpf-next 11/17] mlx4: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 11/17] mlx4: " Toke Høiland-Jørgensen
@ 2021-06-10  7:10   ` Tariq Toukan
  0 siblings, 0 replies; 39+ messages in thread
From: Tariq Toukan @ 2021-06-10  7:10 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, bpf, netdev
  Cc: Martin KaFai Lau, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney, Tariq Toukan



On 6/9/2021 1:33 PM, Toke Høiland-Jørgensen wrote:
> The mlx4 driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> to by the XDP program invocation is longer, all the way through to the call
> to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
> turns out to be harmless because it all happens in a single NAPI poll
> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
> misleading.
> 
> Rather than extend the scope of the rcu_read_lock(), just get rid of it
> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
> types that take bh execution into account, lockdep even understands this to
> be safe, so there's really no reason to keep it around. Also switch the RCU
> dereferences in the driver loop itself to the _bh variants.
> 
> Cc: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---

Thanks for your patch.
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

Regards,
Tariq


* Re: [PATCH bpf-next 06/17] bnxt: remove rcu_read_lock() around XDP program invocation
  2021-06-09 13:58   ` Paul E. McKenney
@ 2021-06-10  8:47     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-10  8:47 UTC (permalink / raw)
  To: paulmck
  Cc: bpf, netdev, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Michael Chan

"Paul E. McKenney" <paulmck@kernel.org> writes:

> On Wed, Jun 09, 2021 at 12:33:15PM +0200, Toke Høiland-Jørgensen wrote:
>> The bnxt driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
>> program invocations. However, the actual lifetime of the objects referred
>> to by the XDP program invocation is longer, all the way through to the call
>> to xdp_do_flush(), making the scope of the rcu_read_lock() too small. This
>> turns out to be harmless because it all happens in a single NAPI poll
>> cycle (and thus under local_bh_disable()), but it makes the rcu_read_lock()
>> misleading.
>> 
>> Rather than extend the scope of the rcu_read_lock(), just get rid of it
>> entirely. With the addition of RCU annotations to the XDP_REDIRECT map
>> types that take bh execution into account, lockdep even understands this to
>> be safe, so there's really no reason to keep it around.
>
> And same for the rest of these removals.  Someone might be very happy
> to have that comment at some later date, and that someone just might
> be you.  ;-)

Bah, why do you have to go and make sensible suggestions like that? ;)

Will wait for Martin's review and add this in a v2. BTW, is it OK to
include your patch in the series like this, or should I rather request
that your tree be merged into bpf-next?

-Toke



* Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
  2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
@ 2021-06-10 18:38   ` Alexei Starovoitov
  2021-06-10 21:24     ` Daniel Borkmann
  2021-06-10 19:33   ` Martin KaFai Lau
  1 sibling, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2021-06-10 18:38 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, Network Development, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Paul E . McKenney

On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> XDP programs are called from a NAPI poll context, which means the RCU
> reference liveness is ensured by local_bh_disable(). Add
> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
> lockdep understands that the dereferences are safe from inside *either* an
> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
> preparation for removing the redundant rcu_read_lock()s from the drivers.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>  kernel/bpf/helpers.c  |  6 +++---
>  kernel/bpf/lpm_trie.c |  6 ++++--
>  3 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6f6681b07364..72c58cc516a3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>         struct htab_elem *l;
>         u32 hash, key_size;
>
> -       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +                    !rcu_read_lock_bh_held());

It's not clear to me whether rcu_read_lock_held() is still needed.
All comments sound like rcu_read_lock_bh_held() is a superset of rcu
that includes bh.
But reading rcu source code it looks like RCU_BH is its own rcu flavor...
which is confusing.

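The condition under discussion reduces to a plain disjunction: the lookup is treated as safe if any of the three read-side markers is in effect. A trivial userspace model, where the boolean parameters stand in for the kernel's lockdep queries (this models only the check's logic, not lockdep itself):

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the WARN_ON_ONCE() condition after the patch: safe if any
 * of the three read-side critical sections is held. The parameters
 * stand in for rcu_read_lock_held(), rcu_read_lock_trace_held(), and
 * rcu_read_lock_bh_held() respectively. */
static bool lookup_is_safe(bool rcu_held, bool rcu_trace_held, bool rcu_bh_held)
{
	return rcu_held || rcu_trace_held || rcu_bh_held;
}
```

From lockdep's point of view the terms do not subsume one another: a syscall-path caller may hold only rcu_read_lock() without bh disabled, so the rcu_read_lock_held() term is still needed even once bh-disabled sections are accepted.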

* Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
  2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
  2021-06-10 18:38   ` Alexei Starovoitov
@ 2021-06-10 19:33   ` Martin KaFai Lau
  1 sibling, 0 replies; 39+ messages in thread
From: Martin KaFai Lau @ 2021-06-10 19:33 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

On Wed, Jun 09, 2021 at 12:33:11PM +0200, Toke Høiland-Jørgensen wrote:
> XDP programs are called from a NAPI poll context, which means the RCU
> reference liveness is ensured by local_bh_disable(). Add
> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
> lockdep understands that the dereferences are safe from inside *either* an
> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
> preparation for removing the redundant rcu_read_lock()s from the drivers.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>  kernel/bpf/helpers.c  |  6 +++---
>  kernel/bpf/lpm_trie.c |  6 ++++--
>  3 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6f6681b07364..72c58cc516a3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>  	struct htab_elem *l;
>  	u32 hash, key_size;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> @@ -989,7 +990,8 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>  		/* unknown flags */
>  		return -EINVAL;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> @@ -1082,7 +1084,8 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
>  		/* unknown flags */
>  		return -EINVAL;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> @@ -1148,7 +1151,8 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
>  		/* unknown flags */
>  		return -EINVAL;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> @@ -1202,7 +1206,8 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
>  		/* unknown flags */
>  		return -EINVAL;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> @@ -1276,7 +1281,8 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
>  	u32 hash, key_size;
>  	int ret;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> @@ -1311,7 +1317,8 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
>  	u32 hash, key_size;
>  	int ret;
>  
> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
> +		     !rcu_read_lock_bh_held());
>  
>  	key_size = map->key_size;
>  
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 544773970dbc..e880f6bb6f28 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -28,7 +28,7 @@
>   */
>  BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
>  {
> -	WARN_ON_ONCE(!rcu_read_lock_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
There is a discrepancy here in that rcu_read_lock_trace_held() is not
checked, but I think the patch_map_ops_generic step in the verifier has
skipped these helper calls.  It is unrelated and can be addressed later,
when it is needed.

Acked-by: Martin KaFai Lau <kafai@fb.com>

>  	return (unsigned long) map->ops->map_lookup_elem(map, key);
>  }
>  
> @@ -44,7 +44,7 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
>  BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
>  	   void *, value, u64, flags)
>  {
> -	WARN_ON_ONCE(!rcu_read_lock_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
>  	return map->ops->map_update_elem(map, key, value, flags);
>  }
>  
> @@ -61,7 +61,7 @@ const struct bpf_func_proto bpf_map_update_elem_proto = {
>  
>  BPF_CALL_2(bpf_map_delete_elem, struct bpf_map *, map, void *, key)
>  {
> -	WARN_ON_ONCE(!rcu_read_lock_held());
> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
>  	return map->ops->map_delete_elem(map, key);
>  }


* Re: [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref
  2021-06-09 10:33 ` [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref Toke Høiland-Jørgensen
@ 2021-06-10 19:37   ` Martin KaFai Lau
  2021-06-10 23:05     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 39+ messages in thread
From: Martin KaFai Lau @ 2021-06-10 19:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

On Wed, Jun 09, 2021 at 12:33:12PM +0200, Toke Høiland-Jørgensen wrote:
> Some of the XDP helpers (in particular, xdp_do_redirect()) will get a
> struct net_device reference using dev_get_by_index_rcu(). These are called
> from a NAPI poll context, which means the RCU reference liveness is ensured
> by local_bh_disable(). Add rcu_read_lock_bh_held() as a condition to the
> RCU list traversal in dev_get_by_index_rcu() so lockdep understands that
> the dereferences are safe from *both* an rcu_read_lock() *and* with
> local_bh_disable().
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  net/core/dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index febb23708184..a499c5ffe4a5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1002,7 +1002,7 @@ struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex)
>  	struct net_device *dev;
>  	struct hlist_head *head = dev_index_hash(net, ifindex);
>  
> -	hlist_for_each_entry_rcu(dev, head, index_hlist)
> +	hlist_for_each_entry_rcu(dev, head, index_hlist, rcu_read_lock_bh_held())
Is it needed?  hlist_for_each_entry_rcu() checks for
rcu_read_lock_any_held().  Did lockdep complain?


* Re: [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries
  2021-06-09 10:33 ` [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries Toke Høiland-Jørgensen
@ 2021-06-10 21:09   ` Martin KaFai Lau
  2021-06-10 23:19     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 39+ messages in thread
From: Martin KaFai Lau @ 2021-06-10 21:09 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

On Wed, Jun 09, 2021 at 12:33:13PM +0200, Toke Høiland-Jørgensen wrote:
[ ... ]

> @@ -551,7 +551,8 @@ static void cpu_map_free(struct bpf_map *map)
>  	for (i = 0; i < cmap->map.max_entries; i++) {
>  		struct bpf_cpu_map_entry *rcpu;
>  
> -		rcpu = READ_ONCE(cmap->cpu_map[i]);
> +		rcpu = rcu_dereference_check(cmap->cpu_map[i],
> +					     rcu_read_lock_bh_held());
Is rcu_read_lock_bh_held() true during map_free()?

[ ... ]

> @@ -149,7 +152,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>  			       u64 map_flags)
>  {
>  	struct xsk_map *m = container_of(map, struct xsk_map, map);
> -	struct xdp_sock *xs, *old_xs, **map_entry;
> +	struct xdp_sock __rcu **map_entry;
> +	struct xdp_sock *xs, *old_xs;
>  	u32 i = *(u32 *)key, fd = *(u32 *)value;
>  	struct xsk_map_node *node;
>  	struct socket *sock;
> @@ -179,7 +183,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>  	}
>  
>  	spin_lock_bh(&m->lock);
> -	old_xs = READ_ONCE(*map_entry);
> +	old_xs = rcu_dereference_check(*map_entry, rcu_read_lock_bh_held());
Is it actually protected by the m->lock at this point?

[ ... ]

>  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
> -			     struct xdp_sock **map_entry)
> +			     struct xdp_sock __rcu **map_entry)
>  {
>  	spin_lock_bh(&map->lock);
> -	if (READ_ONCE(*map_entry) == xs) {
> -		WRITE_ONCE(*map_entry, NULL);
> +	if (rcu_dereference(*map_entry) == xs) {
nit. rcu_access_pointer()?

> +		rcu_assign_pointer(*map_entry, NULL);
>  		xsk_map_sock_delete(xs, map_entry);
>  	}
>  	spin_unlock_bh(&map->lock);
> -- 
> 2.31.1
> 


* Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
  2021-06-10 18:38   ` Alexei Starovoitov
@ 2021-06-10 21:24     ` Daniel Borkmann
  2021-06-10 22:27       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 39+ messages in thread
From: Daniel Borkmann @ 2021-06-10 21:24 UTC (permalink / raw)
  To: Alexei Starovoitov, Toke Høiland-Jørgensen
  Cc: bpf, Network Development, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Paul E . McKenney

Hi Paul,

On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> XDP programs are called from a NAPI poll context, which means the RCU
>> reference liveness is ensured by local_bh_disable(). Add
>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>> lockdep understands that the dereferences are safe from inside *either* an
>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>   kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>>   kernel/bpf/helpers.c  |  6 +++---
>>   kernel/bpf/lpm_trie.c |  6 ++++--
>>   3 files changed, 21 insertions(+), 12 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 6f6681b07364..72c58cc516a3 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>          struct htab_elem *l;
>>          u32 hash, key_size;
>>
>> -       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>> +       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>> +                    !rcu_read_lock_bh_held());
> 
> It's not clear to me whether rcu_read_lock_held() is still needed.
> All comments sound like rcu_read_lock_bh_held() is a superset of rcu
> that includes bh.
> But reading rcu source code it looks like RCU_BH is its own rcu flavor...
> which is confusing.

The series is a bit confusing to me as well. I recall we had a discussion with
Paul to get some clarification on RCU vs the RCU-bh flavour, but that was back
in 2016, in the very early days of XDP. Paul, given the series here, I assume
the below is not true anymore, and that in this case (since we're removing
rcu_read_lock() from the drivers) RCU-bh acts as a real superset?

Back then from your clarifications this was not the case:

   On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
   > On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
   > <paulmck@linux.vnet.ibm.com> wrote:
   [...]
   >>> The crux of the question is whether a particular driver rx handler, when
   >>> called from __do_softirq, needs to add an additional rcu_read_lock or
   >>> whether it can rely on the mechanics of softirq.
   >>
   >> If it was rcu_read_lock_bh(), you could.
   >>
   >> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
   >> which means that you absolutely cannot rely on softirq semantics.
   >>
   >> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
   >> will notice that there is no rcu_read_lock() in effect and report
   >> a quiescent state for that CPU.  Because rcu_preempt_check_callbacks()
   >> is invoked from the scheduling-clock interrupt, it absolutely can
   >> execute during do_softirq(), and therefore being in softirq context
   >> in no way provides rcu_read_lock()-style protection.
   >>
   >> Now, Alexei's question was for CONFIG_PREEMPT=n kernels.  However, in
   >> that case, rcu_read_lock() and rcu_read_unlock() generate no code
   >> in recent production kernels, so there is no performance penalty for
   >> using them.  (In older kernels, they implied a barrier().)
   >>
   >> So either way, with or without CONFIG_PREEMPT, you should use
   >> rcu_read_lock() to get RCU protection.
   >>
   >> One alternative might be to switch to rcu_read_lock_bh(), but that
   >> will add local_disable_bh() overhead to your read paths.
   >>
   >> Does that help, or am I missing the point of the question?
   >
   > thanks a lot for explanation.

   Glad you liked it!

   > I mistakenly assumed that _bh variants are 'stronger' and
   > act as inclusive, but sounds like they're completely orthogonal
   > especially with preempt_rcu=y.

   Yes, they are pretty much orthogonal.

   > With preempt_rcu=n and preempt=y, it would be the case, since
   > bh disables preemption and rcu_read_lock does the same as well,
   > right? Of course, the code shouldn't be relying on that, so we
   > have to fix our stuff.

   Indeed, especially given that the kernel currently won't allow you
   to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y.  If it does,
   please let me know, as that would be a bug that needs to be fixed.
   (For one thing, I do not test that combination.)

							Thanx, Paul

And now, fast-forward again to 2021 ... :)

Thanks,
Daniel


* Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
  2021-06-10 21:24     ` Daniel Borkmann
@ 2021-06-10 22:27       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-10 22:27 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: bpf, Network Development, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Paul E . McKenney

Daniel Borkmann <daniel@iogearbox.net> writes:

> Hi Paul,
>
> On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
>> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>>
>>> XDP programs are called from a NAPI poll context, which means the RCU
>>> reference liveness is ensured by local_bh_disable(). Add
>>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>>> lockdep understands that the dereferences are safe from inside *either* an
>>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>   kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>>>   kernel/bpf/helpers.c  |  6 +++---
>>>   kernel/bpf/lpm_trie.c |  6 ++++--
>>>   3 files changed, 21 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index 6f6681b07364..72c58cc516a3 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>>          struct htab_elem *l;
>>>          u32 hash, key_size;
>>>
>>> -       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>>> +       WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>>> +                    !rcu_read_lock_bh_held());
>> 
>> It's not clear to me whether rcu_read_lock_held() is still needed.
>> All comments sound like rcu_read_lock_bh_held() is a superset of rcu
>> that includes bh.
>> But reading rcu source code it looks like RCU_BH is its own rcu flavor...
>> which is confusing.
>
> The series is a bit confusing to me as well. I recall we had a discussion with
> Paul, but it was back in 2016 aka very early days of XDP to get some clarifications
> about RCU vs RCU-bh flavour on this. Paul, given the series in here, I assume the
> below is not true anymore, and in this case (since we're removing rcu_read_lock()
> from drivers), the RCU-bh acts as a real superset?
>
> Back then from your clarifications this was not the case:
>
>    On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
>    > On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
>    > <paulmck@linux.vnet.ibm.com> wrote:
>    [...]
>    >>> The crux of the question is whether a particular driver rx handler, when
>    >>> called from __do_softirq, needs to add an additional rcu_read_lock or
>    >>> whether it can rely on the mechanics of softirq.
>    >>
>    >> If it was rcu_read_lock_bh(), you could.
>    >>
>    >> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
>    >> which means that you absolutely cannot rely on softirq semantics.
>    >>
>    >> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
>    >> will notice that there is no rcu_read_lock() in effect and report
>    >> a quiescent state for that CPU.  Because rcu_preempt_check_callbacks()
>    >> is invoked from the scheduling-clock interrupt, it absolutely can
>    >> execute during do_softirq(), and therefore being in softirq context
>    >> in no way provides rcu_read_lock()-style protection.
>    >>
>    >> Now, Alexei's question was for CONFIG_PREEMPT=n kernels.  However, in
>    >> that case, rcu_read_lock() and rcu_read_unlock() generate no code
>    >> in recent production kernels, so there is no performance penalty for
>    >> using them.  (In older kernels, they implied a barrier().)
>    >>
>    >> So either way, with or without CONFIG_PREEMPT, you should use
>    >> rcu_read_lock() to get RCU protection.
>    >>
>    >> One alternative might be to switch to rcu_read_lock_bh(), but that
>    >> will add local_disable_bh() overhead to your read paths.
>    >>
>    >> Does that help, or am I missing the point of the question?
>    >
>    > thanks a lot for explanation.
>
>    Glad you liked it!
>
>    > I mistakenly assumed that _bh variants are 'stronger' and
>    > act as inclusive, but sounds like they're completely orthogonal
>    > especially with preempt_rcu=y.
>
>    Yes, they are pretty much orthogonal.
>
>    > With preempt_rcu=n and preempt=y, it would be the case, since
>    > bh disables preemption and rcu_read_lock does the same as well,
>    > right? Of course, the code shouldn't be relying on that, so we
>    > have to fix our stuff.
>
>    Indeed, especially given that the kernel currently won't allow you
>    to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y.  If it does,
>    please let me know, as that would be a bug that needs to be fixed.
>    (For one thing, I do not test that combination.)
>
> 							Thanx, Paul
>
> And now, fast-forward again to 2021 ... :)

We covered this in the thread I linked from the cover letter.
Specifically, this seems to have been a change from v4.20, see Paul's
reply here:
https://lore.kernel.org/bpf/20210417002301.GO4212@paulmck-ThinkPad-P17-Gen-1/

and the follow-up covering -rt here:
https://lore.kernel.org/bpf/20210419165837.GA975577@paulmck-ThinkPad-P17-Gen-1/

-Toke



* Re: [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref
  2021-06-10 19:37   ` Martin KaFai Lau
@ 2021-06-10 23:05     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-10 23:05 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

Martin KaFai Lau <kafai@fb.com> writes:

> On Wed, Jun 09, 2021 at 12:33:12PM +0200, Toke Høiland-Jørgensen wrote:
>> Some of the XDP helpers (in particular, xdp_do_redirect()) will get a
>> struct net_device reference using dev_get_by_index_rcu(). These are called
>> from a NAPI poll context, which means the RCU reference liveness is ensured
>> by local_bh_disable(). Add rcu_read_lock_bh_held() as a condition to the
>> RCU list traversal in dev_get_by_index_rcu() so lockdep understands that
>> the dereferences are safe from *both* an rcu_read_lock() *and* with
>> local_bh_disable().
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>  net/core/dev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index febb23708184..a499c5ffe4a5 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1002,7 +1002,7 @@ struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex)
>>  	struct net_device *dev;
>>  	struct hlist_head *head = dev_index_hash(net, ifindex);
>>  
>> -	hlist_for_each_entry_rcu(dev, head, index_hlist)
>> +	hlist_for_each_entry_rcu(dev, head, index_hlist, rcu_read_lock_bh_held())
> Is it needed?  hlist_for_each_entry_rcu() checks for
> rcu_read_lock_any_held().  Did lockdep complain?

Ah, yes, I think you're right. I totally missed that
rcu_read_lock_any_held() includes a '!preemptible()' check at the end.
I'll drop this patch, then!

-Toke



* Re: [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries
  2021-06-10 21:09   ` Martin KaFai Lau
@ 2021-06-10 23:19     ` Toke Høiland-Jørgensen
  2021-06-10 23:32       ` Martin KaFai Lau
  0 siblings, 1 reply; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-10 23:19 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

Martin KaFai Lau <kafai@fb.com> writes:

> On Wed, Jun 09, 2021 at 12:33:13PM +0200, Toke Høiland-Jørgensen wrote:
> [ ... ]
>
>> @@ -551,7 +551,8 @@ static void cpu_map_free(struct bpf_map *map)
>>  	for (i = 0; i < cmap->map.max_entries; i++) {
>>  		struct bpf_cpu_map_entry *rcpu;
>>  
>> -		rcpu = READ_ONCE(cmap->cpu_map[i]);
>> +		rcpu = rcu_dereference_check(cmap->cpu_map[i],
>> +					     rcu_read_lock_bh_held());
> Is rcu_read_lock_bh_held() true during map_free()?

Hmm, no, I guess not since that's called from a workqueue. Will fix!

>> @@ -149,7 +152,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>>  			       u64 map_flags)
>>  {
>>  	struct xsk_map *m = container_of(map, struct xsk_map, map);
>> -	struct xdp_sock *xs, *old_xs, **map_entry;
>> +	struct xdp_sock __rcu **map_entry;
>> +	struct xdp_sock *xs, *old_xs;
>>  	u32 i = *(u32 *)key, fd = *(u32 *)value;
>>  	struct xsk_map_node *node;
>>  	struct socket *sock;
>> @@ -179,7 +183,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>>  	}
>>  
>>  	spin_lock_bh(&m->lock);
>> -	old_xs = READ_ONCE(*map_entry);
>> +	old_xs = rcu_dereference_check(*map_entry, rcu_read_lock_bh_held());
> Is it actually protected by the m->lock at this point?

True, can just add that to the check.

>>  void xsk_map_try_sock_delete(struct xsk_map *map, struct xdp_sock *xs,
>> -			     struct xdp_sock **map_entry)
>> +			     struct xdp_sock __rcu **map_entry)
>>  {
>>  	spin_lock_bh(&map->lock);
>> -	if (READ_ONCE(*map_entry) == xs) {
>> -		WRITE_ONCE(*map_entry, NULL);
>> +	if (rcu_dereference(*map_entry) == xs) {
> nit. rcu_access_pointer()?

Yup.



* Re: [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries
  2021-06-10 23:19     ` Toke Høiland-Jørgensen
@ 2021-06-10 23:32       ` Martin KaFai Lau
  2021-06-10 23:41         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 39+ messages in thread
From: Martin KaFai Lau @ 2021-06-10 23:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

On Fri, Jun 11, 2021 at 01:19:04AM +0200, Toke Høiland-Jørgensen wrote:
> >> @@ -149,7 +152,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
> >>  			       u64 map_flags)
> >>  {
> >>  	struct xsk_map *m = container_of(map, struct xsk_map, map);
> >> -	struct xdp_sock *xs, *old_xs, **map_entry;
> >> +	struct xdp_sock __rcu **map_entry;
> >> +	struct xdp_sock *xs, *old_xs;
> >>  	u32 i = *(u32 *)key, fd = *(u32 *)value;
> >>  	struct xsk_map_node *node;
> >>  	struct socket *sock;
> >> @@ -179,7 +183,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
> >>  	}
> >>  
> >>  	spin_lock_bh(&m->lock);
> >> -	old_xs = READ_ONCE(*map_entry);
> >> +	old_xs = rcu_dereference_check(*map_entry, rcu_read_lock_bh_held());
> > Is it actually protected by the m->lock at this point?
> 
> True, can just add that to the check.
this should be enough
rcu_dereference_protected(*map_entry, lockdep_is_held(&m->lock));


* Re: [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries
  2021-06-10 23:32       ` Martin KaFai Lau
@ 2021-06-10 23:41         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 39+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-10 23:41 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: bpf, netdev, Hangbin Liu, Jesper Dangaard Brouer,
	Magnus Karlsson, Paul E . McKenney

Martin KaFai Lau <kafai@fb.com> writes:

> On Fri, Jun 11, 2021 at 01:19:04AM +0200, Toke Høiland-Jørgensen wrote:
>> >> @@ -149,7 +152,8 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>> >>  			       u64 map_flags)
>> >>  {
>> >>  	struct xsk_map *m = container_of(map, struct xsk_map, map);
>> >> -	struct xdp_sock *xs, *old_xs, **map_entry;
>> >> +	struct xdp_sock __rcu **map_entry;
>> >> +	struct xdp_sock *xs, *old_xs;
>> >>  	u32 i = *(u32 *)key, fd = *(u32 *)value;
>> >>  	struct xsk_map_node *node;
>> >>  	struct socket *sock;
>> >> @@ -179,7 +183,7 @@ static int xsk_map_update_elem(struct bpf_map *map, void *key, void *value,
>> >>  	}
>> >>  
>> >>  	spin_lock_bh(&m->lock);
>> >> -	old_xs = READ_ONCE(*map_entry);
>> >> +	old_xs = rcu_dereference_check(*map_entry, rcu_read_lock_bh_held());
>> > Is it actually protected by the m->lock at this point?
>> 
>> True, can just add that to the check.
> this should be enough
> rcu_dereference_protected(*map_entry, lockdep_is_held(&m->lock));

Right, that's what I had in mind as well :)

-Toke



* Re: [PATCH bpf-next 12/17] nfp: remove rcu_read_lock() around XDP program invocation
  2021-06-09 10:33 ` [PATCH bpf-next 12/17] nfp: " Toke Høiland-Jørgensen
@ 2021-06-11 16:30   ` Simon Horman
  0 siblings, 0 replies; 39+ messages in thread
From: Simon Horman @ 2021-06-11 16:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, netdev, Martin KaFai Lau, Hangbin Liu,
	Jesper Dangaard Brouer, Magnus Karlsson, Paul E . McKenney,
	oss-drivers, Jakub Kicinski

+Jakub

On Wed, Jun 09, 2021 at 12:33:21PM +0200, Toke Høiland-Jørgensen wrote:
> The nfp driver has rcu_read_lock()/rcu_read_unlock() pairs around XDP
> program invocations. However, the actual lifetime of the objects referred
> by the XDP program invocation is longer, all the way through to the call to
> xdp_do_flush(), making the scope of the rcu_read_lock() too small.
> 
> While this is not actually an issue for the nfp driver because it doesn't
> support XDP_REDIRECT (and thus doesn't call xdp_do_flush()), the
> rcu_read_lock() is still unneeded. And with the addition of RCU annotations
> to the XDP_REDIRECT map types that take bh execution into account, lockdep
> even understands this to be safe, so there's really no reason to keep it
> around.
> 
> Cc: Simon Horman <simon.horman@netronome.com>
> Cc: oss-drivers@netronome.com
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Reviewed-by: Simon Horman <simon.horman@netronome.com>

> ---
>  drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> index eeb30680b4dc..5dfa4799c34f 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
> @@ -1819,7 +1819,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>  	struct xdp_buff xdp;
>  	int idx;
>  
> -	rcu_read_lock();
>  	xdp_prog = READ_ONCE(dp->xdp_prog);
>  	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
>  	xdp_init_buff(&xdp, PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM,
> @@ -2036,7 +2035,6 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
>  			if (!nfp_net_xdp_complete(tx_ring))
>  				pkts_polled = budget;
>  	}
> -	rcu_read_unlock();
>  
>  	return pkts_polled;
>  }
> -- 
> 2.31.1
> 


Thread overview: 39+ messages
2021-06-09 10:33 [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 01/17] rcu: Create an unrcu_pointer() to remove __rcu from a pointer Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context Toke Høiland-Jørgensen
2021-06-10 18:38   ` Alexei Starovoitov
2021-06-10 21:24     ` Daniel Borkmann
2021-06-10 22:27       ` Toke Høiland-Jørgensen
2021-06-10 19:33   ` Martin KaFai Lau
2021-06-09 10:33 ` [PATCH bpf-next 03/17] dev: add rcu_read_lock_bh_held() as a valid check when getting a RCU dev ref Toke Høiland-Jørgensen
2021-06-10 19:37   ` Martin KaFai Lau
2021-06-10 23:05     ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 04/17] xdp: add proper __rcu annotations to redirect map entries Toke Høiland-Jørgensen
2021-06-10 21:09   ` Martin KaFai Lau
2021-06-10 23:19     ` Toke Høiland-Jørgensen
2021-06-10 23:32       ` Martin KaFai Lau
2021-06-10 23:41         ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 05/17] ena: remove rcu_read_lock() around XDP program invocation Toke Høiland-Jørgensen
2021-06-09 13:57   ` Paul E. McKenney
2021-06-09 10:33 ` [PATCH bpf-next 06/17] bnxt: " Toke Høiland-Jørgensen
2021-06-09 13:58   ` Paul E. McKenney
2021-06-10  8:47     ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 07/17] thunderx: " Toke Høiland-Jørgensen
2021-06-09 10:33   ` Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 08/17] freescale: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 09/17] net: intel: " Toke Høiland-Jørgensen
2021-06-09 10:33   ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 10/17] marvell: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 11/17] mlx4: " Toke Høiland-Jørgensen
2021-06-10  7:10   ` Tariq Toukan
2021-06-09 10:33 ` [PATCH bpf-next 12/17] nfp: " Toke Høiland-Jørgensen
2021-06-11 16:30   ` Simon Horman
2021-06-09 10:33 ` [PATCH bpf-next 13/17] qede: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 14/17] sfc: " Toke Høiland-Jørgensen
2021-06-09 12:15   ` Edward Cree
2021-06-09 10:33 ` [PATCH bpf-next 15/17] netsec: " Toke Høiland-Jørgensen
2021-06-10  5:30   ` Ilias Apalodimas
2021-06-09 10:33 ` [PATCH bpf-next 16/17] stmmac: " Toke Høiland-Jørgensen
2021-06-09 10:33 ` [PATCH bpf-next 17/17] net: ti: " Toke Høiland-Jørgensen
2021-06-09 17:04   ` Grygorii Strashko
2021-06-10  0:18 ` [PATCH bpf-next 00/17] Clean up and document RCU-based object protection for XDP_REDIRECT Yonghong Song
