* [PATCH bpf-next v2 1/3] bpf: Disable preemption when increasing per-cpu map_locked
@ 2022-08-27 10:01 Hou Tao
From: Hou Tao @ 2022-08-27 10:01 UTC (permalink / raw)
To: bpf, Song Liu, Hao Luo
Cc: Hao Sun, Sebastian Andrzej Siewior, Andrii Nakryiko,
Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, KP Singh, David S . Miller, Jakub Kicinski,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
houtao1
From: Hou Tao <houtao1@huawei.com>
Per-cpu htab->map_locked is used to prohibit the concurrent accesses
from both NMI and non-NMI contexts. But since commit 74d862b682f5
("sched: Make migrate_disable/enable() independent of RT"),
migrations_disable() is also preemptible under CONFIG_PREEMPT case,
so now map_locked also disallows concurrent updates from normal contexts
(e.g. userspace processes) unexpectedly as shown below:
process A                        process B

htab_map_update_elem()
  htab_lock_bucket()
    migrate_disable()
    /* return 1 */
    __this_cpu_inc_return()
    /* preempted by B */

                                 htab_map_update_elem()
                                 /* the same bucket as A */
                                   htab_lock_bucket()
                                     migrate_disable()
                                     /* return 2, so lock fails */
                                     __this_cpu_inc_return()
                                     return -EBUSY
A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
only checking the value of map_locked for nmi context. But it will
re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
through non-tracing program (e.g. fentry program).
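A rough sketch of that rejected approach (hypothetical code, not taken from
any posted patch) and of the deadlock it would reintroduce:

static inline int htab_lock_bucket_nmi_only(const struct bpf_htab *htab,
					    struct bucket *b, u32 hash,
					    unsigned long *pflags)
{
	unsigned long flags;

	hash = hash & HASHTAB_MAP_LOCK_MASK;
	migrate_disable();
	/* only treat a contended map_locked as fatal in NMI context */
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1) &&
	    in_nmi()) {
		__this_cpu_dec(*(htab->map_locked[hash]));
		migrate_enable();
		return -EBUSY;
	}
	raw_spin_lock_irqsave(&b->raw_lock, flags);
	/*
	 * A non-NMI re-entry (e.g. an fentry program attached to a function
	 * called while the bucket lock is held) is not rejected above, so it
	 * would spin here on a raw_lock already owned by this CPU: deadlock.
	 */
	*pflags = flags;
	return 0;
}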
So fixing it by using disable_preempt() instead of migrate_disable() when
increasing htab->map_locked. However when htab_use_raw_lock() is false,
bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
so still use migrate_disable() for spin-lock case and leave the
concurrent map updates problem to BPF memory allocator patchset in which
!htab_use_raw_lock() case will be removed.
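For context (this helper lives in kernel/bpf/hashtab.c but is outside the
diff below), the decision between the raw and the sleepable bucket lock is
roughly the following; the exact body may differ slightly between kernel
versions:

static inline bool htab_use_raw_lock(const struct bpf_htab *htab)
{
	/* PREEMPT_RT can only use the raw spinlock when the htab is
	 * preallocated, i.e. no allocation can happen under the lock.
	 */
	return !IS_ENABLED(CONFIG_PREEMPT_RT) || htab_is_prealloc(htab);
}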
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index b301a63afa2f..6fb3b7fd1622 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
unsigned long *pflags)
{
unsigned long flags;
+ bool use_raw_lock;
hash = hash & HASHTAB_MAP_LOCK_MASK;
- migrate_disable();
+ use_raw_lock = htab_use_raw_lock(htab);
+ if (use_raw_lock)
+ preempt_disable();
+ else
+ migrate_disable();
if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
__this_cpu_dec(*(htab->map_locked[hash]));
- migrate_enable();
+ if (use_raw_lock)
+ preempt_enable();
+ else
+ migrate_enable();
return -EBUSY;
}
- if (htab_use_raw_lock(htab))
+ if (use_raw_lock)
raw_spin_lock_irqsave(&b->raw_lock, flags);
else
spin_lock_irqsave(&b->lock, flags);
@@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
struct bucket *b, u32 hash,
unsigned long flags)
{
+ bool use_raw_lock = htab_use_raw_lock(htab);
+
hash = hash & HASHTAB_MAP_LOCK_MASK;
- if (htab_use_raw_lock(htab))
+ if (use_raw_lock)
raw_spin_unlock_irqrestore(&b->raw_lock, flags);
else
spin_unlock_irqrestore(&b->lock, flags);
__this_cpu_dec(*(htab->map_locked[hash]));
- migrate_enable();
+ if (use_raw_lock)
+ preempt_enable();
+ else
+ migrate_enable();
}
static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
--
2.29.2
* [PATCH bpf-next v2 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: Hou Tao @ 2022-08-27 10:01 UTC (permalink / raw)
To: bpf, Song Liu, Hao Luo
Cc: Hao Sun, Sebastian Andrzej Siewior, Andrii Nakryiko,
Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, KP Singh, David S . Miller, Jakub Kicinski,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
houtao1
From: Hou Tao <houtao1@huawei.com>
In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
-EBUSY, it will go to next bucket. Going to next bucket may not only
skip the elements in current bucket silently, but also incur
out-of-bound memory access or expose kernel memory to userspace if
current bucket_cnt is greater than bucket_size or zero.
Fixing it by stopping batch operation and returning -EBUSY when
htab_lock_bucket() fails, and the application can retry or skip the busy
batch as needed.
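A minimal user-space sketch of the retry-on-EBUSY pattern this enables
(illustrative only: the function and buffer names are made up, and it
assumes a libbpf version whose batch helpers return negative error codes
directly):

#include <errno.h>
#include <bpf/bpf.h>

/* Walk and drain a __u32 -> __u32 hash map with the batch API, retrying
 * a batch that fails with -EBUSY instead of silently losing its bucket.
 */
static int drain_htab(int fd, __u32 *keys, __u32 *values, __u32 batch_sz)
{
	__u32 in_batch = 0, out_batch = 0, count;
	void *in = NULL;	/* NULL means "start from the first batch" */
	int err;

	for (;;) {
		count = batch_sz;
		err = bpf_map_lookup_and_delete_batch(fd, in, &out_batch,
						      keys, values, &count,
						      NULL);
		if (err == -EBUSY)
			continue;		/* busy bucket: retry this batch */
		if (err && err != -ENOENT)
			return err;		/* hard failure */

		/* ... consume 'count' key/value pairs here ... */

		if (err == -ENOENT)
			return 0;		/* whole map visited */
		in_batch = out_batch;
		in = &in_batch;
	}
}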
Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
kernel/bpf/hashtab.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 6fb3b7fd1622..eb1263f03e9b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
/* do not grab the lock unless need it (bucket_cnt > 0). */
if (locked) {
ret = htab_lock_bucket(htab, b, batch, &flags);
- if (ret)
- goto next_batch;
+ if (ret) {
+ rcu_read_unlock();
+ bpf_enable_instrumentation();
+ goto after_loop;
+ }
}
bucket_cnt = 0;
--
2.29.2
* [PATCH bpf-next v2 3/3] selftests/bpf: add test cases for htab update
From: Hou Tao @ 2022-08-27 10:01 UTC (permalink / raw)
To: bpf, Song Liu, Hao Luo
Cc: Hao Sun, Sebastian Andrzej Siewior, Andrii Nakryiko,
Yonghong Song, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, KP Singh, David S . Miller, Jakub Kicinski,
Stanislav Fomichev, Jiri Olsa, John Fastabend, Lorenz Bauer,
houtao1
From: Hou Tao <houtao1@huawei.com>
One test demonstrates the reentrancy of hash map update fails, and
another one shows concureently updates of the same hash map bucket
succeed.
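For readers of the tests below, the re-entry path that the first subtest
relies on is roughly the following (reconstructed from the kernel sources,
not spelled out in the original posting):

bpf_map_update_elem()                    /* syscall from user space */
  htab_map_update_elem()
    htab_lock_bucket()                   /* map_locked[hash] becomes 1 */
    lookup_elem_raw()
      fentry program lookup_elem_raw()   /* the test BPF program below */
        bpf_map_update_elem()            /* helper, same map and bucket */
          htab_map_update_elem()
            htab_lock_bucket()           /* map_locked[hash] would become 2,
                                            so it returns -EBUSY, which the
                                            program records in update_err */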
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../selftests/bpf/prog_tests/htab_update.c | 126 ++++++++++++++++++
.../testing/selftests/bpf/progs/htab_update.c | 29 ++++
2 files changed, 155 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c
diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
new file mode 100644
index 000000000000..e2a4034daa79
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdbool.h>
+#include <test_progs.h>
+#include "htab_update.skel.h"
+
+struct htab_update_ctx {
+ int fd;
+ int loop;
+ bool stop;
+};
+
+static void test_reenter_update(void)
+{
+ struct htab_update *skel;
+ unsigned int key, value;
+ int err;
+
+ skel = htab_update__open();
+ if (!ASSERT_OK_PTR(skel, "htab_update__open"))
+ return;
+
+ /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
+ bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
+ err = htab_update__load(skel);
+ if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
+ goto out;
+
+ skel->bss->pid = getpid();
+ err = htab_update__attach(skel);
+ if (!ASSERT_OK(err, "htab_update__attach"))
+ goto out;
+
+ /* Will trigger the reentrancy of bpf_map_update_elem() */
+ key = 0;
+ value = 0;
+ err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
+ if (!ASSERT_OK(err, "add element"))
+ goto out;
+
+ ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
+out:
+ htab_update__destroy(skel);
+}
+
+static void *htab_update_thread(void *arg)
+{
+ struct htab_update_ctx *ctx = arg;
+ cpu_set_t cpus;
+ int i;
+
+ /* Pin on CPU 0 */
+ CPU_ZERO(&cpus);
+ CPU_SET(0, &cpus);
+ pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
+
+ i = 0;
+ while (i++ < ctx->loop && !ctx->stop) {
+ unsigned int key = 0, value = 0;
+ int err;
+
+ err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
+ if (err) {
+ ctx->stop = true;
+ return (void *)(long)err;
+ }
+ }
+
+ return NULL;
+}
+
+static void test_concurrent_update(void)
+{
+ struct htab_update_ctx ctx;
+ struct htab_update *skel;
+ unsigned int i, nr;
+ pthread_t *tids;
+ int err;
+
+ skel = htab_update__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
+ return;
+
+ ctx.fd = bpf_map__fd(skel->maps.htab);
+ ctx.loop = 1000;
+ ctx.stop = false;
+
+ nr = 4;
+ tids = calloc(nr, sizeof(*tids));
+ if (!ASSERT_NEQ(tids, NULL, "no mem"))
+ goto out;
+
+ for (i = 0; i < nr; i++) {
+ err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
+ if (!ASSERT_OK(err, "pthread_create")) {
+ unsigned int j;
+
+ ctx.stop = true;
+ for (j = 0; j < i; j++)
+ pthread_join(tids[j], NULL);
+ goto out;
+ }
+ }
+
+ for (i = 0; i < nr; i++) {
+ void *thread_err = NULL;
+
+ pthread_join(tids[i], &thread_err);
+ ASSERT_EQ(thread_err, NULL, "update error");
+ }
+
+out:
+ if (tids)
+ free(tids);
+ htab_update__destroy(skel);
+}
+
+void test_htab_update(void)
+{
+ if (test__start_subtest("reenter_update"))
+ test_reenter_update();
+ if (test__start_subtest("concurrent_update"))
+ test_concurrent_update();
+}
diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
new file mode 100644
index 000000000000..7481bb30b29b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/htab_update.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+ __uint(type, BPF_MAP_TYPE_HASH);
+ __uint(max_entries, 1);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(__u32));
+} htab SEC(".maps");
+
+int pid = 0;
+int update_err = 0;
+
+SEC("?fentry/lookup_elem_raw")
+int lookup_elem_raw(void *ctx)
+{
+ __u32 key = 0, value = 1;
+
+ if ((bpf_get_current_pid_tgid() >> 32) != pid)
+ return 0;
+
+ update_err = bpf_map_update_elem(&htab, &key, &value, 0);
+ return 0;
+}
--
2.29.2
* Re: [PATCH bpf-next v2 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: KP Singh @ 2022-08-28 0:24 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
> -EBUSY, it will go to next bucket. Going to next bucket may not only
> skip the elements in current bucket silently, but also incur
> out-of-bound memory access or expose kernel memory to userspace if
> current bucket_cnt is greater than bucket_size or zero.
>
> Fixing it by stopping batch operation and returning -EBUSY when
> htab_lock_bucket() fails, and the application can retry or skip the busy
> batch as needed.
>
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
Please add a Fixes tag here
> ---
> kernel/bpf/hashtab.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6fb3b7fd1622..eb1263f03e9b 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
> /* do not grab the lock unless need it (bucket_cnt > 0). */
> if (locked) {
> ret = htab_lock_bucket(htab, b, batch, &flags);
> - if (ret)
> - goto next_batch;
> + if (ret) {
> + rcu_read_unlock();
> + bpf_enable_instrumentation();
> + goto after_loop;
> + }
> }
>
> bucket_cnt = 0;
> --
> 2.29.2
>
* Re: [PATCH bpf-next v2 3/3] selftests/bpf: add test cases for htab update
From: KP Singh @ 2022-08-28 22:39 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> One test demonstrates the reentrancy of hash map update fails, and
> another one shows concureently updates of the same hash map bucket
"concurrent updates of the same hashmap"
This is just a description of what the test does but not why?
What's the expected behaviour? Was it broken?
I think your whole series will benefit from a cover letter to explain
the stuff you are fixing.
> succeed.
>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> .../selftests/bpf/prog_tests/htab_update.c | 126 ++++++++++++++++++
> .../testing/selftests/bpf/progs/htab_update.c | 29 ++++
> 2 files changed, 155 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
> create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
> new file mode 100644
> index 000000000000..e2a4034daa79
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
> @@ -0,0 +1,126 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
> +#define _GNU_SOURCE
> +#include <sched.h>
> +#include <stdbool.h>
> +#include <test_progs.h>
> +#include "htab_update.skel.h"
> +
> +struct htab_update_ctx {
> + int fd;
> + int loop;
> + bool stop;
> +};
> +
> +static void test_reenter_update(void)
> +{
> + struct htab_update *skel;
> + unsigned int key, value;
> + int err;
> +
> + skel = htab_update__open();
> + if (!ASSERT_OK_PTR(skel, "htab_update__open"))
> + return;
> +
> + /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
> + bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
> + err = htab_update__load(skel);
> + if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
> + goto out;
> +
> + skel->bss->pid = getpid();
> + err = htab_update__attach(skel);
> + if (!ASSERT_OK(err, "htab_update__attach"))
> + goto out;
> +
> + /* Will trigger the reentrancy of bpf_map_update_elem() */
> + key = 0;
> + value = 0;
nit: just move these initializations to the top.
> + err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
> + if (!ASSERT_OK(err, "add element"))
> + goto out;
> +
> + ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
> +out:
> + htab_update__destroy(skel);
> +}
> +
> +static void *htab_update_thread(void *arg)
> +{
> + struct htab_update_ctx *ctx = arg;
> + cpu_set_t cpus;
> + int i;
> +
> + /* Pin on CPU 0 */
> + CPU_ZERO(&cpus);
> + CPU_SET(0, &cpus);
> + pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
> +
> + i = 0;
> + while (i++ < ctx->loop && !ctx->stop) {
nit: for loop?
> + unsigned int key = 0, value = 0;
> + int err;
> +
> + err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
> + if (err) {
> + ctx->stop = true;
> + return (void *)(long)err;
> + }
> + }
> +
> + return NULL;
> +}
> +
> +static void test_concurrent_update(void)
> +{
> + struct htab_update_ctx ctx;
> + struct htab_update *skel;
> + unsigned int i, nr;
> + pthread_t *tids;
> + int err;
> +
> + skel = htab_update__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
> + return;
> +
> + ctx.fd = bpf_map__fd(skel->maps.htab);
> + ctx.loop = 1000;
> + ctx.stop = false;
> +
> + nr = 4;
> + tids = calloc(nr, sizeof(*tids));
> + if (!ASSERT_NEQ(tids, NULL, "no mem"))
> + goto out;
> +
> + for (i = 0; i < nr; i++) {
> + err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
> + if (!ASSERT_OK(err, "pthread_create")) {
> + unsigned int j;
> +
> + ctx.stop = true;
> + for (j = 0; j < i; j++)
> + pthread_join(tids[j], NULL);
> + goto out;
> + }
> + }
> +
> + for (i = 0; i < nr; i++) {
> + void *thread_err = NULL;
> +
> + pthread_join(tids[i], &thread_err);
> + ASSERT_EQ(thread_err, NULL, "update error");
> + }
> +
> +out:
> + if (tids)
> + free(tids);
> + htab_update__destroy(skel);
> +}
> +
> +void test_htab_update(void)
> +{
> + if (test__start_subtest("reenter_update"))
> + test_reenter_update();
> + if (test__start_subtest("concurrent_update"))
> + test_concurrent_update();
> +}
> diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
> new file mode 100644
> index 000000000000..7481bb30b29b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/htab_update.c
> @@ -0,0 +1,29 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +char _license[] SEC("license") = "GPL";
> +
> +struct {
> + __uint(type, BPF_MAP_TYPE_HASH);
> + __uint(max_entries, 1);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(__u32));
> +} htab SEC(".maps");
> +
> +int pid = 0;
> +int update_err = 0;
> +
> +SEC("?fentry/lookup_elem_raw")
> +int lookup_elem_raw(void *ctx)
> +{
> + __u32 key = 0, value = 1;
> +
> + if ((bpf_get_current_pid_tgid() >> 32) != pid)
> + return 0;
> +
> + update_err = bpf_map_update_elem(&htab, &key, &value, 0);
> + return 0;
> +}
> --
> 2.29.2
>
* Re: [PATCH bpf-next v2 1/3] bpf: Disable preemption when increasing per-cpu map_locked
From: KP Singh @ 2022-08-28 22:39 UTC (permalink / raw)
To: Hou Tao
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
> From: Hou Tao <houtao1@huawei.com>
>
> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
> from both NMI and non-NMI contexts. But since commit 74d862b682f5
> ("sched: Make migrate_disable/enable() independent of RT"),
> migrations_disable() is also preemptible under CONFIG_PREEMPT case,
nit: migrate_disable
> so now map_locked also disallows concurrent updates from normal contexts
> (e.g. userspace processes) unexpectedly as shown below:
>
> process A process B
>
> htab_map_update_elem()
> htab_lock_bucket()
> migrate_disable()
> /* return 1 */
> __this_cpu_inc_return()
> /* preempted by B */
>
> htab_map_update_elem()
> /* the same bucket as A */
> htab_lock_bucket()
> migrate_disable()
> /* return 2, so lock fails */
> __this_cpu_inc_return()
> return -EBUSY
>
> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
> only checking the value of map_locked for nmi context. But it will
> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
> through non-tracing program (e.g. fentry program).
>
> So fixing it by using disable_preempt() instead of migrate_disable() when
> increasing htab->map_locked. However when htab_use_raw_lock() is false,
> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
> so still use migrate_disable() for spin-lock case and leave the
> concurrent map updates problem to BPF memory allocator patchset in which
> !htab_use_raw_lock() case will be removed.
I think the description needs a bit more clarity, and I think you mean
preempt_disable() instead of disable_preempt().
Suggestion:
One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
being false causes the bucket lock to be a spin lock which can sleep and
does not work with preempt_disable().
Therefore, use migrate_disable() when using the spinlock instead of
preempt_disable() and defer fixing concurrent updates to when the kernel
has its own BPF memory allocator.
>
> Reviewed-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
Fixes tag here please?
> ---
> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index b301a63afa2f..6fb3b7fd1622 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
> unsigned long *pflags)
> {
> unsigned long flags;
> + bool use_raw_lock;
>
> hash = hash & HASHTAB_MAP_LOCK_MASK;
>
> - migrate_disable();
> + use_raw_lock = htab_use_raw_lock(htab);
> + if (use_raw_lock)
> + preempt_disable();
> + else
> + migrate_disable();
> if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
> __this_cpu_dec(*(htab->map_locked[hash]));
> - migrate_enable();
> + if (use_raw_lock)
> + preempt_enable();
> + else
> + migrate_enable();
> return -EBUSY;
> }
>
> - if (htab_use_raw_lock(htab))
> + if (use_raw_lock)
> raw_spin_lock_irqsave(&b->raw_lock, flags);
> else
> spin_lock_irqsave(&b->lock, flags);
> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
> struct bucket *b, u32 hash,
> unsigned long flags)
> {
> + bool use_raw_lock = htab_use_raw_lock(htab);
> +
> hash = hash & HASHTAB_MAP_LOCK_MASK;
> - if (htab_use_raw_lock(htab))
> + if (use_raw_lock)
> raw_spin_unlock_irqrestore(&b->raw_lock, flags);
> else
> spin_unlock_irqrestore(&b->lock, flags);
> __this_cpu_dec(*(htab->map_locked[hash]));
> - migrate_enable();
> + if (use_raw_lock)
> + preempt_enable();
> + else
> + migrate_enable();
> }
>
> static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
> --
> 2.29.2
>
* Re: [PATCH bpf-next v2 3/3] selftests/bpf: add test cases for htab update
From: Hou Tao @ 2022-08-29 1:12 UTC (permalink / raw)
To: KP Singh
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
Hi,
On 8/29/2022 6:39 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> One test demonstrates the reentrancy of hash map update fails, and
>> another one shows concureently updates of the same hash map bucket
> "concurrent updates of the same hashmap"
>
> This is just a description of what the test does but not why?
Will elaborate the description.
>
> What's the expected behaviour? Was it broken?
>
> I think your whole series will benefit from a cover letter to explain
> the stuff you are fixing.
Yes. There is a patch 0 for v2, but I forgot to send it out.
>
>> succeed.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
>> ---
>> .../selftests/bpf/prog_tests/htab_update.c | 126 ++++++++++++++++++
>> .../testing/selftests/bpf/progs/htab_update.c | 29 ++++
>> 2 files changed, 155 insertions(+)
>> create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_update.c
>> create mode 100644 tools/testing/selftests/bpf/progs/htab_update.c
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/htab_update.c b/tools/testing/selftests/bpf/prog_tests/htab_update.c
>> new file mode 100644
>> index 000000000000..e2a4034daa79
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/prog_tests/htab_update.c
>> @@ -0,0 +1,126 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
>> +#define _GNU_SOURCE
>> +#include <sched.h>
>> +#include <stdbool.h>
>> +#include <test_progs.h>
>> +#include "htab_update.skel.h"
>> +
>> +struct htab_update_ctx {
>> + int fd;
>> + int loop;
>> + bool stop;
>> +};
>> +
>> +static void test_reenter_update(void)
>> +{
>> + struct htab_update *skel;
>> + unsigned int key, value;
>> + int err;
>> +
>> + skel = htab_update__open();
>> + if (!ASSERT_OK_PTR(skel, "htab_update__open"))
>> + return;
>> +
>> + /* lookup_elem_raw() may be inlined and find_kernel_btf_id() will return -ESRCH */
>> + bpf_program__set_autoload(skel->progs.lookup_elem_raw, true);
>> + err = htab_update__load(skel);
>> + if (!ASSERT_TRUE(!err || err == -ESRCH, "htab_update__load") || err)
>> + goto out;
>> +
>> + skel->bss->pid = getpid();
>> + err = htab_update__attach(skel);
>> + if (!ASSERT_OK(err, "htab_update__attach"))
>> + goto out;
>> +
>> + /* Will trigger the reentrancy of bpf_map_update_elem() */
>> + key = 0;
>> + value = 0;
> nit: just move these initializations to the top.
>
>> + err = bpf_map_update_elem(bpf_map__fd(skel->maps.htab), &key, &value, 0);
>> + if (!ASSERT_OK(err, "add element"))
>> + goto out;
>> +
>> + ASSERT_EQ(skel->bss->update_err, -EBUSY, "no reentrancy");
>> +out:
>> + htab_update__destroy(skel);
>> +}
>> +
>> +static void *htab_update_thread(void *arg)
>> +{
>> + struct htab_update_ctx *ctx = arg;
>> + cpu_set_t cpus;
>> + int i;
>> +
>> + /* Pin on CPU 0 */
>> + CPU_ZERO(&cpus);
>> + CPU_SET(0, &cpus);
>> + pthread_setaffinity_np(pthread_self(), sizeof(cpus), &cpus);
>> +
>> + i = 0;
>> + while (i++ < ctx->loop && !ctx->stop) {
> nit: for loop?
>
>
>> + unsigned int key = 0, value = 0;
>> + int err;
>> +
>> + err = bpf_map_update_elem(ctx->fd, &key, &value, 0);
>> + if (err) {
>> + ctx->stop = true;
>> + return (void *)(long)err;
>> + }
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static void test_concurrent_update(void)
>> +{
>> + struct htab_update_ctx ctx;
>> + struct htab_update *skel;
>> + unsigned int i, nr;
>> + pthread_t *tids;
>> + int err;
>> +
>> + skel = htab_update__open_and_load();
>> + if (!ASSERT_OK_PTR(skel, "htab_update__open_and_load"))
>> + return;
>> +
>> + ctx.fd = bpf_map__fd(skel->maps.htab);
>> + ctx.loop = 1000;
>> + ctx.stop = false;
>> +
>> + nr = 4;
>> + tids = calloc(nr, sizeof(*tids));
>> + if (!ASSERT_NEQ(tids, NULL, "no mem"))
>> + goto out;
>> +
>> + for (i = 0; i < nr; i++) {
>> + err = pthread_create(&tids[i], NULL, htab_update_thread, &ctx);
>> + if (!ASSERT_OK(err, "pthread_create")) {
>> + unsigned int j;
>> +
>> + ctx.stop = true;
>> + for (j = 0; j < i; j++)
>> + pthread_join(tids[j], NULL);
>> + goto out;
>> + }
>> + }
>> +
>> + for (i = 0; i < nr; i++) {
>> + void *thread_err = NULL;
>> +
>> + pthread_join(tids[i], &thread_err);
>> + ASSERT_EQ(thread_err, NULL, "update error");
>> + }
>> +
>> +out:
>> + if (tids)
>> + free(tids);
>> + htab_update__destroy(skel);
>> +}
>> +
>> +void test_htab_update(void)
>> +{
>> + if (test__start_subtest("reenter_update"))
>> + test_reenter_update();
>> + if (test__start_subtest("concurrent_update"))
>> + test_concurrent_update();
>> +}
>> diff --git a/tools/testing/selftests/bpf/progs/htab_update.c b/tools/testing/selftests/bpf/progs/htab_update.c
>> new file mode 100644
>> index 000000000000..7481bb30b29b
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/htab_update.c
>> @@ -0,0 +1,29 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/* Copyright (C) 2022. Huawei Technologies Co., Ltd */
>> +#include <linux/bpf.h>
>> +#include <bpf/bpf_helpers.h>
>> +#include <bpf/bpf_tracing.h>
>> +
>> +char _license[] SEC("license") = "GPL";
>> +
>> +struct {
>> + __uint(type, BPF_MAP_TYPE_HASH);
>> + __uint(max_entries, 1);
>> + __uint(key_size, sizeof(__u32));
>> + __uint(value_size, sizeof(__u32));
>> +} htab SEC(".maps");
>> +
>> +int pid = 0;
>> +int update_err = 0;
>> +
>> +SEC("?fentry/lookup_elem_raw")
>> +int lookup_elem_raw(void *ctx)
>> +{
>> + __u32 key = 0, value = 1;
>> +
>> + if ((bpf_get_current_pid_tgid() >> 32) != pid)
>> + return 0;
>> +
>> + update_err = bpf_map_update_elem(&htab, &key, &value, 0);
>> + return 0;
>> +}
>> --
>> 2.29.2
>>
> .
* Re: [PATCH bpf-next v2 1/3] bpf: Disable preemption when increasing per-cpu map_locked
From: Hou Tao @ 2022-08-29 1:16 UTC (permalink / raw)
To: KP Singh
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
On 8/29/2022 6:39 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
>> from both NMI and non-NMI contexts. But since commit 74d862b682f5
>> ("sched: Make migrate_disable/enable() independent of RT"),
>> migrations_disable() is also preemptible under CONFIG_PREEMPT case,
> nit: migrate_disable
Will fix in v3.
>
>> so now map_locked also disallows concurrent updates from normal contexts
>> (e.g. userspace processes) unexpectedly as shown below:
>>
>> process A process B
>>
>> htab_map_update_elem()
>> htab_lock_bucket()
>> migrate_disable()
>> /* return 1 */
>> __this_cpu_inc_return()
>> /* preempted by B */
>>
>> htab_map_update_elem()
>> /* the same bucket as A */
>> htab_lock_bucket()
>> migrate_disable()
>> /* return 2, so lock fails */
>> __this_cpu_inc_return()
>> return -EBUSY
>>
>> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
>> only checking the value of map_locked for nmi context. But it will
>> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
>> through non-tracing program (e.g. fentry program).
>>
>> So fixing it by using disable_preempt() instead of migrate_disable() when
>> increasing htab->map_locked. However when htab_use_raw_lock() is false,
>> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
>> so still use migrate_disable() for spin-lock case and leave the
>> concurrent map updates problem to BPF memory allocator patchset in which
>> !htab_use_raw_lock() case will be removed.
> I think the description needs a bit more clarity and I think you mean
> preempt_disable() instead of disable_preempt
>
> Suggestion:
>
> One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
> being false causes the bucket lock to be a spin lock which can sleep and
> does not work with preempt_disable().
>
> Therefore, use migrate_disable() when using the spinlock instead of
> preempt_disable() and defer fixing concurrent updates to when the kernel
> has its own BPF memory allocator.
Will update the commit message in v3 and thanks for the suggestion.
>
>> Reviewed-by: Hao Luo <haoluo@google.com>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Fixes tag here please?
Will add "Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable() independent
of RT")" in v3.
>
>
>> ---
>> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>> 1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index b301a63afa2f..6fb3b7fd1622 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>> unsigned long *pflags)
>> {
>> unsigned long flags;
>> + bool use_raw_lock;
>>
>> hash = hash & HASHTAB_MAP_LOCK_MASK;
>>
>> - migrate_disable();
>> + use_raw_lock = htab_use_raw_lock(htab);
>> + if (use_raw_lock)
>> + preempt_disable();
>> + else
>> + migrate_disable();
>> if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>> __this_cpu_dec(*(htab->map_locked[hash]));
>> - migrate_enable();
>> + if (use_raw_lock)
>> + preempt_enable();
>> + else
>> + migrate_enable();
>> return -EBUSY;
>> }
>>
>> - if (htab_use_raw_lock(htab))
>> + if (use_raw_lock)
>> raw_spin_lock_irqsave(&b->raw_lock, flags);
>> else
>> spin_lock_irqsave(&b->lock, flags);
>> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>> struct bucket *b, u32 hash,
>> unsigned long flags)
>> {
>> + bool use_raw_lock = htab_use_raw_lock(htab);
>> +
>> hash = hash & HASHTAB_MAP_LOCK_MASK;
>> - if (htab_use_raw_lock(htab))
>> + if (use_raw_lock)
>> raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>> else
>> spin_unlock_irqrestore(&b->lock, flags);
>> __this_cpu_dec(*(htab->map_locked[hash]));
>> - migrate_enable();
>> + if (use_raw_lock)
>> + preempt_enable();
>> + else
>> + migrate_enable();
>> }
>>
>> static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
>> --
>> 2.29.2
>>
> .
* Re: [PATCH bpf-next v2 2/3] bpf: Propagate error from htab_lock_bucket() to userspace
From: Hou Tao @ 2022-08-29 1:19 UTC (permalink / raw)
To: KP Singh
Cc: bpf, Song Liu, Hao Luo, Hao Sun, Sebastian Andrzej Siewior,
Andrii Nakryiko, Yonghong Song, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, David S . Miller,
Jakub Kicinski, Stanislav Fomichev, Jiri Olsa, John Fastabend,
Lorenz Bauer, houtao1
Hi,
On 8/28/2022 8:24 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@huaweicloud.com> wrote:
>> From: Hou Tao <houtao1@huawei.com>
>>
>> In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
>> -EBUSY, it will go to next bucket. Going to next bucket may not only
>> skip the elements in current bucket silently, but also incur
>> out-of-bound memory access or expose kernel memory to userspace if
>> current bucket_cnt is greater than bucket_size or zero.
>>
>> Fixing it by stopping batch operation and returning -EBUSY when
>> htab_lock_bucket() fails, and the application can retry or skip the busy
>> batch as needed.
>>
>> Reported-by: Hao Sun <sunhao.th@gmail.com>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> Please add a Fixes tag here
Will add "Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")"
in v3.
>
>> ---
>> kernel/bpf/hashtab.c | 7 +++++--
>> 1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 6fb3b7fd1622..eb1263f03e9b 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -1704,8 +1704,11 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
>> /* do not grab the lock unless need it (bucket_cnt > 0). */
>> if (locked) {
>> ret = htab_lock_bucket(htab, b, batch, &flags);
>> - if (ret)
>> - goto next_batch;
>> + if (ret) {
>> + rcu_read_unlock();
>> + bpf_enable_instrumentation();
>> + goto after_loop;
>> + }
>> }
>>
>> bucket_cnt = 0;
>> --
>> 2.29.2
>>
> .