netdev.vger.kernel.org archive mirror
* [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock
@ 2019-01-30  4:04 Alexei Starovoitov
  2019-01-30  4:04 ` [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist Alexei Starovoitov
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team

In addition to the preempt_disable patch for socket filters
https://patchwork.ozlabs.org/patch/1032437/
the first three patches fix various lockdep false positives.
The last patch fixes a potential deadlock in stackmap access from
a tracing bpf prog and from syscall.

Alexei Starovoitov (3):
  bpf: fix lockdep false positive in percpu_freelist
  bpf: fix lockdep false positive in stackmap
  bpf: fix lockdep false positive in bpf_prog_register

Martin KaFai Lau (1):
  bpf: Fix syscall's stackmap lookup potential deadlock

 kernel/bpf/hashtab.c         |  4 ++--
 kernel/bpf/percpu_freelist.c | 41 +++++++++++++++++++++++++-----------
 kernel/bpf/percpu_freelist.h |  4 ++++
 kernel/bpf/stackmap.c        |  2 +-
 kernel/bpf/syscall.c         | 12 +++++++++--
 kernel/trace/bpf_trace.c     | 14 ++----------
 6 files changed, 48 insertions(+), 29 deletions(-)

-- 
2.20.0



* [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
  2019-01-30  4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
@ 2019-01-30  4:04 ` Alexei Starovoitov
  2019-01-30 10:21   ` Peter Zijlstra
  2019-01-30  4:04 ` [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap Alexei Starovoitov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team

Lockdep warns about false positive:
[   12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[   12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[   12.493275]  (&rq->lock){-.-.}
[   12.493276]
[   12.493276]
[   12.493276] and interrupts could create inverse lock ordering between them.
[   12.493276]
[   12.494435]
[   12.494435] other info that might help us debug this:
[   12.494979]  Possible interrupt unsafe locking scenario:
[   12.494979]
[   12.495518]        CPU0                    CPU1
[   12.495879]        ----                    ----
[   12.496243]   lock(&head->lock);
[   12.496502]                                local_irq_disable();
[   12.496969]                                lock(&rq->lock);
[   12.497431]                                lock(&head->lock);
[   12.497890]   <Interrupt>
[   12.498104]     lock(&rq->lock);
[   12.498368]
[   12.498368]  *** DEADLOCK ***
[   12.498368]
[   12.498837] 1 lock held by dd/276:
[   12.499110]  #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[   12.499747]
[   12.499747] the shortest dependencies between 2nd lock and 1st lock:
[   12.500389]  -> (&rq->lock){-.-.} {
[   12.500669]     IN-HARDIRQ-W at:
[   12.500934]                       _raw_spin_lock+0x2f/0x40
[   12.501373]                       scheduler_tick+0x4c/0xf0
[   12.501812]                       update_process_times+0x40/0x50
[   12.502294]                       tick_periodic+0x27/0xb0
[   12.502723]                       tick_handle_periodic+0x1f/0x60
[   12.503203]                       timer_interrupt+0x11/0x20
[   12.503651]                       __handle_irq_event_percpu+0x43/0x2c0
[   12.504167]                       handle_irq_event_percpu+0x20/0x50
[   12.504674]                       handle_irq_event+0x37/0x60
[   12.505139]                       handle_level_irq+0xa7/0x120
[   12.505601]                       handle_irq+0xa1/0x150
[   12.506018]                       do_IRQ+0x77/0x140
[   12.506411]                       ret_from_intr+0x0/0x1d
[   12.506834]                       _raw_spin_unlock_irqrestore+0x53/0x60
[   12.507362]                       __setup_irq+0x481/0x730
[   12.507789]                       setup_irq+0x49/0x80
[   12.508195]                       hpet_time_init+0x21/0x32
[   12.508644]                       x86_late_time_init+0xb/0x16
[   12.509106]                       start_kernel+0x390/0x42a
[   12.509554]                       secondary_startup_64+0xa4/0xb0
[   12.510034]     IN-SOFTIRQ-W at:
[   12.510305]                       _raw_spin_lock+0x2f/0x40
[   12.510772]                       try_to_wake_up+0x1c7/0x4e0
[   12.511220]                       swake_up_locked+0x20/0x40
[   12.511657]                       swake_up_one+0x1a/0x30
[   12.512070]                       rcu_process_callbacks+0xc5/0x650
[   12.512553]                       __do_softirq+0xe6/0x47b
[   12.512978]                       irq_exit+0xc3/0xd0
[   12.513372]                       smp_apic_timer_interrupt+0xa9/0x250
[   12.513876]                       apic_timer_interrupt+0xf/0x20
[   12.514343]                       default_idle+0x1c/0x170
[   12.514765]                       do_idle+0x199/0x240
[   12.515159]                       cpu_startup_entry+0x19/0x20
[   12.515614]                       start_kernel+0x422/0x42a
[   12.516045]                       secondary_startup_64+0xa4/0xb0
[   12.516521]     INITIAL USE at:
[   12.516774]                      _raw_spin_lock_irqsave+0x38/0x50
[   12.517258]                      rq_attach_root+0x16/0xd0
[   12.517685]                      sched_init+0x2f2/0x3eb
[   12.518096]                      start_kernel+0x1fb/0x42a
[   12.518525]                      secondary_startup_64+0xa4/0xb0
[   12.518986]   }
[   12.519132]   ... key      at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
[   12.519649]   ... acquired at:
[   12.519892]    pcpu_freelist_pop+0x7b/0xd0
[   12.520221]    bpf_get_stackid+0x1d2/0x4d0
[   12.520563]    ___bpf_prog_run+0x8b4/0x11a0
[   12.520887]
[   12.521008] -> (&head->lock){+...} {
[   12.521292]    HARDIRQ-ON-W at:
[   12.521539]                     _raw_spin_lock+0x2f/0x40
[   12.521950]                     pcpu_freelist_push+0x2a/0x40
[   12.522396]                     bpf_get_stackid+0x494/0x4d0
[   12.522828]                     ___bpf_prog_run+0x8b4/0x11a0
[   12.523296]    INITIAL USE at:
[   12.523537]                    _raw_spin_lock+0x2f/0x40
[   12.523944]                    pcpu_freelist_populate+0xc0/0x120
[   12.524417]                    htab_map_alloc+0x405/0x500
[   12.524835]                    __do_sys_bpf+0x1a3/0x1a90
[   12.525253]                    do_syscall_64+0x4a/0x180
[   12.525659]                    entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   12.526167]  }
[   12.526311]  ... key      at: [<ffffffff838f7668>] __key.13130+0x0/0x8
[   12.526812]  ... acquired at:
[   12.527047]    __lock_acquire+0x521/0x1350
[   12.527371]    lock_acquire+0x98/0x190
[   12.527680]    _raw_spin_lock+0x2f/0x40
[   12.527994]    pcpu_freelist_push+0x2a/0x40
[   12.528325]    bpf_get_stackid+0x494/0x4d0
[   12.528645]    ___bpf_prog_run+0x8b4/0x11a0
[   12.528970]
[   12.529092]
[   12.529092] stack backtrace:
[   12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[   12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   12.530750] Call Trace:
[   12.530948]  dump_stack+0x5f/0x8b
[   12.531248]  check_usage_backwards+0x10c/0x120
[   12.531598]  ? ___bpf_prog_run+0x8b4/0x11a0
[   12.531935]  ? mark_lock+0x382/0x560
[   12.532229]  mark_lock+0x382/0x560
[   12.532496]  ? print_shortest_lock_dependencies+0x180/0x180
[   12.532928]  __lock_acquire+0x521/0x1350
[   12.533271]  ? find_get_entry+0x17f/0x2e0
[   12.533586]  ? find_get_entry+0x19c/0x2e0
[   12.533902]  ? lock_acquire+0x98/0x190
[   12.534196]  lock_acquire+0x98/0x190
[   12.534482]  ? pcpu_freelist_push+0x2a/0x40
[   12.534810]  _raw_spin_lock+0x2f/0x40
[   12.535099]  ? pcpu_freelist_push+0x2a/0x40
[   12.535432]  pcpu_freelist_push+0x2a/0x40
[   12.535750]  bpf_get_stackid+0x494/0x4d0
[   12.536062]  ___bpf_prog_run+0x8b4/0x11a0

It has been explained that this is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot run if another bpf_prog is already
  running on the same cpu (ensured by the percpu bpf_prog_active
  counter; see the sketch below).
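
For reference, a simplified sketch of that guard as it appears in
trace_call_bpf() in kernel/trace/bpf_trace.c (error handling and the
actual prog-array run are elided here):

	unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
	{
		unsigned int ret;

		preempt_disable();
		if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
			/* another bpf prog is already running on this cpu;
			 * skip, to avoid reentering its locks
			 */
			ret = 0;
			goto out;
		}
		/* ... ret = run the attached progs under rcu_read_lock() ... */
	out:
		__this_cpu_dec(bpf_prog_active);
		preempt_enable();
		return ret;
	}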

Eric pointed out that this lockdep splat stops other
legit lockdep splats in selftests/bpf/test_progs.c.

Fix this by calling local_irq_save/restore for stackmap.

Another false positive had also been worked around by calling
local_irq_save in commit 89ad2fa3f043 ("bpf: fix lockdep splat").
That commit added unnecessary irq_save/restore to the fast path of
the bpf hash map. irqs are already disabled at that point, since htab
holds a per-bucket spin_lock taken with irqsave.

Let's reduce the overhead for htab by introducing __pcpu_freelist_push/pop
functions w/o irqsave and converting pcpu_freelist_push/pop to irqsave
variants to be used elsewhere (right now only in stackmap).
This stops the lockdep false positive in stackmap with a bit of acceptable overhead.
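
The resulting calling convention, sketched (not verbatim) from the two
kinds of call sites:

	/* htab fast path: the per-bucket lock has already disabled irqs */
	raw_spin_lock_irqsave(&b->lock, flags);
	...
	__pcpu_freelist_push(&htab->freelist, &l->fnode); /* no extra irqsave */
	...
	raw_spin_unlock_irqrestore(&b->lock, flags);

	/* stackmap: no surrounding irq-disabled section, so use the
	 * irqsave variant introduced by this patch
	 */
	pcpu_freelist_push(&smap->freelist, node);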

Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/hashtab.c         |  4 ++--
 kernel/bpf/percpu_freelist.c | 41 +++++++++++++++++++++++++-----------
 kernel/bpf/percpu_freelist.h |  4 ++++
 3 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 4b7c76765d9d..f9274114c88d 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -686,7 +686,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
 	}
 
 	if (htab_is_prealloc(htab)) {
-		pcpu_freelist_push(&htab->freelist, &l->fnode);
+		__pcpu_freelist_push(&htab->freelist, &l->fnode);
 	} else {
 		atomic_dec(&htab->count);
 		l->htab = htab;
@@ -748,7 +748,7 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
 		} else {
 			struct pcpu_freelist_node *l;
 
-			l = pcpu_freelist_pop(&htab->freelist);
+			l = __pcpu_freelist_pop(&htab->freelist);
 			if (!l)
 				return ERR_PTR(-E2BIG);
 			l_new = container_of(l, struct htab_elem, fnode);
diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c
index 673fa6fe2d73..0c1b4ba9e90e 100644
--- a/kernel/bpf/percpu_freelist.c
+++ b/kernel/bpf/percpu_freelist.c
@@ -28,8 +28,8 @@ void pcpu_freelist_destroy(struct pcpu_freelist *s)
 	free_percpu(s->freelist);
 }
 
-static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
-					struct pcpu_freelist_node *node)
+static inline void ___pcpu_freelist_push(struct pcpu_freelist_head *head,
+					 struct pcpu_freelist_node *node)
 {
 	raw_spin_lock(&head->lock);
 	node->next = head->first;
@@ -37,12 +37,22 @@ static inline void __pcpu_freelist_push(struct pcpu_freelist_head *head,
 	raw_spin_unlock(&head->lock);
 }
 
-void pcpu_freelist_push(struct pcpu_freelist *s,
+void __pcpu_freelist_push(struct pcpu_freelist *s,
 			struct pcpu_freelist_node *node)
 {
 	struct pcpu_freelist_head *head = this_cpu_ptr(s->freelist);
 
-	__pcpu_freelist_push(head, node);
+	___pcpu_freelist_push(head, node);
+}
+
+void pcpu_freelist_push(struct pcpu_freelist *s,
+			struct pcpu_freelist_node *node)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	__pcpu_freelist_push(s, node);
+	local_irq_restore(flags);
 }
 
 void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
@@ -63,7 +73,7 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 	for_each_possible_cpu(cpu) {
 again:
 		head = per_cpu_ptr(s->freelist, cpu);
-		__pcpu_freelist_push(head, buf);
+		___pcpu_freelist_push(head, buf);
 		i++;
 		buf += elem_size;
 		if (i == nr_elems)
@@ -74,14 +84,12 @@ void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 	local_irq_restore(flags);
 }
 
-struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
+struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *s)
 {
 	struct pcpu_freelist_head *head;
 	struct pcpu_freelist_node *node;
-	unsigned long flags;
 	int orig_cpu, cpu;
 
-	local_irq_save(flags);
 	orig_cpu = cpu = raw_smp_processor_id();
 	while (1) {
 		head = per_cpu_ptr(s->freelist, cpu);
@@ -89,16 +97,25 @@ struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
 		node = head->first;
 		if (node) {
 			head->first = node->next;
-			raw_spin_unlock_irqrestore(&head->lock, flags);
+			raw_spin_unlock(&head->lock);
 			return node;
 		}
 		raw_spin_unlock(&head->lock);
 		cpu = cpumask_next(cpu, cpu_possible_mask);
 		if (cpu >= nr_cpu_ids)
 			cpu = 0;
-		if (cpu == orig_cpu) {
-			local_irq_restore(flags);
+		if (cpu == orig_cpu)
 			return NULL;
-		}
 	}
 }
+
+struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s)
+{
+	struct pcpu_freelist_node *ret;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	ret = __pcpu_freelist_pop(s);
+	local_irq_restore(flags);
+	return ret;
+}
diff --git a/kernel/bpf/percpu_freelist.h b/kernel/bpf/percpu_freelist.h
index 3049aae8ea1e..c3960118e617 100644
--- a/kernel/bpf/percpu_freelist.h
+++ b/kernel/bpf/percpu_freelist.h
@@ -22,8 +22,12 @@ struct pcpu_freelist_node {
 	struct pcpu_freelist_node *next;
 };
 
+/* pcpu_freelist_* do spin_lock_irqsave. */
 void pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);
 struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *);
+/* __pcpu_freelist_* do spin_lock only. caller must disable irqs. */
+void __pcpu_freelist_push(struct pcpu_freelist *, struct pcpu_freelist_node *);
+struct pcpu_freelist_node *__pcpu_freelist_pop(struct pcpu_freelist *);
 void pcpu_freelist_populate(struct pcpu_freelist *s, void *buf, u32 elem_size,
 			    u32 nr_elems);
 int pcpu_freelist_init(struct pcpu_freelist *);
-- 
2.20.0



* [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30  4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
  2019-01-30  4:04 ` [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist Alexei Starovoitov
@ 2019-01-30  4:04 ` Alexei Starovoitov
  2019-01-30 10:15   ` Peter Zijlstra
  2019-01-30  4:04 ` [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register Alexei Starovoitov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team

Lockdep warns about false positive:
[   11.211460] ------------[ cut here ]------------
[   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
[   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
[   11.213134] Modules linked in:
[   11.213413] CPU: 0 PID: 141 Comm: systemd-journal Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #476
[   11.214191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   11.214954] RIP: 0010:lock_release+0x1ad/0x280
[   11.217036] RSP: 0018:ffff88813ba03f50 EFLAGS: 00010086
[   11.217516] RAX: 000000000000001f RBX: ffff8881378d8000 RCX: 0000000000000000
[   11.218179] RDX: ffffffff810d3e9e RSI: 0000000000000001 RDI: ffffffff810d3eb3
[   11.218851] RBP: ffff8881393e2b08 R08: 0000000000000002 R09: 0000000000000000
[   11.219504] R10: 0000000000000000 R11: ffff88813ba03d9d R12: ffffffff8118dfa2
[   11.220162] R13: 0000000000000086 R14: 0000000000000000 R15: 0000000000000000
[   11.220717] FS:  00007f3c8cf35780(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
[   11.221348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.221822] CR2: 00007f5825d92080 CR3: 00000001378c8005 CR4: 00000000003606f0
[   11.222381] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   11.222951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   11.223508] Call Trace:
[   11.223705]  <IRQ>
[   11.223874]  ? __local_bh_enable+0x7a/0x80
[   11.224199]  up_read+0x1c/0xa0
[   11.224446]  do_up_read+0x12/0x20
[   11.224713]  irq_work_run_list+0x43/0x70
[   11.225030]  irq_work_run+0x26/0x50
[   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
[   11.225662]  irq_work_interrupt+0xf/0x20

since the rw_semaphore is released in a different task than the task that
acquired it. This is expected behavior.
Silence the warning by using up_read_non_owner().
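
For context, the hand-off in kernel/bpf/stackmap.c, condensed (fallback
paths elided): the semaphore is taken in the bpf_prog's context, but in
NMI the release is deferred to an irq_work, which may run when a
different task is current:

	if (down_read_trylock(&current->mm->mmap_sem)) {
		/* ... walk the vmas to resolve build_id ... */
		if (!work) {
			up_read(&current->mm->mmap_sem);
		} else {
			/* in_nmi(): defer the release; do_up_read() will
			 * run from the irq_work, not from this task
			 */
			work->sem = &current->mm->mmap_sem;
			irq_work_queue(&work->irq_work);
		}
	}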

Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/stackmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b14535827..4b79e7c251e5 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -44,7 +44,7 @@ static void do_up_read(struct irq_work *entry)
 	struct stack_map_irq_work *work;
 
 	work = container_of(entry, struct stack_map_irq_work, irq_work);
-	up_read(work->sem);
+	up_read_non_owner(work->sem);
 	work->sem = NULL;
 }
 
-- 
2.20.0



* [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
  2019-01-30  4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
  2019-01-30  4:04 ` [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist Alexei Starovoitov
  2019-01-30  4:04 ` [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap Alexei Starovoitov
@ 2019-01-30  4:04 ` Alexei Starovoitov
  2019-01-30 10:10   ` Peter Zijlstra
  2019-01-30  4:04 ` [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock Alexei Starovoitov
  2019-01-30  4:07 ` [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
  4 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team

Lockdep warns about false positive:
[   13.007000] WARNING: possible circular locking dependency detected
[   13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[   13.008124] ------------------------------------------------------
[   13.008624] test_progs/246 is trying to acquire lock:
[   13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[   13.009770]
[   13.009770] but task is already holding lock:
[   13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.010877]
[   13.010877] which lock already depends on the new lock.
[   13.010877]
[   13.011532]
[   13.011532] the existing dependency chain (in reverse order) is:
[   13.012129]
[   13.012129] -> #4 (bpf_event_mutex){+.+.}:
[   13.012582]        perf_event_query_prog_array+0x9b/0x130
[   13.013016]        _perf_ioctl+0x3aa/0x830
[   13.013354]        perf_ioctl+0x2e/0x50
[   13.013668]        do_vfs_ioctl+0x8f/0x6a0
[   13.014003]        ksys_ioctl+0x70/0x80
[   13.014320]        __x64_sys_ioctl+0x16/0x20
[   13.014668]        do_syscall_64+0x4a/0x180
[   13.015007]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.015469]
[   13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[   13.015910]        perf_event_init_cpu+0x5a/0x90
[   13.016291]        perf_event_init+0x1b2/0x1de
[   13.016654]        start_kernel+0x2b8/0x42a
[   13.016995]        secondary_startup_64+0xa4/0xb0
[   13.017382]
[   13.017382] -> #2 (pmus_lock){+.+.}:
[   13.017794]        perf_event_init_cpu+0x21/0x90
[   13.018172]        cpuhp_invoke_callback+0xb3/0x960
[   13.018573]        _cpu_up+0xa7/0x140
[   13.018871]        do_cpu_up+0xa4/0xc0
[   13.019178]        smp_init+0xcd/0xd2
[   13.019483]        kernel_init_freeable+0x123/0x24f
[   13.019878]        kernel_init+0xa/0x110
[   13.020201]        ret_from_fork+0x24/0x30
[   13.020541]
[   13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[   13.021051]        static_key_slow_inc+0xe/0x20
[   13.021424]        tracepoint_probe_register_prio+0x28c/0x300
[   13.021891]        perf_trace_event_init+0x11f/0x250
[   13.022297]        perf_trace_init+0x6b/0xa0
[   13.022644]        perf_tp_event_init+0x25/0x40
[   13.023011]        perf_try_init_event+0x6b/0x90
[   13.023386]        perf_event_alloc+0x9a8/0xc40
[   13.023754]        __do_sys_perf_event_open+0x1dd/0xd30
[   13.024173]        do_syscall_64+0x4a/0x180
[   13.024519]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.024968]
[   13.024968] -> #0 (tracepoints_mutex){+.+.}:
[   13.025434]        __mutex_lock+0x86/0x970
[   13.025764]        tracepoint_probe_register_prio+0x2d/0x300
[   13.026215]        bpf_probe_register+0x40/0x60
[   13.026584]        bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.027042]        __do_sys_bpf+0x94f/0x1a90
[   13.027389]        do_syscall_64+0x4a/0x180
[   13.027727]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.028171]
[   13.028171] other info that might help us debug this:
[   13.028171]
[   13.028807] Chain exists of:
[   13.028807]   tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[   13.028807]
[   13.029666]  Possible unsafe locking scenario:
[   13.029666]
[   13.030140]        CPU0                    CPU1
[   13.030510]        ----                    ----
[   13.030875]   lock(bpf_event_mutex);
[   13.031166]                                lock(&cpuctx_mutex);
[   13.031645]                                lock(bpf_event_mutex);
[   13.032135]   lock(tracepoints_mutex);
[   13.032441]
[   13.032441]  *** DEADLOCK ***
[   13.032441]
[   13.032911] 1 lock held by test_progs/246:
[   13.033239]  #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[   13.033909]
[   13.033909] stack backtrace:
[   13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[   13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[   13.035657] Call Trace:
[   13.035859]  dump_stack+0x5f/0x8b
[   13.036130]  print_circular_bug.isra.37+0x1ce/0x1db
[   13.036526]  __lock_acquire+0x1158/0x1350
[   13.036852]  ? lock_acquire+0x98/0x190
[   13.037154]  lock_acquire+0x98/0x190
[   13.037447]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.037876]  __mutex_lock+0x86/0x970
[   13.038167]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.038600]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.039028]  ? __mutex_lock+0x86/0x970
[   13.039337]  ? __mutex_lock+0x24a/0x970
[   13.039649]  ? bpf_probe_register+0x1d/0x60
[   13.039992]  ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[   13.040478]  ? tracepoint_probe_register_prio+0x2d/0x300
[   13.040906]  tracepoint_probe_register_prio+0x2d/0x300
[   13.041325]  bpf_probe_register+0x40/0x60
[   13.041649]  bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[   13.042068]  ? __might_fault+0x3e/0x90
[   13.042374]  __do_sys_bpf+0x94f/0x1a90
[   13.042678]  do_syscall_64+0x4a/0x180
[   13.042975]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   13.043382] RIP: 0033:0x7f23b10a07f9
[   13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[   13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[   13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[   13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[   13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[   13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000

Lockdep seems to be confusing different mutexes.
Such a deadlock is not possible.
Since tracepoints_mutex will be taken inside tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex protects modifications to the prog array used by kprobe/perf
bpf progs. bpf_raw_tracepoints don't need to take this mutex.

Fixes: c4f6699dfcb8 ("bpf: introduce BPF_RAW_TRACEPOINT")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/trace/bpf_trace.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8b068adb9da1..f1a86a0d881d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1204,22 +1204,12 @@ static int __bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *
 
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
-	int err;
-
-	mutex_lock(&bpf_event_mutex);
-	err = __bpf_probe_register(btp, prog);
-	mutex_unlock(&bpf_event_mutex);
-	return err;
+	return __bpf_probe_register(btp, prog);
 }
 
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 {
-	int err;
-
-	mutex_lock(&bpf_event_mutex);
-	err = tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
-	mutex_unlock(&bpf_event_mutex);
-	return err;
+	return tracepoint_probe_unregister(btp->tp, (void *)btp->bpf_func, prog);
 }
 
 int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
-- 
2.20.0



* [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock
  2019-01-30  4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2019-01-30  4:04 ` [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register Alexei Starovoitov
@ 2019-01-30  4:04 ` Alexei Starovoitov
  2019-01-30  4:07 ` [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
  4 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30  4:04 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, edumazet, jannh, netdev, kernel-team

From: Martin KaFai Lau <kafai@fb.com>

map_lookup_elem used to not acquire any spinlock
in order to optimize the reader path.

That was true until commit 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation").
The syscall's map_lookup_elem(stackmap) calls bpf_stackmap_copy().
bpf_stackmap_copy() may find the elem no longer needed after the copy is done.
If that is the case, pcpu_freelist_push() saves this elem for reuse later.
This push requires a spinlock.

If a tracing bpf_prog runs in the middle of the syscall's
map_lookup_elem(stackmap) and this tracing bpf_prog calls
bpf_get_stackid(stackmap), which requires the same pcpu_freelist's
spinlock, it may end up in a deadlock situation, as reported by
Eric Dumazet in https://patchwork.ozlabs.org/patch/1030266/
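
To illustrate, the interleaving on one cpu (assuming the tracing
bpf_prog is attached to an event that can fire inside the lookup path):

	syscall: map_lookup_elem(stackmap)
	  bpf_stackmap_copy()
	    pcpu_freelist_push()
	      raw_spin_lock(&head->lock)
	        <tracing bpf_prog fires>
	        bpf_get_stackid(stackmap)
	          pcpu_freelist_pop()
	            raw_spin_lock(&head->lock)	<- deadlock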

The situation is the same as the syscall's map_update_elem() which
needs to acquire the pcpu_freelist's spinlock and could race
with tracing bpf_prog.  Hence, this patch fixes it by protecting
bpf_stackmap_copy() with this_cpu_inc(bpf_prog_active)
to prevent tracing bpf_prog from running.

A later map_lookup_elem change, commit f1a2e44a3aec ("bpf: add queue and
stack maps"), also acquires a spinlock and races with tracing bpf_prog
similarly. Hence, this patch is forward-looking and protects the majority
of the map lookups.  bpf_map_offload_lookup_elem() is the exception
since it is for network bpf_prog only (i.e. never called by tracing
bpf_prog).

Fixes: 557c0c6e7df8 ("bpf: convert stackmap to pre-allocation")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/syscall.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b155cd17c1bd..8577bb7f8be6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -713,8 +713,13 @@ static int map_lookup_elem(union bpf_attr *attr)
 
 	if (bpf_map_is_dev_bound(map)) {
 		err = bpf_map_offload_lookup_elem(map, key, value);
-	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-		   map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
+		goto done;
+	}
+
+	preempt_disable();
+	this_cpu_inc(bpf_prog_active);
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
 		err = bpf_percpu_hash_copy(map, key, value);
 	} else if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
 		err = bpf_percpu_array_copy(map, key, value);
@@ -744,7 +749,10 @@ static int map_lookup_elem(union bpf_attr *attr)
 		}
 		rcu_read_unlock();
 	}
+	this_cpu_dec(bpf_prog_active);
+	preempt_enable();
 
+done:
 	if (err)
 		goto free_value;
 
-- 
2.20.0



* Re: [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock
  2019-01-30  4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2019-01-30  4:04 ` [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock Alexei Starovoitov
@ 2019-01-30  4:07 ` Alexei Starovoitov
  4 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30  4:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Zijlstra, Eric Dumazet,
	Jann Horn, Network Development, Kernel Team

On Tue, Jan 29, 2019 at 8:06 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> In addition to the preempt_disable patch for socket filters
> https://patchwork.ozlabs.org/patch/1032437/
> the first three patches fix various lockdep false positives.
> The last patch fixes a potential deadlock in stackmap access from
> a tracing bpf prog and from syscall.

Typo in subject.
All patches are for the 'bpf' tree.


* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
  2019-01-30  4:04 ` [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register Alexei Starovoitov
@ 2019-01-30 10:10   ` Peter Zijlstra
  2019-01-30 10:37     ` Peter Zijlstra
  2019-01-30 19:32     ` Alexei Starovoitov
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 10:10 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, edumazet, jannh, netdev, kernel-team

On Tue, Jan 29, 2019 at 08:04:57PM -0800, Alexei Starovoitov wrote:
> Lockdep warns about false positive:

The report reads like:

	tracepoint_probe_register()
#0	  mutex_lock(&tracepoint_mutex)
	  tracepoint_add_func()
	    static_key_slow_inc()
#1	      cpus_read_lock();


	_cpu_up()
#1	  cpus_write_lock();
	  ...
	  perf_event_init_cpu()
#2	    mutex_lock(&pmus_lock);
#3	    mutex_lock(&ctx->mutex);


	perf_ioctl()
#4	  perf_event_ctx_lock();
	  _perf_ioctl(IOC_QUERY_BPF)
	    perf_event_query_prog_array()
#5	      mutex_lock(&bpf_event_mutex);


	bpf_probe_register()
#5	  mutex_lock(&bpf_event_mutex);
	  __bpf_probe_register()
	    tracepoint_probe_register()
#0	      mutex_lock(&tracepoint_mutex);

Which to me reads like an entirely valid deadlock scenario.

And note that the first and last can be combined to give:

	bpf_probe_register()
#5	  mutex_lock(&bpf_event_mutex);
	  __bpf_probe_register()
	    tracepoint_probe_register()
#0	      mutex_lock(&tracepoint_mutex);
	      tracepoint_add_func()
	        static_key_slow_inc()
#1		  cpus_read_lock();


Which generates a deadlock even without #0.

Why do you say this is not possible? All you need is 3 CPUs, one doing a
CPU online, one doing a perf ioctl() and one doing that
bpf_probe_register().




* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30  4:04 ` [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap Alexei Starovoitov
@ 2019-01-30 10:15   ` Peter Zijlstra
  2019-01-30 19:30     ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 10:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, edumazet, jannh, netdev, kernel-team, Waiman Long

On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> Lockdep warns about false positive:

This is not a false positive, and you probably also need to use
down_read_non_owner() to match this up_read_non_owner().

{up,down}_read() and {up,down}_read_non_owner() are not only different
in the lockdep annotation; there is also optimistic spin stuff that
relies on 'owner' tracking.

> [   11.211460] ------------[ cut here ]------------
> [   11.211936] DEBUG_LOCKS_WARN_ON(depth <= 0)
> [   11.211985] WARNING: CPU: 0 PID: 141 at ../kernel/locking/lockdep.c:3592 lock_release+0x1ad/0x280
> [   11.213134] Modules linked in:
> [   11.213413] CPU: 0 PID: 141 Comm: systemd-journal Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #476
> [   11.214191] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> [   11.214954] RIP: 0010:lock_release+0x1ad/0x280
> [   11.217036] RSP: 0018:ffff88813ba03f50 EFLAGS: 00010086
> [   11.217516] RAX: 000000000000001f RBX: ffff8881378d8000 RCX: 0000000000000000
> [   11.218179] RDX: ffffffff810d3e9e RSI: 0000000000000001 RDI: ffffffff810d3eb3
> [   11.218851] RBP: ffff8881393e2b08 R08: 0000000000000002 R09: 0000000000000000
> [   11.219504] R10: 0000000000000000 R11: ffff88813ba03d9d R12: ffffffff8118dfa2
> [   11.220162] R13: 0000000000000086 R14: 0000000000000000 R15: 0000000000000000
> [   11.220717] FS:  00007f3c8cf35780(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
> [   11.221348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   11.221822] CR2: 00007f5825d92080 CR3: 00000001378c8005 CR4: 00000000003606f0
> [   11.222381] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   11.222951] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   11.223508] Call Trace:
> [   11.223705]  <IRQ>
> [   11.223874]  ? __local_bh_enable+0x7a/0x80
> [   11.224199]  up_read+0x1c/0xa0
> [   11.224446]  do_up_read+0x12/0x20
> [   11.224713]  irq_work_run_list+0x43/0x70
> [   11.225030]  irq_work_run+0x26/0x50
> [   11.225310]  smp_irq_work_interrupt+0x57/0x1f0
> [   11.225662]  irq_work_interrupt+0xf/0x20
> 
> since the rw_semaphore is released in a different task than the task that
> acquired it. This is expected behavior.
> Silence the warning by using up_read_non_owner().
> 
> Fixes: bae77c5eb5b2 ("bpf: enable stackmap with build_id in nmi context")
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  kernel/bpf/stackmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index d43b14535827..4b79e7c251e5 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -44,7 +44,7 @@ static void do_up_read(struct irq_work *entry)
>  	struct stack_map_irq_work *work;
>  
>  	work = container_of(entry, struct stack_map_irq_work, irq_work);
> -	up_read(work->sem);
> +	up_read_non_owner(work->sem);
>  	work->sem = NULL;
>  }
>  
> -- 
> 2.20.0
> 


* Re: [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
  2019-01-30  4:04 ` [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist Alexei Starovoitov
@ 2019-01-30 10:21   ` Peter Zijlstra
  2019-01-30 19:27     ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 10:21 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, edumazet, jannh, netdev, kernel-team

On Tue, Jan 29, 2019 at 08:04:55PM -0800, Alexei Starovoitov wrote:
> 
> It has been explained that this is a false positive here:
> https://lkml.org/lkml/2018/7/25/756

Please, no external references like that. The best option is to fully
include the explanation here so that a future software archeology
student (note, this might be yourself in a year's time) can find all
relevant information in the git history.

Or, if you _really_ _really_ have to refer to external sources, use:

  https://lkml.kernel.org/r/${msg_id}

which is a controlled URL and can be made to point to any archive
(currently, and sadly, lore.kernel.org).

Also, since it includes the whole msg_id, people can trivially find the
email in their local archive.

While I agree that lkml.org is a _MUCH_ saner interface than lore (which
causes eye and brain cancer for just looking at it), it is a _really_
flaky website and the url contains no clues for finding it in the local
archive.

> Recap:
> - stackmap uses pcpu_freelist
> - The lock in pcpu_freelist is a percpu lock
> - stackmap is only used by tracing bpf_prog
> - A tracing bpf_prog cannot run if another bpf_prog is already
>   running on the same cpu (ensured by the percpu bpf_prog_active counter).


* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
  2019-01-30 10:10   ` Peter Zijlstra
@ 2019-01-30 10:37     ` Peter Zijlstra
  2019-01-30 19:32     ` Alexei Starovoitov
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 10:37 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, edumazet, jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 08:04:57PM -0800, Alexei Starovoitov wrote:
> > Lockdep warns about false positive:
> 
> The report reads like:
> 
> 	tracepoint_probe_register()
> #0	  mutex_lock(&tracepoint_mutex)
> 	  tracepoint_add_func()
> 	    static_key_slow_inc()
> #1	      cpus_read_lock();
> 
> 
> 	_cpu_up()
> #1	  cpus_write_lock();
> 	  ...
> 	  perf_event_init_cpu()
> #2	    mutex_lock(&pmus_lock);
> #3	    mutex_lock(&ctx->mutex);
> 
> 
> 	perf_ioctl()
> #4	  perf_event_ctx_lock();

Sorry, that's #3, and then do s/#5/#4/ on the rest of the text.

> 	  _perf_ioctl(IOC_QUERY_BPF)
> 	    perf_event_query_prog_array()
> #5	      mutex_lock(&bpf_event_mutex);
> 
> 
> 	bpf_probe_register()
> #5	  mutex_lock(&bpf_event_mutex);
> 	  __bpf_probe_register()
> 	    tracepoint_probe_register()
> #0	      mutex_lock(&tracepoint_mutex);
> 
> Which to me reads like an entirely valid deadlock scenario.
> 
> And note that the first and last can be combined to give:
> 
> 	bpf_probe_register()
> #5	  mutex_lock(&bpf_event_mutex);
> 	  __bpf_probe_register()
> 	    tracepoint_probe_register()
> #0	      mutex_lock(&tracepoint_mutex);
> 	      tracepoint_add_func()
> 	        static_key_slow_inc()
> #1		  cpus_read_lock();
> 
> 
> Which generates a deadlock even without #0.
> 
> Why do you say this is not possible? All you need is 3 CPUs, one doing a
> CPU online, one doing a perf ioctl() and one doing that
> bpf_probe_register().
> 
> 


* Re: [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
  2019-01-30 10:21   ` Peter Zijlstra
@ 2019-01-30 19:27     ` Alexei Starovoitov
  2019-01-30 19:53       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30 19:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 11:21:26AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 08:04:55PM -0800, Alexei Starovoitov wrote:
> > 
> > It has been explained that this is a false positive here:
> > https://lkml.org/lkml/2018/7/25/756
> 
> Please, no external references like that. The best option is to fully

I strongly disagree.
We allowed all kinds of external links in the bpf tree in the past and
are going to continue doing so in the future.
I'm perfectly aware that some of them will go stale in a day or
in a year.



* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 10:15   ` Peter Zijlstra
@ 2019-01-30 19:30     ` Alexei Starovoitov
  2019-01-30 19:42       ` Waiman Long
  2019-01-30 19:44       ` Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30 19:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team, Waiman Long

On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> > Lockdep warns about false positive:
> 
> This is not a false positive, and you probably also need to use
> down_read_non_owner() to match this up_read_non_owner().
> 
> {up,down}_read() and {up,down}_read_non_owner() are not only different
> in the lockdep annotation; there is also optimistic spin stuff that
> relies on 'owner' tracking.

Can you point out in the code the spin bit?
As far as I can see sem->owner is debug only feature.
All owner checks are done under CONFIG_DEBUG_RWSEMS.

Also there is no down_read_trylock_non_owner() at the moment.
We can argue about it for -next, but I'd rather silence lockdep
with this patch today.



* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
  2019-01-30 10:10   ` Peter Zijlstra
  2019-01-30 10:37     ` Peter Zijlstra
@ 2019-01-30 19:32     ` Alexei Starovoitov
  2019-01-30 19:46       ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30 19:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> 
> Why do you say this is not possible? All you need is 3 CPUs, one doing a
> CPU online, one doing a perf ioctl() and one doing that
> bpf_probe_register().

yeah. indeed. I'm impressed that lockdep figured it out
while I missed it manually looking through the code and stack traces.
Will reword the commit log.



* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 19:30     ` Alexei Starovoitov
@ 2019-01-30 19:42       ` Waiman Long
  2019-01-30 20:10         ` Alexei Starovoitov
  2019-01-30 19:44       ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Waiman Long @ 2019-01-30 19:42 UTC (permalink / raw)
  To: Alexei Starovoitov, Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>> Lockdep warns about false positive:
>> This is not a false positive, and you probably also need to use
>> down_read_non_owner() to match this up_read_non_owner().
>>
>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>> in the lockdep annotation; there is also optimistic spin stuff that
>> relies on 'owner' tracking.
> Can you point out in the code the spin bit?
> As far as I can see sem->owner is debug only feature.
> All owner checks are done under CONFIG_DEBUG_RWSEMS.

No, sem->owner is mainly for performing optimistic spinning, which is a
performance feature to make the rwsem writer lock perform similarly to a mutex.
The debugging part is just an add-on. It is not the reason for the
presence of sem->owner.

> Also there is no down_read_trylock_non_owner() at the moment.
> We can argue about it for -next, but I'd rather silence lockdep
> with this patch today.
>
We can add down_read_trylock_non_owner() if there is a need for it. It
should be easy to do.

Cheers,
Longman




* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 19:30     ` Alexei Starovoitov
  2019-01-30 19:42       ` Waiman Long
@ 2019-01-30 19:44       ` Peter Zijlstra
  2019-01-30 20:05         ` Waiman Long
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 19:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev,
	kernel-team, Waiman Long

On Wed, Jan 30, 2019 at 11:30:41AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> > > Lockdep warns about false positive:
> > 
> > This is not a false positive, and you probably also need to use
> > down_read_non_owner() to match this up_read_non_owner().
> > 
> > {up,down}_read() and {up,down}_read_non_owner() are not only different
> > in the lockdep annotation; there is also optimistic spin stuff that
> > relies on 'owner' tracking.
> 
> Can you point out in the code the spin bit?

Hurmph, looks like you're right. I got lost in that stuff again. I hate
that rwsem code :/

Rewriting that all is on the todo list somewhere, but it's far down :/

> As far as I can see sem->owner is debug only feature.
> All owner checks are done under CONFIG_DEBUG_RWSEMS.
> 
> Also there is no down_read_trylock_non_owner() at the moment.
> We can argue about it for -next, but I'd rather silence lockdep
> with this patch today.

I don't agree with calling it silencing; it fixes a misuse of the API.
But yes, keep the patch as is for now.


* Re: [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register
  2019-01-30 19:32     ` Alexei Starovoitov
@ 2019-01-30 19:46       ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 19:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 11:32:43AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:10:58AM +0100, Peter Zijlstra wrote:
> > 
> > Why do you say this is not possible? All you need is 3 CPUs, one doing a
> > CPU online, one doing a perf ioctl() and one doing that
> > bpf_probe_register().
> 
> yeah. indeed. I'm impressed that lockdep figured it out
> while I missed it manually looking through the code and stack traces.

That's what it does :-)



* Re: [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
  2019-01-30 19:27     ` Alexei Starovoitov
@ 2019-01-30 19:53       ` Peter Zijlstra
  2019-01-30 20:18         ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-01-30 19:53 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 11:27:54AM -0800, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 11:21:26AM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 29, 2019 at 08:04:55PM -0800, Alexei Starovoitov wrote:
> > > 
> > > It has been explained that this is a false positive here:
> > > https://lkml.org/lkml/2018/7/25/756
> > 
> > Please, no external references like that. The best option is to fully
> 
> I strongly disagree.
> We allowed all kinds of external links in the bpf tree in the past and
> are going to continue doing so in the future.
> I'm perfectly aware that some of them will go stale in a day or
> in a year.

What's the point of adding URLs if you know they'll not be useful later?

Anyway, your tree, so you get to make the rules, but personally I've
cursed about this exact issue a fair few times.

See for example the x86 tree policy of creating BZ entries to store
Intel documents to refer to them from commits because the Intel website
is notoriously flaky wrt persistence.

There's nothing worse than references to a document you can no longer
find while trying to make sense of this 3 year old code that suddenly
comes apart.


* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 19:44       ` Peter Zijlstra
@ 2019-01-30 20:05         ` Waiman Long
  0 siblings, 0 replies; 27+ messages in thread
From: Waiman Long @ 2019-01-30 20:05 UTC (permalink / raw)
  To: Peter Zijlstra, Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On 01/30/2019 02:44 PM, Peter Zijlstra wrote:
> On Wed, Jan 30, 2019 at 11:30:41AM -0800, Alexei Starovoitov wrote:
>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>> Lockdep warns about false positive:
>>> This is not a false positive, and you probably also need to use
>>> down_read_non_owner() to match this up_read_non_owner().
>>>
>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>> in the lockdep annotation; there is also optimistic spin stuff that
>>> relies on 'owner' tracking.
>> Can you point out in the code the spin bit?
> Hurmph, looks like you're right. I got lost in that stuff again. I hate
> that rwsem code :/

It is actually related to how the lockdep code tracks whether a lock is
acquired or released. It is not actually related to how the rwsem code works.

>
> Rewriting that all is on the todo list somewhere, but it's far down :/
>

I am actually planning to rewrite it. Hopefully, I can send out the
patch soon.

Cheers,
Longman


* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 19:42       ` Waiman Long
@ 2019-01-30 20:10         ` Alexei Starovoitov
  2019-01-30 21:11           ` Waiman Long
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30 20:10 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
> > On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
> >> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> >>> Lockdep warns about false positive:
> >> This is not a false positive, and you probably also need to use
> >> down_read_non_owner() to match this up_read_non_owner().
> >>
> >> {up,down}_read() and {up,down}_read_non_owner() are not only different
> >> in the lockdep annotation; there is also optimistic spin stuff that
> >> relies on 'owner' tracking.
> > Can you point out in the code the spin bit?
> > As far as I can see sem->owner is debug only feature.
> > All owner checks are done under CONFIG_DEBUG_RWSEMS.
> 
> No, sem->owner is mainly for performing optimistic spinning, which is a
> performance feature to make the rwsem writer lock perform similarly to a mutex.
> The debugging part is just an add-on. It is not the reason for the
> presence of sem->owner.

I see. Got it.

> > Also there is no down_read_trylock_non_owner() at the moment.
> > We can argue about it for -next, but I'd rather silence lockdep
> > with this patch today.
> >
> We can add down_read_trylock_non_owner() if there is a need for it. It
> should be easy to do.

Yes, but looking through the code it's not clear to me that it's safe
to mix the non_owner() versions with the regular ones.
bpf/stackmap.c does down_read_trylock + up_read.
If we add a new down_read_trylock_non_owner that sets the owner to
NULL | RWSEM_* bits, is this safe with concurrent read/write
that do the regular versions?
rwsem_can_spin_on_owner() does:
        if (owner) {
                ret = is_rwsem_owner_spinnable(owner) &&
                      owner_on_cpu(owner);
that looks correct.
For a second I thought there could be a fault here due to non_owner.
But there could be other places where it's assumed that owner
is never null?

Maybe we should live with this lockdep warn in the bpf tree
and fix it only in bpf-next?



* Re: [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist
  2019-01-30 19:53       ` Peter Zijlstra
@ 2019-01-30 20:18         ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-30 20:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, davem, daniel, edumazet, jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 08:53:13PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 30, 2019 at 11:27:54AM -0800, Alexei Starovoitov wrote:
> > On Wed, Jan 30, 2019 at 11:21:26AM +0100, Peter Zijlstra wrote:
> > > On Tue, Jan 29, 2019 at 08:04:55PM -0800, Alexei Starovoitov wrote:
> > > > 
> > > > It has been explained that this is a false positive here:
> > > > https://lkml.org/lkml/2018/7/25/756
> > > 
> > > Please, no external references like that. The best option is to fully
> > 
> > I strongly disagree.
> > We allowed all kinds of external links in the bpf tree in the past and
> > are going to continue doing so in the future.
> > I'm perfectly aware that some of them will go stale in a day or
> > in a year.
> 
> What's the point of adding URLs if you know they'll not be useful later?
> 
> Anyway, your tree, so you get to make the rules, but personally I've
> cursed about this exact issue a fair few times.
> 
> See for example the x86 tree policy of creating BZ entries to store
> Intel documents to refer to them from commits because the Intel website
> is notoriously flaky wrt persistence.

Well, yes, it's case by case.
Not every link would be acceptable.
A URL to a news website wouldn't be appropriate for a git tree :)
but a pointer to github.com or a corporate website is likely going to be
alive for some time.

> There's nothing worse than references to a document you can no longer
> find while trying to make sense of this 3 year old code that suddenly
> comes apart.

True. I'm not saying that it's ok to put all the info on the website
and keep the commit minimal. Quite the opposite:
the commit log should be descriptive and contain all the information
needed to explain the change. An extra URL is additional info.


* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 20:10         ` Alexei Starovoitov
@ 2019-01-30 21:11           ` Waiman Long
  2019-01-30 21:32             ` Waiman Long
  0 siblings, 1 reply; 27+ messages in thread
From: Waiman Long @ 2019-01-30 21:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team

On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>> Lockdep warns about false positive:
>>>> This is not a false positive, and you probably also need to use
>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>
>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>> relies on 'owner' tracking.
>>> Can you point out in the code the spin bit?
>>> As far as I can see sem->owner is debug only feature.
>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>> No, sem->owner is mainly for performing optimistic spinning, which is a
>> performance feature to make the rwsem writer lock perform similarly to a mutex.
>> The debugging part is just an add-on. It is not the reason for the
>> presence of sem->owner.
> I see. Got it.
>
>>> Also there is no down_read_trylock_non_owner() at the moment.
>>> We can argue about it for -next, but I'd rather silence lockdep
>>> with this patch today.
>>>
>> We can add down_read_trylock_non_owner() if there is a need for it. It
>> should be easy to do.
> Yes, but looking through the code it's not clear to me that it's safe
> to mix non_owner() versions with regular.
> bpf/stackmap.c does down_read_trylock + up_read.
> If we add new down_read_trylock_non_owner that set the owner to
> NULL | RWSEM_* bits is this safe with conccurent read/write
> that do regular versions?
> rwsem_can_spin_on_owner() does:
>         if (owner) {
>                 ret = is_rwsem_owner_spinnable(owner) &&
>                       owner_on_cpu(owner);
> that looks correct.
> For a second I thought there could be a fault here due to non_owner.
> But there could be other places where it's assumed that owner
> is never null?

The content of owner is not the cause of the lockdep warning. The
lockdep code assumes that the task that acquires the lock will release
it some time later. That is not the case when the lock is acquired by
one task and released by another. In that case, you have to use the
non_owner version of down/up_read, which disables the lockdep
acquire/release tracking. That will be the only difference between the
two sets of APIs.
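
For illustration, a minimal sketch of the cross-task pattern (task A and
task B are placeholders, not real code):

	/* task A: acquire without registering itself as the lockdep owner */
	down_read_non_owner(&mm->mmap_sem);
	/* ... hand the lock off to another context, e.g. via irq_work ... */

	/* task B: release on task A's behalf */
	up_read_non_owner(&mm->mmap_sem);

With the plain down_read()/up_read() pair, lockdep records the acquiring
task as the owner and complains when a different task does the release.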

Cheers,
Longman

>
> Maybe we should live with this lockdep warning in the bpf tree
> and fix it only in bpf-next?
>



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 21:11           ` Waiman Long
@ 2019-01-30 21:32             ` Waiman Long
  2019-01-31  2:01               ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Waiman Long @ 2019-01-30 21:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team

On 01/30/2019 04:11 PM, Waiman Long wrote:
> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>>> Lockdep warns about false positive:
>>>>> This is not a false positive, and you probably also need to use
>>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>>
>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>>> relies on 'owner' tracking.
>>>> Can you point out in the code the spin bit?
>>>> As far as I can see, sem->owner is a debug-only feature.
>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>>> No, sem->owner is mainly for performing optimistic spinning, which is a
>>> performance feature to make rwsem writer-lock perform similarly to mutex.
>>> The debugging part is just an add-on. It is not the reason for the
>>> presence of sem->owner.
>> I see. Got it.
>>
>>>> Also there is no down_read_trylock_non_owner() at the moment.
>>>> We can argue about it for -next, but I'd rather silence lockdep
>>>> with this patch today.
>>>>
>>> We can add down_read_trylock_non_owner() if there is a need for it. It
>>> should be easy to do.
>> Yes, but looking through the code it's not clear to me that it's safe
>> to mix non_owner() versions with the regular ones.
>> bpf/stackmap.c does down_read_trylock + up_read.
>> If we add a new down_read_trylock_non_owner() that sets the owner to
>> NULL | RWSEM_* bits, is this safe with concurrent read/write
>> that use the regular versions?
>> rwsem_can_spin_on_owner() does:
>>         if (owner) {
>>                 ret = is_rwsem_owner_spinnable(owner) &&
>>                       owner_on_cpu(owner);
>> that looks correct.
>> For a second I thought there could be a fault here due to non_owner.
>> But there could be other places where it's assumed that owner
>> is never null?
> The content of owner is not the cause of the lockdep warning. The
> lockdep code assumes that the task that acquires the lock will release
> it some time later. That is not the case when the lock is acquired by
> one task and released by another. In that case, you have to use the
> non_owner version of down/up_read, which disables the lockdep
> acquire/release tracking. That will be the only difference between the
> two sets of APIs.
>
> Cheers,
> Longman

BTW, you may want to do something like this to make sure that the lock
ownership is properly tracked.

-Longman

diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index d43b145..79eef9d 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_
        } else {
                work->sem = &current->mm->mmap_sem;
                irq_work_queue(&work->irq_work);
+
+               /*
+                * The irq_work will release the mmap_sem with
+                * up_read_non_owner(). The rwsem_release() is called
+                * here to release the lock from lockdep's perspective.
+                */
+               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
        }
 }


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-30 21:32             ` Waiman Long
@ 2019-01-31  2:01               ` Alexei Starovoitov
  2019-01-31  2:48                 ` Waiman Long
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-01-31  2:01 UTC (permalink / raw)
  To: Waiman Long
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team

On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote:
> On 01/30/2019 04:11 PM, Waiman Long wrote:
> > On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
> >> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
> >>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
> >>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
> >>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
> >>>>>> Lockdep warns about false positive:
> >>>>> This is not a false positive, and you probably also need to use
> >>>>> down_read_non_owner() to match this up_read_non_owner().
> >>>>>
> >>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
> >>>>> in the lockdep annotation; there is also optimistic spin stuff that
> >>>>> relies on 'owner' tracking.
> >>>> Can you point out in the code the spin bit?
> >>>> As far as I can see, sem->owner is a debug-only feature.
> >>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
> >>> No, sem->owner is mainly for performing optimistic spinning, which is a
> >>> performance feature to make rwsem writer-lock perform similarly to mutex.
> >>> The debugging part is just an add-on. It is not the reason for the
> >>> presence of sem->owner.
> >> I see. Got it.
> >>
> >>>> Also there is no down_read_trylock_non_owner() at the moment.
> >>>> We can argue about it for -next, but I'd rather silence lockdep
> >>>> with this patch today.
> >>>>
> >>> We can add down_read_trylock_non_owner() if there is a need for it. It
> >>> should be easy to do.
> >> Yes, but looking through the code it's not clear to me that it's safe
> >> to mix non_owner() versions with the regular ones.
> >> bpf/stackmap.c does down_read_trylock + up_read.
> >> If we add a new down_read_trylock_non_owner() that sets the owner to
> >> NULL | RWSEM_* bits, is this safe with concurrent read/write
> >> that use the regular versions?
> >> rwsem_can_spin_on_owner() does:
> >>         if (owner) {
> >>                 ret = is_rwsem_owner_spinnable(owner) &&
> >>                       owner_on_cpu(owner);
> >> that looks correct.
> >> For a second I thought there could be a fault here due to non_owner.
> >> But there could be other places where it's assumed that owner
> >> is never null?
> > The content of owner is not the cause of the lockdep warning. The
> > lockdep code assumes that the task that acquires the lock will release
> > it some time later. That is not the case when the lock is acquired by
> > one task and released by another. In that case, you have to use the
> > non_owner version of down/up_read, which disables the lockdep
> > acquire/release tracking. That will be the only difference between the
> > two sets of APIs.
> >
> > Cheers,
> > Longman
> 
> BTW, you may want to do something like this to make sure that the lock
> ownership is properly tracked.
> 
> -Longman
> 
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index d43b145..79eef9d 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_
>         } else {
>                 work->sem = &current->mm->mmap_sem;
>                 irq_work_queue(&work->irq_work);
> +
> +               /*
> +                * The irq_work will release the mmap_sem with
> +                * up_read_non_owner(). The rwsem_release() is called
> +                * here to release the lock from lockdep's perspective.
> +                */
> +               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);

are you saying the above is enough when coupled with up_read_non_owner()?
Looking at how amdgpu is using this api... I think they just use the
non_owner version when doing things from a different task.
So I don't think pairing non_owner with non_owner is strictly necessary.
It seems fine to use down_read_trylock() with the above rwsem_release()
hack plus up_read_non_owner().
Am I missing something?


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-31  2:01               ` Alexei Starovoitov
@ 2019-01-31  2:48                 ` Waiman Long
  2019-02-06  3:21                   ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Waiman Long @ 2019-01-31  2:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team

On 01/30/2019 09:01 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote:
>> On 01/30/2019 04:11 PM, Waiman Long wrote:
>>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
>>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>>>>> Lockdep warns about false positive:
>>>>>>> This is not a false positive, and you probably also need to use
>>>>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>>>>
>>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>>>>> relies on 'owner' tracking.
>>>>>> Can you point out in the code the spin bit?
>>>>>> As far as I can see, sem->owner is a debug-only feature.
>>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>>>>> No, sem->owner is mainly for performing optimistic spinning, which is a
>>>>> performance feature to make rwsem writer-lock perform similarly to mutex.
>>>>> The debugging part is just an add-on. It is not the reason for the
>>>>> presence of sem->owner.
>>>> I see. Got it.
>>>>
>>>>>> Also there is no down_read_trylock_non_owner() at the moment.
>>>>>> We can argue about it for -next, but I'd rather silence lockdep
>>>>>> with this patch today.
>>>>>>
>>>>> We can add down_read_trylock_non_owner() if there is a need for it. It
>>>>> should be easy to do.
>>>> Yes, but looking through the code it's not clear to me that it's safe
>>>> to mix non_owner() versions with the regular ones.
>>>> bpf/stackmap.c does down_read_trylock + up_read.
>>>> If we add a new down_read_trylock_non_owner() that sets the owner to
>>>> NULL | RWSEM_* bits, is this safe with concurrent read/write
>>>> that use the regular versions?
>>>> rwsem_can_spin_on_owner() does:
>>>>         if (owner) {
>>>>                 ret = is_rwsem_owner_spinnable(owner) &&
>>>>                       owner_on_cpu(owner);
>>>> that looks correct.
>>>> For a second I thought there could be a fault here due to non_owner.
>>>> But there could be other places where it's assumed that owner
>>>> is never null?
>>> The content of owner is not the cause of the lockdep warning. The
>>> lockdep code assumes that the task that acquires the lock will release
>>> it some time later. That is not the case when the lock is acquired by
>>> one task and released by another. In that case, you have to use the
>>> non_owner version of down/up_read, which disables the lockdep
>>> acquire/release tracking. That will be the only difference between the
>>> two sets of APIs.
>>>
>>> Cheers,
>>> Longman
>> BTW, you may want to do something like this to make sure that the lock
>> ownership is properly tracked.
>>
>> -Longman
>>
>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>> index d43b145..79eef9d 100644
>> --- a/kernel/bpf/stackmap.c
>> +++ b/kernel/bpf/stackmap.c
>> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_
>>         } else {
>>                 work->sem = &current->mm->mmap_sem;
>>                 irq_work_queue(&work->irq_work);
>> +
>> +               /*
>> +                * The irq_work will release the mmap_sem with
>> +                * up_read_non_owner(). The rwsem_release() is called
>> +                * here to release the lock from lockdep's perspective.
>> +                */
>> +               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
> are you saying the above is enough when coupled with up_read_non_owner()?
> Looking at how amdgpu is using this api... I think they just use the
> non_owner version when doing things from a different task.
> So I don't think pairing non_owner with non_owner is strictly necessary.
> It seems fine to use down_read_trylock() with the above rwsem_release()
> hack plus up_read_non_owner().
> Am I missing something?
>
The statement above is to clear the state for lockdep so that it
knows that the task no longer owns the lock. Without doing that, there
is a possibility that some other kind of incorrect lockdep warning may
be produced, because the code will still think the task holds a
read-lock on the mmap_sem. It is also possible that no warning will be
reported.

The bpf code is kind of special in that it acquires the mmap_sem. Later
on, it either releases it itself (non-NMI) or requests irq_work to
release it (NMI). So either you use the _non_owner versions for both
acquire and release, or you fake the release like the code segment
above.
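
For the _non_owner route, bpf would also need a trylock variant that
skips the lockdep acquire. A hypothetical sketch, modeled on the
existing down_read_trylock() in kernel/locking/rwsem.c (this helper
does not exist today; the owner handling shown is my assumption):

int down_read_trylock_non_owner(struct rw_semaphore *sem)
{
	int ret = __down_read_trylock(sem);

	/*
	 * On success, mark the rwsem reader-owned with an unknown owner,
	 * the way down_read_non_owner() does, and skip the
	 * rwsem_acquire_read() annotation so lockdep never expects this
	 * task to do the release.
	 */
	if (ret == 1)
		__rwsem_set_reader_owned(sem, NULL);
	return ret;
}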

Peter, please chime in if you have any other suggestions.

Cheers,
Longman



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-01-31  2:48                 ` Waiman Long
@ 2019-02-06  3:21                   ` Eric Dumazet
  2019-02-06  3:30                     ` Alexei Starovoitov
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2019-02-06  3:21 UTC (permalink / raw)
  To: Waiman Long, Alexei Starovoitov
  Cc: Peter Zijlstra, Alexei Starovoitov, davem, daniel, edumazet,
	jannh, netdev, kernel-team



On 01/30/2019 06:48 PM, Waiman Long wrote:
> On 01/30/2019 09:01 PM, Alexei Starovoitov wrote:
>> On Wed, Jan 30, 2019 at 04:32:12PM -0500, Waiman Long wrote:
>>> On 01/30/2019 04:11 PM, Waiman Long wrote:
>>>> On 01/30/2019 03:10 PM, Alexei Starovoitov wrote:
>>>>> On Wed, Jan 30, 2019 at 02:42:23PM -0500, Waiman Long wrote:
>>>>>> On 01/30/2019 02:30 PM, Alexei Starovoitov wrote:
>>>>>>> On Wed, Jan 30, 2019 at 11:15:30AM +0100, Peter Zijlstra wrote:
>>>>>>>> On Tue, Jan 29, 2019 at 08:04:56PM -0800, Alexei Starovoitov wrote:
>>>>>>>>> Lockdep warns about false positive:
>>>>>>>> This is not a false positive, and you probably also need to use
>>>>>>>> down_read_non_owner() to match this up_read_non_owner().
>>>>>>>>
>>>>>>>> {up,down}_read() and {up,down}_read_non_owner() are not only different
>>>>>>>> in the lockdep annotation; there is also optimistic spin stuff that
>>>>>>>> relies on 'owner' tracking.
>>>>>>> Can you point out in the code the spin bit?
>>>>>>> As far as I can see, sem->owner is a debug-only feature.
>>>>>>> All owner checks are done under CONFIG_DEBUG_RWSEMS.
>>>>>> No, sem->owner is mainly for performing optimistic spinning, which is a
>>>>>> performance feature to make rwsem writer-lock perform similarly to mutex.
>>>>>> The debugging part is just an add-on. It is not the reason for the
>>>>>> presence of sem->owner.
>>>>> I see. Got it.
>>>>>
>>>>>>> Also there is no down_read_trylock_non_owner() at the moment.
>>>>>>> We can argue about it for -next, but I'd rather silence lockdep
>>>>>>> with this patch today.
>>>>>>>
>>>>>> We can add down_read_trylock_non_owner() if there is a need for it. It
>>>>>> should be easy to do.
>>>>> Yes, but looking through the code it's not clear to me that it's safe
>>>>> to mix non_owner() versions with the regular ones.
>>>>> bpf/stackmap.c does down_read_trylock + up_read.
>>>>> If we add a new down_read_trylock_non_owner() that sets the owner to
>>>>> NULL | RWSEM_* bits, is this safe with concurrent read/write
>>>>> that use the regular versions?
>>>>> rwsem_can_spin_on_owner() does:
>>>>>         if (owner) {
>>>>>                 ret = is_rwsem_owner_spinnable(owner) &&
>>>>>                       owner_on_cpu(owner);
>>>>> that looks correct.
>>>>> For a second I thought there could be a fault here due to non_owner.
>>>>> But there could be other places where it's assumed that owner
>>>>> is never null?
>>>> The content of owner is not the cause of the lockdep warning. The
>>>> lockdep code assumes that the task that acquires the lock will release
>>>> it some time later. That is not the case when the lock is acquired by
>>>> one task and released by another. In that case, you have to use the
>>>> non_owner version of down/up_read, which disables the lockdep
>>>> acquire/release tracking. That will be the only difference between the
>>>> two sets of APIs.
>>>>
>>>> Cheers,
>>>> Longman
>>> BTW, you may want to do something like this to make sure that the lock
>>> ownership is properly tracked.
>>>
>>> -Longman
>>>
>>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
>>> index d43b145..79eef9d 100644
>>> --- a/kernel/bpf/stackmap.c
>>> +++ b/kernel/bpf/stackmap.c
>>> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_
>>>         } else {
>>>                 work->sem = &current->mm->mmap_sem;
>>>                 irq_work_queue(&work->irq_work);
>>> +
>>> +               /*
>>> +                * The irq_work will release the mmap_sem with
>>> +                * up_read_non_owner(). The rwsem_release() is called
>>> +                * here to release the lock from lockdep's perspective.
>>> +                */
>>> +               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
>> are you saying the above is enough when coupled with up_read_non_owner()?
>> Looking at how amdgpu is using this api... I think they just use the
>> non_owner version when doing things from a different task.
>> So I don't think pairing non_owner with non_owner is strictly necessary.
>> It seems fine to use down_read_trylock() with the above rwsem_release()
>> hack plus up_read_non_owner().
>> Am I missing something?
>>
> The statement above is to clear the state for lockdep so that it
> knows that the task no longer owns the lock. Without doing that, there
> is a possibility that some other kind of incorrect lockdep warning may
> be produced, because the code will still think the task holds a
> read-lock on the mmap_sem. It is also possible that no warning will be
> reported.
> 
> The bpf code is kind of special in that it acquires the mmap_sem. Later
> on, it either releases it itself (non-NMI) or requests irq_work to
> release it (NMI). So either you use the _non_owner versions for both
> acquire and release, or you fake the release like the code segment
> above.
> 
> Peter, please chime in if you have any other suggestions.
> 

Hi guys

What are the plans to address the issue then?

Latest net tree exhibits this trace:

Thanks

[  546.116982] =====================================
[  546.121688] WARNING: bad unlock balance detected!
[  546.126393] 5.0.0-dbg-DEV #559 Not tainted
[  546.130491] -------------------------------------
[  546.135196] taskset/15409 is trying to release lock (&mm->mmap_sem) at:
[  546.141844] [<ffffffffb0233246>] do_up_read+0x16/0x30
[  546.146919] but there are no more locks to release!
[  546.151790] 
               other info that might help us debug this:
[  546.158325] 1 lock held by taskset/15409:
[  546.162325]  #0: 00000000683db857 (&sig->cred_guard_mutex){+.+.}, at: __do_execve_file.isra.38+0x13e/0xbc0
[  546.171978] 
               stack backtrace:
[  546.176327] CPU: 0 PID: 15409 Comm: taskset Not tainted 5.0.0-dbg-DEV #559
[  546.183214] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.54.0 06/07/2018
[  546.190182] Call Trace:
[  546.192633]  <IRQ>
[  546.194672]  dump_stack+0x67/0x95
[  546.198006]  ? do_up_read+0x16/0x30
[  546.201491]  print_unlock_imbalance_bug.part.33+0xd0/0xd7
[  546.206914]  ? do_up_read+0x16/0x30
[  546.210406]  lock_release+0x213/0x2d0
[  546.214088]  up_read+0x20/0xa0
[  546.217138]  do_up_read+0x16/0x30
[  546.220457]  irq_work_run_list+0x4a/0x70
[  546.224408]  irq_work_run+0x18/0x40
[  546.227911]  smp_irq_work_interrupt+0x54/0x1d0
[  546.232362]  irq_work_interrupt+0xf/0x20
[  546.236280]  </IRQ>
[  546.238396] RIP: 0010:release_pages+0x69/0x650
[  546.242839] Code: 01 4c 8b 2f 4c 8d 67 08 48 8d 44 f7 08 45 31 f6 48 89 04 24 eb 04 49 83 c4 08 48 8b 05 20 fe 31 01 49 39 c5 74 26 4d 8b 7d 08 <4d> 8d 47 ff 41 83 e7 01 4d 0f 44 c5 41 8b 40 34 4d 89 c7 85 c0 0f
[  546.261615] RSP: 0018:ffffac604732bb00 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff09
[  546.269190] RAX: ffffe4c9c0ca8000 RBX: 0000000000000020 RCX: 0000000000000006
[  546.276352] RDX: 0000000000000006 RSI: ffff8cf7facb6aa8 RDI: ffff8cf7facb6240
[  546.283482] RBP: ffffac604732bb70 R08: ffffe4c9c0ba3d80 R09: 0000000000000000
[  546.290613] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cf7db3292d8
[  546.297752] R13: ffffe4c9c0ba3dc0 R14: 0000000000000000 R15: ffffe4c9c0ba3d88
[  546.304907]  free_pages_and_swap_cache+0xe6/0x100
[  546.309618]  tlb_flush_mmu_free+0x36/0x60
[  546.313625]  arch_tlb_finish_mmu+0x28/0x1e0
[  546.317801]  tlb_finish_mmu+0x23/0x30
[  546.321459]  exit_mmap+0xc9/0x1f0
[  546.324787]  ? __mutex_unlock_slowpath+0x11/0x2e0
[  546.329502]  mmput+0x62/0x130
[  546.332481]  flush_old_exec+0x60e/0x810
[  546.336314]  load_elf_binary+0x830/0x18c0
[  546.340323]  ? search_binary_handler+0x8a/0x200
[  546.344848]  search_binary_handler+0x99/0x200
[  546.349221]  __do_execve_file.isra.38+0x76d/0xbc0
[  546.353918]  __x64_sys_execve+0x39/0x50
[  546.357757]  do_syscall_64+0x5a/0x460
[  546.361440]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  546.366489] RIP: 0033:0x7fd599e3c067
[  546.370069] Code: Bad RIP value.
[  546.373306] RSP: 002b:00007ffcf7092e58 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
[  546.380880] RAX: ffffffffffffffda RBX: 00007ffcf7093058 RCX: 00007fd599e3c067
[  546.388010] RDX: 00007ffcf7093070 RSI: 00007ffcf7093058 RDI: 00007ffcf70937c0
[  546.395132] RBP: 00007ffcf7092ec0 R08: 00000000ffffffff R09: 0000000001b1ae40
[  546.402265] R10: 00007ffcf7092c80 R11: 0000000000000202 R12: 00007ffcf70937c0
[  546.409385] R13: 00007ffcf7093070 R14: 0000000000000000 R15: 00007ffcf7093048


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-02-06  3:21                   ` Eric Dumazet
@ 2019-02-06  3:30                     ` Alexei Starovoitov
  2019-02-06  3:40                       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Alexei Starovoitov @ 2019-02-06  3:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Waiman Long, Peter Zijlstra, Alexei Starovoitov, davem, daniel,
	edumazet, jannh, netdev, kernel-team, songliubraving

On Tue, Feb 05, 2019 at 07:21:08PM -0800, Eric Dumazet wrote:
> >>>
> >>> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> >>> index d43b145..79eef9d 100644
> >>> --- a/kernel/bpf/stackmap.c
> >>> +++ b/kernel/bpf/stackmap.c
> >>> @@ -338,6 +338,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_
> >>>         } else {
> >>>                 work->sem = &current->mm->mmap_sem;
> >>>                 irq_work_queue(&work->irq_work);
> >>> +
> >>> +               /*
> >>> +                * The irq_work will release the mmap_sem with
> >>> +                * up_read_non_owner(). The rwsem_release() is called
> >>> +                * here to release the lock from lockdep's perspective.
> >>> +                */
> >>> +               rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
> >> are you saying the above is enough when coupled with up_read_non_owner()?
> >> Looking at how amdgpu is using this api... I think they just use the
> >> non_owner version when doing things from a different task.
> >> So I don't think pairing non_owner with non_owner is strictly necessary.
> >> It seems fine to use down_read_trylock() with the above rwsem_release()
> >> hack plus up_read_non_owner().
> >> Am I missing something?
> >>
> > The statement above is to clear the state for lockdep so that it
> > knows that the task no longer owns the lock. Without doing that, there
> > is a possibility that some other kind of incorrect lockdep warning may
> > be produced, because the code will still think the task holds a
> > read-lock on the mmap_sem. It is also possible that no warning will be
> > reported.
> > 
> > The bpf code is kind of special in that it acquires the mmap_sem. Later
> > on, it either releases it itself (non-NMI) or requests irq_work to
> > release it (NMI). So either you use the _non_owner versions for both
> > acquire and release, or you fake the release like the code segment
> > above.
> > 
> > Peter, please chime in if you have any other suggestions.
> > 
> 
> Hi guys
> 
> What are the plans to address the issue then ?
> Latest net tree exhibits this trace :

Thanks for the reminder :)

I've been waiting for Peter's direction on this one.
Happy to fix it whichever way.

To recap:
Approach 1:
s/up_read/up_read_non_owner/ in the irq_work handler + rwsem_release() as
Longman proposed.

Approach 2:
introduce down_read_trylock_non_owner() and pair it with up_read_non_owner()
in both the irq_work and the normal paths.

My preference is 1. I think Longman's as well.
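
Roughly, approach 1 would look like this (untested sketch combining
Longman's two pieces against kernel/bpf/stackmap.c):

--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ static void do_up_read(struct irq_work *entry)
 	work = container_of(entry, struct stack_map_irq_work, irq_work);
-	up_read(work->sem);
+	/* the lock was taken in task context, released here in irq_work */
+	up_read_non_owner(work->sem);
@@ static void stack_map_get_build_id_offset(...)
 	} else {
 		work->sem = &current->mm->mmap_sem;
 		irq_work_queue(&work->irq_work);
+		/*
+		 * Tell lockdep the lock is released here; the real
+		 * release is done later by do_up_read().
+		 */
+		rwsem_release(&current->mm->mmap_sem.dep_map, 1, _RET_IP_);
 	}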


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap
  2019-02-06  3:30                     ` Alexei Starovoitov
@ 2019-02-06  3:40                       ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2019-02-06  3:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Waiman Long, Peter Zijlstra, Alexei Starovoitov, davem, daniel,
	edumazet, jannh, netdev, kernel-team, songliubraving



On 02/05/2019 07:30 PM, Alexei Starovoitov wrote:

> Thanks for the reminder :)
> 
> I've been waiting for Peter's direction on this one.
> Happy to fix it whichever way.
> 
> To recap:
> Approach 1:
> s/up_read/up_read_non_owner/ in the irq_work handler + rwsem_release() as
> Longman proposed.
> 
> Approach 2:
> introduce down_read_trylock_non_owner() and pair it with up_read_non_owner()
> in both the irq_work and the normal paths.
> 
> My preference is 1. I think Longman's as well.
> 


Unless we envision other uses for down_read_trylock_non_owner() and up_read_non_owner(),
I agree 1) seems good enough, at least for the short term.

Thanks !


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-02-06  3:40 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-30  4:04 [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
2019-01-30  4:04 ` [PATCH bpf-next 1/4] bpf: fix lockdep false positive in percpu_freelist Alexei Starovoitov
2019-01-30 10:21   ` Peter Zijlstra
2019-01-30 19:27     ` Alexei Starovoitov
2019-01-30 19:53       ` Peter Zijlstra
2019-01-30 20:18         ` Alexei Starovoitov
2019-01-30  4:04 ` [PATCH bpf-next 2/4] bpf: fix lockdep false positive in stackmap Alexei Starovoitov
2019-01-30 10:15   ` Peter Zijlstra
2019-01-30 19:30     ` Alexei Starovoitov
2019-01-30 19:42       ` Waiman Long
2019-01-30 20:10         ` Alexei Starovoitov
2019-01-30 21:11           ` Waiman Long
2019-01-30 21:32             ` Waiman Long
2019-01-31  2:01               ` Alexei Starovoitov
2019-01-31  2:48                 ` Waiman Long
2019-02-06  3:21                   ` Eric Dumazet
2019-02-06  3:30                     ` Alexei Starovoitov
2019-02-06  3:40                       ` Eric Dumazet
2019-01-30 19:44       ` Peter Zijlstra
2019-01-30 20:05         ` Waiman Long
2019-01-30  4:04 ` [PATCH bpf-next 3/4] bpf: fix lockdep false positive in bpf_prog_register Alexei Starovoitov
2019-01-30 10:10   ` Peter Zijlstra
2019-01-30 10:37     ` Peter Zijlstra
2019-01-30 19:32     ` Alexei Starovoitov
2019-01-30 19:46       ` Peter Zijlstra
2019-01-30  4:04 ` [PATCH bpf-next 4/4] bpf: Fix syscall's stackmap lookup potential deadlock Alexei Starovoitov
2019-01-30  4:07 ` [PATCH bpf-next 0/4] bpf: fixes for lockdep and deadlock Alexei Starovoitov
