* [BUG] BPF splat on latest kernels
@ 2019-03-08 20:30 Eric Dumazet
  2019-03-09  0:29 ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2019-03-08 20:30 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Networking, Eric Dumazet

Running test_progs on a LOCKDEP-enabled kernel (latest David Miller net tree),
I got the following splat.

It is not immediately obvious to me. Any ideas?

[ 4169.908826] ======================================================
[ 4169.914996] WARNING: possible circular locking dependency detected
[ 4169.921172] 5.0.0-dbg-DEV #597 Not tainted
[ 4169.925294] ------------------------------------------------------
[ 4169.931482] swapper/29/0 is trying to acquire lock:
[ 4169.936379] 00000000541e3f72 (&htab->buckets[i].lock){....}, at: htab_map_update_elem+0x1b6/0x3a0
[ 4169.945262] 
               but task is already holding lock:
[ 4169.951084] 00000000e213eabd (&rq->lock){-.-.}, at: __schedule+0xd6/0xb80
[ 4169.957904] 
               which lock already depends on the new lock.

[ 4169.966101] 
               the existing dependency chain (in reverse order) is:
[ 4169.973605] 
               -> #4 (&rq->lock){-.-.}:
[ 4169.978651]        _raw_spin_lock+0x2f/0x40
[ 4169.982866]        task_fork_fair+0x37/0x160
[ 4169.987150]        sched_fork+0x206/0x310
[ 4169.991167]        copy_process.part.41+0x7eb/0x2370
[ 4169.996128]        _do_fork+0xda/0x6a0
[ 4169.999872]        kernel_thread+0x29/0x30
[ 4170.003962]        rest_init+0x22/0x260
[ 4170.007820]        arch_call_rest_init+0xe/0x10
[ 4170.012382]        start_kernel+0x4d8/0x4f9
[ 4170.016557]        x86_64_start_reservations+0x24/0x26
[ 4170.021697]        x86_64_start_kernel+0x6f/0x72
[ 4170.026312]        secondary_startup_64+0xa4/0xb0
[ 4170.031039] 
               -> #3 (&p->pi_lock){-.-.}:
[ 4170.036265]        _raw_spin_lock_irqsave+0x3a/0x50
[ 4170.041172]        try_to_wake_up+0x41/0x610
[ 4170.045466]        wake_up_process+0x15/0x20
[ 4170.049745]        create_worker+0x16b/0x1e0
[ 4170.054039]        workqueue_init+0x1ff/0x31d
[ 4170.058397]        kernel_init_freeable+0x116/0x2a5
[ 4170.063291]        kernel_init+0xf/0x180
[ 4170.067212]        ret_from_fork+0x24/0x30
[ 4170.071307] 
               -> #2 (&(&pool->lock)->rlock){-.-.}:
[ 4170.077425]        _raw_spin_lock+0x2f/0x40
[ 4170.081602]        __queue_work+0xb2/0x540
[ 4170.085709]        queue_work_on+0x38/0x80
[ 4170.089827]        free_percpu+0x231/0x270
[ 4170.093933]        array_map_free+0x47/0x80
[ 4170.098131]        bpf_map_free_deferred+0x3c/0x50
[ 4170.102926]        process_one_work+0x1f4/0x590
[ 4170.107466]        worker_thread+0x6f/0x430
[ 4170.111654]        kthread+0x132/0x170
[ 4170.115431]        ret_from_fork+0x24/0x30
[ 4170.119520] 
               -> #1 (pcpu_lock){..-.}:
[ 4170.124562]        _raw_spin_lock_irqsave+0x3a/0x50
[ 4170.129448]        pcpu_alloc+0xfa/0x790
[ 4170.133383]        __alloc_percpu_gfp+0x12/0x20
[ 4170.137905]        alloc_htab_elem+0x182/0x2b0
[ 4170.142351]        __htab_percpu_map_update_elem+0x270/0x2c0
[ 4170.148036]        bpf_percpu_hash_update+0x7c/0x130
[ 4170.153035]        __do_sys_bpf+0x194a/0x1c10
[ 4170.157385]        __x64_sys_bpf+0x1a/0x20
[ 4170.161485]        do_syscall_64+0x5a/0x460
[ 4170.165674]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 4170.171255] 
               -> #0 (&htab->buckets[i].lock){....}:
[ 4170.177433]        lock_acquire+0xa7/0x190
[ 4170.181549]        _raw_spin_lock_irqsave+0x3a/0x50
[ 4170.186448]        htab_map_update_elem+0x1b6/0x3a0
[ 4170.191347] 
               other info that might help us debug this:

[ 4170.199345] Chain exists of:
                 &htab->buckets[i].lock --> &p->pi_lock --> &rq->lock

[ 4170.209867]  Possible unsafe locking scenario:

[ 4170.215776]        CPU0                    CPU1
[ 4170.220301]        ----                    ----
[ 4170.224823]   lock(&rq->lock);
[ 4170.227882]                                lock(&p->pi_lock);
[ 4170.233646]                                lock(&rq->lock);
[ 4170.239208]   lock(&htab->buckets[i].lock);
[ 4170.243384] 
                *** DEADLOCK ***

[ 4170.249297] 2 locks held by swapper/29/0:
[ 4170.253307]  #0: 00000000e213eabd (&rq->lock){-.-.}, at: __schedule+0xd6/0xb80
[ 4170.260558]  #1: 000000006891c367 (rcu_read_lock){....}, at: trace_call_bpf+0x40/0x260
[ 4170.268476] 
               stack backtrace:
[ 4170.272838] CPU: 29 PID: 0 Comm: swapper/29 Not tainted 5.0.0-dbg-DEV #597
[ 4170.279708] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.54.0 06/07/2018
[ 4170.286658] Call Trace:
[ 4170.289108]  dump_stack+0x67/0x95
[ 4170.292424]  print_circular_bug.isra.36+0x1fd/0x20b
[ 4170.297302]  __lock_acquire+0x15b3/0x1850
[ 4170.301308]  ? __pcpu_freelist_pop+0x87/0xb0
[ 4170.305579]  lock_acquire+0xa7/0x190
[ 4170.309157]  ? lock_acquire+0xa7/0x190
[ 4170.312900]  ? htab_map_update_elem+0x1b6/0x3a0
[ 4170.317434]  _raw_spin_lock_irqsave+0x3a/0x50
[ 4170.321791]  ? htab_map_update_elem+0x1b6/0x3a0
[ 4170.326324]  htab_map_update_elem+0x1b6/0x3a0
[ 4170.330684]  ? trace_call_bpf+0x100/0x260
[ 4170.334695]  ? perf_trace_run_bpf_submit+0x42/0xb0
[ 4170.339487]  ? perf_trace_sched_switch+0x129/0x1a0
[ 4170.344279]  ? __schedule+0x44d/0xb80
[ 4170.347943]  ? lockdep_hardirqs_on+0x14d/0x1d0
[ 4170.352388]  ? trace_hardirqs_on+0x3f/0xf0
[ 4170.356487]  ? schedule_idle+0x2c/0x40
[ 4170.360237]  ? do_idle+0x184/0x280
[ 4170.363634]  ? cpu_startup_entry+0x1d/0x20
[ 4170.367735]  ? start_secondary+0x184/0x1d0
[ 4170.371830]  ? secondary_startup_64+0xa4/0xb0
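
The new dependency (#0) comes from a tracing program attached to a scheduler
tracepoint that updates a hash map, so &htab->buckets[i].lock is taken while
__schedule() already holds &rq->lock. A minimal sketch of that pattern, assuming
2019-era selftests conventions (bpf_helpers.h, struct bpf_map_def); this is not
the actual test_progs program and the names below are made up:

/* Sketch only: sched_switch tracepoint program updating a hash map. */
#include <linux/bpf.h>
#include "bpf_helpers.h"	/* SEC(), struct bpf_map_def, helper prototypes */

struct bpf_map_def SEC("maps") counts = {
	.type		= BPF_MAP_TYPE_HASH,
	.key_size	= sizeof(__u32),
	.value_size	= sizeof(__u64),
	.max_entries	= 1024,
};

SEC("tracepoint/sched/sched_switch")
int on_sched_switch(void *ctx)
{
	__u32 key = 0;
	__u64 one = 1, *val;

	val = bpf_map_lookup_elem(&counts, &key);
	if (val)
		(*val)++;
	else
		/* this update path takes the per-bucket spinlock seen above */
		bpf_map_update_elem(&counts, &key, &one, BPF_ANY);
	return 0;
}

char _license[] SEC("license") = "GPL";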



* Re: [BUG] BPF splat on latest kernels
  2019-03-08 20:30 [BUG] BPF splat on latest kernels Eric Dumazet
@ 2019-03-09  0:29 ` Alexei Starovoitov
  2019-03-09  1:59   ` Eric Dumazet
  2019-03-21  4:49   ` Eric Dumazet
  0 siblings, 2 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2019-03-09  0:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Alexei Starovoitov, Networking, Eric Dumazet

On Fri, Mar 8, 2019 at 12:33 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Running test_progs on a LOCKDEP-enabled kernel (latest David Miller net tree),
> I got the following splat.
>
> It is not immediately obvious to me. Any ideas?

I think I saw something similar running test_maps,
but I had other changes in the tree. And later I was
never able to reproduce it, so figured it must have
been my local change. Looks like it was for real.
Do you have a reproducer?
I'll take a look on Monday.


* Re: [BUG] BPF splat on latest kernels
  2019-03-09  0:29 ` Alexei Starovoitov
@ 2019-03-09  1:59   ` Eric Dumazet
  2019-03-21  4:49   ` Eric Dumazet
  1 sibling, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2019-03-09  1:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, Networking, Eric Dumazet



On 03/08/2019 04:29 PM, Alexei Starovoitov wrote:
> On Fri, Mar 8, 2019 at 12:33 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> Running test_progs on a LOCKDEP-enabled kernel (latest David Miller net tree),
>> I got the following splat.
>>
>> It is not immediately obvious to me. Any ideas?
> 
> I think I saw something similar running test_maps,
> but I had other changes in the tree. And later I was
> never able to reproduce it, so figured it must have
> been my local change. Looks like it was for real.
> Do you have a reproducer?

You might need to tweak test_maps a bit [1] to stress the percpu allocator.

Then run test_maps, followed by test_progs (from tools/testing/selftests/bpf/); another splat is in [2].



[1]

diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index 3c627771f9654b676a6eca5a143ffcaf0d65d8bc..44c331bf114530057f0d7b2e67effe1649be132a 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -164,7 +164,7 @@ static void test_hashmap_percpu(unsigned int task, void *data)
        int fd, i;
 
        fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
-                           sizeof(bpf_percpu(value, 0)), 2, map_flags);
+                           sizeof(bpf_percpu(value, 0)), 10000, map_flags);
        if (fd < 0) {
                printf("Failed to create hashmap '%s'!\n", strerror(errno));
                exit(1);
@@ -213,8 +213,8 @@ static void test_hashmap_percpu(unsigned int task, void *data)
         * inserted due to max_entries limit.
         */
        key = 0;
-       assert(bpf_map_update_elem(fd, &key, value, BPF_NOEXIST) == -1 &&
-              errno == E2BIG);
+//     assert(bpf_map_update_elem(fd, &key, value, BPF_NOEXIST) == -1 &&
+//            errno == E2BIG);
 
        /* Check that key = 0 doesn't exist. */
        assert(bpf_map_delete_elem(fd, &key) == -1 && errno == ENOENT);
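
The point of the tweak is to stress the update path so that per-cpu value
allocations (pcpu_alloc() and hence pcpu_lock) happen often. A rough userspace
sketch of that kind of stress, using the same libbpf helpers as test_maps.c but
not the exact selftest code (function and parameter names are made up):

/* Sketch only: hammer a percpu hash map so updates hit pcpu_alloc()/pcpu_lock. */
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <bpf/bpf.h>	/* bpf_create_map(), bpf_map_update_elem() */

/* nr_cpus: number of possible CPUs, as test_maps.c gets via bpf_num_possible_cpus() */
static void stress_percpu_hash(int nr_cpus)
{
	long values[nr_cpus];	/* one 8-byte slot per possible CPU */
	int fd, key;

	/* BPF_F_NO_PREALLOC: each new key allocates its per-cpu value at update time */
	fd = bpf_create_map(BPF_MAP_TYPE_PERCPU_HASH, sizeof(key),
			    sizeof(values[0]), 10000, BPF_F_NO_PREALLOC);
	assert(fd >= 0);

	memset(values, 0, sizeof(values));
	/* each insert goes alloc_htab_elem() -> __alloc_percpu_gfp() -> pcpu_lock */
	for (key = 1; key < 10000; key++)
		assert(bpf_map_update_elem(fd, &key, values, BPF_NOEXIST) == 0);
	close(fd);
}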


[2]
[  215.615945] ======================================================
[  215.622140] WARNING: possible circular locking dependency detected
[  215.628366] 5.0.0-dbg-DEV #597 Not tainted
[  215.632487] ------------------------------------------------------
[  215.638702] kworker/43:158/25017 is trying to acquire lock:
[  215.644306] 0000000098ba7f09 (&(&pool->lock)->rlock){-.-.}, at: __queue_work+0xb2/0x540
[  215.652353] 
               but task is already holding lock:
[  215.658243] 00000000ccf019e1 (pcpu_lock){..-.}, at: free_percpu+0x44/0x270
[  215.665169] 
               which lock already depends on the new lock.

[  215.673366] 
               the existing dependency chain (in reverse order) is:
[  215.680873] 
               -> #4 (pcpu_lock){..-.}:
[  215.685921]        _raw_spin_lock_irqsave+0x3a/0x50
[  215.690823]        pcpu_alloc+0xfa/0x790
[  215.694739]        __alloc_percpu_gfp+0x12/0x20
[  215.699299]        alloc_htab_elem+0x182/0x2b0
[  215.703779]        __htab_percpu_map_update_elem+0x270/0x2c0
[  215.709472]        bpf_percpu_hash_update+0x7c/0x130
[  215.714442]        __do_sys_bpf+0x194a/0x1c10
[  215.718831]        __x64_sys_bpf+0x1a/0x20
[  215.722946]        do_syscall_64+0x5a/0x460
[  215.727135]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  215.732742] 
               -> #3 (&htab->buckets[i].lock){....}:
[  215.738942]        _raw_spin_lock_irqsave+0x3a/0x50
[  215.743838]        htab_map_update_elem+0x1b6/0x3a0
[  215.748731] 
               -> #2 (&rq->lock){-.-.}:
[  215.753819]        _raw_spin_lock+0x2f/0x40
[  215.758015]        task_fork_fair+0x37/0x160
[  215.762314]        sched_fork+0x206/0x310
[  215.766353]        copy_process.part.41+0x7eb/0x2370
[  215.771350]        _do_fork+0xda/0x6a0
[  215.775120]        kernel_thread+0x29/0x30
[  215.779240]        rest_init+0x22/0x260
[  215.783086]        arch_call_rest_init+0xe/0x10
[  215.787649]        start_kernel+0x4d8/0x4f9
[  215.791887]        x86_64_start_reservations+0x24/0x26
[  215.797044]        x86_64_start_kernel+0x6f/0x72
[  215.801697]        secondary_startup_64+0xa4/0xb0
[  215.806436] 
               -> #1 (&p->pi_lock){-.-.}:
[  215.811664]        _raw_spin_lock_irqsave+0x3a/0x50
[  215.816559]        try_to_wake_up+0x41/0x610
[  215.820830]        wake_up_process+0x15/0x20
[  215.825095]        create_worker+0x16b/0x1e0
[  215.829363]        workqueue_init+0x1ff/0x31d
[  215.833744]        kernel_init_freeable+0x116/0x2a5
[  215.838651]        kernel_init+0xf/0x180
[  215.842602]        ret_from_fork+0x24/0x30
[  215.846699] 
               -> #0 (&(&pool->lock)->rlock){-.-.}:
[  215.852822]        lock_acquire+0xa7/0x190
[  215.856943]        _raw_spin_lock+0x2f/0x40
[  215.861124]        __queue_work+0xb2/0x540
[  215.865250]        queue_work_on+0x38/0x80
[  215.869356]        free_percpu+0x231/0x270
[  215.873482]        pcpu_freelist_destroy+0x11/0x20
[  215.878275]        htab_map_free+0x48/0x110
[  215.882505]        bpf_map_free_deferred+0x3c/0x50
[  215.887307]        process_one_work+0x1f4/0x590
[  215.891837]        worker_thread+0x6f/0x430
[  215.896041]        kthread+0x132/0x170
[  215.899827]        ret_from_fork+0x24/0x30
[  215.903941] 
               other info that might help us debug this:

[  215.911951] Chain exists of:
                 &(&pool->lock)->rlock --> &htab->buckets[i].lock --> pcpu_lock

[  215.923391]  Possible unsafe locking scenario:

[  215.929362]        CPU0                    CPU1
[  215.933895]        ----                    ----
[  215.938453]   lock(pcpu_lock);
[  215.941528]                                lock(&htab->buckets[i].lock);
[  215.948272]                                lock(pcpu_lock);
[  215.953872]   lock(&(&pool->lock)->rlock);
[  215.958000] 
                *** DEADLOCK ***

[  215.963968] 3 locks held by kworker/43:158/25017:
[  215.968706]  #0: 00000000f6b8c08b ((wq_completion)"events"){+.+.}, at: process_one_work+0x177/0x590
[  215.977788]  #1: 00000000ca9d2a82 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x177/0x590
[  215.987399]  #2: 00000000ccf019e1 (pcpu_lock){..-.}, at: free_percpu+0x44/0x270
[  215.994747] 
               stack backtrace:
[  215.999145] CPU: 43 PID: 25017 Comm: kworker/43:158 Not tainted 5.0.0-dbg-DEV #597
[  216.006732] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.54.0 06/07/2018
[  216.013738] Workqueue: events bpf_map_free_deferred
[  216.018659] Call Trace:
[  216.021140]  dump_stack+0x67/0x95
[  216.024501]  print_circular_bug.isra.36+0x1fd/0x20b
[  216.029410]  __lock_acquire+0x15b3/0x1850
[  216.033447]  ? __lock_acquire+0x3f4/0x1850
[  216.037577]  lock_acquire+0xa7/0x190
[  216.041165]  ? lock_acquire+0xa7/0x190
[  216.044959]  ? __queue_work+0xb2/0x540
[  216.048712]  _raw_spin_lock+0x2f/0x40
[  216.052404]  ? __queue_work+0xb2/0x540
[  216.056183]  __queue_work+0xb2/0x540
[  216.059789]  queue_work_on+0x38/0x80
[  216.063403]  free_percpu+0x231/0x270
[  216.067009]  pcpu_freelist_destroy+0x11/0x20
[  216.071315]  htab_map_free+0x48/0x110
[  216.075015]  bpf_map_free_deferred+0x3c/0x50
[  216.079310]  process_one_work+0x1f4/0x590
[  216.083317]  worker_thread+0x6f/0x430
[  216.087000]  ? rescuer_thread+0x3d0/0x3d0
[  216.091047]  kthread+0x132/0x170
[  216.094315]  ? kthread_stop+0x200/0x200


* Re: [BUG] BPF splat on latest kernels
  2019-03-09  0:29 ` Alexei Starovoitov
  2019-03-09  1:59   ` Eric Dumazet
@ 2019-03-21  4:49   ` Eric Dumazet
  2019-03-22  3:36     ` Alexei Starovoitov
  1 sibling, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2019-03-21  4:49 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, Networking, Eric Dumazet



On 03/08/2019 04:29 PM, Alexei Starovoitov wrote:
> On Fri, Mar 8, 2019 at 12:33 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> Running test_progs on a LOCKDEP-enabled kernel (latest David Miller net tree),
>> I got the following splat.
>>
>> It is not immediately obvious to me. Any ideas?
> 
> I think I saw something similar running test_maps,
> but I had other changes in the tree. And later I was
> never able to reproduce it, so figured it must have
> been my local change. Looks like it was for real.
> Do you have a reproducer?
> I'll take a look on Monday.

Hi Alexei

Has this problem been root-caused?

Thanks!


* Re: [BUG] BPF splat on latest kernels
  2019-03-21  4:49   ` Eric Dumazet
@ 2019-03-22  3:36     ` Alexei Starovoitov
  2019-03-22 12:52       ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2019-03-22  3:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, Alexei Starovoitov, Networking, Eric Dumazet

On Wed, Mar 20, 2019 at 09:49:34PM -0700, Eric Dumazet wrote:
> 
> 
> On 03/08/2019 04:29 PM, Alexei Starovoitov wrote:
> > On Fri, Mar 8, 2019 at 12:33 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >> Running test_progs on a LOCKDEP-enabled kernel (latest David Miller net tree),
> >> I got the following splat.
> >>
> >> It is not immediately obvious to me. Any ideas?
> > 
> > I think I saw something similar running test_maps,
> > but I had other changes in the tree. And later I was
> > never able to reproduce it, so figured it must have
> > been my local change. Looks like it was for real.
> > Do you have a reproducer?
> > I'll take a look on Monday.
> 
> Hi Alexei
> 
> Has this problem been root-caused?

This is not forgotten.
I'm prioritizing bpf code reviews at the moment since the backlog is still big
and developers are waiting.
If somebody can dig into this lockdep splat, it would be great.



* Re: [BUG] BPF splat on latest kernels
  2019-03-22  3:36     ` Alexei Starovoitov
@ 2019-03-22 12:52       ` Eric Dumazet
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Dumazet @ 2019-03-22 12:52 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Daniel Borkmann, Alexei Starovoitov, Networking

On Thu, Mar 21, 2019 at 8:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Mar 20, 2019 at 09:49:34PM -0700, Eric Dumazet wrote:
> >
> >
> > On 03/08/2019 04:29 PM, Alexei Starovoitov wrote:
> > > On Fri, Mar 8, 2019 at 12:33 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > >>
> > >> Running test_progs on a LOCKDEP-enabled kernel (latest David Miller net tree),
> > >> I got the following splat.
> > >>
> > >> It is not immediately obvious to me. Any ideas?
> > >
> > > I think I saw something similar running test_maps,
> > > but I had other changes in the tree. And later I was
> > > never able to reproduce it, so figured it must have
> > > been my local change. Looks like it was for real.
> > > Do you have a reproducer?
> > > I'll take a look on Monday.
> >
> > Hi Alexei
> >
> > Has this problem been root-caused?
>
> This is not forgotten.
> I'm prioritizing bpf code reviews at the moment since the backlog is still big
> and developers are waiting.
> If somebody can dig into this lockdep splat, it would be great.

Maybe I could resend the splat to lkml@ to get some help.

Thanks!


