From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Hillf Danton <hdanton@sina.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Dmitry Vyukov <dvyukov@google.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Tejun Heo <tj@kernel.org>
Subject: Re: BUG: MAX_LOCKDEP_KEYS too low!
Date: Sat, 23 Jan 2021 17:35:03 +1100
Message-ID: <8b1fff6b-1acc-4582-0abe-86f948ac4b4a@ozlabs.ru>
In-Reply-To: <20210123060145.18356-1-hdanton@sina.com>
On 23/01/2021 17:01, Hillf Danton wrote:
> On Sat, 23 Jan 2021 09:53:42 +1100 Alexey Kardashevskiy wrote:
>> On 23/01/2021 02:30, Tetsuo Handa wrote:
>>> On 2021/01/22 22:28, Tetsuo Handa wrote:
>>>> On 2021/01/22 21:10, Dmitry Vyukov wrote:
>>>>> On Fri, Jan 22, 2021 at 1:03 PM Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 22/01/2021 21:30, Tetsuo Handa wrote:
>>>>>>> On 2021/01/22 18:16, Dmitry Vyukov wrote:
>>>>>>>> The reproducer only does 2 bpf syscalls, so something is slowly leaking in bpf.
>>>>>>>> My first suspect would be one of these. Since workqueue is async, it
>>>>>>>> may cause such slow drain that happens only when tasks are spawned
>>>>>>>> fast. I don't know if there is a procfs/debugfs introspection file to
>>>>>>>> monitor workqueue lengths to verify this hypothesis.
>>>>>>>
>>>>>>> If you can reproduce locally, you can call show_workqueue_state()
>>>>>>> (part of SysRq-t) when hitting the limit.
>>>>>>>
>>>>>>> --- a/kernel/locking/lockdep.c
>>>>>>> +++ b/kernel/locking/lockdep.c
>>>>>>> @@ -1277,6 +1277,7 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
>>>>>>>
>>>>>>> print_lockdep_off("BUG: MAX_LOCKDEP_KEYS too low!");
>>>>>>> dump_stack();
>>>>>>> + show_workqueue_state();
>>>>>>> return NULL;
>>>>>>> }
>>>>>>> nr_lock_classes++;
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Here is the result:
>>>>>> https://pastebin.com/rPn0Cytu
>>>>>
>>>>> Do you mind posting this publicly?
>>>>> Yes, it seems that bpf_map_free_deferred is the problem (11138
>>>>> outstanding callbacks).
>
> We need to set up a local queue for releasing bpf maps if 10,000
> outstanding callbacks means flooding.
>
>>>>>
>>>>
>>>> Wow. Horribly stuck queue. I guess BPF wants to try WQ created by
>>>>
>>>> alloc_workqueue("bpf_free_wq", WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_UNBOUND, 0);
>>>>
>>>> rather than system_wq. You can CC Tejun Heo <tj@kernel.org> for the WQ side.
>>>>
>>>> Anyway, please post your result to ML.
>>
>> https://pastebin.com/JfrmzguK is with the patch below applied. It seems
>> to produce less output. Interestingly, just as I was about to hit "send",
>> the OOM killer kicked in and tried killing a bunch of "maxlockdep"
>> processes (my test case):
>>
>> [ 891.037315] [ 31007] 0 31007 281 5 49152 0 1000 maxlockdep
>> [ 891.037540] [ 31009] 0 31009 281 5 49152 0 1000 maxlockdep
>> [ 891.037760] [ 31012] 0 31012 281 5 49152 0 1000 maxlockdep
>> [ 891.037980] [ 31013] 0 31013 281 5 47104 0 0 maxlockdep
>> [ 891.038210] [ 31014] 0 31014 281 5 49152 0 1000 maxlockdep
>> [ 891.038429] [ 31018] 0 31018 281 5 47104 0 0 maxlockdep
>> [ 891.038652] [ 31019] 0 31019 281 5 49152 0 1000 maxlockdep
>> [ 891.038874] [ 31020] 0 31020 281 5 49152 0 1000 maxlockdep
>> [ 891.039095] [ 31021] 0 31021 281 5 49152 0 1000 maxlockdep
>> [ 891.039317] [ 31022] 0 31022 281 5 47104 0 0 maxlockdep
>>
>
> A local queue (a list head plus a spin lock) collapses the many
> map->work entities into a single work item, cutting the risk of
> flooding the WQ.
>
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -448,16 +448,40 @@ static void bpf_map_release_memcg(struct
> }
> #endif
>
> -/* called from workqueue */
> +static int worker_idle = 1;
> +static LIST_HEAD(bpf_map_work_list);
> +static DEFINE_SPINLOCK(bpf_map_work_lock);
> +
> static void bpf_map_free_deferred(struct work_struct *work)
> {
> - struct bpf_map *map = container_of(work, struct bpf_map, work);
> + struct bpf_map *map;
> +
> + worker_idle = 0;
> +again:
> + map = NULL;
> + spin_lock_irq(&bpf_map_work_lock);
> +
> + if (!list_empty(&bpf_map_work_list)) {
> + map = list_first_entry(&bpf_map_work_list, struct bpf_map,
> + work_list);
> + list_del_init(&map->work_list);
> + } else
> + worker_idle = 1;
> +
> + spin_unlock_irq(&bpf_map_work_lock);
> +
> + if (!map)
> + return;
>
> security_bpf_map_free(map);
> bpf_map_release_memcg(map);
> /* implementation dependent freeing */
> map->ops->map_free(map);
> +
> + cond_resched();
> + goto again;
> }
> +static DECLARE_WORK(bpf_map_release_work, bpf_map_free_deferred);
>
> static void bpf_map_put_uref(struct bpf_map *map)
> {
> @@ -473,11 +497,20 @@ static void bpf_map_put_uref(struct bpf_
> static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
> {
> if (atomic64_dec_and_test(&map->refcnt)) {
> + unsigned long flags;
> + int idle;
> +
> /* bpf_map_free_id() must be called first */
> bpf_map_free_id(map, do_idr_lock);
> btf_put(map->btf);
> - INIT_WORK(&map->work, bpf_map_free_deferred);
> - schedule_work(&map->work);
> +
> + spin_lock_irqsave(&bpf_map_work_lock, flags);
> + idle = worker_idle;
> + list_add(&map->work_list, &bpf_map_work_list);
> + spin_unlock_irqrestore(&bpf_map_work_lock, flags);
> +
> + if (idle)
> + queue_work(system_unbound_wq, &bpf_map_release_work);
> }
> }
>
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -149,7 +149,7 @@ struct bpf_map {
> */
> atomic64_t refcnt ____cacheline_aligned;
> atomic64_t usercnt;
> - struct work_struct work;
> + struct list_head work_list;
> struct mutex freeze_mutex;
> u64 writecnt; /* writable mmap cnt; protected by freeze_mutex */
> };
> --
>
This behaves quite differently but still produces the message (I have
show_workqueue_state() right after the bug message):
[ 85.803991] BUG: MAX_LOCKDEP_KEYS too low!
[ 85.804338] turning off the locking correctness validator.
[ 85.804474] Showing busy workqueues and worker pools:
[ 85.804620] workqueue events_unbound: flags=0x2
[ 85.804764] pwq 16: cpus=0-7 flags=0x4 nice=0 active=1/512 refcnt=3
[ 85.804965] in-flight: 81:bpf_map_free_deferred
[ 85.805229] workqueue events_power_efficient: flags=0x80
[ 85.805357] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 85.805558] in-flight: 57:gc_worker
[ 85.805877] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 82 24
[ 85.806147] pool 16: cpus=0-7 flags=0x4 nice=0 hung=69s workers=3 idle: 7 251
^C[ 100.129747] maxlockdep (5104) used greatest stack depth: 8032 bytes left
root@le-dbg:~# grep "lock-classes" /proc/lockdep_stats
lock-classes: 8192 [max: 8192]
--
Alexey