bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Roman Gushchin <guro@fb.com>, Jiri Olsa <jolsa@redhat.com>
Cc: "Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"YiFei Zhu" <zhuyifei@google.com>,
	"Andrii Nakryiko" <andriin@fb.com>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Yauheni Kaliuta" <ykaliuta@redhat.com>,
	"Jiri Benc" <jbenc@redhat.com>, "Hangbin Liu" <haliu@redhat.com>,
	"Stanislav Fomichev" <sdf@google.com>
Subject: Re: [BUG] hitting bug when running spinlock test
Date: Fri, 5 Mar 2021 13:10:33 -0800	[thread overview]
Message-ID: <14570a91-a793-3f56-047f-5c203cc44345@fb.com> (raw)
In-Reply-To: <YEKWyLG20OgpBMnt@carbon.DHCP.thefacebook.com>



On 3/5/21 12:38 PM, Roman Gushchin wrote:
> On Thu, Mar 04, 2021 at 08:03:33PM +0100, Jiri Olsa wrote:
>> hi,
>> I'm getting attached BUG/crash when running in parralel selftests, like:
>>
>>    while :; do ./test_progs -t spinlock; done
>>    while :; do ./test_progs ; done
>>
>> it's the latest bpf-next/master, I can send the .config if needed,
>> but I don't think there's anything special about it, because I saw
>> the bug on other servers with different generic configs
>>
>> it looks like it's related to cgroup local storage, for some reason
>> the storage deref returns NULL
>>
>> I'm bit lost in this code, so any help would be great ;-)
> 
> Hi!
> 
> I think the patch to blame is df1a2cb7c74b ("bpf/test_run: fix unkillable BPF_PROG_TEST_RUN").

Thanks, Roman, I did some experiments and found the reason of NULL 
storage deref is because a tracing program (mostly like a kprobe) is run 
after bpf_cgroup_storage_set() is called but before bpf program calls 
bpf_get_local_storage(). Note that trace_call_bpf() macro
BPF_PROG_RUN_ARRAY_CHECK does call bpf_cgroup_storage_set().

> Prior to it, we were running the test program in the preempt_disable() && rcu_read_lock()
> section:
> 
> preempt_disable();
> rcu_read_lock();
> bpf_cgroup_storage_set(storage);
> ret = BPF_PROG_RUN(prog, ctx);
> rcu_read_unlock();
> preempt_enable();
> 
> So, a percpu variable with a cgroup local storage pointer couldn't go away.

I think even with using preempt_disable(), if the bpf program calls map 
lookup and there is a kprobe bpf on function htab_map_lookup_elem(), we
will have the issue as BPF_PROG_RUN_ARRAY_CHECK will call 
bpf_cgroup_storage_set() too. I need to write a test case to confirm 
this though.

> 
> After df1a2cb7c74b we can temporarily enable the preemption, so nothing prevents
> another program to call into bpf_cgroup_storage_set() on the same cpu.
> I guess it's exactly what happens here.

It is. I confirmed.

> 
> One option to fix it is to make bpf_cgroup_storage_set() to return the old value,
> save it on a local variable and restore after the execution of the program.

In this particular case, we are doing bpf_test_run, we explicitly 
allocate storage and call bpf_cgroup_storage_set() right before
each BPF_PROG_RUN.

> But I didn't follow closely the development of sleepable bpf programs, so I could
> easily miss something.

Yes, sleepable bpf program is another complication. I think we need a 
variable similar to bpf_prog_active, which should not nested bpf program
execution for those bpf programs having local_storage map.
Will try to craft some patch to facilitate the discussion.

> 
> Thanks!
> 
> Roman
> 
>>
>> thanks,
>> jirka
>>
>>
>> ---
>> ...
>> [  382.324440] bpf_testmod: loading out-of-tree module taints kernel.
>> [  382.330670] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
>> [  480.391667] perf: interrupt took too long (2540 > 2500), lowering kernel.perf_event_max_sample_rate to 78000
>> [  480.401730] perf: interrupt took too long (6860 > 6751), lowering kernel.perf_event_max_sample_rate to 29000
>> [  480.416172] perf: interrupt took too long (8602 > 8575), lowering kernel.perf_event_max_sample_rate to 23000
>> [  480.433053] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> [  480.440014] #PF: supervisor read access in kernel mode
>> [  480.445153] #PF: error_code(0x0000) - not-present page
>> [  480.450294] PGD 8000000133a18067 P4D 8000000133a18067 PUD 10c019067 PMD 0
>> [  480.457164] Oops: 0000 [#1] PREEMPT SMP PTI
>> [  480.461350] CPU: 6 PID: 16689 Comm: test_progs Tainted: G          IOE     5.11.0+ #11
>> [  480.469263] Hardware name: Dell Inc. PowerEdge R440/08CYF7, BIOS 1.7.0 12/14/2018
>> [  480.476742] RIP: 0010:bpf_get_local_storage+0x13/0x50
>> [  480.481797] Code: e8 92 c5 8e 00 5d 89 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 83 7f 18 15 74 10 65 48 8b 05 6d c6 e2 7e <48> 8b 00 48 83 c0 10 c3 55 48 89 e5 53 65 48 8b 05 60 c6 e2 7e8
>> [  480.500540] RSP: 0018:ffffc90001bd3ce0 EFLAGS: 00010293
>> [  480.505766] RAX: 0000000000000000 RBX: 982a259500000000 RCX: 0000000000000018
>> [  480.512901] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888149ccf000
>> [  480.520034] RBP: ffffc90001bd3d20 R08: ffffc90001bd3d04 R09: ffff888105121600
>> [  480.527164] R10: d3b9342000000000 R11: 000000000000025c R12: 0000000000000734
>> [  480.534299] R13: ffff888149ccc710 R14: 0000000000000000 R15: ffffc90000379048
>> [  480.541430] FS:  00007f8f2357b640(0000) GS:ffff8897e0980000(0000) knlGS:0000000000000000
>> [  480.549515] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  480.555262] CR2: 0000000000000000 CR3: 000000014e826006 CR4: 00000000007706e0
>> [  480.562395] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  480.569527] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  480.576660] PKRU: 55555554
>> [  480.579372] Call Trace:
>> [  480.581829]  ? bpf_prog_c48154a736e5c014_bpf_sping_lock_test+0x2ba/0x860
>> [  480.588526]  bpf_test_run+0x127/0x2b0
>> [  480.592192]  ? __build_skb_around+0xb0/0xc0
>> [  480.596378]  bpf_prog_test_run_skb+0x32f/0x6b0
>> [  480.600824]  __do_sys_bpf+0xa94/0x2240
>> [  480.604577]  ? debug_smp_processor_id+0x17/0x20
>> [  480.609107]  ? __perf_event_task_sched_in+0x32/0x340
>> [  480.614077]  __x64_sys_bpf+0x1a/0x20
>> [  480.617653]  do_syscall_64+0x38/0x50
>> [  480.621233]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [  480.626286] RIP: 0033:0x7f8f2467f55d
>> [  480.629865] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d eb 78 0c 00 f7 d8 64 89 018
>> [  480.648611] RSP: 002b:00007f8f2357ad58 EFLAGS: 00000206 ORIG_RAX: 0000000000000141
>> [  480.656175] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8f2467f55d
>> [  480.663308] RDX: 0000000000000078 RSI: 00007f8f2357ad60 RDI: 000000000000000a
>> [  480.670442] RBP: 00007f8f2357ae28 R08: 0000000000000000 R09: 0000000000000008
>> [  480.677574] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f8f2357ae2c
>> [  480.684707] R13: 00000000022df420 R14: 0000000000000000 R15: 00007f8f2357b640
>> [  480.691842] Modules linked in: bpf_testmod(OE) intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass rapl intel_cstate dell_smbios intel_uncore mei_]
>> [  480.739134] CR2: 0000000000000000
>> [  480.742452] ---[ end trace 807177cbb5e3b3da ]---
>> [  480.752174] RIP: 0010:bpf_get_local_storage+0x13/0x50
>> [  480.757230] Code: e8 92 c5 8e 00 5d 89 c0 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 83 7f 18 15 74 10 65 48 8b 05 6d c6 e2 7e <48> 8b 00 48 83 c0 10 c3 55 48 89 e5 53 65 48 8b 05 60 c6 e2 7e8
>> [  480.775976] RSP: 0018:ffffc90001bd3ce0 EFLAGS: 00010293
>> [  480.781202] RAX: 0000000000000000 RBX: 982a259500000000 RCX: 0000000000000018
>> [  480.788335] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff888149ccf000
>> [  480.795466] RBP: ffffc90001bd3d20 R08: ffffc90001bd3d04 R09: ffff888105121600
>> [  480.802598] R10: d3b9342000000000 R11: 000000000000025c R12: 0000000000000734
>> [  480.809730] R13: ffff888149ccc710 R14: 0000000000000000 R15: ffffc90000379048
>> [  480.816865] FS:  00007f8f2357b640(0000) GS:ffff8897e0980000(0000) knlGS:0000000000000000
>> [  480.824951] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  480.830695] CR2: 0000000000000000 CR3: 000000014e826006 CR4: 00000000007706e0
>> [  480.837829] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  480.844961] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  480.852093] PKRU: 55555554
>>

  reply	other threads:[~2021-03-05 21:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-04 19:03 [BUG] hitting bug when running spinlock test Jiri Olsa
2021-03-05  4:08 ` Yonghong Song
2021-03-05 20:38 ` Roman Gushchin
2021-03-05 21:10   ` Yonghong Song [this message]
2021-03-05 21:17     ` Stanislav Fomichev
2021-03-09  5:44     ` Yonghong Song
2021-03-09 18:21       ` Roman Gushchin
2021-03-09 18:36         ` Yonghong Song
2021-03-09 19:25           ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=14570a91-a793-3f56-047f-5c203cc44345@fb.com \
    --to=yhs@fb.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=guro@fb.com \
    --cc=haliu@redhat.com \
    --cc=jbenc@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@redhat.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=ykaliuta@redhat.com \
    --cc=zhuyifei@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).