All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
To: Yonghong Song <yhs@meta.com>, Hou Tao <houtao1@huawei.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>
Subject: Re: [bpf-next v3 2/2] selftests/bpf: add test case for htab map
Date: Wed, 4 Jan 2023 22:32:06 +0800	[thread overview]
Message-ID: <CAMDZJNWs3jSbZwbSe-U4ypMSqhjgJ=Z8AyYcz=wEzX-B4pJg7w@mail.gmail.com> (raw)
In-Reply-To: <ff6473d8-c640-267e-c0f7-a92ce747c888@meta.com>

On Wed, Jan 4, 2023 at 4:01 PM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 1/3/23 11:51 PM, Hou Tao wrote:
> > Hi,
> >
> > On 1/4/2023 3:09 PM, Yonghong Song wrote:
> >>
> >>
> >> On 1/2/23 6:40 PM, Tonghao Zhang wrote:
> >>>    a
> >>>
> >>> On Thu, Dec 29, 2022 at 2:29 PM Yonghong Song <yhs@meta.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 12/28/22 2:24 PM, Alexei Starovoitov wrote:
> >>>>> On Tue, Dec 27, 2022 at 8:43 PM Yonghong Song <yhs@meta.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 12/18/22 8:15 PM, xiangxia.m.yue@gmail.com wrote:
> >>>>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>>>
> >>>>>>> This testing show how to reproduce deadlock in special case.
> >>>>>>> We update htab map in Task and NMI context. Task can be interrupted by
> >>>>>>> NMI, if the same map bucket was locked, there will be a deadlock.
> >>>>>>>
> >>>>>>> * map max_entries is 2.
> >>>>>>> * NMI using key 4 and Task context using key 20.
> >>>>>>> * so same bucket index but map_locked index is different.
> >>>>>>>
> >>>>>>> The selftest use perf to produce the NMI and fentry nmi_handle.
> >>>>>>> Note that bpf_overflow_handler checks bpf_prog_active, but in bpf update
> >>>>>>> map syscall increase this counter in bpf_disable_instrumentation.
> >>>>>>> Then fentry nmi_handle and update hash map will reproduce the issue.
> > SNIP
> >>>>>>> diff --git a/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> new file mode 100644
> >>>>>>> index 000000000000..d394f95e97c3
> >>>>>>> --- /dev/null
> >>>>>>> +++ b/tools/testing/selftests/bpf/progs/htab_deadlock.c
> >>>>>>> @@ -0,0 +1,32 @@
> >>>>>>> +// SPDX-License-Identifier: GPL-2.0
> >>>>>>> +/* Copyright (c) 2022 DiDi Global Inc. */
> >>>>>>> +#include <linux/bpf.h>
> >>>>>>> +#include <bpf/bpf_helpers.h>
> >>>>>>> +#include <bpf/bpf_tracing.h>
> >>>>>>> +
> >>>>>>> +char _license[] SEC("license") = "GPL";
> >>>>>>> +
> >>>>>>> +struct {
> >>>>>>> +     __uint(type, BPF_MAP_TYPE_HASH);
> >>>>>>> +     __uint(max_entries, 2);
> >>>>>>> +     __uint(map_flags, BPF_F_ZERO_SEED);
> >>>>>>> +     __type(key, unsigned int);
> >>>>>>> +     __type(value, unsigned int);
> >>>>>>> +} htab SEC(".maps");
> >>>>>>> +
> >>>>>>> +/* nmi_handle on x86 platform. If changing keyword
> >>>>>>> + * "static" to "inline", this prog load failed. */
> >>>>>>> +SEC("fentry/nmi_handle")
> >>>>>>
> >>>>>> The above comment is not what I mean. In arch/x86/kernel/nmi.c,
> >>>>>> we have
> >>>>>>       static int nmi_handle(unsigned int type, struct pt_regs *regs)
> >>>>>>       {
> >>>>>>            ...
> >>>>>>       }
> >>>>>>       ...
> >>>>>>       static noinstr void default_do_nmi(struct pt_regs *regs)
> >>>>>>       {
> >>>>>>            ...
> >>>>>>            handled = nmi_handle(NMI_LOCAL, regs);
> >>>>>>            ...
> >>>>>>       }
> >>>>>>
> >>>>>> Since nmi_handle is a static function, it is possible that
> >>>>>> the function might be inlined in default_do_nmi by the
> >>>>>> compiler. If this happens, fentry/nmi_handle will not
> >>>>>> be triggered and the test will pass.
> >>>>>>
> >>>>>> So I suggest to change the comment to
> >>>>>>       nmi_handle() is a static function and might be
> >>>>>>       inlined into its caller. If this happens, the
> >>>>>>       test can still pass without previous kernel fix.
> >>>>>
> >>>>> It's worse than this.
> >>>>> fentry is buggy.
> >>>>> We shouldn't allow attaching fentry to:
> >>>>> NOKPROBE_SYMBOL(nmi_handle);
> >>>>
> >>>> Okay, I see. Looks we should prevent fentry from
> >>>> attaching any NOKPROBE_SYMBOL functions.
> >>>>
> >>>> BTW, I think fentry/nmi_handle can be replaced with
> >>>> tracepoint nmi/nmi_handler. it is more reliable
> >>> The tracepoint will not reproduce the deadlock(we have discussed v2).
> >>> If it's not easy to complete a test for this case, should we drop this
> >>> testcase patch? or fentry the nmi_handle and update the comments.
> >>
> >> could we use a softirq perf event (timer), e.g.,
> >>
> >>          struct perf_event_attr attr = {
> >>                  .sample_period = 1,
> >>                  .type = PERF_TYPE_SOFTWARE,
> >>                  .config = PERF_COUNT_SW_CPU_CLOCK,
> >>          };
> >>
> >> then you can attach function hrtimer_run_softirq (not tested) or
> >> similar functions?
> > The context will be a hard-irq context, right ? Because htab_lock_bucket() has
> > already disabled hard-irq on current CPU, so the dead-lock will be impossible.
>
> Okay, I see. soft-irq doesn't work. The only thing it works is nmi since
> it is non-masking.
>
> >>
> >> I suspect most (if not all) functions in nmi path cannot
> >> be kprobe'd.
> > It seems that perf_event_nmi_handler() is also nokprobe function. However I
> > think we could try its callees (e.g., x86_pmu_handle_irq or perf_event_overflow).
>
> If we can find a function in nmi handling path which is not marked as
> nonkprobe, sure, we can use that function for fentry.
I think perf_event_overflow is ok.
[   93.233093]  dump_stack_lvl+0x57/0x81
[   93.233098]  lock_acquire+0x1f4/0x29a
[   93.233103]  ? htab_lock_bucket+0x61/0x6c
[   93.233108]  _raw_spin_lock_irqsave+0x43/0x7f
[   93.233111]  ? htab_lock_bucket+0x61/0x6c
[   93.233114]  htab_lock_bucket+0x61/0x6c
[   93.233118]  htab_map_update_elem+0x11e/0x220
[   93.233124]  bpf_prog_df326439468c24a9_bpf_prog1+0x41/0x45
[   93.233137]  bpf_trampoline_6442478975_0+0x48/0x1000
[   93.233144]  perf_event_overflow+0x5/0x15
[   93.233149]  handle_pmi_common+0x1ad/0x1f0
[   93.233166]  intel_pmu_handle_irq+0x136/0x18a
[   93.233170]  perf_event_nmi_handler+0x28/0x47
[   93.233176]  nmi_handle+0xb8/0x254
[   93.233182]  default_do_nmi+0x3d/0xf6
[   93.233187]  exc_nmi+0xa1/0x109
[   93.233191]  end_repeat_nmi+0x16/0x67
> >>
> >>>> and won't be impacted by potential NOKPROBE_SYMBOL
> >>>> issues.
> >>>
> >>>
> >>>
> >> .
> >



-- 
Best regards, Tonghao

  reply	other threads:[~2023-01-04 14:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-19  4:15 [bpf-next v3 1/2] bpf: hash map, avoid deadlock with suitable hash mask xiangxia.m.yue
2022-12-19  4:15 ` [bpf-next v3 2/2] selftests/bpf: add test case for htab map xiangxia.m.yue
2022-12-28  4:42   ` Yonghong Song
2022-12-28 22:24     ` Alexei Starovoitov
2022-12-29  6:29       ` Yonghong Song
2023-01-03  2:40         ` Tonghao Zhang
2023-01-04  7:09           ` Yonghong Song
2023-01-04  7:51             ` Hou Tao
2023-01-04  8:01               ` Yonghong Song
2023-01-04 14:32                 ` Tonghao Zhang [this message]
2023-01-04 17:10                   ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMDZJNWs3jSbZwbSe-U4ypMSqhjgJ=Z8AyYcz=wEzX-B4pJg7w@mail.gmail.com' \
    --to=xiangxia.m.yue@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=houtao1@huawei.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=yhs@fb.com \
    --cc=yhs@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.