From: Like Xu <email@example.com> To: Song Liu <firstname.lastname@example.org> Cc: Peter Zijlstra <email@example.com>, bpf <firstname.lastname@example.org>, Networking <email@example.com>, Kernel Team <Kernelfirstname.lastname@example.org>, Andrii Nakryiko <email@example.com>, Alexei Starovoitov <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, Andi Kleen <email@example.com>, "Liang, Kan" <firstname.lastname@example.org> Subject: Re: bpf_get_branch_snapshot on qemu-kvm Date: Sat, 9 Oct 2021 17:03:46 +0800 [thread overview] Message-ID: <email@example.com> (raw) In-Reply-To: <1EB93A74-804B-4EE2-AECB-38580D40C80D@fb.com> On 9/10/2021 1:08 am, Song Liu wrote: > > >> On Oct 7, 2021, at 11:36 PM, Like Xu <firstname.lastname@example.org> wrote: >> >> On 8/10/2021 1:46 pm, Song Liu wrote: >>>> On Oct 7, 2021, at 8:34 PM, Like Xu <email@example.com> wrote: >>>> >>>> On 30/9/2021 4:05 am, Song Liu wrote: >>>>> Hi Kan, >>>>>> On Sep 29, 2021, at 9:35 AM, Liang, Kan <firstname.lastname@example.org> wrote: >>>>>> >>>>>>>>> - get confirmation that clearing GLOBAL_CTRL is suffient to supress >>>>>>>>> PEBS, in which case we can simply remove the PEBS_ENABLE clear. >>>>>>>> >>>>>>>> How should we confirm this? Can we run some tests for this? Or do we >>>>>>>> need hardware experts' input for this? >>>>>>> >>>>>>> I'll put it on the list to ask the hardware people when I talk to them next. But >>>>>>> maybe Kan or Andi know without asking. >>>>>> >>>>>> If the GLOBAL_CTRL is explicitly disabled, the counters do not count anymore. >>>>>> It doesn't matter if PEBS is enabled or not. >>>>>> >>>>>> See 6c1c07b33eb0 ("perf/x86/intel: Avoid unnecessary PEBS_ENABLE MSR >>>>>> access in PMI "). We optimized the PMU handler base on it. >>>>> Thanks for these information! >>>>> IIUC, all we need is the following on top of bpf-next/master: >>>>> diff --git i/arch/x86/events/intel/core.c w/arch/x86/events/intel/core.c >>>>> index 1248fc1937f82..d0d357e7d6f21 100644 >>>>> --- i/arch/x86/events/intel/core.c >>>>> +++ w/arch/x86/events/intel/core.c >>>>> @@ -2209,7 +2209,6 @@ intel_pmu_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int >>>>> /* must not have branches... */ >>>>> local_irq_save(flags); >>>>> __intel_pmu_disable_all(false); /* we don't care about BTS */ >>>> >>>> If the value passed in is true, does it affect your use case? >>>> >>>>> - __intel_pmu_pebs_disable_all(); >>>> >>>> In that case, we can reuse "static __always_inline void intel_pmu_disable_all(void)" >>>> regardless of whether PEBS is supported or enabled inside the guest and the host ? >>>> >>>>> __intel_pmu_lbr_disable(); >>>> >>>> How about using intel_pmu_lbr_disable_all() to cover Arch LBR? >>> We are using LBR without PMI, so there isn't any hardware mechanism to >>> stop the LBR, we have to stop it in software. There is always a delay >>> between the event triggers and the LBR is stopped. In this window, >> >> Do you use counters for snapshot branch stack? >> >> Can the assumption of "without PMI" be broken sine Intel does have >> the hardware mechanism like "freeze LBR on counter overflow >> (aka, DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)" ? > > We are capturing LBR on software events. For example, when a complex syscall, > such as sys_bpf() and sys_perf_event_open(), returns -EINVAL, it is not obvious > what wen wrong. The branch stack at the return (on a kretprobe or fexit) could > give us additional information. > >> >>> the LBR is still running and old entries are being replaced by new entries. >>> We actually need the old entries before the triggering event, so the key >>> design goal here is to minimize the number of branch instructions between >>> the event triggers and the LBR is stopped. >> >> Yes, it makes sense. >> >>> Here, both __intel_pmu_disable_all(false) and __intel_pmu_lbr_disable() >>> are used to optimize for this goal: the fewer branch instructions the >>> better. >> >> Is it possible that we have another LBR in-kernel user in addition to the current >> BPF-LBR snapshot user, such as another BPF-LBR snapshot user or a LBR perf user ? > > I think it is OK to have another user. We just need to capture the LBR entries. > In fact, we simply enable LBR by opening a perf_event on each CPU. So from the > kernel's point of view, the LBR is owned used by "another user". > >> >> In the intel_pmu_snapshot_[arch]_branch_stack(), what if there is a PMI or NMI handler >> to be called before __intel_pmu_lbr_disable(), which means more branch instructions >> (assuming we don't use the FREEZE_LBRS_ON_xxx capability)? > > If we are unlucky and hit an NMI, we may get garbage data. The user will run the > test again. > >> How about try to disable LBR at the earliest possible time, before __intel_pmu_disable_all(false) ? > > I am not sure which solution is the best here. On bare metal, current version works > fine (available in bpf-next tree). > >> >>> After removing __intel_pmu_pebs_disable_all() from >>> intel_pmu_snapshot_branch_stack(), we found quite a few LBR entries in >>> extable related code. With these entries, snapshot branch stack is not >> >> Are you saying that you still need to call >> __intel_pmu_pebs_disable_all() to maintain precision ? > > I think we don't need pebs_disable_all. In the VM, pebs_disable_all will trigger > "unchecked MSR access error" warning. After removing it, the warning message is > gone. However, after we remove pebs_disable_all, we still see too many LBR entries > are flushed before LBR is stopped. Most of these new entries are in extable code. > I guess this is because the VM access these MSR differently. Hi Song, Thanks for your detailed input. I saw your workaround "if (is_hypervisor())" on the tree. Even when the guest supports PEBS, this use case fails and the root cause is still playing hide-and-seek with me. Just check with you to see if you get similar results when the guest LBR behavior makes the test case fail like this: serial_test_get_branch_snapshot:FAIL:find_looptest_in_lbr unexpected find_looptest_in_lbr: actual 0 <= expected 6 serial_test_get_branch_snapshot:FAIL:check_wasted_entries unexpected check_wasted_entries: actual 32 >= expected 10 #52 get_branch_snapshot:FAIL Also, do you know or rough guess about how extable code relates to the test case ? > > Thanks, > Song >
next prev parent reply other threads:[~2021-10-09 9:03 UTC|newest] Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-09-29 0:04 Song Liu 2021-09-29 7:35 ` Peter Zijlstra 2021-09-29 12:05 ` Like Xu 2021-09-29 12:26 ` Peter Zijlstra 2021-09-29 14:42 ` Song Liu 2021-09-29 15:59 ` Peter Zijlstra 2021-09-29 16:35 ` Liang, Kan 2021-09-29 20:05 ` Song Liu 2021-10-08 3:34 ` Like Xu 2021-10-08 5:46 ` Song Liu 2021-10-08 6:36 ` Like Xu 2021-10-08 17:08 ` Song Liu 2021-10-09 9:03 ` Like Xu [this message] 2021-10-26 7:09 ` Song Liu 2021-10-28 3:09 ` Like Xu 2021-10-29 18:09 ` Song Liu 2021-09-29 23:20 ` Andi Kleen 2021-09-29 14:39 ` Song Liu
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --email@example.com \ --firstname.lastname@example.org \ --cc=Kernelemail@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: bpf_get_branch_snapshot on qemu-kvm' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.