All of lore.kernel.org
 help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Like Xu <like.xu.linux@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	"like.xu@linux.intel.com" <like.xu@linux.intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	"Liang, Kan" <kan.liang@intel.com>
Subject: Re: bpf_get_branch_snapshot on qemu-kvm
Date: Fri, 8 Oct 2021 17:08:42 +0000	[thread overview]
Message-ID: <1EB93A74-804B-4EE2-AECB-38580D40C80D@fb.com> (raw)
In-Reply-To: <d46d96ff-02a3-5ea7-a273-2945f4ef17a5@gmail.com>



> On Oct 7, 2021, at 11:36 PM, Like Xu <like.xu.linux@gmail.com> wrote:
> 
> On 8/10/2021 1:46 pm, Song Liu wrote:
>>> On Oct 7, 2021, at 8:34 PM, Like Xu <like.xu.linux@gmail.com> wrote:
>>> 
>>> On 30/9/2021 4:05 am, Song Liu wrote:
>>>> Hi Kan,
>>>>> On Sep 29, 2021, at 9:35 AM, Liang, Kan <kan.liang@intel.com> wrote:
>>>>> 
>>>>>>>> - get confirmation that clearing GLOBAL_CTRL is suffient to supress
>>>>>>>>  PEBS, in which case we can simply remove the PEBS_ENABLE clear.
>>>>>>> 
>>>>>>> How should we confirm this? Can we run some tests for this? Or do we
>>>>>>> need hardware experts' input for this?
>>>>>> 
>>>>>> I'll put it on the list to ask the hardware people when I talk to them next. But
>>>>>> maybe Kan or Andi know without asking.
>>>>> 
>>>>> If the GLOBAL_CTRL is explicitly disabled, the counters do not count anymore.
>>>>> It doesn't matter if PEBS is enabled or not.
>>>>> 
>>>>> See 6c1c07b33eb0 ("perf/x86/intel: Avoid unnecessary PEBS_ENABLE MSR
>>>>> access in PMI "). We optimized the PMU handler base on it.
>>>> Thanks for these information!
>>>> IIUC, all we need is the following on top of bpf-next/master:
>>>> diff --git i/arch/x86/events/intel/core.c w/arch/x86/events/intel/core.c
>>>> index 1248fc1937f82..d0d357e7d6f21 100644
>>>> --- i/arch/x86/events/intel/core.c
>>>> +++ w/arch/x86/events/intel/core.c
>>>> @@ -2209,7 +2209,6 @@ intel_pmu_snapshot_branch_stack(struct perf_branch_entry *entries, unsigned int
>>>>         /* must not have branches... */
>>>>         local_irq_save(flags);
>>>>         __intel_pmu_disable_all(false); /* we don't care about BTS */
>>> 
>>> If the value passed in is true, does it affect your use case?
>>> 
>>>> -       __intel_pmu_pebs_disable_all();
>>> 
>>> In that case, we can reuse "static __always_inline void intel_pmu_disable_all(void)"
>>> regardless of whether PEBS is supported or enabled inside the guest and the host ?
>>> 
>>>>         __intel_pmu_lbr_disable();
>>> 
>>> How about using intel_pmu_lbr_disable_all() to cover Arch LBR?
>> We are using LBR without PMI, so there isn't any hardware mechanism to
>> stop the LBR, we have to stop it in software. There is always a delay
>> between the event triggers and the LBR is stopped. In this window,
> 
> Do you use counters for snapshot branch stack?
> 
> Can the assumption of "without PMI" be broken sine Intel does have
> the hardware mechanism like "freeze LBR on counter overflow
> (aka, DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)" ?

We are capturing LBR on software events. For example, when a complex syscall,
such as sys_bpf() and sys_perf_event_open(), returns -EINVAL, it is not obvious
what wen wrong. The branch stack at the return (on a kretprobe or fexit) could 
give us additional information. 

> 
>> the LBR is still running and old entries are being replaced by new entries.
>> We actually need the old entries before the triggering event, so the key
>> design goal here is to minimize the number of branch instructions between
>> the event triggers and the LBR is stopped.
> 
> Yes, it makes sense.
> 
>> Here, both __intel_pmu_disable_all(false) and __intel_pmu_lbr_disable()
>> are used to optimize for this goal: the fewer branch instructions the
>> better.
> 
> Is it possible that we have another LBR in-kernel user in addition to the current
> BPF-LBR snapshot user, such as another BPF-LBR snapshot user or a LBR perf user ?

I think it is OK to have another user. We just need to capture the LBR entries. 
In fact, we simply enable LBR by opening a perf_event on each CPU. So from the 
kernel's point of view, the LBR is owned used by "another user". 

> 
> In the intel_pmu_snapshot_[arch]_branch_stack(), what if there is a PMI or NMI handler
> to be called before __intel_pmu_lbr_disable(), which means more branch instructions
> (assuming we don't use the FREEZE_LBRS_ON_xxx capability)?

If we are unlucky and hit an NMI, we may get garbage data. The user will run the 
test again. 

> How about try to disable LBR at the earliest possible time, before __intel_pmu_disable_all(false) ?

I am not sure which solution is the best here. On bare metal, current version works
fine (available in bpf-next tree). 

> 
>> After removing __intel_pmu_pebs_disable_all() from
>> intel_pmu_snapshot_branch_stack(), we found quite a few LBR entries in
>> extable related code. With these entries, snapshot branch stack is not
> 
> Are you saying that you still need to call
> __intel_pmu_pebs_disable_all() to maintain precision ?

I think we don't need pebs_disable_all. In the VM, pebs_disable_all will trigger
"unchecked MSR access error" warning. After removing it, the warning message is 
gone. However, after we remove pebs_disable_all, we still see too many LBR entries
are flushed before LBR is stopped. Most of these new entries are in extable code. 
I guess this is because the VM access these MSR differently. 

Thanks,
Song

  reply	other threads:[~2021-10-08 17:09 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-29  0:04 bpf_get_branch_snapshot on qemu-kvm Song Liu
2021-09-29  7:35 ` Peter Zijlstra
2021-09-29 12:05   ` Like Xu
2021-09-29 12:26     ` Peter Zijlstra
2021-09-29 14:42       ` Song Liu
2021-09-29 15:59         ` Peter Zijlstra
2021-09-29 16:35           ` Liang, Kan
2021-09-29 20:05             ` Song Liu
2021-10-08  3:34               ` Like Xu
2021-10-08  5:46                 ` Song Liu
2021-10-08  6:36                   ` Like Xu
2021-10-08 17:08                     ` Song Liu [this message]
2021-10-09  9:03                       ` Like Xu
2021-10-26  7:09                         ` Song Liu
2021-10-28  3:09                           ` Like Xu
2021-10-29 18:09                             ` Song Liu
2021-09-29 23:20           ` Andi Kleen
2021-09-29 14:39     ` Song Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1EB93A74-804B-4EE2-AECB-38580D40C80D@fb.com \
    --to=songliubraving@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=andi@firstfloor.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=kan.liang@intel.com \
    --cc=like.xu.linux@gmail.com \
    --cc=like.xu@linux.intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.