From: Jim Mattson <jmattson@google.com>
To: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: David Dunn <daviddunn@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Like Xu <like.xu.linux@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Like Xu <likexu@tencent.com>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly
Date: Thu, 10 Feb 2022 11:16:52 -0800
Message-ID: <CALMp9eSjDro169JjTXyCZn=Rf3PT0uHhdNXEifiXGYQK-Zn8LA@mail.gmail.com>
In-Reply-To: <e06db1a5-1b67-28ac-ee4c-34ece5857b1f@linux.intel.com>

On Thu, Feb 10, 2022 at 10:30 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2/10/2022 11:34 AM, Jim Mattson wrote:
> > On Thu, Feb 10, 2022 at 7:34 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
> >>
> >>
> >>
> >> On 2/9/2022 2:24 PM, David Dunn wrote:
> >>> Dave,
> >>>
> >>> In my opinion, the right policy depends on what the host owner and
> >>> guest owner are trying to achieve.
> >>>
> >>> If the PMU is being used to locate places where performance could be
> >>> improved in the system, there are two sub scenarios:
> >>>      - The host and guest are owned by same entity that is optimizing
> >>> overall system.  In this case, the guest doesn't need PMU access and
> >>> better information is provided by profiling the entire system from the
> >>> host.
> >>>      - The host and guest are owned by different entities.  In this
> >>> case, profiling from the host can identify perf issues in the guest.
> >>> But what action can be taken?  The host entity must communicate issues
> >>> back to the guest owner through some sort of out-of-band information
> >>> channel.  On the other hand, preempting the host PMU to give the guest
> >>> a fully functional PMU serves this use case well.
> >>>
> >>> TDX and SGX (outside of debug mode) strongly assume different
> >>> entities.  And Intel is doing this to reduce insight of the host into
> >>> guest operations.  So in my opinion, preemption makes sense.
> >>>
> >>> There are also scenarios where the host owner is trying to identify
> >>> systemwide impacts of guest actions.  For example, detecting memory
> >>> bandwidth consumption or split locks.  In this case, host control
> >>> without preemption is necessary.
> >>>
> >>> To address these various scenarios, it seems like the host needs to be
> >>> able to have policy control on whether it is willing to have the PMU
> >>> preempted by the guest.
> >>>
> >>> But I don't see what scenario is well served by the current situation
> >>> in KVM.  Currently the guest will either be told it has no PMU (which
> >>> is fine) or that it has full control of a PMU.  If the guest is told
> >>> it has full control of the PMU, it actually doesn't.  But instead of
> >>> losing counters on well defined events (from the guest perspective),
> >>> they simply stop counting depending on what the host is doing with the
> >>> PMU.
> >>
> >> For the current perf subsystem, a PMU should be shared among different
> >> users via the multiplexing mechanism if the resource is limited. No one
> >> has full control of a PMU for its lifetime; a user only owns the PMU
> >> during its allotted period. I think the user can tell how long the event
> >> actually ran via total_time_enabled and total_time_running.
> >
> > For most clients, yes. For kvm, no. KVM currently tosses
> > total_time_enabled and total_time_running in the bitbucket. It could
> > extrapolate, but that would result in loss of precision. Some guest
> > uses of the PMU would not be able to cope (e.g.
> > https://github.com/rr-debugger/rr).
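(For reference, the extrapolation a normal perf user would do looks roughly
like the sketch below, using the standard perf_event_open() read_format
fields; the event choice is only illustrative.)

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>

struct read_format {
	uint64_t value;		/* raw count */
	uint64_t time_enabled;	/* total_time_enabled */
	uint64_t time_running;	/* total_time_running */
};

static int open_cycles_event(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			   PERF_FORMAT_TOTAL_TIME_RUNNING;
	return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

static uint64_t read_scaled_count(int fd)
{
	struct read_format rf;

	if (read(fd, &rf, sizeof(rf)) != sizeof(rf) || !rf.time_running)
		return 0;
	/* Extrapolate a multiplexed count; this is where precision is
	 * lost if the event was only scheduled part of the time. */
	return rf.value * rf.time_enabled / rf.time_running;
}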
> >
> >> A guest should rely on the host to tell it whether the PMU resource
> >> is available. But unfortunately, I don't think we have such a
> >> notification mechanism in KVM. The guest is left with the wrong
> >> impression that it has full control of the PMU.
> >
> > That is the only impression that the architectural specification
> > allows the guest to have. On Intel, we can mask off individual fixed
> > counters, and we can reduce the number of GP counters, but AMD offers
> > us no such freedom. Whatever resources we advertise to the guest must
> > be available for its use whenever it wants. Otherwise, PMU
> > virtualization is simply broken.
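(For context, a quick sketch of how a guest reads what KVM actually
advertises, via Intel's CPUID leaf 0xA; the bit layout below follows the
SDM and only the commonly used fields are shown.)

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0xa, &eax, &ebx, &ecx, &edx))
		return 1;	/* no architectural PMU enumerated */

	printf("arch perfmon version : %u\n", eax & 0xff);
	printf("GP counters          : %u\n", (eax >> 8) & 0xff);
	printf("GP counter width     : %u\n", (eax >> 16) & 0xff);
	printf("fixed counters       : %u\n", edx & 0x1f);
	return 0;
}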
> >
> >> In my opinion, we should add a notification mechanism to KVM. When the
> >> PMU resource is limited, the guest can then know that it is being
> >> multiplexed, or can choose to reschedule the event.
> >
> > That sounds like a paravirtual perf mechanism, rather than PMU
> > virtualization. Are you suggesting that we not try to virtualize the
> > PMU? Unfortunately, PMU virtualization is what we have customers
> > clamoring for. No one is interested in a paravirtual perf mechanism.
> > For example, when will VTune in the guest know how to use your
> > proposed paravirtual interface?
>
> OK. If KVM cannot notify the guest, maybe the guest can query the usage
> of the counters before using one. There is an IA32_PERF_GLOBAL_INUSE MSR,
> introduced with Arch Perfmon v4, that provides an "InUse" bit for each
> counter. But it cannot guarantee that the counter can always be owned by
> the guest unless the host treats the guest as a super-user and agrees not
> to touch its counters. This would only work on Intel platforms.

Simple question: Do all existing guests (Windows and Linux are my
primary interest) query that MSR today? If not, then this proposal is
DOA.
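
(For concreteness, the guest-side query being proposed would look roughly
like this sketch, assuming MSR address 0x392 and the per-counter InUse bit
layout documented in the SDM; it has to run in the guest kernel.)

#include <stdint.h>

#define MSR_PERF_GLOBAL_INUSE	0x392	/* Arch Perfmon v4+ */

static inline uint64_t rdmsr64(uint32_t msr)
{
	uint32_t lo, hi;

	asm volatile("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
	return ((uint64_t)hi << 32) | lo;
}

/* Nonzero if general-purpose counter @idx is already claimed elsewhere;
 * bit n covers IA32_PERFEVTSELn, bits 32+ cover the fixed counters. */
static int gp_counter_in_use(unsigned int idx)
{
	return !!(rdmsr64(MSR_PERF_GLOBAL_INUSE) & (1ull << idx));
}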

> >
> >> But it seems the notification mechanism may not work for the TDX case?
> >>>
> >>> On the other hand, if we flip it around the semantics are more clear.
> >>> A guest will be told it has no PMU (which is fine) or that it has full
> >>> control of the PMU.  If the guest is told that it has full control of
> >>> the PMU, it does.  And the host (which is the thing that granted the
> >>> full PMU to the guest) knows that events inside the guest are not
> >>> being measured.  This results in all entities seeing something that
> >>> can be reasoned about from their perspective.
> >>>
> >>
> >> I assume that this is for the TDX case (where the notification mechanism
> >> doesn't work). The host still controls all the PMU resources. The TDX
> >> guest is treated as a super-user who can 'own' a PMU. The admin on the
> >> host can configure/change which PMUs the TDX guest owns. Personally, I
> >> think it makes sense. But please keep in mind that the counters are not
> >> identical. There are some special events that can only run on a specific
> >> counter. If that special counter is assigned to TDX, other entities can
> >> never run those events. We should let the other entities know if that
> >> happens. Or we should never let non-host entities own the special counter.
> >
> > Right; the counters are not fungible. Ideally, when the guest requests
> > a particular counter, that is the counter it gets. If it is given a
> > different counter, the counter it is given must provide the same
> > behavior as the requested counter for the event in question.
>
> Ideally, yes, but sometimes KVM/host may not know whether it can use
> another counter to replace the requested counter, because KVM/host
> cannot retrieve the event constraint information from the guest.

In that case, don't do it. When the guest asks for a specific counter,
give the guest that counter. This isn't rocket science.

> For example, the Precise Distribution (PDist) feature is enabled only
> for GP counter 0 on SPR. Perf uses precise level 3 (a SW variable) to
> indicate the feature. KVM/host never knows whether the guest applies
> the PDist feature.
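(The "precise level 3" above is perf_event_attr.precise_ip = 3 in the
guest's own perf stack, a purely software notion the host never sees; a
minimal sketch, with an illustrative event and sample period:)

#include <linux/perf_event.h>
#include <string.h>

static void setup_pdist_style_event(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100003;
	attr->precise_ip = 3;	/* highest precision; on SPR this is the
				 * case that wants GP counter 0 (PDist) */
}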
>
> I have a patch that forces the perf scheduler to start from the regular
> counters, which may mitigate the issue but cannot fix it. (I will post
> the patch separately.)
>
> Or we should never let the guest own the special counters. Although the
> guest then loses some special events, I guess the host is more likely to
> be willing to let the guest own a regular counter.
>
>
> Thanks,
> Kan
>
> >
> >>
> >> Thanks,
> >> Kan
> >>
> >>> Thanks,
> >>>
> >>> Dave Dunn
> >>>
> >>> On Wed, Feb 9, 2022 at 10:57 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >>>
> >>>>> I was referring to gaps in the collection of data that the host perf
> >>>>> subsystem doesn't know about if ATTRIBUTES.PERFMON is set for a TDX
> >>>>> guest. This can potentially be a problem if someone is trying to
> >>>>> measure events per unit of time.
> >>>>
> >>>> Ahh, that makes sense.
> >>>>
> >>>> Does SGX cause a problem for these people?  It can create some of the same
> >>>> collection gaps:
> >>>>
> >>>>           performance monitoring activities are suppressed when entering
> >>>>           an opt-out (of performance monitoring) enclave.
