kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Jim Mattson <jmattson@google.com>
Cc: David Dunn <daviddunn@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Like Xu <like.xu.linux@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Sean Christopherson <seanjc@google.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Like Xu <likexu@tencent.com>,
	Stephane Eranian <eranian@google.com>
Subject: Re: [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly
Date: Fri, 11 Feb 2022 16:47:42 -0500	[thread overview]
Message-ID: <2180ea93-5f05-b1c1-7253-e3707da29f8c@linux.intel.com> (raw)
In-Reply-To: <CALMp9eQ79Cnh1aqNGLR8n5MQuHTwuf=DMjJ2cTb9uXu94GGhEA@mail.gmail.com>



On 2/11/2022 1:08 PM, Jim Mattson wrote:
> On Fri, Feb 11, 2022 at 6:11 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>
>>
>>
>> On 2/10/2022 2:55 PM, David Dunn wrote:
>>> Kan,
>>>
>>> On Thu, Feb 10, 2022 at 11:46 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>>>
>>>> No, we don't, at least for Linux. Because the host own everything. It
>>>> doesn't need the MSR to tell which one is in use. We track it in an SW way.
>>>>
>>>> For the new request from the guest to own a counter, I guess maybe it is
>>>> worth implementing it. But yes, the existing/legacy guest never check
>>>> the MSR.
>>>
>>> This is the expectation of all software that uses the PMU in every
>>> guest.  It isn't just the Linux perf system.
>>>
>>> The KVM vPMU model we have today results in the PMU utilizing software
>>> simply not working properly in a guest.  The only case that can
>>> consistently "work" today is not giving the guest a PMU at all.
>>>
>>> And that's why you are hearing requests to gift the entire PMU to the
>>> guest while it is running. All existing PMU software knows about the
>>> various constraints on exactly how each MSR must be used to get sane
>>> data.  And by gifting the entire PMU it allows that software to work
>>> properly.  But that has to be controlled by policy at host level such
>>> that the owner of the host knows that they are not going to have PMU
>>> visibility into guests that have control of PMU.
>>>
>>
>> I think here is how a guest event works today with KVM and perf subsystem.
>>       - Guest create an event A
>>       - The guest kernel assigns a guest counter M to event A, and config
>> the related MSRs of the guest counter M.
>>       - KVM intercepts the MSR access and create a host event B. (The
>> host event B is based on the settings of the guest counter M. As I said,
>> at least for Linux, some SW config impacts the counter assignment. KVM
>> never knows it. Event B can only be a similar event to A.)
>>       - Linux perf subsystem assigns a physical counter N to a host event
>> B according to event B's constraint. (N may not be the same as M,
>> because A and B may have different event constraints)
>>
>> As you can see, even the entire PMU is given to the guest, we still
>> cannot guarantee that the physical counter M can be assigned to the
>> guest event A.
> 
> All we know about the guest is that it has programmed virtual counter
> M. It seems obvious to me that we can satisfy that request by giving
> it physical counter M. If, for whatever reason, we give it physical
> counter N isntead, and M and N are not completely fungible, then we
> have failed.
> 
>> How to fix it? The only thing I can imagine is "passthrough". Let KVM
>> directly assign the counter M to guest. So, to me, this policy sounds
>> like let KVM replace the perf to control the whole PMU resources, and we
>> will handover them to our guest then. Is it what we want?
> 
> We want PMU virtualization to work. There are at least two ways of doing that:
> 1) Cede the entire PMU to the guest while it's running.

So the guest will take over the control of the entire PMUs while it's 
running. I know someone wants to do system-wide monitoring. This case 
will be failed.

I'm not sure whether you can fully trust the guest. If malware runs in 
the guest, I don't know whether it will harm the entire system. I'm not 
a security expert, but it sounds dangerous.
Hope the administrators know what they are doing when choosing this policy.

> 2) Introduce a new "ultimate" priority level in the host perf
> subsystem. Only KVM can request events at the ultimate priority, and
> these requests supersede any other requests.

The "ultimate" priority level doesn't help in the above case. The 
counter M may not bring the precise which guest requests. I remember you 
called it "broken".

KVM can fails the case, but KVM cannot notify the guest. The guest still 
see wrong result.

> 
> Other solutions are welcome.

I don't have a perfect solution to achieve all your requirements. Based 
on my understanding, the guest has to be compromised by either 
tolerating some errors or dropping some features (e.g., some special 
events). With that, we may consider the above "ultimate" priority level 
policy. The default policy would be the same as the current 
implementation, where the host perf treats all the users, including the 
guest, equally. The administrators can set the "ultimate" priority level 
policy, which may let the KVM/guest pin/own some regular counters via 
perf subsystem. That's just my personal opinion for your reference.

Thanks,
Kan

  reply	other threads:[~2022-02-11 21:47 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-17  8:53 [PATCH kvm/queue v2 0/3] KVM: x86/pmu: Fix out-of-date AMD amd_event_mapping[] Like Xu
2022-01-17  8:53 ` [PATCH kvm/queue v2 1/3] KVM: x86/pmu: Replace pmu->available_event_types with a new BITMAP Like Xu
2022-02-01 12:26   ` Paolo Bonzini
2022-01-17  8:53 ` [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly Like Xu
2022-02-01 12:27   ` Paolo Bonzini
2022-02-02 14:43   ` Peter Zijlstra
2022-02-02 22:35     ` Jim Mattson
2022-02-03 17:33       ` David Dunn
2022-02-09  8:10       ` KVM: x86: Reconsider the current approach of vPMU Like Xu
2022-02-09 13:33         ` Peter Zijlstra
2022-02-09 21:00           ` Sean Christopherson
2022-02-10 12:08             ` Like Xu
2022-02-10 17:12               ` Sean Christopherson
2022-02-16  3:33                 ` Like Xu
2022-02-16 17:53                   ` Jim Mattson
2022-02-09 13:21       ` [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event_map[] directly Peter Zijlstra
2022-02-09 15:40         ` Dave Hansen
2022-02-09 18:47           ` Jim Mattson
2022-02-09 18:57             ` Dave Hansen
2022-02-09 19:24               ` David Dunn
2022-02-10 13:29                 ` Like Xu
2022-02-10 15:34                 ` Liang, Kan
2022-02-10 16:34                   ` Jim Mattson
2022-02-10 18:30                     ` Liang, Kan
2022-02-10 19:16                       ` Jim Mattson
2022-02-10 19:46                         ` Liang, Kan
2022-02-10 19:55                           ` David Dunn
2022-02-11 14:11                             ` Liang, Kan
2022-02-11 18:08                               ` Jim Mattson
2022-02-11 21:47                                 ` Liang, Kan [this message]
2022-02-12 23:31                                   ` Jim Mattson
2022-02-14 21:55                                     ` Liang, Kan
2022-02-14 22:55                                       ` Jim Mattson
2022-02-16  7:36                                         ` Like Xu
2022-02-16 18:10                                           ` Jim Mattson
2022-02-16  7:30                           ` Like Xu
2022-02-16  5:08                   ` Like Xu
2022-02-10 12:55               ` Like Xu
2022-02-12 23:32               ` Jim Mattson
2022-02-08 11:52     ` Like Xu
2022-01-17  8:53 ` [PATCH kvm/queue v2 3/3] KVM: x86/pmu: Setup the {inte|amd}_event_mapping[] when hardware_setup Like Xu
2022-02-01 12:28   ` Paolo Bonzini
2022-02-08 10:10     ` Like Xu
2022-01-26 11:22 ` [PATCH kvm/queue v2 0/3] KVM: x86/pmu: Fix out-of-date AMD amd_event_mapping[] Like Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2180ea93-5f05-b1c1-7253-e3707da29f8c@linux.intel.com \
    --to=kan.liang@linux.intel.com \
    --cc=dave.hansen@intel.com \
    --cc=daviddunn@google.com \
    --cc=eranian@google.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=likexu@tencent.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=seanjc@google.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).