Re: [PATCH] x86/vpmu_intel: Fix hypervisor crash by catching wrmsr fault

From: "Jan Beulich" <JBeulich@suse.com>
To: Mohit Gambhir <mohit.gambhir@oracle.com>
Cc: kevin.tian@intel.com, boris.ostrovsky@oracle.com,
	jun.nakajima@intel.com, xen-devel@lists.xen.org
Subject: Re: [PATCH] x86/vpmu_intel: Fix hypervisor crash by catching wrmsr fault
Date: Mon, 24 Apr 2017 10:04:16 -0600	[thread overview]
Message-ID: <58FE3E200200007800153A4B@prv-mh.provo.novell.com> (raw)
In-Reply-To: <73ddd600-b14c-257f-9322-33a516e050a2@oracle.com>

>>> On 24.04.17 at 17:44, <mohit.gambhir@oracle.com> wrote:
> On 04/21/2017 03:14 AM, Jan Beulich wrote:
>>>>> On 20.04.17 at 19:49,<mohit.gambhir@oracle.com>  wrote:
>>> This patch changes wrmsrl() calls to write to MSR_P6_EVTSEL register in the
>>> VPMU to wrmsr_safe(). There are known (and possibly some unknown) cases where
>>> setting certain bits in MSR_P6_EVTSEL reg. can cause a General Protection
>>> fault on some machines. Unless we catch this fault when it happens, it will
>>> result in a hypervisor crash.
>>>
>>> For instance, setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results
>>> in a General Protection Fault on Broadwell machines and thus causes the
>>> hypervisor to crash.
>> I can't seem to find indication of this in the SDM. Is this removal of a
>> valid bit from an architectural MSR documented anywhere?
> It see it as an observed hardware bug so I don't expect it to be found 
> in the SDM.  I reached out to
> the folks at Intel (CC) but they are unaware of it. Ideally it should 
> work the way it says in the SDM
> for all events and on all machines but it doesn't - not for the all the 
> events and certainly not on all
> machines.

Please have the description make clear this is an (assumed)
erratum, until Intel either clarifies or adds an entry to the spec
update for the affected model(s).

>> Further a more general question: Why do you do this only for
>> MSR_P6_EVNTSEL*, but not consistently for all MSRs which are
>> being written with (partially) guest controlled values? I'd namely
>> expect all wrmsrl() to be switched over, unless it is clear that
>> we're dealing with an erratum here and we therefore can
>> assume that normal guarantees for architectural MSRs apply to
>> everything else.
> I think it is an erratum for one specific MSR which is MSR_P6_EVTSEL
>> And finally, in the description you talk about forwarding the
>> fault - I can see this being the case for core2_vpmu_do_wrmsr(),
>> but how does this match up with __core2_vpmu_load() and its
>> callers? Namely you may report an error there after already
>> having updated some MSRs. Leaving the overall set of MSRs in
>> an inconsistent state is not very likely to be a good idea.
> Yes, that's something I did not think about and needs to be fixed. There 
> are two issues here -
> 
> 1. What should we set MSR values in case of an error? We can not revert 
> them because if it is part of a
> context switch  then we do not want to leak the values from the previous 
> guest to the new guest.
> So one possible solution is to  wipe all PMU MSRs (set them to 0) before 
> returning the error.
> 
> 2. __core2_vpmu_load is called in two scenarios - One from a hyper call 
> XENPMU_FLUSH from do xenpmu_op()
> and the other during a context switch from vpmu_switch_to(). The 
> hypercall seems to be handling the error
> correctly but vpmu_switch_to() ignores the error. So to propagate the 
> error up to the guest kernel maybe we should
> inject the fault into the guest using 
> hvm_inject_hw_exception(TRAP_gp_fault, error)/ 
> pv_inject_hw_exception(TRAP_gp_fault, error)?

You can't arbitrarily inject an exception out of the context switch
code. I'm not sure you can sensibly continue the guest in this
case, unless - as you say - there is a way to place the PMU into a
consistent state (and you provide _some_kind of indication to the
guest about the vPMU state no longer matching its expectations).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel