All of lore.kernel.org
 help / color / mirror / Atom feed
* KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
@ 2023-07-19 18:08 Jim Mattson
  2023-07-20  1:25 ` Xiaoyao Li
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2023-07-19 18:08 UTC (permalink / raw)
  To: kvm list

Normally, we would restrict guest MSR writes based on guest CPU
features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
the case.

For the first non-zero write to IA32_SPEC_CTRL, we check to see that
the host supports the value written. We don't care whether or not the
guest supports the value written (as long as it supports the MSR).
After the first non-zero write, we stop intercepting writes to
IA32_SPEC_CTRL, so the guest can write any value supported by the
hardware. This could be problematic in heterogeneous migration pools.
For instance, a VM that starts on a Cascade Lake host may set
IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
migrated to a Skylake host, KVM_SET_MSRS will refuse to set
IA32_SPEC_CTRL to its current value, because Skylake doesn't support
PSFD.

We disable write intercepts IA32_PRED_CMD as long as the guest
supports the MSR. That's fine for now, since only one bit of PRED_CMD
has been defined. Hence, guest support and host support are
equivalent...today. But, are we really comfortable with letting the
guest set any IA32_PRED_CMD bit that may be defined in the future?

The same question applies to IA32_SPEC_CTRL. Are we comfortable with
letting the guest write to any bit that may be defined in the future?
At least the AMD approach with V_SPEC_CTRL prevents the guest from
clearing any bits set by the host, but on Intel, it's a total
free-for-all. What happens when a new bit is defined that absolutely
must be set to 1 all of the time?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-19 18:08 KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD Jim Mattson
@ 2023-07-20  1:25 ` Xiaoyao Li
  2023-07-20  1:58   ` Chao Gao
  0 siblings, 1 reply; 16+ messages in thread
From: Xiaoyao Li @ 2023-07-20  1:25 UTC (permalink / raw)
  To: Jim Mattson, kvm list; +Cc: Gao, Chao

On 7/20/2023 2:08 AM, Jim Mattson wrote:
> Normally, we would restrict guest MSR writes based on guest CPU
> features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
> the case.
> 
> For the first non-zero write to IA32_SPEC_CTRL, we check to see that
> the host supports the value written. We don't care whether or not the
> guest supports the value written (as long as it supports the MSR).
> After the first non-zero write, we stop intercepting writes to
> IA32_SPEC_CTRL, so the guest can write any value supported by the
> hardware. This could be problematic in heterogeneous migration pools.
> For instance, a VM that starts on a Cascade Lake host may set
> IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
> CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
> migrated to a Skylake host, KVM_SET_MSRS will refuse to set
> IA32_SPEC_CTRL to its current value, because Skylake doesn't support
> PSFD.
> 
> We disable write intercepts IA32_PRED_CMD as long as the guest
> supports the MSR. That's fine for now, since only one bit of PRED_CMD
> has been defined. Hence, guest support and host support are
> equivalent...today. But, are we really comfortable with letting the
> guest set any IA32_PRED_CMD bit that may be defined in the future?
 >
> The same question applies to IA32_SPEC_CTRL. Are we comfortable with
> letting the guest write to any bit that may be defined in the future?

My point is we need to fix it, though Chao has different point that 
sometimes performance may be more important[*]

[*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/

> At least the AMD approach with V_SPEC_CTRL prevents the guest from
> clearing any bits set by the host, but on Intel, it's a total
> free-for-all. What happens when a new bit is defined that absolutely
> must be set to 1 all of the time?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20  1:25 ` Xiaoyao Li
@ 2023-07-20  1:58   ` Chao Gao
  2023-07-20  4:04     ` Xiaoyao Li
  2023-07-20  4:04     ` Jim Mattson
  0 siblings, 2 replies; 16+ messages in thread
From: Chao Gao @ 2023-07-20  1:58 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Jim Mattson, kvm list

On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
>On 7/20/2023 2:08 AM, Jim Mattson wrote:
>> Normally, we would restrict guest MSR writes based on guest CPU
>> features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
>> the case.

This issue isn't specific to the two MSRs. Any MSRs that are not
intercepted and with some reserved bits for future extenstions may run
into this issue. Right? IMO, it is a conflict of interests between
disabling MSR write intercept for less VM-exits and host's control over
the value written to the MSR by guest.

We may need something like CR0/CR4 masks and read shadows for all MSRs
to address this fundamental issue.

>> 
>> For the first non-zero write to IA32_SPEC_CTRL, we check to see that
>> the host supports the value written. We don't care whether or not the
>> guest supports the value written (as long as it supports the MSR).
>> After the first non-zero write, we stop intercepting writes to
>> IA32_SPEC_CTRL, so the guest can write any value supported by the
>> hardware. This could be problematic in heterogeneous migration pools.
>> For instance, a VM that starts on a Cascade Lake host may set
>> IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
>> CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
>> migrated to a Skylake host, KVM_SET_MSRS will refuse to set
>> IA32_SPEC_CTRL to its current value, because Skylake doesn't support
>> PSFD.

It is a guest fault. Can we modify guest kernel in this case?

>> 
>> We disable write intercepts IA32_PRED_CMD as long as the guest
>> supports the MSR. That's fine for now, since only one bit of PRED_CMD
>> has been defined. Hence, guest support and host support are
>> equivalent...today. But, are we really comfortable with letting the
>> guest set any IA32_PRED_CMD bit that may be defined in the future?
>>
>> The same question applies to IA32_SPEC_CTRL. Are we comfortable with
>> letting the guest write to any bit that may be defined in the future?
>
>My point is we need to fix it, though Chao has different point that sometimes
>performance may be more important[*]
>
>[*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/

Maybe KVM can provide options to QEMU. e.g., we can define a KVM quirk.
Disabling the quirk means always intercept IA32_SPEC_CTRL MSR writes.

>
>> At least the AMD approach with V_SPEC_CTRL prevents the guest from
>> clearing any bits set by the host, but on Intel, it's a total
>> free-for-all. What happens when a new bit is defined that absolutely
>> must be set to 1 all of the time?

I suppose there is no such bit now. For SPR and future CPUs, "virtualize
IA32_SPEC_CTRL" VMX feature can lock some bits to 0 or 1 regardless of
the value written by guests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20  1:58   ` Chao Gao
@ 2023-07-20  4:04     ` Xiaoyao Li
  2023-07-20 10:38       ` Chao Gao
  2023-07-20  4:04     ` Jim Mattson
  1 sibling, 1 reply; 16+ messages in thread
From: Xiaoyao Li @ 2023-07-20  4:04 UTC (permalink / raw)
  To: Chao Gao; +Cc: Jim Mattson, kvm list

On 7/20/2023 9:58 AM, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
>> On 7/20/2023 2:08 AM, Jim Mattson wrote:
>>> Normally, we would restrict guest MSR writes based on guest CPU
>>> features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
>>> the case.
> 
> This issue isn't specific to the two MSRs. Any MSRs that are not
> intercepted and with some reserved bits for future extenstions may run
> into this issue. Right? 

The luck is KVM defines a list of MSRs that can be passthrough for vmx:

static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS]  = {
	MSR_IA32_SPEC_CTRL,
	MSR_IA32_PRED_CMD,
	MSR_IA32_FLUSH_CMD,
	MSR_IA32_TSC,
#ifdef CONFIG_X86_64
	MSR_FS_BASE,
	MSR_GS_BASE,
	MSR_KERNEL_GS_BASE,
	MSR_IA32_XFD,
	MSR_IA32_XFD_ERR,
#endif
	MSR_IA32_SYSENTER_CS,
	MSR_IA32_SYSENTER_ESP,
	MSR_IA32_SYSENTER_EIP,
	MSR_CORE_C1_RES,
	MSR_CORE_C3_RESIDENCY,
	MSR_CORE_C6_RESIDENCY,
	MSR_CORE_C7_RESIDENCY,
};

and only a few of them has reserved bits. It's feasible to fix them.

> IMO, it is a conflict of interests between
> disabling MSR write intercept for less VM-exits and host's control over
> the value written to the MSR by guest.
> 
> We may need something like CR0/CR4 masks and read shadows for all MSRs
> to address this fundamental issue.

It looks unacceptable for HW vendor. There are so many MSRs.

>>>
>>> For the first non-zero write to IA32_SPEC_CTRL, we check to see that
>>> the host supports the value written. We don't care whether or not the
>>> guest supports the value written (as long as it supports the MSR).
>>> After the first non-zero write, we stop intercepting writes to
>>> IA32_SPEC_CTRL, so the guest can write any value supported by the
>>> hardware. This could be problematic in heterogeneous migration pools.
>>> For instance, a VM that starts on a Cascade Lake host may set
>>> IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
>>> CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
>>> migrated to a Skylake host, KVM_SET_MSRS will refuse to set
>>> IA32_SPEC_CTRL to its current value, because Skylake doesn't support
>>> PSFD.
> 
> It is a guest fault. Can we modify guest kernel in this case?

I don't think it's a guest fault. Guest can do whatever it wants and KVM 
cannot expect guest's behavior.

>>>
>>> We disable write intercepts IA32_PRED_CMD as long as the guest
>>> supports the MSR. That's fine for now, since only one bit of PRED_CMD
>>> has been defined. Hence, guest support and host support are
>>> equivalent...today. But, are we really comfortable with letting the
>>> guest set any IA32_PRED_CMD bit that may be defined in the future?
>>>
>>> The same question applies to IA32_SPEC_CTRL. Are we comfortable with
>>> letting the guest write to any bit that may be defined in the future?
>>
>> My point is we need to fix it, though Chao has different point that sometimes
>> performance may be more important[*]
>>
>> [*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/
> 
> Maybe KVM can provide options to QEMU. e.g., we can define a KVM quirk.
> Disabling the quirk means always intercept IA32_SPEC_CTRL MSR writes.
> 
>>
>>> At least the AMD approach with V_SPEC_CTRL prevents the guest from
>>> clearing any bits set by the host, but on Intel, it's a total
>>> free-for-all. What happens when a new bit is defined that absolutely
>>> must be set to 1 all of the time?
> 
> I suppose there is no such bit now. For SPR and future CPUs, "virtualize
> IA32_SPEC_CTRL" VMX feature can lock some bits to 0 or 1 regardless of
> the value written by guests.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20  1:58   ` Chao Gao
  2023-07-20  4:04     ` Xiaoyao Li
@ 2023-07-20  4:04     ` Jim Mattson
  2023-07-20  8:03       ` Chao Gao
  1 sibling, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2023-07-20  4:04 UTC (permalink / raw)
  To: Chao Gao; +Cc: Xiaoyao Li, kvm list

On Wed, Jul 19, 2023 at 6:58 PM Chao Gao <chao.gao@intel.com> wrote:
>
> On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
> >On 7/20/2023 2:08 AM, Jim Mattson wrote:
> >> Normally, we would restrict guest MSR writes based on guest CPU
> >> features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
> >> the case.
>
> This issue isn't specific to the two MSRs. Any MSRs that are not
> intercepted and with some reserved bits for future extenstions may run
> into this issue. Right? IMO, it is a conflict of interests between
> disabling MSR write intercept for less VM-exits and host's control over
> the value written to the MSR by guest.

I've clearly been falling behind in tracking upstream development. I
didn't realize that we passed through any other MSRs that had bits
reserved for future extensions (virtual addresses don't count). It
looks like we've decided to virtualize IA32_FLUSH_CMD as well, even
though Konrad had the good sense to talk me out of it when I first
proposed it. Are there others I'm missing?

Philosophically, there are three principles potentially in conflict
here: security, correctness, and performance. Userspace should perhaps
be given the option of prioritizing one over the others, but the
default precedence should be security first, correctness second, and
performance last.

> We may need something like CR0/CR4 masks and read shadows for all MSRs
> to address this fundamental issue.

Not *all* MSRs, but some, certainly. That is one possible solution,
but I get the impression that you're not really serious about this
proposal.

> >>
> >> For the first non-zero write to IA32_SPEC_CTRL, we check to see that
> >> the host supports the value written. We don't care whether or not the
> >> guest supports the value written (as long as it supports the MSR).
> >> After the first non-zero write, we stop intercepting writes to
> >> IA32_SPEC_CTRL, so the guest can write any value supported by the
> >> hardware. This could be problematic in heterogeneous migration pools.
> >> For instance, a VM that starts on a Cascade Lake host may set
> >> IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
> >> CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
> >> migrated to a Skylake host, KVM_SET_MSRS will refuse to set
> >> IA32_SPEC_CTRL to its current value, because Skylake doesn't support
> >> PSFD.
>
> It is a guest fault. Can we modify guest kernel in this case?

The guest should not have set the bit. The hypervisor should not have
allowed it. Both are at fault.

I'm willing to bet that Intel has a CPU validation suite that includes
such tests as setting reserved bits in MSRs and ensuring that #GP is
raised. Those tests should also work in a VM. If they don't, the
hypervisor is broken.

> >>
> >> We disable write intercepts IA32_PRED_CMD as long as the guest
> >> supports the MSR. That's fine for now, since only one bit of PRED_CMD
> >> has been defined. Hence, guest support and host support are
> >> equivalent...today. But, are we really comfortable with letting the
> >> guest set any IA32_PRED_CMD bit that may be defined in the future?
> >>
> >> The same question applies to IA32_SPEC_CTRL. Are we comfortable with
> >> letting the guest write to any bit that may be defined in the future?
> >
> >My point is we need to fix it, though Chao has different point that sometimes
> >performance may be more important[*]
> >
> >[*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/
>
> Maybe KVM can provide options to QEMU. e.g., we can define a KVM quirk.
> Disabling the quirk means always intercept IA32_SPEC_CTRL MSR writes.

Alternatively, we can check the host value of IA32_SPEC_CTRL on
VM-entry, since we have to read it anyway. If any bits are set that
cannot be cleared in VMX non-root operation without compromising
security, then writes to IA32_SPEC_CTRL should be intercepted.

> >
> >> At least the AMD approach with V_SPEC_CTRL prevents the guest from
> >> clearing any bits set by the host, but on Intel, it's a total
> >> free-for-all. What happens when a new bit is defined that absolutely
> >> must be set to 1 all of the time?
>
> I suppose there is no such bit now. For SPR and future CPUs, "virtualize
> IA32_SPEC_CTRL" VMX feature can lock some bits to 0 or 1 regardless of
> the value written by guests.

As your colleague pointed out earlier, IA32_SPEC_CTRL.STIBP[bit 1] is
such a bit. If the host has this bit set and you allow the guest to
clear it, then you have compromised host security.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20  4:04     ` Jim Mattson
@ 2023-07-20  8:03       ` Chao Gao
  2023-07-20 17:52         ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Gao @ 2023-07-20  8:03 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Xiaoyao Li, kvm list

On Wed, Jul 19, 2023 at 09:04:58PM -0700, Jim Mattson wrote:
>On Wed, Jul 19, 2023 at 6:58 PM Chao Gao <chao.gao@intel.com> wrote:
>>
>> On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
>> >On 7/20/2023 2:08 AM, Jim Mattson wrote:
>> >> Normally, we would restrict guest MSR writes based on guest CPU
>> >> features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
>> >> the case.
>>
>> This issue isn't specific to the two MSRs. Any MSRs that are not
>> intercepted and with some reserved bits for future extenstions may run
>> into this issue. Right? IMO, it is a conflict of interests between
>> disabling MSR write intercept for less VM-exits and host's control over
>> the value written to the MSR by guest.
>
>I've clearly been falling behind in tracking upstream development. I
>didn't realize that we passed through any other MSRs that had bits
>reserved for future extensions (virtual addresses don't count). It
>looks like we've decided to virtualize IA32_FLUSH_CMD as well, even
>though Konrad had the good sense to talk me out of it when I first
>proposed it. Are there others I'm missing?

MSR_IA32_XFD{_ERR}, I think

SDM says:
Bit i of either MSR can be set to 1 only if CPUID.(EAX=0DH,ECX=i):ECX[2]
is enumerated as 1 (see Section 13.2). An execution of WRMSR that
attempts to set an unsupported bit in either MSR causes a
general-protection fault (#GP)

>
>Philosophically, there are three principles potentially in conflict
>here: security, correctness, and performance. Userspace should perhaps
>be given the option of prioritizing one over the others, but the
>default precedence should be security first, correctness second, and
>performance last.

I am not sure about the default precedence. I can name a few other cases
in which KVM behavior doesn't align with this precedence.
1. new instructions w/o control bits in CR0/4
2. CR3.LAM_U57/U48 can always be set by guests if KVM doesn't trap CR3
   writes, e.g., when EPT is enabled. This case is identical to the PSFD
   case you mentioned below.
3. GBPAGES*

*: https://lore.kernel.org/all/20230217231022.816138-3-seanjc@google.com/

>
>> We may need something like CR0/CR4 masks and read shadows for all MSRs
>> to address this fundamental issue.
>
>Not *all* MSRs, but some, certainly. That is one possible solution,
>but I get the impression that you're not really serious about this
>proposal.

I meant we need a generic mechanism applicable to all MSRs. There are up
to 2K MSRs (MSR-bitmap is 4KB). Then adding a mask and read shadow e.g.,
16 Bytes for each MSR needs about 32KB. I don't think it is completely
unacceptable because, IIRC, IPI virtualization takes up to 64KB. To
reduce the memory consumption, we can even let CPU consume a list of MSR
index, mask and read shadow, like VM-entry/exit "MSR-load" count/address.

>
>> >>
>> >> For the first non-zero write to IA32_SPEC_CTRL, we check to see that
>> >> the host supports the value written. We don't care whether or not the
>> >> guest supports the value written (as long as it supports the MSR).
>> >> After the first non-zero write, we stop intercepting writes to
>> >> IA32_SPEC_CTRL, so the guest can write any value supported by the
>> >> hardware. This could be problematic in heterogeneous migration pools.
>> >> For instance, a VM that starts on a Cascade Lake host may set
>> >> IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
>> >> CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
>> >> migrated to a Skylake host, KVM_SET_MSRS will refuse to set
>> >> IA32_SPEC_CTRL to its current value, because Skylake doesn't support
>> >> PSFD.
>>
>> It is a guest fault. Can we modify guest kernel in this case?
>
>The guest should not have set the bit. The hypervisor should not have
>allowed it. Both are at fault.
>
>I'm willing to bet that Intel has a CPU validation suite that includes
>such tests as setting reserved bits in MSRs and ensuring that #GP is
>raised. Those tests should also work in a VM. If they don't, the
>hypervisor is broken.

I agree hypervisor is broken in this specific case. I just doubt if it
is worthwhile to fix this issue i.e., the benefit is significant. I am
assuing the benefit of fixing the issue is just guests won't be able to
set reserved bits and attempts to do that cause #GP.

And is it fair to good citizens that won't set reserved bits but will
suffer performance drop caused by the fix?

>
>> >>
>> >> We disable write intercepts IA32_PRED_CMD as long as the guest
>> >> supports the MSR. That's fine for now, since only one bit of PRED_CMD
>> >> has been defined. Hence, guest support and host support are
>> >> equivalent...today. But, are we really comfortable with letting the
>> >> guest set any IA32_PRED_CMD bit that may be defined in the future?
>> >>
>> >> The same question applies to IA32_SPEC_CTRL. Are we comfortable with
>> >> letting the guest write to any bit that may be defined in the future?
>> >
>> >My point is we need to fix it, though Chao has different point that sometimes
>> >performance may be more important[*]
>> >
>> >[*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/
>>
>> Maybe KVM can provide options to QEMU. e.g., we can define a KVM quirk.
>> Disabling the quirk means always intercept IA32_SPEC_CTRL MSR writes.
>
>Alternatively, we can check the host value of IA32_SPEC_CTRL on
>VM-entry, since we have to read it anyway. If any bits are set that
>cannot be cleared in VMX non-root operation without compromising
>security, then writes to IA32_SPEC_CTRL should be intercepted.
>
>> >
>> >> At least the AMD approach with V_SPEC_CTRL prevents the guest from
>> >> clearing any bits set by the host, but on Intel, it's a total
>> >> free-for-all. What happens when a new bit is defined that absolutely
>> >> must be set to 1 all of the time?
>>
>> I suppose there is no such bit now. For SPR and future CPUs, "virtualize
>> IA32_SPEC_CTRL" VMX feature can lock some bits to 0 or 1 regardless of
>> the value written by guests.
>
>As your colleague pointed out earlier, IA32_SPEC_CTRL.STIBP[bit 1] is
>such a bit. If the host has this bit set and you allow the guest to
>clear it, then you have compromised host security.

If guest can compromise host security, I definitly agree to intercept
IA32_SPEC_CTRL MSR.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20  4:04     ` Xiaoyao Li
@ 2023-07-20 10:38       ` Chao Gao
  0 siblings, 0 replies; 16+ messages in thread
From: Chao Gao @ 2023-07-20 10:38 UTC (permalink / raw)
  To: Xiaoyao Li; +Cc: Jim Mattson, kvm list

On Thu, Jul 20, 2023 at 12:04:48PM +0800, Xiaoyao Li wrote:
>On 7/20/2023 9:58 AM, Chao Gao wrote:
>> On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
>> > On 7/20/2023 2:08 AM, Jim Mattson wrote:
>> > > Normally, we would restrict guest MSR writes based on guest CPU
>> > > features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
>> > > the case.
>> 
>> This issue isn't specific to the two MSRs. Any MSRs that are not
>> intercepted and with some reserved bits for future extenstions may run
>> into this issue. Right?
>
>The luck is KVM defines a list of MSRs that can be passthrough for vmx:
>
>static u32 vmx_possible_passthrough_msrs[MAX_POSSIBLE_PASSTHROUGH_MSRS]  = {
>	MSR_IA32_SPEC_CTRL,
>	MSR_IA32_PRED_CMD,
>	MSR_IA32_FLUSH_CMD,
>	MSR_IA32_TSC,
>#ifdef CONFIG_X86_64
>	MSR_FS_BASE,
>	MSR_GS_BASE,
>	MSR_KERNEL_GS_BASE,
>	MSR_IA32_XFD,
>	MSR_IA32_XFD_ERR,
>#endif
>	MSR_IA32_SYSENTER_CS,
>	MSR_IA32_SYSENTER_ESP,
>	MSR_IA32_SYSENTER_EIP,
>	MSR_CORE_C1_RES,
>	MSR_CORE_C3_RESIDENCY,
>	MSR_CORE_C6_RESIDENCY,
>	MSR_CORE_C7_RESIDENCY,
>};
>
>and only a few of them has reserved bits. It's feasible to fix them.

Yes. But note that a few MSRs will be added to the list soon, in the CET
series and the FRED series

>
>> > > hardware. This could be problematic in heterogeneous migration pools.
>> > > For instance, a VM that starts on a Cascade Lake host may set
>> > > IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
>> > > CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
>> > > migrated to a Skylake host, KVM_SET_MSRS will refuse to set
>> > > IA32_SPEC_CTRL to its current value, because Skylake doesn't support
>> > > PSFD.
>> 
>> It is a guest fault. Can we modify guest kernel in this case?
>
>I don't think it's a guest fault. Guest can do whatever it wants and KVM
>cannot expect guest's behavior.

OK. I have no objection.

But I still think adjusting guest behavior is the right thing to do.
Because I don't get the benefit of emulating hardware precisely in this
case but the cost of fixing KVM's behavior is obvious: if guests write
to the MSR frequently, they get a lot of VM-exits. I think correctness
is important but not always the most important.

We are working on a real-world project rather than a toy; we should take
other factors into consideration.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20  8:03       ` Chao Gao
@ 2023-07-20 17:52         ` Jim Mattson
  2023-07-21  3:37           ` Chao Gao
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2023-07-20 17:52 UTC (permalink / raw)
  To: Chao Gao; +Cc: Xiaoyao Li, kvm list

On Thu, Jul 20, 2023 at 1:04 AM Chao Gao <chao.gao@intel.com> wrote:
>
> On Wed, Jul 19, 2023 at 09:04:58PM -0700, Jim Mattson wrote:
> >On Wed, Jul 19, 2023 at 6:58 PM Chao Gao <chao.gao@intel.com> wrote:
> >>
> >> On Thu, Jul 20, 2023 at 09:25:14AM +0800, Xiaoyao Li wrote:
> >> >On 7/20/2023 2:08 AM, Jim Mattson wrote:
> >> >> Normally, we would restrict guest MSR writes based on guest CPU
> >> >> features. However, with IA32_SPEC_CTRL and IA32_PRED_CMD, this is not
> >> >> the case.
> >>
> >> This issue isn't specific to the two MSRs. Any MSRs that are not
> >> intercepted and with some reserved bits for future extenstions may run
> >> into this issue. Right? IMO, it is a conflict of interests between
> >> disabling MSR write intercept for less VM-exits and host's control over
> >> the value written to the MSR by guest.
> >
> >I've clearly been falling behind in tracking upstream development. I
> >didn't realize that we passed through any other MSRs that had bits
> >reserved for future extensions (virtual addresses don't count). It
> >looks like we've decided to virtualize IA32_FLUSH_CMD as well, even
> >though Konrad had the good sense to talk me out of it when I first
> >proposed it. Are there others I'm missing?
>
> MSR_IA32_XFD{_ERR}, I think
>
> SDM says:
> Bit i of either MSR can be set to 1 only if CPUID.(EAX=0DH,ECX=i):ECX[2]
> is enumerated as 1 (see Section 13.2). An execution of WRMSR that
> attempts to set an unsupported bit in either MSR causes a
> general-protection fault (#GP)
>
> >
> >Philosophically, there are three principles potentially in conflict
> >here: security, correctness, and performance. Userspace should perhaps
> >be given the option of prioritizing one over the others, but the
> >default precedence should be security first, correctness second, and
> >performance last.
>
> I am not sure about the default precedence. I can name a few other cases
> in which KVM behavior doesn't align with this precedence.
> 1. new instructions w/o control bits in CR0/4

We've come a long way since Popek & Goldberg, but that's an example of
why the x86 architecture still isn't virtualizable. I don't think
anything can be done about that.

> 2. CR3.LAM_U57/U48 can always be set by guests if KVM doesn't trap CR3
>    writes, e.g., when EPT is enabled. This case is identical to the PSFD
>    case you mentioned below.

LAM is another feature I haven't been following. It's a bit sad that
the x86 vendors continue to introduce new features that restrict the
host platforms that can share a migration pool.

> 3. GBPAGES*

Is there an Intel CPU that isn't well past EOL that doesn't support GBPAGES?

> *: https://lore.kernel.org/all/20230217231022.816138-3-seanjc@google.com/
>
> >
> >> We may need something like CR0/CR4 masks and read shadows for all MSRs
> >> to address this fundamental issue.
> >
> >Not *all* MSRs, but some, certainly. That is one possible solution,
> >but I get the impression that you're not really serious about this
> >proposal.
>
> I meant we need a generic mechanism applicable to all MSRs. There are up
> to 2K MSRs (MSR-bitmap is 4KB). Then adding a mask and read shadow e.g.,
> 16 Bytes for each MSR needs about 32KB. I don't think it is completely
> unacceptable because, IIRC, IPI virtualization takes up to 64KB. To
> reduce the memory consumption, we can even let CPU consume a list of MSR
> index, mask and read shadow, like VM-entry/exit "MSR-load" count/address.

I assume the quoted IPI virtualization overhead is per-VM. We would
need a read shadow per vCPU.

> >> >>
> >> >> For the first non-zero write to IA32_SPEC_CTRL, we check to see that
> >> >> the host supports the value written. We don't care whether or not the
> >> >> guest supports the value written (as long as it supports the MSR).
> >> >> After the first non-zero write, we stop intercepting writes to
> >> >> IA32_SPEC_CTRL, so the guest can write any value supported by the
> >> >> hardware. This could be problematic in heterogeneous migration pools.
> >> >> For instance, a VM that starts on a Cascade Lake host may set
> >> >> IA32_SPEC_CTRL.PSFD[bit 7], even if the guest
> >> >> CPUID.(EAX=07H,ECX=02H):EDX.PSFD[bit 0] is clear. Then, if that VM is
> >> >> migrated to a Skylake host, KVM_SET_MSRS will refuse to set
> >> >> IA32_SPEC_CTRL to its current value, because Skylake doesn't support
> >> >> PSFD.
> >>
> >> It is a guest fault. Can we modify guest kernel in this case?
> >
> >The guest should not have set the bit. The hypervisor should not have
> >allowed it. Both are at fault.
> >
> >I'm willing to bet that Intel has a CPU validation suite that includes
> >such tests as setting reserved bits in MSRs and ensuring that #GP is
> >raised. Those tests should also work in a VM. If they don't, the
> >hypervisor is broken.
>
> I agree hypervisor is broken in this specific case. I just doubt if it
> is worthwhile to fix this issue i.e., the benefit is significant. I am
> assuing the benefit of fixing the issue is just guests won't be able to
> set reserved bits and attempts to do that cause #GP.

I'm not entirely convinced that setting bits in IA32_SPEC_CTRL is (and
always will be) "safe," from a security perspective. *Clearing* bits
certainly isn't.

> And is it fair to good citizens that won't set reserved bits but will
> suffer performance drop caused by the fix?

Is it fair to other tenants of the host to have their data exfiltrated
by a bad citizen, because KVM didn't control access to the MSR?
> >
> >> >>
> >> >> We disable write intercepts IA32_PRED_CMD as long as the guest
> >> >> supports the MSR. That's fine for now, since only one bit of PRED_CMD
> >> >> has been defined. Hence, guest support and host support are
> >> >> equivalent...today. But, are we really comfortable with letting the
> >> >> guest set any IA32_PRED_CMD bit that may be defined in the future?
> >> >>
> >> >> The same question applies to IA32_SPEC_CTRL. Are we comfortable with
> >> >> letting the guest write to any bit that may be defined in the future?
> >> >
> >> >My point is we need to fix it, though Chao has different point that sometimes
> >> >performance may be more important[*]
> >> >
> >> >[*] https://lore.kernel.org/all/ZGdE3jNS11wV+V2w@chao-email/
> >>
> >> Maybe KVM can provide options to QEMU. e.g., we can define a KVM quirk.
> >> Disabling the quirk means always intercept IA32_SPEC_CTRL MSR writes.
> >
> >Alternatively, we can check the host value of IA32_SPEC_CTRL on
> >VM-entry, since we have to read it anyway. If any bits are set that
> >cannot be cleared in VMX non-root operation without compromising
> >security, then writes to IA32_SPEC_CTRL should be intercepted.
> >
> >> >
> >> >> At least the AMD approach with V_SPEC_CTRL prevents the guest from
> >> >> clearing any bits set by the host, but on Intel, it's a total
> >> >> free-for-all. What happens when a new bit is defined that absolutely
> >> >> must be set to 1 all of the time?
> >>
> >> I suppose there is no such bit now. For SPR and future CPUs, "virtualize
> >> IA32_SPEC_CTRL" VMX feature can lock some bits to 0 or 1 regardless of
> >> the value written by guests.
> >
> >As your colleague pointed out earlier, IA32_SPEC_CTRL.STIBP[bit 1] is
> >such a bit. If the host has this bit set and you allow the guest to
> >clear it, then you have compromised host security.
>
> If guest can compromise host security, I definitly agree to intercept
> IA32_SPEC_CTRL MSR.

I believe that when the decision was made to pass through this MSR for
write, the assumption was that the host wouldn't ever use it (hence
the host value would be zero). That assumption has not stood the test
of time.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-20 17:52         ` Jim Mattson
@ 2023-07-21  3:37           ` Chao Gao
  2023-07-21 19:01             ` Pawan Gupta
  0 siblings, 1 reply; 16+ messages in thread
From: Chao Gao @ 2023-07-21  3:37 UTC (permalink / raw)
  To: Jim Mattson, Pawan Gupta; +Cc: Xiaoyao Li, kvm list

On Thu, Jul 20, 2023 at 10:52:44AM -0700, Jim Mattson wrote:
>> And is it fair to good citizens that won't set reserved bits but will
>> suffer performance drop caused by the fix?
>
>Is it fair to other tenants of the host to have their data exfiltrated
>by a bad citizen, because KVM didn't control access to the MSR?

To be clear, I agree to intercept IA32_SPEC_CTRL MSR if allowing guests
to clear some bits puts host or other tenents at risk.

>> >As your colleague pointed out earlier, IA32_SPEC_CTRL.STIBP[bit 1] is
>> >such a bit. If the host has this bit set and you allow the guest to
>> >clear it, then you have compromised host security.

...

>>
>> If guest can compromise host security, I definitly agree to intercept
>> IA32_SPEC_CTRL MSR.
>
>I believe that when the decision was made to pass through this MSR for
>write, the assumption was that the host wouldn't ever use it (hence
>the host value would be zero). That assumption has not stood the test
>of time.

Could you elaborate on the security risk of guests' clearing
IA32_SPEC_CTRL.STIBP[bit 1] (or any other bit)? +Pawan

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-21  3:37           ` Chao Gao
@ 2023-07-21 19:01             ` Pawan Gupta
  2023-07-21 19:18               ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Pawan Gupta @ 2023-07-21 19:01 UTC (permalink / raw)
  To: Chao Gao; +Cc: Jim Mattson, Xiaoyao Li, kvm list

On Fri, Jul 21, 2023 at 11:37:42AM +0800, Chao Gao wrote:
> On Thu, Jul 20, 2023 at 10:52:44AM -0700, Jim Mattson wrote:
> >> And is it fair to good citizens that won't set reserved bits but will
> >> suffer performance drop caused by the fix?
> >
> >Is it fair to other tenants of the host to have their data exfiltrated
> >by a bad citizen, because KVM didn't control access to the MSR?
> 
> To be clear, I agree to intercept IA32_SPEC_CTRL MSR if allowing guests
> to clear some bits puts host or other tenents at risk.
> 
> >> >As your colleague pointed out earlier, IA32_SPEC_CTRL.STIBP[bit 1] is
> >> >such a bit. If the host has this bit set and you allow the guest to
> >> >clear it, then you have compromised host security.
> 
> ...
> 
> >>
> >> If guest can compromise host security, I definitly agree to intercept
> >> IA32_SPEC_CTRL MSR.
> >
> >I believe that when the decision was made to pass through this MSR for
> >write, the assumption was that the host wouldn't ever use it (hence
> >the host value would be zero). That assumption has not stood the test
> >of time.
> 
> Could you elaborate on the security risk of guests' clearing
> IA32_SPEC_CTRL.STIBP[bit 1] (or any other bit)? +Pawan

Please note that clearing STIBP bit on one thread does not disable STIBP
protection if the sibling has it set:

  Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
  prevents the predicted targets of indirect branches on any logical
  processor of that core from being controlled by software that executes
  (or executed previously) on another logical processor of the same core
  [1].

Also IBRS on Intel parts automatically provides STIBP protection:

  Section 2.4.1.2. IBRS: Support Based on Software Enabling [2]:

  when IA32_SPEC_CTRL.IBRS is set to 1 on any logical processors of that
  core, the predicted targets of indirect branches cannot be controlled by
  software that executes (or executed previously) on another logical
  processor of the same core.

  Section 2.4.2. Single Thread Indirect Branch Predictors (STIBP)[2]:

  Enabling IBRS prevents software operating on one logical processor
  from controlling the predicted targets of indirect branches executed
  on another logical processor. For that reason, it is not necessary to
  enable STIBP when IBRS is enabled.

So a guest disabling STIBP on one thread does not pose a security risk
to the sibling if the sibling has either STIBP or IBRS set. Note that
sensitive applications can always choose to have STIBP set for them via
the prctl() interface.

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/single-thread-indirect-branch-predictors.html

[2] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/speculative-execution-side-channel-mitigations.html#IBRS

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-21 19:01             ` Pawan Gupta
@ 2023-07-21 19:18               ` Jim Mattson
  2023-07-21 20:54                 ` Pawan Gupta
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2023-07-21 19:18 UTC (permalink / raw)
  To: Pawan Gupta; +Cc: Chao Gao, Xiaoyao Li, kvm list

On Fri, Jul 21, 2023 at 12:01 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Jul 21, 2023 at 11:37:42AM +0800, Chao Gao wrote:
> > On Thu, Jul 20, 2023 at 10:52:44AM -0700, Jim Mattson wrote:
> > >> And is it fair to good citizens that won't set reserved bits but will
> > >> suffer performance drop caused by the fix?
> > >
> > >Is it fair to other tenants of the host to have their data exfiltrated
> > >by a bad citizen, because KVM didn't control access to the MSR?
> >
> > To be clear, I agree to intercept IA32_SPEC_CTRL MSR if allowing guests
> > to clear some bits puts host or other tenents at risk.
> >
> > >> >As your colleague pointed out earlier, IA32_SPEC_CTRL.STIBP[bit 1] is
> > >> >such a bit. If the host has this bit set and you allow the guest to
> > >> >clear it, then you have compromised host security.
> >
> > ...
> >
> > >>
> > >> If guest can compromise host security, I definitly agree to intercept
> > >> IA32_SPEC_CTRL MSR.
> > >
> > >I believe that when the decision was made to pass through this MSR for
> > >write, the assumption was that the host wouldn't ever use it (hence
> > >the host value would be zero). That assumption has not stood the test
> > >of time.
> >
> > Could you elaborate on the security risk of guests' clearing
> > IA32_SPEC_CTRL.STIBP[bit 1] (or any other bit)? +Pawan
>
> Please note that clearing STIBP bit on one thread does not disable STIBP
> protection if the sibling has it set:
>
>   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
>   prevents the predicted targets of indirect branches on any logical
>   processor of that core from being controlled by software that executes
>   (or executed previously) on another logical processor of the same core
>   [1].

I stand corrected. For completeness, then, is it true now and
forevermore that passing IA32_SPEC_CTRL through to the guest for write
can in no way compromise code running on the sibling thread?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-21 19:18               ` Jim Mattson
@ 2023-07-21 20:54                 ` Pawan Gupta
  2023-07-21 22:18                   ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Pawan Gupta @ 2023-07-21 20:54 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Chao Gao, Xiaoyao Li, kvm list

On Fri, Jul 21, 2023 at 12:18:36PM -0700, Jim Mattson wrote:
> > Please note that clearing STIBP bit on one thread does not disable STIBP
> > protection if the sibling has it set:
> >
> >   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
> >   prevents the predicted targets of indirect branches on any logical
> >   processor of that core from being controlled by software that executes
> >   (or executed previously) on another logical processor of the same core
> >   [1].
> 
> I stand corrected. For completeness, then, is it true now and
> forevermore that passing IA32_SPEC_CTRL through to the guest for write
> can in no way compromise code running on the sibling thread?

As IA32_SPEC_CTRL is a thread-scope MSR, a malicious guest would be able
to turn off the mitigation on its own thread only. Looking at the
current controls in this MSR, I don't see how a malicious guest can
compromise code running on sibling thread.

But, I don't think there is a guarantee that future mitigations would
not allow a malicious guest to compromise code running on sibling. To
avoid this, care must be taken to add such mitigations to other MSRs
that are not exported to guests.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-21 20:54                 ` Pawan Gupta
@ 2023-07-21 22:18                   ` Jim Mattson
  2023-07-21 22:29                     ` Pawan Gupta
  0 siblings, 1 reply; 16+ messages in thread
From: Jim Mattson @ 2023-07-21 22:18 UTC (permalink / raw)
  To: Pawan Gupta; +Cc: Chao Gao, Xiaoyao Li, kvm list

On Fri, Jul 21, 2023 at 1:54 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Jul 21, 2023 at 12:18:36PM -0700, Jim Mattson wrote:
> > > Please note that clearing STIBP bit on one thread does not disable STIBP
> > > protection if the sibling has it set:
> > >
> > >   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
> > >   prevents the predicted targets of indirect branches on any logical
> > >   processor of that core from being controlled by software that executes
> > >   (or executed previously) on another logical processor of the same core
> > >   [1].
> >
> > I stand corrected. For completeness, then, is it true now and
> > forevermore that passing IA32_SPEC_CTRL through to the guest for write
> > can in no way compromise code running on the sibling thread?
>
> As IA32_SPEC_CTRL is a thread-scope MSR, a malicious guest would be able
> to turn off the mitigation on its own thread only. Looking at the
> current controls in this MSR, I don't see how a malicious guest can
> compromise code running on sibling thread.

Does this imply that where core-shared resources are affected (as with
STIBP), the mitigation is enabled whenever at least one thread
requests it?

> But, I don't think there is a guarantee that future mitigations would
> not allow a malicious guest to compromise code running on sibling. To
> avoid this, care must be taken to add such mitigations to other MSRs
> that are not exported to guests.

Can you make sure that the right people at Intel get that message?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-21 22:18                   ` Jim Mattson
@ 2023-07-21 22:29                     ` Pawan Gupta
  2023-07-24 19:25                       ` Pawan Gupta
  0 siblings, 1 reply; 16+ messages in thread
From: Pawan Gupta @ 2023-07-21 22:29 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Chao Gao, Xiaoyao Li, kvm list

On Fri, Jul 21, 2023 at 03:18:12PM -0700, Jim Mattson wrote:
> On Fri, Jul 21, 2023 at 1:54 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Fri, Jul 21, 2023 at 12:18:36PM -0700, Jim Mattson wrote:
> > > > Please note that clearing STIBP bit on one thread does not disable STIBP
> > > > protection if the sibling has it set:
> > > >
> > > >   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
> > > >   prevents the predicted targets of indirect branches on any logical
> > > >   processor of that core from being controlled by software that executes
> > > >   (or executed previously) on another logical processor of the same core
> > > >   [1].
> > >
> > > I stand corrected. For completeness, then, is it true now and
> > > forevermore that passing IA32_SPEC_CTRL through to the guest for write
> > > can in no way compromise code running on the sibling thread?
> >
> > As IA32_SPEC_CTRL is a thread-scope MSR, a malicious guest would be able
> > to turn off the mitigation on its own thread only. Looking at the
> > current controls in this MSR, I don't see how a malicious guest can
> > compromise code running on sibling thread.
> 
> Does this imply that where core-shared resources are affected (as with
> STIBP), the mitigation is enabled whenever at least one thread
> requests it?

Let me check with CPU architects.

> > But, I don't think there is a guarantee that future mitigations would
> > not allow a malicious guest to compromise code running on sibling. To
> > avoid this, care must be taken to add such mitigations to other MSRs
> > that are not exported to guests.
> 
> Can you make sure that the right people at Intel get that message?

Passing this message with the first query.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-21 22:29                     ` Pawan Gupta
@ 2023-07-24 19:25                       ` Pawan Gupta
  2023-07-24 20:01                         ` Jim Mattson
  0 siblings, 1 reply; 16+ messages in thread
From: Pawan Gupta @ 2023-07-24 19:25 UTC (permalink / raw)
  To: Jim Mattson; +Cc: Chao Gao, Xiaoyao Li, kvm list

On Fri, Jul 21, 2023 at 03:29:04PM -0700, Pawan Gupta wrote:
> On Fri, Jul 21, 2023 at 03:18:12PM -0700, Jim Mattson wrote:
> > On Fri, Jul 21, 2023 at 1:54 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Fri, Jul 21, 2023 at 12:18:36PM -0700, Jim Mattson wrote:
> > > > > Please note that clearing STIBP bit on one thread does not disable STIBP
> > > > > protection if the sibling has it set:
> > > > >
> > > > >   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
> > > > >   prevents the predicted targets of indirect branches on any logical
> > > > >   processor of that core from being controlled by software that executes
> > > > >   (or executed previously) on another logical processor of the same core
> > > > >   [1].
> > > >
> > > > I stand corrected. For completeness, then, is it true now and
> > > > forevermore that passing IA32_SPEC_CTRL through to the guest for write
> > > > can in no way compromise code running on the sibling thread?
> > >
> > > As IA32_SPEC_CTRL is a thread-scope MSR, a malicious guest would be able
> > > to turn off the mitigation on its own thread only. Looking at the
> > > current controls in this MSR, I don't see how a malicious guest can
> > > compromise code running on sibling thread.
> > 
> > Does this imply that where core-shared resources are affected (as with
> > STIBP), the mitigation is enabled whenever at least one thread
> > requests it?
> 
> Let me check with CPU architects.

For the controls present in IA32_SPEC_CTRL MSR, if atleast one of the
thread has the mitigation enabled, current CPUs do not disable core-wide
mitigations when core-shared resources are affected.

This will be the guiding principle for future mitigation controls that
may be added to IA32_SPEC_CTRL MSR.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD
  2023-07-24 19:25                       ` Pawan Gupta
@ 2023-07-24 20:01                         ` Jim Mattson
  0 siblings, 0 replies; 16+ messages in thread
From: Jim Mattson @ 2023-07-24 20:01 UTC (permalink / raw)
  To: Pawan Gupta; +Cc: Chao Gao, Xiaoyao Li, kvm list

On Mon, Jul 24, 2023 at 12:26 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Fri, Jul 21, 2023 at 03:29:04PM -0700, Pawan Gupta wrote:
> > On Fri, Jul 21, 2023 at 03:18:12PM -0700, Jim Mattson wrote:
> > > On Fri, Jul 21, 2023 at 1:54 PM Pawan Gupta
> > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > >
> > > > On Fri, Jul 21, 2023 at 12:18:36PM -0700, Jim Mattson wrote:
> > > > > > Please note that clearing STIBP bit on one thread does not disable STIBP
> > > > > > protection if the sibling has it set:
> > > > > >
> > > > > >   Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
> > > > > >   prevents the predicted targets of indirect branches on any logical
> > > > > >   processor of that core from being controlled by software that executes
> > > > > >   (or executed previously) on another logical processor of the same core
> > > > > >   [1].
> > > > >
> > > > > I stand corrected. For completeness, then, is it true now and
> > > > > forevermore that passing IA32_SPEC_CTRL through to the guest for write
> > > > > can in no way compromise code running on the sibling thread?
> > > >
> > > > As IA32_SPEC_CTRL is a thread-scope MSR, a malicious guest would be able
> > > > to turn off the mitigation on its own thread only. Looking at the
> > > > current controls in this MSR, I don't see how a malicious guest can
> > > > compromise code running on sibling thread.
> > >
> > > Does this imply that where core-shared resources are affected (as with
> > > STIBP), the mitigation is enabled whenever at least one thread
> > > requests it?
> >
> > Let me check with CPU architects.
>
> For the controls present in IA32_SPEC_CTRL MSR, if atleast one of the
> thread has the mitigation enabled, current CPUs do not disable core-wide
> mitigations when core-shared resources are affected.
>
> This will be the guiding principle for future mitigation controls that
> may be added to IA32_SPEC_CTRL MSR.

Excellent. Thank you!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2023-07-24 20:01 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-19 18:08 KVM's sloppiness wrt IA32_SPEC_CTRL and IA32_PRED_CMD Jim Mattson
2023-07-20  1:25 ` Xiaoyao Li
2023-07-20  1:58   ` Chao Gao
2023-07-20  4:04     ` Xiaoyao Li
2023-07-20 10:38       ` Chao Gao
2023-07-20  4:04     ` Jim Mattson
2023-07-20  8:03       ` Chao Gao
2023-07-20 17:52         ` Jim Mattson
2023-07-21  3:37           ` Chao Gao
2023-07-21 19:01             ` Pawan Gupta
2023-07-21 19:18               ` Jim Mattson
2023-07-21 20:54                 ` Pawan Gupta
2023-07-21 22:18                   ` Jim Mattson
2023-07-21 22:29                     ` Pawan Gupta
2023-07-24 19:25                       ` Pawan Gupta
2023-07-24 20:01                         ` Jim Mattson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.