* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 15:33 Liran Alon
  2018-01-09 15:56 ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Liran Alon @ 2018-01-09 15:33 UTC (permalink / raw)
  To: arjan
  Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, pbonzini,
	linux-kernel, kvm


----- arjan@linux.intel.com wrote:

> On 1/9/2018 7:00 AM, Liran Alon wrote:
> > 
> > ----- arjan@linux.intel.com wrote:
> > 
> >> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> >>> The above ("IBRS simply disables the indirect branch predictor")
> was
> >> my
> >>> take-away message from private discussion with Intel.  My guess
> is
> >> that
> >>> the vendors are just handwaving a spec that doesn't match what
> they
> >> have
> >>> implemented, because honestly a microcode update is unlikely to
> do
> >> much
> >>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
> >>> though, since the performance characteristics of IBRS are so
> >> different
> >>> from previous processors.  Let's ask Arjan who might have more
> >>> information about it, and hope he actually can disclose it...
> >>
> >> IBRS will ensure that, when set after the ring transition, no
> earlier
> >> branch prediction data is used for indirect branches while IBRS is
> >> set
> > 
> > Consider the following scenario:
> > 1. L1 runs with IBRS=1 in Ring0.
> > 2. L1 restores L2 SPEC_CTRL and enters L2.
> > 3. L1's VMRUN exits into L0, which backs up L1 SPEC_CTRL and enters L2 (using the same VMCB).
> > 4. L2 populates the BTB/BHB with values and causes a hypercall, which #VMExits into L0.
> > 5. L0 backs up L2 SPEC_CTRL and writes IBRS=1.
> > 6. L0 restores L1 SPEC_CTRL and enters L1.
> > 7. L1 backs up L2 SPEC_CTRL and writes IBRS=1.
> > 
> 
> I'm sorry I'm not familiar with your L0/L1/L2 terminology
> (maybe it's before coffee has had time to permeate the brain)

This is standard terminology for guest levels:
L0 == hypervisor that runs on bare-metal
L1 == hypervisor that runs as L0 guest.
L2 == software that runs as L1 guest.
(We are talking about nested virtualization here)

-Liran

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 15:33 [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU Liran Alon
@ 2018-01-09 15:56 ` Arjan van de Ven
  0 siblings, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2018-01-09 15:56 UTC (permalink / raw)
  To: Liran Alon
  Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, pbonzini,
	linux-kernel, kvm


>> I'm sorry I'm not familiar with your L0/L1/L2 terminology
>> (maybe it's before coffee has had time to permeate the brain)
> 
> This is standard terminology for guest levels:
> L0 == hypervisor that runs on bare-metal
> L1 == hypervisor that runs as L0 guest.
> L2 == software that runs as L1 guest.
> (We are talking about nested virtualization here)

1. I really really hope that the guests don't use IBRS but use retpoline. At least for Linux, that is going to be the preferred approach.

2. For the CPU, there really is only "bare metal" vs "guest"; all guests are "guests" no matter how deeply nested. So for the language of privilege domains etc,
nested guests equal their parent.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 21:42               ` Paolo Bonzini
@ 2018-01-09 21:59                 ` Jim Mattson
  0 siblings, 0 replies; 22+ messages in thread
From: Jim Mattson @ 2018-01-09 21:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Konrad Rzeszutek Wilk, Arjan van de Ven, Liran Alon, dwmw, bp,
	aliguori, Tom Lendacky, LKML, kvm list

It's unclear from Intel's documentation whether writing bit 0 of
IA32_SPEC_CTRL or bit 0 of IA32_PRED_CMD will flush the BHB. (At
least, it's unclear from the documentation I have.)

The retpoline patches include code for *filling* the RSB, but if you
invoke the RSB refill code from kernel text before VM-entry, you still
reveal information about KASLR to the guest. I think we may need a
copy of the RSB refill code on a dynamically allocated page. In fact,
we may need ~32 branches on that page to clobber the BHB. That means
that maybe we can't do VM-entry from kernel text (unless one of IBRS
or IBPB flushes the BHB).
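
As an illustration of the RSB-refill sequence under discussion, here is a minimal
sketch of the technique (not the actual macro from the retpoline patches; the
function name and the use of 32 iterations are illustrative, x86-64 only):

/*
 * Stuff the RSB with benign entries: each CALL pushes a return address that
 * points at a speculation trap, so a later mispredicted RET can only land in
 * the pause/jmp loop below.
 */
static inline void rsb_refill_sketch(void)
{
	asm volatile(
		".rept 32\n\t"
		"call 2f\n\t"            /* pushes a return address pointing at 1: */
		"1: pause\n\t"           /* speculation trap for stale RSB entries  */
		"jmp 1b\n\t"
		"2:\n\t"
		".endr\n\t"
		"add $(32*8), %%rsp\n\t" /* drop the 32 return addresses again      */
		::: "memory");
}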

On Tue, Jan 9, 2018 at 1:42 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 09/01/2018 21:57, Jim Mattson wrote:
>> Before VM-entry, don't we need to flush the BHB and the RSB to avoid
>> revealing KASLR information to the guest? (Thanks to Liran for
>> pointing this out.)
>
> I don't know how you would flush the BHB.  As to the RSB, that would also be
> part of generic Linux code, so I've not included it yet in this series,
> which was mostly about the new MSRs and CPUID bits.
>
> Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 20:39         ` Konrad Rzeszutek Wilk
  2018-01-09 20:47           ` Konrad Rzeszutek Wilk
@ 2018-01-09 21:56           ` Paolo Bonzini
  1 sibling, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-09 21:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Arjan van de Ven, Liran Alon, jmattson, dwmw, bp, aliguori,
	thomas.lendacky, linux-kernel, kvm

On 09/01/2018 21:39, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 09, 2018 at 05:49:08PM +0100, Paolo Bonzini wrote:
>> On 09/01/2018 17:23, Arjan van de Ven wrote:
>>> On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
>>>> On 09/01/2018 16:19, Arjan van de Ven wrote:
>>>>> On 1/9/2018 7:00 AM, Liran Alon wrote:
>>>>>>
>>>>>> ----- arjan@linux.intel.com wrote:
>>>>>>
>>>>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>>>>>>>> The above ("IBRS simply disables the indirect branch predictor")
>>>>>>>> was my
>>>>>>>> take-away message from private discussion with Intel.  My guess is
>>>>>>>> that
>>>>>>>> the vendors are just handwaving a spec that doesn't match what
>>>>>>>> they have
>>>>>>>> implemented, because honestly a microcode update is unlikely to do
>>>>>>>> much
>>>>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>>>>>>>> though, since the performance characteristics of IBRS are so
>>>>>>>> different
>>>>>>>> from previous processors.  Let's ask Arjan who might have more
>>>>>>>> information about it, and hope he actually can disclose it...
>>>>>>>
>>>>>>> IBRS will ensure that, when set after the ring transition, no earlier
>>>>>>> branch prediction data is used for indirect branches while IBRS is
>>>>>>> set
>>>>
>>>> Let me ask you my questions, which are independent of L0/L1/L2
>>>> terminology.
>>>>
>>>> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
>>>> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
>>>> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
>>>
>>> I think the CPU folks would want us to write the msr again.
>>
>> Want us, or need us---and if we don't do that, what happens?  And if we
>> have to do it, how is IBRS=1 different from an IBPB?...
> 
> Arjan says 'ring transition' but I am pretty sure it is more of a 'prediction
> mode change'. And from what I have gathered so far, moving from lower (guest)
> to higher (hypervisor) has no bearing on the branch predictor, meaning
> the guest's ring0 can attack us if we don't touch this MSR.
> 
> We have to WRMSR 0x48 to 1 to flush out lower-privilege predictions. In other words
> this is a 'reset' button, and at every 'prediction mode' change you have to hit it.

That however makes me wonder why Intel said "before transitioning to
ring 3, do WRMSR to IA32_SPEC_CTRL to clear IBRS to 0".

I have looked again at the slides I had and "IBRS all the time" seems to
require an IBPB anyway, thus making me wonder what's the point of it at
all.  Why can't we have proper indexing of the BTB by PCID and VPID, and
forget IBRS completely on newer machines?!?

> <sigh> Can we have a discussion on making a kvm-security mailing list
> where we can figure all this out during the embargo and not have these
> misunderstandings?

Being told who knows what from other companies would also have been a
start.  Instead, CVE-2017-5715 was disclosed to each partner
individually, and now _we_ are reaping what Intel has sown.

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 20:57             ` Jim Mattson
  2018-01-09 21:11               ` Konrad Rzeszutek Wilk
@ 2018-01-09 21:42               ` Paolo Bonzini
  2018-01-09 21:59                 ` Jim Mattson
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-09 21:42 UTC (permalink / raw)
  To: Jim Mattson, Konrad Rzeszutek Wilk
  Cc: Arjan van de Ven, Liran Alon, dwmw, bp, aliguori, Tom Lendacky,
	LKML, kvm list

On 09/01/2018 21:57, Jim Mattson wrote:
> Before VM-entry, don't we need to flush the BHB and the RSB to avoid
> revealing KASLR information to the guest? (Thanks to Liran for
> pointing this out.)

I don't know how you would flush the BHB.  As to the RSB, that would also be
part of generic Linux code, so I've not included it yet in this series,
which was mostly about the new MSRs and CPUID bits.

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 21:11               ` Konrad Rzeszutek Wilk
@ 2018-01-09 21:19                 ` Jim Mattson
  0 siblings, 0 replies; 22+ messages in thread
From: Jim Mattson @ 2018-01-09 21:19 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: jun.nakajima, Paolo Bonzini, Arjan van de Ven, Liran Alon, dwmw,
	bp, aliguori, Tom Lendacky, LKML, kvm list

If my documentation is up-to-date, writing IBRS does not clear the RSB
(except for parts which contain an RSB that is not filled by 32
CALLs).

On Tue, Jan 9, 2018 at 1:11 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Jan 09, 2018 at 12:57:38PM -0800, Jim Mattson wrote:
>> Before VM-entry, don't we need to flush the BHB and the RSB to avoid
>> revealing KASLR information to the guest? (Thanks to Liran for
>> pointing this out.)
>
> Exactly.
>
> Or is touching it with any value good enough?
>
> (Removing 't@char.us.oracle.com' from the email. Adding Jun.)
>>
>> On Tue, Jan 9, 2018 at 12:47 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Tue, Jan 09, 2018 at 03:39:09PM -0500, Konrad Rzeszutek Wilk wrote:
>> >> On Tue, Jan 09, 2018 at 05:49:08PM +0100, Paolo Bonzini wrote:
>> >> > On 09/01/2018 17:23, Arjan van de Ven wrote:
>> >> > > On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
>> >> > >> On 09/01/2018 16:19, Arjan van de Ven wrote:
>> >> > >>> On 1/9/2018 7:00 AM, Liran Alon wrote:
>> >> > >>>>
>> >> > >>>> ----- arjan@linux.intel.com wrote:
>> >> > >>>>
>> >> > >>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>> >> > >>>>>> The above ("IBRS simply disables the indirect branch predictor")
>> >> > >>>>>> was my
>> >> > >>>>>> take-away message from private discussion with Intel.  My guess is
>> >> > >>>>>> that
>> >> > >>>>>> the vendors are just handwaving a spec that doesn't match what
>> >> > >>>>>> they have
>> >> > >>>>>> implemented, because honestly a microcode update is unlikely to do
>> >> > >>>>>> much
>> >> > >>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>> >> > >>>>>> though, since the performance characteristics of IBRS are so
>> >> > >>>>>> different
>> >> > >>>>>> from previous processors.  Let's ask Arjan who might have more
>> >> > >>>>>> information about it, and hope he actually can disclose it...
>> >> > >>>>>
>> >> > >>>>> IBRS will ensure that, when set after the ring transition, no earlier
>> >> > >>>>> branch prediction data is used for indirect branches while IBRS is
>> >> > >>>>> set
>> >> > >>
>> >> > >> Let me ask you my questions, which are independent of L0/L1/L2
>> >> > >> terminology.
>> >> > >>
>> >> > >> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
>> >> > >> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
>> >> > >> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
>> >> > >
>> >> > > I think the CPU folks would want us to write the msr again.
>> >> >
>> >> > Want us, or need us---and if we don't do that, what happens?  And if we
>> >> > have to do it, how is IBRS=1 different from an IBPB?...
>> >>
>> >> Arjan says 'ring transition' but I am pretty sure it is more of a 'prediction
>> >> mode change'. And from what I have gathered so far, moving from lower (guest)
>> >> to higher (hypervisor) has no bearing on the branch predictor, meaning
>> >> the guest's ring0 can attack us if we don't touch this MSR.
>> >>
>> >> We have to WRMSR 0x48 to 1 to flush out lower-privilege predictions. In other words
>> >> this is a 'reset' button, and at every 'prediction mode' change you have to hit it.
>> >
>> > I suppose that means that when we VMENTER, the original fix (where we
>> > compare the host to the guest) can stay - as we are entering a lower prediction
>> > mode. I wonder then what writing 0 does to it? A nop?
>> >
>> >>
>> >>
>> >> <sigh> Can we have a discussion on making a kvm-security mailing list
>> >> where we can figure all this out during the embargo and not have these
>> >> misunderstandings?
>> >>
>> >> >
>> >> > Since I am at it, what happens on *current generation* CPUs if you
>> >> > always leave IBRS=1?  Slow and safe, or fast and unsafe?
>> >> >
>> >> > >> 2) How will the future processors work where IBRS should always be =1?
>> >> > >
>> >> > > IBRS=1 should be "fire and forget this ever happened".
>> >> > > This is the only time anyone should use IBRS in practice
>> >> >
>> >> > And IBPB too I hope?  But besides that, I need to know exactly how that
>> >> > is implemented to ensure that it's doing the right thing.
>> >> >
>> >> > > (and then the host turns it on and makes sure to not expose it to the
>> >> > > guests I hope)
>> >> >
>> >> > That's not that easy, because guests might have support for SPEC_CTRL
>> >> > but not for IA32_ARCH_CAPABILITIES.
>> >> >
>> >> > You could disable the SPEC_CTRL bit, but then the guest might think it
>> >> > is not secure.  It might also actually *be* insecure, if you migrated to
>> >> > an older CPU where IBRS is not fire-and-forget.
>> >> >
>> >> > Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 20:57             ` Jim Mattson
@ 2018-01-09 21:11               ` Konrad Rzeszutek Wilk
  2018-01-09 21:19                 ` Jim Mattson
  2018-01-09 21:42               ` Paolo Bonzini
  1 sibling, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-01-09 21:11 UTC (permalink / raw)
  To: Jim Mattson, jun.nakajima
  Cc: Paolo Bonzini, Arjan van de Ven, Liran Alon, dwmw, bp, aliguori,
	Tom Lendacky, LKML, kvm list

On Tue, Jan 09, 2018 at 12:57:38PM -0800, Jim Mattson wrote:
> Before VM-entry, don't we need to flush the BHB and the RSB to avoid
> revealing KASLR information to the guest? (Thanks to Liran for
> pointing this out.)

Exactly.

Or is touching it with any value good enough?

(Removing 't@char.us.oracle.com' from the email. Adding Jun.)
> 
> On Tue, Jan 9, 2018 at 12:47 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Tue, Jan 09, 2018 at 03:39:09PM -0500, Konrad Rzeszutek Wilk wrote:
> >> On Tue, Jan 09, 2018 at 05:49:08PM +0100, Paolo Bonzini wrote:
> >> > On 09/01/2018 17:23, Arjan van de Ven wrote:
> >> > > On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
> >> > >> On 09/01/2018 16:19, Arjan van de Ven wrote:
> >> > >>> On 1/9/2018 7:00 AM, Liran Alon wrote:
> >> > >>>>
> >> > >>>> ----- arjan@linux.intel.com wrote:
> >> > >>>>
> >> > >>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> >> > >>>>>> The above ("IBRS simply disables the indirect branch predictor")
> >> > >>>>>> was my
> >> > >>>>>> take-away message from private discussion with Intel.  My guess is
> >> > >>>>>> that
> >> > >>>>>> the vendors are just handwaving a spec that doesn't match what
> >> > >>>>>> they have
> >> > >>>>>> implemented, because honestly a microcode update is unlikely to do
> >> > >>>>>> much
> >> > >>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
> >> > >>>>>> though, since the performance characteristics of IBRS are so
> >> > >>>>>> different
> >> > >>>>>> from previous processors.  Let's ask Arjan who might have more
> >> > >>>>>> information about it, and hope he actually can disclose it...
> >> > >>>>>
> >> > >>>>> IBRS will ensure that, when set after the ring transition, no earlier
> >> > >>>>> branch prediction data is used for indirect branches while IBRS is
> >> > >>>>> set
> >> > >>
> >> > >> Let me ask you my questions, which are independent of L0/L1/L2
> >> > >> terminology.
> >> > >>
> >> > >> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
> >> > >> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
> >> > >> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
> >> > >
> >> > > I think the CPU folks would want us to write the msr again.
> >> >
> >> > Want us, or need us---and if we don't do that, what happens?  And if we
> >> > have to do it, how is IBRS=1 different from an IBPB?...
> >>
> >> Arjan says 'ring transition' but I am pretty sure it is more of a 'prediction
> >> mode change'. And from what I have gathered so far, moving from lower (guest)
> >> to higher (hypervisor) has no bearing on the branch predictor, meaning
> >> the guest's ring0 can attack us if we don't touch this MSR.
> >>
> >> We have to WRMSR 0x48 to 1 to flush out lower-privilege predictions. In other words
> >> this is a 'reset' button, and at every 'prediction mode' change you have to hit it.
> >
> > I suppose that means that when we VMENTER, the original fix (where we
> > compare the host to the guest) can stay - as we are entering a lower prediction
> > mode. I wonder then what writing 0 does to it? A nop?
> >
> >>
> >>
> >> <sigh> Can we have a discussion on making a kvm-security mailing list
> >> where we can figure all this out during the embargo and not have these
> >> misunderstandings?
> >>
> >> >
> >> > Since I am at it, what happens on *current generation* CPUs if you
> >> > always leave IBRS=1?  Slow and safe, or fast and unsafe?
> >> >
> >> > >> 2) How will the future processors work where IBRS should always be =1?
> >> > >
> >> > > IBRS=1 should be "fire and forget this ever happened".
> >> > > This is the only time anyone should use IBRS in practice
> >> >
> >> > And IBPB too I hope?  But besides that, I need to know exactly how that
> >> > is implemented to ensure that it's doing the right thing.
> >> >
> >> > > (and then the host turns it on and makes sure to not expose it to the
> >> > > guests I hope)
> >> >
> >> > That's not that easy, because guests might have support for SPEC_CTRL
> >> > but not for IA32_ARCH_CAPABILITIES.
> >> >
> >> > You could disable the SPEC_CTRL bit, but then the guest might think it
> >> > is not secure.  It might also actually *be* insecure, if you migrated to
> >> > an older CPU where IBRS is not fire-and-forget.
> >> >
> >> > Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 20:47           ` Konrad Rzeszutek Wilk
@ 2018-01-09 20:57             ` Jim Mattson
  2018-01-09 21:11               ` Konrad Rzeszutek Wilk
  2018-01-09 21:42               ` Paolo Bonzini
  0 siblings, 2 replies; 22+ messages in thread
From: Jim Mattson @ 2018-01-09 20:57 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: Paolo Bonzini, t, Arjan van de Ven, Liran Alon, dwmw, bp,
	aliguori, Tom Lendacky, LKML, kvm list

Before VM-entry, don't we need to flush the BHB and the RSB to avoid
revealing KASLR information to the guest? (Thanks to Liran for
pointing this out.)

On Tue, Jan 9, 2018 at 12:47 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Jan 09, 2018 at 03:39:09PM -0500, Konrad Rzeszutek Wilk wrote:
>> On Tue, Jan 09, 2018 at 05:49:08PM +0100, Paolo Bonzini wrote:
>> > On 09/01/2018 17:23, Arjan van de Ven wrote:
>> > > On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
>> > >> On 09/01/2018 16:19, Arjan van de Ven wrote:
>> > >>> On 1/9/2018 7:00 AM, Liran Alon wrote:
>> > >>>>
>> > >>>> ----- arjan@linux.intel.com wrote:
>> > >>>>
>> > >>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>> > >>>>>> The above ("IBRS simply disables the indirect branch predictor")
>> > >>>>>> was my
>> > >>>>>> take-away message from private discussion with Intel.  My guess is
>> > >>>>>> that
>> > >>>>>> the vendors are just handwaving a spec that doesn't match what
>> > >>>>>> they have
>> > >>>>>> implemented, because honestly a microcode update is unlikely to do
>> > >>>>>> much
>> > >>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>> > >>>>>> though, since the performance characteristics of IBRS are so
>> > >>>>>> different
>> > >>>>>> from previous processors.  Let's ask Arjan who might have more
>> > >>>>>> information about it, and hope he actually can disclose it...
>> > >>>>>
>> > >>>>> IBRS will ensure that, when set after the ring transition, no earlier
>> > >>>>> branch prediction data is used for indirect branches while IBRS is
>> > >>>>> set
>> > >>
>> > >> Let me ask you my questions, which are independent of L0/L1/L2
>> > >> terminology.
>> > >>
>> > >> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
>> > >> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
>> > >> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
>> > >
>> > > I think the CPU folks would want us to write the msr again.
>> >
>> > Want us, or need us---and if we don't do that, what happens?  And if we
>> > have to do it, how is IBRS=1 different from an IBPB?...
>>
>> Arjan says 'ring transition' but I am pretty sure it is more of a 'prediction
>> mode change'. And from what I have gathered so far, moving from lower (guest)
>> to higher (hypervisor) has no bearing on the branch predictor, meaning
>> the guest's ring0 can attack us if we don't touch this MSR.
>>
>> We have to WRMSR 0x48 to 1 to flush out lower-privilege predictions. In other words
>> this is a 'reset' button, and at every 'prediction mode' change you have to hit it.
>
> I suppose that means that when we VMENTER, the original fix (where we
> compare the host to the guest) can stay - as we are entering a lower prediction
> mode. I wonder then what writing 0 does to it? A nop?
>
>>
>>
>> <sigh> Can we have a discussion on making a kvm-security mailing list
>> where we can figure all this out during the embargo and not have these
>> misunderstandings?
>>
>> >
>> > Since I am at it, what happens on *current generation* CPUs if you
>> > always leave IBRS=1?  Slow and safe, or fast and unsafe?
>> >
>> > >> 2) How will the future processors work where IBRS should always be =1?
>> > >
>> > > IBRS=1 should be "fire and forget this ever happened".
>> > > This is the only time anyone should use IBRS in practice
>> >
>> > And IBPB too I hope?  But besides that, I need to know exactly how that
>> > is implemented to ensure that it's doing the right thing.
>> >
>> > > (and then the host turns it on and makes sure to not expose it to the
>> > > guests I hope)
>> >
>> > That's not that easy, because guests might have support for SPEC_CTRL
>> > but not for IA32_ARCH_CAPABILITIES.
>> >
>> > You could disable the SPEC_CTRL bit, but then the guest might think it
>> > is not secure.  It might also actually *be* insecure, if you migrated to
>> > an older CPU where IBRS is not fire-and-forget.
>> >
>> > Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 20:39         ` Konrad Rzeszutek Wilk
@ 2018-01-09 20:47           ` Konrad Rzeszutek Wilk
  2018-01-09 20:57             ` Jim Mattson
  2018-01-09 21:56           ` Paolo Bonzini
  1 sibling, 1 reply; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-01-09 20:47 UTC (permalink / raw)
  To: Paolo Bonzini, t
  Cc: Arjan van de Ven, Liran Alon, jmattson, dwmw, bp, aliguori,
	thomas.lendacky, linux-kernel, kvm

On Tue, Jan 09, 2018 at 03:39:09PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Jan 09, 2018 at 05:49:08PM +0100, Paolo Bonzini wrote:
> > On 09/01/2018 17:23, Arjan van de Ven wrote:
> > > On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
> > >> On 09/01/2018 16:19, Arjan van de Ven wrote:
> > >>> On 1/9/2018 7:00 AM, Liran Alon wrote:
> > >>>>
> > >>>> ----- arjan@linux.intel.com wrote:
> > >>>>
> > >>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> > >>>>>> The above ("IBRS simply disables the indirect branch predictor")
> > >>>>>> was my
> > >>>>>> take-away message from private discussion with Intel.  My guess is
> > >>>>>> that
> > >>>>>> the vendors are just handwaving a spec that doesn't match what
> > >>>>>> they have
> > >>>>>> implemented, because honestly a microcode update is unlikely to do
> > >>>>>> much
> > >>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
> > >>>>>> though, since the performance characteristics of IBRS are so
> > >>>>>> different
> > >>>>>> from previous processors.  Let's ask Arjan who might have more
> > >>>>>> information about it, and hope he actually can disclose it...
> > >>>>>
> > >>>>> IBRS will ensure that, when set after the ring transition, no earlier
> > >>>>> branch prediction data is used for indirect branches while IBRS is
> > >>>>> set
> > >>
> > >> Let me ask you my questions, which are independent of L0/L1/L2
> > >> terminology.
> > >>
> > >> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
> > >> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
> > >> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
> > > 
> > > I think the CPU folks would want us to write the msr again.
> > 
> > Want us, or need us---and if we don't do that, what happens?  And if we
> > have to do it, how is IBRS=1 different from an IBPB?...
> 
> Arjan says 'ring transition' but I am pretty sure it is more of a 'prediction
> mode change'. And from what I have gathered so far, moving from lower (guest)
> to higher (hypervisor) has no bearing on the branch predictor, meaning
> the guest's ring0 can attack us if we don't touch this MSR.
> 
> We have to WRMSR 0x48 to 1 to flush out lower-privilege predictions. In other words
> this is a 'reset' button, and at every 'prediction mode' change you have to hit it.

I suppose that means that when we VMENTER, the original fix (where we
compare the host to the guest) can stay - as we are entering a lower prediction
mode. I wonder then what writing 0 does to it? A nop?

> 
> 
> <sigh> Can we have a discussion on making a kvm-security mailing list
> where we can figure all this out during the embargo and not have these
> misunderstandings?
> 
> > 
> > Since I am at it, what happens on *current generation* CPUs if you
> > always leave IBRS=1?  Slow and safe, or fast and unsafe?
> > 
> > >> 2) How will the future processors work where IBRS should always be =1?
> > > 
> > > IBRS=1 should be "fire and forget this ever happened".
> > > This is the only time anyone should use IBRS in practice
> > 
> > And IBPB too I hope?  But besides that, I need to know exactly how that
> > is implemented to ensure that it's doing the right thing.
> > 
> > > (and then the host turns it on and makes sure to not expose it to the
> > > guests I hope)
> > 
> > That's not that easy, because guests might have support for SPEC_CTRL
> > but not for IA32_ARCH_CAPABILITIES.
> > 
> > You could disable the SPEC_CTRL bit, but then the guest might think it
> > is not secure.  It might also actually *be* insecure, if you migrated to
> > an older CPU where IBRS is not fire-and-forget.
> > 
> > Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 16:49       ` Paolo Bonzini
@ 2018-01-09 20:39         ` Konrad Rzeszutek Wilk
  2018-01-09 20:47           ` Konrad Rzeszutek Wilk
  2018-01-09 21:56           ` Paolo Bonzini
  0 siblings, 2 replies; 22+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-01-09 20:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Arjan van de Ven, Liran Alon, jmattson, dwmw, bp, aliguori,
	thomas.lendacky, linux-kernel, kvm

On Tue, Jan 09, 2018 at 05:49:08PM +0100, Paolo Bonzini wrote:
> On 09/01/2018 17:23, Arjan van de Ven wrote:
> > On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
> >> On 09/01/2018 16:19, Arjan van de Ven wrote:
> >>> On 1/9/2018 7:00 AM, Liran Alon wrote:
> >>>>
> >>>> ----- arjan@linux.intel.com wrote:
> >>>>
> >>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> >>>>>> The above ("IBRS simply disables the indirect branch predictor")
> >>>>>> was my
> >>>>>> take-away message from private discussion with Intel.  My guess is
> >>>>>> that
> >>>>>> the vendors are just handwaving a spec that doesn't match what
> >>>>>> they have
> >>>>>> implemented, because honestly a microcode update is unlikely to do
> >>>>>> much
> >>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
> >>>>>> though, since the performance characteristics of IBRS are so
> >>>>>> different
> >>>>>> from previous processors.  Let's ask Arjan who might have more
> >>>>>> information about it, and hope he actually can disclose it...
> >>>>>
> >>>>> IBRS will ensure that, when set after the ring transition, no earlier
> >>>>> branch prediction data is used for indirect branches while IBRS is
> >>>>> set
> >>
> >> Let me ask you my questions, which are independent of L0/L1/L2
> >> terminology.
> >>
> >> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
> >> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
> >> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
> > 
> > I think the CPU folks would want us to write the msr again.
> 
> Want us, or need us---and if we don't do that, what happens?  And if we
> have to do it, how is IBRS=1 different from an IBPB?...

Arjan says 'ring transition' but I am pretty sure it is more of a 'prediction
mode change'. And from what I have gathered so far, moving from lower (guest)
to higher (hypervisor) has no bearing on the branch predictor, meaning
the guest's ring0 can attack us if we don't touch this MSR.

We have to WRMSR 0x48 to 1 to flush out lower-privilege predictions. In other words
this is a 'reset' button, and at every 'prediction mode' change you have to hit it.
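
Concretely, the 'reset button' described above is a write of MSR 0x48
(IA32_SPEC_CTRL) with bit 0 (IBRS) set. A hedged sketch of such a write,
with illustrative names rather than the actual patch code:

#include <stdint.h>

#define MSR_IA32_SPEC_CTRL  0x00000048   /* IA32_SPEC_CTRL */
#define SPEC_CTRL_IBRS      (1ULL << 0)  /* bit 0: IBRS    */

static inline void wrmsr_sketch(uint32_t msr, uint64_t val)
{
	asm volatile("wrmsr"
		     : : "c" (msr), "a" ((uint32_t)val), "d" ((uint32_t)(val >> 32))
		     : "memory");
}

/* Hit the 'reset button' when entering a higher prediction mode,
 * e.g. right after a #VMEXIT and before any indirect branch. */
static inline void ibrs_barrier_sketch(void)
{
	wrmsr_sketch(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS);
}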


<sigh> Can we have a discussion on making a kvm-security mailing list
where we can figure all this out during the embargo and not have these
misunderstandings?

> 
> Since I am at it, what happens on *current generation* CPUs if you
> always leave IBRS=1?  Slow and safe, or fast and unsafe?
> 
> >> 2) How will the future processors work where IBRS should always be =1?
> > 
> > IBRS=1 should be "fire and forget this ever happened".
> > This is the only time anyone should use IBRS in practice
> 
> And IBPB too I hope?  But besides that, I need to know exactly how that
> is implemented to ensure that it's doing the right thing.
> 
> > (and then the host turns it on and makes sure to not expose it to the
> > guests I hope)
> 
> That's not that easy, because guests might have support for SPEC_CTRL
> but not for IA32_ARCH_CAPABILITIES.
> 
> You could disable the SPEC_CTRL bit, but then the guest might think it
> is not secure.  It might also actually *be* insecure, if you migrated to
> an older CPU where IBRS is not fire-and-forget.
> 
> Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 16:23     ` Arjan van de Ven
@ 2018-01-09 16:49       ` Paolo Bonzini
  2018-01-09 20:39         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-09 16:49 UTC (permalink / raw)
  To: Arjan van de Ven, Liran Alon
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, linux-kernel, kvm

On 09/01/2018 17:23, Arjan van de Ven wrote:
> On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
>> On 09/01/2018 16:19, Arjan van de Ven wrote:
>>> On 1/9/2018 7:00 AM, Liran Alon wrote:
>>>>
>>>> ----- arjan@linux.intel.com wrote:
>>>>
>>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>>>>>> The above ("IBRS simply disables the indirect branch predictor")
>>>>>> was my
>>>>>> take-away message from private discussion with Intel.  My guess is
>>>>>> that
>>>>>> the vendors are just handwaving a spec that doesn't match what
>>>>>> they have
>>>>>> implemented, because honestly a microcode update is unlikely to do
>>>>>> much
>>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>>>>>> though, since the performance characteristics of IBRS are so
>>>>>> different
>>>>>> from previous processors.  Let's ask Arjan who might have more
>>>>>> information about it, and hope he actually can disclose it...
>>>>>
>>>>> IBRS will ensure that, when set after the ring transition, no earlier
>>>>> branch prediction data is used for indirect branches while IBRS is
>>>>> set
>>
>> Let me ask you my questions, which are independent of L0/L1/L2
>> terminology.
>>
>> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
>> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
>> the host will not do a wrmsr on exit.  Is this safe for the host kernel?
> 
> I think the CPU folks would want us to write the msr again.

Want us, or need us---and if we don't do that, what happens?  And if we
have to do it, how is IBRS=1 different from an IBPB?...

Since I am at it, what happens on *current generation* CPUs if you
always leave IBRS=1?  Slow and safe, or fast and unsafe?

>> 2) How will the future processors work where IBRS should always be =1?
> 
> IBRS=1 should be "fire and forget this ever happened".
> This is the only time anyone should use IBRS in practice

And IBPB too I hope?  But besides that, I need to know exactly how that
is implemented to ensure that it's doing the right thing.

> (and then the host turns it on and makes sure to not expose it to the
> guests I hope)

That's not that easy, because guests might have support for SPEC_CTRL
but not for IA32_ARCH_CAPABILITIES.

You could disable the SPEC_CTRL bit, but then the guest might think it
is not secure.  It might also actually *be* insecure, if you migrated to
an older CPU where IBRS is not fire-and-forget.

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 16:17   ` Paolo Bonzini
@ 2018-01-09 16:23     ` Arjan van de Ven
  2018-01-09 16:49       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2018-01-09 16:23 UTC (permalink / raw)
  To: Paolo Bonzini, Liran Alon
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, linux-kernel, kvm

On 1/9/2018 8:17 AM, Paolo Bonzini wrote:
> On 09/01/2018 16:19, Arjan van de Ven wrote:
>> On 1/9/2018 7:00 AM, Liran Alon wrote:
>>>
>>> ----- arjan@linux.intel.com wrote:
>>>
>>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>>>>> The above ("IBRS simply disables the indirect branch predictor") was my
>>>>> take-away message from private discussion with Intel.  My guess is that
>>>>> the vendors are just handwaving a spec that doesn't match what they have
>>>>> implemented, because honestly a microcode update is unlikely to do much
>>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>>>>> though, since the performance characteristics of IBRS are so different
>>>>> from previous processors.  Let's ask Arjan who might have more
>>>>> information about it, and hope he actually can disclose it...
>>>>
>>>> IBRS will ensure that, when set after the ring transition, no earlier
>>>> branch prediction data is used for indirect branches while IBRS is
>>>> set
> 
> Let me ask you my questions, which are independent of L0/L1/L2 terminology.
> 
> 1) Is vmentry/vmexit considered a ring transition, even if the guest is
> running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
> the host will not do a wrmsr on exit.  Is this safe for the host kernel?

I think the CPU folks would want us to write the msr again.


> 2) How will the future processors work where IBRS should always be =1?

IBRS=1 should be "fire and forget this ever happened".
This is the only time anyone should use IBRS in practice
(and then the host turns it on and makes sure to not expose it to the guests I hope)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 15:19 ` Arjan van de Ven
@ 2018-01-09 16:17   ` Paolo Bonzini
  2018-01-09 16:23     ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-09 16:17 UTC (permalink / raw)
  To: Arjan van de Ven, Liran Alon
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, linux-kernel, kvm

On 09/01/2018 16:19, Arjan van de Ven wrote:
> On 1/9/2018 7:00 AM, Liran Alon wrote:
>>
>> ----- arjan@linux.intel.com wrote:
>>
>>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>>>> The above ("IBRS simply disables the indirect branch predictor") was my
>>>> take-away message from private discussion with Intel.  My guess is that
>>>> the vendors are just handwaving a spec that doesn't match what they have
>>>> implemented, because honestly a microcode update is unlikely to do much
>>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>>>> though, since the performance characteristics of IBRS are so different
>>>> from previous processors.  Let's ask Arjan who might have more
>>>> information about it, and hope he actually can disclose it...
>>>
>>> IBRS will ensure that, when set after the ring transition, no earlier
>>> branch prediction data is used for indirect branches while IBRS is
>>> set

Let me ask you my questions, which are independent of L0/L1/L2 terminology.

1) Is vmentry/vmexit considered a ring transition, even if the guest is
running in ring 0?  If IBRS=1 in the guest and the host is using IBRS,
the host will not do a wrmsr on exit.  Is this safe for the host kernel?

2) How will the future processors work where IBRS should always be =1?
Will they still need IBPB?  If I get a vmexit from a guest with IBRS=1,
and do a vmentry to the same VMCS *but with a different VPID*, will the
code running after the vmentry share BTB entries with code running
before the vmexit?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 16:01 Liran Alon
  0 siblings, 0 replies; 22+ messages in thread
From: Liran Alon @ 2018-01-09 16:01 UTC (permalink / raw)
  To: arjan
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, pbonzini,
	linux-kernel, kvm


----- arjan@linux.intel.com wrote:

> >> I'm sorry I'm not familiar with your L0/L1/L2 terminology
> >> (maybe it's before coffee has had time to permeate the brain)
> > 
> > This is standard terminology for guest levels:
> > L0 == hypervisor that runs on bare-metal
> > L1 == hypervisor that runs as L0 guest.
> > L2 == software that runs as L1 guest.
> > (We are talking about nested virtualization here)
> 
> 1. I really really hope that the guests don't use IBRS but use
> retpoline. At least for Linux, that is going to be the preferred
> approach.
> 
> 2. For the CPU, there really is only "bare metal" vs "guest"; all
> guests are "guests" no matter how deeply nested. So for the language
> of privilege domains etc,
> nested guests equal their parent.

So in the scenario I mentioned above, would L1 use BTB/BHB entries inserted by L2?
To me it seems that it would, if IBRS takes prediction-mode into consideration.
And therefore, we must issue an IBPB when switching between L1 & L2,
the same as happens on nVMX when switching between vmcs01 & vmcs02.
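
For reference, the IBPB barrier referred to here is a write of bit 0 of MSR
0x49 (IA32_PRED_CMD). A rough sketch of such a barrier on the L1<->L2 switch,
with illustrative names (not the actual KVM code):

#include <stdint.h>

#define MSR_IA32_PRED_CMD  0x00000049   /* IA32_PRED_CMD        */
#define PRED_CMD_IBPB      (1ULL << 0)  /* bit 0: issue an IBPB */

static inline void wrmsr_sketch(uint32_t msr, uint64_t val)
{
	asm volatile("wrmsr"
		     : : "c" (msr), "a" ((uint32_t)val), "d" ((uint32_t)(val >> 32))
		     : "memory");
}

/* Barrier on the switch between L1 and L2, so branch targets trained by
 * one level cannot steer indirect branches at the other level. */
static inline void ibpb_on_nested_switch_sketch(void)
{
	wrmsr_sketch(MSR_IA32_PRED_CMD, PRED_CMD_IBPB);
}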

-Liran

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 15:00 Liran Alon
@ 2018-01-09 15:19 ` Arjan van de Ven
  2018-01-09 16:17   ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2018-01-09 15:19 UTC (permalink / raw)
  To: Liran Alon
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, pbonzini,
	linux-kernel, kvm

On 1/9/2018 7:00 AM, Liran Alon wrote:
> 
> ----- arjan@linux.intel.com wrote:
> 
>> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
>>> The above ("IBRS simply disables the indirect branch predictor") was
>> my
>>> take-away message from private discussion with Intel.  My guess is
>> that
>>> the vendors are just handwaving a spec that doesn't match what they
>> have
>>> implemented, because honestly a microcode update is unlikely to do
>> much
>>> more than an old-fashioned chicken bit.  Maybe on Skylake it does
>>> though, since the performance characteristics of IBRS are so
>> different
>>> from previous processors.  Let's ask Arjan who might have more
>>> information about it, and hope he actually can disclose it...
>>
>> IBRS will ensure that, when set after the ring transition, no earlier
>> branch prediction data is used for indirect branches while IBRS is
>> set
> 
> Consider the following scenario:
> 1. L1 runs with IBRS=1 in Ring0.
> 2. L1 restores L2 SPEC_CTRL and enters L2.
> 3. L1's VMRUN exits into L0, which backs up L1 SPEC_CTRL and enters L2 (using the same VMCB).
> 4. L2 populates the BTB/BHB with values and causes a hypercall, which #VMExits into L0.
> 5. L0 backs up L2 SPEC_CTRL and writes IBRS=1.
> 6. L0 restores L1 SPEC_CTRL and enters L1.
> 7. L1 backs up L2 SPEC_CTRL and writes IBRS=1.
> 

I'm sorry I'm not familiar with your L0/L1/L2 terminology
(maybe it's before coffee has had time to permeate the brain)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 15:00 Liran Alon
  2018-01-09 15:19 ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Liran Alon @ 2018-01-09 15:00 UTC (permalink / raw)
  To: arjan
  Cc: jmattson, dwmw, bp, aliguori, thomas.lendacky, pbonzini,
	linux-kernel, kvm


----- arjan@linux.intel.com wrote:

> On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> > The above ("IBRS simply disables the indirect branch predictor") was
> my
> > take-away message from private discussion with Intel.  My guess is
> that
> > the vendors are just handwaving a spec that doesn't match what they
> have
> > implemented, because honestly a microcode update is unlikely to do
> much
> > more than an old-fashioned chicken bit.  Maybe on Skylake it does
> > though, since the performance characteristics of IBRS are so
> different
> > from previous processors.  Let's ask Arjan who might have more
> > information about it, and hope he actually can disclose it...
> 
> IBRS will ensure that, when set after the ring transition, no earlier
> branch prediction data is used for indirect branches while IBRS is
> set

Consider the following scenario:
1. L1 runs with IBRS=1 in Ring0.
2. L1 restores L2 SPEC_CTRL and enters L2.
3. L1's VMRUN exits into L0, which backs up L1 SPEC_CTRL and enters L2 (using the same VMCB).
4. L2 populates the BTB/BHB with values and causes a hypercall, which #VMExits into L0.
5. L0 backs up L2 SPEC_CTRL and writes IBRS=1.
6. L0 restores L1 SPEC_CTRL and enters L1.
7. L1 backs up L2 SPEC_CTRL and writes IBRS=1.

Just to make sure I understand:
You state that L2's BTB/BHB entries won't be used by L1 because L1 has set IBRS=1 in step (7).
And that is even though L1 & L2 could both be running in SVM guest-mode & Ring0 from the physical CPU's perspective, and therefore in the same prediction-mode.

So basically you are saying that IBRS doesn't make sure to avoid using BTB/BHB entries from lower prediction-modes specifically, but instead just makes sure to avoid usage of all BTB/BHB entries while IBRS is set.

Did I understand everything correctly?

Thanks,
-Liran

> 
> (this is an English summary of two pages of technical spec, so it lacks
> language-lawyer precision)
> 
> because of this promise, the implementation tends to be impactful
> and it is very strongly recommended that retpoline is used instead of
> IBRS.
> (with all the caveats already on lkml)
> 
> the IBPB is different: it is a convenient thing for switching between
> VM guests etc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 11:41 ` Paolo Bonzini
@ 2018-01-09 14:30   ` Arjan van de Ven
  0 siblings, 0 replies; 22+ messages in thread
From: Arjan van de Ven @ 2018-01-09 14:30 UTC (permalink / raw)
  To: Paolo Bonzini, Liran Alon
  Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, linux-kernel, kvm

On 1/9/2018 3:41 AM, Paolo Bonzini wrote:
> The above ("IBRS simply disables the indirect branch predictor") was my
> take-away message from private discussion with Intel.  My guess is that
> the vendors are just handwaving a spec that doesn't match what they have
> implemented, because honestly a microcode update is unlikely to do much
> more than an old-fashioned chicken bit.  Maybe on Skylake it does
> though, since the performance characteristics of IBRS are so different
> from previous processors.  Let's ask Arjan who might have more
> information about it, and hope he actually can disclose it...

IBRS will ensure that, when set after the ring transition, no earlier
branch prediction data is used for indirect branches while IBRS is set

(this is an English summary of two pages of technical spec, so it lacks
language-lawyer precision)

because of this promise, the implementation tends to be impactful
and it is very strongly recommended that retpoline is used instead of IBRS.
(with all the caveats already on lkml)
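
For readers unfamiliar with the construct, a retpoline replaces an indirect
jump with a call/ret pair whose speculative path is captured in a harmless
loop. A rough sketch of the shape of such a thunk (the symbol name is
illustrative; the real implementation lives in the retpoline patches):

/* Jump here instead of doing 'jmp *%rax'. */
asm(
"	.text\n"
"	.globl retpoline_rax_sketch\n"
"retpoline_rax_sketch:\n"
"	call 1f\n"              /* RSB entry for the ret below points at 2:    */
"2:	pause\n"                /* speculative execution of the ret spins here  */
"	lfence\n"
"	jmp 2b\n"
"1:	mov %rax, (%rsp)\n"     /* overwrite return address with real target    */
"	ret\n"                  /* architecturally jumps to *%rax               */
);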

the IBPB is different: it is a convenient thing for switching between VM guests etc.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-09 11:31 Liran Alon
@ 2018-01-09 11:41 ` Paolo Bonzini
  2018-01-09 14:30   ` Arjan van de Ven
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-09 11:41 UTC (permalink / raw)
  To: Liran Alon
  Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, linux-kernel, kvm,
	Arjan van de Ven

On 09/01/2018 12:31, Liran Alon wrote:
>> On current generation processors, as I understand it, IBRS simply 
>> disables the indirect branch predictor when set to 1.  Therefore,
>> as
> This is not how I understood what IBRS does.
> 
> Intel (not sure about AMD) states that if IBRS is set, indirect
> branches will not allow their predicted target address to be
> controlled by code that executed in a less privileged prediction-mode
> before the IBRS was last written with a value of 1. (Intel also
> states that thus an indirect branch may be affected by code in a less
> privileged prediction-mode that executed AFTER IBRS mode was last
> written with a value of 1).

It's difficult to say, because AMD microcode for IBRS does not exist.
Is it even documented whether VMX/SVM non-root mode is a different
physical prediction mode?

Generally I agree you'd go by what the spec says, but we don't really
have a final spec from the vendors.  The microarchitectural level
matters much more than the spec in this case, in my opinion: what current
microcode patches do, and how future processors plan to close the leak
once and for all.

The above ("IBRS simply disables the indirect branch predictor") was my
take-away message from private discussion with Intel.  My guess is that
the vendors are just handwaving a spec that doesn't match what they have
implemented, because honestly a microcode update is unlikely to do much
more than an old-fashioned chicken bit.  Maybe on Skylake it does
though, since the performance characteristics of IBRS are so different
from previous processors.  Let's ask Arjan who might have more
information about it, and hope he actually can disclose it...

Paolo

> Therefore, L2 could also inject BTB/BHB entries that will be used by
> L1: Consider the case that L2 injects values into the BTB/BHB and then
> issues a hypercall which will cause a #VMExit that will be forwarded
> to L1. Because L2 & L1 run in the same prediction-mode (both Ring0
> SVM guest-mode) from the physical CPU's perspective, Ring0 L1 code could be
> using BTB/BHB entries injected by Ring0 L2. (The fact that L0 has
> set IBRS to 1 when exiting from L2 to L0 doesn't prevent those
> entries from being used by L1 after L0 enters L1.)
> 
> This is different from what happens in the non-nested case, as L0 & L1
> run in different physical prediction-modes and therefore setting
> IBRS=1 on every #VMExit should suffice.
> 
> Therefore, I don't understand how L1 setting IBRS to 1 will help it
> in this case. Maybe this mechanism works differently on AMD?
> 
>> Future processors might have a mode that says "just set IBRS=1 and
>> I'll DTRT".  If that's done by indexing the BTB using the full virtual
>> address, the CPL _and_ the ASID/PCID/VPID, then IBPB is not needed
>> here because the L2 VM uses a different ASID.  If that's done by only
>> augmenting the BTB index with the CPL, we'd need an IBPB.  But in
>> this new world we've been thrown into, that would be a processor bug...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
@ 2018-01-09 11:31 Liran Alon
  2018-01-09 11:41 ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Liran Alon @ 2018-01-09 11:31 UTC (permalink / raw)
  To: pbonzini; +Cc: jmattson, dwmw, bp, thomas.lendacky, aliguori, linux-kernel, kvm


----- pbonzini@redhat.com wrote:

> On 08/01/2018 21:00, Liran Alon wrote:
> > 
> > 
> > On 08/01/18 20:08, Paolo Bonzini wrote:
> >> From: Tom Lendacky <thomas.lendacky@amd.com>
> >>
> >> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU
> is
> >> going to run a VCPU different from what was previously run. 
> Nested
> >> virtualization uses the same VMCB for the second level guest, but
> the
> >> L1 hypervisor should be using IBRS to protect itself.
> >>
> >> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >> ---
> >>   arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
> >>   1 file changed, 31 insertions(+)
> >>
> >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> index 779981a00d01..dd744d896cec 100644
> >> --- a/arch/x86/kvm/svm.c
> >> +++ b/arch/x86/kvm/svm.c
> >> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
> >>   module_param(vgif, int, 0444);
> >>
> >>   static bool __read_mostly have_spec_ctrl;
> >> +static bool __read_mostly have_ibpb_support;
> >>
> >>   static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long
> cr0);
> >>   static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool
> invalidate_gpa);
> >> @@ -540,6 +541,7 @@ struct svm_cpu_data {
> >>       struct kvm_ldttss_desc *tss_desc;
> >>
> >>       struct page *save_area;
> >> +    struct vmcb *current_vmcb;
> >>   };
> >>
> >>   static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> >> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
> >>           pr_info("kvm: SPEC_CTRL available\n");
> >>       else
> >>           pr_info("kvm: SPEC_CTRL not available\n");
> >> +    have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
> >> +    if (have_ibpb_support)
> >> +        pr_info("kvm: IBPB_SUPPORT available\n");
> >> +    else
> >> +        pr_info("kvm: IBPB_SUPPORT not available\n");
> >>
> >>       return 0;
> >>
> >> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu
> *vcpu)
> >>       __free_pages(virt_to_page(svm->nested.msrpm),
> MSRPM_ALLOC_ORDER);
> >>       kvm_vcpu_uninit(vcpu);
> >>       kmem_cache_free(kvm_vcpu_cache, svm);
> >> +
> >> +    /*
> >> +     * The VMCB could be recycled, causing a false negative in
> >> +     * svm_vcpu_load; block speculative execution.
> >> +     */
> >> +    if (have_ibpb_support)
> >> +        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> >>   }
> >>
> >>   static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >>   {
> >>       struct vcpu_svm *svm = to_svm(vcpu);
> >> +    struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
> >>       int i;
> >>
> >>       if (unlikely(cpu != vcpu->cpu)) {
> >> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu
> >> *vcpu, int cpu)
> >>       if (static_cpu_has(X86_FEATURE_RDTSCP))
> >>           wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
> >>
> >> +    if (sd->current_vmcb != svm->vmcb) {
> >> +        sd->current_vmcb = svm->vmcb;
> >> +        if (have_ibpb_support)
> >> +            wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> >> +    }
> >> +
> >>       avic_vcpu_load(vcpu, cpu);
> >>   }
> >>
> >> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm
> *svm)
> >>       if (!nested_vmcb)
> >>           return 1;
> >>
> >> +    /*
> >> +     * No need for IBPB here, the L1 hypervisor should be running
> with
> >> +     * IBRS=1 and inserts one already when switching L2 VMs.
> >> +     */
> >> +
> > 
> > I am not fully convinced yet this is OK from security perspective.
> > From the CPU point-of-view, both L1 & L2 run in the same
> prediction-mode
> > (as they are both running in SVM guest-mode). Therefore, IBRS will
> not
> > hide the BHB & BTB of L1 from L2.
> 
> Indeed, IBRS will not hide the indirect branch predictor state of L2
> user mode from L1 user mode.
> 
> On current generation processors, as I understand it, IBRS simply
> disables the indirect branch predictor when set to 1.  Therefore, as

This is not how I understood what IBRS does.

Intel (not sure about AMD) states that if IBRS is set, indirect branches will not allow their predicted target address to be controlled by code that executed in a less privileged prediction-mode before IBRS was last written with a value of 1.
(Intel also states that thus an indirect branch may be affected by code in a less privileged prediction-mode that executed AFTER IBRS mode was last written with a value of 1).

Therefore, L2 could also inject BTB/BHB entries that will be used by L1:
Consider the case that L2 injects values into the BTB/BHB and then issues a hypercall which will cause a #VMExit that will be forwarded to L1. Because L2 & L1 run in the same prediction-mode (both Ring0 SVM guest-mode) from the physical CPU's perspective, Ring0 L1 code could be using BTB/BHB entries injected by Ring0 L2.
(The fact that L0 has set IBRS to 1 when exiting from L2 to L0 doesn't prevent those entries from being used by L1 after L0 enters L1.)

This is different from what happens in the non-nested case, as L0 & L1 run in different physical prediction-modes and therefore setting IBRS=1 on every #VMExit should suffice.

Therefore, I don't understand how L1 setting IBRS to 1 will help it in this case.
Maybe this mechanism works differently on AMD?

-Liran

> long as the L1 hypervisor sets IBRS=1, the indirect branch predictor
> state left by L2 does not affect execution of the L1 hypervisor.
> 
> Future processors might have a mode that says "just set IBRS=1 and
> I'll
> DTRT".  If that's done by indexing the BTB using the full virtual
> address, the CPL _and_ the ASID/PCID/VPID, then IBPB is not needed
> here
> because the L2 VM uses a different ASID.  If that's done by only
> augmenting the BTB index with the CPL, we'd need an IBPB.  But in
> this
> new world we've been thrown into, that would be a processor bug...
> 
> Note that AMD currently doesn't implement IBRS at all.  In order to
> mitigate against CVE-2017-5715, the hypervisor should issue an IBPB
> on
> every vmexit (in both L0 and L1).  However, the cost of that is very
> high.  While we may want a paranoia mode that does it, it is outside
> the
> scope of this series (which is more of a "let's unblock QEMU patches"
> thing than a complete fix).
> 
> > 6. One can implicitly assume that the L1 hypervisor already issued an IBPB
> when
> > loading a VMCB and therefore it doesn't have to be done again in L0.
> > However, I see 2 problems with this approach:
> > (a) We don't protect old non-patched L1 hypervisors from Spectre
> even
> > though we could easily do that from L0 with small performance hit.
> 
> Yeah, that's a nice thing to have.  However, I disagree on the
> "small"
> performance hit... on a patched hypervisor, the cost of a double fix
> can
> be very noticeable.
> 
> > (b) When the L1 hypervisor runs only a single L2 VM, it doesn't have to
> > issue an IBPB when loading the VMCB (as it didn't run any other VMCB
> > before) and it should be sufficient from L1's perspective to just write 1
> > to IBRS. However, in our nested case, this is a security hole.
> 
> This is the main difference in our reasoning.  I think IBRS=1 (or IBPB
> on vmexit) should be enough.
> 
> Paolo
> 
> > Am I misunderstanding something?
> > 
> > Regards,
> > -Liran
> > 
> >>       /* Exit Guest-Mode */
> >>       leave_guest_mode(&svm->vcpu);
> >>       svm->nested.vmcb = 0;
> >> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
> >>       if (!nested_vmcb)
> >>           return false;
> >>
> >> +    /*
> >> +     * No need for IBPB here, since the nested VM is less privileged.  The
> >> +     * L1 hypervisor inserts one already when switching L2 VMs.
> >> +     */
> >> +
> >>       if (!nested_vmcb_checks(nested_vmcb)) {
> >>           nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
> >>           nested_vmcb->control.exit_code_hi = 0;
> >>
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-08 20:00   ` Liran Alon
@ 2018-01-09 11:07     ` Paolo Bonzini
  0 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-09 11:07 UTC (permalink / raw)
  To: Liran Alon, linux-kernel, kvm
  Cc: jmattson, aliguori, thomas.lendacky, dwmw, bp

On 08/01/2018 21:00, Liran Alon wrote:
> 
> 
> On 08/01/18 20:08, Paolo Bonzini wrote:
>> From: Tom Lendacky <thomas.lendacky@amd.com>
>>
>> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
>> going to run a VCPU different from what was previously run.  Nested
>> virtualization uses the same VMCB for the second level guest, but the
>> L1 hypervisor should be using IBRS to protect itself.
>>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
>>   1 file changed, 31 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 779981a00d01..dd744d896cec 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
>>   module_param(vgif, int, 0444);
>>
>>   static bool __read_mostly have_spec_ctrl;
>> +static bool __read_mostly have_ibpb_support;
>>
>>   static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>>   static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
>> @@ -540,6 +541,7 @@ struct svm_cpu_data {
>>       struct kvm_ldttss_desc *tss_desc;
>>
>>       struct page *save_area;
>> +    struct vmcb *current_vmcb;
>>   };
>>
>>   static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
>> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
>>           pr_info("kvm: SPEC_CTRL available\n");
>>       else
>>           pr_info("kvm: SPEC_CTRL not available\n");
>> +    have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
>> +    if (have_ibpb_support)
>> +        pr_info("kvm: IBPB_SUPPORT available\n");
>> +    else
>> +        pr_info("kvm: IBPB_SUPPORT not available\n");
>>
>>       return 0;
>>
>> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>>       __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>>       kvm_vcpu_uninit(vcpu);
>>       kmem_cache_free(kvm_vcpu_cache, svm);
>> +
>> +    /*
>> +     * The VMCB could be recycled, causing a false negative in
>> +     * svm_vcpu_load; block speculative execution.
>> +     */
>> +    if (have_ibpb_support)
>> +        wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>>   }
>>
>>   static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>   {
>>       struct vcpu_svm *svm = to_svm(vcpu);
>> +    struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>>       int i;
>>
>>       if (unlikely(cpu != vcpu->cpu)) {
>> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>       if (static_cpu_has(X86_FEATURE_RDTSCP))
>>           wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>>
>> +    if (sd->current_vmcb != svm->vmcb) {
>> +        sd->current_vmcb = svm->vmcb;
>> +        if (have_ibpb_support)
>> +            wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>> +    }
>> +
>>       avic_vcpu_load(vcpu, cpu);
>>   }
>>
>> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>>       if (!nested_vmcb)
>>           return 1;
>>
>> +    /*
>> +     * No need for IBPB here, the L1 hypervisor should be running with
>> +     * IBRS=1 and inserts one already when switching L2 VMs.
>> +     */
>> +
> 
> I am not fully convinced yet that this is OK from a security perspective.
> From the CPU's point of view, both L1 & L2 run in the same prediction-mode
> (as they are both running in SVM guest-mode). Therefore, IBRS will not
> hide the BHB & BTB of L1 from L2.

Indeed, IBRS will not hide the indirect branch predictor state of L2
user mode from L1 user mode.

On current generation processors, as I understand it, IBRS simply
disables the indirect branch predictor when set to 1.  Therefore, as
long as the L1 hypervisor sets IBRS=1, the indirect branch predictor
state left by L2 does not affect execution of the L1 hypervisor.

Future processors might have a mode that says "just set IBRS=1 and I'll
DTRT".  If that's done by indexing the BTB using the full virtual
address, the CPL _and_ the ASID/PCID/VPID, then IBPB is not needed here
because the L2 VM uses a different ASID.  If that's done by only
augmenting the BTB index with the CPL, we'd need an IBPB.  But in this
new world we've been thrown into, that would be a processor bug...
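
Purely as an illustration of the two indexing schemes contrasted above (no real BTB organization is being described; all names here are made up), a lookup tag that includes the ASID would keep L1 and L2 entries from ever aliasing, while a CPL-only tag would not:

/* Illustrative only: a made-up BTB lookup tag. */
struct btb_tag {
	unsigned long	virt_addr;	/* full virtual address of the branch */
	unsigned int	cpl;		/* current privilege level */
	unsigned int	asid;		/* address space ID of the guest */
};

/*
 * With tag_includes_asid, L1 and L2 can never hit each other's entries
 * because they run with different ASIDs; with a CPL-only tag they can,
 * which is why an IBPB would still be needed in that case.
 */
static int btb_tag_match(const struct btb_tag *a, const struct btb_tag *b,
			 int tag_includes_asid)
{
	if (a->virt_addr != b->virt_addr || a->cpl != b->cpl)
		return 0;
	return tag_includes_asid ? a->asid == b->asid : 1;
}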

Note that AMD currently doesn't implement IBRS at all.  In order to
mitigate against CVE-2017-5715, the hypervisor should issue an IBPB on
every vmexit (in both L0 and L1).  However, the cost of that is very
high.  While we may want a paranoia mode that does it, it is outside the
scope of this series (which is more of a "let's unblock QEMU patches"
thing than a complete fix).
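
A minimal sketch of what such a paranoia mode could look like (not part of this series; the module parameter and helper are made up, only MSR 0x49 (IA32_PRED_CMD) and its IBPB bit are architectural):

#include <linux/module.h>
#include <asm/msr.h>

#define PRED_CMD_MSR	0x00000049	/* IA32_PRED_CMD */
#define PRED_CMD_IBPB	(1ULL << 0)	/* writing this bit issues the barrier */

/* Hypothetical knob: flush indirect branch predictions on every vmexit. */
static bool __read_mostly ibpb_paranoid;
module_param(ibpb_paranoid, bool, 0444);

/* Hypothetical hook, called once per #VMExit before host code runs. */
static inline void paranoid_vmexit_ibpb(void)
{
	if (ibpb_paranoid)
		wrmsrl(PRED_CMD_MSR, PRED_CMD_IBPB);
}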

> 6. One can implicitly assume that the L1 hypervisor did issue an IBPB when
> loading a VMCB and therefore it doesn't have to do it again in L0.
> However, I see 2 problems with this approach:
> (a) We don't protect old non-patched L1 hypervisors from Spectre even
> though we could easily do that from L0 with a small performance hit.

Yeah, that's a nice thing to have.  However, I disagree on the "small"
performance hit... on a patched hypervisor, the cost of a double fix can
be very noticeable.

> (b) When the L1 hypervisor runs only a single L2 VM, it doesn't have to
> issue an IBPB when loading the VMCB (as it didn't run any other VMCB
> before) and it should be sufficient from L1's perspective to just write 1
> to IBRS. However, in our nested case, this is a security hole.

This is the main difference in our reasoning.  I think IBRS=1 (or IBPB
on vmexit) should be enough.

Paolo

> Am I misunderstanding something?
> 
> Regards,
> -Liran
> 
>>       /* Exit Guest-Mode */
>>       leave_guest_mode(&svm->vcpu);
>>       svm->nested.vmcb = 0;
>> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
>>       if (!nested_vmcb)
>>           return false;
>>
>> +    /*
>> +     * No need for IBPB here, since the nested VM is less privileged.  The
>> +     * L1 hypervisor inserts one already when switching L2 VMs.
>> +     */
>> +
>>       if (!nested_vmcb_checks(nested_vmcb)) {
>>           nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
>>           nested_vmcb->control.exit_code_hi = 0;
>>
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-08 18:08 ` [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU Paolo Bonzini
@ 2018-01-08 20:00   ` Liran Alon
  2018-01-09 11:07     ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Liran Alon @ 2018-01-08 20:00 UTC (permalink / raw)
  To: Paolo Bonzini, linux-kernel, kvm
  Cc: jmattson, aliguori, thomas.lendacky, dwmw, bp



On 08/01/18 20:08, Paolo Bonzini wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
>
> Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
> going to run a VCPU different from what was previously run.  Nested
> virtualization uses the same VMCB for the second level guest, but the
> L1 hypervisor should be using IBRS to protect itself.
>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
>   1 file changed, 31 insertions(+)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 779981a00d01..dd744d896cec 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
>   module_param(vgif, int, 0444);
>
>   static bool __read_mostly have_spec_ctrl;
> +static bool __read_mostly have_ibpb_support;
>
>   static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
>   static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
> @@ -540,6 +541,7 @@ struct svm_cpu_data {
>   	struct kvm_ldttss_desc *tss_desc;
>
>   	struct page *save_area;
> +	struct vmcb *current_vmcb;
>   };
>
>   static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
> @@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
>   		pr_info("kvm: SPEC_CTRL available\n");
>   	else
>   		pr_info("kvm: SPEC_CTRL not available\n");
> +	have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
> +	if (have_ibpb_support)
> +		pr_info("kvm: IBPB_SUPPORT available\n");
> +	else
> +		pr_info("kvm: IBPB_SUPPORT not available\n");
>
>   	return 0;
>
> @@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
>   	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
>   	kvm_vcpu_uninit(vcpu);
>   	kmem_cache_free(kvm_vcpu_cache, svm);
> +
> +	/*
> +	 * The VMCB could be recycled, causing a false negative in
> +	 * svm_vcpu_load; block speculative execution.
> +	 */
> +	if (have_ibpb_support)
> +		wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
>   }
>
>   static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
>   	int i;
>
>   	if (unlikely(cpu != vcpu->cpu)) {
> @@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>   	if (static_cpu_has(X86_FEATURE_RDTSCP))
>   		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
>
> +	if (sd->current_vmcb != svm->vmcb) {
> +		sd->current_vmcb = svm->vmcb;
> +		if (have_ibpb_support)
> +			wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
> +	}
> +
>   	avic_vcpu_load(vcpu, cpu);
>   }
>
> @@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
>   	if (!nested_vmcb)
>   		return 1;
>
> +	/*
> +	 * No need for IBPB here, the L1 hypervisor should be running with
> +	 * IBRS=1 and inserts one already when switching L2 VMs.
> +	 */
> +

I am not fully convinced yet that this is OK from a security perspective.
From the CPU's point of view, both L1 & L2 run in the same prediction-mode
(as they are both running in SVM guest-mode). Therefore, IBRS will not
hide the BHB & BTB of L1 from L2.

This is how I understand things:
1. When transitioning between contexts in the same prediction-mode,
software is responsible for issuing an IBPB to basically "delete" the
"branch prediction data" of the previous context, so that the new context
won't be able to use it.
This is orthogonal to IBRS, which makes sure the new context won't use
"branch prediction data" that was created by a previous less-privileged
prediction-mode; here we are talking about transitioning between 2
contexts in the same physical prediction-mode.
2. Because of (1), KVM was modified to issue an IBPB when replacing the
active VMCB/VMCS on a CPU.
3. For the nested-virtualization case, L1 & L2 both run in the same
physical prediction-mode, as from the physical CPU's perspective they are
both running in SVM guest-mode. Therefore, L0 should issue an IBPB when
transitioning from L1 to L2 and vice versa.
4. In the nested VMX case, this already happens because transitioning
between L1 & L2 involves changing the active VMCS on the CPU (from vmcs01
to vmcs02), which already issues an IBPB.
5. However, nested SVM reuses the same VMCB when transitioning between
L1 & L2, and therefore we should explicitly issue an IBPB in
nested_svm_vmrun() & nested_svm_vmexit() (a minimal sketch of this is
given after this list).
6. One can implicitly assume that the L1 hypervisor did issue an IBPB when
loading a VMCB and therefore it doesn't have to do it again in L0.
However, I see 2 problems with this approach:
(a) We don't protect old non-patched L1 hypervisors from Spectre even
though we could easily do that from L0 with a small performance hit.
(b) When the L1 hypervisor runs only a single L2 VM, it doesn't have to
issue an IBPB when loading the VMCB (as it didn't run any other VMCB
before) and it should be sufficient from L1's perspective to just write 1
to IBRS. However, in our nested case, this is a security hole.
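
As referenced in point 5 above, here is a minimal sketch of the explicit barrier being argued for. It reuses the have_ibpb_support flag and the MSR names from the patch below; the helper and its placement are the suggestion being made here, not existing code:

/*
 * Sketch only: issue a barrier on the emulated L1<->L2 transitions,
 * replacing the "No need for IBPB here" comments in the patch.  Would be
 * called at the top of nested_svm_vmrun() and nested_svm_vmexit().
 */
static void nested_transition_ibpb(void)
{
	if (have_ibpb_support)
		wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
}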

Am I misunderstanding something?

Regards,
-Liran

>   	/* Exit Guest-Mode */
>   	leave_guest_mode(&svm->vcpu);
>   	svm->nested.vmcb = 0;
> @@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
>   	if (!nested_vmcb)
>   		return false;
>
> +	/*
> +	 * No need for IBPB here, since the nested VM is less privileged.  The
> +	 * L1 hypervisor inserts one already when switching L2 VMs.
> +	 */
> +
>   	if (!nested_vmcb_checks(nested_vmcb)) {
>   		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
>   		nested_vmcb->control.exit_code_hi = 0;
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU
  2018-01-08 18:08 [PATCH 0/7] KVM: x86: expose CVE-2017-5715 ("Spectre variant 2") mitigations to guest Paolo Bonzini
@ 2018-01-08 18:08 ` Paolo Bonzini
  2018-01-08 20:00   ` Liran Alon
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2018-01-08 18:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: jmattson, aliguori, thomas.lendacky, dwmw, bp

From: Tom Lendacky <thomas.lendacky@amd.com>

Set IBPB (Indirect Branch Prediction Barrier) when the current CPU is
going to run a VCPU different from what was previously run.  Nested
virtualization uses the same VMCB for the second level guest, but the
L1 hypervisor should be using IBRS to protect itself.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/svm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 779981a00d01..dd744d896cec 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -289,6 +289,7 @@ struct amd_svm_iommu_ir {
 module_param(vgif, int, 0444);
  
 static bool __read_mostly have_spec_ctrl;
+static bool __read_mostly have_ibpb_support;
 
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
@@ -540,6 +541,7 @@ struct svm_cpu_data {
 	struct kvm_ldttss_desc *tss_desc;
 
 	struct page *save_area;
+	struct vmcb *current_vmcb;
 };
 
 static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data);
@@ -1151,6 +1153,11 @@ static __init int svm_hardware_setup(void)
 		pr_info("kvm: SPEC_CTRL available\n");
 	else
 		pr_info("kvm: SPEC_CTRL not available\n");
+	have_ibpb_support = have_spec_ctrl || cpu_has_ibpb_support();
+	if (have_ibpb_support)
+		pr_info("kvm: IBPB_SUPPORT available\n");
+	else
+		pr_info("kvm: IBPB_SUPPORT not available\n");
 
 	return 0;
 
@@ -1725,11 +1732,19 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 	__free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
+
+	/*
+	 * The VMCB could be recycled, causing a false negative in
+	 * svm_vcpu_load; block speculative execution.
+	 */
+	if (have_ibpb_support)
+		wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
 }
 
 static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	struct svm_cpu_data *sd = per_cpu(svm_data, cpu);
 	int i;
 
 	if (unlikely(cpu != vcpu->cpu)) {
@@ -1758,6 +1773,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
 
+	if (sd->current_vmcb != svm->vmcb) {
+		sd->current_vmcb = svm->vmcb;
+		if (have_ibpb_support)
+			wrmsrl(MSR_IA32_PRED_CMD, FEATURE_SET_IBPB);
+	}
+
 	avic_vcpu_load(vcpu, cpu);
 }
 
@@ -2798,6 +2819,11 @@ static int nested_svm_vmexit(struct vcpu_svm *svm)
 	if (!nested_vmcb)
 		return 1;
 
+	/*
+	 * No need for IBPB here, the L1 hypervisor should be running with
+	 * IBRS=1 and inserts one already when switching L2 VMs.
+	 */
+
 	/* Exit Guest-Mode */
 	leave_guest_mode(&svm->vcpu);
 	svm->nested.vmcb = 0;
@@ -3061,6 +3087,11 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm)
 	if (!nested_vmcb)
 		return false;
 
+	/*
+	 * No need for IBPB here, since the nested VM is less privileged.  The
+	 * L1 hypervisor inserts one already when switching L2 VMs.
+	 */
+
 	if (!nested_vmcb_checks(nested_vmcb)) {
 		nested_vmcb->control.exit_code    = SVM_EXIT_ERR;
 		nested_vmcb->control.exit_code_hi = 0;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-01-09 21:59 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-09 15:33 [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU Liran Alon
2018-01-09 15:56 ` Arjan van de Ven
  -- strict thread matches above, loose matches on Subject: below --
2018-01-09 16:01 Liran Alon
2018-01-09 15:00 Liran Alon
2018-01-09 15:19 ` Arjan van de Ven
2018-01-09 16:17   ` Paolo Bonzini
2018-01-09 16:23     ` Arjan van de Ven
2018-01-09 16:49       ` Paolo Bonzini
2018-01-09 20:39         ` Konrad Rzeszutek Wilk
2018-01-09 20:47           ` Konrad Rzeszutek Wilk
2018-01-09 20:57             ` Jim Mattson
2018-01-09 21:11               ` Konrad Rzeszutek Wilk
2018-01-09 21:19                 ` Jim Mattson
2018-01-09 21:42               ` Paolo Bonzini
2018-01-09 21:59                 ` Jim Mattson
2018-01-09 21:56           ` Paolo Bonzini
2018-01-09 11:31 Liran Alon
2018-01-09 11:41 ` Paolo Bonzini
2018-01-09 14:30   ` Arjan van de Ven
2018-01-08 18:08 [PATCH 0/7] KVM: x86: expose CVE-2017-5715 ("Spectre variant 2") mitigations to guest Paolo Bonzini
2018-01-08 18:08 ` [PATCH 6/7] x86/svm: Set IBPB when running a different VCPU Paolo Bonzini
2018-01-08 20:00   ` Liran Alon
2018-01-09 11:07     ` Paolo Bonzini
