All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] SVM: limit GIF=0 region
@ 2018-08-15  6:09 Jan Beulich
  2018-08-28 23:00 ` Boris Ostrovsky
  2018-08-28 23:06 ` Andrew Cooper
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2018-08-15  6:09 UTC (permalink / raw)
  To: xen-devel
  Cc: Andrew Cooper, Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

Use EFLAGS.IF for all ordinary purposes; there's in particular no need
to unduly defer NMI/#MC. Clear/set GIF solely around VMRUN itself. This
has the additional advantage that svm_stgi_label now indeed marks the
only place where GIF gets set.

A note regarding the main STI(s) placement: Quite counterintuitively the
host's EFLAGS.IF continues to have a meaning while the guest runs; see
PM Vol 2 section "Physical (INTR) Interrupt Masking in EFLAGS". Hence we
need to set the flag for the duration of time being in guest context.
However, SPEC_CTRL_ENTRY_FROM_HVM wants to be carried out with EFLAGS.IF
clear.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Add CLI after VMRUN. Adjust description.

--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -43,7 +43,7 @@ ENTRY(svm_asm_do_resume)
         lea  irq_stat+IRQSTAT_softirq_pending(%rip),%rdx
         xor  %ecx,%ecx
         shl  $IRQSTAT_shift,%eax
-        CLGI
+        cli
         cmp  %ecx,(%rdx,%rax,1)
         jne  .Lsvm_process_softirqs
 
@@ -57,7 +57,7 @@ UNLIKELY_START(ne, nsvm_hap)
          * Someone shot down our nested p2m table; go round again
          * and nsvm_vcpu_switch() will fix it for us.
          */
-        STGI
+        sti
         jmp  .Lsvm_do_resume
 __UNLIKELY_END(nsvm_hap)
 
@@ -87,7 +87,12 @@ __UNLIKELY_END(nsvm_hap)
         pop  %rsi
         pop  %rdi
 
+        CLGI
+        sti
         VMRUN
+        cli
+        STGI
+GLOBAL(svm_stgi_label)
 
         SAVE_ALL
 
@@ -96,13 +101,12 @@ __UNLIKELY_END(nsvm_hap)
         SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
-        STGI
-GLOBAL(svm_stgi_label)
+        sti
         mov  %rsp,%rdi
         call svm_vmexit_handler
         jmp  .Lsvm_do_resume
 
 .Lsvm_process_softirqs:
-        STGI
+        sti
         call do_softirq
         jmp  .Lsvm_do_resume





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] SVM: limit GIF=0 region
  2018-08-15  6:09 [PATCH v2] SVM: limit GIF=0 region Jan Beulich
@ 2018-08-28 23:00 ` Boris Ostrovsky
  2018-08-28 23:06 ` Andrew Cooper
  1 sibling, 0 replies; 6+ messages in thread
From: Boris Ostrovsky @ 2018-08-28 23:00 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Andrew Cooper, Brian Woods, Suravee Suthikulpanit

On 08/15/2018 02:09 AM, Jan Beulich wrote:
> Use EFLAGS.IF for all ordinary purposes; there's in particular no need
> to unduly defer NMI/#MC. Clear/set GIF solely around VMRUN itself. This
> has the additional advantage that svm_stgi_label now indeed marks the
> only place where GIF gets set.
>
> A note regarding the main STI(s) placement: Quite counterintuitively the
> host's EFLAGS.IF continues to have a meaning while the guest runs; see
> PM Vol 2 section "Physical (INTR) Interrupt Masking in EFLAGS". Hence we
> need to set the flag for the duration of time being in guest context.
> However, SPEC_CTRL_ENTRY_FROM_HVM wants to be carried out with EFLAGS.IF
> clear.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] SVM: limit GIF=0 region
  2018-08-15  6:09 [PATCH v2] SVM: limit GIF=0 region Jan Beulich
  2018-08-28 23:00 ` Boris Ostrovsky
@ 2018-08-28 23:06 ` Andrew Cooper
  2018-08-29  6:26   ` Jan Beulich
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2018-08-28 23:06 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

On 15/08/18 07:09, Jan Beulich wrote:
> Use EFLAGS.IF for all ordinary purposes; there's in particular no need
> to unduly defer NMI/#MC. Clear/set GIF solely around VMRUN itself. This
> has the additional advantage that svm_stgi_label now indeed marks the
> only place where GIF gets set.
>
> A note regarding the main STI(s) placement: Quite counterintuitively the
> host's EFLAGS.IF continues to have a meaning while the guest runs; see
> PM Vol 2 section "Physical (INTR) Interrupt Masking in EFLAGS". Hence we
> need to set the flag for the duration of time being in guest context.
> However, SPEC_CTRL_ENTRY_FROM_HVM wants to be carried out with EFLAGS.IF
> clear.
>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Add CLI after VMRUN. Adjust description.
>
> --- a/xen/arch/x86/hvm/svm/entry.S
> +++ b/xen/arch/x86/hvm/svm/entry.S
> @@ -43,7 +43,7 @@ ENTRY(svm_asm_do_resume)
>          lea  irq_stat+IRQSTAT_softirq_pending(%rip),%rdx
>          xor  %ecx,%ecx
>          shl  $IRQSTAT_shift,%eax
> -        CLGI
> +        cli
>          cmp  %ecx,(%rdx,%rax,1)
>          jne  .Lsvm_process_softirqs
>  
> @@ -57,7 +57,7 @@ UNLIKELY_START(ne, nsvm_hap)
>           * Someone shot down our nested p2m table; go round again
>           * and nsvm_vcpu_switch() will fix it for us.
>           */
> -        STGI
> +        sti
>          jmp  .Lsvm_do_resume
>  __UNLIKELY_END(nsvm_hap)
>  
> @@ -87,7 +87,12 @@ __UNLIKELY_END(nsvm_hap)
>          pop  %rsi
>          pop  %rdi
>  
> +        CLGI
> +        sti
>          VMRUN
> +        cli
> +        STGI
> +GLOBAL(svm_stgi_label)
>  
>          SAVE_ALL
>  
> @@ -96,13 +101,12 @@ __UNLIKELY_END(nsvm_hap)
>          SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>  
> -        STGI
> -GLOBAL(svm_stgi_label)
> +        sti

Nack.  As indicated in v1, moving this breaks SPEC_CTRL_ENTRY_FROM_HVM
(Even if there is an unexpected bug on the VT-x side of things which
needs fixing differently).

Furthermore, to fix LBR handling, the first thing I'd have to do is
revert this, so please leave it as it is.

~Andrew

>          mov  %rsp,%rdi
>          call svm_vmexit_handler
>          jmp  .Lsvm_do_resume
>  
>  .Lsvm_process_softirqs:
> -        STGI
> +        sti
>          call do_softirq
>          jmp  .Lsvm_do_resume
>
>
>
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] SVM: limit GIF=0 region
  2018-08-28 23:06 ` Andrew Cooper
@ 2018-08-29  6:26   ` Jan Beulich
  2018-08-29 17:55     ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2018-08-29  6:26 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

>>> On 29.08.18 at 01:06, <andrew.cooper3@citrix.com> wrote:
> On 15/08/18 07:09, Jan Beulich wrote:
>> @@ -96,13 +101,12 @@ __UNLIKELY_END(nsvm_hap)
>>          SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>  
>> -        STGI
>> -GLOBAL(svm_stgi_label)
>> +        sti
> 
> Nack.  As indicated in v1, moving this breaks SPEC_CTRL_ENTRY_FROM_HVM
> (Even if there is an unexpected bug on the VT-x side of things which
> needs fixing differently).

Well, this increases the pressure to fix the issue. I very much
expect that fix to also take care of the code here. It was
therefore intentional that I did only the technically necessary
adjustments in v2.

I would, btw, likely have tried to find time to look into fixing
that issue, but so far I was under the impression that you are
planning to, and since you've written the code I thought you'd
likely also be in a better position to do so.

> Furthermore, to fix LBR handling, the first thing I'd have to do is
> revert this, so please leave it as it is.

Mind being a little more specific as to the whys here? From a
purely formal pov, Boris'es R-b allows me to put the change in
as is. I'm not overly happy to do the requested change, but I
certainly will, provided - once again - I understand the reason.

Furthermore, please don't forget that the more we delay
especially #MC, the more likely it becomes that a system will
crash in a far less obvious way than by logging an #MC.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] SVM: limit GIF=0 region
  2018-08-29  6:26   ` Jan Beulich
@ 2018-08-29 17:55     ` Andrew Cooper
  2018-08-30  6:11       ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2018-08-29 17:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

On 29/08/18 07:26, Jan Beulich wrote:
>>>> On 29.08.18 at 01:06, <andrew.cooper3@citrix.com> wrote:
>> On 15/08/18 07:09, Jan Beulich wrote:
>>> @@ -96,13 +101,12 @@ __UNLIKELY_END(nsvm_hap)
>>>          SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>>>          /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
>>>  
>>> -        STGI
>>> -GLOBAL(svm_stgi_label)
>>> +        sti
>> Nack.  As indicated in v1, moving this breaks SPEC_CTRL_ENTRY_FROM_HVM
>> (Even if there is an unexpected bug on the VT-x side of things which
>> needs fixing differently).
> Well, this increases the pressure to fix the issue. I very much
> expect that fix to also take care of the code here. It was
> therefore intentional that I did only the technically necessary
> adjustments in v2.
>
> I would, btw, likely have tried to find time to look into fixing
> that issue, but so far I was under the impression that you are
> planning to, and since you've written the code I thought you'd
> likely also be in a better position to do so.

Its somewhere in the massive pile of still-security work that needs
doing.  It is not the most urgent of the still-security work to do,
partly because this protection is currently in place.  Top of the list,
and most important, is still the toolstack CPUID/MSR interactions so we
can feasibly tell a guest what it needs to know WRT L1TF.

>
>> Furthermore, to fix LBR handling, the first thing I'd have to do is
>> revert this, so please leave it as it is.
> Mind being a little more specific as to the whys here?

When vmcb->virt_ext.fields.lbr_enable is clear and DEBUGCTL.LBR is set,
the hypervisor must atomically switch the 5 LBR MSRs inside the critical
region.

Usage of DEBUGCTL.LBR is broken on pre-lbrv hardware (gen 1/2 svm?), and
when suitably (mis)configured by nested virt.

Other bugs include Xen's LBR setting leaking into guests.

> From a purely formal pov, Boris'es R-b allows me to put the change in
> as is. I'm not overly happy to do the requested change, but I
> certainly will, provided - once again - I understand the reason.
>
> Furthermore, please don't forget that the more we delay
> especially #MC, the more likely it becomes that a system will
> crash in a far less obvious way than by logging an #MC.

#MC's only get raised after the bank registers have been updated, so
even if a shutdown occurs, the data will be captured by the firmware
after reset.

As for non-deferrable #MC's, a couple of extra stack/msr/vcpu references
in the critical region doesn't alter the chances.  We were never going
to survive, or won't trigger a new one.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] SVM: limit GIF=0 region
  2018-08-29 17:55     ` Andrew Cooper
@ 2018-08-30  6:11       ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2018-08-30  6:11 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Boris Ostrovsky, Brian Woods, Suravee Suthikulpanit

>>> On 29.08.18 at 19:55, <andrew.cooper3@citrix.com> wrote:
> On 29/08/18 07:26, Jan Beulich wrote:
>>>>> On 29.08.18 at 01:06, <andrew.cooper3@citrix.com> wrote:
>>> Furthermore, to fix LBR handling, the first thing I'd have to do is
>>> revert this, so please leave it as it is.
>> Mind being a little more specific as to the whys here?
> 
> When vmcb->virt_ext.fields.lbr_enable is clear and DEBUGCTL.LBR is set,
> the hypervisor must atomically switch the 5 LBR MSRs inside the critical
> region.
> 
> Usage of DEBUGCTL.LBR is broken on pre-lbrv hardware (gen 1/2 svm?), and
> when suitably (mis)configured by nested virt.

Okay, that's what I had concluded on the way home yesterday.
But this needs doing before the first branch, i.e. in particular
prior to SPEC_CTRL_ENTRY_FROM_HVM. Hence the STGI could
still move ahead (provided the NMI/#MC issue was addressed).

Anyway - I guess I'll make the change and record further room
for improvement in the description.

>> From a purely formal pov, Boris'es R-b allows me to put the change in
>> as is. I'm not overly happy to do the requested change, but I
>> certainly will, provided - once again - I understand the reason.
>>
>> Furthermore, please don't forget that the more we delay
>> especially #MC, the more likely it becomes that a system will
>> crash in a far less obvious way than by logging an #MC.
> 
> #MC's only get raised after the bank registers have been updated, so
> even if a shutdown occurs, the data will be captured by the firmware
> after reset.
> 
> As for non-deferrable #MC's, a couple of extra stack/msr/vcpu references
> in the critical region doesn't alter the chances.  We were never going
> to survive, or won't trigger a new one.

The question isn't whether we survive, but whether the logged
data allows concluding what has happened (and in roughly what
context) in a halfway reasonable manner.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-30  6:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-15  6:09 [PATCH v2] SVM: limit GIF=0 region Jan Beulich
2018-08-28 23:00 ` Boris Ostrovsky
2018-08-28 23:06 ` Andrew Cooper
2018-08-29  6:26   ` Jan Beulich
2018-08-29 17:55     ` Andrew Cooper
2018-08-30  6:11       ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.