* XSA-351 causing Solaris-11 systems to panic during boot.
@ 2020-11-16 21:57 Cheyenne Wills
  2020-11-17  8:12 ` Jan Beulich
  2020-11-17 10:50 ` Roger Pau Monné
  0 siblings, 2 replies; 16+ messages in thread
From: Cheyenne Wills @ 2020-11-16 21:57 UTC (permalink / raw)
  To: xen-devel


Running Xen with XSA-351 is causing Solaris 11 systems to panic during
boot.  The panic screen shows the failure coming from "unix:rdmsr".  The
panic occurs both with existing guests (booting off a disk) and when
booting from an install ISO image.

I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
requested that I report it here.

This was failing on a Xen 4.13 and a Xen 4.14 system built via Gentoo.

I understand that ultimately this is a bug in Solaris.  However, it does
impact existing guests that were functional before applying the XSA-351
security patches.



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-16 21:57 XSA-351 causing Solaris-11 systems to panic during boot Cheyenne Wills
@ 2020-11-17  8:12 ` Jan Beulich
  2020-11-17 14:43   ` Cheyenne Wills
  2020-12-17  1:51   ` boris.ostrovsky
  2020-11-17 10:50 ` Roger Pau Monné
  1 sibling, 2 replies; 16+ messages in thread
From: Jan Beulich @ 2020-11-17  8:12 UTC (permalink / raw)
  To: Cheyenne Wills; +Cc: xen-devel

On 16.11.2020 22:57, Cheyenne Wills wrote:
> Running Xen with XSA-351 is causing Solaris 11 systems to panic during
> boot.  The panic screen is showing the failure to be coming from
> "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
> and when booting from an install ISO image.
> 
> I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
> requested that I report it here.

Thanks. What we need though is information on the specific MSR(s) that
will need to have workarounds added: We surely would want to avoid
blindly doing this for all that the XSA change disallowed access to.
Reproducing the panic screen here might already help; proper full logs
would be even better.

Jan



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-16 21:57 XSA-351 causing Solaris-11 systems to panic during boot Cheyenne Wills
  2020-11-17  8:12 ` Jan Beulich
@ 2020-11-17 10:50 ` Roger Pau Monné
  2020-11-17 12:54   ` Roger Pau Monné
  1 sibling, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2020-11-17 10:50 UTC (permalink / raw)
  To: Cheyenne Wills; +Cc: xen-devel

On Mon, Nov 16, 2020 at 02:57:14PM -0700, Cheyenne Wills wrote:
> Running Xen with XSA-351 is causing Solaris 11 systems to panic during
> boot.  The panic screen is showing the failure to be coming from
> "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
> and when booting from an install ISO image.
> 
> I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
> requested that I report it here.
> 
> This was failing on a Xen 4.13 and a Xen 4.14 system built via gentoo.
> 
> I understand that ultimately this is a bug in Solaris.  However it does
> impact existing guests that were functional before applying the XSA-351
> security patches.

I seem to have some issues getting the Solaris 11.4 ISO to boot, which I
think are unrelated to the MSR changes. I get what seems to be a panic
just after the Copyright message, but there's no reason printed at all
about the panic. The message just reads (transcript):

SunOS Release 5.11 Version 11.4.0.15.0 64-bit
Copyright (c) 1983, 2018, Oracle and/or its affiliates. All rights reserved.
System would not fast reboot because:
 newkernel not valid
 fastreboot_onpanic is not set
 ...

The config file I'm using is:

memory=1024
vcpus=4
name="solaris"

builder="hvm"

disk = [
  'format=raw,vdev=hdc,access=ro,devtype=cdrom,target=/root/sol-11_4-text-x86.iso',
  'format=raw,vdev=hda,access=rw,target=/root/solaris.img',
]

vif = [
 'mac=00:16:3E:74:3d:88,bridge=bridge0',
]

vnc=1
vnclisten="0.0.0.0"

serial='pty'

on_crash="preserve"

Is there anything I'm missing?

Thanks, Roger.



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-17 10:50 ` Roger Pau Monné
@ 2020-11-17 12:54   ` Roger Pau Monné
  2020-11-17 13:59     ` Cheyenne Wills
  0 siblings, 1 reply; 16+ messages in thread
From: Roger Pau Monné @ 2020-11-17 12:54 UTC (permalink / raw)
  To: Cheyenne Wills; +Cc: xen-devel

On Tue, Nov 17, 2020 at 11:50:39AM +0100, Roger Pau Monné wrote:
> On Mon, Nov 16, 2020 at 02:57:14PM -0700, Cheyenne Wills wrote:
> > Running Xen with XSA-351 is causing Solaris 11 systems to panic during
> > boot.  The panic screen is showing the failure to be coming from
> > "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
> > and when booting from an install ISO image.
> > 
> > I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
> > requested that I report it here.
> > 
> > This was failing on a Xen 4.13 and a Xen 4.14 system built via gentoo.
> > 
> > I understand that ultimately this is a bug in Solaris.  However it does
> > impact existing guests that were functional before applying the XSA-351
> > security patches.
> 
> I seem to have some issues getting the Solaris 11.4 ISO to boot, which I
> think are unrelated to the MSR changes. I get what seems to be a panic
> just after the Copyright message, but there's no reason printed at all
> about the panic. The message just reads (transcript):
> 
> SunOS Release 5.11 Version 11.4.0.15.0 64-bit
> Copyright (c) 1983, 2018, Oracle and/or its affiliates. All rights reserved.
> System would not fast reboot because:
>  newkernel not valid
>  fastreboot_onpanic is not set
>  ...
> 
> The config file I'm using is:
> 
> memory=1024
> vcpus=4
> name="solaris"
> 
> builder="hvm"
> 
> disk = [
>   'format=raw,vdev=hdc,access=ro,devtype=cdrom,target=/root/sol-11_4-text-x86.iso',
>   'format=raw,vdev=hda,access=rw,target=/root/solaris.img',
> ]
> 
> vif = [
>  'mac=00:16:3E:74:3d:88,bridge=bridge0',
> ]
> 
> vnc=1
> vnclisten="0.0.0.0"
> 
> serial='pty'
> 
> on_crash="preserve"
> 
> Is there anything I'm missing?

OK, it seems like Solaris requires more than 1GB of memory in order to
boot. I've increased it to 4GB and I've been able to boot successfully
up to the installer.

I'm however able to boot up to the installer screen without any
crashes, so I guess the version I'm using (11.4.0.15.0) is already
fixed?

Can you paste which version of Solaris you are using and if possible
where I can find the installer media to reproduce?

Thanks, Roger.



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-17 12:54   ` Roger Pau Monné
@ 2020-11-17 13:59     ` Cheyenne Wills
  0 siblings, 0 replies; 16+ messages in thread
From: Cheyenne Wills @ 2020-11-17 13:59 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel


Yes. I will have to re-upgrade my xen system to collect the additional info
from the panic, so it will be later today before I can reply with all the
info.

On Tue, Nov 17, 2020, 5:54 AM Roger Pau Monné <roger.pau@citrix.com> wrote:

> On Tue, Nov 17, 2020 at 11:50:39AM +0100, Roger Pau Monné wrote:
> > On Mon, Nov 16, 2020 at 02:57:14PM -0700, Cheyenne Wills wrote:
> > > Running Xen with XSA-351 is causing Solaris 11 systems to panic during
> > > boot.  The panic screen is showing the failure to be coming from
> > > "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
> > > and when booting from an install ISO image.
> > >
> > > I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
> > > requested that I report it here.
> > >
> > > This was failing on a Xen 4.13 and a Xen 4.14 system built via gentoo.
> > >
> > > I understand that ultimately this is a bug in Solaris.  However it does
> > > impact existing guests that were functional before applying the XSA-351
> > > security patches.
> >
> > I seem to have some issues getting the Solaris 11.4 ISO to boot, which I
> > think are unrelated to the MSR changes. I get what seems to be a panic
> > just after the Copyright message, but there's no reason printed at all
> > about the panic. The message just reads (transcript):
> >
> > SunOS Release 5.11 Version 11.4.0.15.0 64-bit
> > Copyright (c) 1983, 2018, Oracle and/or its affiliates. All rights reserved.
> > System would not fast reboot because:
> >  newkernel not valid
> >  fastreboot_onpanic is not set
> >  ...
> >
> > The config file I'm using is:
> >
> > memory=1024
> > vcpus=4
> > name="solaris"
> >
> > builder="hvm"
> >
> > disk = [
> >   'format=raw,vdev=hdc,access=ro,devtype=cdrom,target=/root/sol-11_4-text-x86.iso',
> >   'format=raw,vdev=hda,access=rw,target=/root/solaris.img',
> > ]
> >
> > vif = [
> >  'mac=00:16:3E:74:3d:88,bridge=bridge0',
> > ]
> >
> > vnc=1
> > vnclisten="0.0.0.0"
> >
> > serial='pty'
> >
> > on_crash="preserve"
> >
> > Is there anything I'm missing?
>
> OK, it seems like Solaris requires more than 1GB of memory in order to
> boot. I've increased it to 4GB and I've been able to boot successfully
> up to the installer.
>
> I'm however able to boot up to the installer screen without any
> crashes, so I guess the version I'm using (11.4.0.15.0) is already
> fixed?
>
> Can you paste which version of Solaris you are using and if possible
> where I can find the installer media to reproduce?
>
> Thanks, Roger.
>



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-17  8:12 ` Jan Beulich
@ 2020-11-17 14:43   ` Cheyenne Wills
  2020-11-17 14:46     ` Andrew Cooper
  2020-12-17  1:51   ` boris.ostrovsky
  1 sibling, 1 reply; 16+ messages in thread
From: Cheyenne Wills @ 2020-11-17 14:43 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel


The Solaris version reported in the copyright banner on the ISO is SunOS
Release 5.11 Version 11.4.0.15.0 64-bit.

My existing Solaris guest systems are also at the same release/version level.

At the time of the panic, the panic log reports that the rcx register
contains '0606' (this was from my notes yesterday).  If additional
information is needed, I will need a bit more time to set up my system
again.

On Tue, Nov 17, 2020 at 1:12 AM Jan Beulich <jbeulich@suse.com> wrote:

> On 16.11.2020 22:57, Cheyenne Wills wrote:
> > Running Xen with XSA-351 is causing Solaris 11 systems to panic during
> > boot.  The panic screen is showing the failure to be coming from
> > "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
> > and the  booting from an install ISO image.
> >
> > I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
> > requested that I report it here.
>
> Thanks. What we need though is information on the specific MSR(s) that
> will need to have workarounds added: We surely would want to avoid
> blindly doing this for all that the XSA change disallowed access to.
> Reproducing the panic screen here might already help; proper full logs
> would be even better.
>
> Jan
>



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-17 14:43   ` Cheyenne Wills
@ 2020-11-17 14:46     ` Andrew Cooper
  0 siblings, 0 replies; 16+ messages in thread
From: Andrew Cooper @ 2020-11-17 14:46 UTC (permalink / raw)
  To: Cheyenne Wills, Jan Beulich; +Cc: xen-devel

On 17/11/2020 14:43, Cheyenne Wills wrote:
> The Solaris version reported in the copyright banner on the ISO is
> SunOS Release 5.11 Version 11.4.0.15.0 64-bit
>
> My existing guest solaris systems are also at the same release/version
> level
>
> At the time of the panic, the panic log reports that the rcx register
> contains '0606' (this was from my notes yesterday).  If additional
> information is needed, I will need a bit more time to set up my system
> again.

As I said on IRC, this is RAPL_POWER_UNIT, but if it is read unguarded,
then the others will be too.

~Andrew



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-11-17  8:12 ` Jan Beulich
  2020-11-17 14:43   ` Cheyenne Wills
@ 2020-12-17  1:51   ` boris.ostrovsky
  2020-12-17  7:40     ` Jan Beulich
  1 sibling, 1 reply; 16+ messages in thread
From: boris.ostrovsky @ 2020-12-17  1:51 UTC (permalink / raw)
  To: Jan Beulich, Cheyenne Wills; +Cc: xen-devel


On 11/17/20 3:12 AM, Jan Beulich wrote:
> On 16.11.2020 22:57, Cheyenne Wills wrote:
>> Running Xen with XSA-351 is causing Solaris 11 systems to panic during
>> boot.  The panic screen is showing the failure to be coming from
>> "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
>> and when booting from an install ISO image.
>>
>> I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
>> requested that I report it here.
> Thanks. What we need though is information on the specific MSR(s) that
> will need to have workarounds added: We surely would want to avoid
> blindly doing this for all that the XSA change disallowed access to.
> Reproducing the panic screen here might already help; proper full logs
> would be even better.


We hit this issue today so I poked a bit around Solaris code.


It definitely reads MSR_RAPL_POWER_UNIT unguarded during boot.


In addition, it may read MSR_*_ENERGY_STATUS when running kstat. I haven't been able to trigger those reads (I didn't have access to the system myself and with neither me nor the tester remembering much about Solaris we only tried some basic commands).


The patch below lets a Solaris guest boot on OVM. Our codebase is somewhat different from the stable branches, but if this is an acceptable workaround I will send a proper patch for stable. I won't be able to test it though.


Author: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Date:   Wed Dec 16 17:19:07 2020 -0500

    x86/msr: Allow read access to some RAPL MSRs
   
    XSA-351 limited access to RAPL-related MSRs to avoid creating a
    side-channel that might allow information leakage. Guests trying
    to access those MSRs now receive #GP.
   
    RAPL is not indicated by CPUID but the assumption is that guests
    should not deal with power-related features and therefore should
    not touch those MSRs. (Linux, in fact, does read MSR_RAPL_POWER_UNIT
    but it does so in a safe manner and can ignore the fault).
   
    Unfortunately, Solaris reads some of those registers without
    safeguards. So for those MSRs let's return 0.
   
    Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 0dbe810e4b27..6b4a5dc77b7f 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -131,6 +131,18 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
         *val &= ~(ARCH_CAPS_TSX_CTRL);
         break;
 
+        /* Solaris reads these MSRs unguarded so let's return 0 */
+    case MSR_RAPL_POWER_UNIT:
+    case MSR_PKG_ENERGY_STATUS:
+    case MSR_DRAM_ENERGY_STATUS:
+    case MSR_PP0_ENERGY_STATUS:
+    case MSR_PP1_ENERGY_STATUS:
+        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
+            goto gp_fault;
+
+        *val = 0;
+        break;
+
         /*
          * These MSRs are not enumerated in CPUID.  They have been around
          * since the Pentium 4, and implemented by other vendors.
@@ -151,11 +163,16 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
             break;
 
         /*fallthrough*/
-    case MSR_RAPL_POWER_UNIT:
-    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
-    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
-    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
-    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
+    case MSR_PKG_POWER_LIMIT:
+    case MSR_PKG_PERF_STATUS:
+    case MSR_PKG_POWER_INFO:
+    case MSR_DRAM_POWER_LIMIT:
+    case MSR_DRAM_PERF_STATUS:
+    case MSR_DRAM_POWER_INFO:
+    case MSR_PP0_POWER_LIMIT:
+    case MSR_PP0_POLICY:
+    case MSR_PP1_POWER_LIMIT:
+    case MSR_PP1_POLICY:
     case MSR_PLATFORM_ENERGY_COUNTER:
     case MSR_PLATFORM_POWER_LIMIT:
     case MSR_F15H_CU_POWER ... MSR_F15H_CU_MAX_POWER:




* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-17  1:51   ` boris.ostrovsky
@ 2020-12-17  7:40     ` Jan Beulich
  2020-12-17 16:25       ` boris.ostrovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2020-12-17  7:40 UTC (permalink / raw)
  To: boris.ostrovsky, Andrew Cooper; +Cc: xen-devel, Cheyenne Wills

On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
> 
> On 11/17/20 3:12 AM, Jan Beulich wrote:
>> On 16.11.2020 22:57, Cheyenne Wills wrote:
>>> Running Xen with XSA-351 is causing Solaris 11 systems to panic during
>>> boot.  The panic screen is showing the failure to be coming from
>>> "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
>>> and when booting from an install ISO image.
>>>
>>> I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
>>> requested that I report it here.
>> Thanks. What we need though is information on the specific MSR(s) that
>> will need to have workarounds added: We surely would want to avoid
>> blindly doing this for all that the XSA change disallowed access to.
>> Reproducing the panic screen here might already help; proper full logs
>> would be even better.
> 
> 
> We hit this issue today so I poked a bit around Solaris code.
> 
> 
> It definitely reads MSR_RAPL_POWER_UNIT unguarded during boot.
> 
> 
> In addition, it may read MSR_*_ENERGY_STATUS when running kstat. I haven't been able to trigger those reads (I didn't have access to the system myself and with neither me nor the tester remembering much about Solaris we only tried some basic commands).
> 
> 
> The patch below lets Solaris guest boot on OVM. Our codebase is somewhat different from stable branches but if this is an acceptable workaround I will send proper patch for stable. I won't be able to test it though.

I think this is acceptable as a workaround, albeit we may want to
consider further restricting this (at least on staging), like e.g.
requiring a guest config setting to enable the workaround. But
maybe this will need to be part of the MSR policy for the domain
instead, down the road. We'll definitely want Andrew's view here.

Speaking of staging - before applying anything to the stable
branches, I think we want to have this addressed on the main
branch. I can't see how Solaris would work there.

> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -131,6 +131,18 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>          *val &= ~(ARCH_CAPS_TSX_CTRL);
>          break;
>  
> +        /* Solaris reads these MSRs unguarded so let's return 0 */
> +    case MSR_RAPL_POWER_UNIT:
> +    case MSR_PKG_ENERGY_STATUS:
> +    case MSR_DRAM_ENERGY_STATUS:
> +    case MSR_PP0_ENERGY_STATUS:
> +    case MSR_PP1_ENERGY_STATUS:
> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
> +            goto gp_fault;
> +
> +        *val = 0;
> +        break;
> +
>          /*
>           * These MSRs are not enumerated in CPUID.  They have been around
>           * since the Pentium 4, and implemented by other vendors.
> @@ -151,11 +163,16 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>              break;
>  
>          /*fallthrough*/
> -    case MSR_RAPL_POWER_UNIT:
> -    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
> -    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
> -    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
> -    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
> +    case MSR_PKG_POWER_LIMIT:
> +    case MSR_PKG_PERF_STATUS:
> +    case MSR_PKG_POWER_INFO:
> +    case MSR_DRAM_POWER_LIMIT:
> +    case MSR_DRAM_PERF_STATUS:
> +    case MSR_DRAM_POWER_INFO:
> +    case MSR_PP0_POWER_LIMIT:
> +    case MSR_PP0_POLICY:
> +    case MSR_PP1_POWER_LIMIT:
> +    case MSR_PP1_POLICY:
>      case MSR_PLATFORM_ENERGY_COUNTER:
>      case MSR_PLATFORM_POWER_LIMIT:
>      case MSR_F15H_CU_POWER ... MSR_F15H_CU_MAX_POWER:

Note how you no longer handle MSRs previously included (one each
in the first two groups) in the range expressions. I think I'd
prefer the alternative of filtering just the STATUS ones here:

    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
             (msr & 0x7) != 1 /* MSR_*_ENERGY_STATUS */ )
            goto gp_fault;

        *val = 0;
        break;

Or, folding in MSR_RAPL_POWER_UNIT,

    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
        if ( (msr & 0x7) != 1 /* MSR_*_ENERGY_STATUS */ )
            goto gp_fault;
        /* fallthrough */
    case MSR_RAPL_POWER_UNIT:
        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
            goto gp_fault;

        *val = 0;
        break;

Jan



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-17  7:40     ` Jan Beulich
@ 2020-12-17 16:25       ` boris.ostrovsky
  2020-12-17 16:46         ` Andrew Cooper
  0 siblings, 1 reply; 16+ messages in thread
From: boris.ostrovsky @ 2020-12-17 16:25 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper; +Cc: xen-devel, Cheyenne Wills


On 12/17/20 2:40 AM, Jan Beulich wrote:
> On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
> I think this is acceptable as a workaround, albeit we may want to
> consider further restricting this (at least on staging), like e.g.
> requiring a guest config setting to enable the workaround. 


Maybe, but then someone migrating from a stable release to 4.15 will have to modify guest configuration.


> But
> maybe this will need to be part of the MSR policy for the domain
> instead, down the road. We'll definitely want Andrew's view here.
>
> Speaking of staging - before applying anything to the stable
> branches, I think we want to have this addressed on the main
> branch. I can't see how Solaris would work there.


Indeed it won't. I'll need to do that as well (I misinterpreted the statement in the XSA about only 4.14- being vulnerable)



-boris




* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-17 16:25       ` boris.ostrovsky
@ 2020-12-17 16:46         ` Andrew Cooper
  2020-12-17 17:49           ` boris.ostrovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Cooper @ 2020-12-17 16:46 UTC (permalink / raw)
  To: boris.ostrovsky, Jan Beulich; +Cc: xen-devel, Cheyenne Wills

On 17/12/2020 16:25, boris.ostrovsky@oracle.com wrote:
> On 12/17/20 2:40 AM, Jan Beulich wrote:
>> On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
>> I think this is acceptable as a workaround, albeit we may want to
>> consider further restricting this (at least on staging), like e.g.
>> requiring a guest config setting to enable the workaround. 
>
> Maybe, but then someone migrating from a stable release to 4.15 will have to modify guest configuration.
>
>
>> But
>> maybe this will need to be part of the MSR policy for the domain
>> instead, down the road. We'll definitely want Andrew's view here.
>>
>> Speaking of staging - before applying anything to the stable
>> branches, I think we want to have this addressed on the main
>> branch. I can't see how Solaris would work there.
>
> Indeed it won't. I'll need to do that as well (I misinterpreted the statement in the XSA about only 4.14- being vulnerable)

It's hopefully obvious now why we suddenly finished the "let's turn all
unknown MSRs to #GP" work at the point that we did (after dithering on
the point for several years).

To put it bluntly, default MSR readability was not a clever decision at all.

There is a large risk that there is a similar vulnerability elsewhere,
given how poorly documented the MSRs are (and one contemporary CPU I've
got the manual open for has more than 6000 *documented* MSRs).  We did
debate for a while whether the readability of the PPIN MSRs was a
vulnerability or not, before eventually deciding not.

Irrespective of what we do to fix this in Xen, has anyone fixed Solaris yet?

~Andrew



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-17 16:46         ` Andrew Cooper
@ 2020-12-17 17:49           ` boris.ostrovsky
  2020-12-18 20:43             ` boris.ostrovsky
  0 siblings, 1 reply; 16+ messages in thread
From: boris.ostrovsky @ 2020-12-17 17:49 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel, Cheyenne Wills


On 12/17/20 11:46 AM, Andrew Cooper wrote:
> On 17/12/2020 16:25, boris.ostrovsky@oracle.com wrote:
>> On 12/17/20 2:40 AM, Jan Beulich wrote:
>>> On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
>>> I think this is acceptable as a workaround, albeit we may want to
>>> consider further restricting this (at least on staging), like e.g.
>>> requiring a guest config setting to enable the workaround. 
>> Maybe, but then someone migrating from a stable release to 4.15 will have to modify guest configuration.
>>
>>
>>> But
>>> maybe this will need to be part of the MSR policy for the domain
>>> instead, down the road. We'll definitely want Andrew's view here.
>>>
>>> Speaking of staging - before applying anything to the stable
>>> branches, I think we want to have this addressed on the main
>>> branch. I can't see how Solaris would work there.
>> Indeed it won't. I'll need to do that as well (I misinterpreted the statement in the XSA about only 4.14- being vulnerable)
> It's hopefully obvious now why we suddenly finished the "lets turn all
> unknown MSRs to #GP" work at the point that we did (after dithering on
> the point for several years).
>
> To put it bluntly, default MSR readability was not a clever decision at all.
>
> There is a large risk that there is a similar vulnerability elsewhere,
> given how poorly documented the MSRs are (and one contemporary CPU I've
> got the manual open for has more than 6000 *documented* MSRs).  We did
> debate for a while whether the readability of the PPIN MSRs was a
> vulnerability or not, before eventually deciding not.

> Irrespective of what we do to fix this in Xen, has anyone fixed Solaris yet?


I am not aware of anyone working on this (not that I would be).


-boris




* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-17 17:49           ` boris.ostrovsky
@ 2020-12-18 20:43             ` boris.ostrovsky
  2020-12-21  8:21               ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: boris.ostrovsky @ 2020-12-18 20:43 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich; +Cc: xen-devel, Cheyenne Wills


On 12/17/20 12:49 PM, boris.ostrovsky@oracle.com wrote:
> On 12/17/20 11:46 AM, Andrew Cooper wrote:
>> On 17/12/2020 16:25, boris.ostrovsky@oracle.com wrote:
>>> On 12/17/20 2:40 AM, Jan Beulich wrote:
>>>> On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
>>>> I think this is acceptable as a workaround, albeit we may want to
>>>> consider further restricting this (at least on staging), like e.g.
>>>> requiring a guest config setting to enable the workaround. 
>>> Maybe, but then someone migrating from a stable release to 4.15 will have to modify guest configuration.
>>>
>>>
>>>> But
>>>> maybe this will need to be part of the MSR policy for the domain
>>>> instead, down the road. We'll definitely want Andrew's view here.
>>>>
>>>> Speaking of staging - before applying anything to the stable
>>>> branches, I think we want to have this addressed on the main
>>>> branch. I can't see how Solaris would work there.
>>> Indeed it won't. I'll need to do that as well (I misinterpreted the statement in the XSA about only 4.14- being vulnerable)
>> It's hopefully obvious now why we suddenly finished the "lets turn all
>> unknown MSRs to #GP" work at the point that we did (after dithering on
>> the point for several years).
>>
>> To put it bluntly, default MSR readability was not a clever decision at all.
>>
>> There is a large risk that there is a similar vulnerability elsewhere,
>> given how poorly documented the MSRs are (and one contemporary CPU I've
>> got the manual open for has more than 6000 *documented* MSRs).  We did
>> debate for a while whether the readability of the PPIN MSRs was a
>> vulnerability or not, before eventually deciding not.


Can we do something like KVM's ignore_msrs (but probably return 0 on reads to avoid leaks from the system)? It would allow us to deal with cases where a guest is suddenly unable to boot after a hypervisor update (especially from pre-4.14). It won't help in all cases, since some MSRs may be expected to be non-zero, but I think it will cover a large number of them. (And it will certainly do what Jan is asking above but will not be specific to this particular breakage.)


-boris


>> Irrespective of what we do to fix this in Xen, has anyone fixed Solaris yet?
>
> I am not aware of anyone working on this (not that I would be).
>
>
> -boris
>



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-18 20:43             ` boris.ostrovsky
@ 2020-12-21  8:21               ` Jan Beulich
  2020-12-21 16:21                 ` boris.ostrovsky
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2020-12-21  8:21 UTC (permalink / raw)
  To: boris.ostrovsky; +Cc: xen-devel, Cheyenne Wills, Andrew Cooper

On 18.12.2020 21:43, boris.ostrovsky@oracle.com wrote:
> On 12/17/20 12:49 PM, boris.ostrovsky@oracle.com wrote:
>> On 12/17/20 11:46 AM, Andrew Cooper wrote:
>>> On 17/12/2020 16:25, boris.ostrovsky@oracle.com wrote:
>>>> On 12/17/20 2:40 AM, Jan Beulich wrote:
>>>>> On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
>>>>> I think this is acceptable as a workaround, albeit we may want to
>>>>> consider further restricting this (at least on staging), like e.g.
>>>>> requiring a guest config setting to enable the workaround. 
>>>> Maybe, but then someone migrating from a stable release to 4.15 will have to modify guest configuration.
>>>>
>>>>
>>>>> But
>>>>> maybe this will need to be part of the MSR policy for the domain
>>>>> instead, down the road. We'll definitely want Andrew's view here.
>>>>>
>>>>> Speaking of staging - before applying anything to the stable
>>>>> branches, I think we want to have this addressed on the main
>>>>> branch. I can't see how Solaris would work there.
>>>> Indeed it won't. I'll need to do that as well (I misinterpreted the statement in the XSA about only 4.14- being vulnerable)
>>> It's hopefully obvious now why we suddenly finished the "lets turn all
>>> unknown MSRs to #GP" work at the point that we did (after dithering on
>>> the point for several years).
>>>
>>> To put it bluntly, default MSR readability was not a clever decision at all.
>>>
>>> There is a large risk that there is a similar vulnerability elsewhere,
>>> given how poorly documented the MSRs are (and one contemporary CPU I've
>>> got the manual open for has more than 6000 *documented* MSRs).  We did
>>> debate for a while whether the readability of the PPIN MSRs was a
>>> vulnerability or not, before eventually deciding not.
> 
> 
> Can we do something like KVM's ignore_msrs (but probably return 0 on reads to avoid leaks from the system)? It would let us deal with cases where a guest is suddenly unable to boot after a hypervisor update (especially from pre-4.14). It won't help in all cases, since some MSRs may be expected to be non-zero, but I think it will cover a large number of them. (It will certainly do what Jan is asking above, but will not be specific to this particular breakage.)

This would re-introduce the problem with detection (by guests) of certain
features lacking suitable CPUID bits. Guests would no longer observe the
expected #GP(0), and hence be at risk of misbehaving. Hence at the very
least such an option would need to be per-domain rather than (like for
KVM) global, and use of it should then imo be explicitly unsupported. And
along the lines of what KVM has, this may want to be a tristate so the
ignoring can be both silent and verbose.
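
To make the detection concern concrete, here is a minimal sketch (not taken
from any guest's source; the struct and function names are hypothetical) of
the kind of probe a guest might use when no CPUID bit exists for a feature:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what a probing RDMSR can observe. */
struct rdmsr_outcome {
    bool faulted;    /* true if the access raised #GP(0) */
    uint64_t value;  /* only meaningful when !faulted */
};

/*
 * Guest-side heuristic: with no CPUID bit to consult, the absence of a
 * fault is taken to mean that the feature behind the MSR is present.
 */
static bool guest_thinks_feature_present(struct rdmsr_outcome o)
{
    return !o.faulted;
}
```

With an "ignore" policy the read succeeds and yields 0, so a probe of this
shape would conclude that the feature exists even though it does not.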

Jan



* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-21  8:21               ` Jan Beulich
@ 2020-12-21 16:21                 ` boris.ostrovsky
  2020-12-21 16:55                   ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: boris.ostrovsky @ 2020-12-21 16:21 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Cheyenne Wills, Andrew Cooper


On 12/21/20 3:21 AM, Jan Beulich wrote:
> On 18.12.2020 21:43, boris.ostrovsky@oracle.com wrote:
>> Can we do something like KVM's ignore_msrs (but probably return 0 on reads to avoid leaks from the system)? It would let us deal with cases where a guest is suddenly unable to boot after a hypervisor update (especially from pre-4.14). It won't help in all cases, since some MSRs may be expected to be non-zero, but I think it will cover a large number of them. (It will certainly do what Jan is asking above, but will not be specific to this particular breakage.)
> This would re-introduce the problem with detection (by guests) of certain
> features lacking suitable CPUID bits. Guests would no longer observe the
> expected #GP(0), and hence be at risk of misbehaving. Hence at the very
> least such an option would need to be per-domain rather than (like for
> KVM) global,


Yes, of course.


>  and use of it should then imo be explicitly unsupported.


Unsupported or not recommended? There are options that are not recommended from a security perspective but are still supported, for example `spec-ctrl=no` (although that is a global setting).


>  And
> along the lines of what KVM has, this may want to be a tristate so the
> ignoring can be both silent and verbose.


OK.


ignore_msrs="never" (default)

ignore_msrs="silent"

ignore_msrs="verbose"
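
A minimal sketch of how such a per-domain tristate could steer the handling
of a read from an MSR Xen does not know about. This is illustrative only and
is not from any posted patch: the enum, the struct, and unknown_rdmsr() are
hypothetical names, and the logging is a stand-in for whatever Xen would
actually use.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-domain setting mirroring the three values above. */
enum ignore_msrs {
    IGNORE_MSRS_NEVER,   /* default: unknown MSR reads raise #GP(0) */
    IGNORE_MSRS_SILENT,  /* unknown MSR reads return 0, nothing is logged */
    IGNORE_MSRS_VERBOSE, /* as "silent", but every ignored read is logged */
};

enum msr_result { MSR_OKAY, MSR_GP_FAULT };

/* Hypothetical stand-in for the relevant part of a domain's configuration. */
struct domain_cfg {
    enum ignore_msrs ignore_msrs;
};

/* Decide what a guest read of an unknown MSR should observe. */
static enum msr_result unknown_rdmsr(const struct domain_cfg *d,
                                     uint32_t msr, uint64_t *val)
{
    if (d->ignore_msrs == IGNORE_MSRS_NEVER)
        return MSR_GP_FAULT;          /* *val is left untouched */

    if (d->ignore_msrs == IGNORE_MSRS_VERBOSE)
        fprintf(stderr, "ignoring RDMSR 0x%08x\n", (unsigned int)msr);

    *val = 0;                         /* never leak host state to the guest */
    return MSR_OKAY;
}
```

Returning 0 rather than the hardware value keeps the "avoid leaks" property,
at the cost of the misdetection risk discussed earlier in the thread.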



-boris




* Re: XSA-351 causing Solaris-11 systems to panic during boot.
  2020-12-21 16:21                 ` boris.ostrovsky
@ 2020-12-21 16:55                   ` Jan Beulich
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2020-12-21 16:55 UTC (permalink / raw)
  To: boris.ostrovsky; +Cc: xen-devel, Cheyenne Wills, Andrew Cooper

On 21.12.2020 17:21, boris.ostrovsky@oracle.com wrote:
> 
> On 12/21/20 3:21 AM, Jan Beulich wrote:
>> On 18.12.2020 21:43, boris.ostrovsky@oracle.com wrote:
>>> Can we do something like KVM's ignore_msrs (but probably return 0 on reads to avoid leaks from the system)? It would let us deal with cases where a guest is suddenly unable to boot after a hypervisor update (especially from pre-4.14). It won't help in all cases, since some MSRs may be expected to be non-zero, but I think it will cover a large number of them. (It will certainly do what Jan is asking above, but will not be specific to this particular breakage.)
>> This would re-introduce the problem with detection (by guests) of certain
>> features lacking suitable CPUID bits. Guests would no longer observe the
>> expected #GP(0), and hence be at risk of misbehaving. Hence at the very
>> least such an option would need to be per-domain rather than (like for
>> KVM) global,
> 
> 
> Yes, of course.
> 
> 
>>  and use of it should then imo be explicitly unsupported.
> 
> 
> Unsupported or not recommended? There are options that are not recommended from a security perspective but are still supported, for example `spec-ctrl=no` (although that is a global setting).

"Security unsupported", i.e. use of it causing what might look like
a security issue would not get an XSA.

Jan



end of thread, other threads:[~2020-12-21 16:56 UTC | newest]

Thread overview: 16+ messages
2020-11-16 21:57 XSA-351 causing Solaris-11 systems to panic during boot Cheyenne Wills
2020-11-17  8:12 ` Jan Beulich
2020-11-17 14:43   ` Cheyenne Wills
2020-11-17 14:46     ` Andrew Cooper
2020-12-17  1:51   ` boris.ostrovsky
2020-12-17  7:40     ` Jan Beulich
2020-12-17 16:25       ` boris.ostrovsky
2020-12-17 16:46         ` Andrew Cooper
2020-12-17 17:49           ` boris.ostrovsky
2020-12-18 20:43             ` boris.ostrovsky
2020-12-21  8:21               ` Jan Beulich
2020-12-21 16:21                 ` boris.ostrovsky
2020-12-21 16:55                   ` Jan Beulich
2020-11-17 10:50 ` Roger Pau Monné
2020-11-17 12:54   ` Roger Pau Monné
2020-11-17 13:59     ` Cheyenne Wills
