All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Beulich <jbeulich@suse.com>
To: boris.ostrovsky@oracle.com, Andrew Cooper <andrew.cooper3@citrix.com>
Cc: xen-devel@lists.xenproject.org,
	Cheyenne Wills <cheyenne.wills@gmail.com>
Subject: Re: XSA-351 causing Solaris-11 systems to panic during boot.
Date: Thu, 17 Dec 2020 08:40:38 +0100	[thread overview]
Message-ID: <f4ff3d16-40f6-e8a1-fcdd-ca52e1f52ca6@suse.com> (raw)
In-Reply-To: <4fc3532b-f53f-2a15-ce64-f857816b0566@oracle.com>

On 17.12.2020 02:51, boris.ostrovsky@oracle.com wrote:
> 
> On 11/17/20 3:12 AM, Jan Beulich wrote:
>> On 16.11.2020 22:57, Cheyenne Wills wrote:
>>> Running Xen with XSA-351 is causing Solaris 11 systems to panic during
>>> boot.  The panic screen is showing the failure to be coming from
>>> "unix:rdmsr".  The panic occurs with existing guests (booting off a disk)
>>> and the  booting from an install ISO image.
>>>
>>> I discussed the problem with "andyhhp__" in the "#xen" IRC channel and he
>>> requested that I report it here.
>> Thanks. What we need though is information on the specific MSR(s) that
>> will need to have workarounds added: We surely would want to avoid
>> blindly doing this for all that the XSA change disallowed access to.
>> Reproducing the panic screen here might already help; proper full logs
>> would be even better.
> 
> 
> We hit this issue today so I poked a bit around Solaris code.
> 
> 
> It definitely reads MSR_RAPL_POWER_UNIT unguarded during boot.
> 
> 
> In addition, it may read MSR_*_ENERGY_STATUS when running kstat. I haven't been able to trigger those reads (I didn't have access to the system myself and with neither me nor the tester remembering much about Solaris we only tried some basic commands).
> 
> 
> The patch below lets Solaris guest boot on OVM. Our codebase is somewhat different from stable branches but if this is an acceptable workaround I will send proper patch for stable. I won't be able to test it though.

I think this is acceptable as a workaround, albeit we may want to
consider further restricting this (at least on staging), like e.g.
requiring a guest config setting to enable the workaround. But
maybe this will need to be part of the MSR policy for the domain
instead, down the road. We'll definitely want Andrew's view here.

Speaking of staging - before applying anything to the stable
branches, I think we want to have this addressed on the main
branch. I can't see how Solaris would work there.

> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -131,6 +131,18 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>          *val &= ~(ARCH_CAPS_TSX_CTRL);
>          break;
>  
> +        /* Solaris reads these MSRs unguarded so let's return 0 */
> +    case MSR_RAPL_POWER_UNIT:
> +    case MSR_PKG_ENERGY_STATUS:
> +    case MSR_DRAM_ENERGY_STATUS:
> +    case MSR_PP0_ENERGY_STATUS:
> +    case MSR_PP1_ENERGY_STATUS:
> +        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
> +            goto gp_fault;
> +
> +        *val = 0;
> +        break;
> +
>          /*
>           * These MSRs are not enumerated in CPUID.  They have been around
>           * since the Pentium 4, and implemented by other vendors.
> @@ -151,11 +163,16 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
>              break;
>  
>          /*fallthrough*/
> -    case MSR_RAPL_POWER_UNIT:
> -    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
> -    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
> -    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
> -    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
> +    case MSR_PKG_POWER_LIMIT:
> +    case MSR_PKG_PERF_STATUS:
> +    case MSR_PKG_POWER_INFO:
> +    case MSR_DRAM_POWER_LIMIT:
> +    case MSR_DRAM_PERF_STATUS:
> +    case MSR_DRAM_POWER_INFO:
> +    case MSR_PP0_POWER_LIMIT:
> +    case MSR_PP0_POLICY:
> +    case MSR_PP1_POWER_LIMIT:
> +    case MSR_PP1_POLICY:
>      case MSR_PLATFORM_ENERGY_COUNTER:
>      case MSR_PLATFORM_POWER_LIMIT:
>      case MSR_F15H_CU_POWER ... MSR_F15H_CU_MAX_POWER:

Note how you no longer handle MSRs previously included (one each
in the first two groups) in the range expressions. I think I'd
prefer the alternative of filtering just the STATUS ones here:

    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
             (msr & 0xf) != 1 /* MSR_*_POWER_STATUS */ )
            goto gp_fault;

        *val = 0;
        break;

Or, folding in MSR_RAPL_POWER_UNIT,

    case MSR_PKG_POWER_LIMIT  ... MSR_PKG_POWER_INFO:
    case MSR_DRAM_POWER_LIMIT ... MSR_DRAM_POWER_INFO:
    case MSR_PP0_POWER_LIMIT  ... MSR_PP0_POLICY:
    case MSR_PP1_POWER_LIMIT  ... MSR_PP1_POLICY:
        if ( (msr & 0xf) != 1 /* MSR_*_POWER_STATUS */ )
            goto gp_fault;
        /* fallthrough */
    case MSR_RAPL_POWER_UNIT:
        if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL )
            goto gp_fault;

        *val = 0;
        break;

Jan


  reply	other threads:[~2020-12-17  7:41 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-16 21:57 XSA-351 causing Solaris-11 systems to panic during boot Cheyenne Wills
2020-11-17  8:12 ` Jan Beulich
2020-11-17 14:43   ` Cheyenne Wills
2020-11-17 14:46     ` Andrew Cooper
2020-12-17  1:51   ` boris.ostrovsky
2020-12-17  7:40     ` Jan Beulich [this message]
2020-12-17 16:25       ` boris.ostrovsky
2020-12-17 16:46         ` Andrew Cooper
2020-12-17 17:49           ` boris.ostrovsky
2020-12-18 20:43             ` boris.ostrovsky
2020-12-21  8:21               ` Jan Beulich
2020-12-21 16:21                 ` boris.ostrovsky
2020-12-21 16:55                   ` Jan Beulich
2020-11-17 10:50 ` Roger Pau Monné
2020-11-17 12:54   ` Roger Pau Monné
2020-11-17 13:59     ` Cheyenne Wills

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f4ff3d16-40f6-e8a1-fcdd-ca52e1f52ca6@suse.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=cheyenne.wills@gmail.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.