All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2][4.15] x86: guest MSR access handling tweaks
@ 2021-03-12  7:53 Jan Beulich
  2021-03-12  7:54 ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
  2021-03-12  7:55 ` [PATCH v3 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12  7:53 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Addressing review feedback, albeit some of it still looks to be
stalled. But I didn't want to wait longer.

1: PV: conditionally avoid raising #GP for early guest MSR reads
2: AMD: expose HWCR.TscFreqSel to guests

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-12  7:53 [PATCH v3 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
@ 2021-03-12  7:54 ` Jan Beulich
  2021-03-12  9:08   ` Roger Pau Monné
  2021-03-12 11:19   ` Andrew Cooper
  2021-03-12  7:55 ` [PATCH v3 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
  1 sibling, 2 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12  7:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Prior to 4.15 Linux, when running in PV mode, did not install a #GP
handler early enough to cover for example the rdmsrl_safe() of
MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
of MSR_K7_HWCR later in the same function). The respective change
(42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
backported to 4.14, but no further - presumably since it wasn't really
easy because of other dependencies.

Therefore, to prevent our change in the handling of guest MSR accesses
to render PV Linux 4.13 and older unusable on at least AMD systems, make
the raising of #GP on this paths conditional upon the guest having
installed a handler, provided of course the MSR can be read in the first
place (we would have raised #GP in that case even before). Producing
zero for reads isn't necessarily correct and may trip code trying to
detect presence of MSRs early, but since such detection logic won't work
without a #GP handler anyway, this ought to be a fair workaround.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
---
(projected v4: re-base over Roger's change)
v3: Use temporary variable for probing. Document the behavior (in a
    public header, for the lack of a better place).
v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
    messages (in debug builds). Don't alter WRMSR behavior.
---
While I didn't myself observe or find similar WRMSR side issues, I'm
nevertheless not convinced we can get away without also making the WRMSR
path somewhat more permissive again, e.g. tolerating attempts to set
bits which are already set. But of course this would require keeping in
sync for which MSRs we "fake" reads, as then a kernel attempt to set a
bit may also appear as an attempt to clear others (because of the zero
value that we gave it for the read). Roger validly points out that
making behavior dependent upon MSR values has its own downsides, so
simply depending on MSR readability is another option (with, in turn,
its own undesirable effects, e.g. for write-only MSRs).

--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -874,7 +874,8 @@ static int read_msr(unsigned int reg, ui
     struct vcpu *curr = current;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
-    bool vpmu_msr = false;
+    bool vpmu_msr = false, warn = false;
+    uint64_t tmp;
     int ret;
 
     if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
@@ -882,7 +883,7 @@ static int read_msr(unsigned int reg, ui
         if ( ret == X86EMUL_EXCEPTION )
             x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
 
-        return ret;
+        goto done;
     }
 
     switch ( reg )
@@ -986,7 +987,7 @@ static int read_msr(unsigned int reg, ui
         }
         /* fall through */
     default:
-        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+        warn = true;
         break;
 
     normal:
@@ -995,7 +996,19 @@ static int read_msr(unsigned int reg, ui
         return X86EMUL_OKAY;
     }
 
-    return X86EMUL_UNHANDLEABLE;
+ done:
+    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
+         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, tmp) )
+    {
+        gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
+        *val = 0;
+        x86_emul_reset_event(ctxt);
+        ret = X86EMUL_OKAY;
+    }
+    else if ( warn )
+        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+
+    return ret;
 }
 
 static int write_msr(unsigned int reg, uint64_t val,
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -143,6 +143,12 @@ typedef unsigned long xen_ulong_t;
  *  Level == 1: Kernel may enter
  *  Level == 2: Kernel may enter
  *  Level == 3: Everyone may enter
+ *
+ * Note: For compatibility with kernels not setting up exception handlers
+ *       early enough, Xen will avoid trying to inject #GP (and hence crash
+ *       the domain) when an RDMSR would require this, but no handler was
+ *       set yet. The precise conditions are implementation specific, and
+ *       new code shouldn't rely on such behavior anyway.
  */
 #define TI_GET_DPL(_ti)      ((_ti)->flags & 3)
 #define TI_GET_IF(_ti)       ((_ti)->flags & 4)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-12  7:53 [PATCH v3 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
  2021-03-12  7:54 ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
@ 2021-03-12  7:55 ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12  7:55 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Linux has been warning ("firmware bug") about this bit being clear for a
long time. While writable in older hardware it has been readonly on more
than just most recent hardware. For simplicitly report it always set (if
anything we may want to log the issue ourselves if it turns out to be
clear on older hardware) on CPU families 10h and up (in family 0fh the
bit is part of a larger field of different purpose).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v3: Report 0 for Fam0F.
v2: New.
---
There are likely more bits worthwhile to expose, but for about every one
of them there would be the risk of a lengthy discussion, as there are
clear downsides to exposing such information. The more that it would be
tbd whether the hardware values should be surfaced, and if so what
should happen when the guest gets migrated.

The main risk with making the read not fault here is that guests might
imply they can also write this MSR then.

--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -315,6 +315,13 @@ int guest_rdmsr(struct vcpu *v, uint32_t
         *val = msrs->tsc_aux;
         break;
 
+    case MSR_K8_HWCR:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
+               ? K8_HWCR_TSC_FREQ_SEL : 0;
+        break;
+
     case MSR_AMD64_DE_CFG:
         if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
             goto gp_fault;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -287,6 +287,8 @@
 
 #define MSR_K7_HWCR			0xc0010015
 #define MSR_K8_HWCR			0xc0010015
+#define K8_HWCR_TSC_FREQ_SEL		(1ULL << 24)
+
 #define MSR_K7_FID_VID_CTL		0xc0010041
 #define MSR_K7_FID_VID_STATUS		0xc0010042
 #define MSR_K8_PSTATE_LIMIT		0xc0010061



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-12  7:54 ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
@ 2021-03-12  9:08   ` Roger Pau Monné
  2021-03-12  9:32     ` Jan Beulich
  2021-03-12 11:19   ` Andrew Cooper
  1 sibling, 1 reply; 11+ messages in thread
From: Roger Pau Monné @ 2021-03-12  9:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Fri, Mar 12, 2021 at 08:54:46AM +0100, Jan Beulich wrote:
> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> handler early enough to cover for example the rdmsrl_safe() of
> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> of MSR_K7_HWCR later in the same function). The respective change
> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> backported to 4.14, but no further - presumably since it wasn't really
> easy because of other dependencies.
> 
> Therefore, to prevent our change in the handling of guest MSR accesses
> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> the raising of #GP on this paths conditional upon the guest having
> installed a handler, provided of course the MSR can be read in the first
> place (we would have raised #GP in that case even before). Producing
> zero for reads isn't necessarily correct and may trip code trying to
> detect presence of MSRs early, but since such detection logic won't work
> without a #GP handler anyway, this ought to be a fair workaround.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Release-Acked-by: Ian Jackson <iwj@xenproject.org>

I think the approach is fine:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Some comments below.

> ---
> (projected v4: re-base over Roger's change)
> v3: Use temporary variable for probing. Document the behavior (in a
>     public header, for the lack of a better place).
> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>     messages (in debug builds). Don't alter WRMSR behavior.
> ---
> While I didn't myself observe or find similar WRMSR side issues, I'm
> nevertheless not convinced we can get away without also making the WRMSR
> path somewhat more permissive again, e.g. tolerating attempts to set
> bits which are already set. But of course this would require keeping in
> sync for which MSRs we "fake" reads, as then a kernel attempt to set a
> bit may also appear as an attempt to clear others (because of the zero
> value that we gave it for the read). Roger validly points out that
> making behavior dependent upon MSR values has its own downsides, so
> simply depending on MSR readability is another option (with, in turn,
> its own undesirable effects, e.g. for write-only MSRs).
> 
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -874,7 +874,8 @@ static int read_msr(unsigned int reg, ui
>      struct vcpu *curr = current;
>      const struct domain *currd = curr->domain;
>      const struct cpuid_policy *cp = currd->arch.cpuid;
> -    bool vpmu_msr = false;
> +    bool vpmu_msr = false, warn = false;
> +    uint64_t tmp;
>      int ret;
>  
>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> @@ -882,7 +883,7 @@ static int read_msr(unsigned int reg, ui
>          if ( ret == X86EMUL_EXCEPTION )
>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);

You might want to move the injection of the exception to the done
label?

So that we can avoid the call to x86_emul_reset_event.

>  
> -        return ret;
> +        goto done;
>      }
>  
>      switch ( reg )
> @@ -986,7 +987,7 @@ static int read_msr(unsigned int reg, ui
>          }
>          /* fall through */
>      default:
> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> +        warn = true;
>          break;
>  
>      normal:
> @@ -995,7 +996,19 @@ static int read_msr(unsigned int reg, ui
>          return X86EMUL_OKAY;
>      }
>  
> -    return X86EMUL_UNHANDLEABLE;
> + done:
> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, tmp) )
> +    {
> +        gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
> +        *val = 0;
> +        x86_emul_reset_event(ctxt);
> +        ret = X86EMUL_OKAY;
> +    }
> +    else if ( warn )
> +        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);

I think you could add:

if ( rc == X86EMUL_EXCEPTION )
    x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);

> +
> +    return ret;
>  }
>  
>  static int write_msr(unsigned int reg, uint64_t val,
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -143,6 +143,12 @@ typedef unsigned long xen_ulong_t;
>   *  Level == 1: Kernel may enter
>   *  Level == 2: Kernel may enter
>   *  Level == 3: Everyone may enter
> + *
> + * Note: For compatibility with kernels not setting up exception handlers
> + *       early enough, Xen will avoid trying to inject #GP (and hence crash
> + *       the domain) when an RDMSR would require this, but no handler was
> + *       set yet. The precise conditions are implementation specific, and

You can drop the 'yet' here I think? As even if a handler has been set
and then removed we would still prevent injecting a #GP AFAICT. Not a
native speaker anyway, so I might be wrong on that one.

> + *       new code shouldn't rely on such behavior anyway.

I would use a stronger mustn't here instead of shouldn't.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-12  9:08   ` Roger Pau Monné
@ 2021-03-12  9:32     ` Jan Beulich
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12  9:32 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 12.03.2021 10:08, Roger Pau Monné wrote:
> On Fri, Mar 12, 2021 at 08:54:46AM +0100, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on this paths conditional upon the guest having
>> installed a handler, provided of course the MSR can be read in the first
>> place (we would have raised #GP in that case even before). Producing
>> zero for reads isn't necessarily correct and may trip code trying to
>> detect presence of MSRs early, but since such detection logic won't work
>> without a #GP handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
> 
> I think the approach is fine:
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -874,7 +874,8 @@ static int read_msr(unsigned int reg, ui
>>      struct vcpu *curr = current;
>>      const struct domain *currd = curr->domain;
>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>> -    bool vpmu_msr = false;
>> +    bool vpmu_msr = false, warn = false;
>> +    uint64_t tmp;
>>      int ret;
>>  
>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>> @@ -882,7 +883,7 @@ static int read_msr(unsigned int reg, ui
>>          if ( ret == X86EMUL_EXCEPTION )
>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> 
> You might want to move the injection of the exception to the done
> label?
> 
> So that we can avoid the call to x86_emul_reset_event.

At the expense of slightly more code churn, yes, perhaps. I have
to admit though that it feels less prone to future issues to me
to have an unconditional x86_emul_reset_event() on that path.

>> --- a/xen/include/public/arch-x86/xen.h
>> +++ b/xen/include/public/arch-x86/xen.h
>> @@ -143,6 +143,12 @@ typedef unsigned long xen_ulong_t;
>>   *  Level == 1: Kernel may enter
>>   *  Level == 2: Kernel may enter
>>   *  Level == 3: Everyone may enter
>> + *
>> + * Note: For compatibility with kernels not setting up exception handlers
>> + *       early enough, Xen will avoid trying to inject #GP (and hence crash
>> + *       the domain) when an RDMSR would require this, but no handler was
>> + *       set yet. The precise conditions are implementation specific, and
> 
> You can drop the 'yet' here I think? As even if a handler has been set
> and then removed we would still prevent injecting a #GP AFAICT. Not a
> native speaker anyway, so I might be wrong on that one.

Well, I've put it there intentionally to leave room (effectively
trying to further emphasize "implementation specific") for us to
indeed only behave this way if no handler was ever set (as
opposed to a handler having got set and then zapped again).

>> + *       new code shouldn't rely on such behavior anyway.
> 
> I would use a stronger mustn't here instead of shouldn't.

Fine with me. I've been using "mustn't" in a number of cases in
the past and was told I'm using it too often, sounding sort of
impolite. I guess I'll switch to "may not", which was suggested
to me as the better replacement.

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks
@ 2021-03-12 11:01     ` Jan Beulich
  2021-03-12 11:02       ` [PATCH v4 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12 11:01 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Largely to re-base patch 1.

1: PV: conditionally avoid raising #GP for early guest MSR reads
2: AMD: expose HWCR.TscFreqSel to guests

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-12 11:01     ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
@ 2021-03-12 11:02       ` Jan Beulich
  2021-03-12 11:03       ` [PATCH v4 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
  2021-03-12 13:59       ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks [and 1 more messages] Ian Jackson
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12 11:02 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Prior to 4.15 Linux, when running in PV mode, did not install a #GP
handler early enough to cover for example the rdmsrl_safe() of
MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
of MSR_K7_HWCR later in the same function). The respective change
(42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
backported to 4.14, but no further - presumably since it wasn't really
easy because of other dependencies.

Therefore, to prevent our change in the handling of guest MSR accesses
to render PV Linux 4.13 and older unusable on at least AMD systems, make
the raising of #GP on this paths conditional upon the guest having
installed a handler, provided of course the MSR can be read in the first
place (we would have raised #GP in that case even before). Producing
zero for reads isn't necessarily correct and may trip code trying to
detect presence of MSRs early, but since such detection logic won't work
without a #GP handler anyway, this ought to be a fair workaround.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
---
v4: Re-base. Slightly adjust comment wording.
v3: Use temporary variable for probing. Document the behavior (in a
    public header, for the lack of a better place).
v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
    messages (in debug builds). Don't alter WRMSR behavior.
---
While I didn't myself observe or find similar WRMSR side issues, I'm
nevertheless not convinced we can get away without also making the WRMSR
path somewhat more permissive again, e.g. tolerating attempts to set
bits which are already set. But of course this would require keeping in
sync for which MSRs we "fake" reads, as then a kernel attempt to set a
bit may also appear as an attempt to clear others (because of the zero
value that we gave it for the read). Roger validly points out that
making behavior dependent upon MSR values has its own downsides, so
simply depending on MSR readability is another option (with, in turn,
its own undesirable effects, e.g. for write-only MSRs).

--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
     struct vcpu *curr = current;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
-    bool vpmu_msr = false;
+    bool vpmu_msr = false, warn = false;
     uint64_t tmp;
     int ret;
 
@@ -883,7 +883,7 @@ static int read_msr(unsigned int reg, ui
         if ( ret == X86EMUL_EXCEPTION )
             x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
 
-        return ret;
+        goto done;
     }
 
     switch ( reg )
@@ -993,7 +993,7 @@ static int read_msr(unsigned int reg, ui
             return X86EMUL_OKAY;
         }
 
-        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+        warn = true;
         break;
 
     normal:
@@ -1002,7 +1002,19 @@ static int read_msr(unsigned int reg, ui
         return X86EMUL_OKAY;
     }
 
-    return X86EMUL_UNHANDLEABLE;
+ done:
+    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
+         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, tmp) )
+    {
+        gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
+        *val = 0;
+        x86_emul_reset_event(ctxt);
+        ret = X86EMUL_OKAY;
+    }
+    else if ( warn )
+        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+
+    return ret;
 }
 
 static int write_msr(unsigned int reg, uint64_t val,
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -143,6 +143,12 @@ typedef unsigned long xen_ulong_t;
  *  Level == 1: Kernel may enter
  *  Level == 2: Kernel may enter
  *  Level == 3: Everyone may enter
+ *
+ * Note: For compatibility with kernels not setting up exception handlers
+ *       early enough, Xen will avoid trying to inject #GP (and hence crash
+ *       the domain) when an RDMSR would require this, but no handler was
+ *       set yet. The precise conditions are implementation specific, and
+ *       new code may not rely on such behavior anyway.
  */
 #define TI_GET_DPL(_ti)      ((_ti)->flags & 3)
 #define TI_GET_IF(_ti)       ((_ti)->flags & 4)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-12 11:01     ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
  2021-03-12 11:02       ` [PATCH v4 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
@ 2021-03-12 11:03       ` Jan Beulich
  2021-03-12 13:59       ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks [and 1 more messages] Ian Jackson
  2 siblings, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12 11:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Linux has been warning ("firmware bug") about this bit being clear for a
long time. While writable in older hardware it has been readonly on more
than just most recent hardware. For simplicitly report it always set (if
anything we may want to log the issue ourselves if it turns out to be
clear on older hardware) on CPU families 10h and up (in family 0fh the
bit is part of a larger field of different purpose).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
v3: Report 0 for Fam0F.
v2: New.
---
There are likely more bits worthwhile to expose, but for about every one
of them there would be the risk of a lengthy discussion, as there are
clear downsides to exposing such information. The more that it would be
tbd whether the hardware values should be surfaced, and if so what
should happen when the guest gets migrated.

The main risk with making the read not fault here is that guests might
imply they can also write this MSR then.

--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -315,6 +315,13 @@ int guest_rdmsr(struct vcpu *v, uint32_t
         *val = msrs->tsc_aux;
         break;
 
+    case MSR_K8_HWCR:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+        *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10
+               ? K8_HWCR_TSC_FREQ_SEL : 0;
+        break;
+
     case MSR_AMD64_DE_CFG:
         if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
             goto gp_fault;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -287,6 +287,8 @@
 
 #define MSR_K7_HWCR			0xc0010015
 #define MSR_K8_HWCR			0xc0010015
+#define K8_HWCR_TSC_FREQ_SEL		(1ULL << 24)
+
 #define MSR_K7_FID_VID_CTL		0xc0010041
 #define MSR_K7_FID_VID_STATUS		0xc0010042
 #define MSR_K8_PSTATE_LIMIT		0xc0010061



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-12  7:54 ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
  2021-03-12  9:08   ` Roger Pau Monné
@ 2021-03-12 11:19   ` Andrew Cooper
  2021-03-12 11:01     ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
  2021-03-12 13:34     ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Cooper @ 2021-03-12 11:19 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné, Ian Jackson

On 12/03/2021 07:54, Jan Beulich wrote:
> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> handler early enough to cover for example the rdmsrl_safe() of
> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> of MSR_K7_HWCR later in the same function). The respective change
> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> backported to 4.14, but no further - presumably since it wasn't really
> easy because of other dependencies.
>
> Therefore, to prevent our change in the handling of guest MSR accesses
> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> the raising of #GP on this paths conditional upon the guest having
> installed a handler, provided of course the MSR can be read in the first
> place (we would have raised #GP in that case even before). Producing
> zero for reads isn't necessarily correct and may trip code trying to
> detect presence of MSRs early, but since such detection logic won't work
> without a #GP handler anyway, this ought to be a fair workaround.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Release-Acked-by: Ian Jackson <iwj@xenproject.org>

I am still of the firm belief that this is the wrong course of action.

It is deliberately papering over error handling bugs which, in the
NetBSD case, literally created memory corruption scenarios.  (Yes - that
was a WRMSR not RDMSR, but the general point still stands, particularly
in light of your expectation to do the same to the WRMSR).

It is one thing to not realise that we have a trainwreck here.  Its
totally another to take wilful action to keep current and all future
guests broken in the same way.

The *only* case where it is acceptable to skip error handling is if the
VM admin has specifically signed their life away to say that they'll
accept the, now discovered, potential-memory-corrupion consequences.

Rogers patch already does this.

~Andrew



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-12 11:19   ` Andrew Cooper
  2021-03-12 11:01     ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
@ 2021-03-12 13:34     ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2021-03-12 13:34 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, Ian Jackson, xen-devel

On 12.03.2021 12:19, Andrew Cooper wrote:
> On 12/03/2021 07:54, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on this paths conditional upon the guest having
>> installed a handler, provided of course the MSR can be read in the first
>> place (we would have raised #GP in that case even before). Producing
>> zero for reads isn't necessarily correct and may trip code trying to
>> detect presence of MSRs early, but since such detection logic won't work
>> without a #GP handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
> 
> I am still of the firm belief that this is the wrong course of action.
> 
> It is deliberately papering over error handling bugs which, in the
> NetBSD case, literally created memory corruption scenarios.  (Yes - that
> was a WRMSR not RDMSR, but the general point still stands, particularly
> in light of your expectation to do the same to the WRMSR).
> 
> It is one thing to not realise that we have a trainwreck here.  Its
> totally another to take wilful action to keep current and all future
> guests broken in the same way.
> 
> The *only* case where it is acceptable to skip error handling is if the
> VM admin has specifically signed their life away to say that they'll
> accept the, now discovered, potential-memory-corrupion consequences.
> 
> Rogers patch already does this.

With _much_ bigger impact - it requires changing the behavior for the
entire lifetime of the domain, rather than just very early boot. And
as you may have seen, despite my fear that it may not be enough, Roger
and I have agreed to leave the WRMSR path alone.

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks [and 1 more messages]
  2021-03-12 11:01     ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
  2021-03-12 11:02       ` [PATCH v4 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
  2021-03-12 11:03       ` [PATCH v4 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
@ 2021-03-12 13:59       ` Ian Jackson
  2 siblings, 0 replies; 11+ messages in thread
From: Ian Jackson @ 2021-03-12 13:59 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: xen-devel, Wei Liu, Roger Pau Monné, committers

Jan Beulich writes ("[PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks"):
> Largely to re-base patch 1.
> 
> 1: PV: conditionally avoid raising #GP for early guest MSR reads
> 2: AMD: expose HWCR.TscFreqSel to guests

Jan, thanks for the v4.  Roger, thanks for your reviews and for your
mail to committers@ (on Wednesday).

Andrew Cooper writes ("Re: [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads"):
> I am still of the firm belief that this is the wrong course of action.

Andrew, thanks for your clearly stated opinion.

It seems to me that, even having taken Andrew's strong objection into
account, Jan and Roger and I all still think this patch is the right
thing to do.

With my release manager hat on I would prefer not to spend any more
time debating this; and, the discussions do not seem to be producing
any new information.

I spoke to Jan on IRC and he confirmed that these patches are finished
on a detailed technical level - that is, there is no reason not to
commit these patches, except for the above objection to the whole
principle.

I therefore intend to commit both these two patches (v4) late this
afternoon, say 5pm UK time.  If anyone thinks that this would be
improper, or has a new reason I shouldn't go ahead, please let me know
ASAP.

Thanks,
Ian.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-03-12 13:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12  7:53 [PATCH v3 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
2021-03-12  7:54 ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
2021-03-12  9:08   ` Roger Pau Monné
2021-03-12  9:32     ` Jan Beulich
2021-03-12 11:19   ` Andrew Cooper
2021-03-12 11:01     ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
2021-03-12 11:02       ` [PATCH v4 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
2021-03-12 11:03       ` [PATCH v4 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
2021-03-12 13:59       ` [PATCH v4 0/2][4.15] x86: guest MSR access handling tweaks [and 1 more messages] Ian Jackson
2021-03-12 13:34     ` [PATCH v3 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
2021-03-12  7:55 ` [PATCH v3 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.