All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2][4.15] x86: guest MSR access handling tweaks
@ 2021-03-05  9:43 Jan Beulich
  2021-03-05  9:50 ` [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
  2021-03-05  9:50 ` [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
  0 siblings, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2021-03-05  9:43 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

The first patch was stripped of its WRMSR adjustments, albeit I'm
not convinced we'll get away with this - see there. v2 there also
addresses further comments. The 2nd patch is new here, but the
need for something like this was mentioned in v1 already.

1: PV: conditionally avoid raising #GP for early guest MSR reads
2: AMD: expose HWCR.TscFreqSel to guests

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-05  9:43 [PATCH v2 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
@ 2021-03-05  9:50 ` Jan Beulich
  2021-03-08  8:56   ` Roger Pau Monné
  2021-03-05  9:50 ` [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2021-03-05  9:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Prior to 4.15 Linux, when running in PV mode, did not install a #GP
handler early enough to cover for example the rdmsrl_safe() of
MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
of MSR_K7_HWCR later in the same function). The respective change
(42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
backported to 4.14, but no further - presumably since it wasn't really
easy because of other dependencies.

Therefore, to prevent our change in the handling of guest MSR accesses
to render PV Linux 4.13 and older unusable on at least AMD systems, make
the raising of #GP on this paths conditional upon the guest having
installed a handler, provided of course the MSR can be read in the first
place (we would have raised #GP in that case even before). Producing
zero for reads isn't necessarily correct and may trip code trying to
detect presence of MSRs early, but since such detection logic won't work
without a #GP handler anyway, this ought to be a fair workaround.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
    messages (in debug builds). Don't alter WRMSR behavior.
---
I'm not convinced we can get away without also making the WRMSR path
somewhat more permissive again, e.g. tolerating attempts to set bits
which are already set. But of course this would require keeping in sync
for which MSRs we "fake" reads, as then a kernel attempt to set a bit
may also appear as an attempt to clear others (because of the zero value
that we gave it for the read).

--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
     struct vcpu *curr = current;
     const struct domain *currd = curr->domain;
     const struct cpuid_policy *cp = currd->arch.cpuid;
-    bool vpmu_msr = false;
+    bool vpmu_msr = false, warn = false;
     int ret;
 
     if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
@@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
         if ( ret == X86EMUL_EXCEPTION )
             x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
 
-        return ret;
+        goto done;
     }
 
     switch ( reg )
@@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
         }
         /* fall through */
     default:
-        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+        warn = true;
         break;
 
     normal:
@@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
         return X86EMUL_OKAY;
     }
 
-    return X86EMUL_UNHANDLEABLE;
+ done:
+    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
+         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
+    {
+        gprintk(XENLOG_WARNING, "faking RDMSR 0x%08x\n", reg);
+        *val = 0;
+        x86_emul_reset_event(ctxt);
+        ret = X86EMUL_OKAY;
+    }
+    else if ( warn )
+        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
+
+    return ret;
 }
 
 static int write_msr(unsigned int reg, uint64_t val,



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-05  9:43 [PATCH v2 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
  2021-03-05  9:50 ` [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
@ 2021-03-05  9:50 ` Jan Beulich
  2021-03-08 11:37   ` Roger Pau Monné
  2021-03-08 12:41   ` Andrew Cooper
  1 sibling, 2 replies; 20+ messages in thread
From: Jan Beulich @ 2021-03-05  9:50 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné, Ian Jackson

Linux has been warning ("firmware bug") about this bit being clear for a
long time. While writable in older hardware it has been readonly on more
than just most recent hardware. For simplicitly report it always set (if
anything we may want to log the issue ourselves if it turns out to be
clear on older hardware).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: New.
---
There are likely more bits worthwhile to expose, but for about every one
of them there would be the risk of a lengthy discussion, as there are
clear downsides to exposing such information, the more that it would be
tbd whether the hardware values should be surfaced, and if so what
should happen when the guest gets migrated.

The main risk with making the read not fault here is that guests might
imply they can also write this MSR then.

--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
         *val = msrs->tsc_aux;
         break;
 
+    case MSR_K8_HWCR:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+        *val = K8_HWCR_TSC_FREQ_SEL;
+        break;
+
     case MSR_AMD64_DE_CFG:
         if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
             goto gp_fault;
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -287,6 +287,8 @@
 
 #define MSR_K7_HWCR			0xc0010015
 #define MSR_K8_HWCR			0xc0010015
+#define K8_HWCR_TSC_FREQ_SEL		(1ULL << 24)
+
 #define MSR_K7_FID_VID_CTL		0xc0010041
 #define MSR_K7_FID_VID_STATUS		0xc0010042
 #define MSR_K8_PSTATE_LIMIT		0xc0010061



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-05  9:50 ` [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
@ 2021-03-08  8:56   ` Roger Pau Monné
  2021-03-08  9:33     ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-08  8:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> handler early enough to cover for example the rdmsrl_safe() of
> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> of MSR_K7_HWCR later in the same function). The respective change
> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> backported to 4.14, but no further - presumably since it wasn't really
> easy because of other dependencies.
> 
> Therefore, to prevent our change in the handling of guest MSR accesses
> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> the raising of #GP on this paths conditional upon the guest having
> installed a handler, provided of course the MSR can be read in the first
> place (we would have raised #GP in that case even before). Producing
> zero for reads isn't necessarily correct and may trip code trying to
> detect presence of MSRs early, but since such detection logic won't work
> without a #GP handler anyway, this ought to be a fair workaround.
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>     messages (in debug builds). Don't alter WRMSR behavior.
> ---
> I'm not convinced we can get away without also making the WRMSR path
> somewhat more permissive again, e.g. tolerating attempts to set bits
> which are already set. But of course this would require keeping in sync
> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
> may also appear as an attempt to clear others (because of the zero value
> that we gave it for the read).

The above approach seems dangerous, as it could allow a guest to
figure out the value of the underlying MSR by probing whether values
trigger a #GP?

I think we want to do something similar to what we do on HVM in 4.14
and previous versions: ignore writes as long as the rdmsr to the
target MSR succeeds, regardless of the value.

> 
> --- a/xen/arch/x86/pv/emul-priv-op.c
> +++ b/xen/arch/x86/pv/emul-priv-op.c
> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>      struct vcpu *curr = current;
>      const struct domain *currd = curr->domain;
>      const struct cpuid_policy *cp = currd->arch.cpuid;
> -    bool vpmu_msr = false;
> +    bool vpmu_msr = false, warn = false;
>      int ret;
>  
>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>          if ( ret == X86EMUL_EXCEPTION )
>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>  
> -        return ret;
> +        goto done;
>      }
>  
>      switch ( reg )
> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>          }
>          /* fall through */
>      default:
> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> +        warn = true;
>          break;
>  
>      normal:
> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>          return X86EMUL_OKAY;
>      }
>  
> -    return X86EMUL_UNHANDLEABLE;
> + done:

Won't this handling be better placed in the 'default' switch case
above?

> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )

We didn't used to care about explicitly blocking the reserved MSR
range, do we really wan to do it now?

I'm not sure I see an issue with that, but given that we are trying to
bring back something similar to the previous behavior, I would avoid
adding new conditions.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-08  8:56   ` Roger Pau Monné
@ 2021-03-08  9:33     ` Jan Beulich
  2021-03-08 12:11       ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2021-03-08  9:33 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 08.03.2021 09:56, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
>> handler early enough to cover for example the rdmsrl_safe() of
>> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
>> of MSR_K7_HWCR later in the same function). The respective change
>> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
>> backported to 4.14, but no further - presumably since it wasn't really
>> easy because of other dependencies.
>>
>> Therefore, to prevent our change in the handling of guest MSR accesses
>> to render PV Linux 4.13 and older unusable on at least AMD systems, make
>> the raising of #GP on this paths conditional upon the guest having
>> installed a handler, provided of course the MSR can be read in the first
>> place (we would have raised #GP in that case even before). Producing
>> zero for reads isn't necessarily correct and may trip code trying to
>> detect presence of MSRs early, but since such detection logic won't work
>> without a #GP handler anyway, this ought to be a fair workaround.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
>>     messages (in debug builds). Don't alter WRMSR behavior.
>> ---
>> I'm not convinced we can get away without also making the WRMSR path
>> somewhat more permissive again, e.g. tolerating attempts to set bits
>> which are already set. But of course this would require keeping in sync
>> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
>> may also appear as an attempt to clear others (because of the zero value
>> that we gave it for the read).
> 
> The above approach seems dangerous, as it could allow a guest to
> figure out the value of the underlying MSR by probing whether values
> trigger a #GP?

Perhaps, yes. But what do you do? There's potentially a huge value
range to probe ...

> I think we want to do something similar to what we do on HVM in 4.14
> and previous versions: ignore writes as long as the rdmsr to the
> target MSR succeeds, regardless of the value.

Which, as said elsewhere, has its own downsides - writable MSRs don't
need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
MSRs.

>> --- a/xen/arch/x86/pv/emul-priv-op.c
>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>      struct vcpu *curr = current;
>>      const struct domain *currd = curr->domain;
>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>> -    bool vpmu_msr = false;
>> +    bool vpmu_msr = false, warn = false;
>>      int ret;
>>  
>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>          if ( ret == X86EMUL_EXCEPTION )
>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>  
>> -        return ret;
>> +        goto done;
>>      }
>>  
>>      switch ( reg )
>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>          }
>>          /* fall through */
>>      default:
>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>> +        warn = true;
>>          break;
>>  
>>      normal:
>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>          return X86EMUL_OKAY;
>>      }
>>  
>> -    return X86EMUL_UNHANDLEABLE;
>> + done:
> 
> Won't this handling be better placed in the 'default' switch case
> above?

No - see the "goto done" added near the top of the function.

>> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
>> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
> 
> We didn't used to care about explicitly blocking the reserved MSR
> range, do we really wan to do it now?
> 
> I'm not sure I see an issue with that, but given that we are trying to
> bring back something similar to the previous behavior, I would avoid
> adding new conditions.

What I'm particularly trying to avoid here is to allow
information from an underlying hypervisor to "shine through",
even if it's only information as to whether a certain MSR is
readable. It should be solely our own selection which MSRs in
this range a guest is able to (appear to) read.

Plus of course by avoiding the rdmsr_safe() in this case we
also avoid the unnecessary (debug only) log message associated
with such attempted reads.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-05  9:50 ` [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
@ 2021-03-08 11:37   ` Roger Pau Monné
  2021-03-08 11:47     ` Jan Beulich
  2021-03-08 12:41   ` Andrew Cooper
  1 sibling, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-08 11:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
> Linux has been warning ("firmware bug") about this bit being clear for a
> long time. While writable in older hardware it has been readonly on more
> than just most recent hardware. For simplicitly report it always set (if
> anything we may want to log the issue ourselves if it turns out to be
> clear on older hardware).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

One question below.

> ---
> v2: New.
> ---
> There are likely more bits worthwhile to expose, but for about every one
> of them there would be the risk of a lengthy discussion, as there are
> clear downsides to exposing such information, the more that it would be
> tbd whether the hardware values should be surfaced, and if so what
> should happen when the guest gets migrated.
> 
> The main risk with making the read not fault here is that guests might
> imply they can also write this MSR then.
> 
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>          *val = msrs->tsc_aux;
>          break;
>  
> +    case MSR_K8_HWCR:
> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +            goto gp_fault;
> +        *val = K8_HWCR_TSC_FREQ_SEL;

I've been only able to find information about this MSR up to family
10h, but I think in theory Xen might also run on family 0Fh, do you
know if the MSR is present there, and the bit has the same meaning?

> +        break;
> +
>      case MSR_AMD64_DE_CFG:
>          if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>              goto gp_fault;
> --- a/xen/include/asm-x86/msr-index.h
> +++ b/xen/include/asm-x86/msr-index.h
> @@ -287,6 +287,8 @@
>  
>  #define MSR_K7_HWCR			0xc0010015

We could likely drop the K7 define here, as Xen won't be able to run
on K7 hardware anymore AFAICT.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-08 11:37   ` Roger Pau Monné
@ 2021-03-08 11:47     ` Jan Beulich
  2021-03-08 12:29       ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2021-03-08 11:47 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 08.03.2021 12:37, Roger Pau Monné wrote:
> On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
>> Linux has been warning ("firmware bug") about this bit being clear for a
>> long time. While writable in older hardware it has been readonly on more
>> than just most recent hardware. For simplicitly report it always set (if
>> anything we may want to log the issue ourselves if it turns out to be
>> clear on older hardware).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

> One question below.
> 
>> ---
>> v2: New.
>> ---
>> There are likely more bits worthwhile to expose, but for about every one
>> of them there would be the risk of a lengthy discussion, as there are
>> clear downsides to exposing such information, the more that it would be
>> tbd whether the hardware values should be surfaced, and if so what
>> should happen when the guest gets migrated.
>>
>> The main risk with making the read not fault here is that guests might
>> imply they can also write this MSR then.
>>
>> --- a/xen/arch/x86/msr.c
>> +++ b/xen/arch/x86/msr.c
>> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>>          *val = msrs->tsc_aux;
>>          break;
>>  
>> +    case MSR_K8_HWCR:
>> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>> +            goto gp_fault;
>> +        *val = K8_HWCR_TSC_FREQ_SEL;
> 
> I've been only able to find information about this MSR up to family
> 10h, but I think in theory Xen might also run on family 0Fh, do you
> know if the MSR is present there, and the bit has the same meaning?

From its name (and its K7 alternative name) it's clear the register
had been there at that point. And indeed the bit has a different
meaning there (its the bottom bit of a 6-bit START_FID field if the
BKDG I'm looking at can be trusted. Since I don't think it matters
much whether we expose a value of 0x00 or a value of 0x01 there,
and since we likely don't want to make #GP raising dependent upon
family when we don't _really_ need to, I would want to propose that
the value used is good enough uniformly.

>> --- a/xen/include/asm-x86/msr-index.h
>> +++ b/xen/include/asm-x86/msr-index.h
>> @@ -287,6 +287,8 @@
>>  
>>  #define MSR_K7_HWCR			0xc0010015
> 
> We could likely drop the K7 define here, as Xen won't be able to run
> on K7 hardware anymore AFAICT.

Indeed, but not at this point in time.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-08  9:33     ` Jan Beulich
@ 2021-03-08 12:11       ` Roger Pau Monné
  2021-03-08 13:49         ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-08 12:11 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> On 08.03.2021 09:56, Roger Pau Monné wrote:
> > On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >> Prior to 4.15 Linux, when running in PV mode, did not install a #GP
> >> handler early enough to cover for example the rdmsrl_safe() of
> >> MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
> >> of MSR_K7_HWCR later in the same function). The respective change
> >> (42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
> >> backported to 4.14, but no further - presumably since it wasn't really
> >> easy because of other dependencies.
> >>
> >> Therefore, to prevent our change in the handling of guest MSR accesses
> >> to render PV Linux 4.13 and older unusable on at least AMD systems, make
> >> the raising of #GP on this paths conditional upon the guest having
> >> installed a handler, provided of course the MSR can be read in the first
> >> place (we would have raised #GP in that case even before). Producing
> >> zero for reads isn't necessarily correct and may trip code trying to
> >> detect presence of MSRs early, but since such detection logic won't work
> >> without a #GP handler anyway, this ought to be a fair workaround.
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >> ---
> >> v2: Probe MSR read. Exclude hypervisor range. Avoid issuing two log
> >>     messages (in debug builds). Don't alter WRMSR behavior.
> >> ---
> >> I'm not convinced we can get away without also making the WRMSR path
> >> somewhat more permissive again, e.g. tolerating attempts to set bits
> >> which are already set. But of course this would require keeping in sync
> >> for which MSRs we "fake" reads, as then a kernel attempt to set a bit
> >> may also appear as an attempt to clear others (because of the zero value
> >> that we gave it for the read).
> > 
> > The above approach seems dangerous, as it could allow a guest to
> > figure out the value of the underlying MSR by probing whether values
> > trigger a #GP?
> 
> Perhaps, yes. But what do you do? There's potentially a huge value
> range to probe ...
> 
> > I think we want to do something similar to what we do on HVM in 4.14
> > and previous versions: ignore writes as long as the rdmsr to the
> > target MSR succeeds, regardless of the value.
> 
> Which, as said elsewhere, has its own downsides - writable MSRs don't
> need to also be readable. See e.g. AMD's proposed PARTIAL_{FS,GS}_LOAD
> MSRs.

Yes, but it's IMO the lesser of two evils, I think we should avoid any
kind of behavior that depends on the underlying MSR value, just to be
on the safe side.

> >> --- a/xen/arch/x86/pv/emul-priv-op.c
> >> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>      struct vcpu *curr = current;
> >>      const struct domain *currd = curr->domain;
> >>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >> -    bool vpmu_msr = false;
> >> +    bool vpmu_msr = false, warn = false;
> >>      int ret;
> >>  
> >>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>          if ( ret == X86EMUL_EXCEPTION )
> >>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>  
> >> -        return ret;
> >> +        goto done;
> >>      }
> >>  
> >>      switch ( reg )
> >> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>          }
> >>          /* fall through */
> >>      default:
> >> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >> +        warn = true;
> >>          break;
> >>  
> >>      normal:
> >> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>          return X86EMUL_OKAY;
> >>      }
> >>  
> >> -    return X86EMUL_UNHANDLEABLE;
> >> + done:
> > 
> > Won't this handling be better placed in the 'default' switch case
> > above?
> 
> No - see the "goto done" added near the top of the function.

Yes, I'm not sure of that. If guest_rdmsr returns anything different
than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
and hence we shouldn't check whether the #GP handler is set or not.

This is not the behavior of older Xen versions, so I'm unsure whether
we should introduce a policy that's even less strict than the previous
one in regard to whether a #GP is injected or not.

I know injecting a #GP when the handler is not set is not helpful for
the guest, but we should limit the workaround to kind of restoring the
previous behavior for making buggy guests happy, not expanding it
anymore.

> >> +    if ( ret != X86EMUL_OKAY && !curr->arch.pv.trap_ctxt[X86_EXC_GP].address &&
> >> +         (reg >> 16) != 0x4000 && !rdmsr_safe(reg, *val) )
> > 
> > We didn't used to care about explicitly blocking the reserved MSR
> > range, do we really wan to do it now?
> > 
> > I'm not sure I see an issue with that, but given that we are trying to
> > bring back something similar to the previous behavior, I would avoid
> > adding new conditions.
> 
> What I'm particularly trying to avoid here is to allow
> information from an underlying hypervisor to "shine through",
> even if it's only information as to whether a certain MSR is
> readable. It should be solely our own selection which MSRs in
> this range a guest is able to (appear to) read.
> 
> Plus of course by avoiding the rdmsr_safe() in this case we
> also avoid the unnecessary (debug only) log message associated
> with such attempted reads.

OK, I think that's fine.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-08 11:47     ` Jan Beulich
@ 2021-03-08 12:29       ` Roger Pau Monné
  2021-03-08 13:42         ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-08 12:29 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Mon, Mar 08, 2021 at 12:47:44PM +0100, Jan Beulich wrote:
> On 08.03.2021 12:37, Roger Pau Monné wrote:
> > On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
> >> Linux has been warning ("firmware bug") about this bit being clear for a
> >> long time. While writable in older hardware it has been readonly on more
> >> than just most recent hardware. For simplicitly report it always set (if
> >> anything we may want to log the issue ourselves if it turns out to be
> >> clear on older hardware).
> >>
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks.
> 
> > One question below.
> > 
> >> ---
> >> v2: New.
> >> ---
> >> There are likely more bits worthwhile to expose, but for about every one
> >> of them there would be the risk of a lengthy discussion, as there are
> >> clear downsides to exposing such information, the more that it would be
> >> tbd whether the hardware values should be surfaced, and if so what
> >> should happen when the guest gets migrated.
> >>
> >> The main risk with making the read not fault here is that guests might
> >> imply they can also write this MSR then.
> >>
> >> --- a/xen/arch/x86/msr.c
> >> +++ b/xen/arch/x86/msr.c
> >> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
> >>          *val = msrs->tsc_aux;
> >>          break;
> >>  
> >> +    case MSR_K8_HWCR:
> >> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> >> +            goto gp_fault;
> >> +        *val = K8_HWCR_TSC_FREQ_SEL;
> > 
> > I've been only able to find information about this MSR up to family
> > 10h, but I think in theory Xen might also run on family 0Fh, do you
> > know if the MSR is present there, and the bit has the same meaning?
> 
> From its name (and its K7 alternative name) it's clear the register
> had been there at that point. And indeed the bit has a different
> meaning there (its the bottom bit of a 6-bit START_FID field if the
> BKDG I'm looking at can be trusted.

OK, I cannot seem to find the BKDG for family 0Fh. The oldest BKDG I
can find is for Family 10h [0].

> Since I don't think it matters
> much whether we expose a value of 0x00 or a value of 0x01 there,
> and since we likely don't want to make #GP raising dependent upon
> family when we don't _really_ need to, I would want to propose that
> the value used is good enough uniformly.

I would be fine with setting it to 0 if Fam < 10h if you think that's
acceptable. I think the chances of someone running Xen >= 4.15 on such
old hardware are quite dim.

Thanks, Roger.

[0] https://developer.amd.com/resources/developer-guides-manuals/


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-05  9:50 ` [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
  2021-03-08 11:37   ` Roger Pau Monné
@ 2021-03-08 12:41   ` Andrew Cooper
  2021-03-08 13:23     ` Roger Pau Monné
  2021-03-08 13:24     ` Jan Beulich
  1 sibling, 2 replies; 20+ messages in thread
From: Andrew Cooper @ 2021-03-08 12:41 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné, Ian Jackson

On 05/03/2021 09:50, Jan Beulich wrote:
> Linux has been warning ("firmware bug") about this bit being clear for a
> long time. While writable in older hardware it has been readonly on more
> than just most recent hardware. For simplicitly report it always set (if
> anything we may want to log the issue ourselves if it turns out to be
> clear on older hardware).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

I realise Linux is complaining, but simply setting the bit isn't a fix.

This needs corresponding updates in the ACPI tables, as well as Pstate
MSRs, or Linux will derive a false relationship between the TSC rate and
wallclock.

~Andrew



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-08 12:41   ` Andrew Cooper
@ 2021-03-08 13:23     ` Roger Pau Monné
  2021-03-08 13:24     ` Jan Beulich
  1 sibling, 0 replies; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-08 13:23 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Jan Beulich, xen-devel, Wei Liu, Ian Jackson

On Mon, Mar 08, 2021 at 12:41:26PM +0000, Andrew Cooper wrote:
> On 05/03/2021 09:50, Jan Beulich wrote:
> > Linux has been warning ("firmware bug") about this bit being clear for a
> > long time. While writable in older hardware it has been readonly on more
> > than just most recent hardware. For simplicitly report it always set (if
> > anything we may want to log the issue ourselves if it turns out to be
> > clear on older hardware).
> >
> > Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I realise Linux is complaining, but simply setting the bit isn't a fix.
> 
> This needs corresponding updates in the ACPI tables, as well as Pstate
> MSRs, or Linux will derive a false relationship between the TSC rate and
> wallclock.

Is there any description of those relations?

I don't seem to find any other MSR referencing the TscFreqSel bit in
HWCR on the AMD Open-Source Register Reference, but I might be looking
at the wrong place.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-08 12:41   ` Andrew Cooper
  2021-03-08 13:23     ` Roger Pau Monné
@ 2021-03-08 13:24     ` Jan Beulich
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2021-03-08 13:24 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Wei Liu, Roger Pau Monné, Ian Jackson, xen-devel

On 08.03.2021 13:41, Andrew Cooper wrote:
> On 05/03/2021 09:50, Jan Beulich wrote:
>> Linux has been warning ("firmware bug") about this bit being clear for a
>> long time. While writable in older hardware it has been readonly on more
>> than just most recent hardware. For simplicitly report it always set (if
>> anything we may want to log the issue ourselves if it turns out to be
>> clear on older hardware).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> I realise Linux is complaining, but simply setting the bit isn't a fix.
> 
> This needs corresponding updates in the ACPI tables, as well as Pstate
> MSRs, or Linux will derive a false relationship between the TSC rate and
> wallclock.

I guess I don't follow: AMD's doc is very clear: BIOSes ought to set the
bit. It not being set is more likely a mistake than an indication of
other pieces (MSRs, ACPI tables) reflecting this unintended state. Plus
isn't what you say true also if Linux sees the bit wrongly clear (which
would be the case prior to this patch)? Are you suggesting we should
revert behavior here all the way to letting the hardware bit shine
through again (for Dom0; for DomU neither other MSRs nor ACPI tables
are possibly aware of this bit's state)?

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests
  2021-03-08 12:29       ` Roger Pau Monné
@ 2021-03-08 13:42         ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2021-03-08 13:42 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 08.03.2021 13:29, Roger Pau Monné wrote:
> On Mon, Mar 08, 2021 at 12:47:44PM +0100, Jan Beulich wrote:
>> On 08.03.2021 12:37, Roger Pau Monné wrote:
>>> On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/msr.c
>>>> +++ b/xen/arch/x86/msr.c
>>>> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t
>>>>          *val = msrs->tsc_aux;
>>>>          break;
>>>>  
>>>> +    case MSR_K8_HWCR:
>>>> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
>>>> +            goto gp_fault;
>>>> +        *val = K8_HWCR_TSC_FREQ_SEL;
>>>
>>> I've been only able to find information about this MSR up to family
>>> 10h, but I think in theory Xen might also run on family 0Fh, do you
>>> know if the MSR is present there, and the bit has the same meaning?
>>
>> From its name (and its K7 alternative name) it's clear the register
>> had been there at that point. And indeed the bit has a different
>> meaning there (its the bottom bit of a 6-bit START_FID field if the
>> BKDG I'm looking at can be trusted.
> 
> OK, I cannot seem to find the BKDG for family 0Fh. The oldest BKDG I
> can find is for Family 10h [0].
> 
>> Since I don't think it matters
>> much whether we expose a value of 0x00 or a value of 0x01 there,
>> and since we likely don't want to make #GP raising dependent upon
>> family when we don't _really_ need to, I would want to propose that
>> the value used is good enough uniformly.
> 
> I would be fine with setting it to 0 if Fam < 10h if you think that's
> acceptable. I think the chances of someone running Xen >= 4.15 on such
> old hardware are quite dim.

Would you mind explaining how returning 0 in this case would be
better? No hard-coded value will ever be guaranteed to reflect the
truth. See my reply to Andrew - if anything we'd need to let the
hardware field shine through, and in _that_ case I of course I
agree that we then should treat Fam0F specially.

I will admit though that as per the BKDG I'm looking at only even
values are defined for the field. Reporting 1 here therefore may
do good (keep OSes from trying to use any of this P-state stuff)
or bad (confuse OSes).

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-08 12:11       ` Roger Pau Monné
@ 2021-03-08 13:49         ` Jan Beulich
  2021-03-09 10:17           ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2021-03-08 13:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 08.03.2021 13:11, Roger Pau Monné wrote:
> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
>> On 08.03.2021 09:56, Roger Pau Monné wrote:
>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>>>      struct vcpu *curr = current;
>>>>      const struct domain *currd = curr->domain;
>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>>> -    bool vpmu_msr = false;
>>>> +    bool vpmu_msr = false, warn = false;
>>>>      int ret;
>>>>  
>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>>>          if ( ret == X86EMUL_EXCEPTION )
>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>>>  
>>>> -        return ret;
>>>> +        goto done;
>>>>      }
>>>>  
>>>>      switch ( reg )
>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>>>          }
>>>>          /* fall through */
>>>>      default:
>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>>> +        warn = true;
>>>>          break;
>>>>  
>>>>      normal:
>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>>>          return X86EMUL_OKAY;
>>>>      }
>>>>  
>>>> -    return X86EMUL_UNHANDLEABLE;
>>>> + done:
>>>
>>> Won't this handling be better placed in the 'default' switch case
>>> above?
>>
>> No - see the "goto done" added near the top of the function.
> 
> Yes, I'm not sure of that. If guest_rdmsr returns anything different
> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> and hence we shouldn't check whether the #GP handler is set or not.
> 
> This is not the behavior of older Xen versions, so I'm unsure whether
> we should introduce a policy that's even less strict than the previous
> one in regard to whether a #GP is injected or not.
> 
> I know injecting a #GP when the handler is not set is not helpful for
> the guest, but we should limit the workaround to kind of restoring the
> previous behavior for making buggy guests happy, not expanding it
> anymore.

Yet then we risk breaking guests with any subsequent addition to
guest_rdmsr() which, under whatever extra conditions, wants to
raise #GP.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-08 13:49         ` Jan Beulich
@ 2021-03-09 10:17           ` Roger Pau Monné
  2021-03-09 11:16             ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-09 10:17 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
> On 08.03.2021 13:11, Roger Pau Monné wrote:
> > On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> >> On 08.03.2021 09:56, Roger Pau Monné wrote:
> >>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >>>> --- a/xen/arch/x86/pv/emul-priv-op.c
> >>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>>>      struct vcpu *curr = current;
> >>>>      const struct domain *currd = curr->domain;
> >>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >>>> -    bool vpmu_msr = false;
> >>>> +    bool vpmu_msr = false, warn = false;
> >>>>      int ret;
> >>>>  
> >>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>>>          if ( ret == X86EMUL_EXCEPTION )
> >>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>>>  
> >>>> -        return ret;
> >>>> +        goto done;
> >>>>      }
> >>>>  
> >>>>      switch ( reg )
> >>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>>>          }
> >>>>          /* fall through */
> >>>>      default:
> >>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >>>> +        warn = true;
> >>>>          break;
> >>>>  
> >>>>      normal:
> >>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>>>          return X86EMUL_OKAY;
> >>>>      }
> >>>>  
> >>>> -    return X86EMUL_UNHANDLEABLE;
> >>>> + done:
> >>>
> >>> Won't this handling be better placed in the 'default' switch case
> >>> above?
> >>
> >> No - see the "goto done" added near the top of the function.
> > 
> > Yes, I'm not sure of that. If guest_rdmsr returns anything different
> > than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> > and hence we shouldn't check whether the #GP handler is set or not.
> > 
> > This is not the behavior of older Xen versions, so I'm unsure whether
> > we should introduce a policy that's even less strict than the previous
> > one in regard to whether a #GP is injected or not.
> > 
> > I know injecting a #GP when the handler is not set is not helpful for
> > the guest, but we should limit the workaround to kind of restoring the
> > previous behavior for making buggy guests happy, not expanding it
> > anymore.
> 
> Yet then we risk breaking guests with any subsequent addition to
> guest_rdmsr() which, under whatever extra conditions, wants to
> raise #GP.

But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
preventing taking the default path in the {read/write}_msr PV
handlers.

If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
#GP handler is not set we might as well drop the rdmsr_safe check and
just don't try to inject any #GP at all from MSR accesses unless the
handler is setup?

I feel this is likely going too far. I agree we should attempt to
restore something compatible with the previous behavior for unhandled
MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
seems to go beyond that.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-09 10:17           ` Roger Pau Monné
@ 2021-03-09 11:16             ` Jan Beulich
  2021-03-09 13:37               ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2021-03-09 11:16 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 09.03.2021 11:17, Roger Pau Monné wrote:
> On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
>> On 08.03.2021 13:11, Roger Pau Monné wrote:
>>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
>>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
>>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>      struct vcpu *curr = current;
>>>>>>      const struct domain *currd = curr->domain;
>>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>>>>> -    bool vpmu_msr = false;
>>>>>> +    bool vpmu_msr = false, warn = false;
>>>>>>      int ret;
>>>>>>  
>>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>          if ( ret == X86EMUL_EXCEPTION )
>>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>>>>>  
>>>>>> -        return ret;
>>>>>> +        goto done;
>>>>>>      }
>>>>>>  
>>>>>>      switch ( reg )
>>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>          }
>>>>>>          /* fall through */
>>>>>>      default:
>>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>>>>> +        warn = true;
>>>>>>          break;
>>>>>>  
>>>>>>      normal:
>>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>>>>>          return X86EMUL_OKAY;
>>>>>>      }
>>>>>>  
>>>>>> -    return X86EMUL_UNHANDLEABLE;
>>>>>> + done:
>>>>>
>>>>> Won't this handling be better placed in the 'default' switch case
>>>>> above?
>>>>
>>>> No - see the "goto done" added near the top of the function.
>>>
>>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
>>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
>>> and hence we shouldn't check whether the #GP handler is set or not.
>>>
>>> This is not the behavior of older Xen versions, so I'm unsure whether
>>> we should introduce a policy that's even less strict than the previous
>>> one in regard to whether a #GP is injected or not.
>>>
>>> I know injecting a #GP when the handler is not set is not helpful for
>>> the guest, but we should limit the workaround to kind of restoring the
>>> previous behavior for making buggy guests happy, not expanding it
>>> anymore.
>>
>> Yet then we risk breaking guests with any subsequent addition to
>> guest_rdmsr() which, under whatever extra conditions, wants to
>> raise #GP.
> 
> But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
> preventing taking the default path in the {read/write}_msr PV
> handlers.

Yes, but the impact so far and the impact going forward are different.

> If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
> #GP handler is not set we might as well drop the rdmsr_safe check and
> just don't try to inject any #GP at all from MSR accesses unless the
> handler is setup?

Well, that's what I had initially. You asked me to change to what I
have now.

> I feel this is likely going too far. I agree we should attempt to
> restore something compatible with the previous behavior for unhandled
> MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
> seems to go beyond that.

I understand this is a downside. Yet as said - the downside of _not_
doing so is that every further raising of #GP will risk breaking a
random guest kernel variant.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-09 11:16             ` Jan Beulich
@ 2021-03-09 13:37               ` Roger Pau Monné
  2021-03-09 14:50                 ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-09 13:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Tue, Mar 09, 2021 at 12:16:49PM +0100, Jan Beulich wrote:
> On 09.03.2021 11:17, Roger Pau Monné wrote:
> > On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
> >> On 08.03.2021 13:11, Roger Pau Monné wrote:
> >>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> >>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
> >>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
> >>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>      struct vcpu *curr = current;
> >>>>>>      const struct domain *currd = curr->domain;
> >>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >>>>>> -    bool vpmu_msr = false;
> >>>>>> +    bool vpmu_msr = false, warn = false;
> >>>>>>      int ret;
> >>>>>>  
> >>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>          if ( ret == X86EMUL_EXCEPTION )
> >>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>>>>>  
> >>>>>> -        return ret;
> >>>>>> +        goto done;
> >>>>>>      }
> >>>>>>  
> >>>>>>      switch ( reg )
> >>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>          }
> >>>>>>          /* fall through */
> >>>>>>      default:
> >>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >>>>>> +        warn = true;
> >>>>>>          break;
> >>>>>>  
> >>>>>>      normal:
> >>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>>>>>          return X86EMUL_OKAY;
> >>>>>>      }
> >>>>>>  
> >>>>>> -    return X86EMUL_UNHANDLEABLE;
> >>>>>> + done:
> >>>>>
> >>>>> Won't this handling be better placed in the 'default' switch case
> >>>>> above?
> >>>>
> >>>> No - see the "goto done" added near the top of the function.
> >>>
> >>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
> >>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> >>> and hence we shouldn't check whether the #GP handler is set or not.
> >>>
> >>> This is not the behavior of older Xen versions, so I'm unsure whether
> >>> we should introduce a policy that's even less strict than the previous
> >>> one in regard to whether a #GP is injected or not.
> >>>
> >>> I know injecting a #GP when the handler is not set is not helpful for
> >>> the guest, but we should limit the workaround to kind of restoring the
> >>> previous behavior for making buggy guests happy, not expanding it
> >>> anymore.
> >>
> >> Yet then we risk breaking guests with any subsequent addition to
> >> guest_rdmsr() which, under whatever extra conditions, wants to
> >> raise #GP.
> > 
> > But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
> > preventing taking the default path in the {read/write}_msr PV
> > handlers.
> 
> Yes, but the impact so far and the impact going forward are different.

OK, I assume this is because we plan to handle more MSRs in
guest_{rd/wr}msr?

In which case those newly added handlers are not likely to inject a
#GP?

> > If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
> > #GP handler is not set we might as well drop the rdmsr_safe check and
> > just don't try to inject any #GP at all from MSR accesses unless the
> > handler is setup?
> 
> Well, that's what I had initially. You asked me to change to what I
> have now.
> 
> > I feel this is likely going too far. I agree we should attempt to
> > restore something compatible with the previous behavior for unhandled
> > MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
> > seems to go beyond that.
> 
> I understand this is a downside. Yet as said - the downside of _not_
> doing so is that every further raising of #GP will risk breaking a
> random guest kernel variant.

Right. So given this awkward position Xen is in, we should maybe make
the lack of #GP injection as a result of an MSR access when no handler
is set formally part of the ABI and written down somewhere?

It's not ideal, but at the end of day PV is 'our' own architecture,
and given that this workaround will be enabled by default, and that we
won't be able to turn it off we should have it written down as part of
the ABI.

If you agree with this I'm fine with not injecting a #GP at all unless
the handler is set for PV, like you proposed in your first patch. IMO
it's not ideal, but it's better if it's a consistent behavior and
clearly written down in the public headers (likely next to the
hypercall used to setup the #GP handler).

I know this can be seen as broken behavior from an x86 perspective,
but again PV is already different from x86.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-09 13:37               ` Roger Pau Monné
@ 2021-03-09 14:50                 ` Jan Beulich
  2021-03-09 15:19                   ` Roger Pau Monné
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Beulich @ 2021-03-09 14:50 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 09.03.2021 14:37, Roger Pau Monné wrote:
> On Tue, Mar 09, 2021 at 12:16:49PM +0100, Jan Beulich wrote:
>> On 09.03.2021 11:17, Roger Pau Monné wrote:
>>> On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
>>>> On 08.03.2021 13:11, Roger Pau Monné wrote:
>>>>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
>>>>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
>>>>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
>>>>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
>>>>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
>>>>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>      struct vcpu *curr = current;
>>>>>>>>      const struct domain *currd = curr->domain;
>>>>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
>>>>>>>> -    bool vpmu_msr = false;
>>>>>>>> +    bool vpmu_msr = false, warn = false;
>>>>>>>>      int ret;
>>>>>>>>  
>>>>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
>>>>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>          if ( ret == X86EMUL_EXCEPTION )
>>>>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
>>>>>>>>  
>>>>>>>> -        return ret;
>>>>>>>> +        goto done;
>>>>>>>>      }
>>>>>>>>  
>>>>>>>>      switch ( reg )
>>>>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>          }
>>>>>>>>          /* fall through */
>>>>>>>>      default:
>>>>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
>>>>>>>> +        warn = true;
>>>>>>>>          break;
>>>>>>>>  
>>>>>>>>      normal:
>>>>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
>>>>>>>>          return X86EMUL_OKAY;
>>>>>>>>      }
>>>>>>>>  
>>>>>>>> -    return X86EMUL_UNHANDLEABLE;
>>>>>>>> + done:
>>>>>>>
>>>>>>> Won't this handling be better placed in the 'default' switch case
>>>>>>> above?
>>>>>>
>>>>>> No - see the "goto done" added near the top of the function.
>>>>>
>>>>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
>>>>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
>>>>> and hence we shouldn't check whether the #GP handler is set or not.
>>>>>
>>>>> This is not the behavior of older Xen versions, so I'm unsure whether
>>>>> we should introduce a policy that's even less strict than the previous
>>>>> one in regard to whether a #GP is injected or not.
>>>>>
>>>>> I know injecting a #GP when the handler is not set is not helpful for
>>>>> the guest, but we should limit the workaround to kind of restoring the
>>>>> previous behavior for making buggy guests happy, not expanding it
>>>>> anymore.
>>>>
>>>> Yet then we risk breaking guests with any subsequent addition to
>>>> guest_rdmsr() which, under whatever extra conditions, wants to
>>>> raise #GP.
>>>
>>> But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
>>> preventing taking the default path in the {read/write}_msr PV
>>> handlers.
>>
>> Yes, but the impact so far and the impact going forward are different.
> 
> OK, I assume this is because we plan to handle more MSRs in
> guest_{rd/wr}msr?

Yes.

> In which case those newly added handlers are not likely to inject a
> #GP?

Kind of the opposite - because it's not impossible that some
addition there may want to raise #GP.

>>> If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
>>> #GP handler is not set we might as well drop the rdmsr_safe check and
>>> just don't try to inject any #GP at all from MSR accesses unless the
>>> handler is setup?
>>
>> Well, that's what I had initially. You asked me to change to what I
>> have now.
>>
>>> I feel this is likely going too far. I agree we should attempt to
>>> restore something compatible with the previous behavior for unhandled
>>> MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
>>> seems to go beyond that.
>>
>> I understand this is a downside. Yet as said - the downside of _not_
>> doing so is that every further raising of #GP will risk breaking a
>> random guest kernel variant.
> 
> Right. So given this awkward position Xen is in, we should maybe make
> the lack of #GP injection as a result of an MSR access when no handler
> is set formally part of the ABI and written down somewhere?
> 
> It's not ideal, but at the end of day PV is 'our' own architecture,
> and given that this workaround will be enabled by default, and that we
> won't be able to turn it off we should have it written down as part of
> the ABI.
> 
> If you agree with this I'm fine with not injecting a #GP at all unless
> the handler is set for PV, like you proposed in your first patch. IMO
> it's not ideal, but it's better if it's a consistent behavior and
> clearly written down in the public headers (likely next to the
> hypercall used to setup the #GP handler).
> 
> I know this can be seen as broken behavior from an x86 perspective,
> but again PV is already different from x86.

I'm certainly not opposed to spelling this out somewhere; iirc you
said the other day that you couldn't spot a good place. I can't think
of a good place either. Furthermore before we spell out anything we
(which specifically includes Andrew) need to settle on the precise
behavior we want. I did suggest earlier that I could see us tighten
the condition, and there are many possible variations. For example we
could record whether a #GP handler was ever installed, so we wouldn't
return back to the relaxed behavior in case a guest zapped its handler
again. But for behavior like this the immediate question is going to
be what effect migration (or saving/restoring) of the guest ought to
have.

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-09 14:50                 ` Jan Beulich
@ 2021-03-09 15:19                   ` Roger Pau Monné
  2021-03-09 16:07                     ` Jan Beulich
  0 siblings, 1 reply; 20+ messages in thread
From: Roger Pau Monné @ 2021-03-09 15:19 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On Tue, Mar 09, 2021 at 03:50:59PM +0100, Jan Beulich wrote:
> On 09.03.2021 14:37, Roger Pau Monné wrote:
> > On Tue, Mar 09, 2021 at 12:16:49PM +0100, Jan Beulich wrote:
> >> On 09.03.2021 11:17, Roger Pau Monné wrote:
> >>> On Mon, Mar 08, 2021 at 02:49:19PM +0100, Jan Beulich wrote:
> >>>> On 08.03.2021 13:11, Roger Pau Monné wrote:
> >>>>> On Mon, Mar 08, 2021 at 10:33:12AM +0100, Jan Beulich wrote:
> >>>>>> On 08.03.2021 09:56, Roger Pau Monné wrote:
> >>>>>>> On Fri, Mar 05, 2021 at 10:50:34AM +0100, Jan Beulich wrote:
> >>>>>>>> --- a/xen/arch/x86/pv/emul-priv-op.c
> >>>>>>>> +++ b/xen/arch/x86/pv/emul-priv-op.c
> >>>>>>>> @@ -874,7 +874,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>      struct vcpu *curr = current;
> >>>>>>>>      const struct domain *currd = curr->domain;
> >>>>>>>>      const struct cpuid_policy *cp = currd->arch.cpuid;
> >>>>>>>> -    bool vpmu_msr = false;
> >>>>>>>> +    bool vpmu_msr = false, warn = false;
> >>>>>>>>      int ret;
> >>>>>>>>  
> >>>>>>>>      if ( (ret = guest_rdmsr(curr, reg, val)) != X86EMUL_UNHANDLEABLE )
> >>>>>>>> @@ -882,7 +882,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>          if ( ret == X86EMUL_EXCEPTION )
> >>>>>>>>              x86_emul_hw_exception(TRAP_gp_fault, 0, ctxt);
> >>>>>>>>  
> >>>>>>>> -        return ret;
> >>>>>>>> +        goto done;
> >>>>>>>>      }
> >>>>>>>>  
> >>>>>>>>      switch ( reg )
> >>>>>>>> @@ -986,7 +986,7 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>          }
> >>>>>>>>          /* fall through */
> >>>>>>>>      default:
> >>>>>>>> -        gdprintk(XENLOG_WARNING, "RDMSR 0x%08x unimplemented\n", reg);
> >>>>>>>> +        warn = true;
> >>>>>>>>          break;
> >>>>>>>>  
> >>>>>>>>      normal:
> >>>>>>>> @@ -995,7 +995,19 @@ static int read_msr(unsigned int reg, ui
> >>>>>>>>          return X86EMUL_OKAY;
> >>>>>>>>      }
> >>>>>>>>  
> >>>>>>>> -    return X86EMUL_UNHANDLEABLE;
> >>>>>>>> + done:
> >>>>>>>
> >>>>>>> Won't this handling be better placed in the 'default' switch case
> >>>>>>> above?
> >>>>>>
> >>>>>> No - see the "goto done" added near the top of the function.
> >>>>>
> >>>>> Yes, I'm not sure of that. If guest_rdmsr returns anything different
> >>>>> than X86EMUL_UNHANDLEABLE it means it has handled the MSR in some way,
> >>>>> and hence we shouldn't check whether the #GP handler is set or not.
> >>>>>
> >>>>> This is not the behavior of older Xen versions, so I'm unsure whether
> >>>>> we should introduce a policy that's even less strict than the previous
> >>>>> one in regard to whether a #GP is injected or not.
> >>>>>
> >>>>> I know injecting a #GP when the handler is not set is not helpful for
> >>>>> the guest, but we should limit the workaround to kind of restoring the
> >>>>> previous behavior for making buggy guests happy, not expanding it
> >>>>> anymore.
> >>>>
> >>>> Yet then we risk breaking guests with any subsequent addition to
> >>>> guest_rdmsr() which, under whatever extra conditions, wants to
> >>>> raise #GP.
> >>>
> >>> But it's always been like that AFAICT? Additions to guest_{rd/wr}msr
> >>> preventing taking the default path in the {read/write}_msr PV
> >>> handlers.
> >>
> >> Yes, but the impact so far and the impact going forward are different.
> > 
> > OK, I assume this is because we plan to handle more MSRs in
> > guest_{rd/wr}msr?
> 
> Yes.
> 
> > In which case those newly added handlers are not likely to inject a
> > #GP?
> 
> Kind of the opposite - because it's not impossible that some
> addition there may want to raise #GP.
> 
> >>> If #GP signaled by guest_{rd/wr}msr are no longer injected when the guest
> >>> #GP handler is not set we might as well drop the rdmsr_safe check and
> >>> just don't try to inject any #GP at all from MSR accesses unless the
> >>> handler is setup?
> >>
> >> Well, that's what I had initially. You asked me to change to what I
> >> have now.
> >>
> >>> I feel this is likely going too far. I agree we should attempt to
> >>> restore something compatible with the previous behavior for unhandled
> >>> MSRs, but also not injecting the #GPs signaled by guest_{rd/wr}msr
> >>> seems to go beyond that.
> >>
> >> I understand this is a downside. Yet as said - the downside of _not_
> >> doing so is that every further raising of #GP will risk breaking a
> >> random guest kernel variant.
> > 
> > Right. So given this awkward position Xen is in, we should maybe make
> > the lack of #GP injection as a result of an MSR access when no handler
> > is set formally part of the ABI and written down somewhere?
> > 
> > It's not ideal, but at the end of day PV is 'our' own architecture,
> > and given that this workaround will be enabled by default, and that we
> > won't be able to turn it off we should have it written down as part of
> > the ABI.
> > 
> > If you agree with this I'm fine with not injecting a #GP at all unless
> > the handler is set for PV, like you proposed in your first patch. IMO
> > it's not ideal, but it's better if it's a consistent behavior and
> > clearly written down in the public headers (likely next to the
> > hypercall used to setup the #GP handler).
> > 
> > I know this can be seen as broken behavior from an x86 perspective,
> > but again PV is already different from x86.
> 
> I'm certainly not opposed to spelling this out somewhere; iirc you
> said the other day that you couldn't spot a good place. I can't think
> of a good place either.

After looking some more, I think placing such comment next to
HYPERVISOR_set_trap_table (in arch-x86/xen.h) would be fine.

> Furthermore before we spell out anything we
> (which specifically includes Andrew) need to settle on the precise
> behavior we want. I did suggest earlier that I could see us tighten
> the condition, and there are many possible variations. For example we
> could record whether a #GP handler was ever installed, so we wouldn't
> return back to the relaxed behavior in case a guest zapped its handler
> again. But for behavior like this the immediate question is going to
> be what effect migration (or saving/restoring) of the guest ought to
> have.

Replying to the save/restore part: this is covered by my patch. Any
restore (or incoming live migration) from a source that doesn't have
msr_relaxed support will get that option enabled by default, so that
guests migrated from previous Xen versions don't see a change in MSR
access behavior. That applies to both PV and HVM guests (unless I have
messed things up in my patch).

Thanks, Roger.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads
  2021-03-09 15:19                   ` Roger Pau Monné
@ 2021-03-09 16:07                     ` Jan Beulich
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Beulich @ 2021-03-09 16:07 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Andrew Cooper, Wei Liu, Ian Jackson

On 09.03.2021 16:19, Roger Pau Monné wrote:
> On Tue, Mar 09, 2021 at 03:50:59PM +0100, Jan Beulich wrote:
>> On 09.03.2021 14:37, Roger Pau Monné wrote:
>>> Right. So given this awkward position Xen is in, we should maybe make
>>> the lack of #GP injection as a result of an MSR access when no handler
>>> is set formally part of the ABI and written down somewhere?
>>>
>>> It's not ideal, but at the end of day PV is 'our' own architecture,
>>> and given that this workaround will be enabled by default, and that we
>>> won't be able to turn it off we should have it written down as part of
>>> the ABI.
>>>
>>> If you agree with this I'm fine with not injecting a #GP at all unless
>>> the handler is set for PV, like you proposed in your first patch. IMO
>>> it's not ideal, but it's better if it's a consistent behavior and
>>> clearly written down in the public headers (likely next to the
>>> hypercall used to setup the #GP handler).
>>>
>>> I know this can be seen as broken behavior from an x86 perspective,
>>> but again PV is already different from x86.
>>
>> I'm certainly not opposed to spelling this out somewhere; iirc you
>> said the other day that you couldn't spot a good place. I can't think
>> of a good place either.
> 
> After looking some more, I think placing such comment next to
> HYPERVISOR_set_trap_table (in arch-x86/xen.h) would be fine.
> 
>> Furthermore before we spell out anything we
>> (which specifically includes Andrew) need to settle on the precise
>> behavior we want. I did suggest earlier that I could see us tighten
>> the condition, and there are many possible variations. For example we
>> could record whether a #GP handler was ever installed, so we wouldn't
>> return back to the relaxed behavior in case a guest zapped its handler
>> again. But for behavior like this the immediate question is going to
>> be what effect migration (or saving/restoring) of the guest ought to
>> have.
> 
> Replying to the save/restore part: this is covered by my patch. Any
> restore (or incoming live migration) from a source that doesn't have
> msr_relaxed support will get that option enabled by default, so that
> guests migrated from previous Xen versions don't see a change in MSR
> access behavior. That applies to both PV and HVM guests (unless I have
> messed things up in my patch).

Well, yes, that's for your changes. But here the question is about
mine (and remember we didn't settle on the precise condition(s) yet,
so the migration aspect may not be relevant in the end).

Jan


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-03-09 16:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-05  9:43 [PATCH v2 0/2][4.15] x86: guest MSR access handling tweaks Jan Beulich
2021-03-05  9:50 ` [PATCH v2 1/2][4.15] x86/PV: conditionally avoid raising #GP for early guest MSR reads Jan Beulich
2021-03-08  8:56   ` Roger Pau Monné
2021-03-08  9:33     ` Jan Beulich
2021-03-08 12:11       ` Roger Pau Monné
2021-03-08 13:49         ` Jan Beulich
2021-03-09 10:17           ` Roger Pau Monné
2021-03-09 11:16             ` Jan Beulich
2021-03-09 13:37               ` Roger Pau Monné
2021-03-09 14:50                 ` Jan Beulich
2021-03-09 15:19                   ` Roger Pau Monné
2021-03-09 16:07                     ` Jan Beulich
2021-03-05  9:50 ` [PATCH v2 2/2][4.15] x86/AMD: expose HWCR.TscFreqSel to guests Jan Beulich
2021-03-08 11:37   ` Roger Pau Monné
2021-03-08 11:47     ` Jan Beulich
2021-03-08 12:29       ` Roger Pau Monné
2021-03-08 13:42         ` Jan Beulich
2021-03-08 12:41   ` Andrew Cooper
2021-03-08 13:23     ` Roger Pau Monné
2021-03-08 13:24     ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.