All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/xstate: reset cached register values on resume
@ 2021-08-18 11:30 Marek Marczykowski-Górecki
  2021-08-18 12:05 ` Jan Beulich
  2021-08-18 12:44 ` Andrew Cooper
  0 siblings, 2 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-18 11:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

set_xcr0() and set_msr_xss() use cached value to avoid setting the
register to the same value over and over. But suspend/resume implicitly
reset the registers and since percpu areas are not deallocated on
suspend anymore, the cache gets stale.
Reset the cache on resume, to ensure the next write will really hit the
hardware. Choose value 0, as it will never be a legitimate write to
those registers - and so, will force write (and cache update).

Note the cache is used io get_xcr0() and get_msr_xss() too, but:
- set_xcr0() is called few lines below in xstate_init(), so it will
  update the cache with appropriate value
- get_msr_xss() is not used anywhere - and thus not before any
  set_msr_xss() that will fill the cache

Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 xen/arch/x86/xstate.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 6aaf9a2f1546..28726d8fbf2b 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -642,6 +642,13 @@ void xstate_init(struct cpuinfo_x86 *c)
         return;
     }
 
+    /*
+     * Clear the cached value to make set_xcr0() and set_msr_xss() really
+     * write it.
+     */
+    this_cpu(xcr0) = 0;
+    this_cpu(xss) = 0;
+
     cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
     feature_mask = (((u64)edx << 32) | eax) & XCNTXT_MASK;
     BUG_ON(!valid_xcr0(feature_mask));
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-18 11:30 [PATCH] x86/xstate: reset cached register values on resume Marek Marczykowski-Górecki
@ 2021-08-18 12:05 ` Jan Beulich
  2021-08-18 12:07   ` Marek Marczykowski-Górecki
  2021-08-18 12:44 ` Andrew Cooper
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-08-18 12:05 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, Roger Pau Monné, xen-devel, Wei Liu

On 18.08.2021 13:30, Marek Marczykowski-Górecki wrote:
> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -642,6 +642,13 @@ void xstate_init(struct cpuinfo_x86 *c)
>          return;
>      }
>  
> +    /*
> +     * Clear the cached value to make set_xcr0() and set_msr_xss() really
> +     * write it.
> +     */
> +    this_cpu(xcr0) = 0;

While XCR0 cannot be successfully written with 0, ...

> +    this_cpu(xss) = 0;

... XSS can. XSS can't be written with various other values; I'd
suggest using ~0 here. Then

Reviewed-by: Jan Beulich <jbeulich@suse.com>

and I'd be happy to make the adjustment while committing, so long
as you agree.

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-18 12:05 ` Jan Beulich
@ 2021-08-18 12:07   ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-18 12:07 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, xen-devel, Wei Liu

[-- Attachment #1: Type: text/plain, Size: 895 bytes --]

On Wed, Aug 18, 2021 at 02:05:05PM +0200, Jan Beulich wrote:
> On 18.08.2021 13:30, Marek Marczykowski-Górecki wrote:
> > --- a/xen/arch/x86/xstate.c
> > +++ b/xen/arch/x86/xstate.c
> > @@ -642,6 +642,13 @@ void xstate_init(struct cpuinfo_x86 *c)
> >          return;
> >      }
> >  
> > +    /*
> > +     * Clear the cached value to make set_xcr0() and set_msr_xss() really
> > +     * write it.
> > +     */
> > +    this_cpu(xcr0) = 0;
> 
> While XCR0 cannot be successfully written with 0, ...
> 
> > +    this_cpu(xss) = 0;
> 
> ... XSS can. XSS can't be written with various other values; I'd
> suggest using ~0 here. Then
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> and I'd be happy to make the adjustment while committing, so long
> as you agree.

Yes, makes sense. Thanks!

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-18 11:30 [PATCH] x86/xstate: reset cached register values on resume Marek Marczykowski-Górecki
  2021-08-18 12:05 ` Jan Beulich
@ 2021-08-18 12:44 ` Andrew Cooper
  2021-08-19 11:41   ` Marek Marczykowski-Górecki
  2021-08-24 21:11   ` Andrew Cooper
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Cooper @ 2021-08-18 12:44 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel
  Cc: Jan Beulich, Roger Pau Monné, Wei Liu

On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
> set_xcr0() and set_msr_xss() use cached value to avoid setting the
> register to the same value over and over. But suspend/resume implicitly
> reset the registers and since percpu areas are not deallocated on
> suspend anymore, the cache gets stale.
> Reset the cache on resume, to ensure the next write will really hit the
> hardware. Choose value 0, as it will never be a legitimate write to
> those registers - and so, will force write (and cache update).
>
> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
> - set_xcr0() is called few lines below in xstate_init(), so it will
>   update the cache with appropriate value
> - get_msr_xss() is not used anywhere - and thus not before any
>   set_msr_xss() that will fill the cache
>
> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

I'd prefer to do this differently.  As I said in the thread, there are
other registers such as MSR_TSC_AUX which fall into the same category,
and I'd like to make something which works systematically.

~Andrew


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-18 12:44 ` Andrew Cooper
@ 2021-08-19 11:41   ` Marek Marczykowski-Górecki
  2021-08-24 21:11   ` Andrew Cooper
  1 sibling, 0 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-19 11:41 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Roger Pau Monné, Wei Liu

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On Wed, Aug 18, 2021 at 01:44:31PM +0100, Andrew Cooper wrote:
> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
> > set_xcr0() and set_msr_xss() use cached value to avoid setting the
> > register to the same value over and over. But suspend/resume implicitly
> > reset the registers and since percpu areas are not deallocated on
> > suspend anymore, the cache gets stale.
> > Reset the cache on resume, to ensure the next write will really hit the
> > hardware. Choose value 0, as it will never be a legitimate write to
> > those registers - and so, will force write (and cache update).
> >
> > Note the cache is used io get_xcr0() and get_msr_xss() too, but:
> > - set_xcr0() is called few lines below in xstate_init(), so it will
> >   update the cache with appropriate value
> > - get_msr_xss() is not used anywhere - and thus not before any
> >   set_msr_xss() that will fill the cache
> >
> > Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> 
> I'd prefer to do this differently.  As I said in the thread, there are
> other registers such as MSR_TSC_AUX which fall into the same category,
> and I'd like to make something which works systematically.

I'm not sure if I understand your message: do you want me to do things
differently, or are you working on an alternative fix yourself?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-18 12:44 ` Andrew Cooper
  2021-08-19 11:41   ` Marek Marczykowski-Górecki
@ 2021-08-24 21:11   ` Andrew Cooper
  2021-08-25 15:02     ` Jan Beulich
  2021-10-18  8:21     ` Ping: " Jan Beulich
  1 sibling, 2 replies; 14+ messages in thread
From: Andrew Cooper @ 2021-08-24 21:11 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel
  Cc: Jan Beulich, Roger Pau Monné, Wei Liu

On 18/08/2021 13:44, Andrew Cooper wrote:
> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
>> register to the same value over and over. But suspend/resume implicitly
>> reset the registers and since percpu areas are not deallocated on
>> suspend anymore, the cache gets stale.
>> Reset the cache on resume, to ensure the next write will really hit the
>> hardware. Choose value 0, as it will never be a legitimate write to
>> those registers - and so, will force write (and cache update).
>>
>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
>> - set_xcr0() is called few lines below in xstate_init(), so it will
>>   update the cache with appropriate value
>> - get_msr_xss() is not used anywhere - and thus not before any
>>   set_msr_xss() that will fill the cache
>>
>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> I'd prefer to do this differently.  As I said in the thread, there are
> other registers such as MSR_TSC_AUX which fall into the same category,
> and I'd like to make something which works systematically.

Ok - after some searching, I think we have problems with:

cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
xstate.c:36:static DEFINE_PER_CPU(uint64_t, xcr0);
xstate.c:79:static DEFINE_PER_CPU(uint64_t, xss);

There is also:

traps.c:100:DEFINE_PER_CPU(uint64_t, efer);

which we *almost* handle correctly, but fail to update the cache on the
BSP out of S3.


For the APIC, I think we have issues with:

irq.c:1083:static DEFINE_PER_CPU(struct pending_eoi,
pending_eoi[NR_DYNAMIC_VECTORS]);

because we don't defer S3 until all pending EOIs are complete.


I gave up trying to figure out the MCE or power governor logic so we may
have additional issues there.


Additionally,

flushtlb.c:34:DEFINE_PER_CPU(u32, tlbflush_time);

really does need setting appropriately, although I think the only
fallout is a few unnecessary TLB flushes.

~Andrew



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-24 21:11   ` Andrew Cooper
@ 2021-08-25 15:02     ` Jan Beulich
  2021-08-25 16:49       ` Andrew Cooper
  2021-10-18  8:21     ` Ping: " Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-08-25 15:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Marek Marczykowski-Górecki, xen-devel

On 24.08.2021 23:11, Andrew Cooper wrote:
> On 18/08/2021 13:44, Andrew Cooper wrote:
>> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
>>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
>>> register to the same value over and over. But suspend/resume implicitly
>>> reset the registers and since percpu areas are not deallocated on
>>> suspend anymore, the cache gets stale.
>>> Reset the cache on resume, to ensure the next write will really hit the
>>> hardware. Choose value 0, as it will never be a legitimate write to
>>> those registers - and so, will force write (and cache update).
>>>
>>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
>>> - set_xcr0() is called few lines below in xstate_init(), so it will
>>>   update the cache with appropriate value
>>> - get_msr_xss() is not used anywhere - and thus not before any
>>>   set_msr_xss() that will fill the cache
>>>
>>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> I'd prefer to do this differently.  As I said in the thread, there are
>> other registers such as MSR_TSC_AUX which fall into the same category,
>> and I'd like to make something which works systematically.
> 
> Ok - after some searching, I think we have problems with:
> 
> cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);

Don't we have a problem here even during initial boot? I can't see
the per-CPU variable to get filled by what the registers hold. If
the register started out non-zero (the default on AMD iirc, as it's
not really masks there) but the first value to be written was zero,
we'd skip the write.

> cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);

Almost the same here - we only initialize the variable on the BSP
afaics.

> msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);

And again no boot time setup at all for this one as it looks. Not 
sure how likely it is for firmware or bootloaders to use this MSR
(and then leave it non-zero).

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-25 15:02     ` Jan Beulich
@ 2021-08-25 16:49       ` Andrew Cooper
  2021-08-26  7:40         ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Cooper @ 2021-08-25 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Marek Marczykowski-Górecki, xen-devel

On 25/08/2021 16:02, Jan Beulich wrote:
> On 24.08.2021 23:11, Andrew Cooper wrote:
>> On 18/08/2021 13:44, Andrew Cooper wrote:
>>> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
>>>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
>>>> register to the same value over and over. But suspend/resume implicitly
>>>> reset the registers and since percpu areas are not deallocated on
>>>> suspend anymore, the cache gets stale.
>>>> Reset the cache on resume, to ensure the next write will really hit the
>>>> hardware. Choose value 0, as it will never be a legitimate write to
>>>> those registers - and so, will force write (and cache update).
>>>>
>>>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
>>>> - set_xcr0() is called few lines below in xstate_init(), so it will
>>>>   update the cache with appropriate value
>>>> - get_msr_xss() is not used anywhere - and thus not before any
>>>>   set_msr_xss() that will fill the cache
>>>>
>>>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
>>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>> I'd prefer to do this differently.  As I said in the thread, there are
>>> other registers such as MSR_TSC_AUX which fall into the same category,
>>> and I'd like to make something which works systematically.
>> Ok - after some searching, I think we have problems with:
>>
>> cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
> Don't we have a problem here even during initial boot? I can't see
> the per-CPU variable to get filled by what the registers hold.

No, I don't think so, but it is a roundabout route.

>  If
> the register started out non-zero (the default on AMD iirc, as it's
> not really masks there) but the first value to be written was zero,
> we'd skip the write.

There is cpuidmask_defaults which does get filled from the MSRs on boot.

AMD are real CPUID settings, while Intel is an and-mask.  i.e. they're
both non-zero (unless someone does something silly with the command line
arguments, and I don't expect Xen to be happy booting if the leaves all
read 0).

Each early_init_*() has an explicit ctxt_switch_levelling(NULL) call
which, given non-zero content in cpuidmask_defaults should latch each
one appropriately in the the per-cpu variable.

Thinking about it some more, we load cpuidmask_defaults in IDLE and HVM
context, while PV guests (even PV dom0) will have non-default
cpuidmask's, so with a PV dom0, things ought to correct themselves
fairly promptly after S3, but not as early as we expect.

>> cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
> Almost the same here - we only initialize the variable on the BSP
> afaics.

No - way way way worse, I think.

For all APs, we write 0 or MSR_MISC_FEATURES_CPUID_FAULTING into
MSR_INTEL_MISC_FEATURES_ENABLES, which amongst other things turns off
Fast String Enable.

I think it might be time to expand the "MSR consistency across the
system" logic from test-tsx to rather more MSRs..

>> msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
> And again no boot time setup at all for this one as it looks. Not 
> sure how likely it is for firmware or bootloaders to use this MSR
> (and then leave it non-zero).

Regular firmware I'd expect it to be 0.  Linuxboot, I'd expect whatever
value Linux last left in there for userspace.

~Andrew



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-25 16:49       ` Andrew Cooper
@ 2021-08-26  7:40         ` Jan Beulich
  2021-08-26  8:50           ` Andrew Cooper
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-08-26  7:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Marek Marczykowski-Górecki, xen-devel

On 25.08.2021 18:49, Andrew Cooper wrote:
> On 25/08/2021 16:02, Jan Beulich wrote:
>> On 24.08.2021 23:11, Andrew Cooper wrote:
>>> On 18/08/2021 13:44, Andrew Cooper wrote:
>>>> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
>>>>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
>>>>> register to the same value over and over. But suspend/resume implicitly
>>>>> reset the registers and since percpu areas are not deallocated on
>>>>> suspend anymore, the cache gets stale.
>>>>> Reset the cache on resume, to ensure the next write will really hit the
>>>>> hardware. Choose value 0, as it will never be a legitimate write to
>>>>> those registers - and so, will force write (and cache update).
>>>>>
>>>>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
>>>>> - set_xcr0() is called few lines below in xstate_init(), so it will
>>>>>   update the cache with appropriate value
>>>>> - get_msr_xss() is not used anywhere - and thus not before any
>>>>>   set_msr_xss() that will fill the cache
>>>>>
>>>>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
>>>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>> I'd prefer to do this differently.  As I said in the thread, there are
>>>> other registers such as MSR_TSC_AUX which fall into the same category,
>>>> and I'd like to make something which works systematically.
>>> Ok - after some searching, I think we have problems with:
>>>
>>> cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
>> Don't we have a problem here even during initial boot? I can't see
>> the per-CPU variable to get filled by what the registers hold.
> 
> No, I don't think so, but it is a roundabout route.

So where do you see it getting filled?

>>  If
>> the register started out non-zero (the default on AMD iirc, as it's
>> not really masks there) but the first value to be written was zero,
>> we'd skip the write.
> 
> There is cpuidmask_defaults which does get filled from the MSRs on boot.
> 
> AMD are real CPUID settings, while Intel is an and-mask.  i.e. they're
> both non-zero (unless someone does something silly with the command line
> arguments, and I don't expect Xen to be happy booting if the leaves all
> read 0).

Surely not all of them together, but I don't think it's completely
unreasonable for one (say the XSAVE one, if e.g. XSAVE is to be turned
off altogether for guests) to be all zero.

> Each early_init_*() has an explicit ctxt_switch_levelling(NULL) call
> which, given non-zero content in cpuidmask_defaults should latch each
> one appropriately in the the per-cpu variable.

Well, as you say - provided the individual fields are all non-zero.

>>> cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
>> Almost the same here - we only initialize the variable on the BSP
>> afaics.
> 
> No - way way way worse, I think.
> 
> For all APs, we write 0 or MSR_MISC_FEATURES_CPUID_FAULTING into
> MSR_INTEL_MISC_FEATURES_ENABLES, which amongst other things turns off
> Fast String Enable.

Urgh, indeed. Prior to 6e2fdc0f8902 there was a read-modify-write
operation. With the introduction of the cache variable this went
away, while the cache gets filled for BSP only.

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-26  7:40         ` Jan Beulich
@ 2021-08-26  8:50           ` Andrew Cooper
  0 siblings, 0 replies; 14+ messages in thread
From: Andrew Cooper @ 2021-08-26  8:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Roger Pau Monné,
	Wei Liu, Marek Marczykowski-Górecki, xen-devel

On 26/08/2021 08:40, Jan Beulich wrote:
> On 25.08.2021 18:49, Andrew Cooper wrote:
>> On 25/08/2021 16:02, Jan Beulich wrote:
>>> On 24.08.2021 23:11, Andrew Cooper wrote:
>>>  If
>>> the register started out non-zero (the default on AMD iirc, as it's
>>> not really masks there) but the first value to be written was zero,
>>> we'd skip the write.
>> There is cpuidmask_defaults which does get filled from the MSRs on boot.
>>
>> AMD are real CPUID settings, while Intel is an and-mask.  i.e. they're
>> both non-zero (unless someone does something silly with the command line
>> arguments, and I don't expect Xen to be happy booting if the leaves all
>> read 0).
> Surely not all of them together, but I don't think it's completely
> unreasonable for one (say the XSAVE one, if e.g. XSAVE is to be turned
> off altogether for guests) to be all zero.
>
>> Each early_init_*() has an explicit ctxt_switch_levelling(NULL) call
>> which, given non-zero content in cpuidmask_defaults should latch each
>> one appropriately in the the per-cpu variable.
> Well, as you say - provided the individual fields are all non-zero.

The MSRs only exist on CPUs which have non-zero features in the relevant
leaves.

The XSAVE and Therm registers could plausibly be 0.  Dom0 is first to
boot and won't expect to have XSAVE hidden, but we do zero all of leaf 6
in recalculate_misc()

There are certainly corner cases here to improve, but I think all
registers will latch on the first early_init_*(), except for Therm on
AMD which will latch on the first context switch from a PV guest back to
idle.

>>>> cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
>>> Almost the same here - we only initialize the variable on the BSP
>>> afaics.
>> No - way way way worse, I think.
>>
>> For all APs, we write 0 or MSR_MISC_FEATURES_CPUID_FAULTING into
>> MSR_INTEL_MISC_FEATURES_ENABLES, which amongst other things turns off
>> Fast String Enable.
> Urgh, indeed. Prior to 6e2fdc0f8902 there was a read-modify-write
> operation. With the introduction of the cache variable this went
> away, while the cache gets filled for BSP only.

Yeah - I really screwed up on that one...  It was also part of the PV
Shim work done in a hurry in the lead up to Spectre/Meltdown.

MSR_INTEL_MISC_FEATURES_ENABLES is a lot like
MSR_{TSX_FORCE_ABORT,TSX_CTRL,MCU_OPT_CTRL} and the MTRRs.

Most of the content wants to be identical on all cores, so we do want to
fix up AP values with the BSP value if they differ, but we also want a
software cache covering at least the CPUID_FAULTING bit so we don't have
a unnecessary MSR read on the context switch path.

I'll try to do something better.

~Andrew



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Ping: [PATCH] x86/xstate: reset cached register values on resume
  2021-08-24 21:11   ` Andrew Cooper
  2021-08-25 15:02     ` Jan Beulich
@ 2021-10-18  8:21     ` Jan Beulich
  2021-10-21 13:44       ` Roger Pau Monné
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2021-10-18  8:21 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Roger Pau Monné,
	Wei Liu, Marek Marczykowski-Górecki, xen-devel, Ian Jackson

On 24.08.2021 23:11, Andrew Cooper wrote:
> On 18/08/2021 13:44, Andrew Cooper wrote:
>> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
>>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
>>> register to the same value over and over. But suspend/resume implicitly
>>> reset the registers and since percpu areas are not deallocated on
>>> suspend anymore, the cache gets stale.
>>> Reset the cache on resume, to ensure the next write will really hit the
>>> hardware. Choose value 0, as it will never be a legitimate write to
>>> those registers - and so, will force write (and cache update).
>>>
>>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
>>> - set_xcr0() is called few lines below in xstate_init(), so it will
>>>   update the cache with appropriate value
>>> - get_msr_xss() is not used anywhere - and thus not before any
>>>   set_msr_xss() that will fill the cache
>>>
>>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> I'd prefer to do this differently.  As I said in the thread, there are
>> other registers such as MSR_TSC_AUX which fall into the same category,
>> and I'd like to make something which works systematically.
> 
> Ok - after some searching, I think we have problems with:
> 
> cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
> cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
> msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
> xstate.c:36:static DEFINE_PER_CPU(uint64_t, xcr0);
> xstate.c:79:static DEFINE_PER_CPU(uint64_t, xss);
> 
> There is also:
> 
> traps.c:100:DEFINE_PER_CPU(uint64_t, efer);
> 
> which we *almost* handle correctly, but fail to update the cache on the
> BSP out of S3.
> 
> 
> For the APIC, I think we have issues with:
> 
> irq.c:1083:static DEFINE_PER_CPU(struct pending_eoi,
> pending_eoi[NR_DYNAMIC_VECTORS]);
> 
> because we don't defer S3 until all pending EOIs are complete.

As your planned more extensive rework appears to not have made much
progress yet, may I suggest that we go with Marek's fix for 4.16,
with the one adjustment I suggested alongside giving my R-b?

Jan

> I gave up trying to figure out the MCE or power governor logic so we may
> have additional issues there.
> 
> 
> Additionally,
> 
> flushtlb.c:34:DEFINE_PER_CPU(u32, tlbflush_time);
> 
> really does need setting appropriately, although I think the only
> fallout is a few unnecessary TLB flushes.
> 
> ~Andrew
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Ping: [PATCH] x86/xstate: reset cached register values on resume
  2021-10-18  8:21     ` Ping: " Jan Beulich
@ 2021-10-21 13:44       ` Roger Pau Monné
  2021-11-01 13:40         ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2021-10-21 13:44 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Wei Liu, Marek Marczykowski-Górecki,
	xen-devel, Ian Jackson

On Mon, Oct 18, 2021 at 10:21:28AM +0200, Jan Beulich wrote:
> On 24.08.2021 23:11, Andrew Cooper wrote:
> > On 18/08/2021 13:44, Andrew Cooper wrote:
> >> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
> >>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
> >>> register to the same value over and over. But suspend/resume implicitly
> >>> reset the registers and since percpu areas are not deallocated on
> >>> suspend anymore, the cache gets stale.
> >>> Reset the cache on resume, to ensure the next write will really hit the
> >>> hardware. Choose value 0, as it will never be a legitimate write to
> >>> those registers - and so, will force write (and cache update).
> >>>
> >>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
> >>> - set_xcr0() is called few lines below in xstate_init(), so it will
> >>>   update the cache with appropriate value
> >>> - get_msr_xss() is not used anywhere - and thus not before any
> >>>   set_msr_xss() that will fill the cache
> >>>
> >>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
> >>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> >> I'd prefer to do this differently.  As I said in the thread, there are
> >> other registers such as MSR_TSC_AUX which fall into the same category,
> >> and I'd like to make something which works systematically.
> > 
> > Ok - after some searching, I think we have problems with:
> > 
> > cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
> > cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
> > msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
> > xstate.c:36:static DEFINE_PER_CPU(uint64_t, xcr0);
> > xstate.c:79:static DEFINE_PER_CPU(uint64_t, xss);
> > 
> > There is also:
> > 
> > traps.c:100:DEFINE_PER_CPU(uint64_t, efer);
> > 
> > which we *almost* handle correctly, but fail to update the cache on the
> > BSP out of S3.
> > 
> > 
> > For the APIC, I think we have issues with:
> > 
> > irq.c:1083:static DEFINE_PER_CPU(struct pending_eoi,
> > pending_eoi[NR_DYNAMIC_VECTORS]);
> > 
> > because we don't defer S3 until all pending EOIs are complete.
> 
> As your planned more extensive rework appears to not have made much
> progress yet, may I suggest that we go with Marek's fix for 4.16,
> with the one adjustment I suggested alongside giving my R-b?

I think that's the only viable solution in order to avoid shipping a
broken 4.16 so we should go ahead with it.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Ping: [PATCH] x86/xstate: reset cached register values on resume
  2021-10-21 13:44       ` Roger Pau Monné
@ 2021-11-01 13:40         ` Marek Marczykowski-Górecki
  2021-11-03 11:31           ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-11-01 13:40 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Jan Beulich, Andrew Cooper, Wei Liu, xen-devel, Ian Jackson

[-- Attachment #1: Type: text/plain, Size: 2914 bytes --]

On Thu, Oct 21, 2021 at 03:44:27PM +0200, Roger Pau Monné wrote:
> On Mon, Oct 18, 2021 at 10:21:28AM +0200, Jan Beulich wrote:
> > On 24.08.2021 23:11, Andrew Cooper wrote:
> > > On 18/08/2021 13:44, Andrew Cooper wrote:
> > >> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
> > >>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
> > >>> register to the same value over and over. But suspend/resume implicitly
> > >>> reset the registers and since percpu areas are not deallocated on
> > >>> suspend anymore, the cache gets stale.
> > >>> Reset the cache on resume, to ensure the next write will really hit the
> > >>> hardware. Choose value 0, as it will never be a legitimate write to
> > >>> those registers - and so, will force write (and cache update).
> > >>>
> > >>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
> > >>> - set_xcr0() is called few lines below in xstate_init(), so it will
> > >>>   update the cache with appropriate value
> > >>> - get_msr_xss() is not used anywhere - and thus not before any
> > >>>   set_msr_xss() that will fill the cache
> > >>>
> > >>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
> > >>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > >> I'd prefer to do this differently.  As I said in the thread, there are
> > >> other registers such as MSR_TSC_AUX which fall into the same category,
> > >> and I'd like to make something which works systematically.
> > > 
> > > Ok - after some searching, I think we have problems with:
> > > 
> > > cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
> > > cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
> > > msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
> > > xstate.c:36:static DEFINE_PER_CPU(uint64_t, xcr0);
> > > xstate.c:79:static DEFINE_PER_CPU(uint64_t, xss);
> > > 
> > > There is also:
> > > 
> > > traps.c:100:DEFINE_PER_CPU(uint64_t, efer);
> > > 
> > > which we *almost* handle correctly, but fail to update the cache on the
> > > BSP out of S3.
> > > 
> > > 
> > > For the APIC, I think we have issues with:
> > > 
> > > irq.c:1083:static DEFINE_PER_CPU(struct pending_eoi,
> > > pending_eoi[NR_DYNAMIC_VECTORS]);
> > > 
> > > because we don't defer S3 until all pending EOIs are complete.
> > 
> > As your planned more extensive rework appears to not have made much
> > progress yet, may I suggest that we go with Marek's fix for 4.16,
> > with the one adjustment I suggested alongside giving my R-b?
> 
> I think that's the only viable solution in order to avoid shipping a
> broken 4.16 so we should go ahead with it.

Do you want me to post v2 with `this_cpu(xss) = ~0` change? IIUC that's
the only thing requested in this patch specifically.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Ping: [PATCH] x86/xstate: reset cached register values on resume
  2021-11-01 13:40         ` Marek Marczykowski-Górecki
@ 2021-11-03 11:31           ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2021-11-03 11:31 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Andrew Cooper, Wei Liu, xen-devel, Ian Jackson, Roger Pau Monné

On 01.11.2021 14:40, Marek Marczykowski-Górecki wrote:
> On Thu, Oct 21, 2021 at 03:44:27PM +0200, Roger Pau Monné wrote:
>> On Mon, Oct 18, 2021 at 10:21:28AM +0200, Jan Beulich wrote:
>>> On 24.08.2021 23:11, Andrew Cooper wrote:
>>>> On 18/08/2021 13:44, Andrew Cooper wrote:
>>>>> On 18/08/2021 12:30, Marek Marczykowski-Górecki wrote:
>>>>>> set_xcr0() and set_msr_xss() use cached value to avoid setting the
>>>>>> register to the same value over and over. But suspend/resume implicitly
>>>>>> reset the registers and since percpu areas are not deallocated on
>>>>>> suspend anymore, the cache gets stale.
>>>>>> Reset the cache on resume, to ensure the next write will really hit the
>>>>>> hardware. Choose value 0, as it will never be a legitimate write to
>>>>>> those registers - and so, will force write (and cache update).
>>>>>>
>>>>>> Note the cache is used io get_xcr0() and get_msr_xss() too, but:
>>>>>> - set_xcr0() is called few lines below in xstate_init(), so it will
>>>>>>   update the cache with appropriate value
>>>>>> - get_msr_xss() is not used anywhere - and thus not before any
>>>>>>   set_msr_xss() that will fill the cache
>>>>>>
>>>>>> Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
>>>>>> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>> I'd prefer to do this differently.  As I said in the thread, there are
>>>>> other registers such as MSR_TSC_AUX which fall into the same category,
>>>>> and I'd like to make something which works systematically.
>>>>
>>>> Ok - after some searching, I think we have problems with:
>>>>
>>>> cpu/common.c:47:DEFINE_PER_CPU(struct cpuidmasks, cpuidmasks);
>>>> cpu/common.c:120:static DEFINE_PER_CPU(uint64_t, msr_misc_features);
>>>> msr.c:35:DEFINE_PER_CPU(uint32_t, tsc_aux);
>>>> xstate.c:36:static DEFINE_PER_CPU(uint64_t, xcr0);
>>>> xstate.c:79:static DEFINE_PER_CPU(uint64_t, xss);
>>>>
>>>> There is also:
>>>>
>>>> traps.c:100:DEFINE_PER_CPU(uint64_t, efer);
>>>>
>>>> which we *almost* handle correctly, but fail to update the cache on the
>>>> BSP out of S3.
>>>>
>>>>
>>>> For the APIC, I think we have issues with:
>>>>
>>>> irq.c:1083:static DEFINE_PER_CPU(struct pending_eoi,
>>>> pending_eoi[NR_DYNAMIC_VECTORS]);
>>>>
>>>> because we don't defer S3 until all pending EOIs are complete.
>>>
>>> As your planned more extensive rework appears to not have made much
>>> progress yet, may I suggest that we go with Marek's fix for 4.16,
>>> with the one adjustment I suggested alongside giving my R-b?
>>
>> I think that's the only viable solution in order to avoid shipping a
>> broken 4.16 so we should go ahead with it.
> 
> Do you want me to post v2 with `this_cpu(xss) = ~0` change? IIUC that's
> the only thing requested in this patch specifically.

Since Ian in particular prefers to see the full final version on the list,
and since at this point you will need his release ack for the patch to go
in, I think re-sending with the adjustment and tag(s) added would be the
best course of action.

Jan



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-11-03 11:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-18 11:30 [PATCH] x86/xstate: reset cached register values on resume Marek Marczykowski-Górecki
2021-08-18 12:05 ` Jan Beulich
2021-08-18 12:07   ` Marek Marczykowski-Górecki
2021-08-18 12:44 ` Andrew Cooper
2021-08-19 11:41   ` Marek Marczykowski-Górecki
2021-08-24 21:11   ` Andrew Cooper
2021-08-25 15:02     ` Jan Beulich
2021-08-25 16:49       ` Andrew Cooper
2021-08-26  7:40         ` Jan Beulich
2021-08-26  8:50           ` Andrew Cooper
2021-10-18  8:21     ` Ping: " Jan Beulich
2021-10-21 13:44       ` Roger Pau Monné
2021-11-01 13:40         ` Marek Marczykowski-Górecki
2021-11-03 11:31           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.