All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
@ 2018-12-20 13:40 Vitaly Kuznetsov
  2018-12-20 16:17 ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: Vitaly Kuznetsov @ 2018-12-20 13:40 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, Fenghua Yu, Reinette Chatre, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Jan H . Schönherr, David Duncan

It was found that AWS x1 instances (Xen-based) lack xen.git commit
1f1d183d4900 (x86/HVM: don't give the wrong impression of WRMSR succeeding)
and because of that the wrmsr_safe() check in cache_alloc_hsw_probe()
doesn't help: the consequent rdmsr() blows up with

 unchecked MSR access error: RDMSR from 0xc90 at rIP:
   0xffffffff88c5bba3 (native_read_msr+0x3/0x30)

The issue should definitely get fixed on AWS side. We can, however, simply
workaround this in Linux and live happily after.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kernel/cpu/intel_rdt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 44272b7107ad..0acee6cd07a8 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -213,7 +213,8 @@ static inline void cache_alloc_hsw_probe(void)
 
 	if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
 		return;
-	rdmsr(IA32_L3_CBM_BASE, l, h);
+	if (rdmsr_safe(IA32_L3_CBM_BASE, &l, &h))
+		return;
 
 	/* If all the bits were set in MSR, return success */
 	if (l != max_cbm)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2018-12-20 13:40 [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue Vitaly Kuznetsov
@ 2018-12-20 16:17 ` Borislav Petkov
  2018-12-20 17:31   ` Vitaly Kuznetsov
       [not found]   ` <51dcb13a-4751-47f5-1e01-f6731a2c6f3c@intel.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Borislav Petkov @ 2018-12-20 16:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Reinette Chatre, Babu Moger
  Cc: x86, linux-kernel, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jan H . Schönherr, David Duncan

On Thu, Dec 20, 2018 at 02:40:46PM +0100, Vitaly Kuznetsov wrote:
> It was found that AWS x1 instances (Xen-based) lack xen.git commit
> 1f1d183d4900 (x86/HVM: don't give the wrong impression of WRMSR succeeding)
> and because of that the wrmsr_safe() check in cache_alloc_hsw_probe()
> doesn't help: the consequent rdmsr() blows up with
> 
>  unchecked MSR access error: RDMSR from 0xc90 at rIP:
>    0xffffffff88c5bba3 (native_read_msr+0x3/0x30)
> 
> The issue should definitely get fixed on AWS side. We can, however, simply
> workaround this in Linux and live happily after.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kernel/cpu/intel_rdt.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 44272b7107ad..0acee6cd07a8 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -213,7 +213,8 @@ static inline void cache_alloc_hsw_probe(void)
>  
>  	if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
>  		return;
> -	rdmsr(IA32_L3_CBM_BASE, l, h);
> +	if (rdmsr_safe(IA32_L3_CBM_BASE, &l, &h))
> +		return;
>  
>  	/* If all the bits were set in MSR, return success */
>  	if (l != max_cbm)
> -- 

Does the below hunk work too?

Reinette, Babu: is there any reason for rdt_init_res_defs() and
check_quirks() to run before get_rdt_resources() which looks at CPUID
bits?

If not, we need to do something like below, for guests.

Thx.

---
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c3a9dc63edf2..64cb0fb31862 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -969,17 +969,13 @@ static int __init resctrl_late_init(void)
 	struct rdt_resource *r;
 	int state, ret;
 
-	/*
-	 * Initialize functions(or definitions) that are different
-	 * between vendors here.
-	 */
+	if (!get_rdt_resources())
+		return -ENODEV;
+
 	rdt_init_res_defs();
 
 	check_quirks();
 
-	if (!get_rdt_resources())
-		return -ENODEV;
-
 	rdt_init_padding();
 
 	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,



-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2018-12-20 16:17 ` Borislav Petkov
@ 2018-12-20 17:31   ` Vitaly Kuznetsov
       [not found]   ` <51dcb13a-4751-47f5-1e01-f6731a2c6f3c@intel.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Vitaly Kuznetsov @ 2018-12-20 17:31 UTC (permalink / raw)
  To: Borislav Petkov, Reinette Chatre, Babu Moger
  Cc: x86, linux-kernel, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Jan H . Schönherr, David Duncan

Borislav Petkov <bp@alien8.de> writes:

> On Thu, Dec 20, 2018 at 02:40:46PM +0100, Vitaly Kuznetsov wrote:
>> It was found that AWS x1 instances (Xen-based) lack xen.git commit
>> 1f1d183d4900 (x86/HVM: don't give the wrong impression of WRMSR succeeding)
>> and because of that the wrmsr_safe() check in cache_alloc_hsw_probe()
>> doesn't help: the consequent rdmsr() blows up with
>> 
>>  unchecked MSR access error: RDMSR from 0xc90 at rIP:
>>    0xffffffff88c5bba3 (native_read_msr+0x3/0x30)
>> 
>> The issue should definitely get fixed on AWS side. We can, however, simply
>> workaround this in Linux and live happily after.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  arch/x86/kernel/cpu/intel_rdt.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
>> index 44272b7107ad..0acee6cd07a8 100644
>> --- a/arch/x86/kernel/cpu/intel_rdt.c
>> +++ b/arch/x86/kernel/cpu/intel_rdt.c
>> @@ -213,7 +213,8 @@ static inline void cache_alloc_hsw_probe(void)
>>  
>>  	if (wrmsr_safe(IA32_L3_CBM_BASE, max_cbm, 0))
>>  		return;
>> -	rdmsr(IA32_L3_CBM_BASE, l, h);
>> +	if (rdmsr_safe(IA32_L3_CBM_BASE, &l, &h))
>> +		return;
>>  
>>  	/* If all the bits were set in MSR, return success */
>>  	if (l != max_cbm)
>> -- 
>
> Does the below hunk work too?
>

Yes, it does, thanks!

-- 
Vitaly

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
       [not found]     ` <20521afe-09af-7acf-6f32-3f6e9a971091@intel.com>
@ 2019-01-09 11:39       ` Borislav Petkov
  2019-01-09 12:09         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2019-01-09 11:39 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Vitaly Kuznetsov, Babu Moger, x86, linux-kernel, Fenghua Yu,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jan H. Schönherr, David Duncan

On Tue, Jan 08, 2019 at 09:39:46PM -0800, Reinette Chatre wrote:
> To clarify please the last sentence applies to real Haswell server CPUs
> (not virtualized as prompted this work, my apologies for the confusion)
> that support Intel RDT but does not have CPUID enumeration for this
> feature. With Vitaly's patch this hardware would still be detected as
> supporting CAT but if CPUID enumeration is moved earlier then from what
> I can tell this hardware would be considered as not supporting the
> feature anymore.

Ok, so hw "forgot" to add CPUID again. And CPU guys should know better
but everytime we hear "important reasons" why they dropped the ball
there.

So, assuming RDT is not going to be supported in a guest, we need a
proper fix to disable it when in a guest. So the RDT init path needs
something like this then:

---
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index c3a9dc63edf2..1e5a1cb49e9c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -969,6 +969,9 @@ static int __init resctrl_late_init(void)
 	struct rdt_resource *r;
 	int state, ret;
 
+	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+		return -ENODEV;
+
 	/*
 	 * Initialize functions(or definitions) that are different
 	 * between vendors here.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2019-01-09 11:39       ` Borislav Petkov
@ 2019-01-09 12:09         ` Vitaly Kuznetsov
  2019-01-09 12:14           ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: Vitaly Kuznetsov @ 2019-01-09 12:09 UTC (permalink / raw)
  To: Borislav Petkov, Reinette Chatre
  Cc: Babu Moger, x86, linux-kernel, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jan H. Schönherr, David Duncan

Borislav Petkov <bp@alien8.de> writes:

> So, assuming RDT is not going to be supported in a guest

Hm, why is that? In theory, hypervisors can pass through or emulate the
required MSRs...

-- 
Vitaly

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2019-01-09 12:09         ` Vitaly Kuznetsov
@ 2019-01-09 12:14           ` Borislav Petkov
  2019-01-09 18:41             ` Tony Luck
  0 siblings, 1 reply; 9+ messages in thread
From: Borislav Petkov @ 2019-01-09 12:14 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Reinette Chatre, Babu Moger, x86, linux-kernel, Fenghua Yu,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jan H. Schönherr, David Duncan

On Wed, Jan 09, 2019 at 01:09:31PM +0100, Vitaly Kuznetsov wrote:
> Hm, why is that? In theory, hypervisors can pass through or emulate the
> required MSRs...

...and when the theory becomes reality we'll remove the check.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2019-01-09 12:14           ` Borislav Petkov
@ 2019-01-09 18:41             ` Tony Luck
  2019-01-10 10:32               ` Vitaly Kuznetsov
  0 siblings, 1 reply; 9+ messages in thread
From: Tony Luck @ 2019-01-09 18:41 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Vitaly Kuznetsov, Reinette Chatre, Babu Moger, X86-ML,
	Linux Kernel Mailing List, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jan H. Schönherr, David Duncan

On Wed, Jan 9, 2019 at 5:00 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jan 09, 2019 at 01:09:31PM +0100, Vitaly Kuznetsov wrote:
> > Hm, why is that? In theory, hypervisors can pass through or emulate the
> > required MSRs...
>
> ...and when the theory becomes reality we'll remove the check.

In practice that may be a long time coming. We don't have many CLOSIDs, or
bits in a cache mask, at the h/w level. If you start trying to
subdivide those resources
to pass a subset to a guest, then you'll quickly find that you have no
flexibility in the
guest to do anything useful.  It would only work if you limited to
two, or perhaps three
guests.

-Tony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2019-01-09 18:41             ` Tony Luck
@ 2019-01-10 10:32               ` Vitaly Kuznetsov
  2019-01-10 10:53                 ` Borislav Petkov
  0 siblings, 1 reply; 9+ messages in thread
From: Vitaly Kuznetsov @ 2019-01-10 10:32 UTC (permalink / raw)
  To: Tony Luck, Borislav Petkov
  Cc: Reinette Chatre, Babu Moger, X86-ML, Linux Kernel Mailing List,
	Fenghua Yu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Jan H. Schönherr, David Duncan

Tony Luck <tony.luck@gmail.com> writes:

> On Wed, Jan 9, 2019 at 5:00 AM Borislav Petkov <bp@alien8.de> wrote:
>>
>> On Wed, Jan 09, 2019 at 01:09:31PM +0100, Vitaly Kuznetsov wrote:
>> > Hm, why is that? In theory, hypervisors can pass through or emulate the
>> > required MSRs...
>>
>> ...and when the theory becomes reality we'll remove the check.
>
> In practice that may be a long time coming. We don't have many CLOSIDs, or
> bits in a cache mask, at the h/w level. If you start trying to
> subdivide those resources to pass a subset to a guest, then you'll
> quickly find that you have no flexibility in the guest to do anything
> useful.  It would only work if you limited to two, or perhaps three
> guests.

Running a single guest on a physical CPU is a very common scenario. In
fact, sharing cores is very rare for public clouds: e.g. all worthy
instance types on AWS/Azure give you dedicated cores and I don't see why
hypervisor can't pass through resctl features.

The other thing is: how can we be sure that there's no hypervisor
exposing these feature already? Even if open-source hypervisors like
KVM/Xen don't do it it doesn't prove anything: there are numerous
proprietary hypervisors and who knows what they do.

The original issue which triggered the discussion was discovered on AWS
Xen where the host is buggy and I suggested a simple short-term
workaround, I'm no expert in rdt/qos so I'm leaving this up to the
maintainers to decide which fix deserves to go in (if any).

-- 
Vitaly

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue
  2019-01-10 10:32               ` Vitaly Kuznetsov
@ 2019-01-10 10:53                 ` Borislav Petkov
  0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2019-01-10 10:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Tony Luck, Reinette Chatre, Babu Moger, X86-ML,
	Linux Kernel Mailing List, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Jan H. Schönherr, David Duncan

On Thu, Jan 10, 2019 at 11:32:47AM +0100, Vitaly Kuznetsov wrote:
> The other thing is: how can we be sure that there's no hypervisor
> exposing these feature already? Even if open-source hypervisors like
> KVM/Xen don't do it it doesn't prove anything: there are numerous
> proprietary hypervisors and who knows what they do.

Well, we have this thing called CPUID which enumerates features -
regardless of running on baremetal or on a hypervisor. But apparently
some Intel CPUs dropped the ball there. Which caused the f*ckup we are
in right now.

If you really want to not foreclose that feature for guests - and it
sounds like you do - the other thing I can think of is an early quirk
*setting* those feature bits for those Intel CPUs which "forgot" to set
them and then changing the resctrl code to test CPUID bits first.

This way, the kernel is presented with a unified view on what is
supported by the underlying machine - and that machine can be a HV or
baremetal - kernel doesn't care.

> The original issue which triggered the discussion was discovered on
> AWS Xen where the host is buggy and I suggested a simple short-term
> workaround

And I'm sick'n'tired of simple-short term workarounds and band-aids and
the kernel always bending because people dropped the ball and those
being perpetuated, because, sure, we'll add one more "fix" on the pile,
who cares... :-\

If AWS xen is broken, then that should be fixed - not the kernel.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-01-10 10:53 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-20 13:40 [PATCH] x86/intel_rdt: use rdmsr_safe() to workaround AWS host issue Vitaly Kuznetsov
2018-12-20 16:17 ` Borislav Petkov
2018-12-20 17:31   ` Vitaly Kuznetsov
     [not found]   ` <51dcb13a-4751-47f5-1e01-f6731a2c6f3c@intel.com>
     [not found]     ` <20521afe-09af-7acf-6f32-3f6e9a971091@intel.com>
2019-01-09 11:39       ` Borislav Petkov
2019-01-09 12:09         ` Vitaly Kuznetsov
2019-01-09 12:14           ` Borislav Petkov
2019-01-09 18:41             ` Tony Luck
2019-01-10 10:32               ` Vitaly Kuznetsov
2019-01-10 10:53                 ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.