All of lore.kernel.org
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h
@ 2020-03-09  9:08 Jan Beulich
  2020-04-02  8:34 ` Ping: " Jan Beulich
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jan Beulich @ 2020-03-09  9:08 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:

    There have been reports of RDRAND issues after resuming from suspend on
    some AMD family 15h and family 16h systems. This issue stems from a BIOS
    not performing the proper steps during resume to ensure RDRAND continues
    to function properly.

    Update the CPU initialization to clear the RDRAND CPUID bit for any family
    15h and 16h processor that supports RDRAND. If it is known that the family
    15h or family 16h system does not have an RDRAND resume issue or that the
    system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
    can be used to stop the clearing of the RDRAND CPUID bit.

    Note, that clearing the RDRAND CPUID bit does not prevent a processor
    that normally supports the RDRAND instruction from executing it. So any
    code that determined the support based on family and model won't #UD.

Warn if no explicit choice was given on affected hardware.

Check RDRAND functions at boot as well as after S3 resume (the retry
limit chosen is entirely arbitrary).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Still slightly RFC, and still in particular because of the change to
parse_xen_cpuid(): Alternative approach suggestions are welcome. But now
also because with many CPUs there may now be a lot of warnings in case
of issues.
---
v4: Check always, including during boot. Slightly better sanity check,
    inspired by Linux commit 7879fc4bdc7.
v3: Add call to warning_add(). If force-enabled, check RDRAND still
    functioning after S3 resume.
v2: Re-base.

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -488,6 +488,10 @@ The Speculation Control hardware feature
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
 
+`rdrand` can be used to override the default disabling of the feature on certain
+AMD systems.  Its negative form can of course also be used to suppress use and
+exposure of the feature.
+
 ### cpuid_mask_cpu
 > `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
 
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -4,6 +4,7 @@
 #include <xen/param.h>
 #include <xen/smp.h>
 #include <xen/pci.h>
+#include <xen/warning.h>
 #include <asm/io.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
@@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
 		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
 			amd_acpi_c1e_quirk = true;
 		break;
+
+	case 0x15: case 0x16:
+		/*
+		 * There are too many Fam15/Fam16 systems where upon resume
+		 * from S3 firmware fails to re-setup properly functioning
+		 * RDRAND.  Clear the feature unless force-enabled on the
+		 * command line.
+		 */
+		if (c == &boot_cpu_data &&
+		    cpu_has(c, X86_FEATURE_RDRAND) &&
+		    !is_forced_cpu_cap(X86_FEATURE_RDRAND)) {
+			static const char __initconst text[] =
+				"RDRAND may cease to work on this hardware upon resume from S3.\n"
+				"Please choose an explicit cpuid={no-}rdrand setting.\n";
+
+			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
+			warning_add(text);
+		}
+		break;
 	}
 
 	display_cacheinfo(c);
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -11,6 +11,7 @@
 #include <asm/io.h>
 #include <asm/mpspec.h>
 #include <asm/apic.h>
+#include <asm/random.h>
 #include <asm/setup.h>
 #include <mach_apic.h>
 #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
@@ -98,6 +99,11 @@ void __init setup_force_cpu_cap(unsigned
 	__set_bit(cap, boot_cpu_data.x86_capability);
 }
 
+bool is_forced_cpu_cap(unsigned int cap)
+{
+	return test_bit(cap, forced_caps);
+}
+
 static void default_init(struct cpuinfo_x86 * c)
 {
 	/* Not much we can do here... */
@@ -498,6 +504,28 @@ void identify_cpu(struct cpuinfo_x86 *c)
 	printk("\n");
 #endif
 
+	/*
+	 * If RDRAND is available, make an attempt to check that it actually
+	 * (still) works.
+	 */
+	if (cpu_has(c, X86_FEATURE_RDRAND)) {
+		unsigned int prev = 0;
+
+		for (i = 0; i < 5; ++i)
+		{
+			unsigned int cur = arch_get_random();
+
+			if (prev && cur != prev)
+				break;
+			prev = cur;
+			cpu_relax();
+		}
+
+		if (i >= 5)
+			printk(XENLOG_WARNING "CPU%u: RDRAND appears to not work\n",
+			       smp_processor_id());
+	}
+
 	if (system_state == SYS_STATE_resume)
 		return;
 
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -71,6 +71,9 @@ static int __init parse_xen_cpuid(const
             {
                 if ( !val )
                     setup_clear_cpu_cap(mid->bit);
+                else if ( mid->bit == X86_FEATURE_RDRAND &&
+                          (cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_RDRAND)) )
+                    setup_force_cpu_cap(X86_FEATURE_RDRAND);
                 mid = NULL;
             }
 
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -166,6 +166,7 @@ extern const struct x86_cpu_id *x86_matc
 extern void identify_cpu(struct cpuinfo_x86 *);
 extern void setup_clear_cpu_cap(unsigned int);
 extern void setup_force_cpu_cap(unsigned int);
+extern bool is_forced_cpu_cap(unsigned int);
 extern void print_cpu_info(unsigned int cpu);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Ping: [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2020-03-09  9:08 [Xen-devel] [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h Jan Beulich
@ 2020-04-02  8:34 ` Jan Beulich
  2020-04-02 14:25 ` Andrew Cooper
  2020-05-15 15:18 ` Roger Pau Monné
  2 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2020-04-02  8:34 UTC (permalink / raw)
  To: Andrew Cooper, Wei Liu, Roger Pau Monné; +Cc: xen-devel

On 09.03.2020 10:08, Jan Beulich wrote:
> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
> 
>     There have been reports of RDRAND issues after resuming from suspend on
>     some AMD family 15h and family 16h systems. This issue stems from a BIOS
>     not performing the proper steps during resume to ensure RDRAND continues
>     to function properly.
> 
>     Update the CPU initialization to clear the RDRAND CPUID bit for any family
>     15h and 16h processor that supports RDRAND. If it is known that the family
>     15h or family 16h system does not have an RDRAND resume issue or that the
>     system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>     can be used to stop the clearing of the RDRAND CPUID bit.
> 
>     Note, that clearing the RDRAND CPUID bit does not prevent a processor
>     that normally supports the RDRAND instruction from executing it. So any
>     code that determined the support based on family and model won't #UD.
> 
> Warn if no explicit choice was given on affected hardware.
> 
> Check RDRAND functions at boot as well as after S3 resume (the retry
> limit chosen is entirely arbitrary).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Still slightly RFC, and still in particular because of the change to
> parse_xen_cpuid(): Alternative approach suggestions are welcome. But now
> also because with many CPUs there may now be a lot of warnings in case
> of issues.

Ping?

> ---
> v4: Check always, including during boot. Slightly better sanity check,
>     inspired by Linux commit 7879fc4bdc7.
> v3: Add call to warning_add(). If force-enabled, check RDRAND still
>     functioning after S3 resume.
> v2: Re-base.
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -488,6 +488,10 @@ The Speculation Control hardware feature
>  be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
>  won't offer them to guests.
>  
> +`rdrand` can be used to override the default disabling of the feature on certain
> +AMD systems.  Its negative form can of course also be used to suppress use and
> +exposure of the feature.
> +
>  ### cpuid_mask_cpu
>  > `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
>  
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -4,6 +4,7 @@
>  #include <xen/param.h>
>  #include <xen/smp.h>
>  #include <xen/pci.h>
> +#include <xen/warning.h>
>  #include <asm/io.h>
>  #include <asm/msr.h>
>  #include <asm/processor.h>
> @@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
>  		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
>  			amd_acpi_c1e_quirk = true;
>  		break;
> +
> +	case 0x15: case 0x16:
> +		/*
> +		 * There are too many Fam15/Fam16 systems where upon resume
> +		 * from S3 firmware fails to re-setup properly functioning
> +		 * RDRAND.  Clear the feature unless force-enabled on the
> +		 * command line.
> +		 */
> +		if (c == &boot_cpu_data &&
> +		    cpu_has(c, X86_FEATURE_RDRAND) &&
> +		    !is_forced_cpu_cap(X86_FEATURE_RDRAND)) {
> +			static const char __initconst text[] =
> +				"RDRAND may cease to work on this hardware upon resume from S3.\n"
> +				"Please choose an explicit cpuid={no-}rdrand setting.\n";
> +
> +			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
> +			warning_add(text);
> +		}
> +		break;
>  	}
>  
>  	display_cacheinfo(c);
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -11,6 +11,7 @@
>  #include <asm/io.h>
>  #include <asm/mpspec.h>
>  #include <asm/apic.h>
> +#include <asm/random.h>
>  #include <asm/setup.h>
>  #include <mach_apic.h>
>  #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
> @@ -98,6 +99,11 @@ void __init setup_force_cpu_cap(unsigned
>  	__set_bit(cap, boot_cpu_data.x86_capability);
>  }
>  
> +bool is_forced_cpu_cap(unsigned int cap)
> +{
> +	return test_bit(cap, forced_caps);
> +}
> +
>  static void default_init(struct cpuinfo_x86 * c)
>  {
>  	/* Not much we can do here... */
> @@ -498,6 +504,28 @@ void identify_cpu(struct cpuinfo_x86 *c)
>  	printk("\n");
>  #endif
>  
> +	/*
> +	 * If RDRAND is available, make an attempt to check that it actually
> +	 * (still) works.
> +	 */
> +	if (cpu_has(c, X86_FEATURE_RDRAND)) {
> +		unsigned int prev = 0;
> +
> +		for (i = 0; i < 5; ++i)
> +		{
> +			unsigned int cur = arch_get_random();
> +
> +			if (prev && cur != prev)
> +				break;
> +			prev = cur;
> +			cpu_relax();
> +		}
> +
> +		if (i >= 5)
> +			printk(XENLOG_WARNING "CPU%u: RDRAND appears to not work\n",
> +			       smp_processor_id());
> +	}
> +
>  	if (system_state == SYS_STATE_resume)
>  		return;
>  
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -71,6 +71,9 @@ static int __init parse_xen_cpuid(const
>              {
>                  if ( !val )
>                      setup_clear_cpu_cap(mid->bit);
> +                else if ( mid->bit == X86_FEATURE_RDRAND &&
> +                          (cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_RDRAND)) )
> +                    setup_force_cpu_cap(X86_FEATURE_RDRAND);
>                  mid = NULL;
>              }
>  
> --- a/xen/include/asm-x86/processor.h
> +++ b/xen/include/asm-x86/processor.h
> @@ -166,6 +166,7 @@ extern const struct x86_cpu_id *x86_matc
>  extern void identify_cpu(struct cpuinfo_x86 *);
>  extern void setup_clear_cpu_cap(unsigned int);
>  extern void setup_force_cpu_cap(unsigned int);
> +extern bool is_forced_cpu_cap(unsigned int);
>  extern void print_cpu_info(unsigned int cpu);
>  extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
>  


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2020-03-09  9:08 [Xen-devel] [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h Jan Beulich
  2020-04-02  8:34 ` Ping: " Jan Beulich
@ 2020-04-02 14:25 ` Andrew Cooper
  2020-04-29  8:22   ` Jan Beulich
  2020-05-15 15:18 ` Roger Pau Monné
  2 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2020-04-02 14:25 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Paul Durrant, George Dunlap, Wei Liu, Roger Pau Monné

On 09/03/2020 09:08, Jan Beulich wrote:
> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>
>     There have been reports of RDRAND issues after resuming from suspend on
>     some AMD family 15h and family 16h systems. This issue stems from a BIOS
>     not performing the proper steps during resume to ensure RDRAND continues
>     to function properly.
>
>     Update the CPU initialization to clear the RDRAND CPUID bit for any family
>     15h and 16h processor that supports RDRAND. If it is known that the family
>     15h or family 16h system does not have an RDRAND resume issue or that the
>     system will not be placed in suspend, the "cpuid=rdrand" kernel parameter

I'm not sure what is best to do here.  The type suggests that this is a
verbatim copy of the Linux commit message, but this tiny detail is Xen
specific.

>     can be used to stop the clearing of the RDRAND CPUID bit.
>
>     Note, that clearing the RDRAND CPUID bit does not prevent a processor
>     that normally supports the RDRAND instruction from executing it. So any
>     code that determined the support based on family and model won't #UD.
>
> Warn if no explicit choice was given on affected hardware.
>
> Check RDRAND functions at boot as well as after S3 resume (the retry
> limit chosen is entirely arbitrary).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

CC'ing Paul & George for 4.14 things:

tl;dr Some AMD system firmware doesn't reinitialise the RNG
configuration on S3, resulting in the RDRAND instruction returning -1
and "data good".

AMD's recommended mitigation is to disable RDRAND entirely.  We can't
identify the problem before an S3 resume and finding the instruction
broken, and can't repair the damage at that point (The MSR needing
fixing has already been locked), and AMD don't have a list of DMI
quirks/equivalent to use to identify the buggy systems on boot.

Also, there is no public statement from AMD on the matter, but one
obvious fallout on Linux systems is systemd looping forever trying to
create a UUID which doesn't already alias in its database.

The impact of this would be substantially less bad if Xen could identify
(a lack of) platform support for S3 at boot, as AMD servers have never
supported S3.  This would reduce the user-nagging to only client
systems, but there isn't an obvious way to do this.

It's a complete mess, but I don't think we should put this out without
some form of release note.

> ---
> Still slightly RFC, and still in particular because of the change to
> parse_xen_cpuid():

FWIW, that is very similar to XenServer's AVX512 off-by-default bodge
until the default vs max policy work is ready.

I don't have a better suggestion right now, but hopefully something
better might become obvious when we've got more users.  Either way, I'm
expecting it to turn into a "switch ( mid->bit )" expression in due course.

> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -4,6 +4,7 @@
>  #include <xen/param.h>
>  #include <xen/smp.h>
>  #include <xen/pci.h>
> +#include <xen/warning.h>
>  #include <asm/io.h>
>  #include <asm/msr.h>
>  #include <asm/processor.h>
> @@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
>  		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
>  			amd_acpi_c1e_quirk = true;
>  		break;
> +
> +	case 0x15: case 0x16:
> +		/*
> +		 * There are too many Fam15/Fam16 systems where upon resume

"some" systems.

> +		 * from S3 firmware fails to re-setup properly functioning
> +		 * RDRAND.

I think this needs another sentence of explanation.

By the time we can spot the problem, it is too late to take evasive
action, and there is nothing Xen can do to repair the problem.

>   Clear the feature unless force-enabled on the
> +		 * command line.
> +		 */
> +		if (c == &boot_cpu_data &&
> +		    cpu_has(c, X86_FEATURE_RDRAND) &&
> +		    !is_forced_cpu_cap(X86_FEATURE_RDRAND)) {
> +			static const char __initconst text[] =
> +				"RDRAND may cease to work on this hardware upon resume from S3.\n"
> +				"Please choose an explicit cpuid={no-}rdrand setting.\n";
> +
> +			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
> +			warning_add(text);

What do you think to clobbering RDRAND via the CPUMASK registers in this
case?  We've got full control there, and it would stop PV userspace as well.

> +		}
> +		break;
>  	}
>  
>  	display_cacheinfo(c);
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -11,6 +11,7 @@
>  #include <asm/io.h>
>  #include <asm/mpspec.h>
>  #include <asm/apic.h>
> +#include <asm/random.h>
>  #include <asm/setup.h>
>  #include <mach_apic.h>
>  #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
> @@ -98,6 +99,11 @@ void __init setup_force_cpu_cap(unsigned
>  	__set_bit(cap, boot_cpu_data.x86_capability);
>  }
>  
> +bool is_forced_cpu_cap(unsigned int cap)
> +{
> +	return test_bit(cap, forced_caps);
> +}
> +
>  static void default_init(struct cpuinfo_x86 * c)
>  {
>  	/* Not much we can do here... */
> @@ -498,6 +504,28 @@ void identify_cpu(struct cpuinfo_x86 *c)
>  	printk("\n");
>  #endif
>  
> +	/*
> +	 * If RDRAND is available, make an attempt to check that it actually
> +	 * (still) works.
> +	 */

Do you think it would be helpful to test in the opposite case as well. 
If we come back from S3 and find that RDRAND does actually work, then it
is safe to tell the user that they can re-enable.

> +	if (cpu_has(c, X86_FEATURE_RDRAND)) {
> +		unsigned int prev = 0;
> +
> +		for (i = 0; i < 5; ++i)
> +		{
> +			unsigned int cur = arch_get_random();
> +
> +			if (prev && cur != prev)
> +				break;
> +			prev = cur;
> +			cpu_relax();

Why the relax?  We're not polling hammering the memory bus waiting for
an unknown period of time until something changes.

We simply need up to 5 random numbers as fast as the RNG can produce
them (which is actually quite slow.  RDRAND has ~350 cycle latency minimum.)

~Andrew

> +		}
> +
> +		if (i >= 5)
> +			printk(XENLOG_WARNING "CPU%u: RDRAND appears to not work\n",
> +			       smp_processor_id());
> +	}
> +
>  	if (system_state == SYS_STATE_resume)
>  		return;
>  



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2020-04-02 14:25 ` Andrew Cooper
@ 2020-04-29  8:22   ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2020-04-29  8:22 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: xen-devel, Paul Durrant, George Dunlap, Wei Liu, Roger Pau Monné

On 02.04.2020 16:25, Andrew Cooper wrote:
> On 09/03/2020 09:08, Jan Beulich wrote:
>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>
>>     There have been reports of RDRAND issues after resuming from suspend on
>>     some AMD family 15h and family 16h systems. This issue stems from a BIOS
>>     not performing the proper steps during resume to ensure RDRAND continues
>>     to function properly.
>>
>>     Update the CPU initialization to clear the RDRAND CPUID bit for any family
>>     15h and 16h processor that supports RDRAND. If it is known that the family
>>     15h or family 16h system does not have an RDRAND resume issue or that the
>>     system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
> 
> I'm not sure what is best to do here.  The type suggests that this is a
> verbatim copy of the Linux commit message, but this tiny detail is Xen
> specific.

It simply didn't seem to make sense to leave the Linux way of
specifying this in here, just to then say further down what the
correct (Xen) way is.

>> ---
>> Still slightly RFC, and still in particular because of the change to
>> parse_xen_cpuid():
> 
> FWIW, that is very similar to XenServer's AVX512 off-by-default bodge
> until the default vs max policy work is ready.
> 
> I don't have a better suggestion right now, but hopefully something
> better might become obvious when we've got more users.  Either way, I'm
> expecting it to turn into a "switch ( mid->bit )" expression in due course.

IOW do you want me to use switch() right away?

>> @@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
>>  		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
>>  			amd_acpi_c1e_quirk = true;
>>  		break;
>> +
>> +	case 0x15: case 0x16:
>> +		/*
>> +		 * There are too many Fam15/Fam16 systems where upon resume
> 
> "some" systems.
> 
>> +		 * from S3 firmware fails to re-setup properly functioning
>> +		 * RDRAND.
> 
> I think this needs another sentence of explanation.
> 
> By the time we can spot the problem, it is too late to take evasive
> action, and there is nothing Xen can do to repair the problem.

Sure, added.

>>   Clear the feature unless force-enabled on the
>> +		 * command line.
>> +		 */
>> +		if (c == &boot_cpu_data &&
>> +		    cpu_has(c, X86_FEATURE_RDRAND) &&
>> +		    !is_forced_cpu_cap(X86_FEATURE_RDRAND)) {
>> +			static const char __initconst text[] =
>> +				"RDRAND may cease to work on this hardware upon resume from S3.\n"
>> +				"Please choose an explicit cpuid={no-}rdrand setting.\n";
>> +
>> +			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
>> +			warning_add(text);
> 
> What do you think to clobbering RDRAND via the CPUMASK registers in this
> case?  We've got full control there, and it would stop PV userspace as well.

Why would such be needed? The host_policy -> pv_max_cpuid_policy
-> recalculate_cpuid_policy() propagation already causes the flag
to get zapped from guest policies once it got cleared here. And
it's the guest policy that controls what gets put in the masking
MSRs for PV guests, isn't it?

>> @@ -498,6 +504,28 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  	printk("\n");
>>  #endif
>>  
>> +	/*
>> +	 * If RDRAND is available, make an attempt to check that it actually
>> +	 * (still) works.
>> +	 */
> 
> Do you think it would be helpful to test in the opposite case as well. 
> If we come back from S3 and find that RDRAND does actually work, then it
> is safe to tell the user that they can re-enable.

I'd view this as a nice-to-have that isn't all that obvious how to
actually implement in a sufficiently clean way. For example, we
can't use arch_get_random() in that case. Therefore I'd prefer if
this extra courtesy could be left out of here for now, and - if
indeed deemed useful - be added later.

>> +	if (cpu_has(c, X86_FEATURE_RDRAND)) {
>> +		unsigned int prev = 0;
>> +
>> +		for (i = 0; i < 5; ++i)
>> +		{
>> +			unsigned int cur = arch_get_random();
>> +
>> +			if (prev && cur != prev)
>> +				break;
>> +			prev = cur;
>> +			cpu_relax();
> 
> Why the relax?  We're not polling hammering the memory bus waiting for
> an unknown period of time until something changes.
> 
> We simply need up to 5 random numbers as fast as the RNG can produce
> them (which is actually quite slow.  RDRAND has ~350 cycle latency minimum.)

Dropped; I put it there simply to give the hardware some breathing
room between adjacent requests.

Jan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2020-03-09  9:08 [Xen-devel] [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h Jan Beulich
  2020-04-02  8:34 ` Ping: " Jan Beulich
  2020-04-02 14:25 ` Andrew Cooper
@ 2020-05-15 15:18 ` Roger Pau Monné
  2020-05-18 13:13   ` Jan Beulich
  2 siblings, 1 reply; 6+ messages in thread
From: Roger Pau Monné @ 2020-05-15 15:18 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Wei Liu, Andrew Cooper

On Mon, Mar 09, 2020 at 10:08:50AM +0100, Jan Beulich wrote:
> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
> 
>     There have been reports of RDRAND issues after resuming from suspend on
>     some AMD family 15h and family 16h systems. This issue stems from a BIOS
>     not performing the proper steps during resume to ensure RDRAND continues
>     to function properly.
> 
>     Update the CPU initialization to clear the RDRAND CPUID bit for any family
>     15h and 16h processor that supports RDRAND. If it is known that the family
>     15h or family 16h system does not have an RDRAND resume issue or that the
>     system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>     can be used to stop the clearing of the RDRAND CPUID bit.
> 
>     Note, that clearing the RDRAND CPUID bit does not prevent a processor
>     that normally supports the RDRAND instruction from executing it. So any
>     code that determined the support based on family and model won't #UD.
> 
> Warn if no explicit choice was given on affected hardware.
> 
> Check RDRAND functions at boot as well as after S3 resume (the retry
> limit chosen is entirely arbitrary).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> Still slightly RFC, and still in particular because of the change to
> parse_xen_cpuid(): Alternative approach suggestions are welcome. But now
> also because with many CPUs there may now be a lot of warnings in case
> of issues.
> ---
> v4: Check always, including during boot. Slightly better sanity check,
>     inspired by Linux commit 7879fc4bdc7.
> v3: Add call to warning_add(). If force-enabled, check RDRAND still
>     functioning after S3 resume.
> v2: Re-base.
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -488,6 +488,10 @@ The Speculation Control hardware feature
>  be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
>  won't offer them to guests.
>  
> +`rdrand` can be used to override the default disabling of the feature on certain
> +AMD systems.  Its negative form can of course also be used to suppress use and
> +exposure of the feature.
> +
>  ### cpuid_mask_cpu
>  > `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
>  
> --- a/xen/arch/x86/cpu/amd.c
> +++ b/xen/arch/x86/cpu/amd.c
> @@ -4,6 +4,7 @@
>  #include <xen/param.h>
>  #include <xen/smp.h>
>  #include <xen/pci.h>
> +#include <xen/warning.h>
>  #include <asm/io.h>
>  #include <asm/msr.h>
>  #include <asm/processor.h>
> @@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
>  		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
>  			amd_acpi_c1e_quirk = true;
>  		break;
> +
> +	case 0x15: case 0x16:
> +		/*
> +		 * There are too many Fam15/Fam16 systems where upon resume
> +		 * from S3 firmware fails to re-setup properly functioning
> +		 * RDRAND.  Clear the feature unless force-enabled on the
> +		 * command line.
> +		 */
> +		if (c == &boot_cpu_data &&
> +		    cpu_has(c, X86_FEATURE_RDRAND) &&
> +		    !is_forced_cpu_cap(X86_FEATURE_RDRAND)) {

Given this is the only user of is_forced_cpu_cap...

> +			static const char __initconst text[] =
> +				"RDRAND may cease to work on this hardware upon resume from S3.\n"
> +				"Please choose an explicit cpuid={no-}rdrand setting.\n";
> +
> +			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
> +			warning_add(text);
> +		}
> +		break;
>  	}
>  
>  	display_cacheinfo(c);
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -11,6 +11,7 @@
>  #include <asm/io.h>
>  #include <asm/mpspec.h>
>  #include <asm/apic.h>
> +#include <asm/random.h>
>  #include <asm/setup.h>
>  #include <mach_apic.h>
>  #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
> @@ -98,6 +99,11 @@ void __init setup_force_cpu_cap(unsigned
>  	__set_bit(cap, boot_cpu_data.x86_capability);
>  }
>  
> +bool is_forced_cpu_cap(unsigned int cap)

... I think this could be made __init?

Thanks, Roger.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2020-05-15 15:18 ` Roger Pau Monné
@ 2020-05-18 13:13   ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2020-05-18 13:13 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Wei Liu, Andrew Cooper

On 15.05.2020 17:18, Roger Pau Monné wrote:
> On Mon, Mar 09, 2020 at 10:08:50AM +0100, Jan Beulich wrote:
>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>
>>     There have been reports of RDRAND issues after resuming from suspend on
>>     some AMD family 15h and family 16h systems. This issue stems from a BIOS
>>     not performing the proper steps during resume to ensure RDRAND continues
>>     to function properly.
>>
>>     Update the CPU initialization to clear the RDRAND CPUID bit for any family
>>     15h and 16h processor that supports RDRAND. If it is known that the family
>>     15h or family 16h system does not have an RDRAND resume issue or that the
>>     system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>>     can be used to stop the clearing of the RDRAND CPUID bit.
>>
>>     Note, that clearing the RDRAND CPUID bit does not prevent a processor
>>     that normally supports the RDRAND instruction from executing it. So any
>>     code that determined the support based on family and model won't #UD.
>>
>> Warn if no explicit choice was given on affected hardware.
>>
>> Check RDRAND functions at boot as well as after S3 resume (the retry
>> limit chosen is entirely arbitrary).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks much.

>> @@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
>>  		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
>>  			amd_acpi_c1e_quirk = true;
>>  		break;
>> +
>> +	case 0x15: case 0x16:
>> +		/*
>> +		 * There are too many Fam15/Fam16 systems where upon resume
>> +		 * from S3 firmware fails to re-setup properly functioning
>> +		 * RDRAND.  Clear the feature unless force-enabled on the
>> +		 * command line.
>> +		 */
>> +		if (c == &boot_cpu_data &&
>> +		    cpu_has(c, X86_FEATURE_RDRAND) &&
>> +		    !is_forced_cpu_cap(X86_FEATURE_RDRAND)) {
> 
> Given this is the only user of is_forced_cpu_cap...
> 
>> +			static const char __initconst text[] =
>> +				"RDRAND may cease to work on this hardware upon resume from S3.\n"
>> +				"Please choose an explicit cpuid={no-}rdrand setting.\n";
>> +
>> +			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
>> +			warning_add(text);
>> +		}
>> +		break;
>>  	}
>>  
>>  	display_cacheinfo(c);
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -11,6 +11,7 @@
>>  #include <asm/io.h>
>>  #include <asm/mpspec.h>
>>  #include <asm/apic.h>
>> +#include <asm/random.h>
>>  #include <asm/setup.h>
>>  #include <mach_apic.h>
>>  #include <public/sysctl.h> /* for XEN_INVALID_{SOCKET,CORE}_ID */
>> @@ -98,6 +99,11 @@ void __init setup_force_cpu_cap(unsigned
>>  	__set_bit(cap, boot_cpu_data.x86_capability);
>>  }
>>  
>> +bool is_forced_cpu_cap(unsigned int cap)
> 
> ... I think this could be made __init?

Ah, now it can be again, yes. It was an endless back and forth between
the various versions (some not even posted).

Jan


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-18 13:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-09  9:08 [Xen-devel] [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h Jan Beulich
2020-04-02  8:34 ` Ping: " Jan Beulich
2020-04-02 14:25 ` Andrew Cooper
2020-04-29  8:22   ` Jan Beulich
2020-05-15 15:18 ` Roger Pau Monné
2020-05-18 13:13   ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.