xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h
@ 2019-08-29  9:28 Jan Beulich
  2019-08-29 11:42 ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2019-08-29  9:28 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Wei Liu, Roger Pau Monné

Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:

    There have been reports of RDRAND issues after resuming from suspend on
    some AMD family 15h and family 16h systems. This issue stems from a BIOS
    not performing the proper steps during resume to ensure RDRAND continues
    to function properly.

    Update the CPU initialization to clear the RDRAND CPUID bit for any family
    15h and 16h processor that supports RDRAND. If it is known that the family
    15h or family 16h system does not have an RDRAND resume issue or that the
    system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
    can be used to stop the clearing of the RDRAND CPUID bit.

    Note, that clearing the RDRAND CPUID bit does not prevent a processor
    that normally supports the RDRAND instruction from executing it. So any
    code that determined the support based on family and model won't #UD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Slightly RFC, in particular because of the change to parse_xen_cpuid():
Alternative approach suggestions are welcome.

--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -488,6 +488,10 @@ The Speculation Control hardware feature
 be ignored, e.g. `no-ibrsb`, at which point Xen won't use them itself, and
 won't offer them to guests.
 
+`rdrand` can be used to override the default disabling of the feature on certain
+AMD systems.  Its negative form can of course also be used to suppress use and
+exposure of the feature.
+
 ### cpuid_mask_cpu
 > `= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b`
 
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -641,6 +641,18 @@ static void init_amd(struct cpuinfo_x86
 		if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
 			amd_acpi_c1e_quirk = true;
 		break;
+
+	case 0x15: case 0x16:
+		/*
+		 * There are too many Fam15/Fam16 systems where upon resume
+		 * from S3 firmware fails to re-setup properly functioning
+		 * RDRAND.  Clear the feature unless force-enabled on the
+		 * command line.
+		 */
+		if (c == &boot_cpu_data &&
+		    !is_forced_cpu_cap(X86_FEATURE_RDRAND))
+			setup_clear_cpu_cap(X86_FEATURE_RDRAND);
+		break;
 	}
 
 	display_cacheinfo(c);
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -94,6 +94,11 @@ void __init setup_force_cpu_cap(unsigned
 	__set_bit(cap, boot_cpu_data.x86_capability);
 }
 
+bool __init is_forced_cpu_cap(unsigned int cap)
+{
+	return test_bit(cap, forced_caps);
+}
+
 static void default_init(struct cpuinfo_x86 * c)
 {
 	/* Not much we can do here... */
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -74,6 +74,13 @@ static int __init parse_xen_cpuid(const
                 setup_clear_cpu_cap(X86_FEATURE_AMD_SSBD);
             }
         }
+        else if ( (val = parse_boolean("rdrand", s, ss)) >= 0 )
+        {
+            if ( !val )
+                setup_clear_cpu_cap(X86_FEATURE_RDRAND);
+            else if ( cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_RDRAND) )
+                setup_force_cpu_cap(X86_FEATURE_RDRAND);
+        }
         else
             rc = -EINVAL;
 
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -166,6 +166,7 @@ extern const struct x86_cpu_id *x86_matc
 extern void identify_cpu(struct cpuinfo_x86 *);
 extern void setup_clear_cpu_cap(unsigned int);
 extern void setup_force_cpu_cap(unsigned int);
+extern bool is_forced_cpu_cap(unsigned int);
 extern void print_cpu_info(unsigned int cpu);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2019-08-29  9:28 [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h Jan Beulich
@ 2019-08-29 11:42 ` Andrew Cooper
  2019-08-29 12:01   ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2019-08-29 11:42 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 29/08/2019 10:28, Jan Beulich wrote:
> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>
>     There have been reports of RDRAND issues after resuming from suspend on
>     some AMD family 15h and family 16h systems. This issue stems from a BIOS
>     not performing the proper steps during resume to ensure RDRAND continues
>     to function properly.
>
>     Update the CPU initialization to clear the RDRAND CPUID bit for any family
>     15h and 16h processor that supports RDRAND. If it is known that the family
>     15h or family 16h system does not have an RDRAND resume issue or that the
>     system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>     can be used to stop the clearing of the RDRAND CPUID bit.
>
>     Note, that clearing the RDRAND CPUID bit does not prevent a processor
>     that normally supports the RDRAND instruction from executing it. So any
>     code that determined the support based on family and model won't #UD.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Slightly RFC, in particular because of the change to parse_xen_cpuid():
> Alternative approach suggestions are welcome.

This issue was on my radar, but I hadn't got around to starting a patch yet.

I was wondering if we could perhaps default it to whether S3 was
available, but looking at the code which prints "ACPI sleep modes: S3",
it doesn't appear to be related to any real ACPI information.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2019-08-29 11:42 ` Andrew Cooper
@ 2019-08-29 12:01   ` Juergen Gross
  2019-08-29 12:27     ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Gross @ 2019-08-29 12:01 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, xen-devel; +Cc: Wei Liu, Roger Pau Monné

On 29.08.19 13:42, Andrew Cooper wrote:
> On 29/08/2019 10:28, Jan Beulich wrote:
>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>
>>      There have been reports of RDRAND issues after resuming from suspend on
>>      some AMD family 15h and family 16h systems. This issue stems from a BIOS
>>      not performing the proper steps during resume to ensure RDRAND continues
>>      to function properly.
>>
>>      Update the CPU initialization to clear the RDRAND CPUID bit for any family
>>      15h and 16h processor that supports RDRAND. If it is known that the family
>>      15h or family 16h system does not have an RDRAND resume issue or that the
>>      system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>>      can be used to stop the clearing of the RDRAND CPUID bit.
>>
>>      Note, that clearing the RDRAND CPUID bit does not prevent a processor
>>      that normally supports the RDRAND instruction from executing it. So any
>>      code that determined the support based on family and model won't #UD.

Yeah, but it will no longer see random numbers after resume.

>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Slightly RFC, in particular because of the change to parse_xen_cpuid():
>> Alternative approach suggestions are welcome.
> 
> This issue was on my radar, but I hadn't got around to starting a patch yet.
> 
> I was wondering if we could perhaps default it to whether S3 was
> available, but looking at the code which prints "ACPI sleep modes: S3",
> it doesn't appear to be related to any real ACPI information.

Wouldn't it make more sense to inhibit suspend/resume instead?

I'm quite sure a machine running Xen is very rarely put to S3.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2019-08-29 12:01   ` Juergen Gross
@ 2019-08-29 12:27     ` Jan Beulich
  2019-08-29 12:39       ` Juergen Gross
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2019-08-29 12:27 UTC (permalink / raw)
  To: Juergen Gross; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 29.08.2019 14:01, Juergen Gross wrote:
> On 29.08.19 13:42, Andrew Cooper wrote:
>> On 29/08/2019 10:28, Jan Beulich wrote:
>>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>>
>>>      There have been reports of RDRAND issues after resuming from suspend on
>>>      some AMD family 15h and family 16h systems. This issue stems from a BIOS
>>>      not performing the proper steps during resume to ensure RDRAND continues
>>>      to function properly.
>>>
>>>      Update the CPU initialization to clear the RDRAND CPUID bit for any family
>>>      15h and 16h processor that supports RDRAND. If it is known that the family
>>>      15h or family 16h system does not have an RDRAND resume issue or that the
>>>      system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>>>      can be used to stop the clearing of the RDRAND CPUID bit.
>>>
>>>      Note, that clearing the RDRAND CPUID bit does not prevent a processor
>>>      that normally supports the RDRAND instruction from executing it. So any
>>>      code that determined the support based on family and model won't #UD.
> 
> Yeah, but it will no longer see random numbers after resume.

That's the implication; note that I've taken the text from the
Linux commit.

>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>> ---
>>> Slightly RFC, in particular because of the change to parse_xen_cpuid():
>>> Alternative approach suggestions are welcome.
>>
>> This issue was on my radar, but I hadn't got around to starting a patch yet.
>>
>> I was wondering if we could perhaps default it to whether S3 was
>> available, but looking at the code which prints "ACPI sleep modes: S3",
>> it doesn't appear to be related to any real ACPI information.
> 
> Wouldn't it make more sense to inhibit suspend/resume instead?

That's an alternative option. But if anything I'd see us providing
both, not the one you suggest instead of what the patch here does.

> I'm quite sure a machine running Xen is very rarely put to S3.

I'm not at all sure about this.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2019-08-29 12:27     ` Jan Beulich
@ 2019-08-29 12:39       ` Juergen Gross
  2019-08-29 12:49         ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Juergen Gross @ 2019-08-29 12:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 29.08.19 14:27, Jan Beulich wrote:
> On 29.08.2019 14:01, Juergen Gross wrote:
>> On 29.08.19 13:42, Andrew Cooper wrote:
>>> On 29/08/2019 10:28, Jan Beulich wrote:
>>>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>>>
>>>>       There have been reports of RDRAND issues after resuming from suspend on
>>>>       some AMD family 15h and family 16h systems. This issue stems from a BIOS
>>>>       not performing the proper steps during resume to ensure RDRAND continues
>>>>       to function properly.
>>>>
>>>>       Update the CPU initialization to clear the RDRAND CPUID bit for any family
>>>>       15h and 16h processor that supports RDRAND. If it is known that the family
>>>>       15h or family 16h system does not have an RDRAND resume issue or that the
>>>>       system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
>>>>       can be used to stop the clearing of the RDRAND CPUID bit.
>>>>
>>>>       Note, that clearing the RDRAND CPUID bit does not prevent a processor
>>>>       that normally supports the RDRAND instruction from executing it. So any
>>>>       code that determined the support based on family and model won't #UD.
>>
>> Yeah, but it will no longer see random numbers after resume.
> 
> That's the implication; note that I've taken the text from the
> Linux commit.
> 
>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>> ---
>>>> Slightly RFC, in particular because of the change to parse_xen_cpuid():
>>>> Alternative approach suggestions are welcome.
>>>
>>> This issue was on my radar, but I hadn't got around to starting a patch yet.
>>>
>>> I was wondering if we could perhaps default it to whether S3 was
>>> available, but looking at the code which prints "ACPI sleep modes: S3",
>>> it doesn't appear to be related to any real ACPI information.
>>
>> Wouldn't it make more sense to inhibit suspend/resume instead?
> 
> That's an alternative option. But if anything I'd see us providing
> both, not the one you suggest instead of what the patch here does.
> 
>> I'm quite sure a machine running Xen is very rarely put to S3.
> 
> I'm not at all sure about this.

Suspend/resume in Xen was broken for more than 3 months in the late 4.12
development phase and nobody noticed it. Only when I started my core
scheduling series I came across the issue.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h
  2019-08-29 12:39       ` Juergen Gross
@ 2019-08-29 12:49         ` Andrew Cooper
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2019-08-29 12:49 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich; +Cc: xen-devel, Wei Liu, Roger Pau Monné

On 29/08/2019 13:39, Juergen Gross wrote:
> On 29.08.19 14:27, Jan Beulich wrote:
>> On 29.08.2019 14:01, Juergen Gross wrote:
>>> On 29.08.19 13:42, Andrew Cooper wrote:
>>>> On 29/08/2019 10:28, Jan Beulich wrote:
>>>>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>>>>
>>>>>       There have been reports of RDRAND issues after resuming from
>>>>> suspend on
>>>>>       some AMD family 15h and family 16h systems. This issue stems
>>>>> from a BIOS
>>>>>       not performing the proper steps during resume to ensure
>>>>> RDRAND continues
>>>>>       to function properly.
>>>>>
>>>>>       Update the CPU initialization to clear the RDRAND CPUID bit
>>>>> for any family
>>>>>       15h and 16h processor that supports RDRAND. If it is known
>>>>> that the family
>>>>>       15h or family 16h system does not have an RDRAND resume
>>>>> issue or that the
>>>>>       system will not be placed in suspend, the "cpuid=rdrand"
>>>>> kernel parameter
>>>>>       can be used to stop the clearing of the RDRAND CPUID bit.
>>>>>
>>>>>       Note, that clearing the RDRAND CPUID bit does not prevent a
>>>>> processor
>>>>>       that normally supports the RDRAND instruction from executing
>>>>> it. So any
>>>>>       code that determined the support based on family and model
>>>>> won't #UD.
>>>
>>> Yeah, but it will no longer see random numbers after resume.
>>
>> That's the implication; note that I've taken the text from the
>> Linux commit.
>>
>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>>>>> ---
>>>>> Slightly RFC, in particular because of the change to
>>>>> parse_xen_cpuid():
>>>>> Alternative approach suggestions are welcome.
>>>>
>>>> This issue was on my radar, but I hadn't got around to starting a
>>>> patch yet.
>>>>
>>>> I was wondering if we could perhaps default it to whether S3 was
>>>> available, but looking at the code which prints "ACPI sleep modes:
>>>> S3",
>>>> it doesn't appear to be related to any real ACPI information.
>>>
>>> Wouldn't it make more sense to inhibit suspend/resume instead?
>>
>> That's an alternative option. But if anything I'd see us providing
>> both, not the one you suggest instead of what the patch here does.
>>
>>> I'm quite sure a machine running Xen is very rarely put to S3.
>>
>> I'm not at all sure about this.
>
> Suspend/resume in Xen was broken for more than 3 months in the late 4.12
> development phase and nobody noticed it. Only when I started my core
> scheduling series I came across the issue.

OpenXT and Qubes will come after you with sticks...

For traditional server scenarios, S3 is rare to non-existent, but Xen
also runs in a number of well known cases where S3 is very important. 
The reason S3 breaks frequently (see also Jan's IRQ adjustments broke it
for a while in staging), is because none of the developers who work in
these areas use it.

I now have a test scenario using a CoffeeLake system sat on my desk, but
it only gets testing when I'm poking the low level code, or when I
suspect a change might have an impact.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-08-29 12:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-29  9:28 [Xen-devel] [PATCH] x86: clear RDRAND CPUID bit on AMD family 15h/16h Jan Beulich
2019-08-29 11:42 ` Andrew Cooper
2019-08-29 12:01   ` Juergen Gross
2019-08-29 12:27     ` Jan Beulich
2019-08-29 12:39       ` Juergen Gross
2019-08-29 12:49         ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).