xen-devel.lists.xenproject.org archive mirror
* [PATCH V4 0/2] Update microcode driver
@ 2015-08-11 19:11 Aravind Gopalakrishnan
  2015-08-11 19:11 ` [PATCH V4 1/2] x86/microcode: Cleanup int type usage for cpu numbers Aravind Gopalakrishnan
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2015-08-11 19:11 UTC
  To: jbeulich, andrew.cooper3
  Cc: boris.ostrovsky, keir, Suravee.Suthikulpanit, xen-devel

Patch 1: Cleans up integer types used for cpu numbers
	 per Jan's suggestion
Patch 2: Fix HW issue by skipping certain microcode levels
	 and aborting microcode update process as otherwise
	 system hangs are known to occur

Changes from V3 (per Jan)
  - const-ify final_levels array
  - cleanup int usage for cpu numbers in a pre-patch

Changes from V2 (per Boris)
  - introduce family check as it's theoretically possible to have
    same patch level for different family too.
  - Indicate that the check is only on Fam10h too while at it

Changes from V1 (per Andrew)
  - use commit text from linux patch
  - include details about how 'final_levels' are obtained in comments
    (I have also copied it into commit message. But shall remove if you
     feel it's redundant)
  - use ARRAY_SIZE() and kill NULL terminator
  - use XENLOG_INFO in place of pr_debug()
  - correct unsigned int usage

Aravind Gopalakrishnan (2):
  x86/microcode: Cleanup int type usage for cpu numbers
  x86, amd_ucode: Skip microcode updates for final levels

 xen/arch/x86/microcode.c        |  6 ++---
 xen/arch/x86/microcode_amd.c    | 57 +++++++++++++++++++++++++++++++++++++----
 xen/arch/x86/microcode_intel.c  | 15 ++++++-----
 xen/include/asm-x86/microcode.h |  9 ++++---
 xen/include/asm-x86/processor.h |  2 +-
 5 files changed, 69 insertions(+), 20 deletions(-)

-- 
1.9.1


* [PATCH V4 1/2] x86/microcode: Cleanup int type usage for cpu numbers
  2015-08-11 19:11 [PATCH V4 0/2] Update microcode driver Aravind Gopalakrishnan
@ 2015-08-11 19:11 ` Aravind Gopalakrishnan
  2015-08-11 19:11 ` [PATCH V4 2/2] x86, amd_ucode: Skip microcode updates for final levels Aravind Gopalakrishnan
  2015-08-20 22:45 ` [PATCH V4 0/2] Update microcode driver Wei Liu
  2 siblings, 0 replies; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2015-08-11 19:11 UTC
  To: jbeulich, andrew.cooper3
  Cc: boris.ostrovsky, keir, Suravee.Suthikulpanit, xen-devel

CPU numbers can't be negative. Fixing the microcode*
files to properly use unsigned type in this patch.

No functional change is introduced.
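
As a rough illustration of why the unsigned type is the better fit
(this is a standalone sketch, not code from the patch; NR_CPUS_SKETCH
and both helper names are hypothetical), a signed CPU number needs two
comparisons to be range-checked, while an unsigned one needs a single
comparison because a converted negative value becomes a very large
number:

```c
#include <stdbool.h>

/* Hypothetical CPU count, used only for this sketch. */
#define NR_CPUS_SKETCH 8u

/* Signed parameter: negative values must be rejected explicitly. */
static bool cpu_in_range_signed(int cpu)
{
    return cpu >= 0 && cpu < (int)NR_CPUS_SKETCH;
}

/*
 * Unsigned parameter: one comparison is enough. A negative int
 * converted to unsigned int wraps to a value >= NR_CPUS_SKETCH.
 */
static bool cpu_in_range_unsigned(unsigned int cpu)
{
    return cpu < NR_CPUS_SKETCH;
}
```

Both helpers accept 0..7 and reject everything else; the unsigned
variant simply cannot be handed a value that indexes below the start
of a per-CPU array.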

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
---
 xen/arch/x86/microcode.c        |  6 +++---
 xen/arch/x86/microcode_amd.c    | 12 +++++++-----
 xen/arch/x86/microcode_intel.c  | 15 ++++++++-------
 xen/include/asm-x86/microcode.h |  9 +++++----
 xen/include/asm-x86/processor.h |  2 +-
 5 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 091d5d1..c20bde6 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -195,7 +195,7 @@ struct microcode_info {
     char buffer[1];
 };
 
-static void __microcode_fini_cpu(int cpu)
+static void __microcode_fini_cpu(unsigned int cpu)
 {
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
 
@@ -203,14 +203,14 @@ static void __microcode_fini_cpu(int cpu)
     memset(uci, 0, sizeof(*uci));
 }
 
-static void microcode_fini_cpu(int cpu)
+static void microcode_fini_cpu(unsigned int cpu)
 {
     spin_lock(&microcode_mutex);
     __microcode_fini_cpu(cpu);
     spin_unlock(&microcode_mutex);
 }
 
-int microcode_resume_cpu(int cpu)
+int microcode_resume_cpu(unsigned int cpu)
 {
     int err;
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index f79b397..2717479 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -79,7 +79,7 @@ struct mpbhdr {
 static DEFINE_SPINLOCK(microcode_update_lock);
 
 /* See comment in start_update() for cases when this routine fails */
-static int collect_cpu_info(int cpu, struct cpu_signature *csig)
+static int collect_cpu_info(unsigned int cpu, struct cpu_signature *csig)
 {
     struct cpuinfo_x86 *c = &cpu_data[cpu];
 
@@ -149,7 +149,8 @@ static bool_t find_equiv_cpu_id(const struct equiv_cpu_entry *equiv_cpu_table,
     return 0;
 }
 
-static bool_t microcode_fits(const struct microcode_amd *mc_amd, int cpu)
+static bool_t microcode_fits(const struct microcode_amd *mc_amd,
+                             unsigned int cpu)
 {
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
     const struct microcode_header_amd *mc_header = mc_amd->mpb;
@@ -186,7 +187,7 @@ static bool_t microcode_fits(const struct microcode_amd *mc_amd, int cpu)
     return 1;
 }
 
-static int apply_microcode(int cpu)
+static int apply_microcode(unsigned int cpu)
 {
     unsigned long flags;
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
@@ -347,7 +348,8 @@ static int container_fast_forward(const void *data, size_t size_left, size_t *of
     return 0;
 }
 
-static int cpu_request_microcode(int cpu, const void *buf, size_t bufsize)
+static int cpu_request_microcode(unsigned int cpu, const void *buf,
+                                 size_t bufsize)
 {
     struct microcode_amd *mc_amd, *mc_old;
     size_t offset = 0;
@@ -511,7 +513,7 @@ static int cpu_request_microcode(int cpu, const void *buf, size_t bufsize)
     return error;
 }
 
-static int microcode_resume_match(int cpu, const void *mc)
+static int microcode_resume_match(unsigned int cpu, const void *mc)
 {
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
     struct microcode_amd *mc_amd = uci->mc.mc_amd;
diff --git a/xen/arch/x86/microcode_intel.c b/xen/arch/x86/microcode_intel.c
index b54cd71..0a5f403 100644
--- a/xen/arch/x86/microcode_intel.c
+++ b/xen/arch/x86/microcode_intel.c
@@ -90,7 +90,7 @@ struct extended_sigtable {
 /* serialize access to the physical write to MSR 0x79 */
 static DEFINE_SPINLOCK(microcode_update_lock);
 
-static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
+static int collect_cpu_info(unsigned int cpu_num, struct cpu_signature *csig)
 {
     struct cpuinfo_x86 *c = &cpu_data[cpu_num];
     uint64_t msr_content;
@@ -129,7 +129,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 }
 
 static inline int microcode_update_match(
-    int cpu_num, const struct microcode_header_intel *mc_header,
+    unsigned int cpu_num, const struct microcode_header_intel *mc_header,
     int sig, int pf)
 {
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num);
@@ -232,7 +232,7 @@ static int microcode_sanity_check(void *mc)
  * return 1 - found update
  * return < 0 - error
  */
-static int get_matching_microcode(const void *mc, int cpu)
+static int get_matching_microcode(const void *mc, unsigned int cpu)
 {
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
     const struct microcode_header_intel *mc_header = mc;
@@ -277,12 +277,12 @@ static int get_matching_microcode(const void *mc, int cpu)
     return 1;
 }
 
-static int apply_microcode(int cpu)
+static int apply_microcode(unsigned int cpu)
 {
     unsigned long flags;
     uint64_t msr_content;
     unsigned int val[2];
-    int cpu_num = raw_smp_processor_id();
+    unsigned int cpu_num = raw_smp_processor_id();
     struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu_num);
 
     /* We should bind the task to the CPU */
@@ -351,7 +351,8 @@ static long get_next_ucode_from_buffer(void **mc, const u8 *buf,
     return offset + total_size;
 }
 
-static int cpu_request_microcode(int cpu, const void *buf, size_t size)
+static int cpu_request_microcode(unsigned int cpu, const void *buf,
+                                 size_t size)
 {
     long offset = 0;
     int error = 0;
@@ -391,7 +392,7 @@ static int cpu_request_microcode(int cpu, const void *buf, size_t size)
     return error;
 }
 
-static int microcode_resume_match(int cpu, const void *mc)
+static int microcode_resume_match(unsigned int cpu, const void *mc)
 {
     return get_matching_microcode(mc, cpu);
 }
diff --git a/xen/include/asm-x86/microcode.h b/xen/include/asm-x86/microcode.h
index 00a672a..23ea954 100644
--- a/xen/include/asm-x86/microcode.h
+++ b/xen/include/asm-x86/microcode.h
@@ -7,10 +7,11 @@ struct cpu_signature;
 struct ucode_cpu_info;
 
 struct microcode_ops {
-    int (*microcode_resume_match)(int cpu, const void *mc);
-    int (*cpu_request_microcode)(int cpu, const void *buf, size_t size);
-    int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
-    int (*apply_microcode)(int cpu);
+    int (*microcode_resume_match)(unsigned int cpu, const void *mc);
+    int (*cpu_request_microcode)(unsigned int cpu, const void *buf,
+                                 size_t size);
+    int (*collect_cpu_info)(unsigned int cpu, struct cpu_signature *csig);
+    int (*apply_microcode)(unsigned int cpu);
     int (*start_update)(void);
 };
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index b9a00aa..f507f5e 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -570,7 +570,7 @@ int wrmsr_hypervisor_regs(uint32_t idx, uint64_t val);
 
 void microcode_set_module(unsigned int);
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void), unsigned long len);
-int microcode_resume_cpu(int cpu);
+int microcode_resume_cpu(unsigned int cpu);
 
 enum get_cpu_vendor {
    gcv_host_early,
-- 
1.9.1


* [PATCH V4 2/2] x86, amd_ucode: Skip microcode updates for final levels
  2015-08-11 19:11 [PATCH V4 0/2] Update microcode driver Aravind Gopalakrishnan
  2015-08-11 19:11 ` [PATCH V4 1/2] x86/microcode: Cleanup int type usage for cpu numbers Aravind Gopalakrishnan
@ 2015-08-11 19:11 ` Aravind Gopalakrishnan
  2015-08-12  9:38   ` Jan Beulich
  2015-08-20 22:45 ` [PATCH V4 0/2] Update microcode driver Wei Liu
  2 siblings, 1 reply; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2015-08-11 19:11 UTC
  To: jbeulich, andrew.cooper3
  Cc: boris.ostrovsky, keir, Suravee.Suthikulpanit, xen-devel

Some older (Fam10h) systems require that certain already-applied
microcode patch levels not be overwritten by the microcode
loader. Otherwise, system hangs are known to occur.

The 'final_levels' of patch ids have been obtained empirically.
Refer to bug https://bugzilla.suse.com/show_bug.cgi?id=913996
for details of the issue.

The short version is that people have predominantly noticed
system hang issues when trying to update microcode levels
beyond the patch IDs below.
[0x01000098, 0x0100009f, 0x010000af]

From internal discussions, we gathered that OS/hypervisor
cannot reliably perform microcode updates beyond these levels
due to hardware issues. Therefore, we need to abort microcode
update process if we hit any of these levels.

In this patch, we check for those microcode versions and abort
if the current core has one of those final patch levels applied
by the BIOS.
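
For readers who want to try the check outside of Xen, here is a
minimal standalone sketch of the same idea. It is not the patch code:
the family and current revision are passed in as plain parameters
instead of being read from boot_cpu_data and the per-CPU
ucode_cpu_info, and the names final_levels_sketch and
is_final_patch_level are made up for this example. The three patch
IDs are the ones listed above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Patch levels that must not be overwritten by the loader. */
static const uint32_t final_levels_sketch[] = {
    0x01000098,
    0x0100009f,
    0x010000af
};

/*
 * Return true if 'rev' is one of the final levels and the CPU is
 * Fam10h. Other families are never treated as final, even if they
 * happen to report the same patch level.
 */
static bool is_final_patch_level(unsigned int family, uint32_t rev)
{
    size_t i;
    size_t n = sizeof(final_levels_sketch) / sizeof(final_levels_sketch[0]);

    if (family != 0x10)
        return false;

    for (i = 0; i < n; i++)
        if (rev == final_levels_sketch[i])
            return true;

    return false;
}
```

The family test comes first so that the table is only consulted on
Fam10h, which matches the intent of the V2 review comment.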

A Linux version of the patch has already made it into tip:
http://marc.info/?l=linux-kernel&m=143703405627170

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/arch/x86/microcode_amd.c | 45 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/xen/arch/x86/microcode_amd.c b/xen/arch/x86/microcode_amd.c
index 2717479..b98a541 100644
--- a/xen/arch/x86/microcode_amd.c
+++ b/xen/arch/x86/microcode_amd.c
@@ -348,6 +348,43 @@ static int container_fast_forward(const void *data, size_t size_left, size_t *of
     return 0;
 }
 
+/*
+ * The 'final_levels' of patch ids have been obtained empirically.
+ * Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
+ * for details of the issue. The short version is that people
+ * using certain Fam10h systems noticed system hang issues when
+ * trying to update microcode levels beyond the patch IDs below.
+ * From internal discussions, we gathered that OS/hypervisor
+ * cannot reliably perform microcode updates beyond these levels
+ * due to hardware issues. Therefore, we need to abort microcode
+ * update process if we hit any of these levels.
+ */
+static const unsigned int final_levels[] = {
+    0x01000098,
+    0x0100009f,
+    0x010000af
+};
+
+static bool_t check_final_patch_levels(unsigned int cpu)
+{
+    /*
+     * Check the current patch levels on the cpu. If they are equal to
+     * any of the 'final_levels', then we should not update the microcode
+     * patch on the cpu as system will hang otherwise.
+     */
+    struct ucode_cpu_info *uci = &per_cpu(ucode_cpu_info, cpu);
+    unsigned int i;
+
+    if ( boot_cpu_data.x86 != 0x10 )
+        return 0;
+
+    for ( i = 0; i < ARRAY_SIZE(final_levels); i++ )
+        if ( uci->cpu_sig.rev == final_levels[i] )
+            return 1;
+
+    return 0;
+}
+
 static int cpu_request_microcode(unsigned int cpu, const void *buf,
                                  size_t bufsize)
 {
@@ -371,6 +408,14 @@ static int cpu_request_microcode(unsigned int cpu, const void *buf,
         goto out;
     }
 
+    if ( check_final_patch_levels(cpu) )
+    {
+        printk(XENLOG_INFO
+               "microcode: Cannot update microcode patch on the cpu as we hit a final level\n");
+        error = -EPERM;
+        goto out;
+    }
+
     mc_amd = xmalloc(struct microcode_amd);
     if ( !mc_amd )
     {
-- 
1.9.1


* Re: [PATCH V4 2/2] x86, amd_ucode: Skip microcode updates for final levels
  2015-08-11 19:11 ` [PATCH V4 2/2] x86, amd_ucode: Skip microcode updates for final levels Aravind Gopalakrishnan
@ 2015-08-12  9:38   ` Jan Beulich
  2015-08-13 22:51     ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2015-08-12  9:38 UTC
  To: Aravind Gopalakrishnan
  Cc: andrew.cooper3, boris.ostrovsky, keir, Suravee.Suthikulpanit, xen-devel

>>> On 11.08.15 at 21:11, <aravind.gopalakrishnan@amd.com> wrote:
> Some of older[Fam10h] systems require that certain number of
> applied microcode patch levels should not be overwritten by
> the microcode loader. Otherwise, system hangs are known to occur.
> 
> The 'final_levels' of patch ids have been obtained empirically.
> Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996 
> for details of the issue.
> 
> The short version is that people have predominantly noticed
> system hang issues when trying to update microcode levels
> beyond the patch IDs below.
> [0x01000098, 0x0100009f, 0x010000af]
> 
> From internal discussions, we gathered that OS/hypervisor
> cannot reliably perform microcode updates beyond these levels
> due to hardware issues. Therefore, we need to abort microcode
> update process if we hit any of these levels.

While the patch itself looks fine now, I'm still hesitant to take this
(even more so after having read through the bugzilla entry
linked to above): The list being established empirically - will it be
ever growing? Did you internally gain understanding of what it
actually is that goes wrong (and hence can perhaps narrow
down the conditions for the hangs to occur)? Have there been
any checks whether indeed _all_ systems at the listed ucode
levels are affected?

Also that's leaving aside the question of what unfixed CPU issues
people now being prevented from doing the ucode update are
going to run into.

Jan


* Re: [PATCH V4 2/2] x86, amd_ucode: Skip microcode updates for final levels
  2015-08-12  9:38   ` Jan Beulich
@ 2015-08-13 22:51     ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 6+ messages in thread
From: Aravind Gopalakrishnan @ 2015-08-13 22:51 UTC
  To: Jan Beulich
  Cc: keir, andrew.cooper3, xen-devel, Suravee.Suthikulpanit, Hurwitz,
	Sherry, boris.ostrovsky

On 8/12/2015 4:38 AM, Jan Beulich wrote:
>>>> On 11.08.15 at 21:11, <aravind.gopalakrishnan@amd.com> wrote:
>> Some of older[Fam10h] systems require that certain number of
>> applied microcode patch levels should not be overwritten by
>> the microcode loader. Otherwise, system hangs are known to occur.
>>
>> The 'final_levels' of patch ids have been obtained empirically.
>> Refer bug https://bugzilla.suse.com/show_bug.cgi?id=913996
>> for details of the issue.
>>
>> The short version is that people have predominantly noticed
>> system hang issues when trying to update microcode levels
>> beyond the patch IDs below.
>> [0x01000098, 0x0100009f, 0x010000af]
>>
>>  From internal discussions, we gathered that OS/hypervisor
>> cannot reliably perform microcode updates beyond these levels
>> due to hardware issues. Therefore, we need to abort microcode
>> update process if we hit any of these levels.
> While the patch itself looks fine now, I'm still hesitant to take this
> (even more so after having read through the bugzilla entry
> linked to above): The list being established empirically - will it be
> ever growing?

No. Although the list was established empirically, it is not going to
continue to grow (see below).


>   Did you internally gain understanding of what it
> actually is that goes wrong (and hence can perhaps narrow
> down the conditions for the hangs to occur)? Have there been
> any checks whether indeed _all_ systems at the listed ucode
> levels are affected?

Yeah, HW architects mentioned they are aware of the problem and that 
it's relevant only on Fam10h (which is why the list would not grow).
And they verified that it affects all the systems at the listed 
microcode levels.


> Also that's leaving aside the question of what unfixed CPU issues
> people now being prevented from doing the ucode update are
> going to run into.
>
>

Right. So, the problem is that the user may have a perfectly normal 
working system at the listed microcode levels, but the hang occurs
*only* when you try to update the patch level. Hence the recommendation 
from HW architects to hold down the microcode level from
OS/hypervisor POV and not go ahead with the update process.

If the user has to update the microcode levels beyond these levels, then 
BIOS updates are an option.
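
To make the intended behaviour concrete, here is a simplified
standalone sketch (not the Xen code path; held_levels, level_is_held
and try_loader_update are hypothetical names, and the "update" is
reduced to storing a new revision number). The loader refuses with
-EPERM when the current level is one of the listed ones and leaves the
running level untouched; any other Fam10h level is updated as usual.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Levels at which the OS/hypervisor loader must hold back. */
static const uint32_t held_levels[] = {
    0x01000098, 0x0100009f, 0x010000af
};

static bool level_is_held(unsigned int family, uint32_t rev)
{
    size_t i;

    if (family != 0x10)
        return false;

    for (i = 0; i < sizeof(held_levels) / sizeof(held_levels[0]); i++)
        if (rev == held_levels[i])
            return true;

    return false;
}

/*
 * Returns 0 and stores new_rev in *cur_rev on success.
 * Returns -EPERM and leaves *cur_rev unchanged when the current
 * level is a held one, so the system keeps running as before.
 */
static int try_loader_update(unsigned int family, uint32_t *cur_rev,
                             uint32_t new_rev)
{
    assert(cur_rev != NULL);

    if (level_is_held(family, *cur_rev))
        return -EPERM;

    *cur_rev = new_rev;
    return 0;
}
```

A caller seeing -EPERM here would treat it as "nothing to do from the
OS/hypervisor side"; a newer level would then have to come from a BIOS
update instead.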

HTH,

Thanks,
-Aravind.


* Re: [PATCH V4 0/2] Update microcode driver
  2015-08-11 19:11 [PATCH V4 0/2] Update microcode driver Aravind Gopalakrishnan
  2015-08-11 19:11 ` [PATCH V4 1/2] x86/microcode: Cleanup int type usage for cpu numbers Aravind Gopalakrishnan
  2015-08-11 19:11 ` [PATCH V4 2/2] x86, amd_ucode: Skip microcode updates for final levels Aravind Gopalakrishnan
@ 2015-08-20 22:45 ` Wei Liu
  2 siblings, 0 replies; 6+ messages in thread
From: Wei Liu @ 2015-08-20 22:45 UTC
  To: Aravind Gopalakrishnan
  Cc: keir, jbeulich, andrew.cooper3, xen-devel, Suravee.Suthikulpanit,
	wei.liu2, boris.ostrovsky

Aravind and Andrew pointed me to this series.

The first patch is low risk, the second patch has been reviewed by two
experts in that area.

Provided Jan is happy with Aravind's answer:

Release-acked-by: Wei Liu <wei.liu2@citrix.com>

