* [PATCH 0/3] Patch series to address some limitations in OS microcode loading. @ 2018-02-21 16:49 Ashok Raj 2018-02-21 16:49 ` [PATCH 1/3] x86/microcode/intel: Check microcode revision before updating sibling threads Ashok Raj ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Ashok Raj @ 2018-02-21 16:49 UTC (permalink / raw) To: bp Cc: Ashok Raj, X86 ML, LKML, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Tom Lendacky, Arjan Van De Ven The following set of patches address some limitations on microcode loading. First patch avoids a redundant microcode load on sibling thread if another HT sibling got updated. Ashok Raj (3): x86/microcode/intel: Check microcode revision before updating sibling threads x86/microcode/intel: Perform a cache flush before ucode update. x86/microcode: Quiesce all threads before a microcode update. arch/x86/kernel/cpu/microcode/core.c | 113 +++++++++++++++++++++++++++++----- arch/x86/kernel/cpu/microcode/intel.c | 19 +++++- 2 files changed, 113 insertions(+), 19 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] x86/microcode/intel: Check microcode revision before updating sibling threads 2018-02-21 16:49 [PATCH 0/3] Patch series to address some limitations in OS microcode loading Ashok Raj @ 2018-02-21 16:49 ` Ashok Raj 2018-02-21 16:49 ` [PATCH 2/3] x86/microcode/intel: Perform a cache flush before ucode update Ashok Raj 2018-02-21 16:49 ` [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update Ashok Raj 2 siblings, 0 replies; 9+ messages in thread From: Ashok Raj @ 2018-02-21 16:49 UTC (permalink / raw) To: bp Cc: Ashok Raj, X86 ML, LKML, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Tom Lendacky, Arjan Van De Ven After updating microcode on one of the threads in the core, the thread sibling automatically gets the update since the microcode resources are shared. Check the ucode revision on the CPU before performing a ucode update. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: X86 ML <x86@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> Updates: v2: Address Boris's to cleanup apply_microcode_intel v3: Fixups per Ingo: Spell Checks --- arch/x86/kernel/cpu/microcode/intel.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c index 09b95a7..eff80df 100644 --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -776,7 +776,7 @@ static enum ucode_state apply_microcode_intel(int cpu) { struct microcode_intel *mc; struct ucode_cpu_info *uci; - struct cpuinfo_x86 *c; + struct cpuinfo_x86 *c = &cpu_data(cpu); static int prev_rev; u32 rev; @@ -793,6 +793,18 @@ static enum ucode_state apply_microcode_intel(int cpu) return UCODE_NFOUND; } + rev = intel_get_microcode_revision(); + /* + * Its possible the microcode got updated + * because its sibling update was done earlier. + * Skip the update in that case. + */ + if (rev >= mc->hdr.rev) { + uci->cpu_sig.rev = rev; + c->microcode = rev; + return UCODE_OK; + } + /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits); @@ -813,8 +825,6 @@ static enum ucode_state apply_microcode_intel(int cpu) prev_rev = rev; } - c = &cpu_data(cpu); - uci->cpu_sig.rev = rev; c->microcode = rev; -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/3] x86/microcode/intel: Perform a cache flush before ucode update. 2018-02-21 16:49 [PATCH 0/3] Patch series to address some limitations in OS microcode loading Ashok Raj 2018-02-21 16:49 ` [PATCH 1/3] x86/microcode/intel: Check microcode revision before updating sibling threads Ashok Raj @ 2018-02-21 16:49 ` Ashok Raj 2018-02-21 18:25 ` Borislav Petkov 2018-02-21 16:49 ` [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update Ashok Raj 2 siblings, 1 reply; 9+ messages in thread From: Ashok Raj @ 2018-02-21 16:49 UTC (permalink / raw) To: bp Cc: Ashok Raj, X86 ML, LKML, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Tom Lendacky, Arjan Van De Ven Microcode updates can be safer if the caches are clean. Some of the issues around in certain Broadwell parts can be addressed by doing a full cache flush. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: X86 ML <x86@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> --- arch/x86/kernel/cpu/microcode/intel.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c index eff80df..5d32724 100644 --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -589,6 +589,7 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early) if (!mc) return 0; + wbinvd(); /* write microcode via MSR 0x79 */ native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits); @@ -805,6 +806,7 @@ static enum ucode_state apply_microcode_intel(int cpu) return UCODE_OK; } + wbinvd(); /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits); -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/3] x86/microcode/intel: Perform a cache flush before ucode update. 2018-02-21 16:49 ` [PATCH 2/3] x86/microcode/intel: Perform a cache flush before ucode update Ashok Raj @ 2018-02-21 18:25 ` Borislav Petkov 0 siblings, 0 replies; 9+ messages in thread From: Borislav Petkov @ 2018-02-21 18:25 UTC (permalink / raw) To: Ashok Raj Cc: X86 ML, LKML, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Tom Lendacky, Arjan Van De Ven On Wed, Feb 21, 2018 at 08:49:43AM -0800, Ashok Raj wrote: > Microcode updates can be safer if the caches are clean. > Some of the issues around in certain Broadwell parts > can be addressed by doing a full cache flush. Take that text... > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > Cc: X86 ML <x86@kernel.org> > Cc: LKML <linux-kernel@vger.kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Andi Kleen <andi.kleen@intel.com> > Cc: Boris Petkov <bp@suse.de> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> > --- > arch/x86/kernel/cpu/microcode/intel.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c > index eff80df..5d32724 100644 > --- a/arch/x86/kernel/cpu/microcode/intel.c > +++ b/arch/x86/kernel/cpu/microcode/intel.c > @@ -589,6 +589,7 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early) > if (!mc) > return 0; > ... and put it here as a comment. > + wbinvd(); This definitely needs to be native_wbinvd(). Early mode don't love pvops. > /* write microcode via MSR 0x79 */ > native_wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits); > > @@ -805,6 +806,7 @@ static enum ucode_state apply_microcode_intel(int cpu) > return UCODE_OK; > } Ditto. > + wbinvd(); > /* write microcode via MSR 0x79 */ > wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits); > > -- > 2.7.4 > -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update. 2018-02-21 16:49 [PATCH 0/3] Patch series to address some limitations in OS microcode loading Ashok Raj 2018-02-21 16:49 ` [PATCH 1/3] x86/microcode/intel: Check microcode revision before updating sibling threads Ashok Raj 2018-02-21 16:49 ` [PATCH 2/3] x86/microcode/intel: Perform a cache flush before ucode update Ashok Raj @ 2018-02-21 16:49 ` Ashok Raj 2018-02-21 19:06 ` Borislav Petkov 2 siblings, 1 reply; 9+ messages in thread From: Ashok Raj @ 2018-02-21 16:49 UTC (permalink / raw) To: bp Cc: Ashok Raj, X86 ML, LKML, Tom Lendacky, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Arjan Van De Ven Microcode updates during OS load always assumed the other hyperthread was "quiet", but Linux never really did this. We've recently received several issues on this, where things did not go well at scale deployments, and the Intel microcode team asked us to make sure the system is in a quiet state during these updates. Such updates are rare events, so we use stop_machine() to ensure the whole system is quiet. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: X86 ML <x86@kernel.org> Cc: LKML <linux-kernel@vger.kernel.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Boris Petkov <bp@suse.de> Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> --- arch/x86/kernel/cpu/microcode/core.c | 113 +++++++++++++++++++++++++++++----- arch/x86/kernel/cpu/microcode/intel.c | 1 + 2 files changed, 98 insertions(+), 16 deletions(-) diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c index aa1b9a4..af0aeb2 100644 --- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -31,6 +31,9 @@ #include <linux/cpu.h> #include <linux/fs.h> #include <linux/mm.h> +#include <linux/nmi.h> +#include <linux/stop_machine.h> +#include <linux/delay.h> #include <asm/microcode_intel.h> #include <asm/cpu_device_id.h> @@ -489,19 +492,82 @@ static void __exit microcode_dev_exit(void) /* fake device for request_firmware */ static struct platform_device *microcode_pdev; -static enum ucode_state reload_for_cpu(int cpu) +static struct ucode_update_param { + spinlock_t ucode_lock; + atomic_t count; + atomic_t errors; + atomic_t enter; + int timeout; +} uc_data; + +static void do_ucode_update(int cpu, struct ucode_update_param *ucd) { - struct ucode_cpu_info *uci = ucode_cpu_info + cpu; - enum ucode_state ustate; + enum ucode_state retval = 0; - if (!uci->valid) - return UCODE_OK; + spin_lock(&ucd->ucode_lock); + retval = microcode_ops->apply_microcode(cpu); + spin_unlock(&ucd->ucode_lock); + if (retval > UCODE_NFOUND) { + atomic_inc(&ucd->errors); + pr_warn("microcode update to cpu %d failed\n", cpu); + } + atomic_inc(&ucd->count); +} + +/* + * Wait for upto 1sec for all cpus + * to show up in the rendezvous function + */ +#define MAX_UCODE_RENDEZVOUS 1000000000 /* nanosec */ +#define SPINUNIT 100 /* 100ns */ + +/* + * Each cpu waits for 1sec max. + */ +static int ucode_wait_timedout(int *time_out, void *data) +{ + struct ucode_update_param *ucd = data; + if (*time_out < SPINUNIT) { + pr_err("Not all cpus entered ucode update handler %d cpus missing\n", + (num_online_cpus() - atomic_read(&ucd->enter))); + return 1; + } + *time_out -= SPINUNIT; + touch_nmi_watchdog(); + return 0; +} + +/* + * All cpus enter here before a ucode load upto 1 sec. + * If not all cpus showed up, we abort the ucode update + * and return. ucode update is serialized with the spinlock + */ +static int ucode_load_rendezvous(void *data) +{ + int cpu = smp_processor_id(); + struct ucode_update_param *ucd = data; + int timeout = MAX_UCODE_RENDEZVOUS; + int total_cpus = num_online_cpus(); - ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev, true); - if (ustate != UCODE_OK) - return ustate; + /* + * Wait for all cpu's to arrive + */ + atomic_dec(&ucd->enter); + while(atomic_read(&ucd->enter)) { + if (ucode_wait_timedout(&timeout, ucd)) + return 1; + ndelay(SPINUNIT); + } + + do_ucode_update(cpu, ucd); - return apply_microcode_on_target(cpu); + /* + * Wait for all cpu's to complete + * ucode update + */ + while (atomic_read(&ucd->count) != total_cpus) + cpu_relax(); + return 0; } static ssize_t reload_store(struct device *dev, @@ -509,7 +575,6 @@ static ssize_t reload_store(struct device *dev, const char *buf, size_t size) { enum ucode_state tmp_ret = UCODE_OK; - bool do_callback = false; unsigned long val; ssize_t ret = 0; int cpu; @@ -523,21 +588,37 @@ static ssize_t reload_store(struct device *dev, get_online_cpus(); mutex_lock(µcode_mutex); + /* + * First load the microcode file for all cpu's + */ for_each_online_cpu(cpu) { - tmp_ret = reload_for_cpu(cpu); + tmp_ret = microcode_ops->request_microcode_fw(cpu, + µcode_pdev->dev, true); if (tmp_ret > UCODE_NFOUND) { - pr_warn("Error reloading microcode on CPU %d\n", cpu); + pr_warn("Error reloading microcode file for CPU %d\n", cpu); /* set retval for the first encountered reload error */ if (!ret) ret = -EINVAL; } - - if (tmp_ret == UCODE_UPDATED) - do_callback = true; } + pr_debug("Done loading microcode file for all cpus\n"); - if (!ret && do_callback) + memset(&uc_data, 0, sizeof(struct ucode_update_param)); + spin_lock_init(&uc_data.ucode_lock); + atomic_set(&uc_data.enter, num_online_cpus()); + /* + * Wait for a 1 sec + */ + uc_data.timeout = USEC_PER_SEC; + stop_machine(ucode_load_rendezvous, &uc_data, cpu_online_mask); + + pr_debug("Total CPUS = %d uperrors = %d\n", + atomic_read(&uc_data.count), atomic_read(&uc_data.errors)); + + if (atomic_read(&uc_data.errors)) + pr_warn("Update failed for %d cpus\n", atomic_read(&uc_data.errors)); + else microcode_check(); mutex_unlock(µcode_mutex); diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c index 5d32724..0c55be6 100644 --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -809,6 +809,7 @@ static enum ucode_state apply_microcode_intel(int cpu) wbinvd(); /* write microcode via MSR 0x79 */ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits); + pr_debug("ucode loading for cpu %d done\n", cpu); rev = intel_get_microcode_revision(); -- 2.7.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update. 2018-02-21 16:49 ` [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update Ashok Raj @ 2018-02-21 19:06 ` Borislav Petkov 2018-02-21 20:13 ` Raj, Ashok 0 siblings, 1 reply; 9+ messages in thread From: Borislav Petkov @ 2018-02-21 19:06 UTC (permalink / raw) To: Ashok Raj Cc: X86 ML, LKML, Tom Lendacky, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Arjan Van De Ven On Wed, Feb 21, 2018 at 08:49:44AM -0800, Ashok Raj wrote: > Microcode updates during OS load always assumed the other hyperthread > was "quiet", but Linux never really did this. We've recently received > several issues on this, where things did not go well at scale > deployments, and the Intel microcode team asked us to make sure the > system is in a quiet state during these updates. Such updates are > rare events, so we use stop_machine() to ensure the whole system is > quiet. Ewww, where do I begin?! I really really hoped that we could avoid nasty dancing like that. > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > Cc: X86 ML <x86@kernel.org> > Cc: LKML <linux-kernel@vger.kernel.org> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Tony Luck <tony.luck@intel.com> > Cc: Andi Kleen <andi.kleen@intel.com> > Cc: Boris Petkov <bp@suse.de> > Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com> > --- > arch/x86/kernel/cpu/microcode/core.c | 113 +++++++++++++++++++++++++++++----- This is generic so Tom needs to ack whatever we end up doing for the AMD side. > arch/x86/kernel/cpu/microcode/intel.c | 1 + > 2 files changed, 98 insertions(+), 16 deletions(-) > > diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c > index aa1b9a4..af0aeb2 100644 > --- a/arch/x86/kernel/cpu/microcode/core.c > +++ b/arch/x86/kernel/cpu/microcode/core.c > @@ -31,6 +31,9 @@ > #include <linux/cpu.h> > #include <linux/fs.h> > #include <linux/mm.h> > +#include <linux/nmi.h> > +#include <linux/stop_machine.h> > +#include <linux/delay.h> > > #include <asm/microcode_intel.h> > #include <asm/cpu_device_id.h> > @@ -489,19 +492,82 @@ static void __exit microcode_dev_exit(void) > /* fake device for request_firmware */ > static struct platform_device *microcode_pdev; > > -static enum ucode_state reload_for_cpu(int cpu) > +static struct ucode_update_param { > + spinlock_t ucode_lock; > + atomic_t count; > + atomic_t errors; > + atomic_t enter; > + int timeout; > +} uc_data; > + > +static void do_ucode_update(int cpu, struct ucode_update_param *ucd) > { > - struct ucode_cpu_info *uci = ucode_cpu_info + cpu; > - enum ucode_state ustate; > + enum ucode_state retval = 0; > > - if (!uci->valid) > - return UCODE_OK; > + spin_lock(&ucd->ucode_lock); > + retval = microcode_ops->apply_microcode(cpu); > + spin_unlock(&ucd->ucode_lock); What's the spinlock protecting against? We hold the hotplug lock and the microcode mutex. And yet interrupts are still enabled. So what's up? > + if (retval > UCODE_NFOUND) { > + atomic_inc(&ucd->errors); You don't need ->errors. Simply propagate retval from do_ucode_update(). Or compare ucd->count to the number of CPUs. Or something like that. > + pr_warn("microcode update to cpu %d failed\n", cpu); > + } > + atomic_inc(&ucd->count); > +} > + > +/* > + * Wait for upto 1sec for all cpus > + * to show up in the rendezvous function > + */ > +#define MAX_UCODE_RENDEZVOUS 1000000000 /* nanosec */ 1 * NSEC_PER_SEC > +#define SPINUNIT 100 /* 100ns */ > + > +/* > + * Each cpu waits for 1sec max. > + */ > +static int ucode_wait_timedout(int *time_out, void *data) > +{ > + struct ucode_update_param *ucd = data; > + if (*time_out < SPINUNIT) { > + pr_err("Not all cpus entered ucode update handler %d cpus missing\n", > + (num_online_cpus() - atomic_read(&ucd->enter))); > + return 1; > + } > + *time_out -= SPINUNIT; > + touch_nmi_watchdog(); > + return 0; > +} > + > +/* > + * All cpus enter here before a ucode load upto 1 sec. > + * If not all cpus showed up, we abort the ucode update > + * and return. ucode update is serialized with the spinlock ... and yet you don't check stop_machine()'s retval and issue an error message that it failed. > + */ > +static int ucode_load_rendezvous(void *data) The correct prefix is "microcode_" > +{ > + int cpu = smp_processor_id(); > + struct ucode_update_param *ucd = data; > + int timeout = MAX_UCODE_RENDEZVOUS; > + int total_cpus = num_online_cpus(); > > - ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev, true); > - if (ustate != UCODE_OK) > - return ustate; > + /* > + * Wait for all cpu's to arrive > + */ > + atomic_dec(&ucd->enter); > + while(atomic_read(&ucd->enter)) { > + if (ucode_wait_timedout(&timeout, ucd)) > + return 1; > + ndelay(SPINUNIT); > + } > + > + do_ucode_update(cpu, ucd); > > - return apply_microcode_on_target(cpu); > + /* > + * Wait for all cpu's to complete > + * ucode update > + */ > + while (atomic_read(&ucd->count) != total_cpus) > + cpu_relax(); > + return 0; > } > > static ssize_t reload_store(struct device *dev, > @@ -509,7 +575,6 @@ static ssize_t reload_store(struct device *dev, > const char *buf, size_t size) > { > enum ucode_state tmp_ret = UCODE_OK; > - bool do_callback = false; > unsigned long val; > ssize_t ret = 0; > int cpu; > @@ -523,21 +588,37 @@ static ssize_t reload_store(struct device *dev, > > get_online_cpus(); > mutex_lock(µcode_mutex); > + /* > + * First load the microcode file for all cpu's > + */ It is always "CPU" or "CPUs". Fix all misspelled places. > for_each_online_cpu(cpu) { You need to fail loading and not even try when not all cores are online. And issue an error message. > - tmp_ret = reload_for_cpu(cpu); > + tmp_ret = microcode_ops->request_microcode_fw(cpu, > + µcode_pdev->dev, true); This needs to happen only once - not per CPU. Let's simply forget heterogeneous microcode revisions. > if (tmp_ret > UCODE_NFOUND) { > - pr_warn("Error reloading microcode on CPU %d\n", cpu); > + pr_warn("Error reloading microcode file for CPU %d\n", cpu); > > /* set retval for the first encountered reload error */ > if (!ret) > ret = -EINVAL; You can't continue loading here if you've encountered an error. > } > - > - if (tmp_ret == UCODE_UPDATED) > - do_callback = true; > } > + pr_debug("Done loading microcode file for all cpus\n"); > > - if (!ret && do_callback) > + memset(&uc_data, 0, sizeof(struct ucode_update_param)); > + spin_lock_init(&uc_data.ucode_lock); > + atomic_set(&uc_data.enter, num_online_cpus()); > + /* > + * Wait for a 1 sec > + */ > + uc_data.timeout = USEC_PER_SEC; > + stop_machine(ucode_load_rendezvous, &uc_data, cpu_online_mask); > + > + pr_debug("Total CPUS = %d uperrors = %d\n", > + atomic_read(&uc_data.count), atomic_read(&uc_data.errors)); > + > + if (atomic_read(&uc_data.errors)) > + pr_warn("Update failed for %d cpus\n", atomic_read(&uc_data.errors)); > + else > microcode_check(); This whole jumping through hoops needs to be extracted away in a separate function. Ok, that should be enough review for now. More with v2. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update. 2018-02-21 19:06 ` Borislav Petkov @ 2018-02-21 20:13 ` Raj, Ashok 2018-02-21 21:15 ` Borislav Petkov 2018-02-22 15:36 ` Tom Lendacky 0 siblings, 2 replies; 9+ messages in thread From: Raj, Ashok @ 2018-02-21 20:13 UTC (permalink / raw) To: Borislav Petkov Cc: X86 ML, LKML, Tom Lendacky, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Arjan Van De Ven, Ashok Raj On Wed, Feb 21, 2018 at 08:06:11PM +0100, Borislav Petkov wrote: > > arch/x86/kernel/cpu/microcode/core.c | 113 +++++++++++++++++++++++++++++----- > > This is generic so Tom needs to ack whatever we end up doing for the AMD > side. Yes, i did ping Tom to check if this is ok with them. > > > arch/x86/kernel/cpu/microcode/intel.c | 1 + > > 2 files changed, 98 insertions(+), 16 deletions(-) > > > > diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c > > index aa1b9a4..af0aeb2 100644 > > --- a/arch/x86/kernel/cpu/microcode/core.c > > +++ b/arch/x86/kernel/cpu/microcode/core.c > > @@ -31,6 +31,9 @@ > > #include <linux/cpu.h> > > #include <linux/fs.h> > > #include <linux/mm.h> > > +#include <linux/nmi.h> > > +#include <linux/stop_machine.h> > > +#include <linux/delay.h> > > > > #include <asm/microcode_intel.h> > > #include <asm/cpu_device_id.h> > > @@ -489,19 +492,82 @@ static void __exit microcode_dev_exit(void) > > /* fake device for request_firmware */ > > static struct platform_device *microcode_pdev; > > > > -static enum ucode_state reload_for_cpu(int cpu) > > +static struct ucode_update_param { > > + spinlock_t ucode_lock; > > + atomic_t count; > > + atomic_t errors; > > + atomic_t enter; > > + int timeout; > > +} uc_data; > > + > > +static void do_ucode_update(int cpu, struct ucode_update_param *ucd) > > { > > - struct ucode_cpu_info *uci = ucode_cpu_info + cpu; > > - enum ucode_state ustate; > > + enum ucode_state retval = 0; > > > > - if (!uci->valid) > > - return UCODE_OK; > > + spin_lock(&ucd->ucode_lock); > > + retval = microcode_ops->apply_microcode(cpu); > > + spin_unlock(&ucd->ucode_lock); > > What's the spinlock protecting against? This is ensuring no 2 cpus do ucode update at the same time. Since all cpus wait for all the online cpus to arrive in stop_machine handler. Once we let go, every cpu tries to update. This just serializes against that. > > We hold the hotplug lock and the microcode mutex. And yet interrupts are > still enabled. So what's up? hotplug lock/microcode mutex are at global level, these are protecting individual cpus in stop machine trying to update microcode. these are called while in stop_machine() so i think interrupts are disabled IRC. > > > > + if (retval > UCODE_NFOUND) { > > + atomic_inc(&ucd->errors); > > You don't need ->errors. Simply propagate retval from do_ucode_update(). > Or compare ucd->count to the number of CPUs. Or something like that. That's what we are doing here, but simply returning number of cpus that encountered failure instead of a per-cpu retval like before. I use ucd->count to use as an exit rendezvous.. to make sure we leave only after all cpus have done updating ucode. > > + pr_warn("microcode update to cpu %d failed\n", cpu); > > + } > > + atomic_inc(&ucd->count); > > +} > > + > > +/* > > + * Wait for upto 1sec for all cpus > > + * to show up in the rendezvous function > > + */ > > +#define MAX_UCODE_RENDEZVOUS 1000000000 /* nanosec */ > > 1 * NSEC_PER_SEC > > > +#define SPINUNIT 100 /* 100ns */ > > + > > +/* > > + * Each cpu waits for 1sec max. > > + */ > > +static int ucode_wait_timedout(int *time_out, void *data) > > +{ > > + struct ucode_update_param *ucd = data; > > + if (*time_out < SPINUNIT) { > > + pr_err("Not all cpus entered ucode update handler %d cpus missing\n", > > + (num_online_cpus() - atomic_read(&ucd->enter))); > > + return 1; > > + } > > + *time_out -= SPINUNIT; > > + touch_nmi_watchdog(); > > + return 0; > > +} > > + > > +/* > > + * All cpus enter here before a ucode load upto 1 sec. > > + * If not all cpus showed up, we abort the ucode update > > + * and return. ucode update is serialized with the spinlock > > ... and yet you don't check stop_machine()'s retval and issue an error > message that it failed. > Will add that > > + */ > > +static int ucode_load_rendezvous(void *data) > > The correct prefix is "microcode_" > > > +{ > > + int cpu = smp_processor_id(); > > + struct ucode_update_param *ucd = data; > > + int timeout = MAX_UCODE_RENDEZVOUS; > > + int total_cpus = num_online_cpus(); > > > > - ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev, true); > > - if (ustate != UCODE_OK) > > - return ustate; > > + /* > > + * Wait for all cpu's to arrive > > + */ > > + atomic_dec(&ucd->enter); > > + while(atomic_read(&ucd->enter)) { > > + if (ucode_wait_timedout(&timeout, ucd)) > > + return 1; > > + ndelay(SPINUNIT); > > + } > > + > > + do_ucode_update(cpu, ucd); > > > > - return apply_microcode_on_target(cpu); > > + /* > > + * Wait for all cpu's to complete > > + * ucode update > > + */ > > + while (atomic_read(&ucd->count) != total_cpus) > > + cpu_relax(); > > + return 0; > > } > > > > static ssize_t reload_store(struct device *dev, > > @@ -509,7 +575,6 @@ static ssize_t reload_store(struct device *dev, > > const char *buf, size_t size) > > { > > enum ucode_state tmp_ret = UCODE_OK; > > - bool do_callback = false; > > unsigned long val; > > ssize_t ret = 0; > > int cpu; > > @@ -523,21 +588,37 @@ static ssize_t reload_store(struct device *dev, > > > > get_online_cpus(); > > mutex_lock(µcode_mutex); > > + /* > > + * First load the microcode file for all cpu's > > + */ > > It is always "CPU" or "CPUs". Fix all misspelled places. > > > for_each_online_cpu(cpu) { > > You need to fail loading and not even try when not all cores are online. > And issue an error message. > When we online any of the offline cpu's we do a microcode load again right? I did check with offlining 2 threads of the same core offline, then reload with a new version of microcode. Online the new cpus i did find the microcode was updated during online process. Since offline ones don't participate in any OS activity thought its ok to update everything that is available and visitible to the OS. If BIOS has turned off cores due to some failures and didn't expose in MADT during boot, we will never get a chance to update online. > > - tmp_ret = reload_for_cpu(cpu); > > + tmp_ret = microcode_ops->request_microcode_fw(cpu, > > + µcode_pdev->dev, true); > > This needs to happen only once - not per CPU. Let's simply forget > heterogeneous microcode revisions. Sounds good.. let me take a look at this. > > > if (tmp_ret > UCODE_NFOUND) { > > - pr_warn("Error reloading microcode on CPU %d\n", cpu); > > + pr_warn("Error reloading microcode file for CPU %d\n", cpu); > > > > /* set retval for the first encountered reload error */ > > if (!ret) > > ret = -EINVAL; > > You can't continue loading here if you've encountered an error. Sounds good. > > > } > > - > > - if (tmp_ret == UCODE_UPDATED) > > - do_callback = true; > > } > > + pr_debug("Done loading microcode file for all cpus\n"); > > > > - if (!ret && do_callback) > > + memset(&uc_data, 0, sizeof(struct ucode_update_param)); > > + spin_lock_init(&uc_data.ucode_lock); > > + atomic_set(&uc_data.enter, num_online_cpus()); > > + /* > > + * Wait for a 1 sec > > + */ > > + uc_data.timeout = USEC_PER_SEC; > > + stop_machine(ucode_load_rendezvous, &uc_data, cpu_online_mask); > > + > > + pr_debug("Total CPUS = %d uperrors = %d\n", > > + atomic_read(&uc_data.count), atomic_read(&uc_data.errors)); > > + > > + if (atomic_read(&uc_data.errors)) > > + pr_warn("Update failed for %d cpus\n", atomic_read(&uc_data.errors)); > > + else > > microcode_check(); > > This whole jumping through hoops needs to be extracted away in a > separate function. Not sure what you mean by jumping through hoops need to be extracted away.. > > Ok, that should be enough review for now. More with v2. > > -- > Regards/Gruss, > Boris. > > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) > -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update. 2018-02-21 20:13 ` Raj, Ashok @ 2018-02-21 21:15 ` Borislav Petkov 2018-02-22 15:36 ` Tom Lendacky 1 sibling, 0 replies; 9+ messages in thread From: Borislav Petkov @ 2018-02-21 21:15 UTC (permalink / raw) To: Raj, Ashok Cc: X86 ML, LKML, Tom Lendacky, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Arjan Van De Ven On Wed, Feb 21, 2018 at 12:13:08PM -0800, Raj, Ashok wrote: > This is ensuring no 2 cpus do ucode update at the same time. And that is a problem? We don't do any of that mutual exclusion for early loading. Why isn't it there a problem? > That's what we are doing here, but simply returning number of cpus > that encountered failure instead of a per-cpu retval > like before. You still don't need ->errors. > When we online any of the offline cpu's we do a microcode load again right? That doesn't make any sense. First you say: "the Intel microcode team asked us to make sure the system is in a quiet state during these updates." When you've updated the microcode on a subset of the cores and then the other cores come up and you do that in the hotplug notifier, the system is far from quiet. On the contrary, it is busy bootstrapping cores. Which makes me wonder if that "quiet" argument even means anything. Because if you wanna do that in the notifiers, you can just as well offline all the cores but the BSP, update the microcode there and then online the rest again ---> no need for that patch at all. > Not sure what you mean by jumping through hoops need to be extracted away.. Take that code: + memset(&uc_data, 0, sizeof(struct ucode_update_param)); + spin_lock_init(&uc_data.ucode_lock); + atomic_set(&uc_data.enter, num_online_cpus()); + /* + * Wait for a 1 sec + */ + uc_data.timeout = USEC_PER_SEC; + stop_machine(ucode_load_rendezvous, &uc_data, cpu_online_mask); + + pr_debug("Total CPUS = %d uperrors = %d\n", + atomic_read(&uc_data.count), atomic_read(&uc_data.errors)); + + if (atomic_read(&uc_data.errors)) and put it in a separate function which you call in reload_store(). -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) -- ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update. 2018-02-21 20:13 ` Raj, Ashok 2018-02-21 21:15 ` Borislav Petkov @ 2018-02-22 15:36 ` Tom Lendacky 1 sibling, 0 replies; 9+ messages in thread From: Tom Lendacky @ 2018-02-22 15:36 UTC (permalink / raw) To: Raj, Ashok, Borislav Petkov Cc: X86 ML, LKML, Thomas Gleixner, Ingo Molnar, Tony Luck, Andi Kleen, Arjan Van De Ven On 2/21/2018 2:13 PM, Raj, Ashok wrote: > On Wed, Feb 21, 2018 at 08:06:11PM +0100, Borislav Petkov wrote: >>> arch/x86/kernel/cpu/microcode/core.c | 113 +++++++++++++++++++++++++++++----- >> >> This is generic so Tom needs to ack whatever we end up doing for the AMD >> side. > > Yes, i did ping Tom to check if this is ok with them. I did some testing with these patches and didn't notice any issues on my EPYC system. At the moment, I currently don't have access to anything older on which to test. But I don't believe there should be any issues with this approach. I'll retest when we get closer to the final version of the patch. Thanks, Tom > >> >>> arch/x86/kernel/cpu/microcode/intel.c | 1 + >>> 2 files changed, 98 insertions(+), 16 deletions(-) >>> >>> diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c >>> index aa1b9a4..af0aeb2 100644 >>> --- a/arch/x86/kernel/cpu/microcode/core.c >>> +++ b/arch/x86/kernel/cpu/microcode/core.c >>> @@ -31,6 +31,9 @@ >>> #include <linux/cpu.h> >>> #include <linux/fs.h> >>> #include <linux/mm.h> >>> +#include <linux/nmi.h> >>> +#include <linux/stop_machine.h> >>> +#include <linux/delay.h> >>> >>> #include <asm/microcode_intel.h> >>> #include <asm/cpu_device_id.h> >>> @@ -489,19 +492,82 @@ static void __exit microcode_dev_exit(void) >>> /* fake device for request_firmware */ >>> static struct platform_device *microcode_pdev; >>> >>> -static enum ucode_state reload_for_cpu(int cpu) >>> +static struct ucode_update_param { >>> + spinlock_t ucode_lock; >>> + atomic_t count; >>> + atomic_t errors; >>> + atomic_t enter; >>> + int timeout; >>> +} uc_data; >>> + >>> +static void do_ucode_update(int cpu, struct ucode_update_param *ucd) >>> { >>> - struct ucode_cpu_info *uci = ucode_cpu_info + cpu; >>> - enum ucode_state ustate; >>> + enum ucode_state retval = 0; >>> >>> - if (!uci->valid) >>> - return UCODE_OK; >>> + spin_lock(&ucd->ucode_lock); >>> + retval = microcode_ops->apply_microcode(cpu); >>> + spin_unlock(&ucd->ucode_lock); >> >> What's the spinlock protecting against? > > This is ensuring no 2 cpus do ucode update at the same time. > > Since all cpus wait for all the online cpus to arrive in stop_machine handler. > Once we let go, every cpu tries to update. This just serializes against that. > >> >> We hold the hotplug lock and the microcode mutex. And yet interrupts are >> still enabled. So what's up? > > hotplug lock/microcode mutex are at global level, these are > protecting individual cpus in stop machine trying to update microcode. > > these are called while in stop_machine() so i think interrupts are disabled IRC. > >> >> >>> + if (retval > UCODE_NFOUND) { >>> + atomic_inc(&ucd->errors); >> >> You don't need ->errors. Simply propagate retval from do_ucode_update(). >> Or compare ucd->count to the number of CPUs. Or something like that. > > That's what we are doing here, but simply returning number of cpus > that encountered failure instead of a per-cpu retval > like before. > > I use ucd->count to use as an exit rendezvous.. to make sure we leave only > after all cpus have done updating ucode. > >>> + pr_warn("microcode update to cpu %d failed\n", cpu); >>> + } >>> + atomic_inc(&ucd->count); >>> +} >>> + >>> +/* >>> + * Wait for upto 1sec for all cpus >>> + * to show up in the rendezvous function >>> + */ >>> +#define MAX_UCODE_RENDEZVOUS 1000000000 /* nanosec */ >> >> 1 * NSEC_PER_SEC >> >>> +#define SPINUNIT 100 /* 100ns */ >>> + >>> +/* >>> + * Each cpu waits for 1sec max. >>> + */ >>> +static int ucode_wait_timedout(int *time_out, void *data) >>> +{ >>> + struct ucode_update_param *ucd = data; >>> + if (*time_out < SPINUNIT) { >>> + pr_err("Not all cpus entered ucode update handler %d cpus missing\n", >>> + (num_online_cpus() - atomic_read(&ucd->enter))); >>> + return 1; >>> + } >>> + *time_out -= SPINUNIT; >>> + touch_nmi_watchdog(); >>> + return 0; >>> +} >>> + >>> +/* >>> + * All cpus enter here before a ucode load upto 1 sec. >>> + * If not all cpus showed up, we abort the ucode update >>> + * and return. ucode update is serialized with the spinlock >> >> ... and yet you don't check stop_machine()'s retval and issue an error >> message that it failed. >> > > Will add that > >>> + */ >>> +static int ucode_load_rendezvous(void *data) >> >> The correct prefix is "microcode_" >> >>> +{ >>> + int cpu = smp_processor_id(); >>> + struct ucode_update_param *ucd = data; >>> + int timeout = MAX_UCODE_RENDEZVOUS; >>> + int total_cpus = num_online_cpus(); >>> >>> - ustate = microcode_ops->request_microcode_fw(cpu, µcode_pdev->dev, true); >>> - if (ustate != UCODE_OK) >>> - return ustate; >>> + /* >>> + * Wait for all cpu's to arrive >>> + */ >>> + atomic_dec(&ucd->enter); >>> + while(atomic_read(&ucd->enter)) { >>> + if (ucode_wait_timedout(&timeout, ucd)) >>> + return 1; >>> + ndelay(SPINUNIT); >>> + } >>> + >>> + do_ucode_update(cpu, ucd); >>> >>> - return apply_microcode_on_target(cpu); >>> + /* >>> + * Wait for all cpu's to complete >>> + * ucode update >>> + */ >>> + while (atomic_read(&ucd->count) != total_cpus) >>> + cpu_relax(); >>> + return 0; >>> } >>> >>> static ssize_t reload_store(struct device *dev, >>> @@ -509,7 +575,6 @@ static ssize_t reload_store(struct device *dev, >>> const char *buf, size_t size) >>> { >>> enum ucode_state tmp_ret = UCODE_OK; >>> - bool do_callback = false; >>> unsigned long val; >>> ssize_t ret = 0; >>> int cpu; >>> @@ -523,21 +588,37 @@ static ssize_t reload_store(struct device *dev, >>> >>> get_online_cpus(); >>> mutex_lock(µcode_mutex); >>> + /* >>> + * First load the microcode file for all cpu's >>> + */ >> >> It is always "CPU" or "CPUs". Fix all misspelled places. >> >>> for_each_online_cpu(cpu) { >> >> You need to fail loading and not even try when not all cores are online. >> And issue an error message. >> > > When we online any of the offline cpu's we do a microcode load again right? > > I did check with offlining 2 threads of the same core offline, then reload with a > new version of microcode. Online the new cpus i did find the microcode was updated > during online process. > > Since offline ones don't participate in any OS activity thought its ok to > update everything that is available and visitible to the OS. > > If BIOS has turned off cores due to some failures and didn't expose > in MADT during boot, we will never get a chance to update online. > >>> - tmp_ret = reload_for_cpu(cpu); >>> + tmp_ret = microcode_ops->request_microcode_fw(cpu, >>> + µcode_pdev->dev, true); >> >> This needs to happen only once - not per CPU. Let's simply forget >> heterogeneous microcode revisions. > > Sounds good.. let me take a look at this. > >> >>> if (tmp_ret > UCODE_NFOUND) { >>> - pr_warn("Error reloading microcode on CPU %d\n", cpu); >>> + pr_warn("Error reloading microcode file for CPU %d\n", cpu); >>> >>> /* set retval for the first encountered reload error */ >>> if (!ret) >>> ret = -EINVAL; >> >> You can't continue loading here if you've encountered an error. > > Sounds good. >> >>> } >>> - >>> - if (tmp_ret == UCODE_UPDATED) >>> - do_callback = true; >>> } >>> + pr_debug("Done loading microcode file for all cpus\n"); >>> >>> - if (!ret && do_callback) >>> + memset(&uc_data, 0, sizeof(struct ucode_update_param)); >>> + spin_lock_init(&uc_data.ucode_lock); >>> + atomic_set(&uc_data.enter, num_online_cpus()); >>> + /* >>> + * Wait for a 1 sec >>> + */ >>> + uc_data.timeout = USEC_PER_SEC; >>> + stop_machine(ucode_load_rendezvous, &uc_data, cpu_online_mask); >>> + >>> + pr_debug("Total CPUS = %d uperrors = %d\n", >>> + atomic_read(&uc_data.count), atomic_read(&uc_data.errors)); >>> + >>> + if (atomic_read(&uc_data.errors)) >>> + pr_warn("Update failed for %d cpus\n", atomic_read(&uc_data.errors)); >>> + else >>> microcode_check(); >> >> This whole jumping through hoops needs to be extracted away in a >> separate function. > > Not sure what you mean by jumping through hoops need to be extracted away.. > >> >> Ok, that should be enough review for now. More with v2. >> >> -- >> Regards/Gruss, >> Boris. >> >> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) >> -- ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-02-22 15:36 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-21 16:49 [PATCH 0/3] Patch series to address some limitations in OS microcode loading Ashok Raj 2018-02-21 16:49 ` [PATCH 1/3] x86/microcode/intel: Check microcode revision before updating sibling threads Ashok Raj 2018-02-21 16:49 ` [PATCH 2/3] x86/microcode/intel: Perform a cache flush before ucode update Ashok Raj 2018-02-21 18:25 ` Borislav Petkov 2018-02-21 16:49 ` [PATCH 3/3] x86/microcode: Quiesce all threads before a microcode update Ashok Raj 2018-02-21 19:06 ` Borislav Petkov 2018-02-21 20:13 ` Raj, Ashok 2018-02-21 21:15 ` Borislav Petkov 2018-02-22 15:36 ` Tom Lendacky
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).