* [Patch v3 1/2] x86/smp: count the number of online physical processor in the system
@ 2018-05-08 22:01 Chao Gao
  2018-05-08 22:01 ` [Patch v3 2/2] x86/microcode: Synchronize late microcode loading Chao Gao
  2018-05-16 12:54 ` [Patch v3 1/2] x86/smp: count the number of online physical processor in the system Jan Beulich
  0 siblings, 2 replies; 12+ messages in thread
From: Chao Gao @ 2018-05-08 22:01 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper, Jan Beulich, Chao Gao

This is mainly for the following patch, which relies on 'nr_phys_cpus' to
estimate the worst-case time needed for the microcode update.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
v3:
 - new

---
 xen/arch/x86/smpboot.c    | 13 +++++++++++++
 xen/include/asm-x86/smp.h |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 86fa410..c3c3558 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -67,6 +67,8 @@ unsigned int __read_mostly nr_sockets;
 cpumask_t **__read_mostly socket_cpumask;
 static cpumask_t *secondary_socket_cpumask;
 
+unsigned int __read_mostly nr_phys_cpus;
+
 struct cpuinfo_x86 cpu_data[NR_CPUS];
 
 u32 x86_cpu_to_apicid[NR_CPUS] __read_mostly =
@@ -262,6 +264,10 @@ static void set_cpu_sibling_map(int cpu)
         cpumask_set_cpu(cpu, per_cpu(cpu_sibling_mask, cpu));
     }
 
+    /* Increase physical processor count when a new cpu comes up */
+    if ( cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) == 1 )
+        nr_phys_cpus++;
+
     if ( c[cpu].x86_max_cores == 1 )
     {
         cpumask_copy(per_cpu(cpu_core_mask, cpu),
@@ -1156,6 +1162,13 @@ remove_siblinginfo(int cpu)
             cpu_data[sibling].booted_cores--;
     }
 
+    /*
+     * Decrease physical processor count when all threads of a physical
+     * processor go down
+     */
+    if ( cpumask_weight(per_cpu(cpu_sibling_mask, cpu)) == 1 )
+        nr_phys_cpus--;
+
     for_each_cpu(sibling, per_cpu(cpu_sibling_mask, cpu))
         cpumask_clear_cpu(cpu, per_cpu(cpu_sibling_mask, sibling));
     cpumask_clear(per_cpu(cpu_sibling_mask, cpu));
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 4e5f673..910888a 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -65,6 +65,9 @@ uint32_t get_cur_idle_nums(void);
  */
 extern unsigned int nr_sockets;
 
+/* The number of online physical CPUs in this system */
+extern unsigned int nr_phys_cpus;
+
 void set_nr_sockets(void);
 
 /* Representing HT and core siblings in each socket. */
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-08 22:01 [Patch v3 1/2] x86/smp: count the number of online physical processor in the system Chao Gao
@ 2018-05-08 22:01 ` Chao Gao
  2018-05-16 13:10   ` Jan Beulich
  2018-11-13  9:08   ` Chao Gao
  2018-05-16 12:54 ` [Patch v3 1/2] x86/smp: count the number of online physical processor in the system Jan Beulich
  1 sibling, 2 replies; 12+ messages in thread
From: Chao Gao @ 2018-05-08 22:01 UTC (permalink / raw)
  To: xen-devel
  Cc: Kevin Tian, Ashok Raj, Andrew Cooper, Jan Beulich, Jun Nakajima,
	Thomas Gleixner, Borislav Petkov, Chao Gao

This patch ports microcode improvement patches from linux kernel.

Before you read any further: the early loading method is still the
preferred one and you should always do that. The following patch is
improving the late loading mechanism for long running jobs and cloud use
cases.

Gather all cores and serialize the microcode update on them by doing it
one-by-one to make the late update process as reliable as possible and
avoid potential issues caused by the microcode update.

This patch is also in accord with Andrew's suggestion,
"Rendezvous all online cpus in an IPI to apply the patch, and keep the
processors in until all have completed the patch.", in [1].

[1]:https://wiki.xenproject.org/wiki/XenParavirtOps/microcode_update#Run_time_microcode_updates

Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Chao Gao <chao.gao@intel.com>
[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
v3:
 - make the 2nd parameter of wait_for_cpu() unsigned
 - correct the inverted condition in the 2nd if() in do_microcode_update().
 - when waiting for the finish of microcode update, scale the timeout by
 physical processor count other than logical thread count
 - pass the return value of stop_machine_run() to the caller.

v2:
 - Reduce the timeout from 1s to 30ms
 - update microcode with better parallelism; only one logical thread of each
 core tries to take lock and do the update.
 - remove 'error' field from struct microcode_info
 - correct coding style issues
---
 xen/arch/x86/microcode.c | 117 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 91 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
index 4163f50..355bc6d 100644
--- a/xen/arch/x86/microcode.c
+++ b/xen/arch/x86/microcode.c
@@ -22,6 +22,7 @@
  */
 
 #include <xen/cpu.h>
+#include <xen/cpumask.h>
 #include <xen/lib.h>
 #include <xen/kernel.h>
 #include <xen/init.h>
@@ -30,15 +31,21 @@
 #include <xen/smp.h>
 #include <xen/softirq.h>
 #include <xen/spinlock.h>
+#include <xen/stop_machine.h>
 #include <xen/tasklet.h>
 #include <xen/guest_access.h>
 #include <xen/earlycpio.h>
+#include <xen/watchdog.h>
 
+#include <asm/delay.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
 #include <asm/setup.h>
 #include <asm/microcode.h>
 
+/* By default, wait for 30000us */
+#define MICROCODE_DEFAULT_TIMEOUT 30000
+
 static module_t __initdata ucode_mod;
 static signed int __initdata ucode_mod_idx;
 static bool_t __initdata ucode_mod_forced;
@@ -187,9 +194,9 @@ static DEFINE_SPINLOCK(microcode_mutex);
 DEFINE_PER_CPU(struct ucode_cpu_info, ucode_cpu_info);
 
 struct microcode_info {
-    unsigned int cpu;
+    cpumask_t cpus;
+    atomic_t cpu_in, cpu_out;
     uint32_t buffer_size;
-    int error;
     char buffer[1];
 };
 
@@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf, size_t size)
     return err;
 }
 
-static long do_microcode_update(void *_info)
+/* Wait for all CPUs to rendezvous with a timeout (us) */
+static int wait_for_cpus(atomic_t *cnt, unsigned int timeout)
 {
-    struct microcode_info *info = _info;
-    int error;
+    unsigned int cpus = num_online_cpus();
 
-    BUG_ON(info->cpu != smp_processor_id());
+    atomic_inc(cnt);
 
-    error = microcode_update_cpu(info->buffer, info->buffer_size);
-    if ( error )
-        info->error = error;
+    while ( atomic_read(cnt) != cpus )
+    {
+        if ( timeout <= 0 )
+        {
+            printk("Timeout when waiting for CPUs calling in\n");
+            return -EBUSY;
+        }
+        udelay(1);
+        timeout--;
+    }
 
-    info->cpu = cpumask_next(info->cpu, &cpu_online_map);
-    if ( info->cpu < nr_cpu_ids )
-        return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
+    return 0;
+}
 
-    error = info->error;
-    xfree(info);
-    return error;
+static int do_microcode_update(void *_info)
+{
+    struct microcode_info *info = _info;
+    unsigned int cpu = smp_processor_id();
+    int ret;
+
+    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
+    if ( ret )
+        return ret;
+
+    /*
+     * Logical threads which set the first bit in cpu_sibling_mask can do
+     * the update. Other sibling threads just await the completion of
+     * microcode update.
+     */
+    if ( !cpumask_test_and_set_cpu(
+                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
+        ret = microcode_update_cpu(info->buffer, info->buffer_size);
+    /*
+     * Increase the wait timeout to a safe value here since we're serializing
+     * the microcode update and that could take a while on a large number of
+     * CPUs. And that is fine as the *actual* timeout will be determined by
+     * the last CPU finished updating and thus cut short
+     */
+    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
+                                       nr_phys_cpus) )
+        panic("Timeout when finishing updating microcode");
+
+    return ret;
 }
 
 int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
@@ -318,26 +357,52 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
 
     ret = copy_from_guest(info->buffer, buf, len);
     if ( ret != 0 )
-    {
-        xfree(info);
-        return ret;
-    }
+        goto free;
 
     info->buffer_size = len;
-    info->error = 0;
-    info->cpu = cpumask_first(&cpu_online_map);
+
+    /* cpu_online_map must not change during update */
+    if ( !get_cpu_maps() )
+    {
+        ret = -EBUSY;
+        goto free;
+    }
 
     if ( microcode_ops->start_update )
     {
         ret = microcode_ops->start_update();
         if ( ret != 0 )
-        {
-            xfree(info);
-            return ret;
-        }
+            goto put;
     }
 
-    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
+    cpumask_clear(&info->cpus);
+    atomic_set(&info->cpu_in, 0);
+    atomic_set(&info->cpu_out, 0);
+
+    /*
+     * We intend to disable interrupt for long time, which may lead to
+     * watchdog timeout.
+     */
+    watchdog_disable();
+    /*
+     * Late loading dance. Why the heavy-handed stop_machine effort?
+     *
+     * -HT siblings must be idle and not execute other code while the other
+     *  sibling is loading microcode in order to avoid any negative
+     *  interactions cause by the loading.
+     *
+     * -In addition, microcode update on the cores must be serialized until
+     *  this requirement can be relaxed in the feature. Right now, this is
+     *  conservative and good.
+     */
+    ret = stop_machine_run(do_microcode_update, info, NR_CPUS);
+    watchdog_enable();
+
+ put:
+    put_cpu_maps();
+ free:
+    xfree(info);
+    return ret;
 }
 
 static int __init microcode_init(void)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Patch v3 1/2] x86/smp: count the number of online physical processor in the system
  2018-05-08 22:01 [Patch v3 1/2] x86/smp: count the number of online physical processor in the system Chao Gao
  2018-05-08 22:01 ` [Patch v3 2/2] x86/microcode: Synchronize late microcode loading Chao Gao
@ 2018-05-16 12:54 ` Jan Beulich
  1 sibling, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2018-05-16 12:54 UTC (permalink / raw)
  To: Chao Gao; +Cc: Andrew Cooper, xen-devel

>>> On 09.05.18 at 00:01, <chao.gao@intel.com> wrote:
> Mainly for the patch behind which relies on 'nr_phys_cpus' to estimate
> the time needed for microcode update in the worst case.

Leaving aside the aspect of "nr_phys_cpus" not being a suitable name
("nr_online_cores" would come closer imo) I'm not convinced using a
global variable is the way to go here, with just that special purpose
consumer in the next patch. Why can't you work out this count right
there?

Jan




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-08 22:01 ` [Patch v3 2/2] x86/microcode: Synchronize late microcode loading Chao Gao
@ 2018-05-16 13:10   ` Jan Beulich
  2018-05-16 13:25     ` Andrew Cooper
  2018-11-13  9:08   ` Chao Gao
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-05-16 13:10 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Ashok Raj, Andrew Cooper, xen-devel, Jun Nakajima,
	tglx, Borislav Petkov

>>> On 09.05.18 at 00:01, <chao.gao@intel.com> wrote:
> @@ -30,15 +31,21 @@
>  #include <xen/smp.h>
>  #include <xen/softirq.h>
>  #include <xen/spinlock.h>
> +#include <xen/stop_machine.h>
>  #include <xen/tasklet.h>
>  #include <xen/guest_access.h>
>  #include <xen/earlycpio.h>
> +#include <xen/watchdog.h>
>  
> +#include <asm/delay.h>
>  #include <asm/msr.h>
>  #include <asm/processor.h>
>  #include <asm/setup.h>
>  #include <asm/microcode.h>
>  
> +/* By default, wait for 30000us */
> +#define MICROCODE_DEFAULT_TIMEOUT 30000

Please attach the unit (_US) to the name.

> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf, size_t size)
>      return err;
>  }
>  
> -static long do_microcode_update(void *_info)
> +/* Wait for all CPUs to rendezvous with a timeout (us) */
> +static int wait_for_cpus(atomic_t *cnt, unsigned int timeout)
>  {
> -    struct microcode_info *info = _info;
> -    int error;
> +    unsigned int cpus = num_online_cpus();
>  
> -    BUG_ON(info->cpu != smp_processor_id());
> +    atomic_inc(cnt);
>  
> -    error = microcode_update_cpu(info->buffer, info->buffer_size);
> -    if ( error )
> -        info->error = error;
> +    while ( atomic_read(cnt) != cpus )
> +    {
> +        if ( timeout <= 0 )

!timeout (or timeout == 0), now that it's of unsigned type.

> +static int do_microcode_update(void *_info)
> +{
> +    struct microcode_info *info = _info;
> +    unsigned int cpu = smp_processor_id();
> +    int ret;
> +
> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
> +    if ( ret )
> +        return ret;
> +
> +    /*
> +     * Logical threads which set the first bit in cpu_sibling_mask can do
> +     * the update. Other sibling threads just await the completion of
> +     * microcode update.
> +     */
> +    if ( !cpumask_test_and_set_cpu(
> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
> +    /*
> +     * Increase the wait timeout to a safe value here since we're serializing
> +     * the microcode update and that could take a while on a large number of
> +     * CPUs. And that is fine as the *actual* timeout will be determined by
> +     * the last CPU finished updating and thus cut short
> +     */
> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
> +                                       nr_phys_cpus) )

I remain unconvinced that this is a safe thing to do on a huge system with
guests running (even Dom0 alone would seem risky enough). I continue to
hope for comments from others, in particular Andrew, here. At the very
least I think you should taint the hypervisor when making it here.

> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * -HT siblings must be idle and not execute other code while the other
> +     *  sibling is loading microcode in order to avoid any negative
> +     *  interactions cause by the loading.
> +     *
> +     * -In addition, microcode update on the cores must be serialized until
> +     *  this requirement can be relaxed in the feature. Right now, this is

s/feature/future/

Also please add blanks after the hyphens.

Jan




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-16 13:10   ` Jan Beulich
@ 2018-05-16 13:25     ` Andrew Cooper
  2018-05-16 13:46       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Cooper @ 2018-05-16 13:25 UTC (permalink / raw)
  To: Jan Beulich, Chao Gao
  Cc: Kevin Tian, Ashok Raj, xen-devel, Jun Nakajima, tglx, Borislav Petkov

On 16/05/18 14:10, Jan Beulich wrote:
>> +static int do_microcode_update(void *_info)
>> +{
>> +    struct microcode_info *info = _info;
>> +    unsigned int cpu = smp_processor_id();
>> +    int ret;
>> +
>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>> +    if ( ret )
>> +        return ret;
>> +
>> +    /*
>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>> +     * the update. Other sibling threads just await the completion of
>> +     * microcode update.
>> +     */
>> +    if ( !cpumask_test_and_set_cpu(
>> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>> +    /*
>> +     * Increase the wait timeout to a safe value here since we're serializing
>> +     * the microcode update and that could take a while on a large number of
>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>> +     * the last CPU finished updating and thus cut short
>> +     */
>> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
>> +                                       nr_phys_cpus) )
> I remain unconvinced that this is a safe thing to do on a huge system with
> guests running (even Dom0 alone would seem risky enough). I continue to
> hope for comments from others, in particular Andrew, here. At the very
> least I think you should taint the hypervisor when making it here.

I see nothing in this patch which prevents a deadlock against the time
calibration rendezvous.  I think it's fine to pause the time calibration
rendezvous while performing this update.

Also, what is the purpose of serialising the updates while all pcpus are
in rendezvous?  Surely at that point the best option is to initiate an
update on all processors which don't have an online sibling thread with
a lower thread id.

~Andrew


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-16 13:25     ` Andrew Cooper
@ 2018-05-16 13:46       ` Jan Beulich
  2018-05-18  7:21         ` Chao Gao
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-05-16 13:46 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Kevin Tian, Ashok Raj, xen-devel, Jun Nakajima, tglx,
	Borislav Petkov, Chao Gao

>>> On 16.05.18 at 15:25, <andrew.cooper3@citrix.com> wrote:
> On 16/05/18 14:10, Jan Beulich wrote:
>>> +static int do_microcode_update(void *_info)
>>> +{
>>> +    struct microcode_info *info = _info;
>>> +    unsigned int cpu = smp_processor_id();
>>> +    int ret;
>>> +
>>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>>> +    if ( ret )
>>> +        return ret;
>>> +
>>> +    /*
>>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>>> +     * the update. Other sibling threads just await the completion of
>>> +     * microcode update.
>>> +     */
>>> +    if ( !cpumask_test_and_set_cpu(
>>> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>>> +    /*
>>> +     * Increase the wait timeout to a safe value here since we're serializing
>>> +     * the microcode update and that could take a while on a large number of
>>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>>> +     * the last CPU finished updating and thus cut short
>>> +     */
>>> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
>>> +                                       nr_phys_cpus) )
>> I remain unconvinced that this is a safe thing to do on a huge system with
>> guests running (even Dom0 alone would seem risky enough). I continue to
>> hope for comments from others, in particular Andrew, here. At the very
>> least I think you should taint the hypervisor when making it here.
> 
> I see nothing in this patch which prevents a deadlock against the time
> calibration rendezvous.  It think its fine to pause the time calibration
> rendezvous while performing this update.

If there's a problem here, wouldn't that be a general one with
stop_machine()?

> Also, what is the purpose of serialising the updates while all pcpus are
> in rendezvous?  Surely at that point the best option is to initiate an
> update on all processors which don't have an online sibling thread with
> a lower thread id.

I've suggested that before.

Jan




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-16 13:46       ` Jan Beulich
@ 2018-05-18  7:21         ` Chao Gao
  2018-05-22  8:59           ` Chao Gao
  0 siblings, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-05-18  7:21 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Kevin Tian, Ashok Raj, xen-devel, Jun Nakajima, tglx, Borislav Petkov

On Wed, May 16, 2018 at 07:46:48AM -0600, Jan Beulich wrote:
>>>> On 16.05.18 at 15:25, <andrew.cooper3@citrix.com> wrote:
>> On 16/05/18 14:10, Jan Beulich wrote:
>>>> +static int do_microcode_update(void *_info)
>>>> +{
>>>> +    struct microcode_info *info = _info;
>>>> +    unsigned int cpu = smp_processor_id();
>>>> +    int ret;
>>>> +
>>>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>>>> +    if ( ret )
>>>> +        return ret;
>>>> +
>>>> +    /*
>>>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>>>> +     * the update. Other sibling threads just await the completion of
>>>> +     * microcode update.
>>>> +     */
>>>> +    if ( !cpumask_test_and_set_cpu(
>>>> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>>>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>>>> +    /*
>>>> +     * Increase the wait timeout to a safe value here since we're serializing
>>>> +     * the microcode update and that could take a while on a large number of
>>>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>>>> +     * the last CPU finished updating and thus cut short
>>>> +     */
>>>> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
>>>> +                                       nr_phys_cpus) )
>>> I remain unconvinced that this is a safe thing to do on a huge system with
>>> guests running (even Dom0 alone would seem risky enough). I continue to

I think there are other operations that may also endanger the security and
stability of the whole system; we offer them with caveats. The same applies
here: three different methods can be used to update microcode, and the late
update isn't perfect at this moment. At least we provide a more reliable
method to update microcode at runtime on systems without so many cores. And
for a huge system, admins can assess the risk and choose the most suitable
method. They can avoid live updates entirely and mandate a reboot, doing the
update early, since that's the most dependable method.

>>> hope for comments from others, in particular Andrew, here. At the very
>>> least I think you should taint the hypervisor when making it here.
>> 
>> I see nothing in this patch which prevents a deadlock against the time
>> calibration rendezvous.  It think its fine to pause the time calibration
>> rendezvous while performing this update.
>
>If there's a problem here, wouldn't that be a general one with
>stop_machine()?

I agree with Jan. It shouldn't be specific to the stop_machine() here.
Anyhow, I will look into the potential deadlock you mentioned.

>
>> Also, what is the purpose of serialising the updates while all pcpus are
>> in rendezvous?

The microcode_mutex, which prevents doing the updates in parallel, is not
introduced by this patch. At present, we want to keep this patch and the
update process simple. Could we just make it work first and work out some
optimizations later?

>> Surely at that point the best option is to initiate an
>> update on all processors which don't have an online sibling thread with
>> a lower thread id.
>
>I've suggested that before.

I think Andrew's suggestion here is similar to the method this patch is
already using.

Thanks
Chao


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-18  7:21         ` Chao Gao
@ 2018-05-22  8:59           ` Chao Gao
  2018-05-22  9:26             ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-05-22  8:59 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: Kevin Tian, Ashok Raj, xen-devel, Jun Nakajima, tglx, Borislav Petkov

On Fri, May 18, 2018 at 03:21:14PM +0800, Chao Gao wrote:
>On Wed, May 16, 2018 at 07:46:48AM -0600, Jan Beulich wrote:
>>>>> On 16.05.18 at 15:25, <andrew.cooper3@citrix.com> wrote:
>>> On 16/05/18 14:10, Jan Beulich wrote:
>>>>> +static int do_microcode_update(void *_info)
>>>>> +{
>>>>> +    struct microcode_info *info = _info;
>>>>> +    unsigned int cpu = smp_processor_id();
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>>>>> +    if ( ret )
>>>>> +        return ret;
>>>>> +
>>>>> +    /*
>>>>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>>>>> +     * the update. Other sibling threads just await the completion of
>>>>> +     * microcode update.
>>>>> +     */
>>>>> +    if ( !cpumask_test_and_set_cpu(
>>>>> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>>>>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
>>>>> +    /*
>>>>> +     * Increase the wait timeout to a safe value here since we're serializing
>>>>> +     * the microcode update and that could take a while on a large number of
>>>>> +     * CPUs. And that is fine as the *actual* timeout will be determined by
>>>>> +     * the last CPU finished updating and thus cut short
>>>>> +     */
>>>>> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
>>>>> +                                       nr_phys_cpus) )
>>>> I remain unconvinced that this is a safe thing to do on a huge system with
>>>> guests running (even Dom0 alone would seem risky enough). I continue to
>
>I think there are other operations may also endanger the security, stability
>of the whole system. We offer them with caveats. Same here, three
>different methods can be used to update microcode; the late update isn't
>perfect at this moment. At least, we provide a more reliable method to update
>microcode at runtime on systems with no so many cores. And for a huge
>system, admins can assess the risk and choose the most suitable method.
>They can completely avoid doing live updates and mandate a reboot and do
>it early since that's the most dependable method.
>
>>>> hope for comments from others, in particular Andrew, here. At the very
>>>> least I think you should taint the hypervisor when making it here.
>>> 
>>> I see nothing in this patch which prevents a deadlock against the time
>>> calibration rendezvous.  It think its fine to pause the time calibration
>>> rendezvous while performing this update.
>>
>>If there's a problem here, wouldn't that be a general one with
>>stop_machine()?
>
>I agree with Jan. It shouldn't be specific to the stop_machine() here.
>Anyhow, I will look into the potential deadlock you mentioned.
>
>>
>>> Also, what is the purpose of serialising the updates while all pcpus are
>>> in rendezvous?
>
>microcode_mutex which prevents doing the updates in parallel is not
>introduced by this patch. At present, We want to keep this patch and the
>update process simple. Could we just make it work first and try to work
>out some optimizations later?


Hi Jan & Andrew,

Do you think it is acceptable that we just follow the Linux kernel at present
and work out optimizations later?

Thanks
Chao


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-22  8:59           ` Chao Gao
@ 2018-05-22  9:26             ` Jan Beulich
  2018-05-22 20:14               ` Raj, Ashok
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2018-05-22  9:26 UTC (permalink / raw)
  To: Chao Gao
  Cc: Kevin Tian, Ashok Raj, Andrew Cooper, xen-devel, Jun Nakajima,
	tglx, Borislav Petkov

>>> On 22.05.18 at 10:59, <chao.gao@intel.com> wrote:
> Do you think it is acceptable that we just follow linux kernel at present
> and work out optimizations later?

In the worst case I could live with it, but I'd be far from happy if so. Sadly
experience (in general, not with you personally) has told me that if we let
things go in this way right now, there's a high risk that you'd never follow
up with the subsequent optimization of the process.

Furthermore, in Linux, was the decision to go the presented route really
taken with KVM and its possibly active guests in mind? Not to speak of
ordinary applications that may be latency sensitive.

Jan




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-22  9:26             ` Jan Beulich
@ 2018-05-22 20:14               ` Raj, Ashok
  0 siblings, 0 replies; 12+ messages in thread
From: Raj, Ashok @ 2018-05-22 20:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, Ashok Raj, Andrew Cooper, xen-devel, Jun Nakajima,
	tglx, Borislav Petkov, Chao Gao

Hi Jan

On Tue, May 22, 2018 at 03:26:06AM -0600, Jan Beulich wrote:
> >>> On 22.05.18 at 10:59, <chao.gao@intel.com> wrote:
> > Do you think it is acceptable that we just follow linux kernel at present
> > and work out optimizations later?
> 
> In the worst case I could live with it, but I'd be far from happy if so. Sadly
> experience (in general, not with you personally) has told me that if we let
> things go in this way right now, there's a high risk that you'd never follow
> up with the subsequent optimization of the process.

You are right.. but to be fair, the original patch we did updated all cores
at the same time (just limiting it so that no more than one thread per core
does the update).

We scaled it back to be conservative based on feedback from our internal teams.
You are preaching to the choir here :-).. but given that there have been so many
problems in this area, we wanted to be extra safe and take baby steps.

So the optimization is still in the plan.

> 
> Furthermore, in Linux, was the decision to go the presented route really
> taken with KVM and its possibly active guests in mind? Not to speak of
> ordinary applications that may be latency sensitive.

Late load is really an option that I think a system/cloud administrator would use
based on need. If they have real latency-sensitive workloads running, they can always
choose when to do the live update. Failing that, they can simply update the initrd
so that, in case a reboot is scheduled, it would catch the update. That is the
second most recommended method after doing it in the BIOS (most preferred); late_load
is the last choice for a ucode update.

We can make the duration small, but a stop_machine() is still going to be noticeable
on certain workloads at any scale.

As someone once told me: if you say "doctor, doctor, it hurts when I do this",

the recommendation is "don't do it". :-)

This is a facility that we will improve, but it is never going to be perfect for all
conditions. It is not recommended for end users or the faint of heart.

Cheers,
Ashok



* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-05-08 22:01 ` [Patch v3 2/2] x86/microcode: Synchronize late microcode loading Chao Gao
  2018-05-16 13:10   ` Jan Beulich
@ 2018-11-13  9:08   ` Chao Gao
  2018-11-13  9:09     ` Andrew Cooper
  1 sibling, 1 reply; 12+ messages in thread
From: Chao Gao @ 2018-11-13  9:08 UTC (permalink / raw)
  To: xen-devel
  Cc: Tian, Kevin, Raj, Ashok, Andrew Cooper, Jan Beulich, Nakajima,
	Jun, Thomas Gleixner, Borislav Petkov

On Wed, May 09, 2018 at 06:01:33AM +0800, Gao, Chao wrote:
>This patch ports microcode improvement patches from linux kernel.
>
>Before you read any further: the early loading method is still the
>preferred one and you should always do that. The following patch is
>improving the late loading mechanism for long running jobs and cloud use
>cases.
>
>Gather all cores and serialize the microcode update on them by doing it
>one-by-one to make the late update process as reliable as possible and
>avoid potential issues caused by the microcode update.
>
>This patch is also in accord with Andrew's suggestion,
>"Rendezvous all online cpus in an IPI to apply the patch, and keep the
>processors in until all have completed the patch.", in [1].
>
>[1]:https://wiki.xenproject.org/wiki/XenParavirtOps/microcode_update#Run_time_microcode_updates
>
>Signed-off-by: Chao Gao <chao.gao@intel.com>
>Tested-by: Chao Gao <chao.gao@intel.com>
>[linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>[linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>Cc: Kevin Tian <kevin.tian@intel.com>
>Cc: Jun Nakajima <jun.nakajima@intel.com>
>Cc: Ashok Raj <ashok.raj@intel.com>
>Cc: Borislav Petkov <bp@suse.de>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>Cc: Jan Beulich <jbeulich@suse.com>
>---
>+static int do_microcode_update(void *_info)
>+{
>+    struct microcode_info *info = _info;
>+    unsigned int cpu = smp_processor_id();
>+    int ret;
>+
>+    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>+    if ( ret )
>+        return ret;
>+
>+    /*
>+     * Logical threads which set the first bit in cpu_sibling_mask can do
>+     * the update. Other sibling threads just await the completion of
>+     * microcode update.
>+     */
>+    if ( !cpumask_test_and_set_cpu(
>+                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>+        ret = microcode_update_cpu(info->buffer, info->buffer_size);

Hi,

A critical issue I realized is that microcode_update_cpu() here does many
things, for instance parsing the microcode binary and loading it. It is a
bad idea to put so much into stop_machine context, especially memory
allocation (I did observe an assertion being triggered sometimes when doing
this). AFAIK, we have two solutions:
1. use a malloc variant which can be used in stop_machine context.
2. like the linux kernel, separate the microcode parsing from the loading,
and in stop_machine context only do the loading. When parsing the microcode
binary, all valid ucodes are put into a list. At the loading stage, going
through the list is definitely safe.

What's your opinion on these two solutions?

Thanks
Chao
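[For reference, the sibling-selection logic in the quoted do_microcode_update()
hunk can be modelled in plain C as follows. This is a simplified, single-threaded
sketch: `first_sibling[]` and `claimed[]` are illustrative stand-ins for
`cpumask_first(per_cpu(cpu_sibling_mask, cpu))` and the test-and-set on
`info->cpus`, which in the real (concurrent) code must be an atomic operation.]

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model: within each core, only the lowest-numbered online thread
 * performs the microcode update; its siblings just wait for completion.
 * Names and topology below are illustrative, not Xen's real interfaces.
 */
#define NR_MODEL_CPUS 8

/* Example topology: 4 cores, 2 threads each; first_sibling[cpu] is the
 * lowest-numbered thread sharing cpu's core (cpumask_first() analogue). */
static const unsigned int first_sibling[NR_MODEL_CPUS] =
    { 0, 0, 2, 2, 4, 4, 6, 6 };

/* One "claimed" bit per CPU, standing in for the info->cpus mask. */
static bool claimed[NR_MODEL_CPUS];

/* Returns true iff @cpu is the thread that must apply the update.
 * In Xen this check-and-mark is cpumask_test_and_set_cpu(), which is
 * atomic; this single-threaded model elides that. */
bool should_do_update(unsigned int cpu, const unsigned int *first)
{
    unsigned int leader = first[cpu];

    if ( claimed[leader] )      /* a sibling already claimed the core */
        return false;
    claimed[leader] = true;
    return true;
}
```

Exactly one thread per core "wins" the test-and-set, so the update is applied
once per physical core while the siblings idle in the rendezvous.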


* Re: [Patch v3 2/2] x86/microcode: Synchronize late microcode loading
  2018-11-13  9:08   ` Chao Gao
@ 2018-11-13  9:09     ` Andrew Cooper
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew Cooper @ 2018-11-13  9:09 UTC (permalink / raw)
  To: Chao Gao, xen-devel
  Cc: Tian, Kevin, Raj, Ashok, Nakajima, Jun, Jan Beulich,
	Thomas Gleixner, Borislav Petkov

On 13/11/2018 09:08, Chao Gao wrote:
> On Wed, May 09, 2018 at 06:01:33AM +0800, Gao, Chao wrote:
>> This patch ports microcode improvement patches from linux kernel.
>>
>> Before you read any further: the early loading method is still the
>> preferred one and you should always do that. The following patch is
>> improving the late loading mechanism for long running jobs and cloud use
>> cases.
>>
>> Gather all cores and serialize the microcode update on them by doing it
>> one-by-one to make the late update process as reliable as possible and
>> avoid potential issues caused by the microcode update.
>>
>> This patch is also in accord with Andrew's suggestion,
>> "Rendezvous all online cpus in an IPI to apply the patch, and keep the
>> processors in until all have completed the patch.", in [1].
>>
>> [1]:https://wiki.xenproject.org/wiki/XenParavirtOps/microcode_update#Run_time_microcode_updates
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Tested-by: Chao Gao <chao.gao@intel.com>
>> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> ---
>> +static int do_microcode_update(void *_info)
>> +{
>> +    struct microcode_info *info = _info;
>> +    unsigned int cpu = smp_processor_id();
>> +    int ret;
>> +
>> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
>> +    if ( ret )
>> +        return ret;
>> +
>> +    /*
>> +     * Logical threads which set the first bit in cpu_sibling_mask can do
>> +     * the update. Other sibling threads just await the completion of
>> +     * microcode update.
>> +     */
>> +    if ( !cpumask_test_and_set_cpu(
>> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
>> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);
> Hi,
>
> A critical issue I realized is that microcode_update_cpu() here does many
> things, for instance parsing the microcode binary and loading it. It is a
> bad idea to put so much into stop_machine context, especially memory
> allocation (I did observe an assertion being triggered sometimes when doing
> this). AFAIK, we have two solutions:
> 1. use a malloc variant which can be used in stop_machine context.
> 2. like the linux kernel, separate the microcode parsing from the loading,
> and in stop_machine context only do the loading. When parsing the microcode
> binary, all valid ucodes are put into a list. At the loading stage, going
> through the list is definitely safe.
>
> What's your opinion on these two solutions?

Definitely option 2.  That work wants to be done once, in the hypercall
handler, before rendezvousing the system.

A useful side effect is that it reduces the quantity of work to do in
the rendezvous.

~Andrew
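
[Option 2 above, splitting the work into a parse phase (hypercall context,
where allocation is safe) and a load phase (stop_machine context, list walk
only), could be sketched roughly as below. All structure and helper names
here are hypothetical stand-ins, not the real patch's interfaces.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed-patch node; real code would also carry the
 * processor signature/flags needed for matching. */
struct ucode_patch {
    struct ucode_patch *next;
    unsigned int rev;          /* revision carried by this blob */
    size_t size;
    unsigned char data[];      /* payload (flexible array member) */
};

static struct ucode_patch *patch_list;

/* Phase 1: hypercall context -- allocation is allowed here.  Validate
 * and copy each blob into the list before rendezvousing the CPUs. */
int parse_blob(const unsigned char *buf, size_t size, unsigned int rev)
{
    struct ucode_patch *p = malloc(sizeof(*p) + size);

    if ( !p )
        return -1;
    p->rev = rev;
    p->size = size;
    memcpy(p->data, buf, size);
    p->next = patch_list;
    patch_list = p;
    return 0;
}

/* Phase 2: stop_machine context -- no allocation, only walk the
 * pre-built list to find a patch newer than the running revision. */
const struct ucode_patch *find_patch(unsigned int cur_rev)
{
    const struct ucode_patch *p;

    for ( p = patch_list; p; p = p->next )
        if ( p->rev > cur_rev )
            return p;
    return NULL;
}
```

Since the list is built and frozen before the rendezvous, the stop_machine
callback touches no allocator state, which also shrinks the time spent with
all CPUs held.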


end of thread, other threads:[~2018-11-13  9:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-08 22:01 [Patch v3 1/2] x86/smp: count the number of online physical processor in the system Chao Gao
2018-05-08 22:01 ` [Patch v3 2/2] x86/microcode: Synchronize late microcode loading Chao Gao
2018-05-16 13:10   ` Jan Beulich
2018-05-16 13:25     ` Andrew Cooper
2018-05-16 13:46       ` Jan Beulich
2018-05-18  7:21         ` Chao Gao
2018-05-22  8:59           ` Chao Gao
2018-05-22  9:26             ` Jan Beulich
2018-05-22 20:14               ` Raj, Ashok
2018-11-13  9:08   ` Chao Gao
2018-11-13  9:09     ` Andrew Cooper
2018-05-16 12:54 ` [Patch v3 1/2] x86/smp: count the number of online physical processor in the system Jan Beulich
