From: Chao Gao <chao.gao@intel.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Sergey Dyasli <sergey.dyasli@citrix.com>,
Kevin Tian <kevin.tian@intel.com>,
Ashok Raj <ashok.raj@intel.com>, Wei Liu <wl@xen.org>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Jun Nakajima <jun.nakajima@intel.com>,
xen-devel <xen-devel@lists.xenproject.org>,
tglx@linutronix.de, Borislav Petkov <bp@suse.de>,
Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH v7 08/10] x86/microcode: Synchronize late microcode loading
Date: Tue, 11 Jun 2019 20:36:17 +0800
Message-ID: <20190611123615.GA22930@gao-cwp>
In-Reply-To: <5CF7CD2702000078002358F4@prv1-mh.provo.novell.com>
On Wed, Jun 05, 2019 at 08:09:43AM -0600, Jan Beulich wrote:
>>>> On 27.05.19 at 10:31, <chao.gao@intel.com> wrote:
>> This patch ports microcode improvement patches from linux kernel.
>>
>> Before you read any further: the early loading method is still the
>> preferred one and you should always do that. The following patch is
>> improving the late loading mechanism for long running jobs and cloud use
>> cases.
>>
>> Gather all cores and serialize the microcode update on them by doing it
>> one-by-one to make the late update process as reliable as possible and
>> avoid potential issues caused by the microcode update.
>>
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Tested-by: Chao Gao <chao.gao@intel.com>
>> [linux commit: a5321aec6412b20b5ad15db2d6b916c05349dbff]
>> [linux commit: bb8c13d61a629276a162c1d2b1a20a815cbcfbb7]
>> Cc: Kevin Tian <kevin.tian@intel.com>
>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>> Cc: Ashok Raj <ashok.raj@intel.com>
>> Cc: Borislav Petkov <bp@suse.de>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> ---
>> Changes in v7:
>> - Check whether 'timeout' is 0 rather than "<=0" since it is unsigned int.
>> - reword the comment above microcode_update_cpu() to clearly state that
>> one thread per core should do the update.
>>
>> Changes in v6:
>> - Use one timeout period for rendezvous stage and another for update stage.
>> - scale time to wait by the number of remaining cpus to respond.
>> It helps to find something wrong earlier and thus we can reboot the
>> system earlier.
>> ---
>> xen/arch/x86/microcode.c | 171 ++++++++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 155 insertions(+), 16 deletions(-)
>>
>> diff --git a/xen/arch/x86/microcode.c b/xen/arch/x86/microcode.c
>> index 23cf550..f4a417e 100644
>> --- a/xen/arch/x86/microcode.c
>> +++ b/xen/arch/x86/microcode.c
>> @@ -22,6 +22,7 @@
>> */
>>
>> #include <xen/cpu.h>
>> +#include <xen/cpumask.h>
>
>It seems vanishingly unlikely that you would need this explicit #include
>here, but it certainly isn't wrong.
>
>> @@ -270,31 +296,90 @@ bool microcode_update_cache(struct microcode_patch *patch)
>> return true;
>> }
>>
>> -static long do_microcode_update(void *patch)
>> +/* Wait for CPUs to rendezvous with a timeout (us) */
>> +static int wait_for_cpus(atomic_t *cnt, unsigned int expect,
>> + unsigned int timeout)
>> {
>> - int error, cpu;
>> -
>> - error = microcode_update_cpu(patch);
>> - if ( error )
>> + while ( atomic_read(cnt) < expect )
>> {
>> - microcode_ops->free_patch(microcode_cache);
>> - return error;
>> + if ( !timeout )
>> + {
>> + printk("CPU%d: Timeout when waiting for CPUs calling in\n",
>> + smp_processor_id());
>> + return -EBUSY;
>> + }
>> + udelay(1);
>> + timeout--;
>> }
>
>There's no comment here and nothing in the description: I don't
>recall clarification as to whether RDTSC is fine to be issued by a
>thread when ucode is being updated by another thread on the
>same core.
Yes. I think it is fine.
Ashok, could you share your opinion on this question?
>
>> +static int do_microcode_update(void *patch)
>> +{
>> + unsigned int cpu = smp_processor_id();
>> + unsigned int cpu_nr = num_online_cpus();
>> + unsigned int finished;
>> + int ret;
>> + static bool error;
>>
>> - microcode_update_cache(patch);
>> + atomic_inc(&cpu_in);
>> + ret = wait_for_cpus(&cpu_in, cpu_nr, MICROCODE_CALLIN_TIMEOUT_US);
>> + if ( ret )
>> + return ret;
>>
>> - return error;
>> + ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>> + /*
>> + * Load microcode update on only one logical processor per core.
>> + * Here, among logical processors of a core, the one with the
>> + * lowest thread id is chosen to perform the loading.
>> + */
>> + if ( !ret && (cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu))) )
>
>At the very least it's not obvious whether this hyper-threading-centric
>view ("logical processor") also applies to AMD's compute unit model
>(which reuses cpu_sibling_mask). It does, as the respective MSRs are
>per-compute-unit rather than per-core, but I'd appreciate if the
>wording could be adjusted to explicitly name both cases (multiple
>threads per core and multiple cores per CU).
OK. Will do
>
>> + {
>> + ret = microcode_ops->apply_microcode(patch);
>> + if ( !ret )
>> + atomic_inc(&cpu_updated);
>> + }
>> + /*
>> + * Increase the wait timeout to a safe value here since we're serializing
>
>I'm struggling with the "increase": I don't see anything being increased
>here. You simply use a larger timeout than above.
>
>> + * the microcode update and that could take a while on a large number of
>> + * CPUs. And that is fine as the *actual* timeout will be determined by
>> + * the last CPU finished updating and thus cut short
>> + */
>> + atomic_inc(&cpu_out);
>> + finished = atomic_read(&cpu_out);
>> + while ( !error && finished != cpu_nr )
>> + {
>> + /*
>> + * During each timeout interval, at least a CPU is expected to
>> + * finish its update. Otherwise, something goes wrong.
>> + */
>> + if ( wait_for_cpus(&cpu_out, finished + 1,
>> + MICROCODE_UPDATE_TIMEOUT_US) && !error )
>> + {
>> + error = true;
>> + panic("Timeout when finishing updating microcode (finished %d/%d)",
>> + finished, cpu_nr);
>
>Why the setting of "error" when you panic anyway?
>
>And please use format specifiers matching the types of the
>further arguments (i.e. twice %u here, but please check other
>code as well).
>
>Furthermore (and I'm sure I've given this comment before) if
>you really hit the limit, how many panic() invocations are there
>going to be? You run this function on all CPUs after all.
"error" is there to avoid calling panic() on multiple CPUs simultaneously.
Roger is right: atomic primitives should be used here.
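For illustration only (this is not code from the patch): a minimal sketch of a "report once" guard built on an atomic test-and-set, so that exactly one CPU wins the right to call panic(). It uses C11 <stdatomic.h> as a stand-in for Xen's own atomic primitives; the names timeout_reported and claim_timeout_report() are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical replacement for the patch's plain "static bool error". */
static atomic_flag timeout_reported = ATOMIC_FLAG_INIT;

/*
 * Returns true for exactly one caller, no matter how many CPUs race here.
 * atomic_flag_test_and_set() returns the previous value of the flag, so
 * only the first caller observes "false" and is allowed to report.
 */
static bool claim_timeout_report(void)
{
    return !atomic_flag_test_and_set(&timeout_reported);
}
```

A CPU that hits the timeout would then call panic() only if claim_timeout_report() returns true; every other CPU simply leaves the wait loop.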
>
>On the whole, taking a 256-thread system as example, you
>allow the whole process to take over 4 min without calling
>panic().
>Leaving aside guests, I don't think Xen itself would
>survive this in all cases. We've found the need to process
>softirqs with far smaller delays, in particular from key handlers
>producing lots of output. At the very least there should be a
>bold warning logged if the system had been in stop-machine
>state for, say, longer than 100ms (value subject to discussion).
>
In theory, if you mean 256 cores, yes. Do you think a configurable,
run-time changeable upper bound for the whole process would address your
concern? The default for this upper bound could be set to a large
value (for example, 1s * the number of online cores), and the admin could
adjust/lower it according to how the update is performed (serially or
in parallel) and other requirements. Once the upper bound is
reached, we would call panic().
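To make the proposal concrete, here is a rough sketch under stated assumptions, not an implementation: every wait is charged against one overall budget instead of using a fresh per-step timeout. The helpers default_budget_us() and wait_within_budget() are hypothetical names, C11 atomics stand in for Xen's atomic_t, and the real delay (udelay() in Xen) is only indicated by a comment so the sketch stays self-contained.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical default: 1s of budget per online core, in microseconds. */
static uint64_t default_budget_us(unsigned int nr_cores)
{
    return (uint64_t)nr_cores * 1000000u;
}

/*
 * Poll until *cnt reaches 'expect', charging every polling step against
 * the caller's remaining budget. Returns false once the budget cannot
 * cover another step; the caller would then be expected to panic().
 */
static bool wait_within_budget(atomic_uint *cnt, unsigned int expect,
                               uint64_t *remaining_us, uint64_t step_us)
{
    while ( atomic_load(cnt) < expect )
    {
        if ( *remaining_us < step_us )
            return false;
        /* A real implementation would delay for step_us here. */
        *remaining_us -= step_us;
    }

    return true;
}
```

Because the same remaining_us value would be passed to both the rendezvous wait and the update wait, lowering the budget bounds the whole stop-machine period rather than each stage separately.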
>> + }
>> +
>> + finished = atomic_read(&cpu_out);
>> + }
>> +
>> + /*
>> + * Refresh CPU signature (revision) on threads which didn't call
>> + * apply_microcode().
>> + */
>> + if ( cpu != cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>> + ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>
>Another option would be for the CPU doing the update to simply
>propagate the new value to all its siblings' cpu_sig values.
Will do.
>
>> @@ -337,12 +429,59 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>> if ( patch )
>> microcode_ops->free_patch(patch);
>> ret = -EINVAL;
>> - goto free;
>> + goto put;
>> }
>>
>> - ret = continue_hypercall_on_cpu(cpumask_first(&cpu_online_map),
>> - do_microcode_update, patch);
>> + atomic_set(&cpu_in, 0);
>> + atomic_set(&cpu_out, 0);
>> + atomic_set(&cpu_updated, 0);
>> +
>> + /* Calculate the number of online CPU core */
>> + nr_cores = 0;
>> + for_each_online_cpu(cpu)
>> + if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
>> + nr_cores++;
>> +
>> + printk(XENLOG_INFO "%d cores are to update their microcode\n", nr_cores);
>> +
>> + /*
>> + * We intend to disable interrupt for long time, which may lead to
>> + * watchdog timeout.
>> + */
>> + watchdog_disable();
>> + /*
>> + * Late loading dance. Why the heavy-handed stop_machine effort?
>> + *
>> + * - HT siblings must be idle and not execute other code while the other
>> + * sibling is loading microcode in order to avoid any negative
>> + * interactions cause by the loading.
>> + *
>> + * - In addition, microcode update on the cores must be serialized until
>> + * this requirement can be relaxed in the future. Right now, this is
>> + * conservative and good.
>> + */
>> + ret = stop_machine_run(do_microcode_update, patch, NR_CPUS);
>> + watchdog_enable();
>> +
>> + if ( atomic_read(&cpu_updated) == nr_cores )
>> + {
>> + spin_lock(&microcode_mutex);
>> + microcode_update_cache(patch);
>> + spin_unlock(&microcode_mutex);
>> + }
>> + else if ( atomic_read(&cpu_updated) == 0 )
>> + microcode_ops->free_patch(patch);
>> + else
>> + {
>> + printk("Updating microcode succeeded on part of CPUs and failed on\n"
>> + "others due to an unknown reason. A system with different\n"
>> + "microcode revisions is considered unstable. Please reboot and\n"
>> + "do not load the microcode that triggers this warning\n");
>> + microcode_ops->free_patch(patch);
>> + }
>
>As said on an earlier patch, I think the cache can be updated if at
>least one CPU loaded the blob successfully. Additionally I'd like to
>ask that you log the number of successfully updated cores. And
>finally perhaps "differing" instead of "different" and omit "due to
>an unknown reason"?
Will do.
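As an illustration of the requested behaviour (a sketch with hypothetical names, not the code that will go into the next version): the cache is kept whenever at least one core loaded the blob, and the warning reports how many cores were updated. snprintf() stands in for printk() here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

enum ucode_outcome { UCODE_NONE, UCODE_PARTIAL, UCODE_ALL };

/* Classify the result from the number of cores that loaded the patch. */
static enum ucode_outcome classify_update(unsigned int updated,
                                          unsigned int nr_cores)
{
    if ( updated == 0 )
        return UCODE_NONE;

    return updated == nr_cores ? UCODE_ALL : UCODE_PARTIAL;
}

/* The cache is worth updating if at least one core accepted the blob. */
static bool should_cache_patch(unsigned int updated)
{
    return updated > 0;
}

/* Format the warning for the partial case, including the core counts. */
static int format_update_warning(char *buf, size_t len,
                                 unsigned int updated, unsigned int nr_cores)
{
    return snprintf(buf, len,
                    "Microcode updated on %u of %u cores. A system with "
                    "differing microcode revisions is considered unstable.\n",
                    updated, nr_cores);
}
```

Note the %u specifiers, matching the unsigned int arguments as requested for the panic() message as well.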
Thanks
Chao