xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: "Chao Gao" <chao.gao@intel.com>
Cc: Sergey Dyasli <sergey.dyasli@citrix.com>,
	Ashok Raj <ashok.raj@intel.com>, WeiLiu <wl@xen.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH v7 06/10] microcode: split out apply_microcode() from cpu_request_microcode()
Date: Tue, 11 Jun 2019 01:08:36 -0600	[thread overview]
Message-ID: <5CFF53740200007800236D68@prv1-mh.provo.novell.com> (raw)
In-Reply-To: <20190611033240.GA26361@gao-cwp>

>>> On 11.06.19 at 05:32, <chao.gao@intel.com> wrote:
> On Wed, Jun 05, 2019 at 06:37:27AM -0600, Jan Beulich wrote:
>>>>> On 27.05.19 at 10:31, <chao.gao@intel.com> wrote:
>>> During late microcode update, apply_microcode() is invoked in
>>> cpu_request_microcode(). To make late microcode update more reliable,
>>> we want to put the apply_microcode() into stop_machine context. So
>>> we split out it from cpu_request_microcode(). As a consequence,
>>> apply_microcode() should be invoked explicitly in the common code.
>>> 
>>> Previously, apply_microcode() gets the microcode patch to be applied from
>>> the microcode cache. Now, the patch is passed as a function argument and
>>> a patch is cached for cpu-hotplug and cpu resuming, only after it has
>>> been loaded to a cpu without any error. As a consequence, the
>>> 'match_cpu' check in microcode_update_cache is removed, which otherwise
>>> would fail.
>>
>>The "only after it has been loaded to a cpu without any error" is a
>>problem, precisely for the case where ucode on the different cores
>>is not in sync initially. I would actually like to put up this question:
>>When a core has no ucode loaded at all yet and only strictly older
>>(than loaded on some other cores) ucode is found to be available,
>>whether then it wouldn't still be better to apply that ucode to
>>_at least_ the cores that have none loaded yet.
> 
> Yes, it is better for this special case. And I agree to support this case.
> 
> This in v7, a patch is loaded only if its revision is newer than that
> loaded to current CPU. And it is stored only if it has been loaded
> successfully. But, as you described, a broken bios might puts the system
> in an inconsistent state (multiple microcode revision in the system) and
> furthermore in this case, if no or an older microcode update is
> provided, early loading cannot get the system into sane state. So for
> both early and late microcode loading, we could face a situation that
> the patch to be loaded has equal or old revision than microcode of some
> CPUs.
> 
> Changes I plan to make in next version are:
> 1. For early microcode, a patch would be stored if it covers current CPU's
> signature. All CPUs would try to load from the cache.
> 2. For late microcode, a patch is loaded only if its revision is newer than
> *the patch cached*. And it is stored only if has been loaded without an
> "EIO" error.
> 3. Cache replacement remains the same.

Why the difference between early and late loading?

> But it is a temperary solution, especially for CSPs. A better way
> might be getting the newest ucode or upgrading to the newest bios,
> even downgrading the bios to an older version which wouldn't put
> the system into "insane" state.

On the quoted system, all BIOS versions I've ever been provided
had the same odd behavior.

>>> +static int microcode_update_cpu(struct microcode_patch *patch)
>>>  {
>>> -    int err;
>>> -    struct cpu_signature *sig = &this_cpu(cpu_sig);
>>> +    int ret = microcode_ops->collect_cpu_info(&this_cpu(cpu_sig));
>>>  
>>> -    if ( !microcode_ops )
>>> -        return 0;
>>> +    if ( unlikely(ret) )
>>> +        return ret;
>>>  
>>>      spin_lock(&microcode_mutex);
>>>  
>>> -    err = microcode_ops->collect_cpu_info(sig);
>>> -    if ( likely(!err) )
>>> -        err = microcode_ops->apply_microcode();
>>> -    spin_unlock(&microcode_mutex);
>>> +    if ( patch )
>>> +    {
>>> +        /*
>>> +         * If a patch is specified, it should has newer revision than
>>> +         * that of the patch cached.
>>> +         */
>>> +        if ( microcode_cache &&
>>> +             microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
>>> +        {
>>> +            spin_unlock(&microcode_mutex);
>>> +            return -EINVAL;
>>> +        }
>>>  
>>> -    return err;
>>> -}
>>> +        ret = microcode_ops->apply_microcode(patch);
>>
>>There's no printk() here but ...
>>
>>> +    }
>>> +    else if ( microcode_cache )
>>> +    {
>>> +        ret = microcode_ops->apply_microcode(microcode_cache);
>>> +        if ( ret == -EIO )
>>> +            printk("Update failed. Reboot needed\n");
>>
>>... you emit a log message here. Why the difference? And wouldn't
>>it be better to have just a single call to ->apply anyway, by simply
>>assigning "microcode_cache" to "patch" and moving the call a little
>>further down?
> 
> -EIO means we loaded the patch but the revision didn't change. This
> error code indicates an error in the patch. It is unlikely to happen.
> And in this v7, a patch is stored after being loading successfully
> on a CPU. To simplify handling of load failure and avoid cleanup to the
> global cache (if a patch has a success rate, we need to consider which
> one to save when we have a new patch with 95% rate and an old one with
> 100% rate, it would be really complex), we assume that loading the cache
> (being loaded successfully on a CPU) shouldn't return EIO. Otherwise,
> an error message is prompted.

But then the log message itself needs to be more specific. That'll
then either allow to understand why one is wanted here but not
elsewhere, or additionally a comment should be attached.

>>> --- a/xen/arch/x86/smpboot.c
>>> +++ b/xen/arch/x86/smpboot.c
>>> @@ -363,10 +363,7 @@ void start_secondary(void *unused)
>>>  
>>>      initialize_cpu_data(cpu);
>>>  
>>> -    if ( system_state <= SYS_STATE_smp_boot )
>>> -        early_microcode_update_cpu(false);
>>> -    else
>>> -        microcode_resume_cpu();
>>> +    early_microcode_update_cpu();
>>
>>I'm struggling to understand how you get away without the "false"
>>argument that was passed here before. You look to now be calling
>>->start_update() unconditionally (so long as the hook is not NULL),
>>which I don't think is correct. This should be called only once by
>>the CPU _leading_ an update (the BSP during boot, and the CPU
>>the hypercall gets invoked on (or the first CPU an update gets
>>issued one) for a late update. Am I missing something?
> 
> BSP and APs call different functions, early_microcode_parse_and_update_cpu()
> and early_microcode_update_cpu() respectively. The latter won't call
> ->start_update().

Oh, I see - I'm sorry. I've been mislead by the patch context
indicators. But still, by their names there's not supposed to be
such a difference. I wonder whether we wouldn't better stick
to just one function (early_microcode_update_cpu()), having
it take a boolean as is the case now. That would control both
parsing and ->start_update() invocation.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-06-11  7:09 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-27  8:31 [PATCH v7 00/10] improve late microcode loading Chao Gao
2019-05-27  8:31 ` [Xen-devel] " Chao Gao
2019-05-27  8:31 ` [PATCH v7 01/10] misc/xen-ucode: Upload a microcode blob to the hypervisor Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-04 16:14   ` Andrew Cooper
2019-06-04 16:23     ` Jan Beulich
2019-06-06  2:29     ` Chao Gao
2019-05-27  8:31 ` [PATCH v7 02/10] microcode/intel: extend microcode_update_match() Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-04 14:39   ` Jan Beulich
2019-06-05 13:22     ` Roger Pau Monné
2019-06-05 14:16       ` Jan Beulich
2019-06-06  8:26     ` Chao Gao
2019-06-06  9:01       ` Jan Beulich
2019-05-27  8:31 ` [PATCH v7 03/10] microcode: introduce a global cache of ucode patch Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-04 15:03   ` Jan Beulich
2019-06-10  5:33     ` Chao Gao
2019-06-11  6:50       ` Jan Beulich
2019-05-27  8:31 ` [PATCH v7 04/10] microcode: remove struct ucode_cpu_info Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-04 15:13   ` Jan Beulich
2019-06-10  7:19     ` Chao Gao
2019-05-27  8:31 ` [PATCH v7 05/10] microcode: remove pointless 'cpu' parameter Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-04 15:29   ` Jan Beulich
2019-06-10  7:31     ` Chao Gao
2019-05-27  8:31 ` [PATCH v7 06/10] microcode: split out apply_microcode() from cpu_request_microcode() Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-05 12:37   ` Jan Beulich
2019-06-11  3:32     ` Chao Gao
2019-06-11  7:08       ` Jan Beulich [this message]
2019-06-11  8:53         ` Chao Gao
2019-06-11  9:15           ` Jan Beulich
2019-05-27  8:31 ` [PATCH v7 07/10] microcode/intel: Writeback and invalidate caches before updating microcode Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-05 13:20   ` Jan Beulich
2019-05-27  8:31 ` [PATCH v7 08/10] x86/microcode: Synchronize late microcode loading Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-05 14:09   ` Jan Beulich
2019-06-11 12:36     ` Chao Gao
2019-06-11 12:58       ` Jan Beulich
2019-06-11 15:47       ` Raj, Ashok
2019-06-05 14:42   ` Roger Pau Monné
2019-05-27  8:31 ` [PATCH v7 09/10] microcode: remove microcode_update_lock Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-05 14:52   ` Roger Pau Monné
2019-06-05 15:15     ` Jan Beulich
2019-06-05 14:53   ` Jan Beulich
2019-06-11 12:46     ` Chao Gao
2019-06-11 13:23       ` Jan Beulich
2019-06-11 16:04       ` Raj, Ashok
2019-06-12  7:38         ` Jan Beulich
2019-06-13 14:05           ` Chao Gao
2019-06-13 14:08             ` Jan Beulich
2019-06-13 14:58               ` Chao Gao
2019-06-13 17:47               ` Raj, Ashok
2019-06-14  8:58                 ` Jan Beulich
2019-05-27  8:31 ` [PATCH v7 10/10] x86/microcode: always collect_cpu_info() during boot Chao Gao
2019-05-27  8:31   ` [Xen-devel] " Chao Gao
2019-06-05 14:56   ` Roger Pau Monné
2019-06-11 13:02     ` Chao Gao
2019-06-05 15:05   ` Jan Beulich
2019-06-11 12:58     ` Chao Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5CFF53740200007800236D68@prv1-mh.provo.novell.com \
    --to=jbeulich@suse.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ashok.raj@intel.com \
    --cc=chao.gao@intel.com \
    --cc=roger.pau@citrix.com \
    --cc=sergey.dyasli@citrix.com \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).