From: "Jan Beulich" <JBeulich@suse.com>
To: Chao Gao <chao.gao@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	xen-devel@lists.xen.org, Jun Nakajima <jun.nakajima@intel.com>,
	tglx@linutronix.de, Borislav Petkov <bp@suse.de>
Subject: Re: [PATCH v2] x86/microcode: Synchronize late microcode loading
Date: Mon, 30 Apr 2018 09:25:26 -0600
Message-ID: <5AE7356602000078001BFA92@prv1-mh.provo.novell.com>
In-Reply-To: <1524656778-8324-1-git-send-email-chao.gao@intel.com>

>>> On 25.04.18 at 13:46, <chao.gao@intel.com> wrote:
> @@ -281,24 +288,56 @@ static int microcode_update_cpu(const void *buf, size_t size)
>      return err;
>  }
>  
> -static long do_microcode_update(void *_info)
> +/* Wait for all CPUs to rendezvous with a timeout (us) */
> +static int wait_for_cpus(atomic_t *cnt, int timeout)

unsigned int
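
For reference, a minimal sketch of such a rendezvous helper with the
unsigned parameter type (the body below is illustrative only, not taken
from the patch):

static int wait_for_cpus(atomic_t *cnt, unsigned int timeout)
{
    unsigned int total = num_online_cpus();

    atomic_inc(cnt);                     /* announce this CPU's arrival */
    while ( atomic_read(cnt) != total )  /* spin until all CPUs arrive */
    {
        if ( !timeout )
            return -EBUSY;               /* rendezvous timed out */
        timeout--;
        udelay(1);                       /* timeout is counted in us */
    }

    return 0;
}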

> +static int do_microcode_update(void *_info)
> +{
> +    struct microcode_info *info = _info;
> +    unsigned int cpu = smp_processor_id();
> +    int ret;
> +
> +    ret = wait_for_cpus(&info->cpu_in, MICROCODE_DEFAULT_TIMEOUT);
> +    if ( ret )
> +        return ret;
> +    /*
> +     * Logical threads which set the first bit in cpu_sibling_mask can do
> +     * the update. Other sibling threads just await the completion of
> +     * microcode update.
> +     */
> +    if ( cpumask_test_and_set_cpu(
> +                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
> +        ret = microcode_update_cpu(info->buffer, info->buffer_size);

Isn't the condition inverted (i.e. missing a ! )?

Also I take it that you've confirmed that loading ucode in parallel on multiple
cores of the same socket is not a problem? The comment in the last hunk
suggests otherwise.
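
For clarity, here is the condition as presumably intended, i.e. the quoted
hunk with the missing negation added (a sketch only):

    /*
     * Only the thread that is first to set its core's bit (i.e. for which
     * cpumask_test_and_set_cpu() returns the old value 0) performs the
     * update; its siblings find the bit already set and merely wait.
     */
    if ( !cpumask_test_and_set_cpu(
                cpumask_first(per_cpu(cpu_sibling_mask, cpu)), &info->cpus) )
        ret = microcode_update_cpu(info->buffer, info->buffer_size);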

> +    /*
> +     * Increase the wait timeout to a safe value here since we're serializing
> +     * the microcode update and that could take a while on a large number of
> +     * CPUs. And that is fine as the *actual* timeout will be determined by
> +     * the last CPU finished updating and thus cut short
> +     */
> +    if ( wait_for_cpus(&info->cpu_out, MICROCODE_DEFAULT_TIMEOUT *
> +                                       num_online_cpus()) )
> +        panic("Timeout when finishing updating microcode");

A 3s timeout (as an example for a system with 100 CPU threads) still
seems absurdly high to me, but considering you panic() anyway if you hit
the timeout, the question is mainly whether there's a slim chance of the
update completing just a brief moment before the timeout expires. If all
goes well you won't come close to even 1s, but as said before - there may
be guests running, and they may become utterly confused if they don't get
any CPU time for a second or more.
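
For scale, assuming MICROCODE_DEFAULT_TIMEOUT is 30ms (which is what the
3s / 100 threads example implies):

    MICROCODE_DEFAULT_TIMEOUT * num_online_cpus()
        = 30000us * 100
        = 3000000us = 3s

all of it potentially spent with interrupts off and the watchdog disabled.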

However, now that you're no longer doing things sequentially, I don't see
why you need to scale the timeout by the CPU count.

> +
> +    return ret;
>  }

You're losing this return value (once for every CPU making it into this
function).
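
One way not to lose it would be something along these lines at the end of
do_microcode_update() (a sketch only; it assumes keeping the error field
whose initialization this patch removes):

    /*
     * Sketch: record the first non-zero per-CPU result in the shared
     * structure so that the CPU issuing the hypercall can return it.
     */
    if ( ret )
        cmpxchg(&info->error, 0, ret);

    return ret;

The issuing CPU could then hand info->error back as the overall result of
the hypercall.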

> @@ -318,26 +357,52 @@ int microcode_update(XEN_GUEST_HANDLE_PARAM(const_void) buf, unsigned long len)
>  
>      ret = copy_from_guest(info->buffer, buf, len);
>      if ( ret != 0 )
> -    {
> -        xfree(info);
> -        return ret;
> -    }
> +        goto free;
>  
>      info->buffer_size = len;
> -    info->error = 0;
> -    info->cpu = cpumask_first(&cpu_online_map);
> +
> +    /* cpu_online_map must not change during update */
> +    if ( !get_cpu_maps() )
> +    {
> +        ret = -EBUSY;
> +        goto free;
> +    }
>  
>      if ( microcode_ops->start_update )
>      {
>          ret = microcode_ops->start_update();
>          if ( ret != 0 )
> -        {
> -            xfree(info);
> -            return ret;
> -        }
> +            goto put;
>      }
>  
> -    return continue_hypercall_on_cpu(info->cpu, do_microcode_update, info);
> +    cpumask_empty(&info->cpus);

DYM cpumask_clear()?
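
As written, cpumask_empty() merely tests the mask and the result is
discarded, so that line has no effect; presumably this was meant:

    cpumask_clear(&info->cpus);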

> +    atomic_set(&info->cpu_in, 0);
> +    atomic_set(&info->cpu_out, 0);
> +
> +    /*
> +     * We intend to disable interrupts for a long time, which may lead to
> +     * a watchdog timeout.
> +     */
> +    watchdog_disable();
> +    /*
> +     * Late loading dance. Why the heavy-handed stop_machine effort?
> +     *
> +     * -HT siblings must be idle and not execute other code while the other
> +     *  sibling is loading microcode in order to avoid any negative
> +     *  interactions caused by the loading.
> +     *
> +     * -In addition, microcode update on the cores must be serialized until
> +     *  this requirement can be relaxed in the future. Right now, this is
> +     *  conservative and good.

This is the comment I've referred to above.

Jan

