Re: [PATCH] misc/xenmicrocode: Upload /lib/firmware/<some blob> to the hypervisor

From: Henrique de Moraes Holschuh <hmh@debian.org>
To: Borislav Petkov <bp@suse.de>, Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Juergen Gross <jgross@suse.com>, Michal Marek <mmarek@suse.cz>,
	Jason Douglas <jdouglas@suse.com>,
	stefano.stabellini@eu.citrix.com, Takashi Iwai <tiwai@suse.de>,
	mcgrof@suse.com, "Luis R. Rodriguez" <mcgrof@do-not-panic.com>,
	david.vrabel@citrix.com, Jan Beulich <JBeulich@suse.com>,
	xen-devel@lists.xenproject.org, boris.ostrovsky@oracle.com,
	Olaf Hering <ohering@suse.de>
Subject: Re: [PATCH] misc/xenmicrocode: Upload /lib/firmware/<some blob> to the hypervisor
Date: Thu, 29 Jan 2015 09:36:49 -0200
Message-ID: <1422531409.591726.220430733.6DE9E92B@webmail.messagingengine.com>
In-Reply-To: <20150128083924.GA6360@pd.tnic>

On Wed, Jan 28, 2015, at 06:39, Borislav Petkov wrote:
> On Wed, Jan 28, 2015 at 12:10:43AM +0000, Andrew Cooper wrote:
> > There was a thread on xen-devel but I can't currently find it in the
> > archives.
> > 
> > To the best of my memory,  it was a 4 core APU system where the BIOS had
> > updated the microcode on cpu 0 but left 1-3 at a lower patch level. 

Which is a situation that will *always* happen when the late microcode
update driver is used, both on Intel and AMD.  You will always have a
time window where the processor is running with mismatched microcode
between some of the hardware threads/cores/modules/whatever.
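To make "mismatched microcode" concrete, here is a small sketch (Python,
not kernel code) that groups logical CPUs by the revision they report.
It assumes the x86 Linux /proc/cpuinfo layout, where each CPU block has
a "processor" line and a "microcode" line; the function names are made
up for this illustration.

```python
from collections import defaultdict


def microcode_revisions(cpuinfo_text):
    """Map each reported microcode revision to the logical CPUs showing it.

    Expects text in the x86 /proc/cpuinfo layout ("key : value" lines).
    """
    by_rev = defaultdict(list)
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            by_rev[value].append(cpu)
    return dict(by_rev)


def is_mismatched(cpuinfo_text):
    """True when the CPUs do not all report the same revision."""
    return len(microcode_revisions(cpuinfo_text)) > 1
```

Feeding it the contents of /proc/cpuinfo during a late update would, in
principle, show more than one revision until the last CPU is done.  It
only sees what the kernel reports, so it is a diagnostic aid at best.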

This is not an issue when using the bare-metal early microcode update
driver, because it updates the BSP while still in uniprocessor mode, and
it updates the APs very early in the processor bootstrap code, well
before they are on-lined.  We can control what machine code runs in a
hardware thread that might be running mismatched microcode (i.e. that
runs before the processor bootstrap code attempts to update the
microcode) and keep it simple and away from anything that would heavily
object to mismatched microcode.

Likewise, it is not an issue for a non-broken BIOS/UEFI, as it is
*supposed* to update everything to the same microcode well before it
attempts to do anything complex.

> > Every time the reporter tried creating an HVM guest (i.e. entering SVM
> > non-root mode), the system reset.
> > 
> > The instability was sorted by ensuring each core was at the same
> > microcode level.
> 
> That sounds like a BIOS bug to me, frankly.

Sort of.  The extremely wide time window of mismatched microcode in that
computer was a BIOS bug, of course.

But the fact that you cannot trust a system with mismatched microcode to
be stable is the hard truth: neither AMD nor Intel really guarantees
that late microcode updates will always be safe in all conditions.

What we can do about it in the Linux kernel late microcode driver is to
shorten that window as much as possible, and try to quiesce the system
as much as possible during the microcode update until all cores have
been updated.

It still looks like Xen should *never* trigger a late microcode update,
unless it freezes all VMs first.

> > As Xen updates microcode one cpu at a time from 0, it could easily
> > create a similar situation if microcode is updated after VMs have been
> > started.  Come to think of it, this is also an impending problem for PVH
> > dom0 systems.
> 
> The common way for doing microcode updates is to update all cores at
> the same time, possibly. Or at least as close to one another in time as
> possible.

The latter.  We serialize microcode updates across CPUs, and doing them
all at the same time is neither trivial (unforeseen side effects on a
running system) nor future-proof.

For example, on Intel you must *never* have two CPUs attempt to update
the same "microcode store" at the same time, which requires that you
actually know how the microcode is partitioned relative to
packages/cores/threads (so far, this is easy: HT siblings share
microcode, nothing else does.  But what about future processors?).
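A rough sketch of that constraint (Python, not the kernel's code):
group the logical CPUs by the store they share, pick one CPU per store,
and walk the resulting list serially.  It assumes the current rule
stated above, that HT siblings of one core share a store and nothing
else does, so the store is keyed by (package, core).  The topology map
and the function name are hypothetical; on Linux such a map could
probably be built from the physical_package_id and core_id files under
/sys/devices/system/cpu/cpuN/topology/.

```python
from collections import defaultdict


def plan_updates(topology):
    """Pick one updating CPU per microcode store.

    topology: dict mapping logical CPU number -> (package_id, core_id).
    Assumption: the (package_id, core_id) pair identifies the store,
    i.e. only HT siblings share microcode.
    Returns a sorted list of CPUs, to be updated one after another.
    """
    stores = defaultdict(list)
    for cpu, store_key in topology.items():
        stores[store_key].append(cpu)
    # The lowest-numbered sibling updates; the others in the group never
    # touch the store, so two CPUs cannot hit the same store at once.
    return sorted(min(cpus) for cpus in stores.values())
```

If a future processor partitions microcode differently, only the key
used for grouping would need to change, which is exactly the knowledge
the driver would have to get from somewhere.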

> * the late update is an addition to the early one to cover the cases of
> long running systems where a reboot is prohibitively painful. With that,
> as with the early method, you would want to update all hardware cores in
> one go.

And, unfortunately, you have a time window of mismatched microcode
during the "one go", which is not something we can fix.

So we would have to try to limit what happens during that time window,
instead.

> Now, this is where it becomes tricky for virt: you need to stop guests,
> do the update and then resume them. Even worse, if all of a sudden you
> want to hide hardware features and/or instructions like HSW TSX for
> example, you most likely want to even avoid the late update and warn the
> admin that she has to reboot that machine and apply microcode with the
> early method.

Exactly.  But it goes further: we likely should freeze the entire kernel
and run nothing (not even interrupt handling) on non-up-to-date cores. 
I.e. offline every CPU but one, switch to the last online CPU, update
its microcode, then update the other ones one-at-a-time, onlining them
after they are up-to-date (and leaving them offline if something goes
wrong).

Or something to that effect.
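The sequence described above can be sketched as a toy model (Python,
nothing like real kernel code).  The apply_update callback and the
revisions map are hypothetical stand-ins for the actual update
mechanism.  What to do when the very first update fails is not covered
above; the sketch assumes, as a guess, that nothing has changed yet and
everything can simply come back online at the old revision.

```python
def late_update(revisions, new_rev, apply_update):
    """Model of "offline all but one, update, then online one at a time".

    revisions:    dict mapping CPU number -> current revision (mutated).
    apply_update: hypothetical callback (cpu, new_rev) -> bool success.
    Returns (online, failed): the set of CPUs left online and the set
    left offline because their update failed.
    """
    cpus = sorted(revisions)
    survivor, others = cpus[0], cpus[1:]

    # Step 1: every CPU but one is offline; only the survivor runs code.
    online = {survivor}

    # Step 2: update the last online CPU.
    if not apply_update(survivor, new_rev):
        # Assumption (not from the thread): nothing was changed, so
        # abort and bring every CPU back at the old revision.
        return set(cpus), set()
    revisions[survivor] = new_rev

    # Step 3: update the rest one at a time, onlining each CPU only
    # after it is up to date; a CPU whose update fails stays offline.
    failed = set()
    for cpu in others:
        if apply_update(cpu, new_rev):
            revisions[cpu] = new_rev
            online.add(cpu)
        else:
            failed.add(cpu)
    return online, failed
```

The property the model keeps is that every online CPU runs the same
revision at every step, at the cost of the machine being effectively
uniprocessor for the duration.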

It is no wonder we currently "hope for the best" as far as late
microcode update mode goes, and also that Linux distros are switching to
"early updates only" by default.  BTW, most datacenter people I know
have a policy of never updating *any* firmware at all outside of
maintenance downtime, so they're actually quite fine with the idea of a
reboot being required to update processor microcode.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique de Moraes Holschuh <hmh@debian.org>