On 25/10/2019 08:17, speck for Borislav Petkov wrote:
> On Thu, Oct 24, 2019 at 11:38:21PM +0100, speck for Andrew Cooper wrote:
>> I don't necessarily disagree, but the customers (who ultimately pay my
>> salary) want late microcode loading and livepatching, so we've delivered.
> Yeah, you guys promised too much. How do you deal with userspace using
> a feature and you wanna upgrade microcode which disables it? TSX might
> not be a good example here

It's the perfect example here.  The answer is to request that Intel
change bit 0's behaviour from causing #UDs to causing aborts.

The first version of this microcode was definitely not safe to late load.

> because feature bits disappearing is still ok,

Some userspace apparently gets confused when CPUID changes behind its
back, which is why the CPUID control in bit 1 was split out from an
otherwise monolithic bit 0.

At late load, choose (or not) to use bit 0 only.  At boot, choose (or
not) both bits 0 and 1 in unison.

> it doesn't fault but it would simply start aborting transactions
> unconditionally but what if it is a CPU feature which userspace is
> actively using and it disappears underneath its feet all of a sudden?

No one has guaranteed that all microcode ever in the future is going to
be safe to use on a running system.  If it really can't be made safe,
then customers are really going to have to reboot.

However, there is a lot of effort going into trying to make sure that
fixes such as this one are made safe for late loading.

To give a concrete example, we have customers whose elapsed time for a
reboot, conforming to SLAs, is in excess of 9 months, and new microcode
with critical fixes is coming out faster than that.  I bet that I'm not
the only person on this list with this type of customer.

> Just upgrade the microcode and forget about it is not enough. I'm pretty
> sure you'll have to "dance". But hey, you can buy almost everything with
> money nowadays so... :-)

Yeah, you have to dance, but the constituent pieces are already around,
so it's not too bad.

>
>> Skylake CPUs aren't getting TSX_CTRL, but force setting/clearing bits at
>> boot will affect later logic.  (Unless I'm being blind while reading the
>> patches, which is a distinct possibility).
> Yes, that's why I'm saying we should not blindly force set and clear
> bits but mirror what CPUID is telling us. At least wrt TSX.
>

Ah - in which case I agree.  Sorry for the noise.

~Andrew
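
For illustration, the bit selection described above (bit 0 only at late
load, bits 0 and 1 together at boot) could be sketched in C roughly as
follows.  This is a minimal sketch, not code from the patches: the macro
names, the load_phase enum and the tsx_ctrl_value() helper are all
hypothetical, and only the two bit positions are taken from the
discussion.  It only computes a value and never touches an MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the thread; the names are illustrative. */
#define TSX_CTRL_RTM_DISABLE  (UINT64_C(1) << 0)  /* bit 0: transactions abort */
#define TSX_CTRL_CPUID_CLEAR  (UINT64_C(1) << 1)  /* bit 1: hide TSX in CPUID  */

/* Hypothetical description of when the microcode is being loaded. */
enum load_phase { LOAD_AT_BOOT, LOAD_LATE };

/*
 * Hypothetical helper: compute the control value to program.
 *
 * - Not disabling TSX: leave both bits clear.
 * - Late load: use bit 0 only, so CPUID does not change underneath
 *   userspace that is already running.
 * - Boot: set bits 0 and 1 in unison.
 */
static uint64_t tsx_ctrl_value(enum load_phase phase, bool disable_tsx)
{
    assert(phase == LOAD_AT_BOOT || phase == LOAD_LATE);

    if (!disable_tsx)
        return 0;

    if (phase == LOAD_LATE)
        return TSX_CTRL_RTM_DISABLE;

    return TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
}
```

With these assumptions, tsx_ctrl_value(LOAD_LATE, true) sets only bit 0,
while tsx_ctrl_value(LOAD_AT_BOOT, true) sets both bits.  Whether the
control exists at all would still need to be checked first, since (as
noted above) not every CPU is getting TSX_CTRL.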