On Fri, 15 Mar 2019, Andrew Cooper wrote:
> On 14/03/2019 20:25, Thomas Gleixner wrote:
> > On Thu, 14 Mar 2019, Raj, Ashok wrote:
> >> On Thu, Mar 14, 2019 at 12:39:46PM +0000, Andrew Cooper wrote:
> >>> On late load failure, we should dump enough information to work out
> >>> exactly what went on, to determine how best to proceed, but the server
> >>> is effectively lost to us.  On late load success, the proposed new
> >>> "version" replaces the current "version".
> >>>
> >>> And again - I reiterate the point that I think it is fine to have a
> >>> simplifying assumption that we don't have mixed stepping systems to
> >>> start with, presuming this is generally in line with Intel's support
> >>> statement.  If in practice we find mixed stepping systems which are
> >>> supported by an OEM/Intel, we can see about extending the logic.
> >> Checking with Asit he says it is in fact permitted to have 1 step behind
> >> even on a multi-socket system. One could be N and other N-1 should be
> >> supported.
> > That turns into a total disaster if N has an issue fixed and N-1 requires
> > microcode + software workaround.
> >
> > So if N is on the boot socket, then we fail to enable the workaround
> > because CPU0 has the 'Issue fixed' bit set.
> >
> > If N-1 is on the boot socket, then we go to do the workaround nevertheless
> > on N and that might, depending on the issue, just be some pointless
> > exercise or even try to access some MSR which is not available.
> >
> > *Shudder*
>
> Intel: Are you saying that Skylake (06-55-04) is supported in
> combination with Cascade Lake B0 (06-55-05) and/or Cascade Lake B1
> (06-55-06) ?
>
> The most insidious problem is TSX_FORCE_ABORT between the two Cascade
> Lakes.  There really will be an asymmetric existence of an MSR required
> for use in one part of the system, and unavailable in the other part of
> the system.
>
> To a certain degree, what is technically supported by Intel is also
> tempered by what the major OS/VMM vendors are willing to boot on, as
> that is ultimately what the customer is paying for.  When the steppings
> differed only by the errata fixed, and the silicon was otherwise
> identical from software's point of view, supporting a range of adjacent
> steppings seems entirely reasonable.

Sure, if the software does not have to worry about the differences then
supporting that is a no-brainer.

> In this case you've got 3 adjacent steppings, *all* of which offer
> different architecturally defined features, and will involve software
> changes to allow mixed systems to function in a safe way.

Let's not go there. We've seen the mess which other architectures created
with big/little CPUs which expose different feature sets.

Thanks,

	tglx