>> That means there were no VALID=1, EN=1, S=1 errors anywhere. But there
>> might be some other things logged that would help us understand.
>
> By "other things" you mean other MCEs?

Logs with EN=0 and/or S=0. They may have interesting information, and
have a good chance of being useful (especially if they are from some
functional unit that isn't part of the buggy behavior).

Bad data flowing through multiple functional units can leave a trail of
logged entries (perhaps as many as four units may see and log a single
error). Only one of them should signal the machine check (to avoid
shutdown because of nested machine check).

> Oh, cpu errata. So this would mean that we can't even rely on the
> contents of the MCA banks, can we?
>
> In any case, is any of the information in the MCA banks in such cases
> even usable then? Because if not, we're definitely barking up the wrong
> tree...

See above - I think even if there is a bug in the core that isn't
setting the right bits in the MCi_STATUS register - we could get good
data from devices out in the uncore.

-Tony
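
P.S. To make the VALID/EN/S distinction concrete, here is a rough sketch
of how one might sort saved MCi_STATUS values into "signaled" entries and
the "trail" entries described above. The bit positions follow the Intel
SDM layout of IA32_MCi_STATUS; the helper names are made up for
illustration and are not existing kernel functions, and the sketch assumes
the S bit is meaningful on the part in question (i.e. software error
recovery is supported).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in IA32_MCi_STATUS (Intel SDM, machine-check architecture). */
#define MCI_STATUS_VAL (1ULL << 63) /* bank holds a valid error log */
#define MCI_STATUS_EN  (1ULL << 60) /* error reporting was enabled for this error */
#define MCI_STATUS_S   (1ULL << 56) /* this bank signaled the machine check */

/* Hypothetical helper: true if the bank logged anything at all. */
static bool mce_bank_logged(uint64_t status)
{
	return (status & MCI_STATUS_VAL) != 0;
}

/* Hypothetical helper: true only for the VALID=1, EN=1, S=1 case. */
static bool mce_bank_signaled(uint64_t status)
{
	const uint64_t mask = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_S;

	return (status & mask) == mask;
}

/*
 * Hypothetical helper: count "trail" entries, i.e. banks with a valid log
 * that did not signal the machine check (EN=0 and/or S=0).
 */
static unsigned int mce_count_trail(const uint64_t *status, size_t nbanks)
{
	unsigned int count = 0;
	size_t i;

	assert(status != NULL || nbanks == 0);

	for (i = 0; i < nbanks; i++) {
		if (mce_bank_logged(status[i]) && !mce_bank_signaled(status[i]))
			count++;
	}
	return count;
}
```

Feeding it the per-bank status values from a dump would show whether any
VALID entries exist even when nothing has all three bits set.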