Here's additional information.

All of the remill tests of the legacy MMX instructions fail. These instructions operate on 64-bit registers aliased with the lower 64 bits of the x87 fp80 registers.  The tests fail because remill expects the fxsave64 instruction to deliver 16 bits of 1s (an infinity or NaN prefix) in the fp80 sign and exponent fields, i.e. bits 79:64.  Bare metal does this, but QEMU does not.

My reading of the Intel Software Developer's Manual, table 3-44 (https://www.felixcloutier.com/x86/FXSAVE.html#tbl-3-44), is that these 16 bits are reserved, but another version of the manual (http://math-atlas.sourceforge.net/devel/arch/ia32_arch.pdf), in section 9.6.2 "Transitions between x87 fpu and mmx code", says that a write to an MMX register sets those 16 bits to all 1s.
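To make the section 9.6.2 behavior concrete, here is a minimal sketch that models one 80-bit x87 register as a Python integer. The names mmx_write and top16 are hypothetical helpers for illustration only; they come from neither QEMU nor remill, and the model assumes the rule exactly as that section states it.

```python
MASK64 = (1 << 64) - 1

def mmx_write(value64: int) -> int:
    """Model an MMX register write as described in section 9.6.2.

    Bits 63:0 of the aliased fp80 register receive the MMX value and
    bits 79:64 (sign and exponent) are set to all 1s.
    """
    return (0xFFFF << 64) | (value64 & MASK64)

def top16(reg80: int) -> int:
    """Extract bits 79:64 of an 80-bit register image."""
    return (reg80 >> 64) & 0xFFFF

reg = mmx_write(0x0123456789ABCDEF)
print(hex(top16(reg)))  # 0xffff
```

Under this model, an fxsave64 that copies the register verbatim would store 0xffff in bits 79:64, which is what remill expects.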

Digging through the code that implements the SSE/MMX instruction pavgb, I see a nice clean implementation in the SSE_HELPER_B macro. It takes an MMXREG, which is an MMREG_UNION, and that type does not provide, as far as I can tell, a handle to bits 79:64 of the aliased x87 register.

I find it hard to believe that an apparent bug like this has been here "forever". Am I missing something?

Robert Henry
________________________________
From: Robert Henry
Sent: Friday, May 29, 2020 10:38 AM
To: qemu-devel@nongnu.org <qemu-devel@nongnu.org>
Subject: ia-32/ia-64 fxsave64 instruction behavior when saving mmx

Background: The ia-32/ia-64 fxsave64 instruction saves fp80 or legacy SSE mmx registers. The mmx registers are saved as if they were fp80 values. The lower 64 bits of the constructed fp80 value are the contents of the mmx register.  The upper 16 bits of the constructed fp80 value are reserved; see the last row of table 3-44 of https://www.felixcloutier.com/x86/fxsave#tbl-3-44

The Intel Core i9-9980XE Skylake metal I have puts 0xffff into these reserved 16 bits when saving MMX.

QEMU appears to put 0's there.
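For anyone who wants to compare the two behaviors on a dumped save area, here is a minimal sketch that pulls the two fields out of one ST/MM slot. It assumes the standard 512-byte FXSAVE layout, with the eight ST/MM slots starting at byte offset 32 and spaced 16 bytes apart. mm_slot and fake_image are hypothetical helper names, and the images below are synthetic rather than captured from hardware or QEMU.

```python
import struct

FXSAVE_SIZE = 512   # size of the FXSAVE/FXSAVE64 memory image
ST_MM_BASE = 32     # byte offset of the ST0/MM0 slot
SLOT_SIZE = 16      # 10 bytes of fp80 data followed by 6 reserved bytes

def mm_slot(image: bytes, index: int):
    """Return (bits 63:0, bits 79:64) of ST/MM slot `index`."""
    if len(image) != FXSAVE_SIZE:
        raise ValueError("expected a 512-byte FXSAVE image")
    if not 0 <= index < 8:
        raise ValueError("slot index must be 0..7")
    offset = ST_MM_BASE + SLOT_SIZE * index
    # Little-endian: 64-bit significand/MMX value, then 16-bit sign+exponent.
    return struct.unpack_from("<QH", image, offset)

def fake_image(index: int, low64: int, top16: int) -> bytes:
    """Build a synthetic, otherwise zeroed image with one slot filled in."""
    image = bytearray(FXSAVE_SIZE)
    struct.pack_into("<QH", image, ST_MM_BASE + SLOT_SIZE * index, low64, top16)
    return bytes(image)

# Synthetic stand-ins for the two observed behaviors.
metal_like = fake_image(0, 0x1122334455667788, 0xFFFF)
qemu_like = fake_image(0, 0x1122334455667788, 0x0000)
print([hex(v) for v in mm_slot(metal_like, 0)])
print([hex(v) for v in mm_slot(qemu_like, 0)])
```

Running the same extraction on real dumps from both environments would show whether the only difference is in bits 79:64.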

Does anybody have insight as to what "reserved" really means, or must be, in this case?  I take the word "reserved" to mean something other than "undefined".

I came across this issue when running the remill instruction test engine (see my issue https://github.com/lifting-bits/remill/issues/423).  For better or worse, remill assumes that those bits are 0xffff, not 0x0000.