linux-kernel.vger.kernel.org archive mirror
* AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
@ 2020-04-02  2:34 Jann Horn
  2020-04-02  7:33 ` Christian König
  0 siblings, 1 reply; 37+ messages in thread
From: Jann Horn @ 2020-04-02  2:34 UTC (permalink / raw)
  To: Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	Christian König, David (ChunMing) Zhou
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski

[x86 folks in CC so that they can chime in on the precise rules for this stuff]

Hi!

I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
turn on floating-point instructions in the compiler flags
(-mhard-float, -msse and -msse2) in order to make the "float" and
"double" types usable from C code without requiring helper functions.

However, as far as I know, code running in normal kernel context isn't
allowed to use floating-point registers without special protection
using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
require that the protected code never blocks). If you violate that
rule, that can lead to various issues - among other things, I think
the kernel will clobber userspace FPU register state, and I think the
kernel code can blow up if a context switch happens at the wrong time,
since in-kernel task switches don't preserve FPU state.

Is there some hidden trick I'm missing that makes it okay to use FPU
registers here?

I would try testing this, but unfortunately none of the AMD devices I
have here have the appropriate graphics hardware...


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02  2:34 AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Jann Horn
@ 2020-04-02  7:33 ` Christian König
  2020-04-02  7:56   ` Jann Horn
  2020-04-02 14:13   ` Peter Zijlstra
  0 siblings, 2 replies; 37+ messages in thread
From: Christian König @ 2020-04-02  7:33 UTC (permalink / raw)
  To: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski

Hi Jann,

On 02.04.20 at 04:34, Jann Horn wrote:
> [x86 folks in CC so that they can chime in on the precise rules for this stuff]
>
> Hi!
>
> I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> turn on floating-point instructions in the compiler flags
> (-mhard-float, -msse and -msse2) in order to make the "float" and
> "double" types usable from C code without requiring helper functions.
>
> However, as far as I know, code running in normal kernel context isn't
> allowed to use floating-point registers without special protection
> using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> require that the protected code never blocks). If you violate that
> rule, that can lead to various issues - among other things, I think
> the kernel will clobber userspace FPU register state, and I think the
> kernel code can blow up if a context switch happens at the wrong time,
> since in-kernel task switches don't preserve FPU state.
>
> Is there some hidden trick I'm missing that makes it okay to use FPU
> registers here?
>
> I would try testing this, but unfortunately none of the AMD devices I
> have here have the appropriate graphics hardware...

Yes, using floating-point calculations in the display code has been
a source of numerous problems and confusion in the past.

The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind
the DC_FP_START() and DC_FP_END() macros, which are supposed to hide the
architecture-dependent handling for x86 and PPC64.

This originated from the graphics block integrated into AMD CPUs (where
we knew which FP unit we had), but as far as I know it is now used for
dedicated AMD GPUs as well.

I'm not really a fan of this either, but so far we weren't able to 
convince the hardware engineers to not use floating point calculations 
for the display stuff.

Regards,
Christian.


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02  7:33 ` Christian König
@ 2020-04-02  7:56   ` Jann Horn
  2020-04-02  9:36     ` Thomas Gleixner
  2020-04-02 14:13   ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Jann Horn @ 2020-04-02  7:56 UTC (permalink / raw)
  To: Christian König
  Cc: Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski

On Thu, Apr 2, 2020 at 9:34 AM Christian König <christian.koenig@amd.com> wrote:
> On 02.04.20 at 04:34, Jann Horn wrote:
> > [x86 folks in CC so that they can chime in on the precise rules for this stuff]
> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> > turn on floating-point instructions in the compiler flags
> > (-mhard-float, -msse and -msse2) in order to make the "float" and
> > "double" types usable from C code without requiring helper functions.
> >
> > However, as far as I know, code running in normal kernel context isn't
> > allowed to use floating-point registers without special protection
> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> > require that the protected code never blocks). If you violate that
> > rule, that can lead to various issues - among other things, I think
> > the kernel will clobber userspace FPU register state, and I think the
> > kernel code can blow up if a context switch happens at the wrong time,
> > since in-kernel task switches don't preserve FPU state.
> >
> > Is there some hidden trick I'm missing that makes it okay to use FPU
> > registers here?
> >
> > I would try testing this, but unfortunately none of the AMD devices I
> > have here have the appropriate graphics hardware...
>
> Yes, using floating-point calculations in the display code has been
> a source of numerous problems and confusion in the past.
>
> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind
> the DC_FP_START() and DC_FP_END() macros, which are supposed to hide the
> architecture-dependent handling for x86 and PPC64.

Hmm... but as far as I can tell, you're using those macros from inside
functions that are already compiled with the FPU on:

 - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros,
but is already compiled with calcs_ccflags
 - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the
macros, but is already compiled with "-mhard-float -msse -msse2"
 - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the
macros, but is already compiled with "-mhard-float -msse -msse2"

AFAIK as soon as you enter any function in any file compiled with FPU
instructions, you may encounter SSE instructions, e.g. via things like
compiler-generated memory-zeroing code - not just when you're actually
using doubles or floats.
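
To illustrate that last point, here is a minimal sketch (not code from the
driver; the struct and function names are made up for this example). The
function contains no float or double at all, yet a compiler that is allowed
to use SSE is free to implement the 64-byte zeroing with XMM stores. Whether
it actually does depends on the compiler, its version and the optimization
flags, so the disassembly is the only reliable answer.

```c
/* Hypothetical parameter block: 4 * 4 + 6 * 8 = 64 bytes on common ABIs. */
struct mode_params {
	unsigned int h_active, v_active, refresh_hz, flags;
	unsigned long long reserved[6];
};

/*
 * Integer-only code. When the file is built with SSE enabled, the compiler
 * may still lower the zeroing assignment below to SSE stores, so XMM
 * registers can be touched although the source never mentions floating
 * point.
 */
void mode_params_init(struct mode_params *out, unsigned int h, unsigned int v)
{
	*out = (struct mode_params){ 0 };	/* candidate for SSE stores */
	out->h_active = h;
	out->v_active = v;
}
```

Building this with and without -msse2 and comparing the output of
objdump -d is one way to see what a given toolchain does with it.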


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02  7:56   ` Jann Horn
@ 2020-04-02  9:36     ` Thomas Gleixner
  2020-04-02 14:50       ` Jann Horn
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2020-04-02  9:36 UTC (permalink / raw)
  To: Jann Horn, Christian König
  Cc: Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, the arch/x86 maintainers, kernel list,
	Josh Poimboeuf, Andy Lutomirski

Jann Horn <jannh@google.com> writes:
> On Thu, Apr 2, 2020 at 9:34 AM Christian König <christian.koenig@amd.com> wrote:
>> On 02.04.20 at 04:34, Jann Horn wrote:
>> > [x86 folks in CC so that they can chime in on the precise rules for
>> > this stuff]

They are pretty simple.

Any code using the FPU needs to be completely isolated from regular code,
either by using inline asm or by moving it to a different compilation
unit. The invocations need kernel_fpu_begin()/kernel_fpu_end() around
them, of course.
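
A rough sketch of the "different compilation unit" layout, written as a
single userspace file so that it can be compiled and run. Everything here is
an assumption made for illustration: the function names are invented, the
bandwidth formula is arbitrary, and stub_fpu_begin()/stub_fpu_end() are empty
stand-ins for the real kernel helpers. The point it tries to show is that
only integers cross the boundary, so the wrapper itself never needs an FP
register.

```c
#include <stdint.h>

/*
 * Part 1: would live in its own file, the only one built with
 * -mhard-float -msse -msse2. Inputs and outputs are integers, so callers
 * never have to pass a double in an XMM register.
 */
void fp_unit_bandwidth(uint32_t pixel_clock_khz, uint32_t bits_per_pixel,
		       uint32_t overhead_permille, uint64_t *kbps_out)
{
	double kbps = (double)pixel_clock_khz * bits_per_pixel;

	kbps *= 1.0 + overhead_permille / 1000.0;
	*kbps_out = (uint64_t)(kbps + 0.5);	/* round to nearest */
}

/* Empty stand-ins for kernel_fpu_begin()/kernel_fpu_end(), NOT the real ones. */
static void stub_fpu_begin(void) { }
static void stub_fpu_end(void) { }

/*
 * Part 2: would live in a regular file built without FP flags. It brackets
 * the call and must not block between begin and end.
 */
uint64_t bandwidth_kbps(uint32_t pixel_clock_khz, uint32_t bits_per_pixel,
			uint32_t overhead_permille)
{
	uint64_t kbps;

	stub_fpu_begin();
	fp_unit_bandwidth(pixel_clock_khz, bits_per_pixel, overhead_permille,
			  &kbps);
	stub_fpu_end();

	return kbps;
}
```

For example, bandwidth_kbps(100000, 24, 250) computes 100000 * 24 * 1.25
inside the FP part and hands back the integer 3000000.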

>> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
>> > turn on floating-point instructions in the compiler flags
>> > (-mhard-float, -msse and -msse2) in order to make the "float" and
>> > "double" types usable from C code without requiring helper functions.
>> >
>> > However, as far as I know, code running in normal kernel context isn't
>> > allowed to use floating-point registers without special protection
>> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
>> > require that the protected code never blocks). If you violate that
>> > rule, that can lead to various issues - among other things, I think
>> > the kernel will clobber userspace FPU register state, and I think the
>> > kernel code can blow up if a context switch happens at the wrong time,
>> > since in-kernel task switches don't preserve FPU state.
>> >
>> > Is there some hidden trick I'm missing that makes it okay to use FPU
>> > registers here?
>> >
>> > I would try testing this, but unfortunately none of the AMD devices I
>> > have here have the appropriate graphics hardware...
>>
>> Yes, using floating-point calculations in the display code has been
>> a source of numerous problems and confusion in the past.
>>
>> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind
>> the DC_FP_START() and DC_FP_END() macros, which are supposed to hide the
>> architecture-dependent handling for x86 and PPC64.
>
> Hmm... but as far as I can tell, you're using those macros from inside
> functions that are already compiled with the FPU on:
>
>  - drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c uses the macros,
> but is already compiled with calcs_ccflags
>  - drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c uses the
> macros, but is already compiled with "-mhard-float -msse -msse2"
>  - drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c uses the
> macros, but is already compiled with "-mhard-float -msse -msse2"
>
> AFAIK as soon as you enter any function in any file compiled with FPU
> instructions, you may encounter SSE instructions, e.g. via things like
> compiler-generated memory-zeroing code - not just when you're actually
> using doubles or floats.

That's correct and this will silently cause FPU state corruption ...

We really need objtool support to validate that.

Peter, now that we know how to do it (noinstr, clac/stac) we can emit
annotations (see patch below) and validate that any FPU instruction is
inside a safe region. Hmm?

Thanks,

        tglx

8<---------------
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -19,8 +19,27 @@
  * If you intend to use the FPU in softirq you need to check first with
  * irq_fpu_usable() if it is possible.
  */
-extern void kernel_fpu_begin(void);
-extern void kernel_fpu_end(void);
+extern void __kernel_fpu_begin(void);
+extern void __kernel_fpu_end(void);
+
+static inline void kernel_fpu_begin(void)
+{
+	asm volatile("%c0:\n\t"
+		     ".pushsection .discard.fpu_begin\n\t"
+		     ".long %c0b - .\n\t"
+		     ".popsection\n\t" : : "i" (__COUNTER__));
+	__kernel_fpu_begin();
+}
+
+static inline void kernel_fpu_end(void)
+{
+	__kernel_fpu_end();
+	asm volatile("%c0:\n\t"
+		     ".pushsection .discard.fpu_end\n\t"
+		     ".long %c0b - .\n\t"
+		     ".popsection\n\t" : : "i" (__COUNTER__));
+}
+
 extern bool irq_fpu_usable(void);
 extern void fpregs_mark_activate(void);
 
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -82,7 +82,7 @@ bool irq_fpu_usable(void)
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
-void kernel_fpu_begin(void)
+void __kernel_fpu_begin(void)
 {
 	preempt_disable();
 
@@ -102,16 +102,16 @@ void kernel_fpu_begin(void)
 	}
 	__cpu_invalidate_fpregs_state();
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+EXPORT_SYMBOL_GPL(__kernel_fpu_begin);
 
-void kernel_fpu_end(void)
+void __kernel_fpu_end(void)
 {
 	WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
 
 	this_cpu_write(in_kernel_fpu, false);
 	preempt_enable();
 }
-EXPORT_SYMBOL_GPL(kernel_fpu_end);
+EXPORT_SYMBOL_GPL(__kernel_fpu_end);
 
 /*
  * Save the FPU state (mark it for reload if necessary):





* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02  7:33 ` Christian König
  2020-04-02  7:56   ` Jann Horn
@ 2020-04-02 14:13   ` Peter Zijlstra
  2020-04-03  5:28     ` Masami Hiramatsu
  2020-04-09 15:59     ` Peter Zijlstra
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-02 14:13 UTC (permalink / raw)
  To: Christian König
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:
> Hi Jann,
> 
> On 02.04.20 at 04:34, Jann Horn wrote:
> > [x86 folks in CC so that they can chime in on the precise rules for this stuff]
> > 
> > Hi!
> > 
> > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> > turn on floating-point instructions in the compiler flags
> > (-mhard-float, -msse and -msse2) in order to make the "float" and
> > "double" types usable from C code without requiring helper functions.
> > 
> > However, as far as I know, code running in normal kernel context isn't
> > allowed to use floating-point registers without special protection
> > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> > require that the protected code never blocks). If you violate that
> > rule, that can lead to various issues - among other things, I think
> > the kernel will clobber userspace FPU register state, and I think the
> > kernel code can blow up if a context switch happens at the wrong time,
> > since in-kernel task switches don't preserve FPU state.
> > 
> > Is there some hidden trick I'm missing that makes it okay to use FPU
> > registers here?
> > 
> > I would try testing this, but unfortunately none of the AMD devices I
> > have here have the appropriate graphics hardware...
> 
> Yes, using floating-point calculations in the display code has been a
> source of numerous problems and confusion in the past.
> 
> The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> DC_FP_START() and DC_FP_END() macros, which are supposed to hide the
> architecture-dependent handling for x86 and PPC64.
> 
> This originated from the graphics block integrated into AMD CPUs (where we
> knew which FP unit we had), but as far as I know it is now used for
> dedicated AMD GPUs as well.
> 
> I'm not really a fan of this either, but so far we weren't able to convince
> the hardware engineers to not use floating point calculations for the
> display stuff.

Might I complain that:

	make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/

does not in fact work?

Anyway, how do people feel about something like the below?

Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
While digging through various opcode manuals is of course forever fun, I
do feel like it might not be the best way.
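
For readers who want the opcode heuristic in isolation, here is a small
standalone sketch of what such an insn_is_fpu()-style predicate could look
like. It only mirrors the opcode ranges used in the patch that follows; the
function name is invented, it ignores the VEX-prefix and 0x0f 0xae/ModRM
cases that the patch also handles, and it should not be read as a complete
or authoritative x86 classification.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * op1 is the first opcode byte; op2 is only looked at for two-byte (0x0f)
 * opcodes. The ranges are copied from the patch, not from an opcode manual.
 */
bool opcode_looks_like_fpu(uint8_t op1, uint8_t op2)
{
	if (op1 >= 0xd8 && op1 <= 0xdf)		/* x87 escape opcodes */
		return true;

	if (op1 != 0x0f)
		return false;

	return (op2 >= 0x10 && op2 <= 0x17) ||
	       (op2 >= 0x28 && op2 <= 0x2f) ||
	       op2 == 0x3a ||
	       (op2 >= 0x50 && op2 <= 0x77) ||
	       (op2 >= 0x7a && op2 <= 0x7f) ||
	       op2 == 0xc2 ||
	       (op2 >= 0xc4 && op2 <= 0xc6) ||
	       op2 >= 0xd0;			/* 0xd0 ... 0xff */
}
```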

---
 arch/x86/include/asm/fpu/api.h      |  7 ++++
 arch/x86/include/asm/fpu/internal.h | 11 ++++++
 arch/x86/kernel/fpu/init.c          |  5 +++
 tools/objtool/arch.h                |  1 +
 tools/objtool/arch/x86/decode.c     | 71 ++++++++++++++++++++++++++-----------
 tools/objtool/check.c               | 58 ++++++++++++++++++++++++++++++
 tools/objtool/check.h               |  3 +-
 7 files changed, 134 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index b774c52e5411..b9bca1cdc875 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,6 +12,13 @@
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>
 
+#define annotate_fpu() ({						\
+	asm volatile("%c0:\n\t"						\
+		     ".pushsection .discard.fpu_safe\n\t"		\
+		     ".long %c0b - .\n\t"				\
+		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+})
+
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 44c48e34d799..bc436213a0d4 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -102,6 +102,11 @@ static inline void fpstate_init_fxstate(struct fxregs_state *fx)
 }
 extern void fpstate_sanitize_xstate(struct fpu *fpu);
 
+#define _ASM_ANNOTATE_FPU(at)						\
+		     ".pushsection .discard.fpu_safe\n"			\
+		     ".long " #at " - .\n"				\
+		     ".popsection\n"					\
+
 #define user_insn(insn, output, input...)				\
 ({									\
 	int err;							\
@@ -116,6 +121,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 		     "    jmp  2b\n"					\
 		     ".previous\n"					\
 		     _ASM_EXTABLE(1b, 3b)				\
+		     _ASM_ANNOTATE_FPU(1b)				\
 		     : [err] "=r" (err), output				\
 		     : "0"(0), input);					\
 	err;								\
@@ -131,6 +137,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 		     "    jmp  2b\n"					\
 		     ".previous\n"					\
 		     _ASM_EXTABLE(1b, 3b)				\
+		     _ASM_ANNOTATE_FPU(1b)				\
 		     : [err] "=r" (err), output				\
 		     : "0"(0), input);					\
 	err;								\
@@ -140,6 +147,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
 	asm volatile("1:" #insn "\n\t"					\
 		     "2:\n"						\
 		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
+		     _ASM_ANNOTATE_FPU(1b)				\
 		     : output : input)
 
 static inline int copy_fregs_to_user(struct fregs_state __user *fx)
@@ -197,6 +205,7 @@ static inline int copy_user_to_fregs(struct fregs_state __user *fx)
 
 static inline void copy_fxregs_to_kernel(struct fpu *fpu)
 {
+	annotate_fpu();
 	if (IS_ENABLED(CONFIG_X86_32))
 		asm volatile( "fxsave %[fx]" : [fx] "=m" (fpu->state.fxsave));
 	else
@@ -437,6 +446,7 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
 	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
 	 * so we have to mark them inactive:
 	 */
+	annotate_fpu();
 	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
 
 	return 0;
@@ -462,6 +472,7 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
 	 * "m" is a random variable that should be in L1.
 	 */
 	if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
+		annotate_fpu();
 		asm volatile(
 			"fnclex\n\t"
 			"emms\n\t"
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6ce7e0a23268..ca7890bd197c 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
 		fpstate_init_soft(&current->thread.fpu.state.soft);
 	else
 #endif
+	{
+		annotate_fpu();
 		asm volatile ("fninit");
+	}
 }
 
 /*
@@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(void)
 	cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
 	write_cr0(cr0);
 
+	annotate_fpu();
 	asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));
 
 	pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, fcw);
@@ -101,6 +105,7 @@ static void __init fpu__init_system_mxcsr(void)
 		/* Static because GCC does not get 16-byte stack alignment right: */
 		static struct fxregs_state fxregs __initdata;
 
+		annotate_fpu();
 		asm volatile("fxsave %0" : "+m" (fxregs));
 
 		mask = fxregs.mxcsr_mask;
diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index ced3765c4f44..e748ddc92958 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -27,6 +27,7 @@ enum insn_type {
 	INSN_CLAC,
 	INSN_STD,
 	INSN_CLD,
+	INSN_FPU,
 	INSN_OTHER,
 };
 
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index a62e032863a8..7be6e1384efb 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -92,8 +92,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 	*len = insn.length;
 	*type = INSN_OTHER;
 
-	if (insn.vex_prefix.nbytes)
+	if (insn.vex_prefix.nbytes) {
+		*type = INSN_FPU;
 		return 0;
+	}
 
 	op1 = insn.opcode.bytes[0];
 	op2 = insn.opcode.bytes[1];
@@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 
 	case 0x0f:
 
-		if (op2 == 0x01) {
-
+		switch (op2) {
+		case 0x01:
 			if (modrm == 0xca)
 				*type = INSN_CLAC;
 			else if (modrm == 0xcb)
 				*type = INSN_STAC;
+			break;
 
-		} else if (op2 >= 0x80 && op2 <= 0x8f) {
-
+		case 0x80 ... 0x8f: /* Jcc */
 			*type = INSN_JUMP_CONDITIONAL;
+			break;
 
-		} else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
-			   op2 == 0x35) {
-
-			/* sysenter, sysret */
+		case 0x05: /* syscall */
+		case 0x07: /* sysret */
+		case 0x34: /* sysenter */
+		case 0x35: /* sysexit */
 			*type = INSN_CONTEXT_SWITCH;
+			break;
 
-		} else if (op2 == 0x0b || op2 == 0xb9) {
-
-			/* ud2 */
+		case 0x0b: /* ud2 */
+		case 0xb9: /* ud1 */
 			*type = INSN_BUG;
+			break;
 
-		} else if (op2 == 0x0d || op2 == 0x1f) {
-
+		case 0x0d:
+		case 0x1f:
 			/* nopl/nopw */
 			*type = INSN_NOP;
+			break;
 
-		} else if (op2 == 0xa0 || op2 == 0xa8) {
-
-			/* push fs/gs */
+		case 0xa0: /* push fs */
+		case 0xa8: /* push gs */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_CONST;
 			op->dest.type = OP_DEST_PUSH;
+			break;
 
-		} else if (op2 == 0xa1 || op2 == 0xa9) {
-
-			/* pop fs/gs */
+		case 0xa1: /* pop fs */
+		case 0xa9: /* pop gs */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_POP;
 			op->dest.type = OP_DEST_MEM;
-		}
+			break;
+
+		case 0xae:
+			/* insane!! */
+			if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 3 && !insn.prefixes.nbytes)
+				*type = INSN_FPU;
+			break;
 
+		case 0x10 ... 0x17:
+		case 0x28 ... 0x2f:
+		case 0x3a:
+		case 0x50 ... 0x77:
+		case 0x7a ... 0x7f:
+		case 0xc2:
+		case 0xc4 ... 0xc6:
+		case 0xd0 ... 0xff:
+			/* MMX, SSE, VMX */
+			*type = INSN_FPU;
+			break;
+
+		default:
+			break;
+		}
 		break;
 
 	case 0xc9:
@@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 
 		break;
 
+	case 0xd8 ... 0xdf: /* x87 FPU range */
+		*type = INSN_FPU;
+		break;
+
 	case 0xe3:
 		/* jecxz/jrcxz */
 		*type = INSN_JUMP_CONDITIONAL;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e3bb76358148..af6be584f6a5 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1316,6 +1316,43 @@ static int read_unwind_hints(struct objtool_file *file)
 	return 0;
 }
 
+static int read_fpu_hints(struct objtool_file *file)
+{
+	struct section *sec;
+	struct instruction *insn;
+	struct rela *rela;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.fpu_safe");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(rela, &sec->rela_list, list) {
+		if (rela->sym->type != STT_SECTION) {
+			WARN("unexpected relocation symbol type in %s", sec->name);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("bad .discard.fpu_safe entry");
+			return -1;
+		}
+
+		if (insn->type != INSN_FPU) {
+			WARN_FUNC("fpu_safe hint not an FPU instruction",
+				  insn->sec, insn->offset);
+//			return -1;
+		}
+
+		while (insn && insn->type == INSN_FPU) {
+			insn->fpu_safe = true;
+			insn = next_insn_same_func(file, insn);
+		}
+	}
+
+	return 0;
+}
+
 static int read_retpoline_hints(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1422,6 +1459,10 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
+	ret = read_fpu_hints(file);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
@@ -2167,6 +2208,16 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
 			if (dead_end_function(file, insn->call_dest))
 				return 0;
 
+			if (insn->call_dest) {
+				if (!strcmp(insn->call_dest->name, "kernel_fpu_begin") ||
+				    !strcmp(insn->call_dest->name, "emulator_get_fpu"))
+					state.fpu = true;
+
+				if (!strcmp(insn->call_dest->name, "kernel_fpu_end") ||
+				    !strcmp(insn->call_dest->name, "emulator_put_fpu"))
+					state.fpu = false;
+			}
+
 			break;
 
 		case INSN_JUMP_CONDITIONAL:
@@ -2275,6 +2326,13 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
 			state.df = false;
 			break;
 
+		case INSN_FPU:
+			if (!state.fpu && !insn->fpu_safe) {
+				WARN_FUNC("FPU instruction outside of kernel_fpu_{begin,end}()", sec, insn->offset);
+				return 1;
+			}
+			break;
+
 		default:
 			break;
 		}
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index f0ce8ffe7135..89c22bcdc64f 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -20,6 +20,7 @@ struct insn_state {
 	unsigned char type;
 	bool bp_scratch;
 	bool drap, end, uaccess, df;
+	bool fpu;
 	unsigned int uaccess_stack;
 	int drap_reg, drap_offset;
 	struct cfi_reg vals[CFI_NUM_REGS];
@@ -34,7 +35,7 @@ struct instruction {
 	enum insn_type type;
 	unsigned long immediate;
 	bool alt_group, dead_end, ignore, hint, save, restore, ignore_alts;
-	bool retpoline_safe;
+	bool retpoline_safe, fpu_safe;
 	u8 visited;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;
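
The state tracking that the validate_branch() hunks above add can be
modelled in a few lines. This is a toy sketch under simplifying assumptions:
it walks a straight-line instruction list with no branches, and all type and
function names are invented for the example. It sets a flag on a call to the
begin helper, clears it on a call to the end helper, and reports the first
FPU instruction that is neither inside that window nor marked by a hint.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_insn_type {
	TOY_OTHER,
	TOY_CALL_FPU_BEGIN,	/* models a call to kernel_fpu_begin() */
	TOY_CALL_FPU_END,	/* models a call to kernel_fpu_end() */
	TOY_FPU,		/* any instruction classified as FPU */
};

struct toy_insn {
	enum toy_insn_type type;
	bool fpu_safe;		/* models a .discard.fpu_safe hint */
};

/*
 * Returns the index of the first unprotected FPU instruction, or -1 if the
 * whole sequence is fine.
 */
long first_unprotected_fpu_insn(const struct toy_insn *insns, size_t n)
{
	bool fpu = false;

	for (size_t i = 0; i < n; i++) {
		switch (insns[i].type) {
		case TOY_CALL_FPU_BEGIN:
			fpu = true;
			break;
		case TOY_CALL_FPU_END:
			fpu = false;
			break;
		case TOY_FPU:
			if (!fpu && !insns[i].fpu_safe)
				return (long)i;
			break;
		default:
			break;
		}
	}

	return -1;
}
```

The real check has to follow every branch and carry the flag along each
path, which this model deliberately leaves out.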


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02  9:36     ` Thomas Gleixner
@ 2020-04-02 14:50       ` Jann Horn
  0 siblings, 0 replies; 37+ messages in thread
From: Jann Horn @ 2020-04-02 14:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christian König, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski

On Thu, Apr 2, 2020 at 11:36 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> Jann Horn <jannh@google.com> writes:
> > On Thu, Apr 2, 2020 at 9:34 AM Christian König <christian.koenig@amd.com> wrote:
> >> On 02.04.20 at 04:34, Jann Horn wrote:
> >> > [x86 folks in CC so that they can chime in on the precise rules for
> >> > this stuff]
>
> They are pretty simple.
>
> Any code using the FPU needs to be completely isolated from regular code,
> either by using inline asm or by moving it to a different compilation
> unit. The invocations need kernel_fpu_begin()/kernel_fpu_end() around
> them, of course.
[...]
> We really need objtool support to validate that.
>
> Peter, now that we know how to do it (noinstr, clac/stac) we can emit
> annotations (see patch below) and validate that any FPU instruction is
> inside a safe region. Hmm?

One annoying aspect is that for the "move it to a different
compilation unit" method, objtool needs to know at compile time
(before linking) which functions are in FPU-enabled object files,
right? So we'd need to have some sort of function annotation that gets
plumbed from the function declaration in a header file through the
compiler into the ELF file, and then let objtool verify that calls to
FPU-enabled methods occur only when the FPU is available? (Ideally
something that covers indirect calls... but this would probably get
really complicated unless we can get the compiler to include that
annotation in its type checking.)


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02 14:13   ` Peter Zijlstra
@ 2020-04-03  5:28     ` Masami Hiramatsu
  2020-04-03 11:21       ` Peter Zijlstra
  2020-04-09 15:59     ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-03  5:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo, mhiramat

On Thu, 2 Apr 2020 16:13:08 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:
> > Hi Jann,
> > 
> > On 02.04.20 at 04:34, Jann Horn wrote:
> > > [x86 folks in CC so that they can chime in on the precise rules for this stuff]
> > > 
> > > Hi!
> > > 
> > > I noticed that several makefiles under drivers/gpu/drm/amd/display/dc/
> > > turn on floating-point instructions in the compiler flags
> > > (-mhard-float, -msse and -msse2) in order to make the "float" and
> > > "double" types usable from C code without requiring helper functions.
> > > 
> > > However, as far as I know, code running in normal kernel context isn't
> > > allowed to use floating-point registers without special protection
> > > using helpers like kernel_fpu_begin() and kernel_fpu_end() (which also
> > > require that the protected code never blocks). If you violate that
> > > rule, that can lead to various issues - among other things, I think
> > > the kernel will clobber userspace FPU register state, and I think the
> > > kernel code can blow up if a context switch happens at the wrong time,
> > > since in-kernel task switches don't preserve FPU state.
> > > 
> > > Is there some hidden trick I'm missing that makes it okay to use FPU
> > > registers here?
> > > 
> > > I would try testing this, but unfortunately none of the AMD devices I
> > > have here have the appropriate graphics hardware...
> > 
> > Yes, using floating-point calculations in the display code has been a
> > source of numerous problems and confusion in the past.
> > 
> > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> > DC_FP_START() and DC_FP_END() macros, which are supposed to hide the
> > architecture-dependent handling for x86 and PPC64.
> > 
> > This originated from the graphics block integrated into AMD CPUs (where we
> > knew which FP unit we had), but as far as I know it is now used for
> > dedicated AMD GPUs as well.
> > 
> > I'm not really a fan of this either, but so far we weren't able to convince
> > the hardware engineers to not use floating point calculations for the
> > display stuff.
> 
> Might I complain that:
> 
> 	make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/
> 
> does not in fact work?
> 
> Anyway, how do people feel about something like the below?
> 
> Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> While digging through various opcode manuals is of course forever fun, I
> do feel like it might not be the best way.

Yes, it is possible to add INAT_FPU and insn_is_fpu().
But it seems that the patch below needs more classification based on
mnemonics or opcodes.

IMHO, it is time to extend gen-insn-attr.awk, or clone it to
generate another opcode map, so that users can easily extend the
insn infrastructure.
(E.g. I once made an in-kernel disassembler, which generates
 mnemonic maps from x86-opcode-map.txt.)
 https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414
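
A table-driven variant of this idea might look roughly like the sketch
below. All names are hypothetical (TOY_INAT_FPU only imitates the proposed
INAT_FPU), and the table is filled in by hand for the one-byte x87 escape
opcodes; in the scheme described above it would instead be generated from
the opcode map.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_INAT_FPU	(1u << 0)	/* hypothetical attribute bit */

/* Hand-written stand-in for a generated per-opcode attribute table. */
static const uint32_t toy_attr_table[256] = {
	[0xd8] = TOY_INAT_FPU, [0xd9] = TOY_INAT_FPU,
	[0xda] = TOY_INAT_FPU, [0xdb] = TOY_INAT_FPU,
	[0xdc] = TOY_INAT_FPU, [0xdd] = TOY_INAT_FPU,
	[0xde] = TOY_INAT_FPU, [0xdf] = TOY_INAT_FPU,
};

struct toy_decoded_insn {
	uint8_t opcode;
	uint32_t attr;
};

/* "Decode" a one-byte opcode by looking up its attributes. */
void toy_insn_decode(struct toy_decoded_insn *insn, uint8_t opcode)
{
	insn->opcode = opcode;
	insn->attr = toy_attr_table[opcode];
}

/* The predicate then reduces to a flag test. */
bool toy_insn_is_fpu(const struct toy_decoded_insn *insn)
{
	return (insn->attr & TOY_INAT_FPU) != 0;
}
```

With the classification kept in a table, adding a new instruction class
means touching the generator input rather than a switch statement in each
consumer.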

Thank you,

> 
> ---
>  arch/x86/include/asm/fpu/api.h      |  7 ++++
>  arch/x86/include/asm/fpu/internal.h | 11 ++++++
>  arch/x86/kernel/fpu/init.c          |  5 +++
>  tools/objtool/arch.h                |  1 +
>  tools/objtool/arch/x86/decode.c     | 71 ++++++++++++++++++++++++++-----------
>  tools/objtool/check.c               | 58 ++++++++++++++++++++++++++++++
>  tools/objtool/check.h               |  3 +-
>  7 files changed, 134 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index b774c52e5411..b9bca1cdc875 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,6 +12,13 @@
>  #define _ASM_X86_FPU_API_H
>  #include <linux/bottom_half.h>
>  
> +#define annotate_fpu() ({						\
> +	asm volatile("%c0:\n\t"						\
> +		     ".pushsection .discard.fpu_safe\n\t"		\
> +		     ".long %c0b - .\n\t"				\
> +		     ".popsection\n\t" : : "i" (__COUNTER__));		\
> +})
> +
>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
>   * disables preemption so be careful if you intend to use it for long periods
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 44c48e34d799..bc436213a0d4 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -102,6 +102,11 @@ static inline void fpstate_init_fxstate(struct fxregs_state *fx)
>  }
>  extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  
> +#define _ASM_ANNOTATE_FPU(at)						\
> +		     ".pushsection .discard.fpu_safe\n"			\
> +		     ".long " #at " - .\n"				\
> +		     ".popsection\n"					\
> +
>  #define user_insn(insn, output, input...)				\
>  ({									\
>  	int err;							\
> @@ -116,6 +121,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  		     "    jmp  2b\n"					\
>  		     ".previous\n"					\
>  		     _ASM_EXTABLE(1b, 3b)				\
> +		     _ASM_ANNOTATE_FPU(1b)				\
>  		     : [err] "=r" (err), output				\
>  		     : "0"(0), input);					\
>  	err;								\
> @@ -131,6 +137,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  		     "    jmp  2b\n"					\
>  		     ".previous\n"					\
>  		     _ASM_EXTABLE(1b, 3b)				\
> +		     _ASM_ANNOTATE_FPU(1b)				\
>  		     : [err] "=r" (err), output				\
>  		     : "0"(0), input);					\
>  	err;								\
> @@ -140,6 +147,7 @@ extern void fpstate_sanitize_xstate(struct fpu *fpu);
>  	asm volatile("1:" #insn "\n\t"					\
>  		     "2:\n"						\
>  		     _ASM_EXTABLE_HANDLE(1b, 2b, ex_handler_fprestore)	\
> +		     _ASM_ANNOTATE_FPU(1b)				\
>  		     : output : input)
>  
>  static inline int copy_fregs_to_user(struct fregs_state __user *fx)
> @@ -197,6 +205,7 @@ static inline int copy_user_to_fregs(struct fregs_state __user *fx)
>  
>  static inline void copy_fxregs_to_kernel(struct fpu *fpu)
>  {
> +	annotate_fpu();
>  	if (IS_ENABLED(CONFIG_X86_32))
>  		asm volatile( "fxsave %[fx]" : [fx] "=m" (fpu->state.fxsave));
>  	else
> @@ -437,6 +446,7 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
>  	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
>  	 * so we have to mark them inactive:
>  	 */
> +	annotate_fpu();
>  	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
>  
>  	return 0;
> @@ -462,6 +472,7 @@ static inline void copy_kernel_to_fpregs(union fpregs_state *fpstate)
>  	 * "m" is a random variable that should be in L1.
>  	 */
>  	if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
> +		annotate_fpu();
>  		asm volatile(
>  			"fnclex\n\t"
>  			"emms\n\t"
> diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
> index 6ce7e0a23268..ca7890bd197c 100644
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
>  		fpstate_init_soft(&current->thread.fpu.state.soft);
>  	else
>  #endif
> +	{
> +		annotate_fpu();
>  		asm volatile ("fninit");
> +	}
>  }
>  
>  /*
> @@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(void)
>  	cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
>  	write_cr0(cr0);
>  
> +	annotate_fpu();
>  	asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));
>  
>  	pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, fcw);
> @@ -101,6 +105,7 @@ static void __init fpu__init_system_mxcsr(void)
>  		/* Static because GCC does not get 16-byte stack alignment right: */
>  		static struct fxregs_state fxregs __initdata;
>  
> +		annotate_fpu();
>  		asm volatile("fxsave %0" : "+m" (fxregs));
>  
>  		mask = fxregs.mxcsr_mask;
> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
> index ced3765c4f44..e748ddc92958 100644
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -27,6 +27,7 @@ enum insn_type {
>  	INSN_CLAC,
>  	INSN_STD,
>  	INSN_CLD,
> +	INSN_FPU,
>  	INSN_OTHER,
>  };
>  
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index a62e032863a8..7be6e1384efb 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -92,8 +92,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
>  	*len = insn.length;
>  	*type = INSN_OTHER;
>  
> -	if (insn.vex_prefix.nbytes)
> +	if (insn.vex_prefix.nbytes) {
> +		*type = INSN_FPU;
>  		return 0;
> +	}
>  
>  	op1 = insn.opcode.bytes[0];
>  	op2 = insn.opcode.bytes[1];
> @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
>  
>  	case 0x0f:
>  
> -		if (op2 == 0x01) {
> -
> +		switch (op2) {
> +		case 0x01:
>  			if (modrm == 0xca)
>  				*type = INSN_CLAC;
>  			else if (modrm == 0xcb)
>  				*type = INSN_STAC;
> +			break;
>  
> -		} else if (op2 >= 0x80 && op2 <= 0x8f) {
> -
> +		case 0x80 ... 0x8f: /* Jcc */
>  			*type = INSN_JUMP_CONDITIONAL;
> +			break;
>  
> -		} else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
> -			   op2 == 0x35) {
> -
> -			/* sysenter, sysret */
> +		case 0x05: /* syscall */
> +		case 0x07: /* sysret */
> +		case 0x34: /* sysenter */
> +		case 0x35: /* sysexit */
>  			*type = INSN_CONTEXT_SWITCH;
> +			break;
>  
> -		} else if (op2 == 0x0b || op2 == 0xb9) {
> -
> -			/* ud2 */
> +		case 0x0b: /* ud2 */
> +		case 0xb9: /* ud1 */
>  			*type = INSN_BUG;
> +			break;
>  
> -		} else if (op2 == 0x0d || op2 == 0x1f) {
> -
> +		case 0x0d:
> +		case 0x1f:
>  			/* nopl/nopw */
>  			*type = INSN_NOP;
> +			break;
>  
> -		} else if (op2 == 0xa0 || op2 == 0xa8) {
> -
> -			/* push fs/gs */
> +		case 0xa0: /* push fs */
> +		case 0xa8: /* push gs */
>  			*type = INSN_STACK;
>  			op->src.type = OP_SRC_CONST;
>  			op->dest.type = OP_DEST_PUSH;
> +			break;
>  
> -		} else if (op2 == 0xa1 || op2 == 0xa9) {
> -
> -			/* pop fs/gs */
> +		case 0xa1: /* pop fs */
> +		case 0xa9: /* pop gs */
>  			*type = INSN_STACK;
>  			op->src.type = OP_SRC_POP;
>  			op->dest.type = OP_DEST_MEM;
> -		}
> +			break;
> +
> +		case 0xae:
> +			/* insane!! */
> +			if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 3 && !insn.prefixes.nbytes)
> +				*type = INSN_FPU;
> +			break;
>  
> +		case 0x10 ... 0x17:
> +		case 0x28 ... 0x2f:
> +		case 0x3a:
> +		case 0x50 ... 0x77:
> +		case 0x7a ... 0x7f:
> +		case 0xc2:
> +		case 0xc4 ... 0xc6:
> +		case 0xd0 ... 0xff:
> +			/* MMX, SSE, VMX */
> +			*type = INSN_FPU;
> +			break;
> +
> +		default:
> +			break;
> +		}
>  		break;
>  
>  	case 0xc9:
> @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
>  
>  		break;
>  
> +	case 0xd8 ... 0xdf: /* x87 FPU range */
> +		*type = INSN_FPU;
> +		break;
> +
>  	case 0xe3:
>  		/* jecxz/jrcxz */
>  		*type = INSN_JUMP_CONDITIONAL;
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index e3bb76358148..af6be584f6a5 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -1316,6 +1316,43 @@ static int read_unwind_hints(struct objtool_file *file)
>  	return 0;
>  }
>  
> +static int read_fpu_hints(struct objtool_file *file)
> +{
> +	struct section *sec;
> +	struct instruction *insn;
> +	struct rela *rela;
> +
> +	sec = find_section_by_name(file->elf, ".rela.discard.fpu_safe");
> +	if (!sec)
> +		return 0;
> +
> +	list_for_each_entry(rela, &sec->rela_list, list) {
> +		if (rela->sym->type != STT_SECTION) {
> +			WARN("unexpected relocation symbol type in %s", sec->name);
> +			return -1;
> +		}
> +
> +		insn = find_insn(file, rela->sym->sec, rela->addend);
> +		if (!insn) {
> +			WARN("bad .discard.fpu_safe entry");
> +			return -1;
> +		}
> +
> +		if (insn->type != INSN_FPU) {
> +			WARN_FUNC("fpu_safe hint not an FPU instruction",
> +				  insn->sec, insn->offset);
> +//			return -1;
> +		}
> +
> +		while (insn && insn->type == INSN_FPU) {
> +			insn->fpu_safe = true;
> +			insn = next_insn_same_func(file, insn);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int read_retpoline_hints(struct objtool_file *file)
>  {
>  	struct section *sec;
> @@ -1422,6 +1459,10 @@ static int decode_sections(struct objtool_file *file)
>  	if (ret)
>  		return ret;
>  
> +	ret = read_fpu_hints(file);
> +	if (ret)
> +		return ret;
> +
>  	return 0;
>  }
>  
> @@ -2167,6 +2208,16 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
>  			if (dead_end_function(file, insn->call_dest))
>  				return 0;
>  
> +			if (insn->call_dest) {
> +				if (!strcmp(insn->call_dest->name, "kernel_fpu_begin") ||
> +				    !strcmp(insn->call_dest->name, "emulator_get_fpu"))
> +					state.fpu = true;
> +
> +				if (!strcmp(insn->call_dest->name, "kernel_fpu_end") ||
> +				    !strcmp(insn->call_dest->name, "emulator_put_fpu"))
> +					state.fpu = false;
> +			}
> +
>  			break;
>  
>  		case INSN_JUMP_CONDITIONAL:
> @@ -2275,6 +2326,13 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
>  			state.df = false;
>  			break;
>  
> +		case INSN_FPU:
> +			if (!state.fpu && !insn->fpu_safe) {
> +				WARN_FUNC("FPU instruction outside of kernel_fpu_{begin,end}()", sec, insn->offset);
> +				return 1;
> +			}
> +			break;
> +
>  		default:
>  			break;
>  		}
> diff --git a/tools/objtool/check.h b/tools/objtool/check.h
> index f0ce8ffe7135..89c22bcdc64f 100644
> --- a/tools/objtool/check.h
> +++ b/tools/objtool/check.h
> @@ -20,6 +20,7 @@ struct insn_state {
>  	unsigned char type;
>  	bool bp_scratch;
>  	bool drap, end, uaccess, df;
> +	bool fpu;
>  	unsigned int uaccess_stack;
>  	int drap_reg, drap_offset;
>  	struct cfi_reg vals[CFI_NUM_REGS];
> @@ -34,7 +35,7 @@ struct instruction {
>  	enum insn_type type;
>  	unsigned long immediate;
>  	bool alt_group, dead_end, ignore, hint, save, restore, ignore_alts;
> -	bool retpoline_safe;
> +	bool retpoline_safe, fpu_safe;
>  	u8 visited;
>  	struct symbol *call_dest;
>  	struct instruction *jump_dest;


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-03  5:28     ` Masami Hiramatsu
@ 2020-04-03 11:21       ` Peter Zijlstra
  2020-04-04  3:08         ` Masami Hiramatsu
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-03 11:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote:
> On Thu, 2 Apr 2020 16:13:08 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> > While digging through various opcode manuals is of course forever fun, I
> > do feel like it might not be the best way.
> 
> Yes, it is possible to add INAT_FPU and insn_is_fpu().
> But it seems that the below patch needs more classification based on
> mnemonic or opcodes.

I went with opcode, and I think I did a fairly decent job, but I did
find a few problems on a second look at things.

I don't think mnemonics are going to help; the x86 mnemonics are a mess
(much like its opcode tables), and there's no way to sanely detect which
registers are affected by an instruction based on its name.

The best I came up with is operand class, see below.

> IMHO, it is the time to expand gen-insn-attr.awk or clone it to
> generate another opcode map, so that user will easily extend the
> insn infrastructure.
> (e.g. I had made an in-kernel disassembler, which generates a mnemonic
>  maps from x86-opcode-map.txt)
>  https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414

Cute, and I'm thinking we might want that eventually. People have been
asking for a kernel-specific objdump, one that knows about and shows all
the magical things the kernel does, like alternatives, jump labels and
soon the static_call stuff, but also things like the exception handling.

Objtool actually knows about much of that, and pairing it with your
disassembler could print it.

> > +	if (insn.vex_prefix.nbytes) {
> > +		*type = INSN_FPU;
> >  		return 0;
> > +	}

So that's the AVX nonsense dealt with; right until they stick an integer
instruction in the AVX space I suppose :/ Please tell me they didn't
already do that..

> >  
> >  	op1 = insn.opcode.bytes[0];
> >  	op2 = insn.opcode.bytes[1];
> > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
> >  
> >  	case 0x0f:
> >  
> > +		switch (op2) {

> > +		case 0xae:
> > +			/* insane!! */
> > +			if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 3 && !insn.prefixes.nbytes)
> > +				*type = INSN_FPU;
> > +			break;

This is crazy, but I was trying to get at the x86 FPU control
instructions:

  FXSAVE, FXRSTOR, LDMXCSR and STMXCSR

Which are in Grp15

Now arguably, I could skip them: the compiler should never emit those,
and the newer, fancier XSAVE family isn't marked as FPU either, even
though it will save/restore the FPU/MMX/SSE/AVX states too.

So I think I'll remove this part, it'll also make the fpu_safe
annotations easier.
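
For the record, the 0f ae check quoted above can be looked at in isolation.
What follows is only a standalone sketch of that condition, not objtool code:
grp15_is_fpu_ctrl() and the two field helpers are hypothetical names, and the
caller is assumed to pass the ModRM byte plus the number of legacy prefix
bytes it already decoded.

```c
#include <stdbool.h>
#include <stdint.h>

/* ModRM layout: mod in bits 7:6, reg in bits 5:3, rm in bits 2:0. */
static inline unsigned int modrm_mod(uint8_t modrm)
{
	return (modrm >> 6) & 0x3;
}

static inline unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 0x7;
}

/*
 * Hypothetical helper: true for 0f ae /0 .. /3 with a memory operand and
 * no legacy prefix, i.e. FXSAVE, FXRSTOR, LDMXCSR and STMXCSR.
 */
static bool grp15_is_fpu_ctrl(uint8_t modrm, int nr_prefixes)
{
	if (nr_prefixes)
		return false;	/* e.g. the F3 forms are RDFSBASE and friends */
	if (modrm_mod(modrm) == 3)
		return false;	/* register form, not a memory operand */
	return modrm_reg(modrm) <= 3;
}
```

So "fxsave (%rax)" (0f ae 00) matches, while "xsave (%rax)" (0f ae 20) and
"lfence" (0f ae e8) do not, which is the XSAVE gap mentioned above.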

> > +		case 0x10 ... 0x17:
> > +		case 0x28 ... 0x2f:
> > +		case 0x3a:
> > +		case 0x50 ... 0x77:
> > +		case 0x7a ... 0x7f:
> > +		case 0xc2:
> > +		case 0xc4 ... 0xc6:
> > +		case 0xd0 ... 0xff:
> > +			/* MMX, SSE, VMX */

So afaict these are the MMX and SSE instructions (clearly the VMX is my
brain losing it).

I went with the coder64 opcode tables, but our x86-opcode-map.txt seems
to agree, mostly.

I now see that 0f 3a is not all mmx/sse, it also includes RORX which is
an integer instruction. Also, may I state that the opcode map is a
sodding disgrace? Why is an integer instruction stuck in the middle of
SSE instructions like that ?!?!

And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0.

Other than that I think this is pretty accurate.

> > +			*type = INSN_FPU;
> > +			break;
> > +
> > +		default:
> > +			break;
> > +		}
> >  		break;
> >  
> >  	case 0xc9:
> > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
> >  
> >  		break;
> >  
> > +	case 0xd8 ... 0xdf: /* x87 FPU range */
> > +		*type = INSN_FPU;
> > +		break;

Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape
table for it. Per:

  http://ref.x86asm.net/coder64.html

these are all the traditional x87 FPU ops.


> > +
> >  	case 0xe3:
> >  		/* jecxz/jrcxz */
> >  		*type = INSN_JUMP_CONDITIONAL;


Now; I suppose I need our x86-opcode-map.txt extended in at least two
ways:

 - all those x87 FPU instructions need adding
 - a way of detecting the affected register set

Now, I suspect we can do the latter using the instruction operands that
are already there, although I've not managed to untangle them fully
(hint, we really should improve the comments on top). Operands seem to
have one capital letter that denotes the class:

 - I: immediate
 - G: general purpose
 - E
 - P,Q: MMX
 - V,M,W,H: SSE

So if we can extend the awk magic to provide operand classes for each
decoded instruction, then that would simplify this lots.
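
A rough sketch of what such an operand-class lookup might look like, written
as plain C rather than awk. Everything here is hypothetical (the enum,
opnd_class(), operands_touch_fpu()), and the letter grouping is copied
straight from the tentative list above without being checked against the
SDM. In particular 'M' probably wants a second look, since the map also has
entries like "LEA Gv,M". 'E' is left unclassified, as in the list. The
operand string is assumed to be comma separated with no spaces, e.g.
"Vps,Wps".

```c
#include <stdbool.h>

enum opnd_class {
	OPC_UNKNOWN,	/* anything not in the list, including 'E' */
	OPC_IMM,
	OPC_GPR,
	OPC_MMX,
	OPC_SSE,
};

/* Classify one operand by its leading capital letter. */
static enum opnd_class opnd_class(const char *opnd)
{
	switch (opnd[0]) {
	case 'I':
		return OPC_IMM;
	case 'G':
		return OPC_GPR;
	case 'P':
	case 'Q':
		return OPC_MMX;
	case 'V':
	case 'M':	/* as per the list above; unverified */
	case 'W':
	case 'H':
		return OPC_SSE;
	default:
		return OPC_UNKNOWN;
	}
}

/* True if any operand in the list falls in the MMX or SSE class. */
static bool operands_touch_fpu(const char *operands)
{
	const char *p = operands;

	while (*p) {
		enum opnd_class c = opnd_class(p);

		if (c == OPC_MMX || c == OPC_SSE)
			return true;

		/* Skip to the start of the next operand, if any. */
		while (*p && *p != ',')
			p++;
		if (*p == ',')
			p++;
	}
	return false;
}
```

With that, "Gv,Ev" comes out as not touching FPU state, while "Pq,Qq" and
"Vps,Wps" do.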

New version below...

---
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,6 +12,13 @@
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>
 
+#define annotate_fpu() ({						\
+	asm volatile("%c0:\n\t"						\
+		     ".pushsection .discard.fpu_safe\n\t"		\
+		     ".long %c0b - .\n\t"				\
+		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+})
+
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate
 	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
 	 * so we have to mark them inactive:
 	 */
+	annotate_fpu();
 	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
 
 	return 0;
@@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs
 	 * "m" is a random variable that should be in L1.
 	 */
 	if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
+		annotate_fpu();
 		asm volatile(
 			"fnclex\n\t"
 			"emms\n\t"
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
 		fpstate_init_soft(&current->thread.fpu.state.soft);
 	else
 #endif
+	{
+		annotate_fpu();
 		asm volatile ("fninit");
+	}
 }
 
 /*
@@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi
 	cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
 	write_cr0(cr0);
 
+	annotate_fpu();
 	asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));
 
 	pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, fcw);
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -27,6 +27,7 @@ enum insn_type {
 	INSN_CLAC,
 	INSN_STD,
 	INSN_CLD,
+	INSN_FPU,
 	INSN_OTHER,
 };
 
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -73,7 +73,7 @@ int arch_decode_instruction(struct elf *
 {
 	struct insn insn;
 	int x86_64, sign;
-	unsigned char op1, op2, rex = 0, rex_b = 0, rex_r = 0, rex_w = 0,
+	unsigned char op1, op2, op3, rex = 0, rex_b = 0, rex_r = 0, rex_w = 0,
 		      rex_x = 0, modrm = 0, modrm_mod = 0, modrm_rm = 0,
 		      modrm_reg = 0, sib = 0;
 
@@ -92,11 +92,14 @@ int arch_decode_instruction(struct elf *
 	*len = insn.length;
 	*type = INSN_OTHER;
 
-	if (insn.vex_prefix.nbytes)
+	if (insn.vex_prefix.nbytes) {
+		*type = INSN_FPU; /* AVX */
 		return 0;
+	}
 
 	op1 = insn.opcode.bytes[0];
 	op2 = insn.opcode.bytes[1];
+	op3 = insn.opcode.bytes[2];
 
 	if (insn.rex_prefix.nbytes) {
 		rex = insn.rex_prefix.bytes[0];
@@ -357,48 +360,75 @@ int arch_decode_instruction(struct elf *
 
 	case 0x0f:
 
-		if (op2 == 0x01) {
-
+		switch (op2) {
+		case 0x01:
 			if (modrm == 0xca)
 				*type = INSN_CLAC;
 			else if (modrm == 0xcb)
 				*type = INSN_STAC;
+			break;
 
-		} else if (op2 >= 0x80 && op2 <= 0x8f) {
-
+		case 0x80 ... 0x8f: /* Jcc */
 			*type = INSN_JUMP_CONDITIONAL;
+			break;
 
-		} else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
-			   op2 == 0x35) {
-
-			/* sysenter, sysret */
+		case 0x05: /* syscall */
+		case 0x07: /* sysret */
+		case 0x34: /* sysenter */
+		case 0x35: /* sysexit */
 			*type = INSN_CONTEXT_SWITCH;
+			break;
 
-		} else if (op2 == 0x0b || op2 == 0xb9) {
-
-			/* ud2 */
+		case 0xff: /* ud0 */
+		case 0xb9: /* ud1 */
+		case 0x0b: /* ud2 */
 			*type = INSN_BUG;
+			break;
 
-		} else if (op2 == 0x0d || op2 == 0x1f) {
-
+		case 0x0d:
+		case 0x1f:
 			/* nopl/nopw */
 			*type = INSN_NOP;
+			break;
 
-		} else if (op2 == 0xa0 || op2 == 0xa8) {
-
-			/* push fs/gs */
+		case 0xa0: /* push fs */
+		case 0xa8: /* push gs */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_CONST;
 			op->dest.type = OP_DEST_PUSH;
+			break;
 
-		} else if (op2 == 0xa1 || op2 == 0xa9) {
-
-			/* pop fs/gs */
+		case 0xa1: /* pop fs */
+		case 0xa9: /* pop gs */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_POP;
 			op->dest.type = OP_DEST_MEM;
-		}
+			break;
+
+		case 0x3a:
+			/* 3 byte escape 0f 3a; SSE4 */
+			switch (op3) {
+			case 0xf0: break; /* exclude RORX */
+			default:
+				 *type = INSN_FPU;
+				 break;
+			}
+			break;
 
+		case 0x10 ... 0x17: /* SSE */
+		case 0x28 ... 0x2f: /* SSE */
+		case 0x50 ... 0x5f: /* SSE */
+		case 0x60 ... 0x77: /* MMX */
+		case 0x7a ... 0x7f: /* MMX */
+		case 0xc2:	    /* SSE */
+		case 0xc4 ... 0xc6: /* SSE */
+		case 0xd0 ... 0xfe: /* MMX */
+			*type = INSN_FPU;
+			break;
+
+		default:
+			break;
+		}
 		break;
 
 	case 0xc9:
@@ -414,6 +444,10 @@ int arch_decode_instruction(struct elf *
 
 		break;
 
+	case 0xd8 ... 0xdf: /* x87 FPU range */
+		*type = INSN_FPU;
+		break;
+
 	case 0xe3:
 		/* jecxz/jrcxz */
 		*type = INSN_JUMP_CONDITIONAL;
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1316,6 +1316,43 @@ static int read_unwind_hints(struct objt
 	return 0;
 }
 
+static int read_fpu_hints(struct objtool_file *file)
+{
+	struct section *sec;
+	struct instruction *insn;
+	struct rela *rela;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.fpu_safe");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(rela, &sec->rela_list, list) {
+		if (rela->sym->type != STT_SECTION) {
+			WARN("unexpected relocation symbol type in %s", sec->name);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("bad .discard.fpu_safe entry");
+			return -1;
+		}
+
+		if (insn->type != INSN_FPU) {
+			WARN_FUNC("fpu_safe hint not an FPU instruction",
+				  insn->sec, insn->offset);
+//			return -1;
+		}
+
+		while (insn && insn->type == INSN_FPU) {
+			insn->fpu_safe = true;
+			insn = next_insn_same_func(file, insn);
+		}
+	}
+
+	return 0;
+}
+
 static int read_retpoline_hints(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1422,6 +1459,10 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;
 
+	ret = read_fpu_hints(file);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
@@ -2167,6 +2208,16 @@ static int validate_branch(struct objtoo
 			if (dead_end_function(file, insn->call_dest))
 				return 0;
 
+			if (insn->call_dest) {
+				if (!strcmp(insn->call_dest->name, "kernel_fpu_begin") ||
+				    !strcmp(insn->call_dest->name, "emulator_get_fpu"))
+					state.fpu = true;
+
+				if (!strcmp(insn->call_dest->name, "kernel_fpu_end") ||
+				    !strcmp(insn->call_dest->name, "emulator_put_fpu"))
+					state.fpu = false;
+			}
+
 			break;
 
 		case INSN_JUMP_CONDITIONAL:
@@ -2275,6 +2326,13 @@ static int validate_branch(struct objtoo
 			state.df = false;
 			break;
 
+		case INSN_FPU:
+			if (!state.fpu && !insn->fpu_safe) {
+				WARN_FUNC("FPU instruction outside of kernel_fpu_{begin,end}()", sec, insn->offset);
+				return 1;
+			}
+			break;
+
 		default:
 			break;
 		}
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -20,6 +20,7 @@ struct insn_state {
 	unsigned char type;
 	bool bp_scratch;
 	bool drap, end, uaccess, df;
+	bool fpu;
 	unsigned int uaccess_stack;
 	int drap_reg, drap_offset;
 	struct cfi_reg vals[CFI_NUM_REGS];
@@ -34,7 +35,7 @@ struct instruction {
 	enum insn_type type;
 	unsigned long immediate;
 	bool alt_group, dead_end, ignore, hint, save, restore, ignore_alts;
-	bool retpoline_safe;
+	bool retpoline_safe, fpu_safe;
 	u8 visited;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-03 11:21       ` Peter Zijlstra
@ 2020-04-04  3:08         ` Masami Hiramatsu
  2020-04-04  3:15           ` Randy Dunlap
                             ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-04  3:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, 3 Apr 2020 13:21:13 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Apr 03, 2020 at 02:28:37PM +0900, Masami Hiramatsu wrote:
> > On Thu, 2 Apr 2020 16:13:08 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Masami, Boris, is there any semi-sane way we can have insn_is_fpu() ?
> > > While digging through various opcode manuals is of course forever fun, I
> > > do feel like it might not be the best way.
> > 
> > Yes, it is possible to add INAT_FPU and insn_is_fpu().
> > But it seems that the below patch needs more classification based on
> > mnemonic or opcodes.
> 
> I went with opcode, and I think I did a fairly decent job, but I did
> find a few problems on a second look at things.
> 
> I don't think mnemonics are going to help; the x86 mnemonics are a mess
> (much like its opcode tables), and there's no way to sanely detect which
> registers are affected by an instruction based on its name.
> 
> The best I came up with is operand class, see below.

Yeah, so we need another map. The current inat map is optimized for
decoding and omits some information to reduce its size.
E.g. it mixes VEX-prefixed instructions with non-VEX ones.

> 
> > IMHO, it is the time to expand gen-insn-attr.awk or clone it to
> > generate another opcode map, so that user will easily extend the
> > insn infrastructure.
> > (e.g. I had made an in-kernel disassembler, which generates a mnemonic
> >  maps from x86-opcode-map.txt)
> >  https://github.com/mhiramat/linux/commits/inkernel-disasm-20130414
> 
> Cute, and I'm thinking we might want that eventually, people have been
> asking for a kernel specific objdump, one that knows about and shows all
> the magical things the kernel does, like alternative, jump-labels and
> soon the static_call stuff, but also things like the exception handling.
> 
> Objtool actually knows about much of that, and pairing it with your
> disassembler could print it.
> 
> > > +	if (insn.vex_prefix.nbytes) {
> > > +		*type = INSN_FPU;
> > >  		return 0;
> > > +	}
> 
> So that's the AVX nonsense dealt with; right until they stick an integer
> instruction in the AVX space I suppose :/ Please tell me they didn't
> already do that..

I'm not so sure.
Theoretically, an x86 instruction can be encoded with a VEX prefix instead
of a REX prefix (most compilers probably won't emit such inefficient code).

> > >  	op1 = insn.opcode.bytes[0];
> > >  	op2 = insn.opcode.bytes[1];
> > > @@ -357,48 +359,71 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
> > >  
> > >  	case 0x0f:
> > >  
> > > +		switch (op2) {
> 
> > > +		case 0xae:
> > > +			/* insane!! */
> > > +			if ((modrm_reg >= 0 && modrm_reg <= 3) && modrm_mod != 3 && !insn.prefixes.nbytes)
> > > +				*type = INSN_FPU;
> > > +			break;
> 
> This is crazy, but I was trying to get at the x86 FPU control
> instructions:
> 
>   FXSAVE, FXRSTOR, LDMXCSR and STMXCSR
> 
> Which are in Grp15

Yes, that is a complex part.

> Now arguably, I could skip them, the compiler should never emit those,
> and the newer, fancier, XSAV family isn't marked as FPU either, even
> though it will save/restore the FPU/MMX/SSE/AVX states too.
> 
> So I think I'll remove this part, it'll also make the fpu_safe
> annotations easier.
> 
> > > +		case 0x10 ... 0x17:
> > > +		case 0x28 ... 0x2f:
> > > +		case 0x3a:
> > > +		case 0x50 ... 0x77:
> > > +		case 0x7a ... 0x7f:
> > > +		case 0xc2:
> > > +		case 0xc4 ... 0xc6:
> > > +		case 0xd0 ... 0xff:
> > > +			/* MMX, SSE, VMX */
> 
> So afaict these are the MMX and SSE instructions (clearly the VMX is my
> brain losing it).
> 
> I went with the coder64 opcode tables, but our x86-opcode-map.txt seems
> to agree, mostly.
> 
> I now see that 0f 3a is not all mmx/sse, it also includes RORX which is
> an integer instruction. Also, may I state that the opcode map is a
> sodding disgrace? Why is an integer instruction stuck in the middle of
> SSE instructions like that ?!?!
> 
> And I should shorten the last range to 0xd0 ... 0xfe, as 0f ff is UD0.
> 
> Other than that I think this is pretty accurate.
> 
> > > +			*type = INSN_FPU;
> > > +			break;
> > > +
> > > +		default:
> > > +			break;
> > > +		}
> > >  		break;
> > >  
> > >  	case 0xc9:
> > > @@ -414,6 +439,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
> > >  
> > >  		break;
> > >  
> > > +	case 0xd8 ... 0xdf: /* x87 FPU range */
> > > +		*type = INSN_FPU;
> > > +		break;
> 
> Our x86-opcode-map.txt lists that as ESC, but doesn't have an escape
> table for it. Per:
> 
>   http://ref.x86asm.net/coder64.html
> 
> these are all the traditional x87 FPU ops.

Yes, for decoding, we don't need those tables.

> > > +
> > >  	case 0xe3:
> > >  		/* jecxz/jrcxz */
> > >  		*type = INSN_JUMP_CONDITIONAL;
> 
> 
> Now; I suppose I need our x86-opcode-map.txt extended in at least two
> ways:
> 
>  - all those x87 FPU instructions need adding
>  - a way of detecting the affected register set
> 
> Now, I suspect we can do that latter by the instruction operands that
> are already there, although I've not managed to untangle them fully
> (hint, we really should improve the comments on top). Operands seem to
> have one capital that denotes the class:
> 
>  - I: immediate
>  - G: general purpose
>  - E
>  - P,Q: MMX
>  - V,M,W,H: SSE
> 
> So if we can extend the awk magic to provide operand classes for each
> decoded instruction, then that would simplify this lots.

Hmm, that requires generating another set of tables. Instead, what about
the patch below? I've added INAT_FPU (and INAT_FPUIFVEX*) flags to find
FPU-related code.

*) Actually, the current inat tables have variant tables for the last-prefix
variations, but they don't have VEX variations. Adding those would double
the size of the tables, which is too much just for FPU opcodes.
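
To illustrate how the two flags are meant to combine, here is a standalone
sketch (not the kernel code). SK_FPU, SK_FPUIFVEX and sketch_is_fpu() are
made-up names for this example only, and has_vex stands in for what
insn_is_avx() would report for the decoded instruction.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for INAT_FPU and INAT_FPUIFVEX. */
#define SK_FPU		(1u << 0)
#define SK_FPUIFVEX	(1u << 1)

/*
 * An opcode is FPU-related if SK_FPU is set; when SK_FPUIFVEX is also
 * set, that only holds for the VEX-encoded form of the opcode.
 */
static bool sketch_is_fpu(unsigned int attr, bool has_vex)
{
	if (!(attr & SK_FPU))
		return false;
	if (attr & SK_FPUIFVEX)
		return has_vex;
	return true;
}
```

An opcode slot shared between a non-VEX integer instruction and a VEX-only
one would carry both flags, so only the VEX form is reported. Every path
returns a value explicitly, including the case where the FPU flag is clear.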

From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Fri, 3 Apr 2020 16:58:22 +0900
Subject: [PATCH] x86: insn: Add insn_is_fpu()

Add insn_is_fpu(insn), which tells whether the insn touches
an MMX/XMM/YMM register or is an FP coprocessor
instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 arch/x86/include/asm/inat.h                |  7 +++++++
 arch/x86/include/asm/insn.h                | 11 +++++++++++
 arch/x86/lib/x86-opcode-map.txt            | 22 +++++++++++-----------
 arch/x86/tools/gen-insn-attr-x86.awk       | 21 ++++++++++++++++-----
 tools/arch/x86/include/asm/inat.h          |  7 +++++++
 tools/arch/x86/include/asm/insn.h          | 11 +++++++++++
 tools/arch/x86/lib/x86-opcode-map.txt      | 22 +++++++++++-----------
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 21 ++++++++++++++++-----
 8 files changed, 90 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..03e711668839 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,17 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..f139bfccfdb9 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -404,17 +404,17 @@ AVXcode: 1
 3f:
 # 0x0f 0x40-0x4f
 40: CMOVO Gv,Ev
-41: CMOVNO Gv,Ev | kandw/q Vk,Hk,Uk | kandb/d Vk,Hk,Uk (66)
-42: CMOVB/C/NAE Gv,Ev | kandnw/q Vk,Hk,Uk | kandnb/d Vk,Hk,Uk (66)
+41: CMOVNO Gv,Ev | kandw/q Vk,Hk,Uk (v) | kandb/d Vk,Hk,Uk (66),(v)
+42: CMOVB/C/NAE Gv,Ev | kandnw/q Vk,Hk,Uk (v) | kandnb/d Vk,Hk,Uk (66),(v)
 43: CMOVAE/NB/NC Gv,Ev
-44: CMOVE/Z Gv,Ev | knotw/q Vk,Uk | knotb/d Vk,Uk (66)
-45: CMOVNE/NZ Gv,Ev | korw/q Vk,Hk,Uk | korb/d Vk,Hk,Uk (66)
-46: CMOVBE/NA Gv,Ev | kxnorw/q Vk,Hk,Uk | kxnorb/d Vk,Hk,Uk (66)
-47: CMOVA/NBE Gv,Ev | kxorw/q Vk,Hk,Uk | kxorb/d Vk,Hk,Uk (66)
+44: CMOVE/Z Gv,Ev | knotw/q Vk,Uk (v) | knotb/d Vk,Uk (66),(v)
+45: CMOVNE/NZ Gv,Ev | korw/q Vk,Hk,Uk (v) | korb/d Vk,Hk,Uk (66),(v)
+46: CMOVBE/NA Gv,Ev | kxnorw/q Vk,Hk,Uk (v) | kxnorb/d Vk,Hk,Uk (66),(v)
+47: CMOVA/NBE Gv,Ev | kxorw/q Vk,Hk,Uk (v) | kxorb/d Vk,Hk,Uk (66),(v)
 48: CMOVS Gv,Ev
 49: CMOVNS Gv,Ev
-4a: CMOVP/PE Gv,Ev | kaddw/q Vk,Hk,Uk | kaddb/d Vk,Hk,Uk (66)
-4b: CMOVNP/PO Gv,Ev | kunpckbw Vk,Hk,Uk (66) | kunpckwd/dq Vk,Hk,Uk
+4a: CMOVP/PE Gv,Ev | kaddw/q Vk,Hk,Uk (v) | kaddb/d Vk,Hk,Uk (66),(v)
+4b: CMOVNP/PO Gv,Ev | kunpckbw Vk,Hk,Uk (66),(v) | kunpckwd/dq Vk,Hk,Uk (v)
 4c: CMOVL/NGE Gv,Ev
 4d: CMOVNL/GE Gv,Ev
 4e: CMOVLE/NG Gv,Ev
@@ -1037,9 +1037,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..2b1ab6673bd3 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+	mmxreg_expr = "^[HLNPQUVW][a-z]+"
+	mmx_expr = "^\\((emms|fxsave|fxrstor|ldmxcsr|stmxcsr)\\)"
+	mmxifvex_expr = "^CMOV" # CMOV is non-vex non-mmx
+	fpu_expr = "^ESC"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 {
 	imm = null
 	mod = null
+	mmx = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, mmxreg_expr) == 1) {
+			mmx = "INAT_FPU"
+		}
 	}
+	if (mmx)
+		imm = add_flags(imm, mmx)
 	return add_flags(imm, mod)
 }
 
@@ -318,9 +327,11 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check coprocessor escape
+		if (match(opcode, fpu_expr) || match(opcode, mmx_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
+		if (match(opcode, mmxifvex_expr))
+			flags = add_flags(flags, "INAT_FPUIFVEX")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 877827b7c2c3..2e6a05290efd 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 568854b14d0a..d21b1debd230 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -129,6 +129,17 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..f139bfccfdb9 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -404,17 +404,17 @@ AVXcode: 1
 3f:
 # 0x0f 0x40-0x4f
 40: CMOVO Gv,Ev
-41: CMOVNO Gv,Ev | kandw/q Vk,Hk,Uk | kandb/d Vk,Hk,Uk (66)
-42: CMOVB/C/NAE Gv,Ev | kandnw/q Vk,Hk,Uk | kandnb/d Vk,Hk,Uk (66)
+41: CMOVNO Gv,Ev | kandw/q Vk,Hk,Uk (v) | kandb/d Vk,Hk,Uk (66),(v)
+42: CMOVB/C/NAE Gv,Ev | kandnw/q Vk,Hk,Uk (v) | kandnb/d Vk,Hk,Uk (66),(v)
 43: CMOVAE/NB/NC Gv,Ev
-44: CMOVE/Z Gv,Ev | knotw/q Vk,Uk | knotb/d Vk,Uk (66)
-45: CMOVNE/NZ Gv,Ev | korw/q Vk,Hk,Uk | korb/d Vk,Hk,Uk (66)
-46: CMOVBE/NA Gv,Ev | kxnorw/q Vk,Hk,Uk | kxnorb/d Vk,Hk,Uk (66)
-47: CMOVA/NBE Gv,Ev | kxorw/q Vk,Hk,Uk | kxorb/d Vk,Hk,Uk (66)
+44: CMOVE/Z Gv,Ev | knotw/q Vk,Uk (v) | knotb/d Vk,Uk (66),(v)
+45: CMOVNE/NZ Gv,Ev | korw/q Vk,Hk,Uk (v) | korb/d Vk,Hk,Uk (66),(v)
+46: CMOVBE/NA Gv,Ev | kxnorw/q Vk,Hk,Uk (v) | kxnorb/d Vk,Hk,Uk (66),(v)
+47: CMOVA/NBE Gv,Ev | kxorw/q Vk,Hk,Uk (v) | kxorb/d Vk,Hk,Uk (66),(v)
 48: CMOVS Gv,Ev
 49: CMOVNS Gv,Ev
-4a: CMOVP/PE Gv,Ev | kaddw/q Vk,Hk,Uk | kaddb/d Vk,Hk,Uk (66)
-4b: CMOVNP/PO Gv,Ev | kunpckbw Vk,Hk,Uk (66) | kunpckwd/dq Vk,Hk,Uk
+4a: CMOVP/PE Gv,Ev | kaddw/q Vk,Hk,Uk (v) | kaddb/d Vk,Hk,Uk (66),(v)
+4b: CMOVNP/PO Gv,Ev | kunpckbw Vk,Hk,Uk (66),(v) | kunpckwd/dq Vk,Hk,Uk (v)
 4c: CMOVL/NGE Gv,Ev
 4d: CMOVNL/GE Gv,Ev
 4e: CMOVLE/NG Gv,Ev
@@ -1037,9 +1037,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..2b1ab6673bd3 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+	mmxreg_expr = "^[HLNPQUVW][a-z]+"
+	mmx_expr = "^\\((emms|fxsave|fxrstor|ldmxcsr|stmxcsr)\\)"
+	mmxifvex_expr = "^CMOV" # CMOV is non-vex non-mmx
+	fpu_expr = "^ESC"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 {
 	imm = null
 	mod = null
+	mmx = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, mmxreg_expr) == 1) {
+			mmx = "INAT_FPU"
+		}
 	}
+	if (mmx)
+		imm = add_flags(imm, mmx)
 	return add_flags(imm, mod)
 }
 
@@ -318,9 +327,11 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check coprocessor escape
+		if (match(opcode, fpu_expr) || match(opcode, mmx_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
+		if (match(opcode, mmxifvex_expr))
+			flags = add_flags(flags, "INAT_FPUIFVEX")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
-- 
2.20.1





-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-04  3:08         ` Masami Hiramatsu
@ 2020-04-04  3:15           ` Randy Dunlap
  2020-04-04  8:32             ` Masami Hiramatsu
  2020-04-04 14:32           ` Peter Zijlstra
  2020-04-04 14:36           ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
  2 siblings, 1 reply; 37+ messages in thread
From: Randy Dunlap @ 2020-04-04  3:15 UTC (permalink / raw)
  To: Masami Hiramatsu, Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On 4/3/20 8:08 PM, Masami Hiramatsu wrote:
> +static inline int insn_is_fpu(struct insn *insn)
> +{
> +	if (!insn->opcode.got)
> +		insn_get_opcode(insn);
> +	if (inat_is_fpu(insn->attr)) {
> +		if (insn->attr & INAT_FPUIFVEX)
> +			return insn_is_avx(insn);
> +		return 1;
> +	}

	return 0; // ??

> +}
> +


-- 
~Randy


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-04  3:15           ` Randy Dunlap
@ 2020-04-04  8:32             ` Masami Hiramatsu
  0 siblings, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-04  8:32 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Peter Zijlstra, Christian König, Jann Horn, Harry Wentland,
	Leo Li, amd-gfx, Alex Deucher, David (ChunMing) Zhou,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, 3 Apr 2020 20:15:11 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 4/3/20 8:08 PM, Masami Hiramatsu wrote:
> > +static inline int insn_is_fpu(struct insn *insn)
> > +{
> > +	if (!insn->opcode.got)
> > +		insn_get_opcode(insn);
> > +	if (inat_is_fpu(insn->attr)) {
> > +		if (insn->attr & INAT_FPUIFVEX)
> > +			return insn_is_avx(insn);
> > +		return 1;
> > +	}
> 
> 	return 0; // ??

Oops, right!

Hm, I need to add a caller for this API...

Thanks,

> 
> > +}
> > +
> 
> 
> -- 
> ~Randy
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-04  3:08         ` Masami Hiramatsu
  2020-04-04  3:15           ` Randy Dunlap
@ 2020-04-04 14:32           ` Peter Zijlstra
  2020-04-05  3:19             ` Masami Hiramatsu
  2020-04-04 14:36           ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
  2 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-04 14:32 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Fri, 3 Apr 2020 16:58:22 +0900
> Subject: [PATCH] x86: insn: Add insn_is_fpu()
> 
> Add insn_is_fpu(insn), which tells whether the insn
> touches an MMX/XMM/YMM register or is an FP coprocessor
> instruction.

Looks good, although I changed it a little like so:

--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -133,11 +133,12 @@ static inline int insn_is_fpu(struct ins
 {
 	if (!insn->opcode.got)
 		insn_get_opcode(insn);
-	if (inat_is_fpu(insn->attr)) {
+	if (inat_is_fpu(insn->attr)) {
 		if (insn->attr & INAT_FPUIFVEX)
 			return insn_is_avx(insn);
 		return 1;
 	}
+	return 0;
 }
 
 static inline int insn_has_emulate_prefix(struct insn *insn)
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -269,14 +269,14 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+d8: FPU
+d9: FPU
+da: FPU
+db: FPU
+dc: FPU
+dd: FPU
+de: FPU
+df: FPU
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,10 +65,11 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	mmxreg_expr = "^[HLNPQUVW][a-z]+"
-	mmx_expr = "^\\((emms|fxsave|fxrstor|ldmxcsr|stmxcsr)\\)"
-	mmxifvex_expr = "^CMOV" # CMOV is non-vex non-mmx
-	fpu_expr = "^ESC"
+
+	mmxreg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	mmx_expr = "^\\(emms\\)"	  # MMX/SSE mnemonics lacking operands
+	mmxifvex_expr = "^CMOV"		  # mnemonics that are NOT AVX
+	fpu_expr = "^FPU"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
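
For readers unfamiliar with the operand notation: mmxreg_expr keys on the
first letter of each operand code. In the SDM's opcode-map notation H, L, N,
P, Q, U, V and W name MMX/XMM/YMM register (or register/memory) operands.
Below is a minimal C sketch of the same match using POSIX regcomp(). The
helper name operand_looks_like_mmx_reg() is made up for illustration and is
not part of the patch.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if an operand token from the opcode
 * map (e.g. "Vk", "Gv") matches the mmxreg_expr pattern quoted above.
 */
static bool operand_looks_like_mmx_reg(const char *operand)
{
	regex_t re;
	bool hit;

	/* Same pattern text as the awk script; '+' needs REG_EXTENDED. */
	if (regcomp(&re, "^[HLNPQUVW][a-z]+", REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	hit = (regexec(&re, operand, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}
```

With this pattern, operands such as "Vk", "Hk" and "Uk" match, while "Gv",
"Ev" and "Md" do not.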


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-04  3:08         ` Masami Hiramatsu
  2020-04-04  3:15           ` Randy Dunlap
  2020-04-04 14:32           ` Peter Zijlstra
@ 2020-04-04 14:36           ` Peter Zijlstra
  2020-04-05  3:37             ` Masami Hiramatsu
  2 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-04 14:36 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Fri, 3 Apr 2020 16:58:22 +0900
> Subject: [PATCH] x86: insn: Add insn_is_fpu()
> 
> Add insn_is_fpu(insn), which tells whether the insn
> touches an MMX/XMM/YMM register or is an FP coprocessor
> instruction.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>

With that I get a lot of warnings:

  FPU instruction outside of kernel_fpu_{begin,end}()

two random examples (x86-64-allmodconfig build):

arch/x86/xen/enlighten.o: warning: objtool: xen_vcpu_restore()+0x341: FPU instruction outside of kernel_fpu_{begin,end}()

$ ./objdump-func.sh defconfig-build/arch/x86/xen/enlighten.o xen_vcpu_restore | grep 341
0341  841:      0f 92 c3                setb   %bl

arch/x86/events/core.o: warning: objtool: x86_pmu_stop()+0x6d: FPU instruction outside of kernel_fpu_{begin,end}()

$ ./objdump-func.sh defconfig-build/arch/x86/events/core.o x86_pmu_stop | grep 6d
006d     23ad:  41 0f 92 c6             setb   %r14b

Which seems to suggest something goes wobbly with SETB, but I'm not
seeing what in a hurry.
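
One possible reading of these two warnings, sketched under assumptions: if
the attribute generator ORs the flags of every "|"-separated alternative on
an opcode-map line into a single attribute word, then a legacy mnemonic that
shares its opcode with a VEX-encoded k-mask instruction inherits INAT_FPU
from that alternative's V-type operand. The CMOVNO/kandw line in the patch
has exactly this shape, and the INAT_FPUIFVEX flag exists to handle it for
CMOV. Whether the SETB line behaves the same way is an assumption here. All
sk_* names below are hypothetical, not kernel identifiers.

```c
#include <stdbool.h>

/* Illustrative stand-ins for INAT_FPU / INAT_FPUIFVEX (bit values made up). */
#define SK_FPU       (1u << 0)
#define SK_FPUIFVEX  (1u << 1)

/* Assumed generator behavior: one table slot holds the OR of all alternatives. */
static unsigned sk_merge_alternatives(unsigned legacy_flags, unsigned vex_flags)
{
	return legacy_flags | vex_flags;
}

/* Same decision structure as insn_is_fpu() in the patch. */
static bool sk_is_fpu(unsigned attr, bool has_vex_prefix)
{
	if (!(attr & SK_FPU))
		return false;
	if (attr & SK_FPUIFVEX)
		return has_vex_prefix;	/* FPU only when VEX-encoded */
	return true;
}
```

In this model, a legacy instruction without SK_FPUIFVEX is reported as FPU
as soon as its VEX sibling carries SK_FPU. Marking the legacy mnemonic with
SK_FPUIFVEX limits the report to encodings that actually have a VEX prefix.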


---
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,6 +12,13 @@
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>

+#define annotate_fpu() ({						\
+	asm volatile("%c0:\n\t"						\
+		     ".pushsection .discard.fpu_safe\n\t"		\
+		     ".long %c0b - .\n\t"				\
+		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+})
+
 /*
  * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
  * disables preemption so be careful if you intend to use it for long periods
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate
 	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
 	 * so we have to mark them inactive:
 	 */
+	annotate_fpu();
 	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));

 	return 0;
@@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs
 	 * "m" is a random variable that should be in L1.
 	 */
 	if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
+		annotate_fpu();
 		asm volatile(
 			"fnclex\n\t"
 			"emms\n\t"
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
 		fpstate_init_soft(&current->thread.fpu.state.soft);
 	else
 #endif
+	{
+		annotate_fpu();
 		asm volatile ("fninit");
+	}
 }

 /*
@@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi
 	cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
 	write_cr0(cr0);

+	annotate_fpu();
 	asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));

 	pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, fcw);
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -27,6 +27,7 @@ enum insn_type {
 	INSN_CLAC,
 	INSN_STD,
 	INSN_CLD,
+	INSN_FPU,
 	INSN_OTHER,
 };

--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -92,6 +92,11 @@ int arch_decode_instruction(struct elf *
 	*len = insn.length;
 	*type = INSN_OTHER;

+	if (insn_is_fpu(&insn)) {
+		*type = INSN_FPU;
+		return 0;
+	}
+
 	if (insn.vex_prefix.nbytes)
 		return 0;

@@ -357,48 +362,54 @@ int arch_decode_instruction(struct elf *

 	case 0x0f:

-		if (op2 == 0x01) {
-
+		switch (op2) {
+		case 0x01:
 			if (modrm == 0xca)
 				*type = INSN_CLAC;
 			else if (modrm == 0xcb)
 				*type = INSN_STAC;
+			break;

-		} else if (op2 >= 0x80 && op2 <= 0x8f) {
-
+		case 0x80 ... 0x8f: /* Jcc */
 			*type = INSN_JUMP_CONDITIONAL;
+			break;

-		} else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
-			   op2 == 0x35) {
-
-			/* sysenter, sysret */
+		case 0x05: /* syscall */
+		case 0x07: /* sysret */
+		case 0x34: /* sysenter */
+		case 0x35: /* sysexit */
 			*type = INSN_CONTEXT_SWITCH;
+			break;

-		} else if (op2 == 0x0b || op2 == 0xb9) {
-
-			/* ud2 */
+		case 0xff: /* ud0 */
+		case 0xb9: /* ud1 */
+		case 0x0b: /* ud2 */
 			*type = INSN_BUG;
+			break;

-		} else if (op2 == 0x0d || op2 == 0x1f) {
-
+		case 0x0d:
+		case 0x1f:
 			/* nopl/nopw */
 			*type = INSN_NOP;
+			break;

-		} else if (op2 == 0xa0 || op2 == 0xa8) {
-
-			/* push fs/gs */
+		case 0xa0: /* push fs */
+		case 0xa8: /* push gs */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_CONST;
 			op->dest.type = OP_DEST_PUSH;
+			break;

-		} else if (op2 == 0xa1 || op2 == 0xa9) {
-
-			/* pop fs/gs */
+		case 0xa1: /* pop fs */
+		case 0xa9: /* pop gs */
 			*type = INSN_STACK;
 			op->src.type = OP_SRC_POP;
 			op->dest.type = OP_DEST_MEM;
-		}
+			break;

+		default:
+			break;
+		}
 		break;

 	case 0xc9:
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1316,6 +1316,43 @@ static int read_unwind_hints(struct objt
 	return 0;
 }

+static int read_fpu_hints(struct objtool_file *file)
+{
+	struct section *sec;
+	struct instruction *insn;
+	struct rela *rela;
+
+	sec = find_section_by_name(file->elf, ".rela.discard.fpu_safe");
+	if (!sec)
+		return 0;
+
+	list_for_each_entry(rela, &sec->rela_list, list) {
+		if (rela->sym->type != STT_SECTION) {
+			WARN("unexpected relocation symbol type in %s", sec->name);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("bad .discard.fpu_safe entry");
+			return -1;
+		}
+
+		if (insn->type != INSN_FPU) {
+			WARN_FUNC("fpu_safe hint not an FPU instruction",
+				  insn->sec, insn->offset);
+//			return -1;
+		}
+
+		while (insn && insn->type == INSN_FPU) {
+			insn->fpu_safe = true;
+			insn = next_insn_same_func(file, insn);
+		}
+	}
+
+	return 0;
+}
+
 static int read_retpoline_hints(struct objtool_file *file)
 {
 	struct section *sec;
@@ -1422,6 +1459,10 @@ static int decode_sections(struct objtoo
 	if (ret)
 		return ret;

+	ret = read_fpu_hints(file);
+	if (ret)
+		return ret;
+
 	return 0;
 }

@@ -2167,6 +2208,16 @@ static int validate_branch(struct objtoo
 			if (dead_end_function(file, insn->call_dest))
 				return 0;

+			if (insn->call_dest) {
+				if (!strcmp(insn->call_dest->name, "kernel_fpu_begin") ||
+				    !strcmp(insn->call_dest->name, "emulator_get_fpu"))
+					state.fpu = true;
+
+				if (!strcmp(insn->call_dest->name, "kernel_fpu_end") ||
+				    !strcmp(insn->call_dest->name, "emulator_put_fpu"))
+					state.fpu = false;
+			}
+
 			break;

 		case INSN_JUMP_CONDITIONAL:
@@ -2275,6 +2326,13 @@ static int validate_branch(struct objtoo
 			state.df = false;
 			break;

+		case INSN_FPU:
+			if (!state.fpu && !insn->fpu_safe) {
+				WARN_FUNC("FPU instruction outside of kernel_fpu_{begin,end}()", sec, insn->offset);
+				return 1;
+			}
+			break;
+
 		default:
 			break;
 		}
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -20,6 +20,7 @@ struct insn_state {
 	unsigned char type;
 	bool bp_scratch;
 	bool drap, end, uaccess, df;
+	bool fpu;
 	unsigned int uaccess_stack;
 	int drap_reg, drap_offset;
 	struct cfi_reg vals[CFI_NUM_REGS];
@@ -34,7 +35,7 @@ struct instruction {
 	enum insn_type type;
 	unsigned long immediate;
 	bool alt_group, dead_end, ignore, hint, save, restore, ignore_alts;
-	bool retpoline_safe;
+	bool retpoline_safe, fpu_safe;
 	u8 visited;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-04 14:32           ` Peter Zijlstra
@ 2020-04-05  3:19             ` Masami Hiramatsu
  2020-04-06 10:21               ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-05  3:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Sat, 4 Apr 2020 16:32:24 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Date: Fri, 3 Apr 2020 16:58:22 +0900
> > Subject: [PATCH] x86: insn: Add insn_is_fpu()
> > 
> > Add insn_is_fpu(insn), which tells whether the insn
> > touches an MMX/XMM/YMM register or is an FP coprocessor
> > instruction.
> 
> Looks good, although I changed it a little like so:

OK, and I found a mistake in my patch. I should not use (v) for these
instructions, because it confuses the decoder.

> 
> --- a/arch/x86/include/asm/insn.h
> +++ b/arch/x86/include/asm/insn.h
> @@ -133,11 +133,12 @@ static inline int insn_is_fpu(struct ins
>  {
>  	if (!insn->opcode.got)
>  		insn_get_opcode(insn);
> -	if (inat_is_fpu(insn->attr)) {
> +	if (inat_is_fpu(insn->attr)) {
>  		if (insn->attr & INAT_FPUIFVEX)
>  			return insn_is_avx(insn);
>  		return 1;
>  	}
> +	return 0;
>  }
>  
>  static inline int insn_has_emulate_prefix(struct insn *insn)
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -269,14 +269,14 @@ d4: AAM Ib (i64)
>  d5: AAD Ib (i64)
>  d6:
>  d7: XLAT/XLATB
> -d8: ESC
> -d9: ESC
> -da: ESC
> -db: ESC
> -dc: ESC
> -dd: ESC
> -de: ESC
> -df: ESC
> +d8: FPU
> +d9: FPU
> +da: FPU
> +db: FPU
> +dc: FPU
> +dd: FPU
> +de: FPU
> +df: FPU

I don't want to use FPU here, since the Intel SDM still uses ESC: these
are the co-processor escape opcodes.
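
Whichever label the map uses, the entries in question are the same eight
one-byte opcodes, 0xd8 through 0xdf. As a minimal sketch (the helper name is
hypothetical and not part of either patch), the range test looks like this:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a one-byte primary opcode lies in the
 * coprocessor escape range d8..df listed in the opcode map above.
 */
static bool is_x87_escape_opcode(uint8_t opcode)
{
	return opcode >= 0xd8 && opcode <= 0xdf;
}
```

For instance, fninit encodes as db e3, so its primary opcode falls inside
the range, while d7 (XLAT) and e0 fall outside it.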

Here is the new patch. 

From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat@kernel.org>
Date: Fri, 3 Apr 2020 16:58:22 +0900
Subject: [PATCH] x86: insn: Add insn_is_fpu()

Add insn_is_fpu(insn), which tells whether the insn
touches an MMX/XMM/YMM register or is an FP coprocessor
instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes:
  - Treat SET* as non-FPU as well (unless it has a VEX prefix).
  - Remove the (v) (VEX-only) flag.
---
 arch/x86/include/asm/inat.h                |  7 +++++++
 arch/x86/include/asm/insn.h                | 12 ++++++++++++
 arch/x86/lib/x86-opcode-map.txt            |  6 +++---
 arch/x86/tools/gen-insn-attr-x86.awk       | 22 +++++++++++++++++-----
 tools/arch/x86/include/asm/inat.h          |  7 +++++++
 tools/arch/x86/include/asm/insn.h          | 12 ++++++++++++
 tools/arch/x86/lib/x86-opcode-map.txt      |  6 +++---
 tools/arch/x86/tools/gen-insn-attr-x86.awk | 22 +++++++++++++++++-----
 8 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..1752c54d2103 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..c3d36b4c894d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -1037,9 +1037,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..21de6757893f 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,7 +65,11 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+
+	mmxreg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	mmx_expr = "^\\((emms|fxsave|fxrstor|ldmxcsr|stmxcsr)\\)" # MMX/SSE mnemonics lacking operands
+	mmxifvex_expr = "^(CMOV|SET.*)" # mnemonics that are NOT AVX
+	fpu_expr = "^ESC"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +240,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 {
 	imm = null
 	mod = null
+	mmx = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +258,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, mmxreg_expr) == 1) {
+			mmx = "INAT_FPU"
+		}
 	}
+	if (mmx)
+		imm = add_flags(imm, mmx)
 	return add_flags(imm, mod)
 }
 
@@ -318,9 +328,11 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check coprocessor escape
+		if (match(opcode, fpu_expr) || match(opcode, mmx_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
+		if (match(opcode, mmxifvex_expr))
+			flags = add_flags(flags, "INAT_FPUIFVEX")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 877827b7c2c3..2e6a05290efd 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 568854b14d0a..d9f6bd9059c1 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..c3d36b4c894d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -1037,9 +1037,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..21de6757893f 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,7 +65,11 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+
+	mmxreg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	mmx_expr = "^\\((emms|fxsave|fxrstor|ldmxcsr|stmxcsr)\\)" # MMX/SSE mnemonics lacking operands
+	mmxifvex_expr = "^(CMOV|SET.*)" # mnemonics that are NOT AVX
+	fpu_expr = "^ESC"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +240,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 {
 	imm = null
 	mod = null
+	mmx = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +258,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, mmxreg_expr) == 1) {
+			mmx = "INAT_FPU"
+		}
 	}
+	if (mmx)
+		imm = add_flags(imm, mmx)
 	return add_flags(imm, mod)
 }
 
@@ -318,9 +328,11 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check coprocessor escape
+		if (match(opcode, fpu_expr) || match(opcode, mmx_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
+		if (match(opcode, mmxifvex_expr))
+			flags = add_flags(flags, "INAT_FPUIFVEX")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
-- 
2.20.1


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-04 14:36           ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
@ 2020-04-05  3:37             ` Masami Hiramatsu
  0 siblings, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-05  3:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Sat, 4 Apr 2020 16:36:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sat, Apr 04, 2020 at 12:08:08PM +0900, Masami Hiramatsu wrote:
> > From c609be0b6403245612503fca1087628655bab96c Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Date: Fri, 3 Apr 2020 16:58:22 +0900
> > Subject: [PATCH] x86: insn: Add insn_is_fpu()
> > 
> > Add insn_is_fpu(insn), which tells whether the insn
> > touches an MMX/XMM/YMM register or is an instruction
> > of the FP coprocessor.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> With that I get a lot of warnings:
> 
>   FPU instruction outside of kernel_fpu_{begin,end}()
> 
> two random examples (x86-64-allmodconfig build):
> 
> arch/x86/xen/enlighten.o: warning: objtool: xen_vcpu_restore()+0x341: FPU instruction outside of kernel_fpu_{begin,end}()
> 
> $ ./objdump-func.sh defconfig-build/arch/x86/xen/enlighten.o xen_vcpu_restore | grep 341
> 0341  841:      0f 92 c3                setb   %bl
> 
> arch/x86/events/core.o: warning: objtool: x86_pmu_stop()+0x6d: FPU instruction outside of kernel_fpu_{begin,end}()
> 
> $ ./objdump-func.sh defconfig-build/arch/x86/events/core.o x86_pmu_stop | grep 6d
> 006d     23ad:  41 0f 92 c6             setb   %r14b
> 
> Which seems to suggest something goes wobbly with SETB, but I'm not
> seeing what in a hurry.

Yes, I also got the same issue; please try the new one.

Thank you!

> 
> 
> ---
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,6 +12,13 @@
>  #define _ASM_X86_FPU_API_H
>  #include <linux/bottom_half.h>
> 
> +#define annotate_fpu() ({						\
> +	asm volatile("%c0:\n\t"						\
> +		     ".pushsection .discard.fpu_safe\n\t"		\
> +		     ".long %c0b - .\n\t"				\
> +		     ".popsection\n\t" : : "i" (__COUNTER__));		\
> +})
> +
>  /*
>   * Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
>   * disables preemption so be careful if you intend to use it for long periods
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -437,6 +437,7 @@ static inline int copy_fpregs_to_fpstate
>  	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
>  	 * so we have to mark them inactive:
>  	 */
> +	annotate_fpu();
>  	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
> 
>  	return 0;
> @@ -462,6 +463,7 @@ static inline void copy_kernel_to_fpregs
>  	 * "m" is a random variable that should be in L1.
>  	 */
>  	if (unlikely(static_cpu_has_bug(X86_BUG_FXSAVE_LEAK))) {
> +		annotate_fpu();
>  		asm volatile(
>  			"fnclex\n\t"
>  			"emms\n\t"
> --- a/arch/x86/kernel/fpu/init.c
> +++ b/arch/x86/kernel/fpu/init.c
> @@ -38,7 +38,10 @@ static void fpu__init_cpu_generic(void)
>  		fpstate_init_soft(&current->thread.fpu.state.soft);
>  	else
>  #endif
> +	{
> +		annotate_fpu();
>  		asm volatile ("fninit");
> +	}
>  }
> 
>  /*
> @@ -61,6 +64,7 @@ static bool fpu__probe_without_cpuid(voi
>  	cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
>  	write_cr0(cr0);
> 
> +	annotate_fpu();
>  	asm volatile("fninit ; fnstsw %0 ; fnstcw %1" : "+m" (fsw), "+m" (fcw));
> 
>  	pr_info("x86/fpu: Probing for FPU: FSW=0x%04hx FCW=0x%04hx\n", fsw, fcw);
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -27,6 +27,7 @@ enum insn_type {
>  	INSN_CLAC,
>  	INSN_STD,
>  	INSN_CLD,
> +	INSN_FPU,
>  	INSN_OTHER,
>  };
> 
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -92,6 +92,11 @@ int arch_decode_instruction(struct elf *
>  	*len = insn.length;
>  	*type = INSN_OTHER;
> 
> +	if (insn_is_fpu(&insn)) {
> +		*type = INSN_FPU;
> +		return 0;
> +	}
> +
>  	if (insn.vex_prefix.nbytes)
>  		return 0;
> 
> @@ -357,48 +362,54 @@ int arch_decode_instruction(struct elf *
> 
>  	case 0x0f:
> 
> -		if (op2 == 0x01) {
> -
> +		switch (op2) {
> +		case 0x01:
>  			if (modrm == 0xca)
>  				*type = INSN_CLAC;
>  			else if (modrm == 0xcb)
>  				*type = INSN_STAC;
> +			break;
> 
> -		} else if (op2 >= 0x80 && op2 <= 0x8f) {
> -
> +		case 0x80 ... 0x8f: /* Jcc */
>  			*type = INSN_JUMP_CONDITIONAL;
> +			break;
> 
> -		} else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
> -			   op2 == 0x35) {
> -
> -			/* sysenter, sysret */
> +		case 0x05: /* syscall */
> +		case 0x07: /* sysret */
> +		case 0x34: /* sysenter */
> +		case 0x35: /* sysexit */
>  			*type = INSN_CONTEXT_SWITCH;
> +			break;
> 
> -		} else if (op2 == 0x0b || op2 == 0xb9) {
> -
> -			/* ud2 */
> +		case 0xff: /* ud0 */
> +		case 0xb9: /* ud1 */
> +		case 0x0b: /* ud2 */
>  			*type = INSN_BUG;
> +			break;
> 
> -		} else if (op2 == 0x0d || op2 == 0x1f) {
> -
> +		case 0x0d:
> +		case 0x1f:
>  			/* nopl/nopw */
>  			*type = INSN_NOP;
> +			break;
> 
> -		} else if (op2 == 0xa0 || op2 == 0xa8) {
> -
> -			/* push fs/gs */
> +		case 0xa0: /* push fs */
> +		case 0xa8: /* push gs */
>  			*type = INSN_STACK;
>  			op->src.type = OP_SRC_CONST;
>  			op->dest.type = OP_DEST_PUSH;
> +			break;
> 
> -		} else if (op2 == 0xa1 || op2 == 0xa9) {
> -
> -			/* pop fs/gs */
> +		case 0xa1: /* pop fs */
> +		case 0xa9: /* pop gs */
>  			*type = INSN_STACK;
>  			op->src.type = OP_SRC_POP;
>  			op->dest.type = OP_DEST_MEM;
> -		}
> +			break;
> 
> +		default:
> +			break;
> +		}
>  		break;
> 
>  	case 0xc9:
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -1316,6 +1316,43 @@ static int read_unwind_hints(struct objt
>  	return 0;
>  }
> 
> +static int read_fpu_hints(struct objtool_file *file)
> +{
> +	struct section *sec;
> +	struct instruction *insn;
> +	struct rela *rela;
> +
> +	sec = find_section_by_name(file->elf, ".rela.discard.fpu_safe");
> +	if (!sec)
> +		return 0;
> +
> +	list_for_each_entry(rela, &sec->rela_list, list) {
> +		if (rela->sym->type != STT_SECTION) {
> +			WARN("unexpected relocation symbol type in %s", sec->name);
> +			return -1;
> +		}
> +
> +		insn = find_insn(file, rela->sym->sec, rela->addend);
> +		if (!insn) {
> +			WARN("bad .discard.fpu_safe entry");
> +			return -1;
> +		}
> +
> +		if (insn->type != INSN_FPU) {
> +			WARN_FUNC("fpu_safe hint not an FPU instruction",
> +				  insn->sec, insn->offset);
> +//			return -1;
> +		}
> +
> +		while (insn && insn->type == INSN_FPU) {
> +			insn->fpu_safe = true;
> +			insn = next_insn_same_func(file, insn);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
>  static int read_retpoline_hints(struct objtool_file *file)
>  {
>  	struct section *sec;
> @@ -1422,6 +1459,10 @@ static int decode_sections(struct objtoo
>  	if (ret)
>  		return ret;
> 
> +	ret = read_fpu_hints(file);
> +	if (ret)
> +		return ret;
> +
>  	return 0;
>  }
> 
> @@ -2167,6 +2208,16 @@ static int validate_branch(struct objtoo
>  			if (dead_end_function(file, insn->call_dest))
>  				return 0;
> 
> +			if (insn->call_dest) {
> +				if (!strcmp(insn->call_dest->name, "kernel_fpu_begin") ||
> +				    !strcmp(insn->call_dest->name, "emulator_get_fpu"))
> +					state.fpu = true;
> +
> +				if (!strcmp(insn->call_dest->name, "kernel_fpu_end") ||
> +				    !strcmp(insn->call_dest->name, "emulator_put_fpu"))
> +					state.fpu = false;
> +			}
> +
>  			break;
> 
>  		case INSN_JUMP_CONDITIONAL:
> @@ -2275,6 +2326,13 @@ static int validate_branch(struct objtoo
>  			state.df = false;
>  			break;
> 
> +		case INSN_FPU:
> +			if (!state.fpu && !insn->fpu_safe) {
> +				WARN_FUNC("FPU instruction outside of kernel_fpu_{begin,end}()", sec, insn->offset);
> +				return 1;
> +			}
> +			break;
> +
>  		default:
>  			break;
>  		}
> --- a/tools/objtool/check.h
> +++ b/tools/objtool/check.h
> @@ -20,6 +20,7 @@ struct insn_state {
>  	unsigned char type;
>  	bool bp_scratch;
>  	bool drap, end, uaccess, df;
> +	bool fpu;
>  	unsigned int uaccess_stack;
>  	int drap_reg, drap_offset;
>  	struct cfi_reg vals[CFI_NUM_REGS];
> @@ -34,7 +35,7 @@ struct instruction {
>  	enum insn_type type;
>  	unsigned long immediate;
>  	bool alt_group, dead_end, ignore, hint, save, restore, ignore_alts;
> -	bool retpoline_safe;
> +	bool retpoline_safe, fpu_safe;
>  	u8 visited;
>  	struct symbol *call_dest;
>  	struct instruction *jump_dest;
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-05  3:19             ` Masami Hiramatsu
@ 2020-04-06 10:21               ` Peter Zijlstra
  2020-04-07  9:50                 ` Masami Hiramatsu
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-06 10:21 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Sun, Apr 05, 2020 at 12:19:30PM +0900, Masami Hiramatsu wrote:

> > @@ -269,14 +269,14 @@ d4: AAM Ib (i64)
> >  d5: AAD Ib (i64)
> >  d6:
> >  d7: XLAT/XLATB
> > -d8: ESC
> > -d9: ESC
> > -da: ESC
> > -db: ESC
> > -dc: ESC
> > -dd: ESC
> > -de: ESC
> > -df: ESC
> > +d8: FPU
> > +d9: FPU
> > +da: FPU
> > +db: FPU
> > +dc: FPU
> > +dd: FPU
> > +de: FPU
> > +df: FPU
> 
> I don't want to use FPU since Intel SDM is still using ESC because it
> is co-processor escape code.

But we all know that co-processor is x87. Can we then perhaps put in
'x87' as an escape code instead of 'ESC' ?

> Here is the new patch. 
> 
> From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <mhiramat@kernel.org>
> Date: Fri, 3 Apr 2020 16:58:22 +0900
> Subject: [PATCH] x86: insn: Add insn_is_fpu()
> 
> Add insn_is_fpu(insn), which tells whether the insn
> touches an MMX/XMM/YMM register or is an instruction
> of the FP coprocessor.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>

arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: fpu_safe hint not an FPU instruction
008b  36b:      48 0f ae 0d 00 00 00    fxrstor64 0x0(%rip)        # 373 <ex_handler_fprestore+0x93>

arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: fpu_safe hint not an FPU instruction
01fa    1d2fa:  48 0f ae 4b 40          fxrstor64 0x40(%rbx)



Also, all the VMX bits seem to qualify as FPU (I can't remember seeing
that previously):

arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_eoi_induced()+0x20: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_write()+0x20: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invlpg()+0x20: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_interrupt_shadow()+0x1c: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr4_guest_bits()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr0_guest_bits()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_io()+0x24: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_access()+0x39: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_idt()+0x57: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_exit_info()+0x58: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_gdt()+0x57: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_guest_apic_has_interrupt()+0xf8: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_nmi_allowed()+0x98: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit_irqoff()+0xb3: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_nmi_mask()+0x8a: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_rflags()+0x99: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_ept_misconfig()+0x22: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_interrupt_allowed()+0x8d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_write_pml_buffer()+0x1c5: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invpcid()+0x26a: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_ar()+0x9b: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_selector()+0x96: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_base()+0x9b: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_segment()+0x2da: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmwrite_error()+0x161: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: exec_controls_set.isra.0()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_dr()+0x1bc: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: update_exception_bitmap()+0x136: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_interrupt_shadow()+0x8b: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: skip_emulated_instruction()+0xe9: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_exception_nmi()+0x674: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_clear_hlt.isra.0()+0xe5: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_idt()+0x65: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_gdt()+0x65: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: seg_setup()+0x125: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_nmi_mask()+0x11c: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: __vmx_complete_interrupts()+0x167: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_task_switch()+0x34d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_dr7()+0x1e: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: update_cr8_intercept()+0x1a2: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_hwapic_isr_update()+0x8a: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_rvi()+0x87: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_load_eoi_exitmap()+0x107: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_rflags()+0x20f: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: fix_rmode_seg()+0x1de: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_segment()+0x28b: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: enter_pmode()+0x1ad: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: enter_rmode()+0x27e: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_write_l1_tsc_offset()+0x11b: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_pml_full()+0x138: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_cancel_injection()+0x5d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_queue_exception()+0x10d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_inject_nmi()+0xd1: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_inject_irq()+0x127: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_ept_violation()+0x13e: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: enable_irq_window()+0x71: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_interrupt_window()+0x71: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_nmi_window()+0x92: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_sync_dirty_debug_regs()+0x203: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: enable_nmi_window()+0xca: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: clear_atomic_switch_msr()+0x2ba: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: add_atomic_switch_msr.constprop.0()+0x314: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_run()+0xc16: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_host_fs_gs()+0x1f9: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_load_vmcs()+0x404: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_cr4()+0x240: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_msr()+0x49d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_cpuid_update()+0xc89: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: ept_save_pdptrs()+0x176: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_cache_reg()+0xe3: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_cr3()+0x47: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_apic_access_page_addr()+0x9f: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_efer()+0x22f: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_cr0()+0x15e: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_cr()+0xe1d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_reset()+0x1a98: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_virtual_apic_mode()+0x216: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_refresh_apicv_exec_ctrl()+0x2e0: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_msr()+0xf26: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_constant_host_state()+0x364: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: set_cr4_guest_host_mask()+0xda: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: init_vmcs()+0x1705: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: dump_vmcs.cold()+0x193c: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit()+0xcfc: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_update_host_rsp()+0x65: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_dump_dtsel()+0x5d: FPU instruction outside of kernel_fpu_{begin,end}()
arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_dump_sel()+0xda: FPU instruction outside of kernel_fpu_{begin,end}()


./objdump-func.sh defconfig-build/arch/x86/kvm/vmx/vmx.o ept_save_pdptrs | grep 176
0176    1d436:  41 0f 78 c4             vmread %rax,%r12



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-06 10:21               ` Peter Zijlstra
@ 2020-04-07  9:50                 ` Masami Hiramatsu
  2020-04-07 11:15                   ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-07  9:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Mon, 6 Apr 2020 12:21:07 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Sun, Apr 05, 2020 at 12:19:30PM +0900, Masami Hiramatsu wrote:
> 
> > > @@ -269,14 +269,14 @@ d4: AAM Ib (i64)
> > >  d5: AAD Ib (i64)
> > >  d6:
> > >  d7: XLAT/XLATB
> > > -d8: ESC
> > > -d9: ESC
> > > -da: ESC
> > > -db: ESC
> > > -dc: ESC
> > > -dd: ESC
> > > -de: ESC
> > > -df: ESC
> > > +d8: FPU
> > > +d9: FPU
> > > +da: FPU
> > > +db: FPU
> > > +dc: FPU
> > > +dd: FPU
> > > +de: FPU
> > > +df: FPU
> > 
> > I don't want to use FPU since Intel SDM is still using ESC because it
> > is co-processor escape code.
> 
> But we all know that co-processor is x87. Can we then perhaps put in
> 'x87' as an escape code instead of 'ESC' ?

Hmm, x87 might be good, but it still needs a comment.

> 
> > Here is the new patch. 
> > 
> > From d7eca4946ab3f0d08ad1268f49418f8655aaf57c Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <mhiramat@kernel.org>
> > Date: Fri, 3 Apr 2020 16:58:22 +0900
> > Subject: [PATCH] x86: insn: Add insn_is_fpu()
> > 
> > Add insn_is_fpu(insn), which tells whether the insn
> > touches an MMX/XMM/YMM register or is an instruction
> > of the FP coprocessor.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> 
> arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: fpu_safe hint not an FPU instruction
> 008b  36b:      48 0f ae 0d 00 00 00    fxrstor64 0x0(%rip)        # 373 <ex_handler_fprestore+0x93>
> 
> arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: fpu_safe hint not an FPU instruction
> 01fa    1d2fa:  48 0f ae 4b 40          fxrstor64 0x40(%rbx)

Ah, fxstor will not change the FPU/MMX/SSE regs but just stores them to memory.
OK, I'll remove it from the list.

> Also, all the VMX bits seem to qualify as FPU (I can't remember seeing
> that previously):

Oops, let me check it.

Thanks!

> 
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_eoi_induced()+0x20: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_write()+0x20: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invlpg()+0x20: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_interrupt_shadow()+0x1c: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr4_guest_bits()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_decache_cr0_guest_bits()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_io()+0x24: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_apic_access()+0x39: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_idt()+0x57: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_exit_info()+0x58: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_gdt()+0x57: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_guest_apic_has_interrupt()+0xf8: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_nmi_allowed()+0x98: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit_irqoff()+0xb3: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_nmi_mask()+0x8a: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_rflags()+0x99: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_ept_misconfig()+0x22: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_interrupt_allowed()+0x8d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_write_pml_buffer()+0x1c5: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_invpcid()+0x26a: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_ar()+0x9b: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_selector()+0x96: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_read_guest_seg_base()+0x9b: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_segment()+0x2da: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmwrite_error()+0x161: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: exec_controls_set.isra.0()+0x5a: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_dr()+0x1bc: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: update_exception_bitmap()+0x136: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_interrupt_shadow()+0x8b: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: skip_emulated_instruction()+0xe9: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_exception_nmi()+0x674: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_clear_hlt.isra.0()+0xe5: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_idt()+0x65: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_gdt()+0x65: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: seg_setup()+0x125: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_nmi_mask()+0x11c: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: __vmx_complete_interrupts()+0x167: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_task_switch()+0x34d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_dr7()+0x1e: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: update_cr8_intercept()+0x1a2: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_hwapic_isr_update()+0x8a: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_rvi()+0x87: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_load_eoi_exitmap()+0x107: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_rflags()+0x20f: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: fix_rmode_seg()+0x1de: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_segment()+0x28b: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: enter_pmode()+0x1ad: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: enter_rmode()+0x27e: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_write_l1_tsc_offset()+0x11b: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_pml_full()+0x138: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_cancel_injection()+0x5d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_queue_exception()+0x10d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_inject_nmi()+0xd1: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_inject_irq()+0x127: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_ept_violation()+0x13e: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: enable_irq_window()+0x71: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_interrupt_window()+0x71: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_nmi_window()+0x92: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_sync_dirty_debug_regs()+0x203: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: enable_nmi_window()+0xca: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: clear_atomic_switch_msr()+0x2ba: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: add_atomic_switch_msr.constprop.0()+0x314: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_run()+0xc16: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_host_fs_gs()+0x1f9: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_load_vmcs()+0x404: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_cr4()+0x240: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_get_msr()+0x49d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_cpuid_update()+0xc89: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: ept_save_pdptrs()+0x176: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_cache_reg()+0xe3: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_cr3()+0x47: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_apic_access_page_addr()+0x9f: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_efer()+0x22f: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_cr0()+0x15e: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: handle_cr()+0xe1d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_reset()+0x1a98: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_virtual_apic_mode()+0x216: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_refresh_apicv_exec_ctrl()+0x2e0: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_msr()+0xf26: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_set_constant_host_state()+0x364: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: set_cr4_guest_host_mask()+0xda: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: init_vmcs()+0x1705: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: dump_vmcs.cold()+0x193c: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_handle_exit()+0xcfc: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_update_host_rsp()+0x65: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_dump_dtsel()+0x5d: FPU instruction outside of kernel_fpu_{begin,end}()
> arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_dump_sel()+0xda: FPU instruction outside of kernel_fpu_{begin,end}()
> 
> 
> ./objdump-func.sh defconfig-build/arch/x86/kvm/vmx/vmx.o ept_save_pdptrs | grep 176
> 0176    1d436:  41 0f 78 c4             vmread %rax,%r12
> 
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-07  9:50                 ` Masami Hiramatsu
@ 2020-04-07 11:15                   ` Peter Zijlstra
  2020-04-07 15:41                     ` Masami Hiramatsu
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-07 11:15 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Tue, Apr 07, 2020 at 06:50:08PM +0900, Masami Hiramatsu wrote:
> On Mon, 6 Apr 2020 12:21:07 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: fpu_safe hint not an FPU instruction
> > 008b  36b:      48 0f ae 0d 00 00 00    fxrstor64 0x0(%rip)        # 373 <ex_handler_fprestore+0x93>
> > 
> > arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: fpu_safe hint not an FPU instruction
> > 01fa    1d2fa:  48 0f ae 4b 40          fxrstor64 0x40(%rbx)
> 
> Ah, fxstor will not change the FPU/MMX/SSE regs but just store them in memory.
> OK, I'll remove it from the list.

Yeah, I don't much care if it's in or out, but the way I was reading that
patch it _should_ be in, but then it doesn't seem to recognise it.
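
For reference, a minimal sketch of how the two byte sequences quoted above
select an entry of Grp15: 0f ae is a group opcode, and the reg field
(bits 5:3) of the ModRM byte that follows picks the slot. The helper name
modrm_reg() is made up for illustration and is not part of the kernel's
insn decoder.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel API): return the reg field,
 * bits 5:3, of a ModRM byte. For group opcodes such as 0f ae this
 * field selects the entry inside the group table.
 */
static unsigned int modrm_reg(unsigned char modrm)
{
	return (modrm >> 3) & 0x7;
}

/* Print the group slot ("/digit") selected by a ModRM byte. */
static void print_group_slot(const char *what, unsigned char modrm)
{
	printf("%s: modrm=0x%02x -> /%u\n", what, modrm, modrm_reg(modrm));
}
```

Calling modrm_reg() on the ModRM bytes of the two quoted encodings (0x0d
from "48 0f ae 0d ..." and 0x4b from "48 0f ae 4b 40") gives 1 in both
cases, i.e. slot 1 of Grp15, which is the fxrstor entry in the opcode map.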

> > Also, all the VMX bits seem to qualify as FPU (I can't remember seeing
> > that previously):
> 
> Oops, let me check it.

I just sent you another patch that could do with insn_is_vmx()
(sorry!!!)


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-07 11:15                   ` Peter Zijlstra
@ 2020-04-07 15:41                     ` Masami Hiramatsu
  2020-04-07 15:43                       ` [PATCH] x86: insn: Add insn_is_fpu() Masami Hiramatsu
  2020-04-07 15:54                       ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
  0 siblings, 2 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-07 15:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Tue, 7 Apr 2020 13:15:35 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Apr 07, 2020 at 06:50:08PM +0900, Masami Hiramatsu wrote:
> > On Mon, 6 Apr 2020 12:21:07 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > arch/x86/mm/extable.o: warning: objtool: ex_handler_fprestore()+0x8b: fpu_safe hint not an FPU instruction
> > > 008b  36b:      48 0f ae 0d 00 00 00    fxrstor64 0x0(%rip)        # 373 <ex_handler_fprestore+0x93>
> > > 
> > > arch/x86/kvm/x86.o: warning: objtool: kvm_load_guest_fpu.isra.0()+0x1fa: fpu_safe hint not an FPU instruction
> > > 01fa    1d2fa:  48 0f ae 4b 40          fxrstor64 0x40(%rbx)
> > 
> > Ah, fxstor will not change the FPU/MMX/SSE regs but just store them in memory.
> > OK, I'll remove it from the list.
> 
> Yeah, I don't much care if it's in or out, but the way I was reading that
> patch it _should_ be in, but then it doesn't seem to recognise it.

Oops, I misread. OK. I fixed the issue.

> 
> > > Also, all the VMX bits seem to qualify as FPU (I can't remember seeing
> > > that previously):
> > 
> > Oops, let me check it.
> 
> I just sent you another patch that could do with insn_is_vmx()
> (sorry!!!)

Hmm, it is hard to pick out the VMX insns. Maybe we need to identify them by
opcode pattern (like "VM.*").

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* [PATCH] x86: insn: Add insn_is_fpu()
  2020-04-07 15:41                     ` Masami Hiramatsu
@ 2020-04-07 15:43                       ` Masami Hiramatsu
  2020-04-07 15:54                       ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
  1 sibling, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-07 15:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

Add insn_is_fpu(insn), which tells whether the insn touches an
MMX/XMM/YMM register or is an x87 FP coprocessor
instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes:
 - Fix the pattern for MMX/SSE opcodes that take no operands
 - Add INAT_FPUIFVEX when the first opcode of an entry isn't FPU but a
   later one is, instead of relying on a mnemonic pattern.
---
 arch/x86/include/asm/inat.h                |    7 ++++
 arch/x86/include/asm/insn.h                |   12 +++++++
 arch/x86/lib/x86-opcode-map.txt            |   25 ++++++++------
 arch/x86/tools/gen-insn-attr-x86.awk       |   51 ++++++++++++++++++++++++----
 tools/arch/x86/include/asm/inat.h          |    7 ++++
 tools/arch/x86/include/asm/insn.h          |   12 +++++++
 tools/arch/x86/lib/x86-opcode-map.txt      |   25 ++++++++------
 tools/arch/x86/tools/gen-insn-attr-x86.awk |   51 ++++++++++++++++++++++++----
 8 files changed, 154 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..1752c54d2103 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..5470d378731a 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -269,14 +269,17 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcodes as ESC (Escape to
+# coprocessor instruction set). Since the coprocessor means only x87 FPU
+# now, make it "x87" instead of "ESC".
+d8: x87
+d9: x87
+da: x87
+db: x87
+dc: x87
+dd: x87
+de: x87
+df: x87
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -1037,9 +1040,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..d74d9e605723 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+
+	mmxreg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	mmx_expr = "^(emms|fxsave|fxrstor|ldmxcsr|stmxcsr)" # MMX/SSE mnemonics lacking operands
+	fpu_expr = "^x87"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 {
 	imm = null
 	mod = null
+	mmx = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, mmxreg_expr) == 1) {
+			mmx = "INAT_FPU"
+		}
 	}
+	if (mmx)
+		imm = add_flags(imm, mmx)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +292,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -294,6 +307,7 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 			opnd = $i
 			count = split($(i++), opnds, ",")
 			flags = convert_operands(count, opnds)
+
 		}
 		if (match($i, ext_expr))
 			ext = $(i++)
@@ -318,9 +332,9 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check coprocessor escape
+		if (match(opcode, fpu_expr) || match(opcode, mmx_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
@@ -336,22 +350,45 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU")
+				lpfpu[0] = 1
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 877827b7c2c3..2e6a05290efd 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 568854b14d0a..d9f6bd9059c1 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..5470d378731a 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -269,14 +269,17 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcodes as ESC (Escape to
+# coprocessor instruction set). Since the coprocessor means only x87 FPU
+# now, make it "x87" instead of "ESC".
+d8: x87
+d9: x87
+da: x87
+db: x87
+dc: x87
+dd: x87
+de: x87
+df: x87
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -1037,9 +1040,9 @@ EndTable
 
 GrpTable: Grp15
 0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+1: fxrstor | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..d74d9e605723 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+
+	mmxreg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	mmx_expr = "^(emms|fxsave|fxrstor|ldmxcsr|stmxcsr)" # MMX/SSE mnemonics lacking operands
+	fpu_expr = "^x87"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,mmx)
 {
 	imm = null
 	mod = null
+	mmx = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, mmxreg_expr) == 1) {
+			mmx = "INAT_FPU"
+		}
 	}
+	if (mmx)
+		imm = add_flags(imm, mmx)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +292,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -294,6 +307,7 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 			opnd = $i
 			count = split($(i++), opnds, ",")
 			flags = convert_operands(count, opnds)
+
 		}
 		if (match($i, ext_expr))
 			ext = $(i++)
@@ -318,9 +332,9 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check coprocessor escape
+		if (match(opcode, fpu_expr) || match(opcode, mmx_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
@@ -336,22 +350,45 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU")
+				lpfpu[0] = 1
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}
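
As an aside, the decision that insn_is_fpu() makes in the patch above can be
sketched in isolation. This is only an illustration under stated
assumptions: the SKETCH_* bit positions are placeholders (the real values
depend on INAT_FLAG_OFFS in asm/inat.h), and sketch_is_fpu() is a made-up
name that takes a "has VEX prefix" boolean where the patch calls
insn_is_avx().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit positions, chosen for this sketch only. They stand
 * in for INAT_FPU and INAT_FPUIFVEX, whose real values are derived
 * from INAT_FLAG_OFFS and are not reproduced here.
 */
#define SKETCH_FPU	(1u << 0)
#define SKETCH_FPUIFVEX	(1u << 1)

/*
 * Same three-way decision as insn_is_fpu() in the patch:
 *  - FPU flag clear:            not an FPU instruction
 *  - FPU flag + "if VEX" flag:  FPU only when VEX-encoded
 *  - FPU flag alone:            always an FPU instruction
 */
static bool sketch_is_fpu(uint32_t attr, bool has_vex_prefix)
{
	if (!(attr & SKETCH_FPU))
		return false;
	if (attr & SKETCH_FPUIFVEX)
		return has_vex_prefix;
	return true;
}
```

As I read the awk changes, an opcode-map line such as 0f 78, where VMREAD
comes first and the vcvtt* variants follow, would end up with both flags
set, so a plain vmread (no VEX prefix) is no longer reported as FPU while
the VEX-encoded variants still are.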



* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-07 15:41                     ` Masami Hiramatsu
  2020-04-07 15:43                       ` [PATCH] x86: insn: Add insn_is_fpu() Masami Hiramatsu
@ 2020-04-07 15:54                       ` Peter Zijlstra
  2020-04-08  0:31                         ` Masami Hiramatsu
  2020-04-08 16:09                         ` [PATCH v2] x86: insn: Add insn_is_fpu() Masami Hiramatsu
  1 sibling, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-07 15:54 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Wed, Apr 08, 2020 at 12:41:11AM +0900, Masami Hiramatsu wrote:
> On Tue, 7 Apr 2020 13:15:35 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:

> > > > Also, all the VMX bits seem to qualify as FPU (I can't remember seeing
> > > > that previously):
> > > 
> > > Oops, let me check it.
> > 
> > I just sent you another patch that could do with insn_is_vmx()
> > (sorry!!!)
> 
> Hmm, it is hard to pick out the VMX insns. Maybe we need to identify them by
> opcode pattern (like "VM.*").

Yeah, I know. Maybe I should just keep it as I have for now.

One thing I thought of is we could perhaps add manual markers in
x86-opcode-map.txt. The '{','}' characters appear unused so far, so
perhaps we can use them to classify things.

That could maybe replace "mmx_expr" as well. That is, something like so:

---

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..e01b76e0a294 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -462,9 +462,9 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
-78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
-79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
+77: emms {FPU} | vzeroupper | vzeroall
+78: VMREAD Ey,Gy {VMX} | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
+79: VMWRITE Gy,Ey {VMX} | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
 7b: vcvtusi2sd Vpd,Hpd,Ev (F2),(ev) | vcvtusi2ss Vps,Hps,Ev (F3),(ev) | vcvtps2qq/pd2qq Vx,Wx (66),(ev)
 7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2)
@@ -965,9 +965,9 @@ GrpTable: Grp6
 EndTable
 
 GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
+0: SGDT Ms | VMCALL (001),(11B) {VMX} | VMLAUNCH (010),(11B) {VMX} | VMRESUME (011),(11B) {VMX} | VMXOFF (100),(11B) {VMX} | PCONFIG (101),(11B) | ENCLV (000),(11B)
 1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
-2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
+2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) {VMX} | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
 3: LIDT Ms
 4: SMSW Mw/Rv
 5: rdpkru (110),(11B) | wrpkru (111),(11B) | SAVEPREVSSP (F3),(010),(11B) | RSTORSSP Mq (F3) | SETSSBSY (F3),(000),(11B)
@@ -987,8 +987,8 @@ GrpTable: Grp9
 3: xrstors
 4: xsavec
 5: xsaves
-6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B)
-7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B)
+6: VMPTRLD Mq {VMX} | VMCLEAR Mq (66) {VMX} | VMXON Mq (F3) {VMX} | RDRAND Rv (11B)
+7: VMPTRST Mq {VMX} | VMPTRST Mq (F3) {VMX} | RDSEED Rv (11B)
 EndTable
 
 GrpTable: Grp10
@@ -1036,10 +1036,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
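
To make the marker idea concrete, here is a rough sketch of pulling a
"{TAG}" marker out of a single '|'-separated entry of an opcode-map line.
It is written in C purely for illustration: the real generator is the awk
script gen-insn-attr-x86.awk, and entry_marker() is a hypothetical helper,
not something that exists in the tree.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the text of the first "{TAG}" marker in
 * one opcode-map entry into tag[] (NUL-terminated).
 * Returns 1 if a non-empty marker was found and fits in the buffer,
 * 0 otherwise (no marker, empty marker, or buffer too small).
 */
static int entry_marker(const char *entry, char *tag, size_t len)
{
	const char *lb = strchr(entry, '{');
	const char *rb = lb ? strchr(lb, '}') : NULL;
	size_t n;

	if (!rb)
		return 0;
	n = (size_t)(rb - lb - 1);	/* characters between the braces */
	if (n == 0 || n >= len)
		return 0;
	memcpy(tag, lb + 1, n);
	tag[n] = '\0';
	return 1;
}
```

With the entries from the diff above, "VMREAD Ey,Gy {VMX} " would yield the
tag "VMX" and "emms {FPU} " would yield "FPU", while an unmarked entry such
as " vzeroupper " yields nothing. A generator could then map each tag to an
attribute flag instead of matching on mnemonics.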


* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-07 15:54                       ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
@ 2020-04-08  0:31                         ` Masami Hiramatsu
  2020-04-08 16:09                         ` [PATCH v2] x86: insn: Add insn_is_fpu() Masami Hiramatsu
  1 sibling, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-08  0:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David (ChunMing) Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Tue, 7 Apr 2020 17:54:49 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Apr 08, 2020 at 12:41:11AM +0900, Masami Hiramatsu wrote:
> > On Tue, 7 Apr 2020 13:15:35 +0200
> > Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > > > Also, all the VMX bits seem to qualify as FPU (I can't remember seeing
> > > > > that previously):
> > > > 
> > > > Oops, let me check it.
> > > 
> > > I just sent you another patch that could do with insn_is_vmx()
> > > (sorry!!!)
> > 
> > Hmm, it is hard to pick out the VMX insns. Maybe we need to identify them by
> > opcode pattern (like "VM.*").
> 
> Yeah, I know. Maybe I should just keep it as I have for now.
> 
> One thing I thought of is we could perhaps add manual markers in
> x86-opcode-map.txt. The '{','}' characters appear unused so far, so
> perhaps we can use them to classify things.
> 
> That could maybe replace "mmx_expr" as well. That is, something like so:

Thanks for the good suggestion!
Maybe this is much better than the fragile mnemonic pattern matching :)

BTW, I would like to use {VIRT} instead of {VMX} because some
instructions are for SVM.

Thank you!

> 
> ---
> 
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index ec31f5b60323..e01b76e0a294 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -462,9 +462,9 @@ AVXcode: 1
>  75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
>  76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
>  # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
> -77: emms | vzeroupper | vzeroall
> -78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
> -79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
> +77: emms {FPU} | vzeroupper | vzeroall
> +78: VMREAD Ey,Gy {VMX} | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
> +79: VMWRITE Gy,Ey {VMX} | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
>  7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
>  7b: vcvtusi2sd Vpd,Hpd,Ev (F2),(ev) | vcvtusi2ss Vps,Hps,Ev (F3),(ev) | vcvtps2qq/pd2qq Vx,Wx (66),(ev)
>  7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2)
> @@ -965,9 +965,9 @@ GrpTable: Grp6
>  EndTable
>  
>  GrpTable: Grp7
> -0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B)
> +0: SGDT Ms | VMCALL (001),(11B) {VMX} | VMLAUNCH (010),(11B) {VMX} | VMRESUME (011),(11B) {VMX} | VMXOFF (100),(11B) {VMX} | PCONFIG (101),(11B) | ENCLV (000),(11B)
>  1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B)
> -2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
> +2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) {VMX} | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
>  3: LIDT Ms
>  4: SMSW Mw/Rv
>  5: rdpkru (110),(11B) | wrpkru (111),(11B) | SAVEPREVSSP (F3),(010),(11B) | RSTORSSP Mq (F3) | SETSSBSY (F3),(000),(11B)
> @@ -987,8 +987,8 @@ GrpTable: Grp9
>  3: xrstors
>  4: xsavec
>  5: xsaves
> -6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B)
> -7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B)
> +6: VMPTRLD Mq {VMX} | VMCLEAR Mq (66) {VMX} | VMXON Mq (F3) {VMX} | RDRAND Rv (11B)
> +7: VMPTRST Mq {VMX} | VMPTRST Mq (F3) {VMX} | RDSEED Rv (11B)
>  EndTable
>  
>  GrpTable: Grp10
> @@ -1036,10 +1036,10 @@ GrpTable: Grp14
>  EndTable
>  
>  GrpTable: Grp15
> -0: fxsave | RDFSBASE Ry (F3),(11B)
> -1: fxstor | RDGSBASE Ry (F3),(11B)
> -2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
> -3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
> +0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
> +1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
> +2: ldmxcsr {FPU} | vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
> +3: stmxcsr {FPU} | vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
>  4: XSAVE | ptwrite Ey (F3),(11B)
>  5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
>  6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)


-- 
Masami Hiramatsu <mhiramat@kernel.org>

* [PATCH v2] x86: insn: Add insn_is_fpu()
  2020-04-07 15:54                       ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
  2020-04-08  0:31                         ` Masami Hiramatsu
@ 2020-04-08 16:09                         ` Masami Hiramatsu
  2020-04-09 14:32                           ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-08 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

Add insn_is_fpu(insn), which tells whether the insn touches
the FPU/SSE/MMX registers or is an FP coprocessor
instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v2:
 - Introduce FPU superscript.
 - Fix to add INAT_FPUIFVEX for variant if the first opcode has no
   last prefix superscript.
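
 To make the intended semantics of the two new attribute bits easier to
 follow, here is a small userspace model of the check. It assumes
 simplified stand-ins (struct demo_insn and DEMO_INAT_* with arbitrary bit
 positions) rather than the real struct insn and the INAT_FLAG_OFFS-based
 values; only the decision logic is meant to mirror insn_is_fpu().

```c
#include <stdint.h>

/* Illustrative bit positions only; the real ones depend on INAT_FLAG_OFFS. */
#define DEMO_INAT_FPU		(1u << 0)
#define DEMO_INAT_FPUIFVEX	(1u << 1)

/* Hypothetical, heavily reduced stand-in for struct insn. */
struct demo_insn {
	uint32_t attr;	/* attribute bits looked up from the opcode table */
	int has_vex;	/* non-zero when a VEX/EVEX prefix is present */
};

static int demo_insn_is_fpu(const struct demo_insn *insn)
{
	if (!(insn->attr & DEMO_INAT_FPU))
		return 0;		/* never touches FPU state */
	if (insn->attr & DEMO_INAT_FPUIFVEX)
		return insn->has_vex != 0;	/* FPU only in the VEX form */
	return 1;			/* FPU in every encoding */
}
```

 For an opcode slot shared between a non-FPU legacy encoding and
 FPU-touching VEX encodings (0f 78, with VMREAD and the vcvtt* variants,
 appears to be such a case), the model reports FPU only when a VEX prefix
 is present.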
---
 tools/arch/x86/include/asm/inat.h          |    7 ++++
 tools/arch/x86/include/asm/insn.h          |   12 ++++++
 tools/arch/x86/lib/x86-opcode-map.txt      |   32 ++++++++++------
 tools/arch/x86/tools/gen-insn-attr-x86.awk |   56 ++++++++++++++++++++++++----
 4 files changed, 86 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..1752c54d2103 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..3aae11931a0a 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# Optional Superscripts
+#  - {FPU}: this mnemonic doesn't have FPU/MMX/SSE operands but access those
+#           registers.
 
 Table: one byte opcode
 Referrer:
@@ -269,14 +273,16 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcode are ESC (Escape to
+# coprocessor instruction set), the coprocessor means x87 FPU.
+d8: ESC {FPU}
+d9: ESC {FPU}
+da: ESC {FPU}
+db: ESC {FPU}
+dc: ESC {FPU}
+dd: ESC {FPU}
+de: ESC {FPU}
+df: ESC {FPU}
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -462,7 +468,7 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
+77: emms {FPU} | vzeroupper | vzeroall
 78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
 79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
@@ -1036,10 +1042,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1),{FPU} | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1),{FPU} | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..e8a0436d6397 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -44,7 +44,7 @@ BEGIN {
 	delete atable
 
 	opnd_expr = "^[A-Za-z/]"
-	ext_expr = "^\\("
+	ext_expr = "^(\\(|\\{)"
 	sep_expr = "^\\|$"
 	group_expr = "^Grp[0-9A-Za-z]+"
 
@@ -65,7 +65,9 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+
+	fpureg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	fpu_expr = "\\{FPU\\}"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +238,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,fpu)
 {
 	imm = null
 	mod = null
+	fpu = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +256,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, fpureg_expr) == 1) {
+			fpu = "INAT_FPU"
+		}
 	}
+	if (fpu)
+		imm = add_flags(imm, fpu)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +291,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -294,6 +306,7 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 			opnd = $i
 			count = split($(i++), opnds, ",")
 			flags = convert_operands(count, opnds)
+
 		}
 		if (match($i, ext_expr))
 			ext = $(i++)
@@ -318,9 +331,9 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check FPU/MMX/SSE superscripts
+		if (match(ext, fpu_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
@@ -336,22 +349,49 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU") {
+				lpfpu[0] = 1
+				lpfpu[1] = 1
+				lpfpu[2] = 1
+				lpfpu[3] = 1
+			}
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 877827b7c2c3..2e6a05290efd 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 568854b14d0a..d9f6bd9059c1 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..3aae11931a0a 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# Optional Superscripts
+#  - {FPU}: this mnemonic doesn't have FPU/MMX/SSE operands but access those
+#           registers.
 
 Table: one byte opcode
 Referrer:
@@ -269,14 +273,16 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcode are ESC (Escape to
+# coprocessor instruction set), the coprocessor means x87 FPU.
+d8: ESC {FPU}
+d9: ESC {FPU}
+da: ESC {FPU}
+db: ESC {FPU}
+dc: ESC {FPU}
+dd: ESC {FPU}
+de: ESC {FPU}
+df: ESC {FPU}
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -462,7 +468,7 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
+77: emms {FPU} | vzeroupper | vzeroall
 78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
 79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
@@ -1036,10 +1042,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1),{FPU} | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1),{FPU} | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..e8a0436d6397 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -44,7 +44,7 @@ BEGIN {
 	delete atable
 
 	opnd_expr = "^[A-Za-z/]"
-	ext_expr = "^\\("
+	ext_expr = "^(\\(|\\{)"
 	sep_expr = "^\\|$"
 	group_expr = "^Grp[0-9A-Za-z]+"
 
@@ -65,7 +65,9 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+
+	fpureg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	fpu_expr = "\\{FPU\\}"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +238,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,fpu)
 {
 	imm = null
 	mod = null
+	fpu = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +256,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, fpureg_expr) == 1) {
+			fpu = "INAT_FPU"
+		}
 	}
+	if (fpu)
+		imm = add_flags(imm, fpu)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +291,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -294,6 +306,7 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 			opnd = $i
 			count = split($(i++), opnds, ",")
 			flags = convert_operands(count, opnds)
+
 		}
 		if (match($i, ext_expr))
 			ext = $(i++)
@@ -318,9 +331,9 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM")
+		# check FPU/MMX/SSE superscripts
+		if (match(ext, fpu_expr))
+			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))
@@ -336,22 +349,49 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU") {
+				lpfpu[0] = 1
+				lpfpu[1] = 1
+				lpfpu[2] = 1
+				lpfpu[3] = 1
+			}
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}


* Re: [PATCH v2] x86: insn: Add insn_is_fpu()
  2020-04-08 16:09                         ` [PATCH v2] x86: insn: Add insn_is_fpu() Masami Hiramatsu
@ 2020-04-09 14:32                           ` Peter Zijlstra
  2020-04-09 14:45                             ` Peter Zijlstra
                                               ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-09 14:32 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Thu, Apr 09, 2020 at 01:09:11AM +0900, Masami Hiramatsu wrote:
> Add insn_is_fpu(insn), which tells whether the insn touches
> the FPU/SSE/MMX registers or is an FP coprocessor
> instruction.
> 
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---

Sadly, it turns out I need "FWAIT" too, which I tried adding like the
below, but that comes apart most mighty :/

The trouble is that FWAIT doesn't take a MODRM, so the previous
assumption that INAT_FPU implied INAT_MODRM needs to be broken, and I
think that ripples through somewhere.
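
The encoding difference behind this can be sketched as follows. This is a
toy classifier for one-byte opcodes only, with made-up DEMO_* flags
standing in for INAT_MODRM and INAT_FPU: the ESC opcodes 0xd8-0xdf are
followed by a ModRM byte, while FWAIT/WAIT (0x9b) is a complete
instruction on its own.

```c
#include <stdint.h>

#define DEMO_MODRM	(1u << 0)	/* opcode is followed by a ModRM byte */
#define DEMO_FPU	(1u << 1)	/* opcode touches x87/MMX/SSE state */

/* Hypothetical attribute lookup covering only the opcodes discussed here. */
static unsigned int demo_onebyte_attr(uint8_t opcode)
{
	if (opcode >= 0xd8 && opcode <= 0xdf)
		return DEMO_MODRM | DEMO_FPU;	/* ESC: ModRM follows */
	if (opcode == 0x9b)
		return DEMO_FPU;		/* FWAIT/WAIT: no ModRM */
	return 0;	/* everything else is out of scope for this sketch */
}

/* Minimum encoded length implied by the attributes: opcode (+ ModRM). */
static unsigned int demo_min_len(uint8_t opcode)
{
	return 1 + !!(demo_onebyte_attr(opcode) & DEMO_MODRM);
}
```

If the FPU flag dragged the ModRM flag along, a decoder built on this
table would consume one byte too many for 0x9b.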

(also, your patch adds some whitespace to convert_operands(), not sure
that was intended)

--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -206,7 +206,7 @@ Table: one byte opcode
 98: CBW/CWDE/CDQE
 99: CWD/CDQ/CQO
 9a: CALLF Ap (i64)
-9b: FWAIT/WAIT
+9b: FWAIT/WAIT {FPU}
 9c: PUSHF/D/Q Fv (d64)
 9d: POPF/D/Q Fv (d64)
 9e: SAHF
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -331,9 +331,13 @@ function convert_operands(count,opnd,
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
+		# check coprocessor escape
+		if (match(ext, "^ESC"))
+			flags = add_flags(flags, "INAT_MODRM")
+
 		# check FPU/MMX/SSE superscripts
 		if (match(ext, fpu_expr))
-			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
+			flags = add_flags(flags, "INAT_FPU")
 
 		# check VEX codes
 		if (match(ext, evexonly_expr))

* Re: [PATCH v2] x86: insn: Add insn_is_fpu()
  2020-04-09 14:32                           ` Peter Zijlstra
@ 2020-04-09 14:45                             ` Peter Zijlstra
  2020-04-10  0:47                             ` Masami Hiramatsu
                                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-09 14:45 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Thu, Apr 09, 2020 at 04:32:12PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 09, 2020 at 01:09:11AM +0900, Masami Hiramatsu wrote:
> > Add insn_is_fpu(insn), which tells whether the insn touches
> > the FPU/SSE/MMX registers or is an FP coprocessor
> > instruction.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> 
> Sadly, it turns out I need "FWAIT" too, which I tried adding like the
> below, but that comes apart most mighty :/
> 
> The trouble is that FWAIT doesn't take a MODRM, so the previous
> assumption that INAT_FPU implied INAT_MODRM needs to be broken, and I
> think that ripples through somewhere.
> 
> (also, your patch adds some whitespace to convert_operands(), not sure
> that was intended)
> 
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -206,7 +206,7 @@ Table: one byte opcode
>  98: CBW/CWDE/CDQE
>  99: CWD/CDQ/CQO
>  9a: CALLF Ap (i64)
> -9b: FWAIT/WAIT
> +9b: FWAIT/WAIT {FPU}
>  9c: PUSHF/D/Q Fv (d64)
>  9d: POPF/D/Q Fv (d64)
>  9e: SAHF
> --- a/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -331,9 +331,13 @@ function convert_operands(count,opnd,
>  		if (match(opcode, rex_expr))
>  			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
>  
> +		# check coprocessor escape
> +		if (match(ext, "^ESC"))
> +			flags = add_flags(flags, "INAT_MODRM")

I'm an idiot; that needs to be:

		if (match(opcode, "^ESC"))

> +
>  		# check FPU/MMX/SSE superscripts
>  		if (match(ext, fpu_expr))
> -			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
> +			flags = add_flags(flags, "INAT_FPU")
>  
>  		# check VEX codes
>  		if (match(ext, evexonly_expr))

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-02 14:13   ` Peter Zijlstra
  2020-04-03  5:28     ` Masami Hiramatsu
@ 2020-04-09 15:59     ` Peter Zijlstra
  2020-04-09 17:09       ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-09 15:59 UTC (permalink / raw)
  To: Christian König
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

On Thu, Apr 02, 2020 at 04:13:08PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:

> > yes, using the floating point calculations in the display code has been a
> > source of numerous problems and confusion in the past.
> > 
> > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> > DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> > architecture-dependent handling for x86 and PPC64.
> > 
> > This originated from the graphics block integrated into AMD CPUs (where we
> > knew which fp unit we had), but as far as I know it is now used for
> > dedicated AMD GPUs as well.
> > 
> > I'm not really a fan of this either, but so far we weren't able to convince
> > the hardware engineers to not use floating point calculations for the
> > display stuff.
> 
> Might I complain that:
> 
> 	make O=allmodconfig-build drivers/gpu/drm/amd/display/dc/
> 
> does not in fact work?

Worse; allmodconfig doesn't select these, and hence I did not in fact
build-test them for a while :/

Anyway, I now have a config that includes them and I get plenty fail
with my objtool patch. In part because this is spread over multiple
object files and in part because of the forest of indirect calls Jann
already mentioned.

The multi-unit issue can be fixed by simply sticking all the related .o
files in an archive and running objtool on that, but the pointer crap is
much harder.

I'll need another approach, let me consider.

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-09 15:59     ` Peter Zijlstra
@ 2020-04-09 17:09       ` Peter Zijlstra
  2020-04-09 18:15         ` Christian König
  0 siblings, 1 reply; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-09 17:09 UTC (permalink / raw)
  To: Christian König
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 02, 2020 at 04:13:08PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 02, 2020 at 09:33:54AM +0200, Christian König wrote:
> 
> > > yes, using the floating point calculations in the display code has been a
> > > source of numerous problems and confusion in the past.
> > > 
> > > The calls to kernel_fpu_begin() and kernel_fpu_end() are hidden behind the
> > > DC_FP_START() and DC_FP_END() macros which are supposed to hide the
> > > architecture-dependent handling for x86 and PPC64.
> > > 
> > > This originated from the graphics block integrated into AMD CPUs (where we
> > > knew which fp unit we had), but as far as I know it is now used for
> > > dedicated AMD GPUs as well.
> > > 
> > > I'm not really a fan of this either, but so far we weren't able to convince
> > > the hardware engineers to not use floating point calculations for the
> > > display stuff.

> I'll need another approach, let me consider.

Christian; it says these files are generated, does that generator know
which functions are wholly in FPU context and which are not?

My current thinking is that if I annotate all functions that run wholly
inside a kernel_fpu_begin()/end() section with an __fpu function attribute,
then I can verify that any call from regular text to fpu text only happens
inside kernel_fpu_begin()/end(). And I can ensure that all functions
without the __fpu annotation contain only non-FPU instructions.

Can that generator add the __fpu function attribute or is that something
that would need to be done manually (which seems like it would be
painful, since it is quite a bit of code) ?
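
A minimal userspace model of that rule, under several assumptions: __fpu
is defined here as a plain section attribute (the name comes from the
proposal above, the definition is hypothetical), demo_fpu_begin() and
demo_fpu_end() are stand-ins for kernel_fpu_begin()/kernel_fpu_end(), and
the check is done at run time with a counter, whereas the proposal is for
objtool to verify it statically.

```c
#include <stdio.h>

/*
 * Hypothetical annotation: on ELF targets, place the function in a
 * dedicated ".text.fpu" section so a tool could tell FPU text apart.
 */
#if defined(__ELF__)
#define __fpu __attribute__((noinline, section(".text.fpu")))
#else
#define __fpu __attribute__((noinline))
#endif

/* Userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static int demo_fpu_depth;

static void demo_fpu_begin(void) { demo_fpu_depth++; }
static void demo_fpu_end(void) { demo_fpu_depth--; }

/* "FPU text": all floating-point math lives here. */
static __fpu int demo_scale_permille(int value, int permille)
{
	if (demo_fpu_depth <= 0) {
		fprintf(stderr, "FPU text entered outside begin/end\n");
		return -1;
	}
	return (int)(value * (permille / 1000.0));
}

/* "Regular text": integer-only, brackets the call into FPU text. */
static int demo_scale(int value, int permille)
{
	int ret;

	demo_fpu_begin();
	ret = demo_scale_permille(value, permille);
	demo_fpu_end();
	return ret;
}
```

The integer-only interface is deliberate: passing a double across the
boundary would already put floating-point values into the caller, which
is the very thing the annotation is meant to rule out.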

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-09 17:09       ` Peter Zijlstra
@ 2020-04-09 18:15         ` Christian König
  2020-04-09 20:01           ` Peter Zijlstra
  0 siblings, 1 reply; 37+ messages in thread
From: Christian König @ 2020-04-09 18:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
> On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> [SNIP]
>> I'll need another approach, let me consider.
> Christian; it says these files are generated, does that generator know
> which functions are wholly in FPU context and which are not?

Well that "generator" is still a human being :)

It's just that the formulae for the calculation come from the hardware
team and we are not able to easily translate them into fixed-point
calculations.

> My current thinking is that if I annotate all functions that run wholly
> inside a kernel_fpu_begin()/end() section with an __fpu function attribute,
> then I can verify that any call from regular text to fpu text only happens
> inside kernel_fpu_begin()/end(). And I can ensure that all functions
> without the __fpu annotation contain only non-FPU instructions.

Yeah, that sounds like a good idea to me and should be easily doable.

> Can that generator add the __fpu function attribute or is that something
> that would need to be done manually (which seems like it would be
> painful, since it is quite a bit of code) ?

We are currently in the process of moving all the stuff which requires
floating point into one or a few C files and then making sure that we
only call into those within kernel_fpu_begin()/end() blocks.

Annotating those functions with __fpu, or even telling gcc that all code
in those files should go into a special text.fpu section, shouldn't be
much of a problem.

Regards,
Christian.

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-09 18:15         ` Christian König
@ 2020-04-09 20:01           ` Peter Zijlstra
  2020-04-10 14:31             ` Christian König
  2020-04-17 20:27             ` Rodrigo Siqueira
  0 siblings, 2 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-09 20:01 UTC (permalink / raw)
  To: Christian König
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:
> Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
> > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> > [SNIP]
> > > I'll need another approach, let me consider.
> > Christian; it says these files are generated, does that generator know
> > which functions are wholly in FPU context and which are not?
> 
> Well that "generator" is still a human being :)
> 
> It's just that the formulae for the calculation come from the hardware team
> and we are not able to easily transcribe them into fixed-point calculations.

Well, if it's a human, can this human respect the kernel coding style a
bit more :-) Some of that stuff is atrocious.

> > My current thinking is that if I annotate all functions that are wholly
> > inside kernel_fpu_start() with an __fpu function attribute, then I can
> > verify that any call from regular text to fpu text only happens inside
> > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotation
> > functions only contain !fpu instructions.
> 
> Yeah, that sounds like a good idea to me and should be easily doable.
> 
> > Can that generator add the __fpu function attribute or is that something
> > that would need to be done manually (which seems like it would be
> > painful, since it is quite a bit of code) ?
> 
> We are currently in the process of moving all the stuff which requires
> floating point into one or more dedicated C files and then making sure that
> we only call those within kernel_fpu_begin()/end() blocks.

Can you make the build system stick all those .o files in a single
archive? That's the only way I can do call validation; external
relocation records do not contain the section.

> Annotating those functions with __fpu or even telling gcc that all code of
> those files should go into a special text.fpu segment shouldn't be much of a
> problem.

Guess what the __fpu attribute does ;-)

The below patch (which is on top of newer versions of the objtool
patches sent earlier; let me know if you want the full set) only
converts a few files, but fully converts:

  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c

But building it (and this is an absolute pain; when you're reworking
this, can you pretty please also fix the Makefiles?), we get:

  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of kernel_fpu_{begin,end}()

$ ./scripts/faddr2line defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth+0x34fa
dcn_validate_bandwidth+0x34fa/0x57ce:
dcn_validate_bandwidth at /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 (discriminator 5)

# ./objdump-func.sh defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth | grep 34fa
34fa     50fa:  f2 0f 10 b5 60 ff ff    movsd  -0xa0(%rbp),%xmm6

Which seems to indicate there are still problems with the current code.
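
The idea behind that warning can be sketched as a toy model. This is a
linear scan over an already-classified instruction stream, not objtool's
actual control-flow validation; the enum and function names are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction classes for the sketch. */
enum sketch_kind {
	SK_OTHER,	/* anything that does not touch FPU state */
	SK_FPU_BEGIN,	/* call to kernel_fpu_begin() */
	SK_FPU_END,	/* call to kernel_fpu_end() */
	SK_FPU_INSN,	/* an x87/MMX/SSE instruction */
};

/*
 * Return the index of the first FPU instruction that is not inside a
 * begin/end region, or -1 if every FPU instruction is covered.
 */
static int sketch_first_violation(const enum sketch_kind *insns, size_t n)
{
	bool in_region = false;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (insns[i]) {
		case SK_FPU_BEGIN:
			in_region = true;
			break;
		case SK_FPU_END:
			in_region = false;
			break;
		case SK_FPU_INSN:
			if (!in_region)
				return (int)i;
			break;
		default:
			break;
		}
	}
	return -1;
}
```

In this model, a movsd that the compiler places after the call to
kernel_fpu_end() would be reported just like the instruction above.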



---
 arch/x86/include/asm/fpu/api.h                     | 12 +++++++++++
 arch/x86/kernel/vmlinux.lds.S                      |  1 +
 .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c   | 25 +++++++++++-----------
 drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c   |  4 ++--
 .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c  | 10 ++++-----
 .../amd/display/dc/dml/display_rq_dlg_helpers.c    |  2 +-
 .../gpu/drm/amd/display/dc/dml/dml_common_defs.c   |  2 +-
 drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c        |  2 +-
 drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c       | 10 ++++-----
 drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c   |  4 ++--
 drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h |  2 ++
 tools/objtool/check.c                              |  7 +++++-
 tools/objtool/elf.h                                |  2 +-
 13 files changed, 52 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 64be4426fda9..19eaf98bbb0a 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -12,11 +12,23 @@
 #define _ASM_X86_FPU_API_H
 #include <linux/bottom_half.h>

+#ifdef CONFIG_STACK_VALIDATION
+
+#define __fpu __section(".text.fpu")
+
 #define _ASM_ANNOTATE_FPU(at)						\
 		     ".pushsection .discard.fpu_safe\n"			\
 		     ".long " #at " - .\n"				\
 		     ".popsection\n"					\

+#else
+
+#define __fpu
+
+#define _ASM_ANNOTATE_FPU(at)
+
+#endif /* CONFIG_STACK_VALIDATION */
+
 #define annotate_fpu() ({						\
 	asm volatile("%c0:\n\t"						\
 		     _ASM_ANNOTATE_FPU(%c0b)				\
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1bf7e312361f..8442f8633d07 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -139,6 +139,7 @@ SECTIONS
 		SOFTIRQENTRY_TEXT
 		*(.fixup)
 		*(.gnu.warning)
+		*(.text.fpu)

 #ifdef CONFIG_RETPOLINE
 		__indirect_thunk_start = .;
diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c
index 07d18e78de49..57ab3aafef5a 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c
@@ -36,7 +36,7 @@
  * remain as-is as it provides us with a guarantee from HW that it is correct.
  */

-float dcn_bw_mod(const float arg1, const float arg2)
+__fpu float dcn_bw_mod(const float arg1, const float arg2)
 {
 	if (isNaN(arg1))
 		return arg2;
@@ -45,7 +45,7 @@ float dcn_bw_mod(const float arg1, const float arg2)
 	return arg1 - arg1 * ((int) (arg1 / arg2));
 }

-float dcn_bw_min2(const float arg1, const float arg2)
+__fpu float dcn_bw_min2(const float arg1, const float arg2)
 {
 	if (isNaN(arg1))
 		return arg2;
@@ -58,7 +58,7 @@ unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2)
 {
 	return arg1 > arg2 ? arg1 : arg2;
 }
-float dcn_bw_max2(const float arg1, const float arg2)
+__fpu float dcn_bw_max2(const float arg1, const float arg2)
 {
 	if (isNaN(arg1))
 		return arg2;
@@ -67,25 +67,26 @@ float dcn_bw_max2(const float arg1, const float arg2)
 	return arg1 > arg2 ? arg1 : arg2;
 }

-float dcn_bw_floor2(const float arg, const float significance)
+__fpu float dcn_bw_floor2(const float arg, const float significance)
 {
 	if (significance == 0)
 		return 0;
 	return ((int) (arg / significance)) * significance;
 }
-float dcn_bw_floor(const float arg)
+
+__fpu float dcn_bw_floor(const float arg)
 {
 	return ((int) (arg));
 }

-float dcn_bw_ceil(const float arg)
+__fpu float dcn_bw_ceil(const float arg)
 {
 	float flr = dcn_bw_floor2(arg, 1);

 	return flr + 0.00001 >= arg ? arg : flr + 1;
 }

-float dcn_bw_ceil2(const float arg, const float significance)
+__fpu float dcn_bw_ceil2(const float arg, const float significance)
 {
 	float flr = dcn_bw_floor2(arg, significance);
 	if (significance == 0)
@@ -93,17 +94,17 @@ float dcn_bw_ceil2(const float arg, const float significance)
 	return flr + 0.00001 >= arg ? arg : flr + significance;
 }

-float dcn_bw_max3(float v1, float v2, float v3)
+__fpu float dcn_bw_max3(float v1, float v2, float v3)
 {
 	return v3 > dcn_bw_max2(v1, v2) ? v3 : dcn_bw_max2(v1, v2);
 }

-float dcn_bw_max5(float v1, float v2, float v3, float v4, float v5)
+__fpu float dcn_bw_max5(float v1, float v2, float v3, float v4, float v5)
 {
 	return dcn_bw_max3(v1, v2, v3) > dcn_bw_max2(v4, v5) ? dcn_bw_max3(v1, v2, v3) : dcn_bw_max2(v4, v5);
 }

-float dcn_bw_pow(float a, float exp)
+__fpu float dcn_bw_pow(float a, float exp)
 {
 	float temp;
 	/*ASSERT(exp == (int)exp);*/
@@ -120,7 +121,7 @@ float dcn_bw_pow(float a, float exp)
 	}
 }

-double dcn_bw_fabs(double a)
+__fpu double dcn_bw_fabs(double a)
 {
 	if (a > 0)
 		return (a);
@@ -129,7 +130,7 @@ double dcn_bw_fabs(double a)
 }


-float dcn_bw_log(float a, float b)
+__fpu float dcn_bw_log(float a, float b)
 {
 	int * const exp_ptr = (int *)(&a);
 	int x = *exp_ptr;
diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
index 3960a8db94cb..b3e305d9d1c9 100644
--- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
+++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
@@ -435,7 +435,7 @@ static void pipe_ctx_to_e2e_pipe_params (

 }

-static void dcn_bw_calc_rq_dlg_ttu(
+static __fpu void dcn_bw_calc_rq_dlg_ttu(
 		const struct dc *dc,
 		const struct dcn_bw_internal_vars *v,
 		struct pipe_ctx *pipe,
@@ -1388,7 +1388,7 @@ static unsigned int dcn_find_normalized_clock_vdd_Level(
 	return vdd_level;
 }

-unsigned int dcn_find_dcfclk_suits_all(
+__fpu unsigned int dcn_find_dcfclk_suits_all(
 	const struct dc *dc,
 	struct dc_clocks *clocks)
 {
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
index ca807846032f..0cbb58fc7fb8 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
@@ -115,7 +115,7 @@ static bool is_dual_plane(enum source_format_class source_format)
 	return ret_val;
 }

-static double get_refcyc_per_delivery(struct display_mode_lib *mode_lib,
+static __fpu double get_refcyc_per_delivery(struct display_mode_lib *mode_lib,
 		double refclk_freq_in_mhz,
 		double pclk_freq_in_mhz,
 		bool odm_combine,
@@ -162,7 +162,7 @@ static unsigned int get_blk_size_bytes(const enum source_macro_tile_size tile_si
 		return (4 * 1024);
 }

-static void extract_rq_sizing_regs(struct display_mode_lib *mode_lib,
+static __fpu void extract_rq_sizing_regs(struct display_mode_lib *mode_lib,
 		display_data_rq_regs_st *rq_regs,
 		const display_data_rq_sizing_params_st rq_sizing)
 {
@@ -313,7 +313,7 @@ static void handle_det_buf_split(struct display_mode_lib *mode_lib,
 			full_swath_bytes_packed_c);
 }

-static void get_meta_and_pte_attr(struct display_mode_lib *mode_lib,
+static __fpu void get_meta_and_pte_attr(struct display_mode_lib *mode_lib,
 		display_data_rq_dlg_params_st *rq_dlg_param,
 		display_data_rq_misc_params_st *rq_misc_param,
 		display_data_rq_sizing_params_st *rq_sizing_param,
@@ -763,7 +763,7 @@ void dml20_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib,

 // Note: currently taken in as is.
 // Nice to decouple code from hw register implement and extract code that are repeated for luma and chroma.
-static void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib,
+static __fpu void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib,
 		const display_e2e_pipe_params_st *e2e_pipe_param,
 		const unsigned int num_pipes,
 		const unsigned int pipe_idx,
@@ -1611,7 +1611,7 @@ void dml20_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib,
 	dml_print("DML_DLG: Calculation for pipe[%d] end\n", pipe_idx);
 }

-static void calculate_ttu_cursor(struct display_mode_lib *mode_lib,
+static __fpu void calculate_ttu_cursor(struct display_mode_lib *mode_lib,
 		double *refcyc_per_req_delivery_pre_cur,
 		double *refcyc_per_req_delivery_cur,
 		double refclk_freq_in_mhz,
diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c
index e2d82aacd3bc..36541cba3894 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c
@@ -133,7 +133,7 @@ void print__rq_dlg_params_st(struct display_mode_lib *mode_lib, display_rq_dlg_p
 	dml_print("DML_RQ_DLG_CALC: =====================================\n");
 }

-void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param)
+__fpu void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param)
 {
 	dml_print("DML_RQ_DLG_CALC: =====================================\n");
 	dml_print("DML_RQ_DLG_CALC: DISPLAY_RQ_DLG_PARAM_ST\n");
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c b/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c
index 723af0b2dda0..e86b1d0128cf 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c
@@ -28,7 +28,7 @@

 #include "dml_inline_defs.h"

-double dml_round(double a)
+__fpu double dml_round(double a)
 {
 	double round_pt = 0.5;
 	double ceil = dml_ceil(a, 1);
diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
index 87d682d25278..a0b0eb2f0fe3 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
@@ -323,7 +323,7 @@ static inline uint32_t calc_dsc_bpp_x16(uint32_t stream_bandwidth_kbps, uint32_t
 /* Get DSC bandwidth range based on [min_bpp, max_bpp] target bitrate range, and timing's pixel clock
  * and uncompressed bandwidth.
  */
-static void get_dsc_bandwidth_range(
+static __fpu void get_dsc_bandwidth_range(
 		const uint32_t min_bpp,
 		const uint32_t max_bpp,
 		const struct dsc_enc_caps *dsc_caps,
diff --git a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c
index 03ae15946c6d..535770930343 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c
@@ -40,7 +40,7 @@
 	break


-void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum max_min max_min, float bpp)
+__fpu void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum max_min max_min, float bpp)
 {
 	int mode = MODE_SELECT(444, 422, 420);
 	int sel = table_hash(mode, bpc, max_min);
@@ -85,7 +85,7 @@ void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum ma
 	memcpy(qps, table[index].qps, sizeof(qp_set));
 }

-double dsc_roundf(double num)
+__fpu double dsc_roundf(double num)
 {
 	if (num < 0.0)
 		num = num - 0.5;
@@ -95,7 +95,7 @@ double dsc_roundf(double num)
 	return (int)(num);
 }

-double dsc_ceil(double num)
+__fpu double dsc_ceil(double num)
 {
 	double retval = (int)num;

@@ -105,7 +105,7 @@ double dsc_ceil(double num)
 	return (int)retval;
 }

-void get_ofs_set(qp_set ofs, enum colour_mode mode, float bpp)
+__fpu void get_ofs_set(qp_set ofs, enum colour_mode mode, float bpp)
 {
 	int   *p = ofs;

@@ -172,7 +172,7 @@ int median3(int a, int b, int c)
 	return b;
 }

-void calc_rc_params(struct rc_params *rc, enum colour_mode cm, enum bits_per_comp bpc, float bpp, int slice_width, int slice_height, int minor_version)
+__fpu void calc_rc_params(struct rc_params *rc, enum colour_mode cm, enum bits_per_comp bpc, float bpp, int slice_width, int slice_height, int minor_version)
 {
 	float bpp_group;
 	float initial_xmit_delay_factor;
diff --git a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c
index 1f6e63b71456..38b3c4ac96dd 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c
@@ -98,7 +98,7 @@ static void copy_rc_to_cfg(struct drm_dsc_config *dsc_cfg, const struct rc_param
 		dsc_cfg->rc_buf_thresh[i] = rc->rc_buf_thresh[i];
 }

-int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_parameters *dsc_params)
+__fpu int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_parameters *dsc_params)
 {
 	enum colour_mode  mode = pps->convert_rgb ? CM_RGB :
 							(pps->simple_422  ? CM_444 :
@@ -115,7 +115,7 @@ int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_par

 	double d_bytes_per_pixel = dsc_ceil(bpp * slice_width / 8.0) / slice_width;

-	// TODO: Make sure the formula for calculating this is precise (ceiling vs. floor, and at what point they should be applied)
+	// TODO: Make sure the formula for calculating this is precise (ceiling vs. floor, and at what point they should be applied
 	if (pps->native_422 || pps->native_420)
 		d_bytes_per_pixel /= 2;

diff --git a/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h b/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h
index 45a07eeffbb6..d2ea6cf65f7e 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h
@@ -26,6 +26,8 @@
 #ifndef _DCN_CALC_MATH_H_
 #define _DCN_CALC_MATH_H_

+#include <asm/fpu/api.h>
+
 float dcn_bw_mod(const float arg1, const float arg2);
 float dcn_bw_min2(const float arg1, const float arg2);
 unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2);
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 1607a698eccd..02a51fedd031 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -252,6 +252,9 @@ static int decode_instructions(struct objtool_file *file)
 		    strncmp(sec->name, ".discard.", 9))
 			sec->text = true;

+		if (!strcmp(sec->name, ".text.fpu"))
+			sec->fpu = true;
+
 		for (offset = 0; offset < sec->len; offset += insn->len) {
 			insn = malloc(sizeof(*insn));
 			if (!insn) {
@@ -288,8 +291,10 @@ static int decode_instructions(struct objtool_file *file)
 				return -1;
 			}

-			sym_for_each_insn(file, func, insn)
+			sym_for_each_insn(file, func, insn) {
 				insn->func = func;
+				insn->fpu_safe = sec->fpu;
+			}
 		}
 	}

diff --git a/tools/objtool/elf.h b/tools/objtool/elf.h
index ebbb10c61e24..a743f2f28feb 100644
--- a/tools/objtool/elf.h
+++ b/tools/objtool/elf.h
@@ -39,7 +39,7 @@ struct section {
 	char *name;
 	int idx;
 	unsigned int len;
-	bool changed, text, rodata;
+	bool changed, text, rodata, fpu;
 };

 struct symbol {


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2] x86: insn: Add insn_is_fpu()
  2020-04-09 14:32                           ` Peter Zijlstra
  2020-04-09 14:45                             ` Peter Zijlstra
@ 2020-04-10  0:47                             ` Masami Hiramatsu
  2020-04-10  1:22                             ` [PATCH v3] " Masami Hiramatsu
  2020-04-15  8:49                             ` [PATCH v4] " Masami Hiramatsu
  3 siblings, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-10  0:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

On Thu, 9 Apr 2020 16:32:12 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Apr 09, 2020 at 01:09:11AM +0900, Masami Hiramatsu wrote:
> > Add insn_is_fpu(insn), which tells whether the insn
> > touches the FPU/SSE/MMX registers or is an instruction
> > of the FP coprocessor.
> > 
> > Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> > ---
> 
> Sadly, it turns out I need "FWAIT" too, which I tried adding like the
> below, but that comes apart most mighty :/

Thanks for pointing it out. Now I understand that FWAIT/WAIT is used
to wait for the FPU...

> 
> The trouble is that FWAIT doesn't take a MODRM, so the previous
> assumption that INAT_FPU implied INAT_MODRM needs to be broken, and I
> think that ripples through somewhere.

Oops, I missed it.
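
A minimal sketch of why the two attributes have to be independent. The
flag values and the function name are hypothetical; the real attribute
bits live in asm/inat.h and the real table is generated from
x86-opcode-map.txt. Only the one-byte opcodes discussed here are modelled.

```c
/* Hypothetical flag bits for illustration only. */
#define SKETCH_MODRM	(1u << 0)	/* a ModRM byte follows the opcode */
#define SKETCH_FPU	(1u << 1)	/* instruction touches FPU state */

/* Classify a one-byte opcode; everything else is treated as non-FPU. */
static unsigned int sketch_attr(unsigned char opcode)
{
	/* d8..df: x87 escape opcodes, always followed by a ModRM byte. */
	if (opcode >= 0xd8 && opcode <= 0xdf)
		return SKETCH_MODRM | SKETCH_FPU;

	/* 9b: FWAIT/WAIT is a single byte with no ModRM. */
	if (opcode == 0x9b)
		return SKETCH_FPU;

	return 0;
}
```

If the FPU flag always implied a ModRM byte, the decoder would consume
the byte after FWAIT as ModRM and compute the wrong instruction length.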

> 
> (also, your patch adds some whitespace to convert_operands(), not sure
> that was intended)

Ah, that's my typo.

> 
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -206,7 +206,7 @@ Table: one byte opcode
>  98: CBW/CWDE/CDQE
>  99: CWD/CDQ/CQO
>  9a: CALLF Ap (i64)
> -9b: FWAIT/WAIT
> +9b: FWAIT/WAIT {FPU}
>  9c: PUSHF/D/Q Fv (d64)
>  9d: POPF/D/Q Fv (d64)
>  9e: SAHF
> --- a/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -331,9 +331,13 @@ function convert_operands(count,opnd,
>  		if (match(opcode, rex_expr))
>  			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
>  
> +		# check coprocessor escape
> +		if (match(ext, "^ESC"))
> +			flags = add_flags(flags, "INAT_MODRM")
> +
>  		# check FPU/MMX/SSE superscripts
>  		if (match(ext, fpu_expr))
> -			flags = add_flags(flags, "INAT_MODRM | INAT_FPU")
> +			flags = add_flags(flags, "INAT_FPU")


OK, I'll include this.

Thank you,

>  
>  		# check VEX codes
>  		if (match(ext, evexonly_expr))


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3] x86: insn: Add insn_is_fpu()
  2020-04-09 14:32                           ` Peter Zijlstra
  2020-04-09 14:45                             ` Peter Zijlstra
  2020-04-10  0:47                             ` Masami Hiramatsu
@ 2020-04-10  1:22                             ` Masami Hiramatsu
  2020-04-15  8:23                               ` Masami Hiramatsu
  2020-04-15  8:49                             ` [PATCH v4] " Masami Hiramatsu
  3 siblings, 1 reply; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-10  1:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

Add insn_is_fpu(insn), which tells whether the insn
touches the FPU/SSE/MMX registers or is an instruction
of the FP coprocessor.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v3:
 - Add {FPU} to FWAIT/WAIT and FEMMS.
 - Split INAT_FPU and INAT_MODRM.
 - Remove a blank line typo.
---
 arch/x86/include/asm/inat.h                |    7 +++
 arch/x86/include/asm/insn.h                |   12 ++++++
 arch/x86/lib/x86-opcode-map.txt            |   36 ++++++++++-------
 arch/x86/tools/gen-insn-attr-x86.awk       |   58 +++++++++++++++++++++++++---
 tools/arch/x86/include/asm/inat.h          |    7 +++
 tools/arch/x86/include/asm/insn.h          |   12 ++++++
 tools/arch/x86/lib/x86-opcode-map.txt      |   36 ++++++++++-------
 tools/arch/x86/tools/gen-insn-attr-x86.awk |   58 +++++++++++++++++++++++++---
 8 files changed, 182 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..1752c54d2103 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..0adf11cbd3a8 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# Optional Superscripts
+#  - {FPU}: this mnemonic doesn't have FPU/MMX/SSE operands but access those
+#           registers.
 
 Table: one byte opcode
 Referrer:
@@ -202,7 +206,7 @@ AVXcode:
 98: CBW/CWDE/CDQE
 99: CWD/CDQ/CQO
 9a: CALLF Ap (i64)
-9b: FWAIT/WAIT
+9b: FWAIT/WAIT {FPU}
 9c: PUSHF/D/Q Fv (d64)
 9d: POPF/D/Q Fv (d64)
 9e: SAHF
@@ -269,14 +273,16 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcode are ESC (Escape to
+# coprocessor instruction set), the coprocessor means x87 FPU.
+d8: ESC {FPU}
+d9: ESC {FPU}
+da: ESC {FPU}
+db: ESC {FPU}
+dc: ESC {FPU}
+dd: ESC {FPU}
+de: ESC {FPU}
+df: ESC {FPU}
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -339,7 +345,7 @@ AVXcode: 1
 0c:
 # AMD's prefetch group. Intel supports prefetchw(/1) only.
 0d: GrpP
-0e: FEMMS
+0e: FEMMS {FPU}
 # 3DNow! uses the last imm byte as opcode extension.
 0f: 3DNow! Pq,Qq,Ib
 # 0x0f 0x10-0x1f
@@ -462,7 +468,7 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
+77: emms {FPU} | vzeroupper | vzeroall
 78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
 79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
@@ -1036,10 +1042,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1),{FPU} | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1),{FPU} | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..d8a9dae42c3d 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -44,7 +44,7 @@ BEGIN {
 	delete atable
 
 	opnd_expr = "^[A-Za-z/]"
-	ext_expr = "^\\("
+	ext_expr = "^(\\(|\\{)"
 	sep_expr = "^\\|$"
 	group_expr = "^Grp[0-9A-Za-z]+"
 
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+	x87_expr = "^ESC"
+
+	fpureg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	fpu_expr = "\\{FPU\\}"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,fpu)
 {
 	imm = null
 	mod = null
+	fpu = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, fpureg_expr) == 1) {
+			fpu = "INAT_FPU"
+		}
 	}
+	if (fpu)
+		imm = add_flags(imm, fpu)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +292,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -318,10 +331,14 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
+		# x87 escape opcode needs MODRM
+		if (match(ext, x87_expr))
 			flags = add_flags(flags, "INAT_MODRM")
 
+		# check FPU/MMX/SSE superscripts
+		if (match(ext, fpu_expr))
+			flags = add_flags(flags, "INAT_FPU")
+
 		# check VEX codes
 		if (match(ext, evexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
@@ -336,22 +353,49 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU") {
+				lpfpu[0] = 1
+				lpfpu[1] = 1
+				lpfpu[2] = 1
+				lpfpu[3] = 1
+			}
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 877827b7c2c3..2e6a05290efd 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 568854b14d0a..d9f6bd9059c1 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..0adf11cbd3a8 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# Optional Superscripts
+#  - {FPU}: this mnemonic doesn't have FPU/MMX/SSE operands but access those
+#           registers.
 
 Table: one byte opcode
 Referrer:
@@ -202,7 +206,7 @@ AVXcode:
 98: CBW/CWDE/CDQE
 99: CWD/CDQ/CQO
 9a: CALLF Ap (i64)
-9b: FWAIT/WAIT
+9b: FWAIT/WAIT {FPU}
 9c: PUSHF/D/Q Fv (d64)
 9d: POPF/D/Q Fv (d64)
 9e: SAHF
@@ -269,14 +273,16 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcode are ESC (Escape to
+# coprocessor instruction set), the coprocessor means x87 FPU.
+d8: ESC {FPU}
+d9: ESC {FPU}
+da: ESC {FPU}
+db: ESC {FPU}
+dc: ESC {FPU}
+dd: ESC {FPU}
+de: ESC {FPU}
+df: ESC {FPU}
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -339,7 +345,7 @@ AVXcode: 1
 0c:
 # AMD's prefetch group. Intel supports prefetchw(/1) only.
 0d: GrpP
-0e: FEMMS
+0e: FEMMS {FPU}
 # 3DNow! uses the last imm byte as opcode extension.
 0f: 3DNow! Pq,Qq,Ib
 # 0x0f 0x10-0x1f
@@ -462,7 +468,7 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
+77: emms {FPU} | vzeroupper | vzeroall
 78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
 79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
@@ -1036,10 +1042,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1),{FPU} | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1),{FPU} | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..d8a9dae42c3d 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -44,7 +44,7 @@ BEGIN {
 	delete atable
 
 	opnd_expr = "^[A-Za-z/]"
-	ext_expr = "^\\("
+	ext_expr = "^(\\(|\\{)"
 	sep_expr = "^\\|$"
 	group_expr = "^Grp[0-9A-Za-z]+"
 
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+	x87_expr = "^ESC"
+
+	fpureg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	fpu_expr = "\\{FPU\\}"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,fpu)
 {
 	imm = null
 	mod = null
+	fpu = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, fpureg_expr) == 1) {
+			fpu = "INAT_FPU"
+		}
 	}
+	if (fpu)
+		imm = add_flags(imm, fpu)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +292,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -318,10 +331,14 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
+		# x87 escape opcode needs MODRM
+		if (match(ext, x87_expr))
 			flags = add_flags(flags, "INAT_MODRM")
 
+		# check FPU/MMX/SSE superscripts
+		if (match(ext, fpu_expr))
+			flags = add_flags(flags, "INAT_FPU")
+
 		# check VEX codes
 		if (match(ext, evexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
@@ -336,22 +353,49 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU") {
+				lpfpu[0] = 1
+				lpfpu[1] = 1
+				lpfpu[2] = 1
+				lpfpu[3] = 1
+			}
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-09 20:01           ` Peter Zijlstra
@ 2020-04-10 14:31             ` Christian König
  2020-04-15  9:16               ` Peter Zijlstra
  2020-04-17 20:27             ` Rodrigo Siqueira
  1 sibling, 1 reply; 37+ messages in thread
From: Christian König @ 2020-04-10 14:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

Am 09.04.20 um 22:01 schrieb Peter Zijlstra:
> On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:
>> Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
>>> On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
>>> [SNIP]
>>>> I'll need another approach, let me consider.
>>> Christian; it says these files are generated, does that generator know
>>> which functions are wholly in FPU context and which are not?
>> Well that "generator" is still a human being :)
>>
>> It's just that the formulae for the calculation come from the hardware team
>> and we are not able to easily transcript them to fixed point calculations.
> Well, if it's a human, can this human respect the kernel coding style a
> bit more :-) Some of that stuff is atrocious.

Yes, I know. That's unfortunately something we still need to work on as 
well.

>> We are currently in the process of moving all the stuff which requires
>> floating point into a single C file(s) and then make sure that we only call
>> those within kernel_fpu_begin()/end() blocks.
> Can you make the build system stick all those .o files in a single
> archive? That's the only way I can do call validation; external
> relocation records do not contain the section.

Need to double check that with the display team responsible for the 
code, but I think that shouldn't be much of a problem.

>> Annotating those function with __fpu or even saying to gcc that all code of
>> those files should go into a special text.fpu segment shouldn't be much of a
>> problem.
> Guess what the __fpu attribute does ;-)

Good to know that my suspicion about how this is implemented was correct :)

> With the below patch (which is on top of newer versions of the objtool
> patches sent earlier, let me know if you want a full set

Getting a branch somewhere would be perfect.

> ) that only
> converts a few files, but fully converts:
>
>    drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c
>
> But building it (and this is an absolute pain; when you're reworking
> this, can you pretty please also fix the Makefiles?), we get:
>
>    drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of kernel_fpu_{begin,end}()
>
> $ ./scripts/faddr2line defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth+0x34fa
> dcn_validate_bandwidth+0x34fa/0x57ce:
> dcn_validate_bandwidth at /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 (discriminator 5)
>
> # ./objdump-func.sh defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth | grep 34fa
> 34fa     50fa:  f2 0f 10 b5 60 ff ff    movsd  -0xa0(%rbp),%xmm6
>
> Which seems to indicate there are still problems with the current code.

Making an educated guess I would say the compiler has no idea that it 
shouldn't use instructions which touch fp registers outside of 
kernel_fpu_{begin,end}().

Going to talk with the display team about this whole topic internally
once more. Since this discussion has already got the attention of our
technical management, it shouldn't be too much of a problem to get
manpower to get this fixed properly.

Can we put this new automated check behind a configuration flag
initially? Or at least make it a warning and not a hard error.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 37+ messages in thread
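
The plan described in the message above (keep all floating-point math in
dedicated functions and only call them between kernel_fpu_begin() and
kernel_fpu_end()) can be modelled in userspace roughly as follows. This is
only a sketch under stated assumptions: the two helpers here are stand-ins
that merely count nesting, and fp_scale_bandwidth(), validate_bandwidth(),
fpu_depth and fp_calls_outside_region are hypothetical names that do not
come from the driver.

```c
/*
 * Userspace stand-ins for the kernel helpers. Assumption: the real
 * kernel_fpu_begin()/kernel_fpu_end() protect the FPU register state;
 * these versions only track whether we are inside the guarded region.
 */
static int fpu_depth;
static int fp_calls_outside_region;

static void kernel_fpu_begin(void) { fpu_depth++; }
static void kernel_fpu_end(void)   { fpu_depth--; }

/*
 * Hypothetical FP helper. It takes and returns integers so that the
 * caller never has to hold a float or double itself. It also records
 * any call made outside the guarded region.
 */
static long fp_scale_bandwidth(long bw_khz, long percent)
{
	if (fpu_depth == 0)
		fp_calls_outside_region++;
	return (long)((double)bw_khz * ((double)percent / 100.0));
}

/* Integer-only caller: the single place that enters the FPU region. */
static long validate_bandwidth(long bw_khz, long percent)
{
	long result;

	kernel_fpu_begin();
	result = fp_scale_bandwidth(bw_khz, percent);
	kernel_fpu_end();
	return result;
}
```

Keeping the caller integer-only matters because, as the objtool warning
quoted above suggests, a compiler that is allowed to use SSE in a file may
emit such instructions anywhere in that file, including outside the guarded
region. In a real build the FP helper would presumably sit in its own
object file so that only that file is built with the FP flags.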

* Re: [PATCH v3] x86: insn: Add insn_is_fpu()
  2020-04-10  1:22                             ` [PATCH v3] " Masami Hiramatsu
@ 2020-04-15  8:23                               ` Masami Hiramatsu
  0 siblings, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-15  8:23 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, Christian König, Jann Horn, Harry Wentland,
	Leo Li, amd-gfx, Alex Deucher, David Zhou, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	the arch/x86 maintainers, kernel list, Josh Poimboeuf,
	Andy Lutomirski, Arnaldo Carvalho de Melo

On Fri, 10 Apr 2020 10:22:30 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> @@ -318,10 +331,14 @@ function convert_operands(count,opnd,       i,j,imm,mod)
>  		if (match(opcode, rex_expr))
>  			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
>  
> -		# check coprocessor escape : TODO
> -		if (match(opcode, fpu_expr))
> +		# x87 escape opcode needs MODRM
> +		if (match(ext, x87_expr))

Oops, it must be match(opcode, x87_expr). I'll fix it.

Thanks,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 37+ messages in thread
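
The mistake noted in the message above comes down to which field the
"^ESC" pattern is tested against. A minimal userspace sketch, assuming an
opcode-map entry such as "d8: ESC {FPU}" is split into opcode "ESC" and
extension "{FPU}" (the split itself and the helper names matches(),
needs_modrm_v3() and needs_modrm_v4() are illustrative, not taken from the
awk script):

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches the POSIX extended regex pattern, else 0. */
static int matches(const char *pattern, const char *text)
{
	regex_t re;
	int hit;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return 0;
	hit = (regexec(&re, text, 0, NULL, 0) == 0);
	regfree(&re);
	return hit;
}

/* v3 behaviour: the x87 pattern is tested against the extension field. */
static int needs_modrm_v3(const char *opcode, const char *ext)
{
	(void)opcode;
	return matches("^ESC", ext);
}

/* v4 behaviour: the x87 pattern is tested against the opcode field. */
static int needs_modrm_v4(const char *opcode, const char *ext)
{
	(void)ext;
	return matches("^ESC", opcode);
}
```

With opcode "ESC" and extension "{FPU}", only the v4 form reports a match,
so under these assumptions the v3 form would leave the ESC opcodes without
the MODRM flag.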

* [PATCH v4] x86: insn: Add insn_is_fpu()
  2020-04-09 14:32                           ` Peter Zijlstra
                                               ` (2 preceding siblings ...)
  2020-04-10  1:22                             ` [PATCH v3] " Masami Hiramatsu
@ 2020-04-15  8:49                             ` Masami Hiramatsu
  3 siblings, 0 replies; 37+ messages in thread
From: Masami Hiramatsu @ 2020-04-15  8:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, Jann Horn, Harry Wentland, Leo Li, amd-gfx,
	Alex Deucher, David Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo

Add insn_is_fpu(insn), which tells whether the insn touches an
FPU/SSE/MMX register or is an FP coprocessor (x87)
instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
---
 Changes in v4:
 - Fix to match x87-opcode pattern with opcode instead
   of ext(ension).
---
 arch/x86/include/asm/inat.h                |    7 +++
 arch/x86/include/asm/insn.h                |   12 ++++++
 arch/x86/lib/x86-opcode-map.txt            |   36 ++++++++++-------
 arch/x86/tools/gen-insn-attr-x86.awk       |   58 +++++++++++++++++++++++++---
 tools/arch/x86/include/asm/inat.h          |    7 +++
 tools/arch/x86/include/asm/insn.h          |   12 ++++++
 tools/arch/x86/lib/x86-opcode-map.txt      |   36 ++++++++++-------
 tools/arch/x86/tools/gen-insn-attr-x86.awk |   58 +++++++++++++++++++++++++---
 8 files changed, 182 insertions(+), 44 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 4cf2ad521f65..ffce45178c08 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 5c1ae3eff9d4..1752c54d2103 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..0adf11cbd3a8 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# Optional Superscripts
+#  - {FPU}: this mnemonic doesn't have FPU/MMX/SSE operands but access those
+#           registers.
 
 Table: one byte opcode
 Referrer:
@@ -202,7 +206,7 @@ AVXcode:
 98: CBW/CWDE/CDQE
 99: CWD/CDQ/CQO
 9a: CALLF Ap (i64)
-9b: FWAIT/WAIT
+9b: FWAIT/WAIT {FPU}
 9c: PUSHF/D/Q Fv (d64)
 9d: POPF/D/Q Fv (d64)
 9e: SAHF
@@ -269,14 +273,16 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcode are ESC (Escape to
+# coprocessor instruction set), the coprocessor means x87 FPU.
+d8: ESC {FPU}
+d9: ESC {FPU}
+da: ESC {FPU}
+db: ESC {FPU}
+dc: ESC {FPU}
+dd: ESC {FPU}
+de: ESC {FPU}
+df: ESC {FPU}
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -339,7 +345,7 @@ AVXcode: 1
 0c:
 # AMD's prefetch group. Intel supports prefetchw(/1) only.
 0d: GrpP
-0e: FEMMS
+0e: FEMMS {FPU}
 # 3DNow! uses the last imm byte as opcode extension.
 0f: 3DNow! Pq,Qq,Ib
 # 0x0f 0x10-0x1f
@@ -462,7 +468,7 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
+77: emms {FPU} | vzeroupper | vzeroall
 78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
 79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
@@ -1036,10 +1042,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1),{FPU} | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1),{FPU} | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..38475918467c 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -44,7 +44,7 @@ BEGIN {
 	delete atable
 
 	opnd_expr = "^[A-Za-z/]"
-	ext_expr = "^\\("
+	ext_expr = "^(\\(|\\{)"
 	sep_expr = "^\\|$"
 	group_expr = "^Grp[0-9A-Za-z]+"
 
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+	x87_expr = "^ESC"
+
+	fpureg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	fpu_expr = "\\{FPU\\}"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,fpu)
 {
 	imm = null
 	mod = null
+	fpu = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, fpureg_expr) == 1) {
+			fpu = "INAT_FPU"
+		}
 	}
+	if (fpu)
+		imm = add_flags(imm, fpu)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +292,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -318,10 +331,14 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
+		# x87 escape opcode needs MODRM
+		if (match(opcode, x87_expr))
 			flags = add_flags(flags, "INAT_MODRM")
 
+		# check FPU/MMX/SSE superscripts
+		if (match(ext, fpu_expr))
+			flags = add_flags(flags, "INAT_FPU")
+
 		# check VEX codes
 		if (match(ext, evexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
@@ -336,22 +353,49 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU") {
+				lpfpu[0] = 1
+				lpfpu[1] = 1
+				lpfpu[2] = 1
+				lpfpu[3] = 1
+			}
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 877827b7c2c3..2e6a05290efd 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -77,6 +77,8 @@
 #define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
 #define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
 #define INAT_EVEXONLY	(1 << (INAT_FLAG_OFFS + 7))
+#define INAT_FPU	(1 << (INAT_FLAG_OFFS + 8))
+#define INAT_FPUIFVEX	(1 << (INAT_FLAG_OFFS + 9))
 /* Attribute making macros for attribute tables */
 #define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
 #define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
@@ -227,4 +229,9 @@ static inline int inat_must_evex(insn_attr_t attr)
 {
 	return attr & INAT_EVEXONLY;
 }
+
+static inline int inat_is_fpu(insn_attr_t attr)
+{
+	return attr & INAT_FPU;
+}
 #endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 568854b14d0a..d9f6bd9059c1 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -129,6 +129,18 @@ static inline int insn_is_evex(struct insn *insn)
 	return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_is_fpu(struct insn *insn)
+{
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+	if (inat_is_fpu(insn->attr)) {
+		if (insn->attr & INAT_FPUIFVEX)
+			return insn_is_avx(insn);
+		return 1;
+	}
+	return 0;
+}
+
 static inline int insn_has_emulate_prefix(struct insn *insn)
 {
 	return !!insn->emulate_prefix_size;
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index ec31f5b60323..0adf11cbd3a8 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
 #  - (F2): the last prefix is 0xF2
 #  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
 #  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# Optional Superscripts
+#  - {FPU}: this mnemonic doesn't have FPU/MMX/SSE operands but access those
+#           registers.
 
 Table: one byte opcode
 Referrer:
@@ -202,7 +206,7 @@ AVXcode:
 98: CBW/CWDE/CDQE
 99: CWD/CDQ/CQO
 9a: CALLF Ap (i64)
-9b: FWAIT/WAIT
+9b: FWAIT/WAIT {FPU}
 9c: PUSHF/D/Q Fv (d64)
 9d: POPF/D/Q Fv (d64)
 9e: SAHF
@@ -269,14 +273,16 @@ d4: AAM Ib (i64)
 d5: AAD Ib (i64)
 d6:
 d7: XLAT/XLATB
-d8: ESC
-d9: ESC
-da: ESC
-db: ESC
-dc: ESC
-dd: ESC
-de: ESC
-df: ESC
+# Intel SDM Appendix A Opcode Map shows these opcode are ESC (Escape to
+# coprocessor instruction set), the coprocessor means x87 FPU.
+d8: ESC {FPU}
+d9: ESC {FPU}
+da: ESC {FPU}
+db: ESC {FPU}
+dc: ESC {FPU}
+dd: ESC {FPU}
+de: ESC {FPU}
+df: ESC {FPU}
 # 0xe0 - 0xef
 # Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
 # in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
@@ -339,7 +345,7 @@ AVXcode: 1
 0c:
 # AMD's prefetch group. Intel supports prefetchw(/1) only.
 0d: GrpP
-0e: FEMMS
+0e: FEMMS {FPU}
 # 3DNow! uses the last imm byte as opcode extension.
 0f: 3DNow! Pq,Qq,Ib
 # 0x0f 0x10-0x1f
@@ -462,7 +468,7 @@ AVXcode: 1
 75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
 76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
 # Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
-77: emms | vzeroupper | vzeroall
+77: emms {FPU} | vzeroupper | vzeroall
 78: VMREAD Ey,Gy | vcvttps2udq/pd2udq Vx,Wpd (evo) | vcvttsd2usi Gv,Wx (F2),(ev) | vcvttss2usi Gv,Wx (F3),(ev) | vcvttps2uqq/pd2uqq Vx,Wx (66),(ev)
 79: VMWRITE Gy,Ey | vcvtps2udq/pd2udq Vx,Wpd (evo) | vcvtsd2usi Gv,Wx (F2),(ev) | vcvtss2usi Gv,Wx (F3),(ev) | vcvtps2uqq/pd2uqq Vx,Wx (66),(ev)
 7a: vcvtudq2pd/uqq2pd Vpd,Wx (F3),(ev) | vcvtudq2ps/uqq2ps Vpd,Wx (F2),(ev) | vcvttps2qq/pd2qq Vx,Wx (66),(ev)
@@ -1036,10 +1042,10 @@ GrpTable: Grp14
 EndTable
 
 GrpTable: Grp15
-0: fxsave | RDFSBASE Ry (F3),(11B)
-1: fxstor | RDGSBASE Ry (F3),(11B)
-2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
-3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+0: fxsave {FPU} | RDFSBASE Ry (F3),(11B)
+1: fxrstor {FPU} | RDGSBASE Ry (F3),(11B)
+2: ldmxcsr {FPU} | vldmxcsr Md (v1),{FPU} | WRFSBASE Ry (F3),(11B)
+3: stmxcsr {FPU} | vstmxcsr Md (v1),{FPU} | WRGSBASE Ry (F3),(11B)
 4: XSAVE | ptwrite Ey (F3),(11B)
 5: XRSTOR | lfence (11B) | INCSSPD/Q Ry (F3),(11B)
 6: XSAVEOPT | clwb (66) | mfence (11B) | TPAUSE Rd (66),(11B) | UMONITOR Rv (F3),(11B) | UMWAIT Rd (F2),(11B) | CLRSSBSY Mq (F3)
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index a42015b305f4..38475918467c 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -44,7 +44,7 @@ BEGIN {
 	delete atable
 
 	opnd_expr = "^[A-Za-z/]"
-	ext_expr = "^\\("
+	ext_expr = "^(\\(|\\{)"
 	sep_expr = "^\\|$"
 	group_expr = "^Grp[0-9A-Za-z]+"
 
@@ -65,7 +65,10 @@ BEGIN {
 	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
 	force64_expr = "\\([df]64\\)"
 	rex_expr = "^REX(\\.[XRWB]+)*"
-	fpu_expr = "^ESC" # TODO
+	x87_expr = "^ESC"
+
+	fpureg_expr = "^[HLNPQUVW][a-z]+" # MMX/SSE register operands
+	fpu_expr = "\\{FPU\\}"
 
 	lprefix1_expr = "\\((66|!F3)\\)"
 	lprefix2_expr = "\\(F3\\)"
@@ -236,10 +239,11 @@ function add_flags(old,new) {
 }
 
 # convert operands to flags.
-function convert_operands(count,opnd,       i,j,imm,mod)
+function convert_operands(count,opnd,       i,j,imm,mod,fpu)
 {
 	imm = null
 	mod = null
+	fpu = null
 	for (j = 1; j <= count; j++) {
 		i = opnd[j]
 		if (match(i, imm_expr) == 1) {
@@ -253,7 +257,12 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				imm = imm_flag[i]
 		} else if (match(i, modrm_expr))
 			mod = "INAT_MODRM"
+		if (match(i, fpureg_expr) == 1) {
+			fpu = "INAT_FPU"
+		}
 	}
+	if (fpu)
+		imm = add_flags(imm, fpu)
 	return add_flags(imm, mod)
 }
 
@@ -283,6 +292,10 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 	variant = null
 	# converts
 	i = 2
+	lpfpu[0] = 0
+	lpfpu[1] = 0
+	lpfpu[2] = 0
+	lpfpu[3] = 0
 	while (i <= NF) {
 		opcode = $(i++)
 		delete opnds
@@ -318,10 +331,14 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 		if (match(opcode, rex_expr))
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
 
-		# check coprocessor escape : TODO
-		if (match(opcode, fpu_expr))
+		# x87 escape opcode needs MODRM
+		if (match(opcode, x87_expr))
 			flags = add_flags(flags, "INAT_MODRM")
 
+		# check FPU/MMX/SSE superscripts
+		if (match(ext, fpu_expr))
+			flags = add_flags(flags, "INAT_FPU")
+
 		# check VEX codes
 		if (match(ext, evexonly_expr))
 			flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
@@ -336,22 +353,49 @@ function convert_operands(count,opnd,       i,j,imm,mod)
 				semantic_error("Unknown prefix: " opcode)
 			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
 		}
-		if (length(flags) == 0)
-			continue
+
 		# check if last prefix
 		if (match(ext, lprefix1_expr)) {
+			if (lpfpu[1] == 0 && flags !~ "INAT_FPU")
+				lpfpu[1] = 1
+			else if (lpfpu[1] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable1[idx] = add_flags(lptable1[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix2_expr)) {
+			if (lpfpu[2] == 0 && flags !~ "INAT_FPU")
+				lpfpu[2] = 1
+			else if (lpfpu[2] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable2[idx] = add_flags(lptable2[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (match(ext, lprefix3_expr)) {
+			if (lpfpu[3] == 0 && flags !~ "INAT_FPU")
+				lpfpu[3] = 1
+			else if (lpfpu[3] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			lptable3[idx] = add_flags(lptable3[idx],flags)
 			variant = "INAT_VARIANT"
 		}
 		if (!match(ext, lprefix_expr)){
+			if (lpfpu[0] == 0 && flags !~ "INAT_FPU") {
+				lpfpu[0] = 1
+				lpfpu[1] = 1
+				lpfpu[2] = 1
+				lpfpu[3] = 1
+			}
+			else if (lpfpu[0] != 0 && flags ~ "INAT_FPU")
+				flags = add_flags(flags, "INAT_FPUIFVEX")
+			if (length(flags) == 0)
+				continue;
 			table[idx] = add_flags(table[idx],flags)
 		}
 	}


^ permalink raw reply related	[flat|nested] 37+ messages in thread
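
The decision made by insn_is_fpu() in the patch above can be sketched in
isolation as below. This is a simplified userspace model, not the kernel
code: the bit positions are hypothetical (the patch derives the real ones
from INAT_FLAG_OFFS), struct model_insn replaces struct insn, and has_vex
stands in for the insn_is_avx() check.

```c
/* Hypothetical bit positions, chosen only for this model. */
#define MODEL_INAT_FPU       (1u << 8)
#define MODEL_INAT_FPUIFVEX  (1u << 9)

struct model_insn {
	unsigned int attr;  /* attribute bits looked up for the opcode */
	int has_vex;        /* nonzero if a VEX/EVEX prefix is present */
};

/*
 * Mirror of the patch's logic: an FPU-flagged opcode counts as an FPU
 * instruction, unless the flag applies only to the VEX-encoded variant,
 * in which case the prefix decides.
 */
static int model_insn_is_fpu(const struct model_insn *insn)
{
	if (insn->attr & MODEL_INAT_FPU) {
		if (insn->attr & MODEL_INAT_FPUIFVEX)
			return insn->has_vex != 0;
		return 1;
	}
	return 0;
}
```

As far as I can tell from the awk changes, the second flag is meant for
opcode slots whose legacy encoding does not touch FPU state while a VEX
variant in the same slot does, for example the entry that lists VMREAD next
to vcvttps2udq.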

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-10 14:31             ` Christian König
@ 2020-04-15  9:16               ` Peter Zijlstra
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-15  9:16 UTC (permalink / raw)
  To: Christian König
  Cc: Jann Horn, Harry Wentland, Leo Li, amd-gfx, Alex Deucher,
	David (ChunMing) Zhou, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, the arch/x86 maintainers,
	kernel list, Josh Poimboeuf, Andy Lutomirski,
	Arnaldo Carvalho de Melo, mhiramat

On Fri, Apr 10, 2020 at 04:31:39PM +0200, Christian König wrote:
> Can we put this new automated check behind a configuration flag
> initially? Or at least make it a warning and not a hard error.

I'll try and get the patches merged in mainline objtool but with a flag
that isn't used by default.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-09 20:01           ` Peter Zijlstra
  2020-04-10 14:31             ` Christian König
@ 2020-04-17 20:27             ` Rodrigo Siqueira
  2020-04-17 21:56               ` Peter Zijlstra
  1 sibling, 1 reply; 37+ messages in thread
From: Rodrigo Siqueira @ 2020-04-17 20:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Christian König, David (ChunMing) Zhou, Josh Poimboeuf,
	Jann Horn, Leo Li, the arch/x86 maintainers, kernel list,
	amd-gfx, Ingo Molnar, Borislav Petkov, Arnaldo Carvalho de Melo,
	Andy Lutomirski, H. Peter Anvin, Alex Deucher, Thomas Gleixner,
	Harry Wentland, mhiramat

On 04/09, Peter Zijlstra wrote:
> On Thu, Apr 09, 2020 at 08:15:57PM +0200, Christian König wrote:
> > Am 09.04.20 um 19:09 schrieb Peter Zijlstra:
> > > On Thu, Apr 09, 2020 at 05:59:56PM +0200, Peter Zijlstra wrote:
> > > [SNIP]
> > > > I'll need another approach, let me consider.
> > > Christian; it says these files are generated, does that generator know
> > > which functions are wholly in FPU context and which are not?
> > 
> > Well that "generator" is still a human being :)
> > 
> > It's just that the formulae for the calculation come from the hardware team
> > and we are not able to easily transcribe them to fixed point calculations.
> 
> Well, if it's a human, can this human respect the kernel coding style a
> bit more :-) Some of that stuff is atrocious.
> 
> > > My current thinking is that if I annotate all functions that are wholly
> > > inside kernel_fpu_begin()/end() with an __fpu function attribute, then I can
> > > verify that any call from regular text to fpu text only happens inside
> > > kernel_fpu_begin()/end(). And I can ensure that all !__fpu annotated
> > > functions only contain !fpu instructions.
> > 
> > Yeah, that sounds like a good idea to me and should be easily doable.
> > 
> > > Can that generator add the __fpu function attribute or is that something
> > > that would need to be done manually (which seems like it would be
> > > painful, since it is quite a bit of code) ?
> > 
> > We are currently in the process of moving all the stuff which requires
> > floating point into a single C file (or a few files) and then making sure
> > that we only call those within kernel_fpu_begin()/end() blocks.
> 
> Can you make the build system stick all those .o files in a single
> archive? That's the only way I can do call validation; external
> relocation records do not contain the section.
> 
> > Annotating those functions with __fpu or even saying to gcc that all code of
> > those files should go into a special text.fpu segment shouldn't be much of a
> > problem.
> 
> Guess what the __fpu attribute does ;-)
> 
> With the below patch (which is on top of newer versions of the objtool
> patches sent earlier, let me know if you want a full set) that only
> converts a few files, but fully converts:
> 
>   drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c
> 
> But building it (and this is an absolute pain; when you're reworking
> this, can you pretty please also fix the Makefiles?), we get:

Hi,

I'm going to work on the FPU issues in the display code. With that in mind,
could you say a bit more about the Makefile issues?

Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and
tried to reproduce the warning that you described but I didn't see it.
Could you explain to me how I can check those warnings?

Thanks

>   drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x34fa: FPU instruction outside of kernel_fpu_{begin,end}()
> 
> $ ./scripts/faddr2line defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth+0x34fa
> dcn_validate_bandwidth+0x34fa/0x57ce:
> dcn_validate_bandwidth at /usr/src/linux-2.6/defconfig-build/../drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1293 (discriminator 5)
> 
> # ./objdump-func.sh defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o dcn_validate_bandwidth | grep 34fa
> 34fa     50fa:  f2 0f 10 b5 60 ff ff    movsd  -0xa0(%rbp),%xmm6
> 
> Which seems to indicate there's still problems with the current code.
> 
> 
> 
> ---
>  arch/x86/include/asm/fpu/api.h                     | 12 +++++++++++
>  arch/x86/kernel/vmlinux.lds.S                      |  1 +
>  .../gpu/drm/amd/display/dc/calcs/dcn_calc_math.c   | 25 +++++++++++-----------
>  drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c   |  4 ++--
>  .../display/dc/dml/dcn20/display_rq_dlg_calc_20.c  | 10 ++++-----
>  .../amd/display/dc/dml/display_rq_dlg_helpers.c    |  2 +-
>  .../gpu/drm/amd/display/dc/dml/dml_common_defs.c   |  2 +-
>  drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c        |  2 +-
>  drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c       | 10 ++++-----
>  drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c   |  4 ++--
>  drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h |  2 ++
>  tools/objtool/check.c                              |  7 +++++-
>  tools/objtool/elf.h                                |  2 +-
>  13 files changed, 52 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
> index 64be4426fda9..19eaf98bbb0a 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -12,11 +12,23 @@
>  #define _ASM_X86_FPU_API_H
>  #include <linux/bottom_half.h>
> 
> +#ifdef CONFIG_STACK_VALIDATION
> +
> +#define __fpu __section(".text.fpu")
> +
>  #define _ASM_ANNOTATE_FPU(at)						\
>  		     ".pushsection .discard.fpu_safe\n"			\
>  		     ".long " #at " - .\n"				\
>  		     ".popsection\n"					\
> 
> +#else
> +
> +#define __fpu
> +
> +#define _ASM_ANNOTATE_FPU(at)
> +
> +#endif /* CONFIG_STACK_VALIDATION */
> +
>  #define annotate_fpu() ({						\
>  	asm volatile("%c0:\n\t"						\
>  		     _ASM_ANNOTATE_FPU(%c0b)				\
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 1bf7e312361f..8442f8633d07 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -139,6 +139,7 @@ SECTIONS
>  		SOFTIRQENTRY_TEXT
>  		*(.fixup)
>  		*(.gnu.warning)
> +		*(.text.fpu)
> 
>  #ifdef CONFIG_RETPOLINE
>  		__indirect_thunk_start = .;
> diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c
> index 07d18e78de49..57ab3aafef5a 100644
> --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c
> +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c
> @@ -36,7 +36,7 @@
>   * remain as-is as it provides us with a guarantee from HW that it is correct.
>   */
> 
> -float dcn_bw_mod(const float arg1, const float arg2)
> +__fpu float dcn_bw_mod(const float arg1, const float arg2)
>  {
>  	if (isNaN(arg1))
>  		return arg2;
> @@ -45,7 +45,7 @@ float dcn_bw_mod(const float arg1, const float arg2)
>  	return arg1 - arg1 * ((int) (arg1 / arg2));
>  }
> 
> -float dcn_bw_min2(const float arg1, const float arg2)
> +__fpu float dcn_bw_min2(const float arg1, const float arg2)
>  {
>  	if (isNaN(arg1))
>  		return arg2;
> @@ -58,7 +58,7 @@ unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2)
>  {
>  	return arg1 > arg2 ? arg1 : arg2;
>  }
> -float dcn_bw_max2(const float arg1, const float arg2)
> +__fpu float dcn_bw_max2(const float arg1, const float arg2)
>  {
>  	if (isNaN(arg1))
>  		return arg2;
> @@ -67,25 +67,26 @@ float dcn_bw_max2(const float arg1, const float arg2)
>  	return arg1 > arg2 ? arg1 : arg2;
>  }
> 
> -float dcn_bw_floor2(const float arg, const float significance)
> +__fpu float dcn_bw_floor2(const float arg, const float significance)
>  {
>  	if (significance == 0)
>  		return 0;
>  	return ((int) (arg / significance)) * significance;
>  }
> -float dcn_bw_floor(const float arg)
> +
> +__fpu float dcn_bw_floor(const float arg)
>  {
>  	return ((int) (arg));
>  }
> 
> -float dcn_bw_ceil(const float arg)
> +__fpu float dcn_bw_ceil(const float arg)
>  {
>  	float flr = dcn_bw_floor2(arg, 1);
> 
>  	return flr + 0.00001 >= arg ? arg : flr + 1;
>  }
> 
> -float dcn_bw_ceil2(const float arg, const float significance)
> +__fpu float dcn_bw_ceil2(const float arg, const float significance)
>  {
>  	float flr = dcn_bw_floor2(arg, significance);
>  	if (significance == 0)
> @@ -93,17 +94,17 @@ float dcn_bw_ceil2(const float arg, const float significance)
>  	return flr + 0.00001 >= arg ? arg : flr + significance;
>  }
> 
> -float dcn_bw_max3(float v1, float v2, float v3)
> +__fpu float dcn_bw_max3(float v1, float v2, float v3)
>  {
>  	return v3 > dcn_bw_max2(v1, v2) ? v3 : dcn_bw_max2(v1, v2);
>  }
> 
> -float dcn_bw_max5(float v1, float v2, float v3, float v4, float v5)
> +__fpu float dcn_bw_max5(float v1, float v2, float v3, float v4, float v5)
>  {
>  	return dcn_bw_max3(v1, v2, v3) > dcn_bw_max2(v4, v5) ? dcn_bw_max3(v1, v2, v3) : dcn_bw_max2(v4, v5);
>  }
> 
> -float dcn_bw_pow(float a, float exp)
> +__fpu float dcn_bw_pow(float a, float exp)
>  {
>  	float temp;
>  	/*ASSERT(exp == (int)exp);*/
> @@ -120,7 +121,7 @@ float dcn_bw_pow(float a, float exp)
>  	}
>  }
> 
> -double dcn_bw_fabs(double a)
> +__fpu double dcn_bw_fabs(double a)
>  {
>  	if (a > 0)
>  		return (a);
> @@ -129,7 +130,7 @@ double dcn_bw_fabs(double a)
>  }
> 
> 
> -float dcn_bw_log(float a, float b)
> +__fpu float dcn_bw_log(float a, float b)
>  {
>  	int * const exp_ptr = (int *)(&a);
>  	int x = *exp_ptr;
> diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
> index 3960a8db94cb..b3e305d9d1c9 100644
> --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
> +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c
> @@ -435,7 +435,7 @@ static void pipe_ctx_to_e2e_pipe_params (
> 
>  }
> 
> -static void dcn_bw_calc_rq_dlg_ttu(
> +static __fpu void dcn_bw_calc_rq_dlg_ttu(
>  		const struct dc *dc,
>  		const struct dcn_bw_internal_vars *v,
>  		struct pipe_ctx *pipe,
> @@ -1388,7 +1388,7 @@ static unsigned int dcn_find_normalized_clock_vdd_Level(
>  	return vdd_level;
>  }
> 
> -unsigned int dcn_find_dcfclk_suits_all(
> +__fpu unsigned int dcn_find_dcfclk_suits_all(
>  	const struct dc *dc,
>  	struct dc_clocks *clocks)
>  {
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
> index ca807846032f..0cbb58fc7fb8 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
> @@ -115,7 +115,7 @@ static bool is_dual_plane(enum source_format_class source_format)
>  	return ret_val;
>  }
> 
> -static double get_refcyc_per_delivery(struct display_mode_lib *mode_lib,
> +static __fpu double get_refcyc_per_delivery(struct display_mode_lib *mode_lib,
>  		double refclk_freq_in_mhz,
>  		double pclk_freq_in_mhz,
>  		bool odm_combine,
> @@ -162,7 +162,7 @@ static unsigned int get_blk_size_bytes(const enum source_macro_tile_size tile_si
>  		return (4 * 1024);
>  }
> 
> -static void extract_rq_sizing_regs(struct display_mode_lib *mode_lib,
> +static __fpu void extract_rq_sizing_regs(struct display_mode_lib *mode_lib,
>  		display_data_rq_regs_st *rq_regs,
>  		const display_data_rq_sizing_params_st rq_sizing)
>  {
> @@ -313,7 +313,7 @@ static void handle_det_buf_split(struct display_mode_lib *mode_lib,
>  			full_swath_bytes_packed_c);
>  }
> 
> -static void get_meta_and_pte_attr(struct display_mode_lib *mode_lib,
> +static __fpu void get_meta_and_pte_attr(struct display_mode_lib *mode_lib,
>  		display_data_rq_dlg_params_st *rq_dlg_param,
>  		display_data_rq_misc_params_st *rq_misc_param,
>  		display_data_rq_sizing_params_st *rq_sizing_param,
> @@ -763,7 +763,7 @@ void dml20_rq_dlg_get_rq_reg(struct display_mode_lib *mode_lib,
> 
>  // Note: currently taken in as is.
>  // Nice to decouple code from hw register implement and extract code that are repeated for luma and chroma.
> -static void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib,
> +static __fpu void dml20_rq_dlg_get_dlg_params(struct display_mode_lib *mode_lib,
>  		const display_e2e_pipe_params_st *e2e_pipe_param,
>  		const unsigned int num_pipes,
>  		const unsigned int pipe_idx,
> @@ -1611,7 +1611,7 @@ void dml20_rq_dlg_get_dlg_reg(struct display_mode_lib *mode_lib,
>  	dml_print("DML_DLG: Calculation for pipe[%d] end\n", pipe_idx);
>  }
> 
> -static void calculate_ttu_cursor(struct display_mode_lib *mode_lib,
> +static __fpu void calculate_ttu_cursor(struct display_mode_lib *mode_lib,
>  		double *refcyc_per_req_delivery_pre_cur,
>  		double *refcyc_per_req_delivery_cur,
>  		double refclk_freq_in_mhz,
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c
> index e2d82aacd3bc..36541cba3894 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/display_rq_dlg_helpers.c
> @@ -133,7 +133,7 @@ void print__rq_dlg_params_st(struct display_mode_lib *mode_lib, display_rq_dlg_p
>  	dml_print("DML_RQ_DLG_CALC: =====================================\n");
>  }
> 
> -void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param)
> +__fpu void print__dlg_sys_params_st(struct display_mode_lib *mode_lib, display_dlg_sys_params_st dlg_sys_param)
>  {
>  	dml_print("DML_RQ_DLG_CALC: =====================================\n");
>  	dml_print("DML_RQ_DLG_CALC: DISPLAY_RQ_DLG_PARAM_ST\n");
> diff --git a/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c b/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c
> index 723af0b2dda0..e86b1d0128cf 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c
> +++ b/drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.c
> @@ -28,7 +28,7 @@
> 
>  #include "dml_inline_defs.h"
> 
> -double dml_round(double a)
> +__fpu double dml_round(double a)
>  {
>  	double round_pt = 0.5;
>  	double ceil = dml_ceil(a, 1);
> diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
> index 87d682d25278..a0b0eb2f0fe3 100644
> --- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
> +++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
> @@ -323,7 +323,7 @@ static inline uint32_t calc_dsc_bpp_x16(uint32_t stream_bandwidth_kbps, uint32_t
>  /* Get DSC bandwidth range based on [min_bpp, max_bpp] target bitrate range, and timing's pixel clock
>   * and uncompressed bandwidth.
>   */
> -static void get_dsc_bandwidth_range(
> +static __fpu void get_dsc_bandwidth_range(
>  		const uint32_t min_bpp,
>  		const uint32_t max_bpp,
>  		const struct dsc_enc_caps *dsc_caps,
> diff --git a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c
> index 03ae15946c6d..535770930343 100644
> --- a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c
> +++ b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc.c
> @@ -40,7 +40,7 @@
>  	break
> 
> 
> -void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum max_min max_min, float bpp)
> +__fpu void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum max_min max_min, float bpp)
>  {
>  	int mode = MODE_SELECT(444, 422, 420);
>  	int sel = table_hash(mode, bpc, max_min);
> @@ -85,7 +85,7 @@ void get_qp_set(qp_set qps, enum colour_mode cm, enum bits_per_comp bpc, enum ma
>  	memcpy(qps, table[index].qps, sizeof(qp_set));
>  }
> 
> -double dsc_roundf(double num)
> +__fpu double dsc_roundf(double num)
>  {
>  	if (num < 0.0)
>  		num = num - 0.5;
> @@ -95,7 +95,7 @@ double dsc_roundf(double num)
>  	return (int)(num);
>  }
> 
> -double dsc_ceil(double num)
> +__fpu double dsc_ceil(double num)
>  {
>  	double retval = (int)num;
> 
> @@ -105,7 +105,7 @@ double dsc_ceil(double num)
>  	return (int)retval;
>  }
> 
> -void get_ofs_set(qp_set ofs, enum colour_mode mode, float bpp)
> +__fpu void get_ofs_set(qp_set ofs, enum colour_mode mode, float bpp)
>  {
>  	int   *p = ofs;
> 
> @@ -172,7 +172,7 @@ int median3(int a, int b, int c)
>  	return b;
>  }
> 
> -void calc_rc_params(struct rc_params *rc, enum colour_mode cm, enum bits_per_comp bpc, float bpp, int slice_width, int slice_height, int minor_version)
> +__fpu void calc_rc_params(struct rc_params *rc, enum colour_mode cm, enum bits_per_comp bpc, float bpp, int slice_width, int slice_height, int minor_version)
>  {
>  	float bpp_group;
>  	float initial_xmit_delay_factor;
> diff --git a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c
> index 1f6e63b71456..38b3c4ac96dd 100644
> --- a/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c
> +++ b/drivers/gpu/drm/amd/display/dc/dsc/rc_calc_dpi.c
> @@ -98,7 +98,7 @@ static void copy_rc_to_cfg(struct drm_dsc_config *dsc_cfg, const struct rc_param
>  		dsc_cfg->rc_buf_thresh[i] = rc->rc_buf_thresh[i];
>  }
> 
> -int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_parameters *dsc_params)
> +__fpu int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_parameters *dsc_params)
>  {
>  	enum colour_mode  mode = pps->convert_rgb ? CM_RGB :
>  							(pps->simple_422  ? CM_444 :
> @@ -115,7 +115,7 @@ int dscc_compute_dsc_parameters(const struct drm_dsc_config *pps, struct dsc_par
> 
>  	double d_bytes_per_pixel = dsc_ceil(bpp * slice_width / 8.0) / slice_width;
> 
> -	// TODO: Make sure the formula for calculating this is precise (ceiling vs. floor, and at what point they should be applied)
> +	// TODO: Make sure the formula for calculating this is precise (ceiling vs. floor, and at what point they should be applied
>  	if (pps->native_422 || pps->native_420)
>  		d_bytes_per_pixel /= 2;
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h b/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h
> index 45a07eeffbb6..d2ea6cf65f7e 100644
> --- a/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h
> +++ b/drivers/gpu/drm/amd/display/dc/inc/dcn_calc_math.h
> @@ -26,6 +26,8 @@
>  #ifndef _DCN_CALC_MATH_H_
>  #define _DCN_CALC_MATH_H_
> 
> +#include <asm/fpu/api.h>
> +
>  float dcn_bw_mod(const float arg1, const float arg2);
>  float dcn_bw_min2(const float arg1, const float arg2);
>  unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2);
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 1607a698eccd..02a51fedd031 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -252,6 +252,9 @@ static int decode_instructions(struct objtool_file *file)
>  		    strncmp(sec->name, ".discard.", 9))
>  			sec->text = true;
> 
> +		if (!strcmp(sec->name, ".text.fpu"))
> +			sec->fpu = true;
> +
>  		for (offset = 0; offset < sec->len; offset += insn->len) {
>  			insn = malloc(sizeof(*insn));
>  			if (!insn) {
> @@ -288,8 +291,10 @@ static int decode_instructions(struct objtool_file *file)
>  				return -1;
>  			}
> 
> -			sym_for_each_insn(file, func, insn)
> +			sym_for_each_insn(file, func, insn) {
>  				insn->func = func;
> +				insn->fpu_safe = sec->fpu;
> +			}
>  		}
>  	}
> 
> diff --git a/tools/objtool/elf.h b/tools/objtool/elf.h
> index ebbb10c61e24..a743f2f28feb 100644
> --- a/tools/objtool/elf.h
> +++ b/tools/objtool/elf.h
> @@ -39,7 +39,7 @@ struct section {
>  	char *name;
>  	int idx;
>  	unsigned int len;
> -	bool changed, text, rodata;
> +	bool changed, text, rodata, fpu;
>  };
> 
>  struct symbol {
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection
  2020-04-17 20:27             ` Rodrigo Siqueira
@ 2020-04-17 21:56               ` Peter Zijlstra
  0 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2020-04-17 21:56 UTC (permalink / raw)
  To: Rodrigo Siqueira
  Cc: Christian König, David (ChunMing) Zhou, Josh Poimboeuf,
	Jann Horn, Leo Li, the arch/x86 maintainers, kernel list,
	amd-gfx, Ingo Molnar, Borislav Petkov, Arnaldo Carvalho de Melo,
	Andy Lutomirski, H. Peter Anvin, Alex Deucher, Thomas Gleixner,
	Harry Wentland, mhiramat

On Fri, Apr 17, 2020 at 04:27:28PM -0400, Rodrigo Siqueira wrote:
> I'm going to work on the FPU issues in the display code. With that in mind,
> could you say a bit more about the Makefile issues?

I think it might have been PEBKAC: I assumed allmodconfig would end up
selecting the driver, but it doesn't!

Adding "AMD GPU" to a defconfig seems to make it work:

$ make O=defconfig-build/ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o

Sorry about the confusion.

> Also, I applied the patch `[PATCH v4] x86: insn: Add insn_is_fpu()` and
> tried to reproduce the warning that you described but I didn't see it.
> Could you explain to me how I can check those warnings?

grab:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/fpu

Then build like above:

$ make O=defconfig-build/ drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o

And manually validate:

$ defconfig-build/tools/objtool/objtool check -Ffa defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o
defconfig-build/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o: warning: objtool: dcn_validate_bandwidth()+0x1b17: FPU instruction outside of kernel_fpu_{begin,end}()
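
For reference, the calling discipline that this warning enforces can be sketched in userspace C roughly as follows. This is a minimal sketch under stated assumptions, not kernel code: FPU_TEXT is a stand-in for the __fpu macro from the patch, fake_kernel_fpu_begin()/fake_kernel_fpu_end() are hypothetical stubs that only track nesting (the real kernel_fpu_begin()/kernel_fpu_end() protect the FPU register state), and scale_bandwidth()/validate_bandwidth() are made-up names. The floating-point work lives in a function placed in .text.fpu, while the integer-only caller brackets the call.

```c
#include <assert.h>

/* Hypothetical stand-ins: these only track nesting depth. */
static int fpu_depth;

static void fake_kernel_fpu_begin(void)
{
	fpu_depth++;
}

static void fake_kernel_fpu_end(void)
{
	assert(fpu_depth > 0);	/* unbalanced end() */
	fpu_depth--;
}

/* Stand-in for the patch's __fpu: put the function in .text.fpu on ELF. */
#if defined(__ELF__)
#define FPU_TEXT __attribute__((section(".text.fpu")))
#else
#define FPU_TEXT
#endif

/*
 * All floating-point arithmetic is confined to this function.
 * It takes and returns integers so the caller needs no FP code.
 */
static FPU_TEXT int scale_bandwidth(int mbps)
{
	assert(fpu_depth > 0);	/* must run inside a begin/end pair */
	return (int)(mbps * 1.5);
}

/* Integer-only entry point: the only FP use is the bracketed call. */
static int validate_bandwidth(int mbps)
{
	int scaled;

	fake_kernel_fpu_begin();
	scaled = scale_bandwidth(mbps);
	fake_kernel_fpu_end();

	return scaled;
}
```

Passing integers across the boundary is a deliberate choice in this sketch: if the caller converted a double itself after the end call, that conversion would be exactly the kind of stray FPU instruction the warning above reports.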



^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2020-04-17 21:56 UTC | newest]

Thread overview: 37+ messages
2020-04-02  2:34 AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Jann Horn
2020-04-02  7:33 ` Christian König
2020-04-02  7:56   ` Jann Horn
2020-04-02  9:36     ` Thomas Gleixner
2020-04-02 14:50       ` Jann Horn
2020-04-02 14:13   ` Peter Zijlstra
2020-04-03  5:28     ` Masami Hiramatsu
2020-04-03 11:21       ` Peter Zijlstra
2020-04-04  3:08         ` Masami Hiramatsu
2020-04-04  3:15           ` Randy Dunlap
2020-04-04  8:32             ` Masami Hiramatsu
2020-04-04 14:32           ` Peter Zijlstra
2020-04-05  3:19             ` Masami Hiramatsu
2020-04-06 10:21               ` Peter Zijlstra
2020-04-07  9:50                 ` Masami Hiramatsu
2020-04-07 11:15                   ` Peter Zijlstra
2020-04-07 15:41                     ` Masami Hiramatsu
2020-04-07 15:43                       ` [PATCH] x86: insn: Add insn_is_fpu() Masami Hiramatsu
2020-04-07 15:54                       ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
2020-04-08  0:31                         ` Masami Hiramatsu
2020-04-08 16:09                         ` [PATCH v2] x86: insn: Add insn_is_fpu() Masami Hiramatsu
2020-04-09 14:32                           ` Peter Zijlstra
2020-04-09 14:45                             ` Peter Zijlstra
2020-04-10  0:47                             ` Masami Hiramatsu
2020-04-10  1:22                             ` [PATCH v3] " Masami Hiramatsu
2020-04-15  8:23                               ` Masami Hiramatsu
2020-04-15  8:49                             ` [PATCH v4] " Masami Hiramatsu
2020-04-04 14:36           ` AMD DC graphics display code enables -mhard-float, -msse, -msse2 without any visible FPU state protection Peter Zijlstra
2020-04-05  3:37             ` Masami Hiramatsu
2020-04-09 15:59     ` Peter Zijlstra
2020-04-09 17:09       ` Peter Zijlstra
2020-04-09 18:15         ` Christian König
2020-04-09 20:01           ` Peter Zijlstra
2020-04-10 14:31             ` Christian König
2020-04-15  9:16               ` Peter Zijlstra
2020-04-17 20:27             ` Rodrigo Siqueira
2020-04-17 21:56               ` Peter Zijlstra
