From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: mingo@redhat.com, acme@kernel.org, tglx@linutronix.de,
	bp@alien8.de, x86@kernel.org, linux-kernel@vger.kernel.org,
	mark.rutland@arm.com, alexander.shishkin@linux.intel.com,
	jolsa@redhat.com, namhyung@kernel.org, yu-cheng.yu@intel.com,
	bigeasy@linutronix.de, gorcunov@gmail.com, hpa@zytor.com,
	alexey.budankov@linux.intel.com, eranian@google.com,
	ak@linux.intel.com, like.xu@linux.intel.com,
	yao.jin@linux.intel.com
Subject: Re: [PATCH 17/21] x86/fpu: Use proper mask to replace full instruction mask
Date: Mon, 22 Jun 2020 14:46:53 -0400	[thread overview]
Message-ID: <ca901df8-5765-9483-1898-a27efb5b87a2@linux.intel.com> (raw)
In-Reply-To: <5223f714-87eb-947e-e65c-886431cc7655@intel.com>



On 6/22/2020 2:05 PM, Dave Hansen wrote:
> On 6/22/20 10:47 AM, Liang, Kan wrote:
>>> I'm wondering if we should just take these copy_*regs_to_*() functions
>>> and uninline them.  Yeah, they are basically wrapping one instruction,
>>> but it might literally be the most heavyweight instruction in the
>>> whole ISA.
>> Thanks for the suggestions, but I'm not sure if I follow these methods.
>>
>> I don't think simply removing the "inline" keyword from the
>> copy_xregs_to_kernel() function would help here.
>> Do you mean exporting the copy_*regs_to_*() functions?
> The thing that worries me here is exporting "internal" FPU state like
> xfeatures_mask_all.  I'm much happier exporting a function with a much
> more defined purpose.
> 
> So, yes, I'm suggesting exporting the functions, *not* the data structures.
> 

I think we should just export copy_fpregs_to_fpstate() as below, because:
- KVM invokes this function directly; copy_xregs_to_kernel() is only
reached indirectly through it. We should export the function that other
modules actually call.
- copy_fpregs_to_fpstate() is a bigger function with many checks, so
uninlining it should not hurt performance.
- It is also a function, which is safer than exporting "internal" FPU
state: nothing outside the FPU core can change the state, intentionally
or unintentionally.


diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0388c792..d3724dc 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -411,43 +411,7 @@ static inline int copy_kernel_to_xregs_err(struct xregs_state *xstate, u64 mask)
  	return err;
  }

-/*
- * These must be called with preempt disabled. Returns
- * 'true' if the FPU state is still intact and we can
- * keep registers active.
- *
- * The legacy FNSAVE instruction cleared all FPU state
- * unconditionally, so registers are essentially destroyed.
- * Modern FPU state can be kept in registers, if there are
- * no pending FP exceptions.
- */
-static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
-{
-	if (likely(use_xsave())) {
-		copy_xregs_to_kernel(&fpu->state.xsave);
-
-		/*
-		 * AVX512 state is tracked here because its use is
-		 * known to slow the max clock speed of the core.
-		 */
-		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
-			fpu->avx512_timestamp = jiffies;
-		return 1;
-	}
-
-	if (likely(use_fxsr())) {
-		copy_fxregs_to_kernel(fpu);
-		return 1;
-	}
-
-	/*
-	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
-	 * so we have to mark them inactive:
-	 */
-	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
-
-	return 0;
-}
+extern int copy_fpregs_to_fpstate(struct fpu *fpu);

  static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate, u64 mask)
  {
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 06c8189..1bb7532 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -82,6 +82,45 @@ bool irq_fpu_usable(void)
  }
  EXPORT_SYMBOL(irq_fpu_usable);

+/*
+ * These must be called with preempt disabled. Returns
+ * 'true' if the FPU state is still intact and we can
+ * keep registers active.
+ *
+ * The legacy FNSAVE instruction cleared all FPU state
+ * unconditionally, so registers are essentially destroyed.
+ * Modern FPU state can be kept in registers, if there are
+ * no pending FP exceptions.
+ */
+int copy_fpregs_to_fpstate(struct fpu *fpu)
+{
+	if (likely(use_xsave())) {
+		copy_xregs_to_kernel(&fpu->state.xsave);
+
+		/*
+		 * AVX512 state is tracked here because its use is
+		 * known to slow the max clock speed of the core.
+		 */
+		if (fpu->state.xsave.header.xfeatures & XFEATURE_MASK_AVX512)
+			fpu->avx512_timestamp = jiffies;
+		return 1;
+	}
+
+	if (likely(use_fxsr())) {
+		copy_fxregs_to_kernel(fpu);
+		return 1;
+	}
+
+	/*
+	 * Legacy FPU register saving, FNSAVE always clears FPU registers,
+	 * so we have to mark them inactive:
+	 */
+	asm volatile("fnsave %[fp]; fwait" : [fp] "=m" (fpu->state.fsave));
+
+	return 0;
+}
+EXPORT_SYMBOL(copy_fpregs_to_fpstate);
+
  void kernel_fpu_begin(void)
  {
  	preempt_disable();
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 9c0541d..ca20029 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -58,7 +58,6 @@ static short xsave_cpuid_features[] __initdata = {
   * XSAVE buffer, both supervisor and user xstates.
   */
  u64 xfeatures_mask_all __read_mostly;
-EXPORT_SYMBOL_GPL(xfeatures_mask_all);

  static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 1] = -1};
  static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 1] = -1};
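
For reference, a module-side caller would then look something like the
sketch below. This is only a hypothetical example (example_save_fpu() is
made up for illustration; KVM's real call site looks different),
assuming the export above is in place:

#include <linux/preempt.h>
#include <asm/fpu/internal.h>

static void example_save_fpu(struct fpu *fpu)
{
	/* copy_fpregs_to_fpstate() must run with preemption disabled. */
	preempt_disable();
	if (!copy_fpregs_to_fpstate(fpu)) {
		/*
		 * A zero return means the legacy FNSAVE path ran and
		 * clobbered the registers, so mark them inactive, the
		 * same way switch_fpu_prepare() does.
		 */
		fpu->last_cpu = -1;
	}
	preempt_enable();
}

The point is that a module only ever sees the exported function;
xfeatures_mask_all stays private to the FPU core.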

Thread overview: 40+ messages
2020-06-19 14:03 [PATCH 00/21] Support Architectural LBR kan.liang
2020-06-19 14:03 ` [PATCH 01/21] x86/cpufeatures: Add Architectural LBRs feature bit kan.liang
2020-06-19 14:03 ` [PATCH 02/21] perf/x86/intel/lbr: Add pointers for LBR enable and disable kan.liang
2020-06-19 14:03 ` [PATCH 03/21] perf/x86/intel/lbr: Add pointer for LBR reset kan.liang
2020-06-19 14:03 ` [PATCH 04/21] perf/x86/intel/lbr: Add pointer for LBR read kan.liang
2020-06-19 14:03 ` [PATCH 05/21] perf/x86/intel/lbr: Add pointers for LBR save and restore kan.liang
2020-06-19 14:03 ` [PATCH 06/21] perf/x86/intel/lbr: Factor out a new struct for generic optimization kan.liang
2020-06-19 14:03 ` [PATCH 07/21] perf/x86/intel/lbr: Use dynamic data structure for task_ctx kan.liang
2020-06-19 14:03 ` [PATCH 08/21] x86/msr-index: Add bunch of MSRs for Arch LBR kan.liang
2020-06-19 19:11   ` Peter Zijlstra
2020-06-19 14:03 ` [PATCH 09/21] perf/x86: Expose CPUID enumeration bits for arch LBR kan.liang
2020-06-19 18:31   ` Peter Zijlstra
2020-06-19 14:03 ` [PATCH 10/21] perf/x86/intel: Check Arch LBR MSRs kan.liang
2020-06-19 14:03 ` [PATCH 11/21] perf/x86/intel/lbr: Support LBR_CTL kan.liang
2020-06-19 18:40   ` Peter Zijlstra
2020-06-19 19:15     ` Liang, Kan
2020-06-19 19:22       ` Peter Zijlstra
2020-06-19 14:04 ` [PATCH 12/21] perf/x86/intel/lbr: Support Architectural LBR kan.liang
2020-06-19 19:08   ` Peter Zijlstra
2020-06-19 19:40     ` Liang, Kan
2020-06-19 14:04 ` [PATCH 13/21] perf/core: Factor out functions to allocate/free the task_ctx_data kan.liang
2020-06-19 14:04 ` [PATCH 14/21] perf/core: Use kmem_cache to allocate the PMU specific data kan.liang
2020-06-19 14:04 ` [PATCH 15/21] perf/x86/intel/lbr: Create kmem_cache for the LBR context data kan.liang
2020-06-19 14:04 ` [PATCH 16/21] perf/x86: Remove task_ctx_size kan.liang
2020-06-19 14:04 ` [PATCH 17/21] x86/fpu: Use proper mask to replace full instruction mask kan.liang
2020-06-19 19:31   ` Peter Zijlstra
2020-06-22 14:52     ` Liang, Kan
2020-06-22 15:02       ` Dave Hansen
2020-06-22 17:47         ` Liang, Kan
2020-06-22 18:05           ` Dave Hansen
2020-06-22 18:46             ` Liang, Kan [this message]
2020-06-19 14:04 ` [PATCH 18/21] x86/fpu/xstate: Support dynamic supervisor feature for LBR kan.liang
2020-06-19 14:04 ` [PATCH 19/21] x86/fpu/xstate: Add helpers for LBR dynamic supervisor feature kan.liang
2020-06-19 14:04 ` [PATCH 20/21] perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch kan.liang
2020-06-19 19:41   ` Peter Zijlstra
2020-06-19 22:28     ` Liang, Kan
2020-06-19 14:04 ` [PATCH 21/21] perf/x86/intel/lbr: Support XSAVES for arch LBR read kan.liang
2020-06-22 18:49   ` Cyrill Gorcunov
2020-06-22 19:11     ` Liang, Kan
2020-06-22 19:31       ` Cyrill Gorcunov
