From: Dave Hansen <dave.hansen@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Andrew Cooper <andrew.cooper3@citrix.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Subject: Re: [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported
Date: Thu, 14 Apr 2022 10:24:45 -0700
Message-ID: <a93e6d3f-e8b9-2fab-1139-a8ba3dc4820b@intel.com>
In-Reply-To: <20220404104820.713066297@linutronix.de>

On 4/4/22 05:11, Thomas Gleixner wrote:
> A typical scenario is an active set of 0x202 (PKRU + SSE) out of the full
> supported set of 0x2FF. That means XSAVEC/S writes and XRSTOR[S] reads:

It might be worth reminding folks why PKRU is a special snowflake:

The default PKRU enforced by the kernel is its most restrictive possible
value (0xfffffffc).  This means that PKRU defaults to being in its
non-init state even for tasks which do nothing protection-keys-related.
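
To make the bit layout concrete, here is a small userspace sketch that
decodes that default value. It assumes the architectural PKRU layout (16
keys, two bits per key: access-disable in the low bit, write-disable in
the high bit) and an init state of 0. The macro and helper names are
illustrative only and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; each protection key owns two adjacent PKRU bits. */
#define PKRU_AD_BIT		0x1u	/* access disable */
#define PKRU_WD_BIT		0x2u	/* write disable */
#define PKRU_BITS_PER_PKEY	2

/* The kernel default quoted above, and the hardware init state. */
#define DEFAULT_PKRU		0xfffffffcu
#define PKRU_INIT_STATE		0x0u

/* True if all access through @pkey is disabled in @pkru. */
static bool pkru_access_disabled(uint32_t pkru, unsigned int pkey)
{
	return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) & PKRU_AD_BIT;
}

/* True if writes through @pkey are disabled in @pkru. */
static bool pkru_write_disabled(uint32_t pkru, unsigned int pkey)
{
	return (pkru >> (pkey * PKRU_BITS_PER_PKEY)) & PKRU_WD_BIT;
}

/* True if @pkru equals the init state, i.e. XINUSE would not flag it. */
static bool pkru_in_init_state(uint32_t pkru)
{
	return pkru == PKRU_INIT_STATE;
}
```

With DEFAULT_PKRU, key 0 is fully accessible and keys 1-15 have both
bits set, so the register differs from the init state of 0 and always
shows up as in use.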


> which is suboptimal. Prefetch works better when the access is linear. But
> what's worse is that PKRU can be located in a different page which
> obviously affects dTLB.

The numbers don't lie, but I'm still surprised by this.  Was this in a
VM that isn't backed with large pages?  task_struct.thread.fpu is
kmem_cache_alloc()'d and is in the direct map, which should be 2M/1G
pages almost all the time.

> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -86,6 +86,8 @@ static unsigned int xstate_flags[XFEATUR
>  #define XSTATE_FLAG_SUPERVISOR	BIT(0)
>  #define XSTATE_FLAG_ALIGNED64	BIT(1)
>  
> +DEFINE_STATIC_KEY_FALSE(__xsave_use_xgetbv1);
> +
>  /*
>   * Return whether the system supports a given xfeature.
>   *
> @@ -1481,7 +1483,7 @@ void xfd_validate_state(struct fpstate *
>  }
>  #endif /* CONFIG_X86_DEBUG_FPU */
>  
> -static int __init xfd_update_static_branch(void)
> +static int __init fpu_update_static_branches(void)
>  {
>  	/*
>  	 * If init_fpstate.xfd has bits set then dynamic features are
> @@ -1489,9 +1491,13 @@ static int __init xfd_update_static_bran
>  	 */
>  	if (init_fpstate.xfd)
>  		static_branch_enable(&__fpu_state_size_dynamic);
> +
> +	if (cpu_feature_enabled(X86_FEATURE_XGETBV1) &&
> +	    cpu_feature_enabled(X86_FEATURE_XCOMPACTED))
> +		static_branch_enable(&__xsave_use_xgetbv1);
>  	return 0;
>  }
> -arch_initcall(xfd_update_static_branch)
> +arch_initcall(fpu_update_static_branches)
>  
>  void fpstate_free(struct fpu *fpu)
>  {
> --- a/arch/x86/kernel/fpu/xstate.h
> +++ b/arch/x86/kernel/fpu/xstate.h
> @@ -10,7 +10,12 @@
>  DECLARE_PER_CPU(u64, xfd_state);
>  #endif
>  
> -static inline bool xsave_use_xgetbv1(void) { return false; }
> +DECLARE_STATIC_KEY_FALSE(__xsave_use_xgetbv1);
> +
> +static __always_inline __pure bool xsave_use_xgetbv1(void)
> +{
> +	return static_branch_likely(&__xsave_use_xgetbv1);
> +}
>  
>  static inline void __xstate_init_xcomp_bv(struct xregs_state *xsave, u64 mask)
>  {
> @@ -185,13 +190,18 @@ static inline int __xfd_enable_feature(u
>  static inline void os_xsave(struct fpstate *fpstate)
>  {
>  	u64 mask = fpstate->xfeatures;
> -	u32 lmask = mask;
> -	u32 hmask = mask >> 32;
> +	u32 lmask, hmask;
>  	int err;
>  
>  	WARN_ON_FPU(!alternatives_patched);
>  	xfd_validate_state(fpstate, mask, false);
>  
> +	if (xsave_use_xgetbv1())
> +		mask &= xgetbv(1);

How about this comment for the masking operation:

	/*
	 * Remove features in their init state from the mask.  This
	 * makes the XSAVE{S,C} writes less sparse and quicker for
	 * the CPU.
	 */
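
For anyone following along, the masking step can be modelled in
userspace roughly as below. This is only a sketch: it does not execute
XGETBV, the in-use bitmap is passed in as a plain argument, and the
struct and function names are made up for illustration. The 0x2FF and
0x202 values are the ones from the changelog.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the EDX:EAX halves handed to XSAVE{C,S}. */
struct xsave_masks {
	uint32_t lmask;		/* low 32 bits, goes into EAX */
	uint32_t hmask;		/* high 32 bits, goes into EDX */
};

/*
 * Model of the mask computation in os_xsave():
 * @requested:   features the fpstate wants saved (fpstate->xfeatures)
 * @inuse:       stand-in for the xgetbv(1) result
 * @use_xgetbv1: stand-in for the static branch
 */
static struct xsave_masks model_xsave_masks(uint64_t requested,
					    uint64_t inuse,
					    bool use_xgetbv1)
{
	uint64_t mask = requested;
	struct xsave_masks m;

	/* Drop features that are in their init state. */
	if (use_xgetbv1)
		mask &= inuse;

	m.lmask = (uint32_t)mask;
	m.hmask = (uint32_t)(mask >> 32);
	return m;
}
```

With requested = 0x2FF and inuse = 0x202, the model yields lmask =
0x202 when the optimization is on and 0x2FF when it is off.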

> +	lmask = mask;
> +	hmask = mask >> 32;
> +
>  	XSTATE_XSAVE(&fpstate->regs.xsave, lmask, hmask, err);
>  
>  	/* We should never fault when copying to a kernel buffer: */


Thread overview: 16+ messages
2022-04-04 12:11 [patch 0/3] x86/fpu/xsave: Add XSAVEC support and XGETBV1 utilization Thomas Gleixner
2022-04-04 12:11 ` [patch 1/3] x86/fpu/xsave: Support XSAVEC in the kernel Thomas Gleixner
2022-04-04 16:10   ` Andrew Cooper
2022-04-14 14:43   ` Dave Hansen
2022-04-25 13:11   ` [tip: x86/fpu] " tip-bot2 for Thomas Gleixner
2022-04-04 12:11 ` [patch 2/3] x86/fpu/xsave: Prepare for optimized compaction Thomas Gleixner
2022-04-14 15:46   ` Dave Hansen
2022-04-19 12:39     ` Thomas Gleixner
2022-04-19 13:33       ` Thomas Gleixner
2022-04-04 12:11 ` [patch 3/3] x86/fpu/xsave: Optimize XSAVEC/S when XGETBV1 is supported Thomas Gleixner
2022-04-14 17:24   ` Dave Hansen [this message]
2022-04-19 13:43     ` Thomas Gleixner
2022-04-19 21:22       ` Thomas Gleixner
2022-04-20 18:15         ` Tom Lendacky
2022-04-22 19:30           ` Thomas Gleixner
2022-04-23 15:20             ` Dave Hansen