All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] x86/fpu: Improve the init_fpstate setup code
@ 2022-08-24 19:12 Chang S. Bae
  2022-08-24 19:12 ` [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly Chang S. Bae
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Chang S. Bae @ 2022-08-24 19:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, chang.seok.bae

Hi all,

This set of patches fixes the init_fpstate code. So this first version is
sent to the maintainers hoping the fix is reviewable.

Thanks,
Chang

== Background ==

The init_fpstate is an XSAVE image that records init states during the boot
time. It is presumed to cover all the supported and enabled features. The
setup code has been recently optimized to capture legacy states only as all
of the other init states are all zeros.

== Problem with AMX state ==

When AMX is enabled, this buffer is too small to include AMX TILE_DATA
(8KB) as it is statically allocated with about a page. But, the buffer is
formatted to have them all although using the compacted format.

This also leads to a noisy splat with XRSTORS as it expects all the buffer
memory accessible. This is mentioned in Intel SDM Vol.1 13.13 Memory Access
By The XSAVE Feature Set:
    "An execution of an instruction in the XSAVE feature set may access any
     byte of any state component on which that execution operates."

== Other minor issues ==

The existing sanity check could help finding this issue as it checks
whether the allocated init_fpstate is enough for the expected size or not.
But what is currently measured is not matched -- the size without the AMX
state.

Also, these size and features are better to be configured first before
setting up the init image.

== Patchset ==

As AMX requires the compacted format, init_fpstate may exclude dynamic
states. The series also includes other improvments:
* Set up the init_fpstate buffer after its scope is clarified.
* Fix the size that is validated against the static allocation.

Chang S. Bae (3):
  x86/fpu: Configure init_fpstate attributes orderly
  x86/fpu: Fix the init_fpstate size check with the actual size
  x86/fpu: Exclude dynamic states from init_fpstate

 arch/x86/kernel/fpu/init.c   |  8 --------
 arch/x86/kernel/fpu/xstate.c | 33 ++++++++++++++-------------------
 2 files changed, 14 insertions(+), 27 deletions(-)


base-commit: cf90f46223eef9d5f389b4b88ee2fc7914458b06
-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly
  2022-08-24 19:12 [PATCH 0/3] x86/fpu: Improve the init_fpstate setup code Chang S. Bae
@ 2022-08-24 19:12 ` Chang S. Bae
  2022-10-03 21:18   ` Neel Natu
  2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
  2022-08-24 19:12 ` [PATCH 2/3] x86/fpu: Fix the init_fpstate size check with the actual size Chang S. Bae
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 19+ messages in thread
From: Chang S. Bae @ 2022-08-24 19:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, chang.seok.bae

The init_fpstate setup code is spread out and out of order. The init image
is recorded before its scoped features and the buffer size are determined.

Determine the scope of init_fpstate components and its size before
recording the init state. Also move the relevant code together.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/fpu/init.c   | 8 --------
 arch/x86/kernel/fpu/xstate.c | 6 +++++-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 621f4b6cac4a..8946f89761cc 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -210,13 +210,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpstate_reset(&current->thread.fpu);
 }
 
-static void __init fpu__init_init_fpstate(void)
-{
-	/* Bring init_fpstate size and features up to date */
-	init_fpstate.size		= fpu_kernel_cfg.max_size;
-	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
-}
-
 /*
  * Called on the boot CPU once per system bootup, to set up the initial
  * FPU state that is later cloned into all processes:
@@ -236,5 +229,4 @@ void __init fpu__init_system(struct cpuinfo_x86 *c)
 	fpu__init_system_xstate_size_legacy();
 	fpu__init_system_xstate(fpu_kernel_cfg.max_size);
 	fpu__init_task_struct_size();
-	fpu__init_init_fpstate();
 }
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..f0ce10620ab0 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -360,7 +360,7 @@ static void __init setup_init_fpu_buf(void)
 
 	print_xstate_features();
 
-	xstate_init_xcomp_bv(&init_fpstate.regs.xsave, fpu_kernel_cfg.max_features);
+	xstate_init_xcomp_bv(&init_fpstate.regs.xsave, init_fpstate.xfeatures);
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
@@ -875,6 +875,10 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	update_regset_xstate_info(fpu_user_cfg.max_size,
 				  fpu_user_cfg.max_features);
 
+	/* Bring init_fpstate size and features up to date */
+	init_fpstate.size		= fpu_kernel_cfg.max_size;
+	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
+
 	setup_init_fpu_buf();
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] x86/fpu: Fix the init_fpstate size check with the actual size
  2022-08-24 19:12 [PATCH 0/3] x86/fpu: Improve the init_fpstate setup code Chang S. Bae
  2022-08-24 19:12 ` [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly Chang S. Bae
@ 2022-08-24 19:12 ` Chang S. Bae
  2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
  2022-08-24 19:12 ` [PATCH 3/3] x86/fpu: Exclude dynamic states from init_fpstate Chang S. Bae
  2022-10-18 22:13 ` [PATCH 0/1] x86/fpu: Follow up on the init_fpstate fix Chang S. Bae
  3 siblings, 1 reply; 19+ messages in thread
From: Chang S. Bae @ 2022-08-24 19:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, chang.seok.bae

The init_fpstate buffer is statically allocated. Thus, the sanity test was
established to check whether the pre-allocated buffer is enough for the
calculated size or not.

The currently measured size is not strictly relevant. Fix to validate the
calculated init_fpstate size with the pre-allocated area.

Also, replace the sanity check function with open code for clarity. The
abstraction itself and the function naming do not tend to represent simply
what it does.

Fixes: 2ae996e0c1a3 ("x86/fpu: Calculate the default sizes independently")
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/fpu/xstate.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f0ce10620ab0..f5ef78633b4c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -678,20 +678,6 @@ static unsigned int __init get_xsave_size_user(void)
 	return ebx;
 }
 
-/*
- * Will the runtime-enumerated 'xstate_size' fit in the init
- * task's statically-allocated buffer?
- */
-static bool __init is_supported_xstate_size(unsigned int test_xstate_size)
-{
-	if (test_xstate_size <= sizeof(init_fpstate.regs))
-		return true;
-
-	pr_warn("x86/fpu: xstate buffer too small (%zu < %d), disabling xsave\n",
-			sizeof(init_fpstate.regs), test_xstate_size);
-	return false;
-}
-
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
@@ -717,10 +703,6 @@ static int __init init_xstate_size(void)
 	kernel_default_size =
 		xstate_calculate_size(fpu_kernel_cfg.default_features, compacted);
 
-	/* Ensure we have the space to store all default enabled features. */
-	if (!is_supported_xstate_size(kernel_default_size))
-		return -EINVAL;
-
 	if (!paranoid_xstate_size_valid(kernel_size))
 		return -EINVAL;
 
@@ -879,6 +861,12 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	init_fpstate.size		= fpu_kernel_cfg.max_size;
 	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
 
+	if (init_fpstate.size > sizeof(init_fpstate.regs)) {
+		pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
+			sizeof(init_fpstate.regs), init_fpstate.size);
+		goto out_disable;
+	}
+
 	setup_init_fpu_buf();
 
 	/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] x86/fpu: Exclude dynamic states from init_fpstate
  2022-08-24 19:12 [PATCH 0/3] x86/fpu: Improve the init_fpstate setup code Chang S. Bae
  2022-08-24 19:12 ` [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly Chang S. Bae
  2022-08-24 19:12 ` [PATCH 2/3] x86/fpu: Fix the init_fpstate size check with the actual size Chang S. Bae
@ 2022-08-24 19:12 ` Chang S. Bae
  2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
  2022-10-18 22:13 ` [PATCH 0/1] x86/fpu: Follow up on the init_fpstate fix Chang S. Bae
  3 siblings, 1 reply; 19+ messages in thread
From: Chang S. Bae @ 2022-08-24 19:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: x86, chang.seok.bae

== Background ==

The XSTATE init code initializes all enabled and supported components.
Then, the init states are saved in the init_fpstate buffer that is
statically allocated in about one page.

The AMX TILE_DATA state is large (8KB) but its init state is zero. And the
feature comes only with the compacted format with these established
dependencies: AMX->XFD->XSAVES. So this state is excludable from
init_fpstate.

== Problem ==

But the buffer is formatted to include that large state. Then, this can be
the cause of a noisy splat like the below.

This came from XRSTORS for the task with init_fpstate in its XSAVE buffer.
It is reproducible on AMX systems when the running kernel is built with
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y:

[   30.583122] ------------[ cut here ]------------
[   30.586625] Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers.
[   30.586676] WARNING: CPU: 130 PID: 1689 at arch/x86/mm/extable.c:74 fixup_exception+0x2c1/0x2f0
[   30.602091] CPU: 130 PID: 1689 Comm: probe-bcache Not tainted 5.19.0-various+ #1077
[   30.610381] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.7314.D09.2202231344 02/23/2022
[   30.618850] RIP: 0010:fixup_exception+0x2c1/0x2f0
[   30.618859] Code: bd fe ff ff e9 1c ff ff ff 0f 0b 48 c7 c2 90 05 8c 92 e9 32 ff ff ff 48 c7 c7 20 a7 fe 91 c6 05 4a dc 99 01 01 e8 4f de 01 00 <0f> 0b eb ab 0f 0b 48 c7 c2 90 05 8c 92 e9 16 fe ff ff 31 f6 4c 89
[   30.636326] RSP: 0018:ff36ae41e4af7ca8 EFLAGS: 00010082
[   30.636331] RAX: 0000000000000000 RBX: ffffffff921c6138 RCX: 0000000000000001
[   30.636334] RDX: 0000000080000001 RSI: 00000000ffff7fff RDI: 00000000ffffffff
[   30.636337] RBP: ff36ae41e4af7cc8 R08: 0000000000000000 R09: c0000000ffff7fff
[   30.636339] R10: 0000000000000000 R11: ff36ae41e4af7af8 R12: ff36ae41e4af7dc8
[   30.636342] R13: 000000000000000e R14: 0000000000000000 R15: 0000000000000001
[   30.636344] FS:  0000000000000000(0000) GS:ff19681e5fd80000(0000) knlGS:0000000000000000
[   30.636348] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   30.645197] CR2: ff196816897300bf CR3: 0000001083388006 CR4: 0000000000771ee0
[   30.645201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   30.689689] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   30.694040] PKRU: 55555554
[   30.698347] Call Trace:
[   30.702615]  <TASK>
[   30.706829]  kernelmode_fixup_or_oops+0x49/0x120
[   30.711098]  __bad_area_nosemaphore+0x15a/0x200
[   30.715346]  bad_area_nosemaphore+0x16/0x20
[   30.719610]  do_kern_addr_fault+0x43/0xa0
[   30.723810]  exc_page_fault+0xdd/0x180
[   30.727968]  asm_exc_page_fault+0x27/0x30
[   30.732088] RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0
[   30.736240] Code: 4c 48 23 1d 43 7d 61 01 4c 89 e7 ba 01 00 00 00 48 89 de e8 0b 34 00 00 48 89 da 49 8d 7c 24 40 89 d8 48 c1 ea 20 48 0f c7 1f <48> 83 c4 08 5b 41 5c 5d c3 cc cc cc cc 48 8b 45 e8 48 0f ae 48 40
[   30.744890] RSP: 0018:ff36ae41e4af7e78 EFLAGS: 00010046
[   30.749209] RAX: 00000000000604e7 RBX: 00000000000604e7 RCX: 0000000000040000
[   30.753572] RDX: 0000000000000000 RSI: ffffffff920d70ad RDI: ff1968168972d6c0
[   30.757937] RBP: ff36ae41e4af7e90 R08: ff196816861bdd80 R09: 0000000000020009
[   30.762350] R10: 8080808080808080 R11: fefefefefefefeff R12: ff1968168972d680
[   30.766717] R13: 0000000000000082 R14: 0000000000000000 R15: 0000000000000000
[   30.771026]  ? restore_fpregs_from_fpstate+0x45/0xd0
[   30.775318]  switch_fpu_return+0x4e/0xe0
[   30.779599]  exit_to_user_mode_prepare+0x17b/0x1b0
[   30.783892]  syscall_exit_to_user_mode+0x29/0x40
[   30.788170]  do_syscall_64+0x67/0x80
[   30.792354]  ? do_syscall_64+0x67/0x80
[   30.796451]  ? exc_page_fault+0x86/0x180
[   30.800428]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   30.804317] RIP: 0033:0x7fdce1d9a2b0
[   30.808108] Code: Unable to access opcode bytes at RIP 0x7fdce1d9a286.
[   30.811926] RSP: 002b:00007ffe7368b950 EFLAGS: 00000200 ORIG_RAX: 000000000000003b
[   30.811931] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   30.811932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[   30.811934] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[   30.811936] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[   30.819620] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   30.819625]  </TASK>
[   30.819627] ---[ end trace 0000000000000000 ]---

== Solution ==

Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate
can still initialize those states when their bits are set in the
requested-feature bitmap.

Reported-by: Lin X Wang <lin.x.wang@intel.com>
Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Tested-by: Lin X Wang <lin.x.wang@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/fpu/xstate.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f5ef78633b4c..e77cabfa802f 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -857,9 +857,12 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	update_regset_xstate_info(fpu_user_cfg.max_size,
 				  fpu_user_cfg.max_features);
 
-	/* Bring init_fpstate size and features up to date */
-	init_fpstate.size		= fpu_kernel_cfg.max_size;
-	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
+	/*
+	 * init_fpstate excludes dynamic states as they are large but init
+	 * state is zero.
+	 */
+	init_fpstate.size		= fpu_kernel_cfg.default_size;
+	init_fpstate.xfeatures		= fpu_kernel_cfg.default_features;
 
 	if (init_fpstate.size > sizeof(init_fpstate.regs)) {
 		pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly
  2022-08-24 19:12 ` [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly Chang S. Bae
@ 2022-10-03 21:18   ` Neel Natu
  2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 19+ messages in thread
From: Neel Natu @ 2022-10-03 21:18 UTC (permalink / raw)
  To: chang.seok.bae; +Cc: linux-kernel, x86, neelnatu

Acked-by: neelnatu@google.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [tip: x86/urgent] x86/fpu: Exclude dynamic states from init_fpstate
  2022-08-24 19:12 ` [PATCH 3/3] x86/fpu: Exclude dynamic states from init_fpstate Chang S. Bae
@ 2022-10-17 13:47   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2022-10-17 13:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Lin X Wang, Chang S. Bae, Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     a401f45e38754953c9d402f8b3bc965707eecc91
Gitweb:        https://git.kernel.org/tip/a401f45e38754953c9d402f8b3bc965707eecc91
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Wed, 24 Aug 2022 12:12:23 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 17 Oct 2022 15:44:25 +02:00

x86/fpu: Exclude dynamic states from init_fpstate

== Background ==

The XSTATE init code initializes all enabled and supported components.
Then, the init states are saved in the init_fpstate buffer that is
statically allocated in about one page.

The AMX TILE_DATA state is large (8KB) but its init state is zero. And the
feature comes only with the compacted format with these established
dependencies: AMX->XFD->XSAVES. So this state is excludable from
init_fpstate.

== Problem ==

But the buffer is formatted to include that large state. Then, this can be
the cause of a noisy splat like the below.

This came from XRSTORS for the task with init_fpstate in its XSAVE buffer.
It is reproducible on AMX systems when the running kernel is built with
CONFIG_DEBUG_PAGEALLOC=y and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y:

 Bad FPU state detected at restore_fpregs_from_fpstate+0x57/0xd0, reinitializing FPU registers.
 ...
 RIP: 0010:restore_fpregs_from_fpstate+0x57/0xd0
  ? restore_fpregs_from_fpstate+0x45/0xd0
  switch_fpu_return+0x4e/0xe0
  exit_to_user_mode_prepare+0x17b/0x1b0
  syscall_exit_to_user_mode+0x29/0x40
  do_syscall_64+0x67/0x80
  ? do_syscall_64+0x67/0x80
  ? exc_page_fault+0x86/0x180
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

== Solution ==

Adjust init_fpstate to exclude dynamic states. XRSTORS from init_fpstate
still initializes those states when their bits are set in the
requested-feature bitmap.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Lin X Wang <lin.x.wang@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lin X Wang <lin.x.wang@intel.com>
Link: https://lore.kernel.org/r/20220824191223.1248-4-chang.seok.bae@intel.com

---
 arch/x86/kernel/fpu/xstate.c |  9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f5ef786..e77cabf 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -857,9 +857,12 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	update_regset_xstate_info(fpu_user_cfg.max_size,
 				  fpu_user_cfg.max_features);
 
-	/* Bring init_fpstate size and features up to date */
-	init_fpstate.size		= fpu_kernel_cfg.max_size;
-	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
+	/*
+	 * init_fpstate excludes dynamic states as they are large but init
+	 * state is zero.
+	 */
+	init_fpstate.size		= fpu_kernel_cfg.default_size;
+	init_fpstate.xfeatures		= fpu_kernel_cfg.default_features;
 
 	if (init_fpstate.size > sizeof(init_fpstate.regs)) {
 		pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip: x86/urgent] x86/fpu: Fix the init_fpstate size check with the actual size
  2022-08-24 19:12 ` [PATCH 2/3] x86/fpu: Fix the init_fpstate size check with the actual size Chang S. Bae
@ 2022-10-17 13:47   ` tip-bot2 for Chang S. Bae
  0 siblings, 0 replies; 19+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2022-10-17 13:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Chang S. Bae, Thomas Gleixner, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     d3e021adac7c51a26d9ede167c789fcc1b878467
Gitweb:        https://git.kernel.org/tip/d3e021adac7c51a26d9ede167c789fcc1b878467
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Wed, 24 Aug 2022 12:12:22 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 17 Oct 2022 15:44:25 +02:00

x86/fpu: Fix the init_fpstate size check with the actual size

The init_fpstate buffer is statically allocated. Thus, the sanity test was
established to check whether the pre-allocated buffer is enough for the
calculated size or not.

The currently measured size is not strictly relevant. Fix to validate the
calculated init_fpstate size with the pre-allocated area.

Also, replace the sanity check function with open code for clarity. The
abstraction itself and the function naming do not tend to represent simply
what it does.

Fixes: 2ae996e0c1a3 ("x86/fpu: Calculate the default sizes independently")
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220824191223.1248-3-chang.seok.bae@intel.com

---
 arch/x86/kernel/fpu/xstate.c | 24 ++++++------------------
 1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f0ce106..f5ef786 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -678,20 +678,6 @@ static unsigned int __init get_xsave_size_user(void)
 	return ebx;
 }
 
-/*
- * Will the runtime-enumerated 'xstate_size' fit in the init
- * task's statically-allocated buffer?
- */
-static bool __init is_supported_xstate_size(unsigned int test_xstate_size)
-{
-	if (test_xstate_size <= sizeof(init_fpstate.regs))
-		return true;
-
-	pr_warn("x86/fpu: xstate buffer too small (%zu < %d), disabling xsave\n",
-			sizeof(init_fpstate.regs), test_xstate_size);
-	return false;
-}
-
 static int __init init_xstate_size(void)
 {
 	/* Recompute the context size for enabled features: */
@@ -717,10 +703,6 @@ static int __init init_xstate_size(void)
 	kernel_default_size =
 		xstate_calculate_size(fpu_kernel_cfg.default_features, compacted);
 
-	/* Ensure we have the space to store all default enabled features. */
-	if (!is_supported_xstate_size(kernel_default_size))
-		return -EINVAL;
-
 	if (!paranoid_xstate_size_valid(kernel_size))
 		return -EINVAL;
 
@@ -879,6 +861,12 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	init_fpstate.size		= fpu_kernel_cfg.max_size;
 	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
 
+	if (init_fpstate.size > sizeof(init_fpstate.regs)) {
+		pr_warn("x86/fpu: init_fpstate buffer too small (%zu < %d), disabling XSAVE\n",
+			sizeof(init_fpstate.regs), init_fpstate.size);
+		goto out_disable;
+	}
+
 	setup_init_fpu_buf();
 
 	/*

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip: x86/urgent] x86/fpu: Configure init_fpstate attributes orderly
  2022-08-24 19:12 ` [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly Chang S. Bae
  2022-10-03 21:18   ` Neel Natu
@ 2022-10-17 13:47   ` tip-bot2 for Chang S. Bae
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2022-10-17 13:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chang S. Bae, Thomas Gleixner, neelnatu, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     c32d7cab57e3a77af8ecc17cde7a5761a26483b8
Gitweb:        https://git.kernel.org/tip/c32d7cab57e3a77af8ecc17cde7a5761a26483b8
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Wed, 24 Aug 2022 12:12:21 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Mon, 17 Oct 2022 15:44:25 +02:00

x86/fpu: Configure init_fpstate attributes orderly

The init_fpstate setup code is spread out and out of order. The init image
is recorded before its scoped features and the buffer size are determined.

Determine the scope of init_fpstate components and its size before
recording the init state. Also move the relevant code together.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: neelnatu@google.com
Link: https://lore.kernel.org/r/20220824191223.1248-2-chang.seok.bae@intel.com

---
 arch/x86/kernel/fpu/init.c   | 8 --------
 arch/x86/kernel/fpu/xstate.c | 6 +++++-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 621f4b6..8946f89 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -210,13 +210,6 @@ static void __init fpu__init_system_xstate_size_legacy(void)
 	fpstate_reset(&current->thread.fpu);
 }
 
-static void __init fpu__init_init_fpstate(void)
-{
-	/* Bring init_fpstate size and features up to date */
-	init_fpstate.size		= fpu_kernel_cfg.max_size;
-	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
-}
-
 /*
  * Called on the boot CPU once per system bootup, to set up the initial
  * FPU state that is later cloned into all processes:
@@ -236,5 +229,4 @@ void __init fpu__init_system(struct cpuinfo_x86 *c)
 	fpu__init_system_xstate_size_legacy();
 	fpu__init_system_xstate(fpu_kernel_cfg.max_size);
 	fpu__init_task_struct_size();
-	fpu__init_init_fpstate();
 }
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c834015..f0ce106 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -360,7 +360,7 @@ static void __init setup_init_fpu_buf(void)
 
 	print_xstate_features();
 
-	xstate_init_xcomp_bv(&init_fpstate.regs.xsave, fpu_kernel_cfg.max_features);
+	xstate_init_xcomp_bv(&init_fpstate.regs.xsave, init_fpstate.xfeatures);
 
 	/*
 	 * Init all the features state with header.xfeatures being 0x0
@@ -875,6 +875,10 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
 	update_regset_xstate_info(fpu_user_cfg.max_size,
 				  fpu_user_cfg.max_features);
 
+	/* Bring init_fpstate size and features up to date */
+	init_fpstate.size		= fpu_kernel_cfg.max_size;
+	init_fpstate.xfeatures		= fpu_kernel_cfg.max_features;
+
 	setup_init_fpu_buf();
 
 	/*

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 0/1] x86/fpu: Follow up on the init_fpstate fix
  2022-08-24 19:12 [PATCH 0/3] x86/fpu: Improve the init_fpstate setup code Chang S. Bae
                   ` (2 preceding siblings ...)
  2022-08-24 19:12 ` [PATCH 3/3] x86/fpu: Exclude dynamic states from init_fpstate Chang S. Bae
@ 2022-10-18 22:13 ` Chang S. Bae
  2022-10-18 22:13   ` [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly Chang S. Bae
  2023-02-27 21:05   ` [PATCH 0/2] x86/fpu/xstate: Follow up on the init_fpstate fallout again Chang S. Bae
  3 siblings, 2 replies; 19+ messages in thread
From: Chang S. Bae @ 2022-10-18 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, hpa, yuan.yao, chang.seok.bae

In short, the init_fpstate fix currently at tip's x86/urgent [6] is
incomplete for KVM. This patch follows up on that.

---

While these patches [1][2] attempt to fix the init fpstate, it results in a
fallout with KVM [3]:

The VCPU's XSAVE buffer is expanded at the time of VCPU setup [4] when any
dynamic feature is determined to be exposed via KVM_SET_CPUID.

Then, when the VCPU thread is executed [5], the xstate is copied to the
userspace via KVM_GET_XSAVE*. At the moment, the entire guest fpstate is in
init state.

But, init_fpstate has been incorrectly indicating the inclusion of
dynamic states. Subsequently, in the xstate copy loop, memory beyond the
init_fpstate looks to be referenced for the init states of dynamic
features.

The tip-merged series [1] fixes init_fpstate only. The xstate copy loop
remains to retrieve init_fpstate. Then, it will explode when a
dynamic-featured VCPU is about executed.

The included patch is to fix the copy function. It is based on the
x86/urgent branch [6].

Thanks,
Chang

[1] https://lore.kernel.org/lkml/20220824191223.1248-1-chang.seok.bae@intel.com/
[2] https://lore.kernel.org/lkml/20221011222425.866137-1-dave.hansen@linux.intel.com/
[3] https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
[4] https://gitlab.com/qemu-project/qemu/-/blob/master/accel/kvm/kvm-accel-ops.c#L42
[5] https://gitlab.com/qemu-project/qemu/-/blob/master/accel/kvm/kvm-accel-ops.c#L51
[6] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/urgent

Chang S. Bae (1):
  x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly

 arch/x86/kernel/fpu/xstate.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)


base-commit: 67bf6493449b09590f9f71d7df29efb392b12d25
-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-18 22:13 ` [PATCH 0/1] x86/fpu: Follow up on the init_fpstate fix Chang S. Bae
@ 2022-10-18 22:13   ` Chang S. Bae
  2022-10-20 16:57     ` Dave Hansen
  2022-10-21 18:58     ` [PATCH v2] " Chang S. Bae
  2023-02-27 21:05   ` [PATCH 0/2] x86/fpu/xstate: Follow up on the init_fpstate fallout again Chang S. Bae
  1 sibling, 2 replies; 19+ messages in thread
From: Chang S. Bae @ 2022-10-18 22:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, hpa, yuan.yao, chang.seok.bae

When an extended state component is present in fpstate, but in init state,
the function copies from init_fpstate via copy_feature().

But, dynamic states are not present in init_fpstate. Then accessing
init_fpstate for those will explode like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Instead of referencing init_fpstate, simply zero out the userspace buffer
for the state component in an all-zeros init state.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Yuan Yao <yuan.yao@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Yuan Yao <yuan.yao@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
---
 arch/x86/kernel/fpu/xstate.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e77cabfa802f..efa9e3a269fc 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1141,10 +1141,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 			 */
 			pkru.pkru = pkru_val;
 			membuf_write(&to, &pkru, sizeof(pkru));
+		} else if (!(header.xfeatures & BIT_ULL(i))) {
+			/*
+			 * Every extended state component has an all zeros
+			 * init state.
+			 */
+			membuf_zero(&to, xstate_sizes[i]);
 		} else {
-			copy_feature(header.xfeatures & BIT_ULL(i), &to,
-				     __raw_xsave_addr(xsave, i),
-				     __raw_xsave_addr(xinit, i),
+			membuf_write(&to, __raw_xsave_addr(xsave, i),
 				     xstate_sizes[i]);
 		}
 		/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-18 22:13   ` [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly Chang S. Bae
@ 2022-10-20 16:57     ` Dave Hansen
  2022-10-20 18:52       ` Chang S. Bae
  2022-10-21 18:58     ` [PATCH v2] " Chang S. Bae
  1 sibling, 1 reply; 19+ messages in thread
From: Dave Hansen @ 2022-10-20 16:57 UTC (permalink / raw)
  To: Chang S. Bae, linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, hpa, yuan.yao

On 10/18/22 15:13, Chang S. Bae wrote:
> @@ -1141,10 +1141,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
>  			 */
>  			pkru.pkru = pkru_val;
>  			membuf_write(&to, &pkru, sizeof(pkru));
> +		} else if (!(header.xfeatures & BIT_ULL(i))) {
> +			/*
> +			 * Every extended state component has an all zeros
> +			 * init state.
> +			 */
> +			membuf_zero(&to, xstate_sizes[i]);
>  		} else {
> -			copy_feature(header.xfeatures & BIT_ULL(i), &to,
> -				     __raw_xsave_addr(xsave, i),
> -				     __raw_xsave_addr(xinit, i),
> +			membuf_write(&to, __raw_xsave_addr(xsave, i),
>  				     xstate_sizes[i]);
>  		}

Just to add a bit more context, this is inside this loop:

        mask = fpstate->user_xfeatures;
        for_each_extended_xfeature(i, mask) {
                if (zerofrom < xstate_offsets[i])
                        membuf_zero(&to, xstate_offsets[i] - zerofrom);
		...
	}
        if (to.left)
                membuf_zero(&to, to.left);

In other words, the loop and the surrounding code already know how to
membuf_zero() any gaps in the middle or the end of the user buffer.
Would it be simpler to just adjust the 'mask' over which the loop iterates?

I think that would end up being something like:

	 mask = fpstate->user_xfeatures &
		(xsave->xfeatures | xinit->xfeatures);

Logically, that makes sense too.  We're copying out of either 'xsave' or
'xinit'.  If a feature isn't in either one of those we can't do the
copy_feature() on it.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-20 16:57     ` Dave Hansen
@ 2022-10-20 18:52       ` Chang S. Bae
  0 siblings, 0 replies; 19+ messages in thread
From: Chang S. Bae @ 2022-10-20 18:52 UTC (permalink / raw)
  To: Dave Hansen, linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, hpa, yuan.yao

On 10/20/2022 9:57 AM, Dave Hansen wrote:
> On 10/18/22 15:13, Chang S. Bae wrote:
>> @@ -1141,10 +1141,14 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
>>   			 */
>>   			pkru.pkru = pkru_val;
>>   			membuf_write(&to, &pkru, sizeof(pkru));
>> +		} else if (!(header.xfeatures & BIT_ULL(i))) {
>> +			/*
>> +			 * Every extended state component has an all zeros
>> +			 * init state.
>> +			 */
>> +			membuf_zero(&to, xstate_sizes[i]);
>>   		} else {
>> -			copy_feature(header.xfeatures & BIT_ULL(i), &to,
>> -				     __raw_xsave_addr(xsave, i),
>> -				     __raw_xsave_addr(xinit, i),
>> +			membuf_write(&to, __raw_xsave_addr(xsave, i),
>>   				     xstate_sizes[i]);
>>   		}
> 
> Just to add a bit more context, this is inside this loop:
> 
>          mask = fpstate->user_xfeatures;
>          for_each_extended_xfeature(i, mask) {
>                  if (zerofrom < xstate_offsets[i])
>                          membuf_zero(&to, xstate_offsets[i] - zerofrom);
> 		...
> 	}
>          if (to.left)
>                  membuf_zero(&to, to.left);
> 
> In other words, the loop and the surrounding code already know how to
> membuf_zero() any gaps in the middle or the end of the user buffer.
> Would it be simpler to just adjust the 'mask' over which the loop iterates?

Yeah, right!

> I think that would end up being something like:
> 
> 	 mask = fpstate->user_xfeatures &
> 		(xsave->xfeatures | xinit->xfeatures);
> 
> Logically, that makes sense too.  We're copying out of either 'xsave' or
> 'xinit'.  If a feature isn't in either one of those we can't do the
> copy_feature() on it.

Yes, it is. But, one tricky part here is xinit->xstate_bv is zero. 
Instead, xinit->xcomp_bv appears to be relevant. Also, we want this for 
dynamic features that rely on XSAVES. Then, the change can be something 
like this:

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e77cabfa802f..3f3286d7e1a8 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1125,6 +1125,15 @@ void __copy_xstate_to_uabi_buf(struct membuf to, 
struct fpstate *fpstate,
          */
         mask = fpstate->user_xfeatures;

+       /*
+        * Dynamic features are not present in init_fpstate since they have
+        * an all zeros init state. When they are in init state, instead of
+        * retrieving them from init_fpstate, remove those from 'mask' to
+        * zero the user buffer.
+        */
+       if (fpu_state_size_dynamic())
+               mask &= (header.xfeatures | xinit->header.xcomp_bv);

Thanks,
Chang

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v2] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-18 22:13   ` [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly Chang S. Bae
  2022-10-20 16:57     ` Dave Hansen
@ 2022-10-21 18:58     ` Chang S. Bae
  2022-10-21 22:25       ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
  1 sibling, 1 reply; 19+ messages in thread
From: Chang S. Bae @ 2022-10-21 18:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, hpa, yuan.yao, chang.seok.bae

When an extended state component is not present in fpstate, but in init
state, the function copies from init_fpstate via copy_feature().

But, dynamic states are not present in init_fpstate because of all-zeros
init states. Then retrieving them from init_fpstate will explode like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Adjust the 'mask' to zero out the userspace buffer for the features that
are not available both from fpstate and from init_fpstate.

The dynamic features depend on the compacted XSAVE format. Ensure it is
enabled before reading XCOMP_BV in init_fpstate.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Yuan Yao <yuan.yao@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Yuan Yao <yuan.yao@intel.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
---
Change from v1:
* Adjust the 'mask' instead of the loop iteration code (Dave Hansen).

The context of this along with the init_fpstate fix was described in the v1
cover letter:
  https://lore.kernel.org/lkml/20221018221349.4196-1-chang.seok.bae@intel.com/
---
 arch/x86/kernel/fpu/xstate.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e77cabfa802f..59e543b95a3c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1125,6 +1125,15 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 	 */
 	mask = fpstate->user_xfeatures;
 
+	/*
+	 * Dynamic features are not present in init_fpstate. When they are
+	 * in an all zeros init state, remove those from 'mask' to zero
+	 * those features in the user buffer instead of retrieving them
+	 * from init_fpstate.
+	 */
+	if (fpu_state_size_dynamic())
+		mask &= (header.xfeatures | xinit->header.xcomp_bv);
+
 	for_each_extended_xfeature(i, mask) {
 		/*
 		 * If there was a feature or alignment gap, zero the space
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [tip: x86/urgent] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-21 18:58     ` [PATCH v2] " Chang S. Bae
@ 2022-10-21 22:25       ` tip-bot2 for Chang S. Bae
  2022-10-22  6:23         ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: tip-bot2 for Chang S. Bae @ 2022-10-21 22:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yuan Yao, Dave Hansen, Chang S. Bae, Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     471f0aa7fa64e23766a1473b32d9ec3f0718895a
Gitweb:        https://git.kernel.org/tip/471f0aa7fa64e23766a1473b32d9ec3f0718895a
Author:        Chang S. Bae <chang.seok.bae@intel.com>
AuthorDate:    Fri, 21 Oct 2022 11:58:44 -07:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Fri, 21 Oct 2022 15:22:09 -07:00

x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly

When an extended state component is not present in fpstate, but in init
state, the function copies from init_fpstate via copy_feature().

But, dynamic states are not present in init_fpstate because of all-zeros
init states. Then retrieving them from init_fpstate will explode like this:

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:memcpy_erms+0x6/0x10
  ? __copy_xstate_to_uabi_buf+0x381/0x870
  fpu_copy_guest_fpstate_to_uabi+0x28/0x80
  kvm_arch_vcpu_ioctl+0x14c/0x1460 [kvm]
  ? __this_cpu_preempt_check+0x13/0x20
  ? vmx_vcpu_put+0x2e/0x260 [kvm_intel]
  kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? kvm_vcpu_ioctl+0xea/0x6b0 [kvm]
  ? __fget_light+0xd4/0x130
  __x64_sys_ioctl+0xe3/0x910
  ? debug_smp_processor_id+0x17/0x20
  ? fpregs_assert_state_consistent+0x27/0x50
  do_syscall_64+0x3f/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Adjust the 'mask' to zero out the userspace buffer for the features that
are not available both from fpstate and from init_fpstate.

The dynamic features depend on the compacted XSAVE format. Ensure it is
enabled before reading XCOMP_BV in init_fpstate.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Yuan Yao <yuan.yao@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/lkml/BYAPR11MB3717EDEF2351C958F2C86EED95259@BYAPR11MB3717.namprd11.prod.outlook.com/
Link: https://lkml.kernel.org/r/20221021185844.13472-1-chang.seok.bae@intel.com
---
 arch/x86/kernel/fpu/xstate.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e77cabf..59e543b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1125,6 +1125,15 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 	 */
 	mask = fpstate->user_xfeatures;
 
+	/*
+	 * Dynamic features are not present in init_fpstate. When they are
+	 * in an all zeros init state, remove those from 'mask' to zero
+	 * those features in the user buffer instead of retrieving them
+	 * from init_fpstate.
+	 */
+	if (fpu_state_size_dynamic())
+		mask &= (header.xfeatures | xinit->header.xcomp_bv);
+
 	for_each_extended_xfeature(i, mask) {
 		/*
 		 * If there was a feature or alignment gap, zero the space

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [tip: x86/urgent] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-21 22:25       ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
@ 2022-10-22  6:23         ` Ingo Molnar
  2022-10-22 13:02           ` Chang S. Bae
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2022-10-22  6:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, Yuan Yao, Dave Hansen, Chang S. Bae, Dave Hansen, x86


* tip-bot2 for Chang S. Bae <tip-bot2@linutronix.de> wrote:

> The following commit has been merged into the x86/urgent branch of tip:
> 
> Commit-ID:     471f0aa7fa64e23766a1473b32d9ec3f0718895a
> Gitweb:        https://git.kernel.org/tip/471f0aa7fa64e23766a1473b32d9ec3f0718895a
> Author:        Chang S. Bae <chang.seok.bae@intel.com>
> AuthorDate:    Fri, 21 Oct 2022 11:58:44 -07:00
> Committer:     Dave Hansen <dave.hansen@linux.intel.com>
> CommitterDate: Fri, 21 Oct 2022 15:22:09 -07:00
> 
> x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly

> +		mask &= (header.xfeatures | xinit->header.xcomp_bv);

Nit: those parentheses are not needed.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [tip: x86/urgent] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly
  2022-10-22  6:23         ` Ingo Molnar
@ 2022-10-22 13:02           ` Chang S. Bae
  0 siblings, 0 replies; 19+ messages in thread
From: Chang S. Bae @ 2022-10-22 13:02 UTC (permalink / raw)
  To: Ingo Molnar, linux-kernel
  Cc: linux-tip-commits, Yuan Yao, Dave Hansen, Dave Hansen, x86

On 10/21/2022 11:23 PM, Ingo Molnar wrote:
> 
> Nit: those parentheses are not needed.

Ah, right.

Sorry, let me follow up on this later.

Thanks,
Chang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 0/2] x86/fpu/xstate: Follow up on the init_fpstate fallout again
  2022-10-18 22:13 ` [PATCH 0/1] x86/fpu: Follow up on the init_fpstate fix Chang S. Bae
  2022-10-18 22:13   ` [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly Chang S. Bae
@ 2023-02-27 21:05   ` Chang S. Bae
  2023-02-27 21:05     ` [PATCH 1/2] x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf() Chang S. Bae
  2023-02-27 21:05     ` [PATCH 2/2] selftests/x86/amx: Add a ptrace test Chang S. Bae
  1 sibling, 2 replies; 19+ messages in thread
From: Chang S. Bae @ 2023-02-27 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, dave.hansen, hpa,
	linux-kselftest, mizhang, chang.seok.bae

Dear maintainers,

This series is following up on the last fix [2]. I thought I could
forget about it with that. But, I was wrong because now this was
realized as an incomplete solution -- my bad. Here is some context for
this series:

The last fix [3] has resolved the case when copying the initialized
dynamic state from init_fpstate to the user buffer in
__copy_xstate_to_uabi_buf(). (This was intended to resolve the fallout
of the init_fpstate fix [1].)

But, when copying the *non-initialized* dynamic state from the task
xstate, the code [4] unconditionally retrieves the address in
init_fpstate which is needless. Consequently, this triggers a
false-positive warning as shown in [5] which meaninglessly confuses
users.

With these repetitive surgeries, a more solid and comprehensive
solution is more helpful I thought. Considerably removing init_fpstate
from this loop is not impossible here because dynamic states have an
all-zeros init state. Then, zeroing the user buffer instead of
retrieving init_fpstate resolves the issue and simplifies the code.

These issues were discovered from the KVM execution with launching a
guest and running the KVM self-test as __copy_xstate_to_uabi_buf() was
called. But, the negligibly missing ptrace test could have disclosed
them too. So that case is included here.

Thanks,
Chang

[1] https://lore.kernel.org/lkml/20220824191223.1248-1-chang.seok.bae@intel.com/
[2] https://lore.kernel.org/lkml/20221018221349.4196-1-chang.seok.bae@intel.com/
[3] https://lore.kernel.org/lkml/20221021185844.13472-1-chang.seok.bae@intel.com/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n1156
[5] https://lore.kernel.org/kvm/20230221163655.920289-2-mizhang@google.com/

Chang S. Bae (2):
  x86/fpu/xstate: Prevent false-positive warning in
    __copy_xstate_uabi_buf()
  selftests/x86/amx: Add a ptrace test

 arch/x86/kernel/fpu/xstate.c      |  30 ++++-----
 tools/testing/selftests/x86/amx.c | 108 +++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 19 deletions(-)


base-commit: 7fa08de735e41001a70c8ca869b2b159d74c2339
-- 
2.17.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/2] x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf()
  2023-02-27 21:05   ` [PATCH 0/2] x86/fpu/xstate: Follow up on the init_fpstate fallout again Chang S. Bae
@ 2023-02-27 21:05     ` Chang S. Bae
  2023-02-27 21:05     ` [PATCH 2/2] selftests/x86/amx: Add a ptrace test Chang S. Bae
  1 sibling, 0 replies; 19+ messages in thread
From: Chang S. Bae @ 2023-02-27 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, dave.hansen, hpa,
	linux-kselftest, mizhang, chang.seok.bae

__copy_xstate_to_uabi_buf() copies either from the tasks XSAVE buffer
or from init_fpstate into the ptrace buffer. Dynamic features, like
XTILEDATA, have an all zeroes init state and are not saved in
init_fpstate, which means the corresponding bit is not set in the
xfeatures bitmap of the init_fpstate header.

But __copy_xstate_to_uabi_buf() retrieves addresses for both the tasks
xstate and init_fpstate unconditionally via __raw_xsave_addr().

So if the tasks XSAVE buffer has a dynamic feature set, then the
address retrieval for init_fpstate triggers the warning in
__raw_xsave_addr() which checks the feature bit in the init_fpstate
header.

Remove the address retrieval from init_fpstate for extended features.
They have an all zeroes init state so init_fpstate has zeros for them.
Then zeroing the user buffer for the init state is the same as copying
them from init_fpstate.

Fixes: 2308ee57d93d ("x86/fpu/amx: Enable the AMX feature in 64-bit mode")
Reported-by: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/kvm/20230221163655.920289-2-mizhang@google.com/
---
Thanks, Mingwei for detecting the issue and Thomas for explaining the
problem with the well-written description. The background and problem
sections in this changelog came from his write-up.

I acknowledge that Mingwei's patch (in the link above) can solve the
problem. But, I would rather propose this approach as it simplifies
the code which is considered good in this sensitive copy function.

This mostly zeroed init_fpstate is still useful, e.g., when copying
the init state between buffers with the same format -- like in
fpu_clone(). But, this case between different XSAVE formats looks a
bit different than that.
---
 arch/x86/kernel/fpu/xstate.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 714166cc25f2..0bab497c9436 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1118,21 +1118,20 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 	zerofrom = offsetof(struct xregs_state, extended_state_area);
 
 	/*
-	 * The ptrace buffer is in non-compacted XSAVE format.  In
-	 * non-compacted format disabled features still occupy state space,
-	 * but there is no state to copy from in the compacted
-	 * init_fpstate. The gap tracking will zero these states.
-	 */
-	mask = fpstate->user_xfeatures;
-
-	/*
-	 * Dynamic features are not present in init_fpstate. When they are
-	 * in an all zeros init state, remove those from 'mask' to zero
-	 * those features in the user buffer instead of retrieving them
-	 * from init_fpstate.
+	 * This 'mask' indicates which states to copy from fpstate.
+	 * Those extended states that are not present in fpstate are
+	 * either disabled or initialized:
+	 *
+	 * In non-compacted format, disabled features still occupy
+	 * state space but there is no state to copy from in the
+	 * compacted init_fpstate. The gap tracking will zero these
+	 * states.
+	 *
+	 * The extended features have an all zeroes init state. Thus,
+	 * remove them from 'mask' to zero those features in the user
+	 * buffer instead of retrieving them from init_fpstate.
 	 */
-	if (fpu_state_size_dynamic())
-		mask &= (header.xfeatures | xinit->header.xcomp_bv);
+	mask = header.xfeatures;
 
 	for_each_extended_xfeature(i, mask) {
 		/*
@@ -1151,9 +1150,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
 			pkru.pkru = pkru_val;
 			membuf_write(&to, &pkru, sizeof(pkru));
 		} else {
-			copy_feature(header.xfeatures & BIT_ULL(i), &to,
+			membuf_write(&to,
 				     __raw_xsave_addr(xsave, i),
-				     __raw_xsave_addr(xinit, i),
 				     xstate_sizes[i]);
 		}
 		/*
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/2] selftests/x86/amx: Add a ptrace test
  2023-02-27 21:05   ` [PATCH 0/2] x86/fpu/xstate: Follow up on the init_fpstate fallout again Chang S. Bae
  2023-02-27 21:05     ` [PATCH 1/2] x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf() Chang S. Bae
@ 2023-02-27 21:05     ` Chang S. Bae
  1 sibling, 0 replies; 19+ messages in thread
From: Chang S. Bae @ 2023-02-27 21:05 UTC (permalink / raw)
  To: linux-kernel
  Cc: x86, tglx, mingo, bp, dave.hansen, dave.hansen, hpa,
	linux-kselftest, mizhang, chang.seok.bae

Include a test case to validate the XTILEDATA injection to the target.

Also, it ensures the kernel's ability to copy states between different
XSAVE formats.

Refactor the memcmp() code to be usable for the state validation.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: x86@kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 tools/testing/selftests/x86/amx.c | 108 +++++++++++++++++++++++++++++-
 1 file changed, 105 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/amx.c b/tools/testing/selftests/x86/amx.c
index 625e42901237..d884fd69dd51 100644
--- a/tools/testing/selftests/x86/amx.c
+++ b/tools/testing/selftests/x86/amx.c
@@ -14,8 +14,10 @@
 #include <sys/auxv.h>
 #include <sys/mman.h>
 #include <sys/shm.h>
+#include <sys/ptrace.h>
 #include <sys/syscall.h>
 #include <sys/wait.h>
+#include <sys/uio.h>
 
 #include "../kselftest.h" /* For __cpuid_count() */
 
@@ -583,6 +585,13 @@ static void test_dynamic_state(void)
 	_exit(0);
 }
 
+static inline int __compare_tiledata_state(struct xsave_buffer *xbuf1, struct xsave_buffer *xbuf2)
+{
+	return memcmp(&xbuf1->bytes[xtiledata.xbuf_offset],
+		      &xbuf2->bytes[xtiledata.xbuf_offset],
+		      xtiledata.size);
+}
+
 /*
  * Save current register state and compare it to @xbuf1.'
  *
@@ -599,9 +608,7 @@ static inline bool __validate_tiledata_regs(struct xsave_buffer *xbuf1)
 		fatal_error("failed to allocate XSAVE buffer\n");
 
 	xsave(xbuf2, XFEATURE_MASK_XTILEDATA);
-	ret = memcmp(&xbuf1->bytes[xtiledata.xbuf_offset],
-		     &xbuf2->bytes[xtiledata.xbuf_offset],
-		     xtiledata.size);
+	ret = __compare_tiledata_state(xbuf1, xbuf2);
 
 	free(xbuf2);
 
@@ -826,6 +833,99 @@ static void test_context_switch(void)
 	free(finfo);
 }
 
+/* Ptrace test */
+
+/*
+ * Make sure the ptracee has the expanded kernel buffer on the first
+ * use. Then, initialize the state before performing the state
+ * injection from the ptracer.
+ */
+static inline void ptracee_firstuse_tiledata(void)
+{
+	load_rand_tiledata(stashed_xsave);
+	init_xtiledata();
+}
+
+/*
+ * Ptracer injects the randomized tile data state. It also reads
+ * before and after that, which will execute the kernel's state copy
+ * functions. So, the tester is advised to double-check any emitted
+ * kernel messages.
+ */
+static void ptracer_inject_tiledata(pid_t target)
+{
+	struct xsave_buffer *xbuf;
+	struct iovec iov;
+
+	xbuf = alloc_xbuf();
+	if (!xbuf)
+		fatal_error("unable to allocate XSAVE buffer");
+
+	printf("\tRead the init'ed tiledata via ptrace().\n");
+
+	iov.iov_base = xbuf;
+	iov.iov_len = xbuf_size;
+
+	memset(stashed_xsave, 0, xbuf_size);
+
+	if (ptrace(PTRACE_GETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
+		fatal_error("PTRACE_GETREGSET");
+
+	if (!__compare_tiledata_state(stashed_xsave, xbuf))
+		printf("[OK]\tThe init'ed tiledata was read from ptracee.\n");
+	else
+		printf("[FAIL]\tThe init'ed tiledata was not read from ptracee.\n");
+
+	printf("\tInject tiledata via ptrace().\n");
+
+	load_rand_tiledata(xbuf);
+
+	memcpy(&stashed_xsave->bytes[xtiledata.xbuf_offset],
+	       &xbuf->bytes[xtiledata.xbuf_offset],
+	       xtiledata.size);
+
+	if (ptrace(PTRACE_SETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
+		fatal_error("PTRACE_SETREGSET");
+
+	if (ptrace(PTRACE_GETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
+		fatal_error("PTRACE_GETREGSET");
+
+	if (!__compare_tiledata_state(stashed_xsave, xbuf))
+		printf("[OK]\tTiledata was correctly written to ptracee.\n");
+	else
+		printf("[FAIL]\tTiledata was not correctly written to ptracee.\n");
+}
+
+static void test_ptrace(void)
+{
+	pid_t child;
+	int status;
+
+	child = fork();
+	if (child < 0) {
+		err(1, "fork");
+	} else if (!child) {
+		if (ptrace(PTRACE_TRACEME, 0, NULL, NULL))
+			err(1, "PTRACE_TRACEME");
+
+		ptracee_firstuse_tiledata();
+
+		raise(SIGTRAP);
+		_exit(0);
+	}
+
+	do {
+		wait(&status);
+	} while (WSTOPSIG(status) != SIGTRAP);
+
+	ptracer_inject_tiledata(child);
+
+	ptrace(PTRACE_DETACH, child, NULL, NULL);
+	wait(&status);
+	if (!WIFEXITED(status) || WEXITSTATUS(status))
+		err(1, "ptrace test");
+}
+
 int main(void)
 {
 	/* Check hardware availability at first */
@@ -846,6 +946,8 @@ int main(void)
 	ctxtswtest_config.num_threads = 5;
 	test_context_switch();
 
+	test_ptrace();
+
 	clearhandler(SIGILL);
 	free_stashed_xsave();
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-02-27 21:16 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-24 19:12 [PATCH 0/3] x86/fpu: Improve the init_fpstate setup code Chang S. Bae
2022-08-24 19:12 ` [PATCH 1/3] x86/fpu: Configure init_fpstate attributes orderly Chang S. Bae
2022-10-03 21:18   ` Neel Natu
2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
2022-08-24 19:12 ` [PATCH 2/3] x86/fpu: Fix the init_fpstate size check with the actual size Chang S. Bae
2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
2022-08-24 19:12 ` [PATCH 3/3] x86/fpu: Exclude dynamic states from init_fpstate Chang S. Bae
2022-10-17 13:47   ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
2022-10-18 22:13 ` [PATCH 0/1] x86/fpu: Follow up on the init_fpstate fix Chang S. Bae
2022-10-18 22:13   ` [PATCH 1/1] x86/fpu: Fix copy_xstate_to_uabi() to copy init states correctly Chang S. Bae
2022-10-20 16:57     ` Dave Hansen
2022-10-20 18:52       ` Chang S. Bae
2022-10-21 18:58     ` [PATCH v2] " Chang S. Bae
2022-10-21 22:25       ` [tip: x86/urgent] " tip-bot2 for Chang S. Bae
2022-10-22  6:23         ` Ingo Molnar
2022-10-22 13:02           ` Chang S. Bae
2023-02-27 21:05   ` [PATCH 0/2] x86/fpu/xstate: Follow up on the init_fpstate fallout again Chang S. Bae
2023-02-27 21:05     ` [PATCH 1/2] x86/fpu/xstate: Prevent false-positive warning in __copy_xstate_uabi_buf() Chang S. Bae
2023-02-27 21:05     ` [PATCH 2/2] selftests/x86/amx: Add a ptrace test Chang S. Bae

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.