* AVX register corruption from signal delivery @ 2019-11-26 19:49 Barret Rhoden 2019-11-26 20:20 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 14+ messages in thread From: Barret Rhoden @ 2019-11-26 19:49 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Rik van Riel" Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov Hi - The Go Team found a bug[1] where the AVX registers can get corrupted during signal delivery. They were able to bisect it to commits related to the "x86: load FPU registers on return to userland" patchset[2]. The bug requires the kernel to be built with GCC 9 to trigger. In particular, arch/x86/kernel/fpu/signal.c needs to be built with GCC 9. Thanks, Barret [1] https://bugzilla.kernel.org/show_bug.cgi?id=205663 [2] https://lore.kernel.org/kvm/20190403164156.19645-1-bigeasy@linutronix.de/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: AVX register corruption from signal delivery 2019-11-26 19:49 AVX register corruption from signal delivery Barret Rhoden @ 2019-11-26 20:20 ` Sebastian Andrzej Siewior 2019-11-26 21:23 ` Barret Rhoden 0 siblings, 1 reply; 14+ messages in thread From: Sebastian Andrzej Siewior @ 2019-11-26 20:20 UTC (permalink / raw) To: Barret Rhoden Cc: Rik van Riel", x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov On 2019-11-26 14:49:55 [-0500], Barret Rhoden wrote: > Hi - Hi, > The bug requires the kernel to be built with GCC 9 to trigger. In > particular, arch/x86/kernel/fpu/signal.c needs to be built with GCC 9. I've been pinged already, I will look into this. Please send me a .config just to be sure. From browsing over the bug CONFIG_PREEMPTION was required. > Thanks, > > Barret Sebastian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: AVX register corruption from signal delivery 2019-11-26 20:20 ` Sebastian Andrzej Siewior @ 2019-11-26 21:23 ` Barret Rhoden 2019-11-26 22:13 ` Borislav Petkov 2019-11-27 12:42 ` [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx Sebastian Andrzej Siewior 0 siblings, 2 replies; 14+ messages in thread From: Barret Rhoden @ 2019-11-26 21:23 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Rik van Riel", x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov [-- Attachment #1: Type: text/plain, Size: 529 bytes --] On 11/26/19 3:20 PM, Sebastian Andrzej Siewior wrote: > On 2019-11-26 14:49:55 [-0500], Barret Rhoden wrote: >> Hi - > Hi, > >> The bug requires the kernel to be built with GCC 9 to trigger. In >> particular, arch/x86/kernel/fpu/signal.c needs to be built with GCC 9. > > I've been pinged already, I will look into this. Please send me a > .config just to be sure. From browsing over the bug CONFIG_PREEMPTION > was required. Thanks; config attached. I've been able to recreate it in QEMU with at least 2 cores. Barret [-- Attachment #2: avx-bug.config.gz --] [-- Type: application/gzip, Size: 28236 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: AVX register corruption from signal delivery 2019-11-26 21:23 ` Barret Rhoden @ 2019-11-26 22:13 ` Borislav Petkov 2019-11-26 22:30 ` Andy Lutomirski 2019-11-27 12:42 ` [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx Sebastian Andrzej Siewior 1 sibling, 1 reply; 14+ messages in thread From: Borislav Petkov @ 2019-11-26 22:13 UTC (permalink / raw) To: Barret Rhoden, austin Cc: Sebastian Andrzej Siewior, Rik van Riel, x86, linux-kernel, Thomas Gleixner, Ingo Molnar On Tue, Nov 26, 2019 at 04:23:40PM -0500, Barret Rhoden wrote: > Thanks; config attached. I've been able to recreate it in QEMU with at > least 2 cores. Yap, I can too, in my VM. Btw, would you guys like to submit that reproducer test program https://bugzilla.kernel.org/attachment.cgi?id=286073 into the kernel selftests pile here: tools/testing/selftests/x86/ ? It needs proper cleanup to fit kernel coding style but it could be a good start for collecting interesting FPU test cases... Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: AVX register corruption from signal delivery 2019-11-26 22:13 ` Borislav Petkov @ 2019-11-26 22:30 ` Andy Lutomirski 2019-11-26 23:00 ` Borislav Petkov 0 siblings, 1 reply; 14+ messages in thread From: Andy Lutomirski @ 2019-11-26 22:30 UTC (permalink / raw) To: Borislav Petkov Cc: Barret Rhoden, austin, Sebastian Andrzej Siewior, Rik van Riel, x86, linux-kernel, Thomas Gleixner, Ingo Molnar > On Nov 26, 2019, at 2:14 PM, Borislav Petkov <bp@alien8.de> wrote: > > On Tue, Nov 26, 2019 at 04:23:40PM -0500, Barret Rhoden wrote: >> Thanks; config attached. I've been able to recreate it in QEMU with at >> least 2 cores. > > Yap, I can too, in my VM. > > Btw, would you guys like to submit that reproducer test program > > https://bugzilla.kernel.org/attachment.cgi?id=286073 > > into the kernel selftests pile here: > > tools/testing/selftests/x86/ > > ? > > It needs proper cleanup to fit kernel coding style but it could be a > good start for collecting interesting FPU test cases. If we do this, we should have selftests/x86/slow or otherwise have a fast vs slow mode. I really like that the entire suite takes under 2s. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: AVX register corruption from signal delivery 2019-11-26 22:30 ` Andy Lutomirski @ 2019-11-26 23:00 ` Borislav Petkov 0 siblings, 0 replies; 14+ messages in thread From: Borislav Petkov @ 2019-11-26 23:00 UTC (permalink / raw) To: Andy Lutomirski Cc: Barret Rhoden, austin, Sebastian Andrzej Siewior, Rik van Riel, x86, linux-kernel, Thomas Gleixner, Ingo Molnar On Tue, Nov 26, 2019 at 02:30:10PM -0800, Andy Lutomirski wrote: > If we do this, we should have selftests/x86/slow or otherwise have a > fast vs slow mode. I really like that the entire suite takes under 2s. Sure, we can stick it under a separate Makefile target called "full" or so to mean, run the full set of tests. We can run the fast ones first and the slow ones then. Or something to that effect. Thx. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-26 21:23 ` Barret Rhoden 2019-11-26 22:13 ` Borislav Petkov @ 2019-11-27 12:42 ` Sebastian Andrzej Siewior 2019-11-27 14:07 ` Borislav Petkov 2019-11-27 15:46 ` [PATCH] " Rik van Riel 1 sibling, 2 replies; 14+ messages in thread From: Sebastian Andrzej Siewior @ 2019-11-27 12:42 UTC (permalink / raw) To: Barret Rhoden, Josh Bleecher Snyder Cc: Rik van Riel", x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the context that is currently loaded. It never changed during the life time of a task and remained stable/constant. Since we deferred loading the FPU registers on return to userland, the content of fpu_fpregs_owner_ctx may change during preemption and must not be cached. This went unnoticed for some time and was now noticed, in particular gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and reuse it in the retry loop: copy_fpstate_to_sigframe() load fpu_fpregs_owner_ctx and save on stack fpregs_lock() copy_fpregs_to_sigframe() /* failed */ fpregs_unlock() *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx *** fault_in_pages_writeable() /* succeed, retry */ fpregs_lock() __fpregs_load_activate() fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */ copy_fpregs_to_sigframe() /* succeeds, random FPU content */ This is a comparison of the assembly of gcc-9, without vs with this patch: | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size)) | cmpq %rdx, %rax # tmp183, _4 | jb .L190 #, |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |-#APP |-# 512 "arch/x86/include/asm/fpu/internal.h" 1 |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__ |-# 0 "" 2 |-#NO_APP |- movq %rax, -88(%rbp) # pfo_ret__, %sfp … |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |- movq -88(%rbp), %rcx # %sfp, pfo_ret__ |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#APP |+# 512 "arch/x86/include/asm/fpu/internal.h" 1 |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__ |+# 0 "" 2 |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#NO_APP |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of fpu_fpregs_owner_ctx during preemption points. Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") --- There is no Sign-off by here. Could this please be verified by the reporter? Also I would like to add Debugged-by: Ian Lance Taylor but I lack the complete address also I'm not sure if he wants to. Also please send a Reported-by line since I'm not sure who started this. arch/x86/include/asm/fpu/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 4c95c365058aa..44c48e34d7994 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu) static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu) { - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; } /* -- 2.24.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-27 12:42 ` [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx Sebastian Andrzej Siewior @ 2019-11-27 14:07 ` Borislav Petkov 2019-11-27 18:42 ` Barret Rhoden 2019-11-27 15:46 ` [PATCH] " Rik van Riel 1 sibling, 1 reply; 14+ messages in thread From: Borislav Petkov @ 2019-11-27 14:07 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Barret Rhoden, Josh Bleecher Snyder, Rik van Riel", x86, linux-kernel, Thomas Gleixner, Ingo Molnar, ian On Wed, Nov 27, 2019 at 01:42:43PM +0100, Sebastian Andrzej Siewior wrote: > The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the ^ to > context that is currently loaded. It never changed during the life time > of a task and remained stable/constant. > > Since we deferred loading the FPU registers on return to userland, the Drop those "we"s :) > content of fpu_fpregs_owner_ctx may change during preemption and must > not be cached. > This went unnoticed for some time and was now noticed, in particular > gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and reuse > it in the retry loop: > > copy_fpstate_to_sigframe() > load fpu_fpregs_owner_ctx and save on stack > fpregs_lock() > copy_fpregs_to_sigframe() /* failed */ > fpregs_unlock() > *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx *** > > fault_in_pages_writeable() /* succeed, retry */ > > fpregs_lock() > __fpregs_load_activate() > fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */ > copy_fpregs_to_sigframe() /* succeeds, random FPU content */ > > This is a comparison of the assembly of gcc-9, without vs with this > patch: > > | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size)) > | cmpq %rdx, %rax # tmp183, _4 > | jb .L190 #, > |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; > |-#APP > |-# 512 "arch/x86/include/asm/fpu/internal.h" 1 > |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__ > |-# 0 "" 2 > |-#NO_APP > |- movq %rax, -88(%rbp) # pfo_ret__, %sfp > … > |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; > |- movq -88(%rbp), %rcx # %sfp, pfo_ret__ > |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp > |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; > |+#APP > |+# 512 "arch/x86/include/asm/fpu/internal.h" 1 > |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__ > |+# 0 "" 2 > |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; > |+#NO_APP > |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp > > Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of > fpu_fpregs_owner_ctx during preemption points. > > Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Or a352a3b7b792 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD") maybe, which adds the fpregs_unlock() ? > --- > > There is no Sign-off by here. Could this please be verified by the > reporter? Not the reporter, but I just tested it successfully too: Tested-by: Borislav Petkov <bp@suse.de> > Also I would like to add > Debugged-by: Ian Lance Taylor Yes, pls. CCed. > > but I lack the complete address also I'm not sure if he wants to. > Also please send a Reported-by line since I'm not sure who started this. > > arch/x86/include/asm/fpu/internal.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h > index 4c95c365058aa..44c48e34d7994 100644 > --- a/arch/x86/include/asm/fpu/internal.h > +++ b/arch/x86/include/asm/fpu/internal.h > @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu) > > static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu) > { > - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; > + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; > } And to add one more data point from IRC: this is also documented: /* * this_cpu_read() makes gcc load the percpu variable every time it is * accessed while this_cpu_read_stable() allows the value to be cached. ^^^^^^^^^^^^^^^ * this_cpu_read_stable() is more efficient and can be used if its value * is guaranteed to be valid across cpus. The current users include * get_current() and get_thread_info() both of which are actually * per-thread variables implemented as per-cpu variables and thus * stable for the duration of the respective task. */ #define this_cpu_read_stable(var) percpu_stable_op("mov", var) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-27 14:07 ` Borislav Petkov @ 2019-11-27 18:42 ` Barret Rhoden 2019-11-28 8:53 ` [PATCH v2] " Sebastian Andrzej Siewior 0 siblings, 1 reply; 14+ messages in thread From: Barret Rhoden @ 2019-11-27 18:42 UTC (permalink / raw) To: Borislav Petkov, Sebastian Andrzej Siewior Cc: Josh Bleecher Snyder, Rik van Riel", x86, linux-kernel, Thomas Gleixner, Ingo Molnar, ian >> Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of >> fpu_fpregs_owner_ctx during preemption points. >> >> Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") > > Or > > a352a3b7b792 ("x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD") > > maybe, which adds the fpregs_unlock() ? Using this_cpu_read_stable() (or some variant) seems to go back quite a while; not sure when exactly it became a problem. If it helps, commit d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails") was the one that popped up the most during Austin's bisection. >> Also I would like to add >> Debugged-by: Ian Lance Taylor > > Yes, pls. CCed. To close the loop on this, here's what Austin wrote on the bugzilla: > --- Comment #2 from Austin Clements (austin@google.com) --- > I can confirm that the patch posted by Sebastian Andrzej Siewior at > https://lkml.org/lkml/2019/11/27/304 fixes the issue both in our C reproducer > and in our original Go reproducer. (Sorry, I'm not subscribed to LKML, so I > can't reply there, and I'm on an airplane, so it's hard to get subscribed :) > > Regarding the question about the "Debugged-by" line in the patch, debugging was > a joint effort between myself (Austin Clements <austin@google.com>), David > Chase <drchase@golang.org>, and Ian Lance Taylor <ian@airs.com>. Thanks, Barret ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-27 18:42 ` Barret Rhoden @ 2019-11-28 8:53 ` Sebastian Andrzej Siewior 2019-11-28 9:22 ` [tip: x86/urgent] " tip-bot2 for Sebastian Andrzej Siewior 2019-11-29 16:57 ` [PATCH v2] " David Laight 0 siblings, 2 replies; 14+ messages in thread From: Sebastian Andrzej Siewior @ 2019-11-28 8:53 UTC (permalink / raw) To: Barret Rhoden Cc: Borislav Petkov, Josh Bleecher Snyder, Rik van Riel, x86, linux-kernel, Thomas Gleixner, Ingo Molnar, ian, Austin Clements, David Chase The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the context that is currently loaded. It never changed during the life time of a task and remained stable/constant. Since deferred loading the FPU registers on return to userland, the content of fpu_fpregs_owner_ctx may change during preemption and must not be cached. This went unnoticed for some time and was now noticed, in particular gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and reuse it in the retry loop: copy_fpstate_to_sigframe() load fpu_fpregs_owner_ctx and save on stack fpregs_lock() copy_fpregs_to_sigframe() /* failed */ fpregs_unlock() *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx *** fault_in_pages_writeable() /* succeed, retry */ fpregs_lock() __fpregs_load_activate() fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */ copy_fpregs_to_sigframe() /* succeeds, random FPU content */ This is a comparison of the assembly of gcc-9, without vs with this patch: | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size)) | cmpq %rdx, %rax # tmp183, _4 | jb .L190 #, |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |-#APP |-# 512 "arch/x86/include/asm/fpu/internal.h" 1 |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__ |-# 0 "" 2 |-#NO_APP |- movq %rax, -88(%rbp) # pfo_ret__, %sfp … |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |- movq -88(%rbp), %rcx # %sfp, pfo_ret__ |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#APP |+# 512 "arch/x86/include/asm/fpu/internal.h" 1 |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__ |+# 0 "" 2 |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#NO_APP |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of fpu_fpregs_owner_ctx during preemption points. The fixes tag points to the commit where defered FPU loading was added. Since this commit the compiler is no longer allowed move the load of fpu_fpregs_owner_ctx somewhere else / outside of the locked section. A task preemption will change its value and stale content will be observed. Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663 Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Debugged-by: Austin Clements <austin@google.com> Debugged-by: David Chase <drchase@golang.org> Debugged-by: Ian Lance Taylor <ian@airs.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- v1…v2: - Adding tags - Explaining why Fixes: does not point to the bisected commit. arch/x86/include/asm/fpu/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 4c95c365058aa..44c48e34d7994 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu) static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu) { - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; } /* -- 2.24.0 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [tip: x86/urgent] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-28 8:53 ` [PATCH v2] " Sebastian Andrzej Siewior @ 2019-11-28 9:22 ` tip-bot2 for Sebastian Andrzej Siewior 2019-11-29 16:57 ` [PATCH v2] " David Laight 1 sibling, 0 replies; 14+ messages in thread From: tip-bot2 for Sebastian Andrzej Siewior @ 2019-11-28 9:22 UTC (permalink / raw) To: linux-tip-commits Cc: Sebastian Andrzej Siewior, Borislav Petkov, Rik van Riel, Aubrey Li, Austin Clements, Barret Rhoden, Dave Hansen, David Chase, H. Peter Anvin, ian, Ingo Molnar, Josh Bleecher Snyder, Thomas Gleixner, x86-ml, LKML The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 59c4bd853abcea95eccc167a7d7fd5f1a5f47b98 Gitweb: https://git.kernel.org/tip/59c4bd853abcea95eccc167a7d7fd5f1a5f47b98 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Thu, 28 Nov 2019 09:53:06 +01:00 Committer: Borislav Petkov <bp@suse.de> CommitterDate: Thu, 28 Nov 2019 10:16:46 +01:00 x86/fpu: Don't cache access to fpu_fpregs_owner_ctx The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing to the context that is currently loaded. It never changed during the lifetime of a task - it remained stable/constant. After deferred FPU registers loading until return to userland was implemented, the content of fpu_fpregs_owner_ctx may change during preemption and must not be cached. This went unnoticed for some time and was now noticed, in particular since gcc 9 is caching that load in copy_fpstate_to_sigframe() and reusing it in the retry loop: copy_fpstate_to_sigframe() load fpu_fpregs_owner_ctx and save on stack fpregs_lock() copy_fpregs_to_sigframe() /* failed */ fpregs_unlock() *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx *** fault_in_pages_writeable() /* succeed, retry */ fpregs_lock() __fpregs_load_activate() fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */ copy_fpregs_to_sigframe() /* succeeds, random FPU content */ This is a comparison of the assembly produced by gcc 9, without vs with this patch: | # arch/x86/kernel/fpu/signal.c:173: if (!access_ok(buf, size)) | cmpq %rdx, %rax # tmp183, _4 | jb .L190 #, |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |-#APP |-# 512 "arch/x86/include/asm/fpu/internal.h" 1 |- movq %gs:fpu_fpregs_owner_ctx,%rax #, pfo_ret__ |-# 0 "" 2 |-#NO_APP |- movq %rax, -88(%rbp) # pfo_ret__, %sfp … |-# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |- movq -88(%rbp), %rcx # %sfp, pfo_ret__ |- cmpq %rcx, -64(%rbp) # pfo_ret__, %sfp |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#APP |+# 512 "arch/x86/include/asm/fpu/internal.h" 1 |+ movq %gs:fpu_fpregs_owner_ctx(%rip),%rax # fpu_fpregs_owner_ctx, pfo_ret__ |+# 0 "" 2 |+# arch/x86/include/asm/fpu/internal.h:512: return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; |+#NO_APP |+ cmpq %rax, -64(%rbp) # pfo_ret__, %sfp Use this_cpu_read() instead this_cpu_read_stable() to avoid caching of fpu_fpregs_owner_ctx during preemption points. The Fixes: tag points to the commit where deferred FPU loading was added. Since this commit, the compiler is no longer allowed to move the load of fpu_fpregs_owner_ctx somewhere else / outside of the locked section. A task preemption will change its value and stale content will be observed. [ bp: Massage. ] Debugged-by: Austin Clements <austin@google.com> Debugged-by: David Chase <drchase@golang.org> Debugged-by: Ian Lance Taylor <ian@airs.com> Fixes: 5f409e20b7945 ("x86/fpu: Defer FPU state load until return to userspace") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Rik van Riel <riel@surriel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: Aubrey Li <aubrey.li@intel.com> Cc: Austin Clements <austin@google.com> Cc: Barret Rhoden <brho@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Chase <drchase@golang.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: ian@airs.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Josh Bleecher Snyder <josharian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663 --- arch/x86/include/asm/fpu/internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h index 4c95c36..44c48e3 100644 --- a/arch/x86/include/asm/fpu/internal.h +++ b/arch/x86/include/asm/fpu/internal.h @@ -509,7 +509,7 @@ static inline void __fpu_invalidate_fpregs_state(struct fpu *fpu) static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu) { - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu; } /* ^ permalink raw reply related [flat|nested] 14+ messages in thread
* RE: [PATCH v2] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-28 8:53 ` [PATCH v2] " Sebastian Andrzej Siewior 2019-11-28 9:22 ` [tip: x86/urgent] " tip-bot2 for Sebastian Andrzej Siewior @ 2019-11-29 16:57 ` David Laight 2019-11-29 17:08 ` 'Sebastian Andrzej Siewior' 1 sibling, 1 reply; 14+ messages in thread From: David Laight @ 2019-11-29 16:57 UTC (permalink / raw) To: 'Sebastian Andrzej Siewior', Barret Rhoden Cc: Borislav Petkov, Josh Bleecher Snyder, Rik van Riel, x86, linux-kernel, Thomas Gleixner, Ingo Molnar, ian, Austin Clements, David Chase From Sebastian Andrzej Siewior > Sent: 28 November 2019 08:53 > The state/owner of FPU is saved fpu_fpregs_owner_ctx by pointing to the > context that is currently loaded. It never changed during the life time > of a task and remained stable/constant. > > Since deferred loading the FPU registers on return to userland, the > content of fpu_fpregs_owner_ctx may change during preemption and must > not be cached. > This went unnoticed for some time and was now noticed, in particular > gcc-9 is able to cache that load in copy_fpstate_to_sigframe() and reuse > it in the retry loop: > > copy_fpstate_to_sigframe() > load fpu_fpregs_owner_ctx and save on stack > fpregs_lock() > copy_fpregs_to_sigframe() /* failed */ > fpregs_unlock() > *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx *** > > fault_in_pages_writeable() /* succeed, retry */ > > fpregs_lock() > __fpregs_load_activate() > fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */ > copy_fpregs_to_sigframe() /* succeeds, random FPU content */ Should both fpregs_lock() and fpregs_unlock() contain a barrier() (or "memory" clobber) do stop the compiler moving anything across them? David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-29 16:57 ` [PATCH v2] " David Laight @ 2019-11-29 17:08 ` 'Sebastian Andrzej Siewior' 0 siblings, 0 replies; 14+ messages in thread From: 'Sebastian Andrzej Siewior' @ 2019-11-29 17:08 UTC (permalink / raw) To: David Laight Cc: Barret Rhoden, Borislav Petkov, Josh Bleecher Snyder, Rik van Riel, x86, linux-kernel, Thomas Gleixner, Ingo Molnar, ian, Austin Clements, David Chase On 2019-11-29 16:57:42 [+0000], David Laight wrote: > Should both fpregs_lock() and fpregs_unlock() contain a barrier() (or "memory" clobber) > do stop the compiler moving anything across them? They already do. > David Sebastian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx 2019-11-27 12:42 ` [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx Sebastian Andrzej Siewior 2019-11-27 14:07 ` Borislav Petkov @ 2019-11-27 15:46 ` Rik van Riel 1 sibling, 0 replies; 14+ messages in thread From: Rik van Riel @ 2019-11-27 15:46 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Barret Rhoden, Josh Bleecher Snyder Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov [-- Attachment #1: Type: text/plain, Size: 886 bytes --] On Wed, 2019-11-27 at 13:42 +0100, Sebastian Andrzej Siewior wrote: > There is no Sign-off by here. Could this please be verified by the > reporter? Next time this is posted, feel free to add this :) Reviewed-by: Rik van Riel <riel@surriel.com> > diff --git a/arch/x86/include/asm/fpu/internal.h > b/arch/x86/include/asm/fpu/internal.h > index 4c95c365058aa..44c48e34d7994 100644 > --- a/arch/x86/include/asm/fpu/internal.h > +++ b/arch/x86/include/asm/fpu/internal.h > @@ -509,7 +509,7 @@ static inline void > __fpu_invalidate_fpregs_state(struct fpu *fpu) > > static inline int fpregs_state_valid(struct fpu *fpu, unsigned int > cpu) > { > - return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu > == fpu->last_cpu; > + return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == > fpu->last_cpu; > } > > /* -- All Rights Reversed. [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-11-29 17:08 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-26 19:49 AVX register corruption from signal delivery Barret Rhoden 2019-11-26 20:20 ` Sebastian Andrzej Siewior 2019-11-26 21:23 ` Barret Rhoden 2019-11-26 22:13 ` Borislav Petkov 2019-11-26 22:30 ` Andy Lutomirski 2019-11-26 23:00 ` Borislav Petkov 2019-11-27 12:42 ` [PATCH] x86/fpu: Don't cache access to fpu_fpregs_owner_ctx Sebastian Andrzej Siewior 2019-11-27 14:07 ` Borislav Petkov 2019-11-27 18:42 ` Barret Rhoden 2019-11-28 8:53 ` [PATCH v2] " Sebastian Andrzej Siewior 2019-11-28 9:22 ` [tip: x86/urgent] " tip-bot2 for Sebastian Andrzej Siewior 2019-11-29 16:57 ` [PATCH v2] " David Laight 2019-11-29 17:08 ` 'Sebastian Andrzej Siewior' 2019-11-27 15:46 ` [PATCH] " Rik van Riel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).