From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753968AbcBOTOd (ORCPT ); Mon, 15 Feb 2016 14:14:33 -0500
Received: from mail.skyhub.de ([78.46.96.112]:43253 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752383AbcBOTOb (ORCPT ); Mon, 15 Feb 2016 14:14:31 -0500
Date: Mon, 15 Feb 2016 20:14:22 +0100
From: Borislav Petkov
To: Andy Lutomirski
Cc: x86-ml , lkml
Subject: Re: WARNING: CPU: 0 PID: 3031 at ./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130()
Message-ID: <20160215191422.GB32716@pd.tnic>
References: <20160211192741.GG5565@pd.tnic> <20160212170010.GE4099@pd.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20160212170010.GE4099@pd.tnic>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 12, 2016 at 06:00:10PM +0100, Borislav Petkov wrote:
> Something for me to try when I get a chance.

Ok, so I wanted to know what happens in detail. Here's some ftracing
(debug patch at the end).

Now pay attention to this udevadm thing

[ 3.816977] rcu_pree-7 0d..2 4058241us : __switch_to: prev: rcu_preempt <-> next: udevadm
[ 3.816977] rcu_pree-7 0d..2 4058241us : __switch_to: set ->fpregs_active
[ 3.816977] udevadm-982 0.... 4058258us : __fpu__restore_sig: fpregs_active 0, f443d7c0

We're in __fpu__restore_sig() about to call schedule()

[ 3.816977] udevadm-982 0d..2 4058260us : __switch_to: prev: udevadm <-> next: usb_id
[ 3.816977] udevadm-982 0d..2 4058260us : __switch_to: set ->fpregs_active
[ 3.816977] usb_id-987 0d..2 4059684us : __switch_to: prev: usb_id <-> next: udevd
[ 3.816977] usb_id-987 0d..2 4059685us : __switch_to: set ->fpregs_active
[ 3.816977] udevd-843 0d..2 4059697us : __switch_to: prev: udevd <-> next: udevd
[ 3.816977] udevd-843 0d..2 4059697us : __switch_to: set ->fpregs_active
[ 3.816977] alsa-uti-989 0d..2 4060452us : __switch_to: prev: alsa-utils <-> next: udevd
[ 3.816977] alsa-uti-989 0d..2 4060452us : __switch_to: set ->fpregs_active
[ 3.816977] udevd-840 0d..2 4060521us : __switch_to: prev: udevd <-> next: udevd
[ 3.816977] udevd-840 0d..2 4060522us : __switch_to: set ->fpregs_active
[ 3.816977] udevd-829 0d..2 4060557us : __switch_to: prev: udevd <-> next: udevd
[ 3.816977] udevd-829 0d..2 4060558us : __switch_to: set ->fpregs_active
[ 3.816977] udevd-840 0d..2 4060862us : __switch_to: prev: udevd <-> next: blkid
[ 3.816977] udevd-840 0d..2 4060862us : __switch_to: set ->fpregs_active
[ 3.816977] blkid-985 0d..2 4061148us : __switch_to: prev: blkid <-> next: udevadm
[ 3.816977] blkid-985 0d..2 4061148us : __switch_to: set ->fpregs_active

Now we're switching back to udevadm which is @next_p of __switch_to().
There we do:

	fpu_switch = switch_fpu_prepare(prev_fpu, next_fpu, cpu);

which does:

	/* Don't change CR0.TS if we just switch! */
	if (fpu.preload) {
		new_fpu->counter++;
		__fpregs_activate(new_fpu);
		prefetch(&new_fpu->state);

__fpregs_activate() sets ->fpregs_active of @new_fpu, i.e. udevadm's one.

[ 3.816977] udevadm-982 0.... 4061149us : __fpu__restore_sig: after schedule: fpregs_active: 1 f443d7c0

__fpu__restore_sig() -> fpu__restore() sets ->fpregs_active again.

[ 3.816977] udevadm-982 0.N.1 4185386us : fpu__restore: WARN: fpu: f443d7c0

Boom!

[ 3.816977] udevadm-982 0.N.1 4185392us :
[ 3.816977] => fpu__restore
[ 3.816977] => __fpu__restore_sig
[ 3.816977] => fpu__restore_sig
[ 3.816977] => restore_sigcontext
[ 3.816977] => sys_sigreturn
[ 3.816977] => do_syscall_32_irqs_on
[ 3.816977] => restore_all
[ 3.816977] ---------------------------------
[ 3.816977] Kernel Offset: disabled
[ 3.816977] ---[ end Kernel panic - not syncing: Outta here...

So yeah, we probably should enlarge the preemption-off region to contain
->fpstate_active.

Here's what you basically suggested but with a *looot* of explanatory
text. Which might be really wrong or completely unparseable or both. So
holler what should be changed.

Thanks!

---
From: Borislav Petkov
Date: Mon, 15 Feb 2016 19:50:33 +0100
Subject: [RFC PATCH] x86/FPU: Fix double FPU regs activation

On the entry_INT80_32->do_syscall_32_irqs_on path on 32-bit we run with
interrupts enabled. And it can happen that we get preempted right after
setting ->fpstate_active in a task's FPU.

After we get preempted, we switch between tasks merrily and eventually
are about to switch to that task above whose ->fpstate_active we
set. We enter __switch_to() and do switch_fpu_prepare(). Our task gets
->fpregs_active set, we find ourselves back on the call stack below and
especially in __fpu__restore_sig() which sets ->fpregs_active again.

Leading to that whoops below.

So let's enlarge the preemption-off region so that we set
->fpstate_active with preemption disabled and thus not trigger
fpu.preload:

  switch_fpu_prepare
  ...
	fpu.preload = static_cpu_has(X86_FEATURE_FPU) &&
		      new_fpu->fpstate_active &&
		      ^^^^^^^^^^^^^^^^^^^^^^

prematurely.

WARNING: CPU: 0 PID: 3031 at ./arch/x86/include/asm/fpu/internal.h:530 fpu__restore+0x90/0x130()
Modules linked in: hid_generic usbhid hid snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic iTCO_wdt iTCO_vendor_support emp_thermal coretemp kvm_intel kvm irqbypass crc32_pclmul crc32c_intel iwldvm mac80211 aesni_intel xts snd_hda_intel input_leds aes_i586 sdhci_pci lrw iwlwifi snd_hwdep gf128mul snd_hda_core ablk_helper cryptd ehci_pci pcspkr serio_raw xhci_pci sdhci snd_pcm sg mmc_core 211 lpc_ich mfd_core e1000e snd_timer ehci_hcd xhci_hcd thinkpad_acpi nvram wmi snd battery soundcore led_class ac thermal
CPU: 0 PID: 3031 Comm: bash Not tainted 4.5.0-rc3+ #1
Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
 00000000 00000286 f158be4c c12cce56 00000000 00000000 f158be80 c10567fb
 c1866c2c 00000000 00000bd7 c1859e8c 00000212 c1025ab0 00000212 c1025ab0
 f2012b00 f2011f00 f2012d80 f158be90 c10568d2 00000009 00000000 f158bea4
Call Trace:
 dump_stack
 warn_slowpath_common
 ? fpu__restore
 ? fpu__restore
 warn_slowpath_null
 fpu__restore
 __fpu__restore_sig
 fpu__restore_sig
 restore_sigcontext
 sys_sigreturn
 do_syscall_32_irqs_on
 entry_INT80_32

Suggested-by: Andy Lutomirski
Signed-off-by: Borislav Petkov
---
 arch/x86/kernel/fpu/signal.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 31c6a60505e6..408e5a1c6fdd 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -316,12 +316,11 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 			sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
 		}
 
+		preempt_disable();
 		fpu->fpstate_active = 1;
-		if (use_eager_fpu()) {
-			preempt_disable();
+		if (use_eager_fpu())
 			fpu__restore(fpu);
-			preempt_enable();
-		}
+		preempt_enable();
 
 		return err;
 	} else {
-- 
2.3.5

Tracing patch:

---
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index a2124343edf5..2cbc3bf34928 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -527,9 +527,16 @@ static inline void __fpregs_deactivate(struct fpu *fpu)
 
 /* Must be paired with a 'clts' (fpregs_activate_hw()) before! */
 static inline void __fpregs_activate(struct fpu *fpu)
 {
-	WARN_ON_FPU(fpu->fpregs_active);
+	if (WARN_ON_FPU(fpu->fpregs_active)) {
+		trace_printk("WARN: fpu: %p\n", fpu);
+		trace_dump_stack(0);
+		tracing_off();
+		panic("Outta here...\n");
+	}
 	fpu->fpregs_active = 1;
+	trace_printk("set ->fpregs_active\n");
+
 	this_cpu_write(fpu_fpregs_owner_ctx, fpu);
 }
 
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 31c6a60505e6..bb40f02cdfdd 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -317,6 +317,14 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
 		}
 
 		fpu->fpstate_active = 1;
+
+		trace_printk("fpregs_active %d, %p\n", fpu->fpregs_active, fpu);
+
+		schedule();
+
+		trace_printk("after schedule: fpregs_active: %d %p\n",
+			     fpu->fpregs_active, fpu);
+
 		if (use_eager_fpu()) {
 			preempt_disable();
 			fpu__restore(fpu);
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 9f950917528b..ce768c728f38 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -249,6 +249,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
 	fpu_switch_t fpu_switch;
 
+	trace_printk("prev: %s <-> next: %s\n", prev_p->comm, next_p->comm);
+
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
 	fpu_switch = switch_fpu_prepare(prev_fpu, next_fpu, cpu);

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.