Hi Andi,

0day kernel testing robot got the below dmesg and the first bad commit is

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git x86/fsgs-2
commit b8a868e9ea876a1b40020397305533c095921d7a
Author:     Andi Kleen
AuthorDate: Wed Apr 23 13:26:20 2014 -0700
Commit:     Andi Kleen
CommitDate: Fri Oct 3 15:19:56 2014 -0700

    x86: Add support for rd/wr fs/gs base

    IvyBridge added new instructions to directly write the fs and gs
    64bit base registers. Previously this had to be done with a system
    call to write to MSRs. The main use case is fast user space
    threading and switching the fs/gs registers quickly there. Another
    use case is having (relatively) cheap access to a new address
    register per thread.

    The instructions are opt-in and have to be explicitly enabled
    by the OS.

    Previously Linux couldn't support this because the paranoid
    entry code relied on the gs base never being negative outside
    the kernel to decide when to use swapgs. It would check the gs MSR
    value and assume it was already running in kernel if the value
    was already negative.

    To make this work we have to revamp the paranoid exception
    path to not rely on this. We can use the new instructions
    to get (relatively) quick access to the values.

    This is also significantly faster than an MSR read, so it will
    speed up NMIs (critical for profiling).

    The original patch compared the gs with the kernel gs and
    assumed that if it was the same, swapgs is not needed
    (and no user space processing was needed). This
    was nice and simple and didn't need a lot of changes.

    But this had the side effect that if a user process set its
    GS to the same as the kernel it may lose rescheduling
    checks (so a racing reschedule IPI would have been
    only acted upon at the next non paranoid interrupt).

    This version now switches to full save/restore of the GS.
    This requires quite some changes in the paranoid path.
    Unfortunately I didn't come up with a simpler scheme:

    The kernel gs for the paranoid path is now stored at the
    bottom of the IST stack (so that it can be derived from RSP).
    For this we need to know the size of the IST stack
    (4K or 8K), which is now passed in as a stack parameter
    to save_paranoid.

    Previously we had a flag in EBX that indicated whether
    SWAPGS needs to be called later or not. In the new scheme
    this turns into a tristate, with a new "restore from R15"
    mode. The exit paths are all adjusted to handle this correctly.

    There is one complication: to allow debuggers (especially
    from the int3 or debug vectors) access to the user GS
    we need to save it in the task struct. Normally
    the next context switch would overwrite it with the wrong
    value from kernel_gs, so we set a new flag also in task_struct
    that prevents it. Also, to prevent recursive interrupts
    clobbering this state in the task_struct, this is only done
    for interrupts coming from ring 3.

    After a schedule comes back we check if the flag is still set.
    If it wasn't set the GS is back in the (swapped) kernel
    gs so we revert to the SWAPGS mode, instead of restoring GS.

    Then after these changes we need to also use the new instructions
    to save/restore fs and gs, so that the new values set by the
    users won't disappear. This is also significantly
    faster for the case when the 64bit base has to be switched
    (that is when GS is larger than 4GB), as we can replace
    the slow MSR write with a faster wr[fg]sbase execution.

    The instructions do not context switch
    the segment index, so the old invariant that the fs or gs index
    has to be 0 for a different 64bit value to stick is still
    true. Previously it was enforced by arch_prctl, now the user
    program has to make sure it keeps the segment indexes zero.
    If it doesn't the changes may not stick.
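
For readers following the IST-stack part of the scheme above, here is a minimal sketch of how a slot at the bottom of a stack can be derived from RSP alone. It is not taken from the patch: the helper name ist_stack_bottom is hypothetical, and it assumes the IST stack is naturally aligned to its own size (4K or 8K), which the commit message does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch. Given any stack pointer that
 * lies inside an IST stack of `size` bytes, return the lowest address of
 * that stack. Assumption: the stack is aligned to `size`, and `size` is
 * a power of two (e.g. 4096 or 8192).
 */
static uint64_t ist_stack_bottom(uint64_t rsp, uint64_t size)
{
	/* The masking trick below only works for power-of-two sizes. */
	assert(size != 0 && (size & (size - 1)) == 0);

	/* Clear the offset bits within the stack to get its base. */
	return rsp & ~(size - 1);
}
```

The mask is only valid when the stack is aligned to its size, so the same RSP gives a different base for a 4K stack than for an 8K one. That is consistent with the size having to be passed in to save_paranoid.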

    This in turn enables fast switching when there are
    enough threads that their TLS segment does not fit below 4GB,
    or alternatively programs that use fs as an additional base
    register will not get a significant context switch penalty.

    It is all done in a single patch to avoid bisect crash
    holes.

    v2: Change to save/restore GS instead of using swapgs
    based on the value. Large scale changes.
    Signed-off-by: Andi Kleen

+------------------------------------------+------------+------------+------------+
|                                          | 598d570a05 | b8a868e9ea | 8048975233 |
+------------------------------------------+------------+------------+------------+
| boot_successes                           | 900        | 280        | 79         |
| boot_failures                            | 0          | 20         | 2          |
| PANIC:double_fault,                      | 0          | 12         | 2          |
| Kernel_panic-not_syncing:Machine_halted  | 0          | 11         | 2          |
| BUG:unable_to_handle_kernel              | 0          | 5          |            |
| Oops                                     | 0          | 3          |            |
| RIP:pgd_free                             | 0          | 1          |            |
| BUG:kernel_boot_crashed                  | 0          | 4          |            |
| RIP:show_stack_log_lvl                   | 0          | 1          |            |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 1          |            |
+------------------------------------------+------------+------------+------------+

[    5.087621] Freeing unused kernel memory: 1248K (ffff8800014c8000 - ffff880001600000)
[    5.136856] Freeing unused kernel memory: 1936K (ffff88000181c000 - ffff880001a00000)
[    5.167951] random: init urandom read with 5 bits of entropy available
[   19.307116] PANIC: double fault, error_code: 0xffffffffffffffff
[   19.309941] Kernel panic - not syncing: Machine halted.
[   19.310083] CPU: 1 PID: 150 Comm: trinity-main Not tainted 3.17.0-rc7-00004-gb8a868e #130
[   19.310083] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   19.310083]  0000000000000000 ffff880012707e88 ffffffff814befff ffffffff8174c441
[   19.310083]  ffff880012707f08 ffffffff814bcd7d 0000000000000008 ffff880012707f18
[   19.310083]  ffff880012707eb0 ffffffff81ba8d00 0000000000000046 ffff880010c7ffd8
[   19.310083] Call Trace:
[   19.310083]  <#DF> [] dump_stack+0x4d/0x66
[   19.310083]  [] panic+0xc4/0x1d6
[   19.310083]  [] df_debug+0x2c/0x2c
[   19.310083]  [] do_double_fault+0x62/0x7d
[   19.310083]  [] double_fault+0x2e/0x40
[   19.310083]  [] ? async_page_fault+0xd/0x30
[   19.310083]  <>
[   19.310083] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)

git bisect start 80489752332f4e4f75343d6b539095b366013bc6 fe82dcec644244676d55a1384c958d5f67979adb --
git bisect bad 0e3e2d7a608587a01aaa9eaabd7f75cad56cb8ea # 07:18 19- 6 Merge 'ak/x86/fsgs-2' into devel-lkp-hsx02-x86_64-201410040643
git bisect good 63007bebe851b304d6e2d66fd08307a4fd35cc50 # 08:09 300+ 1 0day base guard for 'devel-lkp-hsx02-x86_64-201410040643'
git bisect bad b8a868e9ea876a1b40020397305533c095921d7a # 08:18 78- 19 x86: Add support for rd/wr fs/gs base
git bisect good a0b0be64599f50dc2c9fa85734026701221f186a # 08:27 300+ 0 x86: Naturally align the debug IST stack
git bisect good 598d570a05cd31500fb15a843a92f68ddb1b3618 # 08:33 300+ 0 x86: Add intrinsics/macros for new rd/wr fs/gs base instructions
# first bad commit: [b8a868e9ea876a1b40020397305533c095921d7a] x86: Add support for rd/wr fs/gs base
git bisect good 598d570a05cd31500fb15a843a92f68ddb1b3618 # 08:38 900+ 0 x86: Add intrinsics/macros for new rd/wr fs/gs base instructions
git bisect bad 80489752332f4e4f75343d6b539095b366013bc6 # 08:38 0- 2 0day head guard for 'devel-lkp-hsx02-x86_64-201410040643'
git bisect good 126d4576cb73c8a440adc37c129589cd66051bcc # 08:43 900+ 0
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect good 2e1d004b9645628c64a2db55ef6b81fadf5e6e91 # 08:55 900+ 0 Add linux-next specific files for 20141003

This script may reproduce the error.

----------------------------------------------------------------------------
#!/bin/bash

kernel=$1
initrd=quantal-core-x86_64.cgz

wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/raw/master/initrd/$initrd

kvm=(
	qemu-system-x86_64
	-cpu kvm64
	-enable-kvm
	-kernel $kernel
	-initrd $initrd
	-m 320
	-smp 2
	-net nic,vlan=1,model=e1000
	-net user,vlan=1
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-rtc base=localtime
	-serial stdio
	-display none
	-monitor null
)

append=(
	hung_task_panic=1
	earlyprintk=ttyS0,115200
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	console=ttyS0,115200
	console=tty0
	vga=normal

	root=/dev/ram0
	rw
	drbd.minor_count=8
)

"${kvm[@]}" --append "${append[*]}"
----------------------------------------------------------------------------

Thanks,
Fengguang