From: Andy Lutomirski
To: Stefan Seyfried
Cc: Linus Torvalds, Takashi Iwai, Denys Vlasenko, X86 ML, LKML, Tejun Heo
Date: Wed, 18 Mar 2015 13:51:11 -0700
Subject: Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

On Wed, Mar 18, 2015 at 1:05 PM, Stefan Seyfried wrote:
> Hi Andy,
>
> On 18.03.2015 at 20:26, Andy Lutomirski wrote:
>> Hi Linus-
>>
>> You seem to enjoy debugging these things. Want to give this a shot?
>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
>> right after swapgs in syscall entry.
>>
>> On Wed, Mar 18, 2015 at 12:03 PM, Stefan Seyfried wrote:
>>> Hi all,
>>>
>>> first, I'm kind of happy that I'm not the only one seeing this, and
>>> thus my beloved Thinkpad can stay for a bit longer... :-)
>>>
>>> Then, I'm mostly an amateur when it comes to kernel debugging, so
>>> bear with me when I'm stumbling through the code...
>>>
>>> On 18.03.2015 at 19:03, Andy Lutomirski wrote:
>>>> On Wed, Mar 18, 2015 at 10:46 AM, Takashi Iwai wrote:
>>>>> At Wed, 18 Mar 2015 18:43:52 +0100, Takashi Iwai wrote:
>>>>>> At Wed, 18 Mar 2015 15:16:42 +0100, Takashi Iwai wrote:
>>>>>>> At Sun, 15 Mar 2015 09:17:15 +0100, Stefan Seyfried wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> in 4.0-rc I have recently seen a few crashes, always when running
>>>>>>>> KVM guests (IIRC). Today I was able to capture a crash dump; this
>>>>>>>> is the backtrace from dmesg.txt:
>>>>>>>>
>>>>>>>> [242060.604870] PANIC: double fault, error_code: 0x0
>>>>
>>>> OK, we double faulted. Too bad that x86 CPUs don't tell us why.
>>>>
>>>>>>>> [242060.604878] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>>> [242060.604880] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>>> [242060.604883] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>>> [242060.604885] RIP: 0010:[] [] page_fault+0xd/0x30
>>>>
>>>> The double fault happened during page fault processing. Could you
>>>> disassemble your page_fault function to find the offending
>>>> instruction?
>>>
>>> This one is easy:
>>>
>>> crash> disassemble page_fault
>>> Dump of assembler code for function page_fault:
>>>    0xffffffff816834a0 <+0>:   data32 xchg %ax,%ax
>>>    0xffffffff816834a3 <+3>:   data32 xchg %ax,%ax
>>>    0xffffffff816834a6 <+6>:   data32 xchg %ax,%ax
>>>    0xffffffff816834a9 <+9>:   sub    $0x78,%rsp
>>>    0xffffffff816834ad <+13>:  callq  0xffffffff81683620
>>
>> The callq was the double-faulting instruction, and it is indeed the
>> first instruction in here that would have accessed the stack. (The
>> sub *changes* rsp but isn't a memory access.) So, since RSP is bogus,
>> we page fault, and the page fault is promoted to a double fault. The
>> surprising thing is that the page fault itself seems to have been
>> delivered okay, and RSP wasn't on a page boundary.
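Two details make that scenario concrete. A callq has to push its return
address at [rsp-8], so with RSP pointing at unmapped memory the push
write-faults; and if the CPU then cannot push that page fault's exception
frame onto the same bad stack either, it escalates to a double fault,
which matches the reading above. As for how RSP can be a user pointer at
CPL0 at all: old_rsp and kernel_stack are per-CPU variables that the
64-bit syscall fast path touches %gs-relative right after swapgs, before
any kernel stack exists. Roughly, in C (an illustrative sketch only --
the real code is assembly in arch/x86/kernel/entry_64.S, and
load_kernel_rsp() is a made-up stand-in for the stack switch):

    #include <linux/percpu.h>

    DECLARE_PER_CPU(unsigned long, old_rsp);
    DECLARE_PER_CPU(unsigned long, kernel_stack);

    /* If either %gs-relative access below takes a vmalloc fault, the
     * #PF handler is entered while RSP still holds the *user* stack
     * pointer -- matching the user RSP seen in this oops. */
    static void syscall_entry_sketch(unsigned long user_rsp)
    {
            this_cpu_write(old_rsp, user_rsp);            /* save user RSP */
            load_kernel_rsp(this_cpu_read(kernel_stack)); /* hypothetical  */
    }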
>>
>> You wouldn't happen to be using a Broadwell machine?
>
> No, this is a quite old Thinkpad X200s with a Core2 Duo:
>
> processor  : 1
> vendor_id  : GenuineIntel
> cpu family : 6
> model      : 23
> model name : Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
> stepping   : 10
> microcode  : 0xa0c
>
>> The only way to get here with bogus RSP is if we interrupted something
>> that was previously running at CPL0 with similarly bogus RSP.
>>
>> I don't know if I trust CR2. It's 16 bytes lower than I'd expect.
>>
>>>    0xffffffff816834b2 <+18>:  mov    %rsp,%rdi
>>>    0xffffffff816834b5 <+21>:  mov    0x78(%rsp),%rsi
>>>    0xffffffff816834ba <+26>:  movq   $0xffffffffffffffff,0x78(%rsp)
>>>    0xffffffff816834c3 <+35>:  callq  0xffffffff810504e0
>>>    0xffffffff816834c8 <+40>:  jmpq   0xffffffff816836d0
>>> End of assembler dump.
>>>
>>>>>>>> [242060.604893] RSP: 0018:00007fffa55eafb8 EFLAGS: 00010016
>>>>
>>>> Uh, what? That RSP is a user address.
>>>>
>>>>>>>> [242060.604895] RAX: 000000000000aa40 RBX: 0000000000000001 RCX: ffffffff81682237
>>>>>>>> [242060.604896] RDX: 000000000000aa40 RSI: 0000000000000000 RDI: 00007fffa55eb078
>>>>>>>> [242060.604898] RBP: 00007fffa55f1c1c R08: 0000000000000008 R09: 0000000000000000
>>>>>>>> [242060.604900] R10: 0000000000000000 R11: 0000000000000293 R12: 000000000000004a
>>>>>>>> [242060.604902] R13: 00007ffa356b5d60 R14: 000000000000000f R15: 00007ffa3556cf20
>>>>>>>> [242060.604904] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>>> [242060.604906] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [242060.604908] CR2: 00007fffa55eafa8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>>> [242060.604909] Stack:
>>>>>>>> [242060.604942] BUG: unable to handle kernel paging request at 00007fffa55eafb8
>>>>>>>> [242060.604995] IP: [] show_stack_log_lvl+0x124/0x190
>>>>
>>>> This is suspicious. We must have died, again, of a fatal page fault
>>>> while dumping the stack.
>>>
>>> I posted the same problem to the openSUSE kernel list shortly before
>>> turning to LKML. There, Michal Kubecek noted:
>>>
>>> "I encountered a similar problem recently. The thing is, the x86
>>> specification says that on a double fault, the RIP and RSP registers
>>> are undefined, i.e. you not only can't expect them to contain values
>>> corresponding to the first or second fault, you can't even expect
>>> them to have any usable values at all. Unfortunately, the kernel
>>> double fault handler doesn't take this into account and tries to
>>> display the usual crash information, so it usually crashes itself
>>> when trying to show the stack contents (that's the
>>> show_stack_log_lvl() crash).
>>
>> I think that's not entirely true. RIP is reliable for many classes of
>> double faults, and we rely on that for espfix64. The fact that hpa
>> was willing to write that code strongly suggests that Intel chips at
>> least really do work that way.
>>
>>> The result is a double fault (which itself would be very hard to
>>> debug) followed by a crash in its handler, so that analysing the
>>> outcome is extremely difficult."
>>>
>>> I cannot judge whether this is true, but it sounded relevant to the
>>> problem at hand.
>>
>> The crash in the handler is a separate bug.
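For reference, that handler bug is show_stack_log_lvl() dereferencing the
saved stack pointer without validating it, so a garbage RSP from a double
fault takes down the dumper as well. A defensive dumper would copy each
word with probe_kernel_read(), which fails gracefully on unmapped
addresses. A minimal sketch of that pattern (not the actual fix):

    #include <linux/printk.h>
    #include <linux/uaccess.h>  /* probe_kernel_read() */

    /* Dump 'words' stack slots without trusting 'sp'.  An unmapped
     * address yields "<unreadable>" instead of the recursive oops
     * seen in show_stack_log_lvl() above. */
    static void dump_stack_guarded(unsigned long sp, int words)
    {
            unsigned long val;
            int i;

            for (i = 0; i < words; i++, sp += sizeof(val)) {
                    if (probe_kernel_read(&val, (const void *)sp, sizeof(val))) {
                            pr_cont(" <unreadable %016lx>", sp);
                            break;
                    }
                    pr_cont(" %016lx", val);
            }
            pr_cont("\n");
    }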
>>
>>>>>>>> [242060.605036] PGD 4779a067 PUD 40e3e067 PMD 4769e067 PTE 0
>>>>>>>> [242060.605078] Oops: 0000 [#1] PREEMPT SMP
>>>>>>>> [242060.605106] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache nls_iso8859_1 nls_cp437 vfat fat ppp_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc ses enclosure uas usb_storage cmac algif_hash ctr ccm rfcomm fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet bnep dm_crypt ecb cbc algif_skcipher af_alg xfs libcrc32c snd_hda_codec_conexant snd_hda_codec_generic iTCO_wdt iTCO_vendor_support snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm_oss snd_pcm
>>>>>>>> [242060.605396] dm_mod snd_seq snd_seq_device snd_timer coretemp kvm_intel kvm snd_mixer_oss cdc_ether cdc_wdm cdc_acm usbnet mii arc4 uvcvideo videobuf2_vmalloc videobuf2_memops thinkpad_acpi videobuf2_core btusb v4l2_common videodev i2c_i801 iwldvm bluetooth serio_raw mac80211 pcspkr e1000e iwlwifi snd lpc_ich mei_me ptp mfd_core pps_core mei cfg80211 shpchp wmi soundcore rfkill battery ac tpm_tis tpm acpi_cpufreq i915 xhci_pci xhci_hcd i2c_algo_bit drm_kms_helper drm thermal video button processor sg loop
>>>>>>>> [242060.605396] CPU: 1 PID: 2132 Comm: qemu-system-x86 Tainted: G W 4.0.0-rc3-2.gd5c547f-desktop #1
>>>>>>>> [242060.605396] Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
>>>>>>>> [242060.605396] task: ffff880103f46150 ti: ffff8801013d4000 task.ti: ffff8801013d4000
>>>>>>>> [242060.605396] RIP: 0010:[] [] show_stack_log_lvl+0x124/0x190
>>>>>>>> [242060.605396] RSP: 0018:ffff88023bc84e88 EFLAGS: 00010046
>>>>>>>> [242060.605396] RAX: 00007fffa55eafc0 RBX: 00007fffa55eafb8 RCX: ffff88023bc7ffc0
>>>>>>>> [242060.605396] RDX: 0000000000000000 RSI: ffff88023bc84f58 RDI: 0000000000000000
>>>>>>>> [242060.605396] RBP: ffff88023bc83fc0 R08: ffffffff81a2fe15 R09: 0000000000000020
>>>>>>>> [242060.605396] R10: 0000000000000afb R11: ffff88023bc84bee R12: ffff88023bc84f58
>>>>>>>> [242060.605396] R13: 0000000000000000 R14: ffffffff81a2fe15 R15: 0000000000000000
>>>>>>>> [242060.605396] FS: 00007ffa33dbfa80(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
>>>>>>>> [242060.605396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [242060.605396] CR2: 00007fffa55eafb8 CR3: 0000000002d7e000 CR4: 00000000000427e0
>>>>>>>> [242060.605396] Stack:
>>>>>>>> [242060.605396]  0000000002d7e000 0000000000000008 ffff88023bc84ee8 00007fffa55eafb8
>>>>>>>> [242060.605396]  0000000000000000 ffff88023bc84f58 00007fffa55eafb8 0000000000000040
>>>>>>>> [242060.605396]  00007ffa356b5d60 000000000000000f 00007ffa3556cf20 ffffffff81005c36
>>>>>>>> [242060.605396] Call Trace:
>>>>>>>> [242060.605396]  [] show_regs+0x86/0x210
>>>>>>>> [242060.605396]  [] df_debug+0x1f/0x30
>>>>>>>> [242060.605396]  [] do_double_fault+0x84/0x100
>>>>>>>> [242060.605396]  [] double_fault+0x28/0x30
>>>>>>>> [242060.605396]  [] page_fault+0xd/0x30
>>>>>>>> [242060.605396] Code: fe a2 81 31 c0 89 54 24 08 48 89 0c 24 48 8b 5b f8 e8 cc 06 67 00 48 8b 0c 24 8b 54 24 08 85 d2 74 05 f6 c2 03 74 48 48 8d 43 08 <48> 8b 33 48 c7 c7 0d fe a2 81 89 54 24 14 48 89 4c 24 08 48 89
>>>>>>>> [242060.605396] RIP [] show_stack_log_lvl+0x124/0x190
>>>>>>>> [242060.605396]  RSP
>>>>>>>> [242060.605396] CR2: 00007fffa55eafb8
>>>>>>>>
>>>>>>>> I would not totally rule out a hardware problem, since this machine
>>>>>>>> had another weird crash where the BIOS beeper stayed on constantly
>>>>>>>> until I held the power button for 5 seconds.
>>>>>>>>
>>>>>>>> Unfortunately, I cannot load the crash dump with the crash version
>>>>>>>> in openSUSE Tumbleweed, so the backtrace is all I have for now.
>>>>>>>
>>>>>>> Just a "me too": I'm getting the very same crash all of a sudden
>>>>>>> with the recent 4.0-rc. Judging from the very same pattern (the
>>>>>>> crash usually happened while running KVM (-smp 4) and kernel builds
>>>>>>> with -j8), I don't think it's a hardware problem.
>>>>>>
>>>>>> The git bisection pointed to this commit:
>>>>>>
>>>>>>   commit b926e6f61a26036ee9eabe6761483954d481ad25
>>>>>>   x86, traps: Fix ist_enter from userspace
>>>>>>
>>>>>> And reverting it on top of the latest Linus tree seems to work.
>>>>>> Seife, could you verify on your machine, too?
>>>>>
>>>>> Argh, false positive. Right after I wrote this mail, I got the very
>>>>> same crash. It seems I need to run the test much longer than I
>>>>> thought.
>>>>>
>>>>> But somehow the commits around the one above smell suspicious...
>>>>
>>>> Those commits shouldn't really have affected page fault or double
>>>> fault behavior. They made big changes to MCE, breakpoints, and debug
>>>> exceptions.
>>>>
>>>> Something's very wrong here. I'm guessing that we somehow ended up in
>>>> page_fault in a completely invalid context.
>>>>
>>>> One hairy code path that could plausibly do this is:
>>>>
>>>> 1. syscall
>>>>
>>>> 2. vmalloc fault accessing old_rsp aka rsp_scratch (or kernel_stack --
>>>>    same issue)
>>>>
>>>> 3. page fault. Now we're on the user stack and we do awful things.
>>>>    If we run off the end of the presently writable portion of the
>>>>    stack, we double fault.
>>>
>>> Maybe Michal's idea from above points in the right direction?
>>>
>>> Now that I have a crash dump and the corresponding debuginfo at hand,
>>> might this help somehow to find out where the problem originated?
>>>
>>> I mean -- it's only 508 processes to look at :-) but if I knew what to
>>> look for in their backtraces, I would try to do it.
>>
>> The relevant thread's stack is here (see "ti" in the trace):
>>
>>   ffff8801013d4000
>>
>> It could be interesting to see what's there.
>>
>> I don't suppose you want to try to walk the paging structures to see
>> if ffff88023bc80000 (i.e. gsbase) and, more specifically,
>> ffff88023bc80000 + old_rsp and ffff88023bc80000 + kernel_stack are
>> present? You'd only have to walk one level -- presumably, if the PGD
>> entry is there, the rest of the entries are okay, too.
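The one-level walk amounts to computing the PGD index (virtual-address
bits 47:39) and checking the present bit of that single entry. A
user-space sketch of the arithmetic, using the gsbase from the oops and
the per-CPU offsets (0xa200 and 0xaa48) that crash reports below;
standard 4-level paging assumed:

    #include <stdio.h>

    #define PGDIR_SHIFT  39
    #define PTRS_PER_PGD 512UL

    /* PGD slot covering a virtual address (4-level paging). */
    static unsigned long pgd_index(unsigned long addr)
    {
            return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
    }

    int main(void)
    {
            unsigned long gsbase = 0xffff88023bc80000UL; /* GS base from the oops */
            unsigned long old_rsp = 0xa200;              /* per-CPU offsets, as   */
            unsigned long kernel_stack = 0xaa48;         /* printed by crash      */

            printf("gsbase                -> pgd[%lu]\n", pgd_index(gsbase));
            printf("gsbase + old_rsp      -> pgd[%lu]\n", pgd_index(gsbase + old_rsp));
            printf("gsbase + kernel_stack -> pgd[%lu]\n", pgd_index(gsbase + kernel_stack));
            return 0;
    }

All three addresses land in pgd[272] (they sit in the same 512 GiB
direct-map slot), so reading that one 8-byte entry from the dump answers
the question.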
>
> That's all Greek to me :-)
>
> I see that there is something at ffff88023bc80000:
>
> crash> x /64xg 0xffff88023bc80000
> 0xffff88023bc80000: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80010: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80020: 0x0000000000000000 0x000000006686ada9
> 0xffff88023bc80030: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80040: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80050: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80060: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80070: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80080: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80090: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800a0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800b0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800c0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800d0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800e0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc800f0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80100: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80110: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80120: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80130: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80140: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80150: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80160: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80170: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80180: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc80190: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801a0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801b0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801c0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801d0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801e0: 0x0000000000000000 0x0000000000000000
> 0xffff88023bc801f0: 0x0000000000000000 0x0000000000000000
>
> old_rsp and kernel_stack seem bogus:
>
> crash> print old_rsp
> Cannot access memory at address 0xa200
> gdb: gdb request failed: print old_rsp
> crash> print kernel_stack
> Cannot access memory at address 0xaa48
> gdb: gdb request failed: print kernel_stack
>
> kernel_stack is not a pointer? So 0xffff88023bc80000 + 0xaa48 it is:

Yup. old_rsp and kernel_stack are offsets relative to gsbase.

> crash> x /64xg 0xffff88023bc8aa00
> 0xffff88023bc8aa00: 0x0000000000000000 0x0000000000000000
> [...]

I don't know enough about crashkernel to know whether the fact that
this worked means anything.

Can you dump the page of physical memory at 0x4779a067? That's the
PGD.

--Andy
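One caveat for that last step: "PGD 4779a067" in the oops is the raw
entry value, not a page address. The low 12 bits are flags, so the
physical page holding the PGD is at 0x4779a000 (in crash, something like
"rd -p 0x4779a000" should reach it). Decoding the value under the
standard x86_64 page-table bit layout:

    #include <stdio.h>

    int main(void)
    {
            unsigned long pgd = 0x4779a067UL;  /* "PGD 4779a067" from the oops */

            printf("frame:    %#lx\n", pgd & ~0xfffUL); /* 0x4779a000 */
            printf("present:  %lu\n", (pgd >> 0) & 1);  /* 1 */
            printf("writable: %lu\n", (pgd >> 1) & 1);  /* 1 */
            printf("user:     %lu\n", (pgd >> 2) & 1);  /* 1 */
            printf("accessed: %lu\n", (pgd >> 5) & 1);  /* 1 */
            printf("dirty:    %lu\n", (pgd >> 6) & 1);  /* 1 */
            return 0;
    }

Flags 0x067 (present | writable | user | accessed | dirty) are the normal
pattern for a live table entry, which fits the oops line above: the walk
of the faulting user-stack address found PGD, PUD, and PMD present and
died only at "PTE 0".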