From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751329AbdH1M0J (ORCPT ); Mon, 28 Aug 2017 08:26:09 -0400 Received: from mx2.suse.de ([195.135.220.15]:48374 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1750866AbdH1M0I (ORCPT ); Mon, 28 Aug 2017 08:26:08 -0400 Date: Mon, 28 Aug 2017 14:26:06 +0200 Message-ID: From: Takashi Iwai To: Paolo Bonzini Cc: kvm@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org Subject: 4.13-rc7: WARNING at arch/x86/kvm/mmu.c:717 (and a crash thereafter) User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL/10.8 Emacs/25.2 (x86_64-suse-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, I seem to get a kernel warning when running KVM on Dell desktop with IvyBridge like below. As you can see, a bad page BUG is triggered after that, too. The problem is not triggered always, but it happens occasionally. I haven't seen this on 4.13-rc4 at all, and IIRC, it started happening since rc5. So this might be a regression at rc5. But, as it doesn't happen always, I can't be 100% sure about it, and it's quite difficult to bisect (the test case isn't reliable), unfortunately. Any hint for further debugging this? thanks, Takashi === WARNING: CPU: 2 PID: 7188 at arch/x86/kvm/mmu.c:717 mmu_spte_clear_track_bits+0xe7/0x100 [kvm] CPU: 2 PID: 7188 Comm: qemu-system-x86 Not tainted 4.13.0-rc7-test+ #26 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 task: ffff997fdc1320c0 task.stack: ffffb9394cf70000 RIP: 0010:mmu_spte_clear_track_bits+0xe7/0x100 [kvm] RSP: 0018:ffffb9394cf73ab8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 400000010552cc77 RCX: dead0000000000ff RDX: 0000000000000000 RSI: ffff997f53659108 RDI: ffffe7c684154b00 RBP: ffffb9394cf73ad0 R08: 0000000000000100 R09: 000000000000000c R10: ffff997fc2ba0008 R11: ffff997fc2ba0000 R12: 000000000010552c R13: ffff997edd0f0000 R14: ffff997f69c44348 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff9980de280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd57a77ff8 CR3: 0000000105704000 CR4: 00000000001426e0 Call Trace: drop_spte+0x1a/0xb0 [kvm] mmu_page_zap_pte+0x9d/0xe0 [kvm] kvm_mmu_prepare_zap_page+0x56/0x300 [kvm] kvm_mmu_invalidate_zap_all_pages+0x10d/0x160 [kvm] kvm_arch_flush_shadow_all+0xe/0x10 [kvm] kvm_mmu_notifier_release+0x2c/0x40 [kvm] __mmu_notifier_release+0x44/0xc0 exit_mmap+0x11e/0x130 ? __switch_to+0x216/0x3f0 ? __khugepaged_exit+0x114/0x120 mmput+0x46/0x120 do_exit+0x277/0xb10 ? preempt_schedule_common+0x1f/0x30 ? preempt_schedule+0x27/0x30 ? ___preempt_schedule+0x16/0x18 do_group_exit+0x43/0xb0 get_signal+0x260/0x600 ? get_futex_key+0x294/0x350 do_signal+0x28/0x710 ? recalc_sigpending+0x1b/0x50 exit_to_usermode_loop+0x6e/0xc0 syscall_return_slowpath+0x59/0x60 entry_SYSCALL_64_fastpath+0xa7/0xa9 RIP: 0033:0x7f95aab8f8ec RSP: 002b:00007f959c36c7e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca RAX: fffffffffffffe00 RBX: 000056156743b010 RCX: 00007f95aab8f8ec RDX: 0000000000000002 RSI: 0000000000000080 RDI: 000056156626c080 RBP: 0000000000000000 R08: 000056156626c080 R09: 000056156626c000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000f R13: 000056156626c080 R14: 0000000000000000 R15: 000056156743b03c Code: 4f 04 00 48 85 c0 75 1c 4c 89 e7 e8 54 4d fe ff 48 8b 05 9d 4f 04 00 48 85 c0 74 c1 48 85 c3 0f 95 c3 eb bf 48 85 c3 74 e7 eb dd <0f> ff eb 9e 4c 89 e7 e8 2d 4d fe ff eb a4 90 66 2e 0f 1f 84 00 ---[ end trace dd04b99ed8787467 ]--- BUG: Bad page state in process python pfn:10552c page:ffffe7c684154b00 count:0 mapcount:0 mapping: (null) index:0x1 flags: 0x2ffff800000014(referenced|dirty) raw: 002ffff800000014 0000000000000000 0000000000000001 00000000ffffffff raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set bad because of flags: 0x14(referenced|dirty) CPU: 7 PID: 9171 Comm: python Tainted: G W 4.13.0-rc7-test+ #26 Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A12 01/10/2013 Call Trace: dump_stack+0x63/0x8d bad_page+0xcb/0x120 check_new_page_bad+0x67/0x80 get_page_from_freelist+0x9d1/0xb00 __alloc_pages_nodemask+0xce/0x230 alloc_pages_vma+0x88/0x1f0 __handle_mm_fault+0x7c9/0x1000 handle_mm_fault+0xde/0x1e0 __do_page_fault+0x23b/0x4f0 do_page_fault+0x22/0x30 page_fault+0x28/0x30 RIP: 0033:0x7f4a2761c2c3 RSP: 002b:00007fffbe1b2988 EFLAGS: 00010206 RAX: 0000557a87b725b0 RBX: 00000000000000e8 RCX: 0000557a87b75000 RDX: 0000557a87b78a00 RSI: 00000000000000ba RDI: 0000557a87b725b0 RBP: 00007f4a279362a8 R08: 0000000000000003 R09: 000000000000007c R10: 00000000000001c1 R11: 0000000000000077 R12: 00007f4a27935b38 R13: 0000000000006460 R14: 0000557a87b725b0 R15: 00007f4a27935ae0 Disabling lock debugging due to kernel taint