From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753007AbaF2Kxq (ORCPT ); Sun, 29 Jun 2014 06:53:46 -0400
Received: from mail-wi0-f172.google.com ([209.85.212.172]:38358
	"EHLO mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752817AbaF2Kxp (ORCPT );
	Sun, 29 Jun 2014 06:53:45 -0400
Date: Sun, 29 Jun 2014 13:53:40 +0300
From: Gleb Natapov
To: Jan Kiszka
Cc: Borislav Petkov , Paolo Bonzini , lkml , Peter Zijlstra ,
	Steven Rostedt , x86-ml , kvm@vger.kernel.org, Jörg Rödel
Subject: Re: __schedule #DF splat
Message-ID: <20140629105339.GF18167@minantech.com>
References: <20140627101831.GB23153@pd.tnic> <53AD586A.40900@redhat.com>
	<20140627115545.GC23153@pd.tnic> <53AD5D27.2090505@redhat.com>
	<20140627121053.GD23153@pd.tnic> <20140628114431.GB4373@pd.tnic>
	<20140629064626.GD18167@minantech.com> <53AFE2B3.5080300@web.de>
	<20140629102403.GE18167@minantech.com> <53AFEB16.5040608@web.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53AFEB16.5040608@web.de>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
> On 2014-06-29 12:24, Gleb Natapov wrote:
> > On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> >> On 2014-06-29 08:46, Gleb Natapov wrote:
> >>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>>> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
> >>>>
> >>>> kvm injects the #PF into the guest.
> >>>>
> >>>> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
> >>>> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>>> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
> >>>>
> >>>> Second #PF at the same address and kvm injects the #DF.
> >>>>
> >>>> BUT(!), why?
> >>>>
> >>>> I probably am missing something but WTH are we pagefaulting at a
> >>>> user address in context_switch() while doing a lockdep call, i.e.
> >>>> spin_release? We're not touching any userspace gunk there AFAICT.
> >>>>
> >>>> Is this an async pagefault or so which kvm is doing so that the guest
> >>>> rip is actually pointing at the wrong place?
> >>>>
> >>> There is nothing in the trace that points to an async pagefault as far as I see.
> >>>
> >>>> Or something else I'm missing, most probably...
> >>>>
> >>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> >>> kvm_multiple_exception() to see which two exceptions are combined into #DF.
> >>>
> >>
> >> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> >> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> >> when patch-disabling the vmport in QEMU.
> >>
> >> Let me know if I can help with the analysis.
> >>
> > Bisection would be great of course. One thing that is special about
> > vmport that comes to mind is that it reads vcpu registers to userspace and
> > writes them back. IIRC "info registers" does the same. Can you see if the
> > problem is reproducible with disabled vmport, but doing "info registers"
> > in qemu console? Although the trace does not show any exits to userspace
> > near the failure...
> >
> Yes, info registers crashes the guest after a while as well (with
> different backtrace due to different context).
>
Oh crap. Bisection would be most helpful.
Just to be absolutely sure that this is not a QEMU problem: does exactly
the same QEMU version work with older kernels?

--
			Gleb.
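
Footnote on the #PF + #PF -> #DF step in the quoted trace: below is a
minimal sketch of the architectural double-fault rule, assuming the
exception classes described in the Intel SDM (benign, contributory, page
fault). It is not KVM's code. The names classify() and escalates_to_df()
are hypothetical and exist only for illustration; the real decision is
made in kvm_multiple_exception(), which is the function suggested above
for instrumentation. A fault raised while delivering #DF itself (triple
fault) is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural x86 exception vector numbers used in this sketch. */
enum {
    VEC_DE = 0,   /* divide error */
    VEC_DF = 8,   /* double fault */
    VEC_TS = 10,  /* invalid TSS */
    VEC_NP = 11,  /* segment not present */
    VEC_SS = 12,  /* stack-segment fault */
    VEC_GP = 13,  /* general protection */
    VEC_PF = 14   /* page fault */
};

/* The three classes the double-fault rule distinguishes. */
enum exc_class { EXC_BENIGN, EXC_CONTRIBUTORY, EXC_PAGE_FAULT };

/* Hypothetical helper: map a vector to its class. */
static enum exc_class classify(unsigned int vector)
{
    assert(vector < 32);  /* only architectural exception vectors */

    switch (vector) {
    case VEC_PF:
        return EXC_PAGE_FAULT;
    case VEC_DE:
    case VEC_TS:
    case VEC_NP:
    case VEC_SS:
    case VEC_GP:
        return EXC_CONTRIBUTORY;
    default:
        return EXC_BENIGN;
    }
}

/*
 * Hypothetical helper: true if raising 'second' while 'first' is still
 * being delivered escalates to #DF instead of being handled serially.
 */
static bool escalates_to_df(unsigned int first, unsigned int second)
{
    enum exc_class c1 = classify(first);
    enum exc_class c2 = classify(second);

    /* contributory followed by contributory */
    if (c1 == EXC_CONTRIBUTORY && c2 == EXC_CONTRIBUTORY)
        return true;

    /* page fault followed by page fault or contributory */
    if (c1 == EXC_PAGE_FAULT && c2 != EXC_BENIGN)
        return true;

    return false;
}
```

With these assumptions, escalates_to_df(VEC_PF, VEC_PF) is true, which
matches the sequence in the trace: a second #PF arrives while the first
is still pending, and a #DF is injected. The sketch says nothing about
why the second #PF happens at that address in the first place.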