On 2014-06-29 12:53, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
>> On 2014-06-29 12:24, Gleb Natapov wrote:
>>> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
>>>> On 2014-06-29 08:46, Gleb Natapov wrote:
>>>>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>>>>> qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>>> qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
>>>>>>
>>>>>> kvm injects the #PF into the guest.
>>>>>>
>>>>>> qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
>>>>>> qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>>>>> qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>>> qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
>>>>>>
>>>>>> Second #PF at the same address and kvm injects the #DF.
>>>>>>
>>>>>> BUT(!), why?
>>>>>>
>>>>>> I probably am missing something but WTH are we pagefaulting at a
>>>>>> user address in context_switch() while doing a lockdep call, i.e.
>>>>>> spin_release? We're not touching any userspace gunk there AFAICT.
>>>>>>
>>>>>> Is this an async pagefault or so which kvm is doing so that the guest
>>>>>> rip is actually pointing at the wrong place?
>>>>>>
>>>>> There is nothing in the trace that points to an async pagefault as far as I see.
>>>>>
>>>>>> Or something else I'm missing, most probably...
>>>>>>
>>>>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
>>>>> kvm_multiple_exception() to see which two exceptions are combined into #DF.
>>>>>
>>>>
>>>> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
>>>> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
>>>> when patch-disabling the vmport in QEMU.
>>>>
>>>> Let me know if I can help with the analysis.
>>>>
>>> Bisection would be great of course. One thing that is special about
>>> vmport that comes to mind is that it reads vcpu registers out to userspace
>>> and writes them back. IIRC "info registers" does the same. Can you see if
>>> the problem is reproducible with the vmport disabled, but doing
>>> "info registers" in the qemu console? Although the trace does not show any
>>> exits to userspace near the failure...
>>
>> Yes, info registers crashes the guest after a while as well (with a
>> different backtrace due to the different context).
>>
> Oh crap. Bisection would be most helpful. Just to be absolutely sure
> that this is not a QEMU problem: does exactly the same QEMU version work
> with older kernels?

Yes, that was the case last time I tried (I'm on today's git head with
QEMU right now).

Will see what I can do regarding bisecting. That host is a bit slow
(netbook), so it may take a while. Boris will probably beat me to it.

Jan
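
A note on the instrumentation Gleb suggests: the kvmmmu tracepoints can be
enabled through the events/kvmmmu directory under the tracing debugfs mount,
and instrumenting kvm_multiple_exception() amounts to a throw-away
trace_printk() in arch/x86/kvm/x86.c at the point where the pending and the
new exception are promoted to a #DF. A rough sketch follows; it is not a
patch against any particular tree, and the surrounding lines (prev_nr, nr,
the class checks) are only assumed to look roughly as they did in kernels of
that era:

    /*
     * Sketch only: instrument kvm_multiple_exception() to log which
     * pending + new vector pair gets merged into a double fault.
     * Only the trace_printk() is the addition; the rest paraphrases
     * the upstream #DF promotion logic.
     */
    class1 = exception_class(prev_nr);
    class2 = exception_class(nr);
    if ((class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY)
        || (class1 == EXCPT_PF && class2 != EXCPT_BENIGN)) {
            /* ADDED: record the pair that is about to become a #DF */
            trace_printk("#DF from pending vector %u + new vector %u\n",
                         prev_nr, nr);
            /* existing code rewrites the pending exception to DF_VECTOR here */
    }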
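
The vmport suspicion is about the register round-trip through userspace:
QEMU pulls the vcpu registers out of the kernel, inspects or modifies them,
and writes them back, which is also roughly what "info registers" triggers.
Below is a minimal, self-contained sketch of that round-trip using the
KVM_GET_REGS/KVM_SET_REGS ioctls; vcpu_fd stands in for an already created
vcpu file descriptor, and this is an illustration, not QEMU's actual code
path:

    /* Minimal sketch of the vcpu register round-trip, for illustration only. */
    #include <linux/kvm.h>
    #include <sys/ioctl.h>
    #include <stdio.h>

    int sync_vcpu_regs(int vcpu_fd)
    {
            struct kvm_regs regs;

            /* copy the general-purpose register state out of KVM */
            if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0) {
                    perror("KVM_GET_REGS");
                    return -1;
            }

            /* userspace (e.g. a vmport handler) would look at regs.rax etc. here */

            /* write the (possibly unchanged) state back into KVM */
            if (ioctl(vcpu_fd, KVM_SET_REGS, &regs) < 0) {
                    perror("KVM_SET_REGS");
                    return -1;
            }
            return 0;
    }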