* __schedule #DF splat
@ 2014-06-25 15:32 Borislav Petkov
  2014-06-25 20:26 ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-25 15:32 UTC (permalink / raw)
  To: lkml; +Cc: Peter Zijlstra, Steven Rostedt, x86-ml, kvm

Hi guys,

so I'm looking at this splat below when booting current linus+tip/master
in a kvm guest. Initially I thought this was something related to the
PARAVIRT gunk but it happens with and without it.

So, from what I can see, we first #DF and then lockdep fires a deadlock
warning. That I can understand but what I can't understand is why we #DF
with this RIP:

[    2.744062] RIP: 0010:[<ffffffff816139df>]  [<ffffffff816139df>] __schedule+0x28f/0xab0

disassembling this points to

	/*
	 * Since the runqueue lock will be released by the next
	 * task (which is an invalid locking op but in the case
	 * of the scheduler it's an obvious special-case), so we
	 * do an early lockdep release here:
	 */
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif

this call in context_switch() (provided this RIP is correct, of course).
(btw, various dumps are at the end of this mail with the "<---- faulting"
marker).

And that's lock_release() in lockdep.c.
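
For context, the surrounding code in context_switch() looks roughly like
this (paraphrased from kernel/sched/core.c of this era and heavily trimmed,
so not a verbatim quote):

	static inline void
	context_switch(struct rq *rq, struct task_struct *prev,
		       struct task_struct *next)
	{
		struct mm_struct *mm = next->mm, *oldmm = prev->active_mm;

		...
		if (!mm) {				/* kernel thread */
			next->active_mm = oldmm;
			atomic_inc(&oldmm->mm_count);
			enter_lazy_tlb(oldmm, next);
		} else
			switch_mm(oldmm, mm, next);	/* does the %cr3 write */

		if (!prev->mm) {			/* the cmpq on prev->mm in the asm */
			prev->active_mm = NULL;
			rq->prev_mm = oldmm;
		}
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
		/* the early lockdep release of rq->lock - the call we fault on */
		spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif
		switch_to(prev, next, prev);		/* swaps kernel stacks */
		...
	}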

What's also interesting is that we have two __schedule calls on the stack
before #DF:

[    2.744062]  [<ffffffff816139ce>] ? __schedule+0x27e/0xab0
[    2.744062]  [<ffffffff816139df>] ? __schedule+0x28f/0xab0

The show_stack_log_lvl() oops I'm attributing to the userspace stack not
being mapped while we're trying to walk it (we do have a %cr3 write
shortly before the RIP we're faulting at), which is another snafu and
shouldn't happen, i.e., we should detect that and not walk it or
whatever...

Anyway, this is what I can see - any and all suggestions on how to debug
this further are appreciated. More info available upon request.

Thanks.

[    1.932807] devtmpfs: mounted
[    1.938324] Freeing unused kernel memory: 2872K (ffffffff819ad000 - ffffffff81c7b000)
[    2.450824] udevd[814]: starting version 175
[    2.743648] PANIC: double fault, error_code: 0x0
[    2.743657] 
[    2.744062] ======================================================
[    2.744062] [ INFO: possible circular locking dependency detected ]
[    2.744062] 3.16.0-rc2+ #2 Not tainted
[    2.744062] -------------------------------------------------------
[    2.744062] vmmouse_detect/957 is trying to acquire lock:
[    2.744062]  ((console_sem).lock){-.....}, at: [<ffffffff81092dcd>] down_trylock+0x1d/0x50
[    2.744062] 
[    2.744062] but task is already holding lock:
[    2.744062]  (&rq->lock){-.-.-.}, at: [<ffffffff8161382f>] __schedule+0xdf/0xab0
[    2.744062] 
[    2.744062] which lock already depends on the new lock.
[    2.744062] 
[    2.744062] 
[    2.744062] the existing dependency chain (in reverse order) is:
[    2.744062] 
-> #2 (&rq->lock){-.-.-.}:
[    2.744062]        [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[    2.744062]        [<ffffffff81619111>] _raw_spin_lock+0x41/0x80
[    2.744062]        [<ffffffff8108090b>] wake_up_new_task+0xbb/0x290
[    2.744062]        [<ffffffff8104e847>] do_fork+0x147/0x770
[    2.744062]        [<ffffffff8104ee96>] kernel_thread+0x26/0x30
[    2.744062]        [<ffffffff8160e282>] rest_init+0x22/0x140
[    2.744062]        [<ffffffff81b82e3e>] start_kernel+0x408/0x415
[    2.744062]        [<ffffffff81b82463>] x86_64_start_reservations+0x2a/0x2c
[    2.744062]        [<ffffffff81b8255b>] x86_64_start_kernel+0xf6/0xf9
[    2.744062] 
-> #1 (&p->pi_lock){-.-.-.}:
[    2.744062]        [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[    2.744062]        [<ffffffff81619333>] _raw_spin_lock_irqsave+0x53/0x90
[    2.744062]        [<ffffffff810803b1>] try_to_wake_up+0x31/0x450
[    2.744062]        [<ffffffff810807f3>] wake_up_process+0x23/0x40
[    2.744062]        [<ffffffff816177ff>] __up.isra.0+0x1f/0x30
[    2.744062]        [<ffffffff81092fc1>] up+0x41/0x50
[    2.744062]        [<ffffffff810ac7b8>] console_unlock+0x258/0x490
[    2.744062]        [<ffffffff810acc81>] vprintk_emit+0x291/0x610
[    2.744062]        [<ffffffff8161185c>] printk+0x4f/0x57
[    2.744062]        [<ffffffff81486ad1>] input_register_device+0x401/0x4d0
[    2.744062]        [<ffffffff814909b4>] atkbd_connect+0x2b4/0x2e0
[    2.744062]        [<ffffffff81481a3b>] serio_connect_driver+0x3b/0x60
[    2.744062]        [<ffffffff81481a80>] serio_driver_probe+0x20/0x30
[    2.744062]        [<ffffffff813cd8e5>] really_probe+0x75/0x230
[    2.744062]        [<ffffffff813cdbc1>] __driver_attach+0xb1/0xc0
[    2.744062]        [<ffffffff813cb97b>] bus_for_each_dev+0x6b/0xb0
[    2.744062]        [<ffffffff813cd43e>] driver_attach+0x1e/0x20
[    2.744062]        [<ffffffff81482ded>] serio_handle_event+0x14d/0x1f0
[    2.744062]        [<ffffffff8106c9d7>] process_one_work+0x1c7/0x680
[    2.744062]        [<ffffffff8106d77b>] worker_thread+0x6b/0x540
[    2.744062]        [<ffffffff81072ec8>] kthread+0x108/0x120
[    2.744062]        [<ffffffff8161a3ac>] ret_from_fork+0x7c/0xb0
[    2.744062] 
-> #0 ((console_sem).lock){-.....}:
[    2.744062]        [<ffffffff8109b564>] __lock_acquire+0x1f14/0x2290
[    2.744062]        [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[    2.744062]        [<ffffffff81619333>] _raw_spin_lock_irqsave+0x53/0x90
[    2.744062]        [<ffffffff81092dcd>] down_trylock+0x1d/0x50
[    2.744062]        [<ffffffff810ac2ae>] console_trylock+0x1e/0xb0
[    2.744062]        [<ffffffff810acc63>] vprintk_emit+0x273/0x610
[    2.744062]        [<ffffffff8161185c>] printk+0x4f/0x57
[    2.744062]        [<ffffffff8103d10b>] df_debug+0x1b/0x40
[    2.744062]        [<ffffffff81003d1d>] do_double_fault+0x5d/0x80
[    2.744062]        [<ffffffff8161bf87>] double_fault+0x27/0x30
[    2.744062] 
[    2.744062] other info that might help us debug this:
[    2.744062] 
[    2.744062] Chain exists of:
  (console_sem).lock --> &p->pi_lock --> &rq->lock

[    2.744062]  Possible unsafe locking scenario:
[    2.744062] 
[    2.744062]        CPU0                    CPU1
[    2.744062]        ----                    ----
[    2.744062]   lock(&rq->lock);
[    2.744062]                                lock(&p->pi_lock);
[    2.744062]                                lock(&rq->lock);
[    2.744062]   lock((console_sem).lock);
[    2.744062] 
[    2.744062]  *** DEADLOCK ***
[    2.744062] 
[    2.744062] 1 lock held by vmmouse_detect/957:
[    2.744062]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8161382f>] __schedule+0xdf/0xab0
[    2.744062] 
[    2.744062] stack backtrace:
[    2.744062] CPU: 0 PID: 957 Comm: vmmouse_detect Not tainted 3.16.0-rc2+ #2
[    2.744062] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    2.744062]  ffffffff823f00a0 ffff88007c205c50 ffffffff8161206f ffffffff823f2d30
[    2.744062]  ffff88007c205c90 ffffffff81095b3b ffffffff827f4980 ffff88007aab9ad8
[    2.744062]  ffff88007aab93a8 ffff88007aab9370 0000000000000001 ffff88007aab9aa0
[    2.744062] Call Trace:
[    2.744062]  <#DF>  [<ffffffff8161206f>] dump_stack+0x4e/0x7a
[    2.744062]  [<ffffffff81095b3b>] print_circular_bug+0x1fb/0x330
[    2.744062]  [<ffffffff8109b564>] __lock_acquire+0x1f14/0x2290
[    2.744062]  [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[    2.744062]  [<ffffffff81092dcd>] ? down_trylock+0x1d/0x50
[    2.744062]  [<ffffffff81619333>] _raw_spin_lock_irqsave+0x53/0x90
[    2.744062]  [<ffffffff81092dcd>] ? down_trylock+0x1d/0x50
[    2.744062]  [<ffffffff810acc63>] ? vprintk_emit+0x273/0x610
[    2.744062]  [<ffffffff81092dcd>] down_trylock+0x1d/0x50
[    2.744062]  [<ffffffff810acc63>] ? vprintk_emit+0x273/0x610
[    2.744062]  [<ffffffff810ac2ae>] console_trylock+0x1e/0xb0
[    2.744062]  [<ffffffff810acc63>] vprintk_emit+0x273/0x610
[    2.744062]  [<ffffffff8161185c>] printk+0x4f/0x57
[    2.744062]  [<ffffffff8103d10b>] df_debug+0x1b/0x40
[    2.744062]  [<ffffffff81003d1d>] do_double_fault+0x5d/0x80
[    2.744062]  [<ffffffff8161bf87>] double_fault+0x27/0x30
[    2.744062]  [<ffffffff816139ce>] ? __schedule+0x27e/0xab0
[    2.744062]  [<ffffffff816139df>] ? __schedule+0x28f/0xab0
[    2.744062]  <<EOE>>  <UNK> 
[    2.744062] CPU: 0 PID: 957 Comm: vmmouse_detect Not tainted 3.16.0-rc2+ #2
[    2.744062] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    2.744062] task: ffff88007aab9370 ti: ffff88007abb8000 task.ti: ffff88007abb8000
[    2.744062] RIP: 0010:[<ffffffff816139df>]  [<ffffffff816139df>] __schedule+0x28f/0xab0
[    2.744062] RSP: 002b:00007fffb47a8730  EFLAGS: 00013086
[    2.744062] RAX: 000000007b4b2000 RBX: ffff88007b0cb200 RCX: 0000000000000028
[    2.744062] RDX: ffffffff816139ce RSI: 0000000000000001 RDI: ffff88007c3d3a18
[    2.744062] RBP: 00007fffb47a8820 R08: 0000000000000000 R09: 0000000000002dd4
[    2.744062] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c3d3a00
[    2.744062] R13: ffff880079c24a00 R14: 0000000000000000 R15: ffff88007aab9370
[    2.744062] FS:  00007f48052ad700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[    2.744062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.744062] CR2: 00007fffb47a8728 CR3: 000000007b4b2000 CR4: 00000000000006f0
[    2.744062] Stack:
[    2.744062] BUG: unable to handle kernel paging request at 00007fffb47a8730
[    2.744062] IP: [<ffffffff81005a4c>] show_stack_log_lvl+0x11c/0x1d0
[    2.744062] PGD 7b210067 PUD 0 
[    2.744062] Oops: 0000 [#1] PREEMPT SMP 
[    2.744062] Modules linked in:
[    2.744062] CPU: 0 PID: 957 Comm: vmmouse_detect Not tainted 3.16.0-rc2+ #2
[    2.744062] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    2.744062] task: ffff88007aab9370 ti: ffff88007abb8000 task.ti: ffff88007abb8000
[    2.744062] RIP: 0010:[<ffffffff81005a4c>]  [<ffffffff81005a4c>] show_stack_log_lvl+0x11c/0x1d0
[    2.744062] RSP: 002b:ffff88007c205e58  EFLAGS: 00013046
[    2.744062] RAX: 00007fffb47a8738 RBX: 0000000000000000 RCX: ffff88007c203fc0
[    2.744062] RDX: 00007fffb47a8730 RSI: ffff88007c200000 RDI: ffffffff8184e0ea
[    2.744062] RBP: ffff88007c205ea8 R08: ffff88007c1fffc0 R09: 0000000000000000
[    2.744062] R10: 000000007c200000 R11: 0000000000000000 R12: ffff88007c205f58
[    2.744062] R13: 0000000000000000 R14: ffffffff8181b584 R15: 0000000000000000
[    2.744062] FS:  00007f48052ad700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[    2.744062] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.744062] CR2: 00007fffb47a8730 CR3: 000000007b4b2000 CR4: 00000000000006f0
[    2.744062] Stack:
[    2.744062]  0000000000000008 ffff88007c205eb8 ffff88007c205e70 000000007b4b2000
[    2.744062]  00007fffb47a8730 ffff88007c205f58 00007fffb47a8730 0000000000000040
[    2.744062]  0000000000000ac0 ffff88007aab9370 ffff88007c205f08 ffffffff81005ba0
[    2.744062] Call Trace:
[    2.744062]  <#DF> 
[    2.744062]  [<ffffffff81005ba0>] show_regs+0xa0/0x280
[    2.744062]  [<ffffffff8103d113>] df_debug+0x23/0x40
[    2.744062]  [<ffffffff81003d1d>] do_double_fault+0x5d/0x80
[    2.744062]  [<ffffffff8161bf87>] double_fault+0x27/0x30
[    2.744062]  [<ffffffff816139ce>] ? __schedule+0x27e/0xab0
[    2.744062]  [<ffffffff816139df>] ? __schedule+0x28f/0xab0
[    2.744062]  <<EOE>> 
[    2.744062]  <UNK> Code: 7a ff ff ff 0f 1f 00 e8 d3 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 7c b5 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 
[    2.744062] RIP  [<ffffffff81005a4c>] show_stack_log_lvl+0x11c/0x1d0
[    2.744062]  RSP <ffff88007c205e58>
[    2.744062] CR2: 00007fffb47a8730
[    2.744062] ---[ end trace 5cdf016839902dea ]---
[    2.744062] note: vmmouse_detect[957] exited with preempt_count 3



     247:       48 8b 7b 40             mov    0x40(%rbx),%rdi
     24b:       e8 00 00 00 00          callq  250 <__schedule+0x250>
     250:       0f 22 d8                mov    %rax,%cr3
     253:       f0 4d 0f b3 b5 88 03    lock btr %r14,0x388(%r13)
     25a:       00 00 
     25c:       4c 8b b3 90 03 00 00    mov    0x390(%rbx),%r14
     263:       4d 39 b5 90 03 00 00    cmp    %r14,0x390(%r13)
     26a:       0f 85 38 06 00 00       jne    8a8 <__schedule+0x8a8>
     270:       49 83 bf 88 02 00 00    cmpq   $0x0,0x288(%r15)
     277:       00 
     278:       0f 84 9a 03 00 00       je     618 <__schedule+0x618>
     27e:       49 8d 7c 24 18          lea    0x18(%r12),%rdi
     283:       48 c7 c2 00 00 00 00    mov    $0x0,%rdx
     28a:       be 01 00 00 00          mov    $0x1,%esi
     28f:       e8 00 00 00 00          callq  294 <__schedule+0x294>		<--- faulting
     294:       48 8b 74 24 18          mov    0x18(%rsp),%rsi
     299:       4c 89 ff                mov    %r15,%rdi
     29c:       9c                      pushfq 
     29d:       55                      push   %rbp
     29e:       48 89 f5                mov    %rsi,%rbp
     2a1:       48 89 a7 f0 04 00 00    mov    %rsp,0x4f0(%rdi)
     2a8:       48 8b a6 f0 04 00 00    mov    0x4f0(%rsi),%rsp
     2af:       e8 00 00 00 00          callq  2b4 <__schedule+0x2b4>
     2b4:       65 48 8b 34 25 00 00    mov    %gs:0x0,%rsi


ffffffff81613997:       48 8b 7b 40             mov    0x40(%rbx),%rdi
ffffffff8161399b:       e8 50 28 a3 ff          callq  ffffffff810461f0 <__phys_addr>
ffffffff816139a0:       0f 22 d8                mov    %rax,%cr3
ffffffff816139a3:       f0 4d 0f b3 b5 88 03    lock btr %r14,0x388(%r13)
ffffffff816139aa:       00 00 
ffffffff816139ac:       4c 8b b3 90 03 00 00    mov    0x390(%rbx),%r14
ffffffff816139b3:       4d 39 b5 90 03 00 00    cmp    %r14,0x390(%r13)
ffffffff816139ba:       0f 85 38 06 00 00       jne    ffffffff81613ff8 <__schedule+0x8a8>
ffffffff816139c0:       49 83 bf 88 02 00 00    cmpq   $0x0,0x288(%r15)
ffffffff816139c7:       00 
ffffffff816139c8:       0f 84 9a 03 00 00       je     ffffffff81613d68 <__schedule+0x618>
ffffffff816139ce:       49 8d 7c 24 18          lea    0x18(%r12),%rdi
ffffffff816139d3:       48 c7 c2 ce 39 61 81    mov    $0xffffffff816139ce,%rdx
ffffffff816139da:       be 01 00 00 00          mov    $0x1,%esi
ffffffff816139df:       e8 bc 82 a8 ff          callq  ffffffff8109bca0 <lock_release>		<--- faulting
ffffffff816139e4:       48 8b 74 24 18          mov    0x18(%rsp),%rsi
ffffffff816139e9:       4c 89 ff                mov    %r15,%rdi
ffffffff816139ec:       9c                      pushfq 
ffffffff816139ed:       55                      push   %rbp
ffffffff816139ee:       48 89 f5                mov    %rsi,%rbp
ffffffff816139f1:       48 89 a7 f0 04 00 00    mov    %rsp,0x4f0(%rdi)
ffffffff816139f8:       48 8b a6 f0 04 00 00    mov    0x4f0(%rsi),%rsp
ffffffff816139ff:       e8 cc d9 9e ff          callq  ffffffff810013d0 <__switch_to>
ffffffff81613a04:       65 48 8b 34 25 00 b9    mov    %gs:0xb900,%rsi


# 80 "./arch/x86/include/asm/bitops.h" 1
	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; bts %r14,904(%rbx)	# D.63059, MEM[(volatile long int *)_201]
# 0 "" 2
#NO_APP
	movq	64(%rbx), %rdi	# mm_193->pgd, mm_193->pgd
	call	__phys_addr	#
#APP
# 54 "./arch/x86/include/asm/special_insns.h" 1
	mov %rax,%cr3	# D.63056
# 0 "" 2
# 117 "./arch/x86/include/asm/bitops.h" 1
	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; btr %r14,904(%r13)	# D.63059, MEM[(volatile long int *)_215]
# 0 "" 2
#NO_APP
	movq	912(%rbx), %r14	# mm_193->context.ldt, D.63062
	cmpq	%r14, 912(%r13)	# D.63062, oldmm_194->context.ldt
	jne	.L2117	#,
.L2023:
	cmpq	$0, 648(%r15)	#, prev_21->mm
	je	.L2118	#,
.L2029:
	leaq	24(%r12), %rdi	#, D.63079
	movq	$.L2029, %rdx	#,
	movl	$1, %esi	#,
	call	lock_release	#				<---faulting
	movq	24(%rsp), %rsi	# %sfp, D.63067
	movq	%r15, %rdi	# prev, prev
#APP
# 2338 "kernel/sched/core.c" 1
	pushf ; pushq %rbp ; movq %rsi,%rbp
	movq %rsp,1264(%rdi)	#, prev
	movq 1264(%rsi),%rsp	#, D.63067
	call __switch_to
	movq %gs:current_task,%rsi	# current_task
	movq 760(%rsi),%r8	#
	movq %r8,%gs:irq_stack_union+40	# irq_stack_union.D.4635.stack_canary
	movq 8(%rsi),%r8	#
	movq %rax,%rdi
	testl  $262144,16(%r8)	#,
	jnz   ret_from_fork
	movq %rbp,%rsi ; popq %rbp ; popf	
# 0 "" 2
#NO_APP
	movq	%rax, 24(%rsp)	# prev, %sfp
	call	debug_smp_processor_id	#
	movl	%eax, %eax	# D.63055, D.63055
	movq	$runqueues, %rbx	#, __ptr
	movq	24(%rsp), %rsi	# %sfp, prev
	movq	%rbx, %rdi	# __ptr, D.63056
	addq	__per_cpu_offset(,%rax,8), %rdi	# __per_cpu_offset, D.63056
	movq	$runqueues, %r12	#, __ptr
	call	finish_task_switch	#
	call	debug_smp_processor_id	#

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: __schedule #DF splat
  2014-06-25 15:32 __schedule #DF splat Borislav Petkov
@ 2014-06-25 20:26 ` Borislav Petkov
  2014-06-27 10:18   ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-25 20:26 UTC (permalink / raw)
  To: lkml, Paolo Bonzini; +Cc: Peter Zijlstra, Steven Rostedt, x86-ml, kvm

On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote:
> Hi guys,
> 
> so I'm looking at this splat below when booting current linus+tip/master
> in a kvm guest. Initially I thought this was something related to the
> PARAVIRT gunk but it happens with and without it.

Ok, here's a cleaner splat. I went and rebuilt qemu to latest master
from today to rule out some breakage there but it still fires.

Paolo, any ideas why kvm+qemu would trigger a #DF in the guest? I guess
I should dust off my old kvm/qemu #DF debugging patch I had somewhere...

I did try to avoid the invalid stack issue by doing:

---
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 1abcb50b48ae..dd8e0eec071e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -286,7 +286,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
                }
                if (i && ((i % STACKSLOTS_PER_LINE) == 0))
                        pr_cont("\n");
-               pr_cont(" %016lx", *stack++);
+               pr_cont(" %016lx", (((unsigned long)stack <= 0x00007fffffffffffUL) ? -1 : *stack++));
                touch_nmi_watchdog();
        }
        preempt_enable();
---

but that didn't work either - see second splat at the end.
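
For completeness, a different way to guard that dereference would be to copy
each word out safely and stop the walk on failure, instead of fudging the
printed value - a rough sketch only, untested, and assuming
probe_kernel_address() is usable from the #DF/oops path ('word' would need
declaring at the top of the function):

-               pr_cont(" %016lx", *stack++);
+               if (probe_kernel_address(stack, word)) {
+                       pr_cont(" (unreadable)");
+                       break;
+               }
+               pr_cont(" %016lx", word);
+               stack++;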

[    2.704184] PANIC: double fault, error_code: 0x0
[    2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[    2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000
[    2.708132] RIP: 0010:[<ffffffff8161130f>]  [<ffffffff8161130f>] __schedule+0x28f/0xab0
[    2.708132] RSP: 002b:00007fff99e51100  EFLAGS: 00013082
[    2.708132] RAX: 000000007b206000 RBX: ffff88007b526f80 RCX: 0000000000000028
[    2.708132] RDX: ffffffff816112fe RSI: 0000000000000001 RDI: ffff88007c5d3c58
[    2.708132] RBP: 00007fff99e511f0 R08: 0000000000000000 R09: 0000000000000000
[    2.708132] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c5d3c40
[    2.708132] R13: ffff880079c84e40 R14: 0000000000000000 R15: ffff880079c78000
[    2.708132] FS:  00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[    2.708132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.708132] CR2: 00007fff99e510f8 CR3: 000000007b206000 CR4: 00000000000006e0
[    2.708132] Stack:
[    2.708132] BUG: unable to handle kernel paging request at 00007fff99e51100
[    2.708132] IP: [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    2.708132] PGD 7b20d067 PUD 0 
[    2.708132] Oops: 0000 [#1] PREEMPT SMP 
[    2.708132] Modules linked in:
[    2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[    2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000
[    2.708132] RIP: 0010:[<ffffffff81005bbc>]  [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    2.708132] RSP: 002b:ffff88007c405e58  EFLAGS: 00013046
[    2.708132] RAX: 00007fff99e51108 RBX: 0000000000000000 RCX: ffff88007c403fc0
[    2.708132] RDX: 00007fff99e51100 RSI: ffff88007c400000 RDI: ffffffff81846aba
[    2.708132] RBP: ffff88007c405ea8 R08: ffff88007c3fffc0 R09: 0000000000000000
[    2.708132] R10: 000000007c400000 R11: 0000000000000000 R12: ffff88007c405f58
[    2.708132] R13: 0000000000000000 R14: ffffffff818136fc R15: 0000000000000000
[    2.708132] FS:  00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[    2.708132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.708132] CR2: 00007fff99e51100 CR3: 000000007b206000 CR4: 00000000000006e0
[    2.708132] Stack:
[    2.708132]  0000000000000008 ffff88007c405eb8 ffff88007c405e70 000000007b206000
[    2.708132]  00007fff99e51100 ffff88007c405f58 00007fff99e51100 0000000000000040
[    2.708132]  0000000000000ac0 ffff880079c78000 ffff88007c405f08 ffffffff81005d10
[    2.708132] Call Trace:
[    2.708132]  <#DF> 
[    2.708132]  [<ffffffff81005d10>] show_regs+0xa0/0x280
[    2.708132]  [<ffffffff8103d143>] df_debug+0x23/0x40
[    2.708132]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.708132]  [<ffffffff816194c7>] double_fault+0x27/0x30
[    2.708132]  [<ffffffff816112fe>] ? __schedule+0x27e/0xab0
[    2.708132]  [<ffffffff8161130f>] ? __schedule+0x28f/0xab0
[    2.708132]  <<EOE>> 
[    2.708132]  <UNK> Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 
[    2.708132] RIP  [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    2.708132]  RSP <ffff88007c405e58>
[    2.708132] CR2: 00007fff99e51100
[    2.708132] ---[ end trace 749cd02c31c493a0 ]---
[    2.708132] note: vmmouse_detect[959] exited with preempt_count 3





[    1.730726] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
[    1.737392] devtmpfs: mounted
[    1.748817] Freeing unused kernel memory: 2872K (ffffffff819a9000 - ffffffff81c77000)
[    2.249240] udevd[812]: starting version 175
[    2.563876] PANIC: double fault, error_code: 0x0
[    2.563885] 
[    2.564051] ======================================================
[    2.564051] [ INFO: possible circular locking dependency detected ]
[    2.575059] 3.15.0+ #8 Not tainted
[    2.575059] -------------------------------------------------------
[    2.575059] vmmouse_detect/960 is trying to acquire lock:
[    2.575059]  ((console_sem).lock){-.....}, at: [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[    2.575059] 
[    2.575059] but task is already holding lock:
[    2.575059]  (&rq->lock){-.-.-.}, at: [<ffffffff8161118f>] __schedule+0xdf/0xab0
[    2.575059] 
[    2.575059] which lock already depends on the new lock.
[    2.575059] 
[    2.575059] 
[    2.575059] the existing dependency chain (in reverse order) is:
[    2.575059] 
-> #2 (&rq->lock){-.-.-.}:
[    2.575059]        [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]        [<ffffffff816160e1>] _raw_spin_lock+0x41/0x80
[    2.575059]        [<ffffffff8108ab3b>] wake_up_new_task+0xbb/0x290
[    2.575059]        [<ffffffff8104e887>] do_fork+0x147/0x770
[    2.575059]        [<ffffffff8104eed6>] kernel_thread+0x26/0x30
[    2.575059]        [<ffffffff8160b572>] rest_init+0x22/0x140
[    2.575059]        [<ffffffff81b7ee3e>] start_kernel+0x408/0x415
[    2.575059]        [<ffffffff81b7e463>] x86_64_start_reservations+0x2a/0x2c
[    2.575059]        [<ffffffff81b7e55b>] x86_64_start_kernel+0xf6/0xf9
[    2.575059] 
-> #1 (&p->pi_lock){-.-.-.}:
[    2.575059]        [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]        [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[    2.575059]        [<ffffffff8108a70c>] try_to_wake_up+0x3c/0x330
[    2.575059]        [<ffffffff8108aa23>] wake_up_process+0x23/0x40
[    2.575059]        [<ffffffff816151af>] __up.isra.0+0x1f/0x30
[    2.575059]        [<ffffffff8109d1d1>] up+0x41/0x50
[    2.575059]        [<ffffffff810b5608>] console_unlock+0x258/0x490
[    2.575059]        [<ffffffff810b5ad1>] vprintk_emit+0x291/0x610
[    2.575059]        [<ffffffff8160ebf7>] printk_emit+0x33/0x3b
[    2.575059]        [<ffffffff810b5fd4>] devkmsg_writev+0x154/0x1d0
[    2.575059]        [<ffffffff8116d77a>] do_sync_write+0x5a/0x90
[    2.575059]        [<ffffffff8116df25>] vfs_write+0x175/0x1c0
[    2.575059]        [<ffffffff8116e982>] SyS_write+0x52/0xc0
[    2.575059]        [<ffffffff81617ce6>] system_call_fastpath+0x1a/0x1f
[    2.575059] 
-> #0 ((console_sem).lock){-.....}:
[    2.575059]        [<ffffffff810a5754>] __lock_acquire+0x1f14/0x2290
[    2.575059]        [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]        [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[    2.575059]        [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[    2.575059]        [<ffffffff810b50fe>] console_trylock+0x1e/0xb0
[    2.575059]        [<ffffffff810b5ab3>] vprintk_emit+0x273/0x610
[    2.575059]        [<ffffffff8160ec4e>] printk+0x4f/0x57
[    2.575059]        [<ffffffff8103d16b>] df_debug+0x1b/0x40
[    2.575059]        [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.575059]        [<ffffffff81619507>] double_fault+0x27/0x30
[    2.575059] 
[    2.575059] other info that might help us debug this:
[    2.575059] 
[    2.575059] Chain exists of:
  (console_sem).lock --> &p->pi_lock --> &rq->lock

[    2.575059]  Possible unsafe locking scenario:
[    2.575059] 
[    2.575059]        CPU0                    CPU1
[    2.575059]        ----                    ----
[    2.575059]   lock(&rq->lock);
[    2.575059]                                lock(&p->pi_lock);
[    2.575059]                                lock(&rq->lock);
[    2.575059]   lock((console_sem).lock);
[    2.575059] 
[    2.575059]  *** DEADLOCK ***
[    2.575059] 
[    2.575059] 1 lock held by vmmouse_detect/960:
[    2.575059]  #0:  (&rq->lock){-.-.-.}, at: [<ffffffff8161118f>] __schedule+0xdf/0xab0
[    2.575059] 
[    2.575059] stack backtrace:
[    2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[    2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.575059]  ffffffff823ef810 ffff88007c205c50 ffffffff8160f461 ffffffff823f22b0
[    2.575059]  ffff88007c205c90 ffffffff8109fd2b ffffffff82802180 ffff880079ca2e48
[    2.575059]  ffff880079ca2718 ffff880079ca26e0 0000000000000001 ffff880079ca2e10
[    2.575059] Call Trace:
[    2.575059]  <#DF>  [<ffffffff8160f461>] dump_stack+0x4e/0x7a
[    2.575059]  [<ffffffff8109fd2b>] print_circular_bug+0x1fb/0x330
[    2.575059]  [<ffffffff810a5754>] __lock_acquire+0x1f14/0x2290
[    2.575059]  [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[    2.575059]  [<ffffffff8109cfdd>] ? down_trylock+0x1d/0x50
[    2.575059]  [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[    2.575059]  [<ffffffff8109cfdd>] ? down_trylock+0x1d/0x50
[    2.575059]  [<ffffffff810b5ab3>] ? vprintk_emit+0x273/0x610
[    2.575059]  [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[    2.575059]  [<ffffffff810b5ab3>] ? vprintk_emit+0x273/0x610
[    2.575059]  [<ffffffff810b50fe>] console_trylock+0x1e/0xb0
[    2.575059]  [<ffffffff810b5ab3>] vprintk_emit+0x273/0x610
[    2.575059]  [<ffffffff8160ec4e>] printk+0x4f/0x57
[    2.575059]  [<ffffffff8103d16b>] df_debug+0x1b/0x40
[    2.575059]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.575059]  [<ffffffff81619507>] double_fault+0x27/0x30
[    2.575059]  [<ffffffff8161132e>] ? __schedule+0x27e/0xab0
[    2.575059]  [<ffffffff8161133f>] ? __schedule+0x28f/0xab0
[    2.575059]  <<EOE>>  <UNK> 
[    2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[    2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.575059] task: ffff880079ca26e0 ti: ffff880079d04000 task.ti: ffff880079d04000
[    2.575059] RIP: 0010:[<ffffffff8161133f>]  [<ffffffff8161133f>] __schedule+0x28f/0xab0
[    2.575059] RSP: 002b:00007fff70516420  EFLAGS: 00013086
[    2.575059] RAX: 000000007ae81000 RBX: ffff88007be67900 RCX: 0000000000000028
[    2.575059] RDX: ffffffff8161132e RSI: 0000000000000001 RDI: ffff88007c3d3c58
[    2.575059] RBP: 00007fff70516510 R08: 0000000000000000 R09: 0000000000005c00
[    2.575059] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c3d3c40
[    2.575059] R13: ffff88007b634000 R14: 0000000000000000 R15: ffff880079ca26e0
[    2.575059] FS:  00007f77c13d6700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[    2.575059] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.575059] CR2: 00007fff70516418 CR3: 000000007ae81000 CR4: 00000000000006f0
[    2.575059] Stack:
[    2.575059]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    2.575059]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    2.575059]  ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[    2.575059] Call Trace:
[    2.575059]  <UNK> 
[    2.575059] Code: 39 b5 80 03 00 00 0f 85 38 06 00 00 49 83 bf 88 02 00 00 00 0f 84 9a 03 00 00 49 8d 7c 24 18 48 c7 c2 2e 13 61 81 be 01 00 00 00 <e8> 4c 4b a9 ff 48 8b 74 24 18 4c 89 ff 9c 55 48 89 f5 48 89 a7 
[    2.575059] Kernel panic - not syncing: Machine halted.
[    2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[    2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    2.575059]  ffff88007c205f18 ffff88007c205e88 ffffffff8160f461 ffffffff81817b42
[    2.575059]  ffff88007c205f08 ffffffff8160ded3 0000000000000008 ffff88007c205f18
[    2.575059]  ffff88007c205eb0 ffffffff81005cfb ffffffff81616531 0000000080000002
[    2.575059] Call Trace:
[    2.575059]  <#DF>  [<ffffffff8160f461>] dump_stack+0x4e/0x7a
[    2.575059]  [<ffffffff8160ded3>] panic+0xc5/0x1e1
[    2.575059]  [<ffffffff81005cfb>] ? show_regs+0x5b/0x280
[    2.575059]  [<ffffffff81616531>] ? _raw_spin_unlock_irqrestore+0x41/0x90
[    2.575059]  [<ffffffff8103d181>] df_debug+0x31/0x40
[    2.575059]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    2.575059]  [<ffffffff81619507>] double_fault+0x27/0x30
[    2.575059]  [<ffffffff8161132e>] ? __schedule+0x27e/0xab0
[    2.575059]  [<ffffffff8161133f>] ? __schedule+0x28f/0xab0
[    2.575059]  <<EOE>>  <UNK> 
[    2.575059] Shutting down cpus with NMI
[    2.575059] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[    2.575059] ---[ end Kernel panic - not syncing: Machine halted.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: __schedule #DF splat
  2014-06-25 20:26 ` Borislav Petkov
@ 2014-06-27 10:18   ` Borislav Petkov
  2014-06-27 11:41     ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-27 10:18 UTC (permalink / raw)
  To: lkml, Paolo Bonzini
  Cc: Peter Zijlstra, Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Wed, Jun 25, 2014 at 10:26:50PM +0200, Borislav Petkov wrote:
> On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote:
> > Hi guys,
> > 
> > so I'm looking at this splat below when booting current linus+tip/master
> > in a kvm guest. Initially I thought this was something related to the
> > PARAVIRT gunk but it happens with and without it.
> 
> Ok, here's a cleaner splat. I went and rebuilt qemu to latest master
> from today to rule out some breakage there but it still fires.

Ok, another observation: I was using a qemu built from source the other day:

v2.0.0-1806-g2b5b7ae917e8

Switching back to the installed one:

$ qemu-system-x86_64 --version
QEMU emulator version 1.7.1 (Debian 1.7.0+dfsg-6), Copyright (c) 2003-2008 Fabrice Bellard

fixes the issue.

Joerg says I should bisect but I'm busy with other stuff. If people are
interested in chasing this further, I could free up some time to do
so...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: __schedule #DF splat
  2014-06-27 10:18   ` Borislav Petkov
@ 2014-06-27 11:41     ` Paolo Bonzini
  2014-06-27 11:55       ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2014-06-27 11:41 UTC (permalink / raw)
  To: Borislav Petkov, lkml
  Cc: Peter Zijlstra, Steven Rostedt, x86-ml, kvm, Jörg Rödel

Il 27/06/2014 12:18, Borislav Petkov ha scritto:
> Joerg says I should bisect but I'm busy with other stuff. If people are
> interested in chasing this further, I could free up some time to do
> so...

Please first try "-M pc-1.7" on the 2.0 QEMU.  If it fails, please do 
bisect it.  A QEMU bisection between one release only usually takes only 
half an hour or so for me.

I use

../configure --target-list=x86_64-softmmu && make distclean &&
../configure --target-list=x86_64-softmmu &&
make -j8 subdir-x86_64-softmmu

Until it's below 50 commits.  After that point just "make -j8 
subdir-x86_64-softmmu" should do.  This ensures that build system 
changes do not bite you as you move back and forth in time.

Thanks!

Paolo


* Re: __schedule #DF splat
  2014-06-27 11:41     ` Paolo Bonzini
@ 2014-06-27 11:55       ` Borislav Petkov
  2014-06-27 12:01         ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-27 11:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Fri, Jun 27, 2014 at 01:41:30PM +0200, Paolo Bonzini wrote:
> Il 27/06/2014 12:18, Borislav Petkov ha scritto:
> >Joerg says I should bisect but I'm busy with other stuff. If people are
> >interested in chasing this further, I could free up some time to do
> >so...
> 
> Please first try "-M pc-1.7" on the 2.0 QEMU.  If it fails, please do bisect
> it.  A QEMU bisection between one release only usually takes only half an
> hour or so for me.
> 
> I use
> 
> ../configure --target-list=x86_64-softmmu && make distclean &&
> ../configure --target-list=x86_64-softmmu &&
> make -j8 subdir-x86_64-softmmu
> 
> Until it's below 50 commits.  After that point just "make -j8
> subdir-x86_64-softmmu" should do.  This ensures that build system changes do
> not bite you as you move back and forth in time.

Ok, thanks for the help.

However, as it always happens, right after sending the mail, I triggered
it with the qemu installed on the system too :-( I.e., qemu 1.7.1.

So we will try to debug the #DF first to find out why kvm is injecting
it in the first place. I'll keep you posted.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: __schedule #DF splat
  2014-06-27 11:55       ` Borislav Petkov
@ 2014-06-27 12:01         ` Paolo Bonzini
  2014-06-27 12:10           ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2014-06-27 12:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm, Jörg Rödel

Il 27/06/2014 13:55, Borislav Petkov ha scritto:
> On Fri, Jun 27, 2014 at 01:41:30PM +0200, Paolo Bonzini wrote:
>> Il 27/06/2014 12:18, Borislav Petkov ha scritto:
>>> Joerg says I should bisect but I'm busy with other stuff. If people are
>>> interested in chasing this further, I could free up some time to do
>>> so...
>>
>> Please first try "-M pc-1.7" on the 2.0 QEMU.  If it fails, please do bisect
>> it.  A QEMU bisection between one release only usually takes only half an
>> hour or so for me.
>>
>> I use
>>
>> ../configure --target-list=x86_64-softmmu && make distclean &&
>> ../configure --target-list=x86_64-softmmu &&
>> make -j8 subdir-x86_64-softmmu
>>
>> Until it's below 50 commits.  After that point just "make -j8
>> subdir-x86_64-softmmu" should do.  This ensures that build system changes do
>> not bite you as you move back and forth in time.
>
> Ok, thanks for the help.
>
> However, as it always happens, right after sending the mail, I triggered
> it with the qemu installed on the system too :-( I.e., qemu 1.7.1.

:)

Can you try gathering a trace? (and since those things get huge, you can 
send it to me offlist)  Also try without ept and see what happens.

Also, perhaps you can bisect between Linus's tree and tip?

And finally, what is the host kernel?

Paolo



* Re: __schedule #DF splat
  2014-06-27 12:01         ` Paolo Bonzini
@ 2014-06-27 12:10           ` Borislav Petkov
  2014-06-28 11:44             ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-27 12:10 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Fri, Jun 27, 2014 at 02:01:43PM +0200, Paolo Bonzini wrote:
> Can you try gathering a trace? (and since those things get huge, you
> can send it to me offlist) Also try without ept and see what happens.

Yeah, Joerg just sent me a diff on how to intercept #DF. I'll add a
tracepoint so that it all goes into the trace together.

> Also, perhaps you can bisect between Linus's tree and tip?

Yep, that's next if we don't get smart from the #DF trace.

> And finally, what is the host kernel?

3.16-rc2+ - "+" is tip/master from the last weekend with a couple of
patches to arch/x86/ from me which should be unrelated (yeah, we've
heard that before :-)).

Thanks for the suggestions, I'm going for a run now but after I get
back, debugging session starts with host and guest rebuilt afresh. :-)

Stay tuned.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: __schedule #DF splat
  2014-06-27 12:10           ` Borislav Petkov
@ 2014-06-28 11:44             ` Borislav Petkov
  2014-06-29  6:46               ` Gleb Natapov
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-28 11:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm, Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 10208 bytes --]

Ok, I rebuilt the host kernel with latest linus+tip/master and my queue.
The guest kernel is v3.15-8992-g08f7cc749389 with a bunch of RAS
patches. Before I start doing the coarse-grained bisection by testing
-rcs and major numbers, I wanted to catch a #DF and try to analyze at
least why it happens. And from what I'm seeing, it looks insane.

Ok, so kvm_amd.ko is loaded with npt=0 so that I can see the pagefaults
in the trace.

All TPs in events/kvm/ are enabled. The df tracepoint is
straightforward, attached.

However, with npt=0 this #DF TP doesn't get hit. I can still see the #DF,
though, and here's what it looks like (qemu is latest from git):

So let's comment on what I'm seeing:

...
 qemu-system-x86-20240 [006] d..2  9406.484041: kvm_entry: vcpu 1
 qemu-system-x86-20240 [006] d..2  9406.484042: kvm_exit: reason PF excp rip 0xffffffff8103be46 info b ffffffffff5fd380
 qemu-system-x86-20240 [006] ...1  9406.484042: kvm_page_fault: address ffffffffff5fd380 error_code b
 qemu-system-x86-20240 [006] ...1  9406.484044: kvm_emulate_insn: 0:ffffffff8103be46:89 b7 00 d0 5f ff (prot64)
 qemu-system-x86-20240 [006] ...1  9406.484044: vcpu_match_mmio: gva 0xffffffffff5fd380 gpa 0xfee00380 Write GVA
 qemu-system-x86-20240 [006] ...1  9406.484044: kvm_mmio: mmio write len 4 gpa 0xfee00380 val 0x39884
 qemu-system-x86-20240 [006] ...1  9406.484045: kvm_apic: apic_write APIC_TMICT = 0x39884
 qemu-system-x86-20239 [004] d..2  9406.484046: kvm_entry: vcpu 0
 qemu-system-x86-20240 [006] d..2  9406.484048: kvm_entry: vcpu 1
 qemu-system-x86-20239 [004] d..2  9406.484052: kvm_exit: reason PF excp rip 0xffffffff812da4ff info 0 1188808

this rip is

ffffffff812da4e0 <__get_user_8>:
...
ffffffff812da4ff:       48 8b 50 f9             mov    -0x7(%rax),%rdx

so we're basically pagefaulting when doing get_user and the user address is 1188808.

And that looks ok: this value is exitinfo2, where SVM puts the faulting
address on a #PF exception intercept.
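
For reference, that's roughly where those two numbers come from on the KVM
side - the SVM #PF intercept handler pulls the error code and the faulting
address out of the VMCB exit-info fields (paraphrased from arch/x86/kvm/svm.c,
not a verbatim quote):

	static int pf_interception(struct vcpu_svm *svm)
	{
		/* exit_info_2 = faulting address, exit_info_1 = #PF error code */
		u64 fault_address = svm->vmcb->control.exit_info_2;
		u32 error_code    = svm->vmcb->control.exit_info_1;

		...
		return kvm_mmu_page_fault(&svm->vcpu, fault_address, error_code,
					  NULL, 0);
	}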

 qemu-system-x86-20239 [004] ...1  9406.484053: kvm_page_fault: address 1188808 error_code 0
 qemu-system-x86-20240 [006] d..2  9406.484055: kvm_exit: reason write_cr3 rip 0xffffffff816112d0 info 8000000000000000 0

This is interesting: cpu1 switches address spaces; looks like we're
in context_switch(), i.e. consistent with the guest rip pointing to
__schedule+0x28f below.

I say "interesting" because this bug feels like we're trying to access
the user process' memory which is gone by the time we do so. Hmm, just a
gut feeling though.

 qemu-system-x86-20239 [004] d..2  9406.484059: kvm_entry: vcpu 0
 qemu-system-x86-20239 [004] d..2  9406.484060: kvm_exit: reason PF excp rip 0xffffffff812da4ff info 0 1188808
 qemu-system-x86-20239 [004] ...1  9406.484061: kvm_page_fault: address 1188808 error_code 0

 Now here's where it gets interesting:

 qemu-system-x86-20239 [004] d..2  9406.484131: kvm_entry: vcpu 0
 qemu-system-x86-20240 [006] d..2  9406.484132: kvm_entry: vcpu 1
 qemu-system-x86-20240 [006] d..2  9406.484133: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318

We're pagefaulting on a user address 7fffb62ba318 at guest rip
0xffffffff8161130f which is:

ffffffff816112da:       00 00 
ffffffff816112dc:       4c 8b b3 80 03 00 00    mov    0x380(%rbx),%r14
ffffffff816112e3:       4d 39 b5 80 03 00 00    cmp    %r14,0x380(%r13)
ffffffff816112ea:       0f 85 38 06 00 00       jne    ffffffff81611928 <__schedule+0x8a8>
ffffffff816112f0:       49 83 bf 88 02 00 00    cmpq   $0x0,0x288(%r15)
ffffffff816112f7:       00 
ffffffff816112f8:       0f 84 9a 03 00 00       je     ffffffff81611698 <__schedule+0x618>
ffffffff816112fe:       49 8d 7c 24 18          lea    0x18(%r12),%rdi
ffffffff81611303:       48 c7 c2 fe 12 61 81    mov    $0xffffffff816112fe,%rdx
ffffffff8161130a:       be 01 00 00 00          mov    $0x1,%esi
ffffffff8161130f:       e8 4c 4b a9 ff          callq  ffffffff810a5e60 <lock_release>			<---
ffffffff81611314:       48 8b 74 24 18          mov    0x18(%rsp),%rsi
ffffffff81611319:       4c 89 ff                mov    %r15,%rdi
ffffffff8161131c:       9c                      pushfq 
ffffffff8161131d:       55                      push   %rbp

which, if I'm not mistaken, is this here in context_switch():

#ifndef __ARCH_WANT_UNLOCKED_CTXSW
	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif

Related annotated asm:

#APP
# 54 "/w/kernel/linux-2.6/arch/x86/include/asm/special_insns.h" 1
	mov %rax,%cr3	# D.62668
# 0 "" 2
# 117 "/w/kernel/linux-2.6/arch/x86/include/asm/bitops.h" 1
	.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
	lock; btr %r14,888(%r13)	# D.62671, MEM[(volatile long int *)_215]
# 0 "" 2
#NO_APP
	movq	896(%rbx), %r14	# mm_193->context.ldt, D.62674
	cmpq	%r14, 896(%r13)	# D.62674, oldmm_194->context.ldt
	jne	.L2019	#,
.L1925:
	cmpq	$0, 648(%r15)	#, prev_21->mm			<--- that's the "if (!prev->mm)" test
	je	.L2020	#,
.L1931:
	leaq	24(%r12), %rdi	#, D.62691
	movq	$.L1931, %rdx	#,
	movl	$1, %esi	#,
	call	lock_release	#				<---- the call to spin_release
	movq	24(%rsp), %rsi	# %sfp, D.62679
	movq	%r15, %rdi	# prev, prev
#APP
# 2307 "kernel/sched/core.c" 1

so it basically is the same as what we saw before.

 qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
 qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a

kvm injects the #PF into the guest.

 qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
 qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
 qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
 qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)

Second #PF at the same address and kvm injects the #DF.

BUT(!), why?

I probably am missing something but WTH are we pagefaulting at a
user address in context_switch() while doing a lockdep call, i.e.
spin_release? We're not touching any userspace gunk there AFAICT.

Is this an async pagefault or so which kvm is doing so that the guest
rip is actually pointing at the wrong place?

Or something else I'm missing, most probably...

In any case I'll try to repro with the latest kernel in the guest too.

Here's the splat shown in the guest:

[    3.130253] random: nonblocking pool is initialized
[    3.700333] PANIC: double fault, error_code: 0x0
[    3.704212] CPU: 1 PID: 911 Comm: vmmouse_detect Not tainted 3.15.0+ #1
[    3.704212] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    3.704212] task: ffff88007b4e4dc0 ti: ffff88007aa08000 task.ti: ffff88007aa08000
[    3.704212] RIP: 0010:[<ffffffff8161130f>]  [<ffffffff8161130f>] __schedule+0x28f/0xab0
[    3.704212] RSP: 002b:00007fffb62ba320  EFLAGS: 00013082
[    3.704212] RAX: 000000007b75b000 RBX: ffff88007b5b8980 RCX: 0000000000000028
[    3.704212] RDX: ffffffff816112fe RSI: 0000000000000001 RDI: ffff88007c5d3c58
[    3.704212] RBP: 00007fffb62ba410 R08: ffff88007bdd3ac9 R09: 0000000000000000
[    3.704212] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c5d3c40
[    3.704212] R13: ffff88007b5bb440 R14: 0000000000000000 R15: ffff88007b4e4dc0
[    3.704212] FS:  00007fa1eec0f700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[    3.704212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.704212] CR2: 00007fffb62ba318 CR3: 000000007b75b000 CR4: 00000000000006e0
[    3.704212] Stack:
[    3.704212] BUG: unable to handle kernel paging request at 00007fffb62ba320
[    3.704212] IP: [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    3.704212] PGD 7b3ab067 PUD 0 
[    3.704212] Oops: 0000 [#1] PREEMPT SMP 
[    3.704212] Modules linked in:
[    3.704212] CPU: 1 PID: 911 Comm: vmmouse_detect Not tainted 3.15.0+ #1
[    3.704212] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[    3.704212] task: ffff88007b4e4dc0 ti: ffff88007aa08000 task.ti: ffff88007aa08000
[    3.704212] RIP: 0010:[<ffffffff81005bbc>]  [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    3.704212] RSP: 002b:ffff88007c405e58  EFLAGS: 00013046
[    3.704212] RAX: 00007fffb62ba328 RBX: 0000000000000000 RCX: ffff88007c403fc0
[    3.704212] RDX: 00007fffb62ba320 RSI: ffff88007c400000 RDI: ffffffff81846aba
[    3.704212] RBP: ffff88007c405ea8 R08: ffff88007c3fffc0 R09: 0000000000000000
[    3.704212] R10: 000000007c400000 R11: 0000000000000000 R12: ffff88007c405f58
[    3.704212] R13: 0000000000000000 R14: ffffffff818136fc R15: 0000000000000000
[    3.704212] FS:  00007fa1eec0f700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[    3.704212] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.704212] CR2: 00007fffb62ba320 CR3: 000000007b75b000 CR4: 00000000000006e0
[    3.704212] Stack:
[    3.704212]  0000000000000008 ffff88007c405eb8 ffff88007c405e70 000000007b75b000
[    3.704212]  00007fffb62ba320 ffff88007c405f58 00007fffb62ba320 0000000000000040
[    3.704212]  0000000000000ac0 ffff88007b4e4dc0 ffff88007c405f08 ffffffff81005d10
[    3.704212] Call Trace:
[    3.704212]  <#DF> 
[    3.704212]  [<ffffffff81005d10>] show_regs+0xa0/0x280
[    3.704212]  [<ffffffff8103d143>] df_debug+0x23/0x40
[    3.704212]  [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[    3.704212]  [<ffffffff816194c7>] double_fault+0x27/0x30
[    3.704212]  [<ffffffff816112fe>] ? __schedule+0x27e/0xab0
[    3.704212]  [<ffffffff8161130f>] ? __schedule+0x28f/0xab0
[    3.704212]  <<EOE>> 
[    3.704212]  <UNK> Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7 
[    3.704212] RIP  [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[    3.704212]  RSP <ffff88007c405e58>
[    3.704212] CR2: 00007fffb62ba320
[    3.704212] ---[ end trace 85735a6f8b08ee31 ]---
[    3.704212] note: vmmouse_detect[911] exited with preempt_count 3

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

[-- Attachment #2: 0001-kvm-svm-Intercept-DF-on-AMD.patch --]
[-- Type: text/x-diff, Size: 2547 bytes --]

>From b2e6dd5168373feb7548da5521efc40c58409567 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp@suse.de>
Date: Fri, 27 Jun 2014 20:22:05 +0200
Subject: [PATCH] kvm, svm: Intercept #DF on AMD

Thanks Joro for the initial patch.

Originally-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kvm/svm.c   |  9 +++++++++
 arch/x86/kvm/trace.h | 15 +++++++++++++++
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c5cfea..30a83f219aa5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	set_exception_intercept(svm, PF_VECTOR);
 	set_exception_intercept(svm, UD_VECTOR);
 	set_exception_intercept(svm, MC_VECTOR);
+	set_exception_intercept(svm, DF_VECTOR);
 
 	set_intercept(svm, INTERCEPT_INTR);
 	set_intercept(svm, INTERCEPT_NMI);
@@ -1784,6 +1785,13 @@ static int ud_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
+static int df_interception(struct vcpu_svm *svm)
+{
+	trace_kvm_df(kvm_rip_read(&svm->vcpu));
+
+	return 1;
+}
+
 static void svm_fpu_activate(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -3324,6 +3332,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_EXCP_BASE + PF_VECTOR]	= pf_interception,
 	[SVM_EXIT_EXCP_BASE + NM_VECTOR]	= nm_interception,
 	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
+	[SVM_EXIT_EXCP_BASE + DF_VECTOR]	= df_interception,
 	[SVM_EXIT_INTR]				= intr_interception,
 	[SVM_EXIT_NMI]				= nmi_interception,
 	[SVM_EXIT_SMI]				= nop_on_interception,
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 33574c95220d..8ac01d218443 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -88,6 +88,21 @@ TRACE_EVENT(kvm_hv_hypercall,
 		  __entry->outgpa)
 );
 
+TRACE_EVENT(kvm_df,
+	TP_PROTO(unsigned long rip),
+	TP_ARGS(rip),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	rip	)
+	),
+
+	TP_fast_assign(
+		__entry->rip	= rip;
+	),
+
+	TP_printk("rip: 0x%lx", __entry->rip)
+);
+
 /*
  * Tracepoint for PIO.
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a02578c0d..9e6056dcdaea 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7576,3 +7576,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_df);
-- 
2.0.0



* Re: __schedule #DF splat
  2014-06-28 11:44             ` Borislav Petkov
@ 2014-06-29  6:46               ` Gleb Natapov
  2014-06-29  9:56                 ` Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Gleb Natapov @ 2014-06-29  6:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm,
	Jörg Rödel

On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
> 
> kvm injects the #PF into the guest.
> 
>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
> 
> Second #PF at the same address and kvm injects the #DF.
> 
> BUT(!), why?
> 
> I probably am missing something but WTH are we pagefaulting at a
> user address in context_switch() while doing a lockdep call, i.e.
> spin_release? We're not touching any userspace gunk there AFAICT.
> 
> Is this an async pagefault or so which kvm is doing so that the guest
> rip is actually pointing at the wrong place?
> 
There is nothing in the trace that points to async pagefault as far as I see.

> Or something else I'm missing, most probably...
> 
Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
kvm_multiple_exception() to see which two exceptions are combined into #DF.
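
Something like this, for example - just a sketch against current
arch/x86/kvm/x86.c, the exact signature and fields may differ:

	/* debugging hack, not for merging */
	static void kvm_multiple_exception(struct kvm_vcpu *vcpu,
					   unsigned nr, bool has_error,
					   u32 error_code, bool reinject)
	{
		if (vcpu->arch.exception.pending)
			trace_printk("pending #%u + new #%u (err 0x%x)\n",
				     vcpu->arch.exception.nr, nr, error_code);
		...
	}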

--
			Gleb.


* Re: __schedule #DF splat
  2014-06-29  6:46               ` Gleb Natapov
@ 2014-06-29  9:56                 ` Jan Kiszka
  2014-06-29 10:24                   ` Gleb Natapov
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29  9:56 UTC (permalink / raw)
  To: Gleb Natapov, Borislav Petkov
  Cc: Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm,
	Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On 2014-06-29 08:46, Gleb Natapov wrote:
> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
>>
>> kvm injects the #PF into the guest.
>>
>>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
>>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
>>
>> Second #PF at the same address and kvm injects the #DF.
>>
>> BUT(!), why?
>>
>> I probably am missing something but WTH are we pagefaulting at a
>> user address in context_switch() while doing a lockdep call, i.e.
>> spin_release? We're not touching any userspace gunk there AFAICT.
>>
>> Is this an async pagefault or so which kvm is doing so that the guest
>> rip is actually pointing at the wrong place?
>>
> There is nothing in the trace that points to async pagefault as far as I see.
> 
>> Or something else I'm missing, most probably...
>>
> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> kvm_multiple_exception() to see which two exceptions are combined into #DF.
> 

FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
when patch-disabling the vmport in QEMU.

Let me know if I can help with the analysis.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]


* Re: __schedule #DF splat
  2014-06-29  9:56                 ` Jan Kiszka
@ 2014-06-29 10:24                   ` Gleb Natapov
  2014-06-29 10:31                     ` Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Gleb Natapov @ 2014-06-29 10:24 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Borislav Petkov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> On 2014-06-29 08:46, Gleb Natapov wrote:
> > On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
> >>
> >> kvm injects the #PF into the guest.
> >>
> >>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
> >>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
> >>
> >> Second #PF at the same address and kvm injects the #DF.
> >>
> >> BUT(!), why?
> >>
> >> I probably am missing something but WTH are we pagefaulting at a
> >> user address in context_switch() while doing a lockdep call, i.e.
> >> spin_release? We're not touching any userspace gunk there AFAICT.
> >>
> >> Is this an async pagefault or so which kvm is doing so that the guest
> >> rip is actually pointing at the wrong place?
> >>
> > There is nothing in the trace that points to an async pagefault as far as I can see.
> > 
> >> Or something else I'm missing, most probably...
> >>
> > Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> > kvm_multiple_exception() to see which two exceptions are combined into #DF.
> > 
> 
> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> when patch-disabling the vmport in QEMU.
> 
> Let me know if I can help with the analysis.
>
Bisection would be great of course. One thing that is special about
vmport that comes to mind is that it reads vcpu registers to userspace and
writes them back. IIRC "info registers" does the same. Can you see if the
problem is reproducible with vmport disabled, but doing "info registers"
in the qemu console? Although the trace does not show any exits to userspace
near the failure...
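
A minimal standalone sketch of the register round trip being referred to:
userspace pulling segment state out with KVM_GET_SREGS and pushing it
straight back with KVM_SET_SREGS. It creates a bare, never-run vcpu just to
show the ioctl sequence; it is not a reproducer.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    struct kvm_sregs sregs;
    int kvm, vm, vcpu;

    kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) {
        perror("/dev/kvm");
        return 1;
    }
    vm = ioctl(kvm, KVM_CREATE_VM, 0);
    if (vm < 0) {
        perror("KVM_CREATE_VM");
        return 1;
    }
    vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    if (vcpu < 0) {
        perror("KVM_CREATE_VCPU");
        return 1;
    }

    /* read the segment state out ... */
    if (ioctl(vcpu, KVM_GET_SREGS, &sregs) < 0) {
        perror("KVM_GET_SREGS");
        return 1;
    }
    printf("ss.dpl=%u cs.dpl=%u\n",
           (unsigned)sregs.ss.dpl, (unsigned)sregs.cs.dpl);

    /* ... and write it straight back, which is what the vmport and
     * "info registers" paths make QEMU do on a live vcpu */
    if (ioctl(vcpu, KVM_SET_SREGS, &sregs) < 0) {
        perror("KVM_SET_SREGS");
        return 1;
    }
    return 0;
}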

--
			Gleb.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 10:24                   ` Gleb Natapov
@ 2014-06-29 10:31                     ` Jan Kiszka
  2014-06-29 10:53                       ` Gleb Natapov
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29 10:31 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Borislav Petkov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 2386 bytes --]

On 2014-06-29 12:24, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
>> On 2014-06-29 08:46, Gleb Natapov wrote:
>>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>>>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
>>>>
>>>> kvm injects the #PF into the guest.
>>>>
>>>>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
>>>>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>>>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
>>>>
>>>> Second #PF at the same address and kvm injects the #DF.
>>>>
>>>> BUT(!), why?
>>>>
>>>> I probably am missing something but WTH are we pagefaulting at a
>>>> user address in context_switch() while doing a lockdep call, i.e.
>>>> spin_release? We're not touching any userspace gunk there AFAICT.
>>>>
>>>> Is this an async pagefault or so which kvm is doing so that the guest
>>>> rip is actually pointing at the wrong place?
>>>>
>>> There is nothing in the trace that points to an async pagefault as far as I can see.
>>>
>>>> Or something else I'm missing, most probably...
>>>>
>>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
>>> kvm_multiple_exception() to see which two exceptions are combined into #DF.
>>>
>>
>> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
>> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
>> when patch-disabling the vmport in QEMU.
>>
>> Let me know if I can help with the analysis.
>>
> Bisection would be great of course. One thing that is special about
> vmport that comes to mind is that it reads vcpu registers to userspace and
> writes them back. IIRC "info registers" does the same. Can you see if the
> problem is reproducible with vmport disabled, but doing "info registers"
> in the qemu console? Although the trace does not show any exits to userspace
> near the failure...

Yes, info registers crashes the guest after a while as well (with
different backtrace due to different context).

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 10:31                     ` Jan Kiszka
@ 2014-06-29 10:53                       ` Gleb Natapov
  2014-06-29 10:59                         ` Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Gleb Natapov @ 2014-06-29 10:53 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Borislav Petkov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
> On 2014-06-29 12:24, Gleb Natapov wrote:
> > On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> >> On 2014-06-29 08:46, Gleb Natapov wrote:
> >>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
> >>>>
> >>>> kvm injects the #PF into the guest.
> >>>>
> >>>>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
> >>>>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
> >>>>
> >>>> Second #PF at the same address and kvm injects the #DF.
> >>>>
> >>>> BUT(!), why?
> >>>>
> >>>> I probably am missing something but WTH are we pagefaulting at a
> >>>> user address in context_switch() while doing a lockdep call, i.e.
> >>>> spin_release? We're not touching any userspace gunk there AFAICT.
> >>>>
> >>>> Is this an async pagefault or so which kvm is doing so that the guest
> >>>> rip is actually pointing at the wrong place?
> >>>>
> >>> There is nothing in the trace that points to an async pagefault as far as I can see.
> >>>
> >>>> Or something else I'm missing, most probably...
> >>>>
> >>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> >>> kvm_multiple_exception() to see which two exceptions are combined into #DF.
> >>>
> >>
> >> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> >> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> >> when patch-disabling the vmport in QEMU.
> >>
> >> Let me know if I can help with the analysis.
> >>
> > Bisection would be great of course. One thing that is special about
> > vmport that comes to mind is that it reads vcpu registers to userspace and
> > writes them back. IIRC "info registers" does the same. Can you see if the
> > problem is reproducible with vmport disabled, but doing "info registers"
> > in the qemu console? Although the trace does not show any exits to userspace
> > near the failure...
> 
> Yes, info registers crashes the guest after a while as well (with
> different backtrace due to different context).
> 
Oh crap. Bisection would be most helpful. Just to be absolutely sure
that this is not a QEMU problem: does exactly the same QEMU version work with
older kernels?

--
			Gleb.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 10:53                       ` Gleb Natapov
@ 2014-06-29 10:59                         ` Jan Kiszka
  2014-06-29 11:51                           ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29 10:59 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Borislav Petkov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 2995 bytes --]

On 2014-06-29 12:53, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
>> On 2014-06-29 12:24, Gleb Natapov wrote:
>>> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
>>>> On 2014-06-29 08:46, Gleb Natapov wrote:
>>>>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>>>>>  qemu-system-x86-20240 [006] ...1  9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>>>  qemu-system-x86-20240 [006] ...1  9406.484136: kvm_inj_exception: #PF (0x2)a
>>>>>>
>>>>>> kvm injects the #PF into the guest.
>>>>>>
>>>>>>  qemu-system-x86-20240 [006] d..2  9406.484136: kvm_entry: vcpu 1
>>>>>>  qemu-system-x86-20240 [006] d..2  9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>>>>>  qemu-system-x86-20240 [006] ...1  9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>>>  qemu-system-x86-20240 [006] ...1  9406.484141: kvm_inj_exception: #DF (0x0)
>>>>>>
>>>>>> Second #PF at the same address and kvm injects the #DF.
>>>>>>
>>>>>> BUT(!), why?
>>>>>>
>>>>>> I probably am missing something but WTH are we pagefaulting at a
>>>>>> user address in context_switch() while doing a lockdep call, i.e.
>>>>>> spin_release? We're not touching any userspace gunk there AFAICT.
>>>>>>
>>>>>> Is this an async pagefault or so which kvm is doing so that the guest
>>>>>> rip is actually pointing at the wrong place?
>>>>>>
>>>>> There is nothing in the trace that points to an async pagefault as far as I can see.
>>>>>
>>>>>> Or something else I'm missing, most probably...
>>>>>>
>>>>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
>>>>> kvm_multiple_exception() to see which two exceptions are combined into #DF.
>>>>>
>>>>
>>>> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
>>>> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
>>>> when patch-disabling the vmport in QEMU.
>>>>
>>>> Let me know if I can help with the analysis.
>>>>
>>> Bisection would be great of course. One thing that is special about
>>> vmport that comes to mind is that it reads vcpu registers to userspace and
>>> writes them back. IIRC "info registers" does the same. Can you see if the
>>> problem is reproducible with vmport disabled, but doing "info registers"
>>> in the qemu console? Although the trace does not show any exits to userspace
>>> near the failure...
>>
>> Yes, info registers crashes the guest after a while as well (with
>> different backtrace due to different context).
>>
> Oh crap. Bisection would be most helpful. Just to be absolutely sure
> that this is not a QEMU problem: does exactly the same QEMU version work with
> older kernels?

Yes, that was the case last time I tried (I'm on today's git head with
QEMU right now).

Will see what I can do regarding bisecting. That host is a bit slow
(netbook), so it may take a while. Boris will probably beat me in this.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 10:59                         ` Jan Kiszka
@ 2014-06-29 11:51                           ` Borislav Petkov
  2014-06-29 12:22                             ` Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-29 11:51 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gleb Natapov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 12:59:30PM +0200, Jan Kiszka wrote:
> Will see what I can do regarding bisecting. That host is a bit slow
> (netbook), so it may take a while. Boris will probably beat me in
> this.

Nah, I was about to instrument kvm_multiple_exception() first and am
slow anyway so... :-)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 11:51                           ` Borislav Petkov
@ 2014-06-29 12:22                             ` Jan Kiszka
  2014-06-29 13:14                               ` Borislav Petkov
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29 12:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Gleb Natapov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

On 2014-06-29 13:51, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 12:59:30PM +0200, Jan Kiszka wrote:
>> Will see what I can do regarding bisecting. That host is a bit slow
>> (netbook), so it may take a while. Boris will probably beat me in
>> this.
> 
> Nah, I was about to instrument kvm_multiple_exception() first and am
> slow anyway so... :-)

OK, looks like I won ;): The issue was apparently introduced with "KVM:
x86: get CPL from SS.DPL" (ae9fedc793). Maybe we are not properly saving
or restoring this state on SVM since then.

Need a break, will look into details later.

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 12:22                             ` Jan Kiszka
@ 2014-06-29 13:14                               ` Borislav Petkov
  2014-06-29 13:42                                 ` Gleb Natapov
  2014-06-29 13:46                                 ` __schedule #DF splat Borislav Petkov
  0 siblings, 2 replies; 30+ messages in thread
From: Borislav Petkov @ 2014-06-29 13:14 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gleb Natapov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 02:22:35PM +0200, Jan Kiszka wrote:
> OK, looks like I won ;):

I gladly let you win. :-P

> The issue was apparently introduced with "KVM: x86: get CPL from
> SS.DPL" (ae9fedc793). Maybe we are not properly saving or restoring
> this state on SVM since then.

I wonder if this change in the CPL saving would have anything to do with
the fact that we're doing a CR3 write right before we fail pagetable
walk and end up walking a user page table. It could be unrelated though,
as in the previous dump I had a get_user right before the #DF. Hmmm.

I better go and revert that one and check whether it fixes things.

> Need a break, will look into details later.

Ok, some more info from my side, see relevant snippet below. We're
basically not finding the pte at level 3 during the page walk for
7fff0b0f8908.

However, why we're even page walking this userspace address at that
point I have no idea.

And the CR3 write right before this happens is there so I'm pretty much
sure by now that this is related...

 qemu-system-x86-5007  [007] ...1   346.126204: vcpu_match_mmio: gva 0xffffffffff5fd0b0 gpa 0xfee000b0 Write GVA
 qemu-system-x86-5007  [007] ...1   346.126204: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
 qemu-system-x86-5007  [007] ...1   346.126205: kvm_apic: apic_write APIC_EOI = 0x0
 qemu-system-x86-5007  [007] ...1   346.126205: kvm_eoi: apicid 0 vector 253
 qemu-system-x86-5007  [007] d..2   346.126206: kvm_entry: vcpu 0
 qemu-system-x86-5007  [007] d..2   346.126211: kvm_exit: reason write_cr3 rip 0xffffffff816113a0 info 8000000000000000 0
 qemu-system-x86-5007  [007] ...2   346.126214: kvm_mmu_get_page: sp gen 25 gfn 7b2b1 4 pae q0 wux !nxe root 0 sync existing
 qemu-system-x86-5007  [007] d..2   346.126215: kvm_entry: vcpu 0
 qemu-system-x86-5007  [007] d..2   346.126216: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
 qemu-system-x86-5007  [007] ...1   346.126217: kvm_page_fault: address 7fff0b0f8908 error_code 2
 qemu-system-x86-5007  [007] ...1   346.126218: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
 qemu-system-x86-5007  [007] ...1   346.126219: kvm_mmu_paging_element: pte 7b2b6067 level 4
 qemu-system-x86-5007  [007] ...1   346.126220: kvm_mmu_paging_element: pte 0 level 3
 qemu-system-x86-5007  [007] ...1   346.126220: kvm_mmu_walker_error: pferr 2 W
 qemu-system-x86-5007  [007] ...1   346.126221: kvm_multiple_exception: nr: 14, prev: 255, has_error: 1, error_code: 0x2, reinj: 0
 qemu-system-x86-5007  [007] ...1   346.126221: kvm_inj_exception: #PF (0x2)
 qemu-system-x86-5007  [007] d..2   346.126222: kvm_entry: vcpu 0
 qemu-system-x86-5007  [007] d..2   346.126223: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
 qemu-system-x86-5007  [007] ...1   346.126224: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 1
 qemu-system-x86-5007  [007] ...1   346.126225: kvm_page_fault: address 7fff0b0f8908 error_code 2
 qemu-system-x86-5007  [007] ...1   346.126225: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 0 
 qemu-system-x86-5007  [007] ...1   346.126226: kvm_mmu_paging_element: pte 7b2b6067 level 4
 qemu-system-x86-5007  [007] ...1   346.126227: kvm_mmu_paging_element: pte 0 level 3
 qemu-system-x86-5007  [007] ...1   346.126227: kvm_mmu_walker_error: pferr 0 
 qemu-system-x86-5007  [007] ...1   346.126228: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
 qemu-system-x86-5007  [007] ...1   346.126229: kvm_mmu_paging_element: pte 7b2b6067 level 4
 qemu-system-x86-5007  [007] ...1   346.126230: kvm_mmu_paging_element: pte 0 level 3
 qemu-system-x86-5007  [007] ...1   346.126230: kvm_mmu_walker_error: pferr 2 W
 qemu-system-x86-5007  [007] ...1   346.126231: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 0
 qemu-system-x86-5007  [007] ...1   346.126231: kvm_inj_exception: #DF (0x0)
 qemu-system-x86-5007  [007] d..2   346.126232: kvm_entry: vcpu 0
 qemu-system-x86-5007  [007] d..2   346.126371: kvm_exit: reason io rip 0xffffffff8131e623 info 3d40220 ffffffff8131e625
 qemu-system-x86-5007  [007] ...1   346.126372: kvm_pio: pio_write at 0x3d4 size 2 count 1 val 0x130e 
 qemu-system-x86-5007  [007] ...1   346.126374: kvm_userspace_exit: reason KVM_EXIT_IO (2)
 qemu-system-x86-5007  [007] d..2   346.126383: kvm_entry: vcpu 0
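
As an aside on the "pte 0 level 3" lines above: the walk gives up because the
page-directory-pointer entry covering the address is not present. A small
standalone illustration of how the faulting address splits into 4-level
paging indexes (standard x86-64 4KB paging assumed):

#include <stdio.h>

int main(void)
{
    unsigned long long gva = 0x7fff0b0f8908ULL;    /* the faulting address */

    printf("level 4 (PML4) index: %llu\n", (gva >> 39) & 0x1ff);
    printf("level 3 (PDPT) index: %llu\n", (gva >> 30) & 0x1ff);
    printf("level 2 (PD)   index: %llu\n", (gva >> 21) & 0x1ff);
    printf("level 1 (PT)   index: %llu\n", (gva >> 12) & 0x1ff);
    printf("page offset         : 0x%llx\n", gva & 0xfff);
    return 0;
}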

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 13:14                               ` Borislav Petkov
@ 2014-06-29 13:42                                 ` Gleb Natapov
  2014-06-29 14:01                                   ` Borislav Petkov
  2014-06-29 13:46                                 ` __schedule #DF splat Borislav Petkov
  1 sibling, 1 reply; 30+ messages in thread
From: Gleb Natapov @ 2014-06-29 13:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jan Kiszka, Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt,
	x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 03:14:43PM +0200, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 02:22:35PM +0200, Jan Kiszka wrote:
> > OK, looks like I won ;):
> 
> I gladly let you win. :-P
> 
> > The issue was apparently introduced with "KVM: x86: get CPL from
> > SS.DPL" (ae9fedc793). Maybe we are not properly saving or restoring
> > this state on SVM since then.
> 
> I wonder if this change in the CPL saving would have anything to do with
> the fact that we're doing a CR3 write right before we fail pagetable
> walk and end up walking a user page table. It could be unrelated though,
> as in the previous dump I had a get_user right before the #DF. Hmmm.
> 
> I better go and revert that one and check whether it fixes things.
Please do so and let us know.

> 
> > Need a break, will look into details later.
> 
> Ok, some more info from my side, see relevant snippet below. We're
> basically not finding the pte at level 3 during the page walk for
> 7fff0b0f8908.
> 
> However, why we're even page walking this userspace address at that
> point I have no idea.
> 
> And the CR3 write right before this happens is there so I'm pretty much
> sure by now that this is related...
> 
>  qemu-system-x86-5007  [007] ...1   346.126204: vcpu_match_mmio: gva 0xffffffffff5fd0b0 gpa 0xfee000b0 Write GVA
>  qemu-system-x86-5007  [007] ...1   346.126204: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
>  qemu-system-x86-5007  [007] ...1   346.126205: kvm_apic: apic_write APIC_EOI = 0x0
>  qemu-system-x86-5007  [007] ...1   346.126205: kvm_eoi: apicid 0 vector 253
>  qemu-system-x86-5007  [007] d..2   346.126206: kvm_entry: vcpu 0
>  qemu-system-x86-5007  [007] d..2   346.126211: kvm_exit: reason write_cr3 rip 0xffffffff816113a0 info 8000000000000000 0
>  qemu-system-x86-5007  [007] ...2   346.126214: kvm_mmu_get_page: sp gen 25 gfn 7b2b1 4 pae q0 wux !nxe root 0 sync existing
>  qemu-system-x86-5007  [007] d..2   346.126215: kvm_entry: vcpu 0
>  qemu-system-x86-5007  [007] d..2   346.126216: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
>  qemu-system-x86-5007  [007] ...1   346.126217: kvm_page_fault: address 7fff0b0f8908 error_code 2
VCPU faults on 7fff0b0f8908.

>  qemu-system-x86-5007  [007] ...1   346.126218: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
>  qemu-system-x86-5007  [007] ...1   346.126219: kvm_mmu_paging_element: pte 7b2b6067 level 4
>  qemu-system-x86-5007  [007] ...1   346.126220: kvm_mmu_paging_element: pte 0 level 3
>  qemu-system-x86-5007  [007] ...1   346.126220: kvm_mmu_walker_error: pferr 2 W
Address is not mapped by the page tables.

>  qemu-system-x86-5007  [007] ...1   346.126221: kvm_multiple_exception: nr: 14, prev: 255, has_error: 1, error_code: 0x2, reinj: 0
>  qemu-system-x86-5007  [007] ...1   346.126221: kvm_inj_exception: #PF (0x2)
KVM injects #PF.

>  qemu-system-x86-5007  [007] d..2   346.126222: kvm_entry: vcpu 0
>  qemu-system-x86-5007  [007] d..2   346.126223: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
>  qemu-system-x86-5007  [007] ...1   346.126224: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 1
reinj:1 means that the previous injection failed due to another #PF that
happened during the event injection itself. This may happen if the GDT or the
first instruction of a fault handler is not mapped by shadow pages, but here
it says that the new page fault is at the same address as the previous
one, as if the GDT or the #PF handler were mapped there. Strange. Especially
since the #DF is injected successfully, so the GDT should be fine. Maybe a
wrong cpl makes svm go crazy?

 
>  qemu-system-x86-5007  [007] ...1   346.126225: kvm_page_fault: address 7fff0b0f8908 error_code 2
>  qemu-system-x86-5007  [007] ...1   346.126225: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 0 
>  qemu-system-x86-5007  [007] ...1   346.126226: kvm_mmu_paging_element: pte 7b2b6067 level 4
>  qemu-system-x86-5007  [007] ...1   346.126227: kvm_mmu_paging_element: pte 0 level 3
>  qemu-system-x86-5007  [007] ...1   346.126227: kvm_mmu_walker_error: pferr 0 
>  qemu-system-x86-5007  [007] ...1   346.126228: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
>  qemu-system-x86-5007  [007] ...1   346.126229: kvm_mmu_paging_element: pte 7b2b6067 level 4
>  qemu-system-x86-5007  [007] ...1   346.126230: kvm_mmu_paging_element: pte 0 level 3
>  qemu-system-x86-5007  [007] ...1   346.126230: kvm_mmu_walker_error: pferr 2 W
>  qemu-system-x86-5007  [007] ...1   346.126231: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 0
Here we are getting a #PF while delivering another #PF, which is rightfully transformed into a #DF.
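
A small standalone sketch of that promotion rule (the architectural exception
classes from the SDM/APM double-fault table, not a quote of the
kvm_multiple_exception() code):

#include <stdio.h>

#define DF_VECTOR 8
#define PF_VECTOR 14

enum exc_class { BENIGN, CONTRIBUTORY, PAGE_FAULT };

static enum exc_class classify(int nr)
{
    if (nr == PF_VECTOR)
        return PAGE_FAULT;
    if (nr == 0 || nr == 10 || nr == 11 || nr == 12 || nr == 13)
        return CONTRIBUTORY;    /* #DE, #TS, #NP, #SS, #GP */
    return BENIGN;
}

/* which vector gets injected when 'nr' hits while 'prev' is still pending */
static int merge(int prev, int nr)
{
    enum exc_class c1 = classify(prev), c2 = classify(nr);

    if ((c1 == CONTRIBUTORY && c2 == CONTRIBUTORY) ||
        (c1 == PAGE_FAULT && c2 != BENIGN))
        return DF_VECTOR;    /* promote to double fault, error code 0 */
    return nr;
}

int main(void)
{
    /* prev: the #PF being reinjected, new: the #PF hit while delivering it */
    printf("injected vector: %d\n", merge(PF_VECTOR, PF_VECTOR));  /* 8 = #DF */
    return 0;
}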

>  qemu-system-x86-5007  [007] ...1   346.126231: kvm_inj_exception: #DF (0x0)
>  qemu-system-x86-5007  [007] d..2   346.126232: kvm_entry: vcpu 0
>  qemu-system-x86-5007  [007] d..2   346.126371: kvm_exit: reason io rip 0xffffffff8131e623 info 3d40220 ffffffff8131e625
>  qemu-system-x86-5007  [007] ...1   346.126372: kvm_pio: pio_write at 0x3d4 size 2 count 1 val 0x130e 
>  qemu-system-x86-5007  [007] ...1   346.126374: kvm_userspace_exit: reason KVM_EXIT_IO (2)
>  qemu-system-x86-5007  [007] d..2   346.126383: kvm_entry: vcpu 0
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 13:14                               ` Borislav Petkov
  2014-06-29 13:42                                 ` Gleb Natapov
@ 2014-06-29 13:46                                 ` Borislav Petkov
  1 sibling, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2014-06-29 13:46 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gleb Natapov, Paolo Bonzini, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 03:14:43PM +0200, Borislav Petkov wrote:
> I better go and revert that one and check whether it fixes things.

Yahaaa, that was some good bisection work Jan! :-)

20 guest restart cycles and all is fine - it used to trigger after 5
max.

Phew, we have it right in time before the football game in 2 hrs. :-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 13:42                                 ` Gleb Natapov
@ 2014-06-29 14:01                                   ` Borislav Petkov
  2014-06-29 14:27                                     ` Gleb Natapov
  0 siblings, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-29 14:01 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Jan Kiszka, Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt,
	x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
> Please do so and let us know.

Yep, just did. Reverting ae9fedc793 fixes the issue.

> reinj:1 means that the previous injection failed due to another #PF that
> happened during the event injection itself. This may happen if the GDT or the
> first instruction of a fault handler is not mapped by shadow pages, but here
> it says that the new page fault is at the same address as the previous
> one, as if the GDT or the #PF handler were mapped there. Strange. Especially
> since the #DF is injected successfully, so the GDT should be fine. Maybe a
> wrong cpl makes svm go crazy?

Well, I'm not going to even pretend to know kvm to know *when* we're
saving VMCB state but if we're saving the wrong CPL and then doing the
pagetable walk, I can very well imagine if the walker gets confused. One
possible issue could be U/S bit (bit 2) in the PTE bits which allows
access to supervisor pages only when CPL < 3. I.e., CPL has effect on
pagetable walk and a wrong CPL level could break it.

All a conjecture though...
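
To make the U/S-bit point concrete, a tiny standalone illustration of the
architectural check (not KVM's walker code): a supervisor-only PTE that is
fine at CPL 0 faults once the same access is treated as CPL 3, so a stale
CPL can flip the outcome of a walk.

#include <stdbool.h>
#include <stdio.h>

#define PTE_PRESENT (1ULL << 0)
#define PTE_US      (1ULL << 2)  /* set: user may access; clear: supervisor only */

static bool access_allowed(unsigned long long pte, int cpl)
{
    if (!(pte & PTE_PRESENT))
        return false;
    if (cpl == 3 && !(pte & PTE_US))
        return false;        /* user access to a supervisor page */
    return true;
}

int main(void)
{
    unsigned long long kernel_pte = PTE_PRESENT;    /* U/S = 0 */

    printf("CPL 0: %d\n", access_allowed(kernel_pte, 0));    /* 1 */
    printf("CPL 3: %d\n", access_allowed(kernel_pte, 3));    /* 0 */
    return 0;
}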

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 14:01                                   ` Borislav Petkov
@ 2014-06-29 14:27                                     ` Gleb Natapov
  2014-06-29 14:32                                       ` Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Gleb Natapov @ 2014-06-29 14:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jan Kiszka, Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt,
	x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
> > Please do so and let us know.
> 
> Yep, just did. Reverting ae9fedc793 fixes the issue.
> 
> > reinj:1 means that the previous injection failed due to another #PF that
> > happened during the event injection itself. This may happen if the GDT or the
> > first instruction of a fault handler is not mapped by shadow pages, but here
> > it says that the new page fault is at the same address as the previous
> > one, as if the GDT or the #PF handler were mapped there. Strange. Especially
> > since the #DF is injected successfully, so the GDT should be fine. Maybe a
> > wrong cpl makes svm go crazy?
> 
> Well, I'm not going to even pretend to know kvm to know *when* we're
> saving VMCB state but if we're saving the wrong CPL and then doing the
> pagetable walk, I can very well imagine if the walker gets confused. One
> possible issue could be U/S bit (bit 2) in the PTE bits which allows
> access to supervisor pages only when CPL < 3. I.e., CPL has effect on
> pagetable walk and a wrong CPL level could break it.
> 
> All a conjecture though...
> 
Looks plausible, still strange that the second #PF is at the same address as the first one, though.
Anyway, now we have the commit to blame.

--
			Gleb.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 14:27                                     ` Gleb Natapov
@ 2014-06-29 14:32                                       ` Jan Kiszka
  2014-06-29 14:51                                         ` Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29 14:32 UTC (permalink / raw)
  To: Gleb Natapov, Borislav Petkov
  Cc: Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm,
	Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 2618 bytes --]

On 2014-06-29 16:27, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
>> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
>>> Please do so and let us know.
>>
>> Yep, just did. Reverting ae9fedc793 fixes the issue.
>>
>>> reinj:1 means that the previous injection failed due to another #PF that
>>> happened during the event injection itself. This may happen if the GDT or the
>>> first instruction of a fault handler is not mapped by shadow pages, but here
>>> it says that the new page fault is at the same address as the previous
>>> one, as if the GDT or the #PF handler were mapped there. Strange. Especially
>>> since the #DF is injected successfully, so the GDT should be fine. Maybe a
>>> wrong cpl makes svm go crazy?
>>
>> Well, I'm not going to even pretend to know kvm to know *when* we're
>> saving VMCB state but if we're saving the wrong CPL and then doing the
>> pagetable walk, I can very well imagine if the walker gets confused. One
>> possible issue could be U/S bit (bit 2) in the PTE bits which allows
>> access to supervisor pages only when CPL < 3. I.e., CPL has effect on
>> pagetable walk and a wrong CPL level could break it.
>>
>> All a conjecture though...
>>
> Looks plausible, still strange that the second #PF is at the same address as the first one, though.
> Anyway, now we have the commit to blame.

I suspect there is a gap between cause and effect. I'm tracing CPL
changes currently, and my first impression is that QEMU triggers an
unwanted switch from CPL 3 to 0 on vmport access:

 qemu-system-x86-11883 [001]  7493.378630: kvm_entry:            vcpu 0
 qemu-system-x86-11883 [001]  7493.378631: bprint:               svm_vcpu_run: entry cpl 0
 qemu-system-x86-11883 [001]  7493.378636: bprint:               svm_vcpu_run: exit cpl 3
 qemu-system-x86-11883 [001]  7493.378637: kvm_exit:             reason io rip 0x400854 info 56580241 400855
 qemu-system-x86-11883 [001]  7493.378640: kvm_emulate_insn:     0:400854:ed (prot64)
 qemu-system-x86-11883 [001]  7493.378642: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
 qemu-system-x86-11883 [001]  7493.378655: bprint:               kvm_arch_vcpu_ioctl_get_sregs: ss.dpl 0
 qemu-system-x86-11883 [001]  7493.378684: bprint:               kvm_arch_vcpu_ioctl_set_sregs: ss.dpl 0
 qemu-system-x86-11883 [001]  7493.378685: bprint:               svm_set_segment: cpl = 0
 qemu-system-x86-11883 [001]  7493.378711: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x3442554a 

Yeah... do we have to manually sync save.cpl into ss.dpl on get_sregs
on AMD?

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: __schedule #DF splat
  2014-06-29 14:32                                       ` Jan Kiszka
@ 2014-06-29 14:51                                         ` Jan Kiszka
  2014-06-29 15:12                                           ` [PATCH] KVM: SVM: Fix CPL export via SS.DPL Jan Kiszka
  0 siblings, 1 reply; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29 14:51 UTC (permalink / raw)
  To: Gleb Natapov, Borislav Petkov
  Cc: Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt, x86-ml, kvm,
	Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 3149 bytes --]

On 2014-06-29 16:32, Jan Kiszka wrote:
> On 2014-06-29 16:27, Gleb Natapov wrote:
>> On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
>>> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
>>>> Please do so and let us know.
>>>
>>> Yep, just did. Reverting ae9fedc793 fixes the issue.
>>>
>>>> reinj:1 means that the previous injection failed due to another #PF that
>>>> happened during the event injection itself. This may happen if the GDT or the
>>>> first instruction of a fault handler is not mapped by shadow pages, but here
>>>> it says that the new page fault is at the same address as the previous
>>>> one, as if the GDT or the #PF handler were mapped there. Strange. Especially
>>>> since the #DF is injected successfully, so the GDT should be fine. Maybe a
>>>> wrong cpl makes svm go crazy?
>>>
>>> Well, I'm not going to even pretend to know kvm to know *when* we're
>>> saving VMCB state but if we're saving the wrong CPL and then doing the
>>> pagetable walk, I can very well imagine if the walker gets confused. One
>>> possible issue could be U/S bit (bit 2) in the PTE bits which allows
>>> access to supervisor pages only when CPL < 3. I.e., CPL has effect on
>>> pagetable walk and a wrong CPL level could break it.
>>>
>>> All a conjecture though...
>>>
>> Looks plausible, still strange that the second #PF is at the same address as the first one, though.
>> Anyway, now we have the commit to blame.
> 
> I suspect there is a gap between cause and effect. I'm tracing CPL
> changes currently, and my first impression is that QEMU triggers an
> unwanted switch from CPL 3 to 0 on vmport access:
> 
>  qemu-system-x86-11883 [001]  7493.378630: kvm_entry:            vcpu 0
>  qemu-system-x86-11883 [001]  7493.378631: bprint:               svm_vcpu_run: entry cpl 0
>  qemu-system-x86-11883 [001]  7493.378636: bprint:               svm_vcpu_run: exit cpl 3
>  qemu-system-x86-11883 [001]  7493.378637: kvm_exit:             reason io rip 0x400854 info 56580241 400855
>  qemu-system-x86-11883 [001]  7493.378640: kvm_emulate_insn:     0:400854:ed (prot64)
>  qemu-system-x86-11883 [001]  7493.378642: kvm_userspace_exit:   reason KVM_EXIT_IO (2)
>  qemu-system-x86-11883 [001]  7493.378655: bprint:               kvm_arch_vcpu_ioctl_get_sregs: ss.dpl 0
>  qemu-system-x86-11883 [001]  7493.378684: bprint:               kvm_arch_vcpu_ioctl_set_sregs: ss.dpl 0
>  qemu-system-x86-11883 [001]  7493.378685: bprint:               svm_set_segment: cpl = 0
>  qemu-system-x86-11883 [001]  7493.378711: kvm_pio:              pio_read at 0x5658 size 4 count 1 val 0x3442554a 
> 
> Yeah... do we have to manually sync save.cpl into ss.dpl on get_sregs
> on AMD?
> 

Applying this logic:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..b5e994a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
 		 */
 		if (var->unusable)
 			var->db = 0;
+		var->dpl = to_svm(vcpu)->vmcb->save.cpl;
 		break;
 	}
 }

...and my VM runs smoothly so far. Does it make sense in all scenarios?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-29 14:51                                         ` Jan Kiszka
@ 2014-06-29 15:12                                           ` Jan Kiszka
  2014-06-29 18:00                                             ` Borislav Petkov
  2014-06-30 15:01                                             ` Paolo Bonzini
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kiszka @ 2014-06-29 15:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, Borislav Petkov, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 810 bytes --]

From: Jan Kiszka <jan.kiszka@siemens.com>

We import the CPL via SS.DPL since ae9fedc793. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---

Just in time for the next match :D

 arch/x86/kvm/svm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..b5e994a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
 		 */
 		if (var->unusable)
 			var->db = 0;
+		var->dpl = to_svm(vcpu)->vmcb->save.cpl;
 		break;
 	}
 }
-- 
1.8.4.5


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-29 15:12                                           ` [PATCH] KVM: SVM: Fix CPL export via SS.DPL Jan Kiszka
@ 2014-06-29 18:00                                             ` Borislav Petkov
  2014-06-30 15:01                                             ` Paolo Bonzini
  1 sibling, 0 replies; 30+ messages in thread
From: Borislav Petkov @ 2014-06-29 18:00 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paolo Bonzini, Gleb Natapov, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Sun, Jun 29, 2014 at 05:12:43PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> We import the CPL via SS.DPL since ae9fedc793. However, we fail to
> export it this way so far. This caused spurious guest crashes, e.g. of
> Linux when accessing the vmport from guest user space which triggered
> register saving/restoring to/from host user space.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Yep, looks good.

Tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-29 15:12                                           ` [PATCH] KVM: SVM: Fix CPL export via SS.DPL Jan Kiszka
  2014-06-29 18:00                                             ` Borislav Petkov
@ 2014-06-30 15:01                                             ` Paolo Bonzini
  2014-06-30 15:03                                               ` Jan Kiszka
  1 sibling, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2014-06-30 15:01 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gleb Natapov, Borislav Petkov, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

Il 29/06/2014 17:12, Jan Kiszka ha scritto:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> We import the CPL via SS.DPL since ae9fedc793. However, we fail to
> export it this way so far. This caused spurious guest crashes, e.g. of
> Linux when accessing the vmport from guest user space which triggered
> register saving/restoring to/from host user space.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>
> Just in time for the next match :D
>
>  arch/x86/kvm/svm.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..b5e994a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
>  		 */
>  		if (var->unusable)
>  			var->db = 0;
> +		var->dpl = to_svm(vcpu)->vmcb->save.cpl;
>  		break;
>  	}
>  }
>

Thanks.  In theory this is not necessary: the SS.DPL should be the same
as the CPL according to the manuals (the manuals say that the SS.DPL
"should match" the CPL, and that's the only reason why I included the
import in ae9fedc793).  But apparently this is not the case.

Paolo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-30 15:01                                             ` Paolo Bonzini
@ 2014-06-30 15:03                                               ` Jan Kiszka
  2014-06-30 15:15                                                 ` Borislav Petkov
  2014-06-30 15:26                                                 ` Paolo Bonzini
  0 siblings, 2 replies; 30+ messages in thread
From: Jan Kiszka @ 2014-06-30 15:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, Borislav Petkov, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]

On 2014-06-30 17:01, Paolo Bonzini wrote:
> Il 29/06/2014 17:12, Jan Kiszka ha scritto:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> We import the CPL via SS.DPL since ae9fedc793. However, we fail to
>> export it this way so far. This caused spurious guest crashes, e.g. of
>> Linux when accessing the vmport from guest user space which triggered
>> register saving/restoring to/from host user space.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>
>> Just in time for the next match :D
>>
>>  arch/x86/kvm/svm.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index ec8366c..b5e994a 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
>>           */
>>          if (var->unusable)
>>              var->db = 0;
>> +        var->dpl = to_svm(vcpu)->vmcb->save.cpl;
>>          break;
>>      }
>>  }
>>
> 
> Thanks.  In theory this is not necessary: the SS.DPL should be the same
> as the CPL according to the manuals (the manuals say that the SS.DPL
> "should match" the CPL, and that's the only reason why I included the
> import in ae9fedc793).  But apparently this is not the case.

15.5.1:

"When examining segment attributes after a #VMEXIT:
[...]
• Retrieve the CPL from the CPL field in the VMCB, not from any segment
DPL."

Jan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-30 15:03                                               ` Jan Kiszka
@ 2014-06-30 15:15                                                 ` Borislav Petkov
  2014-06-30 15:25                                                   ` Gleb Natapov
  2014-06-30 15:26                                                 ` Paolo Bonzini
  1 sibling, 1 reply; 30+ messages in thread
From: Borislav Petkov @ 2014-06-30 15:15 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Paolo Bonzini, Gleb Natapov, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

On Mon, Jun 30, 2014 at 05:03:57PM +0200, Jan Kiszka wrote:
> 15.5.1:
> 
> "When examining segment attributes after a #VMEXIT:
> [...]
> • Retrieve the CPL from the CPL field in the VMCB, not from any segment
> DPL."

Heey, it is even documented! :-P

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-30 15:15                                                 ` Borislav Petkov
@ 2014-06-30 15:25                                                   ` Gleb Natapov
  0 siblings, 0 replies; 30+ messages in thread
From: Gleb Natapov @ 2014-06-30 15:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Jan Kiszka, Paolo Bonzini, lkml, Peter Zijlstra, Steven Rostedt,
	x86-ml, kvm, Jörg Rödel

On Mon, Jun 30, 2014 at 05:15:44PM +0200, Borislav Petkov wrote:
> On Mon, Jun 30, 2014 at 05:03:57PM +0200, Jan Kiszka wrote:
> > 15.5.1:
> > 
> > "When examining segment attributes after a #VMEXIT:
> > [...]
> > • Retrieve the CPL from the CPL field in the VMCB, not from any segment
> > DPL."
> 
> Heey, it is even documented! :-P
> 
Yes, on SVM we should always respect this field. Unfortunately there
is no such field in VMX, so we have to do DPL gymnastics there.
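
For comparison, a rough sketch of what that lookup amounts to on VMX,
assuming the usual segment access-rights layout where DPL sits in bits 6:5
as described in the SDM (an illustration, not the kernel's vmx_get_cpl()):

#include <stdio.h>

/* DPL lives in bits 6:5 of a segment's access-rights word */
static unsigned int cpl_from_ss_ar(unsigned int ss_access_rights)
{
    return (ss_access_rights >> 5) & 3;
}

int main(void)
{
    /* a typical flat user data/stack segment: type=3, S=1, DPL=3, P=1 */
    printf("cpl = %u\n", cpl_from_ss_ar(0xf3));    /* 3 */
    return 0;
}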

--
			Gleb.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] KVM: SVM: Fix CPL export via SS.DPL
  2014-06-30 15:03                                               ` Jan Kiszka
  2014-06-30 15:15                                                 ` Borislav Petkov
@ 2014-06-30 15:26                                                 ` Paolo Bonzini
  1 sibling, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2014-06-30 15:26 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Gleb Natapov, Borislav Petkov, lkml, Peter Zijlstra,
	Steven Rostedt, x86-ml, kvm, Jörg Rödel

Il 30/06/2014 17:03, Jan Kiszka ha scritto:
> 15.5.1:
>
> "When examining segment attributes after a #VMEXIT:
> [...]
> • Retrieve the CPL from the CPL field in the VMCB, not from any segment
> DPL."

It's only the fourth paragraph below the one I did read...

Paolo

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-06-30 15:27 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-25 15:32 __schedule #DF splat Borislav Petkov
2014-06-25 20:26 ` Borislav Petkov
2014-06-27 10:18   ` Borislav Petkov
2014-06-27 11:41     ` Paolo Bonzini
2014-06-27 11:55       ` Borislav Petkov
2014-06-27 12:01         ` Paolo Bonzini
2014-06-27 12:10           ` Borislav Petkov
2014-06-28 11:44             ` Borislav Petkov
2014-06-29  6:46               ` Gleb Natapov
2014-06-29  9:56                 ` Jan Kiszka
2014-06-29 10:24                   ` Gleb Natapov
2014-06-29 10:31                     ` Jan Kiszka
2014-06-29 10:53                       ` Gleb Natapov
2014-06-29 10:59                         ` Jan Kiszka
2014-06-29 11:51                           ` Borislav Petkov
2014-06-29 12:22                             ` Jan Kiszka
2014-06-29 13:14                               ` Borislav Petkov
2014-06-29 13:42                                 ` Gleb Natapov
2014-06-29 14:01                                   ` Borislav Petkov
2014-06-29 14:27                                     ` Gleb Natapov
2014-06-29 14:32                                       ` Jan Kiszka
2014-06-29 14:51                                         ` Jan Kiszka
2014-06-29 15:12                                           ` [PATCH] KVM: SVM: Fix CPL export via SS.DPL Jan Kiszka
2014-06-29 18:00                                             ` Borislav Petkov
2014-06-30 15:01                                             ` Paolo Bonzini
2014-06-30 15:03                                               ` Jan Kiszka
2014-06-30 15:15                                                 ` Borislav Petkov
2014-06-30 15:25                                                   ` Gleb Natapov
2014-06-30 15:26                                                 ` Paolo Bonzini
2014-06-29 13:46                                 ` __schedule #DF splat Borislav Petkov
