* EPT: Misconfiguration
@ 2011-01-20 11:48 Ruben Kerkhof
  2011-01-20 11:59 ` Ruben Kerkhof
  2011-01-21 13:22 ` Marcelo Tosatti
  0 siblings, 2 replies; 19+ messages in thread
From: Ruben Kerkhof @ 2011-01-20 11:48 UTC
  To: kvm

I'm suddenly getting lots of the following errors on a server running
2.36.7, but I have no idea what it means:

2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
2011-01-20T12:41:18.358624+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x50743e007 level 4
2011-01-20T12:41:18.358627+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x523de2007 level 3
2011-01-20T12:41:18.358629+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x62336f007 level 2
2011-01-20T12:41:18.360109+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
2011-01-20T12:41:18.360137+01:00 phy005 kernel:
ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
2011-01-20T12:41:18.360151+01:00 phy005 kernel: ------------[ cut here
]------------
2011-01-20T12:41:18.360155+01:00 phy005 kernel: WARNING: at
arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
2011-01-20T12:41:18.360160+01:00 phy005 kernel: Hardware name: X8DTU
2011-01-20T12:41:18.363296+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core
ioatdma joydev iTCO_vendor_support serio_raw dca 3w_9xxx [last
unloaded: scsi_wait_scan]
2011-01-20T12:41:18.363312+01:00 phy005 kernel: Pid: 3595, comm:
qemu-kvm Tainted: G      D W  2.6.34.7-66.tilaa.fc13.x86_64 #1
2011-01-20T12:41:18.363314+01:00 phy005 kernel: Call Trace:
2011-01-20T12:41:18.364385+01:00 phy005 kernel: [<ffffffff8104d11f>]
warn_slowpath_common+0x7c/0x94
2011-01-20T12:41:18.364455+01:00 phy005 kernel: [<ffffffff8104d14b>]
warn_slowpath_null+0x14/0x16
2011-01-20T12:41:18.364462+01:00 phy005 kernel: [<ffffffffa00ba7fb>]
handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
2011-01-20T12:41:18.364466+01:00 phy005 kernel: [<ffffffffa00bb401>]
vmx_handle_exit+0x204/0x23a [kvm_intel]
2011-01-20T12:41:18.370619+01:00 phy005 kernel: [<ffffffffa0075998>]
kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
2011-01-20T12:41:18.370731+01:00 phy005 kernel: [<ffffffffa00645ba>]
kvm_vcpu_ioctl+0xfd/0x56e [kvm]
2011-01-20T12:41:18.370737+01:00 phy005 kernel: [<ffffffff8100a60e>] ?
apic_timer_interrupt+0xe/0x20
2011-01-20T12:41:18.370741+01:00 phy005 kernel: [<ffffffff8111aa2f>]
vfs_ioctl+0x32/0xa6
2011-01-20T12:41:18.371562+01:00 phy005 kernel: [<ffffffff8111afa2>]
do_vfs_ioctl+0x483/0x4c9
2011-01-20T12:41:18.371577+01:00 phy005 kernel: [<ffffffff8111b03e>]
sys_ioctl+0x56/0x79
2011-01-20T12:41:18.371581+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-01-20T12:41:18.372244+01:00 phy005 kernel: ---[ end trace
7d57b311d4a5b22c ]---
2011-01-20T12:41:57.568322+01:00 phy005 kernel: general protection
fault: 0000 [#2] SMP
2011-01-20T12:41:57.568335+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-01-20T12:41:57.568339+01:00 phy005 kernel: CPU 0

Kind regards,

Ruben

* Re: EPT: Misconfiguration
  2011-01-20 11:48 EPT: Misconfiguration Ruben Kerkhof
@ 2011-01-20 11:59 ` Ruben Kerkhof
  2011-01-21 13:22 ` Marcelo Tosatti
  1 sibling, 0 replies; 19+ messages in thread
From: Ruben Kerkhof @ 2011-01-20 11:59 UTC
  To: kvm

On Thu, Jan 20, 2011 at 12:48, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
> I'm suddenly getting lots of the following errors on a server running
> 2.36.7, but I have no idea what it means:

Sorry, that should be 2.6.34.7.

Kind regards,

Ruben

* Re: EPT: Misconfiguration
  2011-01-20 11:48 EPT: Misconfiguration Ruben Kerkhof
  2011-01-20 11:59 ` Ruben Kerkhof
@ 2011-01-21 13:22 ` Marcelo Tosatti
  2011-01-25 14:44   ` Ruben Kerkhof
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2011-01-21 13:22 UTC
  To: Ruben Kerkhof; +Cc: kvm

On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
> I'm suddenly getting lots of the following errors on a server running
> 2.36.7, but I have no idea what it means:
> 
> 2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
> 2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
> 2011-01-20T12:41:18.358624+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x50743e007 level 4
> 2011-01-20T12:41:18.358627+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x523de2007 level 3
> 2011-01-20T12:41:18.358629+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x62336f007 level 2
> 2011-01-20T12:41:18.360109+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
> 2011-01-20T12:41:18.360137+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
> 2011-01-20T12:41:18.360151+01:00 phy005 kernel: ------------[ cut here
> ]------------

A shadow pagetable entry in memory has bits 45-49 set, which is not
allowed. It's probably bad memory if these errors were not present before
with the same workload and host software. Would be useful to see what
memtest86 says.
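
To see exactly which bits the rsvd_bits line reports, the value can be
decoded by hand. The following is a small Python sketch (the helper name
set_bits is made up for illustration); it only reads the numbers printed in
the log above and does not reproduce the kernel's own check.

```python
def set_bits(value):
    """Return the positions of all 1 bits in value, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]

RSVD_BITS = 0x3a00000000000      # from the "rsvd_bits =" log line
SPTE_L1 = 0x1603a0730500d277     # level 1 spte from the same report

# Bit positions flagged as reserved-but-set.
print(set_bits(RSVD_BITS))

# Sanity check: every reported bit is really set in the spte itself.
print(RSVD_BITS & SPTE_L1 == RSVD_BITS)
```

The decoded positions are 45, 47, 48 and 49, all inside the 45-49 range
mentioned above.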


* Re: EPT: Misconfiguration
  2011-01-21 13:22 ` Marcelo Tosatti
@ 2011-01-25 14:44   ` Ruben Kerkhof
  2011-01-25 17:39     ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Ruben Kerkhof @ 2011-01-25 14:44 UTC
  To: Marcelo Tosatti; +Cc: kvm

Hi Marcelo,

On Fri, Jan 21, 2011 at 14:22, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
>> I'm suddenly getting lots of the following errors on a server running
>> 2.36.7, but I have no idea what it means:
>>
>> 2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
>> 2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
>> 2011-01-20T12:41:18.358624+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x50743e007 level 4
>> 2011-01-20T12:41:18.358627+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x523de2007 level 3
>> 2011-01-20T12:41:18.358629+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x62336f007 level 2
>> 2011-01-20T12:41:18.360109+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
>> 2011-01-20T12:41:18.360137+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
>> 2011-01-20T12:41:18.360151+01:00 phy005 kernel: ------------[ cut here
>> ]------------
>
> A shadow pagetable entry in memory has bits 45-49 set, which is not
> allowed. It's probably bad memory if these errors were not present before
> with the same workload and host software. Would be useful to see what
> memtest86 says.

I did 2 memtest86+ passes, but no errors were found.

Just to be safe, we replaced all memory. The machine has been running
stable over the weekend, but now gives exactly the same error.

Is there anything else which could cause this?

Kind regards,

Ruben

* Re: EPT: Misconfiguration
  2011-01-25 14:44   ` Ruben Kerkhof
@ 2011-01-25 17:39     ` Avi Kivity
  2011-01-25 18:29       ` Ruben Kerkhof
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2011-01-25 17:39 UTC
  To: Ruben Kerkhof; +Cc: Marcelo Tosatti, kvm

On 01/25/2011 04:44 PM, Ruben Kerkhof wrote:
> Hi Marcelo,
>
> On Fri, Jan 21, 2011 at 14:22, Marcelo Tosatti<mtosatti@redhat.com>  wrote:
> >  On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
> >>  I'm suddenly getting lots of the following errors on a server running
> >>  2.36.7, but I have no idea what it means:
> >>
> >>  2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
> >>  2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
> >>  2011-01-20T12:41:18.358624+01:00 phy005 kernel:
> >>  ept_misconfig_inspect_spte: spte 0x50743e007 level 4
> >>  2011-01-20T12:41:18.358627+01:00 phy005 kernel:
> >>  ept_misconfig_inspect_spte: spte 0x523de2007 level 3
> >>  2011-01-20T12:41:18.358629+01:00 phy005 kernel:
> >>  ept_misconfig_inspect_spte: spte 0x62336f007 level 2
> >>  2011-01-20T12:41:18.360109+01:00 phy005 kernel:
> >>  ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
> >>  2011-01-20T12:41:18.360137+01:00 phy005 kernel:
> >>  ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
> >>  2011-01-20T12:41:18.360151+01:00 phy005 kernel: ------------[ cut here
> >>  ]------------
> >
> >  A shadow pagetable entry in memory has bits 45-49 set, which is not
> >  allowed. It's probably bad memory if these errors were not present before
> >  with the same workload and host software. Would be useful to see what
> >  memtest86 says.
>
> I did 2 memtest86+ passes, but no errors were found.
>
> Just to be safe, we replaced all memory. The machine has been running
> stable over the weekend, but now gives exactly the same error.
>
> Is there anything else which could cause this?

Try updating the BIOS.

When you say "suddenly", this was with no changes to software and hardware?

Is cooling adequate?

How much memory is on that machine?  Even outside the reserved bits the 
address looks way too large.

-- 
error compiling committee.c: too many arguments to function


* Re: EPT: Misconfiguration
  2011-01-25 17:39     ` Avi Kivity
@ 2011-01-25 18:29       ` Ruben Kerkhof
  2011-01-26  9:52         ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Ruben Kerkhof @ 2011-01-25 18:29 UTC
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

Hi Avi,

On Tue, Jan 25, 2011 at 18:39, Avi Kivity <avi@redhat.com> wrote:
> On 01/25/2011 04:44 PM, Ruben Kerkhof wrote:
>>
>> Hi Marcelo,
>>
>> On Fri, Jan 21, 2011 at 14:22, Marcelo Tosatti<mtosatti@redhat.com>
>>  wrote:
>> >  On Thu, Jan 20, 2011 at 12:48:00PM +0100, Ruben Kerkhof wrote:
>> >>  I'm suddenly getting lots of the following errors on a server running
>> >>  2.36.7, but I have no idea what it means:
>> >>
>> >>  2011-01-20T12:41:18.358603+01:00 phy005 kernel: EPT: Misconfiguration.
>> >>  2011-01-20T12:41:18.358621+01:00 phy005 kernel: EPT: GPA: 0x3dbff6b0
>> >>  2011-01-20T12:41:18.358624+01:00 phy005 kernel:
>> >>  ept_misconfig_inspect_spte: spte 0x50743e007 level 4
>> >>  2011-01-20T12:41:18.358627+01:00 phy005 kernel:
>> >>  ept_misconfig_inspect_spte: spte 0x523de2007 level 3
>> >>  2011-01-20T12:41:18.358629+01:00 phy005 kernel:
>> >>  ept_misconfig_inspect_spte: spte 0x62336f007 level 2
>> >>  2011-01-20T12:41:18.360109+01:00 phy005 kernel:
>> >>  ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
>> >>  2011-01-20T12:41:18.360137+01:00 phy005 kernel:
>> >>  ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
>> >>  2011-01-20T12:41:18.360151+01:00 phy005 kernel: ------------[ cut here
>> >>  ]------------
>> >
>> >  A shadow pagetable entry in memory has bits 45-49 set, which is not
>> >  allowed. It's probably bad memory if these errors were not present before
>> >  with the same workload and host software. Would be useful to see what
>> >  memtest86 says.
>>
>> I did 2 memtest86+ passes, but no errors were found.
>>
>> Just to be safe, we replaced all memory. The machine has been running
>> stable over the weekend, but now gives exactly the same error.
>>
>> Is there anything else which could cause this?
>
> Try updating the BIOS.

That's the first thing we did. It's a Supermicro with an X8DTU-F
board, updated to BIOS version 2.0b (which includes the latest
microcode). The processors are Intel 5620s.

> When you say "suddenly", this was with no changes to software and hardware?

The host software and hardware haven't changed in the two months the
machine has been running. It runs a 2.6.34.7 kernel and qemu-kvm 0.13.

We host customer VMs on it, however, so virtual machines come and go.
They run various operating systems: a mixture of Linux, FreeBSD and
Windows 2008 R2. We have other machines with the same config that do
not show these problems.

> Is cooling adequate?

Yes.

> How much memory is on that machine?  Even outside the reserved bits the
> address looks way too large.

48GB.
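
For comparison, here is a rough Python sketch of the address in the level 1
spte from the first report. It assumes the usual x86 layout in which bits
12-51 of a page table entry hold the page frame address; the mask and the
variable names are illustrative and not taken from the KVM source.

```python
GIB = 1 << 30
ADDR_MASK = ((1 << 52) - 1) & ~0xfff   # assumed: bits 12..51 hold the address

SPTE_L1 = 0x1603a0730500d277           # level 1 spte from the first report
RSVD_BITS = 0x3a00000000000            # reserved bits reported for it

addr = SPTE_L1 & ADDR_MASK
addr_no_rsvd = addr & ~RSVD_BITS       # drop the reserved bits as well

print(hex(addr), addr // GIB)
print(hex(addr_no_rsvd), addr_no_rsvd // GIB)
print(addr_no_rsvd > 48 * GIB)         # the host has 48GB
```

Under that assumption the address is still about 460GB after the reserved
bits are masked out, far more than the 48GB installed.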

This time I have a few different messages though:

2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection
fault: 0000 [#1] SMP
2011-01-25T11:58:50.001310+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-01-25T11:58:50.001316+01:00 phy005 kernel: CPU 12
2011-01-25T11:58:50.001323+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core ioatdma
joydev iTCO_vendor_support dca serio_raw 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-01-25T11:58:50.001327+01:00 phy005 kernel:
2011-01-25T11:58:50.001331+01:00 phy005 kernel: Pid: 1849, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-01-25T11:58:50.001336+01:00 phy005 kernel: RIP:
0010:[<ffffffff810d0216>]  [<ffffffff810d0216>] __free_pages+0x9/0x26
2011-01-25T11:58:50.001339+01:00 phy005 kernel: RSP:
0018:ffff8802fbe45ab8  EFLAGS: 00010216
2011-01-25T11:58:50.001343+01:00 phy005 kernel: RAX: ffff88061ef8c000
RBX: ffff8803131ec100 RCX: 0000000000000000
2011-01-25T11:58:50.001348+01:00 phy005 kernel: RDX: 00000000000000ff
RSI: 0000000000000000 RDI: 1603a07305001568
2011-01-25T11:58:50.001352+01:00 phy005 kernel: RBP: ffff8802fbe45ab8
R08: ffffea000a83b7f0 R09: 0000000000000004
2011-01-25T11:58:50.001356+01:00 phy005 kernel: R10: 0000000000000000
R11: ffff8802fbe45b38 R12: 0000000000000100
2011-01-25T11:58:50.001359+01:00 phy005 kernel: R13: 0000000000000001
R14: ffff8802e934c010 R15: ffff8802e934c010
2011-01-25T11:58:50.001363+01:00 phy005 kernel: FS:
00007f1f14844700(0000) GS:ffff880655480000(0000)
knlGS:0000000000000000
2011-01-25T11:58:50.001366+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
2011-01-25T11:58:50.001370+01:00 phy005 kernel: CR2: 00000000b72f6cb0
CR3: 0000000ba561c000 CR4: 00000000000026e0
2011-01-25T11:58:50.001374+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-01-25T11:58:50.001378+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-01-25T11:58:50.001382+01:00 phy005 kernel: Process qemu-kvm (pid:
1849, threadinfo ffff8802fbe44000, task ffff8802ea11aee0)
2011-01-25T11:58:50.001385+01:00 phy005 kernel: Stack:
2011-01-25T11:58:50.001389+01:00 phy005 kernel: ffff8802fbe45af8
ffffffff810ee455 0000000000000206 ffffc9001e2d4000
2011-01-25T11:58:50.001392+01:00 phy005 kernel: <0> ffff8802e934c010
ffff880b680a2050 0000000000000000 ffff880b680a2000
2011-01-25T11:58:50.001396+01:00 phy005 kernel: <0> ffff8802fbe45b08
ffffffff810ee504 ffff8802fbe45b28 ffffffffa0065d70
2011-01-25T11:58:50.001399+01:00 phy005 kernel: Call Trace:
2011-01-25T11:58:50.001402+01:00 phy005 kernel: [<ffffffff810ee455>]
__vunmap+0x8e/0xbd
2011-01-25T11:58:50.001406+01:00 phy005 kernel: [<ffffffff810ee504>]
vfree+0x2e/0x30
2011-01-25T11:58:50.001410+01:00 phy005 kernel: [<ffffffffa0065d70>]
kvm_free_physmem_slot+0x2a/0xa4 [kvm]
2011-01-25T11:58:50.001414+01:00 phy005 kernel: [<ffffffffa00663fa>]
kvm_free_physmem+0x32/0x4b [kvm]
2011-01-25T11:58:50.001417+01:00 phy005 kernel: [<ffffffffa006f90e>]
kvm_arch_destroy_vm+0xf1/0x13d [kvm]
2011-01-25T11:58:50.001421+01:00 phy005 kernel: [<ffffffffa00664ce>]
kvm_put_kvm+0xbb/0xe2 [kvm]
2011-01-25T11:58:50.001424+01:00 phy005 kernel: [<ffffffffa0066d04>]
kvm_vcpu_release+0x18/0x1c [kvm]
2011-01-25T11:58:50.001427+01:00 phy005 kernel: [<ffffffff8110ef2b>]
__fput+0x12a/0x1dc
2011-01-25T11:58:50.001438+01:00 phy005 kernel: [<ffffffff8110eff7>]
fput+0x1a/0x1c
2011-01-25T11:58:50.001441+01:00 phy005 kernel: [<ffffffff8110c067>]
filp_close+0x68/0x72
2011-01-25T11:58:50.001444+01:00 phy005 kernel: [<ffffffff8104f298>]
put_files_struct+0x6a/0xcc
2011-01-25T11:58:50.001447+01:00 phy005 kernel: [<ffffffff8104f33b>]
exit_files+0x41/0x46
2011-01-25T11:58:50.001450+01:00 phy005 kernel: [<ffffffff81050c36>]
do_exit+0x295/0x752
2011-01-25T11:58:50.001453+01:00 phy005 kernel: [<ffffffff8104816f>] ?
default_wake_function+0x12/0x14
2011-01-25T11:58:50.001459+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-01-25T11:58:50.001463+01:00 phy005 kernel: [<ffffffff8105e5cd>]
get_signal_to_deliver+0x3a6/0x3c8
2011-01-25T11:58:50.001466+01:00 phy005 kernel: [<ffffffff81092b6a>] ?
audit_buffer_free+0x75/0x7a
2011-01-25T11:58:50.001469+01:00 phy005 kernel: [<ffffffff81009038>]
do_signal+0x72/0x6b8
2011-01-25T11:58:50.001472+01:00 phy005 kernel: [<ffffffff8110efce>] ?
__fput+0x1cd/0x1dc
2011-01-25T11:58:50.001478+01:00 phy005 kernel: [<ffffffff810096a6>]
do_notify_resume+0x28/0x86
2011-01-25T11:58:50.001482+01:00 phy005 kernel: [<ffffffff81009f3e>]
int_signal+0x12/0x17
2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00 <f0> ff 4f 08 0f 94 c0 84
c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
2011-01-25T11:58:50.001489+01:00 phy005 kernel: RIP
[<ffffffff810d0216>] __free_pages+0x9/0x26
2011-01-25T11:58:50.001494+01:00 phy005 kernel: RSP <ffff8802fbe45ab8>
2011-01-25T11:58:50.001497+01:00 phy005 kernel: ---[ end trace
643b51f38991abec ]---
2011-01-25T11:58:50.001500+01:00 phy005 kernel: Fixing recursive fault
but reboot is needed!

and a bit later:

2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
page table at address 7f37b37ff000
2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
94e538067 PMD 61e5bf067 PTE 1603a0730500e067
2011-01-25T12:06:32.673962+01:00 phy005 kernel: Bad pagetable: 0009 [#2] SMP
2011-01-25T12:06:32.673965+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-01-25T12:06:32.673967+01:00 phy005 kernel: CPU 2
2011-01-25T12:06:32.673972+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core ioatdma
joydev iTCO_vendor_support dca serio_raw 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-01-25T12:06:32.673978+01:00 phy005 kernel:
2011-01-25T12:06:32.673981+01:00 phy005 kernel: Pid: 2428, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-01-25T12:06:32.673985+01:00 phy005 kernel: RIP:
0010:[<ffffffff81213bd7>]  [<ffffffff81213bd7>]
copy_user_generic_string+0x17/0x40
2011-01-25T12:06:32.673987+01:00 phy005 kernel: RSP:
0018:ffff88061df85ba0  EFLAGS: 00010202
2011-01-25T12:06:32.673989+01:00 phy005 kernel: RAX: ffff88061df84000
RBX: ffff88061df85e98 RCX: 0000000000000005
2011-01-25T12:06:32.673992+01:00 phy005 kernel: RDX: 0000000000000720
RSI: 00007f37b37ff000 RDI: ffff8805db642453
2011-01-25T12:06:32.673999+01:00 phy005 kernel: RBP: ffff88061df85bc8
R08: 0000000000000b76 R09: 0000000000000b80
2011-01-25T12:06:32.674003+01:00 phy005 kernel: R10: 0000000000000650
R11: 0000000000000004 R12: ffff8805db642453
2011-01-25T12:06:32.674007+01:00 phy005 kernel: R13: 0000000000000725
R14: 0000000000000725 R15: 0000000000000725
2011-01-25T12:06:32.674011+01:00 phy005 kernel: FS:
00007f37e20e2700(0000) GS:ffff880002040000(0000)
knlGS:0000000000000000
2011-01-25T12:06:32.674014+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
2011-01-25T12:06:32.674018+01:00 phy005 kernel: CR2: 00007f37b37ff000
CR3: 0000000c23570000 CR4: 00000000000026e0
2011-01-25T12:06:32.674022+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-01-25T12:06:32.674036+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-01-25T12:06:32.674041+01:00 phy005 kernel: Process qemu-kvm (pid:
2428, threadinfo ffff88061df84000, task ffff88061c1aaee0)
2011-01-25T12:06:32.674044+01:00 phy005 kernel: Stack:
2011-01-25T12:06:32.674048+01:00 phy005 kernel: ffffffff8139f26e
0000000000000000 0000000000000725 00007f37b37ff000
2011-01-25T12:06:32.674052+01:00 phy005 kernel: <0> ffff8805db642453
ffff88061df85c08 ffffffff8139f475 0000000000000063
2011-01-25T12:06:32.674056+01:00 phy005 kernel: <0> 0000000000000b76
0000000000000000 ffff88061e089a00 0000000000000b76
2011-01-25T12:06:32.674058+01:00 phy005 kernel: Call Trace:
2011-01-25T12:06:32.674061+01:00 phy005 kernel: [<ffffffff8139f26e>] ?
copy_from_user+0x2f/0x31
2011-01-25T12:06:32.674063+01:00 phy005 kernel: [<ffffffff8139f475>]
memcpy_fromiovecend+0x57/0x82
2011-01-25T12:06:32.674075+01:00 phy005 kernel: [<ffffffff8139fb49>]
skb_copy_datagram_from_iovec+0x5d/0x1ea
2011-01-25T12:06:32.674077+01:00 phy005 kernel: [<ffffffff8111c413>] ?
pollwake+0x0/0x54
2011-01-25T12:06:32.674080+01:00 phy005 kernel: [<ffffffff8139f475>] ?
memcpy_fromiovecend+0x57/0x82
2011-01-25T12:06:32.674082+01:00 phy005 kernel: [<ffffffffa00408f5>]
tun_get_user+0x1bd/0x3e3 [tun]
2011-01-25T12:06:32.674084+01:00 phy005 kernel: [<ffffffffa0040b42>] ?
tun_chr_aio_write+0x0/0x98 [tun]
2011-01-25T12:06:32.674087+01:00 phy005 kernel: [<ffffffffa0040bba>]
tun_chr_aio_write+0x78/0x98 [tun]
2011-01-25T12:06:32.674089+01:00 phy005 kernel: [<ffffffff8110d995>]
do_sync_readv_writev+0xc1/0x100
2011-01-25T12:06:32.674091+01:00 phy005 kernel: [<ffffffff811d78b4>] ?
selinux_file_permission+0xa7/0xb3
2011-01-25T12:06:32.674094+01:00 phy005 kernel: [<ffffffff8110d6f9>] ?
copy_from_user+0x2f/0x31
2011-01-25T12:06:32.674097+01:00 phy005 kernel: [<ffffffff811cdb0b>] ?
security_file_permission+0x16/0x18
2011-01-25T12:06:32.674099+01:00 phy005 kernel: [<ffffffff8110e68c>]
do_readv_writev+0xa7/0x127
2011-01-25T12:06:32.674101+01:00 phy005 kernel: [<ffffffff81065601>] ?
sys_timer_settime+0x259/0x2ab
2011-01-25T12:06:32.674104+01:00 phy005 kernel: [<ffffffff8110e74f>]
vfs_writev+0x43/0x4e
2011-01-25T12:06:32.674106+01:00 phy005 kernel: [<ffffffff8110e83f>]
sys_writev+0x4a/0x93
2011-01-25T12:06:32.674109+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-01-25T12:06:32.674112+01:00 phy005 kernel: Code: 06 88 07 48 ff
c6 48 ff c7 ff c9 75 f2 31 c0 c3 0f 1f 40 00 21 d2 74 30 83 fa 08 72
27 89 f9 83 e1 07 74 15 83 e9 08 f7 d9 29 ca <8a> 06 88 07 48 ff c6 48
ff c7 ff c9 75 f2 89 d1 c1 e9 03 83 e2
2011-01-25T12:06:32.674116+01:00 phy005 kernel: RIP
[<ffffffff81213bd7>] copy_user_generic_string+0x17/0x40
2011-01-25T12:06:32.674118+01:00 phy005 kernel: RSP <ffff88061df85ba0>
2011-01-25T12:06:32.674120+01:00 phy005 kernel: ---[ end trace
643b51f38991abed ]---
2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
2011-01-25T12:38:49.417526+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
2011-01-25T12:38:49.417532+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5db595007 level 3
2011-01-25T12:38:49.417553+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
2011-01-25T12:38:49.417558+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
2011-01-25T12:38:49.419858+01:00 phy005 kernel:
ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
2011-01-25T12:38:49.419881+01:00 phy005 kernel: ------------[ cut here
]------------
2011-01-25T12:38:49.419884+01:00 phy005 kernel: WARNING: at
arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
2011-01-25T12:38:49.419886+01:00 phy005 kernel: Hardware name: X8DTU
2011-01-25T12:38:49.419890+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core ioatdma
joydev iTCO_vendor_support dca serio_raw 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-01-25T12:38:49.419893+01:00 phy005 kernel: Pid: 4475, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
2011-01-25T12:38:49.419896+01:00 phy005 kernel: Call Trace:
2011-01-25T12:38:49.419900+01:00 phy005 kernel: [<ffffffff8104d11f>]
warn_slowpath_common+0x7c/0x94
2011-01-25T12:38:49.419907+01:00 phy005 kernel: [<ffffffff8104d14b>]
warn_slowpath_null+0x14/0x16
2011-01-25T12:38:49.420860+01:00 phy005 kernel: [<ffffffffa00ba7fb>]
handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
2011-01-25T12:38:49.420887+01:00 phy005 kernel: [<ffffffffa00bb401>]
vmx_handle_exit+0x204/0x23a [kvm_intel]
2011-01-25T12:38:49.420891+01:00 phy005 kernel: [<ffffffffa0075998>]
kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
2011-01-25T12:38:49.420893+01:00 phy005 kernel: [<ffffffffa00645ba>]
kvm_vcpu_ioctl+0xfd/0x56e [kvm]
2011-01-25T12:38:49.422064+01:00 phy005 kernel: [<ffffffff8111aa2f>]
vfs_ioctl+0x32/0xa6
2011-01-25T12:38:49.422090+01:00 phy005 kernel: [<ffffffff8111afa2>]
do_vfs_ioctl+0x483/0x4c9
2011-01-25T12:38:49.422092+01:00 phy005 kernel: [<ffffffff8111b03e>]
sys_ioctl+0x56/0x79
2011-01-25T12:38:49.422096+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-01-25T12:38:49.422099+01:00 phy005 kernel: ---[ end trace
643b51f38991abee ]---
2011-01-25T13:16:57.541111+01:00 phy005 kernel: br0: port 39(vnet74)
entering disabled state
2011-01-25T13:16:57.588110+01:00 phy005 kernel: device vnet74 left
promiscuous mode
2011-01-25T13:16:57.588169+01:00 phy005 kernel: br0: port 39(vnet74)
entering disabled state
2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
2011-01-25T13:16:58.192462+01:00 phy005 kernel: addr:00007f97fe1ff000
vm_flags:80100073 anon_vma:ffff88061dd04440 mapping:(null)
index:7f97fe1ff
2011-01-25T13:16:58.193253+01:00 phy005 kernel: Pid: 4444, comm:
qemu-kvm Tainted: G      D W  2.6.34.7-66.tilaa.fc13.x86_64 #1
2011-01-25T13:16:58.193275+01:00 phy005 kernel: Call Trace:
2011-01-25T13:16:58.193280+01:00 phy005 kernel: [<ffffffff810e135a>]
print_bad_pte+0x203/0x21c
2011-01-25T13:16:58.193284+01:00 phy005 kernel: [<ffffffff810e13be>]
vm_normal_page+0x4b/0x64
2011-01-25T13:16:58.194123+01:00 phy005 kernel: [<ffffffff810e1c5a>]
unmap_vmas+0x492/0x92c
2011-01-25T13:16:58.194132+01:00 phy005 kernel: [<ffffffff810e73ff>]
exit_mmap+0xce/0x132
2011-01-25T13:16:58.194138+01:00 phy005 kernel: [<ffffffff8104ad7a>]
mmput+0x5e/0xca
2011-01-25T13:16:58.194142+01:00 phy005 kernel: [<ffffffff8104f0d5>]
exit_mm+0x114/0x121
2011-01-25T13:16:58.195242+01:00 phy005 kernel: [<ffffffff81050bf5>]
do_exit+0x254/0x752
2011-01-25T13:16:58.195253+01:00 phy005 kernel: [<ffffffff8104816f>] ?
default_wake_function+0x12/0x14
2011-01-25T13:16:58.195256+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-01-25T13:16:58.195260+01:00 phy005 kernel: [<ffffffff8105e5cd>]
get_signal_to_deliver+0x3a6/0x3c8
2011-01-25T13:16:58.195264+01:00 phy005 kernel: [<ffffffff81092b6a>] ?
audit_buffer_free+0x75/0x7a
2011-01-25T13:16:58.196201+01:00 phy005 kernel: [<ffffffff81009038>]
do_signal+0x72/0x6b8
2011-01-25T13:16:58.196212+01:00 phy005 kernel: [<ffffffff8110efce>] ?
__fput+0x1cd/0x1dc
2011-01-25T13:16:58.196216+01:00 phy005 kernel: [<ffffffff810096a6>]
do_notify_resume+0x28/0x86
2011-01-25T13:16:58.196219+01:00 phy005 kernel: [<ffffffff81009f3e>]
int_signal+0x12/0x17
2011-01-25T13:17:00.006943+01:00 phy005 kernel: br1: port 39(vnet75)
entering disabled state
2011-01-25T13:17:00.511943+01:00 phy005 kernel: device vnet75 left
promiscuous mode
2011-01-25T13:17:00.511991+01:00 phy005 kernel: br1: port 39(vnet75)
entering disabled state
2011-01-25T13:17:18.748195+01:00 phy005 kernel: device vnet74 entered
promiscuous mode
2011-01-25T13:17:18.752020+01:00 phy005 kernel: br0: port 39(vnet74)
entering forwarding state
2011-01-25T13:17:18.754127+01:00 phy005 kernel: device vnet75 entered
promiscuous mode
2011-01-25T13:17:18.756087+01:00 phy005 kernel: br1: port 39(vnet75)
entering forwarding state
2011-01-25T13:17:24.416116+01:00 phy005 kernel: kvm: 16063: cpu0
unhandled wrmsr: 0x198 data 0
2011-01-25T13:17:24.416135+01:00 phy005 kernel: kvm: 16063: cpu1
unhandled wrmsr: 0x198 data 0
2011-01-25T13:17:29.051982+01:00 phy005 kernel: vnet74: no IPv6 routers present
2011-01-25T13:17:29.166986+01:00 phy005 kernel: vnet75: no IPv6 routers present
2011-01-25T15:01:38.735441+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at fffff6b192918010
2011-01-25T15:01:38.735756+01:00 phy005 kernel: IP:
[<ffffffffa0079826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-01-25T15:01:38.735762+01:00 phy005 kernel: PGD 0
2011-01-25T15:01:38.735766+01:00 phy005 kernel: Oops: 0000 [#3] SMP
2011-01-25T15:01:38.735770+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-01-25T15:01:38.735773+01:00 phy005 kernel: CPU 10
2011-01-25T15:01:38.735780+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core ioatdma
joydev iTCO_vendor_support dca serio_raw 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-01-25T15:01:38.735783+01:00 phy005 kernel:
2011-01-25T15:01:38.735788+01:00 phy005 kernel: Pid: 2465, comm:
qemu-kvm Tainted: G    B D W  2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-01-25T15:01:38.735792+01:00 phy005 kernel: RIP:
0010:[<ffffffffa0079826>]  [<ffffffffa0079826>]
kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-01-25T15:01:38.735796+01:00 phy005 kernel: RSP:
0018:ffff880c243cdb58  EFLAGS: 00010206
2011-01-25T15:01:38.735800+01:00 phy005 kernel: RAX: 00000cb192918000
RBX: ffff88030010b8c0 RCX: 0000000000000000
2011-01-25T15:01:38.735804+01:00 phy005 kernel: RDX: ffffea0000000000
RSI: ffff880310adfff8 RDI: ffff8803000f68f8
2011-01-25T15:01:38.735807+01:00 phy005 kernel: RBP: ffff880c243cdb88
R08: ffff880310adf018 R09: 0000000000000004
2011-01-25T15:01:38.735811+01:00 phy005 kernel: R10: 0000000000000000
R11: ffffea000ac288c0 R12: ffff880c201fc000
2011-01-25T15:01:38.735819+01:00 phy005 kernel: R13: ffff880310adfff8
R14: 00000000000001ff R15: 0000000000000000
2011-01-25T15:01:38.735823+01:00 phy005 kernel: FS:
0000000000000000(0000) GS:ffff8800020c0000(0000)
knlGS:0000000000000000
2011-01-25T15:01:38.735826+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 000000008005003b
2011-01-25T15:01:38.735830+01:00 phy005 kernel: CR2: fffff6b192918010
CR3: 0000000001a42000 CR4: 00000000000026e0
2011-01-25T15:01:38.735833+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-01-25T15:01:38.735836+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-01-25T15:01:38.735840+01:00 phy005 kernel: Process qemu-kvm (pid:
2465, threadinfo ffff880c243cc000, task ffff880c235f1770)
2011-01-25T15:01:38.735843+01:00 phy005 kernel: Stack:
2011-01-25T15:01:38.735847+01:00 phy005 kernel: 0000000000000002
ffff880c201fc000 ffff88030010b810 ffff880c201fe328
2011-01-25T15:01:38.735851+01:00 phy005 kernel: <0> ffff880c22d01568
ffff880c235f1770 ffff880c243cdbb8 ffffffffa0079a42
2011-01-25T15:01:38.735855+01:00 phy005 kernel: <0> ffffea002a7039c0
ffff880c201fc000 ffff880c201fc000 0000000000000001
2011-01-25T15:01:38.735858+01:00 phy005 kernel: Call Trace:
2011-01-25T15:01:38.735862+01:00 phy005 kernel: [<ffffffffa0079a42>]
kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-01-25T15:01:38.735866+01:00 phy005 kernel: [<ffffffffa006ecde>]
kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-01-25T15:01:38.735870+01:00 phy005 kernel: [<ffffffffa0064b0a>]
kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-01-25T15:01:38.735875+01:00 phy005 kernel: [<ffffffff810fac37>]
__mmu_notifier_release+0x4f/0x7b
2011-01-25T15:01:38.735879+01:00 phy005 kernel: [<ffffffff810e735d>]
exit_mmap+0x2c/0x132
2011-01-25T15:01:38.735882+01:00 phy005 kernel: [<ffffffff8104ad7a>]
mmput+0x5e/0xca
2011-01-25T15:01:38.735886+01:00 phy005 kernel: [<ffffffff8104f0d5>]
exit_mm+0x114/0x121
2011-01-25T15:01:38.735890+01:00 phy005 kernel: [<ffffffff81050bf5>]
do_exit+0x254/0x752
2011-01-25T15:01:38.735893+01:00 phy005 kernel: [<ffffffffa006709a>] ?
vcpu_put+0x28/0x2d [kvm]
2011-01-25T15:01:38.735897+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-01-25T15:01:38.735902+01:00 phy005 kernel: [<ffffffff8105e5cd>]
get_signal_to_deliver+0x3a6/0x3c8
2011-01-25T15:01:38.735906+01:00 phy005 kernel: [<ffffffff81009038>]
do_signal+0x72/0x6b8
2011-01-25T15:01:38.735910+01:00 phy005 kernel: [<ffffffff8111aa2f>] ?
vfs_ioctl+0x32/0xa6
2011-01-25T15:01:38.735914+01:00 phy005 kernel: [<ffffffff8111afa2>] ?
do_vfs_ioctl+0x483/0x4c9
2011-01-25T15:01:38.735918+01:00 phy005 kernel: [<ffffffff810096a6>]
do_notify_resume+0x28/0x86
2011-01-25T15:01:38.735922+01:00 phy005 kernel: [<ffffffff81009f3e>]
int_signal+0x12/0x17
2011-01-25T15:01:38.735928+01:00 phy005 kernel: Code: 41 5e 44 89 f8
41 5f c9 c3 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 ee 48 21 d0 48 ba 00
00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 <48> 8b 7c 10 10 e8 a3 f3
ff ff e9 06 fe ff ff 55 48 89 e5 41 57
2011-01-25T15:01:38.735938+01:00 phy005 kernel: RIP
[<ffffffffa0079826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-01-25T15:01:38.735942+01:00 phy005 kernel: RSP <ffff880c243cdb58>
2011-01-25T15:01:38.735946+01:00 phy005 kernel: CR2: fffff6b192918010
2011-01-25T15:01:38.735950+01:00 phy005 kernel: ---[ end trace
643b51f38991abef ]---
2011-01-25T15:01:38.735954+01:00 phy005 kernel: Fixing recursive fault
but reboot is needed!

and

2011-01-25T17:33:57.393780+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ffffea7192918310
2011-01-25T17:33:57.393888+01:00 phy005 kernel: IP:
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-01-25T17:33:57.393895+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-01-25T17:33:57.393897+01:00 phy005 kernel: Oops: 0000 [#4] SMP
2011-01-25T17:33:57.393900+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-01-25T17:33:57.393902+01:00 phy005 kernel: CPU 4
2011-01-25T17:33:57.393906+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt i2c_core ioatdma
joydev iTCO_vendor_support dca serio_raw 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-01-25T17:33:57.393913+01:00 phy005 kernel:
2011-01-25T17:33:57.393915+01:00 phy005 kernel: Pid: 3630, comm:
qemu-kvm Tainted: G    B D W  2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-01-25T17:33:57.393918+01:00 phy005 kernel: RIP:
0010:[<ffffffff81034880>]  [<ffffffff81034880>]
gup_pte_range+0x94/0xd3
2011-01-25T17:33:57.393920+01:00 phy005 kernel: RSP:
0018:ffff880bac24bab8  EFLAGS: 00010082
2011-01-25T17:33:57.393923+01:00 phy005 kernel: RAX: ffffea7192918310
RBX: 00003ffffffff000 RCX: 0000000000000007
2011-01-25T17:33:57.393925+01:00 phy005 kernel: RDX: 00007fce4fc00000
RSI: 00007fce4fbff000 RDI: 1603a0730500e067
2011-01-25T17:33:57.393927+01:00 phy005 kernel: RBP: ffff880bac24bad8
R08: ffff880bac24bbc8 R09: ffff880bac24bb84
2011-01-25T17:33:57.393929+01:00 phy005 kernel: R10: ffff880315507ff8
R11: ffffea0000000000 R12: 0000000000000207
2011-01-25T17:33:57.393931+01:00 phy005 kernel: R13: ffffc00000000fff
R14: 0000000000000007 R15: 0000000000000001
2011-01-25T17:33:57.393935+01:00 phy005 kernel: FS:
00007fce993a2700(0000) GS:ffff880655400000(0000)
knlGS:0000000000000000
2011-01-25T17:33:57.393938+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-01-25T17:33:57.393941+01:00 phy005 kernel: CR2: ffffea7192918310
CR3: 0000000bac235000 CR4: 00000000000026e0
2011-01-25T17:33:57.393944+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-01-25T17:33:57.393948+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-01-25T17:33:57.393951+01:00 phy005 kernel: Process qemu-kvm (pid:
3630, threadinfo ffff880bac24a000, task ffff880bac380000)
2011-01-25T17:33:57.393955+01:00 phy005 kernel: Stack:
2011-01-25T17:33:57.393958+01:00 phy005 kernel: 00007fce4fc00000
00007fce4fc00000 00007fce4fc00000 ffff880bac3b43e8
2011-01-25T17:33:57.393962+01:00 phy005 kernel: <0> ffff880bac24bb38
ffffffff81034a15 00007fce4fbfffff 00007fce4fbfffff
2011-01-25T17:33:57.393966+01:00 phy005 kernel: <0> ffff880bac24bb84
ffff880bac24bbc8 ffff880bac1bb9c8 ffff880bac2357f8
2011-01-25T17:33:57.393969+01:00 phy005 kernel: Call Trace:
2011-01-25T17:33:57.393973+01:00 phy005 kernel: [<ffffffff81034a15>]
gup_pud_range+0x156/0x192
2011-01-25T17:33:57.393977+01:00 phy005 kernel: [<ffffffff81034b15>]
get_user_pages_fast+0xc4/0x172
2011-01-25T17:33:57.393981+01:00 phy005 kernel: [<ffffffff8102f16b>] ?
device_change_notifier+0x5d/0x120
2011-01-25T17:33:57.393985+01:00 phy005 kernel: [<ffffffffa00656a7>]
hva_to_pfn+0x41/0x123 [kvm]
2011-01-25T17:33:57.393989+01:00 phy005 kernel: [<ffffffffa00657bd>] ?
gfn_to_hva+0x16/0x72 [kvm]
2011-01-25T17:33:57.393993+01:00 phy005 kernel: [<ffffffffa0065bfa>]
gfn_to_pfn+0x6a/0x6e [kvm]
2011-01-25T17:33:57.393997+01:00 phy005 kernel: [<ffffffffa007e862>]
tdp_page_fault+0x80/0x10c [kvm]
2011-01-25T17:33:57.393999+01:00 phy005 kernel: [<ffffffffa008450f>] ?
apic_update_ppr+0x22/0x57 [kvm]
2011-01-25T17:33:57.394002+01:00 phy005 kernel: [<ffffffff812c0437>] ?
device_find_child+0x12/0x81
2011-01-25T17:33:57.394004+01:00 phy005 kernel: [<ffffffffa007c5cd>]
kvm_mmu_page_fault+0x1f/0x98 [kvm]
2011-01-25T17:33:57.394007+01:00 phy005 kernel: [<ffffffffa00ba97a>]
handle_ept_violation+0xf9/0x102 [kvm_intel]
2011-01-25T17:33:57.394010+01:00 phy005 kernel: [<ffffffffa00bb401>]
vmx_handle_exit+0x204/0x23a [kvm_intel]
2011-01-25T17:33:57.394012+01:00 phy005 kernel: [<ffffffffa0075998>]
kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
2011-01-25T17:33:57.394015+01:00 phy005 kernel: [<ffffffffa00645ba>]
kvm_vcpu_ioctl+0xfd/0x56e [kvm]
2011-01-25T17:33:57.394017+01:00 phy005 kernel: [<ffffffff810206c4>] ?
lapic_next_event+0x1d/0x21
2011-01-25T17:33:57.394020+01:00 phy005 kernel: [<ffffffff81071435>] ?
clockevents_program_event+0x7a/0x83
2011-01-25T17:33:57.394023+01:00 phy005 kernel: [<ffffffff8107258d>] ?
tick_dev_program_event+0x3c/0xfc
2011-01-25T17:33:57.394026+01:00 phy005 kernel: [<ffffffff8111aa2f>]
vfs_ioctl+0x32/0xa6
2011-01-25T17:33:57.394030+01:00 phy005 kernel: [<ffffffff8111afa2>]
do_vfs_ioctl+0x483/0x4c9
2011-01-25T17:33:57.394034+01:00 phy005 kernel: [<ffffffff8111b03e>]
sys_ioctl+0x56/0x79
2011-01-25T17:33:57.394037+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-01-25T17:33:57.394040+01:00 phy005 kernel: Code: 21 d8 49 01 c2
49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-01-25T17:33:57.394043+01:00 phy005 kernel: RIP
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-01-25T17:33:57.394055+01:00 phy005 kernel: RSP <ffff880bac24bab8>
2011-01-25T17:33:57.394058+01:00 phy005 kernel: CR2: ffffea7192918310
2011-01-25T17:33:57.394060+01:00 phy005 kernel: ---[ end trace
643b51f38991abf0 ]---

Regards,

Ruben


* Re: EPT: Misconfiguration
  2011-01-25 18:29       ` Ruben Kerkhof
@ 2011-01-26  9:52         ` Avi Kivity
  2011-01-26 15:00           ` Ruben Kerkhof
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2011-01-26  9:52 UTC (permalink / raw)
  To: Ruben Kerkhof; +Cc: Marcelo Tosatti, kvm

On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
> >  When you say "suddenly", this was with no changes to software and hardware?
>
> The host software and hardware hasn't changed in the two months since
> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>
> We host customer vms on it though, so virtual machines come and go.
> Various operating systems, a mixture of Linux, FreeBSD and Windows
> 2008 R2. We have other machines with the same config without these
> problems though.

Are those other machines running a similar workload?

The traces look awfully like bad hardware, though that can also be 
explained by random memory corruption due to a bug.

> This time I have a few different messages though:
>
> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault: 0000 [#1] SMP
>
> RSI: 0000000000000000 RDI: 1603a07305001568
>
> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00<f0>  ff 4f 08 0f 94 c0 84
> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb

lock decl 0x8(%rdi)
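
The mnemonic above can be checked by hand against the four bytes that start at the `<f0>` marker in the quoted Code: line (`f0 ff 4f 08`). The sketch below is a hypothetical, deliberately minimal decoder written for this note. It assumes 64-bit mode with no REX prefix and only understands this one form: an optional LOCK prefix, opcode FF /1 (DEC r/m32), and a base register plus 8-bit displacement. It is not a general disassembler.

```python
# Minimal hand decoder for a single x86-64 instruction form (illustration only).
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_lock_dec(code: bytes) -> str:
    """Decode '[lock] decl disp8(%reg)' in AT&T syntax, or raise ValueError."""
    i = 0
    lock = code[i] == 0xF0          # F0 is the LOCK prefix
    if lock:
        i += 1
    if code[i] != 0xFF:             # FF is the opcode group that contains DEC
        raise ValueError("unsupported opcode")
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 1:                    # /1 selects DEC within the FF group
        raise ValueError("not a DEC instruction")
    if mod != 1 or rm == 4:         # only [reg + disp8], no SIB byte
        raise ValueError("unsupported addressing form")
    disp = int.from_bytes(code[i + 2:i + 3], "little", signed=True)
    prefix = "lock " if lock else ""
    return f"{prefix}decl {disp:#x}(%{REGS64[rm]})"


decoded = decode_lock_dec(bytes.fromhex("f0ff4f08"))
print(decoded)
```

The ModRM byte 0x4f splits into mod=01 (8-bit displacement), reg=001 (DEC) and rm=111 (rdi), so the faulting instruction dereferences whatever value RDI held at the time.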

%rdi is complete garbage; it looks like corruption again.  Strangely, it is 
similar to the bad spte from the previous trace, 0x1603a0730500d277:  
the upper 48 bits are identical and only the lower 16 bits differ.
> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
> page table at address 7f37b37ff000
> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067

Here are those magic 48 bits again, in the PTE entry.
> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1

Again.

> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067

Again.

However, these all came from a single boot, yes?  If so they can be the 
same corruption.  Please collect more traces, with reboots in between.
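
The repeated pattern can be checked mechanically. The sketch below takes the five suspicious values quoted in this message (the labels are only descriptive names chosen here) and splits each into its upper 48 and lower 16 bits.

```python
# Values copied verbatim from the traces quoted in this message.
suspects = {
    "spte, previous trace": 0x1603a0730500d277,
    "RDI, general protection fault": 0x1603a07305001568,
    "PTE, corrupted page table": 0x1603a0730500e067,
    "spte level 1, EPT misconfig": 0x1603a07305006277,
    "pte, bad page map": 0x1603a0730500d067,
}


def split_48_16(value: int) -> tuple[int, int]:
    """Return (upper 48 bits, lower 16 bits) of a 64-bit value."""
    return value >> 16, value & 0xFFFF


upper_parts = {split_48_16(v)[0] for v in suspects.values()}
lower_parts = [split_48_16(v)[1] for v in suspects.values()]

for name, value in suspects.items():
    upper, lower = split_48_16(value)
    print(f"{name:32s} upper48={upper:#014x} lower16={lower:#06x}")
```

All five values share the upper 48 bits 0x1603a0730500, and only the lower 16 bits vary.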

-- 
error compiling committee.c: too many arguments to function



* Re: EPT: Misconfiguration
  2011-01-26  9:52         ` Avi Kivity
@ 2011-01-26 15:00           ` Ruben Kerkhof
  2011-02-10 15:23             ` Ruben Kerkhof
  0 siblings, 1 reply; 19+ messages in thread
From: Ruben Kerkhof @ 2011-01-26 15:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@redhat.com> wrote:
> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>
>> >  When you say "suddenly", this was with no changes to software and
>> > hardware?
>>
>> The host software and hardware hasn't changed in the two months since
>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>
>> We host customer vms on it though, so virtual machines come and go.
>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>> 2008 R2. We have other machines with the same config without these
>> problems though.
>
> Are those other machines running a similar workload?

Yes, similar, or they're more heavily loaded.

On this machine, about half of the 48GB memory was used for virtual machines.

> The traces look awfully like bad hardware, though that can also be explained
> by random memory corruption due to a bug.

Yeah, that's what I'm expecting. We already replaced the memory; the next
step is to move the disks over to another server to make sure it's not
the board or the CPUs.

>> This time I have a few different messages though:
>>
>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>> 0000 [#1] SMP
>>
>> RSI: 0000000000000000 RDI: 1603a07305001568
>>
>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00<f0>  ff 4f 08 0f 94 c0 84
>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>
> lock decl 0x8(%rdi)
>
> %rdi is completely crap, looks like corruption again.  Strangely, it is
> similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
> upper 48 bits are identical, the lower 16 bits are different.:
>>
>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>> page table at address 7f37b37ff000
>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>
> Here are those magic 48 bits again, in the PTE entry.
>>
>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>
> Again.
>
>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
>
> Again.
>
> However, these all came from a single boot, yes?

Correct.

> If so they can be the same
> corruption.  Please collect more traces, with reboots in between.

Ok, thanks, will do.

Kind regards,

Ruben


* Re: EPT: Misconfiguration
  2011-01-26 15:00           ` Ruben Kerkhof
@ 2011-02-10 15:23             ` Ruben Kerkhof
  2011-02-13  2:07               ` Ruben Kerkhof
  2011-02-13 12:58               ` Avi Kivity
  0 siblings, 2 replies; 19+ messages in thread
From: Ruben Kerkhof @ 2011-02-10 15:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
> On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@redhat.com> wrote:
>> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>>
>>> >  When you say "suddenly", this was with no changes to software and
>>> > hardware?
>>>
>>> The host software and hardware hasn't changed in the two months since
>>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>>
>>> We host customer vms on it though, so virtual machines come and go.
>>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>>> 2008 R2. We have other machines with the same config without these
>>> problems though.
>>
>> Are those other machines running a similar workload?
>
> Yes, similar, or they're more heavily loaded.
>
> On this machine, about half of the 48GB memory was used for virtual machines.
>
>> The traces look awfully like bad hardware, though that can also be explained
>> by random memory corruption due to a bug.
>
> Yeah, that's what I'm expecting. We already replaced the memory, next
> step is to move the disks over to another server to make sure it's not
> the board or cpu's.
>
>>> This time I have a few different messages though:
>>>
>>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>>> 0000 [#1] SMP
>>>
>>> RSI: 0000000000000000 RDI: 1603a07305001568
>>>
>>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00<f0>  ff 4f 08 0f 94 c0 84
>>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>>
>> lock decl 0x8(%rdi)
>>
>> %rdi is completely crap, looks like corruption again.  Strangely, it is
>> similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
>> upper 48 bits are identical, the lower 16 bits are different.:
>>>
>>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>>> page table at address 7f37b37ff000
>>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>>
>> Here are those magic 48 bits again, in the PTE entry.
>>>
>>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>>
>> Again.
>>
>>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>>> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
>>
>> Again.
>>
>> However, these all came from a single boot, yes?
>
> Correct.
>
>> If so they can be the same
>> corruption.  Please collect more traces, with reboots in between.

This machine ran for a week without problems, but then we started
getting the following oopses again:

2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ffffea71929180e0
2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP
2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4
2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T19:45:35.222231+01:00 phy005 kernel:
2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP:
0010:[<ffffffff81034880>]  [<ffffffff81034880>]
gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP:
0018:ffff88060b9bda78  EFLAGS: 00010082
2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0
RBX: 00003ffffffff000 RCX: 0000000000000005
2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000
RSI: 00007fe54e3ff000 RDI: 1603a07305004067
2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98
R08: ffff880b94384560 R09: ffff88060b9bdb44
2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8
R11: ffffea0000000000 R12: 0000000000000205
2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff
R14: 0000000000000005 R15: 0000000000000000
2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS:
00007fe64cb0e700(0000) GS:ffff880655400000(0000)
knlGS:0000000000000000
2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0
CR3: 0000000bff06d000 CR4: 00000000000026e0
2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid:
3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0)
2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack:
2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000
00007fe54e400000 00007fe54e400000 ffff88053a0d2388
2011-02-06T19:45:35.222285+01:00 phy005 kernel: <0> ffff88060b9bdaf8
ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff
2011-02-06T19:45:35.222289+01:00 phy005 kernel: <0> ffff88060b9bdb44
ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8
2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace:
2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>]
gup_pud_range+0x156/0x192
2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>]
get_user_pages_fast+0xc4/0x172
2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
bio_add_page+0x36/0x38
2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>]
dio_get_page+0x54/0x127
2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>]
__blockdev_direct_IO+0x41d/0xa36
2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ?
x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>]
blkdev_direct_IO+0x4e/0x50
2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ?
blkdev_get_blocks+0x0/0x8d
2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>]
generic_file_direct_write+0xed/0x16d
2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>]
__generic_file_aio_write+0x196/0x281
2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ?
file_has_perm+0xa4/0xc6
2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>]
blkdev_aio_write+0x2a/0x69
2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>]
aio_rw_vect_retry+0x85/0x18e
2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>]
aio_run_iocb+0x77/0x10f
2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>]
do_io_submit+0x558/0x7ce
2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>]
sys_io_submit+0x10/0x12
2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2
49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP <ffff88060b9bda78>
2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0
2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace
beed2b54d0bb8a00 ]---

and

2011-02-06T19:47:15.023129+01:00 phy005 kernel: qemu-kvm: Corrupted
page table at address 7fbde15ff64c
2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD
612668067 PMD 5937b7067 PTE 1603a07305008067
2011-02-06T19:47:15.023214+01:00 phy005 kernel: Bad pagetable: 000d [#2] SMP
2011-02-06T19:47:15.023219+01:00 phy005 kernel: last sysfs file:
/sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host0/scsi_host/host0/stats
2011-02-06T19:47:15.023226+01:00 phy005 kernel: CPU 13
2011-02-06T19:47:15.023232+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T19:47:15.023236+01:00 phy005 kernel:
2011-02-06T19:47:15.023239+01:00 phy005 kernel: Pid: 3387, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-02-06T19:47:15.023244+01:00 phy005 kernel: RIP:
0033:[<00000000004abb73>]  [<00000000004abb73>] 0x4abb73
2011-02-06T19:47:15.023247+01:00 phy005 kernel: RSP:
002b:00007fbdf3c00680  EFLAGS: 00010206
2011-02-06T19:47:15.023251+01:00 phy005 kernel: RAX: 00007fbde15ff000
RBX: 000000000000064c RCX: 0000000001abe968
2011-02-06T19:47:15.023254+01:00 phy005 kernel: RDX: 0000000001abe850
RSI: 0000000000000000 RDI: 000000003d600000
2011-02-06T19:47:15.023257+01:00 phy005 kernel: RBP: 0000000001f2ab00
R08: 0000000000000003 R09: 0000000002000000
2011-02-06T19:47:15.023260+01:00 phy005 kernel: R10: 000000000000c050
R11: 00007fbdec000818 R12: 0000000000000025
2011-02-06T19:47:15.023269+01:00 phy005 kernel: R13: 0000000000000003
R14: 000000003d600640 R15: 0000000000000000
2011-02-06T19:47:15.023273+01:00 phy005 kernel: FS:
00007fbdf3c01700(0000) GS:ffff8806554a0000(0000)
knlGS:0000000000000000
2011-02-06T19:47:15.023276+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-02-06T19:47:15.023280+01:00 phy005 kernel: CR2: 00007fbde15ff64c
CR3: 0000000606858000 CR4: 00000000000026e0
2011-02-06T19:47:15.023283+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T19:47:15.023286+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T19:47:15.023290+01:00 phy005 kernel: Process qemu-kvm (pid:
3387, threadinfo ffff88060689e000, task ffff8805ff5a9770)
2011-02-06T19:47:15.023294+01:00 phy005 kernel:
2011-02-06T19:47:15.023296+01:00 phy005 kernel: RIP
[<00000000004abb73>] 0x4abb73
2011-02-06T19:47:15.023298+01:00 phy005 kernel: RSP <00007fbdf3c00680>
2011-02-06T19:47:15.023300+01:00 phy005 kernel: ---[ end trace
beed2b54d0bb8a01 ]---

followed by

2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at fffff6b192918010
2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP:
[<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883259+01:00 phy005 kernel: PGD 0
2011-02-06T21:20:32.883263+01:00 phy005 kernel: Oops: 0000 [#5] SMP
2011-02-06T21:20:32.883267+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-06T21:20:32.883271+01:00 phy005 kernel: CPU 8
2011-02-06T21:20:32.883278+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-06T21:20:32.883286+01:00 phy005 kernel:
2011-02-06T21:20:32.883290+01:00 phy005 kernel: Pid: 13247, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-02-06T21:20:32.883295+01:00 phy005 kernel: RIP:
0010:[<ffffffffa0078826>]  [<ffffffffa0078826>]
kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883300+01:00 phy005 kernel: RSP:
0018:ffff880312bdfb58  EFLAGS: 00010206
2011-02-06T21:20:32.883303+01:00 phy005 kernel: RAX: 00000cb192918000
RBX: ffff8802d16ae210 RCX: 0000000000000000
2011-02-06T21:20:32.883307+01:00 phy005 kernel: RDX: ffffea0000000000
RSI: ffff88060bb07ff8 RDI: 0000000000000200
2011-02-06T21:20:32.883311+01:00 phy005 kernel: RBP: ffff880312bdfb88
R08: dead000000100100 R09: 0000000000000004
2011-02-06T21:20:32.883315+01:00 phy005 kernel: R10: 0000000000000000
R11: 0000000000000010 R12: ffff880853ae0000
2011-02-06T21:20:32.883319+01:00 phy005 kernel: R13: ffff88060bb07ff8
R14: 00000000000001ff R15: 0000000000000000
2011-02-06T21:20:32.883323+01:00 phy005 kernel: FS:
0000000000000000(0000) GS:ffff880002080000(0000)
knlGS:0000000000000000
2011-02-06T21:20:32.883327+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 000000008005003b
2011-02-06T21:20:32.883331+01:00 phy005 kernel: CR2: fffff6b192918010
CR3: 0000000001a42000 CR4: 00000000000026e0
2011-02-06T21:20:32.883335+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-06T21:20:32.883338+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-06T21:20:32.883343+01:00 phy005 kernel: Process qemu-kvm (pid:
13247, threadinfo ffff880312bde000, task ffff880268ad8000)
2011-02-06T21:20:32.883347+01:00 phy005 kernel: Stack:
2011-02-06T21:20:32.883351+01:00 phy005 kernel: 0000000000000002
ffff880853ae0000 ffff8802d16ae160 ffff880853ae2328
2011-02-06T21:20:32.883355+01:00 phy005 kernel: <0> ffff880c22d426e8
ffff880268ad8000 ffff880312bdfbb8 ffffffffa0078a42
2011-02-06T21:20:32.883358+01:00 phy005 kernel: <0> ffffea00134a16c8
ffff880853ae0000 ffff880853ae0000 0000000000000001
2011-02-06T21:20:32.883362+01:00 phy005 kernel: Call Trace:
2011-02-06T21:20:32.883366+01:00 phy005 kernel: [<ffffffffa0078a42>]
kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-02-06T21:20:32.883371+01:00 phy005 kernel: [<ffffffffa006dcde>]
kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-02-06T21:20:32.883375+01:00 phy005 kernel: [<ffffffffa0063b0a>]
kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-02-06T21:20:32.883379+01:00 phy005 kernel: [<ffffffff810fac37>]
__mmu_notifier_release+0x4f/0x7b
2011-02-06T21:20:32.883383+01:00 phy005 kernel: [<ffffffff810e735d>]
exit_mmap+0x2c/0x132
2011-02-06T21:20:32.883386+01:00 phy005 kernel: [<ffffffff8104ad7a>]
mmput+0x5e/0xca
2011-02-06T21:20:32.883390+01:00 phy005 kernel: [<ffffffff8104f0d5>]
exit_mm+0x114/0x121
2011-02-06T21:20:32.883394+01:00 phy005 kernel: [<ffffffff81050bf5>]
do_exit+0x254/0x752
2011-02-06T21:20:32.883398+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-02-06T21:20:32.883403+01:00 phy005 kernel: [<ffffffff8105e5cd>]
get_signal_to_deliver+0x3a6/0x3c8
2011-02-06T21:20:32.883406+01:00 phy005 kernel: [<ffffffff81009038>]
do_signal+0x72/0x6b8
2011-02-06T21:20:32.883410+01:00 phy005 kernel: [<ffffffff8111aa2f>] ?
vfs_ioctl+0x32/0xa6
2011-02-06T21:20:32.883413+01:00 phy005 kernel: [<ffffffff8111afa2>] ?
do_vfs_ioctl+0x483/0x4c9
2011-02-06T21:20:32.883416+01:00 phy005 kernel: [<ffffffff810096a6>]
do_notify_resume+0x28/0x86
2011-02-06T21:20:32.883420+01:00 phy005 kernel: [<ffffffff81009f3e>]
int_signal+0x12/0x17
2011-02-06T21:20:32.883426+01:00 phy005 kernel: Code: 41 5e 44 89 f8
41 5f c9 c3 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 ee 48 21 d0 48 ba 00
00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 <48> 8b 7c 10 10 e8 a3 f3
ff ff e9 06 fe ff ff 55 48 89 e5 41 57
2011-02-06T21:20:32.883431+01:00 phy005 kernel: RIP
[<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
2011-02-06T21:20:32.883434+01:00 phy005 kernel: RSP <ffff880312bdfb58>
2011-02-06T21:20:32.883437+01:00 phy005 kernel: CR2: fffff6b192918010
2011-02-06T21:20:32.883441+01:00 phy005 kernel: ---[ end trace
beed2b54d0bb8a04 ]---
2011-02-06T21:20:32.883444+01:00 phy005 kernel: Fixing recursive fault
but reboot is needed!

after which we rebooted the machine and replaced the motherboard and
cpus (we already replaced the memory before).
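
The faulting addresses in the two kernel oopses from 2011-02-06 appear to follow directly from the corrupted entries. The sketch below is one reading of the Code: bytes and register dumps, not something confirmed in the thread: both functions mask the entry down to a page frame number, multiply it by 0x38, and add the base ffffea0000000000 (R11 and RDX in the dumps). Treating 0x38 as the size of one memmap entry is an assumption. The kvm_mmu_zap_page trace does not print its spte, so the value used for it is hypothetical: the recurring upper-48-bit pattern with the low 16 bits zeroed.

```python
MASK64 = (1 << 64) - 1
MEMMAP_BASE = 0xffffea0000000000   # R11 / RDX in the register dumps
ENTRY_SIZE = 0x38                  # multiplier seen in the Code: bytes (assumed entry size)


def memmap_entry_addr(entry: int, pfn_mask: int) -> int:
    """Address computed from a page-table entry: base + pfn * entry size."""
    pfn = (entry & pfn_mask) >> 12
    return (MEMMAP_BASE + pfn * ENTRY_SIZE) & MASK64


# gup_pte_range: RDI holds the pte, RBX holds the mask, and RAX is dereferenced.
gup_fault = memmap_entry_addr(0x1603a07305004067, 0x00003ffffffff000)

# kvm_mmu_zap_page: the load is from offset 0x10 of the computed entry.
# The spte below is hypothetical (not printed in the trace).
zap_fault = memmap_entry_addr(0x1603a07305000000, 0x000ffffffffff000) + 0x10

print(hex(gup_fault))
print(hex(zap_fault))
```

Under these assumptions the two results are ffffea71929180e0 and fffff6b192918010, which match the CR2 values reported in the respective traces.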

But 2 days ago we got this oops:

2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle
kernel paging request at ffffea71929181c0
2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP:
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0
2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP
2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8
2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-08T15:56:19.902770+01:00 phy005 kernel:
2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP:
0010:[<ffffffff81034880>]  [<ffffffff81034880>]
gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP:
0018:ffff880c21bc1a78  EFLAGS: 00010086
2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0
RBX: 00003ffffffff000 RCX: 0000000000000005
2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000
RSI: 00007fa2ca1ff000 RDI: 1603a07305008067
2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98
R08: ffff88060fdfad60 R09: ffff880c21bc1b44
2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8
R11: ffffea0000000000 R12: 0000000000000205
2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff
R14: 0000000000000005 R15: 0000000000000000
2011-02-08T15:56:19.902810+01:00 phy005 kernel: FS:
00007fa2d8724700(0000) GS:ffff880002080000(0000)
knlGS:0000000000000000
2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
002b CR0: 0000000080050033
2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0
CR3: 0000000c231f9000 CR4: 00000000000026e0
2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid:
3346, threadinfo ffff880c21bc0000, task ffff880c2264ddc0)
2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack:
2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000
00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280
2011-02-08T15:56:19.902848+01:00 phy005 kernel: <0> ffff880c21bc1af8
ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff
2011-02-08T15:56:19.902852+01:00 phy005 kernel: <0> ffff880c21bc1b44
ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8
2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace:
2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>]
gup_pud_range+0x156/0x192
2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>]
get_user_pages_fast+0xc4/0x172
2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
bio_add_page+0x36/0x38
2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>]
dio_get_page+0x54/0x127
2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>]
__blockdev_direct_IO+0x41d/0xa36
2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ?
x86_emulate_insn+0x1ff8/0x2d61 [kvm]
2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>]
blkdev_direct_IO+0x4e/0x50
2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ?
blkdev_get_blocks+0x0/0x8d
2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>]
generic_file_direct_write+0xed/0x16d
2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>]
__generic_file_aio_write+0x196/0x281
2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ?
blkdev_aio_write+0x0/0x69
2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>]
aio_rw_vect_retry+0x85/0x18e
2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>]
aio_run_iocb+0x77/0x10f
2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>]
do_io_submit+0x558/0x7ce
2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>]
sys_io_submit+0x10/0x12
2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2
49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
04 48 8b 78 10 f0 ff 47 08 49 63 39 48
2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP
[<ffffffff81034880>] gup_pte_range+0x94/0xd3
2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP <ffff880c21bc1a78>
2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0
2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace
174c28940e9fd0a7 ]---
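
The faulting address in this oops can be reproduced from the register
dump, which suggests the crash follows directly from the bad PTE value
in RDI. The Python sketch below is only a reading of the Code: bytes
(and %rbx,%rax; shr $12; imul $0x38; add %r11,%rax). Treating RBX as
the PFN mask, R11 as the vmemmap base and 0x38 as the struct page size
are assumptions taken from that disassembly, not checked against the
2.6.34.7 source:

```python
# Assumptions, all read off the 2011-02-08 register dump and Code: bytes:
PFN_MASK = 0x00003ffffffff000      # RBX in the oops
VMEMMAP_BASE = 0xffffea0000000000  # R11 in the oops
STRUCT_PAGE_SIZE = 0x38            # immediate of the imul in the Code: bytes
PAGE_SHIFT = 12                    # immediate of the shr in the Code: bytes


def page_struct_addr(pte):
    """Address of the struct page that the given PTE value would point at."""
    pfn = (pte & PFN_MASK) >> PAGE_SHIFT
    # Keep the result within 64 bits, as the CPU would.
    return (VMEMMAP_BASE + pfn * STRUCT_PAGE_SIZE) & 0xffffffffffffffff


bad_pte = 0x1603a07305008067       # RDI in the oops
reported_cr2 = 0xffffea71929181c0  # CR2 in the oops

computed = page_struct_addr(bad_pte)
print(f"computed {computed:#018x}, reported CR2 {reported_cr2:#018x}")
print("match" if computed == reported_cr2 else "no match")
```

The same arithmetic applied to the RDI value from the 2011-02-06
gup_pte_range oops (1603a07305004067) gives ffffea71929180e0, which is
the CR2 reported there.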

and yesterday this one:

2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle
kernel NULL pointer dereference at (null)
2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP:
[<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0
2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP
2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/topology/thread_siblings
2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2
2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-09T07:40:15.636663+01:00 phy005 kernel:
2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm:
qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
X8DTU/X8DTU
2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP:
0010:[<ffffffffa0082db8>]  [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e
[kvm]
2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP:
0018:ffff88061cbcbcd8  EFLAGS: 00010246
2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000
RBX: 1603a07305004fff RCX: ffff88061cbcbd08
2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023
RSI: 1603a07305004fff RDI: 0000000000000000
2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8
R08: 0000000000000023 R09: 0000000000000000
2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000
R11: ffffffffa0082c7f R12: 0000000000000001
2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763
R14: ffff8809b8b01ce0 R15: 0000000000000000
2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS:
0000000000000000(0000) GS:ffff880002040000(0000)
knlGS:0000000000000000
2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 000000008005003b
2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000
CR3: 0000000001a42000 CR4: 00000000000026e0
2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid:
2572, threadinfo ffff88061cbca000, task ffff88061cf04650)
2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack:
2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8
ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9
2011-02-09T07:40:15.636718+01:00 phy005 kernel: <0> ffff8809b8b01ce0
ffff880c23984000 ffff88036c471ff8 00000000000001ff
2011-02-09T07:40:15.636721+01:00 phy005 kernel: <0> ffff88061cbcbd58
ffffffffa008363b 0000000000000200 ffff880c23984000
2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace:
2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>]
rmap_remove+0xa3/0x1a0 [kvm]
2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>]
kvm_mmu_zap_page+0x9f/0x299 [kvm]
2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>]
kvm_mmu_zap_all+0x35/0x60 [kvm]
2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>]
kvm_arch_flush_shadow+0x16/0x22 [kvm]
2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>]
kvm_mmu_notifier_release+0x31/0x44 [kvm]
2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>]
__mmu_notifier_release+0x4f/0x7b
2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>]
exit_mmap+0x2c/0x132
2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>]
mmput+0x5e/0xca
2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>]
exit_mm+0x114/0x121
2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>]
do_exit+0x254/0x752
2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ?
apic_timer_interrupt+0xe/0x20
2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>]
do_group_exit+0x81/0xab
2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>]
sys_exit_group+0x17/0x1b
2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8
01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89
f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d <48> 2b 18 48 c1 e3 03 48
03 58 18 eb 39 41 8d 4c 24 ff be 01 00
2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP
[<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP <ffff88061cbcbcd8>
2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000
2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace
174c28940e9fd0a9 ]---
2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault
but reboot is needed!

So it doesn't seem to be a hardware problem, since we have replaced all of that.
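
The suspicious values from all of these traces, before and after the
hardware replacement, can be compared mechanically. A small Python
sketch, with the values copied by hand from the logs in this thread
(the labels are informal and not kernel terminology), checks whether
they share the upper 48 bits that Avi pointed out:

```python
# Values copied by hand from the kernel logs quoted in this thread.
# The labels are informal descriptions, not kernel terminology.
suspect_values = {
    "2011-01-20 EPT spte (level 1)": 0x1603a0730500d277,
    "2011-01-25 GPF, RDI": 0x1603a07305001568,
    "2011-01-25 corrupted page table, PTE": 0x1603a0730500e067,
    "2011-01-25 EPT spte (level 1)": 0x1603a07305006277,
    "2011-01-25 bad page map, pte": 0x1603a0730500d067,
    "2011-02-06 gup_pte_range oops, RDI": 0x1603a07305004067,
    "2011-02-06 corrupted page table, PTE": 0x1603a07305008067,
    "2011-02-08 gup_pte_range oops, RDI": 0x1603a07305008067,
    "2011-02-09 gfn_to_rmap oops, RBX": 0x1603a07305004fff,
}


def split_48_16(value):
    """Split a 64-bit value into its upper 48 and lower 16 bits."""
    return value >> 16, value & 0xffff


prefixes = {split_48_16(v)[0] for v in suspect_values.values()}

for label, value in suspect_values.items():
    upper, lower = split_48_16(value)
    print(f"{label:40s} upper48={upper:#014x} lower16={lower:#06x}")
print("distinct upper-48-bit prefixes:", [hex(p) for p in sorted(prefixes)])
```

A single distinct prefix across boots and across the memory,
motherboard and CPU swaps would point at the same value being written
each time, rather than at random bit flips.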

Kind regards,

Ruben


* Re: EPT: Misconfiguration
  2011-02-10 15:23             ` Ruben Kerkhof
@ 2011-02-13  2:07               ` Ruben Kerkhof
  2011-02-13 13:03                 ` Avi Kivity
  2011-02-13 12:58               ` Avi Kivity
  1 sibling, 1 reply; 19+ messages in thread
From: Ruben Kerkhof @ 2011-02-13  2:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On Thu, Feb 10, 2011 at 16:23, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
> On Wed, Jan 26, 2011 at 16:00, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
>> On Wed, Jan 26, 2011 at 10:52, Avi Kivity <avi@redhat.com> wrote:
>>> On 01/25/2011 08:29 PM, Ruben Kerkhof wrote:
>>>>
>>>> >  When you say "suddenly", this was with no changes to software and
>>>> > hardware?
>>>>
>>>> The host software and hardware hasn't changed in the two months since
>>>> the machine has been running. 2.6.34.7 kernel and qemu-kvm 0.13.
>>>>
>>>> We host customer vms on it though, so virtual machines come and go.
>>>> Various operating systems, a mixture of Linux, FreeBSD and Windows
>>>> 2008 R2. We have other machines with the same config without these
>>>> problems though.
>>>
>>> Are those other machines running a similar workload?
>>
>> Yes, similar, or they're more heavily loaded.
>>
>> On this machine, about half of the 48GB memory was used for virtual machines.
>>
>>> The traces look awfully like bad hardware, though that can also be explained
>>> by random memory corruption due to a bug.
>>
>> Yeah, that's what I'm expecting. We already replaced the memory, next
>> step is to move the disks over to another server to make sure it's not
>> the board or cpu's.
>>
>>>> This time I have a few different messages though:
>>>>
>>>> 2011-01-25T11:58:50.001208+01:00 phy005 kernel: general protection fault:
>>>> 0000 [#1] SMP
>>>>
>>>> RSI: 0000000000000000 RDI: 1603a07305001568
>>>>
>>>> 2011-01-25T11:58:50.001486+01:00 phy005 kernel: Code: ff ff 41 8b 46
>>>> 08 41 29 06 4c 89 e7 57 9d 0f 1f 44 00 00 48 83 c4 18 5b 41 5c 41 5d
>>>> 41 5e 41 5f c9 c3 55 48 89 e5 0f 1f 44 00 00<f0>  ff 4f 08 0f 94 c0 84
>>>> c0 74 10 85 f6 75 07 e8 63 fe ff ff eb
>>>
>>> lock decl 0x8(%rdi)
>>>
>>> %rdi is completely crap, looks like corruption again.  Strangely, it is
>>> similar to the bad spte from the previous trace: 0x1603a0730500d277.  The
>>> upper 48 bits are identical, the lower 16 bits are different.:
>>>>
>>>> 2011-01-25T12:06:32.673937+01:00 phy005 kernel: qemu-kvm: Corrupted
>>>> page table at address 7f37b37ff000
>>>> 2011-01-25T12:06:32.673959+01:00 phy005 kernel: PGD c201d1067 PUD
>>>> 94e538067 PMD 61e5bf067 PTE 1603a0730500e067
>>>
>>> Here are those magic 48 bits again, in the PTE entry.
>>>>
>>>> 2011-01-25T12:38:49.416943+01:00 phy005 kernel: EPT: Misconfiguration.
>>>> 2011-01-25T12:38:49.417518+01:00 phy005 kernel: EPT: GPA: 0x2abff038
>>>> 2011-01-25T12:38:49.417526+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x5f49e9007 level 4
>>>> 2011-01-25T12:38:49.417532+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x5db595007 level 3
>>>> 2011-01-25T12:38:49.417553+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x5d5da7007 level 2
>>>> 2011-01-25T12:38:49.417558+01:00 phy005 kernel:
>>>> ept_misconfig_inspect_spte: spte 0x1603a07305006277 level 1
>>>
>>> Again.
>>>
>>>> 2011-01-25T13:16:58.192440+01:00 phy005 kernel: BUG: Bad page map in
>>>> process qemu-kvm  pte:1603a0730500d067 pmd:61059f067
>>>
>>> Again.
>>>
>>> However, these all came from a single boot, yes?
>>
>> Correct.
>>
>>> If so they can be the same
>>> corruption.  Please collect more traces, with reboots in between.
>
> This machine has been running for a week without problems, but then we
> started to get the following oopses again:
>
> 2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
> kernel paging request at ffffea71929180e0
> 2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
> 2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP
> 2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/topology/thread_siblings
> 2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4
> 2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
> iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-06T19:45:35.222231+01:00 phy005 kernel:
> 2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm:
> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
> 2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP:
> 0010:[<ffffffff81034880>]  [<ffffffff81034880>]
> gup_pte_range+0x94/0xd3
> 2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP:
> 0018:ffff88060b9bda78  EFLAGS: 00010082
> 2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0
> RBX: 00003ffffffff000 RCX: 0000000000000005
> 2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000
> RSI: 00007fe54e3ff000 RDI: 1603a07305004067
> 2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98
> R08: ffff880b94384560 R09: ffff88060b9bdb44
> 2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8
> R11: ffffea0000000000 R12: 0000000000000205
> 2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff
> R14: 0000000000000005 R15: 0000000000000000
> 2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS:
> 00007fe64cb0e700(0000) GS:ffff880655400000(0000)
> knlGS:0000000000000000
> 2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
> 002b CR0: 0000000080050033
> 2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0
> CR3: 0000000bff06d000 CR4: 00000000000026e0
> 2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid:
> 3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0)
> 2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack:
> 2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000
> 00007fe54e400000 00007fe54e400000 ffff88053a0d2388
> 2011-02-06T19:45:35.222285+01:00 phy005 kernel: <0> ffff88060b9bdaf8
> ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff
> 2011-02-06T19:45:35.222289+01:00 phy005 kernel: <0> ffff88060b9bdb44
> ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8
> 2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace:
> 2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>]
> gup_pud_range+0x156/0x192
> 2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>]
> get_user_pages_fast+0xc4/0x172
> 2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
> bio_add_page+0x36/0x38
> 2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>]
> dio_get_page+0x54/0x127
> 2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>]
> __blockdev_direct_IO+0x41d/0xa36
> 2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ?
> x86_emulate_insn+0x1ff8/0x2d61 [kvm]
> 2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>]
> blkdev_direct_IO+0x4e/0x50
> 2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ?
> blkdev_get_blocks+0x0/0x8d
> 2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>]
> generic_file_direct_write+0xed/0x16d
> 2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>]
> __generic_file_aio_write+0x196/0x281
> 2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ?
> file_has_perm+0xa4/0xc6
> 2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>]
> blkdev_aio_write+0x2a/0x69
> 2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>]
> aio_rw_vect_retry+0x85/0x18e
> 2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>]
> aio_run_iocb+0x77/0x10f
> 2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>]
> do_io_submit+0x558/0x7ce
> 2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>]
> sys_io_submit+0x10/0x12
> 2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2
> 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
> 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
> 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
> 2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP <ffff88060b9bda78>
> 2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0
> 2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace
> beed2b54d0bb8a00 ]---
>
> and
>
> 2011-02-06T19:47:15.023129+01:00 phy005 kernel: qemu-kvm: Corrupted
> page table at address 7fbde15ff64c
> 2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD
> 612668067 PMD 5937b7067 PTE 1603a07305008067
> 2011-02-06T19:47:15.023214+01:00 phy005 kernel: Bad pagetable: 000d [#2] SMP
> 2011-02-06T19:47:15.023219+01:00 phy005 kernel: last sysfs file:
> /sys/devices/pci0000:00/0000:00:09.0/0000:05:00.0/host0/scsi_host/host0/stats
> 2011-02-06T19:47:15.023226+01:00 phy005 kernel: CPU 13
> 2011-02-06T19:47:15.023232+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
> iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-06T19:47:15.023236+01:00 phy005 kernel:
> 2011-02-06T19:47:15.023239+01:00 phy005 kernel: Pid: 3387, comm:
> qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
> X8DTU/X8DTU
> 2011-02-06T19:47:15.023244+01:00 phy005 kernel: RIP:
> 0033:[<00000000004abb73>]  [<00000000004abb73>] 0x4abb73
> 2011-02-06T19:47:15.023247+01:00 phy005 kernel: RSP:
> 002b:00007fbdf3c00680  EFLAGS: 00010206
> 2011-02-06T19:47:15.023251+01:00 phy005 kernel: RAX: 00007fbde15ff000
> RBX: 000000000000064c RCX: 0000000001abe968
> 2011-02-06T19:47:15.023254+01:00 phy005 kernel: RDX: 0000000001abe850
> RSI: 0000000000000000 RDI: 000000003d600000
> 2011-02-06T19:47:15.023257+01:00 phy005 kernel: RBP: 0000000001f2ab00
> R08: 0000000000000003 R09: 0000000002000000
> 2011-02-06T19:47:15.023260+01:00 phy005 kernel: R10: 000000000000c050
> R11: 00007fbdec000818 R12: 0000000000000025
> 2011-02-06T19:47:15.023269+01:00 phy005 kernel: R13: 0000000000000003
> R14: 000000003d600640 R15: 0000000000000000
> 2011-02-06T19:47:15.023273+01:00 phy005 kernel: FS:
> 00007fbdf3c01700(0000) GS:ffff8806554a0000(0000)
> knlGS:0000000000000000
> 2011-02-06T19:47:15.023276+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
> 002b CR0: 0000000080050033
> 2011-02-06T19:47:15.023280+01:00 phy005 kernel: CR2: 00007fbde15ff64c
> CR3: 0000000606858000 CR4: 00000000000026e0
> 2011-02-06T19:47:15.023283+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-06T19:47:15.023286+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-06T19:47:15.023290+01:00 phy005 kernel: Process qemu-kvm (pid:
> 3387, threadinfo ffff88060689e000, task ffff8805ff5a9770)
> 2011-02-06T19:47:15.023294+01:00 phy005 kernel:
> 2011-02-06T19:47:15.023296+01:00 phy005 kernel: RIP
> [<00000000004abb73>] 0x4abb73
> 2011-02-06T19:47:15.023298+01:00 phy005 kernel: RSP <00007fbdf3c00680>
> 2011-02-06T19:47:15.023300+01:00 phy005 kernel: ---[ end trace
> beed2b54d0bb8a01 ]---
>
> followed by
>
> 2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle
> kernel paging request at fffff6b192918010
> 2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP:
> [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
> 2011-02-06T21:20:32.883259+01:00 phy005 kernel: PGD 0
> 2011-02-06T21:20:32.883263+01:00 phy005 kernel: Oops: 0000 [#5] SMP
> 2011-02-06T21:20:32.883267+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/topology/thread_siblings
> 2011-02-06T21:20:32.883271+01:00 phy005 kernel: CPU 8
> 2011-02-06T21:20:32.883278+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
> iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-06T21:20:32.883286+01:00 phy005 kernel:
> 2011-02-06T21:20:32.883290+01:00 phy005 kernel: Pid: 13247, comm:
> qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
> X8DTU/X8DTU
> 2011-02-06T21:20:32.883295+01:00 phy005 kernel: RIP:
> 0010:[<ffffffffa0078826>]  [<ffffffffa0078826>]
> kvm_mmu_zap_page+0x28a/0x299 [kvm]
> 2011-02-06T21:20:32.883300+01:00 phy005 kernel: RSP:
> 0018:ffff880312bdfb58  EFLAGS: 00010206
> 2011-02-06T21:20:32.883303+01:00 phy005 kernel: RAX: 00000cb192918000
> RBX: ffff8802d16ae210 RCX: 0000000000000000
> 2011-02-06T21:20:32.883307+01:00 phy005 kernel: RDX: ffffea0000000000
> RSI: ffff88060bb07ff8 RDI: 0000000000000200
> 2011-02-06T21:20:32.883311+01:00 phy005 kernel: RBP: ffff880312bdfb88
> R08: dead000000100100 R09: 0000000000000004
> 2011-02-06T21:20:32.883315+01:00 phy005 kernel: R10: 0000000000000000
> R11: 0000000000000010 R12: ffff880853ae0000
> 2011-02-06T21:20:32.883319+01:00 phy005 kernel: R13: ffff88060bb07ff8
> R14: 00000000000001ff R15: 0000000000000000
> 2011-02-06T21:20:32.883323+01:00 phy005 kernel: FS:
> 0000000000000000(0000) GS:ffff880002080000(0000)
> knlGS:0000000000000000
> 2011-02-06T21:20:32.883327+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
> 002b CR0: 000000008005003b
> 2011-02-06T21:20:32.883331+01:00 phy005 kernel: CR2: fffff6b192918010
> CR3: 0000000001a42000 CR4: 00000000000026e0
> 2011-02-06T21:20:32.883335+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-06T21:20:32.883338+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-06T21:20:32.883343+01:00 phy005 kernel: Process qemu-kvm (pid:
> 13247, threadinfo ffff880312bde000, task ffff880268ad8000)
> 2011-02-06T21:20:32.883347+01:00 phy005 kernel: Stack:
> 2011-02-06T21:20:32.883351+01:00 phy005 kernel: 0000000000000002
> ffff880853ae0000 ffff8802d16ae160 ffff880853ae2328
> 2011-02-06T21:20:32.883355+01:00 phy005 kernel: <0> ffff880c22d426e8
> ffff880268ad8000 ffff880312bdfbb8 ffffffffa0078a42
> 2011-02-06T21:20:32.883358+01:00 phy005 kernel: <0> ffffea00134a16c8
> ffff880853ae0000 ffff880853ae0000 0000000000000001
> 2011-02-06T21:20:32.883362+01:00 phy005 kernel: Call Trace:
> 2011-02-06T21:20:32.883366+01:00 phy005 kernel: [<ffffffffa0078a42>]
> kvm_mmu_zap_all+0x35/0x60 [kvm]
> 2011-02-06T21:20:32.883371+01:00 phy005 kernel: [<ffffffffa006dcde>]
> kvm_arch_flush_shadow+0x16/0x22 [kvm]
> 2011-02-06T21:20:32.883375+01:00 phy005 kernel: [<ffffffffa0063b0a>]
> kvm_mmu_notifier_release+0x31/0x44 [kvm]
> 2011-02-06T21:20:32.883379+01:00 phy005 kernel: [<ffffffff810fac37>]
> __mmu_notifier_release+0x4f/0x7b
> 2011-02-06T21:20:32.883383+01:00 phy005 kernel: [<ffffffff810e735d>]
> exit_mmap+0x2c/0x132
> 2011-02-06T21:20:32.883386+01:00 phy005 kernel: [<ffffffff8104ad7a>]
> mmput+0x5e/0xca
> 2011-02-06T21:20:32.883390+01:00 phy005 kernel: [<ffffffff8104f0d5>]
> exit_mm+0x114/0x121
> 2011-02-06T21:20:32.883394+01:00 phy005 kernel: [<ffffffff81050bf5>]
> do_exit+0x254/0x752
> 2011-02-06T21:20:32.883398+01:00 phy005 kernel: [<ffffffff81051174>]
> do_group_exit+0x81/0xab
> 2011-02-06T21:20:32.883403+01:00 phy005 kernel: [<ffffffff8105e5cd>]
> get_signal_to_deliver+0x3a6/0x3c8
> 2011-02-06T21:20:32.883406+01:00 phy005 kernel: [<ffffffff81009038>]
> do_signal+0x72/0x6b8
> 2011-02-06T21:20:32.883410+01:00 phy005 kernel: [<ffffffff8111aa2f>] ?
> vfs_ioctl+0x32/0xa6
> 2011-02-06T21:20:32.883413+01:00 phy005 kernel: [<ffffffff8111afa2>] ?
> do_vfs_ioctl+0x483/0x4c9
> 2011-02-06T21:20:32.883416+01:00 phy005 kernel: [<ffffffff810096a6>]
> do_notify_resume+0x28/0x86
> 2011-02-06T21:20:32.883420+01:00 phy005 kernel: [<ffffffff81009f3e>]
> int_signal+0x12/0x17
> 2011-02-06T21:20:32.883426+01:00 phy005 kernel: Code: 41 5e 44 89 f8
> 41 5f c9 c3 48 ba 00 f0 ff ff ff ff 0f 00 4c 89 ee 48 21 d0 48 ba 00
> 00 00 00 00 ea ff ff 48 c1 e8 0c 48 6b c0 38 <48> 8b 7c 10 10 e8 a3 f3
> ff ff e9 06 fe ff ff 55 48 89 e5 41 57
> 2011-02-06T21:20:32.883431+01:00 phy005 kernel: RIP
> [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
> 2011-02-06T21:20:32.883434+01:00 phy005 kernel: RSP <ffff880312bdfb58>
> 2011-02-06T21:20:32.883437+01:00 phy005 kernel: CR2: fffff6b192918010
> 2011-02-06T21:20:32.883441+01:00 phy005 kernel: ---[ end trace
> beed2b54d0bb8a04 ]---
> 2011-02-06T21:20:32.883444+01:00 phy005 kernel: Fixing recursive fault
> but reboot is needed!
>
> after which we rebooted the machine and replaced the motherboard and
> cpus (we already replaced the memory before).
>
> But 2 days ago we got this oops:
>
> 2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle
> kernel paging request at ffffea71929181c0
> 2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP:
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0
> 2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP
> 2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
> 2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8
> 2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
> iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-08T15:56:19.902770+01:00 phy005 kernel:
> 2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm:
> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
> 2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP:
> 0010:[<ffffffff81034880>]  [<ffffffff81034880>]
> gup_pte_range+0x94/0xd3
> 2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP:
> 0018:ffff880c21bc1a78  EFLAGS: 00010086
> 2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0
> RBX: 00003ffffffff000 RCX: 0000000000000005
> 2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000
> RSI: 00007fa2ca1ff000 RDI: 1603a07305008067
> 2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98
> R08: ffff88060fdfad60 R09: ffff880c21bc1b44
> 2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8
> R11: ffffea0000000000 R12: 0000000000000205
> 2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff
> R14: 0000000000000005 R15: 0000000000000000
> 2011-02-08T15:56:19.902810+01:00 phy005 kernel: FS:
> 00007fa2d8724700(0000) GS:ffff880002080000(0000)
> knlGS:0000000000000000
> 2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
> 002b CR0: 0000000080050033
> 2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0
> CR3: 0000000c231f9000 CR4: 00000000000026e0
> 2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid:
> 3346, threadinfo ffff880c21bc0000, task ffff880c2264ddc0)
> 2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack:
> 2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000
> 00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280
> 2011-02-08T15:56:19.902848+01:00 phy005 kernel: <0> ffff880c21bc1af8
> ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff
> 2011-02-08T15:56:19.902852+01:00 phy005 kernel: <0> ffff880c21bc1b44
> ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8
> 2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace:
> 2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>]
> gup_pud_range+0x156/0x192
> 2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>]
> get_user_pages_fast+0xc4/0x172
> 2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
> bio_add_page+0x36/0x38
> 2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>]
> dio_get_page+0x54/0x127
> 2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>]
> __blockdev_direct_IO+0x41d/0xa36
> 2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ?
> x86_emulate_insn+0x1ff8/0x2d61 [kvm]
> 2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>]
> blkdev_direct_IO+0x4e/0x50
> 2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ?
> blkdev_get_blocks+0x0/0x8d
> 2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>]
> generic_file_direct_write+0xed/0x16d
> 2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>]
> __generic_file_aio_write+0x196/0x281
> 2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>]
> aio_rw_vect_retry+0x85/0x18e
> 2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>]
> aio_run_iocb+0x77/0x10f
> 2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>]
> do_io_submit+0x558/0x7ce
> 2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>]
> sys_io_submit+0x10/0x12
> 2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2
> 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
> 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
> 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
> 2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP <ffff880c21bc1a78>
> 2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0
> 2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace
> 174c28940e9fd0a7 ]---
>
> and yesterday this one:
>
> 2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle
> kernel NULL pointer dereference at (null)
> 2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP:
> [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
> 2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0
> 2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP
> 2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/topology/thread_siblings
> 2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2
> 2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
> iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-09T07:40:15.636663+01:00 phy005 kernel:
> 2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm:
> qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
> X8DTU/X8DTU
> 2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP:
> 0010:[<ffffffffa0082db8>]  [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e
> [kvm]
> 2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP:
> 0018:ffff88061cbcbcd8  EFLAGS: 00010246
> 2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000
> RBX: 1603a07305004fff RCX: ffff88061cbcbd08
> 2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023
> RSI: 1603a07305004fff RDI: 0000000000000000
> 2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8
> R08: 0000000000000023 R09: 0000000000000000
> 2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000
> R11: ffffffffa0082c7f R12: 0000000000000001
> 2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763
> R14: ffff8809b8b01ce0 R15: 0000000000000000
> 2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS:
> 0000000000000000(0000) GS:ffff880002040000(0000)
> knlGS:0000000000000000
> 2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> 2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000
> CR3: 0000000001a42000 CR4: 00000000000026e0
> 2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid:
> 2572, threadinfo ffff88061cbca000, task ffff88061cf04650)
> 2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack:
> 2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8
> ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9
> 2011-02-09T07:40:15.636718+01:00 phy005 kernel: <0> ffff8809b8b01ce0
> ffff880c23984000 ffff88036c471ff8 00000000000001ff
> 2011-02-09T07:40:15.636721+01:00 phy005 kernel: <0> ffff88061cbcbd58
> ffffffffa008363b 0000000000000200 ffff880c23984000
> 2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace:
> 2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>]
> rmap_remove+0xa3/0x1a0 [kvm]
> 2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>]
> kvm_mmu_zap_page+0x9f/0x299 [kvm]
> 2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>]
> kvm_mmu_zap_all+0x35/0x60 [kvm]
> 2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>]
> kvm_arch_flush_shadow+0x16/0x22 [kvm]
> 2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>]
> kvm_mmu_notifier_release+0x31/0x44 [kvm]
> 2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>]
> __mmu_notifier_release+0x4f/0x7b
> 2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>]
> exit_mmap+0x2c/0x132
> 2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>]
> mmput+0x5e/0xca
> 2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>]
> exit_mm+0x114/0x121
> 2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>]
> do_exit+0x254/0x752
> 2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ?
> apic_timer_interrupt+0xe/0x20
> 2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>]
> do_group_exit+0x81/0xab
> 2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>]
> sys_exit_group+0x17/0x1b
> 2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8
> 01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89
> f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d <48> 2b 18 48 c1 e3 03 48
> 03 58 18 eb 39 41 8d 4c 24 ff be 01 00
> 2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP
> [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
> 2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP <ffff88061cbcbcd8>
> 2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000
> 2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace
> 174c28940e9fd0a9 ]---
> 2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault
> but reboot is needed!
>
> So it doesn't seem to be a hardware problem since we replaced all that.
>
> Kind regards,
>
> Ruben

And tonight we had another one of those errors we had a few weeks ago:

2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000
2011-02-13T02:56:28.694914+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x25602d007 level 4
2011-02-13T02:56:28.694916+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x3df3e2007 level 3
2011-02-13T02:56:28.694919+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x5e90c7007 level 2
2011-02-13T02:56:28.694925+01:00 phy005 kernel:
ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
2011-02-13T02:56:28.694928+01:00 phy005 kernel:
ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
2011-02-13T02:56:28.694930+01:00 phy005 kernel: ------------[ cut here
]------------
2011-02-13T02:56:28.694933+01:00 phy005 kernel: WARNING: at
arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
2011-02-13T02:56:28.694936+01:00 phy005 kernel: Hardware name: X8DTU
2011-02-13T02:56:28.694941+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt igb ioatdma
dca iTCO_vendor_support joydev serio_raw microcode 3w_9xxx [last
unloaded: scsi_wait_scan]
2011-02-13T02:56:28.695004+01:00 phy005 kernel: Pid: 4756, comm:
qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1
2011-02-13T02:56:28.695008+01:00 phy005 kernel: Call Trace:
2011-02-13T02:56:28.695013+01:00 phy005 kernel: [<ffffffff8104d11f>]
warn_slowpath_common+0x7c/0x94
2011-02-13T02:56:28.695020+01:00 phy005 kernel: [<ffffffff8104d14b>]
warn_slowpath_null+0x14/0x16
2011-02-13T02:56:28.695024+01:00 phy005 kernel: [<ffffffffa00c97fb>]
handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
2011-02-13T02:56:28.695028+01:00 phy005 kernel: [<ffffffffa00ca401>]
vmx_handle_exit+0x204/0x23a [kvm_intel]
2011-02-13T02:56:28.695033+01:00 phy005 kernel: [<ffffffffa0084998>]
kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
2011-02-13T02:56:28.695037+01:00 phy005 kernel: [<ffffffffa00735ba>]
kvm_vcpu_ioctl+0xfd/0x56e [kvm]
2011-02-13T02:56:28.695042+01:00 phy005 kernel: [<ffffffff810feaab>] ?
virt_to_head_page+0xe/0x2f
2011-02-13T02:56:28.695046+01:00 phy005 kernel: [<ffffffff810cc6ca>] ?
mempool_kfree+0xe/0x10
2011-02-13T02:56:28.695051+01:00 phy005 kernel: [<ffffffff810cc857>] ?
mempool_free+0x76/0x7b
2011-02-13T02:56:28.695055+01:00 phy005 kernel: [<ffffffff8111aa2f>]
vfs_ioctl+0x32/0xa6
2011-02-13T02:56:28.695060+01:00 phy005 kernel: [<ffffffff8111afa2>]
do_vfs_ioctl+0x483/0x4c9
2011-02-13T02:56:28.695065+01:00 phy005 kernel: [<ffffffff8111b03e>]
sys_ioctl+0x56/0x79
2011-02-13T02:56:28.695070+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-13T02:56:28.695074+01:00 phy005 kernel: ---[ end trace
d95032626ea304ca ]---

Any help would be much appreciated. It seems very strange that I'm the
first one who runs into this.
I've found two bug reports describing the same problem, the first one at
https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=613691,
but that's a duplicate of
https://partner-bugzilla.redhat.com/show_bug.cgi?id=606131 which I'm
not authorized to see...

Kind regards,

Ruben


* Re: EPT: Misconfiguration
  2011-02-10 15:23             ` Ruben Kerkhof
  2011-02-13  2:07               ` Ruben Kerkhof
@ 2011-02-13 12:58               ` Avi Kivity
  2011-02-13 14:36                 ` Ruben Kerkhof
  1 sibling, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2011-02-13 12:58 UTC (permalink / raw)
  To: Ruben Kerkhof; +Cc: Marcelo Tosatti, kvm, Andrea Arcangeli

On 02/10/2011 05:23 PM, Ruben Kerkhof wrote:
>
> This machine has been running for a week without problems, but then we
> started to get the following oopses again:
>
> 2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
> kernel paging request at ffffea71929180e0
> 2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
> 2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP
> 2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/topology/thread_siblings
> 2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4
> 2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
> iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-06T19:45:35.222231+01:00 phy005 kernel:
> 2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm:
> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
> 2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP:
> 0010:[<ffffffff81034880>]  [<ffffffff81034880>]
> gup_pte_range+0x94/0xd3
> 2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP:
> 0018:ffff88060b9bda78  EFLAGS: 00010082
> 2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0
> RBX: 00003ffffffff000 RCX: 0000000000000005
> 2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000
> RSI: 00007fe54e3ff000 RDI: 1603a07305004067
> 2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98
> R08: ffff880b94384560 R09: ffff88060b9bdb44
> 2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8
> R11: ffffea0000000000 R12: 0000000000000205
> 2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff
> R14: 0000000000000005 R15: 0000000000000000
> 2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS:
> 00007fe64cb0e700(0000) GS:ffff880655400000(0000)
> knlGS:0000000000000000
> 2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
> 002b CR0: 0000000080050033
> 2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0
> CR3: 0000000bff06d000 CR4: 00000000000026e0
> 2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid:
> 3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0)
> 2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack:
> 2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000
> 00007fe54e400000 00007fe54e400000 ffff88053a0d2388
> 2011-02-06T19:45:35.222285+01:00 phy005 kernel: <0> ffff88060b9bdaf8
> ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff
> 2011-02-06T19:45:35.222289+01:00 phy005 kernel: <0> ffff88060b9bdb44
> ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8
> 2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace:
> 2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>]
> gup_pud_range+0x156/0x192
> 2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>]
> get_user_pages_fast+0xc4/0x172
> 2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
> bio_add_page+0x36/0x38
> 2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>]
> dio_get_page+0x54/0x127
> 2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>]
> __blockdev_direct_IO+0x41d/0xa36
> 2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ?
> x86_emulate_insn+0x1ff8/0x2d61 [kvm]
> 2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>]
> blkdev_direct_IO+0x4e/0x50
> 2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ?
> blkdev_get_blocks+0x0/0x8d
> 2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>]
> generic_file_direct_write+0xed/0x16d
> 2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>]
> __generic_file_aio_write+0x196/0x281
> 2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ?
> file_has_perm+0xa4/0xc6
> 2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>]
> blkdev_aio_write+0x2a/0x69
> 2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>]
> aio_rw_vect_retry+0x85/0x18e
> 2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>]
> aio_run_iocb+0x77/0x10f
> 2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>]
> do_io_submit+0x558/0x7ce
> 2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>]
> sys_io_submit+0x10/0x12
> 2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2
> 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
> 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
> 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
> 2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP <ffff88060b9bda78>
> 2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0
> 2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace
> beed2b54d0bb8a00 ]---
>

Hm, outside any kvm code.

> and
>
> 2011-02-06T19:47:15.023129+01:00 phy005 kernel: qemu-kvm: Corrupted
> page table at address 7fbde15ff64c
> 2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD
> 612668067 PMD 5937b7067 PE 1603a07305008067

Again outside kvm, and again the magic pte 1603axxxxx.


> followed by
>
> 2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle
> kernel paging request at fffff6b192918010
> 2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP:
> [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]

Well, after something goes bad, nothing good can come out of it.

> after which we rebooted the machine and replaced the motherboard and
> cpus (we already replaced the memory before).
>
> But 2 days ago we got this oops:
>
> 2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle
> kernel paging request at ffffea71929181c0
> 2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP:
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0
> 2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP
> 2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
> 2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8
> 2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
> iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-08T15:56:19.902770+01:00 phy005 kernel:
> 2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm:
> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
> 2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP:
> 0010:[<ffffffff81034880>]  [<ffffffff81034880>]
> gup_pte_range+0x94/0xd3
> 2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP:
> 0018:ffff880c21bc1a78  EFLAGS: 00010086
> 2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0
> RBX: 00003ffffffff000 RCX: 0000000000000005
> 2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000
> RSI: 00007fa2ca1ff000 RDI: 1603a07305008067
> 2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98
> R08: ffff88060fdfad60 R09: ffff880c21bc1b44
> 2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8
> R11: ffffea0000000000 R12: 0000000000000205
> 2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff
> R14: 0000000000000005 R15: 0000000000000000
> 2011-02-08T15:56:19.902810+01:00 phy005 kernel: FS:
> 00007fa2d8724700(0000) GS:ffff880002080000(0000)
> knlGS:0000000000000000
> 2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
> 002b CR0: 0000000080050033
> 2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0
> CR3: 0000000c231f9000 CR4: 00000000000026e0
> 2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid:
> 3346, threadinfo ffff880c21bc0000, task ffff880c2264ddc0)
> 2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack:
> 2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000
> 00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280
> 2011-02-08T15:56:19.902848+01:00 phy005 kernel: <0> ffff880c21bc1af8
> ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff
> 2011-02-08T15:56:19.902852+01:00 phy005 kernel: <0> ffff880c21bc1b44
> ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8
> 2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace:
> 2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>]
> gup_pud_range+0x156/0x192
> 2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>]
> get_user_pages_fast+0xc4/0x172
> 2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
> bio_add_page+0x36/0x38
> 2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>]
> dio_get_page+0x54/0x127
> 2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>]
> __blockdev_direct_IO+0x41d/0xa36
> 2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ?
> x86_emulate_insn+0x1ff8/0x2d61 [kvm]
> 2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>]
> blkdev_direct_IO+0x4e/0x50
> 2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ?
> blkdev_get_blocks+0x0/0x8d
> 2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>]
> generic_file_direct_write+0xed/0x16d
> 2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>]
> __generic_file_aio_write+0x196/0x281
> 2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ?
> blkdev_aio_write+0x0/0x69
> 2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>]
> aio_rw_vect_retry+0x85/0x18e
> 2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>]
> aio_run_iocb+0x77/0x10f
> 2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>]
> do_io_submit+0x558/0x7ce
> 2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>]
> sys_io_submit+0x10/0x12
> 2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2
> 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
> 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8 <66> 83 38 00 48 89 c7 79
> 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
> 2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP
> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
> 2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP <ffff880c21bc1a78>
> 2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0
> 2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace
> 174c28940e9fd0a7 ]---
>

Again outside kvm.
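
The faulting addresses in both gup_pte_range oopses can be reproduced
from the pte value held in RDI. The sketch below reads the arithmetic
off the "Code:" bytes just before the faulting instruction (and with
RBX, shift right by 12, multiply by 0x38, add R11). It assumes RBX is
the pfn mask, R11 the base of the struct page array, and 0x38 the size
of struct page on this build; the function name is made up for
illustration.

```python
MASK64 = (1 << 64) - 1

def struct_page_addr(pte,
                     pfn_mask=0x00003ffffffff000,      # RBX in both oopses
                     vmemmap_base=0xffffea0000000000,  # R11 in both oopses
                     page_struct_size=0x38):           # imul immediate (assumed sizeof(struct page))
    """Address of the struct page that the kernel derives from a pte."""
    pfn = (pte & pfn_mask) >> 12
    return (vmemmap_base + pfn * page_struct_size) & MASK64

first = struct_page_addr(0x1603a07305004067)   # RDI, 2011-02-06 oops
second = struct_page_addr(0x1603a07305008067)  # RDI, 2011-02-08 oops

print(hex(first))   # 0xffffea71929180e0, the CR2 of the 2011-02-06 oops
print(hex(second))  # 0xffffea71929181c0, the CR2 of the 2011-02-08 oops
```

Both faults are therefore consistent with the bogus 1603a073... pte
producing a pfn far beyond real memory, rather than with two unrelated
bad pointers.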

> and yesterday this one:
>
> 2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle
> kernel NULL pointer dereference at (null)
> 2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP:
> [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
> 2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0
> 2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP
> 2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file:
> /sys/devices/system/cpu/cpu15/topology/thread_siblings
> 2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2
> 2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
> iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
> scsi_wait_scan]
> 2011-02-09T07:40:15.636663+01:00 phy005 kernel:
> 2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm:
> qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
> X8DTU/X8DTU
> 2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP:
> 0010:[<ffffffffa0082db8>]  [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e
> [kvm]
> 2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP:
> 0018:ffff88061cbcbcd8  EFLAGS: 00010246
> 2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000
> RBX: 1603a07305004fff RCX: ffff88061cbcbd08
> 2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023
> RSI: 1603a07305004fff RDI: 0000000000000000
> 2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8
> R08: 0000000000000023 R09: 0000000000000000
> 2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000
> R11: ffffffffa0082c7f R12: 0000000000000001
> 2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763
> R14: ffff8809b8b01ce0 R15: 0000000000000000
> 2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS:
> 0000000000000000(0000) GS:ffff880002040000(0000)
> knlGS:0000000000000000
> 2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
> 0000 CR0: 000000008005003b
> 2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000
> CR3: 0000000001a42000 CR4: 00000000000026e0
> 2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000
> DR1: 0000000000000000 DR2: 0000000000000000
> 2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000
> DR6: 00000000ffff0ff0 DR7: 0000000000000400
> 2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid:
> 2572, threadinfo ffff88061cbca000, task ffff88061cf04650)
> 2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack:
> 2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8
> ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9
> 2011-02-09T07:40:15.636718+01:00 phy005 kernel: <0> ffff8809b8b01ce0
> ffff880c23984000 ffff88036c471ff8 00000000000001ff
> 2011-02-09T07:40:15.636721+01:00 phy005 kernel: <0> ffff88061cbcbd58
> ffffffffa008363b 0000000000000200 ffff880c23984000
> 2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace:
> 2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>]
> rmap_remove+0xa3/0x1a0 [kvm]
> 2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>]
> kvm_mmu_zap_page+0x9f/0x299 [kvm]
> 2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>]
> kvm_mmu_zap_all+0x35/0x60 [kvm]
> 2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>]
> kvm_arch_flush_shadow+0x16/0x22 [kvm]
> 2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>]
> kvm_mmu_notifier_release+0x31/0x44 [kvm]
> 2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>]
> __mmu_notifier_release+0x4f/0x7b
> 2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>]
> exit_mmap+0x2c/0x132
> 2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>]
> mmput+0x5e/0xca
> 2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>]
> exit_mm+0x114/0x121
> 2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>]
> do_exit+0x254/0x752
> 2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ?
> apic_timer_interrupt+0xe/0x20
> 2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>]
> do_group_exit+0x81/0xab
> 2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>]
> sys_exit_group+0x17/0x1b
> 2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8
> 01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89
> f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d <48> 2b 18 48 c1 e3 03 48
> 03 58 18 eb 39 41 8d 4c 24 ff be 01 00
> 2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP
> [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
> 2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP <ffff88061cbcbcd8>
> 2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000
> 2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace
> 174c28940e9fd0a9 ]---
> 2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault
> but reboot is needed!
>

In kvm.  Was there a reboot between the two?

> So it doesn't seem to be a hardware problem since we replaced all that.

I agree.  And your other machines are stable?

When you say "identical software", are those exactly the same binaries?

Copying Andrea for possible insight into the non-kvm oopses.

-- 
error compiling committee.c: too many arguments to function



* Re: EPT: Misconfiguration
  2011-02-13  2:07               ` Ruben Kerkhof
@ 2011-02-13 13:03                 ` Avi Kivity
  2011-02-13 14:40                   ` Ruben Kerkhof
  2011-02-15 17:16                   ` Marcelo Tosatti
  0 siblings, 2 replies; 19+ messages in thread
From: Avi Kivity @ 2011-02-13 13:03 UTC (permalink / raw)
  To: Ruben Kerkhof; +Cc: Marcelo Tosatti, kvm

On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
> And tonight we had another one of those errors we had a few weeks ago:
>
> 2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
> 2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000

This GPA indexes entry 511, the last entry, of the last-level (level 1)
page table.  Marcelo, does this remind you of
https://bugzilla.kernel.org/show_bug.cgi?id=27052 by any chance?
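
For reference, the table index at each level can be read straight off
the GPA. This is a minimal sketch assuming the usual 4-level layout
with 4 KiB pages and 9 index bits per level; the helper name is
hypothetical. The second GPA is the one from the first report at the
top of this thread.

```python
def ept_index(gpa, level):
    """Index into the level-N paging structure (level 1 is the last level)."""
    return (gpa >> (12 + 9 * (level - 1))) & 0x1ff

gpa_feb13 = 0x2edff000  # GPA from the 2011-02-13 report
gpa_jan20 = 0x3dbff6b0  # GPA from the 2011-01-20 report

for gpa in (gpa_feb13, gpa_jan20):
    print(hex(gpa), [ept_index(gpa, level) for level in (4, 3, 2, 1)])
```

Under these assumptions both GPAs land on index 511 at level 1, the
final slot of a 512-entry table.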

> 2011-02-13T02:56:28.694914+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x25602d007 level 4
> 2011-02-13T02:56:28.694916+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x3df3e2007 level 3
> 2011-02-13T02:56:28.694919+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x5e90c7007 level 2
> 2011-02-13T02:56:28.694925+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1

Magic 1603a073........ pte.
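
A quick way to see why this spte trips the reserved-bit check. The
sketch assumes the reserved mask for a level-1 entry covers bits 40
through 51 (that is, a host reporting 40 physical address bits, which
is not stated anywhere in this thread) and that the logged rsvd_bits
value is the spte ANDed with that mask.

```python
def bit_range_mask(lo, hi):
    """Mask with bits lo..hi (inclusive) set."""
    return ((1 << (hi - lo + 1)) - 1) << lo

spte = 0x1603a0730500d277           # level-1 spte from the log
rsvd_mask = bit_range_mask(40, 51)  # ASSUMPTION: 40 physical address bits
rsvd_bits = spte & rsvd_mask

print(hex(rsvd_bits))  # 0x3a00000000000
```

With that mask the result matches the rsvd_bits line in the log that
follows, so the value looks like an ordinary entry with garbage in its
upper bits.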

> 2011-02-13T02:56:28.694928+01:00 phy005 kernel:
> ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
> 2011-02-13T02:56:28.694930+01:00 phy005 kernel: ------------[ cut here
> ]------------
> 2011-02-13T02:56:28.694933+01:00 phy005 kernel: WARNING: at
> arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
> 2011-02-13T02:56:28.694936+01:00 phy005 kernel: Hardware name: X8DTU
> 2011-02-13T02:56:28.694941+01:00 phy005 kernel: Modules linked in: tun
> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt igb ioatdma
> dca iTCO_vendor_support joydev serio_raw microcode 3w_9xxx [last
> unloaded: scsi_wait_scan]
> 2011-02-13T02:56:28.695004+01:00 phy005 kernel: Pid: 4756, comm:
> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1
> 2011-02-13T02:56:28.695008+01:00 phy005 kernel: Call Trace:
> 2011-02-13T02:56:28.695013+01:00 phy005 kernel: [<ffffffff8104d11f>]
> warn_slowpath_common+0x7c/0x94
> 2011-02-13T02:56:28.695020+01:00 phy005 kernel: [<ffffffff8104d14b>]
> warn_slowpath_null+0x14/0x16
> 2011-02-13T02:56:28.695024+01:00 phy005 kernel: [<ffffffffa00c97fb>]
> handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
> 2011-02-13T02:56:28.695028+01:00 phy005 kernel: [<ffffffffa00ca401>]
> vmx_handle_exit+0x204/0x23a [kvm_intel]
> 2011-02-13T02:56:28.695033+01:00 phy005 kernel: [<ffffffffa0084998>]
> kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
> 2011-02-13T02:56:28.695037+01:00 phy005 kernel: [<ffffffffa00735ba>]
> kvm_vcpu_ioctl+0xfd/0x56e [kvm]
> 2011-02-13T02:56:28.695042+01:00 phy005 kernel: [<ffffffff810feaab>] ?
> virt_to_head_page+0xe/0x2f
> 2011-02-13T02:56:28.695046+01:00 phy005 kernel: [<ffffffff810cc6ca>] ?
> mempool_kfree+0xe/0x10
> 2011-02-13T02:56:28.695051+01:00 phy005 kernel: [<ffffffff810cc857>] ?
> mempool_free+0x76/0x7b
> 2011-02-13T02:56:28.695055+01:00 phy005 kernel: [<ffffffff8111aa2f>]
> vfs_ioctl+0x32/0xa6
> 2011-02-13T02:56:28.695060+01:00 phy005 kernel: [<ffffffff8111afa2>]
> do_vfs_ioctl+0x483/0x4c9
> 2011-02-13T02:56:28.695065+01:00 phy005 kernel: [<ffffffff8111b03e>]
> sys_ioctl+0x56/0x79
> 2011-02-13T02:56:28.695070+01:00 phy005 kernel: [<ffffffff81009c72>]
> system_call_fastpath+0x16/0x1b
> 2011-02-13T02:56:28.695074+01:00 phy005 kernel: ---[ end trace
> d95032626ea304ca ]---
>
> Any help would be much appreciated. It seems very strange that I'm the
> first one who runs into this.
> I've found two bug reports describing the same problem, the first one at
> https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=613691,
> but that's a duplicate of
> https://partner-bugzilla.redhat.com/show_bug.cgi?id=606131 which I'm
> not authorized to see...

These don't appear to be related.  Are you running ksm, btw?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: EPT: Misconfiguration
  2011-02-13 12:58               ` Avi Kivity
@ 2011-02-13 14:36                 ` Ruben Kerkhof
  0 siblings, 0 replies; 19+ messages in thread
From: Ruben Kerkhof @ 2011-02-13 14:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, Andrea Arcangeli

Hi Avi,

On Sun, Feb 13, 2011 at 13:58, Avi Kivity <avi@redhat.com> wrote:
> On 02/10/2011 05:23 PM, Ruben Kerkhof wrote:
>>
>> This machine has been running for a week without problems, but then we
>> started to get the following oopses again:
>>
>> 2011-02-06T19:45:35.221555+01:00 phy005 kernel: BUG: unable to handle
>> kernel paging request at ffffea71929180e0
>> 2011-02-06T19:45:35.222194+01:00 phy005 kernel: IP:
>> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
>> 2011-02-06T19:45:35.222199+01:00 phy005 kernel: PGD 118600067 PUD 0
>> 2011-02-06T19:45:35.222203+01:00 phy005 kernel: Oops: 0000 [#1] SMP
>> 2011-02-06T19:45:35.222221+01:00 phy005 kernel: last sysfs file:
>> /sys/devices/system/cpu/cpu15/topology/thread_siblings
>> 2011-02-06T19:45:35.222224+01:00 phy005 kernel: CPU 4
>> 2011-02-06T19:45:35.222229+01:00 phy005 kernel: Modules linked in: tun
>> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
>> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
>> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt serio_raw igb
>> iTCO_vendor_support joydev ioatdma dca 3w_9xxx [last unloaded:
>> scsi_wait_scan]
>> 2011-02-06T19:45:35.222231+01:00 phy005 kernel:
>> 2011-02-06T19:45:35.222233+01:00 phy005 kernel: Pid: 3650, comm:
>> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
>> 2011-02-06T19:45:35.222236+01:00 phy005 kernel: RIP:
>> 0010:[<ffffffff81034880>]  [<ffffffff81034880>]
>> gup_pte_range+0x94/0xd3
>> 2011-02-06T19:45:35.222239+01:00 phy005 kernel: RSP:
>> 0018:ffff88060b9bda78  EFLAGS: 00010082
>> 2011-02-06T19:45:35.222241+01:00 phy005 kernel: RAX: ffffea71929180e0
>> RBX: 00003ffffffff000 RCX: 0000000000000005
>> 2011-02-06T19:45:35.222243+01:00 phy005 kernel: RDX: 00007fe54e400000
>> RSI: 00007fe54e3ff000 RDI: 1603a07305004067
>> 2011-02-06T19:45:35.222245+01:00 phy005 kernel: RBP: ffff88060b9bda98
>> R08: ffff880b94384560 R09: ffff88060b9bdb44
>> 2011-02-06T19:45:35.222248+01:00 phy005 kernel: R10: ffff880606b2fff8
>> R11: ffffea0000000000 R12: 0000000000000205
>> 2011-02-06T19:45:35.222251+01:00 phy005 kernel: R13: ffffc00000000fff
>> R14: 0000000000000005 R15: 0000000000000000
>> 2011-02-06T19:45:35.222255+01:00 phy005 kernel: FS:
>> 00007fe64cb0e700(0000) GS:ffff880655400000(0000)
>> knlGS:0000000000000000
>> 2011-02-06T19:45:35.222259+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
>> 002b CR0: 0000000080050033
>> 2011-02-06T19:45:35.222263+01:00 phy005 kernel: CR2: ffffea71929180e0
>> CR3: 0000000bff06d000 CR4: 00000000000026e0
>> 2011-02-06T19:45:35.222267+01:00 phy005 kernel: DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> 2011-02-06T19:45:35.222271+01:00 phy005 kernel: DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> 2011-02-06T19:45:35.222274+01:00 phy005 kernel: Process qemu-kvm (pid:
>> 3650, threadinfo ffff88060b9bc000, task ffff880623ed2ee0)
>> 2011-02-06T19:45:35.222278+01:00 phy005 kernel: Stack:
>> 2011-02-06T19:45:35.222281+01:00 phy005 kernel: 00007fe54e400000
>> 00007fe54e400000 00007fe54e400000 ffff88053a0d2388
>> 2011-02-06T19:45:35.222285+01:00 phy005 kernel:<0>  ffff88060b9bdaf8
>> ffffffff81034a15 00007fe54e3fffff 00007fe54e3fffff
>> 2011-02-06T19:45:35.222289+01:00 phy005 kernel:<0>  ffff88060b9bdb44
>> ffff880b94384560 ffff880bff06eca8 ffff880bff06d7f8
>> 2011-02-06T19:45:35.222292+01:00 phy005 kernel: Call Trace:
>> 2011-02-06T19:45:35.222296+01:00 phy005 kernel: [<ffffffff81034a15>]
>> gup_pud_range+0x156/0x192
>> 2011-02-06T19:45:35.222300+01:00 phy005 kernel: [<ffffffff81034b15>]
>> get_user_pages_fast+0xc4/0x172
>> 2011-02-06T19:45:35.222304+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
>> bio_add_page+0x36/0x38
>> 2011-02-06T19:45:35.222308+01:00 phy005 kernel: [<ffffffff81134730>]
>> dio_get_page+0x54/0x127
>> 2011-02-06T19:45:35.222312+01:00 phy005 kernel: [<ffffffff81135317>]
>> __blockdev_direct_IO+0x41d/0xa36
>> 2011-02-06T19:45:35.222316+01:00 phy005 kernel: [<ffffffffa0080f69>] ?
>> x86_emulate_insn+0x1ff8/0x2d61 [kvm]
>> 2011-02-06T19:45:35.222320+01:00 phy005 kernel: [<ffffffff8113379b>]
>> blkdev_direct_IO+0x4e/0x50
>> 2011-02-06T19:45:35.222324+01:00 phy005 kernel: [<ffffffff81132c49>] ?
>> blkdev_get_blocks+0x0/0x8d
>> 2011-02-06T19:45:35.222328+01:00 phy005 kernel: [<ffffffff810cb516>]
>> generic_file_direct_write+0xed/0x16d
>> 2011-02-06T19:45:35.222331+01:00 phy005 kernel: [<ffffffff810cb72c>]
>> __generic_file_aio_write+0x196/0x281
>> 2011-02-06T19:45:35.222335+01:00 phy005 kernel: [<ffffffff811d5352>] ?
>> file_has_perm+0xa4/0xc6
>> 2011-02-06T19:45:35.222339+01:00 phy005 kernel: [<ffffffff81133043>] ?
>> blkdev_aio_write+0x0/0x69
>> 2011-02-06T19:45:35.222343+01:00 phy005 kernel: [<ffffffff8113306d>]
>> blkdev_aio_write+0x2a/0x69
>> 2011-02-06T19:45:35.222347+01:00 phy005 kernel: [<ffffffff81133043>] ?
>> blkdev_aio_write+0x0/0x69
>> 2011-02-06T19:45:35.222351+01:00 phy005 kernel: [<ffffffff8113d4eb>]
>> aio_rw_vect_retry+0x85/0x18e
>> 2011-02-06T19:45:35.222355+01:00 phy005 kernel: [<ffffffff8113e9b3>]
>> aio_run_iocb+0x77/0x10f
>> 2011-02-06T19:45:35.222359+01:00 phy005 kernel: [<ffffffff8113f508>]
>> do_io_submit+0x558/0x7ce
>> 2011-02-06T19:45:35.222363+01:00 phy005 kernel: [<ffffffff8113f78e>]
>> sys_io_submit+0x10/0x12
>> 2011-02-06T19:45:35.222366+01:00 phy005 kernel: [<ffffffff81009c72>]
>> system_call_fastpath+0x16/0x1b
>> 2011-02-06T19:45:35.222372+01:00 phy005 kernel: Code: 21 d8 49 01 c2
>> 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
>> 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8<66>  83 38 00 48 89 c7 79
>> 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
>> 2011-02-06T19:45:35.222376+01:00 phy005 kernel: RIP
>> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
>> 2011-02-06T19:45:35.222379+01:00 phy005 kernel: RSP<ffff88060b9bda78>
>> 2011-02-06T19:45:35.222382+01:00 phy005 kernel: CR2: ffffea71929180e0
>> 2011-02-06T19:45:35.222386+01:00 phy005 kernel: ---[ end trace
>> beed2b54d0bb8a00 ]---
>>
>
> Hm, outside any kvm code.
>
>> and
>>
>> 2011-02-06T19:47:15.023129+01:00 phy005 kernel: qemu-kvm: Corrupted
>> page table at address 7fbde15ff64c
>> 2011-02-06T19:47:15.023207+01:00 phy005 kernel: PGD 5ff58a067 PUD
>> 612668067 PMD 5937b7067 PE 1603a07305008067
>
> Again outside kvm, and again the magic pte 1603axxxxx.
>
>
>> followed by
>>
>> 2011-02-06T21:20:32.882972+01:00 phy005 kernel: BUG: unable to handle
>> kernel paging request at fffff6b192918010
>> 2011-02-06T21:20:32.883252+01:00 phy005 kernel: IP:
>> [<ffffffffa0078826>] kvm_mmu_zap_page+0x28a/0x299 [kvm]
>
> Well, after something goes bad, nothing good can come out of it.
>
>> after which we rebooted the machine and replaced the motherboard and
>> cpus (we already replaced the memory before).
>>
>> But 2 days ago we got this oops:
>>
>> 2011-02-08T15:56:19.902104+01:00 phy005 kernel: BUG: unable to handle
>> kernel paging request at ffffea71929181c0
>> 2011-02-08T15:56:19.902686+01:00 phy005 kernel: IP:
>> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
>> 2011-02-08T15:56:19.902693+01:00 phy005 kernel: PGD 118600067 PUD 0
>> 2011-02-08T15:56:19.902699+01:00 phy005 kernel: Oops: 0000 [#1] SMP
>> 2011-02-08T15:56:19.902703+01:00 phy005 kernel: last sysfs file:
>> /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
>> 2011-02-08T15:56:19.902708+01:00 phy005 kernel: CPU 8
>> 2011-02-08T15:56:19.902715+01:00 phy005 kernel: Modules linked in: tun
>> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
>> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
>> ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
>> iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
>> scsi_wait_scan]
>> 2011-02-08T15:56:19.902770+01:00 phy005 kernel:
>> 2011-02-08T15:56:19.902775+01:00 phy005 kernel: Pid: 3346, comm:
>> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
>> 2011-02-08T15:56:19.902781+01:00 phy005 kernel: RIP:
>> 0010:[<ffffffff81034880>]  [<ffffffff81034880>]
>> gup_pte_range+0x94/0xd3
>> 2011-02-08T15:56:19.902785+01:00 phy005 kernel: RSP:
>> 0018:ffff880c21bc1a78  EFLAGS: 00010086
>> 2011-02-08T15:56:19.902789+01:00 phy005 kernel: RAX: ffffea71929181c0
>> RBX: 00003ffffffff000 RCX: 0000000000000005
>> 2011-02-08T15:56:19.902793+01:00 phy005 kernel: RDX: 00007fa2ca200000
>> RSI: 00007fa2ca1ff000 RDI: 1603a07305008067
>> 2011-02-08T15:56:19.902797+01:00 phy005 kernel: RBP: ffff880c21bc1a98
>> R08: ffff88060fdfad60 R09: ffff880c21bc1b44
>> 2011-02-08T15:56:19.902801+01:00 phy005 kernel: R10: ffff88061493fff8
>> R11: ffffea0000000000 R12: 0000000000000205
>> 2011-02-08T15:56:19.902805+01:00 phy005 kernel: R13: ffffc00000000fff
>> R14: 0000000000000005 R15: 0000000000000000
>> 2011-02-08T15:56:19.902810+01:00 phy005 kernel: FS:
>> 00007fa2d8724700(0000) GS:ffff880002080000(0000)
>> knlGS:0000000000000000
>> 2011-02-08T15:56:19.902820+01:00 phy005 kernel: CS:  0010 DS: 002b ES:
>> 002b CR0: 0000000080050033
>> 2011-02-08T15:56:19.902825+01:00 phy005 kernel: CR2: ffffea71929181c0
>> CR3: 0000000c231f9000 CR4: 00000000000026e0
>> 2011-02-08T15:56:19.902829+01:00 phy005 kernel: DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> 2011-02-08T15:56:19.902833+01:00 phy005 kernel: DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> 2011-02-08T15:56:19.902837+01:00 phy005 kernel: Process qemu-kvm (pid:
>> 3346, threadinfo ffff880c21bc0000, task ffff880c2264ddc0)
>> 2011-02-08T15:56:19.902841+01:00 phy005 kernel: Stack:
>> 2011-02-08T15:56:19.902844+01:00 phy005 kernel: 00007fa2ca200000
>> 00007fa2ca201000 00007fa2ca201000 ffff880c22c3d280
>> 2011-02-08T15:56:19.902848+01:00 phy005 kernel:<0>  ffff880c21bc1af8
>> ffffffff81034a15 00007fa2ca200fff 00007fa2ca200fff
>> 2011-02-08T15:56:19.902852+01:00 phy005 kernel:<0>  ffff880c21bc1b44
>> ffff88060fdfad60 ffff880c2231a458 ffff880c231f97f8
>> 2011-02-08T15:56:19.902855+01:00 phy005 kernel: Call Trace:
>> 2011-02-08T15:56:19.902859+01:00 phy005 kernel: [<ffffffff81034a15>]
>> gup_pud_range+0x156/0x192
>> 2011-02-08T15:56:19.902863+01:00 phy005 kernel: [<ffffffff81034b15>]
>> get_user_pages_fast+0xc4/0x172
>> 2011-02-08T15:56:19.902867+01:00 phy005 kernel: [<ffffffff81131fbc>] ?
>> bio_add_page+0x36/0x38
>> 2011-02-08T15:56:19.902871+01:00 phy005 kernel: [<ffffffff81134730>]
>> dio_get_page+0x54/0x127
>> 2011-02-08T15:56:19.902875+01:00 phy005 kernel: [<ffffffff81135317>]
>> __blockdev_direct_IO+0x41d/0xa36
>> 2011-02-08T15:56:19.902880+01:00 phy005 kernel: [<ffffffffa008bf69>] ?
>> x86_emulate_insn+0x1ff8/0x2d61 [kvm]
>> 2011-02-08T15:56:19.902884+01:00 phy005 kernel: [<ffffffff8113379b>]
>> blkdev_direct_IO+0x4e/0x50
>> 2011-02-08T15:56:19.902888+01:00 phy005 kernel: [<ffffffff81132c49>] ?
>> blkdev_get_blocks+0x0/0x8d
>> 2011-02-08T15:56:19.902892+01:00 phy005 kernel: [<ffffffff810cb516>]
>> generic_file_direct_write+0xed/0x16d
>> 2011-02-08T15:56:19.902896+01:00 phy005 kernel: [<ffffffff810cb72c>]
>> __generic_file_aio_write+0x196/0x281
>> 2011-02-08T15:56:19.902899+01:00 phy005 kernel: [<ffffffff81133043>] ?
>> blkdev_aio_write+0x0/0x69
>> 2011-02-08T15:56:19.902909+01:00 phy005 kernel: [<ffffffff81133043>] ?
>> blkdev_aio_write+0x0/0x69
>> 2011-02-08T15:56:19.902914+01:00 phy005 kernel: [<ffffffff8113d4eb>]
>> aio_rw_vect_retry+0x85/0x18e
>> 2011-02-08T15:56:19.902919+01:00 phy005 kernel: [<ffffffff8113e9b3>]
>> aio_run_iocb+0x77/0x10f
>> 2011-02-08T15:56:19.902923+01:00 phy005 kernel: [<ffffffff8113f508>]
>> do_io_submit+0x558/0x7ce
>> 2011-02-08T15:56:19.902927+01:00 phy005 kernel: [<ffffffff8113f78e>]
>> sys_io_submit+0x10/0x12
>> 2011-02-08T15:56:19.902932+01:00 phy005 kernel: [<ffffffff81009c72>]
>> system_call_fastpath+0x16/0x1b
>> 2011-02-08T15:56:19.902938+01:00 phy005 kernel: Code: 21 d8 49 01 c2
>> 49 8b 3a 49 89 fe 4d 21 ee 4d 21 e6 49 39 ce 75 49 48 89 f8 0f 1f 40
>> 00 48 21 d8 48 c1 e8 0c 48 6b c0 38 4c 01 d8<66>  83 38 00 48 89 c7 79
>> 04 48 8b 78 10 f0 ff 47 08 49 63 39 48
>> 2011-02-08T15:56:19.903077+01:00 phy005 kernel: RIP
>> [<ffffffff81034880>] gup_pte_range+0x94/0xd3
>> 2011-02-08T15:56:19.903081+01:00 phy005 kernel: RSP<ffff880c21bc1a78>
>> 2011-02-08T15:56:19.903084+01:00 phy005 kernel: CR2: ffffea71929181c0
>> 2011-02-08T15:56:19.903088+01:00 phy005 kernel: ---[ end trace
>> 174c28940e9fd0a7 ]---
>>
>
> Again outside kvm.
>
>> and yesterday this one:
>>
>> 2011-02-09T07:40:15.636528+01:00 phy005 kernel: BUG: unable to handle
>> kernel NULL pointer dereference at (null)
>> 2011-02-09T07:40:15.636635+01:00 phy005 kernel: IP:
>> [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
>> 2011-02-09T07:40:15.636639+01:00 phy005 kernel: PGD 0
>> 2011-02-09T07:40:15.636643+01:00 phy005 kernel: Oops: 0000 [#3] SMP
>> 2011-02-09T07:40:15.636647+01:00 phy005 kernel: last sysfs file:
>> /sys/devices/system/cpu/cpu15/topology/thread_siblings
>> 2011-02-09T07:40:15.636650+01:00 phy005 kernel: CPU 2
>> 2011-02-09T07:40:15.636656+01:00 phy005 kernel: Modules linked in: tun
>> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
>> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
>> ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
>> iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
>> scsi_wait_scan]
>> 2011-02-09T07:40:15.636663+01:00 phy005 kernel:
>> 2011-02-09T07:40:15.636666+01:00 phy005 kernel: Pid: 2572, comm:
>> qemu-kvm Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1
>> X8DTU/X8DTU
>> 2011-02-09T07:40:15.636670+01:00 phy005 kernel: RIP:
>> 0010:[<ffffffffa0082db8>]  [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e
>> [kvm]
>> 2011-02-09T07:40:15.636673+01:00 phy005 kernel: RSP:
>> 0018:ffff88061cbcbcd8  EFLAGS: 00010246
>> 2011-02-09T07:40:15.636677+01:00 phy005 kernel: RAX: 0000000000000000
>> RBX: 1603a07305004fff RCX: ffff88061cbcbd08
>> 2011-02-09T07:40:15.636680+01:00 phy005 kernel: RDX: 0000000000000023
>> RSI: 1603a07305004fff RDI: 0000000000000000
>> 2011-02-09T07:40:15.636683+01:00 phy005 kernel: RBP: ffff88061cbcbce8
>> R08: 0000000000000023 R09: 0000000000000000
>> 2011-02-09T07:40:15.636686+01:00 phy005 kernel: R10: 0000000000000000
>> R11: ffffffffa0082c7f R12: 0000000000000001
>> 2011-02-09T07:40:15.636689+01:00 phy005 kernel: R13: 0000000000311763
>> R14: ffff8809b8b01ce0 R15: 0000000000000000
>> 2011-02-09T07:40:15.636692+01:00 phy005 kernel: FS:
>> 0000000000000000(0000) GS:ffff880002040000(0000)
>> knlGS:0000000000000000
>> 2011-02-09T07:40:15.636695+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
>> 0000 CR0: 000000008005003b
>> 2011-02-09T07:40:15.636699+01:00 phy005 kernel: CR2: 0000000000000000
>> CR3: 0000000001a42000 CR4: 00000000000026e0
>> 2011-02-09T07:40:15.636702+01:00 phy005 kernel: DR0: 0000000000000000
>> DR1: 0000000000000000 DR2: 0000000000000000
>> 2011-02-09T07:40:15.636705+01:00 phy005 kernel: DR3: 0000000000000000
>> DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> 2011-02-09T07:40:15.636709+01:00 phy005 kernel: Process qemu-kvm (pid:
>> 2572, threadinfo ffff88061cbca000, task ffff88061cf04650)
>> 2011-02-09T07:40:15.636711+01:00 phy005 kernel: Stack:
>> 2011-02-09T07:40:15.636715+01:00 phy005 kernel: ffff88036c471ff8
>> ffff880c23984000 ffff88061cbcbd18 ffffffffa0082ea9
>> 2011-02-09T07:40:15.636718+01:00 phy005 kernel:<0>  ffff8809b8b01ce0
>> ffff880c23984000 ffff88036c471ff8 00000000000001ff
>> 2011-02-09T07:40:15.636721+01:00 phy005 kernel:<0>  ffff88061cbcbd58
>> ffffffffa008363b 0000000000000200 ffff880c23984000
>> 2011-02-09T07:40:15.636724+01:00 phy005 kernel: Call Trace:
>> 2011-02-09T07:40:15.636728+01:00 phy005 kernel: [<ffffffffa0082ea9>]
>> rmap_remove+0xa3/0x1a0 [kvm]
>> 2011-02-09T07:40:15.636731+01:00 phy005 kernel: [<ffffffffa008363b>]
>> kvm_mmu_zap_page+0x9f/0x299 [kvm]
>> 2011-02-09T07:40:15.636734+01:00 phy005 kernel: [<ffffffffa0083a42>]
>> kvm_mmu_zap_all+0x35/0x60 [kvm]
>> 2011-02-09T07:40:15.636738+01:00 phy005 kernel: [<ffffffffa0078cde>]
>> kvm_arch_flush_shadow+0x16/0x22 [kvm]
>> 2011-02-09T07:40:15.636741+01:00 phy005 kernel: [<ffffffffa006eb0a>]
>> kvm_mmu_notifier_release+0x31/0x44 [kvm]
>> 2011-02-09T07:40:15.636744+01:00 phy005 kernel: [<ffffffff810fac37>]
>> __mmu_notifier_release+0x4f/0x7b
>> 2011-02-09T07:40:15.636748+01:00 phy005 kernel: [<ffffffff810e735d>]
>> exit_mmap+0x2c/0x132
>> 2011-02-09T07:40:15.636751+01:00 phy005 kernel: [<ffffffff8104ad7a>]
>> mmput+0x5e/0xca
>> 2011-02-09T07:40:15.636754+01:00 phy005 kernel: [<ffffffff8104f0d5>]
>> exit_mm+0x114/0x121
>> 2011-02-09T07:40:15.636757+01:00 phy005 kernel: [<ffffffff81050bf5>]
>> do_exit+0x254/0x752
>> 2011-02-09T07:40:15.636760+01:00 phy005 kernel: [<ffffffff8100a60e>] ?
>> apic_timer_interrupt+0xe/0x20
>> 2011-02-09T07:40:15.636764+01:00 phy005 kernel: [<ffffffff81051174>]
>> do_group_exit+0x81/0xab
>> 2011-02-09T07:40:15.636767+01:00 phy005 kernel: [<ffffffff810511b5>]
>> sys_exit_group+0x17/0x1b
>> 2011-02-09T07:40:15.636771+01:00 phy005 kernel: [<ffffffff81009c72>]
>> system_call_fastpath+0x16/0x1b
>> 2011-02-09T07:40:15.636777+01:00 phy005 kernel: Code: 88 ff ff ff b8
>> 01 00 00 00 c9 c3 55 48 89 e5 41 54 53 0f 1f 44 00 00 41 89 d4 48 89
>> f3 e8 7b c7 fe ff 41 83 fc 01 48 89 c7 75 0d<48>  2b 18 48 c1 e3 03 48
>> 03 58 18 eb 39 41 8d 4c 24 ff be 01 00
>> 2011-02-09T07:40:15.636785+01:00 phy005 kernel: RIP
>> [<ffffffffa0082db8>] gfn_to_rmap+0x20/0x6e [kvm]
>> 2011-02-09T07:40:15.636788+01:00 phy005 kernel: RSP<ffff88061cbcbcd8>
>> 2011-02-09T07:40:15.636791+01:00 phy005 kernel: CR2: 0000000000000000
>> 2011-02-09T07:40:15.637743+01:00 phy005 kernel: ---[ end trace
>> 174c28940e9fd0a9 ]---
>> 2011-02-09T07:40:15.637751+01:00 phy005 kernel: Fixing recursive fault
>> but reboot is needed!
>>
>
> In kvm.  Was there a reboot between the two?

No, there wasn't. I've just looked back at the logs and there was
another oops in between:

2011-02-09T04:28:01.890999+01:00 phy005 kernel: general protection
fault: 0000 [#2] SMP
2011-02-09T04:28:01.891122+01:00 phy005 kernel: last sysfs file:
/sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
2011-02-09T04:28:01.891127+01:00 phy005 kernel: CPU 12
2011-02-09T04:28:01.891137+01:00 phy005 kernel: Modules linked in: tun
ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 kvm_intel kvm igb i2c_i801 iTCO_wdt ioatdma i2c_core
iTCO_vendor_support dca serio_raw joydev 3w_9xxx [last unloaded:
scsi_wait_scan]
2011-02-09T04:28:01.891144+01:00 phy005 kernel:
2011-02-09T04:28:01.891148+01:00 phy005 kernel: Pid: 19782, comm: find
Tainted: G      D    2.6.34.7-66.tilaa.fc13.x86_64 #1 X8DTU/X8DTU
2011-02-09T04:28:01.891154+01:00 phy005 kernel: RIP:
0010:[<ffffffff81158aa4>]  [<ffffffff81158aa4>]
proc_fd_instantiate+0x88/0x127
2011-02-09T04:28:01.891157+01:00 phy005 kernel: RSP:
0018:ffff880245677da8  EFLAGS: 00010206
2011-02-09T04:28:01.891161+01:00 phy005 kernel: RAX: 1603a07305000000
RBX: ffff8808076ada40 RCX: ffff88058bbbddc0
2011-02-09T04:28:01.891164+01:00 phy005 kernel: RDX: 000000000000022a
RSI: ffff8808076ada40 RDI: ffff88062293ee80
2011-02-09T04:28:01.891168+01:00 phy005 kernel: RBP: ffff880245677dc8
R08: ffff8808076a91d0 R09: ffffffff81158a1c
2011-02-09T04:28:01.891172+01:00 phy005 kernel: R10: 0000000000000002
R11: ffff880245677d08 R12: ffff88062293ee00
2011-02-09T04:28:01.891176+01:00 phy005 kernel: R13: ffff8805b3897bf8
R14: ffff8808076a9430 R15: ffff8807ddd76c00
2011-02-09T04:28:01.891180+01:00 phy005 kernel: FS:
00007f09aa8e07a0(0000) GS:ffff880655480000(0000)
knlGS:0000000000000000
2011-02-09T04:28:01.891184+01:00 phy005 kernel: CS:  0010 DS: 0000 ES:
0000 CR0: 0000000080050033
2011-02-09T04:28:01.891188+01:00 phy005 kernel: CR2: 0000000000e43080
CR3: 00000007d6d6c000 CR4: 00000000000026e0
2011-02-09T04:28:01.891192+01:00 phy005 kernel: DR0: 0000000000000000
DR1: 0000000000000000 DR2: 0000000000000000
2011-02-09T04:28:01.891196+01:00 phy005 kernel: DR3: 0000000000000000
DR6: 00000000ffff0ff0 DR7: 0000000000000400
2011-02-09T04:28:01.891199+01:00 phy005 kernel: Process find (pid:
19782, threadinfo ffff880245676000, task ffff88058bbb8000)
2011-02-09T04:28:01.891202+01:00 phy005 kernel: Stack:
2011-02-09T04:28:01.891206+01:00 phy005 kernel: ffff880245677e78
0000000000000003 ffff8802bfe0af00 ffff8808076ada40
2011-02-09T04:28:01.891209+01:00 phy005 kernel: <0> ffff880245677e38
ffffffff811564b8 ffff880245677e38 ffffffff81158a1c
2011-02-09T04:28:01.891213+01:00 phy005 kernel: <0> ffffffff8111b530
ffff880245677f38 0000000300119d45 ffff880245677e78
2011-02-09T04:28:01.891216+01:00 phy005 kernel: Call Trace:
2011-02-09T04:28:01.891220+01:00 phy005 kernel: [<ffffffff811564b8>]
proc_fill_cache+0xa7/0x13f
2011-02-09T04:28:01.891224+01:00 phy005 kernel: [<ffffffff81158a1c>] ?
proc_fd_instantiate+0x0/0x127
2011-02-09T04:28:01.891227+01:00 phy005 kernel: [<ffffffff8111b530>] ?
filldir+0x0/0xd0
2011-02-09T04:28:01.891231+01:00 phy005 kernel: [<ffffffff8111b530>] ?
filldir+0x0/0xd0
2011-02-09T04:28:01.891235+01:00 phy005 kernel: [<ffffffff811586c8>]
proc_readfd_common+0x159/0x1a3
2011-02-09T04:28:01.891239+01:00 phy005 kernel: [<ffffffff81158a1c>] ?
proc_fd_instantiate+0x0/0x127
2011-02-09T04:28:01.891242+01:00 phy005 kernel: [<ffffffff8111b530>] ?
filldir+0x0/0xd0
2011-02-09T04:28:01.891246+01:00 phy005 kernel: [<ffffffff8115873e>]
proc_readfd+0x15/0x17
2011-02-09T04:28:01.891250+01:00 phy005 kernel: [<ffffffff8111b731>]
vfs_readdir+0x77/0xb4
2011-02-09T04:28:01.891254+01:00 phy005 kernel: [<ffffffff8111b8b7>]
sys_getdents+0x81/0xd1
2011-02-09T04:28:01.891258+01:00 phy005 kernel: [<ffffffff81009c72>]
system_call_fastpath+0x16/0x1b
2011-02-09T04:28:01.891263+01:00 phy005 kernel: Code: e8 08 3e 2f 00
49 8b 44 24 08 44 3b 28 0f 83 9c 00 00 00 45 89 ed 49 c1 e5 03 4c 03
68 08 49 8b 45 00 48 85 c0 0f 84 84 00 00 00 <f6> 40 3c 01 74 0a 66 41
81 8e aa 00 00 00 40 01 f6 40 3c 02 74
2011-02-09T04:28:01.891275+01:00 phy005 kernel: RIP
[<ffffffff81158aa4>] proc_fd_instantiate+0x88/0x127
2011-02-09T04:28:01.891279+01:00 phy005 kernel: RSP <ffff880245677da8>
2011-02-09T04:28:01.891283+01:00 phy005 kernel: ---[ end trace
174c28940e9fd0a8 ]---
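
One way to read the general protection fault above: the bytes at the
faulting instruction (<f6> 40 3c 01) look like a byte test through RAX,
and RAX holds the same 1603a073... pattern seen in the other oopses.
That value is not a canonical x86-64 address, which would explain a GP
fault rather than a page fault. A minimal sketch of the check, assuming
48-bit virtual addresses (4-level paging); is_canonical is a
hypothetical helper, not a kernel function:

```python
# Hypothetical helper for illustration, not kernel code.
# Assumes 48-bit virtual addresses (4-level paging) unless told otherwise.
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) are all copies of bit (va_bits-1)."""
    upper = addr >> (va_bits - 1)                 # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1      # 0x1ffff for va_bits=48
    return upper == 0 or upper == all_ones

rax = 0x1603a07305000000   # RAX from the oops above
rsp = 0xffff880245677da8   # RSP from the same oops, an ordinary kernel address

print(is_canonical(rax))
print(is_canonical(rsp))
```

The stack pointer from the same dump passes the check and RAX does not,
so the pointer that proc_fd_instantiate loaded had most likely been
overwritten before it was dereferenced.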

>
>> So it doesn't seem to be a hardware problem since we replaced all that.
>
> I agree.  And your other machines are stable?

Yes, the other ones have been running for ages without problems.
We've been using 2.6.34.7 for about three months now.

> When you say "identical software", are those exactly the same binaries?

Yes, the same (kickstarted) install, the same rpms.

> copying Andrea for possible insight into the non-kvm oopses.
>
> --
> error compiling committee.c: too many arguments to function

Kind regards,

Ruben

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: EPT: Misconfiguration
  2011-02-13 13:03                 ` Avi Kivity
@ 2011-02-13 14:40                   ` Ruben Kerkhof
  2011-02-15 17:16                   ` Marcelo Tosatti
  1 sibling, 0 replies; 19+ messages in thread
From: Ruben Kerkhof @ 2011-02-13 14:40 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm

On Sun, Feb 13, 2011 at 14:03, Avi Kivity <avi@redhat.com> wrote:
> On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
>>
>> And tonight we had another one of those errors we had a few weeks ago:
>>
>> 2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
>> 2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000
>
> This GPA indexes into the 511th entry of the spte.  Marcelo, does this
> remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052 by any
> chance?
>
>> 2011-02-13T02:56:28.694914+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x25602d007 level 4
>> 2011-02-13T02:56:28.694916+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x3df3e2007 level 3
>> 2011-02-13T02:56:28.694919+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x5e90c7007 level 2
>> 2011-02-13T02:56:28.694925+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: spte 0x1603a0730500d277 level 1
>
> Magic 1603a073........ pte.
>
>> 2011-02-13T02:56:28.694928+01:00 phy005 kernel:
>> ept_misconfig_inspect_spte: rsvd_bits = 0x3a00000000000
>> 2011-02-13T02:56:28.694930+01:00 phy005 kernel: ------------[ cut here
>> ]------------
>> 2011-02-13T02:56:28.694933+01:00 phy005 kernel: WARNING: at
>> arch/x86/kvm/vmx.c:3425 handle_ept_misconfig+0x152/0x1d8 [kvm_intel]()
>> 2011-02-13T02:56:28.694936+01:00 phy005 kernel: Hardware name: X8DTU
>> 2011-02-13T02:56:28.694941+01:00 phy005 kernel: Modules linked in: tun
>> ipmi_devintf ipmi_si ipmi_msghandler bridge 8021q garp stp llc bonding
>> xt_comment xt_recent ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
>> ip6_tables ipv6 kvm_intel kvm i2c_i801 i2c_core iTCO_wdt igb ioatdma
>> dca iTCO_vendor_support joydev serio_raw microcode 3w_9xxx [last
>> unloaded: scsi_wait_scan]
>> 2011-02-13T02:56:28.695004+01:00 phy005 kernel: Pid: 4756, comm:
>> qemu-kvm Not tainted 2.6.34.7-66.tilaa.fc13.x86_64 #1
>> 2011-02-13T02:56:28.695008+01:00 phy005 kernel: Call Trace:
>> 2011-02-13T02:56:28.695013+01:00 phy005 kernel: [<ffffffff8104d11f>]
>> warn_slowpath_common+0x7c/0x94
>> 2011-02-13T02:56:28.695020+01:00 phy005 kernel: [<ffffffff8104d14b>]
>> warn_slowpath_null+0x14/0x16
>> 2011-02-13T02:56:28.695024+01:00 phy005 kernel: [<ffffffffa00c97fb>]
>> handle_ept_misconfig+0x152/0x1d8 [kvm_intel]
>> 2011-02-13T02:56:28.695028+01:00 phy005 kernel: [<ffffffffa00ca401>]
>> vmx_handle_exit+0x204/0x23a [kvm_intel]
>> 2011-02-13T02:56:28.695033+01:00 phy005 kernel: [<ffffffffa0084998>]
>> kvm_arch_vcpu_ioctl_run+0x7cd/0xa74 [kvm]
>> 2011-02-13T02:56:28.695037+01:00 phy005 kernel: [<ffffffffa00735ba>]
>> kvm_vcpu_ioctl+0xfd/0x56e [kvm]
>> 2011-02-13T02:56:28.695042+01:00 phy005 kernel: [<ffffffff810feaab>] ?
>> virt_to_head_page+0xe/0x2f
>> 2011-02-13T02:56:28.695046+01:00 phy005 kernel: [<ffffffff810cc6ca>] ?
>> mempool_kfree+0xe/0x10
>> 2011-02-13T02:56:28.695051+01:00 phy005 kernel: [<ffffffff810cc857>] ?
>> mempool_free+0x76/0x7b
>> 2011-02-13T02:56:28.695055+01:00 phy005 kernel: [<ffffffff8111aa2f>]
>> vfs_ioctl+0x32/0xa6
>> 2011-02-13T02:56:28.695060+01:00 phy005 kernel: [<ffffffff8111afa2>]
>> do_vfs_ioctl+0x483/0x4c9
>> 2011-02-13T02:56:28.695065+01:00 phy005 kernel: [<ffffffff8111b03e>]
>> sys_ioctl+0x56/0x79
>> 2011-02-13T02:56:28.695070+01:00 phy005 kernel: [<ffffffff81009c72>]
>> system_call_fastpath+0x16/0x1b
>> 2011-02-13T02:56:28.695074+01:00 phy005 kernel: ---[ end trace
>> d95032626ea304ca ]---
>>
>> Any help would be much appreciated. It seems very strange that I'm the
>> first one who runs into this.
>> I've found two bugreports which report the same, the first one at
>>
>> https://partner-bugzilla.redhat.com/show_bug.cgi?format=multiple&id=613691,
>> but that's a duplicate of
>> https://partner-bugzilla.redhat.com/show_bug.cgi?id=606131 which I'm
>> not authorized to see...
>
> These don't appear to be related.  Are you running ksm, btw?

No.

Kind regards,

Ruben

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: EPT: Misconfiguration
  2011-02-13 13:03                 ` Avi Kivity
  2011-02-13 14:40                   ` Ruben Kerkhof
@ 2011-02-15 17:16                   ` Marcelo Tosatti
  2011-02-15 19:04                     ` Ruben Kerkhof
  1 sibling, 1 reply; 19+ messages in thread
From: Marcelo Tosatti @ 2011-02-15 17:16 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Ruben Kerkhof, kvm

On Sun, Feb 13, 2011 at 03:03:40PM +0200, Avi Kivity wrote:
> On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
> >And tonight we had another one of those errors we had a few weeks ago:
> >
> >2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
> >2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000
> 
> This GPA indexes into the 511th entry of the spte.  Marcelo, does
> this remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052
> by any chance?

This and the others reported. So yes, it looks like something is corrupting
memory. Ruben, you can try to boot with slub_debug=ZFPU kernel option.
Is there any reason for not upgrading to FC14?
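
The two observations quoted above (the GPA hitting entry 511 and the
level 1 spte tripping the reserved-bit check) can be reproduced from
the logged numbers. This is only a sketch: it assumes the usual 4-level
layout with 4 KiB pages and 9 index bits per level, and ept_index is a
hypothetical name, not a KVM function.

```python
# Illustrative only: ept_index is a hypothetical helper, not a KVM function.
# Assumes 4 KiB pages and 9 index bits per paging level.
def ept_index(gpa: int, level: int) -> int:
    """Table index used at the given level (1 = last level)."""
    return (gpa >> (12 + 9 * (level - 1))) & 0x1ff

gpa = 0x2edff000             # "EPT: GPA" from the 2011-02-13 report
spte = 0x1603a0730500d277    # level 1 spte from the same report
rsvd_bits = 0x3a00000000000  # rsvd_bits printed by ept_misconfig_inspect_spte

for level in (4, 3, 2, 1):
    print("level", level, "index", ept_index(gpa, level))

# Any overlap with rsvd_bits is what the misconfiguration report flags.
print(hex(spte & rsvd_bits))
```

The level 1 index comes out as 511, and every bit in rsvd_bits is set
in the spte. The GPA from the first report in this thread, 0x3dbff6b0,
gives the same level 1 index.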


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: EPT: Misconfiguration
  2011-02-15 17:16                   ` Marcelo Tosatti
@ 2011-02-15 19:04                     ` Ruben Kerkhof
  2011-02-24 21:15                       ` Ruben Kerkhof
  0 siblings, 1 reply; 19+ messages in thread
From: Ruben Kerkhof @ 2011-02-15 19:04 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm

Hi Marcelo,

On Tue, Feb 15, 2011 at 18:16, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Sun, Feb 13, 2011 at 03:03:40PM +0200, Avi Kivity wrote:
>> On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
>> >And tonight we had another one of those errors we had a few weeks ago:
>> >
>> >2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
>> >2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000
>>
>> This GPA indexes into the 511th entry of the spte.  Marcelo, does
>> this remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052
>> by any chance?
>
> This and the others reported. So yes, it looks like something is corrupting
> memory. Ruben, you can try to boot with slub_debug=ZFPU kernel option.

Sure, but not for a while. I'm first moving all my customers off this
machine. We've had to reboot it 5 or 6 times in the last couple
of weeks.
As soon as that's done I'm going to test the hell out of it.

Now that we've moved a few of the VMs we don't see any oopses, so it
could be either that it only triggers under load, or that there's a
specific guest which is triggering it.

> Is there any reason for not upgrading to FC14?

I haven't had a reason to upgrade yet, all our other machines are
running fine, using the same kernel.
Plus I'm still finding lots of issues unrelated to kvm on F14, broken
ssh in combination with openldap, ipmi bugs, selinux policy etc.
Next to that it takes a lot of time to test all our images etc.

I'll probably skip the F14 kernel and go straight to 2.6.38, since that
should bring significant improvements like THP, async pagefaults etc.

Kind regards,

Ruben

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: EPT: Misconfiguration
  2011-02-15 19:04                     ` Ruben Kerkhof
@ 2011-02-24 21:15                       ` Ruben Kerkhof
  2011-02-27 10:46                         ` Avi Kivity
  0 siblings, 1 reply; 19+ messages in thread
From: Ruben Kerkhof @ 2011-02-24 21:15 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm

[-- Attachment #1: Type: text/plain, Size: 2145 bytes --]

Hi Marcelo,

On Tue, Feb 15, 2011 at 20:04, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
> Hi Marcelo,
>
> On Tue, Feb 15, 2011 at 18:16, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> On Sun, Feb 13, 2011 at 03:03:40PM +0200, Avi Kivity wrote:
>>> On 02/13/2011 04:07 AM, Ruben Kerkhof wrote:
>>> >And tonight we had another one of those errors we had a few weeks ago:
>>> >
>>> >2011-02-13T02:56:28.694496+01:00 phy005 kernel: EPT: Misconfiguration.
>>> >2011-02-13T02:56:28.694908+01:00 phy005 kernel: EPT: GPA: 0x2edff000
>>>
>>> This GPA indexes into the 511th entry of the spte.  Marcelo, does
>>> this remind you of https://bugzilla.kernel.org/show_bug.cgi?id=27052
>>> by any chance?
>>
>> This and the others reported. So yes, it looks like something is corrupting
>> memory. Ruben, you can try to boot with slub_debug=ZFPU kernel option.

Ok, there are now only 6 vms left on this host, and I've booted it
with the slub_debug=ZFPU option.
After a few hours, I got the following result:

2011-02-24T21:41:30.818496+01:00 phy005 kernel:
=============================================================================
2011-02-24T21:41:30.818517+01:00 phy005 kernel: BUG kmalloc-2048 (Not
tainted): Object padding overwritten
2011-02-24T21:41:30.818523+01:00 phy005 kernel:
-----------------------------------------------------------------------------
2011-02-24T21:41:30.818526+01:00 phy005 kernel:
2011-02-24T21:41:30.818530+01:00 phy005 kernel: INFO:
0xffff8806230752ca-0xffff8806230752cf. First byte 0x0 instead of 0x5a
2011-02-24T21:41:30.818534+01:00 phy005 kernel: INFO: Allocated in
__netdev_alloc_skb+0x34/0x51 age=2231 cpu=8 pid=0
2011-02-24T21:41:30.818537+01:00 phy005 kernel: INFO: Freed in
skb_release_data+0xc9/0xce age=2368 cpu=8 pid=2159
2011-02-24T21:41:30.818541+01:00 phy005 kernel: INFO: Slab
0xffffea00157a9880 objects=15 used=13 fp=0xffff8806230752d0
flags=0x40000000004083
2011-02-24T21:41:30.818545+01:00 phy005 kernel: INFO: Object
0xffff880623074a88 @offset=19080 fp=0xffff8806230752d0

The rest of the output is attached since it's quite large.
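
To make the report easier to read: in slub_debug=ZFPU the letters enable
red zoning (Z), sanity checks (F), poisoning (P) and user tracking (U),
and 0x5a is the kernel's POISON_INUSE fill byte. The sketch below pulls
the overwritten range and the object address out of such a report. It is
a rough helper for this note only: the function name is made up, the
timestamps are stripped from the sample, and treating the reported range
as inclusive is an assumption about the report format.

```python
import re

RANGE_RE = re.compile(
    r"INFO: 0x([0-9a-f]+)-0x([0-9a-f]+)\. "
    r"First byte 0x([0-9a-f]+) instead of 0x([0-9a-f]+)"
)
OBJECT_RE = re.compile(r"INFO: Object 0x([0-9a-f]+)")


def summarize_slub_report(report: str) -> dict:
    """Extract the overwritten range and its offset from the object start."""
    # Collapse whitespace so fields wrapped across mail lines match again.
    text = " ".join(report.split())
    rng = RANGE_RE.search(text)
    obj = OBJECT_RE.search(text)
    if rng is None or obj is None:
        raise ValueError("not a recognizable SLUB padding report")
    start, end = int(rng.group(1), 16), int(rng.group(2), 16)
    obj_addr = int(obj.group(1), 16)
    return {
        "bytes_overwritten": end - start + 1,  # assumes an inclusive range
        "offset_from_object": start - obj_addr,
        "found": int(rng.group(3), 16),
        "expected": int(rng.group(4), 16),
    }


# Lines copied from the report above, timestamps removed.
REPORT = """
INFO:
0xffff8806230752ca-0xffff8806230752cf. First byte 0x0 instead of 0x5a
INFO: Object
0xffff880623074a88 @offset=19080 fp=0xffff8806230752d0
"""

if __name__ == "__main__":
    print(summarize_slub_report(REPORT))
```

For this report the overwritten range starts 2114 bytes past the object
address and ends one byte before the fp address printed in the same
report.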

Kind regards,

Ruben

[-- Attachment #2: messages.gz --]
[-- Type: application/x-gzip, Size: 46355 bytes --]


* Re: EPT: Misconfiguration
  2011-02-24 21:15                       ` Ruben Kerkhof
@ 2011-02-27 10:46                         ` Avi Kivity
  2011-03-05 18:57                           ` Ruben Kerkhof
  0 siblings, 1 reply; 19+ messages in thread
From: Avi Kivity @ 2011-02-27 10:46 UTC (permalink / raw)
  To: Ruben Kerkhof; +Cc: Marcelo Tosatti, kvm, netdev


Copying netdev: looks like memory corruption in the networking stack.

Archive link: http://www.spinics.net/lists/kvm/msg50651.html (for the 
attachment).

On 02/24/2011 11:15 PM, Ruben Kerkhof wrote:
> >
> >  On Tue, Feb 15, 2011 at 18:16, Marcelo Tosatti<mtosatti@redhat.com>  wrote:
>
> >>  This and the others reported. So yes, it looks something is corrupting
> >>  memory. Ruben, you can try to boot with slub_debug=ZFPU kernel option.
>
> Ok, there are now only 6 vms left on this host, and I've booted it
> with the slub_debug=ZFPU option.
> After a few hours, I got the following result:
>
> 2011-02-24T21:41:30.818496+01:00 phy005 kernel:
> =============================================================================
> 2011-02-24T21:41:30.818517+01:00 phy005 kernel: BUG kmalloc-2048 (Not
> tainted): Object padding overwritten
> 2011-02-24T21:41:30.818523+01:00 phy005 kernel:
> -----------------------------------------------------------------------------
> 2011-02-24T21:41:30.818526+01:00 phy005 kernel:
> 2011-02-24T21:41:30.818530+01:00 phy005 kernel: INFO:
> 0xffff8806230752ca-0xffff8806230752cf. First byte 0x0 instead of 0x5a
> 2011-02-24T21:41:30.818534+01:00 phy005 kernel: INFO: Allocated in
> __netdev_alloc_skb+0x34/0x51 age=2231 cpu=8 pid=0
> 2011-02-24T21:41:30.818537+01:00 phy005 kernel: INFO: Freed in
> skb_release_data+0xc9/0xce age=2368 cpu=8 pid=2159
> 2011-02-24T21:41:30.818541+01:00 phy005 kernel: INFO: Slab
> 0xffffea00157a9880 objects=15 used=13 fp=0xffff8806230752d0
> flags=0x40000000004083
> 2011-02-24T21:41:30.818545+01:00 phy005 kernel: INFO: Object
> 0xffff880623074a88 @offset=19080 fp=0xffff8806230752d0
>
> The rest of the output is attached since it's quite large.
>
> Kind regards,
>
> Ruben


-- 
error compiling committee.c: too many arguments to function



* Re: EPT: Misconfiguration
  2011-02-27 10:46                         ` Avi Kivity
@ 2011-03-05 18:57                           ` Ruben Kerkhof
  0 siblings, 0 replies; 19+ messages in thread
From: Ruben Kerkhof @ 2011-03-05 18:57 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, kvm, netdev

On Sun, Feb 27, 2011 at 11:46, Avi Kivity <avi@redhat.com> wrote:
>
> Copying netdev: looks like memory corruption in the networking stack.
>
> Archive link: http://www.spinics.net/lists/kvm/msg50651.html (for the
> attachment).

There's now only a single guest running on this host (Ubuntu Maverick).
I've also upgraded the host kernel to 2.6.38-rc6, and this just
happened (after a day or so):

2011-03-05T19:41:58.328866+01:00 phy005 kernel: [85271.656862] BUG
kmalloc-2048 (Not tainted): Object padding overwritten
2011-03-05T19:41:58.328870+01:00 phy005 kernel: [85271.656864]
-----------------------------------------------------------------------------
2011-03-05T19:41:58.328875+01:00 phy005 kernel: [85271.656866]
2011-03-05T19:41:58.328885+01:00 phy005 kernel: [85271.656870] INFO:
0xffff880c0d52a960-0xffff880c0d52a967. First byte 0x0 instead of 0x5a
2011-03-05T19:41:58.328890+01:00 phy005 kernel: [85271.656880] INFO:
Allocated in __netdev_alloc_skb+0x1f/0x3b age=16039 cpu=5 pid=0
2011-03-05T19:41:58.328894+01:00 phy005 kernel: [85271.656886] INFO:
Freed in skb_release_data+0xa5/0xaa age=0 cpu=5 pid=1766
2011-03-05T19:41:58.328898+01:00 phy005 kernel: [85271.656890] INFO:
Slab 0xffffea002a2ea0c0 objects=15 used=13 fp=0xffff880c0d52a120
flags=0xc00000000040c1
2011-03-05T19:41:58.328902+01:00 phy005 kernel: [85271.656894] INFO:
Object 0xffff880c0d52a120 @offset=8480 fp=0xffff880c0d52d2d0
2011-03-05T19:41:58.328905+01:00 phy005 kernel: [85271.656895]
2011-03-05T19:41:58.328909+01:00 phy005 kernel: [85271.656897] Bytes
b4 0xffff880c0d52a110:  14 89 12 05 01 00 00 00 5a 5a 5a 5a 5a 5a 5a
5a ........ZZZZZZZZ
2011-03-05T19:41:58.328913+01:00 phy005 kernel: [85271.656909]
Object 0xffff880c0d52a120:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b kkkkkkkkkkkkkkkk

We have quite a complex network stack: two interfaces (igb) attached
to bond0, with two bridges on top of that and two vlans on top of those.
The guest is running a vpn and an IPv6 tunnel.

Let me know if more info is needed.

Kind regards,

Ruben


Thread overview: 19+ messages
2011-01-20 11:48 EPT: Misconfiguration Ruben Kerkhof
2011-01-20 11:59 ` Ruben Kerkhof
2011-01-21 13:22 ` Marcelo Tosatti
2011-01-25 14:44   ` Ruben Kerkhof
2011-01-25 17:39     ` Avi Kivity
2011-01-25 18:29       ` Ruben Kerkhof
2011-01-26  9:52         ` Avi Kivity
2011-01-26 15:00           ` Ruben Kerkhof
2011-02-10 15:23             ` Ruben Kerkhof
2011-02-13  2:07               ` Ruben Kerkhof
2011-02-13 13:03                 ` Avi Kivity
2011-02-13 14:40                   ` Ruben Kerkhof
2011-02-15 17:16                   ` Marcelo Tosatti
2011-02-15 19:04                     ` Ruben Kerkhof
2011-02-24 21:15                       ` Ruben Kerkhof
2011-02-27 10:46                         ` Avi Kivity
2011-03-05 18:57                           ` Ruben Kerkhof
2011-02-13 12:58               ` Avi Kivity
2011-02-13 14:36                 ` Ruben Kerkhof
