* [RFH] NULL pointer dereference oops occurs when running kvm VM
@ 2016-08-12  9:08 Xiexiangyou
       [not found] ` <57AD9821.6010804@windriver.com>
       [not found] ` <20160815125011.GA16471@potion>
  0 siblings, 2 replies; 4+ messages in thread
From: Xiexiangyou @ 2016-08-12  9:08 UTC
  To: kvm; +Cc: guangrong.xiao@linux.intel.com;pbonzini

A KVM VM running on a hardware server with an Intel Broadwell CPU hit an oops.

kernel version: 3.0.93
kvm version: 3.6
CPU: Intel(R) Xeon(R) CPU E5-2618L v4 @ 2.20GHz

The oops message is as follows:
<1>[25808.222049] BUG: unable to handle kernel NULL pointer dereference at           (null)
<1>[25808.230539] IP: [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
<4>[25808.237496] PGD 0
<1>[25808.239839] Thread overran stack, or stack corrupted
<0>[25808.245107] Oops: 0002 [#1] SMP
<4>[25808.286629] CPU 2
<4>[25808.288464] Modules linked in: kbox(F) target_core_pscsi(F) target_core_file(F) target_core_iblock(F) die_notify(FN) signo_catch ipmi_devintf(F) ipmi_si(F) ipmi_msghandler(F) bonding(F) iptable_filter(F) ip_tables(F) x_tables(F) pmcint(F) openvswitch(F) gre(F) crc32c(F) libcrc32c(F) mperf(F) uhci_hcd(F) thermal(F) tg3(F) pcmcia(F) pcmcia_core(F) pciehp(F) pci_hotplug(F) nfs(F) lockd(F) fscache(F) auth_rpcgss(F) nfs_acl(F) sunrpc(F) mlx4_en(F) mlx4_core(F) compat(F) ide_cd_mod(F) ide_core(F) hpsa(F) fan(F) esp4(F) e1000(F) ata_generic(F) af_packet(F) vhost_scsi(F) target_core_mod(F) configfs(F) loop(F) dm_mod(F) ext3(F) jbd(F) mbcache(F) scsi_dh_rdac(F) scsi_dh_hp_sw(F) scsi_dh_emc(F) scsi_dh_alua(F) scsi_dh(F) mptsas(F) mptscsih(F) mptctl(F) mptbase(F) mpt2sas(F) scsi_transport_sas(F) raid_class(F) sd_mod(F) crc_t10dif(F) usbhid(F) hid(F) usb_storage(F) sr_mod(F) cdrom(F) vhost_net(F) macvtap(F) sg(F) macvlan(F) ixgbe(FX) tun(F) igb(F) ehci_hcd(F) kvm_intel(F) ipv6(F) dca(F) ipv6_lib(F) kvm(F) usbcore(F) ptp(F) i2c_i801(F) i2c_core(F) usb_common(F) megaraid_sas(F) pps_core(F) rtc_cmos(F) processor(F) thermal_sys(F) hwmon(F) button(F) ata_piix(F) ahci(F) libahci(F) libata(F) scsi_mod(F) [last unloaded: kbox]
<4>[25808.402500] Supported: No, Unsupported modules are loaded
<4>[25808.408212]
<4>[25808.410023] Pid: 29180, comm: qemu-kvm Tainted: GF          NX 3.0.93-0.8-default #1 Huawei RH2288H V3/BC11HGSA0
<4>[25808.420863] RIP: 0010:[<ffffffffa021f3c5>]  [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
<4>[25808.430560] RSP: 0018:ffff882fe1141d88  EFLAGS: 00010046
<4>[25808.436179] RAX: 0000000000000000 RBX: ffffffffa00d5270 RCX: 0000000000000000
<4>[25808.443617] RDX: ffff88187f88cee0 RSI: 0000000000000000 RDI: 0000000000000002
<4>[25808.451049] RBP: ffff8817bfba8140 R08: ffff8817c28be4c0 R09: 0000000000000000
<4>[25808.458490] R10: ffff8817bfbac100 R11: ffffffff81017ea0 R12: 0000000000000000
<4>[25808.465933] R13: ffff8817bfba8170 R14: 0000000000000000 R15: 0000000000000000
<4>[25808.473374] FS:  00007f1083e60700(0000) GS:ffff88187f880000(0000) knlGS:0000000000000000
<4>[25808.482088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[25808.488145] CR2: 0000000000000000 CR3: 00000017c2a13000 CR4: 00000000001427e0
<4>[25808.495577] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[25808.503013] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[25808.510447] Process qemu-kvm (pid: 29180, threadinfo ffff882fe1140000, task ffff882fde8ac3c0)
<0>[25808.519588] Stack:
<4>[25808.521921]  ffff8817bfba8140 ffff882fde8ac3c0 0000000000000206 ffffffffa020b9e5
<4>[25808.530016]  0000000000000000 ffff882fde8ac3c0 ffffffff81082f10 ffff882fe1141dc0
<4>[25808.538103]  ffff882fe1141dc0 ffff8817bfba8140 ffff8817bfba8170 0000000000000001
<0>[25808.546203] Call Trace:
<4>[25808.549010]  [<ffffffffa021f798>] __vcpu_run+0x198/0x260 [kvm]
<4>[25808.562703]  [<ffffffffa0220418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
<4>[25808.569851]  [<ffffffffa020ccee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
<4>[25808.576344]  [<ffffffff8116bf6b>] do_vfs_ioctl+0x8b/0x3b0
<4>[25808.582059]  [<ffffffff8116c331>] sys_ioctl+0xa1/0xb0
<4>[25808.587425]  [<ffffffff81469872>] system_call_fastpath+0x16/0x1b
<4>[25808.593753]  [<00007f10871c1ce7>] 0x7f10871c1ce6
<0>[25808.598630] Code: 65 24 85 c0 74 25 48 8b 1d 99 30 04 00 48 85 db 74 19 48 8b 03 90 48 8b 7b 08 48 83 c3 10 44 89 e6 ff d0 48 8b 03 48 85 c0 75 eb
      [25808.612804] <48> 8b 05 14 51 04 00 48 89 ef ff 90 48 01 00 00 65 48 8b 04 25
<1>[25808.620776] RIP  [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
<4>[25808.627768]  RSP <ffff882fe1141d88>
<0>[25808.631572] CR2: 0000000000000000
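
For page faults, the hex value after "Oops:" is the x86 page-fault error code, and CR2 holds the faulting linear address. Below is a minimal Python sketch for decoding that code. It assumes the architectural bit layout from the Intel SDM (bit 0 protection vs. not-present, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch). `describe_pf_error` is a hypothetical helper, not a kernel function.

```python
# Architectural x86 page-fault error-code bits (Intel SDM Vol. 3A).
PF_PROT = 1 << 0   # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1  # 0: read access, 1: write access
PF_USER = 1 << 2   # 0: kernel (supervisor) mode, 1: user mode
PF_RSVD = 1 << 3   # a reserved bit was set in a paging-structure entry
PF_INSTR = 1 << 4  # the access was an instruction fetch


def describe_pf_error(code: int) -> str:
    """Hypothetical helper: turn the hex code after 'Oops:' into words."""
    mode = "user" if code & PF_USER else "kernel"
    if code & PF_INSTR:
        access = "instruction fetch"
    elif code & PF_WRITE:
        access = "write"
    else:
        access = "read"
    cause = "protection violation" if code & PF_PROT else "page not present"
    extra = ", reserved bit set" if code & PF_RSVD else ""
    return f"{mode}-mode {access}, {cause}{extra}"


if __name__ == "__main__":
    # "Oops: 0002" and "CR2: 0000000000000000" from the log above.
    print(describe_pf_error(0x0002), "at address", hex(0x0))
```

Decoded this way, "Oops: 0002" together with "CR2: 0000000000000000" reads as a kernel-mode write to a not-present page at address 0. By contrast, the instruction reported at RIP, "mov 0x45114(%rip),%rax", only reads memory.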


The instruction at RIP (vcpu_enter_guest+0x555) is "mov    0x45114(%rip),%rax".

The surrounding disassembly is:
0xffffffffa02f73bd <vcpu_enter_guest+1357>:     mov    (%rbx),%rax
0xffffffffa02f73c0 <vcpu_enter_guest+1360>:     test   %rax,%rax
0xffffffffa02f73c3 <vcpu_enter_guest+1363>:     jne    0xffffffffa02f73b0 <vcpu_enter_guest+1344>
0xffffffffa02f73c5 <vcpu_enter_guest+1365>:     mov    0x45114(%rip),%rax        # 0xffffffffa033c4e0 <kvm_x86_ops>
0xffffffffa02f73cc <vcpu_enter_guest+1372>:     mov    %rbp,%rdi
0xffffffffa02f73cf <vcpu_enter_guest+1375>:     callq  *0x148(%rax)
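
The reasoning that this load cannot read from a NULL address follows from how RIP-relative addressing works. The effective address is the address of the next instruction plus a sign-extended 32-bit displacement, so no general-purpose register, and therefore no value that could be 0, is involved. Below is a minimal Python sketch that recomputes the address from the bytes marked <48> in the oops "Code:" line. `decode_mov_rip_rel_rax` is a hypothetical helper that understands only this one encoding (REX.W 8B /r with ModRM 0x05). It is not a general x86 decoder.

```python
def decode_mov_rip_rel_rax(code: bytes, insn_addr: int) -> int:
    """Address read by `mov disp32(%rip),%rax`, encoded as 48 8b 05 <disp32>.

    Hypothetical helper: handles only this one 7-byte encoding.
    """
    # 0x48 = REX.W, 0x8b = MOV r64, r/m64, ModRM 0x05 = reg RAX, RIP-relative disp32.
    if len(code) < 7 or code[:3] != b"\x48\x8b\x05":
        raise ValueError("not mov disp32(%rip),%rax")
    disp = int.from_bytes(code[3:7], "little", signed=True)  # sign-extended disp32
    next_rip = insn_addr + 7  # displacement is relative to the *next* instruction
    return (next_rip + disp) & 0xFFFF_FFFF_FFFF_FFFF  # wrap to 64 bits


if __name__ == "__main__":
    # Faulting bytes from the oops "Code:" line: <48> 8b 05 14 51 04 00
    insn = bytes.fromhex("48 8b 05 14 51 04 00")
    # Address of the instruction in the disassembly listing above.
    print(hex(decode_mov_rip_rel_rax(insn, 0xFFFFFFFFA02F73C5)))
    # prints 0xffffffffa033c4e0, the kvm_x86_ops address shown by the disassembler
```

Applied to the oops RIP, 0xffffffffa021f3c5, the same computation gives 0xffffffffa02644e0. The disassembly above was presumably taken from a different load of the kvm module: its addresses differ from the oops, but the offset vcpu_enter_guest+0x555 (= +1365) is the same.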

It should be impossible for the instruction "mov  0x45114(%rip),%rax" to cause a BUG like "unable to handle kernel NULL pointer dereference at (null)", since its source address is computed from RIP and cannot be NULL.
Has anyone seen this issue before? Could it be a CPU bug?

Best regards!


* Re: [RFH] NULL pointer dereference oops occurs when running kvm VM
       [not found] ` <57AD9821.6010804@windriver.com>
@ 2016-08-14  8:45   ` Xiexiangyou
  0 siblings, 0 replies; 4+ messages in thread
From: Xiexiangyou @ 2016-08-14  8:45 UTC
  To: Yadi, kvm; +Cc: guangrong.xiao, pbonzini

Thanks,

The bug is hard to reproduce; it has appeared only twice. Unfortunately,
I had not enabled kdump before it appeared.

I want to find clues in these logs. Of course, I will also try to reproduce it.


On 2016/8/12 17:34, Yadi wrote:
> On 2016年08月12日 17:08, Xiexiangyou wrote:
>> Kvm vm runs in hardware server with intel broadwell CPU. A oops exception occurs.
>>
>> kernel version: 3.0.93
>> kvm version: 3.6
>> CPU: And the CPU is Intel(R) Xeon(R) CPU E5-2618L v4 @ 2.20GHz.
>>
>> The message as follows :
>> <1>[25808.222049] BUG: unable to handle kernel NULL pointer dereference at           (null)
>> <1>[25808.230539] IP: [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
>> <4>[25808.237496] PGD 0
>> <1>[25808.239839] Thread overran stack, or stack corrupted
> I am not 100 percent sure, just a suggestion: turn on the kernel stack depth tracer to find out what is happening: echo 1 > /proc/sys/kernel/stack_tracer_enabled
> 


* Re: [RFH] NULL pointer dereference oops occurs when running kvm VM
       [not found] ` <20160815125011.GA16471@potion>
@ 2016-08-16  8:25   ` Xiexiangyou
  2016-08-16 16:49     ` Radim Krčmář
  0 siblings, 1 reply; 4+ messages in thread
From: Xiexiangyou @ 2016-08-16  8:25 UTC
  To: Radim Krčmář; +Cc: kvm, guangrong.xiao, pbonzini

Thanks for your reply :)

I'm confused that a "NULL pointer dereference" exception is thrown while
executing the "mov    0x45114(%rip),%rax" instruction, because "0x45114(%rip)" cannot be NULL.
Can a thread stack overflow result in an oops that is this hard to explain?

Reproducing is ongoing...


Regards~

On 2016/8/15 20:50, Radim Krčmář wrote:
> 2016-08-12 17:08+0800, Xiexiangyou:
>> Kvm vm runs in hardware server with intel broadwell CPU. A oops exception occurs.
>>
>> kernel version: 3.0.93
>> kvm version: 3.6
>> CPU: And the CPU is Intel(R) Xeon(R) CPU E5-2618L v4 @ 2.20GHz.
>>
>> The message as follows :
>> <1>[25808.222049] BUG: unable to handle kernel NULL pointer dereference at           (null)
>> <1>[25808.230539] IP: [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
>> <4>[25808.237496] PGD 0
>> <1>[25808.239839] Thread overran stack, or stack corrupted
> 
> This could be an important lead.  Stack overruns have usually happened with xfs
> or similar operations, but your kernel does not look standard ...
> Can you reproduce after increasing the stack size with 6538b8ea886e
> ("x86_64: expand kernel stack to 16K")?
> 



* Re: [RFH] NULL pointer dereference oops occurs when running kvm VM
  2016-08-16  8:25   ` Xiexiangyou
@ 2016-08-16 16:49     ` Radim Krčmář
  0 siblings, 0 replies; 4+ messages in thread
From: Radim Krčmář @ 2016-08-16 16:49 UTC
  To: Xiexiangyou; +Cc: kvm, guangrong.xiao, pbonzini

2016-08-16 16:25+0800, Xiexiangyou:
> Thanks for your reply :)
> 
> I'm confused that it throw an exception "NULL pointer dereference" when
> implement "mov    0x45114(%rip),%rax" instruction. Because "0x45114(%rip)" couldn't be NULL.
> Will thread stack overflow result in Oops which is hard to explain like this?

Probably not, but it is easy to rule out.  Can't trust anything in a
corrupted system ...
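
The oops itself gives a partial check on the overrun theory. In kernels of this vintage, "Thread overran stack, or stack corrupted" is printed by the x86 page-fault oops code when the magic word STACK_END_MAGIC no longer holds its expected value. That word is stored just above struct thread_info, at the low end of the task's kernel stack. The "Process" line gives the threadinfo (stack base) address, and the register dump gives RSP, so the stack depth at the moment of the oops is simple arithmetic. Below is a minimal Python sketch. It assumes an 8 KiB THREAD_SIZE, the x86_64 default before 6538b8ea886e. `stack_depth` is a hypothetical helper.

```python
# Assumption: THREAD_SIZE is 8 KiB, the x86_64 default before 6538b8ea886e
# ("x86_64: expand kernel stack to 16K"); verify against the actual kernel build.
THREAD_SIZE = 8 * 1024


def stack_depth(threadinfo: int, rsp: int, thread_size: int = THREAD_SIZE) -> int:
    """Bytes in use on a task's kernel stack (hypothetical helper).

    thread_info (followed by the STACK_END_MAGIC word) sits at the low end of the
    stack allocation; the stack grows down from threadinfo + thread_size toward it.
    """
    top = threadinfo + thread_size
    if not threadinfo <= rsp < top:
        raise ValueError("RSP is not on this task's kernel stack")
    return top - rsp


if __name__ == "__main__":
    # "threadinfo ffff882fe1140000" and "RSP: 0018:ffff882fe1141d88" from the oops.
    used = stack_depth(0xFFFF882FE1140000, 0xFFFF882FE1141D88)
    print(used, "bytes in use;", THREAD_SIZE - used, "bytes between RSP and the stack base")
    # prints: 632 bytes in use; 7560 bytes between RSP and the stack base
```

With these numbers, the stack was only about 632 bytes deep when the oops fired. If the magic word really was overwritten by an overrun, that happened earlier than the call chain shown in the trace. Alternatively, a stray write rather than stack growth may have clobbered it. Retesting with a 16K stack (6538b8ea886e) remains a cheap way to rule the overrun theory in or out.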
