[RFH] NULL pointer dereference oops occurs when running kvm VM

From: Xiexiangyou @ 2016-08-12  9:08 UTC
  To: kvm; +Cc: guangrong.xiao@linux.intel.com;pbonzini

A KVM VM is running on a physical server with an Intel Broadwell CPU, and an oops occurs.

kernel version: 3.0.93
kvm version: 3.6
CPU: Intel(R) Xeon(R) CPU E5-2618L v4 @ 2.20GHz

The oops message is as follows:
<1>[25808.222049] BUG: unable to handle kernel NULL pointer dereference at           (null)
<1>[25808.230539] IP: [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
<4>[25808.237496] PGD 0
<1>[25808.239839] Thread overran stack, or stack corrupted
<0>[25808.245107] Oops: 0002 [#1] SMP
<4>[25808.286629] CPU 2
<4>[25808.288464] Modules linked in: kbox(F) target_core_pscsi(F) target_core_file(F) target_core_iblock(F) die_notify(FN) signo_catch ipmi_devintf(F) ipmi_si(F) ipmi_msghandler(F) bonding(F) iptable_filter(F) ip_tables(F) x_tables(F) pmcint(F) openvswitch(F) gre(F) crc32c(F) libcrc32c(F) mperf(F) uhci_hcd(F) thermal(F) tg3(F) pcmcia(F) pcmcia_core(F) pciehp(F) pci_hotplug(F) nfs(F) lockd(F) fscache(F) auth_rpcgss(F) nfs_acl(F) sunrpc(F) mlx4_en(F) mlx4_core(F) compat(F) ide_cd_mod(F) ide_core(F) hpsa(F) fan(F) esp4(F) e1000(F) ata_generic(F) af_packet(F) vhost_scsi(F) target_core_mod(F) configfs(F) loop(F) dm_mod(F) ext3(F) jbd(F) mbcache(F) scsi_dh_rdac(F) scsi_dh_hp_sw(F) scsi_dh_emc(F) scsi_dh_alua(F) scsi_dh(F) mptsas(F) mptscsih(F) mptctl(F) mptbase(F) mpt2sas(F) scsi_transport_sas(F) raid_class(F) sd_mod(F) crc_t10dif(F) usbhid(F) hid(F) usb_storage(F) sr_mod(F) cdrom(F) vhost_net(F) macvtap(F) sg(F) macvlan(F) ixgbe(FX) tun(F) igb(F) ehci_hcd(F) kvm_intel(F) ipv6(F) dca(F) ipv6_lib(F) kvm(F) usbcore(F) ptp(F) i2c_i801(F) i2c_core(F) usb_common(F) megaraid_sas(F) pps_core(F) rtc_cmos(F) processor(F) thermal_sys(F) hwmon(F) button(F) ata_piix(F) ahci(F) libahci(F) libata(F) scsi_mod(F) [last unloaded: kbox]
<4>[25808.402500] Supported: No, Unsupported modules are loaded
<4>[25808.408212]
<4>[25808.410023] Pid: 29180, comm: qemu-kvm Tainted: GF          NX 3.0.93-0.8-default #1 Huawei RH2288H V3/BC11HGSA0
<4>[25808.420863] RIP: 0010:[<ffffffffa021f3c5>]  [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
<4>[25808.430560] RSP: 0018:ffff882fe1141d88  EFLAGS: 00010046
<4>[25808.436179] RAX: 0000000000000000 RBX: ffffffffa00d5270 RCX: 0000000000000000
<4>[25808.443617] RDX: ffff88187f88cee0 RSI: 0000000000000000 RDI: 0000000000000002
<4>[25808.451049] RBP: ffff8817bfba8140 R08: ffff8817c28be4c0 R09: 0000000000000000
<4>[25808.458490] R10: ffff8817bfbac100 R11: ffffffff81017ea0 R12: 0000000000000000
<4>[25808.465933] R13: ffff8817bfba8170 R14: 0000000000000000 R15: 0000000000000000
<4>[25808.473374] FS:  00007f1083e60700(0000) GS:ffff88187f880000(0000) knlGS:0000000000000000
<4>[25808.482088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[25808.488145] CR2: 0000000000000000 CR3: 00000017c2a13000 CR4: 00000000001427e0
<4>[25808.495577] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[25808.503013] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<4>[25808.510447] Process qemu-kvm (pid: 29180, threadinfo ffff882fe1140000, task ffff882fde8ac3c0)
<0>[25808.519588] Stack:
<4>[25808.521921]  ffff8817bfba8140 ffff882fde8ac3c0 0000000000000206 ffffffffa020b9e5
<4>[25808.530016]  0000000000000000 ffff882fde8ac3c0 ffffffff81082f10 ffff882fe1141dc0
<4>[25808.538103]  ffff882fe1141dc0 ffff8817bfba8140 ffff8817bfba8170 0000000000000001
<0>[25808.546203] Call Trace:
<4>[25808.549010]  [<ffffffffa021f798>] __vcpu_run+0x198/0x260 [kvm]
<4>[25808.562703]  [<ffffffffa0220418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm]
<4>[25808.569851]  [<ffffffffa020ccee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm]
<4>[25808.576344]  [<ffffffff8116bf6b>] do_vfs_ioctl+0x8b/0x3b0
<4>[25808.582059]  [<ffffffff8116c331>] sys_ioctl+0xa1/0xb0
<4>[25808.587425]  [<ffffffff81469872>] system_call_fastpath+0x16/0x1b
<4>[25808.593753]  [<00007f10871c1ce7>] 0x7f10871c1ce6
<0>[25808.598630] Code: 65 24 85 c0 74 25 48 8b 1d 99 30 04 00 48 85 db 74 19 48 8b 03 90 48 8b 7b 08 48 83 c3 10 44 89 e6 ff d0 48 8b 03 48 85 c0 75 eb
      [25808.612804] <48> 8b 05 14 51 04 00 48 89 ef ff 90 48 01 00 00 65 48 8b 04 25
<1>[25808.620776] RIP  [<ffffffffa021f3c5>] vcpu_enter_guest+0x555/0x790 [kvm]
<4>[25808.627768]  RSP <ffff882fe1141d88>
<0>[25808.631572] CR2: 0000000000000000
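
For reference, the "Oops: 0002" line carries the x86 page-fault error code. The sketch below decodes it using the architectural bit layout (bit 0: present, bit 1: write, bit 2: user mode, bit 3: reserved bit, bit 4: instruction fetch). The function and table names are made up for illustration and are not part of any kernel tool.

```python
# Architectural layout of the x86 page-fault error code.
# Each entry: (bit number, meaning when set, meaning when clear or None).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit set in page table", None),
    (4, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a human-readable description for each relevant bit (hypothetical helper)."""
    out = []
    for bit, if_set, if_clear in PF_BITS:
        if code & (1 << bit):
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# Error code taken from the "Oops: 0002" line above.
print(decode_pf_error_code(0x0002))
# -> ['page not present', 'write access', 'kernel mode']
```

Read this way, 0002 describes a kernel-mode write to a non-present page, with CR2 giving the faulting address 0.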

The instruction at RIP (vcpu_enter_guest+0x555) is "mov    0x45114(%rip),%rax".

The surrounding disassembly is:
0xffffffffa02f73bd <vcpu_enter_guest+1357>:     mov    (%rbx),%rax
0xffffffffa02f73c0 <vcpu_enter_guest+1360>:     test   %rax,%rax
0xffffffffa02f73c3 <vcpu_enter_guest+1363>:     jne    0xffffffffa02f73b0 <vcpu_enter_guest+1344>
0xffffffffa02f73c5 <vcpu_enter_guest+1365>:     mov    0x45114(%rip),%rax        # 0xffffffffa033c4e0 <kvm_x86_ops>
0xffffffffa02f73cc <vcpu_enter_guest+1372>:     mov    %rbp,%rdi
0xffffffffa02f73cf <vcpu_enter_guest+1375>:     callq  *0x148(%rax)
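
As a cross-check, the bytes marked <48> in the "Code:" line of the oops can be decoded by hand. The sketch below assumes they are the 7-byte encoding "48 8b 05 14 51 04 00" (REX.W, opcode 8B, ModRM 05, then a 32-bit little-endian displacement) and computes the RIP-relative effective address. It uses the instruction address from the disassembly listing, which differs from the RIP in the oops, presumably because the module was loaded at a different base. The helper name is hypothetical.

```python
import struct

# Faulting instruction bytes, copied from the "Code:" line of the oops.
INSN = bytes.fromhex("488b0514510400")


def rip_relative_target(insn_addr, insn):
    """Effective address of `mov disp32(%rip),%rax` (hypothetical helper)."""
    # 0x48 = REX.W, 0x8B = MOV r64, r/m64, ModRM 0x05 = RIP + disp32 into %rax.
    if insn[:3] != b"\x48\x8b\x05":
        raise ValueError("not the expected RIP-relative mov encoding")
    (disp,) = struct.unpack("<i", insn[3:7])  # signed 32-bit little-endian
    next_rip = insn_addr + len(insn)          # RIP-relative is based on the next instruction
    return (next_rip + disp) & 0xFFFFFFFFFFFFFFFF


# The oops offset 0x555 is the same location as <vcpu_enter_guest+1365> in the listing.
assert 0x555 == 1365

target = rip_relative_target(0xFFFFFFFFA02F73C5, INSN)
print(hex(target))
# -> 0xffffffffa033c4e0
```

The result matches the address the disassembler annotates as kvm_x86_ops, so the operand of this load is a fixed module address rather than address 0.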

The instruction "mov  0x45114(%rip),%rax" is a RIP-relative load from kvm_x86_ops, so it should not be able to trigger "BUG: unable to handle kernel NULL pointer dereference at (null)".
Has anyone seen this issue before? Could it be a CPU bug?

Best regards!
