* Kernel crashes in __migration_entry_wait
From: Dashi DS1 Cao @ 2016-11-13 12:39 UTC
To: 'linux-x86_64@vger.kernel.org',
'linux-numa@vger.kernel.org'
Hi all,
An x86_64 server crashes and dumps every once in a while with the following signature:
PID: 32577 TASK: ffff882d4351d080 CPU: 22 COMMAND: "vertica"
#0 [ffff8812a2bdfba8] machine_kexec at ffffffff81051beb
#1 [ffff8812a2bdfc08] crash_kexec at ffffffff810f2542
#2 [ffff8812a2bdfcd8] oops_end at ffffffff8163e1a8
#3 [ffff8812a2bdfd00] die at ffffffff8101859b
#4 [ffff8812a2bdfd30] do_general_protection at ffffffff8163da9e
#5 [ffff8812a2bdfd60] general_protection at ffffffff8163d3a8
[exception RIP: __migration_entry_wait+148]
RIP: ffffffff811c5f64 RSP: ffff8812a2bdfe18 RFLAGS: 00010203
RAX: 01ffffffffffffff RBX: ffffea0000000030 RCX: 0000000000000000
RDX: 1e001897de001880 RSI: ffffea0000000030 RDI: f000c4bef000c4be
RBP: ffff8812a2bdfe28 R8: 00003ffffffff000 R9: 00000000000000a9
R10: 0000000000000000 R11: f000c4bef000c4be R12: 1e000297de001880
R13: ffff88195992d440 R14: ffff883be5ea0000 R15: 0000000000000080
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#6 [ffff8812a2bdfe10] __migration_entry_wait at ffffffff811c5eea
#7 [ffff8812a2bdfe30] migration_entry_wait at ffffffff811c62b3
#8 [ffff8812a2bdfe40] handle_mm_fault at ffffffff81197a12
#9 [ffff8812a2bdfed0] __do_page_fault at ffffffff81640e22
#10 [ffff8812a2bdff28] do_page_fault at ffffffff81641113
#11 [ffff8812a2bdff50] page_fault at ffffffff8163d408
RIP: 00000000022d80ba RSP: 00007f2171bf7990 RFLAGS: 00010206
RAX: 0000000000002000 RBX: 00007f1a2521fac0 RCX: 00007f1a2521fac0
RDX: 0000000000000000 RSI: 00000000b9b0b802 RDI: 00000000000039d8
RBP: 00007f2171bf79a0 R8: 0000000000000000 R9: 000000000de857de
R10: 0000000000000000 R11: 00007f1a2521fac0 R12: 00007f1bad9215f0
R13: 00007f21540c4710 R14: 00007f1e09b40d70 R15: 00007f2154040d78
ORIG_RAX: ffffffffffffffff CS: 0033 SS: 002b
KERNEL: vmlinux
DUMPFILE: 127.0.0.1-2016-10-03-09:59:36/vmcore [PARTIAL DUMP]
CPUS: 32
DATE: Mon Oct 3 10:13:22 2016
UPTIME: 4 days, 17:04:52
LOAD AVERAGE: 0.49, 0.26, 0.24
TASKS: 657
NODENAME: node04-priv
RELEASE: 3.10.0-327.el7.x86_64
VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
MACHINE: x86_64 (2600 Mhz)
MEMORY: 240 GB
PANIC: "general protection fault: 0000 [#1] SMP "
PID: 32577
COMMAND: "vertica"
TASK: ffff882d4351d080 [THREAD_INFO: ffff8812a2bdc000]
CPU: 22
STATE: TASK_RUNNING (PANIC)
It seems that this is a bug. I'm not sure whether it has already been identified and fixed, but I cannot find it on the web. The customer was advised to disable NUMA balancing as a workaround, and I'm waiting for the latest results from them.
Thank you all!
Dashi Cao
181 0102 1741
* RE: Kernel crashes in __migration_entry_wait
From: Odzioba, Lukasz @ 2016-11-14 11:19 UTC
To: Dashi DS1 Cao
Cc: 'linux-x86_64@vger.kernel.org',
'linux-numa@vger.kernel.org'
On Sunday, November 13, 2016 1:40 PM Dashi Cao wrote:
> An x86_64 server crashes and dumps every once in a while with the following signature:
> (snip)
>
> KERNEL: vmlinux
> DUMPFILE: 127.0.0.1-2016-10-03-09:59:36/vmcore [PARTIAL DUMP]
> CPUS: 32
> DATE: Mon Oct 3 10:13:22 2016
> UPTIME: 4 days, 17:04:52
> LOAD AVERAGE: 0.49, 0.26, 0.24
> TASKS: 657
> NODENAME: node04-priv
> RELEASE: 3.10.0-327.el7.x86_64
> (snip)
>
> It seems that this is a bug. I'm not sure whether it has already been identified and fixed, but I cannot find it on the web. The customer was advised to disable NUMA balancing as a workaround, and I'm waiting for the latest results from them.
Hi Dashi,
Thank you for your report, but this seems to be a kernel from Red Hat 7.2, not our latest kernel nor a stable one, so I am not sure how many people here will be interested in your issue. If you don't get an answer, you can talk to RH support. Also, this kernel is not the latest one available for 7.2, so you may just try updating it.
Thanks,
Lukas
* RE: Kernel crashes in __migration_entry_wait
From: Dashi DS1 Cao @ 2016-11-28 12:46 UTC
To: Odzioba, Lukasz
Cc: 'linux-x86_64@vger.kernel.org',
'linux-numa@vger.kernel.org'
Hi Lukas,
Thank you for your reply.
These frequent crashes are due to a PTE with a value of the form 0xF000 CXXX XXXX C4BE. Eight dumps in total were examined, and all of them show a PTE of the same format. I think the latest kernel may well still have this bug, but I have no idea how this entry came to exist.
Either a 0xF in the most significant nibble of a PTE is invalid, or the macro "#define __swp_offset(x) ((x).val >> SWP_OFFSET_SHIFT)" should mask off the most significant bits, from bit M to bit 63, where M is the number of physical address lines.
0xF in the most significant nibble of a PTE is probably valid, because of the NX bit and the Protection Key features of Intel paging entries.
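As a rough sketch of the arithmetic behind this argument, the C helpers below redo the computation in user space, taking the RDI value from the first dump (f000c4bef000c4be) as the PTE. The shift of 9, the 64-byte struct page and the vmemmap base are assumptions, not values read from the RHEL source; they were picked because they reproduce the RDX and R12 values in the register dump. swp_offset_masked() and any particular value of M passed to it are hypothetical illustrations of the masking idea above, not existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, chosen because they reproduce RDX and R12 in the dump. */
#define SWP_OFFSET_SHIFT_ASSUMED 9                      /* pte >> 9 gives the swap offset */
#define STRUCT_PAGE_SIZE_ASSUMED 64ULL                  /* bytes per struct page */
#define VMEMMAP_BASE_ASSUMED     0xffffea0000000000ULL  /* start of the struct page array */

/* Swap offset as computed by the quoted __swp_offset(): a plain right shift. */
static uint64_t swp_offset_unmasked(uint64_t pte)
{
    return pte >> SWP_OFFSET_SHIFT_ASSUMED;
}

/* Hypothetical variant that first clears bits phys_bits..63 of the PTE. */
static uint64_t swp_offset_masked(uint64_t pte, unsigned phys_bits)
{
    assert(phys_bits > SWP_OFFSET_SHIFT_ASSUMED && phys_bits < 64);
    return (pte & ((1ULL << phys_bits) - 1)) >> SWP_OFFSET_SHIFT_ASSUMED;
}

/* Address of the struct page for a page frame number; wraps modulo 2^64. */
static uint64_t pfn_to_page_addr(uint64_t pfn)
{
    return VMEMMAP_BASE_ASSUMED + pfn * STRUCT_PAGE_SIZE_ASSUMED;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}
```

Under these assumptions, swp_offset_unmasked(0xf000c4bef000c4be) is 0x007800625f780062; multiplied by 64 it gives 0x1e001897de001880 (the RDX value), and pfn_to_page_addr() of it gives 0x1e000297de001880 (the R12 value). That address is not canonical, which would be consistent with the general protection fault when the struct page is dereferenced. With the hypothetical mask applied, for example with M = 46, the resulting address stays inside the canonical vmemmap range, although it would still not point at a page that was ever being migrated.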
Thanks.
Dashi Cao