From: Dashi DS1 Cao
Subject: Kernel crashes in __migration_entry_wait
Date: Sun, 13 Nov 2016 12:39:38 +0000
Message-ID: <23B7B563BA4E9446B962B142C86EF24A3DCF87@CNMAILEX03.lenovo.com>
To: linux-x86_64@vger.kernel.org, linux-numa@vger.kernel.org

Hi all,

An x86_64 server keeps crashing from time to time, and the dumps show the following signature:

PID: 32577  TASK: ffff882d4351d080  CPU: 22  COMMAND: "vertica"
 #0 [ffff8812a2bdfba8] machine_kexec at ffffffff81051beb
 #1 [ffff8812a2bdfc08] crash_kexec at ffffffff810f2542
 #2 [ffff8812a2bdfcd8] oops_end at ffffffff8163e1a8
 #3 [ffff8812a2bdfd00] die at ffffffff8101859b
 #4 [ffff8812a2bdfd30] do_general_protection at ffffffff8163da9e
 #5 [ffff8812a2bdfd60] general_protection at ffffffff8163d3a8
    [exception RIP: __migration_entry_wait+148]
    RIP: ffffffff811c5f64  RSP: ffff8812a2bdfe18  RFLAGS: 00010203
    RAX: 01ffffffffffffff  RBX: ffffea0000000030  RCX: 0000000000000000
    RDX: 1e001897de001880  RSI: ffffea0000000030  RDI: f000c4bef000c4be
    RBP: ffff8812a2bdfe28   R8: 00003ffffffff000   R9: 00000000000000a9
    R10: 0000000000000000  R11: f000c4bef000c4be  R12: 1e000297de001880
    R13: ffff88195992d440  R14: ffff883be5ea0000  R15: 0000000000000080
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #6 [ffff8812a2bdfe10] __migration_entry_wait at ffffffff811c5eea
 #7 [ffff8812a2bdfe30] migration_entry_wait at ffffffff811c62b3
 #8 [ffff8812a2bdfe40] handle_mm_fault at ffffffff81197a12
 #9 [ffff8812a2bdfed0] __do_page_fault at ffffffff81640e22
#10 [ffff8812a2bdff28] do_page_fault at ffffffff81641113
#11 [ffff8812a2bdff50] page_fault at ffffffff8163d408
    RIP: 00000000022d80ba  RSP: 00007f2171bf7990  RFLAGS: 00010206
    RAX: 0000000000002000  RBX: 00007f1a2521fac0  RCX: 00007f1a2521fac0
    RDX: 0000000000000000  RSI: 00000000b9b0b802  RDI: 00000000000039d8
    RBP: 00007f2171bf79a0   R8: 0000000000000000   R9: 000000000de857de
    R10: 0000000000000000  R11: 00007f1a2521fac0  R12: 00007f1bad9215f0
    R13: 00007f21540c4710  R14: 00007f1e09b40d70  R15: 00007f2154040d78
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

System information from the dump:

      KERNEL: vmlinux
    DUMPFILE: 127.0.0.1-2016-10-03-09:59:36/vmcore  [PARTIAL DUMP]
        CPUS: 32
        DATE: Mon Oct 3 10:13:22 2016
      UPTIME: 4 days, 17:04:52
LOAD AVERAGE: 0.49, 0.26, 0.24
       TASKS: 657
    NODENAME: node04-priv
     RELEASE: 3.10.0-327.el7.x86_64
     VERSION: #1 SMP Thu Nov 19 22:10:57 UTC 2015
     MACHINE: x86_64  (2600 Mhz)
      MEMORY: 240 GB
       PANIC: "general protection fault: 0000 [#1] SMP "
         PID: 32577
     COMMAND: "vertica"
        TASK: ffff882d4351d080  [THREAD_INFO: ffff8812a2bdc000]
         CPU: 22
       STATE: TASK_RUNNING (PANIC)

This looks like a kernel bug. I'm not sure whether it has already been identified and fixed; I could not find any report of it on the web.

As a workaround, the customer has been advised to disable automatic NUMA balancing, and I am waiting for their latest results.

Thank you all!

Dashi Cao
181 0102 1741
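For anyone applying the same workaround, here is a minimal Python sketch that reports and optionally clears the kernel.numa_balancing sysctl through its procfs file, /proc/sys/kernel/numa_balancing, where 0 means automatic NUMA balancing is off. The script and its function names are hypothetical. It assumes a kernel built with CONFIG_NUMA_BALANCING, as RHEL 7 kernels are. Writing the setting requires root, and the change lasts only until the next reboot unless it is also persisted, for example in a file under /etc/sysctl.d/.

```python
"""Report, and optionally disable, automatic NUMA balancing (hypothetical helper)."""
import sys
from pathlib import Path

# procfs file behind the kernel.numa_balancing sysctl; it only exists on
# kernels built with CONFIG_NUMA_BALANCING.
NUMA_BALANCING = Path("/proc/sys/kernel/numa_balancing")


def parse_numa_balancing(text: str) -> bool:
    """Return True if the sysctl text means balancing is on (0 means off)."""
    return int(text.strip()) != 0


def numa_balancing_enabled(path: Path = NUMA_BALANCING) -> bool:
    """Read the current setting from the given sysctl file."""
    return parse_numa_balancing(path.read_text())


def disable_numa_balancing(path: Path = NUMA_BALANCING) -> None:
    """Write 0, like `sysctl -w kernel.numa_balancing=0` (root only).

    The setting reverts at reboot unless it is also persisted, e.g. in a
    file under /etc/sysctl.d/.
    """
    path.write_text("0\n")


def main(argv: list) -> None:
    if not NUMA_BALANCING.exists():
        print("kernel.numa_balancing is not available on this kernel")
        return
    if "--disable" in argv:
        disable_numa_balancing()  # raises PermissionError unless run as root
    state = "enabled" if numa_balancing_enabled() else "disabled"
    print(f"automatic NUMA balancing is {state}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run without arguments, the script only reports the current state. With --disable (as root), it turns balancing off first. The one-off shell equivalent is: sysctl -w kernel.numa_balancing=0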