From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753055AbdJ3L14 (ORCPT ); Mon, 30 Oct 2017 07:27:56 -0400
Received: from mail-wm0-f68.google.com ([74.125.82.68]:49045 "EHLO
        mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753042AbdJ3L1z (ORCPT );
        Mon, 30 Oct 2017 07:27:55 -0400
X-Google-Smtp-Source: ABhQp+RgiruxyeuqKmbA9JdvejSlqdRFj8u0GZ2HyFOdM6drKv0tcvmcf45mznSU5P9LIKboF+cqJg==
Date: Mon, 30 Oct 2017 14:27:52 +0300
From: "Kirill A. Shutemov"
To: Fengguang Wu
Cc: Linux Memory Management List, Linus Torvalds,
        Linux Kernel Mailing List, "Kirill A. Shutemov", Vineet Gupta,
        "Aneesh Kumar K.V", Dan Williams, Geliang Tang
Subject: Re: [pgtable_trans_huge_withdraw] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Message-ID: <20171030112752.c4n4m4vhh2barjew@node.shutemov.name>
References: <20171029225155.qcum5i75awrt5tzm@wfg-t540p.sh.intel.com>
        <20171029233701.4pjqaesnrjqshmzn@wfg-t540p.sh.intel.com>
        <20171030091940.mcljomnaqvrhvwjx@node.shutemov.name>
        <20171030092842.a2zq5gza4tufyku2@wfg-t540p.sh.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20171030092842.a2zq5gza4tufyku2@wfg-t540p.sh.intel.com>
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 30, 2017 at 10:28:42AM +0100, Fengguang Wu wrote:
> Hi Kirill,
>
> On Mon, Oct 30, 2017 at 12:19:40PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Oct 30, 2017 at 12:37:01AM +0100, Fengguang Wu wrote:
> > > CC MM people.
> > >
> > > On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
> > > > Hi Linus,
> > > >
> > > > Up to now we see the below boot error/warnings when testing v4.14-rc6.
> > > >
> > > > They hit the RC release mainly due to various imperfections in 0day's
> > > > auto bisection.
> > > > So I manually list them here and CC the likely easy to
> > > > debug ones to the corresponding maintainers in the followup emails.
> > > >
> > > > boot_successes: 4700
> > > > boot_failures: 247
> > > >
> > > > BUG:kernel_hang_in_test_stage: 152
> > > > BUG:kernel_reboot-without-warning_in_test_stage: 10
> > > > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c: 1
> > > > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c: 3
> > > > BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c: 21
> > > > BUG:soft_lockup-CPU##stuck_for#s: 1
> > > > BUG:unable_to_handle_kernel: 13
> > >
> > > Here is the call trace:
> > >
> > > [ 956.669197]
> > > [ 956.670421] stress-ng: fail: [27945] stress-ng-numa:
> > > get_mempolicy: errno=22 (Invalid argument)
> >
> > Can you also share how you run stress-ng? Is it reproducible?
>
> The command line is
>
> stress-ng --class cpu --sequential $(nproc) --timeout 1 --times --verify --metrics-brief
>
> The test box is
>
> model: Broadwell-EP
> nr_cpu: 88
> memory: 128G

By chance, do you emulate nvdimm there? I suspect DAX stuff.

Do you have full dmesg around?

-- 
 Kirill A. Shutemov