From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 30 Oct 2017 10:28:42 +0100
From: Fengguang Wu <fengguang.wu@intel.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
        Vineet Gupta <Vineet.Gupta1@synopsys.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
        Dan Williams <dan.j.williams@intel.com>,
        Geliang Tang <geliangtang@163.com>
Subject: Re: [pgtable_trans_huge_withdraw] BUG: unable to handle kernel NULL
 pointer dereference at 0000000000000020
Message-ID: <20171030092842.a2zq5gza4tufyku2@wfg-t540p.sh.intel.com>
References: <CA+55aFxSJGeN=2X-uX-on1Uq2Nb8+v1aiMDz5H1+tKW_N5Q+6g@mail.gmail.com>
 <20171029225155.qcum5i75awrt5tzm@wfg-t540p.sh.intel.com>
 <20171029233701.4pjqaesnrjqshmzn@wfg-t540p.sh.intel.com>
 <20171030091940.mcljomnaqvrhvwjx@node.shutemov.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Disposition: inline
In-Reply-To: <20171030091940.mcljomnaqvrhvwjx@node.shutemov.name>
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Kirill,

On Mon, Oct 30, 2017 at 12:19:40PM +0300, Kirill A. Shutemov wrote:
>On Mon, Oct 30, 2017 at 12:37:01AM +0100, Fengguang Wu wrote:
>> CC MM people.
>>
>> On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
>> > Hi Linus,
>> >
>> > So far we see the following boot errors/warnings when testing v4.14-rc6.
>> >
>> > They slipped into the RC release mainly due to various imperfections in
>> > 0day's auto-bisection, so I list them manually here and CC the likely
>> > easy-to-debug ones to the corresponding maintainers in follow-up emails.
>> >
>> > boot_successes: 4700
>> > boot_failures: 247
>> >
>> > BUG:kernel_hang_in_test_stage: 152
>> > BUG:kernel_reboot-without-warning_in_test_stage: 10
>> > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c: 1
>> > BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c: 3
>> > BUG:sleeping_function_called_from_invalid_context_at_mm/page_alloc.c: 21
>> > BUG:soft_lockup-CPU##stuck_for#s: 1
>> > BUG:unable_to_handle_kernel: 13
>>
>> Here is the call trace:
>>
>> [  956.669197] [  956.670421] stress-ng: fail:  [27945] stress-ng-numa: get_mempolicy: errno=22 (Invalid argument)
>
>Can you also share how you run stress-ng? Is it reproducible?

The command line is:

        stress-ng --class cpu --sequential $(nproc) --timeout 1 --times --verify --metrics-brief
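For scripted re-runs, the same invocation can be assembled programmatically. This is a minimal Python sketch, assuming stress-ng is on PATH; the helper names `available_cpus`, `stress_ng_argv` and `run_once` are hypothetical, and `os.sched_getaffinity` stands in for `$(nproc)`.

```python
import os
import shutil
import subprocess


def available_cpus() -> int:
    """Approximate `nproc`: count the CPUs this process may run on."""
    try:
        return len(os.sched_getaffinity(0))  # Linux-only API
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to all CPUs
        return os.cpu_count() or 1


def stress_ng_argv(nproc: int) -> list:
    """Hypothetical helper: the stress-ng invocation from the report."""
    return [
        "stress-ng",
        "--class", "cpu",
        "--sequential", str(nproc),
        "--timeout", "1",
        "--times",
        "--verify",
        "--metrics-brief",
    ]


def run_once(argv: list) -> int:
    """Run one pass if stress-ng is installed; return its exit status."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} not found on PATH")
    # check=False: a failing stressor is recorded, not raised
    return subprocess.run(argv, check=False).returncode


if __name__ == "__main__":
    # Print the command only; call run_once() explicitly to execute it.
    print(" ".join(stress_ng_argv(available_cpus())))
```

Counting CPUs via the affinity mask rather than `os.cpu_count()` keeps the count closer to what `nproc` reports when the test runs under a restricted CPU set.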

The test box is:

        model: Broadwell-EP
        nr_cpu: 88
        memory: 128G

It shows up in 4 out of 6 test runs:

/result/stress-ng/60s-cpu-performance/lkp-bdw-ep6/debian-x86_64-2016-08-31.cgz/x86_64-rhel-7.2/gcc-6/bb176f67090ca54869fc1262c913aa69d2ede070/matrix.json

  "dmesg.BUG:unable_to_handle_kernel": [
    0,
    1,
    1,
    1,
    0,
    1
  ],
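The 4-of-6 figure can be read directly off the per-run list. This is a small Python sketch, assuming (from the excerpt alone) that matrix.json maps each metric name to a list holding one entry per test run; the helper name `repro_rate` is hypothetical.

```python
import json

# The excerpt quoted above, wrapped into a complete JSON object.
MATRIX_EXCERPT = """
{
  "dmesg.BUG:unable_to_handle_kernel": [0, 1, 1, 1, 0, 1]
}
"""


def repro_rate(matrix: dict, metric: str) -> tuple:
    """Return (runs where the metric is nonzero, total runs)."""
    runs = matrix.get(metric, [])
    hits = sum(1 for value in runs if value)
    return hits, len(runs)


matrix = json.loads(MATRIX_EXCERPT)
hits, total = repro_rate(matrix, "dmesg.BUG:unable_to_handle_kernel")
rate = hits / total if total else 0.0  # guard against a missing metric
print(f"{hits}/{total} runs ({rate:.0%})")  # prints "4/6 runs (67%)"
```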
 
Thanks,
Fengguang