From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 13 Mar 2020 16:42:46 +0530
From: Srikar Dronamraju
To: Michael Ellerman
Cc: Sachin Sant, Michal Hocko, Benjamin Herrenschmidt, Paul Mackerras,
	Pekka Enberg, Linux-Next Mailing List, David Rientjes,
	Christopher Lameter, linuxppc-dev@lists.ozlabs.org, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka
Subject: Re: [5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9
Reply-To: Srikar Dronamraju
References: <20200227121214.GE3771@dhcp22.suse.cz>
	<52EF4673-7292-4C4C-B459-AF583951BA48@linux.vnet.ibm.com>
	<9a86f865-50b5-7483-9257-dbb08fecd62b@suse.cz>
	<20200227182650.GG3771@dhcp22.suse.cz>
	<20200310150114.GO8447@dhcp22.suse.cz>
	<87a74lix5p.fsf@mpe.ellerman.id.au>
	<875zf8y1i1.fsf@mpe.ellerman.id.au>
In-Reply-To: <875zf8y1i1.fsf@mpe.ellerman.id.au>
Message-Id: <20200313111246.GB25144@linux.vnet.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-next@vger.kernel.org

* Michael Ellerman [2020-03-13 21:48:06]:

> Sachin Sant writes:
> >> The patch below might work. Sachin can you test this? I tried faking up
> >> a system with a memoryless node zero but couldn't get it to even start
> >> booting.
> >>
> > The patch did not help. The kernel crashed during
> > the boot with the same call trace.
> >
> > BUG_ON() introduced with the patch was not triggered.
>
> OK, that's weird.
>
> I eventually managed to get a memoryless node going in sim, and it
> appears to work there.
>
> eg in dmesg:
>
> [ 0.000000][ T0] numa: NODE_DATA [mem 0x2000fffa2f80-0x2000fffa7fff]
> [ 0.000000][ T0] numa: NODE_DATA(0) on node 1
> [ 0.000000][ T0] numa: NODE_DATA [mem 0x2000fff9df00-0x2000fffa2f7f]
> ...
> [ 0.000000][ T0] Early memory node ranges
> [ 0.000000][ T0] node 1: [mem 0x0000000000000000-0x00000000ffffffff]
> [ 0.000000][ T0] node 1: [mem 0x0000200000000000-0x00002000ffffffff]
> [ 0.000000][ T0] Could not find start_pfn for node 0
> [ 0.000000][ T0] Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
> [ 0.000000][ T0] On node 0 totalpages: 0
> [ 0.000000][ T0] Initmem setup node 1 [mem 0x0000000000000000-0x00002000ffffffff]
> [ 0.000000][ T0] On node 1 totalpages: 131072
>
> # dmesg | grep set_numa
> [ 0.000000][ T0] set_numa_mem: mem node for 0 = 1
> [ 0.005654][ T0] set_numa_mem: mem node for 1 = 1
>
> So is the problem more than just node zero having no memory?
>

The problem would happen with possible nodes which are not yet present,
i.e. nodes with no CPUs and no memory attached.

Please look at
http://lore.kernel.org/lkml/20200312131438.GB3277@linux.vnet.ibm.com/t/#u
for more details.

The summary being: pgdat/NODE_DATA for such nodes is not allocated. Hence
node_present_pages(nid) fails when it is called with a nid that is possible
but not yet present.

Currently node_present_pages(nid) and node_to_mem_node() don't seem to be
equipped to handle possible but not present nodes.

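To make the failure mode concrete, here is a minimal user-space sketch, not
kernel code and not the patch under discussion. Everything prefixed with
toy_ is a hypothetical stand-in invented for illustration: toy_node_data[]
models the per-node pgdat pointers, with NULL for a node that is possible
but not present, and the two helpers show one way a lookup could tolerate
such a node. The 131072 page count is taken from the dmesg quoted above;
the number of nodes and the "lowest-numbered node with memory" fallback are
assumptions.

```c
#include <stddef.h>

#define MAX_NUMNODES 4 /* toy value, assumed for this sketch */

/* Toy stand-in for the kernel's pg_data_t, reduced to the one field used here. */
struct toy_pgdat {
	unsigned long present_pages;
};

/* Node 0: present but memoryless. Node 1: present, with memory. */
static struct toy_pgdat node0 = { .present_pages = 0 };
static struct toy_pgdat node1 = { .present_pages = 131072 };

/* Nodes 2 and 3 are possible but not present: their entries stay NULL. */
static struct toy_pgdat *toy_node_data[MAX_NUMNODES] = { &node0, &node1 };

/*
 * Hypothetical guarded lookup: a node without an allocated pgdat is
 * reported as having zero pages instead of being dereferenced.
 */
static unsigned long toy_node_present_pages(int nid)
{
	if (nid < 0 || nid >= MAX_NUMNODES || toy_node_data[nid] == NULL)
		return 0;
	return toy_node_data[nid]->present_pages;
}

/*
 * Hypothetical fallback: return nid if it has memory, otherwise the
 * lowest-numbered node that does, or -1 if no node has memory.
 */
static int toy_node_to_mem_node(int nid)
{
	int n;

	if (toy_node_present_pages(nid) > 0)
		return nid;
	for (n = 0; n < MAX_NUMNODES; n++)
		if (toy_node_present_pages(n) > 0)
			return n;
	return -1;
}
```

An unguarded version that reads toy_node_data[nid]->present_pages directly
would dereference NULL for nodes 2 and 3, which is the shape of the problem
described above. With the guard, memoryless node 0 and the not-present
nodes all resolve to node 1 in this toy setup.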
> cheers

-- 
Thanks and Regards
Srikar Dronamraju