Date: Wed, 25 Nov 2020 23:04:14 +0200
From: Mike Rapoport
To: David Hildenbrand
Cc: Andrea Arcangeli, Vlastimil Babka, Mel Gorman, Andrew Morton, linux-mm@kvack.org, Qian Cai, Michal Hocko, linux-kernel@vger.kernel.org, Baoquan He
Subject: Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
Message-ID: <20201125210414.GO123287@linux.ibm.com>
References: <35F8AADA-6CAA-4BD6-A4CF-6F29B3F402A4@redhat.com>

On Wed, Nov 25, 2020 at 08:27:21PM +0100, David Hildenbrand wrote:
> On 25.11.20 19:28, Andrea Arcangeli wrote:
> > On Wed, Nov 25, 2020 at 07:45:30AM +0100, David Hildenbrand wrote:
> >
> > What would need to call pfn_zone in between first and second stage?
> >
> > If something calls pfn_zone in between first and second stage isn't it
> > a feature if it crashes the kernel at boot?
> >
> > Note: I suggested 0xff kernel crashing "until the second stage comes
> > around" during meminit at boot, not permanently.
>
> Yes, then it makes sense - if we're able to come up with a way to
> initialize any memmap we might have - including actual memory holes that
> have a memmap.
>
> >
> > /*
> >  * Use a fake node/zone (0) for now. Some of these pages
> >  * (in memblock.reserved but not in memblock.memory) will
> >  * get re-initialized via reserve_bootmem_region() later.
> >  */
> >
> > Specifically I relied on the comment "get re-initialized via
> > reserve_bootmem_region() later".
>
> Yes, but there is a "Some of these" :)
>
> Boot a VM with "-M 4000" and observe the memmap in the last section -
> they won't get initialized a second time.
>
> >
> > I assumed the second stage overwrites the 0,0 to the real zoneid/nid
> > value, which is clearly not happening, hence it'd be preferable to get
> > a crash at boot reliably.
> >
> > Now I have CONFIG_DEFERRED_STRUCT_PAGE_INIT=n so the second stage
> > calling init_reserved_page(start_pfn) won't do much with
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT=n but I already tried to enable
> > CONFIG_DEFERRED_STRUCT_PAGE_INIT=y yesterday and it didn't help, the
> > page->flags were still wrong for reserved pages in the "Unknown E820
> > type" region.

I think the very root cause is how e820__memblock_setup() registers
memory with memblock:

	if (entry->type == E820_TYPE_SOFT_RESERVED)
		memblock_reserve(entry->addr, entry->size);

	if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
		continue;

	memblock_add(entry->addr, entry->size);

From that point on, the system has an inconsistent view of RAM in
memblock.memory and memblock.reserved, which is then translated to the
memmap etc.

Unfortunately, simply adding all RAM to memblock is not possible, as there
are systems on which "the addresses listed in the reserved range must never
be accessed, or (as we discovered) even be reachable by an active page table
entry" [1].

[1] https://lore.kernel.org/lkml/20200528151510.GA6154@raspberrypi/

-- 
Sincerely yours,
Mike.
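
To make the inconsistency concrete, here is a minimal user-space sketch that replays the control flow of the excerpt above on a toy table. It is not kernel code: every toy_* name, the enum values and the fixed-size region lists are hypothetical stand-ins, and only the order of the three checks is taken from the quoted loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's e820 types; values are arbitrary. */
enum toy_e820_type {
	TOY_E820_RAM,
	TOY_E820_RESERVED,
	TOY_E820_RESERVED_KERN,
	TOY_E820_SOFT_RESERVED,
};

struct toy_e820_entry {
	uint64_t addr;
	uint64_t size;
	enum toy_e820_type type;
};

#define TOY_MAX_REGIONS 16

struct toy_region {
	uint64_t base;
	uint64_t size;
};

struct toy_region_list {
	size_t cnt;
	struct toy_region regions[TOY_MAX_REGIONS];
};

/* Toy model of memblock: one list for "memory", one for "reserved". */
struct toy_memblock {
	struct toy_region_list memory;
	struct toy_region_list reserved;
};

/* Append a region; silently drops it if the toy list is full. */
void toy_add(struct toy_region_list *l, uint64_t base, uint64_t size)
{
	if (l->cnt < TOY_MAX_REGIONS) {
		l->regions[l->cnt].base = base;
		l->regions[l->cnt].size = size;
		l->cnt++;
	}
}

/* Return true if addr falls inside any region of the list. */
bool toy_contains(const struct toy_region_list *l, uint64_t addr)
{
	for (size_t i = 0; i < l->cnt; i++)
		if (addr >= l->regions[i].base &&
		    addr - l->regions[i].base < l->regions[i].size)
			return true;
	return false;
}

/* Same three checks, in the same order, as the quoted excerpt. */
void toy_memblock_setup(const struct toy_e820_entry *tbl, size_t n,
			struct toy_memblock *mb)
{
	for (size_t i = 0; i < n; i++) {
		const struct toy_e820_entry *entry = &tbl[i];

		if (entry->type == TOY_E820_SOFT_RESERVED)
			toy_add(&mb->reserved, entry->addr, entry->size);

		if (entry->type != TOY_E820_RAM &&
		    entry->type != TOY_E820_RESERVED_KERN)
			continue;

		toy_add(&mb->memory, entry->addr, entry->size);
	}
}
```

Feeding toy_memblock_setup() a table with one RAM, one reserved and one soft-reserved entry leaves the reserved range in neither list and the soft-reserved range only in the reserved list, which is the kind of mismatch between the two lists described above.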