Subject: Re: [PATCH v6 1/1] mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
To: Mike Rapoport
Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, Borislav Petkov,
    Chris Wilson, "H. Peter Anvin", Ingo Molnar, Linus Torvalds,
    Łukasz Majczak, Mel Gorman, Michal Hocko, Mike Rapoport, Qian Cai,
    "Sarvela, Tomi P", Thomas Gleixner, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    stable@vger.kernel.org, x86@kernel.org
References: <20210222105728.28636-1-rppt@kernel.org>
 <20210223094802.GI1447004@kernel.org>
From: David Hildenbrand
Organization: Red Hat GmbH
Date: Tue, 23 Feb 2021 10:49:44 +0100
In-Reply-To: <20210223094802.GI1447004@kernel.org>

On 23.02.21 10:48, Mike Rapoport wrote:
> On Tue, Feb 23, 2021 at 09:04:19AM +0100, David Hildenbrand wrote:
>> On 22.02.21 11:57, Mike Rapoport wrote:
>>> From: Mike Rapoport
>>>
>>> There could be struct pages that are not backed by actual physical memory.
>>> This can happen when the actual memory bank is not a multiple of
>>> SECTION_SIZE or when an architecture does not register memory holes
>>> reserved by the firmware as memblock.memory.
>>>
>>> Such pages are currently initialized using the init_unavailable_mem()
>>> function that iterates through PFNs in holes in memblock.memory and, if
>>> there is a struct page corresponding to a PFN, sets the fields of this
>>> page to default values and marks it as Reserved.
>>>
>>> init_unavailable_mem() does not take into account the zone and node the
>>> page belongs to and sets both zone and node links in struct page to zero.
>>>
>>> Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
>>> rather that check each PFN") the holes inside a zone were re-initialized
>>> during memmap_init() and got their zone/node links right. However, after
>>> that commit nothing updates the struct pages representing such holes.
>>>
>>> On a system that has firmware-reserved holes in a zone above ZONE_DMA, for
>>> instance in the configuration below:
>>>
>>> 	# grep -A1 E820 /proc/iomem
>>> 	7a17b000-7a216fff : Unknown E820 type
>>> 	7a217000-7bffffff : System RAM
>>>
>>> the unset zone link in struct page will trigger
>>>
>>> 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
>>>
>>> because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
>>> in struct page) in the same pageblock.
>>>
>>> Interleave initialization of the unavailable pages with the normal
>>> initialization of the memory map, so that zone and node information will
>>> be properly set on struct pages that are not backed by actual memory.
>>>
>>> With this change the pages for holes inside a zone will get proper
>>> zone/node links and the pages that are not spanned by any node will get
>>> links to the adjacent zone/node.
>>
>> Does this include pages in the last section as handled by ...
>> ...
>>> -	/*
>>> -	 * Early sections always have a fully populated memmap for the whole
>>> -	 * section - see pfn_valid(). If the last section has holes at the
>>> -	 * end and that section is marked "online", the memmap will be
>>> -	 * considered initialized. Make sure that memmap has a well defined
>>> -	 * state.
>>> -	 */
>>> -	pgcnt += init_unavailable_range(PFN_DOWN(next),
>>> -					round_up(max_pfn, PAGES_PER_SECTION));
>>> -
>>
>> ^ this code?
>>
>> Or how is that case handled now?
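To make the failure mode described in the quoted patch concrete, here is a toy
userspace model (not the kernel code; the toy_zone/toy_page types, the zone
layout and the hole range below are made up purely for illustration): pages
inside a firmware hole keep the default zone link 0 (ZONE_DMA) while their
neighbours in the same pageblock link to ZONE_DMA32, so a zone_spans_pfn()-style
check fails for them, which is the analogue of the VM_BUG_ON_PAGE() above.

	#include <stdio.h>

	enum zone_id { ZONE_DMA, ZONE_DMA32, NR_ZONES };

	struct toy_zone { unsigned long start_pfn, end_pfn; };
	struct toy_page { enum zone_id zone; int reserved; };

	static const struct toy_zone zones[NR_ZONES] = {
		[ZONE_DMA]   = { .start_pfn = 0x0,    .end_pfn = 0x1000 },
		[ZONE_DMA32] = { .start_pfn = 0x1000, .end_pfn = 0x3000 },
	};

	static struct toy_page memmap[0x3000];

	static int zone_spans_pfn(enum zone_id z, unsigned long pfn)
	{
		return pfn >= zones[z].start_pfn && pfn < zones[z].end_pfn;
	}

	int main(void)
	{
		unsigned long pfn;

		/* Pages backed by memory get the correct zone link. */
		for (pfn = 0; pfn < 0x3000; pfn++)
			memmap[pfn].zone = pfn < 0x1000 ? ZONE_DMA : ZONE_DMA32;

		/*
		 * A firmware hole at 0x2000-0x20ff inside ZONE_DMA32: the old
		 * init_unavailable_mem()-style pass resets the zone link to the
		 * default value 0 (ZONE_DMA) and marks the pages Reserved,
		 * ignoring which zone the hole actually sits in.
		 */
		for (pfn = 0x2000; pfn < 0x2100; pfn++) {
			memmap[pfn].zone = 0;	/* default: ZONE_DMA */
			memmap[pfn].reserved = 1;
		}

		/* The analogue of the VM_BUG_ON_PAGE() check quoted above. */
		for (pfn = 0x1000; pfn < 0x3000; pfn++)
			if (!zone_spans_pfn(memmap[pfn].zone, pfn))
				printf("bad zone link at pfn %#lx\n", pfn);
		return 0;
	}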
> Hmm, now it's clamped to node_end_pfn/zone_end_pfn, so in your funny example
> with
> 
> 	-object memory-backend-ram,id=bmem0,size=4160M \
> 	-object memory-backend-ram,id=bmem1,size=4032M \
> 
> this is not handled :(
> 
> But it will be handled with this on top:
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 29bbd08b8e63..6c9b490f5a8b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6350,9 +6350,12 @@ void __meminit __weak memmap_init_zone(struct zone *zone)
>  		hole_pfn = end_pfn;
>  	}
>  
> -	if (hole_pfn < zone_end_pfn)
> -		pgcnt += init_unavailable_range(hole_pfn, zone_end_pfn,
> +#ifdef CONFIG_SPARSEMEM
> +	end_pfn = round_up(zone_end_pfn, PAGES_PER_SECTION);
> +	if (hole_pfn < end_pfn)
> +		pgcnt += init_unavailable_range(hole_pfn, end_pfn,
>  						zone_id, nid);
> +#endif
>  
>  	if (pgcnt)
>  		pr_info("  %s zone: %lld pages in unavailable ranges\n",

Also, just wondering, will PFN 0 still get initialized?

-- 
Thanks,

David / dhildenb
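As a side note on the clamp in the quoted diff: with SPARSEMEM the memmap
always covers whole sections, so the hole-initialization loop is extended from
zone_end_pfn up to the next section boundary. A minimal sketch of that
arithmetic follows (the PAGES_PER_SECTION value is the usual x86-64 one and the
example zone_end_pfn is invented; both are assumptions for illustration only):

	#include <stdio.h>

	/* 128 MiB sections with 4 KiB pages, i.e. 32768 pages per section --
	 * an assumption matching the usual x86-64 SPARSEMEM configuration. */
	#define PAGES_PER_SECTION (1UL << 15)

	/* Same result as the kernel's round_up() for power-of-two alignment. */
	#define round_up(x, y) ((((x) - 1) | ((y) - 1)) + 1)

	int main(void)
	{
		/* A zone that ends in the middle of a section. */
		unsigned long zone_end_pfn = 0x209000;
		unsigned long memmap_end = round_up(zone_end_pfn, PAGES_PER_SECTION);

		/* Struct pages for the tail of the last section, beyond
		 * zone_end_pfn, would also be initialized as unavailable. */
		printf("init unavailable range: %#lx - %#lx\n",
		       zone_end_pfn, memmap_end);
		return 0;
	}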