From: David Hildenbrand
Organization: Red Hat GmbH
To: Mike Rapoport
Cc: Andrew Morton, Andrea Arcangeli, Baoquan He, Borislav Petkov, Chris Wilson, "H. Peter Anvin", Ingo Molnar, Linus Torvalds, Łukasz Majczak, Mel Gorman, Michal Hocko, Mike Rapoport, Qian Cai, "Sarvela, Tomi P", Thomas Gleixner, Vlastimil Babka, linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH v5 1/1] mm: refactor initialization of struct page for holes in memory layout
Date: Mon, 15 Feb 2021 09:45:30 +0100
In-Reply-To: <20210214172906.GN242749@kernel.org>
References: <20210208110820.6269-1-rppt@kernel.org> <5dccbc93-f260-7f14-23bc-6dee2dff6c13@redhat.com> <20210214172906.GN242749@kernel.org>

On 14.02.21 18:29, Mike Rapoport wrote:
> On Fri, Feb 12, 2021 at 10:56:19AM +0100, David Hildenbrand wrote:
>> On 12.02.21 10:55, David Hildenbrand wrote:
>>> On 08.02.21 12:08, Mike Rapoport wrote:
>>>> +#ifdef CONFIG_SPARSEMEM
>>>> +	/*
>>>> +	 * Sections in the memory map may not match actual populated
>>>> +	 * memory, extend the node span to cover the entire section.
>>>> +	 */
>>>> +	*start_pfn = round_down(*start_pfn, PAGES_PER_SECTION);
>>>> +	*end_pfn = round_up(*end_pfn, PAGES_PER_SECTION);
>>>
>>> Does that mean that we might create overlapping zones when one node
>>
>> s/overlapping zones/overlapping nodes/
>>
>>> starts in the middle of a section and the other one ends in the middle
>>> of a section?
>>
>>> Could it be a problem? (e.g., would we have to look at neighboring nodes
>>> when making the decision to extend, and how far to extend?)
>
> Having a node end/start in a middle of a section would be a problem, but in
> this case I don't see a way to detect how a node should be extended :(

Running QEMU with something like:

...
     -m 8G \
     -smp sockets=2,cores=2 \
     -object memory-backend-ram,id=bmem0,size=4160M \
     -object memory-backend-ram,id=bmem1,size=4032M \
     -numa node,nodeid=0,cpus=0-1,memdev=bmem0 -numa node,nodeid=1,cpus=2-3,memdev=bmem1 \
...

creates such a setup.

With an older kernel:

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
[...]
[ 0.002506] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.002508] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.002509] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x143ffffff]
[ 0.002510] ACPI: SRAT: Node 1 PXM 1 [mem 0x144000000-0x23fffffff]
[ 0.002511] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
[ 0.002513] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x143ffffff] -> [mem 0x00000000-0x143ffffff]
[ 0.002519] NODE_DATA(0) allocated [mem 0x143fd5000-0x143ffffff]
[ 0.002669] NODE_DATA(1) allocated [mem 0x23ffd2000-0x23fffcfff]
[ 0.017947] memblock: reserved range [0x0000000000000000-0x0000000000001000] is not in memory
[ 0.017953] memblock: reserved range [0x000000000009f000-0x0000000000100000] is not in memory
[ 0.017956] Zone ranges:
[ 0.017957] DMA [mem 0x0000000000000000-0x0000000000ffffff]
[ 0.017958] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.017960] Normal [mem 0x0000000100000000-0x000000023fffffff]
[ 0.017961] Device empty
[ 0.017962] Movable zone start for each node
[ 0.017964] Early memory node ranges
[ 0.017965] node 0: [mem 0x0000000000000000-0x00000000bffdffff]
[ 0.017966] node 0: [mem 0x0000000100000000-0x0000000143ffffff]
[ 0.017967] node 1: [mem 0x0000000144000000-0x000000023fffffff]
[ 0.017969] Initmem setup node 0 [mem 0x0000000000000000-0x0000000143ffffff]
[ 0.017971] On node 0 totalpages: 1064928
[ 0.017972] DMA zone: 64 pages used for memmap
[ 0.017973] DMA zone: 21 pages reserved
[ 0.017974] DMA zone: 4096 pages, LIFO batch:0
[ 0.017994] DMA32 zone: 12224 pages used for memmap
[ 0.017995] DMA32 zone: 782304 pages, LIFO batch:63
[ 0.022281] DMA32: Zeroed struct page in unavailable ranges: 32
[ 0.022286] Normal zone: 4352 pages used for memmap
[ 0.022287] Normal zone: 278528 pages, LIFO batch:63
[ 0.023769] Initmem setup node 1 [mem 0x0000000144000000-0x000000023fffffff]
[ 0.023774] On node 1 totalpages: 1032192
[ 0.023775] Normal zone: 16128 pages used for memmap
[ 0.023775] Normal zone: 1032192 pages, LIFO batch:63

With current next/master:

[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000023fffffff] usable
[...]
[ 0.002419] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.002421] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.002422] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x143ffffff]
[ 0.002423] ACPI: SRAT: Node 1 PXM 1 [mem 0x144000000-0x23fffffff]
[ 0.002424] NUMA: Node 0 [mem 0x00000000-0x0009ffff] + [mem 0x00100000-0xbfffffff] -> [mem 0x00000000-0xbfffffff]
[ 0.002426] NUMA: Node 0 [mem 0x00000000-0xbfffffff] + [mem 0x100000000-0x143ffffff] -> [mem 0x00000000-0x143ffffff]
[ 0.002432] NODE_DATA(0) allocated [mem 0x143fd5000-0x143ffffff]
[ 0.002583] NODE_DATA(1) allocated [mem 0x23ffd2000-0x23fffcfff]
[ 0.017722] Zone ranges:
[ 0.017726] DMA [mem 0x0000000000000000-0x0000000000ffffff]
[ 0.017728] DMA32 [mem 0x0000000001000000-0x00000000ffffffff]
[ 0.017729] Normal [mem 0x0000000100000000-0x000000023fffffff]
[ 0.017731] Device empty
[ 0.017732] Movable zone start for each node
[ 0.017734] Early memory node ranges
[ 0.017735] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.017736] node 0: [mem 0x0000000000100000-0x00000000bffdffff]
[ 0.017737] node 0: [mem 0x0000000100000000-0x0000000143ffffff]
[ 0.017738] node 1: [mem 0x0000000144000000-0x000000023fffffff]
[ 0.017741] Initmem setup node 0 [mem 0x0000000000000000-0x0000000147ffffff]
[ 0.017742] On node 0 totalpages: 1064830
[ 0.017743] DMA zone: 64 pages used for memmap
[ 0.017744] DMA zone: 21 pages reserved
[ 0.017745] DMA zone: 3998 pages, LIFO batch:0
[ 0.017765] DMA zone: 98 pages in unavailable ranges
[ 0.017766] DMA32 zone: 12224 pages used for memmap
[ 0.017766] DMA32 zone: 782304 pages, LIFO batch:63
[ 0.022042] DMA32 zone: 32 pages in unavailable ranges
[ 0.022046] Normal zone: 4608 pages used for memmap
[ 0.022047] Normal zone: 278528 pages, LIFO batch:63
[ 0.023601] Normal zone: 16384 pages in unavailable ranges
[ 0.023606] Initmem setup node 1 [mem 0x0000000140000000-0x000000023fffffff]
[ 0.023608] On node 1 totalpages: 1032192
[ 0.023609] Normal zone: 16384 pages used for memmap
[ 0.023609] Normal zone: 1032192 pages, LIFO batch:63
[ 0.029267] Normal zone: 16384 pages in unavailable ranges

In this setup, one node ends in the middle of a section (+64MB), the
other one starts in the middle of the same section (+64MB).

After your patch, the nodes overlap (in one section).

I can spot that each node still has the same number of present pages and
that each node now has exactly 64MB unavailable pages (the extra ones spanned).

So at least here, it looks like the machinery is still doing the right thing?

-- 
Thanks,

David / dhildenb
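
To make the rounding concrete, here is a small userspace sketch (not kernel code) of what the quoted hunk does to the two node spans from the log. It assumes the x86-64 defaults of 4 KiB pages and 128 MiB sections (PAGE_SHIFT 12, SECTION_SIZE_BITS 27). struct pfn_range, pfn_round_down(), pfn_round_up(), extend_to_sections() and overlap_pages() are names made up for illustration; they are not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values: 4 KiB pages, 128 MiB sparsemem sections. */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 27
#define PAGES_PER_SECTION (UINT64_C(1) << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* Hypothetical helper type: a node span in page frame numbers. */
struct pfn_range {
	uint64_t start_pfn;	/* inclusive */
	uint64_t end_pfn;	/* exclusive */
};

/* Round a PFN down to a power-of-two alignment. */
uint64_t pfn_round_down(uint64_t pfn, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return pfn & ~(align - 1);
}

/* Round a PFN up to a power-of-two alignment. */
uint64_t pfn_round_up(uint64_t pfn, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (pfn + align - 1) & ~(align - 1);
}

/* Mirror of the quoted hunk: widen a span to whole sections. */
struct pfn_range extend_to_sections(struct pfn_range r)
{
	r.start_pfn = pfn_round_down(r.start_pfn, PAGES_PER_SECTION);
	r.end_pfn = pfn_round_up(r.end_pfn, PAGES_PER_SECTION);
	return r;
}

/* Number of PFNs covered by both spans, 0 if they are disjoint. */
uint64_t overlap_pages(struct pfn_range a, struct pfn_range b)
{
	uint64_t lo = a.start_pfn > b.start_pfn ? a.start_pfn : b.start_pfn;
	uint64_t hi = a.end_pfn < b.end_pfn ? a.end_pfn : b.end_pfn;

	return hi > lo ? hi - lo : 0;
}
```

Feeding in the early node ranges from the log (node 0 ending at PFN 0x144000, node 1 starting there) gives an extended node 0 end of 0x148000 and an extended node 1 start of 0x140000, i.e. the 0x0000000147ffffff and 0x0000000140000000 boundaries printed by next/master. The two spans then share 0x8000 pages (one 128 MiB section), and each node picks up 0x4000 = 16384 extra pages (64 MiB), which matches the "pages in unavailable ranges" lines.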