From: David Hildenbrand
Date: Wed, 14 Apr 2021 22:09:13 +0200
Subject: Re: [RFC/RFT PATCH 1/3] memblock: update initialization of reserved pages
To: Mike Rapoport <rppt@kernel.org>
Cc: Anshuman Khandual, Ard Biesheuvel, Catalin Marinas, Marc Zyngier,
 Mark Rutland, Mike Rapoport, Will Deacon, kvmarm@lists.cs.columbia.edu,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org

Mike Rapoport <rppt@kernel.org> wrote on Wed, 14 Apr 2021 at 22:06:
> On Wed, Apr 14, 2021 at 05:12:11PM +0200, David Hildenbrand wrote:
> > On 07.04.21 19:26, Mike Rapoport wrote:
> > > From: Mike Rapoport <rppt@linux.ibm.com>
> > >
> > > The struct pages representing a reserved memory region are initialized
> > > using the reserve_bootmem_region() function. This function is called
> > > for each reserved region just before the memory is freed from memblock
> > > to the buddy page allocator.
> > >
> > > The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> > > values set by the memory map initialization, which makes it necessary
> > > to treat such pages specially in pfn_valid() and pfn_valid_within().
> >
> > I assume these pages are never given to the buddy, because we don't have
> > a direct mapping. So to the kernel, it's essentially just like a memory
> > hole with benefits.
>
> The pages should not be accessed as normal memory, so they do not have a
> direct (or, in ARMish, linear) mapping and are never given to the buddy
> allocator. After looking at the ACPI standard, I don't see a fundamental
> reason for this, but they've already made this mess and we need to cope
> with it.
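(Side note for readers following the thread: the "special treatment" the
commit message refers to is, roughly, that arm64's pfn_valid() asks
memblock whether the pfn is backed by map-able memory, so NOMAP pfns fail
the check. A simplified sketch of the idea, not the literal arm64 code:)

/*
 * Simplified sketch: on arm64 a pfn only counts as valid if memblock
 * says the underlying memory may be mapped; MEMBLOCK_NOMAP fails here.
 */
int pfn_valid(unsigned long pfn)
{
        phys_addr_t addr = pfn << PAGE_SHIFT;

        /* reject pfns that do not fit into phys_addr_t */
        if ((addr >> PAGE_SHIFT) != pfn)
                return 0;

        return memblock_is_map_memory(addr);
}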

> > I can spot that we want to export such memory like any special memory
> > thingy/hole in /proc/iomem -- "reserved", which makes sense.

> It does, but let's wait with /proc/iomem changes. We don't really have a
> 100% consistent view of it on different architectures, so adding yet
> another type there does not seem, well, urgent.

To clarify: this is already done on arm64.

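(For readers: the general shape of such an export is to walk the reserved
ranges and insert "reserved" resources into the iomem tree. This is a
hypothetical, simplified sketch of the pattern -- the function name
export_reserved_regions() is made up, and this is not the literal arm64
code:)

static int __init export_reserved_regions(void)
{
        phys_addr_t start, end;
        u64 i;

        for_each_reserved_mem_range(i, &start, &end) {
                struct resource *res = kzalloc(sizeof(*res), GFP_KERNEL);

                if (!res)
                        return -ENOMEM;

                res->name  = "reserved";
                res->flags = IORESOURCE_MEM;
                res->start = start;
                res->end   = end - 1;   /* resource ranges are inclusive */
                insert_resource(&iomem_resource, res);
        }
        return 0;
}
late_initcall(export_reserved_regions);

insert_resource() places the new entries under any enclosing "System RAM"
ranges, which is what makes them show up indented in /proc/iomem.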

> > I would assume that MEMBLOCK_NOMAP is a special type of *reserved* memory.
> > IOW, that for_each_reserved_mem_range() should already succeed on these as
> > well -- we should mark anything that is MEMBLOCK_NOMAP implicitly as
> > reserved. Or are there valid reasons not to do so? What can anyone do with
> > that memory?
> >
> > I assume they are pretty much useless for the kernel, right? Like other
> > reserved memory ranges.
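(To spell out the distinction under discussion: explicitly reserved ranges
are tracked in memblock.reserved and are what for_each_reserved_mem_range()
walks, while NOMAP is a flag on regions of memblock.memory, so the reserved
iterator does not visit NOMAP regions today. A small sketch using the same
iterators as the patch below:)

        phys_addr_t start, end;
        struct memblock_region *r;
        u64 i;

        /* walks memblock.reserved only */
        for_each_reserved_mem_range(i, &start, &end)
                pr_info("reserved: [%pa..%pa]\n", &start, &end);

        /* NOMAP regions have to be fished out of memblock.memory by flag */
        for_each_mem_region(r) {
                phys_addr_t r_end = r->base + r->size;

                if (memblock_is_nomap(r))
                        pr_info("nomap:    [%pa..%pa]\n", &r->base, &r_end);
        }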

> I agree that there is a lot of commonality between NOMAP and reserved. The
> problem is that even the semantics of "reserved" differ between
> architectures. Moreover, on the same architecture there can be
> E820_TYPE_RESERVED and memblock.reserved with different properties.
>
> I'd really prefer moving in baby steps here, because any change in the
> boot mm can bear several months of early-hang debugging ;-)

Yeah, I know. We should just have the desired target state figured out :)

> > > Split out the initialization of the reserved pages into a function with
> > > a meaningful name, treat the MEMBLOCK_NOMAP regions the same way as the
> > > reserved regions, and mark the struct pages for the NOMAP regions as
> > > PageReserved.
> > >
> > > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > > ---
> > >  mm/memblock.c | 23 +++++++++++++++++++++--
> > >  1 file changed, 21 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/mm/memblock.c b/mm/memblock.c
> > > index afaefa8fc6ab..6b7ea9d86310 100644
> > > --- a/mm/memblock.c
> > > +++ b/mm/memblock.c
> > > @@ -2002,6 +2002,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> > >  	return end_pfn - start_pfn;
> > >  }
> > >
> > > +static void __init memmap_init_reserved_pages(void)
> > > +{
> > > +	struct memblock_region *region;
> > > +	phys_addr_t start, end;
> > > +	u64 i;
> > > +
> > > +	/* initialize struct pages for the reserved regions */
> > > +	for_each_reserved_mem_range(i, &start, &end)
> > > +		reserve_bootmem_region(start, end);
> > > +
> > > +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> > > +	for_each_mem_region(region) {
> > > +		if (memblock_is_nomap(region)) {
> > > +			start = region->base;
> > > +			end = start + region->size;
> > > +			reserve_bootmem_region(start, end);
> > > +		}
> > > +	}
> > > +}
> > > +
> > >  static unsigned long __init free_low_memory_core_early(void)
> > >  {
> > >  	unsigned long count = 0;
> > > @@ -2010,8 +2030,7 @@ static unsigned long __init free_low_memory_core_early(void)
> > >
> > >  	memblock_clear_hotplug(0, -1);
> > >
> > > -	for_each_reserved_mem_range(i, &start, &end)
> > > -		reserve_bootmem_region(start, end);
> > > +	memmap_init_reserved_pages();
> > >
> > >  	/*
> > >  	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id

> --
> Sincerely yours,
> Mike.
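(For completeness: reserve_bootmem_region(), which both loops in the patch
feed, walks the pfns of the range and marks each struct page PageReserved.
Roughly like this -- simplified, with the deferred struct-page
initialization details elided:)

void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end)
{
        unsigned long start_pfn = PFN_DOWN(start);
        unsigned long end_pfn = PFN_UP(end);

        for (; start_pfn < end_pfn; start_pfn++) {
                if (pfn_valid(start_pfn)) {
                        struct page *page = pfn_to_page(start_pfn);

                        init_reserved_page(start_pfn);

                        /* avoid a false-positive PageTail() */
                        INIT_LIST_HEAD(&page->lru);

                        /* not atomic: the page is not visible yet */
                        __SetPageReserved(page);
                }
        }
}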

--
Thanks,

David / dhildenb