From: Dan Williams
Date: Fri, 7 Jun 2019 08:23:54 -0700
Subject: Re: [PATCH v2 4/8] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
To: Ard Biesheuvel
Cc: linux-efi, kbuild test robot, linux-nvdimm, the arch/x86 maintainers,
 Linux Kernel Mailing List, Mike Rapoport, Linux-MM, Ingo Molnar,
 Borislav Petkov, "H. Peter Anvin", Darren Hart, Thomas Gleixner,
 Andy Shevchenko
References: <155925716254.3775979.16716824941364738117.stgit@dwillia2-desk3.amr.corp.intel.com>
 <155925718351.3775979.13546720620952434175.stgit@dwillia2-desk3.amr.corp.intel.com>

On Fri, Jun 7, 2019 at 5:29 AM Ard Biesheuvel wrote:
>
> On Sat, 1 Jun 2019 at 06:26, Dan Williams wrote:
> >
> > On Fri, May 31, 2019 at 8:30 AM Ard Biesheuvel wrote:
> > >
> > > On Fri, 31 May 2019 at 17:28, Dan Williams wrote:
> > > >
> > > > On Fri, May 31, 2019 at 1:30 AM Ard Biesheuvel wrote:
> > > > >
> > > > > (cc Mike for memblock)
> > > > >
> > > > > On Fri, 31 May 2019 at 01:13, Dan Williams wrote:
> > > > > >
> > > > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > > > interpretation of the EFI Memory Types as "reserved for a special
> > > > > > purpose".
> > > > > >
> > > > > > The proposed Linux behavior for specific purpose memory is that it is
> > > > > > reserved for direct-access (device-dax) by default and not available for
> > > > > > any kernel usage, not even as an OOM fallback. Later, through udev
> > > > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > > > be reconfigured and hot-added to the available System-RAM with a unique
> > > > > > node identifier.
> > > > > >
> > > > > > This patch introduces 3 new concepts at once given the entanglement
> > > > > > between early boot enumeration relative to memory that can optionally be
> > > > > > reserved from the kernel page allocator by default. The new concepts
> > > > > > are:
> > > > > >
> > > > > > - E820_TYPE_SPECIFIC: Upon detecting the EFI_MEMORY_SP attribute on
> > > > > > EFI_CONVENTIONAL memory, update the E820 map with this new type. Only
> > > > > > perform this classification if the CONFIG_EFI_SPECIFIC_DAX=y policy is
> > > > > > enabled, otherwise treat it as typical ram.
> > > > > >
> > > > >
> > > > > OK, so now we have 'special purpose', 'specific' and 'app specific'
> > > > > [below]. Do they all mean the same thing?
> > > >
> > > > I struggled with separating the raw-EFI-type name from the name of the
> > > > Linux specific policy. Since the reservation behavior is optional I
> > > > was thinking there should be a distinct Linux kernel name for that
> > > > policy. I did try to go back and change all occurrences of "special"
> > > > to "specific" from the RFC to this v2, but seems I missed one.
> > > >
> > >
> > > OK
> >
> > I'll go ahead and use "application reserved" terminology consistently
> > throughout the code to distinguish that Linux translation from the raw
> > "EFI specific purpose" attribute.
> >
>
> OK
>
> > > > > >
> > > > > > - IORES_DESC_APPLICATION_RESERVED: Add a new I/O resource descriptor for
> > > > > > a device driver to search iomem resources for application specific
> > > > > > memory. Teach the iomem code to identify such ranges as "Application
> > > > > > Reserved".
> > > > > >
> > > > > > - MEMBLOCK_APP_SPECIFIC: Given the memory ranges can fall back to the
> > > > > > traditional System RAM pool the expectation is that they will have
> > > > > > typical SRAT entries. In order to support a policy of device-dax by
> > > > > > default with the option to hotplug later, the numa initialization code
> > > > > > is taught to avoid marking online MEMBLOCK_APP_SPECIFIC regions.
> > > > > >
> > > > >
> > > > > Can we move the generic memblock changes into a separate patch please?
> > > >
> > > > Yeah, that can move to a lead-in patch.
> > > >
> > > > [..]
> > > > > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > > > > index 91368f5ce114..b57b123cbdf9 100644
> > > > > > --- a/include/linux/efi.h
> > > > > > +++ b/include/linux/efi.h
> > > > > > @@ -129,6 +129,19 @@ typedef struct {
> > > > > >         u64 attribute;
> > > > > >  } efi_memory_desc_t;
> > > > > >
> > > > > > +#ifdef CONFIG_EFI_SPECIFIC_DAX
> > > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > > +{
> > > > > > +       return md->type == EFI_CONVENTIONAL_MEMORY
> > > > > > +               && (md->attribute & EFI_MEMORY_SP);
> > > > > > +}
> > > > > > +#else
> > > > > > +static inline bool is_efi_dax(efi_memory_desc_t *md)
> > > > > > +{
> > > > > > +       return false;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > >  typedef struct {
> > > > > >         efi_guid_t guid;
> > > > > >         u32 headersize;
> > > > >
> > > > > I'd prefer it if we could avoid this DAX policy distinction leaking
> > > > > into the EFI layer.
> > > > >
> > > > > IOW, I am fine with having a 'is_efi_sp_memory()' helper here, but
> > > > > whether that is DAX memory or not should be decided in the DAX layer.
> > > >
> > > > Ok, how about is_efi_sp_ram()? Since EFI_MEMORY_SP might be applied to
> > > > things that aren't EFI_CONVENTIONAL_MEMORY.
> > >
> > > Yes, that is fine. As long as the #ifdef lives in the DAX code and not here.
> >
> > We still need some ifdef in the efi core because that is the central
> > location to make the policy distinction to identify
> > EFI_CONVENTIONAL_MEMORY differently depending on whether EFI_MEMORY_SP
> > is present. I agree with you that "dax" should be dropped from the
> > naming. So how about:
> >
> > #ifdef CONFIG_EFI_APPLICATION_RESERVED
> > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > {
> >         return md->type == EFI_CONVENTIONAL_MEMORY
> >                 && (md->attribute & EFI_MEMORY_SP);
> > }
> > #else
> > static inline bool is_efi_application_reserved(efi_memory_desc_t *md)
> > {
> >         return false;
> > }
> > #endif
>
> I think this policy decision should not live inside the EFI subsystem.
> EFI just gives you the memory map, and mangling that information
> depending on whether you think a certain memory attribute should be
> ignored is the job of the MM subsystem.

The problem is that we don't have an mm subsystem at the time a
decision needs to be made. The reservation policy needs to be deployed
before even memblock has been initialized in order to keep kernel
allocations out of the reservation. I agree with the sentiment; I just
don't see how to practically achieve an optional "System RAM" vs
"Application Reserved" routing decision without an early (before
e820__memblock_setup()) conditional branch.
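
For illustration only, the routing decision discussed above can be
sketched as a small userspace C fragment. This is not kernel code and
not the patch itself: every mock_* name is hypothetical, the
policy_enabled argument stands in for the proposed Kconfig option, and
the two constants are assumed to match the UEFI definitions of
EfiConventionalMemory and EFI_MEMORY_SP. It only shows that the raw
attribute test and the policy choice can be kept as separate steps while
both still run before any allocator has seen the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the UEFI values; treat both as assumptions of this sketch. */
#define MOCK_EFI_CONVENTIONAL_MEMORY 7u
#define MOCK_EFI_MEMORY_SP 0x0000000000040000ULL

/* Hypothetical stand-in for efi_memory_desc_t, trimmed to the fields used here. */
struct mock_efi_md {
	uint32_t type;
	uint64_t attribute;
};

/* Hypothetical routing targets; the real series would use E820 types instead. */
enum mock_route {
	MOCK_ROUTE_OTHER,        /* not conventional memory, handled elsewhere */
	MOCK_ROUTE_SYSTEM_RAM,   /* handed to the page allocator as usual */
	MOCK_ROUTE_APP_RESERVED, /* kept away from kernel allocations */
};

/* Raw attribute test with no policy: conventional memory tagged EFI_MEMORY_SP. */
static bool mock_is_sp_ram(const struct mock_efi_md *md)
{
	return md->type == MOCK_EFI_CONVENTIONAL_MEMORY &&
	       (md->attribute & MOCK_EFI_MEMORY_SP) != 0;
}

/*
 * The early conditional branch. It runs while the firmware memory map is
 * being translated, i.e. before any allocator could hand the range out.
 */
static enum mock_route mock_route_region(const struct mock_efi_md *md,
					 bool policy_enabled)
{
	assert(md != NULL);

	if (md->type != MOCK_EFI_CONVENTIONAL_MEMORY)
		return MOCK_ROUTE_OTHER;
	if (policy_enabled && mock_is_sp_ram(md))
		return MOCK_ROUTE_APP_RESERVED;
	return MOCK_ROUTE_SYSTEM_RAM;
}
```

With the policy disabled, a range tagged EFI_MEMORY_SP takes the same
path as ordinary conventional memory, which matches the "otherwise treat
it as typical ram" behavior described in the patch changelog.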