Linux-EFI Archive on lore.kernel.org
* [PATCH v5 00/10] EFI Specific Purpose Memory Support
@ 2019-08-30  1:52 Dan Williams
  2019-08-30  1:52 ` [PATCH v5 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory Dan Williams
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Dave Jiang, Jonathan Cameron, Keith Busch, kbuild test robot,
	Andy Shevchenko, Borislav Petkov, Vishal Verma, H. Peter Anvin,
	x86, Dave Hansen, Ingo Molnar, Len Brown, Peter Zijlstra,
	Rafael J. Wysocki, Ard Biesheuvel, Andy Lutomirski, Darren Hart,
	linux-kernel, linux-efi, x86

Changes since v4 [1]:
- Rename the facility from "Application Reserved" to "Soft Reserved" to
  better reflect how the memory is treated. While the spec talks about
  "specific / application purpose" memory the expected kernel behavior is
  to make a best effort at reserving the memory from general purpose
  allocations.

- Add a new efi=nosoftreserve option to disable consideration of the
  EFI_MEMORY_SP attribute at boot time. This is also motivated by
  Christoph's initial feedback of allowing the kernel to opt-out of the
  policy whims of the platform BIOS implementation.

- Update the KASLR implementation to exclude soft-reserved memory
  including the case where soft-reserved memory is specified via the
  efi_fake_mem= attribute-override command-line option.

- Move the memregion allocator to its own object file. v4 had it in
  kernel/resource.c which caused compile errors on Sparc. I otherwise
  could not find an appropriate place to stash it.

- Rebase on a merge of tip/master and rafael/linux-next since the series
  collides with changes in both those trees.

[1]: https://lore.kernel.org/r/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/

---

Thomas, Rafael,

This happens to collide with both your trees. I think the content
warrants going through the x86 tree, but would need to publish commit:

5c7ed4385424 HMAT: Skip publishing target info for nodes with no online memory

...in Rafael's tree as a stable id for -tip to pull in, but I'm also
open to other options. I've retained Dave's reviewed-by from v4.

---

The EFI 2.8 Specification [2] introduces the EFI_MEMORY_SP ("specific
purpose") memory attribute. This attribute bit replaces the deprecated
ACPI HMAT "reservation hint" that was introduced in ACPI 6.2 and removed
in ACPI 6.3.

Given the increasing diversity of memory types that might be advertised
to the operating system, there is a need for platform firmware to hint
which memory ranges are free for the OS to use as general purpose memory
and which ranges are intended for application specific usage. For
example, an application with prior knowledge of the platform may expect
to be able to exclusively allocate a precious / limited pool of high
bandwidth memory. Alternatively, for the general purpose case, the
operating system may want to make the memory available on a best effort
basis as a unique numa-node with performance properties conveyed by
the new CONFIG_HMEM_REPORTING [3] facility.

In support of optionally allowing either application-exclusive or
core-kernel-mm managed access to differentiated memory, claim
EFI_MEMORY_SP ranges for exposure as "soft reserved" and assigned to a
device-dax instance by default. Such instances can be directly owned /
mapped by a platform-topology-aware application. Alternatively, with the
new kmem facility [4], the administrator has the option to instead
designate that those memory ranges be hot-added to the core-kernel-mm as
a unique memory numa-node. In short, allow for the decision about what
software agent manages soft-reserved memory to be made at runtime.

The patches build on the new HMAT+HMEM_REPORTING facilities merged
for v5.2-rc1. The implementation is tested with qemu emulation of HMAT
[5] plus the efi_fake_mem facility for applying the EFI_MEMORY_SP
attribute. Specific details on reproducing the test configuration are in
patch 10.
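
As a rough illustration of the efi_fake_mem= syntax used in the test
setup, the userspace sketch below parses one nn[KMG]@ss[KMG]:aa entry
and computes the range that receives the attribute. It is not the
kernel's parser; struct fake_range, parse_size() and parse_fake_mem()
are hypothetical names made up for this example, and only the
EFI_MEMORY_SP value is taken from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_MEMORY_SP 0x40000ULL /* value from patch 02 of this series */

/* Hypothetical container for one parsed efi_fake_mem= entry. */
struct fake_range {
	uint64_t start;     /* first byte of the range */
	uint64_t end;       /* last byte of the range (inclusive) */
	uint64_t attribute; /* EFI attribute bits to add */
};

/* Parse a number with an optional K/M/G suffix and advance *s past it. */
static uint64_t parse_size(const char **s)
{
	char *end;
	uint64_t val = strtoull(*s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10; /* fall through */
	case 'M': case 'm':
		val <<= 10; /* fall through */
	case 'K': case 'k':
		val <<= 10;
		end++;
		break;
	default:
		break;
	}
	*s = end;
	return val;
}

/* Parse "nn[KMG]@ss[KMG]:aa"; return 0 on success, -1 if malformed. */
int parse_fake_mem(const char *s, struct fake_range *r)
{
	uint64_t size, start;
	char *end;

	assert(s && r);

	size = parse_size(&s);
	if (*s++ != '@' || size == 0)
		return -1;
	start = parse_size(&s);
	if (*s++ != ':')
		return -1;
	r->attribute = strtoull(s, &end, 0);
	if (end == s)
		return -1;
	r->start = start;
	r->end = start + size - 1;
	return 0;
}
```

Feeding it "8G@9G:0x40000" should yield the range
0x240000000-0x43fffffff with the EFI_MEMORY_SP bit, matching the
example added to kernel-parameters.txt in patch 04.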

[2]: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
[3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1cf33aafb84
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308f
[5]: http://patchwork.ozlabs.org/cover/1096737/

---

Dan Williams (10):
      acpi/numa: Establish a new drivers/acpi/numa/ directory
      efi: Enumerate EFI_MEMORY_SP
      x86, efi: Push EFI_MEMMAP check into leaf routines
      x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
      x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
      lib: Uplevel the pmem "region" ida to a global allocator
      dax: Fix alloc_dax_region() compile warning
      device-dax: Add a driver for "hmem" devices
      acpi/numa/hmat: Register HMAT at device_initcall level
      acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device


 Documentation/admin-guide/kernel-parameters.txt |   19 +++
 arch/x86/Kconfig                                |   21 ++++
 arch/x86/boot/compressed/eboot.c                |    7 +
 arch/x86/boot/compressed/kaslr.c                |   50 +++++++-
 arch/x86/include/asm/e820/types.h               |    8 +
 arch/x86/include/asm/efi-stub.h                 |   11 ++
 arch/x86/include/asm/efi.h                      |   17 +++
 arch/x86/kernel/e820.c                          |   12 ++
 arch/x86/kernel/setup.c                         |   19 ++-
 arch/x86/platform/efi/efi.c                     |   56 +++++++++
 arch/x86/platform/efi/quirks.c                  |    3 +
 drivers/acpi/Kconfig                            |    9 --
 drivers/acpi/Makefile                           |    3 -
 drivers/acpi/hmat/Makefile                      |    2 
 drivers/acpi/numa/Kconfig                       |    8 +
 drivers/acpi/numa/Makefile                      |    3 +
 drivers/acpi/numa/hmat.c                        |  138 +++++++++++++++++++++--
 drivers/acpi/numa/srat.c                        |    0 
 drivers/dax/Kconfig                             |   27 ++++-
 drivers/dax/Makefile                            |    2 
 drivers/dax/bus.c                               |    2 
 drivers/dax/bus.h                               |    2 
 drivers/dax/dax-private.h                       |    2 
 drivers/dax/hmem.c                              |   57 ++++++++++
 drivers/firmware/efi/Makefile                   |    5 +
 drivers/firmware/efi/efi.c                      |    8 +
 drivers/firmware/efi/esrt.c                     |    3 +
 drivers/firmware/efi/fake_mem.c                 |   26 ++--
 drivers/firmware/efi/fake_mem.h                 |   10 ++
 drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 ++
 drivers/firmware/efi/x86-fake_mem.c             |   69 ++++++++++++
 drivers/nvdimm/Kconfig                          |    1 
 drivers/nvdimm/core.c                           |    1 
 drivers/nvdimm/nd-core.h                        |    1 
 drivers/nvdimm/region_devs.c                    |   13 +-
 include/linux/efi.h                             |    4 -
 include/linux/ioport.h                          |    1 
 include/linux/memregion.h                       |   23 ++++
 lib/Kconfig                                     |    3 +
 lib/Makefile                                    |    1 
 lib/memregion.c                                 |   18 +++
 41 files changed, 584 insertions(+), 93 deletions(-)
 create mode 100644 arch/x86/include/asm/efi-stub.h
 delete mode 100644 drivers/acpi/hmat/Makefile
 rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (70%)
 create mode 100644 drivers/acpi/numa/Makefile
 rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (85%)
 rename drivers/acpi/{numa.c => numa/srat.c} (100%)
 create mode 100644 drivers/dax/hmem.c
 create mode 100644 drivers/firmware/efi/fake_mem.h
 create mode 100644 drivers/firmware/efi/x86-fake_mem.c
 create mode 100644 include/linux/memregion.h
 create mode 100644 lib/memregion.c


* [PATCH v5 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-08-30  1:52 ` [PATCH v5 02/10] efi: Enumerate EFI_MEMORY_SP Dan Williams
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Len Brown, Keith Busch, Rafael J. Wysocki, Dave Hansen, peterz,
	vishal.l.verma, ard.biesheuvel, linux-kernel, linux-efi, x86

Currently hmat.c lives under an "hmat" directory which does not enhance
the description of the file. The initial motivation for giving hmat.c
its own directory was to delineate it as mm functionality in contrast to
ACPI device driver functionality.

As ACPI continues to play an increasing role in conveying
memory location and performance topology information to the OS, take the
opportunity to co-locate these NUMA relevant tables in a combined
directory.

numa.c is renamed to srat.c and moved to drivers/acpi/numa/ along with
hmat.c.

Cc: Len Brown <lenb@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/Kconfig       |    9 +--------
 drivers/acpi/Makefile      |    3 +--
 drivers/acpi/hmat/Makefile |    2 --
 drivers/acpi/numa/Kconfig  |    7 ++++++-
 drivers/acpi/numa/Makefile |    3 +++
 drivers/acpi/numa/hmat.c   |    0 
 drivers/acpi/numa/srat.c   |    0 
 7 files changed, 11 insertions(+), 13 deletions(-)
 delete mode 100644 drivers/acpi/hmat/Makefile
 rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (72%)
 create mode 100644 drivers/acpi/numa/Makefile
 rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (100%)
 rename drivers/acpi/{numa.c => numa/srat.c} (100%)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 5f6158973289..8c7c46065e9d 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -319,12 +319,6 @@ config ACPI_THERMAL
 	  To compile this driver as a module, choose M here:
 	  the module will be called thermal.
 
-config ACPI_NUMA
-	bool "NUMA support"
-	depends on NUMA
-	depends on (X86 || IA64 || ARM64)
-	default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64
-
 config ACPI_CUSTOM_DSDT_FILE
 	string "Custom DSDT Table file to include"
 	default ""
@@ -473,8 +467,7 @@ config ACPI_REDUCED_HARDWARE_ONLY
 	  If you are unsure what to do, do not enable this option.
 
 source "drivers/acpi/nfit/Kconfig"
-source "drivers/acpi/hmat/Kconfig"
-
+source "drivers/acpi/numa/Kconfig"
 source "drivers/acpi/apei/Kconfig"
 source "drivers/acpi/dptf/Kconfig"
 
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 5d361e4e3405..f08a661274e8 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -55,7 +55,6 @@ acpi-$(CONFIG_X86)		+= acpi_cmos_rtc.o
 acpi-$(CONFIG_X86)		+= x86/apple.o
 acpi-$(CONFIG_X86)		+= x86/utils.o
 acpi-$(CONFIG_DEBUG_FS)		+= debugfs.o
-acpi-$(CONFIG_ACPI_NUMA)	+= numa.o
 acpi-$(CONFIG_ACPI_PROCFS_POWER) += cm_sbs.o
 acpi-y				+= acpi_lpat.o
 acpi-$(CONFIG_ACPI_LPIT)	+= acpi_lpit.o
@@ -80,7 +79,7 @@ obj-$(CONFIG_ACPI_PROCESSOR)	+= processor.o
 obj-$(CONFIG_ACPI)		+= container.o
 obj-$(CONFIG_ACPI_THERMAL)	+= thermal.o
 obj-$(CONFIG_ACPI_NFIT)		+= nfit/
-obj-$(CONFIG_ACPI_HMAT)		+= hmat/
+obj-$(CONFIG_ACPI_NUMA)		+= numa/
 obj-$(CONFIG_ACPI)		+= acpi_memhotplug.o
 obj-$(CONFIG_ACPI_HOTPLUG_IOAPIC) += ioapic.o
 obj-$(CONFIG_ACPI_BATTERY)	+= battery.o
diff --git a/drivers/acpi/hmat/Makefile b/drivers/acpi/hmat/Makefile
deleted file mode 100644
index 1c20ef36a385..000000000000
--- a/drivers/acpi/hmat/Makefile
+++ /dev/null
@@ -1,2 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_ACPI_HMAT) := hmat.o
diff --git a/drivers/acpi/hmat/Kconfig b/drivers/acpi/numa/Kconfig
similarity index 72%
rename from drivers/acpi/hmat/Kconfig
rename to drivers/acpi/numa/Kconfig
index 95a29964dbea..d14582387ed0 100644
--- a/drivers/acpi/hmat/Kconfig
+++ b/drivers/acpi/numa/Kconfig
@@ -1,4 +1,9 @@
-# SPDX-License-Identifier: GPL-2.0
+config ACPI_NUMA
+	bool "NUMA support"
+	depends on NUMA
+	depends on (X86 || IA64 || ARM64)
+	default y if IA64_GENERIC || IA64_SGI_SN2 || ARM64
+
 config ACPI_HMAT
 	bool "ACPI Heterogeneous Memory Attribute Table Support"
 	depends on ACPI_NUMA
diff --git a/drivers/acpi/numa/Makefile b/drivers/acpi/numa/Makefile
new file mode 100644
index 000000000000..517a6c689a94
--- /dev/null
+++ b/drivers/acpi/numa/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_ACPI_NUMA) += srat.o
+obj-$(CONFIG_ACPI_HMAT) += hmat.o
diff --git a/drivers/acpi/hmat/hmat.c b/drivers/acpi/numa/hmat.c
similarity index 100%
rename from drivers/acpi/hmat/hmat.c
rename to drivers/acpi/numa/hmat.c
diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa/srat.c
similarity index 100%
rename from drivers/acpi/numa.c
rename to drivers/acpi/numa/srat.c



* [PATCH v5 02/10] efi: Enumerate EFI_MEMORY_SP
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
  2019-08-30  1:52 ` [PATCH v5 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-08-30  1:52 ` [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines Dan Williams
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Ard Biesheuvel, Dave Hansen, peterz, vishal.l.verma,
	linux-kernel, linux-efi, x86

UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
interpretation of the EFI Memory Types as "reserved for a specific
purpose". The intent of this bit is to allow the OS to identify precious
or scarce memory resources and optionally manage it separately from
EfiConventionalMemory. As defined, older OSes that do not know about this
attribute are permitted to ignore it and the memory will be handled
according to the OS default policy for the given memory type.

In other words, this "specific purpose" hint is deliberately weaker than
EfiReservedMemoryType in that the system continues to operate if the OS
takes no action on the attribute. The risk of taking no action is
potentially unwanted / unmovable kernel allocations from the designated
resource that prevent the full realization of the "specific purpose".
For example, consider a system with a high-bandwidth memory pool. Older
kernels are permitted to boot and consume that memory as conventional
"System-RAM", while newer kernels may arrange for that memory to be set
aside (soft reserved) by the system administrator for a dedicated
high-bandwidth memory aware application to consume.

Specifically, this mechanism allows for the elimination of scenarios
where platform firmware tries to game OS policy by lying about ACPI SLIT
values, i.e. claiming that a precious memory resource has a high
distance to trigger the OS to avoid it by default. This reservation hint
allows platform-firmware to instead tell the truth about performance
characteristics by indicating to OS memory management to put immovable
allocations elsewhere.

Implement simple detection of the bit for EFI memory table dumps and
save the kernel policy for a follow-on change.
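
For readers skimming the diff, the detection amounts to one more known
bit in the attribute formatter. Below is a reduced userspace sketch of
that idea, covering only the four attribute bits whose values appear in
the include/linux/efi.h hunk. format_attr_sketch() is a hypothetical
stand-in, not efi_md_typeattr_format() itself, and it treats every
other bit as unknown purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Attribute values as they appear in include/linux/efi.h. */
#define EFI_MEMORY_MORE_RELIABLE 0x0000000000010000ULL
#define EFI_MEMORY_RO            0x0000000000020000ULL
#define EFI_MEMORY_SP            0x0000000000040000ULL
#define EFI_MEMORY_RUNTIME       0x8000000000000000ULL

/*
 * Format the attribute bits into buf. Bits outside the (reduced) known
 * set cause the raw value to be printed instead, mirroring the fallback
 * branch in the kernel formatter. Returns what snprintf() returns.
 */
int format_attr_sketch(char *buf, size_t size, uint64_t attr)
{
	const uint64_t known = EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE |
			       EFI_MEMORY_SP | EFI_MEMORY_RO;

	assert(buf && size);

	if (attr & ~known)
		return snprintf(buf, size, "|attr=0x%016llx]",
				(unsigned long long)attr);

	return snprintf(buf, size, "|%3s|%2s|%2s|%2s]",
			attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
			attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
			attr & EFI_MEMORY_SP ? "SP" : "",
			attr & EFI_MEMORY_RO ? "RO" : "");
}
```

The point of adding EFI_MEMORY_SP to the known mask is that a
descriptor carrying only that bit is rendered with an "SP" column
rather than falling into the raw attr=0x... branch.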

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/firmware/efi/efi.c |    5 +++--
 include/linux/efi.h        |    1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 8f1ab04f6743..363bb9d00fa5 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -833,15 +833,16 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
 	if (attr & ~(EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT |
 		     EFI_MEMORY_WB | EFI_MEMORY_UCE | EFI_MEMORY_RO |
 		     EFI_MEMORY_WP | EFI_MEMORY_RP | EFI_MEMORY_XP |
-		     EFI_MEMORY_NV |
+		     EFI_MEMORY_NV | EFI_MEMORY_SP |
 		     EFI_MEMORY_RUNTIME | EFI_MEMORY_MORE_RELIABLE))
 		snprintf(pos, size, "|attr=0x%016llx]",
 			 (unsigned long long)attr);
 	else
 		snprintf(pos, size,
-			 "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
+			 "|%3s|%2s|%2s|%2s|%2s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
 			 attr & EFI_MEMORY_RUNTIME ? "RUN" : "",
 			 attr & EFI_MEMORY_MORE_RELIABLE ? "MR" : "",
+			 attr & EFI_MEMORY_SP      ? "SP"  : "",
 			 attr & EFI_MEMORY_NV      ? "NV"  : "",
 			 attr & EFI_MEMORY_XP      ? "XP"  : "",
 			 attr & EFI_MEMORY_RP      ? "RP"  : "",
diff --git a/include/linux/efi.h b/include/linux/efi.h
index bd3837022307..5c1dd0221384 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -112,6 +112,7 @@ typedef	struct {
 #define EFI_MEMORY_MORE_RELIABLE \
 				((u64)0x0000000000010000ULL)	/* higher reliability */
 #define EFI_MEMORY_RO		((u64)0x0000000000020000ULL)	/* read-only */
+#define EFI_MEMORY_SP		((u64)0x0000000000040000ULL)	/* soft reserved */
 #define EFI_MEMORY_RUNTIME	((u64)0x8000000000000000ULL)	/* range requires runtime mapping */
 #define EFI_MEMORY_DESCRIPTOR_VERSION	1
 



* [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
  2019-08-30  1:52 ` [PATCH v5 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory Dan Williams
  2019-08-30  1:52 ` [PATCH v5 02/10] efi: Enumerate EFI_MEMORY_SP Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-09-13  9:05   ` Ard Biesheuvel
  2019-08-30  1:52 ` [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax Dan Williams
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: x86, Ingo Molnar, H. Peter Anvin, Andy Lutomirski,
	Peter Zijlstra, Dave Hansen, vishal.l.verma, ard.biesheuvel,
	linux-kernel, linux-efi, x86

In preparation for adding another EFI_MEMMAP dependent call that needs
to occur before e820__memblock_setup(), fix up the existing EFI calls to
check for EFI_MEMMAP internally. This ends up being cleaner than the
alternative of checking EFI_MEMMAP multiple times in setup_arch().
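
The shape of the change, reduced to a standalone sketch: each leaf
routine bails out early when EFI_MEMMAP is not set, so the caller can
invoke them unconditionally. Everything below is a userspace mock for
illustration only. The *_sketch names, the efi_flags variable, the
EFI_MEMMAP bit index and the counter are assumptions made for the
example, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define EFI_MEMMAP 4 /* feature bit index; illustrative value only */

static unsigned long efi_flags;  /* mock of the kernel's EFI feature flags */
static int descriptors_walked;   /* counts leaf routines that did work */

/* Mock of efi_enabled(): test one feature bit. */
static bool efi_enabled_sketch(int feature)
{
	assert(feature >= 0 && feature < 32);
	return efi_flags & (1UL << feature);
}

/* Leaf routine: checks EFI_MEMMAP itself, so callers need no guard. */
void efi_find_mirror_sketch(void)
{
	if (!efi_enabled_sketch(EFI_MEMMAP))
		return;
	descriptors_walked++; /* placeholder for walking the memory map */
}

/* Second leaf routine following the same pattern. */
void efi_reserve_boot_services_sketch(void)
{
	if (!efi_enabled_sketch(EFI_MEMMAP))
		return;
	descriptors_walked++; /* placeholder for reserving boot services */
}

/* Caller: no "if (efi_enabled(EFI_MEMMAP))" block is needed any more. */
void setup_arch_sketch(void)
{
	efi_find_mirror_sketch();
	efi_reserve_boot_services_sketch();
}
```

With the flag clear, setup_arch_sketch() is a no-op; with it set, both
leaf routines run. A new EFI_MEMMAP dependent call can then be placed
anywhere in setup_arch() without adding another conditional.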

Cc: <x86@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/efi.h      |    9 ++++++++-
 arch/x86/kernel/setup.c         |   19 +++++++++----------
 arch/x86/platform/efi/efi.c     |    3 +++
 arch/x86/platform/efi/quirks.c  |    3 +++
 drivers/firmware/efi/esrt.c     |    3 +++
 drivers/firmware/efi/fake_mem.c |    2 +-
 include/linux/efi.h             |    2 --
 7 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 43a82e59c59d..45f853bce869 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -140,7 +140,6 @@ extern void efi_delete_dummy_variable(void);
 extern void efi_switch_mm(struct mm_struct *mm);
 extern void efi_recover_from_page_fault(unsigned long phys_addr);
 extern void efi_free_boot_services(void);
-extern void efi_reserve_boot_services(void);
 
 struct efi_setup_data {
 	u64 fw_vendor;
@@ -244,6 +243,8 @@ static inline bool efi_is_64bit(void)
 extern bool efi_reboot_required(void);
 extern bool efi_is_table_address(unsigned long phys_addr);
 
+extern void efi_find_mirror(void);
+extern void efi_reserve_boot_services(void);
 #else
 static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
 static inline bool efi_reboot_required(void)
@@ -254,6 +255,12 @@ static inline  bool efi_is_table_address(unsigned long phys_addr)
 {
 	return false;
 }
+static inline void efi_find_mirror(void)
+{
+}
+static inline void efi_reserve_boot_services(void)
+{
+}
 #endif /* CONFIG_EFI */
 
 #endif /* _ASM_X86_EFI_H */
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbe35bf879f5..9bfecb542440 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1118,21 +1118,20 @@ void __init setup_arch(char **cmdline_p)
 	cleanup_highmap();
 
 	memblock_set_current_limit(ISA_END_ADDRESS);
+
 	e820__memblock_setup();
 
 	reserve_bios_regions();
 
-	if (efi_enabled(EFI_MEMMAP)) {
-		efi_fake_memmap();
-		efi_find_mirror();
-		efi_esrt_init();
+	efi_fake_memmap();
+	efi_find_mirror();
+	efi_esrt_init();
 
-		/*
-		 * The EFI specification says that boot service code won't be
-		 * called after ExitBootServices(). This is, in fact, a lie.
-		 */
-		efi_reserve_boot_services();
-	}
+	/*
+	 * The EFI specification says that boot service code won't be
+	 * called after ExitBootServices(). This is, in fact, a lie.
+	 */
+	efi_reserve_boot_services();
 
 	/* preallocate 4k for mptable mpc */
 	e820__memblock_alloc_reserved_mpc_new();
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index c202e1b07e29..0bb58eb33ca0 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -128,6 +128,9 @@ void __init efi_find_mirror(void)
 	efi_memory_desc_t *md;
 	u64 mirror_size = 0, total_size = 0;
 
+	if (!efi_enabled(EFI_MEMMAP))
+		return;
+
 	for_each_efi_memory_desc(md) {
 		unsigned long long start = md->phys_addr;
 		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index 3b9fd679cea9..7675cf754d90 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -320,6 +320,9 @@ void __init efi_reserve_boot_services(void)
 {
 	efi_memory_desc_t *md;
 
+	if (!efi_enabled(EFI_MEMMAP))
+		return;
+
 	for_each_efi_memory_desc(md) {
 		u64 start = md->phys_addr;
 		u64 size = md->num_pages << EFI_PAGE_SHIFT;
diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
index d6dd5f503fa2..2762e0662bf4 100644
--- a/drivers/firmware/efi/esrt.c
+++ b/drivers/firmware/efi/esrt.c
@@ -246,6 +246,9 @@ void __init efi_esrt_init(void)
 	int rc;
 	phys_addr_t end;
 
+	if (!efi_enabled(EFI_MEMMAP))
+		return;
+
 	pr_debug("esrt-init: loading.\n");
 	if (!esrt_table_exists())
 		return;
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 9501edc0fcfb..526b45331d96 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -44,7 +44,7 @@ void __init efi_fake_memmap(void)
 	void *new_memmap;
 	int i;
 
-	if (!nr_fake_mem)
+	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
 		return;
 
 	/* count up the number of EFI memory descriptor */
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 5c1dd0221384..acc2b8982ed2 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1045,9 +1045,7 @@ extern void efi_enter_virtual_mode (void);	/* switch EFI to virtual mode, if pos
 extern efi_status_t efi_query_variable_store(u32 attributes,
 					     unsigned long size,
 					     bool nonblocking);
-extern void efi_find_mirror(void);
 #else
-
 static inline efi_status_t efi_query_variable_store(u32 attributes,
 						    unsigned long size,
 						    bool nonblocking)



* [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (2 preceding siblings ...)
  2019-08-30  1:52 ` [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-09-13 12:59   ` Ard Biesheuvel
  2019-08-30  1:52 ` [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP Dan Williams
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: x86, Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra, Ard Biesheuvel,
	kbuild test robot, Dave Hansen, vishal.l.verma, linux-kernel,
	linux-efi, x86

UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
interpretation of the EFI Memory Types as "reserved for a specific
purpose".

The proposed Linux behavior for specific purpose memory is that it is
reserved for direct-access (device-dax) by default and not available for
any kernel usage, not even as an OOM fallback.  Later, through udev
scripts or another init mechanism, these device-dax claimed ranges can
be reconfigured and hot-added to the available System-RAM with a unique
node identifier. This device-dax management scheme implements "soft" in
the "soft reserved" designation by allowing some or all of the
reservation to be recovered as typical memory. This policy can be
disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or at runtime
with efi=nosoftreserve.

This patch introduces 2 new concepts at once given the entanglement
between early boot enumeration relative to memory that can optionally be
reserved from the kernel page allocator by default. The new concepts
are:

- E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
  attribute on EFI_CONVENTIONAL_MEMORY, update the E820 map with this
  new type. Only perform this classification if the
  CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
  typical RAM.

- IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
  a device driver to search iomem resources for application specific
  memory. Teach the iomem code to identify such ranges as "Soft Reserved".
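
The two decisions above (E820 classification and KASLR exclusion) can
be summarized in a small userspace sketch. It is a simplification under
stated assumptions: struct efi_md_sketch and both functions are
hypothetical, every non-conventional type is lumped into "reserved" for
brevity, and the nosoftreserve argument stands in for both the
CONFIG_EFI_SOFT_RESERVE=n and efi=nosoftreserve cases. The numeric
values for E820_TYPE_SOFT_RESERVED and EFI_MEMORY_SP come from this
series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFI_CONVENTIONAL_MEMORY 7U          /* EfiConventionalMemory */
#define EFI_MEMORY_SP           0x40000ULL  /* value from patch 02 */

#define E820_TYPE_RAM           1U
#define E820_TYPE_RESERVED      2U
#define E820_TYPE_SOFT_RESERVED 0xefffffffU /* value from this patch */

/* Hypothetical, reduced view of an EFI memory descriptor. */
struct efi_md_sketch {
	uint32_t type;
	uint64_t attribute;
};

/* Pick the E820 type for one descriptor. */
uint32_t classify_e820_sketch(const struct efi_md_sketch *md,
			      bool nosoftreserve)
{
	assert(md);

	/* Simplification: only conventional memory is modeled as usable. */
	if (md->type != EFI_CONVENTIONAL_MEMORY)
		return E820_TYPE_RESERVED;

	if (!nosoftreserve && (md->attribute & EFI_MEMORY_SP))
		return E820_TYPE_SOFT_RESERVED;

	return E820_TYPE_RAM;
}

/* KASLR may only place the kernel in memory that is not soft reserved. */
bool kaslr_may_use_sketch(const struct efi_md_sketch *md, bool nosoftreserve)
{
	assert(md);

	if (md->type != EFI_CONVENTIONAL_MEMORY)
		return false;
	if (!nosoftreserve && (md->attribute & EFI_MEMORY_SP))
		return false;
	return true;
}
```

The same predicate guards both paths, which is what keeps the early
boot view (KASLR) consistent with the E820 map that the rest of the
kernel consumes.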

A follow-on change integrates parsing of the ACPI HMAT to identify the
node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
now, just identify and reserve memory of this type.

The translation of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP to "soft
reserved" is x86/E820-only, but other archs could choose to publish
IORES_DESC_SOFT_RESERVED resources from their platform-firmware memory
map handlers. Other EFI-capable platforms would need to go audit their
local usages of EFI_CONVENTIONAL_MEMORY to consider the soft reserved
case.

Cc: <x86@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Andy Shevchenko <andy@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   19 +++++++--
 arch/x86/Kconfig                                |   21 +++++++++
 arch/x86/boot/compressed/eboot.c                |    7 +++
 arch/x86/boot/compressed/kaslr.c                |    4 ++
 arch/x86/include/asm/e820/types.h               |    8 ++++
 arch/x86/include/asm/efi-stub.h                 |   11 +++++
 arch/x86/kernel/e820.c                          |   12 +++++
 arch/x86/platform/efi/efi.c                     |   51 +++++++++++++++++++++--
 drivers/firmware/efi/efi.c                      |    3 +
 drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 +++++
 include/linux/efi.h                             |    1 
 include/linux/ioport.h                          |    1 
 12 files changed, 139 insertions(+), 11 deletions(-)
 create mode 100644 arch/x86/include/asm/efi-stub.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1c67acd1df65..dd28f0726309 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1152,7 +1152,8 @@
 			Format: {"off" | "on" | "skip[mbr]"}
 
 	efi=		[EFI]
-			Format: { "old_map", "nochunk", "noruntime", "debug" }
+			Format: { "old_map", "nochunk", "noruntime", "debug",
+				  "nosoftreserve" }
 			old_map [X86-64]: switch to the old ioremap-based EFI
 			runtime services mapping. 32-bit still uses this one by
 			default.
@@ -1161,6 +1162,12 @@
 			firmware implementations.
 			noruntime : disable EFI runtime services support
 			debug: enable misc debug output
+			nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
+			attribute may cause the kernel to reserve the
+			memory range for a memory mapping driver to
+			claim. Specify efi=nosoftreserve to disable this
+			reservation and treat the memory by its base type
+			(i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
 
 	efi_no_storage_paranoia [EFI; X86]
 			Using this parameter you can use more than 50% of
@@ -1173,15 +1180,21 @@
 			updating original EFI memory map.
 			Region of memory which aa attribute is added to is
 			from ss to ss+nn.
+
 			If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000
 			is specified, EFI_MEMORY_MORE_RELIABLE(0x10000)
 			attribute is added to range 0x100000000-0x180000000 and
 			0x10a0000000-0x1120000000.
 
+			If efi_fake_mem=8G@9G:0x40000 is specified, the
+			EFI_MEMORY_SP(0x40000) attribute is added to
+			range 0x240000000-0x43fffffff.
+
 			Using this parameter you can do debugging of EFI memmap
-			related feature. For example, you can do debugging of
+			related features. For example, you can do debugging of
 			Address Range Mirroring feature even if your box
-			doesn't support it.
+			doesn't support it, or mark specific memory as
+			"soft reserved".
 
 	efivar_ssdt=	[EFI; X86] Name of an EFI variable that contains an SSDT
 			that is to be dynamically loaded by Linux. If there are
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4195f44c6a09..bced13503bb1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1981,6 +1981,27 @@ config EFI_MIXED
 
 	   If unsure, say N.
 
+config EFI_SOFT_RESERVE
+	bool "Reserve EFI Specific Purpose Memory"
+	depends on EFI && ACPI_HMAT
+	default ACPI_HMAT
+	---help---
+	  On systems that have mixed performance classes of memory EFI
+	  may indicate specific purpose memory with an attribute (See
+	  EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
+	  attribute may have unique performance characteristics compared
+	  to the system's general purpose "System RAM" pool. On the
+	  expectation that such memory has application specific usage,
+	  and its base EFI memory type is "conventional" answer Y to
+	  arrange for the kernel to reserve it as a "Soft Reserved"
+	  resource, and set aside for direct-access (device-dax) by
+	  default. The memory range can later be optionally assigned to
+	  the page allocator by system administrator policy via the
+	  device-dax kmem facility. Say N to have the kernel treat this
+	  memory as "System RAM" by default.
+
+	  If unsure, say Y.
+
 config SECCOMP
 	def_bool y
 	prompt "Enable seccomp to safely compute untrusted bytecode"
diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index d6662fdef300..f2dc5896d770 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -10,6 +10,7 @@
 #include <linux/pci.h>
 
 #include <asm/efi.h>
+#include <asm/efi-stub.h>
 #include <asm/e820/types.h>
 #include <asm/setup.h>
 #include <asm/desc.h>
@@ -553,7 +554,11 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
 		case EFI_BOOT_SERVICES_CODE:
 		case EFI_BOOT_SERVICES_DATA:
 		case EFI_CONVENTIONAL_MEMORY:
-			e820_type = E820_TYPE_RAM;
+			if (!efi_nosoftreserve
+					&& (d->attribute & EFI_MEMORY_SP))
+				e820_type = E820_TYPE_SOFT_RESERVED;
+			else
+				e820_type = E820_TYPE_RAM;
 			break;
 
 		case EFI_ACPI_MEMORY_NVS:
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 2e53c056ba20..093e84e28b7a 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -38,6 +38,7 @@
 #include <linux/efi.h>
 #include <generated/utsrelease.h>
 #include <asm/efi.h>
+#include <asm/efi-stub.h>
 
 /* Macros used by the included decompressor code below. */
 #define STATIC
@@ -760,6 +761,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
 		if (md->type != EFI_CONVENTIONAL_MEMORY)
 			continue;
 
+		if (!efi_nosoftreserve && (md->attribute & EFI_MEMORY_SP))
+			continue;
+
 		if (efi_mirror_found &&
 		    !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
 			continue;
diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
index c3aa4b5e49e2..314f75d886d0 100644
--- a/arch/x86/include/asm/e820/types.h
+++ b/arch/x86/include/asm/e820/types.h
@@ -28,6 +28,14 @@ enum e820_type {
 	 */
 	E820_TYPE_PRAM		= 12,
 
+	/*
+	 * Special-purpose memory is indicated to the system via the
+	 * EFI_MEMORY_SP attribute. Define an e820 translation of this
+	 * memory type for the purpose of reserving this range and
+	 * marking it with the IORES_DESC_SOFT_RESERVED designation.
+	 */
+	E820_TYPE_SOFT_RESERVED	= 0xefffffff,
+
 	/*
 	 * Reserved RAM used by the kernel itself if
 	 * CONFIG_INTEL_TXT=y is enabled, memory of this type
diff --git a/arch/x86/include/asm/efi-stub.h b/arch/x86/include/asm/efi-stub.h
new file mode 100644
index 000000000000..16ebd036387b
--- /dev/null
+++ b/arch/x86/include/asm/efi-stub.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _X86_EFI_STUB_H_
+#define _X86_EFI_STUB_H_
+
+#ifdef CONFIG_EFI_STUB
+extern bool efi_nosoftreserve;
+#else
+#define efi_nosoftreserve (1)
+#endif
+
+#endif /* _X86_EFI_STUB_H_ */
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 7da2bcd2b8eb..9976106b57ec 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -190,6 +190,7 @@ static void __init e820_print_type(enum e820_type type)
 	case E820_TYPE_RAM:		/* Fall through: */
 	case E820_TYPE_RESERVED_KERN:	pr_cont("usable");			break;
 	case E820_TYPE_RESERVED:	pr_cont("reserved");			break;
+	case E820_TYPE_SOFT_RESERVED:	pr_cont("soft reserved");		break;
 	case E820_TYPE_ACPI:		pr_cont("ACPI data");			break;
 	case E820_TYPE_NVS:		pr_cont("ACPI NVS");			break;
 	case E820_TYPE_UNUSABLE:	pr_cont("unusable");			break;
@@ -1037,6 +1038,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry)
 	case E820_TYPE_PRAM:		return "Persistent Memory (legacy)";
 	case E820_TYPE_PMEM:		return "Persistent Memory";
 	case E820_TYPE_RESERVED:	return "Reserved";
+	case E820_TYPE_SOFT_RESERVED:	return "Soft Reserved";
 	default:			return "Unknown E820 type";
 	}
 }
@@ -1052,6 +1054,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
 	case E820_TYPE_PRAM:		/* Fall-through: */
 	case E820_TYPE_PMEM:		/* Fall-through: */
 	case E820_TYPE_RESERVED:	/* Fall-through: */
+	case E820_TYPE_SOFT_RESERVED:	/* Fall-through: */
 	default:			return IORESOURCE_MEM;
 	}
 }
@@ -1064,6 +1067,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
 	case E820_TYPE_PMEM:		return IORES_DESC_PERSISTENT_MEMORY;
 	case E820_TYPE_PRAM:		return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
 	case E820_TYPE_RESERVED:	return IORES_DESC_RESERVED;
+	case E820_TYPE_SOFT_RESERVED:	return IORES_DESC_SOFT_RESERVED;
 	case E820_TYPE_RESERVED_KERN:	/* Fall-through: */
 	case E820_TYPE_RAM:		/* Fall-through: */
 	case E820_TYPE_UNUSABLE:	/* Fall-through: */
@@ -1078,11 +1082,12 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res)
 		return true;
 
 	/*
-	 * Treat persistent memory like device memory, i.e. reserve it
-	 * for exclusive use of a driver
+	 * Treat persistent memory and other special memory ranges like
+	 * device memory, i.e. reserve it for exclusive use of a driver
 	 */
 	switch (type) {
 	case E820_TYPE_RESERVED:
+	case E820_TYPE_SOFT_RESERVED:
 	case E820_TYPE_PRAM:
 	case E820_TYPE_PMEM:
 		return false;
@@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void)
 		if (end != (resource_size_t)end)
 			continue;
 
+		if (entry->type == E820_TYPE_SOFT_RESERVED)
+			memblock_reserve(entry->addr, entry->size);
+
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;
 
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 0bb58eb33ca0..9cfb7f1cf25d 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -151,10 +151,18 @@ void __init efi_find_mirror(void)
  * more than the max 128 entries that can fit in the e820 legacy
  * (zeropage) memory map.
  */
+enum add_efi_mode {
+	ADD_EFI_ALL,
+	ADD_EFI_SOFT_RESERVED,
+};
 
-static void __init do_add_efi_memmap(void)
+static void __init do_add_efi_memmap(enum add_efi_mode mode)
 {
 	efi_memory_desc_t *md;
+	int add = 0;
+
+	if (!efi_enabled(EFI_MEMMAP))
+		return;
 
 	for_each_efi_memory_desc(md) {
 		unsigned long long start = md->phys_addr;
@@ -167,7 +175,10 @@ static void __init do_add_efi_memmap(void)
 		case EFI_BOOT_SERVICES_CODE:
 		case EFI_BOOT_SERVICES_DATA:
 		case EFI_CONVENTIONAL_MEMORY:
-			if (md->attribute & EFI_MEMORY_WB)
+			if (efi_enabled(EFI_MEM_SOFT_RESERVE)
+					&& (md->attribute & EFI_MEMORY_SP))
+				e820_type = E820_TYPE_SOFT_RESERVED;
+			else if (md->attribute & EFI_MEMORY_WB)
 				e820_type = E820_TYPE_RAM;
 			else
 				e820_type = E820_TYPE_RESERVED;
@@ -193,9 +204,17 @@ static void __init do_add_efi_memmap(void)
 			e820_type = E820_TYPE_RESERVED;
 			break;
 		}
+
+		if (e820_type == E820_TYPE_SOFT_RESERVED)
+			/* always add E820_TYPE_SOFT_RESERVED */;
+		else if (mode == ADD_EFI_SOFT_RESERVED)
+			continue;
+
+		add++;
 		e820__range_add(start, size, e820_type);
 	}
-	e820__update_table(e820_table);
+	if (add)
+		e820__update_table(e820_table);
 }
 
 int __init efi_memblock_x86_reserve_range(void)
@@ -227,8 +246,18 @@ int __init efi_memblock_x86_reserve_range(void)
 	if (rv)
 		return rv;
 
-	if (add_efi_memmap)
-		do_add_efi_memmap();
+	if (add_efi_memmap) {
+		do_add_efi_memmap(ADD_EFI_ALL);
+	} else {
+		/*
+		 * Given add_efi_memmap defaults to 0 and there is no e820
+		 * mechanism for soft-reserved memory, explicitly scan for
+		 * soft-reserved memory. Otherwise, the mechanism to disable the
+		 * kernel's consideration of EFI_MEMORY_SP is the
+		 * efi=nosoftreserve option.
+		 */
+		do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
+	}
 
 	WARN(efi.memmap.desc_version != 1,
 	     "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
@@ -781,6 +810,15 @@ static bool should_map_region(efi_memory_desc_t *md)
 	if (IS_ENABLED(CONFIG_X86_32))
 		return false;
 
+	/*
+	 * EFI specific purpose memory may be reserved by default
+	 * depending on kernel config and boot options.
+	 */
+	if (md->type == EFI_CONVENTIONAL_MEMORY
+			&& efi_enabled(EFI_MEM_SOFT_RESERVE)
+			&& (md->attribute & EFI_MEMORY_SP))
+		return false;
+
 	/*
 	 * Map all of RAM so that we can access arguments in the 1:1
 	 * mapping when making EFI runtime calls.
@@ -1072,6 +1110,9 @@ static int __init arch_parse_efi_cmdline(char *str)
 	if (parse_option_str(str, "old_map"))
 		set_bit(EFI_OLD_MEMMAP, &efi.flags);
 
+	if (parse_option_str(str, "nosoftreserve"))
+		clear_bit(EFI_MEM_SOFT_RESERVE, &efi.flags);
+
 	return 0;
 }
 early_param("efi", arch_parse_efi_cmdline);
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 363bb9d00fa5..6d54d5c74347 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
 	.tpm_log		= EFI_INVALID_TABLE_ADDR,
 	.tpm_final_log		= EFI_INVALID_TABLE_ADDR,
 	.mem_reserve		= EFI_INVALID_TABLE_ADDR,
+#ifdef CONFIG_EFI_SOFT_RESERVE
+	.flags			= 1UL << EFI_MEM_SOFT_RESERVE,
+#endif
 };
 EXPORT_SYMBOL(efi);
 
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index 3caae7f2cf56..35ee98a2c00c 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -28,6 +28,7 @@
 #define EFI_READ_CHUNK_SIZE	(1024 * 1024)
 
 static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
+bool efi_nosoftreserve;
 
 static int __section(.data) __nokaslr;
 static int __section(.data) __quiet;
@@ -211,6 +212,9 @@ efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg,
 		if (desc->type != EFI_CONVENTIONAL_MEMORY)
 			continue;
 
+		if (!efi_nosoftreserve && (desc->attribute & EFI_MEMORY_SP))
+			continue;
+
 		if (desc->num_pages < nr_pages)
 			continue;
 
@@ -305,6 +309,9 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg,
 		if (desc->type != EFI_CONVENTIONAL_MEMORY)
 			continue;
 
+		if (!efi_nosoftreserve && (desc->attribute & EFI_MEMORY_SP))
+			continue;
+
 		if (desc->num_pages < nr_pages)
 			continue;
 
@@ -489,6 +496,11 @@ efi_status_t efi_parse_options(char const *cmdline)
 			__novamap = 1;
 		}
 
+		if (!strncmp(str, "nosoftreserve", 13)) {
+			str += strlen("nosoftreserve");
+			efi_nosoftreserve = 1;
+		}
+
 		/* Group words together, delimited by "," */
 		while (*str && *str != ' ' && *str != ',')
 			str++;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index acc2b8982ed2..f50e0f01a5ed 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1201,6 +1201,7 @@ extern int __init efi_setup_pcdp_console(char *);
 #define EFI_DBG			8	/* Print additional debug info at runtime */
 #define EFI_NX_PE_DATA		9	/* Can runtime data regions be mapped non-executable? */
 #define EFI_MEM_ATTR		10	/* Did firmware publish an EFI_MEMORY_ATTRIBUTES table? */
+#define EFI_MEM_SOFT_RESERVE	11	/* Is the kernel configured to honor soft reservations? */
 
 #ifdef CONFIG_EFI
 /*
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 5b6a7121c9f0..17d9b1abc2f0 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -134,6 +134,7 @@ enum {
 	IORES_DESC_PERSISTENT_MEMORY_LEGACY	= 5,
 	IORES_DESC_DEVICE_PRIVATE_MEMORY	= 6,
 	IORES_DESC_RESERVED			= 7,
+	IORES_DESC_SOFT_RESERVED		= 8,
 };
 
 /*
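
For readers following the classification order in do_add_efi_memmap()
above, here is a minimal userspace sketch of the decision for a
descriptor whose base type is conventional memory. The enum and the
function name are hypothetical stand-ins, not kernel symbols; only the
two attribute bit values come from the UEFI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Attribute bits as defined by the UEFI specification. */
#define EFI_MEMORY_WB 0x0000000000000008ULL
#define EFI_MEMORY_SP 0x0000000000040000ULL

/* Hypothetical stand-ins for the kernel's e820 types. */
enum sketch_e820_type {
	SKETCH_RAM,
	SKETCH_RESERVED,
	SKETCH_SOFT_RESERVED,
};

/*
 * Classify a conventional-memory descriptor. The soft-reserve check
 * comes first, so EFI_MEMORY_SP wins over EFI_MEMORY_WB unless the
 * policy has been disabled (e.g. efi=nosoftreserve).
 */
enum sketch_e820_type classify_conventional(uint64_t attribute,
					    bool soft_reserve_enabled)
{
	if (soft_reserve_enabled && (attribute & EFI_MEMORY_SP))
		return SKETCH_SOFT_RESERVED;
	if (attribute & EFI_MEMORY_WB)
		return SKETCH_RAM;
	return SKETCH_RESERVED;
}
```

With the policy disabled, a write-back range tagged EFI_MEMORY_SP falls
through to the ordinary RAM case, which matches the intent of the
opt-out.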



* [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (3 preceding siblings ...)
  2019-08-30  1:52 ` [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-09-10  6:48   ` Ingo Molnar
                     ` (2 more replies)
  2019-08-30  1:52 ` [PATCH v5 06/10] lib: Uplevel the pmem "region" ida to a global allocator Dan Williams
                   ` (5 subsequent siblings)
  10 siblings, 3 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: x86, Borislav Petkov, Ingo Molnar, H. Peter Anvin,
	Ard Biesheuvel, Dave Hansen, peterz, vishal.l.verma,
	linux-kernel, linux-efi, x86

Given that EFI_MEMORY_SP is a platform BIOS policy decision for marking
memory ranges as "reserved for a specific purpose", there will inevitably
be scenarios where the BIOS omits the attribute in situations where it
is desired. Unlike other attributes, if the OS wants to reserve this
memory from the kernel, the reservation needs to happen early in init. So
early, in fact, that it needs to happen before e820__memblock_setup(),
which is a prerequisite for efi_fake_memmap() that wants to allocate
memory for the updated table.

Introduce an x86 specific efi_fake_memmap_early() that can search for
attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
accordingly.

The KASLR code that scans the command line looking for user-directed
memory reservations also needs to be updated to consider
"efi_fake_mem=nn@ss:0x40000" requests.
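
As a rough illustration of what such a request looks like to a parser,
the userspace sketch below splits "nn@ss:attr" and tests the
EFI_MEMORY_SP bit (0x40000). It is not the kaslr.c code: parse_size()
is a simplified stand-in for the kernel's memparse(), and
fake_mem_is_soft_reserved() is a hypothetical name chosen for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_MEMORY_SP 0x40000ULL	/* UEFI "specific purpose" bit */

/* Simplified stand-in for memparse(): a number plus optional K/M/G. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g': v <<= 10; /* fall through */
	case 'M': case 'm': v <<= 10; /* fall through */
	case 'K': case 'k': v <<= 10; (*end)++; break;
	default: break;
	}
	return v;
}

/*
 * Return true and fill *start / *size when "nn@ss:attr" requests
 * EFI_MEMORY_SP, i.e. when the range should be avoided.
 */
bool fake_mem_is_soft_reserved(const char *arg, uint64_t *start,
			       uint64_t *size)
{
	char *p, *attr_end;
	uint64_t attr;

	*size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return false;
	*start = parse_size(p + 1, &p);
	if (*p != ':')
		return false;
	attr = strtoull(p + 1, &attr_end, 0);
	if (attr_end == p + 1)
		return false;	/* no attribute value after ':' */
	return (attr & EFI_MEMORY_SP) != 0;
}
```

A request without an attribute, or with an attribute that lacks the SP
bit, is reported as "nothing to avoid", which mirrors the *size = 0
cases in the patch.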

Cc: <x86@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/boot/compressed/kaslr.c    |   46 ++++++++++++++++++++---
 arch/x86/include/asm/efi.h          |    8 ++++
 arch/x86/platform/efi/efi.c         |    2 +
 drivers/firmware/efi/Makefile       |    5 ++-
 drivers/firmware/efi/fake_mem.c     |   24 ++++++------
 drivers/firmware/efi/fake_mem.h     |   10 +++++
 drivers/firmware/efi/x86-fake_mem.c |   69 +++++++++++++++++++++++++++++++++++
 7 files changed, 143 insertions(+), 21 deletions(-)
 create mode 100644 drivers/firmware/efi/fake_mem.h
 create mode 100644 drivers/firmware/efi/x86-fake_mem.c

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 093e84e28b7a..53ed3991f9a8 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -133,8 +133,14 @@ char *skip_spaces(const char *str)
 #include "../../../../lib/ctype.c"
 #include "../../../../lib/cmdline.c"
 
+enum parse_mode {
+	PARSE_MEMMAP,
+	PARSE_EFI,
+};
+
 static int
-parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
+parse_memmap(char *p, unsigned long long *start, unsigned long long *size,
+		enum parse_mode mode)
 {
 	char *oldp;
 
@@ -157,8 +163,33 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
 		*start = memparse(p + 1, &p);
 		return 0;
 	case '@':
-		/* memmap=nn@ss specifies usable region, should be skipped */
-		*size = 0;
+		if (mode == PARSE_MEMMAP) {
+			/*
+			 * memmap=nn@ss specifies usable region, should
+			 * be skipped
+			 */
+			*size = 0;
+		} else {
+			unsigned long long flags;
+
+			/*
+			 * efi_fake_mem=nn@ss:attr the attr specifies
+			 * flags that might imply a soft-reservation.
+			 */
+			*start = memparse(p + 1, &p);
+			if (p && *p == ':') {
+				p++;
+				oldp = p;
+				flags = simple_strtoull(p, &p, 0);
+				if (p == oldp)
+					*size = 0;
+				else if (flags & EFI_MEMORY_SP)
+					return 0;
+				else
+					*size = 0;
+			} else
+				*size = 0;
+		}
 		/* Fall through */
 	default:
 		/*
@@ -173,7 +204,7 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
 	return -EINVAL;
 }
 
-static void mem_avoid_memmap(char *str)
+static void mem_avoid_memmap(enum parse_mode mode, char *str)
 {
 	static int i;
 
@@ -188,7 +219,7 @@ static void mem_avoid_memmap(char *str)
 		if (k)
 			*k++ = 0;
 
-		rc = parse_memmap(str, &start, &size);
+		rc = parse_memmap(str, &start, &size, mode);
 		if (rc < 0)
 			break;
 		str = k;
@@ -239,7 +270,6 @@ static void parse_gb_huge_pages(char *param, char *val)
 	}
 }
 
-
 static void handle_mem_options(void)
 {
 	char *args = (char *)get_cmd_line_ptr();
@@ -272,7 +302,7 @@ static void handle_mem_options(void)
 		}
 
 		if (!strcmp(param, "memmap")) {
-			mem_avoid_memmap(val);
+			mem_avoid_memmap(PARSE_MEMMAP, val);
 		} else if (strstr(param, "hugepages")) {
 			parse_gb_huge_pages(param, val);
 		} else if (!strcmp(param, "mem")) {
@@ -285,6 +315,8 @@ static void handle_mem_options(void)
 				goto out;
 
 			mem_limit = mem_size;
+		} else if (!strcmp(param, "efi_fake_mem")) {
+			mem_avoid_memmap(PARSE_EFI, val);
 		}
 	}
 
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 45f853bce869..d028e9acdf1c 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -263,4 +263,12 @@ static inline void efi_reserve_boot_services(void)
 }
 #endif /* CONFIG_EFI */
 
+#ifdef CONFIG_EFI_FAKE_MEMMAP
+extern void __init efi_fake_memmap_early(void);
+#else
+static inline void efi_fake_memmap_early(void)
+{
+}
+#endif
+
 #endif /* _ASM_X86_EFI_H */
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 9cfb7f1cf25d..ac63e244ae55 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -259,6 +259,8 @@ int __init efi_memblock_x86_reserve_range(void)
 		do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
 	}
 
+	efi_fake_memmap_early();
+
 	WARN(efi.memmap.desc_version != 1,
 	     "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
 	     efi.memmap.desc_version);
diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index 4ac2de4dfa72..d7a6db03ea79 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -20,13 +20,16 @@ obj-$(CONFIG_UEFI_CPER)			+= cper.o
 obj-$(CONFIG_EFI_RUNTIME_MAP)		+= runtime-map.o
 obj-$(CONFIG_EFI_RUNTIME_WRAPPERS)	+= runtime-wrappers.o
 obj-$(CONFIG_EFI_STUB)			+= libstub/
-obj-$(CONFIG_EFI_FAKE_MEMMAP)		+= fake_mem.o
+obj-$(CONFIG_EFI_FAKE_MEMMAP)		+= fake_map.o
 obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)	+= efibc.o
 obj-$(CONFIG_EFI_TEST)			+= test/
 obj-$(CONFIG_EFI_DEV_PATH_PARSER)	+= dev-path-parser.o
 obj-$(CONFIG_APPLE_PROPERTIES)		+= apple-properties.o
 obj-$(CONFIG_EFI_RCI2_TABLE)		+= rci2-table.o
 
+fake_map-y				+= fake_mem.o
+fake_map-$(CONFIG_X86)			+= x86-fake_mem.o
+
 arm-obj-$(CONFIG_EFI)			:= arm-init.o arm-runtime.o
 obj-$(CONFIG_ARM)			+= $(arm-obj-y)
 obj-$(CONFIG_ARM64)			+= $(arm-obj-y)
diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
index 526b45331d96..bb9fc70d0cfa 100644
--- a/drivers/firmware/efi/fake_mem.c
+++ b/drivers/firmware/efi/fake_mem.c
@@ -17,12 +17,10 @@
 #include <linux/memblock.h>
 #include <linux/types.h>
 #include <linux/sort.h>
-#include <asm/efi.h>
+#include "fake_mem.h"
 
-#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM
-
-static struct efi_mem_range fake_mems[EFI_MAX_FAKEMEM];
-static int nr_fake_mem;
+struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM];
+int nr_fake_mem;
 
 static int __init cmp_fake_mem(const void *x1, const void *x2)
 {
@@ -50,7 +48,7 @@ void __init efi_fake_memmap(void)
 	/* count up the number of EFI memory descriptor */
 	for (i = 0; i < nr_fake_mem; i++) {
 		for_each_efi_memory_desc(md) {
-			struct range *r = &fake_mems[i].range;
+			struct range *r = &efi_fake_mems[i].range;
 
 			new_nr_map += efi_memmap_split_count(md, r);
 		}
@@ -70,7 +68,7 @@ void __init efi_fake_memmap(void)
 	}
 
 	for (i = 0; i < nr_fake_mem; i++)
-		efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i]);
+		efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
 
 	/* swap into new EFI memmap */
 	early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
@@ -104,22 +102,22 @@ static int __init setup_fake_mem(char *p)
 		if (nr_fake_mem >= EFI_MAX_FAKEMEM)
 			break;
 
-		fake_mems[nr_fake_mem].range.start = start;
-		fake_mems[nr_fake_mem].range.end = start + mem_size - 1;
-		fake_mems[nr_fake_mem].attribute = attribute;
+		efi_fake_mems[nr_fake_mem].range.start = start;
+		efi_fake_mems[nr_fake_mem].range.end = start + mem_size - 1;
+		efi_fake_mems[nr_fake_mem].attribute = attribute;
 		nr_fake_mem++;
 
 		if (*p == ',')
 			p++;
 	}
 
-	sort(fake_mems, nr_fake_mem, sizeof(struct efi_mem_range),
+	sort(efi_fake_mems, nr_fake_mem, sizeof(struct efi_mem_range),
 	     cmp_fake_mem, NULL);
 
 	for (i = 0; i < nr_fake_mem; i++)
 		pr_info("efi_fake_mem: add attr=0x%016llx to [mem 0x%016llx-0x%016llx]",
-			fake_mems[i].attribute, fake_mems[i].range.start,
-			fake_mems[i].range.end);
+			efi_fake_mems[i].attribute, efi_fake_mems[i].range.start,
+			efi_fake_mems[i].range.end);
 
 	return *p == '\0' ? 0 : -EINVAL;
 }
diff --git a/drivers/firmware/efi/fake_mem.h b/drivers/firmware/efi/fake_mem.h
new file mode 100644
index 000000000000..0390be13df96
--- /dev/null
+++ b/drivers/firmware/efi/fake_mem.h
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef __EFI_FAKE_MEM_H__
+#define __EFI_FAKE_MEM_H__
+#include <asm/efi.h>
+
+#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM
+
+extern struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM];
+extern int nr_fake_mem;
+#endif /* __EFI_FAKE_MEM_H__ */
diff --git a/drivers/firmware/efi/x86-fake_mem.c b/drivers/firmware/efi/x86-fake_mem.c
new file mode 100644
index 000000000000..8c369555dafe
--- /dev/null
+++ b/drivers/firmware/efi/x86-fake_mem.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2019 Intel Corporation. All rights reserved. */
+#include <linux/efi.h>
+#include <asm/e820/api.h>
+#include "fake_mem.h"
+
+void __init efi_fake_memmap_early(void)
+{
+	int i;
+
+	/*
+	 * efi_fake_memmap() can handle all possibilities if EFI_MEMORY_SP
+	 * is ignored.
+	 */
+	if (!efi_enabled(EFI_MEM_SOFT_RESERVE))
+		return;
+
+	if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
+		return;
+
+	/*
+	 * Given that efi_fake_memmap() needs to perform memblock
+	 * allocations it needs to run after e820__memblock_setup().
+	 * However, if efi_fake_mem specifies EFI_MEMORY_SP for a given
+	 * address range that potentially needs to mark the memory as
+	 * reserved prior to e820__memblock_setup(). Update e820
+	 * directly if EFI_MEMORY_SP is specified for an
+	 * EFI_CONVENTIONAL_MEMORY descriptor.
+	 */
+	for (i = 0; i < nr_fake_mem; i++) {
+		struct efi_mem_range *mem = &efi_fake_mems[i];
+		efi_memory_desc_t *md;
+		u64 m_start, m_end;
+
+		if ((mem->attribute & EFI_MEMORY_SP) == 0)
+			continue;
+
+		m_start = mem->range.start;
+		m_end = mem->range.end;
+		for_each_efi_memory_desc(md) {
+			u64 start, end;
+
+			if (md->type != EFI_CONVENTIONAL_MEMORY)
+				continue;
+
+			start = md->phys_addr;
+			end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
+
+			if (m_start <= end && m_end >= start)
+				/* fake range overlaps descriptor */;
+			else
+				continue;
+
+			/*
+			 * Trim the boundary of the e820 update to the
+			 * descriptor in case the fake range overlaps
+			 * !EFI_CONVENTIONAL_MEMORY
+			 */
+			start = max(start, m_start);
+			end = min(end, m_end);
+
+			if (end <= start)
+				continue;
+			e820__range_update(start, end - start + 1, E820_TYPE_RAM,
+					E820_TYPE_SOFT_RESERVED);
+			e820__update_table(e820_table);
+		}
+	}
+}
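
The overlap test and boundary trim in efi_fake_memmap_early() can be
looked at in isolation. The following userspace sketch uses inclusive
end addresses like the patch does; struct sketch_range and
clip_to_descriptor() are hypothetical names for illustration. Unlike
the patch, the sketch does not skip a one-byte overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range: both start and end are part of it. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/*
 * If @fake overlaps @desc, store the overlapping part in @out and
 * return true. Otherwise leave @out untouched and return false.
 */
bool clip_to_descriptor(struct sketch_range fake, struct sketch_range desc,
			struct sketch_range *out)
{
	if (fake.start > desc.end || fake.end < desc.start)
		return false;
	out->start = fake.start > desc.start ? fake.start : desc.start;
	out->end = fake.end < desc.end ? fake.end : desc.end;
	return true;
}
```

The length handed to an e820-style update would then be
out.end - out.start + 1, since the end address is inclusive.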



* [PATCH v5 06/10] lib: Uplevel the pmem "region" ida to a global allocator
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (4 preceding siblings ...)
  2019-08-30  1:52 ` [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-08-30  1:52 ` [PATCH v5 07/10] dax: Fix alloc_dax_region() compile warning Dan Williams
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Keith Busch, peterz, vishal.l.verma, dave.hansen, ard.biesheuvel,
	linux-kernel, linux-efi, x86

In preparation for handling platform differentiated memory types beyond
persistent memory, uplevel the "region" identifier to a global number
space. This enables a device-dax instance to be registered to any memory
type with guaranteed unique names.
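
To make the "global number space" idea concrete, here is a toy
userspace sketch of a shared ID allocator. It is not the kernel IDA:
the 64-entry bitmap, the lowest-free-ID policy, and the sketch_*
names are assumptions made for brevity. The point is only that every
caller draws from one pool, so two subsystems can never hand out the
same region number.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Toy global ID space: 64 IDs tracked in one bitmap. The limit is an
 * assumption of this sketch, not a property of the kernel allocator.
 */
static uint64_t sketch_ids;

/* Return the lowest unused ID, or -ENOSPC when all 64 are taken. */
int sketch_region_alloc(void)
{
	for (int id = 0; id < 64; id++) {
		if (!(sketch_ids & (1ULL << id))) {
			sketch_ids |= 1ULL << id;
			return id;
		}
	}
	return -ENOSPC;
}

/* Release an ID so that a later allocation can reuse it. */
void sketch_region_free(int id)
{
	if (id >= 0 && id < 64)
		sketch_ids &= ~(1ULL << id);
}
```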

Cc: Keith Busch <keith.busch@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/nvdimm/Kconfig       |    1 +
 drivers/nvdimm/core.c        |    1 -
 drivers/nvdimm/nd-core.h     |    1 -
 drivers/nvdimm/region_devs.c |   13 ++++---------
 include/linux/memregion.h    |   19 +++++++++++++++++++
 lib/Kconfig                  |    3 +++
 lib/Makefile                 |    1 +
 lib/memregion.c              |   18 ++++++++++++++++++
 8 files changed, 46 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/memregion.h
 create mode 100644 lib/memregion.c

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index a5fde15e91d3..f6e7bcb3f9a5 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -4,6 +4,7 @@ menuconfig LIBNVDIMM
 	depends on PHYS_ADDR_T_64BIT
 	depends on HAS_IOMEM
 	depends on BLK_DEV
+	select MEMREGION
 	help
 	  Generic support for non-volatile memory devices including
 	  ACPI-6-NFIT defined resources.  On platforms that define an
diff --git a/drivers/nvdimm/core.c b/drivers/nvdimm/core.c
index 9204f1e9fd14..e592c4964674 100644
--- a/drivers/nvdimm/core.c
+++ b/drivers/nvdimm/core.c
@@ -455,7 +455,6 @@ static __exit void libnvdimm_exit(void)
 	nd_region_exit();
 	nvdimm_exit();
 	nvdimm_bus_exit();
-	nd_region_devs_exit();
 	nvdimm_devs_exit();
 }
 
diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h
index 0ac52b6eb00e..8408412fba1b 100644
--- a/drivers/nvdimm/nd-core.h
+++ b/drivers/nvdimm/nd-core.h
@@ -127,7 +127,6 @@ struct nvdimm_bus *walk_to_nvdimm_bus(struct device *nd_dev);
 int __init nvdimm_bus_init(void);
 void nvdimm_bus_exit(void);
 void nvdimm_devs_exit(void);
-void nd_region_devs_exit(void);
 void nd_region_probe_success(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 struct nd_region;
 void nd_region_create_ns_seed(struct nd_region *nd_region);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index af30cbe7a8ea..5a152a897c94 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -3,6 +3,7 @@
  * Copyright(c) 2013-2015 Intel Corporation. All rights reserved.
  */
 #include <linux/scatterlist.h>
+#include <linux/memregion.h>
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -19,7 +20,6 @@
  */
 #include <linux/io-64-nonatomic-hi-lo.h>
 
-static DEFINE_IDA(region_ida);
 static DEFINE_PER_CPU(int, flush_idx);
 
 static int nvdimm_map_flush(struct device *dev, struct nvdimm *nvdimm, int dimm,
@@ -133,7 +133,7 @@ static void nd_region_release(struct device *dev)
 		put_device(&nvdimm->dev);
 	}
 	free_percpu(nd_region->lane);
-	ida_simple_remove(&region_ida, nd_region->id);
+	memregion_free(nd_region->id);
 	if (is_nd_blk(dev))
 		kfree(to_nd_blk_region(dev));
 	else
@@ -1034,7 +1034,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 
 	if (!region_buf)
 		return NULL;
-	nd_region->id = ida_simple_get(&region_ida, 0, 0, GFP_KERNEL);
+	nd_region->id = memregion_alloc(GFP_KERNEL);
 	if (nd_region->id < 0)
 		goto err_id;
 
@@ -1093,7 +1093,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	return nd_region;
 
  err_percpu:
-	ida_simple_remove(&region_ida, nd_region->id);
+	memregion_free(nd_region->id);
  err_id:
 	kfree(region_buf);
 	return NULL;
@@ -1262,8 +1262,3 @@ int nd_region_conflict(struct nd_region *nd_region, resource_size_t start,
 
 	return device_for_each_child(&nvdimm_bus->dev, &ctx, region_conflict);
 }
-
-void __exit nd_region_devs_exit(void)
-{
-	ida_destroy(&region_ida);
-}
diff --git a/include/linux/memregion.h b/include/linux/memregion.h
new file mode 100644
index 000000000000..07ecde0bd136
--- /dev/null
+++ b/include/linux/memregion.h
@@ -0,0 +1,19 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _MEMREGION_H_
+#define _MEMREGION_H_
+#include <linux/types.h>
+#include <linux/errno.h>
+
+#ifdef CONFIG_MEMREGION
+int memregion_alloc(gfp_t gfp);
+void memregion_free(int id);
+#else
+static inline int memregion_alloc(gfp_t gfp)
+{
+	return -ENOMEM;
+}
+static inline void memregion_free(int id)
+{
+}
+#endif
+#endif /* _MEMREGION_H_ */
diff --git a/lib/Kconfig b/lib/Kconfig
index f33d66fc0e86..9fce7e15716a 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -607,6 +607,9 @@ config ARCH_NO_SG_CHAIN
 config ARCH_HAS_PMEM_API
 	bool
 
+config MEMREGION
+	bool
+
 # use memcpy to implement user copies for nommu architectures
 config UACCESS_MEMCPY
 	bool
diff --git a/lib/Makefile b/lib/Makefile
index c5892807e06f..2fb7b47018f1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -212,6 +212,7 @@ obj-$(CONFIG_GENERIC_NET_UTILS) += net_utils.o
 
 obj-$(CONFIG_SG_SPLIT) += sg_split.o
 obj-$(CONFIG_SG_POOL) += sg_pool.o
+obj-$(CONFIG_MEMREGION) += memregion.o
 obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
 obj-$(CONFIG_IRQ_POLL) += irq_poll.o
 
diff --git a/lib/memregion.c b/lib/memregion.c
new file mode 100644
index 000000000000..77c85b5251da
--- /dev/null
+++ b/lib/memregion.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* identifiers for device / performance-differentiated memory regions */
+#include <linux/idr.h>
+#include <linux/types.h>
+
+static DEFINE_IDA(memregion_ids);
+
+int memregion_alloc(gfp_t gfp)
+{
+	return ida_alloc(&memregion_ids, gfp);
+}
+EXPORT_SYMBOL(memregion_alloc);
+
+void memregion_free(int id)
+{
+	ida_free(&memregion_ids, id);
+}
+EXPORT_SYMBOL(memregion_free);



* [PATCH v5 07/10] dax: Fix alloc_dax_region() compile warning
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (5 preceding siblings ...)
  2019-08-30  1:52 ` [PATCH v5 06/10] lib: Uplevel the pmem "region" ida to a global allocator Dan Williams
@ 2019-08-30  1:52 ` Dan Williams
  2019-08-30  1:53 ` [PATCH v5 08/10] device-dax: Add a driver for "hmem" devices Dan Williams
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:52 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: kbuild test robot, peterz, vishal.l.verma, dave.hansen,
	ard.biesheuvel, linux-kernel, linux-efi, x86

PFN flags are (unsigned long long), fix the alloc_dax_region() calling
convention to fix warnings of the form:

>> include/linux/pfn_t.h:18:17: warning: large integer implicitly truncated to unsigned type [-Woverflow]
    #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
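
The truncation behind that warning can be reproduced in userspace. The
sketch below assumes the warning comes from builds where 'unsigned
long' is 32 bits wide and models that with uint32_t; SKETCH_PFN_DEV
and the pass_as_*() helpers are hypothetical names, with the macro
given the same shape as PFN_DEV.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as PFN_DEV: a flag in bit 61 of a 64-bit value. */
#define SKETCH_PFN_DEV (1ULL << (64 - 3))

/* Models a 32-bit 'unsigned long' parameter: high bits are lost. */
uint32_t pass_as_32bit(uint32_t flags)
{
	return flags;
}

/* Models the fixed 'unsigned long long' parameter. */
uint64_t pass_as_64bit(uint64_t flags)
{
	return flags;
}
```

Passing SKETCH_PFN_DEV through the 32-bit variant yields 0, so the flag
silently disappears, while the 64-bit variant preserves it.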

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/bus.c         |    2 +-
 drivers/dax/bus.h         |    2 +-
 drivers/dax/dax-private.h |    2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 8fafbeab510a..eccdda1f7b71 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -227,7 +227,7 @@ static void dax_region_unregister(void *region)
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct resource *res, int target_node, unsigned int align,
-		unsigned long pfn_flags)
+		unsigned long long pfn_flags)
 {
 	struct dax_region *dax_region;
 
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 8619e3299943..9e4eba67e8b9 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -11,7 +11,7 @@ struct dax_region;
 void dax_region_put(struct dax_region *dax_region);
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct resource *res, int target_node, unsigned int align,
-		unsigned long flags);
+		unsigned long long flags);
 
 enum dev_dax_subsys {
 	DEV_DAX_BUS,
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index 6ccca3b890d6..3107ce80e809 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -32,7 +32,7 @@ struct dax_region {
 	struct device *dev;
 	unsigned int align;
 	struct resource res;
-	unsigned long pfn_flags;
+	unsigned long long pfn_flags;
 };
 
 /**



* [PATCH v5 08/10] device-dax: Add a driver for "hmem" devices
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (6 preceding siblings ...)
  2019-08-30  1:52 ` [PATCH v5 07/10] dax: Fix alloc_dax_region() compile warning Dan Williams
@ 2019-08-30  1:53 ` Dan Williams
  2019-08-30  1:53 ` [PATCH v5 09/10] acpi/numa/hmat: Register HMAT at device_initcall level Dan Williams
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:53 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Vishal Verma, Keith Busch, Dave Jiang, kbuild test robot,
	Dave Hansen, peterz, ard.biesheuvel, linux-kernel, linux-efi,
	x86

Platform firmware like EFI/ACPI may publish "hmem" platform devices.
Such a device is a performance differentiated memory range likely
reserved for an application specific use case. The driver gives access
to 100% of the capacity via a device-dax mmap instance by default.
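
As a rough illustration of the default access path, a userspace consumer
might map the device along these lines. This is only a sketch under
assumptions: the 2MiB alignment is taken from the "align" value reported
by daxctl for this test setup, the /dev/dax0.0 path in the usage note is
inferred from the "dax0.0" chardev name, and dax_align_up() / dax_map()
are hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumed region alignment (2MiB); a real tool should query it. */
#define DAX_ALIGN (2UL << 20)

/* Round a length up to the next multiple of DAX_ALIGN. */
static size_t dax_align_up(size_t len)
{
	return (len + DAX_ALIGN - 1) & ~(DAX_ALIGN - 1);
}

/*
 * Map 'len' bytes (rounded up to DAX_ALIGN) of a device-dax character
 * device as a shared mapping. Returns NULL if open() or mmap() fails.
 */
static void *dax_map(const char *path, size_t len)
{
	void *addr;
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0)
		return NULL;

	addr = mmap(NULL, dax_align_up(len), PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
	close(fd);	/* the mapping stays valid after close() */

	return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would use something like dax_map("/dev/dax0.0", size) and then
read or write the returned pointer directly, releasing it with munmap()
over the same rounded-up length.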

However, if over-subscription and other kernel memory management is
desired the resulting dax device can be assigned to the core-mm via the
kmem driver.

This driver consumes "hmem" devices. The producer of "hmem" devices is
saved for a follow-on patch so that it can reference the new
CONFIG_DEV_DAX_HMEM symbol to gate the enumeration work.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/dax/Kconfig       |   27 +++++++++++++++++----
 drivers/dax/Makefile      |    2 ++
 drivers/dax/hmem.c        |   57 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/memregion.h |    4 +++
 4 files changed, 85 insertions(+), 5 deletions(-)
 create mode 100644 drivers/dax/hmem.c

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index f33c73e4af41..3b6c06f07326 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -32,19 +32,36 @@ config DEV_DAX_PMEM
 
 	  Say M if unsure
 
+config DEV_DAX_HMEM
+	tristate "HMEM DAX: direct access to 'specific purpose' memory"
+	depends on EFI_SOFT_RESERVE
+	default DEV_DAX
+	help
+	  EFI 2.8 platforms, and others, may advertise 'specific purpose'
+	  memory. For example, a high bandwidth memory pool. The
+	  indication from platform firmware is meant to reserve the
+	  memory from typical usage by default. This driver creates
+	  device-dax instances for these memory ranges, and that also
+	  enables the possibility to assign them to the DEV_DAX_KMEM
+	  driver to override the reservation and add them to the kernel's
+	  "System RAM" pool.
+
+	  Say M if unsure.
+
 config DEV_DAX_KMEM
 	tristate "KMEM DAX: volatile-use of persistent memory"
 	default DEV_DAX
 	depends on DEV_DAX
 	depends on MEMORY_HOTPLUG # for add_memory() and friends
 	help
-	  Support access to persistent memory as if it were RAM.  This
-	  allows easier use of persistent memory by unmodified
-	  applications.
+	  Support access to persistent, or other performance
+	  differentiated memory as if it were System RAM. This allows
+	  easier use of persistent memory by unmodified applications, or
+	  adds core kernel memory services to heterogeneous memory types
+	  (HMEM) marked "reserved" by platform firmware.
 
 	  To use this feature, a DAX device must be unbound from the
-	  device_dax driver (PMEM DAX) and bound to this kmem driver
-	  on each boot.
+	  device_dax driver and bound to this kmem driver on each boot.
 
 	  Say N if unsure.
 
diff --git a/drivers/dax/Makefile b/drivers/dax/Makefile
index 81f7d54dadfb..80065b38b3c4 100644
--- a/drivers/dax/Makefile
+++ b/drivers/dax/Makefile
@@ -2,9 +2,11 @@
 obj-$(CONFIG_DAX) += dax.o
 obj-$(CONFIG_DEV_DAX) += device_dax.o
 obj-$(CONFIG_DEV_DAX_KMEM) += kmem.o
+obj-$(CONFIG_DEV_DAX_HMEM) += dax_hmem.o
 
 dax-y := super.o
 dax-y += bus.o
 device_dax-y := device.o
+dax_hmem-y := hmem.o
 
 obj-y += pmem/
diff --git a/drivers/dax/hmem.c b/drivers/dax/hmem.c
new file mode 100644
index 000000000000..adbc548a0aef
--- /dev/null
+++ b/drivers/dax/hmem.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/platform_device.h>
+#include <linux/memregion.h>
+#include <linux/module.h>
+#include <linux/pfn_t.h>
+#include "bus.h"
+
+static int dax_hmem_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct dev_pagemap pgmap = { };
+	struct dax_region *dax_region;
+	struct memregion_info *mri;
+	struct dev_dax *dev_dax;
+	struct resource *res;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res)
+		return -ENOMEM;
+
+	mri = dev->platform_data;
+	pgmap.dev = dev;
+	memcpy(&pgmap.res, res, sizeof(*res));
+
+	dax_region = alloc_dax_region(dev, pdev->id, res, mri->target_node,
+			PMD_SIZE, PFN_DEV|PFN_MAP);
+	if (!dax_region)
+		return -ENOMEM;
+
+	dev_dax = devm_create_dev_dax(dax_region, 0, &pgmap);
+	if (IS_ERR(dev_dax))
+		return PTR_ERR(dev_dax);
+
+	/* child dev_dax instances now own the lifetime of the dax_region */
+	dax_region_put(dax_region);
+	return 0;
+}
+
+static int dax_hmem_remove(struct platform_device *pdev)
+{
+	/* devm handles teardown */
+	return 0;
+}
+
+static struct platform_driver dax_hmem_driver = {
+	.probe = dax_hmem_probe,
+	.remove = dax_hmem_remove,
+	.driver = {
+		.name = "hmem",
+	},
+};
+
+module_platform_driver(dax_hmem_driver);
+
+MODULE_ALIAS("platform:hmem*");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Intel Corporation");
diff --git a/include/linux/memregion.h b/include/linux/memregion.h
index 07ecde0bd136..3d679eda66cc 100644
--- a/include/linux/memregion.h
+++ b/include/linux/memregion.h
@@ -4,6 +4,10 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 
+struct memregion_info {
+	int target_node;
+};
+
 #ifdef CONFIG_MEMREGION
 int memregion_alloc(gfp_t gfp);
 void memregion_free(int id);



* [PATCH v5 09/10] acpi/numa/hmat: Register HMAT at device_initcall level
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (7 preceding siblings ...)
  2019-08-30  1:53 ` [PATCH v5 08/10] device-dax: Add a driver for "hmem" devices Dan Williams
@ 2019-08-30  1:53 ` Dan Williams
  2019-08-30  1:53 ` [PATCH v5 10/10] acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device Dan Williams
  2019-09-02 11:09 ` [PATCH v5 00/10] EFI Specific Purpose Memory Support Rafael J. Wysocki
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:53 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Rafael J. Wysocki, Len Brown, Keith Busch, Jonathan Cameron,
	Dave Hansen, peterz, vishal.l.verma, ard.biesheuvel,
	linux-kernel, linux-efi, x86

In preparation for registering device-dax instances for accessing EFI
specific-purpose memory, arrange for the HMAT registration to occur
later in the init process. Critically, HMAT initialization needs to occur
after e820__reserve_resources_late(), which is the point at which the
iomem resource tree is populated with "Soft Reserved"
(IORES_DESC_SOFT_RESERVED) entries. e820__reserve_resources_late()
happens at subsys_initcall time.
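
The ordering argument can be modelled in userspace as follows. This is a
sketch, not kernel code: the level numbers mirror include/linux/init.h
(subsys_initcall is level 4, device_initcall is level 6), while
struct initcall, run_initcalls() and the log string are hypothetical
names made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Initcall levels as numbered in include/linux/init.h; lower runs first. */
enum { LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6 };

struct initcall {
	int level;
	const char *name;
};

static int cmp_level(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	return x->level - y->level;
}

/*
 * Order the calls by level and record the resulting sequence of names
 * in 'log', separated by single spaces.
 */
static void run_initcalls(struct initcall *calls, size_t n,
			  char *log, size_t len)
{
	size_t i;

	assert(len > 0);
	qsort(calls, n, sizeof(*calls), cmp_level);

	log[0] = '\0';
	for (i = 0; i < n; i++) {
		strncat(log, calls[i].name, len - strlen(log) - 1);
		if (i + 1 < n)
			strncat(log, " ", len - strlen(log) - 1);
	}
}
```

With hmat_init registered at LEVEL_DEVICE and a stand-in for the
initcall that invokes e820__reserve_resources_late() at LEVEL_SUBSYS,
the recorded order places the resource-tree population first regardless
of registration order.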

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/numa/hmat.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 8f9a28a870b0..4707eb9dd07b 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -748,4 +748,4 @@ static __init int hmat_init(void)
 	acpi_put_table(tbl);
 	return 0;
 }
-subsys_initcall(hmat_init);
+device_initcall(hmat_init);



* [PATCH v5 10/10] acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (8 preceding siblings ...)
  2019-08-30  1:53 ` [PATCH v5 09/10] acpi/numa/hmat: Register HMAT at device_initcall level Dan Williams
@ 2019-08-30  1:53 ` Dan Williams
  2019-09-02 11:09 ` [PATCH v5 00/10] EFI Specific Purpose Memory Support Rafael J. Wysocki
  10 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-08-30  1:53 UTC (permalink / raw)
  To: tglx, rafael.j.wysocki
  Cc: Len Brown, Keith Busch, Rafael J. Wysocki, Vishal Verma,
	Jonathan Cameron, Dave Hansen, peterz, ard.biesheuvel,
	linux-kernel, linux-efi, x86

Memory that has been tagged EFI_MEMORY_SP and has performance
properties described by the ACPI HMAT is expected to have an
application-specific consumer.

Those consumers may want 100% of the memory capacity to be reserved from
any usage by the kernel. By default, with this enabling, a platform
device is created to represent this differentiated resource.

The device-dax "hmem" driver claims these devices by default and
provides an mmap interface for the target application.  If the
administrator prefers, the hmem resource range can be made available to
the core-mm via the device-dax hotplug facility, kmem, to online the
memory with its own numa node.

This was tested with an emulated HMAT produced by qemu (with the pending
HMAT enabling patches), and "efi_fake_mem=8G@9G:0x40000" on the kernel
command line to mark the memory ranges associated with node2 and node3
as EFI_MEMORY_SP.
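
For reference, the efi_fake_mem= value above decodes as size@start:attr.
The sketch below is a userspace approximation of that decoding, not the
kernel's parser: parse_size() imitates the K/M/G suffix handling of
memparse(), and struct fake_mem / parse_fake_mem() are hypothetical
names. The EFI_MEMORY_SP value is the attribute bit from the UEFI 2.8
specification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_MEMORY_SP 0x40000ULL	/* "specific purpose" attribute bit */

struct fake_mem {
	uint64_t start;
	uint64_t size;
	uint64_t attr;
};

/* Parse a number with an optional K/M/G suffix; *end is left after it. */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Decode "size@start:attr"; returns 0 on success, -1 on malformed input. */
static int parse_fake_mem(const char *spec, struct fake_mem *out)
{
	char *p;

	assert(spec != NULL && out != NULL);

	out->size = parse_size(spec, &p);
	if (*p != '@')
		return -1;
	out->start = parse_size(p + 1, &p);
	if (*p != ':')
		return -1;
	out->attr = strtoull(p + 1, &p, 0);
	return *p == '\0' ? 0 : -1;
}
```

Decoding "8G@9G:0x40000" this way yields start 0x240000000 and an
inclusive end of 0x43fffffff, which is the "Soft Reserved" range shown
in the /proc/iomem output in the results.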

qemu numa configuration options:

-numa node,mem=4G,cpus=0-19,nodeid=0
-numa node,mem=4G,cpus=20-39,nodeid=1
-numa node,mem=4G,nodeid=2
-numa node,mem=4G,nodeid=3
-numa dist,src=0,dst=0,val=10
-numa dist,src=0,dst=1,val=21
-numa dist,src=0,dst=2,val=21
-numa dist,src=0,dst=3,val=21
-numa dist,src=1,dst=0,val=21
-numa dist,src=1,dst=1,val=10
-numa dist,src=1,dst=2,val=21
-numa dist,src=1,dst=3,val=21
-numa dist,src=2,dst=0,val=21
-numa dist,src=2,dst=1,val=21
-numa dist,src=2,dst=2,val=10
-numa dist,src=2,dst=3,val=21
-numa dist,src=3,dst=0,val=21
-numa dist,src=3,dst=1,val=21
-numa dist,src=3,dst=2,val=21
-numa dist,src=3,dst=3,val=10
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5
-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10
-numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10
-numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15
-numa hmat-lb,initiator=0,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15
-numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20
-numa hmat-lb,initiator=0,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-latency,base-lat=10,latency=10
-numa hmat-lb,initiator=1,target=0,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=10
-numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-latency,base-lat=10,latency=5
-numa hmat-lb,initiator=1,target=1,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=5
-numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-latency,base-lat=10,latency=15
-numa hmat-lb,initiator=1,target=2,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=15
-numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-latency,base-lat=10,latency=20
-numa hmat-lb,initiator=1,target=3,hierarchy=memory,data-type=access-bandwidth,base-bw=20,bandwidth=20

Result:

# daxctl list -RDu
[
  {
    "path":"\/platform\/hmem.1",
    "id":1,
    "size":"4.00 GiB (4.29 GB)",
    "align":2097152,
    "devices":[
      {
        "chardev":"dax1.0",
        "size":"4.00 GiB (4.29 GB)"
      }
    ]
  },
  {
    "path":"\/platform\/hmem.0",
    "id":0,
    "size":"4.00 GiB (4.29 GB)",
    "align":2097152,
    "devices":[
      {
        "chardev":"dax0.0",
        "size":"4.00 GiB (4.29 GB)"
      }
    ]
  }
]

# cat /proc/iomem
[..]
240000000-43fffffff : Soft Reserved
  240000000-33fffffff : hmem.0
    240000000-33fffffff : dax0.0
  340000000-43fffffff : hmem.1
    340000000-43fffffff : dax1.0

Cc: Len Brown <lenb@kernel.org>
Cc: Keith Busch <keith.busch@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/acpi/numa/Kconfig |    1 
 drivers/acpi/numa/hmat.c  |  136 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 125 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/numa/Kconfig b/drivers/acpi/numa/Kconfig
index d14582387ed0..c1be746e111a 100644
--- a/drivers/acpi/numa/Kconfig
+++ b/drivers/acpi/numa/Kconfig
@@ -8,6 +8,7 @@ config ACPI_HMAT
 	bool "ACPI Heterogeneous Memory Attribute Table Support"
 	depends on ACPI_NUMA
 	select HMEM_REPORTING
+	select MEMREGION
 	help
 	 If set, this option has the kernel parse and report the
 	 platform's ACPI HMAT (Heterogeneous Memory Attributes Table),
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 4707eb9dd07b..eaa5a0f93dec 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -8,12 +8,18 @@
  * the applicable attributes with the node's interfaces.
  */
 
+#define pr_fmt(fmt) "acpi/hmat: " fmt
+#define dev_fmt(fmt) "acpi/hmat: " fmt
+
 #include <linux/acpi.h>
 #include <linux/bitops.h>
 #include <linux/device.h>
 #include <linux/init.h>
 #include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/platform_device.h>
 #include <linux/list_sort.h>
+#include <linux/memregion.h>
 #include <linux/memory.h>
 #include <linux/mutex.h>
 #include <linux/node.h>
@@ -49,6 +55,7 @@ struct memory_target {
 	struct list_head node;
 	unsigned int memory_pxm;
 	unsigned int processor_pxm;
+	struct resource memregions;
 	struct node_hmem_attrs hmem_attrs;
 	struct list_head caches;
 	struct node_cache_attrs cache_attrs;
@@ -104,22 +111,36 @@ static __init void alloc_memory_initiator(unsigned int cpu_pxm)
 	list_add_tail(&initiator->node, &initiators);
 }
 
-static __init void alloc_memory_target(unsigned int mem_pxm)
+static __init void alloc_memory_target(unsigned int mem_pxm,
+		resource_size_t start, resource_size_t len)
 {
 	struct memory_target *target;
 
 	target = find_mem_target(mem_pxm);
-	if (target)
-		return;
-
-	target = kzalloc(sizeof(*target), GFP_KERNEL);
-	if (!target)
-		return;
+	if (!target) {
+		target = kzalloc(sizeof(*target), GFP_KERNEL);
+		if (!target)
+			return;
+		target->memory_pxm = mem_pxm;
+		target->processor_pxm = PXM_INVAL;
+		target->memregions = (struct resource) {
+			.name	= "ACPI mem",
+			.start	= 0,
+			.end	= -1,
+			.flags	= IORESOURCE_MEM,
+		};
+		list_add_tail(&target->node, &targets);
+		INIT_LIST_HEAD(&target->caches);
+	}
 
-	target->memory_pxm = mem_pxm;
-	target->processor_pxm = PXM_INVAL;
-	list_add_tail(&target->node, &targets);
-	INIT_LIST_HEAD(&target->caches);
+	/*
+	 * There are potentially multiple ranges per PXM, so record each
+	 * in the per-target memregions resource tree.
+	 */
+	if (!__request_region(&target->memregions, start, len, "memory target",
+				IORESOURCE_MEM))
+		pr_warn("failed to reserve %#llx - %#llx in pxm: %d\n",
+				start, start + len, mem_pxm);
 }
 
 static __init const char *hmat_data_type(u8 type)
@@ -452,7 +473,7 @@ static __init int srat_parse_mem_affinity(union acpi_subtable_headers *header,
 		return -EINVAL;
 	if (!(ma->flags & ACPI_SRAT_MEM_ENABLED))
 		return 0;
-	alloc_memory_target(ma->proximity_domain);
+	alloc_memory_target(ma->proximity_domain, ma->base_address, ma->length);
 	return 0;
 }
 
@@ -613,10 +634,91 @@ static void hmat_register_target_perf(struct memory_target *target)
 	node_set_perf_attrs(mem_nid, &target->hmem_attrs, 0);
 }
 
+static void hmat_register_target_device(struct memory_target *target,
+		struct resource *r)
+{
+	/* define a clean / non-busy resource for the platform device */
+	struct resource res = {
+		.start = r->start,
+		.end = r->end,
+		.flags = IORESOURCE_MEM,
+	};
+	struct platform_device *pdev;
+	struct memregion_info info;
+	int rc, id;
+
+	rc = region_intersects(res.start, resource_size(&res), IORESOURCE_MEM,
+			IORES_DESC_SOFT_RESERVED);
+	if (rc != REGION_INTERSECTS)
+		return;
+
+	id = memregion_alloc(GFP_KERNEL);
+	if (id < 0) {
+		pr_err("memregion allocation failure for %pr\n", &res);
+		return;
+	}
+
+	pdev = platform_device_alloc("hmem", id);
+	if (!pdev) {
+		pr_err("hmem device allocation failure for %pr\n", &res);
+		goto out_pdev;
+	}
+
+	pdev->dev.numa_node = acpi_map_pxm_to_online_node(target->memory_pxm);
+	info = (struct memregion_info) {
+		.target_node = acpi_map_pxm_to_node(target->memory_pxm),
+	};
+	rc = platform_device_add_data(pdev, &info, sizeof(info));
+	if (rc < 0) {
+		pr_err("hmem memregion_info allocation failure for %pr\n", &res);
+		goto out_resource;
+	}
+
+	rc = platform_device_add_resources(pdev, &res, 1);
+	if (rc < 0) {
+		pr_err("hmem resource allocation failure for %pr\n", &res);
+		goto out_resource;
+	}
+
+	rc = platform_device_add(pdev);
+	if (rc < 0) {
+		dev_err(&pdev->dev, "device add failed for %pr\n", &res);
+		goto out_resource;
+	}
+
+	return;
+
+out_resource:
+	put_device(&pdev->dev);
+out_pdev:
+	memregion_free(id);
+}
+
+static __init void hmat_register_target_devices(struct memory_target *target)
+{
+	struct resource *res;
+
+	/*
+	 * Do not bother creating devices if no driver is available to
+	 * consume them.
+	 */
+	if (!IS_ENABLED(CONFIG_DEV_DAX_HMEM))
+		return;
+
+	for (res = target->memregions.child; res; res = res->sibling)
+		hmat_register_target_device(target, res);
+}
+
 static void hmat_register_target(struct memory_target *target)
 {
 	int nid = pxm_to_node(target->memory_pxm);
 
+	/*
+	 * Devices may belong to either an offline or online
+	 * node, so unconditionally add them.
+	 */
+	hmat_register_target_devices(target);
+
 	/*
 	 * Skip offline nodes. This can happen when memory
 	 * marked EFI_MEMORY_SP, "specific purpose", is applied
@@ -677,11 +779,21 @@ static __init void hmat_free_structures(void)
 	struct target_cache *tcache, *cnext;
 
 	list_for_each_entry_safe(target, tnext, &targets, node) {
+		struct resource *res, *res_next;
+
 		list_for_each_entry_safe(tcache, cnext, &target->caches, node) {
 			list_del(&tcache->node);
 			kfree(tcache);
 		}
+
 		list_del(&target->node);
+		res = target->memregions.child;
+		while (res) {
+			res_next = res->sibling;
+			__release_region(&target->memregions, res->start,
+					resource_size(res));
+			res = res_next;
+		}
 		kfree(target);
 	}
 



* Re: [PATCH v5 00/10] EFI Specific Purpose Memory Support
  2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
                   ` (9 preceding siblings ...)
  2019-08-30  1:53 ` [PATCH v5 10/10] acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device Dan Williams
@ 2019-09-02 11:09 ` Rafael J. Wysocki
  2019-09-04 23:06   ` Dan Williams
  10 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2019-09-02 11:09 UTC (permalink / raw)
  To: Dan Williams
  Cc: tglx, rafael.j.wysocki, Dave Jiang, Jonathan Cameron,
	Keith Busch, kbuild test robot, Andy Shevchenko, Borislav Petkov,
	Vishal Verma, H. Peter Anvin, x86, Dave Hansen, Ingo Molnar,
	Len Brown, Peter Zijlstra, Ard Biesheuvel, Andy Lutomirski,
	Darren Hart, linux-kernel, linux-efi

On Friday, August 30, 2019 3:52:18 AM CEST Dan Williams wrote:
> Changes since v4 [1]:
> - Rename the facility from "Application Reserved" to "Soft Reserved" to
>   better reflect how the memory is treated. While the spec talks about
>   "specific / application purpose" memory the expected kernel behavior is
>   to make a best effort at reserving the memory from general purpose
>   allocations.
> 
> - Add a new efi=nosoftreserve option to disable consideration of the
>   EFI_MEMORY_SP attribute at boot time. This is also motivated by
>   Christoph's initial feedback of allowing the kernel to opt-out of the
>   policy whims of the platform BIOS implementation.
> 
> - Update the KASLR implementation to exclude soft-reserved memory
>   including the case where soft-reserved memory is specified via the
>   efi_fake_mem= attribute-override command-line option.
> 
> - Move the memregion allocator to its own object file. v4 had it in
>   kernel/resource.c which caused compile errors on Sparc. I otherwise
>   could not find an appropriate place to stash it.
> 
> - Rebase on a merge of tip/master and rafael/linux-next since the series
>   collides with changes in both those trees.
> 
> [1]: https://lore.kernel.org/r/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> 
> ---
> 
> Thomas, Rafael,
> 
> This happens to collide with both your trees. I think the content
> warrants going through the x86 tree, but would need to publish commit:
> 
> 5c7ed4385424 HMAT: Skip publishing target info for nodes with no online memory
> 
> ...in Rafael's tree as a stable id for -tip to pull in, but I'm also
> open to other options. I've retained Dave's reviewed-by from v4.
> 
> ---
> 
> The EFI 2.8 Specification [2] introduces the EFI_MEMORY_SP ("specific
> purpose") memory attribute. This attribute bit replaces the deprecated
> ACPI HMAT "reservation hint" that was introduced in ACPI 6.2 and removed
> in ACPI 6.3.
> 
> Given the increasing diversity of memory types that might be advertised
> to the operating system, there is a need for platform firmware to hint
> which memory ranges are free for the OS to use as general purpose memory
> and which ranges are intended for application specific usage. For
> example, an application with prior knowledge of the platform may expect
> to be able to exclusively allocate a precious / limited pool of high
> bandwidth memory. Alternatively, for the general purpose case, the
> operating system may want to make the memory available on a best effort
> basis as a unique numa-node with performance properties by the new
> CONFIG_HMEM_REPORTING [3] facility.
> 
> In support of optionally allowing either application-exclusive and
> core-kernel-mm managed access to differentiated memory, claim
> EFI_MEMORY_SP ranges for exposure as "soft reserved" and assigned to a
> device-dax instance by default. Such instances can be directly owned /
> mapped by a platform-topology-aware application. Alternatively, with the
> new kmem facility [4], the administrator has the option to instead
> designate that those memory ranges be hot-added to the core-kernel-mm as
> a unique memory numa-node. In short, allow for the decision about what
> software agent manages soft-reserved memory to be made at runtime.
> 
> The patches build on the new HMAT+HMEM_REPORTING facilities merged
> for v5.2-rc1. The implementation is tested with qemu emulation of HMAT
> [5] plus the efi_fake_mem facility for applying the EFI_MEMORY_SP
> attribute. Specific details on reproducing the test configuration are in
> patch 10.
> 
> [2]: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1cf33aafb84
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308f
> [5]: http://patchwork.ozlabs.org/cover/1096737/
> 
> ---
> 
> Dan Williams (10):
>       acpi/numa: Establish a new drivers/acpi/numa/ directory
>       efi: Enumerate EFI_MEMORY_SP
>       x86, efi: Push EFI_MEMMAP check into leaf routines
>       x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
>       x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
>       lib: Uplevel the pmem "region" ida to a global allocator
>       dax: Fix alloc_dax_region() compile warning
>       device-dax: Add a driver for "hmem" devices
>       acpi/numa/hmat: Register HMAT at device_initcall level
>       acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device
> 
> 
>  Documentation/admin-guide/kernel-parameters.txt |   19 +++
>  arch/x86/Kconfig                                |   21 ++++
>  arch/x86/boot/compressed/eboot.c                |    7 +
>  arch/x86/boot/compressed/kaslr.c                |   50 +++++++-
>  arch/x86/include/asm/e820/types.h               |    8 +
>  arch/x86/include/asm/efi-stub.h                 |   11 ++
>  arch/x86/include/asm/efi.h                      |   17 +++
>  arch/x86/kernel/e820.c                          |   12 ++
>  arch/x86/kernel/setup.c                         |   19 ++-
>  arch/x86/platform/efi/efi.c                     |   56 +++++++++
>  arch/x86/platform/efi/quirks.c                  |    3 +
>  drivers/acpi/Kconfig                            |    9 --
>  drivers/acpi/Makefile                           |    3 -
>  drivers/acpi/hmat/Makefile                      |    2 
>  drivers/acpi/numa/Kconfig                       |    8 +
>  drivers/acpi/numa/Makefile                      |    3 +
>  drivers/acpi/numa/hmat.c                        |  138 +++++++++++++++++++++--
>  drivers/acpi/numa/srat.c                        |    0 
>  drivers/dax/Kconfig                             |   27 ++++-
>  drivers/dax/Makefile                            |    2 
>  drivers/dax/bus.c                               |    2 
>  drivers/dax/bus.h                               |    2 
>  drivers/dax/dax-private.h                       |    2 
>  drivers/dax/hmem.c                              |   57 ++++++++++
>  drivers/firmware/efi/Makefile                   |    5 +
>  drivers/firmware/efi/efi.c                      |    8 +
>  drivers/firmware/efi/esrt.c                     |    3 +
>  drivers/firmware/efi/fake_mem.c                 |   26 ++--
>  drivers/firmware/efi/fake_mem.h                 |   10 ++
>  drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 ++
>  drivers/firmware/efi/x86-fake_mem.c             |   69 ++++++++++++
>  drivers/nvdimm/Kconfig                          |    1 
>  drivers/nvdimm/core.c                           |    1 
>  drivers/nvdimm/nd-core.h                        |    1 
>  drivers/nvdimm/region_devs.c                    |   13 +-
>  include/linux/efi.h                             |    4 -
>  include/linux/ioport.h                          |    1 
>  include/linux/memregion.h                       |   23 ++++
>  lib/Kconfig                                     |    3 +
>  lib/Makefile                                    |    1 
>  lib/memregion.c                                 |   18 +++
>  41 files changed, 584 insertions(+), 93 deletions(-)
>  create mode 100644 arch/x86/include/asm/efi-stub.h
>  delete mode 100644 drivers/acpi/hmat/Makefile
>  rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (70%)
>  create mode 100644 drivers/acpi/numa/Makefile
>  rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (85%)
>  rename drivers/acpi/{numa.c => numa/srat.c} (100%)
>  create mode 100644 drivers/dax/hmem.c
>  create mode 100644 drivers/firmware/efi/fake_mem.h
>  create mode 100644 drivers/firmware/efi/x86-fake_mem.c
>  create mode 100644 include/linux/memregion.h
>  create mode 100644 lib/memregion.c
> 

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

for the ACPI-related changes in this series.






* Re: [PATCH v5 00/10] EFI Specific Purpose Memory Support
  2019-09-02 11:09 ` [PATCH v5 00/10] EFI Specific Purpose Memory Support Rafael J. Wysocki
@ 2019-09-04 23:06   ` Dan Williams
  2019-09-06 11:37     ` Rafael J. Wysocki
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2019-09-04 23:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, Rafael J. Wysocki, Dave Jiang, Jonathan Cameron,
	Keith Busch, kbuild test robot, Andy Shevchenko, Borislav Petkov,
	Vishal Verma, H. Peter Anvin, X86 ML, Dave Hansen, Ingo Molnar,
	Len Brown, Peter Zijlstra, Ard Biesheuvel, Andy Lutomirski,
	Darren Hart, Linux Kernel Mailing List, linux-efi

On Mon, Sep 2, 2019 at 4:09 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> On Friday, August 30, 2019 3:52:18 AM CEST Dan Williams wrote:
> > Changes since v4 [1]:
> > - Rename the facility from "Application Reserved" to "Soft Reserved" to
> >   better reflect how the memory is treated. While the spec talks about
> >   "specific / application purpose" memory the expected kernel behavior is
> >   to make a best effort at reserving the memory from general purpose
> >   allocations.
> >
> > - Add a new efi=nosoftreserve option to disable consideration of the
> >   EFI_MEMORY_SP attribute at boot time. This is also motivated by
> >   Christoph's initial feedback of allowing the kernel to opt-out of the
> >   policy whims of the platform BIOS implementation.
> >
> > - Update the KASLR implementation to exclude soft-reserved memory
> >   including the case where soft-reserved memory is specified via the
> >   efi_fake_mem= attribute-override command-line option.
> >
> > - Move the memregion allocator to its own object file. v4 had it in
> >   kernel/resource.c which caused compile errors on Sparc. I otherwise
> >   could not find an appropriate place to stash it.
> >
> > - Rebase on a merge of tip/master and rafael/linux-next since the series
> >   collides with changes in both those trees.
> >
> > [1]: https://lore.kernel.org/r/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> >
> > ---
> >
> > Thomas, Rafael,
> >
> > This happens to collide with both your trees. I think the content
> > warrants going through the x86 tree, but would need to publish commit:
> >
> > 5c7ed4385424 HMAT: Skip publishing target info for nodes with no online memory
> >
> > ...in Rafael's tree as a stable id for -tip to pull in, but I'm also
> > open to other options. I've retained Dave's reviewed-by from v4.
> >
> > ---
> >
> > The EFI 2.8 Specification [2] introduces the EFI_MEMORY_SP ("specific
> > purpose") memory attribute. This attribute bit replaces the deprecated
> > ACPI HMAT "reservation hint" that was introduced in ACPI 6.2 and removed
> > in ACPI 6.3.
> >
> > Given the increasing diversity of memory types that might be advertised
> > to the operating system, there is a need for platform firmware to hint
> > which memory ranges are free for the OS to use as general purpose memory
> > and which ranges are intended for application specific usage. For
> > example, an application with prior knowledge of the platform may expect
> > to be able to exclusively allocate a precious / limited pool of high
> > bandwidth memory. Alternatively, for the general purpose case, the
> > operating system may want to make the memory available on a best effort
> > basis as a unique numa-node with performance properties by the new
> > CONFIG_HMEM_REPORTING [3] facility.
> >
> > In support of optionally allowing either application-exclusive and
> > core-kernel-mm managed access to differentiated memory, claim
> > EFI_MEMORY_SP ranges for exposure as "soft reserved" and assigned to a
> > device-dax instance by default. Such instances can be directly owned /
> > mapped by a platform-topology-aware application. Alternatively, with the
> > new kmem facility [4], the administrator has the option to instead
> > designate that those memory ranges be hot-added to the core-kernel-mm as
> > a unique memory numa-node. In short, allow for the decision about what
> > software agent manages soft-reserved memory to be made at runtime.
> >
> > The patches build on the new HMAT+HMEM_REPORTING facilities merged
> > for v5.2-rc1. The implementation is tested with qemu emulation of HMAT
> > [5] plus the efi_fake_mem facility for applying the EFI_MEMORY_SP
> > attribute. Specific details on reproducing the test configuration are in
> > patch 10.
> >
> > [2]: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> > [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1cf33aafb84
> > [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308f
> > [5]: http://patchwork.ozlabs.org/cover/1096737/
> >
> > ---
> >
> > Dan Williams (10):
> >       acpi/numa: Establish a new drivers/acpi/numa/ directory
> >       efi: Enumerate EFI_MEMORY_SP
> >       x86, efi: Push EFI_MEMMAP check into leaf routines
> >       x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
> >       x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
> >       lib: Uplevel the pmem "region" ida to a global allocator
> >       dax: Fix alloc_dax_region() compile warning
> >       device-dax: Add a driver for "hmem" devices
> >       acpi/numa/hmat: Register HMAT at device_initcall level
> >       acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device
> >
> >
> >  Documentation/admin-guide/kernel-parameters.txt |   19 +++
> >  arch/x86/Kconfig                                |   21 ++++
> >  arch/x86/boot/compressed/eboot.c                |    7 +
> >  arch/x86/boot/compressed/kaslr.c                |   50 +++++++-
> >  arch/x86/include/asm/e820/types.h               |    8 +
> >  arch/x86/include/asm/efi-stub.h                 |   11 ++
> >  arch/x86/include/asm/efi.h                      |   17 +++
> >  arch/x86/kernel/e820.c                          |   12 ++
> >  arch/x86/kernel/setup.c                         |   19 ++-
> >  arch/x86/platform/efi/efi.c                     |   56 +++++++++
> >  arch/x86/platform/efi/quirks.c                  |    3 +
> >  drivers/acpi/Kconfig                            |    9 --
> >  drivers/acpi/Makefile                           |    3 -
> >  drivers/acpi/hmat/Makefile                      |    2
> >  drivers/acpi/numa/Kconfig                       |    8 +
> >  drivers/acpi/numa/Makefile                      |    3 +
> >  drivers/acpi/numa/hmat.c                        |  138 +++++++++++++++++++++--
> >  drivers/acpi/numa/srat.c                        |    0
> >  drivers/dax/Kconfig                             |   27 ++++-
> >  drivers/dax/Makefile                            |    2
> >  drivers/dax/bus.c                               |    2
> >  drivers/dax/bus.h                               |    2
> >  drivers/dax/dax-private.h                       |    2
> >  drivers/dax/hmem.c                              |   57 ++++++++++
> >  drivers/firmware/efi/Makefile                   |    5 +
> >  drivers/firmware/efi/efi.c                      |    8 +
> >  drivers/firmware/efi/esrt.c                     |    3 +
> >  drivers/firmware/efi/fake_mem.c                 |   26 ++--
> >  drivers/firmware/efi/fake_mem.h                 |   10 ++
> >  drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 ++
> >  drivers/firmware/efi/x86-fake_mem.c             |   69 ++++++++++++
> >  drivers/nvdimm/Kconfig                          |    1
> >  drivers/nvdimm/core.c                           |    1
> >  drivers/nvdimm/nd-core.h                        |    1
> >  drivers/nvdimm/region_devs.c                    |   13 +-
> >  include/linux/efi.h                             |    4 -
> >  include/linux/ioport.h                          |    1
> >  include/linux/memregion.h                       |   23 ++++
> >  lib/Kconfig                                     |    3 +
> >  lib/Makefile                                    |    1
> >  lib/memregion.c                                 |   18 +++
> >  41 files changed, 584 insertions(+), 93 deletions(-)
> >  create mode 100644 arch/x86/include/asm/efi-stub.h
> >  delete mode 100644 drivers/acpi/hmat/Makefile
> >  rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (70%)
> >  create mode 100644 drivers/acpi/numa/Makefile
> >  rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (85%)
> >  rename drivers/acpi/{numa.c => numa/srat.c} (100%)
> >  create mode 100644 drivers/dax/hmem.c
> >  create mode 100644 drivers/firmware/efi/fake_mem.h
> >  create mode 100644 drivers/firmware/efi/x86-fake_mem.c
> >  create mode 100644 include/linux/memregion.h
> >  create mode 100644 lib/memregion.c
> >
>
> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> for the ACPI-related changes in this series.

Thanks Rafael, is commit 5c7ed4385424 on a stable branch that Thomas
could merge, or Thomas, is this all too late for v5.4?


* Re: [PATCH v5 00/10] EFI Specific Purpose Memory Support
  2019-09-04 23:06   ` Dan Williams
@ 2019-09-06 11:37     ` Rafael J. Wysocki
  2019-10-03 15:43       ` Jonathan Cameron
  0 siblings, 1 reply; 28+ messages in thread
From: Rafael J. Wysocki @ 2019-09-06 11:37 UTC (permalink / raw)
  To: Dan Williams
  Cc: Rafael J. Wysocki, Thomas Gleixner, Dave Jiang, Jonathan Cameron,
	Keith Busch, kbuild test robot, Andy Shevchenko, Borislav Petkov,
	Vishal Verma, H. Peter Anvin, X86 ML, Dave Hansen, Ingo Molnar,
	Len Brown, Peter Zijlstra, Ard Biesheuvel, Andy Lutomirski,
	Darren Hart, Linux Kernel Mailing List, linux-efi

On 9/5/2019 1:06 AM, Dan Williams wrote:
> On Mon, Sep 2, 2019 at 4:09 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Friday, August 30, 2019 3:52:18 AM CEST Dan Williams wrote:
>>> Changes since v4 [1]:
>>> - Rename the facility from "Application Reserved" to "Soft Reserved" to
>>>    better reflect how the memory is treated. While the spec talks about
>>>    "specific / application purpose" memory the expected kernel behavior is
>>>    to make a best effort at reserving the memory from general purpose
>>>    allocations.
>>>
>>> - Add a new efi=nosoftreserve option to disable consideration of the
>>>    EFI_MEMORY_SP attribute at boot time. This is also motivated by
>>>    Christoph's initial feedback of allowing the kernel to opt-out of the
>>>    policy whims of the platform BIOS implementation.
>>>
>>> - Update the KASLR implementation to exclude soft-reserved memory
>>>    including the case where soft-reserved memory is specified via the
>>>    efi_fake_mem= attribute-override command-line option.
>>>
>>> - Move the memregion allocator to its own object file. v4 had it in
>>>    kernel/resource.c which caused compile errors on Sparc. I otherwise
>>>    could not find an appropriate place to stash it.
>>>
>>> - Rebase on a merge of tip/master and rafael/linux-next since the series
>>>    collides with changes in both those trees.
>>>
>>> [1]: https://lore.kernel.org/r/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
>>>
>>> ---
>>>
>>> Thomas, Rafael,
>>>
>>> This happens to collide with both your trees. I think the content
>>> warrants going through the x86 tree, but would need to publish commit:
>>>
>>> 5c7ed4385424 HMAT: Skip publishing target info for nodes with no online memory
>>>
>>> ...in Rafael's tree as a stable id for -tip to pull in, but I'm also
>>> open to other options. I've retained Dave's reviewed-by from v4.
>>>
>>> ---
>>>
>>> The EFI 2.8 Specification [2] introduces the EFI_MEMORY_SP ("specific
>>> purpose") memory attribute. This attribute bit replaces the deprecated
>>> ACPI HMAT "reservation hint" that was introduced in ACPI 6.2 and removed
>>> in ACPI 6.3.
>>>
>>> Given the increasing diversity of memory types that might be advertised
>>> to the operating system, there is a need for platform firmware to hint
>>> which memory ranges are free for the OS to use as general purpose memory
>>> and which ranges are intended for application specific usage. For
>>> example, an application with prior knowledge of the platform may expect
>>> to be able to exclusively allocate a precious / limited pool of high
>>> bandwidth memory. Alternatively, for the general purpose case, the
>>> operating system may want to make the memory available on a best effort
>>> basis as a unique numa-node with performance properties reported by the new
>>> CONFIG_HMEM_REPORTING [3] facility.
>>>
>>> In support of optionally allowing either application-exclusive or
>>> core-kernel-mm managed access to differentiated memory, claim
>>> EFI_MEMORY_SP ranges for exposure as "soft reserved" and assigned to a
>>> device-dax instance by default. Such instances can be directly owned /
>>> mapped by a platform-topology-aware application. Alternatively, with the
>>> new kmem facility [4], the administrator has the option to instead
>>> designate that those memory ranges be hot-added to the core-kernel-mm as
>>> a unique memory numa-node. In short, allow for the decision about what
>>> software agent manages soft-reserved memory to be made at runtime.
>>>
>>> The patches build on the new HMAT+HMEM_REPORTING facilities merged
>>> for v5.2-rc1. The implementation is tested with qemu emulation of HMAT
>>> [5] plus the efi_fake_mem facility for applying the EFI_MEMORY_SP
>>> attribute. Specific details on reproducing the test configuration are in
>>> patch 10.
>>>
>>> [2]: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
>>> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1cf33aafb84
>>> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308f
>>> [5]: http://patchwork.ozlabs.org/cover/1096737/
>>>
>>> ---
>>>
>>> Dan Williams (10):
>>>        acpi/numa: Establish a new drivers/acpi/numa/ directory
>>>        efi: Enumerate EFI_MEMORY_SP
>>>        x86, efi: Push EFI_MEMMAP check into leaf routines
>>>        x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
>>>        x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
>>>        lib: Uplevel the pmem "region" ida to a global allocator
>>>        dax: Fix alloc_dax_region() compile warning
>>>        device-dax: Add a driver for "hmem" devices
>>>        acpi/numa/hmat: Register HMAT at device_initcall level
>>>        acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device
>>>
>>>
>>>   Documentation/admin-guide/kernel-parameters.txt |   19 +++
>>>   arch/x86/Kconfig                                |   21 ++++
>>>   arch/x86/boot/compressed/eboot.c                |    7 +
>>>   arch/x86/boot/compressed/kaslr.c                |   50 +++++++-
>>>   arch/x86/include/asm/e820/types.h               |    8 +
>>>   arch/x86/include/asm/efi-stub.h                 |   11 ++
>>>   arch/x86/include/asm/efi.h                      |   17 +++
>>>   arch/x86/kernel/e820.c                          |   12 ++
>>>   arch/x86/kernel/setup.c                         |   19 ++-
>>>   arch/x86/platform/efi/efi.c                     |   56 +++++++++
>>>   arch/x86/platform/efi/quirks.c                  |    3 +
>>>   drivers/acpi/Kconfig                            |    9 --
>>>   drivers/acpi/Makefile                           |    3 -
>>>   drivers/acpi/hmat/Makefile                      |    2
>>>   drivers/acpi/numa/Kconfig                       |    8 +
>>>   drivers/acpi/numa/Makefile                      |    3 +
>>>   drivers/acpi/numa/hmat.c                        |  138 +++++++++++++++++++++--
>>>   drivers/acpi/numa/srat.c                        |    0
>>>   drivers/dax/Kconfig                             |   27 ++++-
>>>   drivers/dax/Makefile                            |    2
>>>   drivers/dax/bus.c                               |    2
>>>   drivers/dax/bus.h                               |    2
>>>   drivers/dax/dax-private.h                       |    2
>>>   drivers/dax/hmem.c                              |   57 ++++++++++
>>>   drivers/firmware/efi/Makefile                   |    5 +
>>>   drivers/firmware/efi/efi.c                      |    8 +
>>>   drivers/firmware/efi/esrt.c                     |    3 +
>>>   drivers/firmware/efi/fake_mem.c                 |   26 ++--
>>>   drivers/firmware/efi/fake_mem.h                 |   10 ++
>>>   drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 ++
>>>   drivers/firmware/efi/x86-fake_mem.c             |   69 ++++++++++++
>>>   drivers/nvdimm/Kconfig                          |    1
>>>   drivers/nvdimm/core.c                           |    1
>>>   drivers/nvdimm/nd-core.h                        |    1
>>>   drivers/nvdimm/region_devs.c                    |   13 +-
>>>   include/linux/efi.h                             |    4 -
>>>   include/linux/ioport.h                          |    1
>>>   include/linux/memregion.h                       |   23 ++++
>>>   lib/Kconfig                                     |    3 +
>>>   lib/Makefile                                    |    1
>>>   lib/memregion.c                                 |   18 +++
>>>   41 files changed, 584 insertions(+), 93 deletions(-)
>>>   create mode 100644 arch/x86/include/asm/efi-stub.h
>>>   delete mode 100644 drivers/acpi/hmat/Makefile
>>>   rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (70%)
>>>   create mode 100644 drivers/acpi/numa/Makefile
>>>   rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (85%)
>>>   rename drivers/acpi/{numa.c => numa/srat.c} (100%)
>>>   create mode 100644 drivers/dax/hmem.c
>>>   create mode 100644 drivers/firmware/efi/fake_mem.h
>>>   create mode 100644 drivers/firmware/efi/x86-fake_mem.c
>>>   create mode 100644 include/linux/memregion.h
>>>   create mode 100644 lib/memregion.c
>>>
>> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> for the ACPI-related changes in this series.
> Thanks Rafael, is commit 5c7ed4385424 on a stable branch that Thomas
> could merge, or Thomas, is this all too late for v5.4?

Yes, I've just exported the acpi-tables branch containing that commit as 
a stable one in the linux-pm.git tree at kernel.org.

Cheers!




* Re: [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
  2019-08-30  1:52 ` [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP Dan Williams
@ 2019-09-10  6:48   ` Ingo Molnar
  2019-09-13 13:02   ` Ard Biesheuvel
  2019-09-13 19:48   ` Ard Biesheuvel
  2 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2019-09-10  6:48 UTC (permalink / raw)
  To: Dan Williams, Ard Biesheuvel
  Cc: tglx, rafael.j.wysocki, x86, Borislav Petkov, Ingo Molnar,
	H. Peter Anvin, Ard Biesheuvel, Dave Hansen, peterz,
	vishal.l.verma, linux-kernel, linux-efi


* Dan Williams <dan.j.williams@intel.com> wrote:

> Given that EFI_MEMORY_SP is a platform BIOS policy decision for marking
> memory ranges as "reserved for a specific purpose" there will inevitably
> be scenarios where the BIOS omits the attribute in situations where it
> is desired. Unlike other attributes if the OS wants to reserve this
> memory from the kernel the reservation needs to happen early in init. So
> early, in fact, that it needs to happen before e820__memblock_setup()
> which is a pre-requisite for efi_fake_memmap() that wants to allocate
> memory for the updated table.
> 
> Introduce an x86 specific efi_fake_memmap_early() that can search for
> attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
> accordingly.
> 
> The KASLR code that scans the command line looking for user-directed
> memory reservations also needs to be updated to consider
> "efi_fake_mem=nn@ss:0x40000" requests.
> 
> Cc: <x86@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

A couple of these patches are touching EFI code, but only the first one 
carries a Reviewed-by from Ard.

Ard, are these patches and the whole series fine with you?

Thanks,

	Ingo
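
The "efi_fake_mem=nn@ss:attr" syntax mentioned in the quoted commit
message can be worked through with a small userspace sketch. This is
not the kernel's parser: parse_size() is a simplified stand-in for
memparse(), and struct fake_mem_range and parse_fake_mem() are
hypothetical names used only for illustration. The EFI_MEMORY_SP value
0x40000 is the one quoted in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_MEMORY_SP 0x40000ULL /* attribute value quoted in the patch */

/* Hypothetical result type: [start, end] is an inclusive range. */
struct fake_mem_range {
	uint64_t start;
	uint64_t end;
	uint64_t attribute;
};

/* Simplified stand-in for memparse(): a number plus optional K/M/G suffix. */
static uint64_t parse_size(const char *s, char **endp)
{
	uint64_t val = strtoull(s, endp, 0);

	switch (**endp) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*endp)++;
		break;
	default:
		break;
	}
	return val;
}

/* Parse one "nn@ss:attr" entry; returns 0 on success, -1 if malformed. */
int parse_fake_mem(const char *arg, struct fake_mem_range *out)
{
	char *p;
	uint64_t size, start, attr;

	size = parse_size(arg, &p);
	if (*p != '@' || size == 0)
		return -1;
	start = parse_size(p + 1, &p);
	if (*p != ':')
		return -1;
	attr = strtoull(p + 1, &p, 0);
	if (*p != '\0' && *p != ',')
		return -1;

	out->start = start;
	out->end = start + size - 1;
	out->attribute = attr;
	return 0;
}
```

For "8G@9G:0x40000" the sketch yields start 0x240000000 and inclusive
end 0x43fffffff with attribute 0x40000, which agrees with the range
given in the kernel-parameters.txt update of this series.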


* Re: [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines
  2019-08-30  1:52 ` [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines Dan Williams
@ 2019-09-13  9:05   ` Ard Biesheuvel
  2019-09-13 12:32     ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13  9:05 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Dave Hansen, Vishal L Verma, Linux Kernel Mailing List,
	linux-efi

On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@intel.com> wrote:
>
> In preparation for adding another EFI_MEMMAP dependent call that needs
> to occur before e820__memblock_setup(), fix up the existing EFI calls to
> check for EFI_MEMMAP internally. This ends up being cleaner than the
> alternative of checking EFI_MEMMAP multiple times in setup_arch().
>
> Cc: <x86@kernel.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

I'd prefer it if the spurious whitespace changes could be dropped, but
otherwise, this looks fine to me, so I am not going to obsess about
it.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>


> ---
>  arch/x86/include/asm/efi.h      |    9 ++++++++-
>  arch/x86/kernel/setup.c         |   19 +++++++++----------
>  arch/x86/platform/efi/efi.c     |    3 +++
>  arch/x86/platform/efi/quirks.c  |    3 +++
>  drivers/firmware/efi/esrt.c     |    3 +++
>  drivers/firmware/efi/fake_mem.c |    2 +-
>  include/linux/efi.h             |    2 --
>  7 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 43a82e59c59d..45f853bce869 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -140,7 +140,6 @@ extern void efi_delete_dummy_variable(void);
>  extern void efi_switch_mm(struct mm_struct *mm);
>  extern void efi_recover_from_page_fault(unsigned long phys_addr);
>  extern void efi_free_boot_services(void);
> -extern void efi_reserve_boot_services(void);
>
>  struct efi_setup_data {
>         u64 fw_vendor;
> @@ -244,6 +243,8 @@ static inline bool efi_is_64bit(void)
>  extern bool efi_reboot_required(void);
>  extern bool efi_is_table_address(unsigned long phys_addr);
>
> +extern void efi_find_mirror(void);
> +extern void efi_reserve_boot_services(void);
>  #else
>  static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
>  static inline bool efi_reboot_required(void)
> @@ -254,6 +255,12 @@ static inline  bool efi_is_table_address(unsigned long phys_addr)
>  {
>         return false;
>  }
> +static inline void efi_find_mirror(void)
> +{
> +}
> +static inline void efi_reserve_boot_services(void)
> +{
> +}
>  #endif /* CONFIG_EFI */
>
>  #endif /* _ASM_X86_EFI_H */
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index bbe35bf879f5..9bfecb542440 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1118,21 +1118,20 @@ void __init setup_arch(char **cmdline_p)
>         cleanup_highmap();
>
>         memblock_set_current_limit(ISA_END_ADDRESS);
> +
>         e820__memblock_setup();
>
>         reserve_bios_regions();
>
> -       if (efi_enabled(EFI_MEMMAP)) {
> -               efi_fake_memmap();
> -               efi_find_mirror();
> -               efi_esrt_init();
> +       efi_fake_memmap();
> +       efi_find_mirror();
> +       efi_esrt_init();
>
> -               /*
> -                * The EFI specification says that boot service code won't be
> -                * called after ExitBootServices(). This is, in fact, a lie.
> -                */
> -               efi_reserve_boot_services();
> -       }
> +       /*
> +        * The EFI specification says that boot service code won't be
> +        * called after ExitBootServices(). This is, in fact, a lie.
> +        */
> +       efi_reserve_boot_services();
>
>         /* preallocate 4k for mptable mpc */
>         e820__memblock_alloc_reserved_mpc_new();
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index c202e1b07e29..0bb58eb33ca0 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -128,6 +128,9 @@ void __init efi_find_mirror(void)
>         efi_memory_desc_t *md;
>         u64 mirror_size = 0, total_size = 0;
>
> +       if (!efi_enabled(EFI_MEMMAP))
> +               return;
> +
>         for_each_efi_memory_desc(md) {
>                 unsigned long long start = md->phys_addr;
>                 unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 3b9fd679cea9..7675cf754d90 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -320,6 +320,9 @@ void __init efi_reserve_boot_services(void)
>  {
>         efi_memory_desc_t *md;
>
> +       if (!efi_enabled(EFI_MEMMAP))
> +               return;
> +
>         for_each_efi_memory_desc(md) {
>                 u64 start = md->phys_addr;
>                 u64 size = md->num_pages << EFI_PAGE_SHIFT;
> diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c
> index d6dd5f503fa2..2762e0662bf4 100644
> --- a/drivers/firmware/efi/esrt.c
> +++ b/drivers/firmware/efi/esrt.c
> @@ -246,6 +246,9 @@ void __init efi_esrt_init(void)
>         int rc;
>         phys_addr_t end;
>
> +       if (!efi_enabled(EFI_MEMMAP))
> +               return;
> +
>         pr_debug("esrt-init: loading.\n");
>         if (!esrt_table_exists())
>                 return;
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index 9501edc0fcfb..526b45331d96 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -44,7 +44,7 @@ void __init efi_fake_memmap(void)
>         void *new_memmap;
>         int i;
>
> -       if (!nr_fake_mem)
> +       if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
>                 return;
>
>         /* count up the number of EFI memory descriptor */
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index 5c1dd0221384..acc2b8982ed2 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -1045,9 +1045,7 @@ extern void efi_enter_virtual_mode (void);        /* switch EFI to virtual mode, if pos
>  extern efi_status_t efi_query_variable_store(u32 attributes,
>                                              unsigned long size,
>                                              bool nonblocking);
> -extern void efi_find_mirror(void);
>  #else
> -
>  static inline efi_status_t efi_query_variable_store(u32 attributes,
>                                                     unsigned long size,
>                                                     bool nonblocking)
>
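
The shape of the refactoring in the diff above, moving the EFI_MEMMAP
test out of the caller and into each callee, can be shown with a small
userspace sketch. Everything here is a stand-in: efi_enabled_mock(),
the two leaf functions and setup_arch_mock() are hypothetical names
that only mimic the structure of the kernel code, not its behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state behind efi_enabled(EFI_MEMMAP). */
static bool efi_memmap_available;

static bool efi_enabled_mock(void)
{
	return efi_memmap_available;
}

/* Counts how many leaf routines actually did work. */
static int leaf_work_done;

/* Each leaf routine checks its own precondition and returns early. */
static void fake_memmap_leaf(void)
{
	if (!efi_enabled_mock())
		return;
	leaf_work_done++;
}

static void find_mirror_leaf(void)
{
	if (!efi_enabled_mock())
		return;
	leaf_work_done++;
}

/*
 * The caller invokes the leaves unconditionally; it no longer needs a
 * surrounding "if (efi_enabled(...)) { ... }" block.
 */
int setup_arch_mock(bool memmap_available)
{
	efi_memmap_available = memmap_available;
	leaf_work_done = 0;

	fake_memmap_leaf();
	find_mirror_leaf();

	return leaf_work_done;
}
```

With this layout a further EFI_MEMMAP dependent call can be placed
anywhere in the caller, including ahead of the other calls, without
repeating the check at each call site.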


* Re: [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines
  2019-09-13  9:05   ` Ard Biesheuvel
@ 2019-09-13 12:32     ` Dan Williams
  0 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-09-13 12:32 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Ingo Molnar, H. Peter Anvin, Andy Lutomirski, Peter Zijlstra,
	Dave Hansen, Vishal L Verma, Linux Kernel Mailing List,
	linux-efi

On Fri, Sep 13, 2019 at 2:06 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > In preparation for adding another EFI_MEMMAP dependent call that needs
> > to occur before e820__memblock_setup(), fix up the existing EFI calls to
> > check for EFI_MEMMAP internally. This ends up being cleaner than the
> > alternative of checking EFI_MEMMAP multiple times in setup_arch().
> >
> > Cc: <x86@kernel.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> I'd prefer it if the spurious whitespace changes could be dropped, but
> otherwise, this looks fine to me, so I am not going to obsess about
> it.

Fair point, I'll drop those when I resubmit after -rc1.

> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Thanks!


* Re: [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-08-30  1:52 ` [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax Dan Williams
@ 2019-09-13 12:59   ` Ard Biesheuvel
  2019-09-13 16:22     ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13 12:59 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra,
	kbuild test robot, Dave Hansen, Vishal L Verma,
	Linux Kernel Mailing List, linux-efi

On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@intel.com> wrote:
>
> UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> interpretation of the EFI Memory Types as "reserved for a specific
> purpose".
>
> The proposed Linux behavior for specific purpose memory is that it is
> reserved for direct-access (device-dax) by default and not available for
> any kernel usage, not even as an OOM fallback.  Later, through udev
> scripts or another init mechanism, these device-dax claimed ranges can
> be reconfigured and hot-added to the available System-RAM with a unique
> node identifier. This device-dax management scheme implements "soft" in
> the "soft reserved" designation by allowing some or all of the
> reservation to be recovered as typical memory. This policy can be
> disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
> efi=nosoftreserve.
>
> This patch introduces two new concepts at once because early boot
> enumeration is entangled with memory that can optionally be reserved
> from the kernel page allocator by default. The new concepts are:
>
> - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
>   attribute on EFI_CONVENTIONAL memory, update the E820 map with this
>   new type. Only perform this classification if the
>   CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
>   typical ram.
>
> - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
>   a device driver to search iomem resources for application specific
>   memory. Teach the iomem code to identify such ranges as "Soft Reserved".
>
> A follow-on change integrates parsing of the ACPI HMAT to identify the
> node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> now, just identify and reserve memory of this type.
>
> The translation of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP to "soft
> reserved" is x86/E820-only, but other archs could choose to publish
> IORES_DESC_SOFT_RESERVED resources from their platform-firmware memory
> map handlers. Other EFI-capable platforms would need to go audit their
> local usages of EFI_CONVENTIONAL_MEMORY to consider the soft reserved
> case.
>
> Cc: <x86@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Darren Hart <dvhart@infradead.org>
> Cc: Andy Shevchenko <andy@infradead.org>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reported-by: kbuild test robot <lkp@intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
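
The classification policy described in the quoted commit message can
be summarized as a decision function. This is a userspace sketch under
stated assumptions, not the code from the patch: the enums, struct
soft_reserve_policy and classify() are hypothetical, and mapping every
non-conventional type to "reserved" is a simplification made only to
keep the example short. The two policy inputs model
CONFIG_EFI_SOFT_RESERVE and the efi=nosoftreserve option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFI_MEMORY_SP 0x40000ULL /* attribute value quoted in this series */

/* Hypothetical, reduced enums; the kernel's own definitions are larger. */
enum efi_type_mock {
	MOCK_EFI_CONVENTIONAL_MEMORY,
	MOCK_EFI_RESERVED_TYPE,
};

enum e820_type_mock {
	MOCK_E820_RAM,
	MOCK_E820_RESERVED,
	MOCK_E820_SOFT_RESERVED,
};

/* Models CONFIG_EFI_SOFT_RESERVE=y/n and efi=nosoftreserve. */
struct soft_reserve_policy {
	bool config_enabled;
	bool nosoftreserve;
};

enum e820_type_mock classify(enum efi_type_mock type, uint64_t attribute,
			     const struct soft_reserve_policy *policy)
{
	/* Simplification: anything not conventional is treated as reserved. */
	if (type != MOCK_EFI_CONVENTIONAL_MEMORY)
		return MOCK_E820_RESERVED;

	/* Conventional memory tagged EFI_MEMORY_SP, with the policy enabled. */
	if ((attribute & EFI_MEMORY_SP) &&
	    policy->config_enabled && !policy->nosoftreserve)
		return MOCK_E820_SOFT_RESERVED;

	/* Otherwise the range is treated by its base type, as typical RAM. */
	return MOCK_E820_RAM;
}
```

The point of the sketch is that either opt-out, compile-time or
command-line, makes the tagged range fall through to ordinary RAM
rather than being dropped.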

Hi Dan,

I understand that non-x86 may be out of scope for you, but this patch
makes changes to x86 and generic code at the same time without regard
for other architectures.
I'd prefer it if we could cover ARM cleanly as well right at the start.

The first step would be to split out the EFI stub changes (i.e., to
avoid allocating memory from EFI_MEMORY_SP regions) and the EFI core
changes from the other changes. Then, I would like to ask for your
help to get the arm64 part implemented where EFI_MEMORY_SP memory gets
registered/reserved in a way that allows the HMAT code (which should
be arch agnostic) to operate in the same way as it does on x86. Would
it be enough to simply memblock_reserve() it and insert the iomem
resource with the soft_reserved attribute?

Some more comments below.

> ---
>  Documentation/admin-guide/kernel-parameters.txt |   19 +++++++--
>  arch/x86/Kconfig                                |   21 +++++++++
>  arch/x86/boot/compressed/eboot.c                |    7 +++
>  arch/x86/boot/compressed/kaslr.c                |    4 ++
>  arch/x86/include/asm/e820/types.h               |    8 ++++
>  arch/x86/include/asm/efi-stub.h                 |   11 +++++
>  arch/x86/kernel/e820.c                          |   12 +++++
>  arch/x86/platform/efi/efi.c                     |   51 +++++++++++++++++++++--
>  drivers/firmware/efi/efi.c                      |    3 +
>  drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 +++++
>  include/linux/efi.h                             |    1
>  include/linux/ioport.h                          |    1
>  12 files changed, 139 insertions(+), 11 deletions(-)
>  create mode 100644 arch/x86/include/asm/efi-stub.h
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1c67acd1df65..dd28f0726309 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1152,7 +1152,8 @@
>                         Format: {"off" | "on" | "skip[mbr]"}
>
>         efi=            [EFI]
> -                       Format: { "old_map", "nochunk", "noruntime", "debug" }
> +                       Format: { "old_map", "nochunk", "noruntime", "debug",
> +                                 "nosoftreserve" }
>                         old_map [X86-64]: switch to the old ioremap-based EFI
>                         runtime services mapping. 32-bit still uses this one by
>                         default.
> @@ -1161,6 +1162,12 @@
>                         firmware implementations.
>                         noruntime : disable EFI runtime services support
>                         debug: enable misc debug output
> +                       nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
> +                       attribute may cause the kernel to reserve the
> +                       memory range for a memory mapping driver to
> +                       claim. Specify efi=nosoftreserve to disable this
> +                       reservation and treat the memory by its base type
> +                       (i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
>
>         efi_no_storage_paranoia [EFI; X86]
>                         Using this parameter you can use more than 50% of
> @@ -1173,15 +1180,21 @@
>                         updating original EFI memory map.
>                         Region of memory which aa attribute is added to is
>                         from ss to ss+nn.
> +
>                         If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000
>                         is specified, EFI_MEMORY_MORE_RELIABLE(0x10000)
>                         attribute is added to range 0x100000000-0x180000000 and
>                         0x10a0000000-0x1120000000.
>
> +                       If efi_fake_mem=8G@9G:0x40000 is specified, the
> +                       EFI_MEMORY_SP(0x40000) attribute is added to
> +                       range 0x240000000-0x43fffffff.
> +
>                         Using this parameter you can do debugging of EFI memmap
> -                       related feature. For example, you can do debugging of
> +                       related features. For example, you can do debugging of
>                         Address Range Mirroring feature even if your box
> -                       doesn't support it.
> +                       doesn't support it, or mark specific memory as
> +                       "soft reserved".
>
>         efivar_ssdt=    [EFI; X86] Name of an EFI variable that contains an SSDT
>                         that is to be dynamically loaded by Linux. If there are
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 4195f44c6a09..bced13503bb1 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1981,6 +1981,27 @@ config EFI_MIXED
>
>            If unsure, say N.
>
> +config EFI_SOFT_RESERVE
> +       bool "Reserve EFI Specific Purpose Memory"
> +       depends on EFI && ACPI_HMAT
> +       default ACPI_HMAT
> +       ---help---
> +         On systems that have mixed performance classes of memory EFI
> +         may indicate specific purpose memory with an attribute (See
> +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> +         attribute may have unique performance characteristics compared
> +         to the system's general purpose "System RAM" pool. On the
> +         expectation that such memory has application specific usage,
> +         and its base EFI memory type is "conventional" answer Y to
> +         arrange for the kernel to reserve it as a "Soft Reserved"
> +         resource, and set aside for direct-access (device-dax) by
> +         default. The memory range can later be optionally assigned to
> +         the page allocator by system administrator policy via the
> +         device-dax kmem facility. Say N to have the kernel treat this
> +         memory as "System RAM" by default.
> +
> +         If unsure, say Y.
> +

This should be in generic code.

>  config SECCOMP
>         def_bool y
>         prompt "Enable seccomp to safely compute untrusted bytecode"
> diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> index d6662fdef300..f2dc5896d770 100644
> --- a/arch/x86/boot/compressed/eboot.c
> +++ b/arch/x86/boot/compressed/eboot.c
> @@ -10,6 +10,7 @@
>  #include <linux/pci.h>
>
>  #include <asm/efi.h>
> +#include <asm/efi-stub.h>
>  #include <asm/e820/types.h>
>  #include <asm/setup.h>
>  #include <asm/desc.h>
> @@ -553,7 +554,11 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
>                 case EFI_BOOT_SERVICES_CODE:
>                 case EFI_BOOT_SERVICES_DATA:
>                 case EFI_CONVENTIONAL_MEMORY:
> -                       e820_type = E820_TYPE_RAM;
> +                       if (!efi_nosoftreserve
> +                                       && (d->attribute & EFI_MEMORY_SP))
> +                               e820_type = E820_TYPE_SOFT_RESERVED;
> +                       else
> +                               e820_type = E820_TYPE_RAM;
>                         break;
>
>                 case EFI_ACPI_MEMORY_NVS:
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 2e53c056ba20..093e84e28b7a 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -38,6 +38,7 @@
>  #include <linux/efi.h>
>  #include <generated/utsrelease.h>
>  #include <asm/efi.h>
> +#include <asm/efi-stub.h>
>
>  /* Macros used by the included decompressor code below. */
>  #define STATIC
> @@ -760,6 +761,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
>                 if (md->type != EFI_CONVENTIONAL_MEMORY)
>                         continue;
>
> +               if (!efi_nosoftreserve && (md->attribute & EFI_MEMORY_SP))
> +                       continue;
> +
>                 if (efi_mirror_found &&
>                     !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
>                         continue;
> diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
> index c3aa4b5e49e2..314f75d886d0 100644
> --- a/arch/x86/include/asm/e820/types.h
> +++ b/arch/x86/include/asm/e820/types.h
> @@ -28,6 +28,14 @@ enum e820_type {
>          */
>         E820_TYPE_PRAM          = 12,
>
> +       /*
> +        * Special-purpose memory is indicated to the system via the
> +        * EFI_MEMORY_SP attribute. Define an e820 translation of this
> +        * memory type for the purpose of reserving this range and
> +        * marking it with the IORES_DESC_SOFT_RESERVED designation.
> +        */
> +       E820_TYPE_SOFT_RESERVED = 0xefffffff,
> +
>         /*
>          * Reserved RAM used by the kernel itself if
>          * CONFIG_INTEL_TXT=y is enabled, memory of this type
> diff --git a/arch/x86/include/asm/efi-stub.h b/arch/x86/include/asm/efi-stub.h
> new file mode 100644
> index 000000000000..16ebd036387b
> --- /dev/null
> +++ b/arch/x86/include/asm/efi-stub.h
> @@ -0,0 +1,11 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#ifndef _X86_EFI_STUB_H_
> +#define _X86_EFI_STUB_H_
> +
> +#ifdef CONFIG_EFI_STUB
> +extern bool efi_nosoftreserve;
> +#else
> +#define efi_nosoftreserve (1)
> +#endif
> +
> +#endif /* _X86_EFI_STUB_H_ */

Please put this in generic code as well (but you need a function not a
variable - see below)

> diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> index 7da2bcd2b8eb..9976106b57ec 100644
> --- a/arch/x86/kernel/e820.c
> +++ b/arch/x86/kernel/e820.c
> @@ -190,6 +190,7 @@ static void __init e820_print_type(enum e820_type type)
>         case E820_TYPE_RAM:             /* Fall through: */
>         case E820_TYPE_RESERVED_KERN:   pr_cont("usable");                      break;
>         case E820_TYPE_RESERVED:        pr_cont("reserved");                    break;
> +       case E820_TYPE_SOFT_RESERVED:   pr_cont("soft reserved");               break;
>         case E820_TYPE_ACPI:            pr_cont("ACPI data");                   break;
>         case E820_TYPE_NVS:             pr_cont("ACPI NVS");                    break;
>         case E820_TYPE_UNUSABLE:        pr_cont("unusable");                    break;
> @@ -1037,6 +1038,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry)
>         case E820_TYPE_PRAM:            return "Persistent Memory (legacy)";
>         case E820_TYPE_PMEM:            return "Persistent Memory";
>         case E820_TYPE_RESERVED:        return "Reserved";
> +       case E820_TYPE_SOFT_RESERVED:   return "Soft Reserved";
>         default:                        return "Unknown E820 type";
>         }
>  }
> @@ -1052,6 +1054,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
>         case E820_TYPE_PRAM:            /* Fall-through: */
>         case E820_TYPE_PMEM:            /* Fall-through: */
>         case E820_TYPE_RESERVED:        /* Fall-through: */
> +       case E820_TYPE_SOFT_RESERVED:   /* Fall-through: */
>         default:                        return IORESOURCE_MEM;
>         }
>  }
> @@ -1064,6 +1067,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
>         case E820_TYPE_PMEM:            return IORES_DESC_PERSISTENT_MEMORY;
>         case E820_TYPE_PRAM:            return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>         case E820_TYPE_RESERVED:        return IORES_DESC_RESERVED;
> +       case E820_TYPE_SOFT_RESERVED:   return IORES_DESC_SOFT_RESERVED;
>         case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
>         case E820_TYPE_RAM:             /* Fall-through: */
>         case E820_TYPE_UNUSABLE:        /* Fall-through: */
> @@ -1078,11 +1082,12 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res)
>                 return true;
>
>         /*
> -        * Treat persistent memory like device memory, i.e. reserve it
> -        * for exclusive use of a driver
> +        * Treat persistent memory and other special memory ranges like
> +        * device memory, i.e. reserve it for exclusive use of a driver
>          */
>         switch (type) {
>         case E820_TYPE_RESERVED:
> +       case E820_TYPE_SOFT_RESERVED:
>         case E820_TYPE_PRAM:
>         case E820_TYPE_PMEM:
>                 return false;
> @@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void)
>                 if (end != (resource_size_t)end)
>                         continue;
>
> +               if (entry->type == E820_TYPE_SOFT_RESERVED)
> +                       memblock_reserve(entry->addr, entry->size);
> +
>                 if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
>                         continue;
>
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 0bb58eb33ca0..9cfb7f1cf25d 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -151,10 +151,18 @@ void __init efi_find_mirror(void)
>   * more than the max 128 entries that can fit in the e820 legacy
>   * (zeropage) memory map.
>   */
> +enum add_efi_mode {
> +       ADD_EFI_ALL,
> +       ADD_EFI_SOFT_RESERVED,
> +};
>
> -static void __init do_add_efi_memmap(void)
> +static void __init do_add_efi_memmap(enum add_efi_mode mode)
>  {
>         efi_memory_desc_t *md;
> +       int add = 0;
> +
> +       if (!efi_enabled(EFI_MEMMAP))
> +               return;
>
>         for_each_efi_memory_desc(md) {
>                 unsigned long long start = md->phys_addr;
> @@ -167,7 +175,10 @@ static void __init do_add_efi_memmap(void)
>                 case EFI_BOOT_SERVICES_CODE:
>                 case EFI_BOOT_SERVICES_DATA:
>                 case EFI_CONVENTIONAL_MEMORY:
> -                       if (md->attribute & EFI_MEMORY_WB)
> +                       if (efi_enabled(EFI_MEM_SOFT_RESERVE)
> +                                       && (md->attribute & EFI_MEMORY_SP))
> +                               e820_type = E820_TYPE_SOFT_RESERVED;
> +                       else if (md->attribute & EFI_MEMORY_WB)
>                                 e820_type = E820_TYPE_RAM;
>                         else
>                                 e820_type = E820_TYPE_RESERVED;
> @@ -193,9 +204,17 @@ static void __init do_add_efi_memmap(void)
>                         e820_type = E820_TYPE_RESERVED;
>                         break;
>                 }
> +
> +               if (e820_type == E820_TYPE_SOFT_RESERVED)
> +                       /* always add E820_TYPE_SOFT_RESERVED */;
> +               else if (mode == ADD_EFI_SOFT_RESERVED)
> +                       continue;
> +
> +               add++;
>                 e820__range_add(start, size, e820_type);
>         }
> -       e820__update_table(e820_table);
> +       if (add)
> +               e820__update_table(e820_table);
>  }
>
>  int __init efi_memblock_x86_reserve_range(void)
> @@ -227,8 +246,18 @@ int __init efi_memblock_x86_reserve_range(void)
>         if (rv)
>                 return rv;
>
> -       if (add_efi_memmap)
> -               do_add_efi_memmap();
> +       if (add_efi_memmap) {
> +               do_add_efi_memmap(ADD_EFI_ALL);
> +       } else {
> +               /*
> +                * Given add_efi_memmap defaults to 0 and there is no e820
> +                * mechanism for soft-reserved memory. Explicitly scan for
> +                * soft-reserved memory. Otherwise, the mechanism to disable the
> +                * kernel's consideration of EFI_MEMORY_SP is the
> +                * efi=nosoftreserve option.
> +                */
> +               do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
> +       }
>
>         WARN(efi.memmap.desc_version != 1,
>              "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
> @@ -781,6 +810,15 @@ static bool should_map_region(efi_memory_desc_t *md)
>         if (IS_ENABLED(CONFIG_X86_32))
>                 return false;
>
> +       /*
> +        * EFI specific purpose memory may be reserved by default
> +        * depending on kernel config and boot options.
> +        */
> +       if (md->type == EFI_CONVENTIONAL_MEMORY
> +                       && efi_enabled(EFI_MEM_SOFT_RESERVE)
> +                       && (md->attribute & EFI_MEMORY_SP))
> +               return false;
> +
>         /*
>          * Map all of RAM so that we can access arguments in the 1:1
>          * mapping when making EFI runtime calls.
> @@ -1072,6 +1110,9 @@ static int __init arch_parse_efi_cmdline(char *str)
>         if (parse_option_str(str, "old_map"))
>                 set_bit(EFI_OLD_MEMMAP, &efi.flags);
>
> +       if (parse_option_str(str, "nosoftreserve"))
> +               clear_bit(EFI_MEM_SOFT_RESERVE, &efi.flags);
> +

Can we move this to the generic efi= handling code?

>         return 0;
>  }
>  early_param("efi", arch_parse_efi_cmdline);
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 363bb9d00fa5..6d54d5c74347 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
>         .tpm_log                = EFI_INVALID_TABLE_ADDR,
>         .tpm_final_log          = EFI_INVALID_TABLE_ADDR,
>         .mem_reserve            = EFI_INVALID_TABLE_ADDR,
> +#ifdef CONFIG_EFI_SOFT_RESERVE
> +       .flags                  = 1UL << EFI_MEM_SOFT_RESERVE,
> +#endif
>  };
>  EXPORT_SYMBOL(efi);
>

I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead,
and invert the meaning of the bit.
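
A minimal userspace sketch of the inversion suggested here, assuming a plain
flags word in place of efi.flags. The names EFI_MEM_NO_SOFT_RESERVE,
soft_reserve_enabled() and parse_efi_word() are hypothetical stand-ins, and
the bit number simply reuses the value 11 from the patch under review. With
the inverted bit, the zero-initialized flags word already means "honor soft
reservations", so no static initializer is needed.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit: set means "do NOT honor EFI_MEMORY_SP". */
#define EFI_MEM_NO_SOFT_RESERVE 11

/* Stand-in for efi.flags; zero by default, so the feature starts enabled. */
static unsigned long sketch_efi_flags;

/* Hypothetical helper: soft reservations are honored unless opted out. */
bool soft_reserve_enabled(void)
{
	return !((sketch_efi_flags >> EFI_MEM_NO_SOFT_RESERVE) & 1UL);
}

/* Simplified stand-in for the efi= parser: only the opt-out sets a bit. */
void parse_efi_word(const char *opt)
{
	if (strcmp(opt, "nosoftreserve") == 0)
		sketch_efi_flags |= 1UL << EFI_MEM_NO_SOFT_RESERVE;
}
```

The sketch leaves out the CONFIG_EFI_SOFT_RESERVE build-time switch; in the
kernel that check would still have to be combined with the flag test.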

> diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> index 3caae7f2cf56..35ee98a2c00c 100644
> --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> @@ -28,6 +28,7 @@
>  #define EFI_READ_CHUNK_SIZE    (1024 * 1024)
>
>  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
> +bool efi_nosoftreserve;
>

This needs a getter function if you want to access it from other
compilation units. This has to do with how the early relocation code
handles data symbol references. Please refer to nokaslr() for an
example.
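
A rough userspace sketch of the getter pattern being requested, assuming the
flag stays file-local and other compilation units only ever call a function.
The names nosoftreserve_flag, efi_stub_nosoftreserve() and
efi_stub_parse_word() are hypothetical; only nokaslr() is named in the
review.

```c
#include <string.h>

/* File-local flag; other compilation units never reference it directly. */
static int nosoftreserve_flag;

/* Hypothetical getter, following the shape of nokaslr() in the stub. */
int efi_stub_nosoftreserve(void)
{
	return nosoftreserve_flag;
}

/* Simplified stand-in for efi_parse_options(): examines one option word. */
void efi_stub_parse_word(const char *str)
{
	static const char opt[] = "nosoftreserve";

	/* Compare against the whole keyword, not a shorter prefix. */
	if (strncmp(str, opt, strlen(opt)) == 0)
		nosoftreserve_flag = 1;
}
```

Callers such as the e820 setup or KASLR code would then test
efi_stub_nosoftreserve() instead of reading the variable.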

>  static int __section(.data) __nokaslr;
>  static int __section(.data) __quiet;
> @@ -211,6 +212,9 @@ efi_status_t efi_high_alloc(efi_system_table_t *sys_table_arg,
>                 if (desc->type != EFI_CONVENTIONAL_MEMORY)
>                         continue;
>
> +               if (!efi_nosoftreserve && (desc->attribute & EFI_MEMORY_SP))
> +                       continue;
> +
>                 if (desc->num_pages < nr_pages)
>                         continue;
>
> @@ -305,6 +309,9 @@ efi_status_t efi_low_alloc(efi_system_table_t *sys_table_arg,
>                 if (desc->type != EFI_CONVENTIONAL_MEMORY)
>                         continue;
>
> +               if (!efi_nosoftreserve && (desc->attribute & EFI_MEMORY_SP))
> +                       continue;
> +
>                 if (desc->num_pages < nr_pages)
>                         continue;
>
> @@ -489,6 +496,11 @@ efi_status_t efi_parse_options(char const *cmdline)
>                         __novamap = 1;
>                 }
>
> +               if (!strncmp(str, "nosoftreserve", 7)) {
> +                       str += strlen("nosoftreserve");
> +                       efi_nosoftreserve = 1;
> +               }
> +
>                 /* Group words together, delimited by "," */
>                 while (*str && *str != ' ' && *str != ',')
>                         str++;
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index acc2b8982ed2..f50e0f01a5ed 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -1201,6 +1201,7 @@ extern int __init efi_setup_pcdp_console(char *);
>  #define EFI_DBG                        8       /* Print additional debug info at runtime */
>  #define EFI_NX_PE_DATA         9       /* Can runtime data regions be mapped non-executable? */
>  #define EFI_MEM_ATTR           10      /* Did firmware publish an EFI_MEMORY_ATTRIBUTES table? */
> +#define EFI_MEM_SOFT_RESERVE   11      /* Is the kernel configured to honor soft reservations? */
>
>  #ifdef CONFIG_EFI
>  /*
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 5b6a7121c9f0..17d9b1abc2f0 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -134,6 +134,7 @@ enum {
>         IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
>         IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
>         IORES_DESC_RESERVED                     = 7,
> +       IORES_DESC_SOFT_RESERVED                = 8,
>  };
>
>  /*
>

* Re: [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
  2019-08-30  1:52 ` [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP Dan Williams
  2019-09-10  6:48   ` Ingo Molnar
@ 2019-09-13 13:02   ` Ard Biesheuvel
  2019-09-13 15:02     ` Dan Williams
  2019-09-13 19:48   ` Ard Biesheuvel
  2 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13 13:02 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Vishal L Verma, Linux Kernel Mailing List,
	linux-efi

On Fri, 30 Aug 2019 at 03:07, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Given that EFI_MEMORY_SP is platform BIOS policy descision for marking

decision

> memory ranges as "reserved for a specific purpose" there will inevitably
> be scenarios where the BIOS omits the attribute in situations where it
> is desired. Unlike other attributes if the OS wants to reserve this
> memory from the kernel the reservation needs to happen early in init. So
> early, in fact, that it needs to happen before e820__memblock_setup()
> which is a pre-requisite for efi_fake_memmap() that wants to allocate
> memory for the updated table.
>
> Introduce an x86 specific efi_fake_memmap_early() that can search for
> attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
> accordingly.
>

Is this early enough? The EFI stub runs before this, and allocates
memory as well.

> The KASLR code that scans the command line looking for user-directed
> memory reservations also needs to be updated to consider
> "efi_fake_mem=nn@ss:0x40000" requests.
>
> Cc: <x86@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  arch/x86/boot/compressed/kaslr.c    |   46 ++++++++++++++++++++---
>  arch/x86/include/asm/efi.h          |    8 ++++
>  arch/x86/platform/efi/efi.c         |    2 +
>  drivers/firmware/efi/Makefile       |    5 ++-
>  drivers/firmware/efi/fake_mem.c     |   24 ++++++------
>  drivers/firmware/efi/fake_mem.h     |   10 +++++
>  drivers/firmware/efi/x86-fake_mem.c |   69 +++++++++++++++++++++++++++++++++++
>  7 files changed, 143 insertions(+), 21 deletions(-)
>  create mode 100644 drivers/firmware/efi/fake_mem.h
>  create mode 100644 drivers/firmware/efi/x86-fake_mem.c
>
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 093e84e28b7a..53ed3991f9a8 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -133,8 +133,14 @@ char *skip_spaces(const char *str)
>  #include "../../../../lib/ctype.c"
>  #include "../../../../lib/cmdline.c"
>
> +enum parse_mode {
> +       PARSE_MEMMAP,
> +       PARSE_EFI,
> +};
> +
>  static int
> -parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
> +parse_memmap(char *p, unsigned long long *start, unsigned long long *size,
> +               enum parse_mode mode)
>  {
>         char *oldp;
>
> @@ -157,8 +163,33 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
>                 *start = memparse(p + 1, &p);
>                 return 0;
>         case '@':
> -               /* memmap=nn@ss specifies usable region, should be skipped */
> -               *size = 0;
> +               if (mode == PARSE_MEMMAP) {
> +                       /*
> +                        * memmap=nn@ss specifies usable region, should
> +                        * be skipped
> +                        */
> +                       *size = 0;
> +               } else {
> +                       unsigned long long flags;
> +
> +                       /*
> +                        * efi_fake_mem=nn@ss:attr the attr specifies
> +                        * flags that might imply a soft-reservation.
> +                        */
> +                       *start = memparse(p + 1, &p);
> +                       if (p && *p == ':') {
> +                               p++;
> +                               oldp = p;
> +                               flags = simple_strtoull(p, &p, 0);
> +                               if (p == oldp)
> +                                       *size = 0;
> +                               else if (flags & EFI_MEMORY_SP)
> +                                       return 0;
> +                               else
> +                                       *size = 0;
> +                       } else
> +                               *size = 0;
> +               }
>                 /* Fall through */
>         default:
>                 /*
> @@ -173,7 +204,7 @@ parse_memmap(char *p, unsigned long long *start, unsigned long long *size)
>         return -EINVAL;
>  }
>
> -static void mem_avoid_memmap(char *str)
> +static void mem_avoid_memmap(enum parse_mode mode, char *str)
>  {
>         static int i;
>
> @@ -188,7 +219,7 @@ static void mem_avoid_memmap(char *str)
>                 if (k)
>                         *k++ = 0;
>
> -               rc = parse_memmap(str, &start, &size);
> +               rc = parse_memmap(str, &start, &size, mode);
>                 if (rc < 0)
>                         break;
>                 str = k;
> @@ -239,7 +270,6 @@ static void parse_gb_huge_pages(char *param, char *val)
>         }
>  }
>
> -
>  static void handle_mem_options(void)
>  {
>         char *args = (char *)get_cmd_line_ptr();
> @@ -272,7 +302,7 @@ static void handle_mem_options(void)
>                 }
>
>                 if (!strcmp(param, "memmap")) {
> -                       mem_avoid_memmap(val);
> +                       mem_avoid_memmap(PARSE_MEMMAP, val);
>                 } else if (strstr(param, "hugepages")) {
>                         parse_gb_huge_pages(param, val);
>                 } else if (!strcmp(param, "mem")) {
> @@ -285,6 +315,8 @@ static void handle_mem_options(void)
>                                 goto out;
>
>                         mem_limit = mem_size;
> +               } else if (!strcmp(param, "efi_fake_mem")) {
> +                       mem_avoid_memmap(PARSE_EFI, val);
>                 }
>         }
>
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 45f853bce869..d028e9acdf1c 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -263,4 +263,12 @@ static inline void efi_reserve_boot_services(void)
>  }
>  #endif /* CONFIG_EFI */
>
> +#ifdef CONFIG_EFI_FAKE_MEMMAP
> +extern void __init efi_fake_memmap_early(void);
> +#else
> +static inline void efi_fake_memmap_early(void)
> +{
> +}
> +#endif
> +
>  #endif /* _ASM_X86_EFI_H */
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 9cfb7f1cf25d..ac63e244ae55 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -259,6 +259,8 @@ int __init efi_memblock_x86_reserve_range(void)
>                 do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
>         }
>
> +       efi_fake_memmap_early();
> +
>         WARN(efi.memmap.desc_version != 1,
>              "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
>              efi.memmap.desc_version);
> diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
> index 4ac2de4dfa72..d7a6db03ea79 100644
> --- a/drivers/firmware/efi/Makefile
> +++ b/drivers/firmware/efi/Makefile
> @@ -20,13 +20,16 @@ obj-$(CONFIG_UEFI_CPER)                     += cper.o
>  obj-$(CONFIG_EFI_RUNTIME_MAP)          += runtime-map.o
>  obj-$(CONFIG_EFI_RUNTIME_WRAPPERS)     += runtime-wrappers.o
>  obj-$(CONFIG_EFI_STUB)                 += libstub/
> -obj-$(CONFIG_EFI_FAKE_MEMMAP)          += fake_mem.o
> +obj-$(CONFIG_EFI_FAKE_MEMMAP)          += fake_map.o
>  obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)   += efibc.o
>  obj-$(CONFIG_EFI_TEST)                 += test/
>  obj-$(CONFIG_EFI_DEV_PATH_PARSER)      += dev-path-parser.o
>  obj-$(CONFIG_APPLE_PROPERTIES)         += apple-properties.o
>  obj-$(CONFIG_EFI_RCI2_TABLE)           += rci2-table.o
>
> +fake_map-y                             += fake_mem.o
> +fake_map-$(CONFIG_X86)                 += x86-fake_mem.o
> +
>  arm-obj-$(CONFIG_EFI)                  := arm-init.o arm-runtime.o
>  obj-$(CONFIG_ARM)                      += $(arm-obj-y)
>  obj-$(CONFIG_ARM64)                    += $(arm-obj-y)
> diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c
> index 526b45331d96..bb9fc70d0cfa 100644
> --- a/drivers/firmware/efi/fake_mem.c
> +++ b/drivers/firmware/efi/fake_mem.c
> @@ -17,12 +17,10 @@
>  #include <linux/memblock.h>
>  #include <linux/types.h>
>  #include <linux/sort.h>
> -#include <asm/efi.h>
> +#include "fake_mem.h"
>
> -#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM
> -
> -static struct efi_mem_range fake_mems[EFI_MAX_FAKEMEM];
> -static int nr_fake_mem;
> +struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM];
> +int nr_fake_mem;
>
>  static int __init cmp_fake_mem(const void *x1, const void *x2)
>  {
> @@ -50,7 +48,7 @@ void __init efi_fake_memmap(void)
>         /* count up the number of EFI memory descriptor */
>         for (i = 0; i < nr_fake_mem; i++) {
>                 for_each_efi_memory_desc(md) {
> -                       struct range *r = &fake_mems[i].range;
> +                       struct range *r = &efi_fake_mems[i].range;
>
>                         new_nr_map += efi_memmap_split_count(md, r);
>                 }
> @@ -70,7 +68,7 @@ void __init efi_fake_memmap(void)
>         }
>
>         for (i = 0; i < nr_fake_mem; i++)
> -               efi_memmap_insert(&efi.memmap, new_memmap, &fake_mems[i]);
> +               efi_memmap_insert(&efi.memmap, new_memmap, &efi_fake_mems[i]);
>
>         /* swap into new EFI memmap */
>         early_memunmap(new_memmap, efi.memmap.desc_size * new_nr_map);
> @@ -104,22 +102,22 @@ static int __init setup_fake_mem(char *p)
>                 if (nr_fake_mem >= EFI_MAX_FAKEMEM)
>                         break;
>
> -               fake_mems[nr_fake_mem].range.start = start;
> -               fake_mems[nr_fake_mem].range.end = start + mem_size - 1;
> -               fake_mems[nr_fake_mem].attribute = attribute;
> +               efi_fake_mems[nr_fake_mem].range.start = start;
> +               efi_fake_mems[nr_fake_mem].range.end = start + mem_size - 1;
> +               efi_fake_mems[nr_fake_mem].attribute = attribute;
>                 nr_fake_mem++;
>
>                 if (*p == ',')
>                         p++;
>         }
>
> -       sort(fake_mems, nr_fake_mem, sizeof(struct efi_mem_range),
> +       sort(efi_fake_mems, nr_fake_mem, sizeof(struct efi_mem_range),
>              cmp_fake_mem, NULL);
>
>         for (i = 0; i < nr_fake_mem; i++)
>                 pr_info("efi_fake_mem: add attr=0x%016llx to [mem 0x%016llx-0x%016llx]",
> -                       fake_mems[i].attribute, fake_mems[i].range.start,
> -                       fake_mems[i].range.end);
> +                       efi_fake_mems[i].attribute, efi_fake_mems[i].range.start,
> +                       efi_fake_mems[i].range.end);
>
>         return *p == '\0' ? 0 : -EINVAL;
>  }
> diff --git a/drivers/firmware/efi/fake_mem.h b/drivers/firmware/efi/fake_mem.h
> new file mode 100644
> index 000000000000..0390be13df96
> --- /dev/null
> +++ b/drivers/firmware/efi/fake_mem.h
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#ifndef __EFI_FAKE_MEM_H__
> +#define __EFI_FAKE_MEM_H__
> +#include <asm/efi.h>
> +
> +#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKE_MEM
> +
> +extern struct efi_mem_range efi_fake_mems[EFI_MAX_FAKEMEM];
> +extern int nr_fake_mem;
> +#endif /* __EFI_FAKE_MEM_H__ */
> diff --git a/drivers/firmware/efi/x86-fake_mem.c b/drivers/firmware/efi/x86-fake_mem.c
> new file mode 100644
> index 000000000000..8c369555dafe
> --- /dev/null
> +++ b/drivers/firmware/efi/x86-fake_mem.c
> @@ -0,0 +1,69 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright(c) 2019 Intel Corporation. All rights reserved. */
> +#include <linux/efi.h>
> +#include <asm/e820/api.h>
> +#include "fake_mem.h"
> +
> +void __init efi_fake_memmap_early(void)
> +{
> +       int i;
> +
> +       /*
> +        * efi_fake_mem() can handle all possibilities if EFI_MEMORY_SP
> +        * is ignored.
> +        */
> +       if (!efi_enabled(EFI_MEM_SOFT_RESERVE))
> +               return;
> +
> +       if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem)
> +               return;
> +
> +       /*
> +        * Given that efi_fake_memmap() needs to perform memblock
> +        * allocations it needs to run after e820__memblock_setup().
> +        * However, if efi_fake_mem specifies EFI_MEMORY_SP for a given
> +        * address range that potentially needs to mark the memory as
> +        * reserved prior to e820__memblock_setup(). Update e820
> +        * directly if EFI_MEMORY_SP is specified for an
> +        * EFI_CONVENTIONAL_MEMORY descriptor.
> +        */
> +       for (i = 0; i < nr_fake_mem; i++) {
> +               struct efi_mem_range *mem = &efi_fake_mems[i];
> +               efi_memory_desc_t *md;
> +               u64 m_start, m_end;
> +
> +               if ((mem->attribute & EFI_MEMORY_SP) == 0)
> +                       continue;
> +
> +               m_start = mem->range.start;
> +               m_end = mem->range.end;
> +               for_each_efi_memory_desc(md) {
> +                       u64 start, end;
> +
> +                       if (md->type != EFI_CONVENTIONAL_MEMORY)
> +                               continue;
> +
> +                       start = md->phys_addr;
> +                       end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) - 1;
> +
> +                       if (m_start <= end && m_end >= start)
> +                               /* fake range overlaps descriptor */;
> +                       else
> +                               continue;
> +
> +                       /*
> +                        * Trim the boundary of the e820 update to the
> +                        * descriptor in case the fake range overlaps
> +                        * !EFI_CONVENTIONAL_MEMORY
> +                        */
> +                       start = max(start, m_start);
> +                       end = min(end, m_end);
> +
> +                       if (end <= start)
> +                               continue;
> +                       e820__range_update(start, end - start + 1, E820_TYPE_RAM,
> +                                       E820_TYPE_SOFT_RESERVED);
> +                       e820__update_table(e820_table);
> +               }
> +       }
> +}
>
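
For readers following the loop in efi_fake_memmap_early(), the overlap test
and the trimming step can be sketched in isolation. This is a userspace
illustration only: struct incl_range and clamp_to_descriptor() are
hypothetical names, and both ranges use inclusive end addresses as the patch
does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive physical address range [start, end]. */
struct incl_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Clamp a fake_mem range to one memory descriptor. Returns false when
 * the two ranges do not overlap, otherwise stores the intersection.
 */
bool clamp_to_descriptor(struct incl_range fake, struct incl_range desc,
			 struct incl_range *out)
{
	if (!(fake.start <= desc.end && fake.end >= desc.start))
		return false;

	out->start = fake.start > desc.start ? fake.start : desc.start;
	out->end = fake.end < desc.end ? fake.end : desc.end;
	return true;
}
```

The size passed to an e820 update would then be out->end - out->start + 1,
matching the "end - start + 1" expression in the patch.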

* Re: [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
  2019-09-13 13:02   ` Ard Biesheuvel
@ 2019-09-13 15:02     ` Dan Williams
  0 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-09-13 15:02 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Vishal L Verma, Linux Kernel Mailing List,
	linux-efi

On Fri, Sep 13, 2019 at 6:02 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 30 Aug 2019 at 03:07, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > Given that EFI_MEMORY_SP is platform BIOS policy descision for marking
>
> decision

Fixed.

>
> > memory ranges as "reserved for a specific purpose" there will inevitably
> > be scenarios where the BIOS omits the attribute in situations where it
> > is desired. Unlike other attributes if the OS wants to reserve this
> > memory from the kernel the reservation needs to happen early in init. So
> > early, in fact, that it needs to happen before e820__memblock_setup()
> > which is a pre-requisite for efi_fake_memmap() that wants to allocate
> > memory for the updated table.
> >
> > Introduce an x86 specific efi_fake_memmap_early() that can search for
> > attempts to set EFI_MEMORY_SP via efi_fake_mem and update the e820 table
> > accordingly.
> >
>
> Is this early enough? The EFI stub runs before this, and allocates
> memory as well.

Unless I'm missing something, the stub only allocates where the kernel
will land. That should be handled by the new mem_avoid_memmap()
extensions, which consider "efi_fake_mem" in their exclusions.


* Re: [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-09-13 12:59   ` Ard Biesheuvel
@ 2019-09-13 16:22     ` Dan Williams
  2019-09-13 16:28       ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2019-09-13 16:22 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra,
	kbuild test robot, Dave Hansen, Vishal L Verma,
	Linux Kernel Mailing List, linux-efi

On Fri, Sep 13, 2019 at 6:00 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > interpretation of the EFI Memory Types as "reserved for a specific
> > purpose".
> >
> > The proposed Linux behavior for specific purpose memory is that it is
> > reserved for direct-access (device-dax) by default and not available for
> > any kernel usage, not even as an OOM fallback.  Later, through udev
> > scripts or another init mechanism, these device-dax claimed ranges can
> > be reconfigured and hot-added to the available System-RAM with a unique
> > node identifier. This device-dax management scheme implements "soft" in
> > the "soft reserved" designation by allowing some or all of the
> > reservation to be recovered as typical memory. This policy can be
> > disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
> > efi=nosoftreserve.
> >
> > This patch introduces 2 new concepts at once given the entanglement
> > between early boot enumeration relative to memory that can optionally be
> > reserved from the kernel page allocator by default. The new concepts
> > are:
> >
> > - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
> >   attribute on EFI_CONVENTIONAL memory, update the E820 map with this
> >   new type. Only perform this classification if the
> >   CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
> >   typical ram.
> >
> > - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
> >   a device driver to search iomem resources for application specific
> >   memory. Teach the iomem code to identify such ranges as "Soft Reserved".
> >
> > A follow-on change integrates parsing of the ACPI HMAT to identify the
> > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > now, just identify and reserve memory of this type.
> >
> > The translation of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP to "soft
> > reserved" is x86/E820-only, but other archs could choose to publish
> > IORES_DESC_SOFT_RESERVED resources from their platform-firmware memory
> > map handlers. Other EFI-capable platforms would need to go audit their
> > local usages of EFI_CONVENTIONAL_MEMORY to consider the soft reserved
> > case.
> >
> > Cc: <x86@kernel.org>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > Cc: Darren Hart <dvhart@infradead.org>
> > Cc: Andy Shevchenko <andy@infradead.org>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Reported-by: kbuild test robot <lkp@intel.com>
> > Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>
> Hi Dan,
>
> I understand that non-x86 may be out of scope for you, but this patch
> makes changes to x86 and generic code at the same time without regard
> for other architectures.

Yes, that did give me pause.

> I'd prefer it if we could cover ARM cleanly as well right at the start.

Let's do it.

>
> The first step would be to split out the EFI stub changes (i.e., to
> avoid allocating memory from EFI_MEMORY_SP regions) and the EFI core
> changes from the other changes. Then, I would like to ask for your
> help to get the arm64 part implemented where EFI_MEMORY_SP memory gets
> registered/reserved in a way that allows the HMAT code (which should
> be arch agnostic) to operate in the same way as it does on x86. Would
> it be enough to simply memblock_reserve() it and insert the iomem
> resource with the soft_reserved attribute?
>
> Some more comments below.
>
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |   19 +++++++--
> >  arch/x86/Kconfig                                |   21 +++++++++
> >  arch/x86/boot/compressed/eboot.c                |    7 +++
> >  arch/x86/boot/compressed/kaslr.c                |    4 ++
> >  arch/x86/include/asm/e820/types.h               |    8 ++++
> >  arch/x86/include/asm/efi-stub.h                 |   11 +++++
> >  arch/x86/kernel/e820.c                          |   12 +++++
> >  arch/x86/platform/efi/efi.c                     |   51 +++++++++++++++++++++--
> >  drivers/firmware/efi/efi.c                      |    3 +
> >  drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 +++++
> >  include/linux/efi.h                             |    1
> >  include/linux/ioport.h                          |    1
> >  12 files changed, 139 insertions(+), 11 deletions(-)
> >  create mode 100644 arch/x86/include/asm/efi-stub.h
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 1c67acd1df65..dd28f0726309 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -1152,7 +1152,8 @@
> >                         Format: {"off" | "on" | "skip[mbr]"}
> >
> >         efi=            [EFI]
> > -                       Format: { "old_map", "nochunk", "noruntime", "debug" }
> > +                       Format: { "old_map", "nochunk", "noruntime", "debug",
> > +                                 "nosoftreserve" }
> >                         old_map [X86-64]: switch to the old ioremap-based EFI
> >                         runtime services mapping. 32-bit still uses this one by
> >                         default.
> > @@ -1161,6 +1162,12 @@
> >                         firmware implementations.
> >                         noruntime : disable EFI runtime services support
> >                         debug: enable misc debug output
> > +                       nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
> > +                       attribute may cause the kernel to reserve the
> > +                       memory range for a memory mapping driver to
> > +                       claim. Specify efi=nosoftreserve to disable this
> > +                       reservation and treat the memory by its base type
> > +                       (i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
> >
> >         efi_no_storage_paranoia [EFI; X86]
> >                         Using this parameter you can use more than 50% of
> > @@ -1173,15 +1180,21 @@
> >                         updating original EFI memory map.
> >                         Region of memory which aa attribute is added to is
> >                         from ss to ss+nn.
> > +
> >                         If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000
> >                         is specified, EFI_MEMORY_MORE_RELIABLE(0x10000)
> >                         attribute is added to range 0x100000000-0x180000000 and
> >                         0x10a0000000-0x1120000000.
> >
> > +                       If efi_fake_mem=8G@9G:0x40000 is specified, the
> > +                       EFI_MEMORY_SP(0x40000) attribute is added to
> > +                       range 0x240000000-0x43fffffff.
> > +
> >                         Using this parameter you can do debugging of EFI memmap
> > -                       related feature. For example, you can do debugging of
> > +                       related features. For example, you can do debugging of
> >                         Address Range Mirroring feature even if your box
> > -                       doesn't support it.
> > +                       doesn't support it, or mark specific memory as
> > +                       "soft reserved".
> >
> >         efivar_ssdt=    [EFI; X86] Name of an EFI variable that contains an SSDT
> >                         that is to be dynamically loaded by Linux. If there are
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index 4195f44c6a09..bced13503bb1 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1981,6 +1981,27 @@ config EFI_MIXED
> >
> >            If unsure, say N.
> >
> > +config EFI_SOFT_RESERVE
> > +       bool "Reserve EFI Specific Purpose Memory"
> > +       depends on EFI && ACPI_HMAT
> > +       default ACPI_HMAT
> > +       ---help---
> > +         On systems that have mixed performance classes of memory EFI
> > +         may indicate specific purpose memory with an attribute (See
> > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > +         attribute may have unique performance characteristics compared
> > +         to the system's general purpose "System RAM" pool. On the
> > +         expectation that such memory has application specific usage,
> > +         and its base EFI memory type is "conventional" answer Y to
> > +         arrange for the kernel to reserve it as a "Soft Reserved"
> > +         resource, and set aside for direct-access (device-dax) by
> > +         default. The memory range can later be optionally assigned to
> > +         the page allocator by system administrator policy via the
> > +         device-dax kmem facility. Say N to have the kernel treat this
> > +         memory as "System RAM" by default.
> > +
> > +         If unsure, say Y.
> > +
>
> This should be in generic code.

Agree.

>
> >  config SECCOMP
> >         def_bool y
> >         prompt "Enable seccomp to safely compute untrusted bytecode"
> > diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> > index d6662fdef300..f2dc5896d770 100644
> > --- a/arch/x86/boot/compressed/eboot.c
> > +++ b/arch/x86/boot/compressed/eboot.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/pci.h>
> >
> >  #include <asm/efi.h>
> > +#include <asm/efi-stub.h>
> >  #include <asm/e820/types.h>
> >  #include <asm/setup.h>
> >  #include <asm/desc.h>
> > @@ -553,7 +554,11 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
> >                 case EFI_BOOT_SERVICES_CODE:
> >                 case EFI_BOOT_SERVICES_DATA:
> >                 case EFI_CONVENTIONAL_MEMORY:
> > -                       e820_type = E820_TYPE_RAM;
> > +                       if (!efi_nosoftreserve
> > +                                       && (d->attribute & EFI_MEMORY_SP))
> > +                               e820_type = E820_TYPE_SOFT_RESERVED;
> > +                       else
> > +                               e820_type = E820_TYPE_RAM;
> >                         break;
> >
> >                 case EFI_ACPI_MEMORY_NVS:
> > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > index 2e53c056ba20..093e84e28b7a 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -38,6 +38,7 @@
> >  #include <linux/efi.h>
> >  #include <generated/utsrelease.h>
> >  #include <asm/efi.h>
> > +#include <asm/efi-stub.h>
> >
> >  /* Macros used by the included decompressor code below. */
> >  #define STATIC
> > @@ -760,6 +761,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> >                 if (md->type != EFI_CONVENTIONAL_MEMORY)
> >                         continue;
> >
> > +               if (!efi_nosoftreserve && (md->attribute & EFI_MEMORY_SP))
> > +                       continue;
> > +
> >                 if (efi_mirror_found &&
> >                     !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
> >                         continue;
> > diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
> > index c3aa4b5e49e2..314f75d886d0 100644
> > --- a/arch/x86/include/asm/e820/types.h
> > +++ b/arch/x86/include/asm/e820/types.h
> > @@ -28,6 +28,14 @@ enum e820_type {
> >          */
> >         E820_TYPE_PRAM          = 12,
> >
> > +       /*
> > +        * Special-purpose memory is indicated to the system via the
> > +        * EFI_MEMORY_SP attribute. Define an e820 translation of this
> > +        * memory type for the purpose of reserving this range and
> > +        * marking it with the IORES_DESC_SOFT_RESERVED designation.
> > +        */
> > +       E820_TYPE_SOFT_RESERVED = 0xefffffff,
> > +
> >         /*
> >          * Reserved RAM used by the kernel itself if
> >          * CONFIG_INTEL_TXT=y is enabled, memory of this type
> > diff --git a/arch/x86/include/asm/efi-stub.h b/arch/x86/include/asm/efi-stub.h
> > new file mode 100644
> > index 000000000000..16ebd036387b
> > --- /dev/null
> > +++ b/arch/x86/include/asm/efi-stub.h
> > @@ -0,0 +1,11 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#ifndef _X86_EFI_STUB_H_
> > +#define _X86_EFI_STUB_H_
> > +
> > +#ifdef CONFIG_EFI_STUB
> > +extern bool efi_nosoftreserve;
> > +#else
> > +#define efi_nosoftreserve (1)
> > +#endif
> > +
> > +#endif /* _X86_EFI_STUB_H_ */
>
> Please put this in generic code as well (but you need a function not a
> variable - see below)
>
> > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > index 7da2bcd2b8eb..9976106b57ec 100644
> > --- a/arch/x86/kernel/e820.c
> > +++ b/arch/x86/kernel/e820.c
> > @@ -190,6 +190,7 @@ static void __init e820_print_type(enum e820_type type)
> >         case E820_TYPE_RAM:             /* Fall through: */
> >         case E820_TYPE_RESERVED_KERN:   pr_cont("usable");                      break;
> >         case E820_TYPE_RESERVED:        pr_cont("reserved");                    break;
> > +       case E820_TYPE_SOFT_RESERVED:   pr_cont("soft reserved");               break;
> >         case E820_TYPE_ACPI:            pr_cont("ACPI data");                   break;
> >         case E820_TYPE_NVS:             pr_cont("ACPI NVS");                    break;
> >         case E820_TYPE_UNUSABLE:        pr_cont("unusable");                    break;
> > @@ -1037,6 +1038,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry)
> >         case E820_TYPE_PRAM:            return "Persistent Memory (legacy)";
> >         case E820_TYPE_PMEM:            return "Persistent Memory";
> >         case E820_TYPE_RESERVED:        return "Reserved";
> > +       case E820_TYPE_SOFT_RESERVED:   return "Soft Reserved";
> >         default:                        return "Unknown E820 type";
> >         }
> >  }
> > @@ -1052,6 +1054,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
> >         case E820_TYPE_PRAM:            /* Fall-through: */
> >         case E820_TYPE_PMEM:            /* Fall-through: */
> >         case E820_TYPE_RESERVED:        /* Fall-through: */
> > +       case E820_TYPE_SOFT_RESERVED:   /* Fall-through: */
> >         default:                        return IORESOURCE_MEM;
> >         }
> >  }
> > @@ -1064,6 +1067,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
> >         case E820_TYPE_PMEM:            return IORES_DESC_PERSISTENT_MEMORY;
> >         case E820_TYPE_PRAM:            return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> >         case E820_TYPE_RESERVED:        return IORES_DESC_RESERVED;
> > +       case E820_TYPE_SOFT_RESERVED:   return IORES_DESC_SOFT_RESERVED;
> >         case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
> >         case E820_TYPE_RAM:             /* Fall-through: */
> >         case E820_TYPE_UNUSABLE:        /* Fall-through: */
> > @@ -1078,11 +1082,12 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res)
> >                 return true;
> >
> >         /*
> > -        * Treat persistent memory like device memory, i.e. reserve it
> > -        * for exclusive use of a driver
> > +        * Treat persistent memory and other special memory ranges like
> > +        * device memory, i.e. reserve it for exclusive use of a driver
> >          */
> >         switch (type) {
> >         case E820_TYPE_RESERVED:
> > +       case E820_TYPE_SOFT_RESERVED:
> >         case E820_TYPE_PRAM:
> >         case E820_TYPE_PMEM:
> >                 return false;
> > @@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void)
> >                 if (end != (resource_size_t)end)
> >                         continue;
> >
> > +               if (entry->type == E820_TYPE_SOFT_RESERVED)
> > +                       memblock_reserve(entry->addr, entry->size);
> > +
> >                 if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> >                         continue;
> >
> > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > index 0bb58eb33ca0..9cfb7f1cf25d 100644
> > --- a/arch/x86/platform/efi/efi.c
> > +++ b/arch/x86/platform/efi/efi.c
> > @@ -151,10 +151,18 @@ void __init efi_find_mirror(void)
> >   * more than the max 128 entries that can fit in the e820 legacy
> >   * (zeropage) memory map.
> >   */
> > +enum add_efi_mode {
> > +       ADD_EFI_ALL,
> > +       ADD_EFI_SOFT_RESERVED,
> > +};
> >
> > -static void __init do_add_efi_memmap(void)
> > +static void __init do_add_efi_memmap(enum add_efi_mode mode)
> >  {
> >         efi_memory_desc_t *md;
> > +       int add = 0;
> > +
> > +       if (!efi_enabled(EFI_MEMMAP))
> > +               return;
> >
> >         for_each_efi_memory_desc(md) {
> >                 unsigned long long start = md->phys_addr;
> > @@ -167,7 +175,10 @@ static void __init do_add_efi_memmap(void)
> >                 case EFI_BOOT_SERVICES_CODE:
> >                 case EFI_BOOT_SERVICES_DATA:
> >                 case EFI_CONVENTIONAL_MEMORY:
> > -                       if (md->attribute & EFI_MEMORY_WB)
> > +                       if (efi_enabled(EFI_MEM_SOFT_RESERVE)
> > +                                       && (md->attribute & EFI_MEMORY_SP))
> > +                               e820_type = E820_TYPE_SOFT_RESERVED;
> > +                       else if (md->attribute & EFI_MEMORY_WB)
> >                                 e820_type = E820_TYPE_RAM;
> >                         else
> >                                 e820_type = E820_TYPE_RESERVED;
> > @@ -193,9 +204,17 @@ static void __init do_add_efi_memmap(void)
> >                         e820_type = E820_TYPE_RESERVED;
> >                         break;
> >                 }
> > +
> > +               if (e820_type == E820_TYPE_SOFT_RESERVED)
> > +                       /* always add E820_TYPE_SOFT_RESERVED */;
> > +               else if (mode == ADD_EFI_SOFT_RESERVED)
> > +                       continue;
> > +
> > +               add++;
> >                 e820__range_add(start, size, e820_type);
> >         }
> > -       e820__update_table(e820_table);
> > +       if (add)
> > +               e820__update_table(e820_table);
> >  }
> >
> >  int __init efi_memblock_x86_reserve_range(void)
> > @@ -227,8 +246,18 @@ int __init efi_memblock_x86_reserve_range(void)
> >         if (rv)
> >                 return rv;
> >
> > -       if (add_efi_memmap)
> > -               do_add_efi_memmap();
> > +       if (add_efi_memmap) {
> > +               do_add_efi_memmap(ADD_EFI_ALL);
> > +       } else {
> > +               /*
> > +                * Given add_efi_memmap defaults to 0 and there is no e820
> > +                * mechanism for soft-reserved memory. Explicitly scan for
> > +                * soft-reserved memory. Otherwise, the mechanism to disable the
> > +                * kernel's consideration of EFI_MEMORY_SP is the
> > +                * efi=nosoftreserve option.
> > +                */
> > +               do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
> > +       }
> >
> >         WARN(efi.memmap.desc_version != 1,
> >              "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
> > @@ -781,6 +810,15 @@ static bool should_map_region(efi_memory_desc_t *md)
> >         if (IS_ENABLED(CONFIG_X86_32))
> >                 return false;
> >
> > +       /*
> > +        * EFI specific purpose memory may be reserved by default
> > +        * depending on kernel config and boot options.
> > +        */
> > +       if (md->type == EFI_CONVENTIONAL_MEMORY
> > +                       && efi_enabled(EFI_MEM_SOFT_RESERVE)
> > +                       && (md->attribute & EFI_MEMORY_SP))
> > +               return false;
> > +
> >         /*
> >          * Map all of RAM so that we can access arguments in the 1:1
> >          * mapping when making EFI runtime calls.
> > @@ -1072,6 +1110,9 @@ static int __init arch_parse_efi_cmdline(char *str)
> >         if (parse_option_str(str, "old_map"))
> >                 set_bit(EFI_OLD_MEMMAP, &efi.flags);
> >
> > +       if (parse_option_str(str, "nosoftreserve"))
> > +               clear_bit(EFI_MEM_SOFT_RESERVE, &efi.flags);
> > +
>
> Can we move this to the generic efi= handling code?

To parse_efi_cmdline() in drivers/firmware/efi/efi.c? Sure.

>
> >         return 0;
> >  }
> >  early_param("efi", arch_parse_efi_cmdline);
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index 363bb9d00fa5..6d54d5c74347 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
> >         .tpm_log                = EFI_INVALID_TABLE_ADDR,
> >         .tpm_final_log          = EFI_INVALID_TABLE_ADDR,
> >         .mem_reserve            = EFI_INVALID_TABLE_ADDR,
> > +#ifdef CONFIG_EFI_SOFT_RESERVE
> > +       .flags                  = 1UL << EFI_MEM_SOFT_RESERVE,
> > +#endif
> >  };
> >  EXPORT_SYMBOL(efi);
> >
>
> I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead,
> and invert the meaning of the bit.

...but that would mean repeated occurrences of
"!efi_enabled(EFI_MEM_NO_SOFT_RESERVE)". Doesn't the double negative
seem less readable to you?

>
> > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > index 3caae7f2cf56..35ee98a2c00c 100644
> > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > @@ -28,6 +28,7 @@
> >  #define EFI_READ_CHUNK_SIZE    (1024 * 1024)
> >
> >  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
> > +bool efi_nosoftreserve;
> >
>
> This needs a getter function if you want to access it from other
> compilation units. This has to do with how the early relocation code
> handles data symbol references. Please refer to nokaslr() for an
> example.

Ah, does that mean that the efi_nosoftreserve global variable
instances in different compilation units are effectively static the
way I currently have them defined?


* Re: [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-09-13 16:22     ` Dan Williams
@ 2019-09-13 16:28       ` Ard Biesheuvel
  2019-09-13 16:39         ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13 16:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra,
	kbuild test robot, Dave Hansen, Vishal L Verma,
	Linux Kernel Mailing List, linux-efi

On Fri, 13 Sep 2019 at 17:22, Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Sep 13, 2019 at 6:00 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > interpretation of the EFI Memory Types as "reserved for a specific
> > > purpose".
> > >
> > > The proposed Linux behavior for specific purpose memory is that it is
> > > reserved for direct-access (device-dax) by default and not available for
> > > any kernel usage, not even as an OOM fallback.  Later, through udev
> > > scripts or another init mechanism, these device-dax claimed ranges can
> > > be reconfigured and hot-added to the available System-RAM with a unique
> > > node identifier. This device-dax management scheme implements "soft" in
> > > the "soft reserved" designation by allowing some or all of the
> > > reservation to be recovered as typical memory. This policy can be
> > > disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
> > > efi=nosoftreserve.
> > >
> > > This patch introduces 2 new concepts at once given the entanglement
> > > between early boot enumeration relative to memory that can optionally be
> > > reserved from the kernel page allocator by default. The new concepts
> > > are:
> > >
> > > - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
> > >   attribute on EFI_CONVENTIONAL memory, update the E820 map with this
> > >   new type. Only perform this classification if the
> > >   CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
> > >   typical ram.
> > >
> > > - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
> > >   a device driver to search iomem resources for application specific
> > >   memory. Teach the iomem code to identify such ranges as "Soft Reserved".
> > >
> > > A follow-on change integrates parsing of the ACPI HMAT to identify the
> > > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > > now, just identify and reserve memory of this type.
> > >
> > > The translation of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP to "soft
> > > reserved" is x86/E820-only, but other archs could choose to publish
> > > IORES_DESC_SOFT_RESERVED resources from their platform-firmware memory
> > > map handlers. Other EFI-capable platforms would need to go audit their
> > > local usages of EFI_CONVENTIONAL_MEMORY to consider the soft reserved
> > > case.
> > >
> > > Cc: <x86@kernel.org>
> > > Cc: Borislav Petkov <bp@alien8.de>
> > > Cc: Ingo Molnar <mingo@redhat.com>
> > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > Cc: Darren Hart <dvhart@infradead.org>
> > > Cc: Andy Shevchenko <andy@infradead.org>
> > > Cc: Andy Lutomirski <luto@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > Reported-by: kbuild test robot <lkp@intel.com>
> > > Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >
> > Hi Dan,
> >
> > I understand that non-x86 may be out of scope for you, but this patch
> > makes changes to x86 and generic code at the same time without regard
> > for other architectures.
>
> Yes, that did give me pause.
>
> > I'd prefer it if we could cover ARM cleanly as well right at the start.
>
> Let's do it.
>
> >
> > The first step would be to split out the EFI stub changes (i.e., to
> > avoid allocating memory from EFI_MEMORY_SP regions) and the EFI core
> > changes from the other changes. Then, I would like to ask for your
> > help to get the arm64 part implemented where EFI_MEMORY_SP memory gets
> > registered/reserved in a way that allows the HMAT code (which should
> > be arch agnostic) to operate in the same way as it does on x86. Would
> > it be enough to simply memblock_reserve() it and insert the iomem
> > resource with the soft_reserved attribute?
> >
> > Some more comments below.
> >
> > > ---
> > >  Documentation/admin-guide/kernel-parameters.txt |   19 +++++++--
> > >  arch/x86/Kconfig                                |   21 +++++++++
> > >  arch/x86/boot/compressed/eboot.c                |    7 +++
> > >  arch/x86/boot/compressed/kaslr.c                |    4 ++
> > >  arch/x86/include/asm/e820/types.h               |    8 ++++
> > >  arch/x86/include/asm/efi-stub.h                 |   11 +++++
> > >  arch/x86/kernel/e820.c                          |   12 +++++
> > >  arch/x86/platform/efi/efi.c                     |   51 +++++++++++++++++++++--
> > >  drivers/firmware/efi/efi.c                      |    3 +
> > >  drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 +++++
> > >  include/linux/efi.h                             |    1
> > >  include/linux/ioport.h                          |    1
> > >  12 files changed, 139 insertions(+), 11 deletions(-)
> > >  create mode 100644 arch/x86/include/asm/efi-stub.h
> > >
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 1c67acd1df65..dd28f0726309 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -1152,7 +1152,8 @@
> > >                         Format: {"off" | "on" | "skip[mbr]"}
> > >
> > >         efi=            [EFI]
> > > -                       Format: { "old_map", "nochunk", "noruntime", "debug" }
> > > +                       Format: { "old_map", "nochunk", "noruntime", "debug",
> > > +                                 "nosoftreserve" }
> > >                         old_map [X86-64]: switch to the old ioremap-based EFI
> > >                         runtime services mapping. 32-bit still uses this one by
> > >                         default.
> > > @@ -1161,6 +1162,12 @@
> > >                         firmware implementations.
> > >                         noruntime : disable EFI runtime services support
> > >                         debug: enable misc debug output
> > > +                       nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
> > > +                       attribute may cause the kernel to reserve the
> > > +                       memory range for a memory mapping driver to
> > > +                       claim. Specify efi=nosoftreserve to disable this
> > > +                       reservation and treat the memory by its base type
> > > +                       (i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
> > >
> > >         efi_no_storage_paranoia [EFI; X86]
> > >                         Using this parameter you can use more than 50% of
> > > @@ -1173,15 +1180,21 @@
> > >                         updating original EFI memory map.
> > >                         Region of memory which aa attribute is added to is
> > >                         from ss to ss+nn.
> > > +
> > >                         If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000
> > >                         is specified, EFI_MEMORY_MORE_RELIABLE(0x10000)
> > >                         attribute is added to range 0x100000000-0x180000000 and
> > >                         0x10a0000000-0x1120000000.
> > >
> > > +                       If efi_fake_mem=8G@9G:0x40000 is specified, the
> > > +                       EFI_MEMORY_SP(0x40000) attribute is added to
> > > +                       range 0x240000000-0x43fffffff.
> > > +
> > >                         Using this parameter you can do debugging of EFI memmap
> > > -                       related feature. For example, you can do debugging of
> > > +                       related features. For example, you can do debugging of
> > >                         Address Range Mirroring feature even if your box
> > > -                       doesn't support it.
> > > +                       doesn't support it, or mark specific memory as
> > > +                       "soft reserved".
> > >
> > >         efivar_ssdt=    [EFI; X86] Name of an EFI variable that contains an SSDT
> > >                         that is to be dynamically loaded by Linux. If there are
> > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > index 4195f44c6a09..bced13503bb1 100644
> > > --- a/arch/x86/Kconfig
> > > +++ b/arch/x86/Kconfig
> > > @@ -1981,6 +1981,27 @@ config EFI_MIXED
> > >
> > >            If unsure, say N.
> > >
> > > +config EFI_SOFT_RESERVE
> > > +       bool "Reserve EFI Specific Purpose Memory"
> > > +       depends on EFI && ACPI_HMAT
> > > +       default ACPI_HMAT
> > > +       ---help---
> > > +         On systems that have mixed performance classes of memory EFI
> > > +         may indicate specific purpose memory with an attribute (See
> > > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > > +         attribute may have unique performance characteristics compared
> > > +         to the system's general purpose "System RAM" pool. On the
> > > +         expectation that such memory has application specific usage,
> > > +         and its base EFI memory type is "conventional" answer Y to
> > > +         arrange for the kernel to reserve it as a "Soft Reserved"
> > > +         resource, and set aside for direct-access (device-dax) by
> > > +         default. The memory range can later be optionally assigned to
> > > +         the page allocator by system administrator policy via the
> > > +         device-dax kmem facility. Say N to have the kernel treat this
> > > +         memory as "System RAM" by default.
> > > +
> > > +         If unsure, say Y.
> > > +
> >
> > This should be in generic code.
>
> Agree.
>
> >
> > >  config SECCOMP
> > >         def_bool y
> > >         prompt "Enable seccomp to safely compute untrusted bytecode"
> > > diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> > > index d6662fdef300..f2dc5896d770 100644
> > > --- a/arch/x86/boot/compressed/eboot.c
> > > +++ b/arch/x86/boot/compressed/eboot.c
> > > @@ -10,6 +10,7 @@
> > >  #include <linux/pci.h>
> > >
> > >  #include <asm/efi.h>
> > > +#include <asm/efi-stub.h>
> > >  #include <asm/e820/types.h>
> > >  #include <asm/setup.h>
> > >  #include <asm/desc.h>
> > > @@ -553,7 +554,11 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
> > >                 case EFI_BOOT_SERVICES_CODE:
> > >                 case EFI_BOOT_SERVICES_DATA:
> > >                 case EFI_CONVENTIONAL_MEMORY:
> > > -                       e820_type = E820_TYPE_RAM;
> > > +                       if (!efi_nosoftreserve
> > > +                                       && (d->attribute & EFI_MEMORY_SP))
> > > +                               e820_type = E820_TYPE_SOFT_RESERVED;
> > > +                       else
> > > +                               e820_type = E820_TYPE_RAM;
> > >                         break;
> > >
> > >                 case EFI_ACPI_MEMORY_NVS:
> > > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > > index 2e53c056ba20..093e84e28b7a 100644
> > > --- a/arch/x86/boot/compressed/kaslr.c
> > > +++ b/arch/x86/boot/compressed/kaslr.c
> > > @@ -38,6 +38,7 @@
> > >  #include <linux/efi.h>
> > >  #include <generated/utsrelease.h>
> > >  #include <asm/efi.h>
> > > +#include <asm/efi-stub.h>
> > >
> > >  /* Macros used by the included decompressor code below. */
> > >  #define STATIC
> > > @@ -760,6 +761,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> > >                 if (md->type != EFI_CONVENTIONAL_MEMORY)
> > >                         continue;
> > >
> > > +               if (!efi_nosoftreserve && (md->attribute & EFI_MEMORY_SP))
> > > +                       continue;
> > > +
> > >                 if (efi_mirror_found &&
> > >                     !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
> > >                         continue;
> > > diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
> > > index c3aa4b5e49e2..314f75d886d0 100644
> > > --- a/arch/x86/include/asm/e820/types.h
> > > +++ b/arch/x86/include/asm/e820/types.h
> > > @@ -28,6 +28,14 @@ enum e820_type {
> > >          */
> > >         E820_TYPE_PRAM          = 12,
> > >
> > > +       /*
> > > +        * Special-purpose memory is indicated to the system via the
> > > +        * EFI_MEMORY_SP attribute. Define an e820 translation of this
> > > +        * memory type for the purpose of reserving this range and
> > > +        * marking it with the IORES_DESC_SOFT_RESERVED designation.
> > > +        */
> > > +       E820_TYPE_SOFT_RESERVED = 0xefffffff,
> > > +
> > >         /*
> > >          * Reserved RAM used by the kernel itself if
> > >          * CONFIG_INTEL_TXT=y is enabled, memory of this type
> > > diff --git a/arch/x86/include/asm/efi-stub.h b/arch/x86/include/asm/efi-stub.h
> > > new file mode 100644
> > > index 000000000000..16ebd036387b
> > > --- /dev/null
> > > +++ b/arch/x86/include/asm/efi-stub.h
> > > @@ -0,0 +1,11 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +#ifndef _X86_EFI_STUB_H_
> > > +#define _X86_EFI_STUB_H_
> > > +
> > > +#ifdef CONFIG_EFI_STUB
> > > +extern bool efi_nosoftreserve;
> > > +#else
> > > +#define efi_nosoftreserve (1)
> > > +#endif
> > > +
> > > +#endif /* _X86_EFI_STUB_H_ */
> >
> > Please put this in generic code as well (but you need a function not a
> > variable - see below)
> >
> > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > index 7da2bcd2b8eb..9976106b57ec 100644
> > > --- a/arch/x86/kernel/e820.c
> > > +++ b/arch/x86/kernel/e820.c
> > > @@ -190,6 +190,7 @@ static void __init e820_print_type(enum e820_type type)
> > >         case E820_TYPE_RAM:             /* Fall through: */
> > >         case E820_TYPE_RESERVED_KERN:   pr_cont("usable");                      break;
> > >         case E820_TYPE_RESERVED:        pr_cont("reserved");                    break;
> > > +       case E820_TYPE_SOFT_RESERVED:   pr_cont("soft reserved");               break;
> > >         case E820_TYPE_ACPI:            pr_cont("ACPI data");                   break;
> > >         case E820_TYPE_NVS:             pr_cont("ACPI NVS");                    break;
> > >         case E820_TYPE_UNUSABLE:        pr_cont("unusable");                    break;
> > > @@ -1037,6 +1038,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry)
> > >         case E820_TYPE_PRAM:            return "Persistent Memory (legacy)";
> > >         case E820_TYPE_PMEM:            return "Persistent Memory";
> > >         case E820_TYPE_RESERVED:        return "Reserved";
> > > +       case E820_TYPE_SOFT_RESERVED:   return "Soft Reserved";
> > >         default:                        return "Unknown E820 type";
> > >         }
> > >  }
> > > @@ -1052,6 +1054,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
> > >         case E820_TYPE_PRAM:            /* Fall-through: */
> > >         case E820_TYPE_PMEM:            /* Fall-through: */
> > >         case E820_TYPE_RESERVED:        /* Fall-through: */
> > > +       case E820_TYPE_SOFT_RESERVED:   /* Fall-through: */
> > >         default:                        return IORESOURCE_MEM;
> > >         }
> > >  }
> > > @@ -1064,6 +1067,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
> > >         case E820_TYPE_PMEM:            return IORES_DESC_PERSISTENT_MEMORY;
> > >         case E820_TYPE_PRAM:            return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> > >         case E820_TYPE_RESERVED:        return IORES_DESC_RESERVED;
> > > +       case E820_TYPE_SOFT_RESERVED:   return IORES_DESC_SOFT_RESERVED;
> > >         case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
> > >         case E820_TYPE_RAM:             /* Fall-through: */
> > >         case E820_TYPE_UNUSABLE:        /* Fall-through: */
> > > @@ -1078,11 +1082,12 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res)
> > >                 return true;
> > >
> > >         /*
> > > -        * Treat persistent memory like device memory, i.e. reserve it
> > > -        * for exclusive use of a driver
> > > +        * Treat persistent memory and other special memory ranges like
> > > +        * device memory, i.e. reserve it for exclusive use of a driver
> > >          */
> > >         switch (type) {
> > >         case E820_TYPE_RESERVED:
> > > +       case E820_TYPE_SOFT_RESERVED:
> > >         case E820_TYPE_PRAM:
> > >         case E820_TYPE_PMEM:
> > >                 return false;
> > > @@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void)
> > >                 if (end != (resource_size_t)end)
> > >                         continue;
> > >
> > > +               if (entry->type == E820_TYPE_SOFT_RESERVED)
> > > +                       memblock_reserve(entry->addr, entry->size);
> > > +
> > >                 if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> > >                         continue;
> > >
> > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > > index 0bb58eb33ca0..9cfb7f1cf25d 100644
> > > --- a/arch/x86/platform/efi/efi.c
> > > +++ b/arch/x86/platform/efi/efi.c
> > > @@ -151,10 +151,18 @@ void __init efi_find_mirror(void)
> > >   * more than the max 128 entries that can fit in the e820 legacy
> > >   * (zeropage) memory map.
> > >   */
> > > +enum add_efi_mode {
> > > +       ADD_EFI_ALL,
> > > +       ADD_EFI_SOFT_RESERVED,
> > > +};
> > >
> > > -static void __init do_add_efi_memmap(void)
> > > +static void __init do_add_efi_memmap(enum add_efi_mode mode)
> > >  {
> > >         efi_memory_desc_t *md;
> > > +       int add = 0;
> > > +
> > > +       if (!efi_enabled(EFI_MEMMAP))
> > > +               return;
> > >
> > >         for_each_efi_memory_desc(md) {
> > >                 unsigned long long start = md->phys_addr;
> > > @@ -167,7 +175,10 @@ static void __init do_add_efi_memmap(void)
> > >                 case EFI_BOOT_SERVICES_CODE:
> > >                 case EFI_BOOT_SERVICES_DATA:
> > >                 case EFI_CONVENTIONAL_MEMORY:
> > > -                       if (md->attribute & EFI_MEMORY_WB)
> > > +                       if (efi_enabled(EFI_MEM_SOFT_RESERVE)
> > > +                                       && (md->attribute & EFI_MEMORY_SP))
> > > +                               e820_type = E820_TYPE_SOFT_RESERVED;
> > > +                       else if (md->attribute & EFI_MEMORY_WB)
> > >                                 e820_type = E820_TYPE_RAM;
> > >                         else
> > >                                 e820_type = E820_TYPE_RESERVED;
> > > @@ -193,9 +204,17 @@ static void __init do_add_efi_memmap(void)
> > >                         e820_type = E820_TYPE_RESERVED;
> > >                         break;
> > >                 }
> > > +
> > > +               if (e820_type == E820_TYPE_SOFT_RESERVED)
> > > +                       /* always add E820_TYPE_SOFT_RESERVED */;
> > > +               else if (mode == ADD_EFI_SOFT_RESERVED)
> > > +                       continue;
> > > +
> > > +               add++;
> > >                 e820__range_add(start, size, e820_type);
> > >         }
> > > -       e820__update_table(e820_table);
> > > +       if (add)
> > > +               e820__update_table(e820_table);
> > >  }
> > >
> > >  int __init efi_memblock_x86_reserve_range(void)
> > > @@ -227,8 +246,18 @@ int __init efi_memblock_x86_reserve_range(void)
> > >         if (rv)
> > >                 return rv;
> > >
> > > -       if (add_efi_memmap)
> > > -               do_add_efi_memmap();
> > > +       if (add_efi_memmap) {
> > > +               do_add_efi_memmap(ADD_EFI_ALL);
> > > +       } else {
> > > +               /*
> > > +                * Given that add_efi_memmap defaults to 0 and there is no e820
> > > +                * mechanism for soft-reserved memory, explicitly scan for
> > > +                * soft-reserved memory. Otherwise, the mechanism to disable the
> > > +                * kernel's consideration of EFI_MEMORY_SP is the
> > > +                * efi=nosoftreserve option.
> > > +                */
> > > +               do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
> > > +       }
> > >
> > >         WARN(efi.memmap.desc_version != 1,
> > >              "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
> > > @@ -781,6 +810,15 @@ static bool should_map_region(efi_memory_desc_t *md)
> > >         if (IS_ENABLED(CONFIG_X86_32))
> > >                 return false;
> > >
> > > +       /*
> > > +        * EFI specific purpose memory may be reserved by default
> > > +        * depending on kernel config and boot options.
> > > +        */
> > > +       if (md->type == EFI_CONVENTIONAL_MEMORY
> > > +                       && efi_enabled(EFI_MEM_SOFT_RESERVE)
> > > +                       && (md->attribute & EFI_MEMORY_SP))
> > > +               return false;
> > > +
> > >         /*
> > >          * Map all of RAM so that we can access arguments in the 1:1
> > >          * mapping when making EFI runtime calls.
> > > @@ -1072,6 +1110,9 @@ static int __init arch_parse_efi_cmdline(char *str)
> > >         if (parse_option_str(str, "old_map"))
> > >                 set_bit(EFI_OLD_MEMMAP, &efi.flags);
> > >
> > > +       if (parse_option_str(str, "nosoftreserve"))
> > > +               clear_bit(EFI_MEM_SOFT_RESERVE, &efi.flags);
> > > +
> >
> > Can we move this to the generic efi= handling code?
>
> To parse_efi_cmdline() in drivers/firmware/efi/efi.c? Sure.
>
> >
> > >         return 0;
> > >  }
> > >  early_param("efi", arch_parse_efi_cmdline);
> > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > index 363bb9d00fa5..6d54d5c74347 100644
> > > --- a/drivers/firmware/efi/efi.c
> > > +++ b/drivers/firmware/efi/efi.c
> > > @@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
> > >         .tpm_log                = EFI_INVALID_TABLE_ADDR,
> > >         .tpm_final_log          = EFI_INVALID_TABLE_ADDR,
> > >         .mem_reserve            = EFI_INVALID_TABLE_ADDR,
> > > +#ifdef CONFIG_EFI_SOFT_RESERVE
> > > +       .flags                  = 1UL << EFI_MEM_SOFT_RESERVE,
> > > +#endif
> > >  };
> > >  EXPORT_SYMBOL(efi);
> > >
> >
> > I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead,
> > and invert the meaning of the bit.
>
> ...but that would mean repeat occurrences of
> "!efi_enabled(EFI_MEM_NO_SOFT_RESERVE)", doesn't the double negative
> seem less readable to you?
>

On the one hand, yes. On the other hand, it is the only flag whose
default is 'enabled', which is also less than ideal.
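
One way to get both properties is to invert the bit but hide the
negation behind a positively named helper. The following is only a
userspace sketch of that idea, not code from this series: the bit
number, the efi_flags stand-in for efi.flags, and the helper names
efi_parse_nosoftreserve() and efi_soft_reserve_enabled() are all
assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical bit number; the real value would live in include/linux/efi.h. */
#define EFI_MEM_NO_SOFT_RESERVE	11

/* Stand-in for efi.flags: zero-initialised, so every flag defaults to off. */
static unsigned long efi_flags;

/* Userspace model of efi_enabled(): test a single bit in the flags word. */
static bool efi_enabled(int feature)
{
	return (efi_flags >> feature) & 1UL;
}

/* Model of what efi=nosoftreserve would do once the bit is inverted. */
static void efi_parse_nosoftreserve(void)
{
	efi_flags |= 1UL << EFI_MEM_NO_SOFT_RESERVE;
}

/* Positive-sense wrapper so call sites never spell out the double negative. */
static bool efi_soft_reserve_enabled(void)
{
	return !efi_enabled(EFI_MEM_NO_SOFT_RESERVE);
}
```

With this shape the flags word needs no non-zero initialiser, and
callers test efi_soft_reserve_enabled() rather than
!efi_enabled(EFI_MEM_NO_SOFT_RESERVE).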

> >
> > > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > index 3caae7f2cf56..35ee98a2c00c 100644
> > > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > @@ -28,6 +28,7 @@
> > >  #define EFI_READ_CHUNK_SIZE    (1024 * 1024)
> > >
> > >  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
> > > +bool efi_nosoftreserve;
> > >
> >
> > This needs a getter function if you want to access it from other
> > compilation units. This has to do with how the early relocation code
> > handles data symbol references. Please refer to nokaslr() for an
> > example.
>
> Ah, does that mean that the efi_nosoftreserve global variable
> instances in different compilation units are effectively static the
> way I currently have them defined?

No, the problem had to do with relocation of GOT entries on some x86
builds. Then, things got more complicated when I added the 32-bit ARM
port, which puts other constraints related to how symbols are placed
in the binary.

So please duplicate the pattern with the static variable and the
__pure getter, which has proven to be the most robust way to expose
variables to other compilation units in the stub.
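
As a rough illustration of that pattern, modelled loosely on
nokaslr(): the variable stays static to its own file and other
compilation units only ever call an accessor. This is a standalone
userspace sketch under stated assumptions, not the stub code itself;
the names efi_soft_reserve_disabled() and efi_stub_parse_option() are
hypothetical, and the kernel's __pure annotation is approximated with
the GCC/Clang attribute.

```c
#include <stdbool.h>
#include <string.h>

/* The kernel provides __pure; approximate it here for a standalone build. */
#ifndef __pure
#define __pure __attribute__((pure))
#endif

/* File-local state: no other compilation unit references this symbol. */
static bool efi_nosoftreserve;

/*
 * Accessor (hypothetical name). Other files call this function instead
 * of referencing the data symbol directly.
 */
bool __pure efi_soft_reserve_disabled(void)
{
	return efi_nosoftreserve;
}

/*
 * Hypothetical parser hook: it lives in the same file as the variable,
 * so it can set the static directly when it sees the option.
 */
void efi_stub_parse_option(const char *opt)
{
	if (strcmp(opt, "nosoftreserve") == 0)
		efi_nosoftreserve = true;
}
```

A caller such as setup_e820() would then test the return value of the
accessor rather than an extern bool.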

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-09-13 16:28       ` Ard Biesheuvel
@ 2019-09-13 16:39         ` Dan Williams
  2019-09-13 17:39           ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2019-09-13 16:39 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra,
	kbuild test robot, Dave Hansen, Vishal L Verma,
	Linux Kernel Mailing List, linux-efi

On Fri, Sep 13, 2019 at 9:29 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 13 Sep 2019 at 17:22, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Fri, Sep 13, 2019 at 6:00 AM Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > >
> > > On Fri, 30 Aug 2019 at 03:06, Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > UEFI 2.8 defines an EFI_MEMORY_SP attribute bit to augment the
> > > > interpretation of the EFI Memory Types as "reserved for a specific
> > > > purpose".
> > > >
> > > > The proposed Linux behavior for specific purpose memory is that it is
> > > > reserved for direct-access (device-dax) by default and not available for
> > > > any kernel usage, not even as an OOM fallback.  Later, through udev
> > > > scripts or another init mechanism, these device-dax claimed ranges can
> > > > be reconfigured and hot-added to the available System-RAM with a unique
> > > > node identifier. This device-dax management scheme implements "soft" in
> > > > the "soft reserved" designation by allowing some or all of the
> > > > reservation to be recovered as typical memory. This policy can be
> > > > disabled at compile-time with CONFIG_EFI_SOFT_RESERVE=n, or runtime with
> > > > efi=nosoftreserve.
> > > >
> > > > This patch introduces 2 new concepts at once given the entanglement
> > > > between early boot enumeration relative to memory that can optionally be
> > > > reserved from the kernel page allocator by default. The new concepts
> > > > are:
> > > >
> > > > - E820_TYPE_SOFT_RESERVED: Upon detecting the EFI_MEMORY_SP
> > > >   attribute on EFI_CONVENTIONAL memory, update the E820 map with this
> > > >   new type. Only perform this classification if the
> > > >   CONFIG_EFI_SOFT_RESERVE=y policy is enabled, otherwise treat it as
> > > >   typical ram.
> > > >
> > > > - IORES_DESC_SOFT_RESERVED: Add a new I/O resource descriptor for
> > > >   a device driver to search iomem resources for application specific
> > > >   memory. Teach the iomem code to identify such ranges as "Soft Reserved".
> > > >
> > > > A follow-on change integrates parsing of the ACPI HMAT to identify the
> > > > node and sub-range boundaries of EFI_MEMORY_SP designated memory. For
> > > > now, just identify and reserve memory of this type.
> > > >
> > > > The translation of EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP to "soft
> > > > reserved" is x86/E820-only, but other archs could choose to publish
> > > > IORES_DESC_SOFT_RESERVED resources from their platform-firmware memory
> > > > map handlers. Other EFI-capable platforms would need to go audit their
> > > > local usages of EFI_CONVENTIONAL_MEMORY to consider the soft reserved
> > > > case.
> > > >
> > > > Cc: <x86@kernel.org>
> > > > Cc: Borislav Petkov <bp@alien8.de>
> > > > Cc: Ingo Molnar <mingo@redhat.com>
> > > > Cc: "H. Peter Anvin" <hpa@zytor.com>
> > > > Cc: Darren Hart <dvhart@infradead.org>
> > > > Cc: Andy Shevchenko <andy@infradead.org>
> > > > Cc: Andy Lutomirski <luto@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Thomas Gleixner <tglx@linutronix.de>
> > > > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > > > Reported-by: kbuild test robot <lkp@intel.com>
> > > > Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > > Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> > >
> > > Hi Dan,
> > >
> > > I understand that non-x86 may be out of scope for you, but this patch
> > > makes changes to x86 and generic code at the same time without regard
> > > for other architectures.
> >
> > Yes, that did give me pause.
> >
> > > I'd prefer it if we could cover ARM cleanly as well right at the start.
> >
> > Let's do it.
> >
> > >
> > > The first step would be to split out the EFI stub changes (i.e., to
> > > avoid allocating memory from EFI_MEMORY_SP regions) and the EFI core
> > > changes from the other changes. Then, I would like to ask for your
> > > help to get the arm64 part implemented where EFI_MEMORY_SP memory gets
> > > registered/reserved in a way that allows the HMAT code (which should
> > > be arch agnostic) to operate in the same way as it does on x86. Would
> > > it be enough to simply memblock_reserve() it and insert the iomem
> > > resource with the soft_reserved attribute?
> > >
> > > Some more comments below.
> > >
> > > > ---
> > > >  Documentation/admin-guide/kernel-parameters.txt |   19 +++++++--
> > > >  arch/x86/Kconfig                                |   21 +++++++++
> > > >  arch/x86/boot/compressed/eboot.c                |    7 +++
> > > >  arch/x86/boot/compressed/kaslr.c                |    4 ++
> > > >  arch/x86/include/asm/e820/types.h               |    8 ++++
> > > >  arch/x86/include/asm/efi-stub.h                 |   11 +++++
> > > >  arch/x86/kernel/e820.c                          |   12 +++++
> > > >  arch/x86/platform/efi/efi.c                     |   51 +++++++++++++++++++++--
> > > >  drivers/firmware/efi/efi.c                      |    3 +
> > > >  drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 +++++
> > > >  include/linux/efi.h                             |    1
> > > >  include/linux/ioport.h                          |    1
> > > >  12 files changed, 139 insertions(+), 11 deletions(-)
> > > >  create mode 100644 arch/x86/include/asm/efi-stub.h
> > > >
> > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > > index 1c67acd1df65..dd28f0726309 100644
> > > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > > @@ -1152,7 +1152,8 @@
> > > >                         Format: {"off" | "on" | "skip[mbr]"}
> > > >
> > > >         efi=            [EFI]
> > > > -                       Format: { "old_map", "nochunk", "noruntime", "debug" }
> > > > +                       Format: { "old_map", "nochunk", "noruntime", "debug",
> > > > +                                 "nosoftreserve" }
> > > >                         old_map [X86-64]: switch to the old ioremap-based EFI
> > > >                         runtime services mapping. 32-bit still uses this one by
> > > >                         default.
> > > > @@ -1161,6 +1162,12 @@
> > > >                         firmware implementations.
> > > >                         noruntime : disable EFI runtime services support
> > > >                         debug: enable misc debug output
> > > > +                       nosoftreserve: The EFI_MEMORY_SP (Specific Purpose)
> > > > +                       attribute may cause the kernel to reserve the
> > > > +                       memory range for a memory mapping driver to
> > > > +                       claim. Specify efi=nosoftreserve to disable this
> > > > +                       reservation and treat the memory by its base type
> > > > +                       (i.e. EFI_CONVENTIONAL_MEMORY / "System RAM").
> > > >
> > > >         efi_no_storage_paranoia [EFI; X86]
> > > >                         Using this parameter you can use more than 50% of
> > > > @@ -1173,15 +1180,21 @@
> > > >                         updating original EFI memory map.
> > > >                         Region of memory which aa attribute is added to is
> > > >                         from ss to ss+nn.
> > > > +
> > > >                         If efi_fake_mem=2G@4G:0x10000,2G@0x10a0000000:0x10000
> > > >                         is specified, EFI_MEMORY_MORE_RELIABLE(0x10000)
> > > >                         attribute is added to range 0x100000000-0x180000000 and
> > > >                         0x10a0000000-0x1120000000.
> > > >
> > > > +                       If efi_fake_mem=8G@9G:0x40000 is specified, the
> > > > +                       EFI_MEMORY_SP(0x40000) attribute is added to
> > > > +                       range 0x240000000-0x43fffffff.
> > > > +
> > > >                         Using this parameter you can do debugging of EFI memmap
> > > > -                       related feature. For example, you can do debugging of
> > > > +                       related features. For example, you can do debugging of
> > > >                         Address Range Mirroring feature even if your box
> > > > -                       doesn't support it.
> > > > +                       doesn't support it, or mark specific memory as
> > > > +                       "soft reserved".
> > > >
> > > >         efivar_ssdt=    [EFI; X86] Name of an EFI variable that contains an SSDT
> > > >                         that is to be dynamically loaded by Linux. If there are
> > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > > > index 4195f44c6a09..bced13503bb1 100644
> > > > --- a/arch/x86/Kconfig
> > > > +++ b/arch/x86/Kconfig
> > > > @@ -1981,6 +1981,27 @@ config EFI_MIXED
> > > >
> > > >            If unsure, say N.
> > > >
> > > > +config EFI_SOFT_RESERVE
> > > > +       bool "Reserve EFI Specific Purpose Memory"
> > > > +       depends on EFI && ACPI_HMAT
> > > > +       default ACPI_HMAT
> > > > +       ---help---
> > > > +         On systems that have mixed performance classes of memory EFI
> > > > +         may indicate specific purpose memory with an attribute (See
> > > > +         EFI_MEMORY_SP in UEFI 2.8). A memory range tagged with this
> > > > +         attribute may have unique performance characteristics compared
> > > > +         to the system's general purpose "System RAM" pool. On the
> > > > +         expectation that such memory has application specific usage,
> > > > +         and its base EFI memory type is "conventional" answer Y to
> > > > +         arrange for the kernel to reserve it as a "Soft Reserved"
> > > > +         resource, and set aside for direct-access (device-dax) by
> > > > +         default. The memory range can later be optionally assigned to
> > > > +         the page allocator by system administrator policy via the
> > > > +         device-dax kmem facility. Say N to have the kernel treat this
> > > > +         memory as "System RAM" by default.
> > > > +
> > > > +         If unsure, say Y.
> > > > +
> > >
> > > This should be in generic code.
> >
> > Agree.
> >
> > >
> > > >  config SECCOMP
> > > >         def_bool y
> > > >         prompt "Enable seccomp to safely compute untrusted bytecode"
> > > > diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
> > > > index d6662fdef300..f2dc5896d770 100644
> > > > --- a/arch/x86/boot/compressed/eboot.c
> > > > +++ b/arch/x86/boot/compressed/eboot.c
> > > > @@ -10,6 +10,7 @@
> > > >  #include <linux/pci.h>
> > > >
> > > >  #include <asm/efi.h>
> > > > +#include <asm/efi-stub.h>
> > > >  #include <asm/e820/types.h>
> > > >  #include <asm/setup.h>
> > > >  #include <asm/desc.h>
> > > > @@ -553,7 +554,11 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s
> > > >                 case EFI_BOOT_SERVICES_CODE:
> > > >                 case EFI_BOOT_SERVICES_DATA:
> > > >                 case EFI_CONVENTIONAL_MEMORY:
> > > > -                       e820_type = E820_TYPE_RAM;
> > > > +                       if (!efi_nosoftreserve
> > > > +                                       && (d->attribute & EFI_MEMORY_SP))
> > > > +                               e820_type = E820_TYPE_SOFT_RESERVED;
> > > > +                       else
> > > > +                               e820_type = E820_TYPE_RAM;
> > > >                         break;
> > > >
> > > >                 case EFI_ACPI_MEMORY_NVS:
> > > > diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> > > > index 2e53c056ba20..093e84e28b7a 100644
> > > > --- a/arch/x86/boot/compressed/kaslr.c
> > > > +++ b/arch/x86/boot/compressed/kaslr.c
> > > > @@ -38,6 +38,7 @@
> > > >  #include <linux/efi.h>
> > > >  #include <generated/utsrelease.h>
> > > >  #include <asm/efi.h>
> > > > +#include <asm/efi-stub.h>
> > > >
> > > >  /* Macros used by the included decompressor code below. */
> > > >  #define STATIC
> > > > @@ -760,6 +761,9 @@ process_efi_entries(unsigned long minimum, unsigned long image_size)
> > > >                 if (md->type != EFI_CONVENTIONAL_MEMORY)
> > > >                         continue;
> > > >
> > > > +               if (!efi_nosoftreserve && (md->attribute & EFI_MEMORY_SP))
> > > > +                       continue;
> > > > +
> > > >                 if (efi_mirror_found &&
> > > >                     !(md->attribute & EFI_MEMORY_MORE_RELIABLE))
> > > >                         continue;
> > > > diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
> > > > index c3aa4b5e49e2..314f75d886d0 100644
> > > > --- a/arch/x86/include/asm/e820/types.h
> > > > +++ b/arch/x86/include/asm/e820/types.h
> > > > @@ -28,6 +28,14 @@ enum e820_type {
> > > >          */
> > > >         E820_TYPE_PRAM          = 12,
> > > >
> > > > +       /*
> > > > +        * Special-purpose memory is indicated to the system via the
> > > > +        * EFI_MEMORY_SP attribute. Define an e820 translation of this
> > > > +        * memory type for the purpose of reserving this range and
> > > > +        * marking it with the IORES_DESC_SOFT_RESERVED designation.
> > > > +        */
> > > > +       E820_TYPE_SOFT_RESERVED = 0xefffffff,
> > > > +
> > > >         /*
> > > >          * Reserved RAM used by the kernel itself if
> > > >          * CONFIG_INTEL_TXT=y is enabled, memory of this type
> > > > diff --git a/arch/x86/include/asm/efi-stub.h b/arch/x86/include/asm/efi-stub.h
> > > > new file mode 100644
> > > > index 000000000000..16ebd036387b
> > > > --- /dev/null
> > > > +++ b/arch/x86/include/asm/efi-stub.h
> > > > @@ -0,0 +1,11 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +#ifndef _X86_EFI_STUB_H_
> > > > +#define _X86_EFI_STUB_H_
> > > > +
> > > > +#ifdef CONFIG_EFI_STUB
> > > > +extern bool efi_nosoftreserve;
> > > > +#else
> > > > +#define efi_nosoftreserve (1)
> > > > +#endif
> > > > +
> > > > +#endif /* _X86_EFI_STUB_H_ */
> > >
> > > Please put this in generic code as well (but you need a function not a
> > > variable - see below)
> > >
> > > > diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
> > > > index 7da2bcd2b8eb..9976106b57ec 100644
> > > > --- a/arch/x86/kernel/e820.c
> > > > +++ b/arch/x86/kernel/e820.c
> > > > @@ -190,6 +190,7 @@ static void __init e820_print_type(enum e820_type type)
> > > >         case E820_TYPE_RAM:             /* Fall through: */
> > > >         case E820_TYPE_RESERVED_KERN:   pr_cont("usable");                      break;
> > > >         case E820_TYPE_RESERVED:        pr_cont("reserved");                    break;
> > > > +       case E820_TYPE_SOFT_RESERVED:   pr_cont("soft reserved");               break;
> > > >         case E820_TYPE_ACPI:            pr_cont("ACPI data");                   break;
> > > >         case E820_TYPE_NVS:             pr_cont("ACPI NVS");                    break;
> > > >         case E820_TYPE_UNUSABLE:        pr_cont("unusable");                    break;
> > > > @@ -1037,6 +1038,7 @@ static const char *__init e820_type_to_string(struct e820_entry *entry)
> > > >         case E820_TYPE_PRAM:            return "Persistent Memory (legacy)";
> > > >         case E820_TYPE_PMEM:            return "Persistent Memory";
> > > >         case E820_TYPE_RESERVED:        return "Reserved";
> > > > +       case E820_TYPE_SOFT_RESERVED:   return "Soft Reserved";
> > > >         default:                        return "Unknown E820 type";
> > > >         }
> > > >  }
> > > > @@ -1052,6 +1054,7 @@ static unsigned long __init e820_type_to_iomem_type(struct e820_entry *entry)
> > > >         case E820_TYPE_PRAM:            /* Fall-through: */
> > > >         case E820_TYPE_PMEM:            /* Fall-through: */
> > > >         case E820_TYPE_RESERVED:        /* Fall-through: */
> > > > +       case E820_TYPE_SOFT_RESERVED:   /* Fall-through: */
> > > >         default:                        return IORESOURCE_MEM;
> > > >         }
> > > >  }
> > > > @@ -1064,6 +1067,7 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
> > > >         case E820_TYPE_PMEM:            return IORES_DESC_PERSISTENT_MEMORY;
> > > >         case E820_TYPE_PRAM:            return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> > > >         case E820_TYPE_RESERVED:        return IORES_DESC_RESERVED;
> > > > +       case E820_TYPE_SOFT_RESERVED:   return IORES_DESC_SOFT_RESERVED;
> > > >         case E820_TYPE_RESERVED_KERN:   /* Fall-through: */
> > > >         case E820_TYPE_RAM:             /* Fall-through: */
> > > >         case E820_TYPE_UNUSABLE:        /* Fall-through: */
> > > > @@ -1078,11 +1082,12 @@ static bool __init do_mark_busy(enum e820_type type, struct resource *res)
> > > >                 return true;
> > > >
> > > >         /*
> > > > -        * Treat persistent memory like device memory, i.e. reserve it
> > > > -        * for exclusive use of a driver
> > > > +        * Treat persistent memory and other special memory ranges like
> > > > +        * device memory, i.e. reserve it for exclusive use of a driver
> > > >          */
> > > >         switch (type) {
> > > >         case E820_TYPE_RESERVED:
> > > > +       case E820_TYPE_SOFT_RESERVED:
> > > >         case E820_TYPE_PRAM:
> > > >         case E820_TYPE_PMEM:
> > > >                 return false;
> > > > @@ -1285,6 +1290,9 @@ void __init e820__memblock_setup(void)
> > > >                 if (end != (resource_size_t)end)
> > > >                         continue;
> > > >
> > > > +               if (entry->type == E820_TYPE_SOFT_RESERVED)
> > > > +                       memblock_reserve(entry->addr, entry->size);
> > > > +
> > > >                 if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
> > > >                         continue;
> > > >
> > > > diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> > > > index 0bb58eb33ca0..9cfb7f1cf25d 100644
> > > > --- a/arch/x86/platform/efi/efi.c
> > > > +++ b/arch/x86/platform/efi/efi.c
> > > > @@ -151,10 +151,18 @@ void __init efi_find_mirror(void)
> > > >   * more than the max 128 entries that can fit in the e820 legacy
> > > >   * (zeropage) memory map.
> > > >   */
> > > > +enum add_efi_mode {
> > > > +       ADD_EFI_ALL,
> > > > +       ADD_EFI_SOFT_RESERVED,
> > > > +};
> > > >
> > > > -static void __init do_add_efi_memmap(void)
> > > > +static void __init do_add_efi_memmap(enum add_efi_mode mode)
> > > >  {
> > > >         efi_memory_desc_t *md;
> > > > +       int add = 0;
> > > > +
> > > > +       if (!efi_enabled(EFI_MEMMAP))
> > > > +               return;
> > > >
> > > >         for_each_efi_memory_desc(md) {
> > > >                 unsigned long long start = md->phys_addr;
> > > > @@ -167,7 +175,10 @@ static void __init do_add_efi_memmap(void)
> > > >                 case EFI_BOOT_SERVICES_CODE:
> > > >                 case EFI_BOOT_SERVICES_DATA:
> > > >                 case EFI_CONVENTIONAL_MEMORY:
> > > > -                       if (md->attribute & EFI_MEMORY_WB)
> > > > +                       if (efi_enabled(EFI_MEM_SOFT_RESERVE)
> > > > +                                       && (md->attribute & EFI_MEMORY_SP))
> > > > +                               e820_type = E820_TYPE_SOFT_RESERVED;
> > > > +                       else if (md->attribute & EFI_MEMORY_WB)
> > > >                                 e820_type = E820_TYPE_RAM;
> > > >                         else
> > > >                                 e820_type = E820_TYPE_RESERVED;
> > > > @@ -193,9 +204,17 @@ static void __init do_add_efi_memmap(void)
> > > >                         e820_type = E820_TYPE_RESERVED;
> > > >                         break;
> > > >                 }
> > > > +
> > > > +               if (e820_type == E820_TYPE_SOFT_RESERVED)
> > > > +                       /* always add E820_TYPE_SOFT_RESERVED */;
> > > > +               else if (mode == ADD_EFI_SOFT_RESERVED)
> > > > +                       continue;
> > > > +
> > > > +               add++;
> > > >                 e820__range_add(start, size, e820_type);
> > > >         }
> > > > -       e820__update_table(e820_table);
> > > > +       if (add)
> > > > +               e820__update_table(e820_table);
> > > >  }
> > > >
> > > >  int __init efi_memblock_x86_reserve_range(void)
> > > > @@ -227,8 +246,18 @@ int __init efi_memblock_x86_reserve_range(void)
> > > >         if (rv)
> > > >                 return rv;
> > > >
> > > > -       if (add_efi_memmap)
> > > > -               do_add_efi_memmap();
> > > > +       if (add_efi_memmap) {
> > > > +               do_add_efi_memmap(ADD_EFI_ALL);
> > > > +       } else {
> > > > +               /*
> > > > +                * Given add_efi_memmap defaults to 0 and there is no e820
> > > > +                * mechanism for soft-reserved memory, explicitly scan for
> > > > +                * soft-reserved memory. Otherwise, the mechanism to disable the
> > > > +                * kernel's consideration of EFI_MEMORY_SP is the
> > > > +                * efi=nosoftreserve option.
> > > > +                */
> > > > +               do_add_efi_memmap(ADD_EFI_SOFT_RESERVED);
> > > > +       }
> > > >
> > > >         WARN(efi.memmap.desc_version != 1,
> > > >              "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
> > > > @@ -781,6 +810,15 @@ static bool should_map_region(efi_memory_desc_t *md)
> > > >         if (IS_ENABLED(CONFIG_X86_32))
> > > >                 return false;
> > > >
> > > > +       /*
> > > > +        * EFI specific purpose memory may be reserved by default
> > > > +        * depending on kernel config and boot options.
> > > > +        */
> > > > +       if (md->type == EFI_CONVENTIONAL_MEMORY
> > > > +                       && efi_enabled(EFI_MEM_SOFT_RESERVE)
> > > > +                       && (md->attribute & EFI_MEMORY_SP))
> > > > +               return false;
> > > > +
> > > >         /*
> > > >          * Map all of RAM so that we can access arguments in the 1:1
> > > >          * mapping when making EFI runtime calls.
> > > > @@ -1072,6 +1110,9 @@ static int __init arch_parse_efi_cmdline(char *str)
> > > >         if (parse_option_str(str, "old_map"))
> > > >                 set_bit(EFI_OLD_MEMMAP, &efi.flags);
> > > >
> > > > +       if (parse_option_str(str, "nosoftreserve"))
> > > > +               clear_bit(EFI_MEM_SOFT_RESERVE, &efi.flags);
> > > > +
> > >
> > > Can we move this to the generic efi= handling code?
> >
> > To parse_efi_cmdline() in drivers/firmware/efi/efi.c? Sure.
> >
> > >
> > > >         return 0;
> > > >  }
> > > >  early_param("efi", arch_parse_efi_cmdline);
> > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > > index 363bb9d00fa5..6d54d5c74347 100644
> > > > --- a/drivers/firmware/efi/efi.c
> > > > +++ b/drivers/firmware/efi/efi.c
> > > > @@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
> > > >         .tpm_log                = EFI_INVALID_TABLE_ADDR,
> > > >         .tpm_final_log          = EFI_INVALID_TABLE_ADDR,
> > > >         .mem_reserve            = EFI_INVALID_TABLE_ADDR,
> > > > +#ifdef CONFIG_EFI_SOFT_RESERVE
> > > > +       .flags                  = 1UL << EFI_MEM_SOFT_RESERVE,
> > > > +#endif
> > > >  };
> > > >  EXPORT_SYMBOL(efi);
> > > >
> > >
> > > I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead,
> > > and invert the meaning of the bit.
> >
> > ...but that would mean repeat occurrences of
> > "!efi_enabled(EFI_MEM_NO_SOFT_RESERVE)", doesn't the double negative
> > seem less readable to you?
> >
>
> On the one hand, yes. On the other hand, it is the only flag whose
> default is 'enabled' which is also less than ideal.

Ok, I can get on board with "default 0" being the non exception state
of the flags.

>
> > >
> > > > diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > > index 3caae7f2cf56..35ee98a2c00c 100644
> > > > --- a/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > > +++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
> > > > @@ -28,6 +28,7 @@
> > > >  #define EFI_READ_CHUNK_SIZE    (1024 * 1024)
> > > >
> > > >  static unsigned long __chunk_size = EFI_READ_CHUNK_SIZE;
> > > > +bool efi_nosoftreserve;
> > > >
> > >
> > > This needs a getter function if you want to access it from other
> > > compilation units. This has to do with how the early relocation code
> > > handles data symbol references. Please refer to nokaslr() for an
> > > example.
> >
> > Ah, does that mean that the efi_nosoftreserve global variable
> > instances in different compilation units are effectively static the
> > way I currently have them defined?
>
> No, the problem had to do with relocation of GOT entries on some x86
> builds. Then, things got more complicated when I added the 32-bit ARM
> port, which puts other constraints related to how symbols are placed
> in the binary.
>
> So please duplicate the pattern with the static variable and the
> __pure getter, which has proven to be the most robust way to expose
> variables to other compilation units in the stub.

Will do.
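
For reference, a minimal sketch of the pattern as I understand it: keep the
flag static to its compilation unit and expose it only through a __pure
getter, in the style of nokaslr(). The names set_nosoftreserve() and
is_nosoftreserve() are hypothetical, and the __pure definition below is a
user-space stand-in for the kernel's annotation, so treat this as an
illustration rather than the final patch.

```c
#include <stdbool.h>

/* Stand-in for the kernel's __pure annotation (assumes GCC or Clang). */
#ifndef __pure
#define __pure __attribute__((pure))
#endif

/* File-local flag: no other compilation unit references this symbol. */
static bool efi_nosoftreserve;

/* Hypothetical setter, used only inside this file while parsing options. */
static void set_nosoftreserve(bool val)
{
	efi_nosoftreserve = val;
}

/* The only cross-unit interface: a getter that reads the static flag. */
bool __pure is_nosoftreserve(void)
{
	return efi_nosoftreserve;
}
```

Other compilation units would then call is_nosoftreserve() instead of
referencing a data symbol directly.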


* Re: [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-09-13 16:39         ` Dan Williams
@ 2019-09-13 17:39           ` Ard Biesheuvel
  2019-09-13 17:54             ` Ard Biesheuvel
  0 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13 17:39 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra,
	kbuild test robot, Dave Hansen, Vishal L Verma,
	Linux Kernel Mailing List, linux-efi

On Fri, 13 Sep 2019 at 17:39, Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Sep 13, 2019 at 9:29 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > On Fri, 13 Sep 2019 at 17:22, Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > On Fri, Sep 13, 2019 at 6:00 AM Ard Biesheuvel
> > > <ard.biesheuvel@linaro.org> wrote:
...
> > > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > > > index 363bb9d00fa5..6d54d5c74347 100644
> > > > > --- a/drivers/firmware/efi/efi.c
> > > > > +++ b/drivers/firmware/efi/efi.c
> > > > > @@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
> > > > >         .tpm_log                = EFI_INVALID_TABLE_ADDR,
> > > > >         .tpm_final_log          = EFI_INVALID_TABLE_ADDR,
> > > > >         .mem_reserve            = EFI_INVALID_TABLE_ADDR,
> > > > > +#ifdef CONFIG_EFI_SOFT_RESERVE
> > > > > +       .flags                  = 1UL << EFI_MEM_SOFT_RESERVE,
> > > > > +#endif
> > > > >  };
> > > > >  EXPORT_SYMBOL(efi);
> > > > >
> > > >
> > > > I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead,
> > > > and invert the meaning of the bit.
> > >
> > > ...but that would mean repeat occurrences of
> > > "!efi_enabled(EFI_MEM_NO_SOFT_RESERVE)", doesn't the double negative
> > > seem less readable to you?
> > >
> >
> > On the one hand, yes. On the other hand, it is the only flag whose
> > default is 'enabled' which is also less than ideal.
>
> Ok, I can get on board with "default 0" being the non exception state
> of the flags.
>

In fact, let's just add something like

static inline bool efi_soft_reserve_enabled(void)
{
    return IS_ENABLED(CONFIG_EFI_SOFT_RESERVE) &&
           !efi_enabled(EFI_MEM_NO_SOFT_RESERVE);
}

to linux/efi.h and use that in the code?
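
As a rough illustration of how the inverted bit plus helper would read at
call sites, here is a user-space sketch. The efi_flags word, the bit number,
the CONFIG stand-in macro, parse_nosoftreserve() and is_soft_reserved() are
all assumptions made up for the example; only the helper body mirrors the
suggestion above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins so the sketch builds outside the kernel. */
#define SOFT_RESERVE_CONFIGURED	1	/* models IS_ENABLED(CONFIG_EFI_SOFT_RESERVE) */
#define EFI_MEM_NO_SOFT_RESERVE	11	/* bit number chosen arbitrarily */
#define EFI_MEMORY_SP		0x40000ULL /* attribute bit per UEFI 2.8 */

static unsigned long efi_flags;		/* every flag defaults to 0 */

static bool efi_enabled(int feature)
{
	return (efi_flags >> feature) & 1UL;
}

/* Single place that hides the double negative from callers. */
static bool efi_soft_reserve_enabled(void)
{
	return SOFT_RESERVE_CONFIGURED &&
	       !efi_enabled(EFI_MEM_NO_SOFT_RESERVE);
}

/* Models the effect of efi=nosoftreserve on the command line. */
static void parse_nosoftreserve(void)
{
	efi_flags |= 1UL << EFI_MEM_NO_SOFT_RESERVE;
}

/* Call-site shape: should a range with these attributes be soft reserved? */
static bool is_soft_reserved(unsigned long long attribute)
{
	return efi_soft_reserve_enabled() && (attribute & EFI_MEMORY_SP);
}
```

With this shape the flag keeps "default 0" as the normal state, and callers
never spell out !efi_enabled(EFI_MEM_NO_SOFT_RESERVE) themselves.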


* Re: [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
  2019-09-13 17:39           ` Ard Biesheuvel
@ 2019-09-13 17:54             ` Ard Biesheuvel
  0 siblings, 0 replies; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13 17:54 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Darren Hart,
	Andy Shevchenko, Andy Lutomirski, Peter Zijlstra,
	kbuild test robot, Dave Hansen, Vishal L Verma,
	Linux Kernel Mailing List, linux-efi

On Fri, 13 Sep 2019 at 18:39, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 13 Sep 2019 at 17:39, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > On Fri, Sep 13, 2019 at 9:29 AM Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > >
> > > On Fri, 13 Sep 2019 at 17:22, Dan Williams <dan.j.williams@intel.com> wrote:
> > > >
> > > > On Fri, Sep 13, 2019 at 6:00 AM Ard Biesheuvel
> > > > <ard.biesheuvel@linaro.org> wrote:
> ...
> > > > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > > > > index 363bb9d00fa5..6d54d5c74347 100644
> > > > > > --- a/drivers/firmware/efi/efi.c
> > > > > > +++ b/drivers/firmware/efi/efi.c
> > > > > > @@ -52,6 +52,9 @@ struct efi __read_mostly efi = {
> > > > > >         .tpm_log                = EFI_INVALID_TABLE_ADDR,
> > > > > >         .tpm_final_log          = EFI_INVALID_TABLE_ADDR,
> > > > > >         .mem_reserve            = EFI_INVALID_TABLE_ADDR,
> > > > > > +#ifdef CONFIG_EFI_SOFT_RESERVE
> > > > > > +       .flags                  = 1UL << EFI_MEM_SOFT_RESERVE,
> > > > > > +#endif
> > > > > >  };
> > > > > >  EXPORT_SYMBOL(efi);
> > > > > >
> > > > >
> > > > > I'd prefer it if we could call this EFI_MEM_NO_SOFT_RESERVE instead,
> > > > > and invert the meaning of the bit.
> > > >
> > > > ...but that would mean repeat occurrences of
> > > > "!efi_enabled(EFI_MEM_NO_SOFT_RESERVE)", doesn't the double negative
> > > > seem less readable to you?
> > > >
> > >
> > > On the one hand, yes. On the other hand, it is the only flag whose
> > > default is 'enabled' which is also less than ideal.
> >
> > Ok, I can get on board with "default 0" being the non exception state
> > of the flags.
> >
>
> In fact, let's just add something like
>
> static inline bool efi_soft_reserve_enabled(void)
> {
>     return IS_ENABLED(CONFIG_EFI_SOFT_RESERVE) &&
>            !efi_enabled(EFI_MEM_NO_SOFT_RESERVE);
> }
>
> to linux/efi.h and use that in the code?

Or even better, add just the declaration to linux/efi.h:

bool __pure efi_soft_reserve_enabled(void);

and put one implementation in efi-stub-helper.c:

bool __pure efi_soft_reserve_enabled(void)
{
    return IS_ENABLED(CONFIG_EFI_SOFT_RESERVE) &&
           !efi_nosoftreserve;
}

and the one above in drivers/firmware/efi/efi.c


* Re: [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
  2019-08-30  1:52 ` [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP Dan Williams
  2019-09-10  6:48   ` Ingo Molnar
  2019-09-13 13:02   ` Ard Biesheuvel
@ 2019-09-13 19:48   ` Ard Biesheuvel
  2019-09-13 20:43     ` Dan Williams
  2 siblings, 1 reply; 28+ messages in thread
From: Ard Biesheuvel @ 2019-09-13 19:48 UTC (permalink / raw)
  To: Dan Williams
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Vishal L Verma, Linux Kernel Mailing List,
	linux-efi

On Fri, 30 Aug 2019 at 03:07, Dan Williams <dan.j.williams@intel.com> wrote:
>
...
> diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
> index 4ac2de4dfa72..d7a6db03ea79 100644
> --- a/drivers/firmware/efi/Makefile
> +++ b/drivers/firmware/efi/Makefile
> @@ -20,13 +20,16 @@ obj-$(CONFIG_UEFI_CPER)                     += cper.o
>  obj-$(CONFIG_EFI_RUNTIME_MAP)          += runtime-map.o
>  obj-$(CONFIG_EFI_RUNTIME_WRAPPERS)     += runtime-wrappers.o
>  obj-$(CONFIG_EFI_STUB)                 += libstub/
> -obj-$(CONFIG_EFI_FAKE_MEMMAP)          += fake_mem.o
> +obj-$(CONFIG_EFI_FAKE_MEMMAP)          += fake_map.o
>  obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)   += efibc.o
>  obj-$(CONFIG_EFI_TEST)                 += test/
>  obj-$(CONFIG_EFI_DEV_PATH_PARSER)      += dev-path-parser.o
>  obj-$(CONFIG_APPLE_PROPERTIES)         += apple-properties.o
>  obj-$(CONFIG_EFI_RCI2_TABLE)           += rci2-table.o
>
> +fake_map-y                             += fake_mem.o
> +fake_map-$(CONFIG_X86)                 += x86-fake_mem.o
> +

Please use

fake-mem-$(CONFIG_X86) := x86-fake_mem.o
obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_mem.o $(fake-mem-y)

instead, and please use either - or _ in filenames, not both.


* Re: [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
  2019-09-13 19:48   ` Ard Biesheuvel
@ 2019-09-13 20:43     ` Dan Williams
  0 siblings, 0 replies; 28+ messages in thread
From: Dan Williams @ 2019-09-13 20:43 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Thomas Gleixner, Rafael J. Wysocki, the arch/x86 maintainers,
	Borislav Petkov, Ingo Molnar, H. Peter Anvin, Dave Hansen,
	Peter Zijlstra, Vishal L Verma, Linux Kernel Mailing List,
	linux-efi

On Fri, Sep 13, 2019 at 12:49 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
> On Fri, 30 Aug 2019 at 03:07, Dan Williams <dan.j.williams@intel.com> wrote:
> >
> ...
> > diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
> > index 4ac2de4dfa72..d7a6db03ea79 100644
> > --- a/drivers/firmware/efi/Makefile
> > +++ b/drivers/firmware/efi/Makefile
> > @@ -20,13 +20,16 @@ obj-$(CONFIG_UEFI_CPER)                     += cper.o
> >  obj-$(CONFIG_EFI_RUNTIME_MAP)          += runtime-map.o
> >  obj-$(CONFIG_EFI_RUNTIME_WRAPPERS)     += runtime-wrappers.o
> >  obj-$(CONFIG_EFI_STUB)                 += libstub/
> > -obj-$(CONFIG_EFI_FAKE_MEMMAP)          += fake_mem.o
> > +obj-$(CONFIG_EFI_FAKE_MEMMAP)          += fake_map.o
> >  obj-$(CONFIG_EFI_BOOTLOADER_CONTROL)   += efibc.o
> >  obj-$(CONFIG_EFI_TEST)                 += test/
> >  obj-$(CONFIG_EFI_DEV_PATH_PARSER)      += dev-path-parser.o
> >  obj-$(CONFIG_APPLE_PROPERTIES)         += apple-properties.o
> >  obj-$(CONFIG_EFI_RCI2_TABLE)           += rci2-table.o
> >
> > +fake_map-y                             += fake_mem.o
> > +fake_map-$(CONFIG_X86)                 += x86-fake_mem.o
> > +
>
> Please use
>
> fake-mem-$(CONFIG_X86) := x86-fake_mem.o
> obj-$(CONFIG_EFI_FAKE_MEMMAP) += fake_mem.o $(fake-mem-y)

Ok, looks good.

>
> instead, and please use either - or _ in filenames, not both.

Fair enough.


* Re: [PATCH v5 00/10] EFI Specific Purpose Memory Support
  2019-09-06 11:37     ` Rafael J. Wysocki
@ 2019-10-03 15:43       ` Jonathan Cameron
  0 siblings, 0 replies; 28+ messages in thread
From: Jonathan Cameron @ 2019-10-03 15:43 UTC (permalink / raw)
  To: unlisted-recipients:; (no To-header on input)
  Cc: Dan Williams, Rafael J. Wysocki, Thomas Gleixner, Dave Jiang,
	Keith Busch, kbuild test robot, Andy Shevchenko, Borislav Petkov,
	Vishal Verma, H. Peter Anvin, X86 ML, Dave Hansen, Ingo Molnar,
	Len Brown, Peter Zijlstra, Ard Biesheuvel, Andy Lutomirski,
	Darren Hart, Linux Kernel Mailing List, linux-efi

On Fri, 6 Sep 2019 13:37:30 +0200
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com> wrote:

> On 9/5/2019 1:06 AM, Dan Williams wrote:
> > On Mon, Sep 2, 2019 at 4:09 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:  
> >> On Friday, August 30, 2019 3:52:18 AM CEST Dan Williams wrote:  
> >>> Changes since v4 [1]:
> >>> - Rename the facility from "Application Reserved" to "Soft Reserved" to
> >>>    better reflect how the memory is treated. While the spec talks about
> >>>    "specific / application purpose" memory the expected kernel behavior is
> >>>    to make a best effort at reserving the memory from general purpose
> >>>    allocations.
> >>>
> >>> - Add a new efi=nosoftreserve option to disable consideration of the
> >>>    EFI_MEMORY_SP attribute at boot time. This is also motivated by
> >>>    Christoph's initial feedback of allowing the kernel to opt-out of the
> >>>    policy whims of the platform BIOS implementation.
> >>>
> >>> - Update the KASLR implementation to exclude soft-reserved memory
> >>>    including the case where soft-reserved memory is specified via the
> >>>    efi_fake_mem= attribute-override command-line option.
> >>>
> >>> - Move the memregion allocator to its own object file. v4 had it in
> >>>    kernel/resource.c which caused compile errors on Sparc. I otherwise
> >>>    could not find an appropriate place to stash it.
> >>>
> >>> - Rebase on a merge of tip/master and rafael/linux-next since the series
> >>>    collides with changes in both those trees.
> >>>
> >>> [1]: https://lore.kernel.org/r/156140036490.2951909.1837804994781523185.stgit@dwillia2-desk3.amr.corp.intel.com/
> >>>
> >>> ---
> >>>
> >>> Thomas, Rafael,
> >>>
> >>> This happens to collide with both your trees. I think the content
> >>> warrants going through the x86 tree, but would need to publish commit:
> >>>
> >>> 5c7ed4385424 HMAT: Skip publishing target info for nodes with no online memory
> >>>
> >>> ...in Rafael's tree as a stable id for -tip to pull in, but I'm also
> >>> open to other options. I've retained Dave's reviewed-by from v4.
> >>>
> >>> ---
> >>>
> >>> The EFI 2.8 Specification [2] introduces the EFI_MEMORY_SP ("specific
> >>> purpose") memory attribute. This attribute bit replaces the deprecated
> >>> ACPI HMAT "reservation hint" that was introduced in ACPI 6.2 and removed
> >>> in ACPI 6.3.
> >>>
> >>> Given the increasing diversity of memory types that might be advertised
> >>> to the operating system, there is a need for platform firmware to hint
> >>> which memory ranges are free for the OS to use as general purpose memory
> >>> and which ranges are intended for application specific usage. For
> >>> example, an application with prior knowledge of the platform may expect
> >>> to be able to exclusively allocate a precious / limited pool of high
> >>> bandwidth memory. Alternatively, for the general purpose case, the
> >>> operating system may want to make the memory available on a best effort
> >>> basis as a unique numa-node with performance properties by the new
> >>> CONFIG_HMEM_REPORTING [3] facility.
> >>>
> >>> In support of optionally allowing either application-exclusive and
> >>> core-kernel-mm managed access to differentiated memory, claim
> >>> EFI_MEMORY_SP ranges for exposure as "soft reserved" and assigned to a
> >>> device-dax instance by default. Such instances can be directly owned /
> >>> mapped by a platform-topology-aware application. Alternatively, with the
> >>> new kmem facility [4], the administrator has the option to instead
> >>> designate that those memory ranges be hot-added to the core-kernel-mm as
> >>> a unique memory numa-node. In short, allow for the decision about what
> >>> software agent manages soft-reserved memory to be made at runtime.
> >>>
> >>> The patches build on the new HMAT+HMEM_REPORTING facilities merged
> >>> for v5.2-rc1. The implementation is tested with qemu emulation of HMAT
> >>> [5] plus the efi_fake_mem facility for applying the EFI_MEMORY_SP
> >>> attribute. Specific details on reproducing the test configuration are in
> >>> patch 10.
> >>>
> >>> [2]: https://uefi.org/sites/default/files/resources/UEFI_Spec_2_8_final.pdf
> >>> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1cf33aafb84
> >>> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c221c0b0308f
> >>> [5]: http://patchwork.ozlabs.org/cover/1096737/
> >>>
> >>> ---
> >>>
> >>> Dan Williams (10):
> >>>        acpi/numa: Establish a new drivers/acpi/numa/ directory
> >>>        efi: Enumerate EFI_MEMORY_SP
> >>>        x86, efi: Push EFI_MEMMAP check into leaf routines
> >>>        x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax
> >>>        x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP
> >>>        lib: Uplevel the pmem "region" ida to a global allocator
> >>>        dax: Fix alloc_dax_region() compile warning
> >>>        device-dax: Add a driver for "hmem" devices
> >>>        acpi/numa/hmat: Register HMAT at device_initcall level
> >>>        acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device
> >>>
> >>>
> >>>   Documentation/admin-guide/kernel-parameters.txt |   19 +++
> >>>   arch/x86/Kconfig                                |   21 ++++
> >>>   arch/x86/boot/compressed/eboot.c                |    7 +
> >>>   arch/x86/boot/compressed/kaslr.c                |   50 +++++++-
> >>>   arch/x86/include/asm/e820/types.h               |    8 +
> >>>   arch/x86/include/asm/efi-stub.h                 |   11 ++
> >>>   arch/x86/include/asm/efi.h                      |   17 +++
> >>>   arch/x86/kernel/e820.c                          |   12 ++
> >>>   arch/x86/kernel/setup.c                         |   19 ++-
> >>>   arch/x86/platform/efi/efi.c                     |   56 +++++++++
> >>>   arch/x86/platform/efi/quirks.c                  |    3 +
> >>>   drivers/acpi/Kconfig                            |    9 --
> >>>   drivers/acpi/Makefile                           |    3 -
> >>>   drivers/acpi/hmat/Makefile                      |    2
> >>>   drivers/acpi/numa/Kconfig                       |    8 +
> >>>   drivers/acpi/numa/Makefile                      |    3 +
> >>>   drivers/acpi/numa/hmat.c                        |  138 +++++++++++++++++++++--
> >>>   drivers/acpi/numa/srat.c                        |    0
> >>>   drivers/dax/Kconfig                             |   27 ++++-
> >>>   drivers/dax/Makefile                            |    2
> >>>   drivers/dax/bus.c                               |    2
> >>>   drivers/dax/bus.h                               |    2
> >>>   drivers/dax/dax-private.h                       |    2
> >>>   drivers/dax/hmem.c                              |   57 ++++++++++
> >>>   drivers/firmware/efi/Makefile                   |    5 +
> >>>   drivers/firmware/efi/efi.c                      |    8 +
> >>>   drivers/firmware/efi/esrt.c                     |    3 +
> >>>   drivers/firmware/efi/fake_mem.c                 |   26 ++--
> >>>   drivers/firmware/efi/fake_mem.h                 |   10 ++
> >>>   drivers/firmware/efi/libstub/efi-stub-helper.c  |   12 ++
> >>>   drivers/firmware/efi/x86-fake_mem.c             |   69 ++++++++++++
> >>>   drivers/nvdimm/Kconfig                          |    1
> >>>   drivers/nvdimm/core.c                           |    1
> >>>   drivers/nvdimm/nd-core.h                        |    1
> >>>   drivers/nvdimm/region_devs.c                    |   13 +-
> >>>   include/linux/efi.h                             |    4 -
> >>>   include/linux/ioport.h                          |    1
> >>>   include/linux/memregion.h                       |   23 ++++
> >>>   lib/Kconfig                                     |    3 +
> >>>   lib/Makefile                                    |    1
> >>>   lib/memregion.c                                 |   18 +++
> >>>   41 files changed, 584 insertions(+), 93 deletions(-)
> >>>   create mode 100644 arch/x86/include/asm/efi-stub.h
> >>>   delete mode 100644 drivers/acpi/hmat/Makefile
> >>>   rename drivers/acpi/{hmat/Kconfig => numa/Kconfig} (70%)
> >>>   create mode 100644 drivers/acpi/numa/Makefile
> >>>   rename drivers/acpi/{hmat/hmat.c => numa/hmat.c} (85%)
> >>>   rename drivers/acpi/{numa.c => numa/srat.c} (100%)
> >>>   create mode 100644 drivers/dax/hmem.c
> >>>   create mode 100644 drivers/firmware/efi/fake_mem.h
> >>>   create mode 100644 drivers/firmware/efi/x86-fake_mem.c
> >>>   create mode 100644 include/linux/memregion.h
> >>>   create mode 100644 lib/memregion.c
> >>>  
> >> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> for the ACPI-related changes in this series.  
> > Thanks Rafael, is commit 5c7ed4385424 on a stable branch that Thomas
> > could merge, or Thomas, is this all too late for v5.4?  
> 
> Yes, I've just exported the acpi-tables branch containing that commit as 
> a stable one in the linux-pm.git tree at kernel.org.
> 
> Cheers!
> 
Hi All,

Just wondering when this set might make progress? As Rafael observed,
the Generic Initiator set needs rebasing on top of it under the
reasonable assumption that this gets applied first.

Thanks,

Jonathan




Thread overview: 28+ messages
2019-08-30  1:52 [PATCH v5 00/10] EFI Specific Purpose Memory Support Dan Williams
2019-08-30  1:52 ` [PATCH v5 01/10] acpi/numa: Establish a new drivers/acpi/numa/ directory Dan Williams
2019-08-30  1:52 ` [PATCH v5 02/10] efi: Enumerate EFI_MEMORY_SP Dan Williams
2019-08-30  1:52 ` [PATCH v5 03/10] x86, efi: Push EFI_MEMMAP check into leaf routines Dan Williams
2019-09-13  9:05   ` Ard Biesheuvel
2019-09-13 12:32     ` Dan Williams
2019-08-30  1:52 ` [PATCH v5 04/10] x86, efi: Reserve UEFI 2.8 Specific Purpose Memory for dax Dan Williams
2019-09-13 12:59   ` Ard Biesheuvel
2019-09-13 16:22     ` Dan Williams
2019-09-13 16:28       ` Ard Biesheuvel
2019-09-13 16:39         ` Dan Williams
2019-09-13 17:39           ` Ard Biesheuvel
2019-09-13 17:54             ` Ard Biesheuvel
2019-08-30  1:52 ` [PATCH v5 05/10] x86, efi: Add efi_fake_mem support for EFI_MEMORY_SP Dan Williams
2019-09-10  6:48   ` Ingo Molnar
2019-09-13 13:02   ` Ard Biesheuvel
2019-09-13 15:02     ` Dan Williams
2019-09-13 19:48   ` Ard Biesheuvel
2019-09-13 20:43     ` Dan Williams
2019-08-30  1:52 ` [PATCH v5 06/10] lib: Uplevel the pmem "region" ida to a global allocator Dan Williams
2019-08-30  1:52 ` [PATCH v5 07/10] dax: Fix alloc_dax_region() compile warning Dan Williams
2019-08-30  1:53 ` [PATCH v5 08/10] device-dax: Add a driver for "hmem" devices Dan Williams
2019-08-30  1:53 ` [PATCH v5 09/10] acpi/numa/hmat: Register HMAT at device_initcall level Dan Williams
2019-08-30  1:53 ` [PATCH v5 10/10] acpi/numa/hmat: Register "soft reserved" memory as an "hmem" device Dan Williams
2019-09-02 11:09 ` [PATCH v5 00/10] EFI Specific Purpose Memory Support Rafael J. Wysocki
2019-09-04 23:06   ` Dan Williams
2019-09-06 11:37     ` Rafael J. Wysocki
2019-10-03 15:43       ` Jonathan Cameron
