linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 1/2] mm/nvidmm: Drop x86 dependency on nvdimm e820 device
@ 2018-07-06  8:29 Aneesh Kumar K.V
  2018-07-06  8:29 ` [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver Aneesh Kumar K.V
  0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-06  8:29 UTC (permalink / raw)
  To: akpm, Dan Williams, Oliver; +Cc: linux-mm, linux-kernel, Aneesh Kumar K.V

This patch adds new Kconfig variable PMEM_PLATFORM_DEVICE and use that to select
the nvdimm e820 device. The x86 config is now named X86_PMEM_LEGACY_DEVICE.

Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/x86/Kconfig                  | 5 +----
 arch/x86/include/asm/e820/types.h | 2 +-
 arch/x86/include/uapi/asm/e820.h  | 2 +-
 drivers/nvdimm/Kconfig            | 5 ++++-
 drivers/nvdimm/Makefile           | 2 +-
 tools/testing/nvdimm/Kbuild       | 2 +-
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f1dbb4ee19d7..1186e1330876 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1641,13 +1641,10 @@ config ILLEGAL_POINTER_VALUE
 source "mm/Kconfig"
 
 config X86_PMEM_LEGACY_DEVICE
-	bool
-
-config X86_PMEM_LEGACY
 	tristate "Support non-standard NVDIMMs and ADR protected memory"
 	depends on PHYS_ADDR_T_64BIT
 	depends on BLK_DEV
-	select X86_PMEM_LEGACY_DEVICE
+	select PMEM_PLATFORM_DEVICE
 	select LIBNVDIMM
 	help
 	  Treat memory marked using the non-standard e820 type of 12 as used
diff --git a/arch/x86/include/asm/e820/types.h b/arch/x86/include/asm/e820/types.h
index c3aa4b5e49e2..0fb25d04dd26 100644
--- a/arch/x86/include/asm/e820/types.h
+++ b/arch/x86/include/asm/e820/types.h
@@ -20,7 +20,7 @@ enum e820_type {
 	 * NVDIMM regions that persist over a reboot.
 	 *
 	 * The kernel will ignore their special capabilities
-	 * unless the CONFIG_X86_PMEM_LEGACY=y option is set.
+	 * unless the CONFIG_X86_PMEM_LEGACY_DEVICE=y option is set.
 	 *
 	 * ( Note that older platforms also used 6 for the same
 	 *   type of memory, but newer versions switched to 12 as
diff --git a/arch/x86/include/uapi/asm/e820.h b/arch/x86/include/uapi/asm/e820.h
index 2f491efe3a12..b8ae7c221269 100644
--- a/arch/x86/include/uapi/asm/e820.h
+++ b/arch/x86/include/uapi/asm/e820.h
@@ -38,7 +38,7 @@
 /*
  * This is a non-standardized way to represent ADR or NVDIMM regions that
  * persist over a reboot.  The kernel will ignore their special capabilities
- * unless the CONFIG_X86_PMEM_LEGACY option is set.
+ * unless the CONFIG_X86_PMEM_LEGACY_DEVICE option is set.
  *
  * ( Note that older platforms also used 6 for the same type of memory,
  *   but newer versions switched to 12 as 6 was assigned differently.  Some
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 9d36473dc2a2..50d2a33de441 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -27,7 +27,7 @@ config BLK_DEV_PMEM
 	  Memory ranges for PMEM are described by either an NFIT
 	  (NVDIMM Firmware Interface Table, see CONFIG_NFIT_ACPI), a
 	  non-standard OEM-specific E820 memory type (type-12, see
-	  CONFIG_X86_PMEM_LEGACY), or it is manually specified by the
+	  CONFIG_X86_PMEM_LEGACY_DEVICE), or it is manually specified by the
 	  'memmap=nn[KMG]!ss[KMG]' kernel command line (see
 	  Documentation/admin-guide/kernel-parameters.rst).  This driver converts
 	  these persistent memory ranges into block devices that are
@@ -112,4 +112,7 @@ config OF_PMEM
 
 	  Select Y if unsure.
 
+config PMEM_PLATFORM_DEVICE
+       bool
+
 endif
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index e8847045dac0..94f7f29146ce 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -3,7 +3,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
 obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
 obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
+obj-$(CONFIG_PMEM_PLATFORM_DEVICE) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
 
 nd_pmem-y := pmem.o
diff --git a/tools/testing/nvdimm/Kbuild b/tools/testing/nvdimm/Kbuild
index 0392153a0009..82e84253a6ae 100644
--- a/tools/testing/nvdimm/Kbuild
+++ b/tools/testing/nvdimm/Kbuild
@@ -27,7 +27,7 @@ obj-$(CONFIG_LIBNVDIMM) += libnvdimm.o
 obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
 obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
-obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
+obj-$(CONFIG_PMEM_PLATFORM_DEVICE) += nd_e820.o
 obj-$(CONFIG_ACPI_NFIT) += nfit.o
 ifeq ($(CONFIG_DAX),m)
 obj-$(CONFIG_DAX) += dax.o
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-06  8:29 [RFC PATCH 1/2] mm/nvidmm: Drop x86 dependency on nvdimm e820 device Aneesh Kumar K.V
@ 2018-07-06  8:29 ` Aneesh Kumar K.V
  2018-07-06 18:46   ` Randy Dunlap
                     ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-06  8:29 UTC (permalink / raw)
  To: akpm, Dan Williams, Oliver; +Cc: linux-mm, linux-kernel, Aneesh Kumar K.V

This patch steal system RAM and use that to emulate pmem device using the
e820 platform driver.

This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
to allocate memory early in the boot. This memory is later registered as
persistent memory range.

Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>

Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 drivers/nvdimm/Kconfig        |  13 ++++
 drivers/nvdimm/Makefile       |   1 +
 drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
 3 files changed, 129 insertions(+)
 create mode 100644 drivers/nvdimm/memblockpmem.c

diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index 50d2a33de441..cbbbcbd4506b 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -115,4 +115,17 @@ config OF_PMEM
 config PMEM_PLATFORM_DEVICE
        bool
 
+config MEMBLOCK_PMEM
+	bool "pmemmap= parameter support"
+	default y
+	depends on HAVE_MEMBLOCK
+	select PMEM_PLATFORM_DEVICE
+	help
+	  Add support for the pmemmap= kernel command line parameter. This is similar
+	  to the memmap= parameter available on ACPI platforms, but it uses generic
+	  kernel facilities (the memblock allocator) to reserve memory rather than adding
+	  to the e820 table.
+
+	  Select Y if unsure.
+
 endif
diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
index 94f7f29146ce..0215ce0182e9 100644
--- a/drivers/nvdimm/Makefile
+++ b/drivers/nvdimm/Makefile
@@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
 obj-$(CONFIG_ND_BLK) += nd_blk.o
 obj-$(CONFIG_PMEM_PLATFORM_DEVICE) += nd_e820.o
 obj-$(CONFIG_OF_PMEM) += of_pmem.o
+obj-$(CONFIG_MEMBLOCK_PMEM) += memblockpmem.o
 
 nd_pmem-y := pmem.o
 
diff --git a/drivers/nvdimm/memblockpmem.c b/drivers/nvdimm/memblockpmem.c
new file mode 100644
index 000000000000..d39772b75fcd
--- /dev/null
+++ b/drivers/nvdimm/memblockpmem.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (c) 2018 IBM Corporation
+ */
+
+#define pr_fmt(fmt) "memblock pmem: " fmt
+
+#include <linux/libnvdimm.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/mmzone.h>
+#include <linux/cpu.h>
+#include <linux/platform_device.h>
+#include <linux/init.h>
+#include <linux/ioport.h>
+#include <linux/ctype.h>
+#include <linux/slab.h>
+
+/*
+ * Align pmem reservations to the section size so we don't have issues with
+ * memory hotplug
+ */
+#ifdef CONFIG_SPARSEMEM
+#define BOOTPMEM_ALIGN (1UL << SECTION_SIZE_BITS)
+#else
+#define BOOTPMEM_ALIGN PFN_DEFAULT_ALIGNMENT
+#endif
+
+static __initdata u64 pmem_size;
+static __initdata phys_addr_t pmem_stolen_memory;
+
+static void alloc_pmem_from_memblock(void)
+{
+
+	pmem_stolen_memory = memblock_alloc_base(pmem_size,
+						 BOOTPMEM_ALIGN,
+						 MEMBLOCK_ALLOC_ACCESSIBLE);
+	if (!pmem_stolen_memory) {
+		pr_err("Failed to allocate memory for PMEM from memblock\n");
+		return;
+	}
+
+	/*
+	 * Remove from the memblock reserved range
+	 */
+	memblock_free(pmem_stolen_memory, pmem_size);
+
+	/*
+	 * Remove from the memblock memory range.
+	 */
+	memblock_remove(pmem_stolen_memory, pmem_size);
+	pr_info("Allocated %ld memory at 0x%lx\n", (unsigned long)pmem_size,
+		(unsigned long)pmem_stolen_memory);
+	return;
+}
+
+/*
+ * pmemmap=ss[KMG]
+ *
+ * This is similar to the memremap=offset[KMG]!size[KMG] paramater
+ * for adding a legacy pmem range to the e820 map on x86, but it's
+ * platform agnostic.
+ *
+ * e.g. pmemmap=16G allocates 16G pmem region
+ */
+static int __init parse_pmemmap(char *p)
+{
+	char *old_p = p;
+
+	if (!p)
+		return -EINVAL;
+
+	pmem_size = memparse(p, &p);
+	if (p == old_p)
+		return -EINVAL;
+
+	alloc_pmem_from_memblock();
+	return 0;
+}
+early_param("pmemmap", parse_pmemmap);
+
+static __init int register_e820_pmem(void)
+{
+	struct resource *res, *conflict;
+        struct platform_device *pdev;
+
+	if (!pmem_stolen_memory)
+		return 0;
+
+	res = kzalloc(sizeof(*res), GFP_KERNEL);
+	if (!res)
+		return -1;
+
+	memset(res, 0, sizeof(*res));
+	res->start = pmem_stolen_memory;
+	res->end = pmem_stolen_memory + pmem_size - 1;
+	res->name = "Persistent Memory (legacy)";
+	res->desc = IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+	res->flags = IORESOURCE_MEM;
+
+	conflict = insert_resource_conflict(&iomem_resource, res);
+	if (conflict) {
+		pr_err("%pR conflicts, try insert below %pR\n", res, conflict);
+		kfree(res);
+		return -1;
+	}
+	/*
+	 * See drivers/nvdimm/e820.c for the implementation, this is
+	 * simply here to trigger the module to load on demand.
+	 */
+	pdev = platform_device_alloc("e820_pmem", -1);
+
+	return platform_device_add(pdev);
+}
+device_initcall(register_e820_pmem);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-06  8:29 ` [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver Aneesh Kumar K.V
@ 2018-07-06 18:46   ` Randy Dunlap
  2018-07-06 19:38   ` Dan Williams
  2018-07-07  7:50   ` Oliver
  2 siblings, 0 replies; 10+ messages in thread
From: Randy Dunlap @ 2018-07-06 18:46 UTC (permalink / raw)
  To: Aneesh Kumar K.V, akpm, Dan Williams, Oliver; +Cc: linux-mm, linux-kernel

On 07/06/18 01:29, Aneesh Kumar K.V wrote:
> This patch steal system RAM and use that to emulate pmem device using the
> e820 platform driver.
> 
> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
> to allocate memory early in the boot. This memory is later registered as
> persistent memory range.
> 
> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>
> 
> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/Kconfig        |  13 ++++
>  drivers/nvdimm/Makefile       |   1 +
>  drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+)
>  create mode 100644 drivers/nvdimm/memblockpmem.c
> 
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 50d2a33de441..cbbbcbd4506b 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -115,4 +115,17 @@ config OF_PMEM
>  config PMEM_PLATFORM_DEVICE
>         bool
>  
> +config MEMBLOCK_PMEM
> +	bool "pmemmap= parameter support"
> +	default y
> +	depends on HAVE_MEMBLOCK
> +	select PMEM_PLATFORM_DEVICE
> +	help
> +	  Add support for the pmemmap= kernel command line parameter. This is similar
> +	  to the memmap= parameter available on ACPI platforms, but it uses generic
> +	  kernel facilities (the memblock allocator) to reserve memory rather than adding
> +	  to the e820 table.
> +
> +	  Select Y if unsure.
> +
>  endif


There's a high barrier for "default y", something like if the platform or device
cannot boot without it, it can be "default y".  I have doubts that this is OK.


-- 
~Randy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-06  8:29 ` [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver Aneesh Kumar K.V
  2018-07-06 18:46   ` Randy Dunlap
@ 2018-07-06 19:38   ` Dan Williams
  2018-07-07  7:15     ` Oliver
  2018-07-07  7:50   ` Oliver
  2 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2018-07-06 19:38 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Oliver, Linux MM, Linux Kernel Mailing List

On Fri, Jul 6, 2018 at 1:29 AM, Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
> This patch steal system RAM and use that to emulate pmem device using the
> e820 platform driver.
>
> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
> to allocate memory early in the boot. This memory is later registered as
> persistent memory range.
>
> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>
>
> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/Kconfig        |  13 ++++
>  drivers/nvdimm/Makefile       |   1 +
>  drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+)
>  create mode 100644 drivers/nvdimm/memblockpmem.c
>
[..]
> +/*
> + * pmemmap=ss[KMG]
> + *
> + * This is similar to the memremap=offset[KMG]!size[KMG] paramater
> + * for adding a legacy pmem range to the e820 map on x86, but it's
> + * platform agnostic.

The current memmap=ss!nn option is a non-stop source of bugs and
fragility. The fact that this lets the kernel specify the base address
helps, but then this is purely just a debug facility because
memmap=ss!nn is there to cover platform firmware implementations that
fail to mark a given address range as persistent.

If this is just for debug, why not use qemu? If this is not for debug
what are these systems that don't have proper firmware support?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-06 19:38   ` Dan Williams
@ 2018-07-07  7:15     ` Oliver
  2018-07-07 17:36       ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver @ 2018-07-07  7:15 UTC (permalink / raw)
  To: Dan Williams
  Cc: Aneesh Kumar K.V, Andrew Morton, Linux MM, Linux Kernel Mailing List

On Sat, Jul 7, 2018 at 5:38 AM, Dan Williams <dan.j.williams@intel.com> wrote:
> On Fri, Jul 6, 2018 at 1:29 AM, Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>> This patch steal system RAM and use that to emulate pmem device using the
>> e820 platform driver.
>>
>> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
>> to allocate memory early in the boot. This memory is later registered as
>> persistent memory range.
>>
>> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>
>>
>> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>  drivers/nvdimm/Kconfig        |  13 ++++
>>  drivers/nvdimm/Makefile       |   1 +
>>  drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>>  3 files changed, 129 insertions(+)
>>  create mode 100644 drivers/nvdimm/memblockpmem.c
>>
> [..]
>> +/*
>> + * pmemmap=ss[KMG]
>> + *
>> + * This is similar to the memremap=offset[KMG]!size[KMG] paramater
>> + * for adding a legacy pmem range to the e820 map on x86, but it's
>> + * platform agnostic.

> The current memmap=ss!nn option is a non-stop source of bugs and
> fragility. The fact that this lets the kernel specify the base address
> helps, but then this is purely just a debug facility because
> memmap=ss!nn is there to cover platform firmware implementations that
> fail to mark a given address range as persistent.

> If this is just for debug, why not use qemu?

To make a long story short, we have two virtualisation stacks and only one of
them is based on qemu. An unfortunately large chunk of our customers (and
our internal test systems) run the other one so we need to accommodate them
somehow.

> If this is not for debug what are these systems that don't have proper firmware
> support?

I wrote the original version (for RHEL 7.something) for a customer who wanted
to do some testing which needed to be run on real hardware for some reason.
We couldn't install a FW update on their system so this ended up being the least
painful way to get them going. That's not a strong argument for
merging this, but
the point is that it's sometimes useful to have the capability in the kernel.

Oliver

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-06  8:29 ` [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver Aneesh Kumar K.V
  2018-07-06 18:46   ` Randy Dunlap
  2018-07-06 19:38   ` Dan Williams
@ 2018-07-07  7:50   ` Oliver
  2018-07-09  5:16     ` Aneesh Kumar K.V
  2 siblings, 1 reply; 10+ messages in thread
From: Oliver @ 2018-07-07  7:50 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Andrew Morton, Dan Williams, Linux MM, Linux Kernel Mailing List

On Fri, Jul 6, 2018 at 6:29 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
> This patch steal system RAM and use that to emulate pmem device using the
> e820 platform driver.
>
> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
> to allocate memory early in the boot. This memory is later registered as
> persistent memory range.
>
> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>

I use this account rather than my internal address for community
facing stuff since
no one deserves to have IBM email inflicted upon them. Also you left out the
apostrophe, you monster!

> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  drivers/nvdimm/Kconfig        |  13 ++++
>  drivers/nvdimm/Makefile       |   1 +
>  drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>  3 files changed, 129 insertions(+)
>  create mode 100644 drivers/nvdimm/memblockpmem.c
>
> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
> index 50d2a33de441..cbbbcbd4506b 100644
> --- a/drivers/nvdimm/Kconfig
> +++ b/drivers/nvdimm/Kconfig
> @@ -115,4 +115,17 @@ config OF_PMEM
>  config PMEM_PLATFORM_DEVICE
>         bool
>
> +config MEMBLOCK_PMEM
> +       bool "pmemmap= parameter support"
> +       default y
> +       depends on HAVE_MEMBLOCK
> +       select PMEM_PLATFORM_DEVICE
> +       help
> +         Add support for the pmemmap= kernel command line parameter. This is similar
> +         to the memmap= parameter available on ACPI platforms, but it uses generic
> +         kernel facilities (the memblock allocator) to reserve memory rather than adding
> +         to the e820 table.
> +
> +         Select Y if unsure.
> +
>  endif
> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
> index 94f7f29146ce..0215ce0182e9 100644
> --- a/drivers/nvdimm/Makefile
> +++ b/drivers/nvdimm/Makefile
> @@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
>  obj-$(CONFIG_ND_BLK) += nd_blk.o
>  obj-$(CONFIG_PMEM_PLATFORM_DEVICE) += nd_e820.o
>  obj-$(CONFIG_OF_PMEM) += of_pmem.o
> +obj-$(CONFIG_MEMBLOCK_PMEM) += memblockpmem.o

Does this work when libnvdimm is built as a module? I remember doing
something like
this and discovering that the early_param() stuff didn't get included
in the vmlinux
when libnvdimm was built as a module due to how the makefiles worked.
It might have
been a bug in the RHEL7 tree I was using that has since been fixed upstream.

>  nd_pmem-y := pmem.o
>
> diff --git a/drivers/nvdimm/memblockpmem.c b/drivers/nvdimm/memblockpmem.c
> new file mode 100644
> index 000000000000..d39772b75fcd
> --- /dev/null
> +++ b/drivers/nvdimm/memblockpmem.c
> @@ -0,0 +1,115 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (c) 2018 IBM Corporation
> + */
> +
> +#define pr_fmt(fmt) "memblock pmem: " fmt
> +
> +#include <linux/libnvdimm.h>
> +#include <linux/bootmem.h>
> +#include <linux/memblock.h>
> +#include <linux/mmzone.h>
> +#include <linux/cpu.h>
> +#include <linux/platform_device.h>
> +#include <linux/init.h>
> +#include <linux/ioport.h>
> +#include <linux/ctype.h>
> +#include <linux/slab.h>
> +
> +/*
> + * Align pmem reservations to the section size so we don't have issues with
> + * memory hotplug
> + */
> +#ifdef CONFIG_SPARSEMEM
> +#define BOOTPMEM_ALIGN (1UL << SECTION_SIZE_BITS)
> +#else
> +#define BOOTPMEM_ALIGN PFN_DEFAULT_ALIGNMENT
> +#endif

Is aligning to the section size sufficient? IIRC I had to align it to
the memory block
size on some systems. Of course, that might have been a RHEL bug that has
since been fixed upstream.

> +
> +static __initdata u64 pmem_size;
> +static __initdata phys_addr_t pmem_stolen_memory;
> +
> +static void alloc_pmem_from_memblock(void)
> +{
> +
> +       pmem_stolen_memory = memblock_alloc_base(pmem_size,
> +                                                BOOTPMEM_ALIGN,
> +                                                MEMBLOCK_ALLOC_ACCESSIBLE);
> +       if (!pmem_stolen_memory) {
> +               pr_err("Failed to allocate memory for PMEM from memblock\n");
> +               return;
> +       }
> +
> +       /*
> +        * Remove from the memblock reserved range
> +        */
> +       memblock_free(pmem_stolen_memory, pmem_size);
> +
> +       /*
> +        * Remove from the memblock memory range.
> +        */
> +       memblock_remove(pmem_stolen_memory, pmem_size);
> +       pr_info("Allocated %ld memory at 0x%lx\n", (unsigned long)pmem_size,
> +               (unsigned long)pmem_stolen_memory);
> +       return;
> +}
> +
> +/*
> + * pmemmap=ss[KMG]
> + *
> + * This is similar to the memremap=offset[KMG]!size[KMG] paramater
> + * for adding a legacy pmem range to the e820 map on x86, but it's
> + * platform agnostic.
> + *
> + * e.g. pmemmap=16G allocates 16G pmem region

I'm not really thrilled with this and I'd rather we kept the <size>@<node id>
format and the ability to reserve multiple regions that I had in the
old version.

I know getting the nid allocations working is a pain in the ass since
HAVE_MEMBLOCK_NODE_MAP doesn't specify *when* in the boot process
the node map information is actually available, but it's useful
functionality and
I think the problems are resolvable.

It's also possible that the whole memblock approach to this is wrong and we
should look at doing something similar to how gigantic pages are allocated
at runtime.

> + */
> +static int __init parse_pmemmap(char *p)
> +{
> +       char *old_p = p;
> +
> +       if (!p)
> +               return -EINVAL;
> +
> +       pmem_size = memparse(p, &p);
> +       if (p == old_p)
> +               return -EINVAL;
> +
> +       alloc_pmem_from_memblock();
> +       return 0;
> +}
> +early_param("pmemmap", parse_pmemmap);
> +
> +static __init int register_e820_pmem(void)
> +{
> +       struct resource *res, *conflict;
> +        struct platform_device *pdev;
> +
> +       if (!pmem_stolen_memory)
> +               return 0;
> +
> +       res = kzalloc(sizeof(*res), GFP_KERNEL);
> +       if (!res)
> +               return -1;
> +
> +       memset(res, 0, sizeof(*res));
> +       res->start = pmem_stolen_memory;
> +       res->end = pmem_stolen_memory + pmem_size - 1;
> +       res->name = "Persistent Memory (legacy)";
> +       res->desc = IORES_DESC_PERSISTENT_MEMORY_LEGACY;
> +       res->flags = IORESOURCE_MEM;
> +
> +       conflict = insert_resource_conflict(&iomem_resource, res);
> +       if (conflict) {
> +               pr_err("%pR conflicts, try insert below %pR\n", res, conflict);
> +               kfree(res);
> +               return -1;
> +       }
> +       /*
> +        * See drivers/nvdimm/e820.c for the implementation, this is
> +        * simply here to trigger the module to load on demand.
> +        */
> +       pdev = platform_device_alloc("e820_pmem", -1);
> +
> +       return platform_device_add(pdev);
> +}
> +device_initcall(register_e820_pmem);
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-07  7:15     ` Oliver
@ 2018-07-07 17:36       ` Dan Williams
  2018-07-09  5:17         ` Aneesh Kumar K.V
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2018-07-07 17:36 UTC (permalink / raw)
  To: Oliver
  Cc: Aneesh Kumar K.V, Andrew Morton, Linux MM, Linux Kernel Mailing List

On Sat, Jul 7, 2018 at 12:15 AM, Oliver <oohall@gmail.com> wrote:
> On Sat, Jul 7, 2018 at 5:38 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>> On Fri, Jul 6, 2018 at 1:29 AM, Aneesh Kumar K.V
>> <aneesh.kumar@linux.ibm.com> wrote:
>>> This patch steal system RAM and use that to emulate pmem device using the
>>> e820 platform driver.
>>>
>>> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
>>> to allocate memory early in the boot. This memory is later registered as
>>> persistent memory range.
>>>
>>> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>
>>>
>>> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>  drivers/nvdimm/Kconfig        |  13 ++++
>>>  drivers/nvdimm/Makefile       |   1 +
>>>  drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>>>  3 files changed, 129 insertions(+)
>>>  create mode 100644 drivers/nvdimm/memblockpmem.c
>>>
>> [..]
>>> +/*
>>> + * pmemmap=ss[KMG]
>>> + *
>>> + * This is similar to the memremap=offset[KMG]!size[KMG] paramater
>>> + * for adding a legacy pmem range to the e820 map on x86, but it's
>>> + * platform agnostic.
>
>> The current memmap=ss!nn option is a non-stop source of bugs and
>> fragility. The fact that this lets the kernel specify the base address
>> helps, but then this is purely just a debug facility because
>> memmap=ss!nn is there to cover platform firmware implementations that
>> fail to mark a given address range as persistent.
>
>> If this is just for debug, why not use qemu?
>
> To make a long story short, we have two virtualisation stacks and only one of
> them is based on qemu. An unfortunately large chunk of our customers (and
> our internal test systems) run the other one so we need to accommodate them
> somehow.
>
>> If this is not for debug what are these systems that don't have proper firmware
>> support?
>
> I wrote the original version (for RHEL 7.something) for a customer who wanted
> to do some testing which needed to be run on real hardware for some reason.
> We couldn't install a FW update on their system so this ended up being the least
> painful way to get them going. That's not a strong argument for
> merging this, but
> the point is that it's sometimes useful to have the capability in the kernel.

Ok, correct me if I'm wrong, but it seems to be purely about debug and
emulation? If that's the case would it be acceptable to just add more
capabilities to tools/testing/nvdimm/ for what you want to do? That
has been our primary vehicle for testing libnvdimm.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-07  7:50   ` Oliver
@ 2018-07-09  5:16     ` Aneesh Kumar K.V
  0 siblings, 0 replies; 10+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-09  5:16 UTC (permalink / raw)
  To: Oliver; +Cc: Andrew Morton, Dan Williams, Linux MM, Linux Kernel Mailing List

On 07/07/2018 01:20 PM, Oliver wrote:
> On Fri, Jul 6, 2018 at 6:29 PM, Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>> This patch steal system RAM and use that to emulate pmem device using the
>> e820 platform driver.
>>
>> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
>> to allocate memory early in the boot. This memory is later registered as
>> persistent memory range.
>>
>> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>
> 
> I use this account rather than my internal address for community
> facing stuff since
> no one deserves to have IBM email inflicted upon them. Also you left out the
> apostrophe, you monster!


:) Will switch to gmail.com address?

> 
>> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>> ---
>>   drivers/nvdimm/Kconfig        |  13 ++++
>>   drivers/nvdimm/Makefile       |   1 +
>>   drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>>   3 files changed, 129 insertions(+)
>>   create mode 100644 drivers/nvdimm/memblockpmem.c
>>
>> diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
>> index 50d2a33de441..cbbbcbd4506b 100644
>> --- a/drivers/nvdimm/Kconfig
>> +++ b/drivers/nvdimm/Kconfig
>> @@ -115,4 +115,17 @@ config OF_PMEM
>>   config PMEM_PLATFORM_DEVICE
>>          bool
>>
>> +config MEMBLOCK_PMEM
>> +       bool "pmemmap= parameter support"
>> +       default y
>> +       depends on HAVE_MEMBLOCK
>> +       select PMEM_PLATFORM_DEVICE
>> +       help
>> +         Add support for the pmemmap= kernel command line parameter. This is similar
>> +         to the memmap= parameter available on ACPI platforms, but it uses generic
>> +         kernel facilities (the memblock allocator) to reserve memory rather than adding
>> +         to the e820 table.
>> +
>> +         Select Y if unsure.
>> +
>>   endif
>> diff --git a/drivers/nvdimm/Makefile b/drivers/nvdimm/Makefile
>> index 94f7f29146ce..0215ce0182e9 100644
>> --- a/drivers/nvdimm/Makefile
>> +++ b/drivers/nvdimm/Makefile
>> @@ -5,6 +5,7 @@ obj-$(CONFIG_ND_BTT) += nd_btt.o
>>   obj-$(CONFIG_ND_BLK) += nd_blk.o
>>   obj-$(CONFIG_PMEM_PLATFORM_DEVICE) += nd_e820.o
>>   obj-$(CONFIG_OF_PMEM) += of_pmem.o
>> +obj-$(CONFIG_MEMBLOCK_PMEM) += memblockpmem.o
> 
> Does this work when libnvdimm is built as a module? I remember doing
> something like
> this and discovering that the early_param() stuff didn't get included
> in the vmlinux
> when libnvdimm was built as a module due to how the makefiles worked.
> It might have
> been a bug in the RHEL7 tree I was using that has since been fixed upstream.
> 


I didn't check that. Will do that in next iteration.



>>   nd_pmem-y := pmem.o
>>
>> diff --git a/drivers/nvdimm/memblockpmem.c b/drivers/nvdimm/memblockpmem.c
>> new file mode 100644
>> index 000000000000..d39772b75fcd
>> --- /dev/null
>> +++ b/drivers/nvdimm/memblockpmem.c
>> @@ -0,0 +1,115 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +/*
>> + * Copyright (c) 2018 IBM Corporation
>> + */
>> +
>> +#define pr_fmt(fmt) "memblock pmem: " fmt
>> +
>> +#include <linux/libnvdimm.h>
>> +#include <linux/bootmem.h>
>> +#include <linux/memblock.h>
>> +#include <linux/mmzone.h>
>> +#include <linux/cpu.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/init.h>
>> +#include <linux/ioport.h>
>> +#include <linux/ctype.h>
>> +#include <linux/slab.h>
>> +
>> +/*
>> + * Align pmem reservations to the section size so we don't have issues with
>> + * memory hotplug
>> + */
>> +#ifdef CONFIG_SPARSEMEM
>> +#define BOOTPMEM_ALIGN (1UL << SECTION_SIZE_BITS)
>> +#else
>> +#define BOOTPMEM_ALIGN PFN_DEFAULT_ALIGNMENT
>> +#endif
> 
> Is aligning to the section size sufficient? IIRC I had to align it to
> the memory block
> size on some systems. Of course, that might have been a RHEL bug that has
> since been fixed upstream.
> 

Ok, I didn't face any issues. But will look more what issues we could face.



>> +
>> +static __initdata u64 pmem_size;
>> +static __initdata phys_addr_t pmem_stolen_memory;
>> +
>> +static void alloc_pmem_from_memblock(void)
>> +{
>> +
>> +       pmem_stolen_memory = memblock_alloc_base(pmem_size,
>> +                                                BOOTPMEM_ALIGN,
>> +                                                MEMBLOCK_ALLOC_ACCESSIBLE);
>> +       if (!pmem_stolen_memory) {
>> +               pr_err("Failed to allocate memory for PMEM from memblock\n");
>> +               return;
>> +       }
>> +
>> +       /*
>> +        * Remove from the memblock reserved range
>> +        */
>> +       memblock_free(pmem_stolen_memory, pmem_size);
>> +
>> +       /*
>> +        * Remove from the memblock memory range.
>> +        */
>> +       memblock_remove(pmem_stolen_memory, pmem_size);
>> +       pr_info("Allocated %ld memory at 0x%lx\n", (unsigned long)pmem_size,
>> +               (unsigned long)pmem_stolen_memory);
>> +       return;
>> +}
>> +
>> +/*
>> + * pmemmap=ss[KMG]
>> + *
>> + * This is similar to the memremap=offset[KMG]!size[KMG] paramater
>> + * for adding a legacy pmem range to the e820 map on x86, but it's
>> + * platform agnostic.
>> + *
>> + * e.g. pmemmap=16G allocates 16G pmem region
> 
> I'm not really thrilled with this and I'd rather we kept the <size>@<node id>
> format and the ability to reserve multiple regions that I had in the
> old version.
> 
> I know getting the nid allocations working is a pain in the ass since
> HAVE_MEMBLOCK_NODE_MAP doesn't specify *when* in the boot process
> the node map information is actually available, but it's useful
> functionality and
> I think the problems are resolvable.
> 


The reason I dropped the @nid is because, we do set node details in 
memblock late (memblock_set_node()). We setup our linear mapped page 
table before that. That implies, we can't steal memory from memblock and 
use that for pmem backing if we do the pmem backing allocation after 
node information is set on memblock. Since we can't use nid, I was not 
sure there is any value in doing multiple pmem region.

> It's also possible that the whole memblock approach to this is wrong and we
> should look at doing something similar to how gigantic pages are allocated
> at runtime.
> 
>> + */
>> +static int __init parse_pmemmap(char *p)
>> +{
>> +       char *old_p = p;
>> +
>> +       if (!p)
>> +               return -EINVAL;
>> +
>> +       pmem_size = memparse(p, &p);
>> +       if (p == old_p)
>> +               return -EINVAL;
>> +
>> +       alloc_pmem_from_memblock();
>> +       return 0;
>> +}
>> +early_param("pmemmap", parse_pmemmap);
>> +
>> +static __init int register_e820_pmem(void)
>> +{
>> +       struct resource *res, *conflict;
>> +        struct platform_device *pdev;
>> +
>> +       if (!pmem_stolen_memory)
>> +               return 0;
>> +
>> +       res = kzalloc(sizeof(*res), GFP_KERNEL);
>> +       if (!res)
>> +               return -1;
>> +
>> +       memset(res, 0, sizeof(*res));
>> +       res->start = pmem_stolen_memory;
>> +       res->end = pmem_stolen_memory + pmem_size - 1;
>> +       res->name = "Persistent Memory (legacy)";
>> +       res->desc = IORES_DESC_PERSISTENT_MEMORY_LEGACY;
>> +       res->flags = IORESOURCE_MEM;
>> +
>> +       conflict = insert_resource_conflict(&iomem_resource, res);
>> +       if (conflict) {
>> +               pr_err("%pR conflicts, try insert below %pR\n", res, conflict);
>> +               kfree(res);
>> +               return -1;
>> +       }
>> +       /*
>> +        * See drivers/nvdimm/e820.c for the implementation, this is
>> +        * simply here to trigger the module to load on demand.
>> +        */
>> +       pdev = platform_device_alloc("e820_pmem", -1);
>> +
>> +       return platform_device_add(pdev);
>> +}
>> +device_initcall(register_e820_pmem);
>> --

-aneesh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-07 17:36       ` Dan Williams
@ 2018-07-09  5:17         ` Aneesh Kumar K.V
  2018-07-09  6:06           ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Aneesh Kumar K.V @ 2018-07-09  5:17 UTC (permalink / raw)
  To: Dan Williams, Oliver; +Cc: Andrew Morton, Linux MM, Linux Kernel Mailing List

On 07/07/2018 11:06 PM, Dan Williams wrote:
> On Sat, Jul 7, 2018 at 12:15 AM, Oliver <oohall@gmail.com> wrote:
>> On Sat, Jul 7, 2018 at 5:38 AM, Dan Williams <dan.j.williams@intel.com> wrote:
>>> On Fri, Jul 6, 2018 at 1:29 AM, Aneesh Kumar K.V
>>> <aneesh.kumar@linux.ibm.com> wrote:
>>>> This patch steal system RAM and use that to emulate pmem device using the
>>>> e820 platform driver.
>>>>
>>>> This adds a new kernel command line 'pmemmap' which takes the format <size[KMG]>
>>>> to allocate memory early in the boot. This memory is later registered as
>>>> persistent memory range.
>>>>
>>>> Based on original patch from Oliver OHalloran <oliveroh@au1.ibm.com>
>>>>
>>>> Not-Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>>> ---
>>>>   drivers/nvdimm/Kconfig        |  13 ++++
>>>>   drivers/nvdimm/Makefile       |   1 +
>>>>   drivers/nvdimm/memblockpmem.c | 115 ++++++++++++++++++++++++++++++++++
>>>>   3 files changed, 129 insertions(+)
>>>>   create mode 100644 drivers/nvdimm/memblockpmem.c
>>>>
>>> [..]
>>>> +/*
>>>> + * pmemmap=ss[KMG]
>>>> + *
>>>> + * This is similar to the memremap=offset[KMG]!size[KMG] paramater
>>>> + * for adding a legacy pmem range to the e820 map on x86, but it's
>>>> + * platform agnostic.
>>
>>> The current memmap=ss!nn option is a non-stop source of bugs and
>>> fragility. The fact that this lets the kernel specify the base address
>>> helps, but then this is purely just a debug facility because
>>> memmap=ss!nn is there to cover platform firmware implementations that
>>> fail to mark a given address range as persistent.
>>
>>> If this is just for debug, why not use qemu?
>>
>> To make a long story short, we have two virtualisation stacks and only one of
>> them is based on qemu. An unfortunately large chunk of our customers (and
>> our internal test systems) run the other one so we need to accommodate them
>> somehow.
>>
>>> If this is not for debug what are these systems that don't have proper firmware
>>> support?
>>
>> I wrote the original version (for RHEL 7.something) for a customer who wanted
>> to do some testing which needed to be run on real hardware for some reason.
>> We couldn't install a FW update on their system so this ended up being the least
>> painful way to get them going. That's not a strong argument for
>> merging this, but
>> the point is that it's sometimes useful to have the capability in the kernel.
> 
> Ok, correct me if I'm wrong, but it seems to be purely about debug and
> emulation? If that's the case would it be acceptable to just add more
> capabilities to tools/testing/nvdimm/ for what you want to do? That
> has been our primary vehicle for testing libnvdimm.
> 

What we need is the ability to run with fsdax on hypervisor other than KVM.

-aneesh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver
  2018-07-09  5:17         ` Aneesh Kumar K.V
@ 2018-07-09  6:06           ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2018-07-09  6:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Oliver, Andrew Morton, Linux MM, Linux Kernel Mailing List

On Sun, Jul 8, 2018 at 10:17 PM, Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
> On 07/07/2018 11:06 PM, Dan Williams wrote:
[..]
> What we need is the ability to run with fsdax on hypervisor other than KVM.

That sounds like a production use case? How can it be actual
persistent memory if the kernel is picking the physical address
backing the range?

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-07-09  6:06 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-06  8:29 [RFC PATCH 1/2] mm/nvidmm: Drop x86 dependency on nvdimm e820 device Aneesh Kumar K.V
2018-07-06  8:29 ` [RFC PATCH 2/2] mm/pmem: Add memblock based e820 platform driver Aneesh Kumar K.V
2018-07-06 18:46   ` Randy Dunlap
2018-07-06 19:38   ` Dan Williams
2018-07-07  7:15     ` Oliver
2018-07-07 17:36       ` Dan Williams
2018-07-09  5:17         ` Aneesh Kumar K.V
2018-07-09  6:06           ` Dan Williams
2018-07-07  7:50   ` Oliver
2018-07-09  5:16     ` Aneesh Kumar K.V

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).