Linux-HyperV Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with kexec-tools
@ 2020-04-29 16:08 David Hildenbrand
  2020-04-29 16:08 ` [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-04-29 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtio-dev, virtualization, linuxppc-dev, linux-acpi,
	linux-nvdimm, linux-hyperv, linux-s390, xen-devel, Michal Hocko,
	Andrew Morton, Michael S . Tsirkin, David Hildenbrand,
	Baoquan He, Benjamin Herrenschmidt, Boris Ostrovsky,
	Christian Borntraeger, Dan Williams, Dave Jiang, Eric Biederman,
	Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens, Jason Wang,
	Juergen Gross, K. Y. Srinivasan, Len Brown, Leonardo Bras,
	Michael Ellerman, Michal Hocko, Nathan Lynch, Oscar Salvador,
	Pankaj Gupta, Paul Mackerras, Pingfan Liu, Rafael J. Wysocki,
	Stefano Stabellini, Stephen Hemminger, Thomas Gleixner,
	Vasily Gorbik, Vishal Verma, Wei Liu, Wei Yang

This series is based on [1]:
	[PATCH v2 00/10] virtio-mem: paravirtualized memory
That will hopefull get picked up soon, rebased to -next.

The following patches were reverted from -next [2]:
	[PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
As discussed in that thread, they should be reverted from -next already.

In theory, if people agree, we could take the first two patches via the
-mm tree now and the last (virtio-mem) patch via MST's tree once picking up
virtio-mem. No strong feelings.


Memory added by virtio-mem is special and might contain logical holes,
especially after memory unplug, but also when adding memory in
sub-section size. While memory in these holes can usually be read, that
memory should not be touched. virtio-mem managed device memory is never
exposed via any firmware memmap (esp., e820). The device driver will
request to plug memory from the hypervisor and add it to Linux.

On a cold start, all memory is unplugged, and the guest driver will first
request to plug memory from the hypervisor, to then add it to Linux. After
a reboot, all memory will get unplugged (except in rare, special cases). In
case the device driver comes up and detects that some memory is still
plugged after a reboot, it will manually request to unplug all memory from
the hypervisor first - to then request to plug memory from the hypervisor
and add to Linux. This is essentially a defragmentation step, where all
logical holes are removed.

As the device driver is responsible for detecting, adding and managing that
memory, also kexec should treat it like that. It is special. We need a way
to teach kexec-tools to not add that memory to the fixed-up firmware
memmap, to not place kexec images onto this memory, but still allow kdump
to dump it. Add a flag to tell memory hotplug code to
not create /sys/firmware/memmap entries and to indicate it via
"System RAM (driver managed)" in /proc/iomem.

Before this series, kexec_file_load() already did the right thing (for
virtio-mem) by not adding that memory to the fixed-up firmware memmap and
letting the device driver handle it. With this series, also kexec_load() -
which relies on user space to provide a fixed up firmware memmap - does
the right thing with virtio-mem memory.

When the virtio-mem device driver(s) come up, they will request to unplug
all memory from the hypervisor first (esp. defragment), to then request to
plug consecutive memory ranges from the hypervisor, and add them to Linux
- just like on a reboot where we still have memory plugged.

[1] https://lore.kernel.org/r/20200311171422.10484-1-david@redhat.com/
[2] https://lore.kernel.org/r/20200326180730.4754-1-james.morse@arm.com

David Hildenbrand (3):
  mm/memory_hotplug: Prepare passing flags to add_memory() and friends
  mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  virtio-mem: Add memory with MHP_DRIVER_MANAGED

 arch/powerpc/platforms/powernv/memtrace.c     |  2 +-
 .../platforms/pseries/hotplug-memory.c        |  2 +-
 drivers/acpi/acpi_memhotplug.c                |  2 +-
 drivers/base/memory.c                         |  2 +-
 drivers/dax/kmem.c                            |  2 +-
 drivers/hv/hv_balloon.c                       |  2 +-
 drivers/s390/char/sclp_cmd.c                  |  2 +-
 drivers/virtio/virtio_mem.c                   |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory_hotplug.h                | 15 +++++++--
 mm/memory_hotplug.c                           | 31 +++++++++++++------
 11 files changed, 44 insertions(+), 21 deletions(-)

-- 
2.25.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends
  2020-04-29 16:08 [PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with kexec-tools David Hildenbrand
@ 2020-04-29 16:08 ` David Hildenbrand
  2020-04-29 16:41   ` Wei Liu
  2020-04-29 16:08 ` [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED David Hildenbrand
  2020-04-29 16:08 ` [PATCH v1 3/3] virtio-mem: Add memory with MHP_DRIVER_MANAGED David Hildenbrand
  2 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-04-29 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtio-dev, virtualization, linuxppc-dev, linux-acpi,
	linux-nvdimm, linux-hyperv, linux-s390, xen-devel, Michal Hocko,
	Andrew Morton, Michael S . Tsirkin, David Hildenbrand,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, Dan Williams,
	Vishal Verma, Dave Jiang, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Jason Wang, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Thomas Gleixner, Pingfan Liu,
	Leonardo Bras, Nathan Lynch, Oscar Salvador, Michal Hocko,
	Baoquan He, Wei Yang, Pankaj Gupta, Eric Biederman

We soon want to pass flags - prepare for that.

This patch is based on a similar patch by Oscar Salvador:

https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Leonardo Bras <leobras.c@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-acpi@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
Cc: linux-hyperv@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c       |  2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
 drivers/acpi/acpi_memhotplug.c                  |  2 +-
 drivers/base/memory.c                           |  2 +-
 drivers/dax/kmem.c                              |  2 +-
 drivers/hv/hv_balloon.c                         |  2 +-
 drivers/s390/char/sclp_cmd.c                    |  2 +-
 drivers/virtio/virtio_mem.c                     |  2 +-
 drivers/xen/balloon.c                           |  2 +-
 include/linux/memory_hotplug.h                  |  7 ++++---
 mm/memory_hotplug.c                             | 11 ++++++-----
 11 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 13b369d2cc45..a7475d18c671 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -224,7 +224,7 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size, 0)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5ace2f9a277e..ae44eba46ca0 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -646,7 +646,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	block_sz = memory_block_size_bytes();
 
 	/* Add the memory */
-	rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
+	rc = __add_memory(lmb->nid, lmb->base_addr, block_sz, 0);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e294f44a7850..d91b3584d4b2 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length, 0);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 2b09b68b9f78..c0ef7d9e310a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -432,7 +432,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0);
 
 	if (ret)
 		goto out;
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..e159184e0ba0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev)
 	new_res->flags = IORESOURCE_SYSTEM_RAM;
 	new_res->name = dev_name(dev);
 
-	rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+	rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
 	if (rc) {
 		release_resource(new_res);
 		kfree(new_res);
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 32e3bc0aa665..0194bed1a573 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				(HA_CHUNK << PAGE_SHIFT), 0);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index a864b21af602..a6a908244c74 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,7 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(0, addr, block_size);
+		add_memory(0, addr, block_size, 0);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 96e687531933..3101cbf9e59d 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -421,7 +421,7 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 		nid = memory_add_physaddr_to_nid(addr);
 
 	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
-	return add_memory(nid, addr, memory_block_size_bytes());
+	return add_memory(nid, addr, memory_block_size_bytes(), 0);
 }
 
 /*
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 0c142bcab79d..6ec0373fa624 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -347,7 +347,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, 0);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index d641828e5596..bf0e3edb8688 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -340,9 +340,10 @@ extern void set_zone_contiguous(struct zone *zone);
 extern void clear_zone_contiguous(struct zone *zone);
 
 extern void __ref free_area_init_core_hotplug(int nid);
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory(int nid, u64 start, u64 size, unsigned long flags);
+extern int add_memory_resource(int nid, struct resource *resource,
+			       unsigned long flags);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern void remove_pfn_range_from_zone(struct zone *zone,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4ac6395ee2fc..ebdf6541d074 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1000,7 +1000,8 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res,
+			      unsigned long flags)
 {
 	struct mhp_params params = { .pgprot = PAGE_KERNEL };
 	u64 start, size;
@@ -1078,7 +1079,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	struct resource *res;
 	int ret;
@@ -1087,18 +1088,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, flags);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, flags);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.25.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  2020-04-29 16:08 [PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with kexec-tools David Hildenbrand
  2020-04-29 16:08 ` [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends David Hildenbrand
@ 2020-04-29 16:08 ` David Hildenbrand
  2020-04-30  7:19   ` David Hildenbrand
  2020-04-29 16:08 ` [PATCH v1 3/3] virtio-mem: Add memory with MHP_DRIVER_MANAGED David Hildenbrand
  2 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-04-29 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtio-dev, virtualization, linuxppc-dev, linux-acpi,
	linux-nvdimm, linux-hyperv, linux-s390, xen-devel, Michal Hocko,
	Andrew Morton, Michael S . Tsirkin, David Hildenbrand,
	Michal Hocko, Pankaj Gupta, Wei Yang, Baoquan He, Eric Biederman

Some paravirtualized devices that add memory via add_memory() and
friends (esp. virtio-mem) don't want to create entries in
/sys/firmware/memmap/ - primarily to hinder kexec from adding this
memory to the boot memmap of the kexec kernel.

In fact, such memory is never exposed via the firmware (e.g., e820), but
only via the device, so exposing this memory via /sys/firmware/memmap/ is
wrong:
 "kexec needs the raw firmware-provided memory map to setup the
  parameter segment of the kernel that should be booted with
  kexec. Also, the raw memory map is useful for debugging. For
  that reason, /sys/firmware/memmap is an interface that provides
  the raw memory map to userspace." [1]

We want to let user space know that memory which is always detected,
added, and managed via a (device) driver - like memory managed by
virtio-mem - is special. It cannot be used for placing kexec segments
and the (device) driver is responsible for re-adding memory that
(eventually shrunk/grown/defragmented) memory after a reboot/kexec. It
should e.g., not be added to a fixed up firmware memmap. However, it should
be dumped by kdump.

Also, such memory could behave differently than an ordinary DIMM - e.g.,
memory managed by virtio-mem can have holes inside added memory resource,
which should not be touched, especially for writing.

Let's expose that memory as "System RAM (driver managed)" e.g., via
/pro/iomem.

We don't have to worry about firmware_map_remove() on the removal path.
If there is no entry, it will simply return with -EINVAL.

[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/memory_hotplug.h |  8 ++++++++
 mm/memory_hotplug.c            | 20 ++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index bf0e3edb8688..cc538584b39e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -68,6 +68,14 @@ struct mhp_params {
 	pgprot_t pgprot;
 };
 
+/* Flags used for add_memory() and friends. */
+
+/*
+ * Don't create entries in /sys/firmware/memmap/ and expose memory as
+ * "System RAM (driver managed)" in e.g., /proc/iomem
+ */
+#define MHP_DRIVER_MANAGED		1
+
 /*
  * Zone resizing functions
  *
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ebdf6541d074..cfa0721280aa 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -98,11 +98,11 @@ void mem_hotplug_done(void)
 u64 max_mem_size = U64_MAX;
 
 /* add this memory to iomem resource */
-static struct resource *register_memory_resource(u64 start, u64 size)
+static struct resource *register_memory_resource(u64 start, u64 size,
+						 const char *resource_name)
 {
 	struct resource *res;
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-	char *resource_name = "System RAM";
 
 	/*
 	 * Make sure value parsed from 'mem=' only restricts memory adding
@@ -1058,7 +1058,8 @@ int __ref add_memory_resource(int nid, struct resource *res,
 	BUG_ON(ret);
 
 	/* create new memmap entry */
-	firmware_map_add_hotplug(start, start + size, "System RAM");
+	if (!(flags & MHP_DRIVER_MANAGED))
+		firmware_map_add_hotplug(start, start + size, "System RAM");
 
 	/* device_online() will take the lock when calling online_pages() */
 	mem_hotplug_done();
@@ -1081,10 +1082,21 @@ int __ref add_memory_resource(int nid, struct resource *res,
 /* requires device_hotplug_lock, see add_memory_resource() */
 int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
+	const char *resource_name = "System RAM";
 	struct resource *res;
 	int ret;
 
-	res = register_memory_resource(start, size);
+	/*
+	 * Indicate that memory managed by a driver is special. It's always
+	 * detected and added via a driver, should not be given to the kexec
+	 * kernel for booting when manually crafting the firmware memmap, and
+	 * no kexec segments should be placed on it. However, kdump should
+	 * dump this memory.
+	 */
+	if (flags & MHP_DRIVER_MANAGED)
+		resource_name = "System RAM (driver managed)";
+
+	res = register_memory_resource(start, size, resource_name);
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-- 
2.25.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v1 3/3] virtio-mem: Add memory with MHP_DRIVER_MANAGED
  2020-04-29 16:08 [PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with kexec-tools David Hildenbrand
  2020-04-29 16:08 ` [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends David Hildenbrand
  2020-04-29 16:08 ` [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED David Hildenbrand
@ 2020-04-29 16:08 ` David Hildenbrand
  2 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2020-04-29 16:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtio-dev, virtualization, linuxppc-dev, linux-acpi,
	linux-nvdimm, linux-hyperv, linux-s390, xen-devel, Michal Hocko,
	Andrew Morton, Michael S . Tsirkin, David Hildenbrand,
	Jason Wang, Michal Hocko, Eric Biederman

We don't want /sys/firmware/memmap entries and we want to indicate
our memory as "System RAM (driver managed)" in /proc/iomem. This is
especially relevant for kexec-tools, which have to be updated to
support dumping virtio-mem memory after this patch. Expected behavior in
kexec-tools:
- Don't use this memory when creating a fixed-up firmware memmap. Works
  now out of the box on x86-64.
- Don't use this memory for placing kexec segments. Works now out of the
  box on x86-64.
- Consider "System RAM (driver managed)" when creating the elfcorehdr
  for kdump. This memory has to be dumped. Needs update of kexec-tools.

With this patch on x86-64:

/proc/iomem:
	00000000-00000fff : Reserved
	00001000-0009fbff : System RAM
	[...]
	fffc0000-ffffffff : Reserved
	100000000-13fffffff : System RAM
	140000000-147ffffff : System RAM (driver managed)
	340000000-347ffffff : System RAM (driver managed)
	348000000-34fffffff : System RAM (driver managed)
	[..]
	3280000000-32ffffffff : PCI Bus 0000:00

/sys/firmware/memmap:
	0000000000000000-000000000009fc00 (System RAM)
	000000000009fc00-00000000000a0000 (Reserved)
	00000000000f0000-0000000000100000 (Reserved)
	0000000000100000-00000000bffe0000 (System RAM)
	00000000bffe0000-00000000c0000000 (Reserved)
	00000000feffc000-00000000ff000000 (Reserved)
	00000000fffc0000-0000000100000000 (Reserved)
	0000000100000000-0000000140000000 (System RAM)

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/virtio/virtio_mem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 3101cbf9e59d..6f658d1aeac4 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -421,7 +421,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, unsigned long mb_id)
 		nid = memory_add_physaddr_to_nid(addr);
 
 	dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
-	return add_memory(nid, addr, memory_block_size_bytes(), 0);
+	return add_memory(nid, addr, memory_block_size_bytes(),
+			  MHP_DRIVER_MANAGED);
 }
 
 /*
-- 
2.25.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends
  2020-04-29 16:08 ` [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends David Hildenbrand
@ 2020-04-29 16:41   ` Wei Liu
  0 siblings, 0 replies; 9+ messages in thread
From: Wei Liu @ 2020-04-29 16:41 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, virtio-dev, virtualization, linuxppc-dev,
	linux-acpi, linux-nvdimm, linux-hyperv, linux-s390, xen-devel,
	Michal Hocko, Andrew Morton, Michael S . Tsirkin,
	Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman, Dan Williams,
	Vishal Verma, Dave Jiang, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Wei Liu, Heiko Carstens, Vasily Gorbik,
	Christian Borntraeger, Jason Wang, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Thomas Gleixner, Pingfan Liu,
	Leonardo Bras, Nathan Lynch, Oscar Salvador, Michal Hocko,
	Baoquan He, Wei Yang, Pankaj Gupta, Eric Biederman

On Wed, Apr 29, 2020 at 06:08:01PM +0200, David Hildenbrand wrote:
> We soon want to pass flags - prepare for that.
> 
> This patch is based on a similar patch by Oscar Salvador:
> 
> https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de
> 
[...]
> ---
>  drivers/hv/hv_balloon.c                         |  2 +-

> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 32e3bc0aa665..0194bed1a573 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
>  
>  		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>  		ret = add_memory(nid, PFN_PHYS((start_pfn)),
> -				(HA_CHUNK << PAGE_SHIFT));
> +				(HA_CHUNK << PAGE_SHIFT), 0);
>  
>  		if (ret) {
>  			pr_err("hot_add memory failed error is %d\n", ret);

Acked-by: Wei Liu <wei.liu@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  2020-04-29 16:08 ` [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED David Hildenbrand
@ 2020-04-30  7:19   ` David Hildenbrand
  2020-04-30  8:11     ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-04-30  7:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, virtio-dev, virtualization, linuxppc-dev, linux-acpi,
	linux-nvdimm, linux-hyperv, linux-s390, xen-devel, Michal Hocko,
	Andrew Morton, Michael S . Tsirkin, Michal Hocko, Pankaj Gupta,
	Wei Yang, Baoquan He, Eric Biederman, Dan Williams,
	Pavel Tatashin, Dave Hansen

On 29.04.20 18:08, David Hildenbrand wrote:
> Some paravirtualized devices that add memory via add_memory() and
> friends (esp. virtio-mem) don't want to create entries in
> /sys/firmware/memmap/ - primarily to hinder kexec from adding this
> memory to the boot memmap of the kexec kernel.
> 
> In fact, such memory is never exposed via the firmware (e.g., e820), but
> only via the device, so exposing this memory via /sys/firmware/memmap/ is
> wrong:
>  "kexec needs the raw firmware-provided memory map to setup the
>   parameter segment of the kernel that should be booted with
>   kexec. Also, the raw memory map is useful for debugging. For
>   that reason, /sys/firmware/memmap is an interface that provides
>   the raw memory map to userspace." [1]
> 
> We want to let user space know that memory which is always detected,
> added, and managed via a (device) driver - like memory managed by
> virtio-mem - is special. It cannot be used for placing kexec segments
> and the (device) driver is responsible for re-adding memory that
> (eventually shrunk/grown/defragmented) memory after a reboot/kexec. It
> should e.g., not be added to a fixed up firmware memmap. However, it should
> be dumped by kdump.
> 
> Also, such memory could behave differently than an ordinary DIMM - e.g.,
> memory managed by virtio-mem can have holes inside added memory resource,
> which should not be touched, especially for writing.
> 
> Let's expose that memory as "System RAM (driver managed)" e.g., via
> /pro/iomem.
> 
> We don't have to worry about firmware_map_remove() on the removal path.
> If there is no entry, it will simply return with -EINVAL.
> 
> [1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> Cc: Wei Yang <richard.weiyang@gmail.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Eric Biederman <ebiederm@xmission.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  include/linux/memory_hotplug.h |  8 ++++++++
>  mm/memory_hotplug.c            | 20 ++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index bf0e3edb8688..cc538584b39e 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -68,6 +68,14 @@ struct mhp_params {
>  	pgprot_t pgprot;
>  };
>  
> +/* Flags used for add_memory() and friends. */
> +
> +/*
> + * Don't create entries in /sys/firmware/memmap/ and expose memory as
> + * "System RAM (driver managed)" in e.g., /proc/iomem
> + */
> +#define MHP_DRIVER_MANAGED		1
> +
>  /*
>   * Zone resizing functions
>   *
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ebdf6541d074..cfa0721280aa 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -98,11 +98,11 @@ void mem_hotplug_done(void)
>  u64 max_mem_size = U64_MAX;
>  
>  /* add this memory to iomem resource */
> -static struct resource *register_memory_resource(u64 start, u64 size)
> +static struct resource *register_memory_resource(u64 start, u64 size,
> +						 const char *resource_name)
>  {
>  	struct resource *res;
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> -	char *resource_name = "System RAM";
>  
>  	/*
>  	 * Make sure value parsed from 'mem=' only restricts memory adding
> @@ -1058,7 +1058,8 @@ int __ref add_memory_resource(int nid, struct resource *res,
>  	BUG_ON(ret);
>  
>  	/* create new memmap entry */
> -	firmware_map_add_hotplug(start, start + size, "System RAM");
> +	if (!(flags & MHP_DRIVER_MANAGED))
> +		firmware_map_add_hotplug(start, start + size, "System RAM");
>  
>  	/* device_online() will take the lock when calling online_pages() */
>  	mem_hotplug_done();
> @@ -1081,10 +1082,21 @@ int __ref add_memory_resource(int nid, struct resource *res,
>  /* requires device_hotplug_lock, see add_memory_resource() */
>  int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
>  {
> +	const char *resource_name = "System RAM";
>  	struct resource *res;
>  	int ret;
>  
> -	res = register_memory_resource(start, size);
> +	/*
> +	 * Indicate that memory managed by a driver is special. It's always
> +	 * detected and added via a driver, should not be given to the kexec
> +	 * kernel for booting when manually crafting the firmware memmap, and
> +	 * no kexec segments should be placed on it. However, kdump should
> +	 * dump this memory.
> +	 */
> +	if (flags & MHP_DRIVER_MANAGED)
> +		resource_name = "System RAM (driver managed)";
> +
> +	res = register_memory_resource(start, size, resource_name);
>  	if (IS_ERR(res))
>  		return PTR_ERR(res);
>  
> 

BTW, I was wondering if this is actually also something that
drivers/dax/kmem.c wants to use for adding memory.

Just because we decided to use some DAX memory in the current kernel as
system ram, doesn't mean we should make that decision for the kexec
kernel (e.g., using it as initial memory, placing kexec binaries onto
it, etc.). This is also not what we would observe during a real reboot.

I can see that the "System RAM" resource will show up as child resource
under the device e.g., in /proc/iomem.

However, entries in /sys/firmware/memmap/ are created as "System RAM".

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  2020-04-30  7:19   ` David Hildenbrand
@ 2020-04-30  8:11     ` Dan Williams
  2020-04-30  8:20       ` David Hildenbrand
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2020-04-30  8:11 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Kernel Mailing List, Linux MM, virtio-dev, virtualization,
	linuxppc-dev, Linux ACPI, linux-nvdimm, linux-hyperv, linux-s390,
	xen-devel, Michal Hocko, Andrew Morton, Michael S . Tsirkin,
	Michal Hocko, Pankaj Gupta, Wei Yang, Baoquan He, Eric Biederman,
	Pavel Tatashin, Dave Hansen

On Thu, Apr 30, 2020 at 12:20 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 29.04.20 18:08, David Hildenbrand wrote:
> > Some paravirtualized devices that add memory via add_memory() and
> > friends (esp. virtio-mem) don't want to create entries in
> > /sys/firmware/memmap/ - primarily to hinder kexec from adding this
> > memory to the boot memmap of the kexec kernel.
> >
> > In fact, such memory is never exposed via the firmware (e.g., e820), but
> > only via the device, so exposing this memory via /sys/firmware/memmap/ is
> > wrong:
> >  "kexec needs the raw firmware-provided memory map to setup the
> >   parameter segment of the kernel that should be booted with
> >   kexec. Also, the raw memory map is useful for debugging. For
> >   that reason, /sys/firmware/memmap is an interface that provides
> >   the raw memory map to userspace." [1]
> >
> > We want to let user space know that memory which is always detected,
> > added, and managed via a (device) driver - like memory managed by
> > virtio-mem - is special. It cannot be used for placing kexec segments
> > and the (device) driver is responsible for re-adding memory that
> > (eventually shrunk/grown/defragmented) memory after a reboot/kexec. It
> > should e.g., not be added to a fixed up firmware memmap. However, it should
> > be dumped by kdump.
> >
> > Also, such memory could behave differently than an ordinary DIMM - e.g.,
> > memory managed by virtio-mem can have holes inside added memory resource,
> > which should not be touched, especially for writing.
> >
> > Let's expose that memory as "System RAM (driver managed)" e.g., via
> > /pro/iomem.
> >
> > We don't have to worry about firmware_map_remove() on the removal path.
> > If there is no entry, it will simply return with -EINVAL.
> >
> > [1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > Cc: Wei Yang <richard.weiyang@gmail.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Eric Biederman <ebiederm@xmission.com>
> > Signed-off-by: David Hildenbrand <david@redhat.com>
> > ---
> >  include/linux/memory_hotplug.h |  8 ++++++++
> >  mm/memory_hotplug.c            | 20 ++++++++++++++++----
> >  2 files changed, 24 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> > index bf0e3edb8688..cc538584b39e 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -68,6 +68,14 @@ struct mhp_params {
> >       pgprot_t pgprot;
> >  };
> >
> > +/* Flags used for add_memory() and friends. */
> > +
> > +/*
> > + * Don't create entries in /sys/firmware/memmap/ and expose memory as
> > + * "System RAM (driver managed)" in e.g., /proc/iomem
> > + */
> > +#define MHP_DRIVER_MANAGED           1
> > +
> >  /*
> >   * Zone resizing functions
> >   *
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index ebdf6541d074..cfa0721280aa 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -98,11 +98,11 @@ void mem_hotplug_done(void)
> >  u64 max_mem_size = U64_MAX;
> >
> >  /* add this memory to iomem resource */
> > -static struct resource *register_memory_resource(u64 start, u64 size)
> > +static struct resource *register_memory_resource(u64 start, u64 size,
> > +                                              const char *resource_name)
> >  {
> >       struct resource *res;
> >       unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> > -     char *resource_name = "System RAM";
> >
> >       /*
> >        * Make sure value parsed from 'mem=' only restricts memory adding
> > @@ -1058,7 +1058,8 @@ int __ref add_memory_resource(int nid, struct resource *res,
> >       BUG_ON(ret);
> >
> >       /* create new memmap entry */
> > -     firmware_map_add_hotplug(start, start + size, "System RAM");
> > +     if (!(flags & MHP_DRIVER_MANAGED))
> > +             firmware_map_add_hotplug(start, start + size, "System RAM");
> >
> >       /* device_online() will take the lock when calling online_pages() */
> >       mem_hotplug_done();
> > @@ -1081,10 +1082,21 @@ int __ref add_memory_resource(int nid, struct resource *res,
> >  /* requires device_hotplug_lock, see add_memory_resource() */
> >  int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
> >  {
> > +     const char *resource_name = "System RAM";
> >       struct resource *res;
> >       int ret;
> >
> > -     res = register_memory_resource(start, size);
> > +     /*
> > +      * Indicate that memory managed by a driver is special. It's always
> > +      * detected and added via a driver, should not be given to the kexec
> > +      * kernel for booting when manually crafting the firmware memmap, and
> > +      * no kexec segments should be placed on it. However, kdump should
> > +      * dump this memory.
> > +      */
> > +     if (flags & MHP_DRIVER_MANAGED)
> > +             resource_name = "System RAM (driver managed)";
> > +
> > +     res = register_memory_resource(start, size, resource_name);
> >       if (IS_ERR(res))
> >               return PTR_ERR(res);
> >
> >
>
> BTW, I was wondering if this is actually also something that
> drivers/dax/kmem.c wants to use for adding memory.
>
> Just because we decided to use some DAX memory in the current kernel as
> system ram, doesn't mean we should make that decision for the kexec
> kernel (e.g., using it as initial memory, placing kexec binaries onto
> it, etc.). This is also not what we would observe during a real reboot.

Agree.

> I can see that the "System RAM" resource will show up as child resource
> under the device e.g., in /proc/iomem.
>
> However, entries in /sys/firmware/memmap/ are created as "System RAM".

True. Do you think this rename should just be limited to what type
/sys/firmware/memmap/ emits? I have the concern, but no proof
currently, that there are /proc/iomem walkers that explicitly look for
"System RAM", but might be thrown off by "System RAM (driver
managed)". I was not aware of /sys/firmware/memmap until about 5
minutes ago.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  2020-04-30  8:11     ` Dan Williams
@ 2020-04-30  8:20       ` David Hildenbrand
  2020-04-30  8:34         ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2020-04-30  8:20 UTC (permalink / raw)
  To: Dan Williams
  Cc: Linux Kernel Mailing List, Linux MM, virtio-dev, virtualization,
	linuxppc-dev, Linux ACPI, linux-nvdimm, linux-hyperv, linux-s390,
	xen-devel, Michal Hocko, Andrew Morton, Michael S . Tsirkin,
	Michal Hocko, Pankaj Gupta, Wei Yang, Baoquan He, Eric Biederman,
	Pavel Tatashin, Dave Hansen

On 30.04.20 10:11, Dan Williams wrote:
> On Thu, Apr 30, 2020 at 12:20 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 29.04.20 18:08, David Hildenbrand wrote:
>>> Some paravirtualized devices that add memory via add_memory() and
>>> friends (esp. virtio-mem) don't want to create entries in
>>> /sys/firmware/memmap/ - primarily to hinder kexec from adding this
>>> memory to the boot memmap of the kexec kernel.
>>>
>>> In fact, such memory is never exposed via the firmware (e.g., e820), but
>>> only via the device, so exposing this memory via /sys/firmware/memmap/ is
>>> wrong:
>>>  "kexec needs the raw firmware-provided memory map to setup the
>>>   parameter segment of the kernel that should be booted with
>>>   kexec. Also, the raw memory map is useful for debugging. For
>>>   that reason, /sys/firmware/memmap is an interface that provides
>>>   the raw memory map to userspace." [1]
>>>
>>> We want to let user space know that memory which is always detected,
>>> added, and managed via a (device) driver - like memory managed by
>>> virtio-mem - is special. It cannot be used for placing kexec segments
>>> and the (device) driver is responsible for re-adding memory that
>>> (eventually shrunk/grown/defragmented) memory after a reboot/kexec. It
>>> should e.g., not be added to a fixed up firmware memmap. However, it should
>>> be dumped by kdump.
>>>
>>> Also, such memory could behave differently than an ordinary DIMM - e.g.,
>>> memory managed by virtio-mem can have holes inside added memory resource,
>>> which should not be touched, especially for writing.
>>>
>>> Let's expose that memory as "System RAM (driver managed)" e.g., via
>>> /pro/iomem.
>>>
>>> We don't have to worry about firmware_map_remove() on the removal path.
>>> If there is no entry, it will simply return with -EINVAL.
>>>
>>> [1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>>> Cc: Wei Yang <richard.weiyang@gmail.com>
>>> Cc: Baoquan He <bhe@redhat.com>
>>> Cc: Eric Biederman <ebiederm@xmission.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  include/linux/memory_hotplug.h |  8 ++++++++
>>>  mm/memory_hotplug.c            | 20 ++++++++++++++++----
>>>  2 files changed, 24 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>>> index bf0e3edb8688..cc538584b39e 100644
>>> --- a/include/linux/memory_hotplug.h
>>> +++ b/include/linux/memory_hotplug.h
>>> @@ -68,6 +68,14 @@ struct mhp_params {
>>>       pgprot_t pgprot;
>>>  };
>>>
>>> +/* Flags used for add_memory() and friends. */
>>> +
>>> +/*
>>> + * Don't create entries in /sys/firmware/memmap/ and expose memory as
>>> + * "System RAM (driver managed)" in e.g., /proc/iomem
>>> + */
>>> +#define MHP_DRIVER_MANAGED           1
>>> +
>>>  /*
>>>   * Zone resizing functions
>>>   *
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index ebdf6541d074..cfa0721280aa 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -98,11 +98,11 @@ void mem_hotplug_done(void)
>>>  u64 max_mem_size = U64_MAX;
>>>
>>>  /* add this memory to iomem resource */
>>> -static struct resource *register_memory_resource(u64 start, u64 size)
>>> +static struct resource *register_memory_resource(u64 start, u64 size,
>>> +                                              const char *resource_name)
>>>  {
>>>       struct resource *res;
>>>       unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>> -     char *resource_name = "System RAM";
>>>
>>>       /*
>>>        * Make sure value parsed from 'mem=' only restricts memory adding
>>> @@ -1058,7 +1058,8 @@ int __ref add_memory_resource(int nid, struct resource *res,
>>>       BUG_ON(ret);
>>>
>>>       /* create new memmap entry */
>>> -     firmware_map_add_hotplug(start, start + size, "System RAM");
>>> +     if (!(flags & MHP_DRIVER_MANAGED))
>>> +             firmware_map_add_hotplug(start, start + size, "System RAM");
>>>
>>>       /* device_online() will take the lock when calling online_pages() */
>>>       mem_hotplug_done();
>>> @@ -1081,10 +1082,21 @@ int __ref add_memory_resource(int nid, struct resource *res,
>>>  /* requires device_hotplug_lock, see add_memory_resource() */
>>>  int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
>>>  {
>>> +     const char *resource_name = "System RAM";
>>>       struct resource *res;
>>>       int ret;
>>>
>>> -     res = register_memory_resource(start, size);
>>> +     /*
>>> +      * Indicate that memory managed by a driver is special. It's always
>>> +      * detected and added via a driver, should not be given to the kexec
>>> +      * kernel for booting when manually crafting the firmware memmap, and
>>> +      * no kexec segments should be placed on it. However, kdump should
>>> +      * dump this memory.
>>> +      */
>>> +     if (flags & MHP_DRIVER_MANAGED)
>>> +             resource_name = "System RAM (driver managed)";
>>> +
>>> +     res = register_memory_resource(start, size, resource_name);
>>>       if (IS_ERR(res))
>>>               return PTR_ERR(res);
>>>
>>>
>>
>> BTW, I was wondering if this is actually also something that
>> drivers/dax/kmem.c wants to use for adding memory.
>>
>> Just because we decided to use some DAX memory in the current kernel as
>> system ram, doesn't mean we should make that decision for the kexec
>> kernel (e.g., using it as initial memory, placing kexec binaries onto
>> it, etc.). This is also not what we would observe during a real reboot.
> 
> Agree.
> 
>> I can see that the "System RAM" resource will show up as child resource
>> under the device e.g., in /proc/iomem.
>>
>> However, entries in /sys/firmware/memmap/ are created as "System RAM".
> 
> True. Do you think this rename should just be limited to what type
> /sys/firmware/memmap/ emits? I have the concern, but no proof

We could split this patch into

MHP_NO_FIRMWARE_MEMMAP (create firmware memmap entries)

and

MHP_DRIVER_MANAGED (name of the resource)

See below, the latter might not be needed.

> currently, that there are /proc/iomem walkers that explicitly look for
> "System RAM", but might be thrown off by "System RAM (driver
> managed)". I was not aware of /sys/firmware/memmap until about 5
> minutes ago.

The only two users of /proc/iomem I am aware of are kexec-tools and some
s390x tools.

kexec-tools on x86-64 uses /sys/firmware/memmap to craft the initial
memmap, but uses /proc/iomem to
a) Find places for kexec images
b) Detect memory regions to dump via kdump

I am not yet sure if we really need the "System RAM (driver managed)"
part. If we can teach kexec-tools to
a) Don't place kexec images on "System RAM" that has a parent resource
(most likely requires kexec-tools changes)
b) Consider for kdump "System RAM" that has a parent resource
we might be able to avoid renaming that. (I assume that's already done)

E.g., regarding virtio-mem (patch #3) I am currently also looking into
creating a parent resource instead, like dax/kmem to avoid the rename:

:/# cat /proc/iomem
00000000-00000fff : Reserved
[...]
100000000-13fffffff : System RAM
140000000-33fffffff : virtio0
  140000000-147ffffff : System RAM
  148000000-14fffffff : System RAM
  150000000-157ffffff : System RAM
340000000-303fffffff : virtio1
  340000000-347ffffff : System RAM
3280000000-32ffffffff : PCI Bus 0000:00



-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  2020-04-30  8:20       ` David Hildenbrand
@ 2020-04-30  8:34         ` Dan Williams
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2020-04-30  8:34 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Linux Kernel Mailing List, Linux MM, virtio-dev, virtualization,
	linuxppc-dev, Linux ACPI, linux-nvdimm, linux-hyperv, linux-s390,
	xen-devel, Michal Hocko, Andrew Morton, Michael S . Tsirkin,
	Michal Hocko, Pankaj Gupta, Wei Yang, Baoquan He, Eric Biederman,
	Pavel Tatashin, Dave Hansen

On Thu, Apr 30, 2020 at 1:21 AM David Hildenbrand <david@redhat.com> wrote:
> >> Just because we decided to use some DAX memory in the current kernel as
> >> system ram, doesn't mean we should make that decision for the kexec
> >> kernel (e.g., using it as initial memory, placing kexec binaries onto
> >> it, etc.). This is also not what we would observe during a real reboot.
> >
> > Agree.
> >
> >> I can see that the "System RAM" resource will show up as child resource
> >> under the device e.g., in /proc/iomem.
> >>
> >> However, entries in /sys/firmware/memmap/ are created as "System RAM".
> >
> > True. Do you think this rename should just be limited to what type
> > /sys/firmware/memmap/ emits? I have the concern, but no proof
>
> We could split this patch into
>
> MHP_NO_FIRMWARE_MEMMAP (create firmware memmap entries)
>
> and
>
> MHP_DRIVER_MANAGED (name of the resource)
>
> See below, the latter might not be needed.
>
> > currently, that there are /proc/iomem walkers that explicitly look for
> > "System RAM", but might be thrown off by "System RAM (driver
> > managed)". I was not aware of /sys/firmware/memmap until about 5
> > minutes ago.
>
> The only two users of /proc/iomem I am aware of are kexec-tools and some
> s390x tools.
>
> kexec-tools on x86-64 uses /sys/firmware/memmap to craft the initial
> memmap, but uses /proc/iomem to
> a) Find places for kexec images
> b) Detect memory regions to dump via kdump
>
> I am not yet sure if we really need the "System RAM (driver managed)"
> part. If we can teach kexec-tools to
> a) Don't place kexec images on "System RAM" that has a parent resource
> (most likely requires kexec-tools changes)
> b) Consider for kdump "System RAM" that has a parent resource
> we might be able to avoid renaming that. (I assume that's already done)
>
> E.g., regarding virtio-mem (patch #3) I am currently also looking into
> creating a parent resource instead, like dax/kmem to avoid the rename:
>
> :/# cat /proc/iomem
> 00000000-00000fff : Reserved
> [...]
> 100000000-13fffffff : System RAM
> 140000000-33fffffff : virtio0
>   140000000-147ffffff : System RAM
>   148000000-14fffffff : System RAM
>   150000000-157ffffff : System RAM
> 340000000-303fffffff : virtio1
>   340000000-347ffffff : System RAM
> 3280000000-32ffffffff : PCI Bus 0000:00

Looks good to me if it flies with kexec-tools.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-29 16:08 [PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with kexec-tools David Hildenbrand
2020-04-29 16:08 ` [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends David Hildenbrand
2020-04-29 16:41   ` Wei Liu
2020-04-29 16:08 ` [PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED David Hildenbrand
2020-04-30  7:19   ` David Hildenbrand
2020-04-30  8:11     ` Dan Williams
2020-04-30  8:20       ` David Hildenbrand
2020-04-30  8:34         ` Dan Williams
2020-04-29 16:08 ` [PATCH v1 3/3] virtio-mem: Add memory with MHP_DRIVER_MANAGED David Hildenbrand

Linux-HyperV Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-hyperv/0 linux-hyperv/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-hyperv linux-hyperv/ https://lore.kernel.org/linux-hyperv \
		linux-hyperv@vger.kernel.org
	public-inbox-index linux-hyperv

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-hyperv


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git