* [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
@ 2018-11-30 17:59 ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Oscar Salvador, Rafael J. Wysocki, Michal Hocko, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, David Hildenbrand, Michal Hocko, Vitaly Kuznetsov,
	Pavel Tatashin, Rich Felker, Arun KS, H. Peter Anvin,
	Stephen Rothwell, Rashmica Gupta, Dan Williams, Paul Mackerras,
	Pavel Tatashin, linux-s390, Michael Neuling, Stefano Stabellini

This is the second approach: it introduces more meaningful memory block
types and does not change onlining behavior in the kernel. It is based on
the latest linux-next.

As we found out during the discussion, user space should always handle
onlining of memory. However, in order to make smart decisions in user space
about whether and how to online memory, we have to export more information
about memory blocks. This way, we can formulate rules in user space.

One such piece of information is the type of the memory block. It helps
to answer questions like:
- Does this memory block belong to a DIMM?
- Can this DIMM theoretically ever be unplugged again?
- Was this memory added by a balloon driver that will rely on balloon
  inflation to remove chunks of that memory again? Which zone is advised?
- Is this special standby memory on s390x that is usually not automatically
  onlined?

In short, it helps to answer, to some extent (excluding zone imbalances):
- Should I online this memory block?
- To which zone should I online this memory block?
Of course, special use cases will result in different answers, but that is
why user space has control over onlining memory.
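The following minimal user-space sketch (not part of this series) shows the kind of rule this enables. The function name choose_online_action() and the policy mapping are hypothetical. Only the "boot", "dimm" and "unspecified" strings come from this series. The returned strings are values that the existing memoryN/state attribute already accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical onlining policy keyed on the value of the proposed
 * "type" attribute (trailing newline already stripped). Returns the
 * string a tool would write to .../memoryN/state, or NULL to leave
 * the block as it is. The mapping is an illustrative assumption.
 */
static const char *choose_online_action(const char *type)
{
	assert(type != NULL);

	if (strcmp(type, "boot") == 0)
		return NULL;             /* already onlined by the kernel */
	if (strcmp(type, "dimm") == 0)
		return "online_movable"; /* makes a later offline/unplug likelier to succeed */
	if (strcmp(type, "unspecified") == 0)
		return "online";         /* let the kernel pick the default zone */
	return NULL;                     /* unknown type: leave it to the admin */
}
```

A udev helper could read /sys/devices/system/memory/memoryN/type and strip the trailing newline. It would then call this function and, if the result is non-NULL, write the result to memoryN/state. Unknown types are deliberately left offline so that a newly introduced type never gets onlined by accident.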

More details can be found in Patch 1 and Patch 3.
Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.


Example:
$ udevadm info -q all -a /sys/devices/system/memory/memory0
	KERNEL=="memory0"
	SUBSYSTEM=="memory"
	DRIVER==""
	ATTR{online}=="1"
	ATTR{phys_device}=="0"
	ATTR{phys_index}=="00000000"
	ATTR{removable}=="0"
	ATTR{state}=="online"
	ATTR{type}=="boot"
	ATTR{valid_zones}=="none"
$ udevadm info -q all -a /sys/devices/system/memory/memory90
	KERNEL=="memory90"
	SUBSYSTEM=="memory"
	DRIVER==""
	ATTR{online}=="1"
	ATTR{phys_device}=="0"
	ATTR{phys_index}=="0000005a"
	ATTR{removable}=="1"
	ATTR{state}=="online"
	ATTR{type}=="dimm"
	ATTR{valid_zones}=="Normal"
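As the example output suggests, phys_index is the block number N of the memoryN directory, printed as eight hex digits (memory90 has phys_index 0000005a, and 0x5a == 90). Below is a small sketch of how a user-space tool might parse it. memory_block_id() is a hypothetical helper, not an existing library function.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a "phys_index" value as printed by sysfs (hex digits plus an
 * optional trailing newline, e.g. "0000005a\n") into the memory block
 * number N used in the directory name memoryN. Returns -1 on malformed
 * or out-of-range input.
 */
static long memory_block_id(const char *phys_index)
{
	char *end;
	unsigned long idx;

	assert(phys_index != NULL);

	errno = 0;
	idx = strtoul(phys_index, &end, 16);
	if (errno != 0 || end == phys_index)
		return -1;              /* overflow or no hex digits at all */
	if (*end != '\0' && *end != '\n')
		return -1;              /* trailing garbage */
	if (idx > LONG_MAX)
		return -1;              /* would not fit the signed result */
	return (long)idx;
}
```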


RFC -> RFCv2:
- Now also taking care of PPC (somehow missed it :/ )
- Split the series up to some degree (some ideas on how to split up patch 3
  would be very welcome)
- Introduce more memory block types. It turns out that abstracting too much
  was rather confusing and not helpful. Properly document them.

Notes:
- I wanted to convert the enum of types into a named enum, but this
  provoked all kinds of errors. For now, I am handling it just like the
  other types (e.g. online_type) used in that context.
- The "removable" property should never have been named that way. It
  should have been "offlinable". Can we still rename it? E.g. boot memory
  is sometimes marked as removable ...

David Hildenbrand (4):
  mm/memory_hotplug: Introduce memory block types
  mm/memory_hotplug: Replace "bool want_memblock" by "int type"
  mm/memory_hotplug: Introduce and use more memory types
  mm/memory_hotplug: Drop MEMORY_TYPE_UNSPECIFIED

 arch/ia64/mm/init.c                           |  4 +-
 arch/powerpc/mm/mem.c                         |  4 +-
 arch/powerpc/platforms/powernv/memtrace.c     |  9 +--
 .../platforms/pseries/hotplug-memory.c        |  7 +-
 arch/s390/mm/init.c                           |  4 +-
 arch/sh/mm/init.c                             |  4 +-
 arch/x86/mm/init_32.c                         |  4 +-
 arch/x86/mm/init_64.c                         |  8 +--
 drivers/acpi/acpi_memhotplug.c                | 16 ++++-
 drivers/base/memory.c                         | 60 ++++++++++++++--
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 69 ++++++++++++++++++-
 include/linux/memory_hotplug.h                | 18 ++---
 kernel/memremap.c                             |  6 +-
 mm/memory_hotplug.c                           | 29 ++++----
 17 files changed, 194 insertions(+), 56 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-11-30 17:59   ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Morton,
	Ingo Molnar, Pavel Tatashin, Stephen Rothwell, Andrew Banman,
	mike.travis, Oscar Salvador, Dave Hansen, Michal Hocko

Memory onlining should always be handled by user space, because only user
space knows which use cases it wants to satisfy. For example, memory might
be onlined to the MOVABLE zone even if it can never be removed from the
system, to make the use of huge pages more reliable.

However, to implement such rules (especially default rules in
distributions), user space needs more information about the memory that
was added.

E.g. on x86 we want to online memory provided by balloon devices (e.g.
XEN, Hyper-V), which will not be unplugged by offlining the whole block,
differently than ordinary DIMMs, which might eventually be unplugged by
offlining the whole block. This might also become relevant for other
architectures.

Also, udev rules currently check whether they are running on s390x and,
if so, treat all added memory blocks as standby memory (-> don't online
automatically). As soon as we support other memory hotplug mechanisms
(e.g. virtio-mem), these checks would have to become more involved (e.g.
also check whether running under KVM) and would eventually still be wrong
(e.g. if KVM ever supports standby memory, we are doomed).

I decided to allow specifying the type of memory that is getting added
to the system. Let's start with two types, BOOT and UNSPECIFIED, to get the
basic infrastructure running. We'll introduce and use further types in
follow-up patches. For now, we temporarily classify any hotplugged memory
as UNSPECIFIED (a type that will be dropped later on).

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
 include/linux/memory.h | 27 +++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0c290f86ab20..17f2985c07c5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct memory_block *mem = to_memory_block(dev);
+	ssize_t len = 0;
+
+	switch (mem->type) {
+	case MEMORY_BLOCK_UNSPECIFIED:
+		len = sprintf(buf, "unspecified\n");
+		break;
+	case MEMORY_BLOCK_BOOT:
+		len = sprintf(buf, "boot\n");
+		break;
+	default:
+		len = sprintf(buf, "ERROR-UNKNOWN-%d\n",
+				mem->type);
+		WARN_ON(1);
+		break;
+	}
+
+	return len;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 		unsigned long nr_pages, int online_type,
@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
 static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
 static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
+static DEVICE_ATTR_RO(type);
 
 /*
  * Block size attribute stuff
@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_phys_device.attr,
 	&dev_attr_removable.attr,
+	&dev_attr_type.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
 #endif
@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
 }
 
 static int init_memory_block(struct memory_block **memory,
-			     struct mem_section *section, unsigned long state)
+			     struct mem_section *section, unsigned long state,
+			     int type)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
 	int scn_nr;
 	int ret = 0;
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
 	mem->state = state;
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
+	mem->type = type;
 
 	ret = register_memory(mem);
 
@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
 
 	if (section_count == 0)
 		return 0;
-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
+				MEMORY_BLOCK_BOOT);
 	if (ret)
 		return ret;
 	mem->section_count = section_count;
@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 		mem->section_count++;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
+					MEMORY_BLOCK_UNSPECIFIED);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index d75ec88ca09d..06268e96e0da 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -34,12 +34,39 @@ struct memory_block {
 	int (*phys_callback)(struct memory_block *);
 	struct device dev;
 	int nid;			/* NID for this memory block */
+	int type;			/* type of this memory block */
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
 unsigned long memory_block_size_bytes(void);
 int set_memory_block_size_order(unsigned int order);
 
+/*
+ * Memory block types allow user space to formulate rules on whether and how
+ * to online memory blocks. The types are exposed to user space as text
+ * strings in sysfs.
+ *
+ * MEMORY_BLOCK_NONE:
+ *  No memory block is to be created (e.g. device memory). Not exposed to
+ *  user space.
+ *
+ * MEMORY_BLOCK_UNSPECIFIED:
+ *  The type of memory block was not further specified when adding the
+ *  memory block.
+ *
+ * MEMORY_BLOCK_BOOT:
+ *  This memory block was added during boot by the basic system. No
+ *  specific device driver takes care of this memory block. This memory
+ *  block type is onlined automatically by the kernel during boot and might
+ *  later be managed by a different device driver, in which case the type
+ *  might change.
+ */
+enum {
+	MEMORY_BLOCK_NONE = 0,
+	MEMORY_BLOCK_UNSPECIFIED,
+	MEMORY_BLOCK_BOOT,
+};
+
 /* These states are exposed to userspace as text strings in sysfs */
 #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
 #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
-- 
2.17.2
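For reviewers, here is a user-space model of the two behaviours this patch adds: the type-to-string mapping of type_show() and the MEMORY_BLOCK_NONE check in init_memory_block(). It is a sketch that compiles outside the kernel, not kernel code. format_type() and check_block_type() are hypothetical names, and snprintf() stands in for sprintf() into the sysfs page buffer. The named enum is used only for readability; the patch itself uses an anonymous enum and an int field.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Same values as the anonymous enum added to include/linux/memory.h. */
enum memory_block_type {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Models type_show(): map a type to the string exposed in sysfs. */
static int format_type(int type, char *buf, size_t len)
{
	assert(buf != NULL);

	switch (type) {
	case MEMORY_BLOCK_UNSPECIFIED:
		return snprintf(buf, len, "unspecified\n");
	case MEMORY_BLOCK_BOOT:
		return snprintf(buf, len, "boot\n");
	default:
		/* report the offending type value, not the block state */
		return snprintf(buf, len, "ERROR-UNKNOWN-%d\n", type);
	}
}

/* Models the new check in init_memory_block(): NONE gets no device. */
static int check_block_type(int type)
{
	return type == MEMORY_BLOCK_NONE ? -EINVAL : 0;
}
```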

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Morton,
	Ingo Molnar, Pavel Tatashin, Stephen Rothwell, Andrew Banman,
	mike.travis, Oscar Salvador, Dave Hansen, Michal Hocko

Memory onlining should always be handled by user space, because only user
space knows which use cases it wants to satisfy. E.g. memory might be
onlined to the MOVABLE zone even if it can never be removed from the
system, e.g. to make usage of huge pages more reliable.

However to implement such rules (especially default rules in distributions)
we need more information about the memory that was added in user space.

E.g. on x86 we want to online memory provided by balloon devices (e.g.
XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
block) than ordinary DIMMs (-> might eventually be unplugged by offlining
the whole block). This might also become relevat for other architectures.

Also, udev rules right now check if running on s390x and treat all added
memory blocks as standby memory (-> don't online automatically). As soon as
we support other memory hotplug mechanism (e.g. virtio-mem) checks would
have to get more involved (e.g. also check if under KVM) but eventually
also wrong (e.g. if KVM ever supports standby memory we are doomed).

I decided to allow to specify the type of memory that is getting added
to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
basic infrastructure running. We'll introduce and use further types in
follow-up patches. For now we classify any hotplugged memory temporarily
as as UNSPECIFIED (which will eventually be dropped later on).

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
 include/linux/memory.h | 27 +++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0c290f86ab20..17f2985c07c5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct memory_block *mem = to_memory_block(dev);
+	ssize_t len = 0;
+
+	switch (mem->type) {
+	case MEMORY_BLOCK_UNSPECIFIED:
+		len = sprintf(buf, "unspecified\n");
+		break;
+	case MEMORY_BLOCK_BOOT:
+		len = sprintf(buf, "boot\n");
+		break;
+	default:
+		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
+				mem->state);
+		WARN_ON(1);
+		break;
+	}
+
+	return len;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 		unsigned long nr_pages, int online_type,
@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
 static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
 static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
+static DEVICE_ATTR_RO(type);
 
 /*
  * Block size attribute stuff
@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_phys_device.attr,
 	&dev_attr_removable.attr,
+	&dev_attr_type.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
 #endif
@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
 }
 
 static int init_memory_block(struct memory_block **memory,
-			     struct mem_section *section, unsigned long state)
+			     struct mem_section *section, unsigned long state,
+			     int type)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
 	int scn_nr;
 	int ret = 0;
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
 	mem->state = state;
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
+	mem->type = type;
 
 	ret = register_memory(mem);
 
@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
 
 	if (section_count == 0)
 		return 0;
-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
+				MEMORY_BLOCK_BOOT);
 	if (ret)
 		return ret;
 	mem->section_count = section_count;
@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 		mem->section_count++;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
+					MEMORY_BLOCK_UNSPECIFIED);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index d75ec88ca09d..06268e96e0da 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -34,12 +34,39 @@ struct memory_block {
 	int (*phys_callback)(struct memory_block *);
 	struct device dev;
 	int nid;			/* NID for this memory block */
+	int type;			/* type of this memory block */
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
 unsigned long memory_block_size_bytes(void);
 int set_memory_block_size_order(unsigned int order);
 
+/*
+ * Memory block types allow user space to formulate rules if and how to
+ * online memory blocks. The types are exposed to user space as text
+ * strings in sysfs.
+ *
+ * MEMORY_BLOCK_NONE:
+ *  No memory block is to be created (e.g. device memory). Not exposed to
+ *  user space.
+ *
+ * MEMORY_BLOCK_UNSPECIFIED:
+ *  The type of memory block was not further specified when adding the
+ *  memory block.
+ *
+ * MEMORY_BLOCK_BOOT:
+ *  This memory block was added during boot by the basic system. No
+ *  specific device driver takes care of this memory block. This memory
+ *  block type is onlined automatically by the kernel during boot and might
+ *  later be managed by a different device driver, in which case the type
+ *  might change.
+ */
+enum {
+	MEMORY_BLOCK_NONE = 0,
+	MEMORY_BLOCK_UNSPECIFIED,
+	MEMORY_BLOCK_BOOT,
+};
+
 /* These states are exposed to userspace as text strings in sysfs */
 #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
 #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Morton,
	Ingo Molnar, Pavel Tatashin, Stephen Rothwell, Andrew Banman,
	mike.travis, Oscar Salvador, Dave Hansen, Michal Hocko,
	Michal Suchánek, Vitaly Kuznetsov, Dan Williams,
	Pavel Tatashin, Martin Schwidefsky, Heiko Carstens

Memory onlining should always be handled by user space, because only user
space knows which use cases it wants to satisfy. E.g. memory might be
onlined to the MOVABLE zone even if it can never be removed from the
system, e.g. to make usage of huge pages more reliable.

However to implement such rules (especially default rules in distributions)
we need more information about the memory that was added in user space.

E.g. on x86 we want to online memory provided by balloon devices (e.g.
XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
block) than ordinary DIMMs (-> might eventually be unplugged by offlining
the whole block). This might also become relevat for other architectures.

Also, udev rules right now check if running on s390x and treat all added
memory blocks as standby memory (-> don't online automatically). As soon as
we support other memory hotplug mechanism (e.g. virtio-mem) checks would
have to get more involved (e.g. also check if under KVM) but eventually
also wrong (e.g. if KVM ever supports standby memory we are doomed).

I decided to allow specifying the type of memory that is being added
to the system. Let's start with two types, BOOT and UNSPECIFIED, to get the
basic infrastructure running. We'll introduce and use further types in
follow-up patches. For now, we temporarily classify any hotplugged memory
as UNSPECIFIED (a type that will eventually be dropped).

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
 include/linux/memory.h | 27 +++++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 0c290f86ab20..17f2985c07c5 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
 	return sprintf(buf, "%d\n", mem->phys_device);
 }
 
+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct memory_block *mem = to_memory_block(dev);
+	ssize_t len = 0;
+
+	switch (mem->type) {
+	case MEMORY_BLOCK_UNSPECIFIED:
+		len = sprintf(buf, "unspecified\n");
+		break;
+	case MEMORY_BLOCK_BOOT:
+		len = sprintf(buf, "boot\n");
+		break;
+	default:
+		len = sprintf(buf, "ERROR-UNKNOWN-%d\n",
+				mem->type);
+		WARN_ON(1);
+		break;
+	}
+
+	return len;
+}
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
 		unsigned long nr_pages, int online_type,
@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
 static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
 static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
 static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
+static DEVICE_ATTR_RO(type);
 
 /*
  * Block size attribute stuff
@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_state.attr,
 	&dev_attr_phys_device.attr,
 	&dev_attr_removable.attr,
+	&dev_attr_type.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
 #endif
@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
 }
 
 static int init_memory_block(struct memory_block **memory,
-			     struct mem_section *section, unsigned long state)
+			     struct mem_section *section, unsigned long state,
+			     int type)
 {
 	struct memory_block *mem;
 	unsigned long start_pfn;
 	int scn_nr;
 	int ret = 0;
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
 	if (!mem)
 		return -ENOMEM;
@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
 	mem->state = state;
 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
+	mem->type = type;
 
 	ret = register_memory(mem);
 
@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
 
 	if (section_count == 0)
 		return 0;
-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
+				MEMORY_BLOCK_BOOT);
 	if (ret)
 		return ret;
 	mem->section_count = section_count;
@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 		mem->section_count++;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
+					MEMORY_BLOCK_UNSPECIFIED);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index d75ec88ca09d..06268e96e0da 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -34,12 +34,39 @@ struct memory_block {
 	int (*phys_callback)(struct memory_block *);
 	struct device dev;
 	int nid;			/* NID for this memory block */
+	int type;			/* type of this memory block */
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
 unsigned long memory_block_size_bytes(void);
 int set_memory_block_size_order(unsigned int order);
 
+/*
+ * Memory block types allow user space to formulate rules on whether and
+ * how to online memory blocks. The types are exposed to user space as text
+ * strings in sysfs.
+ *
+ * MEMORY_BLOCK_NONE:
+ *  No memory block is to be created (e.g. device memory). Not exposed to
+ *  user space.
+ *
+ * MEMORY_BLOCK_UNSPECIFIED:
+ *  The type of memory block was not further specified when adding the
+ *  memory block.
+ *
+ * MEMORY_BLOCK_BOOT:
+ *  This memory block was added during boot by the basic system. No
+ *  specific device driver takes care of this memory block. This memory
+ *  block type is onlined automatically by the kernel during boot and might
+ *  later be managed by a different device driver, in which case the type
+ *  might change.
+ */
+enum {
+	MEMORY_BLOCK_NONE = 0,
+	MEMORY_BLOCK_UNSPECIFIED,
+	MEMORY_BLOCK_BOOT,
+};
+
 /* These states are exposed to userspace as text strings in sysfs */
 #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
 #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
-- 
2.17.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread
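
With the new attribute, a user-space onlining agent can branch on the type string. It no longer has to probe the architecture or hypervisor. Below is a minimal sketch of such an agent under a hypothetical policy. Blocks of type "boot" are left alone. Blocks of type "unspecified" are onlined by writing "online" to the block's state file, which lets the kernel pick the default zone. Unknown types are left to the administrator. The helper names are illustrative only. They are not part of this patch or of any existing udev rule.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical policy: map the sysfs "type" string of a memory block to the
 * string an agent would write to .../memoryN/state, or NULL to leave the
 * block untouched.
 */
static const char *online_action_for_type(const char *type)
{
	assert(type != NULL);
	if (strcmp(type, "boot") == 0)
		return NULL;		/* already onlined by the kernel during boot */
	if (strcmp(type, "unspecified") == 0)
		return "online";	/* no further information: kernel picks the zone */
	return NULL;			/* unknown (future) type: leave it to the admin */
}

/* Read a sysfs "type" file into @buf and strip the trailing newline. */
static int read_memory_block_type(const char *path, char *buf, size_t size)
{
	FILE *f;

	assert(path != NULL && buf != NULL && size > 0);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)size, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

An agent would call read_memory_block_type() on /sys/devices/system/memory/memoryN/type. If online_action_for_type() returns a string, the agent writes it to that block's state file. Once the follow-up patches add more types, the policy would gain one branch per type (e.g. DIMM, balloon, or standby memory). It would not need any architecture checks.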

* [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
  2018-11-30 17:59 ` David Hildenbrand
@ 2018-11-30 17:59   ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Oscar Salvador, Rich Felker, linux-ia64, linux-sh,
	Peter Zijlstra, Benjamin Herrenschmidt, Dave Hansen,
	Heiko Carstens, Wei Yang, Michal Hocko, Paul Mackerras,
	H. Peter Anvin, Thomas Gleixner, Rafael J. Wysocki, linux-s390,
	Dave Jiang, Yoshinori Sato, Michael Ellerman, x86,
	Matthew Wilcox, linux-acpi, Ingo Molnar, xen-devel, Rob Herring,
	Fenghua Yu

Instead of "bool want_memblock", let's pass a memory block type. Pass
"MEMORY_BLOCK_NONE" for device memory and, for now,
"MEMORY_BLOCK_UNSPECIFIED" for anything else. No functional change.
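
The "no functional change" claim holds if every caller maps true to MEMORY_BLOCK_UNSPECIFIED and false to MEMORY_BLOCK_NONE. It also requires the hotplug core to create a memory block exactly when the type is not MEMORY_BLOCK_NONE. Below is a minimal sketch of that equivalence. Both helpers are hypothetical. The form of the converted check in mm/memory_hotplug.c is an assumption, since that hunk is not quoted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the enum introduced in patch 1/4. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Patch 1/4 pins MEMORY_BLOCK_NONE to 0; the mapping below relies on it only for clarity. */
static_assert(MEMORY_BLOCK_NONE == 0, "MEMORY_BLOCK_NONE must be 0");

/* How a caller translates the old flag into the new argument. */
static int type_from_want_memblock(bool want_memblock)
{
	return want_memblock ? MEMORY_BLOCK_UNSPECIFIED : MEMORY_BLOCK_NONE;
}

/* Assumed shape of the converted check: create a block unless type is NONE. */
static bool should_create_memory_block(int type)
{
	return type != MEMORY_BLOCK_NONE;
}
```

For both values of want_memblock, should_create_memory_block(type_from_want_memblock(x)) == x. This is exactly what "no functional change" requires.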

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: "Jonathan Neuschäfer" <j.neuschaefer@gmx.net>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Jan H. Schönherr" <jschoenh@amazon.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/ia64/mm/init.c            |  4 ++--
 arch/powerpc/mm/mem.c          |  4 ++--
 arch/s390/mm/init.c            |  4 ++--
 arch/sh/mm/init.c              |  4 ++--
 arch/x86/mm/init_32.c          |  4 ++--
 arch/x86/mm/init_64.c          |  8 ++++----
 drivers/base/memory.c          | 11 +++++++----
 include/linux/memory.h         |  2 +-
 include/linux/memory_hotplug.h | 12 ++++++------
 kernel/memremap.c              |  6 ++++--
 mm/memory_hotplug.c            | 16 ++++++++--------
 11 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 904fe55e10fc..408635d2902f 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -646,13 +646,13 @@ mem_init (void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
 		       __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index b3c9ee5c4f78..e394637da270 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 }
 
 int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+			      int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
 	}
 	flush_inval_dcache_range(start, start + size);
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 3e82f66d5c61..ba2c56328e6d 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -225,7 +225,7 @@ device_initcall(s390_cma_mem_init);
 #endif /* CONFIG_CMA */
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, altmap, type);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 1a483a008872..5fbb8724e0f2 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
 	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	if (unlikely(ret))
 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 0b8c7b0033d2..41e409b29d2b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -851,12 +851,12 @@ void __init mem_init(void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f80d98381a97..5b4f3dcd44cf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -783,11 +783,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 }
 
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock)
+	      struct vmem_altmap *altmap, int type)
 {
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	WARN_ON_ONCE(ret);
 
 	/* update max_pfn, max_low_pfn and high_memory */
@@ -798,14 +798,14 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 }
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
 	init_memory_mapping(start, start + size);
 
-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #define PAGE_INUSE 0xFD
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 17f2985c07c5..c42300082c88 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -741,7 +741,7 @@ static int add_memory_block(int base_section_nr)
  * need an interface for the VM to add new memory regions,
  * but without onlining it.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int hotplug_memory_register(int nid, struct mem_section *section, int type)
 {
 	int ret = 0;
 	struct memory_block *mem;
@@ -750,11 +750,14 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 
 	mem = find_memory_block(section);
 	if (mem) {
-		mem->section_count++;
+		/* make sure the type matches */
+		if (mem->type == type)
+			mem->section_count++;
+		else
+			ret = -EINVAL;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE,
-					MEMORY_BLOCK_UNSPECIFIED);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE, type);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 06268e96e0da..9f39ef41e6d2 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -138,7 +138,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int hotplug_memory_register(int nid, struct mem_section *section, int type);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern int unregister_memory_section(int nid, struct mem_section *);
 #endif
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 5493d3fa0c7f..667a37aa9a3c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -117,18 +117,18 @@ extern void shrink_zone(struct zone *zone, unsigned long start_pfn,
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+		       struct vmem_altmap *altmap, int type);
 
 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+			    unsigned long nr_pages, struct vmem_altmap *altmap,
+			    int type)
 {
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 #else /* ARCH_HAS_ADD_PAGES */
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+	      struct vmem_altmap *altmap, int type);
 #endif /* ARCH_HAS_ADD_PAGES */
 
 #ifdef CONFIG_NUMA
@@ -330,7 +330,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
 extern int arch_add_memory(int nid, u64 start, u64 size,
-		struct vmem_altmap *altmap, bool want_memblock);
+			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 66cbf334203b..422e4e779208 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -4,6 +4,7 @@
 #include <linux/io.h>
 #include <linux/kasan.h>
 #include <linux/memory_hotplug.h>
+#include <linux/memory.h>
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/swap.h>
@@ -215,7 +216,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		error = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
+				  align_size >> PAGE_SHIFT, NULL,
+				  MEMORY_BLOCK_NONE);
 	} else {
 		error = kasan_add_zero_shadow(__va(align_start), align_size);
 		if (error) {
@@ -224,7 +226,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		}
 
 		error = arch_add_memory(nid, align_start, align_size, altmap,
-				false);
+					MEMORY_BLOCK_NONE);
 	}
 
 	if (!error) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 16c600771298..7246faa44488 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -246,7 +246,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+				   struct vmem_altmap *altmap, int type)
 {
 	int ret;
 
@@ -257,10 +257,11 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (ret < 0)
 		return ret;
 
-	if (!want_memblock)
+	if (type == MEMORY_BLOCK_NONE)
 		return 0;
 
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn),
+				       type);
 }
 
 /*
@@ -270,8 +271,8 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * add the new pages.
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		      unsigned long nr_pages, struct vmem_altmap *altmap,
+		      int type)
 {
 	unsigned long i;
 	int err = 0;
@@ -295,8 +296,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				want_memblock);
+		err = __add_section(nid, section_nr_to_pfn(i), altmap, type);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1100,7 +1100,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, true);
+	ret = arch_add_memory(nid, start, size, NULL, MEMORY_BLOCK_UNSPECIFIED);
 	if (ret < 0)
 		goto error;
 
-- 
2.17.2

* [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Oscar Salvador, Rich Felker, linux-ia64, linux-sh,
	Peter Zijlstra, Dave Hansen, Heiko Carstens, Wei Yang,
	Michal Hocko, Paul Mackerras, H. Peter Anvin, Thomas Gleixner,
	Rafael J. Wysocki, linux-s390, Dave Jiang, Yoshinori Sato, x86,
	Matthew Wilcox, linux-acpi, Ingo Molnar, xen-devel, Rob Herring,
	Fenghua Yu, Jan H. Schönherr, Pavel Tatashin, Vasily Gorbik,
	Stephen Rothwell, mike.travis, Dan Williams,
	Jonathan Neuschäfer, Nicholas Piggin,
	Jérôme Glisse, Mike Rapoport, Borislav Petkov,
	Andy Lutomirski, David Hildenbrand, Joonsoo Kim, Arun KS,
	Tony Luck, Mathieu Malaterre, Greg Kroah-Hartman, linux-kernel,
	Logan Gunthorpe, Mauricio Faria de Oliveira, Martin Schwidefsky,
	devel, Andrew Morton, linuxppc-dev, Kirill A. Shutemov

Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No
functional change.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: "Jonathan Neuschäfer" <j.neuschaefer@gmx.net>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Jan H. Schönherr" <jschoenh@amazon.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/ia64/mm/init.c            |  4 ++--
 arch/powerpc/mm/mem.c          |  4 ++--
 arch/s390/mm/init.c            |  4 ++--
 arch/sh/mm/init.c              |  4 ++--
 arch/x86/mm/init_32.c          |  4 ++--
 arch/x86/mm/init_64.c          |  8 ++++----
 drivers/base/memory.c          | 11 +++++++----
 include/linux/memory.h         |  2 +-
 include/linux/memory_hotplug.h | 12 ++++++------
 kernel/memremap.c              |  6 ++++--
 mm/memory_hotplug.c            | 16 ++++++++--------
 11 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 904fe55e10fc..408635d2902f 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -646,13 +646,13 @@ mem_init (void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	if (ret)
 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
 		       __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index b3c9ee5c4f78..e394637da270 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
 }
 
 int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+			      int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
 	}
 	flush_inval_dcache_range(start, start + size);
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index 3e82f66d5c61..ba2c56328e6d 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -225,7 +225,7 @@ device_initcall(s390_cma_mem_init);
 #endif /* CONFIG_CMA */
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long size_pages = PFN_DOWN(size);
@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
 	if (rc)
 		return rc;
 
-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
+	rc = __add_pages(nid, start_pfn, size_pages, altmap, type);
 	if (rc)
 		vmem_remove_mapping(start, size);
 	return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 1a483a008872..5fbb8724e0f2 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = PFN_DOWN(start);
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 	int ret;
 
 	/* We only have ZONE_NORMAL, so this is easy.. */
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	if (unlikely(ret))
 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 0b8c7b0033d2..41e409b29d2b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -851,12 +851,12 @@ void __init mem_init(void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index f80d98381a97..5b4f3dcd44cf 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -783,11 +783,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
 }
 
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock)
+	      struct vmem_altmap *altmap, int type)
 {
 	int ret;
 
-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
 	WARN_ON_ONCE(ret);
 
 	/* update max_pfn, max_low_pfn and high_memory */
@@ -798,14 +798,14 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
 }
 
 int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
-		bool want_memblock)
+		    int type)
 {
 	unsigned long start_pfn = start >> PAGE_SHIFT;
 	unsigned long nr_pages = size >> PAGE_SHIFT;
 
 	init_memory_mapping(start, start + size);
 
-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 
 #define PAGE_INUSE 0xFD
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 17f2985c07c5..c42300082c88 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -741,7 +741,7 @@ static int add_memory_block(int base_section_nr)
  * need an interface for the VM to add new memory regions,
  * but without onlining it.
  */
-int hotplug_memory_register(int nid, struct mem_section *section)
+int hotplug_memory_register(int nid, struct mem_section *section, int type)
 {
 	int ret = 0;
 	struct memory_block *mem;
@@ -750,11 +750,14 @@ int hotplug_memory_register(int nid, struct mem_section *section)
 
 	mem = find_memory_block(section);
 	if (mem) {
-		mem->section_count++;
+		/* make sure the type matches */
+		if (mem->type == type)
+			mem->section_count++;
+		else
+			ret = -EINVAL;
 		put_device(&mem->dev);
 	} else {
-		ret = init_memory_block(&mem, section, MEM_OFFLINE,
-					MEMORY_BLOCK_UNSPECIFIED);
+		ret = init_memory_block(&mem, section, MEM_OFFLINE, type);
 		if (ret)
 			goto out;
 		mem->section_count++;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 06268e96e0da..9f39ef41e6d2 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -138,7 +138,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
 extern void unregister_memory_notifier(struct notifier_block *nb);
 extern int register_memory_isolate_notifier(struct notifier_block *nb);
 extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
-int hotplug_memory_register(int nid, struct mem_section *section);
+int hotplug_memory_register(int nid, struct mem_section *section, int type);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern int unregister_memory_section(int nid, struct mem_section *);
 #endif
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 5493d3fa0c7f..667a37aa9a3c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -117,18 +117,18 @@ extern void shrink_zone(struct zone *zone, unsigned long start_pfn,
 
 /* reasonably generic interface to expand the physical pages */
 extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+		       struct vmem_altmap *altmap, int type);
 
 #ifndef CONFIG_ARCH_HAS_ADD_PAGES
 static inline int add_pages(int nid, unsigned long start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+			    unsigned long nr_pages, struct vmem_altmap *altmap,
+			    int type)
 {
-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
 }
 #else /* ARCH_HAS_ADD_PAGES */
 int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
-		struct vmem_altmap *altmap, bool want_memblock);
+	      struct vmem_altmap *altmap, int type);
 #endif /* ARCH_HAS_ADD_PAGES */
 
 #ifdef CONFIG_NUMA
@@ -330,7 +330,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
 extern int add_memory(int nid, u64 start, u64 size);
 extern int add_memory_resource(int nid, struct resource *resource);
 extern int arch_add_memory(int nid, u64 start, u64 size,
-		struct vmem_altmap *altmap, bool want_memblock);
+			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
 		unsigned long nr_pages, struct vmem_altmap *altmap);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
diff --git a/kernel/memremap.c b/kernel/memremap.c
index 66cbf334203b..422e4e779208 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -4,6 +4,7 @@
 #include <linux/io.h>
 #include <linux/kasan.h>
 #include <linux/memory_hotplug.h>
+#include <linux/memory.h>
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/swap.h>
@@ -215,7 +216,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 	 */
 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
 		error = add_pages(nid, align_start >> PAGE_SHIFT,
-				align_size >> PAGE_SHIFT, NULL, false);
+				  align_size >> PAGE_SHIFT, NULL,
+				  MEMORY_BLOCK_NONE);
 	} else {
 		error = kasan_add_zero_shadow(__va(align_start), align_size);
 		if (error) {
@@ -224,7 +226,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
 		}
 
 		error = arch_add_memory(nid, align_start, align_size, altmap,
-				false);
+					MEMORY_BLOCK_NONE);
 	}
 
 	if (!error) {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 16c600771298..7246faa44488 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -246,7 +246,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
 #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
 
 static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
-		struct vmem_altmap *altmap, bool want_memblock)
+				   struct vmem_altmap *altmap, int type)
 {
 	int ret;
 
@@ -257,10 +257,11 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
 	if (ret < 0)
 		return ret;
 
-	if (!want_memblock)
+	if (type == MEMORY_BLOCK_NONE)
 		return 0;
 
-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
+	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn),
+				       type);
 }
 
 /*
@@ -270,8 +271,8 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
  * add the new pages.
  */
 int __ref __add_pages(int nid, unsigned long phys_start_pfn,
-		unsigned long nr_pages, struct vmem_altmap *altmap,
-		bool want_memblock)
+		      unsigned long nr_pages, struct vmem_altmap *altmap,
+		      int type)
 {
 	unsigned long i;
 	int err = 0;
@@ -295,8 +296,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
 	}
 
 	for (i = start_sec; i <= end_sec; i++) {
-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
-				want_memblock);
+		err = __add_section(nid, section_nr_to_pfn(i), altmap, type);
 
 		/*
 		 * EEXIST is finally dealt with by ioresource collision
@@ -1100,7 +1100,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, true);
+	ret = arch_add_memory(nid, start, size, NULL, MEMORY_BLOCK_UNSPECIFIED);
 	if (ret < 0)
 		goto error;
 
-- 
2.17.2
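A minimal user-space model of the conversion in this patch, for illustration only. MEMORY_BLOCK_NONE takes the place of want_memblock == false, and an existing memory block only accepts further sections of the same type (otherwise -EINVAL, as in the hotplug_memory_register() hunk). The functions model_add_section() and model_register() and the simplified struct memory_block are hypothetical, and the numeric enum values are assumptions; the real definitions come from patch 1 in include/linux/memory.h.

```c
#include <assert.h>
#include <errno.h>

/* Assumed numeric values; the real enum lives in include/linux/memory.h. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Hypothetical, heavily simplified stand-in for struct memory_block. */
struct memory_block {
	int present;		/* has the block device been created yet? */
	int type;		/* MEMORY_BLOCK_* */
	int section_count;	/* sections registered in this block */
};

/*
 * Models hotplug_memory_register(): join an existing block only if the
 * type matches, otherwise create the block with the requested type.
 */
int model_register(struct memory_block *mem, int type)
{
	if (mem->present) {
		if (mem->type != type)
			return -EINVAL;
		mem->section_count++;
		return 0;
	}
	mem->present = 1;
	mem->type = type;
	mem->section_count = 1;
	return 0;
}

/* Models __add_section(): MEMORY_BLOCK_NONE means "no memory block device". */
int model_add_section(struct memory_block *mem, int type)
{
	if (type == MEMORY_BLOCK_NONE)
		return 0;
	return model_register(mem, type);
}
```

Adding two sections with MEMORY_BLOCK_UNSPECIFIED grows section_count to 2, while a later section with a different type is rejected with -EINVAL and leaves the block unchanged.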



* [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
@ 2018-11-30 17:59 ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Oscar Salvador, Rich Felker, linux-ia64, linux-sh,
	Peter Zijlstra, Benjamin Herrenschmidt, Dave Hansen,
	Heiko Carstens, Wei Yang, Michal Hocko, Paul Mackerras,
	H. Peter Anvin, Thomas Gleixner, Rafael J. Wysocki, linux-s390,
	Dave Jiang, Yoshinori Sato, Michael Ellerman, x86,
	Matthew Wilcox, linux-acpi, Ingo Molnar, xen-devel, Rob Herring,
	Fenghua Yu

Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No
functional change.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: "Jonathan Neuschäfer" <j.neuschaefer@gmx.net>
Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: "Jan H. Schönherr" <jschoenh@amazon.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/ia64/mm/init.c            |  4 ++--
 arch/powerpc/mm/mem.c          |  4 ++--
 arch/s390/mm/init.c            |  4 ++--
 arch/sh/mm/init.c              |  4 ++--
 arch/x86/mm/init_32.c          |  4 ++--
 arch/x86/mm/init_64.c          |  8 ++++----
 drivers/base/memory.c          | 11 +++++++----
 include/linux/memory.h         |  2 +-
 include/linux/memory_hotplug.h | 12 ++++++------
 kernel/memremap.c              |  6 ++++--
 mm/memory_hotplug.c            | 16 ++++++++--------
 11 files changed, 40 insertions(+), 35 deletions(-)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel


* [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
@ 2018-11-30 17:59   ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky

Let's introduce new types for different kinds of memory blocks and use
them in existing code. As I don't see an easy way to split this up,
do it in one patch for now.

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so this memory will correctly end up as
 "acpi" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I can see, the special handling of already present blocks
 done in the acpi case is not required here.

probed memory from user space:
 Use DIMM_UNREMOVABLE, as there is no interface to remove this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON. As simple as that :)

s390x/sclp:
 Use a dedicated type, S390X_STANDBY, as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working ... it is most
 probably just a debug feature.
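The acpi handling described above can be sketched in isolation as follows. This is a hypothetical user-space model, not kernel code: the bool parameter stands in for IS_ENABLED(CONFIG_MEMORY_HOTREMOVE), the model_* names are made up, and the numeric enum values are assumptions. Binding re-types only blocks that are still BOOT; unbinding resets the block to BOOT, mirroring acpi_bind_memblk() and acpi_unbind_memblk().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed numeric values; the real enum lives in include/linux/memory.h. */
enum {
	MEMORY_BLOCK_BOOT = 1,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
};

/* hotremove_enabled stands in for IS_ENABLED(CONFIG_MEMORY_HOTREMOVE). */
int model_dimm_type(bool hotremove_enabled)
{
	return hotremove_enabled ? MEMORY_BLOCK_DIMM
				 : MEMORY_BLOCK_DIMM_UNREMOVABLE;
}

/*
 * Models acpi_bind_memblk(): only blocks detected at boot are re-typed.
 * The test must be a comparison (==); an assignment would clobber the
 * type of every block passed in.
 */
void model_acpi_bind(int *type, bool hotremove_enabled)
{
	if (*type == MEMORY_BLOCK_BOOT)
		*type = model_dimm_type(hotremove_enabled);
}

/* Models acpi_unbind_memblk(): the block falls back to BOOT. */
void model_acpi_unbind(int *type)
{
	*type = MEMORY_BLOCK_BOOT;
}
```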

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

---

At first I tried to abstract the types quite a lot, but I think there
are subtle differences that are worth distinguishing. More details about
the types can be found in the extensive documentation.
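For quick reference, the mapping from block type to the string exposed through the sysfs "type" attribute (taken from the type_show() hunk in drivers/base/memory.c) can be modelled in user space like this. model_type_show() is hypothetical, uses snprintf() instead of sprintf(), and the numeric enum values are assumptions. In the default case the sketch prints the unknown type value; the context line in the hunk passes mem->state there, which appears to be carried over from state_show().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed numeric values; the real enum lives in include/linux/memory.h. */
enum {
	MEMORY_BLOCK_BOOT = 1,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
	MEMORY_BLOCK_BALLOON,
	MEMORY_BLOCK_BALLOON_MOVABLE,
	MEMORY_BLOCK_S390X_STANDBY,
};

/* Write the sysfs name of a block type into buf; returns snprintf()'s result. */
int model_type_show(int type, char *buf, size_t size)
{
	const char *name;

	switch (type) {
	case MEMORY_BLOCK_BOOT:
		name = "boot";
		break;
	case MEMORY_BLOCK_DIMM:
		name = "dimm";
		break;
	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
		name = "dimm-unremovable";
		break;
	case MEMORY_BLOCK_BALLOON:
		name = "balloon";
		break;
	case MEMORY_BLOCK_BALLOON_MOVABLE:
		name = "balloon-movable";
		break;
	case MEMORY_BLOCK_S390X_STANDBY:
		name = "s390x-standby";
		break;
	default:
		/* report the unknown type value itself */
		return snprintf(buf, size, "ERROR-UNKNOWN-%d\n", type);
	}
	return snprintf(buf, size, "%s\n", name);
}
```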

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might want to make use of it (virtio-mem).
I only included it to discuss the general approach; I can drop it from
this patch.
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 
 	if (ret)
 		goto out;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				 (HA_CHUNK << PAGE_SHIFT),
+				 MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block at
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests onlining this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in bigger chunks than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  This memory block is special standby memory on s390x. As long as it
+ *  is offline, no memory is allocated to the system for this memory
+ *  block. Onlining it results in memory getting allocated to the system,
+ *  and the memory usually cannot be offlined again. The memory block
+ *  will never be removed. This memory block type is usually not onlined
+ *  automatically, but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2


* [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky

Let's introduce new types for different kinds of memory blocks and use
them in existing code. As I don't see an easy way to split this up,
do it in one hunk for now.

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so this memory will correctly end up as
 "dimm" or "dimm-unremovable" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I can see, handling existing blocks as in the acpi case is
 not required.

probed memory from user space:
 Use DIMM_UNREMOVABLE, as there is no interface to get rid of this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON. As simple as that :)

s390x/sclp:
 Use a dedicated type S390X_STANDBY, as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working (it is most
 probably just a debug feature).

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

---

At first I tried to abstract the types quite a lot, but I think there
are subtle differences that are worth differentiating. More details about
the types can be found in the extensive documentation.

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might make use of it (virtio-mem). I only
included it to discuss the general approach; I can drop it from this
patch.
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 
 	if (ret)
 		goto out;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				 (HA_CHUNK << PAGE_SHIFT),
+				 MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block at
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests onlining this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in bigger chunks than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  This memory block is special standby memory on s390x. As long as it
+ *  is offline, no memory is allocated to the system for this memory
+ *  block. Onlining it results in memory getting allocated to the system,
+ *  and the memory usually cannot be offlined again. The memory block
+ *  will never be removed. This memory block type is usually not onlined
+ *  automatically, but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2


* [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky,
	Juergen Gross, Stefano Stabellini, Rashmica Gupta, Andrew Morton,
	Pavel Tatashin, Balbir Singh, Michael Neuling, Nathan Fontenot,
	YueHaibing, Vasily Gorbik, Ingo Molnar, Stephen Rothwell,
	mike.travis, Oscar Salvador, Joonsoo Kim, Mathieu Malaterre,
	Michal Hocko, Arun KS, Andrew Banman, Dave Hansen,
	Michal Suchánek, Vitaly Kuznetsov, Dan Williams

Let's introduce new types for different kinds of memory blocks and use
them in existing code. As I don't see an easy way to split this up,
do it in one hunk for now.

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so this memory will correctly end up as
 "dimm" or "dimm-unremovable" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I can see, handling existing blocks as in the acpi case is
 not required.

probed memory from user space:
 Use DIMM_UNREMOVABLE, as there is no interface to get rid of this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON. As simple as that :)

s390x/sclp:
 Use a dedicated type S390X_STANDBY, as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working (it is most
 probably just a debug feature).

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

---

At first I tried to abstract the types quite a lot, but I think there
are subtle differences that are worth differentiating. More details about
the types can be found in the extensive documentation.

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might make use of it (virtio-mem). I only
included it to discuss the general approach; I can drop it from this
patch.
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 
 	if (ret)
 		goto out;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				 (HA_CHUNK << PAGE_SHIFT),
+				 MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block on
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests onlining this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in chunks bigger than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  The memory block is special standby memory on s390x. As long as it is
+ *  offline, no memory will be allocated to the system for this memory
+ *  block. Onlining it will result in memory getting allocated to the
+ *  system, and the memory usually cannot be offlined again. The memory
+ *  block will never be removed. This memory type is usually not onlined
+ *  automatically but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2



* [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, linux-ia64, linux-sh, Dave Hansen, Heiko Carstens,
	Vitaly Kuznetsov, Paul Mackerras, Rashmica Gupta,
	K. Y. Srinivasan, Boris Ostrovsky, linux-s390, Michael Neuling,
	Stefano Stabellini, Stephen Hemminger, x86, YueHaibing,
	Ingo Molnar, linux-acpi, xen-devel, Michal Suchánek,
	Len Brown, Pavel Tatashin, Vasily Gorbik, Stephen Rothwell,
	mike.travis, Haiyang Zhang, Dan Williams, Nathan Fontenot,
	David Hildenbrand, Joonsoo Kim, Arun KS, Oscar Salvador,
	Juergen Gross, Andrew Banman, Mathieu Malaterre,
	Greg Kroah-Hartman, Rafael J. Wysocki, linux-kernel,
	Martin Schwidefsky, devel, Andrew Morton, linuxppc-dev

Let's introduce new types for different kinds of memory blocks and use
them in existing code. As I don't see an easy way to split this up,
do it in one hunk for now.

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so this memory will correctly end up as
 "dimm" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I can see, the special handling of already present blocks done
 in the acpi case is not required.

probed memory from user space:
 Use DIMM_UNREMOVABLE as there is no interface to get rid of this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON. As simple as that :)

s390x/sclp:
 Use a dedicated type S390X_STANDBY as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working ... it is most
 probably just a debug feature.
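To illustrate the kind of user-space rule these types are meant to enable, here is a minimal sketch, assuming the type strings printed by type_show() in this patch and the "usually onlined to" notes in include/linux/memory.h. The function name suggested_online_action() is hypothetical; the returned strings are values one might write to the block's "state" sysfs attribute, and NULL means "take no action" (already onlined at boot, s390x standby left to the administrator, or an unknown type).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space policy helper: map the string read from
 * /sys/devices/system/memory/memoryN/type to a value one might write to
 * /sys/devices/system/memory/memoryN/state. The zone choices follow the
 * "usually onlined to" notes in include/linux/memory.h; a real policy may
 * differ (e.g. to avoid zone imbalances). NULL means "take no action".
 */
const char *suggested_online_action(const char *type)
{
	static const struct {
		const char *type;
		const char *action;
	} rules[] = {
		{ "boot",             NULL },             /* onlined by the kernel at boot */
		{ "dimm",             "online_movable" }, /* keep the DIMM unpluggable */
		{ "dimm-unremovable", "online_kernel" },  /* NORMAL zone */
		{ "balloon",          "online_kernel" },  /* NORMAL zone */
		{ "balloon-movable",  "online_movable" }, /* MOVABLE zone */
		{ "s390x-standby",    NULL },             /* left to the administrator */
	};
	size_t i;

	for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
		if (strcmp(type, rules[i].type) == 0)
			return rules[i].action;
	return NULL; /* unknown type: do nothing */
}
```

A table keeps each rule on one line, which makes it easy to diverge from the defaults for special use cases.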

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

---

At first I tried to abstract the types quite a lot, but I think there
are subtle differences that are worth differentiating. More details about
the types can be found in the extensive documentation.

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might make use of it (virtio-mem). I only
included it to discuss the general approach and can drop it from
this patch.
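On the user-space side, consuming the new attribute boils down to reading one short line from sysfs. A minimal sketch, assuming the per-block "type" attribute added by this series; the helper name read_memory_block_type() is hypothetical. It strips the trailing newline that type_show() appends.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a sysfs attribute such as
 * /sys/devices/system/memory/memory0/type into buf (len > 0) and strip
 * the trailing newline. Returns 0 on success, -1 on error.
 */
int read_memory_block_type(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* drop the newline, if any */
	return 0;
}
```

For example, read_memory_block_type("/sys/devices/system/memory/memory0/type", type, sizeof(type)) would leave "boot" in type for memory present at boot.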
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 
 	if (ret)
 		goto out;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				 (HA_CHUNK << PAGE_SHIFT),
+				 MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block on
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests onlining this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in chunks bigger than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  The memory block is special standby memory on s390x. As long as it is
+ *  offline, no memory will be allocated to the system for this memory
+ *  block. Onlining it will result in memory getting allocated to the
+ *  system, and the memory usually cannot be offlined again. The memory
+ *  block will never be removed. This memory type is usually not onlined
+ *  automatically but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2



* [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
@ 2018-11-30 17:59 ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, linux-ia64, linux-sh, Benjamin Herrenschmidt,
	Balbir Singh, Dave Hansen, Heiko Carstens, Vitaly Kuznetsov,
	Paul Mackerras, Rashmica Gupta, K. Y. Srinivasan,
	Boris Ostrovsky, linux-s390, Michael Neuling, Stefano Stabellini,
	Stephen Hemminger, Michael Ellerman, x86, YueHaibing,
	Ingo Molnar, linux-acpi, xen-devel, Michal Suchánek,
	Len Brown

Let's introduce new types for different kinds of memory blocks and use
them in existing code. As I don't see an easy way to split this up,
do it in one hunk for now.

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so this memory will correctly end up as
 "dimm" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I can see, the special handling of already present blocks done
 in the acpi case is not required.
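The acpi and pseries hunks make the same DIMM vs. DIMM_UNREMOVABLE decision based on IS_ENABLED(CONFIG_MEMORY_HOTREMOVE). A minimal sketch of a shared helper, assuming the enum order added to include/linux/memory.h by this patch; the name dimm_memory_block_type() is hypothetical, and the kernel-only IS_ENABLED() check is modelled as a bool parameter so the sketch builds in user space.

```c
#include <stdbool.h>

/* Mirrors the memory block types added to include/linux/memory.h. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
	MEMORY_BLOCK_BALLOON,
	MEMORY_BLOCK_BALLOON_MOVABLE,
	MEMORY_BLOCK_S390X_STANDBY,
};

/*
 * Hypothetical helper: in the kernel, hotremove_supported would simply be
 * IS_ENABLED(CONFIG_MEMORY_HOTREMOVE).
 */
static inline int dimm_memory_block_type(bool hotremove_supported)
{
	return hotremove_supported ? MEMORY_BLOCK_DIMM
				   : MEMORY_BLOCK_DIMM_UNREMOVABLE;
}
```

With such a helper, dlpar_add_lmb(), acpi_memory_enable_device() and acpi_bind_memblk() could each use a single expression instead of repeating the if/else.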

probed memory from user space:
 Use DIMM_UNREMOVABLE as there is no interface to get rid of this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON. As simple as that :)

s390x/sclp:
 Use a dedicated type S390X_STANDBY as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working ... it is most
 probably just a debug feature.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

---

At first I tried to abstract the types quite a lot, but I think there
are subtle differences that are worth distinguishing. More details about
the types can be found in the extensive documentation.

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might want to make use of it (virtio-mem).
I only included it to discuss the general approach; I can drop it from
this patch.
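To show how user space might consume the new attribute, here is a hypothetical helper (not part of this series) that maps the strings printed by type_show() to a value a udev rule or daemon could write to /sys/devices/system/memory/memoryN/state. The zone choices simply follow the "usually onlined to" notes in the documentation; the policy table itself is an assumption, and real setups may want different rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: map a memory block "type" string (without the
 * trailing newline) to the value to write to the block's "state"
 * attribute. NULL means "do not online automatically". */
static const char *advised_online_action(const char *type)
{
	static const struct {
		const char *type;
		const char *action;
	} policy[] = {
		{ "dimm",             "online_movable" }, /* keep unplug possible */
		{ "dimm-unremovable", "online_kernel" },  /* NORMAL zone */
		{ "balloon",          "online_kernel" },  /* NORMAL zone */
		{ "balloon-movable",  "online_movable" }, /* MOVABLE zone */
		{ "s390x-standby",    NULL },             /* left to the admin */
		{ "boot",             NULL },             /* onlined at boot */
	};

	for (size_t i = 0; i < sizeof(policy) / sizeof(policy[0]); i++)
		if (strcmp(type, policy[i].type) == 0)
			return policy[i].action;

	return NULL; /* unknown type: be conservative */
}
```

The sysfs value ends in a newline, so a caller would strip it before the lookup.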
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 
 	if (ret)
 		goto out;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				 (HA_CHUNK << PAGE_SHIFT),
+				 MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block on
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests onlining this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in bigger chunks than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  The memory block is special standby memory on s390x. As long as it
+ *  is offline, no memory will be allocated to the system for this memory
+ *  block. Onlining memory will result in memory getting allocated to the
+ *  system, and the memory usually cannot be offlined again. The memory block
+ *  will never be removed. This memory type is usually not onlined
+ *  automatically but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
@ 2018-11-30 17:59   ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky

Let's introduce new types for different kinds of memory blocks and use
them in existing code. As I don't see an easy way to split this up,
do it in one hunk for now.
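As a sketch of how the new parameter flows through the API, here is a user-space model of the check this patch adds to add_memory_resource(): MEMORY_BLOCK_NONE is refused with -EINVAL and every other type is handed to arch_add_memory() unchanged. model_add_memory_resource(), model_arch_add_memory() and last_added_type are hypothetical names used only for illustration.

```c
#include <errno.h>

/* Copy of the block types added to include/linux/memory.h by this patch. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
	MEMORY_BLOCK_BALLOON,
	MEMORY_BLOCK_BALLOON_MOVABLE,
	MEMORY_BLOCK_S390X_STANDBY,
};

/* Records the type the (hypothetical) arch hook was called with. */
static int last_added_type = -1;

/* Hypothetical stand-in for arch_add_memory(). */
static int model_arch_add_memory(int type)
{
	last_added_type = type;
	return 0;
}

/* Models the new prologue of add_memory_resource(): NONE means "create
 * no memory block" and makes no sense for hotplugged memory. */
static int model_add_memory_resource(int type)
{
	if (type == MEMORY_BLOCK_NONE)
		return -EINVAL;

	return model_arch_add_memory(type);
}
```

The check has to be a comparison (==): written as an assignment, the condition would store 0 in type, never be true, and pass MEMORY_BLOCK_NONE down to the arch code.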

acpi:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 Properly change the type when trying to add memory that was already
 detected and used during boot (so that this memory correctly ends up as
 "dimm" or "dimm-unremovable" in user space).

pseries:
 Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
 As far as I can see, the special handling of already present blocks
 needed in the acpi case is not required here.

probed memory from user space:
 Use DIMM_UNREMOVABLE, as there is no interface to remove this memory
 again.

hv_balloon,xen/balloon:
 Use BALLOON. As simple as that :)

s390x/sclp:
 Use a dedicated type, S390X_STANDBY, as this type of memory and its
 semantics are very s390x specific.

powernv/memtrace:
 Only allow BOOT memory to be used for memtrace. I consider this code
 dangerous in general, but we have to keep it working ... it is most
 probably just a debug feature.
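The memtrace restriction can be modelled in user space as follows. This is a sketch under assumptions: model_walk() is a simplified, hypothetical stand-in for walk_memory_range() that stops at the first non-zero callback result, and the MEM_OFFLINE/MEM_ONLINE values here are illustrative rather than the kernel's.

```c
#include <stddef.h>

/* Subset of the patch's block types, with the values from its enum. */
enum { MEMORY_BLOCK_BOOT = 2, MEMORY_BLOCK_DIMM = 3 };
/* Illustrative states; the real values live in include/linux/memory.h. */
enum { MEM_OFFLINE, MEM_ONLINE };

struct model_memory_block {
	int type;
	int state;
};

/* Same rule as check_memblock_boot_and_online(): reject (-1) any block
 * that is not boot memory or is not online. */
static int check_boot_and_online(const struct model_memory_block *mem)
{
	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
		return -1;
	return 0;
}

/* Simplified walk: return the first non-zero callback result, so one
 * unsuitable block vetoes the whole range. */
static int model_walk(const struct model_memory_block *blocks, size_t n,
		      int (*func)(const struct model_memory_block *))
{
	for (size_t i = 0; i < n; i++) {
		int ret = func(&blocks[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

A single block in the range that was hotplugged (e.g. a DIMM block) or is offline is enough to make memtrace refuse the whole range.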

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Arun KS <arunks@codeaurora.org>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

---

At first I tried to abstract the types quite a lot, but I think there
are subtle differences that are worth distinguishing. More details about
the types can be found in the extensive documentation.

It is worth noting that BALLOON_MOVABLE has no user yet, but I have
something in mind that might want to make use of it (virtio-mem).
I only included it to discuss the general approach; I can drop it from
this patch.
---
 arch/powerpc/platforms/powernv/memtrace.c     |  9 ++--
 .../platforms/pseries/hotplug-memory.c        |  7 ++-
 drivers/acpi/acpi_memhotplug.c                | 16 ++++++-
 drivers/base/memory.c                         | 18 ++++++-
 drivers/hv/hv_balloon.c                       |  3 +-
 drivers/s390/char/sclp_cmd.c                  |  3 +-
 drivers/xen/balloon.c                         |  2 +-
 include/linux/memory.h                        | 47 ++++++++++++++++++-
 include/linux/memory_hotplug.h                |  6 +--
 mm/memory_hotplug.c                           | 15 +++---
 10 files changed, 104 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 248a38ad25c7..5d08db87091e 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -54,9 +54,9 @@ static const struct file_operations memtrace_fops = {
 	.open	= simple_open,
 };
 
-static int check_memblock_online(struct memory_block *mem, void *arg)
+static int check_memblock_boot_and_online(struct memory_block *mem, void *arg)
 {
-	if (mem->state != MEM_ONLINE)
+	if (mem->type != MEMORY_BLOCK_BOOT || mem->state != MEM_ONLINE)
 		return -1;
 
 	return 0;
@@ -77,7 +77,7 @@ static bool memtrace_offline_pages(u32 nid, u64 start_pfn, u64 nr_pages)
 	u64 end_pfn = start_pfn + nr_pages - 1;
 
 	if (walk_memory_range(start_pfn, end_pfn, NULL,
-	    check_memblock_online))
+	    check_memblock_boot_and_online))
 		return false;
 
 	walk_memory_range(start_pfn, end_pfn, (void *)MEM_GOING_OFFLINE,
@@ -233,7 +233,8 @@ static int memtrace_online(void)
 			ent->mem = 0;
 		}
 
-		if (add_memory(ent->nid, ent->start, ent->size)) {
+		if (add_memory(ent->nid, ent->start, ent->size,
+			       MEMORY_BLOCK_BOOT)) {
 			pr_err("Failed to add trace memory to node %d\n",
 				ent->nid);
 			ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 2a983b5a52e1..5f91359c7993 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -651,7 +651,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 static int dlpar_add_lmb(struct drmem_lmb *lmb)
 {
 	unsigned long block_sz;
-	int nid, rc;
+	int nid, rc, type = MEMORY_BLOCK_DIMM;
 
 	if (lmb->flags & DRCONF_MEM_ASSIGNED)
 		return -EINVAL;
@@ -667,8 +667,11 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	/* Find the node id for this address */
 	nid = memory_add_physaddr_to_nid(lmb->base_addr);
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	/* Add the memory */
-	rc = __add_memory(nid, lmb->base_addr, block_sz);
+	rc = __add_memory(nid, lmb->base_addr, block_sz, type);
 	if (rc) {
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index 8fe0960ea572..f841113b450d 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -177,6 +177,13 @@ static unsigned long acpi_meminfo_end_pfn(struct acpi_memory_info *info)
 
 static int acpi_bind_memblk(struct memory_block *mem, void *arg)
 {
+	/* switch the type of memory block if this memory was already present */
+	if (mem->type == MEMORY_BLOCK_BOOT) {
+		if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+			mem->type = MEMORY_BLOCK_DIMM;
+		else
+			mem->type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+	}
 	return acpi_bind_one(&mem->dev, arg);
 }
 
@@ -191,6 +198,7 @@ static int acpi_bind_memory_blocks(struct acpi_memory_info *info,
 static int acpi_unbind_memblk(struct memory_block *mem, void *arg)
 {
 	acpi_unbind_one(&mem->dev);
+	mem->type = MEMORY_BLOCK_BOOT;
 	return 0;
 }
 
@@ -203,10 +211,13 @@ static void acpi_unbind_memory_blocks(struct acpi_memory_info *info)
 static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 {
 	acpi_handle handle = mem_device->device->handle;
-	int result, num_enabled = 0;
+	int result, num_enabled = 0, type = MEMORY_BLOCK_DIMM;
 	struct acpi_memory_info *info;
 	int node;
 
+	if (!IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
+		type = MEMORY_BLOCK_DIMM_UNREMOVABLE;
+
 	node = acpi_get_node(handle);
 	/*
 	 * Tell the VM there is more memory here...
@@ -228,7 +239,8 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 		if (node < 0)
 			node = memory_add_physaddr_to_nid(info->start_addr);
 
-		result = __add_memory(node, info->start_addr, info->length);
+		result = __add_memory(node, info->start_addr, info->length,
+				      type);
 
 		/*
 		 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c42300082c88..c5fdca7a3009 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -394,6 +394,21 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
+	case MEMORY_BLOCK_DIMM:
+		len = sprintf(buf, "dimm\n");
+		break;
+	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
+		len = sprintf(buf, "dimm-unremovable\n");
+		break;
+	case MEMORY_BLOCK_BALLOON:
+		len = sprintf(buf, "balloon\n");
+		break;
+	case MEMORY_BLOCK_BALLOON_MOVABLE:
+		len = sprintf(buf, "balloon-movable\n");
+		break;
+	case MEMORY_BLOCK_S390X_STANDBY:
+		len = sprintf(buf, "s390x-standby\n");
+		break;
 	default:
 		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
 				mem->state);
@@ -538,7 +553,8 @@ memory_probe_store(struct device *dev, struct device_attribute *attr,
 
 	nid = memory_add_physaddr_to_nid(phys_addr);
 	ret = __add_memory(nid, phys_addr,
-			   MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+			   MIN_MEMORY_BLOCK_SIZE * sections_per_block,
+			   MEMORY_BLOCK_DIMM_UNREMOVABLE);
 
 	if (ret)
 		goto out;
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 47719862e57f..f502ea6cd255 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -741,7 +741,8 @@ static void hv_mem_hot_add(unsigned long start, unsigned long size,
 
 		nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
 		ret = add_memory(nid, PFN_PHYS((start_pfn)),
-				(HA_CHUNK << PAGE_SHIFT));
+				 (HA_CHUNK << PAGE_SHIFT),
+				 MEMORY_BLOCK_BALLOON);
 
 		if (ret) {
 			pr_err("hot_add memory failed error is %d\n", ret);
diff --git a/drivers/s390/char/sclp_cmd.c b/drivers/s390/char/sclp_cmd.c
index 37d42de06079..0ca6f77e7e1d 100644
--- a/drivers/s390/char/sclp_cmd.c
+++ b/drivers/s390/char/sclp_cmd.c
@@ -406,7 +406,8 @@ static void __init add_memory_merged(u16 rn)
 	if (!size)
 		goto skip_add;
 	for (addr = start; addr < start + size; addr += block_size)
-		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size);
+		add_memory(numa_pfn_to_nid(PFN_DOWN(addr)), addr, block_size,
+			   MEMORY_BLOCK_S390X_STANDBY);
 skip_add:
 	first_rn = rn;
 	num = 1;
diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
index 5d2d7a917b4e..953ff86d609b 100644
--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -352,7 +352,7 @@ static enum bp_state reserve_additional_memory(void)
 	mutex_unlock(&balloon_mutex);
 	/* add_memory_resource() requires the device_hotplug lock */
 	lock_device_hotplug();
-	rc = add_memory_resource(nid, resource);
+	rc = add_memory_resource(nid, resource, MEMORY_BLOCK_BALLOON);
 	unlock_device_hotplug();
 	mutex_lock(&balloon_mutex);
 
diff --git a/include/linux/memory.h b/include/linux/memory.h
index 9f39ef41e6d2..a3a1e9764805 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -59,12 +59,57 @@ int set_memory_block_size_order(unsigned int order);
  *  specific device driver takes care of this memory block. This memory
  *  block type is onlined automatically by the kernel during boot and might
  *  later be managed by a different device driver, in which case the type
- *  might change.
+ *  might change (e.g. to MEMORY_BLOCK_DIMM).
+ *
+ * MEMORY_BLOCK_DIMM:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). Once all memory blocks belonging to the DIMM have been
+ *  offlined, the DIMM along with the memory blocks can be removed to
+ *  effectively unplug it. This memory block type is usually onlined to the
+ *  MOVABLE zone, to make offlining and unplug possible. Examples include
+ *  ACPI DIMMs and PPC LMBs if the kernel supports removal of memory.
+ *
+ * MEMORY_BLOCK_DIMM_UNREMOVABLE:
+ *  This memory block is managed by a device driver taking care of DIMMs
+ *  (or similar). There is either no HW interface to remove the DIMM or
+ *  the kernel does not support offlining/removal of memory, so this memory
+ *  block can never be removed. Examples include ACPI DIMMs and PPC LMBs
+ *  when removal of memory is not supported by the kernel, as well as
+ *  memory probed manually from user space.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that does not require a specific zone for optimal operation
+ *  (e.g. unplug memory using balloon inflation on this memory block on
+ *  page granularity). Examples include memory added by the XEN and Hyper-V
+ *  balloon drivers.
+ *  This memory block type is usually onlined to the NORMAL zone.
+ *
+ * MEMORY_BLOCK_BALLOON_MOVABLE:
+ *  This memory block was added by a balloon device driver (or similar)
+ *  that suggests onlining this memory block to the MOVABLE zone for
+ *  optimal operation (e.g. unplug using balloon inflation on this memory
+ *  block in bigger chunks than pages). There are no examples yet.
+ *  This memory block type is usually onlined to the MOVABLE zone.
+ *
+ * MEMORY_BLOCK_S390X_STANDBY:
+ *  The memory block is special standby memory on s390x. As long as it
+ *  is offline, no memory will be allocated to the system for this memory
+ *  block. Onlining memory will result in memory getting allocated to the
+ *  system, and the memory usually cannot be offlined again. The memory block
+ *  will never be removed. This memory type is usually not onlined
+ *  automatically but explicitly by the administrator.
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
 	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
+	MEMORY_BLOCK_DIMM,
+	MEMORY_BLOCK_DIMM_UNREMOVABLE,
+	MEMORY_BLOCK_BALLOON,
+	MEMORY_BLOCK_BALLOON_MOVABLE,
+	MEMORY_BLOCK_S390X_STANDBY,
 };
 
 /* These states are exposed to userspace as text strings in sysfs */
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 667a37aa9a3c..7c8895299e8c 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -326,9 +326,9 @@ static inline void __remove_memory(int nid, u64 start, u64 size) {}
 extern void __ref free_area_init_core_hotplug(int nid);
 extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
 		void *arg, int (*func)(struct memory_block *, void *));
-extern int __add_memory(int nid, u64 start, u64 size);
-extern int add_memory(int nid, u64 start, u64 size);
-extern int add_memory_resource(int nid, struct resource *resource);
+extern int __add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory(int nid, u64 start, u64 size, int type);
+extern int add_memory_resource(int nid, struct resource *resource, int type);
 extern int arch_add_memory(int nid, u64 start, u64 size,
 			   struct vmem_altmap *altmap, int type);
 extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7246faa44488..f109002d6e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1071,7 +1071,7 @@ static int online_memory_block(struct memory_block *mem, void *arg)
  *
  * we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG
  */
-int __ref add_memory_resource(int nid, struct resource *res)
+int __ref add_memory_resource(int nid, struct resource *res, int type)
 {
 	u64 start, size;
 	bool new_node = false;
@@ -1080,6 +1080,9 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	start = res->start;
 	size = resource_size(res);
 
+	if (type == MEMORY_BLOCK_NONE)
+		return -EINVAL;
+
 	ret = check_hotplug_memory_range(start, size);
 	if (ret)
 		return ret;
@@ -1100,7 +1103,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 	new_node = ret;
 
 	/* call arch's memory hotadd */
-	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
+	ret = arch_add_memory(nid, start, size, NULL, type);
 	if (ret < 0)
 		goto error;
 
@@ -1141,7 +1144,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
 }
 
 /* requires device_hotplug_lock, see add_memory_resource() */
-int __ref __add_memory(int nid, u64 start, u64 size)
+int __ref __add_memory(int nid, u64 start, u64 size, int type)
 {
 	struct resource *res;
 	int ret;
@@ -1150,18 +1153,18 @@ int __ref __add_memory(int nid, u64 start, u64 size)
 	if (IS_ERR(res))
 		return PTR_ERR(res);
 
-	ret = add_memory_resource(nid, res);
+	ret = add_memory_resource(nid, res, type);
 	if (ret < 0)
 		release_memory_resource(res);
 	return ret;
 }
 
-int add_memory(int nid, u64 start, u64 size)
+int add_memory(int nid, u64 start, u64 size, int type)
 {
 	int rc;
 
 	lock_device_hotplug();
-	rc = __add_memory(nid, start, size);
+	rc = __add_memory(nid, start, size, type);
 	unlock_device_hotplug();
 
 	return rc;
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH RFCv2 4/4] mm/memory_hotplug: Drop MEMORY_TYPE_UNSPECIFIED
  2018-11-30 17:59 ` David Hildenbrand
                     ` (2 preceding siblings ...)
  (?)
@ 2018-11-30 17:59   ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-11-30 17:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-ia64, linuxppc-dev, linux-s390, linux-sh,
	linux-acpi, devel, xen-devel, x86, David Hildenbrand,
	Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Morton,
	Ingo Molnar, Pavel Tatashin, Stephen Rothwell, Andrew Banman,
	mike.travis, Oscar Salvador, Dave Hansen, Michal Hocko

We now have proper types for all users, so we can drop this one.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrew Banman <andrew.banman@hpe.com>
Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchánek <msuchanek@suse.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/base/memory.c  | 3 ---
 include/linux/memory.h | 5 -----
 2 files changed, 8 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index c5fdca7a3009..a6e524f0ea38 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -388,9 +388,6 @@ static ssize_t type_show(struct device *dev, struct device_attribute *attr,
 	ssize_t len = 0;
 
 	switch (mem->type) {
-	case MEMORY_BLOCK_UNSPECIFIED:
-		len = sprintf(buf, "unspecified\n");
-		break;
 	case MEMORY_BLOCK_BOOT:
 		len = sprintf(buf, "boot\n");
 		break;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a3a1e9764805..11679622f743 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -50,10 +50,6 @@ int set_memory_block_size_order(unsigned int order);
  *  No memory block is to be created (e.g. device memory). Not exposed to
  *  user space.
  *
- * MEMORY_BLOCK_UNSPECIFIED:
- *  The type of memory block was not further specified when adding the
- *  memory block.
- *
  * MEMORY_BLOCK_BOOT:
  *  This memory block was added during boot by the basic system. No
  *  specific device driver takes care of this memory block. This memory
@@ -103,7 +99,6 @@ int set_memory_block_size_order(unsigned int order);
  */
 enum {
 	MEMORY_BLOCK_NONE = 0,
-	MEMORY_BLOCK_UNSPECIFIED,
 	MEMORY_BLOCK_BOOT,
 	MEMORY_BLOCK_DIMM,
 	MEMORY_BLOCK_DIMM_UNREMOVABLE,
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 80+ messages in thread
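A rough user-space model of type_show() after this patch, as a sketch only. The "boot" and "dimm" strings match the udevadm example in the cover letter; the "dimm_unremovable" string and the "unknown" fallback are assumptions, and model_type_show() is a hypothetical name. One side effect worth noting: dropping MEMORY_BLOCK_UNSPECIFIED renumbers the following enum values (MEMORY_BLOCK_BOOT becomes 1), which is harmless as long as user space only ever sees the strings.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_BOOT,		/* was 2 while UNSPECIFIED existed */
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
};

/* Hypothetical model of type_show(): format the block type the way the
 * sysfs "type" attribute would, returning the number of bytes written. */
static int model_type_show(int type, char *buf, size_t size)
{
	const char *name;

	switch (type) {
	case MEMORY_BLOCK_BOOT:
		name = "boot";
		break;
	case MEMORY_BLOCK_DIMM:
		name = "dimm";
		break;
	case MEMORY_BLOCK_DIMM_UNREMOVABLE:
		name = "dimm_unremovable";	/* assumed string */
		break;
	default:
		/* MEMORY_BLOCK_NONE blocks are never exposed to user space. */
		name = "unknown";		/* assumed fallback */
		break;
	}
	return snprintf(buf, size, "%s\n", name);
}
```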

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-11-30 17:59 ` David Hildenbrand
  (?)
@ 2018-12-01  0:48   ` Wei Yang
  -1 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  0:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rafael J. Wysocki, Michal Hocko, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Michal Hocko, linux-mm,
	Pavel Tatashin, Rich Felker, Arun KS, H. Peter Anvin,
	Stephen Rothwell, Rashmica Gupta, Boris Ostrovsky,
	Paul Mackerras, Pavel Tatashin, linux-s390, Michael Neuling,
	Stefano Stabellini

On Fri, Nov 30, 2018 at 06:59:18PM +0100, David Hildenbrand wrote:
>This is the second approach, introducing more meaningful memory block
>types and not changing online behavior in the kernel. It is based on
>latest linux-next.
>
>As we found out during discussion, user space should always handle onlining
>of memory, in any case. However in order to make smart decisions in user
>space about if and how to online memory, we have to export more information
>about memory blocks. This way, we can formulate rules in user space.
>
>One such information is the type of memory block we are talking about.
>This helps to answer some questions like:
>- Does this memory block belong to a DIMM?
>- Can this DIMM theoretically ever be unplugged again?
>- Was this memory added by a balloon driver that will rely on balloon
>  inflation to remove chunks of that memory again? Which zone is advised?
>- Is this special standby memory on s390x that is usually not automatically
>  onlined?
>
>And in short it helps to answer, to some extent (excluding zone imbalances):
>- Should I online this memory block?
>- To which zone should I online this memory block?
>... of course special use cases will result in different answers. But that's
>why user space has control of onlining memory.
>
>More details can be found in Patch 1 and Patch 3.
>Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>
>
>Example:
>$ udevadm info -q all -a /sys/devices/system/memory/memory0
>	KERNEL=="memory0"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="00000000"
>	ATTR{removable}=="0"
>	ATTR{state}=="online"
>	ATTR{type}=="boot"
>	ATTR{valid_zones}=="none"
>$ udevadm info -q all -a /sys/devices/system/memory/memory90
>	KERNEL=="memory90"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="0000005a"
>	ATTR{removable}=="1"
>	ATTR{state}=="online"
>	ATTR{type}=="dimm"
>	ATTR{valid_zones}=="Normal"
>
>
>RFC -> RFCv2:
>- Now also taking care of PPC (somehow missed it :/ )
>- Split the series up to some degree (some ideas on how to split up patch 3
>  would be very welcome)
>- Introduce more memory block types. Turns out abstracting too much was
>  rather confusing and not helpful. Properly document them.
>
>Notes:
>- I wanted to convert the enum of types into a named enum but this
>  provoked all kinds of different errors. For now, I am doing it just like
>  the other types (e.g. online_type) we are using in that context.
>- The "removable" property should never have been named like that. It
>  should have been "offlinable". Can we still rename that? E.g. boot memory
>  is sometimes marked as removable ...
>

This makes sense to me. "Remove" usually describes the physical hotplug
phase, if I am correct.

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread
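The cover letter argues that onlining decisions belong in user space, driven by the new per-block "type" attribute. A minimal sketch of such a rule in a hypothetical onlining helper follows; the policy itself is an assumption, not something the series prescribes. online_policy() and read_block_type() are illustrative names, "dimm_unremovable" is an assumed type string, and the "type" attribute only exists with this RFC applied, so on other kernels read_block_type() simply fails. The returned strings are the values the existing "state" attribute accepts.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: map a block's "type" to the string a helper would
 * write to /sys/devices/system/memory/memory<N>/state, or NULL to skip. */
static const char *online_policy(const char *type)
{
	if (strcmp(type, "boot") == 0)
		return NULL;			/* onlined during boot already */
	if (strcmp(type, "dimm") == 0)
		return "online_movable";	/* keep the DIMM unpluggable */
	if (strcmp(type, "dimm_unremovable") == 0)	/* assumed string */
		return "online_kernel";	/* never unplugged: kernel zone */
	return NULL;				/* unknown type: leave it alone */
}

/* Read the "type" attribute of memory block <block> into buf and strip
 * the trailing newline. Returns 0 on success, -1 on any error. */
static int read_block_type(unsigned int block, char *buf, size_t size)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/memory/memory%u/type", block);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)size, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

Keeping the policy a pure string-to-string function makes it easy to test and to override per distribution, which is the flexibility the cover letter asks for.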

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-01  0:48   ` Wei Yang
  0 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  0:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Andrew Banman,
	Andrew Morton, Andy Lutomirski, Arun KS, Balbir Singh,
	Benjamin Herrenschmidt, Borislav Petkov, Boris Ostrovsky,
	Christophe Leroy, Dan Williams, Dave Hansen, Dave Jiang,
	Fenghua Yu, Greg Kroah-Hartman, Haiyang Zhang, Heiko Carstens,
	H. Peter Anvin, Ingo Molnar, Ingo Molnar, Jan H. Sch??nherr,
	J??r??me Glisse, Jonathan Neusch??fer, Joonsoo Kim,
	Juergen Gross, Kirill A. Shutemov, K. Y. Srinivasan, Len Brown,
	Logan Gunthorpe, Martin Schwidefsky, Mathieu Malaterre,
	Matthew Wilcox, Mauricio Faria de Oliveira, Michael Ellerman,
	Michael Neuling, Michal Hocko, Michal Hocko, Michal Such??nek,
	Mike Rapoport, mike.travis, Nathan Fontenot, Nicholas Piggin,
	Oscar Salvador, Oscar Salvador, Paul Mackerras, Pavel Tatashin,
	Pavel Tatashin, Pavel Tatashin, Peter Zijlstra,
	Rafael J. Wysocki, Rafael J. Wysocki, Rashmica Gupta,
	Rich Felker, Rob Herring, Stefano Stabellini, Stephen Hemminger,
	Stephen Rothwell, Thomas Gleixner, Tony Luck, Vasily Gorbik,
	Vitaly Kuznetsov, Wei Yang, Yoshinori Sato, YueHaibing

On Fri, Nov 30, 2018 at 06:59:18PM +0100, David Hildenbrand wrote:
>This is the second approach, introducing more meaningful memory block
>types and not changing online behavior in the kernel. It is based on
>latest linux-next.
>
>As we found out during dicussion, user space should always handle onlining
>of memory, in any case. However in order to make smart decisions in user
>space about if and how to online memory, we have to export more information
>about memory blocks. This way, we can formulate rules in user space.
>
>One such information is the type of memory block we are talking about.
>This helps to answer some questions like:
>- Does this memory block belong to a DIMM?
>- Can this DIMM theoretically ever be unplugged again?
>- Was this memory added by a balloon driver that will rely on balloon
>  inflation to remove chunks of that memory again? Which zone is advised?
>- Is this special standby memory on s390x that is usually not automatically
>  onlined?
>
>And in short it helps to answer to some extend (excluding zone imbalances)
>- Should I online this memory block?
>- To which zone should I online this memory block?
>... of course special use cases will result in different anwers. But that's
>why user space has control of onlining memory.
>
>More details can be found in Patch 1 and Patch 3.
>Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>
>
>Example:
>$ udevadm info -q all -a /sys/devices/system/memory/memory0
>	KERNEL=="memory0"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="00000000"
>	ATTR{removable}=="0"
>	ATTR{state}=="online"
>	ATTR{type}=="boot"
>	ATTR{valid_zones}=="none"
>$ udevadm info -q all -a /sys/devices/system/memory/memory90
>	KERNEL=="memory90"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="0000005a"
>	ATTR{removable}=="1"
>	ATTR{state}=="online"
>	ATTR{type}=="dimm"
>	ATTR{valid_zones}=="Normal"
>
>
>RFC -> RFCv2:
>- Now also taking care of PPC (somehow missed it :/ )
>- Split the series up to some degree (some ideas on how to split up patch 3
>  would be very welcome)
>- Introduce more memory block types. Turns out abstracting too much was
>  rather confusing and not helpful. Properly document them.
>
>Notes:
>- I wanted to convert the enum of types into a named enum but this
>  provoked all kinds of different errors. For now, I am doing it just like
>  the other types (e.g. online_type) we are using in that context.
>- The "removable" property should never have been named like that. It
>  should have been "offlinable". Can we still rename that? E.g. boot memory
>  is sometimes marked as removable ...
>

This make sense to me. Remove usually describe physical hotplug phase,
if I am correct. 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-01  0:48   ` Wei Yang
  0 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  0:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rafael J. Wysocki, Michal Hocko, linux-ia64,
	linux-sh, Peter Zijlstra, Dave Hansen, Heiko Carstens,
	Michal Hocko, linux-mm, Pavel Tatashin, Rich Felker, Arun KS,
	H. Peter Anvin, Stephen Rothwell, Rashmica Gupta,
	K. Y. Srinivasan, Boris Ostrovsky, Paul Mackerras,
	Pavel Tatashin, linux-s390, Michael Neuling, Stefano Stabellini,
	Dave Jiang, Yoshinori Sato, Logan Gunthorpe, x86, YueHaibing,
	Pavel Tatashin, Matthew Wilcox, Ingo Molnar, linux-acpi,
	Ingo Molnar, xen-devel, Michal Such??nek, Len Brown, Fenghua Yu,
	Vitaly Kuznetsov, Jan H. Sch??nherr, Juergen Gross,
	Vasily Gorbik, Rob Herring, mike.travis, Haiyang Zhang,
	Jonathan Neusch??fer, Nicholas Piggin, J??r??me Glisse,
	Mike Rapoport, Borislav Petkov, Andy Lutomirski, Nathan Fontenot,
	Stephen Hemminger, Dan Williams, Wei Yang, Joonsoo Kim,
	Oscar Salvador, Tony Luck, Andrew Banman, Mathieu Malaterre,
	Greg Kroah-Hartman, Rafael J. Wysocki, linux-kernel,
	Mauricio Faria de Oliveira, Thomas Gleixner, Martin Schwidefsky,
	devel, Andrew Morton, linuxppc-dev, Kirill A. Shutemov

On Fri, Nov 30, 2018 at 06:59:18PM +0100, David Hildenbrand wrote:
>This is the second approach, introducing more meaningful memory block
>types and not changing online behavior in the kernel. It is based on
>latest linux-next.
>
>As we found out during dicussion, user space should always handle onlining
>of memory, in any case. However in order to make smart decisions in user
>space about if and how to online memory, we have to export more information
>about memory blocks. This way, we can formulate rules in user space.
>
>One such information is the type of memory block we are talking about.
>This helps to answer some questions like:
>- Does this memory block belong to a DIMM?
>- Can this DIMM theoretically ever be unplugged again?
>- Was this memory added by a balloon driver that will rely on balloon
>  inflation to remove chunks of that memory again? Which zone is advised?
>- Is this special standby memory on s390x that is usually not automatically
>  onlined?
>
>And in short it helps to answer to some extend (excluding zone imbalances)
>- Should I online this memory block?
>- To which zone should I online this memory block?
>... of course special use cases will result in different anwers. But that's
>why user space has control of onlining memory.
>
>More details can be found in Patch 1 and Patch 3.
>Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>
>
>Example:
>$ udevadm info -q all -a /sys/devices/system/memory/memory0
>	KERNEL=="memory0"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="00000000"
>	ATTR{removable}=="0"
>	ATTR{state}=="online"
>	ATTR{type}=="boot"
>	ATTR{valid_zones}=="none"
>$ udevadm info -q all -a /sys/devices/system/memory/memory90
>	KERNEL=="memory90"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="0000005a"
>	ATTR{removable}=="1"
>	ATTR{state}=="online"
>	ATTR{type}=="dimm"
>	ATTR{valid_zones}=="Normal"
>
>
>RFC -> RFCv2:
>- Now also taking care of PPC (somehow missed it :/ )
>- Split the series up to some degree (some ideas on how to split up patch 3
>  would be very welcome)
>- Introduce more memory block types. Turns out abstracting too much was
>  rather confusing and not helpful. Properly document them.
>
>Notes:
>- I wanted to convert the enum of types into a named enum but this
>  provoked all kinds of different errors. For now, I am doing it just like
>  the other types (e.g. online_type) we are using in that context.
>- The "removable" property should never have been named like that. It
>  should have been "offlinable". Can we still rename that? E.g. boot memory
>  is sometimes marked as removable ...
>

This make sense to me. Remove usually describe physical hotplug phase,
if I am correct. 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-11-30 17:59 ` David Hildenbrand
                   ` (9 preceding siblings ...)
  (?)
@ 2018-12-01  0:48 ` Wei Yang
  -1 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  0:48 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rafael J. Wysocki, Michal Hocko, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Michal Hocko, linux-mm,
	Pavel Tatashin, Rich Felker, Arun KS, H. Peter Anvin,
	Stephen Rothwell, Rashmica Gupta, K. Y. Srinivasan,
	Boris Ostrovsky, Paul Mackerras, Pavel Tatashin, linux-s390,
	Michael Neuling

On Fri, Nov 30, 2018 at 06:59:18PM +0100, David Hildenbrand wrote:
>This is the second approach, introducing more meaningful memory block
>types and not changing online behavior in the kernel. It is based on
>latest linux-next.
>
>As we found out during dicussion, user space should always handle onlining
>of memory, in any case. However in order to make smart decisions in user
>space about if and how to online memory, we have to export more information
>about memory blocks. This way, we can formulate rules in user space.
>
>One such information is the type of memory block we are talking about.
>This helps to answer some questions like:
>- Does this memory block belong to a DIMM?
>- Can this DIMM theoretically ever be unplugged again?
>- Was this memory added by a balloon driver that will rely on balloon
>  inflation to remove chunks of that memory again? Which zone is advised?
>- Is this special standby memory on s390x that is usually not automatically
>  onlined?
>
>And in short it helps to answer to some extend (excluding zone imbalances)
>- Should I online this memory block?
>- To which zone should I online this memory block?
>... of course special use cases will result in different anwers. But that's
>why user space has control of onlining memory.
>
>More details can be found in Patch 1 and Patch 3.
>Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>
>
>Example:
>$ udevadm info -q all -a /sys/devices/system/memory/memory0
>	KERNEL=="memory0"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="00000000"
>	ATTR{removable}=="0"
>	ATTR{state}=="online"
>	ATTR{type}=="boot"
>	ATTR{valid_zones}=="none"
>$ udevadm info -q all -a /sys/devices/system/memory/memory90
>	KERNEL=="memory90"
>	SUBSYSTEM=="memory"
>	DRIVER==""
>	ATTR{online}=="1"
>	ATTR{phys_device}=="0"
>	ATTR{phys_index}=="0000005a"
>	ATTR{removable}=="1"
>	ATTR{state}=="online"
>	ATTR{type}=="dimm"
>	ATTR{valid_zones}=="Normal"
>
>
>RFC -> RFCv2:
>- Now also taking care of PPC (somehow missed it :/ )
>- Split the series up to some degree (some ideas on how to split up patch 3
>  would be very welcome)
>- Introduce more memory block types. Turns out abstracting too much was
>  rather confusing and not helpful. Properly document them.
>
>Notes:
>- I wanted to convert the enum of types into a named enum but this
>  provoked all kinds of different errors. For now, I am doing it just like
>  the other types (e.g. online_type) we are using in that context.
>- The "removable" property should never have been named like that. It
>  should have been "offlinable". Can we still rename that? E.g. boot memory
>  is sometimes marked as removable ...
>

This makes sense to me. "Remove" usually describes the physical hotplug phase,
if I am correct.

-- 
Wei Yang
Help you, Help me

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
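A rough sketch of the kind of user-space rule this information enables. The policy itself is hypothetical and not part of this series: choose_online_action() maps the text of a block's type attribute (e.g. /sys/devices/system/memory/memory90/type) to the value a helper might write to that block's state attribute, or NULL to leave the block alone. The "dimm" type appears only in the example output above and is introduced by later patches; choosing online_movable for it is an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical onlining policy keyed on the sysfs "type" attribute.
 * Returns the string to write to .../memoryN/state, or NULL to skip.
 */
const char *choose_online_action(const char *type_attr)
{
	char type[32];
	size_t len = strcspn(type_attr, "\n");	/* sysfs values end in '\n' */

	if (len >= sizeof(type))
		return NULL;			/* unexpected value: leave it alone */
	memcpy(type, type_attr, len);
	type[len] = '\0';

	if (strcmp(type, "boot") == 0)
		return NULL;			/* onlined by the kernel during boot */
	if (strcmp(type, "dimm") == 0)
		return "online_movable";	/* keep the DIMM unpluggable (assumption) */
	if (strcmp(type, "unspecified") == 0)
		return "online";		/* kernel picks its default zone */
	return NULL;				/* unknown/future types: do nothing */
}
```

Leaving unknown types untouched lets such a rule degrade gracefully when new types appear, instead of guessing from the architecture as the s390x udev check described in patch 1 does.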

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
  2018-11-30 17:59   ` David Hildenbrand
@ 2018-12-01  1:25     ` Wei Yang
  -1 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Pavel Tatashin,
	Stephen Rothwell, Andrew Banman, mike.travis, Oscar Salvador,
	Dave Hansen, Michal Hocko, Michal Suchánek, Vitaly Kuznetsov,
	Dan Williams, Pavel Tatashin, Martin Schwidefsky, Heiko Carstens

On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>Memory onlining should always be handled by user space, because only user
>space knows which use cases it wants to satisfy. E.g. memory might be
>onlined to the MOVABLE zone even if it can never be removed from the
>system, e.g. to make usage of huge pages more reliable.
>
>However, to implement such rules (especially default rules in distributions),
>user space needs more information about the memory that was added.
>
>E.g. on x86 we want to online memory provided by balloon devices (e.g.
>XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>the whole block). This might also become relevant for other architectures.
>
>Also, udev rules right now check whether they are running on s390x and treat
>all added memory blocks as standby memory (-> don't online automatically). As
>soon as we support other memory hotplug mechanisms (e.g. virtio-mem), these
>checks would have to become more involved (e.g. also check if running under
>KVM) and would eventually still be wrong (e.g. if KVM ever supports standby
>memory, we are doomed).
>
>I decided to allow specifying the type of memory that is getting added
>to the system. Let's start with two types, BOOT and UNSPECIFIED, to get the
>basic infrastructure running. We'll introduce and use further types in
>follow-up patches. For now, we temporarily classify any hotplugged memory
>as UNSPECIFIED (which will eventually be dropped).
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Michal Suchánek <msuchanek@suse.de>
>Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
> include/linux/memory.h | 27 +++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 0c290f86ab20..17f2985c07c5 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
> 	return sprintf(buf, "%d\n", mem->phys_device);
> }
> 
>+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>+			 char *buf)
>+{
>+	struct memory_block *mem = to_memory_block(dev);
>+	ssize_t len = 0;
>+
>+	switch (mem->type) {
>+	case MEMORY_BLOCK_UNSPECIFIED:
>+		len = sprintf(buf, "unspecified\n");
>+		break;
>+	case MEMORY_BLOCK_BOOT:
>+		len = sprintf(buf, "boot\n");
>+		break;
>+	default:
>+		len = sprintf(buf, "ERROR-UNKNOWN-%d\n",
>+				mem->type);
>+		WARN_ON(1);
>+		break;
>+	}
>+
>+	return len;
>+}
>+
> #ifdef CONFIG_MEMORY_HOTREMOVE
> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
> 		unsigned long nr_pages, int online_type,
>@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>+static DEVICE_ATTR_RO(type);

This is correct, but it does not look consistent with the other attributes.

Not that beautiful :-)
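Why DEVICE_ATTR_RO(type) works here: the macro derives both the attribute variable (dev_attr_type) and the show callback (type_show) from its single argument and fixes the mode at 0444, so the function has to be named type_show, unlike its show_* neighbours registered with the four-argument DEVICE_ATTR(). The user-space mock below imitates that naming scheme; MOCK_DEVICE_ATTR_RO and struct mock_attr are simplified stand-ins, not the kernel's definitions, and the show callback takes the type value directly instead of a struct device.

```c
#include <stddef.h>
#include <stdio.h>

/* Mirrors the enum added to include/linux/memory.h in this patch. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Simplified stand-in for struct device_attribute. */
struct mock_attr {
	const char *name;
	unsigned int mode;
	int (*show)(char *buf, size_t size, int type);
};

/* Same naming scheme as DEVICE_ATTR_RO(): dev_attr_<name>, <name>_show, 0444. */
#define MOCK_DEVICE_ATTR_RO(_name) \
	static const struct mock_attr dev_attr_##_name = { #_name, 0444, _name##_show }

/* User-space stand-in for type_show(); returns the number of chars written. */
static int type_show(char *buf, size_t size, int type)
{
	switch (type) {
	case MEMORY_BLOCK_UNSPECIFIED:
		return snprintf(buf, size, "unspecified\n");
	case MEMORY_BLOCK_BOOT:
		return snprintf(buf, size, "boot\n");
	default:
		/* Report the unexpected type value itself. */
		return snprintf(buf, size, "ERROR-UNKNOWN-%d\n", type);
	}
}

MOCK_DEVICE_ATTR_RO(type);	/* defines dev_attr_type with .show = type_show */
```

Because the name, mode and callback all come from one token, the shorter macro cannot get them out of sync; the cost is the naming inconsistency pointed out above.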

> 
> /*
>  * Block size attribute stuff
>@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
> 	&dev_attr_state.attr,
> 	&dev_attr_phys_device.attr,
> 	&dev_attr_removable.attr,
>+	&dev_attr_type.attr,
> #ifdef CONFIG_MEMORY_HOTREMOVE
> 	&dev_attr_valid_zones.attr,
> #endif
>@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
> }
> 
> static int init_memory_block(struct memory_block **memory,
>-			     struct mem_section *section, unsigned long state)
>+			     struct mem_section *section, unsigned long state,
>+			     int type)
> {
> 	struct memory_block *mem;
> 	unsigned long start_pfn;
> 	int scn_nr;
> 	int ret = 0;
> 
>+	if (type == MEMORY_BLOCK_NONE)
>+		return -EINVAL;

No one will pass in this value. Can we omit this check for now?
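A user-space model of the check under discussion, for reference: the type is validated before any allocation, so a MEMORY_BLOCK_NONE request fails with -EINVAL and leaves *out untouched. struct mock_block and init_mock_block() are hypothetical stand-ins for struct memory_block and init_memory_block(), and calloc() stands in for kzalloc(). The comparison must be "=="; a single "=" would assign MEMORY_BLOCK_NONE (0) to type, the condition would always be false, and the block would be created with type 0.

```c
#include <errno.h>
#include <stdlib.h>

enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Hypothetical, trimmed-down stand-in for struct memory_block. */
struct mock_block {
	unsigned long state;
	int type;
};

/*
 * Mirrors the argument check added to init_memory_block(): a block of
 * type NONE must never be created, so reject it before allocating.
 */
int init_mock_block(struct mock_block **out, unsigned long state, int type)
{
	struct mock_block *mem;

	if (type == MEMORY_BLOCK_NONE)
		return -EINVAL;

	mem = calloc(1, sizeof(*mem));	/* kzalloc() in the kernel */
	if (!mem)
		return -ENOMEM;
	mem->state = state;
	mem->type = type;
	*out = mem;
	return 0;
}
```

In this model the check costs one comparison and documents that NONE means "do not create a block"; whether it is worth keeping before any caller passes that value is the open question above.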

>+
> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> 	if (!mem)
> 		return -ENOMEM;
>@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
> 	mem->state = state;
> 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
>+	mem->type = type;
> 
> 	ret = register_memory(mem);
> 
>@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
> 
> 	if (section_count == 0)
> 		return 0;
>-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
>+				MEMORY_BLOCK_BOOT);
> 	if (ret)
> 		return ret;
> 	mem->section_count = section_count;
>@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 		mem->section_count++;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>+					MEMORY_BLOCK_UNSPECIFIED);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index d75ec88ca09d..06268e96e0da 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -34,12 +34,39 @@ struct memory_block {
> 	int (*phys_callback)(struct memory_block *);
> 	struct device dev;
> 	int nid;			/* NID for this memory block */
>+	int type;			/* type of this memory block */
> };
> 
> int arch_get_memory_phys_device(unsigned long start_pfn);
> unsigned long memory_block_size_bytes(void);
> int set_memory_block_size_order(unsigned int order);
> 
>+/*
>+ * Memory block types allow user space to formulate rules if and how to
>+ * online memory blocks. The types are exposed to user space as text
>+ * strings in sysfs.
>+ *
>+ * MEMORY_BLOCK_NONE:
>+ *  No memory block is to be created (e.g. device memory). Not exposed to
>+ *  user space.
>+ *
>+ * MEMORY_BLOCK_UNSPECIFIED:
>+ *  The type of memory block was not further specified when adding the
>+ *  memory block.
>+ *
>+ * MEMORY_BLOCK_BOOT:
>+ *  This memory block was added during boot by the basic system. No
>+ *  specific device driver takes care of this memory block. This memory
>+ *  block type is onlined automatically by the kernel during boot and might
>+ *  later be managed by a different device driver, in which case the type
>+ *  might change.
>+ */
>+enum {
>+	MEMORY_BLOCK_NONE = 0,
>+	MEMORY_BLOCK_UNSPECIFIED,
>+	MEMORY_BLOCK_BOOT,
>+};
>+
> /* These states are exposed to userspace as text strings in sysfs */
> #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
> #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
>-- 
>2.17.2

-- 
Wei Yang
Help you, Help me
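To summarise which type each registration path assigns after this patch, a small user-space model: boot memory (add_memory_block()) becomes an online BOOT block, while a hotplugged section either joins an existing block, whose type is left unchanged, or creates a new offline UNSPECIFIED block (hotplug_memory_register()). The fixed SECTIONS_PER_BLOCK, the flat blocks[] array and register_section() are hypothetical simplifications of the kernel's section-to-block lookup; the kernel's boot path also counts all present sections of a block at once rather than one at a time.

```c
#include <stdbool.h>

enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

#define MEM_ONLINE		(1 << 0)	/* values from include/linux/memory.h */
#define MEM_OFFLINE		(1 << 2)
#define SECTIONS_PER_BLOCK	4		/* assumption for the model */
#define MAX_BLOCKS		16

struct model_block {
	bool present;
	unsigned long state;
	int type;
	int section_count;
};

static struct model_block blocks[MAX_BLOCKS];

/*
 * Register one present section. @boot selects the add_memory_block()
 * path (boot memory); otherwise it follows hotplug_memory_register().
 * Returns the block index, or -1 if the section is out of range.
 */
int register_section(unsigned long section_nr, bool boot)
{
	struct model_block *mem;

	if (section_nr / SECTIONS_PER_BLOCK >= MAX_BLOCKS)
		return -1;
	mem = &blocks[section_nr / SECTIONS_PER_BLOCK];
	if (!mem->present) {
		/* First section of this block: create it with a type. */
		mem->present = true;
		mem->state = boot ? MEM_ONLINE : MEM_OFFLINE;
		mem->type = boot ? MEMORY_BLOCK_BOOT : MEMORY_BLOCK_UNSPECIFIED;
	}
	/* An existing block keeps its type; only the section count grows. */
	mem->section_count++;
	return (int)(section_nr / SECTIONS_PER_BLOCK);
}
```

Note that in this model a section hotplugged into a block that already exists at boot inherits the BOOT type, since only the first section decides it.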

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-01  1:25     ` Wei Yang
  0 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Pavel Tatashin,
	Stephen Rothwell, Andrew Banman, mike.travis, Oscar Salvador,
	Dave Hansen, Michal Hocko, Michal Such??nek, Vital

On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>Memory onlining should always be handled by user space, because only user
>space knows which use cases it wants to satisfy. E.g. memory might be
>onlined to the MOVABLE zone even if it can never be removed from the
>system, e.g. to make usage of huge pages more reliable.
>
>However to implement such rules (especially default rules in distributions)
>we need more information about the memory that was added in user space.
>
>E.g. on x86 we want to online memory provided by balloon devices (e.g.
>XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>the whole block). This might also become relevat for other architectures.
>
>Also, udev rules right now check if running on s390x and treat all added
>memory blocks as standby memory (-> don't online automatically). As soon as
>we support other memory hotplug mechanism (e.g. virtio-mem) checks would
>have to get more involved (e.g. also check if under KVM) but eventually
>also wrong (e.g. if KVM ever supports standby memory we are doomed).
>
>I decided to allow to specify the type of memory that is getting added
>to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
>basic infrastructure running. We'll introduce and use further types in
>follow-up patches. For now we classify any hotplugged memory temporarily
>as as UNSPECIFIED (which will eventually be dropped later on).
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Michal Such??nek <msuchanek@suse.de>
>Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
> include/linux/memory.h | 27 +++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 0c290f86ab20..17f2985c07c5 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
> 	return sprintf(buf, "%d\n", mem->phys_device);
> }
> 
>+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>+			 char *buf)
>+{
>+	struct memory_block *mem = to_memory_block(dev);
>+	ssize_t len = 0;
>+
>+	switch (mem->type) {
>+	case MEMORY_BLOCK_UNSPECIFIED:
>+		len = sprintf(buf, "unspecified\n");
>+		break;
>+	case MEMORY_BLOCK_BOOT:
>+		len = sprintf(buf, "boot\n");
>+		break;
>+	default:
>+		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>+				mem->state);
>+		WARN_ON(1);
>+		break;
>+	}
>+
>+	return len;
>+}
>+
> #ifdef CONFIG_MEMORY_HOTREMOVE
> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
> 		unsigned long nr_pages, int online_type,
>@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>+static DEVICE_ATTR_RO(type);

This is correct, while looks not consistent with other attributes.

Not that beautiful :-)

> 
> /*
>  * Block size attribute stuff
>@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
> 	&dev_attr_state.attr,
> 	&dev_attr_phys_device.attr,
> 	&dev_attr_removable.attr,
>+	&dev_attr_type.attr,
> #ifdef CONFIG_MEMORY_HOTREMOVE
> 	&dev_attr_valid_zones.attr,
> #endif
>@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
> }
> 
> static int init_memory_block(struct memory_block **memory,
>-			     struct mem_section *section, unsigned long state)
>+			     struct mem_section *section, unsigned long state,
>+			     int type)
> {
> 	struct memory_block *mem;
> 	unsigned long start_pfn;
> 	int scn_nr;
> 	int ret = 0;
> 
>+	if (type == MEMORY_BLOCK_NONE)
>+		return -EINVAL;

No one will pass in this value. Can we omit this check for now?

>+
> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> 	if (!mem)
> 		return -ENOMEM;
>@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
> 	mem->state = state;
> 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
>+	mem->type = type;
> 
> 	ret = register_memory(mem);
> 
>@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
> 
> 	if (section_count == 0)
> 		return 0;
>-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
>+				MEMORY_BLOCK_BOOT);
> 	if (ret)
> 		return ret;
> 	mem->section_count = section_count;
>@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 		mem->section_count++;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>+					MEMORY_BLOCK_UNSPECIFIED);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index d75ec88ca09d..06268e96e0da 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -34,12 +34,39 @@ struct memory_block {
> 	int (*phys_callback)(struct memory_block *);
> 	struct device dev;
> 	int nid;			/* NID for this memory block */
>+	int type;			/* type of this memory block */
> };
> 
> int arch_get_memory_phys_device(unsigned long start_pfn);
> unsigned long memory_block_size_bytes(void);
> int set_memory_block_size_order(unsigned int order);
> 
>+/*
>+ * Memory block types allow user space to formulate rules if and how to
>+ * online memory blocks. The types are exposed to user space as text
>+ * strings in sysfs.
>+ *
>+ * MEMORY_BLOCK_NONE:
>+ *  No memory block is to be created (e.g. device memory). Not exposed to
>+ *  user space.
>+ *
>+ * MEMORY_BLOCK_UNSPECIFIED:
>+ *  The type of memory block was not further specified when adding the
>+ *  memory block.
>+ *
>+ * MEMORY_BLOCK_BOOT:
>+ *  This memory block was added during boot by the basic system. No
>+ *  specific device driver takes care of this memory block. This memory
>+ *  block type is onlined automatically by the kernel during boot and might
>+ *  later be managed by a different device driver, in which case the type
>+ *  might change.
>+ */
>+enum {
>+	MEMORY_BLOCK_NONE = 0,
>+	MEMORY_BLOCK_UNSPECIFIED,
>+	MEMORY_BLOCK_BOOT,
>+};
>+
> /* These states are exposed to userspace as text strings in sysfs */
> #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
> #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
>-- 
>2.17.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-01  1:25     ` Wei Yang
  0 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Pavel Tatashin,
	Stephen Rothwell, Andrew Banman, mike.travis, Oscar Salvador,
	Dave Hansen, Michal Hocko, Michal Such??nek, Vitaly Kuznetsov,
	Dan Williams, Pavel Tatashin, Martin Schwidefsky, Heiko Carstens

On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>Memory onlining should always be handled by user space, because only user
>space knows which use cases it wants to satisfy. E.g. memory might be
>onlined to the MOVABLE zone even if it can never be removed from the
>system, e.g. to make usage of huge pages more reliable.
>
>However to implement such rules (especially default rules in distributions)
>we need more information about the memory that was added in user space.
>
>E.g. on x86 we want to online memory provided by balloon devices (e.g.
>XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>the whole block). This might also become relevat for other architectures.
>
>Also, udev rules right now check if running on s390x and treat all added
>memory blocks as standby memory (-> don't online automatically). As soon as
>we support other memory hotplug mechanism (e.g. virtio-mem) checks would
>have to get more involved (e.g. also check if under KVM) but eventually
>also wrong (e.g. if KVM ever supports standby memory we are doomed).
>
>I decided to allow to specify the type of memory that is getting added
>to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
>basic infrastructure running. We'll introduce and use further types in
>follow-up patches. For now we classify any hotplugged memory temporarily
>as as UNSPECIFIED (which will eventually be dropped later on).
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Michal Such??nek <msuchanek@suse.de>
>Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
> include/linux/memory.h | 27 +++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 0c290f86ab20..17f2985c07c5 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
> 	return sprintf(buf, "%d\n", mem->phys_device);
> }
> 
>+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>+			 char *buf)
>+{
>+	struct memory_block *mem = to_memory_block(dev);
>+	ssize_t len = 0;
>+
>+	switch (mem->type) {
>+	case MEMORY_BLOCK_UNSPECIFIED:
>+		len = sprintf(buf, "unspecified\n");
>+		break;
>+	case MEMORY_BLOCK_BOOT:
>+		len = sprintf(buf, "boot\n");
>+		break;
>+	default:
>+		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>+				mem->state);
>+		WARN_ON(1);
>+		break;
>+	}
>+
>+	return len;
>+}
>+
> #ifdef CONFIG_MEMORY_HOTREMOVE
> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
> 		unsigned long nr_pages, int online_type,
>@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>+static DEVICE_ATTR_RO(type);

This is correct, while looks not consistent with other attributes.

Not that beautiful :-)

> 
> /*
>  * Block size attribute stuff
>@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
> 	&dev_attr_state.attr,
> 	&dev_attr_phys_device.attr,
> 	&dev_attr_removable.attr,
>+	&dev_attr_type.attr,
> #ifdef CONFIG_MEMORY_HOTREMOVE
> 	&dev_attr_valid_zones.attr,
> #endif
>@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
> }
> 
> static int init_memory_block(struct memory_block **memory,
>-			     struct mem_section *section, unsigned long state)
>+			     struct mem_section *section, unsigned long state,
>+			     int type)
> {
> 	struct memory_block *mem;
> 	unsigned long start_pfn;
> 	int scn_nr;
> 	int ret = 0;
> 
>+	if (type == MEMORY_BLOCK_NONE)
>+		return -EINVAL;

No one will pass in this value. Can we omit this check for now?

>+
> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> 	if (!mem)
> 		return -ENOMEM;
>@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
> 	mem->state = state;
> 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
>+	mem->type = type;
> 
> 	ret = register_memory(mem);
> 
>@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
> 
> 	if (section_count == 0)
> 		return 0;
>-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
>+				MEMORY_BLOCK_BOOT);
> 	if (ret)
> 		return ret;
> 	mem->section_count = section_count;
>@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 		mem->section_count++;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>+					MEMORY_BLOCK_UNSPECIFIED);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index d75ec88ca09d..06268e96e0da 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -34,12 +34,39 @@ struct memory_block {
> 	int (*phys_callback)(struct memory_block *);
> 	struct device dev;
> 	int nid;			/* NID for this memory block */
>+	int type;			/* type of this memory block */
> };
> 
> int arch_get_memory_phys_device(unsigned long start_pfn);
> unsigned long memory_block_size_bytes(void);
> int set_memory_block_size_order(unsigned int order);
> 
>+/*
>+ * Memory block types allow user space to formulate rules if and how to
>+ * online memory blocks. The types are exposed to user space as text
>+ * strings in sysfs.
>+ *
>+ * MEMORY_BLOCK_NONE:
>+ *  No memory block is to be created (e.g. device memory). Not exposed to
>+ *  user space.
>+ *
>+ * MEMORY_BLOCK_UNSPECIFIED:
>+ *  The type of memory block was not further specified when adding the
>+ *  memory block.
>+ *
>+ * MEMORY_BLOCK_BOOT:
>+ *  This memory block was added during boot by the basic system. No
>+ *  specific device driver takes care of this memory block. This memory
>+ *  block type is onlined automatically by the kernel during boot and might
>+ *  later be managed by a different device driver, in which case the type
>+ *  might change.
>+ */
>+enum {
>+	MEMORY_BLOCK_NONE = 0,
>+	MEMORY_BLOCK_UNSPECIFIED,
>+	MEMORY_BLOCK_BOOT,
>+};
>+
> /* These states are exposed to userspace as text strings in sysfs */
> #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
> #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
>-- 
>2.17.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-01  1:25     ` Wei Yang
  0 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:25 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, linux-ia64, linux-sh, Dave Hansen,
	Heiko Carstens, Michal Hocko, linux-mm, Ingo Molnar, linux-s390,
	x86, Pavel Tatashin, linux-acpi, xen-devel, Michal Such??nek,
	Pavel Tatashin, Stephen Rothwell, mike.travis,
	Martin Schwidefsky, Dan Williams, Vitaly Kuznetsov,
	Andrew Banman, Greg Kroah-Hartman, linux-kernel,
	Rafael J. Wysocki, devel, Andrew Morton, linuxppc-dev

On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>Memory onlining should always be handled by user space, because only user
>space knows which use cases it wants to satisfy. E.g. memory might be
>onlined to the MOVABLE zone even if it can never be removed from the
>system, e.g. to make usage of huge pages more reliable.
>
>However to implement such rules (especially default rules in distributions)
>we need more information about the memory that was added in user space.
>
>E.g. on x86 we want to online memory provided by balloon devices (e.g.
>XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>the whole block). This might also become relevat for other architectures.
>
>Also, udev rules right now check if running on s390x and treat all added
>memory blocks as standby memory (-> don't online automatically). As soon as
>we support other memory hotplug mechanism (e.g. virtio-mem) checks would
>have to get more involved (e.g. also check if under KVM) but eventually
>also wrong (e.g. if KVM ever supports standby memory we are doomed).
>
>I decided to allow to specify the type of memory that is getting added
>to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
>basic infrastructure running. We'll introduce and use further types in
>follow-up patches. For now we classify any hotplugged memory temporarily
>as as UNSPECIFIED (which will eventually be dropped later on).
>
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Ingo Molnar <mingo@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Andrew Banman <andrew.banman@hpe.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Michal Hocko <mhocko@kernel.org>
>Cc: Michal Such??nek <msuchanek@suse.de>
>Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
> include/linux/memory.h | 27 +++++++++++++++++++++++++++
> 2 files changed, 62 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 0c290f86ab20..17f2985c07c5 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
> 	return sprintf(buf, "%d\n", mem->phys_device);
> }
> 
>+static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>+			 char *buf)
>+{
>+	struct memory_block *mem = to_memory_block(dev);
>+	ssize_t len = 0;
>+
>+	switch (mem->type) {
>+	case MEMORY_BLOCK_UNSPECIFIED:
>+		len = sprintf(buf, "unspecified\n");
>+		break;
>+	case MEMORY_BLOCK_BOOT:
>+		len = sprintf(buf, "boot\n");
>+		break;
>+	default:
>+		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>+				mem->state);
>+		WARN_ON(1);
>+		break;
>+	}
>+
>+	return len;
>+}
>+
> #ifdef CONFIG_MEMORY_HOTREMOVE
> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
> 		unsigned long nr_pages, int online_type,
>@@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>+static DEVICE_ATTR_RO(type);

This is correct, while looks not consistent with other attributes.

Not that beautiful :-)

> 
> /*
>  * Block size attribute stuff
>@@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
> 	&dev_attr_state.attr,
> 	&dev_attr_phys_device.attr,
> 	&dev_attr_removable.attr,
>+	&dev_attr_type.attr,
> #ifdef CONFIG_MEMORY_HOTREMOVE
> 	&dev_attr_valid_zones.attr,
> #endif
>@@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
> }
> 
> static int init_memory_block(struct memory_block **memory,
>-			     struct mem_section *section, unsigned long state)
>+			     struct mem_section *section, unsigned long state,
>+			     int type)
> {
> 	struct memory_block *mem;
> 	unsigned long start_pfn;
> 	int scn_nr;
> 	int ret = 0;
> 
>+	if (type == MEMORY_BLOCK_NONE)
>+		return -EINVAL;

No one will pass in this value. Can we omit this check for now?

>+
> 	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> 	if (!mem)
> 		return -ENOMEM;
>@@ -675,6 +704,7 @@ static int init_memory_block(struct memory_block **memory,
> 	mem->state = state;
> 	start_pfn = section_nr_to_pfn(mem->start_section_nr);
> 	mem->phys_device = arch_get_memory_phys_device(start_pfn);
>+	mem->type = type;
> 
> 	ret = register_memory(mem);
> 
>@@ -699,7 +729,8 @@ static int add_memory_block(int base_section_nr)
> 
> 	if (section_count == 0)
> 		return 0;
>-	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE);
>+	ret = init_memory_block(&mem, __nr_to_section(section_nr), MEM_ONLINE,
>+				MEMORY_BLOCK_BOOT);
> 	if (ret)
> 		return ret;
> 	mem->section_count = section_count;
>@@ -722,7 +753,8 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 		mem->section_count++;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>+					MEMORY_BLOCK_UNSPECIFIED);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index d75ec88ca09d..06268e96e0da 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -34,12 +34,39 @@ struct memory_block {
> 	int (*phys_callback)(struct memory_block *);
> 	struct device dev;
> 	int nid;			/* NID for this memory block */
>+	int type;			/* type of this memory block */
> };
> 
> int arch_get_memory_phys_device(unsigned long start_pfn);
> unsigned long memory_block_size_bytes(void);
> int set_memory_block_size_order(unsigned int order);
> 
>+/*
>+ * Memory block types allow user space to formulate rules if and how to
>+ * online memory blocks. The types are exposed to user space as text
>+ * strings in sysfs.
>+ *
>+ * MEMORY_BLOCK_NONE:
>+ *  No memory block is to be created (e.g. device memory). Not exposed to
>+ *  user space.
>+ *
>+ * MEMORY_BLOCK_UNSPECIFIED:
>+ *  The type of memory block was not further specified when adding the
>+ *  memory block.
>+ *
>+ * MEMORY_BLOCK_BOOT:
>+ *  This memory block was added during boot by the basic system. No
>+ *  specific device driver takes care of this memory block. This memory
>+ *  block type is onlined automatically by the kernel during boot and might
>+ *  later be managed by a different device driver, in which case the type
>+ *  might change.
>+ */
>+enum {
>+	MEMORY_BLOCK_NONE = 0,
>+	MEMORY_BLOCK_UNSPECIFIED,
>+	MEMORY_BLOCK_BOOT,
>+};
>+
> /* These states are exposed to userspace as text strings in sysfs */
> #define	MEM_ONLINE		(1<<0) /* exposed to userspace */
> #define	MEM_GOING_OFFLINE	(1<<1) /* exposed to userspace */
>-- 
>2.17.2
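
For illustration, a user-space sketch of how the type values map to the strings exposed through the sysfs "type" attribute, modelled on type_show() from this patch. The helper name model_type_show() is hypothetical, and a caller-supplied buffer stands in for the sysfs page. Note that the posted default branch formats mem->state; this sketch prints the unknown type value itself.

```c
#include <stdio.h>
#include <string.h> /* strcmp() for checking the output */

enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/*
 * Hypothetical model of type_show(): write the sysfs text for a memory
 * block type into buf and return the number of characters written.
 */
static int model_type_show(int type, char *buf, size_t size)
{
	switch (type) {
	case MEMORY_BLOCK_UNSPECIFIED:
		return snprintf(buf, size, "unspecified\n");
	case MEMORY_BLOCK_BOOT:
		return snprintf(buf, size, "boot\n");
	default:
		/* MEMORY_BLOCK_NONE is never exposed; anything else is a bug. */
		return snprintf(buf, size, "ERROR-UNKNOWN-%d\n", type);
	}
}
```

For example, model_type_show(MEMORY_BLOCK_BOOT, buf, sizeof(buf)) leaves "boot\n" in buf and returns 5.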

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
  2018-11-30 17:59   ` David Hildenbrand
  (?)
@ 2018-12-01  1:50     ` Wei Yang
  -1 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rich Felker, linux-ia64, linux-sh,
	Peter Zijlstra, Benjamin Herrenschmidt, Dave Hansen,
	Heiko Carstens, Wei Yang, linux-mm, Michal Hocko, Paul Mackerras,
	H. Peter Anvin, Dan Williams, Rafael J. Wysocki, linux-s390,
	Dave Jiang, Yoshinori Sato, Michael Ellerman, x86,
	Matthew Wilcox, linux-acpi, Ingo Molnar, xen-devel, Rob Herring,
	Fenghua Yu

On Fri, Nov 30, 2018 at 06:59:20PM +0100, David Hildenbrand wrote:
>Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
>memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No
>functional change.

I would suggest adding a few more words to this.

"
Function arch_add_memory()'s last parameter *want_memblock* is used to
determine whether it is necessary to create a corresponding memory block
device. After introducing the memory block type, this patch replaces the
bool *want_memblock* with the memory block type, using the following rules
for now:

  * Pass "MEMORY_BLOCK_NONE" for device memory
  * Pass "MEMORY_BLOCK_UNSPECIFIED" for anything else

Since this parameter is passed down to __add_section(), every function it
passes through is affected. They are listed below.

  arch_add_memory()
    add_pages()
      __add_pages()
        __add_section()

"
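
The call chain in the suggested text can be illustrated with a small user-space model. All model_* functions are hypothetical stand-ins for the kernel functions of similar name, and a section count replaces the real PFN range. Each level only forwards the type; __add_section() is where MEMORY_BLOCK_NONE suppresses creation of a memory block device. This follows the chain as listed (the x86-64 path); the other architectures in this patch call __add_pages() directly from arch_add_memory().

```c
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Number of registration calls the model has made. */
static int registered_blocks;

/* Stand-in for hotplug_memory_register(): counts every call. */
static int model_hotplug_memory_register(int type)
{
	(void)type;
	registered_blocks++;
	return 0;
}

/* Stand-in for __add_section(): MEMORY_BLOCK_NONE skips registration. */
static int model_add_section(int type)
{
	if (type == MEMORY_BLOCK_NONE)
		return 0;
	return model_hotplug_memory_register(type);
}

/*
 * Stand-in for __add_pages(): one section call per section in the range.
 * Simplification: the real loop tolerates -EEXIST; this one stops on any
 * error.
 */
static int model_do_add_pages(unsigned long nr_sections, int type)
{
	unsigned long i;
	int err;

	for (i = 0; i < nr_sections; i++) {
		err = model_add_section(type);
		if (err)
			return err;
	}
	return 0;
}

/* Stand-ins for add_pages() and arch_add_memory(): they only forward. */
static int model_add_pages(unsigned long nr_sections, int type)
{
	return model_do_add_pages(nr_sections, type);
}

static int model_arch_add_memory(unsigned long nr_sections, int type)
{
	return model_add_pages(nr_sections, type);
}
```

Adding four sections with MEMORY_BLOCK_UNSPECIFIED performs four registrations; the same call with MEMORY_BLOCK_NONE performs none.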

>
>Cc: Tony Luck <tony.luck@intel.com>
>Cc: Fenghua Yu <fenghua.yu@intel.com>
>Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>Cc: Paul Mackerras <paulus@samba.org>
>Cc: Michael Ellerman <mpe@ellerman.id.au>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
>Cc: Rich Felker <dalias@libc.org>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: "H. Peter Anvin" <hpa@zytor.com>
>Cc: x86@kernel.org
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Nicholas Piggin <npiggin@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>Cc: "Jonathan Neuschäfer" <j.neuschaefer@gmx.net>
>Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
>Cc: Vasily Gorbik <gor@linux.ibm.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Rob Herring <robh@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Logan Gunthorpe <logang@deltatee.com>
>Cc: "Jérôme Glisse" <jglisse@redhat.com>
>Cc: "Jan H. Schönherr" <jschoenh@amazon.de>
>Cc: Dave Jiang <dave.jiang@intel.com>
>Cc: Matthew Wilcox <willy@infradead.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> arch/ia64/mm/init.c            |  4 ++--
> arch/powerpc/mm/mem.c          |  4 ++--
> arch/s390/mm/init.c            |  4 ++--
> arch/sh/mm/init.c              |  4 ++--
> arch/x86/mm/init_32.c          |  4 ++--
> arch/x86/mm/init_64.c          |  8 ++++----
> drivers/base/memory.c          | 11 +++++++----
> include/linux/memory.h         |  2 +-
> include/linux/memory_hotplug.h | 12 ++++++------
> kernel/memremap.c              |  6 ++++--
> mm/memory_hotplug.c            | 16 ++++++++--------
> 11 files changed, 40 insertions(+), 35 deletions(-)
>
>diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
>index 904fe55e10fc..408635d2902f 100644
>--- a/arch/ia64/mm/init.c
>+++ b/arch/ia64/mm/init.c
>@@ -646,13 +646,13 @@ mem_init (void)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int ret;
> 
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	if (ret)
> 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> 		       __func__,  ret);
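
A worked example of the address-to-PFN conversion each arch_add_memory() performs, assuming 4 KiB pages (PAGE_SHIFT = 12). PFN_DOWN(), used by the s390 and sh versions, rounds down the same way, so for page-aligned start and size both forms agree. The M_* macros and m_bytes_to_pfns() are hypothetical local stand-ins.

```c
/* Assumption: 4 KiB pages, as on x86-64. */
#define M_PAGE_SHIFT 12
/* Same rounding as the kernel's PFN_DOWN(): drop the in-page offset. */
#define M_PFN_DOWN(x) ((unsigned long long)(x) >> M_PAGE_SHIFT)

struct m_pfn_range {
	unsigned long long start_pfn;
	unsigned long long nr_pages;
};

/* Mirrors "start >> PAGE_SHIFT" and "size >> PAGE_SHIFT". */
static struct m_pfn_range m_bytes_to_pfns(unsigned long long start,
					  unsigned long long size)
{
	struct m_pfn_range r;

	r.start_pfn = start >> M_PAGE_SHIFT;
	r.nr_pages = size >> M_PAGE_SHIFT;
	return r;
}
```

For example, a 128 MiB DIMM at 4 GiB (start = 0x100000000) gives start_pfn = 0x100000 and nr_pages = 32768.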
>diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>index b3c9ee5c4f78..e394637da270 100644
>--- a/arch/powerpc/mm/mem.c
>+++ b/arch/powerpc/mm/mem.c
>@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
> }
> 
> int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+			      int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
>@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
> 	}
> 	flush_inval_dcache_range(start, start + size);
> 
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>index 3e82f66d5c61..ba2c56328e6d 100644
>--- a/arch/s390/mm/init.c
>+++ b/arch/s390/mm/init.c
>@@ -225,7 +225,7 @@ device_initcall(s390_cma_mem_init);
> #endif /* CONFIG_CMA */
> 
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = PFN_DOWN(start);
> 	unsigned long size_pages = PFN_DOWN(size);
>@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
> 	if (rc)
> 		return rc;
> 
>-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
>+	rc = __add_pages(nid, start_pfn, size_pages, altmap, type);
> 	if (rc)
> 		vmem_remove_mapping(start, size);
> 	return rc;
>diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
>index 1a483a008872..5fbb8724e0f2 100644
>--- a/arch/sh/mm/init.c
>+++ b/arch/sh/mm/init.c
>@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = PFN_DOWN(start);
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int ret;
> 
> 	/* We only have ZONE_NORMAL, so this is easy.. */
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	if (unlikely(ret))
> 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
> 
>diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
>index 0b8c7b0033d2..41e409b29d2b 100644
>--- a/arch/x86/mm/init_32.c
>+++ b/arch/x86/mm/init_32.c
>@@ -851,12 +851,12 @@ void __init mem_init(void)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>index f80d98381a97..5b4f3dcd44cf 100644
>--- a/arch/x86/mm/init_64.c
>+++ b/arch/x86/mm/init_64.c
>@@ -783,11 +783,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
> }
> 
> int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+	      struct vmem_altmap *altmap, int type)
> {
> 	int ret;
> 
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	WARN_ON_ONCE(ret);
> 
> 	/* update max_pfn, max_low_pfn and high_memory */
>@@ -798,14 +798,14 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> }
> 
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
> 	init_memory_mapping(start, start + size);
> 
>-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #define PAGE_INUSE 0xFD
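
The x86-64 add_pages() above follows __add_pages() with an update of max_pfn, max_low_pfn and high_memory. A sketch of that bookkeeping, assuming update_end_of_memory_vars() only ever raises the limits and rounds the end of the new range up to a page frame; high_memory is omitted, and the m_* names are hypothetical stand-ins for the kernel globals and helper.

```c
#define M_PAGE_SHIFT 12ULL
/* Round a byte address up to the next page frame, like PFN_UP(). */
#define M_PFN_UP(x) (((x) + (1ULL << M_PAGE_SHIFT) - 1) >> M_PAGE_SHIFT)

/* Hypothetical stand-ins for max_pfn and max_low_pfn. */
static unsigned long long m_max_pfn, m_max_low_pfn;

/* Only grow the limits when the new range ends above the current end. */
static void m_update_end_of_memory_vars(unsigned long long start,
					unsigned long long size)
{
	unsigned long long end_pfn = M_PFN_UP(start + size);

	if (end_pfn > m_max_pfn) {
		m_max_pfn = end_pfn;
		m_max_low_pfn = end_pfn;
	}
}
```

Hot-adding 128 MiB at 4 GiB on a machine whose memory ended at 4 GiB moves max_pfn from 0x100000 to 0x108000; adding a range below the current end leaves it unchanged.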
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 17f2985c07c5..c42300082c88 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -741,7 +741,7 @@ static int add_memory_block(int base_section_nr)
>  * need an interface for the VM to add new memory regions,
>  * but without onlining it.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(int nid, struct mem_section *section, int type)
> {
> 	int ret = 0;
> 	struct memory_block *mem;
>@@ -750,11 +750,14 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 
> 	mem = find_memory_block(section);
> 	if (mem) {
>-		mem->section_count++;
>+		/* make sure the type matches */
>+		if (mem->type == type)
>+			mem->section_count++;
>+		else
>+			ret = -EINVAL;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>-					MEMORY_BLOCK_UNSPECIFIED);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE, type);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
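
The new type check in hotplug_memory_register() can be modelled in user space as follows. This sketch assumes a hypothetical fixed-size table indexed by block number in place of find_memory_block() and the device model (in the kernel, the block is derived from the section number). A section that falls into an existing block is only accounted to it if the types agree; otherwise the call fails with -EINVAL. A section with no block yet creates one of the requested type.

```c
#include <errno.h>

enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

#define M_MAX_BLOCKS 16

/* Hypothetical per-block bookkeeping; type NONE marks an unused slot. */
struct m_block {
	int type;
	unsigned long section_count;
};

static struct m_block m_blocks[M_MAX_BLOCKS];

/* Models hotplug_memory_register() for the block with index block_id. */
static int m_hotplug_memory_register(unsigned int block_id, int type)
{
	struct m_block *mem;

	if (block_id >= M_MAX_BLOCKS || type == MEMORY_BLOCK_NONE)
		return -EINVAL;
	mem = &m_blocks[block_id];
	if (mem->type != MEMORY_BLOCK_NONE) {
		/* Existing block: make sure the type matches. */
		if (mem->type != type)
			return -EINVAL;
		mem->section_count++;
		return 0;
	}
	/* No block yet: create one with the requested type. */
	mem->type = type;
	mem->section_count = 1;
	return 0;
}
```

Registering two UNSPECIFIED sections into the same block succeeds and counts both; a third registration for that block with MEMORY_BLOCK_BOOT is refused and leaves the count untouched.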
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index 06268e96e0da..9f39ef41e6d2 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -138,7 +138,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
> extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>-int hotplug_memory_register(int nid, struct mem_section *section);
>+int hotplug_memory_register(int nid, struct mem_section *section, int type);
> #ifdef CONFIG_MEMORY_HOTREMOVE
> extern int unregister_memory_section(int nid, struct mem_section *);
> #endif
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 5493d3fa0c7f..667a37aa9a3c 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -117,18 +117,18 @@ extern void shrink_zone(struct zone *zone, unsigned long start_pfn,
> 
> /* reasonably generic interface to expand the physical pages */
> extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+		       struct vmem_altmap *altmap, int type);
> 
> #ifndef CONFIG_ARCH_HAS_ADD_PAGES
> static inline int add_pages(int nid, unsigned long start_pfn,
>-		unsigned long nr_pages, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+			    unsigned long nr_pages, struct vmem_altmap *altmap,
>+			    int type)
> {
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> #else /* ARCH_HAS_ADD_PAGES */
> int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+	      struct vmem_altmap *altmap, int type);
> #endif /* ARCH_HAS_ADD_PAGES */
> 
> #ifdef CONFIG_NUMA
>@@ -330,7 +330,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
> extern int add_memory(int nid, u64 start, u64 size);
> extern int add_memory_resource(int nid, struct resource *resource);
> extern int arch_add_memory(int nid, u64 start, u64 size,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+			   struct vmem_altmap *altmap, int type);
> extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> 		unsigned long nr_pages, struct vmem_altmap *altmap);
> extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>diff --git a/kernel/memremap.c b/kernel/memremap.c
>index 66cbf334203b..422e4e779208 100644
>--- a/kernel/memremap.c
>+++ b/kernel/memremap.c
>@@ -4,6 +4,7 @@
> #include <linux/io.h>
> #include <linux/kasan.h>
> #include <linux/memory_hotplug.h>
>+#include <linux/memory.h>
> #include <linux/mm.h>
> #include <linux/pfn_t.h>
> #include <linux/swap.h>
>@@ -215,7 +216,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> 	 */
> 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> 		error = add_pages(nid, align_start >> PAGE_SHIFT,
>-				align_size >> PAGE_SHIFT, NULL, false);
>+				  align_size >> PAGE_SHIFT, NULL,
>+				  MEMORY_BLOCK_NONE);
> 	} else {
> 		error = kasan_add_zero_shadow(__va(align_start), align_size);
> 		if (error) {
>@@ -224,7 +226,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> 		}
> 
> 		error = arch_add_memory(nid, align_start, align_size, altmap,
>-				false);
>+					MEMORY_BLOCK_NONE);

OK, so MEMORY_BLOCK_NONE is used here.

> 	}
> 
> 	if (!error) {
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 16c600771298..7246faa44488 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -246,7 +246,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> 
> static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+				   struct vmem_altmap *altmap, int type)
> {
> 	int ret;
> 
>@@ -257,10 +257,11 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> 	if (ret < 0)
> 		return ret;
> 
>-	if (!want_memblock)
>+	if (type == MEMORY_BLOCK_NONE)
> 		return 0;
> 
>-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
>+	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn),
>+				       type);
> }
> 
> /*
>@@ -270,8 +271,8 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>  * add the new pages.
>  */
> int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>-		unsigned long nr_pages, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		      unsigned long nr_pages, struct vmem_altmap *altmap,
>+		      int type)
> {
> 	unsigned long i;
> 	int err = 0;
>@@ -295,8 +296,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> 	}
> 
> 	for (i = start_sec; i <= end_sec; i++) {
>-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
>-				want_memblock);
>+		err = __add_section(nid, section_nr_to_pfn(i), altmap, type);
> 
> 		/*
> 		 * EEXIST is finally dealt with by ioresource collision
>@@ -1100,7 +1100,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
> 	new_node = ret;
> 
> 	/* call arch's memory hotadd */
>-	ret = arch_add_memory(nid, start, size, NULL, true);
>+	ret = arch_add_memory(nid, start, size, NULL, MEMORY_BLOCK_UNSPECIFIED);
> 	if (ret < 0)
> 		goto error;
> 
>-- 
>2.17.2
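
To see how many __add_section() calls, and therefore how many per-section type checks, a single hot-add causes, here is a sketch of the section range computed in __add_pages(). It assumes the x86-64 defaults of 4 KiB pages and 128 MiB sections (SECTION_SIZE_BITS = 27); m_pfn_to_section_nr() is a hypothetical stand-in for pfn_to_section_nr().

```c
/* Assumptions: PAGE_SHIFT = 12, SECTION_SIZE_BITS = 27 (x86-64 defaults). */
#define M_PAGE_SHIFT        12
#define M_SECTION_SIZE_BITS 27
#define M_PFN_SECTION_SHIFT (M_SECTION_SIZE_BITS - M_PAGE_SHIFT)

/* Like pfn_to_section_nr(): the memory section a page frame belongs to. */
static unsigned long m_pfn_to_section_nr(unsigned long pfn)
{
	return pfn >> M_PFN_SECTION_SHIFT;
}

/* First and last section touched by [start_pfn, start_pfn + nr_pages). */
static void m_section_range(unsigned long start_pfn, unsigned long nr_pages,
			    unsigned long *start_sec, unsigned long *end_sec)
{
	*start_sec = m_pfn_to_section_nr(start_pfn);
	*end_sec = m_pfn_to_section_nr(start_pfn + nr_pages - 1);
}
```

So a 256 MiB DIMM at 4 GiB (start_pfn = 0x100000, nr_pages = 65536) spans sections 32 and 33, and __add_section() runs twice with the same type.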

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
@ 2018-12-01  1:50     ` Wei Yang
  0 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Tony Luck,
	Fenghua Yu, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, Martin Schwidefsky, Heiko Carstens,
	Yoshinori Sato, Rich Felker, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Morton, Mike Rapoport, Michal Hocko, Dan Williams,
	Kirill A. Shutemov, Oscar Salvador, Nicholas Piggin,
	Stephen Rothwell, Christophe Leroy, Jonathan Neuschäfer,
	Mauricio Faria de Oliveira, Vasily Gorbik, Arun KS, Rob Herring,
	Pavel Tatashin, mike.travis, Joonsoo Kim, Wei Yang,
	Logan Gunthorpe, Jérôme Glisse, Jan H. Schönherr, Dave Jiang,
	Matthew Wilcox, Mathieu Malaterre

On Fri, Nov 30, 2018 at 06:59:20PM +0100, David Hildenbrand wrote:
>Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
>memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No
>functional change.

I would suggest adding a few more words to this.

"
Function arch_add_memory()'s last parameter *want_memblock* is used to
determine whether it is necessary to create a corresponding memory block
device. After introducing the memory block type, this patch replaces the
bool *want_memblock* with the memory block type, using the following rules
for now:

  * Pass "MEMORY_BLOCK_NONE" for device memory
  * Pass "MEMORY_BLOCK_UNSPECIFIED" for anything else

Since this parameter is passed down to __add_section(), every function it
passes through is affected. They are listed below.

  arch_add_memory()
    add_pages()
      __add_pages()
        __add_section()

"

>
>Cc: Tony Luck <tony.luck@intel.com>
>Cc: Fenghua Yu <fenghua.yu@intel.com>
>Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>Cc: Paul Mackerras <paulus@samba.org>
>Cc: Michael Ellerman <mpe@ellerman.id.au>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
>Cc: Rich Felker <dalias@libc.org>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: "H. Peter Anvin" <hpa@zytor.com>
>Cc: x86@kernel.org
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Nicholas Piggin <npiggin@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>Cc: "Jonathan Neuschäfer" <j.neuschaefer@gmx.net>
>Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
>Cc: Vasily Gorbik <gor@linux.ibm.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Rob Herring <robh@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Logan Gunthorpe <logang@deltatee.com>
>Cc: "Jérôme Glisse" <jglisse@redhat.com>
>Cc: "Jan H. Schönherr" <jschoenh@amazon.de>
>Cc: Dave Jiang <dave.jiang@intel.com>
>Cc: Matthew Wilcox <willy@infradead.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> arch/ia64/mm/init.c            |  4 ++--
> arch/powerpc/mm/mem.c          |  4 ++--
> arch/s390/mm/init.c            |  4 ++--
> arch/sh/mm/init.c              |  4 ++--
> arch/x86/mm/init_32.c          |  4 ++--
> arch/x86/mm/init_64.c          |  8 ++++----
> drivers/base/memory.c          | 11 +++++++----
> include/linux/memory.h         |  2 +-
> include/linux/memory_hotplug.h | 12 ++++++------
> kernel/memremap.c              |  6 ++++--
> mm/memory_hotplug.c            | 16 ++++++++--------
> 11 files changed, 40 insertions(+), 35 deletions(-)
>
>diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
>index 904fe55e10fc..408635d2902f 100644
>--- a/arch/ia64/mm/init.c
>+++ b/arch/ia64/mm/init.c
>@@ -646,13 +646,13 @@ mem_init (void)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int ret;
> 
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	if (ret)
> 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> 		       __func__,  ret);
>diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>index b3c9ee5c4f78..e394637da270 100644
>--- a/arch/powerpc/mm/mem.c
>+++ b/arch/powerpc/mm/mem.c
>@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
> }
> 
> int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+			      int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
>@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
> 	}
> 	flush_inval_dcache_range(start, start + size);
> 
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>index 3e82f66d5c61..ba2c56328e6d 100644
>--- a/arch/s390/mm/init.c
>+++ b/arch/s390/mm/init.c
>@@ -225,7 +225,7 @@ device_initcall(s390_cma_mem_init);
> #endif /* CONFIG_CMA */
> 
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = PFN_DOWN(start);
> 	unsigned long size_pages = PFN_DOWN(size);
>@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
> 	if (rc)
> 		return rc;
> 
>-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
>+	rc = __add_pages(nid, start_pfn, size_pages, altmap, type);
> 	if (rc)
> 		vmem_remove_mapping(start, size);
> 	return rc;
>diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
>index 1a483a008872..5fbb8724e0f2 100644
>--- a/arch/sh/mm/init.c
>+++ b/arch/sh/mm/init.c
>@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = PFN_DOWN(start);
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int ret;
> 
> 	/* We only have ZONE_NORMAL, so this is easy.. */
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	if (unlikely(ret))
> 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
> 
>diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
>index 0b8c7b0033d2..41e409b29d2b 100644
>--- a/arch/x86/mm/init_32.c
>+++ b/arch/x86/mm/init_32.c
>@@ -851,12 +851,12 @@ void __init mem_init(void)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>index f80d98381a97..5b4f3dcd44cf 100644
>--- a/arch/x86/mm/init_64.c
>+++ b/arch/x86/mm/init_64.c
>@@ -783,11 +783,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
> }
> 
> int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+	      struct vmem_altmap *altmap, int type)
> {
> 	int ret;
> 
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	WARN_ON_ONCE(ret);
> 
> 	/* update max_pfn, max_low_pfn and high_memory */
>@@ -798,14 +798,14 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> }
> 
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
> 	init_memory_mapping(start, start + size);
> 
>-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #define PAGE_INUSE 0xFD
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 17f2985c07c5..c42300082c88 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -741,7 +741,7 @@ static int add_memory_block(int base_section_nr)
>  * need an interface for the VM to add new memory regions,
>  * but without onlining it.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(int nid, struct mem_section *section, int type)
> {
> 	int ret = 0;
> 	struct memory_block *mem;
>@@ -750,11 +750,14 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 
> 	mem = find_memory_block(section);
> 	if (mem) {
>-		mem->section_count++;
>+		/* make sure the type matches */
>+		if (mem->type == type)
>+			mem->section_count++;
>+		else
>+			ret = -EINVAL;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>-					MEMORY_BLOCK_UNSPECIFIED);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE, type);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index 06268e96e0da..9f39ef41e6d2 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -138,7 +138,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
> extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>-int hotplug_memory_register(int nid, struct mem_section *section);
>+int hotplug_memory_register(int nid, struct mem_section *section, int type);
> #ifdef CONFIG_MEMORY_HOTREMOVE
> extern int unregister_memory_section(int nid, struct mem_section *);
> #endif
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 5493d3fa0c7f..667a37aa9a3c 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -117,18 +117,18 @@ extern void shrink_zone(struct zone *zone, unsigned long start_pfn,
> 
> /* reasonably generic interface to expand the physical pages */
> extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+		       struct vmem_altmap *altmap, int type);
> 
> #ifndef CONFIG_ARCH_HAS_ADD_PAGES
> static inline int add_pages(int nid, unsigned long start_pfn,
>-		unsigned long nr_pages, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+			    unsigned long nr_pages, struct vmem_altmap *altmap,
>+			    int type)
> {
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> #else /* ARCH_HAS_ADD_PAGES */
> int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+	      struct vmem_altmap *altmap, int type);
> #endif /* ARCH_HAS_ADD_PAGES */
> 
> #ifdef CONFIG_NUMA
>@@ -330,7 +330,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
> extern int add_memory(int nid, u64 start, u64 size);
> extern int add_memory_resource(int nid, struct resource *resource);
> extern int arch_add_memory(int nid, u64 start, u64 size,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+			   struct vmem_altmap *altmap, int type);
> extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> 		unsigned long nr_pages, struct vmem_altmap *altmap);
> extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>diff --git a/kernel/memremap.c b/kernel/memremap.c
>index 66cbf334203b..422e4e779208 100644
>--- a/kernel/memremap.c
>+++ b/kernel/memremap.c
>@@ -4,6 +4,7 @@
> #include <linux/io.h>
> #include <linux/kasan.h>
> #include <linux/memory_hotplug.h>
>+#include <linux/memory.h>
> #include <linux/mm.h>
> #include <linux/pfn_t.h>
> #include <linux/swap.h>
>@@ -215,7 +216,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> 	 */
> 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> 		error = add_pages(nid, align_start >> PAGE_SHIFT,
>-				align_size >> PAGE_SHIFT, NULL, false);
>+				  align_size >> PAGE_SHIFT, NULL,
>+				  MEMORY_BLOCK_NONE);
> 	} else {
> 		error = kasan_add_zero_shadow(__va(align_start), align_size);
> 		if (error) {
>@@ -224,7 +226,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> 		}
> 
> 		error = arch_add_memory(nid, align_start, align_size, altmap,
>-				false);
>+					MEMORY_BLOCK_NONE);

Ok, it is used here.

> 	}
> 
> 	if (!error) {
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 16c600771298..7246faa44488 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -246,7 +246,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> 
> static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+				   struct vmem_altmap *altmap, int type)
> {
> 	int ret;
> 
>@@ -257,10 +257,11 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> 	if (ret < 0)
> 		return ret;
> 
>-	if (!want_memblock)
>+	if (type == MEMORY_BLOCK_NONE)
> 		return 0;
> 
>-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
>+	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn),
>+				       type);
> }
> 
> /*
>@@ -270,8 +271,8 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>  * add the new pages.
>  */
> int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>-		unsigned long nr_pages, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		      unsigned long nr_pages, struct vmem_altmap *altmap,
>+		      int type)
> {
> 	unsigned long i;
> 	int err = 0;
>@@ -295,8 +296,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> 	}
> 
> 	for (i = start_sec; i <= end_sec; i++) {
>-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
>-				want_memblock);
>+		err = __add_section(nid, section_nr_to_pfn(i), altmap, type);
> 
> 		/*
> 		 * EEXIST is finally dealt with by ioresource collision
>@@ -1100,7 +1100,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
> 	new_node = ret;
> 
> 	/* call arch's memory hotadd */
>-	ret = arch_add_memory(nid, start, size, NULL, true);
>+	ret = arch_add_memory(nid, start, size, NULL, MEMORY_TYPE_UNSPECIFIED);
> 	if (ret < 0)
> 		goto error;
> 
>-- 
>2.17.2

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
  2018-11-30 17:59   ` David Hildenbrand
  (?)
  (?)
@ 2018-12-01  1:50   ` Wei Yang
  -1 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-01  1:50 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rich Felker, linux-ia64, linux-sh,
	Peter Zijlstra, Benjamin Herrenschmidt, Dave Hansen,
	Heiko Carstens, Wei Yang, linux-mm, Michal Hocko, Paul Mackerras,
	H. Peter Anvin, Dan Williams, Rafael J. Wysocki, linux-s390,
	Dave Jiang, Yoshinori Sato, Michael Ellerman, x86,
	Matthew Wilcox, linux-acpi, Ingo Molnar, xen-devel, Rob Herring,
	Fenghua Yu

On Fri, Nov 30, 2018 at 06:59:20PM +0100, David Hildenbrand wrote:
>Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
>memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No
>functional change.

I would suggest adding more detail to this.

"
Function arch_add_memory()'s last parameter *want_memblock* is used to
determine whether it is necessary to create a corresponding memory block
device. After introducing the memory block type, this patch replaces the
bool *want_memblock* with the memory block type, using the following rules
for now:

  * Pass "MEMORY_BLOCK_NONE" for device memory
  * Pass "MEMORY_BLOCK_UNSPECIFIED" for anything else 

Since this parameter is passed all the way down to __add_section(), all of
its descendants are affected. These descendants are listed below:

  arch_add_memory()
    add_pages()
      __add_pages()
        __add_section()

"

>
>Cc: Tony Luck <tony.luck@intel.com>
>Cc: Fenghua Yu <fenghua.yu@intel.com>
>Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>Cc: Paul Mackerras <paulus@samba.org>
>Cc: Michael Ellerman <mpe@ellerman.id.au>
>Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
>Cc: Rich Felker <dalias@libc.org>
>Cc: Dave Hansen <dave.hansen@linux.intel.com>
>Cc: Andy Lutomirski <luto@kernel.org>
>Cc: Peter Zijlstra <peterz@infradead.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: "H. Peter Anvin" <hpa@zytor.com>
>Cc: x86@kernel.org
>Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>Cc: Andrew Morton <akpm@linux-foundation.org>
>Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>Cc: Michal Hocko <mhocko@suse.com>
>Cc: Dan Williams <dan.j.williams@intel.com>
>Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>Cc: Oscar Salvador <osalvador@suse.com>
>Cc: Nicholas Piggin <npiggin@gmail.com>
>Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>Cc: Christophe Leroy <christophe.leroy@c-s.fr>
>Cc: "Jonathan Neuschäfer" <j.neuschaefer@gmx.net>
>Cc: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
>Cc: Vasily Gorbik <gor@linux.ibm.com>
>Cc: Arun KS <arunks@codeaurora.org>
>Cc: Rob Herring <robh@kernel.org>
>Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>Cc: Wei Yang <richard.weiyang@gmail.com>
>Cc: Logan Gunthorpe <logang@deltatee.com>
>Cc: "Jérôme Glisse" <jglisse@redhat.com>
>Cc: "Jan H. Schönherr" <jschoenh@amazon.de>
>Cc: Dave Jiang <dave.jiang@intel.com>
>Cc: Matthew Wilcox <willy@infradead.org>
>Cc: Mathieu Malaterre <malat@debian.org>
>Signed-off-by: David Hildenbrand <david@redhat.com>
>---
> arch/ia64/mm/init.c            |  4 ++--
> arch/powerpc/mm/mem.c          |  4 ++--
> arch/s390/mm/init.c            |  4 ++--
> arch/sh/mm/init.c              |  4 ++--
> arch/x86/mm/init_32.c          |  4 ++--
> arch/x86/mm/init_64.c          |  8 ++++----
> drivers/base/memory.c          | 11 +++++++----
> include/linux/memory.h         |  2 +-
> include/linux/memory_hotplug.h | 12 ++++++------
> kernel/memremap.c              |  6 ++++--
> mm/memory_hotplug.c            | 16 ++++++++--------
> 11 files changed, 40 insertions(+), 35 deletions(-)
>
>diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
>index 904fe55e10fc..408635d2902f 100644
>--- a/arch/ia64/mm/init.c
>+++ b/arch/ia64/mm/init.c
>@@ -646,13 +646,13 @@ mem_init (void)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int ret;
> 
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	if (ret)
> 		printk("%s: Problem encountered in __add_pages() as ret=%d\n",
> 		       __func__,  ret);
>diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>index b3c9ee5c4f78..e394637da270 100644
>--- a/arch/powerpc/mm/mem.c
>+++ b/arch/powerpc/mm/mem.c
>@@ -118,7 +118,7 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end)
> }
> 
> int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+			      int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
>@@ -135,7 +135,7 @@ int __meminit arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *
> 	}
> 	flush_inval_dcache_range(start, start + size);
> 
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>index 3e82f66d5c61..ba2c56328e6d 100644
>--- a/arch/s390/mm/init.c
>+++ b/arch/s390/mm/init.c
>@@ -225,7 +225,7 @@ device_initcall(s390_cma_mem_init);
> #endif /* CONFIG_CMA */
> 
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = PFN_DOWN(start);
> 	unsigned long size_pages = PFN_DOWN(size);
>@@ -235,7 +235,7 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
> 	if (rc)
> 		return rc;
> 
>-	rc = __add_pages(nid, start_pfn, size_pages, altmap, want_memblock);
>+	rc = __add_pages(nid, start_pfn, size_pages, altmap, type);
> 	if (rc)
> 		vmem_remove_mapping(start, size);
> 	return rc;
>diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
>index 1a483a008872..5fbb8724e0f2 100644
>--- a/arch/sh/mm/init.c
>+++ b/arch/sh/mm/init.c
>@@ -419,14 +419,14 @@ void free_initrd_mem(unsigned long start, unsigned long end)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = PFN_DOWN(start);
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 	int ret;
> 
> 	/* We only have ZONE_NORMAL, so this is easy.. */
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	if (unlikely(ret))
> 		printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
> 
>diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
>index 0b8c7b0033d2..41e409b29d2b 100644
>--- a/arch/x86/mm/init_32.c
>+++ b/arch/x86/mm/init_32.c
>@@ -851,12 +851,12 @@ void __init mem_init(void)
> 
> #ifdef CONFIG_MEMORY_HOTPLUG
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #ifdef CONFIG_MEMORY_HOTREMOVE
>diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>index f80d98381a97..5b4f3dcd44cf 100644
>--- a/arch/x86/mm/init_64.c
>+++ b/arch/x86/mm/init_64.c
>@@ -783,11 +783,11 @@ static void update_end_of_memory_vars(u64 start, u64 size)
> }
> 
> int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+	      struct vmem_altmap *altmap, int type)
> {
> 	int ret;
> 
>-	ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	ret = __add_pages(nid, start_pfn, nr_pages, altmap, type);
> 	WARN_ON_ONCE(ret);
> 
> 	/* update max_pfn, max_low_pfn and high_memory */
>@@ -798,14 +798,14 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
> }
> 
> int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		    int type)
> {
> 	unsigned long start_pfn = start >> PAGE_SHIFT;
> 	unsigned long nr_pages = size >> PAGE_SHIFT;
> 
> 	init_memory_mapping(start, start + size);
> 
>-	return add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> 
> #define PAGE_INUSE 0xFD
>diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>index 17f2985c07c5..c42300082c88 100644
>--- a/drivers/base/memory.c
>+++ b/drivers/base/memory.c
>@@ -741,7 +741,7 @@ static int add_memory_block(int base_section_nr)
>  * need an interface for the VM to add new memory regions,
>  * but without onlining it.
>  */
>-int hotplug_memory_register(int nid, struct mem_section *section)
>+int hotplug_memory_register(int nid, struct mem_section *section, int type)
> {
> 	int ret = 0;
> 	struct memory_block *mem;
>@@ -750,11 +750,14 @@ int hotplug_memory_register(int nid, struct mem_section *section)
> 
> 	mem = find_memory_block(section);
> 	if (mem) {
>-		mem->section_count++;
>+		/* make sure the type matches */
>+		if (mem->type == type)
>+			mem->section_count++;
>+		else
>+			ret = -EINVAL;
> 		put_device(&mem->dev);
> 	} else {
>-		ret = init_memory_block(&mem, section, MEM_OFFLINE,
>-					MEMORY_BLOCK_UNSPECIFIED);
>+		ret = init_memory_block(&mem, section, MEM_OFFLINE, type);
> 		if (ret)
> 			goto out;
> 		mem->section_count++;
>diff --git a/include/linux/memory.h b/include/linux/memory.h
>index 06268e96e0da..9f39ef41e6d2 100644
>--- a/include/linux/memory.h
>+++ b/include/linux/memory.h
>@@ -138,7 +138,7 @@ extern int register_memory_notifier(struct notifier_block *nb);
> extern void unregister_memory_notifier(struct notifier_block *nb);
> extern int register_memory_isolate_notifier(struct notifier_block *nb);
> extern void unregister_memory_isolate_notifier(struct notifier_block *nb);
>-int hotplug_memory_register(int nid, struct mem_section *section);
>+int hotplug_memory_register(int nid, struct mem_section *section, int type);
> #ifdef CONFIG_MEMORY_HOTREMOVE
> extern int unregister_memory_section(int nid, struct mem_section *);
> #endif
>diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
>index 5493d3fa0c7f..667a37aa9a3c 100644
>--- a/include/linux/memory_hotplug.h
>+++ b/include/linux/memory_hotplug.h
>@@ -117,18 +117,18 @@ extern void shrink_zone(struct zone *zone, unsigned long start_pfn,
> 
> /* reasonably generic interface to expand the physical pages */
> extern int __add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+		       struct vmem_altmap *altmap, int type);
> 
> #ifndef CONFIG_ARCH_HAS_ADD_PAGES
> static inline int add_pages(int nid, unsigned long start_pfn,
>-		unsigned long nr_pages, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+			    unsigned long nr_pages, struct vmem_altmap *altmap,
>+			    int type)
> {
>-	return __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
>+	return __add_pages(nid, start_pfn, nr_pages, altmap, type);
> }
> #else /* ARCH_HAS_ADD_PAGES */
> int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+	      struct vmem_altmap *altmap, int type);
> #endif /* ARCH_HAS_ADD_PAGES */
> 
> #ifdef CONFIG_NUMA
>@@ -330,7 +330,7 @@ extern int __add_memory(int nid, u64 start, u64 size);
> extern int add_memory(int nid, u64 start, u64 size);
> extern int add_memory_resource(int nid, struct resource *resource);
> extern int arch_add_memory(int nid, u64 start, u64 size,
>-		struct vmem_altmap *altmap, bool want_memblock);
>+			   struct vmem_altmap *altmap, int type);
> extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> 		unsigned long nr_pages, struct vmem_altmap *altmap);
> extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>diff --git a/kernel/memremap.c b/kernel/memremap.c
>index 66cbf334203b..422e4e779208 100644
>--- a/kernel/memremap.c
>+++ b/kernel/memremap.c
>@@ -4,6 +4,7 @@
> #include <linux/io.h>
> #include <linux/kasan.h>
> #include <linux/memory_hotplug.h>
>+#include <linux/memory.h>
> #include <linux/mm.h>
> #include <linux/pfn_t.h>
> #include <linux/swap.h>
>@@ -215,7 +216,8 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> 	 */
> 	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
> 		error = add_pages(nid, align_start >> PAGE_SHIFT,
>-				align_size >> PAGE_SHIFT, NULL, false);
>+				  align_size >> PAGE_SHIFT, NULL,
>+				  MEMORY_BLOCK_NONE);
> 	} else {
> 		error = kasan_add_zero_shadow(__va(align_start), align_size);
> 		if (error) {
>@@ -224,7 +226,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
> 		}
> 
> 		error = arch_add_memory(nid, align_start, align_size, altmap,
>-				false);
>+					MEMORY_BLOCK_NONE);

Ok, it is used here.

> 	}
> 
> 	if (!error) {
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index 16c600771298..7246faa44488 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -246,7 +246,7 @@ void __init register_page_bootmem_info_node(struct pglist_data *pgdat)
> #endif /* CONFIG_HAVE_BOOTMEM_INFO_NODE */
> 
> static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>-		struct vmem_altmap *altmap, bool want_memblock)
>+				   struct vmem_altmap *altmap, int type)
> {
> 	int ret;
> 
>@@ -257,10 +257,11 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
> 	if (ret < 0)
> 		return ret;
> 
>-	if (!want_memblock)
>+	if (type == MEMORY_BLOCK_NONE)
> 		return 0;
> 
>-	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn));
>+	return hotplug_memory_register(nid, __pfn_to_section(phys_start_pfn),
>+				       type);
> }
> 
> /*
>@@ -270,8 +271,8 @@ static int __meminit __add_section(int nid, unsigned long phys_start_pfn,
>  * add the new pages.
>  */
> int __ref __add_pages(int nid, unsigned long phys_start_pfn,
>-		unsigned long nr_pages, struct vmem_altmap *altmap,
>-		bool want_memblock)
>+		      unsigned long nr_pages, struct vmem_altmap *altmap,
>+		      int type)
> {
> 	unsigned long i;
> 	int err = 0;
>@@ -295,8 +296,7 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
> 	}
> 
> 	for (i = start_sec; i <= end_sec; i++) {
>-		err = __add_section(nid, section_nr_to_pfn(i), altmap,
>-				want_memblock);
>+		err = __add_section(nid, section_nr_to_pfn(i), altmap, type);
> 
> 		/*
> 		 * EEXIST is finally dealt with by ioresource collision
>@@ -1100,7 +1100,7 @@ int __ref add_memory_resource(int nid, struct resource *res)
> 	new_node = ret;
> 
> 	/* call arch's memory hotadd */
>-	ret = arch_add_memory(nid, start, size, NULL, true);
>+	ret = arch_add_memory(nid, start, size, NULL, MEMORY_BLOCK_UNSPECIFIED);
> 	if (ret < 0)
> 		goto error;
> 
>-- 
>2.17.2
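Here is a minimal user-space model of the section registration path after this change. It is only a sketch. `struct block`, `blocks[]`, `add_section()`, `register_section()`, `SECTIONS_PER_BLOCK` and the numeric enum values are hypothetical simplifications, not kernel code. A section added with `MEMORY_BLOCK_NONE` (as `devm_memremap_pages()` does above) gets no memory block device at all. Any other type either creates a new block or must match the type of the block the section joins. Otherwise registration fails with `-EINVAL`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical values; the real enum is added to include/linux/memory.h. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

#define SECTIONS_PER_BLOCK 4	/* assumption: 4 sections per memory block */
#define MAX_BLOCKS 16

/* Simplified stand-in for struct memory_block. */
struct block {
	int present;
	int type;
	int section_count;
};

static struct block blocks[MAX_BLOCKS];

/* Models hotplug_memory_register(): create the block or join an existing one. */
static int register_section(unsigned long section_nr, int type)
{
	struct block *mem;

	assert(section_nr / SECTIONS_PER_BLOCK < MAX_BLOCKS);
	mem = &blocks[section_nr / SECTIONS_PER_BLOCK];

	if (mem->present) {
		/* A block must not mix sections of different types. */
		if (mem->type != type)
			return -EINVAL;
		mem->section_count++;
		return 0;
	}

	/* First section of this block: the block inherits its type. */
	mem->present = 1;
	mem->type = type;
	mem->section_count = 1;
	return 0;
}

/* Models __add_section(): MEMORY_BLOCK_NONE means "no memory block device". */
static int add_section(unsigned long section_nr, int type)
{
	if (type == MEMORY_BLOCK_NONE)
		return 0;
	return register_section(section_nr, type);
}
```

For example, adding sections 0 and 1 as `MEMORY_BLOCK_BOOT` yields one block with `section_count == 2`. A later `add_section(2, MEMORY_BLOCK_UNSPECIFIED)` into the same block returns `-EINVAL` and leaves the count unchanged.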

-- 
Wei Yang
Help you, Help me

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
  2018-12-01  1:25     ` Wei Yang
  (?)
  (?)
@ 2018-12-03 10:32       ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-03 10:32 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Pavel Tatashin,
	Stephen Rothwell, Andrew Banman, mike.travis, Oscar Salvador,
	Dave Hansen, Michal Hocko, Michal Suchánek, Vital

On 01.12.18 02:25, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>> Memory onlining should always be handled by user space, because only user
>> space knows which use cases it wants to satisfy. E.g. memory might be
>> onlined to the MOVABLE zone even if it can never be removed from the
>> system, e.g. to make usage of huge pages more reliable.
>>
>> However, to implement such rules (especially default rules in distributions),
>> user space needs more information about the memory that was added.
>>
>> E.g. on x86 we want to online memory provided by balloon devices (e.g.
>> XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>> block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>> the whole block). This might also become relevant for other architectures.
>>
>> Also, udev rules right now check if running on s390x and treat all added
>> memory blocks as standby memory (-> don't online automatically). As soon as
>> we support other memory hotplug mechanisms (e.g. virtio-mem), checks would
>> have to get more involved (e.g. also check if running under KVM) and would
>> eventually still be wrong (e.g. if KVM ever supports standby memory, we are doomed).
>>
>> I decided to allow specifying the type of memory that is being added
>> to the system. Let's start with two types, BOOT and UNSPECIFIED, to get the
>> basic infrastructure running. We'll introduce and use further types in
>> follow-up patches. For now, we temporarily classify any hotplugged memory
>> as UNSPECIFIED (which will eventually be dropped).
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Michal Suchánek <msuchanek@suse.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
>> include/linux/memory.h | 27 +++++++++++++++++++++++++++
>> 2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 0c290f86ab20..17f2985c07c5 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
>> 	return sprintf(buf, "%d\n", mem->phys_device);
>> }
>>
>> +static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>> +			 char *buf)
>> +{
>> +	struct memory_block *mem = to_memory_block(dev);
>> +	ssize_t len = 0;
>> +
>> +	switch (mem->type) {
>> +	case MEMORY_BLOCK_UNSPECIFIED:
>> +		len = sprintf(buf, "unspecified\n");
>> +		break;
>> +	case MEMORY_BLOCK_BOOT:
>> +		len = sprintf(buf, "boot\n");
>> +		break;
>> +	default:
>> +		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>> +				mem->state);
>> +		WARN_ON(1);
>> +		break;
>> +	}
>> +
>> +	return len;
>> +}
>> +
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>> 		unsigned long nr_pages, int online_type,
>> @@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
>> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
>> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
>> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>> +static DEVICE_ATTR_RO(type);
> 
> This is correct, but it looks inconsistent with the other attributes.
> 
> Not that beautiful :-)

I might change the other ones first, too (or keep this one consistent with
the existing ones). Thanks!
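On the consistency point, the two spellings produce the same kind of object. `DEVICE_ATTR_RO(type)` derives everything from the name. The variable becomes `dev_attr_type`, the sysfs name becomes `"type"`, the mode is 0444, and the callback must be called `type_show()`. `DEVICE_ATTR(name, mode, show, store)` takes these explicitly. The sketch below mimics that token pasting with simplified, hypothetical stand-ins (`struct sketch_attr`, `SKETCH_ATTR`, `SKETCH_ATTR_RO`) so it builds outside the kernel. It is not the real `<linux/device.h>` definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct device_attribute. */
struct sketch_attr {
	const char *name;
	unsigned int mode;
	int (*show)(char *buf, size_t len);
	int (*store)(const char *buf, size_t len);
};

/* Explicit form, modelled on DEVICE_ATTR(_name, _mode, _show, _store). */
#define SKETCH_ATTR(_name, _mode, _show, _store)		\
	struct sketch_attr dev_attr_##_name = {			\
		.name = #_name, .mode = (_mode),		\
		.show = (_show), .store = (_store),		\
	}

/* Read-only form, modelled on DEVICE_ATTR_RO(_name): mode 0444 and a
 * callback that must be named <name>_show. */
#define SKETCH_ATTR_RO(_name) SKETCH_ATTR(_name, 0444, _name##_show, NULL)

static int show_phys_device(char *buf, size_t len)
{
	return snprintf(buf, len, "%d\n", 0);
}

static int type_show(char *buf, size_t len)
{
	return snprintf(buf, len, "boot\n");
}

/* Existing style: everything spelled out. */
static SKETCH_ATTR(phys_device, 0444, show_phys_device, NULL);
/* New style: only compiles because type_show() exists. */
static SKETCH_ATTR_RO(type);

/* Both must end up read-only: a show callback and no store callback. */
static void check_read_only(const struct sketch_attr *attr)
{
	assert(attr->mode == 0444);
	assert(attr->show != NULL && attr->store == NULL);
}

static int attr_named(const struct sketch_attr *attr, const char *name)
{
	return strcmp(attr->name, name) == 0;
}
```

The practical difference is the naming contract. With the `_RO` form, renaming the callback (e.g. to `show_type()`, matching the existing `show_*` names) breaks the build.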

> 
>>
>> /*
>>  * Block size attribute stuff
>> @@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
>> 	&dev_attr_state.attr,
>> 	&dev_attr_phys_device.attr,
>> 	&dev_attr_removable.attr,
>> +	&dev_attr_type.attr,
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> 	&dev_attr_valid_zones.attr,
>> #endif
>> @@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
>> }
>>
>> static int init_memory_block(struct memory_block **memory,
>> -			     struct mem_section *section, unsigned long state)
>> +			     struct mem_section *section, unsigned long state,
>> +			     int type)
>> {
>> 	struct memory_block *mem;
>> 	unsigned long start_pfn;
>> 	int scn_nr;
>> 	int ret = 0;
>>
>> +	if (type == MEMORY_BLOCK_NONE)
>> +		return -EINVAL;
> 
> No one will pass in this value. Can we omit this check for now?

I could move it to patch nr 2 I guess, but as I introduce
MEMORY_BLOCK_NONE here it made sense to keep it in here.

(and I think at least for now it makes sense not to squash patches 1 and
2, to make it easier to discuss the new user interface/concept introduced
in this patch).

Thanks!
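`MEMORY_BLOCK_NONE` never results in a memory block device. User space therefore only ever sees the strings produced by `type_show()`: `unspecified`, `boot`, or the `ERROR-UNKNOWN-...` fallback. Below is a minimal sketch of a user-space helper that reads the attribute from a memory block device under `/sys/devices/system/memory/memoryN/` and maps it to an onlining decision. The policy (skip `boot`, online `unspecified`, ignore anything else) and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decisions a user-space onlining helper could take. */
enum online_decision {
	DECISION_UNKNOWN = -1,	/* unrecognized type: leave the block alone */
	DECISION_SKIP = 0,	/* e.g. boot memory, already online */
	DECISION_ONLINE = 1,	/* hand the block to the onlining logic */
};

/* Map a value of the memoryN/type attribute to a decision (assumed policy). */
static enum online_decision decide(const char *type)
{
	if (strcmp(type, "boot") == 0)
		return DECISION_SKIP;
	if (strcmp(type, "unspecified") == 0)
		return DECISION_ONLINE;
	return DECISION_UNKNOWN;
}

/* Read e.g. /sys/devices/system/memory/memory0/type and decide. */
static enum online_decision decide_from_file(const char *path)
{
	char buf[64];
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return DECISION_UNKNOWN;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return DECISION_UNKNOWN;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* type_show() output ends in '\n' */
	return decide(buf);
}
```

This sketch deliberately treats unknown strings as "leave alone". That way, types added by follow-up patches are not onlined by rules written before those types existed.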

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-03 10:32       ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-03 10:32 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Pavel Tatashin,
	Stephen Rothwell, Andrew Banman, mike.travis, Oscar Salvador,
	Dave Hansen, Michal Hocko, Michal Such??nek, Vital

On 01.12.18 02:25, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>> Memory onlining should always be handled by user space, because only user
>> space knows which use cases it wants to satisfy. E.g. memory might be
>> onlined to the MOVABLE zone even if it can never be removed from the
>> system, e.g. to make usage of huge pages more reliable.
>>
>> However to implement such rules (especially default rules in distributions)
>> we need more information about the memory that was added in user space.
>>
>> E.g. on x86 we want to online memory provided by balloon devices (e.g.
>> XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>> block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>> the whole block). This might also become relevat for other architectures.
>>
>> Also, udev rules right now check if running on s390x and treat all added
>> memory blocks as standby memory (-> don't online automatically). As soon as
>> we support other memory hotplug mechanism (e.g. virtio-mem) checks would
>> have to get more involved (e.g. also check if under KVM) but eventually
>> also wrong (e.g. if KVM ever supports standby memory we are doomed).
>>
>> I decided to allow to specify the type of memory that is getting added
>> to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
>> basic infrastructure running. We'll introduce and use further types in
>> follow-up patches. For now we classify any hotplugged memory temporarily
>> as as UNSPECIFIED (which will eventually be dropped later on).
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Michal Such??nek <msuchanek@suse.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
>> include/linux/memory.h | 27 +++++++++++++++++++++++++++
>> 2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 0c290f86ab20..17f2985c07c5 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
>> 	return sprintf(buf, "%d\n", mem->phys_device);
>> }
>>
>> +static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>> +			 char *buf)
>> +{
>> +	struct memory_block *mem = to_memory_block(dev);
>> +	ssize_t len = 0;
>> +
>> +	switch (mem->type) {
>> +	case MEMORY_BLOCK_UNSPECIFIED:
>> +		len = sprintf(buf, "unspecified\n");
>> +		break;
>> +	case MEMORY_BLOCK_BOOT:
>> +		len = sprintf(buf, "boot\n");
>> +		break;
>> +	default:
>> +		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>> +				mem->state);
>> +		WARN_ON(1);
>> +		break;
>> +	}
>> +
>> +	return len;
>> +}
>> +
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>> 		unsigned long nr_pages, int online_type,
>> @@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
>> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
>> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
>> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>> +static DEVICE_ATTR_RO(type);
> 
> This is correct, while looks not consistent with other attributes.
> 
> Not that beautiful :-)

I might change the other ones first, too (or keep this one consistent to
the existing ones). Thanks!

> 
>>
>> /*
>>  * Block size attribute stuff
>> @@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
>> 	&dev_attr_state.attr,
>> 	&dev_attr_phys_device.attr,
>> 	&dev_attr_removable.attr,
>> +	&dev_attr_type.attr,
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> 	&dev_attr_valid_zones.attr,
>> #endif
>> @@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
>> }
>>
>> static int init_memory_block(struct memory_block **memory,
>> -			     struct mem_section *section, unsigned long state)
>> +			     struct mem_section *section, unsigned long state,
>> +			     int type)
>> {
>> 	struct memory_block *mem;
>> 	unsigned long start_pfn;
>> 	int scn_nr;
>> 	int ret = 0;
>>
>> +	if (type == MEMORY_BLOCK_NONE)
>> +		return -EINVAL;
> 
> No one will pass in this value. Can we omit this check for now?

I could move it to patch nr 2 I guess, but as I introduce
MEMORY_BLOCK_NONE here it made sense to keep it in here.

(and I think at least for now it makes sense to not squash patch 1 and
2, to easier discuss the new user interface/concept introduced in this
patch).

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-03 10:32       ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-03 10:32 UTC (permalink / raw)
  To: Wei Yang
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86, Greg Kroah-Hartman,
	Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Pavel Tatashin,
	Stephen Rothwell, Andrew Banman, mike.travis, Oscar Salvador,
	Dave Hansen, Michal Hocko, Michal Such??nek, Vitaly Kuznetsov,
	Dan Williams, Pavel Tatashin, Martin Schwidefsky, Heiko Carstens

On 01.12.18 02:25, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>> Memory onlining should always be handled by user space, because only user
>> space knows which use cases it wants to satisfy. E.g. memory might be
>> onlined to the MOVABLE zone even if it can never be removed from the
>> system, e.g. to make usage of huge pages more reliable.
>>
>> However to implement such rules (especially default rules in distributions)
>> we need more information about the memory that was added in user space.
>>
>> E.g. on x86 we want to online memory provided by balloon devices (e.g.
>> XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>> block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>> the whole block). This might also become relevat for other architectures.
>>
>> Also, udev rules right now check if running on s390x and treat all added
>> memory blocks as standby memory (-> don't online automatically). As soon as
>> we support other memory hotplug mechanism (e.g. virtio-mem) checks would
>> have to get more involved (e.g. also check if under KVM) but eventually
>> also wrong (e.g. if KVM ever supports standby memory we are doomed).
>>
>> I decided to allow to specify the type of memory that is getting added
>> to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
>> basic infrastructure running. We'll introduce and use further types in
>> follow-up patches. For now we classify any hotplugged memory temporarily
>> as as UNSPECIFIED (which will eventually be dropped later on).
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Michal Such??nek <msuchanek@suse.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
>> include/linux/memory.h | 27 +++++++++++++++++++++++++++
>> 2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 0c290f86ab20..17f2985c07c5 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
>> 	return sprintf(buf, "%d\n", mem->phys_device);
>> }
>>
>> +static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>> +			 char *buf)
>> +{
>> +	struct memory_block *mem = to_memory_block(dev);
>> +	ssize_t len = 0;
>> +
>> +	switch (mem->type) {
>> +	case MEMORY_BLOCK_UNSPECIFIED:
>> +		len = sprintf(buf, "unspecified\n");
>> +		break;
>> +	case MEMORY_BLOCK_BOOT:
>> +		len = sprintf(buf, "boot\n");
>> +		break;
>> +	default:
>> +		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>> +				mem->state);
>> +		WARN_ON(1);
>> +		break;
>> +	}
>> +
>> +	return len;
>> +}
>> +
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>> 		unsigned long nr_pages, int online_type,
>> @@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
>> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
>> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
>> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>> +static DEVICE_ATTR_RO(type);
> 
> This is correct, while looks not consistent with other attributes.
> 
> Not that beautiful :-)

I might change the other ones first, too (or keep this one consistent to
the existing ones). Thanks!

> 
>>
>> /*
>>  * Block size attribute stuff
>> @@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
>> 	&dev_attr_state.attr,
>> 	&dev_attr_phys_device.attr,
>> 	&dev_attr_removable.attr,
>> +	&dev_attr_type.attr,
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> 	&dev_attr_valid_zones.attr,
>> #endif
>> @@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
>> }
>>
>> static int init_memory_block(struct memory_block **memory,
>> -			     struct mem_section *section, unsigned long state)
>> +			     struct mem_section *section, unsigned long state,
>> +			     int type)
>> {
>> 	struct memory_block *mem;
>> 	unsigned long start_pfn;
>> 	int scn_nr;
>> 	int ret = 0;
>>
>> +	if (type == MEMORY_BLOCK_NONE)
>> +		return -EINVAL;
> 
> No one will pass in this value. Can we omit this check for now?

I could move it to patch nr 2 I guess, but as I introduce
MEMORY_BLOCK_NONE here it made sense to keep it in here.

(and I think at least for now it makes sense to not squash patch 1 and
2, to easier discuss the new user interface/concept introduced in this
patch).

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
@ 2018-12-03 10:32       ` David Hildenbrand
  0 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-03 10:32 UTC (permalink / raw)
  To: Wei Yang
  Cc: Oscar Salvador, linux-ia64, linux-sh, Dave Hansen,
	Heiko Carstens, Michal Hocko, linux-mm, Ingo Molnar, linux-s390,
	x86, Pavel Tatashin, linux-acpi, xen-devel, Michal Such??nek,
	Pavel Tatashin, Stephen Rothwell, mike.travis,
	Martin Schwidefsky, Dan Williams, Vitaly Kuznetsov,
	Andrew Banman, Greg Kroah-Hartman, linux-kernel,
	Rafael J. Wysocki, devel, Andrew Morton, linuxppc-dev

On 01.12.18 02:25, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 06:59:19PM +0100, David Hildenbrand wrote:
>> Memory onlining should always be handled by user space, because only user
>> space knows which use cases it wants to satisfy. E.g. memory might be
>> onlined to the MOVABLE zone even if it can never be removed from the
>> system, e.g. to make usage of huge pages more reliable.
>>
>> However to implement such rules (especially default rules in distributions)
>> we need more information about the memory that was added in user space.
>>
>> E.g. on x86 we want to online memory provided by balloon devices (e.g.
>> XEN, Hyper-V) differently (-> will not be unplugged by offlining the whole
>> block) than ordinary DIMMs (-> might eventually be unplugged by offlining
>> the whole block). This might also become relevat for other architectures.
>>
>> Also, udev rules right now check if running on s390x and treat all added
>> memory blocks as standby memory (-> don't online automatically). As soon as
>> we support other memory hotplug mechanism (e.g. virtio-mem) checks would
>> have to get more involved (e.g. also check if under KVM) but eventually
>> also wrong (e.g. if KVM ever supports standby memory we are doomed).
>>
>> I decided to allow to specify the type of memory that is getting added
>> to the system. Let's start with two types, BOOT and UNSPECIFIED to get the
>> basic infrastructure running. We'll introduce and use further types in
>> follow-up patches. For now we classify any hotplugged memory temporarily
>> as as UNSPECIFIED (which will eventually be dropped later on).
>>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
>> Cc: Andrew Banman <andrew.banman@hpe.com>
>> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
>> Cc: Oscar Salvador <osalvador@suse.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Michal Such??nek <msuchanek@suse.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
>> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
>> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> drivers/base/memory.c  | 38 +++++++++++++++++++++++++++++++++++---
>> include/linux/memory.h | 27 +++++++++++++++++++++++++++
>> 2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 0c290f86ab20..17f2985c07c5 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -381,6 +381,29 @@ static ssize_t show_phys_device(struct device *dev,
>> 	return sprintf(buf, "%d\n", mem->phys_device);
>> }
>>
>> +static ssize_t type_show(struct device *dev, struct device_attribute *attr,
>> +			 char *buf)
>> +{
>> +	struct memory_block *mem = to_memory_block(dev);
>> +	ssize_t len = 0;
>> +
>> +	switch (mem->type) {
>> +	case MEMORY_BLOCK_UNSPECIFIED:
>> +		len = sprintf(buf, "unspecified\n");
>> +		break;
>> +	case MEMORY_BLOCK_BOOT:
>> +		len = sprintf(buf, "boot\n");
>> +		break;
>> +	default:
>> +		len = sprintf(buf, "ERROR-UNKNOWN-%ld\n",
>> +				mem->state);
>> +		WARN_ON(1);
>> +		break;
>> +	}
>> +
>> +	return len;
>> +}
>> +
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> static void print_allowed_zone(char *buf, int nid, unsigned long start_pfn,
>> 		unsigned long nr_pages, int online_type,
>> @@ -442,6 +465,7 @@ static DEVICE_ATTR(phys_index, 0444, show_mem_start_phys_index, NULL);
>> static DEVICE_ATTR(state, 0644, show_mem_state, store_mem_state);
>> static DEVICE_ATTR(phys_device, 0444, show_phys_device, NULL);
>> static DEVICE_ATTR(removable, 0444, show_mem_removable, NULL);
>> +static DEVICE_ATTR_RO(type);
> 
> This is correct, while looks not consistent with other attributes.
> 
> Not that beautiful :-)

I might change the other ones first, too (or keep this one consistent to
the existing ones). Thanks!

> 
>>
>> /*
>>  * Block size attribute stuff
>> @@ -620,6 +644,7 @@ static struct attribute *memory_memblk_attrs[] = {
>> 	&dev_attr_state.attr,
>> 	&dev_attr_phys_device.attr,
>> 	&dev_attr_removable.attr,
>> +	&dev_attr_type.attr,
>> #ifdef CONFIG_MEMORY_HOTREMOVE
>> 	&dev_attr_valid_zones.attr,
>> #endif
>> @@ -657,13 +682,17 @@ int register_memory(struct memory_block *memory)
>> }
>>
>> static int init_memory_block(struct memory_block **memory,
>> -			     struct mem_section *section, unsigned long state)
>> +			     struct mem_section *section, unsigned long state,
>> +			     int type)
>> {
>> 	struct memory_block *mem;
>> 	unsigned long start_pfn;
>> 	int scn_nr;
>> 	int ret = 0;
>>
>> +	if (type == MEMORY_BLOCK_NONE)
>> +		return -EINVAL;
> 
> No one will pass in this value. Can we omit this check for now?

I guess I could move it to patch 2, but as I introduce
MEMORY_BLOCK_NONE here, it made sense to keep it here.

(And I think, at least for now, it makes sense not to squash patches 1
and 2, to make it easier to discuss the new user interface/concept
introduced in this patch.)

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread
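Wei's consistency remark comes from the DEVICE_ATTR_RO() convention: the macro derives the callback name from the attribute name, which is why the new function is called type_show() while its neighbours (show_phys_device(), show_mem_removable()) use the older show_*() naming with DEVICE_ATTR(). The user-space mock below illustrates both that convention and the quoted switch. It assumes simplified stand-ins for struct memory_block and struct device_attribute; the mock_* names, the reduced type_show() signature and the enum values are hypothetical. One difference from the quoted hunk is deliberate: its default branch formats mem->state, which looks like a copy-paste slip from show_mem_state(), so the sketch prints the unexpected type value instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Names follow the patch; the numeric values are assumed for this sketch. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Simplified stand-in for struct memory_block (only the fields used here). */
struct mock_memory_block {
	unsigned long state;
	int type;
};

/* Simplified stand-in for struct device_attribute. */
struct mock_attr {
	const char *name;
	unsigned int mode;
	int (*show)(const struct mock_memory_block *mem, char *buf, size_t size);
};

/* Like the kernel's DEVICE_ATTR_RO(): the callback name is derived from the
 * attribute name, so the show function has to be called <name>_show. */
#define MOCK_DEVICE_ATTR_RO(_name) \
	static const struct mock_attr dev_attr_##_name = { #_name, 0444, _name##_show }

/* Mirrors the switch in the quoted type_show(), writing into a sized buffer.
 * The default branch reports the unexpected type value, not mem->state. */
static int type_show(const struct mock_memory_block *mem, char *buf, size_t size)
{
	switch (mem->type) {
	case MEMORY_BLOCK_UNSPECIFIED:
		return snprintf(buf, size, "unspecified\n");
	case MEMORY_BLOCK_BOOT:
		return snprintf(buf, size, "boot\n");
	default:
		/* the kernel version additionally does WARN_ON(1) here */
		return snprintf(buf, size, "ERROR-UNKNOWN-%d\n", mem->type);
	}
}

/* Expands to: dev_attr_type = { "type", 0444, type_show } */
MOCK_DEVICE_ATTR_RO(type);
```

Reading the attribute through dev_attr_type.show() for a boot-time block yields "boot\n", and an out-of-range type such as 42 yields "ERROR-UNKNOWN-42\n".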

* Re: [PATCH RFCv2 2/4] mm/memory_hotplug: Replace "bool want_memblock" by "int type"
  2018-12-01  1:50     ` Wei Yang
  (?)
@ 2018-12-03 10:33       ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-03 10:33 UTC (permalink / raw)
  To: Wei Yang
  Cc: Oscar Salvador, Rich Felker, linux-ia64, linux-sh,
	Peter Zijlstra, Benjamin Herrenschmidt, Dave Hansen,
	Heiko Carstens, linux-mm, Michal Hocko, Paul Mackerras,
	H. Peter Anvin, Dan Williams, Rafael J. Wysocki, linux-s390,
	Dave Jiang, Yoshinori Sato, Michael Ellerman, x86,
	Matthew Wilcox, linux-acpi, Ingo Molnar, xen-devel, Rob Herring,
	Fenghua Yu

On 01.12.18 02:50, Wei Yang wrote:
> On Fri, Nov 30, 2018 at 06:59:20PM +0100, David Hildenbrand wrote:
>> Let's pass a memory block type instead. Pass "MEMORY_BLOCK_NONE" for device
>> memory and for now "MEMORY_BLOCK_UNSPECIFIED" for anything else. No
>> functional change.
> 
> I would suggest adding more words to this.

Sure, makes sense, I'll add more details. Thanks!

> 
> "
> Function arch_add_memory()'s last parameter *want_memblock* is used to
> determine whether a corresponding memory block device has to be created.
> After introducing the memory block type, this patch replaces the bool
> *want_memblock* with the memory block type, using the following rules
> for now:
> 
>   * Pass "MEMORY_BLOCK_NONE" for device memory
>   * Pass "MEMORY_BLOCK_UNSPECIFIED" for anything else 
> 
> Since this parameter is passed down to __add_section(), every function
> on the call path is affected. They are listed below:
> 
>   arch_add_memory()
>     add_pages()
>       __add_pages()
>         __add_section()
> 
> "

[...]


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread
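The call chain in the suggested changelog can be sketched as follows: the int type simply travels down to the section-level helper, and the old `want_memblock == true` case corresponds to any type other than MEMORY_BLOCK_NONE. This is a user-space mock under stated assumptions: the sketch_* functions are hypothetical stand-ins for arch_add_memory(), add_pages(), __add_pages() and __add_section() with most parameters dropped, the enum values are assumed, and each section is treated as its own memory block, whereas in the kernel one memory block can span several sections.

```c
#include <assert.h>

/* Names follow the patch; the numeric values are assumed for this sketch. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Counts how many memory block devices the mock chain "registered". */
static int registered_blocks;

/* Stand-in for __add_section(): the section itself is always added, but a
 * memory block device is only created when a real type is passed. */
static int sketch_add_section(unsigned long section_nr, int type)
{
	(void)section_nr; /* section bookkeeping is omitted in this sketch */
	if (type == MEMORY_BLOCK_NONE)
		return 0; /* e.g. device memory: no memory block device */
	registered_blocks++;
	return 0;
}

/* Stand-in for __add_pages(): walks the sections and forwards the type. */
static int sketch_add_pages_internal(unsigned long first_section,
				     unsigned long nr_sections, int type)
{
	for (unsigned long i = 0; i < nr_sections; i++) {
		int ret = sketch_add_section(first_section + i, type);

		if (ret)
			return ret;
	}
	return 0;
}

/* Stand-ins for add_pages() and arch_add_memory(): both only forward the
 * type, which replaces the former "bool want_memblock" parameter. */
static int sketch_add_pages(unsigned long first_section,
			    unsigned long nr_sections, int type)
{
	return sketch_add_pages_internal(first_section, nr_sections, type);
}

static int sketch_arch_add_memory(unsigned long first_section,
				  unsigned long nr_sections, int type)
{
	return sketch_add_pages(first_section, nr_sections, type);
}
```

Passing MEMORY_BLOCK_NONE (device memory) adds the sections without creating any block devices, while MEMORY_BLOCK_UNSPECIFIED behaves like the old `want_memblock = true`.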

* Re: [PATCH RFCv2 1/4] mm/memory_hotplug: Introduce memory block types
  2018-12-03 10:32       ` David Hildenbrand
                           ` (2 preceding siblings ...)
  (?)
@ 2018-12-03 20:58         ` Wei Yang
  -1 siblings, 0 replies; 80+ messages in thread
From: Wei Yang @ 2018-12-03 20:58 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Wei Yang, linux-mm, linux-kernel, linux-ia64, linuxppc-dev,
	linux-s390, linux-sh, linux-acpi, devel, xen-devel, x86,
	Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Morton,
	Ingo Molnar, Pavel Tatashin, Stephen Rothwell, Andrew Banman,
	mike.travis, Oscar Salvador, Dave Hansen, Michal Hocko

[...]
>>>
>>> +	if (type == MEMORY_BLOCK_NONE)
>>> +		return -EINVAL;
>> 
>> No one will pass in this value. Can we omit this check for now?
>
>I guess I could move it to patch 2, but as I introduce
>MEMORY_BLOCK_NONE here, it made sense to keep it here.
>

Yes, this makes sense to me now.

>(And I think, at least for now, it makes sense not to squash patches 1
>and 2, to make it easier to discuss the new user interface/concept
>introduced in this patch.)
>
>Thanks!
>
>-- 
>
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 80+ messages in thread
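The MEMORY_BLOCK_NONE check agreed on in this exchange can be illustrated with a user-space mock. It assumes a heavily simplified stand-in for init_memory_block() (the real function also takes a struct mem_section, computes the start PFN and registers the device); the mock_* names and the enum values are hypothetical. The point is that the "no memory block device" marker is rejected before anything is allocated, so no block can ever be created with it as its type.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Names follow the patch; the numeric values are assumed for this sketch. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
};

/* Simplified stand-in for struct memory_block. */
struct mock_memory_block {
	unsigned long state;
	int type;
};

/* Hypothetical, reduced init_memory_block(): validate the type first,
 * then allocate and fill in the block. On error *memory is untouched. */
static int mock_init_memory_block(struct mock_memory_block **memory,
				  unsigned long state, int type)
{
	struct mock_memory_block *mem;

	if (type == MEMORY_BLOCK_NONE)
		return -EINVAL;

	mem = calloc(1, sizeof(*mem));
	if (!mem)
		return -ENOMEM;

	mem->state = state;
	mem->type = type;
	*memory = mem;
	return 0;
}
```

Checking before the allocation keeps the error path trivial: nothing has to be freed when the caller passes the wrong type.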

* Re: [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
  2018-11-30 17:59   ` David Hildenbrand
  (?)
  (?)
@ 2018-12-04  9:44     ` Michal Suchánek
  -1 siblings, 0 replies; 80+ messages in thread
From: Michal Suchánek @ 2018-12-04  9:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky

On Fri, 30 Nov 2018 18:59:21 +0100
David Hildenbrand <david@redhat.com> wrote:

> Let's introduce new types for different kinds of memory blocks and use
> them in existing code. As I don't see an easy way to split this up,
> do it in one hunk for now.
> 
> acpi:
>  Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
>  Properly change the type when trying to add memory that was already
>  detected and used during boot (so this memory will correctly end up as
>  "acpi" in user space).
> 
> pseries:
>  Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
>  As far as I see, handling like in the acpi case for existing blocks is
>  not required.
> 
> probed memory from user space:
>  Use DIMM_UNREMOVABLE as there is no interface to get rid of this code
>  again.
> 
> hv_balloon,xen/balloon:
>  Use BALLOON. As simple as that :)
> 
> s390x/sclp:
>  Use a dedicated type S390X_STANDBY as this type of memory and its
>  semantics are very s390x specific.
> 
> powernv/memtrace:
>  Only allow BOOT memory to be used for memtrace. I consider this code
>  dangerous in general, but we have to keep it working ... most probably
>  it is just a debug feature.

I don't think it should be arbitrarily restricted like that.

Thanks

Michal

^ permalink raw reply	[flat|nested] 80+ messages in thread
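The per-source assignments in the quoted description boil down to a small mapping, shown here as a user-space sketch. The MEMORY_BLOCK_ prefix for the DIMM, DIMM_UNREMOVABLE, BALLOON and S390X_STANDBY types, all numeric values, the hypothetical hotplug_source enum, and the hotremove_supported flag (standing in for whether the kernel was built with memory hot-remove support) are assumptions for illustration. memtrace_may_use() encodes the BOOT-only memtrace restriction as described, which is the rule Michal questions.

```c
#include <assert.h>
#include <stdbool.h>

/* Type names follow the patch description; values are assumed. */
enum {
	MEMORY_BLOCK_NONE = 0,
	MEMORY_BLOCK_UNSPECIFIED,
	MEMORY_BLOCK_BOOT,
	MEMORY_BLOCK_DIMM,
	MEMORY_BLOCK_DIMM_UNREMOVABLE,
	MEMORY_BLOCK_BALLOON,
	MEMORY_BLOCK_S390X_STANDBY,
};

/* Hypothetical labels for the code paths named in the description. */
enum hotplug_source {
	SRC_ACPI,
	SRC_PSERIES,
	SRC_PROBE, /* memory probed from user space */
	SRC_HV_BALLOON,
	SRC_XEN_BALLOON,
	SRC_S390X_SCLP,
};

/* Returns the block type the description assigns to each path. */
static int pick_block_type(enum hotplug_source src, bool hotremove_supported)
{
	switch (src) {
	case SRC_ACPI:
	case SRC_PSERIES:
		return hotremove_supported ? MEMORY_BLOCK_DIMM
					   : MEMORY_BLOCK_DIMM_UNREMOVABLE;
	case SRC_PROBE:
		/* there is no interface to remove probed memory again */
		return MEMORY_BLOCK_DIMM_UNREMOVABLE;
	case SRC_HV_BALLOON:
	case SRC_XEN_BALLOON:
		return MEMORY_BLOCK_BALLOON;
	case SRC_S390X_SCLP:
		return MEMORY_BLOCK_S390X_STANDBY;
	}
	return MEMORY_BLOCK_UNSPECIFIED;
}

/* The proposed powernv/memtrace rule: only boot memory may be used. */
static bool memtrace_may_use(int type)
{
	return type == MEMORY_BLOCK_BOOT;
}
```

Under this rule a hotplugged DIMM, even a removable one, would be refused by memtrace, which is the restriction being debated.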

* Re: [PATCH RFCv2 3/4] mm/memory_hotplug: Introduce and use more memory types
  2018-12-04  9:44     ` Michal Suchánek
  (?)
  (?)
@ 2018-12-04  9:47       ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-04  9:47 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: linux-mm, linux-kernel, linux-ia64, linuxppc-dev, linux-s390,
	linux-sh, linux-acpi, devel, xen-devel, x86,
	Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Rafael J. Wysocki, Len Brown, Greg Kroah-Hartman,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Martin Schwidefsky, Heiko Carstens, Boris Ostrovsky

On 04.12.18 10:44, Michal Suchánek wrote:
> On Fri, 30 Nov 2018 18:59:21 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Let's introduce new types for different kinds of memory blocks and use
>> them in existing code. As I don't see an easy way to split this up,
>> do it in one hunk for now.
>>
>> acpi:
>>  Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
>>  Properly change the type when trying to add memory that was already
>>  detected and used during boot (so this memory will correctly end up as
>>  "acpi" in user space).
>>
>> pseries:
>>  Use DIMM or DIMM_UNREMOVABLE depending on hotremove support in the kernel.
>>  As far as I see, handling like in the acpi case for existing blocks is
>>  not required.
>>
>> probed memory from user space:
>>  Use DIMM_UNREMOVABLE as there is no interface to get rid of this code
>>  again.
>>
>> hv_balloon,xen/balloon:
>>  Use BALLOON. As simple as that :)
>>
>> s390x/sclp:
>>  Use a dedicated type S390X_STANDBY as this type of memory and its
>>  semantics are very s390x specific.
>>
>> powernv/memtrace:
>>  Only allow BOOT memory to be used for memtrace. I consider this code in
>>  general dangerous, but we have to keep it working ... most probably just
>>  a debug feature.
> 
> I don't think it should be arbitrarily restricted like that.
> 

Well, code that "randomly" offlines/onlines/removes/adds memory blocks
that it does not own (hint: nobody else in the kernel does that) should
be restricted to types we can guarantee to work.
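
To make the restriction concrete, the per-driver type assignment from the quoted patch description and the memtrace check can be sketched in C. This is a minimal, hypothetical sketch: the enum and function names are illustrative and not taken from the RFC, the `hotremove` flag stands in for whether the kernel supports memory hotremove, and the ACPI re-typing of memory already detected during boot is not modelled.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the block types named in the patch
 * description; the identifiers used in the actual RFC may differ.
 */
enum mem_block_type {
	MEM_BLOCK_BOOT,             /* detected and onlined during boot */
	MEM_BLOCK_DIMM,             /* hotplugged DIMM, hotremove supported */
	MEM_BLOCK_DIMM_UNREMOVABLE, /* DIMM that can never be removed again */
	MEM_BLOCK_BALLOON,          /* added by hv_balloon or xen/balloon */
	MEM_BLOCK_S390X_STANDBY,    /* s390x standby memory via sclp */
};

/* Hypothetical: who added the memory, as distinguished in the patch text. */
enum mem_source {
	SRC_ACPI_OR_PSERIES, /* ACPI or pseries DIMM hotplug */
	SRC_PROBE,           /* memory probed from user space */
	SRC_BALLOON,         /* hv_balloon, xen/balloon */
	SRC_S390X_SCLP,      /* s390x sclp standby memory */
};

/* Mirror the per-driver type assignment described in the patch. */
static enum mem_block_type pick_type(enum mem_source src, bool hotremove)
{
	switch (src) {
	case SRC_ACPI_OR_PSERIES:
		/* depends on whether the kernel supports memory hotremove */
		return hotremove ? MEM_BLOCK_DIMM : MEM_BLOCK_DIMM_UNREMOVABLE;
	case SRC_PROBE:
		/* there is no interface to remove probed memory again */
		return MEM_BLOCK_DIMM_UNREMOVABLE;
	case SRC_BALLOON:
		return MEM_BLOCK_BALLOON;
	case SRC_S390X_SCLP:
		return MEM_BLOCK_S390X_STANDBY;
	}
	return MEM_BLOCK_DIMM_UNREMOVABLE; /* not reached for valid input */
}

/* memtrace only takes blocks whose semantics it can rely on: BOOT. */
static bool memtrace_may_use(enum mem_block_type type)
{
	return type == MEM_BLOCK_BOOT;
}
```

The guard is an allow-list, so any block type added later is rejected by memtrace until someone decides it is safe to offline and remove.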

> Thanks
> 
> Michal
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-11-30 17:59 ` David Hildenbrand
  (?)
@ 2018-12-20 12:58   ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-20 12:58 UTC (permalink / raw)
  To: linux-mm
  Cc: Oscar Salvador, Rafael J. Wysocki, Michal Hocko, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Michal Hocko, Vitaly Kuznetsov,
	Pavel Tatashin, Rich Felker, Arun KS, H. Peter Anvin,
	Stephen Rothwell, Rashmica Gupta, Boris Ostrovsky,
	Paul Mackerras, Pavel Tatashin, linux-s390, Michael Neuling,
	Stefano

On 30.11.18 18:59, David Hildenbrand wrote:
> This is the second approach, introducing more meaningful memory block
> types and not changing online behavior in the kernel. It is based on
> latest linux-next.
> 
> As we found out during discussion, user space should always handle onlining
> of memory, in any case. However, in order to make smart decisions in user
> space about whether and how to online memory, we have to export more
> information about memory blocks. This way, we can formulate rules in user
> space.
> 
> One such piece of information is the type of memory block we are talking
> about. This helps to answer questions like:
> - Does this memory block belong to a DIMM?
> - Can this DIMM theoretically ever be unplugged again?
> - Was this memory added by a balloon driver that will rely on balloon
>   inflation to remove chunks of that memory again? Which zone is advised?
> - Is this special standby memory on s390x that is usually not automatically
>   onlined?
> 
> In short, it helps to answer, to some extent (excluding zone imbalances):
> - Should I online this memory block?
> - To which zone should I online this memory block?
> ... of course, special use cases will result in different answers. But that's
> why user space has control of onlining memory.
> 
> More details can be found in Patch 1 and Patch 3.
> Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
> 
> 
> Example:
> $ udevadm info -q all -a /sys/devices/system/memory/memory0
> 	KERNEL=="memory0"
> 	SUBSYSTEM=="memory"
> 	DRIVER==""
> 	ATTR{online}=="1"
> 	ATTR{phys_device}=="0"
> 	ATTR{phys_index}=="00000000"
> 	ATTR{removable}=="0"
> 	ATTR{state}=="online"
> 	ATTR{type}=="boot"
> 	ATTR{valid_zones}=="none"
> $ udevadm info -q all -a /sys/devices/system/memory/memory90
> 	KERNEL=="memory90"
> 	SUBSYSTEM=="memory"
> 	DRIVER==""
> 	ATTR{online}=="1"
> 	ATTR{phys_device}=="0"
> 	ATTR{phys_index}=="0000005a"
> 	ATTR{removable}=="1"
> 	ATTR{state}=="online"
> 	ATTR{type}=="dimm"
> 	ATTR{valid_zones}=="Normal"
> 
> 
> RFC -> RFCv2:
> - Now also taking care of PPC (somehow missed it :/ )
> - Split the series up to some degree (some ideas on how to split up patch 3
>   would be very welcome)
> - Introduce more memory block types. Turns out abstracting too much was
>   rather confusing and not helpful. Properly document them.
> 
> Notes:
> - I wanted to convert the enum of types into a named enum but this
>   provoked all kinds of different errors. For now, I am doing it just like
>   the other types (e.g. online_type) we are using in that context.
> - The "removable" property should never have been named like that. It
>   should have been "offlinable". Can we still rename that? E.g. boot memory
>   is sometimes marked as removable ...
> 
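
A user-space onlining policy built on the exported `type` attribute could look roughly like the C sketch below. It is illustrative only: apart from "boot" and "dimm", which appear in the example output, the type strings are assumptions derived from the type names in patch 3, and the zone choices are one possible policy rather than one proposed in the series. The returned strings are values accepted by the existing `state` attribute of a memory block.

```c
#include <stddef.h>
#include <string.h>

/*
 * Map a memory block's proposed ATTR{type} value to the string a policy
 * daemon might write to /sys/devices/system/memory/memoryN/state.
 * Returns NULL when the block should be left as it is.
 * Only "boot" and "dimm" appear in the example output; the other type
 * strings are assumptions derived from the type names in patch 3.
 */
static const char *suggest_online_state(const char *type)
{
	if (strcmp(type, "dimm") == 0)
		return "online_movable"; /* keep the DIMM unpluggable later */
	if (strcmp(type, "dimm_unremovable") == 0)
		return "online_kernel";  /* can never be unplugged anyway */
	if (strcmp(type, "balloon") == 0)
		return "online_kernel";  /* one possible choice for balloon memory */
	/* "boot" is already online; s390x standby is usually left offline. */
	return NULL;
}
```

A daemon (or a udev rule calling a helper) would read memoryN/type, strip the trailing newline, and write the returned string to memoryN/state whenever it is not NULL.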


Any feedback regarding the suggested block types would be very much
appreciated!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-12-20 12:58   ` David Hildenbrand
  (?)
@ 2018-12-20 13:08     ` Michal Hocko
  -1 siblings, 0 replies; 80+ messages in thread
From: Michal Hocko @ 2018-12-20 13:08 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Oscar Salvador, Rafael J. Wysocki, Rich Felker, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Wei Yang, linux-mm, Pavel Tatashin,
	Arun KS, H. Peter Anvin, Stephen Rothwell, Rashmica Gupta,
	Boris Ostrovsky, Paul Mackerras, Pavel Tatashin, linux-s390,
	Michael Neuling, Stefano Stabellini

On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
> On 30.11.18 18:59, David Hildenbrand wrote:
> > This is the second approach, introducing more meaningful memory block
> > types and not changing online behavior in the kernel. It is based on
> > latest linux-next.
> > 
> > As we found out during dicussion, user space should always handle onlining
> > of memory, in any case. However in order to make smart decisions in user
> > space about if and how to online memory, we have to export more information
> > about memory blocks. This way, we can formulate rules in user space.
> > 
> > One such information is the type of memory block we are talking about.
> > This helps to answer some questions like:
> > - Does this memory block belong to a DIMM?
> > - Can this DIMM theoretically ever be unplugged again?
> > - Was this memory added by a balloon driver that will rely on balloon
> >   inflation to remove chunks of that memory again? Which zone is advised?
> > - Is this special standby memory on s390x that is usually not automatically
> >   onlined?
> > 
> > And in short it helps to answer to some extend (excluding zone imbalances)
> > - Should I online this memory block?
> > - To which zone should I online this memory block?
> > ... of course special use cases will result in different anwers. But that's
> > why user space has control of onlining memory.
> > 
> > More details can be found in Patch 1 and Patch 3.
> > Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
> > 
> > 
> > Example:
> > $ udevadm info -q all -a /sys/devices/system/memory/memory0
> > 	KERNEL=="memory0"
> > 	SUBSYSTEM=="memory"
> > 	DRIVER==""
> > 	ATTR{online}=="1"
> > 	ATTR{phys_device}=="0"
> > 	ATTR{phys_index}=="00000000"
> > 	ATTR{removable}=="0"
> > 	ATTR{state}=="online"
> > 	ATTR{type}=="boot"
> > 	ATTR{valid_zones}=="none"
> > $ udevadm info -q all -a /sys/devices/system/memory/memory90
> > 	KERNEL=="memory90"
> > 	SUBSYSTEM=="memory"
> > 	DRIVER==""
> > 	ATTR{online}=="1"
> > 	ATTR{phys_device}=="0"
> > 	ATTR{phys_index}=="0000005a"
> > 	ATTR{removable}=="1"
> > 	ATTR{state}=="online"
> > 	ATTR{type}=="dimm"
> > 	ATTR{valid_zones}=="Normal"
> > 
> > 
> > RFC -> RFCv2:
> > - Now also taking care of PPC (somehow missed it :/ )
> > - Split the series up to some degree (some ideas on how to split up patch 3
> >   would be very welcome)
> > - Introduce more memory block types. Turns out abstracting too much was
> >   rather confusing and not helpful. Properly document them.
> > 
> > Notes:
> > - I wanted to convert the enum of types into a named enum but this
> >   provoked all kinds of different errors. For now, I am doing it just like
> >   the other types (e.g. online_type) we are using in that context.
> > - The "removable" property should never have been named like that. It
> >   should have been "offlinable". Can we still rename that? E.g. boot memory
> >   is sometimes marked as removable ...
> > 
> 
> 
> Any feedback regarding the suggested block types would be very much
> appreciated!

I still do not like this much to be honest. I just didn't get to think
through this properly. My fear is that this is conflating an actual API
with the current implementation and as such will cause problems in
future. But I haven't really looked into your patches closely so I might
be wrong. Anyway I won't be able to look into it by the end of year.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-12-20 13:08     ` Michal Hocko
  (?)
@ 2018-12-20 13:16       ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-20 13:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Oscar Salvador, Rafael J. Wysocki, Rich Felker, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Wei Yang, linux-mm, Pavel Tatashin,
	Arun KS, H. Peter Anvin, Stephen Rothwell, Rashmica Gupta,
	Boris Ostrovsky, Paul Mackerras, Pavel Tatashin, linux-s390,
	Michael Neuling, Stefano Stabellini

On 20.12.18 14:08, Michal Hocko wrote:
> On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
>> On 30.11.18 18:59, David Hildenbrand wrote:
>>> This is the second approach, introducing more meaningful memory block
>>> types and not changing online behavior in the kernel. It is based on
>>> latest linux-next.
>>>
>>> As we found out during discussion, user space should always handle onlining
>>> of memory, in any case. However in order to make smart decisions in user
>>> space about if and how to online memory, we have to export more information
>>> about memory blocks. This way, we can formulate rules in user space.
>>>
>>> One such information is the type of memory block we are talking about.
>>> This helps to answer some questions like:
>>> - Does this memory block belong to a DIMM?
>>> - Can this DIMM theoretically ever be unplugged again?
>>> - Was this memory added by a balloon driver that will rely on balloon
>>>   inflation to remove chunks of that memory again? Which zone is advised?
>>> - Is this special standby memory on s390x that is usually not automatically
>>>   onlined?
>>>
>>> In short, it helps to answer, to some extent (excluding zone imbalances):
>>> - Should I online this memory block?
>>> - To which zone should I online this memory block?
>>> ... of course special use cases will result in different answers. But that's
>>> why user space has control of onlining memory.
>>>
>>> More details can be found in Patch 1 and Patch 3.
>>> Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>>>
>>>
>>> Example:
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory0
>>> 	KERNEL=="memory0"
>>> 	SUBSYSTEM=="memory"
>>> 	DRIVER==""
>>> 	ATTR{online}=="1"
>>> 	ATTR{phys_device}=="0"
>>> 	ATTR{phys_index}=="00000000"
>>> 	ATTR{removable}=="0"
>>> 	ATTR{state}=="online"
>>> 	ATTR{type}=="boot"
>>> 	ATTR{valid_zones}=="none"
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory90
>>> 	KERNEL=="memory90"
>>> 	SUBSYSTEM=="memory"
>>> 	DRIVER==""
>>> 	ATTR{online}=="1"
>>> 	ATTR{phys_device}=="0"
>>> 	ATTR{phys_index}=="0000005a"
>>> 	ATTR{removable}=="1"
>>> 	ATTR{state}=="online"
>>> 	ATTR{type}=="dimm"
>>> 	ATTR{valid_zones}=="Normal"
>>>
>>>
>>> RFC -> RFCv2:
>>> - Now also taking care of PPC (somehow missed it :/ )
>>> - Split the series up to some degree (some ideas on how to split up patch 3
>>>   would be very welcome)
>>> - Introduce more memory block types. Turns out abstracting too much was
>>>   rather confusing and not helpful. Properly document them.
>>>
>>> Notes:
>>> - I wanted to convert the enum of types into a named enum but this
>>>   provoked all kinds of different errors. For now, I am doing it just like
>>>   the other types (e.g. online_type) we are using in that context.
>>> - The "removable" property should never have been named like that. It
>>>   should have been "offlinable". Can we still rename that? E.g. boot memory
>>>   is sometimes marked as removable ...
>>>
>>
>>
>> Any feedback regarding the suggested block types would be very much
>> appreciated!
> 
> I still do not like this much to be honest. I just didn't get to think
> through this properly. My fear is that this is conflating an actual API
> with the current implementation and as such will cause problems in
> future. But I haven't really looked into your patches closely so I might
> be wrong. Anyway I won't be able to look into it by the end of year.
> 

I guess that as long as we have memory block devices and expect user space
to make the decision, we will have this API and the problems that come with it.

I am open to alternatives, and as I said, any feedback on how to sort
this out will be highly appreciated.

I'll be on vacation for the next two weeks, so this can wait. Just
wanted to note that I am still interested in feedback :)

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-12-20 13:08     ` Michal Hocko
  (?)
  (?)
@ 2018-12-20 13:16     ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2018-12-20 13:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Oscar Salvador, Rafael J. Wysocki, Rich Felker, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Wei Yang, linux-mm, Pavel Tatashin,
	Arun KS, H. Peter Anvin, Stephen Rothwell, Rashmica Gupta,
	K. Y. Srinivasan, Boris Ostrovsky, Paul Mackerras,
	Pavel Tatashin, linux-s390, Michael Neuling

On 20.12.18 14:08, Michal Hocko wrote:
> On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
>> On 30.11.18 18:59, David Hildenbrand wrote:
>>> This is the second approach, introducing more meaningful memory block
>>> types and not changing online behavior in the kernel. It is based on
>>> latest linux-next.
>>>
>>> As we found out during discussion, user space should always handle onlining
>>> of memory, in any case. However in order to make smart decisions in user
>>> space about if and how to online memory, we have to export more information
>>> about memory blocks. This way, we can formulate rules in user space.
>>>
>>> One such information is the type of memory block we are talking about.
>>> This helps to answer some questions like:
>>> - Does this memory block belong to a DIMM?
>>> - Can this DIMM theoretically ever be unplugged again?
>>> - Was this memory added by a balloon driver that will rely on balloon
>>>   inflation to remove chunks of that memory again? Which zone is advised?
>>> - Is this special standby memory on s390x that is usually not automatically
>>>   onlined?
>>>
>>> In short, it helps to answer, to some extent (excluding zone imbalances):
>>> - Should I online this memory block?
>>> - To which zone should I online this memory block?
>>> ... of course special use cases will result in different answers. But that's
>>> why user space has control of onlining memory.
>>>
>>> More details can be found in Patch 1 and Patch 3.
>>> Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>>>
>>>
>>> Example:
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory0
>>> 	KERNEL=="memory0"
>>> 	SUBSYSTEM=="memory"
>>> 	DRIVER==""
>>> 	ATTR{online}=="1"
>>> 	ATTR{phys_device}=="0"
>>> 	ATTR{phys_index}=="00000000"
>>> 	ATTR{removable}=="0"
>>> 	ATTR{state}=="online"
>>> 	ATTR{type}=="boot"
>>> 	ATTR{valid_zones}=="none"
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory90
>>> 	KERNEL=="memory90"
>>> 	SUBSYSTEM=="memory"
>>> 	DRIVER==""
>>> 	ATTR{online}=="1"
>>> 	ATTR{phys_device}=="0"
>>> 	ATTR{phys_index}=="0000005a"
>>> 	ATTR{removable}=="1"
>>> 	ATTR{state}=="online"
>>> 	ATTR{type}=="dimm"
>>> 	ATTR{valid_zones}=="Normal"
>>>
>>>
>>> RFC -> RFCv2:
>>> - Now also taking care of PPC (somehow missed it :/ )
>>> - Split the series up to some degree (some ideas on how to split up patch 3
>>>   would be very welcome)
>>> - Introduce more memory block types. Turns out abstracting too much was
>>>   rather confusing and not helpful. Properly document them.
>>>
>>> Notes:
>>> - I wanted to convert the enum of types into a named enum but this
>>>   provoked all kinds of different errors. For now, I am doing it just like
>>>   the other types (e.g. online_type) we are using in that context.
>>> - The "removable" property should never have been named like that. It
>>>   should have been "offlinable". Can we still rename that? E.g. boot memory
>>>   is sometimes marked as removable ...
>>>
>>
>>
>> Any feedback regarding the suggested block types would be very much
>> appreciated!
> 
> I still do not like this much to be honest. I just didn't get to think
> through this properly. My fear is that this is conflating an actual API
> with the current implementation and as such will cause problems in the
> future. But I haven't really looked into your patches closely so I might
> be wrong. Anyway, I won't be able to look into it before the end of the year.
> 

I guess as long as we have memory block devices and we expect user space
to make a decision we will have this API and the involved problems.

I am open to alternatives, and as I said, any feedback on how to sort
this out will be highly appreciated.

I'll be on vacation for the next two weeks, so this can wait. Just
wanted to note that I am still interested in feedback :)
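
The cover letter quoted above argues that user space, not the kernel,
should decide whether and where to online a memory block, based on the
proposed "type" attribute. The following is a minimal C sketch of such a
decision under stated assumptions; it is not code from this series. Only
the "boot" and "dimm" values come from the example output; "balloon" and
"standby" are hypothetical names for the other cases the cover letter
lists, and the zone chosen for each type is an assumption. The returned
strings are values that can be written to
/sys/devices/system/memory/memoryN/state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action a user-space onlining policy could take. */
enum online_action {
	ACTION_NONE,           /* leave the block as it is */
	ACTION_ONLINE_MOVABLE, /* online to ZONE_MOVABLE: stays offlinable */
	ACTION_ONLINE_KERNEL,  /* online to a kernel zone such as ZONE_NORMAL */
};

/*
 * Map the proposed "type" attribute to an action. Only "boot" and "dimm"
 * appear in the cover letter's example; "balloon" and "standby" are
 * assumed names, and every zone choice here is an assumption.
 */
enum online_action decide_online(const char *type)
{
	assert(type != NULL);

	if (strcmp(type, "boot") == 0)
		return ACTION_NONE; /* already onlined by the kernel at boot */
	if (strcmp(type, "dimm") == 0)
		return ACTION_ONLINE_MOVABLE; /* keep the DIMM unpluggable */
	if (strcmp(type, "balloon") == 0)
		return ACTION_ONLINE_KERNEL; /* removed by inflation, not offlining */
	/* "standby" and unknown types: do not online automatically. */
	return ACTION_NONE;
}

/* String to write to /sys/devices/system/memory/memoryN/state, or NULL. */
const char *action_to_state(enum online_action action)
{
	switch (action) {
	case ACTION_ONLINE_MOVABLE:
		return "online_movable";
	case ACTION_ONLINE_KERNEL:
		return "online_kernel";
	default:
		return NULL;
	}
}
```

A real policy would also consult valid_zones and only act on blocks whose
state is still "offline"; memory90 in the example is already online.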

-- 

Thanks,

David / dhildenb

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
  2018-12-20 13:08     ` Michal Hocko
@ 2019-03-27 16:03       ` David Hildenbrand
  -1 siblings, 0 replies; 80+ messages in thread
From: David Hildenbrand @ 2019-03-27 16:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Oscar Salvador, Rafael J. Wysocki, Rich Felker, linux-ia64,
	linux-sh, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh,
	Dave Hansen, Heiko Carstens, Wei Yang, linux-mm, Pavel Tatashin,
	Arun KS, H. Peter Anvin, Stephen Rothwell, Rashmica Gupta,
	Boris Ostrovsky, Paul Mackerras, Pavel Tatashin, linux-s390,
	Michael Neuling, Stefano Stabellini

On 20.12.18 14:08, Michal Hocko wrote:
> On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
>> On 30.11.18 18:59, David Hildenbrand wrote:
>>> This is the second approach, introducing more meaningful memory block
>>> types and not changing online behavior in the kernel. It is based on
>>> latest linux-next.
>>>
>>> As we found out during discussion, user space should always handle onlining
>>> of memory, in any case. However in order to make smart decisions in user
>>> space about if and how to online memory, we have to export more information
>>> about memory blocks. This way, we can formulate rules in user space.
>>>
>>> One such piece of information is the type of memory block we are talking about.
>>> This helps to answer some questions like:
>>> - Does this memory block belong to a DIMM?
>>> - Can this DIMM theoretically ever be unplugged again?
>>> - Was this memory added by a balloon driver that will rely on balloon
>>>   inflation to remove chunks of that memory again? Which zone is advised?
>>> - Is this special standby memory on s390x that is usually not automatically
>>>   onlined?
>>>
>>> In short, it helps to answer, to some extent (excluding zone imbalances):
>>> - Should I online this memory block?
>>> - To which zone should I online this memory block?
>>> ... of course, special use cases will result in different answers. But that's
>>> why user space has control of onlining memory.
>>>
>>> More details can be found in Patch 1 and Patch 3.
>>> Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
>>>
>>>
>>> Example:
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory0
>>> 	KERNEL=="memory0"
>>> 	SUBSYSTEM=="memory"
>>> 	DRIVER==""
>>> 	ATTR{online}=="1"
>>> 	ATTR{phys_device}=="0"
>>> 	ATTR{phys_index}=="00000000"
>>> 	ATTR{removable}=="0"
>>> 	ATTR{state}=="online"
>>> 	ATTR{type}=="boot"
>>> 	ATTR{valid_zones}=="none"
>>> $ udevadm info -q all -a /sys/devices/system/memory/memory90
>>> 	KERNEL=="memory90"
>>> 	SUBSYSTEM=="memory"
>>> 	DRIVER==""
>>> 	ATTR{online}=="1"
>>> 	ATTR{phys_device}=="0"
>>> 	ATTR{phys_index}=="0000005a"
>>> 	ATTR{removable}=="1"
>>> 	ATTR{state}=="online"
>>> 	ATTR{type}=="dimm"
>>> 	ATTR{valid_zones}=="Normal"
>>>
>>>
>>> RFC -> RFCv2:
>>> - Now also taking care of PPC (somehow missed it :/ )
>>> - Split the series up to some degree (some ideas on how to split up patch 3
>>>   would be very welcome)
>>> - Introduce more memory block types. Turns out abstracting too much was
>>>   rather confusing and not helpful. Properly document them.
>>>
>>> Notes:
>>> - I wanted to convert the enum of types into a named enum but this
>>>   provoked all kinds of different errors. For now, I am doing it just like
>>>   the other types (e.g. online_type) we are using in that context.
>>> - The "removable" property should never have been named like that. It
>>>   should have been "offlinable". Can we still rename that? E.g. boot memory
>>>   is sometimes marked as removable ...
>>>
>>
>>
>> Any feedback regarding the suggested block types would be very much
>> appreciated!
> 
> I still do not like this much to be honest. I just didn't get to think
> through this properly. My fear is that this is conflating an actual API
> with the current implementation and as such will cause problems in the
> future. But I haven't really looked into your patches closely so I might
> be wrong. Anyway, I won't be able to look into it before the end of the year.
> 

So I started to think about this again, and I guess somehow exposing an
identification of the device driver that added the memory section could
be sufficient.

E.g. "hyperv", "xen", "acpi", "sclp", "virtio-mem" ...

Other information about the memory could be exposed via separate device
driver interfaces (e.g., for ACPI: which memory devices belong to one
physical device). So this information would not have to be centered
around /sys/devices/system/memory/, uglifying it for special cases.

We would have to write udev rules to deal with these values, which should
be easy. If no DRIVER is given, it is simply memory that was detected
during boot. ACPI changing the DRIVER (from no DRIVER -> ACPI) might be
tricky, but I guess it could be done.

Now, the question would be how to get the DRIVER value in there. Adding
a bunch of fake device drivers would work; however, this might get a
little messy ... and then there is unbinding and rebinding, which can be
triggered by user space. Things to care about? Most probably not.
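
One way user space could read the driver identifier proposed here is the
standard sysfs "driver" symlink, which a device directory carries while a
driver is bound and whose target name is what udev matches with DRIVER==.
Memory blocks have no such link today (the quoted example shows
DRIVER==""), so this minimal C sketch assumes the proposal: a missing link
is treated as boot memory, and a hypothetical link ending in
drivers/acpi yields "acpi". The function names are illustrative, not an
existing interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Last path component of a "driver" link target, e.g. ".../drivers/acpi". */
const char *driver_name_from_link(const char *target)
{
	const char *slash = strrchr(target, '/');

	return slash ? slash + 1 : target;
}

/*
 * Store the name of the driver bound to a memory block directory (for
 * example /sys/devices/system/memory/memory90) in out. An empty string
 * means no driver is bound, which this sketch treats as boot memory.
 * Returns 0 on success or a negative errno value.
 */
int memory_block_driver(const char *block_dir, char *out, size_t out_len)
{
	char link_path[PATH_MAX];
	char target[PATH_MAX];
	const char *name;
	ssize_t n;
	int len;

	assert(block_dir != NULL && out != NULL && out_len > 0);

	len = snprintf(link_path, sizeof(link_path), "%s/driver", block_dir);
	if (len < 0 || (size_t)len >= sizeof(link_path))
		return -ENAMETOOLONG;

	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0) {
		if (errno == ENOENT) {
			out[0] = '\0'; /* no driver bound */
			return 0;
		}
		return -errno;
	}
	target[n] = '\0'; /* readlink() does not NUL-terminate */

	name = driver_name_from_link(target);
	if (strlen(name) >= out_len)
		return -ENAMETOOLONG;
	strcpy(out, name);
	return 0;
}
```

A udev rule keyed on DRIVER== would see the same name, so the two
approaches stay consistent; the open question from above (how the
DRIVER value gets set) is not addressed by this sketch.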

-- 

Thanks,

David / dhildenb
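
The example output quoted in this thread pairs memory90 with phys_index
0000005a; both name the same memory block ID, with phys_index printed in
hexadecimal (0x5a = 90). Multiplying the ID by the block size from
/sys/devices/system/memory/block_size_bytes (also hexadecimal) gives the
block's physical start address, useful for example to relate a block to
the address range of the device that added it. A minimal C sketch,
assuming well-formed input and ignoring overflow; the helper names are
illustrative:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse one hexadecimal sysfs value; a single trailing newline is allowed. */
static int parse_sysfs_hex(const char *s, unsigned long long *val)
{
	char *end;

	errno = 0;
	*val = strtoull(s, &end, 16);
	if (errno != 0 || end == s ||
	    (*end != '\0' && !(*end == '\n' && end[1] == '\0')))
		return -EINVAL;
	return 0;
}

/*
 * Physical start address of a memory block: block ID (memoryN/phys_index)
 * times the block size (block_size_bytes). Overflow is not checked.
 */
int memory_block_start(const char *phys_index, const char *block_size_bytes,
		       uint64_t *start)
{
	unsigned long long index, size;

	assert(phys_index != NULL && block_size_bytes != NULL && start != NULL);

	if (parse_sysfs_hex(phys_index, &index) ||
	    parse_sysfs_hex(block_size_bytes, &size) || size == 0)
		return -EINVAL;

	*start = (uint64_t)(index * size);
	return 0;
}
```

With an assumed block size of 128 MiB (0x8000000, typical on x86-64 but
architecture and machine dependent), memory90 starts at 0x2d0000000.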
